Stop using the terms "dependent variable" and "independent variable" in regression models
The word "independent" has a very specific meaning in probability and statistics
When I took math and science courses in high school, I learned the equation of a line as
y = mx + b
with
y being called the dependent variable,
x being called the independent variable.
This is correct in the context of linear functions in mathematics. However, I strongly DISCOURAGE using the terms “dependent variable” and “independent variable” in the context of statistics and regression, because these terms have other meanings in statistics.
In probability, 2 random variables X and Y are
independent if their joint distribution is the product of their marginal distributions,
dependent otherwise.
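To make that definition concrete, here is a minimal sketch in Python that checks whether a discrete joint distribution factors into the product of its marginals. The function name `is_independent`, the tolerance, and the two joint probability tables are all made up for illustration; they are assumptions of this example, not part of any particular library.

```python
import numpy as np


def is_independent(joint, tol=1e-12):
    """Return True if a discrete joint pmf equals the outer product of its marginals.

    `joint[i][j]` is taken to be P(X = i, Y = j). This is a hypothetical helper
    written for this illustration.
    """
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)  # marginal pmf of X: sum over the values of Y
    p_y = joint.sum(axis=0)  # marginal pmf of Y: sum over the values of X
    # Independence means P(X = i, Y = j) = P(X = i) * P(Y = j) for every cell.
    return bool(np.allclose(joint, np.outer(p_x, p_y), atol=tol))


# Hypothetical joint pmfs for two binary random variables X and Y.
# Here P(X = 0) = 0.4 and P(Y = 0) = 0.3, and every cell is the product
# of the corresponding marginals.
independent_joint = [[0.12, 0.28],
                     [0.18, 0.42]]

# Here both marginals are (0.5, 0.5), but the cells are not 0.25 each.
dependent_joint = [[0.40, 0.10],
                   [0.10, 0.40]]

print(is_independent(independent_joint))
print(is_independent(dependent_joint))
```

The first table factors exactly into its marginals, so X and Y are independent. In the second table, knowing X changes the distribution of Y, so they are dependent.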
If a multiple-regression model assumes that the predictors are random variables, then the usage of the term “independent variable” becomes problematic. A random effects model is an example with such an assumption. An obvious question for such models is whether or not the independent variables are independent; this is a rather confusing question with 2 instances of the word “independent”. A better way to phrase that question is whether or not the predictors are independent.
Thus, in a statistical regression model, I strongly encourage the use of
the terms “response variable” or “target variable” (or just “response” and “target”) for Y,
the terms “explanatory variables”, “predictor variables”, “predictors”, or “covariates” for X.
If you are reading my posts for the first time: I'm Eric Cai, a statistician based in Toronto, Canada. I write about statistics, communication, and career development for professionals in data & analytics.