The coefficient of determination (denoted $r^2$) is a number that measures how well a regression line fits a set of observed data.
The Math
A data set will have observed values $y_i$ and modelled values $\hat{y}_i$, sometimes known as predicted values. The "variability" of the data set is measured through different sums of squares, such as:
- the total sum of squares, $SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2$ (proportional to the sample variance);
- the regression sum of squares (also called the explained sum of squares), $SS_{\text{reg}} = \sum_i (\hat{y}_i - \bar{y})^2$; and
- the sum of squares of residuals (also called the residual sum of squares), $SS_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2$.
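The three sums of squares above can be sketched in a few lines of code. This is a minimal illustration with made-up data and a least-squares line fitted to it; for a fit of this kind with an intercept, the total sum of squares splits exactly into the explained and residual parts.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]  # observed values (made up for illustration)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least-squares slope and intercept for the best-fit line
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]  # modelled (predicted) values

ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ss_reg = sum((fi - y_bar) ** 2 for fi in y_hat)           # regression (explained)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual

# For an OLS fit with an intercept, SS_tot = SS_reg + SS_res
print(ss_tot, ss_reg, ss_res)
```

Printing the three values shows the decomposition: the first number equals the sum of the other two (up to floating-point rounding).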
The most general definition of the coefficient of determination is:
$$r^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}},$$
where $SS_{\text{res}}$ is the residual sum of squares and $SS_{\text{tot}}$ is the total sum of squares.
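The definition $r^2 = 1 - SS_{\text{res}}/SS_{\text{tot}}$ can be written as a small helper function. This is a sketch (the function name and data are made up), but it makes the two boundary cases concrete: perfect predictions give $r^2 = 1$, and always predicting the mean gives $r^2 = 0$.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return 1 - ss_res / ss_tot

y = [2.0, 4.0, 6.0, 8.0]
print(r_squared(y, y))          # perfect predictions -> 1.0
print(r_squared(y, [5.0] * 4))  # predicting the mean (5.0) -> 0.0
```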
Properties and Interpretation of $r^2$
The coefficient of determination is the square of the correlation coefficient. It is usually stated as a percent rather than in decimal form. In the context of data:
$r^2$, when expressed as a percent, represents the percent of variation in the dependent variable $y$ that can be explained by variation in the independent variable $x$ using the regression (best-fit) line. $1 - r^2$, when expressed as a percent, represents the percent of variation in $y$ that is NOT explained by variation in $x$ using the regression line. This can be seen as the scattering of the observed data points about the regression line.
So the closer $r^2$ is to 1, the larger the share of the variation in $y$ that is explained by the regression line.
In many (but not all) instances where $r^2$ is used, the modelled values come from an ordinary least-squares regression. In that case, $r^2$ equals the square of the Pearson correlation coefficient between the observed and predicted values of the dependent variable.
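This relationship between $r^2$ and the Pearson correlation coefficient can be checked numerically. The sketch below uses made-up data: it computes $r$ directly, fits an ordinary least-squares line, and compares $r^2$ with $1 - SS_{\text{res}}/SS_{\text{tot}}$.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 2.9, 4.1, 4.8, 6.2, 6.9]  # made-up data for illustration

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient

# Ordinary least-squares fit, then the two sums of squares
slope = sxy / sxx
intercept = y_bar - slope * x_bar
ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
ss_tot = syy

# The two quantities agree (up to floating-point rounding)
print(r ** 2, 1 - ss_res / ss_tot)
```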
Note that $r^2$ does not indicate whether:
- the independent variables are a cause of the changes in the dependent variable;
- omitted-variable bias exists;
- the correct regression was used;
- the most appropriate set of independent variables has been chosen;
- there is collinearity present in the data on the explanatory variables; or
- the model might be improved by using transformed versions of the existing set of independent variables.
Example
Consider the third exam/final exam example introduced in the previous section. The correlation coefficient is $r = 0.6631$, so the coefficient of determination is $r^2 = 0.6631^2 = 0.4397$.
The interpretation of $r^2$ in the context of this example is: approximately 44% of the variation ($0.4397$ is approximately $0.44$) in the final-exam grades can be explained by the variation in the grades on the third exam, using the best-fit regression line. Therefore, approximately 56% of the variation ($1 - 0.44 = 0.56$) in the final-exam grades can NOT be explained by the variation in the grades on the third exam with the best-fit regression line.
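The arithmetic behind this interpretation is a one-line computation. Taking the correlation coefficient $r = 0.6631$ reported for the third exam/final exam example:

```python
r = 0.6631
r_sq = r ** 2
print(round(r_sq, 4))      # 0.4397 -> about 44% of variation explained
print(round(1 - r_sq, 4))  # 0.5603 -> about 56% of variation not explained
```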