Chapter 11
Correlation and Regression
By Boundless
![Thumbnail](../../../../../figures.boundless-cdn.com/18115/square/02552975-9b29069a5f-z.jpeg)
Correlation refers to any of a broad class of statistical relationships involving dependence.
![Thumbnail](../../../../../figures.boundless-cdn.com/18117/square/-05-20-20at-206.24.34-20pm.jpeg)
A scatter diagram is a type of mathematical diagram using Cartesian coordinates to display values for two variables in a set of data.
![Thumbnail](../../../../../figures.boundless-cdn.com/18127/square/-05-21-20at-202.52.24-20pm.jpeg)
The correlation coefficient is a measure of the linear dependence between two variables.
The coefficient of determination provides a measure of how well observed outcomes are replicated by a model.
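As a minimal sketch of the two measures just described, the following computes the Pearson correlation coefficient by hand and squares it. The data values and the function name `pearson_r` are illustrative assumptions, and the identity "coefficient of determination equals r squared" is assumed to apply only to a straight-line fit with an intercept.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
# For simple linear regression with an intercept, R^2 equals r squared.
r_squared = r ** 2
print(round(r, 4), round(r_squared, 4))
```

Here r is about 0.77, so roughly 60% of the variation in `ys` is accounted for by a straight-line fit on `xs`.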
![Thumbnail](../../../../../figures.boundless-cdn.com/18131/square/lement-fieldelemformat-gif.gif)
The trend line (line of best fit) is a line that can be drawn on a scatter diagram representing a trend in the data.
Other types of correlation coefficients include intraclass correlation and the concordance correlation coefficient.
![Thumbnail](../../../../../figures.boundless-cdn.com/18144/raw/re-and-prediction-interval.jpg)
A prediction interval is an estimate of an interval in which future observations will fall with a certain probability given what has already been observed.
![Thumbnail](../../../../../figures.boundless-cdn.com/18133/raw/spearman-fig1.jpg)
A rank correlation is a statistic used to measure the relationship between rankings of ordinal variables or different rankings of the same variable.
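One common rank correlation is Spearman's coefficient, which can be sketched as the Pearson correlation of the ranks. The helper names `average_ranks` and `spearman_rho` and the sample data are assumptions made for this illustration; ties are given the average of the ranks they span.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# A monotone but non-linear relationship (made-up data).
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
print(rho)
```

Because the relationship is perfectly monotone, the ranks agree exactly and the coefficient is 1, even though the relationship is not linear.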
![Thumbnail](../../../../../figures.boundless-cdn.com/18148/raw/son-27s-paradox-continuous.jpg)
An ecological fallacy is an interpretation of statistical data where inferences about individuals are deduced from inferences about the group as a whole.
![Thumbnail](../../../../../figures.boundless-cdn.com/18147/square/greenhouse-effect.jpg)
The conventional dictum "correlation does not imply causation" means that correlation alone cannot be used to infer a causal relationship between variables.
![Thumbnail](../../../../../figures.boundless-cdn.com/31476/square/1jkmu7itrlsjnnwonio9.jpg)
Regression models are often used to predict a response variable
![Thumbnail](../../../../../figures.boundless-cdn.com/18183/raw/linear-regression.jpg)
A graph of averages and the least-squares regression line are both good ways to summarize the data in a scatterplot.
![Thumbnail](../../../../../figures.boundless-cdn.com/18184/raw/extrapolation-example.jpg)
The regression method utilizes the average from known data to make predictions about new data.
![Thumbnail](../../../../../figures.boundless-cdn.com/18175/square/francis-galton-1850s.jpeg)
The regression fallacy fails to account for natural fluctuations and rather ascribes cause where none exists.
In the regression line equation the constant
![Thumbnail](../../../../../figures.boundless-cdn.com/18388/square/cova-partitioning-variance.jpeg)
ANCOVA can be used to compare regression lines by testing the effect of a categorical variable on a dependent variable while controlling for a continuous covariate.
![Thumbnail](../../../../../figures.boundless-cdn.com/18246/square/linrgs-regeq1.jpg)
The criterion for determining the least squares regression line is that the sum of the squared errors is made as small as possible.
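The least squares criterion has a closed-form solution for a single predictor, sketched below. The function name `least_squares_line` and the data are assumptions for illustration; the formulas are the standard ones (slope equals the sum of cross-products divided by the sum of squared deviations in x, and the line passes through the point of means).

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared vertical errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = least_squares_line(xs, ys)
print(round(intercept, 4), round(slope, 4))
```

For this data the fitted line is approximately y = 2.2 + 0.6x.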
![Thumbnail](../../../../../figures.boundless-cdn.com/18257/raw/linear-regression.jpg)
Standard linear regression models with standard estimation techniques make a number of assumptions.
The slope of the best fit line tells us how the dependent variable
![Thumbnail](../../../../../figures.boundless-cdn.com/18248/square/220px-francis-galton-1850s.jpeg)
Regression toward the mean says that if a variable is extreme on its first measurement, it will tend to be closer to the average on its second.
RMS error measures the differences between values predicted by a model or an estimator and the values actually observed.
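A minimal sketch of the RMS error calculation follows: square each difference between observed and predicted values, average the squares, and take the square root. The function name `rms_error` and the numbers are assumptions for illustration.

```python
import math

def rms_error(observed, predicted):
    """Root-mean-square of the differences between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up observations and model predictions.
observed = [3, 5, 7]
predicted = [2, 5, 9]

error = rms_error(observed, predicted)
print(round(error, 4))
```

The squared differences are 1, 0, and 4, so the RMS error is the square root of 5/3, about 1.29, in the same units as the observations.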
![Thumbnail](../../../../../figures.boundless-cdn.com/18187/square/high-residual.jpeg)
The residual plot illustrates how far away each of the values on the graph is from the expected value (the value on the line).
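The quantities shown in a residual plot can be computed directly, as in this sketch. The line y = 2.2 + 0.6x is assumed here to be a least squares fit to the made-up data; the function name `residuals` is likewise an assumption.

```python
def residuals(xs, ys, intercept, slope):
    """Observed value minus the value on the line, for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up data and an assumed fitted line y = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

res = residuals(xs, ys, intercept=2.2, slope=0.6)
print([round(e, 2) for e in res])
```

Plotting `res` against `xs` gives the residual plot. When the line is a least squares fit with an intercept, the residuals sum to zero up to rounding, so a visible pattern in the plot suggests the straight-line model is missing something.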
![Thumbnail](../../../../../figures.boundless-cdn.com/18189/square/5-regressionassumptions-1c.jpg)
By drawing vertical strips on a scatter plot and analyzing the spread of the resulting new data sets, we are able to judge the degree of homoscedasticity.
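The vertical-strip idea can be sketched numerically: cut the x-range into equal-width strips and compare the standard deviation of y within each. The function name `strip_spreads`, the equal-width choice, and the data are all assumptions for illustration.

```python
import statistics

def strip_spreads(xs, ys, n_strips):
    """Standard deviation of y within each equal-width vertical strip of x.

    Returns None for a strip that contains no points.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_strips
    strips = [[] for _ in range(n_strips)]
    for x, y in zip(xs, ys):
        # The largest x would land just past the last strip, so clamp it.
        index = min(int((x - lo) / width), n_strips - 1)
        strips[index].append(y)
    return [statistics.pstdev(s) if s else None for s in strips]

# Made-up data whose vertical spread grows with x.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 3, 2, 6, 2, 10]

spreads = strip_spreads(xs, ys, n_strips=2)
print([round(s, 3) for s in spreads])
```

Roughly equal spreads across strips are consistent with homoscedasticity. Here the right-hand strip has about four times the spread of the left-hand one, which points to heteroscedasticity.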
![Thumbnail](../../../../../figures.boundless-cdn.com/18191/square/beetle.jpg)
Multiple regression is used to find an equation that best predicts the
![Thumbnail](../../../../../figures.boundless-cdn.com/18222/raw/linear-regression.jpg)
The purpose of a multiple regression is to find an equation that best predicts the
![Thumbnail](../../../../../figures.boundless-cdn.com/18195/raw/linear-regression.jpg)
The results of multiple regression should be viewed with caution.
![Thumbnail](../../../../../figures.boundless-cdn.com/18192/raw/linear-regression.jpg)
Standard multiple regression involves several independent variables predicting the dependent variable.
![Thumbnail](../../../../../figures.boundless-cdn.com/18229/square/interaction1.jpg)
In regression analysis, an interaction may arise when considering the relationship among three or more variables.
![Thumbnail](../../../../../figures.boundless-cdn.com/18197/raw/polyreg-scheffe.jpg)
The goal of polynomial regression is to model a non-linear relationship between the independent and dependent variables.
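Polynomial regression is still a least squares problem, with powers of x as the predictors. The sketch below fits a quadratic by solving the normal equations with a small Gaussian elimination routine. The function names and data are assumptions for illustration, and a numerical library would normally be preferred over a hand-written solver.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree):
    """Least squares coefficients [b0, b1, ..., bd] for y = b0 + b1*x + ... + bd*x^d."""
    powers = range(degree + 1)
    # Normal equations: (X^T X) beta = X^T y, with columns 1, x, x^2, ...
    xtx = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in powers]
    return solve_linear_system(xtx, xty)

# Made-up data generated exactly from y = 1 + 2x + 3x^2.
xs = [-2, -1, 0, 1, 2]
ys = [1 + 2 * x + 3 * x * x for x in xs]

coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])
```

Because the data lie exactly on a quadratic, the fit recovers the coefficients 1, 2, and 3 up to rounding. The model is non-linear in x but linear in the coefficients, which is why ordinary least squares still applies.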
![Thumbnail](../../../../../figures.boundless-cdn.com/18231/square/anova-graph.jpeg)
Dummy, or qualitative, variables often act as independent variables in regression and affect the values of the dependent variable.
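A qualitative variable is usually entered into a regression as a set of 0/1 indicator columns, one per category except a reference category. The sketch below shows that coding; the function name `dummy_code` and the category labels are hypothetical.

```python
def dummy_code(values, reference):
    """Return (levels, rows) where each row holds 0/1 indicators for the
    non-reference levels, in sorted order."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical treatment groups, with "control" as the reference category.
groups = ["control", "drug_a", "drug_b", "control"]

levels, rows = dummy_code(groups, reference="control")
print(levels)
print(rows)
```

The reference category is coded as all zeros, so each regression coefficient on an indicator column is read as the difference between that category and the reference.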
![Thumbnail](../../../../../figures.boundless-cdn.com/18232/square/ancova-graph.jpeg)
A regression model that contains a mixture of quantitative and qualitative variables is called an Analysis of Covariance (ANCOVA) model.
![Thumbnail](../../../../../figures.boundless-cdn.com/18221/raw/nestedsetmodel.jpg)
Multilevel (nested) models are appropriate for research designs where data for participants are organized at more than one level.
![Thumbnail](../../../../../figures.boundless-cdn.com/18365/square/stepwise.jpeg)
Stepwise regression is a method of regression modeling in which the choice of predictive variables is carried out by an automatic procedure.
![Thumbnail](../../../../../figures.boundless-cdn.com/18228/raw/linear-regression.jpg)
There are a number of assumptions that must be made when using multiple regression models.
![Thumbnail](../../../../../figures.boundless-cdn.com/18378/square/explore-correlations.jpg)
Some problems with multiple regression include multicollinearity, variable selection, and improper extrapolation assumptions.