The trend line, or line of best fit, is a line drawn on a scatter diagram to represent a trend in the data. It shows whether a particular data set has increased or decreased over a period of time. A trend line can simply be drawn by eye through a set of data points, but more properly its position and slope are calculated using statistical techniques such as linear regression. Trend lines are typically straight lines, although some variations use higher-degree polynomials, depending on the degree of curvature desired in the line.
Trend lines are often used to argue that a particular action or event (such as training, or an advertising campaign) caused observed changes at a point in time. This is a simple technique, and does not require a control group, experimental design, or a sophisticated analysis technique. However, it suffers from a lack of scientific validity in cases where other potential changes can affect the data.
The mathematical process that determines the unique line of best fit is based on the method of least squares, which is why this line is sometimes called the least squares line. This method works by:
- finding the difference of each data $Y$ value from the line;
- squaring all the differences;
- summing all the squared differences;
- repeating this process for all positions of the line until the smallest sum of squared differences is reached.
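The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the line is written as Y = a + b*X, the function names and the data points are hypothetical, and instead of literally trying every position of the line it uses the standard closed-form least squares formulas, which give the same minimum directly.

```python
def sum_squared_differences(points, a, b):
    """Sum of squared vertical differences between each Y value and the line Y = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)


def least_squares_line(points):
    """Return (a, b) for the line Y = a + b*X with the smallest sum of squared differences."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Closed-form least squares estimates of the slope and the intercept.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data, made up purely for illustration.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]
a, b = least_squares_line(points)
best = sum_squared_differences(points, a, b)
print(f"Y = {a:.2f} + {b:.2f}X, sum of squared differences = {best:.2f}")
```

Moving the fitted line slightly in any direction, for example by calling `sum_squared_differences(points, a + 0.1, b)`, should give a larger sum than `best`.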
Drawing a Trend Line
The line of best fit is drawn by:
- having the same number of data points on each side of the line, i.e., the line is in the median position;
- NOT going from the first data point to the last data point, since extreme data points often deviate from the general trend and this gives a biased sense of direction.
The closeness (or otherwise) of the cloud of data points to the line suggests the concept of spread or dispersion.
The graph below shows what happens when we draw the line of best fit from the first data point to the last data point: it does not go through the median position, as there is one data point above and three data points below the blue line. This is a common mistake to avoid.
Figure (Trend Line Mistake): A line of best fit drawn from the first data point to the last data point.
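The median-position check can also be done numerically by counting the data points on each side of a candidate line. The sketch below assumes the line is written as Y = a + b*X. The function name and the data are hypothetical, chosen so that a line joining the first and last points leaves one point above and three below, as in the mistake described above.

```python
def count_sides(points, a, b):
    """Count the data points above, below, and exactly on the line Y = a + b*X."""
    above = sum(1 for x, y in points if y > a + b * x)
    below = sum(1 for x, y in points if y < a + b * x)
    on_line = len(points) - above - below
    return above, below, on_line


# Hypothetical data, made up purely for illustration.
points = [(0, 2), (1, 5), (2, 4), (3, 6), (4, 8), (5, 12)]

# The mistaken line: drawn straight from the first data point to the last.
(x_first, y_first), (x_last, y_last) = points[0], points[-1]
b = (y_last - y_first) / (x_last - x_first)
a = y_first - b * x_first

above, below, on_line = count_sides(points, a, b)
print(above, below, on_line)  # prints: 1 3 2
```

Unequal counts above and below indicate that the line is not in the median position.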
To determine the equation for the line of best fit:
- draw the scatterplot on a grid and draw the line of best fit;
- select two points on the line which are, as near as possible, on grid intersections so that you can accurately estimate their position;
- calculate the gradient ($B$) of the line using the formula: $\text{gradient}=\frac{\text{difference in vertical measures}}{\text{difference in horizontal measures}}$;
- write the partial equation;
- substitute one of the chosen points into the partial equation to evaluate the "$A$" term;
- write the full equation of the line.
Example
Consider the data in the graph below:
Figure (Example Graph): The graph used in our example for drawing a trend line.
To determine the equation for the line of best fit:
- a computer application has calculated and plotted the line of best fit for the data (shown as a black line), and it is in the median position, with 3 data points on one side and 3 data points on the other side;
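- the two points chosen on the line are $(50, 700)$ and $(110, 1100)$;
- calculate the gradient ($B$) of the line using the formula: $B = \frac{1100 - 700}{110 - 50} = \frac{400}{60} \approx 6.67$;
- the partial equation, assuming the form $Y = A + BX$: $Y = A + 6.67X$;
- substitute the point $(50, 700)$ into the equation, keeping the unrounded gradient: $700 = A + \frac{400}{60} \times 50$, so $A \approx 366.67$;
- write the full equation of the line: $Y = 366.67 + 6.67X$.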
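The arithmetic for the two chosen points, (50, 700) and (110, 1100), can be checked with a short Python sketch. It assumes the line has the form Y = A + BX, with B the gradient and A the term found by substitution. The function name is hypothetical.

```python
def line_from_two_points(p1, p2):
    """Return (a, b) for the line Y = a + b*X that passes through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite gradient")
    # Gradient: difference in vertical measures over difference in horizontal measures.
    b = (y2 - y1) / (x2 - x1)
    # Substitute one of the points into Y = a + b*X to evaluate a.
    a = y1 - b * x1
    return a, b


a, b = line_from_two_points((50, 700), (110, 1100))
equation = f"Y = {a:.2f} + {b:.2f}X"
print(equation)  # prints: Y = 366.67 + 6.67X
```

The unrounded gradient is used when solving for the intercept. Rounding the gradient to 6.67 before substituting would give 366.5 instead of 366.67.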