normal probability plot
a graphical technique used to assess whether or not a data set is approximately normally distributed
Examples of normal probability plots in the following topics:
-
Constructing a normal probability plot (special topic)
- We construct a normal probability plot for the heights of a sample of 100 men as follows:
- If the observations are normally distributed, then their Z scores will approximately correspond to their percentiles and thus to the z_i in Table 3.16.
- Because of the complexity of these calculations, normal probability plots are generally created using statistical software.
- Construction details for a normal probability plot of 100 men's heights.
- To create the plot based on this table, plot each pair (z_i, x_i) as a point.
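The construction outlined in these excerpts can be sketched in Python. This is a minimal sketch under stated assumptions, not the source's own procedure: the function name `normal_plot_points`, the made-up heights, and the percentile convention i/(n+1) are all assumptions, and other plotting positions are in common use.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Return the (z_i, x_i) pairs for a normal probability plot.

    x_i is the i-th smallest observation; z_i is the standard normal
    quantile at the assumed percentile i / (n + 1).
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(xs, start=1):
        p = i / (n + 1)              # assumed plotting position
        z = std_normal.inv_cdf(p)    # theoretical Z score for that percentile
        points.append((z, x))
    return points

# Hypothetical heights in inches, for illustration only.
heights = [70, 64, 73, 68, 71, 66, 69]
for z, x in normal_plot_points(heights):
    print(f"{z:6.2f}  {x}")
```

Plotting the second column against the first gives the normal probability plot; points that fall near a straight line suggest the data are approximately normal.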
-
Normal probability plot
- We outline the construction of the normal probability plot in Section 3.2.2
- The histogram shows more normality and the normal probability plot shows a better fit.
- We first create a histogram and normal probability plot of the NBA player heights.
- A histogram and normal probability plot of these data are shown in Figure 3.13.
- A histogram of poker data with the best fitting normal plot and a normal probability plot.
-
Conclusion
- The probability histogram in the above example is one that is normal.
- There is another method, however, that can help: a normal probability plot.
- A normal probability plot is a graphical technique for normality testing--assessing whether or not a data set is approximately normally distributed.
- The data are plotted against a theoretical normal distribution in such a way that the points form an approximate straight line.
- Explain how a probability histogram is used to assess the normality of data
-
Probability Histograms and the Normal Curve
- How can we tell if data in a probability histogram are normal, or at least approximately normal?
- There is another method, however, that can help: a normal probability plot.
- A normal probability plot is a graphical technique for normality testing--assessing whether or not a data set is approximately normally distributed.
- This is a sample of size 50 from a right-skewed distribution, plotted as a normal probability plot.
- This is a sample of size 50 from a normal distribution, plotted as a normal probability plot.
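One way to put a number on how straight a normal probability plot looks is the correlation between the ordered data and the theoretical normal quantiles. The sketch below is illustrative only: the function name `plot_correlation`, the i/(n+1) plotting positions, the seed, and the use of an exponential distribution as the right-skewed example are assumptions, not details taken from the figures described above.

```python
import random
from statistics import NormalDist

def plot_correlation(data):
    """Pearson correlation between sorted data and normal quantiles.

    Values close to 1 mean the normal probability plot is nearly a
    straight line. Assumes the data are not all identical.
    """
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    zs = [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    szz = sum((z - mean_z) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0, 1) for _ in range(50)]
skewed_sample = [rng.expovariate(1.0) for _ in range(50)]  # right-skewed

print("normal sample:", round(plot_correlation(normal_sample), 3))
print("skewed sample:", round(plot_correlation(skewed_sample), 3))
```

A right-skewed sample typically bends away from the line in the upper tail, which pulls this correlation down relative to a normal sample of the same size.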
-
Graphical diagnostics for an ANOVA analysis
- As with one- and two-sample testing for means, the normality assumption is especially important when the sample size is quite small.
- The normal probability plots for each group of the MLB data are shown in Figure 5.31; there is some deviation from normality for infielders, but this isn't a substantial concern since there are about 150 observations in that group and the outliers are not extreme.
- Then to check the normality condition, create a normal probability plot using all the residuals simultaneously.
- This assumption can be checked by examining a side-by-side box plot of the outcomes across the groups, as in Figure 5.28 on page 239.
- The normality condition is very important when the sample sizes for each group are relatively small.
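Checking normality with all the residuals at once requires pooling them across groups first. A minimal sketch, assuming each residual is an observation minus its own group mean; the function name `pooled_residuals`, the group labels, and the numbers are hypothetical and are not the MLB data.

```python
from statistics import mean

def pooled_residuals(groups):
    """Return every observation minus its group mean, pooled into one list.

    `groups` maps a group label to a list of observations.
    """
    residuals = []
    for values in groups.values():
        group_mean = mean(values)
        residuals.extend(v - group_mean for v in values)
    return residuals

# Made-up outcomes for three groups, for illustration only.
groups = {
    "A": [0.31, 0.28, 0.35, 0.30],
    "B": [0.33, 0.29, 0.27],
    "C": [0.36, 0.32, 0.34, 0.31, 0.30],
}
residuals = pooled_residuals(groups)
print(len(residuals), "residuals; sum is approximately", round(sum(residuals), 10))
```

The pooled list can then be drawn as a single normal probability plot, so the normality check uses every observation rather than one small group at a time.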
-
Introduction to evaluating the normal approximation
- Many processes can be well approximated by the normal distribution.
- While using a normal model can be extremely convenient and helpful, it is important to remember normality is always an approximation.
- Testing the appropriateness of the normal assumption is a key step in many data analyses.
- The observations are rounded to the nearest whole inch, explaining why the points appear to jump in increments in the normal probability plot.
-
A sampling distribution for the mean
- Now we'll take 100,000 samples, calculate the mean of each, and plot them in a histogram to get an especially accurate depiction of the sampling distribution.
- The distribution of sample means closely resembles the normal distribution (see Section 3.1).
- A normal probability plot of these sample means is shown in the right panel of Figure 4.9.
- Under the normal model, we can make this more accurate by using 1.96 in place of 2.
- The right panel shows a normal probability plot of those sample means.
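The simulation idea and the 1.96 multiplier can both be sketched in a few lines. This is a rough illustration under assumptions: the population here is a made-up right-skewed one, the function name `simulate_sample_means` is hypothetical, and the demo draws 10,000 samples rather than 100,000 to keep the run short.

```python
import random
from statistics import NormalDist

def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples samples (with replacement) and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = rng.choices(population, k=sample_size)
        means.append(sum(sample) / sample_size)
    return means

# The multiplier that captures the middle 95% of a normal distribution.
z_95 = NormalDist().inv_cdf(0.975)
print("multiplier:", round(z_95, 2))

# Hypothetical skewed population, for illustration only.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]
means = simulate_sample_means(population, sample_size=50, n_samples=10_000)

# Share of simulated sample means within z_95 standard deviations of their center.
center = sum(means) / len(means)
spread = (sum((m - center) ** 2 for m in means) / (len(means) - 1)) ** 0.5
inside = sum(abs(m - center) <= z_95 * spread for m in means) / len(means)
print("share within the interval:", round(inside, 3))
```

If the sample means are close to normal, the printed share should be near 95%, which is the reasoning behind using 1.96 in place of 2.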
-
Checking model assumptions using graphs
- A normal probability plot of the residuals is shown in Figure 8.9.
- In a normal probability plot for residuals, we tend to be most worried about residuals that appear to be outliers, since these indicate long tails in the distribution of residuals.
- These plots are shown in Figure 8.12.
- There appears to be curvature in the residuals, indicating the relationship is probably not linear.
- A normal probability plot of the residuals is helpful in identifying observations that might be outliers.
-
Homogeneity and Heterogeneity
- Imagine that you have a scatter plot, on top of which you draw a narrow vertical strip.
- To the extent that the histogram matches the normal distribution, the residuals are normally distributed.
- When various vertical strips drawn on a scatter plot, and their corresponding data sets, show a similar pattern of spread, the plot can be said to be homoscedastic.
- Consequently, each probability distribution for y (response variable) has the same standard deviation regardless of the x-value (predictor).
- To the extent that a residual histogram matches the normal distribution, the residuals are normally distributed.
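The vertical-strip idea can be made concrete by cutting the x-axis into strips and comparing the spread of y within each one. A minimal sketch, assuming equal-width strips and x-values that are not all identical; the function name `strip_spreads` and the simulated data are hypothetical.

```python
import random
from statistics import stdev

def strip_spreads(points, n_strips):
    """Split (x, y) points into equal-width vertical strips.

    Returns the standard deviation of y within each strip, or None for a
    strip holding fewer than two points.
    """
    xs = [x for x, _ in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_strips
    strips = [[] for _ in range(n_strips)]
    for x, y in points:
        # The largest x would land one past the end, so clamp it.
        idx = min(int((x - lo) / width), n_strips - 1)
        strips[idx].append(y)
    return [stdev(ys) if len(ys) >= 2 else None for ys in strips]

# Simulated data with the same noise level at every x (homoscedastic by design).
rng = random.Random(3)
points = []
for _ in range(400):
    x = rng.uniform(0, 10)
    points.append((x, 2 * x + rng.gauss(0, 1)))

for i, s in enumerate(strip_spreads(points, 5)):
    print("strip", i, "spread of y:", None if s is None else round(s, 2))
```

Each strip also contains some spread from the trend in y across the strip's width, so narrower strips give a cleaner comparison. Roughly equal values across strips are consistent with homoscedasticity; a steady rise or fall suggests heteroscedasticity.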
-
Quantile-Quantile (q-q) Plots
- That is, the probability a normal sample is less than ξ_q is in fact just q.
- As before, a normal q-q plot can indicate departures from normality.
- The q-q plots may be thought of as being "probability graph paper" that makes a plot of the ordered data values into a straight line.
- Every density has its own special probability graph paper.
- Figure 12. q-q plots for standardized non-normal data (n = 1000)
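The quantile definition and the "graph paper" idea can be illustrated together: a q-q plot pairs ordered data with the quantiles of whichever reference distribution is chosen. This is a sketch under assumptions; the names `qq_points` and `exponential_inv_cdf`, the i/(n+1) probabilities, and the example data are not from the source.

```python
import math
from statistics import NormalDist

def qq_points(data, inv_cdf):
    """Pair each ordered observation with a reference quantile.

    `inv_cdf` is the inverse CDF (quantile function) of the reference
    distribution; probabilities i / (n + 1) are an assumed convention.
    """
    xs = sorted(data)
    n = len(xs)
    return [(inv_cdf(i / (n + 1)), x) for i, x in enumerate(xs, start=1)]

def exponential_inv_cdf(q, rate=1.0):
    """Quantile function of an exponential distribution."""
    return -math.log(1.0 - q) / rate

# The quantile definition: P(X < xi_q) = q for a standard normal X.
std_normal = NormalDist()
q = 0.9
xi_q = std_normal.inv_cdf(q)
print("P(X < xi_q) =", round(std_normal.cdf(xi_q), 6))

# The same made-up data on two kinds of "graph paper".
data = [0.2, 1.7, 0.5, 3.1, 0.9, 0.1, 1.2]
normal_paper = qq_points(data, std_normal.inv_cdf)
exponential_paper = qq_points(data, exponential_inv_cdf)
print(normal_paper[0], exponential_paper[0])
```

Swapping the quantile function changes the paper: data drawn from the reference distribution should fall close to a straight line on its own paper and curve on the others.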