The mean, also known as the average, is the sum of the values in a sample divided by the number of values in the sample.[1] For example, to figure out a grade at the end of a course, you calculate the mean of all of your test scores. If you scored 95%, 90%, 97%, and 92% on tests, your mean test score would be (95 + 90 + 97 + 92) / 4 = 93.5%.
Mathematically, the mean is the sum of the values in a set divided by the number of values in the set. For a given set of n values, one adds together all of the values in the set and then divides by n to find the mean.
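The calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method; the names `mean` and `test_scores` are chosen here for the example, and the scores are the four test results from the opening paragraph.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Test scores (in percent) from the grade example.
test_scores = [95, 90, 97, 92]
average_score = mean(test_scores)
print(average_score)  # 374 / 4 = 93.5
```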
The most common mistake is failing to distinguish between the mean, median, and mode. The mean is the topic of this article.[2]
The median is the middle value of a data set. If the elements of a data set are ordered from low to high, the middle value is the median. If the data set has an even number of elements, the median is the average of the two middle values. As an example, if a data set has the values (5, 5, 6, 7, 9, 11, 12), then the median is 7, the middle value. If the data set has the values (5, 6, 7, 9, 11, 12), then the median is the average of the two middle values (7 and 9), which is 8.
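The two cases (odd and even number of elements) can be sketched as follows. The function name `median` and the variable names are assumptions made for this illustration; the data sets are the two from the paragraph above.

```python
def median(values):
    """Return the middle value of the sorted data, or the average of the
    two middle values when the number of elements is even."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle element exists.
        return ordered[mid]
    # Even count: average the two elements around the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_set = [5, 5, 6, 7, 9, 11, 12]
even_set = [5, 6, 7, 9, 11, 12]
print(median(odd_set))   # 7
print(median(even_set))  # 8.0
```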
The mode is the most common element in the set, that is, the value that occurs most often. In the data set (4, 5, 6, 8, 8, 9, 23), the number 8 occurs twice, which is more often than any of the other numbers, so the mode of the data set is 8. A data set can have more than one mode or no mode at all. In the data set (2, 3, 3, 5, 5, 7, 7), there are three modes because 3, 5, and 7 each appear twice, more often than 2, which appears only once. This data set would be said to be trimodal. If there were two modes, the data set would be bimodal, and if there were more than three modes, the data set would be multimodal. In a different data set of (6, 7, 8, 9), no number appears more frequently than the others, so the data set has no mode.
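A short sketch of finding the mode or modes is given below. The function name `modes` is hypothetical, and the code follows the convention used in this article that a data set in which every distinct value occurs equally often has no mode (some libraries instead report every value as a mode in that case).

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent values, or an empty list
    when no value occurs more often than the others."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Convention assumed here: if several distinct values all tie, none stands out.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([4, 5, 6, 8, 8, 9, 23]))  # [8]
print(modes([2, 3, 3, 5, 5, 7, 7]))   # [3, 5, 7]
print(modes([6, 7, 8, 9]))            # []
```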
The second issue of concern is the sensitivity of the mean to outlier values. One very large or very small outlier (compared to the rest of the data set) can skew the mean of the data set higher or lower, respectively. An outlier is usually defined as a value that is:
< Q1 - (1.5)IQR
OR
> Q3 + (1.5)IQR
The first quartile (Q1) is the 25th percentile of the data set.
The third quartile (Q3) is the 75th percentile of the data set.
The interquartile range (IQR) is Q3 - Q1.[1]
For example, say there was a study of how many baskets in basketball your classmates could make in 2 minutes. Your data for the number of baskets is as follows:
The Q1 for this data is 5.5.
The Q3 for this data is 7.
The IQR is 1.5.
Outliers are <3.25 or >9.25.
One classmate played varsity basketball and was able to make 40 baskets in 2 minutes. With this value included, your mean is 10.25 baskets, which is higher than all but one of the data points you collected; the varsity basketball player's score of 40 skewed the mean. If you exclude outliers from this data, the value of 40 is removed, and your new mean is 6. The mean of 6 baskets is a better representation of the number of baskets a typical classmate can make in 2 minutes.
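The outlier limits for this example follow directly from the rule above and can be checked with a short sketch. The function names `outlier_fences` and `is_outlier` are hypothetical; the code uses only the quartiles reported in the example (Q1 = 5.5 and an upper quartile of 7, consistent with the stated IQR of 1.5) rather than the raw data.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """Return True when the value lies outside the limits."""
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper


lower_fence, upper_fence = outlier_fences(q1=5.5, q3=7)
print(lower_fence, upper_fence)      # 3.25 9.25
print(is_outlier(40, q1=5.5, q3=7))  # True
print(is_outlier(6, q1=5.5, q3=7))   # False
```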
As shown in the above example, the mean of a data set with significant outliers can be skewed by those outliers. A very large outlier will drag the mean higher, and a very small outlier will drag the mean lower. Thus the mean is said to be sensitive to outliers.
As another example, consider the average temperature for a given spring day. For the past five years, the average temperatures have been 42, 46, 10, 40, and 48 degrees Fahrenheit. The mean is 37.2 degrees Fahrenheit when the low value is included and 44 degrees Fahrenheit when it is excluded.
The median is typically less affected by outliers than the mean and thus may be a better estimator of the central value of the data set.
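The temperature example can be used to compare how far the mean and the median move when the low value is dropped. This is a sketch using Python's standard `statistics` module; the variable names are chosen for this illustration.

```python
import statistics

# Average temperatures (degrees Fahrenheit) from the five-year example.
temperatures_f = [42, 46, 10, 40, 48]
lowest = min(temperatures_f)
without_low = [t for t in temperatures_f if t != lowest]

mean_all = statistics.fmean(temperatures_f)        # 186 / 5
mean_without_low = statistics.fmean(without_low)   # 176 / 4
median_all = statistics.median(temperatures_f)     # middle of 10, 40, 42, 46, 48
median_without_low = statistics.median(without_low)

print(mean_all, mean_without_low)      # 37.2 44.0
print(median_all, median_without_low)  # 42 44.0
```

Removing the single low value shifts the mean by 6.8 degrees but the median by only 2 degrees.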
Studies often report the mean value of what they are researching. For example, Eid et al. studied how well surgical residents can interpret clinical images compared to medical students and other residents. They reported the mean scores of medical students, general surgery residents, internal medicine residents, and radiology trainees on a quiz given to evaluate how well the subjects could interpret radiological results.[3]
Some studies use the mean of a sample to estimate the mean of the population. Because the true mean of a population in most cases cannot be known, researchers report a range, bounded by a lower limit and an upper limit, that is likely to contain the true mean; this range is called a confidence interval. A 95% confidence interval means that if many different samples were selected from the population and an interval were calculated from each, the researchers would expect about 95% of those intervals to contain the true population mean.[4]
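The repeated-sampling idea behind a confidence interval can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the population (normal, mean 100, standard deviation 15), the sample size, the number of trials, and the use of the large-sample normal approximation (mean ± about 1.96 standard errors) to build each interval. Small samples are more commonly handled with the t distribution, which is not shown here.

```python
import random
import statistics


def confidence_interval_95(sample):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    n = len(sample)
    sample_mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / n ** 0.5
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    return sample_mean - z * standard_error, sample_mean + z * standard_error


def coverage(true_mean=100.0, true_sd=15.0, sample_size=50, trials=2000, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        lower, upper = confidence_interval_95(sample)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / trials


observed_coverage = coverage()
print(observed_coverage)  # expected to be close to 0.95
```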
In addition to a confidence interval, the mean can also be reported with a standard deviation to describe the spread of the values. The range of the mean +/- one standard deviation contains approximately 68% of the values in the sample if the data follow a normal (Gaussian) distribution.[1]
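The 68% figure can be checked empirically with simulated data. The sketch below draws values from a normal distribution with an arbitrarily chosen mean of 120 and standard deviation of 10 (both hypothetical) and counts how many fall within one sample standard deviation of the sample mean.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated, normally distributed measurements (hypothetical parameters).
values = [rng.gauss(120, 10) for _ in range(50_000)]

sample_mean = statistics.fmean(values)
sample_sd = statistics.stdev(values)

# Share of values lying within one standard deviation of the mean.
inside = sum(1 for v in values if sample_mean - sample_sd <= v <= sample_mean + sample_sd)
fraction_within_one_sd = inside / len(values)
print(fraction_within_one_sd)  # expected to be close to 0.68
```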
As mentioned previously, the mean can be affected by outliers. If there are a few significant outliers in the data set, then the median may be a better representation of the center of the data set than the mean.
Miscommunication between nurses and physicians can lead to poor patient outcomes.[5] It is crucial to know when to include and exclude outliers in averages and when to report them to the care team. For example, a nurse reported to a physician that a patient's average blood pressure for the past 24 hours had been 120/80. When calculating the average, the nurse excluded an outlier reading of 70/40 that was recorded when the patient went from sitting to standing. Because of this, the physician could miss that the patient has orthostatic hypotension, leading to increased morbidity and mortality.[6]
[1] Vetter TR. Descriptive Statistics: Reporting the Answers to the 5 Basic Questions of Who, What, Why, When, Where, and a Sixth, So What? Anesthesia and analgesia. 2017 Nov. [PubMed PMID: 28891910]
[2] Whitley E, Ball J. Statistics review 1: presenting and summarising data. Critical care (London, England). 2002 Feb. [PubMed PMID: 11940268]
[3] Eid JJ, Reiley MI, Miciura AL, Macedo FI, Negussie E, Mittal VK. Interpretation of Basic Clinical Images: How Are Surgical Residents Performing Compared to Other Trainees? Journal of surgical education. 2019 May 9. [PubMed PMID: 31080122]
[4] O'Brien SF, Yi QL. How do I interpret a confidence interval? Transfusion. 2016 Jul. [PubMed PMID: 27184382]
[5] Foronda C, MacWilliams B, McArthur E. Interprofessional communication in healthcare: An integrative review. Nurse education in practice. 2016 Jul. [PubMed PMID: 27428690]
[6] Joseph A, Wanono R, Flamant M, Vidal-Petiot E. Orthostatic hypotension: A review. Nephrologie [PubMed PMID: 28577744]