Range (statistics)

In statistics, the range of a set of data is the difference between the largest and smallest values,[1] that is, the sample maximum minus the sample minimum. It is expressed in the same units as the data.

In descriptive statistics, the range is the size of the smallest interval that contains all the data, and it provides an indication of statistical dispersion. Because it depends on only two of the observations, it is most useful for representing the dispersion of small data sets.[2]
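As a minimal sketch of the definition, the following Python snippet computes the range of a small data set. The function name `data_range` and the sample values are illustrative assumptions, not part of any cited source.

```python
def data_range(values):
    """Return the sample maximum minus the sample minimum."""
    values = list(values)
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


sample = [4, 9, 2, 7, 5]
print(data_range(sample))         # 7

# A single extreme observation changes the range completely,
# because only the two extreme values enter the calculation.
print(data_range(sample + [40]))  # 38
```

The second call illustrates why the range depends on only two observations: one added outlier moves it from 7 to 38.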

For continuous IID random variables

Let X₁, X₂, ..., X_n be n independent and identically distributed continuous random variables with cumulative distribution function G(x) and probability density function g(x), and let T denote their range, that is, T = max(X₁, X₂, ..., X_n) − min(X₁, X₂, ..., X_n).

Distribution

The range, T, has the cumulative distribution function[3][4]

F(t)=n\int _{-\infty }^{\infty }g(x)[G(x+t)-G(x)]^{n-1}\,{\text{d}}x.

Gumbel notes that the "beauty of this formula is completely marred by the facts that, in general, we cannot express G(x + t) by G(x), and that the numerical integration is lengthy and tiresome."[3] (p. 385)
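On a modern computer the numerical integration is inexpensive. The following is a minimal sketch, assuming a normal parent distribution and a composite Simpson rule on an interval truncated to 10 standard deviations either side of the mean. The helper names `simpson` and `range_cdf` and the truncation width are assumptions made for this illustration.

```python
from statistics import NormalDist


def simpson(func, lo, hi, steps=4000):
    """Composite Simpson rule on [lo, hi]; an odd step count is rounded up."""
    if steps % 2:
        steps += 1
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


def range_cdf(t, n, dist=NormalDist()):
    """Approximate F(t) = n * integral of g(x) [G(x+t) - G(x)]^(n-1) dx."""
    if t < 0:
        return 0.0
    # Assumption: the tails beyond 10 standard deviations are negligible.
    lo = dist.mean - 10 * dist.stdev
    hi = dist.mean + 10 * dist.stdev

    def integrand(x):
        return dist.pdf(x) * (dist.cdf(x + t) - dist.cdf(x)) ** (n - 1)

    return n * simpson(integrand, lo, hi)


print(range_cdf(1.0, 2))
print(range_cdf(3.0, 5))
```

For n = 2 and a standard normal parent, the range is |X₁ − X₂| with X₁ − X₂ normally distributed with variance 2, so the result can be checked against 2Φ(t/√2) − 1.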

If the distribution of each X_i is limited to the right (or left) then the asymptotic distribution of the range is equal to the asymptotic distribution of the largest (smallest) value. For more general distributions the asymptotic distribution can be expressed as a Bessel function.[3]

Moments

The mean range is given by[5]

n\int _{0}^{1}x(G)[G^{n-1}-(1-G)^{n-1}]\,{\text{d}}G

where x(G) is the inverse of the cumulative distribution function G(x). In the case where each of the X_i has a standard normal distribution, the mean range is given by[6]

\int _{-\infty }^{\infty }(1-(1-\Phi (x))^{n}-\Phi (x)^{n})\,{\text{d}}x.
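The standard normal integral can be evaluated numerically. The sketch below assumes a composite Simpson rule on [−10, 10], outside which the integrand is negligible; the function names are illustrative.

```python
from math import pi, sqrt
from statistics import NormalDist


def simpson(func, lo, hi, steps=4000):
    """Composite Simpson rule on [lo, hi]; an odd step count is rounded up."""
    if steps % 2:
        steps += 1
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


def normal_mean_range(n):
    """Approximate the integral of 1 - (1 - Phi(x))^n - Phi(x)^n over the real line."""
    phi = NormalDist().cdf

    def integrand(x):
        p = phi(x)
        return 1.0 - (1.0 - p) ** n - p ** n

    # Assumption: truncating to [-10, 10] loses a negligible amount.
    return simpson(integrand, -10.0, 10.0)


for n in (2, 5, 10):
    print(n, normal_mean_range(n))

# Sanity check for n = 2: E|X1 - X2| = 2 / sqrt(pi) for standard normals.
print(2 / sqrt(pi))
```

The n = 2 case has the closed form 2/√π, which gives a quick check on the quadrature.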

For continuous non-IID random variables

For n independent but not identically distributed continuous random variables X₁, X₂, ..., X_n with cumulative distribution functions G₁(x), G₂(x), ..., G_n(x) and probability density functions g₁(x), g₂(x), ..., g_n(x), the range has cumulative distribution function[4]

F(t)=\sum _{i=1}^{n}\int _{-\infty }^{\infty }g_{i}(x)\prod _{j=1,j\neq i}^{n}[G_{j}(x+t)-G_{j}(x)]\,{\text{d}}x.
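The sum-of-integrals formula can be sketched in the same numerical style. The example assumes independent normal variables with different means and standard deviations; `range_cdf_non_iid` and the integration limits are assumptions made for illustration.

```python
from math import prod
from statistics import NormalDist


def simpson(func, lo, hi, steps=4000):
    """Composite Simpson rule on [lo, hi]; an odd step count is rounded up."""
    if steps % 2:
        steps += 1
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


def range_cdf_non_iid(t, dists):
    """Approximate F(t) = sum_i integral of g_i(x) prod_{j != i} [G_j(x+t) - G_j(x)] dx."""
    if t < 0:
        return 0.0
    # Assumption: 10 standard deviations around every mean covers the mass.
    lo = min(d.mean - 10 * d.stdev for d in dists)
    hi = max(d.mean + 10 * d.stdev for d in dists)

    def integrand(x):
        window = [d.cdf(x + t) - d.cdf(x) for d in dists]
        return sum(
            d.pdf(x) * prod(w for j, w in enumerate(window) if j != i)
            for i, d in enumerate(dists)
        )

    return simpson(integrand, lo, hi)


dists = [NormalDist(0, 1), NormalDist(1, 2)]
print(range_cdf_non_iid(1.5, dists))

# For n = 2 the range is |X1 - X2|, and X1 - X2 is normal with
# mean -1 and standard deviation sqrt(1 + 4), which gives an exact check.
diff = NormalDist(-1, 5 ** 0.5)
print(diff.cdf(1.5) - diff.cdf(-1.5))
```

When all the distributions are identical, each of the n terms in the sum is equal, and the expression reduces to the IID formula.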

For discrete IID random variables

For n independent and identically distributed discrete random variables X₁, X₂, ..., X_n with cumulative distribution function G(x) and probability mass function g(x) the range of the X_i is the range of a sample of size n from a population with distribution function G(x). We can assume without loss of generality that the support of each X_i is {1,2,3,...,N} where N is a positive integer or infinity.[7][8]

Distribution

The range has probability mass function[7][9][10]

f(t)={\begin{cases}\sum _{x=1}^{N}[g(x)]^{n}&t=0\\[6pt]\sum _{x=1}^{N-t}\left({\begin{alignedat}{2}&[G(x+t)-G(x-1)]^{n}\\{}-{}&[G(x+t)-G(x)]^{n}\\{}-{}&[G(x+t-1)-G(x-1)]^{n}\\{}+{}&[G(x+t-1)-G(x)]^{n}\\\end{alignedat}}\right)&t=1,2,3\ldots ,N-1.\end{cases}}
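The probability mass function can be evaluated exactly for a finite support. The sketch below assumes the support {1, ..., N} with N finite and uses exact rational arithmetic; the function name and the example distribution are illustrative.

```python
from fractions import Fraction


def discrete_range_pmf(pmf, n):
    """Return [f(0), ..., f(N-1)] for a sample of size n.

    pmf[x - 1] holds g(x) for x = 1, ..., N.
    """
    N = len(pmf)
    # G[x] = G(x) for x = 0, ..., N, with G(0) = 0.
    G = [Fraction(0)]
    for p in pmf:
        G.append(G[-1] + Fraction(p))

    f = [sum(Fraction(p) ** n for p in pmf)]  # case t = 0
    for t in range(1, N):
        total = Fraction(0)
        for x in range(1, N - t + 1):
            total += (
                (G[x + t] - G[x - 1]) ** n
                - (G[x + t] - G[x]) ** n
                - (G[x + t - 1] - G[x - 1]) ** n
                + (G[x + t - 1] - G[x]) ** n
            )
        f.append(total)
    return f


g = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
f = discrete_range_pmf(g, 2)
print(f)       # [Fraction(7, 18), Fraction(4, 9), Fraction(1, 6)]
print(sum(f))  # 1
```

The four bracketed terms are an inclusion-exclusion count: all samples lying in [x, x + t], minus those that miss x, minus those that miss x + t, plus those that miss both.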

Example

If we suppose that g(x) = 1/N, the discrete uniform distribution for all x, then we find[9][11]

f(t)={\begin{cases}{\frac {1}{N^{n-1}}}&t=0\\[4pt]\sum _{x=1}^{N-t}\left(\left[{\frac {t+1}{N}}\right]^{n}-2\left[{\frac {t}{N}}\right]^{n}+\left[{\frac {t-1}{N}}\right]^{n}\right)&t=1,2,3\ldots ,N-1.\end{cases}}
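Because the summand in the uniform case does not depend on x, the sum over x is simply N − t copies of the same term. The following sketch, with illustrative function names, evaluates the closed form and compares it with a brute-force enumeration of all N^n equally likely samples.

```python
from fractions import Fraction
from itertools import product


def uniform_range_pmf(N, n):
    """Closed-form range pmf for n draws from the discrete uniform on {1, ..., N}."""
    f = [Fraction(1, N ** (n - 1))]  # case t = 0
    for t in range(1, N):
        term = (
            Fraction(t + 1, N) ** n
            - 2 * Fraction(t, N) ** n
            + Fraction(t - 1, N) ** n
        )
        f.append((N - t) * term)  # the sum over x has N - t equal terms
    return f


def brute_force_range_pmf(N, n):
    """Enumerate every sample and tabulate max - min."""
    counts = [0] * N
    for sample in product(range(1, N + 1), repeat=n):
        counts[max(sample) - min(sample)] += 1
    return [Fraction(c, N ** n) for c in counts]


# Two fair six-sided dice: N = 6, n = 2.
print(uniform_range_pmf(6, 2))
print(uniform_range_pmf(6, 2) == brute_force_range_pmf(6, 2))  # True
```

For two dice the probability of a range of 1 is 10/36, matching the ten ordered pairs of adjacent faces.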

Derivation

For continuous IID random variables, the probability density of a specific range value, t, can be determined by adding, over all possible positions x of the minimum, the probabilities of having two samples differing by t and every other sample having a value between the two extremes. The probability density of one of the n samples having a value of x is ng(x). The probability density of one of the remaining n − 1 samples having a value t greater than x is:

(n-1)g(x+t).

The probability of all other values lying between these two extremes is:

\left(\int _{x}^{x+t}g(u)\,{\text{d}}u\right)^{n-2}=\left(G(x+t)-G(x)\right)^{n-2}.

Combining the three together yields:

f(t)=n(n-1)\int _{-\infty }^{\infty }g(x)g(x+t)[G(x+t)-G(x)]^{n-2}\,{\text{d}}x.
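
As a worked check of this density under an assumed example (not taken from the cited sources), let each X_i be uniform on [0, 1] with n ≥ 2, so that g(x) = 1 and G(x) = x on [0, 1]. The integrand is nonzero only for 0 ≤ x ≤ 1 − t, where G(x + t) − G(x) = t, so for 0 ≤ t ≤ 1

f(t)=n(n-1)\int _{0}^{1-t}t^{n-2}\,{\text{d}}x=n(n-1)t^{n-2}(1-t),

which is the density of a Beta(n − 1, 2) distribution. It integrates to 1 over [0, 1], since

n(n-1)\int _{0}^{1}\left(t^{n-2}-t^{n-1}\right)\,{\text{d}}t=n(n-1)\left({\frac {1}{n-1}}-{\frac {1}{n}}\right)=1.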

Related quantities

The range is a specific example of order statistics. In particular, the range is a linear function of order statistics, which brings it into the scope of L-estimation.

References

[1] George Woodbury (2001). An Introduction to Statistics. Cengage Learning. p. 74. ISBN 0534377556.
[2] Carin Viljoen (2000). Elementary Statistics: Vol 2. Pearson South Africa. pp. 7–27. ISBN 186891075X.
[3] E. J. Gumbel (1947). "The Distribution of the Range". The Annals of Mathematical Statistics. 18 (3): 384–412. doi:10.1214/aoms/1177730387. JSTOR 2235736.
[4] Tsimashenka, I.; Knottenbelt, W.; Harrison, P. (2012). "Controlling Variability in Split-Merge Systems". Analytical and Stochastic Modeling Techniques and Applications (PDF). Lecture Notes in Computer Science. Vol. 7314. p. 165. doi:10.1007/978-3-642-30782-9_12. ISBN 978-3-642-30781-2.
[5] H. O. Hartley; H. A. David (1954). "Universal Bounds for Mean Range and Extreme Observation". The Annals of Mathematical Statistics. 25 (1): 85–99. doi:10.1214/aoms/1177728848. JSTOR 2236514.
[6] L. H. C. Tippett (1925). "On the Extreme Individuals and the Range of Samples Taken from a Normal Population". Biometrika. 17 (3/4): 364–387. doi:10.1093/biomet/17.3-4.364. JSTOR 2332087.
[7] Evans, D. L.; Leemis, L. M.; Drew, J. H. (2006). "The Distribution of Order Statistics for Discrete Random Variables with Applications to Bootstrapping". INFORMS Journal on Computing. 18: 19. doi:10.1287/ijoc.1040.0105.
[8] Irving W. Burr (1955). "Calculation of Exact Sampling Distribution of Ranges from a Discrete Population". The Annals of Mathematical Statistics. 26 (3): 530–532. doi:10.1214/aoms/1177728500. JSTOR 2236482.
[9] Abdel-Aty, S. H. (1954). "Ordered variables in discontinuous distributions". Statistica Neerlandica. 8 (2): 61–82. doi:10.1111/j.1467-9574.1954.tb00442.x.
[10] Siotani, M. (1956). "Order statistics for discrete case with a numerical application to the binomial distribution". Annals of the Institute of Statistical Mathematics. 8: 95–96. doi:10.1007/BF02863574.
[11] Paul R. Rider (1951). "The Distribution of the Range in Samples from a Discrete Rectangular Population". Journal of the American Statistical Association. 46 (255): 375–378. doi:10.1080/01621459.1951.10500796. JSTOR 2280515.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

