Trimmed estimator

In statistics, a trimmed estimator is an estimator derived from another estimator by excluding some of the extreme values, a process called truncation. This is generally done to obtain a more robust statistic, and the extreme values are considered outliers.[1] Trimmed estimators also often have higher efficiency for mixture distributions and heavy-tailed distributions than the corresponding untrimmed estimator, at the cost of lower efficiency for other distributions, such as the normal distribution.

Given an estimator, the x% trimmed version is obtained by discarding the x% lowest or highest observations or on both end: it is a statistic on the middle of the data. For instance, the 5% trimmed mean is obtained by taking the mean of the 5% to 95% range. In some cases a trimmed estimator discards a fixed number of points (such as maximum and minimum) instead of a percentage.

Examples

The median is the most trimmed statistic (nominally 50%), as it discards all but the most central data, and equals the fully trimmed mean – or indeed fully trimmed mid-range, or (for odd-size data sets) the fully trimmed maximum or minimum. Likewise, no degree of trimming has any effect on the median – a trimmed median is the median – because trimming always excludes an equal number of the lowest and highest values.

Quantiles can be thought of as trimmed maxima or minima: for instance, the 5th percentile is the 5% trimmed minimum.

Trimmed estimators used to estimate a location parameter include:

Trimmed mean
Modified mean, discarding the minimum and maximum values
Interquartile mean, the 25% trimmed mean
Midhinge, the 25% trimmed mid-range

Trimmed estimators used to estimate a scale parameter include:

Interquartile range, the 25% trimmed range
Interdecile range, the 10% trimmed range

Trimmed estimators involving only linear combinations of points are examples of L-estimators.

Applications

Estimation

Most often, trimmed estimators are used for parameter estimation of the same parameter as the untrimmed estimator. In some cases the estimator can be used directly, while in other cases it must be adjusted to yield an unbiased consistent estimator.

For example, when estimating a location parameter for a symmetric distribution, a trimmed estimator will be unbiased (assuming the original estimator was unbiased), as it removes the same amount above and below. However, if the distribution has skew, trimmed estimators will generally be biased and require adjustment. For example, in a skewed distribution, the nonparametric skew (and Pearson's skewness coefficients) measure the bias of the median as an estimator of the mean.

When estimating a scale parameter, using a trimmed estimator as a robust measures of scale, such as to estimate the population variance or population standard deviation, one generally must multiply by a scale factor to make it an unbiased consistent estimator; see scale parameter: estimation.

For example, dividing the IQR by $2{\sqrt {2}}\operatorname {erf} ^{-1}(1/2)\approx 1.349$ (using the error function) makes it an unbiased, consistent estimator for the population standard deviation if the data follow a normal distribution.

Other uses

Trimmed estimators can also be used as statistics in their own right – for example, the median is a measure of location, and the IQR is a measure of dispersion. In these cases, the sample statistics can act as estimators of their own expected value. For example, the MAD of a sample from a standard Cauchy distribution is an estimator of the population MAD, which in this case is 1, whereas the population variance does not exist.

References

Kaltenbach, Hans-Michael (2012). A concise guide to statistics. Heidelberg: Springer. ISBN 978-3-642-23502-3. OCLC 763157853.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[1] Kaltenbach, Hans-Michael (2012). A concise guide to statistics. Heidelberg: Springer. ISBN 978-3-642-23502-3. OCLC 763157853.