Continuous NHANES Web Tutorial: Specifying Weighting Parameters: Weighting in NHANES

Key Concepts About Weighting in NHANES

Weights are created in NHANES to account for the complex survey design (including oversampling), survey nonresponse, and post-stratification. When a sample is weighted in NHANES, it is representative of the U.S. civilian noninstitutionalized Census population.

How weights are created in the Continuous NHANES

Each sample person in the NHANES dataset is assigned a sample weight. This sample weight is created in three steps:

(1) Calculating the base weight

In general, a sample person is assigned a weight equal to the reciprocal of his/her probability of selection. In other words:

Sample person's weight = 1 / (probability of selection)

However, calculating the base weight for a sample person in NHANES is much more complicated due to the survey's complex, multistage design. In NHANES, the following equation, which takes into account the survey design, is used to determine the base weight for a sample person:

Base weight = 1 / (final probability)

where

Equation for final probability
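The reciprocal relationship described above can be sketched in a few lines of Python. This is only an illustration: it assumes the final probability is the product of the selection probabilities at each stage of a multistage design, and the stage names, probabilities, and the function name `base_weight` are hypothetical rather than actual NHANES values.

```python
from math import prod


def base_weight(stage_probabilities):
    """Return the reciprocal of the overall selection probability.

    Assumes the overall (final) probability is the product of the
    selection probabilities at each sampling stage.
    """
    final_probability = prod(stage_probabilities)
    if not 0 < final_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / final_probability


# Hypothetical stage probabilities (not actual NHANES values).
stages = {
    "psu": 0.05,
    "segment": 0.10,
    "household": 0.20,
    "person": 0.50,
}

weight = base_weight(stages.values())
print(f"base weight: {weight:.0f}")
```

Under these made-up probabilities, one sample person would stand in for roughly two thousand people in the population.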

(2) Adjusting for nonresponse

To the interview or exams

The base weights were adjusted for nonresponse to the in-home interview when creating interview weights and further adjusted for non-response to the MEC exam when creating exam weights.

In NHANES, an individual can be classified as a nonrespondent to the interview portion of the survey, the exam portion, or both. An individual is considered a nonrespondent to the interview if he/she was selected to be in the sample but did not participate in the in-home interview. Similarly, an individual who agreed to complete the interview but did not agree to, or did not come in for, the MEC portion of the survey is considered a nonrespondent to the exam. Adjustments made for survey nonresponse account only for sample person interview or exam nonresponse, not for component/item nonresponse (e.g., a sample person declined to have their blood pressure measured in the examination component but completed all other examination components).
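To make the idea of a nonresponse adjustment concrete, here is a minimal sketch of a simple weighting-class adjustment. It is a stand-in for illustration only: this section does not describe the adjustment cells or procedure NHANES actually uses, and the cell labels, weights, and the function name `adjust_for_nonresponse` are hypothetical.

```python
from collections import defaultdict


def adjust_for_nonresponse(persons):
    """Inflate respondents' weights so that, within each adjustment cell,
    they sum to the total weight of everyone sampled in that cell."""
    sampled_total = defaultdict(float)
    respondent_total = defaultdict(float)
    for p in persons:
        sampled_total[p["cell"]] += p["weight"]
        if p["responded"]:
            respondent_total[p["cell"]] += p["weight"]

    adjusted = []
    for p in persons:
        if not p["responded"]:
            continue  # nonrespondents are dropped from the adjusted file
        factor = sampled_total[p["cell"]] / respondent_total[p["cell"]]
        adjusted.append({**p, "weight": p["weight"] * factor})
    return adjusted


# Hypothetical sample persons and adjustment cells.
sample = [
    {"id": 1, "cell": "A", "weight": 100.0, "responded": True},
    {"id": 2, "cell": "A", "weight": 100.0, "responded": True},
    {"id": 3, "cell": "A", "weight": 200.0, "responded": False},
    {"id": 4, "cell": "B", "weight": 150.0, "responded": True},
    {"id": 5, "cell": "B", "weight": 50.0, "responded": False},
]

adjusted = adjust_for_nonresponse(sample)
for person in adjusted:
    print(person["id"], round(person["weight"], 1))
```

The total weight is preserved within each cell, so respondents carry the weight of the nonrespondents who resemble them. Applying the same step a second time, with exam participation as the response indicator, would mirror the two-stage interview and exam adjustment described above.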

Table of Nonresponse Rates NHANES 1999-2002 - All Ages

For more information on component/item nonresponse adjustment and re-weighting the data for analyses, see:

1. Lohr, Sharon L. Sampling: Design and Analysis, pp. 265-272. Duxbury Press, 1999.

Examples of papers with re-weighted NHANES data:

2. Gregg E, Sorlie P, Paulose-Ram R, Gu Q, Wolz M, Eberhardt MS, Burt VL, Engelgau MM, and Geiss LS. Prevalence of lower extremity disease among persons 40 years and older in the US with and without diabetes. Diabetes Care. 2004 Jul;27(7):1591-7.

3. Ostchega Y, Dillon CF, Lindle R, Carroll M, Hurley BF. Isokinetic leg muscle strength in older Americans and its relationship to a standardized walk test: data from the National Health and Nutrition Examination Survey 1999-2000. J Am Geriatr Soc. 2004 Jun;52(6):977-82.

To NHANES subsample components

NHANES respondents are asked to participate in a variety of survey components that are statistically defined (or random) subsamples of the NHANES MEC-examined sample. These include a variety of lab, nutrition/dietary, environmental, or mental health components. (Please see the respective survey protocol/documentation for more specific information.) For example, some, but not all, participants are selected to give a fasting blood sample on the morning of their MEC exam. The subsamples selected for these components are chosen at random with a specified sampling fraction (for example, 1/2 or 1/3 of the total examined group) according to the protocol for that component. Each component subsample has its own designated weight, which accounts for the additional probability of selection into the subsample component, as well as the additional nonresponse.
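The two ingredients of a subsample weight named above, the additional probability of selection and the additional nonresponse, can be illustrated with a short sketch. The function name `subsample_weight`, the single overall response rate, and all numeric values are assumptions made for illustration; the actual NHANES subsample weights are supplied in the data files and should be used as released.

```python
def subsample_weight(exam_weight, sampling_fraction, response_rate):
    """Scale an exam weight for subsample selection and nonresponse.

    sampling_fraction: share of examined persons selected (e.g. 1/3).
    response_rate: share of selected persons who took part.
    """
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling_fraction must be in (0, 1]")
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return exam_weight / sampling_fraction / response_rate


# Hypothetical values: a 1/3 subsample with 90% participation.
w = subsample_weight(exam_weight=12000.0,
                     sampling_fraction=1 / 3,
                     response_rate=0.9)
print(round(w))
```

Because fewer people are in the subsample, each one represents more of the population than in the full examined sample, which is why a subsample weight is larger than the corresponding exam weight.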

An example of a component with subsamples is described below. Subsamples of NHANES environmental chemicals are most often mutually exclusive; therefore, it is not possible to conduct an analysis in which analytes from different subsamples are examined together. For example, in 2005-2006, phthalates were measured in subsample “B”, but polyfluorinated compounds were measured in subsample “A”. Sometimes analytes are measured in the same subsample, and these can be analyzed together using their subsample weights. Most often these are available for analysis beginning in 2003. For example, in 2007-2008, urinary mercury and urinary arsenic were both measured in the 1/3 subsample “A”. As with all of the data files, users are encouraged to combine like subsample components across survey cycles, for example 2005-2006 heavy metals in subsample “A” and 2007-2008 heavy metals in subsample “A”. This will improve the statistical reliability of the estimate. In rare cases, subsamples overlap with one another, but not completely; for example, some of the persons who are part of the 2003-2004 1/3 subsample for urinary arsenic would also be found in the 1/2 subsample for volatile organic compounds in blood. In this situation, the data from the subsamples cannot be combined, because the existing 1/3 and 1/2 subsample weights would not be appropriate for analysis of the combined data.

In summary, users are encouraged to combine like subsample components across survey cycles, for example 2005-2006 heavy metals in subsample “A” and 2007-2008 heavy metals in subsample “A”. Subsample weights from the same survey cycle (e.g., 2003-2004) are not designed to be combined, because many subsamples from the same survey cycle are mutually exclusive or partially overlapping. If your analyses require combining two or more subsamples that are mutually exclusive or partially overlapping, appropriate weights would need to be recalculated. However, details on how to recalculate weights when combining subsamples go well beyond the scope of this tutorial. Therefore, it is strongly advised that you do not attempt to combine different subsamples from a single survey cycle in any analysis.
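As a rough sketch of pooling like subsamples across cycles, the snippet below refuses to pool different subsamples and rescales the weights of like ones. Dividing each two-year weight by the number of pooled cycles is an assumption drawn from general NHANES analytic guidance rather than from this page, so confirm it against the documentation for the cycles you use. The function name `pool_across_cycles` and the weights are hypothetical.

```python
def pool_across_cycles(cycles):
    """Pool one subsample across survey cycles.

    cycles maps a cycle label to (subsample_label, list_of_weights).
    Assumption: each two-year weight is divided by the number of
    cycles pooled, so the pooled weights still sum to roughly one
    population rather than several.
    """
    labels = {label for label, _ in cycles.values()}
    if len(labels) != 1:
        raise ValueError("only like subsamples should be pooled")
    n_cycles = len(cycles)
    return {
        cycle: [w / n_cycles for w in weights]
        for cycle, (_, weights) in cycles.items()
    }


# Hypothetical two-year subsample "A" weights from two cycles.
pooled = pool_across_cycles({
    "2005-2006": ("A", [30000.0, 45000.0]),
    "2007-2008": ("A", [36000.0]),
})
print(pooled)
```

The guard only checks the subsample label; it cannot detect partially overlapping subsamples within a cycle, which still require recalculated weights as noted above.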

Diagram of Nonresponse Rates

The diagram above demonstrates the varying levels of sampling nonresponse. In this example, the selected sample of persons age 20 and over included a total of 13,312 sample persons for the years 1999-2002. Only 10,291 of those sample persons actually completed the in-home interview. Therefore, about 22.7% of the individuals sampled (3,021 persons) did not complete the in-home interview. This is interview nonresponse. Among the 10,291 sample persons who were interviewed, only 9,471 completed the MEC exam. Therefore, an additional 8% of the interviewed sample persons (820 persons) did not respond to the MEC exam. This is the MEC exam nonresponse. This example also shows the additional subsampling for the AM fasting blood sample.

Approximately 50% of MEC participants (4,696 persons) were assigned to fast for 9 hours and come to the morning MEC exam. Of the 4,696 persons assigned to the morning subsample, only 4,157 actually fasted, so the AM fasting sample weights were adjusted for the additional 11.5% nonresponse to the AM fast.
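The nonresponse rates in this example can be reproduced directly from the counts quoted above. The short sketch below does the arithmetic; the helper name `nonresponse_rate` is hypothetical, and the results are unweighted percentages rounded to one decimal place, so they may differ slightly from figures rounded elsewhere.

```python
def nonresponse_rate(selected, responded):
    """Percentage of selected persons who did not respond."""
    return 100.0 * (selected - responded) / selected


# Counts quoted in the text for persons age 20 and over, 1999-2002.
sampled = 13_312
interviewed = 10_291
examined = 9_471
assigned_am = 4_696
fasted = 4_157

interview_nr = nonresponse_rate(sampled, interviewed)
exam_nr = nonresponse_rate(interviewed, examined)
fasting_nr = nonresponse_rate(assigned_am, fasted)

print(f"interview nonresponse:  {interview_nr:.1f}%")
print(f"MEC exam nonresponse:   {exam_nr:.1f}%")
print(f"AM fasting nonresponse: {fasting_nr:.1f}%")
```

Each rate uses the previous stage's respondents as its denominator, which is why the percentages describe additional nonresponse at each step rather than a cumulative loss from the original sample.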

(3) Post-stratification adjustment to match 2000 U.S. Census population control totals

In addition to accounting for sample person nonresponse, weights are also post-stratified to match the population control totals for each sampling subdomain. This additional adjustment makes the weighted counts the same as independent population counts from the Current Population Survey (CPS) of the U.S. Census Bureau.
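Post-stratification of this kind can be sketched as a ratio adjustment within each subdomain. The example below is a simplified illustration, not the NHANES procedure itself: the subdomain names, weights, control totals, and the function name `post_stratify` are all hypothetical.

```python
from collections import defaultdict


def post_stratify(persons, control_totals):
    """Rescale weights so each subdomain's weighted count
    equals its population control total."""
    weighted_counts = defaultdict(float)
    for p in persons:
        weighted_counts[p["subdomain"]] += p["weight"]

    result = []
    for p in persons:
        domain = p["subdomain"]
        factor = control_totals[domain] / weighted_counts[domain]
        result.append({**p, "weight": p["weight"] * factor})
    return result


# Hypothetical nonresponse-adjusted weights and control totals.
respondents = [
    {"id": 1, "subdomain": "men_20_39", "weight": 900.0},
    {"id": 2, "subdomain": "men_20_39", "weight": 1100.0},
    {"id": 3, "subdomain": "women_20_39", "weight": 1000.0},
    {"id": 4, "subdomain": "women_20_39", "weight": 1000.0},
]
controls = {"men_20_39": 2500.0, "women_20_39": 1800.0}

final = post_stratify(respondents, controls)
for person in final:
    print(person["id"], person["subdomain"], round(person["weight"], 1))
```

After the adjustment, summing the weights within a subdomain returns that subdomain's control total, which is what allows weighted estimates to line up with the independent population counts.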

Please see the CPS website for more information: http://www.bls.gov/cps/home.htm

Summary

In summary, it is important to utilize the weights in analyses to account for the complex survey design (including oversampling), survey nonresponse, and post-stratification in order to ensure that calculated estimates are truly representative of the U.S. civilian noninstitutionalized population.