Task 1c: How to Append NHANES Data in Stata

Here are the steps for appending NHANES data in Stata:

 

Step 1: Compare variable names and labels

The first step before appending data is to examine the contents of the data files. Using the Stata describe command, you will be able to get a list of variable names and variable labels for each data file selected. While reviewing the output of the describe command, you should compare variable names and labels to see whether any changes or differences occurred from cycle to cycle.

Code to Check Datasets' Contents

Use the use command to open a Stata dataset. The clear option replaces any data already in memory. Then use the describe command to list the contents of the file. The general format is below.

use filename [, clear]
describe

Next, you will apply these commands to check the contents of four data files: the Alcohol Questionnaire data from the 1999-2000 and 2001-2002 cycles, and the Demographic data from the same two cycles. Review the output carefully to find out whether anything changed between the two survey cycles in the alcohol or demographic files. Note that in Stata, each dataset is opened separately with its own use command, rather than all files being opened in one command statement.
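Comparing two long describe listings by eye is error-prone, so the variable-name check can also be sketched in Stata itself. The example below is a minimal sketch, not part of the original tutorial: it builds two tiny made-up datasets in place of alq.dta and alq_b.dta, stores each file's variable names with ds, and uses Stata's macro list operators to show which names appear in only one cycle.

```stata
* Made-up stand-ins for the 1999-2000 and 2001-2002 alcohol files
* (toy values, not real NHANES data)
clear
input long seqn byte alq100
1 1
2 2
end
tempfile alq alq_b
save `alq'

clear
input long seqn byte ald100
3 1
4 2
end
save `alq_b'

* Store the variable names of each file in a local macro
use `alq', clear
quietly ds
local vars_9900 `r(varlist)'

use `alq_b', clear
quietly ds
local vars_0102 `r(varlist)'

* Macro list difference: names present in one cycle but not the other
local only_9900 : list vars_9900 - vars_0102
local only_0102 : list vars_0102 - vars_9900
display "Only in 1999-2000: `only_9900'"
display "Only in 2001-2002: `only_0102'"
```

With these toy files the two display lines report alq100 and ald100, the same rename handled in Step 3. To apply the idea to the real files, replace the input blocks with use commands pointing at your own copies of the data.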

 

The following sample code opens the 1999-2000 Alcohol Questionnaire file and lists its contents.

use "C:\NHANES\DATA\alq.dta", clear
describe

 

The following sample code opens the 2001-2002 Alcohol Questionnaire file and lists its contents.

use "C:\NHANES\DATA\alq_b.dta", clear
describe

 

The following sample code opens the 1999-2000 Demographic file and lists its contents.

use "C:\NHANES\DATA\demo.dta", clear
describe

 

The following sample code opens the 2001-2002 Demographic file and lists its contents.

use "C:\NHANES\DATA\demo_b.dta", clear
describe

 

Following the sample code above, you can compare the contents of the Blood Pressure Examination files (BPX.dta and BPX_B.dta), Blood Pressure Questionnaire files (BPQ.dta and BPQ_B.dta), Medical Conditions files (MCQ.dta and MCQ_B.dta), and Laboratory 13 files (LAB13.dta and L13_B.dta). After reviewing the contents for any changes between cycles, you will be ready to append these data files together.
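Variable labels can be compared in a similar way. The sketch below, again on made-up data (the second riagendr label is invented purely to create a difference), records every variable label from one file with the extended macro function ": variable label" and prints each variable whose label differs in the other file.

```stata
* Made-up stand-ins for two demographic files; labels are invented
clear
input long seqn byte riagendr
1 1
end
label variable seqn "Respondent sequence number"
label variable riagendr "Gender"
tempfile demo demo_b
save `demo'

clear
input long seqn byte riagendr
2 2
end
label variable seqn "Respondent sequence number"
label variable riagendr "Gender of the sample person"
save `demo_b'

* Record each variable's label from the first file
use `demo', clear
foreach v of varlist _all {
    local lab1_`v' : variable label `v'
}

* Compare with the labels in the second file and collect differences
use `demo_b', clear
local changed
foreach v of varlist _all {
    local lab2 : variable label `v'
    if `"`lab1_`v''"' != `"`lab2'"' {
        display "`v': `lab1_`v'' -> `lab2'"
        local changed : list changed | v
    }
}
display "Variables with changed labels: `changed'"
```

Only riagendr is reported here, because seqn carries the same label in both toy files.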

 

Highlighted results from this demonstration:

 

 

Step 2: Append directly, if variables are identical

After carefully reviewing the Demographic, Blood Pressure Examination, Blood Pressure Questionnaire, Laboratory 13, and Medical Conditions Questionnaire files, you will find that the variables of interest in the two cycles remain the same. Therefore, you can directly append without any further changes.
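"Identical" covers more than names: append also expects a variable to be numeric in both files or string in both. The minimal sketch below uses made-up data with a hypothetical variable x that is numeric in one file and string in the other; capture traps the error so its return code can be inspected instead of stopping the do-file.

```stata
* Made-up data: x is numeric in the file to be appended ...
clear
input long seqn float x
1 10
end
tempfile numeric_x
save `numeric_x'

* ... but string in the master data currently in memory
clear
input long seqn str2 x
2 "20"
end

* append refuses to combine a string and a numeric variable
capture append using `numeric_x'
display "append return code: " _rc
```

append does have a force option that allows such a combination, but the values from the using file then become missing, so fixing the storage type before appending is usually the safer choice.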

Because you are interested in only a subset of the variables, you can use the keep command to select the relevant ones. No output is associated with this command; you can use the list or describe commands to see the contents of the dataset. Additionally, you can use Windows Explorer to confirm that the new 4-year datasets (demo_4yr, bpx_4yr, bpq_4yr, mcq_4yr, and lab13_4yr) are in the C:\NHANES\DATA folder.

The general form of the append command is as follows:

 append using filename [, options]
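Two append options are worth seeing in action: keep() limits which variables are read from the using file, and generate() creates a variable that records where each observation came from (0 for the master data, 1 for the first using file). Because keep() applies only to the using file, the master data needs its own keep command, which is why the examples in this step run keep on the 1999-2000 file first. The sketch below uses made-up data; the extra variable and the name source are hypothetical.

```stata
* Made-up 2001-2002 data containing an unwanted extra variable
clear
input long seqn byte riagendr int ridageyr byte extra
3 1 40 9
4 2 35 9
end
tempfile cycle_b
save `cycle_b'

* Made-up 1999-2000 data held in memory as the master
clear
input long seqn byte riagendr int ridageyr
1 2 50
2 1 62
end

* keep() reads only the listed variables from the using file;
* generate(source) marks master rows 0 and using-file rows 1
append using `cycle_b', keep(seqn riagendr ridageyr) generate(source)
list, noobs
```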

Use the append command to append the 2001-2002 demographic data file (DEMO_B.dta) to the 1999-2000 demographic data file (DEMO.dta). Use the keep statement to select the variables of interest. Use the save command to save the new 4-year dataset.

Notice that in the keep statement, a variable named "seqn" is included. SEQN stands for sequence number and should be included whenever datasets are appended. SEQN is a unique identifier for each observation (participant) in NHANES. Every time you extract variables from an NHANES data file, you should include the SEQN variable in your selection. Failing to do so will lead to problems if you want to sort or merge your data files at a later time. See Append & Merge Module Task 2 for more information on Merging.

 

* Open the 1999-2000 file and keep the variables of interest
use C:\Nhanes\Data\demo, clear
keep seqn sddsrvyr ridstatr ridpreg sdmvpsu sdmvstra wtmec4yr riagendr ridageyr ridreth1 dmdeduc
* Append the same variables from the 2001-2002 file and save the 4-year file
append using C:\Nhanes\Data\demo_b, keep(seqn sddsrvyr ridstatr ridpreg sdmvpsu sdmvstra wtmec4yr riagendr ridageyr ridreth1 dmdeduc)
save C:\Nhanes\Data\demo_4yr, replace
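Since SEQN should identify each participant uniquely, a quick check after appending is to confirm that it still does. The sketch below uses made-up values. isid stops with an error when seqn is not unique, so the second half deliberately appends the same toy file twice (a hypothetical mistake) and traps the resulting error with capture.

```stata
* Made-up data standing in for an appended 4-year file
clear
input long seqn byte sddsrvyr
1 1
2 1
3 2
4 2
end

* Passes silently because every seqn value is unique
isid seqn
duplicates report seqn

* Hypothetical mistake: the same rows appended a second time
tempfile once
save `once'
append using `once'
capture isid seqn
display "isid return code: " _rc
```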

 

Use the append command to append the 2001-2002 blood pressure examination data file (BPX_B.dta) to the 1999-2000 blood pressure examination data file (BPX.dta). Use the keep statement to select the variables of interest. Use the save command to save the new 4-year dataset.

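* Open the 1999-2000 blood pressure examination file
use "C:\Nhanes\Data\bpx", clear
keep seqn bpxsy1 bpxsy2 bpxsy3 bpxsy4 bpxdi1 bpxdi2 bpxdi3 bpxdi4
* Append the same variables from the 2001-2002 examination file
append using C:\Nhanes\Data\bpx_b, keep(seqn bpxsy1 bpxsy2 bpxsy3 bpxsy4 bpxdi1 bpxdi2 bpxdi3 bpxdi4)
save "C:\Nhanes\Data\bpx_4yr", replace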

 

Use the append command to append the 2001-2002 blood pressure questionnaire data file (BPQ_B.dta) to the 1999-2000 blood pressure questionnaire data file (BPQ.dta). Use the keep statement to select the variables of interest. Use the save command to save the new 4-year dataset.

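* Open the 1999-2000 blood pressure questionnaire file
use "C:\Nhanes\Data\bpq", clear
keep seqn bpq010 bpq020 bpq030 bpq050a bpq070 bpq080 bpq100d
* Append the same variables from the 2001-2002 questionnaire file
append using C:\Nhanes\Data\bpq_b, keep(seqn bpq010 bpq020 bpq030 bpq050a bpq070 bpq080 bpq100d)
save "C:\Nhanes\Data\bpq_4yr", replace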

 

Use the append command to append the 2001-2002 medical conditions questionnaire data file (MCQ_B.dta) to the 1999-2000 medical conditions data file (MCQ.dta). Use the keep statement to select the variables of interest. Use the save command to save the new 4-year dataset.

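* Open the 1999-2000 medical conditions file
use "C:\Nhanes\Data\mcq", clear
keep seqn mcq160b mcq160c mcq160d mcq160e mcq160f
* The append command must stay on one line in a do-file
append using C:\Nhanes\Data\mcq_b, keep(seqn mcq160b mcq160c mcq160d mcq160e mcq160f)
save "C:\Nhanes\Data\mcq_4yr", replace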

 

Use the append command to append the 2001-2002 laboratory data file (L13_B.dta) to the 1999-2000 laboratory data file (LAB13.dta). Use the keep statement to select the variables of interest. Use the save command to save the new 4-year dataset.

* Open the 1999-2000 laboratory file and keep total cholesterol
use "C:\Nhanes\Data\lab13", clear
keep seqn lbxtc
append using C:\Nhanes\Data\l13_b, keep(seqn lbxtc)
save "C:\Nhanes\Data\lab13_4yr", replace


Step 3: Rename variables and/or recode variables before appending, if variables are different

Because the 1999-2000 Alcohol Questionnaire data file contains a variable (ALQ100) that was renamed in 2001-2002 (ALD100), you will need to rename the variable first and then append the data. If the response categories of a variable differ between cycles, you will also need to recode it.
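The recode case can be sketched with a hypothetical variable. In the made-up example below, q1 is coded 1 = Yes, 2 = No in the earlier cycle but 1 = Yes, 0 = No in the later one; these codes are invented for illustration and do not come from the NHANES files. The later cycle is recoded to the earlier scheme before appending.

```stata
* Hypothetical later-cycle data where "No" is stored as 0 instead of 2
clear
input long seqn byte q1
3 1
4 0
5 0
end

* Map the later cycle's codes onto the earlier cycle's scheme
recode q1 (0 = 2)
tabulate q1
```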

You will see in the code that the variable ALD100 in the 2001-2002 Alcohol Questionnaire data file was renamed to ALQ100, the same as the variable name in the 1999-2000 Alcohol Questionnaire data file. After renaming the 2001-2002 variable, you will be ready to append the data files with selected variables of interest.

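* Rename the 2001-2002 variable to match the 1999-2000 name
use "C:\Nhanes\Data\alq_b", clear
rename ald100 alq100
* Note: replace overwrites the original alq_b.dta with the renamed version
save "C:\Nhanes\Data\alq_b", replace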

 

* Append the renamed 2001-2002 file to the 1999-2000 file
use "C:\Nhanes\Data\alq", clear
append using C:\Nhanes\Data\alq_b
save "C:\Nhanes\Data\alq_4yr", replace

 

No output is associated with this procedure. You can use the list or describe commands to see content for the dataset.
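To see why the rename matters, the sketch below appends made-up alcohol data twice: once without renaming and once after renaming ald100 to alq100. Without the rename, append treats the two names as different variables, so each one is missing for the other cycle's participants.

```stata
* Made-up 2001-2002 data using the new name ald100
clear
input long seqn byte ald100
3 1
4 2
end
tempfile alq_b_toy
save `alq_b_toy'

* Made-up 1999-2000 data using the old name alq100
clear
input long seqn byte alq100
1 1
2 2
end
tempfile alq_toy
save `alq_toy'

* Without renaming: two separate, half-missing variables
append using `alq_b_toy'
count if missing(alq100)
display "alq100 missing in " r(N) " of " _N " observations"

* Renaming first puts all responses in one variable
use `alq_b_toy', clear
rename ald100 alq100
tempfile alq_b_renamed
save `alq_b_renamed'
use `alq_toy', clear
append using `alq_b_renamed'
count if missing(alq100)
display "alq100 missing in " r(N) " of " _N " observations"
```

The first display reports 2 of 4 observations missing and the second reports 0 of 4.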


Step 4: Check results

After appending the data files, it is a good idea to check the contents again to make sure that the files were appended correctly. Use the describe command, as demonstrated in Step 1, to check the combined files. If necessary, consult the code in the previous steps for further instruction.

Double-check variable names and labels, and make sure that variables were renamed correctly. Pay special attention to the number of observations in the combined dataset, which should equal the sum of the observations in the two source files.
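The observation-count check can be written as an assert so that a do-file stops if the totals disagree. The sketch below records each toy file's count, appends, and compares the result; tabulating a cycle variable such as sddsrvyr (kept in the demographic file in Step 2) then shows how many observations came from each cycle. All values here are made up.

```stata
* Made-up stand-ins for the two cycle files
clear
input long seqn byte sddsrvyr
1 1
2 1
3 1
end
tempfile c1
save `c1'
local n1 = _N

clear
input long seqn byte sddsrvyr
4 2
5 2
end
tempfile c2
save `c2'
local n2 = _N

* Append and confirm that the total equals the sum of the parts
use `c1', clear
append using `c2'
assert _N == `n1' + `n2'
tabulate sddsrvyr
```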

Highlighted results from the describe command are: