In this example, the relationship between calcium intake from foods and beverages and systolic blood pressure will be modeled for adult women ages 19 years and older.
This example uses the demoadv dataset (download at Sample Code and Datasets). The variables w0304_0 to w0304_16 are the weights (sample weights and Balanced Repeated Replication [BRR] weights) used in the analysis of 2003-2004 dietary data that requires the use of BRR to calculate standard errors. The model is run 17 times, including 16 runs using BRR (see Module 18, Task 4 for more information). BRR uses weights w0304_1 to w0304_16.
Note: if 4 years of NHANES data are used, 32 BRR runs are required.
A SAS macro is a useful technique for rerunning a block of code when the analyst only wants to change a few variables; the macro BRR212 is created and called in this example. The BRR212 macro calls the MIXTRAN, DISTRIB, and INDIVINT macros, and calculates BRR standard errors of the parameter estimates. The MIXTRAN macro obtains preliminary estimates for the values of the parameters in the model, and then fits the model using PROC NLMIXED. It also produces summary reports of the model fit. Recall that modeling the complex survey structure of NHANES requires procedures that account for both differential weighting of individuals and the correlation among sample persons within a cluster. The SAS procedure NLMIXED can account for differential weighting by using the replicate statement. The use of BRR to calculate standard errors accounts for the correlation among sample persons in a cluster. Therefore, NLMIXED (or any SAS procedure that incorporates differential weighting) may be used with BRR to produce standard errors that are suitable for NHANES data without using specialized survey procedures. The DISTRIB macro estimates the distribution of usual intake, producing estimates of percentiles and the percent of the population below a cutpoint.
The MIXTRAN, DISTRIB, and INDIVINT macros used in this example were downloaded from the NCI website. Version 1.1 of the macros was used. Check this website for macro updates before starting any analysis. Additional details regarding the macros and additional examples also may be found on the website.
Statements | Explanation |
---|---|
data demoadv; |
First, select only those people with dietary data by selecting those without missing BRR weights. |
data adultf; |
Because the INDIVINT macro also requires a dataset with 1 record per person, a dataset called adultf is created for adult females ages 19 years and older. |
data day1;
data day2;
|
The variables DR1TCALC and DR2TCALC are NHANES variables representing total calcium consumed on days 1 and 2 respectively from all foods and beverages (other than water). To create a dataset with 2 records per person, the demoadv dataset is set 2 times to create 2 datasets, one where day=1 and one where day=2. The same variable name, DRTCALC, is used for calcium on both days. It is created by setting it equal to DR1TCALC for day 1 and DR2TCALC for day 2. |
data calcium; |
Finally, these data sets are appended. |
It is important to sort the dataset by respondent and day of intake because the NLMIXED procedure uses this information to estimate the model parameters.
proc sort ; by seqn day; run ;
The BRR212 macro calls the MIXTRAN macro and computes standard errors of parameter estimates. After creating this macro and running it 1 time, it may be called several times, each time changing the macro variables.
Statements | Explanation |
---|---|
%include 'C:\NHANES\Macros\mixtran_macro_v1.1.sas' ; |
This code reads the MIXTRAN, DISTRIB, and INDIVINT macros into SAS so that these macros may be called. |
title2 "Task 2 - Calcium Intake and Systolic Blood Pressure" ; |
This title should be printed on all output. |
%macro BRR212(data, response, datam, var12, regresponse, foodtype, subject, repeat, covars_prob, covars_amt, outlib, modeltype, lambda, covars_reg, seq, weekend, vargroup, numvargroups, subgroup, start_val1, start_val2, start_val3, vcontrol, nloptions, titles, printlevel, cutpts, ncutpts, nsim_mc, byvar, t0, t1, final);
|
The start of the BRR212 macro is defined. All of the terms inside the parentheses are the macro variables. |
%MIXTRAN (data=&data, response=&response, foodtype=&foodtype, subject=&subject, repeat=&repeat, covars_prob=&covars_prob, covars_amt=&covars_amt, outlib=&outlib, modeltype=&modeltype, lambda=&lambda, replicate_var=w0304_0, seq=&seq, weekend=&weekend, vargroup=&vargroup, numvargroups= &numvargroups, subgroup=&subgroup, start_val1=&start_val1, start_val2=&start_val2, start_val3= &start_val3, vcontrol= &vcontrol, nloptions=&nloptions, titles= &titles, printlevel= &printlevel) |
Within the BRR212 macro, the MIXTRAN macro is called. All of the variables preceded by & will be defined by the BRR212 macro call. The only variable without an & is the replicate_var macro variable; it is set to w0304_0 for the first run. |
data parampred(keep = &subject a_var_u2 a_var_e a_lambda x2b2); |
Input datasets for the INDIVINT macro are created. The datasets _pred_unc_&foodtype and _param_unc_&foodtype are created in the MIXTRAN run. The syntax “if (_n_=1) then set” is a way of doing a 1-to-many merge, where the dataset _param_unc_&foodtype is merged with each row of the _pred_unc_&foodtype dataset. |
data parsubj1rec; merge parampred &datam(keep = &subject &var12); by &subject; run; |
The INDIVINT macro requires a dataset with only one line per person called parsubj1rec. This is defined by the macro variable datam in the BRR212 macro, and the response variables are defined by the macro variable var12. |
title&titles; title %eval (&titles+ 1 ); title "Task 2 - Calcium Intake and Systolic Blood Pressure" ;
|
This code clears the titles created in the DISTRIB macro and resets the title for the example. |
%DISTRIB (seed= 0 , nsim_mc= 100 ,modeltype=&modeltype, pred=& outlib.._ pred_unc_&foodtype, param= & outlib.._ param_unc_&foodtype, outlib=&outlib, subject=&subject,food=&foodtype);
|
The DISTRIB macro is used to find the lambda for the transformation of the estimate of T (true usual intake) that will be used in the regression model. |
data _dist; set & outlib..d escript_&foodtype._w0304_0; keep tpercentile1-tpercentile99; run; |
The dataset descript_&foodtype_w0304_0 is from the DISTRIB macro. The percentiles are saved from this dataset. |
data _lambda (keep=lambda sse); do i= 1 to dim(z); MZ = mean(of Z1-Z99); MT = mean(of T1-T99); STZ = sum(of CP1-CP99); lambda = best_lambda; |
This code finds the best lambda to transform the estimate of T. This is done by finding the lambda that minimizes the error sum of squares. Z is the inverse of the normal cumulative distribution function (probit). At the end of the code, the best lambda is saved as a macro variable called lamt. |
proc univariate data=&data noprint; |
The minimum amount consumed on a consumption day is calculated and saved in a dataset called outmin_amt. |
data paramsubj1rec; proc sort; by &subject; run; |
The minimum amount is added to the dataset. Zero values are set to ½ of the mimimum amount. The dataset is sorted by subject. The variable lamt is created from the macro value lamt that was created previously. |
%indivint(model12= 1 , subj1recdata=paramsubj1rec, recid=&subject, r24vars=&var12, min_amt=min_amt, var_u1=, var_u2=a_var_u2, cov_u1u2=, var_e=a_var_e, lambda=a_lambda, xbeta1=, xbeta2=x2b2, boxcox_t_lamt=y, lamt=lamt, dencalc=y, denopt=y, u1nlmix=, u2nlmix=, titles=&titles, notesprt=y); |
The INDIVINT macro is called. Because we are fitting a one-part model, the macro variable model12 is set equal to 1. This macro uses the dataset with 1 record per person (paramsubj1rec), and specifies the two 24-hour recall variables (var12). The minimum amount that was saved in the paramsubj1rec dataset is specified for the min_amt macro variable, as are the parameters that are saved in that dataset, including a_var_u2, a_var_e, and a_lambda. The macro variable boxcox_t_lamt is set to y for ‘Yes’, indicating that a Box-Cox parameter is specified. This parameter is lamt, which was defined in the paramsubj1rec dataset in the last step. The macro variables dencalc, denopt, and notesprt are set to y for ‘Yes’, indicating that denominator integration is perfomed, denominator optimization is performed, and notes are printed in the SAS log, respectively. |
data subj1recres; |
The dataset _resdata is created in the INDIVINT macro. It is merged with the dataset adultf. |
proc reg data=subj1recres outest=regout; |
The regression model is run using the expected usual intake from the INDIVINT macro, called indusint. Because boxcox_t_lamt was set to y, this variable was already transformed in the INDIVINT macro, using the value of lamt in a Box-Cox transformation. |
data diethealthout0; |
The output dataset from the regression above (regout) is set, and the parameter for the expected true usual intake is renamed to rcbeta. In addition, lamt is saved, and the variables t0boxcox and t1boxcox and their difference multipled by rcbeta are computed. The macro variables t0 and t1 are set to a range that is relevant to the data, e.g., the 10th and 90th percentile. |
proc datasets nolist; delete parampred parsubj1rec paramsubj1rec subj1recres _resdata regout; run; |
Unneeded datasets are deleted. |
%do run= 1 %to 16 ; |
This code starts a loop to run the 16 BRR runs. |
%put ~~~~~~~~~~~~~~~~~~~ Run &run ~~~~~~~~~~~~~~~~~~~~; |
The run number is printed to the log. |
%MIXTRAN (data=&data, response=&response, foodtype= &foodtype, subject=&subject, repeat=&repeat, covars_prob=&covars_prob, covars_amt=&covars_amt, outlib=&outlib, modeltype=&modeltype, lambda=&lambda, replicate_var=w0304_&run, seq=&seq, weekend=&weekend, vargroup=&vargroup, numvargroups= &numvargroups, subgroup=&subgroup, start_val1=&start_val1, start_val2=&start_val2, start_val3=&start_val3, vcontrol=&vcontrol, nloptions=&nloptions, titles= &titles, printlevel=&printlevel) |
Within the BRR212 macro, the MIXTRAN macro is called. All of the variables preceded by & will be defined by the BRR212 macro call. The only variable without an & is the replicate_var macro variable; it is set to w0304_&run where &run equals 1 to 16. |
data parampred(keep = &subject a_var_u2 a_var_e a_lambda x2b2); if (_n_ = 1 ) then set & outlib.._ param_unc_&foodtype; set & outlib.._ pred_unc_&foodtype; run; |
Input datasets for the INDIVINT macro are created. The datasets _pred_unc_&foodtype and _param_unc_&foodtype are created in the MIXTRAN run. The syntax “if (_n_=1) then set” is a way of doing a 1-to-many merge, where the dataset _param_unc_&foodtype is merged with each row of the _pred_unc_&foodtype dataset. |
data parsubj1rec; merge parampred &datam(keep = &subject &var12); by &subject; run; |
The INDIVINT macro requires a dataset with only one line per person called parsubj1rec. This is defined by the macro variable datam in the BRR212 macro, and the response variables are defined by the macro variable var12. |
proc univariate data=&data noprint; |
The minimum amount consumed on a consumption day is calculated and saved in a dataset called outmin_amt. |
data paramsubj1rec; |
The minimum amount is added to the dataset. Zero values are set to ½ of the mimimum amount. The dataset is sorted by subject. The variable lamt is created from the macro value lamt that was created previously. |
%indivint(model12= 1 , subj1recdata=paramsubj1rec, recid=&subject, r24vars=&var12, min_amt=min_amt, var_u1=, var_u2=a_var_u2, cov_u1u2=, var_e=a_var_e, lambda=a_lambda, xbeta1=, xbeta2=x2b2, boxcox_t_lamt=y, lamt=lamt, dencalc=y, denopt=y, u1nlmix=, u2nlmix=, titles=&titles, notesprt=y); |
The INDIVINT macro is called. Because we are fitting a one-part model, the macro variable model12 is set equal to 1. This macro uses the dataset with 1 record per person (paramsubj1rec), and specifies the two 24-hour recall variables (var12). The minimum amount that was saved in the paramsubj1rec dataset is specified for the min_amt macro variable, as are the parameters that are saved in that dataset, including a_var_u2, a_var_e, and a_lambda. The predicted values x2b2 from the MIXTRAN run are specified. The macro variable boxcox_t_lamt is set to y for ‘Yes’, indicating that a Box-Cox parameter is specified. This parameter is lamt, which was defined in the paramsubj1rec dataset in the last step. The macro variables dencalc, denopt, and notesprt are set to y for ‘Yes’, indicating that denominator integration is perfomed, denominator optimization is performed, and notes are printed in the SAS log, respectively. |
data subj1recres; |
The dataset _resdata is created in the INDIVINT macro. It is merged with the dataset adultf. |
proc reg data=subj1recres outest=regout; |
The regression model is run using the expected usual intake from the INDIVINT macro, called indusint. Because boxcox_t_lamt was set to y, this variable was already transformed in the INDIVINT macro, using the value of lamt in a Box-Cox transformation. Title3 is reset. |
data diethealthout; |
The output dataset from the regression above (regout) is set, and the parameter for the expected true usual intake is renamed to rcbeta. In addition, lamt is saved, and the variables t0boxcox and t1boxcox and their difference multipled by rcbeta are computed. The macro variables t0 and t1 are set to a range that is relevant to the data, e.g., the 10th and 90th percentile. |
proc append base=brr_runs data=diethealthout; |
The BRR datasets are appended into a dataset called brr_runs. |
proc datasets nolist; delete parampred parsubj1rec paramsubj1rec subj1recres _resdata regout diethealthout; run; |
After appending the information to brr_runs, diethealthout and the other datasets created in the BRR runs can be deleted. |
%end ; |
The BRR runs end. |
data _null_; |
This code creates macro calls that are a list of the variables with the prefix ‘b’ and ‘db’. |
data parambrr; |
The brr_runs dataset is set, and a new dataset is created. This dataset renames the BRR parameters to have a prefix of ‘b’ using a rename statement and the macro calls created above. |
data ss; |
The estimates from the first run are merged with the BRR estimates, and the squared difference is computed for each BRR run. |
proc means data=ss sum; |
The sum of squares is computed. |
data brr; |
The standard errors are computed. |
data wparms; |
The point estimates are merged with the standard errors. |
proc transpose data=wparms out=wparmst; |
The data are transposed. |
data &final; |
The final dataset is created, and p-values and 95% confidence intervals are computed. |
proc print; |
The final dataset is printed. |
%mend BRR212; |
The end of the BRR212 macro is indicated. |
Statements | Explanation |
---|---|
%BRR212(data=calcium, response=DRTCALC, datam=adultf, var12=dr1tcalc dr2tcalc, regresponse=mean_sbp, foodtype=Calcium, subject=seqn, repeat=day, seq=, covars_amt=ridageyr, covars_reg=ridageyr, weekend=, outlib=work, modeltype=amount, titles= 2 , printlevel= 2 , nsim_mc= 100 , t0= 424 , t1= 1200 , final=nh.m21task2) |
This code calls the BRR212 macro. The dataset calcium defined in Step 1 is used; the macro variable response for which we want to model the distribution is DRTCALC. We also have to define a dataset with one line per person, datam, which is set equal to the dataset adultf, calculated in Step 1. The response variables for this dataset are defined by macro variable var12 and equal DR1TCALC and DR2TCALC. The dependent variable for the regression is mean_sbp. The macro variable foodtype is used to label the datasets from the MIXTRAN and DISTRIB macros. The variable seqn identifies the subject, and the macro variable repeat defines the variable that identifies the repeats on the subject, which is day. The covars_amt macro variable needs to include the covariates for the usual intake model, which is defined by ridageyr. The same covariates must be put in the macro variable covars_reg; however, there may be additional variables in covars_amt that are not in covars_reg. Note that a numeric id is required for “subject” in the INDIVINT macro. The macro variable outlib specifies the library where the data are to be stored. In this case, the working directory, work, was used. Because this is a ubiquitously consumed dietary constituent, modeltype= amount is specified. This fits the amount model. The macro variable titles saves 2 lines for titles supplied by the user. The printlevel is 2, which prints the output from the NLMIXED runs and the summary. The macro variable nsim_mc is used to specify the number of pseudo-individuals for which the distribution is simulated per respondent. The macro variables t0 and t1 must be specified. In this case, the approximate 10th and 90th percentiles of the distribution were used. The variable final specifies the name of the final dataset produced.
IMPORTANT NOTE
Although the macro variable weekend is defined in the macro, this macro currently is not capable of adjusting for weekends as described earlier in the course (see Module 18, Task 3 for more information) as the DISTRIB macro does. |
Note that SAS 9.2 TS1MO will produce an error with missing data in PROC IML, which is used in the INDIVINT macro. The problem is fixed in SAS 9.2 TS2M2. The SAS Problem Note documents the problem. From the HOT FIX tab a link to download the hot fix is available. This error is not encountered in SAS 9.1.3.
Note: Your results may vary slightly, as a random seed is used to estimate the distribution of usual intake. However, they would not be expected to vary by more than 1%.