SAS Programs for Analyzing Individual Responses in Controlled Trials
Will G Hopkins
Sportscience 22, 1-10, 2018 (sportsci.org/2018/sasir.htm)
Institute of Sport, Exercise and Active Living, Victoria University, Melbourne, Australia.
Reviewer: Alan M Batterham, School of Health and Social Care, University of Teesside, Middlesbrough, UK.
Analysis of single pre- and post-tests
Analysis of two post-tests with differences in error between groups
Analysis of two post-tests with changes in individual responses between tests
Analysis of four post-tests with changes in individual responses & errors between tests

When researchers analyze the effect of a treatment on a sample of individuals, they derive a probability value or confidence interval that describes uncertainty in the mean effect of the treatment, not the uncertainty in the effect on a given individual. Indeed, in a large-scale study, a treatment could be beneficial with an extremely high level of statistical significance or a vanishingly narrow confidence interval, yet the treatment could be harmful for a substantial proportion of the population. Accounting for individual responses to treatments is therefore an important issue, especially as ever cheaper genotyping and pervasive monitoring provide researchers with subject characteristics that could help explain individual responses and thereby permit personalized targeting of training and other treatments to improve health or performance (Hopkins, 2015). I was recently invited to take part in a symposium on individual differences in the fitness response to changes in habitual physical activity. I addressed issues of design and especially analysis for the assessment of individual responses in controlled trials. My contribution included programs written in the language of the Statistical Analysis System (SAS), which were too long and technical to append to the publication arising from the symposium (in preparation). Readers should consult that publication to gain an understanding of individual responses in controlled trials, especially when the response is endurance fitness and the treatment is an increase in habitual physical activity. The present article is a repository of the programs, which should be useful to anyone wishing to analyze data from a controlled trial with SAS.
There are five programs. The first two are for a simple controlled trial consisting of single pre- and post-tests in an experimental and control group, one without and one with gender as a modifier. The other three increase in complexity to account for multiple post-tests and the potential for differences and changes between groups and tests, not only in the individual responses but also in errors of measurement. All programs are written to analyze change scores from a single pre-test. If your data include more than one pre-test, you should average each individual's pre-test scores. It is possible to analyze original scores rather than change scores, but it is more difficult to understand and specify the fixed and random effects, especially the modifying effect of the pre-test scores and the effects of mediators. The programs, which appear below, can be copied from the docx version of this article and modified to suit your data. For an explanation of the mysteries of mixed modeling, peruse the suite of materials on mixed modeling with SAS available at this site, where you will also learn how to access, install, and run the free University Edition of SAS Studio. I also devised five matching simulation programs in SAS to check that the models correctly estimate not only the mean effect of a treatment and its individual responses, but also the modifying effect of the pre-test and of any other subject characteristics. These simulations can also be used to check on the precision of the estimates of these effects for a given sample size and population parameters. The simulations for the two simplest programs include bootstrapping. The bootstrapping checks the confidence limits for individual responses expressed as a standard deviation. It also derives confidence limits for individual responses expressed as proportions of positive, trivial, and negative responders, and as probabilities that a given individual is a positive, trivial, or negative responder.
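The last two quantities can be illustrated with a short SAS data step. This is a minimal sketch under stated assumptions, not code taken from the simulation programs. It assumes a hypothetical mean treatment effect of 3 units, a hypothetical smallest important change of 1 unit, and normally distributed individual responses. The variable names (VarIR, SDIR, MeanEffect, SWC, PropPos, PropTriv, PropNeg) are invented for illustration. The step first converts a variance for individual responses to a signed SD; the variance can be negative when negative variance is allowed in Proc Mixed. The conversion takes the square root of the absolute value and keeps the sign. The step then estimates the proportions of positive, trivial and negative responders.

```sas
/* Minimal sketch: all values and variable names are hypothetical, */
/* not output from the programs in this article.                   */
data responders;
  input VarIR;           * variance for individual responses, may be negative;
  MeanEffect=3;          * hypothetical mean effect of the treatment;
  SWC=1;                 * hypothetical smallest important change;
  * Signed SD: square root of the absolute variance, keeping its sign;
  SDIR=sign(VarIR)*sqrt(abs(VarIR));
  * Proportions of responders, assuming normally distributed responses;
  * (left missing in this sketch when the SD is zero or negative);
  if SDIR>0 then do;
    PropPos=1-probnorm((SWC-MeanEffect)/SDIR);
    PropNeg=probnorm((-SWC-MeanEffect)/SDIR);
    PropTriv=1-PropPos-PropNeg;
  end;
  datalines;
4
-4
;
run;

proc print data=responders;
run;
```

With these example values, the first row gives an SD of 2 and proportions of positive, trivial and negative responders of about 0.84, 0.14 and 0.02. The second row gives an SD of -2; this sketch does not define proportions of responders for that case and leaves them missing. Confidence limits for such proportions still require bootstrapping, as in the simulation programs.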
The simulation programs are provided as Microsoft Word documents, available via links. Copy the code directly into the editor window of the main SAS package or the code window of SAS Studio to run it. Then try different values for one or more of the parameters highlighted at the top of the program. Note that the number of bootstrapped samples is set to only 100. If you set it to the necessary 3000, the program may take hours to run or even run out of memory, depending on the resources of your computer. If you choose 3000, also choose a small sample size (e.g., 20 per group) and a small number of simulated samples (e.g., 5). The remainder of this article consists of an explanation of the format of the data for use with the programs, followed by each program and a link to its matching simulation.

Data format

The programs for a simple controlled trial are written for a dataset as shown below. The dataset consists of a control and an experimental group of mixed gender, a numeric subject characteristic X (e.g., hours per week of moderate-vigorous activity), and single pre- and post-test measurements of maximum oxygen uptake (VO2maxPre and VO2maxPost) for each subject. In the simulation programs, for simplicity VO2max is generated as a normally distributed variable that does not need log transformation, but with real data log transformation is often appropriate. The characteristic X is generated as a normally distributed variable with a mean and SD of 1, as if it were the base-2 logarithm of moderate-vigorous activity with a mean of 2 h.wk⁻¹ and a factor SD of ×/÷2.0. You should consider whether it is appropriate to log-transform a predictor variable before including it in the analysis.
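To try the programs before collecting data, a dataset with this layout can be generated in a data step. The sketch below is not one of the article's simulation programs. The sample size (20 per group), the seed, the pre-test mean and SD (50 and 5), the mean treatment effect (3), the SDs of error and individual responses (both 2), and the intermediate variable Hours are all hypothetical. The step generates weekly hours of activity as log-normal and then log-transforms them to base 2, so that X has a mean and SD of 1, as described above.

```sas
/* Minimal sketch of the assumed layout of dat1; all values are hypothetical. */
data dat1;
  call streaminit(20180101);            * arbitrary seed;
  length Group $7 Gender $6;
  do SubjectID=1 to 40;
    Group=ifc(SubjectID<=20, "Control", "Exptal");
    Gender=ifc(mod(SubjectID,2)=0, "Female", "Male");
    * Weekly hours of moderate-vigorous activity: geometric mean 2, factor SD 2;
    Hours=2**rand("Normal",1,1);
    X=log2(Hours);                      * base-2 log: mean 1, SD 1;
    VO2maxPre=rand("Normal",50,5);
    * Post-test = pre-test plus error, plus a mean effect and individual;
    * responses in the experimental group only;
    VO2maxPost=VO2maxPre+rand("Normal",0,2);
    if Group="Exptal" then VO2maxPost=VO2maxPost+rand("Normal",3,2);
    output;
  end;
run;
```

Running proc means on X should then show a mean and SD near 1, subject to the usual sampling variation for 40 subjects. With real data you would instead create X from the recorded hours with the same log2 function, if you decide that log transformation of the predictor is appropriate.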
Analysis of single pre- and post-tests

A data step is required to create the change scores (VO2maxDelta) and a dummy variable (xVarExp) for estimating the standard deviation for individual responses. (xVar is short for extra variance.) Assume the above dataset is called dat1.

data dat2;
  set dat1;
  VO2maxDelta=VO2maxPost-VO2maxPre;
  xVarExp=0;
  if Group="Exptal" then xVarExp=1;
run;

Here is the simplest possible mixed model, without any estimation of or adjustment for covariates, and with default 95% confidence limits:

proc mixed data=dat2 covtest cl nobound;
  class Group;
  model VO2maxDelta=Group/ddfm=sat;
  random xVarExp/subject=SubjectID s cl;
  lsmeans Group/diff cl;
run;

The nobound option in the proc mixed statement is required to allow for negative variance. Negative variance is the only way to properly account for the possibility that sampling variation, a ceiling effect or another phenomenon could result in less variation in change scores in the experimental group than in the control group. The covtest and cl options result in output of estimates of the random-effect variances and their confidence limits. The ddfm=sat option in the model statement sets the denominator degrees of freedom to the Satterthwaite value; SAS should have made this the default. The random statement is equivalent to, and can be replaced by, random xVarExp*SubjectID, with SubjectID added to the class statement. From that form it is perhaps easier to see that the change score for each subject in the experimental group has a unique value additional to the residual (which is estimated by default). The options s cl in the random statement provide the "solution" for the random effect: the response of each individual (to which the mean effect must be added), with confidence limits.

The output from Proc Mixed shows a variance and confidence limits for xVarExp. These have to be converted to SDs. A positive value is simply square-rooted. A negative value has its sign changed before the square root is taken, and the resulting SD is then made negative.

When accounting for the numeric modifiers, it simplifies programming to rescale their means to 0 and their SDs to 0.5 within each gender. The effects of group and gender are then estimated at the mean of each modifier within each gender. A one-unit change in a rescaled modifier represents two SDs of the original variable, which is the "2SD" effect estimated below:

proc sort data=dat2;
  by Gender;
run;
proc standard data=dat2 mean=0 std=0.5 out=dat3;
  var VO2maxPre X;
  by Gender;
run;

In the remaining code, I have inserted a line break before most option slashes (/), so that the lines of code wrap reasonably well when this article is viewed in a browser. SAS treats such line breaks, like other white space, as ordinary spaces. Note that SAS Studio cannot read smart quotes (‘…’ and “…”), so if you are writing code in Microsoft Word for pasting into the code page in Studio, turn off smart quotes.

Here is the mixed model that analyzes the change scores, after specification of a macro variable to define a 90% level of confidence. It includes ods statements that suppress the usual output from Proc Mixed. They also output the results as datasets for subsequent processing (not shown).

%let alpha=0.1; *90% confidence intervals;
ods select none; *for running in SAS Studio;
ods listing close; *for running in the main SAS package;
proc mixed data=dat3 covtest cl alpha=&alpha nobound;
  class Group Gender;
  model VO2maxDelta=Group Group*VO2maxPre Group*X Group*Gender
    /ddfm=sat;
  random xVarExp
    /subject=SubjectID s cl alpha=&alpha;
  lsmeans Group*Gender
    /diff cl alpha=&alpha;
  estimate "Effect of 2SD of VO2maxPre:";
  estimate " Control 2SD of VO2maxPre" Group*VO2maxPre 1 0
    /cl alpha=&alpha;
  estimate " Exptal 2SD of VO2maxPre" Group*VO2maxPre 0 1
    /cl alpha=&alpha;
  estimate " Expt-Cont 2SD of VO2maxPre" Group*VO2maxPre -1 1
    /cl alpha=&alpha;
  estimate "Effect of 2SD of X:";
  estimate " Control 2SD of X" Group*X 1 0
    /cl alpha=&alpha;
  estimate " Exptal 2SD of X" Group*X 0 1
    /cl alpha=&alpha;
  estimate " Expt-Cont 2SD of X" Group*X -1 1
    /cl alpha=&alpha;
  estimate "Effect of Gender:";
  estimate " Female Expt-Control" Group -1 1 Group*Gender -1 0 1 0
    /cl alpha=&alpha;
  estimate " Male Exptal-Control" Group -1 1 Group*Gender 0 -1 0 1
    /cl alpha=&alpha;
  estimate " Control F-M" Group*Gender 1 -1 0 0
    /cl alpha=&alpha;
  estimate " Exptal F-M" Group*Gender 0 0 1 -1
    /cl alpha=&alpha;
  estimate " Expt-Cont F-M" Group*Gender -1 1 1 -1
    /cl alpha=&alpha;
  ods output covparms=cov;
  ods output estimates=est;
  ods output lsmeans=lsm;
  ods output diffs=lsmdiff;
  ods output solutionr=solr;
run;
ods listing;
ods select all;

Download the simulation program for a simple RCT with gender included as above or without gender (effected mainly by replacing "female" and "male" with "both"). To estimate separate individual responses and residual errors for the two genders, replace the above random statement with these two statements:

random xVarExp/subject=SubjectID s cl alpha=&alpha group=Gender;
repeated/group=Gender;

However, I recommend removing gender from the model entirely and running separate analyses for the gender subgroups, by sorting the dataset (if you haven't already)…

proc sort data=dat3;
  by Gender;
run;

…then adding by Gender; before run; at the end of the Proc Mixed statements. You should perform similar separate analyses for any other subgroups. In this way you will properly account for and estimate separate errors, individual responses, treatment effects and modifying effects of covariates without the challenge of the more complex coding. Compare the errors and effects using the combine/compare effects spreadsheet at Sportscience (Hopkins, 2006). The spreadsheet for a pre-post parallel-groups trial at this site (Hopkins, 2017) performs an analysis with results identical to those of the above program, when the effects of gender are analyzed separately. The spreadsheet is a far more sensible option than SAS when you do not have short-term repeats to account for any difference in post-test error between control and experimental groups.

Analysis of two post-tests with differences in error between groups

The following programs are for a similar dataset, but with two post-tests: VO2maxPost is replaced by VO2maxPost1, and there is an extra column of values for VO2maxPost2. The analysis accounts for a difference between control and experimental groups in the post-test error of measurement, which could otherwise confound estimation of the standard deviation representing individual responses. It is assumed that the two post-tests are close enough together that the individual responses do not change between them. Perform the rescaling of VO2maxPre and X first. Proc Standard with a by statement requires the dataset to be sorted by Gender:

proc sort data=dat1;
  by Gender;
run;
proc standard data=dat1 mean=0 std=0.5 out=dat2;
  var VO2maxPre X;
  by Gender;
run;

Next, generate the dummy variable xVarExp, the change scores VO2maxDelta, and a variable Time:

data dat3;
  set dat2;
  xVarExp=0;
  if Group="Exptal" then xVarExp=1;
  VO2maxDelta=VO2maxPost1-VO2maxPre; Time="Post1"; output;
  VO2maxDelta=VO2maxPost2-VO2maxPre; Time="Post2"; output;
run;

Here is the Proc Mixed code. The effect of gender has been removed, on the assumption that you will perform separate analyses for females and males.

%let alpha=0.1; *90% confidence intervals;
ods select none; *for running in SAS Studio;
ods listing close; *for running in the main SAS package;
proc mixed data=dat3 covtest cl alpha=&alpha nobound;
  class Group Time;
  model VO2maxDelta=Group Group*Time Group*VO2maxPre Group*X
    /ddfm=sat;
  random intercept xVarExp
    /subject=SubjectID s cl alpha=&alpha;
  repeated
    /group=Group;
  lsmeans Group
    /diff=control('Control') alpha=&alpha;
  estimate "Effect of 2SD of VO2maxPre:";
  estimate " Control 2SD of VO2maxPre" Group*VO2maxPre 1 0
    /cl alpha=&alpha;
  estimate " Exptal 2SD of VO2maxPre" Group*VO2maxPre 0 1
    /cl alpha=&alpha;
  estimate " Expt-Cont 2SD of VO2maxPre" Group*VO2maxPre -1 1
    /cl alpha=&alpha;
  estimate "Effect of 2SD of X:";
  estimate " Control 2SD of X" Group*X 1 0
    /cl alpha=&alpha;
  estimate " Exptal 2SD of X" Group*X 0 1
    /cl alpha=&alpha;
  estimate " Expt-Cont 2SD of X" Group*X -1 1
    /cl alpha=&alpha;
  ods output covparms=cov;
  ods output estimates=est;
  ods output lsmeans=lsm;
  ods output diffs=lsmdiff;
  ods output solutionr=solr;
run;
ods listing;
ods select all;

The model statement includes the Group*Time interaction, in case there is a change in the mean between the two post-tests, but code is not included to estimate the mean changes at the two time points. The modifying effects of VO2maxPre and X are also assumed not to change between the two post-tests. (The next program includes extra code to estimate the mean changes in the two post-tests. It also allows for, and estimates, changes in the modifying effects.) The term random intercept generates the variance for pre-test error (adjusted downwards by inclusion of VO2maxPre as a moderator) plus any individual responses that occur equally in the control and experimental groups. Such individual responses represent stable random changes in the performance of individual subjects that would occur in the absence of any intervention. The random effect for xVarExp is the net individual responses: the responses resulting specifically from the intervention. The term repeated/group=Group generates different post-test residuals (errors of measurement) in the two groups. (The group= option with the repeated and random statements is one of the many unique features that set SAS above SPSS and R.) The lsmeans statement generates the mean changes in each group and the difference in the changes.

Download the simulation program for an RCT with difference in error.

Analysis of two post-tests with changes in individual responses between tests

When the two post-tests are far enough apart in time for the individual responses to change, the analysis has three aims. It estimates the individual responses in each post-test and the individual responses that are sustained or shared between the two post-tests. The code for rescaling VO2maxPre and X is the same, but two new dummy variables, xVarExpPost1 and xVarExpPost2, are defined in the data step generating the change scores:

data dat3;
  set dat2;
  xVarExpPost1=0; xVarExpPost2=0;
  VO2maxDelta=VO2maxPost1-VO2maxPre; Time="Post1";
  if Group="Exptal" then do; xVarExpPost1=1; xVarExpPost2=0; end;
  output;
  VO2maxDelta=VO2maxPost2-VO2maxPre; Time="Post2";
  if Group="Exptal" then do; xVarExpPost1=0; xVarExpPost2=1; end;
  output;
run;

Here is the code for Proc Mixed:

proc mixed data=dat3 covtest cl alpha=&alpha nobound;
  class Group Time;
  model VO2maxDelta=Group Group*Time Group*Time*VO2maxPre Group*Time*X
    /ddfm=sat;
  random int
    /subject=SubjectID s cl alpha=&alpha;
  random xVarExpPost1 xVarExpPost2
    /subject=SubjectID s cl alpha=&alpha type=un;
  repeated
    /group=Time;
  lsmeans Group
    /diff=control('Control') alpha=&alpha;
  estimate "Least-squares mean changes:";
  estimate "Control @ Post1" Group 1 0 Group*Time 1 0 0 0
    /cl alpha=&alpha;
  estimate "Exptal @ Post1" Group 0 1 Group*Time 0 0 1 0
    /cl alpha=&alpha;
  estimate "Exptal-Control" Group -1 1 Group*Time -1 0 1 0
    /cl alpha=&alpha;
  estimate "";
  estimate "Control @ Post2" Group 1 0 Group*Time 0 1 0 0
    /cl alpha=&alpha;
  estimate "Exptal @ Post2" Group 0 1 Group*Time 0 0 0 1
    /cl alpha=&alpha;
  estimate "Exptal-Control" Group -1 1 Group*Time 0 -1 0 1
    /cl alpha=&alpha;
  estimate "";
  estimate "Effect of 2SD of VO2maxPre @ Post1:";
  estimate " Control 2SD of VO2maxPre" Group*Time*VO2maxPre 1 0 0 0
    /cl alpha=&alpha;
  estimate " Exptal 2SD of VO2maxPre" Group*Time*VO2maxPre 0 0 1 0
    /cl alpha=&alpha;
  estimate " Exp-Con 2SD of VO2maxPre" Group*Time*VO2maxPre -1 0 1 0
    /cl alpha=&alpha;
  estimate "Effect of 2SD of X @ Post1:";
  estimate " Control 2SD of X" Group*Time*X 1 0 0 0
    /cl alpha=&alpha;
  estimate " Exptal 2SD of X" Group*Time*X 0 0 1 0
    /cl alpha=&alpha;
  estimate " Exp-Con 2SD of X" Group*Time*X -1 0 1 0
    /cl alpha=&alpha;
  ods output covparms=cov;
  ods output estimates=est;
  ods output lsmeans=lsm;
  ods output diffs=lsmdiff;
  ods output solutionr=solr;
run;

The model statement includes interactions of the covariates VO2maxPre and X with Time to allow for different modifying effects in the two post-tests. The estimate statements show estimation of the modifying effects only in the first post-test. Similar code can be written to estimate the modification in the second post-test and, if required, to compare the modification in the two post-tests. The random statements now separate the intercept and the dummy variables. The dummy variables require the unstructured covariance matrix (type=un), which allows them to be correlated. It also allows their common variance (the sustained or shared individual responses) to be estimated as a covariance. As before, random intercept generates the variance for adjusted pre-test error plus any individual responses that occur equally in the control and experimental groups. The variances defined by xVarExpPost1 and xVarExpPost2 are the net individual responses due to the treatment in the first and second post-tests. The repeated statement now generates different post-test residuals (errors of measurement) in the first and second post-tests, which are assumed to be the same in the control and experimental groups. Finally, the changes in the means in the first and second post-tests are obtained with estimate statements. It is possible but more awkward to use lsmeans/diff statements to generate these. An lsmestimate statement can also be used (not shown).

To estimate the one-time-only contribution to these individual responses and their confidence limits, a third dummy variable, xVarExp, is needed. It has values of 1 for both post-tests in the experimental group and 0 for both post-tests in the control group. The above two random statements are then replaced with the following single random statement:

random int xVarExpPost1 xVarExp xVarExpPost2
  /subject=SubjectID s cl alpha=&alpha;

The type=un option is no longer required, because all three dummies now specify independent random effects.

Download the simulation program for an RCT with change in individual responses.

Analysis of four post-tests with changes in individual responses & errors between tests

Repeats of the post-tests are required to account for and estimate differences and changes in individual responses and post-test errors of measurement. There should be a short period between the first and repeated tests at Post1 and at Post2. Assume variables VO2maxPost1 and VO2maxPost2 are now replaced by VO2maxPost1a, VO2maxPost1b, VO2maxPost2a, and VO2maxPost2b. The following data step generates change scores from the pre-test for each of these post-tests, and a new variable Rep (standing for each repetition):

data dat3;
  set dat2;
  xVarExpPost1=0; xVarExpPost2=0;
  Time="Post1";
  if Group="Exptal" then do; xVarExpPost1=1; xVarExpPost2=0; end;
  Rep="a"; VO2maxDelta=VO2maxPost1a-VO2maxPre; output;
  Rep="b"; VO2maxDelta=VO2maxPost1b-VO2maxPre; output;
  Time="Post2";
  if Group="Exptal" then do; xVarExpPost1=0; xVarExpPost2=1; end;
  Rep="a"; VO2maxDelta=VO2maxPost2a-VO2maxPre; output;
  Rep="b"; VO2maxDelta=VO2maxPost2b-VO2maxPre; output;
run;

Here is the code for Proc Mixed:

proc mixed data=dat3 covtest cl alpha=&alpha nobound;
  class Group Time Rep;
  model VO2maxDelta=Group Group*Time Group*Time*Rep Group*Time*VO2maxPre Group*Time*X
    /ddfm=sat;
  random Time
    /subject=SubjectID s cl alpha=&alpha type=un;
  random xVarExpPost1 xVarExpPost2
    /subject=SubjectID s cl alpha=&alpha type=un;
  repeated
    /group=Group*Time;
  lsmeans Group
    /diff=control('Control') alpha=&alpha;
  estimate "Least-squares mean changes:";
  estimate "Control @ Post1" Group 1 0 Group*Time 1 0 0 0
    /cl alpha=&alpha;
  estimate "Exptal @ Post1" Group 0 1 Group*Time 0 0 1 0
    /cl alpha=&alpha;
  estimate "Exptal-Control" Group -1 1 Group*Time -1 0 1 0
    /cl alpha=&alpha;
  estimate "";
  estimate "Control @ Post2" Group 1 0 Group*Time 0 1 0 0
    /cl alpha=&alpha;
  estimate "Exptal @ Post2" Group 0 1 Group*Time 0 0 0 1
    /cl alpha=&alpha;
  estimate "Exptal-Control" Group -1 1 Group*Time 0 -1 0 1
    /cl alpha=&alpha;
  estimate "";
  estimate "Effect of 2SD of VO2maxPre @ Post1:";
  estimate " Control 2SD of VO2maxPre" Group*Time*VO2maxPre 1 0 0 0
    /cl alpha=&alpha;
  estimate " Exptal 2SD of VO2maxPre" Group*Time*VO2maxPre 0 0 1 0
    /cl alpha=&alpha;
  estimate " Expt-Cont 2SD of VO2maxPre" Group*Time*VO2maxPre -1 0 1 0
    /cl alpha=&alpha;
  estimate "Effect of 2SD of X @ Post1:";
  estimate " Control 2SD of X" Group*Time*X 1 0 0 0
    /cl alpha=&alpha;
  estimate " Exptal 2SD of X" Group*Time*X 0 0 1 0
    /cl alpha=&alpha;
  estimate " Expt-Cont 2SD of X" Group*Time*X -1 0 1 0
    /cl alpha=&alpha;
  ods output covparms=cov;
  ods output estimates=est;
  ods output lsmeans=lsm;
  ods output diffs=lsmdiff;
  ods output solutionr=solr;
run;

The fixed effect Group*Time*Rep allows for changes in the mean between repetitions, but the code for estimating such changes is not shown. The estimate statements average the changes across reps. The term random Time
with subject=SubjectID and type=un provides estimates of a between-subject variance at Post1 and at Post2, and their covariance. These represent the individual responses that occur equally in the control and experimental groups at Post1 and Post2, and the individual responses sustained (shared) between Post1 and Post2. Pre-test error, adjusted downwards by inclusion of the VO2maxPre moderator, contributes to each of these. The other random effect estimates the individual responses due solely to the treatment, as before. The repeated statement with group=Group*Time estimates separate errors of measurement in control and experimental groups at Post1 and Post2.

Download the simulation program for an RCT with differences and changes in errors and individual responses.

References

Hopkins WG (2015). Individual responses made easy. Journal of Applied Physiology 118, 1444-1446.

Acknowledgements: I thank Claude Bouchard for the opportunity to participate in the symposium on individual responses and the Pennington Biomedical Research Center for funding travel and accommodation to Baton Rouge. Thanks also to Alan Batterham for reviewing the manuscript.

Published Jan 2018.