Solution file for additional exercise 10.7
------------------------------------------

Data on excretion of salsolinol measured over 4 consecutive days, for 
alcoholized persons, divided into two groups based on degree of addiction.
Group 1 contains 6 persons, and group 2 contains 8 persons. Note that
the datafile codes persons as 1-6 in group 1 and 1-8 in group 2; you
might want to create unique person id's although that is not strictly
required for analysis in Minitab.

These are repeated measures or longitudinal data because measurements are taken
on the same person (experimental unit), and in this case over time.
Notation:

y_ijk = salsolinol value for patient j in patient group i measured at day k,
   i = 1,2 (group),
   j = 1,..., n_i (patient; n_1=6, n_2=8),
   k = 1,2,3,4 (day),

The first step is to construct profile plots over time for each person.

MTB > WOpen "H:\VHM\VHM802\Data_csv\hs10_7.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\hs10_7.csv’
Worksheet was saved on 03/03/2011

MTB > Plot 'salsolinol'*'day';
SUBC>   Symbol 'person';
SUBC>   Connect 'person';
SUBC>   Panel 'group'.
Scatterplot of salsolinol vs day 

Comments:
---------
The plot seems to indicate more variability for higher values of the outcome
than for lower values. To confirm this first impression, we fit a split-plot
model with persons as replicates within groups, and look at the residuals.

MTB > GLM;
SUBC>   Response 'salsolinol';
SUBC>   Nodefault;
SUBC>   Categorical 'group' 'person' 'day';
SUBC>   Nest person(group);
SUBC>   Random person;
SUBC>   Terms group day person group*day;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TFactor;
SUBC>   TEMS;
SUBC>   TVariance;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK.
General Linear Model: salsolinol versus group, person, day 

Method
Factor coding  (-1, 0, +1)

Factor Information
Factor         Type    Levels  Values
group          Fixed        2  1, 2
person(group)  Random      14  1(1), 2(1), 3(1), 4(1), 5(1), 6(1), 1(2), 2(2), 3(2), 4(2), 5(2),
                               6(2), 7(2), 8(2)
day            Fixed        4  1, 2, 3, 4

Analysis of Variance
Source           DF   Seq SS  Contribution   Adj SS  Adj MS  F-Value  P-Value
  group           1    6.565         2.79%    6.565   6.565     1.95    0.188
  day             3   14.888         6.32%   11.723   3.908     0.85    0.476
  person(group)  12   40.329        17.12%   40.329   3.361     0.73    0.713
  group*day       3    8.029         3.41%    8.029   2.676     0.58    0.631
Error            36  165.692        70.36%  165.692   4.603
Total            55  235.503       100.00%

Model Summary
      S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
2.14536  29.64%      0.00%  398.019       0.00%

Fits and Diagnostics for Unusual Observations
                                                      Std    Del
Obs  salsolinol   Fit  SE Fit     95% CI     Resid  Resid  Resid       HI  Cook’s D    DFITS
 22        6.34  2.82    1.31  (0.16, 5.49)   3.52   2.07   2.18  0.37500      0.13  1.68713  R
 45        7.80  2.76    1.26  (0.21, 5.32)   5.04   2.90   3.26  0.34375      0.22  2.36129  R
 56        8.10  3.77    1.26  (1.22, 6.32)   4.33   2.49   2.70  0.34375      0.16  1.95545  R
R  Large residual

Expected Mean Squares, using Adjusted SS
                  Expected Mean Square for
   Source         Each Term
1  group          (5) + 4.0000 (3) + Q[1, 4]
2  day            (5) + Q[2, 4]
3  person(group)  (5) + 4.0000 (3)
4  group*day      (5) + Q[4]
5  Error          (5)

Variance Components, using Adjusted SS
Source           Variance  % of Total    StDev  % of Total
person(group)  -0.310441*       0.00%  0.00000       0.00%
Error            4.60255      100.00%  2.14536     100.00%
Total            4.60255               2.14536
* Value is negative, and is estimated by zero.
 
Residual Plots for salsolinol 

Comments:
---------
The plot of residuals vs fitted values shows a clear cone or fan shape (increasing
variation with increasing fitted values). Also, the residuals extend further on the 
positive than the negative side, and the histogram indicates a right-skewed distribution. 
We try a log-transformation (below), and the residuals look much better. It may of interest
to explore which transformations are best for these data, but we will here work only with
the logs. The boxcox command in Stata and the General Linear Model menu (below) are 
only for fixed effects models, but may nevertheless give an indication of a suitable 
transformation, and this analysis also suggests a log transformation, with an optimal 
lambda of -0.16 and hence fairly close to 0 and with 0 included in the 95% CI.

MTB > GLM;
SUBC>   Response 'salsolinol';
SUBC>   Nodefault;
SUBC>   Categorical 'group' 'person' 'day';
SUBC>   Nest person(group);
SUBC>   Terms group day person group*day;
SUBC>   Boxcox;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TCoefficients;
SUBC>   TFactor;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK.
General Linear Model: salsolinol versus group, person, day 

Box-Cox transformation
Rounded ?               0
Estimated ?             -0.155716
95% CI for ?            (-0.466216, 0.153784)
...

MTB > Name C5 'lnsals'
MTB > Let 'lnsals' = ln('salsolinol')
MTB > Plot 'lnsals'*'day';
SUBC>   Symbol 'person';
SUBC>   Connect 'person';
SUBC>   Panel 'group'.
Scatterplot of lnsals vs day 

Comments:
---------
The profiles now appear equally noisy across the range of lnsals values.
There are no obvious person effects, and nor are average effects of
group or day clearly seen. The results for the split-plot model are shown
further below; the model checks look much better now.

As our first analysis, we analyse the data separately for each day. This
requires the data structure to be changed from long to wide.

MTB > Unstack ('lnsals' 'group');
SUBC>   Subscripts 'day';
SUBC>   NewWS;
SUBC>   VarNames.

Results for: Worksheet 2

MTB > OneWay;
SUBC>   Response 'lnsals_1';
SUBC>   Categorical 'group_1';
SUBC>   IType 0;
SUBC>   GMCI;
SUBC>   TMethod;
SUBC>   TFactor;
SUBC>   TANOVA;
SUBC>   TSummary;
SUBC>   TMeans;
SUBC>   Nodefault.
One-way ANOVA: lnsals_1 versus group_1 

Method
Null hypothesis         All means are equal
Alternative hypothesis  At least one mean is different
Significance level      a = 0.05
Equal variances were assumed for the analysis.

Factor Information
Factor   Levels  Values
group_1       2  1, 2

Analysis of Variance
Source   DF   Adj SS  Adj MS  F-Value  P-Value
group_1   1   0.1738  0.1738     0.15    0.708
Error    12  14.2197  1.1850
Total    13  14.3935

Model Summary
      S   R-sq  R-sq(adj)  R-sq(pred)
1.08856  1.21%      0.00%       0.00%

Means
group_1  N    Mean  StDev       95% CI
1        6  -0.110  1.186  (-1.078, 0.859)
2        8   0.116  1.013  (-0.723, 0.954)

Pooled StDev = 1.08856

Results for time points 2-4, using similar commands:
---------------------------------------------------
One-way ANOVA: lnsals_2 versus group_1 

Analysis of Variance
Source   DF   Adj SS   Adj MS  F-Value  P-Value
group_1   1  0.06873  0.06873     0.09    0.765
Error    12  8.79155  0.73263
Total    13  8.86028

       S   R-sq  R-sq(adj)  R-sq(pred)
0.855938  0.78%      0.00%       0.00%

Means
group_1  N   Mean  StDev       95% CI
1        6  0.103  1.046  (-0.659, 0.864)
2        8  0.244  0.689  (-0.415, 0.904)
Pooled StDev = 0.855938

 
One-way ANOVA: lnsals_3 versus group_1 

Analysis of Variance
Source   DF  Adj SS  Adj MS  F-Value  P-Value
group_1   1  0.4618  0.4618     0.76    0.399
Error    12  7.2593  0.6049
Total    13  7.7211

       S   R-sq  R-sq(adj)  R-sq(pred)
0.777781  5.98%      0.00%       0.00%

Means
group_1  N   Mean  StDev       95% CI
1        6  0.356  0.705  (-0.336, 1.047)
2        8  0.723  0.826  ( 0.123, 1.322)
Pooled StDev = 0.777781

 
One-way ANOVA: lnsals_4 versus group_1 

Analysis of Variance
Source   DF  Adj SS  Adj MS  F-Value  P-Value
group_1   1   1.214  1.2142     1.79    0.206
Error    12   8.140  0.6783
Total    13   9.354

       S    R-sq  R-sq(adj)  R-sq(pred)
0.823606  12.98%      5.73%       0.00%

Means
group_1  N   Mean  StDev       95% CI
1        6  0.441  0.789  (-0.291, 1.174)
2        8  1.036  0.848  ( 0.402, 1.671)
Pooled StDev = 0.823606

Comments:
---------
The 4 separate analyses for each day show no significant differences between the
two groups. Only the last one (day 4) gets even close to something
interesting. There is no sign of problems with heterogeneous variances.
Instead of the one-way ANOVA, we could also have used 2-sample t-tests.

The profile plot (above) does not suggest any obvious response features.
We therefore drop this part here.

The split-plot model may be appropriate for the relatively short series of 4 measures
per person. It takes
  * persons = whole plots,
  * groups = whole plot factor,
  * dayly measurements = sub-plots,
  * day = sub-plot factor,
  * no blocks - the persons are replications within groups.
Another way of say this is that the data may be viewed as having hierarchical 
structure with measurements within persons. The statistical model is

y_ijk = mu + alpha_i + (AB)_ij + gamma_k + (alpha gamma)_ik + eps_ijk, 
      where AB_ij's are assumed i.i.d. N(0,sigma_AB^2), and
      where eps_ijk's are assumed i.i.d. N(0,sigma^2),

Results for: hs10_7.csv

MTB > Name C6 "SRES".
MTB > GLM;
SUBC>   Response 'lnsals';
SUBC>   Nodefault;
SUBC>   Categorical 'group' 'person' 'day';
SUBC>   Nest person(group);
SUBC>   Random person;
SUBC>   Terms group day person group*day;
SUBC>   Means group day group*day;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TFactor;
SUBC>   TEMS;
SUBC>   TVariance;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2;
SUBC>  GFOURPACK;
SUBC>  SResiduals 'SRES'.
General Linear Model: lnsals versus group, person, day 

Factor Information
Factor         Type    Levels  Values
group          Fixed        2  1, 2
person(group)  Random      14  1(1), 2(1), 3(1), 4(1), 5(1), 6(1), 1(2), 2(2), 3(2), 4(2), 5(2),
                               6(2), 7(2), 8(2)
day            Fixed        4  1, 2, 3, 4

Analysis of Variance
Source           DF   Seq SS  Contribution   Adj SS  Adj MS  F-Value  P-Value
  group           1   1.5136         3.33%   1.5136  1.5136     2.48    0.141
  day             3   5.0968        11.22%   4.6327  1.5442     1.79    0.167
  person(group)  12   7.3142        16.10%   7.3142  0.6095     0.71    0.736
  group*day       3   0.4050         0.89%   0.4050  0.1350     0.16    0.925
Error            36  31.0963        68.46%  31.0963  0.8638
Total            55  45.4258       100.00%

       S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
0.929401  31.54%      0.00%  76.1600       0.00%

Fits and Diagnostics for Unusual Observations
                                                      Std    Del
Obs  lnsals    Fit  SE Fit       95% CI      Resid  Resid  Resid       HI  Cook’s D    DFITS
  5   1.668  0.141   0.569  (-1.013, 1.296)  1.526   2.08   2.18  0.37500      0.13  1.69134  R
 22   1.847  0.296   0.569  (-0.858, 1.451)  1.551   2.11   2.22  0.37500      0.13  1.72182  R
 45   2.054  0.531   0.545  (-0.574, 1.636)  1.523   2.02   2.12  0.34375      0.11  1.53372  R
R  Large residual

Expected Mean Squares, using Adjusted SS
                  Expected Mean Square for
   Source         Each Term
1  group          (5) + 4.0000 (3) + Q[1, 4]
2  day            (5) + Q[2, 4]
3  person(group)  (5) + 4.0000 (3)
4  group*day      (5) + Q[4]
5  Error          (5)

Means
Term       Fitted Mean
group
  1            0.19750
  2            0.52971
day
  1            0.00297
  2            0.17350
  3            0.53913
  4            0.73883
group*day
  1 1         -0.10961
  1 2          0.10271
  1 3          0.35563
  1 4          0.44128
  2 1          0.11555
  2 2          0.24429
  2 3          0.72263
  2 4          1.03638

Variance Components, using Adjusted SS
Source            Variance  % of Total     StDev  % of Total
person(group)  -0.0635683*       0.00%  0.000000       0.00%
Error            0.863786      100.00%  0.929401     100.00%
Total            0.863786               0.929401
* Value is negative, and is estimated by zero.
 
Residual Plots for lnsals 

MTB > FacPlot 'lnsals';
SUBC>   Factors group day;
SUBC>   GInt;
SUBC>     Full.
Interaction Plot for lnsals 

MTB > NormTest 'SRES'.
Probability Plot of SRES 
The P-value for the Anderson-Darling test of normality is 0.211.

MTB > Name c9 "ByVar1" c10 "ByVar2" c11 "Mean1"
MTB > Statistics 'lnsals';
SUBC>   By 'group' 'person';
SUBC>   NoEmpty;
SUBC>   GValues 'ByVar1'-'ByVar2';
SUBC>   Mean 'Mean1'.

MTB > Name C10 "RESI".
MTB > OneWay;
SUBC>   Response 'Mean1';
SUBC>   Categorical 'ByVar1';
SUBC>   IType 0;
SUBC>   GMCI;
SUBC>   GFourpack;
SUBC>   TMethod;
SUBC>   TFactor;
SUBC>   TANOVA;
SUBC>   TSummary;
SUBC>   TMeans;
SUBC>   Residuals 'RESI';
SUBC>   Nodefault.
One-way ANOVA: Mean1 versus ByVar1 

Analysis of Variance
Source  DF  Adj SS  Adj MS  F-Value  P-Value
ByVar1   1  0.3784  0.3784     2.48    0.141
Error   12  1.8285  0.1524
Total   13  2.2069

       S    R-sq  R-sq(adj)  R-sq(pred)
0.390356  17.15%     10.24%       0.00%

Means
ByVar1  N   Mean  StDev       95% CI
1       6  0.198  0.273  (-0.150, 0.545)
2       8  0.530  0.456  ( 0.229, 0.830)
Pooled StDev = 0.390356
 
Residual Plots for Mean1 

MTB > NormTest 'RESI'.
Probability Plot of RESI 
The P-value for the Anderson-Darling test of normality is 0.661.

Comments:
---------
The residuals look nice both when plotted against the fitted values and in the
normal plot, and there are no alarmingly extreme values. This is true both for 
the sub-plot and the whole plot residuals, even if the latter seem to have somewhat
more variation in group 2 than in group 1. The ANOVA table shows most effects 
to be insignificant. The whole-plot variation seems to be very small (negative
estimated variance component which should therefore be set to 0), but we keep the
whole-plots in the model anyway. The group*day interaction seems to be very small -
presumably a disappointment for the researchers, because it would seem to be the
most interesting effect, depending though on the interpretation of days (days
after some event?). There are weak but clearly insignificant effects of groups
and days. The estimated means show group 2 to have higher values, and the values
to be increasing with days. The interaction plot shows a roughly linear
relationship with days, with almost parallel curves. To study this further, we 
fit a model with a linear day effect
 
MTB > GLM;
SUBC>   Response 'lnsals';
SUBC>   Nodefault;
SUBC>   Continuous 'day';
SUBC>   Categorical 'group' 'person';
SUBC>   Unstandardized;
SUBC>   Nest person(group);
SUBC>   Random person;
SUBC>   Terms group day person day*group;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TFactor;
SUBC>   TEMS;
SUBC>   TVariance;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2.
General Linear Model: lnsals versus day, group, person 

Factor Information
Factor         Type    Levels  Values
group          Fixed        2  1, 2
person(group)  Random      14  1(1), 2(1), 3(1), 4(1), 5(1), 6(1), 1(2), 2(2), 3(2), 4(2), 5(2),
                               6(2), 7(2), 8(2)

Analysis of Variance
Source           DF   Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
  group           1   1.5136         3.33%   0.0000  0.00001     0.00    0.998  x
  day             1   4.9850        10.97%   4.5404  4.54040     5.80    0.021
  person(group)  12   7.3142        16.10%   7.3142  0.60951     0.78    0.668
  day*group       1   0.3056         0.67%   0.3056  0.30565     0.39    0.536
Error            40  31.3075        68.92%  31.3075  0.78269
Total            55  45.4258       100.00%
x Not an exact F-test.

       S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
0.884697  31.08%      5.23%  63.4590       0.00%

Fits and Diagnostics for Unusual Observations
                                                      Std    Del
Obs  lnsals    Fit  SE Fit       95% CI      Resid  Resid  Resid        HI  Cook’s D    DFITS
  5   1.668  0.163   0.504  (-0.857, 1.182)  1.505   2.07   2.16  0.325000      0.13  1.50162  R
 22   1.847  0.296   0.450  (-0.613, 1.205)  1.551   2.04   2.12  0.258333      0.09  1.25307  R
 45   2.054  0.459   0.490  (-0.531, 1.448)  1.595   2.16   2.28  0.306250      0.13  1.51165  R
R  Large residual

Expected Mean Squares, using Adjusted SS
                  Expected Mean Square
   Source         for Each Term
1  group          (5) + 0.6667 (3) + Q[1]
2  day            (5) + Q[2, 4]
3  person(group)  (5) + 4.0000 (3)
4  day*group      (5) + Q[4]
5  Error          (5)

Variance Components, using Adjusted SS
Source            Variance  % of Total     StDev  % of Total
person(group)  -0.0432939*       0.00%  0.000000       0.00%
Error            0.782688      100.00%  0.884697     100.00%
Total            0.782688               0.884697
* Value is negative, and is estimated by zero.

Comment:
--------
In random effects models, two nested models cannot be compared as easily using an 
F-test as for usual linear models. If a statistical test is desired, one may use
the more advanced theory for tests based on the likelihood function (mixed command in 
Stata or proc Mixed in SAS). In this case, however, it is quite clear that the linear 
day effect is very appropriate. It explains 4.985/5.097=98% of the variation between
days, and most of the (very small) variation in day*group. There is no need of
non-parallel slopes here, so we remove them from the model.


MTB > GLM;
SUBC>   Response 'lnsals';
SUBC>   Nodefault;
SUBC>   Continuous 'day';
SUBC>   Categorical 'group' 'person';
SUBC>   Unstandardized;
SUBC>   Nest person(group);
SUBC>   Random person;
SUBC>   Terms group day person;
SUBC>   Means group;
SUBC>   TExpand;
SUBC>   TMethod;
SUBC>   TAnova;
SUBC>   TSummary;
SUBC>   TFactor;
SUBC>   TEMS;
SUBC>   TVariance;
SUBC>   TMeans;
SUBC>   TDiagnostics 0;
SUBC>   Rtype 2.
General Linear Model: lnsals versus day, group, person 

Factor Information
Factor         Type    Levels  Values
group          Fixed        2  1, 2
person(group)  Random      14  1(1), 2(1), 3(1), 4(1), 5(1), 6(1), 1(2), 2(2), 3(2), 4(2), 5(2),
                               6(2), 7(2), 8(2)

Analysis of Variance
Source           DF  Seq SS  Contribution  Adj SS  Adj MS  F-Value  P-Value
  group           1   1.514         3.33%   1.514  1.5136     2.48    0.141
  day             1   4.985        10.97%   4.985  4.9850     6.47    0.015
  person(group)  12   7.314        16.10%   7.314  0.6095     0.79    0.657
Error            41  31.613        69.59%  31.613  0.7711
Total            55  45.426       100.00%

       S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
0.878096  30.41%      6.64%  59.8399       0.00%

Fits and Diagnostics for Unusual Observations
                                                      Std    Del
Obs  lnsals    Fit  SE Fit       95% CI      Resid  Resid  Resid        HI  Cook’s D    DFITS
  5   1.668  0.048   0.466  (-0.894, 0.990)  1.620   2.18   2.29  0.282143      0.12  1.43345  R
 22   1.847  0.258   0.442  (-0.635, 1.151)  1.589   2.09   2.19  0.253571      0.10  1.27622  R
 45   2.054  0.545   0.466  (-0.397, 1.487)  1.509   2.03   2.11  0.282143      0.11  1.32463  R
 56   2.092  0.545   0.466  (-0.397, 1.487)  1.547   2.08   2.17  0.282143      0.11  1.36153  R
R  Large residual

Expected Mean Squares, using Adjusted SS
                  Expected Mean Square
   Source         for Each Term
1  group          (4) + 4.0000 (3) + Q[1]
2  day            (4) + Q[2]
3  person(group)  (4) + 4.0000 (3)
4  Error          (4)

Means

Term   Fitted Mean
group
  1       0.197503
  2       0.529714

           Data
Covariate  Mean  StDev
day        2.50   1.13

Variance Components, using Adjusted SS
Source            Variance  % of Total     StDev  % of Total
person(group)  -0.0403851*       0.00%  0.000000       0.00%
Error            0.771053      100.00%  0.878096     100.00%
Total            0.771053               0.878096
* Value is negative, and is estimated by zero.

Comments:
---------
The resulting model has a significant regression coefficient for days:
the ln-salsolinol levels increase by 0.27 units per day, or equivalently it can
be said that the salsolinol levels increase by a factor of exp(0.27)=1.3 per
day. There is perhaps an indication of higher levels in group 2 than in group 1,
but it is non-significant (P=0.14).

One assumption of the split-plot analysis that cannot easily be checked in
Minitab is that the correlation within persons is independent of time. As the
person-variability is estimated as very low, it means that observations on
the same person are essentially independent. A repeated measures ANOVA or a
full mixed model analysis is required to assess the assumption, see the
SAS-program hs10_7.sas or the Stata do-file hs10_7.do.