Solution file for Problems 3.5 and 4.3 (GO)
-------------------------------------------

Data: measurements of compressive strength of concrete cubes. Fifteen cubes 
were produced and randomly assigned to five levels of polypropylene fiber 
content, with three cubes per group, and fiber content ranging from 0 to 1%, 
in equidistant steps of 0.25%.

The data constitute 5 independent samples with continuous outcome, and 
the model immediately suggested is a one-way ANOVA.

Problem 3.5:
------------
We run the one-way ANOVA, including checks of the assumptions of
normality and equal standard deviations in the groups.

MTB > WOpen "H:\VHM\VHM802\Data_csv\ch03pr5.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\ch03pr5.csv’
Worksheet was saved on 11/02/2011

MTB > OneWay;
SUBC>   Response 'strength';
SUBC>   Categorical 'fiber';
SUBC>   IType 0;
SUBC>   GMCI;
SUBC>   GIntPlot;
SUBC>   GFourpack;
SUBC>   TMethod;
SUBC>   TFactor;
SUBC>   TANOVA;
SUBC>   TSummary;
SUBC>   TMeans;
SUBC>   Nodefault.
One-way ANOVA: strength versus fiber 

Method
Null hypothesis         All means are equal
Alternative hypothesis  At least one mean is different
Significance level      α = 0.05
Equal variances were assumed for the analysis.

Factor Information
Factor  Levels  Values
fiber        5  0.00, 0.25, 0.50, 0.75, 1.00

Analysis of Variance
Source  DF  Adj SS  Adj MS  F-Value  P-Value
fiber    4   6.263  1.5657    12.98    0.001
Error   10   1.207  0.1207
Total   14   7.469

Model Summary
       S    R-sq  R-sq(adj)  R-sq(pred)
0.347371  83.85%     77.38%      63.65%

Means
fiber  N    Mean   StDev       95% CI
0.00   3   7.467   0.306  ( 7.020,  7.914)
0.25   3   7.567   0.306  ( 7.120,  8.014)
0.50   3   6.867   0.551  ( 6.420,  7.314)
0.75   3   6.700   0.300  ( 6.253,  7.147)
1.00   3  5.7667  0.1528  (5.3198, 6.2135)
Pooled StDev = 0.347371
 
Interval Plot of strength vs fiber 
Residual Plots for strength 

Comments:
---------
The residual plots and the table of standard deviations do not indicate
serious problems with the model assumptions. The standard deviations
differ somewhat between groups, but that could very well be due simply
to the small within-group sample size. Tests for homogeneity of variance
are clearly non-significant (not shown).

The ANOVA table shows a strongly significant difference between groups,
and the groups account for more than 80% of the variation in the data
(such a high R^2 is easier to achieve in a small dataset).
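
Because the output prints the group means and standard deviations, the
ANOVA table can be checked without the raw data. The following is a
minimal Python sketch (Python is not part of the original Minitab
solution) that rebuilds the sums of squares, the F-value and R-sq from
the rounded values in the Means table. Small differences from the
Minitab output are due to that rounding.

```python
# Group summaries copied from the Means table (n = 3 cubes per group).
means = [7.467, 7.567, 6.867, 6.700, 5.7667]
sds = [0.306, 0.306, 0.551, 0.300, 0.1528]
n = 3
k = len(means)

# With equal group sizes the grand mean is the mean of the group means.
grand_mean = sum(means) / k

# Between-group and within-group sums of squares.
ss_between = n * sum((m - grand_mean) ** 2 for m in means)
ss_within = (n - 1) * sum(s ** 2 for s in sds)

df_between = k - 1          # 4
df_within = k * (n - 1)     # 10

f_value = (ss_between / df_between) / (ss_within / df_within)
r_squared = ss_between / (ss_between + ss_within)

print(f"SS(fiber) = {ss_between:.3f}, SS(error) = {ss_within:.3f}")
print(f"F = {f_value:.2f}, R-sq = {100 * r_squared:.1f}%")
```

The printed values should agree with the Analysis of Variance and Model
Summary tables to about two decimals.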

The estimated means and the graphical representation of confidence
intervals show that strength tends to decrease with increasing fiber
levels. The decline does not seem to be entirely linear because both the
first two means and the third and fourth means are fairly similar. When
the groups are defined by quantitative values (the fiber content), it is
often less natural to carry out multiple comparisons, and one would
instead focus on describing the relationship with the quantitative
variable. The interval plot and the table of means with 95% confidence
intervals were already included above.
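
The confidence intervals in the Means table all have the same width
because they are based on the pooled standard deviation. As a rough
check, the sketch below recomputes them in Python. The critical value
2.228 (97.5% point of the t distribution with 10 degrees of freedom) is
taken from a standard t table and is not printed in the Minitab output.

```python
import math

# Group means from the Means table, keyed by fiber content (%).
means = {0.00: 7.467, 0.25: 7.567, 0.50: 6.867, 0.75: 6.700, 1.00: 5.7667}
n = 3
pooled_sd = 0.347371   # "Pooled StDev" from the output
t_crit = 2.228         # t(0.975; 10 df), from a t table (assumption)

# Every group has n = 3, so the half-width is the same for all groups.
half_width = t_crit * pooled_sd / math.sqrt(n)

intervals = {x: (m - half_width, m + half_width) for x, m in means.items()}
for x, (lo, hi) in intervals.items():
    print(f"fiber {x:4.2f}: ({lo:.3f}, {hi:.3f})")
```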


Problem 4.3:
------------
Coefficients for polynomial contrasts (with equidistant x-values and
equal group sizes) are listed in Appendix D, Table D.6. As Minitab does
not allow easy manipulation of estimates, the contrast calculations need
to be done by hand. One alternative is to fit the linear, quadratic,
cubic and fourth-order regression models directly. In each fit, the test
for the highest-order polynomial coefficient is a test of the
corresponding polynomial contrast, although it is based on the error
term of the reduced (polynomial) model instead of the full one-way
model. The table of sequential sums of squares for the fourth-order
model also gives the sum of squares for each contrast.
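
The reason the sums of squares split up cleanly is that the polynomial
contrasts are mutually orthogonal. The Python sketch below checks this
for the standard orthogonal polynomial coefficients for five equally
spaced levels. That Table D.6 lists exactly these values is an
assumption; the table itself is not reproduced here.

```python
from itertools import combinations

# Standard orthogonal polynomial coefficients, 5 equidistant levels
# (assumed to match Appendix D, Table D.6).
contrasts = {
    "linear":    [-2, -1,  0,  1, 2],
    "quadratic": [ 2, -1, -2, -1, 2],
    "cubic":     [-1,  2,  0, -2, 1],
    "quartic":   [ 1, -4,  6, -4, 1],
}

# A contrast has coefficients summing to zero.
sums = {name: sum(c) for name, c in contrasts.items()}

# With equal group sizes, two contrasts are orthogonal when the
# dot product of their coefficient vectors is zero.
dot_products = {
    (a, b): sum(x * y for x, y in zip(contrasts[a], contrasts[b]))
    for a, b in combinations(contrasts, 2)
}

print(sums)
print(dot_products)
```

All four sums and all six dot products are zero, so the four contrasts
together account for the full 4 DF between-group sum of squares.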

MTB > Regress;
SUBC>   Response 'strength';
SUBC>   Nodefault;
SUBC>   Continuous 'fiber';
SUBC>   Terms fiber;
SUBC>   Constant;
SUBC>   Unstandardized;
SUBC>   TExpand;
SUBC>   Tmethod;
SUBC>   Tanova;
SUBC>   Tsummary;
SUBC>   Tcoefficients;
SUBC>   Tequation;
SUBC>   TDiagnostics 0.
Regression Analysis: strength versus fiber 

Analysis of Variance
Source         DF  Seq SS  Contribution  Adj SS  Adj MS  F-Value  P-Value
Regression      1  5.4613        73.12%  5.4613  5.4613    35.36    0.000
  fiber         1  5.4613        73.12%  5.4613  5.4613    35.36    0.000
Error          13  2.0080        26.88%  2.0080  0.1545
  Lack-of-Fit   3  0.8013        10.73%  0.8013  0.2671     2.21    0.149
  Pure Error   10  1.2067        16.15%  1.2067  0.1207
Total          14  7.4693       100.00%

Model Summary
       S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
0.393016  73.12%     71.05%  2.63262      64.75%

Coefficients
Term        Coef  SE Coef       95% CI       T-Value  P-Value   VIF
Constant   7.727    0.176  ( 7.347,  8.106)    43.96    0.000
fiber     -1.707    0.287  (-2.327, -1.087)    -5.95    0.000  1.00
...

Comments:
---------
The expanded ANOVA table shows the lack-of-fit test to be
non-significant. This does not, however, rule out that some additional
polynomial terms could be of interest. Because the menu only allows
inclusion of polynomial terms up to order 3, we generate the
higher-order terms manually and include them in order. At this point we
are not terribly concerned about collinearity, so we do not bother
centring the fiber variable first. Note that the breakdown of the sum of
squares is in the column labelled Seq SS (sequential sum of squares).
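
For reference, the lack-of-fit test in the linear model splits the
error sum of squares into pure error (from the one-way model) and the
remainder. A small Python sketch of that arithmetic, using the values
printed in the expanded ANOVA table; the 5% critical value 3.71 for
F(3, 10) comes from a standard F table and is not part of the output.

```python
# Values copied from the ANOVA table of the linear regression.
ss_error, df_error = 2.0080, 13   # residual of the linear model
ss_pure, df_pure = 1.2067, 10     # pure error (one-way ANOVA error)

# Lack of fit is what the straight line leaves unexplained beyond
# the pure within-group variation.
ss_lof = ss_error - ss_pure
df_lof = df_error - df_pure

f_lof = (ss_lof / df_lof) / (ss_pure / df_pure)
f_crit = 3.71  # F(0.95; 3, 10), from an F table (assumption)

print(f"SS(lack of fit) = {ss_lof:.4f} on {df_lof} DF")
print(f"F = {f_lof:.2f}, significant at 5%: {f_lof > f_crit}")
```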

MTB > Name C4 'fiber2'
MTB > Let 'fiber2' = fiber**2
MTB > Name C5 'fiber3'
MTB > Let 'fiber3' = fiber**3
MTB > Name C6 'fiber4'
MTB > Let 'fiber4' = fiber**4
MTB > Regress;
SUBC>   Response 'strength';
SUBC>   Nodefault;
SUBC>   Continuous 'fiber' 'fiber2';
SUBC>   Terms fiber fiber2;
SUBC>   Constant;
SUBC>   Unstandardized;
SUBC>   TExpand;
SUBC>   Tmethod;
SUBC>   Tanova;
SUBC>   Tsummary;
SUBC>   Tcoefficients;
SUBC>   Tequation;
SUBC>   TDiagnostics 0.
Regression Analysis: strength versus fiber, fiber2 

Analysis of Variance
Source         DF  Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
Regression      2  5.9651        79.86%  5.96514  2.98257    23.79    0.000
  fiber         1  5.4613        73.12%  0.00032  0.00032     0.00    0.961
  fiber2        1  0.5038         6.75%  0.50381  0.50381     4.02    0.068
Error          12  1.5042        20.14%  1.50419  0.12535
  Lack-of-Fit   2  0.2975         3.98%  0.29752  0.14876     1.23    0.332
  Pure Error   10  1.2067        16.15%  1.20667  0.12067
Total          14  7.4693       100.00%

Model Summary
       S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
0.354047  79.86%     76.51%  2.22323      70.24%

Coefficients
Term        Coef  SE Coef       95% CI      T-Value  P-Value    VIF
Constant   7.508    0.192  ( 7.088, 7.927)    39.03    0.000
fiber      0.046    0.912  (-1.940, 2.032)     0.05    0.961  12.43
fiber2    -1.752    0.874  (-3.657, 0.152)    -2.00    0.068  12.43
...

MTB > Regress;
SUBC>   Response 'strength';
SUBC>   Nodefault;
SUBC>   Continuous 'fiber' 'fiber2' 'fiber3';
SUBC>   Terms fiber fiber2 fiber3;
SUBC>   Constant;
SUBC>   Unstandardized;
SUBC>   TExpand;
SUBC>   Tmethod;
SUBC>   Tanova;
SUBC>   Tsummary;
SUBC>   Tcoefficients;
SUBC>   Tequation;
SUBC>   TDiagnostics 0.
Regression Analysis: strength versus fiber, fiber2, fiber3 

Analysis of Variance
Source         DF   Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
Regression      3  5.96548        79.87%  5.96548  1.98849    14.54    0.000
  fiber         1  5.46133        73.12%  0.00059  0.00059     0.00    0.949
  fiber2        1  0.50381         6.75%  0.01858  0.01858     0.14    0.719
  fiber3        1  0.00033         0.00%  0.00033  0.00033     0.00    0.962
Error          11  1.50386        20.13%  1.50386  0.13671
  Lack-of-Fit   1  0.29719         3.98%  0.29719  0.29719     2.46    0.148
  Pure Error   10  1.20667        16.15%  1.20667  0.12067
Total          14  7.46933       100.00%

Model Summary
       S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
0.369749  79.87%     74.38%  2.52835      66.15%

Coefficients
Term       Coef  SE Coef       95% CI      T-Value  P-Value     VIF
Constant  7.504    0.212  ( 7.038, 7.971)    35.41    0.000
fiber      0.14     2.16  ( -4.61,  4.89)     0.07    0.949   63.79
fiber2    -2.02     5.48  (-14.07, 10.04)    -0.37    0.719  447.43
fiber3     0.18     3.60  ( -7.75,  8.10)     0.05    0.962  200.69
...

MTB > Regress;
SUBC>   Response 'strength';
SUBC>   Nodefault;
SUBC>   Continuous 'fiber' 'fiber2' 'fiber3' 'fiber4';
SUBC>   Terms fiber fiber2 fiber3 fiber4;
SUBC>   Constant;
SUBC>   Unstandardized;
SUBC>   TExpand;
SUBC>   Tmethod;
SUBC>   Tanova;
SUBC>   Tsummary;
SUBC>   Tcoefficients;
SUBC>   Tequation;
SUBC>   TDiagnostics 0.
Regression Analysis: strength versus fiber, fiber2, fiber3, fiber4 

Analysis of Variance
Source      DF   Seq SS  Contribution  Adj SS  Adj MS  F-Value  P-Value
Regression   4  6.26267        83.85%  6.2627  1.5657    12.98    0.001
  fiber      1  5.46133        73.12%  0.2472  0.2472     2.05    0.183
  fiber2     1  0.50381         6.75%  0.3157  0.3157     2.62    0.137
  fiber3     1  0.00033         0.00%  0.2964  0.2964     2.46    0.148
  fiber4     1  0.29719         3.98%  0.2972  0.2972     2.46    0.148
Error       10  1.20667        16.15%  1.2067  0.1207
Total       14  7.46933       100.00%

Model Summary
       S    R-sq  R-sq(adj)  PRESS  R-sq(pred)
0.347371  83.85%     77.38%  2.715      63.65%

Coefficients
Term       Coef  SE Coef      95% CI      T-Value  P-Value       VIF
Constant  7.467    0.201  (7.020, 7.914)    37.23    0.000
fiber      6.41     4.48  (-3.57, 16.39)     1.43    0.183    311.81
fiber2    -36.4     22.5  (-86.5,  13.7)    -1.62    0.137   8547.15
fiber3     56.4     36.0  (-23.8, 136.5)     1.57    0.148  22678.47
fiber4    -28.1     17.9  (-68.0,  11.8)    -1.57    0.148   5747.15

Comments:
---------
The table below summarizes the information that can be extracted from
these analyses.

term        coefficient     SE      t     P       SS 
------------------------------------------------------
linear          -1.7067   0.2870  -5.95  0.000   5.461
quadratic       -1.7524   0.8741  -2.00  0.068   0.504  
cubic            0.178    3.600    0.05  0.962   0.000
quartic        -28.09    17.90    -1.57  0.148   0.297
------------------------------------------------------

From the table it is clear that the two highest-order terms do not
improve the model substantially, and the choice is therefore between the
linear and the quadratic model. The slope is negative, corresponding to
the strong declining trend with fiber content already noted, and the
sign of the quadratic coefficient is negative as well, corresponding to
a downward-bending curve. The Fitted Line Plot menu can be used to get a
graph of the estimated quadratic curve. With the P-value for the
quadratic term (0.068) so close to 0.05, the choice between the two
models should involve subject matter considerations (i.e., whether it is
preferable to have a linear or quadratic prediction equation).
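
To see what the two candidate models imply, the fitted values can be
compared with the observed group means. The Python sketch below uses
the rounded coefficients printed in the regression output, so the
lack-of-fit sums of squares it reports match the Minitab values only
approximately.

```python
levels = [0.00, 0.25, 0.50, 0.75, 1.00]
means = [7.467, 7.567, 6.867, 6.700, 5.7667]
n = 3

def linear_fit(x):
    # Coefficients from the first-order regression output.
    return 7.727 - 1.707 * x

def quadratic_fit(x):
    # Coefficients from the second-order regression output.
    return 7.508 + 0.046 * x - 1.752 * x ** 2

def lack_of_fit_ss(fit):
    # n times the squared distances between group means and the curve.
    return n * sum((m - fit(x)) ** 2 for x, m in zip(levels, means))

lof_linear = lack_of_fit_ss(linear_fit)
lof_quadratic = lack_of_fit_ss(quadratic_fit)

print("fiber   mean  linear  quadratic")
for x, m in zip(levels, means):
    print(f"{x:5.2f}  {m:5.3f}  {linear_fit(x):6.3f}  {quadratic_fit(x):9.3f}")
print(f"lack-of-fit SS: linear {lof_linear:.3f}, quadratic {lof_quadratic:.3f}")
```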

Computing and evaluating the polynomial contrasts in the one-way model
(using either a calculator or other software) yields the following
table:

Contrast   Estimate     SE       SS    SS(%)     t     P(t)
------------------------------------------------------------
linear       -4.267   0.6342   5.461   87.2   -6.73   0.000
quadratic    -1.533   0.7504   0.504    8.0   -2.04   0.068
cubic         0.033   0.6342   0.000    0.0    0.05   0.959
quartic      -2.633   1.678    0.297    4.7   -1.57   0.148
------------------------------------------------------------
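
One way to carry out this calculation with other software is sketched
below in Python. It assumes the standard orthogonal polynomial
coefficients for five equidistant levels (presumed to be those of Table
D.6) and uses the rounded group means and the error mean square 0.12067
from the one-way output. P-values are omitted; instead |t| is compared
with 2.228, the two-sided 5% critical value for 10 DF from a t table.

```python
import math

means = [7.467, 7.567, 6.867, 6.700, 5.7667]  # group means, n = 3 each
n = 3
mse = 0.12067   # error mean square of the one-way model (10 DF)
t_crit = 2.228  # t(0.975; 10 DF), from a t table (assumption)

# Assumed to match Appendix D, Table D.6.
contrasts = {
    "linear":    [-2, -1,  0,  1, 2],
    "quadratic": [ 2, -1, -2, -1, 2],
    "cubic":     [-1,  2,  0, -2, 1],
    "quartic":   [ 1, -4,  6, -4, 1],
}

results = {}
for name, c in contrasts.items():
    sum_c2 = sum(ci ** 2 for ci in c)
    estimate = sum(ci * m for ci, m in zip(c, means))
    se = math.sqrt(mse * sum_c2 / n)   # SE of the contrast estimate
    ss = n * estimate ** 2 / sum_c2    # 1-DF sum of squares
    results[name] = {"estimate": estimate, "se": se, "ss": ss,
                     "t": estimate / se}

ss_total = sum(r["ss"] for r in results.values())
for name, r in results.items():
    print(f"{name:10s} {r['estimate']:7.3f} {r['se']:7.4f} "
          f"{r['ss']:6.3f} {100 * r['ss'] / ss_total:5.1f} "
          f"{r['t']:6.2f} {abs(r['t']) > t_crit}")
```

The SS(%) column is each contrast's share of the between-group sum of
squares, which the four orthogonal contrasts add up to.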

The difference in t-values and P-values is due to use of different
SEs and DFs, as explained above.
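
The relation between the two sets of t-values can be made explicit:
each contrast has the same sum of squares in both approaches, and |t|
is the square root of that sum of squares divided by the error mean
square in use. A short Python sketch with the values printed above
(signs are dropped, since only the magnitude follows from the SS):

```python
import math

# Sequential sums of squares from the fourth-order regression.
seq_ss = {"linear": 5.46133, "quadratic": 0.50381,
          "cubic": 0.00033, "quartic": 0.29719}

# Error mean squares of the successive polynomial (reduced) models,
# with 13, 12, 11 and 10 DF respectively.
mse_reduced = {"linear": 0.1545, "quadratic": 0.12535,
               "cubic": 0.13671, "quartic": 0.12067}

# Error mean square of the full one-way model (10 DF).
mse_full = 0.12067

t_regression = {k: math.sqrt(seq_ss[k] / mse_reduced[k]) for k in seq_ss}
t_oneway = {k: math.sqrt(seq_ss[k] / mse_full) for k in seq_ss}

for k in seq_ss:
    print(f"{k:10s} |t| regression = {t_regression[k]:.2f}, "
          f"|t| one-way = {t_oneway[k]:.2f}")
```

For the quartic term the two approaches coincide, because the
fourth-order polynomial model is the same as the one-way model.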

Note that one would usually not use the Scheffe test to adjust for
contrasts derived from the data when assessing polynomial contrasts. One
could argue that the idea of splitting the variation between groups
into orthogonal parts based on polynomial terms is universal and hence
not derived from the actual data.