Solution file for Problems 3.5 and 4.3 (GO)
-------------------------------------------

Data: measurements of compressive strength of concrete cubes. Fifteen
cubes were produced and randomly assigned to five levels of
polypropylene fiber content, with three cubes per group, and fiber
content ranging from 0 to 1%, in equidistant steps of 0.25%.

The data constitute 5 independent samples with continuous outcome,
and the model immediately suggested is a one-way ANOVA.

Problem 3.5:
------------

We run the one-way ANOVA analysis, including checks of the assumptions
of normality and equal standard deviations in the groups.

MTB > WOpen "H:\VHM\VHM802\Data_csv\ch03pr5.csv";
SUBC>   FType;
SUBC>   CSV;
SUBC>   DecSep;
SUBC>   Period;
SUBC>   Field;
SUBC>   Comma;
SUBC>   TDelimiter;
SUBC>   DoubleQuote.
Retrieving worksheet from file: ‘H:\VHM\VHM802\Data_csv\ch03pr5.csv’
Worksheet was saved on 11/02/2011

MTB > OneWay;
SUBC>   Response 'strength';
SUBC>   Categorical 'fiber';
SUBC>   IType 0;
SUBC>   GMCI;
SUBC>   GIntPlot;
SUBC>   GFourpack;
SUBC>   TMethod;
SUBC>   TFactor;
SUBC>   TANOVA;
SUBC>   TSummary;
SUBC>   TMeans;
SUBC>   Nodefault.

One-way ANOVA: strength versus fiber

Method

Null hypothesis         All means are equal
Alternative hypothesis  At least one mean is different
Significance level      α = 0.05

Equal variances were assumed for the analysis.

Factor Information

Factor  Levels  Values
fiber        5  0.00, 0.25, 0.50, 0.75, 1.00

Analysis of Variance

Source  DF  Adj SS  Adj MS  F-Value  P-Value
fiber    4   6.263  1.5657    12.98    0.001
Error   10   1.207  0.1207
Total   14   7.469

Model Summary

       S    R-sq  R-sq(adj)  R-sq(pred)
0.347371  83.85%     77.38%      63.65%

Means

fiber  N    Mean   StDev       95% CI
0.00   3   7.467   0.306  ( 7.020,  7.914)
0.25   3   7.567   0.306  ( 7.120,  8.014)
0.50   3   6.867   0.551  ( 6.420,  7.314)
0.75   3   6.700   0.300  ( 6.253,  7.147)
1.00   3  5.7667  0.1528  (5.3198, 6.2135)

Pooled StDev = 0.347371

Interval Plot of strength vs fiber

Residual Plots for strength

Comments:
---------

The residual plot and the table of standard deviations do not indicate
serious problems with model assumptions. The standard deviations differ
somewhat between groups, but that could very well be a result of the
small within-group sample size only. Tests for homogeneity of variance
are clearly non-significant (not shown).

The ANOVA table shows a strongly significant difference between groups,
and the groups account for more than 80% of the variation in the data
(such a high R^2 is easier to achieve in a small dataset). The
estimated means and the graphical representation of confidence
intervals show that strength tends to decrease with increasing fiber
levels. The decline does not seem to be entirely linear because both
the first two means and the third and fourth means are fairly similar.

When the groups are defined by quantitative values (the fiber content),
it is often less obvious to carry out multiple comparisons, and one
would instead focus on describing the relationship with the
quantitative variable.

The interval plot and the table of means with 95% confidence intervals
were already included above.

Problem 4.3:
------------

Coefficients for polynomial contrasts (with equidistant x-values and
equal group sizes) are listed in Appendix D, Table D.6. As Minitab does
not allow easy manipulation of estimates, all calculations need to be
done by hand.
One alternative is to fit the linear, quadratic, cubic and 4th order
regression models directly. This will give tests for each of the
polynomial contrasts as the tests for the highest order polynomial
coefficient, although these tests will be based on the reduced model
instead of the full oneway model. The table of sequential sums of
squares for the fourth order model also gives the sum of squares for
each contrast.

MTB > Regress;
SUBC>   Response 'strength';
SUBC>   Nodefault;
SUBC>   Continuous 'fiber';
SUBC>   Terms fiber;
SUBC>   Constant;
SUBC>   Unstandardized;
SUBC>   TExpand;
SUBC>   Tmethod;
SUBC>   Tanova;
SUBC>   Tsummary;
SUBC>   Tcoefficients;
SUBC>   Tequation;
SUBC>   TDiagnostics 0.

Regression Analysis: strength versus fiber

Analysis of Variance

Source         DF  Seq SS  Contribution  Adj SS  Adj MS  F-Value  P-Value
Regression      1  5.4613        73.12%  5.4613  5.4613    35.36    0.000
  fiber         1  5.4613        73.12%  5.4613  5.4613    35.36    0.000
Error          13  2.0080        26.88%  2.0080  0.1545
  Lack-of-Fit   3  0.8013        10.73%  0.8013  0.2671     2.21    0.149
  Pure Error   10  1.2067        16.15%  1.2067  0.1207
Total          14  7.4693       100.00%

Model Summary

       S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
0.393016  73.12%     71.05%  2.63262      64.75%

Coefficients

Term        Coef  SE Coef       95% CI        T-Value  P-Value   VIF
Constant   7.727    0.176  ( 7.347,  8.106)     43.96    0.000
fiber     -1.707    0.287  (-2.327, -1.087)     -5.95    0.000  1.00

...

Comments:
---------

The expanded ANOVA table shows the lack-of-fit test to be
non-significant. This does not, however, preclude the possibility that
some additional polynomial terms could be of interest. Because the menu
only allows inclusion of polynomial terms up to order 3, we generate
the polynomial terms of order 2 to 4 manually and include them in
order. Note that at this point we're not terribly concerned about
collinearity, so we won't bother centring the fiber variable first.
Note that the breakdown of the sum of squares is in the column labelled
Seq SS (sequential sum of squares).

MTB > Name C4 'fiber2'
MTB > Let 'fiber2' = fiber**2
MTB > Name C5 'fiber3'
MTB > Let 'fiber3' = fiber**3
MTB > Name C6 'fiber4'
MTB > Let 'fiber4' = fiber**4

MTB > Regress;
SUBC>   Response 'strength';
SUBC>   Nodefault;
SUBC>   Continuous 'fiber' 'fiber2';
SUBC>   Terms fiber fiber2;
SUBC>   Constant;
SUBC>   Unstandardized;
SUBC>   TExpand;
SUBC>   Tmethod;
SUBC>   Tanova;
SUBC>   Tsummary;
SUBC>   Tcoefficients;
SUBC>   Tequation;
SUBC>   TDiagnostics 0.

Regression Analysis: strength versus fiber, fiber2

Analysis of Variance

Source         DF  Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
Regression      2  5.9651        79.86%  5.96514  2.98257    23.79    0.000
  fiber         1  5.4613        73.12%  0.00032  0.00032     0.00    0.961
  fiber2        1  0.5038         6.75%  0.50381  0.50381     4.02    0.068
Error          12  1.5042        20.14%  1.50419  0.12535
  Lack-of-Fit   2  0.2975         3.98%  0.29752  0.14876     1.23    0.332
  Pure Error   10  1.2067        16.15%  1.20667  0.12067
Total          14  7.4693       100.00%

Model Summary

       S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
0.354047  79.86%     76.51%  2.22323      70.24%

Coefficients

Term        Coef  SE Coef       95% CI       T-Value  P-Value    VIF
Constant   7.508    0.192  ( 7.088, 7.927)     39.03    0.000
fiber      0.046    0.912  (-1.940, 2.032)      0.05    0.961  12.43
fiber2    -1.752    0.874  (-3.657, 0.152)     -2.00    0.068  12.43

...

MTB > Regress;
SUBC>   Response 'strength';
SUBC>   Nodefault;
SUBC>   Continuous 'fiber' 'fiber2' 'fiber3';
SUBC>   Terms fiber fiber2 fiber3;
SUBC>   Constant;
SUBC>   Unstandardized;
SUBC>   TExpand;
SUBC>   Tmethod;
SUBC>   Tanova;
SUBC>   Tsummary;
SUBC>   Tcoefficients;
SUBC>   Tequation;
SUBC>   TDiagnostics 0.

Regression Analysis: strength versus fiber, fiber2, fiber3

Analysis of Variance

Source         DF   Seq SS  Contribution   Adj SS   Adj MS  F-Value  P-Value
Regression      3  5.96548        79.87%  5.96548  1.98849    14.54    0.000
  fiber         1  5.46133        73.12%  0.00059  0.00059     0.00    0.949
  fiber2        1  0.50381         6.75%  0.01858  0.01858     0.14    0.719
  fiber3        1  0.00033         0.00%  0.00033  0.00033     0.00    0.962
Error          11  1.50386        20.13%  1.50386  0.13671
  Lack-of-Fit   1  0.29719         3.98%  0.29719  0.29719     2.46    0.148
  Pure Error   10  1.20667        16.15%  1.20667  0.12067
Total          14  7.46933       100.00%

Model Summary

       S    R-sq  R-sq(adj)    PRESS  R-sq(pred)
0.369749  79.87%     74.38%  2.52835      66.15%

Coefficients

Term       Coef  SE Coef       95% CI        T-Value  P-Value     VIF
Constant  7.504    0.212  ( 7.038,  7.971)     35.41    0.000
fiber      0.14     2.16  ( -4.61,   4.89)      0.07    0.949   63.79
fiber2    -2.02     5.48  (-14.07,  10.04)     -0.37    0.719  447.43
fiber3     0.18     3.60  ( -7.75,   8.10)      0.05    0.962  200.69

...

MTB > Regress;
SUBC>   Response 'strength';
SUBC>   Nodefault;
SUBC>   Continuous 'fiber' 'fiber2' 'fiber3' 'fiber4';
SUBC>   Terms fiber fiber2 fiber3 fiber4;
SUBC>   Constant;
SUBC>   Unstandardized;
SUBC>   TExpand;
SUBC>   Tmethod;
SUBC>   Tanova;
SUBC>   Tsummary;
SUBC>   Tcoefficients;
SUBC>   Tequation;
SUBC>   TDiagnostics 0.

Regression Analysis: strength versus fiber, fiber2, fiber3, fiber4

Analysis of Variance

Source      DF   Seq SS  Contribution  Adj SS  Adj MS  F-Value  P-Value
Regression   4  6.26267        83.85%  6.2627  1.5657    12.98    0.001
  fiber      1  5.46133        73.12%  0.2472  0.2472     2.05    0.183
  fiber2     1  0.50381         6.75%  0.3157  0.3157     2.62    0.137
  fiber3     1  0.00033         0.00%  0.2964  0.2964     2.46    0.148
  fiber4     1  0.29719         3.98%  0.2972  0.2972     2.46    0.148
Error       10  1.20667        16.15%  1.2067  0.1207
Total       14  7.46933       100.00%

Model Summary

       S    R-sq  R-sq(adj)  PRESS  R-sq(pred)
0.347371  83.85%     77.38%  2.715      63.65%

Coefficients

Term       Coef  SE Coef      95% CI       T-Value  P-Value       VIF
Constant  7.467    0.201  (7.020, 7.914)     37.23    0.000
fiber      6.41     4.48  (-3.57, 16.39)      1.43    0.183    311.81
fiber2    -36.4     22.5  (-86.5,  13.7)     -1.62    0.137   8547.15
fiber3     56.4     36.0  (-23.8, 136.5)      1.57    0.148  22678.47
fiber4    -28.1     17.9  (-68.0,  11.8)     -1.57    0.148   5747.15

Comments:
---------

The table below summarizes the information that can be extracted from
these analyses. Each row is taken from the model in which that term is
the highest order term.

term       coefficient      SE      t      P     SS
------------------------------------------------------
linear         -1.7067  0.2870  -5.95  0.000  5.461
quadratic      -1.7524  0.8741  -2.00  0.068  0.504
cubic            0.178   3.600   0.05  0.962  0.000
quartic         -28.09   17.90  -1.57  0.148  0.297
------------------------------------------------------

From the table it is clear that the two highest order terms do not
improve the model substantially, and the choice is therefore between
the linear and quadratic models. The slope is negative, corresponding
to the already noted strong declining trend with fiber content, and the
sign of the quadratic term is negative as well, corresponding to a
downwards bending curve. The Fitted Line Plot menu can be used to get a
graph of the estimated quadratic curve. With P = 0.068 for the
quadratic term, so close to 0.05, the choice between the two models
should involve subject matter considerations (i.e., whether it is
preferable to have a linear or quadratic prediction equation).
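
As a rough cross-check of the comparison between the linear and
quadratic models, the sketch below (in Python, not part of the Minitab
session) evaluates both fitted equations at the five fiber levels and
recomputes the lack-of-fit sum of squares as n times the sum of squared
differences between group means and fitted values. It uses only the
rounded coefficients and group means printed in the output above, so
small discrepancies from the Minitab values (0.8013 and 0.2975) are
expected. All variable and function names are illustrative.

```python
# Rounded values copied from the Minitab output (names are illustrative).
levels = [0.00, 0.25, 0.50, 0.75, 1.00]       # fiber content
means = [7.467, 7.567, 6.867, 6.700, 5.7667]  # observed group means
n = 3                                         # cubes per group


def linear(x):
    """Fitted linear equation, rounded coefficients."""
    return 7.727 - 1.707 * x


def quadratic(x):
    """Fitted quadratic equation, rounded coefficients."""
    return 7.508 + 0.046 * x - 1.752 * x ** 2


def lack_of_fit_ss(model):
    """n * sum of squared (group mean - fitted value) over the levels."""
    return n * sum((m - model(x)) ** 2 for x, m in zip(levels, means))


lof_linear = lack_of_fit_ss(linear)
lof_quadratic = lack_of_fit_ss(quadratic)

print("fiber   mean  linear  quadratic")
for x, m in zip(levels, means):
    print(f"{x:5.2f}  {m:5.3f}  {linear(x):6.3f}  {quadratic(x):9.3f}")
print(f"Lack-of-fit SS, linear:    {lof_linear:.4f}")
print(f"Lack-of-fit SS, quadratic: {lof_quadratic:.4f}")
```

The printed fitted values give a numerical counterpart to the fitted
line plot: the quadratic curve follows the flat start and the steeper
decline at high fiber content more closely than the straight line does.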

Computing and evaluating the polynomial contrasts in the oneway model
(using either a calculator or other software) yields the following
table:

Contrast   Estimate      SE     SS  SS(%)      t   P(t)
------------------------------------------------------
linear       -4.267  0.6342  5.461   87.2  -6.73  0.000
quadratic    -1.533  0.7504  0.504    8.0  -2.04  0.068
cubic         0.033  0.6342  0.000    0.0   0.05  0.959
quartic      -2.633   1.678  0.297    4.7  -1.57  0.148
------------------------------------------------------

The differences in t-values and P-values relative to the
regression-based table are due to the use of different SEs and DFs, as
explained above: the contrasts are all tested against the error term of
the full oneway model (10 DF), whereas each regression test uses the
error term of its own reduced model. For the quartic term the two
models coincide, and so do the tests.

Note that one would usually not use the Scheffe test to adjust for
contrasts derived from the data when assessing polynomial contrasts.
One could argue that the idea of splitting the variation between groups
into orthogonal parts based on polynomial terms is universal and hence
not derived from the actual data.
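
The contrast table can be reproduced, up to rounding, with a short
script. The sketch below is in Python (not Minitab) and assumes the
usual integer orthogonal polynomial coefficients for five equally
spaced levels with equal group sizes; Table D.6 is not reproduced in
this file, so these coefficients are assumed to agree with it. The
group means, the error mean square (0.12067, 10 DF) and n = 3 are
taken from the output above, and all names are illustrative. P-values
are omitted because they require a t distribution function that is not
in the Python standard library.

```python
import math

# Values copied from the oneway output (names are illustrative).
means = [7.467, 7.567, 6.867, 6.700, 5.7667]  # group means, fiber 0 to 1
n = 3            # cubes per group
mse = 0.12067    # error mean square of the oneway model (10 DF)

# Assumed: standard orthogonal polynomial coefficients for 5 levels.
contrasts = {
    "linear":    [-2, -1,  0,  1, 2],
    "quadratic": [ 2, -1, -2, -1, 2],
    "cubic":     [-1,  2,  0, -2, 1],
    "quartic":   [ 1, -4,  6, -4, 1],
}


def evaluate(coefs):
    """Return estimate, SE, sum of squares and t for one contrast."""
    estimate = sum(c * m for c, m in zip(coefs, means))
    norm = sum(c * c for c in coefs)
    se = math.sqrt(mse * norm / n)
    ss = n * estimate ** 2 / norm
    return estimate, se, ss, estimate / se


results = {name: evaluate(coefs) for name, coefs in contrasts.items()}
ss_total = sum(r[2] for r in results.values())  # between-group SS

print("contrast   estimate      SE     SS  SS(%)      t")
for name, (est, se, ss, t) in results.items():
    pct = 100 * ss / ss_total
    print(f"{name:<9}  {est:8.3f}  {se:6.4f}  {ss:5.3f}  {pct:5.1f}  {t:5.2f}")
```

Because the four contrasts are mutually orthogonal, their sums of
squares add up to the between-group sum of squares (6.263 in the ANOVA
table), which is what the SS(%) column is based on.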