. * do-file for lecture 9 of VHM 802, Winter 2016 . version 14 /* works also with version 13 */ . set more off . cd "h:\vhm\vhm802\data_csv" h:\vhm\vhm802\data_csv . . * Floral scent example . import delimited florallong.csv, clear (4 vars, 42 obs) . encode tx, gen(Tx) . anova time id per Tx Number of obs = 42 R-squared = 0.8970 Root MSE = 6.46827 Adj R-squared = 0.7778 Source | Partial SS df MS F Prob>F -----------+---------------------------------------------------- Model | 6923.9317 22 314.72417 7.52 0.0000 | id | 6135.3106 20 306.76553 7.33 0.0000 per | 779.00183 1 779.00183 18.62 0.0004 Tx | 3.1290194 1 3.1290194 0.07 0.7874 | Residual | 794.93171 19 41.838511 -----------+---------------------------------------------------- Total | 7718.8634 41 188.26496 . . * Example 13.12 . import delimited ch13ta4.csv, clear (9 vars, 54 obs) . xi: boxcox yield i.tx i.period i.cow i.tx _Itx_1-3 (naturally coded; _Itx_1 omitted) i.period _Iperiod_1-3 (naturally coded; _Iperiod_1 omitted) i.cow _Icow_1-18 (naturally coded; _Icow_1 omitted) Fitting comparison model Iteration 0: log likelihood = -385.98591 Iteration 1: log likelihood = -384.22867 Iteration 2: log likelihood = -384.22712 Iteration 3: log likelihood = -384.22712 Fitting full model Iteration 0: log likelihood = -314.69985 Iteration 1: log likelihood = -309.99214 Iteration 2: log likelihood = -309.52468 Iteration 3: log likelihood = -309.52451 Iteration 4: log likelihood = -309.52451 Number of obs = 54 LR chi2(21) = 149.41 Log likelihood = -309.52451 Prob > chi2 = 0.000 ------------------------------------------------------------------------------ yield | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- /theta | -.0033826 .3020123 -0.01 0.991 -.5953159 .5885506 ------------------------------------------------------------------------------ Estimates of scale-variant parameters ---------------------------- | Coef. -------------+-------------- Notrans | _Itx_2 | .1283452 _Itx_3 | .20625 _Iperiod_2 | -.1368166 _Iperiod_3 | -.3236702 _Icow_2 | .3283877 _Icow_3 | .2962175 _Icow_4 | .28772 _Icow_5 | .15384 _Icow_6 | .1675437 _Icow_7 | .179774 _Icow_8 | .1756407 _Icow_9 | .1328141 _Icow_10 | .1065849 _Icow_11 | .1235255 _Icow_12 | .0969087 _Icow_13 | .0610728 _Icow_14 | -.0698756 _Icow_15 | .0134546 _Icow_16 | -.0703826 _Icow_17 | -.1146751 _Icow_18 | -.0336909 _cons | 7.090896 -------------+-------------- /sigma | .0523218 ---------------------------- --------------------------------------------------------- Test Restricted LR statistic P-value H0: log likelihood chi2 Prob > chi2 --------------------------------------------------------- theta = -1 -314.64566 10.24 0.001 theta = 0 -309.52458 0.00 0.991 theta = 1 -314.69985 10.35 0.001 --------------------------------------------------------- . gen lnyield=ln(yield) . anova lnyield tx period cow Number of obs = 54 R-squared = 0.9372 Root MSE = .069653 Adj R-squared = 0.8959 Source | Partial SS df MS F Prob>F -----------+---------------------------------------------------- Model | 2.315333 21 .11025395 22.73 0.0000 | tx | .40999477 2 .20499738 42.25 0.0000 period | .99806797 2 .49903398 102.86 0.0000 cow | .90727032 17 .05336884 11.00 0.0000 | Residual | .15524921 32 .00485154 -----------+---------------------------------------------------- Total | 2.4705823 53 .04661476 . * splitting cow effects into squares and cows within squares . anova lnyield square tx period cow|square Number of obs = 54 R-squared = 0.9372 Root MSE = .069653 Adj R-squared = 0.8959 Source | Partial SS df MS F Prob>F -----------+---------------------------------------------------- Model | 2.315333 21 .11025395 22.73 0.0000 | square | .62302927 5 .12460585 25.68 0.0000 tx | .40999477 2 .20499738 42.25 0.0000 period | .99806797 2 .49903398 102.86 0.0000 cow|square | .28424105 12 .02368675 4.88 0.0002 | Residual | .15524921 32 .00485154 -----------+---------------------------------------------------- Total | 2.4705823 53 .04661476 . anova lnyield square tx period cow /* same model fit but wrong SS for squares */ Number of obs = 54 R-squared = 0.9372 Root MSE = .069653 Adj R-squared = 0.8959 Source | Partial SS df MS F Prob>F -----------+---------------------------------------------------- Model | 2.315333 21 .11025395 22.73 0.0000 | square | .10313425 5 .02062685 4.25 0.0045 tx | .40999477 2 .20499738 42.25 0.0000 period | .99806797 2 .49903398 102.86 0.0000 cow | .28424105 12 .02368675 4.88 0.0002 | Residual | .15524921 32 .00485154 -----------+---------------------------------------------------- Total | 2.4705823 53 .04661476 . anova lnyield tx period cow square /* square effects empty */ Number of obs = 54 R-squared = 0.9372 Root MSE = .069653 Adj R-squared = 0.8959 Source | Partial SS df MS F Prob>F -----------+---------------------------------------------------- Model | 2.315333 21 .11025395 22.73 0.0000 | tx | .40999477 2 .20499738 42.25 0.0000 period | .99806797 2 .49903398 102.86 0.0000 cow | .90727032 17 .05336884 11.00 0.0000 square | 0 0 | Residual | .15524921 32 .00485154 -----------+---------------------------------------------------- Total | 2.4705823 53 .04661476 . * period effects nested in squares . anova lnyield square tx period|square cow|square Number of obs = 54 R-squared = 0.9584 Root MSE = .06831 Adj R-squared = 0.8999 Source | Partial SS df MS F Prob>F --------------+---------------------------------------------------- Model | 2.3679239 31 .07638464 16.37 0.0000 | square | .62302927 5 .12460585 26.70 0.0000 tx | .40999477 2 .20499738 43.93 0.0000 period|square | 1.0506588 12 .0875549 18.76 0.0000 cow|square | .28424105 12 .02368675 5.08 0.0005 | Residual | .1026584 22 .00466629 --------------+---------------------------------------------------- Total | 2.4705823 53 .04661476 . anova lnyield square tx period##square cow|square Number of obs = 54 R-squared = 0.9584 Root MSE = .06831 Adj R-squared = 0.8999 Source | Partial SS df MS F Prob>F --------------+---------------------------------------------------- Model | 2.3679239 31 .07638464 16.37 0.0000 | square | .62302927 5 .12460585 26.70 0.0000 tx | .40999477 2 .20499738 43.93 0.0000 period | .99806797 2 .49903398 106.94 0.0000 period#square | .05259081 10 .00525908 1.13 0.3868 cow|square | .28424105 12 .02368675 5.08 0.0005 | Residual | .1026584 22 .00466629 --------------+---------------------------------------------------- Total | 2.4705823 53 .04661476 . . * Carton example . import delimited ch11ex2.csv, clear (4 vars, 400 obs) . keep if glue==1 & operator==1 (380 observations deleted) . anova strength machine Number of obs = 20 R-squared = 0.6296 Root MSE = 3.85669 Adj R-squared = 0.2963 Source | Partial SS df MS F Prob>F -----------+---------------------------------------------------- Model | 252.87771 9 28.097523 1.89 0.1679 | machine | 252.87771 9 28.097523 1.89 0.1679 | Residual | 148.74058 10 14.874058 -----------+---------------------------------------------------- Total | 401.61828 19 21.137804 . * likelihood-based analysis, with diagnostics . mixed strength || machine:, reml Performing EM optimization: Performing gradient-based optimization: Iteration 0: log restricted-likelihood = -56.967725 Iteration 1: log restricted-likelihood = -56.966358 Iteration 2: log restricted-likelihood = -56.966358 Computing standard errors: Mixed-effects REML regression Number of obs = 20 Group variable: machine Number of groups = 10 Obs per group: min = 2 avg = 2.0 max = 2 Wald chi2(0) = . Log restricted-likelihood = -56.966358 Prob > chi2 = . ------------------------------------------------------------------------------ strength | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _cons | 156.743 1.185275 132.24 0.000 154.4199 159.0661 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ machine: Identity | var(_cons) | 6.611732 7.410896 .7349005 59.48425 -----------------------------+------------------------------------------------ var(Residual) | 14.87406 6.651881 6.190996 35.73538 ------------------------------------------------------------------------------ LR test vs. linear model: chibar2(01) = 0.95 Prob >= chibar2 = 0.1645 . predict pred, fitted . predict stdres, rstandard . scatter stdres pred . qnorm stdres . predict pred_ref, reffects . bysort machine: generate within_n=_n . qnorm pred_ref if within_n==1 . swilk pred_ref if within_n==1 Shapiro-Wilk W test for normal data Variable | Obs W V z Prob>z -------------+------------------------------------------------------ pred_ref | 10 0.96555 0.531 -1.023 0.84677 . . * Lab example, dilution 1 . import delimited add9ex1_2.csv, clear (3 vars, 30 obs) . encode lab, gen(Lab) . anova phenol Lab if dilu==1 Number of obs = 10 R-squared = 0.9144 Root MSE = .371484 Adj R-squared = 0.8459 Source | Partial SS df MS F Prob>F -----------+---------------------------------------------------- Model | 7.3700002 4 1.8425001 13.35 0.0070 | Lab | 7.3700002 4 1.8425001 13.35 0.0070 | Residual | .69000011 5 .13800002 -----------+---------------------------------------------------- Total | 8.0600004 9 .8955556 . * likelihood-based analysis . mixed phenol || lab: if dilu==1, reml Performing EM optimization: Performing gradient-based optimization: Iteration 0: log restricted-likelihood = -10.192733 Iteration 1: log restricted-likelihood = -10.192733 Computing standard errors: Mixed-effects REML regression Number of obs = 10 Group variable: lab Number of groups = 5 Obs per group: min = 2 avg = 2.0 max = 2 Wald chi2(0) = . Log restricted-likelihood = -10.192733 Prob > chi2 = . ------------------------------------------------------------------------------ phenol | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _cons | 6.7 .4292435 15.61 0.000 5.858698 7.541302 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ lab: Identity | var(_cons) | .8522501 .6528823 .1898838 3.825129 -----------------------------+------------------------------------------------ var(Residual) | .138 .0872789 .0399514 .4766793 ------------------------------------------------------------------------------ LR test vs. linear model: chibar2(01) = 6.47 Prob >= chibar2 = 0.0055 . display 2*sqrt(2)*sqrt(.138) /* Repeatability */ 1.050714 . display 2*sqrt(2)*sqrt(.8522501+.138) /* Reproducibility */ 2.8146049 . . * Lab example, full data analysis . anova phenol Lab dilu / Lab#dilu /* note specification */ Number of obs = 30 R-squared = 0.9927 Root MSE = .401663 Adj R-squared = 0.9859 Source | Partial SS df MS F Prob>F -----------+---------------------------------------------------- Model | 330.50667 14 23.607619 146.33 0.0000 | Lab | 47.699997 4 11.924999 8.59 0.0054 dilu | 271.69867 2 135.84934 97.84 0.0000 Lab#dilu | 11.108 8 1.3885 -----------+---------------------------------------------------- | Residual | 2.4200013 15 .16133342 -----------+---------------------------------------------------- Total | 332.92667 29 11.48023 . generate lab_dil=Lab+10*dilu . mixed phenol i.dilu || lab: || lab_dil:, reml Performing EM optimization: Performing gradient-based optimization: Iteration 0: log restricted-likelihood = -34.353272 Iteration 1: log restricted-likelihood = -34.353272 Computing standard errors: Mixed-effects REML regression Number of obs = 30 ------------------------------------------------------------- | No. of Observations per Group Group Variable | Groups Minimum Average Maximum ----------------+-------------------------------------------- lab | 5 6 6.0 6 lab_dil | 15 2 2.0 2 ------------------------------------------------------------- Wald chi2(2) = 195.68 Log restricted-likelihood = -34.353272 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ phenol | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dilu | 2 | 4.26 .5269725 8.08 0.000 3.227153 5.292847 3 | 7.34 .5269725 13.93 0.000 6.307153 8.372847 | _cons | 6.7 .7000478 9.57 0.000 5.327931 8.072069 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ lab: Identity | var(_cons) | 1.756085 1.410132 .3639426 8.473409 -----------------------------+------------------------------------------------ lab_dil: Identity | var(_cons) | .6135833 .3483724 .2016458 1.867058 -----------------------------+------------------------------------------------ var(Residual) | .1613334 .0589106 .0788696 .3300191 ------------------------------------------------------------------------------ LR test vs. linear model: chi2(2) = 36.93 Prob > chi2 = 0.0000 Note: LR test is conservative and provided only for reference. . display 2*sqrt(2)*sqrt(.1613334) 1.1360753 . display 2*sqrt(2)*sqrt(1.756085+.6135833+.1613334) 4.4997793 . predict pred, fitted . predict stdres, rstandard . scatter stdres pred . qnorm stdres . predict pred_ref1 pred_ref2, reffects . bysort lab: gen lab_n=_n . qnorm pred_ref1 if lab_n==1 . swilk pred_ref1 if lab_n==1 Shapiro-Wilk W test for normal data Variable | Obs W V z Prob>z -------------+------------------------------------------------------ pred_ref1 | 5 0.91709 0.979 -0.028 0.51137 . bysort lab dilu: gen labdilu_n=_n . qnorm pred_ref2 if labdilu_n==1 . swilk pred_ref2 if labdilu_n==1 Shapiro-Wilk W test for normal data Variable | Obs W V z Prob>z -------------+------------------------------------------------------ pred_ref2 | 15 0.93024 1.353 0.597 0.27511 . . * Pig breeding example . import delimited add9ex4_1.csv, clear (3 vars, 20 obs) . anova wgain sire / dam|sire /* note specification */ Number of obs = 20 R-squared = 0.6315 Root MSE = .196723 Adj R-squared = 0.2999 Source | Partial SS df MS F Prob>F -----------+---------------------------------------------------- Model | .66327996 9 .07369777 1.90 0.1649 | sire | .09973005 4 .02493251 0.22 0.9155 dam|sire | .56354991 5 .11270998 -----------+---------------------------------------------------- | Residual | .38700002 10 .0387 -----------+---------------------------------------------------- Total | 1.05028 19 .05527789 . generate damid=sire*10+dam . mixed wgain i.sire || damid:, reml Performing EM optimization: Performing gradient-based optimization: Iteration 0: log restricted-likelihood = -3.0333619 Iteration 1: log restricted-likelihood = -3.0328927 Iteration 2: log restricted-likelihood = -3.0328926 Computing standard errors: Mixed-effects REML regression Number of obs = 20 Group variable: damid Number of groups = 10 Obs per group: min = 2 avg = 2.0 max = 2 Wald chi2(4) = 0.88 Log restricted-likelihood = -3.0328926 Prob > chi2 = 0.9267 ------------------------------------------------------------------------------ wgain | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- sire | 2 | -.1375 .2373921 -0.58 0.562 -.6027799 .3277798 3 | -.035 .2373921 -0.15 0.883 -.5002799 .4302799 4 | -.1975001 .2373921 -0.83 0.405 -.6627799 .2677798 5 | -.0975 .2373921 -0.41 0.681 -.5627799 .3677799 | _cons | 2.6675 .1678615 15.89 0.000 2.338497 2.996503 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ damid: Identity | var(_cons) | .037005 .0366775 .0053039 .2581836 -----------------------------+------------------------------------------------ var(Residual) | .0387 .0173072 .016108 .0929779 ------------------------------------------------------------------------------ LR test vs. linear model: chibar2(01) = 2.05 Prob >= chibar2 = 0.0760 . predict pred, fitted . predict stdres, rstandard . scatter stdres pred . qnorm stdres . predict pred_ref, reffects . bysort damid: gen within_n=_n . qnorm pred_ref if within_n==1 . swilk pred_ref if within_n==1 Shapiro-Wilk W test for normal data Variable | Obs W V z Prob>z -------------+------------------------------------------------------ pred_ref | 10 0.96150 0.593 -0.852 0.80287 .