. * do-file for lecture 9 of VHM 802, Winter 2016
. version 14  /* works also with version 13 */

. set more off

. cd "h:\vhm\vhm802\data_csv"
h:\vhm\vhm802\data_csv

. 
. * Floral scent example
. import delimited florallong.csv, clear
(4 vars, 42 obs)

. encode tx, gen(Tx)

. anova time id per Tx

                         Number of obs =         42    R-squared     =  0.8970
                         Root MSE      =    6.46827    Adj R-squared =  0.7778

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  6923.9317         22   314.72417      7.52  0.0000
                         |
                      id |  6135.3106         20   306.76553      7.33  0.0000
                     per |  779.00183          1   779.00183     18.62  0.0004
                      Tx |  3.1290194          1   3.1290194      0.07  0.7874
                         |
                Residual |  794.93171         19   41.838511  
              -----------+----------------------------------------------------
                   Total |  7718.8634         41   188.26496  

. 
. * Example 13.12
. import delimited ch13ta4.csv, clear
(9 vars, 54 obs)

. xi: boxcox yield i.tx i.period i.cow
i.tx              _Itx_1-3            (naturally coded; _Itx_1 omitted)
i.period          _Iperiod_1-3        (naturally coded; _Iperiod_1 omitted)
i.cow             _Icow_1-18          (naturally coded; _Icow_1 omitted)
Fitting comparison model

Iteration 0:   log likelihood = -385.98591  
Iteration 1:   log likelihood = -384.22867  
Iteration 2:   log likelihood = -384.22712  
Iteration 3:   log likelihood = -384.22712  

Fitting full model

Iteration 0:   log likelihood = -314.69985  
Iteration 1:   log likelihood = -309.99214  
Iteration 2:   log likelihood = -309.52468  
Iteration 3:   log likelihood = -309.52451  
Iteration 4:   log likelihood = -309.52451  

                                                  Number of obs   =         54
                                                  LR chi2(21)     =     149.41
Log likelihood = -309.52451                       Prob > chi2     =      0.000
 
------------------------------------------------------------------------------
       yield |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      /theta |  -.0033826   .3020123    -0.01   0.991    -.5953159    .5885506
------------------------------------------------------------------------------
 
Estimates of scale-variant parameters
----------------------------
             |      Coef.
-------------+--------------
Notrans      |
      _Itx_2 |   .1283452
      _Itx_3 |     .20625
  _Iperiod_2 |  -.1368166
  _Iperiod_3 |  -.3236702
     _Icow_2 |   .3283877
     _Icow_3 |   .2962175
     _Icow_4 |     .28772
     _Icow_5 |     .15384
     _Icow_6 |   .1675437
     _Icow_7 |    .179774
     _Icow_8 |   .1756407
     _Icow_9 |   .1328141
    _Icow_10 |   .1065849
    _Icow_11 |   .1235255
    _Icow_12 |   .0969087
    _Icow_13 |   .0610728
    _Icow_14 |  -.0698756
    _Icow_15 |   .0134546
    _Icow_16 |  -.0703826
    _Icow_17 |  -.1146751
    _Icow_18 |  -.0336909
       _cons |   7.090896
-------------+--------------
      /sigma |   .0523218
----------------------------

---------------------------------------------------------
   Test         Restricted     LR statistic      P-value
    H0:       log likelihood       chi2       Prob > chi2
---------------------------------------------------------
theta = -1      -314.64566        10.24           0.001
theta =  0      -309.52458         0.00           0.991
theta =  1      -314.69985        10.35           0.001
---------------------------------------------------------

. gen lnyield=ln(yield)

. anova lnyield tx period cow

                         Number of obs =         54    R-squared     =  0.9372
                         Root MSE      =    .069653    Adj R-squared =  0.8959

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |   2.315333         21   .11025395     22.73  0.0000
                         |
                      tx |  .40999477          2   .20499738     42.25  0.0000
                  period |  .99806797          2   .49903398    102.86  0.0000
                     cow |  .90727032         17   .05336884     11.00  0.0000
                         |
                Residual |  .15524921         32   .00485154  
              -----------+----------------------------------------------------
                   Total |  2.4705823         53   .04661476  

. * splitting cow effects into squares and cows within squares
. anova lnyield square tx period cow|square

                         Number of obs =         54    R-squared     =  0.9372
                         Root MSE      =    .069653    Adj R-squared =  0.8959

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |   2.315333         21   .11025395     22.73  0.0000
                         |
                  square |  .62302927          5   .12460585     25.68  0.0000
                      tx |  .40999477          2   .20499738     42.25  0.0000
                  period |  .99806797          2   .49903398    102.86  0.0000
              cow|square |  .28424105         12   .02368675      4.88  0.0002
                         |
                Residual |  .15524921         32   .00485154  
              -----------+----------------------------------------------------
                   Total |  2.4705823         53   .04661476  

. anova lnyield square tx period cow /* same model fit but wrong SS for squares */

                         Number of obs =         54    R-squared     =  0.9372
                         Root MSE      =    .069653    Adj R-squared =  0.8959

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |   2.315333         21   .11025395     22.73  0.0000
                         |
                  square |  .10313425          5   .02062685      4.25  0.0045
                      tx |  .40999477          2   .20499738     42.25  0.0000
                  period |  .99806797          2   .49903398    102.86  0.0000
                     cow |  .28424105         12   .02368675      4.88  0.0002
                         |
                Residual |  .15524921         32   .00485154  
              -----------+----------------------------------------------------
                   Total |  2.4705823         53   .04661476  

. anova lnyield tx period cow square /* square effects empty */

                         Number of obs =         54    R-squared     =  0.9372
                         Root MSE      =    .069653    Adj R-squared =  0.8959

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |   2.315333         21   .11025395     22.73  0.0000
                         |
                      tx |  .40999477          2   .20499738     42.25  0.0000
                  period |  .99806797          2   .49903398    102.86  0.0000
                     cow |  .90727032         17   .05336884     11.00  0.0000
                  square |          0          0
                         |
                Residual |  .15524921         32   .00485154  
              -----------+----------------------------------------------------
                   Total |  2.4705823         53   .04661476  

. * period effects nested in squares
. anova lnyield square tx period|square cow|square

                         Number of obs =         54    R-squared     =  0.9584
                         Root MSE      =     .06831    Adj R-squared =  0.8999

                  Source | Partial SS         df         MS        F    Prob>F
           --------------+----------------------------------------------------
                   Model |  2.3679239         31   .07638464     16.37  0.0000
                         |
                  square |  .62302927          5   .12460585     26.70  0.0000
                      tx |  .40999477          2   .20499738     43.93  0.0000
           period|square |  1.0506588         12    .0875549     18.76  0.0000
              cow|square |  .28424105         12   .02368675      5.08  0.0005
                         |
                Residual |   .1026584         22   .00466629  
           --------------+----------------------------------------------------
                   Total |  2.4705823         53   .04661476  

. anova lnyield square tx period##square cow|square

                         Number of obs =         54    R-squared     =  0.9584
                         Root MSE      =     .06831    Adj R-squared =  0.8999

                  Source | Partial SS         df         MS        F    Prob>F
           --------------+----------------------------------------------------
                   Model |  2.3679239         31   .07638464     16.37  0.0000
                         |
                  square |  .62302927          5   .12460585     26.70  0.0000
                      tx |  .40999477          2   .20499738     43.93  0.0000
                  period |  .99806797          2   .49903398    106.94  0.0000
           period#square |  .05259081         10   .00525908      1.13  0.3868
              cow|square |  .28424105         12   .02368675      5.08  0.0005
                         |
                Residual |   .1026584         22   .00466629  
           --------------+----------------------------------------------------
                   Total |  2.4705823         53   .04661476  

. 
. * Carton example
. import delimited ch11ex2.csv, clear
(4 vars, 400 obs)

. keep if glue==1 & operator==1
(380 observations deleted)

. anova strength machine 

                         Number of obs =         20    R-squared     =  0.6296
                         Root MSE      =    3.85669    Adj R-squared =  0.2963

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  252.87771          9   28.097523      1.89  0.1679
                         |
                 machine |  252.87771          9   28.097523      1.89  0.1679
                         |
                Residual |  148.74058         10   14.874058  
              -----------+----------------------------------------------------
                   Total |  401.61828         19   21.137804  

. * likelihood-based analysis, with diagnostics
. mixed strength || machine:, reml

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -56.967725  
Iteration 1:   log restricted-likelihood = -56.966358  
Iteration 2:   log restricted-likelihood = -56.966358  

Computing standard errors:

Mixed-effects REML regression                   Number of obs     =         20
Group variable: machine                         Number of groups  =         10

                                                Obs per group:
                                                              min =          2
                                                              avg =        2.0
                                                              max =          2

                                                Wald chi2(0)      =          .
Log restricted-likelihood = -56.966358          Prob > chi2       =          .

------------------------------------------------------------------------------
    strength |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |    156.743   1.185275   132.24   0.000     154.4199    159.0661
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
machine: Identity            |
                  var(_cons) |   6.611732   7.410896      .7349005    59.48425
-----------------------------+------------------------------------------------
               var(Residual) |   14.87406   6.651881      6.190996    35.73538
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 0.95          Prob >= chibar2 = 0.1645

. predict pred, fitted

. predict stdres, rstandard

. scatter stdres pred

. qnorm stdres

. predict pred_ref, reffects

. bysort machine: generate within_n=_n

. qnorm pred_ref if within_n==1

. swilk pred_ref if within_n==1

                   Shapiro-Wilk W test for normal data

    Variable |        Obs       W           V         z       Prob>z
-------------+------------------------------------------------------
    pred_ref |         10    0.96555      0.531    -1.023    0.84677

. 
. * Lab example, dilution 1
. import delimited add9ex1_2.csv, clear
(3 vars, 30 obs)

. encode lab, gen(Lab)

. anova phenol Lab if dilu==1

                         Number of obs =         10    R-squared     =  0.9144
                         Root MSE      =    .371484    Adj R-squared =  0.8459

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  7.3700002          4   1.8425001     13.35  0.0070
                         |
                     Lab |  7.3700002          4   1.8425001     13.35  0.0070
                         |
                Residual |  .69000011          5   .13800002  
              -----------+----------------------------------------------------
                   Total |  8.0600004          9    .8955556  

. * likelihood-based analysis
. mixed phenol || lab: if dilu==1, reml

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -10.192733  
Iteration 1:   log restricted-likelihood = -10.192733  

Computing standard errors:

Mixed-effects REML regression                   Number of obs     =         10
Group variable: lab                             Number of groups  =          5

                                                Obs per group:
                                                              min =          2
                                                              avg =        2.0
                                                              max =          2

                                                Wald chi2(0)      =          .
Log restricted-likelihood = -10.192733          Prob > chi2       =          .

------------------------------------------------------------------------------
      phenol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |        6.7   .4292435    15.61   0.000     5.858698    7.541302
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
lab: Identity                |
                  var(_cons) |   .8522501   .6528823      .1898838    3.825129
-----------------------------+------------------------------------------------
               var(Residual) |       .138   .0872789      .0399514    .4766793
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 6.47          Prob >= chibar2 = 0.0055

. display 2*sqrt(2)*sqrt(.138)           /* Repeatability */
1.050714

. display 2*sqrt(2)*sqrt(.8522501+.138)  /* Reproducibility */
2.8146049

. 
. * Lab example, full data analysis
. anova phenol Lab dilu / Lab#dilu  /* note specification */

                         Number of obs =         30    R-squared     =  0.9927
                         Root MSE      =    .401663    Adj R-squared =  0.9859

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  330.50667         14   23.607619    146.33  0.0000
                         |
                     Lab |  47.699997          4   11.924999      8.59  0.0054
                    dilu |  271.69867          2   135.84934     97.84  0.0000
                Lab#dilu |     11.108          8      1.3885  
              -----------+----------------------------------------------------
                         |
                Residual |  2.4200013         15   .16133342  
              -----------+----------------------------------------------------
                   Total |  332.92667         29    11.48023  

. generate lab_dil=Lab+10*dilu

. mixed phenol i.dilu || lab: || lab_dil:, reml

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -34.353272  
Iteration 1:   log restricted-likelihood = -34.353272  

Computing standard errors:

Mixed-effects REML regression                   Number of obs     =         30

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
            lab |          5          6        6.0          6
        lab_dil |         15          2        2.0          2
-------------------------------------------------------------

                                                Wald chi2(2)      =     195.68
Log restricted-likelihood = -34.353272          Prob > chi2       =     0.0000

------------------------------------------------------------------------------
      phenol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dilu |
          2  |       4.26   .5269725     8.08   0.000     3.227153    5.292847
          3  |       7.34   .5269725    13.93   0.000     6.307153    8.372847
             |
       _cons |        6.7   .7000478     9.57   0.000     5.327931    8.072069
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
lab: Identity                |
                  var(_cons) |   1.756085   1.410132      .3639426    8.473409
-----------------------------+------------------------------------------------
lab_dil: Identity            |
                  var(_cons) |   .6135833   .3483724      .2016458    1.867058
-----------------------------+------------------------------------------------
               var(Residual) |   .1613334   .0589106      .0788696    .3300191
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 36.93                 Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

. display 2*sqrt(2)*sqrt(.1613334)
1.1360753

. display 2*sqrt(2)*sqrt(1.756085+.6135833+.1613334)
4.4997793

. predict pred, fitted

. predict stdres, rstandard

. scatter stdres pred

. qnorm stdres

. predict pred_ref1 pred_ref2, reffects

. bysort lab: gen lab_n=_n

. qnorm pred_ref1 if lab_n==1

. swilk pred_ref1 if lab_n==1

                   Shapiro-Wilk W test for normal data

    Variable |        Obs       W           V         z       Prob>z
-------------+------------------------------------------------------
   pred_ref1 |          5    0.91709      0.979    -0.028    0.51137

. bysort lab dilu: gen labdilu_n=_n

. qnorm pred_ref2 if labdilu_n==1

. swilk pred_ref2 if labdilu_n==1

                   Shapiro-Wilk W test for normal data

    Variable |        Obs       W           V         z       Prob>z
-------------+------------------------------------------------------
   pred_ref2 |         15    0.93024      1.353     0.597    0.27511

. 
. * Pig breeding example
. import delimited add9ex4_1.csv, clear
(3 vars, 20 obs)

. anova wgain sire / dam|sire /* note specification */

                         Number of obs =         20    R-squared     =  0.6315
                         Root MSE      =    .196723    Adj R-squared =  0.2999

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  .66327996          9   .07369777      1.90  0.1649
                         |
                    sire |  .09973005          4   .02493251      0.22  0.9155
                dam|sire |  .56354991          5   .11270998  
              -----------+----------------------------------------------------
                         |
                Residual |  .38700002         10       .0387  
              -----------+----------------------------------------------------
                   Total |    1.05028         19   .05527789  

. generate damid=sire*10+dam

. mixed wgain i.sire || damid:, reml

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -3.0333619  
Iteration 1:   log restricted-likelihood = -3.0328927  
Iteration 2:   log restricted-likelihood = -3.0328926  

Computing standard errors:

Mixed-effects REML regression                   Number of obs     =         20
Group variable: damid                           Number of groups  =         10

                                                Obs per group:
                                                              min =          2
                                                              avg =        2.0
                                                              max =          2

                                                Wald chi2(4)      =       0.88
Log restricted-likelihood = -3.0328926          Prob > chi2       =     0.9267

------------------------------------------------------------------------------
       wgain |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        sire |
          2  |     -.1375   .2373921    -0.58   0.562    -.6027799    .3277798
          3  |      -.035   .2373921    -0.15   0.883    -.5002799    .4302799
          4  |  -.1975001   .2373921    -0.83   0.405    -.6627799    .2677798
          5  |     -.0975   .2373921    -0.41   0.681    -.5627799    .3677799
             |
       _cons |     2.6675   .1678615    15.89   0.000     2.338497    2.996503
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
damid: Identity              |
                  var(_cons) |    .037005   .0366775      .0053039    .2581836
-----------------------------+------------------------------------------------
               var(Residual) |      .0387   .0173072       .016108    .0929779
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 2.05          Prob >= chibar2 = 0.0760

. predict pred, fitted

. predict stdres, rstandard

. scatter stdres pred

. qnorm stdres

. predict pred_ref, reffects

. bysort damid: gen within_n=_n

. qnorm pred_ref if within_n==1

. swilk pred_ref if within_n==1

                   Shapiro-Wilk W test for normal data

    Variable |        Obs       W           V         z       Prob>z
-------------+------------------------------------------------------
    pred_ref |         10    0.96150      0.593    -0.852    0.80287

.