. * do-file for part of lecture 11a of VHM 802/812, Winter 2016
. version 14 /* works also with version 13 */

. set more off

. cd "h:\vhm\vhm802\data_stata"
h:\vhm\vhm802\data_stata

. 
. * 2-level somatic cell count data
. use scc40_2level, clear

. mixed t_lnscc h_size c_heifer i.t_season t_dim || herdid:, reml

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -3624.9622  
Iteration 1:   log restricted-likelihood = -3624.9622  

Computing standard errors:

Mixed-effects REML regression                   Number of obs     =      2,178
Group variable: herdid                          Number of groups  =         40

                                                Obs per group:
                                                              min =         12
                                                              avg =       54.5
                                                              max =        105

                                                Wald chi2(6)      =     244.36
Log restricted-likelihood = -3624.9622          Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     t_lnscc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      h_size |   .0040837   .0037726     1.08   0.279    -.0033105    .0114778
    c_heifer |  -.7367168   .0554447   -13.29   0.000    -.8453863   -.6280472
             |
    t_season |
    apr-jun  |   .1609431   .0906574     1.78   0.076    -.0167422    .3386285
    jul-sep  |   .0015031   .0863774     0.02   0.986    -.1677935    .1707997
    oct-dec  |   .0014582   .0918936     0.02   0.987    -.1786499    .1815663
             |
       t_dim |   .0027731   .0004991     5.56   0.000     .0017949    .0037513
       _cons |   4.641202   .1974215    23.51   0.000     4.254263    5.028141
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
herdid: Identity             |
                  var(_cons) |   .1491533   .0436191      .0840821    .2645832
-----------------------------+------------------------------------------------
               var(Residual) |   1.557228   .0477206      1.466451    1.653625
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 97.01         Prob >= chibar2 = 0.0000

. mixed t_lnscc h_size c_heifer i.t_season t_dim || herdid:, reml stddev

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -3624.9622  
Iteration 1:   log restricted-likelihood = -3624.9622  

Computing standard errors:

Mixed-effects REML regression                   Number of obs     =      2,178
Group variable: herdid                          Number of groups  =         40

                                                Obs per group:
                                                              min =         12
                                                              avg =       54.5
                                                              max =        105

                                                Wald chi2(6)      =     244.36
Log restricted-likelihood = -3624.9622          Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     t_lnscc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      h_size |   .0040837   .0037726     1.08   0.279    -.0033105    .0114778
    c_heifer |  -.7367168   .0554447   -13.29   0.000    -.8453863   -.6280472
             |
    t_season |
    apr-jun  |   .1609431   .0906574     1.78   0.076    -.0167422    .3386285
    jul-sep  |   .0015031   .0863774     0.02   0.986    -.1677935    .1707997
    oct-dec  |   .0014582   .0918936     0.02   0.987    -.1786499    .1815663
             |
       t_dim |   .0027731   .0004991     5.56   0.000     .0017949    .0037513
       _cons |   4.641202   .1974215    23.51   0.000     4.254263    5.028141
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
herdid: Identity             |
                   sd(_cons) |   .3862037   .0564716      .2899691    .5143765
-----------------------------+------------------------------------------------
                sd(Residual) |   1.247889   .0191205      1.210971    1.285933
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 97.01         Prob >= chibar2 = 0.0000

. di 1.96*sqrt(.1491533)
.75695926

. di .1491533/(.1491533+1.557228)  /* ICC and VPC */
.08740913

. testparm i.t_season  /* Wald test for season */

 ( 1)  [t_lnscc]2.t_season = 0
 ( 2)  [t_lnscc]3.t_season = 0
 ( 3)  [t_lnscc]4.t_season = 0

           chi2(  3) =    6.21
         Prob > chi2 =    0.1017

. mixed t_lnscc || herdid:, reml /* "null" model ~ no predictors */

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -3722.5104  
Iteration 1:   log restricted-likelihood = -3722.5104  

Computing standard errors:

Mixed-effects REML regression                   Number of obs     =      2,178
Group variable: herdid                          Number of groups  =         40

                                                Obs per group:
                                                              min =         12
                                                              avg =       54.5
                                                              max =        105

                                                Wald chi2(0)      =          .
Log restricted-likelihood = -3722.5104          Prob > chi2       =          .

------------------------------------------------------------------------------
     t_lnscc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   4.746742   .0679798    69.83   0.000     4.613505     4.87998
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
herdid: Identity             |
                  var(_cons) |   .1482566   .0426236      .0843907    .2604553
-----------------------------+------------------------------------------------
               var(Residual) |   1.730413   .0529421      1.629699    1.837352
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 101.05        Prob >= chibar2 = 0.0000

. 
end of do-file

. do "C:\Users\DEFAUL~1.SID\AppData\Local\Temp\STD00000000.tmp"

. * Reunion Island data, 3-level model for (natural) log cfs
. use reu_cfs, clear

. mixed lncfs i.heifer || herd: || cow:, reml

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -1500.2224  
Iteration 1:   log restricted-likelihood = -1495.8919  
Iteration 2:   log restricted-likelihood = -1495.8804  
Iteration 3:   log restricted-likelihood = -1495.8804  

Computing standard errors:

Mixed-effects REML regression                   Number of obs     =      3,027

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
           herd |         50         13       60.5        226
            cow |      1,575          1        1.9          5
-------------------------------------------------------------

                                                Wald chi2(1)      =       0.12
Log restricted-likelihood = -1495.8804          Prob > chi2       =     0.7341

------------------------------------------------------------------------------
       lncfs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      heifer |
primiparous  |  -.0058409    .017194    -0.34   0.734    -.0395405    .0278587
       _cons |    4.21903   .0195052   216.30   0.000     4.180801     4.25726
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
herd: Identity               |
                  var(_cons) |   .0145193   .0036025      .0089279    .0236125
-----------------------------+------------------------------------------------
cow: Identity                |
                  var(_cons) |   .0200572   .0040814      .0134606    .0298868
-----------------------------+------------------------------------------------
               var(Residual) |   .1341146   .0048118      .1250077     .143885
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 221.39                Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

. * ICC: same herd 
. di .0145193/(.0145193+.0200572+.1341146)
.08607034

. * ICC: same cow 
. di (.0145193+.0200572)/(.0145193+.0200572+.1341146) 
.20496932

. * ICC calculation by Stata
. estat icc

Residual intraclass correlation

------------------------------------------------------------------------------
                       Level |        ICC   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
                        herd |   .0860701   .0196752      .0545356    .1332689
                    cow|herd |   .2049693   .0285332      .1546348    .2665226
------------------------------------------------------------------------------

. * model checking
. predict res_lact, rstandard

. summarize res_lact, d

                   Standardized residuals
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -1.902202      -2.385942
 5%    -1.376046      -2.376865
10%    -1.118899       -2.24191       Obs               3,027
25%    -.6474585      -2.235393       Sum of Wgt.       3,027

50%    -.0968802                      Mean          -1.91e-10
                        Largest       Std. Dev.      .9372907
75%     .5640011       3.537244
90%     1.182198       3.713121       Variance       .8785139
95%     1.670863       4.115439       Skewness       .6035033
99%     2.719746       4.525204       Kurtosis       3.792215

. qnorm res_lact

. histogram res_lact
(bin=34, start=-2.3859422, width=.20326901)

. swilk res_lact

                   Shapiro-Wilk W test for normal data

    Variable |        Obs       W           V         z       Prob>z
-------------+------------------------------------------------------
    res_lact |      3,027    0.98077     33.204     9.042    0.00000

. predict fitted, fit /* includes all random effects */

. scatter res_lact fitted

. * predicted random effects at higher levels
. predict ref*, reffects

. bysort cow: generate within_c=_n

. bysort herd: generate within_h=_n

. * herd-level random effects
. summarize ref1 if within_h==1, d

                  BLUP r.e. for herd: _cons
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.2785034      -.2785034
 5%    -.1536851       -.183448
10%    -.1307647      -.1536851       Obs                  50
25%     -.079566      -.1500316       Sum of Wgt.          50

50%    -.0008414                      Mean           1.12e-10
                        Largest       Std. Dev.      .1074885
75%     .0726699       .1262656
90%     .1181743       .1290758       Variance       .0115538
95%     .1290758       .2022623       Skewness       .2345923
99%     .3407647       .3407647       Kurtosis       4.076068

. qnorm ref1 if within_h==1

. swilk ref1 if within_h==1

                   Shapiro-Wilk W test for normal data

    Variable |        Obs       W           V         z       Prob>z
-------------+------------------------------------------------------
        ref1 |         50    0.97805      1.032     0.068    0.47286

. * cow-level random effects
. summarize ref2 if within_c==1, d

                  BLUP r.e. for cow: _cons
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.1376604      -.1763716
 5%    -.1015879      -.1629009
10%     -.078711       -.160541       Obs               1,575
25%    -.0428026      -.1563797       Sum of Wgt.       1,575

50%    -.0046796                      Mean           4.10e-11
                        Largest       Std. Dev.      .0646133
75%     .0381531       .2430278
90%     .0797535       .2502964       Variance       .0041749
95%     .1140806       .2507932       Skewness       .4459166
99%     .1757991       .2671055       Kurtosis       3.654445

. qnorm ref2 if within_c==1

. swilk ref2 if within_c==1

                   Shapiro-Wilk W test for normal data

    Variable |        Obs       W           V         z       Prob>z
-------------+------------------------------------------------------
        ref2 |      1,575    0.98805     11.402     6.136    0.00000