. * do-file for lecture 10 of VHM 802, Winter 2023
. version 17 /* works also with versions 14-16 */

. set more off

. cd "r:\"
r:\

. 
. * Manly Example 4.1
. import delimited r:\sparrow.csv, clear
(encoding automatically selected: ISO-8859-1)
(6 vars, 49 obs)

. encode survivorship, gen(surv)

. foreach var of varlist total_length-l_keel_sternum {
  2.   ttest `var', by(surv) unequal /* unequal variances are preferable, though not the version used in Manly */
  3.   robvar `var', by(surv) /* Levene's test W50 for equal variances */
  4.   }

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
      NS |      28    158.4286    .7336013    3.881853    156.9233    159.9338
       S |      21     157.381    .7253117    3.323796     155.868    158.8939
---------+--------------------------------------------------------------------
Combined |      49    157.9796    .5220396    3.654277      156.93    159.0292
---------+--------------------------------------------------------------------
    diff |            1.047619    1.031624               -1.028801     3.12404
------------------------------------------------------------------------------
    diff = mean(NS) - mean(S)                                     t =   1.0155
H0: diff = 0                     Satterthwaite's degrees of freedom =  46.1076

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.8424         Pr(|T| > |t|) = 0.3152          Pr(T > t) = 0.1576

Survivorshi |       Summary of Total_length
          p |        Mean   Std. dev.       Freq.
------------+------------------------------------
         NS |   158.42857    3.881853          28
          S |   157.38095   3.3237959          21
------------+------------------------------------
      Total |   157.97959   3.6542772          49

W0  =  1.8734718   df(1, 47)     Pr > F = 0.17758478

W50 =  1.4470443   df(1, 47)     Pr > F = 0.23502763

W10 =  1.9971389   df(1, 47)     Pr > F = 0.1641853

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
      NS |      28    241.5714    1.078197    5.705284    239.3592    243.7837
       S |      21         241    .9128709      4.1833    239.0958    242.9042
---------+--------------------------------------------------------------------
Combined |      49    241.3265    .7239746    5.067822    239.8709    242.7822
---------+--------------------------------------------------------------------
    diff |            .5714286    1.412743               -2.270663    3.413521
------------------------------------------------------------------------------
    diff = mean(NS) - mean(S)                                     t =   0.4045
H0: diff = 0                     Satterthwaite's degrees of freedom =  46.9877

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.6562         Pr(|T| > |t|) = 0.6877          Pr(T > t) = 0.3438

Survivorshi |       Summary of Alar_extent
          p |        Mean   Std. dev.       Freq.
------------+------------------------------------
         NS |   241.57143   5.7052839          28
          S |         241   4.1833001          21
------------+------------------------------------
      Total |   241.32653   5.0678223          49

W0  =  1.4064059   df(1, 47)     Pr > F = 0.24161486

W50 =  1.4029851   df(1, 47)     Pr > F = 0.24217978

W10 =  1.4446477   df(1, 47)     Pr > F = 0.23540988

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
      NS |      28    31.47857    .1612908    .8534709    31.14763    31.80951
       S |      21    31.43333    .1590647    .7289262    31.10153    31.76514
---------+--------------------------------------------------------------------
Combined |      49    31.45918    .1135362    .7947532     31.2309    31.68746
---------+--------------------------------------------------------------------
    diff |            .0452381    .2265311               -.4107081    .5011843
------------------------------------------------------------------------------
    diff = mean(NS) - mean(S)                                     t =   0.1997
H0: diff = 0                     Satterthwaite's degrees of freedom =  46.1395

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.5787         Pr(|T| > |t|) = 0.8426          Pr(T > t) = 0.4213

Survivorshi |       Summary of L_beak_head
          p |        Mean   Std. dev.       Freq.
------------+------------------------------------
         NS |   31.478571   .85347093          28
          S |   31.433333   .72892623          21
------------+------------------------------------
      Total |   31.459184   .79475321          49

W0  =  0.67188596   df(1, 47)     Pr > F = 0.41653233

W50 =  0.66377265   df(1, 47)     Pr > F = 0.4193404

W10 =  0.77479963   df(1, 47)     Pr > F = 0.38321515

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
      NS |      28    18.44643    .1245608    .6591139    18.19085    18.70201
       S |      21        18.5    .0915475    .4195234    18.30904    18.69096
---------+--------------------------------------------------------------------
Combined |      49    18.46939    .0806122    .5642856    18.30731    18.63147
---------+--------------------------------------------------------------------
    diff |           -.0535715    .1545844               -.3647433    .2576003
------------------------------------------------------------------------------
    diff = mean(NS) - mean(S)                                     t =  -0.3466
H0: diff = 0                     Satterthwaite's degrees of freedom =   45.948

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.3653         Pr(|T| > |t|) = 0.7305          Pr(T > t) = 0.6347

Survivorshi |        Summary of L_humerous
          p |        Mean   Std. dev.       Freq.
------------+------------------------------------
         NS |   18.446429   .65911387          28
          S |        18.5   .41952343          21
------------+------------------------------------
      Total |   18.469388   .56428562          49

W0  =  3.9250944   df(1, 47)     Pr > F = 0.0534381

W50 =  3.6558491   df(1, 47)     Pr > F = 0.06197919

W10 =  3.8783962   df(1, 47)     Pr > F = 0.05482183

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
      NS |      28    20.83929    .2172057    1.149344    20.39362    21.28495
       S |      21    20.80952    .1654582    .7582247    20.46438    21.15466
---------+--------------------------------------------------------------------
Combined |      49    20.82653    .1416249    .9913744    20.54177    21.11129
---------+--------------------------------------------------------------------
    diff |            .0297618    .2730471                 -.51974    .5792636
------------------------------------------------------------------------------
    diff = mean(NS) - mean(S)                                     t =   0.1090
H0: diff = 0                     Satterthwaite's degrees of freedom =  46.3548

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.5432         Pr(|T| > |t|) = 0.9137          Pr(T > t) = 0.4568

Survivorshi |      Summary of L_keel_sternum
          p |        Mean   Std. dev.       Freq.
------------+------------------------------------
         NS |   20.839286   1.1493443          28
          S |   20.809524    .7582247          21
------------+------------------------------------
      Total |   20.826531   .99137445          49

W0  =  2.3376228   df(1, 47)     Pr > F = 0.13298511

W50 =  1.9840519   df(1, 47)     Pr > F = 0.16554548

W10 =  2.3216450   df(1, 47)     Pr > F = 0.13428598

. hotelling total_length-l_keel_sternum, by(surv)

-----------------------------------------------------------------------------------------------------------------
-> surv = NS

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
total_length |         28    158.4286    3.881853        152        165
 alar_extent |         28    241.5714    5.705284        230        252
 l_beak_head |         28    31.47857    .8534709       30.1       33.4
  l_humerous |         28    18.44643    .6591139       17.2       19.8
l_keel_ste~m |         28    20.83929    1.149344       18.6       23.1

-----------------------------------------------------------------------------------------------------------------
-> surv = S

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
total_length |         21     157.381    3.323796        153        164
 alar_extent |         21         241      4.1833        235        248
 l_beak_head |         21    31.43333    .7289262       30.3       32.8
  l_humerous |         21        18.5    .4195234       17.7       19.3
l_keel_ste~m |         21    20.80952    .7582247       19.6         22


2-group Hotelling's T-squared = 2.823701
F test statistic: ((49-5-1)/(49-2)(5)) x 2.823701 = .5166772

H0: Vectors of means are equal for the two groups
              F(5,43) =    0.5167
       Prob > F(5,43) =    0.7622

. manova total_length-l_keel_sternum = surv /* same result as hotelling's T2 */

                       Number of obs =         49

                       W = Wilks' lambda      L = Lawley-Hotelling trace
                       P = Pillai's trace     R = Roy's largest root

              Source | Statistic        df    F(df1,     df2) =   F   Prob>F
          -----------+-------------------------------------------------------
                surv |W   0.9433         1      5.0     43.0     0.52 0.7622 e
                     |P   0.0567                5.0     43.0     0.52 0.7622 e
                     |L   0.0601                5.0     43.0     0.52 0.7622 e
                     |R   0.0601                5.0     43.0     0.52 0.7622 e
                     |-------------------------------------------------------
            Residual |                  47
          -----------+-------------------------------------------------------
               Total |                  48
          -------------------------------------------------------------------
                       e = exact, a = approximate, u = upper bound on F

. mvreg

Equation             Obs   Parms        RMSE    "R-sq"          F      P>F
--------------------------------------------------------------------------
total_length          49       2    3.654812    0.0205    .985957   0.3258
alar_extent           49       2    5.113306    0.0032   .1498655   0.7004
l_beak_head           49       2    .8028382    0.0008   .0381008   0.8461
l_humerous            49       2    .5696142    0.0023   .1061421   0.7460
l_keel_ste~m          49       2    1.001753    0.0002    .010592   0.9185

--------------------------------------------------------------------------------
               | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
total_length   |
          surv |
            S  |  -1.047619   1.055053    -0.99   0.326    -3.170113    1.074874
         _cons |   158.4286   .6906945   229.38   0.000     157.0391    159.8181
---------------+----------------------------------------------------------------
alar_extent    |
          surv |
            S  |  -.5714286   1.476084    -0.39   0.700    -3.540927     2.39807
         _cons |   241.5714    .966324   249.99   0.000     239.6274    243.5154
---------------+----------------------------------------------------------------
l_beak_head    |
          surv |
            S  |  -.0452381   .2317594    -0.20   0.846    -.5114779    .4210017
         _cons |   31.47857   .1517222   207.48   0.000     31.17335     31.7838
---------------+----------------------------------------------------------------
l_humerous     |
          surv |
            S  |   .0535715   .1644335     0.33   0.746    -.2772259     .384369
         _cons |   18.44643    .107647   171.36   0.000     18.22987    18.66299
---------------+----------------------------------------------------------------
l_keel_sternum |
          surv |
            S  |  -.0297618   .2891811    -0.10   0.918    -.6115191    .5519954
         _cons |   20.83929   .1893134   110.08   0.000     20.45844    21.22014
--------------------------------------------------------------------------------

. matrix errorsscp=e(E)

. matrix corr=corr(errorsscp)

. matrix list corr

symmetric corr[5,5]
              total_length   alar_extent   l_beak_head    l_humerous  l_keel_ste~m
total_length             1
 alar_extent     .73563757             1
 l_beak_head     .66486471     .67348012             1
  l_humerous     .65963604     .77328517     .76571386             1
l_keel_ste~m     .60933337     .52906854     .52611511     .60811557             1

. mvtest covariances total_length-l_keel_sternum, by(surv) /* Box's M-test */

Test of equality of covariance matrices across 2 samples

      Modified LR chi2 =  11.78584
      Box F(15,7429.4) =      0.69      Prob > F =  0.7948
          Box chi2(15) =     10.41   Prob > chi2 =  0.7933

. 
. * univariate two-sample permutation test
. ttest total_length, by(surv) unequal /* to see the actual analysis */

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
      NS |      28    158.4286    .7336013    3.881853    156.9233    159.9338
       S |      21     157.381    .7253117    3.323796     155.868    158.8939
---------+--------------------------------------------------------------------
Combined |      49    157.9796    .5220396    3.654277      156.93    159.0292
---------+--------------------------------------------------------------------
    diff |            1.047619    1.031624               -1.028801     3.12404
------------------------------------------------------------------------------
    diff = mean(NS) - mean(S)                                     t =   1.0155
H0: diff = 0                     Satterthwaite's degrees of freedom =  46.1076

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.8424         Pr(|T| > |t|) = 0.3152          Pr(T > t) = 0.1576

. permute surv meandiff=(r(mu_1)-r(mu_2)), saving(permdist, replace) reps(1000) seed(210327) nodrop nowarn: ttest
>  total_length, by(surv)
(running ttest on estimation sample)

Permutations (1,000): ..........10..........20..........30..........40..........50..........60..........70.......
> ...80..........90..........100..........110..........120..........130..........140..........150..........160...
> .......170..........180..........190..........200..........210..........220..........230..........240..........
> 250..........260..........270..........280..........290..........300..........310..........320..........330....
> ......340..........350..........360..........370..........380..........390..........400..........410..........4
> 20..........430..........440..........450..........460..........470..........480..........490..........500.....
> .....510..........520..........530..........540..........550..........560..........570..........580..........59
> 0..........600..........610..........620..........630..........640..........650..........660..........670......
> ....680..........690..........700..........710..........720..........730..........740..........750..........760
> ..........770..........780..........790..........800..........810..........820..........830..........840.......
> ...850..........860..........870..........880..........890..........900..........910..........920..........930.
> .........940..........950..........960..........970..........980..........990..........1,000 done

Monte Carlo permutation results                  Number of observations =    49
Permutation variable: surv                       Number of permutations = 1,000

      Command: ttest total_length, by(surv)
     meandiff: r(mu_1)-r(mu_2)

-------------------------------------------------------------------------------
             |                                               Monte Carlo error
             |                                              -------------------
           T |    T(obs)       Test       c       n      p  SE(p)   [95% CI(p)]
-------------+-----------------------------------------------------------------
    meandiff |  1.047619      lower     860    1000  .8600  .0110  .8369  .8809
             |                upper     155    1000  .1550  .0114  .1331  .1789
             |            two-sided                  .3100  .0146  .2813  .3387
-------------------------------------------------------------------------------
Notes: For lower one-sided test, c = #{T <= T(obs)} and p = p_lower = c/n.
       For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n.
       For two-sided test, p = 2*min(p_lower, p_upper); SE and CI approximate.

. preserve

. use permdist, clear
(permute surv : ttest)

. histogram meandiff
(bin=29, start=-3.1190476, width=.2183908)

. sum meandiff, d

                       r(mu_1)-r(mu_2)
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -2.619048      -3.119048
 5%    -1.785714      -3.035714
10%    -1.369048      -2.952381       Obs               1,000
25%     -.702381      -2.869048       Sum of wgt.       1,000

50%    -.0357143                      Mean          -.0272976
                        Largest       Std. dev.      1.027804
75%      .672619       2.797619
90%     1.297619       2.880952       Variance       1.056381
95%     1.630952       2.880952       Skewness      -.0872877
99%     2.297619       3.214286       Kurtosis       3.007328

. restore

. 
. * Manly Example 5.2
. import delimited r:\skull.csv, clear
(encoding automatically selected: ISO-8859-1)
(5 vars, 150 obs)

. encode period, gen(Period)

. discrim lda maximumbreadth-nasalheight, group(Period) notable

. estat grdistances /* Mahalanobis distance matrix, with different group ordering than in Manly's Table 5.4 */

Mahalanobis squared distances between groups

            | Period                
    --------+-----------------------
     group1 | 12th and 13th Dynasty 
     group2 | Early predynastic     
     group3 | Late predynastic      
     group4 | Ptolemaic period      
     group5 | Roman period          

                 |    group1     group2     group3     group4     group5
    -------------+------------------------------------------------------
          group1 |         0                                            
          group2 |  .9030738          0                                 
          group3 |  .7289377   .0910342          0                      
          group4 |   .443113   1.881126   1.594014          0           
          group5 |  .9108716   2.696817   2.175689   .2192851          0

. * semi-manual calculation of Penrose distances: note 10 pairs, 4 variables
. matrix define penrose=(0,0,0,0,0,0,0,0,0,0)

. foreach var of varlist maximumbreadth-nasalheight {
  2.   quietly anova `var' Period
  3.   scalar errorvar=e(rmse)^2
  4.   quietly pwcompare Period
  5.   matrix define penrose=penrose+vecdiag(r(b_vs)'*r(b_vs))/errorvar/4
  6. }

. matrix list penrose

penrose[1,10]
         2vs1.      3vs1.      4vs1.      5vs1.      3vs2.      4vs2.      5vs2.      4vs3.      5vs3.      5vs4.
       Period     Period     Period     Period     Period     Period     Period     Period     Period     Period
r1  .21576853  .16297628  .10812399  .24427289  .02278452  .49286459  .73555643  .40443847  .58265481  .06634178

. 
. * univariate one-way ANOVA permutation test
. anova basibregmaticheight Period /* to see the actual analysis */

                         Number of obs =        150    R-squared     =  0.0632
                         Root MSE      =    4.84609    Adj R-squared =  0.0374

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  229.90667          4   57.476667      2.45  0.0490
                         |
                  Period |  229.90667          4   57.476667      2.45  0.0490
                         |
                Residual |  3405.2667        145   23.484598  
              -----------+----------------------------------------------------
                   Total |  3635.1733        149   24.397136  

. permute Period Fperiod=(e(F_1)), saving(permdist, replace) reps(1000) seed(210327) nodrop nowarn: anova basibre
> gmaticheight Period
(running anova on estimation sample)

Permutations (1,000): ..........10..........20..........30..........40..........50..........60..........70.......
> ...80..........90..........100..........110..........120..........130..........140..........150..........160...
> .......170..........180..........190..........200..........210..........220..........230..........240..........
> 250..........260..........270..........280..........290..........300..........310..........320..........330....
> ......340..........350..........360..........370..........380..........390..........400..........410..........4
> 20..........430..........440..........450..........460..........470..........480..........490..........500.....
> .....510..........520..........530..........540..........550..........560..........570..........580..........59
> 0..........600..........610..........620..........630..........640..........650..........660..........670......
> ....680..........690..........700..........710..........720..........730..........740..........750..........760
> ..........770..........780..........790..........800..........810..........820..........830..........840.......
> ...850..........860..........870..........880..........890..........900..........910..........920..........930.
> .........940..........950..........960..........970..........980..........990..........1,000 done

Monte Carlo permutation results                  Number of observations =   150
Permutation variable: Period                     Number of permutations = 1,000

      Command: anova basibregmaticheight Period
      Fperiod: e(F_1)

-------------------------------------------------------------------------------
             |                                               Monte Carlo error
             |                                              -------------------
           T |    T(obs)       Test       c       n      p  SE(p)   [95% CI(p)]
-------------+-----------------------------------------------------------------
     Fperiod |   2.44742      lower     944    1000  .9440  .0073  .9279  .9574
             |                upper      56    1000  .0560  .0073  .0426  .0721
             |            two-sided                  .1120  .0100  .0925  .1315
-------------------------------------------------------------------------------
Notes: For lower one-sided test, c = #{T <= T(obs)} and p = p_lower = c/n.
       For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n.
       For two-sided test, p = 2*min(p_lower, p_upper); SE and CI approximate.

.