. * do-file for lecture 10 of VHM 802, Winter 2023 . version 17 /* works also with versions 14-16 */ . set more off . cd "r:\" r:\ . . * Manly Example 4.1 . import delimited r:\sparrow.csv, clear (encoding automatically selected: ISO-8859-1) (6 vars, 49 obs) . encode survivorship, gen(surv) . foreach var of varlist total_length-l_keel_sternum { 2. ttest `var', by(surv) unequal /* unequal variances are preferable, though not the version used in Manly */ 3. robvar `var', by(surv) /* Levene's test W50 for equal variances */ 4. } Two-sample t test with unequal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. err. Std. dev. [95% conf. interval] ---------+-------------------------------------------------------------------- NS | 28 158.4286 .7336013 3.881853 156.9233 159.9338 S | 21 157.381 .7253117 3.323796 155.868 158.8939 ---------+-------------------------------------------------------------------- Combined | 49 157.9796 .5220396 3.654277 156.93 159.0292 ---------+-------------------------------------------------------------------- diff | 1.047619 1.031624 -1.028801 3.12404 ------------------------------------------------------------------------------ diff = mean(NS) - mean(S) t = 1.0155 H0: diff = 0 Satterthwaite's degrees of freedom = 46.1076 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.8424 Pr(|T| > |t|) = 0.3152 Pr(T > t) = 0.1576 Survivorshi | Summary of Total_length p | Mean Std. dev. Freq. ------------+------------------------------------ NS | 158.42857 3.881853 28 S | 157.38095 3.3237959 21 ------------+------------------------------------ Total | 157.97959 3.6542772 49 W0 = 1.8734718 df(1, 47) Pr > F = 0.17758478 W50 = 1.4470443 df(1, 47) Pr > F = 0.23502763 W10 = 1.9971389 df(1, 47) Pr > F = 0.1641853 Two-sample t test with unequal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. err. Std. dev. [95% conf. interval] ---------+-------------------------------------------------------------------- NS | 28 241.5714 1.078197 5.705284 239.3592 243.7837 S | 21 241 .9128709 4.1833 239.0958 242.9042 ---------+-------------------------------------------------------------------- Combined | 49 241.3265 .7239746 5.067822 239.8709 242.7822 ---------+-------------------------------------------------------------------- diff | .5714286 1.412743 -2.270663 3.413521 ------------------------------------------------------------------------------ diff = mean(NS) - mean(S) t = 0.4045 H0: diff = 0 Satterthwaite's degrees of freedom = 46.9877 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.6562 Pr(|T| > |t|) = 0.6877 Pr(T > t) = 0.3438 Survivorshi | Summary of Alar_extent p | Mean Std. dev. Freq. ------------+------------------------------------ NS | 241.57143 5.7052839 28 S | 241 4.1833001 21 ------------+------------------------------------ Total | 241.32653 5.0678223 49 W0 = 1.4064059 df(1, 47) Pr > F = 0.24161486 W50 = 1.4029851 df(1, 47) Pr > F = 0.24217978 W10 = 1.4446477 df(1, 47) Pr > F = 0.23540988 Two-sample t test with unequal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. err. Std. dev. [95% conf. interval] ---------+-------------------------------------------------------------------- NS | 28 31.47857 .1612908 .8534709 31.14763 31.80951 S | 21 31.43333 .1590647 .7289262 31.10153 31.76514 ---------+-------------------------------------------------------------------- Combined | 49 31.45918 .1135362 .7947532 31.2309 31.68746 ---------+-------------------------------------------------------------------- diff | .0452381 .2265311 -.4107081 .5011843 ------------------------------------------------------------------------------ diff = mean(NS) - mean(S) t = 0.1997 H0: diff = 0 Satterthwaite's degrees of freedom = 46.1395 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.5787 Pr(|T| > |t|) = 0.8426 Pr(T > t) = 0.4213 Survivorshi | Summary of L_beak_head p | Mean Std. dev. Freq. ------------+------------------------------------ NS | 31.478571 .85347093 28 S | 31.433333 .72892623 21 ------------+------------------------------------ Total | 31.459184 .79475321 49 W0 = 0.67188596 df(1, 47) Pr > F = 0.41653233 W50 = 0.66377265 df(1, 47) Pr > F = 0.4193404 W10 = 0.77479963 df(1, 47) Pr > F = 0.38321515 Two-sample t test with unequal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. err. Std. dev. [95% conf. interval] ---------+-------------------------------------------------------------------- NS | 28 18.44643 .1245608 .6591139 18.19085 18.70201 S | 21 18.5 .0915475 .4195234 18.30904 18.69096 ---------+-------------------------------------------------------------------- Combined | 49 18.46939 .0806122 .5642856 18.30731 18.63147 ---------+-------------------------------------------------------------------- diff | -.0535715 .1545844 -.3647433 .2576003 ------------------------------------------------------------------------------ diff = mean(NS) - mean(S) t = -0.3466 H0: diff = 0 Satterthwaite's degrees of freedom = 45.948 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.3653 Pr(|T| > |t|) = 0.7305 Pr(T > t) = 0.6347 Survivorshi | Summary of L_humerous p | Mean Std. dev. Freq. ------------+------------------------------------ NS | 18.446429 .65911387 28 S | 18.5 .41952343 21 ------------+------------------------------------ Total | 18.469388 .56428562 49 W0 = 3.9250944 df(1, 47) Pr > F = 0.0534381 W50 = 3.6558491 df(1, 47) Pr > F = 0.06197919 W10 = 3.8783962 df(1, 47) Pr > F = 0.05482183 Two-sample t test with unequal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. err. Std. dev. [95% conf. interval] ---------+-------------------------------------------------------------------- NS | 28 20.83929 .2172057 1.149344 20.39362 21.28495 S | 21 20.80952 .1654582 .7582247 20.46438 21.15466 ---------+-------------------------------------------------------------------- Combined | 49 20.82653 .1416249 .9913744 20.54177 21.11129 ---------+-------------------------------------------------------------------- diff | .0297618 .2730471 -.51974 .5792636 ------------------------------------------------------------------------------ diff = mean(NS) - mean(S) t = 0.1090 H0: diff = 0 Satterthwaite's degrees of freedom = 46.3548 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.5432 Pr(|T| > |t|) = 0.9137 Pr(T > t) = 0.4568 Survivorshi | Summary of L_keel_sternum p | Mean Std. dev. Freq. ------------+------------------------------------ NS | 20.839286 1.1493443 28 S | 20.809524 .7582247 21 ------------+------------------------------------ Total | 20.826531 .99137445 49 W0 = 2.3376228 df(1, 47) Pr > F = 0.13298511 W50 = 1.9840519 df(1, 47) Pr > F = 0.16554548 W10 = 2.3216450 df(1, 47) Pr > F = 0.13428598 . hotelling total_length-l_keel_sternum, by(surv) ----------------------------------------------------------------------------------------------------------------- -> surv = NS Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- total_length | 28 158.4286 3.881853 152 165 alar_extent | 28 241.5714 5.705284 230 252 l_beak_head | 28 31.47857 .8534709 30.1 33.4 l_humerous | 28 18.44643 .6591139 17.2 19.8 l_keel_ste~m | 28 20.83929 1.149344 18.6 23.1 ----------------------------------------------------------------------------------------------------------------- -> surv = S Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- total_length | 21 157.381 3.323796 153 164 alar_extent | 21 241 4.1833 235 248 l_beak_head | 21 31.43333 .7289262 30.3 32.8 l_humerous | 21 18.5 .4195234 17.7 19.3 l_keel_ste~m | 21 20.80952 .7582247 19.6 22 2-group Hotelling's T-squared = 2.823701 F test statistic: ((49-5-1)/(49-2)(5)) x 2.823701 = .5166772 H0: Vectors of means are equal for the two groups F(5,43) = 0.5167 Prob > F(5,43) = 0.7622 . manova total_length-l_keel_sternum = surv /* same result as hotelling's T2 */ Number of obs = 49 W = Wilks' lambda L = Lawley-Hotelling trace P = Pillai's trace R = Roy's largest root Source | Statistic df F(df1, df2) = F Prob>F -----------+------------------------------------------------------- surv |W 0.9433 1 5.0 43.0 0.52 0.7622 e |P 0.0567 5.0 43.0 0.52 0.7622 e |L 0.0601 5.0 43.0 0.52 0.7622 e |R 0.0601 5.0 43.0 0.52 0.7622 e |------------------------------------------------------- Residual | 47 -----------+------------------------------------------------------- Total | 48 ------------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F . mvreg Equation Obs Parms RMSE "R-sq" F P>F -------------------------------------------------------------------------- total_length 49 2 3.654812 0.0205 .985957 0.3258 alar_extent 49 2 5.113306 0.0032 .1498655 0.7004 l_beak_head 49 2 .8028382 0.0008 .0381008 0.8461 l_humerous 49 2 .5696142 0.0023 .1061421 0.7460 l_keel_ste~m 49 2 1.001753 0.0002 .010592 0.9185 -------------------------------------------------------------------------------- | Coefficient Std. err. t P>|t| [95% conf. interval] ---------------+---------------------------------------------------------------- total_length | surv | S | -1.047619 1.055053 -0.99 0.326 -3.170113 1.074874 _cons | 158.4286 .6906945 229.38 0.000 157.0391 159.8181 ---------------+---------------------------------------------------------------- alar_extent | surv | S | -.5714286 1.476084 -0.39 0.700 -3.540927 2.39807 _cons | 241.5714 .966324 249.99 0.000 239.6274 243.5154 ---------------+---------------------------------------------------------------- l_beak_head | surv | S | -.0452381 .2317594 -0.20 0.846 -.5114779 .4210017 _cons | 31.47857 .1517222 207.48 0.000 31.17335 31.7838 ---------------+---------------------------------------------------------------- l_humerous | surv | S | .0535715 .1644335 0.33 0.746 -.2772259 .384369 _cons | 18.44643 .107647 171.36 0.000 18.22987 18.66299 ---------------+---------------------------------------------------------------- l_keel_sternum | surv | S | -.0297618 .2891811 -0.10 0.918 -.6115191 .5519954 _cons | 20.83929 .1893134 110.08 0.000 20.45844 21.22014 -------------------------------------------------------------------------------- . matrix errorsscp=e(E) . matrix corr=corr(errorsscp) . matrix list corr symmetric corr[5,5] total_length alar_extent l_beak_head l_humerous l_keel_ste~m total_length 1 alar_extent .73563757 1 l_beak_head .66486471 .67348012 1 l_humerous .65963604 .77328517 .76571386 1 l_keel_ste~m .60933337 .52906854 .52611511 .60811557 1 . mvtest covariances total_length-l_keel_sternum, by(surv) /* Box's M-test */ Test of equality of covariance matrices across 2 samples Modified LR chi2 = 11.78584 Box F(15,7429.4) = 0.69 Prob > F = 0.7948 Box chi2(15) = 10.41 Prob > chi2 = 0.7933 . . * univariate two-sample permutation test . ttest total_length, by(surv) unequal /* to see the actual analysis */ Two-sample t test with unequal variances ------------------------------------------------------------------------------ Group | Obs Mean Std. err. Std. dev. [95% conf. interval] ---------+-------------------------------------------------------------------- NS | 28 158.4286 .7336013 3.881853 156.9233 159.9338 S | 21 157.381 .7253117 3.323796 155.868 158.8939 ---------+-------------------------------------------------------------------- Combined | 49 157.9796 .5220396 3.654277 156.93 159.0292 ---------+-------------------------------------------------------------------- diff | 1.047619 1.031624 -1.028801 3.12404 ------------------------------------------------------------------------------ diff = mean(NS) - mean(S) t = 1.0155 H0: diff = 0 Satterthwaite's degrees of freedom = 46.1076 Ha: diff < 0 Ha: diff != 0 Ha: diff > 0 Pr(T < t) = 0.8424 Pr(|T| > |t|) = 0.3152 Pr(T > t) = 0.1576 . permute surv meandiff=(r(mu_1)-r(mu_2)), saving(permdist, replace) reps(1000) seed(210327) nodrop nowarn: ttest > total_length, by(surv) (running ttest on estimation sample) Permutations (1,000): ..........10..........20..........30..........40..........50..........60..........70....... > ...80..........90..........100..........110..........120..........130..........140..........150..........160... > .......170..........180..........190..........200..........210..........220..........230..........240.......... > 250..........260..........270..........280..........290..........300..........310..........320..........330.... > ......340..........350..........360..........370..........380..........390..........400..........410..........4 > 20..........430..........440..........450..........460..........470..........480..........490..........500..... > .....510..........520..........530..........540..........550..........560..........570..........580..........59 > 0..........600..........610..........620..........630..........640..........650..........660..........670...... > ....680..........690..........700..........710..........720..........730..........740..........750..........760 > ..........770..........780..........790..........800..........810..........820..........830..........840....... > ...850..........860..........870..........880..........890..........900..........910..........920..........930. > .........940..........950..........960..........970..........980..........990..........1,000 done Monte Carlo permutation results Number of observations = 49 Permutation variable: surv Number of permutations = 1,000 Command: ttest total_length, by(surv) meandiff: r(mu_1)-r(mu_2) ------------------------------------------------------------------------------- | Monte Carlo error | ------------------- T | T(obs) Test c n p SE(p) [95% CI(p)] -------------+----------------------------------------------------------------- meandiff | 1.047619 lower 860 1000 .8600 .0110 .8369 .8809 | upper 155 1000 .1550 .0114 .1331 .1789 | two-sided .3100 .0146 .2813 .3387 ------------------------------------------------------------------------------- Notes: For lower one-sided test, c = #{T <= T(obs)} and p = p_lower = c/n. For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n. For two-sided test, p = 2*min(p_lower, p_upper); SE and CI approximate. . preserve . use permdist, clear (permute surv : ttest) . histogram meandiff (bin=29, start=-3.1190476, width=.2183908) . sum meandiff, d r(mu_1)-r(mu_2) ------------------------------------------------------------- Percentiles Smallest 1% -2.619048 -3.119048 5% -1.785714 -3.035714 10% -1.369048 -2.952381 Obs 1,000 25% -.702381 -2.869048 Sum of wgt. 1,000 50% -.0357143 Mean -.0272976 Largest Std. dev. 1.027804 75% .672619 2.797619 90% 1.297619 2.880952 Variance 1.056381 95% 1.630952 2.880952 Skewness -.0872877 99% 2.297619 3.214286 Kurtosis 3.007328 . restore . . * Manly Example 5.2 . import delimited r:\skull.csv, clear (encoding automatically selected: ISO-8859-1) (5 vars, 150 obs) . encode period, gen(Period) . discrim lda maximumbreadth-nasalheight, group(Period) notable . estat grdistances /* Mahalanobis distance matrix, with different group ordering than in Manly's Table 5.4 */ Mahalanobis squared distances between groups | Period --------+----------------------- group1 | 12th and 13th Dynasty group2 | Early predynastic group3 | Late predynastic group4 | Ptolemaic period group5 | Roman period | group1 group2 group3 group4 group5 -------------+------------------------------------------------------ group1 | 0 group2 | .9030738 0 group3 | .7289377 .0910342 0 group4 | .443113 1.881126 1.594014 0 group5 | .9108716 2.696817 2.175689 .2192851 0 . * semi-manual calculation of Penrose distances: note 10 pairs, 4 variables . matrix define penrose=(0,0,0,0,0,0,0,0,0,0) . foreach var of varlist maximumbreadth-nasalheight { 2. quietly anova `var' Period 3. scalar errorvar=e(rmse)^2 4. quietly pwcompare Period 5. matrix define penrose=penrose+vecdiag(r(b_vs)'*r(b_vs))/errorvar/4 6. } . matrix list penrose penrose[1,10] 2vs1. 3vs1. 4vs1. 5vs1. 3vs2. 4vs2. 5vs2. 4vs3. 5vs3. 5vs4. Period Period Period Period Period Period Period Period Period Period r1 .21576853 .16297628 .10812399 .24427289 .02278452 .49286459 .73555643 .40443847 .58265481 .06634178 . . * univariate one-way ANOVA permutation test . anova basibregmaticheight Period /* to see the actual analysis */ Number of obs = 150 R-squared = 0.0632 Root MSE = 4.84609 Adj R-squared = 0.0374 Source | Partial SS df MS F Prob>F -----------+---------------------------------------------------- Model | 229.90667 4 57.476667 2.45 0.0490 | Period | 229.90667 4 57.476667 2.45 0.0490 | Residual | 3405.2667 145 23.484598 -----------+---------------------------------------------------- Total | 3635.1733 149 24.397136 . permute Period Fperiod=(e(F_1)), saving(permdist, replace) reps(1000) seed(210327) nodrop nowarn: anova basibre > gmaticheight Period (running anova on estimation sample) Permutations (1,000): ..........10..........20..........30..........40..........50..........60..........70....... > ...80..........90..........100..........110..........120..........130..........140..........150..........160... > .......170..........180..........190..........200..........210..........220..........230..........240.......... > 250..........260..........270..........280..........290..........300..........310..........320..........330.... > ......340..........350..........360..........370..........380..........390..........400..........410..........4 > 20..........430..........440..........450..........460..........470..........480..........490..........500..... > .....510..........520..........530..........540..........550..........560..........570..........580..........59 > 0..........600..........610..........620..........630..........640..........650..........660..........670...... > ....680..........690..........700..........710..........720..........730..........740..........750..........760 > ..........770..........780..........790..........800..........810..........820..........830..........840....... > ...850..........860..........870..........880..........890..........900..........910..........920..........930. > .........940..........950..........960..........970..........980..........990..........1,000 done Monte Carlo permutation results Number of observations = 150 Permutation variable: Period Number of permutations = 1,000 Command: anova basibregmaticheight Period Fperiod: e(F_1) ------------------------------------------------------------------------------- | Monte Carlo error | ------------------- T | T(obs) Test c n p SE(p) [95% CI(p)] -------------+----------------------------------------------------------------- Fperiod | 2.44742 lower 944 1000 .9440 .0073 .9279 .9574 | upper 56 1000 .0560 .0073 .0426 .0721 | two-sided .1120 .0100 .0925 .1315 ------------------------------------------------------------------------------- Notes: For lower one-sided test, c = #{T <= T(obs)} and p = p_lower = c/n. For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n. For two-sided test, p = 2*min(p_lower, p_upper); SE and CI approximate. .