. * do-file for lecture 11 of VHM 802, Winter 2023 . version 17 /* works also with versions 14-16 */ . set more off . cd "r:\" r:\ . . * classification by logistic regression . import delimited "r:\sparrow", clear varnames(1) (encoding automatically selected: ISO-8859-1) (6 vars, 49 obs) . encode survivorship, gen(surv) . generate surv01=surv-1 . logit surv01 total_length-l_keel_sternum Iteration 0: log likelihood = -33.462497 Iteration 1: log likelihood = -32.041086 Iteration 2: log likelihood = -32.035688 Iteration 3: log likelihood = -32.035688 Logistic regression Number of obs = 49 LR chi2(5) = 2.85 Prob > chi2 = 0.7225 Log likelihood = -32.035688 Pseudo R2 = 0.0426 -------------------------------------------------------------------------------- surv01 | Coefficient Std. err. z P>|z| [95% conf. interval] ---------------+---------------------------------------------------------------- total_length | -.1625675 .1396369 -1.16 0.244 -.4362508 .1111159 alar_extent | -.0276413 .1060235 -0.26 0.794 -.2354436 .1801609 l_beak_head | -.0837496 .628623 -0.13 0.894 -1.315828 1.148329 l_humerous | 1.061744 1.023129 1.04 0.299 -.9435529 3.067041 l_keel_sternum | .0715755 .4166297 0.17 0.864 -.7450037 .8881547 _cons | 13.58231 15.86496 0.86 0.392 -17.51244 44.67706 -------------------------------------------------------------------------------- . 
estat classification Logistic model for surv01 -------- True -------- Classified | D ~D | Total -----------+--------------------------+----------- + | 7 4 | 11 - | 14 24 | 38 -----------+--------------------------+----------- Total | 21 28 | 49 Classified + if predicted Pr(D) >= .5 True D defined as surv01 != 0 -------------------------------------------------- Sensitivity Pr( +| D) 33.33% Specificity Pr( -|~D) 85.71% Positive predictive value Pr( D| +) 63.64% Negative predictive value Pr(~D| -) 63.16% -------------------------------------------------- False + rate for true ~D Pr( +|~D) 14.29% False - rate for true D Pr( -| D) 66.67% False + rate for classified + Pr(~D| +) 36.36% False - rate for classified - Pr( D| -) 36.84% -------------------------------------------------- Correctly classified 63.27% -------------------------------------------------- . lsens . lroc Logistic model for surv01 Number of observations = 49 Area under ROC curve = 0.6633 . predict prob1, p . * now using discrim command: same results . discrim logistic total_length-l_keel_sternum, group(surv) prior(proportional) Iteration 0: log likelihood = -33.462497 Iteration 1: log likelihood = -32.041086 Iteration 2: log likelihood = -32.035688 Iteration 3: log likelihood = -32.035688 Logistic discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True surv | NS S | Total -------------+----------------+------- NS | 24 4 | 28 | 85.71 14.29 | 100.00 | | S | 14 7 | 21 | 66.67 33.33 | 100.00 -------------+----------------+------- Total | 38 11 | 49 | 77.55 22.45 | 100.00 | | Priors | 0.5714 0.4286 | . predict prob0, pr . predict gr01, classification . * new: classification from equal priors instead of data priors . 
estat classtable, priors(equal) Resubstitution classification table +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True surv | NS S | Total -------------+----------------+------- NS | 18 10 | 28 | 64.29 35.71 | 100.00 | | S | 8 13 | 21 | 38.10 61.90 | 100.00 -------------+----------------+------- Total | 26 23 | 49 | 53.06 46.94 | 100.00 | | Priors | 0.5000 0.5000 | . predict prob0eq, pr priors(equal) . predict gr01eq, classification priors(equal) . * leave-one-out cross-validation . * this does not work, although it should??: estat classtable, loo . * manual coding of leave-one-out cross-validation for logistic discrimination with equal priors . capture drop temp* . capture drop rowno . capture drop logitgroupcv . generate rowno=_n . generate logitgroupcv=0 . quietly { . tab surv logitgroupcv /* crossvalidation classification table */ Survivorsh | logitgroupcv ip | 0 1 | Total -----------+----------------------+---------- NS | 12 16 | 28 S | 11 10 | 21 -----------+----------------------+---------- Total | 23 26 | 49 . . * linear discriminant analysis (LDA) . * for illustration with two categories and predictors: total_length l_humerous . logit surv01 total_length l_humerous Iteration 0: log likelihood = -33.462497 Iteration 1: log likelihood = -32.10122 Iteration 2: log likelihood = -32.095449 Iteration 3: log likelihood = -32.095448 Logistic regression Number of obs = 49 LR chi2(2) = 2.73 Prob > chi2 = 0.2549 Log likelihood = -32.095448 Pseudo R2 = 0.0409 ------------------------------------------------------------------------------ surv01 | Coefficient Std. err. z P>|z| [95% conf. interval] -------------+---------------------------------------------------------------- total_length | -.1770048 .1137912 -1.56 0.120 -.4000315 .0460219 l_humerous | .922332 .7219004 1.28 0.201 -.4925668 2.337231 _cons | 10.62317 13.16035 0.81 0.420 -15.17064 36.41698 ------------------------------------------------------------------------------ . 
discrim lda total_length l_humerous, group(surv) Linear discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True surv | NS S | Total -------------+----------------+------- NS | 19 9 | 28 | 67.86 32.14 | 100.00 | | S | 7 14 | 21 | 33.33 66.67 | 100.00 -------------+----------------+------- Total | 26 23 | 49 | 53.06 46.94 | 100.00 | | Priors | 0.5000 0.5000 | . estat list +---------------------------------------------+ | | Classification | Probabilities | | | | | | Obs | True Class. | NS S | |-----+----------------------+----------------| | 1 | NS S * | 0.2580 0.7420 | | 2 | NS S * | 0.2809 0.7191 | | 3 | S S | 0.3178 0.6822 | | 4 | S S | 0.3742 0.6258 | | 5 | S S | 0.3571 0.6429 | |-----+----------------------+----------------| | 6 | S S | 0.3309 0.6691 | | 7 | NS S * | 0.3742 0.6258 | | 8 | S S | 0.3539 0.6461 | | 9 | S S | 0.4325 0.5675 | | 10 | NS S * | 0.3742 0.6258 | |-----+----------------------+----------------| | 11 | S S | 0.3539 0.6461 | | 12 | S S | 0.4761 0.5239 | | 13 | S S | 0.4490 0.5510 | | 14 | S S | 0.4145 0.5855 | | 15 | S S | 0.4779 0.5221 | |-----+----------------------+----------------| | 16 | NS NS | 0.5077 0.4923 | | 17 | S S | 0.4631 0.5369 | | 18 | NS S * | 0.4797 0.5203 | | 19 | S S | 0.4613 0.5387 | | 20 | NS S * | 0.4797 0.5203 | |-----+----------------------+----------------| | 21 | NS S * | 0.4893 0.5107 | | 22 | NS S * | 0.4928 0.5072 | | 23 | NS S * | 0.4815 0.5185 | | 24 | S S | 0.4981 0.5019 | | 25 | NS NS | 0.5088 0.4912 | |-----+----------------------+----------------| | 26 | NS NS | 0.5166 0.4834 | | 27 | S NS * | 0.5237 0.4763 | | 28 | NS NS | 0.5307 0.4693 | | 29 | NS NS | 0.5508 0.4492 | | 30 | NS NS | 0.5034 0.4966 | |-----+----------------------+----------------| | 31 | NS NS | 0.5166 0.4834 | | 32 | S NS * | 0.5438 0.4562 | | 33 | NS NS | 0.5473 0.4527 | | 34 | NS NS | 0.5620 0.4380 | | 35 | NS NS | 0.5403 0.4597 | 
|-----+----------------------+----------------| | 36 | NS NS | 0.5350 0.4650 | | 37 | NS NS | 0.5350 0.4650 | | 38 | S NS * | 0.5603 0.4397 | | 39 | NS NS | 0.5997 0.4003 | | 40 | NS NS | 0.6048 0.3952 | |-----+----------------------+----------------| | 41 | S NS * | 0.5980 0.4020 | | 42 | S NS * | 0.6460 0.3540 | | 43 | S NS * | 0.6173 0.3827 | | 44 | NS NS | 0.6611 0.3389 | | 45 | NS NS | 0.6256 0.3744 | |-----+----------------------+----------------| | 46 | S NS * | 0.6790 0.3210 | | 47 | NS NS | 0.7518 0.2482 | | 48 | NS NS | 0.6774 0.3226 | | 49 | NS NS | 0.7666 0.2334 | +---------------------------------------------+ * indicates misclassified observations . estat loadings, unstandardized Canonical discriminant function coefficients | function1 -------------+----------- total_length | -.3567887 l_humerous | 1.858881 _cons | 22.03294 . generate fitlda=(22.03294+1.858881*l_humerous)/0.3567887 . twoway (scatter total_length l_humerous if surv01==0, msymbol(smx) jitter(2)) (scatter total_length l_humerous if surv01==1, msymbol(smcircle_hollow) jitter(2)) (line fitlda l_humerous, sort), xtitle (l_humerous) ytitle(total_length) title(LDA separation for uniform prior (x=NS, o=S (surv=1)), size(medium)) legend(off) . estat classtable, Resubstitution classification table +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True surv | NS S | Total -------------+----------------+------- NS | 19 9 | 28 | 67.86 32.14 | 100.00 | | S | 7 14 | 21 | 33.33 66.67 | 100.00 -------------+----------------+------- Total | 26 23 | 49 | 53.06 46.94 | 100.00 | | Priors | 0.5000 0.5000 | . 
estat classtable, loo Leave-one-out classification table +---------+ | Key | |---------| | Number | | Percent | +---------+ | LOO Classified True surv | NS S | Total -------------+----------------+------- NS | 15 13 | 28 | 53.57 46.43 | 100.00 | | S | 7 14 | 21 | 33.33 66.67 | 100.00 -------------+----------------+------- Total | 22 27 | 49 | 44.90 55.10 | 100.00 | | Priors | 0.5000 0.5000 | . * code for logistic classification and cross-validation not shown (again) . . * multinomial logistic regression for 3 categories . use "r:\beef_ultra", clear . mlogit grade sex-carc_wt, base(1) Iteration 0: log likelihood = -443.33618 Iteration 1: log likelihood = -374.26182 Iteration 2: log likelihood = -364.20181 Iteration 3: log likelihood = -363.9063 Iteration 4: log likelihood = -363.90546 Iteration 5: log likelihood = -363.90546 Multinomial logistic regression Number of obs = 487 LR chi2(16) = 158.86 Prob > chi2 = 0.0000 Log likelihood = -363.90546 Pseudo R2 = 0.1792 ------------------------------------------------------------------------------ grade | Coefficient Std. err. z P>|z| [95% conf. 
interval] -------------+---------------------------------------------------------------- AAA | (base outcome) -------------+---------------------------------------------------------------- AA | sex | .9521876 .2845745 3.35 0.001 .3944318 1.509943 bckgrnd | -1.379689 .3476694 -3.97 0.000 -2.061109 -.69827 implant | -.4543614 .2854138 -1.59 0.111 -1.013762 .1050395 backfat | -.2656307 .1180485 -2.25 0.024 -.4970015 -.0342599 ribeye | .3311029 .0861079 3.85 0.000 .1623346 .4998712 imfat | -.4826611 .1261669 -3.83 0.000 -.7299437 -.2353786 days | -.0015295 .0015713 -0.97 0.330 -.0046093 .0015502 carc_wt | -.017968 .0036668 -4.90 0.000 -.0251548 -.0107812 _cons | 7.464408 1.37777 5.42 0.000 4.764029 10.16479 -------------+---------------------------------------------------------------- A | sex | 1.929613 .5309941 3.63 0.000 .8888833 2.970342 bckgrnd | -2.42452 .5154775 -4.70 0.000 -3.434837 -1.414202 implant | -1.379545 .5415622 -2.55 0.011 -2.440987 -.3181021 backfat | -.7707347 .2664633 -2.89 0.004 -1.292993 -.2484762 ribeye | .3248934 .1642367 1.98 0.048 .0029953 .6467915 imfat | -1.09812 .2387782 -4.60 0.000 -1.566117 -.6301236 days | -.0069193 .0032564 -2.12 0.034 -.0133018 -.0005369 carc_wt | -.039161 .006837 -5.73 0.000 -.0525612 -.0257608 _cons | 17.49636 2.525537 6.93 0.000 12.5464 22.44632 ------------------------------------------------------------------------------ . predict pm*, p . egen pmax=rowmax(pm1-pm3) . generate highp123=(pm1==pmax)+2*(pm2==pmax)+3*(pm3==pmax) . tab grade highp123 /* confusion matrix */ Carcass | grade | 1=AAA 2=AA | highp123 3=A | 1 2 3 | Total -----------+---------------------------------+---------- AAA | 78 86 0 | 164 AA | 41 231 5 | 277 A | 3 37 6 | 46 -----------+---------------------------------+---------- Total | 122 354 11 | 487 . 
discrim logistic sex-carc_wt, group(grade) prior(prop) Iteration 0: log likelihood = -443.33618 Iteration 1: log likelihood = -374.26182 Iteration 2: log likelihood = -364.41906 Iteration 3: log likelihood = -363.91187 Iteration 4: log likelihood = -363.90546 Iteration 5: log likelihood = -363.90546 Logistic discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 78 86 0 | 164 | 47.56 52.44 0.00 | 100.00 | | AA | 41 231 5 | 277 | 14.80 83.39 1.81 | 100.00 | | A | 3 37 6 | 46 | 6.52 80.43 13.04 | 100.00 -------------+------------------------+------- Total | 122 354 11 | 487 | 25.05 72.69 2.26 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | . * leave-one-out crossclassification (for full multinomial model) . capture drop rowno . capture drop cvpm* . generate rowno=_n . forvalues k=1/3 { 2. generate cvpm`k'=. 3. } (487 missing values generated) (487 missing values generated) (487 missing values generated) . quietly { . egen cvpmax=rowmax(cvpm1-cvpm3) . generate highcvp123=(cvpm1==cvpmax)+2*(cvpm2==cvpmax)+3*(cvpm3==cvpmax) . tab grade highcvp123 /* crossvalidation classification table */ Carcass | grade | 1=AAA 2=AA | highcvp123 3=A | 1 2 3 | Total -----------+---------------------------------+---------- AAA | 74 90 0 | 164 AA | 43 229 5 | 277 A | 3 40 3 | 46 -----------+---------------------------------+---------- Total | 120 359 8 | 487 . drop pm1-highcvp123 /* clean-up */ . . * LDA: three categories and full model . candisc sex-carc_wt, group(grade) prior(prop) lootable Canonical linear discriminant analysis | | Like- | Canon. Eigen- Variance | lihood Fcn | Corr. value Prop. Cumul. 
| Ratio F df1 df2 Prob>F ----+---------------------------------+------------------------------------ 1 | 0.5148 .360563 0.9517 0.9517 | 0.7218 10.557 16 954 0.0000 e 2 | 0.1341 .018314 0.0483 1.0000 | 0.9820 1.2506 7 478 0.2734 e --------------------------------------------------------------------------- H0: This and smaller canon. corr. are zero; e = exact F Standardized canonical discriminant function coefficients | function1 function2 -------------+---------------------- sex | .4370393 .4008008 bckgrnd | -.5406284 .0744169 implant | -.2749724 .1062096 backfat | -.3677969 -.0014345 ribeye | .4268066 1.089007 imfat | -.4979424 .0526061 days | -.2114979 .352345 carc_wt | -.7996381 -.1815683 Canonical structure | function1 function2 -------------+---------------------- sex | .0591467 .327256 bckgrnd | -.5496828 .3288711 implant | .016047 -.0649989 backfat | -.1348509 .3978866 ribeye | .002019 .8897859 imfat | -.423941 .0144226 days | -.1496699 -.175766 carc_wt | -.48544 .3082858 Group means on canonical variables grade | function1 function2 -------------+---------------------- AAA | -.7407635 -.0893078 AA | .2348255 .1048708 A | 1.226925 -.313103 Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 80 84 0 | 164 | 48.78 51.22 0.00 | 100.00 | | AA | 42 232 3 | 277 | 15.16 83.75 1.08 | 100.00 | | A | 3 37 6 | 46 | 6.52 80.43 13.04 | 100.00 -------------+------------------------+------- Total | 125 353 9 | 487 | 25.67 72.48 1.85 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | Leave-one-out classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 76 88 0 | 164 | 46.34 53.66 0.00 | 100.00 | | AA | 43 230 4 | 277 | 15.52 83.03 1.44 | 100.00 | | A | 3 39 4 | 46 | 6.52 84.78 8.70 | 100.00 
-------------+------------------------+------- Total | 122 357 8 | 487 | 25.05 73.31 1.64 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | . discrim lda sex-carc_wt, group(grade) prior(prop) lootable Linear discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 80 84 0 | 164 | 48.78 51.22 0.00 | 100.00 | | AA | 42 232 3 | 277 | 15.16 83.75 1.08 | 100.00 | | A | 3 37 6 | 46 | 6.52 80.43 13.04 | 100.00 -------------+------------------------+------- Total | 125 353 9 | 487 | 25.67 72.48 1.85 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | Leave-one-out classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 76 88 0 | 164 | 46.34 53.66 0.00 | 100.00 | | AA | 43 230 4 | 277 | 15.52 83.03 1.44 | 100.00 | | A | 3 39 4 | 46 | 6.52 84.78 8.70 | 100.00 -------------+------------------------+------- Total | 122 357 8 | 487 | 25.05 73.31 1.64 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | . loadingplot . generate gradenolabel=grade . scoreplot, msymbol(i) mlabsize(vsmall) mlabel(gradenolabel) . * quadratic discriminant analysis (QDA) . 
discrim qda sex-carc_wt, group(grade) prior(prop) lootable Quadratic discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 106 57 1 | 164 | 64.63 34.76 0.61 | 100.00 | | AA | 70 189 18 | 277 | 25.27 68.23 6.50 | 100.00 | | A | 4 25 17 | 46 | 8.70 54.35 36.96 | 100.00 -------------+------------------------+------- Total | 180 271 36 | 487 | 36.96 55.65 7.39 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | Leave-one-out classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 93 70 1 | 164 | 56.71 42.68 0.61 | 100.00 | | AA | 77 179 21 | 277 | 27.80 64.62 7.58 | 100.00 | | A | 4 30 12 | 46 | 8.70 65.22 26.09 | 100.00 -------------+------------------------+------- Total | 174 279 34 | 487 | 35.73 57.29 6.98 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | . . * kth nearest neighbour (KNN) . * three categories and full model for beef_ultra data (standardized variables) . foreach var of varlist sex-carc_wt { 2. egen s`var'=std(`var') 3. } . 
discrim knn ssex-scarc_wt, group(grade) prior(prop) lootable measure (L2) k(7) Kth-nearest-neighbor discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A Unclassified | Total -------------+--------------------------------------+------- AAA | 87 67 1 9 | 164 | 53.05 40.85 0.61 5.49 | 100.00 | | AA | 37 234 1 5 | 277 | 13.36 84.48 0.36 1.81 | 100.00 | | A | 2 32 9 3 | 46 | 4.35 69.57 19.57 6.52 | 100.00 -------------+--------------------------------------+------- Total | 126 333 11 17 | 487 | 25.87 68.38 2.26 3.49 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | Leave-one-out classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A Unclassified | Total -------------+--------------------------------------+------- AAA | 67 95 1 1 | 164 | 40.85 57.93 0.61 0.61 | 100.00 | | AA | 61 215 1 0 | 277 | 22.02 77.62 0.36 0.00 | 100.00 | | A | 3 35 8 0 | 46 | 6.52 76.09 17.39 0.00 | 100.00 -------------+--------------------------------------+------- Total | 131 345 10 1 | 487 | 26.90 70.84 2.05 0.21 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | . 
discrim knn ssex-scarc_wt, group(grade) prior(prop) lootable measure (L1) k(7) Kth-nearest-neighbor discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A Unclassified | Total -------------+--------------------------------------+------- AAA | 88 72 1 3 | 164 | 53.66 43.90 0.61 1.83 | 100.00 | | AA | 34 228 2 13 | 277 | 12.27 82.31 0.72 4.69 | 100.00 | | A | 0 31 9 6 | 46 | 0.00 67.39 19.57 13.04 | 100.00 -------------+--------------------------------------+------- Total | 122 331 12 22 | 487 | 25.05 67.97 2.46 4.52 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | Leave-one-out classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 69 94 1 | 164 | 42.07 57.32 0.61 | 100.00 | | AA | 56 219 2 | 277 | 20.22 79.06 0.72 | 100.00 | | A | 1 37 8 | 46 | 2.17 80.43 17.39 | 100.00 -------------+------------------------+------- Total | 126 350 11 | 487 | 25.87 71.87 2.26 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | . * KNN with equal prior probabilities . 
discrim knn ssex-scarc_wt, group(grade) prior(equal) lootable measure (L1) k(11) Kth-nearest-neighbor discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 111 35 18 | 164 | 67.68 21.34 10.98 | 100.00 | | AA | 84 119 74 | 277 | 30.32 42.96 26.71 | 100.00 | | A | 3 5 38 | 46 | 6.52 10.87 82.61 | 100.00 -------------+------------------------+------- Total | 198 159 130 | 487 | 40.66 32.65 26.69 | 100.00 | | Priors | 0.3333 0.3333 0.3333 | Leave-one-out classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 93 50 21 | 164 | 56.71 30.49 12.80 | 100.00 | | AA | 98 94 85 | 277 | 35.38 33.94 30.69 | 100.00 | | A | 6 16 24 | 46 | 13.04 34.78 52.17 | 100.00 -------------+------------------------+------- Total | 197 160 130 | 487 | 40.45 32.85 26.69 | 100.00 | | Priors | 0.3333 0.3333 0.3333 | . . * kth nearest neighbour for sparrow data . di "suggested range for k (binary): " 49^(2/8) " - " 49^(3/8) suggested range for k (binary): 2.6457513 - 4.3035171 . import delimited "r:\sparrow", clear varnames(1) (encoding automatically selected: ISO-8859-1) (6 vars, 49 obs) . encode survivorship, gen(surv) . discrim knn total_length l_humerous, group(surv) prior(prop) measure(L2) k(3) Kth-nearest-neighbor discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True surv | NS S | Total -------------+----------------+------- NS | 22 6 | 28 | 78.57 21.43 | 100.00 | | S | 7 14 | 21 | 33.33 66.67 | 100.00 -------------+----------------+------- Total | 29 20 | 49 | 59.18 40.82 | 100.00 | | Priors | 0.5714 0.4286 | . 
estat classtable, loo Leave-one-out classification table +---------+ | Key | |---------| | Number | | Percent | +---------+ | LOO Classified True surv | NS S | Total -------------+----------------+------- NS | 18 10 | 28 | 64.29 35.71 | 100.00 | | S | 9 12 | 21 | 42.86 57.14 | 100.00 -------------+----------------+------- Total | 27 22 | 49 | 55.10 44.90 | 100.00 | | Priors | 0.5714 0.4286 | . foreach var of varlist total_length-l_humerous { 2. egen s`var'=std(`var') 3. } . discrim knn stotal_length sl_humerous, group(surv) prior(prop) measure(L2) k(3) Kth-nearest-neighbor discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True surv | NS S Unclassified | Total -------------+------------------------------+------- NS | 24 4 0 | 28 | 85.71 14.29 0.00 | 100.00 | | S | 8 11 2 | 21 | 38.10 52.38 9.52 | 100.00 -------------+------------------------------+------- Total | 32 15 2 | 49 | 65.31 30.61 4.08 | 100.00 | | Priors | 0.5714 0.4286 | . estat classtable, loo Leave-one-out classification table +---------+ | Key | |---------| | Number | | Percent | +---------+ | LOO Classified True surv | NS S | Total -------------+----------------+------- NS | 13 15 | 28 | 46.43 53.57 | 100.00 | | S | 13 8 | 21 | 61.90 38.10 | 100.00 -------------+----------------+------- Total | 26 23 | 49 | 53.06 46.94 | 100.00 | | Priors | 0.5714 0.4286 | .
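
* ---------------------------------------------------------------------------
* Illustrative sketch only; it is not part of the original lecture log and has
* not been run against these data. The log above hides the body of the manual
* leave-one-out loop for logistic discrimination inside a quietly block. One
* possible way to code such a loop is sketched here under these assumptions:
*   - the sparrow data are in memory with surv encoded as 1=NS, 2=S;
*   - cvclass is a hypothetical variable name, not the one used in the lecture;
*   - "equal priors" is approximated by classifying an observation as S when
*     its held-out predicted probability exceeds the share of survivors in the
*     remaining training rows. This is one common prior adjustment and may not
*     match the exact rule used by discrim logistic.
* ---------------------------------------------------------------------------
capture generate byte surv01 = surv - 1
capture drop cvclass
generate byte cvclass = .
local n = _N
forvalues i = 1/`n' {
    * fit the model without observation i
    quietly logit surv01 total_length-l_keel_sternum if _n != `i'
    * predicted probability of survival for the held-out observation
    tempvar p
    quietly predict double `p' in `i', pr
    * share of survivors among the training rows serves as the cut-off
    quietly summarize surv01 if _n != `i'
    quietly replace cvclass = (`p' > r(mean)) in `i'
    drop `p'
}
* cross-validated classification table (rows: true group, columns: predicted)
tabulate surv cvclass
* Refitting the model once per observation is affordable here because there
* are only 49 rows; for larger data sets a k-fold scheme would be cheaper.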