. * do-file for lecture 4b of VHM 802/812, Winter 2016 . version 14 /* works also with version 13 */ . set more off . cd "h:\vhm\vhm802\data_stata" h:\vhm\vhm802\data_stata . . use nocardia.dta, clear . * multiple logistic regression model . logit casecont dneo dclox dcpct Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -54.156164 Iteration 2: log likelihood = -53.993934 Iteration 3: log likelihood = -53.993656 Iteration 4: log likelihood = -53.993656 Logistic regression Number of obs = 108 LR chi2(3) = 41.73 Prob > chi2 = 0.0000 Log likelihood = -53.993656 Pseudo R2 = 0.2787 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo | 2.212564 .5780423 3.83 0.000 1.079622 3.345506 dclox | -1.412516 .5572076 -2.53 0.011 -2.504623 -.3204093 dcpct | .0226682 .0071871 3.15 0.002 .0085817 .0367547 _cons | -2.984272 .7722467 -3.86 0.000 -4.497848 -1.470697 ------------------------------------------------------------------------------ . logit casecont dneo dclox dcpct, or Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -54.156164 Iteration 2: log likelihood = -53.993934 Iteration 3: log likelihood = -53.993656 Iteration 4: log likelihood = -53.993656 Logistic regression Number of obs = 108 LR chi2(3) = 41.73 Prob > chi2 = 0.0000 Log likelihood = -53.993656 Pseudo R2 = 0.2787 ------------------------------------------------------------------------------ casecont | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo | 9.139118 5.282797 3.83 0.000 2.943566 28.37493 dclox | .2435298 .1356966 -2.53 0.011 .0817064 .7258519 dcpct | 1.022927 .0073519 3.15 0.002 1.008619 1.037439 _cons | .0505763 .0390574 -3.86 0.000 .0111329 .2297654 ------------------------------------------------------------------------------ . . * confounding effect by dcpct? . logit casecont dneo dclox dcpct Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -54.156164 Iteration 2: log likelihood = -53.993934 Iteration 3: log likelihood = -53.993656 Iteration 4: log likelihood = -53.993656 Logistic regression Number of obs = 108 LR chi2(3) = 41.73 Prob > chi2 = 0.0000 Log likelihood = -53.993656 Pseudo R2 = 0.2787 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo | 2.212564 .5780423 3.83 0.000 1.079622 3.345506 dclox | -1.412516 .5572076 -2.53 0.011 -2.504623 -.3204093 dcpct | .0226682 .0071871 3.15 0.002 .0085817 .0367547 _cons | -2.984272 .7722467 -3.86 0.000 -4.497848 -1.470697 ------------------------------------------------------------------------------ . estimates store full . logit casecont dneo dclox Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -59.845472 Iteration 2: log likelihood = -59.671317 Iteration 3: log likelihood = -59.670766 Iteration 4: log likelihood = -59.670766 Logistic regression Number of obs = 108 LR chi2(2) = 30.38 Prob > chi2 = 0.0000 Log likelihood = -59.670766 Pseudo R2 = 0.2029 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo | 2.377409 .5504016 4.32 0.000 1.298642 3.456176 dclox | -1.009729 .5319365 -1.90 0.058 -2.052306 .0328471 _cons | -1.479612 .5013641 -2.95 0.003 -2.462267 -.4969559 ------------------------------------------------------------------------------ . estimates store red . estimates table *, se ---------------------------------------- Variable | full red -------------+-------------------------- dneo | 2.2125639 2.3774092 | .57804231 .55040157 dclox | -1.412516 -1.0097292 | .55720756 .53193649 dcpct | .02266825 | .00718712 _cons | -2.9842724 -1.4796115 | .77224672 .50136411 ---------------------------------------- legend: b/se . di (2.3774092-2.2125639)/2.3774092 /* percent change for dneo */ .06933821 . di (1.412516-1.0097292)/1.0097292 /* percent change for dclox */ .39890577 . * check association: dcpct vs. exposure . ranksum dcpct if casecont==0, by(dneo) Two-sample Wilcoxon rank-sum (Mann-Whitney) test dneo | obs rank sum expected -------------+--------------------------------- no | 29 796.5 797.5 yes | 25 688.5 687.5 -------------+--------------------------------- combined | 54 1485 1485 unadjusted variance 3322.92 adjustment for ties -379.09 ---------- adjusted variance 2943.82 Ho: dcpct(dneo==no) = dcpct(dneo==yes) z = -0.018 Prob > |z| = 0.9853 . ranksum dcpct if casecont==0, by(dclox) Two-sample Wilcoxon rank-sum (Mann-Whitney) test dclox | obs rank sum expected -------------+--------------------------------- no | 35 788 962.5 yes | 19 697 522.5 -------------+--------------------------------- combined | 54 1485 1485 unadjusted variance 3047.92 adjustment for ties -347.72 ---------- adjusted variance 2700.20 Ho: dcpct(dclox==no) = dcpct(dclox==yes) z = -3.358 Prob > |z| = 0.0008 . * check association in non-exposed units: dcpct vs. outcome . ranksum dcpct if dneo==0, by(casecont) Two-sample Wilcoxon rank-sum (Mann-Whitney) test casecont | obs rank sum expected -------------+--------------------------------- no | 29 496 507.5 yes | 5 99 87.5 -------------+--------------------------------- combined | 34 595 595 unadjusted variance 422.92 adjustment for ties -77.28 ---------- adjusted variance 345.64 Ho: dcpct(casecont==no) = dcpct(casecont==yes) z = -0.619 Prob > |z| = 0.5362 . ranksum dcpct if dclox==0, by(casecont) Two-sample Wilcoxon rank-sum (Mann-Whitney) test casecont | obs rank sum expected -------------+--------------------------------- no | 35 1042 1435 yes | 46 2279 1886 -------------+--------------------------------- combined | 81 3321 3321 unadjusted variance 11001.67 adjustment for ties -2026.17 ---------- adjusted variance 8975.50 Ho: dcpct(casecont==no) = dcpct(casecont==yes) z = -4.148 Prob > |z| = 0.0000 . . * interaction dneo*dclox . generate neoclox=dneo*dclox . logit casecont dcpct dneo dclox neoclox Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -52.23958 Iteration 2: log likelihood = -51.712191 Iteration 3: log likelihood = -51.70842 Iteration 4: log likelihood = -51.708419 Logistic regression Number of obs = 108 LR chi2(4) = 46.30 Prob > chi2 = 0.0000 Log likelihood = -51.708419 Pseudo R2 = 0.3093 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dcpct | .0226175 .0077233 2.93 0.003 .0074802 .0377549 dneo | 3.184002 .8372021 3.80 0.000 1.543116 4.824888 dclox | .4457043 1.026029 0.43 0.664 -1.565275 2.456683 neoclox | -2.551997 1.205077 -2.12 0.034 -4.913904 -.190089 _cons | -3.776896 .9932539 -3.80 0.000 -5.723638 -1.830154 ------------------------------------------------------------------------------ . logit casecont dcpct dneo dclox neoclox, or Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -52.23958 Iteration 2: log likelihood = -51.712191 Iteration 3: log likelihood = -51.70842 Iteration 4: log likelihood = -51.708419 Logistic regression Number of obs = 108 LR chi2(4) = 46.30 Prob > chi2 = 0.0000 Log likelihood = -51.708419 Pseudo R2 = 0.3093 ------------------------------------------------------------------------------ casecont | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dcpct | 1.022875 .0078999 2.93 0.003 1.007508 1.038477 dneo | 24.14318 20.21272 3.80 0.000 4.679147 124.5725 dclox | 1.56159 1.602236 0.43 0.664 .2090306 11.66605 neoclox | .0779259 .0939067 -2.12 0.034 .0073438 .8268856 _cons | .0228936 .0227392 -3.80 0.000 .0032678 .1603889 ------------------------------------------------------------------------------ . logit casecont dcpct i.dneo##i.dclox /* same thing with factor notation */ Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -52.23958 Iteration 2: log likelihood = -51.712191 Iteration 3: log likelihood = -51.70842 Iteration 4: log likelihood = -51.708419 Logistic regression Number of obs = 108 LR chi2(4) = 46.30 Prob > chi2 = 0.0000 Log likelihood = -51.708419 Pseudo R2 = 0.3093 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dcpct | .0226175 .0077233 2.93 0.003 .0074802 .0377549 | dneo | yes | 3.184002 .8372021 3.80 0.000 1.543116 4.824888 | dclox | yes | .4457043 1.026029 0.43 0.664 -1.565275 2.456683 | dneo#dclox | yes#yes | -2.551997 1.205077 -2.12 0.034 -4.913904 -.190089 | _cons | -3.776896 .9932539 -3.80 0.000 -5.723638 -1.830154 ------------------------------------------------------------------------------ . lincom 1.dneo+1.dneo#1.dclox /* log OR for dneo at dclox=1 */ ( 1) [casecont]1.dneo + [casecont]1.dneo#1.dclox = 0 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- (1) | .6320052 .8728059 0.72 0.469 -1.078663 2.342673 ------------------------------------------------------------------------------ . lincom 1.dclox+1.dneo#1.dclox /* log OR for dclox at dneo=1 */ ( 1) [casecont]1.dclox + [casecont]1.dneo#1.dclox = 0 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- (1) | -2.106292 .6657654 -3.16 0.002 -3.411169 -.8014162 ------------------------------------------------------------------------------ . lincom 1.dclox+1.dneo#1.dclox, or /* same, as OR */ ( 1) [casecont]1.dclox + [casecont]1.dneo#1.dclox = 0 ------------------------------------------------------------------------------ casecont | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- (1) | .1216883 .0810159 -3.16 0.002 .0330026 .4486931 ------------------------------------------------------------------------------ . * interaction plots, at dcpct=100 . margins i.dneo#i.dclox, at(dcpct=100) /* probability scale */ Adjusted predictions Number of obs = 108 Model VCE : OIM Expression : Pr(casecont), predict() at : dcpct = 100 ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo#dclox | no#no | .1801775 .1153654 1.56 0.118 -.0459344 .4062895 no#yes | .2555094 .1269408 2.01 0.044 .0067101 .5043088 yes#no | .8414233 .0528546 15.92 0.000 .7378302 .9450164 yes#yes | .3923522 .1343778 2.92 0.004 .1289765 .6557278 ------------------------------------------------------------------------------ . marginsplot Variables that uniquely identify margins: dneo dclox . marginsplot, x(dclox) /* switch roles of predictors in plot */ Variables that uniquely identify margins: dneo dclox . margins i.dneo#i.dclox, at(dcpct=100) expression(predict(xb)) /* logit scale */ Adjusted predictions Number of obs = 108 Model VCE : OIM Expression : Linear prediction (log odds), predict(xb) at : dcpct = 100 ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo#dclox | no#no | -1.515145 .7810071 -1.94 0.052 -3.045891 .0156008 no#yes | -1.069441 .6673214 -1.60 0.109 -2.377367 .2384851 yes#no | 1.668857 .3961219 4.21 0.000 .892472 2.445241 yes#yes | -.4374356 .563637 -0.78 0.438 -1.542144 .6672726 ------------------------------------------------------------------------------ . marginsplot Variables that uniquely identify margins: dneo dclox . . * evaluating linearity graphically . logit casecont numcow Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -72.698945 Iteration 2: log likelihood = -72.69687 Iteration 3: log likelihood = -72.69687 Logistic regression Number of obs = 108 LR chi2(1) = 4.33 Prob > chi2 = 0.0375 Log likelihood = -72.69687 Pseudo R2 = 0.0289 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- numcow | .0147207 .0077021 1.91 0.056 -.0003751 .0298164 _cons | -.7101273 .4109623 -1.73 0.084 -1.515599 .0953439 ------------------------------------------------------------------------------ . summarize numcow, detail Herd size ------------------------------------------------------------- Percentiles Smallest 1% 16 16 5% 18 16 10% 22 16 Obs 108 25% 30 17 Sum of Wgt. 108 50% 41 Mean 49.09259 Largest Std. Dev. 30.0201 75% 54 135 90% 90 140 Variance 901.2063 95% 100 175 Skewness 2.24793 99% 175 190 Kurtosis 9.539434 . scatter casecont numcow, ylabel(0 1) . tabulate numcow casecont /* little replication of numcow */ | Case - Control Herd size | no yes | Total -----------+----------------------+---------- 16 | 2 1 | 3 17 | 0 2 | 2 18 | 2 0 | 2 20 | 1 1 | 2 21 | 1 0 | 1 22 | 2 0 | 2 23 | 3 0 | 3 24 | 1 1 | 2 25 | 2 0 | 2 26 | 0 1 | 1 29 | 0 1 | 1 30 | 3 5 | 8 32 | 0 1 | 1 34 | 2 0 | 2 35 | 2 2 | 4 36 | 1 0 | 1 37 | 3 1 | 4 38 | 1 0 | 1 40 | 3 7 | 10 41 | 1 2 | 3 42 | 2 1 | 3 43 | 1 0 | 1 44 | 1 0 | 1 45 | 0 2 | 2 47 | 1 1 | 2 50 | 6 8 | 14 51 | 0 2 | 2 53 | 1 0 | 1 55 | 2 0 | 2 56 | 1 0 | 1 57 | 0 2 | 2 60 | 2 1 | 3 65 | 0 1 | 1 72 | 0 1 | 1 75 | 1 1 | 2 80 | 1 0 | 1 85 | 1 2 | 3 90 | 2 1 | 3 91 | 1 0 | 1 97 | 0 1 | 1 100 | 1 1 | 2 135 | 0 1 | 1 140 | 0 1 | 1 175 | 0 1 | 1 190 | 0 1 | 1 -----------+----------------------+---------- Total | 54 54 | 108 . lowess casecont numcow, logit /* smoothed scatterplot */ . * added code for smoothed scatterplot with restricted range . lowess casecont numcow, logit nograph generate(smoothy) . twoway (line smoothy numcow if numcow<100, sort) . * note that the following coding is totally wrong . lowess casecont numcow if numcow<100, logit . . * evaluating linearity by categorization, using lintrend add-on command . lintrend casecont numcow, groups(4) plot(log) /* linear trend plot */ The proportion and log odds of casecont by categories of numcow (Note: 4 numcow categories of equal sample size; Uses mean numcow value for each category) +------------------------------------------------------+ | numcow min max d total casecont logodds | |------------------------------------------------------| | 23.6 16 30 12 29 0.41 -0.35 | | 37.9 32 41 13 26 0.50 0.00 | | 48.2 42 53 14 26 0.54 0.15 | | 88.1 55 190 15 27 0.56 0.22 | +------------------------------------------------------+ . egen numcow4=cut(numcow), at(16,32,42,55,200) /* note: left endpoints included */ . sort numcow . browse numcow numcow4 . logit casecont i.numcow4 /* model with numcow categorized */ Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -74.182735 Iteration 2: log likelihood = -74.182689 Iteration 3: log likelihood = -74.182689 Logistic regression Number of obs = 108 LR chi2(3) = 1.35 Prob > chi2 = 0.7163 Log likelihood = -74.182689 Pseudo R2 = 0.0090 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- numcow4 | 32 | .3483068 .5440616 0.64 0.522 -.7180343 1.414648 42 | .5024575 .5449025 0.92 0.356 -.5655318 1.570447 55 | .5714504 .5405154 1.06 0.290 -.4879403 1.630841 | _cons | -.3483068 .377037 -0.92 0.356 -1.087286 .390672 ------------------------------------------------------------------------------ . testparm i.numcow4 /* combined Wald test; could also have used LR test */ ( 1) [casecont]32.numcow4 = 0 ( 2) [casecont]42.numcow4 = 0 ( 3) [casecont]55.numcow4 = 0 chi2( 3) = 1.34 Prob > chi2 = 0.7199 . . * evaluating linearity by polynomials . generate numcowsq=numcow^2 . logit casecont numcow numcowsq Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -72.611319 Iteration 2: log likelihood = -72.526454 Iteration 3: log likelihood = -72.525066 Iteration 4: log likelihood = -72.525064 Logistic regression Number of obs = 108 LR chi2(2) = 4.67 Prob > chi2 = 0.0968 Log likelihood = -72.525064 Pseudo R2 = 0.0312 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- numcow | .0009965 .0256399 0.04 0.969 -.0492568 .0512498 numcowsq | .0000981 .0001827 0.54 0.591 -.00026 .0004563 _cons | -.342618 .763331 -0.45 0.654 -1.838719 1.153483 ------------------------------------------------------------------------------ . estat vce, corr /* strong correlation of estimates */ Correlation matrix of coefficients of logit model | casecont e(V) | numcow numcowsq _cons -------------+------------------------------ casecont | numcow | 1.0000 numcowsq | -0.9444 1.0000 _cons | -0.9435 0.8224 1.0000 . * code to illustrate reduction in collinearity by centering . generate numcowct=numcow-75 /* mean numcow=49, but distrib is right-skewed */ . generate numcowctsq=numcowct^2 . logit casecont numcowct numcowctsq Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -72.611319 Iteration 2: log likelihood = -72.526454 Iteration 3: log likelihood = -72.525066 Iteration 4: log likelihood = -72.525064 Logistic regression Number of obs = 108 LR chi2(2) = 4.67 Prob > chi2 = 0.0968 Log likelihood = -72.525064 Pseudo R2 = 0.0312 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- numcowct | .0157166 .0090151 1.74 0.081 -.0019527 .0333858 numcowctsq | .0000981 .0001827 0.54 0.591 -.00026 .0004563 _cons | .2841217 .3524342 0.81 0.420 -.4066367 .9748802 ------------------------------------------------------------------------------ . estat vce, corr Correlation matrix of coefficients of logit model | casecont e(V) | numcowct numc~tsq _cons -------------+------------------------------ casecont | numcowct | 1.0000 numcowctsq | 0.3545 1.0000 _cons | 0.4882 -0.4552 1.0000 . . * other polynomials (advanced) . * orthogonal polynomials . orthpoly numcow, degree(2) generate(numcowop1 numcowop2) . pwcorr numcowop1 numcowop2 /* no correlation */ | numcow~1 numcow~2 -------------+------------------ numcowop1 | 1.0000 numcowop2 | -0.0000 1.0000 . logit casecont numcowop1 numcowop2 /* same as quadratic model above */ Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -72.611319 Iteration 2: log likelihood = -72.526454 Iteration 3: log likelihood = -72.525066 Iteration 4: log likelihood = -72.525064 Logistic regression Number of obs = 108 LR chi2(2) = 4.67 Prob > chi2 = 0.0968 Log likelihood = -72.525064 Pseudo R2 = 0.0312 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- numcowop1 | .5146503 .3092086 1.66 0.096 -.0913874 1.120688 numcowop2 | .1636 .3046459 0.54 0.591 -.433495 .7606949 _cons | .0304329 .2025842 0.15 0.881 -.3666248 .4274906 ------------------------------------------------------------------------------ . estat vce, corr Correlation matrix of coefficients of logit model | casecont e(V) | numcow~1 numcow~2 _cons -------------+------------------------------ casecont | numcowop1 | 1.0000 numcowop2 | 0.5800 1.0000 _cons | 0.2283 0.2102 1.0000 . drop numcowop1 numcowop2 . * fractional polynomials . fp , scale center replace: logit casecont (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: -------------------------------------------------------------------- numcow | df Deviance Dev. dif. P(*) Powers -------------+------------------------------------------------------ omitted | 0 149.720 5.526 0.237 linear | 1 145.394 1.200 0.753 1 m = 1 | 2 144.798 0.604 0.739 3 m = 2 | 4 144.194 0.000 -- -2 3 -------------------------------------------------------------------- (*) P = sig. level of model with m = 2 based on chi^2 of dev. dif. Logistic regression Number of obs = 108 LR chi2(2) = 5.53 Prob > chi2 = 0.0631 Log likelihood = -72.096751 Pseudo R2 = 0.0369 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- numcow_1 | -.0184979 .024051 -0.77 0.442 -.0656371 .0286412 numcow_2 | .6824194 .5482471 1.24 0.213 -.3921253 1.756964 _cons | .0015338 .2352174 0.01 0.995 -.4594838 .4625515 ------------------------------------------------------------------------------ . fp plot, r(none) . predict linp_fp, xb . scatter linp_fp numcow . * piecewise linear curve . mkspline numcowpl1 50 numcowpl2 75 numcowpl3=numcow /* knots at 50,75 */ . logit casecont numcowpl1 numcowpl2 numcowpl3 Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -71.743541 Iteration 2: log likelihood = -71.540299 Iteration 3: log likelihood = -71.535504 Iteration 4: log likelihood = -71.535501 Logistic regression Number of obs = 108 LR chi2(3) = 6.65 Prob > chi2 = 0.0840 Log likelihood = -71.535501 Pseudo R2 = 0.0444 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- numcowpl1 | .0292991 .0205295 1.43 0.154 -.0109379 .0695361 numcowpl2 | -.0320893 .0328243 -0.98 0.328 -.0964238 .0322452 numcowpl3 | .0447032 .0333335 1.34 0.180 -.0206293 .1100356 _cons | -1.157369 .7869526 -1.47 0.141 -2.699768 .3850293 ------------------------------------------------------------------------------ . predict linp_pwl, xb . scatter linp_pwl numcow . * close to significance with quadratic polynomial and piecewise linear (50,75) . . * reload clean dataset . use nocardia.dta, clear . * preparation for model building . codebook dout /* exclude due to too little variation */ ------------------------------------------------------------------------------------------------ dout dry cows turned out ------------------------------------------------------------------------------------------------ type: numeric (byte) label: dout range: [1,3] units: 1 unique values: 3 missing .: 0/108 tabulation: Freq. Numeric Label 101 1 pasture 3 2 yard 4 3 none . codebook dcprep /* exclude due to too little variation */ ------------------------------------------------------------------------------------------------ dcprep Method of dry cow preparation ------------------------------------------------------------------------------------------------ type: numeric (byte) label: dcprep range: [1,4] units: 1 unique values: 4 missing .: 0/108 tabulation: Freq. Numeric Label 2 1 none 5 2 washed 95 3 washed & 6 4 dry cow . codebook dbarn /* generate dichotomous barn type, due to too sparse category */ ------------------------------------------------------------------------------------------------ dbarn dry cow housing ------------------------------------------------------------------------------------------------ type: numeric (byte) label: dbarn range: [1,3] units: 1 unique values: 3 missing .: 0/108 tabulation: Freq. Numeric Label 35 1 freestal 67 2 tiestall 6 3 other . generate dbarn2=dbarn . replace dbarn2=1 if dbarn==3 (6 real changes made) . codebook doth /* good distribution on two categories */ ------------------------------------------------------------------------------------------------ doth Other dry cow products used ------------------------------------------------------------------------------------------------ type: numeric (byte) label: yesno range: [0,1] units: 1 unique values: 2 missing .: 0/108 tabulation: Freq. Numeric Label 48 0 no 60 1 yes . * stepwise model selection procedures . stepwise, pr(.1): logit casecont dneo dclox dcpct doth numcow dbarn2 bscc /* backwards elimina > tion */ begin with full model p = 0.6989 >= 0.1000 removing bscc p = 0.5239 >= 0.1000 removing numcow Logistic regression Number of obs = 107 LR chi2(5) = 52.92 Prob > chi2 = 0.0000 Log likelihood = -47.703634 Pseudo R2 = 0.3568 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo | 2.620729 .6839433 3.83 0.000 1.280225 3.961233 dclox | -1.624902 .6240902 -2.60 0.009 -2.848096 -.4017076 dcpct | .0267303 .0081368 3.29 0.001 .0107825 .0426782 doth | -1.028909 .555921 -1.85 0.064 -2.118494 .060676 dbarn2 | -1.433304 .6047608 -2.37 0.018 -2.618613 -.2479943 _cons | -.6277067 1.109318 -0.57 0.571 -2.80193 1.546517 ------------------------------------------------------------------------------ . stepwise, pe(.099): logit casecont dneo dclox dcpct doth numcow dbarn2 bscc /* forward selecti > on */ begin with empty model p = 0.0000 < 0.0990 adding dneo p = 0.0055 < 0.0990 adding dcpct p = 0.0052 < 0.0990 adding dclox p = 0.0210 < 0.0990 adding dbarn2 p = 0.0642 < 0.0990 adding doth Logistic regression Number of obs = 107 LR chi2(5) = 52.92 Prob > chi2 = 0.0000 Log likelihood = -47.703634 Pseudo R2 = 0.3568 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo | 2.620729 .6839433 3.83 0.000 1.280225 3.961233 dcpct | .0267303 .0081368 3.29 0.001 .0107825 .0426782 dclox | -1.624902 .6240902 -2.60 0.009 -2.848096 -.4017076 dbarn2 | -1.433304 .6047608 -2.37 0.018 -2.618613 -.2479943 doth | -1.028909 .555921 -1.85 0.064 -2.118494 .060676 _cons | -.6277067 1.109318 -0.57 0.571 -2.80193 1.546517 ------------------------------------------------------------------------------ . stepwise, pe(.099) pr(.1): logit casecont dneo dclox dcpct doth numcow dbarn2 bscc /* backwise > stepwise */ begin with full model p = 0.6989 >= 0.1000 removing bscc p = 0.5239 >= 0.1000 removing numcow Logistic regression Number of obs = 107 LR chi2(5) = 52.92 Prob > chi2 = 0.0000 Log likelihood = -47.703634 Pseudo R2 = 0.3568 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo | 2.620729 .6839433 3.83 0.000 1.280225 3.961233 dclox | -1.624902 .6240902 -2.60 0.009 -2.848096 -.4017076 dcpct | .0267303 .0081368 3.29 0.001 .0107825 .0426782 doth | -1.028909 .555921 -1.85 0.064 -2.118494 .060676 dbarn2 | -1.433304 .6047608 -2.37 0.018 -2.618613 -.2479943 _cons | -.6277067 1.109318 -0.57 0.571 -2.80193 1.546517 ------------------------------------------------------------------------------ . * conclusion: all point to same model . * refit with all observations included . logit casecont dneo dclox dcpct doth dbarn2 Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -49.438676 Iteration 2: log likelihood = -49.019308 Iteration 3: log likelihood = -49.017623 Iteration 4: log likelihood = -49.017623 Logistic regression Number of obs = 108 LR chi2(5) = 51.68 Prob > chi2 = 0.0000 Log likelihood = -49.017623 Pseudo R2 = 0.3452 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo | 2.668593 .6773553 3.94 0.000 1.341001 3.996185 dclox | -1.420506 .6011634 -2.36 0.018 -2.598764 -.2422469 dcpct | .0251328 .007934 3.17 0.002 .0095826 .0406831 doth | -1.097966 .5499217 -2.00 0.046 -2.175793 -.0201393 dbarn2 | -1.383821 .5986505 -2.31 0.021 -2.557154 -.2104872 _cons | -.5857322 1.09636 -0.53 0.593 -2.734558 1.563093 ------------------------------------------------------------------------------ . * continue manually from model with interaction dneo*dclox included . logit casecont i.dneo##i.dclox dcpct doth dbarn2 Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -48.325988 Iteration 2: log likelihood = -47.571796 Iteration 3: log likelihood = -47.565495 Iteration 4: log likelihood = -47.565494 Logistic regression Number of obs = 108 LR chi2(6) = 54.59 Prob > chi2 = 0.0000 Log likelihood = -47.565494 Pseudo R2 = 0.3646 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo | yes | 3.542159 .934735 3.79 0.000 1.710112 5.374205 | dclox | yes | .2270733 1.141361 0.20 0.842 -2.009953 2.4641 | dneo#dclox | yes#yes | -2.218828 1.316287 -1.69 0.092 -4.798703 .3610475 | dcpct | .024149 .0082936 2.91 0.004 .0078938 .0404041 doth | -.8165712 .5667784 -1.44 0.150 -1.927436 .294294 dbarn2 | -1.420535 .6030892 -2.36 0.019 -2.602568 -.2385017 _cons | -1.355178 1.280715 -1.06 0.290 -3.865334 1.154977 ------------------------------------------------------------------------------ . estat vce, corr /* look for correlation doth vs dneo#dclox */ Correlation matrix of coefficients of logit model | casecont | 1. 1. 1.dneo# e(V) | dneo dclox 1.dclox dcpct doth dbarn2 _cons -------------+---------------------------------------------------------------------- casecont | 1.dneo | 1.0000 1.dclox | 0.6412 1.0000 1.dneo#| 1.dclox | -0.6622 -0.8445 1.0000 dcpct | 0.0907 -0.1916 0.0300 1.0000 doth | 0.0989 0.3412 -0.2729 -0.2363 1.0000 dbarn2 | -0.3792 -0.0792 0.0846 -0.0856 0.1030 1.0000 _cons | -0.3874 -0.4065 0.4060 -0.3844 -0.3192 -0.5477 1.0000 . table casecont dneo dclox, by(doth) /* table to investigate change in doth estimate */ ------------------------------------ Other dry | cow | products | Cloxacillin used on farm used and |and Neomycin used on farm Case - | --- no --- --- yes -- Control | no yes no yes ----------+------------------------- no | no | 6 6 6 2 yes | 23 2 3 ----------+------------------------- yes | no | 14 9 3 8 yes | 2 21 1 2 ------------------------------------ . logit casecont i.dneo##i.dclox dcpct dbarn2 /* final model */ Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -49.42205 Iteration 2: log likelihood = -48.660445 Iteration 3: log likelihood = -48.652782 Iteration 4: log likelihood = -48.65278 Logistic regression Number of obs = 108 LR chi2(5) = 52.41 Prob > chi2 = 0.0000 Log likelihood = -48.65278 Pseudo R2 = 0.3501 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo | yes | 3.758082 .9296064 4.04 0.000 1.936087 5.580077 | dclox | yes | .7683728 1.064759 0.72 0.471 -1.318516 2.855262 | dneo#dclox | yes#yes | -2.782955 1.257469 -2.21 0.027 -5.247549 -.3183622 | dcpct | .0221456 .008126 2.73 0.006 .0062189 .0380723 dbarn2 | -1.370958 .5895722 -2.33 0.020 -2.526499 -.2154181 _cons | -1.982021 1.239291 -1.60 0.110 -4.410987 .4469445 ------------------------------------------------------------------------------ . * display statistics and compute tests for dclox . estimates store full . estimates stats /* display model-fit statistics */ Akaike's information criterion and Bayesian information criterion ----------------------------------------------------------------------------- Model | Obs ll(null) ll(model) df AIC BIC -------------+--------------------------------------------------------------- full | 108 -74.8599 -48.65278 6 109.3056 125.3983 ----------------------------------------------------------------------------- Note: N=Obs used in calculating BIC; see [R] BIC note. . testparm i.dclox i.dneo#i.dclox /* Wald test for dclox */ ( 1) [casecont]1.dclox = 0 ( 2) [casecont]1.dneo#1.dclox = 0 chi2( 2) = 9.06 Prob > chi2 = 0.0108 . logit casecont dneo dcpct dbarn2 /* without dclox */ Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -53.932561 Iteration 2: log likelihood = -53.628384 Iteration 3: log likelihood = -53.626938 Iteration 4: log likelihood = -53.626938 Logistic regression Number of obs = 108 LR chi2(3) = 42.47 Prob > chi2 = 0.0000 Log likelihood = -53.626938 Pseudo R2 = 0.2836 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo | 2.877515 .6503633 4.42 0.000 1.602827 4.152204 dcpct | .0188502 .0073545 2.56 0.010 .0044357 .0332647 dbarn2 | -1.430442 .5645149 -2.53 0.011 -2.536871 -.3240129 _cons | -1.221492 1.067064 -1.14 0.252 -3.312899 .8699153 ------------------------------------------------------------------------------ . estimates store red . lrtest full, stats /* LR test for dclox */ Likelihood-ratio test LR chi2(2) = 9.95 (Assumption: red nested in full) Prob > chi2 = 0.0069 Akaike's information criterion and Bayesian information criterion ----------------------------------------------------------------------------- Model | Obs ll(null) ll(model) df AIC BIC -------------+--------------------------------------------------------------- red | 108 -74.8599 -53.62694 4 115.2539 125.9824 full | 108 -74.8599 -48.65278 6 109.3056 125.3983 ----------------------------------------------------------------------------- Note: N=Obs used in calculating BIC; see [R] BIC note. . . * categorical predictor effects: effects of dbarn and dneo#dclox . logit casecont i.dneo##i.dclox dcpct i.dbarn /* final model */ Iteration 0: log likelihood = -74.859896 Iteration 1: log likelihood = -49.387707 Iteration 2: log likelihood = -48.572826 Iteration 3: log likelihood = -48.56345 Iteration 4: log likelihood = -48.563447 Logistic regression Number of obs = 108 LR chi2(6) = 52.59 Prob > chi2 = 0.0000 Log likelihood = -48.563447 Pseudo R2 = 0.3513 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dneo | yes | 3.840981 .9613888 4.00 0.000 1.956693 5.725268 | dclox | yes | .85191 1.086531 0.78 0.433 -1.277651 2.981471 | dneo#dclox | yes#yes | -2.843304 1.269394 -2.24 0.025 -5.33127 -.3553377 | dcpct | .0215351 .0082369 2.61 0.009 .0053911 .0376792 | dbarn | tiestall | -1.488824 .662023 -2.25 0.025 -2.786365 -.1912831 other | -.4818957 1.137152 -0.42 0.672 -2.710672 1.74688 | _cons | -3.272424 1.063408 -3.08 0.002 -5.356666 -1.188183 ------------------------------------------------------------------------------ . testparm i.dbarn /* overall test */ ( 1) [casecont]2.dbarn = 0 ( 2) [casecont]3.dbarn = 0 chi2( 2) = 5.46 Prob > chi2 = 0.0653 . lincom 2.dbarn-1.dbarn /* already in table of estimates */ ( 1) - [casecont]1b.dbarn + [casecont]2.dbarn = 0 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- (1) | -1.488824 .662023 -2.25 0.025 -2.786365 -.1912831 ------------------------------------------------------------------------------ . lincom 3.dbarn-1.dbarn /* also in table of estimates */ ( 1) - [casecont]1b.dbarn + [casecont]3.dbarn = 0 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- (1) | -.4818957 1.137152 -0.42 0.672 -2.710672 1.74688 ------------------------------------------------------------------------------ . lincom 3.dbarn-2.dbarn /* not in table of estimates */ ( 1) - [casecont]2.dbarn + [casecont]3.dbarn = 0 ------------------------------------------------------------------------------ casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- (1) | 1.006929 1.033547 0.97 0.330 -1.018786 3.032643 ------------------------------------------------------------------------------ . pwcompare dbarn i.dneo#i.dclox Pairwise comparisons of marginal linear predictions Margins : asbalanced ------------------------------------------------------------------------ | Unadjusted | Contrast Std. Err. [95% Conf. Interval] -----------------------+------------------------------------------------ casecont | dbarn | tiestall vs freestal | -1.488824 .662023 -2.786365 -.1912831 other vs freestal | -.4818957 1.137152 -2.710672 1.74688 other vs tiestall | 1.006929 1.033547 -1.018786 3.032643 | dneo#dclox | (no#yes) vs (no#no) | .85191 1.086531 -1.277651 2.981471 (yes#no) vs (no#no) | 3.840981 .9613888 1.956693 5.725268 (yes#yes) vs (no#no) | 1.849587 1.077318 -.2619175 3.961091 (yes#no) vs (no#yes) | 2.989071 .8317627 1.358846 4.619295 (yes#yes) vs (no#yes) | .9976768 .9360594 -.8369658 2.83232 (yes#yes) vs (yes#no) | -1.991394 .6948229 -3.353222 -.6295658 ------------------------------------------------------------------------ . pwcompare dbarn i.dneo#i.dclox, pv /* option pv to get P-values */ Pairwise comparisons of marginal linear predictions Margins : asbalanced --------------------------------------------------------------- | Unadjusted | Contrast Std. Err. z P>|z| -----------------------+--------------------------------------- casecont | dbarn | tiestall vs freestal | -1.488824 .662023 -2.25 0.025 other vs freestal | -.4818957 1.137152 -0.42 0.672 other vs tiestall | 1.006929 1.033547 0.97 0.330 | dneo#dclox | (no#yes) vs (no#no) | .85191 1.086531 0.78 0.433 (yes#no) vs (no#no) | 3.840981 .9613888 4.00 0.000 (yes#yes) vs (no#no) | 1.849587 1.077318 1.72 0.086 (yes#no) vs (no#yes) | 2.989071 .8317627 3.59 0.000 (yes#yes) vs (no#yes) | .9976768 .9360594 1.07 0.287 (yes#yes) vs (yes#no) | -1.991394 .6948229 -2.87 0.004 --------------------------------------------------------------- . pwcompare dbarn i.dneo#i.dclox, pv mcompare(bon) /* bonferroni adjustment */ Pairwise comparisons of marginal linear predictions Margins : asbalanced --------------------------- | Number of | Comparisons -------------+------------- casecont | dbarn | 3 dneo#dclox | 6 --------------------------- --------------------------------------------------------------- | Bonferroni | Contrast Std. Err. z P>|z| -----------------------+--------------------------------------- casecont | dbarn | tiestall vs freestal | -1.488824 .662023 -2.25 0.074 other vs freestal | -.4818957 1.137152 -0.42 1.000 other vs tiestall | 1.006929 1.033547 0.97 0.990 | dneo#dclox | (no#yes) vs (no#no) | .85191 1.086531 0.78 1.000 (yes#no) vs (no#no) | 3.840981 .9613888 4.00 0.000 (yes#yes) vs (no#no) | 1.849587 1.077318 1.72 0.516 (yes#no) vs (no#yes) | 2.989071 .8317627 3.59 0.002 (yes#yes) vs (no#yes) | .9976768 .9360594 1.07 1.000 (yes#yes) vs (yes#no) | -1.991394 .6948229 -2.87 0.025 --------------------------------------------------------------- . . * generalised linear models for binary/binomial data . glm casecont dcpct dneo dclox, family(binomial) /* same as logit */ Iteration 0: log likelihood = -54.197242 Iteration 1: log likelihood = -53.993852 Iteration 2: log likelihood = -53.993656 Iteration 3: log likelihood = -53.993656 Generalized linear models No. of obs = 108 Optimization : ML Residual df = 104 Scale parameter = 1 Deviance = 107.9873114 (1/df) Deviance = 1.03834 Pearson = 117.392877 (1/df) Pearson = 1.128778 Variance function: V(u) = u*(1-u) [Bernoulli] Link function : g(u) = ln(u/(1-u)) [Logit] AIC = 1.073957 Log likelihood = -53.99365568 BIC = -378.9543 ------------------------------------------------------------------------------ | OIM casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dcpct | .0226682 .0071871 3.15 0.002 .0085817 .0367547 dneo | 2.212564 .5780423 3.83 0.000 1.079622 3.345506 dclox | -1.412516 .5572076 -2.53 0.011 -2.504623 -.3204093 _cons | -2.984272 .7722467 -3.86 0.000 -4.497848 -1.470697 ------------------------------------------------------------------------------ . glm casecont dcpct dneo dclox, family(binomial) link(probit) Iteration 0: log likelihood = -54.370785 Iteration 1: log likelihood = -54.264374 Iteration 2: log likelihood = -54.264207 Iteration 3: log likelihood = -54.264207 Generalized linear models No. of obs = 108 Optimization : ML Residual df = 104 Scale parameter = 1 Deviance = 108.528413 (1/df) Deviance = 1.043542 Pearson = 114.9320864 (1/df) Pearson = 1.105116 Variance function: V(u) = u*(1-u) [Bernoulli] Link function : g(u) = invnorm(u) [Probit] AIC = 1.078967 Log likelihood = -54.26420652 BIC = -378.4132 ------------------------------------------------------------------------------ | OIM casecont | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- dcpct | .0134078 .0041296 3.25 0.001 .0053139 .0215017 dneo | 1.258567 .3168271 3.97 0.000 .6375976 1.879537 dclox | -.812481 .3213006 -2.53 0.011 -1.442219 -.1827433 _cons | -1.703551 .4132833 -4.12 0.000 -2.513571 -.8935307 ------------------------------------------------------------------------------