------------------------------------------------------------------------------------------------------------------------------------------- name: log: C:\vhm812-data\l2b_linear_reg_diag.txt log type: text opened on: 16 Jan 2023, 08:06:54 . set more off . . * open the DAISY dataset . use daisy2red.dta, clear . **# Create variables . gen parity1=parity-1 . gen calv_mth=month(calv_dt) . gen aut_calv=(calv_mth>=2 & calv_mth<=7) if !missing(calv_mth) . label value aut_calv noyes . gen hs100_ctr=(herd_size-251)/100 . gen hs100_ctr_sq=hs100_ctr^2 . * save new dataset to be used later, not needed for this exercise . save daisy2red_01, replace file daisy2red_01.dta saved . . **# Final model . reg wpc hs100_ctr hs100_ctr_sq parity1 i.aut_calv i.twin i.dyst i.rp##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 13.22 Model | 296062.694 9 32895.8549 Prob > F = 0.0000 Residual | 3892027.86 1,564 2488.50886 R-squared = 0.0707 -------------+---------------------------------- Adj R-squared = 0.0653 Total | 4188090.56 1,573 2662.48605 Root MSE = 49.885 ------------------------------------------------------------------------------ wpc | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- hs100_ctr | 19.85708 2.163397 9.18 0.000 15.61361 24.10054 hs100_ctr_sq | 11.13827 3.111145 3.58 0.000 5.035817 17.24073 parity1 | 1.13721 .8583103 1.32 0.185 -.5463501 2.82077 | aut_calv | yes | -8.263839 2.537751 -3.26 0.001 -13.24159 -3.286086 | twin | yes | 20.68314 9.845165 2.10 0.036 1.37203 39.99425 | dyst | yes | 11.70041 5.462576 2.14 0.032 .985666 22.41516 | rp | yes | 5.98687 4.811976 1.24 0.214 -3.451734 15.42547 | vag_disch | yes | 1.228196 7.161395 0.17 0.864 -12.81875 15.27514 | rp#vag_disch | yes#yes | 22.85194 12.51605 1.83 0.068 -1.698056 47.40194 | _cons | 64.33029 2.634114 24.42 0.000 59.16352 69.49705 ------------------------------------------------------------------------------ . *initial diagnostic . capture drop fit stdres . predict fit, xb . predict stdres, rstandard . . **# Assumptions . *homoskedasticity . scatter stdres fit . egen fitcat=cut(fit), at(0 50 55 60 65 70 75 80 85 90 95 200) icodes . tabstat stdres, s(mean sd count) by(fitcat) Summary for variables: stdres Group variable: fitcat fitcat | Mean SD N ---------+------------------------------ 0 | -.0570064 .7397317 92 1 | -.0073181 .899948 179 2 | .0578738 .9474714 210 3 | -.0595596 .8973734 233 4 | .0241511 1.009195 240 5 | .1411393 1.186747 129 6 | -.0596733 1.042643 135 7 | -.0978648 1.090551 119 8 | .1050465 1.206279 93 9 | -.0443726 1.063637 107 10 | -.0275786 .9893797 37 ---------+------------------------------ Total | .0000153 1.000497 1574 ---------------------------------------- . estat hettest Breusch–Pagan/Cook–Weisberg test for heteroskedasticity Assumption: Normal error terms Variable: Fitted values of wpc H0: Constant variance chi2(1) = 20.58 Prob > chi2 = 0.0000 . estat imtest Cameron & Trivedi's decomposition of IM-test -------------------------------------------------- Source | chi2 df p ---------------------+---------------------------- Heteroskedasticity | 74.11 44 0.0030 Skewness | 143.84 9 0.0000 Kurtosis | 33.27 1 0.0000 ---------------------+---------------------------- Total | 251.22 54 0.0000 -------------------------------------------------- . . *normality . qnorm stdres . hist stdres, normal (bin=31, start=-1.5436347, width=.18891791) . summ stdres, d Standardized residuals ------------------------------------------------------------- Percentiles Smallest 1% -1.295559 -1.543635 5% -1.085226 -1.491268 10% -.9411177 -1.445124 Obs 1,574 25% -.7014489 -1.43959 Sum of wgt. 1,574 50% -.3049904 Mean .0000153 Largest Std. dev. 1.000497 75% .4302664 3.716394 90% 1.438834 3.910809 Variance 1.000993 95% 2.065976 4.020359 Skewness 1.371514 99% 3.24163 4.31282 Kurtosis 4.72451 . swilk stdres Shapiro–Wilk W test for normal data Variable | Obs W V z Prob>z -------------+------------------------------------------------------ stdres | 1,574 0.87871 115.660 11.977 0.00000 . . **# Transforming wpc . *boxcox uses old stata syntax! . xi:boxcox wpc hs100_ctr hs100_ctr_sq parity1 i.aut_calv i.twin i.dyst i.rp*i.vag_disch i.aut_calv _Iaut_calv_0-1 (naturally coded; _Iaut_calv_0 omitted) i.twin _Itwin_0-1 (naturally coded; _Itwin_0 omitted) i.dyst _Idyst_0-1 (naturally coded; _Idyst_0 omitted) i.rp _Irp_0-1 (naturally coded; _Irp_0 omitted) i.vag_disch _Ivag_disch_0-1 (naturally coded; _Ivag_disch_0 omitted) i.rp*i.vag_di~h _IrpXvag_#_# (coded as above) Fitting comparison model Iteration 0: log likelihood = -8439.9903 Iteration 1: log likelihood = -8082.9461 Iteration 2: log likelihood = -8035.2171 Iteration 3: log likelihood = -8034.9512 Iteration 4: log likelihood = -8034.9511 Fitting full model Iteration 0: log likelihood = -8382.2918 Iteration 1: log likelihood = -8022.9541 Iteration 2: log likelihood = -7958.3552 Iteration 3: log likelihood = -7957.985 Iteration 4: log likelihood = -7957.9849 Number of obs = 1,574 LR chi2(9) = 153.93 Log likelihood = -7957.9849 Prob > chi2 = 0.000 ------------------------------------------------------------------------------ wpc | Coefficient Std. err. z P>|z| [95% conf. interval] -------------+---------------------------------------------------------------- /theta | .1097595 .0270646 4.06 0.000 .0567138 .1628052 ------------------------------------------------------------------------------ Estimates of scale-variant parameters ---------------------------- | Coefficient -------------+-------------- Notrans | hs100_ctr | .5236879 hs100_ctr_sq | .316867 parity1 | .0207726 _Iaut_calv_1 | -.2119782 _Itwin_1 | .5994942 _Idyst_1 | .1803076 _Irp_1 | .1697644 _Ivag_disc~1 | -.0333896 _IrpXvag_1_1 | .6352271 _cons | 4.90172 -------------+-------------- /sigma | 1.119778 ---------------------------- --------------------------------------------------------- Test Restricted LR statistic H0: log likelihood chi2 Prob > chi2 --------------------------------------------------------- theta = -1 -9664.734 3413.50 0.000 theta = 0 -7966.6087 17.25 0.000 theta = 1 -8382.2918 848.61 0.000 --------------------------------------------------------- . gen wpc_ln=ln(wpc) . reg wpc_ln hs100_ctr hs100_ctr_sq parity1 i.aut_calv i.twin i.dyst i.rp##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 18.05 Model | 86.9474063 9 9.66082293 Prob > F = 0.0000 Residual | 836.874538 1,564 .535086022 R-squared = 0.0941 -------------+---------------------------------- Adj R-squared = 0.0889 Total | 923.821945 1,573 .587299393 Root MSE = .7315 ------------------------------------------------------------------------------ wpc_ln | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- hs100_ctr | .343658 .0317233 10.83 0.000 .2814333 .4058827 hs100_ctr_sq | .2118728 .0456207 4.64 0.000 .1223885 .301357 parity1 | .0129844 .012586 1.03 0.302 -.0117028 .0376715 | aut_calv | yes | -.1371621 .0372127 -3.69 0.000 -.2101542 -.0641701 | twin | yes | .3927426 .1443661 2.72 0.007 .1095711 .6759141 | dyst | yes | .1109405 .0801013 1.39 0.166 -.0461768 .2680578 | rp | yes | .1123166 .0705612 1.59 0.112 -.0260878 .250721 | vag_disch | yes | -.0260135 .1050122 -0.25 0.804 -.231993 .1799661 | rp#vag_disch | yes#yes | .4136999 .183531 2.25 0.024 .0537072 .7736926 | _cons | 3.888587 .0386257 100.67 0.000 3.812823 3.96435 ------------------------------------------------------------------------------ . **model checking . capture drop fit stdres . predict fit, xb . predict stdres, rstandard . *homoskedasticity . estat hettest Breusch–Pagan/Cook–Weisberg test for heteroskedasticity Assumption: Normal error terms Variable: Fitted values of wpc_ln H0: Constant variance chi2(1) = 19.98 Prob > chi2 = 0.0000 . estat imtest Cameron & Trivedi's decomposition of IM-test -------------------------------------------------- Source | chi2 df p ---------------------+---------------------------- Heteroskedasticity | 48.63 44 0.2919 Skewness | 8.75 9 0.4605 Kurtosis | 0.52 1 0.4698 ---------------------+---------------------------- Total | 57.91 54 0.3332 -------------------------------------------------- . scatter stdres fit . *normality . qnorm stdres . hist stdres, normal (bin=31, start=-5.3996067, width=.25507099) . summ stdres, d Standardized residuals ------------------------------------------------------------- Percentiles Smallest 1% -2.211829 -5.399607 5% -1.564966 -4.989504 10% -1.242075 -3.61681 Obs 1,574 25% -.7143057 -3.316 Sum of wgt. 1,574 50% -.0133218 Mean .000018 Largest Std. dev. 1.000267 75% .7199247 2.358784 90% 1.333008 2.373729 Variance 1.000533 95% 1.628151 2.452677 Skewness -.1738834 99% 2.15899 2.507594 Kurtosis 3.382854 . swilk stdres Shapiro–Wilk W test for normal data Variable | Obs W V z Prob>z -------------+------------------------------------------------------ stdres | 1,574 0.98974 9.787 5.751 0.00000 . * check for collinearity issues . estat vif Variable | VIF 1/VIF -------------+---------------------- hs100_ctr | 1.14 0.878856 hs100_ctr_sq | 1.14 0.879714 parity1 | 1.04 0.962306 1.aut_calv | 1.02 0.985045 1.twin | 1.03 0.967484 1.dyst | 1.06 0.943538 1.rp | 1.26 0.796702 1.vag_disch | 1.60 0.624261 rp#vag_disch | 1 1 | 1.85 0.539810 -------------+---------------------- Mean VIF | 1.24 . estat vce, corr Correlation matrix of coefficients of regress model | 1. 1. 1. 1. 1. 1.rp# e(V) | hs100_~r hs100_~q parity1 aut_calv twin dyst rp vag_di~h 1.vag_~h _cons -------------+---------------------------------------------------------------------------------------------------- hs100_ctr | 1.0000 hs100_ctr_sq | 0.3269 1.0000 parity1 | -0.0415 -0.0468 1.0000 1.aut_calv | 0.0521 -0.0265 -0.0500 1.0000 1.twin | 0.0326 0.0464 -0.0666 -0.0487 1.0000 1.dyst | 0.1146 0.0717 0.1303 0.0023 -0.0214 1.0000 1.rp | 0.0230 0.0540 0.0318 0.0438 -0.0807 -0.0656 1.0000 1.vag_disch | 0.0304 0.0698 0.0649 0.0387 -0.0383 -0.1188 0.0742 1.0000 1.rp#| 1.vag_disch | -0.0184 -0.0571 -0.0644 -0.0024 -0.0429 0.0861 -0.3873 -0.5752 1.0000 _cons | -0.1716 -0.4415 -0.5404 -0.4246 0.0004 -0.2091 -0.1974 -0.1711 0.1133 1.0000 . . **linearity . * check modelling of herd_size and parity . lowess stdres herd_size . lowess stdres parity . * looking at herd_size without herd_size in the model . reg wpc_ln parity1 i.aut_calv i.twin i.dyst i.rp##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 5.82 Model | 23.4240047 7 3.34628639 Prob > F = 0.0000 Residual | 900.39794 1,566 .574966756 R-squared = 0.0254 -------------+---------------------------------- Adj R-squared = 0.0210 Total | 923.821945 1,573 .587299393 Root MSE = .75827 ------------------------------------------------------------------------------ wpc_ln | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- parity1 | .0191623 .0130272 1.47 0.142 -.0063904 .0447149 | aut_calv | yes | -.156154 .0384813 -4.06 0.000 -.2316343 -.0806737 | twin | yes | .3353984 .1494624 2.24 0.025 .0422309 .6285659 | dyst | yes | .0081531 .0824313 0.10 0.921 -.1535343 .1698404 | rp | yes | .0906877 .0730356 1.24 0.215 -.0525701 .2339455 | vag_disch | yes | -.0684004 .1085865 -0.63 0.529 -.2813906 .1445898 | rp#vag_disch | yes#yes | .4618901 .1899367 2.43 0.015 .089333 .8344472 | _cons | 3.978786 .0359075 110.81 0.000 3.908354 4.049218 ------------------------------------------------------------------------------ . predict stdres1, rstandard . lowess stdres1 herd_size . * looking at parity without parity in the model . reg wpc_ln hs100_ctr hs100_ctr_sq i.aut_calv i.twin i.dyst i.rp##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 20.18 Model | 86.3779092 8 10.7972387 Prob > F = 0.0000 Residual | 837.444036 1,565 .53510801 R-squared = 0.0935 -------------+---------------------------------- Adj R-squared = 0.0889 Total | 923.821945 1,573 .587299393 Root MSE = .73151 ------------------------------------------------------------------------------ wpc_ln | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- hs100_ctr | .3450169 .0316966 10.88 0.000 .2828447 .4071892 hs100_ctr_sq | .2140753 .0455717 4.70 0.000 .1246873 .3034633 | aut_calv | yes | -.1352417 .0371669 -3.64 0.000 -.2081438 -.0623395 | twin | yes | .4026686 .1440481 2.80 0.005 .1201211 .6852162 | dyst | yes | .1001739 .0794202 1.26 0.207 -.0556073 .2559551 | rp | yes | .1099998 .0705269 1.56 0.119 -.0283373 .2483368 | vag_disch | yes | -.0330431 .1047931 -0.32 0.753 -.2385927 .1725065 | rp#vag_disch | yes#yes | .4258954 .1831536 2.33 0.020 .066643 .7851478 | _cons | 3.910122 .0324999 120.31 0.000 3.846374 3.97387 ------------------------------------------------------------------------------ . predict stdres2, rstandard . lowess stdres2 parity1 . . **# Residuals and diagnostics . reg wpc_ln hs100_ctr hs100_ctr_sq parity1 aut_calv twin i.dyst i.rp##i.vag_disch if wpc>1 Source | SS df MS Number of obs = 1,572 -------------+---------------------------------- F(9, 1562) = 18.15 Model | 84.5110938 9 9.39012153 Prob > F = 0.0000 Residual | 807.934865 1,562 .517243831 R-squared = 0.0947 -------------+---------------------------------- Adj R-squared = 0.0895 Total | 892.445959 1,571 .568075085 Root MSE = .7192 ------------------------------------------------------------------------------ wpc_ln | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- hs100_ctr | .3391632 .031199 10.87 0.000 .2779669 .4003596 hs100_ctr_sq | .2026965 .0448706 4.52 0.000 .1146836 .2907094 parity1 | .0113333 .0123763 0.92 0.360 -.0129427 .0356093 aut_calv | -.1369515 .0366092 -3.74 0.000 -.2087598 -.0651432 twin | .3894441 .1419397 2.74 0.006 .1110316 .6678566 | dyst | yes | .1033889 .0787611 1.31 0.189 -.0510998 .2578777 | rp | yes | .1060341 .0693799 1.53 0.127 -.0300535 .2421216 | vag_disch | yes | -.0332815 .1032512 -0.32 0.747 -.2358071 .1692441 | rp#vag_disch | yes#yes | .4223009 .1804488 2.34 0.019 .0683535 .7762484 | _cons | 3.901024 .0380169 102.61 0.000 3.826455 3.975594 ------------------------------------------------------------------------------ . capture { . . predict fit, xb . predict stdres, rstandard . predict delres, rstudent . predict lev, leverage . predict cook, cooksd . predict dfit, dfit (2 missing values generated) . dfbeta Generating DFBETA variables ... (2 missing values generated) _dfbeta_1: DFBETA hs100_ctr (2 missing values generated) _dfbeta_2: DFBETA hs100_ctr_sq (2 missing values generated) _dfbeta_3: DFBETA parity1 (2 missing values generated) _dfbeta_4: DFBETA aut_calv (2 missing values generated) _dfbeta_5: DFBETA twin (2 missing values generated) _dfbeta_6: DFBETA 1.dyst (2 missing values generated) _dfbeta_7: DFBETA 1.rp (2 missing values generated) _dfbeta_8: DFBETA 1.vag_disch (2 missing values generated) _dfbeta_9: DFBETA 1.rp#1.vag_disch . scalar nobs=1574 . scalar nparam=10 . . ** formatting several variables at once to show only 3 decimals - advanced command - ignore for now . foreach var in wpc_ln fit stdres delres lev cook dfit _dfbeta* { 2. format `var' %5.3f 3. } . . * examine outliers . * number of standardized residuals outside -2/+2 and -3/+3 . * expect 5% (n=79) outside -2,+2 and 0.3% (n=14) outside -3,+3 . count if abs(stdres)>2 /* n=50 */ 51 . count if abs(stdres)>3 /* n=6, so one more very extreme residual values than expected */ 7 . sum delres, d Studentized residuals ------------------------------------------------------------- Percentiles Smallest 1% -2.260285 -5.55848 5% -1.590983 -5.131193 10% -1.271972 -3.705058 Obs 1,574 25% -.7295877 -3.394443 Sum of wgt. 1,574 50% -.0194657 Mean -.0067797 Largest Std. dev. 1.018222 75% .7256545 2.402748 90% 1.349921 2.404035 Variance 1.036777 95% 1.645225 2.495872 Skewness -.1811443 99% 2.184024 2.559009 Kurtosis 3.424196 . * outlier test based on deletion residuals . display 2*nobs*ttail(nobs-nparam-1, 5.02) /* P=0.0009 so S */ .0009064 . display invttail(nobs-nparam-1,.025/nobs) /* values >= abs(4.17) outliers*/ 4.1726361 . br if abs(delres) >=4 . **there are two observations with extreme residual values (outliers) . ** two cows with 1 days interval from the end of wp to conception . ** cows 2272 (herd 106) and cow 1032 (herd 4) . . *most extreme residuals . sort stdres . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres in 1/10 /* most extreme negative residuals */ +------------------------------------------------------------------------------------------------------+ | wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres delres | |------------------------------------------------------------------------------------------------------| 1. | 1 0.000 3.956 no 263 1 no no no no -5.506 -5.558 | 2. | 1 0.000 3.656 yes 201 1 no no no no -5.090 -5.131 | 3. | 3 1.099 3.749 yes 235 3 no no no no -3.690 -3.705 | 4. | 4 1.386 3.816 no 201 3 no no no no -3.383 -3.394 | 5. | 4 1.386 3.645 yes 201 0 no no no no -3.146 -3.155 | |------------------------------------------------------------------------------------------------------| 6. | 5 1.609 3.827 no 201 4 no no no no -3.090 -3.098 | 7. | 5 1.609 3.782 no 201 0 no no no no -3.025 -3.033 | 8. | 7 1.946 3.945 no 263 0 no no no no -2.783 -2.789 | 9. | 7 1.946 3.777 no 185 1 no no no no -2.549 -2.553 | 10. | 9 2.197 3.993 yes 294 4 no no no no -2.501 -2.505 | +------------------------------------------------------------------------------------------------------+ . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres in -10/-1 /* most extreme positive residuals */ +------------------------------------------------------------------------------------------------------+ | wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres delres | |------------------------------------------------------------------------------------------------------| 1565. | 214 5.366 3.743 yes 185 1 no yes no no 2.270 2.273 | 1566. | 218 5.384 3.732 yes 185 0 no yes no no 2.312 2.316 | 1567. | 232 5.447 3.794 yes 201 4 no yes no no 2.315 2.319 | 1568. | 207 5.333 3.656 yes 201 1 no no no no 2.333 2.337 | 1569. | 230 5.438 3.760 yes 235 4 no no no no 2.337 2.340 | |------------------------------------------------------------------------------------------------------| 1570. | 213 5.361 3.679 yes 201 3 no no no no 2.342 2.345 | 1571. | 217 5.380 3.656 yes 201 1 no no no no 2.399 2.403 | 1572. | 240 5.481 3.766 yes 185 3 no yes no no 2.400 2.404 | 1573. | 234 5.455 3.670 yes 125 1 no no no no 2.492 2.496 | 1574. | 253 5.533 3.704 yes 125 4 no no no no 2.554 2.559 | +------------------------------------------------------------------------------------------------------+ . . sort delres . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres in 1/5 /* most extreme negative residuals */ +------------------------------------------------------------------------------------------------------+ | wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres delres | |------------------------------------------------------------------------------------------------------| 1. | 1 0.000 3.956 no 263 1 no no no no -5.506 -5.558 | 2. | 1 0.000 3.656 yes 201 1 no no no no -5.090 -5.131 | 3. | 3 1.099 3.749 yes 235 3 no no no no -3.690 -3.705 | 4. | 4 1.386 3.816 no 201 3 no no no no -3.383 -3.394 | 5. | 4 1.386 3.645 yes 201 0 no no no no -3.146 -3.155 | +------------------------------------------------------------------------------------------------------+ . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres in -5/-1 /* most extreme positive residuals */ +------------------------------------------------------------------------------------------------------+ | wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres delres | |------------------------------------------------------------------------------------------------------| 1570. | 213 5.361 3.679 yes 201 3 no no no no 2.342 2.345 | 1571. | 217 5.380 3.656 yes 201 1 no no no no 2.399 2.403 | 1572. | 240 5.481 3.766 yes 185 3 no yes no no 2.400 2.404 | 1573. | 234 5.455 3.670 yes 125 1 no no no no 2.492 2.496 | 1574. | 253 5.533 3.704 yes 125 4 no no no no 2.554 2.559 | +------------------------------------------------------------------------------------------------------+ . . . * leverage and influence diagnostics . * leverage . summ lev ,d Leverage ------------------------------------------------------------- Percentiles Smallest 1% .0016659 .0016659 5% .0018887 .0016659 10% .0020923 .0016659 Obs 1,574 25% .0023884 .0016659 Sum of wgt. 1,574 50% .0031867 Mean .006356 Largest Std. dev. .008343 75% .0070516 .0633357 90% .0124799 .0636474 Variance .0000696 95% .0214797 .0637571 Skewness 3.503848 99% .0444559 .0642244 Kurtosis 17.1666 . di "leverage cutoff: " 2*nparam/nobs leverage cutoff: .01270648 . di "conservative leverage cutoff: " 3*nparam/nobs conservative leverage cutoff: .01905972 . * large leverage values . count if lev>=.01905972 /* many (n=108) high leverage values */ 108 . sort lev . br wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres lev in -10/-1 . table twin rp dyst ----------------------------------------------------------- | Dystocia at calving | no yes Total ---------------------------------+------------------------- Twins born | no | Retained placenta at calving | no | 1,332 76 1,408 yes | 124 15 139 Total | 1,456 91 1,547 yes | Retained placenta at calving | no | 15 2 17 yes | 9 1 10 Total | 24 3 27 Total | Retained placenta at calving | no | 1,347 78 1,425 yes | 133 16 149 Total | 1,480 94 1,574 ----------------------------------------------------------- . . * large Cook's D values . summ cook,d Cook's D ------------------------------------------------------------- Percentiles Smallest 1% 5.67e-08 1.24e-09 5% 1.57e-06 1.83e-09 10% 7.11e-06 2.47e-09 Obs 1,574 25% .0000466 3.26e-09 Sum of wgt. 1,574 50% .000206 Mean .0006567 Largest Std. dev. .0014143 75% .000623 .0123919 90% .0015591 .0139248 Variance 2.00e-06 95% .0030278 .0145391 Skewness 5.333086 99% .0070001 .0181919 Kurtosis 42.48593 . di "Cook's D cutoff: " 4/nobs Cook's D cutoff: .0025413 . count if cook>=.0025413 /* many (n=95) high Cook's D values */ 95 . scatter fit cook, xline(0.002 0.008) . sort cook . br wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres lev cook if cook>0.008 /* most extreme Cook's D values * > / . scatter cook lev, xline(0.019) yline(0.0025) . . * large DFIT values . summ dfit, d DFITS ------------------------------------------------------------- Percentiles Smallest 1% -.2241622 -.4269926 5% -.1253405 -.3818234 10% -.0842282 -.3523999 Obs 1,572 25% -.0448795 -.3349559 Sum of wgt. 1,572 50% -.0009289 Mean .0003723 Largest Std. dev. .0806768 75% .0461383 .2845981 90% .0905727 .3021179 Variance .0065087 95% .1228508 .313741 Skewness -.0850534 99% .2306233 .3736203 Kurtosis 5.659326 . di "DFITS cutoff: " 2*sqrt(nparam/nobs)*(nobs>=120)+1*(nobs<120) DFITS cutoff: .15941443 . count if abs(dfit)>=.15941443 /* many (n=92) high DFITS values */ 95 . scatter fit dfit, xline(-0.25 -.15 0.15 0.25) . sort dfit . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres lev cook dfit if dfit<-0.25 /* most extreme negative DFIT > S values */ +-----------------------------------------------------------------------------------------------------------------------+ | wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres lev cook dfit | |-----------------------------------------------------------------------------------------------------------------------| 1. | 14 2.639 4.127 yes 235 2 yes no no no -2.111 0.039 0.018 -0.427 | 2. | 11 2.398 4.026 no 263 1 no yes no yes -2.295 0.027 0.015 -0.382 | 3. | 11 2.398 3.878 yes 263 0 no yes no yes -2.087 0.028 0.012 -0.352 | 4. | 38 3.638 4.750 no 333 4 yes no no no -1.581 0.043 0.011 -0.335 | 5. | 25 3.219 4.451 no 263 1 no no yes yes -1.744 0.035 0.011 -0.332 | |-----------------------------------------------------------------------------------------------------------------------| 6. | 23 3.135 4.121 yes 185 0 yes yes no no -1.406 0.050 0.010 -0.322 | 7. | 31 3.434 4.485 no 263 4 no no yes yes -1.489 0.036 0.008 -0.288 | 8. | 45 3.807 4.575 yes 201 4 yes no yes yes -1.104 0.064 0.008 -0.288 | 9. | 14 2.639 3.836 no 185 0 no yes no yes -1.687 0.027 0.008 -0.281 | 10. | 32 3.466 4.255 yes 201 1 yes yes yes no -1.128 0.052 0.007 -0.265 | +-----------------------------------------------------------------------------------------------------------------------+ . list wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres lev cook dfit if dfit >0.25 /* most extreme positive DFIT > S values */ +----------------------------------------------------------------------------------------------------------------------+ | wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres lev cook dfit | |----------------------------------------------------------------------------------------------------------------------| 1561. | 214 5.366 3.743 yes 185 1 no yes no no 2.270 0.012 0.006 0.253 | 1562. | 172 5.147 4.298 no 235 5 yes no no no 1.207 0.042 0.006 0.253 | 1563. | 218 5.384 3.732 yes 185 0 no yes no no 2.312 0.012 0.007 0.260 | 1564. | 205 5.323 4.051 no 294 0 no no no yes 1.787 0.021 0.007 0.262 | 1565. | 170 5.136 4.015 no 263 0 no yes no yes 1.580 0.027 0.007 0.262 | |----------------------------------------------------------------------------------------------------------------------| 1566. | 253 5.533 3.920 no 201 3 no yes no no 2.259 0.013 0.007 0.264 | 1567. | 149 5.004 3.912 yes 263 3 no yes no yes 1.542 0.030 0.007 0.269 | 1568. | 240 5.481 3.766 yes 185 3 no yes no no 2.400 0.013 0.008 0.280 | 1569. | 232 5.447 3.794 yes 201 4 no yes no no 2.315 0.015 0.008 0.285 | 1570. | 213 5.361 4.261 no 185 0 no no yes yes 1.559 0.036 0.009 0.302 | |----------------------------------------------------------------------------------------------------------------------| 1571. | 229 5.434 3.925 yes 294 1 no no no yes 2.120 0.021 0.010 0.314 | 1572. | 205 5.323 3.762 no 125 0 no no no yes 2.201 0.028 0.014 0.374 | 1573. | 1 0.000 3.956 no 263 1 no no no no -5.506 0.002 0.006 . | 1574. | 1 0.000 3.656 yes 201 1 no no no no -5.090 0.002 0.006 . | +----------------------------------------------------------------------------------------------------------------------+ . . * dfbeta diagnostics (for a subset of predictors) . di "DFBETA cutoff: " 2/sqrt(nobs)*(nobs>=120)+1*(nobs<120) DFBETA cutoff: .05041127 . sum _dfbeta* Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- _dfbeta_1 | 1,572 -8.17e-06 .023662 -.0863517 .0945048 _dfbeta_2 | 1,572 -5.84e-07 .0241496 -.1063028 .1516662 _dfbeta_3 | 1,572 8.85e-07 .0259938 -.125512 .1218819 _dfbeta_4 | 1,572 7.82e-07 .0254094 -.0945568 .090225 _dfbeta_5 | 1,572 -7.68e-06 .0231351 -.4149079 .2418807 -------------+--------------------------------------------------------- _dfbeta_6 | 1,572 -.0000167 .030361 -.2115694 .251616 _dfbeta_7 | 1,572 1.99e-07 .0258996 -.1679659 .1926909 _dfbeta_8 | 1,572 2.23e-06 .0298635 -.2862147 .3124341 _dfbeta_9 | 1,572 -7.48e-06 .0240054 -.2410799 .2159548 . graph box _dfbeta*, yline(-0.05 0.05) . . *you can repeat the next comands for other dbetas . sort _dfbeta_5 . br wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres cook dfit _dfbeta_5 in 1/10 /* most extreme negative dfb_dyst > values */ . list wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres cook dfit _dfbeta_5 in -10/-1 /* most extreme positive dfb_dys > t values */ +-------------------------------------------------------------------------------------------------------------------+ | wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres cook dfit _dfbet~5 | |-------------------------------------------------------------------------------------------------------------------| 1565. | 4.913 4.496 no 294 2 yes no no no 0.591 0.001 0.121 0.118 | 1566. | 4.644 4.175 yes 201 3 yes no yes no 0.668 0.002 0.144 0.126 | 1567. | 4.700 4.164 yes 263 0 yes no no yes 0.769 0.004 0.190 0.150 | 1568. | 5.081 4.466 yes 294 2 yes no yes no 0.876 0.004 0.189 0.167 | 1569. | 4.949 4.253 no 235 1 yes no no no 0.988 0.004 0.203 0.198 | |-------------------------------------------------------------------------------------------------------------------| 1570. | 4.883 4.080 yes 201 4 yes no no no 1.139 0.005 0.232 0.221 | 1571. | 5.147 4.298 no 235 5 yes no no no 1.207 0.006 0.253 0.236 | 1572. | 5.050 4.194 no 201 2 yes no no no 1.214 0.006 0.249 0.242 | 1573. | 0.000 3.956 no 263 1 no no no no -5.506 0.006 . . | 1574. | 0.000 3.656 yes 201 1 no no no no -5.090 0.006 . . | +-------------------------------------------------------------------------------------------------------------------+ . . *analysis without outliers . list wpc wpc_ln fit herd_size parity dyst rp vag_disch if wpc==1 +-----------------------------------------------------------------+ | wpc wpc_ln fit herd_s~e parity dyst rp vag_di~h | |-----------------------------------------------------------------| 1573. | 1 0.000 3.956 263 2 no no no | 1574. | 1 0.000 3.656 201 2 no no no | +-----------------------------------------------------------------+ . list wpc delres lev dfit cook _dfbeta* if wpc==1 /*main issue large residuals and dfit*/ +-----------------------------------------------------------------------------------------------------------------------------+ 1573. | wpc | delres | lev | dfit | cook | _dfbet~1 | _dfbet~2 | _dfbet~3 | _dfbet~4 | _dfbet~5 | _dfbet~6 | _dfbet~7 | _dfbet~8 | | 1 | -5.558 | 0.002 | . | 0.006 | . | . | . | . | . | . | . | . | |-----------------------------------------------------------------------------------------------------------------------------| | _dfbet~9 | | . | +-----------------------------------------------------------------------------------------------------------------------------+ +-----------------------------------------------------------------------------------------------------------------------------+ 1574. | wpc | delres | lev | dfit | cook | _dfbet~1 | _dfbet~2 | _dfbet~3 | _dfbet~4 | _dfbet~5 | _dfbet~6 | _dfbet~7 | _dfbet~8 | | 1 | -5.131 | 0.002 | . | 0.006 | . | . | . | . | . | . | . | . | |-----------------------------------------------------------------------------------------------------------------------------| | _dfbet~9 | | . | +-----------------------------------------------------------------------------------------------------------------------------+ . . **# Original final model . reg wpc_ln hs100_ctr hs100_ctr_sq parity1 aut_calv twin dyst rp##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 18.05 Model | 86.9474063 9 9.66082293 Prob > F = 0.0000 Residual | 836.874538 1,564 .535086022 R-squared = 0.0941 -------------+---------------------------------- Adj R-squared = 0.0889 Total | 923.821945 1,573 .587299393 Root MSE = .7315 ------------------------------------------------------------------------------ wpc_ln | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- hs100_ctr | .343658 .0317233 10.83 0.000 .2814333 .4058827 hs100_ctr_sq | .2118728 .0456207 4.64 0.000 .1223885 .301357 parity1 | .0129844 .012586 1.03 0.302 -.0117028 .0376715 aut_calv | -.1371621 .0372127 -3.69 0.000 -.2101542 -.0641701 twin | .3927426 .1443661 2.72 0.007 .1095711 .6759141 dyst | .1109405 .0801013 1.39 0.166 -.0461768 .2680578 | rp | yes | .1123166 .0705612 1.59 0.112 -.0260878 .250721 | vag_disch | yes | -.0260135 .1050122 -0.25 0.804 -.231993 .1799661 | rp#vag_disch | yes#yes | .4136999 .183531 2.25 0.024 .0537072 .7736926 | _cons | 3.888587 .0386257 100.67 0.000 3.812823 3.96435 ------------------------------------------------------------------------------ . estimate store ln . . *re-fit without outliers/influential obs . reg wpc_ln hs100_ctr hs100_ctr_sq parity1 aut_calv twin dyst rp##vag_disch if wpc~=1 Source | SS df MS Number of obs = 1,572 -------------+---------------------------------- F(9, 1562) = 18.15 Model | 84.5110938 9 9.39012153 Prob > F = 0.0000 Residual | 807.934865 1,562 .517243831 R-squared = 0.0947 -------------+---------------------------------- Adj R-squared = 0.0895 Total | 892.445959 1,571 .568075085 Root MSE = .7192 ------------------------------------------------------------------------------ wpc_ln | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- hs100_ctr | .3391632 .031199 10.87 0.000 .2779669 .4003596 hs100_ctr_sq | .2026965 .0448706 4.52 0.000 .1146836 .2907094 parity1 | .0113333 .0123763 0.92 0.360 -.0129427 .0356093 aut_calv | -.1369515 .0366092 -3.74 0.000 -.2087598 -.0651432 twin | .3894441 .1419397 2.74 0.006 .1110316 .6678566 dyst | .1033889 .0787611 1.31 0.189 -.0510998 .2578777 | rp | yes | .1060341 .0693799 1.53 0.127 -.0300535 .2421216 | vag_disch | yes | -.0332815 .1032512 -0.32 0.747 -.2358071 .1692441 | rp#vag_disch | yes#yes | .4223009 .1804488 2.34 0.019 .0683535 .7762484 | _cons | 3.901024 .0380169 102.61 0.000 3.826455 3.975594 ------------------------------------------------------------------------------ . estimate store ln_noout . estat hettest Breusch–Pagan/Cook–Weisberg test for heteroskedasticity Assumption: Normal error terms Variable: Fitted values of wpc_ln H0: Constant variance chi2(1) = 16.22 Prob > chi2 = 0.0001 . estat imtest Cameron & Trivedi's decomposition of IM-test -------------------------------------------------- Source | chi2 df p ---------------------+---------------------------- Heteroskedasticity | 71.76 44 0.0051 Skewness | 14.82 9 0.0960 Kurtosis | 9.69 1 0.0019 ---------------------+---------------------------- Total | 96.27 54 0.0004 -------------------------------------------------- . . capture { . . . predict fit if e(sample), xb (2 missing values generated) . predict stdres if e(sample), rstandard (2 missing values generated) . predict delres if e(sample), rstudent (2 missing values generated) . predict dfit if e(sample), dfit (2 missing values generated) . predict cook if e(sample), cook (2 missing values generated) . scalar nobs=1572 . scalar nparam=10 . . ** formatting several variables at once to show only 3 decimals - advanced command - ignore for now . foreach var in fit stdres delres cook dfit { 2. format `var' %5.3f 3. } . hist stdres (bin=31, start=-3.6900551, width=.20143648) . egen fit_group=cut(fit), group(10) (2 missing values generated) . tabstat stdres, statistics( mean sd count ) by(fit_group) Summary for variables: stdres Group variable: fit_group fit_group | Mean SD N ----------+------------------------------ 0 | -.0798362 .9912713 151 1 | -.0052168 1.078422 158 2 | .0429111 1.119622 160 3 | -.0532808 1.040071 125 4 | .0708433 1.016679 191 5 | .078308 .9893025 153 6 | -.0311166 1.100208 133 7 | -.0623551 .9994817 185 8 | .0161114 .8454157 138 9 | .0035924 .8149584 178 ----------+------------------------------ Total | .0000171 1.000335 1572 ----------------------------------------- . sort delres . list cow wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres cook dfit in 1/10 +------------------------------------------------------------------------------------------------------------------------------+ | cow wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres delres cook dfit | |------------------------------------------------------------------------------------------------------------------------------| 1. | 403 3 1.099 3.749 yes 235 3 no no no no -3.690 -3.705 0.004 -0.190 | 2. | 983 4 1.386 3.816 no 201 3 no no no no -3.383 -3.394 0.003 -0.176 | 3. | 1062 4 1.386 3.645 yes 201 0 no no no no -3.146 -3.155 0.003 -0.176 | 4. | 1130 5 1.609 3.827 no 201 4 no no no no -3.090 -3.098 0.004 -0.191 | 5. | 1049 5 1.609 3.782 no 201 0 no no no no -3.025 -3.033 0.003 -0.165 | |------------------------------------------------------------------------------------------------------------------------------| 6. | 2281 7 1.946 3.945 no 263 0 no no no no -2.783 -2.789 0.002 -0.145 | 7. | 5006 7 1.946 3.777 no 185 1 no no no no -2.549 -2.553 0.002 -0.125 | 8. | 4 9 2.197 3.993 yes 294 4 no no no no -2.501 -2.505 0.002 -0.142 | 9. | 249 10 2.303 4.096 no 294 1 no no no no -2.495 -2.500 0.001 -0.106 | 10. | 157 9 2.197 3.970 yes 294 2 no no no no -2.467 -2.471 0.001 -0.107 | +------------------------------------------------------------------------------------------------------------------------------+ . * outlier test based on deletion residuals . summ delres, d Studentized residuals ------------------------------------------------------------- Percentiles Smallest 1% -2.146736 -3.705058 5% -1.581073 -3.394443 10% -1.259347 -3.154713 Obs 1,572 25% -.7280966 -3.098353 Sum of wgt. 1,572 50% -.0183825 Mean .0000118 Largest Std. dev. 1.000858 75% .7256545 2.402748 90% 1.349921 2.404035 Variance 1.001716 95% 1.645225 2.495872 Skewness -.0172075 99% 2.184024 2.559009 Kurtosis 2.632243 . display invttail(nobs-nparam-1,.025/nobs) /* values >= abs(4.17) outliers*/ 4.1723589 . count if abs(delres) & delres!=. >=4.17 0 . . sort dfit . list cow wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres cook dfit in 1/10 +-------------------------------------------------------------------------------------------------------------------------------+ | cow wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres delres cook dfit | |-------------------------------------------------------------------------------------------------------------------------------| 1. | 444 14 2.639 4.127 yes 235 2 yes no no no -2.111 -2.113 0.018 -0.427 | 2. | 2498 11 2.398 4.026 no 263 1 no yes no yes -2.295 -2.298 0.015 -0.382 | 3. | 2307 11 2.398 3.878 yes 263 0 no yes no yes -2.087 -2.089 0.012 -0.352 | 4. | 713 38 3.638 4.750 no 333 4 yes no no no -1.581 -1.582 0.011 -0.335 | 5. | 2480 25 3.219 4.451 no 263 1 no no yes yes -1.744 -1.745 0.011 -0.332 | |-------------------------------------------------------------------------------------------------------------------------------| 6. | 5029 23 3.135 4.121 yes 185 0 yes yes no no -1.406 -1.407 0.010 -0.322 | 7. | 2394 31 3.434 4.485 no 263 4 no no yes yes -1.489 -1.489 0.008 -0.288 | 8. | 1122 45 3.807 4.575 yes 201 4 yes no yes yes -1.104 -1.104 0.008 -0.288 | 9. | 5015 14 2.639 3.836 no 185 0 no yes no yes -1.687 -1.688 0.008 -0.281 | 10. | 1036 32 3.466 4.255 yes 201 1 yes yes yes no -1.128 -1.128 0.007 -0.265 | +-------------------------------------------------------------------------------------------------------------------------------+ . sort cook . list cow wpc wpc_ln fit aut_calv herd_size parity1 twin dyst rp vag_disch stdres delres cook dfit in -10/-1 +-------------------------------------------------------------------------------------------------------------------------------+ | cow wpc wpc_ln fit aut_calv herd_s~e parity1 twin dyst rp vag_di~h stdres delres cook dfit | |-------------------------------------------------------------------------------------------------------------------------------| 1565. | 238 229 5.434 3.925 yes 294 1 no no no yes 2.120 2.122 0.010 0.314 | 1566. | 5029 23 3.135 4.121 yes 185 0 yes yes no no -1.406 -1.407 0.010 -0.322 | 1567. | 2480 25 3.219 4.451 no 263 1 no no yes yes -1.744 -1.745 0.011 -0.332 | 1568. | 713 38 3.638 4.750 no 333 4 yes no no no -1.581 -1.582 0.011 -0.335 | 1569. | 2307 11 2.398 3.878 yes 263 0 no yes no yes -2.087 -2.089 0.012 -0.352 | |-------------------------------------------------------------------------------------------------------------------------------| 1570. | 1238 205 5.323 3.762 no 125 0 no no no yes 2.201 2.204 0.014 0.374 | 1571. | 2498 11 2.398 4.026 no 263 1 no yes no yes -2.295 -2.298 0.015 -0.382 | 1572. | 444 14 2.639 4.127 yes 235 2 yes no no no -2.111 -2.113 0.018 -0.427 | 1573. | 1032 1 0.000 . yes 201 1 no no no no . . . . | 1574. | 2272 1 0.000 . no 263 1 no no no no . . . . | +-------------------------------------------------------------------------------------------------------------------------------+ . *the two outliers don't have the highest infl. values . . estimate table ln ln_noout, //models are very similar since outliers did not have much influence ---------------------------------------- Variable | ln ln_noout -------------+-------------------------- hs100_ctr | .34365799 .33916323 hs100_ctr_sq | .21187276 .20269649 parity1 | .01298436 .0113333 aut_calv | -.13716214 -.13695147 twin | .39274261 .3894441 dyst | .1109405 .10338895 | rp | yes | .11231661 .10603408 | vag_disch | yes | -.02601346 -.03328151 | rp#vag_disch | yes#yes | .41369992 .42230092 | _cons | 3.8885867 3.9010244 ---------------------------------------- . estimate table ln ln_noout, se //now with SE's ---------------------------------------- Variable | ln ln_noout -------------+-------------------------- hs100_ctr | .34365799 .33916323 | .03172331 .031199 hs100_ctr_sq | .21187276 .20269649 | .04562075 .04487057 parity1 | .01298436 .0113333 | .01258597 .01237635 aut_calv | -.13716214 -.13695147 | .0372127 .03660917 twin | .39274261 .3894441 | .14436609 .14193974 dyst | .1109405 .10338895 | .08010133 .07876115 | rp | yes | .11231661 .10603408 | .07056115 .0693799 | vag_disch | yes | -.02601346 -.03328151 | .10501221 .10325122 | rp#vag_disch | yes#yes | .41369992 .42230092 | .18353098 .18044883 | _cons | 3.8885867 3.9010244 | .03862574 .03801689 ---------------------------------------- Legend: b/se . . log close name: log: C:\vhm812-data\l2b_linear_reg_diag.txt log type: text closed on: 16 Jan 2023, 08:07:18 -------------------------------------------------------------------------------------------------------------------------------------------