-------------------------------------------------------------------------------------- name: log: C:\vhm812-data\l2a_modelbuildI.txt log type: text opened on: 11 Jan 2016, 22:17:12 . set more off . . * open the DAISY dataset . use daisy2red.dta, clear . . *meaningless use of a nominal variable . egen farm=group(herd) . label var farm farm . table farm, c(mean milk120) ------------------------- farm | mean(milk120) ----------+-------------- 1 | 3357.8 2 | 3279.3 3 | 2778.6 4 | 3032.5 5 | 2838.2 6 | 3730.6 7 | 3435.8 ------------------------- . twoway (scatter milk120 farm) (lfit milk120 farm) , name(pos, replace) . regress milk120 farm Source | SS df MS Number of obs = 1,536 -------------+---------------------------------- F(1, 1534) = 33.10 Model | 15802185.7 1 15802185.7 Prob > F = 0.0000 Residual | 732338007 1,534 477404.177 R-squared = 0.0211 -------------+---------------------------------- Adj R-squared = 0.0205 Total | 748140192 1,535 487387.748 Root MSE = 690.94 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- farm | 50.90872 8.848643 5.75 0.000 33.552 68.26543 _cons | 3026.143 37.27521 81.18 0.000 2953.028 3099.259 ------------------------------------------------------------------------------ . . gen farm1=abs(farm-7)+1 . preserve . collapse milk120 farm, by(farm1) . sort farm . list farm milk120 farm1 +------------------------+ | farm milk120 farm1 | |------------------------| 1. | 1 3357.8 7 | 2. | 2 3279.3 6 | 3. | 3 2778.6 5 | 4. | 4 3032.5 4 | 5. | 5 2838.2 3 | |------------------------| 6. | 6 3730.6 2 | 7. | 7 3435.8 1 | +------------------------+ . restore . twoway (scatter milk120 farm1) (lfit milk120 farm1) , name(pos, replace) . regress milk120 farm1 Source | SS df MS Number of obs = 1,536 -------------+---------------------------------- F(1, 1534) = 33.10 Model | 15802185.7 1 15802185.7 Prob > F = 0.0000 Residual | 732338007 1,534 477404.177 R-squared = 0.0211 -------------+---------------------------------- Adj R-squared = 0.0205 Total | 748140192 1,535 487387.748 Root MSE = 690.94 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- farm1 | -50.90872 8.848643 -5.75 0.000 -68.26543 -33.552 _cons | 3433.413 41.84204 82.06 0.000 3351.339 3515.487 ------------------------------------------------------------------------------ . . * create a 4 level categorical variable for age . egen parity_c4=cut(parity), at(1 2 3 4 99) icodes . sort parity . br parity parity_c4 . tab parity parity_c4 Lactation | parity_c4 number | 0 1 2 3 | Total -----------+--------------------------------------------+---------- 1 | 417 0 0 0 | 417 2 | 0 374 0 0 | 374 3 | 0 0 319 0 | 319 4 | 0 0 0 222 | 222 5 | 0 0 0 169 | 169 6 | 0 0 0 69 | 69 7 | 0 0 0 4 | 4 -----------+--------------------------------------------+---------- Total | 417 374 319 464 | 1,574 . . *INDICATOR VARIABLES . * create indicator variables "manually" (you never actually do this) . gen parity_c4_0=(parity_c4==0) if !missing(parity) . gen parity_c4_1=(parity_c4==1) if !missing(parity) . gen parity_c4_2=(parity_c4==2) if !missing(parity) . gen parity_c4_3=(parity_c4==3) if !missing(parity) . . br parity parity_c4 parity_c4_0 parity_c4_1 parity_c4_2 parity_c4_3 . . * determine the average milk120 in each age group . format milk120 %4.2f . tab parity_c4, sum(milk120) | Summary of Milk volume (l) in first | 120 days of lactation parity_c4 | Mean Std. Dev. Freq. ------------+------------------------------------ 0 | 2639.65 486.40 403 1 | 3347.86 626.47 369 2 | 3429.49 655.11 308 3 | 3471.42 650.91 456 ------------+------------------------------------ Total | 3215.10 698.13 1,536 . . * regress -milk120- on the age groups . reg milk120 parity_c4_1 parity_c4_2 parity_c4_3 Source | SS df MS Number of obs = 1,536 -------------+---------------------------------- F(3, 1532) = 166.65 Model | 184071923 3 61357307.8 Prob > F = 0.0000 Residual | 564068269 1,532 368190.776 R-squared = 0.2460 -------------+---------------------------------- Adj R-squared = 0.2446 Total | 748140192 1,535 487387.748 Root MSE = 606.79 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity_c4_1 | 708.2134 43.71992 16.20 0.000 622.4561 793.9706 parity_c4_2 | 789.8435 45.92439 17.20 0.000 699.7622 879.9248 parity_c4_3 | 831.7748 41.48567 20.05 0.000 750.4001 913.1495 _cons | 2639.645 30.22623 87.33 0.000 2580.356 2698.934 ------------------------------------------------------------------------------ . * or . reg milk120 i.parity_c4 Source | SS df MS Number of obs = 1,536 -------------+---------------------------------- F(3, 1532) = 166.65 Model | 184071923 3 61357307.8 Prob > F = 0.0000 Residual | 564068269 1,532 368190.776 R-squared = 0.2460 -------------+---------------------------------- Adj R-squared = 0.2446 Total | 748140192 1,535 487387.748 Root MSE = 606.79 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity_c4 | 1 | 708.2134 43.71992 16.20 0.000 622.4561 793.9706 2 | 789.8435 45.92439 17.20 0.000 699.7622 879.9248 3 | 831.7748 41.48567 20.05 0.000 750.4001 913.1495 | _cons | 2639.645 30.22623 87.33 0.000 2580.356 2698.934 ------------------------------------------------------------------------------ . * testing a group of categorical predictors . testparm i.parity_c4 ( 1) 1.parity_c4 = 0 ( 2) 2.parity_c4 = 0 ( 3) 3.parity_c4 = 0 F( 3, 1532) = 166.65 Prob > F = 0.0000 . . . * SCALING X VARIABLES . * substract min value . reg milk120 parity Source | SS df MS Number of obs = 1,536 -------------+---------------------------------- F(1, 1534) = 262.27 Model | 109234227 1 109234227 Prob > F = 0.0000 Residual | 638905966 1,534 416496.718 R-squared = 0.1460 -------------+---------------------------------- Adj R-squared = 0.1455 Total | 748140192 1,535 487387.748 Root MSE = 645.37 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity | 178.347 11.01266 16.19 0.000 156.7455 199.9484 _cons | 2727.08 34.33991 79.41 0.000 2659.722 2794.438 ------------------------------------------------------------------------------ . gen parity_1=parity-1 . reg milk120 parity_1 Source | SS df MS Number of obs = 1,536 -------------+---------------------------------- F(1, 1534) = 262.27 Model | 109234227 1 109234227 Prob > F = 0.0000 Residual | 638905966 1,534 416496.718 R-squared = 0.1460 -------------+---------------------------------- Adj R-squared = 0.1455 Total | 748140192 1,535 487387.748 Root MSE = 645.37 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity_1 | 178.347 11.01266 16.19 0.000 156.7455 199.9484 _cons | 2905.427 25.23474 115.14 0.000 2855.928 2954.925 ------------------------------------------------------------------------------ . . * centring . reg milk120 c.herd_size##c.herd_size, vsquish Source | SS df MS Number of obs = 1,536 -------------+---------------------------------- F(2, 1533) = 111.15 Model | 94747175.1 2 47373587.5 Prob > F = 0.0000 Residual | 653393017 1,533 426218.537 R-squared = 0.1266 -------------+---------------------------------- Adj R-squared = 0.1255 Total | 748140192 1,535 487387.748 Root MSE = 652.85 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- herd_size | 28.73126 1.993023 14.42 0.000 24.82192 32.6406 c.herd_size#| c.herd_size | -.0608255 .0041101 -14.80 0.000 -.0688875 -.0527634 _cons | 66.06488 231.8877 0.28 0.776 -388.7858 520.9155 ------------------------------------------------------------------------------ . estat vce, corr Correlation matrix of coefficients of regress model | c.herd~e# e(V) | herd_s~e c.herd~e _cons -------------+------------------------------ herd_size | 1.0000 c.herd_size#| c.herd_size | -0.9907 1.0000 _cons | -0.9844 0.9535 1.0000 . . summ herd_size Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- herd_size | 1,574 251.0076 62.01692 125 333 . gen hrdsz_ctr=herd_size - 251 . reg milk120 c.hrdsz_ctr##c.hrdsz_ctr, vsquish Source | SS df MS Number of obs = 1,536 -------------+---------------------------------- F(2, 1533) = 111.15 Model | 94747175.1 2 47373587.5 Prob > F = 0.0000 Residual | 653393017 1,533 426218.537 R-squared = 0.1266 -------------+---------------------------------- Adj R-squared = 0.1255 Total | 748140192 1,535 487387.748 Root MSE = 652.85 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hrdsz_ctr | -1.803116 .2847073 -6.33 0.000 -2.361573 -1.244659 c.hrdsz_ctr#| c.hrdsz_ctr | -.0608255 .0041101 -14.80 0.000 -.0688875 -.0527634 _cons | 3445.547 22.80494 151.09 0.000 3400.815 3490.279 ------------------------------------------------------------------------------ . estat vce, corr Correlation matrix of coefficients of regress model | c.hrds~r# e(V) | hrdsz_~r c.hrds~r _cons -------------+------------------------------ hrdsz_ctr | 1.0000 c.hrdsz_ctr#| c.hrdsz_ctr | 0.3115 1.0000 _cons | -0.2118 -0.6830 1.0000 . . * scale of measurement . reg milk120 herd_size Source | SS df MS Number of obs = 1,536 -------------+---------------------------------- F(1, 1534) = 2.88 Model | 1400932.73 1 1400932.73 Prob > F = 0.0900 Residual | 746739260 1,534 486792.216 R-squared = 0.0019 -------------+---------------------------------- Adj R-squared = 0.0012 Total | 748140192 1,535 487387.748 Root MSE = 697.7 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- herd_size | -.49048 .2891242 -1.70 0.090 -1.057601 .0766405 _cons | 3338.164 74.69789 44.69 0.000 3191.643 3484.685 ------------------------------------------------------------------------------ . capture drop herdsz_100 . gen herdsz_100=herd_size/100 /*rescale herd_size so coef are larger*/ . reg milk120 herdsz_100 Source | SS df MS Number of obs = 1,536 -------------+---------------------------------- F(1, 1534) = 2.88 Model | 1400930.12 1 1400930.12 Prob > F = 0.0900 Residual | 746739262 1,534 486792.218 R-squared = 0.0019 -------------+---------------------------------- Adj R-squared = 0.0012 Total | 748140192 1,535 487387.748 Root MSE = 697.7 ------------------------------------------------------------------------------ milk120 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- herdsz_100 | -49.04796 28.91242 -1.70 0.090 -105.76 7.664098 _cons | 3338.164 74.69789 44.69 0.000 3191.643 3484.685 ------------------------------------------------------------------------------ . . * DETECTING CONFOUNDING . reg wpc vag_disch /* vag_disch adds ~12 days, P=0.04 */ Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(1, 1572) = 4.21 Model | 11186.2576 1 11186.2576 Prob > F = 0.0404 Residual | 4176904.3 1,572 2657.0638 R-squared = 0.0027 -------------+---------------------------------- Adj R-squared = 0.0020 Total | 4188090.56 1,573 2662.48605 Root MSE = 51.547 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch | 11.99647 5.846716 2.05 0.040 .5282858 23.46465 _cons | 68.17426 1.334494 51.09 0.000 65.55669 70.79184 ------------------------------------------------------------------------------ . reg wpc i.herd if vag_disch==0 Source | SS df MS Number of obs = 1,492 -------------+---------------------------------- F(6, 1485) = 14.92 Model | 223706.812 6 37284.4686 Prob > F = 0.0000 Residual | 3711781.88 1,485 2499.51642 R-squared = 0.0568 -------------+---------------------------------- Adj R-squared = 0.0530 Total | 3935488.69 1,491 2639.4961 Root MSE = 49.995 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- herd | 2 | -7.455598 4.572715 -1.63 0.103 -16.42527 1.51407 3 | 12.30396 4.184586 2.94 0.003 4.095636 20.51229 4 | -20.0733 4.70619 -4.27 0.000 -29.30479 -10.84181 5 | -21.78125 5.489853 -3.97 0.000 -32.54994 -11.01256 106 | -15.40129 4.618509 -3.33 0.001 -24.46079 -6.341796 119 | -17.26021 5.05679 -3.41 0.001 -27.17942 -7.341 | _cons | 75.1583 3.106548 24.19 0.000 69.06461 81.25199 ------------------------------------------------------------------------------ . testparm i.herd /* herd is sig. associated with wpc*/ ( 1) 2.herd = 0 ( 2) 3.herd = 0 ( 3) 4.herd = 0 ( 4) 5.herd = 0 ( 5) 106.herd = 0 ( 6) 119.herd = 0 F( 6, 1485) = 14.92 Prob > F = 0.0000 . tab herd vag_disch, chi row /* herd is associated with vag_disc*/ +----------------+ | Key | |----------------| | frequency | | row percentage | +----------------+ | Vaginal discharge Herd | observed Number | no yes | Total -----------+----------------------+---------- 1 | 259 13 | 272 | 95.22 4.78 | 100.00 -----------+----------------------+---------- 2 | 222 5 | 227 | 97.80 2.20 | 100.00 -----------+----------------------+---------- 3 | 318 4 | 322 | 98.76 1.24 | 100.00 -----------+----------------------+---------- 4 | 200 3 | 203 | 98.52 1.48 | 100.00 -----------+----------------------+---------- 5 | 122 7 | 129 | 94.57 5.43 | 100.00 -----------+----------------------+---------- 106 | 214 39 | 253 | 84.58 15.42 | 100.00 -----------+----------------------+---------- 119 | 157 11 | 168 | 93.45 6.55 | 100.00 -----------+----------------------+---------- Total | 1,492 82 | 1,574 | 94.79 5.21 | 100.00 Pearson chi2(6) = 74.2267 Pr = 0.000 . reg wpc vag_disch i.herd Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 14.35 Model | 252509.59 7 36072.7985 Prob > F = 0.0000 Residual | 3935580.97 1,566 2513.14238 R-squared = 0.0603 -------------+---------------------------------- Adj R-squared = 0.0561 Total | 4188090.56 1,573 2662.48605 Root MSE = 50.131 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch | 17.81936 5.825177 3.06 0.002 6.393393 29.24533 | herd | 2 | -8.178615 4.509228 -1.81 0.070 -17.02338 .6661455 3 | 11.71303 4.133611 2.83 0.005 3.605039 19.82103 4 | -20.74627 4.653654 -4.46 0.000 -29.87432 -11.61823 5 | -22.15954 5.359351 -4.13 0.000 -32.6718 -11.64728 106 | -18.18881 4.422295 -4.11 0.000 -26.86305 -9.514562 119 | -17.85657 4.920293 -3.63 0.000 -27.50763 -8.205518 | _cons | 75.97555 3.052377 24.89 0.000 69.98837 81.96272 ------------------------------------------------------------------------------ . di "% change =" ((17.82-11.99)/11.99)*100 " -> vag_disch change by ~49% , herd is > a confounder" % change =48.623853 -> vag_disch change by ~49% , herd is a confounder . . * CONFOUNDING AND COLLINEARITY . use bw5k.dta, clear . corr cig_2 cig_3 /*considering only cig_2 for the example*/ (obs=5,000) | cig_2 cig_3 -------------+------------------ cig_2 | 1.0000 cig_3 | 0.9451 1.0000 . . * Model for cig_3 as exposure of interest . * 1) relationship Z->E . corr cig_3 cig_2 (obs=5,000) | cig_3 cig_2 -------------+------------------ cig_3 | 1.0000 cig_2 | 0.9451 1.0000 . * 2) relationship Z-Y in E- . reg bwt cig_2 if cig_3==0 /*borderline sig* but not the best model*/ Source | SS df MS Number of obs = 4,656 -------------+---------------------------------- F(1, 4654) = 2.77 Model | 879610.363 1 879610.363 Prob > F = 0.0960 Residual | 1.4771e+09 4,654 317380.896 R-squared = 0.0006 -------------+---------------------------------- Adj R-squared = 0.0004 Total | 1.4780e+09 4,655 317501.676 Root MSE = 563.37 ------------------------------------------------------------------------------ bwt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- cig_2 | -22.32463 13.41002 -1.66 0.096 -48.61462 3.965367 _cons | 3310.075 8.263501 400.57 0.000 3293.874 3326.275 ------------------------------------------------------------------------------ . * 3) not an int. variable . * 4) change in coef. . reg bwt cig_3 Source | SS df MS Number of obs = 5,000 -------------+---------------------------------- F(1, 4998) = 22.55 Model | 7189497.98 1 7189497.98 Prob > F = 0.0000 Residual | 1.5935e+09 4,998 318835.965 R-squared = 0.0045 -------------+---------------------------------- Adj R-squared = 0.0043 Total | 1.6007e+09 4,999 320210.372 Root MSE = 564.66 ------------------------------------------------------------------------------ bwt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- cig_3 | -12.48752 2.629726 -4.75 0.000 -17.64293 -7.332101 _cons | 3303.448 8.177729 403.96 0.000 3287.416 3319.48 ------------------------------------------------------------------------------ . reg bwt cig_3 cig_2 Source | SS df MS Number of obs = 5,000 -------------+---------------------------------- F(2, 4997) = 12.61 Model | 8037757.94 2 4018878.97 Prob > F = 0.0000 Residual | 1.5927e+09 4,997 318730.017 R-squared = 0.0050 -------------+---------------------------------- Adj R-squared = 0.0046 Total | 1.6007e+09 4,999 320210.372 Root MSE = 564.56 ------------------------------------------------------------------------------ bwt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- cig_3 | -.0859205 8.043799 -0.01 0.991 -15.8553 15.68346 cig_2 | -12.11372 7.425483 -1.63 0.103 -26.67093 2.44348 _cons | 3304.134 8.187191 403.57 0.000 3288.084 3320.185 ------------------------------------------------------------------------------ . di "%change = " ((12.49-0.09)/12.49)*100 %change = 99.279424 . estat vce, corr /*high correlation*/ Correlation matrix of coefficients of regress model e(V) | cig_3 cig_2 _cons -------------+------------------------------ cig_3 | 1.0000 cig_2 | -0.9451 1.0000 _cons | -0.0218 -0.0514 1.0000 . * try centring . summ cig_2 Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- cig_2 | 5,000 .743 3.28979 0 45 . gen cig2_ctr=cig_2-r(mean) . summ cig_3 Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- cig_3 | 5,000 .6704 3.036908 0 45 . gen cig3_ctr=cig_3-r(mean) . reg bwt cig3_ctr cig2_ctr Source | SS df MS Number of obs = 5,000 -------------+---------------------------------- F(2, 4997) = 12.61 Model | 8037757.91 2 4018878.95 Prob > F = 0.0000 Residual | 1.5927e+09 4,997 318730.017 R-squared = 0.0050 -------------+---------------------------------- Adj R-squared = 0.0046 Total | 1.6007e+09 4,999 320210.372 Root MSE = 564.56 ------------------------------------------------------------------------------ bwt | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- cig3_ctr | -.0859208 8.043799 -0.01 0.991 -15.8553 15.68346 cig2_ctr | -12.11372 7.425483 -1.63 0.103 -26.67093 2.44348 _cons | 3295.076 7.984109 412.70 0.000 3279.424 3310.728 ------------------------------------------------------------------------------ . estat vce, corr Correlation matrix of coefficients of regress model e(V) | cig3_ctr cig2_ctr _cons -------------+------------------------------ cig3_ctr | 1.0000 cig2_ctr | -0.9451 1.0000 _cons | 0.0000 -0.0000 1.0000 . /*still high correlation since centring will not reduce correlation > on derived variables*/ . . * DETECTING INTERACTION . * 2 dichotomous variables . use daisy2red, clear . gen parity_1=parity-1 . . reg wpc vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(1, 1572) = 4.21 Model | 11186.2576 1 11186.2576 Prob > F = 0.0404 Residual | 4176904.3 1,572 2657.0638 R-squared = 0.0027 -------------+---------------------------------- Adj R-squared = 0.0020 Total | 4188090.56 1,573 2662.48605 Root MSE = 51.547 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch | 11.99647 5.846716 2.05 0.040 .5282858 23.46465 _cons | 68.17426 1.334494 51.09 0.000 65.55669 70.79184 ------------------------------------------------------------------------------ . reg wpc i.vag_disch i.rp vag_disch#rp Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(3, 1570) = 4.53 Model | 35915.9774 3 11971.9925 Prob > F = 0.0036 Residual | 4152174.58 1,570 2644.69719 R-squared = 0.0086 -------------+---------------------------------- Adj R-squared = 0.0067 Total | 4188090.56 1,573 2662.48605 Root MSE = 51.427 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch | yes | .5429296 7.265382 0.07 0.940 -13.70794 14.7938 | rp | yes | 6.339794 4.914322 1.29 0.197 -3.299531 15.97912 | vag_disch#rp | yes#yes | 26.34867 12.77367 2.06 0.039 1.293414 51.40392 | _cons | 67.66861 1.387883 48.76 0.000 64.94631 70.39091 ------------------------------------------------------------------------------ . reg wpc vag_disch##rp //same as above Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(3, 1570) = 4.53 Model | 35915.9774 3 11971.9925 Prob > F = 0.0036 Residual | 4152174.58 1,570 2644.69719 R-squared = 0.0086 -------------+---------------------------------- Adj R-squared = 0.0067 Total | 4188090.56 1,573 2662.48605 Root MSE = 51.427 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch | yes | .5429296 7.265382 0.07 0.940 -13.70794 14.7938 | rp | yes | 6.339794 4.914322 1.29 0.197 -3.299531 15.97912 | vag_disch#rp | yes#yes | 26.34867 12.77367 2.06 0.039 1.293414 51.40392 | _cons | 67.66861 1.387883 48.76 0.000 64.94631 70.39091 ------------------------------------------------------------------------------ . table vag_disch rp, c(mean wpc) // display wpc means by vag_dich and rp ------------------------------ Vaginal | Retained placenta discharge | at calving observed | no yes ----------+------------------- no | 67.66861 74.0084 yes | 68.21154 100.9 ------------------------------ . . *Use of margins command as a tool to create plots - this will be explained later . reg wpc vag_disch#rp //total effects Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(3, 1570) = 4.53 Model | 35915.9774 3 11971.9925 Prob > F = 0.0036 Residual | 4152174.58 1,570 2644.69719 R-squared = 0.0086 -------------+---------------------------------- Adj R-squared = 0.0067 Total | 4188090.56 1,573 2662.48605 Root MSE = 51.427 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch#rp | no#yes | 6.339794 4.914322 1.29 0.197 -3.299531 15.97912 yes#no | .5429296 7.265382 0.07 0.940 -13.70794 14.7938 yes#yes | 33.23139 9.491195 3.50 0.000 14.61464 51.84814 | _cons | 67.66861 1.387883 48.76 0.000 64.94631 70.39091 ------------------------------------------------------------------------------ . margins vag_disch#rp //same as table command Adjusted predictions Number of obs = 1,574 Model VCE : OLS Expression : Linear prediction, predict() ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch#rp | no#no | 67.66861 1.387883 48.76 0.000 64.94631 70.39091 no#yes | 74.0084 4.71427 15.70 0.000 64.76147 83.25533 yes#no | 68.21154 7.131589 9.56 0.000 54.2231 82.19998 yes#yes | 100.9 9.389173 10.75 0.000 82.48336 119.3166 ------------------------------------------------------------------------------ . marginsplot, legend(title("Retained Placenta")) noci ytitle("Predicted WPC") title(" > ") Variables that uniquely identify margins: vag_disch rp . marginsplot, x(rp) legend(title("Vaginal Discharge")) noci ytitle("Predicted WPC") t > itle("") Variables that uniquely identify margins: vag_disch rp . . * dichotomous and continuous variable . gen milk120k=milk120/1000 (38 missing values generated) . summ milk120k Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- milk120k | 1,536 3.215096 .6981316 1.1102 5.6303 . gen milk120k_ctr=milk120k-r(mean) (38 missing values generated) . reg wpc i.dyst##c.milk120k_ctr, vsquish Source | SS df MS Number of obs = 1,536 -------------+---------------------------------- F(3, 1532) = 3.83 Model | 30572.8752 3 10190.9584 Prob > F = 0.0095 Residual | 4073791.78 1,532 2659.13302 R-squared = 0.0074 -------------+---------------------------------- Adj R-squared = 0.0055 Total | 4104364.66 1,535 2673.8532 Root MSE = 51.567 ------------------------------------------------------------------------------------- wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------------+---------------------------------------------------------------- dyst | yes | 8.20714 5.718528 1.44 0.151 -3.00983 19.42411 milk120k_ctr | -3.446531 1.928535 -1.79 0.074 -7.229379 .3363161 dyst#c.milk120k_ctr | yes | 29.14238 9.468101 3.08 0.002 10.57057 47.71419 _cons | 68.75682 1.357147 50.66 0.000 66.09475 71.41888 ------------------------------------------------------------------------------------- . margins dyst, at(milk120k_ctr=(-2(1)3)) Adjusted predictions Number of obs = 1,536 Model VCE : OLS Expression : Linear prediction, predict() 1._at : milk120k_ctr = -2 2._at : milk120k_ctr = -1 3._at : milk120k_ctr = 0 4._at : milk120k_ctr = 1 5._at : milk120k_ctr = 2 6._at : milk120k_ctr = 3 ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _at#dyst | 1#no | 75.64988 4.106318 18.42 0.000 67.59528 83.70448 1#yes | 25.57227 17.96398 1.42 0.155 -9.664321 60.80885 2#no | 72.20335 2.37331 30.42 0.000 67.54807 76.85863 2#yes | 51.26811 9.531863 5.38 0.000 32.57123 69.96499 3#no | 68.75682 1.357147 50.66 0.000 66.09475 71.41888 3#yes | 76.96396 5.555152 13.85 0.000 66.06745 87.86046 4#no | 65.31028 2.342987 27.87 0.000 60.71448 69.90609 4#yes | 102.6598 11.94631 8.59 0.000 79.22694 126.0927 5#no | 61.86375 4.071342 15.19 0.000 53.87776 69.84975 5#yes | 128.3556 20.64995 6.22 0.000 87.85048 168.8608 6#no | 58.41722 5.924572 9.86 0.000 46.79609 70.03835 6#yes | 154.0515 29.69811 5.19 0.000 95.79823 212.3047 ------------------------------------------------------------------------------ . marginsplot, noci title("") legend(order(1 "dyst=0" 2 "dyst=1")) scheme(s2mono) Variables that uniquely identify margins: milk120k_ctr dyst . . * two continuous variables . reg wpc c.parity_1##c.milk120k_ctr Source | SS df MS Number of obs = 1,536 -------------+---------------------------------- F(3, 1532) = 2.26 Model | 18084.1899 3 6028.06329 Prob > F = 0.0797 Residual | 4086280.47 1,532 2667.2849 R-squared = 0.0044 -------------+---------------------------------- Adj R-squared = 0.0025 Total | 4104364.66 1,535 2673.8532 Root MSE = 51.646 -------------------------------------------------------------------------------- wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---------------+---------------------------------------------------------------- parity_1 | 2.072164 .9549532 2.17 0.030 .1990103 3.945318 milk120k_ctr | -2.542834 3.091716 -0.82 0.411 -8.607278 3.521609 | c.parity_1#| c.milk120k_ctr | -.8764358 1.363504 -0.64 0.520 -3.550968 1.798096 | _cons | 65.73655 2.207989 29.77 0.000 61.40555 70.06755 -------------------------------------------------------------------------------- . margins, at(milk120k_ctr=(-2(1)3) parity=(0(1)3)) Adjusted predictions Number of obs = 1,536 Model VCE : OLS Expression : Linear prediction, predict() 1._at : parity_1 = 0 milk120k_ctr = -2 2._at : parity_1 = 0 milk120k_ctr = -1 3._at : parity_1 = 0 milk120k_ctr = 0 4._at : parity_1 = 0 milk120k_ctr = 1 5._at : parity_1 = 0 milk120k_ctr = 2 6._at : parity_1 = 0 milk120k_ctr = 3 7._at : parity_1 = 1 milk120k_ctr = -2 8._at : parity_1 = 1 milk120k_ctr = -1 9._at : parity_1 = 1 milk120k_ctr = 0 10._at : parity_1 = 1 milk120k_ctr = 1 11._at : parity_1 = 1 milk120k_ctr = 2 12._at : parity_1 = 1 milk120k_ctr = 3 13._at : parity_1 = 2 milk120k_ctr = -2 14._at : parity_1 = 2 milk120k_ctr = -1 15._at : parity_1 = 2 milk120k_ctr = 0 16._at : parity_1 = 2 milk120k_ctr = 1 17._at : parity_1 = 2 milk120k_ctr = 2 18._at : parity_1 = 2 milk120k_ctr = 3 19._at : parity_1 = 3 milk120k_ctr = -2 20._at : parity_1 = 3 milk120k_ctr = -1 21._at : parity_1 = 3 milk120k_ctr = 0 22._at : parity_1 = 3 milk120k_ctr = 1 23._at : parity_1 = 3 milk120k_ctr = 2 24._at : parity_1 = 3 milk120k_ctr = 3 ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _at | 1 | 70.82222 5.665012 12.50 0.000 59.71022 81.93422 2 | 68.27938 2.987487 22.86 0.000 62.41939 74.13938 3 | 65.73655 2.207989 29.77 0.000 61.40555 70.06755 4 | 63.19371 4.465733 14.15 0.000 54.43412 71.95331 5 | 60.65088 7.357156 8.24 0.000 46.21972 75.08204 6 | 58.10805 10.36485 5.61 0.000 37.77725 78.43884 7 | 74.64725 4.302751 17.35 0.000 66.20735 83.08716 8 | 71.22798 2.33446 30.51 0.000 66.64891 75.80706 9 | 67.80871 1.601972 42.33 0.000 64.66642 70.951 10 | 64.38944 3.140622 20.50 0.000 58.22907 70.54982 11 | 60.97017 5.228888 11.66 0.000 50.71364 71.22671 12 | 57.5509 7.416173 7.76 0.000 43.00398 72.09783 13 | 78.47229 4.592419 17.09 0.000 69.4642 87.48038 14 | 74.17658 2.691774 27.56 0.000 68.89663 79.45653 15 | 69.88088 1.442666 48.44 0.000 67.05107 72.71069 16 | 65.58517 2.365847 27.72 0.000 60.94453 70.22581 17 | 61.28947 4.218233 14.53 0.000 53.01534 69.56359 18 | 56.99376 6.218525 9.17 0.000 44.79604 69.19148 19 | 82.29732 6.310492 13.04 0.000 69.91921 94.67544 20 | 77.12518 3.783241 20.39 0.000 69.7043 84.54606 21 | 71.95304 1.849358 38.91 0.000 68.3255 75.58058 22 | 66.7809 2.672818 24.99 0.000 61.53813 72.02367 23 | 61.60876 5.048207 12.20 0.000 51.70663 71.51089 24 | 56.43662 7.644701 7.38 0.000 41.44143 71.4318 ------------------------------------------------------------------------------ . marginsplot, noci title("") scheme(s2mono) Variables that uniquely identify margins: milk120k_ctr parity_1 . . * interaction between 2 cont. variables assumes both effect are linear . * is this true for -milk120k- ? - generate a smoothed scatterplot . twoway (scatter wpc milk120k) (lowess wpc milk120k) . twoway (scatter wpc parity) (lowess wpc parity) . . * CAUSAL INTERPRETATION . use daisy2red.dta, clear . . * model to evaluate effect of - rp and vag_disch . gen calv_mth=month(calv_dt) . tab calv_mth, summ(wpc) | Summary of Interval from wait | period to conception calv_mth | Mean Std. Dev. Freq. ------------+------------------------------------ 1 | 77.00885 50.850533 113 2 | 60.009615 45.765017 104 3 | 69.402516 54.240687 159 4 | 60.385827 53.046258 127 5 | 63 49.882911 106 6 | 64.350365 51.665154 137 7 | 63.581818 45.047453 110 8 | 69.514019 54.277478 107 9 | 74.664773 52.827873 176 10 | 72.477011 49.332856 174 11 | 72.026846 54.231479 149 12 | 73.508929 53.222801 112 ------------+------------------------------------ Total | 68.799238 51.599283 1,574 . gen aut_calv=(calv_mth>=2 & calv_mth<=7) if !missing(calv_mth) . tab aut_calv, summ(wpc) | Summary of Interval from wait | period to conception aut_calv | Mean Std. Dev. Freq. ------------+------------------------------------ 0 | 73.233454 52.236588 831 1 | 63.839838 50.451973 743 ------------+------------------------------------ Total | 68.799238 51.599283 1,574 . gen hs100=(herd_size/100) . gen parity_1=parity-1 . . reg wpc c.hs100##c.hs100 parity_1 i.aut_calv twin dyst i.rp##vag_disch , vsquish Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 13.22 Model | 296062.681 9 32895.8535 Prob > F = 0.0000 Residual | 3892027.88 1,564 2488.50887 R-squared = 0.0707 -------------+---------------------------------- Adj R-squared = 0.0653 Total | 4188090.56 1,573 2662.48605 Root MSE = 49.885 --------------------------------------------------------------------------------- wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------+---------------------------------------------------------------- hs100 | -36.05705 15.05032 -2.40 0.017 -65.57798 -6.53612 c.hs100#c.hs100 | 11.13827 3.111145 3.58 0.000 5.035818 17.24073 parity_1 | 1.13721 .8583103 1.32 0.185 -.54635 2.82077 1.aut_calv | -8.263839 2.537751 -3.26 0.001 -13.24159 -3.286086 twin | 20.68314 9.845165 2.10 0.036 1.37203 39.99425 dyst | 11.70041 5.462576 2.14 0.032 .9856659 22.41516 rp | yes | 5.98687 4.811976 1.24 0.214 -3.451734 15.42547 vag_disch | yes | 1.228195 7.161395 0.17 0.864 -12.81875 15.27514 rp#vag_disch | yes#yes | 22.85194 12.51605 1.83 0.068 -1.698056 47.40194 _cons | 84.66125 17.61671 4.81 0.000 50.10639 119.2161 --------------------------------------------------------------------------------- . estat vce, corr Correlation matrix of coefficients of regress model | c.hs100# 1. 1. e(V) | hs100 c.hs100 parity_1 aut_calv twin dyst rp -------------+---------------------------------------------------------------------- hs100 | 1.0000 c.hs100#| c.hs100 | -0.9907 1.0000 parity_1 | 0.0426 -0.0468 1.0000 1.aut_calv | 0.0349 -0.0265 -0.0500 1.0000 twin | -0.0435 0.0464 -0.0666 -0.0487 1.0000 dyst | -0.0579 0.0717 0.1303 0.0023 -0.0214 1.0000 1.rp | -0.0527 0.0540 0.0318 0.0438 -0.0807 -0.0656 1.0000 1.vag_disch | -0.0680 0.0698 0.0649 0.0387 -0.0383 -0.1188 0.0742 1.rp#| 1.vag_disch | 0.0566 -0.0571 -0.0644 -0.0024 -0.0429 0.0861 -0.3873 _cons | -0.9772 0.9458 -0.1201 -0.1090 0.0417 0.0132 0.0235 | 1. 1.rp# e(V) | vag_di~h 1.vag_~h _cons -------------+------------------------------ 1.vag_disch | 1.0000 1.rp#| 1.vag_disch | -0.5752 1.0000 _cons | 0.0427 -0.0410 1.0000 . *centre hs100 . summ hs100 Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- hs100 | 1,574 2.510076 .6201691 1.25 3.33 . gen hs100_ctr=hs100-r(mean) . . reg wpc c.hs100_ctr##c.hs100_ctr parity_1 i.aut_calv i.twin i.dyst i.rp##vag_disch , > vsquish Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 13.22 Model | 296062.682 9 32895.8535 Prob > F = 0.0000 Residual | 3892027.88 1,564 2488.50887 R-squared = 0.0707 -------------+---------------------------------- Adj R-squared = 0.0653 Total | 4188090.56 1,573 2662.48605 Root MSE = 49.885 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs100_ctr | 19.85878 2.163552 9.18 0.000 15.61501 24.10255 c.hs100_ctr#| c.hs100_ctr | 11.13827 3.111145 3.58 0.000 5.035818 17.24073 parity_1 | 1.13721 .8583103 1.32 0.185 -.54635 2.82077 1.aut_calv | -8.263839 2.537751 -3.26 0.001 -13.24159 -3.286086 twin | yes | 20.68314 9.845165 2.10 0.036 1.37203 39.99425 dyst | yes | 11.70041 5.462576 2.14 0.032 .9856659 22.41516 rp | yes | 5.98687 4.811976 1.24 0.214 -3.451734 15.42547 vag_disch | yes | 1.228195 7.161395 0.17 0.864 -12.81875 15.27514 rp#vag_disch | yes#yes | 22.85194 12.51605 1.83 0.068 -1.698056 47.40194 _cons | 64.3318 2.634086 24.42 0.000 59.16509 69.49851 ------------------------------------------------------------------------------ . estat vce, corr Correlation matrix of coefficients of regress model | c.hs10~r# 1. 1. 1. 1. e(V) | hs100_~r c.hs10~r parity_1 aut_calv twin dyst rp -------------+---------------------------------------------------------------------- hs100_ctr | 1.0000 c.hs100_ctr#| c.hs100_ctr | 0.3271 1.0000 parity_1 | -0.0415 -0.0468 1.0000 1.aut_calv | 0.0521 -0.0265 -0.0500 1.0000 1.twin | 0.0326 0.0464 -0.0666 -0.0487 1.0000 1.dyst | 0.1146 0.0717 0.1303 0.0023 -0.0214 1.0000 1.rp | 0.0230 0.0540 0.0318 0.0438 -0.0807 -0.0656 1.0000 1.vag_disch | 0.0305 0.0698 0.0649 0.0387 -0.0383 -0.1188 0.0742 1.rp#| 1.vag_disch | -0.0184 -0.0571 -0.0644 -0.0024 -0.0429 0.0861 -0.3873 _cons | -0.1716 -0.4415 -0.5404 -0.4246 0.0004 -0.2091 -0.1974 | 1. 1.rp# e(V) | vag_di~h 1.vag_~h _cons -------------+------------------------------ 1.vag_disch | 1.0000 1.rp#| 1.vag_disch | -0.5752 1.0000 _cons | -0.1711 0.1133 1.0000 . . *Plot to show herd size effect . margins i.rp, at(hs100=(-1.26 -0.66 -0.5 -0.16 0.12 0.43 0.82) vag_disch=0 twin=0 au > t_calv=1) Predictive margins Number of obs = 1,574 Model VCE : OLS Expression : Linear prediction, predict() 1._at : hs100_ctr = -1.26 aut_calv = 1 twin = 0 vag_disch = 0 2._at : hs100_ctr = -.66 aut_calv = 1 twin = 0 vag_disch = 0 3._at : hs100_ctr = -.5 aut_calv = 1 twin = 0 vag_disch = 0 4._at : hs100_ctr = -.16 aut_calv = 1 twin = 0 vag_disch = 0 5._at : hs100_ctr = .12 aut_calv = 1 twin = 0 vag_disch = 0 6._at : hs100_ctr = .43 aut_calv = 1 twin = 0 vag_disch = 0 7._at : hs100_ctr = .82 aut_calv = 1 twin = 0 vag_disch = 0 ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _at#rp | 1#no | 51.39514 4.179809 12.30 0.000 43.19652 59.59376 1#yes | 57.38201 6.237152 9.20 0.000 45.14794 69.61607 2#no | 50.47911 2.305193 21.90 0.000 45.95752 55.00071 2#yes | 56.46598 5.017018 11.25 0.000 46.62519 66.30677 3#no | 51.58925 2.272508 22.70 0.000 47.13177 56.04674 3#yes | 57.57612 4.979707 11.56 0.000 47.80852 67.34373 4#no | 55.84181 2.307135 24.20 0.000 51.31641 60.36721 4#yes | 61.82868 4.975561 12.43 0.000 52.06921 71.58815 5#no | 61.27752 2.227896 27.50 0.000 56.90754 65.6475 5#yes | 67.26439 4.951049 13.59 0.000 57.553 76.97578 6#no | 69.33282 2.148793 32.27 0.000 65.118 73.54764 6#yes | 75.31969 4.95891 15.19 0.000 65.59287 85.0465 7#no | 82.50765 2.927494 28.18 0.000 76.76542 88.24987 7#yes | 88.49452 5.433098 16.29 0.000 77.83759 99.15144 ------------------------------------------------------------------------------ . marginsplot, noci xlabel(-1.26 "125" -0.66 "185" -0.5 "201" -0.16 "235" 0.12 "263" 0 > .43 "294" 0.82 "333") /// > xtitle("Herd Size") ytitle("Days") title("Preditive WPC by RP") Variables that uniquely identify margins: hs100_ctr rp . . * dyst conf for rp . reg wpc c.hs100_ctr##c.hs100_ctr parity_1 i.aut_calv i.twin i.rp Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(6, 1567) = 18.07 Model | 270961.954 6 45160.3256 Prob > F = 0.0000 Residual | 3917128.61 1,567 2499.76299 R-squared = 0.0647 -------------+---------------------------------- Adj R-squared = 0.0611 Total | 4188090.56 1,573 2662.48605 Root MSE = 49.998 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs100_ctr | 19.30525 2.151998 8.97 0.000 15.08415 23.52635 | c.hs100_ctr#| c.hs100_ctr | 10.71722 3.099622 3.46 0.001 4.637372 16.79706 | parity_1 | .9394592 .8495005 1.11 0.269 -.7268182 2.605737 1.aut_calv | -8.459954 2.540778 -3.33 0.001 -13.44364 -3.476272 | twin | yes | 23.09719 9.826032 2.35 0.019 3.823631 42.37074 | rp | yes | 11.18504 4.35328 2.57 0.010 2.646174 19.72391 _cons | 65.59313 2.527825 25.95 0.000 60.63485 70.5514 ------------------------------------------------------------------------------ . reg wpc c.hs100_ctr##c.hs100_ctr parity_1 i.aut_calv i.twin i.rp dyst Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 16.16 Model | 282216.669 7 40316.667 Prob > F = 0.0000 Residual | 3905873.89 1,566 2494.17234 R-squared = 0.0674 -------------+---------------------------------- Adj R-squared = 0.0632 Total | 4188090.56 1,573 2662.48605 Root MSE = 49.942 ------------------------------------------------------------------------------ wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- hs100_ctr | 19.85311 2.165007 9.17 0.000 15.60649 24.09973 | c.hs100_ctr#| c.hs100_ctr | 11.25323 3.10642 3.62 0.000 5.160047 17.34641 | parity_1 | 1.194524 .8570034 1.39 0.164 -.4864711 2.875519 1.aut_calv | -8.425575 2.537986 -3.32 0.001 -13.40378 -3.447366 | twin | yes | 22.58687 9.817977 2.30 0.022 3.3291 41.84463 | rp | yes | 10.69412 4.354546 2.46 0.014 2.152766 19.23548 dyst | 11.53206 5.428789 2.12 0.034 .8836013 22.18052 _cons | 64.29614 2.597768 24.75 0.000 59.20067 69.39161 ------------------------------------------------------------------------------ . end of do-file . log close name: log: C:\vhm812-data\l2a_modelbuildI.txt log type: text closed on: 11 Jan 2016, 22:17:22 --------------------------------------------------------------------------------------