----------------------------------------------------------------------------------------------------- name: log: c:\vhm812-data\l3a_model_build_II.txt log type: text opened on: 19 Jan 2016, 09:41:03 . set more off . . * open the DAISY Red dataset . use daisy2red.dta, clear . gen month=month(calv_dt) . gen aut_calv=(month>=2 & month<=7) . gen hs_ct=herd_size-251 . gen hs_sq=herd_size^2 . gen parity1=parity-1 . gen milk120k=milk120/1000 (38 missing values generated) . gen wpc_sqrt=sqrt(wpc) . . * specifying maximum model . * no analyses . . * causal model . * no analyses . . * Reducing the Number of Predictors . * descriptive statistics . codebook cf vag_disch ----------------------------------------------------------------------------------------------------- cf Calving to first service interval ----------------------------------------------------------------------------------------------------- type: numeric (float) range: [21,240] units: 1 unique values: 129 missing .: 14/1,574 mean: 71.3115 std. dev: 21.9039 percentiles: 10% 25% 50% 75% 90% 51 59 67 78 96 ----------------------------------------------------------------------------------------------------- vag_disch Vaginal discharge observed ----------------------------------------------------------------------------------------------------- type: numeric (float) label: noyes range: [0,1] units: 1 unique values: 2 missing .: 0/1,574 tabulation: Freq. Numeric Label 1,492 0 no 82 1 yes . sum milk120 parity herd_size dyst rp vag_disch, detail Milk volume (l) in first 120 days of lactation ------------------------------------------------------------- Percentiles Smallest 1% 1698.5 1110.2 5% 2077.2 1397.7 10% 2298.2 1461.5 Obs 1,536 25% 2731.95 1467 Sum of Wgt. 1,536 50% 3215.25 Mean 3215.096 Largest Std. Dev. 698.1316 75% 3682.1 5181.8 90% 4080.7 5278 Variance 487387.7 95% 4403.3 5399.9 Skewness .1101838 99% 4904.4 5630.3 Kurtosis 2.845637 Lactation number ------------------------------------------------------------- Percentiles Smallest 1% 1 1 5% 1 1 10% 1 1 Obs 1,574 25% 1 1 Sum of Wgt. 1,574 50% 2 Mean 2.729987 Largest Std. Dev. 1.493841 75% 4 7 90% 5 7 Variance 2.23156 95% 5 7 Skewness .5450922 99% 6 7 Kurtosis 2.315593 Herd size ------------------------------------------------------------- Percentiles Smallest 1% 125 125 5% 125 125 10% 185 125 Obs 1,574 25% 201 125 Sum of Wgt. 1,574 50% 263 Mean 251.0076 Largest Std. Dev. 62.01692 75% 294 333 90% 333 333 Variance 3846.098 95% 333 333 Skewness -.3550929 99% 333 333 Kurtosis 2.256969 Dystocia at calving ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of Wgt. 1,574 50% 0 Mean .0597205 Largest Std. Dev. .2370435 75% 0 1 90% 0 1 Variance .0561896 95% 1 1 Skewness 3.715938 99% 1 1 Kurtosis 14.80819 Retained placenta at calving ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of Wgt. 1,574 50% 0 Mean .0946633 Largest Std. Dev. .2928423 75% 0 1 90% 0 1 Variance .0857566 95% 1 1 Skewness 2.769173 99% 1 1 Kurtosis 8.66832 Vaginal discharge observed ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of Wgt. 1,574 50% 0 Mean .0520966 Largest Std. Dev. .2222924 75% 0 1 90% 0 1 Variance .0494139 95% 1 1 Skewness 4.031139 99% 1 1 Kurtosis 17.25008 . . * correlation . corr milk120 parity herd_size (obs=1,536) | milk120 parity herd_s~e -------------+--------------------------- milk120 | 1.0000 parity | 0.3821 1.0000 herd_size | -0.0433 0.0356 1.0000 . pwcorr milk120 parity herd_size, obs star(0.05) | milk120 parity herd_s~e -------------+--------------------------- milk120 | 1.0000 | 1536 | parity | 0.3821* 1.0000 | 1536 1574 | herd_size | -0.0433 0.0386 1.0000 | 1536 1574 1574 | . . * indices . * no analyses . . * unconditional associations . reg cf parity Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 1.46 Model | 699.698494 1 699.698494 Prob > F = 0.2273 Residual | 747280.894 1,558 479.641139 R-squared = 0.0009 -------------+---------------------------------- Adj R-squared = 0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.901 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity | -.4480144 .3709323 -1.21 0.227 -1.175594 .2795649 _cons | 72.53554 1.155186 62.79 0.000 70.26965 74.80142 ------------------------------------------------------------------------------ . reg cf vag_disch Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 0.49 Model | 236.166881 1 236.166881 Prob > F = 0.4831 Residual | 747744.425 1,558 479.938656 R-squared = 0.0003 -------------+---------------------------------- Adj R-squared = -0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.908 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- vag_disch | 1.743523 2.485484 0.70 0.483 -3.131724 6.61877 _cons | 71.21989 .5698436 124.98 0.000 70.10215 72.33763 ------------------------------------------------------------------------------ . . * principle components / factor analysis / correspondence analysis . * not covered . . * Functional Form of predictors . * residual plots . reg cf milk120 Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(1, 1523) = 0.72 Model | 329.282002 1 329.282002 Prob > F = 0.3977 Residual | 700869.964 1,523 460.19039 R-squared = 0.0005 -------------+---------------------------------- Adj R-squared = -0.0002 Total | 701199.246 1,524 460.104492 Root MSE = 21.452 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- milk120 | .0006667 .0007881 0.85 0.398 -.0008793 .0022126 _cons | 68.95612 2.595207 26.57 0.000 63.86556 74.04667 ------------------------------------------------------------------------------ . predict stdres, rsta (49 missing values generated) . . * lowess smoother . twoway (scatter stdres milk120) (lowess stdres milk120) . . * Detecting and Correcting for non-linearity (transformation of X) . * categorization of predictor . reg cf parity Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 1.46 Model | 699.698494 1 699.698494 Prob > F = 0.2273 Residual | 747280.894 1,558 479.641139 R-squared = 0.0009 -------------+---------------------------------- Adj R-squared = 0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.901 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity | -.4480144 .3709323 -1.21 0.227 -1.175594 .2795649 _cons | 72.53554 1.155186 62.79 0.000 70.26965 74.80142 ------------------------------------------------------------------------------ . egen parity_c6=cut(parity), at(0 1 2 3 4 5 6 15) icodes . tab parity parity_c6 Lactation | parity_c6 number | 1 2 3 4 5 6 | Total -----------+------------------------------------------------------------------+---------- 1 | 417 0 0 0 0 0 | 417 2 | 0 374 0 0 0 0 | 374 3 | 0 0 319 0 0 0 | 319 4 | 0 0 0 222 0 0 | 222 5 | 0 0 0 0 169 0 | 169 6 | 0 0 0 0 0 69 | 69 7 | 0 0 0 0 0 4 | 4 -----------+------------------------------------------------------------------+---------- Total | 417 374 319 222 169 73 | 1,574 . reg cf i.parity_c6 Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(5, 1554) = 0.91 Model | 2174.44893 5 434.889785 Prob > F = 0.4761 Residual | 745806.143 1,554 479.926733 R-squared = 0.0029 -------------+---------------------------------- Adj R-squared = -0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.907 ------------------------------------------------------------------------------ cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity_c6 | 2 | -.6468817 1.568168 -0.41 0.680 -3.722829 2.429066 3 | -3.181369 1.635853 -1.94 0.052 -6.390081 .027343 4 | -1.528531 1.831255 -0.83 0.404 -5.120523 2.063462 5 | -2.322236 2.004684 -1.16 0.247 -6.254406 1.609935 6 | -.9349232 2.781437 -0.34 0.737 -6.390688 4.520841 | _cons | 72.61985 1.077984 67.37 0.000 70.5054 74.73431 ------------------------------------------------------------------------------ . . * quadratic function of X . gen ln_cf=ln(cf) (14 missing values generated) . reg ln_cf c.milk120k##c.milk120k Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 4.78 Model | .709133443 2 .354566722 Prob > F = 0.0085 Residual | 112.944101 1,522 .074207688 R-squared = 0.0062 -------------+---------------------------------- Adj R-squared = 0.0049 Total | 113.653234 1,524 .074575613 Root MSE = .27241 --------------------------------------------------------------------------------------- ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------------+---------------------------------------------------------------- milk120k | .2056932 .0697547 2.95 0.003 .0688677 .3425186 | c.milk120k#c.milk120k | -.0295122 .0105961 -2.79 0.005 -.0502966 -.0087277 | _cons | 3.883482 .1122207 34.61 0.000 3.663358 4.103605 --------------------------------------------------------------------------------------- . estat vif Variable | VIF 1/VIF -------------+---------------------- milk120k | 48.58 0.020585 c.milk120k#| c.milk120k | 48.58 0.020585 -------------+---------------------- Mean VIF | 48.58 . vce, corr Correlation matrix of coefficients of regress model | c.m~120k# e(V) | milk120k c.m~120k _cons -------------+------------------------------ milk120k | 1.0000 c.milk120k#| c.milk120k | -0.9897 1.0000 _cons | -0.9872 0.9559 1.0000 . . * redoing the analysis with milk120 centred . summ milk120k Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- milk120k | 1,536 3.215096 .6981316 1.1102 5.6303 . gen m120k_ct=(milk120k-r(mean)) /*r(mean) - memory variable created by summ command that store the > mean of the variable*/ (38 missing values generated) . reg ln_cf c.m120k_ct##c.m120k_ct Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 4.78 Model | .709133424 2 .354566712 Prob > F = 0.0085 Residual | 112.944101 1,522 .074207688 R-squared = 0.0062 -------------+---------------------------------- Adj R-squared = 0.0049 Total | 113.653234 1,524 .074575613 Root MSE = .27241 --------------------------------------------------------------------------------------- ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------------+---------------------------------------------------------------- m120k_ct | .0159242 .0100484 1.58 0.113 -.0037859 .0356344 | c.m120k_ct#c.m120k_ct | -.0295122 .0105961 -2.79 0.005 -.0502966 -.0087277 | _cons | 4.239742 .0086679 489.13 0.000 4.22274 4.256745 --------------------------------------------------------------------------------------- . estat vif Variable | VIF 1/VIF -------------+---------------------- m120k_ct | 1.01 0.992010 c.m120k_ct#| c.m120k_ct | 1.01 0.992010 -------------+---------------------- Mean VIF | 1.01 . vce, corr Correlation matrix of coefficients of regress model | c.m120~t# e(V) | m120k_ct c.m120~t _cons -------------+------------------------------ m120k_ct | 1.0000 c.m120k_ct#| c.m120k_ct | -0.0894 1.0000 _cons | 0.0494 -0.5936 1.0000 . . capture drop stdres . predict stdres, rsta (49 missing values generated) . twoway (scatter stdres m120k_ct) (lowess stdres m120k_ct) . . * fractional polynomials . fp , scale center replace: reg ln_cf (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------- milk120k | df Deviance Res. s.d. Dev. dif. P(*) Powers -------------+----------------------------------------------------------------- omitted | 0 367.951 0.273 10.533 0.033 linear | 1 366.159 0.273 8.741 0.033 1 m = 1 | 2 361.396 0.273 3.978 0.138 -2 m = 2 | 4 357.418 0.272 0.000 -- -2 3 ------------------------------------------------------------------------------- (*) P = sig. level of model with m = 2 based on F with 1520 denominator dof. Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 5.27 Model | .782289941 2 .391144971 Prob > F = 0.0052 Residual | 112.870944 1,522 .074159622 R-squared = 0.0069 -------------+---------------------------------- Adj R-squared = 0.0056 Total | 113.653234 1,524 .074575613 Root MSE = .27232 ------------------------------------------------------------------------------ ln_cf | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- milk120k_1 | -.5301266 .1654544 -3.20 0.001 -.8546694 -.2055839 milk120k_2 | -.0008516 .0004271 -1.99 0.046 -.0016894 -.0000138 _cons | 4.238434 .008295 510.96 0.000 4.222163 4.254704 ------------------------------------------------------------------------------ . fp plot, r(none) ytitle(Predicted Ln(cf)) . . *or you can predict fitted values and standardized residuals and plot them . capture drop fit1 . capture drop stdres . predict fit1 (option xb assumed; fitted values) (49 missing values generated) . predict stdres, rstand (49 missing values generated) . twoway scatter ln_cf milk120k || line fit1 milk120k, sort . twoway (scatter stdres milk120k) (lowess stdres milk120k) . . * interactions . * 2-way . reg wpc_sqrt hs_ct hs_sq aut_calv twin dyst##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 19.94 Model | 1089.06861 7 155.581229 Prob > F = 0.0000 Residual | 12217.1982 1,566 7.80153144 R-squared = 0.0818 -------------+---------------------------------- Adj R-squared = 0.0777 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7931 -------------------------------------------------------------------------------- wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] ---------------+---------------------------------------------------------------- hs_ct | -.0240984 .0083995 -2.87 0.004 -.0405737 -.007623 hs_sq | .0000729 .0000174 4.20 0.000 .0000389 .0001069 aut_calv | -.5220441 .1417584 -3.68 0.000 -.8001004 -.2439877 twin | 1.533154 .5476246 2.80 0.005 .4589989 2.607308 | dyst | yes | .7636387 .3243731 2.35 0.019 .1273874 1.39989 | vag_disch | yes | .979344 .3502916 2.80 0.005 .2922541 1.666434 | dyst#vag_disch | yes#yes | -2.272293 .8833614 -2.57 0.010 -4.004989 -.5395972 | _cons | 3.038895 1.165538 2.61 0.009 .7527158 5.325074 -------------------------------------------------------------------------------- . . * Selection criteria . * non-statistical . * no analyses . . * statistical - nested models . * Wald test . reg wpc_sqrt c.hs_ct##c.hs_ct aut_calv twin i.dyst##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 19.94 Model | 1089.06861 7 155.581229 Prob > F = 0.0000 Residual | 12217.1982 1,566 7.80153144 R-squared = 0.0818 -------------+---------------------------------- Adj R-squared = 0.0777 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7931 --------------------------------------------------------------------------------- wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------+---------------------------------------------------------------- hs_ct | .0124973 .0012109 10.32 0.000 .0101222 .0148724 | c.hs_ct#c.hs_ct | .0000729 .0000174 4.20 0.000 .0000389 .0001069 | aut_calv | -.5220441 .1417584 -3.68 0.000 -.8001004 -.2439877 twin | 1.533154 .5476246 2.80 0.005 .4589989 2.607308 | dyst | yes | .7636387 .3243731 2.35 0.019 .1273874 1.39989 | vag_disch | yes | .979344 .3502916 2.80 0.005 .2922541 1.666434 | dyst#vag_disch | yes#yes | -2.272293 .8833614 -2.57 0.010 -4.004989 -.5395972 | _cons | 7.631652 .1213388 62.90 0.000 7.393648 7.869655 --------------------------------------------------------------------------------- . testparm i.dyst#vag_disch ( 1) 1.dyst#1.vag_disch = 0 F( 1, 1566) = 6.62 Prob > F = 0.0102 . . * AIC and BIC . * statistical non-nested - this in logistic regression . . *Selection strategy - Coleman dataset . * best subset - install command vselect . use coleman.dta, clear . rename x1 x1_staff_sal . rename x2 x2_father_job . rename x3 x3_ses . rename x4 x4_test_teach . rename x5 x5_edu_mother . rename y y_test_scr . . vselect y_test_scr x1_staff_sal x2_father_job x3_ses /// > x4_test_teach x5_edu_mother, best Response : y_test_scr Fixed Predictors : Selected Predictors: x3_ses x4_test_teach x1_staff_sal x5_edu_mother x2_father_job Actual Regressions 5 Possible Regressions 32 Optimal Models Highlighted: # Preds R2ADJ C AIC AICC BIC 1 .8518292 4.974883 90.89429 149.1518 92.88576 2 .8740953 2.832759 88.49432 147.9185 91.48152 3 .8820943 2.836089 87.96903 149.0123 91.95196 4 .8756399 4.670221 89.74417 152.9633 94.72284 5 .8728444 6 90.80893 156.8998 96.78332 Selected Predictors 1 : x3_ses 2 : x3_ses x4_test_teach 3 : x3_ses x4_test_teach x1_staff_sal 4 : x3_ses x4_test_teach x1_staff_sal x5_edu_mother 5 : x3_ses x4_test_teach x1_staff_sal x5_edu_mother x2_father_job . . * forward selection . stepwise, pe(0.1): reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_teach x5_edu_mother begin with empty model p = 0.0000 < 0.1000 adding x3_ses p = 0.0566 < 0.1000 adding x4_test_teach Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . * backward elimination . stepwise, pr(0.1): reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_teach x5_edu_mother begin with full model p = 0.4267 >= 0.1000 removing x2_father_job p = 0.6863 >= 0.1000 removing x5_edu_mother p = 0.1616 >= 0.1000 removing x1_staff_sal Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . * stepwise selection . stepwise, pe(0.1) pr(0.11): reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_teach x5_edu_m > other begin with full model p = 0.4267 >= 0.1100 removing x2_father_job p = 0.6863 >= 0.1100 removing x5_edu_mother p = 0.1616 >= 0.1100 removing x1_staff_sal Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . . *selection strategy - daisy2red . use daisy2red, clear . gen hs_ct=herd_size-251 . gen hs_sq=herd_size^2 . gen parity1=parity-1 . gen month=month(calv_dt) . gen aut_clv=(month>=2 & month<=7) if !missing(calv_dt) . gen sqrt_wpc=sqrt(wpc) . gen twdy=twin*dyst . gen rpvd=rp*vag_disch . . reg sqrt_wpc parity1 aut_clv hs_ct hs_sq twin dyst twdy rp vag_disch rpvd Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(10, 1563) = 14.84 Model | 1153.69711 10 115.369711 Prob > F = 0.0000 Residual | 12152.5697 1,563 7.77515658 R-squared = 0.0867 -------------+---------------------------------- Adj R-squared = 0.0809 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7884 ------------------------------------------------------------------------------ sqrt_wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity1 | .0558659 .0480009 1.16 0.245 -.0382871 .150019 aut_clv | -.5128308 .1418525 -3.62 0.000 -.7910719 -.2345896 hs_ct | -.0234668 .0084135 -2.79 0.005 -.0399698 -.0069638 hs_sq | .0000713 .0000174 4.10 0.000 .0000372 .0001054 twin | 1.698238 .5842331 2.91 0.004 .552275 2.844201 dyst | .627718 .3100054 2.02 0.043 .0196477 1.235788 twdy | -2.772795 1.739103 -1.59 0.111 -6.184016 .6384255 rp | .3904235 .2689735 1.45 0.147 -.1371634 .9180105 vag_disch | -.0422026 .4007079 -0.11 0.916 -.8281844 .7437792 rpvd | 1.476058 .6996667 2.11 0.035 .1036735 2.848442 _cons | 3.023021 1.168247 2.59 0.010 .7315249 5.314516 ------------------------------------------------------------------------------ . . *stepwise backward . stepwise, pe(0.05) pr(0.051) lockterm1: reg sqrt_wpc parity1 aut_clv (hs_ct hs_sq) /// > (dyst twin twdy) (rp vag_disch rpvd) begin with full model p < 0.0510 for all terms in model Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(10, 1563) = 14.84 Model | 1153.69711 10 115.369711 Prob > F = 0.0000 Residual | 12152.5697 1,563 7.77515658 R-squared = 0.0867 -------------+---------------------------------- Adj R-squared = 0.0809 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7884 ------------------------------------------------------------------------------ sqrt_wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity1 | .0558659 .0480009 1.16 0.245 -.0382871 .150019 aut_clv | -.5128308 .1418525 -3.62 0.000 -.7910719 -.2345896 hs_ct | -.0234668 .0084135 -2.79 0.005 -.0399698 -.0069638 hs_sq | .0000713 .0000174 4.10 0.000 .0000372 .0001054 dyst | .627718 .3100054 2.02 0.043 .0196477 1.235788 twin | 1.698238 .5842331 2.91 0.004 .552275 2.844201 twdy | -2.772795 1.739103 -1.59 0.111 -6.184016 .6384255 rp | .3904235 .2689735 1.45 0.147 -.1371634 .9180105 vag_disch | -.0422026 .4007079 -0.11 0.916 -.8281844 .7437792 rpvd | 1.476058 .6996667 2.11 0.035 .1036735 2.848442 _cons | 3.023021 1.168247 2.59 0.010 .7315249 5.314516 ------------------------------------------------------------------------------ . estimates store sw_1 . . stepwise, pe(0.05) pr(0.051) lockterm1: reg sqrt_wpc parity1 aut_clv (hs_ct hs_sq) /// > dyst twin twdy rp vag_disch rpvd begin with full model p = 0.9161 >= 0.0510 removing vag_disch p = 0.1434 >= 0.0510 removing rp p = 0.1136 >= 0.0510 removing twdy p = 0.0616 >= 0.0510 removing dyst Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(6, 1567) = 23.31 Model | 1090.25318 6 181.708864 Prob > F = 0.0000 Residual | 12216.0137 1,567 7.79579685 R-squared = 0.0819 -------------+---------------------------------- Adj R-squared = 0.0784 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7921 ------------------------------------------------------------------------------ sqrt_wpc | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- parity1 | .0438808 .0474418 0.92 0.355 -.0491753 .1369369 aut_clv | -.5244014 .1418074 -3.70 0.000 -.8025536 -.2462491 hs_ct | -.0216107 .0083743 -2.58 0.010 -.0380366 -.0051847 hs_sq | .0000669 .0000173 3.87 0.000 .000033 .0001009 rpvd | 1.831729 .5194199 3.53 0.000 .8128976 2.85056 twin | 1.478791 .5486927 2.70 0.007 .4025419 2.55504 _cons | 3.404817 1.155405 2.95 0.003 1.138515 5.67112 ------------------------------------------------------------------------------ . estimates store sw_2 . estimates table sw_1 sw_2 ---------------------------------------- Variable | sw_1 sw_2 -------------+-------------------------- parity1 | .05586593 .04388077 aut_clv | -.51283075 -.52440137 hs_ct | -.02346682 -.02161068 hs_sq | .0000713 .00006694 dyst | .62771805 twin | 1.6982381 1.4787911 twdy | -2.7727951 rp | .39042354 vag_disch | -.04220258 rpvd | 1.4760578 1.8317287 _cons | 3.0230207 3.4048174 ---------------------------------------- . . * reliability . * split sample analysis . use daisy2red, clear . gen wpc_sqrt=sqrt(wpc) . gen hs_ct=herd_size-251 . gen parity1=parity-1 . gen month=month(calv_dt) . gen aut_calv=(month>=2 & month<=7) . . gen rand=uniform() . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 aut_calv twin i.dyst##vag_disch if rand<0.6 Source | SS df MS Number of obs = 926 -------------+---------------------------------- F(8, 917) = 12.55 Model | 776.173652 8 97.0217065 Prob > F = 0.0000 Residual | 7087.93325 917 7.72948009 R-squared = 0.0987 -------------+---------------------------------- Adj R-squared = 0.0908 Total | 7864.1069 925 8.50173719 Root MSE = 2.7802 --------------------------------------------------------------------------------- wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------+---------------------------------------------------------------- hs_ct | .0117182 .0015609 7.51 0.000 .0086549 .0147815 | c.hs_ct#c.hs_ct | .0000857 .0000224 3.83 0.000 .0000417 .0001296 | parity1 | .0810866 .0625783 1.30 0.195 -.0417268 .2039 aut_calv | -.7456408 .1837457 -4.06 0.000 -1.106252 -.38503 twin | 1.784134 .6584103 2.71 0.007 .4919681 3.0763 | dyst | yes | .8317532 .4376673 1.90 0.058 -.0271926 1.690699 | vag_disch | yes | .9265756 .4812136 1.93 0.054 -.0178322 1.870983 | dyst#vag_disch | yes#yes | -3.853194 1.17308 -3.28 0.001 -6.155426 -1.550961 | _cons | 7.593925 .1916378 39.63 0.000 7.217825 7.970025 --------------------------------------------------------------------------------- . scalar r2_1 = e(r2) . capture drop pv . predict pv (option xb assumed; fitted values) . *cross validation correlation . corr pv wpc_sqrt if rand>=0.6 (obs=648) | pv wpc_sqrt -------------+------------------ pv | 1.0000 wpc_sqrt | 0.2268 1.0000 . *shrinkage on cross-validation . scalar cv_sq = r(rho)^2 /* this computes r2 - r(rho) is a "saved result" from -corr- */ . di "R2 first model= " r2_1 " and CV2= " cv_sq R2 first model= .09869826 and CV2= .05145487 . di "shrinkage on cross-validation = " r2_1-cv_sq shrinkage on cross-validation = .04724339 . . *leave-one-out (advanced commands) . * computin "n" regression models without obs "i" using residual and leverage values . * and then calculating PRESS (predicted residual sum of squares) . . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 aut_calv twin dyst##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.67 Model | 1102.29212 8 137.786515 Prob > F = 0.0000 Residual | 12203.9747 1,565 7.79806691 R-squared = 0.0828 -------------+---------------------------------- Adj R-squared = 0.0782 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7925 --------------------------------------------------------------------------------- wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------+---------------------------------------------------------------- hs_ct | .0124288 .0012117 10.26 0.000 .010052 .0148056 | c.hs_ct#c.hs_ct | .0000717 .0000174 4.13 0.000 .0000377 .0001058 | parity1 | .0624408 .04795 1.30 0.193 -.0316122 .1564939 aut_calv | -.5313957 .1419088 -3.74 0.000 -.8097471 -.2530443 twin | 1.484421 .5487805 2.70 0.007 .407999 2.560844 | dyst | yes | .8159669 .3267812 2.50 0.013 .1749918 1.456942 | vag_disch | yes | .9922199 .3503534 2.83 0.005 .3050084 1.679431 | dyst#vag_disch | yes#yes | -2.257598 .8832373 -2.56 0.011 -3.990051 -.5251447 | _cons | 7.529392 .1445102 52.10 0.000 7.245938 7.812846 --------------------------------------------------------------------------------- . capture drop res lev eq1 . predict res, res . predict lev ,lev . gen eq1=(res/(1-lev))^2 . summ eq1 Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- eq1 | 1,574 7.851331 10.9765 1.85e-07 83.92817 . di "PRESS = " r(sum) //error sum square from predicted points PRESS = 12357.995 . di "ESS full= "e(rss) //residuals sum square from full model ESS full= 12203.975 . . di "R2(pred) = "1-(r(sum)/ (`e(mss)' + `e(rss)')) R2(pred) = .07126506 . di "R2 full model = " e(r2) R2 full model = .08284007 . . * presenting results . * standardize coefficients . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 aut_calv twin dyst##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.67 Model | 1102.29212 8 137.786515 Prob > F = 0.0000 Residual | 12203.9747 1,565 7.79806691 R-squared = 0.0828 -------------+---------------------------------- Adj R-squared = 0.0782 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7925 --------------------------------------------------------------------------------- wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------+---------------------------------------------------------------- hs_ct | .0124288 .0012117 10.26 0.000 .010052 .0148056 | c.hs_ct#c.hs_ct | .0000717 .0000174 4.13 0.000 .0000377 .0001058 | parity1 | .0624408 .04795 1.30 0.193 -.0316122 .1564939 aut_calv | -.5313957 .1419088 -3.74 0.000 -.8097471 -.2530443 twin | 1.484421 .5487805 2.70 0.007 .407999 2.560844 | dyst | yes | .8159669 .3267812 2.50 0.013 .1749918 1.456942 | vag_disch | yes | .9922199 .3503534 2.83 0.005 .3050084 1.679431 | dyst#vag_disch | yes#yes | -2.257598 .8832373 -2.56 0.011 -3.990051 -.5251447 | _cons | 7.529392 .1445102 52.10 0.000 7.245938 7.812846 --------------------------------------------------------------------------------- . * rescale coefficient by SD of the predictor . matrix define coeff = e(b)' . capture drop coeff* . svmat coeff . summ hs_ct parity1 aut_calv twin dyst vag_disch Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- hs_ct | 1,574 .0076239 62.01692 -126 82 parity1 | 1,574 1.729987 1.493841 0 6 aut_calv | 1,574 .4720457 .4993766 0 1 twin | 1,574 .0171537 .1298854 0 1 dyst | 1,574 .0597205 .2370435 0 1 -------------+--------------------------------------------------------- vag_disch | 1,574 .0520966 .2222924 0 1 . . * IQR . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 aut_calv twin dyst##vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.67 Model | 1102.29212 8 137.786515 Prob > F = 0.0000 Residual | 12203.9747 1,565 7.79806691 R-squared = 0.0828 -------------+---------------------------------- Adj R-squared = 0.0782 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7925 --------------------------------------------------------------------------------- wpc_sqrt | Coef. Std. Err. t P>|t| [95% Conf. Interval] ----------------+---------------------------------------------------------------- hs_ct | .0124288 .0012117 10.26 0.000 .010052 .0148056 | c.hs_ct#c.hs_ct | .0000717 .0000174 4.13 0.000 .0000377 .0001058 | parity1 | .0624408 .04795 1.30 0.193 -.0316122 .1564939 aut_calv | -.5313957 .1419088 -3.74 0.000 -.8097471 -.2530443 twin | 1.484421 .5487805 2.70 0.007 .407999 2.560844 | dyst | yes | .8159669 .3267812 2.50 0.013 .1749918 1.456942 | vag_disch | yes | .9922199 .3503534 2.83 0.005 .3050084 1.679431 | dyst#vag_disch | yes#yes | -2.257598 .8832373 -2.56 0.011 -3.990051 -.5251447 | _cons | 7.529392 .1445102 52.10 0.000 7.245938 7.812846 --------------------------------------------------------------------------------- . summ parity1,d /* note IQR = 3 - 0 = 3 */ parity1 ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of Wgt. 1,574 50% 1 Mean 1.729987 Largest Std. Dev. 1.493841 75% 3 6 90% 4 6 Variance 2.23156 95% 4 6 Skewness .5450922 99% 5 6 Kurtosis 2.315593 . /* coefficient for parity = 0.062 */ . end of do-file . log close name: log: c:\vhm812-data\l3a_model_build_II.txt log type: text closed on: 19 Jan 2016, 09:41:15 -----------------------------------------------------------------------------------------------------