----------------------------------------------------------------------------------------------------------------------------- name: log: c:\vhm812-data\l3a_model_build_II.txt log type: text opened on: 19 Jan 2023, 11:27:31 . set more off . . * open the DAISY Red dataset . use daisy2red1.dta, clear . . * Specifying maximum model . * no analyses . . * Causal model . * no analyses . . **# Reducing the Number of Predictors . * descriptive statistics . codebook cf vag_disch ----------------------------------------------------------------------------------------------------------------------------- cf Calving to first service interval ----------------------------------------------------------------------------------------------------------------------------- Type: Numeric (float) Range: [21,240] Units: 1 Unique values: 129 Missing .: 14/1,574 Mean: 71.3115 Std. dev.: 21.9039 Percentiles: 10% 25% 50% 75% 90% 51 59 67 78 96 ----------------------------------------------------------------------------------------------------------------------------- vag_disch Vaginal discharge observed ----------------------------------------------------------------------------------------------------------------------------- Type: Numeric (float) Label: noyes Range: [0,1] Units: 1 Unique values: 2 Missing .: 0/1,574 Tabulation: Freq. Numeric Label 1,492 0 no 82 1 yes . codebook milk120 parity herd_size dyst rp vag_disch, compact Variable Obs Unique Mean Min Max Label ----------------------------------------------------------------------------------------------------------------------------- milk120 1536 1486 3215.096 1110.2 5630.3 Milk volume (l) in first 120 days of lactation parity 1574 7 2.729987 1 7 Lactation number herd_size 1574 7 251.0076 125 333 Herd size dyst 1574 2 .0597205 0 1 Dystocia at calving rp 1574 2 .0946633 0 1 Retained placenta at calving vag_disch 1574 2 .0520966 0 1 Vaginal discharge observed ----------------------------------------------------------------------------------------------------------------------------- . . sum milk120 parity herd_size dyst rp vag_disch, detail Milk volume (l) in first 120 days of lactation ------------------------------------------------------------- Percentiles Smallest 1% 1698.5 1110.2 5% 2077.2 1397.7 10% 2298.2 1461.5 Obs 1,536 25% 2731.95 1467 Sum of wgt. 1,536 50% 3215.25 Mean 3215.096 Largest Std. dev. 698.1316 75% 3682.1 5181.8 90% 4080.7 5278 Variance 487387.7 95% 4403.3 5399.9 Skewness .1101838 99% 4904.4 5630.3 Kurtosis 2.845637 Lactation number ------------------------------------------------------------- Percentiles Smallest 1% 1 1 5% 1 1 10% 1 1 Obs 1,574 25% 1 1 Sum of wgt. 1,574 50% 2 Mean 2.729987 Largest Std. dev. 1.493841 75% 4 7 90% 5 7 Variance 2.23156 95% 5 7 Skewness .5450922 99% 6 7 Kurtosis 2.315593 Herd size ------------------------------------------------------------- Percentiles Smallest 1% 125 125 5% 125 125 10% 185 125 Obs 1,574 25% 201 125 Sum of wgt. 1,574 50% 263 Mean 251.0076 Largest Std. dev. 62.01692 75% 294 333 90% 333 333 Variance 3846.098 95% 333 333 Skewness -.3550929 99% 333 333 Kurtosis 2.256969 Dystocia at calving ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of wgt. 1,574 50% 0 Mean .0597205 Largest Std. dev. .2370435 75% 0 1 90% 0 1 Variance .0561896 95% 1 1 Skewness 3.715938 99% 1 1 Kurtosis 14.80819 Retained placenta at calving ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of wgt. 1,574 50% 0 Mean .0946633 Largest Std. dev. .2928423 75% 0 1 90% 0 1 Variance .0857566 95% 1 1 Skewness 2.769173 99% 1 1 Kurtosis 8.66832 Vaginal discharge observed ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of wgt. 1,574 50% 0 Mean .0520966 Largest Std. dev. .2222924 75% 0 1 90% 0 1 Variance .0494139 95% 1 1 Skewness 4.031139 99% 1 1 Kurtosis 17.25008 . . * correlation . corr milk120 parity herd_size (obs=1,536) | milk120 parity herd_s~e -------------+--------------------------- milk120 | 1.0000 parity | 0.3821 1.0000 herd_size | -0.0433 0.0356 1.0000 . pwcorr milk120 parity herd_size, obs star(0.05) | milk120 parity herd_s~e -------------+--------------------------- milk120 | 1.0000 | 1536 | parity | 0.3821* 1.0000 | 1536 1574 | herd_size | -0.0433 0.0386 1.0000 | 1536 1574 1574 | . tab rp dyst,chi exact Retained | placenta | Dystocia at calving at calving | no yes | Total -----------+----------------------+---------- no | 1,347 78 | 1,425 yes | 133 16 | 149 -----------+----------------------+---------- Total | 1,480 94 | 1,574 Pearson chi2(1) = 6.6580 Pr = 0.010 Fisher's exact = 0.017 1-sided Fisher's exact = 0.012 . tab vag_disch dyst, chi exact Vaginal | discharge | Dystocia at calving observed | no yes | Total -----------+----------------------+---------- no | 1,412 80 | 1,492 yes | 68 14 | 82 -----------+----------------------+---------- Total | 1,480 94 | 1,574 Pearson chi2(1) = 18.9847 Pr = 0.000 Fisher's exact = 0.000 1-sided Fisher's exact = 0.000 . tab vag_disch rp, chi exact Vaginal | Retained placenta at discharge | calving observed | no yes | Total -----------+----------------------+---------- no | 1,373 119 | 1,492 yes | 52 30 | 82 -----------+----------------------+---------- Total | 1,425 149 | 1,574 Pearson chi2(1) = 74.2346 Pr = 0.000 Fisher's exact = 0.000 1-sided Fisher's exact = 0.000 . . * indices . * no analyses . . * unconditional associations . reg cf parity Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 1.46 Model | 699.698494 1 699.698494 Prob > F = 0.2273 Residual | 747280.894 1,558 479.641139 R-squared = 0.0009 -------------+---------------------------------- Adj R-squared = 0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.901 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- parity | -.4480144 .3709323 -1.21 0.227 -1.175594 .2795649 _cons | 72.53554 1.155186 62.79 0.000 70.26965 74.80142 ------------------------------------------------------------------------------ . reg cf i.vag_disch Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 0.49 Model | 236.166881 1 236.166881 Prob > F = 0.4831 Residual | 747744.425 1,558 479.938656 R-squared = 0.0003 -------------+---------------------------------- Adj R-squared = -0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.908 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- vag_disch | yes | 1.743523 2.485484 0.70 0.483 -3.131724 6.61877 _cons | 71.21989 .5698436 124.98 0.000 70.10215 72.33763 ------------------------------------------------------------------------------ . . * principal components / factor analysis / correspondence analysis . * not covered . . **# Functional Form of predictors . * residual plots . reg cf milk120 Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(1, 1523) = 0.72 Model | 329.282002 1 329.282002 Prob > F = 0.3977 Residual | 700869.964 1,523 460.19039 R-squared = 0.0005 -------------+---------------------------------- Adj R-squared = -0.0002 Total | 701199.246 1,524 460.104492 Root MSE = 21.452 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120 | .0006667 .0007881 0.85 0.398 -.0008793 .0022126 _cons | 68.95612 2.595207 26.57 0.000 63.86556 74.04667 ------------------------------------------------------------------------------ . predict stdres, rsta (49 missing values generated) . . * lowess smoother . twoway (scatter stdres milk120) (lowess stdres milk120) . lowess stdres milk120 . * Detecting and Correcting for non-linearity (transformation of X) . * categorization of predictor . reg cf parity Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(1, 1558) = 1.46 Model | 699.698494 1 699.698494 Prob > F = 0.2273 Residual | 747280.894 1,558 479.641139 R-squared = 0.0009 -------------+---------------------------------- Adj R-squared = 0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.901 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- parity | -.4480144 .3709323 -1.21 0.227 -1.175594 .2795649 _cons | 72.53554 1.155186 62.79 0.000 70.26965 74.80142 ------------------------------------------------------------------------------ . egen parity_c6=cut(parity), at(0 1 2 3 4 5 6 15) icodes . tab parity parity_c6 Lactation | parity_c6 number | 1 2 3 4 5 6 | Total -----------+------------------------------------------------------------------+---------- 1 | 417 0 0 0 0 0 | 417 2 | 0 374 0 0 0 0 | 374 3 | 0 0 319 0 0 0 | 319 4 | 0 0 0 222 0 0 | 222 5 | 0 0 0 0 169 0 | 169 6 | 0 0 0 0 0 69 | 69 7 | 0 0 0 0 0 4 | 4 -----------+------------------------------------------------------------------+---------- Total | 417 374 319 222 169 73 | 1,574 . reg cf i.parity_c6 Source | SS df MS Number of obs = 1,560 -------------+---------------------------------- F(5, 1554) = 0.91 Model | 2174.44893 5 434.889785 Prob > F = 0.4761 Residual | 745806.143 1,554 479.926733 R-squared = 0.0029 -------------+---------------------------------- Adj R-squared = -0.0003 Total | 747980.592 1,559 479.782291 Root MSE = 21.907 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- parity_c6 | 2 | -.6468817 1.568168 -0.41 0.680 -3.722829 2.429066 3 | -3.181369 1.635853 -1.94 0.052 -6.390081 .027343 4 | -1.528531 1.831255 -0.83 0.404 -5.120523 2.063462 5 | -2.322236 2.004684 -1.16 0.247 -6.254406 1.609935 6 | -.9349232 2.781437 -0.34 0.737 -6.390688 4.520841 | _cons | 72.61985 1.077984 67.37 0.000 70.5054 74.73431 ------------------------------------------------------------------------------ . di "F-statistic = " ((747280.9-745806.1)/(1558-1554))/480 F-statistic = .768125 . di "P-value = " Ftail(4, 1554, ((747280.9-745806.1)/(1558-1554))/480) // R model is valid but term is NS P-value = .54594195 . . . *box cox did not work with raw need to use log(cf) . gen ln_cf=ln(cf) (14 missing values generated) . boxcox ln_cf milk120, model(rhs) Fitting full model Iteration 0: log likelihood = -183.07949 (not concave) Iteration 1: log likelihood = -180.48708 (not concave) Iteration 2: log likelihood = -180.09968 Iteration 3: log likelihood = -180.01808 Iteration 4: log likelihood = -180.01807 (not concave) Iteration 5: log likelihood = -180.01807 Iteration 6: log likelihood = -180.01806 Iteration 7: log likelihood = -180.01804 Iteration 8: log likelihood = -180.01803 (38 missing values generated) (38 missing values generated) (38 missing values generated) Number of obs = 1,525 LR chi2(2) = 7.91 Log likelihood = -180.01803 Prob > chi2 = 0.019 ------------------------------------------------------------------------------ ln_cf | Coefficient Std. err. z P>|z| [95% conf. interval] -------------+---------------------------------------------------------------- /lambda | -2.997824 .0059421 -504.50 0.000 -3.009471 -2.986178 ------------------------------------------------------------------------------ Estimates of scale-variant parameters ---------------------------- | Coefficient -------------+-------------- Notrans | _cons | -4.74e+08 -------------+-------------- Trans | milk120 | 1.42e+09 -------------+-------------- /sigma | .2722883 ---------------------------- --------------------------------------------------------- Test Restricted LR statistic H0: log likelihood chi2 Prob > chi2 --------------------------------------------------------- lambda = -1 -181.5923 3.15 0.076 lambda = 0 -182.42186 4.81 0.028 lambda = 1 -183.07949 6.12 0.013 --------------------------------------------------------- . . * quadratic function of X . reg ln_cf c.milk120k##c.milk120k Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 4.78 Model | .709133443 2 .354566722 Prob > F = 0.0085 Residual | 112.944101 1,522 .074207688 R-squared = 0.0062 -------------+---------------------------------- Adj R-squared = 0.0049 Total | 113.653234 1,524 .074575613 Root MSE = .27241 --------------------------------------------------------------------------------------- ln_cf | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------------+---------------------------------------------------------------- milk120k | .2056932 .0697547 2.95 0.003 .0688677 .3425186 | c.milk120k#c.milk120k | -.0295122 .0105961 -2.79 0.005 -.0502966 -.0087277 | _cons | 3.883482 .1122207 34.61 0.000 3.663358 4.103605 --------------------------------------------------------------------------------------- . estat vif Variable | VIF 1/VIF -------------+---------------------- milk120k | 48.58 0.020585 c.milk120k#| c.milk120k | 48.58 0.020585 -------------+---------------------- Mean VIF | 48.58 . vce, corr Correlation matrix of coefficients of regress model | c.m~120k# e(V) | milk120k c.m~120k _cons -------------+------------------------------ milk120k | 1.0000 c.milk120k#| c.milk120k | -0.9897 1.0000 _cons | -0.9872 0.9559 1.0000 . . * re-doing the analysis with milk120 centred . summ milk120k Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- milk120k | 1,536 3.215096 .6981316 1.1102 5.6303 . gen m120k_ct=(milk120k-r(mean)) /*r(mean) - memory variable created by summ command that store the mean of the variable*/ (38 missing values generated) . reg ln_cf c.m120k_ct##c.m120k_ct Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 4.78 Model | .709133424 2 .354566712 Prob > F = 0.0085 Residual | 112.944101 1,522 .074207688 R-squared = 0.0062 -------------+---------------------------------- Adj R-squared = 0.0049 Total | 113.653234 1,524 .074575613 Root MSE = .27241 --------------------------------------------------------------------------------------- ln_cf | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------------+---------------------------------------------------------------- m120k_ct | .0159242 .0100484 1.58 0.113 -.0037859 .0356344 | c.m120k_ct#c.m120k_ct | -.0295122 .0105961 -2.79 0.005 -.0502966 -.0087277 | _cons | 4.239742 .0086679 489.13 0.000 4.22274 4.256745 --------------------------------------------------------------------------------------- . estat vif Variable | VIF 1/VIF -------------+---------------------- m120k_ct | 1.01 0.992010 c.m120k_ct#| c.m120k_ct | 1.01 0.992010 -------------+---------------------- Mean VIF | 1.01 . vce, corr Correlation matrix of coefficients of regress model | c.m120~t# e(V) | m120k_ct c.m120~t _cons -------------+------------------------------ m120k_ct | 1.0000 c.m120k_ct#| c.m120k_ct | -0.0894 1.0000 _cons | 0.0494 -0.5936 1.0000 . . capture drop stdres . predict stdres, rsta (49 missing values generated) . twoway (scatter stdres m120k_ct) (lowess stdres m120k_ct) . . * fractional polynomials . fp , scale center replace: reg ln_cf (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------ | Test Residual Deviance milk120k | df Deviance std. dev. diff. P Powers -------------+---------------------------------------------------------------- omitted | 4 367.951 0.273 10.533 0.033 linear | 3 366.159 0.273 8.741 0.033 1 m = 1 | 2 361.396 0.273 3.978 0.138 -2 m = 2 | 0 357.418 0.272 0.000 -- -2 3 ------------------------------------------------------------------------------ Note: Test df is degrees of freedom, and P = P > F is sig. level for tests comparing models vs. model with m = 2 based on deviance difference, F(df, 1520). Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(2, 1522) = 5.27 Model | .782289941 2 .391144971 Prob > F = 0.0052 Residual | 112.870944 1,522 .074159622 R-squared = 0.0069 -------------+---------------------------------- Adj R-squared = 0.0056 Total | 113.653234 1,524 .074575613 Root MSE = .27232 ------------------------------------------------------------------------------ ln_cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120k_1 | -.5301266 .1654544 -3.20 0.001 -.8546694 -.2055839 milk120k_2 | -.0008516 .0004271 -1.99 0.046 -.0016894 -.0000138 _cons | 4.238434 .008295 510.96 0.000 4.222163 4.254704 ------------------------------------------------------------------------------ . fp plot, r(none) ytitle(Predicted Ln(cf)) . fp plot, r(residuals) ytitle(Predicted Ln(cf)) /*plot with residuals*/ . . *more predictors . fp , scale center replace: reg ln_cf parity rp (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------ | Test Residual Deviance milk120k | df Deviance std. dev. diff. P Powers -------------+---------------------------------------------------------------- omitted | 4 361.366 0.273 12.986 0.012 linear | 3 357.527 0.272 9.147 0.028 1 m = 1 | 2 351.183 0.272 2.803 0.248 -2 m = 2 | 0 348.380 0.272 0.000 -- -2 3 ------------------------------------------------------------------------------ Note: Test df is degrees of freedom, and P = P > F is sig. level for tests comparing models vs. model with m = 2 based on deviance difference, F(df, 1518). Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(4, 1520) = 4.91 Model | 1.44923978 4 .362309946 Prob > F = 0.0006 Residual | 112.203994 1,520 .073818417 R-squared = 0.0128 -------------+---------------------------------- Adj R-squared = 0.0102 Total | 113.653234 1,524 .074575613 Root MSE = .2717 ------------------------------------------------------------------------------ ln_cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120k_1 | -.5748179 .1665152 -3.45 0.001 -.9014417 -.2481941 milk120k_2 | -.0007211 .0004312 -1.67 0.095 -.001567 .0001247 parity | -.0097322 .005001 -1.95 0.052 -.0195417 .0000773 rp | .053428 .0236505 2.26 0.024 .007037 .099819 _cons | 4.260094 .0162315 262.46 0.000 4.228255 4.291932 ------------------------------------------------------------------------------ . fp , scale center replace: reg ln_cf milk120k_1 milk120k_2 rp (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------ | Test Residual Deviance parity | df Deviance std. dev. diff. P Powers -------------+---------------------------------------------------------------- omitted | 4 352.175 0.272 11.021 0.027 linear | 3 348.380 0.272 7.227 0.066 1 m = 1 | 2 344.414 0.271 3.261 0.198 -2 m = 2 | 0 341.154 0.271 0.000 -- 2 2 ------------------------------------------------------------------------------ Note: Test df is degrees of freedom, and P = P > F is sig. level for tests comparing models vs. model with m = 2 based on deviance difference, F(df, 1517). Source | SS df MS Number of obs = 1,525 -------------+---------------------------------- F(5, 1519) = 5.39 Model | 1.97968732 5 .395937465 Prob > F = 0.0001 Residual | 111.673547 1,519 .073517806 R-squared = 0.0174 -------------+---------------------------------- Adj R-squared = 0.0142 Total | 113.653234 1,524 .074575613 Root MSE = .27114 ------------------------------------------------------------------------------ ln_cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120k_1 | -.609147 .1666002 -3.66 0.000 -.9359378 -.2823563 milk120k_2 | -.0005881 .0004331 -1.36 0.175 -.0014377 .0002615 parity_1 | -.020564 .0064416 -3.19 0.001 -.0331993 -.0079287 parity_2 | .010903 .0035577 3.06 0.002 .0039244 .0178816 rp | .0572608 .0236445 2.42 0.016 .0108815 .1036401 _cons | 4.216664 .0107005 394.06 0.000 4.195674 4.237653 ------------------------------------------------------------------------------ . fp plot, r(none) ytitle(Predicted Ln(cf)) . fp plot, r(residuals) ytitle(Predicted Ln(cf)) /*plot with residuals*/ . . *or you can predict fitted values and standardized residuals and plot them . capture drop fit1 . capture drop stdres . predict fit1 (option xb assumed; fitted values) (49 missing values generated) . predict stdres, rstand (49 missing values generated) . twoway (scatter stdres milk120k) (lowess stdres milk120k) . . * interactions . * 2-way . reg wpc_sqrt c.hs_ct##c.hs_ct i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 19.94 Model | 1089.06861 7 155.581229 Prob > F = 0.0000 Residual | 12217.1982 1,566 7.80153144 R-squared = 0.0818 -------------+---------------------------------- Adj R-squared = 0.0777 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7931 --------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------+---------------------------------------------------------------- hs_ct | .0124973 .0012109 10.32 0.000 .0101222 .0148724 | c.hs_ct#c.hs_ct | .0000729 .0000174 4.20 0.000 .0000389 .0001069 | 1.aut_clv | -.5220441 .1417584 -3.68 0.000 -.8001004 -.2439877 | twin | yes | 1.533154 .5476246 2.80 0.005 .4589989 2.607308 | dyst | yes | .7636387 .3243731 2.35 0.019 .1273874 1.39989 | vag_disch | yes | .979344 .3502916 2.80 0.005 .2922541 1.666434 | dyst#vag_disch | yes#yes | -2.272293 .8833614 -2.57 0.010 -4.004989 -.5395972 | _cons | 7.631652 .1213388 62.90 0.000 7.393648 7.869655 --------------------------------------------------------------------------------- . . **# Selection criteria . * non-statistical . * no analyses . . * statistical - nested models . * Wald test . reg wpc_sqrt c.hs_ct##c.hs_ct i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 19.94 Model | 1089.06861 7 155.581229 Prob > F = 0.0000 Residual | 12217.1982 1,566 7.80153144 R-squared = 0.0818 -------------+---------------------------------- Adj R-squared = 0.0777 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7931 --------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------+---------------------------------------------------------------- hs_ct | .0124973 .0012109 10.32 0.000 .0101222 .0148724 | c.hs_ct#c.hs_ct | .0000729 .0000174 4.20 0.000 .0000389 .0001069 | 1.aut_clv | -.5220441 .1417584 -3.68 0.000 -.8001004 -.2439877 | twin | yes | 1.533154 .5476246 2.80 0.005 .4589989 2.607308 | dyst | yes | .7636387 .3243731 2.35 0.019 .1273874 1.39989 | vag_disch | yes | .979344 .3502916 2.80 0.005 .2922541 1.666434 | dyst#vag_disch | yes#yes | -2.272293 .8833614 -2.57 0.010 -4.004989 -.5395972 | _cons | 7.631652 .1213388 62.90 0.000 7.393648 7.869655 --------------------------------------------------------------------------------- . testparm i.dyst#i.vag_disch ( 1) 1.dyst#1.vag_disch = 0 F( 1, 1566) = 6.62 Prob > F = 0.0102 . . * AIC and BIC . * statistical non-nested - this in logistic regression . . **# Selection strategy . use coleman1.dta, clear . * forward selection . stepwise, pe(0.1): reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_teach x5_edu_mother Wald test, begin with empty model: p = 0.0000 < 0.1000, adding x3_ses p = 0.0566 < 0.1000, adding x4_test_teach Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coefficient Std. err. t P>|t| [95% conf. interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . * backward elimination . stepwise, pr(0.1): reg y_test_scr x1_staff_sal x2_father_job x3_ses /// > x4_test_teach x5_edu_mother Wald test, begin with full model: p = 0.4267 >= 0.1000, removing x2_father_job p = 0.6863 >= 0.1000, removing x5_edu_mother p = 0.1616 >= 0.1000, removing x1_staff_sal Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coefficient Std. err. t P>|t| [95% conf. interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . * stepwise selection . * backward . stepwise, pe(0.1) pr(0.11): reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_teach x5_edu_mother Wald test, begin with full model: p = 0.4267 >= 0.1100, removing x2_father_job p = 0.6863 >= 0.1100, removing x5_edu_mother p = 0.1616 >= 0.1100, removing x1_staff_sal Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coefficient Std. err. t P>|t| [95% conf. interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . * forward . stepwise, pe(0.1) pr(0.11) forward: reg y_test_scr x1_staff_sal x2_father_job x3_ses x4_test_teach x5_edu_mother Wald test, begin with empty model: p = 0.0000 < 0.1000, adding x3_ses p = 0.0566 < 0.1000, adding x4_test_teach Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------- y_test_scr | Coefficient Std. err. t P>|t| [95% conf. interval] --------------+---------------------------------------------------------------- x3_ses | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4_test_teach | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------- . . *Daisy dataset . use daisy2red1.dta, clear . *stepwise backward . stepwise, pe(0.05) pr(0.051) : reg wpc_sqrt parity1 aut_clv c.hs_ct##c.hs_ct i.dyst##i.twin i.rp##i.vag_disch note: 0b.dyst omitted because of estimability. note: 0b.twin omitted because of estimability. note: 0o.dyst#0o.twin omitted because of estimability. note: 0o.dyst#1o.twin omitted because of estimability. note: 1o.dyst#0o.twin omitted because of estimability. note: 0b.rp omitted because of estimability. note: 0b.vag_disch omitted because of estimability. note: 0o.rp#0o.vag_disch omitted because of estimability. note: 0o.rp#1o.vag_disch omitted because of estimability. note: 1o.rp#0o.vag_disch omitted because of estimability. Wald test, begin with full model: p = 0.9161 >= 0.0510, removing 1.vag_disch p = 0.2407 >= 0.0510, removing parity1 p = 0.1522 >= 0.0510, removing 1.rp p = 0.1062 >= 0.0510, removing 1.dyst#1.twin p = 0.0855 >= 0.0510, removing 1.dyst Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(5, 1568) = 27.80 Model | 1083.58379 5 216.716758 Prob > F = 0.0000 Residual | 12222.6831 1,568 7.79507848 R-squared = 0.0814 -------------+---------------------------------- Adj R-squared = 0.0785 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.792 --------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------+---------------------------------------------------------------- rp#vag_disch | yes#yes | 1.843187 .5192482 3.55 0.000 .8246926 2.861681 | aut_clv | -.5170909 .1415804 -3.65 0.000 -.7947978 -.2393839 hs_ct | .0120631 .0011988 10.06 0.000 .0097116 .0144146 | c.hs_ct#c.hs_ct | .000068 .0000173 3.94 0.000 .0000342 .0001018 | twin | yes | 1.508587 .547721 2.75 0.006 .4342445 2.58293 _cons | 7.689887 .1171432 65.65 0.000 7.460113 7.91966 --------------------------------------------------------------------------------- . *stepwise backward but keeping first term even if NS and keep all the terms in quadratic . stepwise, pe(0.05) pr(0.051) lockterm1: reg wpc_sqrt parity1 aut_clv (c.hs_ct##c.hs_ct) (i.dyst##i.twin) (i.rp##i.vag_disc > h) note: 0b.dyst omitted because of estimability. note: 0b.twin omitted because of estimability. note: 0b.dyst#0b.twin omitted because of estimability. note: 0b.dyst#1o.twin omitted because of estimability. note: 1o.dyst#0b.twin omitted because of estimability. note: 0b.rp omitted because of estimability. note: 0b.vag_disch omitted because of estimability. note: 0b.rp#0b.vag_disch omitted because of estimability. note: 0b.rp#1o.vag_disch omitted because of estimability. note: 1o.rp#0b.vag_disch omitted because of estimability. Wald test, begin with full model: p < 0.0510 for all terms in model Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(10, 1563) = 14.84 Model | 1153.69711 10 115.369711 Prob > F = 0.0000 Residual | 12152.5697 1,563 7.77515658 R-squared = 0.0867 -------------+---------------------------------- Adj R-squared = 0.0809 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7884 --------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------+---------------------------------------------------------------- parity1 | .0558659 .0480009 1.16 0.245 -.0382871 .150019 aut_clv | -.5128308 .1418525 -3.62 0.000 -.7910719 -.2345896 hs_ct | .012328 .0012094 10.19 0.000 .0099559 .0147002 | c.hs_ct#c.hs_ct | .0000713 .0000174 4.10 0.000 .0000372 .0001054 | dyst | yes | .627718 .3100054 2.02 0.043 .0196477 1.235788 | twin | yes | 1.698238 .5842331 2.91 0.004 .552275 2.844201 | dyst#twin | yes#yes | -2.772795 1.739103 -1.59 0.111 -6.184016 .6384255 | rp | yes | .3904235 .2689735 1.45 0.147 -.1371634 .9180105 | vag_disch | yes | -.0422026 .4007079 -0.11 0.916 -.8281844 .7437792 | rp#vag_disch | yes#yes | 1.476058 .6996667 2.11 0.035 .1036735 2.848442 | _cons | 7.515274 .1472404 51.04 0.000 7.226465 7.804084 --------------------------------------------------------------------------------- . estimates store sw_1 . *now assessing variables from interactions . stepwise, pe(0.05) pr(0.051) lockterm1: reg wpc_sqrt parity1 aut_clv (c.hs_ct##c.hs_ct) i.dyst##i.twin i.rp##i.vag_disch note: 0b.dyst omitted because of estimability. note: 0b.twin omitted because of estimability. note: 0o.dyst#0o.twin omitted because of estimability. note: 0o.dyst#1o.twin omitted because of estimability. note: 1o.dyst#0o.twin omitted because of estimability. note: 0b.rp omitted because of estimability. note: 0b.vag_disch omitted because of estimability. note: 0o.rp#0o.vag_disch omitted because of estimability. note: 0o.rp#1o.vag_disch omitted because of estimability. note: 1o.rp#0o.vag_disch omitted because of estimability. Wald test, begin with full model: p = 0.9161 >= 0.0510, removing 1.vag_disch p = 0.1434 >= 0.0510, removing 1.rp p = 0.1136 >= 0.0510, removing 1.dyst#1.twin p = 0.0616 >= 0.0510, removing 1.dyst Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(6, 1567) = 23.31 Model | 1090.25318 6 181.708864 Prob > F = 0.0000 Residual | 12216.0137 1,567 7.79579685 R-squared = 0.0819 -------------+---------------------------------- Adj R-squared = 0.0784 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7921 --------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------+---------------------------------------------------------------- parity1 | .0438808 .0474418 0.92 0.355 -.0491753 .1369369 aut_clv | -.5244014 .1418074 -3.70 0.000 -.8025536 -.2462491 hs_ct | .0119939 .0012012 9.98 0.000 .0096377 .0143501 | c.hs_ct#c.hs_ct | .0000669 .0000173 3.87 0.000 .000033 .0001009 | rp#vag_disch | yes#yes | 1.831729 .5194199 3.53 0.000 .8128976 2.85056 | twin | yes | 1.478791 .5486927 2.70 0.007 .4025419 2.55504 _cons | 7.622191 .1381318 55.18 0.000 7.351249 7.893134 --------------------------------------------------------------------------------- . estimates store sw_2 . *since dyst##twin is NS, then we can ignore brackets. . stepwise, pe(0.05) pr(0.051) lockterm1: reg wpc_sqrt parity1 (c.hs_ct##c.hs_ct) aut_clv i.dys##i.twin (i.rp##i.vag_disch) note: 0b.dyst omitted because of estimability. note: 0b.twin omitted because of estimability. note: 0o.dyst#0o.twin omitted because of estimability. note: 0o.dyst#1o.twin omitted because of estimability. note: 1o.dyst#0o.twin omitted because of estimability. note: 0b.rp omitted because of estimability. note: 0b.vag_disch omitted because of estimability. note: 0b.rp#0b.vag_disch omitted because of estimability. note: 0b.rp#1o.vag_disch omitted because of estimability. note: 1o.rp#0b.vag_disch omitted because of estimability. Wald test, begin with full model: p = 0.1111 >= 0.0510, removing 1.dyst#1.twin p = 0.0761 >= 0.0510, removing 1.dyst Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.79 Model | 1109.40797 8 138.675997 Prob > F = 0.0000 Residual | 12196.8589 1,565 7.79352004 R-squared = 0.0834 -------------+---------------------------------- Adj R-squared = 0.0787 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7917 --------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------+---------------------------------------------------------------- parity1 | .0472026 .0476238 0.99 0.322 -.0462106 .1406157 hs_ct | .0120557 .0012027 10.02 0.000 .0096965 .0144148 | c.hs_ct#c.hs_ct | .0000686 .0000174 3.95 0.000 .0000346 .0001027 | aut_clv | -.5142217 .1420186 -3.62 0.000 -.7927884 -.235655 | rp | yes | .4207971 .2687102 1.57 0.118 -.1062728 .9478671 | vag_disch | yes | .0711392 .3979339 0.18 0.858 -.7094005 .8516789 | rp#vag_disch | yes#yes | 1.384039 .6978284 1.98 0.048 .0152618 2.752816 | twin | yes | 1.40634 .5508343 2.55 0.011 .3258891 2.486791 _cons | 7.571339 .144152 52.52 0.000 7.288588 7.854091 --------------------------------------------------------------------------------- . estimates store sw_3 . *compare the different model results . estimates table sw_1 sw_2 sw_3, star(0.10 0.05 0.001) vsquish -------------------------------------------------------------- Variable | sw_1 sw_2 sw_3 -------------+------------------------------------------------ parity1 | .05586593 .04388077 .04720258 aut_clv | -.51283075*** -.52440137*** -.51422168*** hs_ct | .01232803*** .0119939*** .01205566*** c.hs_ct#| c.hs_ct | .0000713*** .00006694*** .00006864*** 1.dyst | .62771805** twin | yes | 1.6982381** 1.4787911** 1.4063402** dyst#twin | 1#yes | -2.7727951 rp | yes | .39042354 .42079714 vag_disch | yes | -.04220258 .07113917 rp#vag_disch | yes#yes | 1.4760578** 1.8317287*** 1.3840389** _cons | 7.5152743*** 7.6221914*** 7.5713394*** -------------------------------------------------------------- Legend: * p<.1; ** p<.05; *** p<.001 . . **# Reliability . **leave-one-out . ***computing "n" regression models without obs "i" using residual and leverage values . ***and then calculating PRESS (predicted residual sum of squares) . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.67 Model | 1102.29212 8 137.786515 Prob > F = 0.0000 Residual | 12203.9747 1,565 7.79806691 R-squared = 0.0828 -------------+---------------------------------- Adj R-squared = 0.0782 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7925 --------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------+---------------------------------------------------------------- hs_ct | .0124288 .0012117 10.26 0.000 .010052 .0148056 | c.hs_ct#c.hs_ct | .0000717 .0000174 4.13 0.000 .0000377 .0001058 | parity1 | .0624408 .04795 1.30 0.193 -.0316122 .1564939 1.aut_clv | -.5313957 .1419088 -3.74 0.000 -.8097471 -.2530443 | twin | yes | 1.484421 .5487805 2.70 0.007 .407999 2.560844 | dyst | yes | .8159669 .3267812 2.50 0.013 .1749918 1.456942 | vag_disch | yes | .9922199 .3503534 2.83 0.005 .3050084 1.679431 | dyst#vag_disch | yes#yes | -2.257598 .8832373 -2.56 0.011 -3.990051 -.5251447 | _cons | 7.529392 .1445102 52.10 0.000 7.245938 7.812846 --------------------------------------------------------------------------------- . capture drop res lev eq1 . predict res, res . predict lev ,lev . gen eq1=(res/(1-lev))^2 . summ eq1 Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- eq1 | 1,574 7.851331 10.9765 1.85e-07 83.92817 . di "PRESS = " r(sum) //error sum square from predicted points PRESS = 12357.995 . scalar PRESS = r(sum) . . di "R2(pred) = "1-(PRESS/ (e(mss) + e(rss))) R2(pred) = .07126506 . di "R2 full model = " e(r2) R2 full model = .08284007 . . *there is a command you can install called loocv . loocv reg wpc_sqrt c.hs_ct##c.hs_ct parity1 i.aut_clv i.twin i.dyst##i.vag_disch //*Pseudo R2 is very similar to R2(pred) Leave-One-Out Cross-Validation Results ----------------------------------------- Method | Value -------------------------+--------------- Root Mean Squared Errors | 2.8020227 Mean Absolute Errors | 2.2761307 Pseudo-R2 | .07168424 ----------------------------------------- . . **# Presenting results . * standardize coefficients . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.67 Model | 1102.29212 8 137.786515 Prob > F = 0.0000 Residual | 12203.9747 1,565 7.79806691 R-squared = 0.0828 -------------+---------------------------------- Adj R-squared = 0.0782 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7925 --------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------+---------------------------------------------------------------- hs_ct | .0124288 .0012117 10.26 0.000 .010052 .0148056 | c.hs_ct#c.hs_ct | .0000717 .0000174 4.13 0.000 .0000377 .0001058 | parity1 | .0624408 .04795 1.30 0.193 -.0316122 .1564939 1.aut_clv | -.5313957 .1419088 -3.74 0.000 -.8097471 -.2530443 | twin | yes | 1.484421 .5487805 2.70 0.007 .407999 2.560844 | dyst | yes | .8159669 .3267812 2.50 0.013 .1749918 1.456942 | vag_disch | yes | .9922199 .3503534 2.83 0.005 .3050084 1.679431 | dyst#vag_disch | yes#yes | -2.257598 .8832373 -2.56 0.011 -3.990051 -.5251447 | _cons | 7.529392 .1445102 52.10 0.000 7.245938 7.812846 --------------------------------------------------------------------------------- . * rescale coefficient by SD of the predictor . matrix define coeff = e(b)' . capture drop coeff* . svmat coeff . summ hs_ct parity1 aut_clv twin dyst vag_disch Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- hs_ct | 1,574 .0076239 62.01692 -126 82 parity1 | 1,574 1.729987 1.493841 0 6 aut_clv | 1,574 .4720457 .4993766 0 1 twin | 1,574 .0171537 .1298854 0 1 dyst | 1,574 .0597205 .2370435 0 1 -------------+--------------------------------------------------------- vag_disch | 1,574 .0520966 .2222924 0 1 . . * iqr . reg wpc_sqrt c.hs_ct##c.hs_ct parity1 i.aut_clv i.twin i.dyst##i.vag_disch Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(8, 1565) = 17.67 Model | 1102.29212 8 137.786515 Prob > F = 0.0000 Residual | 12203.9747 1,565 7.79806691 R-squared = 0.0828 -------------+---------------------------------- Adj R-squared = 0.0782 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7925 --------------------------------------------------------------------------------- wpc_sqrt | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------+---------------------------------------------------------------- hs_ct | .0124288 .0012117 10.26 0.000 .010052 .0148056 | c.hs_ct#c.hs_ct | .0000717 .0000174 4.13 0.000 .0000377 .0001058 | parity1 | .0624408 .04795 1.30 0.193 -.0316122 .1564939 1.aut_clv | -.5313957 .1419088 -3.74 0.000 -.8097471 -.2530443 | twin | yes | 1.484421 .5487805 2.70 0.007 .407999 2.560844 | dyst | yes | .8159669 .3267812 2.50 0.013 .1749918 1.456942 | vag_disch | yes | .9922199 .3503534 2.83 0.005 .3050084 1.679431 | dyst#vag_disch | yes#yes | -2.257598 .8832373 -2.56 0.011 -3.990051 -.5251447 | _cons | 7.529392 .1445102 52.10 0.000 7.245938 7.812846 --------------------------------------------------------------------------------- . summ parity1,d /* note IQR = 3 - 0 = 3 */ parity1 ------------------------------------------------------------- Percentiles Smallest 1% 0 0 5% 0 0 10% 0 0 Obs 1,574 25% 0 0 Sum of wgt. 1,574 50% 1 Mean 1.729987 Largest Std. dev. 1.493841 75% 3 6 90% 4 6 Variance 2.23156 95% 4 6 Skewness .5450922 99% 5 6 Kurtosis 2.315593 . /* coefficient for parity = 0.062 */ . . log close name: log: c:\vhm812-data\l3a_model_build_II.txt log type: text closed on: 19 Jan 2023, 11:27:54 -----------------------------------------------------------------------------------------------------------------------------