. * do-file for lecture 8 of VHM 802, Winter 2023
. version 17 /* works also with versions 14-16 */

. set more off

. cd "r:\"
r:\

. 
. * Manly Example 1.4
. import delimited prehist_dog.csv, clear
(encoding automatically selected: ISO-8859-1)
(7 vars, 7 obs)

. foreach var of varlist mandbreadth-mol14length {
  2.     egen `var's=std(`var')
  3. }

. matrix dissim distdogs=mandbreadths-mol14lengths, names(group)

. matrix list distdogs /* for display only */

symmetric distdogs[7,7]
                Modern_dog  Golden_jac~l  Chinese_wolf   Indian_wolf          Cuon
  Modern_dog             0
Golden_jac~l     1.9122856             0
Chinese_wolf     5.3814325     7.1194854             0
 Indian_wolf      3.384939     5.0583902      2.138615             0
        Cuon     1.5118597     3.1902631     4.5733228     2.9082499             0
       Dingo     1.5590078     3.1828699     4.2134122     2.1966026      1.668084
Prehistori~g     .66324019     2.3850655     5.1183682     3.2364901     1.2643167

                     Dingo  Prehistori~g
       Dingo             0
Prehistori~g     1.7087187             0

. 
. * GM salmon farms example (note: datafile not included in course material)
. import delimited SiteID.csv, varnames(1) clear
(encoding automatically selected: ISO-8859-1)
(10 vars, 22 obs)

. tostring siteid, gen(sitelabel)
sitelabel generated as str3

. gen xcoordkm=xcoord/1000

. gen ycoordkm=ycoord/1000

. matrix dissim dist3b=xcoordkm ycoordkm if bma=="3b", names(sitelabel)

. matrix list dist3b /* for display only */

symmetric dist3b[9,9]
           413        270        491        408          3        403        292        202
413          0
270  2.7424557          0
491  3.6322437  1.2594863          0
408  5.4799988  2.9891526  1.8518269          0
  3  4.0536758  1.5124683  1.9809889   2.880682          0
403  6.0276281  3.3049781   2.561918    1.38606  2.5115672          0
292  3.9370729  2.1878803   3.128847  4.2322251   1.351844  3.7642963          0
202  4.7494548  2.8385811  3.5919654  4.3820357  1.6281234  3.6601487   .8124146          0
303  7.4538805  4.7960454   4.558108  3.7784119  3.4036518  2.3926801  4.0333762  3.4606713

           303
303          0

. mds xcoordkm ycoordkm, id(mfsite) noplot config

Classical metric multidimensional scaling
    Dissimilarity: L2, computed on 2 variables

                                                Number of obs         =      22
    Eigenvalues > 0      =         2            Mardia fit measure 1  =  1.0000
    Retained dimensions  =         2            Mardia fit measure 2  =  1.0000

    --------------------------------------------------------------------------
                 |                 abs(eigenvalue)          (eigenvalue)^2
       Dimension |   Eigenvalue    Percent    Cumul.       Percent    Cumul.
    -------------+------------------------------------------------------------
               1 |      753.711      78.98     78.98         93.38     93.38
               2 |    200.63502      21.02    100.00          6.62    100.00
    --------------------------------------------------------------------------

Configuration in 2-dimensional Euclidean space (principal normalization)

    mfsite |       dim1        dim2
    -------+----------------------------
       413 |     9.9788      2.6207
       270 |     7.4653      1.5235
       491 |     7.1889      0.2947
       408 |     5.8679     -1.0030
         3 |     5.9943      1.8749
       403 |     4.6904     -0.2717
       292 |     6.0881      3.2235
       202 |     5.2841      3.3400
       303 |     2.6931      1.0458
       316 |     1.5475     -4.2355
       381 |     1.5656     -5.3730
       416 |     0.2655     -7.1537
       503 |    -2.9868     -3.4891
       172 |    -3.0132     -1.8554
       300 |    -4.3388     -1.8283
       298 |    -5.4792     -2.0851
       282 |    -5.3610     -0.7321
       349 |    -6.7580      2.4705
         2 |    -7.2979      2.2681
       350 |    -7.7868      2.0254
       368 |    -7.4507      3.7485
       213 |    -8.1572      3.5911
    ------------------------------------

. mdsconfig, title("Euclidean distance") xneg msize(vsmall) mlabsize(vsmall)

. * should fit perfectly to map
. mds xcoordkm ycoordkm, id(mfsite) method(modern) noplot config
(loss(stress) assumed)
(transform(identity) assumed)

Iteration 1: stress = 2.558e-08
Iteration 2: stress = 4.604e-08
Iteration 3: stress = 4.604e-08

Modern multidimensional scaling
    Dissimilarity: L2, computed on 2 variables

    Loss criterion: stress = raw_stress/norm(distances)
    Transformation: identity (no transformation)

                                                Number of obs     =         22
                                                Dimensions        =          2
    Normalization: principal                    Loss criterion    =     0.0000

Configuration in 2-dimensional Euclidean space (principal normalization)

    mfsite |       dim1        dim2
    -------+----------------------------
       413 |     9.9788      2.6207
       270 |     7.4653      1.5235
       491 |     7.1889      0.2947
       408 |     5.8679     -1.0030
         3 |     5.9943      1.8749
       403 |     4.6904     -0.2717
       292 |     6.0881      3.2235
       202 |     5.2841      3.3400
       303 |     2.6931      1.0458
       316 |     1.5475     -4.2355
       381 |     1.5656     -5.3730
       416 |     0.2655     -7.1537
       503 |    -2.9868     -3.4891
       172 |    -3.0132     -1.8554
       300 |    -4.3388     -1.8283
       298 |    -5.4792     -2.0851
       282 |    -5.3610     -0.7321
       349 |    -6.7580      2.4705
         2 |    -7.2979      2.2681
       350 |    -7.7868      2.0254
       368 |    -7.4507      3.7485
       213 |    -8.1572      3.5911
    ------------------------------------

. * identical to classical
. 
. import delimited seawaydist.csv, varnames(nonames) clear
(encoding automatically selected: UTF-8)
(23 vars, 22 obs)

. * note: file not included in course material
. foreach var of varlist v2-v23 {
  2.     replace `var'=`var'/1000
  3. }
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)
(21 real changes made)

. mkmat v2-v10 if _n<=9, matrix(swdist3b) rownames(v1)

. matrix list swdist3b /* for display only */

symmetric swdist3b[9,9]
            v2         v3         v4         v5         v6         v7         v8         v9
413          0
270  2.9428082          0
491  3.8137903  2.0781112          0
408  5.6861115  3.9504321  1.8723209          0
  3  4.2092361  1.6177874  3.6958985  4.5911093          0
403   6.505888  4.7702093  2.6920981  1.4990714  3.2907383          0
292  4.1734977  2.2920368  4.2707977  5.3489909  1.3874694  4.0486197          0
202  5.0355039  2.8891096  4.9672208  5.1908436  1.7680725  3.8904724  .86200643          0
303  7.8432469   5.158102  5.1069574  4.0076265  3.6340106  2.5085552  4.3256197  3.6041567

           v10
303          0

. mkmat v2-v23, matrix(swdist) rownames(v1)

. mdsmat swdist, noplot config
(row names of (dis)similarity matrix differ from column names; row names used)

Classical metric multidimensional scaling
    Dissimilarity matrix: swdist

                                                Number of obs         =      22
    Eigenvalues > 0      =        11            Mardia fit measure 1  =  0.9459
    Retained dimensions  =         2            Mardia fit measure 2  =  0.9993

    --------------------------------------------------------------------------
                 |                 abs(eigenvalue)          (eigenvalue)^2
       Dimension |   Eigenvalue    Percent    Cumul.       Percent    Cumul.
    -------------+------------------------------------------------------------
               1 |    1225.9449      84.87     84.87         98.63     98.63
               2 |    140.44698       9.72     94.59          1.29     99.93
    -------------+------------------------------------------------------------
               3 |    25.567829       1.77     96.36          0.04     99.97
               4 |    7.4553018       0.52     96.88          0.00     99.98
               5 |    3.9408861       0.27     97.15          0.00     99.98
               6 |    2.1320984       0.15     97.30          0.00     99.98
               7 |    1.1718394       0.08     97.38          0.00     99.98
               8 |    .74007055       0.05     97.43          0.00     99.98
               9 |    .61601169       0.04     97.48          0.00     99.98
              10 |    .16566741       0.01     97.49          0.00     99.98
    --------------------------------------------------------------------------

Configuration in 2-dimensional Euclidean space (principal normalization)

    Category |       dim1        dim2
    ---------+----------------------------
         413 |    12.1679      1.1285
         270 |     9.4043      1.6670
         491 |     8.1791     -0.2965
         408 |     6.2436     -0.8938
           3 |     7.8249      2.2489
         403 |     5.5591      0.0140
         292 |     8.4480      2.6563
         202 |     7.6594      3.0000
         303 |     4.2300      1.4313
         316 |     1.4446     -3.3935
         381 |     1.5553     -4.8945
         416 |     0.7963     -7.7010
         503 |    -3.1484     -2.1369
         172 |    -3.0866     -0.0480
         300 |    -4.4055     -0.2419
         298 |    -5.3199     -0.6135
         282 |    -5.9169      0.2674
         349 |    -9.5936      1.3940
           2 |    -9.6639      1.4186
         350 |    -9.8558      1.2778
         368 |   -11.1549      1.8270
         213 |   -11.3671      1.8888
    --------------------------------------

. mdsconfig , title("Sea-way distance") xneg aspect(0.6666667) msize(vsmall) mlabsize(vsmall)
Aspect ratio: .66666667

. mdsmat swdist, method(modern) noplot config
(loss(stress) assumed)
(transform(identity) assumed)
(row names of (dis)similarity matrix differ from column names; row names used)

Iteration 1: stress = .0214777
Iteration 2: stress = .01978982
Iteration 3: stress = .0192756
Iteration 4: stress = .01904175
Iteration 5: stress = .01889413
Iteration 6: stress = .01880924
Iteration 7: stress = .01876336
Iteration 8: stress = .01873538
Iteration 9: stress = .01871292
Iteration 10: stress = .01869033
Iteration 11: stress = .01866698
Iteration 12: stress = .01864504
Iteration 13: stress = .01862685
Iteration 14: stress = .01861303
Iteration 15: stress = .01860323
Iteration 16: stress = .01859652
Iteration 17: stress = .01859208
Iteration 18: stress = .01858917
Iteration 19: stress = .0185873
Iteration 20: stress = .0185861
Iteration 21: stress = .01858534
Iteration 22: stress = .01858486
Iteration 23: stress = .01858456
Iteration 24: stress = .01858437
Iteration 25: stress = .01858425
Iteration 26: stress = .01858417
Iteration 27: stress = .01858412
Iteration 28: stress = .01858409
Iteration 29: stress = .01858407
Iteration 30: stress = .01858406
Iteration 31: stress = .01858405
Iteration 32: stress = .01858404
Iteration 33: stress = .01858404
Iteration 34: stress = .01858403
Iteration 35: stress = .01858403

Modern multidimensional scaling
    Dissimilarity matrix: swdist

    Loss criterion: stress = raw_stress/norm(distances)
    Transformation: identity (no transformation)

                                                Number of obs     =         22
                                                Dimensions        =          2
    Normalization: principal                    Loss criterion    =     0.0186

Configuration in 2-dimensional Euclidean space (principal normalization)

    Category |       dim1        dim2
    ---------+----------------------------
         413 |    12.0525      1.0557
         270 |     9.4446      1.4746
         491 |     8.4091     -0.8352
         408 |     6.4674     -1.3902
           3 |     7.8320      2.3218
         403 |     5.6205     -0.1016
         292 |     8.3032      3.2439
         202 |     7.4749      3.5779
         303 |     4.1751      1.6747
         316 |     1.4796     -3.5073
         381 |     1.5513     -4.8794
         416 |     0.7155     -7.3568
         503 |    -3.1254     -2.2246
         172 |    -3.0756     -0.1973
         300 |    -4.4053     -0.2273
         298 |    -5.3954     -0.8517
         282 |    -5.9179      0.3588
         349 |    -9.5146      1.7903
           2 |    -9.6882      1.3606
         350 |    -9.9310      0.8963
         368 |   -11.0765      2.1290
         213 |   -11.3960      1.6875
    --------------------------------------

. * not the same as classical, but visually almost indistinguishable
. mdsconfig , title("Sea-way distance") xneg aspect(0.6666667)
Aspect ratio: .66666667

. 
. mkmat v2-v6 if _n<=5, matrix(swdist15) rownames(v1)

. matrix list swdist15 /* just for display */

symmetric swdist15[5,5]
            v2         v3         v4         v5         v6
413          0
270  2.9428082          0
491  3.8137903  2.0781112          0
408  5.6861115  3.9504321  1.8723209          0
  3  4.2092361  1.6177874  3.6958985  4.5911093          0

. keep if _n<=5
(17 observations deleted)

. clustermat single swdist15, name(single15) add

. cluster dendrogram single15, label(v1) title("GM Farms 1-5: Single linkage") ytitle("L2 dissimilarity (km)")

. clustermat average swdist15, name(aver15) add

. cluster dendrogram aver15, label(v1) title("GM Farms 1-5: Average linkage") ytitle("L2 dissimilarity (km)")

. clustermat complete swdist15, name(compl15) add

. cluster dendrogram compl15, label(v1) title("GM Farms 1-5: Complete linkage") ytitle("L2 dissimilarity (km)")

. 
. clear

. clear matrix

. set maxvar 7500

. set matsize 7000
set matsize ignored.
    Matrix sizes are no longer limited by c(matsize) in modern Statas.
    Matrix sizes are now limited by edition of Stata. See limits for more details.

. import delimited nciarrayxp.csv, varnames(1) clear
(encoding automatically selected: ISO-8859-9)
(6,832 vars, 64 obs)

. matrix dissim dist1_9=gene1-gene6830 if cell<10, names(label)

. matrix list dist1_9 /* for display only */

symmetric dist1_9[9,9]
              CNS        CNS        CNS      RENAL     BREAST        CNS        CNS     BREAST
   CNS          0
   CNS  51.438231          0
   CNS  65.938146  68.987251          0
 RENAL  79.878855   81.73831  71.744706          0
BREAST  92.651851  95.777651  79.014321   78.90728          0
   CNS  80.365441   84.32319  74.004657  80.641661  80.085019          0
   CNS  81.031953  80.899191  78.306562  88.683903  82.804145  77.959792          0
BREAST  80.144905  82.386589  74.906743  83.044013  83.089282  77.068376  80.119854          0
 NSCLC   74.17706  76.613591  67.837893  86.702055   83.69203  76.654123  74.176647   76.88027

            NSCLC
 NSCLC          0

. mds gene1-gene6830, id(label) noplot config
(id() has duplicate values)

Classical metric multidimensional scaling
    Dissimilarity: L2, computed on 6830 variables

                                                Number of obs         =      64
    Eigenvalues > 0      =        63            Mardia fit measure 1  =  0.2319
    Retained dimensions  =         2            Mardia fit measure 2  =  0.6277

    --------------------------------------------------------------------------
                 |                 abs(eigenvalue)          (eigenvalue)^2
       Dimension |   Eigenvalue    Percent    Cumul.       Percent    Cumul.
    -------------+------------------------------------------------------------
               1 |    39892.582      14.89     14.89         47.89     47.89
               2 |    22234.452       8.30     23.19         14.88     62.77
    -------------+------------------------------------------------------------
               3 |     17634.89       6.58     29.78          9.36     72.13
               4 |     11534.23       4.31     34.08          4.00     76.14
               5 |    10304.109       3.85     37.93          3.20     79.33
               6 |    9393.0973       3.51     41.44          2.66     81.99
               7 |    7704.1579       2.88     44.31          1.79     83.77
               8 |    7546.8461       2.82     47.13          1.71     85.49
               9 |     7067.195       2.64     49.77          1.50     86.99
              10 |    5777.7786       2.16     51.93          1.00     88.00
    --------------------------------------------------------------------------

Configuration in 2-dimensional Euclidean space (principal normalization)

          label |       dim1        dim2
    ------------+----------------------------
          CNS#1 |    19.7958      0.1153
          CNS#2 |    21.5461     -1.4574
          CNS#3 |    25.0566      1.5261
        RENAL#4 |    37.4095    -11.3895
       BREAST#5 |    50.2186     -1.3462
          CNS#6 |    26.4352      0.4630
          CNS#7 |    27.3393      2.6503
       BREAST#8 |    21.4897      4.9541
        NSCLC#9 |    20.8525     10.1631
       NSCLC#10 |    26.9529     21.4733
       RENAL#11 |    24.4467      9.8421
       RENAL#12 |    35.0750      6.2621
       RENAL#13 |    21.4832     10.6705
       RENAL#14 |    25.0047     11.9464
       RENAL#15 |    31.7457     14.9829
       RENAL#16 |    24.2373     14.7480
       RENAL#17 |    20.5030     16.1818
      BREAST#18 |    11.9857      8.0623
       NSCLC#19 |    24.3442      3.7106
       RENAL#20 |    14.3074      4.0258
        UNKNOWN |    11.6961     11.5400
     OVARIAN#22 |    17.5513     16.6590
    MELANOMA#23 |    10.1589     -1.5009
    PROSTATE#24 |    -2.6097      7.0694
     OVARIAN#25 |    -4.2736     14.7402
     OVARIAN#26 |     7.0035     14.7513
     OVARIAN#27 |    -2.1755     10.0690
     OVARIAN#28 |    10.3907     17.2134
     OVARIAN#29 |     4.1056     12.8337
    PROSTATE#30 |     3.1072     13.5042
       NSCLC#31 |    13.9201     10.7706
       NSCLC#32 |     7.3409     11.8635
       NSCLC#33 |     5.0320      1.6785
    LEUKEMIA#34 |   -21.3039     -6.0889
    K562B-repro |   -38.9220     -3.6352
    K562A-repro |   -46.4411     -4.8195
    LEUKEMIA#37 |   -48.5138     -7.0614
    LEUKEMIA#38 |   -41.8787     -6.9915
    LEUKEMIA#39 |   -56.9227     -9.5122
    LEUKEMIA#40 |   -50.9593     -8.1774
    LEUKEMIA#41 |   -26.0316     -8.7691
       COLON#42 |   -11.8181      8.0596
       COLON#43 |   -31.6548      6.5893
       COLON#44 |   -26.3336     15.0840
       COLON#45 |   -20.5681     18.2511
       COLON#46 |   -21.5362     13.4346
       COLON#47 |   -34.6698     13.2857
       COLON#48 |   -18.6058     17.0725
    MCF7A-repro |   -38.1944     13.3674
      BREAST#50 |   -38.1554     11.5925
    MCF7D-repro |   -30.8931     12.9394
      BREAST#52 |   -21.5858      9.8917
       NSCLC#53 |    -3.1107     17.6582
       NSCLC#54 |    -9.1605      1.4605
       NSCLC#55 |   -14.8789      0.6310
    MELANOMA#56 |    -5.2730    -45.4664
      BREAST#57 |     3.3198    -46.3001
      BREAST#58 |    -1.3213    -51.3627
    MELANOMA#59 |    11.0873    -37.6789
    MELANOMA#60 |    15.4461    -44.1646
    MELANOMA#61 |     1.9254    -35.3286
    MELANOMA#62 |    14.3596    -33.2917
    MELANOMA#63 |    12.7401    -45.2229
    MELANOMA#64 |     8.3778    -34.2232
    -----------------------------------------

. mdsconfig , title("Euclidean distance") maxlength(2)
(beware: point labels not unique)

. cluster average gene1-gene6830, name(aver_cell)

. cluster dendrogram aver_cell, label(label) xlabel(, angle(90) labsize(*.5)) title("")

. 
. * Iris data
. import delimited iris.csv, clear
(encoding automatically selected: ISO-8859-1)
(5 vars, 150 obs)

. set seed 210310

. forval i=1(1)10 {
  2.     cluster kmeans sepallength-petalwidth, k(3) start(krandom) name(iris3m`i')
  3.     tab species iris3m`i', row
  4.     scalar wss=0
  5.     scalar mss=0
  6.     foreach var of varlist sepallength-petalwidth {
  7.         quietly anova `var' iris3m`i'
  8.         scalar wss=wss+e(rss)
  9.         scalar mss=mss+e(mss)
 10.     }
 11.     di "Within-cluster sum-of-squares: " wss " percent of total: " wss/(wss+mss)*100
 12. }

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |        32         18          0 |        50
           |     64.00      36.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |         0          4         46 |        50
           |      0.00       8.00      92.00 |    100.00
-----------+---------------------------------+----------
 virginica |         0          0         50 |        50
           |      0.00       0.00     100.00 |    100.00
-----------+---------------------------------+----------
     Total |        32         22         96 |       150
           |     21.33      14.67      64.00 |    100.00
Within-cluster sum-of-squares: 142.75406 percent of total: 20.951016

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |         0         32         18 |        50
           |      0.00      64.00      36.00 |    100.00
-----------+---------------------------------+----------
versicolor |        46          0          4 |        50
           |     92.00       0.00       8.00 |    100.00
-----------+---------------------------------+----------
 virginica |        50          0          0 |        50
           |    100.00       0.00       0.00 |    100.00
-----------+---------------------------------+----------
     Total |        96         32         22 |       150
           |     64.00      21.33      14.67 |    100.00
Within-cluster sum-of-squares: 142.75406 percent of total: 20.951016

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |        32          0         18 |        50
           |     64.00       0.00      36.00 |    100.00
-----------+---------------------------------+----------
versicolor |         0         46          4 |        50
           |      0.00      92.00       8.00 |    100.00
-----------+---------------------------------+----------
 virginica |         0         50          0 |        50
           |      0.00     100.00       0.00 |    100.00
-----------+---------------------------------+----------
     Total |        32         96         22 |       150
           |     21.33      64.00      14.67 |    100.00
Within-cluster sum-of-squares: 142.75406 percent of total: 20.951016

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |        50          0          0 |        50
           |    100.00       0.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |         0         48          2 |        50
           |      0.00      96.00       4.00 |    100.00
-----------+---------------------------------+----------
 virginica |         0         14         36 |        50
           |      0.00      28.00      72.00 |    100.00
-----------+---------------------------------+----------
     Total |        50         62         38 |       150
           |     33.33      41.33      25.33 |    100.00
Within-cluster sum-of-squares: 78.85144 percent of total: 11.572475

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |        32         18          0 |        50
           |     64.00      36.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |         0          4         46 |        50
           |      0.00       8.00      92.00 |    100.00
-----------+---------------------------------+----------
 virginica |         0          0         50 |        50
           |      0.00       0.00     100.00 |    100.00
-----------+---------------------------------+----------
     Total |        32         22         96 |       150
           |     21.33      14.67      64.00 |    100.00
Within-cluster sum-of-squares: 142.75406 percent of total: 20.951016

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |         0         50          0 |        50
           |      0.00     100.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |         3          0         47 |        50
           |      6.00       0.00      94.00 |    100.00
-----------+---------------------------------+----------
 virginica |        36          0         14 |        50
           |     72.00       0.00      28.00 |    100.00
-----------+---------------------------------+----------
     Total |        39         50         61 |       150
           |     26.00      33.33      40.67 |    100.00
Within-cluster sum-of-squares: 78.855664 percent of total: 11.573095

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |        50          0          0 |        50
           |    100.00       0.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |         0          2         48 |        50
           |      0.00       4.00      96.00 |    100.00
-----------+---------------------------------+----------
 virginica |         0         36         14 |        50
           |      0.00      72.00      28.00 |    100.00
-----------+---------------------------------+----------
     Total |        50         38         62 |       150
           |     33.33      25.33      41.33 |    100.00
Within-cluster sum-of-squares: 78.85144 percent of total: 11.572475

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |         0         50          0 |        50
           |      0.00     100.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |         3          0         47 |        50
           |      6.00       0.00      94.00 |    100.00
-----------+---------------------------------+----------
 virginica |        36          0         14 |        50
           |     72.00       0.00      28.00 |    100.00
-----------+---------------------------------+----------
     Total |        39         50         61 |       150
           |     26.00      33.33      40.67 |    100.00
Within-cluster sum-of-squares: 78.855664 percent of total: 11.573095

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |        50          0          0 |        50
           |    100.00       0.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |         0         48          2 |        50
           |      0.00      96.00       4.00 |    100.00
-----------+---------------------------------+----------
 virginica |         0         14         36 |        50
           |      0.00      28.00      72.00 |    100.00
-----------+---------------------------------+----------
     Total |        50         62         38 |       150
           |     33.33      41.33      25.33 |    100.00
Within-cluster sum-of-squares: 78.85144 percent of total: 11.572475

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |        18         32          0 |        50
           |     36.00      64.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |         4          0         46 |        50
           |      8.00       0.00      92.00 |    100.00
-----------+---------------------------------+----------
 virginica |         0          0         50 |        50
           |      0.00       0.00     100.00 |    100.00
-----------+---------------------------------+----------
     Total |        22         32         96 |       150
           |     14.67      21.33      64.00 |    100.00
Within-cluster sum-of-squares: 142.75406 percent of total: 20.951016

. 
. foreach var of varlist sepallength-petalwidth {
  2.     egen `var's=std(`var')
  3. }

. forval i=1(1)10 {
  2.     cluster kmeans sepallengths-petalwidths, k(3) start(krandom) name(iris3sm`i')
  3.     tab species iris3sm`i', row
  4.     scalar wss=0
  5.     scalar mss=0
  6.     foreach var of varlist sepallengths-petalwidths {
  7.         quietly anova `var' iris3sm`i'
  8.         scalar wss=wss+e(rss)
  9.         scalar mss=mss+e(mss)
 10.     }
 11.     di "Within-cluster sum-of-squares: " wss " percent of total: " wss/(wss+mss)*100
 12. }

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |         0         50          0 |        50
           |      0.00     100.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |        11          0         39 |        50
           |     22.00       0.00      78.00 |    100.00
-----------+---------------------------------+----------
 virginica |        33          0         17 |        50
           |     66.00       0.00      34.00 |    100.00
-----------+---------------------------------+----------
     Total |        44         50         56 |       150
           |     29.33      33.33      37.33 |    100.00
Within-cluster sum-of-squares: 139.0992 percent of total: 23.338792

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |        49          0          1 |        50
           |     98.00       0.00       2.00 |    100.00
-----------+---------------------------------+----------
versicolor |         0         13         37 |        50
           |      0.00      26.00      74.00 |    100.00
-----------+---------------------------------+----------
 virginica |         0         42          8 |        50
           |      0.00      84.00      16.00 |    100.00
-----------+---------------------------------+----------
     Total |        49         55         46 |       150
           |     32.67      36.67      30.67 |    100.00
Within-cluster sum-of-squares: 139.96219 percent of total: 23.483588

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |        50          0          0 |        50
           |    100.00       0.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |         0         38         12 |        50
           |      0.00      76.00      24.00 |    100.00
-----------+---------------------------------+----------
 virginica |         0         14         36 |        50
           |      0.00      28.00      72.00 |    100.00
-----------+---------------------------------+----------
     Total |        50         52         48 |       150
           |     33.33      34.67      32.00 |    100.00
Within-cluster sum-of-squares: 138.89326 percent of total: 23.304239

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |         0          0         50 |        50
           |      0.00       0.00     100.00 |    100.00
-----------+---------------------------------+----------
versicolor |        12         38          0 |        50
           |     24.00      76.00       0.00 |    100.00
-----------+---------------------------------+----------
 virginica |        36         14          0 |        50
           |     72.00      28.00       0.00 |    100.00
-----------+---------------------------------+----------
     Total |        48         52         50 |       150
           |     32.00      34.67      33.33 |    100.00
Within-cluster sum-of-squares: 138.89326 percent of total: 23.304239

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |         0          1         49 |        50
           |      0.00       2.00      98.00 |    100.00
-----------+---------------------------------+----------
versicolor |        13         37          0 |        50
           |     26.00      74.00       0.00 |    100.00
-----------+---------------------------------+----------
 virginica |        42          8          0 |        50
           |     84.00      16.00       0.00 |    100.00
-----------+---------------------------------+----------
     Total |        55         46         49 |       150
           |     36.67      30.67      32.67 |    100.00
Within-cluster sum-of-squares: 139.96219 percent of total: 23.483588

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |         0          0         50 |        50
           |      0.00       0.00     100.00 |    100.00
-----------+---------------------------------+----------
versicolor |        39         11          0 |        50
           |     78.00      22.00       0.00 |    100.00
-----------+---------------------------------+----------
 virginica |        17         33          0 |        50
           |     34.00      66.00       0.00 |    100.00
-----------+---------------------------------+----------
     Total |        56         44         50 |       150
           |     37.33      29.33      33.33 |    100.00
Within-cluster sum-of-squares: 139.0992 percent of total: 23.338792

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |        50          0          0 |        50
           |    100.00       0.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |         0         12         38 |        50
           |      0.00      24.00      76.00 |    100.00
-----------+---------------------------------+----------
 virginica |         0         39         11 |        50
           |      0.00      78.00      22.00 |    100.00
-----------+---------------------------------+----------
     Total |        50         51         49 |       150
           |     33.33      34.00      32.67 |    100.00
Within-cluster sum-of-squares: 139.14814 percent of total: 23.347003

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |         0         17         33 |        50
           |      0.00      34.00      66.00 |    100.00
-----------+---------------------------------+----------
versicolor |        46          4          0 |        50
           |     92.00       8.00       0.00 |    100.00
-----------+---------------------------------+----------
 virginica |        50          0          0 |        50
           |    100.00       0.00       0.00 |    100.00
-----------+---------------------------------+----------
     Total |        96         21         33 |       150
           |     64.00      14.00      22.00 |    100.00
Within-cluster sum-of-squares: 189.75124 percent of total: 31.837455

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |         0         50          0 |        50
           |      0.00     100.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |        11          0         39 |        50
           |     22.00       0.00      78.00 |    100.00
-----------+---------------------------------+----------
 virginica |        33          0         17 |        50
           |     66.00       0.00      34.00 |    100.00
-----------+---------------------------------+----------
     Total |        44         50         56 |       150
           |     29.33      33.33      37.33 |    100.00
Within-cluster sum-of-squares: 139.0992 percent of total: 23.338792

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

           |            Cluster ID
   Species |         1          2          3 |     Total
-----------+---------------------------------+----------
    setosa |         0         50          0 |        50
           |      0.00     100.00       0.00 |    100.00
-----------+---------------------------------+----------
versicolor |        39          0         11 |        50
           |     78.00       0.00      22.00 |    100.00
-----------+---------------------------------+----------
 virginica |        17          0         33 |        50
           |     34.00       0.00      66.00 |    100.00
-----------+---------------------------------+----------
     Total |        56         50         44 |       150
           |     37.33      33.33      29.33 |    100.00
Within-cluster sum-of-squares: 139.0992 percent of total: 23.338792

. 
end of do-file