Formula mismatch when requesting predictions from model fit through purrr/map - r

I am using purrr to fit several linear mixed spline models and to select the best model based on the lowest BIC. I would like to extract predictions from the best model; however, I get the following error message when I try to do this:
Error in model.matrix.default(fixed, model.frame(delete.response(Terms), : model frame and formula mismatch in model.matrix()
How can I extract predictions from the best model?
This is the example data
dat <- structure(list(id = c(1001L, 1001L, 1001L, 1001L, 1001L, 1002L,
1003L, 1004L, 1004L, 1004L, 1004L, 1004L, 1004L, 1004L, 1005L,
1005L, 1005L, 1005L, 1005L, 1006L, 1006L, 1006L, 1006L, 1006L,
1007L, 1007L, 1008L, 1008L, 1008L, 1008L, 1008L, 1009L, 1009L,
1009L, 1010L, 1010L, 1010L, 1011L, 1012L, 1012L, 1012L, 1013L,
1013L, 1014L, 1015L, 1015L, 1015L, 1016L, 1016L, 1016L, 1016L,
1016L, 1017L, 1017L, 1018L, 1020L, 1020L, 1021L, 1021L, 1021L,
1021L, 1022L, 1022L, 1023L, 1023L, 1023L, 1023L, 1023L, 1023L,
1023L, 1023L, 1023L, 1023L, 1024L, 1024L, 1024L, 1024L, 1024L,
1025L, 1025L, 1025L, 1026L, 1026L, 1026L, 1026L, 1027L, 1027L,
1028L, 1028L, 1028L, 1028L, 1028L, 1028L, 1028L, 1029L, 1029L,
1029L, 1029L, 1029L, 1029L, 1030L, 1030L, 1030L, 1030L, 1030L,
1030L, 1030L, 1030L, 1031L, 1031L, 1031L, 1031L, 1032L, 1032L,
1032L, 1032L, 1032L, 1033L, 1033L, 1033L, 1033L, 1034L, 1034L,
1034L, 1034L, 1034L, 1035L, 1035L, 1036L, 1037L, 1037L, 1037L,
1037L, 1039L, 1039L, 1040L, 1040L, 1040L, 1040L, 1040L, 1040L,
1041L, 1041L, 1041L, 1041L, 1041L, 1041L, 1042L, 1042L, 1042L,
1042L, 1042L, 1042L, 1042L, 1043L, 1043L, 1043L, 1043L, 1044L,
1044L, 1044L, 1045L, 1045L, 1045L, 1045L, 1045L, 1045L, 1047L,
1048L, 1048L, 1049L, 1049L, 1049L, 1049L, 1051L, 1051L, 1052L,
1052L, 1052L, 1052L, 1052L, 1053L, 1053L, 1053L, 1053L, 1053L,
1054L, 1054L, 1054L, 1054L, 1054L, 1054L, 1054L, 1054L, 1056L,
1056L, 1056L, 1056L, 1057L, 1057L, 1058L, 1058L, 1058L, 1058L,
1058L, 1060L, 1060L, 1060L, 1061L, 1061L, 1061L, 1061L, 1061L,
1062L, 1062L, 1062L, 1062L, 1062L, 1063L, 1063L, 1063L, 1064L,
1064L, 1064L, 1064L, 1065L, 1065L, 1066L, 1066L, 1066L, 1066L,
1066L, 1066L, 1067L, 1067L, 1067L, 1068L, 1068L, 1068L, 1068L,
1068L, 1068L, 1068L, 1069L, 1070L, 1070L, 1070L, 1071L, 1071L,
1071L, 1072L, 1072L, 1072L, 1072L, 1072L, 1073L, 1073L, 1073L,
1073L, 1074L, 1074L, 1074L, 1075L, 1075L, 1075L, 1075L, 1075L,
1075L, 1076L, 1076L, 1076L, 1077L, 1077L, 1077L, 1077L, 1077L,
1077L, 1078L, 1078L, 1078L, 1078L, 1078L, 1078L, 1078L, 1080L,
1080L, 1080L, 1080L, 1081L, 1081L, 1082L, 1082L, 1082L, 1083L,
1083L, 1084L, 1085L, 1085L, 1085L, 1085L, 1085L, 1085L, 1086L,
1086L, 1086L, 1087L, 1087L, 1087L, 1087L, 1087L, 1087L, 1087L,
1087L, 1088L, 1088L, 1088L, 1088L, 1089L, 1089L, 1089L, 1089L,
1089L, 1090L, 1090L, 1091L, 1091L, 1091L, 1091L, 1091L, 1092L,
1092L, 1092L, 1092L, 1092L, 1093L, 1093L, 1093L, 1093L, 1094L,
1094L, 1094L, 1094L, 1094L, 1095L, 1095L, 1095L, 1095L, 1096L,
1097L, 1097L, 1098L, 1098L, 1098L, 1098L, 1098L, 1099L, 1099L,
1099L, 1099L, 1099L, 1099L, 1099L, 1099L, 1100L, 1100L, 1100L,
1101L, 1101L, 1101L, 1101L, 1103L, 1103L, 1103L, 1103L, 1103L,
1103L, 1103L, 1104L, 1104L, 1104L, 1104L, 1105L, 1105L, 1105L,
1106L, 1106L, 1106L, 1106L, 1106L, 1106L, 1106L, 1106L, 1106L,
1107L, 1108L, 1110L, 1111L, 1112L, 1117L, 1123L), y = c(1934.047646,
1075.598345, 1956.214821, 2000.38538, 2000.38538, 732.315937,
3119.86, 624.951231, 791.2764892, 1884.530826, 624.951231, 1047.57,
1047.57, 791.2764892, 1238.306103, 1555.042976, 2547.870529,
2547.870529, 2467.385, 1181.635212, 1181.635212, 565.306282,
2016.027874, 2016.027874, 712.6134567, 635.2537841, 2167.362267,
2575.574188, 2167.362267, 2480.028259, 2575.574188, 2875.363243,
1180.139938, 2828.037147, 3017.119362, 2722.940933, 2167.92,
2409.652458, 2245.442558, 724.1520328, 635.6034756, 1649.08326,
966.8182507, 865.2717723, 1570.23, 916.1300105, 1180.999973,
2351.32885, 2418.851707, 2290.038887, 2224.060562, 2509.52, 1174.589081,
1540.219376, 2692.26, 1300.899734, 1100.650177, 1786.628242,
1705.842979, 543.8596134, 1786.628242, 2115.374241, 2331.46,
875.949604, 2241.945103, 2319.666939, 2316.220234, 719.7139549,
2042.803307, 719.7139549, 1132.977503, 875.949604, 2316.220234,
1737.18, 1351.629826, 1291.44593, 1291.44593, 1108.26586, 1028.979719,
1291.44593, 2068.934227, 2440.784416, 1036.72, 894.6663704, 2449.184731,
1109.9, 672.9310664, 2072.320354, 2114.215416, 2114.215416, 1805.422001,
2461.18, 2101.374248, 2105.879, 1600.086481, 2866.84, 1600.086481,
2807.311, 3055.569931, 1600.086481, 2602.287521, 2690.007614,
620.5975037, 2608.4, 2722.3, 2713.66185, 2608.4, 1590.002, 2198.211,
2488.097725, 2198.211, 2322.616348, 2627.1, 2418.328346, 2601.661034,
531.7369251, 811.9494571, 884.31, 768.0526981, 652.1271248, 768.0526981,
2767.479, 1047.144354, 1047.144354, 1995.119, 1995.119, 707.6093158,
707.6093158, 1120.650104, 3036.591904, 3036.591904, 3081.86,
1193.583691, 2056.569244, 1823.155, 1238.948124, 2124.685, 887.20438,
1823.155, 2056.569244, 2056.569244, 2560.155342, 3095.923164,
3095.923164, 3003.729011, 2861.12, 2560.155342, 2735.26, 822.8209591,
1648.951, 1648.951, 1648.951, 822.8209591, 906.7692623, 582.787096,
1286.45, 797.2365359, 2566.770554, 2666.41, 2666.41, 2045.320816,
2401.21, 2401.21, 2583.2, 2581.32, 2622.357, 2581.32, 2588.462498,
442.433671, 1251.627064, 406.2565479, 2108.787437, 983.1101169,
2102.085403, 1155.713411, 1909.797131, 2871.55, 2711.07, 2883.22245,
2883.22245, 2711.07, 3027.103172, 3108.21537, 3007.87294, 3208.963631,
3108.21537, 2617.91, 2457.464466, 2890.51, 2698.48214, 2700.723,
2700.723, 2817.668579, 2700.723, 1349.90691, 1476.19994, 1552.95,
1349.90691, 925.8325004, 1258.28, 840.1875095, 2405.175911, 840.1875095,
1056.678543, 1571.936, 1210.89, 1210.89, 673.7005405, 687.7842464,
1016.86, 1217.866, 1493.791817, 2246.726913, 1054.821, 1054.821,
563.6580887, 1054.821, 1540.429863, 2209.006493, 1437.835186,
2191.308, 1412.128944, 2724.164597, 2791.705185, 2727.774208,
2070.451198, 866.7974147, 1661.082638, 2108.271309, 2411.515434,
2342.026085, 2071.06, 2258.321014, 1537.06, 760.6319065, 867.7596569,
1907.60466, 1770.658, 760.6319065, 912.8781966, 912.8781966,
912.8781966, 1257.222706, 2586.922356, 1608.28, 962.5674305,
1085.451181, 2539.218132, 2535.526085, 2561.60054, 1600.198,
2100.048149, 758.3851737, 758.3851737, 2643.373329, 367.7795143,
866.0683727, 718.5049658, 866.0683727, 1906.694649, 2291.48,
2190.560314, 744.1710777, 1498.981777, 2460.912292, 590.1345787,
2487.559135, 1855.601353, 660.9104843, 1116.08, 792.929533, 708.8373737,
2272.232933, 1801.729801, 2299.800095, 2272.232933, 2299.800095,
1895.828438, 1757.75, 1050.279345, 1757.75, 1326.09478, 1326.09478,
1633.119305, 1558, 1167.971405, 1828.16, 1788.571758, 2175.469,
1071.039494, 941.6030864, 2053.067215, 1461.02132, 1597.646778,
1885.321567, 2195.704372, 2195.704372, 1675.768558, 3157.550789,
1565.173126, 2195.704372, 3157.550789, 2404.836883, 2541.045593,
585.7223682, 2465.177761, 2678.462074, 500.3733997, 2465.177761,
781.342, 898.3551559, 2465.177761, 2465.177761, 1807.02, 1418.888027,
1797.36, 1807.02, 2200.06, 2218.369926, 2200.06, 1986.642735,
2088.292, 2069.139, 1507.901432, 2061.395798, 2075.164864, 2081.913219,
2081.913219, 483.8579493, 1857.88, 2578.772636, 1857.88, 1857.88,
1039.632153, 2288.28, 2288.28, 1831.349922, 2349.23, 933.1002788,
2626.298935, 1521.744, 933.1002788, 2626.298935, 1984.760715,
2450.333, 1732.339031, 1984.760715, 2731.9, 869.2320918, 1785.72,
1922.798, 3081.28, 1508.8, 2421.288597, 1922.798, 1268.074959,
1569.05, 1808.115, 1569.05, 1268.074959, 2165.724808, 2165.724808,
1808.115, 2084.149837, 2693.027184, 2464.489, 2607.653496, 1012.837271,
1012.837271, 2673.190872, 2635.290516, 2773.42, 2635.290516,
2654.772674, 2377.905655, 2679.014969, 2654.772674, 1226.40016,
1470.69, 1273.789799, 2294.926086, 1226.40016, 1470.69, 1273.789799,
1873.817, 2274.930534, 2317.429165, 959.1709613, 1328.159428,
1328.159428, 1328.159428, 959.1709613, 1630.28, 1610.54982, 2507.05302,
750.467966, 750.467966, 821.2255058, 802.8240452, 2829.47879),
age = c(31.54004107, 11.95071869, 27.88501027, 27.88501027,
25.07871321, 10.90759754, 25.70020534, 9.560574949, 11.17864476,
15.8384668, 9.560574949, 11.23613963, 14.01232033, 10.54620123,
12.89527721, 14.52977413, 24.96919918, 24.72005476, 23.95893224,
13.31690623, 11.52087611, 9.927446954, 22.10814511, 16.44353183,
10.90759754, 7.991786448, 17.26488706, 23.95893224, 15.66872005,
17.63723477, 24.72005476, 30.97330595, 11.52087611, 17.5633128,
30.11088296, 23.31279945, 17.26488706, 20.58590007, 28.27926078,
11.66324435, 9.927446954, 13.92744695, 11.20328542, 12.70362765,
13.52498289, 12.21355236, 13.80150582, 22.81724846, 39.3045859,
16.62696783, 22.63107461, 29.86447639, 12.54483231, 14.42299795,
34.27789185, 12.91170431, 12.25462012, 21.81245722, 21.81245722,
10.05065024, 23.6659822, 16.22450376, 28.74743326, 12.70362765,
35.43052704, 21.21013005, 19.28542094, 12.77207392, 16.59411362,
12.12867899, 11.29637235, 11.81930185, 19.04449008, 19.93429158,
16.14236824, 12.85420945, 13.21560575, 11.61396304, 11.85763176,
13.3798768, 17.42915811, 24.41341547, 13.08418891, 11.6659822,
24.41341547, 12.06297057, 10.22861054, 26.15468857, 21.71937029,
20.1889117, 12.60232717, 25.39904175, 30.72689938, 19.22245038,
14.45037645, 24.77207392, 13.47570157, 17.87816564, 27.52635181,
15.16221766, 19.68514716, 21.67282683, 9.062286105, 20.43805613,
21.67282683, 21.24024641, 20.70362765, 13.5687885, 17.13347023,
28.11498973, 24.16974675, 18.19575633, 27.73442847, 15.52361396,
20.70362765, 11.76728268, 10.98699521, 11.51540041, 9.902806297,
13.05407255, 8.703627652, 25.60164271, 10.59000684, 10.59000684,
14.45859001, 14.05886379, 10.88295688, 10.75427789, 10.59000684,
26.50513347, 18.83093771, 22.86379192, 11.8384668, 15.04449008,
15.42505133, 14.14099932, 28.06844627, 11.51540041, 14.66119097,
13.79055441, 15.37850787, 22.58179329, 22.86379192, 30.0752909,
21.85900068, 25.60164271, 15.29089665, 26.79534565, 11.68514716,
15.42505133, 15.58384668, 15.08555784, 14.11909651, 11.6659822,
10.21765914, 12.1670089, 10.50239562, 23.3045859, 15.92607803,
22.58179329, 16.65982204, 20.58590007, 39.3045859, 32.56947296,
16.90349076, 25.12799452, 17.88364134, 19.46338125, 8.736481862,
14.14099932, 8.736481862, 17.68104038, 14.54893908, 19.22245038,
12.98562628, 22.45311431, 18.83093771, 38.68856947, 26.50513347,
25.44010951, 28.70910335, 19.21697467, 30.0752909, 26.50513347,
29.45106092, 33.31690623, 16.68172485, 15.816564, 24.89801506,
15.816564, 18.7761807, 18.4366872, 19.45790554, 19.78370979,
14.98973306, 15.89869952, 29.06502396, 16.14236824, 10.74880219,
13.47843943, 10.5982204, 24.61875428, 10.74880219, 12.47364819,
16.95277207, 12.41889117, 13.44832307, 9.984941821, 9.451060917,
12.59137577, 13.38261465, 15.14852841, 21.65913758, 12.57494867,
12.40520192, 10.75701574, 15.16495551, 15.67419576, 22.52703628,
13.31143053, 16.71457906, 12.98288843, 32.16974675, 25.3798768,
30.57084189, 22.14647502, 11.43874059, 13.25119781, 18.48049281,
25.81519507, 24.78028747, 17.85626283, 27.70704997, 13.28952772,
8.703627652, 11.61396304, 35.04996578, 15.61943874, 8.703627652,
13.33333333, 10.56810404, 11.34017796, 13.5797399, 28.79671458,
12.56673511, 13.33333333, 12.55578371, 30.80082136, 23.63039014,
29.66461328, 13.25119781, 17.46748802, 8.703627652, 8.703627652,
21.21013005, 9.768651608, 13.46748802, 10.75427789, 13.24298426,
26.87474333, 27.43326489, 20.6899384, 10.0752909, 13.37713895,
28.38056126, 8.911704312, 24.62149213, 14.32443532, 10.24229979,
13.87268994, 10.54620123, 11.44421629, 21.68377823, 15.61943874,
27.97809719, 28.90075291, 28.90075291, 24.64339493, 14.32443532,
10.61190965, 15.8110883, 14.25051335, 14.25051335, 13.64818617,
26.05338809, 13.69746749, 23.98083504, 16.68172485, 20.42162902,
12.68172485, 11.51813826, 16.65982204, 14.32443532, 15.49897331,
35.04996578, 18.70225873, 17.47570157, 14.66666667, 26.83915127,
13.29226557, 18.14647502, 25.70020534, 14.67761807, 16.61601643,
9.812457221, 15.96714579, 24.41341547, 8.911704312, 17.61806982,
11.87953457, 11.80561259, 19.15400411, 17.61806982, 15.70704997,
12.35318275, 18.12457221, 16.8733744, 32.02464066, 32.02464066,
25.30047912, 16.13415469, 19.37850787, 26.50513347, 15.89869952,
13.79055441, 25.42368241, 16.05201916, 15.43874059, 9.158110883,
14.39014374, 22.12183436, 15.70704997, 15.35934292, 11.44421629,
28.45995893, 17.06502396, 14.39014374, 26.32991102, 12.38056126,
16.42436687, 13.37713895, 11.70978782, 17.62628337, 16.13415469,
17.61806982, 15.11019849, 14.09993155, 21.89185489, 13.80150582,
16.8733744, 17.73305955, 25.55509925, 14.75975359, 24.03559206,
14.36002738, 12.73100616, 16.09034908, 18.12457221, 15.11019849,
13.69472964, 23.03901437, 16.94182067, 15.70704997, 13.99315537,
21.89185489, 15.65776865, 19.25530459, 10.43394935, 12.72826831,
24.41341547, 24.25735797, 37.41820671, 37.41820671, 25.25393566,
24.78028747, 25.25393566, 37.41820671, 12.11772758, 14.19575633,
14.091718, 15.10746064, 13.16906229, 12.09856263, 13.3798768,
14.39014374, 36.3504449, 22.68035592, 11.21149897, 12.73100616,
13.34702259, 14.5982204, 11.31827515, 15.14579055, 15.44969199,
15.65776865, 12.12867899, 12.43531828, 12.72005476, 14.11909651,
24.25735797)), row.names = c(7L, 303L, 323L, 372L, 391L,
240L, 311L, 38L, 46L, 94L, 149L, 154L, 185L, 362L, 40L, 70L,
98L, 262L, 305L, 73L, 74L, 77L, 306L, 374L, 104L, 397L, 14L,
43L, 188L, 248L, 370L, 50L, 101L, 143L, 25L, 155L, 251L, 37L,
173L, 208L, 263L, 49L, 383L, 389L, 30L, 237L, 353L, 156L, 283L,
288L, 302L, 325L, 33L, 158L, 159L, 35L, 360L, 57L, 128L, 204L,
387L, 300L, 365L, 16L, 51L, 82L, 85L, 93L, 148L, 150L, 232L,
242L, 287L, 32L, 62L, 200L, 285L, 290L, 193L, 352L, 398L, 54L,
175L, 203L, 324L, 69L, 195L, 92L, 106L, 141L, 189L, 218L, 347L,
394L, 23L, 24L, 120L, 166L, 257L, 349L, 6L, 118L, 235L, 266L,
269L, 275L, 282L, 390L, 122L, 153L, 330L, 378L, 53L, 88L, 229L,
241L, 314L, 135L, 278L, 332L, 384L, 64L, 168L, 207L, 212L, 359L,
329L, 338L, 130L, 67L, 108L, 286L, 316L, 182L, 254L, 113L, 215L,
247L, 273L, 322L, 336L, 27L, 102L, 162L, 171L, 270L, 326L, 19L,
205L, 210L, 307L, 333L, 358L, 375L, 41L, 111L, 179L, 226L, 2L,
277L, 367L, 68L, 83L, 147L, 180L, 260L, 354L, 144L, 81L, 342L,
103L, 217L, 321L, 376L, 131L, 280L, 39L, 267L, 291L, 301L, 400L,
11L, 36L, 152L, 177L, 377L, 21L, 201L, 236L, 281L, 312L, 331L,
355L, 369L, 8L, 176L, 202L, 385L, 45L, 327L, 12L, 138L, 151L,
157L, 233L, 95L, 258L, 279L, 224L, 239L, 243L, 310L, 328L, 63L,
191L, 214L, 227L, 356L, 80L, 110L, 366L, 97L, 107L, 293L, 373L,
117L, 335L, 22L, 160L, 209L, 221L, 230L, 268L, 55L, 163L, 284L,
5L, 10L, 76L, 132L, 222L, 256L, 399L, 228L, 127L, 343L, 357L,
133L, 259L, 334L, 261L, 341L, 382L, 393L, 395L, 213L, 219L, 249L,
289L, 44L, 126L, 368L, 42L, 72L, 196L, 297L, 308L, 320L, 84L,
137L, 172L, 60L, 129L, 142L, 186L, 197L, 319L, 15L, 109L, 115L,
116L, 125L, 199L, 223L, 190L, 245L, 346L, 396L, 146L, 364L, 1L,
29L, 192L, 112L, 170L, 315L, 164L, 225L, 231L, 255L, 274L, 345L,
65L, 96L, 264L, 4L, 28L, 31L, 59L, 87L, 250L, 271L, 295L, 161L,
198L, 265L, 339L, 18L, 26L, 114L, 124L, 174L, 145L, 304L, 105L,
119L, 140L, 238L, 381L, 48L, 52L, 71L, 351L, 371L, 244L, 253L,
294L, 340L, 20L, 75L, 86L, 165L, 167L, 47L, 89L, 298L, 318L,
211L, 350L, 380L, 66L, 79L, 90L, 234L, 309L, 61L, 99L, 139L,
276L, 299L, 344L, 348L, 361L, 313L, 337L, 379L, 9L, 58L, 181L,
187L, 17L, 100L, 121L, 123L, 184L, 206L, 220L, 178L, 292L, 386L,
392L, 194L, 252L, 272L, 3L, 56L, 134L, 136L, 183L, 216L, 246L,
296L, 363L, 169L, 388L, 78L, 34L, 13L, 91L, 317L), class = "data.frame")
This is the code to fit different models and select the best model based on the lowest BIC
library(nlme)
library(splines)
library(tidyverse)
models <- map(c(3:6), possibly(~ {
lme(as.formula(paste("y ~", capture.output(print(call("ns", quote(age), .x))))),
data = dat, random = ~ age | id, method = "ML")
}, otherwise = NA_real_))
(models_bic <- unlist(map(models, BIC)))
(best_model <- which.min(models_bic))
(best_model <- models[[best_model]])
This is what I want to do to get the predictions
best_model_pred <- data.frame(age =seq(min(dat$age), max(dat$age), length = 100))
best_model_pred$pred <- predict(best_model, best_model_pred, level = 0)
Error in model.matrix.default(fixed, model.frame(delete.response(Terms), :
model frame and formula mismatch in model.matrix()

The issue is that lme stores the formula expression literally, exactly as you typed it, rather than the formula it evaluates to. In your case that is:
as.formula(paste("y ~ ns(age, ", .x, ")"))
That expression can only be evaluated inside the map loop, where .x exists. Print the best model and take a look at the fourth line.
To fix it, you can take this a step further: construct the entire call as a string and then evaluate it.
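The same mechanism can be seen with lm and the built-in mtcars data. This is only an analogy, assuming lme records its call in the same way as lm does; the names fits, k and stored are made up for illustration:

```r
# Fit two models inside a loop, building the formula from the loop variable k.
fits <- lapply(1:2, function(k) {
  lm(as.formula(paste0("mpg ~ poly(wt, ", k, ")")), data = mtcars)
})

# The stored call keeps the unevaluated expression, not the resulting formula,
# so it still refers to k, which no longer exists outside the loop.
stored <- fits[[1]]$call$formula
print(stored)
is.call(stored)
```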
models <- map(c(3:6), possibly(~ {
eval(parse(text = paste0("lme(y ~ ns(age, ", .x, "), data = dat, random = ~ age | id, method = 'ML')")))
}, otherwise = NA_real_))
It's not perfect but it works :)
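If evaluating a pasted string feels too fragile, one possible alternative (not part of the original suggestion, and not tested against this data) is to splice the value into the call with base R's bquote before evaluating it. The helper name make_lme_call is hypothetical; the sketch only builds the calls, so it runs without nlme loaded:

```r
# Hypothetical helper: build (but do not evaluate) an lme() call in which the
# spline degrees of freedom k are written into the formula as a literal value.
make_lme_call <- function(k) {
  bquote(lme(y ~ ns(age, .(k)), data = dat, random = ~ age | id, method = "ML"))
}

calls <- lapply(3:6, make_lme_call)

# The fixed-effects formula inside each call no longer mentions a loop variable.
fixed_formulas <- vapply(calls, function(cl) deparse(cl[[2]]), character(1))
print(fixed_formulas)

# Assumption: with nlme, splines and dat available, each call could then be
# fitted with eval(), e.g. models <- lapply(calls, eval)
```

Evaluating one of these calls should store a literal formula such as y ~ ns(age, 3L) in the fitted model, which is what predict needs in order to rebuild the design matrix outside the loop.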
Unrelated note: I'd use safely instead of possibly, so that the result of a failed fit is NULL, then pull out the results and use compact to remove the missing models.
models <- map(3:6, safely(your_function)) %>% map("result") %>% compact()
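A small check of how safely and compact interact, using a made-up function risky that fails for one input. safely returns a list with a result and an error component for every input, so the result components have to be extracted before compact can drop the failures:

```r
library(purrr)

# Made-up stand-in for a model fit that fails for one value of k.
risky <- function(k) {
  if (k == 4) stop("fit failed")
  k * 10
}

res <- map(3:6, safely(risky))

# Each element is list(result = ..., error = ...); a failure has result = NULL.
kept_raw <- compact(res)            # nothing is dropped here
ok <- compact(map(res, "result"))   # failures are dropped here

length(kept_raw)
length(ok)
```

An equivalent route, under the same assumptions, is possibly(risky, otherwise = NULL) followed directly by compact.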

Related

Sample a single value from list of vectors multiple times

I have the following list of vectors:
list(c(663L, 705L, 680L, 769L, 775L, 327L, 665L, 805L, 808L,
689L, 774L, 831L, 832L, 217L, 739L, 918L, 354L, 373L, 764L, 691L,
839L, 372L, 146L, 840L, 727L, 728L, 617L, 647L, 159L, 161L, 581L,
142L, 618L, 332L, 585L, 134L, 809L, 154L, 158L, 133L, 448L, 736L,
737L, 815L, 876L, 151L, 750L, 701L, 778L, 861L, 584L, 692L, 427L,
455L, 601L, 412L, 432L, 449L, 457L, 456L, 620L, 124L, 125L, 679L,
329L, 667L, 697L, 806L, 807L, 312L, 315L, 733L, 821L, 222L, 583L,
702L, 631L, 642L, 812L, 850L, 726L, 853L, 129L, 660L, 799L, 410L,
188L, 798L, 130L, 703L, 341L, 826L, 137L, 253L, 123L, 827L, 844L,
786L, 655L, 879L, 695L, 749L, 866L, 820L, 890L, 889L, 888L, 694L,
744L, 746L, 813L, 818L, 868L, 873L, 872L, 869L, 870L, 414L, 738L,
751L, 208L, 209L, 210L, 899L, 900L, 901L, 903L, 902L, 904L, 913L,
911L, 912L, 767L, 917L, 777L, 521L, 396L, 397L, 915L, 277L, 529L,
740L, 509L, 508L, 524L, 224L, 790L, 791L, 698L, 725L, 696L, 817L,
802L, 897L, 898L, 787L, 788L, 789L, 462L, 356L, 395L, 693L, 745L,
469L, 519L, 336L, 355L, 792L, 556L, 375L, 398L, 358L, 399L, 720L,
539L, 558L, 331L, 166L, 167L, 128L, 131L, 214L, 239L, 269L, 276L,
213L, 337L, 176L, 304L, 503L, 394L, 296L, 298L, 211L, 223L, 238L,
338L, 487L, 490L, 488L, 489L, 273L, 274L, 892L, 300L, 301L, 816L,
819L, 275L, 752L, 139L, 206L, 420L, 793L, 215L, 320L, 321L, 676L,
226L, 699L, 325L, 252L, 319L, 672L, 236L, 306L, 743L, 237L, 439L,
212L, 675L, 333L, 429L, 476L, 478L, 704L, 768L, 440L, 517L, 518L,
776L, 810L, 413L, 554L, 555L, 765L, 622L, 626L, 624L, 625L, 231L,
577L, 335L, 628L, 629L, 511L, 339L, 352L, 353L, 138L, 578L, 349L,
496L, 611L, 606L, 614L, 612L, 613L, 607L, 609L, 608L, 610L, 328L,
194L, 195L, 639L, 183L, 632L, 340L, 418L, 308L, 435L, 436L, 437L,
543L, 905L, 914L, 428L, 374L, 444L, 502L, 825L, 510L, 732L, 557L,
559L, 730L, 566L, 567L, 506L, 520L, 531L, 534L, 549L, 630L, 174L,
175L, 140L, 677L, 426L, 377L, 392L, 196L, 186L, 197L, 144L, 141L,
407L), c(887L, 886L, 884L, 885L), c(528L, 527L, 525L, 526L),
c(70L, 71L, 75L, 77L, 72L, 73L, 74L, 76L), c(111L, 109L,
110L, 98L, 120L, 112L, 116L, 103L, 106L, 93L, 95L, 94L, 119L,
117L, 99L, 118L), c(87L, 88L, 89L, 81L, 82L, 83L, 84L, 85L,
86L, 91L, 92L, 949L, 126L, 127L, 90L, 122L), c(530L, 185L,
202L, 363L, 729L, 880L, 368L, 401L, 391L, 405L, 906L, 513L,
652L, 708L, 552L, 766L, 505L, 382L, 383L, 803L, 565L, 571L,
572L, 688L, 460L, 480L, 661L, 153L, 859L, 256L, 268L, 685L,
763L, 147L, 865L, 874L, 741L, 754L, 858L, 878L, 220L, 225L,
307L, 317L, 313L, 758L, 314L, 848L, 163L, 165L, 387L, 452L,
378L, 270L, 271L, 464L, 302L, 280L, 283L, 504L, 712L, 281L,
801L), c(595L, 596L, 597L, 908L, 841L, 842L, 493L, 669L,
783L, 360L, 507L, 500L, 501L, 823L, 824L, 779L, 891L, 780L,
781L, 760L, 379L, 756L, 762L, 857L, 814L, 759L, 854L, 867L,
871L, 856L, 855L, 877L, 851L, 852L, 318L, 735L, 811L, 619L,
863L, 322L, 326L, 310L, 309L, 323L, 324L, 459L, 700L, 461L,
687L, 664L, 668L, 587L, 590L, 562L, 563L, 564L, 574L, 569L,
573L, 342L, 547L, 561L, 568L, 575L, 662L, 240L, 316L, 311L,
761L, 443L, 445L, 446L, 836L, 755L, 909L, 910L, 830L, 533L,
881L, 916L, 716L, 843L, 666L, 690L, 670L, 551L, 173L, 466L,
415L, 748L, 718L, 860L, 673L, 747L, 742L, 846L, 875L, 576L,
345L, 594L, 604L, 644L, 603L, 602L, 605L, 598L, 441L, 442L,
450L, 453L, 616L, 447L, 454L, 419L, 433L, 822L, 431L, 634L,
633L, 645L, 586L, 615L, 359L, 421L, 361L, 385L, 386L, 347L,
351L, 757L, 834L, 835L, 155L, 481L, 169L, 390L, 170L, 636L,
417L, 711L, 160L, 162L, 143L, 156L, 593L, 150L, 657L, 656L,
658L, 152L, 648L, 357L, 380L, 434L, 829L, 847L, 580L, 145L,
678L, 164L, 430L, 203L, 204L, 198L, 199L, 635L, 637L, 640L,
641L, 544L, 179L, 828L, 148L, 254L, 184L, 653L, 650L, 651L,
191L, 200L, 201L, 177L, 178L, 181L, 182L, 207L, 495L, 424L,
381L, 403L, 282L, 404L, 406L, 710L, 278L, 279L, 494L, 484L,
485L, 486L, 425L, 498L, 497L, 334L, 348L, 371L, 463L, 467L,
686L, 362L, 402L, 384L, 400L, 230L, 344L, 671L, 684L, 546L,
560L, 709L, 479L, 550L, 570L, 388L, 389L, 149L, 190L, 221L,
376L), c(1364L, 1373L, 1371L, 1372L, 1148L, 1211L, 1369L,
1370L, 1165L, 1377L, 1378L, 1112L, 1140L, 1139L, 1143L, 1019L,
1006L, 1247L, 1263L, 1191L, 1208L, 1059L, 1062L, 1115L, 1451L,
1448L, 1449L, 1113L, 1144L, 1458L, 1498L, 1499L, 955L, 968L,
1093L, 1365L, 1141L, 1265L, 1248L, 1249L, 1040L, 985L, 1119L,
1107L, 986L, 1197L, 1317L, 975L, 1155L, 1267L, 1215L, 1266L,
1106L, 1111L, 1058L, 1060L, 1457L, 1250L, 1314L, 1234L, 1146L,
1315L, 1101L, 1116L, 1310L, 1335L, 1041L, 1114L, 1124L, 954L,
1351L, 1358L, 1011L, 1409L, 1049L, 1167L, 1341L, 1278L, 1316L,
1392L, 1418L, 1307L, 1342L, 1086L, 1356L, 1432L, 1434L, 1466L,
1467L, 1479L, 1501L, 1487L, 1496L, 1495L, 1497L, 1476L, 1505L,
1506L, 1508L, 1507L, 1510L, 944L, 950L), c(1069L, 1094L,
1200L, 1306L, 981L, 1110L, 1206L, 1308L, 1047L, 1207L, 1312L,
1313L, 1109L, 1334L, 1309L, 1332L), c(1237L, 1242L, 1240L,
1243L, 1239L, 1238L, 1241L, 1343L, 1181L, 1301L, 1298L, 1300L,
1117L, 1133L, 1061L, 1419L, 1416L, 1417L, 1453L, 1311L, 1339L,
1333L, 1336L, 1028L, 1079L, 1459L, 1486L, 1192L, 1010L, 1012L,
1125L, 1199L, 1142L, 1205L, 1196L, 1198L, 951L, 1137L, 1128L,
1435L), c(930L, 942L, 922L, 940L, 941L, 943L, 920L, 921L,
923L, 925L, 927L, 928L, 924L, 926L, 931L, 932L, 937L, 938L,
939L, 935L, 936L, 929L, 933L, 934L), c(956L, 1051L, 1433L,
1468L, 1077L, 973L, 1438L, 1009L, 1158L, 1082L, 1170L, 1195L,
1177L, 1212L, 1213L, 1088L, 1153L, 1152L, 1354L, 959L, 1052L,
1176L, 1178L, 957L, 1376L, 1374L, 1375L, 1159L, 1223L, 1227L,
1268L, 1302L, 1275L, 1285L, 1016L, 1014L, 1126L, 1055L, 1102L,
1171L, 1327L, 1183L, 1274L, 1288L, 1296L, 1186L, 1297L, 1426L,
1454L, 1515L, 1078L, 989L, 990L, 980L, 1098L, 1150L, 1151L
), 78:79, c(1455L, 1475L, 1509L, 1477L, 1478L, 1494L, 1490L,
1491L, 1492L, 1427L, 1425L, 1473L, 1471L, 1472L, 1474L, 977L,
1179L, 1299L, 1290L, 1292L, 1480L, 1187L, 1295L, 1233L, 1188L,
1185L, 1293L, 1184L, 1294L, 1291L, 1175L, 1286L, 1424L, 1469L,
1502L, 1503L, 1421L, 1103L, 1488L, 1489L, 1092L, 1452L, 1350L,
1046L, 1166L, 1100L, 1305L, 1180L, 1182L, 1190L, 1289L, 979L,
961L, 1406L, 1273L, 1303L, 1456L, 1105L, 1331L, 1304L, 1407L,
994L, 1022L, 1021L, 1020L, 1025L, 1024L, 1023L, 1026L, 1216L,
1163L, 1161L, 1262L, 1156L, 1164L, 1230L, 1228L, 1224L, 80L,
953L, 962L, 974L, 992L, 1004L, 1005L, 1017L, 1031L, 1032L,
1029L, 1030L, 1057L, 982L, 1003L, 1007L, 1008L, 1042L, 1097L,
1089L, 1160L, 963L, 972L, 1070L, 1044L, 1431L, 1194L, 1204L,
993L, 1000L, 1001L, 1209L, 1210L, 1470L, 1287L, 1493L, 1075L,
1073L, 1074L, 1355L, 1090L, 1154L, 1357L, 1085L, 1087L, 1218L,
1504L, 1217L, 1174L, 1269L, 1270L, 1120L, 1272L, 1015L, 1018L,
946L, 1145L, 1397L, 971L, 1083L, 1284L, 1045L, 1048L, 1360L,
1361L, 1149L, 1282L, 1235L, 1236L, 1172L, 1367L, 1368L, 1345L,
964L, 976L, 1189L, 1281L, 1280L, 1279L, 1330L, 1328L, 1329L,
1157L, 1271L, 1324L, 1325L, 1081L, 1398L, 1391L, 1393L, 1405L,
1420L, 1104L, 1168L, 1201L, 1202L, 1338L, 1340L, 1277L, 1283L,
945L, 978L, 1422L, 1054L, 1076L, 960L, 1096L, 1091L, 1080L,
1169L, 1276L, 1050L, 1084L, 1035L, 1053L, 1095L, 1173L, 1056L,
1099L, 1138L, 997L, 1162L, 958L, 947L, 1344L), c(1222L, 1221L,
1219L, 1220L), c(1444L, 1446L, 1447L, 1445L, 1450L, 1132L,
1131L, 1130L, 1253L, 1462L, 1129L, 1254L, 965L, 966L, 967L,
1463L, 1134L, 1485L, 1483L, 1481L, 1482L, 1513L, 1465L, 1464L,
1512L, 1255L, 1258L, 1381L, 1318L, 1257L, 1323L, 1027L, 1251L,
1252L, 1214L, 1229L, 1256L, 1225L, 1226L, 1349L, 1352L, 1347L,
1348L, 1430L, 1428L, 1429L, 1436L, 1439L, 1440L, 952L, 1399L,
1389L, 1410L, 1385L, 1380L, 1401L, 1382L, 1366L, 1404L, 1403L,
1402L, 1400L, 1259L, 1415L, 1414L, 1413L, 1411L, 1412L, 1036L,
1039L, 1387L, 1386L, 1383L, 1379L, 1396L, 1394L, 1395L),
c(1322L, 1321L, 1319L, 1320L), c(998L, 1193L, 1072L, 991L,
999L, 1261L, 1326L, 1043L, 1037L, 1038L, 1353L, 1260L, 1390L,
1437L, 1346L, 1384L, 1408L, 1127L, 1423L, 1147L, 1135L, 1514L
), c(579L, 643L, 189L, 192L, 599L, 600L, 591L, 423L, 458L,
422L, 654L, 365L, 772L, 833L, 771L, 770L, 837L, 838L, 227L,
416L, 706L, 773L, 849L, 542L, 621L, 364L, 845L, 919L, 346L,
707L, 659L, 135L, 721L), c(305L, 255L, 795L, 800L, 719L,
734L, 794L, 1108L, 1136L, 1118L, 1071L, 1264L, 1203L, 1337L,
108L, 1232L, 1362L), c(674L, 796L, 864L, 235L, 724L, 408L,
731L, 723L, 722L, 548L, 168L, 797L, 132L, 205L, 649L, 180L,
582L, 330L, 157L, 465L, 499L, 536L, 516L, 883L), c(491L,
411L, 171L, 172L, 216L, 681L, 682L, 343L, 862L, 896L, 538L,
882L, 907L, 468L, 474L, 473L, 472L, 471L, 470L, 475L, 244L,
243L, 242L, 257L, 260L, 263L, 262L, 261L, 259L, 258L, 266L,
265L, 264L, 267L, 229L, 483L, 893L, 245L, 241L, 299L, 409L,
136L, 638L, 588L, 589L, 234L, 232L, 293L, 294L, 251L, 250L,
247L, 246L, 286L, 287L, 292L, 291L, 290L, 272L, 233L, 248L,
249L, 297L, 303L, 785L, 717L, 894L, 895L, 366L, 367L, 477L,
532L, 350L, 370L), c(289L, 288L, 284L, 285L), c(96L, 101L,
104L, 107L, 105L, 114L, 121L, 102L, 113L, 115L, 97L, 100L
), c(948L, 970L, 1033L, 969L, 996L, 987L, 988L, 995L, 1002L,
1034L, 1067L, 1068L, 1013L, 983L, 984L, 1460L, 1442L, 1500L,
1484L, 1246L, 1511L, 1461L, 1123L, 1443L, 1388L, 1063L, 1363L,
1064L, 1122L, 1359L, 1121L, 1231L, 1244L, 1245L, 1066L, 1065L,
1441L), c(295L, 438L, 753L, 782L, 219L, 228L, 714L, 369L,
553L, 393L, 713L, 683L, 784L, 492L, 715L, 482L, 541L, 592L,
451L, 627L, 187L, 193L, 804L, 623L, 646L, 514L, 515L, 522L,
512L, 523L, 545L, 218L, 535L, 537L, 540L), 16L, c(15L, 18L
), 1L, c(7L, 9L), 6L, 14L, 4L, 5L, 3L, 11L, 17L, 8L, 10L)
I want to sample a single value from each of the list entries on each iteration in order to create a large matrix of samples, meaning that I'll have 40 columns (the number of groups) and 5000 rows (the number of times to sample).
I tried the following:
# groups - is the list
# repetition - is 5000
as.matrix(sapply(groups, sample, repetition, TRUE))
This seems to work for a small list, but when I try it on the big list I get elements from other groups that shouldn't appear:
Example using the code above:
When you have a vector of length 1, the sampling happens from 1:x. From ?sample:
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x
So when you do
set.seed(123)
sample(10, 1)
#[1] 3
It is selecting 1 number from 1 to 10. To prevent that, you can check the length of the vector inside sapply:
sapply(groups, function(x) if(length(x) == 1) rep(x, repetition)
else sample(x, repetition, replace = TRUE))
This returns the single value repeated repetition times when the vector has length 1.
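A minimal, self-contained check of that approach on a made-up three-group list (the names safe_sample, m and in_group are illustrative only). It confirms that every column holds only values from its own group, including the length-one group:

```r
# Small stand-in for the real list; the second group has a single value.
groups <- list(c(5L, 7L, 9L), 16L, c(15L, 18L))
repetition <- 1000

# Repeat a lone value instead of letting sample() expand it to 1:x.
safe_sample <- function(x, n) {
  if (length(x) == 1) rep(x, n) else sample(x, n, replace = TRUE)
}

set.seed(1)
m <- sapply(groups, safe_sample, n = repetition)

dim(m)  # one row per repetition, one column per group

# TRUE for a column when all of its values come from the matching group.
in_group <- vapply(seq_along(groups),
                   function(j) all(m[, j] %in% groups[[j]]),
                   logical(1))
in_group
```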
We can wrap single values in a sub-list to avoid the 1:x "convenience". Example:
groups <- list(2, 9, 2:9, 22:99)
groups[lengths(groups) == 1] <- lapply(groups[lengths(groups) == 1], list)
str(groups)
# List of 4
# $ :List of 1
# ..$ : num 2
# $ :List of 1
# ..$ : num 9
# $ : int [1:8] 2 3 4 5 6 7 8 9
# $ : int [1:78] 22 23 24 25 26 27 28 29 30 31 ...
repetition <- 10
set.seed(42)
r <- t(replicate(repetition, sapply(groups, sample, 1, replace=TRUE)))
r
# [,1] [,2] [,3] [,4]
# [1,] 2 9 2 46
# [2,] 2 9 3 70
# [3,] 2 9 9 92
# [4,] 2 9 6 41
# [5,] 2 9 8 24
# [6,] 2 9 4 57
# [7,] 2 9 6 26
# [8,] 2 9 5 24
# [9,] 2 9 3 45
# [10,] 2 9 8 43
Note that the sub-lists of length one are sampled as lists, and sapply simplifies them to integers internally using simplify2array (i.e. it unlists them).
The manual for sample gives a solution, in its examples, for the case where 'x' has length 1 and is numeric:
resample <- function(x, ...) x[sample.int(length(x), ...)]
set.seed(42)
repetition <- 5
as.matrix(sapply(groups, resample, repetition, TRUE))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40]
#[1,] 778 886 525 77 106 90 153 310 1059 1110 1243 937 1051 79 1489 1220 1394 1320 1408 706 1232 180 907 288 96 1231 482 16 18 1 9 6 14 4 5 3 11 17 8 10
#[2,] 802 887 526 71 106 126 878 857 1250 1094 1205 936 989 78 1478 1222 1253 1321 1127 838 1362 723 216 284 97 1442 545 16 18 1 7 6 14 4 5 3 11 17 8 10
#[3,] 222 885 528 71 95 82 202 145 1370 1109 1196 938 1433 79 1424 1221 1214 1321 999 845 1264 516 350 288 113 983 537 16 15 1 7 6 14 4 5 3 11 17 8 10
#[4,] 237 884 528 74 98 81 280 309 1365 1313 1028 943 980 79 1277 1222 1463 1321 1437 772 1136 408 287 288 113 987 523 16 18 1 9 6 14 4 5 3 11 17 8 10
#[5,] 224 885 527 75 120 88 763 143 1114 981 1336 943 1052 79 1044 1222 1036 1320 1043 227 1071 674 473 285 104 1441 218 16 18 1 9 6 14 4 5 3 11 17 8 10
Here sample.int takes the number of items to choose from, whereas sample takes either a vector of elements to choose from or a positive integer.
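To see the difference on a single length-one vector, here is a short sketch; the variable names single, from_sample and from_resample are made up for illustration:

```r
# Index-based resampling: always draws positions, never expands x to 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

single <- 10L

set.seed(1)
from_sample   <- sample(single, 5, replace = TRUE)    # draws from 1:10
from_resample <- resample(single, 5, replace = TRUE)  # draws from the vector itself

from_sample
from_resample
```

With resample, the only possible position is 1, so the lone value is returned every time, while sample may return any integer between 1 and 10.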

Fit latent trajectory models with varying number of classes and save models in a list

I would like to fit several latent trajectory models where the only difference between models is the number of groups (2 to 4). How can I automate this process and save the models in a list?
This is the example data
library(lcmm)
library(splines)
library(tidyverse)
data <- structure(list(id = c(1001L, 1001L, 1001L, 1001L, 1001L, 1002L,
1003L, 1004L, 1004L, 1004L, 1004L, 1004L, 1004L, 1004L, 1005L,
1005L, 1005L, 1005L, 1005L, 1006L, 1006L, 1006L, 1006L, 1006L,
1007L, 1007L, 1008L, 1008L, 1008L, 1008L, 1008L, 1009L, 1009L,
1009L, 1010L, 1010L, 1010L, 1011L, 1012L, 1012L, 1012L, 1013L,
1013L, 1014L, 1015L, 1015L, 1015L, 1016L, 1016L, 1016L, 1016L,
1016L, 1017L, 1017L, 1018L, 1020L, 1020L, 1021L, 1021L, 1021L,
1021L, 1022L, 1022L, 1023L, 1023L, 1023L, 1023L, 1023L, 1023L,
1023L, 1023L, 1023L, 1023L, 1024L, 1024L, 1024L, 1024L, 1024L,
1025L, 1025L, 1025L, 1026L, 1026L, 1026L, 1026L, 1027L, 1027L,
1028L, 1028L, 1028L, 1028L, 1028L, 1028L, 1028L, 1029L, 1029L,
1029L, 1029L, 1029L, 1029L, 1030L, 1030L, 1030L, 1030L, 1030L,
1030L, 1030L, 1030L, 1031L, 1031L, 1031L, 1031L, 1032L, 1032L,
1032L, 1032L, 1032L, 1033L, 1033L, 1033L, 1033L, 1034L, 1034L,
1034L, 1034L, 1034L, 1035L, 1035L, 1036L, 1037L, 1037L, 1037L,
1037L, 1039L, 1039L, 1040L, 1040L, 1040L, 1040L, 1040L, 1040L,
1041L, 1041L, 1041L, 1041L, 1041L, 1041L, 1042L, 1042L, 1042L,
1042L, 1042L, 1042L, 1042L, 1043L, 1043L, 1043L, 1043L, 1044L,
1044L, 1044L, 1045L, 1045L, 1045L, 1045L, 1045L, 1045L, 1047L,
1048L, 1048L, 1049L, 1049L, 1049L, 1049L, 1051L, 1051L, 1052L,
1052L, 1052L, 1052L, 1052L, 1053L, 1053L, 1053L, 1053L, 1053L,
1054L, 1054L, 1054L, 1054L, 1054L, 1054L, 1054L, 1054L, 1056L,
1056L, 1056L, 1056L, 1057L, 1057L, 1058L, 1058L, 1058L, 1058L,
1058L, 1060L, 1060L, 1060L, 1061L, 1061L, 1061L, 1061L, 1061L,
1062L, 1062L, 1062L, 1062L, 1062L, 1063L, 1063L, 1063L, 1064L,
1064L, 1064L, 1064L, 1065L, 1065L, 1066L, 1066L, 1066L, 1066L,
1066L, 1066L, 1067L, 1067L, 1067L, 1068L, 1068L, 1068L, 1068L,
1068L, 1068L, 1068L, 1069L, 1070L, 1070L, 1070L, 1071L, 1071L,
1071L, 1072L, 1072L, 1072L, 1072L, 1072L, 1073L, 1073L, 1073L,
1073L, 1074L, 1074L, 1074L, 1075L, 1075L, 1075L, 1075L, 1075L,
1075L, 1076L, 1076L, 1076L, 1077L, 1077L, 1077L, 1077L, 1077L,
1077L, 1078L, 1078L, 1078L, 1078L, 1078L, 1078L, 1078L, 1080L,
1080L, 1080L, 1080L, 1081L, 1081L, 1082L, 1082L, 1082L, 1083L,
1083L, 1084L, 1085L, 1085L, 1085L, 1085L, 1085L, 1085L, 1086L,
1086L, 1086L, 1087L, 1087L, 1087L, 1087L, 1087L, 1087L, 1087L,
1087L, 1088L, 1088L, 1088L, 1088L, 1089L, 1089L, 1089L, 1089L,
1089L, 1090L, 1090L, 1091L, 1091L, 1091L, 1091L, 1091L, 1092L,
1092L, 1092L, 1092L, 1092L, 1093L, 1093L, 1093L, 1093L, 1094L,
1094L, 1094L, 1094L, 1094L, 1095L, 1095L, 1095L, 1095L, 1096L,
1097L, 1097L, 1098L, 1098L, 1098L, 1098L, 1098L, 1099L, 1099L,
1099L, 1099L, 1099L, 1099L, 1099L, 1099L, 1100L, 1100L, 1100L,
1101L, 1101L, 1101L, 1101L, 1103L, 1103L, 1103L, 1103L, 1103L,
1103L, 1103L, 1104L, 1104L, 1104L, 1104L, 1105L, 1105L, 1105L,
1106L, 1106L, 1106L, 1106L, 1106L, 1106L, 1106L, 1106L, 1106L,
1107L, 1108L, 1110L, 1111L, 1112L, 1117L, 1123L), y = c(1934.047646,
1075.598345, 1956.214821, 2000.38538, 2000.38538, 732.315937,
3119.86, 624.951231, 791.2764892, 1884.530826, 624.951231, 1047.57,
1047.57, 791.2764892, 1238.306103, 1555.042976, 2547.870529,
2547.870529, 2467.385, 1181.635212, 1181.635212, 565.306282,
2016.027874, 2016.027874, 712.6134567, 635.2537841, 2167.362267,
2575.574188, 2167.362267, 2480.028259, 2575.574188, 2875.363243,
1180.139938, 2828.037147, 3017.119362, 2722.940933, 2167.92,
2409.652458, 2245.442558, 724.1520328, 635.6034756, 1649.08326,
966.8182507, 865.2717723, 1570.23, 916.1300105, 1180.999973,
2351.32885, 2418.851707, 2290.038887, 2224.060562, 2509.52, 1174.589081,
1540.219376, 2692.26, 1300.899734, 1100.650177, 1786.628242,
1705.842979, 543.8596134, 1786.628242, 2115.374241, 2331.46,
875.949604, 2241.945103, 2319.666939, 2316.220234, 719.7139549,
2042.803307, 719.7139549, 1132.977503, 875.949604, 2316.220234,
1737.18, 1351.629826, 1291.44593, 1291.44593, 1108.26586, 1028.979719,
1291.44593, 2068.934227, 2440.784416, 1036.72, 894.6663704, 2449.184731,
1109.9, 672.9310664, 2072.320354, 2114.215416, 2114.215416, 1805.422001,
2461.18, 2101.374248, 2105.879, 1600.086481, 2866.84, 1600.086481,
2807.311, 3055.569931, 1600.086481, 2602.287521, 2690.007614,
620.5975037, 2608.4, 2722.3, 2713.66185, 2608.4, 1590.002, 2198.211,
2488.097725, 2198.211, 2322.616348, 2627.1, 2418.328346, 2601.661034,
531.7369251, 811.9494571, 884.31, 768.0526981, 652.1271248, 768.0526981,
2767.479, 1047.144354, 1047.144354, 1995.119, 1995.119, 707.6093158,
707.6093158, 1120.650104, 3036.591904, 3036.591904, 3081.86,
1193.583691, 2056.569244, 1823.155, 1238.948124, 2124.685, 887.20438,
1823.155, 2056.569244, 2056.569244, 2560.155342, 3095.923164,
3095.923164, 3003.729011, 2861.12, 2560.155342, 2735.26, 822.8209591,
1648.951, 1648.951, 1648.951, 822.8209591, 906.7692623, 582.787096,
1286.45, 797.2365359, 2566.770554, 2666.41, 2666.41, 2045.320816,
2401.21, 2401.21, 2583.2, 2581.32, 2622.357, 2581.32, 2588.462498,
442.433671, 1251.627064, 406.2565479, 2108.787437, 983.1101169,
2102.085403, 1155.713411, 1909.797131, 2871.55, 2711.07, 2883.22245,
2883.22245, 2711.07, 3027.103172, 3108.21537, 3007.87294, 3208.963631,
3108.21537, 2617.91, 2457.464466, 2890.51, 2698.48214, 2700.723,
2700.723, 2817.668579, 2700.723, 1349.90691, 1476.19994, 1552.95,
1349.90691, 925.8325004, 1258.28, 840.1875095, 2405.175911, 840.1875095,
1056.678543, 1571.936, 1210.89, 1210.89, 673.7005405, 687.7842464,
1016.86, 1217.866, 1493.791817, 2246.726913, 1054.821, 1054.821,
563.6580887, 1054.821, 1540.429863, 2209.006493, 1437.835186,
2191.308, 1412.128944, 2724.164597, 2791.705185, 2727.774208,
2070.451198, 866.7974147, 1661.082638, 2108.271309, 2411.515434,
2342.026085, 2071.06, 2258.321014, 1537.06, 760.6319065, 867.7596569,
1907.60466, 1770.658, 760.6319065, 912.8781966, 912.8781966,
912.8781966, 1257.222706, 2586.922356, 1608.28, 962.5674305,
1085.451181, 2539.218132, 2535.526085, 2561.60054, 1600.198,
2100.048149, 758.3851737, 758.3851737, 2643.373329, 367.7795143,
866.0683727, 718.5049658, 866.0683727, 1906.694649, 2291.48,
2190.560314, 744.1710777, 1498.981777, 2460.912292, 590.1345787,
2487.559135, 1855.601353, 660.9104843, 1116.08, 792.929533, 708.8373737,
2272.232933, 1801.729801, 2299.800095, 2272.232933, 2299.800095,
1895.828438, 1757.75, 1050.279345, 1757.75, 1326.09478, 1326.09478,
1633.119305, 1558, 1167.971405, 1828.16, 1788.571758, 2175.469,
1071.039494, 941.6030864, 2053.067215, 1461.02132, 1597.646778,
1885.321567, 2195.704372, 2195.704372, 1675.768558, 3157.550789,
1565.173126, 2195.704372, 3157.550789, 2404.836883, 2541.045593,
585.7223682, 2465.177761, 2678.462074, 500.3733997, 2465.177761,
781.342, 898.3551559, 2465.177761, 2465.177761, 1807.02, 1418.888027,
1797.36, 1807.02, 2200.06, 2218.369926, 2200.06, 1986.642735,
2088.292, 2069.139, 1507.901432, 2061.395798, 2075.164864, 2081.913219,
2081.913219, 483.8579493, 1857.88, 2578.772636, 1857.88, 1857.88,
1039.632153, 2288.28, 2288.28, 1831.349922, 2349.23, 933.1002788,
2626.298935, 1521.744, 933.1002788, 2626.298935, 1984.760715,
2450.333, 1732.339031, 1984.760715, 2731.9, 869.2320918, 1785.72,
1922.798, 3081.28, 1508.8, 2421.288597, 1922.798, 1268.074959,
1569.05, 1808.115, 1569.05, 1268.074959, 2165.724808, 2165.724808,
1808.115, 2084.149837, 2693.027184, 2464.489, 2607.653496, 1012.837271,
1012.837271, 2673.190872, 2635.290516, 2773.42, 2635.290516,
2654.772674, 2377.905655, 2679.014969, 2654.772674, 1226.40016,
1470.69, 1273.789799, 2294.926086, 1226.40016, 1470.69, 1273.789799,
1873.817, 2274.930534, 2317.429165, 959.1709613, 1328.159428,
1328.159428, 1328.159428, 959.1709613, 1630.28, 1610.54982, 2507.05302,
750.467966, 750.467966, 821.2255058, 802.8240452, 2829.47879),
age = c(31.54004107, 11.95071869, 27.88501027, 27.88501027,
25.07871321, 10.90759754, 25.70020534, 9.560574949, 11.17864476,
15.8384668, 9.560574949, 11.23613963, 14.01232033, 10.54620123,
12.89527721, 14.52977413, 24.96919918, 24.72005476, 23.95893224,
13.31690623, 11.52087611, 9.927446954, 22.10814511, 16.44353183,
10.90759754, 7.991786448, 17.26488706, 23.95893224, 15.66872005,
17.63723477, 24.72005476, 30.97330595, 11.52087611, 17.5633128,
30.11088296, 23.31279945, 17.26488706, 20.58590007, 28.27926078,
11.66324435, 9.927446954, 13.92744695, 11.20328542, 12.70362765,
13.52498289, 12.21355236, 13.80150582, 22.81724846, 39.3045859,
16.62696783, 22.63107461, 29.86447639, 12.54483231, 14.42299795,
34.27789185, 12.91170431, 12.25462012, 21.81245722, 21.81245722,
10.05065024, 23.6659822, 16.22450376, 28.74743326, 12.70362765,
35.43052704, 21.21013005, 19.28542094, 12.77207392, 16.59411362,
12.12867899, 11.29637235, 11.81930185, 19.04449008, 19.93429158,
16.14236824, 12.85420945, 13.21560575, 11.61396304, 11.85763176,
13.3798768, 17.42915811, 24.41341547, 13.08418891, 11.6659822,
24.41341547, 12.06297057, 10.22861054, 26.15468857, 21.71937029,
20.1889117, 12.60232717, 25.39904175, 30.72689938, 19.22245038,
14.45037645, 24.77207392, 13.47570157, 17.87816564, 27.52635181,
15.16221766, 19.68514716, 21.67282683, 9.062286105, 20.43805613,
21.67282683, 21.24024641, 20.70362765, 13.5687885, 17.13347023,
28.11498973, 24.16974675, 18.19575633, 27.73442847, 15.52361396,
20.70362765, 11.76728268, 10.98699521, 11.51540041, 9.902806297,
13.05407255, 8.703627652, 25.60164271, 10.59000684, 10.59000684,
14.45859001, 14.05886379, 10.88295688, 10.75427789, 10.59000684,
26.50513347, 18.83093771, 22.86379192, 11.8384668, 15.04449008,
15.42505133, 14.14099932, 28.06844627, 11.51540041, 14.66119097,
13.79055441, 15.37850787, 22.58179329, 22.86379192, 30.0752909,
21.85900068, 25.60164271, 15.29089665, 26.79534565, 11.68514716,
15.42505133, 15.58384668, 15.08555784, 14.11909651, 11.6659822,
10.21765914, 12.1670089, 10.50239562, 23.3045859, 15.92607803,
22.58179329, 16.65982204, 20.58590007, 39.3045859, 32.56947296,
16.90349076, 25.12799452, 17.88364134, 19.46338125, 8.736481862,
14.14099932, 8.736481862, 17.68104038, 14.54893908, 19.22245038,
12.98562628, 22.45311431, 18.83093771, 38.68856947, 26.50513347,
25.44010951, 28.70910335, 19.21697467, 30.0752909, 26.50513347,
29.45106092, 33.31690623, 16.68172485, 15.816564, 24.89801506,
15.816564, 18.7761807, 18.4366872, 19.45790554, 19.78370979,
14.98973306, 15.89869952, 29.06502396, 16.14236824, 10.74880219,
13.47843943, 10.5982204, 24.61875428, 10.74880219, 12.47364819,
16.95277207, 12.41889117, 13.44832307, 9.984941821, 9.451060917,
12.59137577, 13.38261465, 15.14852841, 21.65913758, 12.57494867,
12.40520192, 10.75701574, 15.16495551, 15.67419576, 22.52703628,
13.31143053, 16.71457906, 12.98288843, 32.16974675, 25.3798768,
30.57084189, 22.14647502, 11.43874059, 13.25119781, 18.48049281,
25.81519507, 24.78028747, 17.85626283, 27.70704997, 13.28952772,
8.703627652, 11.61396304, 35.04996578, 15.61943874, 8.703627652,
13.33333333, 10.56810404, 11.34017796, 13.5797399, 28.79671458,
12.56673511, 13.33333333, 12.55578371, 30.80082136, 23.63039014,
29.66461328, 13.25119781, 17.46748802, 8.703627652, 8.703627652,
21.21013005, 9.768651608, 13.46748802, 10.75427789, 13.24298426,
26.87474333, 27.43326489, 20.6899384, 10.0752909, 13.37713895,
28.38056126, 8.911704312, 24.62149213, 14.32443532, 10.24229979,
13.87268994, 10.54620123, 11.44421629, 21.68377823, 15.61943874,
27.97809719, 28.90075291, 28.90075291, 24.64339493, 14.32443532,
10.61190965, 15.8110883, 14.25051335, 14.25051335, 13.64818617,
26.05338809, 13.69746749, 23.98083504, 16.68172485, 20.42162902,
12.68172485, 11.51813826, 16.65982204, 14.32443532, 15.49897331,
35.04996578, 18.70225873, 17.47570157, 14.66666667, 26.83915127,
13.29226557, 18.14647502, 25.70020534, 14.67761807, 16.61601643,
9.812457221, 15.96714579, 24.41341547, 8.911704312, 17.61806982,
11.87953457, 11.80561259, 19.15400411, 17.61806982, 15.70704997,
12.35318275, 18.12457221, 16.8733744, 32.02464066, 32.02464066,
25.30047912, 16.13415469, 19.37850787, 26.50513347, 15.89869952,
13.79055441, 25.42368241, 16.05201916, 15.43874059, 9.158110883,
14.39014374, 22.12183436, 15.70704997, 15.35934292, 11.44421629,
28.45995893, 17.06502396, 14.39014374, 26.32991102, 12.38056126,
16.42436687, 13.37713895, 11.70978782, 17.62628337, 16.13415469,
17.61806982, 15.11019849, 14.09993155, 21.89185489, 13.80150582,
16.8733744, 17.73305955, 25.55509925, 14.75975359, 24.03559206,
14.36002738, 12.73100616, 16.09034908, 18.12457221, 15.11019849,
13.69472964, 23.03901437, 16.94182067, 15.70704997, 13.99315537,
21.89185489, 15.65776865, 19.25530459, 10.43394935, 12.72826831,
24.41341547, 24.25735797, 37.41820671, 37.41820671, 25.25393566,
24.78028747, 25.25393566, 37.41820671, 12.11772758, 14.19575633,
14.091718, 15.10746064, 13.16906229, 12.09856263, 13.3798768,
14.39014374, 36.3504449, 22.68035592, 11.21149897, 12.73100616,
13.34702259, 14.5982204, 11.31827515, 15.14579055, 15.44969199,
15.65776865, 12.12867899, 12.43531828, 12.72005476, 14.11909651,
24.25735797)), row.names = c(7L, 303L, 323L, 372L, 391L,
240L, 311L, 38L, 46L, 94L, 149L, 154L, 185L, 362L, 40L, 70L,
98L, 262L, 305L, 73L, 74L, 77L, 306L, 374L, 104L, 397L, 14L,
43L, 188L, 248L, 370L, 50L, 101L, 143L, 25L, 155L, 251L, 37L,
173L, 208L, 263L, 49L, 383L, 389L, 30L, 237L, 353L, 156L, 283L,
288L, 302L, 325L, 33L, 158L, 159L, 35L, 360L, 57L, 128L, 204L,
387L, 300L, 365L, 16L, 51L, 82L, 85L, 93L, 148L, 150L, 232L,
242L, 287L, 32L, 62L, 200L, 285L, 290L, 193L, 352L, 398L, 54L,
175L, 203L, 324L, 69L, 195L, 92L, 106L, 141L, 189L, 218L, 347L,
394L, 23L, 24L, 120L, 166L, 257L, 349L, 6L, 118L, 235L, 266L,
269L, 275L, 282L, 390L, 122L, 153L, 330L, 378L, 53L, 88L, 229L,
241L, 314L, 135L, 278L, 332L, 384L, 64L, 168L, 207L, 212L, 359L,
329L, 338L, 130L, 67L, 108L, 286L, 316L, 182L, 254L, 113L, 215L,
247L, 273L, 322L, 336L, 27L, 102L, 162L, 171L, 270L, 326L, 19L,
205L, 210L, 307L, 333L, 358L, 375L, 41L, 111L, 179L, 226L, 2L,
277L, 367L, 68L, 83L, 147L, 180L, 260L, 354L, 144L, 81L, 342L,
103L, 217L, 321L, 376L, 131L, 280L, 39L, 267L, 291L, 301L, 400L,
11L, 36L, 152L, 177L, 377L, 21L, 201L, 236L, 281L, 312L, 331L,
355L, 369L, 8L, 176L, 202L, 385L, 45L, 327L, 12L, 138L, 151L,
157L, 233L, 95L, 258L, 279L, 224L, 239L, 243L, 310L, 328L, 63L,
191L, 214L, 227L, 356L, 80L, 110L, 366L, 97L, 107L, 293L, 373L,
117L, 335L, 22L, 160L, 209L, 221L, 230L, 268L, 55L, 163L, 284L,
5L, 10L, 76L, 132L, 222L, 256L, 399L, 228L, 127L, 343L, 357L,
133L, 259L, 334L, 261L, 341L, 382L, 393L, 395L, 213L, 219L, 249L,
289L, 44L, 126L, 368L, 42L, 72L, 196L, 297L, 308L, 320L, 84L,
137L, 172L, 60L, 129L, 142L, 186L, 197L, 319L, 15L, 109L, 115L,
116L, 125L, 199L, 223L, 190L, 245L, 346L, 396L, 146L, 364L, 1L,
29L, 192L, 112L, 170L, 315L, 164L, 225L, 231L, 255L, 274L, 345L,
65L, 96L, 264L, 4L, 28L, 31L, 59L, 87L, 250L, 271L, 295L, 161L,
198L, 265L, 339L, 18L, 26L, 114L, 124L, 174L, 145L, 304L, 105L,
119L, 140L, 238L, 381L, 48L, 52L, 71L, 351L, 371L, 244L, 253L,
294L, 340L, 20L, 75L, 86L, 165L, 167L, 47L, 89L, 298L, 318L,
211L, 350L, 380L, 66L, 79L, 90L, 234L, 309L, 61L, 99L, 139L,
276L, 299L, 344L, 348L, 361L, 313L, 337L, 379L, 9L, 58L, 181L,
187L, 17L, 100L, 121L, 123L, 184L, 206L, 220L, 178L, 292L, 386L,
392L, 194L, 252L, 272L, 3L, 56L, 134L, 136L, 183L, 216L, 246L,
296L, 363L, 169L, 388L, 78L, 34L, 13L, 91L, 317L), class = "data.frame")
The models I am trying to automate are the following (the only parameter that varies between models is ng):
lcmm2g <- lcmm::hlme(fixed = y ~ 1 + ns(age, df = 3),
mixture = ~ 1 + ns(age, df = 3),
random = ~ 1 + age,
ng = 2, nwg = TRUE,
idiag = FALSE,
data = data, subject = "id")
lcmm3g <- lcmm::hlme(fixed = y ~ 1 + ns(age, df = 3),
mixture = ~ 1 + ns(age, df = 3),
random = ~ 1 + age,
ng = 3, nwg = TRUE,
idiag = FALSE,
data = data, subject = "id")
lcmm4g <- lcmm::hlme(fixed = y ~ 1 + ns(age, df = 3),
mixture = ~ 1 + ns(age, df = 3),
random = ~ 1 + age,
ng = 4, nwg = TRUE,
idiag = FALSE,
data = data, subject = "id")
I guess you mean the following:
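# Candidate numbers of groups
ng = 2:4
# Fit one model per value of ng; everything else is held fixed
res = lapply(ng, function(x) lcmm::hlme(fixed = y ~ 1 + ns(age, df = 3),
mixture = ~ 1 + ns(age, df = 3),
random = ~ 1 + age,
ng = x, nwg = TRUE,
idiag = FALSE,
data = data, subject = "id"))
# Name each list element after its number of groups
names(res) = ng
res$`3` # Gives out the model with 3 groups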
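
Once the fits are in a named list, choosing one by an information criterion is a short follow-up step. The sketch below uses lm() on simulated data purely as a stand-in so that it runs without lcmm; `toy`, `fit_over`, and the polynomial degrees are illustrative assumptions and not part of the code above. For hlme fits, the documentation describes a BIC component on the fitted object, so `vapply(res, function(m) m$BIC, numeric(1))` would be the likely equivalent; check ?hlme before relying on it.

```r
# Simulated stand-in data (the question would use `data` instead)
set.seed(1)
toy <- data.frame(x = seq(0, 1, length.out = 50))
toy$y <- sin(2 * pi * toy$x) + rnorm(50, sd = 0.2)

# Hypothetical helper: fit one model per parameter value and
# return the fits as a list named by that value
fit_over <- function(values, fit_fun) {
  fits <- lapply(values, fit_fun)
  names(fits) <- as.character(values)
  fits
}

degrees <- 1:4
fits <- fit_over(degrees, function(k) lm(y ~ poly(x, k), data = toy))

# One BIC per model, keeping the list names
bic <- vapply(fits, BIC, numeric(1))

# Model with the lowest BIC
best <- fits[[which.min(bic)]]
```

Keeping the names on `bic` makes it easy to see which setting won, for example with `names(which.min(bic))`.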

Reduce large data-frame of samples to ensure maximum variability between samples

I have a list of vectors, where each entry in the list is a vector of indices, for example:
list(c(563L, 688L, 630L, 160L, 568L, 908L, 457L, 798L, 3L, 558L,
56L, 389L, 506L, 106L, 807L, 556L, 809L, 63L, 343L, 242L, 470L,
894L, 804L, 970L, 406L, 881L, 893L, 952L, 126L, 827L, 282L, 910L,
61L, 66L, 763L, 787L, 337L, 41L, 712L, 144L, 450L, 12L, 200L,
574L, 945L, 236L, 336L, 684L, 280L, 721L, 233L, 686L, 64L, 504L,
174L, 934L, 40L, 850L, 26L, 799L, 853L, 978L), c(85L, 564L, 591L,
662L, 377L, 536L, 325L, 402L, 72L, 410L, 687L, 216L, 603L, 67L,
794L, 388L, 627L, 376L, 863L, 491L, 598L, 861L, 991L, 651L, 670L,
401L, 459L, 39L, 997L, 806L, 623L, 954L), c(427L, 791L, 212L,
779L, 657L, 740L, 800L, 838L, 104L, 985L, 167L, 486L, 685L, 739L,
60L, 862L, 130L, 134L, 175L, 375L, 683L, 885L, 575L, 859L, 341L,
726L, 472L, 802L, 76L, 424L, 177L, 624L, 189L, 334L, 378L, 329L,
581L, 224L, 851L, 218L, 993L, 678L, 248L, 365L, 188L, 774L, 58L,
813L, 514L, 59L, 777L, 485L, 606L, 480L, 826L, 350L, 608L, 27L,
661L, 775L, 340L, 10L, 207L, 260L, 483L, 150L, 205L), c(138L,
587L, 165L, 1L, 722L, 300L, 500L, 535L, 832L, 392L, 432L, 139L,
744L, 676L, 839L, 107L, 769L, 589L, 647L, 548L, 704L, 197L, 689L,
111L, 342L, 319L, 567L, 17L, 925L, 5L, 116L, 493L, 241L, 965L
), c(89L, 440L, 228L, 884L, 88L, 147L, 413L, 821L, 70L, 95L,
71L, 917L, 463L, 990L, 672L, 981L, 765L, 937L, 75L, 766L, 374L,
636L, 449L, 816L, 1000L, 356L, 629L), c(421L, 650L, 453L, 666L,
584L, 717L, 220L, 605L, 182L, 811L, 157L, 523L, 28L, 527L, 737L,
812L, 263L, 675L, 132L, 879L, 438L, 451L, 883L, 950L, 114L, 466L,
348L, 711L, 209L, 887L, 593L, 949L, 349L, 764L, 595L, 736L, 660L,
801L, 118L, 877L), c(23L, 231L, 78L, 988L, 55L, 57L, 753L, 994L,
437L, 202L, 842L, 190L, 822L, 968L, 331L, 733L, 782L, 886L, 105L,
943L, 743L, 815L, 311L, 498L, 792L, 795L, 184L, 728L, 573L, 771L,
117L, 251L, 192L, 735L, 15L, 776L, 295L, 677L, 631L, 235L, 237L,
705L, 856L, 97L, 725L), c(229L, 671L, 129L, 405L, 115L, 644L,
98L, 492L, 871L, 935L, 435L, 707L, 773L, 754L, 803L, 120L, 656L,
345L, 875L, 330L, 533L, 366L, 240L, 408L, 332L, 577L, 550L, 452L,
963L, 8L, 187L, 226L, 901L, 371L, 426L, 339L, 519L, 86L, 501L,
274L, 831L), c(16L, 79L, 68L, 477L, 133L, 659L, 2L, 973L, 264L,
953L, 90L, 234L, 420L, 588L, 21L, 788L, 363L, 539L, 227L, 565L,
30L, 642L, 786L, 982L, 347L, 680L, 52L, 96L, 592L, 409L, 643L,
81L, 419L, 245L, 658L, 416L, 590L, 448L, 819L, 277L, 357L, 442L,
789L, 516L, 980L, 93L, 998L, 149L, 166L, 299L, 454L, 529L, 986L,
127L, 541L, 45L, 829L, 289L, 418L, 179L, 310L, 113L, 729L), c(429L,
781L, 303L, 434L, 83L, 259L, 387L, 583L, 393L, 770L, 246L, 428L,
947L, 976L, 31L, 382L, 710L, 944L, 164L, 868L, 373L, 899L, 74L,
468L, 614L, 701L, 221L, 645L, 268L, 785L, 293L, 632L, 24L, 749L,
283L, 741L, 796L, 915L), c(258L, 844L, 649L, 752L, 474L, 613L,
351L, 551L, 309L, 380L, 497L, 724L, 327L, 992L, 845L, 607L, 818L,
693L, 914L, 291L, 720L, 633L, 974L, 367L, 639L, 94L, 467L, 92L,
522L, 141L, 496L, 276L, 542L, 665L, 695L, 634L, 602L, 913L, 396L,
597L, 443L, 892L, 65L, 394L, 222L, 778L, 169L, 960L, 35L, 655L,
422L, 927L, 154L, 215L, 262L, 203L, 880L, 217L, 423L, 755L, 904L,
180L, 620L), c(507L, 628L, 29L, 902L, 738L, 897L, 664L, 967L,
294L, 682L, 254L, 302L, 128L, 559L, 511L, 526L, 7L, 742L, 464L,
621L, 265L, 599L, 102L, 546L, 458L, 969L, 751L, 860L, 326L, 873L,
335L, 580L, 499L, 962L, 290L, 557L, 213L, 716L, 53L, 835L, 600L,
610L, 321L, 673L, 713L, 876L, 244L, 462L, 136L, 272L, 195L, 447L,
230L, 679L, 465L, 611L, 297L, 731L, 44L, 824L, 162L, 837L), c(446L,
561L, 391L, 652L, 857L, 946L, 560L, 784L, 854L, 204L, 512L, 82L,
455L, 372L, 407L, 328L, 808L, 152L, 178L, 185L, 543L, 108L, 473L,
490L, 955L, 719L, 757L, 198L, 338L, 223L, 919L, 531L, 653L, 734L,
923L, 487L, 637L, 398L, 431L, 46L, 848L, 324L, 948L, 43L, 183L,
288L, 697L, 87L, 307L, 42L, 571L, 360L, 433L, 390L, 569L, 956L,
534L, 6L, 381L, 549L, 301L, 920L, 69L, 322L, 267L, 503L, 285L,
961L, 370L, 425L), c(344L, 959L, 364L, 552L, 11L, 481L, 287L,
891L, 692L, 762L, 47L, 292L, 358L, 810L, 942L, 730L, 746L, 638L,
750L, 759L, 761L, 140L, 444L, 191L, 805L, 306L, 691L, 170L, 715L,
508L, 984L, 461L, 911L, 103L, 938L, 718L, 928L), c(124L, 284L,
123L, 513L, 417L, 933L, 121L, 168L, 208L, 385L, 32L, 273L, 869L,
932L, 397L, 509L, 239L, 797L, 379L, 723L, 898L, 163L, 320L, 833L,
151L, 906L, 648L, 732L, 279L, 834L, 489L, 840L, 783L, 971L, 49L,
145L, 253L, 352L, 137L, 261L, 247L, 143L, 544L, 109L, 921L, 830L,
972L, 585L, 690L, 609L, 703L, 250L, 708L, 225L, 889L, 181L, 987L,
54L, 502L, 148L, 355L, 888L, 579L, 983L, 825L, 855L, 62L, 918L,
979L, 586L, 681L, 384L, 709L, 333L, 758L, 194L, 368L), c(646L,
930L, 361L, 399L, 13L, 298L, 395L, 975L, 482L, 940L, 596L, 772L,
700L, 843L, 171L, 537L, 173L, 836L, 767L, 989L, 532L, 890L, 99L,
865L, 142L, 135L, 271L, 346L, 441L, 48L, 941L, 866L, 201L, 872L,
36L, 520L, 530L, 77L, 270L), c(238L, 699L, 22L, 50L, 615L, 702L,
4L, 469L, 101L, 314L, 616L, 995L, 996L, 414L, 566L, 249L, 572L,
369L, 553L, 158L, 159L, 199L, 317L, 515L, 517L, 524L, 562L, 19L,
476L, 20L, 146L, 618L, 895L, 312L, 912L), c(768L, 939L, 578L,
849L, 196L, 640L, 323L, 635L, 304L, 318L, 874L, 977L, 488L, 619L,
155L, 905L, 9L, 112L, 484L, 847L, 313L, 900L, 494L, 727L, 625L,
931L, 119L, 846L, 186L, 219L, 471L, 696L, 404L, 460L, 668L, 896L,
439L, 964L, 275L, 756L, 411L, 878L, 538L, 669L, 478L, 570L, 255L,
547L, 257L, 841L, 37L, 576L, 456L, 663L, 525L, 817L, 612L, 820L
), c(243L, 594L, 33L, 176L, 415L, 667L, 748L, 852L, 232L, 922L,
308L, 436L, 153L, 505L, 14L, 281L, 316L, 495L, 540L, 622L, 156L,
926L, 521L, 698L, 545L, 760L, 84L, 210L, 359L, 131L, 745L, 34L,
91L, 555L, 858L, 445L, 867L, 125L, 814L, 604L, 706L, 315L, 654L,
747L, 936L, 269L, 957L), c(80L, 924L, 110L, 193L, 958L, 296L,
475L, 18L, 907L, 626L, 999L, 278L, 362L, 51L, 641L, 211L, 929L,
122L, 694L, 73L, 353L, 25L, 100L, 305L, 864L, 214L, 790L, 286L,
518L, 674L, 206L, 400L, 554L, 903L, 780L, 916L, 38L, 430L, 617L,
823L, 172L, 966L, 412L, 951L, 510L, 828L, 479L, 909L, 266L, 582L,
870L, 882L, 161L, 252L, 256L, 383L, 403L, 601L, 386L, 793L, 528L,
354L, 714L))
Each entry (that is, each vector in the list) represents a group obtained using a clustering method.
I have the following piece of code, which takes this list of vectors and the number of samples required and returns a data frame in which each row is a single sample and each column holds the index drawn from one of the groups.
groups_samples <- function(groups, repetition) {
return(as.data.frame(sapply(groups, sample, repetition, TRUE)))
}
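
One caveat with passing each group straight to sample(): when a group holds a single number n, base R's sample() draws from 1:n rather than returning n itself. The variant below is a minimal sketch that avoids this by sampling positions instead of values; the name `groups_samples_safe` is made up for illustration.

```r
# Hypothetical safer variant: sample positions within each group,
# so a group of length 1 always yields its single element
groups_samples_safe <- function(groups, repetition) {
  cols <- lapply(groups, function(g) {
    g[sample.int(length(g), repetition, replace = TRUE)]
  })
  as.data.frame(do.call(cbind, cols))
}

# Small check with a singleton group in the middle
toy_groups <- list(c(1L, 5L, 6L), 10L, c(8L, 7L))
out <- groups_samples_safe(toy_groups, 50)
```

For groups with at least two elements the behaviour matches the original function: one column per group, one row per sample.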
Let's take the following as an example:
df <- groups_samples(ll, 100)
structure(list(V1 = c(106L, 686L, 721L, 200L, 970L, 910L, 556L,
807L, 908L, 568L, 688L, 389L, 56L, 470L, 630L, 893L, 574L, 236L,
804L, 798L, 721L, 934L, 763L, 807L, 457L, 568L, 684L, 934L, 787L,
450L, 688L, 64L, 568L, 934L, 894L, 558L, 568L, 343L, 450L, 853L,
336L, 64L, 712L, 144L, 934L, 144L, 809L, 763L, 457L, 763L, 558L,
457L, 688L, 763L, 504L, 66L, 406L, 881L, 3L, 343L, 556L, 799L,
712L, 568L, 61L, 799L, 908L, 688L, 64L, 881L, 236L, 787L, 66L,
160L, 853L, 343L, 809L, 200L, 827L, 893L, 894L, 799L, 470L, 406L,
337L, 389L, 63L, 952L, 236L, 337L, 763L, 41L, 945L, 144L, 56L,
978L, 233L, 978L, 881L, 910L), V2 = c(72L, 651L, 861L, 651L,
591L, 72L, 564L, 662L, 402L, 623L, 603L, 377L, 401L, 603L, 598L,
67L, 991L, 376L, 67L, 325L, 325L, 377L, 536L, 861L, 564L, 670L,
806L, 377L, 687L, 603L, 954L, 627L, 67L, 388L, 954L, 564L, 991L,
564L, 591L, 863L, 376L, 991L, 85L, 85L, 564L, 598L, 591L, 687L,
806L, 564L, 401L, 72L, 603L, 536L, 459L, 603L, 954L, 67L, 216L,
410L, 687L, 806L, 623L, 388L, 67L, 401L, 491L, 662L, 85L, 627L,
598L, 954L, 459L, 591L, 997L, 687L, 687L, 536L, 863L, 459L, 670L,
459L, 603L, 401L, 39L, 687L, 39L, 651L, 991L, 376L, 388L, 954L,
997L, 85L, 39L, 627L, 861L, 670L, 39L, 459L), V3 = c(424L, 775L,
862L, 791L, 683L, 826L, 60L, 205L, 802L, 740L, 58L, 985L, 683L,
341L, 838L, 212L, 993L, 59L, 851L, 657L, 375L, 885L, 150L, 167L,
218L, 205L, 58L, 260L, 341L, 661L, 791L, 350L, 726L, 378L, 188L,
150L, 60L, 813L, 774L, 104L, 207L, 207L, 485L, 514L, 424L, 514L,
859L, 130L, 350L, 188L, 188L, 740L, 859L, 177L, 212L, 802L, 606L,
104L, 608L, 260L, 329L, 993L, 427L, 427L, 485L, 472L, 859L, 424L,
661L, 514L, 791L, 678L, 993L, 726L, 188L, 340L, 483L, 150L, 340L,
514L, 606L, 248L, 205L, 188L, 581L, 813L, 175L, 657L, 862L, 775L,
212L, 341L, 27L, 885L, 575L, 334L, 350L, 486L, 483L, 340L), V4 = c(138L,
493L, 111L, 241L, 548L, 107L, 548L, 965L, 839L, 1L, 139L, 1L,
165L, 769L, 111L, 965L, 548L, 1L, 676L, 319L, 689L, 769L, 567L,
197L, 139L, 319L, 319L, 832L, 116L, 500L, 392L, 704L, 689L, 500L,
689L, 832L, 165L, 138L, 116L, 676L, 197L, 589L, 832L, 165L, 925L,
165L, 647L, 832L, 116L, 744L, 587L, 925L, 500L, 116L, 107L, 832L,
500L, 319L, 17L, 925L, 116L, 548L, 17L, 107L, 676L, 111L, 832L,
925L, 111L, 107L, 17L, 722L, 139L, 432L, 319L, 548L, 241L, 769L,
319L, 17L, 689L, 342L, 165L, 722L, 676L, 319L, 197L, 241L, 139L,
139L, 111L, 744L, 689L, 722L, 965L, 432L, 647L, 432L, 1L, 111L
), V5 = c(816L, 95L, 884L, 821L, 88L, 374L, 981L, 672L, 70L,
71L, 89L, 95L, 374L, 75L, 917L, 765L, 917L, 449L, 71L, 884L,
766L, 70L, 672L, 89L, 816L, 937L, 937L, 440L, 413L, 1000L, 1000L,
413L, 70L, 356L, 821L, 440L, 990L, 821L, 147L, 356L, 629L, 374L,
766L, 766L, 71L, 937L, 89L, 95L, 917L, 937L, 937L, 449L, 95L,
463L, 1000L, 440L, 821L, 884L, 917L, 816L, 89L, 1000L, 766L,
356L, 765L, 440L, 75L, 463L, 440L, 440L, 765L, 636L, 672L, 629L,
88L, 356L, 374L, 374L, 463L, 95L, 463L, 75L, 71L, 89L, 449L,
88L, 990L, 884L, 765L, 463L, 884L, 672L, 463L, 449L, 629L, 821L,
981L, 75L, 990L, 440L), V6 = c(650L, 675L, 737L, 466L, 883L,
877L, 209L, 887L, 584L, 263L, 605L, 132L, 584L, 950L, 650L, 451L,
737L, 453L, 348L, 675L, 949L, 349L, 209L, 584L, 801L, 593L, 711L,
666L, 466L, 605L, 527L, 666L, 584L, 717L, 114L, 660L, 118L, 466L,
811L, 595L, 438L, 28L, 593L, 811L, 118L, 711L, 605L, 593L, 466L,
650L, 801L, 438L, 348L, 349L, 118L, 584L, 114L, 584L, 801L, 209L,
157L, 466L, 801L, 182L, 812L, 132L, 523L, 666L, 605L, 527L, 950L,
950L, 812L, 421L, 584L, 801L, 132L, 182L, 737L, 887L, 883L, 605L,
737L, 711L, 28L, 675L, 220L, 157L, 118L, 887L, 675L, 132L, 736L,
811L, 887L, 438L, 182L, 717L, 737L, 950L), V7 = c(994L, 202L,
311L, 725L, 437L, 725L, 776L, 295L, 792L, 57L, 57L, 295L, 842L,
15L, 776L, 331L, 822L, 795L, 78L, 988L, 498L, 822L, 988L, 782L,
776L, 728L, 631L, 725L, 735L, 573L, 105L, 295L, 23L, 78L, 202L,
117L, 190L, 705L, 105L, 57L, 792L, 251L, 251L, 968L, 192L, 23L,
231L, 822L, 295L, 231L, 631L, 842L, 57L, 235L, 815L, 331L, 117L,
705L, 331L, 994L, 795L, 237L, 815L, 815L, 23L, 822L, 235L, 631L,
78L, 97L, 57L, 192L, 677L, 184L, 57L, 231L, 231L, 753L, 733L,
237L, 743L, 677L, 631L, 988L, 815L, 311L, 815L, 311L, 771L, 728L,
23L, 988L, 728L, 705L, 97L, 988L, 994L, 57L, 728L, 192L), V8 = c(754L,
875L, 332L, 935L, 86L, 339L, 86L, 644L, 339L, 501L, 803L, 229L,
644L, 426L, 550L, 129L, 330L, 129L, 229L, 86L, 773L, 803L, 129L,
901L, 452L, 8L, 229L, 98L, 129L, 366L, 187L, 8L, 773L, 187L,
229L, 8L, 98L, 935L, 98L, 345L, 754L, 533L, 332L, 550L, 240L,
875L, 773L, 229L, 426L, 754L, 120L, 803L, 129L, 901L, 901L, 644L,
345L, 707L, 707L, 773L, 533L, 120L, 332L, 330L, 803L, 86L, 803L,
8L, 226L, 345L, 871L, 240L, 550L, 963L, 330L, 345L, 226L, 533L,
366L, 452L, 803L, 405L, 803L, 405L, 550L, 577L, 8L, 339L, 901L,
577L, 330L, 229L, 330L, 656L, 452L, 330L, 519L, 226L, 366L, 435L
), V9 = c(643L, 953L, 642L, 21L, 592L, 16L, 127L, 539L, 409L,
516L, 419L, 277L, 986L, 590L, 45L, 980L, 998L, 516L, 541L, 980L,
454L, 81L, 149L, 986L, 227L, 45L, 420L, 363L, 986L, 90L, 409L,
986L, 953L, 45L, 982L, 588L, 68L, 127L, 127L, 16L, 418L, 21L,
953L, 442L, 418L, 419L, 565L, 980L, 659L, 16L, 149L, 448L, 789L,
454L, 516L, 2L, 127L, 79L, 277L, 980L, 234L, 357L, 357L, 642L,
980L, 680L, 729L, 81L, 21L, 454L, 986L, 357L, 980L, 973L, 680L,
592L, 788L, 2L, 264L, 79L, 680L, 729L, 52L, 986L, 539L, 79L,
277L, 416L, 786L, 477L, 113L, 454L, 419L, 442L, 953L, 79L, 245L,
788L, 93L, 234L), V10 = c(31L, 468L, 468L, 387L, 164L, 796L,
701L, 785L, 915L, 614L, 741L, 770L, 770L, 583L, 373L, 373L, 393L,
221L, 303L, 83L, 74L, 785L, 387L, 741L, 741L, 393L, 468L, 701L,
382L, 393L, 387L, 899L, 429L, 947L, 781L, 781L, 645L, 645L, 710L,
915L, 74L, 796L, 259L, 749L, 373L, 393L, 246L, 632L, 785L, 259L,
614L, 785L, 428L, 741L, 632L, 382L, 770L, 710L, 781L, 749L, 868L,
915L, 434L, 221L, 429L, 303L, 393L, 468L, 632L, 976L, 781L, 373L,
947L, 428L, 781L, 781L, 645L, 868L, 645L, 710L, 283L, 31L, 868L,
583L, 915L, 246L, 373L, 373L, 781L, 164L, 428L, 710L, 373L, 303L,
632L, 868L, 614L, 947L, 74L, 382L), V11 = c(351L, 154L, 423L,
496L, 818L, 913L, 665L, 913L, 380L, 720L, 542L, 380L, 634L, 551L,
258L, 818L, 634L, 474L, 222L, 639L, 974L, 755L, 262L, 665L, 522L,
217L, 927L, 351L, 755L, 914L, 380L, 65L, 844L, 633L, 613L, 222L,
649L, 892L, 752L, 423L, 755L, 169L, 904L, 309L, 639L, 276L, 217L,
394L, 291L, 522L, 203L, 720L, 35L, 422L, 724L, 423L, 720L, 914L,
180L, 327L, 92L, 422L, 258L, 467L, 724L, 620L, 665L, 367L, 639L,
443L, 892L, 724L, 141L, 422L, 327L, 396L, 92L, 309L, 844L, 258L,
914L, 634L, 497L, 222L, 141L, 880L, 467L, 443L, 496L, 913L, 394L,
217L, 35L, 396L, 35L, 880L, 351L, 755L, 474L, 215L), V12 = c(102L,
546L, 682L, 464L, 162L, 876L, 162L, 302L, 682L, 162L, 302L, 53L,
967L, 679L, 837L, 824L, 44L, 53L, 294L, 738L, 254L, 557L, 546L,
7L, 902L, 244L, 128L, 499L, 621L, 499L, 458L, 526L, 837L, 465L,
290L, 969L, 265L, 507L, 835L, 837L, 546L, 136L, 897L, 213L, 195L,
244L, 465L, 835L, 464L, 621L, 162L, 511L, 969L, 230L, 580L, 335L,
610L, 969L, 546L, 897L, 835L, 447L, 526L, 302L, 464L, 302L, 682L,
628L, 610L, 272L, 53L, 254L, 969L, 962L, 511L, 621L, 290L, 458L,
559L, 860L, 136L, 507L, 462L, 136L, 462L, 731L, 873L, 462L, 335L,
897L, 580L, 447L, 628L, 731L, 7L, 335L, 102L, 128L, 679L, 742L
), V13 = c(108L, 637L, 757L, 734L, 534L, 42L, 808L, 322L, 757L,
204L, 808L, 324L, 288L, 82L, 285L, 961L, 955L, 652L, 808L, 961L,
503L, 549L, 697L, 87L, 734L, 43L, 204L, 455L, 398L, 961L, 183L,
433L, 431L, 854L, 490L, 69L, 407L, 808L, 398L, 69L, 87L, 338L,
446L, 178L, 6L, 198L, 82L, 543L, 370L, 534L, 87L, 267L, 455L,
360L, 534L, 407L, 431L, 446L, 854L, 857L, 46L, 637L, 848L, 923L,
560L, 531L, 919L, 223L, 307L, 561L, 6L, 719L, 560L, 43L, 734L,
288L, 324L, 87L, 808L, 322L, 757L, 446L, 425L, 324L, 757L, 857L,
87L, 848L, 223L, 503L, 307L, 152L, 503L, 757L, 956L, 152L, 43L,
69L, 719L, 637L), V14 = c(746L, 805L, 191L, 47L, 508L, 508L,
715L, 461L, 928L, 750L, 140L, 746L, 364L, 552L, 287L, 984L, 481L,
715L, 762L, 959L, 750L, 344L, 959L, 959L, 306L, 911L, 103L, 638L,
759L, 761L, 750L, 444L, 692L, 692L, 761L, 481L, 552L, 942L, 810L,
938L, 306L, 762L, 344L, 942L, 344L, 364L, 552L, 891L, 11L, 103L,
762L, 287L, 891L, 358L, 730L, 959L, 750L, 191L, 718L, 959L, 358L,
306L, 287L, 692L, 746L, 461L, 750L, 170L, 358L, 911L, 805L, 938L,
481L, 759L, 750L, 140L, 715L, 959L, 928L, 692L, 461L, 750L, 306L,
762L, 691L, 306L, 287L, 481L, 170L, 746L, 810L, 762L, 358L, 292L,
750L, 191L, 47L, 942L, 344L, 191L), V15 = c(987L, 972L, 151L,
397L, 250L, 825L, 681L, 825L, 723L, 49L, 585L, 109L, 833L, 137L,
49L, 690L, 681L, 253L, 385L, 921L, 708L, 151L, 109L, 385L, 54L,
247L, 979L, 121L, 225L, 124L, 825L, 417L, 320L, 979L, 681L, 918L,
145L, 397L, 681L, 145L, 586L, 709L, 284L, 840L, 121L, 368L, 250L,
898L, 840L, 109L, 417L, 513L, 544L, 194L, 417L, 544L, 320L, 987L,
840L, 987L, 888L, 489L, 855L, 906L, 62L, 579L, 379L, 783L, 368L,
379L, 49L, 732L, 279L, 509L, 54L, 145L, 797L, 979L, 709L, 840L,
368L, 830L, 502L, 123L, 681L, 194L, 855L, 703L, 247L, 833L, 609L,
830L, 708L, 609L, 509L, 397L, 987L, 609L, 320L, 124L), V16 = c(346L,
48L, 865L, 865L, 173L, 890L, 482L, 13L, 537L, 171L, 482L, 940L,
843L, 173L, 975L, 866L, 142L, 646L, 482L, 700L, 395L, 298L, 975L,
890L, 361L, 173L, 890L, 975L, 940L, 271L, 395L, 989L, 395L, 142L,
865L, 361L, 399L, 441L, 441L, 772L, 142L, 520L, 142L, 520L, 975L,
930L, 890L, 989L, 530L, 866L, 941L, 530L, 596L, 890L, 36L, 441L,
346L, 865L, 173L, 646L, 270L, 441L, 866L, 866L, 346L, 441L, 482L,
872L, 36L, 890L, 271L, 13L, 36L, 836L, 767L, 395L, 890L, 537L,
395L, 530L, 346L, 346L, 940L, 173L, 865L, 772L, 520L, 171L, 48L,
866L, 135L, 298L, 135L, 77L, 361L, 872L, 395L, 596L, 772L, 532L
), V17 = c(912L, 146L, 312L, 22L, 618L, 317L, 618L, 199L, 369L,
101L, 515L, 4L, 476L, 699L, 517L, 317L, 159L, 517L, 553L, 616L,
995L, 314L, 317L, 314L, 562L, 101L, 249L, 369L, 615L, 562L, 476L,
702L, 312L, 312L, 515L, 101L, 159L, 572L, 101L, 618L, 895L, 317L,
616L, 618L, 572L, 562L, 4L, 517L, 312L, 312L, 249L, 699L, 312L,
158L, 469L, 20L, 524L, 476L, 572L, 249L, 50L, 19L, 249L, 912L,
469L, 476L, 101L, 146L, 616L, 618L, 476L, 20L, 146L, 249L, 50L,
101L, 158L, 517L, 238L, 515L, 895L, 553L, 702L, 146L, 312L, 517L,
158L, 895L, 517L, 101L, 314L, 238L, 22L, 146L, 317L, 895L, 469L,
912L, 369L, 572L), V18 = c(525L, 635L, 488L, 456L, 878L, 119L,
119L, 849L, 768L, 817L, 931L, 275L, 460L, 900L, 494L, 669L, 846L,
488L, 768L, 494L, 570L, 439L, 878L, 275L, 471L, 896L, 768L, 619L,
727L, 977L, 155L, 155L, 896L, 112L, 817L, 768L, 411L, 304L, 964L,
612L, 905L, 768L, 456L, 255L, 119L, 404L, 304L, 576L, 219L, 756L,
612L, 668L, 255L, 768L, 196L, 668L, 155L, 931L, 896L, 878L, 488L,
576L, 640L, 37L, 846L, 494L, 257L, 37L, 411L, 411L, 625L, 820L,
304L, 112L, 619L, 9L, 669L, 494L, 471L, 323L, 318L, 570L, 817L,
578L, 878L, 696L, 977L, 768L, 896L, 525L, 669L, 841L, 471L, 727L,
619L, 304L, 874L, 931L, 37L, 619L), V19 = c(926L, 281L, 957L,
308L, 315L, 814L, 622L, 153L, 858L, 315L, 867L, 176L, 555L, 210L,
867L, 540L, 555L, 867L, 622L, 852L, 540L, 436L, 269L, 505L, 436L,
505L, 654L, 505L, 91L, 125L, 131L, 706L, 243L, 125L, 922L, 281L,
91L, 359L, 33L, 957L, 232L, 698L, 555L, 540L, 667L, 34L, 545L,
698L, 555L, 308L, 926L, 445L, 316L, 748L, 243L, 14L, 521L, 232L,
654L, 243L, 232L, 359L, 156L, 131L, 555L, 359L, 521L, 852L, 706L,
957L, 308L, 125L, 91L, 852L, 315L, 604L, 604L, 760L, 604L, 936L,
521L, 747L, 922L, 555L, 243L, 521L, 316L, 867L, 84L, 176L, 814L,
232L, 315L, 316L, 555L, 505L, 745L, 505L, 232L, 540L), V20 = c(554L,
882L, 823L, 386L, 966L, 694L, 286L, 354L, 214L, 25L, 25L, 110L,
353L, 475L, 479L, 252L, 582L, 999L, 266L, 211L, 18L, 278L, 828L,
412L, 528L, 386L, 296L, 353L, 412L, 80L, 206L, 714L, 18L, 211L,
475L, 554L, 38L, 882L, 25L, 362L, 510L, 110L, 206L, 823L, 362L,
694L, 256L, 479L, 582L, 25L, 828L, 193L, 951L, 80L, 793L, 999L,
882L, 903L, 38L, 386L, 354L, 214L, 916L, 25L, 110L, 864L, 882L,
25L, 353L, 780L, 296L, 864L, 510L, 38L, 386L, 400L, 694L, 793L,
999L, 122L, 278L, 475L, 916L, 903L, 958L, 161L, 828L, 73L, 790L,
73L, 430L, 18L, 958L, 828L, 582L, 383L, 51L, 278L, 18L, 122L)), class = "data.frame", row.names = c(NA,
-100L))
Now what I wish to do is reduce the number of entries, say from 100 to 50, where each entry is a set of indices, one from each group. I tried calculating the distance matrix using several methods and choosing the most distant entries, but when I examined the result it was not very informative.
Is there a way to do this, perhaps by taking the list of lists into account, or with some other more sophisticated method?
I would appreciate any help or insights.
Edit - Clarifying the objective
Let's say I sampled 100 groups, where each group contains one element from each list of the nested lists.
Some of the groups are close to others: say only one element differs between two groups, or even only two elements, so I will probably want to discard one of them. In the end I wish to keep the K groups which are as "distant" as possible.
It would also be nice to take into account the number of elements in a specific nested list, as some sort of weighting procedure.
Edit No.2
For the following list, list(c(1L, 5L, 6L), c(3L, 4L, 2L, 9L), c(8L, 7L, 10L)), we get the following data frame:
structure(list(V1 = c(1L, 5L, 6L, 1L, 6L, 1L, 1L, 6L, 1L, 5L,
5L, 5L, 1L, 1L, 5L, 6L, 5L, 6L, 6L, 5L, 5L, 5L, 6L, 5L, 6L, 1L,
6L, 1L, 1L, 1L, 5L, 5L, 6L, 6L, 5L, 1L, 6L, 6L, 5L, 6L, 1L, 1L,
5L, 5L, 5L, 1L, 6L, 5L, 1L, 5L, 5L, 5L, 5L, 1L, 5L, 5L, 1L, 6L,
5L, 6L, 5L, 6L, 5L, 1L, 5L, 1L, 5L, 6L, 5L, 1L, 6L, 1L, 6L, 1L,
1L, 5L, 5L, 6L, 1L, 5L, 1L, 5L, 5L, 6L, 6L, 1L, 1L, 6L, 6L, 6L,
5L, 5L, 1L, 6L, 1L, 1L, 6L, 5L, 5L, 1L), V2 = c(9L, 3L, 9L, 4L,
2L, 4L, 3L, 3L, 3L, 2L, 2L, 9L, 3L, 3L, 2L, 2L, 9L, 9L, 9L, 3L,
4L, 3L, 2L, 3L, 4L, 2L, 2L, 3L, 4L, 9L, 9L, 2L, 3L, 2L, 9L, 9L,
3L, 2L, 4L, 4L, 3L, 4L, 3L, 2L, 2L, 9L, 9L, 2L, 4L, 4L, 4L, 9L,
2L, 3L, 9L, 3L, 3L, 2L, 2L, 2L, 4L, 2L, 4L, 3L, 3L, 3L, 2L, 9L,
9L, 9L, 2L, 9L, 3L, 3L, 9L, 4L, 3L, 3L, 4L, 3L, 4L, 4L, 4L, 4L,
2L, 9L, 9L, 4L, 9L, 2L, 2L, 9L, 4L, 4L, 9L, 9L, 2L, 4L, 4L, 3L
), V3 = c(7L, 7L, 7L, 8L, 7L, 7L, 7L, 7L, 10L, 8L, 10L, 8L, 7L,
7L, 10L, 10L, 10L, 8L, 8L, 8L, 8L, 8L, 8L, 7L, 10L, 7L, 10L,
10L, 7L, 8L, 7L, 8L, 7L, 8L, 8L, 8L, 7L, 8L, 8L, 8L, 10L, 7L,
8L, 7L, 7L, 10L, 7L, 7L, 10L, 7L, 10L, 8L, 8L, 7L, 10L, 10L,
10L, 8L, 8L, 10L, 7L, 8L, 8L, 10L, 8L, 10L, 10L, 10L, 8L, 10L,
10L, 10L, 8L, 10L, 8L, 7L, 10L, 7L, 7L, 10L, 8L, 7L, 8L, 10L,
7L, 8L, 10L, 7L, 7L, 7L, 7L, 10L, 7L, 7L, 10L, 10L, 7L, 7L, 8L,
10L)), class = "data.frame", row.names = c(NA, -100L))
Running @Allan Cameron's code produces the following, although there are better sets of 5:
V1 V2 V3
26 1 2 7
68 6 9 10
7 1 3 7
17 5 9 10
13 1 3 7
As you have described it, the concept of overall "distance" between two groups is a bit vague. It's clear that a pair like c(1, 5, 2, 6) and c(2, 9, 12, 3) are closer than the pair c(1, 5, 2, 6) and c(101, 78, 96, 54), but should there be a penalty for an exact match? Is variance important? In the absence of a clearer notion of distance, the best measure we have is the mean of each group. This is easy to obtain by rowMeans(df).
There's also some vagueness with regards to the concept of "the K furthest apart groups". Distance between groups is a function of pairs of groups, not individual groups. If K = 1, then presumably any group is fine. If K = 2, then you want the single pair of groups with the largest difference between their means. After that, it's not clear what you are looking for, but one approach would be to find the set of K groups which has the highest variance.
So if we do something like:
k <- 5
group_means <- rowMeans(df)
indices <- seq(nrow(df))
k_furthest <- c(which.min(group_means), which.max(group_means))
k_vals <- c(min(group_means), max(group_means))
group_means <- group_means[-k_furthest]
indices <- indices[-k_furthest]
while(length(k_furthest) < k)
{
  best <- which.max(rowSums(sapply(k_vals, function(x) (x - group_means)^2)))
  k_vals <- c(k_vals, group_means[best])
  k_furthest <- c(k_furthest, indices[best])
  group_means <- group_means[-best]
  indices <- indices[-best]
}
Then k_furthest will contain the set of 5 rows of the data frame with the highest possible variance between all the means. Your result would be obtained like:
df[k_furthest,]
#> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
#> 63 236 794 885 300 71 114 725 492 52 468 92 128 948 191 585 441 414 196 156 18
#> 51 798 536 739 704 1000 883 237 644 299 915 695 860 338 47 972 890 996 939 957 793
#> 61 41 388 624 689 672 466 55 229 454 164 542 265 338 170 32 271 314 640 922 582
#> 33 970 598 775 548 228 132 842 644 986 781 818 679 920 287 825 361 562 756 748 929
#> 12 336 216 774 107 71 801 725 492 642 74 613 297 948 306 124 646 19 439 281 122
Note though that this algorithm effectively just takes the rows with the highest and lowest means alternately on each iteration. Although this produces the largest overall collective "difference" between the samples, you might end up with some samples that are very close together, provided that they are also both very far apart from another sample. This may not be what you are looking for, and it is why it might be a good idea to specify exactly what you mean by "distance" in this context.
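One way to address the caveat above, that some selected samples may end up very close together, is a greedy max-min ("farthest point") selection. This is a different technique from the variance-based loop, not a refinement of it. The following is only a minimal sketch, assuming that "distance" means the number of columns in which two rows differ (Hamming distance) and that 2 <= k <= nrow(df); select_spread and toy are hypothetical names used for illustration.

```r
# Hypothetical helper: greedy max-min selection of k rows.
# Distance = number of columns in which two rows differ (Hamming distance).
# Assumes 2 <= k <= nrow(df).
select_spread <- function(df, k) {
  m <- as.matrix(df)
  n <- nrow(m)
  # n x n matrix of pairwise Hamming distances, built one column at a time
  d <- matrix(0, n, n)
  for (j in seq_len(ncol(m))) {
    d <- d + outer(m[, j], m[, j], "!=")
  }
  # start from the first pair of rows that differ in the most columns
  chosen <- unname(which(d == max(d), arr.ind = TRUE)[1, ])
  while (length(chosen) < k) {
    # for every row, the distance to its nearest already-chosen row
    nearest <- apply(d[, chosen, drop = FALSE], 1, min)
    nearest[chosen] <- -1  # never re-pick a chosen row
    chosen <- c(chosen, unname(which.max(nearest)))
  }
  chosen
}

# Toy data (illustrative values only): rows 1 and 2 are identical
toy <- data.frame(V1 = c(1, 1, 5, 6),
                  V2 = c(3, 3, 4, 9),
                  V3 = c(8, 8, 7, 10))
picked <- select_spread(toy, 3)
toy[picked, ]
```

Because each new row is chosen to be as far as possible from its nearest already-selected row, near-duplicates are picked last. Weighting columns (for example by the size of each nested list) could be done by multiplying each column's contribution to d, but that is left as an assumption rather than something the question specifies.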
EDIT
With further clarification and a new example from the OP, it seems that we are looking to maximize the sum of element-wise difference between groups. This means we can do:
distances <- as.data.frame(t(sapply(1:nrow(df), function(i) {
  a <- rowSums(apply(df, 2, function(x) abs(x[i] - x)))
  c(row = i, most_distant = which.max(a), difference = max(a))
})))
This will give us a data frame which for each row tells us the most "distant" other group.
head(distances)
#> row most_distant difference
#> 1 1 16 15
#> 2 2 46 13
#> 3 3 9 14
#> 4 4 68 12
#> 5 5 46 15
#> 6 6 68 13
If we sort this according to the biggest difference, and take the first K groups mentioned in the first two columns, we will have our result:
i <- unique(c(t(distances[order(-distances$difference)[seq(k)], 1:2])))[seq(k)]
df[i,]
#> V1 V2 V3
#> 1 1 9 7
#> 16 6 2 10
#> 5 6 2 7
#> 46 1 9 10
#> 26 1 2 7
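The element-wise difference used in the EDIT above is the Manhattan distance between rows, so the same table can probably be built from base R's dist(). The following is a sketch on a small stand-in data frame (toy and distances2 are hypothetical names, and the values are illustrative only), not a replacement for the code above.

```r
# Toy stand-in for the question's data frame (illustrative values only)
toy <- data.frame(V1 = c(1L, 6L, 6L, 1L),
                  V2 = c(9L, 2L, 2L, 9L),
                  V3 = c(7L, 10L, 7L, 10L))

# Pairwise Manhattan distances: sum over columns of |x_i - x_j|
d <- as.matrix(dist(toy, method = "manhattan"))

distances2 <- data.frame(
  row          = seq_len(nrow(toy)),
  # ties.method = "first" mirrors the tie rule of which.max()
  most_distant = max.col(d, ties.method = "first"),
  difference   = unname(apply(d, 1, max))
)
distances2
```

Having the full distance matrix d available also makes it possible to inspect every pair, rather than only each row's single most distant partner.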

How to find the value corresponding to maximum value and label it in R

I am dealing with tick data which contains price and volume (buy and sell).
I've tried the code from this post, How to label max value points in a faceted plot in R?, but still cannot solve it. I think what I want to do comes down to x and y coordinates: e.g. the maximum volume (4622) occurred at price 11360, and I would like to label the point with maximum volume 4622 with its price, 11360.
Here is my code:
ggplot(data=ts629sum) +
geom_point(mapping=aes(x=BS,y=Price)) +
geom_label(filter(BS==max(BS)) +
aes(label(sprintf(%0.2f,y)), hjust=-0.5)
It would be appreciated if someone knows how to solve this problem.
Below is the dataset.
ts629sum <- structure(list(Price = 11315:11528, BS = c(236L, 340L, 266L,
306L, 300L, 546L, 700L, 1106L, 1064L, 1312L, 1358L, 1126L, 876L,
1382L, 1382L, 2290L, 2292L, 2282L, 2454L, 2710L, 3082L, 2252L,
2214L, 2574L, 2498L, 3088L, 2644L, 2664L, 2558L, 2452L, 2508L,
2122L, 2188L, 2152L, 1730L, 2222L, 1210L, 1074L, 1736L, 1750L,
2340L, 2252L, 2004L, 2448L, 2590L, 4622L, 3428L, 3642L, 3628L,
3960L, 4020L, 2690L, 2110L, 1974L, 1018L, 1182L, 796L, 788L,
762L, 780L, 1442L, 1048L, 814L, 862L, 616L, 916L, 808L, 626L,
552L, 506L, 588L, 888L, 1222L, 1942L, 1300L, 1856L, 1284L, 968L,
932L, 1942L, 1320L, 1218L, 1514L, 1746L, 1886L, 3186L, 2540L,
2194L, 2314L, 2166L, 3072L, 2344L, 2238L, 2568L, 2132L, 2806L,
2606L, 2492L, 2610L, 2860L, 3754L, 2940L, 2754L, 3246L, 2912L,
4018L, 3402L, 3534L, 3374L, 3028L, 3760L, 3820L, 3822L, 3890L,
3296L, 4596L, 2780L, 2546L, 2958L, 2706L, 2990L, 2558L, 2518L,
2462L, 2110L, 2818L, 2276L, 2184L, 1828L, 1436L, 1878L, 1468L,
1464L, 1590L, 1580L, 2524L, 1586L, 1480L, 1702L, 1568L, 2490L,
2074L, 1872L, 1872L, 1274L, 2000L, 1252L, 1194L, 1422L, 1422L,
1630L, 1668L, 1798L, 2264L, 1806L, 2244L, 1480L, 2028L, 1616L,
2074L, 2066L, 1798L, 1514L, 1440L, 1116L, 1308L, 780L, 816L,
904L, 1162L, 1434L, 1042L, 1074L, 666L, 400L, 356L, 164L, 130L,
110L, 48L, 48L, 54L, 36L, 34L, 28L, 106L, 32L, 56L, 64L, 54L,
38L, 24L, 18L, 42L, 34L, 86L, 42L, 76L, 196L, 316L, 316L, 422L,
418L, 358L, 300L, 348L, 378L, 238L, 214L, 178L, 248L, 168L, 76L,
18L)), class = "data.frame", row.names = c(NA, -214L))
You can subset the data in geom_label and keep only the row with max BS.
library(ggplot2)
ggplot(data = ts629sum, aes(x = BS, y = Price, label = Price)) +
  geom_point() +
  geom_label(data = ts629sum[which.max(ts629sum$BS), ], vjust = 1.5)
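Note that which.max() returns only the first row attaining the maximum. If several prices could share the maximum volume, a logical subset keeps all of them. A minimal sketch on a small stand-in data frame (ticks and peaks are hypothetical names, and the values include a deliberate tie for illustration):

```r
library(ggplot2)

# Small stand-in for ts629sum (illustrative values, with a deliberate tie)
ticks <- data.frame(Price = 11358:11362,
                    BS    = c(2448L, 4622L, 3428L, 4622L, 3628L))

# Logical subsetting keeps every row that attains the maximum,
# whereas which.max() would keep only the first one
peaks <- ticks[ticks$BS == max(ticks$BS), ]

p <- ggplot(ticks, aes(x = BS, y = Price)) +
  geom_point() +
  geom_label(data = peaks,
             aes(label = sprintf("%d (BS = %d)", Price, BS)),
             hjust = 1.1)  # push the label to the left of the point
p
```

The label text here combines price and volume through sprintf(); using label = Price alone, as in the answer above, works the same way.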

R - Conditionally replace multiple rows in a dataframe

Hi programming fellows,
Please consider the following data frame:
df <- structure(list(date = structure(c(1251350100.288, 1251351900,
1251353699.712, 1251355500.288, 1251357300, 1251359099.712), class = c("POSIXct",
"POSIXt")), mix.ratio.csi = c(442.78316237477, 436.757082063885,
425.742872761246, 395.770804307671, 386.758335309866, 392.115887652156
), mix.ratio.licor = c(447.141491945547, 441.319548211994, 430.854166343173,
402.232640566763, 393.683007533694, 398.388336602215), ToKeep = c(FALSE,
FALSE, TRUE, TRUE, TRUE, TRUE)), .Names = c("date", "value1",
"value2", "ToKeep"), index = structure(integer(0), ToKeep = c(1L,
2L, 8L, 52L, 53L, 54L, 55L, 85L, 86L, 87L, 88L, 89L, 92L, 93L,
94L, 95L, 96L, 97L, 98L, 99L, 100L, 102L, 103L, 105L, 106L, 192L,
193L, 220L, 223L, 225L, 228L, 229L, 260L, 263L, 264L, 265L, 266L,
267L, 305L, 306L, 307L, 308L, 309L, 310L, 311L, 312L, 313L, 314L,
315L, 352L, 353L, 354L, 375L, 376L, 378L, 379L, 380L, 383L, 411L,
412L, 413L, 414L, 415L, 416L, 418L, 419L, 445L, 453L, 463L, 464L,
465L, 466L, 467L, 468L, 497L, 504L, 547L, 548L, 549L, 586L, 589L,
630L, 631L, 632L, 633L, 634L, 635L, 636L, 644L, 645L, 646L, 647L,
648L, 649L, 650L, 651L, 674L, 675L, 676L, 677L, 678L, 682L, 687L,
690L, 691L, 724L, 725L, 726L, 727L, 728L, 729L, 730L, 731L, 732L,
733L, 734L, 735L, 736L, 739L, 740L, 741L, 742L, 768L, 771L, 772L,
773L, 774L, 775L, 776L, 777L, 778L, 779L, 3L, 4L, 5L, 6L, 7L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L,
22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L,
35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L,
48L, 49L, 50L, 51L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L,
65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L,
78L, 79L, 80L, 81L, 82L, 83L, 84L, 90L, 91L, 101L, 104L, 107L,
108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, 117L, 118L,
119L, 120L, 121L, 122L, 123L, 124L, 125L, 126L, 127L, 128L, 129L,
130L, 131L, 132L, 133L, 134L, 135L, 136L, 137L, 138L, 139L, 140L,
141L, 142L, 143L, 144L, 145L, 146L, 147L, 148L, 149L, 150L, 151L,
152L, 153L, 154L, 155L, 156L, 157L, 158L, 159L, 160L, 161L, 162L,
163L, 164L, 165L, 166L, 167L, 168L, 169L, 170L, 171L, 172L, 173L,
174L, 175L, 176L, 177L, 178L, 179L, 180L, 181L, 182L, 183L, 184L,
185L, 186L, 187L, 188L, 189L, 190L, 191L, 194L, 195L, 196L, 197L,
198L, 199L, 200L, 201L, 202L, 203L, 204L, 205L, 206L, 207L, 208L,
209L, 210L, 211L, 212L, 213L, 214L, 215L, 216L, 217L, 218L, 219L,
221L, 222L, 224L, 226L, 227L, 230L, 231L, 232L, 233L, 234L, 235L,
236L, 237L, 238L, 239L, 240L, 241L, 242L, 243L, 244L, 245L, 246L,
247L, 248L, 249L, 250L, 251L, 252L, 253L, 254L, 255L, 256L, 257L,
258L, 259L, 261L, 262L, 268L, 269L, 270L, 271L, 272L, 273L, 274L,
275L, 276L, 277L, 278L, 279L, 280L, 281L, 282L, 283L, 284L, 285L,
286L, 287L, 288L, 289L, 290L, 291L, 292L, 293L, 294L, 295L, 296L,
297L, 298L, 299L, 300L, 301L, 302L, 303L, 304L, 316L, 317L, 318L,
319L, 320L, 321L, 322L, 323L, 324L, 325L, 326L, 327L, 328L, 329L,
330L, 331L, 332L, 333L, 334L, 335L, 336L, 337L, 338L, 339L, 340L,
341L, 342L, 343L, 344L, 345L, 346L, 347L, 348L, 349L, 350L, 351L,
355L, 356L, 357L, 358L, 359L, 360L, 361L, 362L, 363L, 364L, 365L,
366L, 367L, 368L, 369L, 370L, 371L, 372L, 373L, 374L, 377L, 381L,
382L, 384L, 385L, 386L, 387L, 388L, 389L, 390L, 391L, 392L, 393L,
394L, 395L, 396L, 397L, 398L, 399L, 400L, 401L, 402L, 403L, 404L,
405L, 406L, 407L, 408L, 409L, 410L, 417L, 420L, 421L, 422L, 423L,
424L, 425L, 426L, 427L, 428L, 429L, 430L, 431L, 432L, 433L, 434L,
435L, 436L, 437L, 438L, 439L, 440L, 441L, 442L, 443L, 444L, 446L,
447L, 448L, 449L, 450L, 451L, 452L, 454L, 455L, 456L, 457L, 458L,
459L, 460L, 461L, 462L, 469L, 470L, 471L, 472L, 473L, 474L, 475L,
476L, 477L, 478L, 479L, 480L, 481L, 482L, 483L, 484L, 485L, 486L,
487L, 488L, 489L, 490L, 491L, 492L, 493L, 494L, 495L, 496L, 498L,
499L, 500L, 501L, 502L, 503L, 505L, 506L, 507L, 508L, 509L, 510L,
511L, 512L, 513L, 514L, 515L, 516L, 517L, 518L, 519L, 520L, 521L,
522L, 523L, 524L, 525L, 526L, 527L, 528L, 529L, 530L, 531L, 532L,
533L, 534L, 535L, 536L, 537L, 538L, 539L, 540L, 541L, 542L, 543L,
544L, 545L, 546L, 550L, 551L, 552L, 553L, 554L, 555L, 556L, 557L,
558L, 559L, 560L, 561L, 562L, 563L, 564L, 565L, 566L, 567L, 568L,
569L, 570L, 571L, 572L, 573L, 574L, 575L, 576L, 577L, 578L, 579L,
580L, 581L, 582L, 583L, 584L, 585L, 587L, 588L, 590L, 591L, 592L,
593L, 594L, 595L, 596L, 597L, 598L, 599L, 600L, 601L, 602L, 603L,
604L, 605L, 606L, 607L, 608L, 609L, 610L, 611L, 612L, 613L, 614L,
615L, 616L, 617L, 618L, 619L, 620L, 621L, 622L, 623L, 624L, 625L,
626L, 627L, 628L, 629L, 637L, 638L, 639L, 640L, 641L, 642L, 643L,
652L, 653L, 654L, 655L, 656L, 657L, 658L, 659L, 660L, 661L, 662L,
663L, 664L, 665L, 666L, 667L, 668L, 669L, 670L, 671L, 672L, 673L,
679L, 680L, 681L, 683L, 684L, 685L, 686L, 688L, 689L, 692L, 693L,
694L, 695L, 696L, 697L, 698L, 699L, 700L, 701L, 702L, 703L, 704L,
705L, 706L, 707L, 708L, 709L, 710L, 711L, 712L, 713L, 714L, 715L,
716L, 717L, 718L, 719L, 720L, 721L, 722L, 723L, 737L, 738L, 743L,
744L, 745L, 746L, 747L, 748L, 749L, 750L, 751L, 752L, 753L, 754L,
755L, 756L, 757L, 758L, 759L, 760L, 761L, 762L, 763L, 764L, 765L,
766L, 767L, 769L, 770L, 780L, 781L, 782L, 783L, 784L, 785L, 786L,
787L, 788L, 789L)), row.names = c(NA, 6L), class = "data.frame")
I need to create a new data.frame with the following structure:
1) if column 'ToKeep' is TRUE, then columns 'date', 'value1' and 'value2' remain the same;
2) if column 'ToKeep' is FALSE, then columns 'value1' and 'value2' are set to NA (and 'date' remains the same).
I have been trying to use ifelse so far, but still haven't found the right indexing procedure:
df[, c(2,3)] <- lapply(df[, 4], function(x) ifelse(x == FALSE, NA, x))
Any suggestion?
Thanks in advance,
Thiago.
You can use the logical column to subset the rows, choose the columns you want, then assign the NA values with [<-
df2 <- df ## so that we don't over-write the original data set
df2[!df2$ToKeep, c("value1", "value2")] <- NA
which results in
df2
# date value1 value2 ToKeep
# 1 2009-08-26 22:15:00 NA NA FALSE
# 2 2009-08-26 22:45:00 NA NA FALSE
# 3 2009-08-26 23:14:59 425.7429 430.8542 TRUE
# 4 2009-08-26 23:45:00 395.7708 402.2326 TRUE
# 5 2009-08-27 00:15:00 386.7583 393.6830 TRUE
# 6 2009-08-27 00:44:59 392.1159 398.3883 TRUE
You could replace the lapply command with
df[,2:3] <- lapply(df[,2:3], function(x)
  ifelse(df[,'ToKeep'], x, NA))
df
# date value1 value2 ToKeep
#1 2009-08-27 01:15:00 NA NA FALSE
#2 2009-08-27 01:45:00 NA NA FALSE
#3 2009-08-27 02:14:59 425.7429 430.8542 TRUE
#4 2009-08-27 02:45:00 395.7708 402.2326 TRUE
#5 2009-08-27 03:15:00 386.7583 393.6830 TRUE
#6 2009-08-27 03:44:59 392.1159 398.3883 TRUE
Or instead of ifelse, you can use replace
df[,2:3] <- lapply(df[,2:3], function(x)
  replace(x, !df[,'ToKeep'], NA))
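As a quick sanity check, the three approaches above ([<- with a logical row index, ifelse, and replace) should give the same result. The sketch below uses a small stand-in data frame (toy and the rounded values are assumptions for illustration) and also shows why the attempt in the question does not work: it iterates over the elements of ToKeep rather than over the value columns.

```r
# Small stand-in for the question's data frame (rounded, illustrative values)
toy <- data.frame(value1 = c(442.8, 436.8, 425.7),
                  value2 = c(447.1, 441.3, 430.9),
                  ToKeep = c(FALSE, TRUE, TRUE))
cols <- c("value1", "value2")

# 1) logical row subsetting with [<-
a <- toy
a[!a$ToKeep, cols] <- NA

# 2) ifelse() applied column by column
b <- toy
b[cols] <- lapply(b[cols], function(x) ifelse(b$ToKeep, x, NA))

# 3) replace() applied column by column
c3 <- toy
c3[cols] <- lapply(c3[cols], function(x) replace(x, !c3$ToKeep, NA))

# The question's attempt loops over ToKeep itself, so it yields one
# list element per row instead of one vector per value column
n_items <- length(lapply(toy$ToKeep, function(x) ifelse(x == FALSE, NA, x)))

all.equal(a, b)
all.equal(a, c3)
```

Here n_items equals the number of rows, which is why that result cannot be assigned sensibly to the two value columns.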
