Prediction interval around a gam - r

I have a large data set that consists of thousands of measurements of length and weight. I have provided a subset of 500 observations here:
df <- structure(list(length_cm = c(24.7, 23.8, 21.9, 23.2, 23.5, 22.2,
20.5, 22.6, 24, 21.6, 22.4, 21.2, 20.6, 23.1, 21.4, 23.1, 23.5,
23, 21.8, 22.4, 23, 23.8, 24, 21, 23.4, 23.2, 21.6, 25.9, 22.1,
30.6, 22.1, 21.7, 23.2, 21.1, 23.8, 23.2, 27.2, 23.8, 21.6, 21.1,
21.7, 22.9, 23.3, 24.1, 22.7, 20.4, 22.5, 21.7, 23.2, 22.7, 20.6,
23.7, 24.6, 23.5, 26.3, 23.6, 22.2, 23.6, 21.4, 23.3, 24.7, 24.4,
21.8, 24.9, 22.2, 23.1, 25, 23.5, 22.5, 20.4, 23.9, 23.7, 24,
24.2, 22.9, 36.4, 30, 26, 28.5, 27, 35.7, 24.3, 28.6, 29.8, 18.7,
25.7, 34.7, 31.4, 23.4, 37.7, 26.7, 28.3, 30.8, 29.2, 27.2, 25.6,
39, 35.1, 41.2, 35.7, 29.9, 25.7, 24.6, 24, 24.9, 31, 29.9, 29.4,
25.4, 20.2, 27.8, 32.7, 23.4, 29.1, 26.3, 25.7, 26, 24.9, 26.3,
31.5, 30.1, 25.9, 28.8, 37.9, 38.4, 21.5, 20.5, 21.3, 21.3, 20.9,
20.8, 22.5, 22.4, 21.4, 16.8, 17.3, 22.7, 19.7, 21.2, 18.1, 23.5,
18.1, 22, 18.5, 18.4, 19.2, 19.4, 19.9, 20.5, 18.6, 22.6, 20.9,
20.7, 20.6, 20.6, 21.6, 23.7, 22.8, 22.9, 20.8, 21.3, 23.5, 21.1,
21.6, 24, 21, 23.3, 20.3, 22.4, 23.7, 24.6, 20.7, 23.1, 22.6,
22.7, 19.5, 23, 19.8, 21, 19.8, 19.8, 17.2, 21.8, 25.3, 21.3,
19.2, 22.1, 24.5, 23.2, 22.6, 19, 22, 17.5, 19.9, 24.4, 23.7,
19.9, 23, 20.5, 18.3, 23.2, 21.1, 20.4, 22.2, 19.7, 19.2, 24,
23.3, 23.3, 19, 21.5, 22, 19.1, 23.7, 19.9, 21.2, 23, 27.3, 20.7,
22, 19.3, 24.9, 18.2, 20, 19.3, 25, 18, 21.8, 23.4, 23.9, 25.2,
18.5, 22.2, 24.6, 22, 20.4, 20.7, 21.7, 19.1, 23.1, 21.5, 21.2,
20.6, 22.3, 22.8, 21.3, 21.6, 22, 23, 24.2, 21.3, 19.7, 18.8,
20.9, 20.3, 22.3, 18.9, 19.9, 20.2, 23.9, 19.7, 19.5, 17.6, 23.1,
20.4, 20, 19.7, 20.3, 21.2, 23.9, 24, 25.6, 23.9, 23.5, 20.5,
30.8, 32.8, 28.4, 28.7, 28, 28.9, 29.8, 31, 31.7, 28.6, 28.7,
28.7, 26.7, 24.6, 30, 36.5, 26.5, 32, 29.6, 30.7, 27.7, 24.1,
29.8, 28.8, 26, 22.4, 24, 24.8, 22.7, 22.7, 23.8, 25.3, 32.3,
26.8, 22.1, 24.2, 23.8, 25.3, 24.1, 22.6, 22.9, 24.4, 26.7, 24.4,
24.7, 25, 23.7, 24.3, 22.3, 22.7, 20, 22.5, 24.5, 25.1, 24, 22,
20, 21.9, 18.3, 19.9, 19.4, 23.5, 20.2, 20, 17.8, 20.5, 23.2,
18.5, 21.2, 18.2, 19.1, 22.1, 18.3, 21.6, 19.5, 22.7, 23.6, 24.6,
23.2, 24.4, 19.1, 22.8, 23, 18.8, 22.6, 19, 21.7, 20.8, 23.7,
20.8, 20, 23.2, 22, 21.4, 20.6, 22.6, 23.8, 21, 26.4, 24.5, 32.6,
36.1, 36, 31, 33.1, 31.3, 34.2, 41.9, 35.4, 33.9, 31.9, 29.3,
34.2, 29.9, 36.4, 38.5, 30.7, 40.2, 34.1, 29.7, 37.8, 37.8, 35.3,
39, 39.5, 34.1, 30.5, 33.3, 33.2, 36, 31.6, 35, 34.2, 33.1, 31.5,
33.5, 33.7, 39, 33.2, 35, 34.1, 32.6, 36.2, 34.4, 31.7, 32, 37.5,
31.5, 32.7, 31.7, 35.7, 32.4, 28.5, 33.7, 33.9, 33.6, 34, 32,
29.8, 35, 36, 31.7, 32.5, 32, 31, 29.5, 33.4, 32.5, 26.5, 28,
35.3, 26, 26.5, 38.9, 32.7, 36.4, 35.7, 27.7, 25.8, 25.3, 30.1,
36, 33.4, 37, 33.6, 31.7, 29.7, 35.9, 28.5, 33.1, 33.9, 29, 36.5,
35.5, 29.2, 37.3, 40.3, 35.7, 32.6, 38.8, 40, 38.9, 39, 33.3,
33.5, 34.3, 38.8, 34.4, 36, 35.9, 35.1, 30.7, 38.1, 31.3, 35,
36.3, 32.4, 32.3, 35.5, 36.4, 36, 40.8, 34.2, 30.1, 35.6), wt_kg = c(0.165,
0.1412, 0.1043, 0.1225, 0.1247, 0.1099, 0.087, 0.1176, 0.1431,
0.1041, 0.1213, 0.0937, 0.0856, 0.1255, 0.1099, 0.124, 0.1361,
0.1384, 0.1021, 0.1113, 0.12, 0.1513, 0.1448, 0.0978, 0.138,
0.1232, 0.0942, 0.1881, 0.1038, 0.3498, 0.1122, 0.094, 0.1268,
0.1009, 0.1358, 0.12, 0.2388, 0.1456, 0.0982, 0.0903, 0.1005,
0.1252, 0.1138, 0.1476, 0.1326, 0.0849, 0.108, 0.0996, 0.1229,
0.1279, 0.0874, 0.1492, 0.1416, 0.1187, 0.193, 0.1383, 0.1125,
0.1449, 0.0941, 0.1265, 0.1823, 0.1455, 0.0948, 0.1603, 0.1119,
0.1124, 0.1641, 0.1259, 0.116, 0.086, 0.1361, 0.1284, 0.1403,
0.1461, 0.1195, 0.5985, 0.3099, 0.1829, 0.2688, 0.2244, 0.6214,
0.1554, 0.2475, 0.2976, 0.0683, 0.1731, 0.4751, 0.356, 0.1388,
0.5939, 0.2122, 0.2784, 0.3689, 0.3127, 0.2284, 0.1775, 0.6697,
0.5998, 0.8374, 0.5647, 0.3187, 0.1704, 0.1619, 0.1413, 0.1621,
0.3577, 0.319, 0.2846, 0.1815, 0.0776, 0.2567, 0.4483, 0.1337,
0.2798, 0.202, 0.1847, 0.1758, 0.1659, 0.1828, 0.3669, 0.3211,
0.1863, 0.2559, 0.6901, 0.6483, 0.0922, 0.088, 0.099, 0.0836,
0.094, 0.099, 0.1157, 0.1138, 0.1046, 0.0495, 0.0513, 0.119,
0.0761, 0.0936, 0.0564, 0.1438, 0.0636, 0.1134, 0.0641, 0.0594,
0.0713, 0.0733, 0.0804, 0.0853, 0.0689, 0.118, 0.0892, 0.0875,
0.0837, 0.0807, 0.1065, 0.1385, 0.1163, 0.1305, 0.0923, 0.0974,
0.1176, 0.0848, 0.1059, 0.157, 0.0932, 0.1127, 0.0779, 0.1048,
0.1327, 0.1688, 0.1096, 0.1304, 0.1173, 0.115, 0.0742, 0.129,
0.0629, 0.0992, 0.0758, 0.0722, 0.0535, 0.0958, 0.1721, 0.1017,
0.0766, 0.1099, 0.152, 0.128, 0.1185, 0.065, 0.1176, 0.0565,
0.0866, 0.163, 0.12, 0.0825, 0.1149, 0.0839, 0.0587, 0.1335,
0.0968, 0.0901, 0.1073, 0.0802, 0.0744, 0.1493, 0.1384, 0.1128,
0.0738, 0.1146, 0.1108, 0.08, 0.1285, 0.0829, 0.1116, 0.1368,
0.2348, 0.0995, 0.0989, 0.0748, 0.1484, 0.0629, 0.0823, 0.075,
0.1768, 0.0607, 0.1142, 0.1289, 0.1506, 0.1742, 0.0626, 0.1187,
0.1509, 0.1144, 0.0928, 0.0946, 0.099, 0.0717, 0.1318, 0.1025,
0.093, 0.0972, 0.1325, 0.1209, 0.0943, 0.1006, 0.1073, 0.1336,
0.1439, 0.1066, 0.0765, 0.0673, 0.1082, 0.0923, 0.1139, 0.068,
0.0758, 0.0868, 0.1499, 0.0779, 0.0794, 0.0575, 0.1392, 0.0915,
0.0845, 0.086, 0.084, 0.1049, 0.1486, 0.1573, 0.177, 0.1319,
0.13, 0.0872, 0.388, 0.4751, 0.2898, 0.2931, 0.2663, 0.2838,
0.3494, 0.3675, 0.4342, 0.2907, 0.3072, 0.2815, 0.2761, 0.1945,
0.3512, 0.615, 0.2195, 0.4818, 0.3684, 0.4056, 0.2841, 0.1617,
0.3425, 0.288, 0.1962, 0.1285, 0.1553, 0.1708, 0.1332, 0.1167,
0.1491, 0.2028, 0.1267, 0.2406, 0.1257, 0.1499, 0.1559, 0.1895,
0.1508, 0.1111, 0.1274, 0.1675, 0.2324, 0.1732, 0.1491, 0.1568,
0.1465, 0.1548, 0.1245, 0.1399, 0.0855, 0.1151, 0.1612, 0.1693,
0.1493, 0.1208, 0.088, 0.1106, 0.0654, 0.0827, 0.0794, 0.1331,
0.0834, 0.0837, 0.0619, 0.092, 0.1397, 0.071, 0.1035, 0.0676,
0.0729, 0.0906, 0.064, 0.0985, 0.0823, 0.1206, 0.155, 0.1438,
0.1357, 0.1695, 0.0834, 0.1359, 0.1289, 0.0764, 0.1249, 0.0775,
0.1139, 0.104, 0.1566, 0.1069, 0.0869, 0.1376, 0.1223, 0.105,
0.0996, 0.1356, 0.1335, 0.0951, 0.2162, 0.1744, 0.4547, 0.5789,
0.5555, 0.3899, 0.5037, 0.4281, 0.486, 1.0209, 0.5855, 0.5312,
0.488, 0.3133, 0.5054, 0.3724, 0.59, 0.8119, 0.3811, 0.797, 0.5139,
0.348, 0.7722, 0.743, 0.548, 0.8791, 0.9054, 0.5392, 0.4333,
0.5314, 0.4976, 0.5953, 0.4288, 0.5179, 0.5634, 0.5331, 0.4371,
0.5709, 0.5065, 0.8047, 0.5368, 0.5657, 0.5816, 0.4763, 0.5907,
0.533, 0.4384, 0.4949, 0.7277, 0.4445, 0.4894, 0.4655, 0.5384,
0.5106, 0.3343, 0.5186, 0.5262, 0.5311, 0.495, 0.4691, 0.3465,
0.5558, 0.5975, 0.4768, 0.4802, 0.4573, 0.4037, 0.3316, 0.5152,
0.4673, 0.2356, 0.2905, 0.5672, 0.2097, 0.2216, 0.7384, 0.4089,
0.6159, 0.5219, 0.2866, 0.2443, 0.2071, 0.3658, 0.5861, 0.5021,
0.6953, 0.5053, 0.3978, 0.3853, 0.6207, 0.2944, 0.507, 0.4412,
0.3424, 0.6597, 0.5892, 0.3295, 0.6505, 0.9334, 0.6674, 0.4919,
0.8392, 0.9123, 0.813, 0.8223, 0.5801, 0.5745, 0.5148, 0.8514,
0.5563, 0.6417, 0.6445, 0.5701, 0.4186, 0.8303, 0.46, 0.6041,
0.6537, 0.5221, 0.4782, 0.5657, 0.6499, 0.6667, 0.9074, 0.555,
0.6696, 0.6083)), .Names = c("length_cm", "wt_kg"), row.names = c(NA,
500L), class = "data.frame")
The relationship between length and weight is not linear. I could not include the whole data set here, but when it is used a gam provides the best fit, unlike in this subset, where loess is suggested.
I would like to focus on gam, since I am after an answer that works for the whole data set.
Even in the subset provided, my data clearly has some outliers: the example data set (df) contains at least two obvious ones.
library(ggplot2)
ggplot(df, aes(x=wt_kg, y=length_cm))+
geom_point()+
stat_smooth(method = "gam", formula = y ~ s(x), size = 1)
Moving forward with a gam approach, I would like to generate the prediction interval so that I can identify which points fall inside and outside, say, the 95% prediction interval.
This is extremely simple to do with a linear regression using predict:
l_model <- lm(wt_kg ~ length_cm, data=df)
df <- cbind(df, predict(l_model, interval = "prediction"))
Then simply plotting the upper and lower bounds of the interval
ggplot(df, aes(y=wt_kg, x=length_cm)) +
geom_ribbon(aes(ymin = lwr, ymax = upr),
fill = "blue", alpha = 0.2) +
geom_point()
But I can't find a similar approach that works when using gam instead of lm. I have tried predict.gam from the mgcv package with no success.
library(mgcv)
df_model <- gam(wt_kg ~ length_cm, data=df)
gam_pred <- cbind(df, mgcv::predict.gam(df_model))
I don't get any errors when running this; however, what I get back is a single column of data, which I am unsure how to interpret. Any help would be much appreciated.

I think that part of your code is:
require(broom)
require(gam)
mod <- gam(wt_kg ~ length_cm, data=df)
pred <- augment(mod)
But I don't understand the second ggplot2 call. pred holds the fitted values and other quantities from your regression, mainly .resid.
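One common approximation for a Gaussian gam fitted with mgcv is to combine the standard error of the fitted smooth (from predict with se.fit = TRUE) with the estimated residual variance. The sketch below is only that, a sketch: sim is simulated stand-in data rather than the df from the question, and it assumes constant, normally distributed errors, which the length/weight data may not satisfy because the spread appears to grow with length.

```r
library(mgcv)

set.seed(1)
# Hypothetical stand-in for the length/weight data (not the df from the question)
sim <- data.frame(length_cm = runif(300, 17, 42))
sim$wt_kg <- 1e-5 * sim$length_cm^3 + rnorm(300, sd = 0.03)

# Fit a smooth of length; note the s() term, without it gam() fits a straight line
fit <- gam(wt_kg ~ s(length_cm), data = sim, method = "REML")

# Fitted values plus their standard errors
p <- predict(fit, newdata = sim, se.fit = TRUE)

# Approximate 95% prediction interval:
# uncertainty in the fitted curve + residual variance (fit$sig2)
crit    <- qnorm(0.975)
pred_sd <- sqrt(p$se.fit^2 + fit$sig2)

sim$fit <- as.numeric(p$fit)
sim$lwr <- sim$fit - crit * pred_sd
sim$upr <- sim$fit + crit * pred_sd

# Flag observations that fall outside the approximate band
sim$outside <- sim$wt_kg < sim$lwr | sim$wt_kg > sim$upr
mean(sim$outside)
```

The lwr and upr columns can be passed to geom_ribbon() exactly as in the lm example, and rows with outside == TRUE are the candidates for closer inspection. If the variance clearly changes with length, a log transformation or a non-Gaussian family may be more appropriate than this constant-variance band.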

Related

Interact_plot keeps coming back with Error: data must be compatible with existing data

I have been trying to solve this for days, so any help would be appreciated!
I am trying to make an interaction plot for an OLS Regression.
This is the code I am using:
interact <- lm(ele$vt_c ~ ele$Immigrants:ele$X.qual, data = as.data.frame(ele))
interact_plot(model = interact, pred=Immigrants, modx =X.qual, modx.values = NULL, data = ele)
This is the error that is coming up
Error in ecdf(d[[modx]]) : 'x' must have 1 or more non-missing values
In addition: Warning message:
immigrants and X.qual are not included in an interaction with one another in the model.
Reproducible data
if (!"interactions" %in% installed.packages()) install.packages("interactions")
library(interactions)
ele = structure(list(vt_c = c(68.37056, 67.55938, 69.25354, 67.54727,
67.39343, 67.81161, 65.81312, 64.68675, 70.8572, 72.1439, 67.39006,
64.89897, 62.81833, 63.82975, 58.99062, 67.69617, 68.17096, 65.24267,
67.08106, 66.47592, 68.40781, 70.40636, 69.50657, 72.37613, 70.24236,
67.50159, 71.77177, 67.09047, 74.58491, 70.64892, 65.20199, 70.03566,
70.23142, 71.62487, 66.87982, 70.72528, 66.97507, 69.38713, 67.20061,
68.79907, 67.05735, 67.38101, 66.10595, 60.97635, 61.9047, 61.28828,
72.11577, 63.04311, 71.04747, 77.16823, 63.77144, 72.5249, 69.10145,
74.61647, 55.0847, 70.97664, 73.40273, 72.02715, 69.28485, 68.66256,
77.92079, 69.78192, 71.32363, 79.13777, 76.21347, 72.96919, 71.95923,
70.94545, 64.8141, 55.98621, 74.19439, 72.70276, 68.77999, 63.09397,
61.72898), Immigrants = c(57.3, 55.1, 50.6, 45.7, 42.8, 51.7,
51.2, 50.9, 44.9, 44.5, 44.3, 42.7, 50.5, 50.5, 39.2, 50.6, 39.7,
38.9, 39.2, 41.8, 42.5, 43.1, 39.5, 41.1, 44.2, 38.6, 41.8, 40.1,
43.8, 41.9, 38.2, 38.9, 37.5, 40.8, 33.2, 41.6, 38.1, 30, 38.8,
34.4, 36.5, 32.1, 41.3, 30.6, 32.9, 27.8, 35.4, 28.7, 37.1, 33.3,
29.8, 29.8, 33.8, 32.8, 28.8, 32.6, 31.6, 30.7, 28.6, 30.9, 34.7,
24.6, 24.7, 28.4, 26, 26.2, 27.4, 26.1, 22.6, 24.7, 32.4, 22.9,
26.4, 22.2, 22.1), X.qual = c(32.9, 29.8, 30.8, 32.5, 18.3, 47.3,
30.5, 29.8, 32.7, 38.5, 42.5, 25.8, 54.5, 52.2, 24.9, 29.3, 30.5,
23, 37.6, 22.3, 35.2, 54, 39.6, 42.8, 30.4, 41.5, 47.5, 44.5,
48.4, 31.3, 25.9, 28.2, 41.6, 46.5, 24.8, 36.3, 45.2, 27, 48.7,
40, 42.1, 19.7, 53.7, 26, 21.8, 12.1, 51.6, 19.2, 46.6, 54.4,
24.9, 30.1, 47.4, 51.4, 29.7, 57.4, 48.8, 47.6, 34.3, 22.8, 52,
21.8, 29.6, 55.2, 38.6, 37.4, 39.3, 25.9, 15.7, 19.8, 38.2, 39.3,
37.7, 18.3, 32.6)), class = "data.frame", row.names = c(NA, -75L
))
interact <- lm(vt_c ~ Immigrants:X.qual,
data = ele)
interact_plot(model = interact, pred=immigrants,
modx =X.qual, data = ele)
Thank you!
Welcome to SO, Lucia Thomas!
I read this message and it sounded so much more thorough than what I usually write about reproducible questions:
Please make this question reproducible. This includes sample code you've attempted (including listing non-base R packages, and any errors/warnings received), sample unambiguous data (e.g., data.frame(x=...,y=...) or the output from dput(head(x))), and intended output given that input. Refs: stackoverflow.com/q/5963269, minimal reproducible example, and stackoverflow.com/tags/r/info.
That being said, I think I can help. Right now you reference each variable as a vector (ele$...) and also pass a data frame in your call to lm(). This mix leads to an incompatibility between lm() and interact_plot().
ele = structure(list(vt_c = c(68.37056, 67.55938, 69.25354, 67.54727,
67.39343, 67.81161, 65.81312, 64.68675, 70.8572, 72.1439, 67.39006,
64.89897, 62.81833, 63.82975, 58.99062, 67.69617, 68.17096, 65.24267,
67.08106, 66.47592, 68.40781, 70.40636, 69.50657, 72.37613, 70.24236,
67.50159, 71.77177, 67.09047, 74.58491, 70.64892, 65.20199, 70.03566,
70.23142, 71.62487, 66.87982, 70.72528, 66.97507, 69.38713, 67.20061,
68.79907, 67.05735, 67.38101, 66.10595, 60.97635, 61.9047, 61.28828,
72.11577, 63.04311, 71.04747, 77.16823, 63.77144, 72.5249, 69.10145,
74.61647, 55.0847, 70.97664, 73.40273, 72.02715, 69.28485, 68.66256,
77.92079, 69.78192, 71.32363, 79.13777, 76.21347, 72.96919, 71.95923,
70.94545, 64.8141, 55.98621, 74.19439, 72.70276, 68.77999, 63.09397,
61.72898), immigrants = c(57.3, 55.1, 50.6, 45.7, 42.8, 51.7,
51.2, 50.9, 44.9, 44.5, 44.3, 42.7, 50.5, 50.5, 39.2, 50.6, 39.7,
38.9, 39.2, 41.8, 42.5, 43.1, 39.5, 41.1, 44.2, 38.6, 41.8, 40.1,
43.8, 41.9, 38.2, 38.9, 37.5, 40.8, 33.2, 41.6, 38.1, 30, 38.8,
34.4, 36.5, 32.1, 41.3, 30.6, 32.9, 27.8, 35.4, 28.7, 37.1, 33.3,
29.8, 29.8, 33.8, 32.8, 28.8, 32.6, 31.6, 30.7, 28.6, 30.9, 34.7,
24.6, 24.7, 28.4, 26, 26.2, 27.4, 26.1, 22.6, 24.7, 32.4, 22.9,
26.4, 22.2, 22.1), X.qual = c(32.9, 29.8, 30.8, 32.5, 18.3, 47.3,
30.5, 29.8, 32.7, 38.5, 42.5, 25.8, 54.5, 52.2, 24.9, 29.3, 30.5,
23, 37.6, 22.3, 35.2, 54, 39.6, 42.8, 30.4, 41.5, 47.5, 44.5,
48.4, 31.3, 25.9, 28.2, 41.6, 46.5, 24.8, 36.3, 45.2, 27, 48.7,
40, 42.1, 19.7, 53.7, 26, 21.8, 12.1, 51.6, 19.2, 46.6, 54.4,
24.9, 30.1, 47.4, 51.4, 29.7, 57.4, 48.8, 47.6, 34.3, 22.8, 52,
21.8, 29.6, 55.2, 38.6, 37.4, 39.3, 25.9, 15.7, 19.8, 38.2, 39.3,
37.7, 18.3, 32.6)), class = "data.frame", row.names = c(NA, -75L
))
Since you pass the data frame through data =, refer to the columns by name, without the data frame prefix:
interact <- lm(vt_c ~ immigrants:X.qual,
data = ele)
interact_plot(model = interact, pred=immigrants,
modx =X.qual, data = ele)
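To see why the two styles behave differently, it may help to look at what lm() stores in each case. The following sketch uses a small made-up data frame d (hypothetical, not the ele data) and base R only; it compares the variable names recorded in the model frame.

```r
# Small made-up data frame, for illustration only
d <- data.frame(y = c(1, 3, 2, 5, 4),
                a = c(1, 2, 3, 4, 5),
                b = c(2, 1, 4, 3, 5))

# Style 1: columns referenced as vectors with $
m_dollar <- lm(d$y ~ d$a:d$b)

# Style 2: bare column names resolved through data =
m_names <- lm(y ~ a:b, data = d)

# The model frame keeps the variables under the names used in the formula
names(model.frame(m_dollar))
names(model.frame(m_names))
```

In the first model the stored variable names carry the d$ prefix, so a downstream function that looks up a or b by name in the model frame will not find them. The second model stores plain column names that match the data frame.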

How can I construct Histogram, Bar plot, frequency curve and Ogive curve in R for this problem?

Dr. Tillman is Dean of the School of Business at Socastee University. He wishes to prepare a report showing the number of hours per week students spend studying. He selects a random sample of 30 students and determines the number of hours each student studied last week.
15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6
I tried histogram using R code as follows:
v <- c(15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6)
hist(v)
Also I tried bar plot like this:
v <- c(15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6)
barplot(v)
But I couldn't code for the "Frequency curve" and "Ogive Curve".
How can I code for these in R?
Thank you!
You can get the frequency curve from the histogram counts, and the ogive by sorting the values and counting how many observations fall at or below each one:
out <- hist(v, breaks=8)
plot(out$mids, out$counts, xlab="Hours", ylab="Frequency", type="l")
v.srt <- sort(v)
# Cumulative frequency: number of observations <= each sorted value
plot(v.srt, seq_along(v.srt), xlab="Hours", ylab="Cumulative Frequency", type="l")
# Cumulative proportion
plot(v.srt, seq_along(v.srt)/length(v.srt), xlab="Hours", ylab="Cumulative Proportion", type="l")
abline(h=1, lty=2)
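A textbook ogive is often drawn from grouped data instead: cumulative class counts plotted against the class boundaries. A minimal sketch of that variant, assuming the default classes that hist() chooses are acceptable (h and cum_freq are names introduced here for illustration):

```r
v <- c(15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7,
       17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9,
       10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6)

# Compute the classes without drawing the histogram
h <- hist(v, plot = FALSE)

# Cumulative count at each class boundary, starting from 0 at the lowest break
cum_freq <- c(0, cumsum(h$counts))

# Ogive: cumulative frequency against class boundaries
plot(h$breaks, cum_freq, type = "o",
     xlab = "Hours", ylab = "Cumulative frequency")
```

The curve starts at zero at the lowest boundary and ends at the sample size at the highest one. Dividing cum_freq by length(v) gives the relative version.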

Creating variables by combining a vector of names and a vector of values [duplicate]

I'd like to create a vector/list containing a series of variables that result from combining two vectors containing (i) specific variable names and (ii) specific variable IDs (the same for all the variables).
Here is a short version of the two vectors:
the variable names:
names<-c("XPTS", "TROCK", "JFSG")
and the variable IDs:
values<-c(1, 1.1, 1.2, 1.3, 2, 2.1, 2.2, 2.3, 3, 3.1, 3.2, 3.3, 4, 4.1, 4.2, 4.3, 5, 5.1, 5.2, 5.3, 6, 6.1, 6.2, 6.3, 7, 7.1, 7.2, 7.3, 8, 8.1, 8.2, 8.3, 9, 9.1, 9.2, 9.3, 10, 10.1, 10.2, 10.3, 11, 11.1, 11.2, 11.3, 12, 12.1, 12.2, 12.3, 13, 13.1, 13.2, 13.3, 14, 14.1, 14.2, 14.3, 15, 15.1, 15.2, 15.3, 16, 16.1, 16.2, 16.3, 17, 17.1, 17.2, 17.3, 18, 18.1, 18.2, 18.3, 19, 19.1, 19.2, 19.3, 20, 20.1, 20.2, 20.3, 21, 21.1, 21.2, 21.3, 22, 22.1, 22.2, 22.3, 23, 23.1, 23.2, 23.3, 24, 24.1, 24.2, 24.3, 25, 25.1, 25.2, 25.3, 26, 26.1, 26.2, 26.3, 27, 27.1, 27.2, 27.3, 28, 28.1, 28.2, 28.3, 29, 29.1, 29.2, 29.3, 30, 30.1, 30.2, 30.3, 31, 31.1, 31.2, 31.3, 32, 32.1, 32.2, 32.3, 33, 33.1, 33.2, 33.3, 34, 34.1, 34.2, 34.3, 35, 35.1, 35.2, 35.3, 36, 36.1, 36.2, 36.3, 37, 37.1, 37.2, 37.3, 38, 38.1, 38.2, 38.3, 39, 39.1, 39.2, 39.3, 40, 40.1, 40.2, 40.3, 41, 41.1, 41.2, 41.3, 42, 42.1, 42.2, 42.3, 43, 43.1, 43.2, 43.3, 44, 44.1, 44.2, 44.3, 45, 45.1, 45.2, 45.3, 46, 46.1, 46.2, 46.3, 47, 47.1, 47.2, 47.3, 48, 48.1, 48.2, 48.3, 49, 49.1, 49.2, 49.3, 50)
I'd like to obtain a list of variable names as follows:
"XPTS_1","XPTS_1.1","XPTS_1.2", ..., "XPTS_49.3","XPTS_50","TROCK_1","TROCK_1.1",...,"TROCK_49.3","TROCK_50","JFSG_1","JFSG_1.1",...,"JFSG_49.3","JFSG_50"
The variable names are not only those reported but might change, so I'd like to have a dynamic loop for dealing with it. The one I wrote, as follows, doesn't fit my purpose:
variables_ID<-for (i in 1:length(values)) {
paste(names, values[i], sep = "_")
}
since I get only
"XPTS_50" "TROCK_50" "JFSG_50"
We can use outer
out1 <- c(t(outer(names, values, paste, sep="_")))
NOTE: transposed just to show that we get identical results with rep
Or use rep to replicate the 'names' and then paste
out2 <- paste(rep(names, each = length(values)), values, sep="_")
all.equal(out1, out2)
#[1] TRUE
head(out1)
#[1] "XPTS_1" "XPTS_1.1" "XPTS_1.2" "XPTS_1.3" "XPTS_2" "XPTS_2.1"
tail(out1)
#[1] "JFSG_48.3" "JFSG_49" "JFSG_49.1" "JFSG_49.2" "JFSG_49.3" "JFSG_50"
Or using CJ
library(data.table)
CJ(names, values)[, paste(names, values, sep="_")]
Or with tidyverse
library(tidyverse)
crossing(names, values) %>%
unite(names, names, values) %>%
pull(names)
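As a side note on why the loop in the question does not collect anything: a for loop in R does not return its body's values, so assigning the loop to a variable yields NULL. A small sketch with shortened, hypothetical vectors (nms and ids stand in for names and values), followed by an lapply-based alternative that does collect the results:

```r
nms <- c("XPTS", "TROCK")   # shortened stand-in for 'names'
ids <- c(1, 1.1, 2)         # shortened stand-in for 'values'

# A for loop evaluates its body but returns NULL, so nothing is kept
res <- for (i in seq_along(ids)) {
  paste(nms, ids[i], sep = "_")
}
is.null(res)

# lapply() returns one character vector per name; unlist() flattens them
out <- unlist(lapply(nms, function(n) paste(n, ids, sep = "_")))
out
```

Here out holds every name combined with every ID, grouped by name, which is the same ordering as the rep-based solution above.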

Plotting Visual Tables

I am trying to plot a table in R and to format it so that it is visually attractive for presentations.
I am trying to make it look like:
2000-01-01 2000-03-01 2000-06-01 ...
Revenue 3.5 4.6 7.9
Cost 2.3 2.7 5.6
I also want the boxes that enclose the words and numbers to be the right size, so that the column header dates and row labels are not squished. How do I do that?
plot.table(t(z))
z <- structure(c(68.2, 66.1, 64.7, 31.8, 30.9, 25.4, 36.1, 38.3, 38.3,
42.2, 43.3, 40.2, 41.9, 47.7, 50.8, 46.7, 48.2, 55.2, 58.2, 55.3,
58.2, 62.5, 62.2, 59.5, 59.3, 59.4, 58.7, 68.2, 64.9, 94.7, 75.7,
72, 73.5, 77.9, 83.8, 82.6, 83.8, 88.8, 91.5, 91.8, 92.6, 103.4,
100.5, 110.8, 105.4, 113.5, 110, 110.2, 118.9, 125.5, 122.5,
121.4, 122.6, 122.6, 127.4, 133.8, 131.5, 137.6, 142.7, 133,
39.8, 46.3, 38.2, 16.6, 14.5, 17.4, 17.7, 19.1, 19, 21.2, 20.9,
21.2, 19.9, 23.5, 25.2, 25.3, 23.3, 27.9, 29.3, 28.1, 29.6, 32.4,
31.3, 31.1, 31.3, 31.3, 31.5, 36, 36.9, 40.1, 39, 37.4, 38.1,
41.1, 43.1, 42.3, 42.4, 45.3, 46.4, 47.3, 48.2, 54.1, 51.6, 57.8,
54.3, 59.7, 56.1, 56.1, 60.9, 65.8, 62.8, 62.8, 62.1, 63.8, 65.5,
68.2, 66.7, 72.1, 75.1, 71.6), .Dim = c(60L, 2L), .Dimnames = list(
c("11323", "11413", "11504", "11596", "11688", "11778", "11869",
"11961", "12053", "12143", "12234", "12326", "12418", "12509",
"12600", "12692", "12784", "12874", "12965", "13057", "13149",
"13239", "13330", "13422", "13514", "13604", "13695", "13787",
"13879", "13970", "14061", "14153", "14245", "14335", "14426",
"14518", "14610", "14700", "14791", "14883", "14975", "15065",
"15156", "15248", "15340", "15431", "15522", "15614", "15706",
"15796", "15887", "15979", "16071", "16161", "16252", "16344",
"16436", "16526", "16617", "16709"), c("revenue", "cost")))

how to scale the graphical display in R when using text function to print a plot of stem and leaf plot using capture output

I am trying to output a stem and leaf plot to a graphical device. It outputs fine to the device, but only a part of the plot shows in the graphics device. How can I scale the plot to fit into the graphical device (window)?
library(aplpack)
plot.new()
flint <- c(44.6, 25.7, 33.2, 48.3, 39.4, 43.5, 39.8, 40.5, 91.7, 29.3,
39.1, 42.5, 49.6, 40.6, 49.1, 41.7, 30.2, 40.0, 31.9, 42.3,
47.2, 50.5, 44.1, 45.8)
chert <- c(25.8, 6.3, 21.3, 20.6, 22.2, 10.5, 18.9, 25.9, 23.8, 22.0,
10.6, 16.8, 21.8, 15.8, 16.3, 21.7, 17.9, 13.7, 19.1, 15.2,
21.2, 20.2, 10.6, 23.1)
dev.list()
dev.set(2)
tmp <- capture.output(stem.leaf.backback(flint,chert,unit=.1,rule.line="Dixon"))
text (0,1, paste(tmp, collapse='\n'), adj=c(0,1), family='mono')
Output to file instead of to screen tends to be more reproducible when it comes to controlling the size of the "paper" you draw on. In case you are happy with, for example, a PDF file with the stem plot, something like this works fine:
library(aplpack)
flint <- c(44.6, 25.7, 33.2, 48.3, 39.4, 43.5, 39.8, 40.5, 91.7, 29.3, 39.1, 42.5, 49.6,
40.6, 49.1, 41.7, 30.2, 40.0, 31.9, 42.3, 47.2, 50.5, 44.1, 45.8)
chert <- c(25.8, 6.3, 21.3, 20.6, 22.2, 10.5, 18.9, 25.9, 23.8, 22.0, 10.6, 16.8, 21.8,
15.8, 16.3, 21.7, 17.9, 13.7, 19.1, 15.2, 21.2, 20.2, 10.6, 23.1)
You can specify the PDF page width and height (in inches). Say 14" by 7".
pdf(file = "stemplot.pdf", width = 14, height = 7)
plot.new()
tmp <- capture.output(stem.leaf.backback(flint, chert, unit = .1, rule.line = "Dixon"))
text(0, 1, paste(tmp, collapse='\n'), adj = c(0,1), family = 'mono')
dev.off()
Of course we changed the page size instead of scaling the plot, so it is not exactly what you originally asked...
In case output to file is acceptable, here are
10 tips for making your R graphics look their best.
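If the text itself has to shrink to fit the current device, one possible approach is to measure how much room the captured output needs and derive a cex value from that. The following is a rough sketch under a few assumptions: it uses base R's stem() as a stand-in for stem.leaf.backback(), the 1.5 line-spacing factor is a guess rather than an exact figure, and cex_fit, h_needed and w_needed are names introduced here.

```r
flint <- c(44.6, 25.7, 33.2, 48.3, 39.4, 43.5, 39.8, 40.5, 91.7, 29.3,
           39.1, 42.5, 49.6, 40.6, 49.1, 41.7, 30.2, 40.0, 31.9, 42.3,
           47.2, 50.5, 44.1, 45.8)

# Capture the printed plot as lines of text (stem() stands in for stem.leaf.backback())
tmp <- capture.output(stem(flint))

# Use the whole device and a monospaced font for both measuring and drawing
op <- par(mar = c(0, 0, 0, 0), family = "mono")
plot.new()

# Space the text would need at cex = 1, in inches
h_needed <- length(tmp) * strheight("M", units = "inches") * 1.5  # rough line spacing
w_needed <- max(strwidth(tmp, units = "inches"))

# Shrink only when the text would overflow the plot region
cex_fit <- min(1, par("pin")[1] / w_needed, par("pin")[2] / h_needed)

text(0, 1, paste(tmp, collapse = "\n"), adj = c(0, 1), cex = cex_fit)
par(op)
```

Because the scaling factor is computed from the device size at drawing time, the code has to be re-run after the window is resized.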

Resources