Barplot using ggplot2 for 4 variables - r
I have data like this:
ID height S1 S2 S3
1 927 0.90695438 0.28872194 0.67114294
2 777 0.20981677 0.71783084 0.74498220
3 1659 0.35813799 0.92339744 0.44001698
4 174 0.44829914 0.67493949 0.11503942
5 1408 0.90642643 0.18593999 0.67564278
6 1454 0.38943930 0.34806716 0.73155952
7 2438 0.51745975 0.12351953 0.48398490
8 1114 0.12523909 0.10811622 0.17104804
9 1642 0.03014575 0.29795320 0.67584853
10 515 0.77180549 0.83819990 0.26298995
11 1877 0.32741508 0.99277109 0.34148083
12 2647 0.38947869 0.43713441 0.21024554
13 845 0.04105275 0.20256457 0.01631959
14 1198 0.36139663 0.96387150 0.37676288
15 2289 0.57097808 0.66038711 0.56230740
16 2009 0.68488024 0.29811683 0.67998461
17 618 0.97111675 0.11926219 0.74538877
18 1076 0.70195881 0.59975160 0.95007272
19 1082 0.01154550 0.12019055 0.16309071
20 2072 0.53553213 0.78843202 0.32475690
21 1610 0.83657146 0.36959607 0.13271604
22 2134 0.80686674 0.95632284 0.63729744
23 1617 0.08093264 0.91357666 0.33092961
24 2248 0.23890930 0.82333634 0.64907957
25 1263 0.96598986 0.31948216 0.30288836
26 518 0.03767233 0.87770033 0.07123327
27 2312 0.91640643 0.80035100 0.66239047
28 2646 0.72622658 0.61135664 0.75960356
29 1650 0.20077621 0.07242114 0.55336017
30 837 0.84020075 0.42158771 0.53927210
31 1467 0.39666235 0.34446560 0.84959232
32 2786 0.39270226 0.75173569 0.65322596
33 1049 0.47255689 0.21875132 0.95088576
34 2863 0.58365691 0.29213397 0.61722305
35 2087 0.35238717 0.35595337 0.49284063
36 2669 0.02847401 0.63196192 0.97600657
37 545 0.99508793 0.89253107 0.49034522
38 1890 0.95755846 0.74403278 0.65517230
39 2969 0.55165118 0.45722242 0.59880179
40 395 0.10195396 0.03609544 0.94756902
41 995 0.23791515 0.56851452 0.36801151
42 2596 0.86009766 0.43901589 0.87818701
43 2334 0.73826129 0.60048445 0.45487507
44 2483 0.49731226 0.95138276 0.49646702
45 1812 0.57992109 0.26943131 0.46061562
46 1476 0.01618339 0.65883839 0.61790820
47 2342 0.47212988 0.07647121 0.60414349
48 2653 0.04238973 0.07128521 0.78587960
49 627 0.46315442 0.37033152 0.55526847
50 925 0.62999477 0.29710220 0.76897834
51 995 0.67324929 0.55107827 0.40428567
52 600 0.08703467 0.36989059 0.51071981
53 711 0.14358380 0.84568953 0.52353644
54 828 0.90847850 0.62079070 0.99279921
55 1776 0.12253259 0.39914002 0.42964742
56 764 0.72886279 0.29966153 0.99601125
57 375 0.95037718 0.38111984 0.78660025
58 694 0.04335591 0.70113494 0.51591063
59 1795 0.01959930 0.94686529 0.50268797
60 638 0.19907246 0.77282832 0.91163748
61 1394 0.50508626 0.21955016 0.26441590
62 1943 0.92638876 0.71611036 0.17385687
63 2882 0.13840169 0.66421796 0.40033126
64 2031 0.16919458 0.70625020 0.53835738
65 1338 0.60662738 0.27962799 0.24496437
66 1077 0.81587669 0.71225050 0.37585096
67 1370 0.84338121 0.66094211 0.58025355
68 1339 0.78807719 0.04101269 0.20895531
69 739 0.01902087 0.06114149 0.80133001
70 2085 0.69808750 0.27976169 0.63880242
71 1240 0.81509312 0.30196772 0.73633076
72 987 0.56840006 0.95661083 0.43881241
73 1720 0.48006288 0.38981872 0.57981238
74 2901 0.16137012 0.37178879 0.25604401
75 1987 0.08925623 0.84314249 0.46371823
76 1876 0.16268237 0.84723500 0.16861486
77 2571 0.02672845 0.31933115 0.61389453
78 2325 0.70962948 0.13250605 0.95810262
79 2503 0.76101818 0.61710912 0.47819473
80 279 0.85747478 0.79130451 0.75115933
81 1381 0.43726582 0.33804871 0.02058322
82 1800 0.41713645 0.90544760 0.17096903
83 2760 0.58564949 0.19755671 0.63996650
84 2949 0.82496758 0.79408518 0.16497848
85 118 0.79313923 0.75460289 0.35472278
86 1736 0.32615257 0.91139485 0.18642647
87 2201 0.95793194 0.32268770 0.89765616
88 750 0.65301961 0.08616947 0.23778386
89 906 0.45867582 0.91120045 0.98494348
90 2202 0.60602188 0.95517383 0.02133074
I want to make a barplot using ggplot2 like this:
In this dataset, height should be on the y-axis, and S1, S2, and S3 should each be shown in a different color for every sample.
I tried the base R function barplot, which gave me the following output. Any suggestions would be appreciated.
barplot(t(as.matrix(examp[,3:5])),col=rainbow(3))
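For anyone reproducing this, a minimal sketch of loading the table into a data frame, assuming it is available as whitespace-separated text. Only the first five rows are pasted here to keep it short. The name `examp` comes from the `barplot()` call above, and the `df` alias is added because the answer below uses that name.

```r
# First five rows of the table above; the full data would be read the same way
examp <- read.table(header = TRUE, text = "
ID height S1 S2 S3
1 927 0.90695438 0.28872194 0.67114294
2 777 0.20981677 0.71783084 0.74498220
3 1659 0.35813799 0.92339744 0.44001698
4 174 0.44829914 0.67493949 0.11503942
5 1408 0.90642643 0.18593999 0.67564278
")

# The answer below calls the same data frame `df`
df <- examp

str(examp)
```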
It's not clear to me exactly what you want to plot. You say you want height on the y axis, but the examples you show are all 'filled to the top', implying the same height for each ID. Also, it is not clear what the numbers associated with each sample represent. I am guessing they should be relative weightings for the bar heights.
Assuming you actually want a filled bar plot as in the examples, with the relative sizes of the bars dictated by the sample values, you can do:
library(tidyr)
library(dplyr)
library(ggplot2)
df %>%
  mutate(ID = reorder(ID, S3/(S3 + S2 + S1))) %>%
  pivot_longer(3:5, names_to = "Sample", values_to = "Value") %>%
  ggplot(aes(ID, Value * height, fill = Sample)) +
  geom_col(position = "fill", color = NA) +
  labs(y = "Height") +
  theme_classic() +
  scale_fill_manual(values = c("red", "green", "blue"))
Alternative
df %>%
  arrange(height) %>%
  group_by(height) %>%
  # average the rows that share the same height
  summarize(across(everything(), mean)) %>%
  pivot_longer(3:5, names_to = "Sample", values_to = "Value") %>%
  ggplot(aes(height, Value, fill = Sample, colour = Sample)) +
  geom_smooth(method = loess, formula = y ~ x, linetype = 2, alpha = 0.2) +
  theme_bw()
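A third reading of the question, offered only as a sketch: if each bar should actually reach `height` on the y-axis, the sample values can be rescaled so that the three segments of a bar sum to that ID's height. Note that with `position = "fill"` every bar is normalised to 1, so multiplying by `height` in the first plot does not change the proportions. The column name `Segment` is made up for this illustration, and only the first three rows of the data are used.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# First three rows of the question's data, used as a small stand-in
df <- data.frame(
  ID     = 1:3,
  height = c(927, 777, 1659),
  S1     = c(0.90695438, 0.20981677, 0.35813799),
  S2     = c(0.28872194, 0.71783084, 0.92339744),
  S3     = c(0.67114294, 0.74498220, 0.44001698)
)

long <- df %>%
  pivot_longer(S1:S3, names_to = "Sample", values_to = "Value") %>%
  group_by(ID) %>%
  # split each ID's height in proportion to its sample values
  mutate(Segment = height * Value / sum(Value)) %>%
  ungroup()

p <- ggplot(long, aes(factor(ID), Segment, fill = Sample)) +
  geom_col() +  # default position is "stack", so bar tops sit at `height`
  labs(x = "ID", y = "Height") +
  theme_classic()
p
```

Whether this or the filled version is the right picture depends on what the S1 to S3 values mean, which the question does not say.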
Related
ggplot_line: label the top 2 peak with X-axis values
I am new to R programming. I am plotting a mass spectrum with ggplot and would like to label the top 2 peaks with their x-axis values (i.e. m). Does anyone know how to achieve that? Thanks so much for your help!
Here is part of the raw data I used for the ggplot.
m Intensity
1 30001 2.964e+01
2 30002 3.336e+01
3 30003 3.968e+01
4 30004 5.015e+01
5 30005 6.838e+01
6 30006 1.016e+02
7 30007 1.464e+02
8 30008 2.130e+02
9 30009 3.115e+02
10 30010 3.951e+02
11 30011 5.134e+02
12 30012 5.316e+02
13 30013 6.377e+02
14 30014 8.813e+02
15 30015 1.071e+03
16 30016 1.119e+03
17 30017 1.202e+03
18 30018 1.299e+03
19 30019 1.112e+03
20 30020 1.205e+03
21 30021 1.422e+03
22 30022 1.653e+03
23 30023 1.726e+03
24 30024 2.423e+03
25 30025 3.059e+03
26 30026 3.267e+03
27 30027 3.993e+03
28 30028 5.172e+03
29 30029 5.278e+03
30 30030 2.794e+03
31 30031 1.459e+03
32 30032 2.512e+03
33 30033 6.590e+03
34 30034 1.245e+04
35 30035 1.144e+04
36 30036 5.197e+03
37 30037 6.012e+03
38 30038 1.453e+04
39 30039 1.513e+04
40 30040 5.802e+03
41 30041 9.226e+03
42 30042 5.809e+03
43 30043 3.074e+03
44 30044 3.882e+03
45 30045 9.941e+02
46 30046 8.170e+02
47 30047 1.149e+03
48 30048 3.567e+02
49 30049 3.805e+02
50 30050 3.654e+02
51 30051 4.724e+02
52 30052 7.819e+02
53 30053 8.634e+02
54 30054 5.235e+02
55 30055 1.712e+02
56 30056 9.232e+01
57 30057 9.434e+01
58 30058 7.191e+01
59 30059 8.036e+01
60 30060 4.456e+01
61 30061 9.428e+01
62 30062 9.392e+01
63 30063 8.413e+01
64 30064 5.671e+01
65 30065 2.639e+01
66 30066 2.027e+01
67 30067 4.584e+01
68 30068 6.956e+01
69 30069 6.181e+01
70 30070 6.450e+01
71 30071 2.826e+01
72 30072 3.610e+01
73 30073 6.325e+01
74 30074 3.509e+01
75 30075 3.478e+01
76 30076 1.120e+01
77 30077 6.993e+00
78 30078 9.936e+00
79 30079 7.738e+00
80 30080 9.771e+00
81 30081 1.762e+01
82 30082 3.060e+01
83 30083 2.175e+01
84 30084 2.816e+01
85 30085 2.700e+01
86 30086 2.114e+01
87 30087 4.378e+01
88 30088 5.824e+01
89 30089 6.193e+01
90 30090 4.146e+01
91 30091 9.697e+04
92 30092 9.458e+04
93 30093 9.216e+04
94 30094 8.972e+04
95 30095 8.723e+04
96 30096 8.468e+04
97 30097 8.211e+04
98 30098 7.959e+04
99 30099 7.726e+04
100 30100 7.527e+04
101 30101 7.379e+04
102 30102 7.298e+04
103 30103 7.301e+04
104 30104 7.399e+04
105 30105 7.602e+04
106 30106 7.916e+04
107 30107 8.340e+04
108 30108 8.862e+04
109 30109 9.460e+04
110 30110 1.010e+05
111 30111 1.074e+05
112 30112 1.133e+05
113 30113 1.180e+05
114 30114 1.211e+05
115 30115 1.222e+05
116 30116 1.213e+05
117 30117 1.186e+05
118 30118 1.146e+05
119 30119 1.100e+05
120 30120 1.054e+05
121 30121 1.014e+05
122 30122 9.838e+04
123 30123 9.637e+04
124 30124 9.535e+04
125 30125 9.508e+04
126 30126 9.520e+04
127 30127 9.527e+04
128 30128 9.484e+04
129 30129 9.355e+04
130 30130 9.128e+04
131 30131 8.809e+04
132 30132 8.425e+04
133 30133 8.012e+04
134 30134 7.603e+04
135 30135 7.225e+04
136 30136 6.895e+04
137 30137 6.617e+04
138 30138 6.392e+04
139 30139 6.214e+04
140 30140 6.078e+04
141 30141 5.980e+04
142 30142 5.922e+04
143 30143 5.905e+04
144 30144 5.934e+04
145 30145 6.013e+04
146 30146 6.143e+04
147 30147 6.324e+04
148 30148 6.552e+04
149 30149 6.816e+04
150 30150 7.100e+04
151 30151 7.384e+04
152 30152 7.655e+04
153 30153 7.904e+04
154 30154 8.132e+04
155 30155 8.353e+04
156 30156 8.595e+04
157 30157 8.896e+04
158 30158 9.302e+04
159 30159 9.864e+04
160 30160 1.063e+05
161 30161 1.165e+05
162 30162 1.293e+05
163 30163 1.443e+05
164 30164 1.605e+05
165 30165 1.759e+05
166 30166 1.883e+05
167 30167 1.957e+05
168 30168 1.969e+05
169 30169 1.921e+05
170 30170 1.824e+05
171 30171 1.693e+05
172 30172 1.544e+05
173 30173 1.390e+05
174 30174 1.241e+05
175 30175 1.102e+05
176 30176 9.755e+04
177 30177 8.644e+04
178 30178 7.692e+04
179 30179 6.900e+04
180 30180 6.262e+04
181 30181 5.766e+04
182 30182 5.397e+04
183 30183 5.137e+04
184 30184 4.972e+04
185 30185 4.889e+04
186 30186 4.881e+04
187 30187 4.940e+04
188 30188 5.059e+04
189 30189 5.230e+04
190 30190 5.444e+04
191 30191 5.690e+04
192 30192 5.960e+04
193 30193 6.244e+04
194 30194 6.539e+04
195 30195 6.842e+04
196 30196 7.153e+04
197 30197 7.471e+04
198 30198 7.795e+04
199 30199 8.118e+04
200 30200 8.430e+04
201 30201 8.719e+04
202 30202 8.976e+04
203 30203 9.193e+04
204 30204 9.364e+04
205 30205 9.480e+04
206 30206 9.531e+04
207 30207 9.504e+04
208 30208 9.391e+04
209 30209 9.189e+04
210 30210 8.912e+04
211 30211 8.587e+04
212 30212 8.251e+04
213 30213 7.939e+04
214 30214 7.680e+04
215 30215 7.492e+04
216 30216 7.381e+04
217 30217 7.349e+04
218 30218 7.394e+04
219 30219 7.510e+04
220 30220 7.690e+04
221 30221 7.919e+04
222 30222 8.174e+04
223 30223 8.425e+04
224 30224 8.637e+04
225 30225 8.776e+04
226 30226 8.826e+04
227 30227 8.788e+04
228 30228 8.690e+04
229 30229 8.569e+04
230 30230 8.465e+04
231 30231 8.405e+04
232 30232 8.398e+04
233 30233 8.434e+04
234 30234 8.494e+04
235 30235 8.554e+04
236 30236 8.598e+04
237 30237 8.623e+04
238 30238 8.638e+04
239 30239 8.665e+04
240 30240 8.736e+04
241 30241 8.884e+04
242 30242 9.147e+04
243 30243 9.559e+04
244 30244 1.016e+05
245 30245 1.097e+05
246 30246 1.200e+05
247 30247 1.321e+05
Here is my code for ggplot:
ggplot(data=raw.1) +
  geom_line(mapping = aes(x=m, y=Intensity))
Below is the ggplot output:
I would do it this way. My solution requires the ggrepel package as well as some dplyr functions. The key to making this work is that you can set data = separately for each geom_ layer in ggplot2. The geom_text_repel() layer from ggrepel ensures that the labels will not overlap the data drawn by geom_line().
library(ggplot2)
library(dplyr)
library(ggrepel)
ggplot(mapping = aes(x = m, y = Intensity, label = m)) +
  geom_line(data = raw.1) +
  geom_text_repel(data = raw.1 %>%
                    arrange(desc(Intensity)) %>%  # arranges in descending order
                    slice_head(n = 2))            # only keeps the top two intensities
My plot does not look like yours since you only shared the first 247 data points.
I suspect that this initial solution might not work for you; I am a chemist and have some idea of what you hope to accomplish. This approach labels the two highest intensities, not necessarily the top two peaks. We need to identify all local maxima and then select the two tallest. The following code calculates the slope between consecutive points, looks for points where a positive slope changes to a negative slope (a local maximum), and then sorts and selects the top two by intensity.
top_two <- raw.1 %>%
  mutate(deriv = Intensity - lag(Intensity),
         max = case_when(deriv >= 0 & lead(deriv) < 0 ~ TRUE,
                         TRUE ~ FALSE)) %>%
  filter(max) %>%
  arrange(desc(Intensity)) %>%
  slice_head(n = 2)
Let's modify the original plot code to put this in.
ggplot(mapping = aes(x = m, y = Intensity, label = m)) +
  geom_line(data = raw.1) +
  geom_text_repel(data = top_two, nudge_y = 1e4)
Data:
raw.1 <- structure(list(m = c(30001, 30002, 30003, 30004, 30005, 30006, 30007, 30008, 30009, 30010,
30011, 30012, 30013, 30014, 30015, 30016, 30017, 30018, 30019, 30020,
30021, 30022, 30023, 30024, 30025, 30026, 30027, 30028, 30029, 30030,
30031, 30032, 30033, 30034, 30035, 30036, 30037, 30038, 30039, 30040,
30041, 30042, 30043, 30044, 30045, 30046, 30047, 30048, 30049, 30050,
30051, 30052, 30053, 30054, 30055, 30056, 30057, 30058, 30059, 30060,
30061, 30062, 30063, 30064, 30065, 30066, 30067, 30068, 30069, 30070,
30071, 30072, 30073, 30074, 30075, 30076, 30077, 30078, 30079, 30080,
30081, 30082, 30083, 30084, 30085, 30086, 30087, 30088, 30089, 30090,
30091, 30092, 30093, 30094, 30095, 30096, 30097, 30098, 30099, 30100,
30101, 30102, 30103, 30104, 30105, 30106, 30107, 30108, 30109, 30110,
30111, 30112, 30113, 30114, 30115, 30116, 30117, 30118, 30119, 30120,
30121, 30122, 30123, 30124, 30125, 30126, 30127, 30128, 30129, 30130,
30131, 30132, 30133, 30134, 30135, 30136, 30137, 30138, 30139, 30140,
30141, 30142, 30143, 30144, 30145, 30146, 30147, 30148, 30149, 30150,
30151, 30152, 30153, 30154, 30155, 30156, 30157, 30158, 30159, 30160,
30161, 30162, 30163, 30164, 30165, 30166, 30167, 30168, 30169, 30170,
30171, 30172, 30173, 30174, 30175, 30176, 30177, 30178, 30179, 30180,
30181, 30182, 30183, 30184, 30185, 30186, 30187, 30188, 30189, 30190,
30191, 30192, 30193, 30194, 30195, 30196, 30197, 30198, 30199, 30200,
30201, 30202, 30203, 30204, 30205, 30206, 30207, 30208, 30209, 30210,
30211, 30212, 30213, 30214, 30215, 30216, 30217, 30218, 30219, 30220,
30221, 30222, 30223, 30224, 30225, 30226, 30227, 30228, 30229, 30230,
30231, 30232, 30233, 30234, 30235, 30236, 30237, 30238, 30239, 30240,
30241, 30242, 30243, 30244, 30245, 30246, 30247),
Intensity = c(29.64, 33.36, 39.68, 50.15, 68.38, 101.6, 146.4, 213, 311.5, 395.1, 513.4, 531.6, 637.7,
881.3, 1071, 1119, 1202, 1299, 1112, 1205, 1422, 1653, 1726, 2423, 3059, 3267, 3993, 5172, 5278, 2794, 1459, 2512, 6590, 12450, 11440, 5197, 6012, 14530, 15130, 5802, 9226, 5809, 3074, 3882, 994.1, 817, 1149, 356.7, 380.5, 365.4, 472.4, 781.9, 863.4, 523.5, 171.2, 92.32, 94.34, 71.91, 80.36, 44.56, 94.28, 93.92, 84.13, 56.71, 26.39, 20.27, 45.84, 69.56, 61.81, 64.5, 28.26, 36.1, 63.25, 35.09, 34.78, 11.2, 6.993, 9.936, 7.738, 9.771, 17.62, 30.6, 21.75, 28.16, 27, 21.14, 43.78, 58.24, 61.93, 41.46, 96970, 94580, 92160, 89720, 87230, 84680, 82110, 79590, 77260, 75270, 73790, 72980, 73010, 73990, 76020, 79160, 83400, 88620, 94600, 101000, 107400, 113300, 118000, 121100, 122200, 121300, 118600, 114600, 110000, 105400, 101400, 98380, 96370, 95350, 95080, 95200, 95270, 94840, 93550, 91280, 88090, 84250, 80120, 76030, 72250, 68950, 66170, 63920, 62140, 60780, 59800, 59220, 59050, 59340, 60130, 61430, 63240, 65520, 68160, 71000, 73840, 76550, 79040, 81320, 83530, 85950, 88960, 93020, 98640, 106300, 116500, 129300, 144300, 160500, 175900, 188300, 195700, 196900, 192100, 182400, 169300, 154400, 139000, 124100, 110200, 97550, 86440, 76920, 69000, 62620, 57660, 53970, 51370, 49720, 48890, 48810, 49400, 50590, 52300, 54440, 56900, 59600, 62440, 65390, 68420, 71530, 74710, 77950, 81180, 84300, 87190, 89760, 91930, 93640, 94800, 95310, 95040, 93910, 91890, 89120, 85870, 82510, 79390, 76800, 74920, 73810, 73490, 73940, 75100, 76900, 79190, 81740, 84250, 86370, 87760, 88260, 87880, 86900, 85690, 84650, 84050, 83980, 84340, 84940, 85540, 85980, 86230, 86380, 86650, 87360, 88840, 91470, 95590, 101600, 109700, 120000, 132100 )), row.names = c(NA, -247L), class = c("tbl_df", "tbl", "data.frame" ))
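To make the difference between "highest intensities" and "highest peaks" concrete, here is a small sketch on a made-up nine-point spectrum (the values and the names `toy`, `naive`, and `peaks` are invented for illustration). It follows the same `lag()`/`lead()` idea as the answer above, with `lead(deriv, default = 0)` added as an assumption so that the last point is never flagged.

```r
library(dplyr)

# Made-up spectrum with two peaks: a tall one at m = 3 and a smaller one at m = 7
toy <- tibble(m = 1:9, Intensity = c(1, 5, 9, 8, 2, 3, 6, 4, 1))

# Naive approach: the two largest intensities sit on the same peak
naive <- toy %>%
  arrange(desc(Intensity)) %>%
  slice_head(n = 2)

# Local maxima: slope is non-negative coming in and negative going out
peaks <- toy %>%
  mutate(deriv  = Intensity - lag(Intensity),
         is_max = !is.na(deriv) & deriv >= 0 & lead(deriv, default = 0) < 0) %>%
  filter(is_max) %>%
  arrange(desc(Intensity)) %>%
  slice_head(n = 2)

naive$m  # 3 4: both labels land on the first peak
peaks$m  # 3 7: one label per peak
```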
This approach treats your x-axis as discrete values of a continuous variable and finds the local maxima based on the 2nd derivative, using code from "Finding local maxima and minima". The rest of the plotting is similar to Ben Norris's answer, using geom_text_repel() to label the points of interest. Also, as noted, the data you provided differ from the figure in your question.
library(ggplot2)
library(ggrepel)
# find local maxima, aka peaks
local_maximas <- raw.1[which(diff(sign(diff(raw.1$Intensity))) == -2) + 1, ]
# subset of the two highest peaks
top2 <- tail(local_maximas[order(local_maximas$Intensity), ], 2)
# make labels for the plot
raw.1$label <- ifelse(raw.1$m %in% top2$m, raw.1$m, NA)
ggplot(data = raw.1) +
  geom_line(aes(x = m, y = Intensity)) +
  geom_text_repel(aes(x = m, y = Intensity, label = label))
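The `diff(sign(diff(...)))` expression is compact, so a step-by-step sketch on a made-up vector may help; the names `x`, `d1`, `s`, `d2`, and `peak_idx` are illustrative only. One limitation worth knowing: a flat-topped peak produces a sign of 0 in the middle, so the test for `-2` does not catch it.

```r
# Made-up intensities with peaks at positions 3 and 7
x <- c(1, 5, 9, 8, 2, 3, 6, 4, 1)

d1 <- diff(x)   # first differences:  4  4 -1 -6  1  3 -2 -3
s  <- sign(d1)  # direction of travel: 1  1 -1 -1  1  1 -1 -1
d2 <- diff(s)   # change in direction: 0 -2  0  2  0 -2  0

# -2 marks a switch from rising (+1) to falling (-1); +1 shifts the index
# from the difference back to the original vector
peak_idx <- which(d2 == -2) + 1
peak_idx

# A plateau such as 1, 3, 3, 1 gives signs 1, 0, -1, so no -2 appears
plateau_idx <- which(diff(sign(diff(c(1, 3, 3, 1)))) == -2) + 1
length(plateau_idx)
```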
Why are my 95% confidence intervals of my multivariate regression being plotted as a loess line?
I've been trying to plot a 95% prediction interval for a multivariate regression line in ggplot2. The graph is a regression of three independent variables ("x", "y", and "z") used to predict a dependent variable ("a") on the y-axis. However, when I plot the results in ggplot2, I get an unusual result: the regression line is straight, but the 95% prediction interval bands are very squiggly and do not resemble straight lines at all. They look more like loess lines than anything else. Here is a picture showing the result I get:
Does anyone know why I am getting a result where the 95% confidence intervals aren't smooth lines? The only thing I can think of is that this is related to the regression being multivariate rather than univariate, but all three variables show a strong correlation with the dependent variable (r2 > 0.95). I looked up plots of multivariate regressions with a 95% confidence interval, but none of them resembled my result; they all had fairly smooth lines. I also tried passing method="lm" in the predict() call of my code, following this question, but that did not work either.
Below is a dataset and code that replicates this result.
x y z a 1 2.366153239 5.420534999 2.328204243 10.55858156 2 1.431094272 2.975529566 1.724972338 2.533696814 3 2.60453538 5.75827066 2.399639694 11.48783737 4 2.483771412 5.470167623 2.338838948 10.74706177 5 1.971210737 4.287715955 2.070680071 7.334766592 6 2.5596573 5.558000525 2.357541203 11.6127708 7 2.177892158 4.730480377 2.174966753 8.631949429 8 1.49665751 3.203559121 1.78984891 3.020424886 9 2.728865195 6.376658918 2.525204728 12.51412704 10 1.908668224 4.025351691 2.006327912 6.593044534 11 1.978895443 4.24563401 2.060493633 7.402451521 12 1.627855104 3.344274234 1.828735693 3.731699451 13 1.53436705 3.350605596 1.83046595 3.170525564 14 2.448831586 5.585936937 2.363458681 10.76866329 15 2.443160968 5.331752143 2.309058714 10.58310613 16 2.156078216 4.417635062 2.101817086 8.109576771 17 1.931534652 4.249610334 2.061458303 6.790693233 18 1.452715015 3.225752129 1.796037897 3.356200016 19 1.729354145 3.683866912 1.919340228 5.420225217 20 1.861239059 3.912023005 1.977883466 6.267750682 21 1.822955174 3.804437795 1.950496807 5.991464547 22 2.113126565 4.492001488 2.119434238 8.114076324 23 2.171856126 4.662613282 2.159308519 7.806138626 24 1.391215895 3.010620886 1.735114084 2.461296784 25 1.319165859 2.895911938 1.701737917 2.055404964 26 2.034006688 4.322608316 2.079088338 6.977001452 27 2.85574569 6.160996329 2.482135437 14.34613881 28 1.411579618 3.097385927 1.759939183 2.613006652 29 2.576957482 6.029643051 2.45553315 11.91628836 30 1.796913834 3.923259637 1.980721999 5.911392672 31 2.024389004 4.345833727 2.084666335 8.022132643 32 1.63435577 3.493472658 1.869083374 3.515715835 33 1.584595569 3.453157121 1.858267236 3.397523976 34 1.881578895 4.030076005 2.00750492 6.011267174 35 1.728309802 3.752101123 1.937034105 5.225370259 36 1.414715557 3.140049044 1.772018353 2.736961545 37 1.488730081 3.116621591 1.76539559 2.902519892 38 1.522138034 3.257327011 1.804806641 2.890371758 39 1.800033345 3.987130478 1.996780027 5.640594153 40 1.794222122 4.143928062 
2.035664035 6.206575927 41 2.676710091 6.289901082 2.50796752 13.49805633 42 2.328582719 5.13691546 2.266476442 9.430961545 43 2.484723966 5.458712793 2.336388836 10.7561993 44 2.287108375 4.856940066 2.203846652 9.917240545 45 2.417128932 5.582744146 2.362783136 10.54534144 46 2.328332495 5.105945474 2.259633925 9.840475333 47 2.362264634 5.293304825 2.300718328 9.848820151 48 2.28292536 5.018934097 2.24029777 9.269934816 49 1.449825221 3.006177531 1.73383319 3.121042465 50 2.211679876 4.692264893 2.166163635 8.631218063 51 2.704614597 6.072756474 2.464296345 12.31992499 52 2.48097622 5.43590303 2.331502312 11.2245765 53 1.497529983 3.380994674 1.838748127 3.752088968 54 2.696365396 5.825540285 2.413615604 12.36222133 55 2.165729837 4.666265285 2.160153996 8.455875079 56 2.410978268 5.417499423 2.327552239 10.08813972 57 2.185447829 4.991792206 2.234231905 9.215327913 58 2.041898307 4.22566518 2.055642279 7.418180823 59 2.099077244 4.375757022 2.091831021 7.696212639 60 2.000032635 4.234467391 2.057782153 7.110696123 61 2.025963678 4.260852439 2.064183238 6.851163763 62 2.083395224 4.351567427 2.08604109 7.884576511 63 1.981523362 4.318820559 2.07817722 7.43543802 64 2.033235038 4.336636932 2.082459347 7.313220387 65 1.423999144 3.206803244 1.790754937 2.564949357 66 2.217982257 4.825910853 2.196795587 8.920558764 67 1.240285111 2.808498672 1.675857593 1.568615918 68 2.215837149 5.041487758 2.245325758 8.802372134 69 2.134859238 4.731890939 2.175291001 8.132101136 70 2.306998207 5.059171458 2.249260202 9.336074756 71 1.896404791 4.104681782 2.026001427 6.445449942 72 1.922935417 4.151905673 2.037622554 6.818169682 73 2.111422924 4.716264233 2.171696165 8.366370302 74 2.28264494 4.852811209 2.202909714 9.210340372 75 2.190760504 4.574710979 2.1388574 8.447427164 76 2.037589062 4.275276265 2.06767412 6.989197008 77 1.717192759 3.810543836 1.952061433 4.610157727 78 1.876769266 4.043051268 2.010734012 6.306275287 79 2.030134158 4.579339426 2.139939117 7.715792425 80 
1.93577016 4.356708827 2.08727306 6.788521191 81 2.056518774 4.445588116 2.108456335 7.636510887 82 2.120080841 4.615120517 2.148283156 7.916807491 83 2.232689054 4.861361591 2.204849562 8.694167142 84 2.181147406 4.782479201 2.186888017 8.854567878 85 2.92779884 6.305666829 2.511108685 13.9593635 86 1.860080456 4.459637473 2.111785376 6.163314804 87 1.913818428 4.602767301 2.145406092 7.174915716 88 1.877883958 4.594104966 2.143386332 6.335054251 89 1.994987686 4.632100752 2.152231575 7.707952547 90 2.14756511 5.023880521 2.241401464 9.161721393 91 1.503591471 3.687628672 1.92031994 4.280824129 92 1.4536743 3.579343567 1.891915317 3.761200116 93 1.50872427 3.584888833 1.893380266 4.106767082 94 1.537573733 3.649466946 1.910357806 4.126327608 95 1.934796461 4.373238129 2.091228856 7.584097036 96 1.526250724 3.248434627 1.802341429 3.228826156 97 1.606399474 3.500439216 1.870946075 4.939855112 98 1.943162189 4.329208633 2.080675043 6.460498957 99 1.963384107 4.353112625 2.086411423 6.649308332 100 2.183124049 4.711248626 2.170541091 8.474527832 101 1.640763809 3.543853682 1.882512598 3.832330237 102 1.659456682 3.523415014 1.877076188 3.997282849 103 1.436096958 3.166318574 1.779415234 2.839078464 104 2.428955194 4.91133048 2.216152179 10.44793169 105 2.668500746 6.154858094 2.480898646 12.73883098 106 2.676812229 6.178980921 2.485755604 12.64109656 107 2.126920019 4.640923356 2.154280241 8.600833727 108 1.878254881 4.025530246 2.00637241 6.253828812 109 2.242102174 4.726797674 2.174119977 8.29404964 110 1.676813632 3.822754538 1.955186574 5.370638028 111 1.874531192 4.17438727 2.043131731 7.265087007 112 1.998637301 4.2363594 2.058241822 6.722389092 113 1.944116978 4.159527009 2.039491851 6.038562805 114 2.308184503 5.192956851 2.278806014 9.36048303 115 2.042370888 4.49535532 2.120225299 7.320526962 116 2.015621187 4.318820559 2.07817722 7.081078135 117 1.81401665 4.146304301 2.036247603 6.492542819 118 1.676813632 3.87937827 1.969613736 5.221868194 119 
2.807346477 6.428545769 2.535457704 13.72308897 120 1.621259207 3.543853682 1.882512598 4.162470391 121 1.50100345 3.321793359 1.822578766 3.106378794 122 1.582428764 3.464319806 1.861268333 4.143134726 123 1.654547625 3.591817741 1.895209155 4.509649984 124 2.332936461 4.937777822 2.222111118 9.398917323 125 2.498105588 5.513601542 2.348105948 11.29414737 126 1.890319403 3.887730313 1.97173282 5.847161058 127 1.804890841 3.940999114 1.985194981 6.17864926 128 2.096209309 4.6042388 2.145749007 7.788418833 129 2.047658751 4.337290741 2.082616321 7.612336837 130 2.680572077 5.989462544 2.447337848 12.15745472 131 2.333554566 5.407171771 2.325332615 10.44467195 132 2.212180997 4.932817886 2.220994797 8.881836305 133 1.478852439 3.063390922 1.750254531 2.890371758 134 1.648334702 3.518387649 1.875736562 4.141546164 135 2.307921185 4.90823336 2.215453308 9.305650552 136 2.13384989 4.645130271 2.155256428 8.018790088 137 1.728309802 3.555348061 1.885563062 4.941642423 138 1.691821236 3.556775613 1.885941572 4.886582645 139 1.746238611 3.891820298 1.972769702 5.363543151 140 1.679155631 3.642966397 1.908655652 4.754882459 141 1.94348069 4.156536582 2.038758589 6.277601677 142 1.549402462 3.250374492 1.8028795 3.342508385 143 1.856975574 4.232023463 2.057188242 6.413458957 144 2.529503815 5.684310793 2.38417927 11.22830537 145 2.035545742 4.643428898 2.154861689 7.244227516 146 2.467132416 5.697093487 2.386858497 11.50287513 147 2.298324686 4.870031331 2.206814748 9.286469586 148 1.937388065 4.34601078 2.0847088 7.322972679 149 1.956955486 4.536730733 2.129960266 7.739019572 150 2.036823984 4.518958489 2.125784206 8.594154233 151 1.972996546 4.529692045 2.128307319 7.967481199 152 1.58746864 3.283839256 1.812136655 3.314186005 153 1.521311054 3.464922216 1.861430153 3.681603045 154 2.44446969 5.445011746 2.333454895 10.3609124 155 2.294121109 4.731979033 2.17531125 9.105210941 156 3.126345733 6.927557906 2.632025438 15.6772624 157 1.867746396 4.253056253 2.06229393 
6.32459191 158 1.839082858 4.029806041 2.00743768 5.382980154 159 2.127330896 4.844974178 2.201130205 7.863266724 160 2.404523583 5.236441963 2.288327329 10.04902409 161 2.262955985 4.845642719 2.201282063 9.034969801 162 2.253418218 4.727387819 2.174255693 9.130463484 163 2.302083991 5.167955549 2.273313781 10.06411762 164 2.192165626 4.835011259 2.198865903 9.262695602 165 1.672685332 3.734489965 1.93248285 4.565493369 166 1.568460311 3.539508997 1.881358285 3.52282487 167 1.609819887 3.523868735 1.877197042 3.920784511 168 1.616583967 3.587676949 1.894116403 4.394572604 169 1.643301653 3.654700957 1.911727218 3.912023005 170 1.621923158 3.581532841 1.892493815 3.891820298 171 2.090637708 4.527208645 2.127723818 8.536995819 172 2.109497906 4.585222548 2.141313277 8.203668045 173 2.03091153 4.429625613 2.104667578 7.785783239 174 2.09487893 4.582924577 2.140776629 8.204589814 175 2.040382454 4.335786342 2.08225511 6.632541816 176 2.312894869 5.342334252 2.311349011 9.798127037 177 1.430087263 3.148453361 1.774388165 2.939161922 178 2.293711966 4.871098263 2.20705647 9.392661929 179 2.391075023 4.894101478 2.212261621 9.375295332 180 2.517077345 5.718436483 2.391325257 11.47221284 181 1.989024673 4.154969184 2.038374152 6.872128101 182 2.02016078 4.294014757 2.072200463 7.403304815 183 1.797360845 4.076689627 2.019081382 5.90560705 184 1.705239225 3.931825633 1.982883162 5.697965589 185 1.471533812 3.312439025 1.820010721 3.529590596 186 1.438083095 3.346917175 1.829458164 3.533978493 187 1.619261465 3.559624618 1.886696748 4.109233175 188 1.609819887 3.6558396 1.912025 4.166355098 189 2.346796539 5.146965796 2.26869253 9.872567414 190 1.784208279 3.519720884 1.876091918 4.879539029 191 1.832126365 3.811539467 1.952316436 5.259368616 192 1.677986168 3.452840615 1.858182073 3.885884348 193 1.966109701 4.163870625 2.04055645 6.526348436 194 1.701367309 3.828641396 1.956691441 4.605170186 195 1.931534652 4.279440046 2.06868075 6.927802974 196 1.36183801 3.102342009 
1.761346646 2.645465326 197 2.432819556 5.883322388 2.425556099 10.46486408 198 2.078341803 4.564943223 2.136572775 7.650468513 199 1.432099112 3.171155089 1.780773733 2.931193752 200 2.174427741 4.839451482 2.199875333 8.482392615 201 2.16404302 4.710430697 2.170352666 8.620246046 202 1.738643812 3.737669618 1.933305361 5.834810737 203 2.303817478 5.000921602 2.236274044 9.718344619 204 1.741189967 3.731819205 1.931791708 5.090062428 205 1.794671893 3.904293207 1.975928442 5.247024072 206 1.757635562 3.857777991 1.964122703 5.006560336 207 1.676226207 3.66137978 1.913473224 4.566637236 208 1.77911412 3.86388263 1.965676125 5.669260041 209 2.059914227 4.564348191 2.136433521 7.695152987 210 1.32424147 3.104586678 1.761983734 2.182674796 211 1.604334732 3.751518852 1.936883799 4.85787254 212 1.662497734 3.79739748 1.948691222 5.073109185 213 1.44885795 3.04690056 1.745537327 2.907447359 214 2.487551021 5.598973005 2.366214911 10.97673998 215 2.438166592 5.528436532 2.351262753 10.75773968 216 1.892477044 4.164647686 2.040746845 7.15334893 217 1.520482581 3.272335343 1.80895974 3.424588334 218 2.488969385 5.681996883 2.383693957 10.74868607 219 2.215837149 4.53044664 2.128484588 7.620705087 220 2.442786243 5.526780079 2.350910479 10.69919132 221 2.570602875 5.907702431 2.430576563 11.59161344 222 2.608344119 6.053264948 2.460338381 12.33182385 223 2.524368131 5.738731256 2.395564914 11.20612853 224 1.539964086 3.38269391 1.839210132 3.571221411 225 1.541550744 3.476614021 1.864568052 3.523119986 226 2.111209474 4.695924549 2.167008202 8.126284621 227 1.910391851 4.139955073 2.034687955 6.467590025 228 2.801971864 6.015864434 2.452725919 13.0280527 229 2.616209119 5.780126041 2.404189269 11.53329656 230 2.570130461 5.673975975 2.38201091 10.97701107 231 2.545595117 5.629669374 2.372692431 11.14107887 232 2.618299253 5.800606659 2.408444863 11.97035031 233 2.443348195 5.385412073 2.320649063 10.85417971 234 2.385152788 5.279188197 2.297648406 10.67131308 235 
2.512400994 5.685007319 2.384325338 11.58593194 236 2.39352554 5.12693575 2.26427378 10.4590302 237 1.823796962 3.992680908 1.998169389 6.109247583 238 1.768267491 3.745968421 1.935450444 5.260096154 239 2.376820756 5.302583255 2.302733865 10.4487146 240 2.042402374 4.477336814 2.115971837 7.810068783 241 2.159700495 4.673996377 2.161942732 8.189916149 242 1.948229832 4.378018613 2.092371528 6.932447892 243 1.330510703 3.059880093 1.749251295 2.083184528 244 1.464097665 3.342685111 1.828301154 3.072693315 245 1.446917352 3.196630216 1.787912251 2.829087196 246 2.082252099 4.60990894 2.14706985 8.075582637 247 1.933494729 4.136126096 2.033746812 7.003065459 248 1.840298976 3.949126093 1.987240824 7.056175284 249 1.649584193 3.645188765 1.909237745 4.51129897 250 1.778648064 3.883623531 1.97069113 5.09681299 251 2.526339825 5.903056741 2.429620699 11.66907415 252 2.512244141 5.734958092 2.394777253 10.93748043 253 1.947599667 4.356708827 2.08727306 6.514712691 254 2.181687439 4.946274535 2.224022153 8.799405331 255 2.109497906 4.510859507 2.123878411 8.132101136 256 1.831713667 4.188138442 2.046494183 6.109247583 257 1.5517319 3.446807893 1.856558077 3.765840495 258 2.47549747 5.727881894 2.393299374 10.78967984 259 1.96580772 4.156693187 2.038796995 6.229496711 260 1.978602442 4.21508618 2.053067505 7.258412151 261 2.064000486 4.339901708 2.083243075 7.670717659 262 2.117775721 4.510639702 2.123826665 7.731676304 263 2.221912965 4.838923916 2.199755422 8.877208949 264 1.940925986 4.266896327 2.065646709 6.450865289 265 2.040382454 4.579852378 2.140058966 7.857666456 266 2.173143952 4.666735542 2.160262841 8.561717125 267 2.240859653 4.901564199 2.21394765 8.808442394 268 1.888874933 4.080921542 2.02012909 6.163314804 269 1.845529749 4.082609306 2.020546784 6.885284696 270 2.238519604 4.984229093 2.23253871 8.987910316 271 2.393206767 5.29338648 2.300736073 10.70491521 272 2.702044102 5.884714177 2.425842983 12.3883942 273 2.219296721 4.854631045 2.203322728 
9.263449766 274 1.96161829 4.090838423 2.022582118 6.993932975 275 2.00561407 4.171305603 2.042377439 7.324270223 276 2.467836387 5.578051269 2.361789844 10.8016414 277 1.390119244 3.100092289 1.760707894 2.379546134 278 1.365322726 3.044760505 1.744924212 2.401525041 279 1.598782218 3.516726026 1.875293584 4.234975692 280 1.94538671 4.131961426 2.032722663 6.199494461 281 2.172592522 4.89858579 2.213274902 8.804952261 282 1.908668224 4.102312732 2.025416681 6.374172668 283 1.944434766 4.112266337 2.027872367 6.54672802 284 1.58387445 3.505557397 1.872313381 3.941581808 285 1.743721514 3.832670536 1.95772075 5.11349268 286 1.592453126 3.549329989 1.883966557 4.871143315 287 1.283414418 2.79971739 1.673235605 1.54329811 288 1.320439849 2.90690106 1.704963654 2.070653036 289 1.194572818 2.708716646 1.645817926 1.184789985 290 1.231175294 2.681021529 1.637382524 1.115141591 291 1.365322726 3.074795481 1.753509476 2.254444718 292 1.408422528 3.000719815 1.732258588 2.422144328 293 2.184734225 4.886582645 2.210561613 8.803574418 294 2.030652566 4.649187071 2.156197364 7.901007052 295 1.890679763 4.02356438 2.005882444 6.212726329 296 1.855414729 4.027135813 2.006772486 5.858647185 297 1.819146836 3.737669618 1.933305361 5.340274716 298 1.51380043 3.337192052 1.826798306 3.514823642 299 1.923936518 4.162158962 2.040136996 6.485993092 300 2.54480266 5.875913394 2.42402834 11.44989333 301 2.015083881 4.471638793 2.114624977 7.725330038 302 1.902054478 4.30514559 2.074884476 7.141300544 303 1.932189012 4.149463861 2.037023284 7.112433389 304 1.357151358 2.977059008 1.725415605 2.054123734 305 2.172040349 4.677490848 2.162750759 7.584422406 306 2.12856108 4.80073697 2.191058413 8.165269798 307 1.597383378 3.38269391 1.839210132 3.948162052 308 1.571436916 3.451890496 1.857926397 3.970291914 309 1.669116161 3.728100167 1.930828881 4.514479321 310 1.792870023 3.818920387 1.95420582 5.395716273 311 2.701422654 6.042632834 2.45817673 12.51412704 312 2.724885462 6.056784013 
2.461053436 12.63817968 313 2.649668658 5.96870756 2.44309385 12.34583459 314 1.328012928 3.19047635 1.786190457 2.231089091 315 2.290836238 4.827072968 2.197060074 8.67484801 316 2.375600157 5.495650681 2.344280419 10.05803872 317 1.625886455 3.693369359 1.92181408 5.110178924 318 2.329332455 5.313205979 2.305039258 9.613803477 319 1.515480102 3.456316681 1.859117178 3.185525845 320 1.472454994 3.284663565 1.812364082 3.025291076 321 1.506165026 3.349904087 1.83027432 3.054001182 322 1.473374347 3.306520335 1.81838399 3.25617161 323 1.527068855 3.325036021 1.82346813 3.36729583 324 2.110354575 4.662495253 2.159281189 8.165363632 325 1.523787537 3.514526067 1.874706928 3.062923523 326 2.023599447 4.094344562 2.02344868 6.938769333 327 2.753898938 5.917548864 2.432601255 12.66032792 328 2.617941755 5.678362097 2.382931408 11.46939146 329 2.119034653 4.483002552 2.117310216 8.240121298 330 2.066147705 4.476199805 2.115703147 7.14397299 331 2.101925481 4.630837933 2.151938181 7.659327016 332 2.508777239 5.407171771 2.325332615 11.8987611 333 2.005568244 4.463030419 2.112588559 7.397665697 334 1.726738648 3.759687344 1.938991321 4.430816799 335 1.774901671 3.812975852 1.952684268 4.937562683 336 1.648959883 3.423610976 1.85030024 3.988984047 337 1.777714463 3.797733859 1.948777529 5.403847868 338 1.704136403 3.63758616 1.9072457 5.272486607 339 1.844729114 3.968445871 1.992095849 6.429622699 340 1.768267491 3.797733859 1.948777529 5.523339153 341 2.159320704 4.744410253 2.178166718 8.301035184 342 2.109497906 4.580877493 2.140298459 7.726636028 343 2.521315024 5.573617308 2.360850971 11.60597801 344 2.576758408 5.79269513 2.406801847 11.8427421 345 2.669803365 5.872117789 2.423245301 12.427118 346 2.441001399 5.430134791 2.330264962 11.48863277 347 2.117775721 4.750395438 2.17954019 8.556413905 348 2.023599447 4.553734634 2.133948133 7.34601021 349 2.394268344 5.066826574 2.250961255 9.954988325 350 2.106053393 4.696472344 2.167134593 7.892825526 351 2.100394247 
4.736330019 2.176311103 7.72870183 352 2.160269524 4.922204729 2.21860423 8.255482913 353 2.188997276 4.774912961 2.185157422 9.409191231 354 1.874905013 3.86388263 1.965676125 6.003887067 355 2.061842158 4.182126476 2.045024811 6.745236349 356 1.418864374 3.119939077 1.766334928 2.7631695

    library(ggplot2)

    data <- read.csv("data.csv", header = TRUE)  # the file name must be quoted
    fit.all <- lm(a ~ x + y + z, data = data)
    # append the fit, lwr and upr columns to the data
    b <- data.frame(data, predict(fit.all, interval = "prediction"))

    # plot from b, because lwr and upr exist only there
    ggplot(b, aes(x = x + y + z, y = a)) +
      geom_point(size = 3, shape = 1, col = "black") +
      geom_smooth(method = "lm") +
      geom_line(aes(y = lwr), color = "red", linetype = "dashed") +
      geom_line(aes(y = upr), color = "red", linetype = "dashed") +
      theme_classic()
As Ben suggested, that approach is not going to produce a sensible graphical display. What you could do instead is examine the relationship between each predictor and the outcome separately, holding the predictors not under immediate consideration constant at some chosen level. Here I use their means as those chosen levels.

    # vary x, hold y and z at their means
    data_x.yz <- data.frame(
      x = seq(min(data$x), max(data$x), 0.1),
      y = mean(data$y),
      z = mean(data$z)
    )
    data_x.yz <- cbind(
      data_x.yz,
      predict(fit.all, newdata = data_x.yz, interval = "prediction")
    )
    ggplot(data_x.yz, aes(x, fit, ymin = lwr, ymax = upr)) +
      geom_line(color = "blue") +
      geom_ribbon(fill = NA, color = "red", linetype = "dashed")

    # vary y, hold x and z at their means
    data_y.xz <- data.frame(
      x = mean(data$x),
      y = seq(min(data$y), max(data$y), 0.1),
      z = mean(data$z)
    )
    data_y.xz <- cbind(
      data_y.xz,
      predict(fit.all, newdata = data_y.xz, interval = "prediction")
    )
    ggplot(data_y.xz, aes(y, fit, ymin = lwr, ymax = upr)) +
      geom_line(color = "blue") +
      geom_ribbon(fill = NA, color = "red", linetype = "dashed")

    # vary z over a fixed grid from 1.6 to 2.6, hold x and y at their means
    data_z.yx <- data.frame(
      x = mean(data$x),
      y = mean(data$y),
      z = seq(1.6, 2.6, 0.1)
    )
    data_z.yx <- cbind(
      data_z.yx,
      predict(fit.all, newdata = data_z.yx, interval = "prediction")
    )
    ggplot(data_z.yx, aes(z, fit, ymin = lwr, ymax = upr)) +
      geom_line(color = "blue") +
      geom_ribbon(fill = NA, color = "red", linetype = "dashed")
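The three blocks above repeat one pattern, so it can be wrapped in a small helper. This is only a minimal sketch and not part of the original answer: `predict_over()` and its arguments are hypothetical names, the demo data are simulated rather than taken from the question, and the helper assumes every predictor is a numeric column that enters the formula untransformed.

```r
# Hypothetical helper: vary one predictor over its observed range and hold
# the remaining predictors at their means.
predict_over <- function(fit, data, focal, n = 50) {
  predictors <- all.vars(formula(fit))[-1]  # drop the response variable
  grid <- as.data.frame(
    lapply(data[predictors], function(col) rep(mean(col), n))
  )
  grid[[focal]] <- seq(min(data[[focal]]), max(data[[focal]]), length.out = n)
  # adds the columns fit, lwr and upr
  cbind(grid, predict(fit, newdata = grid, interval = "prediction"))
}

# Simulated stand-in for the question's data (an assumption, not the real values)
set.seed(42)
demo <- data.frame(
  x = runif(100, 1, 3),
  y = runif(100, 2, 6),
  z = runif(100, 1.6, 2.6)
)
demo$a <- 1 + 2 * demo$x + 0.5 * demo$y - demo$z + rnorm(100, sd = 0.3)

fit_demo <- lm(a ~ x + y + z, data = demo)
data_x.yz <- predict_over(fit_demo, demo, "x")
head(data_x.yz)
```

The returned data frame has the same columns as `data_x.yz` in the answer, so it can be passed to the same `ggplot()` call.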
How to use ggplot to plot the trend of four variables in R?
I have a data set that records tumor size at four different time points (each row is one patient). I want to analyze it to show that, across all patients, tumor size decreases after each time point. What kind of analysis can I do? How should I use ggplot to visualize these data and show the trend? Many thanks!

    SUBJECTID Baseline 1 2 3
    1001 88 78 30 14
    1002 29 26 66 16
    1003 50 64 54 46
    1004 91 90 99 43
    1005 98 109 60 42
    1007 100 100 54
    1008 45 49 47 32
    1009 75 66 57 7
    1010 60 52 20 3
    1011 68 68 56 47
    1012 78 84 56 57
    1013 71 70 8 5
    1015 79 50 11 3
    1016 73 60 57 36
    1017 54 27 16
    1018 50 37 33 26
    1019 115 68 33 67
    1021 63 55 0 0
    1022 98 91 76 75
    1024 76 76 0
    1025 47 45 42 42
    1026 32 25 14 0
    1027 40 37 65
    1028 60 110 110 0
A box plot might work. Try the following:

    library(tidyverse)

    df %>%
      gather(key = "time", value = "tumor_size", -SUBJECTID) %>%
      ggplot(aes(time, tumor_size)) +
      geom_boxplot() +
      labs(title = "Tumor Size ~ Time",
           subtitle = "Insert subtitle if you want",
           caption = "Insert caption if you want",
           x = "Time",
           y = "Tumor Size (insert unit)") +
      theme_bw() +
      theme(
        panel.grid.major.x = element_blank(),
        text = element_text(family = "Palatino"),
        plot.title = element_text(face = "bold", size = 20)
      )

You could also add geom_jitter() if you'd like. After the geom_boxplot() + line, add:

    geom_jitter(width = 0.1, pch = 21, fill = "grey") +

You'll get something like this:
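One detail worth checking with this approach: after reshaping, `time` is a character column, so it is sorted alphabetically on the x axis and "Baseline" lands after "1", "2" and "3". A minimal base-R sketch of one way around that, using only the first four rows of the question's table; the names `time_levels` and `long` are illustrative and not from the original answer.

```r
# First four patients from the question's table
df <- data.frame(
  SUBJECTID = c(1001, 1002, 1003, 1004),
  Baseline  = c(88, 29, 50, 91),
  `1`       = c(78, 26, 64, 90),
  `2`       = c(30, 66, 54, 99),
  `3`       = c(14, 16, 46, 43),
  check.names = FALSE  # keep the numeric column names as they are
)

# Chronological order of the measurements (illustrative name)
time_levels <- c("Baseline", "1", "2", "3")

# Long format with time stored as an ordered-by-levels factor
long <- data.frame(
  SUBJECTID  = rep(df$SUBJECTID, times = length(time_levels)),
  time       = factor(rep(time_levels, each = nrow(df)), levels = time_levels),
  tumor_size = unlist(df[time_levels], use.names = FALSE)
)

# Base-graphics equivalent of the box plot, now in chronological order
boxplot(tumor_size ~ time, data = long, xlab = "Time", ylab = "Tumor size")
```

Because `time` is a factor with explicit levels, passing `long` to `ggplot(aes(time, tumor_size))` should keep the same chronological order.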
To show that tumor size is decreasing overall after each time point, you usually want the mean tumor size at each time point. That is much easier to plot than every individual measurement. Here is how to do this using your first four rows, producing a dot graph:

    baseline <- c(88, 29, 50, 91)
    dAC      <- c(78, 26, 64, 90)
    InterReg <- c(30, 66, 54, 99)
    PreSurg  <- c(14, 16, 46, 43)

    # one row per time point (named tumor to avoid masking base::matrix)
    tumor <- rbind(baseline, dAC, InterReg, PreSurg)
    means <- rowMeans(tumor)  # mean tumor size at each time point
    plot(means)

As for what analysis to do, I can't really answer that; it depends on what you want the result to look like. What I've done is the most basic way of representing the data. You may want to use a column graph, a bar graph, a line graph, etc. That's up to your personal preference. For ggplot, the cheat sheet has many different examples you can use: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
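As one possible starting point for the analysis question, the sketch below computes the mean at each time point with `na.rm = TRUE` (some patients in the full table have fewer than four measurements) and runs a paired t-test of baseline against the last time point. It uses only the first four rows from the question, so the test result itself means very little; whether a paired t-test is appropriate at all is an assumption that depends on the study design.

```r
# First four patients from the question's table
df <- data.frame(
  Baseline = c(88, 29, 50, 91),
  `1`      = c(78, 26, 64, 90),
  `2`      = c(30, 66, 54, 99),
  `3`      = c(14, 16, 46, 43),
  check.names = FALSE
)

# Mean tumor size per time point, ignoring missing measurements
time_means <- colMeans(df, na.rm = TRUE)
print(time_means)

# Line plot of the means in chronological order
plot(time_means, type = "b", xaxt = "n", xlab = "Time", ylab = "Mean tumor size")
axis(1, at = seq_along(time_means), labels = names(time_means))

# Paired comparison: each patient's baseline against their last measurement
paired <- t.test(df$Baseline, df$`3`, paired = TRUE)
print(paired)
```

With repeated measurements per patient, a mixed model or a test for trend across all four time points may suit the full data better than a single paired comparison.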
Calculate mean value for each row with interval
I need to calculate the mean value for each row (the mean of each interval). Here is a basic example (maybe someone has a better idea of how to do it):

    M_31_mb <- (15:-15)         # creating a vector value --> small
    M_31 <- cut(M_31_mb, 128)   # getting 128 groups from the small vector
    # M_1_mb <- (1500:-1500)    # creating a vector value
    # M_1 <- cut(M_1_mb, 128)   # getting 128 groups from the vector

I need the mean value for each row/group out of the 128 intervals created in M_1 (I don't actually need the intervals themselves, just their means), and I cannot figure out how to do it. I had a look at the cut2 function from the Hmisc library, but unfortunately there is no option to set the number of intervals into which the vector is cut (although there is an option to get the mean value of the created intervals: levels.mean). I would appreciate any help! Thanks!

Additional info: the cut2 function works well for bigger vectors (M_1_mb). However, when my vector is small (M_31_mb), I get a warning message:

    Warning message:
    In min(xx[xx > upper]) : no non-missing arguments to min; returning Inf

and only 31 groups are created:

    M_31_mb <- (15:-15)  # smaller vector
    M_31 <- table(cut2(M_31_mb, g = 128, levels.mean = TRUE))

where g = number of quantile groups.
like this?

    aggregate(M_1_mb, by = list(M_1), mean)

EDIT: Result

    Group.1 x
    1 (-1.5e+03,-1.48e+03] -1488.5
    2 (-1.48e+03,-1.45e+03] -1465.0
    3 (-1.45e+03,-1.43e+03] -1441.5
    4 (-1.43e+03,-1.41e+03] -1418.0
    5 (-1.41e+03,-1.38e+03] -1394.5
    6 (-1.38e+03,-1.36e+03] -1371.0
    7 (-1.36e+03,-1.34e+03] -1347.5
    8 (-1.34e+03,-1.31e+03] -1324.0
    9 (-1.31e+03,-1.29e+03] -1301.0
    10 (-1.29e+03,-1.27e+03] -1277.5
    11 (-1.27e+03,-1.24e+03] -1254.0
    12 (-1.24e+03,-1.22e+03] -1230.5
    13 (-1.22e+03,-1.2e+03] -1207.0
    14 (-1.2e+03,-1.17e+03] -1183.5
    15 (-1.17e+03,-1.15e+03] -1160.0
    16 (-1.15e+03,-1.12e+03] -1136.5
    17 (-1.12e+03,-1.1e+03] -1113.0
    18 (-1.1e+03,-1.08e+03] -1090.0
    19 (-1.08e+03,-1.05e+03] -1066.5
    20 (-1.05e+03,-1.03e+03] -1043.0
    21 (-1.03e+03,-1.01e+03] -1019.5
    22 (-1.01e+03,-984] -996.0
    23 (-984,-961] -972.5
    24 (-961,-938] -949.0
    25 (-938,-914] -926.0
    26 (-914,-891] -902.5
    27 (-891,-867] -879.0
    28 (-867,-844] -855.5
    29 (-844,-820] -832.0
    30 (-820,-797] -808.5
    31 (-797,-773] -785.0
    32 (-773,-750] -761.5
    33 (-750,-727] -738.0
    34 (-727,-703] -715.0
    35 (-703,-680] -691.5
    36 (-680,-656] -668.0
    37 (-656,-633] -644.5
    38 (-633,-609] -621.0
    39 (-609,-586] -597.5
    40 (-586,-562] -574.0
    41 (-562,-539] -551.0
    42 (-539,-516] -527.5
    43 (-516,-492] -504.0
    44 (-492,-469] -480.5
    45 (-469,-445] -457.0
    46 (-445,-422] -433.5
    47 (-422,-398] -410.0
    48 (-398,-375] -386.5
    49 (-375,-352] -363.0
    50 (-352,-328] -340.0
    51 (-328,-305] -316.5
    52 (-305,-281] -293.0
    53 (-281,-258] -269.5
    54 (-258,-234] -246.0
    55 (-234,-211] -222.5
    56 (-211,-188] -199.0
    57 (-188,-164] -176.0
    58 (-164,-141] -152.5
    59 (-141,-117] -129.0
    60 (-117,-93.8] -105.5
    61 (-93.8,-70.3] -82.0
    62 (-70.3,-46.9] -58.5
    63 (-46.9,-23.4] -35.0
    64 (-23.4,0] -11.5
    65 (0,23.4] 12.0
    66 (23.4,46.9] 35.0
    67 (46.9,70.3] 58.5
    68 (70.3,93.8] 82.0
    69 (93.8,117] 105.5
    70 (117,141] 129.0
    71 (141,164] 152.5
    72 (164,188] 176.0
    73 (188,211] 199.0
    74 (211,234] 222.5
    75 (234,258] 246.0
    76 (258,281] 269.5
    77 (281,305] 293.0
    78 (305,328] 316.5
    79 (328,352] 340.0
    80 (352,375] 363.5
    81 (375,398] 387.0
    82 (398,422] 410.0
    83 (422,445] 433.5
    84 (445,469] 457.0
    85 (469,492] 480.5
    86 (492,516] 504.0
    87 (516,539] 527.5
    88 (539,562] 551.0
    89 (562,586] 574.0
    90 (586,609] 597.5
    91 (609,633] 621.0
    92 (633,656] 644.5
    93 (656,680] 668.0
    94 (680,703] 691.5
    95 (703,727] 715.0
    96 (727,750] 738.5
    97 (750,773] 762.0
    98 (773,797] 785.0
    99 (797,820] 808.5
    100 (820,844] 832.0
    101 (844,867] 855.5
    102 (867,891] 879.0
    103 (891,914] 902.5
    104 (914,938] 926.0
    105 (938,961] 949.0
    106 (961,984] 972.5
    107 (984,1.01e+03] 996.0
    108 (1.01e+03,1.03e+03] 1019.5
    109 (1.03e+03,1.05e+03] 1043.0
    110 (1.05e+03,1.08e+03] 1066.5
    111 (1.08e+03,1.1e+03] 1090.0
    112 (1.1e+03,1.12e+03] 1113.5
    113 (1.12e+03,1.15e+03] 1137.0
    114 (1.15e+03,1.17e+03] 1160.0
    115 (1.17e+03,1.2e+03] 1183.5
    116 (1.2e+03,1.22e+03] 1207.0
    117 (1.22e+03,1.24e+03] 1230.5
    118 (1.24e+03,1.27e+03] 1254.0
    119 (1.27e+03,1.29e+03] 1277.5
    120 (1.29e+03,1.31e+03] 1301.0
    121 (1.31e+03,1.34e+03] 1324.0
    122 (1.34e+03,1.36e+03] 1347.5
    123 (1.36e+03,1.38e+03] 1371.0
    124 (1.38e+03,1.41e+03] 1394.5
    125 (1.41e+03,1.43e+03] 1418.0
    126 (1.43e+03,1.45e+03] 1441.5
    127 (1.45e+03,1.48e+03] 1465.0
    128 (1.48e+03,1.5e+03] 1488.5
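For completeness, here is a self-contained version of the same idea that also covers the small vector from the question. It is a sketch using `tapply()` in place of `aggregate()`: `tapply()` returns one entry per factor level and fills empty intervals with `NA`, whereas `aggregate()` only returns the groups that contain data. The names `group_means` and `small_means` are illustrative.

```r
# Large vector: every one of the 128 intervals contains values
M_1_mb <- 1500:-1500
M_1 <- cut(M_1_mb, 128)
group_means <- tapply(M_1_mb, M_1, mean)
length(group_means)  # one mean per interval
head(group_means, 3)

# Small vector: 31 values cannot fill 128 intervals
M_31_mb <- 15:-15
M_31 <- cut(M_31_mb, 128)
small_means <- tapply(M_31_mb, M_31, mean)
sum(!is.na(small_means))  # number of non-empty intervals

# aggregate() silently drops the empty intervals instead
nrow(aggregate(M_31_mb, by = list(M_31), mean))
```

With only 31 distinct values, at most 31 of the 128 intervals can be non-empty, which matches the 31 groups reported in the question.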
Binning a dataframe with equal frequency of samples
I have binned my data using the cut function:

    breaks <- seq(0, 250, by = 5)
    data <- split(df2, cut(df2$val, breaks))

My split data frame looks like this:

    ... ...
    $`(15,20]`
    val ks_Result c
    15 60 237
    18 70 247
    ... ...
    $`(20,25]`
    val ks_Result c
    21 20 317
    24 10 140
    ... ...

My bins look like this:

    > table(data)
    data
         (0,5]    (5,10]   (10,15]   (15,20]   (20,25]   (25,30]   (30,35]
             0         0         0         7       128      2748      2307
       (35,40]   (40,45]   (45,50]   (50,55]   (55,60]   (60,65]   (65,70]
          1404     11472      1064       536      7389      1008      1714
       (70,75]   (75,80]   (80,85]   (85,90]   (90,95]  (95,100] (100,105]
          2047       700       329      1107       399       376       323
     (105,110] (110,115] (115,120] (120,125] (125,130] (130,135] (135,140]
           314        79      1008        77       474       158       381
     (140,145] (145,150] (150,155] (155,160] (160,165] (165,170] (170,175]
            89       660        15      1090       109       824       247
     (175,180] (180,185] (185,190] (190,195] (195,200] (200,205] (205,210]
          1226       139       531       174      1041       107       257
     (210,215] (215,220] (220,225] (225,230] (230,235] (235,240] (240,245]
            72       671        98       212        70        95        25
     (245,250]
           494

When I take the mean of the bin counts, I get about 900 samples per bin on average:

    > mean(table(data))
    [1] 915.9

I want to tell R to make irregular bins such that each bin contains about 900 samples (e.g. (0, 27] = 900, (27,28.5] = 900, and so on). I found something similar here, which deals with only one variable, not the whole data frame. I also tried the Hmisc package; unfortunately, the bins don't contain equal frequencies:

    library(Hmisc)
    data <- split(df2, cut2(df2$val, g = 30, oneval = TRUE))
    data <- split(df2, cut2(df2$val, m = 1000, oneval = TRUE))
Assuming you want 50 equal-sized buckets (based on your seq statement), you can use something like:

    df <- data.frame(var = runif(500, 0, 100))  # make data
    cut.vec <- cut(
      df$var,
      breaks = quantile(df$var, 0:50/50),  # breaks along 1/50 quantiles
      include.lowest = TRUE
    )
    df.split <- split(df, cut.vec)

Hmisc::cut2 has this option built in as well.
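To check that quantile breaks really give equal-frequency bins when the whole data frame is split, the sketch below applies the same idea to a simulated stand-in for `df2` and counts the rows per bin. The simulated data, the choice of 30 bins and the names `n_bins`, `bins` and `sizes` are assumptions for illustration. The `unique()` call guards against a known failure mode: with heavily tied values, several quantiles can coincide and `cut()` stops with a "'breaks' are not unique" error. After de-duplication the bins are no longer guaranteed to be equal in size.

```r
set.seed(1)
# Simulated stand-in for the question's df2 (assumption, not the real data)
df2 <- data.frame(val = runif(900, 0, 250), ks_Result = rnorm(900))

n_bins <- 30
# Quantile breaks; unique() drops duplicates that tied values can produce
breaks <- unique(quantile(df2$val, probs = seq(0, 1, length.out = n_bins + 1)))
bins <- cut(df2$val, breaks = breaks, include.lowest = TRUE)

# Split the whole data frame, not just the binned column
data <- split(df2, bins)
sizes <- vapply(data, nrow, integer(1))
summary(sizes)
```

For continuous data without ties, every bin should hold roughly `nrow(df2) / n_bins` rows.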
This can be done with the function provided here by Joris Meys:

    EqualFreq2 <- function(x, n) {
      nx <- length(x)
      nrepl <- floor(nx / n)                # base number of values per bin
      nplus <- sample(1:n, nx - nrepl * n)  # bins that receive one extra value
      nrep <- rep(nrepl, n)
      nrep[nplus] <- nrepl + 1
      x[order(x)] <- rep(seq.int(n), nrep)  # replace sorted values by bin index
      x
    }

    data <- split(df2, EqualFreq2(df2$val, 25))
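EqualFreq2 assigns the leftover observations to randomly chosen bins via `sample()`, so repeated runs can produce slightly different splits. If reproducibility matters, a deterministic rank-based variant is one alternative. This is a different technique from the function above, not a rewrite of it, and `equal_freq_rank()` is a hypothetical name; the demo data are simulated.

```r
# Hypothetical deterministic alternative: derive the bin index from the rank
equal_freq_rank <- function(x, n) {
  # ties.method = "first" gives every value a distinct rank from 1 to length(x)
  r <- rank(x, ties.method = "first")
  as.integer(ceiling(r * n / length(x)))
}

set.seed(7)
# Simulated stand-in for the question's df2 (assumption, not the real data)
df2 <- data.frame(val = round(runif(1003, 0, 250)), ks_Result = rnorm(1003))

bin_id <- equal_freq_rank(df2$val, 25)
data <- split(df2, bin_id)
sizes <- vapply(data, nrow, integer(1))
range(sizes)
```

Tied values can end up in neighbouring bins with this approach, which is the price of keeping the bin sizes within one row of each other.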