Barplot using ggplot2 for 4 variables - r

I have data like this:
ID height S1 S2 S3
1 927 0.90695438 0.28872194 0.67114294
2 777 0.20981677 0.71783084 0.74498220
3 1659 0.35813799 0.92339744 0.44001698
4 174 0.44829914 0.67493949 0.11503942
5 1408 0.90642643 0.18593999 0.67564278
6 1454 0.38943930 0.34806716 0.73155952
7 2438 0.51745975 0.12351953 0.48398490
8 1114 0.12523909 0.10811622 0.17104804
9 1642 0.03014575 0.29795320 0.67584853
10 515 0.77180549 0.83819990 0.26298995
11 1877 0.32741508 0.99277109 0.34148083
12 2647 0.38947869 0.43713441 0.21024554
13 845 0.04105275 0.20256457 0.01631959
14 1198 0.36139663 0.96387150 0.37676288
15 2289 0.57097808 0.66038711 0.56230740
16 2009 0.68488024 0.29811683 0.67998461
17 618 0.97111675 0.11926219 0.74538877
18 1076 0.70195881 0.59975160 0.95007272
19 1082 0.01154550 0.12019055 0.16309071
20 2072 0.53553213 0.78843202 0.32475690
21 1610 0.83657146 0.36959607 0.13271604
22 2134 0.80686674 0.95632284 0.63729744
23 1617 0.08093264 0.91357666 0.33092961
24 2248 0.23890930 0.82333634 0.64907957
25 1263 0.96598986 0.31948216 0.30288836
26 518 0.03767233 0.87770033 0.07123327
27 2312 0.91640643 0.80035100 0.66239047
28 2646 0.72622658 0.61135664 0.75960356
29 1650 0.20077621 0.07242114 0.55336017
30 837 0.84020075 0.42158771 0.53927210
31 1467 0.39666235 0.34446560 0.84959232
32 2786 0.39270226 0.75173569 0.65322596
33 1049 0.47255689 0.21875132 0.95088576
34 2863 0.58365691 0.29213397 0.61722305
35 2087 0.35238717 0.35595337 0.49284063
36 2669 0.02847401 0.63196192 0.97600657
37 545 0.99508793 0.89253107 0.49034522
38 1890 0.95755846 0.74403278 0.65517230
39 2969 0.55165118 0.45722242 0.59880179
40 395 0.10195396 0.03609544 0.94756902
41 995 0.23791515 0.56851452 0.36801151
42 2596 0.86009766 0.43901589 0.87818701
43 2334 0.73826129 0.60048445 0.45487507
44 2483 0.49731226 0.95138276 0.49646702
45 1812 0.57992109 0.26943131 0.46061562
46 1476 0.01618339 0.65883839 0.61790820
47 2342 0.47212988 0.07647121 0.60414349
48 2653 0.04238973 0.07128521 0.78587960
49 627 0.46315442 0.37033152 0.55526847
50 925 0.62999477 0.29710220 0.76897834
51 995 0.67324929 0.55107827 0.40428567
52 600 0.08703467 0.36989059 0.51071981
53 711 0.14358380 0.84568953 0.52353644
54 828 0.90847850 0.62079070 0.99279921
55 1776 0.12253259 0.39914002 0.42964742
56 764 0.72886279 0.29966153 0.99601125
57 375 0.95037718 0.38111984 0.78660025
58 694 0.04335591 0.70113494 0.51591063
59 1795 0.01959930 0.94686529 0.50268797
60 638 0.19907246 0.77282832 0.91163748
61 1394 0.50508626 0.21955016 0.26441590
62 1943 0.92638876 0.71611036 0.17385687
63 2882 0.13840169 0.66421796 0.40033126
64 2031 0.16919458 0.70625020 0.53835738
65 1338 0.60662738 0.27962799 0.24496437
66 1077 0.81587669 0.71225050 0.37585096
67 1370 0.84338121 0.66094211 0.58025355
68 1339 0.78807719 0.04101269 0.20895531
69 739 0.01902087 0.06114149 0.80133001
70 2085 0.69808750 0.27976169 0.63880242
71 1240 0.81509312 0.30196772 0.73633076
72 987 0.56840006 0.95661083 0.43881241
73 1720 0.48006288 0.38981872 0.57981238
74 2901 0.16137012 0.37178879 0.25604401
75 1987 0.08925623 0.84314249 0.46371823
76 1876 0.16268237 0.84723500 0.16861486
77 2571 0.02672845 0.31933115 0.61389453
78 2325 0.70962948 0.13250605 0.95810262
79 2503 0.76101818 0.61710912 0.47819473
80 279 0.85747478 0.79130451 0.75115933
81 1381 0.43726582 0.33804871 0.02058322
82 1800 0.41713645 0.90544760 0.17096903
83 2760 0.58564949 0.19755671 0.63996650
84 2949 0.82496758 0.79408518 0.16497848
85 118 0.79313923 0.75460289 0.35472278
86 1736 0.32615257 0.91139485 0.18642647
87 2201 0.95793194 0.32268770 0.89765616
88 750 0.65301961 0.08616947 0.23778386
89 906 0.45867582 0.91120045 0.98494348
90 2202 0.60602188 0.95517383 0.02133074
I want to make a barplot using ggplot2 like this:
In this dataset, height should be on the y-axis, and S1, S2, and S3 should be shown as the colors for each sample.
I have tried the base R function barplot, which gave me the following output. Any suggestions would be appreciated.
barplot(t(as.matrix(examp[,3:5])),col=rainbow(3))

It's not clear to me exactly what you want to plot. You say you want height on the y axis, but the examples you show are all 'filled to the top', implying the same height for each ID. Also, it is not clear what the numbers associated with each sample represent. I am guessing they should be relative weightings for the bar heights.
Assuming you actually want a filled bar plot as in the examples, with the relative sizes of the bars dictated by the sample values, you can do:
library(tidyr)
library(dplyr)
library(ggplot2)
df %>%
  mutate(ID = reorder(ID, S3/(S3 + S2 + S1))) %>%
  pivot_longer(3:5, names_to = "Sample", values_to = "Value") %>%
  ggplot(aes(ID, Value * height, fill = Sample)) +
  geom_col(position = "fill", color = NA) +
  labs(y = "Height") +
  theme_classic() +
  scale_fill_manual(values = c("red", "green", "blue"))
Alternative
df %>%
  arrange(height) %>%                        # sort rows by height
  group_by(height) %>%
  summarize(across(everything(), mean)) %>%  # average rows that share a height
  pivot_longer(3:5, names_to = "Sample", values_to = "Value") %>%
  ggplot(aes(height, Value, fill = Sample, colour = Sample)) +
  geom_smooth(method = "loess", formula = y ~ x, linetype = 2, alpha = 0.2) +
  theme_bw()
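To make the filled bar plot above easier to follow, here is a minimal sketch of the reshaping step on the first three rows of the question's data. The names df_small, long, share and share_h are illustrative and not part of the original answer. The sketch assumes that position = "fill" rescales each bar so its segments sum to 1, which would mean the Value * height weighting leaves the proportions unchanged within a bar.

```r
library(tidyr)
library(dplyr)

# First three rows of the question's data (df_small is an illustrative name)
df_small <- data.frame(
  ID     = 1:3,
  height = c(927, 777, 1659),
  S1     = c(0.90695438, 0.20981677, 0.35813799),
  S2     = c(0.28872194, 0.71783084, 0.92339744),
  S3     = c(0.67114294, 0.74498220, 0.44001698)
)

long <- df_small %>%
  # One row per ID/Sample pair instead of one column per sample
  pivot_longer(S1:S3, names_to = "Sample", values_to = "Value") %>%
  group_by(ID) %>%
  mutate(
    share   = Value / sum(Value),                     # per-bar proportion
    share_h = Value * height / sum(Value * height)    # same, weighted by height
  ) %>%
  ungroup()

print(long)
```

Within each ID, share and share_h agree because height is constant in a bar and cancels out. If the real height should be visible, one option would be position = "stack" with share * height on the y-axis, but whether that matches the intended figure is a guess.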

Related

ggplot_line: label the top 2 peak with X-axis values

I am new to R programming. I am plotting a mass spectrum with ggplot and would like to label the top 2 peaks with their x-axis values (i.e. m). Does anyone know how to achieve that?
Thanks so much for your help!
Here is part of the raw data I used for the ggplot.
m Intensity
1 30001 2.964e+01
2 30002 3.336e+01
3 30003 3.968e+01
4 30004 5.015e+01
5 30005 6.838e+01
6 30006 1.016e+02
7 30007 1.464e+02
8 30008 2.130e+02
9 30009 3.115e+02
10 30010 3.951e+02
11 30011 5.134e+02
12 30012 5.316e+02
13 30013 6.377e+02
14 30014 8.813e+02
15 30015 1.071e+03
16 30016 1.119e+03
17 30017 1.202e+03
18 30018 1.299e+03
19 30019 1.112e+03
20 30020 1.205e+03
21 30021 1.422e+03
22 30022 1.653e+03
23 30023 1.726e+03
24 30024 2.423e+03
25 30025 3.059e+03
26 30026 3.267e+03
27 30027 3.993e+03
28 30028 5.172e+03
29 30029 5.278e+03
30 30030 2.794e+03
31 30031 1.459e+03
32 30032 2.512e+03
33 30033 6.590e+03
34 30034 1.245e+04
35 30035 1.144e+04
36 30036 5.197e+03
37 30037 6.012e+03
38 30038 1.453e+04
39 30039 1.513e+04
40 30040 5.802e+03
41 30041 9.226e+03
42 30042 5.809e+03
43 30043 3.074e+03
44 30044 3.882e+03
45 30045 9.941e+02
46 30046 8.170e+02
47 30047 1.149e+03
48 30048 3.567e+02
49 30049 3.805e+02
50 30050 3.654e+02
51 30051 4.724e+02
52 30052 7.819e+02
53 30053 8.634e+02
54 30054 5.235e+02
55 30055 1.712e+02
56 30056 9.232e+01
57 30057 9.434e+01
58 30058 7.191e+01
59 30059 8.036e+01
60 30060 4.456e+01
61 30061 9.428e+01
62 30062 9.392e+01
63 30063 8.413e+01
64 30064 5.671e+01
65 30065 2.639e+01
66 30066 2.027e+01
67 30067 4.584e+01
68 30068 6.956e+01
69 30069 6.181e+01
70 30070 6.450e+01
71 30071 2.826e+01
72 30072 3.610e+01
73 30073 6.325e+01
74 30074 3.509e+01
75 30075 3.478e+01
76 30076 1.120e+01
77 30077 6.993e+00
78 30078 9.936e+00
79 30079 7.738e+00
80 30080 9.771e+00
81 30081 1.762e+01
82 30082 3.060e+01
83 30083 2.175e+01
84 30084 2.816e+01
85 30085 2.700e+01
86 30086 2.114e+01
87 30087 4.378e+01
88 30088 5.824e+01
89 30089 6.193e+01
90 30090 4.146e+01
91 30091 9.697e+04
92 30092 9.458e+04
93 30093 9.216e+04
94 30094 8.972e+04
95 30095 8.723e+04
96 30096 8.468e+04
97 30097 8.211e+04
98 30098 7.959e+04
99 30099 7.726e+04
100 30100 7.527e+04
101 30101 7.379e+04
102 30102 7.298e+04
103 30103 7.301e+04
104 30104 7.399e+04
105 30105 7.602e+04
106 30106 7.916e+04
107 30107 8.340e+04
108 30108 8.862e+04
109 30109 9.460e+04
110 30110 1.010e+05
111 30111 1.074e+05
112 30112 1.133e+05
113 30113 1.180e+05
114 30114 1.211e+05
115 30115 1.222e+05
116 30116 1.213e+05
117 30117 1.186e+05
118 30118 1.146e+05
119 30119 1.100e+05
120 30120 1.054e+05
121 30121 1.014e+05
122 30122 9.838e+04
123 30123 9.637e+04
124 30124 9.535e+04
125 30125 9.508e+04
126 30126 9.520e+04
127 30127 9.527e+04
128 30128 9.484e+04
129 30129 9.355e+04
130 30130 9.128e+04
131 30131 8.809e+04
132 30132 8.425e+04
133 30133 8.012e+04
134 30134 7.603e+04
135 30135 7.225e+04
136 30136 6.895e+04
137 30137 6.617e+04
138 30138 6.392e+04
139 30139 6.214e+04
140 30140 6.078e+04
141 30141 5.980e+04
142 30142 5.922e+04
143 30143 5.905e+04
144 30144 5.934e+04
145 30145 6.013e+04
146 30146 6.143e+04
147 30147 6.324e+04
148 30148 6.552e+04
149 30149 6.816e+04
150 30150 7.100e+04
151 30151 7.384e+04
152 30152 7.655e+04
153 30153 7.904e+04
154 30154 8.132e+04
155 30155 8.353e+04
156 30156 8.595e+04
157 30157 8.896e+04
158 30158 9.302e+04
159 30159 9.864e+04
160 30160 1.063e+05
161 30161 1.165e+05
162 30162 1.293e+05
163 30163 1.443e+05
164 30164 1.605e+05
165 30165 1.759e+05
166 30166 1.883e+05
167 30167 1.957e+05
168 30168 1.969e+05
169 30169 1.921e+05
170 30170 1.824e+05
171 30171 1.693e+05
172 30172 1.544e+05
173 30173 1.390e+05
174 30174 1.241e+05
175 30175 1.102e+05
176 30176 9.755e+04
177 30177 8.644e+04
178 30178 7.692e+04
179 30179 6.900e+04
180 30180 6.262e+04
181 30181 5.766e+04
182 30182 5.397e+04
183 30183 5.137e+04
184 30184 4.972e+04
185 30185 4.889e+04
186 30186 4.881e+04
187 30187 4.940e+04
188 30188 5.059e+04
189 30189 5.230e+04
190 30190 5.444e+04
191 30191 5.690e+04
192 30192 5.960e+04
193 30193 6.244e+04
194 30194 6.539e+04
195 30195 6.842e+04
196 30196 7.153e+04
197 30197 7.471e+04
198 30198 7.795e+04
199 30199 8.118e+04
200 30200 8.430e+04
201 30201 8.719e+04
202 30202 8.976e+04
203 30203 9.193e+04
204 30204 9.364e+04
205 30205 9.480e+04
206 30206 9.531e+04
207 30207 9.504e+04
208 30208 9.391e+04
209 30209 9.189e+04
210 30210 8.912e+04
211 30211 8.587e+04
212 30212 8.251e+04
213 30213 7.939e+04
214 30214 7.680e+04
215 30215 7.492e+04
216 30216 7.381e+04
217 30217 7.349e+04
218 30218 7.394e+04
219 30219 7.510e+04
220 30220 7.690e+04
221 30221 7.919e+04
222 30222 8.174e+04
223 30223 8.425e+04
224 30224 8.637e+04
225 30225 8.776e+04
226 30226 8.826e+04
227 30227 8.788e+04
228 30228 8.690e+04
229 30229 8.569e+04
230 30230 8.465e+04
231 30231 8.405e+04
232 30232 8.398e+04
233 30233 8.434e+04
234 30234 8.494e+04
235 30235 8.554e+04
236 30236 8.598e+04
237 30237 8.623e+04
238 30238 8.638e+04
239 30239 8.665e+04
240 30240 8.736e+04
241 30241 8.884e+04
242 30242 9.147e+04
243 30243 9.559e+04
244 30244 1.016e+05
245 30245 1.097e+05
246 30246 1.200e+05
247 30247 1.321e+05
Here is my code for ggplot:
ggplot(data=raw.1) +
geom_line(mapping = aes(x=m, y=Intensity))
Below is the ggplot output:
I would do it this way. My solution requires the ggrepel package as well as some dplyr functions. The key to this working is that you can set data = for each geom_ layer in ggplot2. The geom_text_repel() layer from ggrepel ensures that the labels will not overlap your data from geom_line().
library(ggplot2)
library(dplyr)
library(ggrepel)
ggplot(mapping = aes(x = m, y = Intensity, label = m)) +
  geom_line(data = raw.1) +
  geom_text_repel(data = raw.1 %>%
                    arrange(desc(Intensity)) %>%  # arranges in descending order
                    slice_head(n = 2))            # only keeps the top two intensities
My plot does not look like yours since you only shared the first 247 data points. As a chemist, I have some idea of what you hope to accomplish, and I suspect that this initial solution might not work for you. This approach labels the two highest intensities, not necessarily the two tallest peaks. We need to identify all local maxima and then select the two tallest.
Here is how we do that. The following code calculates the slope between consecutive points, looks for points where a positive slope changes to a negative slope (a local maximum), and then sorts and selects the top two by intensity.
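top_two <- raw.1 %>%
  mutate(deriv = Intensity - lag(Intensity),   # slope from the previous point
         # local maximum: slope is non-negative here and negative at the next point
         is_max = case_when(deriv >= 0 & lead(deriv) < 0 ~ TRUE,
                            TRUE ~ FALSE)) %>%
  filter(is_max) %>%
  arrange(desc(Intensity)) %>%
  slice_head(n = 2)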
Let's modify the original plot code to put this in.
ggplot(mapping = aes(x = m, y = Intensity, label = m)) +
geom_line(data = raw.1) +
geom_text_repel(data = top_two, nudge_y = 1e4)
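The difference between "the two highest intensities" and "the two tallest peaks" is easier to see on a small made-up series. The sketch below uses a hypothetical nine-point tibble named toy (not the question's data), and the helper column is_peak is an illustrative name.

```r
library(dplyr)

# Made-up spectrum: one tall peak at m = 3 and a smaller one at m = 7
toy <- tibble(m = 1:9, Intensity = c(1, 5, 9, 8, 2, 3, 6, 4, 1))

# Two highest intensities: both sit on the same peak (m = 3 and its shoulder m = 4)
top_two_points <- toy %>%
  arrange(desc(Intensity)) %>%
  slice_head(n = 2)

# Two tallest local maxima: slope changes from non-negative to negative
top_two_peaks <- toy %>%
  mutate(deriv   = Intensity - lag(Intensity),
         is_peak = !is.na(deriv) & deriv >= 0 &
                   !is.na(lead(deriv)) & lead(deriv) < 0) %>%
  filter(is_peak) %>%
  arrange(desc(Intensity)) %>%
  slice_head(n = 2)

print(top_two_points$m)
print(top_two_peaks$m)
```

The first selection picks m = 3 and m = 4, which belong to the same peak, while the second picks m = 3 and m = 7, one per peak.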
Data:
raw.1 <- structure(list(m = c(30001, 30002, 30003, 30004, 30005, 30006,
30007, 30008, 30009, 30010, 30011, 30012, 30013, 30014, 30015,
30016, 30017, 30018, 30019, 30020, 30021, 30022, 30023, 30024,
30025, 30026, 30027, 30028, 30029, 30030, 30031, 30032, 30033,
30034, 30035, 30036, 30037, 30038, 30039, 30040, 30041, 30042,
30043, 30044, 30045, 30046, 30047, 30048, 30049, 30050, 30051,
30052, 30053, 30054, 30055, 30056, 30057, 30058, 30059, 30060,
30061, 30062, 30063, 30064, 30065, 30066, 30067, 30068, 30069,
30070, 30071, 30072, 30073, 30074, 30075, 30076, 30077, 30078,
30079, 30080, 30081, 30082, 30083, 30084, 30085, 30086, 30087,
30088, 30089, 30090, 30091, 30092, 30093, 30094, 30095, 30096,
30097, 30098, 30099, 30100, 30101, 30102, 30103, 30104, 30105,
30106, 30107, 30108, 30109, 30110, 30111, 30112, 30113, 30114,
30115, 30116, 30117, 30118, 30119, 30120, 30121, 30122, 30123,
30124, 30125, 30126, 30127, 30128, 30129, 30130, 30131, 30132,
30133, 30134, 30135, 30136, 30137, 30138, 30139, 30140, 30141,
30142, 30143, 30144, 30145, 30146, 30147, 30148, 30149, 30150,
30151, 30152, 30153, 30154, 30155, 30156, 30157, 30158, 30159,
30160, 30161, 30162, 30163, 30164, 30165, 30166, 30167, 30168,
30169, 30170, 30171, 30172, 30173, 30174, 30175, 30176, 30177,
30178, 30179, 30180, 30181, 30182, 30183, 30184, 30185, 30186,
30187, 30188, 30189, 30190, 30191, 30192, 30193, 30194, 30195,
30196, 30197, 30198, 30199, 30200, 30201, 30202, 30203, 30204,
30205, 30206, 30207, 30208, 30209, 30210, 30211, 30212, 30213,
30214, 30215, 30216, 30217, 30218, 30219, 30220, 30221, 30222,
30223, 30224, 30225, 30226, 30227, 30228, 30229, 30230, 30231,
30232, 30233, 30234, 30235, 30236, 30237, 30238, 30239, 30240,
30241, 30242, 30243, 30244, 30245, 30246, 30247), Intensity = c(29.64,
33.36, 39.68, 50.15, 68.38, 101.6, 146.4, 213, 311.5, 395.1,
513.4, 531.6, 637.7, 881.3, 1071, 1119, 1202, 1299, 1112, 1205,
1422, 1653, 1726, 2423, 3059, 3267, 3993, 5172, 5278, 2794, 1459,
2512, 6590, 12450, 11440, 5197, 6012, 14530, 15130, 5802, 9226,
5809, 3074, 3882, 994.1, 817, 1149, 356.7, 380.5, 365.4, 472.4,
781.9, 863.4, 523.5, 171.2, 92.32, 94.34, 71.91, 80.36, 44.56,
94.28, 93.92, 84.13, 56.71, 26.39, 20.27, 45.84, 69.56, 61.81,
64.5, 28.26, 36.1, 63.25, 35.09, 34.78, 11.2, 6.993, 9.936, 7.738,
9.771, 17.62, 30.6, 21.75, 28.16, 27, 21.14, 43.78, 58.24, 61.93,
41.46, 96970, 94580, 92160, 89720, 87230, 84680, 82110, 79590,
77260, 75270, 73790, 72980, 73010, 73990, 76020, 79160, 83400,
88620, 94600, 101000, 107400, 113300, 118000, 121100, 122200,
121300, 118600, 114600, 110000, 105400, 101400, 98380, 96370,
95350, 95080, 95200, 95270, 94840, 93550, 91280, 88090, 84250,
80120, 76030, 72250, 68950, 66170, 63920, 62140, 60780, 59800,
59220, 59050, 59340, 60130, 61430, 63240, 65520, 68160, 71000,
73840, 76550, 79040, 81320, 83530, 85950, 88960, 93020, 98640,
106300, 116500, 129300, 144300, 160500, 175900, 188300, 195700,
196900, 192100, 182400, 169300, 154400, 139000, 124100, 110200,
97550, 86440, 76920, 69000, 62620, 57660, 53970, 51370, 49720,
48890, 48810, 49400, 50590, 52300, 54440, 56900, 59600, 62440,
65390, 68420, 71530, 74710, 77950, 81180, 84300, 87190, 89760,
91930, 93640, 94800, 95310, 95040, 93910, 91890, 89120, 85870,
82510, 79390, 76800, 74920, 73810, 73490, 73940, 75100, 76900,
79190, 81740, 84250, 86370, 87760, 88260, 87880, 86900, 85690,
84650, 84050, 83980, 84340, 84940, 85540, 85980, 86230, 86380,
86650, 87360, 88840, 91470, 95590, 101600, 109700, 120000, 132100
)), row.names = c(NA, -247L), class = c("tbl_df", "tbl", "data.frame"
))
This approach treats your x-axis as discrete values of a continuous variable and finds the local maxima based on the 2nd derivative, using code from "Finding local maxima and minima".
The rest of the plotting is similar to Ben Norris's answer, using geom_text_repel() to label the points of interest.
Also, as noted, the data you provided differ from the figure in your question.
library(ggplot2)
library(ggrepel)
# find local maxima aka peaks
local_maximas <- raw.1[which(diff(sign(diff(raw.1$Intensity)))==-2)+1,]
top2 <- tail(local_maximas[order(local_maximas$Intensity),],2) #subset of top 2 highest peaks
raw.1$label <- ifelse(raw.1$m %in% top2$m, raw.1$m, NA) #make labels for plot
ggplot(data = raw.1) +
geom_line(aes(x=m, y=Intensity)) +
geom_text_repel(aes(x = m, y = Intensity, label = label))
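For readers unfamiliar with the diff(sign(diff(...))) idiom used above, here is a step-by-step sketch in base R on a made-up vector. The names x, d, s, ds, peak_idx and flat are illustrative only.

```r
# Made-up series with peaks at positions 3 and 7
x <- c(1, 5, 9, 8, 2, 3, 6, 4, 1)

d  <- diff(x)   # first differences: rising (> 0) or falling (< 0)
s  <- sign(d)   # +1 while rising, -1 while falling, 0 when flat
ds <- diff(s)   # -2 exactly where a rise (+1) is followed by a fall (-1)

# +1 because each diff() shifts positions one step to the left
peak_idx <- which(ds == -2) + 1
print(peak_idx)

# Caveat: a flat-topped peak gives signs +1, 0, -1, so no step equals -2
flat <- which(diff(sign(diff(c(1, 3, 3, 1)))) == -2) + 1
print(length(flat))
```

The flat-top case shows one limitation: if two neighbouring points share the same maximum intensity, this test does not flag either of them, so real spectra with plateaus may need a different rule.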

Why are my 95% confidence intervals of my multivariate regression being plotted as a loess line?

I've been trying to plot a 95% prediction interval for a multivariate regression line in ggplot2. The graph is a regression of three independent variables ("x", "y", and "z") being used to predict a dependent variable ("a") on the y-axis. However, when I actually try to plot the results in ggplot2, I get a rather unusual result where the regression line is straightforward but the 95% prediction interval bands are very squiggly and do not resemble a straight line at all. They look like loess lines more than anything. Here is a picture showing the result I get:
Does anyone know why I am getting a result where the 95% confidence intervals aren't smooth lines? The only thing I can think of is that this is related to the fact that this is a multivariate regression rather than a univariate one, but when I check the variables, all three show a strong correlation with the dependent variable (r2 > 0.95). I looked up plots of multivariate regressions with 95% confidence intervals, but none of them resembled my result; they all seemed to have fairly smooth lines.
I tried fitting a method="lm" into the predict() call of my code following this question, but that did not work either.
Below is a dataset and code that replicates this result.
x y z a
1 2.366153239 5.420534999 2.328204243 10.55858156
2 1.431094272 2.975529566 1.724972338 2.533696814
3 2.60453538 5.75827066 2.399639694 11.48783737
4 2.483771412 5.470167623 2.338838948 10.74706177
5 1.971210737 4.287715955 2.070680071 7.334766592
6 2.5596573 5.558000525 2.357541203 11.6127708
7 2.177892158 4.730480377 2.174966753 8.631949429
8 1.49665751 3.203559121 1.78984891 3.020424886
9 2.728865195 6.376658918 2.525204728 12.51412704
10 1.908668224 4.025351691 2.006327912 6.593044534
11 1.978895443 4.24563401 2.060493633 7.402451521
12 1.627855104 3.344274234 1.828735693 3.731699451
13 1.53436705 3.350605596 1.83046595 3.170525564
14 2.448831586 5.585936937 2.363458681 10.76866329
15 2.443160968 5.331752143 2.309058714 10.58310613
16 2.156078216 4.417635062 2.101817086 8.109576771
17 1.931534652 4.249610334 2.061458303 6.790693233
18 1.452715015 3.225752129 1.796037897 3.356200016
19 1.729354145 3.683866912 1.919340228 5.420225217
20 1.861239059 3.912023005 1.977883466 6.267750682
21 1.822955174 3.804437795 1.950496807 5.991464547
22 2.113126565 4.492001488 2.119434238 8.114076324
23 2.171856126 4.662613282 2.159308519 7.806138626
24 1.391215895 3.010620886 1.735114084 2.461296784
25 1.319165859 2.895911938 1.701737917 2.055404964
26 2.034006688 4.322608316 2.079088338 6.977001452
27 2.85574569 6.160996329 2.482135437 14.34613881
28 1.411579618 3.097385927 1.759939183 2.613006652
29 2.576957482 6.029643051 2.45553315 11.91628836
30 1.796913834 3.923259637 1.980721999 5.911392672
31 2.024389004 4.345833727 2.084666335 8.022132643
32 1.63435577 3.493472658 1.869083374 3.515715835
33 1.584595569 3.453157121 1.858267236 3.397523976
34 1.881578895 4.030076005 2.00750492 6.011267174
35 1.728309802 3.752101123 1.937034105 5.225370259
36 1.414715557 3.140049044 1.772018353 2.736961545
37 1.488730081 3.116621591 1.76539559 2.902519892
38 1.522138034 3.257327011 1.804806641 2.890371758
39 1.800033345 3.987130478 1.996780027 5.640594153
40 1.794222122 4.143928062 2.035664035 6.206575927
41 2.676710091 6.289901082 2.50796752 13.49805633
42 2.328582719 5.13691546 2.266476442 9.430961545
43 2.484723966 5.458712793 2.336388836 10.7561993
44 2.287108375 4.856940066 2.203846652 9.917240545
45 2.417128932 5.582744146 2.362783136 10.54534144
46 2.328332495 5.105945474 2.259633925 9.840475333
47 2.362264634 5.293304825 2.300718328 9.848820151
48 2.28292536 5.018934097 2.24029777 9.269934816
49 1.449825221 3.006177531 1.73383319 3.121042465
50 2.211679876 4.692264893 2.166163635 8.631218063
51 2.704614597 6.072756474 2.464296345 12.31992499
52 2.48097622 5.43590303 2.331502312 11.2245765
53 1.497529983 3.380994674 1.838748127 3.752088968
54 2.696365396 5.825540285 2.413615604 12.36222133
55 2.165729837 4.666265285 2.160153996 8.455875079
56 2.410978268 5.417499423 2.327552239 10.08813972
57 2.185447829 4.991792206 2.234231905 9.215327913
58 2.041898307 4.22566518 2.055642279 7.418180823
59 2.099077244 4.375757022 2.091831021 7.696212639
60 2.000032635 4.234467391 2.057782153 7.110696123
61 2.025963678 4.260852439 2.064183238 6.851163763
62 2.083395224 4.351567427 2.08604109 7.884576511
63 1.981523362 4.318820559 2.07817722 7.43543802
64 2.033235038 4.336636932 2.082459347 7.313220387
65 1.423999144 3.206803244 1.790754937 2.564949357
66 2.217982257 4.825910853 2.196795587 8.920558764
67 1.240285111 2.808498672 1.675857593 1.568615918
68 2.215837149 5.041487758 2.245325758 8.802372134
69 2.134859238 4.731890939 2.175291001 8.132101136
70 2.306998207 5.059171458 2.249260202 9.336074756
71 1.896404791 4.104681782 2.026001427 6.445449942
72 1.922935417 4.151905673 2.037622554 6.818169682
73 2.111422924 4.716264233 2.171696165 8.366370302
74 2.28264494 4.852811209 2.202909714 9.210340372
75 2.190760504 4.574710979 2.1388574 8.447427164
76 2.037589062 4.275276265 2.06767412 6.989197008
77 1.717192759 3.810543836 1.952061433 4.610157727
78 1.876769266 4.043051268 2.010734012 6.306275287
79 2.030134158 4.579339426 2.139939117 7.715792425
80 1.93577016 4.356708827 2.08727306 6.788521191
81 2.056518774 4.445588116 2.108456335 7.636510887
82 2.120080841 4.615120517 2.148283156 7.916807491
83 2.232689054 4.861361591 2.204849562 8.694167142
84 2.181147406 4.782479201 2.186888017 8.854567878
85 2.92779884 6.305666829 2.511108685 13.9593635
86 1.860080456 4.459637473 2.111785376 6.163314804
87 1.913818428 4.602767301 2.145406092 7.174915716
88 1.877883958 4.594104966 2.143386332 6.335054251
89 1.994987686 4.632100752 2.152231575 7.707952547
90 2.14756511 5.023880521 2.241401464 9.161721393
91 1.503591471 3.687628672 1.92031994 4.280824129
92 1.4536743 3.579343567 1.891915317 3.761200116
93 1.50872427 3.584888833 1.893380266 4.106767082
94 1.537573733 3.649466946 1.910357806 4.126327608
95 1.934796461 4.373238129 2.091228856 7.584097036
96 1.526250724 3.248434627 1.802341429 3.228826156
97 1.606399474 3.500439216 1.870946075 4.939855112
98 1.943162189 4.329208633 2.080675043 6.460498957
99 1.963384107 4.353112625 2.086411423 6.649308332
100 2.183124049 4.711248626 2.170541091 8.474527832
101 1.640763809 3.543853682 1.882512598 3.832330237
102 1.659456682 3.523415014 1.877076188 3.997282849
103 1.436096958 3.166318574 1.779415234 2.839078464
104 2.428955194 4.91133048 2.216152179 10.44793169
105 2.668500746 6.154858094 2.480898646 12.73883098
106 2.676812229 6.178980921 2.485755604 12.64109656
107 2.126920019 4.640923356 2.154280241 8.600833727
108 1.878254881 4.025530246 2.00637241 6.253828812
109 2.242102174 4.726797674 2.174119977 8.29404964
110 1.676813632 3.822754538 1.955186574 5.370638028
111 1.874531192 4.17438727 2.043131731 7.265087007
112 1.998637301 4.2363594 2.058241822 6.722389092
113 1.944116978 4.159527009 2.039491851 6.038562805
114 2.308184503 5.192956851 2.278806014 9.36048303
115 2.042370888 4.49535532 2.120225299 7.320526962
116 2.015621187 4.318820559 2.07817722 7.081078135
117 1.81401665 4.146304301 2.036247603 6.492542819
118 1.676813632 3.87937827 1.969613736 5.221868194
119 2.807346477 6.428545769 2.535457704 13.72308897
120 1.621259207 3.543853682 1.882512598 4.162470391
121 1.50100345 3.321793359 1.822578766 3.106378794
122 1.582428764 3.464319806 1.861268333 4.143134726
123 1.654547625 3.591817741 1.895209155 4.509649984
124 2.332936461 4.937777822 2.222111118 9.398917323
125 2.498105588 5.513601542 2.348105948 11.29414737
126 1.890319403 3.887730313 1.97173282 5.847161058
127 1.804890841 3.940999114 1.985194981 6.17864926
128 2.096209309 4.6042388 2.145749007 7.788418833
129 2.047658751 4.337290741 2.082616321 7.612336837
130 2.680572077 5.989462544 2.447337848 12.15745472
131 2.333554566 5.407171771 2.325332615 10.44467195
132 2.212180997 4.932817886 2.220994797 8.881836305
133 1.478852439 3.063390922 1.750254531 2.890371758
134 1.648334702 3.518387649 1.875736562 4.141546164
135 2.307921185 4.90823336 2.215453308 9.305650552
136 2.13384989 4.645130271 2.155256428 8.018790088
137 1.728309802 3.555348061 1.885563062 4.941642423
138 1.691821236 3.556775613 1.885941572 4.886582645
139 1.746238611 3.891820298 1.972769702 5.363543151
140 1.679155631 3.642966397 1.908655652 4.754882459
141 1.94348069 4.156536582 2.038758589 6.277601677
142 1.549402462 3.250374492 1.8028795 3.342508385
143 1.856975574 4.232023463 2.057188242 6.413458957
144 2.529503815 5.684310793 2.38417927 11.22830537
145 2.035545742 4.643428898 2.154861689 7.244227516
146 2.467132416 5.697093487 2.386858497 11.50287513
147 2.298324686 4.870031331 2.206814748 9.286469586
148 1.937388065 4.34601078 2.0847088 7.322972679
149 1.956955486 4.536730733 2.129960266 7.739019572
150 2.036823984 4.518958489 2.125784206 8.594154233
151 1.972996546 4.529692045 2.128307319 7.967481199
152 1.58746864 3.283839256 1.812136655 3.314186005
153 1.521311054 3.464922216 1.861430153 3.681603045
154 2.44446969 5.445011746 2.333454895 10.3609124
155 2.294121109 4.731979033 2.17531125 9.105210941
156 3.126345733 6.927557906 2.632025438 15.6772624
157 1.867746396 4.253056253 2.06229393 6.32459191
158 1.839082858 4.029806041 2.00743768 5.382980154
159 2.127330896 4.844974178 2.201130205 7.863266724
160 2.404523583 5.236441963 2.288327329 10.04902409
161 2.262955985 4.845642719 2.201282063 9.034969801
162 2.253418218 4.727387819 2.174255693 9.130463484
163 2.302083991 5.167955549 2.273313781 10.06411762
164 2.192165626 4.835011259 2.198865903 9.262695602
165 1.672685332 3.734489965 1.93248285 4.565493369
166 1.568460311 3.539508997 1.881358285 3.52282487
167 1.609819887 3.523868735 1.877197042 3.920784511
168 1.616583967 3.587676949 1.894116403 4.394572604
169 1.643301653 3.654700957 1.911727218 3.912023005
170 1.621923158 3.581532841 1.892493815 3.891820298
171 2.090637708 4.527208645 2.127723818 8.536995819
172 2.109497906 4.585222548 2.141313277 8.203668045
173 2.03091153 4.429625613 2.104667578 7.785783239
174 2.09487893 4.582924577 2.140776629 8.204589814
175 2.040382454 4.335786342 2.08225511 6.632541816
176 2.312894869 5.342334252 2.311349011 9.798127037
177 1.430087263 3.148453361 1.774388165 2.939161922
178 2.293711966 4.871098263 2.20705647 9.392661929
179 2.391075023 4.894101478 2.212261621 9.375295332
180 2.517077345 5.718436483 2.391325257 11.47221284
181 1.989024673 4.154969184 2.038374152 6.872128101
182 2.02016078 4.294014757 2.072200463 7.403304815
183 1.797360845 4.076689627 2.019081382 5.90560705
184 1.705239225 3.931825633 1.982883162 5.697965589
185 1.471533812 3.312439025 1.820010721 3.529590596
186 1.438083095 3.346917175 1.829458164 3.533978493
187 1.619261465 3.559624618 1.886696748 4.109233175
188 1.609819887 3.6558396 1.912025 4.166355098
189 2.346796539 5.146965796 2.26869253 9.872567414
190 1.784208279 3.519720884 1.876091918 4.879539029
191 1.832126365 3.811539467 1.952316436 5.259368616
192 1.677986168 3.452840615 1.858182073 3.885884348
193 1.966109701 4.163870625 2.04055645 6.526348436
194 1.701367309 3.828641396 1.956691441 4.605170186
195 1.931534652 4.279440046 2.06868075 6.927802974
196 1.36183801 3.102342009 1.761346646 2.645465326
197 2.432819556 5.883322388 2.425556099 10.46486408
198 2.078341803 4.564943223 2.136572775 7.650468513
199 1.432099112 3.171155089 1.780773733 2.931193752
200 2.174427741 4.839451482 2.199875333 8.482392615
201 2.16404302 4.710430697 2.170352666 8.620246046
202 1.738643812 3.737669618 1.933305361 5.834810737
203 2.303817478 5.000921602 2.236274044 9.718344619
204 1.741189967 3.731819205 1.931791708 5.090062428
205 1.794671893 3.904293207 1.975928442 5.247024072
206 1.757635562 3.857777991 1.964122703 5.006560336
207 1.676226207 3.66137978 1.913473224 4.566637236
208 1.77911412 3.86388263 1.965676125 5.669260041
209 2.059914227 4.564348191 2.136433521 7.695152987
210 1.32424147 3.104586678 1.761983734 2.182674796
211 1.604334732 3.751518852 1.936883799 4.85787254
212 1.662497734 3.79739748 1.948691222 5.073109185
213 1.44885795 3.04690056 1.745537327 2.907447359
214 2.487551021 5.598973005 2.366214911 10.97673998
215 2.438166592 5.528436532 2.351262753 10.75773968
216 1.892477044 4.164647686 2.040746845 7.15334893
217 1.520482581 3.272335343 1.80895974 3.424588334
218 2.488969385 5.681996883 2.383693957 10.74868607
219 2.215837149 4.53044664 2.128484588 7.620705087
220 2.442786243 5.526780079 2.350910479 10.69919132
221 2.570602875 5.907702431 2.430576563 11.59161344
222 2.608344119 6.053264948 2.460338381 12.33182385
223 2.524368131 5.738731256 2.395564914 11.20612853
224 1.539964086 3.38269391 1.839210132 3.571221411
225 1.541550744 3.476614021 1.864568052 3.523119986
226 2.111209474 4.695924549 2.167008202 8.126284621
227 1.910391851 4.139955073 2.034687955 6.467590025
228 2.801971864 6.015864434 2.452725919 13.0280527
229 2.616209119 5.780126041 2.404189269 11.53329656
230 2.570130461 5.673975975 2.38201091 10.97701107
231 2.545595117 5.629669374 2.372692431 11.14107887
232 2.618299253 5.800606659 2.408444863 11.97035031
233 2.443348195 5.385412073 2.320649063 10.85417971
234 2.385152788 5.279188197 2.297648406 10.67131308
235 2.512400994 5.685007319 2.384325338 11.58593194
236 2.39352554 5.12693575 2.26427378 10.4590302
237 1.823796962 3.992680908 1.998169389 6.109247583
238 1.768267491 3.745968421 1.935450444 5.260096154
239 2.376820756 5.302583255 2.302733865 10.4487146
240 2.042402374 4.477336814 2.115971837 7.810068783
241 2.159700495 4.673996377 2.161942732 8.189916149
242 1.948229832 4.378018613 2.092371528 6.932447892
243 1.330510703 3.059880093 1.749251295 2.083184528
244 1.464097665 3.342685111 1.828301154 3.072693315
245 1.446917352 3.196630216 1.787912251 2.829087196
246 2.082252099 4.60990894 2.14706985 8.075582637
247 1.933494729 4.136126096 2.033746812 7.003065459
248 1.840298976 3.949126093 1.987240824 7.056175284
249 1.649584193 3.645188765 1.909237745 4.51129897
250 1.778648064 3.883623531 1.97069113 5.09681299
251 2.526339825 5.903056741 2.429620699 11.66907415
252 2.512244141 5.734958092 2.394777253 10.93748043
253 1.947599667 4.356708827 2.08727306 6.514712691
254 2.181687439 4.946274535 2.224022153 8.799405331
255 2.109497906 4.510859507 2.123878411 8.132101136
256 1.831713667 4.188138442 2.046494183 6.109247583
257 1.5517319 3.446807893 1.856558077 3.765840495
258 2.47549747 5.727881894 2.393299374 10.78967984
259 1.96580772 4.156693187 2.038796995 6.229496711
260 1.978602442 4.21508618 2.053067505 7.258412151
261 2.064000486 4.339901708 2.083243075 7.670717659
262 2.117775721 4.510639702 2.123826665 7.731676304
263 2.221912965 4.838923916 2.199755422 8.877208949
264 1.940925986 4.266896327 2.065646709 6.450865289
265 2.040382454 4.579852378 2.140058966 7.857666456
266 2.173143952 4.666735542 2.160262841 8.561717125
267 2.240859653 4.901564199 2.21394765 8.808442394
268 1.888874933 4.080921542 2.02012909 6.163314804
269 1.845529749 4.082609306 2.020546784 6.885284696
270 2.238519604 4.984229093 2.23253871 8.987910316
271 2.393206767 5.29338648 2.300736073 10.70491521
272 2.702044102 5.884714177 2.425842983 12.3883942
273 2.219296721 4.854631045 2.203322728 9.263449766
274 1.96161829 4.090838423 2.022582118 6.993932975
275 2.00561407 4.171305603 2.042377439 7.324270223
276 2.467836387 5.578051269 2.361789844 10.8016414
277 1.390119244 3.100092289 1.760707894 2.379546134
278 1.365322726 3.044760505 1.744924212 2.401525041
279 1.598782218 3.516726026 1.875293584 4.234975692
280 1.94538671 4.131961426 2.032722663 6.199494461
281 2.172592522 4.89858579 2.213274902 8.804952261
282 1.908668224 4.102312732 2.025416681 6.374172668
283 1.944434766 4.112266337 2.027872367 6.54672802
284 1.58387445 3.505557397 1.872313381 3.941581808
285 1.743721514 3.832670536 1.95772075 5.11349268
286 1.592453126 3.549329989 1.883966557 4.871143315
287 1.283414418 2.79971739 1.673235605 1.54329811
288 1.320439849 2.90690106 1.704963654 2.070653036
289 1.194572818 2.708716646 1.645817926 1.184789985
290 1.231175294 2.681021529 1.637382524 1.115141591
291 1.365322726 3.074795481 1.753509476 2.254444718
292 1.408422528 3.000719815 1.732258588 2.422144328
293 2.184734225 4.886582645 2.210561613 8.803574418
294 2.030652566 4.649187071 2.156197364 7.901007052
295 1.890679763 4.02356438 2.005882444 6.212726329
296 1.855414729 4.027135813 2.006772486 5.858647185
297 1.819146836 3.737669618 1.933305361 5.340274716
298 1.51380043 3.337192052 1.826798306 3.514823642
299 1.923936518 4.162158962 2.040136996 6.485993092
300 2.54480266 5.875913394 2.42402834 11.44989333
301 2.015083881 4.471638793 2.114624977 7.725330038
302 1.902054478 4.30514559 2.074884476 7.141300544
303 1.932189012 4.149463861 2.037023284 7.112433389
304 1.357151358 2.977059008 1.725415605 2.054123734
305 2.172040349 4.677490848 2.162750759 7.584422406
306 2.12856108 4.80073697 2.191058413 8.165269798
307 1.597383378 3.38269391 1.839210132 3.948162052
308 1.571436916 3.451890496 1.857926397 3.970291914
309 1.669116161 3.728100167 1.930828881 4.514479321
310 1.792870023 3.818920387 1.95420582 5.395716273
311 2.701422654 6.042632834 2.45817673 12.51412704
312 2.724885462 6.056784013 2.461053436 12.63817968
313 2.649668658 5.96870756 2.44309385 12.34583459
314 1.328012928 3.19047635 1.786190457 2.231089091
315 2.290836238 4.827072968 2.197060074 8.67484801
316 2.375600157 5.495650681 2.344280419 10.05803872
317 1.625886455 3.693369359 1.92181408 5.110178924
318 2.329332455 5.313205979 2.305039258 9.613803477
319 1.515480102 3.456316681 1.859117178 3.185525845
320 1.472454994 3.284663565 1.812364082 3.025291076
321 1.506165026 3.349904087 1.83027432 3.054001182
322 1.473374347 3.306520335 1.81838399 3.25617161
323 1.527068855 3.325036021 1.82346813 3.36729583
324 2.110354575 4.662495253 2.159281189 8.165363632
325 1.523787537 3.514526067 1.874706928 3.062923523
326 2.023599447 4.094344562 2.02344868 6.938769333
327 2.753898938 5.917548864 2.432601255 12.66032792
328 2.617941755 5.678362097 2.382931408 11.46939146
329 2.119034653 4.483002552 2.117310216 8.240121298
330 2.066147705 4.476199805 2.115703147 7.14397299
331 2.101925481 4.630837933 2.151938181 7.659327016
332 2.508777239 5.407171771 2.325332615 11.8987611
333 2.005568244 4.463030419 2.112588559 7.397665697
334 1.726738648 3.759687344 1.938991321 4.430816799
335 1.774901671 3.812975852 1.952684268 4.937562683
336 1.648959883 3.423610976 1.85030024 3.988984047
337 1.777714463 3.797733859 1.948777529 5.403847868
338 1.704136403 3.63758616 1.9072457 5.272486607
339 1.844729114 3.968445871 1.992095849 6.429622699
340 1.768267491 3.797733859 1.948777529 5.523339153
341 2.159320704 4.744410253 2.178166718 8.301035184
342 2.109497906 4.580877493 2.140298459 7.726636028
343 2.521315024 5.573617308 2.360850971 11.60597801
344 2.576758408 5.79269513 2.406801847 11.8427421
345 2.669803365 5.872117789 2.423245301 12.427118
346 2.441001399 5.430134791 2.330264962 11.48863277
347 2.117775721 4.750395438 2.17954019 8.556413905
348 2.023599447 4.553734634 2.133948133 7.34601021
349 2.394268344 5.066826574 2.250961255 9.954988325
350 2.106053393 4.696472344 2.167134593 7.892825526
351 2.100394247 4.736330019 2.176311103 7.72870183
352 2.160269524 4.922204729 2.21860423 8.255482913
353 2.188997276 4.774912961 2.185157422 9.409191231
354 1.874905013 3.86388263 1.965676125 6.003887067
355 2.061842158 4.182126476 2.045024811 6.745236349
356 1.418864374 3.119939077 1.766334928 2.7631695
data<-read.csv("data.csv",header=TRUE)
fit.all<-lm(a~x+y+z,data=data)
b<-data.frame(data,predict(fit.all,interval="prediction"))
ggplot(b,aes(x=x+y+z,y=a))+
geom_point(size=3,shape=1,col="black")+
geom_smooth(method="lm")+
geom_line(aes(y=lwr), color = "red", linetype = "dashed")+
geom_line(aes(y=upr), color = "red", linetype = "dashed")+
theme_classic()
As Ben suggested, that approach is not going to produce a sensible graphical display.
What you could do instead is examine the relationship between each predictor and the outcome separately, holding the other predictors constant at some chosen level.
Here I use the means as those chosen levels.
data_x.yz <-
  data.frame(
    x = seq(min(data$x), max(data$x), 0.1),
    y = mean(data$y),
    z = mean(data$z)
  )
data_x.yz <-
  cbind(
    data_x.yz,
    predict(fit.all, newdata = data_x.yz, interval = "prediction")
  )
ggplot(data_x.yz, aes(x, fit, ymin = lwr, ymax = upr)) +
  geom_line(color = "blue") +
  geom_ribbon(fill = NA, color = "red", linetype = "dashed")
data_y.xz <-
  data.frame(
    x = mean(data$x),
    y = seq(min(data$y), max(data$y), 0.1),
    z = mean(data$z)
  )
data_y.xz <-
  cbind(
    data_y.xz,
    predict(fit.all, newdata = data_y.xz, interval = "prediction")
  )
ggplot(data_y.xz, aes(y, fit, ymin = lwr, ymax = upr)) +
  geom_line(color = "blue") +
  geom_ribbon(fill = NA, color = "red", linetype = "dashed")
data_z.yx <-
  data.frame(
    x = mean(data$x),
    y = mean(data$y),
    z = seq(1.6, 2.6, 0.1)
  )
data_z.yx <-
  cbind(
    data_z.yx,
    predict(fit.all, newdata = data_z.yx, interval = "prediction")
  )
ggplot(data_z.yx, aes(z, fit, ymin = lwr, ymax = upr)) +
  geom_line(color = "blue") +
  geom_ribbon(fill = NA, color = "red", linetype = "dashed")
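The three blocks above repeat one pattern: vary a single predictor over its range, hold the others at their means, and predict. A minimal sketch of that pattern as a helper follows. The name partial_grid is hypothetical (it is not from any package), and the simulated data only stands in for the question's columns a, x, y and z.

```r
# Simulated stand-in for the question's data (assumption: columns a, x, y, z)
set.seed(1)
data <- data.frame(
  x = runif(100, 1, 3),
  y = runif(100, 2, 6),
  z = runif(100, 1.6, 2.6)
)
data$a <- 1 + 2 * data$x + 0.5 * data$y - data$z + rnorm(100)
fit.all <- lm(a ~ x + y + z, data = data)

# Hypothetical helper: vary `focal` across its observed range and hold
# every other predictor at its mean, then attach prediction intervals.
partial_grid <- function(fit, data, focal, predictors, n = 50) {
  grid <- as.data.frame(
    lapply(data[predictors], function(col) rep(mean(col), n))
  )
  grid[[focal]] <- seq(min(data[[focal]]), max(data[[focal]]), length.out = n)
  cbind(grid, predict(fit, newdata = grid, interval = "prediction"))
}

grid_x <- partial_grid(fit.all, data, "x", c("x", "y", "z"))
head(grid_x)
```

The result has the same columns (fit, lwr, upr) as data_x.yz, so it can be passed to the same ggplot() call; swapping "x" for "y" or "z" gives the other two panels.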

How to use ggplot to plot the trend of four variables in R?

I have a data set that records tumor size at four different time points (each row is one patient). I want to analyze this data set to show that, across all patients, tumor size decreases after each time point.
What kind of analysis can I do? How should I use ggplot to visualize these data and show the trend? Many thanks!
SUBJECTID Baseline 1 2 3
1001 88 78 30 14
1002 29 26 66 16
1003 50 64 54 46
1004 91 90 99 43
1005 98 109 60 42
1007 100 100 54
1008 45 49 47 32
1009 75 66 57 7
1010 60 52 20 3
1011 68 68 56 47
1012 78 84 56 57
1013 71 70 8 5
1015 79 50 11 3
1016 73 60 57 36
1017 54 27 16
1018 50 37 33 26
1019 115 68 33 67
1021 63 55 0 0
1022 98 91 76 75
1024 76 76 0
1025 47 45 42 42
1026 32 25 14 0
1027 40 37 65
1028 60 110 110 0
A box plot might work. Try the following:
library(tidyverse)
df %>%
  gather(key = "time", value = "tumor_size", -SUBJECTID) %>%
  # keep the time points in column order (Baseline first), not alphabetical
  mutate(time = fct_inorder(time)) %>%
  ggplot(aes(time, tumor_size)) +
  geom_boxplot() +
  labs(title = "Tumor Size ~ Time",
       subtitle = "Insert subtitle if you want",
       caption = "Insert caption if you want",
       x = "Time",
       y = "Tumor Size (insert unit)") +
  theme_bw() +
  theme(
    panel.grid.major.x = element_blank(),
    text = element_text(family = "Palatino"),
    plot.title = element_text(face = "bold", size = 20)
  )
You could also add geom_jitter() if you'd like. After the geom_boxplot() + line, add:
geom_jitter(width = 0.1, pch = 21, fill = "grey") +
You'll get something like this:
To show that overall tumor size is decreasing after each time point, you usually want the mean tumor size at each time point. That is much easier to plot than every individual measurement. Here is how to do this using your first four rows, producing a dot graph:
baseline <- c(88, 29, 50, 91)
dAC <- c(78, 26, 64, 90)
InterReg <- c(30, 66, 54, 99)
PreSurg <- c(14, 16, 46, 43)
# one row per time point; named `sizes` to avoid masking base::matrix
sizes <- rbind(baseline, dAC, InterReg, PreSurg)
means <- rowMeans(sizes)
plot(means)
Dot graph:
In terms of what analysis to do, I can't really answer that. That depends on what you want it to look like. What I've done is the most basic way of representing the data. You may want to use a column graph, a bar graph, a line graph etc. That's up to your personal preference. In terms of using ggplot, here are many different examples you can use: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
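Since the question asks specifically about ggplot, the same mean-per-time-point idea can be sketched with ggplot2. The column names T1, T2 and T3 are assumptions (the question's headers 1, 2, 3 are not syntactic R names), and only the first five complete rows of the posted data are used.

```r
library(ggplot2)

# First five patients from the question (rows without missing values);
# T1..T3 are assumed names for the columns headed 1, 2, 3.
df <- data.frame(
  SUBJECTID = c(1001, 1002, 1003, 1004, 1005),
  Baseline  = c(88, 29, 50, 91, 98),
  T1        = c(78, 26, 64, 90, 109),
  T2        = c(30, 66, 54, 99, 60),
  T3        = c(14, 16, 46, 43, 42)
)

# Mean per time point; na.rm = TRUE skips missing measurements,
# which matters once the incomplete rows are included.
m <- colMeans(df[, -1], na.rm = TRUE)

means <- data.frame(
  time      = factor(names(m), levels = names(m)),  # keep chronological order
  mean_size = unname(m)
)

p <- ggplot(means, aes(time, mean_size, group = 1)) +
  geom_line() +
  geom_point(size = 3) +
  labs(x = "Time point", y = "Mean tumor size")
```

Calling print(p) draws the plot. Making time a factor with explicit levels keeps Baseline first on the x axis instead of letting the labels sort alphabetically.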

Calculate mean value for each row with interval

I need to calculate the mean value for each row (the mean of each interval). Here is a basic example (maybe someone has a better idea for how to do it):
M_31_mb <- (15 : -15) # creating a vector of values --> small
M_31 <- cut(M_31_mb,128) # getting 128 groups from the small vector
#M_1_mb <- (1500 : -1500) # creating a vector of values
#M_1 <- cut(M_1_mb,128) # getting 128 groups from the vector
I need the mean value for each row/group of the 128 intervals created in M_1 (actually I do not even need the intervals themselves, just their means), and I cannot figure out how to do it.
I had a look at the cut2 function from the Hmisc library, but unfortunately there is no option to set the number of intervals into which the vector is cut (although there is an option to get the mean value of the created intervals: levels.mean).
I would appreciate any help! Thanks!
Additional Info:
The cut2 function works well for bigger vectors (M_1_mb); however, when my vector is small (M_31_mb), I get a warning:
Warning message:
In min(xx[xx > upper]) : no non-missing arguments to min; returning Inf
and only 31 groups are created:
M_31_mb <- (15 : -15) # smaller vector
M_31 <- table(cut2(M_31_mb,g=128,levels.mean = TRUE))
where
g = number of quantile groups
like this?
aggregate(M_1_mb,by=list(M_1),mean)
EDIT: Result
Group.1 x
1 (-1.5e+03,-1.48e+03] -1488.5
2 (-1.48e+03,-1.45e+03] -1465.0
3 (-1.45e+03,-1.43e+03] -1441.5
4 (-1.43e+03,-1.41e+03] -1418.0
5 (-1.41e+03,-1.38e+03] -1394.5
6 (-1.38e+03,-1.36e+03] -1371.0
7 (-1.36e+03,-1.34e+03] -1347.5
8 (-1.34e+03,-1.31e+03] -1324.0
9 (-1.31e+03,-1.29e+03] -1301.0
10 (-1.29e+03,-1.27e+03] -1277.5
11 (-1.27e+03,-1.24e+03] -1254.0
12 (-1.24e+03,-1.22e+03] -1230.5
13 (-1.22e+03,-1.2e+03] -1207.0
14 (-1.2e+03,-1.17e+03] -1183.5
15 (-1.17e+03,-1.15e+03] -1160.0
16 (-1.15e+03,-1.12e+03] -1136.5
17 (-1.12e+03,-1.1e+03] -1113.0
18 (-1.1e+03,-1.08e+03] -1090.0
19 (-1.08e+03,-1.05e+03] -1066.5
20 (-1.05e+03,-1.03e+03] -1043.0
21 (-1.03e+03,-1.01e+03] -1019.5
22 (-1.01e+03,-984] -996.0
23 (-984,-961] -972.5
24 (-961,-938] -949.0
25 (-938,-914] -926.0
26 (-914,-891] -902.5
27 (-891,-867] -879.0
28 (-867,-844] -855.5
29 (-844,-820] -832.0
30 (-820,-797] -808.5
31 (-797,-773] -785.0
32 (-773,-750] -761.5
33 (-750,-727] -738.0
34 (-727,-703] -715.0
35 (-703,-680] -691.5
36 (-680,-656] -668.0
37 (-656,-633] -644.5
38 (-633,-609] -621.0
39 (-609,-586] -597.5
40 (-586,-562] -574.0
41 (-562,-539] -551.0
42 (-539,-516] -527.5
43 (-516,-492] -504.0
44 (-492,-469] -480.5
45 (-469,-445] -457.0
46 (-445,-422] -433.5
47 (-422,-398] -410.0
48 (-398,-375] -386.5
49 (-375,-352] -363.0
50 (-352,-328] -340.0
51 (-328,-305] -316.5
52 (-305,-281] -293.0
53 (-281,-258] -269.5
54 (-258,-234] -246.0
55 (-234,-211] -222.5
56 (-211,-188] -199.0
57 (-188,-164] -176.0
58 (-164,-141] -152.5
59 (-141,-117] -129.0
60 (-117,-93.8] -105.5
61 (-93.8,-70.3] -82.0
62 (-70.3,-46.9] -58.5
63 (-46.9,-23.4] -35.0
64 (-23.4,0] -11.5
65 (0,23.4] 12.0
66 (23.4,46.9] 35.0
67 (46.9,70.3] 58.5
68 (70.3,93.8] 82.0
69 (93.8,117] 105.5
70 (117,141] 129.0
71 (141,164] 152.5
72 (164,188] 176.0
73 (188,211] 199.0
74 (211,234] 222.5
75 (234,258] 246.0
76 (258,281] 269.5
77 (281,305] 293.0
78 (305,328] 316.5
79 (328,352] 340.0
80 (352,375] 363.5
81 (375,398] 387.0
82 (398,422] 410.0
83 (422,445] 433.5
84 (445,469] 457.0
85 (469,492] 480.5
86 (492,516] 504.0
87 (516,539] 527.5
88 (539,562] 551.0
89 (562,586] 574.0
90 (586,609] 597.5
91 (609,633] 621.0
92 (633,656] 644.5
93 (656,680] 668.0
94 (680,703] 691.5
95 (703,727] 715.0
96 (727,750] 738.5
97 (750,773] 762.0
98 (773,797] 785.0
99 (797,820] 808.5
100 (820,844] 832.0
101 (844,867] 855.5
102 (867,891] 879.0
103 (891,914] 902.5
104 (914,938] 926.0
105 (938,961] 949.0
106 (961,984] 972.5
107 (984,1.01e+03] 996.0
108 (1.01e+03,1.03e+03] 1019.5
109 (1.03e+03,1.05e+03] 1043.0
110 (1.05e+03,1.08e+03] 1066.5
111 (1.08e+03,1.1e+03] 1090.0
112 (1.1e+03,1.12e+03] 1113.5
113 (1.12e+03,1.15e+03] 1137.0
114 (1.15e+03,1.17e+03] 1160.0
115 (1.17e+03,1.2e+03] 1183.5
116 (1.2e+03,1.22e+03] 1207.0
117 (1.22e+03,1.24e+03] 1230.5
118 (1.24e+03,1.27e+03] 1254.0
119 (1.27e+03,1.29e+03] 1277.5
120 (1.29e+03,1.31e+03] 1301.0
121 (1.31e+03,1.34e+03] 1324.0
122 (1.34e+03,1.36e+03] 1347.5
123 (1.36e+03,1.38e+03] 1371.0
124 (1.38e+03,1.41e+03] 1394.5
125 (1.41e+03,1.43e+03] 1418.0
126 (1.43e+03,1.45e+03] 1441.5
127 (1.45e+03,1.48e+03] 1465.0
128 (1.48e+03,1.5e+03] 1488.5
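An equivalent way to get the per-interval means is tapply, which also shows why the small vector behaves differently: with only 31 values spread over 128 intervals, most intervals are empty. A minimal sketch using the two vectors from the question:

```r
# Large vector: every one of the 128 intervals contains values
M_1_mb <- 1500:-1500
M_1 <- cut(M_1_mb, 128)
bin_means <- tapply(M_1_mb, M_1, mean)  # one mean per interval, named by interval
length(bin_means)                       # 128

# Small vector: intervals that contain no values come back as NA
M_31_mb <- 15:-15
small_means <- tapply(M_31_mb, cut(M_31_mb, 128), mean)
sum(!is.na(small_means))                # 31 non-empty intervals
```

If only the non-empty intervals are of interest, small_means[!is.na(small_means)] drops the rest.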

Binning a dataframe with equal frequency of samples

I have binned my data using the cut function
breaks<-seq(0, 250, by=5)
data<-split(df2, cut(df2$val, breaks))
My split dataframe looks like
... ...
$`(15,20]`
val ks_Result c
15 60 237
18 70 247
... ...
$`(20,25]`
val ks_Result c
21 20 317
24 10 140
... ...
My bins look like
> table(data)
data
(0,5] (5,10] (10,15] (15,20] (20,25] (25,30] (30,35]
0 0 0 7 128 2748 2307
(35,40] (40,45] (45,50] (50,55] (55,60] (60,65] (65,70]
1404 11472 1064 536 7389 1008 1714
(70,75] (75,80] (80,85] (85,90] (90,95] (95,100] (100,105]
2047 700 329 1107 399 376 323
(105,110] (110,115] (115,120] (120,125] (125,130] (130,135] (135,140]
314 79 1008 77 474 158 381
(140,145] (145,150] (150,155] (155,160] (160,165] (165,170] (170,175]
89 660 15 1090 109 824 247
(175,180] (180,185] (185,190] (190,195] (195,200] (200,205] (205,210]
1226 139 531 174 1041 107 257
(210,215] (215,220] (220,225] (225,230] (230,235] (235,240] (240,245]
72 671 98 212 70 95 25
(245,250]
494
When I take the mean of the bin counts, I get an average of ~900 samples per bin
> mean(table(data))
[1] 915.9
I want to tell R to make irregular bins in such a way that each bin contains roughly 900 samples (e.g. (0, 27] = 900, (27,28.5] = 900, and so on). I found something similar here, which deals with only one variable, not the whole dataframe.
I also tried the Hmisc package; unfortunately, the bins don't contain equal frequencies:
library(Hmisc)
data<-split(df2, cut2(df2$val, g=30, oneval=TRUE))
data<-split(df2, cut2(df2$val, m=1000, oneval=TRUE))
Assuming you want 50 equal-sized buckets (based on your seq statement), you can use something like:
df <- data.frame(var=runif(500, 0, 100)) # make data
cut.vec <- cut(
  df$var,
  breaks=quantile(df$var, 0:50/50), # breaks along 1/50 quantiles
  include.lowest=TRUE
)
df.split <- split(df, cut.vec)
Hmisc::cut2 has this option built in as well.
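To check that quantile breaks really give equal-frequency bins, the counts can be tabulated after cutting. The sketch below uses made-up data (9000 uniform values and an extra column c, both assumptions) and ten bins, so each bin should hold about 900 rows. The unique() call is an added safeguard: heavily tied data can produce duplicate quantiles, which cut() rejects as breaks.

```r
set.seed(42)
# Hypothetical data standing in for df2: a value column plus one other column
df <- data.frame(val = runif(9000, 0, 250), c = rnorm(9000))

n_bins <- 10
# Break points at the 0%, 10%, ..., 100% quantiles of val
breaks <- unique(quantile(df$val, probs = seq(0, 1, length.out = n_bins + 1)))

bins <- cut(df$val, breaks = breaks, include.lowest = TRUE)
df.split <- split(df, bins)   # whole rows are split, not just val

counts <- table(bins)
counts
```

Because split() is applied to the data frame, every column travels with val into its bin, which is what distinguishes this from binning a single variable.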
Can be done by the function provided here by Joris Meys
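EqualFreq2 <- function(x,n){
  nx <- length(x)
  nrepl <- floor(nx/n)                 # base number of values per bin
  nplus <- sample(1:n,nx - nrepl*n)    # bins chosen at random to take one extra value
  nrep <- rep(nrepl,n)
  nrep[nplus] <- nrepl+1
  x[order(x)] <- rep(seq.int(n),nrep)  # replace sorted values with their bin number
  x
}
data<-split(df2, EqualFreq2(df2$val, 25))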
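EqualFreq2 places the leftover observations in randomly chosen bins, so repeated calls can differ. If that is unwanted, a deterministic variant can be sketched with rank. This is a swapped-in alternative, not the function from the linked answer, and equal_freq_rank is a hypothetical name.

```r
# Hypothetical deterministic alternative: assign bin numbers from ranks
equal_freq_rank <- function(x, n) {
  # ties.method = "first" gives every value a unique position 1..length(x)
  r <- rank(x, ties.method = "first")
  # map positions onto bin numbers 1..n of (nearly) equal size
  ceiling(r * n / length(x))
}

x <- c(10, 3, 7, 1, 9, 4)
bins <- equal_freq_rank(x, 3)
bins          # bin number for each element of x, in the original order
table(bins)   # two values per bin
```

It plugs into split() the same way, for example split(df2, equal_freq_rank(df2$val, 25)), where df2 is the data frame from the question.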

Resources