R generate 2D histogram from raw data

I have some raw 2D data (x, y), given below, and I want to generate a 2D histogram from it: divide the x and y values into bins of size 0.5 and count the number of occurrences in each bin (for x and y jointly). Is there any way to do that?
> df
x y
1 4.2179611 5.7588577
2 5.3901279 5.8219784
3 4.1933089 6.4317645
4 5.8076411 5.8999598
5 5.5781166 5.9382342
6 4.5569735 6.7833469
7 4.4024492 5.8019719
8 4.1734975 6.0896355
9 5.1707871 5.5640962
10 5.6380258 6.9112775
11 4.6405353 5.2251746
12 4.1809004 6.1127144
13 4.2764079 5.4598799
14 5.4466446 6.0130047
15 5.2443804 5.5421851
16 5.7521515 5.4115965
17 4.9667564 5.3519795
18 4.5007141 6.8669231
19 5.0268273 5.7681888
20 4.4738948 6.4241168
21 4.4116357 5.9819519
22 4.5741988 6.4595129
23 4.0839075 6.8105259
24 4.7154364 6.5054761
25 4.8986785 5.5511226
26 5.6262397 6.8996480
27 4.9034275 5.6716375
28 4.1872928 5.8387641
29 4.0444855 5.2554446
30 4.8911393 5.8449165
31 5.7268887 6.7100432
32 5.9136374 6.5059128
33 4.9481286 6.4679917
34 4.6198987 5.7462047
35 5.7306916 6.0613158
36 5.5818586 6.4533566
37 5.9240267 6.7748290
38 4.8160926 6.4942865
39 5.5456258 5.7911897
40 4.3075173 6.8165520
41 4.9654533 5.8904734
42 5.9581820 5.7692468
43 4.2417172 5.7990554
44 5.3670112 5.8252479
45 5.2932098 5.3983672
46 5.7456521 6.2563828
47 4.9398795 5.2879065
48 4.8526884 6.9827555
49 5.6135753 6.5219431
50 4.0727956 5.2647714
51 6.9418969 5.2584325
52 5.4189039 5.9936456
53 3.9193741 6.7099562
54 5.5885252 5.9680734
55 5.9581279 5.1843804
56 4.5724421 6.6774004
57 4.7700303 6.6083613
58 5.5490254 6.2431170
59 4.1668548 5.1017475
60 5.8948947 6.7646917
61 6.5501872 5.2803433
62 5.6011444 4.2733087
63 5.1337226 6.5225780
64 5.3153358 6.6164809
65 3.3815056 6.4077659
66 3.8405670 5.3677008
67 6.7036350 4.3090214
68 3.2446588 4.0965275
69 4.6563593 7.6868628
70 5.2382914 7.0020874
71 6.0771605 6.6232541
72 3.5672511 6.9333691
73 5.0865233 4.0778233
74 5.6743559 5.5177734
75 4.5759146 7.2210012
76 5.8203140 4.9787148
77 3.1106176 6.3937707
78 4.6310679 4.4731806
79 6.8237641 6.2679791
80 3.7653803 5.9188107
81 5.6139040 5.8586176
82 6.2016662 5.3514293
83 3.9362048 5.3217560
84 6.8005236 7.9247371
85 5.8030101 7.7492432
86 6.0143418 6.0709249
87 6.5734089 7.6112815
88 4.0569383 5.8440535
89 4.6825752 7.7926235
90 4.8204027 6.3106798
91 3.5001675 6.3156079
92 3.6521280 7.5155810
93 5.0945236 4.8206873
94 3.8732946 5.6771599
95 6.4812309 5.6082170
96 5.0308355 7.6877289
97 5.2193389 7.7133717
98 6.2239631 5.5387684
99 4.6501488 7.8559335
100 3.5389389 5.4594034
101 5.7139486 4.5008182
102 3.5425132 7.3562487
103 6.9950663 6.1036549
104 5.3801845 5.8903123
105 4.7629191 5.3394552
106 4.4102815 7.2312852
107 5.8723641 4.1410996
108 3.4691208 4.6383708
109 4.6479362 5.8562699
110 3.0315732 6.8614265
111 5.9456145 4.7497545
112 4.8461189 4.4730002
113 4.9606723 5.1099093
114 4.7802659 7.8147864
115 5.0189229 6.9308301
116 6.4738074 5.0539666
117 5.3725075 5.3282273
118 6.5374505 7.0508875
119 4.0907139 5.0855075
120 5.0557532 5.6449829
121 6.5483249 7.5800015
122 3.1083616 7.3697234
123 3.6119548 7.7639486
124 6.5157691 7.7152933
125 4.0305622 7.0521419
126 3.2197769 6.5881246
127 4.7570419 6.4564400
128 4.0063007 6.3981942
129 4.4412649 7.6576221
130 5.7348769 6.7601804
131 3.1312551 5.6295996
132 3.8627964 7.5817083
133 5.2008281 5.1082509
134 6.4229161 6.2816475
135 2.5241894 6.0802138
136 7.3759753 5.1090478
137 3.7284166 5.2045976
138 3.4404286 6.9708127
139 6.4237399 5.1363851
140 4.1829368 5.1612791
141 5.9500285 5.4765621
142 3.3555182 6.2627360
143 7.7691356 5.1877095
144 4.0684189 7.1663495
145 7.3929140 7.3819058
146 2.1659981 7.9796005
147 4.8539955 7.3108966
148 5.3932658 4.7116979
149 3.5610560 4.6096759
150 5.1883331 6.8068501
151 6.4233558 7.2955388
152 7.3308739 6.1761356
153 3.0710449 4.5296235
154 7.5400128 5.1559900
155 3.5776389 5.2057676
156 4.0402288 7.1487121
157 2.3107258 6.9816127
158 7.2065591 7.7307439
159 5.7577620 5.6652052
160 2.0595554 7.4373547
161 7.5994468 4.6216856
162 4.8053745 3.9113634
163 7.5769460 7.6019067
164 5.5362034 8.9270974
165 3.6713241 3.9060205
166 6.0612046 7.3862080
167 6.9205755 7.0792392
168 6.0892821 6.3248315
169 2.0532905 4.1545875
170 3.4086310 3.5510909
171 5.2148895 5.3266145
172 4.7638780 7.9240988
173 6.4717329 5.1350172
174 7.8287022 4.3457324
175 6.0299681 3.0952274
176 3.2760103 5.2730464
177 2.5729991 7.6594251
178 3.9403251 7.8928014
179 6.0021556 7.5313493
180 7.8561727 4.5092728
181 3.5818174 4.1140876
182 7.4972295 5.5313987
183 6.0138287 6.9369784
184 3.9257191 7.6395296
185 3.0462106 3.1347680
186 6.0630447 4.1847229
187 7.4878528 5.1004141
188 4.5145570 4.6389011
189 6.2777996 4.2647980
190 3.0166336 7.5755042
191 2.8791041 6.4471746
192 7.1029767 7.0061048
193 2.4526181 6.3373793
194 5.8762775 7.0746223
195 7.0609100 8.1256569
196 4.7252400 8.4829780
197 3.3695501 8.8786640
198 3.8505741 6.8260398
199 5.3573846 6.3864944
200 3.7039072 8.9951078
201 4.6216933 6.7890198
202 7.0390643 5.9458624
203 5.7172605 6.9083246
204 2.3814644 8.3856125
205 2.4432566 3.2618192
206 4.3881965 6.7022219
207 5.2583749 7.2432485
208 5.8540367 8.5154705
209 6.4267791 4.9593757
210 5.0668461 3.1358129
211 2.6845736 8.9880143
212 7.3094761 5.4049133
213 4.2176252 5.5062193
214 5.2025716 4.0798478
215 6.5592571 8.1852765
216 2.0417939 7.0843906
217 7.6045374 7.4870940
218 6.5971789 8.8641329
219 5.3541694 7.2176914
220 2.8314803 6.4831720
221 2.4252467 4.0918736
222 6.6804732 6.3624739
223 6.0325285 6.2057468
224 2.2751047 5.1275412
225 5.5397481 5.9890834
226 4.6420585 4.6013327
227 7.6385642 5.1722194
228 6.7378078 5.8246169
229 5.0647686 7.9219705
230 2.8672731 6.6371082
231 7.5487359 4.5727898
232 1.0837662 7.1788146
233 5.4483746 6.8955122
234 9.3085746 4.8330044
235 3.8484225 6.0133789
236 2.8034987 3.0023096
237 2.8952626 8.2623788
238 5.7666136 3.2158710
239 6.4978214 5.7866574
240 1.5184268 5.9791716
241 2.3836147 8.2897188
242 4.7318649 6.1174515
243 5.8544588 7.5056688
244 9.6776416 6.5151695
245 0.4319531 4.2470331
246 0.9810053 8.6452087
247 7.0819634 3.2488110
248 1.9084265 6.1122130
249 7.5096342 3.3495096
250 8.9564496 3.4960564
251 5.7603943 6.9091760
252 0.8801204 7.2744429
253 1.2183581 6.4264214
254 1.7761613 7.1199729
255 3.2490662 7.9935963
256 3.5420375 8.4801333
257 8.7709382 3.8011487
258 8.4770868 3.4749692
259 0.9965042 6.7509705
260 7.5049457 5.4313474
261 9.7261151 6.5909553
262 5.3893371 4.0194548
263 9.6154510 7.3117416
264 1.0327841 6.2376586
265 4.0064715 3.7333634
266 6.6941050 3.9452152
267 4.1317951 9.3322756
268 9.6481471 7.5330023
269 7.3474233 1.0310166
270 3.7343864 4.9808341
271 9.1412231 2.6655861
272 5.8414100 0.1329439
273 2.4837309 7.4956203
274 2.7983337 1.3563719
275 0.6335727 7.9273816
276 7.5566740 0.4321263
277 8.6182079 0.6038505
278 0.8928523 8.0131172
279 5.7375090 8.5275545
280 0.7864533 3.3954255
281 8.7808839 1.7059789
282 9.6621659 0.9215045
283 8.4894688 8.7667948
284 1.0358920 7.2505891
285 0.7378660 0.1173287
286 9.5485481 3.3186128
287 6.8987508 9.5480887
288 7.4105831 5.8809522
289 6.6984457 5.9509037
290 1.7878216 9.1932955
291 0.8443295 5.1662902
292 0.4498266 8.9636923
293 2.5068754 5.3692908
294 9.2509052 2.4204235
295 4.1333742 6.2581851
296 6.5510938 7.2923688
297 4.3412873 3.5514825
298 4.2349765 9.3207514
299 2.8730785 7.2752405
300 2.0425362 6.6513146
301 6.4498432 7.2949259
302 5.7453188 6.3263712
303 7.0501276 8.2238207
304 4.1915008 1.5325379
305 8.1307954 7.7681944
306 7.3156552 6.3031412
307 4.0302052 0.3039900
308 3.3740358 2.1386235
309 8.2055657 2.9112215
310 1.8817856 7.0503046
311 7.0820523 6.8739097
312 5.0725238 6.9951556
313 1.6246224 5.4126084
314 3.8865553 7.6398192
315 6.6727672 8.9677947
316 9.6048687 7.6757966
317 2.2006018 9.6385351
318 9.6403802 7.6438900
319 0.1267512 0.9048408
320 1.8160829 7.3193066
321 9.9318386 9.6068456
322 2.1275892 7.8034724
323 1.2232242 1.0695030
324 3.0198057 3.8964732
325 3.3265773 8.5865587
326 5.1519605 7.5068253
327 0.4137485 5.9223826
328 1.6896445 0.6071874
329 1.8534083 2.3554291
330 1.7182264 9.3488597
331 6.4165456 9.8670765
332 7.6270001 2.1839607
333 8.9867227 5.9565743
334 6.9185079 0.2440980
335 6.7359209 7.1072908
336 3.8034763 5.8466404
337 3.4583027 6.9041502
338 1.7983897 1.7108336
339 6.9184406 6.3632716
340 1.3538600 6.8484462
341 3.6731748 4.9846946
342 5.6139620 8.0637827
343 9.0991782 2.3051189
344 1.1220448 8.9624365
345 2.5925265 8.3673795
346 9.9977377 8.5423564
347 5.1761187 5.1240824
348 5.9330451 9.4141322
349 6.3337224 6.8055697
350 2.7287418 5.7100024
351 6.1022411 2.9733360
352 2.7331869 3.7135612
353 6.7394034 8.2721572
354 2.1757932 9.0574057
355 5.5011486 6.0124142
356 4.5301911 2.5865048
357 5.3137001 0.7062267
358 0.6959286 3.2395043
359 5.3494169 6.5742589
360 7.1472046 6.3821916
361 0.1749855 0.3954287
362 6.7709760 6.5212015
363 7.2983482 3.0086604
364 0.6147726 9.3336870
365 7.4417342 2.6836695
366 1.2769881 4.0591093
367 9.5342317 5.3443613
368 0.9368862 1.1391497
369 8.4271193 8.6641296
370 6.2000851 8.2987486
371 2.1768279 6.0684896
372 5.2021222 6.9222675
373 0.6095874 8.4759464
374 2.0217473 9.5844241
375 4.8080163 6.5052801
376 3.6099334 0.3272768
377 6.0132712 7.9920535
378 4.0495344 8.8153621
379 6.9646704 7.0375214
380 3.9211171 2.5994333
381 4.4749268 1.0517360
382 1.1683429 3.8710614
383 1.7618115 0.3513996
384 1.1257639 5.7446745
385 3.7351688 8.7376011
386 4.9234662 7.1975462
387 7.4899861 7.3846309
388 7.4170082 2.2885060
389 0.8526702 3.8160722
390 4.5907512 8.9315418
391 7.6996179 9.8409051
392 0.2340987 4.2906009
393 2.2502736 1.7819172
394 3.5679969 1.7419479
395 5.4214908 5.6001803
396 3.9965213 9.2021549
397 3.8610336 2.0462740
398 5.9490575 4.4422382
399 9.8897791 5.6402915
400 6.1153192 4.1236797
401 5.8906384 2.6153750
402 8.0582664 2.7137804
403 7.2969209 2.9362187
404 3.8673527 1.0837191
405 3.5647339 6.2338014
406 9.6490210 0.8373270
407 0.8133243 6.3393130
408 2.8760565 9.9462423
409 3.3836457 7.4451869
410 4.7772609 2.9141127
411 8.6635971 5.7812494
412 5.6192160 1.4764255
413 9.1334625 8.9822399
414 0.4662385 6.6440937
415 3.4503559 4.2064800
416 0.6704780 2.8508758
417 0.5211872 4.3109175
418 7.5615411 9.2851454
419 7.5081906 4.0019450
420 8.8851669 9.7323717
421 7.3856288 8.6152906
422 9.5926351 0.3993818
423 1.4478981 1.4845263
424 5.0425560 1.3501638
425 0.8952120 7.9407680
426 6.4732584 7.1493210
427 9.6595225 5.2377876
428 7.2204625 2.0300222
429 3.5410601 7.3117738
430 6.7991771 3.6368291
Just for clarification, I want to get something like the plot below (the plot has nothing to do with my raw data; I am just showing it to explain the problem more clearly). If I use hist(df$x), it shows the distribution of x only.

The ggplot is elegant and fast and pretty, as usual. But if you want to use base graphics (image, contour, persp) and display your actual frequencies (instead of the smoothing 2D kernel), you have to first obtain the binnings yourself and create a matrix of frequencies. Here's some code (not necessarily elegant, but pretty robust) that does 2D binning and generates plots somewhat similar to the ones above:
require(mvtnorm)
xy <- rmvnorm(1000,c(5,10),sigma=rbind(c(3,-2),c(-2,3)))
nbins <- 20
x.bin <- seq(floor(min(xy[,1])), ceiling(max(xy[,1])), length=nbins)
y.bin <- seq(floor(min(xy[,2])), ceiling(max(xy[,2])), length=nbins)
freq <- as.data.frame(table(findInterval(xy[,1], x.bin),findInterval(xy[,2], y.bin)))
freq[,1] <- as.numeric(freq[,1])
freq[,2] <- as.numeric(freq[,2])
freq2D <- diag(nbins)*0
freq2D[cbind(freq[,1], freq[,2])] <- freq[,3]
par(mfrow=c(1,2))
image(x.bin, y.bin, freq2D, col=topo.colors(max(freq2D)))
contour(x.bin, y.bin, freq2D, add=TRUE, col=rgb(1,1,1,.7))
palette(rainbow(max(freq2D)))
# facet colours: average the four corner counts of each facet
cols <- (freq2D[-1,-1] + freq2D[-1,-nbins] + freq2D[-nbins,-nbins] + freq2D[-nbins,-1])/4
persp(freq2D, col=cols)
For a really fun time, try making an interactive, zoomable, 3D surface:
require(rgl)
surface3d(x.bin,y.bin,freq2D/10, col="red")
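The question asks for fixed bins of width 0.5 rather than a fixed number of bins. A minimal base R sketch of that variant, assuming made-up uniform data in place of the question's df (the names x_breaks, y_breaks and counts are only illustrative):

```r
set.seed(1)                          # made-up data, for illustration only
x <- runif(200, 0, 10)
y <- runif(200, 0, 10)
bin_size <- 0.5

# Breakpoints covering the full data range in steps of bin_size
x_breaks <- seq(floor(min(x)), ceiling(max(x)), by = bin_size)
y_breaks <- seq(floor(min(y)), ceiling(max(y)), by = bin_size)

# cut() returns factors whose levels include empty bins,
# so table() keeps rows and columns with zero counts
counts <- table(cut(x, x_breaks, include.lowest = TRUE),
                cut(y, y_breaks, include.lowest = TRUE))

# Plain numeric matrix of frequencies, one cell per (x bin, y bin)
freq <- matrix(as.vector(counts), nrow = nrow(counts))
dim(freq)
```

Because freq has one row and column per bin, it can be passed to image(x_breaks, y_breaks, freq), where the break vectors are read as cell boundaries.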

Bivariate density estimates can be done with MASS::kde2d or KernSmooth::bkde2D (both are recommended packages that ship with the standard R distribution). The latter uses an algorithm based on the fast Fourier transform over a grid of points, and is very fast. The result can be plotted with contour or persp or similar functions in other graphing packages.
Using your data:
require(KernSmooth)
z <- bkde2D(df, .5)
persp(z$fhat)

If you want it with a 2d contour, you can also use the package ggplot2. Some example code is shown in this question:
gradient breaks in a ggplot stat_bin2d plot
Adjusted slightly:
x <- rnorm(10000)+5
y <- rnorm(10000)+5
df <- data.frame(x,y)
require(ggplot2)
p <- ggplot(df, aes(x, y))
p <- p + stat_bin2d(bins = 20)
p
Here's the output of the code above:

For completeness, you can also use the hist2d{gplots} function. It seems to be the most straightforward for a 2D plot:
library(gplots)
# data is in variable df
# define bin sizes
bin_size <- 0.5
xbins <- ceiling((max(df$x) - min(df$x))/bin_size)  # nbins should be a whole number
ybins <- ceiling((max(df$y) - min(df$y))/bin_size)
# create plot
hist2d(df, same.scale=TRUE, nbins=c(xbins, ybins))
# if you want to retrieve the data for other purposes
df.hist2d <- hist2d(df, same.scale=TRUE, nbins=c(xbins, ybins), show=FALSE)
df.hist2d$counts

I came to this page from http://www.r-bloggers.com/5-ways-to-do-2d-histograms-in-r/, which lists one of the answers above.
It provides code samples for a total of 5 methods:
hist2d from the library gplots
hexbin,hexbinplot from the library hexbin
stat_bin2d from the library ggplot2
kde2d from the library MASS
the "hard way" solution listed above.

freq <- as.data.frame(table(findInterval(xy[,1], x.bin),findInterval(xy[,2], y.bin)))
freq[,1] <- as.numeric(freq[,1])
freq[,2] <- as.numeric(freq[,2])
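This is probably wrong since it destroys the original indices: as.numeric() on a factor returns the internal level codes (1, 2, 3, ...), not the bin indices that findInterval() produced, so the counts shift whenever a bin index is absent from the data. Converting with as.numeric(as.character(freq[,1])) keeps the original indices.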
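A small sketch of the problem, using a hypothetical vector of bin indices in which bin 2 happens to be empty (idx, codes and bins are illustrative names, not from the answer above):

```r
# Hypothetical bin indices as findInterval() might return them; bin 2 is empty
idx <- c(1, 3, 3, 4)
freq <- as.data.frame(table(idx))

# as.numeric() on the factor column gives level codes, not the bin indices
codes <- as.numeric(freq[, 1])
codes   # 1 2 3

# Going through as.character() recovers the original bin indices
bins <- as.numeric(as.character(freq[, 1]))
bins    # 1 3 4
```

With the level codes, the counts for bins 3 and 4 would be written into rows 2 and 3 of the frequency matrix.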

Related

Using dplyr to compute calculated fields depending on multiple columns without explicitly writing column names

Consider the following code.
set.seed(56)
library(dplyr)
df <- data.frame(
NUM_1 = sample.int(500, replace = TRUE),
DENOM_1 = sample.int(500, replace = TRUE),
NUM_2 = sample.int(500, replace = TRUE),
DENOM_2 = sample.int(500, replace = TRUE)
)
head(df)
NUM_1 DENOM_1 NUM_2 DENOM_2
1 417 379 154 173
2 160 437 239 154
3 243 315 106 361
4 291 169 393 340
5 170 450 429 421
6 422 131 75 64
Without having to manually specify each of the column names (the actual problem has about 40 of these I need to create), I would like to create columns FRAC_1 and FRAC_2 for which FRAC_X = NUM_X/DENOM_X.
So, this would be what I'm looking for with regard to output, but since I'm dealing with about 40 of these, I don't want to have to manually type out each column:
df_frac <- df %>%
mutate(FRAC_1 = NUM_1 / DENOM_1,
FRAC_2 = NUM_2 / DENOM_2)
head(df_frac)
NUM_1 DENOM_1 NUM_2 DENOM_2 FRAC_1 FRAC_2
1 417 379 154 173 1.1002639 0.8901734
2 160 437 239 154 0.3661327 1.5519481
3 243 315 106 361 0.7714286 0.2936288
4 291 169 393 340 1.7218935 1.1558824
5 170 450 429 421 0.3777778 1.0190024
6 422 131 75 64 3.2213740 1.1718750
I would strongly prefer a dplyr solution to this. I thought maybe I could use mutate() with across(), but it isn't clear to me how to tell across() to pair the NUM_x with the corresponding DENOM_x columns.
Here is one option using the tidyverse:
Loop across the columns whose names start with 'NUM' (starts_with).
Get the current column name with cur_column() and replace the substring 'NUM' with 'DENOM' using str_replace.
Use get() to fetch that DENOM column, divide the NUM column by it, and rename the result through .names to create the 'FRAC' columns.
library(dplyr)
library(stringr)
df <- df %>%
mutate(across(starts_with("NUM"), ~
./get(str_replace(cur_column(), 'NUM', 'DENOM')),
.names = "{str_replace(.col, 'NUM', 'FRAC')}"))
-output
head(df)
NUM_1 DENOM_1 NUM_2 DENOM_2 FRAC_1 FRAC_2
1 417 379 154 173 1.1002639 0.8901734
2 160 437 239 154 0.3661327 1.5519481
3 243 315 106 361 0.7714286 0.2936288
4 291 169 393 340 1.7218935 1.1558824
5 170 450 429 421 0.3777778 1.0190024
6 422 131 75 64 3.2213740 1.1718750
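For comparison only (the question asks for dplyr), the same name-based pairing can be sketched in base R. This assumes every NUM_x column has a matching DENOM_x column; the two-row data frame reuses the first two rows shown above, and num_cols, denom_cols and frac_cols are illustrative names:

```r
# First two rows of the example data
df <- data.frame(NUM_1 = c(417, 160), DENOM_1 = c(379, 437),
                 NUM_2 = c(154, 239), DENOM_2 = c(173, 154))

# Derive the paired column names from the NUM_ columns
num_cols   <- grep("^NUM_", names(df), value = TRUE)
denom_cols <- sub("^NUM_", "DENOM_", num_cols)
frac_cols  <- sub("^NUM_", "FRAC_", num_cols)

# Data frames divide element-wise by position, so each NUM_x is divided by its DENOM_x
df[frac_cols] <- df[num_cols] / df[denom_cols]
df
```

The pairing relies purely on the naming convention, which is the same assumption the across() solution makes through str_replace.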

ggplot_line: label the top 2 peak with X-axis values

I am new to R programming. I am plotting a mass spectrum with ggplot and would like to label the top 2 peaks with their x-axis values (i.e. m). Does anyone know how to achieve that?
Thanks so much for your help!
Here is part of the raw data I used for the ggplot.
m Intensity
1 30001 2.964e+01
2 30002 3.336e+01
3 30003 3.968e+01
4 30004 5.015e+01
5 30005 6.838e+01
6 30006 1.016e+02
7 30007 1.464e+02
8 30008 2.130e+02
9 30009 3.115e+02
10 30010 3.951e+02
11 30011 5.134e+02
12 30012 5.316e+02
13 30013 6.377e+02
14 30014 8.813e+02
15 30015 1.071e+03
16 30016 1.119e+03
17 30017 1.202e+03
18 30018 1.299e+03
19 30019 1.112e+03
20 30020 1.205e+03
21 30021 1.422e+03
22 30022 1.653e+03
23 30023 1.726e+03
24 30024 2.423e+03
25 30025 3.059e+03
26 30026 3.267e+03
27 30027 3.993e+03
28 30028 5.172e+03
29 30029 5.278e+03
30 30030 2.794e+03
31 30031 1.459e+03
32 30032 2.512e+03
33 30033 6.590e+03
34 30034 1.245e+04
35 30035 1.144e+04
36 30036 5.197e+03
37 30037 6.012e+03
38 30038 1.453e+04
39 30039 1.513e+04
40 30040 5.802e+03
41 30041 9.226e+03
42 30042 5.809e+03
43 30043 3.074e+03
44 30044 3.882e+03
45 30045 9.941e+02
46 30046 8.170e+02
47 30047 1.149e+03
48 30048 3.567e+02
49 30049 3.805e+02
50 30050 3.654e+02
51 30051 4.724e+02
52 30052 7.819e+02
53 30053 8.634e+02
54 30054 5.235e+02
55 30055 1.712e+02
56 30056 9.232e+01
57 30057 9.434e+01
58 30058 7.191e+01
59 30059 8.036e+01
60 30060 4.456e+01
61 30061 9.428e+01
62 30062 9.392e+01
63 30063 8.413e+01
64 30064 5.671e+01
65 30065 2.639e+01
66 30066 2.027e+01
67 30067 4.584e+01
68 30068 6.956e+01
69 30069 6.181e+01
70 30070 6.450e+01
71 30071 2.826e+01
72 30072 3.610e+01
73 30073 6.325e+01
74 30074 3.509e+01
75 30075 3.478e+01
76 30076 1.120e+01
77 30077 6.993e+00
78 30078 9.936e+00
79 30079 7.738e+00
80 30080 9.771e+00
81 30081 1.762e+01
82 30082 3.060e+01
83 30083 2.175e+01
84 30084 2.816e+01
85 30085 2.700e+01
86 30086 2.114e+01
87 30087 4.378e+01
88 30088 5.824e+01
89 30089 6.193e+01
90 30090 4.146e+01
91 30091 9.697e+04
92 30092 9.458e+04
93 30093 9.216e+04
94 30094 8.972e+04
95 30095 8.723e+04
96 30096 8.468e+04
97 30097 8.211e+04
98 30098 7.959e+04
99 30099 7.726e+04
100 30100 7.527e+04
101 30101 7.379e+04
102 30102 7.298e+04
103 30103 7.301e+04
104 30104 7.399e+04
105 30105 7.602e+04
106 30106 7.916e+04
107 30107 8.340e+04
108 30108 8.862e+04
109 30109 9.460e+04
110 30110 1.010e+05
111 30111 1.074e+05
112 30112 1.133e+05
113 30113 1.180e+05
114 30114 1.211e+05
115 30115 1.222e+05
116 30116 1.213e+05
117 30117 1.186e+05
118 30118 1.146e+05
119 30119 1.100e+05
120 30120 1.054e+05
121 30121 1.014e+05
122 30122 9.838e+04
123 30123 9.637e+04
124 30124 9.535e+04
125 30125 9.508e+04
126 30126 9.520e+04
127 30127 9.527e+04
128 30128 9.484e+04
129 30129 9.355e+04
130 30130 9.128e+04
131 30131 8.809e+04
132 30132 8.425e+04
133 30133 8.012e+04
134 30134 7.603e+04
135 30135 7.225e+04
136 30136 6.895e+04
137 30137 6.617e+04
138 30138 6.392e+04
139 30139 6.214e+04
140 30140 6.078e+04
141 30141 5.980e+04
142 30142 5.922e+04
143 30143 5.905e+04
144 30144 5.934e+04
145 30145 6.013e+04
146 30146 6.143e+04
147 30147 6.324e+04
148 30148 6.552e+04
149 30149 6.816e+04
150 30150 7.100e+04
151 30151 7.384e+04
152 30152 7.655e+04
153 30153 7.904e+04
154 30154 8.132e+04
155 30155 8.353e+04
156 30156 8.595e+04
157 30157 8.896e+04
158 30158 9.302e+04
159 30159 9.864e+04
160 30160 1.063e+05
161 30161 1.165e+05
162 30162 1.293e+05
163 30163 1.443e+05
164 30164 1.605e+05
165 30165 1.759e+05
166 30166 1.883e+05
167 30167 1.957e+05
168 30168 1.969e+05
169 30169 1.921e+05
170 30170 1.824e+05
171 30171 1.693e+05
172 30172 1.544e+05
173 30173 1.390e+05
174 30174 1.241e+05
175 30175 1.102e+05
176 30176 9.755e+04
177 30177 8.644e+04
178 30178 7.692e+04
179 30179 6.900e+04
180 30180 6.262e+04
181 30181 5.766e+04
182 30182 5.397e+04
183 30183 5.137e+04
184 30184 4.972e+04
185 30185 4.889e+04
186 30186 4.881e+04
187 30187 4.940e+04
188 30188 5.059e+04
189 30189 5.230e+04
190 30190 5.444e+04
191 30191 5.690e+04
192 30192 5.960e+04
193 30193 6.244e+04
194 30194 6.539e+04
195 30195 6.842e+04
196 30196 7.153e+04
197 30197 7.471e+04
198 30198 7.795e+04
199 30199 8.118e+04
200 30200 8.430e+04
201 30201 8.719e+04
202 30202 8.976e+04
203 30203 9.193e+04
204 30204 9.364e+04
205 30205 9.480e+04
206 30206 9.531e+04
207 30207 9.504e+04
208 30208 9.391e+04
209 30209 9.189e+04
210 30210 8.912e+04
211 30211 8.587e+04
212 30212 8.251e+04
213 30213 7.939e+04
214 30214 7.680e+04
215 30215 7.492e+04
216 30216 7.381e+04
217 30217 7.349e+04
218 30218 7.394e+04
219 30219 7.510e+04
220 30220 7.690e+04
221 30221 7.919e+04
222 30222 8.174e+04
223 30223 8.425e+04
224 30224 8.637e+04
225 30225 8.776e+04
226 30226 8.826e+04
227 30227 8.788e+04
228 30228 8.690e+04
229 30229 8.569e+04
230 30230 8.465e+04
231 30231 8.405e+04
232 30232 8.398e+04
233 30233 8.434e+04
234 30234 8.494e+04
235 30235 8.554e+04
236 30236 8.598e+04
237 30237 8.623e+04
238 30238 8.638e+04
239 30239 8.665e+04
240 30240 8.736e+04
241 30241 8.884e+04
242 30242 9.147e+04
243 30243 9.559e+04
244 30244 1.016e+05
245 30245 1.097e+05
246 30246 1.200e+05
247 30247 1.321e+05
Here is my code for ggplot:
ggplot(data=raw.1) +
geom_line(mapping = aes(x=m, y=Intensity))
Below is the ggplot output:
I would do it this way. My solution requires the ggrepel package as well as some dplyr functions. The key to this working is that you can set data = for each geom_ layer in ggplot2. The geom_text_repel() layer from ggrepel ensures that the labels will not overlap your data from geom_line().
library(ggplot2)
library(dplyr)
library(ggrepel)
ggplot(mapping = aes(x = m, y = Intensity, label = m)) +
geom_line(data=raw.1) +
geom_text_repel(data = raw.1 %>%
arrange(desc(Intensity)) %>% # arranges in descending order
slice_head(n = 2)) # only keeps the top two intensities.
My plot does not look like yours since you only shared the first 247 data points. As a chemist, I have some idea of what you hope to accomplish, and I suspect this initial solution might not work for you: it labels the two highest intensities, not necessarily the two tallest peaks. We need to identify all local maxima and then select the two tallest.
Here is how we do that. The following code calculates the slope between each point, and then looks for points where a positive slope changes to a negative slope (local maximum), then it sorts and selects the top two by intensity.
top_two <- raw.1 %>%
mutate(deriv = Intensity - lag(Intensity) ,
max = case_when(deriv >=0 & lead(deriv) <0 ~ T,
T ~ F)) %>%
filter(max) %>%
arrange(desc(Intensity)) %>%
slice_head(n = 2)
Let's modify the original plot code to put this in.
ggplot(mapping = aes(x = m, y = Intensity, label = m)) +
geom_line(data = raw.1) +
geom_text_repel(data = top_two, nudge_y = 1e4)
Data:
raw.1 <- structure(list(m = c(30001, 30002, 30003, 30004, 30005, 30006,
30007, 30008, 30009, 30010, 30011, 30012, 30013, 30014, 30015,
30016, 30017, 30018, 30019, 30020, 30021, 30022, 30023, 30024,
30025, 30026, 30027, 30028, 30029, 30030, 30031, 30032, 30033,
30034, 30035, 30036, 30037, 30038, 30039, 30040, 30041, 30042,
30043, 30044, 30045, 30046, 30047, 30048, 30049, 30050, 30051,
30052, 30053, 30054, 30055, 30056, 30057, 30058, 30059, 30060,
30061, 30062, 30063, 30064, 30065, 30066, 30067, 30068, 30069,
30070, 30071, 30072, 30073, 30074, 30075, 30076, 30077, 30078,
30079, 30080, 30081, 30082, 30083, 30084, 30085, 30086, 30087,
30088, 30089, 30090, 30091, 30092, 30093, 30094, 30095, 30096,
30097, 30098, 30099, 30100, 30101, 30102, 30103, 30104, 30105,
30106, 30107, 30108, 30109, 30110, 30111, 30112, 30113, 30114,
30115, 30116, 30117, 30118, 30119, 30120, 30121, 30122, 30123,
30124, 30125, 30126, 30127, 30128, 30129, 30130, 30131, 30132,
30133, 30134, 30135, 30136, 30137, 30138, 30139, 30140, 30141,
30142, 30143, 30144, 30145, 30146, 30147, 30148, 30149, 30150,
30151, 30152, 30153, 30154, 30155, 30156, 30157, 30158, 30159,
30160, 30161, 30162, 30163, 30164, 30165, 30166, 30167, 30168,
30169, 30170, 30171, 30172, 30173, 30174, 30175, 30176, 30177,
30178, 30179, 30180, 30181, 30182, 30183, 30184, 30185, 30186,
30187, 30188, 30189, 30190, 30191, 30192, 30193, 30194, 30195,
30196, 30197, 30198, 30199, 30200, 30201, 30202, 30203, 30204,
30205, 30206, 30207, 30208, 30209, 30210, 30211, 30212, 30213,
30214, 30215, 30216, 30217, 30218, 30219, 30220, 30221, 30222,
30223, 30224, 30225, 30226, 30227, 30228, 30229, 30230, 30231,
30232, 30233, 30234, 30235, 30236, 30237, 30238, 30239, 30240,
30241, 30242, 30243, 30244, 30245, 30246, 30247), Intensity = c(29.64,
33.36, 39.68, 50.15, 68.38, 101.6, 146.4, 213, 311.5, 395.1,
513.4, 531.6, 637.7, 881.3, 1071, 1119, 1202, 1299, 1112, 1205,
1422, 1653, 1726, 2423, 3059, 3267, 3993, 5172, 5278, 2794, 1459,
2512, 6590, 12450, 11440, 5197, 6012, 14530, 15130, 5802, 9226,
5809, 3074, 3882, 994.1, 817, 1149, 356.7, 380.5, 365.4, 472.4,
781.9, 863.4, 523.5, 171.2, 92.32, 94.34, 71.91, 80.36, 44.56,
94.28, 93.92, 84.13, 56.71, 26.39, 20.27, 45.84, 69.56, 61.81,
64.5, 28.26, 36.1, 63.25, 35.09, 34.78, 11.2, 6.993, 9.936, 7.738,
9.771, 17.62, 30.6, 21.75, 28.16, 27, 21.14, 43.78, 58.24, 61.93,
41.46, 96970, 94580, 92160, 89720, 87230, 84680, 82110, 79590,
77260, 75270, 73790, 72980, 73010, 73990, 76020, 79160, 83400,
88620, 94600, 101000, 107400, 113300, 118000, 121100, 122200,
121300, 118600, 114600, 110000, 105400, 101400, 98380, 96370,
95350, 95080, 95200, 95270, 94840, 93550, 91280, 88090, 84250,
80120, 76030, 72250, 68950, 66170, 63920, 62140, 60780, 59800,
59220, 59050, 59340, 60130, 61430, 63240, 65520, 68160, 71000,
73840, 76550, 79040, 81320, 83530, 85950, 88960, 93020, 98640,
106300, 116500, 129300, 144300, 160500, 175900, 188300, 195700,
196900, 192100, 182400, 169300, 154400, 139000, 124100, 110200,
97550, 86440, 76920, 69000, 62620, 57660, 53970, 51370, 49720,
48890, 48810, 49400, 50590, 52300, 54440, 56900, 59600, 62440,
65390, 68420, 71530, 74710, 77950, 81180, 84300, 87190, 89760,
91930, 93640, 94800, 95310, 95040, 93910, 91890, 89120, 85870,
82510, 79390, 76800, 74920, 73810, 73490, 73940, 75100, 76900,
79190, 81740, 84250, 86370, 87760, 88260, 87880, 86900, 85690,
84650, 84050, 83980, 84340, 84940, 85540, 85980, 86230, 86380,
86650, 87360, 88840, 91470, 95590, 101600, 109700, 120000, 132100
)), row.names = c(NA, -247L), class = c("tbl_df", "tbl", "data.frame"
))
This approach treats your x-axis as discrete values of a continuous variable and finds the local maxima from sign changes in the first difference (a discrete second-derivative test), using code from Finding local maxima and minima.
The rest of the plotting is similar to Ben Norris's answer, using geom_text_repel() to label the points of interest.
Also, as noted, the data you provided differ from the figure in your question.
library(ggplot2)
library(ggrepel)
# find local maxima aka peaks
local_maximas <- raw.1[which(diff(sign(diff(raw.1$Intensity)))==-2)+1,]
top2 <- tail(local_maximas[order(local_maximas$Intensity),],2) #subset of top 2 highest peaks
raw.1$label <- ifelse(raw.1$m %in% top2$m, raw.1$m, NA) #make labels for plot
ggplot(data = raw.1) +
geom_line(aes(x=m, y=Intensity)) +
geom_text_repel(aes(x = m, y = Intensity, label = label))
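To see what the diff(sign(diff(...))) expression does, here is a small worked sketch on a made-up intensity vector (the values and variable names are illustrative only):

```r
# Toy intensity vector, made up for illustration
intensity <- c(1, 3, 2, 5, 7, 4, 4, 6, 1)

slope_sign <- sign(diff(intensity))  # +1 rising, -1 falling, 0 flat
turn <- diff(slope_sign)             # -2 where a rise is followed directly by a fall

# +1 shifts from the position in the differenced vector back to the original index
peak_idx <- which(turn == -2) + 1
peak_idx                             # 2 5 8

# Indices of the two tallest peaks, tallest first
top2_idx <- peak_idx[order(intensity[peak_idx], decreasing = TRUE)][1:2]
top2_idx                             # 5 8
```

One limitation of this test: a flat-topped peak such as c(1, 3, 3, 1) produces turn values of -1 and -1 rather than -2, so it is not detected.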

Why are my 95% confidence intervals of my multivariate regression being plotted as a loess line?

I've been trying to plot a 95% prediction interval for a multivariate regression line in ggplot2. The graph is a regression of three independent variables ("x", "y", and "z") being used to predict a dependent variable ("a") on the y-axis. However, when I actually try to plot the results in ggplot2, I get a rather unusual result where the regression line is straightforward but the 95% prediction interval bands are very squiggly and do not resemble a straight line at all. They look like loess lines more than anything. Here is a picture showing the result I get:
Does anyone know why I am getting a result where the 95% confidence intervals aren't smooth lines? The only thing I can think of is that this is related to the fact that this is a multivariate regression rather than a univariate one, but checking the actual variables all three show a strong correlation with the dependent variable (r2 > 0.95). I looked up the results of a plot of a multivariate regression with a 95% confidence interval, but none of them seemed similar to my result, they all seemed to have pretty smooth lines.
I tried adding method="lm" to the predict() call of my code, following this question, but that did not work either.
Below is a dataset and code that replicates this result.
x y z a
1 2.366153239 5.420534999 2.328204243 10.55858156
2 1.431094272 2.975529566 1.724972338 2.533696814
3 2.60453538 5.75827066 2.399639694 11.48783737
4 2.483771412 5.470167623 2.338838948 10.74706177
5 1.971210737 4.287715955 2.070680071 7.334766592
6 2.5596573 5.558000525 2.357541203 11.6127708
7 2.177892158 4.730480377 2.174966753 8.631949429
8 1.49665751 3.203559121 1.78984891 3.020424886
9 2.728865195 6.376658918 2.525204728 12.51412704
10 1.908668224 4.025351691 2.006327912 6.593044534
11 1.978895443 4.24563401 2.060493633 7.402451521
12 1.627855104 3.344274234 1.828735693 3.731699451
13 1.53436705 3.350605596 1.83046595 3.170525564
14 2.448831586 5.585936937 2.363458681 10.76866329
15 2.443160968 5.331752143 2.309058714 10.58310613
16 2.156078216 4.417635062 2.101817086 8.109576771
17 1.931534652 4.249610334 2.061458303 6.790693233
18 1.452715015 3.225752129 1.796037897 3.356200016
19 1.729354145 3.683866912 1.919340228 5.420225217
20 1.861239059 3.912023005 1.977883466 6.267750682
21 1.822955174 3.804437795 1.950496807 5.991464547
22 2.113126565 4.492001488 2.119434238 8.114076324
23 2.171856126 4.662613282 2.159308519 7.806138626
24 1.391215895 3.010620886 1.735114084 2.461296784
25 1.319165859 2.895911938 1.701737917 2.055404964
26 2.034006688 4.322608316 2.079088338 6.977001452
27 2.85574569 6.160996329 2.482135437 14.34613881
28 1.411579618 3.097385927 1.759939183 2.613006652
29 2.576957482 6.029643051 2.45553315 11.91628836
30 1.796913834 3.923259637 1.980721999 5.911392672
31 2.024389004 4.345833727 2.084666335 8.022132643
32 1.63435577 3.493472658 1.869083374 3.515715835
33 1.584595569 3.453157121 1.858267236 3.397523976
34 1.881578895 4.030076005 2.00750492 6.011267174
35 1.728309802 3.752101123 1.937034105 5.225370259
36 1.414715557 3.140049044 1.772018353 2.736961545
37 1.488730081 3.116621591 1.76539559 2.902519892
38 1.522138034 3.257327011 1.804806641 2.890371758
39 1.800033345 3.987130478 1.996780027 5.640594153
40 1.794222122 4.143928062 2.035664035 6.206575927
41 2.676710091 6.289901082 2.50796752 13.49805633
42 2.328582719 5.13691546 2.266476442 9.430961545
43 2.484723966 5.458712793 2.336388836 10.7561993
44 2.287108375 4.856940066 2.203846652 9.917240545
45 2.417128932 5.582744146 2.362783136 10.54534144
46 2.328332495 5.105945474 2.259633925 9.840475333
47 2.362264634 5.293304825 2.300718328 9.848820151
48 2.28292536 5.018934097 2.24029777 9.269934816
49 1.449825221 3.006177531 1.73383319 3.121042465
50 2.211679876 4.692264893 2.166163635 8.631218063
51 2.704614597 6.072756474 2.464296345 12.31992499
52 2.48097622 5.43590303 2.331502312 11.2245765
53 1.497529983 3.380994674 1.838748127 3.752088968
54 2.696365396 5.825540285 2.413615604 12.36222133
55 2.165729837 4.666265285 2.160153996 8.455875079
56 2.410978268 5.417499423 2.327552239 10.08813972
57 2.185447829 4.991792206 2.234231905 9.215327913
58 2.041898307 4.22566518 2.055642279 7.418180823
59 2.099077244 4.375757022 2.091831021 7.696212639
60 2.000032635 4.234467391 2.057782153 7.110696123
61 2.025963678 4.260852439 2.064183238 6.851163763
62 2.083395224 4.351567427 2.08604109 7.884576511
63 1.981523362 4.318820559 2.07817722 7.43543802
64 2.033235038 4.336636932 2.082459347 7.313220387
65 1.423999144 3.206803244 1.790754937 2.564949357
66 2.217982257 4.825910853 2.196795587 8.920558764
67 1.240285111 2.808498672 1.675857593 1.568615918
68 2.215837149 5.041487758 2.245325758 8.802372134
69 2.134859238 4.731890939 2.175291001 8.132101136
70 2.306998207 5.059171458 2.249260202 9.336074756
71 1.896404791 4.104681782 2.026001427 6.445449942
72 1.922935417 4.151905673 2.037622554 6.818169682
73 2.111422924 4.716264233 2.171696165 8.366370302
74 2.28264494 4.852811209 2.202909714 9.210340372
75 2.190760504 4.574710979 2.1388574 8.447427164
76 2.037589062 4.275276265 2.06767412 6.989197008
77 1.717192759 3.810543836 1.952061433 4.610157727
78 1.876769266 4.043051268 2.010734012 6.306275287
79 2.030134158 4.579339426 2.139939117 7.715792425
80 1.93577016 4.356708827 2.08727306 6.788521191
81 2.056518774 4.445588116 2.108456335 7.636510887
82 2.120080841 4.615120517 2.148283156 7.916807491
83 2.232689054 4.861361591 2.204849562 8.694167142
84 2.181147406 4.782479201 2.186888017 8.854567878
85 2.92779884 6.305666829 2.511108685 13.9593635
86 1.860080456 4.459637473 2.111785376 6.163314804
87 1.913818428 4.602767301 2.145406092 7.174915716
88 1.877883958 4.594104966 2.143386332 6.335054251
89 1.994987686 4.632100752 2.152231575 7.707952547
90 2.14756511 5.023880521 2.241401464 9.161721393
91 1.503591471 3.687628672 1.92031994 4.280824129
92 1.4536743 3.579343567 1.891915317 3.761200116
93 1.50872427 3.584888833 1.893380266 4.106767082
94 1.537573733 3.649466946 1.910357806 4.126327608
95 1.934796461 4.373238129 2.091228856 7.584097036
96 1.526250724 3.248434627 1.802341429 3.228826156
97 1.606399474 3.500439216 1.870946075 4.939855112
98 1.943162189 4.329208633 2.080675043 6.460498957
99 1.963384107 4.353112625 2.086411423 6.649308332
100 2.183124049 4.711248626 2.170541091 8.474527832
101 1.640763809 3.543853682 1.882512598 3.832330237
102 1.659456682 3.523415014 1.877076188 3.997282849
103 1.436096958 3.166318574 1.779415234 2.839078464
104 2.428955194 4.91133048 2.216152179 10.44793169
105 2.668500746 6.154858094 2.480898646 12.73883098
106 2.676812229 6.178980921 2.485755604 12.64109656
107 2.126920019 4.640923356 2.154280241 8.600833727
108 1.878254881 4.025530246 2.00637241 6.253828812
109 2.242102174 4.726797674 2.174119977 8.29404964
110 1.676813632 3.822754538 1.955186574 5.370638028
111 1.874531192 4.17438727 2.043131731 7.265087007
112 1.998637301 4.2363594 2.058241822 6.722389092
113 1.944116978 4.159527009 2.039491851 6.038562805
114 2.308184503 5.192956851 2.278806014 9.36048303
115 2.042370888 4.49535532 2.120225299 7.320526962
116 2.015621187 4.318820559 2.07817722 7.081078135
117 1.81401665 4.146304301 2.036247603 6.492542819
118 1.676813632 3.87937827 1.969613736 5.221868194
119 2.807346477 6.428545769 2.535457704 13.72308897
120 1.621259207 3.543853682 1.882512598 4.162470391
121 1.50100345 3.321793359 1.822578766 3.106378794
122 1.582428764 3.464319806 1.861268333 4.143134726
123 1.654547625 3.591817741 1.895209155 4.509649984
124 2.332936461 4.937777822 2.222111118 9.398917323
125 2.498105588 5.513601542 2.348105948 11.29414737
126 1.890319403 3.887730313 1.97173282 5.847161058
127 1.804890841 3.940999114 1.985194981 6.17864926
128 2.096209309 4.6042388 2.145749007 7.788418833
129 2.047658751 4.337290741 2.082616321 7.612336837
130 2.680572077 5.989462544 2.447337848 12.15745472
131 2.333554566 5.407171771 2.325332615 10.44467195
132 2.212180997 4.932817886 2.220994797 8.881836305
133 1.478852439 3.063390922 1.750254531 2.890371758
134 1.648334702 3.518387649 1.875736562 4.141546164
135 2.307921185 4.90823336 2.215453308 9.305650552
136 2.13384989 4.645130271 2.155256428 8.018790088
137 1.728309802 3.555348061 1.885563062 4.941642423
138 1.691821236 3.556775613 1.885941572 4.886582645
139 1.746238611 3.891820298 1.972769702 5.363543151
140 1.679155631 3.642966397 1.908655652 4.754882459
141 1.94348069 4.156536582 2.038758589 6.277601677
142 1.549402462 3.250374492 1.8028795 3.342508385
143 1.856975574 4.232023463 2.057188242 6.413458957
144 2.529503815 5.684310793 2.38417927 11.22830537
145 2.035545742 4.643428898 2.154861689 7.244227516
146 2.467132416 5.697093487 2.386858497 11.50287513
147 2.298324686 4.870031331 2.206814748 9.286469586
148 1.937388065 4.34601078 2.0847088 7.322972679
149 1.956955486 4.536730733 2.129960266 7.739019572
150 2.036823984 4.518958489 2.125784206 8.594154233
151 1.972996546 4.529692045 2.128307319 7.967481199
152 1.58746864 3.283839256 1.812136655 3.314186005
153 1.521311054 3.464922216 1.861430153 3.681603045
154 2.44446969 5.445011746 2.333454895 10.3609124
155 2.294121109 4.731979033 2.17531125 9.105210941
156 3.126345733 6.927557906 2.632025438 15.6772624
157 1.867746396 4.253056253 2.06229393 6.32459191
158 1.839082858 4.029806041 2.00743768 5.382980154
159 2.127330896 4.844974178 2.201130205 7.863266724
160 2.404523583 5.236441963 2.288327329 10.04902409
161 2.262955985 4.845642719 2.201282063 9.034969801
162 2.253418218 4.727387819 2.174255693 9.130463484
163 2.302083991 5.167955549 2.273313781 10.06411762
164 2.192165626 4.835011259 2.198865903 9.262695602
165 1.672685332 3.734489965 1.93248285 4.565493369
166 1.568460311 3.539508997 1.881358285 3.52282487
167 1.609819887 3.523868735 1.877197042 3.920784511
168 1.616583967 3.587676949 1.894116403 4.394572604
169 1.643301653 3.654700957 1.911727218 3.912023005
170 1.621923158 3.581532841 1.892493815 3.891820298
171 2.090637708 4.527208645 2.127723818 8.536995819
172 2.109497906 4.585222548 2.141313277 8.203668045
173 2.03091153 4.429625613 2.104667578 7.785783239
174 2.09487893 4.582924577 2.140776629 8.204589814
175 2.040382454 4.335786342 2.08225511 6.632541816
176 2.312894869 5.342334252 2.311349011 9.798127037
177 1.430087263 3.148453361 1.774388165 2.939161922
178 2.293711966 4.871098263 2.20705647 9.392661929
179 2.391075023 4.894101478 2.212261621 9.375295332
180 2.517077345 5.718436483 2.391325257 11.47221284
181 1.989024673 4.154969184 2.038374152 6.872128101
182 2.02016078 4.294014757 2.072200463 7.403304815
183 1.797360845 4.076689627 2.019081382 5.90560705
184 1.705239225 3.931825633 1.982883162 5.697965589
185 1.471533812 3.312439025 1.820010721 3.529590596
186 1.438083095 3.346917175 1.829458164 3.533978493
187 1.619261465 3.559624618 1.886696748 4.109233175
188 1.609819887 3.6558396 1.912025 4.166355098
189 2.346796539 5.146965796 2.26869253 9.872567414
190 1.784208279 3.519720884 1.876091918 4.879539029
191 1.832126365 3.811539467 1.952316436 5.259368616
192 1.677986168 3.452840615 1.858182073 3.885884348
193 1.966109701 4.163870625 2.04055645 6.526348436
194 1.701367309 3.828641396 1.956691441 4.605170186
195 1.931534652 4.279440046 2.06868075 6.927802974
196 1.36183801 3.102342009 1.761346646 2.645465326
197 2.432819556 5.883322388 2.425556099 10.46486408
198 2.078341803 4.564943223 2.136572775 7.650468513
199 1.432099112 3.171155089 1.780773733 2.931193752
200 2.174427741 4.839451482 2.199875333 8.482392615
201 2.16404302 4.710430697 2.170352666 8.620246046
202 1.738643812 3.737669618 1.933305361 5.834810737
203 2.303817478 5.000921602 2.236274044 9.718344619
204 1.741189967 3.731819205 1.931791708 5.090062428
205 1.794671893 3.904293207 1.975928442 5.247024072
206 1.757635562 3.857777991 1.964122703 5.006560336
207 1.676226207 3.66137978 1.913473224 4.566637236
208 1.77911412 3.86388263 1.965676125 5.669260041
209 2.059914227 4.564348191 2.136433521 7.695152987
210 1.32424147 3.104586678 1.761983734 2.182674796
211 1.604334732 3.751518852 1.936883799 4.85787254
212 1.662497734 3.79739748 1.948691222 5.073109185
213 1.44885795 3.04690056 1.745537327 2.907447359
214 2.487551021 5.598973005 2.366214911 10.97673998
215 2.438166592 5.528436532 2.351262753 10.75773968
216 1.892477044 4.164647686 2.040746845 7.15334893
217 1.520482581 3.272335343 1.80895974 3.424588334
218 2.488969385 5.681996883 2.383693957 10.74868607
219 2.215837149 4.53044664 2.128484588 7.620705087
220 2.442786243 5.526780079 2.350910479 10.69919132
221 2.570602875 5.907702431 2.430576563 11.59161344
222 2.608344119 6.053264948 2.460338381 12.33182385
223 2.524368131 5.738731256 2.395564914 11.20612853
224 1.539964086 3.38269391 1.839210132 3.571221411
225 1.541550744 3.476614021 1.864568052 3.523119986
226 2.111209474 4.695924549 2.167008202 8.126284621
227 1.910391851 4.139955073 2.034687955 6.467590025
228 2.801971864 6.015864434 2.452725919 13.0280527
229 2.616209119 5.780126041 2.404189269 11.53329656
230 2.570130461 5.673975975 2.38201091 10.97701107
231 2.545595117 5.629669374 2.372692431 11.14107887
232 2.618299253 5.800606659 2.408444863 11.97035031
233 2.443348195 5.385412073 2.320649063 10.85417971
234 2.385152788 5.279188197 2.297648406 10.67131308
235 2.512400994 5.685007319 2.384325338 11.58593194
236 2.39352554 5.12693575 2.26427378 10.4590302
237 1.823796962 3.992680908 1.998169389 6.109247583
238 1.768267491 3.745968421 1.935450444 5.260096154
239 2.376820756 5.302583255 2.302733865 10.4487146
240 2.042402374 4.477336814 2.115971837 7.810068783
241 2.159700495 4.673996377 2.161942732 8.189916149
242 1.948229832 4.378018613 2.092371528 6.932447892
243 1.330510703 3.059880093 1.749251295 2.083184528
244 1.464097665 3.342685111 1.828301154 3.072693315
245 1.446917352 3.196630216 1.787912251 2.829087196
246 2.082252099 4.60990894 2.14706985 8.075582637
247 1.933494729 4.136126096 2.033746812 7.003065459
248 1.840298976 3.949126093 1.987240824 7.056175284
249 1.649584193 3.645188765 1.909237745 4.51129897
250 1.778648064 3.883623531 1.97069113 5.09681299
251 2.526339825 5.903056741 2.429620699 11.66907415
252 2.512244141 5.734958092 2.394777253 10.93748043
253 1.947599667 4.356708827 2.08727306 6.514712691
254 2.181687439 4.946274535 2.224022153 8.799405331
255 2.109497906 4.510859507 2.123878411 8.132101136
256 1.831713667 4.188138442 2.046494183 6.109247583
257 1.5517319 3.446807893 1.856558077 3.765840495
258 2.47549747 5.727881894 2.393299374 10.78967984
259 1.96580772 4.156693187 2.038796995 6.229496711
260 1.978602442 4.21508618 2.053067505 7.258412151
261 2.064000486 4.339901708 2.083243075 7.670717659
262 2.117775721 4.510639702 2.123826665 7.731676304
263 2.221912965 4.838923916 2.199755422 8.877208949
264 1.940925986 4.266896327 2.065646709 6.450865289
265 2.040382454 4.579852378 2.140058966 7.857666456
266 2.173143952 4.666735542 2.160262841 8.561717125
267 2.240859653 4.901564199 2.21394765 8.808442394
268 1.888874933 4.080921542 2.02012909 6.163314804
269 1.845529749 4.082609306 2.020546784 6.885284696
270 2.238519604 4.984229093 2.23253871 8.987910316
271 2.393206767 5.29338648 2.300736073 10.70491521
272 2.702044102 5.884714177 2.425842983 12.3883942
273 2.219296721 4.854631045 2.203322728 9.263449766
274 1.96161829 4.090838423 2.022582118 6.993932975
275 2.00561407 4.171305603 2.042377439 7.324270223
276 2.467836387 5.578051269 2.361789844 10.8016414
277 1.390119244 3.100092289 1.760707894 2.379546134
278 1.365322726 3.044760505 1.744924212 2.401525041
279 1.598782218 3.516726026 1.875293584 4.234975692
280 1.94538671 4.131961426 2.032722663 6.199494461
281 2.172592522 4.89858579 2.213274902 8.804952261
282 1.908668224 4.102312732 2.025416681 6.374172668
283 1.944434766 4.112266337 2.027872367 6.54672802
284 1.58387445 3.505557397 1.872313381 3.941581808
285 1.743721514 3.832670536 1.95772075 5.11349268
286 1.592453126 3.549329989 1.883966557 4.871143315
287 1.283414418 2.79971739 1.673235605 1.54329811
288 1.320439849 2.90690106 1.704963654 2.070653036
289 1.194572818 2.708716646 1.645817926 1.184789985
290 1.231175294 2.681021529 1.637382524 1.115141591
291 1.365322726 3.074795481 1.753509476 2.254444718
292 1.408422528 3.000719815 1.732258588 2.422144328
293 2.184734225 4.886582645 2.210561613 8.803574418
294 2.030652566 4.649187071 2.156197364 7.901007052
295 1.890679763 4.02356438 2.005882444 6.212726329
296 1.855414729 4.027135813 2.006772486 5.858647185
297 1.819146836 3.737669618 1.933305361 5.340274716
298 1.51380043 3.337192052 1.826798306 3.514823642
299 1.923936518 4.162158962 2.040136996 6.485993092
300 2.54480266 5.875913394 2.42402834 11.44989333
301 2.015083881 4.471638793 2.114624977 7.725330038
302 1.902054478 4.30514559 2.074884476 7.141300544
303 1.932189012 4.149463861 2.037023284 7.112433389
304 1.357151358 2.977059008 1.725415605 2.054123734
305 2.172040349 4.677490848 2.162750759 7.584422406
306 2.12856108 4.80073697 2.191058413 8.165269798
307 1.597383378 3.38269391 1.839210132 3.948162052
308 1.571436916 3.451890496 1.857926397 3.970291914
309 1.669116161 3.728100167 1.930828881 4.514479321
310 1.792870023 3.818920387 1.95420582 5.395716273
311 2.701422654 6.042632834 2.45817673 12.51412704
312 2.724885462 6.056784013 2.461053436 12.63817968
313 2.649668658 5.96870756 2.44309385 12.34583459
314 1.328012928 3.19047635 1.786190457 2.231089091
315 2.290836238 4.827072968 2.197060074 8.67484801
316 2.375600157 5.495650681 2.344280419 10.05803872
317 1.625886455 3.693369359 1.92181408 5.110178924
318 2.329332455 5.313205979 2.305039258 9.613803477
319 1.515480102 3.456316681 1.859117178 3.185525845
320 1.472454994 3.284663565 1.812364082 3.025291076
321 1.506165026 3.349904087 1.83027432 3.054001182
322 1.473374347 3.306520335 1.81838399 3.25617161
323 1.527068855 3.325036021 1.82346813 3.36729583
324 2.110354575 4.662495253 2.159281189 8.165363632
325 1.523787537 3.514526067 1.874706928 3.062923523
326 2.023599447 4.094344562 2.02344868 6.938769333
327 2.753898938 5.917548864 2.432601255 12.66032792
328 2.617941755 5.678362097 2.382931408 11.46939146
329 2.119034653 4.483002552 2.117310216 8.240121298
330 2.066147705 4.476199805 2.115703147 7.14397299
331 2.101925481 4.630837933 2.151938181 7.659327016
332 2.508777239 5.407171771 2.325332615 11.8987611
333 2.005568244 4.463030419 2.112588559 7.397665697
334 1.726738648 3.759687344 1.938991321 4.430816799
335 1.774901671 3.812975852 1.952684268 4.937562683
336 1.648959883 3.423610976 1.85030024 3.988984047
337 1.777714463 3.797733859 1.948777529 5.403847868
338 1.704136403 3.63758616 1.9072457 5.272486607
339 1.844729114 3.968445871 1.992095849 6.429622699
340 1.768267491 3.797733859 1.948777529 5.523339153
341 2.159320704 4.744410253 2.178166718 8.301035184
342 2.109497906 4.580877493 2.140298459 7.726636028
343 2.521315024 5.573617308 2.360850971 11.60597801
344 2.576758408 5.79269513 2.406801847 11.8427421
345 2.669803365 5.872117789 2.423245301 12.427118
346 2.441001399 5.430134791 2.330264962 11.48863277
347 2.117775721 4.750395438 2.17954019 8.556413905
348 2.023599447 4.553734634 2.133948133 7.34601021
349 2.394268344 5.066826574 2.250961255 9.954988325
350 2.106053393 4.696472344 2.167134593 7.892825526
351 2.100394247 4.736330019 2.176311103 7.72870183
352 2.160269524 4.922204729 2.21860423 8.255482913
353 2.188997276 4.774912961 2.185157422 9.409191231
354 1.874905013 3.86388263 1.965676125 6.003887067
355 2.061842158 4.182126476 2.045024811 6.745236349
356 1.418864374 3.119939077 1.766334928 2.7631695
data <- read.csv("data.csv", header = TRUE)
fit.all<-lm(a~x+y+z,data=data)
b<-data.frame(data,predict(fit.all,interval="prediction"))
ggplot(data,aes(x=x+y+z,y=a))+
geom_point(size=3,shape=1,col="black")+
geom_smooth(method="lm")+
geom_line(aes(y=lwr), color = "red", linetype = "dashed")+
geom_line(aes(y=upr), color = "red", linetype = "dashed")+
theme_classic()
As Ben suggested, that approach is not going to produce a sensible graphical display.
What you could do instead is examine the relationship between each predictor and the outcome separately, holding the other predictors constant at some chosen level.
Here I use the means as those chosen levels.
data_x.yz <-
data.frame(
x = seq(min(data$x), max(data$x), 0.1),
y = mean(data$y),
z = mean(data$z)
)
data_x.yz <-
cbind(
data_x.yz,
predict(fit.all, newdata = data_x.yz, interval = "prediction")
)
ggplot(data_x.yz, aes(x, fit, ymin = lwr, ymax = upr)) +
geom_line(color = "blue") +
geom_ribbon(fill = NA, color = "red", linetype = "dashed")
data_y.xz <-
data.frame(
x = mean(data$x),
y = seq(min(data$y), max(data$y), 0.1),
z = mean(data$z)
)
data_y.xz <-
cbind(
data_y.xz,
predict(fit.all, newdata = data_y.xz, interval = "prediction")
)
ggplot(data_y.xz, aes(y, fit, ymin = lwr, ymax = upr)) +
geom_line(color = "blue") +
geom_ribbon(fill = NA, color = "red", linetype = "dashed")
data_z.yx <-
data.frame(
x = mean(data$x),
y = mean(data$y),
z = seq(1.6, 2.6, 0.1)
)
data_z.yx <-
cbind(
data_z.yx,
predict(fit.all, newdata = data_z.yx, interval = "prediction")
)
ggplot(data_z.yx, aes(z, fit, ymin = lwr, ymax = upr)) +
geom_line(color = "blue") +
geom_ribbon(fill = NA, color = "red", linetype = "dashed")
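The three blocks above repeat one pattern: vary a single predictor, hold the others at their means, and predict. A minimal sketch of that pattern as a reusable function follows. The helper name partial_predictions and the synthetic toy data are assumptions made for illustration only; they are not part of the original answer or the question's data.

```r
# Hypothetical helper: vary one predictor over its observed range while
# holding the remaining predictors at their means, then attach predictions.
partial_predictions <- function(fit, data, focal, predictors, n = 50) {
  # One row of predictor means, repeated n times
  means <- lapply(data[predictors], mean)
  grid <- as.data.frame(means)[rep(1, n), , drop = FALSE]
  rownames(grid) <- NULL
  # Overwrite the focal predictor with an evenly spaced sequence
  grid[[focal]] <- seq(min(data[[focal]]), max(data[[focal]]), length.out = n)
  # predict() returns the columns fit, lwr and upr for interval = "prediction"
  cbind(grid, predict(fit, newdata = grid, interval = "prediction"))
}

# Synthetic data for illustration only (not the data from the question)
set.seed(1)
toy <- data.frame(x = runif(100), y = runif(100), z = runif(100))
toy$a <- 1 + 2 * toy$x - toy$y + 0.5 * toy$z + rnorm(100, sd = 0.1)
toy_fit <- lm(a ~ x + y + z, data = toy)

px <- partial_predictions(toy_fit, toy, "x", c("x", "y", "z"))
head(px)
```

The resulting data frame has the same shape as data_x.yz above, so it can be passed to the same ggplot call.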

Subset timeseries (date sequence) into a list

I have a dataframe with a series of dates, here's a simplified version of it:
> eventdates
dr.rank dr.start dr.end
1 14 1964-09-30 1964-10-06
2 16 1964-11-01 1964-12-24
I also have a time series of dates with values etc. associated with that, here's a much simplified version of the timeseries:
ts1964 <- data.frame(DATE = seq(from = as.Date("1964-01-01"), to = as.Date("1964-12-31"), by = "days"),
Q = 1:366)
What I am trying to do is subset by each date in eventdates, i.e.:
> filter(ts1964, ts1964$DATE >= eventdates[1,2] & ts1964$DATE <= eventdates[1,3])
DATE Q
1 1964-09-30 274
2 1964-10-01 275
3 1964-10-02 276
4 1964-10-03 277
5 1964-10-04 278
6 1964-10-05 279
7 1964-10-06 280
8 1964-10-07 281
9 1964-10-08 282
10 1964-10-09 283
11 1964-10-10 284
12 1964-10-11 285
13 1964-10-12 286
14 1964-10-13 287
15 1964-10-14 288
16 1964-10-15 289
17 1964-10-16 290
18 1964-10-17 291
19 1964-10-18 292
20 1964-10-19 293
21 1964-10-20 294
22 1964-10-21 295
23 1964-10-22 296
24 1964-10-23 297
25 1964-10-24 298
26 1964-10-25 299
27 1964-10-26 300
28 1964-10-27 301
29 1964-10-28 302
30 1964-10-29 303
31 1964-10-30 304
32 1964-10-31 305
33 1964-11-01 306
>
But I need to do this hundreds of times. What I would like is for each subset to form an element in a list. I would normally consider using something like dlply in plyr, but this isn't an option when I'm using dplyr. Could anyone advise on how I might achieve this otherwise? Thanks
We can use Map
Map(function(x,y) filter(ts1964, DATE >= x & DATE <= y),
eventdates$dr.start, eventdates$dr.end)
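For reference, here is a self-contained sketch of the same idea using base R subsetting instead of filter, so it runs without dplyr. It assumes dr.start and dr.end are of class Date; naming the list elements by dr.rank is an illustrative choice, not something the question asked for.

```r
# Data rebuilt from the simplified example in the question
eventdates <- data.frame(
  dr.rank  = c(14, 16),
  dr.start = as.Date(c("1964-09-30", "1964-11-01")),
  dr.end   = as.Date(c("1964-10-06", "1964-12-24"))
)
ts1964 <- data.frame(
  DATE = seq(as.Date("1964-01-01"), as.Date("1964-12-31"), by = "days"),
  Q = 1:366
)

# One list element per event: rows of ts1964 falling inside [start, end]
subsets <- Map(
  function(start, end) ts1964[ts1964$DATE >= start & ts1964$DATE <= end, ],
  eventdates$dr.start, eventdates$dr.end
)
# Label each element by its event rank (illustrative choice)
names(subsets) <- eventdates$dr.rank

# Number of rows captured for each event
sapply(subsets, nrow)
```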

grep: How can I search through my data using a wildcard in R

I have recently started using R and am trying to get some data out of it, but the results I get are quite confusing. I have daily data for the years 1961 to 1963, with dates in the format 1961-04-25, stored in a vector called date.
To display only the dates between April 10 and May 21, I tried using grep with this command:
date[date >= grep("196.-04-10", date, value = TRUE) &
date <= grep("196.-05-21", date, value = TRUE)]
The results I get are confusing: they come in 3-day steps instead of giving me every single day (see below).
[1] "1961-04-10" "1961-04-13" "1961-04-16" "1961-04-19" "1961-04-22" "1961-04-25" "1961-04-28" "1961-05-01" "1961-05-04" "1961-05-07" "1961-05-10"
[12] "1961-05-13" "1961-05-16" "1961-05-19" "1962-04-12" "1962-04-15" "1962-04-18" "1962-04-21" "1962-04-24" "1962-04-27" "1962-04-30" "1962-05-03"
[23] "1962-05-06" "1962-05-09" "1962-05-12" "1962-05-15" "1962-05-18" "1962-05-21" "1963-04-11" "1963-04-14" "1963-04-17" "1963-04-20" "1963-04-23"
[34] "1963-04-26" "1963-04-29" "1963-05-02" "1963-05-05" "1963-05-08" "1963-05-11" "1963-05-14" "1963-05-17" "1963-05-20"
I think the grep strategy is misguided, but maybe something like this will work ... basically, I'm computing the day-of-year (Julian date, yday()) and using that for comparison.
z <- as.Date(c("1961-04-10","1961-04-11","1961-04-12",
"1961-05-21","1961-05-22","1961-05-23",
"1963-04-09","1963-04-12","1963-05-21","1963-05-22"))
library(lubridate)
z[yday(z)>=yday(as.Date("1961-04-10")) & yday(z)<=yday(as.Date("1961-05-21"))]
## [1] "1961-04-10" "1961-04-11" "1961-04-12" "1961-05-21" "1963-04-12"
## [6] "1963-05-21"
Actually, this solution is fragile to leap-years ...
Better (?):
yz <- year(z)
z[z>=as.Date(paste0(yz,"-04-10")) & z<=as.Date(paste0(yz,"-05-21"))]
(You should definitely test this for yourself, I haven't tested carefully!)
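To make the leap-year caveat concrete, here is a small base R sketch. It uses 1964 purely as an illustrative leap year (the question's data covers 1961 to 1963, none of which are leap years), and the helper doy is a hypothetical stand-in for yday() built on format(), so the block runs without lubridate.

```r
# Hypothetical helper: day-of-year via base R, standing in for lubridate::yday()
doy <- function(d) as.integer(format(as.Date(d), "%j"))

doy("1961-04-10")  # common year
doy("1964-04-10")  # leap year: shifted by one day because of 29 February

z <- as.Date(c("1964-04-09", "1964-04-10", "1964-05-21", "1964-05-22"))

# Fixed day-of-year limits taken from 1961 select the wrong days in 1964
lo <- doy("1961-04-10")
hi <- doy("1961-05-21")
by_doy <- z[doy(z) >= lo & doy(z) <= hi]

# Limits rebuilt from each date's own year are not affected by the shift
yz <- format(z, "%Y")
by_year <- z[z >= as.Date(paste0(yz, "-04-10")) & z <= as.Date(paste0(yz, "-05-21"))]

by_doy
by_year
```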
Using a date format for your variable would be the best bet here.
## set up some test data
datevar <- seq.Date(as.Date("1961-01-01"),as.Date("1963-12-31"),by="day")
test <- data.frame(date=datevar,id=1:(length(datevar)))
head(test)
## which looks like:
> head(test)
date id
1 1961-01-01 1
2 1961-01-02 2
3 1961-01-03 3
4 1961-01-04 4
5 1961-01-05 5
6 1961-01-06 6
## find the date ranges you want
selectdates <-
(format(test$date,"%m") == "04" & as.numeric(format(test$date,"%d")) >= 10) |
(format(test$date,"%m") == "05" & as.numeric(format(test$date,"%d")) <= 21)
## subset the original data
result <- test[selectdates,]
## which looks as expected:
> result
date id
100 1961-04-10 100
101 1961-04-11 101
102 1961-04-12 102
103 1961-04-13 103
104 1961-04-14 104
105 1961-04-15 105
106 1961-04-16 106
107 1961-04-17 107
108 1961-04-18 108
109 1961-04-19 109
110 1961-04-20 110
111 1961-04-21 111
112 1961-04-22 112
113 1961-04-23 113
114 1961-04-24 114
115 1961-04-25 115
116 1961-04-26 116
117 1961-04-27 117
118 1961-04-28 118
119 1961-04-29 119
120 1961-04-30 120
121 1961-05-01 121
122 1961-05-02 122
123 1961-05-03 123
124 1961-05-04 124
125 1961-05-05 125
126 1961-05-06 126
127 1961-05-07 127
128 1961-05-08 128
129 1961-05-09 129
130 1961-05-10 130
131 1961-05-11 131
132 1961-05-12 132
133 1961-05-13 133
134 1961-05-14 134
135 1961-05-15 135
136 1961-05-16 136
137 1961-05-17 137
138 1961-05-18 138
139 1961-05-19 139
140 1961-05-20 140
141 1961-05-21 141
465 1962-04-10 465
...
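A more compact variant of the same selection, offered as a sketch rather than as part of the original answer: format each date as a zero-padded "month-day" string and compare it against the two limits. This relies on the assumption that fixed-width strings such as "04-10" sort in calendar order, and it only works for windows that do not wrap around the new year.

```r
# Same test dates as above
datevar <- seq(as.Date("1961-01-01"), as.Date("1963-12-31"), by = "day")

# Zero-padded month-day strings, e.g. "04-10"
md <- format(datevar, "%m-%d")

# Keep every date from April 10 to May 21, whatever the year
keep <- md >= "04-10" & md <= "05-21"
result_md <- datevar[keep]

head(result_md)
length(result_md)
```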

Resources