I have a time series with the following data:
Provice Date Confirmed.cases virus fever Wuhan_Pneumonia temp wuhan sars
20 Anhui 02/09 779 30 0 0 10 25 0
21 Anhui 02/10 830 0 0 21 12 28 0
22 Anhui 02/11 860 43 0 21 12 0 0
23 Anhui 02/12 889 0 0 0 14 0 0
47 Chongqing 01/21 0 0 0 48 10 61 50
48 Chongqing 01/22 1 67 0 31 11 23 46
49 Chongqing 01/23 5 38 0 36 11 71 54
50 Chongqing 01/24 18 84 0 41 9 43 0
51 Chongqing 01/25 48 59 100 84 8 100 61
52 Chongqing 01/26 66 84 0 35 7 33 100
and would like to plot an overlapping time series plot using ggplot. However, when I do this the time series plot produced is distorted such as the one below. Can somebody please shed some light as to what I'm doing wrong?
Here is the code I am using to produce the plot below:
ggplot(dta3, aes(x = Date, y = Confirmed.cases, color = Provice, group = 1)) +
geom_line() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
I had suspicions, and Ronak Shah voiced the same.
Does this fix it?
ggplot(dta3, aes(x = Date, y = Confirmed.cases, color = Provice, group = Provice)) +
geom_line() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Related
I wish to plot some count data (likely as a bubble plot). I've some different experiments and for each experiment, I've three replicates. The output from the table() command is given below.
> with(myData.df, table(ChargeGroup,Expt,Repx))
, , Repx = 1
Expt
ChargeGroup Ctrl CV2 Gas n15 n30 n45 n60 p15 p30 v0
<+10 540 512 567 204 642 648 71 2 2 6
+10:+15 219 258 262 156 283 16 0 1 0 7
+15:+20 119 118 14 200 14 0 0 7 0 51
+20:+25 57 38 0 84 1 0 0 31 7 87
+25: 30 16 0 17 0 0 0 24 19 18
, , Repx = 2
Expt
ChargeGroup Ctrl CV2 Gas n15 n30 n45 n60 p15 p30 v0
<+10 529 522 582 201 642 626 77 1 2 5
+10:+15 232 249 264 150 273 14 0 1 0 5
+15:+20 116 113 18 204 13 0 0 12 0 41
+20:+25 53 46 0 82 0 0 0 36 6 94
+25: 28 12 0 26 0 0 0 33 21 28
, , Repx = 3
Expt
ChargeGroup Ctrl CV2 Gas n15 n30 n45 n60 p15 p30 v0
<+10 536 525 591 224 671 641 63 1 2 6
+10:+15 236 238 257 170 276 16 0 2 1 10
+15:+20 113 108 15 212 12 0 0 10 0 47
+20:+25 57 40 0 77 0 0 0 34 3 107
+25: 32 11 0 25 0 0 0 26 15 26
Can anyone help in to further process the output so that I can go directly for plotting in either base graphics or ggplot?
Thanks
There are couple of methods - with base R, by looping over the third dmension and plotting with barplot
par(mfrow = c(3, 1))
apply(with(myData.df, table(ChargeGroup,Expt,Repx)), 3, barplot)
-testing
par(mfrow = c(3, 1))
apply(with(mtcars, table(cyl, vs, gear)), 3, barplot)
Or convert to a single data.frame with as.data.frame and using ggplot or directly get the data.frame/tibble output with count
library(dplyr)
library(ggplot2)
myData.df %>%
count(ChargeGroup,Expt,Repx) %>%
ggplot(aes(x=ChargeGroup, y = n, fill = Expt)) +
geom_col() +
facet_wrap(~ Repx)
-testing
mtcars %>%
count(cyl = factor(cyl), vs = factor(vs), gear = factor(gear)) %>%
ggplot(aes(x = cyl, y = n, fill = vs)) +
geom_col() +
facet_wrap(~ gear)
I would like to create an interrupted time series plot with ggplot2 in R with three different times and 2 interventions.
I've done the plot with only 1 intervention and 2 times and this worked..
fas1 <- as.Date("2020-03-09")
fas2 <- as.Date("2020-04-04")
df$post1pre2 <- (df$data>= fas1 & df$data < fas2)
df$post2 <- (df$data >= fas2)
df$DateNum <- as.numeric(df$data)
df$DateNumCtr <- df$DateNum - as.numeric(fas1)
df$DateNumCtr1 <- df$DateNum - as.numeric(fas2)
My data:
> df
id N data fase Nri Nti Ncg PCT VPG NDg post1pre2
1 1 7 2020-02-24 0 0 0 0 0.0000000 NA 0 FALSE
2 2 28 2020-02-25 0 0 0 0 0.0000000 0.0000000 0 FALSE
3 3 49 2020-02-26 0 0 0 0 0.0000000 0.0000000 0 FALSE
4 4 70 2020-02-27 0 0 0 0 0.0000000 0.0000000 0 FALSE
5 5 91 2020-02-28 0 0 0 0 0.0000000 0.0000000 0 FALSE
6 6 112 2020-02-29 0 0 0 0 0.0000000 0.0000000 0 FALSE
7 7 133 2020-03-01 0 0 0 6 11.1111111 0.0000000 0 FALSE
8 8 154 2020-03-02 0 0 0 3 11.5384615 50.0000000 0 FALSE
9 9 175 2020-03-03 0 1 0 4 4.7058824 44.4444444 0 FALSE
10 10 196 2020-03-04 0 3 0 5 22.7272727 38.4615385 0 FALSE
11 11 217 2020-03-05 0 4 0 3 14.2857143 16.6666667 0 FALSE
12 12 238 2020-03-06 0 4 0 10 5.5555556 47.6190476 0 FALSE
13 13 259 2020-03-07 0 7 1 11 0.0000000 35.4838710 0 FALSE
14 14 280 2020-03-08 0 7 1 15 3.5714286 35.7142857 1 FALSE
15 15 301 2020-03-09 1 18 1 36 10.3746398 63.1578947 0 TRUE
16 16 322 2020-03-10 1 27 6 23 8.9147287 24.7311828 2 TRUE
17 17 343 2020-03-11 1 16 5 10 2.1231423 8.6206897 3 TRUE
18 18 364 2020-03-12 1 23 10 41 7.7212806 32.5396825 2 TRUE
19 19 385 2020-03-13 1 59 8 90 16.5137615 53.8922156 2 TRUE
20 20 406 2020-03-14 1 67 11 44 19.3832599 17.1206226 3 TRUE
21 21 427 2020-03-15 1 98 12 46 148.3870968 15.2823920 1 TRUE
22 22 448 2020-03-16 1 96 19 39 2.7008310 11.2391931 8 TRUE
23 23 469 2020-03-17 1 104 21 8 7.4766355 2.0725389 8 TRUE
24 24 490 2020-03-18 1 128 27 68 0.0000000 17.2588833 1 TRUE
25 25 511 2020-03-19 1 134 29 137 -15.1214128 29.6536797 5 TRUE
26 26 532 2020-03-20 1 143 32 57 6.2500000 9.5158598 2 TRUE
27 27 553 2020-03-21 1 152 37 134 13.5216953 20.4268293 4 TRUE
28 28 574 2020-03-22 1 163 47 84 10.4218362 10.6329114 5 TRUE
29 29 595 2020-03-23 1 185 41 56 8.4464555 6.4073227 7 TRUE
30 30 616 2020-03-24 1 195 49 62 5.6261343 6.6666667 10 TRUE
31 31 637 2020-03-25 1 200 52 147 15.1859504 14.8185484 6 TRUE
32 32 658 2020-03-26 1 212 54 84 6.8459658 7.3748903 2 TRUE
33 33 679 2020-03-27 1 222 57 94 8.3928571 7.6860180 4 TRUE
34 34 700 2020-03-28 1 223 59 119 13.4920635 9.0356872 11 TRUE
35 35 721 2020-03-29 1 236 60 44 6.5281899 3.0640669 11 TRUE
36 36 742 2020-03-30 1 229 60 21 3.4653465 1.4189189 9 TRUE
37 37 763 2020-03-31 1 215 60 92 10.2678571 6.1292472 6 TRUE
38 38 784 2020-04-01 1 206 60 92 8.4558824 5.7752668 9 TRUE
39 39 805 2020-04-02 1 199 60 114 10.0529100 6.7655786 7 TRUE
40 40 826 2020-04-03 1 201 61 80 2.7932961 4.4469150 7 TRUE
41 41 847 2020-04-04 1 183 50 107 9.3777388 5.6945184 9 FALSE
42 42 868 2020-04-05 1 185 50 62 11.7870722 3.1218530 9 FALSE
43 43 889 2020-04-06 1 189 46 55 8.8709677 2.6855469 4 FALSE
44 44 910 2020-04-07 1 185 44 50 3.9556962 2.3775559 6 FALSE
45 45 931 2020-04-08 1 162 41 65 5.1505547 3.0190432 5 FALSE
46 46 952 2020-04-09 1 167 37 81 5.7569296 3.6519387 2 FALSE
47 47 973 2020-04-10 1 167 33 50 2.6427061 2.1748586 8 FALSE
48 48 994 2020-04-11 1 172 28 44 2.7707809 1.8731375 6 FALSE
49 49 1015 2020-04-12 1 160 28 38 12.5412541 1.5879649 10 FALSE
50 50 1036 2020-04-13 1 159 30 51 5.3515215 2.0979021 7 FALSE
51 51 1057 2020-04-14 1 166 28 38 4.6172539 1.5310234 4 FALSE
52 52 1078 2020-04-15 1 163 24 24 1.1787819 0.9523810 6 FALSE
53 53 1099 2020-04-16 1 159 26 72 3.6622584 2.8301887 5 FALSE
54 54 1120 2020-04-17 1 148 23 59 1.3836773 2.2553517 3 FALSE
55 55 1141 2020-04-18 1 141 27 56 1.7162121 2.0934579 2 FALSE
56 56 1162 2020-04-19 1 140 25 14 1.0534236 0.5126327 3 FALSE
57 57 1183 2020-04-20 1 140 22 30 5.5658627 1.0928962 14 FALSE
58 58 1204 2020-04-21 1 139 21 17 0.5091345 0.6126126 2 FALSE
59 59 1225 2020-04-22 1 134 20 25 1.0552976 0.8954155 5 FALSE
60 60 1246 2020-04-23 1 138 18 41 2.0009761 1.4554491 10 FALSE
61 61 1267 2020-04-24 1 136 16 24 0.9287926 0.8397481 2 FALSE
62 62 1288 2020-04-25 1 122 15 21 0.7309433 0.7286607 5 FALSE
63 63 1309 2020-04-26 1 129 13 14 1.0719755 0.4822597 1 FALSE
64 64 1330 2020-04-27 1 130 13 60 4.7656871 2.0569078 7 FALSE
65 65 1351 2020-04-28 1 136 13 18 0.5605730 0.6046355 7 FALSE
66 66 1372 2020-04-29 1 134 12 15 0.5729565 0.5008347 7 FALSE
post2 DateNum DateNumCtr DateNumCtr1
1 FALSE 18316 -14 -40
2 FALSE 18317 -13 -39
3 FALSE 18318 -12 -38
4 FALSE 18319 -11 -37
5 FALSE 18320 -10 -36
6 FALSE 18321 -9 -35
7 FALSE 18322 -8 -34
8 FALSE 18323 -7 -33
9 FALSE 18324 -6 -32
10 FALSE 18325 -5 -31
11 FALSE 18326 -4 -30
12 FALSE 18327 -3 -29
13 FALSE 18328 -2 -28
14 FALSE 18329 -1 -27
15 FALSE 18330 0 -26
16 FALSE 18331 1 -25
17 FALSE 18332 2 -24
18 FALSE 18333 3 -23
19 FALSE 18334 4 -22
20 FALSE 18335 5 -21
21 FALSE 18336 6 -20
22 FALSE 18337 7 -19
23 FALSE 18338 8 -18
24 FALSE 18339 9 -17
25 FALSE 18340 10 -16
26 FALSE 18341 11 -15
27 FALSE 18342 12 -14
28 FALSE 18343 13 -13
29 FALSE 18344 14 -12
30 FALSE 18345 15 -11
31 FALSE 18346 16 -10
32 FALSE 18347 17 -9
33 FALSE 18348 18 -8
34 FALSE 18349 19 -7
35 FALSE 18350 20 -6
36 FALSE 18351 21 -5
37 FALSE 18352 22 -4
38 FALSE 18353 23 -3
39 FALSE 18354 24 -2
40 FALSE 18355 25 -1
41 TRUE 18356 26 0
42 TRUE 18357 27 1
43 TRUE 18358 28 2
44 TRUE 18359 29 3
45 TRUE 18360 30 4
46 TRUE 18361 31 5
47 TRUE 18362 32 6
48 TRUE 18363 33 7
49 TRUE 18364 34 8
50 TRUE 18365 35 9
51 TRUE 18366 36 10
52 TRUE 18367 37 11
53 TRUE 18368 38 12
54 TRUE 18369 39 13
55 TRUE 18370 40 14
56 TRUE 18371 41 15
57 TRUE 18372 42 16
58 TRUE 18373 43 17
59 TRUE 18374 44 18
60 TRUE 18375 45 19
61 TRUE 18376 46 20
62 TRUE 18377 47 21
63 TRUE 18378 48 22
64 TRUE 18379 49 23
65 TRUE 18380 50 24
66 TRUE 18381 51 25
glsFit1 <- gls(model = Ncg ~ DateNumCtr + post1pre2 + DateNumCtr:post1pre2,
data = df,
correlation = corAR1(0.25))
summary(glsFit1)
glsFit2 <- gls(model = Ncg ~ DateNumCtr1 + post2 + DateNumCtr1:post2,
data = df,
correlation = corAR1(0.25))
summary(glsFit2)
newdata <- data.frame(DateNumCtr = seq(min(df$DateNumCtr), max(df$DateNumCtr), by = 1))
newdata$post1pre2 <- (newdata$DateNumCtr >= 0)
newdata <- data.frame(DateNumCtr1 = seq(min(df$DateNumCtr1), max(df$DateNumCtr1), by = 1))
newdata$post2 <- (newdata$DateNumCtr1 >= 0)
newdata$Ncg <- predict(glsFit1, newdata = newdata)
newdata$Ncg1 <- predict(glsFit2, newdata = newdata)
ggplot(data = df, mapping = aes(x = DateNumCtr, y = Ncg)) +
geom_line(stat = "identity", position = "identity",size=1) +
geom_line(mapping = NULL, data = subset(newdata, DateNumCtr < 0),
stat = "identity", position = "identity",
color="red", size=1.5)+
geom_line(mapping = NULL, data = subset(newdata, DateNumCtr >= 0),
stat = "identity", position = "identity",
color="blue", size=1.5)+
geom_line(mapping = NULL, data = subset(newdata, DateNumCtr >= 0),
stat = "identity", position = "identity",
color="green", size=1.5)+
theme_bw() + theme(legend.key = element_blank())+ labs(y= "Ncg %", x = "Giorno")
I've tried also in this way (even if I don't have the stats in this way)
geom_line() +
geom_smooth(method="lm", se=FALSE, aes(colour=post1pre2)) +
theme_bw() +
labs(colour="")
And in this:
geom_line() +
geom_smooth(method="lm", se=FALSE, aes(colour=fase)) +
theme_bw() +
labs(colour="")
But No results...
With only 1 intervention and 2 times I obtain this:
with this code:
df$data<- as.Date(df$data,format="%d/%m/%y")
fas1 <- as.Date("2020-03-09")
df$postfase1 <- (df$data >= fas1)
df$DateNum <- as.numeric(df$data)
df$DateNumCtr <- df$DateNum - as.numeric(fas1)
ggplot(data = df, mapping = aes(x = data, y = Ncg)) + layer(geom = "line", stat = "identity", position = "identity") + theme_bw() + theme(legend.key = element_blank())
glsFit1 <- gls(model = Ncg ~ DateNumCtr + postfase1 + DateNumCtr:postfase1,
data = df,
correlation = corAR1(0.25))
summary(glsFit1)
newdata <- data.frame(DateNumCtr = seq(min(df$DateNumCtr), max(df$DateNumCtr), by = 1))
newdata$postfase1 <- (newdata$DateNumCtr >= 0)
newdata$Ncg <- predict(glsFit1, newdata = newdata)
ggplot(data = df, mapping = aes(x = DateNumCtr, y = Ncg)) +
geom_line(stat = "identity", position = "identity",size=1) +
geom_line(mapping = NULL, data = subset(newdata, DateNumCtr < 0),
stat = "identity", position = "identity",
color="red", size=1.5)+
geom_line(mapping = NULL, data = subset(newdata, DateNumCtr >= 0),
stat = "identity", position = "identity",
color="blue", size=1.5)+
theme_bw() + theme(legend.key = element_blank())+ labs(y= "Ncg %", x = "Giorno")
I would like something like this:
Thank you!
Maybe I'm not understanding correctly, but it seems the reason you are not getting your three lines is due to overplotting - in the code you posted for the red, blue, and green lines, the code for drawing the blue and green lines is identical. If you remove the blue line code, you should see red and green lines.
Nevertheless, it took me a while to see what you were doing, but I think I have a better method to suggest to you. Rather than using three separate calls to geom_smooth or geom_line to draw the three line colors, you should use the innate functions of ggplot to decide how to group the lines together. In this case, you can use the color= aesthetic, which also can control how the lines are connected and which datapoints belong to which group.
ggplot2 is part of the tidyverse, and follows what's called Tidy Data Principles. The information you are using to separate the lines is not in one column, but separated out over two columns: df$post1pre2 and df$post2. You should gather() these two columns into one column that has either the label that the points belong to the group "post1pre2", "post2" or... "middle" (nothing). You can do this using the gather() function from dplyr, or in this case, I just setup a nested ifelse() statement to create a new column called grp in your dataframe:
df$grp <- ifelse(df$post1pre2,'post1pre2',ifelse(df$post2,'post2','post1'))
Then, if you set that column to be assigned to the color= aesthetic, you will generate the plot you desire.
ggplot(df, aes(x=DateNumCtr, y=Ncg)) + theme_bw() +
geom_line(color='gray30') +
geom_smooth(method='lm', se=FALSE, aes(color=grp), size=1.5) +
scale_color_manual(values=list('post1'='red', 'post1pre2'='blue', 'post2'='green4'))
I have a csv file with three columns. The first column is pentad dates (73 pentads in a year) while the second and third columns are for precipitation values.
What I want to do:
[1]. Get the first pentad when the precipitation exceeds the "annual mean" in "at least three consecutive pentads".
I can subset the first column like this:
dat<-read.csv("test.csv",header=T,sep=",")
aa<-which(dat$RR>mean(dat$RR))
This gives me the following:
[1] 27 28 29 30 31 34 36 37 38 41 42 43 44 45 46 52 53 54 55 56 57
The correct output should be P27 in this case.
In the second column:
[1] 31 32 36 38 39 40 41 42 43 44 45 46 47 48 49 50 53 54 55 57 59 60 61
The correct output should be P38.
How can I add a conditional statement here taking into consideration the "three consecutive pentads"?
I don't know how I can implement this in R (in a code). I'll appreciate any suggestion.
I have the following data:
Pentad RR YY
1 0 0.5771428571
2 0.0142857143 0
3 0 1.2828571429
4 0.0885714286 1.4457142857
5 0.0714285714 0.1114285714
6 0 0.36
7 0.0657142857 0
8 0.0285714286 0
9 0.0942857143 0
10 0.0114285714 1
11 0 0.0114285714
12 0 0.0085714286
13 0 0.3057142857
14 0 0
15 0 0
16 0 0
17 0.04 0
18 0 0.8
19 0.8142857143 0.0628571429
20 0.2857142857 0
21 1.14 0
22 5.3342857143 0
23 2.3514285714 0
24 1.9857142857 0.0133333333
25 1.4942857143 0.0433333333
26 2.0057142857 1.4866666667
27 20.0485714286 0
28 25.0085714286 2.4866666667
29 16.32 1.9433333333
30 11.0685714286 0.7733333333
31 8.9657142857 8.1066666667
32 3.9857142857 7.7333333333
33 5.2028571429 0.5
34 7.8028571429 4.3566666667
35 4.4514285714 2.66
36 9.22 6.6266666667
37 32.0485714286 4.4042857143
38 19.5057142857 7.9771428571
39 3.1485714286 12.9428571429
40 2.4342857143 18.4942857143
41 9.0571428571 7.3571428571
42 28.7085714286 11.0828571429
43 34.1514285714 9.0342857143
44 33.0257142857 14.2914285714
45 46.5057142857 34.6142857143
46 70.6171428571 45.3028571429
47 3.1685714286 6.66
48 1.9285714286 6.7028571429
49 7.0314285714 5.9628571429
50 0.9028571429 14.8542857143
51 5.3771428571 2.1
52 11.3571428571 2.8371428571
53 15.0457142857 7.3914285714
54 11.6628571429 32.0371428571
55 21.24 9.0057142857
56 11.4371428571 3.5257142857
57 11.6942857143 12.32
58 2.9771428571 2.32
59 4.3371428571 7.9942857143
60 0.8714285714 6.5657142857
61 1.3914285714 4.7714285714
62 0.8714285714 2.3542857143
63 1.1457142857 0.0057142857
64 2.3171428571 2.5085714286
65 0.1828571429 0.8171428571
66 0.2828571429 2.8857142857
67 0.3485714286 0.8971428571
68 0 0
69 0.3457142857 0
70 0.1428571429 0
71 0.18 0
72 4.8942857143 0.1457142857
73 0.0371428571 0.4342857143
Something like this should do it:
first_exceed_seq <- function(x, thresh = mean(x), len = 3)
{
# Logical vector, does x exceed the threshold
exceed_thresh <- x > thresh
# Indices of transition points; where exceed_thresh[i - 1] != exceed_thresh[i]
transition <- which(diff(c(0, exceed_thresh)) != 0)
# Reference index, grouping observations after each transition
index <- vector("numeric", length(x))
index[transition] <- 1
index <- cumsum(index)
# Break x into groups following the transitions
exceed_list <- split(exceed_thresh, index)
# Get the number of values exceeded in each index period
num_exceed <- vapply(exceed_list, sum, numeric(1))
# Get the starting index of the first sequence where more then len exceed thresh
transition[as.numeric(names(which(num_exceed >= len))[1])]
}
first_exceed_seq(dat$RR)
first_exceed_seq(dat$YY)
I have a load of genomic data as follows:
chr leftPos Sample1 AnotherSample EtcSample
1 4324 434 43 33
1 5353 63 34 532
1 6632 543 3544 23
2 1443 25 345 543
2 7644 74 26 324
2 8886 23 9 23
3 1287 643 45 23
3 5443 93 23 77
3 7668 33 45 33
I would like to create a heatmap organised by chromosome with sample along the x-axis and leftPos along the Y axis. I think this would look good in a facet_wrap image (organised by chromosome) but this means I have to use heatmaps in ggplots and I understand this isn't a thing so I have to use geom_tiles().
So I tried googling all over the place but I'm stuck with how to firstly do a heatmap per chromosome and secondly do tiles per sample. All the examples seem to just use two columns.
df <- data.frame(chr=c(1,1,1,2,2,2,3,3,3),
leftPos=c(4324, 5353, 6632, 1443, 7644, 8886, 1287, 5443, 7668),
Sample1=c(434,63,543,25,74,23,643,93,33),
AnotherSample=c(43,34,3544,345,26,9,45,23,45),
EtcSample=c(33,532,23,543,324,23,23,77,33))
Reshape your data in a long format.
df.l <- reshape(df,
varying = c("Sample1", "AnotherSample", "EtcSample"),
idvar="chr",
v.names = "value",
timevar = "sample",
times=c("Sample1", "AnotherSample", "EtcSample"),
new.row.names=c(1:(3*nrow(df))),
direction = "long")
> df.l
chr leftPos sample value
1 1 4324 Sample1 434
2 1 5353 Sample1 63
3 1 6632 Sample1 543
4 2 1443 Sample1 25
5 2 7644 Sample1 74
...
12 1 6632 AnotherSample 3544
13 2 1443 AnotherSample 345
14 2 7644 AnotherSample 26
15 2 8886 AnotherSample 9
16 3 1287 AnotherSample 45
...
23 2 7644 EtcSample 324
24 2 8886 EtcSample 23
25 3 1287 EtcSample 23
26 3 5443 EtcSample 77
27 3 7668 EtcSample 33
For representation purpose based on your data, I converted leftPos into factor.
library(ggplot2)
df.l$leftPos <- factor(df.l$leftPos)
ggplot(df.l, aes(sample, leftPos)) + geom_tile(aes(fill = value)) +
scale_fill_gradient(low = "white", high = "red") + facet_wrap(~chr)+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
My question is similar to Fill region between two loess-smoothed lines in R with ggplot1
But I have two groups.
g1<-ggplot(NVIQ_predict,aes(cogn.age, predict, color=as.factor(NVIQ_predict$group)))+
geom_smooth(aes(x = cogn.age, y = upper,group=group),se=F)+
geom_line(aes(linetype = group), size = 0.8)+
geom_smooth(aes(x = cogn.age, y = lower,group=group),se=F)
I want to fill red and blue for each group.
I tried:
gg1 <- ggplot_build(g1)
df2 <- data.frame(x = gg1$data[[1]]$x,
ymin = gg1$data[[1]]$y,
ymax = gg1$data[[3]]$y)
g1 + geom_ribbon(data = df2, aes(x = x, ymin = ymin, ymax = ymax),fill = "grey", alpha = 0.4)
But it gave me the error: Aesthetics must either be length one, or the same length as the dataProblems
I get the same error every time my geom_ribbon() data and ggplot() data differ.
Can somebody help me with it? Thank you so much!
My data looks like:
> NVIQ_predict
cogn.age predict upper lower group
1 7 39.04942 86.68497 18.00000 1
2 8 38.34993 82.29627 18.00000 1
3 10 37.05174 74.31657 18.00000 1
4 11 36.45297 70.72421 18.00000 1
5 12 35.88770 67.39555 18.00000 1
6 13 35.35587 64.32920 18.00000 1
7 14 34.85738 61.52322 18.00000 1
8 16 33.95991 56.68024 18.00000 1
9 17 33.56057 54.63537 18.00000 1
10 18 33.19388 52.83504 18.00000 1
11 19 32.85958 51.27380 18.00000 1
12 20 32.55752 49.94791 18.00000 1
13 21 32.28766 48.85631 18.00000 1
14 24 31.67593 47.09206 18.00000 1
15 25 31.53239 46.91136 18.00000 1
16 28 31.28740 48.01764 18.00000 1
17 32 31.36627 50.55201 18.00000 1
18 35 31.73386 53.19630 18.00000 1
19 36 31.91487 54.22624 18.00000 1
20 37 32.13026 55.25721 18.00000 1
21 38 32.38237 56.26713 18.00000 1
22 40 32.98499 58.36229 18.00000 1
23 44 34.59044 62.80187 18.00000 1
24 45 35.06804 64.01951 18.00000 1
25 46 35.57110 65.31888 18.00000 1
26 47 36.09880 66.64696 17.93800 1
27 48 36.72294 67.60053 17.97550 1
28 49 37.39182 68.49995 18.03062 1
29 50 38.10376 69.35728 18.10675 1
30 51 38.85760 70.17693 18.18661 1
31 52 39.65347 70.95875 18.27524 1
32 53 40.49156 71.70261 18.38020 1
33 54 41.35332 72.44006 17.90682 1
34 59 46.37849 74.91802 18.63206 1
35 60 47.53897 75.66218 19.64432 1
36 61 48.74697 76.43933 20.82346 1
37 63 51.30607 78.02426 23.73535 1
38 71 63.43129 86.05467 40.43482 1
39 72 65.15618 87.44794 42.72704 1
40 73 66.92714 88.95324 45.01966 1
41 84 89.42079 114.27939 68.03834 1
42 85 91.73831 117.44007 69.83676 1
43 7 33.69504 54.03695 15.74588 2
44 8 34.99931 53.96500 18.00533 2
45 10 37.61963 54.05684 22.43516 2
46 11 38.93493 54.21969 24.60049 2
47 12 40.25315 54.45963 26.73027 2
48 13 41.57397 54.77581 28.82348 2
49 14 42.89710 55.16727 30.87982 2
50 16 45.54954 56.17193 34.88453 2
51 17 46.87877 56.78325 36.83632 2
52 18 48.21025 57.46656 38.75807 2
53 19 49.54461 58.22266 40.65330 2
54 20 50.88313 59.05509 42.52505 2
55 21 52.22789 59.97318 44.36944 2
56 24 56.24397 63.21832 49.26963 2
57 25 57.55394 64.33850 50.76938 2
58 28 61.45282 68.05043 54.85522 2
59 32 66.44875 72.85234 60.04517 2
60 35 69.96560 76.06171 63.86949 2
61 36 71.09268 77.06821 65.11714 2
62 37 72.19743 78.04559 66.34927 2
63 38 73.28041 78.99518 67.56565 2
64 40 75.37861 80.81593 69.94129 2
65 44 79.29028 84.20275 74.37780 2
66 45 80.20272 85.00888 75.39656 2
67 46 81.08645 85.80180 76.37110 2
68 47 81.93696 86.57689 77.29704 2
69 48 82.75920 87.34100 78.17739 2
70 49 83.55055 88.09165 79.00945 2
71 50 84.30962 88.82357 79.79567 2
72 51 85.03743 89.53669 80.53817 2
73 52 85.73757 90.23223 81.24291 2
74 53 86.41419 90.91607 81.91232 2
75 54 87.05716 91.58632 82.52800 2
76 59 89.75923 94.58218 84.93629 2
77 60 90.18557 95.05573 85.31541 2
78 61 90.58166 95.51469 85.64864 2
79 63 91.27115 96.31107 86.23124 2
80 71 92.40983 98.35031 86.46934 2
81 72 92.36362 98.52258 86.20465 2
82 73 92.27734 98.67161 85.88308 2
83 84 88.66150 98.84699 78.47602 2
84 85 88.08846 98.73625 77.44067 2
According to Gregor, I tried inherit.aes = FALSE, the error is gone. But my plot looks like:
We've got all the info we need. Now we just need to, ahem, connect the dots ;-)
First the input data:
NVIQ_predict <- read.table(text = "
id cogn.age predict upper lower group
1 7 39.04942 86.68497 18.00000 1
2 8 38.34993 82.29627 18.00000 1
3 10 37.05174 74.31657 18.00000 1
4 11 36.45297 70.72421 18.00000 1
5 12 35.88770 67.39555 18.00000 1
6 13 35.35587 64.32920 18.00000 1
7 14 34.85738 61.52322 18.00000 1
8 16 33.95991 56.68024 18.00000 1
9 17 33.56057 54.63537 18.00000 1
10 18 33.19388 52.83504 18.00000 1
11 19 32.85958 51.27380 18.00000 1
12 20 32.55752 49.94791 18.00000 1
13 21 32.28766 48.85631 18.00000 1
14 24 31.67593 47.09206 18.00000 1
15 25 31.53239 46.91136 18.00000 1
16 28 31.28740 48.01764 18.00000 1
17 32 31.36627 50.55201 18.00000 1
18 35 31.73386 53.19630 18.00000 1
19 36 31.91487 54.22624 18.00000 1
20 37 32.13026 55.25721 18.00000 1
21 38 32.38237 56.26713 18.00000 1
22 40 32.98499 58.36229 18.00000 1
23 44 34.59044 62.80187 18.00000 1
24 45 35.06804 64.01951 18.00000 1
25 46 35.57110 65.31888 18.00000 1
26 47 36.09880 66.64696 17.93800 1
27 48 36.72294 67.60053 17.97550 1
28 49 37.39182 68.49995 18.03062 1
29 50 38.10376 69.35728 18.10675 1
30 51 38.85760 70.17693 18.18661 1
31 52 39.65347 70.95875 18.27524 1
32 53 40.49156 71.70261 18.38020 1
33 54 41.35332 72.44006 17.90682 1
34 59 46.37849 74.91802 18.63206 1
35 60 47.53897 75.66218 19.64432 1
36 61 48.74697 76.43933 20.82346 1
37 63 51.30607 78.02426 23.73535 1
38 71 63.43129 86.05467 40.43482 1
39 72 65.15618 87.44794 42.72704 1
40 73 66.92714 88.95324 45.01966 1
41 84 89.42079 114.27939 68.03834 1
42 85 91.73831 117.44007 69.83676 1
43 7 33.69504 54.03695 15.74588 2
44 8 34.99931 53.96500 18.00533 2
45 10 37.61963 54.05684 22.43516 2
46 11 38.93493 54.21969 24.60049 2
47 12 40.25315 54.45963 26.73027 2
48 13 41.57397 54.77581 28.82348 2
49 14 42.89710 55.16727 30.87982 2
50 16 45.54954 56.17193 34.88453 2
51 17 46.87877 56.78325 36.83632 2
52 18 48.21025 57.46656 38.75807 2
53 19 49.54461 58.22266 40.65330 2
54 20 50.88313 59.05509 42.52505 2
55 21 52.22789 59.97318 44.36944 2
56 24 56.24397 63.21832 49.26963 2
57 25 57.55394 64.33850 50.76938 2
58 28 61.45282 68.05043 54.85522 2
59 32 66.44875 72.85234 60.04517 2
60 35 69.96560 76.06171 63.86949 2
61 36 71.09268 77.06821 65.11714 2
62 37 72.19743 78.04559 66.34927 2
63 38 73.28041 78.99518 67.56565 2
64 40 75.37861 80.81593 69.94129 2
65 44 79.29028 84.20275 74.37780 2
66 45 80.20272 85.00888 75.39656 2
67 46 81.08645 85.80180 76.37110 2
68 47 81.93696 86.57689 77.29704 2
69 48 82.75920 87.34100 78.17739 2
70 49 83.55055 88.09165 79.00945 2
71 50 84.30962 88.82357 79.79567 2
72 51 85.03743 89.53669 80.53817 2
73 52 85.73757 90.23223 81.24291 2
74 53 86.41419 90.91607 81.91232 2
75 54 87.05716 91.58632 82.52800 2
76 59 89.75923 94.58218 84.93629 2
77 60 90.18557 95.05573 85.31541 2
78 61 90.58166 95.51469 85.64864 2
79 63 91.27115 96.31107 86.23124 2
80 71 92.40983 98.35031 86.46934 2
81 72 92.36362 98.52258 86.20465 2
82 73 92.27734 98.67161 85.88308 2
83 84 88.66150 98.84699 78.47602 2
84 85 88.08846 98.73625 77.44067 2", header = TRUE)
NVIQ_predict$id <- NULL
Make sure the group column is a factor variable, so we can use it as a line type.
NVIQ_predict$group <- as.factor(NVIQ_predict$group)
Then build the plot.
library(ggplot2)
g1 <- ggplot(NVIQ_predict, aes(cogn.age, predict, color=group)) +
geom_smooth(aes(x = cogn.age, y = upper, group=group), method = loess, se = FALSE) +
geom_smooth(aes(x = cogn.age, y = lower, group=group), method = loess, se = FALSE) +
geom_line(aes(linetype = group), size = 0.8)
Finally, extract the (x,ymin) and (x,ymax) coordinates of the curves for group 1 as well as group 2. These pairs have identical x-coordinates, so connecting those points mimics shading the areas between both curves. This was explained in Fill region between two loess-smoothed lines in R with ggplot. The only difference here is that we need to be a bit more careful to select and connect the points that belong to the correct curves...
gp <- ggplot_build(g1)
d1 <- gp$data[[1]]
d2 <- gp$data[[2]]
df1 <- data.frame(x = d1[d1$group == 1,]$x,
ymin = d2[d2$group == 1,]$y,
ymax = d1[d1$group == 1,]$y)
df2 <- data.frame(x = d1[d1$group == 2,]$x,
ymin = d2[d2$group == 2,]$y,
ymax = d1[d1$group == 2,]$y)
g1 + geom_ribbon(data = df1, aes(x = x, ymin = ymin, ymax = ymax), inherit.aes = FALSE, fill = "grey", alpha = 0.4) +
geom_ribbon(data = df2, aes(x = x, ymin = ymin, ymax = ymax), inherit.aes = FALSE, fill = "grey", alpha = 0.4)
The result looks like this: