geom_ribbon error: Aesthetics must either be length one - r

My question is similar to Fill region between two loess-smoothed lines in R with ggplot1
But I have two groups.
g1<-ggplot(NVIQ_predict,aes(cogn.age, predict, color=as.factor(NVIQ_predict$group)))+
geom_smooth(aes(x = cogn.age, y = upper,group=group),se=F)+
geom_line(aes(linetype = group), size = 0.8)+
geom_smooth(aes(x = cogn.age, y = lower,group=group),se=F)
I want to fill red and blue for each group.
I tried:
gg1 <- ggplot_build(g1)
df2 <- data.frame(x = gg1$data[[1]]$x,
ymin = gg1$data[[1]]$y,
ymax = gg1$data[[3]]$y)
g1 + geom_ribbon(data = df2, aes(x = x, ymin = ymin, ymax = ymax),fill = "grey", alpha = 0.4)
But it gave me the error: Aesthetics must either be length one, or the same length as the dataProblems
I get the same error every time my geom_ribbon() data and ggplot() data differ.
Can somebody help me with it? Thank you so much!
My data looks like:
> NVIQ_predict
cogn.age predict upper lower group
1 7 39.04942 86.68497 18.00000 1
2 8 38.34993 82.29627 18.00000 1
3 10 37.05174 74.31657 18.00000 1
4 11 36.45297 70.72421 18.00000 1
5 12 35.88770 67.39555 18.00000 1
6 13 35.35587 64.32920 18.00000 1
7 14 34.85738 61.52322 18.00000 1
8 16 33.95991 56.68024 18.00000 1
9 17 33.56057 54.63537 18.00000 1
10 18 33.19388 52.83504 18.00000 1
11 19 32.85958 51.27380 18.00000 1
12 20 32.55752 49.94791 18.00000 1
13 21 32.28766 48.85631 18.00000 1
14 24 31.67593 47.09206 18.00000 1
15 25 31.53239 46.91136 18.00000 1
16 28 31.28740 48.01764 18.00000 1
17 32 31.36627 50.55201 18.00000 1
18 35 31.73386 53.19630 18.00000 1
19 36 31.91487 54.22624 18.00000 1
20 37 32.13026 55.25721 18.00000 1
21 38 32.38237 56.26713 18.00000 1
22 40 32.98499 58.36229 18.00000 1
23 44 34.59044 62.80187 18.00000 1
24 45 35.06804 64.01951 18.00000 1
25 46 35.57110 65.31888 18.00000 1
26 47 36.09880 66.64696 17.93800 1
27 48 36.72294 67.60053 17.97550 1
28 49 37.39182 68.49995 18.03062 1
29 50 38.10376 69.35728 18.10675 1
30 51 38.85760 70.17693 18.18661 1
31 52 39.65347 70.95875 18.27524 1
32 53 40.49156 71.70261 18.38020 1
33 54 41.35332 72.44006 17.90682 1
34 59 46.37849 74.91802 18.63206 1
35 60 47.53897 75.66218 19.64432 1
36 61 48.74697 76.43933 20.82346 1
37 63 51.30607 78.02426 23.73535 1
38 71 63.43129 86.05467 40.43482 1
39 72 65.15618 87.44794 42.72704 1
40 73 66.92714 88.95324 45.01966 1
41 84 89.42079 114.27939 68.03834 1
42 85 91.73831 117.44007 69.83676 1
43 7 33.69504 54.03695 15.74588 2
44 8 34.99931 53.96500 18.00533 2
45 10 37.61963 54.05684 22.43516 2
46 11 38.93493 54.21969 24.60049 2
47 12 40.25315 54.45963 26.73027 2
48 13 41.57397 54.77581 28.82348 2
49 14 42.89710 55.16727 30.87982 2
50 16 45.54954 56.17193 34.88453 2
51 17 46.87877 56.78325 36.83632 2
52 18 48.21025 57.46656 38.75807 2
53 19 49.54461 58.22266 40.65330 2
54 20 50.88313 59.05509 42.52505 2
55 21 52.22789 59.97318 44.36944 2
56 24 56.24397 63.21832 49.26963 2
57 25 57.55394 64.33850 50.76938 2
58 28 61.45282 68.05043 54.85522 2
59 32 66.44875 72.85234 60.04517 2
60 35 69.96560 76.06171 63.86949 2
61 36 71.09268 77.06821 65.11714 2
62 37 72.19743 78.04559 66.34927 2
63 38 73.28041 78.99518 67.56565 2
64 40 75.37861 80.81593 69.94129 2
65 44 79.29028 84.20275 74.37780 2
66 45 80.20272 85.00888 75.39656 2
67 46 81.08645 85.80180 76.37110 2
68 47 81.93696 86.57689 77.29704 2
69 48 82.75920 87.34100 78.17739 2
70 49 83.55055 88.09165 79.00945 2
71 50 84.30962 88.82357 79.79567 2
72 51 85.03743 89.53669 80.53817 2
73 52 85.73757 90.23223 81.24291 2
74 53 86.41419 90.91607 81.91232 2
75 54 87.05716 91.58632 82.52800 2
76 59 89.75923 94.58218 84.93629 2
77 60 90.18557 95.05573 85.31541 2
78 61 90.58166 95.51469 85.64864 2
79 63 91.27115 96.31107 86.23124 2
80 71 92.40983 98.35031 86.46934 2
81 72 92.36362 98.52258 86.20465 2
82 73 92.27734 98.67161 85.88308 2
83 84 88.66150 98.84699 78.47602 2
84 85 88.08846 98.73625 77.44067 2
According to Gregor, I tried inherit.aes = FALSE, the error is gone. But my plot looks like:

We've got all the info we need. Now we just need to, ahem, connect the dots ;-)
First the input data:
NVIQ_predict <- read.table(text = "
id cogn.age predict upper lower group
1 7 39.04942 86.68497 18.00000 1
2 8 38.34993 82.29627 18.00000 1
3 10 37.05174 74.31657 18.00000 1
4 11 36.45297 70.72421 18.00000 1
5 12 35.88770 67.39555 18.00000 1
6 13 35.35587 64.32920 18.00000 1
7 14 34.85738 61.52322 18.00000 1
8 16 33.95991 56.68024 18.00000 1
9 17 33.56057 54.63537 18.00000 1
10 18 33.19388 52.83504 18.00000 1
11 19 32.85958 51.27380 18.00000 1
12 20 32.55752 49.94791 18.00000 1
13 21 32.28766 48.85631 18.00000 1
14 24 31.67593 47.09206 18.00000 1
15 25 31.53239 46.91136 18.00000 1
16 28 31.28740 48.01764 18.00000 1
17 32 31.36627 50.55201 18.00000 1
18 35 31.73386 53.19630 18.00000 1
19 36 31.91487 54.22624 18.00000 1
20 37 32.13026 55.25721 18.00000 1
21 38 32.38237 56.26713 18.00000 1
22 40 32.98499 58.36229 18.00000 1
23 44 34.59044 62.80187 18.00000 1
24 45 35.06804 64.01951 18.00000 1
25 46 35.57110 65.31888 18.00000 1
26 47 36.09880 66.64696 17.93800 1
27 48 36.72294 67.60053 17.97550 1
28 49 37.39182 68.49995 18.03062 1
29 50 38.10376 69.35728 18.10675 1
30 51 38.85760 70.17693 18.18661 1
31 52 39.65347 70.95875 18.27524 1
32 53 40.49156 71.70261 18.38020 1
33 54 41.35332 72.44006 17.90682 1
34 59 46.37849 74.91802 18.63206 1
35 60 47.53897 75.66218 19.64432 1
36 61 48.74697 76.43933 20.82346 1
37 63 51.30607 78.02426 23.73535 1
38 71 63.43129 86.05467 40.43482 1
39 72 65.15618 87.44794 42.72704 1
40 73 66.92714 88.95324 45.01966 1
41 84 89.42079 114.27939 68.03834 1
42 85 91.73831 117.44007 69.83676 1
43 7 33.69504 54.03695 15.74588 2
44 8 34.99931 53.96500 18.00533 2
45 10 37.61963 54.05684 22.43516 2
46 11 38.93493 54.21969 24.60049 2
47 12 40.25315 54.45963 26.73027 2
48 13 41.57397 54.77581 28.82348 2
49 14 42.89710 55.16727 30.87982 2
50 16 45.54954 56.17193 34.88453 2
51 17 46.87877 56.78325 36.83632 2
52 18 48.21025 57.46656 38.75807 2
53 19 49.54461 58.22266 40.65330 2
54 20 50.88313 59.05509 42.52505 2
55 21 52.22789 59.97318 44.36944 2
56 24 56.24397 63.21832 49.26963 2
57 25 57.55394 64.33850 50.76938 2
58 28 61.45282 68.05043 54.85522 2
59 32 66.44875 72.85234 60.04517 2
60 35 69.96560 76.06171 63.86949 2
61 36 71.09268 77.06821 65.11714 2
62 37 72.19743 78.04559 66.34927 2
63 38 73.28041 78.99518 67.56565 2
64 40 75.37861 80.81593 69.94129 2
65 44 79.29028 84.20275 74.37780 2
66 45 80.20272 85.00888 75.39656 2
67 46 81.08645 85.80180 76.37110 2
68 47 81.93696 86.57689 77.29704 2
69 48 82.75920 87.34100 78.17739 2
70 49 83.55055 88.09165 79.00945 2
71 50 84.30962 88.82357 79.79567 2
72 51 85.03743 89.53669 80.53817 2
73 52 85.73757 90.23223 81.24291 2
74 53 86.41419 90.91607 81.91232 2
75 54 87.05716 91.58632 82.52800 2
76 59 89.75923 94.58218 84.93629 2
77 60 90.18557 95.05573 85.31541 2
78 61 90.58166 95.51469 85.64864 2
79 63 91.27115 96.31107 86.23124 2
80 71 92.40983 98.35031 86.46934 2
81 72 92.36362 98.52258 86.20465 2
82 73 92.27734 98.67161 85.88308 2
83 84 88.66150 98.84699 78.47602 2
84 85 88.08846 98.73625 77.44067 2", header = TRUE)
NVIQ_predict$id <- NULL
Make sure the group column is a factor variable, so we can use it as a line type.
NVIQ_predict$group <- as.factor(NVIQ_predict$group)
Then build the plot.
library(ggplot2)
g1 <- ggplot(NVIQ_predict, aes(cogn.age, predict, color=group)) +
geom_smooth(aes(x = cogn.age, y = upper, group=group), method = loess, se = FALSE) +
geom_smooth(aes(x = cogn.age, y = lower, group=group), method = loess, se = FALSE) +
geom_line(aes(linetype = group), size = 0.8)
Finally, extract the (x,ymin) and (x,ymax) coordinates of the curves for group 1 as well as group 2. These pairs have identical x-coordinates, so connecting those points mimics shading the areas between both curves. This was explained in Fill region between two loess-smoothed lines in R with ggplot. The only difference here is that we need to be a bit more careful to select and connect the points that belong to the correct curves...
gp <- ggplot_build(g1)
d1 <- gp$data[[1]]
d2 <- gp$data[[2]]
df1 <- data.frame(x = d1[d1$group == 1,]$x,
ymin = d2[d2$group == 1,]$y,
ymax = d1[d1$group == 1,]$y)
df2 <- data.frame(x = d1[d1$group == 2,]$x,
ymin = d2[d2$group == 2,]$y,
ymax = d1[d1$group == 2,]$y)
g1 + geom_ribbon(data = df1, aes(x = x, ymin = ymin, ymax = ymax), inherit.aes = FALSE, fill = "grey", alpha = 0.4) +
geom_ribbon(data = df2, aes(x = x, ymin = ymin, ymax = ymax), inherit.aes = FALSE, fill = "grey", alpha = 0.4)
The result looks like this:

Related

Lookup table based on multiple conditions in R

Thank you for taking a look at my question!
I have the following (dummy) data for patient performance on 3 tasks:
patient_df = data.frame(id = seq(1:5),
age = c(30,72,46,63,58),
education = c(11, 22, 18, 12, 14),
task1 = c(21, 28, 20, 24, 22),
task2 = c(15, 15, 10, 11, 14),
task3 = c(82, 60, 74, 78, 78))
> patient_df
id age education task1 task2 task3
1 1 30 11 21 15 82
2 2 72 22 28 15 60
3 3 46 18 20 10 74
4 4 63 12 24 11 78
5 5 58 14 22 14 78
I also have the following (dummy) lookup table for age and education-based cutoff values to define a patient's performance as impaired or not impaired on each task:
cutoffs = data.frame(age = rep(seq(from = 35, to = 70, by = 5), 2),
education = c(rep("<16", 8), rep(">=16",8)),
task1_cutoff = c(rep(24, 16)),
task2_cutoff = c(11,11,11,11,10,10,10,10,9,13,13,13,13,12,12,11),
task3_cutoff = c(rep(71,8), 70, rep(74,2), rep(73, 5)))
> cutoffs
age education task1_cutoff task2_cutoff task3_cutoff
1 35 <16 24 11 71
2 40 <16 24 11 71
3 45 <16 24 11 71
4 50 <16 24 11 71
5 55 <16 24 10 71
6 60 <16 24 10 71
7 65 <16 24 10 71
8 70 <16 24 10 71
9 35 >=16 24 9 70
10 40 >=16 24 13 74
11 45 >=16 24 13 74
12 50 >=16 24 13 73
13 55 >=16 24 13 73
14 60 >=16 24 12 73
15 65 >=16 24 12 73
16 70 >=16 24 11 73
My goal is to create 3 new variables in patient_df that indicate whether or not a patient is impaired on each task with a binary indicator. For example, for id=1 in patient_df, their age is <=35 and their education is <16 years, so the cutoff value for task1 would be 24, the cutoff value for task2 would be 11, and the cutoff value for task3 would be 71, such that scores below these values would denote impairment.
I would like to do this for each id by referencing the age and education-associated cutoff value in the cutoff dataset, so that the outcome would look something like this:
> goal_patient_df
id age education task1 task2 task3 task1_impaired task2_impaired task3_impaired
1 1 30 11 21 15 82 1 1 0
2 2 72 22 28 15 60 0 0 1
3 3 46 18 20 10 74 1 1 0
4 4 63 12 24 11 78 1 0 0
5 5 58 14 22 14 78 1 0 0
In actuality, my patient_df has 600+ patients and there are 7+ tasks each with age- and education-associated cutoff values, so a 'clean' way of doing this would be greatly appreciated! My only alternative that I can think of right now is writing a TON of if_else statements or case_whens which would not be incredibly reproducible for anyone else who would use my code :(
Thank you in advance!
I would recommend putting both your lookup table and patient_df dataframe in long form. I think that might be easier to manage with multiple tasks.
Your education column is numeric; so converting to character "<16" or ">=16" will help with matching in lookup table.
Using fuzzy_inner_join will match data with lookup table where task and education match exactly == but age will between an age_low and age_high if you specify a range of ages for each lookup table row.
Finally, impaired is calculated comparing the values from the two data frames for the particular task.
Please note for output, id of 1 is missing, as falls outside of age range from lookup table. You can add more rows to that table to address this.
library(tidyverse)
library(fuzzyjoin)
cutoffs_long <- cutoffs %>%
pivot_longer(cols = starts_with("task"), names_to = "task", values_to = "cutoff_value", names_pattern = "task(\\d+)") %>%
mutate(age_low = age,
age_high = age + 4) %>%
select(-age)
patient_df %>%
pivot_longer(cols = starts_with("task"), names_to = "task", values_to = "patient_value", names_pattern = "(\\d+)") %>%
mutate(education = ifelse(education < 16, "<16", ">=16")) %>%
fuzzy_inner_join(cutoffs_long, by = c("age" = "age_low", "age" = "age_high", "education", "task"), match_fun = list(`>=`, `<=`, `==`, `==`)) %>%
mutate(impaired = +(patient_value < cutoff_value))
Output
# A tibble: 12 x 11
id age education.x task.x patient_value education.y task.y cutoff_value age_low age_high impaired
<int> <dbl> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <int>
1 2 72 >=16 1 28 >=16 1 24 70 74 0
2 2 72 >=16 2 15 >=16 2 11 70 74 0
3 2 72 >=16 3 60 >=16 3 73 70 74 1
4 3 46 >=16 1 20 >=16 1 24 45 49 1
5 3 46 >=16 2 10 >=16 2 13 45 49 1
6 3 46 >=16 3 74 >=16 3 74 45 49 0
7 4 63 <16 1 24 <16 1 24 60 64 0
8 4 63 <16 2 11 <16 2 10 60 64 0
9 4 63 <16 3 78 <16 3 71 60 64 0
10 5 58 <16 1 22 <16 1 24 55 59 1
11 5 58 <16 2 14 <16 2 10 55 59 0
12 5 58 <16 3 78 <16 3 71 55 59 0

Interrupted time series, three levels ggplot2

I would like to create an interrupted time series plot with ggplot2 in R with three different times and 2 interventions.
I've done the plot with only 1 intervention and 2 times and this worked..
fas1 <- as.Date("2020-03-09")
fas2 <- as.Date("2020-04-04")
df$post1pre2 <- (df$data>= fas1 & df$data < fas2)
df$post2 <- (df$data >= fas2)
df$DateNum <- as.numeric(df$data)
df$DateNumCtr <- df$DateNum - as.numeric(fas1)
df$DateNumCtr1 <- df$DateNum - as.numeric(fas2)
My data:
> df
id N data fase Nri Nti Ncg PCT VPG NDg post1pre2
1 1 7 2020-02-24 0 0 0 0 0.0000000 NA 0 FALSE
2 2 28 2020-02-25 0 0 0 0 0.0000000 0.0000000 0 FALSE
3 3 49 2020-02-26 0 0 0 0 0.0000000 0.0000000 0 FALSE
4 4 70 2020-02-27 0 0 0 0 0.0000000 0.0000000 0 FALSE
5 5 91 2020-02-28 0 0 0 0 0.0000000 0.0000000 0 FALSE
6 6 112 2020-02-29 0 0 0 0 0.0000000 0.0000000 0 FALSE
7 7 133 2020-03-01 0 0 0 6 11.1111111 0.0000000 0 FALSE
8 8 154 2020-03-02 0 0 0 3 11.5384615 50.0000000 0 FALSE
9 9 175 2020-03-03 0 1 0 4 4.7058824 44.4444444 0 FALSE
10 10 196 2020-03-04 0 3 0 5 22.7272727 38.4615385 0 FALSE
11 11 217 2020-03-05 0 4 0 3 14.2857143 16.6666667 0 FALSE
12 12 238 2020-03-06 0 4 0 10 5.5555556 47.6190476 0 FALSE
13 13 259 2020-03-07 0 7 1 11 0.0000000 35.4838710 0 FALSE
14 14 280 2020-03-08 0 7 1 15 3.5714286 35.7142857 1 FALSE
15 15 301 2020-03-09 1 18 1 36 10.3746398 63.1578947 0 TRUE
16 16 322 2020-03-10 1 27 6 23 8.9147287 24.7311828 2 TRUE
17 17 343 2020-03-11 1 16 5 10 2.1231423 8.6206897 3 TRUE
18 18 364 2020-03-12 1 23 10 41 7.7212806 32.5396825 2 TRUE
19 19 385 2020-03-13 1 59 8 90 16.5137615 53.8922156 2 TRUE
20 20 406 2020-03-14 1 67 11 44 19.3832599 17.1206226 3 TRUE
21 21 427 2020-03-15 1 98 12 46 148.3870968 15.2823920 1 TRUE
22 22 448 2020-03-16 1 96 19 39 2.7008310 11.2391931 8 TRUE
23 23 469 2020-03-17 1 104 21 8 7.4766355 2.0725389 8 TRUE
24 24 490 2020-03-18 1 128 27 68 0.0000000 17.2588833 1 TRUE
25 25 511 2020-03-19 1 134 29 137 -15.1214128 29.6536797 5 TRUE
26 26 532 2020-03-20 1 143 32 57 6.2500000 9.5158598 2 TRUE
27 27 553 2020-03-21 1 152 37 134 13.5216953 20.4268293 4 TRUE
28 28 574 2020-03-22 1 163 47 84 10.4218362 10.6329114 5 TRUE
29 29 595 2020-03-23 1 185 41 56 8.4464555 6.4073227 7 TRUE
30 30 616 2020-03-24 1 195 49 62 5.6261343 6.6666667 10 TRUE
31 31 637 2020-03-25 1 200 52 147 15.1859504 14.8185484 6 TRUE
32 32 658 2020-03-26 1 212 54 84 6.8459658 7.3748903 2 TRUE
33 33 679 2020-03-27 1 222 57 94 8.3928571 7.6860180 4 TRUE
34 34 700 2020-03-28 1 223 59 119 13.4920635 9.0356872 11 TRUE
35 35 721 2020-03-29 1 236 60 44 6.5281899 3.0640669 11 TRUE
36 36 742 2020-03-30 1 229 60 21 3.4653465 1.4189189 9 TRUE
37 37 763 2020-03-31 1 215 60 92 10.2678571 6.1292472 6 TRUE
38 38 784 2020-04-01 1 206 60 92 8.4558824 5.7752668 9 TRUE
39 39 805 2020-04-02 1 199 60 114 10.0529100 6.7655786 7 TRUE
40 40 826 2020-04-03 1 201 61 80 2.7932961 4.4469150 7 TRUE
41 41 847 2020-04-04 1 183 50 107 9.3777388 5.6945184 9 FALSE
42 42 868 2020-04-05 1 185 50 62 11.7870722 3.1218530 9 FALSE
43 43 889 2020-04-06 1 189 46 55 8.8709677 2.6855469 4 FALSE
44 44 910 2020-04-07 1 185 44 50 3.9556962 2.3775559 6 FALSE
45 45 931 2020-04-08 1 162 41 65 5.1505547 3.0190432 5 FALSE
46 46 952 2020-04-09 1 167 37 81 5.7569296 3.6519387 2 FALSE
47 47 973 2020-04-10 1 167 33 50 2.6427061 2.1748586 8 FALSE
48 48 994 2020-04-11 1 172 28 44 2.7707809 1.8731375 6 FALSE
49 49 1015 2020-04-12 1 160 28 38 12.5412541 1.5879649 10 FALSE
50 50 1036 2020-04-13 1 159 30 51 5.3515215 2.0979021 7 FALSE
51 51 1057 2020-04-14 1 166 28 38 4.6172539 1.5310234 4 FALSE
52 52 1078 2020-04-15 1 163 24 24 1.1787819 0.9523810 6 FALSE
53 53 1099 2020-04-16 1 159 26 72 3.6622584 2.8301887 5 FALSE
54 54 1120 2020-04-17 1 148 23 59 1.3836773 2.2553517 3 FALSE
55 55 1141 2020-04-18 1 141 27 56 1.7162121 2.0934579 2 FALSE
56 56 1162 2020-04-19 1 140 25 14 1.0534236 0.5126327 3 FALSE
57 57 1183 2020-04-20 1 140 22 30 5.5658627 1.0928962 14 FALSE
58 58 1204 2020-04-21 1 139 21 17 0.5091345 0.6126126 2 FALSE
59 59 1225 2020-04-22 1 134 20 25 1.0552976 0.8954155 5 FALSE
60 60 1246 2020-04-23 1 138 18 41 2.0009761 1.4554491 10 FALSE
61 61 1267 2020-04-24 1 136 16 24 0.9287926 0.8397481 2 FALSE
62 62 1288 2020-04-25 1 122 15 21 0.7309433 0.7286607 5 FALSE
63 63 1309 2020-04-26 1 129 13 14 1.0719755 0.4822597 1 FALSE
64 64 1330 2020-04-27 1 130 13 60 4.7656871 2.0569078 7 FALSE
65 65 1351 2020-04-28 1 136 13 18 0.5605730 0.6046355 7 FALSE
66 66 1372 2020-04-29 1 134 12 15 0.5729565 0.5008347 7 FALSE
post2 DateNum DateNumCtr DateNumCtr1
1 FALSE 18316 -14 -40
2 FALSE 18317 -13 -39
3 FALSE 18318 -12 -38
4 FALSE 18319 -11 -37
5 FALSE 18320 -10 -36
6 FALSE 18321 -9 -35
7 FALSE 18322 -8 -34
8 FALSE 18323 -7 -33
9 FALSE 18324 -6 -32
10 FALSE 18325 -5 -31
11 FALSE 18326 -4 -30
12 FALSE 18327 -3 -29
13 FALSE 18328 -2 -28
14 FALSE 18329 -1 -27
15 FALSE 18330 0 -26
16 FALSE 18331 1 -25
17 FALSE 18332 2 -24
18 FALSE 18333 3 -23
19 FALSE 18334 4 -22
20 FALSE 18335 5 -21
21 FALSE 18336 6 -20
22 FALSE 18337 7 -19
23 FALSE 18338 8 -18
24 FALSE 18339 9 -17
25 FALSE 18340 10 -16
26 FALSE 18341 11 -15
27 FALSE 18342 12 -14
28 FALSE 18343 13 -13
29 FALSE 18344 14 -12
30 FALSE 18345 15 -11
31 FALSE 18346 16 -10
32 FALSE 18347 17 -9
33 FALSE 18348 18 -8
34 FALSE 18349 19 -7
35 FALSE 18350 20 -6
36 FALSE 18351 21 -5
37 FALSE 18352 22 -4
38 FALSE 18353 23 -3
39 FALSE 18354 24 -2
40 FALSE 18355 25 -1
41 TRUE 18356 26 0
42 TRUE 18357 27 1
43 TRUE 18358 28 2
44 TRUE 18359 29 3
45 TRUE 18360 30 4
46 TRUE 18361 31 5
47 TRUE 18362 32 6
48 TRUE 18363 33 7
49 TRUE 18364 34 8
50 TRUE 18365 35 9
51 TRUE 18366 36 10
52 TRUE 18367 37 11
53 TRUE 18368 38 12
54 TRUE 18369 39 13
55 TRUE 18370 40 14
56 TRUE 18371 41 15
57 TRUE 18372 42 16
58 TRUE 18373 43 17
59 TRUE 18374 44 18
60 TRUE 18375 45 19
61 TRUE 18376 46 20
62 TRUE 18377 47 21
63 TRUE 18378 48 22
64 TRUE 18379 49 23
65 TRUE 18380 50 24
66 TRUE 18381 51 25
glsFit1 <- gls(model = Ncg ~ DateNumCtr + post1pre2 + DateNumCtr:post1pre2,
data = df,
correlation = corAR1(0.25))
summary(glsFit1)
glsFit2 <- gls(model = Ncg ~ DateNumCtr1 + post2 + DateNumCtr1:post2,
data = df,
correlation = corAR1(0.25))
summary(glsFit2)
newdata <- data.frame(DateNumCtr = seq(min(df$DateNumCtr), max(df$DateNumCtr), by = 1))
newdata$post1pre2 <- (newdata$DateNumCtr >= 0)
newdata <- data.frame(DateNumCtr1 = seq(min(df$DateNumCtr1), max(df$DateNumCtr1), by = 1))
newdata$post2 <- (newdata$DateNumCtr1 >= 0)
newdata$Ncg <- predict(glsFit1, newdata = newdata)
newdata$Ncg1 <- predict(glsFit2, newdata = newdata)
ggplot(data = df, mapping = aes(x = DateNumCtr, y = Ncg)) +
geom_line(stat = "identity", position = "identity",size=1) +
geom_line(mapping = NULL, data = subset(newdata, DateNumCtr < 0),
stat = "identity", position = "identity",
color="red", size=1.5)+
geom_line(mapping = NULL, data = subset(newdata, DateNumCtr >= 0),
stat = "identity", position = "identity",
color="blue", size=1.5)+
geom_line(mapping = NULL, data = subset(newdata, DateNumCtr >= 0),
stat = "identity", position = "identity",
color="green", size=1.5)+
theme_bw() + theme(legend.key = element_blank())+ labs(y= "Ncg %", x = "Giorno")
I've tried also in this way (even if I don't have the stats in this way)
geom_line() +
geom_smooth(method="lm", se=FALSE, aes(colour=post1pre2)) +
theme_bw() +
labs(colour="")
And in this:
geom_line() +
geom_smooth(method="lm", se=FALSE, aes(colour=fase)) +
theme_bw() +
labs(colour="")
But No results...
With only 1 intervention and 2 times I obtain this:
with this code:
df$data<- as.Date(df$data,format="%d/%m/%y")
fas1 <- as.Date("2020-03-09")
df$postfase1 <- (df$data >= fas1)
df$DateNum <- as.numeric(df$data)
df$DateNumCtr <- df$DateNum - as.numeric(fas1)
ggplot(data = df, mapping = aes(x = data, y = Ncg)) + layer(geom = "line", stat = "identity", position = "identity") + theme_bw() + theme(legend.key = element_blank())
glsFit1 <- gls(model = Ncg ~ DateNumCtr + postfase1 + DateNumCtr:postfase1,
data = df,
correlation = corAR1(0.25))
summary(glsFit1)
newdata <- data.frame(DateNumCtr = seq(min(df$DateNumCtr), max(df$DateNumCtr), by = 1))
newdata$postfase1 <- (newdata$DateNumCtr >= 0)
newdata$Ncg <- predict(glsFit1, newdata = newdata)
ggplot(data = df, mapping = aes(x = DateNumCtr, y = Ncg)) +
geom_line(stat = "identity", position = "identity",size=1) +
geom_line(mapping = NULL, data = subset(newdata, DateNumCtr < 0),
stat = "identity", position = "identity",
color="red", size=1.5)+
geom_line(mapping = NULL, data = subset(newdata, DateNumCtr >= 0),
stat = "identity", position = "identity",
color="blue", size=1.5)+
theme_bw() + theme(legend.key = element_blank())+ labs(y= "Ncg %", x = "Giorno")
I would like something like this:
Thank you!
Maybe I'm not understanding correctly, but it seems the reason you are not getting your three lines is due to overplotting - in the code you posted for the red, blue, and green lines, the code for drawing the blue and green lines is identical. If you remove the blue line code, you should see red and green lines.
Nevertheless, it took me a while to see what you were doing, but I think I have a better method to suggest to you. Rather than using three separate calls to geom_smooth or geom_line to draw the three line colors, you should use the innate functions of ggplot to decide how to group the lines together. In this case, you can use the color= aesthetic, which also can control how the lines are connected and which datapoints belong to which group.
ggplot2 is part of the tidyverse, and follows what's called Tidy Data Principles. The information you are using to separate the lines is not in one column, but separated out over two columns: df$post1pre2 and df$post2. You should gather() these two columns into one column that has either the label that the points belong to the group "post1pre2", "post2" or... "middle" (nothing). You can do this using the gather() function from dplyr, or in this case, I just setup a nested ifelse() statement to create a new column called grp in your dataframe:
df$grp <- ifelse(df$post1pre2,'post1pre2',ifelse(df$post2,'post2','post1'))
Then, if you set that column to be assigned to the color= aesthetic, you will generate the plot you desire.
ggplot(df, aes(x=DateNumCtr, y=Ncg)) + theme_bw() +
geom_line(color='gray30') +
geom_smooth(method='lm', se=FALSE, aes(color=grp), size=1.5) +
scale_color_manual(values=list('post1'='red', 'post1pre2'='blue', 'post2'='green4'))

Plot based on variable name

I am using this code to plot my dataframe. The statistics variable contains two values: "mean" and "sd".
ggplot(NDVIdf_forplot, aes(x = statistic, y= value, group = ID)) + geom_line()
If I use that code the graph includes both the "mean" and "sd" categories. I want to use only those observations that are in the "mean" class of the statistic variable and later use the "sd" class to plot geom_errorbar
I used this code before but did not manage to create what I want:
ggplot(NDVIdf_forplot,aes(x=mean,y=value))+geom_errorbar(aes(ymin=NDVI_mean-NDVI_sd, ymax=NDVI_mean+NDVI_sd), width=0.1)+geom_line()+geom_point()
edit ---
The data I want to plot look like this (I'm showing only the top rows). The idea is to use NDVI_mean to create the lines and NDVI_sd to create the error bars on the same graph
> NDVIdf_forplot
ID statistic value
1 1 NDVI_mean 0.052957208
2 2 NDVI_mean 0.044501794
3 3 NDVI_mean 0.077902512
4 4 NDVI_mean 0.141576609
5 5 NDVI_mean 0.653835647
6 6 NDVI_mean 0.716164870
7 7 NDVI_mean 0.386612348
8 8 NDVI_mean 0.486527816
9 9 NDVI_mean 0.226190208
10 10 NDVI_mean 0.573239754
11 1 NDVI_sd 0.008259909
12 2 NDVI_sd 0.015453091
13 3 NDVI_sd 0.099944407
14 4 NDVI_sd 0.091479545
15 5 NDVI_sd 0.223150965
16 6 NDVI_sd 0.074045394
17 7 NDVI_sd 0.058177949
18 8 NDVI_sd 0.109762451
19 9 NDVI_sd 0.019822312
20 10 NDVI_sd 0.104795771
21 1 NDVI_mean.1 0.081417705
22 2 NDVI_mean.1 0.036114126
23 3 NDVI_mean.1 0.037729680
24 4 NDVI_mean.1 0.016398037
25 5 NDVI_mean.1 0.052672604
26 6 NDVI_mean.1 0.024580946
27 7 NDVI_mean.1 0.064811390
28 8 NDVI_mean.1 0.119724256
29 9 NDVI_mean.1 0.078961665
30 10 NDVI_mean.1 0.041025489
31 1 NDVI_sd.1 0.016093458
32 2 NDVI_sd.1 0.027927592
33 3 NDVI_sd.1 0.046937888
34 4 NDVI_sd.1 0.011805721
35 5 NDVI_sd.1 0.026467984
36 6 NDVI_sd.1 0.028896611
37 7 NDVI_sd.1 0.016313583
38 8 NDVI_sd.1 0.066647683
39 9 NDVI_sd.1 0.022800589
40 10 NDVI_sd.1 0.015085673
41 1 NDVI_mean.2 0.063375514
42 2 NDVI_mean.2 0.086191853
43 3 NDVI_mean.2 0.092580942
44 4 NDVI_mean.2 0.144053635
45 5 NDVI_mean.2 0.696155509
46 6 NDVI_mean.2 0.252707792
47 7 NDVI_mean.2 0.144636380
48 8 NDVI_mean.2 0.757321462
49 9 NDVI_mean.2 0.689617575
50 10 NDVI_mean.2 0.179591653
51 1 NDVI_sd.2 0.010017152
52 2 NDVI_sd.2 0.023206464
53 3 NDVI_sd.2 0.106580902
54 4 NDVI_sd.2 0.097440674
55 5 NDVI_sd.2 0.231063744
56 6 NDVI_sd.2 0.043961963
57 7 NDVI_sd.2 0.010335935
58 8 NDVI_sd.2 0.061841114
59 9 NDVI_sd.2 0.048363788
60 10 NDVI_sd.2 0.111704779
61 1 NDVI_mean.3 0.048932939
62 2 NDVI_mean.3 0.110942174
63 3 NDVI_mean.3 0.080362752
64 4 NDVI_mean.3 0.132868790
65 5 NDVI_mean.3 0.682639604
66 6 NDVI_mean.3 0.503766225
67 7 NDVI_mean.3 0.120794820
68 8 NDVI_mean.3 0.777808416
69 9 NDVI_mean.3 0.755741184
70 10 NDVI_mean.3 0.058089687
71 1 NDVI_sd.3 0.009048781
72 2 NDVI_sd.3 0.029528930
73 3 NDVI_sd.3 0.098454753
74 4 NDVI_sd.3 0.089512544
75 5 NDVI_sd.3 0.241257647
76 6 NDVI_sd.3 0.114466677
77 7 NDVI_sd.3 0.013347437
78 8 NDVI_sd.3 0.066441491
79 9 NDVI_sd.3 0.065787691
80 10 NDVI_sd.3 0.013351357
So far this image shows how the plot is being produced. As you can see both NDVI_mean and NDVI_sd are used but this should not be the case. NDVI_sd should be used to produce geom_errorbar
Code:
# Transform data
# Here we make table with three columns (ID Mean SD)
pd <- reshape2::dcast(NDVIdf_forplot, ID ~ statistic, value.var = "value")
# Plot data using ggplot2
library(ggplot2)
ggplot(pd, aes(ID, NDVI_mean)) +
geom_point() +
geom_line() +
geom_errorbar(aes(ymin = NDVI_mean - NDVI_sd,
ymax = NDVI_mean + NDVI_sd))
Result plot:

R: How plot negative and positive anomaly (for this data) with ggplot? [duplicate]

This question already has answers here:
How to fill with different colors between two lines? (originally: fill geom_polygon with different colors above and below y = 0 (or any other value)?)
(4 answers)
Closed 5 years ago.
I have this df
x acc
1 1902-01-01 0.782887804
2 1903-01-01 -0.003144199
3 1904-01-01 0.100006276
4 1905-01-01 0.326173392
5 1906-01-01 1.285114692
6 1907-01-01 2.844399973
7 1920-01-01 -0.300232190
8 1921-01-01 1.464389342
9 1922-01-01 0.142638653
10 1923-01-01 -0.020162385
11 1924-01-01 0.361928571
12 1925-01-01 0.616325588
13 1926-01-01 -0.108206003
14 1927-01-01 -0.318441954
15 1928-01-01 -0.267884586
16 1929-01-01 -0.022473777
17 1930-01-01 -0.294452983
18 1931-01-01 -0.654927109
19 1932-01-01 -0.263508341
20 1933-01-01 0.622530992
21 1934-01-01 1.009666043
22 1935-01-01 0.675484421
23 1936-01-01 1.209162008
24 1937-01-01 1.655280986
25 1948-01-01 2.080021785
26 1949-01-01 0.854572563
27 1950-01-01 0.997540963
28 1951-01-01 1.000244163
29 1952-01-01 0.958322941
30 1953-01-01 0.816259474
31 1954-01-01 0.814488644
32 1955-01-01 1.233694537
33 1958-01-01 0.460120970
34 1959-01-01 0.344201474
35 1960-01-01 1.601430139
36 1961-01-01 0.387850967
37 1962-01-01 -0.385954401
38 1963-01-01 0.699355708
39 1964-01-01 0.084519926
40 1965-01-01 0.708964572
41 1966-01-01 1.456280443
42 1967-01-01 1.479412638
43 1968-01-01 1.199000726
44 1969-01-01 0.282942042
45 1970-01-01 -0.181724504
46 1971-01-01 0.012170186
47 1972-01-01 -0.095891043
48 1973-01-01 -0.075384446
49 1974-01-01 -0.156668145
50 1975-01-01 -0.303023258
51 1976-01-01 -0.516027310
52 1977-01-01 -0.826791524
53 1980-01-01 -0.947112221
54 1981-01-01 -1.634878300
55 1982-01-01 -1.955298323
56 1987-01-01 -1.854447550
57 1988-01-01 -1.458955443
58 1989-01-01 -1.256102245
59 1990-01-01 -0.864108585
60 1991-01-01 -1.293373024
61 1992-01-01 -1.049530431
62 1993-01-01 -1.002526230
63 1994-01-01 -0.868783614
64 1995-01-01 -1.081858981
65 1996-01-01 -1.302103374
66 1997-01-01 -1.288048194
67 1998-01-01 -1.455750340
68 1999-01-01 -1.015467069
69 2000-01-01 -0.682789640
70 2001-01-01 -0.811058004
71 2002-01-01 -0.972374057
72 2003-01-01 -0.536505225
73 2004-01-01 -0.518686263
74 2005-01-01 -0.976298621
75 2006-01-01 -0.946429713
I would like plot the data in this kind:
where on x axes there is column x of df, and on y axes column acc.
Is possible plot it with ggplot?
I tried with this code:
ggplot(df,aes(x=x,y=acc))+
geom_linerange(data =df , aes(colour = ifelse(acc <0, "blue", "red")),ymin=min(df),ymax=max(cdf))
but the result is this:
Please, how I can do it?
Is this what you want? I'm not sure.
ggplot(data = df,mapping = aes(x,acc))+geom_segment(data = df , mapping = aes(x=x,y=ystart,xend=x,yend=acc,color=col))
df$x=year(as.Date(df$x))
df$ystart=0
df$col=ifelse(df$acc>=0,"blue","red")

Mean and SD in R

maybe it is a very easy question. This is my data.frame:
> read.table("text.txt")
V1 V2
1 26 22516
2 28 17129
3 30 38470
4 32 12920
5 34 30835
6 36 36244
7 38 24482
8 40 67482
9 42 23121
10 44 51643
11 46 61064
12 48 37678
13 50 98817
14 52 31741
15 54 74672
16 56 85648
17 58 53813
18 60 135534
19 62 46621
20 64 89266
21 66 99818
22 68 60071
23 70 168558
24 72 67059
25 74 194730
26 76 278473
27 78 217860
It means that I have 22516 sequences with length 26, 17129 sequences with length 28, etc. I would like to know the sequence length mean and its standard deviation. I know how to do it, but I know to do it creating a list full of 26 repeated 22516 times and so on... and then compute the mean and SD. However, I thing there is a easier method. Any idea?
Thanks.
For mean: (V1 %*% V2)/sum(V2)
For SD: sqrt(((V1-(V1 %*% V2)/sum(V2))**2 %*% V2)/sum(V2))
I do not find mean(rep(V1,V2)) # 61.902 and sd(rep(V1,V2)) # 14.23891 that complex, but alternatively you might try:
weighted.mean(V1,V2) # 61.902
# recipe from http://www.ltcconline.net/greenl/courses/201/descstat/meansdgrouped.htm
sqrt((sum((V1^2)*V2)-(sum(V1*V2)^2)/sum(V2))/(sum(V2)-1)) # 14.23891
Step1: Set up data:
dat.df <- read.table(text="id V1 V2
1 26 22516
2 28 17129
3 30 38470
4 32 12920
5 34 30835
6 36 36244
7 38 24482
8 40 67482
9 42 23121
10 44 51643
11 46 61064
12 48 37678
13 50 98817
14 52 31741
15 54 74672
16 56 85648
17 58 53813
18 60 135534
19 62 46621
20 64 89266
21 66 99818
22 68 60071
23 70 168558
24 72 67059
25 74 194730
26 76 278473
27 78 217860",header=T)
Step2: Convert to data.table (only for simplicity and laziness in typing)
library(data.table)
dat <- data.table(dat.df)
Step3: Set up new columns with products, and use them to find mean
dat[,pr:=V1*V2]
dat[,v1sq:=as.numeric(V1*V1*V2)]
dat.Mean <- sum(dat$pr)/sum(dat$V2)
dat.SD <- sqrt( (sum(dat$v1sq)/sum(dat$V2)) - dat.Mean^2)
Hope this helps!!
MEAN = (V1*V2)/sum(V2)
SD = sqrt((V1*V1*V2)/sum(V2) - MEAN^2)

Resources