Thank you for taking a look at my question!
I have the following (dummy) data for patient performance on 3 tasks:
patient_df = data.frame(id = seq(1:5),
                        age = c(30, 72, 46, 63, 58),
                        education = c(11, 22, 18, 12, 14),
                        task1 = c(21, 28, 20, 24, 22),
                        task2 = c(15, 15, 10, 11, 14),
                        task3 = c(82, 60, 74, 78, 78))
> patient_df
id age education task1 task2 task3
1 1 30 11 21 15 82
2 2 72 22 28 15 60
3 3 46 18 20 10 74
4 4 63 12 24 11 78
5 5 58 14 22 14 78
I also have the following (dummy) lookup table for age and education-based cutoff values to define a patient's performance as impaired or not impaired on each task:
cutoffs = data.frame(age = rep(seq(from = 35, to = 70, by = 5), 2),
                     education = c(rep("<16", 8), rep(">=16", 8)),
                     task1_cutoff = rep(24, 16),
                     task2_cutoff = c(11, 11, 11, 11, 10, 10, 10, 10, 9, 13, 13, 13, 13, 12, 12, 11),
                     task3_cutoff = c(rep(71, 8), 70, rep(74, 2), rep(73, 5)))
> cutoffs
age education task1_cutoff task2_cutoff task3_cutoff
1 35 <16 24 11 71
2 40 <16 24 11 71
3 45 <16 24 11 71
4 50 <16 24 11 71
5 55 <16 24 10 71
6 60 <16 24 10 71
7 65 <16 24 10 71
8 70 <16 24 10 71
9 35 >=16 24 9 70
10 40 >=16 24 13 74
11 45 >=16 24 13 74
12 50 >=16 24 13 73
13 55 >=16 24 13 73
14 60 >=16 24 12 73
15 65 >=16 24 12 73
16 70 >=16 24 11 73
My goal is to create 3 new binary indicator variables in patient_df that flag whether a patient is impaired on each task. For example, id=1 in patient_df is under 35 years old with <16 years of education, so the cutoff value would be 24 for task1, 11 for task2, and 71 for task3, and scores below these values would denote impairment.
I would like to do this for each id by referencing the age and education-associated cutoff value in the cutoff dataset, so that the outcome would look something like this:
> goal_patient_df
id age education task1 task2 task3 task1_impaired task2_impaired task3_impaired
1 1 30 11 21 15 82 1 1 0
2 2 72 22 28 15 60 0 0 1
3 3 46 18 20 10 74 1 1 0
4 4 63 12 24 11 78 1 0 0
5 5 58 14 22 14 78 1 0 0
In actuality, my patient_df has 600+ patients and there are 7+ tasks, each with age- and education-associated cutoff values, so a 'clean' way of doing this would be greatly appreciated! The only alternative I can think of right now is writing a TON of if_else or case_when statements, which would not be very reproducible for anyone else using my code :(
Thank you in advance!
I would recommend putting both your lookup table and your patient_df data frame in long form; I think that will be easier to manage with multiple tasks.
Your education column is numeric, so converting it to the character values "<16" or ">=16" will help with matching against the lookup table.
Using fuzzy_inner_join, the data are matched to the lookup table where task and education match exactly (==), while age only has to fall between age_low and age_high, provided you specify an age range for each lookup table row.
Finally, impaired is calculated by comparing the values from the two data frames for each task.
Please note that in the output, id 1 is missing, as it falls outside the age range covered by the lookup table. You can add more rows to that table to address this.
library(tidyverse)
library(fuzzyjoin)

# Lookup table in long form: one row per age bracket, education level and task,
# with age turned into an [age_low, age_high] range.
cutoffs_long <- cutoffs %>%
  pivot_longer(cols = starts_with("task"), names_to = "task",
               values_to = "cutoff_value", names_pattern = "task(\\d+)") %>%
  mutate(age_low = age,
         age_high = age + 4) %>%
  select(-age)

# Patient data in long form, education recoded to match the lookup table,
# then joined on exact education/task and age within [age_low, age_high].
patient_df %>%
  pivot_longer(cols = starts_with("task"), names_to = "task",
               values_to = "patient_value", names_pattern = "(\\d+)") %>%
  mutate(education = ifelse(education < 16, "<16", ">=16")) %>%
  fuzzy_inner_join(cutoffs_long,
                   by = c("age" = "age_low", "age" = "age_high", "education", "task"),
                   match_fun = list(`>=`, `<=`, `==`, `==`)) %>%
  mutate(impaired = +(patient_value < cutoff_value))
Output
# A tibble: 12 x 11
id age education.x task.x patient_value education.y task.y cutoff_value age_low age_high impaired
<int> <dbl> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <int>
1 2 72 >=16 1 28 >=16 1 24 70 74 0
2 2 72 >=16 2 15 >=16 2 11 70 74 0
3 2 72 >=16 3 60 >=16 3 73 70 74 1
4 3 46 >=16 1 20 >=16 1 24 45 49 1
5 3 46 >=16 2 10 >=16 2 13 45 49 1
6 3 46 >=16 3 74 >=16 3 74 45 49 0
7 4 63 <16 1 24 <16 1 24 60 64 0
8 4 63 <16 2 11 <16 2 10 60 64 0
9 4 63 <16 3 78 <16 3 71 60 64 0
10 5 58 <16 1 22 <16 1 24 55 59 1
11 5 58 <16 2 14 <16 2 10 55 59 0
12 5 58 <16 3 78 <16 3 71 55 59 0
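If you would rather have patients outside the tabulated ages (like id 1, age 30) fall back on the nearest bracket instead of adding rows, one option is to widen the boundary brackets of cutoffs_long before the join. This is only a sketch of that idea and assumes you want ages below 35 to use the 35-39 cutoffs and ages above 74 to use the 70-74 cutoffs:
# Assumption: make the youngest and oldest brackets open-ended.
cutoffs_long <- cutoffs_long %>%
  mutate(age_low  = ifelse(age_low == 35, -Inf, age_low),
         age_high = ifelse(age_high == 74, Inf, age_high))
Re-running the fuzzy_inner_join step above then returns rows for id 1 as well.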
We are looking to create a vector with the following sequence:
1,4,5,8,9,12,13,16,17,20,21,...
Start with 1, then skip 2 numbers, then add 2 numbers, then skip 2 numbers, etc., not going above 2000. We also need the inverse sequence 2,3,6,7,10,11,...
We may use a recycling logical vector to filter the sequence:
(1:21)[c(TRUE, FALSE, FALSE, TRUE)]
[1] 1 4 5 8 9 12 13 16 17 20 21
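The complementary sequence from the question (2, 3, 6, 7, 10, 11, ...) falls out of the same recycling idea by flipping the logical pattern; for the full range, replace 1:21 with 1:2000:
(1:21)[c(FALSE, TRUE, TRUE, FALSE)]
[1]  2  3  6  7 10 11 14 15 18 19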
Here's an approach using rep and cumsum. Effectively, "add up alternating increments of 1 (successive #s) and 3 (skip two)."
cumsum(rep(c(1,3), 500))
and, for the inverse sequence,
cumsum(rep(c(3,1), 500)) - 1
Got this one myself - head(sort(c(seq(1, 2000, 4), seq(4, 2000, 4))), 20)
We can try the following:
> (v <- seq(21))[v %% 4 %in% c(0, 1)]
[1] 1 4 5 8 9 12 13 16 17 20 21
You may arrange the values in a matrix and extract the 1st and 4th columns.
val <- 1:100
sort(c(matrix(val, ncol = 4, byrow = TRUE)[, c(1, 4)]))
# [1] 1 4 5 8 9 12 13 16 17 20 21 24 25 28 29 32 33
#[18] 36 37 40 41 44 45 48 49 52 53 56 57 60 61 64 65 68
#[35] 69 72 73 76 77 80 81 84 85 88 89 92 93 96 97 100
A tidyverse option.
library(purrr)
library(dplyr)
map_int(1:11, ~ case_when(. == 1 ~ as.integer(1),
                          . %% 2 == 0 ~ as.integer(. * 2),
                          T ~ as.integer((. * 2) - 1)))
# [1] 1 4 5 8 9 12 13 16 17 20 21
I want to calculate the area under the curve for the time points for each id and column. Any suggestions? Which R packages to use? Many thanks!
id <- rep(1:3,each=5)
time <- rep(c(10,20,30,40,50),3)
q1 <- sample(100,15, replace=T)
q2 <- sample(100,15, replace=T)
q3 <- sample(100,15, replace=T)
df <- data.frame(id,time,q1,q2,q3)
df
id time q1 q2 q3
1 10 38 55 38
1 20 46 29 88
1 30 16 28 97
1 40 37 20 81
1 50 59 27 42
2 10 82 81 54
2 20 45 3 23
2 30 82 67 59
2 40 27 3 42
2 50 45 71 45
3 10 39 8 29
3 20 12 6 90
3 30 92 11 7
3 40 52 8 37
3 50 81 57 80
Wanted output, something like this:
q1 q2 q3
1 area area area
2 area area area
3 area area area
(Note that df is regenerated here with sample() and no seed, so the simulated values, and therefore the areas below, differ from the numbers shown in the question.)
library(tidyverse)

id <- rep(1:3, each = 5)
time <- rep(c(10, 20, 30, 40, 50), 3)
q1 <- sample(100, 15, replace = TRUE)
q2 <- sample(100, 15, replace = TRUE)
q3 <- sample(100, 15, replace = TRUE)
df <- data.frame(id, time, q1, q2, q3)
# Long format: one row per id / time / task.
df %>%
  arrange(time) %>%
  pivot_longer(cols = c(q1, q2, q3)) -> longer_df

# Quick visual check of the curves.
longer_df %>%
  ggplot(aes(x = time, y = value, col = factor(id))) +
  geom_line() +
  geom_point() +
  facet_wrap(. ~ name)

# Trapezoidal rule by hand: average adjacent values and multiply by the
# spacing between time points (10 here).
longer_df %>%
  group_by(id, name) %>%
  mutate(lag_value = lag(value),
         midpoint_value = (value + lag_value) / 2) %>%
  summarize(area = 10 * sum(midpoint_value, na.rm = TRUE)) %>%
  pivot_wider(values_from = area)
#> `summarise()` has grouped output by 'id'. You can override using the `.groups` argument.
#> # A tibble: 3 x 4
#> # Groups: id [3]
#> id q1 q2 q3
#> <int> <dbl> <dbl> <dbl>
#> 1 1 1960 1980 2075
#> 2 2 1025 2215 2180
#> 3 3 2105 1590 2110
Created on 2021-06-30 by the reprex package (v2.0.0)
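If the time points were not evenly spaced, the hard-coded factor of 10 would no longer be correct. Here is a sketch of the same trapezoidal idea using the actual gaps between time points (my variation, not part of the original answer):
longer_df %>%
  group_by(id, name) %>%
  mutate(step = time - lag(time),                        # width of each trapezoid
         midpoint_value = (value + lag(value)) / 2) %>%  # average height
  summarize(area = sum(step * midpoint_value, na.rm = TRUE)) %>%
  pivot_wider(values_from = area)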
Here I will use the trapz function to calculate the integral.
library(data.table)
library(caTools) # integrate with its trapz function
# data
df <- fread("id time q1 q2 q3
1 10 38 55 38
1 20 46 29 88
1 30 16 28 97
1 40 37 20 81
1 50 59 27 42
2 10 82 81 54
2 20 45 3 23
2 30 82 67 59
2 40 27 3 42
2 50 45 71 45
3 10 39 8 29
3 20 12 6 90
3 30 92 11 7
3 40 52 8 37
3 50 81 57 80")
# calculate the area with `trapz`
df[,lapply(.SD[,2:4], function(y) trapz(time,y)),by=id]
#> id q1 q2 q3
#> 1: 1 1475 1180 3060
#> 2: 2 2175 1490 1735
#> 3: 3 2160 575 1885
Created on 2021-06-30 by the reprex package (v2.0.0)
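For comparison, and assuming dplyr >= 1.0 is available, the same trapz call can also be applied per id with dplyr's across, which yields the wanted wide layout directly (my addition, not part of the original answer):
library(dplyr)
library(caTools)

df %>%
  group_by(id) %>%
  summarise(across(q1:q3, ~ trapz(time, .x)))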
I have a data set that looks like this:
id <- c(1:3)
gender <- factor(c("male","female","female"))
age <- c(51,69,44)
cortisol_1 <- c(23,32,54)
cortisol_2 <- c(34,52,49)
cortisol_3 <- c(34,65,12)
blood_1 <- c(12,64,54)
blood_2 <- c(52,32,75)
blood_3 <- c(12,12,75)
temp_1 <- c(38.5,38.7,37.9)
temp_3 <- c(36.5,36.4,37.1)
df <- data.frame(id,gender,age,cortisol_1,cortisol_2,cortisol_3,blood_1,blood_2,blood_3,temp_1,temp_3)
df
id gender age cortisol_1 cortisol_2 cortisol_3 blood_1 blood_2 blood_3 temp_1 temp_3
1 1 male 51 23 34 34 12 52 12 38.5 36.5
2 2 female 69 32 52 65 64 32 12 38.7 36.4
3 3 female 44 54 49 12 54 75 75 37.9 37.1
So I have cortisol level and blood pressure which were measured annually at three time points. However, body temperature was only assessed at baseline and wave 3.
How can I change the data structure from wide to long? I would hope that the data looks like this:
id gender wave cortisol blood temp
1 1 male 1 23 12 38.5
2 1 male 2 34 52 NA
3 1 male 3 34 12 36.5
4 2 female 1 32 64 38.7
5 2 female 2 52 32 NA
6 2 female 3 65 12 36.4
7 3 female 1 54 54 37.9
8 3 female 2 49 75 NA
9 3 female 3 12 75 37.1
Best
Jascha
We can use pivot_longer
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(cols = -c(id, gender, age),
               names_to = c('.value', 'grp'), names_sep = "_") %>%
  select(-grp)
-output
# A tibble: 9 x 6
# id gender age cortisol blood temp
# <int> <fct> <dbl> <dbl> <dbl> <dbl>
#1 1 male 51 23 12 38.5
#2 1 male 51 34 52 NA
#3 1 male 51 34 12 36.5
#4 2 female 69 32 64 38.7
#5 2 female 69 52 32 NA
#6 2 female 69 65 12 36.4
#7 3 female 44 54 54 37.9
#8 3 female 44 49 75 NA
#9 3 female 44 12 75 37.1
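If you also want the wave column shown in the desired output, keep the suffix instead of dropping it; a sketch along the same lines, with the same libraries loaded:
df %>%
  pivot_longer(cols = -c(id, gender, age),
               names_to = c('.value', 'wave'), names_sep = "_") %>%
  mutate(wave = as.integer(wave))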
Let's say I have the following dataframe
library(tibble)

country_df <- tibble(
  population = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)
All I need is a concise way inside the mutate function to get the number of columns (population to population3) that are greater than the value of the pop column within each row.
So what I need is the following result (more specifically, the GreaterTotal column). Note: I can get the answer by working through each column, but that would take a while with more columns.
population population2 population3 pop GreaterThan0 GreaterThan1 GreaterThan2 GreaterTotal
<dbl> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl> <int>
1 328 133 89 45 TRUE TRUE TRUE 3
2 38 12 39 45 FALSE FALSE FALSE 0
3 30 99 33 45 FALSE TRUE FALSE 1
4 56 83 56 45 TRUE TRUE TRUE 3
5 1393 1033 193 45 TRUE TRUE TRUE 3
6 126 101 126 45 TRUE TRUE TRUE 3
7 57 33 58 45 TRUE FALSE TRUE 2
I've tried using apply with the row index, but I can't get at it. Can somebody please point me in the right direction?
You can select the 'population' columns, compare those columns with pop, and use rowSums to count how many of them are greater in each row.
cols <- grep('population', names(country_df))
country_df$GreaterTotal <- rowSums(country_df[cols] > country_df$pop)
# population population2 population3 pop GreaterTotal
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 328 133 89 45 3
#2 38 12 39 45 0
#3 30 99 33 45 1
#4 56 83 56 45 3
#5 1393 1033 193 45 3
#6 126 101 126 45 3
#7 57 33 58 45 2
In dplyr 1.0.0, you can do this with rowwise() and c_across():
country_df %>%
  rowwise() %>%
  mutate(GreaterTotal = sum(c_across(population:population3) > pop))
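A non-rowwise variant of the same idea (my sketch, assuming dplyr >= 1.0.0): across() collects the population columns into a data frame and rowSums() does the per-row counting, which avoids rowwise()'s per-row overhead on larger data:
country_df %>%
  mutate(GreaterTotal = rowSums(across(starts_with("population")) > pop))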
Using tidyverse, we can do
library(dplyr)
country_df %>%
  mutate(GreaterTotal = rowSums(select(., starts_with('population')) > .$pop))
-output
# A tibble: 7 x 5
# population population2 population3 pop GreaterTotal
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 328 133 89 45 3
#2 38 12 39 45 0
#3 30 99 33 45 1
#4 56 83 56 45 3
#5 1393 1033 193 45 3
#6 126 101 126 45 3
#7 57 33 58 45 2
My question is similar to Fill region between two loess-smoothed lines in R with ggplot, but I have two groups.
g1 <- ggplot(NVIQ_predict, aes(cogn.age, predict, color = as.factor(NVIQ_predict$group))) +
  geom_smooth(aes(x = cogn.age, y = upper, group = group), se = F) +
  geom_line(aes(linetype = group), size = 0.8) +
  geom_smooth(aes(x = cogn.age, y = lower, group = group), se = F)
I want to fill red and blue for each group.
I tried:
gg1 <- ggplot_build(g1)
df2 <- data.frame(x = gg1$data[[1]]$x,
ymin = gg1$data[[1]]$y,
ymax = gg1$data[[3]]$y)
g1 + geom_ribbon(data = df2, aes(x = x, ymin = ymin, ymax = ymax),fill = "grey", alpha = 0.4)
But it gave me the error: Aesthetics must either be length one, or the same length as the data.
I get the same error every time my geom_ribbon() data and ggplot() data differ.
Can somebody help me with it? Thank you so much!
My data looks like:
> NVIQ_predict
cogn.age predict upper lower group
1 7 39.04942 86.68497 18.00000 1
2 8 38.34993 82.29627 18.00000 1
3 10 37.05174 74.31657 18.00000 1
4 11 36.45297 70.72421 18.00000 1
5 12 35.88770 67.39555 18.00000 1
6 13 35.35587 64.32920 18.00000 1
7 14 34.85738 61.52322 18.00000 1
8 16 33.95991 56.68024 18.00000 1
9 17 33.56057 54.63537 18.00000 1
10 18 33.19388 52.83504 18.00000 1
11 19 32.85958 51.27380 18.00000 1
12 20 32.55752 49.94791 18.00000 1
13 21 32.28766 48.85631 18.00000 1
14 24 31.67593 47.09206 18.00000 1
15 25 31.53239 46.91136 18.00000 1
16 28 31.28740 48.01764 18.00000 1
17 32 31.36627 50.55201 18.00000 1
18 35 31.73386 53.19630 18.00000 1
19 36 31.91487 54.22624 18.00000 1
20 37 32.13026 55.25721 18.00000 1
21 38 32.38237 56.26713 18.00000 1
22 40 32.98499 58.36229 18.00000 1
23 44 34.59044 62.80187 18.00000 1
24 45 35.06804 64.01951 18.00000 1
25 46 35.57110 65.31888 18.00000 1
26 47 36.09880 66.64696 17.93800 1
27 48 36.72294 67.60053 17.97550 1
28 49 37.39182 68.49995 18.03062 1
29 50 38.10376 69.35728 18.10675 1
30 51 38.85760 70.17693 18.18661 1
31 52 39.65347 70.95875 18.27524 1
32 53 40.49156 71.70261 18.38020 1
33 54 41.35332 72.44006 17.90682 1
34 59 46.37849 74.91802 18.63206 1
35 60 47.53897 75.66218 19.64432 1
36 61 48.74697 76.43933 20.82346 1
37 63 51.30607 78.02426 23.73535 1
38 71 63.43129 86.05467 40.43482 1
39 72 65.15618 87.44794 42.72704 1
40 73 66.92714 88.95324 45.01966 1
41 84 89.42079 114.27939 68.03834 1
42 85 91.73831 117.44007 69.83676 1
43 7 33.69504 54.03695 15.74588 2
44 8 34.99931 53.96500 18.00533 2
45 10 37.61963 54.05684 22.43516 2
46 11 38.93493 54.21969 24.60049 2
47 12 40.25315 54.45963 26.73027 2
48 13 41.57397 54.77581 28.82348 2
49 14 42.89710 55.16727 30.87982 2
50 16 45.54954 56.17193 34.88453 2
51 17 46.87877 56.78325 36.83632 2
52 18 48.21025 57.46656 38.75807 2
53 19 49.54461 58.22266 40.65330 2
54 20 50.88313 59.05509 42.52505 2
55 21 52.22789 59.97318 44.36944 2
56 24 56.24397 63.21832 49.26963 2
57 25 57.55394 64.33850 50.76938 2
58 28 61.45282 68.05043 54.85522 2
59 32 66.44875 72.85234 60.04517 2
60 35 69.96560 76.06171 63.86949 2
61 36 71.09268 77.06821 65.11714 2
62 37 72.19743 78.04559 66.34927 2
63 38 73.28041 78.99518 67.56565 2
64 40 75.37861 80.81593 69.94129 2
65 44 79.29028 84.20275 74.37780 2
66 45 80.20272 85.00888 75.39656 2
67 46 81.08645 85.80180 76.37110 2
68 47 81.93696 86.57689 77.29704 2
69 48 82.75920 87.34100 78.17739 2
70 49 83.55055 88.09165 79.00945 2
71 50 84.30962 88.82357 79.79567 2
72 51 85.03743 89.53669 80.53817 2
73 52 85.73757 90.23223 81.24291 2
74 53 86.41419 90.91607 81.91232 2
75 54 87.05716 91.58632 82.52800 2
76 59 89.75923 94.58218 84.93629 2
77 60 90.18557 95.05573 85.31541 2
78 61 90.58166 95.51469 85.64864 2
79 63 91.27115 96.31107 86.23124 2
80 71 92.40983 98.35031 86.46934 2
81 72 92.36362 98.52258 86.20465 2
82 73 92.27734 98.67161 85.88308 2
83 84 88.66150 98.84699 78.47602 2
84 85 88.08846 98.73625 77.44067 2
Following Gregor's suggestion, I tried inherit.aes = FALSE and the error is gone. But my plot looks like this:
We've got all the info we need. Now we just need to, ahem, connect the dots ;-)
First the input data:
NVIQ_predict <- read.table(text = "
id cogn.age predict upper lower group
1 7 39.04942 86.68497 18.00000 1
2 8 38.34993 82.29627 18.00000 1
3 10 37.05174 74.31657 18.00000 1
4 11 36.45297 70.72421 18.00000 1
5 12 35.88770 67.39555 18.00000 1
6 13 35.35587 64.32920 18.00000 1
7 14 34.85738 61.52322 18.00000 1
8 16 33.95991 56.68024 18.00000 1
9 17 33.56057 54.63537 18.00000 1
10 18 33.19388 52.83504 18.00000 1
11 19 32.85958 51.27380 18.00000 1
12 20 32.55752 49.94791 18.00000 1
13 21 32.28766 48.85631 18.00000 1
14 24 31.67593 47.09206 18.00000 1
15 25 31.53239 46.91136 18.00000 1
16 28 31.28740 48.01764 18.00000 1
17 32 31.36627 50.55201 18.00000 1
18 35 31.73386 53.19630 18.00000 1
19 36 31.91487 54.22624 18.00000 1
20 37 32.13026 55.25721 18.00000 1
21 38 32.38237 56.26713 18.00000 1
22 40 32.98499 58.36229 18.00000 1
23 44 34.59044 62.80187 18.00000 1
24 45 35.06804 64.01951 18.00000 1
25 46 35.57110 65.31888 18.00000 1
26 47 36.09880 66.64696 17.93800 1
27 48 36.72294 67.60053 17.97550 1
28 49 37.39182 68.49995 18.03062 1
29 50 38.10376 69.35728 18.10675 1
30 51 38.85760 70.17693 18.18661 1
31 52 39.65347 70.95875 18.27524 1
32 53 40.49156 71.70261 18.38020 1
33 54 41.35332 72.44006 17.90682 1
34 59 46.37849 74.91802 18.63206 1
35 60 47.53897 75.66218 19.64432 1
36 61 48.74697 76.43933 20.82346 1
37 63 51.30607 78.02426 23.73535 1
38 71 63.43129 86.05467 40.43482 1
39 72 65.15618 87.44794 42.72704 1
40 73 66.92714 88.95324 45.01966 1
41 84 89.42079 114.27939 68.03834 1
42 85 91.73831 117.44007 69.83676 1
43 7 33.69504 54.03695 15.74588 2
44 8 34.99931 53.96500 18.00533 2
45 10 37.61963 54.05684 22.43516 2
46 11 38.93493 54.21969 24.60049 2
47 12 40.25315 54.45963 26.73027 2
48 13 41.57397 54.77581 28.82348 2
49 14 42.89710 55.16727 30.87982 2
50 16 45.54954 56.17193 34.88453 2
51 17 46.87877 56.78325 36.83632 2
52 18 48.21025 57.46656 38.75807 2
53 19 49.54461 58.22266 40.65330 2
54 20 50.88313 59.05509 42.52505 2
55 21 52.22789 59.97318 44.36944 2
56 24 56.24397 63.21832 49.26963 2
57 25 57.55394 64.33850 50.76938 2
58 28 61.45282 68.05043 54.85522 2
59 32 66.44875 72.85234 60.04517 2
60 35 69.96560 76.06171 63.86949 2
61 36 71.09268 77.06821 65.11714 2
62 37 72.19743 78.04559 66.34927 2
63 38 73.28041 78.99518 67.56565 2
64 40 75.37861 80.81593 69.94129 2
65 44 79.29028 84.20275 74.37780 2
66 45 80.20272 85.00888 75.39656 2
67 46 81.08645 85.80180 76.37110 2
68 47 81.93696 86.57689 77.29704 2
69 48 82.75920 87.34100 78.17739 2
70 49 83.55055 88.09165 79.00945 2
71 50 84.30962 88.82357 79.79567 2
72 51 85.03743 89.53669 80.53817 2
73 52 85.73757 90.23223 81.24291 2
74 53 86.41419 90.91607 81.91232 2
75 54 87.05716 91.58632 82.52800 2
76 59 89.75923 94.58218 84.93629 2
77 60 90.18557 95.05573 85.31541 2
78 61 90.58166 95.51469 85.64864 2
79 63 91.27115 96.31107 86.23124 2
80 71 92.40983 98.35031 86.46934 2
81 72 92.36362 98.52258 86.20465 2
82 73 92.27734 98.67161 85.88308 2
83 84 88.66150 98.84699 78.47602 2
84 85 88.08846 98.73625 77.44067 2", header = TRUE)
NVIQ_predict$id <- NULL
Make sure the group column is a factor variable, so we can use it as a line type.
NVIQ_predict$group <- as.factor(NVIQ_predict$group)
Then build the plot.
library(ggplot2)
g1 <- ggplot(NVIQ_predict, aes(cogn.age, predict, color = group)) +
  geom_smooth(aes(x = cogn.age, y = upper, group = group), method = loess, se = FALSE) +
  geom_smooth(aes(x = cogn.age, y = lower, group = group), method = loess, se = FALSE) +
  geom_line(aes(linetype = group), size = 0.8)
Finally, extract the (x,ymin) and (x,ymax) coordinates of the curves for group 1 as well as group 2. These pairs have identical x-coordinates, so connecting those points mimics shading the areas between both curves. This was explained in Fill region between two loess-smoothed lines in R with ggplot. The only difference here is that we need to be a bit more careful to select and connect the points that belong to the correct curves...
gp <- ggplot_build(g1)
d1 <- gp$data[[1]]   # points on the smoothed 'upper' curves (first geom_smooth layer)
d2 <- gp$data[[2]]   # points on the smoothed 'lower' curves (second geom_smooth layer)

# Ribbon coordinates for group 1 and group 2, respectively.
df1 <- data.frame(x = d1[d1$group == 1, ]$x,
                  ymin = d2[d2$group == 1, ]$y,
                  ymax = d1[d1$group == 1, ]$y)
df2 <- data.frame(x = d1[d1$group == 2, ]$x,
                  ymin = d2[d2$group == 2, ]$y,
                  ymax = d1[d1$group == 2, ]$y)

g1 + geom_ribbon(data = df1, aes(x = x, ymin = ymin, ymax = ymax), inherit.aes = FALSE, fill = "grey", alpha = 0.4) +
  geom_ribbon(data = df2, aes(x = x, ymin = ymin, ymax = ymax), inherit.aes = FALSE, fill = "grey", alpha = 0.4)
The result looks like this:
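As an alternative (my sketch, not from the original answer), the ribbon coordinates can also be computed by fitting the loess smooths directly, which avoids reaching into ggplot_build; the fits are evaluated at the observed cogn.age values, so the shading may differ very slightly from geom_smooth's own evaluation grid:
library(dplyr)

ribbon_df <- NVIQ_predict %>%
  group_by(group) %>%
  group_modify(~ data.frame(
    x    = .x$cogn.age,
    ymin = predict(loess(lower ~ cogn.age, data = .x)),  # smoothed lower curve
    ymax = predict(loess(upper ~ cogn.age, data = .x))   # smoothed upper curve
  )) %>%
  ungroup()

g1 + geom_ribbon(data = ribbon_df,
                 aes(x = x, ymin = ymin, ymax = ymax, group = group),
                 inherit.aes = FALSE, fill = "grey", alpha = 0.4)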