Sparse Functional Data Plot - r

I'm wondering how to reproduce the following figure using R.
The data used in the figure are sparse functional data of bone mineral density. Basically each participant's bone mineral level is observed a few times during the experiment. But the observation times and number of observations for each participant are different.
The figure is from the article 'Principal component models for sparse functional data'.

You could reproduce the figure with made-up data like this:
library(ggplot2)
# Create sample data
set.seed(8) # Makes data reproducible
ages <- runif(40, 8, 24)
df <- do.call(rbind, lapply(seq_along(ages), function(x) {
age <- ages[x] + cumsum(runif(sample(2:5, 1), 1, 2))
y <- (tanh((age - 10)/pi - pi/2) + 2.5)/3
y <- y + rnorm(1, 0, 0.1)
y <- y + cumsum(rnorm(length(y), 0, 0.02))
data.frame(ID = x, age = age, BMD = y)
}))
# Draw plot
ggplot(df, aes(x = age, y = BMD)) +
geom_path(aes(group = ID), color = 'gray70', na.rm = TRUE) +
geom_point(color = 'gray70', na.rm = TRUE) +
geom_smooth(color = 'black', se = FALSE, formula = y ~ s(x, bs = "cs"),
method = 'gam', na.rm = TRUE) +
theme_classic(base_size = 16) +
scale_x_continuous(limits = c(8, 28)) +
labs(y = 'Spinal Bone Density', x = 'Age') +
theme(panel.border = element_rect(fill = NA))
Without knowing your own data structure however, it's difficult to say how applicable you will find this to your own use case.

You can do this in ggplot2 as long as you have data in long format and with a grouping variable such as id in my example:
dat <- tibble::tribble(
~id, ~age, ~bone_dens,
1, 10, 0.6,
1, 15, 0.8,
1, 19, 1.12,
2, 11, 0.7,
2, 18, 1.1,
3, 16, 1.1,
3, 18, 1.2,
3, 25, 1.0)
You first plot the dots with geom_point(), then you add the lines that join dots with the same id with geom_line():
dat |>
ggplot(aes(x = age, y = bone_dens)) +
geom_point() +
geom_line(aes(group = id))
Output will look like this - you'll be able to customise it like any other ggplot.
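If your measurements start out as one vector of ages and one vector of densities per participant, with different lengths for each person, they need to be stacked into the long format first. The following is only a sketch in base R: `ages_by_id`, `bmd_by_id` and `dat_long` are made-up names, and the values simply repeat the toy data above.

```r
# Hypothetical input: one vector of ages and one of densities per participant,
# with a different number of visits for each person
ages_by_id <- list(c(10, 15, 19), c(11, 18), c(16, 18, 25))
bmd_by_id  <- list(c(0.6, 0.8, 1.12), c(0.7, 1.1), c(1.1, 1.2, 1.0))

# Build one small data frame per participant, then stack them row-wise
dat_long <- do.call(rbind, Map(
  function(id, age, bmd) data.frame(id = id, age = age, bone_dens = bmd),
  seq_along(ages_by_id), ages_by_id, bmd_by_id
))

# Number of observations per participant
table(dat_long$id)
```

The resulting `dat_long` has the same columns as `dat`, so it can be passed to the same ggplot call.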

Related

R: How to Fill Points in ggplot2 with a variable

I am attempting to make a ggplot2 scatter plot that is grouped by bins in R. I successfully made the first plot, for which I did not try to alter the fill. But when I tried to base the fill on my variable (Miss.), which is a numeric value ranging from 0.00 to 0.46, it essentially ignores the heat map scale and turns everything gray.
ggplot(data = RightFB, mapping = aes(x = TMHrzBrk, y = TMIndVertBrk))+
geom_bin_2d(bins = 15)+
scale_fill_continuous(type = "viridis")+
ylim(5, 20)+
xlim(0,15)+
coord_fixed(1.3)
ggplot(data = RightFB, mapping = aes(x = TMHrzBrk, y = TMIndVertBrk,
fill = Miss.))+
geom_bin_2d(bins = 15)+
scale_fill_continuous(type = "viridis")+
ylim(5, 20)+
xlim(0,15)+
coord_fixed(1.3)
I appreciate any help! Thanks!
I think I understand your problem, so let's replicate it with a reproducible example. Obviously we don't have your data, but the following data frame has the same names, types and ranges as your own data, so this walk-through should work for you.
set.seed(1)
RightFB <- data.frame(TMHrzBrk = runif(1000, 0, 15),
TMIndVertBrk = runif(1000, 5, 20),
Miss. = runif(1000, 0, 0.46))
Your first plot will look something like this:
library(tidyverse)
ggplot(data = RightFB, mapping = aes(x = TMHrzBrk, y = TMIndVertBrk)) +
geom_bin_2d(bins = 15) +
scale_fill_continuous(type = "viridis") +
ylim(5, 20) +
xlim(0, 15) +
coord_fixed(1.3)
#> Warning: Removed 56 rows containing missing values (`geom_tile()`).
Here, the fill colors represent the counts of observations within each bin. But if you try to map the fill to Miss., you get all gray squares:
ggplot(data = RightFB, mapping = aes(x = TMHrzBrk, y = TMIndVertBrk,
fill = Miss.)) +
geom_bin_2d(bins = 15) +
scale_fill_continuous(type = "viridis") +
ylim(5, 20) +
xlim(0, 15) +
coord_fixed(1.3)
#> Warning: The following aesthetics were dropped during statistical transformation: fill
#> i This can happen when ggplot fails to infer the correct grouping structure in
#> the data.
#> i Did you forget to specify a `group` aesthetic or to convert a numerical
#> variable into a factor?
#> Removed 56 rows containing missing values (`geom_tile()`).
The reason this happens is that by default geom_bin_2d calculates the bins and the counts within each bin to get the fill variable. There are multiple observations within each bin, and they all have a different value of Miss. . Furthermore, geom_bin_2d doesn't know what you want to do with this variable. My guess is that you are looking for the average of Miss. within each bin, but this is difficult to achieve within the framework of geom_bin_2d.
The alternative is to calculate the bins yourself, get the average of Miss. in each bin, and plot as a geom_tile
RightFB %>%
mutate(TMHrzBrk = cut(TMHrzBrk, breaks = seq(0, 15, 1), seq(0.5, 14.5, 1)),
TMIndVertBrk = cut(TMIndVertBrk, seq(5, 20, 1), seq(5.5, 19.5, 1))) %>%
group_by(TMHrzBrk, TMIndVertBrk) %>%
summarize(Miss. = mean(Miss., na.rm = TRUE), .groups = "drop") %>%
mutate(across(TMHrzBrk:TMIndVertBrk, ~as.numeric(as.character(.x)))) %>%
ggplot(aes(x = TMHrzBrk, y = TMIndVertBrk, fill = Miss.)) +
geom_tile() +
scale_fill_continuous(type = "viridis") +
ylim(5, 20) +
xlim(0, 15) +
coord_fixed(1.3)
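A possibly shorter route, offered as a sketch rather than a replacement for the manual binning above, is ggplot2's stat_summary_2d(), which summarises a `z` aesthetic within each 2D bin. The data below are the same made-up values as in the walk-through, and the object name `p_mean` is arbitrary. The bins are chosen by the stat, so they will not match the hand-made 1-unit grid exactly, and the axis limits are left out here.

```r
library(ggplot2)

# Made-up data with the same column names and ranges as in the question
set.seed(1)
RightFB <- data.frame(TMHrzBrk     = runif(1000, 0, 15),
                      TMIndVertBrk = runif(1000, 5, 20),
                      Miss.        = runif(1000, 0, 0.46))

# z is summarised within each bin; fun = mean gives the per-bin average
p_mean <- ggplot(RightFB, aes(x = TMHrzBrk, y = TMIndVertBrk, z = Miss.)) +
  stat_summary_2d(fun = mean, bins = 15) +
  scale_fill_viridis_c(name = "Mean Miss.") +
  coord_fixed(1.3)

p_mean
```

Here the fill is the statistic computed from `z`, so there is no conflict between a per-row fill value and a per-bin tile.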
EDIT
With the link to the data in the comments, here is a full reprex:
library(tidyverse)
RightFB <- read.csv(paste0("https://raw.githubusercontent.com/rileyfeltner/",
"FB-Analysis/main/Right%20FB.csv"))
RightFB <- RightFB[c(2:6, 9, 11, 13, 18, 19)]
RightFB$Miss. <- as.numeric(as.character(RightFB$Miss.))
#> Warning: NAs introduced by coercion
RightFB$TMIndVertBrk <- as.numeric(as.character(RightFB$TMIndVertBrk))
#> Warning: NAs introduced by coercion
RightFB <- na.omit(RightFB)
RightFB1 <- filter(RightFB, P > 24)
RightFB %>%
mutate(TMHrzBrk = cut(TMHrzBrk, breaks = seq(0, 15, 1), seq(0.5, 14.5, 1)),
TMIndVertBrk = cut(TMIndVertBrk, seq(5, 20, 1), seq(5.5, 19.5, 1))) %>%
group_by(TMHrzBrk, TMIndVertBrk) %>%
summarize(Miss. = mean(Miss., na.rm = TRUE), .groups = "drop") %>%
mutate(across(TMHrzBrk:TMIndVertBrk, ~as.numeric(as.character(.x)))) %>%
ggplot(aes(x = TMHrzBrk, y = TMIndVertBrk, fill = Miss.)) +
geom_tile() +
scale_fill_continuous(type = "viridis") +
ylim(5, 20) +
xlim(0, 15) +
coord_fixed(1.3)
#> Warning: Removed 18 rows containing missing values (`geom_tile()`).
Created on 2022-11-23 with reprex v2.0.2

Loop violin plots of controls with clinical case data points overlaid (ggplot)

New to posting on here. Apologies if I miss including something needed to solve my situation.
I have a matched case-control design where three 'younger' clinical cases have been age-matched to a 'younger' control group, and three 'older' cases have been matched to an 'older' control group. I am plotting the control group distribution in a violin plot and overlaying the corresponding matched cases as data points.
I have a lot of variables and I would like to loop through them to minimise error and increase efficiency. I have had a go at writing the code for the loop but I am not sure what to do with the fact that I have two types of plots (violin and point) and two data frames (controls and cases) involved.
Here is the code I have for the plots:
#fake data
cases <- data.frame(
id = factor(1:6),
strange_stories_ToM_mean = sample(6:8, 6, replace = TRUE),
age = factor(c(rep("young", 3), rep("old", 3)))
)
controls <- data.frame(
id = 7:23,
strange_stories_ToM_mean = sample(c(6,6,7,7,7,7,7,7,7,8,8,8,9,9,9,9,9), 17),
age = c(rep("young", 9), rep("old", 8))
)
#plots
ggplot(data = controls, aes(strange_stories_ToM_mean, age)) +
geom_violin(
trim = FALSE,
alpha = 0.2,
draw_quantiles = c(0.25, 0.5, 0.75),
fill = "gray90"
) +
geom_point(
data = cases,
aes(colour = id, shape = id), # map color/shape to individual cases
size = 5,
show.legend = FALSE
) +
scale_shape_manual(values=c(16, 17, 15, 16, 17, 15)) +
scale_colour_manual(values=c("deeppink1","indianred3", "blueviolet", "springgreen3", "chartreuse2", "darkgreen")) +
scale_size_manual(values=c(5, 4, 5, 5, 4, 5)) +
theme_classic()
ggsave("strange_stories_ToM_mean.svg", width = 8, height = 8, units = "cm")
I looked at using 'for' and created a list to loop through (what I have is below) but I came unstuck at where the list should be incorporated when two data frames are being used and two plots...could lapply be best?
variables <- list() # Create empty list
for(i in seq_len(ncol(FTD_data))) { # Using for-loop to add all columns to list
variables[[i]] <- FTD_data[ , i]
}
names(variables) <- colnames(FTD_data) #rename list elements with variable names from df
for (i in variables)
{CODE TO PLOT INSERT HERE}
One approach to achieve your desired result would be to put your plotting code inside a function which takes one argument, the name of the column to plot. The only change I made to your plotting code is to replace the hardcoded strange_stories_ToM_mean by .data[[col]] to tell ggplot I want to plot the data column whose name is stored in col.
Also, instead of using a for loop I would recommend using lapply when working with ggplot2:
library(ggplot2)
plot_fun <- function(col) {
ggplot(data = controls, aes(.data[[col]], age)) +
geom_violin(
trim = FALSE,
alpha = 0.2,
draw_quantiles = c(0.25, 0.5, 0.75),
fill = "gray90"
) +
geom_point(
data = cases,
aes(colour = id, shape = id),
size = 5,
show.legend = FALSE
) +
scale_shape_manual(values=c(16, 17, 15, 16, 17, 15)) +
scale_colour_manual(values=c("deeppink1","indianred3", "blueviolet", "springgreen3", "chartreuse2", "darkgreen")) +
scale_size_manual(values=c(5, 4, 5, 5, 4, 5)) +
theme_classic()
}
cols_to_plot <- names(controls)[!names(controls) %in% c("id", "age")]
names(cols_to_plot) <- cols_to_plot
p <- lapply(cols_to_plot, plot_fun)
lapply(cols_to_plot, function(x) ggsave(paste0(x, ".svg"), plot = p[[x]], width = 8, height = 8, units = "cm"))
#> $strange_stories_ToM_mean
#> [1] "strange_stories_ToM_mean.svg"
#>
#> $strange_stories_ToM_median
#> [1] "strange_stories_ToM_median.svg"
p
#> $strange_stories_ToM_mean
#>
#> $strange_stories_ToM_median
DATA
set.seed(123)
cases <- data.frame(
id = factor(1:6),
strange_stories_ToM_mean = sample(6:8, 6, replace = TRUE),
strange_stories_ToM_median = sample(6:8, 6, replace = TRUE),
age = factor(c(rep("young", 3), rep("old", 3)))
)
controls <- data.frame(
id = 7:23,
strange_stories_ToM_mean = sample(c(6,6,7,7,7,7,7,7,7,8,8,8,9,9,9,9,9), 17),
strange_stories_ToM_median = sample(c(6,6,7,7,7,7,7,7,7,8,8,8,9,9,9,9,9), 17),
age = c(rep("young", 9), rep("old", 8))
)
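One thing to watch when looping: the plotting function reads the same column name from both data frames, so a column that exists only in the controls would fail in the geom_point() layer. A minimal guard, sketched in base R with hypothetical column names (`score_a`, `score_b`), is to loop only over the columns the two data frames share:

```r
# Hypothetical data: `score_b` exists in the controls but not in the cases
controls <- data.frame(id = 7:10,
                       score_a = c(6, 7, 8, 9),
                       score_b = c(7, 7, 8, 9),
                       age = c("young", "young", "old", "old"))
cases <- data.frame(id = factor(1:2),
                    score_a = c(6, 8),
                    age = c("young", "old"))

# Keep only measurement columns present in both data frames
shared <- intersect(names(controls), names(cases))
cols_to_plot <- setdiff(shared, c("id", "age"))
names(cols_to_plot) <- cols_to_plot

# Columns that would be skipped, so they can be reported rather than silently lost
skipped <- setdiff(names(controls), c(shared, "id", "age"))
```

`cols_to_plot` can then be passed to lapply() in the same way as above.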

Displaying geom_smooth() trend line from a specified x value

Suppose a dataset containing count data per multiple time periods and per multiple groups in the following format:
set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
week = rep(1:50, 3),
rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))
group week rate
1 1 1 604
2 1 2 598
3 1 3 578
4 1 4 591
5 1 5 589
6 1 6 571
7 1 7 581
8 1 8 597
9 1 9 589
10 1 10 584
I'm interested in fitting a model-based trend line per groups, however, I want this trend line to be displayed only from a certain x value. To visualize the trend line using all data points (requires ggplot2):
df %>%
ggplot(aes(x = week,
y = rate,
group = group,
lty = group)) +
geom_line() +
geom_point() +
geom_smooth(method = "glm",
method.args = list(family = "quasipoisson"),
se = FALSE)
Or to fit a model based on a specific range of values (requires ggplot2 and dplyr):
df %>%
group_by(group) %>%
mutate(rate2 = ifelse(week < 35, NA, rate)) %>%
ggplot(aes(x = week,
y = rate,
group = group,
lty = group)) +
geom_line() +
geom_point() +
geom_smooth(aes(y = rate2),
method = "glm",
method.args = list(family = "quasipoisson"),
se = FALSE)
However, I cannot find a way to fit the models using all data, but display the trend line only from a specific x value (let's say 35+). Thus, I essentially want the trend line as computed for plot one, but displaying it according the second plot, using ggplot2 and ideally only one pipeline.
I went to look at the after_stat function mentioned by @tjebo. See if the following works for you:
df %>%
ggplot(aes(x = week,
y = rate,
lty = group)) +
geom_line() +
geom_point() +
geom_smooth(method = "glm",
aes(group = after_stat(interaction(group, x > 35)),
colour = after_scale(alpha(colour, as.numeric(x > 35)))),
method.args = list(family = "quasipoisson"),
se = FALSE)
This works by splitting the points associated with each line into two groups, those in the x <= 35 region and those in the x > 35 region (since a single line's colour cannot vary along its length), and defining a separate colour transparency for each new group. As a result, only the lines in the x > 35 region are visible.
When used, the code triggers a warning that the after_scale modification isn't applied to the legend. I don't think that's a problem though, since we don't need it to appear in the legend anyway.
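To see what the after_scale() expression evaluates to, the alpha() call can be run on its own. This is a small illustration with made-up x values, using alpha() from the scales package (the same function ggplot2 re-exports); the names `x` and `cols` are arbitrary.

```r
library(scales)

# Three example x positions: two at or below the cut-off, one above it
x <- c(20, 35, 50)

# as.numeric(x > 35) is 0 or 1, so the colour becomes fully transparent
# or fully opaque depending on which side of the cut-off the point lies
cols <- alpha("black", as.numeric(x > 35))

# Inspect the alpha channel of each resulting colour (0 = invisible, 255 = opaque)
grDevices::col2rgb(cols, alpha = TRUE)["alpha", ]
```

The fitted line is still drawn for every x, only with zero opacity where x <= 35.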
If you can tolerate a warning, you can solve this with 1 line difference from the example code using stage().
library(tidyverse)
set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
week = rep(1:50, 3),
rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))
df %>%
ggplot(aes(x = week,
y = rate,
group = group,
lty = group)) +
geom_line() +
geom_point() +
geom_smooth(method = "glm",
method.args = list(family = "quasipoisson"),
aes(x = stage(week, after_stat = ifelse(x > 35, x, NA))),
se = FALSE)
#> `geom_smooth()` using formula 'y ~ x'
#> Warning: Removed 165 rows containing missing values (geom_smooth).
One way to do this is to construct the fitted values outside of ggplot so you have control over them:
df$fit <- glm(rate ~ week + group, data = df, family = "quasipoisson")$fitted.values
library(dplyr)
library(ggplot2)
ggplot(df, aes(x = week, group = group, lty = group)) +
geom_line(aes(y = rate)) +
geom_point(aes(y = rate)) +
geom_line(data = df %>% filter(week >= 35), aes(y = fit), color = "blue", size = 1.25)
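Note that the single model above (rate ~ week + group) is not the same as what geom_smooth() does, which is to fit a separate model within each group. If matching the per-group fits matters, one possible variant, sketched here in base R with the arbitrary names `df_fit` and `p_fit`, is to fit one quasi-Poisson GLM per group and then plot only the later weeks:

```r
library(ggplot2)

set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

# Fit one model per group on ALL weeks and keep the fitted values
df_fit <- do.call(rbind, lapply(split(df, df$group), function(d) {
  d$fit <- fitted(glm(rate ~ week, data = d, family = quasipoisson))
  d
}))

# Draw the raw data everywhere, but the fitted line only from week 35 onwards
p_fit <- ggplot(df_fit, aes(x = week, group = group, lty = group)) +
  geom_line(aes(y = rate)) +
  geom_point(aes(y = rate)) +
  geom_line(data = subset(df_fit, week >= 35), aes(y = fit), colour = "blue")

p_fit
```

The models see all 50 weeks per group; only the display of the fitted values is restricted.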
I am not sure it is generally correct to use a linear model on a time series. Time series require specific statistics because of their expected autocorrelation, so you might want something like rolling average models instead.
I also suspect that your visualisation could be quite confusing and, more dangerously, misleading.
That said, it is an interesting problem. I thought the new after_stat might somehow help, but I couldn't get it working.
So, here a quick hack. Change the order of your geom-s and draw a rectangle in-between. I am cheekily using a different theme, but if you really want to use theme_grey(), you can fake the axis lines as well.
library(tidyverse)
set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
week = rep(1:50, 3),
rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))
df %>%
ggplot(aes(x = week, y = rate, group = group, lty = group)) +
stat_smooth(se = FALSE) +
geom_rect(xmin = -Inf, xmax = 35, ymin = -Inf, ymax = Inf,
fill = "white") +
geom_line() +
geom_point() +
theme_classic()
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2021-02-09 by the reprex package (v1.0.0)
P.S. I've removed a few of the unnecessary bits in the code to reproduce this, like the model specs.
You could use ggplot_build to get the structure of the plot :
p <- ggplot(df, aes(x = week,
y = rate,
group = group,
lty = group)) +
geom_line() +
geom_point() +
geom_smooth(method = "glm",
method.args = list(family = "quasipoisson"),
se = FALSE)
p_build <- ggplot_build(p)
You could then modify the internal data, here the third element of the data list (geom_smooth):
p_build$data[[3]]$x <- sapply(p_build$data[[3]]$x,function(x) {ifelse(x<35,NA,x)})
and use ggplot_gtable to regenerate the plot (the glm fit is still computed on the whole dataset):
plot(ggplot_gtable(p_build))

Adding trend lines across groups and setting tick labels in a grouped violin plot or box plot

I have xy grouped data that I'm plotting using R's ggplot2 geom_violin adding regression trend lines:
Here are the data:
library(dplyr)
library(plotly)
library(ggplot2)
set.seed(1)
df <- data.frame(value = c(rnorm(500,8,1),rnorm(600,6,1.5),rnorm(400,4,0.5),rnorm(500,2,2),rnorm(400,4,1),rnorm(600,7,0.5),rnorm(500,3,1),rnorm(500,3,1),rnorm(500,3,1)),
age = c(rep("d3",500),rep("d8",600),rep("d24",400),rep("d3",500),rep("d8",400),rep("d24",600),rep("d3",500),rep("d8",500),rep("d24",500)),
group = c(rep("A",1500),rep("B",1500),rep("C",1500))) %>%
dplyr::mutate(time = as.integer(age)) %>%
dplyr::arrange(group,time) %>%
dplyr::mutate(group_age=paste0(group,"_",age))
df$group_age <- factor(df$group_age,levels=unique(df$group_age))
And my current plot:
ggplot(df,aes(x=group_age,y=value,fill=age,color=age,alpha=0.5)) +
geom_violin() + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) +
geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + theme_minimal()
My questions are:
How do I get rid of the alpha part of the legend?
I would like the x-axis ticks to be df$group rather than df$group_age, which means a tick per each group at the center of that group where the label is group. Consider a situation where not all groups have all ages - for example, if a certain group has only two of the ages and I'm pretty sure ggplot will only present only these two ages, I'd like the tick to still be centered between their two ages.
One more question:
It would also be nice to have the p-values of each fitted slope plotted on top of each group.
I tried:
library(ggpmisc)
my.formula <- value ~ group_age
ggplot(df,aes(x=group_age,y=value,fill=age,color=age,alpha=0.5)) +
geom_violin() + geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) +
geom_smooth(data=df,mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) + theme_minimal() +
stat_poly_eq(formula = my.formula,aes(label=stat(p.value.label)),parse=T)
But I get the same plot as above with the following warning message:
Warning message:
Computation failed in `stat_poly_eq()`:
argument "x" is missing, with no default
geom_smooth() fits a line, while stat_poly_eq() issues an error. A factor is a categorical variable with unordered levels. A trend against a factor is undefined. geom_smooth() may be taking the levels and converting them to "arbitrary" numerical values, but these values are just indexes rather than meaningful values.
To obtain a plot similar to what is described in the question but using code that provides correct linear regression lines and the corresponding p-values I would use the code below. The main change is that the numerical variable time is mapped to x making the fitting of a regression a valid operation. To allow for a linear fit an x-scale with a log10 transformation is used, with breaks and labels at the ages for which data is available.
library(dplyr)
library(ggplot2)
library(ggpmisc)
set.seed(1)
df <-
data.frame(
value = c(
rnorm(500, 8, 1), rnorm(600, 6, 1.5), rnorm(400, 4, 0.5),
rnorm(500, 2, 2), rnorm(400, 4, 1), rnorm(600, 7, 0.5),
rnorm(500, 3, 1), rnorm(500, 3, 1), rnorm(500, 3, 1)
),
age = c(
rep("d3", 500), rep("d8", 600), rep("d24", 400),
rep("d3", 500), rep("d8", 400), rep("d24", 600),
rep("d3", 500), rep("d8", 500), rep("d24", 500)
),
group = c(rep("A", 1500), rep("B", 1500), rep("C", 1500))
) %>%
mutate(time = as.integer(gsub("d", "", age))) %>%
arrange(group, time) %>%
mutate(age = factor(age, levels = c("d3", "d8", "d24")),
group = factor(group))
my_formula = y ~ x
ggplot(df, aes(x = time, y = value)) +
geom_violin(aes(fill = age, color = age), alpha = 0.3) +
geom_boxplot(width = 0.1,
aes(color = age), fill = NA) +
geom_smooth(color = "black", formula = my_formula, method = 'lm') +
stat_poly_eq(aes(label = stat(p.value.label)),
formula = my_formula, parse = TRUE,
npcx = "center", npcy = "bottom") +
scale_x_log10(name = "Age", breaks = c(3, 8, 24)) +
facet_wrap(~group) +
theme_minimal()
Which creates the following figure:
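As a quick check of why the log10 x-scale was chosen here, the spacing of the three ages can be compared on the raw and on the log scale. This is only a small illustration; `ages` is a throwaway name.

```r
# The three sampling ages, in days
ages <- c(3, 8, 24)

# Gaps between consecutive ages on the raw scale: very uneven
raw_gaps <- diff(ages)

# Gaps on the log10 scale: close to equal, so the violins are evenly spread
log_gaps <- diff(log10(ages))

raw_gaps
log_gaps
```

On the log scale the three time points are nearly equidistant, which keeps the violins from crowding at the left of the panel.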
Here is a solution. The alpha legend issue is easy. Anything you place inside the aes() function will get placed in a legend. This feature should be used when you want a feature of the data to be used as an aesthetic. Putting alpha outside of aes() will remove it from the legend.
I'm not sure the x-axis labelling is what you wanted, but I did it manually, so it should be easy to configure.
Regarding the p-values, I ran separate linear regressions and stored the p-values in three different objects, which can be added to the ggplot using annotate(). For two of the groups the p-value was <.001, so the round() function rounds it to 0. Therefore, I just wrote the label as 'pvalue: <.001'.
Good luck with this!
library(dplyr)
library(ggplot2)
set.seed(1)
df <- data.frame(value = c(rnorm(500,8,1),rnorm(600,6,1.5),rnorm(400,4,0.5),rnorm(500,2,2),rnorm(400,4,1),rnorm(600,7,0.5),rnorm(500,3,1),rnorm(500,3,1),rnorm(500,3,1)),
age = c(rep("d3",500),rep("d8",600),rep("d24",400),rep("d3",500),rep("d8",400),rep("d24",600),rep("d3",500),rep("d8",500),rep("d24",500)),
group = c(rep("A",1500),rep("B",1500),rep("C",1500))) %>%
dplyr::mutate(time = as.integer(gsub("d", "", age))) %>% # strip the "d" prefix; as.integer("d3") would give NA
dplyr::arrange(group,time) %>%
dplyr::mutate(group_age=paste0(group,"_",age))
df$group_age <- factor(df$group_age,levels=unique(df$group_age))
mod1 <- lm(value ~ time,df[df$group == 'A',])
mod1 <- summary(mod1)$coefficients[8] %>% round(2)
mod2 <- lm(value ~ time,df[df$group == 'B',])
mod2 <- summary(mod2)$coefficients[8] %>% round(2)
mod3 <- lm(value ~ time,df[df$group == 'C',])
mod3 <- summary(mod3)$coefficients[8] %>% round(2)
ggplot(df,aes(x=group_age,y=value,fill=age,color=age)) +
geom_violin(alpha=0.5) +
geom_boxplot(width=0.1,aes(fill=age,color=age,middle=mean(value))) +
geom_smooth(mapping=aes(x=group_age,y=value,group=group),color="black",method='lm',size=1,se=T) +
scale_x_discrete(labels = c('','A','','','B','','','C','')) +
annotate('text',x = 2,y = -1,label = paste('pvalue: <.001')) +
annotate('text',x = 6,y = 10,label = paste('pvalue: <.001')) +
annotate('text',x = 8,y = -1.2,label = paste('pvalue:',mod3))+
theme_minimal()
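The positional index coefficients[8] picks the p-value of the slope out of the 2 x 4 coefficient table. A slightly more readable variant, shown as a sketch on made-up data (`toy`, `mod`, `p_slope` and `label` are illustrative names), indexes the table by name and uses base R's format.pval() so that very small values print as "<0.001" instead of rounding to 0:

```r
# Made-up data with a clear negative trend over three time points
set.seed(1)
toy <- data.frame(time = rep(c(3, 8, 24), each = 50))
toy$value <- 8 - 0.2 * toy$time + rnorm(nrow(toy))

mod <- lm(value ~ time, data = toy)

# Same number as summary(mod)$coefficients[8], but selected by row and column name
p_slope <- coef(summary(mod))["time", "Pr(>|t|)"]

# eps sets the threshold below which the value is printed as "<eps"
label <- paste("pvalue:", format.pval(p_slope, digits = 2, eps = 0.001))
label
```

The resulting string can be passed to annotate('text', label = ...) in place of the hand-written labels.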

R - Overlay multiple least squares plots with colour coding

I'm trying to visualize some data that looks like this
line1 <- data.frame(x = c(4, 24), y = c(0, -0.42864), group = "group1")
line2 <- data.frame(x = c(4, 12 ,24), y = c(0, 2.04538, 3.4135), group = "group2")
line3 <- data.frame(x = c(4, 12, 24), y = c(0, 3.14633, 3.93718), group = "group3")
line4 <- data.frame(x = c(0, 3, 7, 12, 18), y = c(0, -0.50249, 0.11994, -0.68694, -0.98949), group = "group4")
line5 <- data.frame(x = c(0, 3, 7, 12, 18, 24), y = c(0, -0.55753, -0.66006, 0.43796, 1.38723, 3.17906), group = "group5")
df <- do.call(rbind, list(line1, line2, line3, line4, line5))
What I'm trying to do is plot the least squares line (and points) for each group on the same plot. And I'd like the colour of the lines and points to correspond to the group.
All I've been able to do is plot the points according to their group
ggplot(data = df, aes(x, y, colour = group)) + geom_point(aes(size = 10))
But I have no idea how to add in the lines as well and make their colours correspond to the points that they are fitting.
I'd really appreciate any help with this. It's turning out to be so much harder than I thought it would be.
You can simply add a geom_smooth layer to your plot
ggplot(data = df, aes(x, y, colour = group)) + geom_point(aes(size = 10)) +
geom_smooth(method="lm",se=FALSE)
method="lm" specifies that you want a linear model
se=FALSE to avoid plotting confidence intervals
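Because colour = group is set in the top-level aes(), geom_smooth() fits one least-squares line per group. If you want to check the lines numerically, the same per-group fits can be reproduced with lm(). This is a sketch in base R on the data from the question; `fits` is an arbitrary name.

```r
line1 <- data.frame(x = c(4, 24), y = c(0, -0.42864), group = "group1")
line2 <- data.frame(x = c(4, 12, 24), y = c(0, 2.04538, 3.4135), group = "group2")
line3 <- data.frame(x = c(4, 12, 24), y = c(0, 3.14633, 3.93718), group = "group3")
line4 <- data.frame(x = c(0, 3, 7, 12, 18),
                    y = c(0, -0.50249, 0.11994, -0.68694, -0.98949), group = "group4")
line5 <- data.frame(x = c(0, 3, 7, 12, 18, 24),
                    y = c(0, -0.55753, -0.66006, 0.43796, 1.38723, 3.17906), group = "group5")
df <- do.call(rbind, list(line1, line2, line3, line4, line5))

# One lm() per group; each column holds that group's intercept and slope
fits <- sapply(split(df, df$group), function(d) coef(lm(y ~ x, data = d)))
fits
```

Each column of `fits` should correspond to the line drawn for that group in the plot.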
