R - Overlay multiple least squares plots with colour coding

I'm trying to visualize some data that looks like this
line1 <- data.frame(x = c(4, 24), y = c(0, -0.42864), group = "group1")
line2 <- data.frame(x = c(4, 12 ,24), y = c(0, 2.04538, 3.4135), group = "group2")
line3 <- data.frame(x = c(4, 12, 24), y = c(0, 3.14633, 3.93718), group = "group3")
line4 <- data.frame(x = c(0, 3, 7, 12, 18), y = c(0, -0.50249, 0.11994, -0.68694, -0.98949), group = "group4")
line5 <- data.frame(x = c(0, 3, 7, 12, 18, 24), y = c(0, -0.55753, -0.66006, 0.43796, 1.38723, 3.17906), group = "group5")
df <- do.call(rbind, list(line1, line2, line3, line4, line5))
What I'm trying to do is plot the least squares line (and points) for each group on the same plot. And I'd like the colour of the lines and points to correspond to the group.
All I've been able to do is plot the points according to their group:
ggplot(data = df, aes(x, y, colour = group)) + geom_point(aes(size = 10))
But I have no idea how to add in the lines as well and make their colours correspond to the points that they are fitting.
I'd really appreciate any help with this. It's turning out to be much harder than I thought it would be.

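You can simply add a geom_smooth layer to your plot:
library(ggplot2)
ggplot(data = df, aes(x, y, colour = group)) +
geom_point(size = 4) + # fixed size set outside aes(), so no size legend is created
geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
method = "lm" fits a linear (least squares) model to each group, and se = FALSE suppresses the confidence-interval bands. Because colour = group is set in the top-level aes(), each fitted line is drawn in the same colour as the points it fits.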
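To check what geom_smooth(method = "lm") draws, you can fit the same per-group least squares lines yourself with lm(). A minimal base-R sketch reusing the data from the question; the names fits and coefs are only illustrative:

```r
# Data from the question
line1 <- data.frame(x = c(4, 24), y = c(0, -0.42864), group = "group1")
line2 <- data.frame(x = c(4, 12, 24), y = c(0, 2.04538, 3.4135), group = "group2")
line3 <- data.frame(x = c(4, 12, 24), y = c(0, 3.14633, 3.93718), group = "group3")
line4 <- data.frame(x = c(0, 3, 7, 12, 18),
                    y = c(0, -0.50249, 0.11994, -0.68694, -0.98949), group = "group4")
line5 <- data.frame(x = c(0, 3, 7, 12, 18, 24),
                    y = c(0, -0.55753, -0.66006, 0.43796, 1.38723, 3.17906), group = "group5")
df <- do.call(rbind, list(line1, line2, line3, line4, line5))

# One ordinary least squares fit (y ~ x) per group
fits <- lapply(split(df, df$group), function(d) lm(y ~ x, data = d))

# Intercept and slope for each group: one row per group
coefs <- t(sapply(fits, coef))
print(coefs)
```

Each row holds the intercept and slope of the straight line drawn for that group; group1 has only two points, so its "fit" is simply the line through them.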

Related

Plot rectangles using geom_rect with continuous x-axis and discrete values in y-axis (R)

I am trying to plot rectangles along the x-axis for different classes on the y-axis. I want to do this with geom_rect, but I don't want to set ymin and ymax myself, since I want the y positions to be determined by the classes (i.e. factors) in my data.
I managed to get the plot I want changing the breaks and the tick labels manually, but I am sure there must be a better way to do this.
Small toy example:
data <- data.frame(x_start = c(0, 2, 4, 6),
x_end = c(1, 3, 5, 7),
y_start = c(0, 0, 2, 2),
y_end = c(1, 1, 3, 3),
info = c("x", "x", "y", "y"))
Original plot:
ggplot(data ,aes(xmin=x_start, xmax=x_end, ymin=y_start, ymax=y_end, fill=info)) + geom_rect()
Plot that I want:
ggplot(data ,aes(xmin=x_start, xmax=x_end, ymin=y_start, ymax=y_end, fill=info)) + geom_rect() +
scale_y_continuous(breaks = c(0.5,2.5), labels = c("x","y"))
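Rather than hard-coding the breaks, you can compute the midpoint of each class's y range from the data and use those midpoints as breaks:
library(dplyr)
library(ggplot2)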
y_lab <- data %>%
distinct(y_end, y_start, info) %>%
mutate(y_mid = (y_end + y_start)/2)
ggplot(data, aes(xmin=x_start, xmax=x_end, ymin=y_start, ymax=y_end, fill=info)) +
geom_rect() +
scale_y_continuous(breaks = y_lab$y_mid, labels = y_lab$info)
Or, using geom_tile() with info on a discrete y axis (each tile's width is taken from the data):
ggplot(data, aes(x = (x_start + x_end)/2, y = info, fill = info, width = x_end - x_start)) +
geom_tile()
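The midpoint calculation behind the breaks in the dplyr answer can also be done in base R, which may help if you don't want the dplyr dependency. A sketch on the question's toy data:

```r
# Toy data from the question
data <- data.frame(x_start = c(0, 2, 4, 6),
                   x_end = c(1, 3, 5, 7),
                   y_start = c(0, 0, 2, 2),
                   y_end = c(1, 1, 3, 3),
                   info = c("x", "x", "y", "y"))

# One row per class: drop duplicated (y_start, y_end, info) combinations
y_lab <- unique(data[, c("y_start", "y_end", "info")])
# Midpoint of each class's vertical band, used as the axis break
y_lab$y_mid <- (y_lab$y_start + y_lab$y_end) / 2
print(y_lab)
```

These values plug into scale_y_continuous(breaks = y_lab$y_mid, labels = y_lab$info) just like the dplyr version. This assumes each class occupies a single y band; a class spanning several different bands would get one label per band.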

Sparse Functional Data Plot

I'm wondering how to reproduce the following figure using R.
The data used in the figure are sparse functional data of bone mineral density. Basically each participant's bone mineral level is observed a few times during the experiment. But the observation times and number of observations for each participant are different.
The figure is from the article 'Principal component models for sparse functional data'.
You can find it here: Principal component models for sparse functional data
You could reproduce the figure with made-up data like this:
library(ggplot2)
# Create sample data
set.seed(8) # Makes data reproducible
ages <- runif(40, 8, 24)
df <- do.call(rbind, lapply(seq_along(ages), function(x) {
age <- ages[x] + cumsum(runif(sample(2:5, 1), 1, 2))
y <- (tanh((age - 10)/pi - pi/2) + 2.5)/3
y <- y + rnorm(1, 0, 0.1)
y <- y + cumsum(rnorm(length(y), 0, 0.02))
data.frame(ID = x, age = age, BMD = y)
}))
# Draw plot
ggplot(df, aes(x = age, y = BMD)) +
geom_path(aes(group = ID), color = 'gray70', na.rm = TRUE) +
geom_point(color = 'gray70', na.rm = TRUE) +
geom_smooth(color = 'black', se = FALSE, formula =y ~ s(x, bs = "cs"),
method = 'gam', na.rm = TRUE) +
theme_classic(base_size = 16) +
scale_x_continuous(limits = c(8, 28)) +
labs(y = 'Spinal Bone Density', x = 'Age') +
theme(panel.border = element_rect(fill = NA))
Without knowing your own data structure, however, it's difficult to say how applicable this will be to your use case.
You can do this in ggplot2 as long as your data are in long format with a grouping variable, such as id in my example:
dat <- tibble::tribble(
~id, ~age, ~bone_dens,
1, 10, 0.6,
1, 15, 0.8,
1, 19, 1.12,
2, 11, 0.7,
2, 18, 1.1,
3, 16, 1.1,
3, 18, 1.2,
3, 25, 1.0)
First plot the points with geom_point(), then add lines joining the points that share the same id with geom_line() (the native pipe |> needs R 4.1 or later):
library(ggplot2)
dat |>
ggplot(aes(x = age, y = bone_dens)) +
geom_point() +
geom_line(aes(group = id))
The result is a regular ggplot, so you can customise it like any other.
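If your measurements are instead stored in wide format (one row per participant, one column per visit), you can convert them to the long layout first. A minimal base-R sketch using stats::reshape(); the wide column names (age_1, bmd_1 and so on) are hypothetical, and missing visits are stored as NA:

```r
# Hypothetical wide layout: one row per participant, up to three visits each
wide <- data.frame(
  id    = 1:3,
  age_1 = c(10, 11, 16), bmd_1 = c(0.6, 0.7, 1.1),
  age_2 = c(15, 18, 18), bmd_2 = c(0.8, 1.1, 1.2),
  age_3 = c(19, NA, 25), bmd_3 = c(1.12, NA, 1.0)
)

# Stack the visit columns into one row per (id, visit)
long <- reshape(
  wide,
  direction = "long",
  varying   = list(c("age_1", "age_2", "age_3"), c("bmd_1", "bmd_2", "bmd_3")),
  v.names   = c("age", "bone_dens"),
  timevar   = "visit",
  times     = 1:3,
  idvar     = "id"
)

# Drop visits that never happened, then order by participant and age
long <- long[!is.na(long$age), ]
long <- long[order(long$id, long$age), ]
rownames(long) <- NULL
print(long)
```

The resulting data frame has the same id, age and bone_dens columns as dat, so it can go straight into the geom_point() + geom_line(aes(group = id)) code above.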

How to combine ggplot and plotly graph?

I prepared a data frame as follows:
#create dataframe
df <-data.frame(x = c(rnorm(300, 3, 2.5), rnorm(150, 7, 2)), # create random data
y = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
z = c(rnorm(300, 6, 2.5), rnorm(150, 2, 2)),
group = c(rep('A', 300), rep('B', 150))) # add two groups
When plotted with ggplot2, the relationship between y and x looks like this:
library(ggplot2)
#for y-x correlation by group with fit curve
gg <- ggplot(df, aes(x=x, y=y)) +
stat_density_2d(geom = "polygon", aes(alpha = after_stat(level), fill = group)) + # after_stat() replaces the deprecated ..level..
geom_smooth(method = 'loess')
print(gg)
Then I created a 3D plot_ly figure as follows:
library(akima)  # interp()
library(plotly) # plot_ly(), add_surface(), %>%
#plot_ly 3D plot
s = interp(x = df$x, y = df$y, z = df$z,duplicate = "mean") # prepare for plot_ly plot
p <- plot_ly(x = s$x, y = s$y, z = s$z,colorscale = 'Jet')%>% # plot_ly
add_surface()
This created a 3D surface plot.
Here is my question:
I would like to add the first ggplot2 figure at the bottom of the second (plot_ly) figure.
Is there any way (function or package) to accomplish this with R?

Using ggplot in R to create a line graph for two different groups

I'm trying to create a line graph depicting different trajectories over time for two groups/conditions. I have two groups for which the variable 'eat' was collected at five time points (1, 2, 3, 4, 5).
I'd like the lines to connect the mean point for each group at each of five time points, so I'd have two points at Time 1, two points at Time 2, and so on.
Here's a reproducible example:
#Example data
library(tidyverse)
library(ggplot2)
eat <- sample(1:7, size = 30, replace = TRUE)
df <- data.frame(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
Condition = rep(c(0, 1), each = 15),
time = c(1, 2, 3, 4, 5),
eat = eat
)
df$time <- as.factor(df$time)
df$Condition <- as.factor(df$Condition)
#Create the plot.
library(ggplot2)
ggplot(df, aes(x = time, y = eat, fill = Condition)) + geom_line() +
geom_point(size = 4, shape = 21) +
stat_summary(fun.y = mean, colour = "red", geom = "line")
The problem is that I need my lines to go horizontally (i.e., two different colored lines moving across the x-axis), but this code just connects the dots vertically.
If I don't convert Time to a factor, but only convert Condition to a factor, I get a mess of lines. The same thing happens in my actual data, as well.
I'd like it to look like this aesthetically, with the transparent error envelopes wrapping each line. However, I don't want it to be curvy, I want the lines to be straight, connecting the means at each point.
Here are the lines running in straight segments through the mean at each time, with a ribbon spanning +/- 1 standard error of the mean. One stat_summary() draws the mean line using the colour aesthetic; the other draws the ribbon using the inherited fill aesthetic. ggplot2::mean_se() is a convenient function that takes a vector and returns a data frame with the mean (y) and the mean plus or minus some multiple of the standard error (ymin and ymax; one standard error by default). This is the right format for the fun.data argument of stat_summary(), which passes these values to the specified geom. Here, geom_ribbon accepts ymin and ymax values to draw a ribbon across the graph.
library(tidyverse)
set.seed(12345)
eat <- sample(1:7, size = 30, replace = TRUE)
df <- data.frame(
Condition = rep(c(0, 1), each = 15),
time = c(1, 2, 3, 4, 5),
eat = eat
)
df$Condition <- as.factor(df$Condition)
ggplot(df, aes(x = time, y = eat, fill = Condition)) +
geom_point(size = 4, shape = 21, colour = "black") +
stat_summary(geom = "ribbon", fun.data = mean_se, alpha = 0.2) +
stat_summary(
mapping = aes(colour = Condition),
geom = "line",
fun = mean, # fun.y is deprecated since ggplot2 3.3.0
show.legend = FALSE
)
Created on 2018-07-09 by the reprex package (v0.2.0).
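To see what the two stat_summary() layers compute, here is a base-R sketch of the same per-group summary: the mean of eat at each time within each Condition, plus or minus one standard error (sd / sqrt(n)), which is what mean_se() returns with its default mult = 1. The small data set is made up so the numbers are easy to check by hand:

```r
# Made-up data: two conditions, three time points, two observations per cell
df <- data.frame(
  Condition = factor(rep(c(0, 1), each = 6)),
  time = rep(1:3, times = 4),
  eat = c(2, 4, 6, 4, 6, 8, 1, 1, 1, 3, 5, 7)
)

# Mean +/- one standard error, mirroring ggplot2::mean_se(x, mult = 1)
mean_se_manual <- function(x) {
  m <- mean(x)
  se <- sd(x) / sqrt(length(x))
  c(y = m, ymin = m - se, ymax = m + se)
}

# One row per Condition x time; do.call(data.frame, ...) flattens the
# matrix column returned by aggregate() into eat.y, eat.ymin, eat.ymax
summ <- do.call(data.frame,
                aggregate(eat ~ Condition + time, data = df, FUN = mean_se_manual))
print(summ)
```

Drawing eat.y with geom_line() and eat.ymin/eat.ymax with geom_ribbon() would give the same kind of figure from precomputed summaries; stat_summary() just performs this aggregation inside the plot.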
Here's my best guess at what you want:
# keep time as numeric
df$time = as.numeric(as.character(df$time))
ggplot(df, aes(x = time, y = eat, group = Condition)) +
geom_smooth(
aes(fill = Condition, linetype = Condition),
method = "lm",
level = 0.65,
color = "black",
linewidth = 0.3 # linewidth replaces size for lines in ggplot2 >= 3.4.0
) +
geom_point(aes(color = Condition))
Setting level = 0.65 gives a band of roughly +/- 1 standard error around the fitted line (the band reflects the standard error of the fit, not the standard deviation of the data).
I think this code will get you most of the way there:
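The confidence band that geom_smooth() draws for an lm fit is the fitted line plus or minus a t quantile times the standard error of the fit, so you can check how wide level = 0.65 is in units of standard errors. A small sketch, assuming 15 points per Condition (as in the question's data), i.e. 13 residual degrees of freedom for each straight-line fit:

```r
# Two-sided confidence level used in the answer
level <- 0.65

# Normal-approximation multiplier on the standard error
z_mult <- qnorm(1 - (1 - level) / 2)

# predict.lm uses a t quantile with the residual df:
# 15 points per Condition minus 2 coefficients = 13 df (assumption from the data above)
t_mult <- qt(1 - (1 - level) / 2, df = 15 - 2)

# Level that gives exactly +/- 1 standard error under the normal approximation
level_1se <- 2 * pnorm(1) - 1

print(c(z = z_mult, t = t_mult, level_1se = level_1se))
```

Either way the multiplier comes out a little under 1, which is why level = 0.65 reads as roughly +/- 1 standard error.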
library(tidyverse)
eat <- sample(1:7, size = 30, replace = TRUE)
tibble(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
Condition = factor(rep(c(0, 1), each = 15)),
time = factor(rep(c(1, 2, 3, 4, 5), 6)),
eat = eat) %>%
ggplot(aes(x = time, y = eat, fill = Condition, group = Condition)) +
geom_point(size = 4, shape = 21) +
geom_smooth()
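geom_smooth() is what you were looking for, I think. Note that with no method specified it fits a loess smoother (for groups of fewer than 1,000 observations) rather than a linear model, so the curve passes near the points at each time instead of exactly through the group means. For straight segments through the means, stat_summary(fun = mean, geom = "line") is the more direct tool.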

Mix color and fill aesthetics in ggplot

I wonder whether it's possible to change the main fill colour according to a categorical variable.
Here is a reproducible example:
df = data.frame(x = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
y = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
grp = c(rep('a', times = 10),
rep('b', times = 10)),
val = rep(1:10, times = 2))
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(color = grp,
fill = val,
size = val))
Of course it is easy to change the circle colour/shape according to the variable grp, but I'd like group a to be in shades of red and group b in shades of blue.
I also thought about using facets, but don't know if the fill gradient can be changed for the two panels.
Does anyone know if that can be done without gridExtra?
Thanks!
I think there are two ways to do this. The first is using the alpha aesthetic for your val column. This is a quick and easy way to accomplish your goal but may not be exactly what you want:
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(alpha=val,
fill = grp,
size = val)) + theme_minimal()
The second way would be to do something similar to this post: Vary the color gradient on a scatter plot created with ggplot2. I edited the code slightly so it's not a range from white to your colour of interest but from a lighter colour to a darker colour. This takes a little more work and uses scale_fill_identity(), which takes a variable that already holds the colours you want and maps them directly to each point (no scaling is done).
Here is the code:
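library(ggplot2)
library(plyr)   # ddply(), .()
library(scales) # rescale()
#Rescale val to [0,1]
df$scaled_val <- rescale(df$val)
# Light and dark end colours, named by group (as.numeric() on a character grp would give NA)
low_cols <- c(a = "firebrick1", b = "deepskyblue")
high_cols <- c(a = "darkred", b = "deepskyblue4")
# Interpolate each point's colour along its own group's ramp.
# ddply() returns rows ordered by grp, which matches df because df is already sorted by grp.
df$col <- ddply(df, .(grp), function(x) {
g <- as.character(x$grp[1])
ramp <- colorRamp(c(low_cols[[g]], high_cols[[g]]))
data.frame(col = rgb(ramp(x$scaled_val) / 255))
})$col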
df
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(
fill = col,
size = val)) + theme_minimal() +scale_fill_identity()
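If you'd rather not depend on plyr (which is retired), the per-point colours can also be computed in base R. A sketch of the same idea; group_colour() is a hypothetical helper, and val is rescaled to [0, 1] by hand instead of with scales::rescale():

```r
# Light/dark end colours per group (same choices as the answer)
low_cols  <- c(a = "firebrick1", b = "deepskyblue")
high_cols <- c(a = "darkred",    b = "deepskyblue4")

# Map a group label and a value in [0, 1] to a hex colour on that group's ramp
group_colour <- function(grp, scaled_val) {
  ramp <- colorRamp(c(low_cols[[grp]], high_cols[[grp]]))
  # colorRamp() returns 0-255 channel values; divide so rgb() can round them
  rgb(ramp(scaled_val) / 255)
}

df <- data.frame(grp = rep(c("a", "b"), each = 10), val = rep(1:10, times = 2))
df$scaled_val <- (df$val - min(df$val)) / (max(df$val) - min(df$val))
# Vectorise over rows; unname() drops the names mapply() adds
df$col <- unname(mapply(group_colour, df$grp, df$scaled_val))
print(head(df))
```

The col column can then be used with aes(fill = col) and scale_fill_identity(), as in the plot above.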
Thanks to another post, I found a way to show the fill gradient in the legend, even though that wasn't quite what I meant to do.
Here's the output
And the code:
library(dplyr)  # %>%, group_by(), mutate(), ungroup()
library(scales) # rescale()
library(ggplot2)
df = data.frame(x = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
y = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
grp = factor(c(rep('a', times = 10),
rep('b', times = 10)),
levels = c('a', 'b')),
val = rep(1:10, times = 2)) %>%
group_by(grp) %>%
mutate(scaledVal = rescale(val)) %>%
ungroup %>%
mutate(scaledValOffSet = scaledVal + 100*(as.integer(grp) - 1))
scalerange <- range(df$scaledVal)
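# One (start, end) pair per group, matching the four colours below: c(0, 1, 100, 101)
gradientends <- scalerange + rep(100 * (seq_len(nlevels(df$grp)) - 1), each = 2)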
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(fill = scaledValOffSet,
size = val)) +
scale_fill_gradientn(colours = c('white',
'darkred',
'white',
'deepskyblue4'),
values = rescale(gradientends))
Basically, the fill values are rescaled within each group (e.g. to [0, 1]) and then separated by an offset of a different order of magnitude (here 100 per group) derived from the categorical variable grp, so that each group occupies its own segment of the gradient.
This still isn't what I wanted, though: the snippet could of course be made less manual, but it still lacks the usual simple discrete fill legend.
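The offset trick comes down to a little arithmetic: values are rescaled to [0, 1] within each group, group k (counting from 0) is shifted up by 100 * k, and the values argument of scale_fill_gradientn() marks where each group's segment starts and ends. A base-R sketch for the two groups here, with a hand-written rescale01() standing in for scales::rescale():

```r
# Rescale a numeric vector to [0, 1] (what scales::rescale() does by default)
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

grp <- factor(rep(c("a", "b"), each = 10), levels = c("a", "b"))
val <- rep(as.numeric(1:10), times = 2)

# Rescale within each group, then push group k up by 100 * k
scaled_val <- ave(val, grp, FUN = rescale01)
offset_val <- scaled_val + 100 * (as.integer(grp) - 1)

# Segment ends: [0, 1] for "a" and [100, 101] for "b", one pair per group
gradient_ends <- range(scaled_val) + rep(100 * (seq_len(nlevels(grp)) - 1), each = 2)

# Positions of the four colours on the [0, 1] scale used by scale_fill_gradientn()
print(rescale01(gradient_ends))
```

Because the offset data run from 0 to 101, the fill scale's limits coincide with gradient_ends, so white-to-darkred covers group a and white-to-deepskyblue4 covers group b. The number of offsets should match the number of groups, giving two values per group and one colour per value.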
