Plot lines in ggplot from a list of dataframes - r

I have a list of data.frames:
samplelist = list(a = data.frame(x = c(1:10), y = rnorm(10)),
b = data.frame(x = c(5:10), y = rnorm(6)),
c = data.frame(x = c(2:12), y = rnorm(11)))
I'd like to structure a ggplot of the following format:
ggplot()+
geom_line(data=samplelist[[1]], aes(x,y))+
geom_line(data=samplelist[[2]], aes(x,y))+
geom_line(data=samplelist[[3]], aes(x,y))
But that isn't super automated. Does anyone have a suggestion for how to address this?
Thanks!

ggplot works most efficiently with data in "long" format. In this case, that means stacking your three data frames into a single data frame with an extra column added to identify the source data frame. In that format, you need only one call to geom_line, while the new column identifying the source data frame can be used as a colour aesthetic, resulting in a different line for each source data frame. The dplyr function bind_rows allows you to stack the data frames on the fly, within the call to ggplot.
library(dplyr)
library(ggplot2)
samplelist = list(a = data.frame(x=c(1:10), y=rnorm(10)),
b = data.frame(x=c(5:10), y=rnorm(6)),
c = data.frame(x=c(2:12), y=rnorm(11)))
ggplot(bind_rows(samplelist, .id="df"), aes(x, y, colour=df)) +
geom_line()
I assumed above that you would want each line to be a different color and for there to be a legend showing the color mapping. However, if, for some reason, you just want three black lines and no legend, just change colour=df to group=df.

Or you could use lapply.
library(ggplot2)
samplelist = list(a = data.frame(x = c(1:10), y=rnorm(10)),
b= data.frame(x=c(5:10), y = rnorm(6)),
c = data.frame(x=c(2:12), y=rnorm(11)))
p <- ggplot()
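# named add_line so that it does not mask base::plot
add_line <- function(df){
p <<- p + geom_line(data=df, aes(x,y))
}
# invisible() keeps lapply from printing every intermediate plot
invisible(lapply(samplelist, add_line))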
p
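A variation on the lapply idea that avoids the global assignment: ggplot2 accepts a list of layers on the right-hand side of `+`, so the layers can be built first and added in one step. This is a minimal sketch; the name `layers` is only illustrative.

```r
library(ggplot2)

set.seed(1)
samplelist <- list(a = data.frame(x = 1:10, y = rnorm(10)),
                   b = data.frame(x = 5:10, y = rnorm(6)),
                   c = data.frame(x = 2:12, y = rnorm(11)))

# Build one geom_line layer per data frame
layers <- lapply(samplelist, function(d) geom_line(data = d, aes(x, y)))

# Adding a list to a plot adds each element as its own layer
p <- ggplot() + layers
p
```

Because nothing is hard-coded, the same lines work for a list of any length.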

This will work -
library(ggplot2)
samplelist <- list(a = data.frame(x = c(1:10), y=rnorm(10)),
b = data.frame(x=c(5:10), y = rnorm(6)),
c = data.frame(x=c(2:12), y=rnorm(11)))
p <- ggplot()
for (i in seq_along(samplelist)) p <- p + geom_line(data=samplelist[[i]], aes(x,y))
p

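Reduce is another option for adding layers iteratively. With accumulate = TRUE it returns every intermediate plot (starting with the empty one), and grid.arrange displays them all: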
library(ggplot2)
samplelist = list(a = data.frame(x = c(1:10), y=rnorm(10)),
b= data.frame(x=c(5:10), y = rnorm(6)),
c = data.frame(x=c(2:12), y=rnorm(11)))
pl <- Reduce(f = function(p, d) p + geom_line(data=d, aes(x,y)),
x = samplelist, init = ggplot(), accumulate = TRUE)
gridExtra::grid.arrange(grobs = pl)

Related

Convert a vector to a specific color for `plot()`

Below is a minimal working example.
library(ggplot2)
set.seed(926)
df <- data.frame(x. = rnorm(100),
y. = rnorm(100),
color. = rnorm(100))
library(ggplot2)
p <- ggplot(df, aes(x = x., y = y., color = color.)) +
geom_point() +
viridis::scale_color_viridis(option = "C")
p
p_build <- ggplot_build(p)
# The desired vector is below; I feel there must be an easier way to get it
p_build[["data"]][[1]][["colour"]]
df$color_converted <- p_build[["data"]][[1]][["colour"]]
Specifically, I'd like to use the viridis::viridis(option = "C") color scheme. Could anyone help with this? Thanks.
*Edit*
Sorry, my question wasn't clear enough. Let me put it this way: in my specific project I can't use the ggplot2 package and have to use the plain plot() function that comes with R.
My goal is to reproduce the above plot with base R.
plot(df$x., df$y., col = df$color_converted)
If possible, could anyone also show me how to build a gradient legend similar to ggplot2's with base legend()?
First, store the colors in a vector called "color2" and pass them to scale_colour_gradientn. The colors come back in data order rather than gradient order, so they have to be sorted first; the code below sorts them by lightness (the L channel in Lab space), and the TSP package it loads is not actually used for this. With the colors sorted, you can recreate the plot without using scale_color_viridis:
set.seed(926)
df <- data.frame(x. = rnorm(100),
y. = rnorm(100),
color. = rnorm(100))
library(ggplot2)
library(TSP)
p <- ggplot(df, aes(x = x., y = y., color = color.)) +
geom_point() +
viridis::scale_color_viridis(option = "C")
p
p_build <- ggplot_build(p)
# The desired vector is below; I feel there must be an easier way to get it
color2 <- p_build[["data"]][[1]][["colour"]]
# col2rgb returns 0-255 values, convertColor expects sRGB in [0, 1]
rgb_mat <- col2rgb(color2)
lab <- convertColor(t(rgb_mat) / 255, 'sRGB', 'Lab')
# sort the colours from dark to light
ordered_cols2 <- color2[order(lab[, 'L'])]
ggplot(df, aes(x = x., y = y.)) +
geom_point(aes(colour = color.)) +
scale_colour_gradientn(colours = ordered_cols2, guide = "colourbar")
#viridis::scale_color_viridis(option = "C")
Created on 2022-08-17 with reprex v2.0.2
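Base R
You can use the following code (it reuses df and p_build from above):
color2 <- p_build[["data"]][[1]][["colour"]]
# col2rgb returns 0-255 values, convertColor expects sRGB in [0, 1]
rgb_mat <- col2rgb(color2)
lab <- convertColor(t(rgb_mat) / 255, 'sRGB', 'Lab')
ordered_cols2 <- color2[order(lab[, 'L'])]
layout(matrix(1:2, ncol = 2), widths = c(2, 1))
# color2 holds one colour per point, in the same order as the rows of df
plot(df$x., df$y., col = color2)
# a raster is drawn top to bottom, so reverse to put the dark (low) end at the bottom
legend_image <- as.raster(matrix(rev(ordered_cols2), ncol = 1))
plot(c(0,2), c(0,1), type = 'n', axes = FALSE, xlab = '', ylab = '', main = 'legend title')
text(x = 1.5, y = seq(0, 1, l = 5), labels = seq(-3, 3, l = 5))
rasterImage(legend_image, 0, 0, 1, 1)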
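If the goal is to avoid ggplot2 entirely, the colour vector can also be computed directly by rescaling the values onto a palette. The sketch below is one way to do that under stated assumptions: it uses base R's hcl.colors(..., "Plasma"), which approximates viridis option "C" but is not identical to it, and the names `pal` and `idx` are illustrative. For the exact viridis colours, `pal` could be replaced with viridis::viridis(256, option = "C").

```r
set.seed(926)
df <- data.frame(x. = rnorm(100),
                 y. = rnorm(100),
                 color. = rnorm(100))

# 256-step palette; "Plasma" is an approximation of viridis option "C"
pal <- hcl.colors(256, palette = "Plasma")

# Rescale the values linearly onto the palette indices 1..256
rng <- range(df$color.)
idx <- 1 + round((df$color. - rng[1]) / diff(rng) * (length(pal) - 1))
df$color_converted <- pal[idx]

plot(df$x., df$y., col = df$color_converted, pch = 19)
```

The same `pal` vector, reversed and turned into a raster, can serve as the legend image, since it is already in gradient order and needs no sorting.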

geom_density plots with nested vectors

I have a data frame with a nested vector in one column. Any ideas how to ggplot a geom_density using the values from the nested vector?
If I use pivot_longer on the entire data frame, I get 25 million rows, so I'd prefer to avoid that if possible.
library(ggplot2)
df = data.frame(a = rep(letters[1:5],length.out = 100), b = sample(LETTERS, 100, replace = T))
df[["c"]] = purrr::map(1:100, function(x) rnorm(100))
# works but too heavy for the actual implementation
ggplot(tidyr::unnest(df, c), aes(c, group = a)) + geom_density() + facet_wrap(vars(b))
# doesn't work
ggplot(df, aes(c, group = a)) + geom_density() + facet_wrap(vars(b))
Different solution: Prepare each plot separately and rearrange your plots afterwards using gridExtra package.
library(ggplot2)
df = data.frame(a = rep(letters[1:5],length.out = 100), b = sample(LETTERS, 100, replace = T))
df[["c"]] = purrr::map(1:100, function(x) rnorm(100))
lst_plot <- lapply(sort(unique(df$b)), function(x){
# keep only the rows for this value of b
data <- df[df$b == x, ]
# expand the nested vectors for this subset only
data <- purrr::map_dfr(seq_len(nrow(data)), ~ data.frame(a = data$a[.x], c = data$c[[.x]]))
gg <- ggplot(data) +
geom_density(aes(c, group = a)) +
ylab(NULL)
return(gg)
})
gridExtra::grid.arrange(grobs = lst_plot, ncol = 6, left = "density")
To be honest, I'm not sure how well this works with your massive dataset...
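Another possibility, sketched here under the assumption that only the density curves themselves are needed: compute each curve with base density() per (a, b) group and plot the much smaller result with geom_line. The grid size n = 128 and the names `idx` and `dens` are illustrative choices, not part of the original answers.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(a = rep(letters[1:5], length.out = 100),
                 b = sample(LETTERS, 100, replace = TRUE))
df[["c"]] <- lapply(1:100, function(i) rnorm(100))

# Row indices for every (a, b) combination that actually occurs
idx <- split(seq_len(nrow(df)), list(df$a, df$b), drop = TRUE)

# One density curve per group: 128 rows each instead of all raw values
dens <- do.call(rbind, lapply(names(idx), function(nm) {
  d <- density(unlist(df$c[idx[[nm]]]), n = 128)
  keys <- strsplit(nm, ".", fixed = TRUE)[[1]]  # names look like "a.B"
  data.frame(a = keys[1], b = keys[2], x = d$x, y = d$y)
}))

ggplot(dens, aes(x, y, group = a)) +
  geom_line() +
  facet_wrap(vars(b)) +
  ylab("density")
```

The plotted data frame grows with the number of groups rather than with the number of raw observations, which is the point of the exercise for a dataset that would otherwise unnest to millions of rows.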

How to plot several lines in one graph with data from three files with ggplot2 in R?

I'm trying to plot several lines in one graph. My data is in three .csv files, each with two columns, 'tiempo' and 'costo':
Example of a .csv file:
tiempo;costo
0;0
1;0
2;0
3;0
4;0
5;0
...
I wrote a script that gives good results, but in three different graphs:
library(ggplot2)
library(reshape2)
library(ggpubr)
A <- read.csv(file = 'm-r1-g.csv', header = TRUE, sep = ";")
B <- read.csv(file = 'm-r1-w.csv', header = TRUE, sep = ";")
C <- read.csv(file = 'm-r1-h.csv', header = TRUE, sep = ";")
i <- ggplot(A, aes(x = tiempo, y = costo)) + geom_line(aes(colour = costo))
j <- ggplot(B, aes(x = tiempo, y = costo)) + geom_line(aes(colour = costo))
k <- ggplot(C, aes(x = tiempo, y = costo)) + geom_line(aes(colour = costo))
ggarrange(i, j, k + rremove("x.text"),
labels = c("A", "B", "C"),
ncol = 2, nrow = 2)
How can I combine these lines into one graph, with a different color for each line?
Thanks.
Here is a possible solution. I created simple data frames since I don't have your csv files:
library(dplyr)
library(ggplot2)
set.seed(123)
A <- data.frame(tiempo = seq(1,8), costo = rep(0,8))
B <- data.frame(tiempo = seq(1,8), costo = sample(seq(1,20),8))
C <- data.frame(tiempo = seq(1,8), costo = sample(seq(1,20),8))
Combine them into one data frame. Other methods would work too, but bind_rows is convenient because its .id argument adds a column identifying the original data frame:
df <- bind_rows(list(A = A, B = B, C = C), .id = "id")
Here is the simple ggplot:
ggplot(df, aes(x = tiempo, y = costo, color = id)) + geom_line()
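Starting from files rather than data frames, the same idea can be sketched as follows. The three files written to a temporary directory are hypothetical stand-ins for the real csv files (same names, made-up values), and the stacking is done with base R so that only ggplot2 is required.

```r
library(ggplot2)

# Hypothetical stand-ins for the real files, written to a temp directory
csv_dir <- tempfile("csv")
dir.create(csv_dir)
set.seed(123)
for (nm in c("g", "w", "h")) {
  write.table(data.frame(tiempo = 0:7, costo = sample(0:20, 8)),
              file.path(csv_dir, paste0("m-r1-", nm, ".csv")),
              sep = ";", row.names = FALSE, quote = FALSE)
}

# Read every csv and tag its rows with the file name
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
all_runs <- do.call(rbind, lapply(files, function(f) {
  cbind(id = basename(f), read.csv(f, header = TRUE, sep = ";"))
}))

# One line per file, coloured by file name
ggplot(all_runs, aes(x = tiempo, y = costo, colour = id)) + geom_line()
```

Adding a fourth file to the directory then adds a fourth line without touching the plotting code.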

How to draw all plots of a data frame in R?

I have a data frame representing a benchmark and I would like to produce all possible comparison plots. Here is a small example of data frame that represents my problem.
df = data.frame("A"=c(1,2,3,1,2,3,1,2,3,1,2,3), "B"=c(1,1,1,2,2,2,1,1,1,2,2,2), "C"=c(1,1,1,1,1,1,2,2,2,2,2,2), "D"=c(4,5,6,7,8,9,10,11,12,13,14,15))
I want to produce the following plots.
D as a function of A, when B and C are fixed. This would produce four (4) different lines, one for each pair (B,C).
D as a function of B, when A and C are fixed. This would produce six (6) different lines.
D as a function of C, when A and B are fixed. Again, six (6) different lines.
Is there a simple way to do this in R?
For now, I don't mind that they are in different plots or not. Any representation would be ok at this point. I only need all plots to be produced, since I don't know how we want to display our results.
Edit
I forgot to specify in my example that the columns of the data frame do not have the same factor levels. Here is a more complete example.
df = data.frame("A"=c(1,2,3,1,2,3,1,2,3,1,2,3),
"B"=c("[0,1]","[0,1]","[0,1]","[1,3]","[1,3]","[1,3]","[0,1]","[0,1]","[0,1]","[1,3]","[1,3]","[1,3]"),
"C"=c(1,1,1,1,1,1,2,2,2,2,2,2),
"D"=c(4,5,6,7,8,9,10,11,12,13,14,15))
Using @mattek's solution, I get the following plots.
This is great. If I could remove the extra values from the x-axis and keep only the corresponding factors for each column, that would be perfect.
library(ggplot2)
library(reshape2)
First, we melt your table:
df.plot = melt(df,
measure.vars = c('A', 'B', 'C'),
id.vars = 'D',
variable.name = 'var.name',
value.name = 'val.abc')
Then, we add groupings column:
df.plot$grouping = rep(1:4, 3, each = 3)
And we are ready to plot:
ggplot(df.plot, aes(x = val.abc, y = D, group = as.factor(grouping))) +
facet_wrap(~ var.name) +
geom_line(aes(colour = var.name)) +
geom_point(aes(colour = var.name))
Using facet_wrap(~ var.name, scales = "free_x") instead would get rid of the non-existent factor levels in every facet.
Possible answer for exploratory analysis that will show correlation between variables and also a smoothing line:
df = data.frame("A"=c(1,2,3,1,2,3,1,2,3,1,2,3), "B"=c(1,1,1,2,2,2,1,1,1,2,2,2), "C"=c(1,1,1,1,1,1,2,2,2,2,2,2), "D"=c(4,5,6,7,8,9,10,11,12,13,14,15))
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- cor(x, y)
txt <- format(c(r, 0.123456789), digits = digits)[1]
txt <- paste0(prefix, txt)
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex.cor * abs(r))  # abs() keeps the text size positive for negative correlations
}
pairs(df, lower.panel = panel.smooth, upper.panel = panel.cor)
Another option is ggpairs from the GGally package, which builds on ggplot2:
library(ggplot2)
library(GGally)
This helps a lot if some of your data is a factor. Using your data, let's assume that A is a factor variable:
df = data.frame("A"=as.factor(c(1,2,3,1,2,3,1,2,3,1,2,3)), "B"=c(1,1,1,2,2,2,1,1,1,2,2,2), "C"=c(1,1,1,1,1,1,2,2,2,2,2,2), "D"=c(4,5,6,7,8,9,10,11,12,13,14,15))
ggpairs then draws boxplots instead of points for the factor column, and you can customize which plot types it uses:
ggpairs(df)
Here's what I would do: create three new variables, each capturing the combinations of the two variables that are held fixed:
library(dplyr)
library(ggplot2)
dat <- data.frame("A"=c(1,2,3,1,2,3,1,2,3,1,2,3),
"B"=c(1,1,1,2,2,2,1,1,1,2,2,2),
"C"=c(1,1,1,1,1,1,2,2,2,2,2,2),
"D"=c(4,5,6,7,8,9,10,11,12,13,14,15))
# add variables for A-B, A-C, B-C
dat <- dat %>%
mutate('A - B' = paste(A, '-', B),
'A - C' = paste(A, '-', C),
'B - C' = paste(B, '-', C))
Then we make the plots:
ggplot(dat, aes(y = D))+
geom_line(aes(x = C, colour = `A - B`))
ggplot(dat, aes(y = D))+
geom_line(aes(x = B, colour = `A - C`))
ggplot(dat, aes(y = D))+
geom_line(aes(x = A, colour = `B - C`))
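The three plots can also be generated in a loop, so that the approach scales to more columns. This is a sketch rather than part of the original answer: `vars`, `fixed` and `plots` are illustrative names, and interaction() stands in for the paste() calls above.

```r
library(ggplot2)

dat <- data.frame(A = c(1,2,3,1,2,3,1,2,3,1,2,3),
                  B = c(1,1,1,2,2,2,1,1,1,2,2,2),
                  C = c(1,1,1,1,1,1,2,2,2,2,2,2),
                  D = c(4,5,6,7,8,9,10,11,12,13,14,15))

vars <- c("A", "B", "C")

# For each variable, plot D against it with one line per
# combination of the remaining (fixed) variables
plots <- lapply(vars, function(v) {
  others <- setdiff(vars, v)
  dat$fixed <- interaction(dat[others], sep = " - ", drop = TRUE)
  ggplot(dat, aes(x = .data[[v]], y = D, colour = fixed)) +
    geom_line() +
    geom_point() +
    labs(colour = paste(others, collapse = " - "))
})
names(plots) <- vars

plots$A  # D as a function of A, one line per (B, C) pair
```

Each element of `plots` can be printed on its own or handed to something like gridExtra::grid.arrange(grobs = plots) to show them together.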

Create grid and color cells with average values of scatterplot using ggplot2

Given a numeric dataset {(x_i, y_i, z_i)} with N points, one can create a scatterplot by drawing a point P_i=(x_i,y_i) for each i=1,...,N and color each point with an intensity depending on the value of z_i.
library(ggplot2)
N = 1000;
dfA = data.frame(runif(N), runif(N), runif(N))
dfB = data.frame(runif(N), runif(N), runif(N))
names(dfA) = c("x", "y", "z")
names(dfB) = c("x", "y", "z")
PlotA <- ggplot(data = dfA, aes(x = x, y = y)) + geom_point(aes(colour = z));
PlotB <- ggplot(data = dfB, aes(x = x, y = y)) + geom_point(aes(colour = z));
Assume I have created these scatterplots. What I would like to do for each dataset is to divide the plane with a grid (rectangular, hexagonal, triangular, ... doesn't matter) and color each cell of the grid with the average intensity of all the points that fall within the cell.
Additionally, suppose I have created two such plots PlotA and PlotB (as above) for two different datasets dfA and dfB. Let c_i^k be the i-th cell of plot k. I want to create a third plot such that c_i^3 = c_i^1 * c_i^2 for every i.
Thank you.
EDIT: Minimum example
Dividing the plane and calculating summaries for rectangles is pretty straightforward with the stat_summary_2d function (stat_summary2d is its older, deprecated name). First, I'm going to create explicit breaks rather than letting ggplot choose them, so that they are exactly the same for both plots:
bb<-seq(0,1,length.out=10+1)
breaks<-list(x=bb, y=bb)
p1 <- ggplot(data = dfA, aes(x = x, y = y, z=z)) +
stat_summary_2d(fun=mean, breaks=breaks) + ggtitle("A")
p2 <- ggplot(data = dfB, aes(x = x, y = y, z=z)) +
stat_summary_2d(fun=mean, breaks=breaks) + ggtitle("B")
Getting the difference is a bit messier, but we can extract the data from the plots we've already created and combine them:
#get data (select columns by name rather than by position)
d1 <- ggplot_build(p1)$data[[1]][, c("xbin", "ybin", "value")]
d2 <- ggplot_build(p2)$data[[1]][, c("xbin", "ybin", "value")]
mm <- merge(d1, d2, by=c("xbin","ybin"))
#turn factor back into numeric values
mids <- diff(bb)/2+bb[-length(bb)]
#plot difference
ggplot(mm, aes(x=mids[xbin], y=mids[ybin], fill=value.x-value.y)) +
geom_tile() + scale_fill_gradient2(name="diff") + labs(x="x",y="y")
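The question asks for the cell-wise product rather than the difference. A sketch of that variant follows, with the binning done in base R (cut and aggregate) instead of being extracted from a built plot; `cell_means`, `mm` and the 10 x 10 grid are illustrative assumptions.

```r
library(ggplot2)

set.seed(42)
N <- 1000
dfA <- data.frame(x = runif(N), y = runif(N), z = runif(N))
dfB <- data.frame(x = runif(N), y = runif(N), z = runif(N))

bb <- seq(0, 1, length.out = 10 + 1)    # shared breaks: 10 x 10 grid
mids <- bb[-length(bb)] + diff(bb) / 2  # cell centres

# Mean of z per grid cell; cells are identified by integer bin indices
cell_means <- function(d) {
  d$xbin <- cut(d$x, bb, labels = FALSE, include.lowest = TRUE)
  d$ybin <- cut(d$y, bb, labels = FALSE, include.lowest = TRUE)
  aggregate(z ~ xbin + ybin, data = d, FUN = mean)
}

# Keep the cells present in both data sets and multiply their means
mm <- merge(cell_means(dfA), cell_means(dfB),
            by = c("xbin", "ybin"), suffixes = c(".A", ".B"))
mm$prod <- mm$z.A * mm$z.B

ggplot(mm, aes(x = mids[xbin], y = mids[ybin], fill = prod)) +
  geom_tile() +
  labs(x = "x", y = "y", fill = "A * B")
```

Because both data sets are binned with the same breaks, cell i of one plot lines up with cell i of the other, which is what makes the cell-wise product meaningful.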
