I have a dataframe of 12 variables and I'd like to plot exactly one variable against each of the others using ggplot's geom_point(). I don't want to do it manually, so I need to loop through the variables, making a plot for each.
For example, I have a df like this (simplified to 4 variables for readability):
> head(df)
letters value1 value2 value3
A 1 0 10
B 3 1 9
C 6 0 8
D 76 0 7
E 13 1 6
F 58 1 5
And I'd like to produce two plots where value1 is plotted over value2 and value3.
I've tried this:
plts <- vector()
for (i in names(df)) {
p <- ggplot(df, aes(x=value1, y=i, fill=letters)) + geom_point()
plts <- append(plts, p)
}
but it treats value2 and value3 differently from value1 and produces something like this (e.g., value1 over value3):
Plot of value1 over value3
What should be done to improve this and achieve the goal of having the plots like this:
ggplot(df, aes(x=value1, y=value3, fill=letters)) + geom_point()
Produced without a loop
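I think using aes_string() instead of aes() will give you what you want. Your problem is caused by the tidyverse's use of non-standard evaluation (NSE): inside aes(), y=i maps y to the character string stored in i (for example "value3"), not to the column with that name.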
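library(ggplot2)
library(magrittr)  # provides %>%
lapply(
  names(df),
  function(y) {
    # aes_string() takes column names as strings, so y can vary per call
    df %>% ggplot() + geom_point(aes_string(x="value1", y=y, colour="letters"))
  }
)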
giving, for example
You can customise the first argument to lapply to select the variables you need.
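As a hedged aside: aes_string() has been soft-deprecated since ggplot2 3.0.0, so on current versions the same loop can be written with the .data pronoun instead. This is only a minimal sketch. It assumes the simplified df from the question (rebuilt by hand here) and that only value2 and value3 should go on the y axis; y_vars and plts are names chosen for the example.

```r
library(ggplot2)

# Simplified data from the question, rebuilt by hand for this sketch
df <- data.frame(
  letters = c("A", "B", "C", "D", "E", "F"),
  value1  = c(1, 3, 6, 76, 13, 58),
  value2  = c(0, 1, 0, 0, 1, 1),
  value3  = c(10, 9, 8, 7, 6, 5)
)

# Columns to put on the y axis (assumed selection: everything except x and the label)
y_vars <- setdiff(names(df), c("letters", "value1"))

# .data[[y]] looks the column up by its string name inside aes()
plts <- lapply(y_vars, function(y) {
  ggplot(df, aes(x = value1, y = .data[[y]], colour = letters)) +
    geom_point() +
    labs(y = y)  # label the axis with the column name rather than ".data[[y]]"
})
names(plts) <- y_vars
```

plts[["value3"]] then prints the value1-over-value3 plot, and the list can be handed to a layout function if all plots are wanted on one page.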
That said, I think it would be easier and more robust to reformat your data frame to a more helpful layout and then create your plots...
For example,
df %>%
pivot_longer(
cols=c("value2", "value3"),
names_to="Variable",
values_to="y"
) %>%
ggplot() +
geom_point(aes(x=value1, y=y, colour=letters)) +
facet_grid(rows=vars(Variable))
Giving
By the way, using colour=letters is probably more informative than fill=letters when using geom_point.
I am struggling with creating multiple ggplots using a loop.
I use data in the following format:
a <- c(1,2,3,4)
b <- c(5,6,7,8)
c <- c(9,10,11,12)
d <- c(13,14,15,16)
time <- c(1,2,3,4)
data <- cbind(a,b,c,d,time)
What I want to create is a list of plots that plot one of the letters against the variable time.
Which I tried in the following way:
library(ggplot2)
library(gridExtra)
plots <- list()
for (i in 1:4){
plots[[i]] <- ggplot() + geom_line(data = data, aes(x = time, y = data[,i]))
}
grid.arrange(plots[[1]], plots[[2]], plots[[3]], plots[[4]])
This results in four times the fourth plot. How do I index this correctly in a way that creates the four intended plots?
(Up front: the reason that your plots are all identical is ggplot's "lazy" evaluation of code. See the paragraph on lapply below, where I point out that data[,i] is evaluated only when you try to plot the data, at which point i is 4, the value from the last pass of the for loop.)
It's generally preferred/recommended to use data.frames instead of matrices or vectors (as you're doing here). It gives a bit more power and control.
data <- data.frame(a,b,c,d,time)
Also, I tend to prefer lapply to for-loops and lists, for various (some subjective) reasons. Ultimately, the issue you're having is that ggplot2 is evaluating the data lazily, so plots is a list with four plots that make reference to i ... and that is realized when you try to plot them all, at which point i is 4 (from the last pass through the loop). One benefit of using lapply is that the i referenced is a local-only (inside of the anon-func) version of i that is preserved as you would expect.
plots <- lapply(names(data)[1:4],
function(nm) ggplot(data, aes(x = time, y = .data[[nm]])) + geom_line())
gridExtra::grid.arrange(plots[[1]], plots[[2]])
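The lazy-evaluation point can be seen without ggplot2 at all. The sketch below is only an analogy: plain closures stand in for the unevaluated aes() expressions, and loop_funs and lapply_funs are names made up for the illustration. It assumes R 3.2.0 or later, where lapply forces its arguments.

```r
# For loop: every closure looks up the same `i`, which is 3 once the loop has finished
loop_funs <- list()
for (i in 1:3) {
  loop_funs[[i]] <- function() i
}
loop_vals <- vapply(loop_funs, function(f) f(), integer(1))

# lapply: each call of the anonymous function gets its own local `i`
lapply_funs <- lapply(1:3, function(i) function() i)
lapply_vals <- vapply(lapply_funs, function(f) f(), integer(1))

loop_vals    # 3 3 3
lapply_vals  # 1 2 3
```

The for-loop version mirrors the "four times the fourth plot" symptom: nothing is looked up until the plot (here, the closure) is actually evaluated.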
I also prefer patchwork to gridExtra, mostly because it makes more-customized layouts a bit more intuitive, plus adds functionality such as axis-alignment, shared legends, shared titles, etc. (None of those other features are demonstrated here.)
library(patchwork)
plots[[1]] / plots[[2]] # stacked top/bottom, same layout as grid.arrange above
plots[[1]] + plots[[2]] # side-by-side instead of top/bottom
(plots[[1]] + plots[[2]]) / (plots[[3]] + plots[[4]]) # grid
Ultimately, though, I suggest that facets can be useful and very powerful. For this, we need to melt/pivot the data into a "long format" so that the column names a-d are actually in one column.
reshape2::melt(data, id.vars = "time") |>
ggplot(aes(time, value)) +
geom_line() +
facet_grid(variable ~ ., scales = "free_y")
I assumed the preference for independent (free) y-scales, ergo the scales="free_y". Try it without if you want to see the options. (There are also scales="free_x" and scales="free" (both).)
To see what I mean by "long" format:
reshape2::melt(data, id.vars = "time")
# time variable value
# 1 1 a 1
# 2 2 a 2
# 3 3 a 3
# 4 4 a 4
# 5 1 b 5
# 6 2 b 6
# 7 3 b 7
# 8 4 b 8
# 9 1 c 9
# 10 2 c 10
# 11 3 c 11
# 12 4 c 12
# 13 1 d 13
# 14 2 d 14
# 15 3 d 15
# 16 4 d 16
This can also be done with tidyr::pivot_longer(data, -time), although the variable column is then called name. For this use, neither reshape2::melt nor tidyr::pivot_longer has an advantage over the other; the latter supports significantly more complex pivoting, which is not relevant with this data.
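For comparison, a sketch of the same faceted plot built with tidyr::pivot_longer. It assumes tidyr is installed and relies on pivot_longer's default output column names, name and value; long and p are names chosen for the example.

```r
library(ggplot2)
library(tidyr)

data <- data.frame(a = 1:4, b = 5:8, c = 9:12, d = 13:16, time = 1:4)

# Everything except `time` is stacked; by default the new columns are `name` and `value`
long <- pivot_longer(data, -time)

# Same layout as the melt() version, with `name` taking the place of `variable`
p <- ggplot(long, aes(time, value)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")
```

Printing p should give one panel per original column, stacked vertically.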
Data
data <- structure(list(a = c(1, 2, 3, 4), b = c(5, 6, 7, 8), c = c(9, 10, 11, 12), d = c(13, 14, 15, 16), time = c(1, 2, 3, 4)), class = "data.frame", row.names = c(NA, -4L))
I have 5 tibbles for successive years 2016 to 2020. I am doing the same thing to each of the sets of tibbles so I want to use a for-loop rather than copying and pasting the same code 5 times. I have named the tibbles in the following way with the final number indicating the year of the data:
alpha_20
beta_20
gamma_20
delta_20
epsilon_20
My thought was to do this:
for (i in 16:20) {
alpha_a_[i]<-alpha_[i]%>%
mutate(NEWVAR=1+OLDVAR)%>%
select(NEWVAR, VAR2, VAR3)
beta_a_[i]<-beta_[i]%>%
group_by(PIN)%>%
summarize(sum(VAR1))
# and so on for all 5 tibbles
}
But I think I am not calling the tibble correctly because the code breaks at the first mutate. I can't seem to figure out how to instruct it to take the tibbles ending in "16" and then the tibbles ending in "17" and so on.
There are a couple of things going on here. First, in order to actually call the name of your tibble, you're going to want to use the get() function on the string name. Try typing "alpha_20" vs. get("alpha_20") in the command line. However, the way you have it coded now as alpha_[i] won't generate the string you want. To generate the name of your tibble as a string, you're going to need to do something like get(paste0("alpha_", i)).
That's all just to get the tibble you want. To edit/save it within the for loop, look into the assign() command (see Change variable name in for loop using R). So all in all, your code will look something like this:
> require(tidyverse)
> alpha_20 <- data.frame(x = 1:5, y = 6:10)
> alpha_20
x y
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
>
> for (i in 20) {
+ assign(paste0('alpha_', i),
+ get(paste0('alpha_', i)) %>%
+ mutate(z = 11:15))
+ }
> alpha_20
x y z
1 1 6 11
2 2 7 12
3 3 8 13
4 4 9 14
5 5 10 15
You can try a combination of get, assign and paste.
for (i in 16:20) {
alpha <- get(paste("alpha_", i, sep = "")) %>%
mutate(NEWVAR = 1 + OLDVAR) %>%
select(NEWVAR, VAR2, VAR3)
assign(paste("alpha_a_", i, sep = ""), alpha)
beta <- get(paste("beta_", i, sep = "")) %>%
group_by(PIN) %>%
summarize(sum(VAR1))
assign(paste("beta_a_", i, sep = ""), beta)
# and so on for all 5 tibbles
}
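An alternative worth considering, sketched here under assumptions, is to collect the tibbles into a named list once and then work on the list, so that get() is used only at the start and assign() not at all. The stand-in data frames and their values are made up for the example, and base R replaces the mutate/select step to keep the sketch dependency-free; alpha_list and alpha_a are hypothetical names.

```r
# Stand-in data for the sketch (plain data frames with made-up values)
for (i in 16:20) {
  assign(paste0("alpha_", i),
         data.frame(OLDVAR = 1:3, VAR2 = i, VAR3 = letters[1:3]))
}

years <- 16:20

# Collect the yearly objects into one list, named after the original objects
alpha_list <- lapply(years, function(i) get(paste0("alpha_", i)))
names(alpha_list) <- paste0("alpha_", years)

# Apply the same transformation to every year (base-R version of mutate + select)
alpha_a <- lapply(alpha_list, function(d) {
  d$NEWVAR <- 1 + d$OLDVAR
  d[, c("NEWVAR", "VAR2", "VAR3")]
})
```

alpha_a[["alpha_20"]] then holds the transformed 2020 data, and the same pattern can be repeated for beta, gamma and the rest.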
I have made a series of lists that contain ggplots. I would like to evaluate the objects up front in order to pay the plotting time early. I have gathered the names of the variables that I would like to evaluate in a character vector. Additionally, I want to keep the same variable names as before.
The solution I tried was to lapply the eval(as.symbol("myvarstring")). To my knowledge, it evaluates the variable without storing the evaluated expression.
Adding as.symbol("myvarstring") <- eval(as.symbol("myvarstring")) does not work for me.
Below is a minimal reproducible example of my failed solution.
library(tidyverse)
tbl <- tibble(
x = 1:10,
y = 1:10
)
g <- ggplot(tbl, aes(x, y)) + geom_point()
my_plot_list1 <- list(g,g,g,g,g,g)
my_plot_list2 <- list(g,g,g,g,g,g)
my_plot_list3 <- list(g,g,g,g,g,g)
my_vars <- c(
"my_plot_list1",
"my_plot_list2",
"my_plot_list3"
)
lapply(my_vars, FUN = function(x) {as.symbol(x) <- eval(as.symbol(x))})
How would you accomplish this task?
Thank you
EDIT:
These graphs will ultimately be displayed through an rmarkdown script. The graphs will be loaded in the R script. My graphs take an enormous amount of time to plot. If I could save an environment with "rendered" graphs, it would shorten the rmarkdown runtime, which is the ultimate goal.
Why don't you just store the lists in a list, rather than relying on tricks to get them from the global environment?
library(tidyverse)
tbl <- tibble(
x = 1:10,
y = 1:10
)
g <- ggplot(tbl, aes(x, y)) + geom_point()
my_plot_list1 <- list(g,g,g,g,g,g)
my_plot_list2 <- list(g,g,g,g,g,g)
my_plot_list3 <- list(g,g,g,g,g,g)
my_vars <- list(
my_plot_list1,
my_plot_list2,
my_plot_list3
)
lapply(my_vars, function(x) lapply(x, function(y) y))
If you want to ensure that the plots print (eg, if you were to call this code in a function or script) then replace the inner function(y) y with function(y) print(y)
EDIT: I believe I misunderstood.
If you want to assign variables to a programmatically generated name, you would do:
x <- "mygeneratedname"
assign(x, g, envir = .GlobalEnv)
The get function in base R will retrieve the object from the character string. For example:
get("tbl")
# # A tibble: 10 x 2
# x y
# <int> <int>
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
# 6 6 6
# 7 7 7
# 8 8 8
# 9 9 9
# 10 10 10
So in your example:
lapply(my_vars, FUN = function(x) { get(x)})
should work.
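If keeping the variable names matters, base R's mget() may be a convenient variant: it looks up several names at once and returns a list named after them. A small sketch, with character placeholders standing in for the ggplot objects so that it runs without ggplot2; by_name is a name chosen for the example.

```r
# Placeholders standing in for the lists of ggplot objects
my_plot_list1 <- list("plot 1a", "plot 1b")
my_plot_list2 <- list("plot 2a", "plot 2b")
my_plot_list3 <- list("plot 3a", "plot 3b")

my_vars <- c("my_plot_list1", "my_plot_list2", "my_plot_list3")

# mget() fetches every named object and keeps the names on the result
by_name <- mget(my_vars)
names(by_name)
```

by_name$my_plot_list2 is then the second list, retrieved by its original variable name.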
I believe there are better approaches depending on the next steps of what you want to do with the plots. Consider whether this is the best way to handle the data. Can a list of lists work? Could you store the lists in a vector?
Okay guys I need a hand with using ggplot2 in a loop over a list (with lapply) to obtain a separate chart for each element of the list.
I'm new to R, so forgive the noob-ness.
Say I have a dataframe as such:
df <- cbind.data.frame(Time = c(1,2,3,4,1,2,3,4),
Person = c("A","A","A","A","B","B","B","B"),
Quantity = c(1,4,6,8,1,6,2,10))
df <- data.table(df)
> df
Time Person Quantity
1: 1 A 1
2: 2 A 4
3: 3 A 6
4: 4 A 8
5: 1 B 1
6: 2 B 6
7: 3 B 2
8: 4 B 10
I want to produce a chart for person A and person B separately.
At the moment I have my function set up like this:
Persons = c("A","B")
PersonList = as.list(Persons)
MyFunction <- function(x){
SubsetPersons = Persons[!(Persons %in% x)]
df <- df[!(df$Person %in% SubsetPersons)]
g <- ggplot(data=df, aes(x=Time, y=Quantity))
g <- g + geom_line()
print(g)
}
Results <- lapply(Persons, MyFunction)
But I'm not sure how to save the charts with different names corresponding to the list elements?
NOTE: I know this function may seem an odd way to solve this problem, but for the larger more complex problem I have at hand it is required.
I am simply trying to figure out how to save different names for the charts in the list!
Thanks in advance!
Persons = c("A","B")
MyFunction <- function(x){
dfs <- df[df$Person == x,]
g <- ggplot(data=dfs, aes(x=Time, y=Quantity))
g <- g + geom_line()
#have added extra bracket after ".PNG"
ggsave(paste0("plot_for_person", x, ".PNG"), g)
print(g)
return(g)
}
Results <- lapply(Persons, MyFunction)
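If the goal is also for the elements of the returned list to carry the person names, one option is to name the list after the vector that was looped over. A hedged sketch, using a plain data.frame instead of data.table and a hypothetical make_plot() helper that only builds the plot (no printing or saving):

```r
library(ggplot2)

df <- data.frame(
  Time     = c(1, 2, 3, 4, 1, 2, 3, 4),
  Person   = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Quantity = c(1, 4, 6, 8, 1, 6, 2, 10)
)
Persons <- c("A", "B")

# Hypothetical helper: one line chart for a single person
make_plot <- function(x) {
  dfs <- df[df$Person == x, ]
  ggplot(dfs, aes(x = Time, y = Quantity)) +
    geom_line() +
    ggtitle(paste("Person", x))
}

# setNames() attaches the person labels to the list elements
Results <- setNames(lapply(Persons, make_plot), Persons)
```

Results[["A"]] and Results[["B"]] then refer to the charts by person, and names(Results) can be reused to build file names for ggsave().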
This may be silly, but I am not getting how to do it,
What I want?
My function goes like this.
plot_exp <-
function(i){
dat <- subset(dat.frame,En == i )
ggplot(dat,aes(x=hours, y=variable, fill = Mn)) +
geom_point(aes(x=hours, y=variable, fill = Mn),size = 3,color = Mi) + geom_smooth(stat= "smooth" , alpha = I(0.01))
}
ll <- lapply(seq_len(EXP), plot_exp)
do.call(grid.arrange, ll)
and I have two variables
Var1, Var2 (which will be passed through the command line, so I can't group them using subset)
I want to run the above function for var1 and var2, my function produces two plots for each complete execution. So now it should produce 2 plots for var1 and two plots for var2.
I just want to know how can I apply the logic here to handle what I want? Thank you
This is what data.frame looks like
En Mn Hours var1 var2
1 1 1 0.1023488 0.6534707
1 1 2 0.1254325 0.5423215
1 1 3 0.1523245 0.2542354
1 2 1 0.1225425 0.2154533
1 2 2 0.1452354 0.4521255
1 2 3 0.1853324 0.2545545
2 1 1 0.1452369 0.2321542
2 1 2 0.1241241 0.2525212
2 1 3 0.0542232 0.2626214
2 2 1 0.8542154 0.2154522
2 2 2 0.0215420 0.5245125
2 2 3 0.2541254 0.2542512
I will take the above data.frame as input and I want to run my function once for var1 and produce two plots, then run the same function again for var2 and produce two more plots, then combine all of them using grid.arrange.
The variable values I have to read from the command line and then I have to do the following to get the required data out of main data frame.
subset((aggregate(cbind(variable1,variable2)~En+Mn+Hours,a, FUN=mean)))
After I read them from the command line and store them in "variable1" and "variable2", calling them directly in the above command does not work. What should I do to get those two variable values into the command?
I made a few changes and ran it on your sample data. Basically I just needed to use aes_string rather than aes to allow for a variable holding a column name.
myvars<-c("var1", "var2")
plot_exp <- function(i, plotvar) {
dat <- subset(dat.frame,En == i )
ggplot(dat,aes_string(x="Hours", y=plotvar, fill = "Mn")) +
geom_point(aes(color=Mn), size = 3) +
geom_smooth(stat= "smooth" , alpha = I(0.01), method="loess")
}
ll <- do.call(Map, c(plot_exp, expand.grid(i=1:2, plotvar=myvars, stringsAsFactors=F)))
do.call(grid.arrange, ll)
(I'm not sure why the colors of the legends are messed up in the image, they look fine on screen)
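The do.call(Map, c(plot_exp, expand.grid(...))) idiom can be hard to read, so here is a small base-R sketch of what it does. A hypothetical describe() function stands in for plot_exp and returns a label instead of a plot, which makes the order of the calls visible.

```r
# Stand-in for plot_exp: same arguments, but returns a label instead of a plot
describe <- function(i, plotvar) paste0("En=", i, ", y=", plotvar)

# One row per (i, plotvar) combination; the first column varies fastest
grid <- expand.grid(i = 1:2, plotvar = c("var1", "var2"), stringsAsFactors = FALSE)

# c(describe, grid) is a list: the function, then the columns `i` and `plotvar`.
# do.call() turns that into Map(describe, i = grid$i, plotvar = grid$plotvar).
labels <- do.call(Map, c(describe, grid))

unlist(labels, use.names = FALSE)
# "En=1, y=var1" "En=2, y=var1" "En=1, y=var2" "En=2, y=var2"
```

With plot_exp in place of describe, the result is a list of four plots in that same order, which is what grid.arrange receives.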
For subsetting, use
myvars <- c("var1", "var2")
aggregate(a[,myvars], a[,c("En","Mn","Hours")], FUN=mean)
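As a hedged illustration of averaging columns whose names arrive as strings, the sketch below uses base R's aggregate() with its x/by interface on a tiny made-up data frame. The values are invented for the example, and means is a name chosen here.

```r
# Made-up data: two measurements per (En, Mn, Hours) group
a <- data.frame(
  En    = c(1, 1, 1, 1),
  Mn    = c(1, 1, 2, 2),
  Hours = c(1, 1, 1, 1),
  var1  = c(0.1, 0.3, 0.2, 0.4),
  var2  = c(1, 3, 2, 4)
)

# Column names as they might be read from the command line
myvars <- c("var1", "var2")

# x = the columns to average, by = the grouping columns (a data frame is a list)
means <- aggregate(a[, myvars], by = a[, c("En", "Mn", "Hours")], FUN = mean)
means
```

Because the columns are selected with a character vector, nothing in the call depends on the variable names being known when the script is written.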