Store multiple plots into a list using a function - r

I am trying to store multiple plots produced by ggplot2 into a list.
I am attempting to use the list function suggested in a previous thread, however I am having difficulty creating my own function to meet my needs.
First, I split a dataframe based on a factor into a list with the following code:
heatlist.germ <- split(heatlist.germ, f=as.factor(heatlist.germ$plot))
After that, I attempt to write a function that I can later use with lapply.
plot_data_fcn <- function (heatlist.germ) {
ggplot(heatlist.germ[[i]], aes(x=posX, y=posY, fill=germ_bin)) +
geom_tile(aes(fill=germ_bin)) +
geom_text(aes(label=germ_bin)) +
scale_fill_gradient(low = "gray90", high="darkolivegreen4") +
ggtitle(plot) +
scale_x_continuous("Position X", breaks=seq(1,30)) +
scale_y_continuous("Position Y (REVERSED)", breaks=seq(1,20))
}
heatlist.test <- lapply(heatlist.germ[[i]], plot_data_fcn)
Two main things I am trying to accomplish:
Store the 12 ggplots (hence 12 factors of plot) in a list.
Create a title called "Plot [i] Germination".
Any help would be appreciated.

I don't have your data, so I'll simplify the plotting mechanism.
The first problem is that you should not use [[i]] indexing inside your function. Just have your function deal with the data as-is; it does not know that its argument is (in another environment) an element within a list. It knows only the object itself.
library(ggplot2)
# a simple plot function
myfunc <- function(x) ggplot(x, aes_string(names(x)[1], names(x)[2])) + geom_point()
# a list of frames, nothing fancy here
datalist <- replicate(3, mtcars, simplify = FALSE)
# just call it ...
myplots <- lapply(datalist, myfunc)
class(myplots[[1]])
# [1] "gg" "ggplot"
When myfunc is called, its argument x is just a data.frame, the function has no idea that x is the first (or second or third) frame in a list of frames.
If you want to include the nth frame with an index indicating which element it is, this is in my view "zipping" data together, so I suggest Map. (You can also use purrr::imap or related tidyverse functions.)
myfunc2 <- function(x, title = "") ggplot(x, aes_string(names(x)[1], names(x)[2])) + geom_point() + labs(title = title)
myplots <- Map(myfunc2, datalist, sprintf("Plot number %s", seq_along(datalist)))
class(myplots[[1]])
# [1] "gg" "ggplot"
To understand how Map relates to lapply, note that lapply(datalist, myfunc) is "unrolled" to something like:
myfunc(datalist[[1]])
myfunc(datalist[[2]])
myfunc(datalist[[3]])
Map, however, takes one function that must accept one or more arguments in each call, along with as many lists (or vectors) as the function accepts arguments. The following two calls are effectively equivalent:
lapply(datalist, myfunc) # data first, function second
Map(myfunc, datalist) # function first, data second
and a more complicated call unrolls like this:
titles <- sprintf("Plot number %d", seq_along(datalist)) # "Plot number 1", ...
Map(myfunc2, datalist, titles)
# equivalent to
myfunc2(datalist[[1]], titles[[1]])
myfunc2(datalist[[2]], titles[[2]])
myfunc2(datalist[[3]], titles[[3]])
It doesn't really matter if each of the arguments is a true list (as in datalist) or a vector (as in titles), as long as they are the same length (or length 1).
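Applied to the situation in the question, the same pattern might look like the sketch below. The data frame germ is a made-up stand-in for heatlist.germ (the real data was not posted), and plot_germ is a hypothetical name; only the column names posX, posY, germ_bin and plot come from the question.

```r
library(ggplot2)

# Hypothetical stand-in for heatlist.germ: 3 plots on a 4 x 3 grid
set.seed(1)
germ <- expand.grid(posX = 1:4, posY = 1:3, plot = 1:3)
germ$germ_bin <- rbinom(nrow(germ), size = 1, prob = 0.5)

# One data frame per level of `plot`
germ_split <- split(germ, germ$plot)

# The function sees a single data frame and its title, nothing else
plot_germ <- function(d, title) {
  ggplot(d, aes(x = posX, y = posY, fill = germ_bin)) +
    geom_tile() +
    geom_text(aes(label = germ_bin)) +
    scale_fill_gradient(low = "gray90", high = "darkolivegreen4") +
    ggtitle(title)
}

# Build one title per list element, then zip data and titles together
titles <- sprintf("Plot %s Germination", names(germ_split))
germ_plots <- Map(plot_germ, germ_split, titles)
```

Because germ_split is a named list, germ_plots keeps those names, so a single plot can be pulled out with germ_plots[["2"]].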

Related

Apply a function to different dataframes

I am trying to run a function over different datasets, and can't seem to get it work. The variable names x and y are the same across datasets, but the dataset (argument z in my custom function) is different.
I have tried lapply but it is not working
Running the function over individual datasets works fine:
resultsmadrid <- customfunction (x=types, y=score, z=madrid)
resultsnavarra <- customfunction (x=types, y=score, z=navarra)
resultsaragon <- customfunction (x=types, y=score, z=aragon)
Trying to do it in one take is not working
regiones <- list(madrid, navarra, aragon) #Creates the list
resultregiones <- lapply(regiones, customfunction(x=types, y=score, z=regiones)) #Applies that to the list (?)
It's not looping the analysis across the dataframes in the list, the error message says there is a missing argument in the function.
I am not clear on how to call each dataframe from the function argument that does that (z, in my case). It seems the name of the comprehensive list object is not the right approach. Thanks for the help!
Since types and score are the same for every dataset, you need to 'loop' through the elements of your regiones list. Try it like this:
regiones <- list(madrid, navarra, aragon) #Creates the list
resultregiones <- lapply(regiones,function(X) customfunction(x=types, y=score, z=X))
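As a rough illustration of why the anonymous function works, here is a self-contained sketch. The customfunction below is a hypothetical stand-in (the original was not posted) that takes column names as strings, and the two small data frames are invented. It also shows that lapply passes extra named arguments through to the function, so the list element fills the remaining argument z.

```r
# Hypothetical stand-in: mean of column y per level of column x in data frame z
customfunction <- function(x, y, z) tapply(z[[y]], z[[x]], mean)

madrid  <- data.frame(types = c("a", "a", "b"), score = c(1, 3, 5))
navarra <- data.frame(types = c("a", "b", "b"), score = c(2, 4, 6))

# A named list, so the results are named as well
regiones <- list(madrid = madrid, navarra = navarra)

# Form 1: anonymous function, each element is passed explicitly as z
res1 <- lapply(regiones, function(d) customfunction(x = "types", y = "score", z = d))

# Form 2: extra named arguments are forwarded; the element fills z
res2 <- lapply(regiones, customfunction, x = "types", y = "score")

identical(res1, res2)
```

The original attempt failed because customfunction(x=types, y=score, z=regiones) is evaluated once, up front, instead of handing lapply a function to call per element.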

Can I create value name from function argument, and assign a function output to it?

I have created a function which cleans up my data and plots using ggplot. I want to name the cleaned data and plot with a suffix so that it can be recalled easily.
For example:
data_frame
data_frame_cleaned
data_frame_plot
I haven't managed to find anything that might pull this off.
I read about using deparse(substitute(x)) to turn the variable into a string, so I gave it a shot together with paste().
import a new data frame
my_data <- read.csv("my_data.csv")
analyze_data(my_data)
The function uses dplyr and ggplot.
Then, I want to store analyse_data and data_plot in the environment, here is what I thought might work, but no...
analyze_data <- function(x){
x_data <- x %>%
filter()%>%
group_by() %>%
summarize() %>%
mutate()
x_plot <- ggplot(x_data)
x_name <- deparse(substitute(x))
assign(paste(x_name,"cleaned",sep="_"),x_data)
assign(paste(x_name,"plot",sep="_"),x_plot)
}
I got a warning message instead.
Warning messages:
1: In assign(paste(x_name, "cost_plot", sep = "_"), campg_data) :
only the first element is used as variable name
Using assign to assign variables is not the best idea. You can litter your environment with lots of variables, which can become confusing, and makes it difficult to handle them programmatically. It's better to store your objects in something like a list, which allows you to extract data easily or modify it in sequence using the *apply or map_* functions. That said…
I cannot replicate the warning when I run your function more or less as it is above. Nevertheless, although the function seems to run just fine, it doesn't do what is desired, i.e. no new variables appear in .GlobalEnv. The issue is that you haven't specified the environment in which the variables should be assigned, so they are assigned within the function's own local environment and vanish when the function completes.
You can use pos = 1 to assign your variables within the .GlobalEnv. The following code creates the variables mtcars_cleaned and mtcars_plot in my .GlobalEnv:
library(dplyr)
library(ggplot2)
analyze_data <- function(x){
x_data <- x %>%
filter(cyl > 4)
x_plot <- ggplot(x_data, aes(mpg, disp)) + geom_point()
x_name <- deparse(substitute(x))
assign(paste(x_name,"cleaned", sep="_"), x_data, pos = 1)
assign(paste(x_name,"plot", sep="_"), x_plot, pos = 1)
}
analyze_data(mtcars)
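For comparison, the list-based approach recommended at the start of this answer could look something like the sketch below. The name analyze_data2 and the shape of the returned list are assumptions made for illustration; base subsetting replaces the dplyr pipeline only to keep the example short.

```r
library(ggplot2)

# Return both results instead of assigning them into an environment
analyze_data2 <- function(x) {
  x_data <- x[x$cyl > 4, ]
  x_plot <- ggplot(x_data, aes(mpg, disp)) + geom_point()
  list(cleaned = x_data, plot = x_plot)
}

# A named list of inputs gives a named list of results
results <- lapply(list(mtcars = mtcars), analyze_data2)

nrow(results$mtcars$cleaned)
class(results$mtcars$plot)
```

Everything stays in one object, so results$mtcars$cleaned and results$mtcars$plot play the role of mtcars_cleaned and mtcars_plot without touching .GlobalEnv.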

Creating a function that names objects created based on a function in R

I have a question about creating a function that creates ggplots. I want to create my own function to graph values in multiple data frames quickly instead of writing a whole ggplot with each argument filled out each time. What I want to do is to input a vector of the names of the data frames, have the function create the graphs and have each saved as a new object with a different name. Example of my idea is…
myfunction <- function(x) {
ggplot(x, aes(x = time, y = result)) +
geom_point()
}
I want to be able to do something like
myfunction(c(testtype1, testtype2, testtype3))
and have the function create objects plot1, plot2, plot3. As of now, I can only do
plot1 <- myfunction(testtype1)
plot2 <- myfunction(testtype2)
plot3 <- myfunction (testtype3)
I don’t want to keep typing that over and over, especially if I have a lot of test types. Is there a way that the function can be modified to use the function to name the objects according to some formula?
With this, you can provide any number of (appropriate) data frames, and l_my_fun returns a list containing the plots.
l_my_fun <- function(x, ...) {
l <- list(x, ...)
ps <- lapply(l, myfunction)
ps
}
out <- l_my_fun(testtype1, testtype2, testtype3)
For example, now access the second plot as
out[[2]]
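If the plot1, plot2, ... names matter, one option is to name the list elements rather than create separate objects. The sketch below assumes invented data frames with time and result columns; the testtype names come from the question.

```r
library(ggplot2)

myfunction <- function(x) {
  ggplot(x, aes(x = time, y = result)) + geom_point()
}

# Invented example data with the columns the question uses
testtypes <- list(
  testtype1 = data.frame(time = 1:3, result = c(2, 4, 6)),
  testtype2 = data.frame(time = 1:3, result = c(1, 3, 5)),
  testtype3 = data.frame(time = 1:3, result = c(9, 8, 7))
)

plots <- lapply(testtypes, myfunction)

# Rename the elements according to a formula: plot1, plot2, plot3
names(plots) <- paste0("plot", seq_along(plots))
names(plots)
```

A plot is then available as plots$plot2, and adding a fourth test type only means adding one more element to testtypes.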

Using dplyr for exploratory plots

I regularly used d_ply to produce exploratory plots.
A trivial example:
require(plyr)
plot_species <- function(species_data){
p <- qplot(data=species_data,
x=Sepal.Length,
y=Sepal.Width)
print(p)
}
d_ply(.data=iris,
.variables="Species",
function(x)plot_species(x))
Which produces three separate plots, one for each species.
I would like to reproduce this behaviour using functions in dplyr.
This seems to require the reassembly of the data.frame within the function called by summarise, which is often impractical.
require(dplyr)
iris_by_species <- group_by(iris,Species)
plot_species <- function(Sepal.Length,Sepal.Width){
species_data <- data.frame(Sepal.Length,Sepal.Width)
p <- qplot(data=species_data,
x=Sepal.Length,
y=Sepal.Width)
print(p)
}
summarise(iris_by_species, plot_species(Sepal.Length,Sepal.Width))
Can parts of the data.frame be passed to the function called by summarise directly, rather than passing columns?
I believe you can work with do for this task with the same function you used in d_ply. It will print directly to the plotting window, but also saves the plots as a list within the resulting data.frame if you use a named argument (see help page, this is essentially like using dlply). I don't fully grasp all that do can do, but if I don't use a named argument I get an error message but the plots still print to the plotting window (in RStudio).
plot_species <- function(species_data){
p <- qplot(data=species_data,
x=Sepal.Length,
y=Sepal.Width)
print(p)
}
group_by(iris, Species) %>%
do(plot = plot_species(.))
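A different route, not taken from the answer above, is to skip the grouped-data machinery and split the data frame in base R. The sketch below uses ggplot() rather than qplot() and returns the plots instead of printing them; both choices are assumptions made for this example.

```r
library(ggplot2)

# Takes the full data frame for one species, not individual columns
plot_species <- function(species_data) {
  ggplot(species_data, aes(Sepal.Length, Sepal.Width)) + geom_point()
}

# split() yields one data frame per species; lapply() plots each one
species_plots <- lapply(split(iris, iris$Species), plot_species)

names(species_plots)
```

Printing an element, for example species_plots$setosa, draws that plot, and the whole list can be printed in a loop for quick exploration.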

R - title plots based on nested lists

I have 2 lists, and inside each are two more lists containing data frames (in other words, nested lists). I want to plot each data frame and title it based on the names of both the primary and nested lists.
For example, say we have:
a=list(
list(a=data.frame(x=rpois(5,1),y=rpois(5,1)),
b=data.frame(x=rpois(5,1),y=rpois(5,1))),
list(c=data.frame(x=rpois(5,1),y=rpois(5,1)),
d=data.frame(x=rpois(5,1),y=rpois(5,1))))
And we have the names of the primary list:
names(a)=c("alpha","bravo")
Inside the two primary lists alpha and bravo, we have two more lists, charlie and delta:
for(i in 1:length(a)) {
names(a[[i]])=c("charlie","delta") }
I can use lapply to loop through each list and plot the data frames, but I am having trouble getting the titles to combine the name of the primary list (alpha and bravo) and the nested list (charlie and delta) for each data frame. For instance, in this case, I would like to have four plots called: alpha_charlie, alpha_delta,bravo_charlie, and bravo_delta.
lapply(a,function(i) {
lapply(names(i), function(j) {
ggplot()+
geom_point(data=i[[j]],aes(x,y))+
opts(title=paste(names(i),j,sep="_")) #Here is where I am struggling!
} ) } )
Any help would be much appreciated. Thank you!
You could use lapply on an indexing sequence instead of the names themselves.
lapply(seq(a), function(i){
lapply(seq(a[[i]]), function(j){
ggplot() +
geom_point(data = a[[i]][[j]], aes(x, y))+
ggtitle(paste(names(a)[i], names(a[[i]])[j], sep = "_")) # opts() has been removed from ggplot2
})})
My preference would be to stick with for loops in this situation. Doing so makes it easy to save the plots into a new list and then print them all at once using grid.arrange and do.call.
library(ggplot2)
plot_list = list() # Save plots to list.
for (name_1 in names(a)) {
for (name_2 in names(a[[name_1]])) {
title_string = paste(name_1, name_2, sep="_")
plt = ggplot(data=a[[name_1]][[name_2]], aes(x=x, y=y)) +
geom_point() +
ggtitle(title_string) # opts() has been removed from ggplot2
plot_list[[title_string]] = plt
}
}
library(gridExtra)
png("plots.png", height=600, width=600)
do.call(grid.arrange, plot_list)
dev.off()
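Another way to get the combined titles, sketched here as an alternative to the loops, is to flatten the nested list by one level so that each data frame carries a combined name. The helper make_df is hypothetical, and the data is randomly generated in the same shape as the question's example.

```r
library(ggplot2)

# Same shape as the example: a named list of named lists of data frames
set.seed(42)
make_df <- function() data.frame(x = rpois(5, 1), y = rpois(5, 1))
a <- list(
  alpha = list(charlie = make_df(), delta = make_df()),
  bravo = list(charlie = make_df(), delta = make_df())
)

# Flatten one level; unlist() joins outer and inner names with "."
flat <- unlist(a, recursive = FALSE)
names(flat) <- sub(".", "_", names(flat), fixed = TRUE)

# Zip each data frame with its own name to use as the title
plot_list <- Map(
  function(d, title) ggplot(d, aes(x, y)) + geom_point() + ggtitle(title),
  flat, names(flat)
)

names(plot_list)
```

This relies on the list names not containing a "." themselves, since only the first "." in each flattened name is replaced.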
In your first lapply, you've lost the names, so this winds up being yucky. Dason gave you a good fix.
However, I think you'd be much better served by converting the list of lists of data.frames into a single data.frame and using faceting!
library(plyr) # provides ldply
nested.fun <- function(l) {
out <- ldply(l, data.frame)
names(out)[1] <- 'inner.id'
return(out)
}
one.df <- ldply(a, nested.fun)
ggplot(one.df, aes(x,y))+geom_point()+facet_grid(.id~inner.id)
