I have 2 lists, and inside each are two more lists containing data frames (in other words, nested lists). I want to plot each data frame and title it based on the names of both the primary and nested lists.
For example, say we have:
a=list(
list(a=data.frame(x=rpois(5,1),y=rpois(5,1)),
b=data.frame(x=rpois(5,1),y=rpois(5,1))),
list(c=data.frame(x=rpois(5,1),y=rpois(5,1)),
d=data.frame(x=rpois(5,1),y=rpois(5,1))))
And we have the names of the primary list:
names(a)=c("alpha","bravo")
Inside the two primary lists alpha and bravo, we have two more lists, charlie and delta:
for(i in 1:length(a)) {
names(a[[i]])=c("charlie","delta") }
I can use lapply to loop through each list and plot the data frames, but I am having trouble getting the titles to combine the name of the primary list (alpha and bravo) with the name of the nested list (charlie and delta) for each data frame. In this case, I would like four plots titled alpha_charlie, alpha_delta, bravo_charlie, and bravo_delta.
lapply(a,function(i) {
lapply(names(i), function(j) {
ggplot()+
geom_point(data=i[[j]],aes(x,y))+
opts(title=paste(names(i),j,sep="_")) #Here is where I am struggling!
} ) } )
Any help would be much appreciated. Thank you!
You could use lapply on an indexing sequence instead of the names themselves.
lapply(seq_along(a), function(i) {
  lapply(seq_along(a[[i]]), function(j) {
    ggplot() +
      geom_point(data = a[[i]][[j]], aes(x, y)) +
      # opts(title = ...) has been removed from ggplot2; ggtitle() replaces it
      ggtitle(paste(names(a)[i], names(a[[i]])[j], sep = "_"))
  })
})
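To see why indexing helps, here is a minimal sketch that builds only the title strings, with no plotting. The toy data frames are made up for illustration; the point is that `names(a)[i]` and `names(a[[i]])[j]` are both reachable when you iterate over positions.

```r
# Toy nested list mirroring the structure in the question (values are made up)
a <- list(
  alpha = list(charlie = data.frame(x = 1:3, y = 3:1),
               delta   = data.frame(x = 1:3, y = 1:3)),
  bravo = list(charlie = data.frame(x = 1:3, y = c(2, 2, 2)),
               delta   = data.frame(x = 1:3, y = c(1, 3, 2)))
)

# Iterate over positions so that both levels of names stay available
titles <- lapply(seq_along(a), function(i) {
  lapply(seq_along(a[[i]]), function(j) {
    paste(names(a)[i], names(a[[i]])[j], sep = "_")
  })
})

flat_titles <- unlist(titles)
print(flat_titles)
```

Iterating over the elements themselves (as in the question) hands the inner function a bare list, which no longer knows what it was called in the outer list.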
My preference would be to stick with for loops in this situation. Doing so makes it easy to save the plots into a new list and then print them all at once using grid.arrange and do.call.
library(ggplot2)
plot_list = list() # Save plots to list.
for (name_1 in names(a)) {
for (name_2 in names(a[[name_1]])) {
title_string = paste(name_1, name_2, sep="_")
plt = ggplot(data=a[[name_1]][[name_2]], aes(x=x, y=y)) +
geom_point() +
ggtitle(title_string) # opts(title=...) has been removed from ggplot2
plot_list[[title_string]] = plt
}
}
library(gridExtra)
png("plots.png", height=600, width=600)
do.call(grid.arrange, plot_list)
dev.off()
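If do.call is unfamiliar: it calls a function with its arguments taken from a list, so `do.call(grid.arrange, plot_list)` passes each stored plot as a separate argument. A small base R sketch of the same mechanism, using paste instead of grid.arrange so that it runs without any packages:

```r
# A list of arguments built programmatically; named entries become named arguments
arg_list <- list("alpha", "charlie", sep = "_")

# do.call(f, arg_list) behaves like f("alpha", "charlie", sep = "_")
via_do_call <- do.call(paste, arg_list)
direct      <- paste("alpha", "charlie", sep = "_")

print(via_do_call)
print(identical(via_do_call, direct))
```

The same trick should let you add layout arguments to the plotting call, for example `do.call(grid.arrange, c(plot_list, list(ncol = 2)))`.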
In your first lapply, you've lost the names, so this winds up being yucky. Dason gave you a good fix.
However, I think you'd be much better served by converting the list of lists of data.frames into a single data.frame and using faceting:
library(plyr) # provides ldply
library(ggplot2)
nested.fun <- function(l) {
out <- ldply(l, data.frame)
names(out)[1] <- 'inner.id'
return(out)
}
one.df <- ldply(a, nested.fun)
ggplot(one.df, aes(x,y))+geom_point()+facet_grid(.id~inner.id)
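If you would rather not depend on plyr, the same flattening can be sketched in base R. This is only an illustration with made-up toy data; the column names `.id` and `inner.id` are chosen to match the facet formula above.

```r
# Toy nested list (values are made up)
a <- list(
  alpha = list(charlie = data.frame(x = 1:2, y = 2:1),
               delta   = data.frame(x = 1:2, y = 1:2)),
  bravo = list(charlie = data.frame(x = 1:2, y = c(0, 1)),
               delta   = data.frame(x = 1:2, y = c(3, 3)))
)

# Tag every inner data frame with the names of both list levels
pieces <- list()
for (outer_name in names(a)) {
  for (inner_name in names(a[[outer_name]])) {
    key <- paste(outer_name, inner_name, sep = "_")
    pieces[[key]] <- data.frame(
      .id = outer_name,
      inner.id = inner_name,
      a[[outer_name]][[inner_name]]
    )
  }
}

# Stack the tagged pieces into one long data frame
one.df <- do.call(rbind, pieces)
rownames(one.df) <- NULL
print(one.df)
```

The resulting one.df has one row per observation plus the two identifier columns, which is the shape that facet_grid expects.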
I am trying to store multiple plots produced by ggplot2 into a list.
I am attempting to use the list function suggested in a previous thread, however I am having difficulty creating my own function to meet my needs.
First, I split a dataframe based on a factor into a list with the following code:
heatlist.germ <- split(heatlist.germ, f=as.factor(heatlist.germ$plot))
After that, I attempt to write a plotting function that I can later use with lapply.
plot_data_fcn <- function (heatlist.germ) {
ggplot(heatlist.germ[[i]], aes(x=posX, y=posY, fill=germ_bin)) +
geom_tile(aes(fill=germ_bin)) +
geom_text(aes(label=germ_bin)) +
scale_fill_gradient(low = "gray90", high="darkolivegreen4") +
ggtitle(plot) +
scale_x_continuous("Position X", breaks=seq(1,30)) +
scale_y_continuous("Position Y (REVERSED)", breaks=seq(1,20))
}
heatlist.test <- lapply(heatlist.germ[[i]], plot_data_fcn)
Two main things I am trying to accomplish:
Store the 12 ggplots (hence 12 factors of plot) in a list.
Create a title called "Plot [i] Germination".
Any help would be appreciated.
I don't have your data, so I'll simplify the plotting mechanism.
The first problem is that you should not use [[i]] indexing inside your function. Just have the function deal with the data as-is: it does not know that its argument is (in another environment) an element within a list. It knows only the object itself.
# a simple plot function
myfunc <- function(x) ggplot(x, aes_string(names(x)[1], names(x)[2])) + geom_point()
# a list of frames, nothing fancy here
datalist <- replicate(3, mtcars, simplify = FALSE)
# just call it ...
myplots <- lapply(datalist, myfunc)
class(myplots[[1]])
# [1] "gg" "ggplot"
When myfunc is called, its argument x is just a data.frame, the function has no idea that x is the first (or second or third) frame in a list of frames.
If you want to include the nth frame with an index indicating which element it is, this is in my view "zipping" data together, so I suggest Map. (You can also use purrr::imap or related tidyverse functions.)
myfunc2 <- function(x, title = "") ggplot(x, aes_string(names(x)[1], names(x)[2])) + geom_point() + labs(title = title)
myplots <- Map(myfunc2, datalist, sprintf("Plot number %s", seq_along(datalist)))
class(myplots[[1]])
# [1] "gg" "ggplot"
To understand how Map relates to lapply, note that lapply(datalist, myfunc) is "unrolled" to something like:
myfunc(datalist[[1]])
myfunc(datalist[[2]])
myfunc(datalist[[3]])
Map, however, takes one function that accepts one or more arguments per call, followed by as many lists (or vectors) as the function takes arguments. In the single-argument case, these two calls do the same thing:
lapply(datalist, myfunc) # data first, function second
Map(myfunc, datalist) # function first, data second
and a more complicated call unrolls like this:
titles <- sprintf("Plot number %d", seq_along(datalist)) # "Plot number 1", ...
Map(myfunc2, datalist, titles)
# equivalent to
myfunc2(datalist[[1]], titles[[1]])
myfunc2(datalist[[2]], titles[[2]])
myfunc2(datalist[[3]], titles[[3]])
It doesn't really matter if each of the arguments is a true list (as in datalist) or a vector (as in titles), as long as they are the same length (or length 1).
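The recycling of length-1 arguments can be checked without any plotting. In this sketch, `describe` is a hypothetical stand-in for the plot function that just returns a string, and the third argument is a single value that Map reuses on every call.

```r
# Hypothetical stand-in for a plotting function: returns a description instead of a plot
describe <- function(df, title, suffix) {
  sprintf("%s: %d rows%s", title, nrow(df), suffix)
}

# Three data frames of different sizes
datalist <- list(head(mtcars, 2), head(mtcars, 5), mtcars)
titles <- sprintf("Plot number %d", seq_along(datalist))

# datalist and titles are walked in parallel; "!" has length 1 and is recycled
out <- Map(describe, datalist, titles, "!")
print(unlist(out))
```

Swapping `describe` for a real plotting function with the same argument order should work the same way.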
I am trying to run a function over different datasets, and can't seem to get it to work. The variable names x and y are the same across datasets, but the dataset (argument z in my custom function) is different.
I have tried lapply, but it is not working.
Running the function over individual datasets works fine:
resultsmadrid <- customfunction (x=types, y=score, z=madrid)
resultsnavarra <- customfunction (x=types, y=score, z=navarra)
resultsaragon <- customfunction (x=types, y=score, z=aragon)
Trying to do it in one take is not working
regiones <- list(madrid, navarra, aragon) #Creates the list
resultregiones <- lapply(regiones, customfunction(x=types, y=score, z=regiones)) #Applies that to the list (?)
It's not looping the analysis across the dataframes in the list, the error message says there is a missing argument in the function.
I am not clear on how to call each dataframe from the function argument that does that (z, in my case). It seems the name of the comprehensive list object is not the right approach. Thanks for the help!
Since types and score are the same for every dataset, you only need to 'loop' through the elements of your regiones list. Try it like this:
regiones <- list(madrid, navarra, aragon) #Creates the list
resultregiones <- lapply(regiones,function(X) customfunction(x=types, y=score, z=X))
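The original attempt fails because `customfunction(x=types, y=score, z=regiones)` is evaluated immediately, so lapply receives a result rather than a function to apply. Here is a self-contained sketch of the working pattern. Note the assumptions: this `customfunction` is a hypothetical stand-in (the real one was not posted), it takes column names as strings, and the data are made up.

```r
# Hypothetical stand-in: mean of column y within each group of column x
customfunction <- function(x, y, z) {
  sapply(split(z[[y]], z[[x]]), mean)
}

# Made-up regional data frames with the same column names
madrid  <- data.frame(types = c("a", "b", "a"), score = c(1, 2, 3))
navarra <- data.frame(types = c("a", "b", "b"), score = c(4, 5, 7))

# Naming the list elements carries the region names through to the results
regiones <- list(madrid = madrid, navarra = navarra)

# The anonymous function receives one data frame at a time as d
resultregiones <- lapply(regiones, function(d) {
  customfunction(x = "types", y = "score", z = d)
})

print(resultregiones$navarra)
```

Because the list is named, each result can be pulled out by region, for example `resultregiones$madrid`.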
I have a question about creating a function that creates ggplots. I want to create my own function to graph values in multiple data frames quickly instead of writing a whole ggplot with each argument filled out each time. What I want to do is to input a vector of the names of the data frames, have the function create the graphs and have each saved as a new object with a different name. Example of my idea is…
myfunction <- function(x) {
ggplot(x, aes(x = time, y = result)) +
geom_point()
}
I want to be able to do something like
myfunction(c(testtype1, testtype2, testtype3))
and have the function create objects plot1, plot2, plot3. As of now, I can only do
plot1 <- myfunction(testtype1)
plot2 <- myfunction(testtype2)
plot3 <- myfunction (testtype3)
I don’t want to keep typing that over and over, especially if I have a lot of test types. Is there a way that the function can be modified to use the function to name the objects according to some formula?
With the wrapper below, you can pass any number of (appropriate) data frames, and l_my_fun returns a list containing the plots.
l_my_fun <- function(x, ...) {
l <- list(x, ...)
ps <- lapply(l, myfunction)
ps
}
out <- l_my_fun(testtype1, testtype2, testtype3)
For example, now access the second plot as
out[[2]]
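If you want the plot1, plot2, ... names from the question, one option is to name the list elements rather than create separate objects. A rough sketch follows; to keep it runnable without ggplot2, `myfunction` here is a hypothetical stand-in that returns a range instead of a plot, and the data are made up.

```r
# Hypothetical stand-in for the plotting function: returns the range of `result`
myfunction <- function(x) range(x$result)

# Same wrapper as above, plus generated names plot1, plot2, ...
l_my_fun <- function(...) {
  l <- list(...)
  ps <- lapply(l, myfunction)
  names(ps) <- paste0("plot", seq_along(ps))
  ps
}

# Made-up data frames
testtype1 <- data.frame(time = 1:3, result = c(2, 4, 6))
testtype2 <- data.frame(time = 1:3, result = c(1, 9, 5))

out <- l_my_fun(testtype1, testtype2)
print(names(out))
print(out$plot2)
```

With names in place, `out$plot2` and `out[["plot2"]]` both work, and the results stay together in one object.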
It seems like every question involving loops in R is met with "Loops are bad" and "You're doing it wrong" with advice to use list, or tapply or whatnot.
I'm learning R, and have implemented the following loop to create image files for each factor level, with the # of factor levels changing each time I run it:
for(i in unique(df$factor)) {
lnam <- paste("test_", i, sep="")
assign(lnam, subset(df, factor==i))
lfile <- paste(lnam, ".png", sep="")
png(file = lfile, bg="transparent")
with(get(lnam), hist(x, main = paste("Histogram of x for ", i, " factor", sep="")))
dev.off()
}
This works. I want to expand it to perhaps run various tests on those subgroups (also output to files), etc.
Is this a valid and legitimate use of loops? Or is there a preferred way to skin this cat?
There's nothing wrong with loops in general. Sometimes, particularly when you're working with files or calling functions for their side effects rather than their outputs, loops can be easier to follow than *apply calls. However, when you use a loop to simulate an operation that can be vectorised, it's often much slower, hence the recommendation to avoid them.
Re your specific example, though, I'd make the following comments:
If you want to do something for each level in a factor, it's more straightforward to use levels(factor) rather than unique(factor).
You don't need to create a new data frame specifically for each factor level.
With that in mind:
for(i in levels(df$factor))
{
  lf <- paste("test_", i, ".png", sep="")
  png(file=lf, bg="transparent")
  with(subset(df, factor == i), hist(x, ....)) # "...." stands for the remaining hist() arguments
  dev.off()
}
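On the first point above, levels() and unique() are not interchangeable when a factor has unused levels or when the order matters. A quick sketch with a made-up factor:

```r
# "c" is a declared level that never occurs in the data
f <- factor(c("b", "a", "b"), levels = c("a", "b", "c"))

print(levels(f))                # all declared levels, in level order
print(as.character(unique(f)))  # only observed values, in order of appearance
print(levels(droplevels(f)))    # declared levels with the unused ones removed
```

If unused levels are possible, looping over levels(droplevels(df$factor)) avoids trying to draw a histogram of an empty subset.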
In this case, a reasonable option is to use split to convert your data frame into a list of data frames, each containing the subset of rows for one factor level.
split_df <- split(df, df$factor)
As Colin mentioned, paste can be vectorised, so you only need to call it once.
lfile <- paste("test_", names(split_df), ".png", sep = "")
Group all your plotting code into a function.
draw_and_save_histogram <- function(data, file)
{
png(file)
with(data, hist(x))
dev.off()
}
Now you can more easily compare the difference between a plain loop and an *apply function (in this case mapply, since we need two inputs).
for(i in seq_along(split_df))
{
draw_and_save_histogram(split_df[[i]], lfile[i])
}
mapply(
draw_and_save_histogram,
split_df,
lfile
)
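Putting the pieces together, here is a self-contained sketch of the split / paste / mapply pipeline. A few assumptions: the data are made up, the files go to tempdir(), and the pdf() device is swapped in for png() only because it needs no extra system libraries.

```r
# Made-up data with a two-level grouping column
df <- data.frame(
  x = c(1, 2, 2, 3, 5, 8),
  factor = factor(c("a", "a", "a", "b", "b", "b"))
)

# One data frame per factor level
split_df <- split(df, df$factor)

# One vectorised paste call builds every file name
lfile <- file.path(tempdir(), paste("test_", names(split_df), ".pdf", sep = ""))

draw_and_save_histogram <- function(data, file) {
  pdf(file)            # pdf() is used here in place of png()
  on.exit(dev.off())   # close the device even if hist() fails
  hist(data$x, main = basename(file))
  invisible(file)
}

# split_df and lfile are walked in parallel, as in the for loop
written <- mapply(draw_and_save_histogram, split_df, lfile)
print(file.exists(lfile))
```

Since the function is called for its side effect, the return value of mapply can simply be ignored or wrapped in invisible().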
Rather than drawing lots of histograms and saving them in different files, it is often preferable to draw one plot with several panels using lattice or ggplot2.
library(lattice)
histogram(~ x | factor, df)
library(ggplot2)
ggplot(df, aes(x)) + geom_histogram() + facet_wrap(~ factor)
I have 11 lists of different length, imported into R as p1,p2,p3,...,p11. Now I want to get the rollmean (library TTR) from all lists and name the result p1y,p2y,...,p11y.
This seems to be the job for a loop, but I read that this is often not good practice in R. I tried something (foolish) like
sample=10
for (i in 1:11){
paste("p",i,"y",sep="")<-rollmean(paste("p",i,sep=""),sample)
}
which does not work.
I also tried to use it in combination with assign(), but as I understand assign can only take a variable and a single value.
As always it strikes me that I am missing some fundamental function of R.
As Manuel pointed out, your life will be easier if you combine the variables into a list. For this, you want mget (short for "multiple get").
var_names <- paste("p", 1:11, sep = "")
p_all <- mget(var_names, envir = globalenv())
Now simply use lapply to call rollmean on each element of your list.
sample <- 10
rolling_means <- lapply(p_all, rollmean, sample)
(Also, consider renaming the sample to something that isn't already a function name.)
I suggest leaving the answers as a list, but if you really like the idea of having separate rolling mean variables to match the separate p1, ..., p11 variables, then use list2env.
names(rolling_means) <- paste(var_names, "y", sep = "")
list2env(rolling_means, envir = globalenv())
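Here is the whole round trip as a runnable sketch. Assumptions are loud here: `roll_mean` is a small hypothetical stand-in for rollmean so that no package is needed, only three short made-up vectors are used, and a separate environment replaces globalenv() to keep the workspace clean.

```r
# Hypothetical stand-in for rollmean: mean of every window of k consecutive values
roll_mean <- function(x, k) {
  cs <- cumsum(c(0, x))
  (cs[(k + 1):length(cs)] - cs[1:(length(cs) - k)]) / k
}

# A dedicated environment holding made-up vectors p1, p2, p3
env <- new.env()
assign("p1", c(1, 2, 3, 4, 5), envir = env)
assign("p2", c(10, 20, 30), envir = env)
assign("p3", c(2, 4, 6, 8), envir = env)

# Collect the variables into one named list
var_names <- paste("p", 1:3, sep = "")
p_all <- mget(var_names, envir = env)

# Apply the rolling mean to every element
window <- 2
rolling_means <- lapply(p_all, roll_mean, k = window)

# Optionally write the results back as p1y, p2y, p3y
names(rolling_means) <- paste(var_names, "y", sep = "")
invisible(list2env(rolling_means, envir = env))

print(get("p2y", envir = env))
```

Note that mget returns a named list, so the results keep track of which input they came from even before the renaming step.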
You could group your lists into one and do the following
sample <- 10
mylist <- list(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11)
for(i in 1:11) assign(paste('p',i,'y',sep=''), rollmean(mylist[[i]], sample)) # [[i]] extracts the element; [i] would return a one-element list
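One detail matters in loops like this: single brackets return a one-element list, while double brackets extract the element itself, which is what a numeric function such as rollmean needs. A minimal check with made-up data:

```r
mylist <- list(c(1, 2, 3), c(4, 5))

# Single brackets: still a list, of length 1
sub_list <- mylist[1]
print(is.list(sub_list))
print(length(sub_list))

# Double brackets: the numeric vector itself
element <- mylist[[1]]
print(is.numeric(element))
print(length(element))
```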
This can also be done with get and do.call (see ?get and ?do.call).
x1<-1:3
x2 <- seq(3.5,5.5,1)
for (i in 1:2) {
sx<- (do.call("sin",list(c(get(paste('x',i,sep='',collapse=''))))))
cat(sx)
}
Sloppy example, but you get the idea, I hope.