I generate a bunch of graphs and write them into a list variable, something like the following.
graphsListHolder <- list()
loop around the following code for as many plots as I make
filename <- paste some elements together to create a unique name
graphsListHolder[[filename]] <- p # p is the name of the ggplot plot
I save graphsListHolder as a .rds file.
Later I want to read in the .rds file, choose plots from graphsListHolder, and display them with grid.arrange. I can hardcode the plot numbers, and the following example works fine when run, plotting two graphs, one on top of the other.
grid.arrange(
graphsListHolder[[3]], graphsListHolder[[5]]
)
But if I construct a character variable temp like this (or variations on this)
temp <- "graphsListHolder[[3]], graphsListHolder[[5]]"
and change the grid.arrange code to
grid.arrange(
temp
)
I get
Error in gList(list("graphsListHolder[[3]], graphsListHolder[[5]]", wrapvp = list( :
only 'grobs' allowed in "gList"
In addition: Warning message:
In grob$wrapvp <- vp : Coercing LHS to a list
I also tried eval(parse(text = temp)) without success.
I'm not sure how you want to choose them, but say you had a vector of the elements you wanted
x <- c(3,5)
Then you could do
grid.arrange(grobs=graphsListHolder[x])
Trying to turn arbitrary strings into executable code usually isn't a good idea. Often there are more "traditional" alternatives in R.
For example
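library(ggplot2)
library(gridExtra)
# build five example plots, one per multiplier
graphsListHolder <- Map(function(x) {
  ggplot(data.frame(x=1:10, y=x*1:10)) + geom_point(aes(x,y)) + ggtitle(x)}, 1:5)
# pick the plots to show by position
x <- c(3,5)
grid.arrange(grobs=graphsListHolder[x])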
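To see why the string version fails: temp is one character value, not two plots, so grid.arrange receives text rather than grobs. Below is a minimal base R sketch of selecting list elements by position or by name. The list holds placeholder strings instead of ggplot objects, and plotsHolder and wanted are made-up names for illustration.

```r
# Stand-in for graphsListHolder: placeholder strings instead of ggplot objects
plotsHolder <- list(plot_a = "A", plot_b = "B", plot_c = "C",
                    plot_d = "D", plot_e = "E")

# A string that looks like code is still a single character value
temp <- "plotsHolder[[3]], plotsHolder[[5]]"
length(temp)

# Select by position ...
byIndex <- plotsHolder[c(3, 5)]

# ... or by name, which is easier to build from user input
wanted <- c("plot_c", "plot_e")
byName <- plotsHolder[wanted]

identical(byIndex, byName)
```

With real plots, the same sub-list can be passed on as grid.arrange(grobs = graphsListHolder[wanted]).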
That's my current code for saving a list of different ggplots:
The problem is that it creates all the different file names (raphael_2021_022.png and raphael_2021_023.png, which is what I want), but every PNG file contains the same ggplot (the one for raphael_2021_023.png), regardless of its name.
names(barplots_emmeans) <-
sub("\\.xlsx$",
".png",
names(raphael_calc_sum))
> barplots_emmeans
$raphael_2021_022.png
$raphael_2021_023.png
lapply(names(barplots_emmeans),
function(nm) barplots_emmeans[[nm]] +
ggsave(filename = file.path("C:/Users/Raphael/Desktop/barplot_emmeans/",
nm )))
How can I fix that?
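One likely cause: ggsave() writes the most recent plot unless a plot is passed through its plot argument, and in the code above it is attached with + instead of being given the plot. Here is a sketch that passes each plot explicitly. The dummy bar plots and the temporary output folder are assumptions standing in for the real data and path.

```r
library(ggplot2)

# Dummy stand-ins for barplots_emmeans (assumption: a named list of ggplot objects)
barplots_emmeans <- list(
  "raphael_2021_022.png" = ggplot(mtcars, aes(factor(cyl))) + geom_bar(),
  "raphael_2021_023.png" = ggplot(mtcars, aes(factor(gear))) + geom_bar()
)

out_dir <- tempdir()  # replace with your own folder

# Pass each plot explicitly so ggsave() does not fall back to the last plot
for (nm in names(barplots_emmeans)) {
  ggsave(filename = file.path(out_dir, nm),
         plot = barplots_emmeans[[nm]],
         width = 6, height = 4)
}
```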
My pipeline reads a csv into a dataframe, assigns rownames, removes a column, performs a pca, plots the pca and extracts the meaningful variables from the pca, which are also plotted.
Here is my current code, which only goes as far as the first plot:
library(ggplot2)
library(ggrepel)
tsv = read.csv('matrix.tsv', sep='\t')
bell= read.csv('bell.tsv', sep='\t')
tail= read.csv('tail.tsv', sep='\t')
dfList = list(tail, tsv, bell)
#process csv's
dfList = lapply(dfList, function(dum){
rownames(dum) = dum[,1]
dum[,1] = NULL
dum$X = NULL
dum = dum[, -grep('un', colnames(dum))]
})
#create pca's of dataframes
pcaList = lapply(dfList, function(pca){
prin_comp = prcomp(pca, scale. = T)
})
#plot top 2 principal components in the pca
plotList = lapply(pcaList, function(prin_comp){
t = qplot(x=prin_comp$rotation[,1], y=prin_comp$rotation[,2]) + geom_text_repel(aes(label=row.names(prin_comp$rotation)))
})
#this plots the 3 plots, one for each pca, but they are un-named
plotList
The problem is that the plots don't have meaningful names/titles. I don't know how to keep that information present, passed from function to function.
I know there must be a more elegant way of doing this. And I have spent a day reading similar and not so similar questions regarding processing multiple csv files. But either they weren't applicable or didn't work for my case.
And as the title of this question implies, I would prefer to do this on one csv at a time, not all 3 at a time, as the csv's in question are very large, over 5GB each, so keeping each dataframe and pca in memory at the same time is impossible.
You just need to keep a string you want to use as the title somewhere and add ggtitle(YOUR_TITLE) to your plot, but this is not so easy with your current code. Instead of performing each step of the analysis for each CSV before going to the next step, why don't you just perform all steps for one CSV at a time?
Your code could look like:
library(ggplot2)
library(ggrepel)
csvs <- c("matrix.tsv","bell.tsv","tail.tsv")
for (i in csvs) {
# read file
df <- read.csv(i, sep='\t')
# process file
rownames(df) <- df[,1]
df[,1] <- NULL
df$X = NULL
df = df[, -grep('un', colnames(df))]
# create pca
pca <- prcomp(df, scale. = TRUE)
# plot pca
pcaPlot <- qplot(x=pca$rotation[,1], y=pca$rotation[,2]) +
geom_text_repel(aes(label=row.names(pca$rotation))) +
ggtitle(i)
print(pcaPlot)
# extract and plot meaningful variables
# ...
}
Basically, I just put everything you do in the lapply calls inside a for loop. This approach also processes one CSV at a time, so only one dataframe and one pca need to be in memory at once.
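If the plots should also be written to disk under meaningful names, the per-file work can be wrapped in a function so that each dataframe and pca goes out of scope when the function returns. This is only a sketch: plot_pca_for_file is a hypothetical helper, it uses base graphics instead of qplot and ggrepel, and the small dummy files stand in for the real TSVs.

```r
# Hypothetical helper: handle one file completely, then let its data go out of scope
plot_pca_for_file <- function(path, out_dir) {
  df <- read.csv(path, sep = "\t", row.names = 1)
  pca <- prcomp(df, scale. = TRUE)
  # use the file name (without extension) as the plot title and output name
  title <- tools::file_path_sans_ext(basename(path))
  out_file <- file.path(out_dir, paste0(title, "_pca.png"))
  png(out_file)
  plot(pca$rotation[, 1], pca$rotation[, 2],
       xlab = "PC1", ylab = "PC2", main = title)
  text(pca$rotation[, 1], pca$rotation[, 2],
       labels = rownames(pca$rotation), pos = 3)
  dev.off()
  out_file
}

# Small dummy files standing in for the real, much larger TSVs
out_dir <- tempdir()
paths <- file.path(out_dir, c("matrix.tsv", "bell.tsv"))
for (p in paths) {
  write.table(mtcars[, 1:5], p, sep = "\t", quote = FALSE, col.names = NA)
}

# One file at a time; the result is the vector of written plot files
saved <- vapply(paths, plot_pca_for_file, character(1), out_dir = out_dir)
```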
I have a list that contains 75 named matrices. I want to make a plot for each matrix and save each plot under the name of its matrix.
My code makes the plots in a loop and it works: I get 75 correct plots. The problem is that each plot's file name looks like a vector, "c(99,86,94....)", which is too long, and I can't tell which plot is which.
This is the code I'm using; it probably isn't the best. I'm a beginner and have spent a week looking for a solution without success.
for (i in ssamblist) {
svg(paste("Corr",i,".svg", sep=""),width = 45, height = 45)
pairs(~CDWA+CDWM+HI+NGM2+TKW+YIELD10+GDD_EA,
data=i,lower.panel=panel.smooth, upper.panel=panel.cor,
pch=0, main=i)
dev.off()}
How can I give each plot its own name?
I tried replacing "i" with names(i), but then the name was that of the first column and only one plot was created. I also tried lapply but couldn't get it to work.
PS: the plots are huge, so I have to expand the margins. I'm using RStudio.
Thank you!
Using for loop or apply:
# dummy data
ssamblist <- list(a = mtcars[1:10, 1:4], b = mtcars[11:20, 1:4], c = mtcars[21:30, 1:4])
# using for loop
for(i in names(ssamblist)) {
svg(paste0("Corr_", i, ".svg"))
pairs(ssamblist[[i]], main = i)
dev.off()}
# using apply
sapply(names(ssamblist), function(i){
svg(paste0("Corr_", i, ".svg"))
pairs(ssamblist[[i]], main = i)
dev.off()})
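For context on why the original file names looked like "c(99,86,94....)": when the loop runs over the list itself, i is a whole data frame, and pasting it into a string deparses its columns. A small sketch with dummy data (the object names bad and good are made up for illustration):

```r
# Dummy list of two small data frames
ssamblist <- list(a = mtcars[1:3, 1:2], b = mtcars[4:6, 1:2])

# Looping over the list itself: i is a data frame, so paste0() deparses its columns
for (i in ssamblist) {
  bad <- paste0("Corr", i, ".svg")
  break
}
bad  # one string per column, each starting with "Corrc("

# Looping over the names: each name is a single string
good <- paste0("Corr_", names(ssamblist), ".svg")
good
```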
I have several (numeric) data files with around 150000 rows and 25 columns. Previously I used gnuplot (where the number of script lines is proportional to the number of plot objects) to plot the data, but since I now have to do some additional analysis, I moved to R and ggplot2.
How should I organize the data, though? Is one big data.frame with an additional column marking which file each row came from really the only option? Or is there some way around that?
Edit: To be a bit more precise, I'll give as an example in what form I have the data now:
filelst <- c("filea.dat", "fileb.dat", "filec.dat")
dat <- list()
for (i in seq_along(filelst)) {
  dat[[i]] <- read.table(filelst[i])
}
Assuming you have filenames ending with ".dat", here's a mockup example of the strategies proposed by Chase,
require(plyr)
# list the files
lf = list.files(pattern = "\\.dat$")
str(lf)
# 1. read the files into a data.frame
d = ldply(lf, read.table, header = TRUE, skip = 1) # or whatever options to read
str(d) # should contain all the data, and an ID column called L1
# use the data, e.g. plot
pdf("all.pdf")
d_ply(d, "L1", plot, t="l")
dev.off()
# or using ggplot2
ggplot(d, aes(x, y, colour=L1)) + geom_line()
# 2. read the files into a list
ld = lapply(lf, read.table, header = TRUE, skip = 1) # or whatever options to read
names(ld) = gsub("\\.dat$", "", lf) # strip the file extension
str(ld)
# use the data, e.g. plot
pdf("all2.pdf")
lapply(names(ld), function(ii) plot(ld[[ii]], main = ii, type = "l"))
dev.off()
# 3. is not fun
Your question is a little vague. If I followed along properly, I think you have three main options:
1. Do as you suggest and then use any one of the "split-apply-combine" functions that exist in R to conduct your analyses by group. These functions may include by, aggregate, ave, package(plyr), package(data.table) and many others.
2. Store your data objects as separate elements in a list(). Then use lapply() and friends to work on them.
3. Keep everything separate in different data objects and work on them individually. This is probably the most inefficient way to go about things, unless you have memory constraints or similar.
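The first two options can also be sketched in base R without plyr. The dummy files, the source column name, and the y column below are assumptions for illustration only.

```r
# Dummy files standing in for filea.dat, fileb.dat, ...
tmp <- tempdir()
files <- file.path(tmp, c("filea.dat", "fileb.dat"))
for (f in files) {
  write.table(data.frame(x = 1:5, y = rnorm(5)), f, row.names = FALSE)
}

# Option 2: a named list with one data frame per file
ld <- lapply(files, read.table, header = TRUE)
names(ld) <- sub("\\.dat$", "", basename(files))

# Option 1: one data frame with a column recording the source file
d <- do.call(rbind, Map(cbind, ld, source = names(ld)))

# The same per-file summary from either layout
means_list <- sapply(ld, function(df) mean(df$y))
means_df <- tapply(d$y, d$source, mean)
```

Both layouts carry the same information; the single data frame tends to be more convenient for ggplot2, while the list keeps the files independent.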
I've made a loop to create multiple boxplots. The thing is, I want to save all the boxplots without overwriting each other. Any suggestions?
This is my current code:
boxplot <- list()
for (x in 1:nrow(checkresults)){
boxplots <- boxplot(PIM[,x], MYC [,x], OBX[,x], WDR[,x], EV[,x],
main=colnames(PIM)[x],
xlab="PIM, MYC, OBX, WDR, EV")
}
Do you want to save them in some files, or save them to be able to look at them in different windows ?
If it is the first case, you can use a png, pdf or whatever function call inside your for loop :
for (i in 1:5) {
  png(file=paste("plot",i,".png",sep=""))
  plot(rnorm(10))
  dev.off()
}
If you want to display them in separate windows, just use dev.new :
for (i in 1:5) {
  dev.new()
  plot(rnorm(10))
}
Just to add to @juba's answer, if you want to save the plots to a multi-page pdf file, then you don't have to use the paste command that @juba suggested. This
pdf("myboxplots.pdf")
for (x in seq_len(nrow(checkresults))){
boxplot(PIM[,x], MYC [,x], OBX[,x], WDR[,x],EV[,x],
main = colnames(PIM)[x],
xlab = "PIM, MYC, OBX, WDR, EV")
}
dev.off()
creates a single multi-page pdf document, where each page is a boxplot. If you want to store the boxplots in separate pdf documents, then use the file=paste command.
First, create a list of the right length - it just makes things easier and is good practice to allocate storage before filling objects in via a loop:
boxplots <- vector(mode = "list", length = nrow(checkresults))
Then we can loop over the data you want, assigning to each component of the boxplots list as we go, using the [[x]] notation:
for (x in seq_along(boxplots)){
boxplots[[x]] <- boxplot(PIM[,x], MYC [,x], OBX[,x], WDR[,x],EV[,x],
main = colnames(PIM)[x],
xlab = "PIM, MYC, OBX, WDR, EV")
}
Before, your code was overwriting the previous boxplot info during subsequent iterations.
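Note that boxplot() returns the summary statistics it computed, not a plot object, so the list stores numbers rather than pictures. A stored element can be redrawn later with bxp(). The sketch below uses dummy matrices in place of PIM and MYC and a temporary output file, all of which are assumptions.

```r
# Dummy matrices standing in for PIM and MYC (assumption: one column per plot)
set.seed(1)
PIM <- matrix(rnorm(40), ncol = 4, dimnames = list(NULL, paste0("gene", 1:4)))
MYC <- matrix(rnorm(40), ncol = 4)

# Store the statistics only; plot = FALSE suppresses drawing
boxplots <- vector(mode = "list", length = ncol(PIM))
for (x in seq_along(boxplots)) {
  boxplots[[x]] <- boxplot(PIM[, x], MYC[, x],
                           names = c("PIM", "MYC"), plot = FALSE)
}

# Redraw every stored element later with bxp(), one page per boxplot
out_file <- file.path(tempdir(), "stored_boxplots.pdf")
pdf(out_file)
for (x in seq_along(boxplots)) {
  bxp(boxplots[[x]], main = colnames(PIM)[x])
}
dev.off()
```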