I'm using R to loop through a data frame, perform a calculation, and make a plot.
for(i in 2 : 15){
# get data
dataframe[,i]
# do analysis
# make plot
a <- plot()
}
Is there a way that I can make the plot object name 'a', using the value of 'i'? For example, a + "i" <- plot(). Then I want to add that to a vector so I have a series of plots that I can then use at a later stage when I want to make a pdf. Or perhaps there is another way of storing this.
I'm familiar with the paste() function but I haven't figured out how to define an object using it.
If you want a "vector" of plot objects, the easiest way is probably to store them in a list. Use paste() to create a name for your plot and then add it to the list:
# Create a list to hold the plot objects.
pltList <- list()
for( i in 2:15 ){
# Get data, perform analysis, etc.
# Create plot name.
pltName <- paste( 'a', i, sep = '' )
# Store a plot in the list using the name as an index.
# Note that the plotting function used must return an *object*.
# Functions from the `graphics` package, such as `plot`, do not return objects.
pltList[[ pltName ]] <- some_plotting_function()
}
If you didn't want to store the plots in a list and literally wanted to create a new object that had the name contained in pltName, then you could use assign():
# Use assign() to create a new object in the global environment
# that gets its name from the value of pltName and its contents
# from the result of plot().
assign( pltName, plot(), envir = .GlobalEnv )
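As a minimal sketch of the assign() round trip, the following creates objects whose names are built with paste0() and then retrieves them with get() and mget(). The environment plot_env and the squared numbers are placeholders of mine (a real use would store plot objects), and a private environment is used instead of .GlobalEnv only to keep the example tidy.

```r
# A private environment keeps the generated names out of the global workspace.
plot_env <- new.env()

for (i in 2:4) {
  obj_name <- paste0("a", i)
  # Placeholder value; in practice this would be a plot object.
  assign(obj_name, i^2, envir = plot_env)
}

# Fetch one object by its constructed name ...
a3 <- get("a3", envir = plot_env)

# ... or fetch several at once as a named list.
all_objs <- mget(paste0("a", 2:4), envir = plot_env)
```

Having to call get() or mget() to reach the objects again is one reason a plain named list is usually the more convenient choice.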
Have a look at the lattice or ggplot2 packages. Their plot functions create objects that can be assigned to variables and printed or plotted at a later stage.
For instance with lattice:
library("lattice")
i <- 1
assign(sprintf("a%d", i), xyplot(1:10 ~ 1:10))
print(a1) # you have to "print" or "plot" the objects explicitly
Or append the objects to a list:
p <- list()
p[[1]] <- xyplot(...)
p[[2]] <- xyplot(...)
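Putting the pieces together, here is a rough sketch of the whole workflow with lattice: build the plots in a loop, store them in a named list, and write them to a PDF later. The built-in mtcars data, the column choice, and the output file name are assumptions made for illustration only.

```r
library(lattice)

# Hypothetical example data: mtcars stands in for `dataframe`.
cols <- c("hp", "wt", "qsec")

plots <- list()
for (col in cols) {
  plt_name <- paste0("a_", col)
  # xyplot() returns a "trellis" object; nothing is drawn yet.
  plots[[plt_name]] <- xyplot(
    as.formula(paste("mpg ~", col)),
    data = mtcars,
    main = plt_name
  )
}

# Later: write every stored plot to one multi-page PDF.
out_file <- file.path(tempdir(), "all_plots.pdf")
pdf(out_file)
for (plt_name in names(plots)) {
  print(plots[[plt_name]])  # trellis objects must be printed explicitly
}
dev.off()
```

Individual plots remain accessible by name, for example plots[["a_wt"]].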
I work with SAS files (sas7bdat = dataframes) and SAS formats (sas7bcat).
My sas7bdat files are in a "data" folder, so I can list them in the object files_names.
Here is the first part of my code, working perfectly
files_names <- list.files(here("data"))
nb_files <- length(files_names)
data_names <- vector("list",length=nb_files)
for (i in 1 : nb_files) {
data_names[i] <- strsplit(files_names[i], split=".sas7bdat")
}
for (i in 1:nb_files) {
assign(data_names[[i]],
read_sas(paste(here("data", files_names[i])), "formats/formats.sas7bcat")
)}
but I run into problems when I try to apply the function as_factor from the haven package (in order to apply labels to my new dataframes and get, for example, SEX = "Male" instead of SEX = 1).
I can make it work dataframe by dataframe like the code below
df_labelled <- haven::as_factor(df, only_labelled = TRUE)
I would like to do this in a loop, but it didn't work because data_names[i] is a name rather than a dataframe, and as_factor requires a dataframe as its first argument.
I'm quite new to R, thank you very much if someone could help me.
You might want to think about using a different data structure. For example, you can save your dataframes in a named list and then easily loop through them.
In fact, you could do everything in one loop. I'm sure there's a more efficient way to do this, but here's one way that doesn't change your code too much:
files_names <- list.files(here("data"))
raw_dfs <- list()
labelled_dfs <- list()
for (file_name in files_names) {
# strsplit() returns a list, so you would have to extract the first element:
# df_name <- strsplit(file_name, split = ".sas7bdat", fixed = TRUE)[[1]]
# It is simpler to strip the extension with sub().
df_name <- sub(".sas7bdat", "", file_name, fixed = TRUE)
# Use [[ ]] so that each whole dataframe is stored as a single list element.
raw_dfs[[df_name]] <- read_sas(here("data", file_name), "formats/formats.sas7bcat")
labelled_dfs[[df_name]] <- haven::as_factor(raw_dfs[[df_name]], only_labelled = TRUE)
}
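The same named-list idea can also be written with lapply(). The sketch below is self-contained, so it does not touch real SAS files: fake_read() and label_sex() are hypothetical stand-ins for read_sas() and haven::as_factor(), and the file names are invented.

```r
# Hypothetical file names; in the question these come from list.files(here("data")).
files_names <- c("patients.sas7bdat", "visits.sas7bdat")

# Stand-in for read_sas(): returns a small data frame with a coded SEX column.
fake_read <- function(file_name) {
  data.frame(ID = 1:3, SEX = c(1, 2, 1))
}

# Stand-in for haven::as_factor(): turns the numeric codes into labels.
label_sex <- function(df) {
  df$SEX <- factor(df$SEX, levels = c(1, 2), labels = c("Male", "Female"))
  df
}

# Name the vector by file name without extension; lapply() keeps those names.
names(files_names) <- tools::file_path_sans_ext(files_names)
raw_dfs <- lapply(files_names, fake_read)
labelled_dfs <- lapply(raw_dfs, label_sex)

str(labelled_dfs$patients)
```

Because both lists carry the same names, labelled_dfs[["visits"]] always corresponds to raw_dfs[["visits"]].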
My pipeline reads a csv into a dataframe, assigns rownames, removes a column, performs a pca, plots the pca, and extracts the meaningful variables from the pca, which are also plotted.
Here is my current code, which only goes as far as the first plot:
library(ggplot2)
library(ggrepel)
tsv = read.csv('matrix.tsv', sep='\t')
bell= read.csv('bell.tsv', sep='\t')
tail= read.csv('tail.tsv', sep='\t')
dfList = list(tail, tsv, bell)
#process csv's
dfList = lapply(dfList, function(dum){
rownames(dum) = dum[,1]
dum[,1] = NULL
dum$X = NULL
dum = dum[, -grep('un', colnames(dum))]
})
#create pca's of dataframes
pcaList = lapply(dfList, function(pca){
prin_comp = prcomp(pca, scale. = T)
})
#plot top 2 principal components in the pca
plotList = lapply(pcaList, function(prin_comp){
t = qplot(x=prin_comp$rotation[,1], y=prin_comp$rotation[,2]) + geom_text_repel(aes(label=row.names(prin_comp$rotation)))
})
#this plots the 3 plots, one for each pca, but they are un-named
plotList
The problem is that the plots don't have meaningful names/titles. I don't know how to keep that information present, passed from function to function.
I know there must be a more elegant way of doing this. And I have spent a day reading similar and not so similar questions regarding processing multiple csv files. But either they weren't applicable or didn't work for my case.
And as the title of this question implies, I would prefer to do this on one csv at a time, not all 3 at a time, as the csv's in question are very large, over 5GB each, so keeping each dataframe and pca in memory at the same time is impossible.
You just need to keep a string you want to use as the title somewhere and add ggtitle(YOUR_TITLE) to your plot, but this is not so easy with your current code. Instead of performing each step of the analysis for each CSV before going to the next step, why don't you just perform all steps for one CSV at a time?
Your code could look like:
library(ggplot2)
library(ggrepel)
csvs <- c("matrix.tsv","bell.tsv","tail.tsv")
for (i in csvs) {
# read file
df <- read.csv(i, sep='\t')
# process file
rownames(df) <- df[,1]
df[,1] <- NULL
df$X = NULL
df = df[, -grep('un', colnames(df))]
# create pca
pca <- prcomp(df, scale = T)
# plot pca
pcaPlot <- qplot(x=pca$rotation[,1], y=pca$rotation[,2]) +
geom_text_repel(aes(label=row.names(pca$rotation))) +
ggtitle(i)
print(pcaPlot)
# extract and plot meaningful variables
# ...
}
Basically, I just put everything you do in the lapply calls inside a for loop. This approach also processes one CSV at a time.
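If the per-file steps grow, one option is to wrap them in a function that returns only the path of the saved plot, so that the data frame and PCA go out of scope after each file. This is a sketch under assumptions: the two small TSV files are generated from mtcars purely for illustration, the function name plot_pca is mine, and base graphics with pdf() replace qplot() to keep the example free of extra packages.

```r
# Hypothetical inputs: two small TSV files written to a temporary directory.
tmp_dir <- tempdir()
paths <- file.path(tmp_dir, c("first.tsv", "second.tsv"))
write.table(mtcars[, 1:4], paths[1], sep = "\t", quote = FALSE, col.names = NA)
write.table(mtcars[, 4:7], paths[2], sep = "\t", quote = FALSE, col.names = NA)

# Read, analyse and plot a single file. Only the output path is returned,
# so the data frame and PCA can be garbage-collected after each call.
plot_pca <- function(path) {
  df <- read.delim(path, row.names = 1)
  pca <- prcomp(df, scale. = TRUE)
  out_file <- sub("\\.tsv$", "_pca.pdf", path)
  pdf(out_file)
  on.exit(dev.off())  # close the device even if plotting fails
  plot(pca$rotation[, 1], pca$rotation[, 2],
       xlab = "PC1", ylab = "PC2", main = basename(path))
  text(pca$rotation[, 1], pca$rotation[, 2],
       labels = rownames(pca$rotation), pos = 3)
  out_file
}

pdf_files <- vapply(paths, plot_pca, character(1))
```

The file name doubles as the plot title here, which keeps the title tied to its source file without passing extra state between steps.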
I generate a bunch of graphs and write them into a list variable, something like the following.
graphsListHolder <- list()
loop around the following code for as many plots as I make
filename <- paste some elements together to create a unique name
graphsListHolder[[filename]] <- p # p is the name of the ggplot plot
I save graphsListHolder as a .rds file.
Later I want to read in the rds file, choose from the plots in graphsListHolder, and display them with grid.arrange. I can hardcode the plot numbers, and the following example works fine when run, plotting two graphs, one on top of the other.
grid.arrange(
graphsListHolder[[3]], graphsListHolder[[5]]
)
But if I construct a character variable temp like this (or variations on this)
temp <- "graphsListHolder[[3]], graphsListHolder[[5]]"
and change the grid.arrange code to
grid.arrange(
temp
)
I get
Error in gList(list("graphsListHolder[[3]], graphsListHolder[[5]]", wrapvp = list( :
only 'grobs' allowed in "gList"
In addition: Warning message:
In grob$wrapvp <- vp : Coercing LHS to a list
I also tried eval(parse(text = temp)) without success.
I'm not sure how you want to choose them, but say you had a vector of the elements you wanted
x <- c(3,5)
Then you could do
grid.arrange(grobs=graphsListHolder[x])
Trying to turn arbitrary strings into executable code usually isn't a good idea. Often there are more "traditional" alternatives in R.
For example
graphsListHolder<-Map(function(x) {
ggplot(data.frame(x=1:10, y=x*1:10)) + geom_point(aes(x,y)) + ggtitle(x)}, 1:5)
x <- c(3,5)
grid.arrange(grobs=graphsListHolder[x])
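More generally, selecting list elements by a vector of names or positions and handing them to a function needs no string evaluation in base R. The sketch below uses plain integer vectors as stand-ins for the plots and rbind() as a stand-in for the consuming function; both are assumptions made so that the example runs without ggplot2 or gridExtra.

```r
# Stand-in for the saved list of plots: any named list will do.
holder <- list(plot_a = 1:3, plot_b = 4:6, plot_c = 7:9)

# Round trip through an .rds file, as in the question.
rds_file <- tempfile(fileext = ".rds")
saveRDS(holder, rds_file)
restored <- readRDS(rds_file)

# Choose elements with a vector of names (or positions), not a string of code.
pick <- c("plot_a", "plot_c")
chosen <- restored[pick]

# do.call() passes the list elements as separate arguments to a function.
combined <- do.call(rbind, chosen)
```

Single brackets keep the result a list, which is what both the grobs argument and do.call() expect; double brackets would extract a single element instead.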
I have a list that contains 75 named matrices. I want to make a plot for each matrix and save each plot under the name of its matrix.
My code makes the plots in a loop and it works: I get 75 correct plots. The problem is that each plot's file name looks like a vector, "c(99,86,94....)", which is too long, and I can't tell which plot is which.
I'm using the code below, which probably isn't the best. I'm a beginner and have been looking for a solution for a week without success.
for (i in ssamblist) {
svg(paste("Corr",i,".svg", sep=""),width = 45, height = 45)
pairs(~CDWA+CDWM+HI+NGM2+TKW+YIELD10+GDD_EA,
data=i,lower.panel=panel.smooth, upper.panel=panel.cor,
pch=0, main=i)
dev.off()}
How can I give each plot its name?
I tried changing "i" to names(i), but then the name was that of the first column, and only one plot was created. I also tried lapply but couldn't get it to work.
PS: the plots are huge, and I have to expand the margins. I'm using RStudio.
Thank you!
Loop over the names of the list rather than its elements, using either a for loop or sapply:
# dummy data
ssamblist <- list(a = mtcars[1:10, 1:4], b = mtcars[11:20, 1:4], c = mtcars[21:30, 1:4])
# using for loop
for(i in names(ssamblist)) {
svg(paste0("Corr_", i, ".svg"))
pairs(ssamblist[[i]], main = i)
dev.off()}
# using apply
sapply(names(ssamblist), function(i){
svg(paste0("Corr_", i, ".svg"))
pairs(ssamblist[[i]], main = i)
dev.off()})
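A variation on the same idea, sketched under a few assumptions: each plot is drawn inside a function that closes its device with on.exit() and returns the file path, so the result is a named vector of output files. The use of pdf() instead of svg(), the temporary output directory, and the two-element dummy list are choices made here for illustration.

```r
# Dummy data: a named list of data frames, as in the answer above.
ssamblist <- list(a = mtcars[1:10, 1:4], b = mtcars[11:20, 1:4])
out_dir <- tempdir()

# Iterate over the names; use each name for the file, the title, and the lookup.
plot_files <- vapply(names(ssamblist), function(nm) {
  path <- file.path(out_dir, paste0("Corr_", nm, ".pdf"))
  pdf(path)
  on.exit(dev.off())  # the device is closed even if pairs() fails
  pairs(ssamblist[[nm]], main = nm)
  path
}, character(1))

plot_files
```

Looping over the elements themselves, as in the original code, hands paste() the whole matrix, which is why the file names came out as "c(99,86,94....)".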
I've made a loop to create multiple boxplots. The thing is, I want to save all the boxplots without overwriting each other. Any suggestions?
This is my current code:
boxplot <- list()
for (x in 1:nrow(checkresults)){
boxplots <- boxplot(PIM[,x], MYC [,x], OBX[,x], WDR[,x], EV[,x],
main=colnames(PIM)[x],
xlab="PIM, MYC, OBX, WDR, EV")
}
Do you want to save them to files, or keep them so you can look at them in different windows?
In the first case, you can call png, pdf, or a similar device function inside your for loop:
for (i in 1:5) {
png(file=paste("plot",i,".png",sep=""))
plot(rnorm(10))
dev.off()
}
If you want to display them in separate windows, just use dev.new:
for (i in 1:5) {
dev.new()
plot(rnorm(10))
}
Just to add to @juba's answer: if you want to save the plots to a multi-page pdf file, then you don't have to use the paste command that @juba suggested. This
pdf("myboxplots.pdf")
for (x in seq_len(nrow(checkresults))){
boxplot(PIM[,x], MYC [,x], OBX[,x], WDR[,x],EV[,x],
main = colnames(PIM)[x],
xlab = "PIM, MYC, OBX, WDR, EV")
}
dev.off()
creates a single multi-page pdf document, where each page is a boxplot. If you want to store the boxplots in separate pdf documents, then use the file=paste command.
First, create a list of the right length - it just makes things easier and is good practice to allocate storage before filling objects in via a loop:
boxplots <- vector(mode = "list", length = nrow(checkresults))
Then we can loop over the data you want, assigning to each component of the boxplots list as we go, using the [[x]] notation:
for (x in seq_along(boxplots)){
boxplots[[x]] <- boxplot(PIM[,x], MYC [,x], OBX[,x], WDR[,x],EV[,x],
main = colnames(PIM)[x],
xlab = "PIM, MYC, OBX, WDR, EV")
}
Before, your code was overwriting the previous boxplot info during subsequent iterations.
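Note that boxplot() returns the summary statistics behind the plot rather than a drawable plot object. As a rough sketch, those stored statistics can be redrawn later with bxp() from the graphics package. The mtcars columns below are stand-ins for PIM, MYC and the other data sets, and the output file name is an assumption.

```r
# Hypothetical data: columns of mtcars stand in for PIM, MYC, and so on.
n_plots <- 3
box_stats <- vector(mode = "list", length = n_plots)

for (x in seq_len(n_plots)) {
  # plot = FALSE computes the statistics without drawing anything.
  box_stats[[x]] <- boxplot(mtcars[, x], mtcars[, x + 1], plot = FALSE)
}

# Later: redraw each stored summary into one multi-page PDF.
out_file <- file.path(tempdir(), "stored_boxplots.pdf")
pdf(out_file)
for (x in seq_along(box_stats)) {
  bxp(box_stats[[x]], main = colnames(mtcars)[x])
}
dev.off()
```

Each list element holds components such as stats, n, and out, so the raw data are not needed again at redraw time.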