Using ggplot2 inside a for loop doesn't show the plots' names
I want to print the plots several to a page and save them.
I have 60 CSV files, each consisting of two columns (the first is the date and the second is SSH) and each with a different number of rows. I list them in a variable called files and then plot them. The for loop produces 60 plots. The problem is how to know the names of those plots so that I can call them when I want, e.g., to print 4 plots on one page, so that at the end I will have 15 pages, each containing 4 plots.
When I used
library(gridExtra)
grid.arrange(p1,p2,p3,p4, nrow=2,ncol=2)
grid.arrange(p5,p6,p7,p8, nrow=2,ncol=2)
and so on up to p60,
it showed no result. The messages said there is no object called p1, p2, p3, p4, ..., p60.
The code is as follows:
files<- list.files("F:/R Practice/time series")
for (i in seq_along(files)){
mydf <- read.csv(files[i], stringsAsFactors=FALSE)
a<- data.frame(as.Date(mydf$date, "%d-%m-%y"),mydf[,-1])
names(a)[1]<- "Date"
names(a)[2]<- "SSH"
b <- zoo(a[,-1],order.by=as.Date(a[,1]))
p<- ggplot(a, aes(x=Date,y=SSH, color=SSH)) +geom_line(colour="darkblue")# +labs(title = "gridcell of (31.25N, 33.25E) ",x="Date", y="SSH")
p<-p+ggtitle(readline(prompt = "enter cell coordinates: "))+xlab("Year")+ylab("weighted average SSH of center cell")
# adding attributes to the plot p
p<-p+theme(axis.title.x=element_text(color = "black",size = 12),
axis.title.y=element_text(color = "black",size = 8),
axis.text.x=element_text(color="black",size=8),
axis.text.y = element_text(color = "black",size = 8),
panel.background = element_rect(colour = "black", size=0.5 ,fill = NA),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
axis.line = element_line(colour = "black",size=1),
legend.position="none" ,
plot.title=element_text(hjust = 0.5,vjust= 0.5,lineheight = .3,face = "bold"))
print(p)
}
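Your code overwrites p on every iteration, so only the last plot survives. Here is simpler code based on an example from the ggplot2 help, but with a loop like yours.
library(ggplot2)
library(dplyr)      # sample_frac(), group_by(), summarise()
library(gridExtra)  # grid.arrange()
df <- data.frame(
  gp = factor(rep(letters[1:3], each = 10)),
  y = rnorm(30)
)
p.list <- list()
for (i in 1:4) {
  # draw a fresh half-sample each time instead of shrinking df itself
  sub <- sample_frac(df, 0.5)
  # group means, drawn as the red points
  ds <- sub %>% group_by(gp) %>% summarise(mean = mean(y))
  p <- ggplot(sub, aes(gp, y)) +
    geom_point() +
    geom_point(data = ds, aes(y = mean), colour = 'red', size = 3)
  # store the plot instead of overwriting it
  p.list[[i]] <- p
}
grid.arrange(p.list[[1]], p.list[[2]], p.list[[3]], p.list[[4]], nrow = 2)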
Note three things:
p.list begins as an empty list.
At the end of each iteration you add the plot to a numbered element of the growing list, i.e. p.list[[i]] <- p.
p.list grows into a list whose numbered elements (1:4) each hold a single plot, each of which is a ggplot object.
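For the original goal of 60 plots at 4 per page, the same list can be handed to gridExtra's marrangeGrob(), which splits a list of plots into pages. This is only a sketch: the random data stands in for the 60 CSV files, and the output file name and page size are assumptions.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical stand-in data: 60 small random series instead of the 60 CSV files.
set.seed(1)
p.list <- vector("list", 60)
for (i in seq_along(p.list)) {
  d <- data.frame(Date = as.Date("2000-01-01") + 0:29, SSH = rnorm(30))
  p.list[[i]] <- ggplot(d, aes(x = Date, y = SSH)) +
    geom_line(colour = "darkblue") +
    ggtitle(paste("plot", i))
}

# 2 x 2 plots per page, so 60 plots fill 15 pages.
pages <- marrangeGrob(grobs = p.list, nrow = 2, ncol = 2)

# Write every page into one multi-page PDF (file name and size are assumptions).
out_file <- file.path(tempdir(), "ssh_plots.pdf")
ggsave(out_file, pages, width = 8, height = 6)
```

Individual plots are still available by position, e.g. p.list[[5]], so no p1 ... p60 variables are needed.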
I'm new to R. I wrote this piece of code for plotting a dataset, giving the plot a meaningful title and saving it as an image:
#Define all the values I need
MyFile <- "Spectrum.csv"
MyTitle <- gsub(".csv", "", MyFile)
MyImage <- gsub(".csv", ".png", MyFile)
MyData <- read.csv(MyFile)
#Select the relevant range in the dataset
MyData_select <- MyData %>%
filter(Wavelength >= 400 & Wavelength <= 1000)
#Plot the dataset
p1 <- ggplot() + geom_line(aes(y = Reflectance, x = Wavelength), size=1, data = MyData_select) +
scale_x_continuous(breaks=seq(400,1000,100)) +
scale_y_continuous(expand = expand_scale()) +
theme(text=element_text(family="Arial"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"),
panel.border = element_rect(colour = "black", fill=NA, size=0.5))
#Give the plot and the axes meaningful titles
p1 + labs(title = MyTitle, x = "Wavelength (nm)", y = "Reflectance (arb. unit)")
#Save the plot as a png image
ggsave(MyImage)
The problem is that I have hundreds of those datasets in one directory and I would like to loop the code above to produce an image with the titled plot for each of them.
I tried to work around something like this:
FileList <- data_frame(filename = list.files())
for (i in FileList) {
#Do the plotting/saving stuff
}
I really cannot find the way to make it work. In particular, I'm not sure how to use "i" in the rest of the code. Any help will be appreciated, and thank you for your patience... it's a steep learning curve.
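# Iterate over the file names themselves; MyFile takes each name in turn
for (MyFile in list.files()){
  print(MyFile)
  ## do other stuff (build MyTitle and MyImage from MyFile, plot, save)
}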
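A fuller version of that loop might look like the sketch below. Inside a for loop a ggplot object is not printed automatically, so the plot is passed to ggsave() explicitly rather than relying on the last plot displayed. The helper name plot_spectrum, the demo directory, the two made-up files and the image size are all assumptions for illustration; the column names Wavelength and Reflectance come from the question.

```r
library(ggplot2)

# Hypothetical helper: plot one spectrum file and save a PNG next to it.
plot_spectrum <- function(csv_path) {
  title    <- sub("\\.csv$", "", basename(csv_path))
  png_path <- sub("\\.csv$", ".png", csv_path)
  spec <- read.csv(csv_path)
  # keep only the relevant wavelength range
  spec <- spec[spec$Wavelength >= 400 & spec$Wavelength <= 1000, ]
  p <- ggplot(spec, aes(x = Wavelength, y = Reflectance)) +
    geom_line() +
    labs(title = title, x = "Wavelength (nm)", y = "Reflectance (arb. unit)")
  # pass the plot explicitly; nothing is auto-printed inside a loop or function
  ggsave(png_path, plot = p, width = 6, height = 4)
  png_path
}

# Demo with two made-up files in a temporary directory.
demo_dir <- file.path(tempdir(), "spectra_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (nm in c("SampleA", "SampleB")) {
  write.csv(data.frame(Wavelength = seq(350, 1050, by = 10),
                       Reflectance = runif(71)),
            file.path(demo_dir, paste0(nm, ".csv")), row.names = FALSE)
}

# Loop over every CSV file and collect the paths of the saved images.
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
saved <- character(0)
for (f in csv_files) saved <- c(saved, plot_spectrum(f))
```

The pattern argument keeps non-CSV files (including the PNGs written on an earlier run) out of the loop.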
TL;DR: I would like to generate a Markdown document with plots of normalized counts for a list of genes. As this list is quite long (> 100 genes), I would like to generate a 4x4 grid of graphs on one figure page for the first 16 genes, then the same for genes 17 to 32, and so on until the end of the list. Currently, my code only displays 16 genes, even though I run the grid.arrange command INSIDE the loop (it worked when I used the "plot" function inside the loop, but that displays only one graph per page, of course).
I'm currently doing some RNAseq analysis and I'm looking for differentially expressed genes (DEGs) between two populations. To have a more visual representation of the DEGs, I would like to plot the normalized counts for each population for some genes of interest (GoI).
However, the list of these GoI can be quite long (e.g., if I focus on DEGs that code for membrane proteins, I have some 159 candidates). I'm able to plot them using a for loop, with the following code (from a first analysis):
# top_gene contains all of the genes of interest
# group_origin contains the factor used to discriminate the cell populations to compare
# dds_g is a dds object from DESeq2 and contains the counts I want to plot
for (i in unique(1:length(top_gene))){
gene <- top_gene[i]
d <- plotCounts(dds_g, gene = gene, intgroup = "group_origin", returnData = TRUE)
b <- ggplot(d, aes(x = group_origin, y = count)) +
stat_boxplot(geom = 'errorbar', aes(colour = factor(group_origin)), width = 0.2) +
geom_boxplot(aes(colour = factor(group_origin)), width = 0.2) +
stat_summary(fun.y=mean, geom="point", shape=17, size=1.5, aes(color=factor(group_origin), fill=factor(group_origin))) +
labs (title = paste0(resg_cb_db$symbol[gene],' (',gene,')'), x = element_blank()) +
theme_bw() +
scale_color_manual(values = mycolors) +
theme(text= element_text(size=10),
axis.text.x = element_text(size = 7.5, angle = 45, hjust = 1),
axis.text.y = element_text(size = 10),
legend.position = "none")
plot(b)
}
Using this approach, I'm able to generate (separately) the plots for all the gene IDs contained in my "top_gene" vector.
However, the Markdown output has one plot per page, which is annoying when you have a high number of genes. I would like to group them, e.g., genes 1 to 16 in the list are plotted together on one page, then 17 to 32, etc.
I've tried par(mfrow) without success. I also tried gridExtra with the following code (from another analysis example):
for (i in unique(1:length(top_gene))){
gene <- top_gene[i]
d <- plotCounts(dds, gene = gene, intgroup = "cell_type", returnData = TRUE)
b <- ggplot(d, aes(x = cell_type, y = count)) +
stat_boxplot(geom = 'errorbar', aes(colour = factor(cell_type)), width = 0.2) +
geom_boxplot(aes(colour = factor(cell_type)), width = 0.2) +
stat_summary(fun.y=mean, geom="point", shape=17, size=1.5, aes(color=factor(cell_type), fill=factor(cell_type))) +
labs (title = paste0(resg$symbol[gene],' (',gene,')'), x = element_blank()) +
theme_bw() +
scale_color_manual(values = mycolors) +
theme(text= element_text(size=7),
axis.text.x = element_text(size = 7, angle = 45, hjust = 1),
axis.text.y = element_text(size = 6),
legend.position = "none")
plot_list[[gene]] <- b
}
t <- length(plot_list)
x <- seq(1, t, by = 15)
for (i in x){
z <- i+1
if (!is.na(x[z])) {
test_list <- plot_list[c(x[i]:x[z])]
do.call(grid.arrange,test_list)
}
else {
z <- length(plot_list)
test_list <- plot_list[c(x[i]:z)]
do.call(grid.arrange,test_list)
}
}
This gives me the error "Error in x[i]:z : argument NA / NaN".
The idea here was to plot a 4x4 page of graphs for every 16 genes. Of course, when it reaches the last group of 16, there are fewer than 16 genes remaining (unless the total number of genes is divisible by 16). That's why the if statement is there to prevent the error (but it's not working).
I've also tried removing this last part and just generating the first 9 x 16 genes of my list, discarding the last ones. It "works" in that I can clearly see the first 16 genes plotted in a 4 x 4 grid, but nothing of the rest.
Why does plotting inside a for loop using "plot(b)" work, while using grid.arrange on a list() created inside a for loop does not?
Very sorry for my code, I know it's not perfect (I'm still learning all of this) but I hope it's clear enough for you to understand my question...
Thanks!
EDIT: I solved the first error by using (i in length(x)). Feel stupid :D. Anyway, it's still only "printing" 16 plots instead of 159...
I think there is some confusion in your for loop. When you write for (i in x), each iteration gets the values in x (1, 16, 31, 46, ...), not the index numbers, so when you set z <- i + 1 you get the values (2, 17, 32, ...). This makes x[i] and x[z] NA for most values of i and z.
The for loop below should fix the indexing and prevent you from going outside the length of plot_list in your edge case.
for (i in 1:length(x)){ #replaced this line
  z <- i+1
  if (!is.na(x[z])) {
    # changed this to make sure x[z] doesn't show up on two plots
    test_list <- plot_list[x[i]:(x[z]-1)]
    do.call(grid.arrange,test_list)
  }
  else {
    z <- length(plot_list)
    test_list <- plot_list[x[i]:z]
    do.call(grid.arrange,test_list)
  }
}
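As an alternative sketch of the same paging idea, the index arithmetic can be left to split(), which groups positions into pages of 16 and handles the shorter last page without a special case. The character strings below are a hypothetical stand-in for the 159 ggplot objects, and per_page is an assumed name.

```r
# Hypothetical stand-in for the list of 159 ggplot objects.
plot_list <- as.list(paste0("plot_", 1:159))

per_page <- 16
# Page number of every plot: 1 for plots 1-16, 2 for plots 17-32, and so on.
page_id <- ceiling(seq_along(plot_list) / per_page)
# For each page, the positions of its plots within plot_list.
page_index <- split(seq_along(plot_list), page_id)

n_pages   <- length(page_index)
last_page <- plot_list[page_index[[n_pages]]]

# With real ggplot objects and gridExtra loaded, each page could be drawn as:
# for (idx in page_index) {
#   gridExtra::grid.arrange(grobs = plot_list[idx], nrow = 4, ncol = 4)
# }
```

With 159 plots this gives ten pages, the last one holding the remaining 15 plots.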
I found out how to estimate the historical variance decomposition for VAR models in R at the link below:
Historical Variance Error Decompotision Daniel Ryback
Daniel Ryback presents the result in an Excel plot, but I wanted to produce it with ggplot, so I wrote some code to do so. However, the plot I got from ggplot is very different from the one shown by Daniel in Excel. I replicated it in Excel and got the same result as Daniel, so it seems there is an error in the way I am preparing the ggplot. Does anyone have a suggestion for arriving at the Excel result?
See below my code
library(vars)
library(ggplot2)
library(reshape2)
# This code is run after running the code developed by Daniel Ryback (linked above), which defines the VARhd function
data(Canada)
ab<-VAR(Canada, p = 2, type = "both")
HD <- VARhd(Estimation=ab)
HD[,,1]
ex <- HD[,,1]
ex1 <- as.data.frame(ex) # transforming the HD matrix into a data frame #
ex2 <- ex1[3:84,1:4] # dropping the first 2 rows as they are NAs #
colnames(ex2) <- c("Employment", "Productivity", "Real Wages", "Unemployment") # renaming columns #
ex2$Period <- 1:nrow(ex2) # creating an id column #
col_id <- grep("Period", names(ex2)) # setting the new variable as id #
ex3 <- ex2[, c(col_id, (1:ncol(ex2))[-col_id])] # moving id variable to the first column #
molten.ex <- melt(ex3, id = "Period") # melting the data frame #
ggplot(molten.ex, aes(x = Period, y = value, fill = variable)) +
geom_bar(stat = "identity") +
guides(fill = guide_legend(reverse = TRUE))
ggplot version
Excel version
The difference is that ggplot2 orders the levels of the variable factor and stacks them in a different order than Excel does. If you reorder the factor before plotting, it will put 'Unemployment' at the bottom and 'Employment' at the top, as in Excel:
molten.ex$variable <- factor(molten.ex$variable, levels = c("Unemployment",
"Real Wages",
"Productivity",
"Employment"))
ggplot(molten.ex, aes(x = Period, y = value, fill = variable)) +
geom_bar(stat = "identity", width = 0.6) +
guides(fill = guide_legend(reverse = TRUE)) +
# Making the R plot look more like excel for comparison...
scale_y_continuous(limits = c(-6,8), breaks = seq(-6,8, by = 2)) +
scale_fill_manual(name = NULL,
values = c(Unemployment = "#FFc000", # yellow
`Real Wages` = "#A4A4A4", # grey
Productivity = "#EC7C30", # orange
Employment = "#5E99CE")) + # blue
theme(rect = element_blank(),
panel.grid.major.y = element_line(colour = "#DADADA"),
legend.position = "bottom",
axis.ticks = element_blank(),
axis.title = element_blank(),
legend.key.size = unit(3, "mm"))
Giving:
To roughly match the excel graph in Daniel Ryback's post:
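The reordering step can be checked without any plotting. The base R sketch below only illustrates how factor() treats the levels argument; the vector names are assumptions. It also shows why the level names must match the column names exactly: a value that is not among the given levels becomes NA.

```r
variable <- c("Employment", "Productivity", "Real Wages", "Unemployment")

# Default: levels are sorted alphabetically.
f_default <- factor(variable)
levels(f_default)

# Explicit levels: the order is whatever is passed in, the values are unchanged.
f_custom <- factor(variable, levels = c("Unemployment", "Real Wages",
                                        "Productivity", "Employment"))
levels(f_custom)

# A misspelled value does not match any level and turns into NA.
f_typo <- factor("Emplyment", levels = c("Unemployment", "Employment"))
is.na(f_typo)
```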
I have a data.frame that I'm trying to plot in a facetted manner with R's ggplot's geom_boxplot:
set.seed(1)
vals <- rnorm(12)
min.vals <- vals-0.5
low.vals <- vals-0.25
max.vals <- vals+0.5
high.vals <- vals+0.25
df <- data.frame(sample=c("c0.A_1","c0.A_2","c1.A_1","c1.A_2","c2.A_1","c2.A_2","c0.B_1","c0.B_2","c1.B_1","c1.B_2","c2.B_1","c2.B_2"),
replicate=rep(c(1,2),6),val=vals,min.val=min.vals,low.val=low.vals,max.val=max.vals,high.val=high.vals,
group=c(rep("A",6),rep("B",6)),cycle=rep(c("c0","c0","c1","c1","c2","c2"),2),
stringsAsFactors = F)
In this example there are two factors which I'd like to facet:
facet.factors <- c("group","cycle")
for(f in 1:length(facet.factors)) df[,facet.factors[f]] <- factor(df[,facet.factors[f]],levels=unique(df[,facet.factors[f]]))
levels.vec <- sapply(facet.factors,function(f) length(levels(df[,f])))
But in other cases I may have only one or more than two factors.
Is there a way to pass to facet_wrap the vector of factors by which to facet and the number of columns?
Here's what I tried, where in addition I created my own colors for each factor level:
library(RColorBrewer,quietly=T)
library(scales,quietly=T)
level.colors <- brewer.pal(sum(levels.vec),"Set2")
require(ggplot2)
ggplot(df,aes_string(x="replicate",ymin="min.val",lower="low.val",middle="val",upper="high.val",ymax="max.val",col=facet.factors,fill=facet.factors))+
geom_boxplot(position=position_dodge(width=0),alpha=0.5,stat="identity")+
facet_wrap(~facet.factors,ncol=max(levels.vec))+
labs(x="Replicate",y="Val")+
scale_x_continuous(breaks=unique(df$replicate))+
scale_color_manual(values=level.colors,labels=unname(unlist(sapply(facet.factors,function(f) levels(df[,f])))),name="factor level")+scale_fill_manual(values=level.colors,labels=unname(unlist(sapply(facet.factors,function(f) levels(df[,f])))),name="factor level")+
theme_bw()+theme(legend.position="none",panel.border=element_blank(),strip.background=element_blank(),axis.title=element_text(size=8))
which obviously throws this error:
Error in combine_vars(data, params$plot_env, vars, drop = params$drop) :
At least one layer must contain all variables used for facetting
Clearly this works:
ggplot(df,aes_string(x="replicate",ymin="min.val",lower="low.val",middle="val",upper="high.val",ymax="max.val",col=facet.factors,fill=facet.factors))+
geom_boxplot(position=position_dodge(width=0),alpha=0.5,stat="identity")+
facet_wrap(group~cycle,ncol=max(levels.vec))+
labs(x="Replicate",y="Val")+
scale_x_continuous(breaks=unique(df$replicate))+
scale_color_manual(values=level.colors,labels=unname(unlist(sapply(facet.factors,function(f) levels(df[,f])))),name="factor level")+scale_fill_manual(values=level.colors,labels=unname(unlist(sapply(facet.factors,function(f) levels(df[,f])))),name="factor level")+
theme_bw()+theme(legend.position="none",panel.border=element_blank(),strip.background=element_blank(),axis.title=element_text(size=8))
But it ignores the colors I'm passing and doesn't add the legend, I imagine since I cannot pass a vector to col and fill in aesthetics, and clearly I have to hard code the facetting.
This doesn't work either for the facetting problem:
ggplot(df,aes_string(x="replicate",ymin="min.val",lower="low.val",middle="val",upper="high.val",ymax="max.val",col=facet.factors,fill=facet.factors))+
geom_boxplot(position=position_dodge(width=0),alpha=0.5,stat="identity")+
facet_wrap(facet.factors[1]~facet.factors[2],ncol=max(levels.vec))+
labs(x="Replicate",y="Val")+
scale_x_continuous(breaks=unique(df$replicate))+
scale_color_manual(values=level.colors,labels=unname(unlist(sapply(facet.factors,function(f) levels(df[,f])))),name="factor level")+scale_fill_manual(values=level.colors,labels=unname(unlist(sapply(facet.factors,function(f) levels(df[,f])))),name="factor level")+
theme_bw()+theme(legend.position="none",panel.border=element_blank(),strip.background=element_blank(),axis.title=element_text(size=8))
So my questions are:
1. Is there a way to pass a vector to facet_wrap?
2. Is there a way to color and fill by a vector of factors rather by single ones?
We cannot specify two colors for coloring/filling a single box, so I suggest pasting the faceting variables together and using the result as the coloring/filling scale:
df$col.fill <- Reduce(paste, df[facet.factors])
The facets argument of facet_wrap accepts either a character vector or a one-sided formula:
facet.formula <- as.formula(paste('~', paste(facet.factors, collapse = '+')))
So the code finally looks like this:
ggplot(df,
aes_string(
x = "replicate", ymin = "min.val", ymax = "max.val",
lower = "low.val", middle = "val", upper = "high.val",
col = "col.fill", fill = "col.fill"
)) +
geom_boxplot(position = position_dodge(width = 0),
alpha = 0.5,
stat = "identity") +
facet_wrap(facet.factors, ncol = max(levels.vec)) +
# alternatively: facet_wrap(facet.formula, ncol = max(levels.vec)) +
labs(x = "Replicate", y = "Val") +
scale_x_continuous(breaks = unique(df$replicate)) +
theme_bw() +
theme(
#legend.position = "none",
panel.border = element_blank(),
strip.background = element_blank(),
axis.title = element_text(size = 8)
)
The legend was not displayed because you added legend.position = "none".
BTW, it would definitely improve readability if you added some spaces and line breaks to your code.
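The two helper steps from this answer can be tried in isolation with base R. This is a small sketch on a made-up three-row data frame (an assumption, not the data from the question); reformulate() is shown as an equivalent base R way to build the one-sided formula.

```r
facet.factors <- c("group", "cycle")

# One-sided formula built from the character vector: ~group + cycle
facet.formula <- as.formula(paste("~", paste(facet.factors, collapse = " + ")))
# Equivalent construction with base R's reformulate()
facet.formula2 <- reformulate(facet.factors)

# Hypothetical miniature data frame with the same two faceting columns.
df <- data.frame(group = c("A", "A", "B"),
                 cycle = c("c0", "c1", "c0"),
                 stringsAsFactors = FALSE)

# Paste the faceting columns together row by row, whatever their number.
df$col.fill <- Reduce(paste, df[facet.factors])
df$col.fill
```

Because both steps only depend on the facet.factors vector, the same code works with one, two or more faceting columns.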
The graph I'm currently trying to make falls a little between two stools. I want to make a histogram that is composed of stacked and labelled boxes. Here's an example of exactly the sort of thing I'm talking about, taken from a recent article in the New York Times:
http://farm8.staticflickr.com/7109/7026409819_1d2aaacd0a.jpg
Is it possible to achieve this using ggplot2?
To amplify the question somewhat, so far what I have is:
dfr <- data.frame(
name = LETTERS[1:26],
percent = rnorm(26, mean=15)
)
ggplot(dfr, aes(x=percent, fill=name)) + geom_bar() +
stat_bin(geom="text", aes(label=name))
...which I'm clearly doing all wrong. Ultimately what I'd ideally like is something along the lines of the manually-modified graph below, with (say) letters A to M filled one shade and N to Z filled another.
http://farm8.staticflickr.com/7116/7026536711_4df9a1aa12.jpg
Here you go!
library(plyr)     # ddply(), mutate(), .()
library(ggplot2)
set.seed(3421)
# added type to mimic which candidate is supported
dfr <- data.frame(
  name = LETTERS[1:26],
  percent = rnorm(26, mean = 15),
  type = sample(c("A", "B"), 26, replace = TRUE)
)
# easier to prepare data in advance. uses two ideas
# 1. calculate histogram bins (quite flexible)
# 2. calculate frequencies and label positions
dfr <- transform(dfr, perc_bin = cut(percent, 5))
dfr <- ddply(dfr, .(perc_bin), mutate,
             freq = length(name), pos = cumsum(freq) - 0.5 * freq)
# start plotting. key steps are
# 1. plot bars, filled by type and grouped by name
# 2. plot labels using name at position pos
# 3. get rid of grid, border, background, y axis text and labels
# stat = "identity" uses freq as the box height; reverse = TRUE stacks the
# boxes bottom-up in data order so that each box sits under its own label
ggplot(dfr, aes(x = perc_bin)) +
  geom_bar(aes(y = freq, group = name, fill = type), stat = "identity",
           position = position_stack(reverse = TRUE), colour = 'gray',
           show.legend = FALSE) +
  geom_text(aes(y = pos, label = name), colour = 'white') +
  scale_fill_manual(values = c('red', 'orange')) +
  theme_bw() + xlab("") + ylab("") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        axis.ticks = element_blank(), panel.border = element_blank(),
        axis.text.y = element_blank())