ggplot: adding a label to a geom_line aes_string - r

I have a for loop plotting 3 geom_lines, how do I add a label/legend so they won't all be 3 indiscernible black lines?
methods.list <- list(rwf,snaive,meanf)
cv.list <- lapply(methods.list, function(method) {
taylor%>% tsCV(forecastfunction = method, h=48)
})
gg <- ggplot(NULL, aes(x))
for (i in seq(1,3)){
gg <- gg + geom_line(aes_string( y=sqrt(colMeans(cv.list[[i]]^2, na.rm=TRUE))))
}
gg + guides(colour=guide_legend(title="Forecast"))
If I don't use a loop, I can use aes instead of that horrible aes_string and then everything works, but I have to write the same code 3 times and replace the loop with this:
gg <- gg + geom_line(aes(y=sqrt(colMeans(cv.list[[1]]^2, na.rm=TRUE)), colour=names(cv.list)[1]))
gg <- gg + geom_line(aes(y=sqrt(colMeans(cv.list[[2]]^2, na.rm=TRUE)), colour=names(cv.list)[2]))
gg <- gg + geom_line(aes(y=sqrt(colMeans(cv.list[[3]]^2, na.rm=TRUE)), colour=names(cv.list)[3]))
and then there are nice automatic colors and legend. What am I missing? Why is r being so noob-unfriendly?

The example is not reproducible (there is no data!), but it seems you have a list cv.list containing multiple data.frames, and you want to plot a summary statistic of each against a common variable stored in x.
The simplest approach is to build a single data.frame and plot from it.
#Create 3 data.frames with data (forecast?)
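df <- lapply(1:3, function(group){
  #index with the function argument `group` (the loop variable `i` does not exist here)
  summ_stat <- sqrt(colMeans(cv.list[[group]]^2, na.rm=TRUE))
  #a factor gives one discrete colour, and therefore one line, per group
  data.frame(summ_stat, group = factor(group), x = x)
})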
#bind the data.frames into a single data.frame
df <- do.call(rbind, df)
#Create the plot
ggplot(data = df, aes(x = x, y = summ_stat, colour = group)) +
geom_line() +
labs(colour = "Forecast")
Note the labs() call: it sets the legend title for colour, the aesthetic mapped inside aes().
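Because the question's data depends on the forecast package, here is a minimal self-contained sketch of the same idea with made-up numbers. The list `rmse.list`, its values, and the column names `h`, `rmse` and `method` are hypothetical stand-ins for the per-method vectors computed from `cv.list`.

```r
library(ggplot2)

# Hypothetical stand-in for sqrt(colMeans(cv.list[[i]]^2, na.rm = TRUE)):
# one numeric vector of errors per forecasting method, over horizons 1..48.
set.seed(42)
h <- 1:48
rmse.list <- list(
  rwf    = 100 + 5 * h + rnorm(48, sd = 10),
  snaive = 150 + 2 * h + rnorm(48, sd = 10),
  meanf  = 400 + rnorm(48, sd = 10)
)

# One data.frame per method, tagged with the method name, then stacked.
long <- do.call(rbind, lapply(names(rmse.list), function(m) {
  data.frame(h = h, rmse = rmse.list[[m]], method = m)
}))

# A single geom_line() draws one coloured line per method and builds the legend.
p <- ggplot(long, aes(x = h, y = rmse, colour = method)) +
  geom_line() +
  labs(x = "Horizon", y = "RMSE", colour = "Forecast")
p
```

Naming the list elements is what makes the legend readable: the names end up in the `method` column, which ggplot uses for the legend keys.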

Related

ggplot in R to add significance asterisk vs control group over multiple variables

I have barplots, but would like to run a wilcox.test within each "grp1", comparing each bar to the control for that group, and then add an asterisk if the difference is significant.
I've seen "compare_means" to get the comparisons, but I'm trying to make it automated and not so manual. Would "geom_signif" or "stat_compare_means" do this? Can someone help with this? Thank you very much.
I need the comparison to be made using the full dataset, not just the means (which is only one value per bar). I added a line at the end of the code running one of the comparisons so you can see where I need the p-values from.
y <- c(runif(100,0,4.5),runif(100,3,6),runif(100,4,7))
grp1 <- sample(c("A","B","C","D"),size = 300, replace = TRUE)
grp2 <- rep(c("High","Med","Contrl"),each=100)
dataset <- data.frame(y,grp1,grp2)
means <- aggregate(y~grp1+grp2,data=dataset,mean)
sd <- aggregate(y~grp1+grp2,data=dataset,function(x){sd(x)})
means.all <- merge(sd,means,by=c("grp1","grp2"))
names(means.all)[3:4] <- c("sd","y.mean")
library(ggplot2)
p<- ggplot(means.all, aes(x=grp1, y=y.mean, fill=grp2))+
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=y.mean-sd, ymax=y.mean+sd), width=.2,
position=position_dodge(.9))
p
compare_means(y~grp2,data = dataset[dataset$grp1=="A",],method="wilcox.test")
Maybe this is not the optimal way but you can create a list splitting the data and applying the stat_compare_means() function individually at each level of your data. After that you can arrange the plots in one using patchwork:
library(ggplot2)
library(ggpubr)
library(patchwork)
#Split data
List <- split(means.all,means.all$grp1)
#Function for plot
myfun <- function(x)
{
#Ref group
rg <- paste0(unique(x$grp1),'.','Contrl')
#Plot
G <- ggplot(x, aes(x=interaction(grp1,grp2), y=y.mean, fill=grp2))+
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=y.mean-sd, ymax=y.mean+sd), width=.2,
position=position_dodge(.9))+
stat_compare_means(ref.group = rg,label = "p.signif",method = "wilcox.test",label.y = 7)+
theme(axis.text.x = element_blank())+
xlab(unique(x$grp1))
return(G)
}
#Apply
Lplot <- lapply(List, myfun)
#Wrap plots
wrap_plots(Lplot,nrow = 1)+plot_layout(guides = 'collect')
Consider this update that takes the values for asterisks stored in a new dataframe:
#Create p-vals dataset
List2 <- split(dataset,dataset$grp1)
#p-val function
mypval <- function(x)
{
y <- compare_means(y~grp2,data = x,method="wilcox.test")
y <- y[,c('group2', 'group1','p.signif')]
names(y)<-c('grp2','grp1','p.signif')
y <- y[y$grp2=='Contrl',]
y$grp2 <- y$grp1
y <- rbind(y,data.frame(grp2='Contrl',grp1='',p.signif=''))
y$grp1 <- unique(x$grp1)
y$y.mean=7
return(y)
}
#Apply
dfpvals <- lapply(List2, mypval)
df <- do.call(rbind,dfpvals)
#Plot
ggplot(means.all, aes(x=grp1, y=y.mean, fill=grp2,group=grp2))+
geom_bar(stat="identity", color="black",
position=position_dodge()) +
geom_errorbar(aes(ymin=y.mean-sd, ymax=y.mean+sd), width=.2,
position=position_dodge(.9))+
geom_text(data=df,aes(x=grp1, y=y.mean,group=grp2,label=p.signif),
position=position_dodge(0.9))

Assigning plot to a variable in a loop

I am trying to create 2 line plots.
But I noticed that using a for loop will generate two plots with y=mev2 (instead of a plot based on y=mev1 and another one based on y=mev2).
The code below shows the observation here.
mev1 <- c(1,3,7)
mev2 <- c(9,8,2)
Period <- c(1960, 1970, 1980)
df <- data.frame(Period, mev1, mev2)
library(ggplot2)
# Method 1: Creating plot1 and plot2 without using "for" loop (hard-code)
plot1 <- ggplot(data = df, aes(x=Period, y=unlist(as.list(df[2])))) + geom_line()
plot2 <- ggplot(data = df, aes(x=Period, y=unlist(as.list(df[3])))) + geom_line()
# Method 2: Creating plot1 and plot2 using "for" loop
for (i in 1:2) {
y_var <- unlist(as.list(df[i+1]))
assign(paste("plot", i, sep = ""), ggplot(data = df, aes(x=Period, y=y_var)) + geom_line())
}
Seems like this is due to some ggplot()'s way of working that I am not aware of.
Question:
If I want to use Method 2, how should I modify the logic?
People said that using assign() is not an "R-style", so I wonder what's an alternate way to do this? Say, using list?
One possible answer that adds no tidyverse commands is:
library(ggplot2)
y_var <- colnames(df)
for (i in 1:2) {
assign(paste("plot", i, sep = ""),
ggplot(data = df, aes_string(x=y_var[1], y=y_var[1 + i])) +
geom_line())
}
plot1
plot2
You may use aes_string. I hope it helps.
EDIT 1
If you want to store your plots in a list, you can use this:
Initialize your list:
n <- 2 # number of plots
list_plot <- vector(mode = "list", length = n)
names(list_plot) <- paste("plot", 1:n)
Fill it:
for (i in 1:2) {
list_plot[[i]] <- ggplot(data = df, aes_string(x=y_var[1], y=y_var[1 + i])) +
geom_line()
}
Display:
list_plot[[1]]
list_plot[[2]]
For lines in different "plots", you can simplify it with facet_wrap():
library(tidyverse)
df %>%
gather(variable, value, -c(Period)) %>% # wide to long format
ggplot(aes(Period, value)) + geom_line() + facet_wrap(vars(variable))
You can also put it in a loop if necessary and store the results in a list:
# empty list
listed <- list()
# fill the list with the plots
for (i in c(2:3)){
listed[[i-1]] <- df[,-i] %>%
gather(variable, value, -c(Period)) %>%
ggplot(aes(Period, value)) + geom_line()
}
# to get the plots
listed[[1]]
listed[[2]]
Why do you want 2 separate plots? The ggplot2 way to do this would be to get the data into long format and then plot.
library(tidyverse)
df %>%
pivot_longer(cols = -Period) %>%
ggplot() + aes(Period, value, color = name) + geom_line()
Here is an alternative approach using a function and lapply. I recognize that you asked how to solve this using a loop. Still, I think it might be useful to consider this approach.
library(ggplot2)
mev1 <- c(1,3,7)
mev2 <- c(9,8,2)
Period <- c(1960, 1970, 1980)
df <- data.frame(Period, mev1, mev2)
myplot <- function(yvar){
plot <- ggplot(df, aes(Period, !!sym(yvar))) + geom_line()
return(plot)
}
colnames <- c("mev1","mev2")
list <- lapply(colnames, myplot)
names(list) <- paste0("plot_", colnames)
# Alternative naming: names(list) <- paste0("plot", 1:2)
Using this approach you can easily apply your plot function to whatever columns you like. You can specify the columns by name, which may be preferable to specifying by position. Plots are saved in a list, and they are named afterwards using the names attribute. In my example I named the plots plot_mev1 and plot_mev2, but you can easily switch to another naming scheme, e.g. write names(list) <- paste0("plot", 1:2) to get plot1 and plot2.
Note that I used !!sym() in the ggplot call. This is essentially an alternative to aes_string, which was used in the answer of Rémi Coulaud. This way ggplot understands, even inside a function or a loop, that "mev1" is a column of your dataset and not just a text string.
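In current ggplot2 releases aes_string() is deprecated, and the `.data` pronoun is another way to refer to a column whose name is held in a string. A minimal sketch, assuming ggplot2 3.0.0 or later and reusing the question's data; the helper name `plot_column` is made up for illustration.

```r
library(ggplot2)

df <- data.frame(Period = c(1960, 1970, 1980),
                 mev1 = c(1, 3, 7),
                 mev2 = c(9, 8, 2))

# Hypothetical helper: yvar is a column name given as a string.
plot_column <- function(yvar) {
  ggplot(df, aes(x = Period, y = .data[[yvar]])) +
    geom_line() +
    labs(y = yvar)  # set the axis title explicitly instead of relying on the default
}

cols <- c("mev1", "mev2")
plots <- lapply(cols, plot_column)
names(plots) <- paste0("plot_", cols)

plots$plot_mev1
plots$plot_mev2
```

Each call to `plot_column` has its own `yvar`, so the two plots cannot end up sharing the same column the way the `for` loop with `y_var` did.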

how to plot multiple plots on ggplots with lapply

library(ggplot2)
x<-c(1,2,3,4,5)
a<-c(3,8,4,7,6)
b<-c(2,9,4,8,5)
df1 <- data.frame(x, a, b)
x<-c(1,2,3,4,5)
a<-c(6,5,9,4,1)
b<-c(9,5,8,6,2)
df2 <- data.frame(x, a, b)
df.lst <- list(df1, df2)
plotdata <- function(x) {
ggplot(data = x, aes(x=x, y=a, color="blue")) +
geom_point() +
geom_line()
}
lapply(df.lst, plotdata)
I have a list of data frames and I am trying to plot the same columns on the same ggplot. I tried the code above, but it seems to return only one plot.
There should be 2 ggplots: one with the "a" column data plotted and the other with the "b" column data plotted, each using both data frames in the list.
I've looked at many examples and it seems that this should work.
They are both plotted. If you are using RStudio, click the back arrow to toggle between the plots. If you want to see them together, do:
library(gridExtra)
do.call(grid.arrange,lapply(df.lst, plotdata))
If you want them on the same plot, it's as simple as:
# colour is set on each layer; passing color= to ggplot() itself has no effect
ggplot(data = df1, aes(x=x, y=a)) +
geom_point(color="blue") +
geom_line(color="blue") +
geom_line(data = df2, aes(x=x, y=a), color="red") +
geom_point(data = df2, aes(x=x, y=a), color="red")
Edit: if you have several of these, you are probably better off combining them into a big data set while keeping the df of origin for use in the aesthetic. Example:
df.lst <- list(df1, df2)
# put an identifier so that you know which table the data came from after rbind
for(i in 1:length(df.lst)){
df.lst[[i]]$df_num <- i
}
big_df <- do.call(rbind,df.lst) # you could also use `rbindlist` from `data.table`
# now use the identifier for the coloring in your plot
ggplot(data = big_df, aes(x=x, y=a, color=as.factor(df_num))) +
geom_point() +
geom_line() + scale_color_discrete(name="which df did I come from?")
#if you wanted to specify the colors for each df, see ?scale_color_manual instead
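If the goal is one panel for `a` and one for `b`, each showing both tables, the combined data frame can be reshaped to long form and faceted. This is a sketch using only base R for the reshaping; `long_df` and the `variable`/`value` column names are choices made here, not part of the original answer.

```r
library(ggplot2)

df1 <- data.frame(x = 1:5, a = c(3, 8, 4, 7, 6), b = c(2, 9, 4, 8, 5))
df2 <- data.frame(x = 1:5, a = c(6, 5, 9, 4, 1), b = c(9, 5, 8, 6, 2))
df.lst <- list(df1, df2)

# Tag each table with its position in the list, then stack them.
big_df <- do.call(rbind, Map(function(d, i) cbind(d, df_num = i),
                             df.lst, seq_along(df.lst)))

# Wide to long: one row per (x, source table, column) combination.
long_df <- rbind(
  data.frame(big_df[c("x", "df_num")], variable = "a", value = big_df$a),
  data.frame(big_df[c("x", "df_num")], variable = "b", value = big_df$b)
)

# One panel per original column, one coloured line per source table.
ggplot(long_df, aes(x, value, colour = factor(df_num))) +
  geom_point() +
  geom_line() +
  facet_wrap(~ variable) +
  labs(colour = "Source table")
```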

ggplot2 multiplot using changing variables

I am trying to create multiple plots using ggplot2 that are then gathered using multiplot. However, when I try to create X graphs I end up with X copies of the same graph.
My problem code pretty much boils down to this, assuming df is the dataframe
library(ggplot2)
i = 1
j = 2
xVar = df[[i]]
yVar = df[[j]]
plot1 = ggplot(data = df, aes(xVar, yVar)) + geom_point(shape=1)
i = 1
j = 3
xVar = df[[i]]
yVar = df[[j]]
plot2 = ggplot(data = df, aes(xVar, yVar)) + geom_point(shape=1)
multiplot(plot1,plot2, cols=2)
At this point plot1 is equal to plot2 and I don't understand why.
My full code if interested:
n = 1
columns = colnames(df)
plots = list()
for(i in 3:7)
{
for(j in (i+1):7)
{
if(j < 8 & i < 7) {
xVar = df[[i]]
yVar = df[[j]]
plots[[n]] = ggplot(data = df, aes(x=xVar, y=yVar)) +
geom_point(shape=1) +
labs(x=columns[[i]], y=columns[[j]]) +
theme(axis.title=element_text(size=8))
n = n + 1
}
}
}
multiplot(plotlist = plots, cols=3)
There are lots of things going on here.
First, it is a really, really, really bad idea to use external variables in calls to aes(...). The arguments to aes(...) are evaluated in the context of the data=... argument, so in the context of df in your case. If that fails they are evaluated in the global environment. So it is highly preferable to do something like this:
gg <- data.frame(x=df[[i]],y=df[[j]])
plots[[n]] = ggplot(data = gg, aes(x,y)) +...
Second, ggplot stores the expressions from aes(...) and evaluates them when the plot is rendered (so, during the call to multiplot(...)). All of your plots use variables named xVar and yVar in aes(...). So when these plots are rendered, ggplot uses whatever is stored in those variables at the time - presumably from the last plot definition. That's why all your plots look like the last one. This is the reference to "lazy evaluation" in the other answer.
On the other hand, ggplot evaluates the data=... argument immediately, and stores the dataset as part of the plot definition (in the gtable). So creating different data frames (called gg above), for each plot will work.
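Applied to the loop in the question, the per-plot data frame approach might look like the sketch below. The data frame `df` here is made up (four numeric columns named `a` to `d`), and the loop bounds are adapted to it rather than to the question's columns 3 to 7.

```r
library(ggplot2)

# Made-up data standing in for the question's df.
set.seed(1)
df <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20), d = rnorm(20))
columns <- colnames(df)

plots <- list()
n <- 1
for (i in 1:(ncol(df) - 1)) {
  for (j in (i + 1):ncol(df)) {
    # Each plot gets its own small data frame, stored when ggplot() is called,
    # so later changes to i and j cannot affect plots that were already built.
    gg <- data.frame(x = df[[i]], y = df[[j]])
    plots[[n]] <- ggplot(gg, aes(x, y)) +
      geom_point(shape = 1) +
      labs(x = columns[i], y = columns[j])
    n <- n + 1
  }
}

length(plots)  # 6 pairs from 4 columns
```

Inside aes() the names `x` and `y` now resolve to columns of the plot's own data, so nothing is looked up in the global environment at render time.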
Finally, it looks like you are trying to create a pairs plot (every column vs. every other column, more or less). Unless this is a homework assignment, there are much easier ways to do this. You could use ggpairs(...) in the GGally package (which uses grid graphics), or you could do it this way using basic ggplot with facets:
# make up some data
set.seed(1) # for reproducible example
df <- data.frame(matrix(rnorm(700),nc=7))
df[4] <- 1+2*df[3] + rnorm(100)
df[5] <- 3*df[3] - 2*df[4] + rnorm(100)
df[6] <- -10*df[5] + rnorm(100)
# you start here...
gg.pairs <- function(data) { # scatterplot matrix using ggplot facets
require(ggplot2)
require(data.table)
require(reshape2) # for melt(...)
DT <- data.table(melt(cbind(id=1:nrow(data),data),id="id"),key="id")
gg <- DT[DT,allow.cartesian=T]
setnames(gg,c("id","H","x","V","y"))
ggplot(gg[as.integer(gg$H)<as.integer(gg$V),], aes(x,y)) +
geom_point(shape=1) +
facet_grid(V~H, scales="free")
}
gg.pairs(df[3:7])
I think that your problem is R's lazy evaluation. What happens is that plot1 and plot2 are not built when you assign them but when you display them, and at that moment there is only one copy (the last one) of xVar and yVar, so the plots are the same.
Well, I can't explain what is happening, but a workaround is to use column names instead of columns, with aes_string. The following makes two unique plots in multiplot for me, and this change could easily be incorporated into your plot loop.
dat = data.frame(x = rnorm(10), y1 = rnorm(10), y2 = rpois(10, 5))
xVar = names(dat)[1]
yVar = names(dat)[2]
plot1 = ggplot(data = dat, aes_string(xVar, yVar)) + geom_point(shape=1)
yVar = names(dat)[3]
plot2 = ggplot(data = dat, aes_string(xVar, yVar)) + geom_point(shape=1)
multiplot(plot1, plot2, cols=2)

ggplot2 Scatter Plot Labels

I'm trying to use ggplot2 to create and label a scatterplot. The variables that I am plotting are both scaled such that the horizontal and the vertical axes are plotted in units of standard deviation (1, 2, 3, 4, etc. from the mean). What I would like to be able to do is label ONLY those elements that are beyond a certain number of standard deviations from the mean. Ideally, this labeling would be based on another column of data.
Is there a way to do this?
I've looked through the online manual, but I haven't been able to find anything about defining labels for plotted data.
Help is appreciated!
Thanks!
BEB
Use subsetting:
library(ggplot2)
x <- data.frame(a=1:10, b=rnorm(10))
x$lab <- letters[1:10]
ggplot(data=x, aes(a, b, label=lab)) +
geom_point() +
geom_text(data = subset(x, abs(b) > 0.2), vjust=0)
The labeling can be done in the following way:
library("ggplot2")
x <- data.frame(a=1:10, b=rnorm(10))
x$lab <- rep("", 10) # create empty labels
x$lab[c(1,3,4,5)] <- LETTERS[1:4] # some labels
ggplot(data=x, aes(x=a, y=b, label=lab)) + geom_point() + geom_text(vjust=0)
Subsetting outside of the ggplot function:
library(ggplot2)
set.seed(1)
x <- data.frame(a = 1:10, b = rnorm(10))
x$lab <- letters[1:10]
x$lab[!(abs(x$b) > 0.5)] <- NA
ggplot(data = x, aes(a, b, label = lab)) +
geom_point() +
geom_text(vjust = 0)
Using qplot:
qplot(a, b, data = x, label = lab, geom = c('point','text'))
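Closer to the question as asked (axes in standard-deviation units, labels taken from another column), the subsetting idea could be sketched as follows. The data, the `name` column, and the cut-off `limit` are all made up for illustration.

```r
library(ggplot2)

set.seed(1)
# Made-up data: two measurements plus a name column to use for labels.
d <- data.frame(name = paste0("obs", 1:50), u = rnorm(50), v = rnorm(50))

# scale() centres each variable and divides by its standard deviation.
d$u_sd <- as.numeric(scale(d$u))
d$v_sd <- as.numeric(scale(d$v))

limit <- 1.5  # hypothetical cut-off, in standard deviations
far <- subset(d, abs(u_sd) > limit | abs(v_sd) > limit)

# All points are drawn; only the rows in `far` receive a text label.
ggplot(d, aes(u_sd, v_sd)) +
  geom_point() +
  geom_text(data = far, aes(label = name), vjust = -0.5)
```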

Resources