Why does R behave differently when parsing parameters of plotting? - r

I am attempting to plot multiple time series variables on a single line chart using ggplot. I am using a data.frame which contains n time series variables and a column of time periods. Essentially, I want to loop through the data.frame and add exactly n geom_line layers to a single chart.
Initially I tried the following code, where:
df = data.frame containing n time series variables, and 1 column of time periods
wid = n (number of time series variables)
p <- ggplot() +
scale_color_manual(values=c(colours[1:wid]))
for (i in 1:wid) {
p <- p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
}
ggplotly(p)
However, this only produces a plot of the final time series variable in the data.frame. I then investigated further and found that the following two sets of code produce completely different results:
p <- ggplot() +
scale_color_manual(values=c(colours[1:wid]))
i = 1
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
i = 2
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
i = 3
p = p + geom_line(aes(x=df$Time, y=df[,i], color=var.lab[i]))
ggplotly(p)
Plot produced by code above
p <- ggplot() +
scale_color_manual(values=c(colours[1:wid]))
p = p + geom_line(aes(x=df$Time, y=df[,1], color=var.lab[1]))
p = p + geom_line(aes(x=df$Time, y=df[,2], color=var.lab[2]))
p = p + geom_line(aes(x=df$Time, y=df[,3], color=var.lab[3]))
ggplotly(p)
Plot produced by code above
In my mind, these two sets of code are identical, so could anyone explain why they produce such different results?
I know this could probably be done quite easily using autoplot, but I am more interested in the behavior of these two snippets of code.

What you're trying to do, adding the lines one at a time in a loop, is a hack and not idiomatic ggplot2. If you do want the loop to work, I'd use aes_string. But it's a hack.
df <- data.frame(Time = 1:20,
Var1 = rnorm(20),
Var2 = rnorm(20, mean = 0.5),
Var3 = rnorm(20, mean = 0.8))
vars <- paste0("Var", 1:3)
col_vec <- RColorBrewer::brewer.pal(3, "Accent")
library(ggplot2)
p <- ggplot(df, aes(Time))
for (i in 1:length(vars)) {
p <- p + geom_line(aes_string(y = vars[i]), color = col_vec[i], lwd = 1)
}
p + labs(y = "value")
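aes_string() has been soft-deprecated since ggplot2 3.0.0, so here is a rough sketch of the same loop written with tidy evaluation instead. It assumes ggplot2 3.0.0 or later; the seed and the legend title "var" are only illustrative. The `!!` operator injects the current value of the loop variable when aes() is called, rather than leaving it to be looked up when the plot is drawn.

```r
library(ggplot2)

set.seed(42)  # illustrative seed, only to make the sketch reproducible
df <- data.frame(Time = 1:20,
                 Var1 = rnorm(20),
                 Var2 = rnorm(20, mean = 0.5),
                 Var3 = rnorm(20, mean = 0.8))
vars <- paste0("Var", 1:3)

p <- ggplot(df, aes(Time))
for (v in vars) {
  # !!v is replaced by the current column name right now, so each layer
  # keeps its own column and its own colour label
  p <- p + geom_line(aes(y = .data[[!!v]], colour = !!v))
}
p <- p + labs(y = "value", colour = "var")
```

Printing p should then show one line per column. Without the `!!`, every layer would look up v only at draw time and find its final value.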
How to do it properly
To build this plot properly, you need to pivot the data first, so that each aesthetic (aes) is mapped to a variable in your data frame. That means we need a single variable in our data frame to map to color. Hence, we pivot_longer and plot again:
library(tidyr)
df_melt <- pivot_longer(df, cols = Var1:Var3, names_to = "var")
ggplot(df_melt, aes(Time, value, color = var)) +
geom_line(lwd = 1) +
scale_color_manual(values = col_vec)

Related

Represent dataset in column bar in R using ggplot [duplicate]

I have a csv file which looks like the following:
Name,Count1,Count2,Count3
application_name1,x1,x2,x3
application_name2,x4,x5,x6
The x variables represent numbers and the applications_name variables represent names of different applications.
Now I would like to make a barplot for each row by using ggplot2. The barplot should have the application_name as title. The x axis should show Count1, Count2, Count3 and the y axis should show the corresponding values (x1, x2, x3).
I would like to have a single barplot for each row, because I have to store the different plots in different files. So I guess I cannot use "melt".
I would like to have something like:
for each row in rows {
print barplot in file
}
Thanks for your help.
You can use melt to rearrange your data and then use either facet_wrap or facet_grid to get a separate plot for each application name
library(ggplot2)
library(reshape2)
# example data
mydf <- data.frame(name = paste0("name",1:4), replicate(5,rpois(4,30)))
names(mydf)[2:6] <- paste0("count",1:5)
# rearrange data
m <- melt(mydf, id.vars = "name")
# if you are wanting to export each plot separately
# I used facet_wrap as a quick way to add the application name as a plot title
# unique() works whether name is a factor or a character column (the default since R 4.0)
for(i in unique(m$name)) {
p <- ggplot(subset(m, name==i), aes(variable, value, fill = variable)) +
facet_wrap(~ name) +
geom_bar(stat="identity", show.legend=FALSE)
ggsave(paste0("figure_",i,".pdf"), p)
}
# or all plots in one window
ggplot(m, aes(variable, value, fill = variable)) +
facet_wrap(~ name) +
geom_bar(stat="identity", show.legend=FALSE)
I didn't see @user20650's nice answer before preparing this. It's almost identical, except that I use plyr::d_ply to save things instead of a loop. I believe dplyr::do() is another good option (you'd group_by(Name) first).
yourData <- data.frame(Name = sample(letters, 10),
Count1 = rpois(10, 20),
Count2 = rpois(10, 10),
Count3 = rpois(10, 8))
library(reshape2)
yourMelt <- melt(yourData, id.vars = "Name")
library(ggplot2)
# Test on one piece to develop the graph
oneName <- as.character(yourData$Name[1])
ggplot(subset(yourMelt, Name == oneName), aes(x = variable, y = value)) +
geom_bar(stat = "identity") +
labs(title = oneName)
# Wrap it up, with saving to file
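bp <- function(dat) {
# each piece holds a single Name, so take it once for the title and the file name
plotName <- as.character(dat$Name[1])
myPlot <- ggplot(dat, aes(x = variable, y = value)) +
geom_bar(stat = "identity") +
labs(title = plotName)
ggsave(filename = paste0("path/to/save/", plotName, "_plot.pdf"),
myPlot)
}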
library(plyr)
d_ply(yourMelt, .variables = "Name", .fun = bp)

ggplot why does adding a new element overwrite colour inside loop but not outside

I am writing a function to plot n series of data. My idea was to loop through each series and add a new geom_smooth on each iteration. It works when I "loop" by hand, but inserting it into an actual loop overwrites the colour aesthetic.
Data I am working with, the idea is to be able to have n number of columns:
data
Using the following lines I get the desired result:
gene_list <- c("tetA", "tet.W.")
gg <- ggplot()
gg <- gg + geom_smooth(data=df_analysis_summed,
aes(x=as.Date(dato), y=!!sym(gene_list[1]), linetype = oua_2, colour = gene_list[1] ),
method="auto", se=F)
gg <- gg + geom_smooth(data=df_analysis_summed,
aes(x=as.Date(dato), y=!!sym(gene_list[2]), linetype = oua_2, colour = gene_list[2] ),
method="auto", se=F)
gg + labs(colour = "gene")
I then try to add the functionality to a loop:
plot_genes_scat_smooth <- function (df,gene_list) {
plot <- ggplot()
for (gene_index in 1:length(gene_list)) {
print(gene_index)
print(gene_list[gene_index])
plot <- plot +
geom_smooth(data=df, aes(x=as.Date(dato), y=!!sym(gene_list[gene_index]), linetype=oua_2, colour = gene_list[gene_index]), method="auto", se=F)#+
#geom_point(data=df,aes(x=as.Date(dato), y=!!sym(gene), colour = gene, shape = oua_2))
}
plot
}
genes = c("tetA", "tet.W.")
plot_genes_scat_smooth(df_analysis_summed,gene_list = genes)
Using the function I get the following result:
It would seem that the colour aes of the first line is overwritten by the second call when I implement it as a function. How can that be?
Instead of trying to add multiple lines one by one onto the plot, reshape the data: ggplot2 works best with data in a long format. In this case the data is pivoted so that there is one column for gene type and one column for the corresponding value.
#create some data
set.seed(1)
dato<-seq.Date(as.Date("2018-04-25"), length.out = 14, by="1 day")
oua_2<-rep(c(0, 1), 7)
tetA<-rnorm(14, 0.04, 0.02)
tet.W<-rnorm(14, 0.2, 0.02)
df_analysis_summed <- data.frame(dato, oua_2, tetA, tet.W)
#convert the data frame to long
library(tidyr)
df_analysis_long <- df_analysis_summed %>% pivot_longer(starts_with("tet"), names_to = "genes", values_to = "value")
#function to plot
plot_genes_scat_smooth_long <- function (df) {
plot <- ggplot()
plot <- plot +
geom_smooth(data=df, aes(x=as.Date(dato), y=value, linetype=as.factor(oua_2), colour = genes), method="auto", se=F)
plot
}
plot_genes_scat_smooth_long(df_analysis_long)
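As for why the loop in the question loses its colours: in that code y is injected immediately with `!!sym(...)`, but `colour = gene_list[gene_index]` is stored as an unevaluated expression and only looked up when the plot is drawn, by which time gene_index holds its last value. A minimal sketch of a loop that injects both follows. The function name plot_genes_loop is hypothetical, and geom_line stands in for geom_smooth only to keep the example short.

```r
library(ggplot2)

set.seed(1)
df_analysis_summed <- data.frame(
  dato  = seq.Date(as.Date("2018-04-25"), length.out = 14, by = "1 day"),
  oua_2 = rep(c(0, 1), 7),
  tetA  = rnorm(14, 0.04, 0.02),
  tet.W = rnorm(14, 0.2, 0.02)
)

# hypothetical helper: one layer per gene, added in a loop
plot_genes_loop <- function(df, gene_list) {
  gg <- ggplot()
  for (gene in gene_list) {
    # !! injects the current value of gene into both y and colour,
    # so nothing is left to be looked up after the loop has finished
    gg <- gg +
      geom_line(data = df,
                aes(x = dato, y = .data[[!!gene]],
                    linetype = factor(oua_2), colour = !!gene))
  }
  gg + labs(colour = "gene", linetype = "oua_2")
}

p <- plot_genes_loop(df_analysis_summed, c("tetA", "tet.W"))
```

The long-format version above is still the more idiomatic route. This only shows that the loop can be made to work once every loop-dependent value is injected.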

Adding point to a facet

First, the libraries
library(tidyr)
library(leaps)
library(ggplot2)
library(ggdark)
The value of the model
set.seed(1)
X = rnorm(100)
e = rnorm(100)
Y = 8 + 7*X + 2.5*X^2 - 9*X^3 + e
Fitting
data.all = data.frame(Y,X)
regfit.full = regsubsets(Y~poly(X,10,raw=T), data=data.all, nvmax=10)
(reg.summary = summary(regfit.full))
Then I get the position of the minimum value of each statistic
(reg.min.cp = which.min(reg.summary$cp))
(reg.min.bic = which.min(reg.summary$bic))
(reg.min.adjr2 = which.min(reg.summary$adjr2))
Creating the data frame for plot
df = data.frame(reg.summary$cp, reg.summary$bic, reg.summary$adjr2)
df$rownum = 1:nrow(df)
Reshaping the data frame
molten = df %>% gather(variable, value, reg.summary.cp:reg.summary.adjr2 )
Plotting with facets
(lp = molten %>% ggplot(data=.) +
aes(x=rownum, y=value) +
geom_line(col="black") +
geom_point(data=molten, aes(xint=reg.min.adjr2, z="reg.summary.adjr2", col="red")) + # this is where I got the wrong plot
facet_wrap(~variable, scales="free_y")
)
And it shows the wrong plot. What I expect is that geom_point(data=molten, aes(xint=reg.min.adjr2, z="reg.summary.adjr2", col="red")) will add just one point, reg.min.adjr2, and only to the reg.summary.adjr2 facet.
How can I do that?
I got the idea from these two SO questions:
How to add different lines for facets
Add a segment only to one facet using ggplot2
What I did was first create a new data frame holding the min values for cp, bic, and adjr2, and then add those points to the main plot.
I made sure that x is the rownum and y is the min value. I also added a variable column to min_plot so that each point is drawn in the right facet.
min_plot = data.frame(
rownum=c(reg.min.cp, reg.min.bic, reg.min.adjr2),
y = c(reg.summary$cp[reg.min.cp], reg.summary$bic[reg.min.bic], reg.summary$adjr2[reg.min.adjr2]),
variable=c("reg.summary.cp", "reg.summary.bic", "reg.summary.adjr2"))
(lp = molten %>% ggplot(data=.)
+ aes(x=rownum, y=value)
+ geom_line(col="black")
+ facet_wrap(~variable, scales="free_y")
+ geom_point(data = min_plot, aes(x=rownum, y=y), col="red")
)

Cannot overlay multiple stat_function with ggplot2

I have a table with a binning variable VAR2_BY_NS_BIN and an x-y data pair (MP_BIN,CORRECT_PROP). I want to plot the binned data points and also draw a different line for each bin using stat_function, taking a different reference each time using the for loop.
library(data.table)
library(ggplot2)
test_tab <- data.table(VAR2_BY_NS_BIN=c(0.0005478, 0.0005478, 0.002266, 0.002266, 0.006783, 0.006783, 0.020709, 0.020709, 0.142961, 0.142961),
MP_BIN=rep(c(0.505, 0.995), 5),
CORRECT_PROP=c(0.5082, 0.7496, 0.5024, 0.8627, 0.4878, 0.9368, 0.4979, 0.9826, 0.4811, 0.9989))
VAR2_BIN <- sort(unique(test_tab$VAR2_BY_NS_BIN)) #get unique bin values
LEN_VAR2_BIN <- length(VAR2_BIN) #get number of bins
col_base <- c("#FF0000", "#BB0033", "#880088", "#3300BB", "#0000FF") #mark bins with different colours
p <- ggplot(data = test_tab)
for (i in 1:LEN_VAR2_BIN) {
p <- p + geom_point(data = test_tab[test_tab$VAR2_BY_NS_BIN==VAR2_BIN[i],],
aes(x = MP_BIN, y = CORRECT_PROP),
col = col_base[i],
alpha = 0.5) +
stat_function(fun = function(t) {VAR2_BIN[i]*(t-0.5)+0.5}, col = col_base[i])
}
p <- p + xlab("MP") + ylab("Observed proportion")
print(p)
The above code (a reproducible example), however, always returns a plot with only the last stat_function line drawn (which is the 5th line in the above case).
The following code (without using the for loop) works, but I actually have a large number of bins, so it is not very feasible...
p <- p + stat_function(fun = function(t) {VAR2_BIN[1]*(t-0.5)+0.5}, col = col_base[1])
p <- p + stat_function(fun = function(t) {VAR2_BIN[2]*(t-0.5)+0.5}, col = col_base[2])
p <- p + stat_function(fun = function(t) {VAR2_BIN[3]*(t-0.5)+0.5}, col = col_base[3])
p <- p + stat_function(fun = function(t) {VAR2_BIN[4]*(t-0.5)+0.5}, col = col_base[4])
p <- p + stat_function(fun = function(t) {VAR2_BIN[5]*(t-0.5)+0.5}, col = col_base[5])
Thanks in advance!
You don't need a for loop or stat_function. To plot the points, just map MP_BIN and CORRECT_PROP to x and y and the points can be plotted with a single call to geom_point. For the lines, you can create the necessary values on the fly (as done in the code below) and plot those with geom_line.
library(tidyverse)
ggplot(test_tab %>% mutate(model=VAR2_BY_NS_BIN*(MP_BIN - 0.5) + 0.5),
aes(x=MP_BIN, colour=factor(VAR2_BY_NS_BIN))) +
geom_point(aes(y=CORRECT_PROP)) +
geom_line(aes(y=model)) +
labs(colour="VAR2_BY_NS_BIN") +
guides(colour=guide_legend(reverse=TRUE))
In terms of the problem you were having with the for loop, what's going on is that ggplot doesn't actually evaluate the loop variable (i) until you print the plot. The value of i is 5 at the end of the loop when the plot is printed, so that's the only line you get. You can find several questions related to this issue on Stack Overflow. Here's one of them.
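If you do want one stat_function per bin, one way around the late evaluation is to build the layers with lapply, so that each function gets its own copy of i. This is only a sketch under that assumption: line_layers is an illustrative name, and the two-row data frame exists only to give the x axis its range.

```r
library(ggplot2)

VAR2_BIN <- c(0.0005478, 0.002266, 0.006783, 0.020709, 0.142961)
col_base <- c("#FF0000", "#BB0033", "#880088", "#3300BB", "#0000FF")

# each call of the anonymous function has its own environment, so the
# inner function(t) sees the i it was created with, not the last one
line_layers <- lapply(seq_along(VAR2_BIN), function(i) {
  force(i)
  stat_function(fun = function(t) VAR2_BIN[i] * (t - 0.5) + 0.5,
                colour = col_base[i])
})

# a list of layers can be added to a plot in one go
p <- ggplot(data.frame(MP_BIN = c(0.505, 0.995)), aes(x = MP_BIN)) +
  line_layers +
  xlab("MP") + ylab("Observed proportion")
```

Printing p should draw five lines with five different slopes, because no layer depends on a shared loop variable any more.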

How to support loop drawing in ggplot2?

data <- data.frame(a=1:10, b=1:10 * 2, c=1:10 * 3)
library(ggplot2)
p <- ggplot(NULL, aes(x = 1:10))
# Using for loop will cause the plot only to draw the last line.
for (i in names(data)){
p <- p + geom_line(aes(y = data[[i]], colour = i))
}
# The lines below work fine.
# p <- p + geom_line(aes(y = data[["a"]], colour = "a"))
# p <- p + geom_line(aes(y = data[["b"]], colour = "b"))
# p <- p + geom_line(aes(y = data[["c"]], colour = "c"))
print(p)
Why doesn't plotting in a loop work as expected?
Is this a lazy plotting method?
You don't actually have to loop to get your lines. You just need to reshape your data and actually include x in your data frame. Your data is wide, and ggplot2 likes long data. This is how you can easily make multiple lines in a single plot.
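As an aside, your loop does add all three layers to p. The problem is that aes() does not evaluate data[[i]] and i when the layer is created; it stores the expressions and evaluates them when the plot is printed. By then i is "c", so every layer draws the same line in the same colour, and it looks as if only the last line was drawn.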
library(ggplot2)
library(tidyr)
data <- data.frame(x = 1:10, a=1:10, b=1:10 * 2, c=1:10 * 3)
df <- gather(data, name, value, -x)
ggplot(df, aes(x = x, y = value, color = name)) +
geom_line()
