R: Error looping variable in ggplot2 geom_line

I am trying to add multiple lines to a plot using a for loop, with i as the index into my data. For some reason, the ggplot part now fails with an error saying that object i is not found ("Error in as.name(names(data$SURF)[i]) : object 'i' not found"). It seems that only variables belonging to the data frame passed to ggplot are accepted. I have used other variables in ggplots before, though, so I don't know why it doesn't work now. This is my code:
library(ggplot2)
#creating object 'data' for example purpose
surf <- list()
day <- c(1:7)
for(i in 1:5){
  surf[[i]] <- data.frame(day*i, day-i)
  names(surf)[i] <- paste("var",i,sep = "")
  colnames(surf[[i]]) <- c("T0", "whatever")
}
data <- list(surf)
names(data)[1] <- "SURF"
df <- data.frame(day)
ret <- ggplot(df, aes(x=day))
for(i in 1:length(names(data$SURF))){
  df[,i+1] <- data$SURF[[i]]$T0
  colnames(df)[i+1] <- names(data$SURF)[i]
  ret <- ret + geom_line(data = df, aes(y=as.name(names(data$SURF)[i]), colour= names(data$SURF)[i]))
}
I managed to work around the problem, though not in a fully satisfying way, by not using new variables in the plotting part. I am not content with this solution, because I want to keep the code more automated. This is the 'dirty' solution:
df <- data.frame(day)
ret <- ggplot(df, aes(x=day))
for(i in 1:length(names(data$SURF))){
  df$y <- data$SURF[[i]]$T0
  df$name <- names(data$SURF[i])
  ret <- ret + geom_line(data = df, aes(y=y, colour= name))
}
I'd be grateful if someone could help me figure out why the use of 'external' variable i does not work in this example.

Thanks for the help. I see now that I was thinking in the wrong direction and making it far too complicated. The melt() function from the reshape2 package gives me everything I need for this purpose. For anyone interested, here is how I solved my problem:
library(reshape2)
library(ggplot2)
df <- data.frame(day)
for(i in 1:length(data$SURF)){
  df[,i+1] <- data$SURF[[i]]$T0
}
colnames(df) <- c("day", names(data$SURF))
mlt <- melt(data = df, id.vars = "day")
ret <- ggplot(mlt) +
  aes(x=day, y=value, group=variable, colour = variable) +
  geom_line()
print(ret)
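As for why the loop variable fails: aes() does not evaluate its arguments when a layer is added. It stores the expressions and evaluates them only when the plot is built (printed or saved), by which point the loop has finished, so depending on where the plot is built, i is either not found or holds its final value for every layer. If you prefer to keep the loop, the sketch below, assuming ggplot2 3.0.0 or later (which supports the !! operator inside aes()), injects the current column name into each layer at the moment it is created. The surf list here is a hypothetical stand-in for data$SURF.

```r
library(ggplot2)

# Hypothetical stand-in for data$SURF: five data frames, each with a T0 column
day <- 1:7
surf <- lapply(1:5, function(k) data.frame(T0 = day * k, whatever = day - k))
names(surf) <- paste0("var", 1:5)

# Wide data frame with one column per series, as in the question
df <- data.frame(day = day)
for (nm in names(surf)) {
  df[[nm]] <- surf[[nm]]$T0
}

ret <- ggplot(df, aes(x = day))
for (nm in names(surf)) {
  # !! inserts the current string into the mapping now, so each layer keeps
  # its own column instead of reading nm when the plot is finally built
  ret <- ret + geom_line(aes(y = .data[[!!nm]], colour = !!nm))
}
print(ret)
```

The long-format approach with melt() is usually simpler, since a single geom_line() with colour = variable replaces the loop entirely.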

Related

Plot all columns from a data.frame in a subplot with ggplot2

As the title suggests, I want to plot all columns from my data.frame, but I want to do it in a generic way. All my columns are factors.
Here is my code so far:
nums <- sapply(train_dataset, is.factor) #Select factor columns
factor_columns <- train_dataset[ , nums]
plotList <- list()
for (i in c(1:NCOL(factor_columns))){
  name = names(factor_columns)[i]
  p <- ggplot(data = factor_columns) + geom_bar(mapping = aes(x = name))
  plotList[[i]] <- p
}
multiplot(plotList, cols = 3)
where multiplot function came from here: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
And my dataset came from Kaggle (house pricing prediction): https://www.kaggle.com/c/house-prices-advanced-regression-techniques
What I get from my code is the image below, which appears to be the last column badly represented.
This would be the last column well represented:
EDIT:
Using gridExtra, as @LAP suggested, doesn't give me a good result either. I use this instead of multiplot:
nCol <- floor(sqrt(length(plotList)))
do.call("grid.arrange", c(plotList, ncol=nCol))
but what I get is this:
Again, only SaleCondition is plotted, and not very well.
PS: I also tried cowplot, with the same result.
Using tidyr you can do something like the following:
library(tidyr)
library(ggplot2)
factor_columns %>%
  gather(factor, level) %>%
  ggplot(aes(level)) + geom_bar() + facet_wrap(~factor, scales = "free_x")
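In the question's loop, aes(x = name) maps the character string stored in name rather than the column that string names, and because aes() is only evaluated when the plots are drawn, that string is the last column name (SaleCondition) in every plot. A sketch of two fixes, assuming ggplot2 3.0.0 or later and tidyr 1.0.0 or later: inject the column name with .data[[!!name]] inside the loop, or reshape with pivot_longer(), which has superseded gather(). The two-column factor_columns data frame below is a hypothetical stand-in for the Kaggle data.

```r
library(ggplot2)
library(tidyr)

# Hypothetical stand-in for factor_columns (all columns are factors)
factor_columns <- data.frame(
  Street        = factor(c("Pave", "Pave", "Grvl", "Pave")),
  SaleCondition = factor(c("Normal", "Abnorml", "Normal", "Partial"))
)

# Loop version: !! fixes each column name when the plot is created
plotList <- list()
for (name in names(factor_columns)) {
  plotList[[name]] <- ggplot(factor_columns, aes(x = .data[[!!name]])) +
    geom_bar() +
    xlab(name)
}

# Long-format version: one row per (column, value) pair, one facet per column
long <- pivot_longer(factor_columns, cols = everything(),
                     names_to = "factor", values_to = "level")
p_all <- ggplot(long, aes(level)) +
  geom_bar() +
  facet_wrap(~factor, scales = "free_x")
```

The list can then be drawn with gridExtra::grid.arrange(grobs = plotList, ncol = 2), while p_all shows every column as a facet of a single plot.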

creating a subset of data frame when running a loop

I'm quite new to R and still finding my way around. I have created a new data frame based on the "original" data frame.
library(dplyr)
prdgrp <- as.vector(mth['MMITCL'])
prdgrp %>% distinct(MMITCL)
When I do this, the result is a list of the unique values of the column MMITCL. I would like to use this data in a loop that first creates a new subset of the original data and then prints a graph based on it:
#START LOOP
for (i in 1:length(prdgrp))
{
  # mth[c(MMITCL==prdgrp[i],]
  mth_1 <- mth[c(mth$MMITCL==prdgrp[i]),]
  # Development of TPC by month
  library(ggplot2)
  library(scales)
  ggplot(mth_1, aes(Date, TPC_MTD))+ geom_line()
}
# END LOOP
Doing this gives me the following error message:
Error in mth$MMITCL == prdgrp[i] :
comparison of these types is not implemented
In addition: Warning:
In `[.data.frame`(mth, c(mth$MMITCL == prdgrp[i]), ) :
Incompatible methods ("Ops.factor", "Ops.data.frame") for "=="
What am I doing wrong?
If you just want to plot the outputs, there is no need to subset the data frame; it is simpler to put ggplot in a loop (or, more likely, to use facet_wrap). Without seeing your data it is a bit hard to give you a precise answer, but there are two generic iris examples below. Hopefully they will also show where you went wrong in subsetting your data frame. Please let me know if you have any questions.
library(ggplot2)
#looping example
for(i in 1:length(unique(iris$Species))){
  g <- ggplot(data = iris[iris$Species == unique(iris$Species)[i], ],
              aes(x = Sepal.Length,
                  y = Sepal.Width)) +
    geom_point()
  print(g)
}
#facet_wrap example
g <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point() +
facet_wrap(~Species)
g
However, if you need to save the data frames for later use, one option is to put them into a list. If you only need the data frame within the loop, you can drop the list and use whatever variable name you wish.
myData4Later <- list()
for(i in 1:length(unique(iris$Species))){
  myData4Later[[i]] <- iris[iris$Species == unique(iris$Species)[i], ]
  g <- ggplot(data = myData4Later[[i]],
              aes(x = Sepal.Length,
                  y = Sepal.Width)) +
    geom_point()
  print(g)
}
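The error message itself points at prdgrp[i]. Since distinct() works on prdgrp and R dispatches to Ops.data.frame, prdgrp is a one-column data frame, so prdgrp[i] is a data frame rather than a single value, and 1:length(prdgrp) loops over columns, not over product groups. In addition, a ggplot object created inside a for loop is not drawn unless it is printed. A sketch that loops over the distinct values themselves, assuming a hypothetical mth with the three columns used in the question:

```r
library(ggplot2)

# Hypothetical stand-in for mth: a product group, a date and a value to plot
mth <- data.frame(
  MMITCL  = factor(rep(c("A", "B", "C"), each = 4)),
  Date    = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 4), 3),
  TPC_MTD = c(1:4, 5:8, 9:12)
)

# unique() on the column itself (not on a one-column data frame) gives plain values
prdgrp <- unique(mth$MMITCL)

subsets <- list()
for (grp in as.character(prdgrp)) {
  mth_1 <- mth[mth$MMITCL == grp, ]   # rows for the current product group
  subsets[[grp]] <- mth_1             # keep the subset for later use
  g <- ggplot(mth_1, aes(Date, TPC_MTD)) + geom_line() + ggtitle(grp)
  print(g)                            # plots are not auto-printed inside a loop
}
```

If you only need the data frames, split(mth, mth$MMITCL) returns the same kind of named list of subsets in a single call.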

Function for formatting and plotting in R

I am currently trying to create a function that formats my data properly and returns a sorted bar plot. For some reason, though, I keep getting this error:
Error in `$<-.data.frame`(`*tmp*`, "Var1", value = integer(0)) :
replacement has 0 rows, data has 3
I have tried debugging it, but have had no luck. I have an example of what I expect down at the bottom. Can anyone spot what I am doing wrong?
x <- rep(c("Mark","Jimmy","Jones","Jones","Jones","Jimmy"),2)
y <- rnorm(12)
df <- data.frame(x,y)
plottingfunction <- function(data, name,xlabel,ylabel,header){
  newDf <- data.frame(table(data))
  order <- newDf[order(newDf$Freq, decreasing = FALSE), ]$Var1
  newDf$Var1 <- factor(newDf$Var1,order)
  colnames(newDf)[1] <- name
  plot <- ggplot(newDf, aes(x=name, y=Freq)) +
    xlab(xlabel) +
    ylab(ylabel) +
    ggtitle(header) +
    geom_bar(stat="identity", fill="lightblue", colour="black") +
    coord_flip()
  return(plot)
}
plottingfunction(df$x, "names","xlabel","ylabel","header")
A few comments: your function didn't work because this part isn't correct:
order <- newDf[order(newDf$Freq, decreasing = FALSE), ]$Var1
There is no guarantee that the data frame built inside the function has a column named Var1. What looks like happened is that, while testing your code, you ran:
newDf <- data.frame(table(df$x))
which names the column Var1, because table() has no name to use for an expression like df$x. Inside your function, however, table(data) names the column after the argument, data, so newDf$Var1 is NULL, factor() returns a zero-length result, and the assignment fails with the "replacement has 0 rows" error. To get around this, I would recommend being explicit with your column names. In this example, I used the dplyr package to make my life easier. Following your code and logic, it would look like this:
newDf <- data %>% group_by_(col_name) %>% tally
order <- newDf[order(newDf$n, decreasing = FALSE), col_name][[col_name]]
data[,col_name] <- factor(data[,col_name], order)
Then within your ggplot we can use aes_string to refer to the column name of the data frame instead. So then the whole function would look like this:
library(dplyr)
library(ggplot2)
plottingFunction <- function(data, col_name, xlabel, ylabel, header) {
  # create a data frame with the data that we're interested in;
  # make sure that you preserve the name of the column that you're
  # counting on...
  newDf <- data %>% group_by_(col_name) %>% tally
  order <- newDf[order(newDf$n, decreasing = FALSE), col_name][[col_name]]
  data[,col_name] <- factor(data[,col_name], order)
  plot <- ggplot(data, aes_string(col_name)) +
    xlab(xlabel) +
    ylab(ylabel) +
    ggtitle(header) +
    geom_bar(fill="lightblue", colour="black") +
    coord_flip()
  return(plot)
}
plottingFunction(df, "x", "xlabel","ylabel","header")
Which would have output like:
I think stat="identity" is unnecessary for your plot, since you can use your original data frame directly rather than a transformed one.
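group_by_() and aes_string() have since been deprecated in dplyr and ggplot2. A sketch of the same function without them, assuming ggplot2 3.0.0 or later: base table() counts the values to get the level order, and the .data pronoun looks the column up by the name held in col_name.

```r
library(ggplot2)

plottingFunction <- function(data, col_name, xlabel, ylabel, header) {
  # Count each value and order the levels from least to most frequent
  counts <- sort(table(data[[col_name]]))
  data[[col_name]] <- factor(data[[col_name]], levels = names(counts))

  # .data[[col_name]] refers to the column whose name is stored in col_name
  ggplot(data, aes(x = .data[[col_name]])) +
    geom_bar(fill = "lightblue", colour = "black") +
    xlab(xlabel) +
    ylab(ylabel) +
    ggtitle(header) +
    coord_flip()
}

x <- rep(c("Mark", "Jimmy", "Jones", "Jones", "Jones", "Jimmy"), 2)
df <- data.frame(x = x, y = rnorm(12))
p <- plottingFunction(df, "x", "xlabel", "ylabel", "header")
print(p)
```

Because col_name is a function argument, it still holds the right value when the plot is built later, so no injection with !! is needed here.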

Object created inside function not found by ggplot

I have a csv of time series data for a number of sites that I produce ggplots for, showing changes in means using the changepoint package. I have written a function that takes the csv, performs some calculations to get the means, then loops through the sites producing a plot for each. My problem is that an object created in the for loop isn't found.
A very simplified example is below but produces the same error:
library(ggplot2)
df1 <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-10"),
                             by = "day"),
                  site1 = runif(10),
                  site2 = runif(10),
                  site3 = runif(10))
example <- function(df1){
  sname <- names(df1)[-1]
  for (i in 1:length(sname)){
    df2 <- df1[,c(1, 1+i)]
    df2$label <- factor(rep("ts", times = nrow(df2)))
    plot <- ggplot()+
      geom_point(data=df2, aes(x=date, y=df2[,2]))+
      geom_line(data=df2, aes(x=date, y=df2[,2]))
    sname.i<-sname[i]
    filename<-paste0(sname.i, "-test-plot.pdf")
    ggsave(file=filename, plot)
  }
}
example(df1)
The error I get is: " Error in eval(expr, envir, enclos) : object 'df2' not found"
I'm not quite sure what the problem is, as I have created similar loops that have worked in the past. If I assign a value to i and step through the code within the loop, it works fine. Could it be an environment problem? Or is ggsave doing something wiggy? Any help or pointers gratefully received.
Thanks.
Your problem is not so much your code as the implementation of the ggplot2 package. This package uses nonstandard evaluation, and that can seriously mess up your results.
Take a look at the example code at the end of this post. I create a data frame called df2 with different values in the global environment. If you run your code now, you get plots that look like this:
Note that on the x axis it uses the correct dates, but the values on the y axis are the ones from the data frame df2 in the global environment! So aes() looks for the data in two different places. If you specify the name of a variable as a symbol (date), the function first looks in the data frame specified in the function call. However, an expression like df2[,2] cannot be found in that data frame, as there is no variable with that name. Because of the way the ggplot2 package is constructed (at least in the version used here), R then looks for it in the global environment instead of the calling environment.
As per wici's comment: your best option is probably to use the function aes_string(), as this allows you to pass the aesthetics in character form, and this function evaluates expressions in the correct environment:
plot <- ggplot()+
  geom_point(data=df2, aes_string(x="date", y=sname[i]))+
  geom_line(data=df2, aes_string(x="date", y=sname[i]))
Alternatively, you can get around that by using eval() and parse() like this:
example <- function(df1){
  sname <- names(df1)[-1]
  for (i in 1:length(sname)){
    df2 <- df1[,c(1, 1+i)]
    df2$label <- factor(rep("ts", times = nrow(df2)))
    aesy <- sname[i]
    command <- paste("plot <- ggplot()+
      geom_point(data=df2, aes(x=date, y=",aesy,"))+
      geom_line(data=df2, aes(x=date, y=",aesy,"))")
    eval(parse(text=command))
    sname.i<-sname[i]
    print(plot)
  }
}
If you try that out with the example script below, you'll see that this time you get the correct values displayed. Note that this is a suboptimal solution, as are most solutions involving eval(). I'd go for aes_string() here.
EXAMPLE SCRIPT
library(ggplot2)
df1 <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-10"),
                             by = "day"),
                  site1 = runif(10),
                  site2 = runif(10),
                  site3 = runif(10))
df2 <- data.frame(date = seq(as.Date("2014-10-01"), as.Date("2014-10-10"),
                             by = "day"),
                  site1 = runif(10,10,20),
                  site2 = runif(10,10,20),
                  site3 = runif(10,10,20))
example <- function(df1){
  sname <- names(df1)[-1]
  for (i in 1:length(sname)){
    df2 <- df1[,c(1, 1+i)]
    df2$label <- factor(rep("ts", times = nrow(df2)))
    plot <- ggplot()+
      geom_point(data=df2, aes(x=date, y=df2[,2]))+
      geom_line(data=df2, aes(x=date, y=df2[,2]))
    sname.i<-sname[i]
    print(plot)
  }
}
example(df1)
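aes_string() has since been deprecated, as ggplot2 3.0.0 and later favour tidy evaluation. A sketch of the same function using the .data pronoun, which looks a column up by a name stored in a string. Because this version returns the plots in a list and they are built later, the current site name is injected with !! so that each plot keeps its own column instead of reading the loop variable after the loop has ended. Returning the plots rather than saving them inside the function is a design change made here for illustration.

```r
library(ggplot2)

example <- function(df1) {
  sname <- names(df1)[-1]
  plots <- list()
  for (site in sname) {
    df2 <- df1[, c("date", site)]
    # !! fixes the current site name in the mapping; .data[[...]] then looks
    # the column up inside df2 whenever the plot is built
    plots[[site]] <- ggplot(df2, aes(x = date, y = .data[[!!site]])) +
      geom_point() +
      geom_line()
  }
  plots
}

df1 <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-10"), by = "day"),
                  site1 = runif(10), site2 = runif(10), site3 = runif(10))
plots <- example(df1)
print(plots[["site1"]])
```

To write the PDFs as in the question, loop over the list afterwards: for (nm in names(plots)) ggsave(paste0(nm, "-test-plot.pdf"), plots[[nm]]).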

Histograms using ggplot2 within loop

I would like to create a grid of histograms using a loop and ggplot2. Say I have the following code:
library(gridExtra)
library(ggplot2)
df<-matrix(NA,2000,5)
df[,1]<-rnorm(2000,1,1)
df[,2]<-rnorm(2000,2,1)
df[,3]<-rnorm(2000,3,1)
df[,4]<-rnorm(2000,4,1)
df[,5]<-rnorm(2000,5,1)
df<-data.frame(df)
out<-NULL
for (i in 1:5){
  out[[i]]<-ggplot(df, aes(x=df[,i])) + geom_histogram(binwidth=.5)
}
grid.arrange(out[[1]],out[[2]],out[[3]],out[[4]],out[[5]], ncol=2)
Note that all of the plots appear, but that they all have the same mean and shape, despite having set each of the columns of df to have different means.
It seems to only plot the last plot (out[[5]]), that is, the loop seems to be reassigning all of the out[[i]]s with out[[5]].
I'm not sure why, could someone help?
I agree with @GabrielMagno: faceting is the way to go. But if for some reason you need to work with a loop, either of these will do the job.
library(gridExtra)
library(ggplot2)
df<-matrix(NA,2000,5)
df[,1]<-rnorm(2000,1,1)
df[,2]<-rnorm(2000,2,1)
df[,3]<-rnorm(2000,3,1)
df[,4]<-rnorm(2000,4,1)
df[,5]<-rnorm(2000,5,1)
df<-data.frame(df)
out<-list()
for (i in 1:5){
  x = df[,i]
  out[[i]] <- ggplot(data.frame(x), aes(x)) + geom_histogram(binwidth=.5)
}
grid.arrange(out[[1]],out[[2]],out[[3]],out[[4]],out[[5]], ncol=2)
or
out1 = lapply(df, function(x){
ggplot(data.frame(x), aes(x)) + geom_histogram(binwidth=.5) })
grid.arrange(out1[[1]],out1[[2]],out1[[3]],out1[[4]],out1[[5]], ncol=2)
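Every panel in the question looks the same because aes(x = df[,i]) is only evaluated when grid.arrange() draws the plots, after the loop has finished, so i is 5 for all of them. The answer above avoids this by giving each plot its own data frame. Another sketch, assuming ggplot2 3.0.0 or later, keeps the original data frame and injects the current column name with !! so it is fixed when each plot is created:

```r
library(ggplot2)
library(gridExtra)

set.seed(1)
# Five columns X1..X5 with means 1..5, as in the question
df <- data.frame(sapply(1:5, function(m) rnorm(2000, mean = m, sd = 1)))

out <- list()
for (col in names(df)) {
  # !! inserts the current column name now, not when the plot is drawn
  out[[col]] <- ggplot(df, aes(x = .data[[!!col]])) +
    geom_histogram(binwidth = 0.5)
}
grid.arrange(grobs = out, ncol = 2)
```

Passing the list through the grobs argument also avoids writing out[[1]], out[[2]], ... by hand.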
I would recommend using facet_wrap instead of aggregating and arranging the plots by yourself. It requires you to specify a grouping variable in the data frame that separates the values for each distribution. You can use the melt function from the reshape2 package to create such new data frame. So, having your data stored in df, you could simply do this:
library(ggplot2)
library(reshape2)
ggplot(melt(df), aes(x = value)) +
facet_wrap(~ variable, scales = "free", ncol = 2) +
geom_histogram(binwidth = .5)
That would give you something similar to this:
