R ggplot2: lapply and function not finding object?

I hope I can get a contextual clue as to what is wrong here without providing the data frame (I can if necessary). I want to use lapply to create multiple boxplots across several Y variables against the same X, but I get the following error, even though Termed is definitely in my CMrecruitdat data.frame:
Error in aes_string(x = Termed, y = RecVar, fill = Termed) :
object 'Termed' not found
RecVar <- CMrecruitdat[,c("Req.Open.To.System.Entry", "Req.Open.To.Hire", "Tenure")]
BP <- function (RecVar){
require(ggplot2)
ggplot(CMrecruitdat, aes_string(x=Termed, y=RecVar, fill=Termed))+
geom_boxplot()+
guides(fill=false)
}
lapply(RecVar, FUN=BP)

If you use aes_string, you should pass strings rather than vectors and use strings for all your fields.
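# Keep the grouping column alongside the Y columns
RecDat <- CMrecruitdat[,c("Termed", "Req.Open.To.System.Entry", "Req.Open.To.Hire", "Tenure")]
# yvar is a column name (a string), so it no longer shadows the data frame
BP <- function (yvar){
  require(ggplot2)
  ggplot(RecDat, aes_string(x="Termed", y=yvar, fill="Termed"))+
    geom_boxplot()+
    guides(fill="none")
}
# Loop over the Y column names only, not over Termed itself
lapply(setdiff(names(RecDat), "Termed"), FUN=BP)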
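As a hedged aside: aes_string() has been soft-deprecated since ggplot2 3.0.0, and the same idea can be written with aes() and the .data pronoun. The sketch below assumes a made-up data frame called recruit standing in for CMrecruitdat (only the column names are borrowed from the question), and box_by_termed is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical stand-in for CMrecruitdat; values are random, not real data.
set.seed(1)
recruit <- data.frame(
  Termed = factor(rep(c("No", "Yes"), each = 10)),
  Req.Open.To.Hire = runif(20, 10, 60),
  Tenure = runif(20, 1, 48)
)

# yvar is a column name given as a string; .data[[yvar]] looks it up in the data.
box_by_termed <- function(yvar) {
  ggplot(recruit, aes(x = Termed, y = .data[[yvar]], fill = Termed)) +
    geom_boxplot() +
    labs(y = yvar) +
    guides(fill = "none")
}

# One plot per Y column, named after the column it shows.
y_cols <- setdiff(names(recruit), "Termed")
plots <- lapply(y_cols, box_by_termed)
names(plots) <- y_cols
```

Printing plots[["Tenure"]] would then draw the boxplot for that column.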

Related

Overlaying trials in separate files onto one ggplot graph

I am trying to plot one graph with multiple trials (from separate text files). In the below case, I am plotting the "place" variable with the "firing rate" variable, and it works when I use ggplot on its own:
a <- read.table("trial1.txt", header = T)
library(ggplot2)
ggplot(a, aes(x = place, y = firing_rate)) + geom_point() + geom_path()
But when I try to create a for loop to go through each trial file in the folder and plot it on the same graph, I am having issues. This is what I have so far:
files <- list.files(pattern=".txt")
for (i in files){
p <- lapply(i, read.table)
print(ggplot(p, aes(x = place, y = firing_rate)) + geom_point() + geom_path())
}
It gives me an "Error: data must be a data frame, or other object coercible by fortify(), not a list" message. I am a novice in R so I am not sure what to make of that.
Thank you in advance for the help!
In general, avoiding explicit loops is good advice in R. Since you are using ggplot, you may be interested in the map_df function from purrr (part of the tidyverse).
First create a read function that stores the file name as a trial label:
readDataFile = function(x){
a <- read.table(x, header = T)
a$trial = x
return(a)
}
Next up map_df:
library(purrr) # provides map_df
dataComplete = map_df(files, readDataFile)
This runs our little function on each file and combines them all to a single data frame (of course assuming they are compatible in format).
Finally, you can plot almost as before but can distinguish based on the trial variable:
ggplot(dataComplete, aes(x = place, y = firing_rate, color=trial, group=trial)) + geom_point() + geom_path()
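For readers without purrr, the same read-tag-combine step can be sketched in base R with lapply and rbind instead of map_df. Everything below is an assumption for illustration: the two tiny trial files are generated in a temporary folder, and read_trial and all_trials are hypothetical names.

```r
# Write two small hypothetical trial files to a temporary folder.
trial_dir <- file.path(tempdir(), "trials")
dir.create(trial_dir, showWarnings = FALSE)
for (k in 1:2) {
  write.table(data.frame(place = 1:3, firing_rate = k * c(2, 4, 6)),
              file.path(trial_dir, paste0("trial", k, ".txt")),
              row.names = FALSE)
}

# Anchor the pattern so only names ending in .txt match.
files <- list.files(trial_dir, pattern = "\\.txt$", full.names = TRUE)

# Read one file and record which trial each row came from.
read_trial <- function(path) {
  d <- read.table(path, header = TRUE)
  d$trial <- basename(path)
  d
}

# Stack all trials into a single data frame, ready for one ggplot call.
all_trials <- do.call(rbind, lapply(files, read_trial))
```

The combined all_trials can be passed to ggplot with color = trial and group = trial, exactly as with dataComplete.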

Using Reduce to add layers to a ggplot

I have a question similar to this one about the use of multiple dataframes for plotting a ggplot. I would like to create a base plot and then add data using a list of dataframes (rationale/usecase described below).
library(ggplot2)
# generate some data and put it in a list
df1 <- data.frame(p=c(10,8,7,3,2,6,7,8),v=c(100,300,150,400,450,250,150,400))
df2 <- data.frame(p=c(10,8,6,4), v=c(150,250,350,400))
df3 <- data.frame(p=c(9,7,5,3), v=c(170,200,340,490))
l <- list(df1,df2,df3)
#create a layer-adding function
addlayer <-function(df,plt=p){
plt <- plt + geom_point(data=df, aes(x=p,y=v))
plt
}
#for loop works
p <- ggplot()
for(i in l){
p <- addlayer(i)
}
#Reduce throws an error
p <- ggplot()
gg <- Reduce(addlayer,l)
Error in as.vector(x, mode) :
cannot coerce type 'environment' to vector of type 'any'
Called from: as.vector(e2)
In writing out this example I realize that the for loop is not a bad option but wouldn't mind the conciseness of Reduce, especially if I want to chain several functions together.
For those who are interested my use case is to draw a number of unconnected lines between points on a map. From a reference dataframe I figured the most concise way to map was to generate a list of subsetted dataframes, each of which corresponds to a single line. I don't want them connected so geom_path is no good.
This seems to work,
addlayer <-function(a, b){
a + geom_point(data=b, aes(x=p,y=v))
}
Reduce(addlayer, l, init=ggplot())
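Why the argument order matters can be seen without ggplot2. Reduce calls f(accumulator, element), and when init is omitted the first list element is used as the starting accumulator. In the rough sketch below, an integer vector stands in for the plot object, and collect_rows and frames are hypothetical names.

```r
# Three data frames with 8, 4 and 4 rows, mirroring the sizes in the question.
frames <- list(data.frame(p = 1:8), data.frame(p = 1:4), data.frame(p = 1:4))

# Stand-in for "plot + layer": the accumulator comes first, the new element second.
collect_rows <- function(acc, df) c(acc, nrow(df))

# With init, every data frame is passed as the second argument in turn.
row_counts <- Reduce(collect_rows, frames, init = integer(0))
row_counts  # 8 4 4
```

In the original addlayer(df, plt = p), the data frame was the first argument, so Reduce ended up passing a data frame where the plot was expected.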
Note that you can also use a list of layers,
ggplot() + lapply(l, geom_point, mapping = aes(x=p,y=v))
However, neither of those two strategies is to be recommended; ggplot2 is perfectly capable of drawing multiple unconnected lines in a single layer (using e.g. the group argument). It is more efficient, and cleaner code.
library(plyr) # provides ldply
names(l) = 1:3
m = ldply(l, I) # one data frame with an .id column identifying each line
ggplot(m, aes(p, v, group=.id)) + geom_line()

Plotting inside function: subset(df,id_==...) gives wrong plot, df[df$id_==...,] is right

I have a df with multiple y-series which I want to plot individually, so I wrote a fn that selects one particular series, assigns to a local variable dat, then plots it. However ggplot/geom_step when called inside the fn doesn't treat it properly like a single series. I don't see how this can be a scoping issue, since if dat wasn't visible, surely ggplot would fail?
You can verify the code is correct when executed from the toplevel environment, but not inside the function. This is not a duplicate question. I understand the problem (this is a recurring issue with ggplot), but I've read all the other answers; this is not a duplicate and they do not give the solution.
set.seed(1234)
require(ggplot2)
require(scales)
N = 10
df <- data.frame(x = 1:N,
id_ = c(rep(20,N), rep(25,N), rep(33,N)),
y = c(runif(N, 1.2e6, 2.9e6), runif(N, 5.8e5, 8.9e5) ,runif(N, 2.4e5, 3.3e5)),
row.names=NULL)
plot_series <- function(id_, envir=environment()) {
dat <- subset(df,id_==id_)
p <- ggplot(data=dat, mapping=aes(x,y), color='red') + geom_step()
# Unsuccessfully trying the approach from http://stackoverflow.com/questions/22287498/scoping-of-variables-in-aes-inside-a-function-in-ggplot
p$plot_env <- envir
plot(p)
# Displays wrongly whether we do the plot here inside fn, or return the object to parent environment
return(p)
}
# BAD: doesn't plot geom_step!
plot_series(20)
# GOOD! but what's causing the difference?
ggplot(data=subset(df,id_==20), mapping=aes(x,y), color='red') + geom_step()
#plot_series(25)
#plot_series(33)
This works fine:
plot_series <- function(id_) {
dat <- df[df$id_ == id_,]
p <- ggplot(data=dat, mapping=aes(x,y), color='red') + geom_step()
return(p)
}
print(plot_series(20))
If you simply step through the original function using debug, you'll quickly see that the subset line did not actually subset the data frame at all: it returned all rows!
Why? Because subset uses non-standard evaluation and you used the same name for both the column name and the function argument. As jlhoward demonstrates in another answer, it would have worked (but probably not been advisable) to have simply used different names for the two.
The reason is that subset evaluates with the data frame first. So all it sees in the logical expression is the always true id_ == id_ within that data frame.
One way to think about it is to play dumb (like a computer) and ask yourself when presented with the condition id_ == id_ how do you know what exactly each symbol refers to. It's ambiguous, and subset makes a consistent choice: use what's in the data frame.
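The masking can be reproduced in a few lines of base R. This is a minimal sketch with a made-up six-row data frame; count_with_subset and count_with_index are hypothetical helper names.

```r
# Small stand-in data frame: two rows for each of three ids.
df <- data.frame(x = 1:6, id_ = rep(c(20, 25, 33), each = 2))

# Both sides of == resolve to the column, so every row is kept.
count_with_subset <- function(id_) nrow(subset(df, id_ == id_))

# df$id_ is the column and the bare id_ is the function argument.
count_with_index <- function(id_) nrow(df[df$id_ == id_, ])

count_with_subset(20)  # 6
count_with_index(20)   # 2
```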
Notwithstanding the comments, this works:
plot_series <- function(z, envir=environment()) {
dat <- subset(df,id_==z)
p <- ggplot(data=dat, mapping=aes(x,y), color='red') + geom_step()
p$plot_env <- envir
plot(p)
# Displays wrongly whether we do the plot here inside fn, or return the object to parent environment
return(p)
}
plot_series(20)
The problem seems to be that subset interprets id_ on the RHS of the == as identical to the LHS, so this is equivalent to subsetting on TRUE, which of course includes all the rows of df. That's the plot you are seeing.

Passing through data frames into functions and into ggplot by column

I'm trying to write my first function in R. I have a dataframe with an indeterminate number of columns, and I want to create a ggplot for each pair of columns: for example, columns 1&2, 1&3, 1&4, etc.
However, when I try the following function I get the object not found error, but only when it reaches the ggplot portion.
Thanks,
BrandPlot=function(Brand){
NoCol=ncol(Brand)
count=2
while (count<=NoCol){
return(ggplot(Brand, aes(x=Brand[,1], y=Brand[,count]))+geom_point())
count=(count+1)
}
}
To clarify, I'm trying to get the effect of the following (I also plan to add things like geom_smooth(), but I want to get it working first):
ggplot(Brand, aes(x=Brand[,1], y=Brand[,2]))+geom_point()
ggplot(Brand, aes(x=Brand[,1], y=Brand[,3]))+geom_point()
ggplot(Brand, aes(x=Brand[,1], y=Brand[,4]))+geom_point()
ggplot(Brand, aes(x=Brand[,1], y=Brand[,5]))+geom_point()
Per the note above, something like this may be what you're looking for...
brandplot <- function(x){
require(reshape2)
require(ggplot2)
x_melt <- melt(x, id.vars = names(x)[1])
ggplot(x_melt,
aes_string(x = names(x_melt)[1],
y = 'value',
group = 'variable')) +
geom_point() +
facet_wrap( ~ variable)
}
dat <- data.frame(a = sample(1:10, 25, T),
b = sample(20:30, 25, T),
c = sample(40:50, 25, T))
brandplot(dat)
[Note: @maloneypatr's solution is a better way to use ggplot for your application].
To answer your question directly, there are a couple of problems.
Your function returns after the first run through the loop (e.g., count=2), so you will never get more than one plot from this.
ggplot evaluates arguments to aes(...) in the context of the data frame defined in data=..., so it is looking for something like Brand$Brand (e.g., a column named Brand in the dataframe Brand). Since there is no such column, you get the Object not found error.
The following code will generate a series of n-1 plots where n = ncol(Brand).
BrandPlot=function(Brand){
for (count in 2:ncol(Brand)){
ggp <- ggplot(Brand, aes_string(x=names(Brand)[1], y=names(Brand)[count]))
ggp <- ggp + geom_point()
ggp <- ggp + ggtitle(paste(names(Brand)[count], " vs. ", names(Brand)[1]))
plot(ggp)
}
}
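The first problem, returning inside the loop, is independent of ggplot and can be shown in base R. In this small sketch, first_only and all_columns are hypothetical names and column names stand in for the plots.

```r
# Returns during the first pass, so only one column name comes back.
first_only <- function(d) {
  count <- 2
  while (count <= ncol(d)) {
    return(names(d)[count])
    count <- count + 1  # never reached
  }
}

# Collects one result per column instead of returning early.
all_columns <- function(d) {
  lapply(2:ncol(d), function(count) names(d)[count])
}

dat <- data.frame(a = 1:3, b = 4:6, c = 7:9)
first_only(dat)           # "b"
length(all_columns(dat))  # 2
```

Returning a list like this also lets the caller decide later whether to print or save each plot.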

ggploting multiple graphs from a data list

I would like to do something along the lines of this post: R: saving ggplot2 plots in a list
The problem is I can't get it to work. I seem to be able to get the individual graphs but the facet_wrap throws out an error. I would be content with just outputting all the graphs and then saving them to disk as a jpg or something, so I can scroll through them later.
for(n in 1:5){
pdata <- data.frame(mt1[n])
library(ggplot2)
p <-ggplot(pdata, aes(x=variable, y=value, color=Legend, group=Legend))+ geom_line()+ facet_wrap(~ color)
}
Link to a dput of the data : mt1
Edit:
Added the whole correct file, it's a bit long
If we set aside the facet error (caused by a variable missing from your data frames), you can generate the plots and save each one to its own file using ggsave:
for(n in 1:5){
pdata <- data.frame(mt1[n]) # better to use mt1[[n]]
p <-ggplot(pdata, aes(x=variable, y=value, color=Legend, group=Legend))+ geom_line()
ggsave(paste0("plot",n,".jpg"), p)
}
Some suggestions for improvement:
First, as @Dason points out, your library(ggplot2) call should be outside your loop.
Second, if you access an element of list by [.], then the result will still be a list. You should do instead: [[.]] which will render the data.frame(.) call unnecessary (as commented above in the code).
Third is a suggestion to use *apply family of functions. Here, using lapply.
To summarise all these points in code:
require(ggplot2) # load package outside once
# mt1 is the list of data frames from the question
o <- lapply(seq_along(mt1), function(idx) {
  p <- ggplot(mt1[[idx]], aes(x = variable, y = value,
              color = Legend, group = Legend))+ geom_line()
  ggsave(paste0("plot",idx,".jpg"), p)
})
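The second suggestion, [ versus [[, can be checked directly in base R. Below, lst is a hypothetical one-element list standing in for mt1.

```r
# A list holding a single data frame.
lst <- list(a = data.frame(v = 1:3))

# Single brackets return a sub-list; double brackets return the element itself.
single <- lst[1]
double <- lst[[1]]

class(single)  # "list"
class(double)  # "data.frame"
```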
