R - Passing data frame to function

R beginner here. I'm trying to write a function on my own which has a data frame as an argument and then reorders the data frame and then uses ggplot. I've been struggling with trying to get the function to work and somehow I can't seem to find the answer I'm looking for.
The first code I had was this,
pareto_plot <- function(pareto_data, title, x_label, y_label, filename){
pareto_calc = pareto_data[order(-pareto_data[2]),]
colnames(pareto_calc) = c("sku", "volume")
pareto_calc$sku_perc = 1/length(pareto_calc$sku)
pareto_calc$sku_cum = cumsum(pareto_calc$sku_perc)
pareto_calc$vol_perc = pareto_calc$volume/sum(pareto_calc$volume)
pareto_calc$vol_cum = cumsum(pareto_calc$vol_perc)
ggplot(pareto_calc, aes(x=pareto_data$sku_cum, y=pareto_data$vol_cum)) + geom_line(col="blue") +
geom_line(y=0.8, col="red") +geom_line(x=0.2, col="red") +
ggtitle(title) + ylab(y_label) + xlab(x_label)
ggsave(paste(filename,".png", sep=""))
}
When I used the above code, I got an error,
Error in eval(expr, envir, enclos) : object 'pareto_calc' not found
I then changed the code to use data as the argument name, as I saw that a lot of examples online use it. My modified code was now,
pareto_plot <- function(data, title, x_label, y_label, filename){
pareto_data = data
pareto_data[order(-pareto_data[2]),]
colnames(pareto_data) = c("sku", "volume")
pareto_data$sku_perc = 1/length(pareto_data$sku)
pareto_data$sku_cum = cumsum(pareto_data$sku_perc)
pareto_data$vol_perc = pareto_data$volume/sum(pareto_data$volume)
pareto_data$vol_cum = cumsum(pareto_data$vol_perc)
ggplot(pareto_data, aes(x=pareto_data$sku_cum, y=pareto_data$vol_cum)) + geom_line(col="blue") +
geom_line(y=0.8, col="red") +geom_line(x=0.2, col="red") +
ggtitle(title) + ylab(y_label) + xlab(x_label)
ggsave(paste(filename,".png", sep=""))
}
With this code, I now get the error,
Error in exists(name, envir = env, mode = mode) :
argument "env" is missing, with no default
Any help will be greatly appreciated. Thanks in advance! :)

When you make a function, it is often easiest to write the code first, without making it a function, until you are sure it works. Then wrap it as a function.
library(ggplot2)
set.seed(33)
df <- data.frame(V1 = runif(10),
                 V2 = rnorm(10))
pareto_plot <- function(data, title, x_label, y_label, filename){
  # you forgot to assign the reordered data; [[2]] extracts the column as a
  # vector, so order() is not applied to a one-column data frame
  pareto_data <- data[order(-data[[2]]),]
  names(pareto_data) <- c("sku", "volume")
  pareto_data$sku_perc <- 1/length(pareto_data$sku)
  pareto_data$sku_cum <- cumsum(pareto_data$sku_perc)
  pareto_data$vol_perc <- pareto_data$volume/sum(pareto_data$volume)
  pareto_data$vol_cum <- cumsum(pareto_data$vol_perc)
  # bare column names inside aes() are looked up in pareto_data
  ggplot(pareto_data, aes(x=sku_cum, y=vol_cum)) + geom_line(color="blue") +
    geom_line(y=0.8, col="red") + geom_line(x=0.2, col="red") +
    ggtitle(title) + ylab(y_label) + xlab(x_label)
  ggsave(paste(filename,".png", sep=""))
}
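A minimal sketch of the "write it first, then wrap it" advice, under a few assumptions: the helper names pareto_table() and pareto_plot2() are hypothetical, the input is assumed to hold the SKU in column 1 and a numeric volume in column 2, and geom_hline()/geom_vline() are swapped in for the constant geom_line() calls to draw the 80/20 reference lines. Splitting the calculation from the plotting makes the table easy to check on its own.

```r
library(ggplot2)

# Hypothetical helper: sort by volume (descending) and add cumulative shares.
pareto_table <- function(data) {
  out <- data[order(-data[[2]]), 1:2]
  names(out) <- c("sku", "volume")
  out$sku_cum <- seq_len(nrow(out)) / nrow(out)
  out$vol_cum <- cumsum(out$volume) / sum(out$volume)
  out
}

# Hypothetical plotting wrapper: column names are written bare inside aes(),
# so ggplot2 looks them up in the data frame given to ggplot().
pareto_plot2 <- function(data, title, x_label, y_label, filename = NULL) {
  p <- ggplot(pareto_table(data), aes(x = sku_cum, y = vol_cum)) +
    geom_line(colour = "blue") +
    geom_hline(yintercept = 0.8, colour = "red") +
    geom_vline(xintercept = 0.2, colour = "red") +
    labs(title = title, x = x_label, y = y_label)
  # Save only when a file name is supplied, passing the plot explicitly
  if (!is.null(filename)) {
    ggsave(paste0(filename, ".png"), plot = p)
  }
  p
}

# Made-up example data
sales <- data.frame(item = c("a", "b", "c", "d"), units = c(10, 40, 30, 20))
tab <- pareto_table(sales)
p <- pareto_plot2(sales, "Pareto", "Share of SKUs", "Share of volume")
```

Returning the plot object instead of only saving it lets the caller print, modify, or save it as needed.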

Related

How to specify an arbitrary amount of variables in aes in a generic function in R?

I am making a shiny application where the user specifies the independent variables and, as a result, shiny displays a time series plot with plotly, where on hover each point shows the selected parameters.
If I know the exact number of variables that the user selects, I am able to construct the time series plot without a problem. Let's say there are 3 parameters chosen:
ggp <- ggplot(data = data.depend(), aes(x = Datum, y = y, tmp1 = .data[[input$Coockpit.Dependencies.Undependables[1]]], tmp2 = .data[[input$Coockpit.Dependencies.Undependables[2]]], tmp3 = .data[[input$Coockpit.Dependencies.Undependables[3]]])) +
geom_point()
ggplotly(ggp)
where data.depend() looks like
and the selected parameters are stored in a character vector
So the problem is that for each parameter I want to include in the tooltip, I have to hard-code it in the aes function as tmpi = .data[[input$Coockpit.Dependencies.Undependables[i]]]. I would, however, like to write a generic function that handles any number of selected parameters. Any comments or suggestions are welcome.
EDIT:
Below a minimal working example:
data.dummy <- data.frame(Charge = c(1,2,3,4,5), Datum = c(as.Date("2020-01-01"),as.Date("2020-01-02"),as.Date("2020-01-03"),as.Date("2020-01-04"),as.Date("2020-01-05")), y = c(4,5,6,4,5), ZuluftTemperatur = c(52,51,54,58,49), Durchflussgeschwindigkeit = c(690, 716,722,710,801), ZuluftFeuchtigkeit= c(3.9,4.1,3.8,3.0,4.9))
ChosenParams <- c("ZuluftTemperatur", "ZuluftFeuchtigkeit", "Durchflussgeschwindigkeit")
ggp <- ggplot(data = data.dummy, aes(x = Datum, y = y, tmp1 = .data[[ChosenParams[1]]], tmp2 = .data[[ChosenParams[2]]], tmp3 = .data[[ChosenParams[3]]])) + geom_point()
ggplotly(ggp)
Result:
So this works at the "cost" of me knowing that the user is choosing three parameters, and therefore I write tmpi = .data[[ChosenParams[i]]] for i = 1:3 in aes. I am interested in a solution with the same result, but where I don't have to write tmpi = .data[[ChosenParams[i]]] i times.
Thank you!
One solution is to use eval(parse(...)) to create the code for you:
library(ggplot2)
library(plotly)
data.dummy <- data.frame(Charge = c(1,2,3,4,5), Datum = c(as.Date("2020-01-01"),as.Date("2020-01-02"),as.Date("2020-01-03"),as.Date("2020-01-04"),as.Date("2020-01-05")), y = c(4,5,6,4,5), ZuluftTemperatur = c(52,51,54,58,49), Durchflussgeschwindigkeit = c(690, 716,722,710,801), ZuluftFeuchtigkeit= c(3.9,4.1,3.8,3.0,4.9))
ChosenParams <- c("ZuluftTemperatur", "ZuluftFeuchtigkeit", "Durchflussgeschwindigkeit")
ggp <- eval(parse(text = paste0("ggplot(data = data.dummy, aes(x = Datum, y = y, ",
paste0("tmp", seq_along(ChosenParams), " = .data[[ChosenParams[", seq_along(ChosenParams), "]]]", collapse = ", "),
")) + geom_point()"
)
))
ggplotly(ggp)
Just note that this is not very efficient and in some cases it is not advised to use it (see What specifically are the dangers of eval(parse(...))?). There might also be a way to use quasiquotation in aes(), but I am not really familiar with it.
EDIT: Added a way to do it with quasiquotation.
I had a closer look at quasiquotation in aes() and found a nicer way to do it using syms() and !!!:
data.dummy <- data.frame(Charge = c(1,2,3,4,5), Datum = c(as.Date("2020-01-01"),as.Date("2020-01-02"),as.Date("2020-01-03"),as.Date("2020-01-04"),as.Date("2020-01-05")), y = c(4,5,6,4,5), ZuluftTemperatur = c(52,51,54,58,49), Durchflussgeschwindigkeit = c(690, 716,722,710,801), ZuluftFeuchtigkeit= c(3.9,4.1,3.8,3.0,4.9))
ChosenParams <- c("ZuluftTemperatur", "ZuluftFeuchtigkeit", "Durchflussgeschwindigkeit")
names(ChosenParams) <- paste0("tmp", seq_along(ChosenParams))
ChosenParams <- syms(ChosenParams)
ggp <- ggplot(data = data.dummy, aes(x = Datum, y = y, !!!ChosenParams)) + geom_point()
ggplotly(ggp)
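The quasiquotation approach can be wrapped in a function so that the number of selected parameters no longer matters. This is a minimal sketch: tooltip_syms() and tooltip_plot() are hypothetical names, at least one parameter is assumed to be selected, and the ggplotly() step is left as a comment so the block depends only on ggplot2 and rlang.

```r
library(ggplot2)

# Hypothetical helper: turn a character vector of column names into a named
# list of symbols (tmp1, tmp2, ...) that can be spliced into aes() with !!!.
tooltip_syms <- function(params) {
  stats::setNames(rlang::syms(params), paste0("tmp", seq_along(params)))
}

# Hypothetical wrapper around the plot from the answer.
tooltip_plot <- function(data, params) {
  extra <- tooltip_syms(params)
  ggplot(data, aes(x = Datum, y = y, !!!extra)) +
    geom_point()
}

data.dummy <- data.frame(
  Charge = 1:5,
  Datum = as.Date("2020-01-01") + 0:4,
  y = c(4, 5, 6, 4, 5),
  ZuluftTemperatur = c(52, 51, 54, 58, 49),
  Durchflussgeschwindigkeit = c(690, 716, 722, 710, 801),
  ZuluftFeuchtigkeit = c(3.9, 4.1, 3.8, 3.0, 4.9)
)

s <- tooltip_syms(c("ZuluftTemperatur", "ZuluftFeuchtigkeit"))
ggp <- tooltip_plot(data.dummy, c("ZuluftTemperatur", "ZuluftFeuchtigkeit"))
# plotly::ggplotly(ggp) would then pick up the extra aesthetics for the tooltip
```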

ggplot in a function: variable not found

I have an issue trying to create a function that creates a plot using ggplot. Here is some code:
y1<- sample(1:30,45,replace = T)
x1 <- rep(rep(c("a1","a2","a3","a4","a5"),3),each=3)
x2 <- rep(rep(c("b1","b2","b3","b4","b5"),3),each=3)
df <- data.frame(y1,x1,x2)
library(Rmisc)
dfsum <- summarySE(data=df, measurevar="y1",groupvars=c("x1","x2"))
myplot <- function(d,v, w,g) {
pd <- position_dodge(.1)
localenv <- environment()
ggplot(data=d, aes(x=v,y=w,group=g),environment = localenv) +
geom_errorbar(data=d,aes(ymin=d$w-d$se, ymax=d$w+d$se,col=d$g), width=.4, position=pd,environment = localenv) +
geom_line(position=pd,linetype="dotted") +
geom_point(data=d,position=pd,aes(col=g))
}
myplot(dfsum,x1,y1,x2)
As I was looking for similar questions, I found that specifying the local environment should solve the issue. However it did not help in my case.
Thank you
Preliminary Note
When looking at your data.frame, the group variable does not make any sense, as it is perfectly confounded with the x variable. Hence I adapted your data a bit, to show a full example:
Data
library(Rmisc)
library(ggplot2)
d <- expand.grid(x1 = paste0("a", 1:5),
x2 = paste0("b", 1:5))
d <- d[rep(1:NROW(d), each = 3), ]
d$y1 <- rnorm(NROW(d))
dfsum <- summarySE(d, measurevar = "y1", groupvars = paste0("x", 1:2))
Plot Function
myplot <- function(mydat, xvar, yvar, grpvar) {
mydat$ymin <- mydat[[yvar]] - mydat$se
mydat$ymax <- mydat[[yvar]] + mydat$se
pd <- position_dodge(width = .5)
ggplot(mydat, aes_string(x = xvar, y = yvar, group = grpvar,
ymin = "ymin", ymax = "ymax", color = grpvar)) +
geom_errorbar(width = .4, position = pd) +
geom_point(position = pd) +
geom_line(position = pd, linetype = "dashed")
}
myplot(dfsum, "x1", "y1", "x2")
Explanation
Your problem occurs because the scope of x1, x2 and y1 was ambiguous. As you also defined these variables in the top-level environment, R did not complain in the first place. If you had added rm(x1, x2, y1) to your original code right after you created your data.frame, you would have seen the problem earlier.
ggplot looks in the data.frame you provide for all the variables you want to map to certain aesthetics. If you want to create a function where you specify the names of the aesthetics as arguments, you should use aes_string instead of aes, as the former expects a string giving the name of the variable rather than the variable itself.
With this approach, however, you cannot do calculations on the spot, so you need to create the variables ymin and ymax beforehand in your data.frame. Furthermore, you do not need to provide the data argument for each geom if it is the same as the one provided to ggplot.
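In newer ggplot2 releases aes_string() is deprecated, and the .data pronoun covers the same use case. The sketch below is one possible variant, not the answerer's code: myplot_data() is a hypothetical name, the summary table is made up by hand instead of coming from summarySE(), and a column named se is assumed to exist. With .data[[ ]] the ymin and ymax calculations can stay inside aes().

```r
library(ggplot2)

# Made-up summary table standing in for the output of summarySE()
dfsum <- expand.grid(x1 = paste0("a", 1:3), x2 = paste0("b", 1:2))
dfsum$y1 <- c(1.0, 2.0, 1.5, 2.5, 3.0, 2.0)
dfsum$se <- 0.2

# Same idea as myplot(), but column names given as strings are looked up
# through the .data pronoun instead of aes_string()
myplot_data <- function(mydat, xvar, yvar, grpvar) {
  pd <- position_dodge(width = .5)
  ggplot(mydat, aes(x = .data[[xvar]], y = .data[[yvar]],
                    group = .data[[grpvar]], colour = .data[[grpvar]],
                    ymin = .data[[yvar]] - .data$se,
                    ymax = .data[[yvar]] + .data$se)) +
    geom_errorbar(width = .4, position = pd) +
    geom_point(position = pd) +
    geom_line(position = pd, linetype = "dashed") +
    labs(x = xvar, y = yvar, colour = grpvar)
}

p <- myplot_data(dfsum, "x1", "y1", "x2")
```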
I've got it plotting something, let me know if this isn't the expected output.
The changes I've made to the code to get it working are:
Load the ggplot2 library
Remove the d$ from the geom_errorbar call to w and g, as these are function arguments rather than columns in d.
I've also removed the data=d calls from all layers except the main ggplot one as these aren't necessary.
library(ggplot2)
myplot <- function(d,v, w,g) {
pd <- position_dodge(.1)
localenv <- environment()
ggplot(data=d, aes(x=v,y=w,group=g),environment = localenv) +
geom_errorbar(aes(ymin=w-se, ymax=w+se,col=g), width=.4,
position=pd,environment = localenv) +
geom_line(position=pd,linetype="dotted") +
geom_point(position=pd,aes(col=g))
}
myplot(dfsum,x1,y1,x2)
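If the goal is to keep passing bare column names, a sketch using the embrace operator {{ }} (supported inside aes() by recent ggplot2 and rlang versions) resolves the names against the columns of d rather than against objects in the calling environment. The function name myplot_embrace() and the data below are made up for illustration; a column named se is assumed.

```r
library(ggplot2)

# Made-up summary data with a standard-error column `se`
dfsum <- data.frame(
  x1 = rep(c("a1", "a2", "a3"), times = 2),
  x2 = rep(c("b1", "b2"), each = 3),
  y1 = c(10, 12, 9, 14, 11, 13),
  se = c(1, 1.5, 0.8, 1.2, 1, 0.9)
)

# Hypothetical variant of myplot(): v, w and g are bare column names of d
myplot_embrace <- function(d, v, w, g) {
  pd <- position_dodge(.1)
  ggplot(d, aes(x = {{ v }}, y = {{ w }}, group = {{ g }}, colour = {{ g }})) +
    geom_errorbar(aes(ymin = {{ w }} - se, ymax = {{ w }} + se),
                  width = .4, position = pd) +
    geom_line(position = pd, linetype = "dotted") +
    geom_point(position = pd)
}

# Works even though no objects called x1, y1 or x2 exist outside dfsum
p <- myplot_embrace(dfsum, x1, y1, x2)
```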

how to pass the custom transformation parameters to ggplot2::scale_x_continuous

I am trying to transform my data with a customized function biexp:
ggplot(df, aes(x = x)) + geom_histogram() + scale_x_continuous(trans = "biexp", myArg = 4)
However, continuous_scale doesn't seem to support the custom argument myArg:
Error in continuous_scale(c("x", "xmin", "xmax", "xend", "xintercept"), :
unused argument (myArg = 4)
Here is the definition of biexp_trans
biexp_trans <- function(myArg = 4.5){
trans <- biexp(myArg = myArg)
inv <- biexp(myArg = myArg, inverse = TRUE)
trans_new("biexp", transform = trans, inverse = inv)
}
Two people have pointed out in the comments that the answer is:
calling trans = biexp_trans(myArg = 4) in scale_x_continuous()
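The pattern can be sketched with a stand-in transformation, since biexp() itself is not shown in the question. Everything named asinh_cofactor_trans or cofactor below is hypothetical and is not the asker's biexp; the point is only that the parameter is fixed when the transformation object is built, so the scale needs no extra argument. Recent ggplot2 releases call the scale argument transform rather than trans.

```r
library(ggplot2)
library(scales)

# Hypothetical stand-in for biexp_trans(): an inverse hyperbolic sine
# transformation with a tunable `cofactor`.
asinh_cofactor_trans <- function(cofactor = 4.5) {
  trans_new(
    name = paste0("asinh-", cofactor),
    transform = function(x) asinh(x / cofactor),
    inverse = function(x) sinh(x) * cofactor
  )
}

# The parameter is baked into the transformation object here
tr <- asinh_cofactor_trans(cofactor = 4)

set.seed(1)
df <- data.frame(x = rexp(200, rate = 0.01))

p <- ggplot(df, aes(x = x)) +
  geom_histogram(bins = 30) +
  scale_x_continuous(trans = tr)
```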

Error parsing dynamically-built ggplot2 code

In the following reproducible example I'm attempting to build a ggplot2 function call dynamically, in order to be able to accommodate unknown number of mixture distribution components. The code produces this error message: Error in parse(text = g) : <text>:8:0: unexpected end of input. What is the problem with the code? (I'm aware of the method of pre-calculating plot data, storing it in a data frame, melting it and supplying it to ggplot2. I would like to explore the option below, as well.) Thank you!
library(ggplot2)
library(scales)
library(RColorBrewer)
library(mixtools)
NUM_COMPONENTS <- 2
set.seed(12345) # for reproducibility
data(diamonds, package='ggplot2') # use built-in data
myData <- diamonds$price
calc.component <- function(x, lambda, mu, sigma) {
lambda * dnorm(x, mean = mu, sd = sigma)
}
overlayHistDensity <- function(data, func) {
# extract 'k' components from mixed distribution 'data'
mix <- normalmixEM(data, k = NUM_COMPONENTS,
maxit = 100, epsilon = 0.01)
summary(mix)
DISTRIB_COLORS <-
suppressWarnings(brewer.pal(NUM_COMPONENTS, "Set1"))
# plot histogram, empirical and fitted densities
g <- "ggplot(data) +\n"
for (i in seq(length(mix$lambda))) {
args <- paste0("args.", i)
assign(args, list(lambda = mix$lambda[i], mu = mix$mu[i],
sigma = mix$sigma[i]))
g <- paste0(g,
"stat_function(fun = func, args = ",
args,
", aes(color = ",
DISTRIB_COLORS[i], ")) +\n")
}
tailStr <-
"geom_line(aes(y = ..density..,colour = 'Empirical'),stat = 'density') +
geom_histogram(aes(y = ..density..), alpha = 0.4) +
scale_colour_manual(name = '', values = c('red', 'blue')) +
theme(legend.position = 'top', legend.direction = 'horizontal')"
g <- paste0(g, tailStr)
gr <- eval(parse(text = g))
return (gr)
}
overlayHistDensity(log10(myData), 'calc.component')
As long as you realize you are going about this a hard way...
If you look at the value of g before it is parsed, it is
ggplot(data) +
stat_function(fun = func, args = args.1, aes(color = #E41A1C)) +
stat_function(fun = func, args = args.2, aes(color = #377EB8)) +
geom_line(aes(y = ..density..,colour = 'Empirical'),stat = 'density') +
geom_histogram(aes(y = ..density..), alpha = 0.4) +
scale_colour_manual(name = '', values = c('red', 'blue')) +
theme(legend.position = 'top', legend.direction = 'horizontal')
Usually the unexpected end of input message is from unbalanced quotes or parentheses, but you've not (obviously) got that problem here. The problem is in the color specification. Literal hex colors should be specified as strings
ggplot(data) +
stat_function(fun = func, args = args.1, aes(color = "#E41A1C")) +
stat_function(fun = func, args = args.2, aes(color = "#377EB8")) +
geom_line(aes(y = ..density..,colour = 'Empirical'),stat = 'density') +
geom_histogram(aes(y = ..density..), alpha = 0.4) +
scale_colour_manual(name = '', values = c('red', 'blue')) +
theme(legend.position = 'top', legend.direction = 'horizontal')
Without the quotes, the hash is a comment character and the rest of the lines (the right parentheses in particular) are not included, and the error you got is given. (Note the syntax highlighting that SO gives on the first code snippet.)
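The effect of the unquoted hash can be reproduced with parse() alone, without evaluating any plotting code. The two strings below are shortened, made-up fragments of the generated call.

```r
# Unquoted: everything after '#' is a comment, so the call is never closed
bad <- "aes(color = #E41A1C))"
# Quoted: the hex code is an ordinary string literal
good <- "aes(color = \"#E41A1C\")"

bad_result <- try(parse(text = bad), silent = TRUE)
good_result <- parse(text = good)

inherits(bad_result, "try-error")  # TRUE: unexpected end of input
length(good_result)                # 1 parsed expression
```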
That said, I think you can get what you want without the eval(parse()) approach. In particular, look at aes_string which allows the specification of which variable is used as the aesthetic by the value of a string variable and adding a list of stats or geoms (which can be of un-pre-specified length created using lapply, for example). Also, you seem to be specifying literal colors and then mapping them to just red and blue; possibly you want scale_colour_identity? All this (last paragraph) is more code review and is not what you actually asked about.
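The lapply() idea might look roughly like the sketch below. It makes several assumptions: the mixture parameters are made up instead of being fitted with normalmixEM(), the colours are hard-coded, and after_stat(density) (available in newer ggplot2 versions) replaces the older ..density.. notation. Each component gets its own stat_function() layer, and the resulting list is added to the plot with +.

```r
library(ggplot2)

calc.component <- function(x, lambda, mu, sigma) {
  lambda * dnorm(x, mean = mu, sd = sigma)
}

# Made-up mixture parameters standing in for the output of normalmixEM()
mix <- list(lambda = c(0.4, 0.6), mu = c(2.8, 3.7), sigma = c(0.2, 0.3))
component_colours <- c("#E41A1C", "#377EB8")

# Made-up data drawn from the same two components
set.seed(12345)
x <- c(rnorm(400, 2.8, 0.2), rnorm(600, 3.7, 0.3))

# One stat_function() layer per component, however many there are;
# the constant colour is set outside aes()
component_layers <- lapply(seq_along(mix$lambda), function(i) {
  stat_function(
    fun = calc.component,
    args = list(lambda = mix$lambda[i], mu = mix$mu[i], sigma = mix$sigma[i]),
    colour = component_colours[i]
  )
})

# A list of layers can be added to a plot with `+`
p <- ggplot(data.frame(x = x), aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, alpha = 0.4) +
  geom_density(colour = "black") +
  component_layers
```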
You've got several problems:
ggplot's data argument must be a data.frame, not a vector
hex color names starting with # must be quoted, or they'll be interpreted as comments
you must provide an aes(x = ) mapping
color definitions that are constant do not go in aes
This should work:
overlayHistDensity <- function(data, func) {
# extract 'k' components from mixed distribution 'data'
mix <- normalmixEM(data, k = NUM_COMPONENTS,
maxit = 100, epsilon = 0.01)
summary(mix)
DISTRIB_COLORS <-
suppressWarnings(brewer.pal(NUM_COMPONENTS, "Set1"))
# plot histogram, empirical and fitted densities
g <- "ggplot(as.data.frame(data), aes(x = data)) +\n"
for (i in seq(length(mix$lambda))) {
args <- paste0("args.", i)
assign(args, list(lambda = mix$lambda[i], mu = mix$mu[i],
sigma = mix$sigma[i]))
g <- paste0(g,
"stat_function(fun = func, args = ",
args,
", color = '",
DISTRIB_COLORS[i], "') +\n")
}
tailStr <-
"geom_line(aes(y = ..density..,colour = 'Empirical'),stat = 'density') +
geom_histogram(aes(y = ..density..), alpha = 0.4) +
scale_colour_manual(name = '', values = c('red', 'blue')) +
theme(legend.position = 'top', legend.direction = 'horizontal')"
g <- paste0(g, tailStr)
gr <- eval(parse(text = g))
return (gr)
}
Like Brian, I'll finish with 2 comments:
This is standard debugging and you shouldn't need an SO post for it. It's essentially several syntax errors and a couple little mistakes. I took your code outside of a function and ran it up through the final g <- paste0 line, and put the g output in a code window and looked for problems. Try to write code that works outside of a function first, then put it in a function.
Seconding Brian's comment, a more natural approach is to not use eval(parse()) and all this pasting. Instead, use aes_string and melt your data so that you can use one stat_function call based on a grouping variable.

How to get geom_vline and facet_wrap from ggplot2 to work inside a function

I'm using ggplot2 to explore the effects of different military operations on murder rates. To show the effect I draw a vertical line when the operation occurred and a smoothed line of the murder rate before and after the operation.
I've written a facet_wrap plot to show this for a whole bunch of counties. It works beautifully, but when converted to a function I get an error when using a local variable to draw the vertical line.
Here's some example code:
drawTS <- function(df, dates, text) {
p <- ggplot(df, aes(date, murders)) +
facet_wrap(~ county, ncol = 1,
scale="free_y") +
scale_x_date() +
geom_smooth(aes(group = group), se = FALSE)
for(i in 1:length(dates)) {
#If it's not a global variable I get an object not found error
temp[i] <<- dates[i]
p <- p + geom_text(aes(x,y), label = text[i],
data = data.frame(x = dates[i], y = -10),
size = 3, hjust = 1, vjust = 0) +
#Here's the problem
geom_vline(xintercept=temp[i], alpha=.4)
}
p
}
library(ggplot2)
df <- data.frame(date = rep(seq(as.Date("2007/1/01"),
length=36, by='1 month'),4),
murders = round(runif(36*4) * 100),
county = rep(rep(factor(1:4),9),each=4),
group = rep(c(rep(1,6), rep(2,12),rep(3,18))), each=4)
dates <- c(as.Date("2007/6/15"), as.Date("2008/6/15"))
temp <- c()
drawTS(df, dates, c("Op 1","Op 2"))
There's no error with the global variable, but it looks ugly.
If instead of the temp[i] variable I use dates[i] inside geom_vline(), I get this:
Error in NextMethod("[") : object 'i' not found
If I wrap the variable dates[i] in aes(), I get:
Error in eval(expr, envir, enclos) : object 'county' not found
Anybody know how to fix this?
I don't know what is causing the error, but a fix that I could come up with is to replace the for loop with a data frame like this:
date.df<-data.frame(d=dates,t=text)
p <- p + geom_text(aes(x=d,label=t),y=0,
data = date.df,
size = 3, hjust = 1, vjust = 0)
p<-p+geom_vline(aes(xintercept=d),data=date.df,alpha=.4)
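Put together as a complete function, the data frame approach might look like the sketch below. drawTS2() is a hypothetical name, the data are made up in the shape of the question's example, and inherit.aes = FALSE is added to geom_text() so that the label layer only uses the columns of the annotation data frame.

```r
library(ggplot2)

# Hypothetical rewrite of drawTS(): the operation dates and labels travel in
# their own data frame, so no loop index or global variable is needed.
drawTS2 <- function(df, dates, text) {
  date.df <- data.frame(d = dates, t = text)
  ggplot(df, aes(date, murders)) +
    facet_wrap(~ county, ncol = 1, scales = "free_y") +
    scale_x_date() +
    geom_smooth(aes(group = group), se = FALSE) +
    geom_vline(aes(xintercept = d), data = date.df, alpha = .4) +
    geom_text(aes(x = d, label = t), y = 0, data = date.df,
              size = 3, hjust = 1, vjust = 0, inherit.aes = FALSE)
}

# Made-up data: 36 months for each of 4 counties
set.seed(1)
df <- data.frame(
  date = rep(seq(as.Date("2007-01-01"), length.out = 36, by = "1 month"), 4),
  murders = round(runif(36 * 4) * 100),
  county = factor(rep(1:4, each = 36)),
  group = rep(c(rep(1, 6), rep(2, 12), rep(3, 18)), 4)
)
dates <- as.Date(c("2007-06-15", "2008-06-15"))

p <- drawTS2(df, dates, c("Op 1", "Op 2"))
```

Because the annotation data frame has no county column, the lines and labels are repeated in every facet.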

Resources