I have run into a strange issue (I am new to R). I have tried creating a function as follows:
library(ggplot2)
median_confidence_interval <- function(x) {
quart_list<-c()
return_data<-data.frame(lower_ci=0,median=0,upper_ci=0)
for(i in 1:1000){
y<-x[as.integer(runif(length(x), min = 1, max = length(x) + 1))]
median<-median(y)
quart_list=c(quart_list,median)
}
return_data$median<-median(quart_list)
return_data$lower_ci<-quantile(quart_list,probs=0.025)
return_data$upper_ci<-quantile(quart_list,probs=0.975)
p <- ggplot()
p <- p + geom_density(aes(x=x)) + geom_density(aes(x=quart_list))
p <- p + geom_vline(aes(xintercept = return_data$median, color='red'))
p <- p + geom_vline(aes(xintercept = return_data$lower_ci, color='blue'))
p <- p + geom_vline(aes(xintercept = return_data$upper_ci, color='green')) + coord_cartesian(xlim = c(min(x),max(x)))
png("density_confidence_internal.png")
plot(p)
dev.off()
return_data
}
In this code I am simply trying to create a plot and save it. I am able to execute each of these statements independently outside the function, but not inside it. The function compiles without errors, but when I run it I get the error 'quart_list' not found.
If quart_list and return_data are present in the workspace, then I am able to execute the function and get the result. When I clear the workspace and execute the function, I run into the same error while running (not compiling).
Another issue is that when I call median_confidence_interval(x), it only accepts 'x' as the argument; it doesn't accept something like median_confidence_interval(possum$earconch). Why could that be?
Would someone please be able to point me in some direction?
How ggplot evaluates variables across environments is a little mysterious. However, if you remember that ggplot wants a data.frame passed to the data argument, and that the values in aes should be columns of that data.frame, you'll generally avoid issues.
To debug things like this, I find it helpful to insert print statements into the function to see how far through it I get (see the commented lines).
Adjusting your function accordingly gives:
median_confidence_interval <- function(x) {
quart_list<-c()
return_data<-data.frame(lower_ci=0,median=0,upper_ci=0)
for(i in 1:1000){
y<-x[as.integer(runif(length(x), min = 1, max = length(x) + 1))]
median<-median(y)
# print ('inside for loop')
quart_list=c(quart_list,median)
}
# print('past for loop')
return_data$median<-median(quart_list)
return_data$lower_ci<-quantile(quart_list,probs=0.025)
return_data$upper_ci<-quantile(quart_list,probs=0.975)
# print('start of ggplot code')
# x and quart_list usually differ in length, so each gets its own data.frame
x_df <- data.frame(x = x)
boot_df <- data.frame(q = quart_list)
p <- ggplot()
p <- p + geom_density(data = x_df, aes(x = x)) + geom_density(data = boot_df, aes(x = q))
# print('past first quart_list reference in ggplot')
# fixed colours go outside aes(); inside aes() they are treated as data values
p <- p + geom_vline(data = return_data, aes(xintercept = median), color = 'red')
p <- p + geom_vline(data = return_data, aes(xintercept = lower_ci), color = 'blue')
p <- p + geom_vline(data = return_data, aes(xintercept = upper_ci), color = 'green') + coord_cartesian(xlim = c(min(x), max(x)))
png("/tmp/density_confidence_internal.png")
plot(p)
dev.off()
return_data
}
Also, I think @DWin has a good point in his comment!
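The resampling loop in the function above can also be written more compactly. The following is only a sketch under my own assumptions: the helper name bootstrap_median_ci and its n_boot and level arguments are hypothetical, and it computes the percentile interval without doing any plotting.

```r
# Hypothetical helper (not from the original post): percentile bootstrap
# confidence interval for the median, without any plotting.
bootstrap_median_ci <- function(x, n_boot = 1000, level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  # Resample indices with replacement and take the median of each resample
  boot_medians <- replicate(n_boot, median(x[sample.int(n, n, replace = TRUE)]))
  alpha <- 1 - level
  data.frame(
    lower_ci = unname(quantile(boot_medians, probs = alpha / 2)),
    median   = median(boot_medians),
    upper_ci = unname(quantile(boot_medians, probs = 1 - alpha / 2))
  )
}

set.seed(42)
ci <- bootstrap_median_ci(rnorm(200, mean = 10))
ci
```

Because the function only receives a plain vector, a call such as bootstrap_median_ci(possum$earconch) should work the same way as passing a variable named x.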
Related
I am currently writing a function which ultimately returns a ggplot. I would like to offer the user the option to change different aspects of that ggplot, such as the xlab by specifying it in the function. Right now I am using a code like this.
library(tidyverse)
d <- sample_n(diamonds,500)
plot_something <- function(data,x,y,x_axis_name=NULL){
if(is.null(x_axis_name)){
p<-ggplot(d,aes_string(x=x,y=y))+
geom_point()
} else{
p<-ggplot(d,aes_string(x=x,y=y))+
geom_point()+
xlab(x_axis_name)
}
return(p)
}
plot_something(data=d, x="depth", y="price",x_axis_name = "random_name")
This works fine, but as you can see a lot of the code is duplicated, the only difference is the xlab argument. In this case it is not too bad, but my actual function is much more complicated and things get more difficult if I would also allow the user to for example modify the ylab.
So my question is, if there is a more elegant way to modify ggplots inside of a function depending on arguments passed by the user.
Any help is much appreciated!
There is no need for the duplicated code. You could conditionally add layers to a base plot as desired like so:
library(tidyverse)
d <- sample_n(diamonds, 500)
plot_something <- function(data, x, y, x_axis_name = NULL) {
x_lab <- if (!is.null(x_axis_name)) xlab(x_axis_name)
p <- ggplot(data, aes_string(x = x, y = y)) +  # use the data argument, not the global d
geom_point() +
x_lab
return(p)
}
plot_something(data = d, x = "depth", y = "price", x_axis_name = "random_name")
Or, using ggplot2's built-in mechanism for choosing defaults, you don't even need an if condition:
plot_something <- function(data, x, y, x_axis_name = ggplot2::waiver()) {
p <- ggplot(data, aes_string(x = x, y = y)) +  # use the data argument, not the global d
geom_point() +
xlab(x_axis_name)
return(p)
}
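The conditional-layer idea extends to several optional arguments, such as the ylab mentioned in the question. Here is a minimal sketch, assuming a hypothetical y_axis_name argument; it uses the .data pronoun in place of aes_string, and mtcars stands in for the diamonds sample so the example is self-contained.

```r
library(ggplot2)

# Hypothetical extension: both axis labels are optional.
plot_something <- function(data, x, y, x_axis_name = NULL, y_axis_name = NULL) {
  # Each element is NULL when the label was not supplied;
  # adding NULL to a ggplot leaves the plot unchanged.
  optional_labels <- list(
    if (!is.null(x_axis_name)) xlab(x_axis_name),
    if (!is.null(y_axis_name)) ylab(y_axis_name)
  )
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_point() +
    optional_labels
}

p_default <- plot_something(mtcars, x = "wt", y = "mpg")
p_custom  <- plot_something(mtcars, x = "wt", y = "mpg",
                            x_axis_name = "Weight",
                            y_axis_name = "Miles per gallon")
```

Collecting the optional pieces in a list keeps the plotting code in one place, so each new optional argument should only need one more list entry.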
I am trying to create some ggplots automatically. Here is my working code example for adding stat_functions:
require(ggplot2)
p1 <- ggplot(data.frame(x = c(-2.5, 7.5)), aes(x = x)) + theme_minimal()+
stat_function(fun= function(x){1*x},lwd=1.25, colour = "navyblue") +
stat_function(fun= function(x){2*x},lwd=1.25, colour = "navyblue") +
stat_function(fun= function(x){3*-x},lwd=1.25, colour = "red")
p1
As you can see the stat_functions all use (nearly) the same function just with a different parameter.
Here is what I have tried to write:
f <- function(plot,list){
for (i in 1:length(list)){
plot <- plot + stat_function(fun= function(x){x*list[i]})
}
return(plot)
}
p1 <- ggplot(data.frame(x = c(-2.5, 7.5)), aes(x = x)) + theme_minimal()
p2 <- f(p1,c(1,2,3))
p2
This, however, doesn't return 3 lines, but only one. Why?
Your question is a bit confusing, because the first plot actually contains some other variable bits, but in your function you have a single stat_function call with only one variable element.
Anyways. Keep the ggplot main object separate and create a list of additional objects, very easy for example with lapply. Add this list to your main plot as usual.
Check also https://ggplot2-book.org/programming.html
library(ggplot2)
p <- ggplot(data.frame(x = c(-2.5, 7.5)), aes(x = x)) + theme_minimal()
ls_sumfun <- lapply(1:3, function(y){
stat_function(fun= function(x){y*x}, lwd=1.25, colour = "navyblue")
}
)
p + ls_sumfun
Created on 2021-04-26 by the reprex package (v2.0.0)
In R, you can pass functions as arguments. You can also return functions from functions. This might make your code simpler and cleaner.
Here's an example:
p1 <- ggplot(data.frame(x = c(-2.5, 7.5)), aes(x = x))
add_stat_fun <- function (ggp, f) {
ggp + stat_function(fun = f)
}
make_multiply_fun <- function (multiplier) {
force(multiplier) # evaluate now; lapply() already forces arguments in R >= 3.2.0, but this keeps the factory safe inside a for loop
f <- function (x) {multiplier * x}
return(f)
}
my_funs <- lapply(1:3, make_multiply_fun)
# my_funs is now a list of functions
add_stat_fun(p1, my_funs[[1]])
add_stat_fun(p1, my_funs[[2]])
add_stat_fun(p1, my_funs[[3]])
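The three calls above each return a separate plot with one line. To stack all the functions on a single plot, one option is to fold the list over the base plot with Reduce. This is a sketch rather than part of the original answer; p_all is a name I introduced, and the helpers are repeated so the block runs on its own.

```r
library(ggplot2)

p1 <- ggplot(data.frame(x = c(-2.5, 7.5)), aes(x = x))

add_stat_fun <- function(ggp, f) {
  ggp + stat_function(fun = f)
}

make_multiply_fun <- function(multiplier) {
  force(multiplier)  # fix the value at creation time
  function(x) multiplier * x
}

my_funs <- lapply(1:3, make_multiply_fun)

# Reduce calls add_stat_fun(p1, my_funs[[1]]), then feeds the result
# back in with my_funs[[2]], and so on, accumulating layers on one plot.
p_all <- Reduce(add_stat_fun, my_funs, p1)
```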
I have a table with a binning variable VAR2_BY_NS_BIN and an x-y data pair (MP_BIN,CORRECT_PROP). I want to plot the data point binned, and also draw a different line for each bin using stat_function, taking a different reference each time using the for loop.
library(data.table)
library(ggplot2)
test_tab <- data.table(VAR2_BY_NS_BIN=c(0.0005478, 0.0005478, 0.002266, 0.002266, 0.006783, 0.006783, 0.020709, 0.020709, 0.142961, 0.142961),
MP_BIN=rep(c(0.505, 0.995), 5),
CORRECT_PROP=c(0.5082, 0.7496, 0.5024, 0.8627, 0.4878, 0.9368, 0.4979, 0.9826, 0.4811, 0.9989))
VAR2_BIN <- sort(unique(test_tab$VAR2_BY_NS_BIN)) #get unique bin values
LEN_VAR2_BIN <- length(VAR2_BIN) #get number of bins
col_base <- c("#FF0000", "#BB0033", "#880088", "#3300BB", "#0000FF") #mark bins with different colours
p <- ggplot(data = test_tab)
for (i in 1:LEN_VAR2_BIN) {
p <- p + geom_point(data = test_tab[test_tab$VAR2_BY_NS_BIN==VAR2_BIN[i],],
aes(x = MP_BIN, y = CORRECT_PROP),
col = col_base[i],
alpha = 0.5) +
stat_function(fun = function(t) {VAR2_BIN[i]*(t-0.5)+0.5}, col = col_base[i])
}
p <- p + xlab("MP") + ylab("Observed proportion")
print(p)
The above code (a reproducible example), however, always returns a plot with only the last stat_function line drawn (which is the 5th line in the above case).
The following code (without using the for loop) works, but I in fact have a large number of bins so it is not very feasible...
p <- p + stat_function(fun = function(t) {VAR2_BIN[1]*(t-0.5)+0.5}, col = col_base[1])
p <- p + stat_function(fun = function(t) {VAR2_BIN[2]*(t-0.5)+0.5}, col = col_base[2])
p <- p + stat_function(fun = function(t) {VAR2_BIN[3]*(t-0.5)+0.5}, col = col_base[3])
p <- p + stat_function(fun = function(t) {VAR2_BIN[4]*(t-0.5)+0.5}, col = col_base[4])
p <- p + stat_function(fun = function(t) {VAR2_BIN[5]*(t-0.5)+0.5}, col = col_base[5])
Thanks in advance!
You don't need a for loop or stat_function. To plot the points, just map MP_BIN and CORRECT_PROP to x and y and the points can be plotted with a single call to geom_point. For the lines, you can create the necessary values on the fly (as done in the code below) and plot those with geom_line.
library(tidyverse)
ggplot(test_tab %>% mutate(model=VAR2_BY_NS_BIN*(MP_BIN - 0.5) + 0.5),
aes(x=MP_BIN, colour=factor(VAR2_BY_NS_BIN))) +
geom_point(aes(y=CORRECT_PROP)) +
geom_line(aes(y=model)) +
labs(colour="VAR2_BY_NS_BIN") +
guides(colour=guide_legend(reverse=TRUE))
In terms of the problem you were having with the for loop: the function passed to stat_function is not called until the plot is printed, so the loop variable (i) inside it is only looked up at that point. By then the loop has finished and i is 5, so every stat_function layer draws the fifth line, and that's the only line you see. You can find several questions related to this issue on Stack Overflow.
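The late lookup of the loop variable can be reproduced without ggplot at all. The following base R sketch (all names are mine, chosen for illustration) shows the problem and one common fix, a function factory that forces its argument:

```r
# Closures created in a for loop all look up the same variable i,
# and only when they are finally called.
late_funs <- list()
for (i in 1:3) {
  late_funs[[i]] <- function(x) x * i
}
late_results <- sapply(late_funs, function(f) f(1))
late_results  # every function sees the final value of i

# A factory gives each function its own copy of the multiplier.
make_mult <- function(k) {
  force(k)  # evaluate k now instead of when the function is first called
  function(x) x * k
}
fixed_funs <- list()
for (i in 1:3) {
  fixed_funs[[i]] <- make_mult(i)
}
fixed_results <- sapply(fixed_funs, function(f) f(1))
fixed_results  # each function keeps its own multiplier
```

Passing fun = make_mult(VAR2_BIN[i]) style factories to stat_function inside the loop should fix the original loop in the same way, although the geom_line approach above avoids the loop entirely.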
I'm trying to create a utility function that combines several geom_, like in this example (which doesn't work):
my_geom_y <- function(yy, colour){
geom_line(aes(y=yy), col=colour) + geom_point(aes(y=yy), col=colour)
}
so that then I can do this:
myX <- 0:90
ggplot(mapping = aes(x=myX)) + my_geom_y(dlnorm(myX), "red") + my_geom_y(dexp(myX), "blue")
Is that possible?
I tried using get(), eval(), substitute(), and as.name(), to no avail.
Looking at related posts ("passing parameters to ggplot" and "Use of ggplot() within another function in R") didn't help.
I like MSM's approach, but if you want to be able to add my_geom_y to a ggplot you've already made, this is an alternative that might suit what you're after:
library(ggplot2)
x <- 1:100
my_geom_y <- function(yy, colour = "black"){
list(
geom_line(mapping = aes(y = yy),
col = colour,
data = data.frame(x, yy)),
geom_point(mapping = aes(y = yy),
col = colour,
data = data.frame(x, yy))
)
}
ggplot(mapping = aes(x)) +
my_geom_y(x, "red") +
my_geom_y(dlnorm(x), "blue") +
my_geom_y((x^1.1), "black") +
my_geom_y(x/2, "yellow")
I don't have enough reputation to comment, so here is a suggestion:
my_geom_y <- function(xx, yy, colour){
ggplot() +
geom_line(aes(x=xx, y=yy), col=colour) +
geom_point(aes(x=xx, y=yy), col=colour)
}
This will create one plot. To create multiple ones, you need to pass your inputs to the function as a list and loop through it inside the function for each geom (since we can't add two or more ggplot objects) - if that makes sense.
Based on @luke-c's idea, this makes the function standalone and cut-and-paste ready. We can now also add labels to each curve.
my_geom_y <- function(.xx, .yy, yLabel = 1, .colour=NA ){
if (is.na(.colour))
.colour <- palette()[(yLabel - 1) %% length(palette()) + 1]  # wrap around without ever producing index 0
list( geom_line(mapping=aes(.xx,.yy), col=.colour, data=data.frame(.xx, .yy)),
geom_point(mapping=aes(.xx,.yy), col=.colour, data=data.frame(.xx, .yy)),
annotate(geom="text" , col = .colour, label=deparse(substitute(.yy)),
x=mean(.xx),y=max(.yy)-(max(.yy)-min(.yy))/20*yLabel)
)
}
myX <- 1:10
ggplot() + my_geom_y(myX, dlnorm(myX), 1) +
my_geom_y(myX, dexp(myX), 2) + my_geom_y(myX, dexp(myX,0.7), 3)
This function becomes handy when you need to visually compare multiple distributions.
For some reason, in this loop the PDFs that it produces end up corrupt. However, when I plot each individually it is saved and I can open them. Please advise, going mad!
for (l in 1:length(which_genes)) {
gene_name <- which_genes[[l]]
cases_values <- cases[cases$HGNC == genes[gene_name],]
controls_values <- controls[controls$HGNC == genes[gene_name],]
t <- t.test(cases_values[c(2:ncol(cases_values))], controls_values[c(2:ncol(controls_values))])
case <- cbind(t(cases_values[c(2:ncol(cases_values))]), "cases")
cont <- cbind(t(controls_values[c(2:ncol(controls_values))]), "controls")
dat <- as.data.frame(rbind(case, cont))
names(dat) <- c("expression", "type")
dat$expression <- as.numeric(dat$expression)
#plot significant genes
pdf(file = paste(genes[gene_name], "_different.pdf", sep=""))
ggplot(dat, aes(type, expression, fill=type)) +
geom_boxplot() +
ggtitle(paste(genes[gene_name], "pvalue", t$p.value)) +
xlab("cases vs controls")
dev.off()
}
Yet another instance of the failure-to-print error (as described in the R-FAQ). Use this instead inside the loop:
pdf(file = paste(genes[gene_name], "_different.pdf", sep=""))
print( ggplot(dat, aes(type, expression, fill=type)) +
geom_boxplot() +
ggtitle(paste(genes[gene_name], "pvalue", t$p.value)) +
xlab("cases vs controls")
)
dev.off()
If the goal was to have a multi-page output then you should have opened the PDF-device outside the loop, print-ed within the loop, and then closed the device outside.
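A minimal sketch of that multi-page pattern, using mtcars as stand-in data and a temporary file (both are my assumptions, not part of the original question):

```r
library(ggplot2)

out_file <- tempfile(fileext = ".pdf")

pdf(file = out_file)  # open the device once, before the loop
for (cyl_value in sort(unique(mtcars$cyl))) {
  subset_df <- mtcars[mtcars$cyl == cyl_value, ]
  p <- ggplot(subset_df, aes(x = factor(gear), y = mpg)) +
    geom_boxplot() +
    ggtitle(paste("cyl =", cyl_value))
  print(p)  # an explicit print() is needed inside a loop; each call adds a page
}
dev.off()  # close the device once, after the loop
```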