How to use column names starting with numbers in ggplot functions - r

I have a huge dataframe, whose variables/ column names start with a number such as `1_variable`. Now I am trying to create a function that can take these column names as arguments to then plot a few boxplots using ggplot. However I need the string but also need to to use its input with `` to use the arguments in ggplot. However I am not sure how to escape the character string such as "1_variable" to give ggplot an input that is `1_variable`.
small reproducible example:
dfx = data.frame(`1ev`=c(rep(1,5), rep(2,5)), `2ev`=sample(10:99, 10),
`3ev`=10:1, check.names = FALSE)
If I were to plot the figure manually, the input would look like this:
dfx$`1ev` <- as.factor(dfx$`1ev`)
ggplot(dfx, aes(x = `1ev`, y = `2ev`))+
geom_boxplot()
the function I'd like to be able to run for the dataframe is this one:
plot_boxplot <- function(data, group, value){
data = data[c(group, value)]
data[,group] = as.factor(data[,group])
plot <- ggplot(data, aes(x = group, y = value))+
geom_boxplot()
return(plot)
}
1. Try
plot_boxplot(dfx, `1ev`, `2ev`)
which gives me an error saying Error in [.data.frame(data, c(group, value)) : object '1ev' not found
2. Try
entering the arguments with double quotes "" gives me unexpectedly this:
plot_boxplot(dfx, "1ev", "2ev")
3. Try
I also tried to replace the double quotes of the string with gsub in the function
gsub('\"', '`', group)
but that does not change anything abut its output.
4. Try
finally, I also tried to make use of aes_string , but that just gives me the same errors.
plot_boxplot <- function(data, group, value){
data = data[c(as.character(group), as.character(value))]
data[,group] = as.factor(data[,group])
plot <- ggplot(data, aes_string(x= group, y=value))+
geom_boxplot()
return(plot)
}
plot_boxplot(dfx, `1ev`, `2ev`)
plot_boxplot(dfx, "1ev", "2ev")
Ideally I would like to run the function to produce this output:
plot_boxplot(dfx, group = "1ev", value = "2ev")
[can be produced with this code manually]
ggplot(dfx, aes(x= `1ev`, y=`2ev`)) +
geom_boxplot()
Any help would be greatly appreciated.

One way to do this is a combination of aes_ and as.name():
plot_boxplot <- function(data, group, value){
data = data[c(group, value)]
data[,group] = as.factor(data[,group])
plot <- ggplot(data, aes_(x= as.name(group), y=as.name(value))) +
geom_boxplot()
return(plot)
}
And passing in strings for group and value:
plot_boxplot(dfx, "1ev", "2ev")
It's not the same plot you show above, but it looks to align with the data.

Related

Changing title of plots in a loop with colnames() in R

I am creating a for loop which creates a ggplot2 plot for each of the first six columns in a dataframe. Everything works except for the looping of the title names. I have been trying to use title = colnames(df[,i]) and title = paste0(colnames(df[,i]) to create the proper title but it simply ends up repeating the 2nd column name. The plots themselves produce the data correctly for each column, but the title is for some reason not looping. For the first plot it produces the correct title, but then for the second plot and beyond it just keeps on repeating the third column name, completely skipping over the second column name. I even tried creating a variable within the loop to store the respective title name to then use within the ggplot2 title labels: changetitle <- colnames(df[,i]) and then using title = changetitle but that also loops incorrectly.
Here is an example of what I have so far:
plot_6 <- list()
for(i in df[1:6]){
plot_6[i] <- print(ggplot(df, aes(x = i, ...) ...) +
... +
labs(title = colnames(df[,i]),
x = ...) +
...)
}
Thank you very much.
df[1:6] is a data frame with six columns. When used as a loop variable, this results in i being a vector of values each time through the loop. This might "work" in the sense that ggplot will prroduce a plot, but it breaks the link between the data frame provided to ggplot (df in this case) and the mapping of df's columns to ggplot's aesthetics.
Here are a few options, using the built-in mtcars data frame:
library(tidyverse)
library(patchwork)
plot_6 <- list()
for(i in 1:6) {
var = names(mtcars)[i]
plot_6[[i]] <- ggplot(mtcars, aes(x = !!sym(var))) +
geom_density() +
labs(title = var)
}
# Use column names directly as loop variable
for(i in names(mtcars)[1:6]) {
plot_6[[i]] <- ggplot(mtcars, aes(x = !!sym(i))) +
geom_density() +
labs(title = var)
}
# Use map, which directly generates a list of plots
plot_6 = map(names(mtcars)[1:6],
~ggplot(mtcars, aes(x = !!sym(.x))) +
geom_density() +
labs(title = .x)
)
Any of these produces the same list of plots:
wrap_plots(plot_6)

How do I loop a ggplot2 functon to export and save about 40 plots?

I am trying to loop a ggplot2 plot with a linear regression line over it. It works when I type the y column name manually, but the loop method I am trying does not work. It is definitely not a dataset issue.
I've tried many solutions from various websites on how to loop a ggplot and the one I've attempted is the simplest I could find that almost does the job.
The code that works is the following:
plots <- ggplot(Everything.any, mapping = aes(x = stock_VWRETD, y = stock_10065)) +
geom_point() +
labs(x = 'Market Returns', y = 'Stock Returns', title ='Stock vs Market Returns') +
geom_smooth(method='lm',formula=y~x)
But I do not want to do this another 40 times (and then 5 times more for other reasons). The code that I've found on-line and have tried to modify it for my means is the following:
plotRegression <- function(z,na.rm=TRUE,...){
nm <- colnames(z)
for (i in seq_along(nm)){
plots <- ggplot(z, mapping = aes(x = stock_VWRETD, y = nm[i])) +
geom_point() +
labs(x = 'Market Returns', y = 'Stock Returns', title ='Stock vs Market Returns') +
geom_smooth(method='lm',formula=y~x)
ggsave(plots,filename=paste("regression1",nm[i],".png",sep=" "))
}
}
plotRegression(Everything.any)
I expect it to be the nice graph that I'd expect to get, a Stock returns vs Market returns graph, but instead on the y-axis, I get one value which is the name of the respective column, and the Market value plotted as normally, but as if on a straight number-line across the one y-axis value. Please let me know what I am doing wrong.
Desired Plot:
Actual Plot:
Sample Data is available on Google Drive here:
https://drive.google.com/open?id=1Xa1RQQaDm0pGSf3Y-h5ZR0uTWE-NqHtt
The problem is that when you assign variables to aesthetics in aes, you mix bare names and strings. In this example, both X and Y are supposed to be variables in z:
aes(x = stock_VWRETD, y = nm[i])
You refer to stock_VWRETD using a bare name (as required with aes), however for y=, you provide the name as a character vector produced by colnames. See what happens when we replicate this with the iris dataset:
ggplot(iris, aes(Petal.Length, 'Sepal.Length')) + geom_point()
Since aes expects variable names to be given as bare names, it doesn't interpret 'Sepal.Length' as a variable in iris but as a separate vector (consisting of a single character value) which holds the y-values for each point.
What can you do? Here are 2 options that both give the proper plot
1) Use aes_string and change both variable names to character:
ggplot(iris, aes_string('Petal.Length', 'Sepal.Length')) + geom_point()
2) Use square bracket subsetting to manually extract the appropriate variable:
ggplot(iris, aes(Petal.Length, .data[['Sepal.Length']])) + geom_point()
you need to use aes_string instead of aes, and double-quotes around your x variable, and then you can directly use your i variable. You can also simplify your for loop call. Here is an example using iris.
library(ggplot2)
plotRegression <- function(z,na.rm=TRUE,...){
nm <- colnames(z)
for (i in nm){
plots <- ggplot(z, mapping = aes_string(x = "Sepal.Length", y = i)) +
geom_point()+
geom_smooth(method='lm',formula=y~x)
ggsave(plots,filename=paste("regression1_",i,".png",sep=""))
}
}
myiris<-iris
plotRegression(myiris)

Dash in column name yields "object not found" Error

I have a function to generate scatter plots from data, where an argument is provided to select which column to use for coloring the points. Here is a simplified version:
library(ggplot2)
plot_gene <- function (df, gene) {
ggplot(df, aes(x, y)) +
geom_point(aes_string(col = gene)) +
scale_color_gradient()
}
where df is a data.frame with columns x, y, and then a bunch of gene names. This works fine for most gene names; however, some have dashes and these fail:
print(plot_gene(df, "Gapdh")) # great!
print(plot_gene(df, "H2-Aa")) # Error: object "H2" not found
It appears the gene variable is getting parsed ("H2-Aa" becomes H2 - Aa). How can I get around this? Is there a way to indicate that a string should not go through eval in aes_string?
Reproducible Input
If you need some input to play with, this fails like my data:
df <- data.frame(c(1,2), c(2,1), c(1,2), c(2,1))
colnames(df) <- c("x", "y", "Gapdh", "H2-Aa")
For my real data, I am using read.table(..., header=TRUE) and get column names with dashes because the raw data files have them.
Normally R tries very hard to make sure you have column names in your data.frame that can be valid variable names. Using non-standard column names (those that are not valid variable names) will lead to problems when using functions that use non-standard evaluation type syntax. When focused to use such variable names you often have to wrap them in back ticks. In the normal case
ggplot(df, aes(x, y)) +
geom_point(aes(col = H2-Aa)) +
scale_color_gradient()
# Error in FUN(X[[i]], ...) : object 'H2' not found
would return an error but
ggplot(df, aes(x, y)) +
geom_point(aes(col = `H2-Aa`)) +
scale_color_gradient()
would work.
You can paste in backticks if you really want
geom_point(aes_string(col = paste0("`", gene, "`")))
or you could treat it as a symbol from the get-go and use aes_q instread
geom_point(aes_q(col = as.name(gene)))
The latest release of ggplot support escaping via !! rather than using aes_string or aes_q so you could do
geom_point(aes(col = !!rlang::sym(gene)))

how to pass an arguments to function to get a line plot using ggplot2?

I am trying to write a function to create time series plot (line graph). How do I pass an argument to function so that the plot is created? I tried different ways like using aes_string etc. but no success.
lineplotfun <- function(feature){
ggplot(aes(x = 1:length(feature), y = feature), data = mtcars) +
geom_line()
}
lineplotfun(mpg)
I want to pass mpg as string or name.
There are numerous problems with the code in the question.
1) y is not in aes()
2) if ggplot2 is loaded, mpg is a tibble
3) y = feature with data = mtcars is meaningless
4) 1:length(feature) only makes sense if feature is a vector
One way of achieving what you want is by setting data = NULL and pass a vector to the function:
lineplotfun <- function(feature){
require(ggplot2)
ggplot2::ggplot(data = NULL, aes(x = seq_along(feature), y = feature)) +
ggplot2::geom_line()
}
lineplotfun(mtcars$mpg)
The result is:

Function for formatting and plotting in R

I am currently trying to create a function that will format my data and properly and return a bar plot that is sorted. Yet for some reason I keep getting this error:
Error in `$<-.data.frame`(`*tmp*`, "Var1", value = integer(0)) :
replacement has 0 rows, data has 3
I have tried debugging it, but have had no luck. I have an example of what I expect down at the bottom. Can anyone spot what I am doing wrong?
x <- rep(c("Mark","Jimmy","Jones","Jones","Jones","Jimmy"),2)
y <- rnorm(12)
df <- data.frame(x,y)
plottingfunction <- function(data, name,xlabel,ylabel,header){
newDf <- data.frame(table(data))
order <- newDf[order(newDf$Freq, decreasing = FALSE), ]$Var1
newDf$Var1 <- factor(newDf$Var1,order)
colnames(newDf)[1] <- name
plot <- ggplot(newDf, aes(x=name, y=Freq)) +
xlab(xlabel) +
ylab(ylabel) +
ggtitle(header) +
geom_bar(stat="identity", fill="lightblue", colour="black") +
coord_flip()
return(plot)
}
plottingfunction(df$x, "names","xlabel","ylabel","header")
A few comments, your function didn't work, because this part isn't correct:
order <- newDf[order(newDf$Freq, decreasing = FALSE), ]$Var1
Since we have no idea if there will be any columns in data which has the column name Var1. What looks like happend is when you were trying your code you ran:
newDf <- data.frame(table(df$x))
which immediately renamed your column to Var1, but when you ran your function, the name changed. So to get around this I would recommend being explicit with your column names. In this example, I used the dplyr library to make my life easier. So following your code and logic it would look like this:
newDf <- data %>% group_by_(col_name) %>% tally
order <- newDf[order(newDf$n, decreasing = FALSE), col_name][[col_name]]
data[,col_name] <- factor(data[,col_name], order)
Then within your ggplot we can use aes_string to refer to the column name of the data frame instead. So then the whole function would look like this:
plottingFunction <- function(data, col_name, xlabel, ylabel, header) {
#' create a dataframe with the data that we're interested in
#' make sure that you preserve the anme of the column that you're
#' counting on...
newDf <- data %>% group_by_(col_name) %>% tally
order <- newDf[order(newDf$n, decreasing = FALSE), col_name][[col_name]]
data[,col_name] <- factor(data[,col_name], order)
plot <- ggplot(data, aes_string(col_name)) +
xlab(xlabel) +
ylab(ylabel) +
ggtitle(header) +
geom_bar(fill="lightblue", colour="black") +
coord_flip()
return(plot)
}
plottingFunction(df, "x", "xlabel","ylabel","header")
Which would have output like:
I think for your plot having stat="identity" is redundant since you can just use your original data frame rather than having a transformed one.

Resources