"Undefined columns selected" when passing a function [duplicate] - r

This question already has answers here:
How to use a variable to specify column name in ggplot
(6 answers)
Closed 3 years ago.
I asked a similar question and was told it duplicates an existing one (R ggplot2 aes argument). I changed my code based on that answer, but it still does not work.
I'm drawing a spider plot and can produce it without a function. I then want to wrap it in a function named "plottum" that takes some variables as input and makes the plot, but the function does not work.
Can anyone help me? Thanks!
(I have changed my code a bit)
library(ggplot2)
library(tumgr)
set.seed(1234)
tumorgrowth = sampleData
tumorgrowth = do.call(rbind,
by(tumorgrowth, tumorgrowth$name,function(subset) within(subset,
{ treatment = ifelse(rbinom(1,1,0.5), "Drug","Control")
# randomly classified
o = order(date)
date = date[o]
size = size[o]
baseline = size[1]
percentChange = 100*(size-baseline)/baseline
time = ifelse(date > 250, 250, date) ## data censored at 250
cstatus = factor(ifelse(date > 250, 0, 1))
})))
# The code above works well; the problem is this plottum function
plottum = function(data,time,pct,name,censor,treat){
ggplot(data,aes(x=data[,time],y=data[,pct],group=data[,name]))+
geom_line(aes(color=data[,treat]))+
geom_point(aes(shape=data[,censor],color=data[,treat]))
}
plottum(tumorgrowth,"time","percentChange","name","cstatus","treatment" )

Option 1: use quasiquotation
scatter_by <- function(data, x, y) {
x <- enquo(x)
y <- enquo(y)
ggplot(data, aes(!!x, !!y)) + geom_point()
}
scatter_by(mtcars, disp, drat)
Option 2: aes_string (soft-deprecated, so it is time to start moving to the newer methods)
scatter_by <- function(data, x, y) {
ggplot(data, aes_string(x, y)) + geom_point()
}
scatter_by(mtcars, "disp", "drat")
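Applied to the plottum function from the question, where the column names arrive as strings, a third option is the .data pronoun. The original version most likely fails because, inside aes(), a name such as time in data[, time] is looked up in the data first, so the values of the time column end up being used as column indices. The following is only a sketch under assumptions: toy is a made-up stand-in for the prepared tumorgrowth data, and .data needs ggplot2 3.0.0 or later.

```r
library(ggplot2)

# Hypothetical stand-in for the prepared tumorgrowth data frame
toy <- data.frame(
  name = rep(c("a", "b"), each = 3),
  time = rep(c(0, 100, 250), 2),
  percentChange = c(0, 20, 45, 0, -10, -30),
  cstatus = factor(c(1, 1, 0, 1, 1, 0)),
  treatment = rep(c("Drug", "Control"), each = 3)
)

plottum <- function(data, time, pct, name, censor, treat) {
  # .data[[col]] picks the column whose name is stored in the string `col`
  ggplot(data, aes(x = .data[[time]], y = .data[[pct]], group = .data[[name]])) +
    geom_line(aes(color = .data[[treat]])) +
    geom_point(aes(shape = .data[[censor]], color = .data[[treat]])) +
    # restore readable labels instead of ".data[[time]]" and so on
    labs(x = time, y = pct, color = treat, shape = censor)
}

p <- plottum(toy, "time", "percentChange", "name", "cstatus", "treatment")
```

Printing p draws the plot; the call has the same shape as the one in the question, with the column names passed as strings.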

Related

Use ggplot in an R function with three inputs: filename of dataframe, and two column variables of numeric data [duplicate]

This question already has answers here:
How to use a variable to specify column name in ggplot
(6 answers)
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed last year.
I would like to create an R function that takes as input:
a dataframe of my choosing
two columns of the dataframe containing numeric data
The output should be a scatterplot of one column variable against another, using both base R plot function and ggplot.
Here is a toy dataframe:
df <- data.frame("choco" = 1:5,
"tea" = c(2,4,5,8,10),
"coffee" = c(0.5,2,3,1.5,2.5),
"sugar" = 16:20)
Here is the function I wrote, which doesn't work (also tried this with base R plot which didn't work - code not shown)
test <- function(Data, ing1, ing2) {
ggplot(Data, aes(x = ing1, y = ing2)) +
geom_point()
}
test(Data = df, ing1 = "choco", ing2 = "tea")
As part of the above function, I would like to incorporate an 'if..else' statement to test whether ing1 and ing2 inputs are valid, e.g.:
try(test("coffee", "mint"))
the above inputs should prompt a message that 'one or both of the inputs is not valid'
I can see that using %in% could be the right way to do this, but I'm unsure of the syntax.
df <- data.frame(
"choco" = 1:5,
"tea" = c(2, 4, 5, 8, 10),
"coffee" = c(0.5, 2, 3, 1.5, 2.5),
"sugar" = 16:20
)
test <- function(Data, ing1, ing2) {
  # && is the scalar logical AND intended for if() conditions
  if (ing1 %in% names(Data) && ing2 %in% names(Data)) {
    ggplot(Data, aes(x = Data[, ing1], y = Data[, ing2])) +
      geom_point()
  } else {
    print("Both ing1 and ing2 have to be columns of the data frame")
  }
}
test(Data = df, ing1 = "choco", ing2 = "sugar")
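The question also asked for a base R version and for a message when an input is not valid. A minimal sketch of that, assuming a hypothetical helper name test_base and using stop() so that try() has an error to catch:

```r
# Hypothetical helper: validates the column names, then draws a base R scatterplot
test_base <- function(Data, ing1, ing2) {
  cols <- c(ing1, ing2)
  missing_cols <- setdiff(cols, names(Data))  # requested names that are not columns
  if (length(missing_cols) > 0) {
    stop("one or both of the inputs is not valid: ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  # [[ ]] extracts a column by the name stored in a string
  plot(Data[[ing1]], Data[[ing2]], xlab = ing1, ylab = ing2)
  invisible(Data[cols])
}

df <- data.frame(choco = 1:5, tea = c(2, 4, 5, 8, 10),
                 coffee = c(0.5, 2, 3, 1.5, 2.5), sugar = 16:20)

test_base(df, "choco", "tea")                              # draws the plot
res <- try(test_base(df, "coffee", "mint"), silent = TRUE) # "mint" is not a column
```

setdiff() reports exactly which names are missing, which is a little more informative than a plain %in% check.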
Regards,
Grzegorz

Making Multiple Plots from a Function call in R studio

I am relatively new to R. My question results from a project in an online learning course. I am using R studio to make multiple plots from a function call. I want a new plot for each column which represents the y-axis while the x-axis remains equal to the month. The function works when displaying a single variable. However, when I try the function call using multiple columns I receive:
"Error: More than one expression parsed"
Similar code worked in the online program's simulated platform.
I have provided my code below with a small sample from the data frame. Is it possible to derive multiple plots in this way? If so, how can I update or correct my code to make the plot for each column.
month <- c('mar', 'oct', 'oct')
day <- c('fri', 'tue', 'sat')
FFMC <- c(86.2, 90.6, 90.6)
DMC <- c(26.2, 35.4, 43.7)
DC <- c(94.3, 669.1, 686.9)
ISI <- c(5.1, 6.7, 6.7)
temp <- c(8.2, 18.0, 14.6)
RH <- c(51, 33, 33)
wind <- c(6.7, 0.9, 1.3)
rain <- c(0.0, 0.0, 0.0)
forestfires_df <- data.frame(month, day, FFMC, DMC, DC, ISI, temp, RH, wind, rain)
library(ggplot2)
library(purrr)
month_box <- function(x , y) {
ggplot(data = forestfires_df, aes_string(x = month, y = y_var)) +
geom_boxplot() +
theme_bw()
}
month <- names(forestfires_df)[1]
y_var <- names(forestfires_df)[3:10]
month_plots <- map2(month, y_var, month_box)
#After running month_plots I receive "Error: More than one expression parsed"
The issue is that the body of the function should use its own arguments (x and y), not the global objects month and y_var:
month_box <- function(x , y) {
ggplot(data = forestfires_df, aes_string(x = x, y = y)) +
geom_boxplot() +
theme_bw()
}
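If the body uses the globals 'month' and 'y_var' instead, every call hands all 8 names in 'y_var' to aes_string at once, which is the likely cause of the error; looping over them one at a time is exactly what map2 is for. With the change, map2 should work as expected: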
map2(month, y_var, month_box)
Or using anonymous function
map2(month, y_var, ~ month_box(.x, .y))
Like I mentioned in a comment, aes_string has been soft-deprecated in favor of using tidyeval to write ggplot2 functions. You can rewrite your function as a simple tidyeval-based one, then map over the columns of interest passing bare column names or their positions the way you would with most other tidyverse functions.
There are a couple ways to write a function like this. The older way is with quosures and unquoting columns, but its syntax can be confusing. dplyr comes with a very in-depth vignette, but I like this blog post as a quick guide.
month_box_quo <- function(x, y) {
x_var <- enquo(x)
y_var <- enquo(y)
ggplot(forestfires_df, aes(x = !!x_var, y = !!y_var)) +
geom_boxplot()
}
A single call looks like this, with bare column names:
month_box_quo(x = month, y = DMC)
Or with map_at and column positions (or with vars()):
# mapped over variables of interest; assumes y gets the mapped-over column
map_at(forestfires_df, 3:10, month_box_quo, x = month)
# or with formula shorthand
map_at(forestfires_df, 3:10, ~month_box_quo(x = month, y = .))
The newer tidyeval syntax ({{}}, or curly-curly) is easier to follow, and returns the same list of plots as above.
month_box_curly <- function(x, y) {
ggplot(forestfires_df, aes(x = {{ x }}, y = {{ y }})) +
geom_boxplot()
}
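If the column names are held as strings, as in the question's y_var vector, the .data pronoun is another tidyeval route. This is a sketch under assumptions: month_box_str is a hypothetical name, and the data frame is a cut-down version of the sample from the question.

```r
library(ggplot2)
library(purrr)

# Cut-down version of the sample data from the question
forestfires_df <- data.frame(
  month = c("mar", "oct", "oct"),
  FFMC  = c(86.2, 90.6, 90.6),
  DMC   = c(26.2, 35.4, 43.7),
  temp  = c(8.2, 18.0, 14.6)
)

# Hypothetical string-based variant of month_box
month_box_str <- function(x, y) {
  ggplot(forestfires_df, aes(x = .data[[x]], y = .data[[y]])) +
    geom_boxplot() +
    labs(x = x, y = y)  # readable axis titles
}

y_vars <- setdiff(names(forestfires_df), "month")
# set_names() names each element after itself, so the result is a named list of plots
month_plots <- map(set_names(y_vars), ~ month_box_str("month", .x))
```

month_plots$DMC then gives the boxplot of DMC by month.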

Save ggplot in loop with R

I have a dataset with numeric and factor variables. I want one page of plots for the numeric variables and another for the factor variables. First, I select the factor variables and their column indices.
My df is the iris dataset.
df<-iris
df$y<-sample(0:1,nrow(iris),replace=TRUE)
fact<-colnames(df)[sapply(df,is.factor)]
index_fact<-which(names(df)%in%fact)
Then I count the remaining (numeric) variables:
nm<-ncol(df)-length(fact)
The next step is to create the loop:
i_F=1
i_N=1
list_plotN<- list()
list_plotF<- list()
for (i in 1:length(df)){
plot <- ggplot(df,aes(x=df[,i],color=y,fill=y))+xlab(names(df)[i])
if (is.factor(df[,i])){
p_factor<-plot+geom_bar()
list_plotF[[i_F]]<-p_factor
i_F=i_F+1
}else{
p_numeric <- plot+geom_histogram()
list_plotN[[i_N]]<-p_numeric
i_N=i_N+1
}
}
When I look at list_plotF and list_plotN, the result is wrong: every plot shows the same variable. I don't know what I'm doing wrong.
thanks!!!
I don't really follow your for loop code all that well, but from what I can see, every iteration seems to end up saving the last plot. I've reconstructed what I think you need using lapply, which I generally prefer to for loops whenever I can.
lapply takes a list of values and a function and applies that function to every value. You can define the function separately, as I have, so everything looks cleaner; then you just name the function in the lapply call.
In our case the list is the column names of your data frame df. The function first creates the base plot, then checks whether the column it is looking at is a factor. If it is, it adds a bar chart; otherwise it adds a histogram.
histOrBar <- function(var) {
basePlot <- ggplot(df, aes_string(var))
if ( is.factor(df[[var]]) ) {
basePlot + geom_bar()
} else {
basePlot + geom_histogram()
}
}
loDFs <- lapply(colnames(df), histOrBar)
Consider passing column names with aes_string to better align x with df:
for (i in 1:length(df)){
plot <- ggplot(df, aes_string(x=names(df)[i], color="y", fill="y")) +
xlab(names(df)[i])
...
}
To demonstrate the problem using aes() and solution using aes_string() in OP's context, consider the following random data frame with columns of different data types: factor, char, int, num, bool, date.
Data
library(ggplot2)
set.seed(1152019)
alpha <- c(LETTERS, letters, c(0:9))
data_tools <- c("sas", "stata", "spss", "python", "r", "julia")
random_df <- data.frame(
group = factor(sample(data_tools, 500, replace=TRUE)),  # explicit factor: since R 4.0, data.frame() keeps strings as character
int = as.numeric(sample(1:15, 500, replace=TRUE)),
num = rnorm(500),
char = replicate(500, paste(sample(LETTERS[1:2], 3, replace=TRUE), collapse="")),
bool = as.numeric(sample(c(TRUE, FALSE), 500, replace=TRUE)),
date = as.Date(sample(as.integer(as.Date('2019-01-01', origin='1970-01-01')):as.integer(Sys.Date()),
500, replace=TRUE), origin='1970-01-01')
)
Graph
fact <- colnames(random_df)[sapply(random_df,is.factor)]
index_fact <- which(names(random_df) %in% fact)
i_F=1
i_N=1
list_plotN <- list()
list_plotF <- list()
plot <- NULL
for (i in 1:length(random_df)){
# aes() VERSION
#plot <- ggplot(random_df, aes(x=random_df[,i], color=group, fill=group)) +
# xlab(names(random_df)[i])
# aes_string() VERSION
plot <- ggplot(random_df, aes_string(x=names(random_df)[i], color="group", fill="group")) +
xlab(names(random_df)[i])
if (is.factor(random_df[,i])){
p_factor <- plot + geom_bar()
list_plotF[[i_F]] <- p_factor
i_F=i_F+1
}else{
p_numeric <- plot + geom_histogram()
list_plotN[[i_N]] <- p_numeric
i_N=i_N+1
}
}
Problem (using aes() where graph outputs DO NOT change according to type)
Solution (using aes_string() where graphs DO change according to type)
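Since aes_string() is soft-deprecated, here is a sketch of the same idea with the .data pronoun. As far as I can tell, the root cause is that aes() stores the expression df[, i] unevaluated, so every stored plot is drawn with whatever value i has when it is finally printed. Wrapping the plot in a function gives each plot its own copy of the column name. Assumptions in this sketch: plot_one is a hypothetical helper, y is turned into a factor so color and fill are discrete, and y itself is left out of the plotted columns.

```r
library(ggplot2)

df <- iris
df$y <- factor(sample(0:1, nrow(df), replace = TRUE))  # assumption: y as a factor

# Hypothetical helper: one plot per column name
plot_one <- function(nm) {
  # nm lives in this call's own environment, so each plot keeps its own column
  base <- ggplot(df, aes(x = .data[[nm]], color = y, fill = y)) + xlab(nm)
  if (is.factor(df[[nm]])) {
    base + geom_bar()
  } else {
    base + geom_histogram(bins = 30)
  }
}

vars <- setdiff(names(df), "y")
is_fact <- vapply(df[vars], is.factor, logical(1))

list_plotF <- lapply(vars[is_fact], plot_one)   # factor columns
list_plotN <- lapply(vars[!is_fact], plot_one)  # numeric columns
```

Note that writing .data[[names(df)[i]]] directly inside a top-level for loop would run into the same problem as before, because i would again be looked up only when the plot is drawn.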

Creating a boxplot loop with ggplot2 for only certain variables

I have a dataset with 99 observations and I need to create boxplots for ones with a specific string in them. However, when I run this code I get 57 of the exact same plots from the original function instead of the loop. I was wondering how to prevent the plots from being overwritten but still create all 57. Here is the code and a picture of the plot.
Thanks!
#starting boxplot function
myboxplot <- function(mydata=ivf_dataset, myexposure =
"ART_CURRENT", myoutcome = "MEG3_DMR_mean")
{bp <- ggplot(ivf_dataset, aes(ART_CURRENT, MEG3_DMR_mean))
bp <- bp + geom_boxplot(aes(group =ART_CURRENT))
}
#pulling out variables needed for plots
outcomes = names(ivf_dataset)[grep("_DMR_", names(ivf_dataset),
ignore.case = T)]
#creating loop for 57 boxplots
allplots <- list()
for (i in seq_along(outcomes))
{
allplots[[i]]<- myboxplot (myexposure = "ART_CURRENT", myoutcome =
outcomes[i])
}
allplots
I recommend reading about standard and non-standard evaluation and how this works with the tidyverse. Here are some links
http://adv-r.had.co.nz/Functions.html#function-arguments
http://adv-r.had.co.nz/Computing-on-the-language.html
I also found this useful
https://rstudio-pubs-static.s3.amazonaws.com/97970_465837f898094848b293e3988a1328c6.html
Also, you need to produce an example so that it is possible to replicate your problem. Here is the data that I created.
df <- data.frame(label = rep(c("a","b","c"), 5),
x = rnorm(15),
y = rnorm(15),
x2 = rnorm(15, 10),
y2 = rnorm(15, 5))
I kept most of your code the same and only changed what needed to be changed.
myboxplot2 <- function(mydata = df, myexposure, myoutcome){
bp <- ggplot(mydata, aes_(as.name(myexposure), as.name(myoutcome))) +
geom_boxplot()
print(bp)
}
myboxplot2(myexposure = "label", myoutcome = "y")
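Because aes() uses non-standard evaluation, column names held in strings must be converted to symbols with as.name() and passed through aes_(), which evaluates its arguments normally. Again, read the links above.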
Here I am getting all the columns that start with x. I am assuming that your code gets the columns that you want.
outcomes <- names(df)[grep("^x", names(df), ignore.case = TRUE)]
Here I am looping through in the same way that you did. I am only storing the plot object though.
allplots <- list()
for (i in seq_along(outcomes)){
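# myboxplot2() already returns the ggplot object (print() returns its argument invisibly), so no $plot is needed
allplots[[i]] <- myboxplot2(myexposure = "label", myoutcome = outcomes[i])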
}
allplots
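aes_() has since been soft-deprecated along with aes_string(), so here is a sketch of the same loop with the .data pronoun instead. The name myboxplot3 is hypothetical, and the data is the same kind of made-up example as above.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(label = rep(c("a", "b", "c"), 5),
                 x = rnorm(15), y = rnorm(15),
                 x2 = rnorm(15, 10), y2 = rnorm(15, 5))

# Hypothetical variant of myboxplot2 that returns the plot without printing it
myboxplot3 <- function(mydata, myexposure, myoutcome) {
  ggplot(mydata, aes(x = .data[[myexposure]], y = .data[[myoutcome]])) +
    geom_boxplot() +
    labs(x = myexposure, y = myoutcome)
}

# value = TRUE makes grep() return the matching names directly
outcomes <- grep("^x", names(df), value = TRUE)

allplots <- lapply(outcomes, function(o) myboxplot3(df, "label", o))
names(allplots) <- outcomes
```

Each element of allplots is a separate plot for one outcome column, and allplots$x2 prints the second one.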

Pass variables as parameters to plot_ly function

I would like to create a function that creates different kinds of plotly plots based on the parameters that are passed into it. If I create the following data
library(plotly)
#### test data
lead <- rep("Fred Smith", 30)
lead <- append(lead, rep("Terry Jones", 30))
lead <- append(lead, rep("Henry Sarduci", 30))
proj_date <- seq(as.Date('2017-11-01'), as.Date('2017-11-30'), by = 'day')
proj_date <- append(proj_date, rep(proj_date, 2))
set.seed(1237)
actHrs <- runif(90, 1, 100)
cummActHrs <- cumsum(actHrs)
forHrs <- runif(90, 1, 100)
cummForHrs <- cumsum(forHrs)
df <- data.frame(Lead = lead, date_seq = proj_date,
cActHrs = cummActHrs,
cForHrs = cummForHrs)
I could plot it using:
plot_ly(data = df, x = ~date_seq, y = ~cActHrs, split = ~Lead)
If I made a makePlot function like the one shown below, how would I make it do something like this:
makePlot <- function(plot_data = df, x_var = date_seq, y_var, split_var) {
plot <- plot_ly(data = df, x = ~x_var, y = ~y_var, split = ~split_var)
return(plot)
}
?
Is there a function I can wrap x_var, y_var, and split_var with so that plotly will recognize them as x, y, and split parameters?
Eventually got around to figuring this out and hope this little follow up takes some of the mystery out of these types of tasks. Although this question is focused on plotting, it's important to first build an understanding of how the functions in various R packages (e.g. dplyr and plotly) evaluate expressions and how to manipulate the way those expressions are evaluated. A great reference to build this understanding is Hadley's article on programming in dplyr here or alternatively here.
Once that's under your belt, this turns out to be pretty easy. The trick is to simply pass your variable arguments like you do when you call dplyr functions and be sure to quote those parameters inside your plotting function. For the question above, this function worked for me:
makePlot <- function(plot_data = df, x_var, y_var, split_var,
type_var="scatter",
mode_var="lines+markers") {
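# enquo() is namespaced so the function works without attaching rlang or dplyr
quo_x <- rlang::enquo(x_var)
quo_y <- rlang::enquo(y_var)
quo_split <- rlang::enquo(split_var)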
# print(c(quo_x, quo_y, quo_split))
plot <- plot_ly(data = plot_data, x = quo_x, y = quo_y, split = quo_split,
type=type_var, mode=mode_var)
return(plot)
}
# using df created in question, pass col's as args like dplyr functions
p1 <- makePlot(df, date_seq, cActHrs, Lead)
p2 <- makePlot(df, date_seq, cForHrs, Lead)
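If the column names are only available as strings, another approach that should work is to build the one-sided formulas that plot_ly expects. This is a sketch under assumptions: makePlotStr and the small toy data frame are hypothetical, and column names that are not syntactically valid R names would need backticks inside the formula.

```r
library(plotly)

# Hypothetical string-based variant of makePlot
makePlotStr <- function(plot_data, x_var, y_var, split_var,
                        type_var = "scatter", mode_var = "lines+markers") {
  # Turn a column name such as "date_seq" into the formula ~date_seq
  f <- function(col) stats::as.formula(paste0("~", col))
  plot_ly(data = plot_data, x = f(x_var), y = f(y_var), split = f(split_var),
          type = type_var, mode = mode_var)
}

# Small made-up data frame with the same column names as in the question
toy <- data.frame(Lead = rep(c("A", "B"), each = 3),
                  date_seq = rep(as.Date("2017-11-01") + 0:2, 2),
                  cActHrs = cumsum(1:6))

p <- makePlotStr(toy, "date_seq", "cActHrs", "Lead")
```

The call mirrors plot_ly(data = df, x = ~date_seq, y = ~cActHrs, split = ~Lead) from the question, with the column names supplied as strings.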
