I am new to R and have been trying to figure this out for a while. I have a data frame with many different y variables, and I am trying to write a function that produces a customized graph template for each of them. The code below, however, produces these warnings:
1: In eval(expr, envir, enclos) : NAs introduced by coercion
2: In aes_string(xvar[max(which(complete.cases(yvar)))], yvar[max(which(complete.cases(yvar)))], :
NAs introduced by coercion
The code works if I put the variables in directly instead of passing them through a function. I believe it has something to do with how the function passes xvar into as.numeric(), but I am not sure. Does anyone know how to deal with this?
library(ggplot2)
test <- function(Data, xvar, yvar) {
  # Plot data
  plot <- ggplot(subset(Data, !is.na(yvar)), aes_string(xvar, yvar)) +
    geom_line(colour = "darkblue") +
    theme_bw()
  # Add trendline for recent data
  plot <- plot + geom_smooth(data = subset(Data, xvar > as.numeric(xvar)[max(which(complete.cases(yvar))) - 8]), method = "lm")
  # Label most recent data
  plot + geom_text(data = Data, aes_string(xvar[max(which(complete.cases(yvar)))],
                                           yvar[max(which(complete.cases(yvar)))],
                                           label = as.numeric(yvar)[max(which(complete.cases(yvar)))],
                                           hjust = -0.5, vjust = 0.5))
}
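xvar is probably (you do not show a reproducible example) a character vector of length 1 that holds the column name, not the column itself. Subsetting it like xvar[...] therefore does not pick out the value you want, and as.numeric() on a column name such as "X1" is NA, which is where the "NAs introduced by coercion" warnings come from. You need to look the column up in the data instead, e.g. data[, xvar].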
You could try something like
library(ggplot2)
f <- function(data, xvar, yvar) {
  ggplot(data, aes_string(xvar, yvar)) +
    geom_point() +
    geom_smooth(data = subset(data, eval(parse(text = xvar)) > 5), method = "lm")
}
or
f <- function(data, xvar, yvar) {
  ggplot(data, aes_string(xvar, yvar)) +
    geom_point() +
    geom_smooth(data = data[data[, xvar] > 5, ], method = "lm")
}
f(mtcars, "cyl", "disp")
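To make the difference concrete, here is a small base-R sketch contrasting indexing the column-name string (what the original function does) with looking the column up in the data frame. The toy data and the names Data, time and value are made up for illustration; the trend-line subset at the end uses a 3-row window instead of the original 8 only because the toy data is short.

```r
# Hypothetical toy data standing in for the OP's data frame
Data <- data.frame(time = 1:6, value = c(3, 5, NA, 8, NA, NA))
xvar <- "time"
yvar <- "value"

# What the original code effectively does: it works on the *strings*.
# complete.cases("value") is a single TRUE, so the computed index is always 1,
# and as.numeric("time") is NA ("NAs introduced by coercion").
bad_idx <- max(which(complete.cases(yvar)))
bad_x <- suppressWarnings(as.numeric(xvar)[bad_idx])

# Looking the columns up by name gives the intended last non-missing row
last_ok <- max(which(!is.na(Data[[yvar]])))
last_x  <- Data[[xvar]][last_ok]
last_y  <- Data[[yvar]][last_ok]

# Rows for a trend line over recent data: x above the x value 3 rows before
# the last observed point, keeping only non-missing y
recent <- Data[Data[[xvar]] > Data[[xvar]][last_ok - 3] & !is.na(Data[[yvar]]), ]
```

The same Data[[xvar]] lookup is what the data[, xvar] version above relies on.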
I think @LukeA has gotten you practically all the way there, but here is an example based on your data, with a couple of extra y columns, that shows how you can pass column names into ggplot inside your own function.
It uses your variable names. It first subsets the data to the rows where y is not missing, and then builds a separate subset with an additional filter (here y > 400) that is used only for the smoothing line.
library(ggplot2)
library(zoo)
set.seed(72)
X1 <- as.yearqtr(seq(as.Date("2010/3/1"), by = "quarter", length.out = 10))
Y1 <- as.vector(c(124, 315, 363, 574, 345, 434, 141, 512, 142, 647))
Y2 <- sample(Y1)
Y3 <- sample(Y1)
Data1 <- data.frame(X1, Y1, Y2, Y3)
plot_function <- function(data, xvar, yvar) {
  # remove rows with NA on yvar
  mydata1 <- data[!is.na(data[, yvar]), ]
  # remove rows with NA on yvar and keep only yvar above a threshold (used for the trend line)
  mydata2 <- data[!is.na(data[, yvar]) & data[, yvar] > 400, ]
  # plot it
  myplot <- ggplot(mydata1, aes_string(xvar, yvar)) +
    geom_line(colour = "darkblue") +
    scale_x_yearqtr(limits = c(min(mydata1[, xvar]), max(mydata1[, xvar])), format = "%YQ%q") +
    geom_smooth(data = mydata2, aes_string(xvar, yvar), method = "lm") +
    geom_text(data = mydata1, aes_string(xvar, yvar, label = yvar), hjust = -0.5, vjust = 0.5) +
    theme_bw()
  return(myplot)
}
plot_function(data = Data1, xvar = "X1", yvar = "Y1")
plot_function(data = Data1, xvar = "X1", yvar = "Y2")
plot_function(data = Data1, xvar = "X1", yvar = "Y3")
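If you want the threshold to be an argument rather than a hard-coded 400, the two subsets can be pulled out into a small helper. This is a hypothetical sketch (prepare_subsets is a made-up name, not part of any package), shown on plain integer x values instead of yearqtr so it runs with base R only. Inside plot_function you would then use s$all for geom_line/geom_text and s$smooth for geom_smooth.

```r
# Hypothetical helper mirroring the two subsets built inside plot_function()
prepare_subsets <- function(data, yvar, threshold = 400) {
  keep <- !is.na(data[[yvar]])
  list(
    # rows used by geom_line() and geom_text()
    all    = data[keep, , drop = FALSE],
    # rows used by geom_smooth(); NA > threshold is NA, but keep is FALSE there
    smooth = data[keep & data[[yvar]] > threshold, , drop = FALSE]
  )
}

# Toy data: the question's Y1 values with one value replaced by NA
d <- data.frame(X1 = 1:10, Y1 = c(124, 315, NA, 574, 345, 434, 141, 512, 142, 647))
s <- prepare_subsets(d, "Y1")
```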
I'm trying to create a function that combines the output of the rmcorr package with ggplot. The rmcorr documentation includes an example of how to render the output with ggplot. I'm having trouble getting the grouping variable to work in my custom function (the second function below).
Here is the code without a grouping variable for faceting; the resulting graph looks fine:
library(ggplot2)
library(rmcorr)
library(rlang)
rmcorr_fun_2 <- function(p, m1, m2) {
  my.rmc <- rmcorr(participant = p, measure1 = m1, measure2 = m2, dataset = mtcars)
  print(my.rmc)
  p <- sym(p)
  m1 <- sym(m1)
  m2 <- sym(m2)
  #grp <- sym(grp)
  print(ggplot(data = mtcars, aes(x = !!m1, y = !!m2, group = factor(!!p), color = factor(!!p))) +
    geom_point(aes(colour = factor(!!p))) +
    geom_line(aes(y = my.rmc$model$fitted.values), linetype = 1))
    #facet_wrap(.~(!!grp)))
}
Using the same code, but adding a grp argument and uncommenting the faceting lines:
rmcorr_fun_2 <- function(p, m1, m2, grp) {
  my.rmc <- rmcorr(participant = p, measure1 = m1, measure2 = m2, dataset = mtcars)
  print(my.rmc)
  p <- sym(p)
  m1 <- sym(m1)
  m2 <- sym(m2)
  grp <- sym(grp)
  print(ggplot(data = mtcars, aes(x = !!m1, y = !!m2, group = factor(!!p), color = factor(!!p))) +
    geom_point(aes(colour = factor(!!p))) +
    geom_line(aes(y = my.rmc$model$fitted.values), linetype = 1) +
    facet_wrap(.~(!!grp)))
}
This gives the following error:
Error in sym(grp) : argument "grp" is missing, with no default
In addition: Warning message:
In rmcorr(participant = p, measure1 = m1, measure2 = m2, dataset = mtcars) :
'p' coerced into a factor
Called from: is_symbol(x)
Instead of formula notation, you have to wrap the faceting variable inside vars().
Also, instead of sym() + !!, you can simply use the .data pronoun from rlang when you pass your column names as strings.
library(ggplot2)
library(rmcorr)
rmcorr_fun_2 <- function(p, m1, m2, grp) {
  my.rmc <- rmcorr(participant = p, measure1 = m1, measure2 = m2, dataset = mtcars)
  print(my.rmc)
  ggplot(data = mtcars, aes(x = .data[[m1]], y = .data[[m2]], group = factor(.data[[p]]), color = factor(.data[[p]]))) +
    geom_point(aes(colour = factor(.data[[p]]))) +
    geom_line(aes(y = my.rmc$model$fitted.values), linetype = 1) +
    facet_wrap(vars(.data[[grp]]))
}
rmcorr_fun_2("cyl", "hp", "disp", "cyl")
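The error message in the question itself says that grp was never supplied: R only evaluates an argument when it is used, and forcing an argument that has no default and was not passed raises exactly that error. Below is a minimal base-R sketch of this, plus a hypothetical way (plot_columns is a made-up name) to make grp optional with a NULL default, so that one function can cover both the faceted and the unfaceted plot. In the real function you would then add facet_wrap(vars(.data[[grp]])) only when grp is not NULL.

```r
# Forcing an argument the caller did not supply gives the error from the question
needs_grp <- function(p, m1, m2, grp) toupper(grp)
msg <- tryCatch(needs_grp("cyl", "hp", "disp"), error = conditionMessage)

# Hypothetical helper: grp defaults to NULL so faceting becomes optional
plot_columns <- function(p, m1, m2, grp = NULL) {
  cols <- c(p, m1, m2, grp)   # a NULL grp simply drops out of c()
  unknown <- setdiff(cols, names(mtcars))
  if (length(unknown) > 0) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }
  list(columns = cols, facet = !is.null(grp))
}

no_facet   <- plot_columns("cyl", "hp", "disp")
with_facet <- plot_columns("cyl", "hp", "disp", "gear")
```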
I want to change the color of a binned scatter plot created with the binsreg package. I know how to add lines etc. to a ggplot, but I don't know where to change things in the bins_plot object. In the example below, how do I change the color of the points from blue to green, for instance?
library(binsreg)
library(ggplot2)
x <- rnorm(100,0,1)
err <- rnorm(100,0,0.5)
y <- x + err
data <- data.frame(x,y)
a <- binsreg(y,x, nbins = 15)
a$bins_plot +
geom_smooth(data = data, aes(x = x, y = y), method = "lm")
Use the bycolors argument of the binsreg function:
library(binsreg)
library(ggplot2)
set.seed(123)
x <- rnorm(100,0,1)
err <- rnorm(100,0,0.5)
y <- x + err
data <- data.frame(x,y)
a <- binsreg(y, x, nbins = 15, bycolors = "green")
a$bins_plot +
geom_smooth(data = data, aes(x = x, y = y), method = "lm")
Created on 2022-03-14 by the reprex package (v2.0.1)
I am trying to create a multi-faceted plot with free scales using ggplot2. By design, facet_grid cannot achieve what I need, and facet_wrap fails with a cryptic error. Do you have any suggestions on how to fix the error? A reproducible example is given below.
Let's create sample data:
require(tidyverse)
require(modelr)
d1 <- tibble(
  x = 1:100,
  y = 1:100 + rnorm(10),
  z = y ^ 2,
  dataset_name = "d1"
)
d2 <- tibble(
  x = 1:1000,
  y = 1:1000 + rnorm(10),
  z = y ^ 2,
  dataset_name = "d2"
)
#these data will be used for the 1st layer
actuals <- bind_rows(d1, d2)
#these data will be used for the 2nd layer
predictions <- bind_rows(
  d1 %>% gather_predictions(
    "m1" = lm(y ~ x, data = d1),
    "m2" = lm(y ~ x + z, data = d1),
    .pred = "y"
  ),
  d2 %>% gather_predictions(
    "m1" = lm(y ~ x, data = d2),
    "m2" = lm(y ~ x + z, data = d2),
    .pred = "y"
  )
)
facet_grid generates the required graphs, but by design it cannot scale the x-axis separately for each panel (the x-axis is shared within each column):
ggplot(actuals, aes(x, y)) +
geom_point() +
geom_line(data = predictions, colour = "red") +
facet_grid(dataset_name ~ model, scales = "free")
If I want to plot the data only for one dataset (namely, predictions), it works as expected and I get 4 facets:
ggplot(predictions, aes(x, y)) +
geom_point() +
facet_wrap( ~ model + dataset_name, scales = "free")
However, if I try to combine actuals and predictions as follows:
ggplot(actuals, aes(x, y)) +
geom_point() +
geom_line(data = predictions, colour = "red") +
facet_wrap( ~ model + dataset_name, scales = "free")
Then things fall apart with the following error: Error in gList(list(x = 0.5, y = 0.5, width = 1, height = 1, just = "centre", : only 'grobs' allowed in "gList"
Try making a single variable with the interaction of model and dataset_name.
# these two blocks of code are equivalent
library(magrittr)
predictions %<>% mutate(mod_dn = interaction(model, dataset_name))
and
predictions <- predictions %>%
mutate(mod_dn = interaction(model, dataset_name))
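For reference, interaction() on its own behaves like this (a base-R sketch on made-up vectors): it builds one factor whose labels join the two inputs with ".", with the first factor varying fastest in the level order. If some combinations never occur in the data, drop = TRUE removes the unused levels.

```r
model        <- c("m1", "m2", "m1", "m2")
dataset_name <- c("d1", "d1", "d2", "d2")

# One factor whose levels are the model/dataset combinations
mod_dn <- interaction(model, dataset_name)

# With only three of the four combinations present, the unused level is kept
# by default and removed with drop = TRUE
mod_dn_partial <- interaction(model[1:3], dataset_name[1:3])
mod_dn_dropped <- interaction(model[1:3], dataset_name[1:3], drop = TRUE)
```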
Now, this poses a problem for facet_wrap, since mod_dn does not exist in actuals. So we need to merge the two datasets. With the tidyverse we can do this with left_join(), but we need to be careful about which columns we join by, and then adjust the ggplot call accordingly:
all_data <- left_join(
  actuals,
  predictions,
  by = c("x", "dataset_name"),
  suffix = c(".actual", ".pred")
)
all_data %>%
  ggplot(aes(x, y.actual)) +
  geom_point() +
  geom_line(aes(y = y.pred), colour = "red") +
  facet_wrap( ~ mod_dn, scales = "free") +
  labs(y = "y")
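A base-R sketch of the same join step, using merge() instead of dplyr's left_join() (a swapped-in function, not the answer's code) on tiny made-up stand-ins for actuals and predictions. It shows the two effects the plot relies on: the shared y column gets the ".actual"/".pred" suffixes, and each actual row is repeated once per model, so every mod_dn facet gets its own copy of the points.

```r
# Toy stand-ins for the question's actuals and predictions (values made up)
actuals <- data.frame(x = 1:3, y = c(1.1, 2.0, 2.9), dataset_name = "d1")
predictions <- data.frame(
  x = rep(1:3, times = 2),
  y = c(1.0, 2.0, 3.0, 1.05, 2.0, 2.95),
  dataset_name = "d1",
  model = rep(c("m1", "m2"), each = 3)
)
predictions$mod_dn <- interaction(predictions$model, predictions$dataset_name)

# Left join on x and dataset_name; shared non-key columns (here y) get suffixes
all_data <- merge(actuals, predictions,
                  by = c("x", "dataset_name"),
                  all.x = TRUE,
                  suffixes = c(".actual", ".pred"))
```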
Thank you all for the help! It seems that a more straightforward solution (even though it alters the original question) is to tweak the predictions as follows:
predictions <- bind_rows(
  d1 %>% gather_predictions(
    "m1" = lm(y ~ x, data = d1),
    "m2" = lm(y ~ x + z, data = d1),
    .pred = "y.pred"
  ),
  d2 %>% gather_predictions(
    "m1" = lm(y ~ x, data = d2),
    "m2" = lm(y ~ x + z, data = d2),
    .pred = "y.pred"
  )
)
Then we can do the plotting without resorting to joins:
ggplot(predictions, aes(x, y)) +
geom_point() +
geom_line(aes(x, y.pred), colour = "red") +
facet_wrap( ~ model + dataset_name, scales = "free")
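For readers without modelr, a rough base-R equivalent of the modified gather_predictions() call might look like the sketch below. It stacks one copy of the data per model, adding a model column and a y.pred column, which is the long shape the plot above relies on. It covers d1 only and uses rnorm(100) noise instead of the recycled rnorm(10); both are simplifications, not part of the original code.

```r
set.seed(1)
d1 <- data.frame(x = 1:100)
d1$y <- d1$x + rnorm(100)
d1$z <- d1$y^2

models <- list(m1 = lm(y ~ x, data = d1),
               m2 = lm(y ~ x + z, data = d1))

# One copy of d1 per model, with the model name and its predictions attached
long <- do.call(rbind, lapply(names(models), function(nm) {
  out <- d1
  out$model <- nm
  out$y.pred <- predict(models[[nm]], newdata = d1)
  out
}))
```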
This renders the desired plot.
I have been trying to fit a polynomial surface to a set of points with 3 coordinates.
Let the data be:
DATA <- with(mtcars, as.data.frame(cbind(1:32, wt,disp,mpg)))
I have been trying to draw a surface using plot3d from the rgl package, the rsm package, and the scatterplot3d package.
For example:
library(scatterplot3d)
attach(mtcars)
DATA <- as.data.frame(cbind(1:32, wt,disp,mpg))
scatterplot3d(wt,disp,mpg, main="3D Scatterplot")
model <- loess(mpg ~wt + disp, data=DATA)
x <-range(DATA$wt)
x <- seq(x[1], x[2], length.out=50)
y <- range(DATA$disp)
y <- seq(y[1], y[2], length.out=50)
z <- outer(x, y,
           function(wt, disp)
             predict(model, data.frame(wt, disp)))
z
p <- persp(x, y, z, theta = 30, phi = 30,
           col = "lightblue", expand = 0.5, shade = 0.2,
           xlab = "wt", ylab = "disp", zlab = "mpg")
I have also tried the surf.ls function from the spatial package:
library(spatial)
surf.ls(2, DATA[,2], DATA[,3], DATA[,4])
But what I got looks like this: [output image not included]
I don't really know how to turn it into a 3D plot and, more importantly, how to get the formula of the best-fit surface.
I would really appreciate your help.
PS I have deleted my last post and included more details in this one.
Try this:
DATA <- with(mtcars, as.data.frame(cbind(1:32, wt, disp, mpg)))
x_wt <- DATA$wt
y_disp <- DATA$disp
z_mpg <- DATA$mpg
fit <- lm(z_mpg ~ poly(x_wt, y_disp, degree = 2), data = DATA)
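On the second part of the question (the formula of the surface): poly() uses orthogonal polynomials by default, so coef(fit) does not read directly as a formula in wt and disp. One option, sketched below, is to fit the same quadratic surface with explicit terms. It spans the same model space, so the fitted values match the poly() fit, and the six coefficients give mpg = b1 + b2*wt + b3*wt^2 + b4*disp + b5*wt*disp + b6*disp^2. The function name surface is just for illustration.

```r
# Same degree-2 surface as poly(wt, disp, degree = 2), written with explicit terms
fit_explicit <- lm(mpg ~ wt + I(wt^2) + disp + I(wt * disp) + I(disp^2), data = mtcars)
b <- unname(coef(fit_explicit))   # order: intercept, wt, wt^2, disp, wt*disp, disp^2

# The fitted surface as an ordinary R function
surface <- function(wt, disp) {
  b[1] + b[2] * wt + b[3] * wt^2 + b[4] * disp + b[5] * wt * disp + b[6] * disp^2
}

# The orthogonal-polynomial fit, as in the answer, for comparison
fit_orth <- lm(mpg ~ poly(wt, disp, degree = 2), data = mtcars)
```

With raw = TRUE inside poly() you get the same coefficients under less readable names.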
To plot with rsm, use the following:
library(rsm)
image(fit, y_disp ~ x_wt)
contour(fit, y_disp ~ x_wt)
persp(fit, y_disp ~ x_wt, zlab = "z_mpg")
To plot with ggplot, use the following:
## ggplot
# Use rsm package to create surface model.
library(rsm)
SurfMod <- contour(fit, y_disp ~ x_wt)
# extract list values from rsm Surface Model
Xvals <- SurfMod$`x_wt ~ y_disp`[1]
Yvals <- SurfMod$`x_wt ~ y_disp`[2]
Zvals <- SurfMod$`x_wt ~ y_disp`[3]
# Construct matrix with col and row names
SurfMatrix <- Zvals$z
colnames(SurfMatrix) <- Yvals$y
rownames(SurfMatrix) <- Xvals$x
# Convert matrix to data frame
library(reshape2)
SurfDF <- melt(SurfMatrix)
library(ggplot2)
gg <- ggplot(data = SurfDF) +
geom_tile(data = SurfDF, aes(Var1, Var2,z = value, fill = value)) +
stat_contour(data = SurfDF, aes(Var1, Var2, z = value, color = ..level..)) +
scale_colour_gradient(low = "green", high = "red") +
geom_point(data = DATA, aes(wt, disp, z = mpg, color = mpg)) +
geom_text(data = DATA, aes(wt, disp,label=mpg),hjust=0, vjust=0) +
scale_fill_continuous(name="mpg") +
xlab("x_wt") +
ylab("y_disp")
library(directlabels)
direct.label.ggplot(gg, "angled.endpoints")
To see all of the available direct.label methods, go to http://directlabels.r-forge.r-project.org/docs/index.html
I have a data frame created the following way.
library(ggplot2)
x <- data.frame(letters[1:10],abs(rnorm(10)),abs(rnorm(10)),type="x")
y <- data.frame(letters[1:10],abs(rnorm(10)),abs(rnorm(10)),type="y")
# in reality the number of row could be larger than 10 for each x and y
all <- rbind(x,y)
colnames(all) <- c("name","val1","val2","type")
What I want to do is create a faceted ggplot that looks roughly like this (image not included).
Each facet is the correlation plot of one of the following pairs:
# Top left facet
subset(all,type=="x")$val1
subset(all,type=="y")$val1
# Top right facet
subset(all,type=="x")$val1
subset(all,type=="y")$val2
# ...etc..
But I'm stuck with the following code:
p <- ggplot(all, aes(val1, val2))+ geom_smooth(method = "lm") + geom_point() +
facet_grid(type ~ )
# Calculate correlation for each group
cors <- ddply(all, c(type ~ ), summarise, cor = round(cor(val1, val2), 2))
p + geom_text(data=cors, aes(label=paste("r=", cor, sep="")), x=0.5, y=0.5)
What's the right way to do it?
Some of your code was incorrect: type ~ on its own is an incomplete formula, both in facet_grid() and in ddply(). This works for me:
library(ggplot2)
library(plyr)
p <- ggplot(all, aes(val1, val2)) + geom_smooth(method = "lm") + geom_point() +
  facet_grid(~type)
# Calculate correlation for each group
cors <- ddply(all, .(type), summarise, cor = round(cor(val1, val2), 2))
p + geom_text(data = cors, aes(label = paste("r=", cor, sep = "")), x = 1, y = -0.25)
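If you would rather not depend on plyr, the per-group correlations can be computed in base R with split() and sapply(). This is a sketch on data built the same way as in the question (random values, so the numbers differ from run to run unless you fix the seed); the resulting cors data frame can be passed to geom_text() exactly like the ddply() result.

```r
set.seed(42)
all <- data.frame(
  name = rep(letters[1:10], 2),
  val1 = abs(rnorm(20)),
  val2 = abs(rnorm(20)),
  type = rep(c("x", "y"), each = 10)
)

# One correlation per type, rounded as in the ddply() version
cor_by_type <- sapply(split(all, all$type), function(d) round(cor(d$val1, d$val2), 2))
cors <- data.frame(type = names(cor_by_type), cor = unname(cor_by_type))
cors$label <- paste0("r=", cors$cor)
```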
Edit, following the OP's comment and edit: the idea is to re-create the data with all four combinations and then facet.
# grp1 = type that val1 comes from, grp2 = type that val2 comes from (assumes 10 rows per type)
dat <- data.frame(val1 = c(rep(all$val1[all$type == "x"], 2),
rep(all$val1[all$type == "y"], 2)),
val2 = rep(all$val2, 2),
grp1 = rep(c("x", "x", "y", "y"), each=10),
grp2 = rep(c("x", "y", "x", "y"), each=10))
p <- ggplot(dat, aes(val1, val2)) + geom_point() + geom_smooth(method = "lm") +
facet_grid(grp1 ~ grp2)
cors <- ddply(dat, .(grp1, grp2), summarise, cor = round(cor(val1, val2), 2))
p + geom_text(data=cors, aes(label=paste("r=", cor, sep="")), x=1, y=-0.25)
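The construction above hard-codes 10 rows per type (each = 10). A hypothetical generalization (make_pairs is a made-up name) builds the four combinations with expand.grid() and works for any number of rows, as long as both types have the same count, because values are paired by position.

```r
# Hypothetical helper: one block of rows per (grp1, grp2) combination
make_pairs <- function(all) {
  combos <- expand.grid(grp1 = c("x", "y"), grp2 = c("x", "y"), stringsAsFactors = FALSE)
  do.call(rbind, lapply(seq_len(nrow(combos)), function(i) {
    g1 <- combos$grp1[i]
    g2 <- combos$grp2[i]
    # val1 from type g1 paired by position with val2 from type g2;
    # data.frame() errors if the two types have different row counts
    data.frame(val1 = all$val1[all$type == g1],
               val2 = all$val2[all$type == g2],
               grp1 = g1, grp2 = g2)
  }))
}

set.seed(1)
all <- data.frame(val1 = abs(rnorm(24)), val2 = abs(rnorm(24)),
                  type = rep(c("x", "y"), each = 12))
dat <- make_pairs(all)
xy <- dat[dat$grp1 == "x" & dat$grp2 == "y", ]
```

The result can be plotted with the same facet_grid(grp1 ~ grp2) call.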
Since your data is not in the appropriate format, some reshaping is necessary before it can be plotted.
Firstly, reshape the data to the long format:
library(reshape2)
allM <- melt(all[-1], id.vars = "type")
Split the values along type and val1 vs. val2:
allList <- split(allM$value, interaction(allM$type, allM$variable))
Create a list of all combinations:
allComb <- unlist(lapply(c(1, 3),
function(x)
lapply(c(2 ,4),
function(y)
do.call(cbind, allList[c(x, y)]))),
recursive = FALSE)
Create a new dataset:
allNew <- do.call(rbind,
lapply(allComb, function(x) {
tmp <- as.data.frame(x)
tmp <- (within(tmp, {xval <- names(tmp)[1];
yval <- names(tmp)[2]}))
names(tmp)[1:2] <- c("x", "y")
tmp}))
Plot:
library(ggplot2)
p <- ggplot(allNew, aes(x = x, y = y)) +
geom_smooth(method = "lm") +
geom_point() +
facet_grid(yval ~ xval)
# Calculate correlation for each group
library(plyr)
cors <- ddply(allNew, .(yval, xval), summarise, cor = round(cor(x, y), 2))
p + geom_text(data=cors, aes(label=paste("r=", cor, sep="")), x=0.5, y=0.5)
The ggpubr package now addresses exactly this issue with its stat_cor() function:
library(tidyverse)
library(ggpubr)
ggplot(all, aes(val1, val2))+
geom_smooth(method = "lm") +
geom_point() +
facet_grid(~type) +
stat_cor()