I'm trying to create a function that combines the output from the rmcorr package with ggplot. The rmcorr documentation includes an example of how to render the output with ggplot. I'm having trouble getting the grouping variable to work in my custom function (the second function below).
Here is the code without a grouping variable for faceting; the resulting graph looks fine:
library(ggplot2)
library(rlang)
library(rmcorr)
rmcorr_fun_2 <- function(p,m1,m2) {
my.rmc <- rmcorr(participant = p, measure1 = m1, measure2 = m2, dataset = mtcars)
print(my.rmc)
p <- sym(p)
m1 <- sym(m1)
m2 <- sym(m2)
#grp <- sym(grp)
print(ggplot(data = mtcars, aes(x = !!m1, y = !!m2, group = factor(!!p), color = factor(!!p)))+
geom_point(aes(colour = factor(!!p))) +
geom_line(aes(y = my.rmc$model$fitted.values), linetype = 1))
#facet_wrap(.~(!!grp)))
}
Using the same code as above, but adding a grp argument and uncommenting the faceting lines:
rmcorr_fun_2 <- function(p,m1,m2,grp) {
my.rmc <- rmcorr(participant = p, measure1 = m1, measure2 = m2, dataset = mtcars)
print(my.rmc)
p <- sym(p)
m1 <- sym(m1)
m2 <- sym(m2)
grp <- sym(grp)
print(ggplot(data = mtcars, aes(x = !!m1, y = !!m2, group = factor(!!p), color = factor(!!p)))+
geom_point(aes(colour = factor(!!p))) +
geom_line(aes(y = my.rmc$model$fitted.values), linetype = 1)+
facet_wrap(.~(!!grp)))
}
Calling it gives the following error:
Error in sym(grp) : argument "grp" is missing, with no default
In addition: Warning message:
In rmcorr(participant = p, measure1 = m1, measure2 = m2, dataset = mtcars) :
'p' coerced into a factor
Called from: is_symbol(x)
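The error message means the function was called without a value for grp, so the faceting column has to be passed as a fourth argument. In addition, instead of formula notation, you have to wrap the faceting variable inside vars().
Instead of sym() + !!, you could also simply use the .data pronoun from rlang if you pass your column names as strings.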
library(ggplot2)
library(rmcorr)
rmcorr_fun_2 <- function(p, m1, m2, grp) {
my.rmc <- rmcorr(participant = p, measure1 = m1, measure2 = m2, dataset = mtcars)
print(my.rmc)
ggplot(data = mtcars, aes(x = .data[[m1]], y = .data[[m2]], group = factor(.data[[p]]), color = factor(.data[[p]]))) +
geom_point(aes(colour = factor(.data[[p]]))) +
geom_line(aes(y = my.rmc$model$fitted.values), linetype = 1) +
facet_wrap(vars(.data[[grp]]))
}
rmcorr_fun_2("cyl", "hp", "disp", "cyl")
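If rmcorr is not at hand, the faceting part can be checked in isolation. The sketch below is a hypothetical stripped-down helper (facet_scatter() is not part of the answer above) that passes every column name as a string and facets via vars(.data[[grp]]), using mtcars as a stand-in dataset.

```r
library(ggplot2)

# Hypothetical stripped-down helper (no rmcorr): all column names are strings
facet_scatter <- function(data, p, m1, m2, grp) {
  ggplot(data, aes(x = .data[[m1]], y = .data[[m2]],
                   colour = factor(.data[[p]]))) +
    geom_point() +
    # vars() accepts the .data pronoun, so the facet column can be a string too
    facet_wrap(vars(.data[[grp]]))
}

p <- facet_scatter(mtcars, p = "cyl", m1 = "hp", m2 = "disp", grp = "gear")
print(p)
```

facet_wrap() also accepts a plain character vector of column names, so facet_wrap(grp) should work as well.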
I imputed my data using both the packages mice and miceRanger and I would like to compare the distributions of the imputed variables with the original data. In miceRanger this is very easy using the function plotDistributions() which displays density plots for the numeric variables and barplots for factors.
In mice, if the variables are numeric, it is easy to compare the distributions using stripplot() or bwplot(), but I cannot find a simple way to do it if the variables are factors. I wonder if I am missing something or I just have to give in and create a custom routine for that.
Does anyone have any suggestions? Thanks in advance!
As far as I am aware, there isn't a mice equivalent of miceRanger::plotDistributions() (which is disappointing, because it's a very convenient function). However, you can use ggmice to apply ggplot2 syntax to mids objects.
library(miceRanger)
library(mice)
library(ggmice)
data(nhanes)
nhanes$hyp <- factor(nhanes$hyp)
## miceRanger
imp1 <- miceRanger(data = nhanes)
plotDistributions(imp1)
## mice and ggmice
imp2 <- mice(data = nhanes)
ggmice(imp2, aes(x = hyp)) +
geom_bar()
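If you prefer to stay within mice and ggplot2, one possible approach (my own sketch, not an existing mice helper) is to stack the original and completed data with mice::complete(action = "long", include = TRUE) and compare the level proportions of the observed values with those of the imputed cells only. The object names (long, obs, imps, both) and the seed are illustrative choices.

```r
library(mice)
library(ggplot2)

nhanes$hyp <- factor(nhanes$hyp)
imp <- mice(nhanes, m = 5, printFlag = FALSE, seed = 1)

# .imp == 0 is the original data (with NAs); .imp 1..m are the completed datasets
long <- mice::complete(imp, action = "long", include = TRUE)
miss <- is.na(nhanes$hyp)  # cells of hyp that were imputed

# observed values come from .imp 0; imputed values only from originally missing rows
obs  <- long[long$.imp == 0 & !is.na(long$hyp), c(".imp", "hyp")]
imps <- long[long$.imp > 0 & rep(miss, imp$m + 1), c(".imp", "hyp")]
both <- rbind(obs, imps)
both$source <- ifelse(both$.imp == 0, "observed", "imputed")

# proportion of each level, computed separately for every .imp
p <- ggplot(both, aes(x = hyp, fill = source)) +
  geom_bar(aes(y = after_stat(prop), group = .imp), position = "dodge")
print(p)
```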
I adapted plotDistributions() from miceRanger to work with mice.
Beware: this function has only been tested quickly on two random datasets, so I cannot guarantee that it is bug-free.
For more info, see here
library(data.table)
library(ggplot2)
library(ggpubr)
library(rlang)
PlotDist <- function(miceObj, vars = names(miceObj$imp), dotsize = 0.5, ...) {
  # only variables with missing values have imputations to compare
  pos <- which(miceObj$nmis != 0)
  newClasses <- sapply(miceObj$data[pos], class)
  if (vars[[1]] == "allCategorical") {
    vars <- names(newClasses[newClasses == "factor"])
  } else if (vars[[1]] == "allNumeric") {
    vars <- names(newClasses[newClasses != "factor"])
  } else {
    vars <- intersect(vars, names(newClasses))
  }
  newClasses <- newClasses[vars]
  facVars <- newClasses[newClasses == "factor"]
  numVars <- newClasses[newClasses != "factor"]
  if (length(facVars) > 0) {
    facList <- lapply(names(facVars), function(var) {
      # one column per imputation, reshaped to long format
      dat <- as.data.table(miceObj$imp[[var]])
      dat <- melt(dat, measure.vars = names(dat))
      setnames(dat, "value", var)
      # share of each level among the imputed values, per imputation
      agg <- dat[, .(Percentage = .N/sum(miceObj$where[, var])), by = c("variable", var)]
      # share of each level among the observed values
      rawAgg <- na.omit(as.data.table(miceObj$data), cols = var)[,
        .(Percentage = .N/sum(!miceObj$where[, var])), by = var]
      ggplot() +
        geom_dotplot(data = agg, aes(x = !!sym(var), y = !!sym("Percentage")),
                     binaxis = "y", stackdir = "center",
                     dotsize = dotsize, stackratio = 0.75, binwidth = 1/50) +
        geom_bar(data = rawAgg, aes(x = !!sym(var), y = !!sym("Percentage")),
                 stat = "identity", alpha = 0.5) +
        scale_x_discrete(guide = guide_axis(n.dodge = 2))
    })
  } else facList <- NULL
  if (length(numVars) > 0) {
    numList <- lapply(names(numVars), function(var) {
      dat <- as.data.table(miceObj$imp[[var]])
      dat <- melt(dat, measure.vars = names(dat))
      setnames(dat, "value", var)
      # observed (non-missing) rows of the original data
      obs <- miceObj$data[!is.na(miceObj$data[[var]]), , drop = FALSE]
      # use the observed-data bandwidth for every density curve
      dens <- density(obs[[var]])
      ggplot() +
        geom_density(data = dat, aes(!!sym(var), group = !!sym("variable")), bw = dens$bw) +
        geom_density(data = obs, aes(!!sym(var)),
                     linewidth = 1, color = "red", bw = dens$bw) +
        ylab("Density")
    })
  } else numList <- NULL
  # numeric plots first, then categorical ones
  ggarrange(plotlist = c(numList, facList), ...)
}
We often want individual regression equations in ggplot facets. The best way to do this is to build the labels in a data frame and then add them manually. But what if the labels contain plotmath, e.g., superscripts?
Here is a way to do it. The plotmath is converted to a string and then parsed by ggplot. The test_eqn function is taken from another Stack Overflow post; I'll link it when I find it again. Sorry about that.
library(ggplot2)
library(dplyr)
test_eqn <- function(y, x){
m <- lm(log(y) ~ log(x)) # fit y = a * x ^ b in log space
p <- exp(predict(m)) # model prediction of y
eq <- substitute(expression(Y==a~X^~b),
list(
a = format(unname(exp(coef(m)[1])), digits = 3),
b = format(unname(coef(m)[2]), digits = 3)
))
list(eq = as.character(eq)[2], pred = p)
}
set.seed(123)
x <- runif(20)
y <- runif(20)
test_eqn(x,y)$eq
#> [1] "Y == \"0.57\" ~ X^~\"0.413\""
data <- data.frame(x = x,
y = y,
f = sample(c("A","B"), 20, replace = TRUE)) %>%
group_by(f) %>%
mutate(
label = test_eqn(y,x)$eq, # add label
labelx = mean(x),
labely = mean(y),
pred = test_eqn(y,x)$pred # add prediction
)
# plot fits (use slice(1) to avoid multiple copies of labels)
ggplot(data) +
geom_point(aes(x = x, y = y)) +
geom_line(aes(x = x, y = pred), colour = "red") +
geom_text(data = slice(data, 1), aes(x = labelx, y = labely, label = label), parse = TRUE) +
facet_wrap("f")
Created on 2021-10-20 by the reprex package (v2.0.1)
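An alternative that avoids the slice(1) trick is to build exactly one label row per facet with summarise() and format the plotmath string with sprintf(). This is a sketch under the same y = a * x^b model; the column names a, b, labelx and labely in the label table are my own choices.

```r
library(dplyr)
library(ggplot2)

set.seed(123)
data <- data.frame(x = runif(20), y = runif(20),
                   f = sample(c("A", "B"), 20, replace = TRUE))

# One row per facet: fit y = a * x^b in log space, then build a plotmath string
labels <- data %>%
  group_by(f) %>%
  summarise(
    a = exp(coef(lm(log(y) ~ log(x)))[1]),
    b = coef(lm(log(y) ~ log(x)))[2],
    labelx = mean(x),
    labely = mean(y)
  ) %>%
  mutate(label = sprintf("Y == %.3g ~ X^%.3g", a, b))

p <- ggplot(data, aes(x, y)) +
  geom_point() +
  geom_text(data = labels, aes(x = labelx, y = labely, label = label), parse = TRUE) +
  facet_wrap(~f)
print(p)
```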
I am trying to add the lm statistics (R^2 and p-value) of two parallel models onto the same ggplot plot. Here is my working example:
library(ggplot2)
set.seed(100)
dat <- data.frame(
x <- rnorm(100, 1),
y <- rnorm(100, 10),
lev <- gl(n = 2, k = 50, labels = letters[1:2])
)
mod1 <- lm(y~x, dat = dat[lev %in% "a", ])
r1 <- paste("R^2==", round(summary(mod1)[[9]], 3))
p1<- paste("p==", round(summary(mod1)[[4]][2, 4], 3), sep= "")
lab1 <- paste(r1, p1, sep =",")
mod2 <- lm(y~x, dat = dat[lev %in% "b", ])
r2 <- paste("R^2==", round(summary(mod2)[[9]], 3))
p2 <- paste("p==", round(summary(mod2)[[4]][2, 4], 3), sep= "")
lab2 <- paste(r2, p2, sep =",")
ggplot(dat, aes(x = x, y = y, col = lev)) + geom_jitter() + geom_smooth(method = "lm") + annotate("text", x = 2, y = 12, label = lab1, parse = T) + annotate("text", x = 10, y = 8, label = lab2, parse = T)
Here is what the prompt shows:
Error in parse(text = text[[i]]) : <text>:1:12: unexpected ','
1: R^2== 0.008,
Now the problem is that I can label either R^2 or the p-value separately, but not both together. How can I put the two results on one single line in the figure?
BTW, is there a more efficient way of doing what my code does? I have nine subplots that I want to put into one full plot, and I don't want to add them one by one.
++++++++++++++++++++++++++ Some update ++++++++++++++++++++++++++++++++++
Following @G. Grothendieck's kind suggestion, I tried to wrap the most repetitive part of the code into a function so I could finish all the plots with a few lines. Now the problem is that, no matter how I change the input variables, the output plots are basically the same except for the axis labels. Can anyone explain why? The following is the working code I used:
library(ggplot2)
library(ggpubr)
set.seed(100)
dat <- data.frame(
x = rnorm(100, 1),
y = rnorm(100, 10),
z = rnorm(100, 25),
lev = gl(n = 2, k = 50, labels = letters[1:2])
)
test <- function(dat, x, y){
fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
mod1 <- lm(y ~ x, dat, subset = lev == "a")
sum1 <- summary(mod1)
lab1 <- sprintf(fmt, "a", sum1$adj.r.squared, coef(sum1)[2, 4])
mod2 <- lm(y ~ x, dat, subset = lev == "b")
sum2 <- summary(mod2)
lab2 <- sprintf(fmt, "b", sum2$adj.r.squared, coef(sum2)[2, 4])
colors <- 1:2
p <- ggplot(dat, aes(x = x, y = y, col = lev)) +
geom_jitter() +
geom_smooth(method = "lm") +
annotate("text", x = 2, y = c(12, 8), label = c(lab1, lab2),
parse = TRUE, hjust = 0, color = colors) +
scale_color_manual(values = colors)
return(p)
}
ggarrange(test(dat, x, z), test(dat, y, z))
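A likely explanation (my reading of the code, not from the original thread): inside test(), the formula y ~ x and aes(x = x, y = y) are resolved against the columns named x and y in dat, not against the function arguments, so every call fits and plots the same two columns. A sketch of one fix passes the column names as strings and builds the formula with reformulate(); test2(), the label positions and the two colors are hypothetical choices.

```r
library(ggplot2)

set.seed(100)
dat <- data.frame(
  x = rnorm(100, 1),
  y = rnorm(100, 10),
  z = rnorm(100, 25),
  lev = gl(n = 2, k = 50, labels = letters[1:2])
)

# Hypothetical variant: xvar and yvar are column names given as strings
test2 <- function(dat, xvar, yvar) {
  fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
  form <- reformulate(xvar, response = yvar)  # e.g. z ~ x
  # one fit and one label per level of lev
  labs <- sapply(levels(dat$lev), function(l) {
    s <- summary(lm(form, data = dat[dat$lev == l, ]))
    sprintf(fmt, l, s$adj.r.squared, coef(s)[2, 4])
  })
  colors <- c("black", "red")  # assumes two levels, as in the question
  yr <- range(dat[[yvar]])
  ggplot(dat, aes(x = .data[[xvar]], y = .data[[yvar]], col = lev)) +
    geom_jitter() +
    geom_smooth(method = "lm", formula = y ~ x) +
    # put the labels near the top-left corner of the data range
    annotate("text", x = min(dat[[xvar]]), y = yr[2] - c(0, 0.1) * diff(yr),
             label = labs, parse = TRUE, hjust = 0, color = colors) +
    scale_color_manual(values = colors)
}

p1 <- test2(dat, "x", "z")
p2 <- test2(dat, "y", "z")
```

The two plots can then be combined with ggpubr::ggarrange(p1, p2) as in the question.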
There are several problems here:
- x, y and lev are arguments to data.frame, so they must be specified using = rather than <-
- make use of the subset= argument in lm
- use sprintf instead of paste to simplify the specification of the labels
- label the text strings a and b and give them the same color as the corresponding lines so it is clear which is which
- the plotmath syntax of the labels needs to be corrected; see fmt below
- it is clearer to use component names and accessor functions of the summary objects (e.g. adj.r.squared and coef()) where available
- use TRUE rather than T, because T can be overridden (e.g. by a variable called T) but TRUE can never be overridden
- use hjust=0 and adjust x= and y= in annotate to align the two text strings
- combine the annotate statements
- place the individual terms of the ggplot statement on separate lines for improved readability
This gives:
library(ggplot2)
set.seed(100)
dat <- data.frame(
x = rnorm(100, 1),
y = rnorm(100, 10),
lev = gl(n = 2, k = 50, labels = letters[1:2])
)
fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
mod1 <- lm(y ~ x, dat, subset = lev == "a")
sum1 <- summary(mod1)
lab1 <- sprintf(fmt, "a", sum1$adj.r.squared, coef(sum1)[2, 4])
mod2 <- lm(y ~ x, dat, subset = lev == "b")
sum2 <- summary(mod2)
lab2 <- sprintf(fmt, "b", sum2$adj.r.squared, coef(sum2)[2, 4])
colors <- 1:2
ggplot(dat, aes(x = x, y = y, col = lev)) +
geom_jitter() +
geom_smooth(method = "lm") +
annotate("text", x = 2, y = c(12, 8), label = c(lab1, lab2),
parse = TRUE, hjust = 0, color = colors) +
scale_color_manual(values = colors)
Unless I'm misunderstanding your question, the problem is with the parse = T arguments to your annotate calls. I don't think your strings need to be parsed. Try parse = F instead, or just drop the parameter, as the default value is FALSE anyway.
I am trying to write a function that will spit out model diagnostic plots.
to_plot <- function(df, model, response_variable, indep_variable) {
resp_plot <-
df %>%
mutate(model_resp = predict.glm(model, df, type = 'response')) %>%
group_by(indep_variable) %>%
summarize(actual_response = mean(response_variable),
predicted_response = mean(model_resp)) %>%
ggplot(aes(indep_variable)) +
geom_line(aes(x = indep_variable, y = actual_response, colour = "actual")) +
geom_line(aes(x = indep_variable, y = predicted_response, colour = "predicted")) +
ylab(label = 'Response')
}
When I run this over a dataset, dplyr throws an error that I don't understand:
fit <- glm(data = mtcars, mpg ~ wt + qsec + am, family = gaussian(link = 'identity'))
to_plot(mtcars, fit, mpg, wt)
Error in grouped_df_impl(data, unname(vars), drop) :
Column `indep_variable` is unknown
Based on some crude debugging, I found that the error happens in the group_by step, so it could be related to how I'm calling the columns in the function. Thanks!
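This code seems to fix it. As the commenters above mention, variables passed in to the function must be captured with enquo() and then unquoted with !!. Note that aes() becomes aes_() here because indep_variable is now a quosure rather than a bare column name; aes_() (like aes_string()) has since been soft-deprecated in ggplot2 in favour of tidy evaluation inside aes().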
library(tidyverse)
to_plot <- function(df, model, response_variable, indep_variable) {
response_variable <- enquo(response_variable)
indep_variable <- enquo(indep_variable)
resp_plot <-
df %>%
mutate(model_resp = predict.glm(model, df, type = 'response')) %>%
group_by(!!indep_variable) %>%
summarize(actual_response = mean(!!response_variable),
predicted_response = mean(model_resp)) %>%
ggplot(aes_(indep_variable)) +
geom_line(aes_(x = indep_variable, y = quote(actual_response)), colour = "blue") +
geom_line(aes_(x = indep_variable, y = quote(predicted_response)), colour = "red") +
ylab(label = 'Response')
return(resp_plot)
}
fit <- glm(data = mtcars, mpg ~ wt + qsec + am, family = gaussian(link = 'identity'))
to_plot(mtcars, fit, mpg, wt)
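With rlang 0.4.0 or later and a ggplot2 version that supports tidy evaluation in aes(), the enquo()/!! pair and aes_() can be replaced by the {{ }} ("curly-curly") operator. A sketch, with to_plot2() as a hypothetical name; it keeps the colour = "actual"/"predicted" legend from the question.

```r
library(dplyr)
library(ggplot2)

# Hypothetical variant using {{ }} instead of enquo()/!! and aes_()
to_plot2 <- function(df, model, response_variable, indep_variable) {
  df %>%
    mutate(model_resp = predict(model, newdata = df, type = "response")) %>%
    group_by({{ indep_variable }}) %>%
    summarize(actual_response = mean({{ response_variable }}),
              predicted_response = mean(model_resp)) %>%
    ggplot(aes(x = {{ indep_variable }})) +
    geom_line(aes(y = actual_response, colour = "actual")) +
    geom_line(aes(y = predicted_response, colour = "predicted")) +
    ylab("Response")
}

fit <- glm(mpg ~ wt + qsec + am, data = mtcars, family = gaussian(link = "identity"))
p <- to_plot2(mtcars, fit, mpg, wt)
print(p)
```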
I am new to R and have been trying to figure this out for a while. Basically, I have a data frame and various y variables. I am trying to write a function that gives me a customized graph template for the many different y variables that I have. I am trying the code below, but I get these warnings:
1: In eval(expr, envir, enclos) : NAs introduced by coercion
2: In aes_string(xvar[max(which(complete.cases(yvar)))], yvar[max(which(complete.cases(yvar)))], :
NAs introduced by coercion
The code works if I plug the variables in directly rather than through a function. I believe it has something to do with how the function passes xvar into as.numeric(). I am not sure; does anyone know how to deal with this?
test <- function (Data, xvar, yvar){
# Plot data
plot <- ggplot(subset(Data,!is.na((yvar))), aes_string(xvar, yvar)) + geom_line(colour="darkblue") + theme_bw()
# Add Trendline for recent data
plot <- plot + geom_smooth(data=subset(Data, xvar > as.numeric(xvar)[max(which(complete.cases(yvar)))-8]), method = "lm")
# Label most recent data
plot + geom_text(data = Data, aes_string(xvar[max(which(complete.cases(yvar)))],
yvar[max(which(complete.cases(yvar)))],
label = as.numeric(yvar)[max(which(complete.cases(yvar)))],
hjust= -0.5, vjust = 0.5))
}
As xvar is probably (you do not show a reproducible example) a character vector of length 1, subsetting it like xvar[...] will not yield the desired result: it indexes the string itself, not the column it names.
You could try something like
library(ggplot2)
f <- function(data, xvar, yvar) {
ggplot(data, aes_string(xvar, yvar)) +
geom_point() +
geom_smooth(data=subset(data, eval(parse(text=xvar)) > 5), method = "lm")
}
or
f <- function(data, xvar, yvar) {
ggplot(data, aes_string(xvar, yvar)) +
geom_point() +
geom_smooth(data = data[data[, xvar]>5, ], method = "lm")
}
f(mtcars, "cyl", "disp")
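Since aes_string() is soft-deprecated in current ggplot2, here is the same idea with the .data pronoun. f2() and its threshold argument are hypothetical names; the default of 5 mirrors the hard-coded value above.

```r
library(ggplot2)

# Hypothetical variant of f() using the .data pronoun instead of aes_string()
f2 <- function(data, xvar, yvar, threshold = 5) {
  ggplot(data, aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_point() +
    # fit the trend only on rows whose x value exceeds the threshold
    geom_smooth(data = data[data[[xvar]] > threshold, ],
                method = "lm", formula = y ~ x)
}

p <- f2(mtcars, "cyl", "disp")
print(p)
```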
I think @LukeA has gotten you practically all the way there, but here is an example that uses your data and adds a few more columns to help demonstrate how you can pass column names into ggplot inside your own function.
It uses your variable names. It first keeps only the rows with non-missing values for y, and then builds a separate data frame with an additional filter (here y > 400) that is used only by the smoothing layer.
library(ggplot2)
library(zoo)
set.seed(72)
X1 <- as.yearqtr(seq(as.Date("2010/3/1"), by = "quarter", length.out = 10))
Y1 <- as.vector(c(124,315,363,574,345,434,141,512,142,647))
Y2 <- sample(Y1)
Y3 <- sample(Y1)
Data1 <- data.frame(X1, Y1, Y2, Y3)
plot_function <- function(data, xvar, yvar){
# remove rows with NA on yvar
mydata1 <- data[!is.na(data[, yvar]), ]
# remove rows with NA on yvar and subset yvar above some threshold
mydata2 <- data[!is.na(data[, yvar]) & data[, yvar] > 400, ]
# plot it
myplot <- ggplot(mydata1, aes_string(xvar, yvar)) +
geom_line(colour="darkblue") +
scale_x_yearqtr(limits = c(min(mydata1[, xvar]), max(mydata1[, xvar])), format = "%YQ%q") +
geom_smooth(data = mydata2, aes_string(xvar, yvar), method = "lm") +
geom_text(data = mydata1, aes_string(xvar, yvar, label = yvar), hjust= -0.5, vjust = 0.5) +
theme_bw()
return(myplot)
}
plot_function(data = Data1, xvar = "X1", yvar = "Y1")
plot_function(data = Data1, xvar = "X1", yvar = "Y2")
plot_function(data = Data1, xvar = "X1", yvar = "Y3")