How to use reformulate in R when variables have embedded spaces

What is the proper string parsing required to use reformulate() when the termlabels have embedded spaces?
This works:
reformulate(c("A", "B"), "Y")
Y ~ A + B
These all fail:
reformulate(c("A var", "B"), "Y")
reformulate(quote(c("A var", "B")), "Y")
reformulate(as.formula(quote(c("A var", "B"))), "Y")
Expected results:
Y ~ `A var` + B
# or
Y ~ `A var` + `B`
NOTE
I cannot hard-code the backticks. This is part of a larger Shiny application, so if backticks are the answer, I need a way to add them programmatically.

Here are a few other ways that work with symbols rather than strings (so no need for explicit backticks).
input <- "A var"
eval(bquote( Y ~ .(as.name(input)) + B))
# Y ~ `A var` + B
eval(substitute( Y ~ INPUT + B, list(INPUT = as.name(input))))
# Y ~ `A var` + B
library(rlang)
eval(expr(Y ~ !!sym(input) + B))
# Y ~ `A var` + B
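The same symbol-based idea can be extended to several variables at once. This is a minimal sketch, assuming each element of the hypothetical vector `vars` is a plain column name (not an interaction or a function call); the names `response`, `vars`, `rhs` and `f` are illustrative only.

```r
# Hypothetical inputs: a response name and several predictor names,
# some of which contain spaces.
response <- "Y"
vars <- c("A var", "B", "C var")

# Turn each name into a symbol, then chain the symbols with `+`.
rhs <- Reduce(function(left, right) call("+", left, right),
              lapply(vars, as.name))

# Evaluating a call to `~` returns a formula object.
f <- eval(call("~", as.name(response), rhs))
f
```

Because the names are converted to symbols, R adds the backticks itself when the formula is printed or deparsed.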

Use backticks, e.g.
reformulate(c("`A var`", "B"), "Y")
#Y ~ `A var` + B
Or better yet, don't use spaces in variable names.
Or with a helper function
bt <- function(x) sprintf("`%s`", x)
var1 <- "A var"
var2 <- "B"
reformulate(c(bt(var1), var2), "Y")
#Y ~ `A var` + B
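If the variable names arrive at run time (for example from a Shiny input), the helper can be applied to every name and add backticks only where they are needed. The following is a sketch under the assumption that each entry is a bare column name that does not already contain backticks; `bt_if_needed` is a hypothetical helper name.

```r
# Wrap only non-syntactic names in backticks.
# make.names() changes a string exactly when it is not a valid R name.
bt_if_needed <- function(x) {
  needs_quotes <- make.names(x) != x
  x[needs_quotes] <- sprintf("`%s`", x[needs_quotes])
  x
}

terms_in <- c("A var", "B")   # e.g. values selected by the user
f <- reformulate(bt_if_needed(terms_in), response = "Y")
f
```

Term labels such as `"A:B"` or `"log(x)"` would be quoted as a single name by this helper, so it is only suitable when every entry is a column name.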

Related

Iterate over multiple dependent variables (columns) of an sp dataframe when using krige.cv

I have a SpatialPointsDataFrame called rain, and I would like to fit a variogram and perform cross-validation for each of its last 10 columns (the dependent variables), as shown below:
fit.reg.vgm <- autofitVariogram(
column (dependent variable) ~ X + Y + Z + AS + SL,
rain,
model = c("Sph", "Exp", "Gau", "Lin", "Log"),
fix.values = c(NA, NA, NA),
verbose = FALSE,
GLS.model = NA,
start_vals = c(NA, NA, NA),
miscFitOptions = list()
)
cv <-krige.cv(column (dependent variable) ~ X + Y + Z + AS + SL, rain, fit.reg.vgm$var_model)
Does anyone know how to construct such a for-loop?
Thanks in advance!
You will need to construct a formula. Try formula() and paste(). Something along the lines of
x <- c("a", "b", "c")
out <- list()
for (i in seq_along(x)) {
out[[i]] <- formula(paste(x[i], "~ X + Y + Z"))
}
> out
[[1]]
a ~ X + Y + Z
[[2]]
b ~ X + Y + Z
[[3]]
c ~ X + Y + Z
An option with reformulate
out <- vector('list', length(x))
for(i in seq_along(x)) {out[[i]] <- reformulate(c("X", "Y", "Z"), response = x[i]) }
out
#[[1]]
#a ~ X + Y + Z
#[[2]]
#b ~ X + Y + Z
#[[3]]
#c ~ X + Y + Z
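To connect the formula list back to the fitting step, the response names can be taken from the data itself and each formula passed to the model function. The sketch below uses the built-in `mtcars` data and `lm()` purely as stand-ins (assumptions made so the example runs without spatial packages); in the original problem the fitting step would be the `autofitVariogram()` and `krige.cv()` calls shown in the question.

```r
# Built-in data frame used as a stand-in for the spatial data.
dat <- mtcars
responses <- tail(names(dat), 2)   # the last two columns
predictors <- c("wt", "hp")

# One formula per dependent variable.
formulas <- lapply(responses, function(r) reformulate(predictors, response = r))
names(formulas) <- responses

# lm() is only a placeholder for the real fitting step.
fits <- lapply(formulas, function(fo) lm(fo, data = dat))
names(fits)
```

Storing the results in a named list keeps each fit associated with the column it was built from.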

Labelling R2 and p value in ggplot?

I am trying to add lm model coefs of two parallel modelling results onto the same ggplot plot. Here is my working example:
library(ggplot2)
set.seed(100)
dat <- data.frame(
x <- rnorm(100, 1),
y <- rnorm(100, 10),
lev <- gl(n = 2, k = 50, labels = letters[1:2])
)
mod1 <- lm(y~x, dat = dat[lev %in% "a", ])
r1 <- paste("R^2==", round(summary(mod1)[[9]], 3))
p1<- paste("p==", round(summary(mod1)[[4]][2, 4], 3), sep= "")
lab1 <- paste(r1, p1, sep =",")
mod2 <- lm(y~x, dat = dat[lev %in% "b", ])
r2 <- paste("R^2==", round(summary(mod2)[[9]], 3))
p2 <- paste("p==", round(summary(mod2)[[4]][2, 4], 3), sep= "")
lab2 <- paste(r2, p2, sep =",")
ggplot(dat, aes(x = x, y = y, col = lev)) + geom_jitter() + geom_smooth(method = "lm") + annotate("text", x = 2, y = 12, label = lab1, parse = T) + annotate("text", x = 10, y = 8, label = lab2, parse = T)
Here is the error shown at the prompt:
Error in parse(text = text[[i]]) : <text>:1:12: unexpected ','
1: R^2== 0.008,
The problem is that I can label either the R2 or the p value separately, but not both together. How can I put the two results on a single line in the figure?
BTW, is there a more efficient way of doing the same thing? I have nine subplots that I want to combine into one full plot, and I don't want to add them one by one.
++++++++++++++++++++++++++ Some update ++++++++++++++++++++++++++++++++++
Following @G. Grothendieck's kind suggestion, I tried to wrap the most repetitive part of the code into a function so that I could produce all the plots in a few lines. The problem now is that, whichever input variables I pass, the output plots are essentially the same except for the axis labels. Can anyone explain why? The following is the working code I used:
library(ggplot2)
library(ggpubr)
set.seed(100)
dat <- data.frame(
x = rnorm(100, 1),
y = rnorm(100, 10),
z = rnorm(100, 25),
lev = gl(n = 2, k = 50, labels = letters[1:2])
)
test <- function(dat, x, y){
fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
mod1 <- lm(y ~ x, dat, subset = lev == "a")
sum1 <- summary(mod1)
lab1 <- sprintf(fmt, "a", sum1$adj.r.squared, coef(sum1)[2, 4])
mod2 <- lm(y ~ x, dat, subset = lev == "b")
sum2 <- summary(mod2)
lab2 <- sprintf(fmt, "b", sum2$adj.r.squared, coef(sum2)[2, 4])
colors <- 1:2
p <- ggplot(dat, aes(x = x, y = y, col = lev)) +
geom_jitter() +
geom_smooth(method = "lm") +
annotate("text", x = 2, y = c(12, 8), label = c(lab1, lab2),
parse = TRUE, hjust = 0, color = colors) +
scale_color_manual(values = colors)
return(p)
}
ggarrange(test(dat, x, z), test(dat, y, z))
There are several problems here:
x, y and lev are arguments to data.frame so they must be specified using = rather than <-
make use of the subset= argument in lm
use sprintf instead of paste to simplify the specification of labels
label the text strings a and b and make them the same color as the corresponding lines to identify which is which
the formula syntax needs to be corrected. See fmt below.
it would be clearer to use component names and accessor functions of the summary objects where available
use TRUE rather than T because the latter can be overridden if there is a variable called T but TRUE can never be overridden.
use hjust=0 and adjust the x= and y= in annotate to align the two text strings
combine the annotate statements
place the individual terms of the ggplot statement on separate lines for improved readability
This gives:
library(ggplot2)
set.seed(100)
dat <- data.frame(
x = rnorm(100, 1),
y = rnorm(100, 10),
lev = gl(n = 2, k = 50, labels = letters[1:2])
)
fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
mod1 <- lm(y ~ x, dat, subset = lev == "a")
sum1 <- summary(mod1)
lab1 <- sprintf(fmt, "a", sum1$adj.r.squared, coef(sum1)[2, 4])
mod2 <- lm(y ~ x, dat, subset = lev == "b")
sum2 <- summary(mod2)
lab2 <- sprintf(fmt, "b", sum2$adj.r.squared, coef(sum2)[2, 4])
colors <- 1:2
ggplot(dat, aes(x = x, y = y, col = lev)) +
geom_jitter() +
geom_smooth(method = "lm") +
annotate("text", x = 2, y = c(12, 8), label = c(lab1, lab2),
parse = TRUE, hjust = 0, color = colors) +
scale_color_manual(values = colors)
Unless I'm misunderstanding your question, the problem is with the parse = T arguments to your annotate calls. I don't think your strings need to be parsed. Try parse = FALSE instead, or just drop the parameter, as the default value seems to be FALSE anyway.
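Regarding the function in the update: inside `test()`, the call `lm(y ~ x, dat, ...)` always looks up the columns literally named `y` and `x` in `dat`, so the arguments `x` and `y` are never used. One possible way around this, sketched here with base R only, is to pass the column names as strings and build the formula with `reformulate()`. The function name `make_labels` and its arguments are hypothetical.

```r
set.seed(100)
dat <- data.frame(
  x = rnorm(100, 1),
  y = rnorm(100, 10),
  z = rnorm(100, 25),
  lev = gl(n = 2, k = 50, labels = letters[1:2])
)

# Build one plotmath label per level of `group`, fitting yvar ~ xvar.
make_labels <- function(dat, xvar, yvar, group = "lev") {
  fmt <- "%s: Adj ~ R^2 == %.3f * ',' ~ {p == %.3f}"
  fo <- reformulate(xvar, response = yvar)
  vapply(levels(dat[[group]]), function(g) {
    s <- summary(lm(fo, data = dat[dat[[group]] == g, , drop = FALSE]))
    sprintf(fmt, g, s$adj.r.squared, coef(s)[2, 4])
  }, character(1))
}

labs_xz <- make_labels(dat, "x", "z")
labs_yz <- make_labels(dat, "y", "z")
```

The resulting strings can be passed to `annotate("text", ..., parse = TRUE)` as in the answer above. In the plot itself the aesthetics would also need to refer to the chosen columns rather than to `x` and `y`.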

R: Dynamically update formula

How can I dynamically update a formula?
Example:
myvar <- "x"
update(y ~ 1 + x, ~ . -x)
# y ~ 1 (works as intended)
update(y ~ 1 + x, ~ . -myvar)
# y ~ x (doesn't work as intended)
update(y ~ 1 + x, ~ . -eval(myvar))
# y ~ x (doesn't work as intended)
You can use paste() within the update() call.
myvar <- "x"
update(y ~ 1 + x, paste(" ~ . -", myvar))
# y ~ 1
Edit
As @A.Fischer noted in the comments, this won't work if myvar is a vector of length > 1:
myvar <- c("k", "l")
update(y ~ 1 + k + l + m, paste(" ~ . -", myvar))
# y ~ l + m
# Warning message:
# Using formula(x) is deprecated when x is a character vector of length > 1.
# Consider formula(paste(x, collapse = " ")) instead.
Only "k" gets removed; "l" remains in the formula.
In this case we could convert the formula into strings, add or remove the terms we want to change, and rebuild the formula using reformulate, something like:
FUN <- function(fo, x, negate=FALSE) {
foc <- as.character(fo)
s <- el(strsplit(foc[3], " + ", fixed=T))
if (negate) {
reformulate(s[!s %in% x], foc[2], env=.GlobalEnv)
} else {
reformulate(c(s, x), foc[2], env=.GlobalEnv)
}
}
fo <- y ~ 1 + k + l + m
FUN(fo, c("n", "o")) ## add variables
# y ~ 1 + k + l + m + n + o
FUN(fo, c("k", "l"), negate=TRUE) ## remove variables
# y ~ 1 + m
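An alternative that stays with `update()` is to collapse the vector into a single string first, so that every variable gets its own `-` or `+`. This is a minimal sketch, assuming the entries are syntactic names (names with spaces would need backticks); `drop_vars` and `add_vars` are hypothetical.

```r
fo <- y ~ 1 + k + l + m
drop_vars <- c("k", "l")
add_vars <- c("n", "o")

# "~ . - k - l": one minus sign per variable to remove.
fo_drop <- update(fo, paste("~ . -", paste(drop_vars, collapse = " - ")))

# "~ . + n + o": one plus sign per variable to add.
fo_add <- update(fo, paste("~ . +", paste(add_vars, collapse = " + ")))

fo_drop
fo_add
```

Because the string passed to `update()` has length one, the deprecation warning about character vectors of length > 1 does not apply.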

ggplot of lm() with equation [duplicate]

I have read many postings on this topic using expression(), paste(), and bquote(), or some combination. I think I am close to solving my problem, but I just can't get there. The following script generates a plot labelled with "y = 1 + 2(x); r^2= 0.9". How can I italicize "y" and "x", and italicize the "r" and superscript the 2 of "r^2"? If I have overlooked a relevant earlier post, sorry, but please direct me to it.
df <- data.frame(x=c(1:5), y=c(1:5))
a <- 1
b <- 2
r2 <- 0.9
eq <- paste("y = ", a, " + ", b, "(x); r^2=", r2)
ggplot(data=df, aes(x=x, y=y))+
geom_point(color="black")+
geom_text(x=2, y=4,label=eq, parse=FALSE)
You could use annotate() which allows you to paste directly into the plot.
library(ggplot2)
ggplot(data=df, aes(x=x, y=y)) +
geom_point(color="black") +
annotate('text', 2.5, 4,
label=paste("italic(y)==", a, "+", b,
"~italic(x)~';'~italic(r)^2==", r2),
parse=TRUE,
hjust=1, size=5)
Yields:
Data:
df <- data.frame(x=c(1:5), y=c(1:5))
a <- 1
b <- 2
r2 <- 0.9
You can use a combination of substitute and plotmath (https://www.rdocumentation.org/packages/grDevices/versions/3.5.1/topics/plotmath) to italicize the text:
# setup
set.seed(123)
library(ggplot2)
# dataframe
df <- data.frame(x = c(1:5), y = c(1:5))
# label
eq <- substitute(
expr =
paste(
italic("y"),
" = ",
a,
" + ",
b,
"(",
italic("x"),
"); ",
italic("r") ^ 2,
" = ",
r2
),
env = base::list(a = 1,
b = 2,
r2 = 0.9)
)
# plot
ggplot(data = df, aes(x = x, y = y)) +
geom_point(color = "black") +
labs(subtitle = eq)
Created on 2018-12-04 by the reprex package (v0.2.1)
In addition to the answer by Indrajit Patil & jay-sf, I would like to add that there is an automated way to fit regression lines (I believe there are many), using a package called ggpmisc. The letters that you want in italic, are already formatted in such a way. The code that needs to be used is:
install.packages('ggpmisc')
library(ggpmisc)
formula <- y ~ x
df <- data.frame(x=c(1:5), y=c(1:5))
ggplot(data = df, aes(x, y)) + geom_point(color="black") +
geom_smooth(method = "lm", formula = formula) +
stat_poly_eq(aes(label = paste(..eq.label.., ..adj.rr.label.., sep = "~~~~")),
formula = formula, parse = TRUE)
It shows the fitted lines also, which I hope is not an impediment to the main goal.
EDIT: The line can be removed using linetype = 0, compatible with most of the aesthetics in ggplot2.
... + geom_smooth(method = "lm", formula = formula, linetype = 0) + ...
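Since the question also mentions bquote(), here is a sketch of how the label could be built from a fitted model with it and then turned into a string for parse = TRUE. The simulated data and the variable names are assumptions for illustration; only base R is used.

```r
# Simulated data (an assumption; any data frame with x and y would do).
set.seed(1)
df <- data.frame(x = 1:20)
df$y <- 1 + 2 * df$x + rnorm(20)

m <- lm(y ~ x, data = df)
a <- round(coef(m)[[1]], 2)
b <- round(coef(m)[[2]], 2)
r2 <- round(summary(m)$r.squared, 3)

# .() splices the numeric values into the plotmath expression.
eq <- bquote(italic(y) == .(a) + .(b) * italic(x) * ";" ~ italic(r)^2 == .(r2))

# A single string that can be used as a label with parse = TRUE.
label <- paste(deparse(eq), collapse = "")
label
```

The string in `label` can be passed to `annotate('text', ..., label = label, parse = TRUE)` in the same way as the pasted label in the first answer. Trailing zeros are dropped by rounding, so format() would be needed if a fixed number of decimals matters.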

Remove unwanted symbols from expression function - R

I am using the below function (found here) to generate linear model equations.
df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)
lm_eqn <- function(df){
m <- lm(y ~ x, df);
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
list(a = format(coef(m)[1], digits = 2),
b = format(coef(m)[2], digits = 2),
r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq));
}
lm_eqn(df)
[1] "italic(y) == \"14\" + \"3\" %.% italic(x) * \",\" ~ ~italic(r)^2 ~ \"=\" ~ \"0.806\""
However, this function was built for use in ggplot2, meaning it includes specific expression symbols that ggplot2 recognises and acts upon. I am using this function for something else. How can I alter the code so that I just end up with "y = 14 + 3x, r2=0.806"? Thank you.
If this is about dynamically generating/formatting an output string, you can also use stringr::str_interp:
# Sample data
set.seed(2017);
df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)
# Fit
m <- lm(y ~ x, df);
# Extract coefficients and generate string
a <- coef(m)[1];
b <- coef(m)[2];
r2 <- summary(m)$r.squared;
stringr::str_interp("y = $[2.0f]{a} + $[2.0f]{b} x, R2 = $[4.3f]{r2}")
#[1] "y = 9 + 3 x, R2 = 0.793"
Or use sprintf:
sprintf("y = %2.0f + %2.0f x, R2 = %4.3f", a, b, r2);
#[1] "y = 9 + 3 x, R2 = 0.793"
We can use glue
as.character(glue::glue("y = {round(a)} + {round(b)} x, R2 = {round(r2, 3)}"))
#[1] "y = 9 + 3 x, R2 = 0.793"
NOTE: Data based on @MauritsEvers's post
Ah, found it.
g<-as.character("y = a + b x, R2= r2 ")
library(magrittr)
g %<>%
gsub("a", format(coef(m)[1], digits = 2), .) %>%
gsub("b", format(coef(m)[2], digits = 2), .) %>%
gsub("r2", format(summary(m)$r.squared, digits = 3), .)
g
[1] "y = 14 + 3 x, R2= 0.806 "
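The pieces above can also be folded back into a function with the same shape as the original lm_eqn(), returning a plain string instead of a plotmath expression. This is a sketch that assumes the data frame has columns named x and y, as in the question; `lm_eqn_text` is a hypothetical name.

```r
# Return the fitted equation as plain text, e.g. "y = a + bx, r2=..."
lm_eqn_text <- function(df) {
  m <- lm(y ~ x, data = df)
  sprintf("y = %s + %sx, r2=%s",
          format(coef(m)[[1]], digits = 2),
          format(coef(m)[[2]], digits = 2),
          format(summary(m)$r.squared, digits = 3))
}

# Sample data in the same form as the question.
set.seed(2017)
df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)

eq_text <- lm_eqn_text(df)
eq_text
```

Unlike the gsub() approach, this does not depend on the letters "a" and "b" appearing nowhere else in the template. A negative slope would print as "+ -3x", so the sign handling may need adjusting.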
