How to remove the quotation mark in formula object? - r

Here is an example:
formula <- Y ~ A + B + C + D + E + F + G
pryr::substitute_q(formula, list(Y = as.name('Ya + Yb')))
# `Ya + Yb` ~ A + B + C + D + E + F + G
What I am hoping for is:
Ya + Yb ~ A + B + C + D + E + F + G
I have tried noquote(), as.symbol(), as.name(), and so on, but none of them work.

Why not use update(), which ships with R (in the stats package)?
update(formula, Ya + Yb ~ .)
# Ya + Yb ~ A + B + C + D + E + F + G
or
x <- "Ya + Yb"
update(formula, paste(x, "~ ."))
# Ya + Yb ~ A + B + C + D + E + F + G

pryr::substitute_q(formula, list(Y = quote(Ya + Yb)))
# Ya + Yb ~ A + B + C + D + E + F + G
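The backticks in the question appear because as.name('Ya + Yb') creates a single symbol whose name happens to contain spaces and a plus sign, whereas quote(Ya + Yb) creates a call to +. As a rough base R sketch of the same idea without pryr (the names odd_symbol, lhs, new_formula and from_string are only illustrative), the left-hand side can also be replaced directly, since a two-sided formula stores its left-hand side as its second element:

```r
formula <- Y ~ A + B + C + D + E + F + G

# A symbol with an unusual name: this is what gets printed with backticks
odd_symbol <- as.name("Ya + Yb")
is.symbol(odd_symbol)

# A call to `+`: printed without backticks
lhs <- quote(Ya + Yb)
is.call(lhs)

# For a two-sided formula, element 2 is the left-hand side
new_formula <- formula
new_formula[[2]] <- lhs
new_formula

# Starting from a string, parse it first (str2lang needs R >= 3.6.0)
from_string <- formula
from_string[[2]] <- str2lang("Ya + Yb")
identical(deparse(new_formula), deparse(from_string))
```

This assumes the formula is two-sided; for a one-sided formula such as ~ A + B, element 2 is the right-hand side instead.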

Related

Wrong legend in ggplot output

The output of this code gives a distribution and two vertical lines, one red and one blue. But in the legend the blue line is marked "red" and vice versa. What might be the reason?
variances <- apply(matrix(rexp(40*1000,0.2),1000),1,var)
hist(variances)
v_theo <- 45 ## need to define v_theo
g <- ggplot(data.frame(x=variances), aes(x = x))
g <- g + geom_density(alpha=0.2,size=1,fill="red")
g <- g + geom_vline(aes(xintercept = mean(variances),color="red"), size=1)
g <- g + geom_vline(aes(xintercept = (v_theo),color="blue"), size=1)
g
library(ggplot2)
variances <- apply(matrix(rexp(40*1000,0.2),1000),1,var)
hist(variances)
v_theo <- 45
g <- ggplot(data.frame(x=variances), aes(x = x))
g <- g + geom_density(alpha=0.2,size=1,fill="red")
g <- g + geom_vline(aes(xintercept = v_theo, color="blue"), size=1)
g
g <- ggplot(data.frame(x=variances), aes(x = x))
g <- g + geom_density(alpha=0.2,size=1,fill="red")
g <- g + geom_vline(aes(xintercept = mean(variances),color="mean"), size=1)
g <- g + geom_vline(aes(xintercept = v_theo,color="v_theo"), size=1) +
scale_color_manual(name = "Legend name", values = c(mean = "red", v_theo = "blue"))
g
See here as well:
Add legend to geom_vline
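That's because the colors are mapped inside aes(): the strings "red" and "blue" are treated as category labels, not as literal colors, and ggplot2 assigns its default palette to them. If you want to set the colors manually, you can either take them out of aes() like this: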
variances <- apply(matrix(rexp(40*1000,0.2),1000),1,var)
hist(variances)
g <- ggplot(data.frame(x=variances), aes(x = x))
g <- g + geom_density(alpha=0.2,size=1,fill="red")
g <- g + geom_vline(aes(xintercept = mean(variances)), color="red", size=1)
g <- g + geom_vline(aes(xintercept = (v_theo)), color="blue", size=1)
g
You'll lose the legend by doing this, though. If you want the legend, you can use scale_color_manual to fix the order of the colors:
variances <- apply(matrix(rexp(40*1000,0.2),1000),1,var)
hist(variances)
g <- ggplot(data.frame(x=variances), aes(x = x))
g <- g + geom_density(alpha=0.2,size=1,fill="red")
g <- g + geom_vline(aes(xintercept = mean(variances),color="red"), size=1)
g <- g + geom_vline(aes(xintercept = (v_theo),color="blue"), size=1)
g <- g + scale_color_manual(values = c("blue", "red"))
g
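A small base R sketch of why the labels looked swapped in the question, assuming ggplot2's usual behavior of ordering discrete values alphabetically and handing out colors in that order (the variable names below are illustrative, not part of ggplot2):

```r
# Strings inside aes() are category labels, not colors
labels_in_aes <- c("red", "blue")

# Discrete values are ordered like factor levels, i.e. alphabetically,
# so "blue" comes first and receives the first color that is handed out
legend_order <- levels(factor(labels_in_aes))
legend_order

# A named vector ties each label to a color regardless of that order
manual_values <- c(red = "red", blue = "blue")
manual_values[legend_order]
```

Passing a named vector like manual_values to scale_color_manual(values = ...) is less fragile than relying on the position of unnamed values.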

Big O of shrinking list?

Want to make sure I have this right.
int n = 20;
while (n > 0) {
    int index = 0;
    while (index < n) {
        index++;
    }
    n--;
}
The number of steps is:
n + (n-1) + (n-2) + (n-3) + ... + (n-n)
Is that still technically O(N)?
No, it is O(n^2). Proof by induction:
1 + 2 + 3 + ... + n = n(n + 1) / 2
1 + 2 + 3 + ... + n = O(n^2)
Base case:
n = 1
1 = 1(1 + 1) / 2
1 = 2 / 2
1 = 1
Assume true for n = k:
1 + 2 + 3 + ... + k = k(k + 1) / 2
Prove true for n = k + 1:
1 + 2 + 3 + ... + k + (k + 1) = (k + 1)(k + 1 + 1) / 2
k(k + 1)/2 + (k + 1) = (k + 1)(k + 1 + 1) / 2
k(k + 1)/2 + 2(k + 1) / 2 = (k + 1)(k + 1 + 1) / 2
(k^2 + k)/2 + (2k + 2) / 2 = (k + 1)(k + 1 + 1) / 2
(k^2 + k + 2k + 2) / 2 = (k + 1)(k + 1 + 1) / 2
(k^2 + 3k + 2) / 2 = (k + 1)(k + 2) / 2
(k^2 + 3k + 2) / 2 = (k^2 + 2k + k + 2) / 2
(k^2 + 3k + 2) / 2 = (k^2 + 3k + 2) / 2
Therefore:
1 + 2 + 3 + ... + n = n(n + 1) / 2
1 + 2 + 3 + ... + n = (n^2 + n) / 2
1 + 2 + 3 + ... + n = O(n^2)
If you work it out, it's the Nth triangular number, and therefore:
O(N(N + 1) / 2), which simplifies to O(N^2)
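As a quick empirical check of the closed form, here is a sketch that counts how often the inner loop body runs and compares it with n(n + 1) / 2. It is written in R, and count_steps is a hypothetical helper name introduced only for this illustration:

```r
# Count the inner-loop iterations of the shrinking loop from the question
count_steps <- function(n) {
  steps <- 0
  while (n > 0) {
    index <- 0
    while (index < n) {
      index <- index + 1
      steps <- steps + 1
    }
    n <- n - 1
  }
  steps
}

# Closed form for 1 + 2 + ... + n
triangular <- function(n) n * (n + 1) / 2

sizes <- c(1, 5, 20, 100)
counted <- sapply(sizes, count_steps)
data.frame(n = sizes, counted = counted, formula = triangular(sizes))
```

Doubling n roughly quadruples the count, which is what quadratic growth looks like.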

Squared transform in coord ggplot2

I have data where I think that y^2 ~ x.
So, I want to plot y as a function of x with a squared scale on the y axis.
N <- 100
ggplot(data_frame(x = runif(N), y = 20 * sqrt(x) + rnorm(N)), aes(x, y)) +
geom_point()
+ scale_y_square??
You need to create a new transformation with scales::trans_new and use it with coord_trans:
N <- 100
ggplot(data_frame(x = runif(N), y = 20 * sqrt(x) + rnorm(N)), aes(x, y)) +
geom_point() +
coord_trans(y = scales::trans_new("square", function(x) x^2, "sqrt"))
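The second and third arguments of trans_new are the transformation and its inverse. The following base R sketch (with illustrative names square and inverse) checks that the pair used above round-trips for non-negative values, and shows where it does not. This matters here because the simulated y includes rnorm noise, so values near x = 0 may come out slightly negative:

```r
# The transformation and the inverse passed to trans_new in the answer
square <- function(y) y^2
inverse <- function(y) sqrt(y)

# For non-negative values the pair is a proper round trip
y_pos <- c(0, 0.5, 2, 10)
all.equal(inverse(square(y_pos)), y_pos)

# For a negative value the sign is lost: squaring is not one-to-one there
y_neg <- -1.5
inverse(square(y_neg))
```

If negative values occur in the data, filtering them out or choosing a different transformation may be needed before plotting.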

Overlaying different vlines in R with ggplot facet_wrap

I am trying to produce a set of density plots showing the difference in expression level distributions for two sets of genes in four cell types. In addition to the density plots, I would like to have the median expression level for both groups overlaid onto each plot. Based on answers to a few similar questions, I've been able to get correct plots OR correct medians but not both at the same time. I'm out of ideas and hoping someone can set me right. Thanks!
Sample data is available here: https://github.com/adadiehl/sample_data/blob/master/sample.data
First attempt: produces correct plots, but the same medians are plotted on all four:
dat = read.table("sample.data")
g = ggplot(dat[which(dat$FPKM > 0),], aes(x = FPKM))
g = g + geom_density(aes(y = ..density.., group = class, color = class, fill = class), alpha=0.2)
g = g + geom_vline(data=dat, aes(xintercept = median(dat$FPKM[ which(dat$FPKM > 0 & dat$class == "Other") ]) ), colour="turquoise3", linetype="longdash")
g = g + geom_vline(data=dat, aes(xintercept = median(dat$FPKM[ which(dat$FPKM > 0 & dat$class == "a_MCKG") ]) ), colour="tomato1", linetype="longdash")
g = g + facet_wrap(~source, ncol=2, scales="free")
g = g + ggtitle("Distribution of FPKM, MCKG vs. Other")
g = g + xlab("FPKM > 0")
Second attempt: correct plots, but all medians are placed on all plots:
dat = read.table("sample.data")
vline.dat = data.frame(z=levels(dat$source), vl=tapply(dat$FPKM[which(dat$class != "a_MCKG" & dat$FPKM > 0)], dat$source[which(dat$class != "a_MCKG" & dat$FPKM > 0)], median), vm=tapply(dat$FPKM[which(dat$class == "a_MCKG" & dat$FPKM > 0)], dat$source[which(dat$class == "a_MCKG" & dat$FPKM > 0)], median))
g = ggplot(dat[which(dat$FPKM > 0),], aes(x = FPKM))
g = g + geom_density(aes(y = ..density.., group = class, color = class, fill = class), alpha=0.2)
g = g + facet_wrap(~source, ncol=2, scales="free")
g = g + geom_vline(data=vline.dat, aes(xintercept = vl), colour="turquoise3", linetype="longdash")
g = g + geom_vline(data=vline.dat, aes(xintercept = vm), colour="tomato1", linetype="longdash")
g = g + facet_wrap(~source, ncol=2, scales="free")
g = g + ggtitle("Distribution of FPKM, MCKG vs. Other")
g = g + xlab("FPKM > 0")
Third attempt: plots are all the same but have the correct medians:
dat = read.table("sample.data")
vline.dat = data.frame(z=levels(dat$source), vl=tapply(dat$FPKM[which(dat$class != "a_MCKG" & dat$FPKM > 0)], dat$source[which(dat$class != "a_MCKG" & dat$FPKM > 0)], median), vm=tapply(dat$FPKM[which(dat$class == "a_MCKG" & dat$FPKM > 0)], dat$source[which(dat$class == "a_MCKG" & dat$FPKM > 0)], median))
g = ggplot(dat[which(dat$FPKM > 0),], aes(x = FPKM))
g = g + geom_density(aes(y = ..density.., group = class, color = class, fill = class), alpha=0.2)
g = g + facet_wrap(~source, ncol=2, scales="free")
g = g + geom_vline(data=vline.dat, aes(xintercept = vl), colour="turquoise3", linetype="longdash")
g = g + geom_vline(data=vline.dat, aes(xintercept = vm), colour="tomato1", linetype="longdash")
g = g + facet_wrap(~z, ncol=2, scales="free")
g = g + ggtitle("Distribution of FPKM, MCKG vs. Other")
g = g + xlab("FPKM > 0")
Passing pre-summarized data is the way to go:
library(plyr)
names(dat) <- c("FPKM", "class", "source")
dat2 <- subset(dat, FPKM > 0)
ggplot(dat2, aes(x = FPKM)) +
geom_density(aes(y = ..density.., group = class, color = class, fill = class), alpha=0.2) +
geom_vline(data = ddply(dat2, .(source, class), summarize, mmed = median(FPKM)),
aes(xintercept = mmed, color = class)) +
facet_wrap(~ source, ncol = 2, scales = "free") +
ggtitle("Distribution of FPKM, MCKG vs. Other") +
xlab("FPKM > 0")
Alternatively, you can achieve the same with base R:
dat3 <- aggregate(FPKM ~ source + class, data = dat2, FUN = median)
ggplot(dat2, aes(x = FPKM)) +
geom_density(aes(y = ..density.., group = class, color = class, fill = class), alpha=0.2) +
geom_vline(data = dat3,
aes(xintercept = FPKM, color = class)) +
facet_wrap(~ source, ncol = 2, scales = "free") +
ggtitle("Distribution of FPKM, MCKG vs. Other") +
xlab("FPKM > 0")
N.B. You may want to avoid column names such as source and class as these conflict with built-in functions.
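To make the summarizing step concrete, here is a sketch on made-up toy data (the values and the cellA/cellB labels are invented for illustration; only the column names follow the answer). The key point is that the summary table keeps a column named source, the faceting variable, so each median can be matched to its own panel. In the second attempt that column was named z, which is probably why every line appeared in every panel:

```r
# Toy data with the same column names as in the answer (values are invented)
toy <- data.frame(
  FPKM   = c(1, 2, 3, 10, 20, 30, 4, 5, 6, 40, 50, 60),
  class  = rep(c("a_MCKG", "Other"), each = 3, times = 2),
  source = rep(c("cellA", "cellB"), each = 6)
)

# One median per source/class combination
meds <- aggregate(FPKM ~ source + class, data = toy, FUN = median)
meds

# The faceting variable is present in the summary, so geom_vline(data = meds, ...)
# can place each row in the matching facet
"source" %in% names(meds)
```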

How to convert R formula to text?

I have trouble treating a formula as text. What I'm trying to do is concatenate the formula to the title of a graph. However, when I try to work with the formula as text, I fail:
model <- lm(celkem ~ rok + mesic)
formula(model)
# celkem ~ rok + mesic
This is fine. Now I want to build a string like "my text celkem ~ rok + mesic", and this is where the problem comes in:
paste("my text", formula(model))
# [1] "my text ~" "my text celkem" "my text rok + mesic"
paste("my text", as.character(formula(model)))
# [1] "my text ~" "my text celkem" "my text rok + mesic"
paste("my text", toString(formula(model)))
# [1] "my text ~, celkem, rok + mesic"
Now I see there is a sprint function in the gtools package, but I think this is such a basic thing that it deserves a solution within the default environment!
A short solution, using either Reduce with deparse or the as.character.formula method from the formula.tools package:
frm <- celkem ~ rok + mesic
Reduce(paste, deparse(frm))
# [1] "celkem ~ rok + mesic"
library(formula.tools)
as.character(frm)
# [1] "celkem ~ rok + mesic"
Reduce might be useful in case of long formulas:
frm <- formula(paste("y ~ ", paste0("x", 1:12, collapse = " + ")))
deparse(frm)
# [1] "y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + "
# [2] " x12"
Reduce(paste, deparse(frm))
# [1] "y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12"
The split happens because of the default width.cutoff = 60L (see ?deparse).
Try format:
paste("my text", format(frm))
## [1] "my text celkem ~ rok + mesic"
Simplest solution covering everything:
f <- formula(model)
paste(deparse(f, width.cutoff = 500), collapse="")
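For formulas longer than the 500-character maximum of width.cutoff, deparse still returns several pieces with padding between them. One possible way to handle that, sketched below with a hypothetical helper name formula_to_string, is to collapse the pieces and then squeeze runs of whitespace into single spaces:

```r
# Hypothetical helper: deparse, join the pieces, normalize whitespace
formula_to_string <- function(f) {
  txt <- paste(deparse(f, width.cutoff = 500L), collapse = " ")
  gsub("\\s+", " ", txt)
}

# A formula that the default cutoff of 60 would split in two
frm <- formula(paste("y ~", paste0("x", 1:12, collapse = " + ")))
formula_to_string(frm)

# A formula long enough to exceed even the 500-character cutoff
long_frm <- formula(paste("y ~", paste(rep(letters, 20), collapse = " + ")))
long_txt <- formula_to_string(long_frm)
length(long_txt)                       # a single string
grepl("  ", long_txt, fixed = TRUE)    # no doubled spaces remain
```

Note that this normalizes all whitespace, so it would also alter string literals containing multiple spaces if a formula happened to include any.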
R 4.0.0 (released 2020-04-24) introduced deparse1 which never splits the result into multiple strings:
f <- y ~ a + b + c + d + e + f + g + h + i + j + k + l + m + n + o +
p + q + r + s + t + u + v + w + x + y + z
deparse(f)
# [1] "y ~ a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + " " p + q + r + s + t + u + v + w + x + y + z"
deparse1(f)
# [1] "y ~ a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z"
However, it still has a width.cutoff argument (default and maximum: 500), after which line breaks are introduced. The lines are joined by collapse (default: " ") rather than \n, which leaves extra whitespace even with collapse = "" (use gsub to remove it if needed, see Ross D's answer):
> f <- rlang::parse_expr( paste0("y~", paste0(rep(letters, 20), collapse="+")))
> deparse1(f, collapse = "")
[1] "y ~ a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z + a + b + c + d 
+ e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u + v + w + x + y + z"
To use it in R < 4.0.0, use the backports package (recommended)
or copy its implementation:
# Part of the R package, https://www.R-project.org
#
# Copyright (C) 1995-2019 The R Core Team
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# A copy of the GNU General Public License is available at
# https://www.R-project.org/Licenses/
deparse1 <- function (expr, collapse = " ", width.cutoff = 500L, ...)
paste(deparse(expr, width.cutoff, ...), collapse = collapse)
Or, as an alternative to Julius's version (note: your code was not self-contained):
celkem = 1
rok = 1
mesic = 1
model <- lm(celkem ~ rok + mesic)
paste("my model ", deparse(formula(model)))
The easiest way is this:
f = formula(model)
paste(f[2],f[3],sep='~')
done!
Here is a solution that uses print.formula. It may seem like a trick, but it does the job in one line, avoids deparse, and needs no extra package. I just capture the output of printing the formula with capture.output:
paste("my text",capture.output(print(formula(celkem ~ rok + mesic))))
[1] "my text celkem ~ rok + mesic"
In case of a long formula:
ff <- formula(paste("y ~ ", paste0("x", 1:12, collapse = " + ")))
paste("my text",paste(capture.output(print(ff)), collapse= ' '))
"my text y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12"
Another deparse-based solution is rlang::expr_text() (and rlang::quo_text()):
f <- Y ~ 1 + a + b + c + d + e + f + g + h + i +j + k + l + m + n + o + p + q + r + s + t + u
rlang::quo_text(f)
#> [1] "Y ~ 1 + a + b + c + d + e + f + g + h + i + j + k + l + m + n + \n o + p + q + r + s + t + u"
They do have a width argument to avoid line breaks, but that is limited to 500 characters too. At least it's a single function that is most likely loaded already...
Then add gsub to remove whitespace:
gsub(" ", "", paste(format(frm), collapse = ""))
I was optimizing some functions today. Here are a few approaches that have not been mentioned so far:
f <- Y ~ 1 + a + b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s + t + u
bench::mark(
expression = as.character(as.expression(f)),
deparse = deparse(f, width.cutoff = 500L),
deparse1 = deparse1(f),
tools = formula.tools:::as.character.formula(f),
stringi = stringi::stri_c(f),
I = as.character(I(f)),
as = as(f, "character"),
txt = gettext(f),
txtf = gettextf(f),
sub = sub("", "", f),
chr = as.character(f),
str = substring(f, 1L),
paste = paste0(f),
)[c(1, 3, 5, 7)]
#> # A tibble: 13 x 3
#> expression median mem_alloc
#> <bch:expr> <bch:tm> <bch:byt>
#> 1 expression 15.4us 0B
#> 2 deparse 31us 0B
#> 3 deparse1 34us 0B
#> 4 tools 58.7us 1.74MB
#> 5 stringi 67us 3.09KB
#> 6 I 64.1us 0B
#> 7 as 100.5us 521.61KB
#> 8 txt 83.4us 0B
#> 9 txtf 85.8us 3.12KB
#> 10 sub 64.6us 0B
#> 11 chr 60us 0B
#> 12 str 62.8us 0B
#> 13 paste 63.5us 0B
