knitr does not hook scientific notation in plots

When knitr is used inline, it typesets $\Sexpr{2.5e3}$ as $2.5 \times 10^3$, which is pretty nice. However, this does not work for the labels in plots. I guess knitr treats them as character rather than numeric. Is there a way to change that?

As I discussed in the comments, getting this to happen automatically isn't so easy, but it's not too hard to get your scales translated. sfsmisc::eaxis() is a good solution for base plots. Here are some ggplot-style solutions:
If you want to use native ?plotmath-style formatting:
##scale function for plotting y-axis labels
scientific_10 <- function(x) {
s <- scales::scientific_format()(x)
## substitute for exact zeros
s[s=="0e+00"] <- "0"
## regex: [+]? = "zero or one occurrences of '+'"
parse(text=gsub("e[+]?", " %*% 10^", s ))
}
Or if you're using LaTeX/TikZ (I'm using Hmisc::latexSN() here. It's a 3-line function, so you could just copy it if you wanted to avoid dependencies or hack it):
scientific_latex <- function(x,scipen=-2) {
require(Hmisc)
op <- options(scipen=scipen) ## encourage use of scientific notation
on.exit(options(op)) ## restore the previous scipen setting on exit
s <- paste0("$",Hmisc::latexSN(x),"$")
}
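If you would rather avoid the Hmisc dependency, the same idea can be sketched in base R. The helper below, sci_latex(), is a hypothetical name and not part of Hmisc or knitr. It assumes formatC() output of the form "2.0e+02", and its output will not match latexSN() exactly.

```r
# Hypothetical base-R stand-in for Hmisc::latexSN(); an illustration only.
sci_latex <- function(x, digits = 2) {
  s <- formatC(x, format = "e", digits = digits)  # e.g. "2.0e+02"
  mant <- sub("e.*$", "", s)                      # mantissa part, "2.0"
  expo <- as.integer(sub("^.*e", "", s))          # exponent part, 2
  out <- sprintf("$%s \\times 10^{%d}$", mant, expo)
  out[x == 0] <- "$0$"                            # exact zeros need no exponent
  out
}

labs <- sci_latex(c(0, 200, 1000), digits = 1)
print(labs)
```

Like scientific_latex(), this returns a character vector, so it can be passed as the labels function of a scale when the plot is rendered through LaTeX/TikZ.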
Example:
set.seed(101)
d <- data.frame(trait=runif(1000),
time=runif(1000,0,1000))
library(ggplot2); theme_set(theme_bw())
## make plot
breaks.y<-seq(from=0, to=1000, by=200)
g0 <- ggplot(d,aes(trait,time))+
geom_point()
plotmath style:
g0 + scale_y_continuous(label=scientific_10,
breaks=breaks.y, limits=c(0,1000))
TikZ:
g0 + scale_y_continuous(label=scientific_latex,
breaks=breaks.y, limits=c(0,1000))
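To see what the plotmath label function hands to ggplot2, it can help to run the substitution on its own. The sketch below is an assumption-laden variant: to_plotmath() is a hypothetical name, and it uses base formatC() in place of scales::scientific_format() so that it runs without extra packages.

```r
# Hypothetical variant of scientific_10() that returns the intermediate strings.
to_plotmath <- function(x) {
  s <- formatC(x, format = "e", digits = 1)  # e.g. "2.0e+02"
  s[x == 0] <- "0"                           # substitute for exact zeros
  gsub("e[+]?", " %*% 10^", s)               # "2.0e+02" becomes "2.0 %*% 10^02"
}

labs <- to_plotmath(c(0, 200, 1000))
print(labs)

# parse() turns each string into a plotmath expression, one per axis break
exprs <- parse(text = labs)
```

In plotmath, %*% is drawn as a multiplication sign, which is why the parsed expressions render as "2.0 × 10^02" on the axis.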

Following @Ben Bolker's comment, I am also adding my solution here. It is perhaps more in the knitr spirit, as it is only a sort of hack that applies the knitr rendering to the axis labels. In a knitr document especially, one often sets digits and scipen at the beginning and expects them to be applied everywhere.
So define in the setup chunk the function:
inline_hook <- function (x) {
if (is.numeric(x)) {
x = knitr:::format_sci(x, "latex")
i = grep("[^0-9.,]", x)
x[i] = sprintf("\\ensuremath{%s}", x[i])
if (getOption("OutDec") != ".")
x = sprintf("\\text{%s}", x)
}
if (is.numeric(x)) x = round(x, getOption("digits"))
x
}
which is just a mix of knitr:::.inline.hook.tex and knitr:::.inline.hook with the final collapse discarded. Then modify the default behaviour of scale_y_continuous according to this post:
scale_y_continuous <- function(...) ggplot2::scale_y_continuous(..., labels=inline_hook)
Then from the above example:
set.seed(101)
d <- data.frame(trait=runif(1000),
time=runif(1000,0,1000))
library(ggplot2); theme_set(theme_bw())
## make plot
breaks.y<-seq(from=0, to=1000, by=200)
g0 <- ggplot(d,aes(trait,time))+
geom_point()
g0
With this in place, the default axis labels are rendered exactly as knitr renders numbers given inline.

Related

Looping cut2 color argument in qplot

First off fair warning that this is relevant to a quiz question from coursera.org practical machine learning. However, my question does not deal with the actual question asked, but is a tangential question about plotting.
I have a training set of data and I am trying to create a plot for each predictor that includes the outcome on the y axis, the index of the data set on the x axis, and colors the plot by the predictor in order to determine the cause of bias along the index. To make the color argument more clear I am trying to use cut2() from the Hmisc package.
Here is my data:
library(ggplot2)
library(caret)
library(AppliedPredictiveModeling)
library(Hmisc)
data(concrete)
set.seed(1000)
inTrain = createDataPartition(mixtures$CompressiveStrength, p = 3/4)[[1]]
training = mixtures[ inTrain,]
testing = mixtures[-inTrain,]
training$index <- 1:nrow(training)
I tried this and it makes all the plots but they are all the same color.
plotCols <- function(x) {
cols <- names(x)
for (i in 1:length(cols)) {
assign(paste0("cutEx",i), cut2(x[ ,i]))
print(qplot(x$index, x$CompressiveStrength, color=paste0("cutEx",i)))
}
}
plotCols(training)
Then I tried this and it makes all the plots, and this time they are colored but the cut doesn't work.
plotCols <- function(x) {
cols <- names(x)
for (i in 1:length(cols)) {
assign(cols[i], cut2(x[ ,i]))
print(qplot(x$index, x$CompressiveStrength, color=x[ ,cols[i]]))
}
}
plotCols(training)
It seems qplot() doesn't like having paste() in the color argument. Does anyone know another way to loop through the color argument and still keep my cuts? Any help is greatly appreciated!

Your desired output is easier to achieve using ggplot() instead of qplot(), since you can use aes_string(), which accepts strings as arguments.
plotCols <- function(x) {
cols <- names(x)
for (i in 1:length(cols)) {
assign(paste0("cutEx", i), cut2(x[, i]))
p <- ggplot(x) +
aes_string("index", "CompressiveStrength", color = paste0("cutEx", i)) +
geom_point()
print(p)
}
}
plotCols(training)
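In more recent ggplot2 releases aes_string() is deprecated, and the .data pronoun is the usual way to refer to a column by name. The following is only a sketch under some assumptions: plot_by_cut() and the toy data frame are hypothetical, and base cut() stands in for Hmisc::cut2() so that the example does not need Hmisc or the concrete data.

```r
library(ggplot2)

# Hypothetical helper: one scatter plot per predictor, coloured by binned values.
plot_by_cut <- function(df, x = "index", y = "CompressiveStrength", groups = 4) {
  predictors <- setdiff(names(df), c(x, y))
  plots <- lapply(predictors, function(col) {
    df$bin <- cut(df[[col]], breaks = groups)  # base cut() as a stand-in for cut2()
    ggplot(df, aes(.data[[x]], .data[[y]], colour = .data[["bin"]])) +
      geom_point() +
      labs(colour = col)                       # legend title shows the predictor name
  })
  names(plots) <- predictors
  plots
}

# Toy data with the same column roles as the training set in the question
set.seed(1)
toy <- data.frame(index = 1:50,
                  CompressiveStrength = rnorm(50, 35, 10),
                  Cement = runif(50, 100, 500),
                  Age = runif(50, 1, 365))
plots <- plot_by_cut(toy)
# print(plots$Cement) draws one of them
```

Storing the binned values in a column of the data frame, rather than in a separately assigned variable, keeps everything the plot needs inside the data it was built from.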

Retrieve facet labels from a ggplot or a gtable/gTree/grob/gDesc object

I have data I'm plotting using ggplot's facet_grid:
My data:
species <- c("species1","species2")
conditions <- c("cond1","cond2","cond3")
batches <- 1:6
df <- expand.grid(species=species,condition=conditions,batch=batches)
set.seed(1)
df$y <- rnorm(nrow(df))
df$replicate <- 1
df$col.fill <- paste(df$species,df$condition,df$batch,sep=".")
My plot:
integerBreaks <- function(n = 5, ...)
{
library(scales)
breaker <- pretty_breaks(n, ...)
function(x){
breaks <- breaker(x)
breaks[breaks == floor(breaks)]
}
}
library(ggplot2)
p <- ggplot(df,aes(x=replicate,y=y,color=col.fill))+
geom_point(size=3)+facet_grid(~col.fill,scales="free_x")+
scale_x_continuous(breaks=integerBreaks())+
theme_minimal()+theme(legend.position="none",axis.title=element_text(size=8))
which gives:
Obviously the labels are long and come out pretty messed up in the figure, so I was wondering if there's a way to edit these labels in the ggplot object (p) or the gtable/gTree/grob/gDesc object (ggplotGrob(p)).
I am aware that one way of getting better labels is to use the labeller function when the ggplot object is created but in my case I'm specifically looking for a way to edit the facet labels after the ggplot object has been created.

As I mentioned in the comments, the facet names are nested quite deeply within the gtable that ggplotGrob() gives you. However, this is still possible and since the OP explicitly wants to edit them after being plotted, you can do this with:
library(grid)
gg <- ggplotGrob(p)
edited_grobs <- mapply(FUN = function(x, y) {
x[["grobs"]][[1]][["children"]][[2]][["children"]][[1]][["label"]] <- y
return(x)
},
gg$grobs[which(grepl("strip-t",gg$layout$name))],
unique(gsub("cond","c", df$condition)),
SIMPLIFY = FALSE)
gg$grobs[which(grepl("strip-t",gg$layout$name))] <- edited_grobs
grid.draw(gg)
Note that this extracts all the strips using gg$grobs[which(grepl("strip-t",gg$layout$name))] and passes them to the mapply to be reset with the gsub(...) that OP specified in their comment.
In general, if you want to access just one of the text labels, there is a very similar structure which I made use of in my mapply:
num_to_access <- 1
gg$grobs[which(grepl("strip-t",gg$layout$name))][[num_to_access]][["grobs"]][[1]][["children"]][[2]][["children"]][[1]]$label
So to access the 4th label, for example, all you need to do is change num_to_access to 4. Hope this helps!
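The hard-coded index path above depends on how a particular ggplot2 version nests its strip grobs. A less position-dependent alternative is to walk the grob tree and change every text grob found. This is a sketch, not a tested recipe for every ggplot2 version: relabel_text() is a hypothetical helper, and it is demonstrated here on a small hand-built gTree rather than on a real strip.

```r
library(grid)

# Hypothetical helper: recursively replace the label of every text grob in a tree.
relabel_text <- function(g, new_label) {
  if (inherits(g, "text")) {
    g$label <- new_label
    return(g)
  }
  # gtables keep their parts in 'grobs', gTrees in 'children'
  for (slot in c("grobs", "children")) {
    kids <- g[[slot]]
    for (i in seq_along(kids)) {
      g[[slot]][[i]] <- relabel_text(kids[[i]], new_label)
    }
  }
  g
}

# Stand-in for a strip: a background rectangle plus a text label
demo <- gTree(children = gList(rectGrob(name = "bg"),
                               textGrob("cond1", name = "lab")))
demo <- relabel_text(demo, "c1")
new_label <- demo$children[[2]]$label
```

Applied to a plot, the same call would be made on one strip at a time, for example on each element of gg$grobs whose gg$layout$name matches "strip-t".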

Rendering xtable with significant digits

I'd like to be able to render an xtable in an automatically run piece of code, i.e. NOT via copy-and-paste, while controlling the number of significant digits. The only way that I know to render an xtable on a standard plot device is by using grid.table, but that method ignores the digits directive and plots all available digits. Here's a code example. Any advice?
library(xtable)
library(gridExtra)
x = rnorm(100)
y = x + rnorm(100)
m = lm(y ~ x)
print(xtable(m)) #too many decimal places
print(xtable(m, digits = 2)) #this works
grid.table(xtable(m, digits=2)) #this doesn't!!!
None of the bits of advice here seem useful for automated rendering:
R: rendering xtable

If you convert everything to strings, you should be able to make this work:
x <- xtable(m)
x[] <- lapply(x, sprintf, fmt = "%0.2f")
grid.table(x)
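Since the question asks for significant digits rather than decimal places, the same convert-to-strings idea can be sketched with formatC() and format = "g". This is an illustration under assumptions: it builds the table from coef(summary(m)) instead of xtable, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
x <- rnorm(100)
y <- x + rnorm(100)
m <- lm(y ~ x)

# Coefficient table as a data frame, then each column formatted to
# 3 significant digits and stored as character strings
tab <- as.data.frame(coef(summary(m)))
tab[] <- lapply(tab, formatC, digits = 3, format = "g")
print(tab)

# gridExtra::grid.table(tab) would then draw the strings unchanged
```

Because every cell is already a string, the plotting step has no numeric formatting left to override.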

I'm not sure of your final plot device, but for some purposes you can just skip xtable altogether:
library("broom")
library("gridExtra")
x = rnorm(100)
y = x + rnorm(100)
m = lm(y ~ x)
DF <- broom::tidy(m)
DF[,2:4] <- round(DF[,2:4], 2)
DF[,5] <- format(DF[,5], scientific = TRUE, digits = 4)
grid.table(DF)
Make sure you have the latest gridExtra. You can also control the appearance of the table in great detail, via themes (there is a vignette on the topic).

Inverse of ggplotGrob?

I have a function which manipulates a ggplot object, by converting it to a grob and then modifying the layers. I would like the function to return a ggplot object not a grob. Is there a simple way to convert a grob back to gg?
The documentation on ggplotGrob is awfully sparse.
Simple example:
P <- ggplot(iris) + geom_bar(aes(x=Species, y=Petal.Width), stat="identity")
G <- ggplotGrob(P)
... some manipulation to G ...
## DESIRED:
P2 <- inverse_of_ggplotGrob(G)
such that, we can continue to use basic ggplot syntax, ie
`P2 + ylab ("The Width of the Petal")`
UPDATE:
To answer the question in the comment, the motivation here is to modify the colors of facet labels programmatically, based on the value of label name in each facet. The functions below work nicely (based on input from baptise in a previous question).
I would like for the return value from colorByGroup to be a ggplot object, not simply a grob.
Here is the code, for those interested
library(data.table) ## needed for data.table() and the := syntax used below
get_grob_strips <- function(G, strips=grep(pattern="strip.*", G$layout$name)) {
if (inherits(G, "gg"))
G <- ggplotGrob(G)
if (!inherits(G, "gtable"))
stop ("G must be a gtable object or a gg object")
strip.type <- G$layout[strips, "name"]
## I know this works for a simple
strip.nms <- sapply(strips, function(i) {
attributes(G$grobs[[i]]$width$arg1)$data[[1]][["label"]]
})
data.table(grob_index=strips, type=strip.type, group=strip.nms)
}
refill <- function(strip, colour){
strip[["children"]][[1]][["gp"]][["fill"]] <- colour
return(strip)
}
colorByGroup <- function(P, colors, showWarnings=TRUE) {
## The names of colors should match to the groups in facet
G <- ggplotGrob(P)
DT.strips <- get_grob_strips(G)
groups <- names(colors)
if (is.null(groups) || !is.character(groups)) {
groups <- unique(DT.strips$group)
if (length(colors) < length(groups))
stop ("not enough colors specified")
colors <- colors[seq(groups)]
names(colors) <- groups
}
## 'groups' should match the 'group' in DT.strips, which came from the facet_name
matched_groups <- intersect(groups, DT.strips$group)
if (!length(matched_groups))
stop ("no groups match")
if (showWarnings) {
if (length(wh <- setdiff(groups, DT.strips$group)))
warning ("values in 'groups' but not a facet label: \n", paste(wh, collapse=", "))
if (length(wh <- setdiff(DT.strips$group, groups)))
warning ("values in facet label but not in 'groups': \n", paste(wh, collapse=", "))
}
## identify the indices to the grob and the appropriate color
DT.strips[, color := colors[group]]
inds <- DT.strips[!is.na(color), grob_index]
cols <- DT.strips[!is.na(color), color]
## Fill in the appropriate colors, using refill()
G$grobs[inds] <- mapply(refill, strip = G$grobs[inds], colour = cols, SIMPLIFY = FALSE)
G
}

I would say no. ggplotGrob is a one-way street. grob objects are drawing primitives defined by grid. You can create arbitrary grobs from scratch. There's no general way to turn a random collection of grobs back into a function that would generate them (it's not invertible because it's not 1:1). Once you go grob, you never go back.
You could wrap a ggplot object in a custom class and overload the plot/print commands to do some custom grob manipulation, but that's probably even more hack-ish.

You can try the following:
p = ggplotify::as.ggplot(g)
For more info, see https://cran.r-project.org/web/packages/ggplotify/vignettes/ggplotify.html
It involves a little bit of a cheat, annotation_custom(as.grob(plot), ...), so it may not work in all circumstances: https://github.com/GuangchuangYu/ggplotify/blob/master/R/as-ggplot.R
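The annotation_custom() trick can be sketched by hand as follows. This is a rough illustration of the idea and not ggplotify's actual code; grob_as_ggplot() is a hypothetical name.

```r
library(ggplot2)

# Hypothetical wrapper: draw an arbitrary grob so that it fills an otherwise empty ggplot.
grob_as_ggplot <- function(grob) {
  ggplot(data.frame(x = 0:1, y = 0:1), aes(x, y)) +
    geom_blank() +
    scale_x_continuous(limits = c(0, 1), expand = c(0, 0)) +
    scale_y_continuous(limits = c(0, 1), expand = c(0, 0)) +
    annotation_custom(grob, xmin = 0, xmax = 1, ymin = 0, ymax = 1) +
    theme_void()
}

P <- ggplot(iris, aes(Species, Petal.Width)) + geom_col()
G <- ggplotGrob(P)
# ... some manipulation to G ...
P2 <- grob_as_ggplot(G)
```

P2 is a ggplot object again, but only as a container: the original plot is now a fixed drawing inside it, so adding ylab() or a scale to P2 changes the empty wrapper and not the bars or axes of P.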

Have a look at the ggpubr package: it has a function as_ggplot(). If your grob is not too complex it might be a solution!
I would also advise having a look at the patchwork package, which combines ggplots nicely. It is probably not what you are looking for, but it is worth a look.

How can I auto-title a plot with the R call that produced it?

R's plotting is great for data exploration, as it often has very intelligent defaults. For example, when plotting with a formula the labels for the plot axes are derived from the formula. In other words, the following two calls produce the same output:
plot(x~y)
plot(x~y, xlab="y", ylab="x")
Is there any way to get a similar "intelligent auto-title"?
For example, I would like to call
plot(x~y, main=<something>)
And produce the same output as calling
plot(x~y, main="plot(x~y)")
Where the <something> inserts the call used using some kind of introspection.
Is there a facility for doing this in R, either through some standard mechanism or an external package?
edit: One suggestion was to specify the formula as a string, and supply that as the argument to a formula() call as well as main. This is useful, but it misses out on parameters that can affect a plot, such as using subsets of data. To elaborate, I'd like
x<-c(1,2,3)
y<-c(1,2,3)
z<-c(0,0,1)
d<-data.frame(x,y,z)
plot(x~y, subset(d, z==0), main=<something>)
To have the same effect as
plot(x~y, subset(d, z==0), main="plot(x~y, subset(d, z==0))")

I don't think this can be done without writing a thin wrapper around plot(). The reason is that R evaluates "supplied arguments" in the evaluation frame of the calling function, in which there's no way to access the current function call (see here for details).
By contrast, "default arguments" are evaluated in the evaluation frame of the function, from where introspection is possible. Here are a couple of possibilities (differing just in whether you want "myPlot" or "plot" to appear in the title):
## Function that reports actual call to itself (i.e. 'myPlot()') in plot title.
myPlot <- function(x,...) {
cl <- deparse(sys.call())
plot(x, main=cl, ...)
}
## Function that 'lies' and says that plot() (rather than myPlot2()) called it.
myPlot2 <- function(x,...) {
cl <- sys.call()
cl[[1]] <- as.symbol("plot")
cl <- deparse(cl)
plot(x, main=cl, ...)
}
## Try them out
x <- 1:10
y <- 1:10
par(mfcol=c(1,2))
myPlot(x,y)
myPlot2(y~x)
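To check what title such a wrapper would produce without drawing anything, the call capture can be isolated. In this sketch call_label() is a hypothetical helper; because it never touches its arguments, they are not evaluated, so d does not even need to exist.

```r
# Hypothetical helper: return the text of its own call, relabelled as plot(...).
call_label <- function(...) {
  cl <- sys.call()              # the call exactly as the user typed it
  cl[[1]] <- as.symbol("plot")  # swap the function name, as myPlot2() does
  paste(deparse(cl), collapse = "\n")
}

lab <- call_label(x ~ y, subset(d, z == 0))
print(lab)
# → "plot(x ~ y, subset(d, z == 0))"
```

This covers the subset() case from the question's edit: the data argument shows up in the title because the whole call is captured, not just the formula.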
Here's a more general solution:
plotCaller <- function(plotCall, ...) {
main <- deparse(substitute(plotCall))
main <- paste(main, collapse="\n")
eval(as.call(c(as.list(substitute(plotCall)), main=main, ...)))
}
## Try _it_ out
plotCaller(hist(rnorm(9999), breaks=100, col="red"))
library(lattice)
plotCaller(xyplot(rnorm(10)~1:10, pch=16))
## plotCaller will also pass through additional arguments, so they take effect
## without being displayed
plotCaller(xyplot(rnorm(10)~1:10), pch=16)
deparse will attempt to break deparsed lines if they get too long (the default is 60 characters). When it does this, it returns a vector of strings. plot methods assume that 'main' is a single string, so the line main <- paste(main, collapse='\n') deals with this by concatenating all the strings returned by deparse, joining them using \n.
Here is an example of where this is necessary:
plotCaller(hist(rnorm(9999), breaks=100, col="red", xlab="a rather long label",
ylab="yet another long label"))
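The line-breaking behaviour can be checked directly on the quoted call, without plotting. A minimal sketch:

```r
# The same long call as above, kept unevaluated with quote()
cl <- quote(hist(rnorm(9999), breaks = 100, col = "red",
                 xlab = "a rather long label", ylab = "yet another long label"))

parts <- deparse(cl)                    # more than one string for a call this long
n_parts <- length(parts)

title <- paste(parts, collapse = "\n")  # a single string again, with line breaks
print(n_parts)
cat(title, "\n")
```

Passing a larger width.cutoff to deparse() would reduce the number of pieces, but collapsing with paste() works whatever the length of the call.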

Of course there is! Here ya go:
x = rnorm(100)
y = sin(x)
something = "y~x"
plot(formula(something),main=something)

You might be thinking of the functionality of match.call. However, that only really works when called inside a function, not when passed in as an argument. You could create your own wrapper function that calls match.call and then passes everything else on to plot, or use substitute to capture the call and then add the title to it before evaluating:
x <- runif(25)
y <- rnorm(25, x, .1)
myplot <- function(...) {
tmp <- match.call()
plot(..., main=deparse(tmp))
}
myplot( y~x )
myplot( y~x, xlim=c(-.25,1.25) )
## or
myplot2 <- function(FUN) {
tmp1 <- substitute(FUN)
tmp2 <- deparse(tmp1)
tmp3 <- as.list(tmp1)
tmp4 <- as.call(c(tmp3, main=tmp2))
eval(tmp4)
}
myplot2( plot(y~x) )
myplot2( plot(y~x, xlim=c(-.25,1.25) ) )