Unable to add Greek/Math/Expression Split labels using rpart.plot

I'm attempting to plot an rpart tree where I'd like to change some of the split labels to their Greek/math equivalent. For instance, I have a column named mu -- I'd like this to show up as the Greek letter $\mu$.
Unfortunately, when I replace one of the labels, it results in the error "Error in strsplit(labs, "\n\n") : non-character argument". As I'm not calling strsplit myself, this error must be coming from the rpart.plot call, which assumes the labels are all plain text. This is my code:
split.fun <- function(x, labs, digits, varlen, faclen)
{
  for(i in 1:length(labs)) {
    if(substring(labs[i],0,2)=="mu"){
      #labs[i] <- bquote(mu ~ .(substring(labs[i],3)))
      labs[i] <- expression(paste0(mu,substring(labs[i],3)))
    }
    print(labs[i])
  }
  labs
}
data$dv <- factor(data$dv, labels = c("No", "Yes"))
fit <- rpart(dv ~ n + alpha + dev + mu, method="class", data=data)
rpart.plot(fit, yesno=2, box.palette = 0, extra=100, under = TRUE, split.fun = split.fun)
Neither the expression approach nor the bquote approach works. However, split.fun works fine as long as I only replace substrings with other strings (not expressions).
In trying to figure out what's going on, I've also been printing out the resulting labels. This is what I get:
[1] "root"
[1] "dev >= 0.075"
expression(paste0(mu, substring(labs[i], 3)))
expression(paste0(mu, substring(labs[i], 3)))
expression("alpha < 0.025")
expression("alpha >= 0.025")
expression("dev < 0.075")
expression("alpha < 0.025")
expression("dev >= 0.025")
expression(paste0(mu, substring(labs[i], 3)))
expression(paste0(mu, substring(labs[i], 3)))
expression("dev < 0.025")
expression("alpha >= 0.025")
From this, it seems that once I replace one label with an expression, all the other labels are converted to expressions as well.
Is there another approach to placing Greek letters in an rpart.plot tree? Or are rpart.plot (and prp in general) simply not capable of including math expressions?
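The coercion observed above can be reproduced in plain base R, without rpart. In this sketch the label values are made up for illustration. It shows that sub-assigning an expression into one element of a character vector converts the whole vector to an expression vector, and that strsplit(), which the error message suggests rpart.plot calls on the labels, then fails with the same message:

```r
# Labels in the form split.fun receives them: a plain character vector
labs <- c("root", "mu < 3.5", "dev >= 0.075")
print(is.character(labs))   # TRUE

# Sub-assigning an expression coerces the *whole* vector to class "expression"
labs[2] <- expression(mu < 3.5)
print(class(labs))          # "expression"
print(labs[3])              # expression("dev >= 0.075")

# strsplit() requires character input, so it now fails
err <- tryCatch(strsplit(labs, "\n\n"), error = conditionMessage)
print(err)                  # "non-character argument"
```

So any split.fun that must keep working with rpart.plot has to return a character vector.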

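A combination of @G5W's suggestion and the right font works. For those trying to do this, add the following to the top of the file (if you have never used extrafont before, run font_import() once first so that it registers your system fonts):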
library(extrafont)
loadfonts()
Then adjust the rpart.plot call to use "Arial Unicode MS". This font seems to always display math Unicode characters (including combining characters) correctly.
rpart.plot(fit, yesno=2, box.palette = 0, extra=100, under = TRUE, split.fun = split.fun, split.font=1, split.family="Arial Unicode MS", family="Arial Unicode MS")
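A minimal sketch of a split.fun in this spirit, assuming the suggestion referred to above was to put the Unicode character itself (U+03BC, GREEK SMALL LETTER MU) into the labels instead of a plotmath expression. Because sub() returns a character vector, the strsplit() error from the question should not occur:

```r
# Hypothetical split.fun: replace a leading "mu" in each split label with the
# Greek letter mu (U+03BC). The result stays a character vector, so rpart.plot
# can still split the labels as plain text.
split.fun <- function(x, labs, digits, varlen, faclen) {
  sub("^mu", "\u03bc", labs)
}

# Quick check on made-up labels (x, digits, varlen, faclen are unused here)
print(split.fun(NULL, c("root", "mu < 3.5", "dev >= 0.075"), 2, 0, 0))
```

With the font arguments shown in the rpart.plot call above, this should draw μ in place of mu. The pattern "^mu" also matches any other variable whose name starts with "mu" (for example a hypothetical column mult), so tighten it, e.g. to sub("^mu ", "\u03bc ", labs), if your data has such columns.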

Related

Render unicode emoji in colour in ggplot2 geom_text

I have unicode text that includes emoji. I'd like to render them in a ggplot2 graphic with geom_text or geom_label in a way that includes the emoji's colour. I've looked at emojifont, emo and ggtext and none of these seem to allow this. The issue of course is that the colour of the text in geom_text is governed by the colour aesthetic. Is there any way I can get colours rendered in my text, either through geom_text or some other workaround?
Reproducible example:
library(ggplot2)
pets <- "I like 🐶 🐱 🐟 🐢"
cat(pets)
ggplot() +
theme_void() +
annotate("text", x = 1, y = 1, label = pets, size = 15)
The cat(pets) call works on screen in RStudio, but in the graphic drawn by the last line the emoji appear only in black and white.
Alternatively, with ggtext::geom_richtext() I get a similar black-and-white result and these warning messages:
> library(ggtext)
> ggplot() +
+ theme_void() +
+ annotate("richtext", x = 1, y = 1, label = pets, size = 15)
Warning messages:
1: In text_info(label, fontkey, fontfamily, fontface, fontsize, cache) :
unable to translate '<U+0001F436>RStudioGD142.6791338582677' to native encoding
2: In text_info(label, fontkey, fontfamily, fontface, fontsize, cache) :
unable to translate '<U+0001F431>RStudioGD142.6791338582677' to native encoding
3: In text_info(label, fontkey, fontfamily, fontface, fontsize, cache) :
unable to translate '<U+0001F41F>RStudioGD142.6791338582677' to native encoding
4: In text_info(label, fontkey, fontfamily, fontface, fontsize, cache) :
unable to translate '<U+0001F422>RStudioGD142.6791338582677' to native encoding
5: In do.call(gList, grobs) :
unable to translate 'I like <U+0001F436> <U+0001F431> <U+0001F41F> <U+0001F422>' to native encoding
OK, here's an answer to my own question.
Overall approach: we convert each emoji to an HTML <img> tag linking to an image of the emoji, and use ggtext to render the resulting combination of text and images.
First we need a vector of all emoji so down the track we will be able to recognise them:
library(tidyverse)
library(ggtext)
library(rvest)
# test vector
pets <- "I like 🐶 🐱 🐟 🐢"
# the definitive web page with emoji:
unicode <- read_html("https://unicode.org/emoji/charts/full-emoji-list.html")
ut <- unicode %>%
html_node("table") %>%
html_table()
# vector of all emoji - purely for recognition purposes
# ([[3]] gives a plain vector even if html_table() returns a tibble)
all_emoji <- ut[[3]]
Then I borrow, with virtually no alteration, several functions from this page by Emil Hvitfeldt. Emil had a similar challenge to mine, but without the problem of the original emoji being plain text.
emoji_to_link <- function(x) {
  paste0("https://emojipedia.org/emoji/",x) %>%
    xml2::read_html() %>%
    rvest::html_nodes("tr td a") %>%
    .[1] %>%
    rvest::html_attr("href") %>%
    paste0("https://emojipedia.org/", .) %>%
    xml2::read_html() %>%
    rvest::html_node('div[class="vendor-image"] img') %>%
    rvest::html_attr("src")
}
link_to_img <- function(x, size = 24) {
  paste0("<img src='", x, "' width='", size, "'/>")
}
Those functions take an emoji, look up a link to an image of it as rendered by the Apple Color Emoji font, and wrap that link in an HTML <img> tag. So far so good, but first I need to extract the emoji from my mixed text. To do this I wrote two more functions:
one to convert an individual token (where a token might be an individual emoji) into an image tag, or return it unchanged if it is ordinary text; and
one to tokenize a text string, convert any emoji tokens to images, and then paste them all back together again.
Here are those two functions:
token_to_rt <- function(x){
  if(x %in% all_emoji){
    y <- link_to_img(emoji_to_link(x))
  } else {
    y <- x
  }
  return(y)
}
string_to_rt <- function(x){
  tokens <- str_split(x, " ", simplify = FALSE)[[1]]
  y <- lapply(tokens, token_to_rt)
  z <- do.call(paste, y)
  return(z)
}
Now we have everything we need. First I convert my pets vector into pets2, then I can use ggplot2 and ggtext to render it on screen, in glorious colour:
pets2 <- string_to_rt(pets)
ggplot() +
theme_void() +
annotate("richtext", x = 1, y = 1, label = pets2, size = 15)
There we are:
For completeness, here's how the key objects pets, pets2 and all_emoji look when just printed in the R console:
> pets
[1] "I like \U0001f436 \U0001f431 \U0001f41f \U0001f422"
> pets2
[1] "I like <img src='https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/237/dog-face_1f436.png' width='24'/> <img src='https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/237/cat-face_1f431.png' width='24'/> <img src='https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/237/fish_1f41f.png' width='24'/> <img src='https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/237/turtle_1f422.png' width='24'/>"
> all_emoji[1:10]
[1] "face-smiling" "Browser" "\U0001f600" "\U0001f603" "\U0001f604" "\U0001f601"
[7] "\U0001f606" "\U0001f605" "\U0001f923" "\U0001f602"
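If scraping Emojipedia for every token is too slow or fragile, the same token-replacement idea can be sketched with a local lookup table. This is a swapped-in variant, not the method above: emoji_urls and emoji_to_img() are hypothetical names, the URLs are placeholders (in practice they could be filled once with emoji_to_link() and cached), and a UTF-8 session is assumed:

```r
# Hypothetical lookup table: emoji -> image URL (placeholder URLs)
emoji_urls <- setNames(
  c("https://example.com/dog-face.png", "https://example.com/cat-face.png"),
  c("\U0001F436", "\U0001F431")
)

# Replace space-separated emoji tokens with <img> tags that ggtext can render;
# tokens not found in the table are left unchanged
emoji_to_img <- function(text, urls, size = 24) {
  tokens <- strsplit(text, " ", fixed = TRUE)[[1]]
  hit <- tokens %in% names(urls)
  tokens[hit] <- sprintf("<img src='%s' width='%d'/>", urls[tokens[hit]], size)
  paste(tokens, collapse = " ")
}

emoji_to_img("I like \U0001F436 \U0001F431", emoji_urls)
```

The returned string can be used as the label in annotate("richtext", ...) in the same way as pets2. Like string_to_rt(), it only recognises emoji separated by spaces.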

Print text with subscripts (programmatically) to R console

I'm using R to balance some complex chemical equations and would like to print these equations including subscripts to the console as the code runs. I've seen some answers posted, most of which are related to plots or rely on pasting the subscript from another program into R scripts:
Subscripts in R when adding other text
How to literally print superscripts in R not used in labels or legends?
Using Subscripts and Superscripts in R console
Unicode subscript in R had some pointers that were helpful. I can get the appropriate code from this link, but it doesn't allow me to programmatically create the code for the character I want.
CODE
Here's a simple example equation for combustion of methane that works:
> sub2 <- '\u2082' # hard-coding unicode for '2' as a subscript
> sub4 <- '\u2084' # hard-coding unicode for '4' as a subscript
> cat(sprintf('CH%s + 2 O%s --> CO%s + 2 H%sO', sub4, sub2, sub2, sub2))
CH₄ + 2 O₂ --> CO₂ + 2 H₂O
Lengthy workaround (proof-of-concept):
desired_subscript <- 3.375
subs <- c('\u2080', '\u2081', '\u2082', '\u2083', '\u2084',
'\u2085', '\u2086', '\u2087', '\u2088', '\u2089')
charvec <- as.character(x = desired_subscript)
lapply(0:9, function(z){
charvec <<- gsub(pattern = z, replacement = subs[z+1], x = charvec)
return(NULL)
})
> cat(charvec)
₃.₃₇₅
Here's what doesn't work:
Replacing the last digit of the Unicode string with the one I want:
> cat(sub(pattern = '2', replacement = '4', x = sub2))
₂
Trying to create a unicode string:
> paste('\208','4',sep = '')
[1] "\02084"
I have multiple equations to balance and the subscripts are not always whole numbers. Is there a way to programmatically get the Unicode subscript I want to include in my output to the console?
Try this
Create a function that returns the Unicode subscript for a single digit (the subscript digits run from U+2080 to U+2089). Caution: there is no error checking.
ss <- function(x) {intToUtf8(0x2080 + x)}
cat(sprintf('CH%s + 2 O%s --> CO%s + 2 H%sO', ss(4), ss(2), ss(2), ss(2)))
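Since the question also needs non-integer subscripts such as 3.375, the single-digit idea can be extended to whole numbers by translating each digit character. A minimal sketch, assuming a UTF-8 capable console; subscript() is a hypothetical helper name, and characters other than 0-9 (such as ".") are left unchanged:

```r
# Convert every digit of a number (or string) to its Unicode subscript.
# The ten subscript digits U+2080..U+2089 are built as one string and
# chartr() maps "0".."9" onto them character by character.
subscript <- function(x) {
  subs <- intToUtf8(0x2080:0x2089)
  chartr("0123456789", subs, as.character(x))
}

cat(sprintf("CH%s + 2 O%s --> CO%s + 2 H%sO\n",
            subscript(4), subscript(2), subscript(2), subscript(2)))
# CH₄ + 2 O₂ --> CO₂ + 2 H₂O
cat(subscript(3.375), "\n", sep = "")
# ₃.₃₇₅
```

Because chartr() is vectorised, subscript(c(2, 4)) converts several numbers at once, which replaces the lapply()/gsub() workaround from the question.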

R baseline package saving plots in a loop

I'm trying to optimize the parameters for baseline() in the R baseline package by changing each parameter in a loop and comparing plots to determine which parameters give me the best baseline.
I currently have the code written so that the loop produces each plot, but I'm having trouble saving the plots, since each object I'm creating has a class specific to the baseline package (which I suspect is the problem here).
library(baseline)
data(milk)  # example spectra shipped with the baseline package
foo <- data.frame(Date=seq.Date(as.Date("1957-01-01"), by = "day",
                                length.out = ncol(milk$spectra)),
                  Visits=milk$spectra[1,],
                  Old_baseline_visits=milk$spectra[1,], row.names = NULL)
foo.t <- t(foo$Visits)
#the lines above were copied from https://stackoverflow.com/questions/37346967/r-packagebaseline-application-to-sample-dataset to make a reproducible dataset
df <- expand.grid(lambda=seq(1,10,1), p=seq(0.01,0.1,0.01))
baselinediff <- list()
for(i in 1:nrow(df)){
  thislambda <- df[i,]$lambda
  thisp <- df[i,]$p
  thisplot <- baseline(foo.t, lambda=thislambda, p=thisp, maxit=20, method='als')
  print(paste0("lambda = ", thislambda))
  print(paste0("p = ", thisp))
  print(paste0("index = ", i))
  baselinediff[[i]] <- plot(thisplot)
  jpeg(file = paste(baselinediff[[i]], '.jpeg', sep = ''))
  dev.off()
}
I know that I would be able to extract corrected spectra using baseline.als but I just want to save the plot images with the red baseline so that I can see how well the baselines are getting drawn. Any baseline users out there that can help?
I suggest you change your loop in the following way:
for(i in 1:nrow(df)){
  thislambda <- df[i,]$lambda
  thisp <- df[i,]$p
  thisplot <- baseline(foo.t, lambda=thislambda, p=thisp, maxit=20, method='als')
  print(paste0("lambda = ", thislambda))
  print(paste0("p = ", thisp))
  print(paste0("index = ", i))
  baselinediff[[i]] <- thisplot
  jpeg(file = paste('baseline', i, '.jpeg', sep = ''))
  plot(baselinediff[[i]])
  dev.off()
}
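Note that this does not try to capture the already plotted element (thisplot) inside the list. Instead, jpeg() opens the file first, plot() then draws into it, and dev.off() closes it. In your version, plot() drew to the screen before jpeg() was called, so no plot was ever drawn on the jpeg device. This solves your export issue. Another problem was the naming of the file: baselinediff[[i]] held the return value of plot(), not a name, and using it inside paste() apparently ends in an error. So I switched to a simpler name built from the loop index. To plot your resulting list, call: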
lapply(baselinediff, plot)
If you are determined to store the already plotted element, the capture.plot function from the imager package might be a good start.
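A variation on the same open-draw-close pattern, sketched under assumptions: save_plot() and plot_one() are hypothetical helpers, plot_one() is a stand-in for plot(baseline(...)) so the example runs without the baseline package, and files go to tempdir(). Naming each file after its lambda and p values makes the plots easier to compare than an index, and on.exit() closes the device even if drawing fails:

```r
# Open a JPEG device, run the drawing function, and always close the device
save_plot <- function(file, draw, width = 800, height = 600) {
  grDevices::jpeg(filename = file, width = width, height = height)
  on.exit(grDevices::dev.off())  # runs even if draw() throws an error
  draw()
  invisible(file)
}

# Hypothetical stand-in for plot(baseline(foo.t, lambda = ..., p = ...))
plot_one <- function(lambda, p) {
  x <- seq(0, 10, length.out = 200)
  plot(x, sin(x) + p * x, type = "l",
       main = sprintf("lambda = %g, p = %g", lambda, p))
}

grid <- expand.grid(lambda = c(1, 2), p = c(0.01, 0.05))
files <- character(nrow(grid))
for (i in seq_len(nrow(grid))) {
  # e.g. "baseline_lambda1_p0.01.jpeg"
  files[i] <- file.path(tempdir(),
                        sprintf("baseline_lambda%g_p%g.jpeg", grid$lambda[i], grid$p[i]))
  save_plot(files[i], function() plot_one(grid$lambda[i], grid$p[i]))
}
```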

Use math symbols in panel titles for stratigraphic plot

I want to include math symbols in the panel titles for this stratigraphic plot:
library(analogue)
data(V12.122)
Depths <- as.numeric(rownames(V12.122))
names(V12.122)
(plt <- Stratiplot(Depths ~ O.univ + G.ruber + G.tenel + G.pacR,
data = V12.122,
type = c("h","l","g"),
zones = 400))
plt
For example, I want to have this text in place of "O.univ" etc.:
I used this code to make that text:
plot(1, type="n", axes=FALSE, ann=FALSE)
title(line = -1, main = expression(phantom()^14*C~years~BP))
title(line = -3, main = expression(delta^18*O))
title(line = -5, main = expression(paste("TP ", mu,"g l"^-1)))
title(line = -10, main = expression("very long title \n with \n line breaks"))
But if I try to update the column names of the data frame passed to Stratiplot, the expressions are not parsed, and we do not get the correct text formatting:
V12.122 <- V12.122[, 1:4]
names(V12.122)[1] <- expression(phantom()^14*C~years~BP)
names(V12.122)[2] <- expression(delta^18*O)
names(V12.122)[3] <- expression(paste("TP ", mu,"g l"^-1))
(plt <- Stratiplot(Depths ~ .,
data = V12.122,
type = c("h","l","g"),
zones = 400))
plt
How can I get Stratiplot to parse the expressions in the colnames and format them correctly in the plot?
I've tried looking through str(plt) to see where the panel titles are stored, but with no success:
text <- expression(phantom()^14*C~years~BP)
plt$condlevels$ind[1] <- text
names(plt$packet.sizes)[1] <- text
names(plt$par.settings$layout.widths$panel)[1] <- text
You can't actually do this in the current release of analogue; the function is doing too much messing around with data for the expressions to remain unevaluated prior to plotting. I could probably figure this out to allow expressions as the names of the data argument object, but it is easier to just allow users to pass a vector of labels that they want for the variables.
This is now implemented in the development version of the package on github, and I'll push this to CRAN early next week.
This change implements a new argument labelValues which takes a vector of labels for use in labelling the top axis. This can be a vector of expressions.
Here is an illustration of the usage:
library("analogue")
set.seed(1)
df <- setNames(data.frame(matrix(rnorm(200 * 3), ncol = 3)),
c("d13C", "d15N", "d18O"))
df <- transform(df, Age = 1:200)
exprs <- expression(delta^{13}*C, # label for 1st variable
delta^{15}*N, # label for 2nd variable
delta^{18}*O) # label for 3rd variable
Stratiplot(Age ~ ., data = df, labelValues = exprs, varTypes = "absolute", type = "h")
Note that this is just a first pass; I'm pretty sure I haven't accounted for any reordering that goes on with sort and svar etc. if they are used.
I've never used lattice plots, but I thought a chance to learn something would be worthwhile. It took me too long to figure out.
library(lattice)  # strip.custom() comes from lattice
labs <- expression(phantom()^14*C~years~BP, delta^18*O)
strip <- strip.custom(factor.levels = labs)
plt <- Stratiplot(Depths ~ .,
                  data = V12.122[, 1:4],
                  type = c("h","l","g"),
                  zones = 400,
                  strip = strip)
Hope this gets you started.
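The mechanism this relies on is plain lattice: strip.custom(factor.levels = ...) replaces the strip text of each panel, matched by position to the levels of the conditioning factor. A self-contained sketch with made-up toy data (not analogue), showing that the expression labels must follow the order of levels(), which is alphabetical by default:

```r
library(lattice)

# Toy data with two panels; factor() sorts the levels as "d13C", "d18O"
toy <- data.frame(x = rep(1:10, 2),
                  y = c(1:10, (1:10)^2),
                  g = factor(rep(c("d18O", "d13C"), each = 10)))
levels(toy$g)  # "d13C" "d18O"

# One plotmath label per level, in the same order as levels(toy$g)
labs <- expression(delta^13*C, delta^18*O)

p <- xyplot(y ~ x | g, data = toy,
            strip = strip.custom(factor.levels = labs))
# print(p) draws the panels with the formatted strip labels
```

If the labels come out attached to the wrong panels, check the level order first.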

Reset graph at the end of the loop: could not find function "device" error

I am trying to generate plots by looping, here is my code:
n <- unique(wide_data$Product.Code)[1:3]
for (i in n)
{
  my.prod2 <- filter(tall_bind, Product.Code == i, Date > ymd("2012/04/01"))
  dev.new()
  mypath <- file.path("C:","R","SAVEHERE",paste("myplot_", i, ".jpg", sep = ""))
  jpeg(file=mypath)
  mytitle = paste("Plot for product", i)
  p <- qplot(Date, Sold, data = my.prod2, geom = "line", main=mytitle, group = Model, colour = Model) + facet_grid(Model ~ .)
  ggsave("myplot_", i, plot=p, device= "jpg" )
}
I get the following error for the above code:
Saving 6.67 x 6.67 in image
Error in ggsave("myplot_", i, plot = p, device = "jpg") : could not find function "device"
Earlier, when I used dev.off() at the end of the loop, I found that although the graph files were being generated, they were totally blank.
Could someone please help me understand where is the mistake in my code?
You can leave out the dev.new() and jpeg() commands, and your arguments to ggsave() are also incorrect. This should work:
n <- unique(wide_data$Product.Code)[1:3]
for (i in n) {
  my.prod2 <- filter(tall_bind, Product.Code == i, Date > ymd("2012/04/01"))
  mypath <- file.path("C:","R","SAVEHERE",paste("myplot_", i, ".jpg", sep = ""))
  mytitle = paste("Plot for product", i)
  p <- qplot(Date, Sold, data = my.prod2, geom = "line", main=mytitle, group = Model, colour = Model) + facet_grid(Model ~ .)
  ggsave(filename = mypath, plot = p)
}
What you did was create a new default graphics device (typically a plotting window), then a jpeg graphics device (i.e. a file). Then you tried to make ggplot2 plot directly to a file using ggsave, i.e. using its own (jpg) device, and not either of the two graphics devices you created.
The error, however, was because you gave ggsave the wrong arguments: the file name should be a single string such as mypath, whereas your call passed "myplot_" and i as separate arguments. But even with the right arguments, you would still have ended up with extra unused graphics windows and files from the dev.new() and jpeg() commands. I suggest some extra reading of the help (e.g. type ?ggsave at the R console).
Typically, when using ggplot2 you do not need to worry about dev.new, jpeg and the like. qplot or ggplot and ggsave should do all you need.
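If you do want to keep the jpeg()/dev.off() pattern, the plot has to be printed explicitly. Inside a for loop, objects are not auto-printed, so a ggplot object is never drawn on the open device; this is the same issue R FAQ 7.22 describes for lattice graphics, and it would explain the blank files mentioned in the question. A minimal sketch using the built-in mtcars data as a stand-in for tall_bind, writing to tempdir():

```r
library(ggplot2)

files <- character(0)
for (k in sort(unique(mtcars$cyl))) {
  p <- ggplot(mtcars[mtcars$cyl == k, ], aes(wt, mpg)) +
    geom_point() +
    ggtitle(paste("Cylinders:", k))
  f <- file.path(tempdir(), paste0("myplot_", k, ".jpg"))
  jpeg(filename = f)
  print(p)   # explicit print: ggplot objects are not auto-printed inside a loop
  dev.off()  # close the file so the image is written
  files <- c(files, f)
}
```

ggsave() does the opening, printing and closing for you, which is why the answer's version needs none of this.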
