Render unicode emoji in colour in ggplot2 geom_text - r

I have unicode text that includes emoji. I'd like to render them in a ggplot2 graphic with geom_text or geom_label in a way that includes the emoji's colour. I've looked at emojifont, emo and ggtext and none of these seem to allow this. The issue of course is that the colour of the text in geom_text is governed by the colour aesthetic. Is there any way I can get colours rendered in my text, either through geom_text or some other workaround?
Reproducible example:
library(ggplot2)
pets <- "I like 🐶 🐱 🐟 🐢"
cat(pets)
ggplot() +
theme_void() +
annotate("text", x = 1, y = 1, label = pets, size = 15)
The cat(pets) works on screen in RStudio, but the graphic drawn with the last line looks like this:
Alternatively, with ggtext::geom_richtext() I get a similar black and white result and these warning messages:
> library(ggtext)
> ggplot() +
+ theme_void() +
+ annotate("richtext", x = 1, y = 1, label = pets, size = 15)
Warning messages:
1: In text_info(label, fontkey, fontfamily, fontface, fontsize, cache) :
unable to translate '<U+0001F436>RStudioGD142.6791338582677' to native encoding
2: In text_info(label, fontkey, fontfamily, fontface, fontsize, cache) :
unable to translate '<U+0001F431>RStudioGD142.6791338582677' to native encoding
3: In text_info(label, fontkey, fontfamily, fontface, fontsize, cache) :
unable to translate '<U+0001F41F>RStudioGD142.6791338582677' to native encoding
4: In text_info(label, fontkey, fontfamily, fontface, fontsize, cache) :
unable to translate '<U+0001F422>RStudioGD142.6791338582677' to native encoding
5: In do.call(gList, grobs) :
unable to translate 'I like <U+0001F436> <U+0001F431> <U+0001F41F> <U+0001F422>' to native encoding

OK, here's an answer to my own question.
Overall approach: we convert each emoji to an HTML image tag pointing at an image of the emoji, and use ggtext to render the resulting combination of text and images.
First we need a vector of all emoji so down the track we will be able to recognise them:
library(tidyverse)
library(ggtext)
library(rvest)
# test vector
pets <- "I like 🐶 🐱 🐟 🐢"
# the definitive web page with emoji:
unicode <- read_html("https://unicode.org/emoji/charts/full-emoji-list.html")
ut <- unicode %>%
html_node("table") %>%
html_table()
# vector of all emoji - purely for recognition purposes
# ([[3]] returns a plain character vector whether html_table() gives a data frame or a tibble)
all_emoji <- ut[[3]]
Then I borrow with virtually no alteration several functions from this page by Emil Hvitfeldt. Emil had a similar challenge to me, but without the problem of the original emoji just being text.
emoji_to_link <- function(x) {
paste0("https://emojipedia.org/emoji/",x) %>%
xml2::read_html() %>%
rvest::html_nodes("tr td a") %>%
.[1] %>%
rvest::html_attr("href") %>%
paste0("https://emojipedia.org/", .) %>%
xml2::read_html() %>%
rvest::html_node('div[class="vendor-image"] img') %>%
rvest::html_attr("src")
}
link_to_img <- function(x, size = 24) {
paste0("<img src='", x, "' width='", size, "'/>")
}
Those functions take an emoji and convert it into a hyperlink to an image of the emoji as rendered by the Apple Color Emoji font. So far so good, but I need to extract the emoji from my mixed text in the first place. To do this I wrote two more functions:
to convert an individual token (where a token might be an individual emoji) into an image tag, or return it as unchanged text; and
to tokenize a text string, convert any emoji tokens to images, and then paste them all back together again.
Here are those two functions:
token_to_rt <- function(x){
if(x %in% all_emoji){
y <- link_to_img(emoji_to_link(x))
} else {
y <- x
}
return(y)
}
string_to_rt <- function(x){
tokens <- str_split(x, " ", simplify = FALSE)[[1]]
y <- lapply(tokens, token_to_rt)
z <- do.call(paste, y)
return(z)
}
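As a variation on the two functions above, here is a minimal sketch that looks emoji up in a small local table instead of scraping a web page for every token. The names emoji_img, token_to_rt_cached and string_to_rt_cached are hypothetical, and the example.org URLs are placeholders rather than real image files; in practice the table could be filled once with the results of emoji_to_link().

```r
# Hypothetical lookup table: emoji -> image URL (placeholder URLs, not real files)
emoji_img <- setNames(
  c("https://example.org/dog.png", "https://example.org/cat.png"),
  c("\U0001f436", "\U0001f431")
)

# Convert one token to an <img> tag if it is a known emoji, else return it unchanged
token_to_rt_cached <- function(token, lookup = emoji_img, size = 24) {
  if (token %in% names(lookup)) {
    paste0("<img src='", lookup[[token]], "' width='", size, "'/>")
  } else {
    token
  }
}

# Split on spaces, convert each token, and paste the pieces back together
string_to_rt_cached <- function(x, lookup = emoji_img, size = 24) {
  tokens <- strsplit(x, " ", fixed = TRUE)[[1]]
  converted <- vapply(tokens, token_to_rt_cached, character(1),
                      lookup = lookup, size = size)
  paste(converted, collapse = " ")
}

string_to_rt_cached("I like \U0001f436")
```

Like the original, this only recognises emoji that are separated from the surrounding text by spaces.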
Now we have everything we need. First I convert my pets vector into pets2, then I can use ggplot2 and ggtext to render it on screen, in glorious colour:
pets2 <- string_to_rt(pets)
ggplot() +
theme_void() +
annotate("richtext", x = 1, y = 1, label = pets2, size = 15)
There we are:
For completeness, here's how the key objects pets, pets2 and all_emoji look when just printed in the R console:
> pets
[1] "I like \U0001f436 \U0001f431 \U0001f41f \U0001f422"
> pets2
[1] "I like <img src='https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/237/dog-face_1f436.png' width='24'/> <img src='https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/237/cat-face_1f431.png' width='24'/> <img src='https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/237/fish_1f41f.png' width='24'/> <img src='https://emojipedia-us.s3.dualstack.us-west-1.amazonaws.com/thumbs/120/apple/237/turtle_1f422.png' width='24'/>"
> all_emoji[1:10]
[1] "face-smiling" "Browser" "\U0001f600" "\U0001f603" "\U0001f604" "\U0001f601"
[7] "\U0001f606" "\U0001f605" "\U0001f923" "\U0001f602"

Related

Plotting a tree - collapsing a vector of nodes

I am trying to plot a large tree using ggtree, but, due to its size, I would like to collapse multiple nodes. I am following a tutorial, but it collapses the nodes one at a time, and this is not an option in my case.
Here is my code:
library(ggtree)
library(ape)
library(ggplot2)
library(colorspace)
library(Biostrings)
library(phytools)
library(treeio)
library(dplyr)
library(readr)
library(tidyr)
library(reshape2)
tempnwk<- "((('clade01_1':1.35E-4,('clade01_2':1.0E-6,'clade01_3':1.0E-6):3.3E-5):3.3E-5,('clade02_1':2.7E-4,'clade02_2':3.3E-5):3.3E-5):1.0E-6,'clade03_1':1.0E-6);"
testTree0 <- read.tree(text = tempnwk)
#
testcollapse0<- ggtree(testTree0)
#Now, this works:
#
testcollapse0b<- testcollapse0 %>% collapse(node = 10) +
geom_point2(aes(subset=(node==10)),
shape=21, size=5, fill='green')
testcollapse0b<- collapse(testcollapse0b, node = 11) +
geom_point2(aes(subset=(node==11)),
shape=21, size=5, fill='red')
testcollapse0b ####This works
#
#
##############This does not:
nodes2go<- c(10, 11)
myTestCols<- c('green', 'red')
testcollapse1<- testcollapse0
for(i in 1:2) {
testcollapse1<- collapse(
testcollapse1, node = nodes2go[i]) +
geom_point2(
aes(subset=(node==i)), shape=23,
size=7, fill=myTestCols[i])
}
rm(i)
#
testcollapse1 + geom_text(aes(label=label))
#
#Error in FUN(X[[i]], ...) : object 'i' not found
I need some help, I am not sure how to fix it. I had a look at drop.tip, but I am not sure that is what I want, since I still want a colored dot where the collapsed node is.
I am looking forward to your feedback, thank you for your kind attention.
Well,
While waiting for a sane way to do it, quick and dirty will do the job:
myTestCols2<- c("'green'", "'red'")
testcollapse2<- testcollapse0
teststring0<- "testcollapse2<- collapse(testcollapse2, node=NODE) + geom_point2(aes(subset=(node==NODE)), shape=23, size=7, fill=COLOR);"
testString2<- character()
for(i in 1:2) {
indString<- gsub(
pattern = "NODE",replacement = nodes2go[i],
x = teststring0)
indString<- gsub(
pattern = "COLOR", replacement = myTestCols2[i],
x = indString)
testString2<- c(testString2, indString)
}
rm(i, indString)
#
#Run the command
eval(parse(text = testString2))
##And now plot:
testcollapse2
And yes, I am aware that there must be a better way to do it 🙄
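One possible alternative to eval(parse()) is sketched below. It assumes that the loop fails because aes() stores node==i unevaluated, so i is only looked up when the plot is drawn, after rm(i) has removed it. Even if i still existed, the comparison would be against 1 and 2 rather than nodes 10 and 11. The sketch uses the !! operator that ggplot2's aes() supports to insert the node number at the time the layer is created. The name p_collapsed is hypothetical, and this has not been checked against every ggtree version.

```r
library(ggtree)
library(ape)
library(ggplot2)

tempnwk <- "((('clade01_1':1.35E-4,('clade01_2':1.0E-6,'clade01_3':1.0E-6):3.3E-5):3.3E-5,('clade02_1':2.7E-4,'clade02_2':3.3E-5):3.3E-5):1.0E-6,'clade03_1':1.0E-6);"
testTree0 <- read.tree(text = tempnwk)

nodes2go   <- c(10, 11)
myTestCols <- c("green", "red")

p_collapsed <- ggtree(testTree0)
for (k in seq_along(nodes2go)) {
  n <- nodes2go[k]
  # !!n inserts the current value of n into the aesthetic, so the layer
  # no longer depends on a loop variable that may be gone at draw time
  p_collapsed <- ggtree::collapse(p_collapsed, node = n) +
    geom_point2(aes(subset = (node == !!n)),
                shape = 23, size = 7, fill = myTestCols[k])
}

p_collapsed
```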

Change <f6> to Swedish characters and related ggplot geom_bar issues in R

I have two issues related to Swedish characters. I am fetching data directly from an MS SQL database.
1. Could anyone give me a hint on how to change <f6> and the like back to Swedish characters in R?
I used write.csv to write the data out to a CSV file, then copied and pasted the strings here to build the df as follows:
library(tidyverse)
library(ggplot2)
library(scales)
c <- c("c","u","m","j","c","u","m","j","c","u","m","j")
city <- c("G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping","G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping","G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping")
priority <- c(1,1,1,1,0,0,0,0,2,3,3,2)
n_cust <- sample(50:1000, 12, replace=T)
df <- data.frame(c,city,priority,n_cust)
<f6> should be ö and <e5> should be å.
Interestingly, if I use the following code:
dpri %>% group_by(kommun, artikel_prioritet) %>%
summarise(n_cust=n_distinct(kund_id),
sum_sales=sum(p_sum_adj_sale),
avg_margin=mean(pp_avg_margin),
avg_pec_sales=mean(p_pec_sales)) %>%
arrange(desc(sum_sales)) %>%
head(20)%>%
ggplot(aes(x=reorder(kommun, sum_sales), y=sum_sales,
fill=factor(artikel_prioritet))) +
geom_bar(stat='identity')+
coord_flip()+
scale_y_continuous(labels = comma)+
facet_grid(.~ factor(artikel_prioritet), scales = "free")+
theme(legend.position="none")
I get this error:
Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
invalid input 'Göteborg' in 'utf8towcs'
If I first put the head(20) result into a variable ci, then use ggplot to plot ci:
ggplot(ci,aes(x=reorder(kommun, sum_sales), y=sum_sales,
fill=factor(artikel_prioritet))) + geom_bar(stat='identity')+
coord_flip()+ scale_y_continuous(labels = comma)+ facet_grid(.~
factor(artikel_prioritet), scales = "free")+
theme(legend.position="none")
I get a bar chart without any city labels.
When I print ci, I get the following:
Then I write the head(20) result to a CSV file 'cityname.csv', read it back into R with read.csv,
and use the same code to draw the bar chart:
ci <- read.csv("cityname.csv")
ggplot(ci,aes(x=reorder(kommun, sum_sales), y=sum_sales,
fill=factor(artikel_prioritet))) + geom_bar(stat='identity')+
coord_flip()+ scale_y_continuous(labels = comma)+ facet_grid(.~
factor(artikel_prioritet), scales = "free")+
theme(legend.position="none")
I get the following chart:
This time the city labels are visible, but the Swedish characters appear as codes such as <f6>.
I hope to get some suggestions on how to fix the Swedish strings. Is there also a way to fix the bar chart without writing to CSV and reading it back in?
Thank you!
I believe your issue is that R doesn't know how to interpret your character encoding. Try \u notation instead of <>. A \u escape names a Unicode code point, and R marks the resulting strings as UTF-8:
> city <- c("G\u00f6teborg", "Ume\u00e5", "Malm\u00f6", "J\u00f6nk\u00f6ping","G\u00f6teborg", "Ume\u00e5", "Malm\u00f6", "J\u00f6nk\u00f6ping","G\u00f6teborg", "Ume\u00e5", "Malm\u00f6", "J\u00f6nk\u00f6ping")
> Encoding(city)
[1] "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8"
> head(city)
[1] "Göteborg" "Umeå" "Malmö" "Jönköping" "Göteborg" "Umeå"
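If the strings coming from the database are actually stored as Latin-1 bytes (an assumption; the question does not say which encoding the server uses), base R's iconv() can convert them to UTF-8 directly. A minimal sketch, building a Latin-1 string by hand to stand in for the database value:

```r
# Bytes of "Göteborg" in Latin-1; 0xf6 is the single byte for "ö"
latin1_bytes <- as.raw(c(0x47, 0xf6, 0x74, 0x65, 0x62, 0x6f, 0x72, 0x67))
city_raw <- rawToChar(latin1_bytes)   # encoding unknown to R at this point

# Tell R how to read the bytes and convert them to UTF-8
city_utf8 <- iconv(city_raw, from = "latin1", to = "UTF-8")

Encoding(city_utf8)
city_utf8 == "G\u00f6teborg"
```

Applied to a data frame column, this would look like df$city <- iconv(df$city, from = "latin1", to = "UTF-8"), again assuming the source really is Latin-1.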
EDIT:
You asked a good follow up question about how to make this replacement programmatically. I have provided a solution for that as well below, using the tidyverse packages dplyr and stringr
> city <- c("G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping","G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping","G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping")
> city_df <- as.data.frame(city)
> special_character_replacements <- c("<f6>" = "\\u00f6", "<e5>" = "\\u00e5")
> city_df %>%
dplyr::mutate(city_fixed =
stringr::str_replace_all(city, special_character_replacements))
city city_fixed
1 G<f6>teborg Göteborg
2 Ume<e5> Umeå
3 Malm<f6> Malmö
4 J<f6>nk<f6>ping Jönköping
5 G<f6>teborg Göteborg
6 Ume<e5> Umeå
7 Malm<f6> Malmö
8 J<f6>nk<f6>ping Jönköping
9 G<f6>teborg Göteborg
10 Ume<e5> Umeå
11 Malm<f6> Malmö
12 J<f6>nk<f6>ping Jönköping
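The replacement table above needs one entry per character. A more general sketch in base R converts any two-digit <xx> code in one pass, treating the hex value as a Unicode code point. This assumes the codes came from Latin-1 text, where byte values and code points coincide, and that the strings contain no genuine literal text of the form <f6>. The function name fix_angle_escapes is hypothetical.

```r
fix_angle_escapes <- function(x) {
  # Locate every <xx> sequence where xx is two hex digits
  m <- gregexpr("<[0-9a-fA-F]{2}>", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    # "<f6>" -> "f6" -> 246 -> "ö"
    codes <- strtoi(substr(hits, 2, 3), base = 16L)
    vapply(codes, intToUtf8, character(1))
  })
  x
}

fix_angle_escapes(c("G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping"))
```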

Unable to add Greek/Math/Expression Split labels using rpart.plot

I'm attempting to plot an rpart tree where I'd like to change some of the split labels to their greek/math equivalent. For instance, I have a column named mu -- I'd like this to show up as the greek letter $\mu$.
Unfortunately, when I replace one of the labels, it results in the error "Error in strsplit(labs, "\n\n") : non-character argument". As I'm not using strsplit, this error must be coming from rpart.plot call where it is assuming the labels are all plain text. This is my code:
split.fun <- function(x, labs, digits, varlen, faclen)
{
for(i in 1:length(labs)) {
if(substring(labs[i],0,2)=="mu"){
#labs[i] <- bquote(mu ~ .(substring(labs[i],3)))
labs[i] <- expression(paste0(mu,substring(labs[i],3)))
}
print(labs[i])
}
labs
}
data$dv <- factor(data$dv, labels = c("No", "Yes"))
fit <- rpart(dv ~ n + alpha + dev + mu, method="class", data=data)
rpart.plot(fit, yesno=2, box.palette = 0, extra=100, under = TRUE, split.fun = split.fun)
Neither the "expression" approach nor the "bquote" approach works. However, the split.fun function works fine as long as I just replace substrings with other strings (not expressions).
In trying to figure out what's going on, I've also been printing out the resulting labels. This is what I get:
[1] "root"
[1] "dev >= 0.075"
expression(paste0(mu, substring(labs[i], 3)))
expression(paste0(mu, substring(labs[i], 3)))
expression("alpha < 0.025")
expression("alpha >= 0.025")
expression("dev < 0.075")
expression("alpha < 0.025")
expression("dev >= 0.025")
expression(paste0(mu, substring(labs[i], 3)))
expression(paste0(mu, substring(labs[i], 3)))
expression("dev < 0.025")
expression("alpha >= 0.025")
From this, it seems that once I replace one label with an expression, all other labels are replaced with an expression.
Is there another approach to placing greek letters on the rpart.plot? Or is rpart.plot (or prp in general), simply not capable of including math expressions?
A combination of @G5W's suggestion and fonts works. For those trying to do this, add the following to the top of the file:
library(extrafont)
loadfonts()
Then adjust the rpart.plot call to use "Arial Unicode MS". This font seems to always correctly display math Unicode characters (including combining characters).
rpart.plot(fit, yesno=2, box.palette = 0, extra=100, under = TRUE, split.fun = split.fun, split.font=1, split.family="Arial Unicode MS", family="Arial Unicode MS")
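The suggestion itself is not quoted in this thread. Assuming it amounts to keeping the labels as plain strings and substituting the Unicode character for the variable name, a split function along these lines would fit the call above. The name split_fun_unicode is hypothetical, and the signature is the one used in the question.

```r
# Replace a leading "mu" in each split label with the Greek letter as a
# Unicode character, so labs stays a character vector (no expressions)
split_fun_unicode <- function(x, labs, digits, varlen, faclen) {
  sub("^mu", "\u03bc", labs)
}

split_fun_unicode(NULL, c("root", "mu >= 0.5", "dev < 0.075"), 2, 0, 0)
```

It would be passed as split.fun = split_fun_unicode in the rpart.plot call, together with a font that contains the Greek glyphs.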

Using R to read out excel-colorinfo

Is there any way to read out the color-index of cells from excel files with R?
While I can set the cell color with packages like XLConnect or XLSX, I have found no way to extract the color-information from existing workbooks.
R-Bloggers provided a function that will do the job for you. I am including the answer here for future reference.
Read the excel file using xlsx package:
library(xlsx)
wb <- loadWorkbook("test.xlsx")
sheet1 <- getSheets(wb)[[1]]
# get all rows
rows <- getRows(sheet1)
cells <- getCells(rows)
This part extracts the information that later will be used for getting background color (or other style information) of the cells:
styles <- sapply(cells, getCellStyle) #This will get the styles
This is the function that identifies/extracts the cell background color:
cellColor <- function(style)
{
fg <- style$getFillForegroundXSSFColor()
rgb <- tryCatch(fg$getRgb(), error = function(e) NULL)
rgb <- paste(rgb, collapse = "")
return(rgb)
}
The error handler in tryCatch takes care of cells with no background color.
Using sapply you can get the background color for all of the cells:
sapply(styles, cellColor)
You can also categorize/identify them by knowing the RGB codes:
mycolor <- list(green = "00ff00", red = "ff0000")
m <- match(sapply(styles, cellColor), mycolor)
labs <-names(mycolor)[m]
You can read more and learn how to apply it at R-bloggers
You can get the RGB codes from RapidTables.com
Old question but maybe it can help someone in the future.
There is a strange behavior in the POI (java) library (at least on my computer). It is not getting the colors correctly. The code provided in @M--'s answer works well when the color is a basic color (indexed color), but does not work when the color is, for example, in grayscale. To get around this you can use the following code, which uses the getTint() function. Tint is a number between -1 (dark) and 1 (light), and combining it with the RGB value from getRgb(), you can completely recover the color.
cell_color <- function(style){
fg <- style$getFillForegroundXSSFColor()
hex <- tryCatch(fg$getRgb(), error = function(e) NULL)
hex <- paste0("#", paste(hex, collapse = ""))
tint <- tryCatch(fg$getTint(), error = function(e) NULL)
if(!is.null(tint) && !is.null(hex)){ # Tint varies between -1 (dark) and 1 (light)
rgb_col <- col2rgb(col = hex)
if(tint < 0) rgb_col <- (1-abs(tint))*rgb_col
if(tint > 0) rgb_col <- rgb_col + (255-rgb_col)*tint
hex <- rgb(red = rgb_col[1, 1],
green = rgb_col[2, 1],
blue = rgb_col[3, 1],
maxColorValue = 255)
}
return(hex)
}
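The tint arithmetic in cell_color() can be tried on its own, without a workbook or the Java objects. The sketch below pulls out just that step; apply_tint is a hypothetical helper name and the input colors are made-up examples.

```r
# Apply an Excel-style tint (-1 = fully dark, 1 = fully light) to a hex color
apply_tint <- function(hex, tint) {
  rgb_col <- col2rgb(hex)  # 3 x 1 matrix with rows red, green, blue
  if (tint < 0) rgb_col <- (1 - abs(tint)) * rgb_col          # scale towards black
  if (tint > 0) rgb_col <- rgb_col + (255 - rgb_col) * tint   # move towards white
  rgb(rgb_col[1, 1], rgb_col[2, 1], rgb_col[3, 1], maxColorValue = 255)
}

apply_tint("#336699", 0)    # unchanged
apply_tint("#000000", 1)    # fully lightened
apply_tint("#FFFFFF", -1)   # fully darkened
```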
Some references to help:
https://poi.apache.org/apidocs/dev/org/apache/poi/hssf/usermodel/HSSFExtendedColor.html#getTint--
https://bz.apache.org/bugzilla/show_bug.cgi?id=50787
Getting Excel fill colors using Apache POI

rPlot tooltip problems

I have a simple example using tooltips with rCharts that doesn't seem to work:
set.seed(1)
test <- data.frame(x = rnorm(100), y = rnorm(100))
rPlot(y ~ x, data = test,
type = 'point',
tooltip = "function(item){return item.x + '\n' + item.name + '\n' + item.y}")
An empty page comes up. The plot is there if I remove the tooltip option. I'm using rCharts_0.4.1, R Under development on x86_64-apple-darwin10.8.0 (64-bit) and version 31.0.1650.63 of Chrome.
Bonus question! Can tooltips contain variables in the data set but not used in x, y, etc? I have a large data set and I'd like to annotate the data points with an ID variable that has a unique value per row.
Thanks,
Max
rCharts 0.4.2
I can't point out where I've seen this before, but the best I can offer is that the current instruction is to wrap your js function like so:
"#!function(item) { return item.x }!#"
set.seed(1)
test <- data.frame(x = rnorm(100), y = rnorm(100), id = 1:100)
p <- rPlot(y ~ x, data = test,
type = 'point',
tooltip = "#!function(item){ return 'x: ' + item.x +
' y: ' + item.y + ' id: ' + item.id }!#")
I could never get the tips on different lines. You can see the problem by typing p$show('inline') and including line breaks where you want them when creating the tooltip. Mustache is converting the linebreaks, which means they disappear in the resulting JS function and cause the tooltip function to span several lines - causing an error. I've tried to escape the newline character and append each string containing a newline character with .replace('\\n', '\n'), but that obviously results in the same problem.
For the time being, your best bet is to make the plot as shown here, type p$save('filepath.html') and manually add the line breaks. Otherwise, the question is how to pass a literal newline character to mustache.
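Since line breaks inside the tooltip string are what cause trouble, one way to keep the JavaScript on a single line while still listing several columns is to build the string in R. This is only a sketch: make_tooltip is a hypothetical helper, it relies on the "#! ... !#" wrapper described above, and it assumes every column name is a valid JavaScript identifier.

```r
# Build a single-line tooltip function string from a vector of column names
make_tooltip <- function(fields) {
  parts <- sprintf("'%s: ' + item.%s", fields, fields)
  body <- paste(parts, collapse = " + ' ' + ")
  paste0("#!function(item){ return ", body, " }!#")
}

make_tooltip(c("x", "y", "id"))
```

The result can be passed as the tooltip argument of rPlot, for example tooltip = make_tooltip(c("x", "y", "id")).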
Your code works/worked using rCharts 0.3.51 (Chrome 31.0.1650.63 m, Win7 x64):
set.seed(1)
require(rCharts)
test <- data.frame(x = rnorm(100), y = rnorm(100),
name=replicate(100, paste(sample(letters, 5, rep=T), collapse="")))
rPlot(y ~ x, data = test,
type = 'point',
tooltip = "function(item){return item.x + '\n' + item.name + '\n' + item.y}")
You should be able to reference all columns in test - as I did with name.
