Print text with subscripts (programmatically) to R console

I'm using R to balance some complex chemical equations and would like to print these equations including subscripts to the console as the code runs. I've seen some answers posted, most of which are related to plots or rely on pasting the subscript from another program into R scripts:
Subscripts in R when adding other text
How to literally print superscripts in R not used in labels or legends?
Using Subscripts and Superscripts in R console
Unicode subscript in R had some helpful pointers. I can get the appropriate code from this link, but it doesn't let me programmatically create the code for the character I want.
CODE
Here's a simple example equation for combustion of methane that works:
> sub2 <- '\u2082' # hard-coding unicode for '2' as a subscript
> sub4 <- '\u2084' # hard-coding unicode for '4' as a subscript
> cat(sprintf('CH%s + 2 O%s --> CO%s + 2 H%sO', sub4, sub2, sub2, sub2))
CH₄ + 2 O₂ --> CO₂ + 2 H₂O
Lengthy workaround (proof-of-concept):
desired_subscript <- 3.375
subs <- c('\u2080', '\u2081', '\u2082', '\u2083', '\u2084',
          '\u2085', '\u2086', '\u2087', '\u2088', '\u2089')
charvec <- as.character(x = desired_subscript)
lapply(0:9, function(z){
  charvec <<- gsub(pattern = z, replacement = subs[z+1], x = charvec)
  return(NULL)
})
> cat(charvec)
₃.₃₇₅
Here's what doesn't work:
Replacing the last digit of the Unicode string with the one I want:
> cat(sub(pattern = '2', replacement = '4', x = sub2))
₂
Trying to create a Unicode string:
> paste('\208','4',sep = '')
[1] "\02084"
I have multiple equations to balance, and the subscripts are not always whole numbers. Is there a way to programmatically get the Unicode subscript character I want to include in my output to the console?

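Try this:
Create a function that returns the Unicode subscript character for a digit. Caution: there is no error checking, so it only gives the right result for single digits 0-9.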
ss <- function(x) {intToUtf8(0x2080 + x)}
cat(sprintf('CH%s + 2 O%s --> CO%s + 2 H%sO', ss(4), ss(2), ss(2), ss(2)))
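Because the question also asks for subscripts that are not whole numbers, here is a minimal sketch that applies the same 0x2080 offset to every digit of a number. The name `subscript()` is hypothetical. Characters that are not digits, such as the decimal point (Unicode has no subscript full stop), are passed through unchanged.

```r
# Hypothetical helper: convert every digit in x to its Unicode subscript,
# using the same 0x2080 offset as ss() above. Non-digits are left as-is.
subscript <- function(x) {
  vapply(as.character(x), function(s) {
    cp <- utf8ToInt(s)                     # code points of the string
    is_digit <- cp >= 0x30 & cp <= 0x39    # ASCII "0".."9"
    cp[is_digit] <- cp[is_digit] - 0x30 + 0x2080
    intToUtf8(cp)
  }, character(1), USE.NAMES = FALSE)
}

cat(subscript(3.375), "\n")
cat(sprintf("CH%s + 2 O%s --> CO%s + 2 H%sO\n",
            subscript(4), subscript(2), subscript(2), subscript(2)))
```

Passing a character string instead of a number, e.g. `subscript("0.50")`, avoids surprises from how R formats doubles.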

Related

Generate full width character string in R

In R, how can I do the following:
convert this string: "my test string"
to something like this (a full-width character string): "ｍｙ　ｔｅｓｔ　ｓｔｒｉｎｇ"
Is there a way to do this through hexadecimal character encodings?
Thanks for your help; I'm really not sure how to even start. Perhaps something with {stringr}?
I'm trying to get an output similar to what I would expect from this online conversion tool:
http://www.linkstrasse.de/en/%EF%BD%86%EF%BD%95%EF%BD%8C%EF%BD%8C%EF%BD%97%EF%BD%89%EF%BD%84%EF%BD%94%EF%BD%88%EF%BC%8D%EF%BD%83%EF%BD%8F%EF%BD%8E%EF%BD%96%EF%BD%85%EF%BD%92%EF%BD%94%EF%BD%85%EF%BD%92
Here is a possible solution using a function from the archived Nippon package. This is the han2zen function, which can be found here.
x <- "my test string"
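han2zen <- function(s){
  stopifnot(is.character(s))
  # Full-width digits, upper-case letters and lower-case letters
  zenEisu <- paste0(intToUtf8(65295 + 1:10), intToUtf8(65312 + 1:26),
                    intToUtf8(65344 + 1:26))
  # Full-width symbols, in the same order as the ASCII symbols below
  zenKigo <- c(65281, 65283, 65284, 65285, 65286, 65290, 65291,
               65292, 12540, 65294, 65295, 65306, 65307, 65308,
               65309, 65310, 65311, 65312, 65342, 65343, 65372,
               65374)
  s <- chartr("0-9A-Za-z", zenEisu, s)
  # 65312 is the full-width "@", so the 18th ASCII symbol must be "@", not a second "#"
  s <- chartr('!#$%&*+,-./:;<=>?@^_|~', intToUtf8(zenKigo), s)
  # Half-width space -> ideographic space (U+3000)
  s <- gsub(" ", intToUtf8(12288), s)
  return(s)
}
han2zen(x)
# [1] "ｍｙ　ｔｅｓｔ　ｓｔｒｉｎｇ"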
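On the hexadecimal question: the full-width forms U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from the printable ASCII characters U+0021 to U+007E, so the conversion can also be done arithmetically on code points. A minimal sketch (`to_fullwidth()` is a hypothetical name). Unlike han2zen, it maps every printable ASCII character uniformly, so for example `-` becomes U+FF0D rather than U+30FC.

```r
# Hypothetical helper: convert printable ASCII to full-width forms by adding
# 0xFEE0 to each code point; spaces become the ideographic space U+3000.
to_fullwidth <- function(s) {
  vapply(s, function(one) {
    cp <- utf8ToInt(one)
    printable <- cp >= 0x21 & cp <= 0x7E
    cp[printable] <- cp[printable] + 0xFEE0
    cp[cp == 0x20] <- 0x3000
    intToUtf8(cp)
  }, character(1), USE.NAMES = FALSE)
}

to_fullwidth("my test string")
```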

K-mer words in R

I am still new to R programming and have no idea how to translate the Python code below into R.
human_data is a data frame read from a CSV file, and its sequence column holds strings of letters. Basically, I want to convert each sequence string into a 'words' column containing all possible k-mer words of length 6.
def getKmers(sequence, size=6):
    return [sequence[x:x+size] for x in range(len(sequence) - size + 1)]

human_data['words'] = human_data.apply(lambda x: getKmers(x['sequence']), axis=1)
You could also use the quanteda package to compute the k-mers (character k-grams); the following code shows an example:
library(quanteda)
k = 6 # 6-mers
human_data = data.frame(sequence=c('abcdefghijkl', 'xxxxyyxxyzz'))
human_data$words <- apply(human_data, 1,
  function(x) char_ngrams(unlist(tokens(x['sequence'], 'character')),
                          n=k, concatenator = ''))
human_data
# sequence words
#1 abcdefghijkl abcdef, bcdefg, cdefgh, defghi, efghij, fghijk, ghijkl
#2 xxxxyyxxyzz xxxxyy, xxxyyx, xxyyxx, xyyxxy, yyxxyz, yxxyzz
I hope this helps; it uses only base R commands:
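df <- data.frame(words = c('asfdklajsjahk', 'dkajsadjkfggfh', 'kfjlhdaDDDhlw'),
                 stringsAsFactors = FALSE)
getKmers <- function(sequence, size = 6) {
  kmers <- c()
  # seq_len() runs zero times when the sequence is shorter than `size`
  # (1:(n - size + 1) would wrongly count down to 0 in that case)
  for (x in seq_len(max(0, nchar(sequence) - size + 1))) {
    kmers <- c(kmers, substr(sequence, x, x + size - 1))
  }
  return(kmers)
}
sapply(df$words, getKmers)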
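A vectorized alternative without the loop, and closer to the Python `human_data['words'] = ...` line, is to call `substring()` with vectors of start and end positions and store the per-row results in a list column. A minimal sketch; `get_kmers()` and the toy `human_data` are illustrative, not from the question.

```r
# Hypothetical helper: all overlapping substrings (k-mers) of length `size`.
get_kmers <- function(sequence, size = 6) {
  n <- nchar(sequence) - size + 1
  if (n < 1) return(character(0))   # sequence shorter than size: no k-mers
  starts <- seq_len(n)
  # substring() is vectorized over first/last, so one call returns every k-mer
  substring(sequence, starts, starts + size - 1)
}

human_data <- data.frame(sequence = c("abcdefghijkl", "xxxxyyxxyzz"),
                         stringsAsFactors = FALSE)
# lapply() gives one character vector per row; assigning it creates a list
# column, the closest analogue of a pandas column holding Python lists.
human_data$words <- lapply(human_data$sequence, get_kmers)
str(human_data$words)
```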

Unable to add Greek/Math/Expression Split labels using rpart.plot

I'm attempting to plot an rpart tree where I'd like to change some of the split labels to their Greek/math equivalents. For instance, I have a column named mu, and I'd like this to show up as the Greek letter $\mu$.
Unfortunately, when I replace one of the labels, it results in the error "Error in strsplit(labs, "\n\n") : non-character argument". As I'm not calling strsplit myself, this error must be coming from the rpart.plot call, which assumes the labels are all plain text. This is my code:
split.fun <- function(x, labs, digits, varlen, faclen)
{
  for(i in 1:length(labs)) {
    if(substring(labs[i],0,2)=="mu"){
      #labs[i] <- bquote(mu ~ .(substring(labs[i],3)))
      labs[i] <- expression(paste0(mu,substring(labs[i],3)))
    }
    print(labs[i])
  }
  labs
}
data$dv <- factor(data$dv, labels = c("No", "Yes"))
fit <- rpart(dv ~ n + alpha + dev + mu, method="class", data=data)
rpart.plot(fit, yesno=2, box.palette = 0, extra=100, under = TRUE, split.fun = split.fun)
Neither the "expression" approach nor the "bquote" approach works. However, the split.fun function works fine as long as I only replace substrings with other strings (not expressions).
In trying to figure out what's going on, I've also been printing out the resulting labels. This is what I get:
[1] "root"
[1] "dev >= 0.075"
expression(paste0(mu, substring(labs[i], 3)))
expression(paste0(mu, substring(labs[i], 3)))
expression("alpha < 0.025")
expression("alpha >= 0.025")
expression("dev < 0.075")
expression("alpha < 0.025")
expression("dev >= 0.025")
expression(paste0(mu, substring(labs[i], 3)))
expression(paste0(mu, substring(labs[i], 3)))
expression("dev < 0.025")
expression("alpha >= 0.025")
From this, it seems that once I replace one label with an expression, all the other labels are converted to expressions as well (assigning an expression into a character vector coerces the whole vector to an expression vector).
Is there another approach to placing Greek letters on the rpart.plot? Or is rpart.plot (or prp in general) simply not capable of including math expressions?
A combination of @G5W's suggestion and fonts works. For those trying to do this, add the following to the top of the file:
library(extrafont)
loadfonts()
Then adjust the rpart.plot call to use "Arial Unicode MS". This font seems to always display math Unicode characters correctly (including combining characters).
rpart.plot(fit, yesno=2, box.palette = 0, extra=100, under = TRUE, split.fun = split.fun, split.font=1, split.family="Arial Unicode MS", family="Arial Unicode MS")
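The final split.fun is not shown in the thread. A minimal sketch, assuming @G5W's suggestion was to keep every label a plain character string and insert the Unicode character μ (U+03BC) instead of an expression; it also assumes the labels for this variable start with "mu " as in the question. Pair it with the font arguments in the rpart.plot call above so the character renders.

```r
# Sketch of a split.fun that keeps labs a character vector (no expressions),
# replacing a leading "mu " with the Greek small letter mu, U+03BC.
split.fun <- function(x, labs, digits, varlen, faclen) {
  sub("^mu ", "\u03bc ", labs)
}

# Quick check on made-up sample labels
split.fun(NULL, c("root", "mu < 0.5", "dev >= 0.075"), 2, 0, 0)
```

Because the result stays a character vector, the strsplit() call inside rpart.plot no longer sees a non-character argument.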

error: unexpected input

Consider the following code:
Xij <- scan(n=45)
6398400 6273897 6038777 5810740 5673521 5688332 5669445 5682840 5679432
5723561 5555929 5345696 5321179 5199592 5165409 5130744 5132372
4717909 4925673 4999103 4960733 4840036 4824080 4821902
7115151 7114401 7039423 6967723 6967513 6901684
8203359 8286980 8222974 8323470 8067521
5930080 5862383 5994123 6017566
5558436 5754304 5613530
4595506 5074887
3443322
n <- length(Xij); TT <- trunc(sqrt(2*n))
i <- rep(1:TT,TT:1); j <- sequence(TT:1)
i <- as.factor(i); j <- as.factor(j)
If I now try to run the following command:
Xij.1 <- xtabs(Xij˜i+j)
I get the error 'Error: unexpected input in "Xij.1 <- xtabs(Xij˜"'
This exercise, however, is analogous to an example from the book 'Modern Actuarial Risk Theory using R'.
Does anybody know what might be wrong?
It works fine with the regular tilde:
xtabs(Xij~i+j)
Note that in an R formula you have to use the ASCII tilde ~ (U+007E) rather than the small tilde ˜ (U+02DC). They are two different characters, and the R parser only accepts the first.
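Code copied from a PDF or book often contains look-alike characters like this one. A minimal base R sketch for spotting and replacing them (the variable names are illustrative):

```r
# The line from the question, with the small tilde written as an escape
code_line <- "Xij.1 <- xtabs(Xij\u02dci+j)"

# Any code point above 127 is outside ASCII and is a likely culprit
cps <- utf8ToInt(code_line)
non_ascii <- cps[cps > 127]
print(sprintf("U+%04X", non_ascii))

# Replace the small tilde with the ASCII tilde that formulas require
fixed <- gsub("\u02dc", "~", code_line, fixed = TRUE)
print(fixed)

# The repaired line now parses without "unexpected input"
invisible(parse(text = fixed))
```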

How to use a non-ASCII symbol (e.g. £) in an R package function?

I have a simple function in one of my R packages, with one of the arguments symbol = "£":
formatPound <- function(x, digits = 2, nsmall = 2, symbol = "£"){
  paste(symbol, format(x, digits = digits, nsmall = nsmall))
}
But when running R CMD check, I get this warning:
* checking R files for non-ASCII characters ... WARNING
Found the following files with non-ASCII characters:
formatters.R
It's definitely that £ symbol that causes the problem. If I replace it with a legitimate ASCII character, like $, the warning disappears.
Question: How can I use £ in my function argument without incurring an R CMD check warning?
Looks like "Writing R Extensions" covers this in Section 1.7.1 "Encoding Issues".
One of the recommendations on that page is to use the \uxxxx Unicode escape. Since £ is U+00A3, you can use:
formatPound <- function(x, digits=2, nsmall=2, symbol="\u00A3"){
  paste(symbol, format(x, digits=digits, nsmall=nsmall))
}
formatPound(123.45)
[1] "£ 123.45"
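To get the escape for another symbol without looking it up, the code point can be computed from the character itself. A minimal sketch with a hypothetical helper `to_u_escape()`; if you already depend on stringi, its `stri_escape_unicode()` does a similar job.

```r
# Hypothetical helper: rewrite each non-ASCII character as a \uXXXX escape
# that can be pasted into package source. Only handles code points up to
# U+FFFF, which is what the four-digit \u form can express.
to_u_escape <- function(s) {
  cps <- utf8ToInt(s)
  chars <- intToUtf8(cps, multiple = TRUE)
  non_ascii <- cps > 127
  chars[non_ascii] <- sprintf("\\u%04X", cps[non_ascii])
  paste(chars, collapse = "")
}

# Prints the ASCII-only text \u00A3 123.45
cat(to_u_escape("\u00A3 123.45"), "\n", sep = "")
```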
As a workaround, you can use the intToUtf8() function:
# this triggers the R CMD check warning (non-ASCII characters in the R file)
f <- function(symbol = "➛") symbol
# this also causes problems in Rd files (non-ASCII characters)
f <- function(symbol = "\u279B") symbol
# this is OK
f <- function(symbol = intToUtf8(0x279B)) symbol
