R plots some Unicode characters but not others

Our sysadmin just upgraded our operating system to SLES12SP1. I reinstalled R v3.2.3 and tried to make plots. I use cairo_pdf and try to make a plot whose x-label is \u0298, i.e. the solar symbol, but it doesn't work: the label just comes out blank. For example:
cairo_pdf('Rplots.pdf')
plot(1, xlab='\u0298') # the x-label comes up blank
dev.off()
This used to work, but for some reason it does not anymore. It works with other characters, e.g.
cairo_pdf('Rplots.pdf')
plot(1, xlab='\u2113') # the x-label comes up with the \ell symbol
dev.off()
When I just paste in the solar symbol, i.e.
plot(1, xlab='ʘ')
then I get the warning
Warning messages:
1: In title(...) :
conversion failure on 'ʘ' in 'mbcsToSbcs': dot substituted for <ca>
The machine is German, but I am using the US English UTF-8 locale:
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: SUSE Linux Enterprise Server 12 SP1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
Any tips on how I can get the solar symbol to appear?

Note: I suppose with a new system you should first do:
capabilities()  # check what it reports for "cairo"
Here are a couple of ideas, although one of them requires knowing which fonts you are using, so the output of l10n_info()$MBCS and names(X11Fonts()) might be needed.
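A minimal sketch that gathers this information, plus the cairo capability mentioned above, in one go. Assumptions: X11Fonts() is only defined on Unix-alike builds, so the call is guarded, and pdfFonts() is added for comparison because it exists on every platform.

```r
# Is the session running in a multibyte (e.g. UTF-8) locale?
mbcs <- l10n_info()$MBCS
# Can this build of R use cairo-based devices such as cairo_pdf()?
has_cairo <- capabilities("cairo")
# Font families registered for the X11/cairo devices (only defined on Unix-alikes)
x11_fonts <- if (exists("X11Fonts", mode = "function")) names(X11Fonts()) else character(0)
# Font families registered for the pdf() device (available on every platform)
pdf_fonts <- names(pdfFonts())

mbcs
has_cairo
x11_fonts
```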
Option 1) The Hershey fonts have all the astrological signs as special escape sequences. Page 4 of the output of:
demo(Hershey) # has \\SO as the escape sequence for the "solar" symbol.
Looking at the code for the draw.vf.cell function, we see that it uses the text function to plot those characters, so using it to label an axis requires adding xpd=TRUE to the arguments:
plot(1, xlab="") ; text(1, .45, "\\SO" , vfont=c("serif", "plain"), xpd=TRUE )
Option 2) Find the solar symbol in the font of your choice. You might try setting the font to something other than "Helvetica"; see ?X11, which has a section on Cairo fonts. The examples on the help page for points define a function called TestChars that prints character glyphs in various fonts to your output device. In this case your output device might be either cairo_pdf or x11. On my device (the Mac fork of UNIX), the Arial font gives this output:
png(type="cairo-png");plot(1, xlab="\u0298");dev.off()
My observation over the years of similar questions leads me to believe that Cairo graphics are more reliably cross-platform. But since R can be compiled without cairo support, it's not a sure thing.
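Because cairo support is optional at build time, one hedge is to fall back to pdf() when it is missing. A minimal sketch; open_pdf is a hypothetical helper name, and on the fallback path pdf() may still substitute dots for characters outside its single-byte encoding, as in the question's warning.

```r
# Hypothetical helper: open cairo_pdf() if this R build supports cairo,
# otherwise fall back to the always-available pdf() device.
open_pdf <- function(file, ...) {
  # unname() so isTRUE() also works on older R, where it rejects named vectors
  if (isTRUE(unname(capabilities("cairo")))) {
    grDevices::cairo_pdf(file, ...)
  } else {
    grDevices::pdf(file, ...)
  }
  invisible(dev.cur())
}

out <- tempfile(fileext = ".pdf")
open_pdf(out)
plot(1, xlab = "\u0298")  # the label the question is trying to draw
invisible(dev.off())
```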

Maybe your text editor is using latin1, so you would be sending latin1 characters to your console.
Check the encoding with
Encoding('ʘ')
and/or try
plot(1, xlab=iconv('ʘ', from='latin1', to="UTF-8"))
but be careful: the encoding can change while copying.
If you use Notepad++ you can convert in the text editor between the different encodings.
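A minimal sketch of checking what actually reached R, using only base functions. The \u0298 escape produces a UTF-8 string no matter how the editor saves the file; the latin1 string is built from a raw byte purely for illustration. The last lines show why the warning about copying matters: converting text that is already UTF-8 "from latin1" encodes it a second time.

```r
x <- "\u0298"              # a \u escape always yields a UTF-8 string
Encoding(x)                # "UTF-8"
charToRaw(x)               # ca 98: the two UTF-8 bytes of U+0298

# A latin1 string built from a raw byte: 0xe9 is e-acute in ISO-8859-1
y <- rawToChar(as.raw(0xe9))
Encoding(y) <- "latin1"
y_utf8 <- iconv(y, from = "latin1", to = "UTF-8")
charToRaw(y_utf8)          # c3 a9: the same character, now UTF-8 encoded

# Converting text that is already UTF-8 "from latin1" double-encodes it:
# the 2 bytes of x become 4 bytes representing two different characters
x_twice <- iconv(x, from = "latin1", to = "UTF-8")
length(charToRaw(x_twice)) # 4
```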

Related

Foreign (Hebrew, Chinese) characters: Tidyverse incorrect display in console but correct in View() [duplicate]

In at least some cases, Asian characters are printable if they are contained in a matrix or a vector, but not in a data.frame. Here is an example:
q<-'天'
q # Works
# [1] "天"
matrix(q) # Works
# [,1]
# [1,] "天"
q2<-data.frame(q,stringsAsFactors=FALSE)
q2 # Does not work
# q
# 1 <U+5929>
q2[1,] # Works again.
# [1] "天"
Clearly, my device is capable of displaying the character, but when it is in a data.frame, it does not work.
Doing some digging, I found that the print.data.frame function runs format on each column. It turns out that if you run format.default directly, the same problem occurs:
format(q)
# "<U+5929>"
Digging into format.default, I find that it is calling the internal format, written in C.
Before I dig any further, I want to know if others can reproduce this behaviour. Is there some configuration of R that would allow me to display these characters within data.frames?
My sessionInfo(), if it helps:
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.0.1
I hate to answer my own question, but although the comments and answers helped, they weren't quite right. In Windows, it doesn't seem like you can set a generic 'UTF-8' locale. You can, however, set country-specific locales, which will work in this case:
Sys.setlocale("LC_CTYPE", locale="Chinese")
q2 # Works fine
# q
#1 天
But it does make me wonder why exactly format uses the locale; I wonder whether there is a way to have it ignore the locale on Windows. I also wonder whether there is some generic UTF-8 locale on Windows that I don't know about.
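If switching locales helps, the change can be limited to a single call and undone afterwards. A minimal sketch; with_ctype is a hypothetical helper, and which locale names are accepted depends on the OS ("Chinese" on Windows, as above; on Linux a name such as "zh_CN.UTF-8", which may or may not be installed).

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to `locale`
# and restore the previous setting afterwards, even if `expr` throws an error.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (and warns) if the OS cannot honour the request
  if (identical(Sys.setlocale("LC_CTYPE", locale), "")) {
    warning("could not switch LC_CTYPE to ", locale, "; keeping ", old)
  }
  invisible(expr)  # `expr` is a lazy argument, so it is evaluated here, under the new locale
}

q2 <- data.frame(q = "\u5929", stringsAsFactors = FALSE)
loc <- if (.Platform$OS.type == "windows") "Chinese" else "zh_CN.UTF-8"
with_ctype(loc, print(q2))
```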
I blogged about Unicode and R just a few days ago. I think your R editor is UTF-8, and this gives you the illusion that R on your Windows handles UTF-8 characters.
The short answer: when you want to process Unicode (here, Chinese), don't use English Windows; use a Chinese version of Windows, or Linux, which uses UTF-8 by default.
Session info in my Ubuntu:
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

Unicode in rgl plot3d

I'm completely new to R and the rgl package, and I have searched all day for a solution...
I'm trying to use rgl's text3d function with Unicode text. I have no problem plotting the same characters in 2D (using text()), but in 3D, instead of rendering the symbols, it just writes out the UTF-8 character codes (unless they're ASCII characters).
I'm reading in data from a file where the column "vowel" contains the symbols to be plotted (e.g. "e i ə ɪ ɒ" etc.), and the columns "F1", "F2" and "F3" contain the values to be plotted. The file is read in with read.delim with encoding="UTF-8", and inspecting the data in the RGui shows the UTF-8 character codes for any non-ASCII symbols.
Sample data (comma-delimited)
vowel,F1,F2,F3
i,424.1352452,1985.143387,2549.272611
e,515.0401373,1693.077496,2534.527142
ə,408.8233704,1589.12993,2567.448424
ɒ,490.6565129,1070.564989,2590.467597
ɪ,405.5223379,1665.733731,2261.069994
u,360.0803517,1798.355786,2354.845875
ɜ,541.6360766,1323.593646,2435.121753
ɑ,718.8871543,1139.013741,2820.694337
ɑ,629.1691413,1064.047107,2910.997552
ɪ,375.0097039,2091.996102,2648.991664
This is the code I've been testing with:
d <- read.delim("my.filename", header=TRUE, sep=",", encoding="UTF-8")
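A self-contained sketch of that read step, using a temporary file with three of the sample rows in place of my.filename. According to ?scan, encoding = "UTF-8" only marks the strings as UTF-8 (it does not re-encode them), so the code points can be checked before anything is plotted; the "<U+0252>" escape in the 3D output corresponds to ɒ.

```r
# Three of the sample rows, written as UTF-8 to a temporary file
# (this file stands in for "my.filename")
csv <- tempfile(fileext = ".csv")
rows <- c("vowel,F1,F2,F3",
          "i,424.1352452,1985.143387,2549.272611",
          "\u0259,408.8233704,1589.12993,2567.448424",
          "\u0252,490.6565129,1070.564989,2590.467597")
writeLines(enc2utf8(rows), csv, useBytes = TRUE)

# encoding = "UTF-8" marks the strings as UTF-8; it does not re-encode them
d <- read.delim(csv, header = TRUE, sep = ",", encoding = "UTF-8",
                stringsAsFactors = FALSE)

Encoding(d$vowel)   # "unknown" for the ASCII "i", "UTF-8" for the others
codes <- vapply(d$vowel, utf8ToInt, integer(1), USE.NAMES = FALSE)
sprintf("U+%04X", codes)   # "U+0069" "U+0259" "U+0252"
```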
Plotting in 3D (this plots things like "<U+0252>" for every non-ASCII character):
library(rgl)
cols <- c("F1", "F2", "F3");
plot3d(d[,cols], xlab="F1", ylab="F2", zlab="F3", type="n");
text3d(d[,cols], col=1, text=d$vowel);
Plotting in 2D (works):
cols <- c("F1", "F2");
plot(d[,cols], xlab="F1", ylab="F2", type="n");
text(d[,cols], col=1, labels=d$vowel);
Does it have something to do with OpenGL? I've installed freetype, hoping that might solve the issue, but I haven't managed to point R to it - so it issues warnings "par3d(useFreeType = TRUE) : FreeType not supported in this build" and "In par3d(useFreeType = TRUE) : font family "sans" not found, using "bitmap""...
Having spent several hours battling R for freetype, I was hoping someone here can tell me whether freetype will even solve the issue??! If yes, a hint as to what "Set the environment variable LIB_FREETYPE to give the full path to the install directory" (from rgl README) is trying to tell me to do would be hugely appreciated!
Thank you.
My sessionInfo:
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
LC_COLLATE=English_United Kingdom.1252
LC_CTYPE=English_United Kingdom.1252
LC_MONETARY=English_United Kingdom.1252
LC_NUMERIC=C
LC_TIME=English_United Kingdom.1252
attached base packages:
stats graphics grDevices utils datasets methods base
other attached packages:
rgl_0.93.975
You need to have FreeType installed.
Make sure you have the FreeType and FreeType OpenGL libraries installed, then reinstall rgl in R; after that, everything works.
See also: http://www.smnd.sk/kotanyi/index.php?page=rgl

R plot title encoding in Pdf

This question is related to: Rhtml: Warning: conversion failure on '<var>' in 'mbcsToSbcs': dot substituted for <var> and R doesn't open with UTF-8
I use Ubuntu, and I cannot show a Turkish character, ı, in the title of a plot:
myScript.r:
pdf(file='/home/sait/Desktop/abc.pdf')
plot(1:7,1:7,main='geziparkı')
I get the following warning messages when I run the script using Rscript myScript.r:
Warning messages:
1: In title(...) :
conversion failure on 'geziparkı' in 'mbcsToSbcs': dot substituted for <c4>
2: In title(...) :
conversion failure on 'geziparkı' in 'mbcsToSbcs': dot substituted for <b1>
3: In title(...) :
conversion failure on 'geziparkı' in 'mbcsToSbcs': dot substituted for <c4>
4: In title(...) :
conversion failure on 'geziparkı' in 'mbcsToSbcs': dot substituted for <b1>
I added the line pdf.options(encoding='ISOLatin2.enc') at the top of my script, as mentioned in the related previous questions, but it did not help.
Do I need to change something in my Ubuntu locale settings? My sessionInfo() is as follows:
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=tr_TR.UTF-8 LC_NUMERIC=C
[3] LC_TIME=tr_TR.UTF-8 LC_COLLATE=tr_TR.UTF-8
[5] LC_MONETARY=tr_TR.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=tr_TR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
PS: I continued investigating this issue and realized that if I use .png it works perfectly; the only problem is with .pdf.
I finally found the solution:
substituting pdf(file='/home/sait/Desktop/abc.pdf') with
cairo_pdf('/home/sait/Desktop/abc.pdf', family="DejaVu Sans") did the trick.
I do not know what this actually does, but I tried a lot of things and nothing worked except this.
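A plausible reason, offered as a hedged sketch rather than a confirmed diagnosis: the pdf() device draws text in a single-byte encoding, which is what the mbcsToSbcs (multibyte-to-single-byte) conversion in the warning refers to, whereas cairo_pdf() renders the UTF-8 string through cairo with a font such as DejaVu Sans. The check below shows that ı (U+0131) has no Latin-1 code; fits_encoding is a hypothetical helper, and ISO-8859-9 (Latin-5, the Turkish single-byte encoding) is included only for comparison.

```r
# Hypothetical helper: can every character of `x` (a UTF-8 string) be
# represented in each of the given encodings? iconv() returns NA when not.
fits_encoding <- function(x, encodings) {
  vapply(encodings,
         function(enc) !is.na(iconv(x, from = "UTF-8", to = enc)),
         logical(1))
}

title_text <- "gezipark\u0131"   # "geziparkı", written with an escape
fits_encoding(title_text, c("latin1", "ISO-8859-9", "UTF-8"))
# latin1 FALSE, ISO-8859-9 TRUE, UTF-8 TRUE
```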

knitr updated from 1.2 to 1.4 error: Quitting from lines

I recently updated knitr to 1.4, and since then my .Rnw files don't compile.
The document is rich (7 chapters, included with child="").
Now, in the recent knitr version I get an error message:
Quitting from lines 131-792 (/DATEN/anna/tex/CoSta/chapter1.Rnw)
Quitting from lines 817-826 (/DATEN/anna/tex/CoSta/chapter1.Rnw)
Error in if (eval) { :
argument is not interpretable as logical
(The last two lines mean that knitr is looking for a logical value and cannot find one.)
Two figures end at those lines, 131 and 817. Compiling these snippets separately works.
I have no idea how to resolve this problem.
Thanks in advance for any hints that help resolve my issue.
Here is the sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] knitr_1.4
loaded via a namespace (and not attached):
[1] compiler_2.15.1 digest_0.6.3 evaluate_0.4.7 formatR_0.9
[5] stringr_0.6.2 tcltk_2.15.1
Following Hui's suggestion, I ran each chapter separately with
knit("chapter1.Rnw")
and so on. No error message occurs, and separate .tex files are created. To provide more information, here is part of the code.
There is a main document in which several options are set:
<<options-setting,echo=FALSE>>=
showthis <- FALSE
evalthis <- FALSE
evalchapter <- TRUE
opts_chunk$set(comment=NA, fig.width=6, fig.height=4)
@
Each chapter is included via a child chunk; e.g., chapter1 is called from
<<child-chapter1, child='chapter1.Rnw', eval=evalchapter>>=
@
The error message that appears when knitting the main Rnw file is given above.
The related figure environment is as follows:
\begin{figure}[ht]
\centering
<<wuerfel-simulation,echo=showthis,fig.height=5>>=
data.sample6 <- sample(1:6,repl=TRUE,100)
table(data.sample6)
barplot(table(data.sample6)/100,col=5,main="Haeufigkeiten beim Wuerfeln")
@
\caption{Visualisierung beim W"urfeln. 100 Versuche.}
\label{fig:muent-vis}
\end{figure}
This is not very advanced, but the error is still the same as above.
The first "Quitting from lines" message concerns a long stretch of text, from line 131 (end of the first chunk) to line 792 (beginning of the follow-up chunk), which is:
<< zeiten, echo=showthis,eval=evalthis>>=
zeiten <- c(17,16,20,24,22,15,21,15,17,22)
max(zeiten)
mean(zeiten)
zeiten[4] <- 18; zeiten
mean(zeiten)
sum(zeiten > 20)
@
Is there a problem with correctly closing a chunk?
I have now located the error, and here is a short piece of code that reproduces the error message. It concerns conditional evaluation of child documents involving \Sexpr:
The main file is the following:
\documentclass{article}
\begin{document}
<<options-setting,echo=FALSE>>=
evalchapter <- TRUE
@
<<test,child="test-child.Rnw", eval=evalchapter>>=
@
\end{document}
The related child file 'test-child.Rnw' is:
<<no-sexpr>>=
t <- 2:4
@
text \Sexpr{(t <- 2:4)}
Knitting this as is gives the error message above. If I remove the \Sexpr from the child, everything works nicely.
But everything also works nicely if I remove the condition from the child call, i.e., without eval=evalchapter.
Since I use \Sexpr quite often, I would like to have a solution to this problem. As I mentioned earlier, there were no problems up to knitr version 1.2.
This is related to a change in knitr 1.3 and mentioned in the NEWS:
added an argument options to knit_child() to set global chunk options for child documents; if a parent chunk calls a child document (via the child option), the chunk options of the parent chunk will be used as global options for the child document, e.g. for <<foo, child='bar.Rnw', fig.path='figure/foo-'>>=, the figure path prefix will be figure/foo- in bar.Rnw; see How to avoid figure filenames in child calls for an application
And this caused a bug for inline R code: in your case, the chunk option eval=evalchapter was not evaluated when it was used for evaluating inline code. I have fixed the bug in the development version v1.4.5 on GitHub.
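A minimal base-R illustration of the message, under the assumption (for illustration only) that the unevaluated option reaches the if (eval) test as a symbol: if() cannot interpret a symbol as TRUE/FALSE, while evaluating it where evalchapter is defined yields a proper logical. This does not reproduce knitr's internals.

```r
# Assumption for illustration: the unevaluated option reaches `if (eval)`
# as a symbol instead of TRUE/FALSE.
opt <- quote(evalchapter)

# if() cannot interpret a symbol as TRUE/FALSE, which gives the same message
msg <- tryCatch(if (opt) "run chunk", error = conditionMessage)
msg   # "argument is not interpretable as logical" (in an English locale)

# Evaluating the option where evalchapter is defined yields a proper logical
env <- new.env()
assign("evalchapter", TRUE, envir = env)
if (eval(opt, envir = env)) "run chunk"
```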

how to display đ, ư, ơ, ă in R graphs

I am trying to put Vietnamese labels in R graphs. I use RStudio and save my code using UTF-8 encoding. It handles the Vietnamese characters in my code well; everything shows up properly in the code. However, in the graphs I make, while many characters display OK, several important ones do not show up properly, including
đ - which displays incorrectly as d
ư - which displays incorrectly as u
ơ - which displays incorrectly as o
ă - which displays incorrectly as a
Unfortunately this makes my graphs look unprofessional and untrustworthy.
I would really appreciate it if someone can help me figure this out.
Thanks much!
Trang
@DWin: I am on Windows 7, and here is my sessionInfo():
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] MASS_7.3-17 MatchIt_2.4-20 tools_2.15.0
@krlmlr: Here's my code for a simple graph:
knorelative.count <- matrix(nrow=5,ncol=1)
knorelative.count[,1] <- c(1579,638,215,100,120)
par(mar=c(2,4,4,2))
barplot(prop.table(knorelative.count),beside=TRUE,
yaxt="n",ylim=c(0,.6),
legend=c("không ai biết",
"không biết nhiều hơn biết",
"nửa biết, nửa không biết",
"biết nhiều hơn không biết",
"tất cả đều biết"),
main="Người khác trong gia đình, họ hàng biết hay không")
axis(2,at=seq(0,.6,.1),labels=paste(100*seq(0,.6,.1),"%",sep=""),las=1)
When I run this, the đ in the main title and the two ơ's and the đ in the legend turn into d and o.
You are often safer specifying unicode characters by their hex codes:
plot(1:4,rep(1,4),pch=c("\u0111","\u01B0","\u01A1","\u0103"),cex=4)
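Looking up each code by hand is tedious, so here is a minimal base-R sketch that rewrites a string with every non-ASCII character as a \uXXXX escape, ready to paste into a script. to_u_escapes is a hypothetical helper, and it assumes all characters are in the Basic Multilingual Plane (true for Vietnamese); code points above U+FFFF would need \U escapes instead.

```r
# Hypothetical helper: replace each non-ASCII character with a \uXXXX escape.
to_u_escapes <- function(x) {
  cp <- utf8ToInt(enc2utf8(x))                        # Unicode code points
  ascii <- cp < 128
  out <- character(length(cp))
  out[ascii] <- intToUtf8(cp[ascii], multiple = TRUE) # keep ASCII as-is
  out[!ascii] <- sprintf("\\u%04X", cp[!ascii])       # escape everything else
  paste(out, collapse = "")
}

word <- "tr\u01B0\u1EDDng"          # "trường"
esc <- to_u_escapes(word)
cat(esc, "\n")                      # tr\u01B0\u1EDDng
# Parsing the escaped form gives back the same string
identical(eval(parse(text = paste0('"', esc, '"'))), word)
```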
For any Vietnamese folks out there who run into the same problem, here's an example for the fix using hex codes suggested by James:
print("trường")
[1] "truờng"
print("tr\u01B0ờng")
[1] "trường"
While I can type Vietnamese (in this example, the word trường) into my R console without problems, any kind of output (e.g. print or a graph) fails to display the character ư. Replacing ư with its hex code fixes the output.
(Note: I used the function paste earlier, but then edited this based on James's suggestion to stick the hex code in the character string.)
I am so thankful to learn this way. I will do this for the report I am currently writing.
Trang
Save your code in a separate file, for example R/graph_display.R.
Then run it with:
eval(parse("R/graph_display.R", encoding = "UTF-8"))
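A minimal self-contained sketch of that approach, writing a small UTF-8 script to a temporary file (standing in for R/graph_display.R) and evaluating it into its own environment. Per ?parse, encoding = "UTF-8" marks the strings in the file as UTF-8 rather than re-encoding them.

```r
# Stand-in for R/graph_display.R: a one-line script saved as UTF-8 bytes
script <- tempfile(fileext = ".R")
writeLines(enc2utf8('main_title <- "Ng\u01B0\u1EDDi kh\u00E1c"'), script,
           useBytes = TRUE)

# Parse the file as UTF-8 and evaluate it in a separate environment
e <- new.env()
eval(parse(script, encoding = "UTF-8"), envir = e)
utf8ToInt(e$main_title)[3]   # 432, i.e. U+01B0, the letter that kept turning into u
```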
