Problems with unicode in R - r

I am having problems with printing unicode characters in R. Initially the problems started with me trying to plot some custom labels with ggplot, but I have found out that this problem runs deeper.
For example, the letter đ is represented with unicode code U+0100. This means that if I type
"\u0100"
in the console, I should get đ printed as output, right? However, this is not what happens. Instead, as the output I get:
<U+0100>
I don't understand why this is happening since my encoding is set to UTF-8. Does anyone have any ideas on how to solve this?
I tried using the following function:
stri_escape_unicode("<U+0100>")
but I just get the same output as before. Any help would be appreciated! I am using a MacBook Pro from 2013.
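One thing worth checking first: U+0100 is Ā (Latin capital A with macron), while đ is U+0111. Separately, R prints the <U+XXXX> form when it believes the character cannot be represented in the session's native encoding, so it may help to inspect the locale rather than the string. A minimal base-R sketch (the helper name code_point is made up for illustration):

```r
# Hypothetical helper: look up the code point of a single character
# instead of guessing it.
code_point <- function(ch) sprintf("U+%04X", utf8ToInt(enc2utf8(ch)))

code_point("\u0111")  # escape for đ (Latin small d with stroke)
code_point("\u0100")  # escape used in the question, which is Ā

# Check whether the session itself is running in a UTF-8 locale.
Sys.getlocale("LC_CTYPE")
l10n_info()[["UTF-8"]]
```

If the locale turns out to be "C" rather than a UTF-8 one, something like Sys.setlocale("LC_CTYPE", "en_US.UTF-8") may help, assuming that locale is installed on the machine.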

Related

Why can't read_csv use my directory/path?

I am having a problem with my read_csv. I have used this function with no problem but the path/directory I am using is a little different than normal and I can't figure it out by myself.
This is the code I have been using:
X2022_03_08_habit_and_OCD_clinical <- read_csv("Box/OCD: Habit or Learning?/experiment/data/raw/survey-data/2022-03-08_habit-and-OCD_clinical.csv")
I have tried tweaking this by not using the first two arguments, with no luck. Has anyone used Box in a path before? (Box is in my Finder, like Desktop would be.) I also tried updating R for that first error code, but maybe it didn't take; I am not sure how to update again.
Here is the error code I have been receiving: [image: Error Message]
I would appreciate any help and I apologize if there is a simple answer I have been missing!
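Since the error itself is not shown, one way to narrow this down is to check how far down the path actually exists. The sketch below is only a diagnostic: check_path is a hypothetical helper, and the "~/Box" location is an assumption about where Box is mounted, not something confirmed by the question.

```r
# Hypothetical helper: show how far down a path actually exists on disk.
check_path <- function(path) {
  path <- path.expand(path)  # resolve a leading "~"
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  parts <- parts[nzchar(parts)]
  root <- if (startsWith(path, "/")) "/" else ""
  steps <- vapply(
    seq_along(parts),
    function(i) paste0(root, paste(parts[seq_len(i)], collapse = "/")),
    character(1)
  )
  data.frame(path = steps, exists = file.exists(steps), stringsAsFactors = FALSE)
}

# Relative paths such as "Box/..." are resolved against this directory.
getwd()

# Assumed location of the Box folder; adjust to wherever it really lives.
check_path("~/Box/OCD: Habit or Learning?/experiment/data/raw/survey-data")
```

The first row where exists is FALSE is the component to look at. If even the first component is missing, the working directory is probably not the home folder.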

Bengali conjuncts not rendering in ggplot

library(ggplot2)
ggplot(data=NULL,aes(x=1,y=1))+
  geom_text(size=10,label="ক্ত", family="Kohinoor Bangla")
On my machine, the Bengali conjunct cluster "ক্ত" is rendered as its constituents plus a virama:
I have tried several different fonts to no avail. Is there a trick to making conjuncts render correctly?
EDIT:
Explicitly using the Unicode escapes still does not render correctly in the plot.
Printing to the console renders correctly for me:
print(stringi::stri_enc_toutf8("\u0995\u09cd\u09a4"))
This still gives me the exact same result as before:
ggplot(data=NULL,aes(x=1,y=1))+
geom_text(size=10,label="\u0995\u09cd\u09a4", family="Kohinoor Bangla")
Why is there a difference between the console output and ggplot output?
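One possible reading of the difference: both forms of the label are the same three code points (KA, VIRAMA, TA), and turning them into a single conjunct glyph is the job of whatever renders the text. The console and the graphics device may use different text-rendering paths, so identical strings can look different. A small base-R check that the string itself is what you expect:

```r
label <- "\u0995\u09cd\u09a4"

# Three separate code points; the conjunct is formed only at render time.
sprintf("U+%04X", utf8ToInt(label))
nchar(label, type = "chars")
```

If the code points are right, the font and the graphics device are more likely suspects than the string.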
I'm not familiar with the Bengali language, but if you look up the Unicode code points for the text that you want to render, you can simply use those in geom_text():
# According to unicode code chart, these are some Bengali characters
# U+099x4
# U+09Ex3
ggplot(data=NULL,aes(x=1,y=1))+
# Substitute 'U+' by '\u', leave the 'x' out
geom_text(size = 10, label = "\u0994\u09E3")
Substitute the unicode characters as you see fit.
Hope that helped!
EDIT: I tried your last piece of code, which gave me a warning about the font not being installed. So I ran it without the family = "Kohinoor Bangla":
ggplot(data=NULL,aes(x=1,y=1))+
geom_text(size=10,label="\u0995\u09cd\u09a4")
Which gave me the following output:
From a visual comparison with the character that you posted, it looks quite similar. Next, I ran the same piece of code on my work computer, which gave me the following output:
The difference between work and home is that work runs Linux while home runs Windows; work has R 3.4.4, home has R 3.5.3. Both use RStudio and ggplot2 3.2.0. I can't update R at work because of backwards-compatibility issues, so I can't check whether the version of R is the problem. However, you could check whether your version of R is older than 3.5.3 and see if updating resolves the problem. Otherwise, I would guess it is a platform issue.

gsub error message when addressing column in dataframe in RStudio

For the past couple of days I have been getting the following error message in RStudio from time to time, and I can't figure out what is causing it.
When I type the name of a data.frame in the console followed by $ to address a specific column (for example df$SomeVariable), the following message is shown in the console and is printed over and over with every letter I type:
Error in gsub(reStrip, "", completions, perl = TRUE) :
input string 38 is invalid UTF-8
The error message doesn't have any real effect. Everything works just fine except the automatic completion of the variable name.
I'm using R version 3.4.4 and RStudio Version 1.0.143 on a Windows computer. In the R script I am currently working on I don't use gsub or any other "string" or regular expression function for that matter. The issue appeared with various data.frames and various types of variables in the data.frames (numeric, integer, date, factor, etc.). It also happens with various packages. Currently, I am using combinations of the packages readr, dplyr, plm, lfe, readstata13, infuser, and RPostgres. The issue disappears for a while after closing RStudio and opening it again but re-appears after working for a while.
Does anyone have an idea what may cause this and how to fix it?
I had the same problem a few days ago. I did some research and found that when you import the dataset, you can change the encoding. Changing the encoding to "latin1" might fix your problem. Sorry for my poor English, I'm from South America. Hope it works.
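The completions that fail here are built from column names, so one way to test the encoding idea is to look for names that are not valid UTF-8 and re-encode them. This is a base-R sketch with made-up column names; it assumes the stray bytes are latin1, which may not hold for your data.

```r
# Made-up column names; the second one contains the latin1 byte 0xE9 (é),
# which is not valid on its own in UTF-8.
col_names <- c("id", "caf\xe9")

# Flag the names that would trip a UTF-8-only regular expression.
bad <- !validUTF8(col_names)
which(bad)

# Re-encode only the offending names, assuming they are latin1.
col_names[bad] <- iconv(col_names[bad], from = "latin1", to = "UTF-8")
validUTF8(col_names)
```

On a real data frame the same steps would apply to names(df), followed by assigning the repaired vector back.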

R rstudio view encoding different than print

When I use non-standard letters, the RStudio View function tends to use a different encoding than print. If you run the following code you will see the difference. I've tried setting all possible encodings, but View keeps displaying them incorrectly. Any solutions?
x <- data.frame(test=c('a','b','c','d','é'))
View(x)
print(x)
I will upload an image as soon as I have more reputation.
This issue is solved in the newest version of RStudio (1.0.136).

invalid multibyte character crashes when script is loaded from source (umlauts / special characters)

EDIT:
Thanks to suggestions from the mailing list I realized that the problem I have has nothing to do with Sweave or LaTeX. It's a Mac OS X related issue. Whenever I run my script by selecting all and sending it to R, it works.
When I use
source("myplainRcode.R")
I get the error message stated below.
Finally I got Sweave working together with ggplot2 on Mac OS X. I invoke Sweave inside R with
Sweave("myfile.Rnw")
which creates the desired LaTeX output. Now that the basic tests work, I try to source my real-world file and it crashes at the following line:
gl_bybranch = ddply(new_wans,.(period,Branchen),
function(X)data.frame(Geschäftslage=mean(X$sentiment)))
I guess it has to do with either the ".(period...)" or the "ä". Unfortunately I can't change these labels because they are also used in legends, so somewhere in my code these ugly umlauts will appear. Is there a way to escape them in Sweave? I can't believe that this is a problem, since Sweave was written by a German, and German probably has the second-most umlaut characters (behind Turkish).
The error message I get is: invalid multibyte character in Parser on line 195
Thx for any ideas in advance!
Yay! I got it. Sorry for the noise, everybody. I switched all three files (.Rnw, mysource.R, invokeSweave.R) to UTF-8 and it finally worked. So everybody who works with Komodo on a Mac: make sure to change your default encoding to UTF-8!
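To see which files still need converting, a quick check is whether every line of a file is valid UTF-8. The sketch below uses only base R; is_utf8_file is a hypothetical helper, and the two temporary files stand in for a real script saved in each encoding.

```r
# Hypothetical helper: TRUE if every line of the file is valid UTF-8.
is_utf8_file <- function(path) all(validUTF8(readLines(path, warn = FALSE)))

# Two stand-in scripts containing the same umlaut, saved in different encodings.
latin1_file <- tempfile(fileext = ".R")
utf8_file <- tempfile(fileext = ".R")
writeLines("x <- \"Gesch\xe4ftslage\"", latin1_file, useBytes = TRUE)
writeLines(enc2utf8("x <- \"Gesch\u00e4ftslage\""), utf8_file, useBytes = TRUE)

is_utf8_file(latin1_file)
is_utf8_file(utf8_file)
```

If a file is UTF-8 but the R session does not assume that, source("myplainRcode.R", encoding = "UTF-8") tells R how to decode it; source() takes an encoding argument for this purpose.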

Resources