I am trying to view characters from multiple languages in RStudio. What I find unusual is that I am able to view these in the console, but not in the viewer. UTF-8 encoded characters appear like 'U+3042', 'U+500B', etc. in the viewer.
Is there a way to get the viewer to display the actual characters instead of the encoded character?
Here are a couple of images showing what I mean:
In console: https://ibb.co/T0681H7
In viewer: https://ibb.co/QnxF25c
This is a known issue in RStudio. Feel free to comment/upvote here:
https://github.com/rstudio/rstudio/issues/4193
I occasionally work with data frames where unorthodox special characters are used that look identical to standard characters in RStudio's in-built viewing functionality. I refer to these characters in my scripts, but sometimes when I open the file, these characters have been changed to standard keyboard characters within the script.
For example, in my script, ’ changes to a standard apostrophe ' and – changes to a standard hyphen -.
These scripts are ones I have to run regularly, so having to manually correct this each time is a chore. I also haven't worked out what it is that triggers RStudio to make these changes. I've tried closing and reopening to test if that's the trigger, and the characters have remained correct. It only seems to happen after I've turned off my computer.
Does anyone know of a workaround for this and/or what is causing this? TIA
EDIT: the reason I need to do this is I export to csv which is UTF-8 encoded.
I've found a workaround, although I welcome any feedback on any drawbacks to this.
If you have already written your code (including the special characters):
Click File > Save with Encoding... > Show all encodings > unicodeFFFE
Now when you reopen the file:
Click File > Reopen with Encoding... > Show all encodings > unicodeFFFE
If you haven't already written your code, it should just be a case of saving your file from the start with the unicodeFFFE encoding (instructions above) before you write the code and then using the reopen with encoding option whenever you open the file.
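A different workaround, not part of the answer above, is to avoid typing the special characters into the script at all and to refer to them by Unicode escape instead. The sketch below assumes the two characters in question are U+2019 (right single quotation mark) and U+2013 (en dash); the data frame and variable names are made up for illustration.

```r
# Write the special characters as \u escapes so the script itself stays pure ASCII
right_quote <- "\u2019"  # RIGHT SINGLE QUOTATION MARK
en_dash     <- "\u2013"  # EN DASH

# Hypothetical sample data standing in for the real data frame
df <- data.frame(
  label = c(paste0("it", right_quote, "s"), paste0("1", en_dash, "2")),
  stringsAsFactors = FALSE
)

# Matching is done on the actual characters, independent of how the script file was saved
has_quote <- grepl(right_quote, df$label, fixed = TRUE)
has_dash  <- grepl(en_dash, df$label, fixed = TRUE)
```

Because the script then contains only ASCII, it should not matter which encoding the editor uses to save or reopen it.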
I had been working for weeks on a script with lots of Cyrillic characters (both inside and outside code chunks). One day I opened a new R Markdown script, in which I wrote English, while the other document was still open in my R session. When I returned to the Cyrillic document, everything written in it had turned into something like this: 8 иÑлÑ 1995 --> ÐлаÑÑÑ - наÑодÑ
My questions are: what is the source of the problem, and how can the corrupted script be restored to its original form (with the Cyrillic characters)?
UPDATE!!
I have tried reopening the RStudio script with the encodings CP1251, CP1252, windows-1251 and UTF-8, but it does not work; the weird symbols just change into other weird symbols. The problem is that I saved the document with the default encoding (CP1251 and windows-1251) at the very beginning.
Solution:
If working with Cyrillic and Latin characters, be sure to always save the RStudio script with UTF-8 encoding when your computer runs Windows (I do not know about Mac). If you close the script and open it again, reopen the file with UTF-8 encoding.
Assuming you're using RStudio: open your *.Rmd file and then try to reopen it with a specific encoding via the File menu (File > Reopen with Encoding...).
Select "Show all encodings" and choose your specific encoding; I suggest windows-1251 for Cyrillic.
Note: Apparently the issue can also occur when the *.Rmd file is opened standalone on one occasion and from within an R Project on another.
Hope that would help.
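On the question of turning garbled text back into Cyrillic, the following is a minimal sketch rather than a guaranteed fix. It assumes the garbling came from UTF-8 bytes being interpreted as a single-byte Latin encoding (ISO-8859-1 is used here) and that the garbled text has not since been saved in a way that lost bytes. The sample string and variable names are illustrative only.

```r
# "июля" written with \u escapes so this snippet itself is ASCII-only
original <- enc2utf8("\u0438\u044e\u043b\u044f")

# Simulate the corruption: the UTF-8 bytes are misread as ISO-8859-1 (Latin-1)
garbled <- iconv(original, from = "ISO-8859-1", to = "UTF-8")

# Reverse it: convert back to the single-byte encoding,
# then relabel the resulting bytes as UTF-8
restored <- iconv(garbled, from = "UTF-8", to = "ISO-8859-1")
Encoding(restored) <- "UTF-8"
```

If the file was instead saved to disk after the wrong encoding was applied, some bytes may already be lost and this round trip may not recover the text.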
There are a number of StackOverflow posts about opening CSV files containing (UTF-8 encoded) Chinese characters into R, in Windows. None of the answers I've found seem to work completely.
If I read.csv with encoding="UTF-8", then the Chinese characters are shown encoded (<U+XXXX>, which I've manually verified are at least correct). However, if I interrogate the data frame to get just one row or a specific cell from a row, then it's printed properly.
One post suggested this is due to strings being typed as factors. However, setting stringsAsFactors=FALSE had no effect.
Other posts say the locale must be set correctly. My system locale is apparently English_United Kingdom.1252; a Windows code page looks decidedly non-Unicode friendly! If I try to change it to any of en.UTF-8, en_GB.UTF-8 or en_US.UTF-8 (or even UTF-8 or Unicode), I get an error saying that my OS cannot honour the request.
If I try Sys.setlocale(category="LC_ALL", locale="Chinese"), the locale does change (albeit to another Windows code page; still no Unicode) but then the CSV files can't be parsed. That said, if I read the files in the English locale and then switch to Chinese afterwards, the data frame is printed out correctly in the console. However, this is kludgy and, regardless, View(myData) now shows mojibake rather than the encoded Unicode code points.
Is there any way to just make it all work? That is, correct Chinese characters are echoed from the data frame to the console and View works, without having to perform secret handshakes when reading the data?
My gut feeling is that the problem is the locale: It should be set to a UTF-8 locale and then everything should [might] just work. However, I don't know how to do that...
The UTF notation is good: it means your characters were read in properly. The issue is on R's side with printing to the console, which shouldn't be a big problem unless you are copying and pasting output. Writing out is a bit tricky: you want to open a UTF-8 file connection, then write to that file.
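A minimal sketch of writing such data out as UTF-8 follows. It is one possible variant, not necessarily what the answer had in mind: instead of relying on the connection's encoding argument, it converts the text to UTF-8 and writes the bytes unchanged with useBytes = TRUE, which avoids a detour through the native code page. The data frame, file path and variable names are assumptions for illustration.

```r
# Hypothetical data frame with non-ASCII text, written with \u escapes
my_data <- data.frame(
  id = 1:2,
  text = c("\u3042", "\u500b"),
  stringsAsFactors = FALSE
)

out_path <- tempfile(fileext = ".csv")

# Build the CSV lines by hand and make sure they are UTF-8 internally
csv_lines <- enc2utf8(c(
  paste(names(my_data), collapse = ","),
  paste(my_data$id, my_data$text, sep = ",")
))

# Write the UTF-8 bytes as they are; useBytes = TRUE stops R from
# re-encoding the strings to the native code page on the way out
con <- file(out_path, open = "wb")
writeLines(csv_lines, con, useBytes = TRUE)
close(con)

# Read the file back, marking the strings as UTF-8
round_trip <- read.csv(out_path, encoding = "UTF-8", stringsAsFactors = FALSE)
```

Building the lines by hand only works for simple values like these; fields containing commas or quotes would need proper CSV quoting.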
I'm currently attempting to convert some PCL files into PDF using GhostPCL (PCL6).
For the most part this works. However, there is an odd problem with some of the conversion. For some reason, PCL6 is not converting some logos that are at the top of our documents. The logo is of the format:
^[(25XABCDEFGHIJKLMNOPQ^[(3#^M
^[(25X^[&a+1.49RRSTUVWXYZ[\]^_`ab^[(3#^M
^[(25X^[&a+1.49Rcdefghijklmnopqrs^M
when viewing the PCL file in vim. When printing the file as a PCL file, the image prints out correctly, but when converting to PDF, the following takes its place:
ABCDEFGHIJKLMNOPQ
RSTUVWXYZ[\]^_`ab
cdefghijklmnopqrs
I recognize that the format is meant to be matched against some sort of embedded image or font, but it has been really difficult trying to find useful documentation on PCL (so I can actually figure out what these characters mean) or the conversion process.
Can anyone offer some insight on how to approach the conversion? We will need these images/logos in the converted documents since they often contain disclaimer information as part of the image.
EDIT1: I've also attempted converting to postscript and printing then and the same behavior occurs.
EDIT2: When rendering the PCL file in a viewer, the same text shows up instead of the image. But when printing, the logo does show up. Strange...
EDIT3: To clarify, sending the PCL file to a printer directly does not seem to cause the problem (i.e, the logo does print correctly). It's only when I attempt to convert it to another file format that the problem occurs.
What happens when you try rendering the PCL input with Ghostscript? E.g. to the display device. If it doesn't render, it's not going to end up in a PDF either.
Have you tried printing the file to a PCL printer?
If it works on a PCL printer but not when rendering, you can open a bug against GhostPCL. If it renders but does not end up in the PDF, then you can open a bug against GhostPCL with the 'pdf writer' component.
It's possible that the logo is shown using a rasterop. This is a part of the PCL imaging model which has no counterpart in PDF and so cannot be reproduced. The result of using a rasterop with the PDF device is variable: sometimes it will do what you expect, often it will not.
I have a problem that might be a bit unusual, but I think that answering it could answer other questions about encoding too.
In order to expand my R skills, I tried to write a function to manage the VCF file from Android phones. Everything went fine until I tried to upload the file to the phone. An error said that the first line starts with something other than what a normal VCF version 3 file starts with. But when I check the file on the PC, it looks fine, without the characters the phone complained about. So I asked about it, and one person here said that it is the byte order mark and that I should use a hex editor to see it. And it was there, even though it could not be seen in the text editors on Windows and Linux.
So I tried to solve the problem with the fileEncoding argument in R. The code that I use to write the file is:
write.table(cons2, file = paste0(filename, ".vcf"), row.names = FALSE, col.names = FALSE, quote = FALSE, fileEncoding = "")
I tried ASCII, UTF-8, etc. as the argument, but no luck. ASCII seems to delete some of the characters, and UTF-8 makes these characters visible in the text file.
I would appreciate if someone could provide a solution to this.
PS: I know that if I modify the file in a HEX editor it solves the problem, but I want the solution in the R coding.
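One way to do the hex-editor fix from within R is to check the first three bytes of the file and drop them if they are the UTF-8 byte order mark (EF BB BF). This is only a sketch: the helper name strip_utf8_bom and the temporary file are made up for illustration, and it assumes the mark is a UTF-8 one sitting at the very start of the file.

```r
# Hypothetical helper: drop a leading UTF-8 byte order mark (EF BB BF) from a file.
# Returns TRUE if a BOM was found and removed, FALSE otherwise.
strip_utf8_bom <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    writeBin(bytes[-(1:3)], path)  # rewrite the file without the first three bytes
    return(TRUE)
  }
  FALSE
}

# Demonstration on a temporary file that starts with a BOM
vcf_path <- tempfile(fileext = ".vcf")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("BEGIN:VCARD\n")), vcf_path)

removed <- strip_utf8_bom(vcf_path)
first_bytes <- readBin(vcf_path, what = "raw", n = 5)
```

If the mark is being carried over from the original file when it is read in, reading with fileEncoding = "UTF-8-BOM" may also be worth trying, since R accepts that name for reading and drops a leading mark if one is present.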