I have a problem that might be a bit unique, but I think that if it is answered it could answer other questions about encoding too.
To expand my R skills, I tried to write a function to manage the .vcf files from Android phones. Everything went fine until I tried to upload the file to the phone: an error appeared saying that the first line starts with something other than what a normal VCF version 3 file should start with. But when I check the file on the PC, it appears to be fine, without the characters my phone complained about. So I asked about it, and one person here said it is the byte order mark (BOM) and that I should use a hex editor to see it. And it was there, even though it couldn't be seen in the text editors of Windows and Linux.
Thus, I tried to solve the problem by using the fileEncoding argument in R. The code I use to write the file is:
write.table(cons2, file = paste(filename, ".vcf", sep = ""), row.names = FALSE, col.names = FALSE, quote = FALSE, fileEncoding = "")
I tried ASCII, UTF-8, etc. as the argument, but no luck: ASCII seems to delete some of the characters, and UTF-8 makes these characters visible in the text file.
I would appreciate it if someone could provide a solution to this.
PS: I know that modifying the file in a hex editor solves the problem, but I want the solution in R code.
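For what it's worth, a BOM can also be removed entirely in R by rewriting the file without its first three bytes. A minimal sketch, assuming the file name is built the same way as in the question:

# Strip a UTF-8 byte order mark (EF BB BF) from the file just written.
path <- paste0(filename, ".vcf")
bytes <- readBin(path, what = "raw", n = file.size(path))
if (length(bytes) >= 3 && identical(bytes[1:3], as.raw(c(0xEF, 0xBB, 0xBF)))) {
  writeBin(bytes[-(1:3)], path)  # rewrite the file without the BOM
}

This treats the file as raw bytes, so it works regardless of which text encoding was used when writing.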
I am working with Mitsubishi PLC files that were originally commented in Japanese but were then opened on English-only computers, which converted the Japanese symbols into incomprehensible Latin-keyboard symbol combinations such as ‰^“]€”õONŠm”F(‘€ì”Õ1).
Being able to understand these comments would greatly enhance my ability to analyze and modify these files, as I am required to do for my work. If I could convert these back to Japanese symbols (I do have the Japanese language pack installed on my Windows laptop), I could then translate them with Google Translate, which I know is not perfect, but is a lot better than ##$$##&^.
Does anyone have any ideas how this could be done? I figure that Windows must have interpreted the original characters somehow, and there may be a way to interpret them back to the original symbols.
I am thinking of trying some kind of character translation using a script in Python, PowerShell, or VBA (maybe I can create a map in Excel...)
Any ideas?
I can export these comments into CSV files, so they are easy to get to and manipulate if I can figure out how....
This is an ongoing problem for me so I am willing to put some time into a solution.
I tried re-opening the oldest version of the files on my computer with the Japanese language pack installed, with no luck.
You can run your text through an ASCII-to-hex converter and then through a hex-to-ASCII converter to change the encoding without your system settings getting in the way.
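Keeping with R, since that is the language used elsewhere on this page, here is a hedged sketch of that byte-level round trip. The garbled comments look like Shift-JIS bytes that were mis-decoded as Windows-1252, so re-encoding the garbled text back into bytes and then decoding those bytes as Shift-JIS may recover the Japanese. Note that this is an assumption about the encodings involved, and any bytes with no Windows-1252 mapping are lost for good:

garbled <- "‰^“]€”õONŠm”F(‘€ì”Õ1)"  # mojibake example from the question
# Convert the garbled characters back into the raw bytes they were displayed from
# (assumes the garbled text is stored as UTF-8 in your R session)...
bytes <- iconv(garbled, from = "UTF-8", to = "windows-1252", toRaw = TRUE)
# ...then decode those bytes as Shift-JIS, the encoding the PLC comments were written in.
fixed <- iconv(bytes, from = "Shift_JIS", to = "UTF-8")
fixed

The same two-step reinterpretation can be done in Python or PowerShell if that fits the CSV workflow better.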
I occasionally work with data frames containing unorthodox special characters that look identical to standard characters in RStudio's built-in viewer. I refer to these characters in my scripts, but sometimes when I open the file, these characters have been changed to standard keyboard characters within the script.
For example, in my script, ’ changes to a standard apostrophe ' and – changes to a standard hyphen -.
These scripts are ones I have to run regularly, so having to manually correct this each time is a chore. I also haven't worked out what it is that triggers RStudio to make these changes. I've tried closing and reopening to test if that's the trigger, and the characters have remained correct. It only seems to happen after I've turned off my computer.
Does anyone know of a workaround for this and/or what is causing this? TIA
EDIT: the reason I need to do this is that I export to CSV, which is UTF-8 encoded.
I've found a workaround, although I welcome any feedback on any drawbacks to this.
If you have already written your code (including the special characters):
Click File > Save with Encoding... > Show all encodings > unicodeFFFE
Now when you reopen the file:
Click File > Reopen with Encoding... > Show all encodings > unicodeFFFE
If you haven't already written your code, it should just be a case of saving your file with the unicodeFFFE encoding from the start (instructions above) before you write the code, and then using the Reopen with Encoding option whenever you open the file.
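Since the export target is a UTF-8 CSV anyway, it may also help to declare the encoding explicitly when writing; a minimal sketch, with df standing in for your data frame:

# Write the CSV as UTF-8 so characters like ’ and – survive the export.
write.csv(df, "out.csv", fileEncoding = "UTF-8", row.names = FALSE)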
I'm a newbie and have searched Stack, the internet, everywhere I can think of... but I cannot figure out why, when I use write.csv() in R, it doesn't actually save a .csv file on my computer. All I want is to get a .csv file of my work from RStudio into Tableau, and I've spent a week trying to figure it out. Many of the answers I have read use too much coding "lingo" and I cannot translate it because I'm just a beginner. I would be so thankful for any help.
Here is the code I'm using:
""write.csv(daily_steps2,"C:\daily_steps2.csv", row.names = TRUE)""
I put the double quotes around the code because it seems like that's what I'm supposed to do here? IDK, but I don't have those when I run the function. There is no error when I run this, it just doesn't show up as a .csv on my computer. It runs but actually does nothing. Thank you so much for any help.
In my opinion, the simplest way would be to save the file to the same folder RStudio is running in and use the RStudio GUI. It should be write.csv(daily_steps, "./daily_steps.csv") (no quotes around the function), and then on the tab in the bottom right of RStudio you can select Files, and it should be there. You can then use the graphical user interface to move it to your desktop, in a way analogous to what you would do in MS Word.
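A quick way to check where the file actually landed, using the same object name as the question:

write.csv(daily_steps2, "daily_steps2.csv", row.names = FALSE)  # no path: written to the working directory
getwd()                          # shows which folder that is
file.exists("daily_steps2.csv")  # TRUE if the file was written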
A quick fix is to use double backslashes or forward slashes in Windows paths. (Also, since row.names = TRUE is the default, there is no need to specify it.)
write.csv(daily_steps2, "C:\\daily_steps2.csv")
write.csv(daily_steps2, "C:/daily_steps2.csv")
However, consider the OS-agnostic file.path() and avoid issues with folder separators in file paths altogether, whether forward slashes (used on Unix-like systems such as macOS and Linux) or backslashes (used on Windows).
write.csv(daily_steps2, file.path("C:", "daily_steps2.csv"))
Another benefit of this functional form of path construction is that you can pass dynamic file or folder names without paste().
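For example, a short sketch of that (the folder and the date-stamped file name are illustrative):

out_dir <- "C:"                                          # hypothetical output folder
fname <- format(Sys.Date(), "daily_steps_%Y-%m-%d.csv")  # dynamic, date-stamped name
write.csv(daily_steps2, file.path(out_dir, fname), row.names = FALSE)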
I have imported several .txt files (texts written in Spanish) to RStudio using the following code:
content = readLines(paste("my_texts", "text1",sep = "/"))
However, when I read the texts in RStudio, they contain codes instead of diacritics. For example, I see the code <97> instead of an "ó" or the code <96> instead of an "ñ".
I have also realized that if the .txt file was originally written on a computer configured in Spanish, I don't see the codes but the actual diacritics; and if the texts were written on a computer configured in English, then I do get the codes (even though when opening the .txt file in TextEdit I see the diacritics).
I don't know why R displays those symbols or what I can do to retain the diacritics I see in the original .txt files.
I read I could possibly solve this by changing the encoding to UTF-8, so I tried this:
content = readLines(paste("my_texts", "text1",sep = "/"), encoding = "UTF-8")
But that didn't work. Any ideas what those codes are and how to keep my diacritics?
As you figured out, you need to set the correct encoding. Unfortunately, the text file was written using a legacy encoding rather than UTF-8, namely MacRoman. Ideally the application producing the file would not use this encoding, and Apple products no longer produce it by default.
But since this is what you've got, we have to deal with it, and we can. Unfortunately we need to take a detour, because the encoding argument of readLines is a bit useless. Instead, we need to manually open a file connection:
con = file(file.path("my_texts", "text1"), encoding = "macintosh")
on.exit(close(con)) # Always make sure to close connections!
contents = readLines(con)
Do note that the encoding name “macintosh” is strictly speaking not portable, so this might not work on all platforms.
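One optional check (my addition, not part of the answer above): you can ask your platform which encoding names it supports before relying on "macintosh":

"macintosh" %in% iconvlist()  # TRUE if this iconv implementation knows the name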
This is more of a tip question, one that can save lots of time in many cases. I have a script.R file which I try to save, and I get the error:
Not all of the characters in ~/folder/script.R could be encoded using ASCII. To save using a different encoding, choose "File | Save with Encoding..." from the main menu.
I had been working on this file for months, and today I was editing my code like crazy when I got this error for the first time, so obviously I inserted a character today that cannot be encoded.
My question is: can I track down this specific character and find where exactly in the document it is?
There are about 1000 lines in my code, so it's almost impossible to search it manually.
Use tools::showNonASCIIfile() to spot the non-ASCII characters.
Let me suggest two slight improvements to this.
Process:
Save your file using a different encoding (e.g. UTF-8).
Set a variable f to the name of that file, something like f <- "yourpath\\yourfile.R".
Then use tools::showNonASCIIfile(f) to display the faulty characters.
Something to check:
I have a Markdown file which I run to output to a Word document (not important).
Some of the packages I use at initialisation overload previous functions. I have found that the warning messages sometimes contain non-ASCII characters, and this seems to be what caused the message for me; some fault put all that output at the end of the file, and I had to delete it anyway!
Check whether characters are coming back from warnings!
Cheers
Expanding the accepted answer with this answer to another question: to check for offending characters in the script currently open in RStudio, you can use this:
tools::showNonASCIIfile(rstudioapi::getSourceEditorContext()$path)