UTF-8 problems in RStudio

I am passing on my work with some R files to a colleague at the moment, and we are having a lot of trouble getting the files to work on his computer. Both the script and the data contain the Nordic letters, so to prevent this from being an issue we have made sure to save the R files with UTF-8 encoding.
Still, there are two problems. A solution to either one would be much appreciated:
Problem 1: when loading the standard CSV data file (semicolon-separated, which works on my computer), my colleague gets the following error:
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string 3
Instead, we have tried to make it work both with a CSV file that he has saved in UTF-8 format and with an Excel (xlsx) file. He can load both of these files fine (with read.csv2 and read_excel from the readxl package, respectively), and in both cases the data also looks fine to him when he opens it in R ("æ", "ø" and "å" are included).
The problem first appears when he tries to run the plots that actually have to grab and display values from the data columns where "æ", "ø" and "å" occur in the values. There, he gets the following error message:
in grid.call(c_textBounds, as.graphicAnnot(x$label), x$x, x$y, : invalid input 'value with æ/ø/å' in 'utf8towcs'
When I try to run the R script with the UTF-8 CSV data file (comma-separated) and open the data in a tab in RStudio, I can see that æ, ø and å are not written correctly (they show up as a bunch of weird signs). This is odd, considering that this type of CSV file should be the more robust one, yet I am having problems with it and not with the standard CSV file (the non-UTF-8, semicolon-separated one).
When I try to run the script with the xlsx file, it works completely fine for me. I get to the plot that has to display the data values with æ, ø and å, and it works without problems; I do not get the same error message.
Why does my colleague get these errors?
(We have also made sure that he has installed the Danish version of R from the CRAN website.)
We have tried all of the above.
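One thing that often resolves this exact pattern of errors is to state the file encoding explicitly in the read call rather than relying on each machine's platform default. A minimal sketch, assuming the semicolon-separated export is Windows-1252/latin1 (typical for Excel on a Danish Windows installation) and the other export really is UTF-8; the file names and the column name 'kommune' are placeholders:
# Compare this value on both machines; a mismatch here usually explains why
# the same file reads fine on one computer and fails on the other.
Sys.getlocale("LC_CTYPE")
# Plain semicolon-separated Excel export: usually Windows-1252 / latin1.
dat <- read.csv2("data.csv", fileEncoding = "latin1")
# Excel's "CSV UTF-8" export (comma-separated): declare UTF-8 instead.
dat_utf8 <- read.csv("data_utf8.csv", fileEncoding = "UTF-8")
# Quick sanity check that the Nordic letters survived the import.
head(unique(dat$kommune))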

Related

R version 4.2.0 and Swedish letters (ä ö å) not working in newest R release. Anyone found a solution?

I have updated to the latest R release (R version 4.2.0), but I am now facing the problem that none of the Swedish special letters can be read anymore. I am working with a database that has many Swedish letters in its factor labels, and even if I read them in as strings, R doesn't recognise them, with the consequence that all summary tables based on these factors as groups are no longer calculated correctly. The code worked fine under the previous release (but I had issues with knitting R Markdown files, hence the need to update).
I have set the encoding to ISO-8859-4 (which covers the Northern European languages) after UTF-8 did not work. Is there anything else I could try? Or has anyone found a way to fix this, other than renaming all the labels before reading in the .csv files? (I would really like to avoid that fix, since I often work with similar data.)
I have used read.csv() and it produces cryptic output, replacing the special letters with, for example, <d6> instead of ö and <c4> instead of ä.
I hope that someone has an idea for a fix. Thanks.
Edit: I use Windows.
Sys.getlocale("LC_CTYPE")
[1] "Swedish_Sweden.utf8"
Use the encoding parameter
I have been able to detect failed loads by attempting to apply toupper to strings, which gives me errors such as
Error in toupper(dataset$column) :
invalid multibyte string 999751
This is resolved and expected outcomes obtained by using
read.csv(..., encoding = 'latin1')
or
data.table::fread(..., encoding = 'Latin-1')
I believe this solution should apply to Swedish characters as they are also covered by the Latin-1 encoding.
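For completeness, here is roughly what the detection step and the two read calls look like end to end. This is only a sketch; the file name data.csv and the column name city are placeholders:
# A load that silently mangled the encoding often only surfaces once the
# strings are actually processed; toupper() is a cheap way to force that.
dat <- read.csv("data.csv")
try(toupper(dat$city))  # fails with "invalid multibyte string" on a bad load
# Re-read while declaring the true encoding of the file.
dat <- read.csv("data.csv", encoding = "latin1")
dat2 <- data.table::fread("data.csv", encoding = "Latin-1")
toupper(dat$city)  # now works; ä, ö and å come through intact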
I had the same problem; what worked for me was what the answer above said, but I used the encoding ISO-8859-1 instead. It works both for reading from a file and for saving to a file with the Swedish characters å, ä, ö, Å, Ä, Ö, i.e.:
read.csv("~/test.csv", fileEncoding = "ISO-8859-1")
and
write.csv2(x, file="test.csv", row.names = FALSE, na = "", fileEncoding = "ISO-8859-1")
It's tedious, but it works for now. Another tip, if you use RStudio, is to go to Global Options -> Code -> Saving, set your default text encoding to ISO-8859-1, and restart RStudio. If I understand correctly, it will then save and read your scripts in that encoding by default. I had the problem that my scripts with Swedish characters would display the wrong characters when I opened them; this solution fixed that.

R won't recognize updated CSV

This is a super basic question that I'm hoping someone can help me with (I'm very new to R, so my troubleshooting is remedial at best).
I noticed there were some spelling errors in my data, so I went back to the CSV file, made the changes, saved, closed it, and re-read the data using read.csv(). Everything showed up and worked as normal until I wanted to simply run a count of the entries in three of the columns (I've done this numerous times with the exact same code, the exact same CSV file, and the exact same working directory, with no spelling errors), but for whatever reason I got the following error message:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'FFS_TargetingEventsAllCSV2.csv': No such file or directory
I restarted everything, reset the working directory, double-checked the spelling, and used getwd(), but encountered the same problem.
So I decided to use an older backup version of the same dataset. I was able to read it in as normal and run my counts. However, I noticed similar spelling errors, so I went back to the CSV, made the necessary changes, and re-read the CSV. Everything looked normal until I ran the counts again and saw the exact same spelling errors, unchanged.
So I decided to start fresh and re-save the file (Save As) under a new name, new CSV, same working directory...same. exact. issue.
Every time I open the CSV file on my desktop, it shows me the most up-to-date version, but I can't figure out why R isn't recognizing any of the changes. I even made new, subtle spelling changes in a different column to see if that would make a difference, but no.
To clarify, the data in the three columns of interest are just 2-3 letters (e.g. FSP). It's only text, sometimes with a hyphen (e.g. HTBR-DA). I'm not trying to run any stats or tests; I just want summary statistics. I also updated R/RStudio last Thursday, so I have the most recent version of the software as well.
Any advice on this would be much appreciated.
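Since the message is "No such file or directory" rather than a parsing problem, the quickest way to narrow it down is to ask R what it can actually see from its working directory. A short sketch (the absolute path at the end is just a placeholder):
# Where is R actually looking, and which CSV files does it see there?
getwd()
list.files(pattern = "csv$", ignore.case = TRUE)
# Does the exact name R is asked to open exist in that directory?
file.exists("FFS_TargetingEventsAllCSV2.csv")
# If the edited file was saved somewhere else, point at it explicitly.
dat <- read.csv("C:/Users/me/Documents/FFS_TargetingEventsAllCSV2.csv")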

BlueSky Statistics - Character Encoding Problem

I am loading a data set whose characters are encoded in ISO 8859-9 ("Latin 5"), using the Windows 10 OS (Microsoft has assigned code page 28599, a.k.a. Windows-28599, to ISO-8859-9 in Windows).
The data set is originally in Excel.
Whenever I run an analysis, or any operation with a variable name containing a character specific to this code page (ISO 8859-9), I get an error like:
Error: undefined columns selected
BSkyFreqResults <- BSkyFrequency(vars = c("MesleÄŸi"), data = Turnudep_raw_data_5)
Error: object 'BSkyFreqResults' not found
BSkyFormat(BSkyFreqResults)
The two characters ÄŸ within "MesleÄŸi" are originally a single Turkish character, ğ (g with a breve).
Variable names that contain only letters from the US code page work normally in BlueSky operations.
If I use Save As in Excel with the web option UTF-8 to convert the data to UTF-8, this does not work either. If I export it to a CSV file, it does not work either, whether as-is or saved as UTF-8.
How can I load this data into BlueSky so that it works?
This same data set works in RStudio:
> Sys.getlocale('LC_CTYPE')
[1] "Turkish_Turkey.1254"
And also in SPSS:
Language is set to Unicode
Picture of Language settings in SPSS
It also works in Jamovi
I also get an error when I start BlueSky, that may be relevant to this problem:
Python-CFFI error
From cffi callback <function _consolewrite_ex at 0x000002A36B441F78>:
Traceback (most recent call last):
File "rpy2\rinterface_lib\callbacks.py", line 132, in _consolewrite_ex
File "rpy2\rinterface_lib\conversion.py", line 133, in _cchar_to_str_with_maxlen
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 15: invalid start byte
Since then I re-downloaded and re-installed BlueSky, but I still get this Python-CFFI error every time I start the software.
I want to work with BlueSky and will appreciate any help in resolving this problem.
Thanks in advance
Here is a link for reproducing the problem.
The zip file contains a data source of 2 cases in both Excel and BlueSky format, a BlueSky Markdown file showing how the error is produced, and an R Markdown file for redundancy (probably useless).
UPDATE: The Python error (Python-CFFI error) appears to be related to the Region settings in Windows.
If the region is USA (Turnudep_reprex_Windows_Region_USA-Settings.jpg) , the python error does NOT appear.
If the region is Turkey (Turnudep_reprex_Windows_Region_Turkey-Settings.jpg) the python error DOES appear.
Unfortunately, setting the region and language to USA eliminates the Python error message but not the other problem. All operations with the Turkish variable names still end in an error.
This may be a problem only the BlueSky developers may solve ...
Any help or suggestion will be greatly appreciated.
UPDATE FOR VERSION 10.2: The Python error (Python-CFFI error) is eliminated in this version. All the other problems persist. I also notice that I cannot change variable names that contain characters outside the US code page. That is, if a variable name is something like "HastaNo", I can run analyses with that variable and change its name in the editor. If the variable name is something like "Mesleği", I cannot run analyses with that variable AND I CANNOT CHANGE THAT NAME in the editor to "Meslegi" or anything else that would make it usable in analyses.
UPDATE FOR VERSION 10.2.1 (BlueSky Statistics Version 10.2.1, R package version 8.70):
No change from version 10.2. Variable names that contain a character outside ASCII cause an error AND cannot be changed in BlueSky Statistics.
For version 10, according to user manual chapter 15.1.3, you can adjust the encoding setting. (This answer has been edited for clarity.)
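If adjusting the encoding setting does not help, one workaround is to transliterate the offending column names to plain ASCII in R before the data ever reaches BlueSky. This is a sketch only, assuming the Excel file can be read with readxl; the file names are placeholders, and the result of the transliteration depends on the platform's iconv, so check the new names afterwards:
library(readxl)
dat <- read_excel("Turnudep_raw_data.xlsx")
# Convert names such as "Mesleği" to an ASCII approximation ("Meslegi")
# so that tools which choke on non-ASCII identifiers can handle them.
names(dat) <- iconv(names(dat), from = "UTF-8", to = "ASCII//TRANSLIT")
names(dat) <- make.names(names(dat), unique = TRUE)
# Write the cleaned data out and import this file into BlueSky instead.
write.csv(dat, "Turnudep_ascii_names.csv", row.names = FALSE)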

Warning message: In read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'hola.csv'

Problem: in R I get Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'hola.csv'
To simplify things, I created a basic table in Excel and saved it in all the .csv formats Excel offers (comma-separated values, CSV UTF-8, MS-DOS CSV, etc.), and the warning persists in all of them. I'm working on macOS 10.15 Catalina, Excel version 16.29.1 (2019).
I changed my laptop's region from Spain to UK, selecting "," for groups and "." for decimals, since some people here suggested the issue may be caused by some countries' locales using semicolons instead of commas in CSVs by default. After this, as expected, the CSVs are indeed comma-separated, but I still get the warning.
As suggested, if I open the file in TextEdit, press Enter at the end and then save it, R works perfectly and the warning disappears, but it does not seem practical or efficient to do that every single time I want to open a CSV. On the other hand, it remains a mystery to me why colleagues using the Mac UK configuration do not get this warning (nor do I when I open CSVs they have created on their laptops).
Could it be the Excel version? Should I just ignore the warning? (The table looks fine when I open it.) Thanks!
aq2<-read.csv("hola.csv")
That is a warning generated because R's read.table expects the final line of the file to end with an end-of-line character (a \n or \r\n). It's almost always an unnecessary warning; many programs, including Excel, will create files like that.
You should read the message carefully. It says incomplete final line found by readTableHeader. This refers to the last row of your .csv file and suggests that this line is incomplete as far as R is concerned. So what could be the problem? In a CSV (comma-separated values) file, each line is expected to follow the same formatting; check whether that formatting is applied consistently throughout the file, an issue that often pops up in hand-collected data. If you post an excerpt of your data using tail(aq2) (tail() ships with base R), we can look at the last lines and check the formatting in more depth. In the end, it is just a warning, not an error message, but it is still important to understand warnings.
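If the warning bothers you, you can also add the missing end-of-line character from R rather than editing each file by hand. A small sketch, reusing the file name from the question:
# writeLines() terminates every line, including the last one, with a newline,
# which is exactly what read.table is missing here.
txt <- readLines("hola.csv", warn = FALSE)
writeLines(txt, "hola.csv")
aq2 <- read.csv("hola.csv")  # no "incomplete final line" warning any more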

Unexpected input error in Shiny, but unable to locate the source of error

I am getting the unexpected input error in UI.R, as follows:
ERROR: C:\Users\myApp/ui.R:1:2: unexpected input
1: ï»
However, when I try to locate the error at line 1, there is absolutely nothing of the form ï».
To resolve this error, I tried saving my UI.R file as a text file and changing the encoding to UTF-8, but this still does not remove the strange character. I also tried removing the first couple of lines and re-writing the code, but it still gives the same error!
How can I remove this character? Should I use another text editor?
I am using base R, not RStudio. I copy-pasted my code from my GitHub account, if that information is relevant...
Code from my file can be viewed here.
Many thanks.
I had this same issue in 2019, and it took me a while to run into this question from 2014.
Not Shiny, but a regular R project with its .Rprofile.
The solution that worked for me is:
Open your file in Notepad++. From the Encoding menu, select Convert to UTF-8 (without BOM), save the file, and replace the old file with this new one. And all is fixed.
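Those stray ï» characters are the UTF-8 byte order mark (the bytes EF BB BF) being displayed in a single-byte encoding. If you would rather not install another editor, a rough way to strip the BOM from within R itself, assuming the file really does start with those three bytes:
strip_bom <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  # The UTF-8 BOM is the three-byte sequence EF BB BF at the very start of the file.
  if (length(bytes) >= 3 && identical(bytes[1:3], as.raw(c(0xEF, 0xBB, 0xBF)))) {
    writeBin(bytes[-(1:3)], path)
  }
  invisible(path)
}
strip_bom("ui.R")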
