gsub error message when addressing column in dataframe in RStudio - r

For the past couple of days I have been getting the following error message in RStudio from time to time and can't figure out what is causing it.
When I type a data.frame's name in the console window followed by $ to address a specific column (for example df$SomeVariable), the following message is shown in the console and is printed over and over with every letter I type:
Error in gsub(reStrip, "", completions, perl = TRUE) :
input string 38 is invalid UTF-8
The error message doesn't have any real effect. Everything works just fine except the automatic completion of the variable name.
I'm using R version 3.4.4 and RStudio version 1.0.143 on a Windows computer. In the R script I am currently working on, I don't use gsub or any other string or regular-expression function, for that matter. The issue has appeared with various data.frames and with various types of variables in the data.frames (numeric, integer, date, factor, etc.). It also happens with various packages; currently I am using combinations of readr, dplyr, plm, lfe, readstata13, infuser, and RPostgres. The issue disappears for a while after closing and reopening RStudio, but re-appears after working for a while.
Does anyone have an idea what may cause this and how to fix it?

I had the same problem a few days ago. I did some research and found that when you import the dataset, you can change the encoding. Change the encoding to "latin1" and maybe that will fix your problem. Sorry for my poor English, I'm from South America. Hope it works.
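If re-importing isn't convenient, a minimal sketch of repairing an already-imported data.frame instead (assuming, as above, that the bad bytes are really latin1 text; `validUTF8()` and `iconv()` are base R):

```r
# Find invalid UTF-8 strings in character columns and re-encode them,
# assuming the offending bytes are actually latin1-encoded text.
fix_utf8 <- function(df) {
  for (nm in names(df)) {
    if (is.character(df[[nm]])) {
      bad <- !validUTF8(df[[nm]])
      df[[nm]][bad] <- iconv(df[[nm]][bad], from = "latin1", to = "UTF-8")
    }
  }
  df
}

# "caf\xe9" contains the latin1 byte 0xE9, which is invalid UTF-8
df <- data.frame(SomeVariable = "caf\xe9", stringsAsFactors = FALSE)
df <- fix_utf8(df)
validUTF8(df$SomeVariable)  # TRUE after re-encoding
```

Once every string in the data.frame is valid UTF-8, the completion code that calls gsub(..., perl = TRUE) has nothing to choke on.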

Related

Bengali conjuncts not rendering in ggplot

ggplot(data=NULL,aes(x=1,y=1))+
geom_text(size=10,label="ক্ত", family="Kohinoor Bangla")
On my machine, the Bengali conjunct cluster "ক্ত" is rendered as its constituents plus a virama:
I have tried several different fonts to no avail. Is there a trick to making conjuncts render correctly?
EDIT:
Explicitly using the Unicode escapes still doesn't render correctly.
This renders correctly for me:
print(stringi::stri_enc_toutf8("\u0995\u09cd\u09a4"))
This still gives me the exact same result as before
ggplot(data=NULL,aes(x=1,y=1))+
geom_text(size=10,label="\u0995\u09cd\u09a4", family="Kohinoor Bangla")
Why is there a difference between the console output and ggplot output?
I'm not familiar with the Bengali language, but if you look up the Unicode code points for the text that you want to render, you can simply use those in geom_text():
# According to the Unicode code chart, these are some Bengali characters:
# U+0994 (row U+099x, column 4)
# U+09E3 (row U+09Ex, column 3)
ggplot(data=NULL,aes(x=1,y=1))+
# Substitute 'U+' by '\u', leave the 'x' out
geom_text(size = 10, label = "\u0994\u09E3")
Substitute the unicode characters as you see fit.
Hope that helped!
EDIT: I tried your last piece of code, which gave me a warning about the font not being installed. So I ran it without the family = "Kohinoor Bangla":
ggplot(data=NULL,aes(x=1,y=1))+
geom_text(size=10,label="\u0995\u09cd\u09a4")
Which gave me the following output:
From a visual comparison with the character that you posted, it looks quite similar. Next, I ran the same piece of code on my work computer, which gave me the following output:
The difference between work and home is that work runs on Linux while home runs on Windows; work has R 3.4.4, home has R 3.5.3. Both are in RStudio, and both use ggplot2 3.2.0. I can't update R at work because of backwards-compatibility issues, so I can't check whether the version of R is the problem there. However, you could check whether your version of R is older than 3.5.3 and see if updating relieves the problem. Otherwise, I would guess it is a platform issue.

Problems with unicode in R

I am having problems with printing unicode characters in R. Initially the problems started with me trying to plot some custom labels with ggplot, but I have found out that this problem runs deeper.
For example, the letter Ā is represented by the Unicode code point U+0100. This means that if I type
"\u0100"
in the console, I should get Ā printed as output, right? However, this is not what happens. Instead, as the output I get:
<U+0100>
I don't understand why this is happening since my encoding is set to UTF-8. Does anyone have any ideas on how to solve this?
I tried using the following function:
stri_escape_unicode("<U+0100>")
but I just get the same output as before. Any help would be appreciated! I am using a MacBook Pro from 2013.
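For what it's worth, the `<U+0100>` display usually means the string itself is stored correctly and only print() is falling back to an escaped representation; a quick sketch to check (base R, plus stringi as in the question):

```r
x <- "\u0100"   # U+0100, LATIN CAPITAL LETTER A WITH MACRON
Encoding(x)     # "UTF-8" -- the string is stored correctly
nchar(x)        # 1, so nothing was lost on input
cat(x, "\n")    # cat() often renders it even when print() shows <U+0100>

# Note that stri_escape_unicode() goes the other way: it *produces* the
# escape sequence. To turn an escape back into the character, unescape it:
if (requireNamespace("stringi", quietly = TRUE)) {
  stringi::stri_unescape_unicode("\\u0100")
}
```

If cat() renders the character but print() doesn't, the problem is the console/locale output path rather than the string's encoding.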

XLConnect 'envir' error

I manage a number of Excel reports, and I use R to do the preprocessing and write the output report. It's great because all I have to do is run the R function and distribute the reports, and the rest of the report writing is inactive time. The reports need to be in Excel format because it is the easiest to disseminate and the audience is large and non-technical. Once the data is pre-processed, I do this very, very simply using XLConnect:
file.copy(from = template,
          to = newFileName)
writeWorksheetToFile(file = newFileName,
                     data = newData,
                     sheet = "Data",
                     clearSheets = TRUE)
However, one of my reports began throwing this error when I attempted to write the new data:
Error in ls(envir = envir, all.names = private) :
invalid 'envir' argument
Furthermore, before throwing the error, the function ties up R for 15 minutes. The normal writing time is less than 10 seconds. I must confess, I don't understand what this error even means, and it did not succumb to my usual debugging methods or to any other SO solution.
I've noticed that others have referred to rJava (reinstalling this package didn't work) and to a Java cache of log files (not sure where this would be located on Mac). I'm especially confused as the report ran with no problems just one day earlier using precisely the same process, AND my other reports using the exact same process still work just fine.
I didn't update Java or R or my OS, or debug/rewrite any of the R code. So, starting from the beginning - how can I investigate this 'envir' error? What would you do if you were in my shoes? I've been working on this for a couple days and I'm stumped.
I'm happy to provide extra information if it will provide better context for more discerning programmers than myself :)
Update:
My previous answer (below) did not, in fact, fix this intermittent error (which as the OP points out is extremely difficult to unpick due to the Java dependency). Instead, I followed the advice given here and migrated from the XLConnect package to openxlsx, which sidesteps the problem entirely.
Previous answer:
I've been frustrated by precisely this error for a while, including the apparent intermittency and the tying up of R for several minutes when writing a workbook.
I just realised what the problem was: the length of the name of an Excel worksheet appears to be limited to 31 characters, and my R code was generating worksheet names in excess of this limit.
Just to be clear, I'm referring to the names of the individual tabbed sheets within an Excel workbook, not the filename of the workbook itself.
Trimming each worksheet name to no more than 31 characters fixed this error for me.
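A minimal sketch of the trimming (strtrim() is base R; the 31-character cap is Excel's limit on sheet names, and the sheet name below is just an illustrative example):

```r
# Excel caps worksheet names at 31 characters; longer names can make
# XLConnect fail with the opaque 'envir' error instead of a clear message.
safe_sheet_name <- function(name, max_len = 31L) {
  strtrim(name, max_len)
}

safe_sheet_name("Quarterly Regional Sales Summary 2016")         # 37 chars in
nchar(safe_sheet_name("Quarterly Regional Sales Summary 2016"))  # 31 chars out
```

Running generated sheet names through a helper like this before writeWorksheetToFile() keeps the error from recurring.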

plyr's rename() working in R Studio if run manually, but not in source(), or native R client

I am using plyr to rename the columns of a large data set to shorter aliases. The from names are very long, with occasional unusual symbols (e.g. Â). This code works in RStudio when I execute it manually (i.e. Ctrl+R). No errors.
However, when it is run via source() in another script and/or in the standard Rgui (even using Ctrl+R), it recognizes some of the names in the from statement but not others, which are identified in the error:
The following from values were not present in x
32/64 bit doesn't seem to make a difference. Can't identify character or value that is producing the error. Any solutions?
Should this be posted as an issue on the plyr Github?
I have prepared a dummy replica of the data set here.
The program that works in R Studio, but not in standard Rgui is here.
The code for the "source" call that produces errors is
source("dftest.R")
All software and packages updated on 3/18/2016.
See similar but unrelated question here.
Looks like copying the code from R Studio text editor and pasting into native client text editor, and then saving, solved both the source problem and the native client problem.
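Given the unusual symbols like Â in the from names, this looks like an encoding mismatch between the file as saved by RStudio (UTF-8 by default) and how the native R client reads it. If re-saving ever stops working, an alternative worth trying (a guess, but cheap to test) is telling source() the encoding explicitly:

```r
# Read the script as UTF-8 regardless of the native locale, so the
# unusual characters in the 'from' names survive the round trip.
source("dftest.R", encoding = "UTF-8")
```

With the encoding declared, both RStudio and Rgui should parse the from names identically.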

Use of fread() from data.table causes R session to abort

I am working on a project for a MOOC, and was tinkering around with the data.table package in RStudio. Use of the fread() function to import the data files initially worked fine:
fread("UCI HAR Dataset/features.txt")->features
fread("UCI HAR Dataset/test/y_test.txt")->ytest
However, when I tried to run the following line of code, I received a pop-up that said "R Session Aborted: R encountered a fatal error. The session was terminated."
fread("UCI HAR Dataset/test/X_test.txt")->xtest
I don't understand what the problem is. I checked the file names and paths to make sure I had correctly spelled and capitalized everything, and it all checks out. The equivalent code using read.table() works fine and does not cause R to abort. I also tried renaming the file to "x_test.txt", but the same issue occurred.
According to ?fread, the function will only work with "regular delimited files." As far as I can tell, the file is a "regular delimited file," in that all rows have the same number of columns. There are no cells containing NA when I use read.table instead; I checked using anyNA(). Is there a quick way to determine whether a file is "regularly" delimited or not? Is there something else about the original file that could be causing the problem?
UPDATE
After further research and searching through the reported issues listed on the developer's github, I think that my problem lies in having two white spaces at the beginning of each row, which is discussed here. I am unsure why R aborted instead of giving me a warning. The latest development version of data.table (1.9.5) isn't causing the session to abort under the same conditions, though.
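If you suspect you are hitting the same leading-whitespace bug, checking the installed version before re-running the crashing call is a cheap first step (a sketch; 1.9.5 was the development version at the time):

```r
# The crash with leading whitespace was reportedly fixed in data.table
# 1.9.5, so anything older is a candidate for updating first.
if (packageVersion("data.table") < "1.9.5") {
  update.packages(oldPkgs = "data.table", ask = FALSE)
}
```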
Although I do believe you should contact the package maintainer first in any situation where the R session aborts (and it was not due to your mucking with C code), I can offer a strategy for your last request. It is not really specific to fread, but I've found it useful with the regular read.* functions. I'm going to assume that this is a comma-separated file, but if it's whitespace-separated you could change sep = "," to sep = "".
filcnts <- count.fields("UCI HAR Dataset/test/X_test.txt", sep=",")
table(filcnts)
That should be a single-item table. If not, try switching parameters such as quote, sep, blank.lines.skip, or comment.char.