Repeated data.table fread and fwrite causes "Permission denied" error - r

I encountered this issue using the data.table fwrite() and fread() functions for managing resources in a parallel calculation, but was also able to recreate the behavior in the below sequential example code. Calling fwrite() throws the following error:
Error in fwrite(dt, csv_path) : Permission denied: 'D:/mypath/test.csv'. Failed to open
existing file for writing. Do you have write permission to it? Is this
Windows and does another process such as Excel have it open?
The behavior seems to be related to the fread() call immediately before it: commenting out fread() makes the error disappear. Depending on your system, you might have to increase the number of iterations before the error occurs, because it shows up at a different iteration on each run.
Does anyone have an idea why this is happening? Thanks in advance for your assistance!
Example code:
library(data.table)
dt = data.table(a = c(1, 2), b = c("a", "b"))
csv_path = "D:/mypath/test.csv"
fwrite(dt, csv_path)
for(i in 1:10000){
test = fread(csv_path)
fwrite(dt, csv_path)
}
System info
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 14393)
data.table_1.12.8

I tried your code on a Windows machine and was not able to reproduce it.
I believe the issue is related to Windows file handling, which does not seem to be fast enough to close the file connection before it is opened again.
You can try the following code to see whether it is reproducible in plain R:
x = "a,b\n1,a\n2,b\n"
csv_path = "D:/mypath/test.csv"
file.create(csv_path)
f = file(csv_path, "w")
cat(x, file=f)
close(f)
for (i in 1:10000) {
f = file(csv_path, "r")
test = readLines(f)
close(f)
f = file(csv_path, "w")
cat(x, file=f)
close(f)
}
It could also make sense to see how long a Sys.sleep() between the read and the write is enough to make the problem disappear.
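To experiment with that idea, here is a minimal sketch of a retry wrapper that waits briefly between attempts. `retry_io` is a hypothetical helper name, and the attempt count and wait time are arbitrary choices. It uses base `read.csv()` and `write.csv()` on a temporary file so that it runs without extra packages, but the same wrapper could be put around `fread()` and `fwrite()`. It only works around a transient lock; it does not explain it.

```r
# Hypothetical helper: run an I/O action, retrying with a short pause on error.
retry_io <- function(action, attempts = 5, wait = 0.05) {
  for (k in seq_len(attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = action()),
      error = function(e) list(ok = FALSE, value = e)
    )
    if (result$ok) return(result$value)
    if (k < attempts) Sys.sleep(wait * k)  # wait a little longer each time
  }
  stop(result$value)  # re-raise the last error after the final attempt
}

csv_path <- tempfile(fileext = ".csv")
df <- data.frame(a = c(1, 2), b = c("a", "b"))
retry_io(function() write.csv(df, csv_path, row.names = FALSE))

for (i in 1:100) {
  test <- retry_io(function() read.csv(csv_path))
  retry_io(function() write.csv(df, csv_path, row.names = FALSE))
}
```

If the error only goes away with long waits, that would point to something other than a slow file handle, for example an antivirus scanner or a sync client holding the file.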

Determine the number of threads you're using for data tables with
data.table::getDTthreads()
I was receiving the same fread() error until I reduced this from 96 to 24 with
data.table::setDTthreads(threads = 24)
Other users have reported that fewer than 79 threads works. See "data.table crashes with segfault while grouping with more than 79 threads" (#5077).

Related

Error in makePSOCKcluster(names = spec, ...) : Cluster setup failed. 3 of 3 workers failed to connect

I am trying to recruit more cores to reduce the processing time for some lidar data I am analyzing, but I keep getting "Error in makePSOCKcluster(names = spec, ...) : Cluster setup failed. 3 of 3 workers failed to connect." after I run this:
library(doParallel)  # also attaches foreach and parallel
UseCores <- detectCores() - 1
cl <- makeCluster(UseCores)
registerDoParallel(cl)
foreach(i = 1:length(canopy_list)) %dopar% {
  library(raster)
  ttops <- vwf(CHM = canopy_test, winFun = lin, minHeight = 2, maxWinDiameter = NULL)
}
Why am I getting this error and what can I do to fix it?
This seems to be a problem related to recent versions of R. According to this issue on GitHub, there are two workarounds until a fix is released.
Either pass the setup strategy directly when creating the cluster:
cl <- parallel::makeCluster(2, setup_strategy = "sequential")
Or, for a longer-term solution, add the following to your ~/.Rprofile:
## WORKAROUND: https://github.com/rstudio/rstudio/issues/6692
## Revert to 'sequential' setup of PSOCK cluster in RStudio Console on macOS and R 4.0.0
if (Sys.getenv("RSTUDIO") == "1" && !nzchar(Sys.getenv("RSTUDIO_TERM")) &&
Sys.info()["sysname"] == "Darwin" && getRversion() >= "4.0.0") {
parallel:::setDefaultClusterOptions(setup_strategy = "sequential")
}
Although this workaround was written for RStudio users, it can be of more general use: it also helps with the tests on my registered GitLab runner.
Assuming you're working in RStudio, the problem was most likely this bug: changes in parallel created a conflict between R 4.0 and RStudio 1.3.something. Your code isn't a minimal working example so I can't check it on my end, but I just confirmed that cl = makeCluster(1) behaves as expected in RStudio 1.4.1103, R 4.0.2, Mac OS 10.15.7. So try updating RStudio and checking your code again.
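As a minimal sketch of the first workaround in context, the snippet below creates the cluster with the sequential setup strategy, runs a trivial job, and always stops the cluster afterwards. The worker count of 2 and the squaring task are arbitrary placeholders, not part of the original question, and the `setup_strategy` argument assumes R 4.0.0 or later.

```r
library(parallel)

# Two workers and the toy task are illustrative assumptions.
cl <- makeCluster(2, setup_strategy = "sequential")

squares <- tryCatch(
  parLapply(cl, 1:3, function(i) i^2),
  finally = stopCluster(cl)  # release the workers even if the job fails
)
```

If this runs but the default `makeCluster(2)` does not, that would suggest the parallel setup strategy is the culprit rather than the loop body.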

R rowsum in a data.table crashes R

The code below works fine
library(data.table)
dt <- data.table(mtcars)[,.(cyl, gear, mpg)]
colsToSum <- c("cyl", "gear", "mpg")
dt[, F15_49 := rowSums(.SD), .SDcols = colsToSum]
but a version of this crashes R in RStudio with the message "R Session Aborted. R encountered a fatal error. The session was terminated." Followed by a Start New Session button. The code snippet that crashes is
ageColsToSum <- c("F15_19", "F20_24", "F25_29", "F30_34", "F35_39", "F40_44", "F45_49")
dt.SSP.scen.wide[, F15_49 := rowSums(.SD), .SDcols = ageColsToSum]
When I run the code in R in a shell I get the following message.
OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Abort trap: 6
I don't know anything about OpenMP, so I don't know what could be initializing libomp.dylib.
I followed the directions at https://github.com/Rdatatable/data.table/wiki/Installation and installed the development version of data.table 1.10.5. My code now works.

R Studio running scripts improperly? [duplicate]

For the past couple of days, whenever I run a for loop in R I get plenty of errors related to "}". It only happens if I highlight the whole code and run it. If I execute it line by line, it runs just fine.
I tried even with the most basic loop:
foo <- seq(1, 100, by=2)
foo.squared <- NULL
for (i in 1:50 ) {
foo.squared[i] <- foo[i]^2
}
Here is the console:
> foo <- seq(1, 100, by=2)
"rror: unexpected input in "foo <- seq(1, 100, by=2)
> foo.squared <- NULL
"rror: unexpected input in "foo.squared <- NULL
> for (i in 1:50 ){
"rror: unexpected input in "for (i in 1:50 ){
> foo.squared[i] <- foo[i]^2
"rror: unexpected input in " foo.squared[i] <- foo[i]^2
> }
Error: unexpected '}' in "}"
>
Details of the R session (I run it in RStudio):
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
It has been very annoying!! I would appreciate any advice!!!
Thanks,
Maria
UPDATE:
Here is the code at the very beginning of the script that I suspect causes this problem. It is supposed to take a vector of names and extract the second element from each.
splitnames <- strsplit(as.character(train$Name),"[,.]")
firstelement <- function(x){x[2]}
sapply(splitnames,firstelement)
After I execute it R acts weird. Though I am not 100% sure.
I have been experiencing the same problem, and have found that it is caused by a bug in RStudio (the code runs fine in R and R-gui, but fails in RStudio.) It is hard to reproduce until something gets corrupted in RStudio's saved state, after which the behaviour is pretty consistent.
Removing ~/.rstudio-desktop fixed the issue for me.
mv ~/.rstudio-desktop ~/rstudio-desktop.old
More on resetting RStudio's state on various platforms here.
I suspect that the issue was in using the R script that I downloaded from a website. I ended up reinstalling R and saving my own R script as a new file. I am not sure what and how, but now it is working fine.
I am also using RStudio and get the same error message when running for loops.
Error: unexpected '}' in "}"
If I source the file, like this...
source('~/.active-rstudio-document')
or if I simply click the "source" button in the GUI, I don't get the same error message.
If sourcing the whole R script is not an option, consider copying the for loop to another file and sourcing that.
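One thing that may be worth checking, given the suspicion about a downloaded script, is whether the file contains stray carriage returns or other non-printing bytes. This is an assumption on my part and not something the answers above confirm. The sketch below reads a file as raw bytes and reports anything that is not a tab, a line feed, or printable ASCII; `find_stray_bytes` is a hypothetical helper name.

```r
# Hypothetical helper: list bytes in a script that are not tab, line feed,
# or printable ASCII, together with the line they occur on.
find_stray_bytes <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  codes <- as.integer(bytes)
  odd <- !(codes == 9L | codes == 10L | (codes >= 32L & codes <= 126L))
  # Line number = 1 + number of line feeds seen before the byte.
  data.frame(line = cumsum(codes == 10L)[odd] + 1L, byte = codes[odd])
}

# Demonstration on a temporary file whose first line ends in CR LF.
script <- tempfile(fileext = ".R")
writeBin(charToRaw("x <- 1\r\ny <- 2\n"), script)
stray <- find_stray_bytes(script)
print(stray)
```

Byte 13 is a carriage return. Note that legitimate non-ASCII characters in comments or strings will also be reported, so an empty result is informative but a non-empty one needs a closer look.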

Error in ls(envir = envir, all.names = private)?

The error below keeps coming up inconsistently when I try to read Excel files into R using the 'XLConnect' package.
Error in ls(envir = envir, all.names = private) :
invalid 'envir' argument
I have run into this error even when using other packages that read Excel files, such as 'xlsx' and 'xlsReadWrite'. Restarting the R session often solves the problem, which leads me to think that something else I am doing in the session changes the environment and prevents me from loading Excel files. Below is the latest example of code that causes this error. In this case I know that the following sequence makes the error appear, but why does that happen? And how can I get past this error if I need the chron package?
library("XLConnect")
wb2 <- loadWorkbook("excel_file", create = FALSE)
library(chron)
wb2 <- loadWorkbook("excel_file", create = FALSE)
Anyone else run into this issue before? Any help on this issue is greatly appreciated!
Before reopening the workbook, try removing the reference to the previously opened one:
rm(wb2)
wb2 <- loadWorkbook("excel_file", create = FALSE)
Also, make sure that "excel_file" is not open in Excel or any other program while you run the R test.
I've seen the same error come up when using XLConnect and the above seemed to help.
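I have had this problem a couple of times, and the call stack suggests that this message is generated when an "OutOfMemory" exception is thrown.
To solve this problem I used:
options(java.parameters = "-Xmx4g")
to increase the heap size rJava is able to use. The option has to be set before rJava (or XLConnect, which loads it) is attached, because the JVM reads it only when it starts.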
Debugging with options(error=utils::recover) helped a lot, because the R error messages are not very specific.
