Error in openxlsx::read.xlsx/download.file after updating R to 4.2.1

After updating my Rgui to version 4.2.1, I found an encoding issue which was apparently solved by "Problem with opening scripts in updated version of R". Briefly: my R files were opening empty, and the solution was simply to change the default encoding to UTF-8.
But another issue came with it (or was just there from the beginning). When I tried to read an xlsx file directly from a URL, I got this error message:
In download.file(url = xlsxFile, destfile = tmpFile, cacheOK = FALSE, :
URL 'https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2020.xlsx': Timeout of 100000 seconds was reached
When I went back to R 4.0.5, the problem disappeared!
Reproducible example
library(openxlsx)
url_maddison <- "https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2020.xlsx"
read.xlsx(url_maddison, sheet = 3)
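One thing worth trying (a sketch, not a confirmed fix from this thread): perform the download yourself in binary mode and read the local copy, which also lets you experiment with download.file's method argument independently of openxlsx:
library(openxlsx)
url_maddison <- "https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2020.xlsx"
tmp <- tempfile(fileext = ".xlsx")
# mode = "wb" avoids corrupting the binary xlsx file on Windows;
# try method = "libcurl" or "wininet" to isolate where the timeout comes from
download.file(url_maddison, tmp, mode = "wb")
mpd <- read.xlsx(tmp, sheet = 3)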

Related

How to remove a 'permission denied' file from folder within R

I am downloading a large xlsx file as part of a function. The file is removed with file.remove() on Linux and Mac, but I get 'permission denied' on Windows machines. Below is the code from my function.
download.file(
  'http://mirtarbase.mbc.nctu.edu.tw/cache/download/7.0/miRTarBase_MTI.xlsx',
  'miRTarBase.xlsx', mode = "wb")
readxl::read_excel('miRTarBase.xlsx') -> miRTarBase
write.csv(miRTarBase, 'miRTarBase.csv')
read.csv('miRTarBase.csv', row.names = 1) -> miRTarBase
file.remove("miRTarBase.xlsx")
I get the following warning message in my console:
Warning message:
In file.remove("miRTarBase.xlsx") :
cannot remove file 'miRTarBase.xlsx', reason 'Permission denied'.
Again, this warning only appears on Windows.
Furthermore, after checking the properties of the file itself, the 'Read-only' attribute is unchecked.
Following this, the code below works perfectly fine, so I do not think the issue is with the folder either.
file.remove("miRTarBase.csv")
I believe the issue lies in how .xlsx files are treated on Windows.
When I try to delete the .xlsx file while RStudio is still running, I get a 'File in use' warning message. After closing the R session, the .xlsx file can be deleted with no hassle.
This has confused me because I am not used to working with Windows. Has anyone had this issue before? I would appreciate any help that can be given. Many thanks.
Have you tried saving it as a temporary file on Windows?
tmp <- tempfile()
download.file(
  'http://mirtarbase.mbc.nctu.edu.tw/cache/download/7.0/miRTarBase_MTI.xlsx',
  tmp, mode = "wb")
readxl::read_excel(tmp) -> miRTarBase
write.csv(miRTarBase, 'miRTarBase.csv')
read.csv('miRTarBase.csv', row.names = 1) -> miRTarBase
file.remove(tmp)  # the session's temp directory is cleaned up on exit anyway
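Since the download happens inside a function, another option (a sketch; the wrapper name is made up) is to tie the cleanup to the function's exit with on.exit(), so the temp file is removed even if reading fails:
get_mirtarbase <- function() {
  tmp <- tempfile(fileext = ".xlsx")
  on.exit(file.remove(tmp), add = TRUE)  # runs even if read_excel() errors
  download.file(
    'http://mirtarbase.mbc.nctu.edu.tw/cache/download/7.0/miRTarBase_MTI.xlsx',
    tmp, mode = "wb")
  readxl::read_excel(tmp)
}
miRTarBase <- get_mirtarbase()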

write_csv Error in stream_delim_(df, path, ...) : invalid connection

I am wondering why write_csv is not working in the following example while write.csv works properly. In general, I use the "dplyr" and "readr" packages because they are fast, so I am not sure why this is happening.
write_csv(df1, path = "C:\\Users\\Sergio\\final\\file1.csv" ,col_names = TRUE)
I get the following error message:
Error in stream_delim_(df, path, ...) : invalid connection
However, if I write the following code, it works ok.
write.csv(df1, file = "file1.csv", fileEncoding="UTF-8")
I am using R version 3.4.1 (64-bit) and RStudio version 1.0.143. My OS is Windows 10. I would appreciate your comments. Thanks in advance.
You may have misspecified the file path. To check, make sure both commands point to the same directory:
write.csv(df1, file = "C:\\Users\\Sergio\\final\\file1.csv", fileEncoding="UTF-8")
If that also fails, you have misspecified the directory.
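A quick way to test that hypothesis (a sketch; the path is the one from the question):
dir_path <- "C:/Users/Sergio/final"        # forward slashes also work on Windows
if (!dir.exists(dir_path)) {
  dir.create(dir_path, recursive = TRUE)   # create the directory if it is missing
}
readr::write_csv(df1, file.path(dir_path, "file1.csv"))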

Error downloading GPL file with getGEO

Using OS X 10.11 and R 3.3.0, I get this error using the GEOquery package:
library(GEOquery)
GSE56045 <- getGEO("GSE56045")
It downloads the GSE file but not the GPL:
Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL10558&form=text&view=full'
It looks like the GPL file's URL was redirected, and the download method that GEOquery sets by default, options('download.file.method.GEOquery' = 'auto'), fails to follow the redirect.
I was able to get it working by running this in R:
options('download.file.method.GEOquery' = 'libcurl')
Also, I had to delete the previously downloaded GPL file, which contained just the redirect message. It is easier to set a download directory with the destdir argument of getGEO than to hunt down the temp file.
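Putting both fixes together (a sketch; the cache directory name is arbitrary):
library(GEOquery)
options('download.file.method.GEOquery' = 'libcurl')
dir.create("geo_cache", showWarnings = FALSE)  # persistent download directory
# remove any stale, partially downloaded GPL file from geo_cache first, then:
GSE56045 <- getGEO("GSE56045", destdir = "geo_cache")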

Warning message In download.file: download had nonzero exit status

I am downloading data from the data.gov website and I get the following two types of errors in the process:
fileUrl <- "http://catalog.data.gov/dataset/expenditures-on-children-by-families"
download.file(fileUrl,destfile=".data/studentdata.csv",method="curl")
Warning message:
In download.file(fileUrl, destfile = ".data/studentdata.csv", method = "curl") :
download had nonzero exit status
I tried to remove the method="curl" as suggested in other forum, but again I get this new error
download.file(fileUrl,destfile=".data/studentdata.csv")
Error in download.file(fileUrl, destfile = ".data/studentdata.csv") :
cannot open destfile '.data/studentdata.csv', reason 'No such file or directory'
I think there are two major reasons why your curl method doesn't work.
First, the problem is with your URL. fileUrl <- "http://catalog.data.gov/dataset/expenditures-on-children-by-families" does not refer to a csv file, so the download won't produce one even if you set the destination to a csv file such as destfile = ".data/studentdata.csv".
Here is an example of downloading a csv dataset using the same code (a different dataset):
DataURL <- "https://data.baltimorecity.gov/api/views/dz54-2aru/rows.csv?accessType=DOWNLOAD"  # this link refers to a rows.csv file
download.file(DataURL, destfile = "./data/rows.csv", method = "curl")  # same approach, using curl
Second, I previously had the same problem that the curl method did not work even with a proper URL referring to a csv file. When I dug a bit deeper, I found an interesting fact about why my curl method could not work properly: it was my R session. I was using 32-bit R, in which the error occurs. When I switched the session to 64-bit R, the download ran fine. To see your R session architecture (whether you are using 32-bit or 64-bit R), type in your R console:
sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
You have to switch your R from 32-bit to 64-bit to avoid the 'curl' call had nonzero exit status error. Go to your R installation directory and run the 64-bit R.
If you are using Windows and installed R in the default path, you can run C:\Program Files\R\R-3.5.3\bin\x64\R.exe (I used version 3.5.3, so the path may differ for your version).
If you are using RStudio, you can switch the R session from the menu bar: Tools -> Global Options -> R version -> Change -> Use your machine's default version of R64 (64-bit) -> OK. Then restart RStudio.
However, this depends on your OS architecture: if you are running a 32-bit OS, you will have to find another way to solve this.
Looking at the code for download.file(...), if you specify method="curl" the function tries to use the curl shell command. If that command does not exist on your system, you will get the error above.
If you do not specify a method, the default is an internal R method, which evidently works on your system. In that case, the function is trying to write the file to .data/studentdata.csv, but evidently there is no .data directory. Try taking out the leading . so the path reads data/studentdata.csv.
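For example (assuming a data directory already exists in your working directory):
download.file(fileUrl, destfile = "data/studentdata.csv")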
When this download works, you will get a text/html file, not a csv file: your URL points to a web page, not a download link. That page does have a download link, but unfortunately it is a pdf, not a csv.
Finally, if your goal is to have the data in R (is it?), and if the link actually produces a csv file, you could more easily use
df <- read.csv(fileUrl)
If I'm not very much mistaken, you just have a simple typo here. I suspect you have a "data" directory, not a ".data" directory, in which case your only problem is that your destfile string needs to begin with "./data", not ".data".
I was having the same problem.
Then I realized that I had forgotten to create the "data" directory!
So try adding this above your fileUrl line to create the directory first:
if (!file.exists("data")) {
  dir.create("data")
}
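Incidentally, on R 3.2.0 and later, dir.exists() is the more direct test for a directory:
if (!dir.exists("data")) dir.create("data")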
Also, if you are running a Mac, you want to keep method="curl" when downloading an https file. I don't believe Windows has that problem, hence the suggestions to remove it.
Try this:
file <- 'http://catalog.data.gov/dataset/expenditures-on-children-by-families'
file <- read.csv(file)

Error when using getGEO() in package GEOquery

I'm running the following code in R:
library(GEOquery)
mypath <- "C:/Users/Farzin/Desktop/BIOC"
GDS1 <- getGEO('GDS1',destdir=mypath)
But I'm getting the following error:
Using locally cached version of GDS1 found here:
C:/Users/Farzin/Desktop/BIOC/GDS1.soft.gz
Error in read.table(con, sep = "\t", header = FALSE, nrows = nseries) :
invalid 'nlines' argument
Could anyone please tell me how I could get rid of this error?
I have had the same error using GEOquery (version 2.23.5) with R and Bioconductor on Ubuntu (12.04), whatever GDS file I queried. Could it be that the GEOquery package is faulty?
In my experience, getGEO is extremely finicky; I commonly experience issues connecting to the GEO server. If the connection fails during a download, getGEO leaves a partial file behind. Because that partial file is there, the next attempt uses the cached, partially downloaded copy and runs into the error you see (which is what you want, since the file is not complete).
To solve this, delete the cached SOFT file and retry the download.
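A minimal sketch of that fix, using the paths from the question above:
library(GEOquery)
mypath <- "C:/Users/Farzin/Desktop/BIOC"
file.remove(file.path(mypath, "GDS1.soft.gz"))  # delete the partial cached file
GDS1 <- getGEO('GDS1', destdir = mypath)        # re-download a complete copy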
