Error when trying to read excel file from web site - r

I'm trying to download the xlsx file that is available at the following url. If you go to the website and click the link, it will download as a file on your computer. However, I want to automate this process. I have tried the following:
library(RCurl)
download.file("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx", "temp.xlsx")
library(readxl)
tmp <- read_xlsx("temp.xlsx")
# Error: Evaluation error: error reading from the connection.
This method does download a temp.xlsx file to my drive. However, if you try to open it manually by clicking on it, Excel fails to open it. It knows its size, but is unable to open it.
readxl::read_xlsx("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx")
# Error: `path` does not exist: ‘https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx’
Both of these methods are my go-to for downloading excel files from websites. Is there some specific reason why these methods don't work here?

When downloading certain file formats on Windows, you need to specify a binary transfer rather than the (usual) default text transfer. From the download.file() documentation:
The choice of binary transfer (mode = "wb" or "ab") is important on
Windows, since unlike Unix-alikes it does distinguish between text and
binary files and for text transfers changes \n line endings to \r\n
(aka ‘CRLF’).
On Windows, if mode is not supplied (missing()) and url ends in one of
.gz, .bz2, .xz, .tgz, .zip, .rda, .rds or .RData, mode = "wb" is set
such that a binary transfer is done to help unwary users.
Code written to download binary files must use mode = "wb" (or "ab"),
but the problems incurred by a text transfer will only be seen on
Windows.
In this case so that the file is written correctly use:
download.file("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx",
"temp.xlsx", mode = "wb")

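To see why a text-mode transfer damages a binary file such as an .xlsx, the line-ending translation described in the documentation quoted above can be imitated on a raw vector. This is only an illustration of the effect, not what download.file() does internally; the helper name simulate_text_mode is made up for this example.

```r
# Hypothetical helper: imitate a text-mode write by turning every
# LF byte (0x0a) into CR LF (0x0d 0x0a), as the documentation describes.
simulate_text_mode <- function(bytes) {
  out <- unlist(lapply(as.integer(bytes), function(b) {
    if (b == 10L) c(13L, 10L) else b
  }))
  as.raw(out)
}

# A few bytes of "binary" data that happen to contain 0x0a twice.
original <- as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x0a, 0x00, 0x0a))
damaged  <- simulate_text_mode(original)

length(original)              # 7
length(damaged)               # 9: two extra CR bytes were inserted
identical(original, damaged)  # FALSE
```

Any byte that happens to equal 0x0a gets an extra byte in front of it, which is enough to break a compressed container, while the file size still looks plausible.
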
Related

Downloading and unzipping GitHub zipped files directly in R

I am trying to download and unzip a folder of files from GitHub into R. I can manually download the file at https://github.com/dylangomes/SO/blob/main/Shape.zip and then extract all files in working directory, but I'd like to work directly from R.
utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip")
# Warning message:
# In utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip", :
# error 1 in extracting from zip file
It says it is a warning message, although nothing has been downloaded or unzipped into my wd.
I can download the file to my machine:
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip", "Shape.zip")
But I get the same message with the unzip function:
utils::unzip("Shape.zip")
And the downloaded file cannot manually be extracted. Here, I get the error that the compressed folder is empty. The unzip line works on the manually downloaded .zip file, which tells me something is wrong with the download.file line.
So if I add raw=TRUE to the end (which can make a difference in downloading data from GitHub):
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE","Shape.zip")
utils::unzip("Shape.zip")
I get a different warning with, similarly, nothing being executed:
Warning message:
In utils::unzip("Shape.zip") : internal error in 'unz' code
I have tried most of the answers at Using R to download zipped data file, extract, and import data, but they appear to be for single files that are zipped and aren't helping here. I've tried the answers at r function unzip error 1 in extracting from zip file, which mentions the same warning message I am getting, but none of the solutions work in this case.
Any idea of what I am doing wrong?
You need to use:
download.file(
"https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE",
"Shape.zip",
mode = "wb"
)
Without the query string ?raw=TRUE you are downloading the webpage and not the file.
(For Windows) R will use mode = "wb" by default when it detects from the end of the URL that certain file formats, including .zip, are being downloaded. However, the URL finishing with a query string instead of a file format means the check fails so you need to set the mode explicitly.
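The extension rule can be approximated in a few lines to show why the query string defeats it. This is a rough sketch based only on the list of extensions in the download.file() documentation, not R's actual internal check, and the function name url_triggers_auto_wb is hypothetical.

```r
# Extensions for which the documentation says mode = "wb" is set
# automatically on Windows when `mode` is not supplied.
auto_binary_extensions <- c(".gz", ".bz2", ".xz", ".tgz", ".zip",
                            ".rda", ".rds", ".RData")

# Hypothetical approximation: does the URL end in one of those extensions?
url_triggers_auto_wb <- function(url) {
  any(endsWith(url, auto_binary_extensions))
}

url_triggers_auto_wb("https://github.com/dylangomes/SO/blob/main/Shape.zip")
# TRUE
url_triggers_auto_wb("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE")
# FALSE: the URL now ends in "?raw=TRUE", so mode must be set by hand
```
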

Read open excel file in R

Is there a way to read an open Excel file into R?
When a file is open in Excel, Excel puts a lock on it, so that the reading method in R cannot access the file.
Can you circumvent this lock?
Thanks
Edit: this occurs under windows with original excel.
I don't have a problem opening xlsx files that are already open in Excel either, but if you do, I have a workaround that might work:
path_to_xlsx <- "C:/Some/Path/to/test.xlsx"
temp <- tempdir()
file.copy(path_to_xlsx, to = paste0(temp, "/test.xlsx"))
df <- openxlsx::read.xlsx(paste0(temp, "/test.xlsx"))
This copies the file (which should not be blocked) to a temporary directory and then loads it from there. Again, I'm not sure whether this is needed, as I don't have the problem you describe.
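The same idea can be wrapped in a small function that checks whether the copy succeeded and cleans up afterwards. This is a sketch; read_via_copy and its reader argument are made-up names, and whether the copy gets around an Excel lock on a given machine is an assumption.

```r
# Hypothetical helper: copy `path` to a temporary file and call `reader`
# on the copy, so R never opens the original file directly.
read_via_copy <- function(path, reader) {
  ext <- tools::file_ext(path)
  tmp <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
  on.exit(unlink(tmp), add = TRUE)   # remove the copy when done
  if (!file.copy(path, tmp, overwrite = TRUE)) {
    stop("Could not copy ", path, " (it may be locked or missing)")
  }
  reader(tmp)
}

# Usage (assumes the openxlsx package is installed):
# df <- read_via_copy("C:/Some/Path/to/test.xlsx", openxlsx::read.xlsx)
```
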
You could try something like this using the ps package. I've used it on Windows and Mac to read from files that I had downloaded from some web resource and opened in Excel with openxlsx2, but it should work with other packages or programs too.
# get the path to the open file via the ps package
library(ps)
p <- ps()
# get the pid for the current program, in my case Excel on Mac
ppid <- p$pid[grepl("Excel", p$name)]
# get the list of open files for that program
pfiles <- ps_open_files(ps_handle(ppid))
pfile <- pfiles[grepl(".xlsx", pfiles$path),]
# return the path to the file
sel <- grepl("^(.|[^~].*)\\.xlsx", basename(pfile$path))
path <- pfile$path[sel]
What do you mean by "the reading method in R", and by "cannot access the file" (i.e. what code are you using and what error message do you get exactly)? I'm successfully importing Excel files that are currently open, with something like:
dat <- readxl::read_excel("PATH/TO/FILE.xlsx")
If the file is being edited in Excel, R imports the last saved version.
EDIT: I've now tried it on both Linux and Windows and it still works, at least with version 1.3.1 of 'readxl'.

R: error downloading shapefiles with download.file() and opening them with readRDS() [duplicate]

I was getting hung up on Shiny Apps Tutorial Lesson 5 because I was unable to open the counties.rds file. readRDS() threw: error reading from connection.
I figured out I could open the .rds fine if I downloaded it with download.file(URL, dest, mode = "wb") or simply used my browser to download the file to my local directory.
Outstanding Question: Why does the counties.rds file not open properly if I use download.file() without setting mode = "wb"? I expect the answer will be something obvious like: "Duh, counties.rds is a binary file." However, before I try to answer my own question, I'd like confirmation from someone with more experience.
Repro steps:
download.file("http://shiny.rstudio.com/tutorial/lesson5/census-app/data/counties.rds",
"counties.rds")
counties <- readRDS("counties.rds")
Error in readRDS("counties.rds") : error reading from connection
Resolution: Download via browser or use binary mode (wb).
download.file("http://shiny.rstudio.com/tutorial/lesson5/census-app/data/counties.rds",
"counties.rds", mode = "wb")
counties <- readRDS("counties.rds") # Success!
My suggestion is to always specify 'mode' explicitly; it is pretty safe to always use mode="wb". I would argue that the latter should be the default, and that the automatic recognition by file extension is faulty and should not be relied upon, cf. https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html
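One way to follow that advice is a thin wrapper that always passes mode = "wb". This is a minimal sketch; download_binary is a hypothetical name, not part of any package.

```r
# Hypothetical wrapper: always request a binary transfer, whatever the
# URL looks like, and stop if download.file() reports a non-zero status.
download_binary <- function(url, destfile, ...) {
  status <- utils::download.file(url, destfile, mode = "wb", ...)
  if (status != 0) stop("Download failed with status ", status)
  invisible(destfile)
}

# Usage, with the URL from the question:
# download_binary(
#   "http://shiny.rstudio.com/tutorial/lesson5/census-app/data/counties.rds",
#   "counties.rds"
# )
# counties <- readRDS("counties.rds")
```
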

Error in gzfile(file, "wb"): cannot open the connection or compressed file

I'm trying to run two things: first, I'm creating a PDF with 4x5, ending with dev.off(), and then trying to create a new graph. However, after starting the second plot, I get:
Error in gzfile(file, "wb") : cannot open the connection
In addition: Warning message:
In gzfile(file, "wb") :
cannot open compressed file '/var/folders/n9/pw_dz8d13j3gb2xgqb6rfnz00000gn/T/RtmpTfm1Ur/rs-graphics-822a1c83-b3fd-46c3-8028-4e0778f91d0c/4db4b438-ac35-403b-b791-e781baba152c.snapshot', probable reason 'No such file or directory'
Graphics error: Error in gzfile(file, "wb") : cannot open the connection
What is this error? The working directory is one I have read/write access to, and my hard drive isn't full.
Also, I'm using RStudio.
This is a bit late but for anyone coming here for help, I got this error when I was trying to write a file from RStudio and my destination file path was very long. I realized this could be a problem because when I wrote the file to another location with a shorter name and tried to copy it into my original destination, Windows gave me an error saying "File path too long". You might need to save the original file into another location with a shorter absolute path.
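A quick way to test this theory is to measure the absolute path before writing. The sketch below assumes the traditional Windows limit of roughly 260 characters; path_too_long is a made-up helper, and R itself does not report this cause.

```r
# Hypothetical check: is the absolute path at or beyond `limit` characters?
# The default of 260 is an assumption based on the classic Windows limit.
path_too_long <- function(path, limit = 260L) {
  # Resolve the directory part (the file itself may not exist yet).
  dir_abs <- normalizePath(dirname(path), winslash = "/", mustWork = FALSE)
  nchar(file.path(dir_abs, basename(path))) >= limit
}

# Usage:
# path_too_long("some/deeply/nested/folder/output.rds")
```
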
Maybe you should look here. At the end it says
Note:
The most common reason for failure is lack of write permission in the current directory. For save.image and for saving at the end of a session this will shown by messages like
Error in gzfile(file, "wb") : unable to open connection
In addition: Warning message:
In gzfile(file, "wb") :
cannot open compressed file '.RDataTmp',
probable reason 'Permission denied'
So, as a quick check, run getwd() and see where your working directory is set. A relative file path is resolved against that directory, so if the working directory is not where you think it is, saving will throw this error.
At the end of your error message, it says probable reason 'No such file or directory'
Graphics error: Error in gzfile(file, "wb") : cannot open the connection
My diagnosis would be simply that it's trying to save your item in the wrong place and RStudio is not able to find the right place.
This burned me, so hopefully it saves someone else some toil. The classifiers loaded just fine on OS X, but on the Linux deployment system they failed with the error listed in the question. The issue was that the file on disk was named abc.RData, but the code read modelAbc <- readRDS(file="abc.Rdata"). The difference between the uppercase and lowercase D in .RData vs .Rdata makes the call fail on Linux, where file names are case-sensitive. It was not very noticeable, so check your extensions for case.
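A mismatch like this can be spotted by listing the files whose names match once case is ignored. This is a sketch, and find_case_variants is a hypothetical helper name.

```r
# Hypothetical helper: list files in the same directory whose names equal
# basename(path) when case is ignored, e.g. ".RData" vs ".Rdata".
find_case_variants <- function(path) {
  candidates <- list.files(dirname(path), all.files = TRUE)
  candidates[tolower(candidates) == tolower(basename(path))]
}

# If abc.RData is on disk but the code asks for abc.Rdata:
# file.exists("abc.Rdata")          # FALSE on a case-sensitive file system
# find_case_variants("abc.Rdata")   # "abc.RData"
```
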
You may have no permission to save file in the directory.
On RStudio, get your working directory by getwd().
Then, go to the directory in linux and observe its owner by ls -l.
Now you can change the owner of the directory by chown -R username directoryname.
But you must be root.
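Before changing ownership, the permission can also be checked from within R. The sketch below uses file.access(), which returns 0 when access is allowed; can_write_to is a made-up name, and the result may be unreliable on Windows or on network drives.

```r
# Hypothetical check: does the directory exist and can the current
# user write to it? mode = 2 asks file.access() about write permission.
can_write_to <- function(dir = getwd()) {
  dir.exists(dir) && unname(file.access(dir, mode = 2) == 0)
}

can_write_to(tempdir())   # TRUE: the session temp directory is writable
# can_write_to(getwd())   # check the working directory before saving
```
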
Problem resolved by specifying full file path:
saveRDS(df,'C:\\users\\matt\\desktop\\code\\df.Rdata')
I faced this issue lately. Try turning off your anti-virus and build the package, it might help. It worked for me. Usually anti-virus blocks the permissions and you could avoid it by disabling for sometime just before building a package.
I was trying to save an RDS file to my local Dropbox folder so it syncs with my Dropbox.
I figured out I got the same error because I was trying to create a new folder and looks like saveRDS cannot create a new folder, but it can add files to existing folders. So I changed the path to add the file into an existing folder and it worked!
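If the folder is supposed to be created on the fly, it can be created explicitly before saving. A minimal sketch, where save_rds_creating_dir is a hypothetical helper name:

```r
# Hypothetical helper: make sure the target folder exists, then save.
save_rds_creating_dir <- function(object, file) {
  # recursive = TRUE creates any missing parent folders as well;
  # showWarnings = FALSE keeps it quiet when the folder already exists.
  dir.create(dirname(file), recursive = TRUE, showWarnings = FALSE)
  saveRDS(object, file)
  invisible(file)
}

# Usage:
# save_rds_creating_dir(df, "new_folder/sub_folder/df.rds")
```
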
In my case it was Windows Defender that was preventing RStudio from writing any file to the hard drive. You need to either turn Controlled Folder Access off or add RStudio to the exclusion list.
I also had this problem when working with RStudio and R Markdown. I was getting this error message and had an annoying number of fatal errors which closed RStudio. My issue was that I was working off a network drive and either the name was too long, as in #AHedge above or my network firewalls were giving me trouble. For the moment, I have moved my working files to my desktop and things seem to be working fine. Not sure what this means for my file management over time.
Just want to add more clarity(scenarios in my experience) to what M Beausoleil mentioned.
When you are using a shared-working-directory and trying to rewrite the RDS files which are already existing in a working-directory written by some other user, you get this error.
As some people have already mentioned, deleting the existing RDS files or changing the working directory works. It's not magic: it works because you are writing a new RDS file rather than trying to overwrite the old ones.
I ran into the same problem after installing a new version of RStudio.
An R Markdown file I had created with the old version of RStudio showed the same problem.
When I used ggplot() to draw a plot, the error messages were as follows:
Warning in gzfile(file, "wb") :
cannot open compressed file 'I:/Rlearning/.Rproj.user/shared/notebooks/58A1385C-PCA作图/1/2C15461A183AC56C/cco192gb0pow1_t\_rs_rdf_32004888ecb.rdf', probable reason 'No such file or directory'
Error in gzfile(file, "wb") : cannot open the connection
Solution:
Create a new Rmarkdown file
Delete all codes
Copy your old Rmarkdown code into it.
I had the same problem. For me, it was caused by not having enough disk space on the drive where RStudio was installed. Freeing up space fixed it.
The reason for the error is that your Windows username contains Chinese characters. Create a new user folder with an English name in the Users directory, for example "DavidSmith". Then create three nested folders ("AppData", "Local", "Temp") so that the path is C:\Users\DavidSmith\AppData\Local\Temp.
In Advanced system settings, change the environment variables TMP and TEMP to C:\Users\DavidSmith\AppData\Local\Temp and save them.
After the modification, open RStudio and try again.
Note: TMP and TEMP are modified under the user variables.
I just ran into this problem after changing my system locale.
Check your locale using Sys.getlocale().
Change it to an appropriate one using Sys.setlocale("LC_ALL","ENG") (replace "ENG" with the appropriate one).
I can't say with certainty which locale would be appropriate, but it seems it should be consistent with the OS default.
Hope this helps!
I had this error because of an invalid character in the filename to be used to save the file, in my case "/" (there are many such characters that cannot be used in a filename). I removed the character and it was solved.
In my case, I received the error "Error in gzfile(file, "wb") : cannot open the connection" when trying to exit R in the Anaconda Prompt and save the workspace image. I am using Windows 10 and R-3.5.2. To fix it, I had to go to the Program Files folder, right-click the R folder, and select Properties. I selected the Security tab, then, in the Group or user names box, selected Users and clicked Edit. In the Permissions for Users, I checked Full control and Modify and saved the changes. Then I was able to save the workspace image.
I have another instance of this error which seems to be new (or at least not listed here or here): apparently it's not OK to save a file with the name aux.RData. I guess it's a reserved filename.
x <- rnorm(9000)
save(x, file = "aux.RData")
Error in gzfile(file, "wb") : cannot open the connection
Also: Warning message:
In gzfile(file, "wb") :
cannot open compressed file 'aux.RData', probable reason 'No such file or directory'
But when I change the filename saves with no problem:
save(x, file = "aux_file.RData")
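On Windows, AUX is one of several reserved device names, and a handful of characters are not allowed in file names at all. The check below is a sketch of those Windows naming rules; is_safe_windows_filename is a hypothetical helper, and other operating systems are more permissive.

```r
# Hypothetical check for names that Windows refuses as file names:
# reserved device names (with or without an extension) and characters
# that are not allowed in a file name.
is_safe_windows_filename <- function(name) {
  reserved <- c("CON", "PRN", "AUX", "NUL",
                paste0("COM", 1:9), paste0("LPT", 1:9))
  bad_chars <- c("<", ">", ":", "\"", "/", "\\", "|", "?", "*")
  stem <- toupper(sub("\\..*$", "", name))   # part before the first dot
  chars <- strsplit(name, "")[[1]]
  !(stem %in% reserved) && !any(chars %in% bad_chars)
}

is_safe_windows_filename("aux.RData")        # FALSE
is_safe_windows_filename("aux_file.RData")   # TRUE
is_safe_windows_filename("a/b.RData")        # FALSE
```
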
Haven't seen this case in the other answers:
if this seems to happen all the time, and to be very persistent when it does happen, check the default directory in your file handling software connection.
In my case FileZilla was logging on to my DigitalOcean droplet as "root" and whenever I used FileZilla to create a directory it was setting write permissions to "root", whereas my RStudio on the same droplet read/wrote as "My_Name". Anytime I set something up in FZ (e.g. large imported files, renamed or copied) the permissions would switch and I'd get this error.
If this is what is causing frequent error messages it can be solved instantly with chown -R My_Name directoryname but in the longer run, if you are going to be using your file handler to define and create a lot of directories, it will pay to create a connection whose default name is the same name you use for RStudio.
In my case, when it happened first, months ago, the solution here worked.
But recently it came back, constantly... What solved it this time was changing the antivirus. I have not just Windows Defender but also a second antivirus, the same one both times. I ended up uninstalling it and installing another antivirus... After this, the problem did not happen again...
After several days trying to solve this same error (Windows 10 and R), I tried saving my file (file.RData) on the D: drive instead of the C: drive (where I had always been working and where R is installed), and it was saved to D:/Users without problems. Whenever I tried to save it on the C: drive, I got "Permission denied".
save(Myfile, file="D:/Users/Myfile.RData")
I encountered this same issue when trying to save an Rds file from a Markdown file. Changing my relative file path to an absolute file path worked for me.
In my case, this error occurred because the file I wanted to overwrite was read-only (for whatever reason; I didn't set it myself). I right-clicked the file's name in the folder and unchecked the read-only property. After that it worked.

download.file() fails when appending a random suffix to the filename

I'm trying to download a file in R on a remote server which sits behind a number of proxies. Something - I can't figure out what - is causing the file to be returned cached whenever I try and access it on that server, whether I do so through R or just through a Web Browser.
I've tried using cacheOK=FALSE in my download.file call and this has had no effect.
Per Is there a way to force browsers to refresh/download images? I have tried adding a random suffix to the end of the URL:
download.file(url = paste("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily.zip?",
format(Sys.time(), "%d%m%Y"),sep=""),
destfile = "F-F_Research_Data_Factors_daily.zip", cacheOK=FALSE)
This produces, e.g., the following URL:
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily.zip?17092012
Which when accessed from a Web Browser on the remote server, indeed returns the latest version of the file. However, when accessed using download.file in R, this returns a corrupted zip archive. Both WinRAR and R's unzip function complain that the zip file is corrupt.
unzip("F-F_Research_Data_Factors_daily.zip")
1: In unzip("F-F_Research_Data_Factors_daily.zip") :
internal error in unz code
I can't see why downloading this file via R would cause a corrupted file to be returned, whereas downloading it via a Web Browser gives no problem.
Can anyone suggest either a way to beat the cache from R (about which I'm not hopeful), or a reason why download.file doesn't like my URL with ?someRandomString tacked onto the end of it?
It will work if you use mode="wb"
download.file(url = paste("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily.zip?",format(Sys.time(),"%d%m%Y"),sep=""),
destfile = "F-F_Research_Data_Factors_daily.zip", mode='wb', cacheOK=FALSE)
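After downloading, a quick sanity check is to look at the first bytes of the file: zip archives (and .xlsx files, which are zip containers) start with the two bytes "PK". The sketch below uses a made-up helper, looks_like_zip; it catches the case where a web page was saved instead of the archive, but it will not detect corruption further into the file.

```r
# Hypothetical check: does the file start with the zip signature "PK"?
looks_like_zip <- function(path) {
  header <- readBin(path, what = "raw", n = 2L)
  identical(header, charToRaw("PK"))
}

# Usage:
# looks_like_zip("F-F_Research_Data_Factors_daily.zip")
```
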
