Read open excel file in R

Is there a way to read an open Excel file into R?
When an Excel file is open in Excel, Excel puts a lock on it, so the reading method in R cannot access the file.
Can you circumvent this lock?
Thanks
Edit: this occurs on Windows with Microsoft Excel.

I also have no problem opening xlsx files that are already open in Excel, but if you do, here is a workaround that might work:
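path_to_xlsx <- "C:/Some/Path/to/test.xlsx"
# copy the workbook into R's session temp directory
temp_copy <- file.path(tempdir(), basename(path_to_xlsx))
file.copy(path_to_xlsx, to = temp_copy, overwrite = TRUE)
# read the copy instead of the locked original
df <- openxlsx::read.xlsx(temp_copy)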
This copies the file (which should not be blocked) to a temporary directory and then loads it from there. Again, I'm not sure this is needed, as I don't have the problem you describe.
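If you need this more than once, the copy-then-read step can be wrapped in a small function. This is only a sketch: `read_via_temp_copy()` is a hypothetical name, not part of any package, and it assumes the lock still allows the file to be copied (as the answer above expects). The reader is passed in, so it works with `readxl::read_excel`, `openxlsx::read.xlsx`, or anything else that takes a path.

```r
# Hypothetical helper: copy a (possibly locked) file to a temp path and read the copy
read_via_temp_copy <- function(path, reader = readxl::read_excel, ...) {
  if (!file.exists(path)) stop("File not found: ", path)
  # keep the original extension so readers that check it still recognize the format
  ext <- tools::file_ext(path)
  tmp <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
  # delete the copy when the function exits, even if reading fails
  on.exit(unlink(tmp), add = TRUE)
  if (!file.copy(path, tmp, overwrite = TRUE)) {
    stop("Could not copy ", path, " to ", tmp)
  }
  reader(tmp, ...)
}
```

For example, `df <- read_via_temp_copy("C:/Some/Path/to/test.xlsx", reader = openxlsx::read.xlsx)`. Because the default `reader` is only evaluated when used, readxl does not need to be installed if you pass a different reader.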

You could try something like this using the ps package. I've used it on Windows and Mac to read from files that I had downloaded from some web resource and opened in Excel with openxlsx2, but it should work with other packages or programs too.
# get the path to the open file via the ps package
library(ps)
p <- ps()
# get the pid of Excel; the process name differs by OS (e.g. "EXCEL.EXE" on Windows),
# so match case-insensitively and take the first hit
excel_pid <- p$pid[grepl("excel", p$name, ignore.case = TRUE)][1]
# get the list of open files for that process
pfiles <- ps_open_files(ps_handle(excel_pid))
pfile <- pfiles[grepl("\\.xlsx$", pfiles$path), ]
# drop Excel's "~$" lock files and return the path to the workbook
sel <- grepl("^(.|[^~].*)\\.xlsx", basename(pfile$path))
path <- pfile$path[sel]
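The regular expression near the end is there to skip the lock files Excel creates next to an open workbook, whose names start with `~$`. A more explicit version of that filtering step, written as a hypothetical helper `filter_workbook_paths()`, might look like this; it only works on path strings, so it can be checked without Excel running.

```r
# Hypothetical helper: keep workbook paths, dropping Excel's "~$" lock/owner files
filter_workbook_paths <- function(paths, ext = "xlsx") {
  # match the extension at the end of the path, ignoring case (".XLSX" also counts)
  is_workbook <- grepl(paste0("\\.", ext, "$"), paths, ignore.case = TRUE)
  # lock files sit next to the workbook and their file names begin with "~$"
  is_lock <- startsWith(basename(paths), "~$")
  paths[is_workbook & !is_lock]
}
```

The result can then be handed to whichever reader you use, for example `filter_workbook_paths(pfiles$path)`.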

What do you mean by "the reading method in R", and by "cannot access the file" (i.e. what code are you using and what error message do you get exactly)? I'm successfully importing Excel files that are currently open, with something like:
dat <- readxl::read_excel("PATH/TO/FILE.xlsx")
If the file is being edited in Excel, R imports the last saved version.
EDIT: I've now tried it on both Linux and Windows and it still works, at least with version 1.3.1 of 'readxl'.

Related

Downloading and unzipping GitHub zipped files directly in R

I am trying to download and unzip a folder of files from GitHub into R. I can manually download the file at https://github.com/dylangomes/SO/blob/main/Shape.zip and then extract all files into my working directory, but I'd like to work directly from R.
utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip")
# Warning message:
# In utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip", :
# error 1 in extracting from zip file
It is only a warning, yet nothing has been downloaded or unzipped into my working directory.
I can download the file to my machine:
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip", "Shape.zip")
But I get the same message with the unzip function:
utils::unzip("Shape.zip")
And the downloaded file cannot be extracted manually either: I get an error that the compressed folder is empty. The unzip line works on the manually downloaded .zip file, which tells me something is wrong with the download.file line.
So if I add raw=TRUE to the end (which can make a difference in downloading data from GitHub):
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE","Shape.zip")
utils::unzip("Shape.zip")
I get a different warning with, similarly, nothing being executed:
Warning message:
In utils::unzip("Shape.zip") : internal error in 'unz' code
I have tried most of the answers at Using R to download zipped data file, extract, and import data, but they appear to be for single files that are zipped and aren't helping here. I've tried the answers at r function unzip error 1 in extracting from zip file, which mentions the same warning message I am getting, but none of the solutions work in this case.
Any idea of what I am doing wrong?
You need to use:
download.file(
"https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE",
"Shape.zip",
mode = "wb"
)
Without the query string ?raw=TRUE you are downloading the webpage and not the file.
On Windows, R uses mode = "wb" by default when the end of the URL shows that certain file formats, including .zip, are being downloaded. Here, however, the URL ends in a query string rather than a file extension, so the check fails and you need to set the mode explicitly.
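Since the failure here is silently saving GitHub's HTML page instead of the archive, it can help to check what was actually downloaded before unzipping. A sketch, with `is_zip_file()` and `download_and_unzip()` as hypothetical names: a non-empty zip archive begins with the local file header signature `PK\003\004`, so its first four bytes are enough to tell it apart from an HTML page.

```r
# Hypothetical helper: TRUE if the file starts with the zip signature "PK\003\004"
is_zip_file <- function(path) {
  sig <- readBin(path, what = "raw", n = 4L)
  length(sig) == 4L && identical(sig, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Hypothetical helper: download a zip in binary mode, verify it, then extract it
download_and_unzip <- function(url, exdir = ".") {
  zipfile <- tempfile(fileext = ".zip")
  on.exit(unlink(zipfile), add = TRUE)
  # set mode = "wb" explicitly: a URL ending in "?raw=TRUE" defeats Windows' auto-detection
  download.file(url, destfile = zipfile, mode = "wb")
  if (!is_zip_file(zipfile)) {
    stop("Downloaded content is not a zip archive (perhaps an HTML page): ", url)
  }
  # unzip() returns the paths of the extracted files
  utils::unzip(zipfile, exdir = exdir)
}
```

For example, `download_and_unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE", exdir = "Shape")` would fail loudly if the `?raw=TRUE` part were left off, instead of leaving an unreadable Shape.zip behind.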

Download a large zipped CSV file, unzip and read into R on Linux

I want to read a large CSV (~8 GB) into my environment, but I am having issues.
My data is a publicly available dataset:
# CREATE A TEMP FILE TO STORE THE DOWNLOADED DATA
temp <- tempfile()
# DOWNLOAD THE FILE FROM THE CMS
download.file("https://download.cms.gov/nppes/NPPES_Data_Dissemination_February_2022.zip",
destfile = temp)
This is where I'm running into difficulty: I am unfamiliar with Linux working directories and where temporary folders are created.
When I use list.dirs() or list.files() I don't see any reference to this temp file.
I am working in an R project and my working directory is as follows:
getwd()
[1] "/home/myName/myProjectName"
I'm able to read in the first part of the file but my system crashes after about 4Gb.
# UNZIP THE NPI FILE
npi <- unz(temp, "npidata_pfile_20050523-20220213.csv")
I then came across this post, which has a function for decompressing large zip files using the system2 unzip functionality. However, due to my limited R knowledge and Linux experience, I couldn't get the function to point to the downloaded file in the temp folder.
Checking the path for temp above, I get the following:
temp
[1] "/tmp/Rtmpl6SHIJ/file7e5e6c1fc693"
Using the system2 function from the link above I tried the following:
x <- decompress_file(directory = temp,
file = "NPPES_Data_Dissemination_February_2022.zip")
But I get the following error about setting the working directory:
Any pointers on how I can get this file unzipped, given its size, and read it into memory would be much appreciated.
It might be a file permission issue. To get around it, work in a directory you're already in, or one you know you have access to.
# DOWNLOAD THE FILE
# to a directory you can access, and name the file. No need to overcomplicate this.
download.file("https://download.cms.gov/nppes/NPPES_Data_Dissemination_February_2022.zip",
destfile = "/home/myName/myProjectName/npi.zip")
# use the decompress function (from the post linked in the question) if you need to, though unzip might work
x <- decompress_file(directory = "/home/myName/myProjectName/",
file = "npi.zip")
# remove .zip file if you need the space back
file.remove("/home/myName/myProjectName/npi.zip")
temp is the path to the file, not just the directory. By default, tempfile does not add a file extension; you can add one with tempfile(fileext = ".zip").
Consequently, decompress_file cannot set the working directory to temp, because it is a file rather than a directory. Try this:
x <- decompress_file(directory = dirname(temp), file = basename(temp))
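Putting the pieces together, one possible sketch for the ~8 GB file: download to a path you choose with a longer timeout (download.file() gives up after getOption("timeout") seconds, 60 by default), extract only the CSV you need with the external `unzip` program, and read it with `data.table::fread()`. Assumptions: data.table is installed, the command-line `unzip` tool is available, and there is enough RAM for the data; `pick_member()` and `read_large_zipped_csv()` are hypothetical names. R's own documentation for `unzip()` notes that the internal unzipper may truncate members of 4 GB or more, which may be related to the crash after about 4 GB.

```r
# Hypothetical helper: return the single archive member whose name matches `pattern`
pick_member <- function(members, pattern) {
  hits <- members[grepl(pattern, members)]
  if (length(hits) != 1L) {
    stop("Expected exactly one member matching '", pattern, "', found ", length(hits))
  }
  hits
}

# Hypothetical sketch: download a large zip, extract one CSV, read it with data.table
read_large_zipped_csv <- function(url, pattern, dir = tempdir(), ...) {
  # allow long downloads; restore the previous timeout on exit
  old <- options(timeout = max(3600, getOption("timeout")))
  on.exit(options(old), add = TRUE)
  zipfile <- file.path(dir, basename(url))
  download.file(url, destfile = zipfile, mode = "wb")
  # list the archive's contents without extracting anything
  member <- pick_member(utils::unzip(zipfile, list = TRUE)$Name, pattern)
  # use the external unzip program rather than R's internal one for the big member
  utils::unzip(zipfile, files = member, exdir = dir, unzip = "unzip")
  # extra arguments (e.g. select = ...) go to fread to limit memory use
  data.table::fread(file.path(dir, member), ...)
}
```

With the file from the question, the call might be `read_large_zipped_csv("https://download.cms.gov/nppes/NPPES_Data_Dissemination_February_2022.zip", pattern = "^npidata_pfile_[0-9]{8}-[0-9]{8}\\.csv$", dir = "/home/myName/myProjectName")`. Matching on a name pattern rather than a fixed file name means the function still works if the archive holds more than one file.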

Error when trying to read excel file from web site

I'm trying to download the xlsx file that is available at the following URL. If you go to the website and click the link, it will download as a file on your computer. However, I want to automate this process. I have tried the following:
library(RCurl)
download.file("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx", "temp.xlsx")
library(readxl)
tmp <- read_xlsx("temp.xlsx")
# Error: Evaluation error: error reading from the connection.
This method does download a temp.xlsx file to my drive. However, if you try to open it manually, Excel fails to open it. It knows its size, but is unable to open it.
readxl::read_xlsx("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx")
# Error: `path` does not exist: ‘https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx’
Both of these methods are my go-to for downloading excel files from websites. Is there some specific reason why these methods don't work here?
When downloading certain file formats on Windows, you need to request a binary transfer rather than the usual default of a text transfer. From the download.file() documentation:
The choice of binary transfer (mode = "wb" or "ab") is important on
Windows, since unlike Unix-alikes it does distinguish between text and
binary files and for text transfers changes \n line endings to \r\n
(aka ‘CRLF’).
On Windows, if mode is not supplied (missing()) and url ends in one of
.gz, .bz2, .xz, .tgz, .zip, .rda, .rds or .RData, mode = "wb" is set
such that a binary transfer is done to help unwary users.
Code written to download binary files must use mode = "wb" (or "ab"),
but the problems incurred by a text transfer will only be seen on
Windows.
In this case, use the following so that the file is written correctly:
download.file("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx",
"temp.xlsx", mode = "wb")
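Since read_xlsx() only accepts local paths (hence the "`path` does not exist" error above), the two steps can be combined: download to a temporary file in binary mode, then read it. A sketch with the hypothetical name `read_remote_sheet()`; it assumes readxl is installed when the default reader is used, and extra arguments (such as readxl's `sheet` or `skip`) are passed on to the reader.

```r
# Hypothetical helper: download a spreadsheet in binary mode, then read the local copy
read_remote_sheet <- function(url, reader = readxl::read_xlsx, ...) {
  # keep the extension (ignoring any query string) so the reader recognizes the format
  ext <- tools::file_ext(sub("\\?.*$", "", url))
  tmp <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
  on.exit(unlink(tmp), add = TRUE)
  # mode = "wb" stops Windows from rewriting \n as \r\n inside the binary file
  download.file(url, destfile = tmp, mode = "wb", quiet = TRUE)
  reader(tmp, ...)
}
```

For example, `tmp <- read_remote_sheet("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx")`. The temporary file is removed when the function returns, so nothing is left in the working directory.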

save zipped file directly from the web in r

I am trying to save a zip file from the internet onto my computer. I can download the content straight into R with:
sfile <- "http://xweb.geos.ed.ac.uk/~smaccal1/ARCLake/v3_0/PL/ALID0001.zip"
temp <- tempfile()
download.file(sfile,temp)
From here, how can I then save that zipped file on my computer without having to open it in R by unzipping the folder and then using read.table
data <- read.table(unz(temp, "a1.dat"))
unlink(temp)
and then save that data. Essentially I would like to save the files directly from the web (still zipped). How can this be done?
You can use download.file to save the file in a specified location:
sfile <- "http://xweb.geos.ed.ac.uk/~smaccal1/ARCLake/v3_0/PL/ALID0001.zip"
download.file(sfile, destfile = "/path/to/myfile.zip")
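A slightly more general version keeps the file's own name and uses binary mode, which matters for zip archives on Windows (see the download.file() documentation quoted in the previous question). `save_url()` is a hypothetical name; it creates the target directory if needed and returns the saved path. It assumes the URL ends in a file name with no query string, since `basename()` is used to name the file.

```r
# Hypothetical helper: save a remote file, unchanged, under its own name in `dir`
save_url <- function(url, dir = ".") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  destfile <- file.path(dir, basename(url))
  # binary mode keeps the zip archive byte-for-byte identical on Windows
  status <- download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  if (status != 0L) stop("Download failed with status ", status)
  destfile
}
```

For example, `zip_path <- save_url(sfile, dir = "downloads")` keeps the archive zipped on disk, and `utils::unzip(zip_path, list = TRUE)` shows its contents without extracting anything or reading it into R.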

Getting "invalid entry size" when trying to import Excel xlsx file to R

Every time I run this line
cameradata <- read.xlsx("./data/cameras.xlsx" , 1)
I get error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod",
cl, : java.util.zip.ZipException: invalid entry size (expected 500
but got 502 bytes)
I have tried clearing RAM, but the file size is only 10 KB.
Try this; it worked for me:
1) When downloading the xlsx file, use:
download.file(fileURL, destfile = "./data/cameras.xlsx", mode = "wb")
2) Switch to regular R instead of RStudio.
The xlsx file you are trying to read may be damaged. Try downloading the file again, or reading another "healthy" xlsx file.
I experienced exactly the same issue.
What I did to resolve the problem:
I defined a separate variable "fileURL2" and assigned the XLSX download link to it.
I defined a separate variable "cameraData2" and loaded the XLSX file into it.
I downloaded the file directly with Firefox and opened it in MS Excel to make sure it was OK, then saved it to R's working directory, overwriting the existing "cameras.xlsx" file.
A new attempt to read the file with read.xlsx() was then successful.
In conclusion, it seems that R corrupted the XLSX file during the download, possibly because of a bug in the current version of the language.
