Extracting file from LZMA archive with R

I am trying to use R to extract a file from an LZMA archive, downloaded from an API, that contains JSON files. On my computer I can extract the file manually in Windows Explorer with no problems.
Here's my current code (API details removed):
tempFile <- tempfile()
destDir <- "extracted-files"
if (!dir.exists(destDir)) dir.create(destDir)
download.file("api_url.tar.xz", destfile = tempFile)
untar(tempFile, exdir = destDir)
When I attempt to extract the file, I receive the following error messages:
/usr/bin/tar: This does not look like a tar archive
/usr/bin/tar: Skipping to next header
/usr/bin/tar: Exiting with failure status due to previous errors
Warning messages:
1: running command 'tar.exe -xf "C:\Users\XXX\AppData\Local\Temp\RtmpMncPWp\file2eec75e23a15" -C "extracted-files"' had status 2
2: In untar(tempFile, exdir = destDir) :
‘tar.exe -xf "C:\Users\XXX\AppData\Local\Temp\RtmpMncPWp\file2eec75e23a15" -C "extracted-files"’ returned error code 2
I am using Windows 10 with R version 3.3.1 (2016-06-21).

Using library(archive), one can also read a particular CSV file within an archive without having to extract it first:
library(archive)
library(readr)
read_csv(archive_read("api_url.tar.xz", file = 1), col_types = cols()) # adjust file=XX as appropriate
This is quite a bit faster.
To unzip everything one can use
archive_extract("api_url.tar.xz", dir=XXX)
That worked very well for me and is faster than the built-in untar(). It also works on all platforms. It supports 'tar', 'ZIP', '7-zip', 'RAR', 'CAB', 'gzip', 'bzip2', 'compress', 'lzma' and 'xz' formats.

SOLVED:
While it seemed to work perfectly on Mac, for it to work on Windows you need to open the compressed .xz file connection for reading in binary mode, before passing it to untar():
download.file(url, tmp)
zz <- xzfile(tmp, open = "rb")
untar(zz, exdir = destDir)

An alternative, and even simpler, solution is to specify the 'mode' parameter for download.file() as follows:
download.file(url, destfile = tmp, mode = "wb")
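The two fixes above can be combined into one small helper. This is only a sketch: extract_tar_xz and fetch_and_extract are hypothetical names, and it assumes the archive really is a .tar.xz file.

```r
# Hypothetical helper: extract a local .tar.xz through an xz connection
# that is opened explicitly in binary mode.
extract_tar_xz <- function(archive_path, dest_dir) {
  if (!dir.exists(dest_dir)) dir.create(dest_dir, recursive = TRUE)
  con <- xzfile(archive_path, open = "rb")
  on.exit(close(con), add = TRUE)
  untar(con, exdir = dest_dir)
  # Return the extracted file names, relative to dest_dir
  list.files(dest_dir, recursive = TRUE)
}

# Hypothetical wrapper: download in binary mode, extract, then clean up.
fetch_and_extract <- function(url, dest_dir) {
  tmp <- tempfile(fileext = ".tar.xz")
  on.exit(unlink(tmp), add = TRUE)
  download.file(url, destfile = tmp, mode = "wb")
  extract_tar_xz(tmp, dest_dir)
}
```

Calling fetch_and_extract(url, "extracted-files") with the real API URL would then replace the download.file() and untar() calls in the question.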

Related

How to remove a 'permission denied' file from folder within R

I am downloading a large xlsx file as part of a function. The file is removed with file.remove() on Linux and Mac, but I get 'Permission denied' on Windows machines. Below is the code for my function.
download.file(
'http://mirtarbase.mbc.nctu.edu.tw/cache/download/7.0/miRTarBase_MTI.xlsx',
'miRTarBase.xlsx', mode = "wb")
readxl::read_excel('miRTarBase.xlsx') -> miRTarBase
write.csv(miRTarBase, 'miRTarBase.csv')
read.csv('miRTarBase.csv', row.names = 1) -> miRTarBase
file.remove("miRTarBase.xlsx")
I get the following error message in my console
Warning message:
In file.remove("miRTarBase.xlsx") :
cannot remove file 'miRTarBase.xlsx', reason 'Permission denied'.
Again this warning only appears in windows.
Furthermore, after checking the properties of the file itself the 'Read-only' attribute is unchecked.
Following this, the following code works perfectly fine so I do not think the issue is with the folder either.
file.remove("miRTarBase.csv")
I believe the issue lies in how .xlsx files are treated on Windows.
When I try to delete the .xlsx file while RStudio is still running, I get a "File in use" warning message. After closing the R session, the .xlsx file can be deleted with no hassle.
This has confused me because I am not used to working with Windows. Has anyone had this issue before? I would appreciate any help that can be given. Many thanks.
Have you tried saving it as a temporary file on Windows?
tmp <- tempfile()
download.file(
'http://mirtarbase.mbc.nctu.edu.tw/cache/download/7.0/miRTarBase_MTI.xlsx', tmp, mode = "wb")
readxl::read_excel(tmp) -> miRTarBase
write.csv(miRTarBase, 'miRTarBase.csv')
read.csv('miRTarBase.csv', row.names = 1) -> miRTarBase
file.remove(tmp)
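The temporary-file approach can be wrapped so that cleanup is attempted even if reading fails. This is a minimal sketch: with_temp_file is a hypothetical helper name, and it does not explain why Windows keeps the file locked. It only guarantees that unlink() runs when the function exits.

```r
# Hypothetical helper: create a temporary path, run `action` on it,
# and always attempt to delete the file afterwards.
with_temp_file <- function(action, fileext = "") {
  tmp <- tempfile(fileext = fileext)
  on.exit(unlink(tmp), add = TRUE)
  action(tmp)
}

# Usage with a local stand-in for the download step
result <- with_temp_file(function(path) {
  writeLines("a,b", path)  # in practice: download.file(url, path, mode = "wb")
  readLines(path)
}, fileext = ".csv")
```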

Can't move file after download and unzip

I'm trying to download a zip file from a source, unzip it, and then move a file to another directory.
First the download:
if (!file.exists("inst/extdata/sp_resultados_universo")) {
tmp <- tempfile(fileext = ".zip")
download.file("ftp://ftp.ibge.gov.br/Censos/Censo_Demografico_2010/Resultados_do_Universo/Agregados_por_Setores_Censitarios/SP_Capital_20180416.zip", tmp, quiet = TRUE)
unzip(tmp, exdir = "inst/extdata/sp_resultados_universo", junkpaths = TRUE)
unlink(tmp)
}
The file I want is in the directory inst/extdata/sp_resultados_universo/SP Capital/Base informa�oes setores2010 universo SP_Capital (codificação inválida)/CSV/, so when I try to move it to inst/extdata/sp_resultados_universo/ I get an error. (The "codificação inválida" suffix means "invalid encoding".)
file.rename("inst/extdata/sp_resultados_universo/SP%20Capital/Base%20informa%87oes%20setores2010%20universo%20SP_Capital(condificação inválida)/CSV/Domicilio02_SP1.csv",
"inst/extdata/sp_resultados_universo/Domicilio02_SP1.csv")
Warning message:
In file.rename("inst/extdata/sp_resultados_universo/SP%20Capital/Base%20informa%87oes%20setores2010%20universo%20SP_Capital(condificação inválida)/CSV/Domicilio02_SP1.csv", :
it was not possible to rename file 'inst/extdata/sp_resultados_universo/SP%20Capital/Base%20informa%87oes%20setores2010%20universo%20SP_Capital(condificação inválida)/CSV/Domicilio02_SP1.csv'
for 'inst/extdata/sp_resultados_universo/Domicilio02_SP1.csv',
reason 'File or directory not found'
I'm translating the error message, so it may not match the English message exactly.
I can change the directory name or move the file manually, but that breaks the flow and hurts reproducibility. How can I handle it inside R?
My system info:
Sys.info()
sysname
"Linux"
release
"4.9.0-6-amd64"
version
"#1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07)"
machine
"x86_64"
Many thanks in advance for any help.
When using R, you can interact with the Linux shell (or the Windows command line) through a call to system(), passing the command as a quoted string just as you would type it in the shell.
For instance:
system("pwd") # prints current working directory
system("date") # prints the current date and time
system("ls | grep '\\.R$'") # prints a list of R scripts in the current working directory
system("mv file.txt /home/new_directory/file.txt") # moves your file to another directory
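An alternative that stays inside R is to avoid typing the badly encoded directory name at all and instead search for the file by its base name. The sketch below assumes exactly one file with that name exists under the extraction directory. move_by_name is a hypothetical helper, and whether list.files() returns usable paths for invalidly encoded names may depend on the platform and locale.

```r
# Hypothetical helper: find `filename` anywhere below `root` and move it
# into `dest_dir`, without spelling out the intermediate directories.
move_by_name <- function(root, filename, dest_dir = root) {
  hits <- list.files(root, recursive = TRUE, full.names = TRUE)
  hits <- hits[basename(hits) == filename]
  if (length(hits) != 1L) {
    stop("expected exactly one match for ", filename, ", found ", length(hits))
  }
  target <- file.path(dest_dir, filename)
  if (!file.rename(hits, target)) stop("could not move ", hits)
  target
}

# Self-contained demonstration with a nested directory containing a space
root <- tempfile("census_")
dir.create(file.path(root, "SP Capital", "CSV"), recursive = TRUE)
writeLines("x", file.path(root, "SP Capital", "CSV", "Domicilio02_SP1.csv"))
moved <- move_by_name(root, "Domicilio02_SP1.csv")
```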

data.table fread error - gzip file - set temporary directory

I'm attempting to read a .gz file using data.table's fread function. I have tried the syntax suggested here:
dt = fread("gunzip -c myfile.gz")
but I get a verbose error message:
Error in fread("gunzip -c myfile.gz") :
File is empty: C:\Users\MARK~1.MUR\AppData\Local\Temp\RtmpIBawPA\file498c1c4114ef
In addition: Warning messages:
1: running command 'C:\Windows\system32\cmd.exe /c (gunzip -c myfile.gz) > C:\Users\MARK~1.MUR\AppData\Local\Temp\RtmpIBawPA\file498c1c4114ef' had status 1
2: In shell(paste("(", input, ") > ", tt, sep = "")) :
'(gunzip -c 180227.2101.2017.MRE.csv.gz) > C:\Users\MARK~1.MUR\AppData\Local\Temp\RtmpIBawPA\file498c1c4114ef' execution failed with error code 1
My guess here is that access to a temporary file is being denied by my IT masters (?). If this is the case how do I set the temporary file path to say the current directory for the unzip?
As you are on a Windows PC, command-line tools such as gunzip are probably not available, which might be the reason for this.
A possible solution might be to unzip first and then read with fread. The following example works on my Windows VM:
write.csv(mtcars, 'mtcars.csv')
zip('mtcars.csv.zip', 'mtcars.csv')
unzip('mtcars.csv.zip')
fread('mtcars.csv')
For .gz files, you can use the gunzip function from R.utils. The following example works for me:
write.csv(mtcars, gzfile('mtcars2.csv.gz'))
library(R.utils)
gunzip('mtcars2.csv.gz')
fread('mtcars2.csv')
Consequently, you might need something like this (gunzip() strips the .gz extension, so the decompressed file is named myfile):
library(R.utils)
gunzip('myfile.gz')
fread('myfile')
Try read_csv() from the readr package, which handles .gz automatically:
dt = as.data.table(read_csv("myfile.gz"))
(or another read_* function if it's not a csv)
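If installing extra packages is not an option, base R can also read a gzip-compressed CSV through a gzfile() connection, with no external gunzip and no intermediate file. This is a minimal sketch using the built-in mtcars data; the file name is made up for the demonstration.

```r
# Write a small gzip-compressed CSV to a temporary location
gz_path <- file.path(tempdir(), "mtcars_demo.csv.gz")
con <- gzfile(gz_path, open = "w")
write.csv(mtcars, con)
close(con)

# Read it back directly; read.csv() opens and closes the connection itself
df <- read.csv(gzfile(gz_path), row.names = 1)

# If a data.table is needed: data.table::as.data.table(df)
```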

Error downloading GPL file with getGEO

Using OS X 10.11 and R 3.3.0, I get this error with the GEOquery package:
library(GEOquery)
GSE56045 <- getGEO("GSE56045")
It downloads the GSE file but not the GPL:
Error in download.file(myurl, destfile, mode = mode, quiet = TRUE, method = getOption("download.file.method.GEOquery")) :
cannot open URL 'http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?targ=self&acc=GPL10558&form=text&view=full'
It looks like the request for the GPL file was redirected, and the download method that GEOquery sets by default, options('download.file.method.GEOquery' = 'auto'), fails to follow the redirect.
I was able to get it working by running this in R: options('download.file.method.GEOquery' = 'libcurl')
I also had to delete the previously downloaded GPL file, which contained only the redirect message. Rather than hunting for the temp file, it is easier to set a download directory with the destdir = argument of getGEO.
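Put together, the workaround might look like the sketch below. The directory name geo_cache is an assumption, and the getGEO() call is left commented out because it needs GEOquery installed and network access.

```r
# Only switch the method if this R build actually supports libcurl
if (capabilities("libcurl")) {
  old_opts <- options(download.file.method.GEOquery = "libcurl")
}

# A fixed download directory makes stale files easy to find and delete
geo_dir <- "geo_cache"
dir.create(geo_dir, showWarnings = FALSE, recursive = TRUE)

# With GEOquery installed, the call from the question would become:
# GSE56045 <- GEOquery::getGEO("GSE56045", destdir = geo_dir)
```

Running options(old_opts) afterwards restores the previous setting.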

R exdir does not exist error

I'm trying to download and extract a zip file using R. Whenever I do so I get the error message
Error in unzip(temp, list = TRUE) : 'exdir' does not exist
I'm using code based on the Stack Overflow question Using R to download zipped data file, extract, and import data
To give a simplified example:
# Create a temporary file
temp <- tempfile()
# Download ZIP archive into temporary file
download.file("http://cran.r-project.org/bin/windows/contrib/r-release/ggmap_2.2.zip",temp)
# ZIP is downloaded successfully:
# trying URL 'http://cran.r-project.org/bin/windows/contrib/r-release/ggmap_2.2.zip'
# Content type 'application/zip' length 4533970 bytes (4.3 Mb)
# opened URL
# downloaded 4.3 Mb
# Try to do something with the downloaded file
unzip(temp,list=TRUE)
# Error in unzip(temp, list = TRUE) : 'exdir' does not exist
What I've tried so far:
Accessing the temp file manually and unzipping it with 7zip: Can do this no problem, file is there and accessible.
Changing the temp directory to c:\temp. Again, the file is downloaded successfully, I can access it and unzip it with 7zip but R throws the exdir error message when it tries to access it.
R version 2.15.2
R-Studio version 0.97.306
Edit: The code works if I use unz instead of unzip but I haven't been able to figure out why one works and the other doesn't. From CRAN guidance:
unz reads (only) single files within zip files...
unzip extracts files from or list a zip archive
On a windows setup:
I had this error when I had exdir specified as a path. For me the solution was removing the trailing / or \\ in the path name.
Here's an example and it did create the new folder if it didn't already exist
locFile <- pathOfMyZipFile
outPath <- "Y:/Folders/MyFolder"
# OR
outPath <- "Y:\\Folders\\MyFolder"
unzip(locFile, exdir=outPath)
This can manifest another way, and the documentation doesn't make clear the cause. Your exdir cannot end in a "/", it must be just the name of the target folder.
For example, this was failing with 'exdir' does not exist:
unzip(temp, overwrite = FALSE, exdir = "data_raw/system-data/")
And this worked fine:
unzip(temp, overwrite = FALSE, exdir = "data_raw/system-data")
Presumably when unzip sees the "/" at the end of the exdir path it keeps looking; whereas omitting the "/" tells unzip "you've found it, unzip here".
A couple of years late but I still get this error when trying to use unzip(). It appears to be a bug because the man pages for unzip state if exdir is specified it will be created:
exdir The directory to extract files to (the equivalent of unzip -d).
It will be created if necessary.
A workaround I've been using is to manually create the necessary directory:
dir.create("directory")
unzip("file-to-unzip.zip", exdir = "directory/")
A pain, but it seems to work, at least for me.
I am using R3.2.1 on a Windows 7 machine.
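The two workarounds above (dropping the trailing separator and creating the directory first) can be combined in a small wrapper. This is a sketch only: clean_exdir and safe_unzip are hypothetical names, and the behaviour of unzip() itself may differ between R versions and platforms.

```r
# Hypothetical helper: strip trailing "/" or "\" from a directory path,
# leaving a path that consists only of separators (such as "/") untouched.
clean_exdir <- function(path) {
  cleaned <- sub("[/\\\\]+$", "", path)
  if (nzchar(cleaned)) cleaned else path
}

# Hypothetical wrapper: normalise exdir, create it if needed, then unzip.
safe_unzip <- function(zipfile, exdir) {
  exdir <- clean_exdir(exdir)
  if (!dir.exists(exdir)) dir.create(exdir, recursive = TRUE)
  unzip(zipfile, exdir = exdir)
}
```

For example, safe_unzip(temp, "data_raw/system-data/") would extract into data_raw/system-data.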
The way I found to address this issue takes a few steps, but it works for me:
Create a vector that contains the name of the url from where you are downloading the file, e.g.
file_url <- "http://your.file.com/file_name.zip"
Use download.file to specify the url where you are downloading the file from (using your newly created vector), followed by the file name of the zipped file (that should be the last part of the url name). It will be saved as such in your working directory*, e.g.
download.file(file_url, "file_name.zip")
*If you are not sure of your working directory, you can use getwd() to check it. If you want to change your working directory, you can use setwd("C:/users/username/...") to set it to what you want.
Use "unzip" to unzip the file into your working directory, with the name you will set using exdir, e.g.
unzip("file_name.zip", exdir = "file_name")
To check your work, you can use list.files, e.g.
list.files("file_name")
Hope this helps!
