R error when using untar

I'm running a script whose input parameters automate creating the directories, downloading a file, and untarring it. I would be fine with unzip, but the file I want to analyze is a .tar.gz. When I unpacked it manually, the .tar.gz unpacked to a .tar file. Could that be the problem?
Full error: Error in untar2(tarfile, files, list, exdir) : unsupported entry type ‘’
Running Windows 10, 64 bit, R set to: [Default] [64-bit] C:\Program Files\R\R-3.2.2
Script notes one solution found (issues, lines 28-31), but I don't really understand it.
I did install 7-zip on my computer, restart and of course restart R:
`#DOWNLOADING AND UNZIPPING TAR FILE
#load required packages.
#If there is a load package error, use install.packages("[package]")
library(dplyr)
library(lubridate)
library(XML) # HTML processing
options(stringsAsFactors = FALSE)
#Set directory locations, data file and fetch data file from internet
#enter full url including file name between ' ' marks
mainDir<-"C:/R/BEES/"
subDir<-"C:/R/BEES/Killers"
Fetch<-'http://dds.cr.usgs.gov/pub/data/nationalatlas/afrbeep020_nt00218.tar.gz'
ArchFile<-basename(Fetch)
download.file<-(ArchFile)
#Check for file directories and create if directory if it doesn't exist
if(!file.exists(mainDir)){dir.create(mainDir)}
if(!file.exists(subDir)){dir.create(subDir)}
#set the working directory
setwd(file.path(subDir))
#check if file exists and download if it doesn't exist.
if(!file.exists(ArchFile))
{download.file (url=Fetch,destfile=ArchFile,method='auto')}
#unpack and view file list
untar(path.expand(ArchFile),list=TRUE,exdir=subDir,compressed="gzip")
list.files(subDir)
#Error: Error in untar2(tarfile, files, list, exdir) :
# unsupported entry type ‘’
#Need solution to use tar/untar app
#instructions here: https://stevemosher.wordpress.com/step-10-build/`
Appreciate feedback - I've been lurking around StackOverflow for some time to use other people's solutions.

Related

Downloading and unzipping GitHub zipped files directly in R

I am trying to download and unzip a folder of files from GitHub into R. I can manually download the file at https://github.com/dylangomes/SO/blob/main/Shape.zip and then extract all files in working directory, but I'd like to work directly from R.
utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip")
# Warning message:
# In utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip", :
# error 1 in extracting from zip file
It says it is a warning message, although nothing has been downloaded or unzipped into my wd.
I can download the file to my machine:
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip")
But I get the same message with the unzip function:
utils::unzip("Shape.zip")
And the downloaded file cannot manually be extracted. Here, I get the error that the compressed folder is empty. The unzip line works on the manually downloaded .zip file, which tells me something is wrong with the download.file line.
So if I add raw=TRUE to the end (which can make a difference in downloading data from GitHub):
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE","Shape.zip")
utils::unzip("Shape.zip")
I get a different warning with, similarly, nothing being executed:
Warning message:
In utils::unzip("Shape.zip") : internal error in 'unz' code
I have tried most of the answers at Using R to download zipped data file, extract, and import data, but they appear to be for single files that are zipped and aren't helping here. I've tried the answers at r function unzip error 1 in extracting from zip file, which mentions the same warning message I am getting, but none of the solutions work in this case.
Any idea of what I am doing wrong?
You need to use:
download.file(
"https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE",
"Shape.zip",
mode = "wb"
)
Without the query string ?raw=TRUE you are downloading the webpage and not the file.
(For Windows) R will use mode = "wb" by default when it detects from the end of the URL that certain file formats, including .zip, are being downloaded. However, the URL finishing with a query string instead of a file format means the check fails so you need to set the mode explicitly.
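A minimal sketch of the fix above, with one extra sanity check that is not part of the original answer: a ZIP archive starts with the two bytes "PK", so reading the start of the downloaded file shows whether you got the archive or an HTML page. The helper name looks_like_zip is hypothetical, and the real download is left commented out because it needs network access.

```r
# Hypothetical helper: TRUE when the file starts with the ZIP signature "PK".
looks_like_zip <- function(path) {
  if (!file.exists(path)) return(FALSE)
  con <- file(path, "rb")
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 2L)
  identical(magic, as.raw(c(0x50, 0x4b)))
}

# Local demonstration with two throwaway files (no network needed).
fake_zip <- tempfile(fileext = ".zip")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04)), fake_zip)
fake_html <- tempfile(fileext = ".zip")
writeLines("<!DOCTYPE html>", fake_html)

print(looks_like_zip(fake_zip))
print(looks_like_zip(fake_html))

# Intended use after the download from the answer:
# download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE",
#               "Shape.zip", mode = "wb")
# if (looks_like_zip("Shape.zip")) unzip("Shape.zip")
```

If the check fails on a real download, opening the file in a text editor will usually show the HTML of the GitHub page rather than binary data.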

Set wd in RStudio

I am creating a series of r scripts that will be used by multiple people, meaning that the working directory of files used and stored will differ. There are two folders, one for the R code, called "rcode," and another to store the generated outputs, called "data". These two folders will always be shared in tandem. To accommodate for the changing working directory I created a "global" script that has the following lines of code and resides in the "rcode" folder:
source_path = rstudioapi::getActiveDocumentContext()$path
setwd(dirname(source_path))
swd_data <- paste0("..\\data\\")
The first line gets the source path of the global script. The second line makes this the working directory. The third line essentially tells the script to store an output in the "data" folder, which has the same path as the "rcode" folder. So to read in a csv file within the "data" folder I write:
old_total_demand <- read.csv(paste0(swd_data, "boerne_total_demand.csv"))
When I use this script on my Windows laptop it works beautifully, but when I use it on my Mac I get the following error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file '..\data\demand\boerne_total_demand.csv': No such file or directory
Would anyone have any idea why this would be? Thanks in advance for the help.
I'm not sure what systems your collaborators will be using, but you may run into issues because Windows, Mac, and Linux write paths differently. I suggest you create an R Project (.Rproj) using RStudio, save it in the directory that contains the data and rcode subdirectories, and share the entire project directory.
/Project_dir/MyProject.Rproj
/Project_dir/data/
/Project_dir/rcode/
Then from the R project opened through RStudio you should be able to directly refer to your data by:
data <- read.csv("data/boerne_total_demand.csv")
The working directory will always be where your .Rproj is stored, so you can avoid having to setwd as it causes lots of chaos when sharing and collaborating with others.
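If you still need to build paths by hand, file.path() joins its components with "/", which R accepts on Windows, macOS, and Linux alike, so the hard-coded "\\" separators are not needed. A small sketch using the file name from the question; data_dir and csv_path are just illustrative variable names:

```r
# Build the path from components instead of pasting "\\" separators.
data_dir <- file.path("..", "data")
csv_path <- file.path(data_dir, "boerne_total_demand.csv")
print(csv_path)

# Then read the file as before (commented out: the file only exists in the asker's project).
# old_total_demand <- read.csv(csv_path)
```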
I have this code from my current script at hand.
I hope you like it !
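# Requires RStudio; the rstudioapi package provides getActiveDocumentContext()
path <- dirname(rstudioapi::getActiveDocumentContext()$path)
setwd(path)
swd_path <- paste0(path, "/data/")
# Create the data folder if it does not exist yet
if (!dir.exists(swd_path)) {
  dir.create(swd_path)
}
old_total_demand <- read.csv(paste0(swd_path, "boerne_total_demand.csv"))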

How can I source an R file from the parent directory via the shell?

I'm able to source an R script from the IDE (Rstudio), but not from a command line call. Is there a way to do this without having to supply the full path?
The file I want to source is in the parent directory.
This works:
source('../myfile.R') #in a call from Rstudio
However, this doesn't:
> Rscript filethatsources_myfile.R
Error in file(filename, "r", encoding = encoding) :
cannot open the connection
Calls: source -> file
In addition: Warning message:
In file(filename, "r", encoding = encoding) :
cannot open file '../myfile.R': No such file or directory
Execution halted
This seems like it should be simple, but...
Edit: I'm using GNU bash, version 3.2.53(1)-release (x86_64-apple-darwin13)
In R, relative file locations are always relative to the current working directory. You can explicitly set your working directory like so:
setwd("~/some/location")
Once this is set, you can get your source file relative to the current working directory.
source("some_script.R") # In this directory
source("../another_script.R") # In the parent directory
source("folder/stuff.R") # In a child directory
Not sure what your current working directory is? You can check by submitting getwd().
What if your source file is in, for example, the parent directory but references files relative to its location? Use the chdir= option in source:
source("../another_script.R", chdir = TRUE)
This temporarily changes the working directory to the directory containing the source file for the duration of the source evaluation. Once that's done, your working directory is set back to what it was prior to running source.
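Here is a self-contained sketch of that behaviour, built from throwaway files in the temporary directory. The file names helper.R and values.txt and the variable loaded_value are made up for the demonstration.

```r
# Build a throwaway folder: helper.R reads values.txt through a relative path.
demo_dir <- file.path(tempdir(), "chdir_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("41", file.path(demo_dir, "values.txt"))
writeLines('loaded_value <- as.numeric(readLines("values.txt")) + 1',
           file.path(demo_dir, "helper.R"))

wd_before <- getwd()

# chdir = TRUE makes the relative path inside helper.R resolve against demo_dir.
# local = env keeps the sourced variables out of the global environment.
env <- new.env()
source(file.path(demo_dir, "helper.R"), local = env, chdir = TRUE)

wd_after <- getwd()
print(env$loaded_value)
print(identical(wd_before, wd_after))
```

Without chdir = TRUE, the readLines("values.txt") call inside helper.R would be resolved against whatever directory R was started from, which is the same situation as the Rscript call in the question.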

Installing packages onto R

For some reason I am suddenly not able to install packages in R (I have subsequently updated to the latest version of R and am running Windows 7). For example, if I type:
install.packages('beeswarm')
Installing package into ‘D:/Rlibs’ (as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session --- trying URL 'http://www.stats.bris.ac.uk/R/bin/windows/contrib/3.0/beeswarm_0.1.5.zip'
Content type 'text/html' length unknown opened URL downloaded 1859
bytes
Error in read.dcf(file.path(pkgname, "DESCRIPTION"), c("Package",
"Type")) : cannot open the connection In addition: Warning
messages: 1: In unzip(zipname, exdir = dest) : error 1 in extracting
from zip file 2: In read.dcf(file.path(pkgname, "DESCRIPTION"),
c("Package", "Type")) : cannot open compressed file
'beeswarm/DESCRIPTION', probable reason 'No such file or directory'
I have read that in Windows 7 there can be important restrictions on rights to writing to certain folders etc. so I've gone to some lengths to install R and library folders in non-default areas of my computer, and to allow myself rights to certain folders, but to no avail. Possibly also of importance is when I type:
.libPaths()
# [1] "D:/Rlibs"
# [2] "C:/Users/L.Halsey/Documents/R/win-library/3.0"
# [3] "C:/Users/L.Halsey/Documents/Documents/R-3.0.1/library"
I have created several folders in an attempt to create one that I could successfully install libraries into and set them up to be recognised by R using 'environment variables' from the start button. I don't know how to delete any of them though - not sure if this is relevant to my overall problem of not now being able to install/update packages for some reason.
The error being reported is inability to open a connection. In Windows that is often a firewall problem and is in the Windows R FAQ. The usual first attempt should be to run internet2.dll. From a console session you can use:
setInternet2(TRUE)
(You are correct in thinking this is not due to your library setup. The error says nothing about permissions.) I don't think just typing .libPaths should return that character vector since on my machine I would need to type .libPaths() to see something like that. If you wanted to reduce the number of places for libraries you can use the .libPaths function for setting the values. This would pick the second and third of the existing paths
.libPaths( .libPaths()[2:3] )
The inner call retrieves the path vector and the outer call sets it to a reduced vector.
Running RStudio as administrator fixed it for me!
I will probably duplicate a lot of other answers on Stack Overflow, but I got exactly the same error as the OP, namely:
Warning messages: 1: In unzip(zipname, exdir = dest) : error 1 in extracting from zip file 2: In read.dcf(file.path(pkgname, "DESCRIPTION"), c("Package", "Type")) : cannot open compressed file 'zoo/DESCRIPTION', probable reason 'No such file or directory'
It turned out that while I as a user had permission to write to a certain directory, R did not. To make sure you don't have something similar, do the following:
get a usb drive, let's name it E
download package source as a .zip file and store it onto usb-drive in some directory, let's name it E:/source
Create directory for libraries on the usb drive, let's name it E:/libs
Install the package by calling install.packages from the R console, pointing all relevant directories at your USB drive:
(here I use package zoo as an example)
install.packages("E:/source/zoo_1.7-12.zip",
destdir = 'E:/source', # no "/" after the path
lib = 'E:/libs',
repos = NULL)
Load the package from the directory, where you installed it:
library('zoo', lib.loc = 'E:/libs')
After you are sure, it works this way on your usb drive, you can start resolving directories permissions, and try out by changing the paths in the code above.
update:
In some Windows environments even your USB stick might be protected against reads and writes by R. Make sure you check the permissions on the machine you are working from.
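One way to test whether R itself can write to a library folder, rather than trusting what the file manager reports, is to have R create and delete a small probe file there. This is only a sketch; the helper name can_write_to is hypothetical and not part of any package.

```r
# Hypothetical helper: TRUE if R can create (and then remove) a file in `dir`.
can_write_to <- function(dir) {
  if (!dir.exists(dir)) return(FALSE)
  probe <- tempfile(pattern = "probe_", tmpdir = dir)
  ok <- suppressWarnings(file.create(probe))
  if (ok) unlink(probe)
  ok
}

# Check every library location R currently knows about.
writable <- vapply(.libPaths(), can_write_to, logical(1))
print(writable)
```

Any path reported as FALSE is a candidate for the permission fixes described above, or can be bypassed by passing a writable folder as lib to install.packages.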
The following worked for me (based on the answer above)
install.packages("clustvarsel", lib = "C:/Users/dnentchev/My Programs/R-3.2.2/library")
I had the same problem. I turned the Windows firewall off and ran RStudio as administrator, and that fixed the error.

R exdir does not exist error

I'm trying to download and extract a zip file using R. Whenever I do so I get the error message
Error in unzip(temp, list = TRUE) : 'exdir' does not exist
I'm using code based on the Stack Overflow question Using R to download zipped data file, extract, and import data
To give a simplified example:
# Create a temporary file
temp <- tempfile()
# Download ZIP archive into temporary file
download.file("http://cran.r-project.org/bin/windows/contrib/r-release/ggmap_2.2.zip",temp)
# ZIP is downloaded successfully:
# trying URL 'http://cran.r-project.org/bin/windows/contrib/r-release/ggmap_2.2.zip'
# Content type 'application/zip' length 4533970 bytes (4.3 Mb)
# opened URL
# downloaded 4.3 Mb
# Try to do something with the downloaded file
unzip(temp,list=TRUE)
# Error in unzip(temp, list = TRUE) : 'exdir' does not exist
What I've tried so far:
Accessing the temp file manually and unzipping it with 7zip: Can do this no problem, file is there and accessible.
Changing the temp directory to c:\temp. Again, the file is downloaded successfully, I can access it and unzip it with 7zip but R throws the exdir error message when it tries to access it.
R version 2.15.2
R-Studio version 0.97.306
Edit: The code works if I use unz instead of unzip but I haven't been able to figure out why one works and the other doesn't. From CRAN guidance:
unz reads (only) single files within zip files...
unzip extracts files from or list a zip archive
On a windows setup:
I had this error when I had exdir specified as a path. For me the solution was removing the trailing / or \\ in the path name.
Here's an example and it did create the new folder if it didn't already exist
locFile <- pathOfMyZipFile
outPath <- "Y:/Folders/MyFolder"
# OR
outPath <- "Y:\\Folders\\MyFolder"
unzip(locFile, exdir=outPath)
This can manifest another way, and the documentation doesn't make clear the cause. Your exdir cannot end in a "/", it must be just the name of the target folder.
For example, this was failing with 'exdir' does not exist:
unzip(temp, overwrite = F, exdir = "data_raw/system-data/")
And this worked fine:
unzip(temp, overwrite = F, exdir = "data_raw/system-data")
Presumably when unzip sees the "/" at the end of the exdir path it keeps looking; whereas omitting the "/" tells unzip "you've found it, unzip here".
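The two workarounds mentioned in this thread (dropping the trailing separator and creating the folder up front) can be combined in a small wrapper. This is a sketch only; strip_trailing_sep and safe_unzip are hypothetical names, not functions from utils.

```r
# Hypothetical helper: remove any trailing "/" or "\" from a directory path.
strip_trailing_sep <- function(path) {
  sub("[/\\\\]+$", "", path)
}

# Hypothetical wrapper: normalise exdir, make sure it exists, then extract.
safe_unzip <- function(zipfile, exdir) {
  exdir <- strip_trailing_sep(exdir)
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  utils::unzip(zipfile, exdir = exdir)
}

print(strip_trailing_sep("data_raw/system-data/"))
print(strip_trailing_sep("Y:\\Folders\\MyFolder\\"))

# Intended use, with `temp` being the downloaded archive from the question:
# safe_unzip(temp, exdir = "data_raw/system-data/")
```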
A couple of years late but I still get this error when trying to use unzip(). It appears to be a bug because the man pages for unzip state if exdir is specified it will be created:
exdir The directory to extract files to (the equivalent of unzip -d).
It will be created if necessary.
A workaround I've been using is to manually create the necessary directory:
dir.create("directory")
unzip("file-to-unzip.zip", exdir = "directory/")
A pain, but it seems to work, at least for me.
I am using R3.2.1 on a Windows 7 machine.
The way I found to address this issue takes a few steps, but it works for me:
Create a vector that contains the name of the url from where you are downloading the file, e.g.
file_url <- "http://your.file.com/file_name.zip"
Use download.file to specify the url where you are downloading the file from (using your newly created vector), followed by the file name of the zipped file (that should be the last part of the url name). It will be saved as such in your working directory*, e.g.
download.file(file_url, "file_name.zip")
*If you are not sure of your working directory, you can use getwd() to check it. If you want to change your working directory, you can use setwd("C:/users/username/...") to set it to what you want.
Use "unzip" to unzip the file into your working directory, with the name you will set using exdir, e.g.
unzip("file_name.zip", exdir = "file_name")
To check your work, you can use list.files, e.g.
list.files("file_name")
Hope this helps!
