Downloading and unzipping GitHub zipped files directly in R - r

I am trying to download and unzip a folder of files from GitHub into R. I can manually download the file at https://github.com/dylangomes/SO/blob/main/Shape.zip and then extract all files in working directory, but I'd like to work directly from R.
utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip")
# Warning message:
# In utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip", :
# error 1 in extracting from zip file
It says it is a warning message, although nothing has been downloaded or unzipped into my wd.
I can download the file to my machine:
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip", "Shape.zip")
But I get the same message with the unzip function:
utils::unzip("Shape.zip")
And the downloaded file cannot manually be extracted. Here, I get the error that the compressed folder is empty. The unzip line works on the manually downloaded .zip file, which tells me something is wrong with the download.file line.
So if I add raw=TRUE to the end (which can make a difference in downloading data from GitHub):
utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE","Shape.zip")
utils::unzip("Shape.zip")
I get a different warning with, similarly, nothing being executed:
Warning message:
In utils::unzip("Shape.zip") : internal error in 'unz' code
I have tried most of the answers at Using R to download zipped data file, extract, and import data, but they appear to be for single files that are zipped and aren't helping here. I've tried the answers at r function unzip error 1 in extracting from zip file, which mentions the same warning message I am getting, but none of the solutions work in this case.
Any idea of what I am doing wrong?

You need to use:
download.file(
"https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE",
"Shape.zip",
mode = "wb"
)
Without the query string ?raw=TRUE you are downloading the webpage and not the file.
(For Windows) R will use mode = "wb" by default when it detects from the end of the URL that certain file formats, including .zip, are being downloaded. However, the URL finishing with a query string instead of a file format means the check fails so you need to set the mode explicitly.
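One way to check whether you downloaded the web page rather than the archive is to look at the first bytes of the file: a ZIP archive begins with the bytes "PK", while a saved web page begins with HTML text. The helper below is a minimal sketch; looks_like_zip is a hypothetical name, not a base R function. It only catches the "downloaded the page" case, since a file damaged by a text-mode transfer can still start with "PK".

```r
# Hypothetical helper (not part of base R): TRUE if the file starts with
# the two-byte "PK" signature that ZIP archives begin with.
looks_like_zip <- function(path) {
  sig <- readBin(path, what = "raw", n = 2L)
  identical(sig, as.raw(c(0x50, 0x4b)))
}

# Usage, assuming the file was saved as "Shape.zip":
# looks_like_zip("Shape.zip")
```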

Related

Error when trying to read excel file from web site

I'm trying to download the xlsx file that is available at the following url. If you go to the website and click the link, it will download as a file on your computer. However, I want to automate this process. I have tried the following:
library(RCurl)
download.file("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx", "temp.xlsx")
library(readxl)
tmp <- read_xlsx("temp.xlsx")
# Error: Evaluation error: error reading from the connection.
This method does download a temp.xlsx file to my drive. However, if you try to open it manually, Excel fails to open it. It knows its size, but is unable to open it.
readxl::read_xlsx("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx")
# Error: `path` does not exist: ‘https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx’
Both of these methods are my go-to for downloading Excel files from websites. Is there some specific reason why these methods don't work here?
When downloading certain file formats on Windows, you need to request a binary transfer rather than the (usual) default text transfer. From the download.file() documentation:
The choice of binary transfer (mode = "wb" or "ab") is important on
Windows, since unlike Unix-alikes it does distinguish between text and
binary files and for text transfers changes \n line endings to \r\n
(aka ‘CRLF’).
On Windows, if mode is not supplied (missing()) and url ends in one of
.gz, .bz2, .xz, .tgz, .zip, .rda, .rds or .RData, mode = "wb" is set
such that a binary transfer is done to help unwary users.
Code written to download binary files must use mode = "wb" (or "ab"),
but the problems incurred by a text transfer will only be seen on
Windows.
In this case, so that the file is written correctly, use:
download.file("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx",
"temp.xlsx", mode = "wb")
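If you download binary files often, a small wrapper that always passes mode = "wb" avoids relying on the URL's file extension. This is only a sketch: download_binary is a hypothetical name, and the status check assumes download.file() returns 0 on success, as its documentation describes.

```r
# Hypothetical wrapper: always request a binary transfer, regardless of
# how the URL ends or which platform the code runs on.
download_binary <- function(url, destfile, ...) {
  status <- download.file(url, destfile, mode = "wb", ...)
  if (status != 0) stop("download failed with status ", status)
  invisible(destfile)
}

# Usage (needs network access and the readxl package):
# download_binary("https://dshs.texas.gov/coronavirus/TexasCOVID19DailyCountyCaseCountData.xlsx", "temp.xlsx")
# tmp <- readxl::read_xlsx("temp.xlsx")
```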

Source function in R, error cannot find file - do I have to change working directory?

This may be a duplicate, but I couldn't find a solution when I searched online and it's an issue that has been bugging me for some time. I am given a zip file with 2 .R files, I download the zip and move the .R files into a directory on my computer, let's say "/Users/Home/StatisticsStuff/LearningR/".
My two files are Stats1.R and Stats2.R, and the first line of code in Stats1.R is:
source("Stats2.R")
and I get the following error message:
> source("Stats2.R")
Error in file(filename, "r", encoding = encoding) : cannot open the connection
In addition: Warning message:
In file(filename, "r", encoding = encoding) : cannot open file 'Stats2.R': No such file or directory
When I run getwd() to see which working directory I'm in, I get:
getwd()
"/Users/Home"
It seems like it would be a pain to have to change working directories in order to source files. Is there something I'm doing wrong with regard to what I'm expecting from the source() function? Do I have to put a line at the top of my code using setwd("whatever the correct wd is")?
Any thoughts appreciated!
I added this before sourcing, it worked for me :
setwd(dirname(getwd()))
First run list.files() to see which location R is pointing to, then write the source command with the correct path and it will execute.
Follow these steps to see the path R is pointing to:
Go to the properties of your installed R software.
Find the "Start in" path. That is the path R is pointing to, so use it in the source command.
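An alternative to changing the working directory yourself is to source the script by its full path and let source() switch directories temporarily through its chdir argument. A minimal sketch, where source_from is a hypothetical helper name:

```r
# Hypothetical helper: source a script by its full path. chdir = TRUE makes
# R switch to the script's directory while it runs, so relative calls such
# as source("Stats2.R") inside it resolve; the working directory is
# restored afterwards.
source_from <- function(dir, file, ...) {
  source(file.path(dir, file), chdir = TRUE, ...)
}

# Usage with the directory from the question:
# source_from("/Users/Home/StatisticsStuff/LearningR", "Stats1.R")
```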

Cannot load my CSV file into my R? keep getting error messages

So basically I successfully exported my SQL view data into a CSV file, but now when I load it into the RGui software, I get the following error:
> load("C:\\Users\\dachen\\Documents\\vTargetBuyers.csv")
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘vTargetBuyers.csv’ has magic number 'Marit'
Use of save versions prior to 2 is deprecated
What should I do? Is it the R version installed wrong? or something wrong with my CSV file?
Try using read.csv instead of load. load is for reading files created by save.
Type ?read.csv to access the documentation.
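The difference can be sketched with a throwaway file. The column names and values below are made up for illustration and are not taken from the asker's data.

```r
# Write a small CSV to a temporary file (made-up columns and values).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("MaritalStatus,Gender", "Married,Female", "Single,Male"), csv_path)

# read.csv() parses text files with comma-separated values.
buyers <- read.csv(csv_path, stringsAsFactors = FALSE)
str(buyers)

# load() is the counterpart of save(): it restores R objects from a
# binary file, which is why it rejects a plain-text CSV.
rdata_path <- tempfile(fileext = ".RData")
save(buyers, file = rdata_path)
rm(buyers)
load(rdata_path)  # recreates the object `buyers`
```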

R error when using untar

I'm running a script with input parameters that are referenced in the code to automate the directory creation, download of file and untar of file. I would be fine with unzip, however this particular file I want to analyze is .tar.gz. I manually unpacked and it was tar.gz, unpacked to .tar file. Would that be the problem?
Full error: Error in untar2(tarfile, files, list, exdir) : unsupported entry type ‘’
Running Windows 10, 64 bit, R set to: [Default] [64-bit] C:\Program Files\R\R-3.2.2
Script notes one solution found (issues, lines 28-31), but I don't really understand it.
I did install 7-zip on my computer, restart and of course restart R:
#DOWNLOADING AND UNZIPPING TAR FILE
#load required packages.
#If there is a load package error, use install.packages("[package]")
library(dplyr)
library(lubridate)
library(XML) # HTML processing
options(stringsAsFactors = FALSE)
#Set directory locations, data file and fetch data file from internet
#enter full url including file name between ' ' marks
mainDir<-"C:/R/BEES/"
subDir<-"C:/R/BEES/Killers"
Fetch<-'http://dds.cr.usgs.gov/pub/data/nationalatlas/afrbeep020_nt00218.tar.gz'
ArchFile<-basename(Fetch)
download.file<-(ArchFile)
#Check for file directories and create directory if it doesn't exist
if(!file.exists(mainDir)){dir.create(mainDir)}
if(!file.exists(subDir)){dir.create(subDir)}
#set the working directory
setwd(file.path(subDir))
#check if file exists and download if it doesn't exist.
if(!file.exists(ArchFile))
{download.file (url=Fetch,destfile=ArchFile,method='auto')}
#unpack and view file list
untar(path.expand(ArchFile),list=TRUE,exdir=subDir,compressed="gzip")
list.files(subDir)
#Error: Error in untar2(tarfile, files, list, exdir) :
# unsupported entry type ‘’
#Need solution to use tar/untar app
#instructions here: https://stevemosher.wordpress.com/step-10-build/
Appreciate feedback - I've been lurking around StackOverflow for some time to use other people's solutions.

R exdir does not exist error

I'm trying to download and extract a zip file using R. Whenever I do so I get the error message
Error in unzip(temp, list = TRUE) : 'exdir' does not exist
I'm using code based on the Stack Overflow question Using R to download zipped data file, extract, and import data
To give a simplified example:
# Create a temporary file
temp <- tempfile()
# Download ZIP archive into temporary file
download.file("http://cran.r-project.org/bin/windows/contrib/r-release/ggmap_2.2.zip",temp)
# ZIP is downloaded successfully:
# trying URL 'http://cran.r-project.org/bin/windows/contrib/r-release/ggmap_2.2.zip'
# Content type 'application/zip' length 4533970 bytes (4.3 Mb)
# opened URL
# downloaded 4.3 Mb
# Try to do something with the downloaded file
unzip(temp,list=TRUE)
# Error in unzip(temp, list = TRUE) : 'exdir' does not exist
What I've tried so far:
Accessing the temp file manually and unzipping it with 7zip: Can do this no problem, file is there and accessible.
Changing the temp directory to c:\temp. Again, the file is downloaded successfully, I can access it and unzip it with 7zip but R throws the exdir error message when it tries to access it.
R version 2.15.2
R-Studio version 0.97.306
Edit: The code works if I use unz instead of unzip but I haven't been able to figure out why one works and the other doesn't. From CRAN guidance:
unz reads (only) single files within zip files...
unzip extracts files from or list a zip archive
On a Windows setup:
I had this error when I had exdir specified as a path. For me the solution was removing the trailing / or \\ in the path name.
Here's an example and it did create the new folder if it didn't already exist
locFile <- pathOfMyZipFile
outPath <- "Y:/Folders/MyFolder"
# OR
outPath <- "Y:\\Folders\\MyFolder"
unzip(locFile, exdir=outPath)
This can manifest another way, and the documentation doesn't make clear the cause. Your exdir cannot end in a "/", it must be just the name of the target folder.
For example, this was failing with 'exdir' does not exist:
unzip(temp, overwrite = F, exdir = "data_raw/system-data/")
And this worked fine:
unzip(temp, overwrite = F, exdir = "data_raw/system-data")
Presumably when unzip sees the "/" at the end of the exdir path it keeps looking; whereas omitting the "/" tells unzip "you've found it, unzip here".
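If the path comes from user input or a config file, you can strip any trailing separators before handing it to unzip. This is a minimal sketch, and strip_trailing_slash is a hypothetical helper name:

```r
# Hypothetical helper: remove trailing "/" or "\" characters from a path.
strip_trailing_slash <- function(path) {
  sub("[/\\\\]+$", "", path)
}

# Usage with the failing example above:
# unzip(temp, overwrite = FALSE, exdir = strip_trailing_slash("data_raw/system-data/"))
```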
A couple of years late but I still get this error when trying to use unzip(). It appears to be a bug because the man pages for unzip state if exdir is specified it will be created:
exdir The directory to extract files to (the equivalent of unzip -d).
It will be created if necessary.
A workaround I've been using is to manually create the necessary directory:
dir.create("directory")
unzip("file-to-unzip.zip", exdir = "directory/")
A pain, but it seems to work, at least for me.
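The same workaround can be wrapped so that it also handles nested directories and does nothing when the directory already exists. A sketch, with ensure_dir as a hypothetical helper name:

```r
# Hypothetical helper: create the directory (and any missing parents)
# if it does not exist yet, then return the path invisibly.
ensure_dir <- function(path) {
  if (!dir.exists(path)) dir.create(path, recursive = TRUE)
  invisible(path)
}

# Usage with the placeholder names from the workaround above:
# unzip("file-to-unzip.zip", exdir = ensure_dir("directory"))
```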
I am using R3.2.1 on a Windows 7 machine.
The way I found to address this issue takes a few steps, but it works for me:
Create a vector that contains the name of the url from where you are downloading the file, e.g.
file_url <- "http://your.file.com/file_name.zip"
Use download.file to specify the url where you are downloading the file from (using your newly created vector), followed by the file name of the zipped file (that should be the last part of the url name). It will be saved as such in your working directory*, e.g.
download.file(file_url, "file_name.zip")
*If you are not sure of your working directory, you can use getwd() to check it. If you want to change your working directory, you can use setwd("C:/users/username/...") to set it to what you want.
Use "unzip" to unzip the file into your working directory, with the name you will set using exdir, e.g.
unzip("file_name.zip", exdir = "file_name")
To check your work, you can use list.files, e.g.
list.files("file_name")
Hope this helps!
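The steps above can be collected into one function. This is a sketch rather than a definitive implementation: download_and_unzip is a hypothetical name, and mode = "wb" is added as an assumption so that the archive is transferred in binary on Windows.

```r
# Hypothetical helper combining the steps: download the archive, extract it
# into exdir, and return the names of the files found there.
download_and_unzip <- function(file_url, zip_path, exdir) {
  status <- download.file(file_url, zip_path, mode = "wb")
  if (status != 0) stop("download failed with status ", status)
  unzip(zip_path, exdir = exdir)
  list.files(exdir)
}

# Usage with the placeholder names from the steps above (needs network access):
# download_and_unzip("http://your.file.com/file_name.zip", "file_name.zip", "file_name")
```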
