I would like to download the following zip archive, which contains two files, and open it to manipulate the data. At present I only know how to download it manually:
https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2FNEI_data.zip
I would also like to download the following zip archive, which contains a series of nested folders holding data I want to manipulate. At present I download them manually:
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
Can anyone advise me on how to do this with R code?
Here's a way to do it:
fn <- "https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2FNEI_data.zip"
download.file(fn, tf <- tempfile(fileext = ".zip"))
unzip(tf, exdir = td <- file.path(tempdir(), "myzip"))
(list.files(td, full.names = TRUE, recursive = TRUE))
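The same approach handles the second archive with its nested folders; a sketch with only the URL and directory names changed:
fn2 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
tf2 <- tempfile(fileext = ".zip")
download.file(fn2, tf2, mode = "wb")
td2 <- file.path(tempdir(), "myzip2")
unzip(tf2, exdir = td2)
list.files(td2, full.names = TRUE, recursive = TRUE)  # recursive = TRUE walks the nested folders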
I download an xlsx file every day with a long unique name that includes the date. I need R to read the new xlsx file saved in the directory each day without typing the unique name every time. My idea is to use the wildcard *.xlsx, but whenever I try it, it says the path does not exist:
excel_df <- read_excel("C:/Home/User/dbd/*.xlsx")
The code above does not work. This code gives the same error:
base <- "C:/Home/User/dbd/*.xlsx"
files <- file.info(list.files(path = base, pattern = "*.xlsx",
                              full.names = TRUE, no.. = TRUE))
daily_numbers <- readxl::read_excel(rownames(files)[order(files$mtime)][nrow(files)])
Each line of results shows the same message:
...path does not exist.
The path shouldn't contain the pattern; pass the directory as path and the file pattern separately:
path <- "C:/Home/User/dbd"
files <- list.files(path= path, full.names=T, pattern ='\\.xlsx$')
files
lapply(files, function(file) readxl::read_excel(file))
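If you only want the most recently modified workbook each day, a small sketch along the same lines (assuming the newest modification time identifies the daily download):
path <- "C:/Home/User/dbd"
files <- list.files(path = path, full.names = TRUE, pattern = "\\.xlsx$")
info <- file.info(files)                         # modification time per file
newest <- rownames(info)[which.max(info$mtime)]  # most recently modified file
daily_numbers <- readxl::read_excel(newest)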
I want to be able to read and edit spatial SQLite tables downloaded from a server. They come compressed.
These zip files contain a folder whose name records information about the model run, so the folder names can sometimes be quite long.
When this folder name gets too long, unzipping fails. I ultimately don't need to unzip the file, but I seem to get the same error when I use unz within readOGR.
I can't think of how to create a reproducible example, but I can give an example of a path that works and one that doesn't.
Works:
"S:\3_Projects\CRC00001\4699-12103\scenario_initialised model\performance_assessment.sqlite"
4699-12103 is the zip file name
and "scenario_initialised model" is the subfolder in question
Fails:
""S:\3_Projects\CRC00001\4699-12129\scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0\performance_assessment.sqlite""
4699-12103 is the zip file name
and "scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0" is the offending subfolder
The code would work in a similar fashion to this:
list_zips <- list.files(pattern = "\\.zip$", recursive = TRUE, include.dirs = TRUE)
zip_path <- file.path(getwd(), list_zips[i])
unzip(zipfile = zip_path,
      exdir = substr(zip_path, 1, nchar(zip_path) - 4))  # output dir named after the zip, minus ".zip"
But I would prefer to load the spatial file directly, without unzipping, along these lines:
sq_path <- unzip(list_zips[i], list = TRUE)[2, 1]
temp <- unz(file.path(getwd(), list_zips[i]), sq_path)
vectorImport <- readOGR(dsn = temp, layer = "micro_climate_grid")
Any help would be appreciated! Tim
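One possible workaround, sketched here rather than tested: GDAL's /vsizip/ virtual file system lets readOGR read a file inside an archive without extracting it, which sidesteps the long-path unzip entirely.
library(rgdal)
zip_path <- file.path(getwd(), list_zips[i])
sq_path <- unzip(zip_path, list = TRUE)$Name[2]    # path of the sqlite file inside the zip
dsn <- paste0("/vsizip/", zip_path, "/", sq_path)  # /vsizip/ reads inside the archive
vectorImport <- readOGR(dsn = dsn, layer = "micro_climate_grid")
Whether the SQLite driver can read through /vsizip/ depends on the GDAL build, so treat this as something to try rather than a guaranteed fix.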
This is NOT a duplicate. I am trying to retrieve files in a specific folder.
I want this statement f <- drive_find(n_max=30) to return only the files in a specific directory. How do I point it to a specific directory?
For more detail on what I am doing, see below.
I am using R and can download files with the following code:
install.packages("googledrive")
library("googledrive")  # load googledrive
drive_download(file = as_id(drive_find(pattern = "abc.xlsx", n_max = 30)$id),
               path = "/Users/me/Desktop/abc.xlsx")
But I want to download only the files in a specific directory, and I don't know how to restrict the search to that directory exclusively.
I have tried drive_get and drive_download but am unable to specify a specific directory.
f <- drive_find(n_max = 30)  # this gives me a list of files
for (i in 1:nrow(f)) {
  d_path <- f$name[i]
  drive_download(file = as_id(drive_find(pattern = f$name[i], n_max = 30)$id),
                 path = paste("/Users/me/Desktop/Gdrive/", d_path, sep = ""))
}
The problem is that the statement f <- drive_find(n_max = 30) gives me a list that includes folders and files I do not want. So I need to specify the exact directory to look in. How do I do that?
On a previous thread, the folder ID was used to specify the exact folder location. Unless your files sit in the top-level Drive folder, this could help: R How to read a file from google drive using R
The code provided there:
library(googledrive)
temp <- tempfile(fileext = ".zip")
dl <- drive_download(as_id("1AiZda_1-2nwrxI8fLD0Y6e5rTg7aocv0"),
path = temp, overwrite = TRUE)
out <- unzip(temp, exdir = tempdir())
bank <- read.csv(out[14], sep = ";")
You can find the file ID when you get a shareable link in Google Drive. It will look something like this:
https://drive.google.com/file/d/"file ID you need"/view?usp=sharing
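To restrict the search to one directory, as the question asks, googledrive also provides drive_ls(), which lists only the contents of a given folder. A minimal sketch, with a made-up folder ID:
library(googledrive)
folder <- as_id("0B1234567890abcdefg")  # hypothetical folder ID, taken from the folder's shareable link
f <- drive_ls(path = folder)            # only the files inside that folder
for (i in seq_len(nrow(f))) {
  drive_download(file = as_id(f$id[i]),
                 path = file.path("/Users/me/Desktop/Gdrive", f$name[i]),
                 overwrite = TRUE)
}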
I am downloading the zip files from this location:
http://nemweb.com.au/Data_Archive/Wholesale_Electricity/NEMDE/2019/NEMDE_2019_03/NEMDE_Market_Data/NEMDE_Files/NemPriceSetter_20190301_xml.zip
The zip file has multiple XML files inside that I am trying to read, but given the structure of the XML I cannot parse it properly or convert it into a data frame.
I have tried downloading the zip file into a temporary directory and then parsing one file at a time:
library(XML)

tdir <- tempdir()
tf <- tempfile(tmpdir = tdir)
download.file("http://nemweb.com.au/Data_Archive/Wholesale_Electricity/NEMDE/2019/NEMDE_2019_03/NEMDE_Market_Data/NEMDE_Files/NemPriceSetter_20190301_xml.zip",
              tf, mode = "wb")                 # binary mode so the zip is not corrupted
xml_files <- unzip(tf, exdir = tdir)           # extract all the XML files

doc <- xmlParse(xml_files[1])                  # parse the first file
a <- xmlToDataFrame(nodes = getNodeSet(doc, "//SolutionAnalysis/PriceSetting"))
unlink(tdir, recursive = TRUE, force = TRUE)   # clean up the temp directory
This is how the XML file looks (the screenshot is not reproduced here), and I am trying to put the information into specific columns of a data frame.
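Assuming every file in the archive has the same structure, one way to extend the single-file parse is to apply the same XPath to each extracted file and stack the results; a sketch building on the code above:
library(XML)
price_list <- lapply(xml_files, function(f) {
  doc <- xmlParse(f)
  xmlToDataFrame(nodes = getNodeSet(doc, "//SolutionAnalysis/PriceSetting"))
})
price_df <- do.call(rbind, price_list)  # one data frame, one row per PriceSetting node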
I have a folder of PDFs that I am supposed to perform text analytics on within R. Thus far the best method has been using R to convert these files to text files with pdftotext. After this, however, I am unable to perform any analytics, as the text files are placed in the same folder as the PDFs from which they are derived.
I am achieving this through:
dest <- "C:/PDF"
myfiles <- list.files(path = dest, pattern = "pdf", full.names = TRUE)
lapply(myfiles, function(i) system(paste('"C:/xpdfbin-win-3.04/bin64/pdftotext.exe"', paste0('"',i,'"')), wait= FALSE))
I was wondering about the best way to keep only the text files, whether by saving them to a newly created folder in this step or whether more must be done.
I have tried:
dir.create("C:/txtfiles")
new.folder <- "C:/txtfiles"
dest <- "C:/PDF"
list.of.files <-list.files(dest, ".txt$")
file.copy(list.of.files, new.folder)
However, this only fills the new folder 'txtfiles' with blank text files named after the ones created by the first few lines of code.
Use the following code:
files <- list.files(path = "current folder location", pattern = "\\.txt$")  # lists all .txt files
for (i in 1:length(files)) {
  file.copy(from = paste("~/current folder location/", files[i], sep = ""),
            to = "destination folder")
}
This should copy all text files in "current folder location" into a separate folder "destination folder".
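Alternatively, pdftotext accepts an explicit output file as its second argument, so the .txt files can be written straight into the new folder and the copy step disappears. A sketch reusing the paths from the question:
dir.create("C:/txtfiles", showWarnings = FALSE)
myfiles <- list.files("C:/PDF", pattern = "\\.pdf$", full.names = TRUE)
lapply(myfiles, function(i) {
  # build the output path in C:/txtfiles, swapping .pdf for .txt
  out <- file.path("C:/txtfiles", paste0(tools::file_path_sans_ext(basename(i)), ".txt"))
  system(paste0('"C:/xpdfbin-win-3.04/bin64/pdftotext.exe" "', i, '" "', out, '"'), wait = TRUE)
})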