I would like to download the following zip archive, which contains two files, and open it to manipulate the data. At present I only know how to download it manually:
https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2FNEI_data.zip
I would also like to download the following zip archive, which contains a series of nested folders holding data I want to manipulate. At present I download them manually:
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
Can anyone advise me on how to do this with R code?
Here's a way to do it:
fn <- "https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2FNEI_data.zip"
download.file(fn, tf <- tempfile(fileext = ".zip"))
unzip(tf, exdir = td <- file.path(tempdir(), "myzip"))
(list.files(td, full.names = TRUE, recursive = TRUE))
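The same approach handles the second archive with its nested folders; a sketch with only the URL and directory names changed:
fn2 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
tf2 <- tempfile(fileext = ".zip")
download.file(fn2, tf2, mode = "wb")
td2 <- file.path(tempdir(), "myzip2")
unzip(tf2, exdir = td2)
list.files(td2, full.names = TRUE, recursive = TRUE)  # recursive = TRUE walks the nested folders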
I download an xlsx file every day with a long unique name that includes the date. I need R to read the new xlsx file saved in the directory each day without typing the unique name every time. My idea is to use the wildcard *.xlsx, but whenever I try it, it says the path does not exist:
excel_df <- read_excel("C:/Home/User/dbd/*.xlsx")
The code above does not work. This code gives the same error:
base <- "C:/Home/User/dbd/*.xlsx"
files <- file.info(list.files(path = base, pattern = "*.xlsx",
                              full.names = TRUE, no.. = TRUE))
daily_numbers <- readxl::read_excel(rownames(files)[order(files$mtime)][nrow(files)])
Each line of results shows the same message:
...path does not exist.
The path shouldn't contain the pattern; pass the directory as path and the file pattern separately:
path <- "C:/Home/User/dbd"
files <- list.files(path= path, full.names=T, pattern ='\\.xlsx$')
files
lapply(files, function(file) readxl::read_excel(file))
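If you only want the most recently modified workbook each day, a small sketch along the same lines (assuming the newest modification time identifies the daily download):
path <- "C:/Home/User/dbd"
files <- list.files(path = path, full.names = TRUE, pattern = "\\.xlsx$")
info <- file.info(files)                         # modification time per file
newest <- rownames(info)[which.max(info$mtime)]  # most recently modified file
daily_numbers <- readxl::read_excel(newest)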
I want to be able to read and edit spatial SQLite tables downloaded from a server. They come compressed.
These zip files contain a folder whose name records information about the model run, so the folder names can sometimes be quite long.
When this folder name gets too long, unzipping fails. I ultimately don't need to unzip the file, but I seem to get the same error when I use unz within readOGR.
I can't think of how to create a reproducible example, but I can give an example of a path that works and one that doesn't.
Works:
"S:\3_Projects\CRC00001\4699-12103\scenario_initialised model\performance_assessment.sqlite"
4699-12103 is the zip file name
and "scenario_initialised model" is the subfolder in question
Fails:
""S:\3_Projects\CRC00001\4699-12129\scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0\performance_assessment.sqlite""
4699-12103 is the zip file name
and "scenario_tree_canopy_7, number_of_trees_0, roads_False, compliance_75, year_2030, nrz_cover_0.6, green_roofs_0" is the offending subfolder
The code would work in a similar fashion to this:
list_zips <- list.files(pattern = "\\.zip$", recursive = TRUE, include.dirs = TRUE)
zip_path <- file.path(getwd(), list_zips[i])
unzip(zipfile = zip_path,
      exdir = substr(zip_path, 1, nchar(zip_path) - 4))  # output dir named after the zip, minus ".zip"
But I would prefer to load the spatial file directly, without unzipping, along these lines:
sq_path <- unzip(list_zips[i], list = TRUE)[2, 1]
temp <- unz(file.path(getwd(), list_zips[i]), sq_path)
vectorImport <- readOGR(dsn = temp, layer = "micro_climate_grid")
Any help would be appreciated! Tim
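One possible workaround, sketched here rather than tested: GDAL's /vsizip/ virtual file system lets readOGR read a file inside an archive without extracting it, which sidesteps the long-path unzip entirely.
library(rgdal)
zip_path <- file.path(getwd(), list_zips[i])
sq_path <- unzip(zip_path, list = TRUE)$Name[2]    # path of the sqlite file inside the zip
dsn <- paste0("/vsizip/", zip_path, "/", sq_path)  # /vsizip/ reads inside the archive
vectorImport <- readOGR(dsn = dsn, layer = "micro_climate_grid")
Whether the SQLite driver can read through /vsizip/ depends on the GDAL build, so treat this as something to try rather than a guaranteed fix.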
This is NOT a duplicate. I am trying to retrieve files in a specific folder.
I want this statement f <- drive_find(n_max=30) to return only the files in a specific directory. How do I point it to a specific directory?
For more detail on what I am doing, see below.
I am using R and can download files with the following code:
install.packages("googledrive")
library("googledrive")  # load googledrive
drive_download(file = as_id(drive_find(pattern = "abc.xlsx", n_max = 30)$id),
               path = "/Users/me/Desktop/abc.xlsx")
But I want to download only the files in a specific directory, and I don't know how to restrict the search to that directory exclusively.
I have tried drive_get and drive_download but am unable to specify a specific directory.
f <- drive_find(n_max = 30)  # this gives me a list of files
for (i in 1:nrow(f)) {
  d_path <- f$name[i]
  drive_download(file = as_id(drive_find(pattern = f$name[i], n_max = 30)$id),
                 path = paste("/Users/me/Desktop/Gdrive/", d_path, sep = ""))
}
The problem is that the statement f <- drive_find(n_max = 30) gives me a list that includes folders and files I do not want. So I need to specify the exact directory to look in. How do I do that?
On a previous thread, the folder ID was used to specify the exact folder location. Unless your files sit in the top-level Drive folder, this could help: R How to read a file from google drive using R
The code provided there:
library(googledrive)
temp <- tempfile(fileext = ".zip")
dl <- drive_download(as_id("1AiZda_1-2nwrxI8fLD0Y6e5rTg7aocv0"),
path = temp, overwrite = TRUE)
out <- unzip(temp, exdir = tempdir())
bank <- read.csv(out[14], sep = ";")
You can find the file ID when you get a shareable link in Google Drive. It will look something like this:
https://drive.google.com/file/d/"file ID you need"/view?usp=sharing
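To restrict the search to one directory, as the question asks, googledrive also provides drive_ls(), which lists only the contents of a given folder. A minimal sketch, with a made-up folder ID:
library(googledrive)
folder <- as_id("0B1234567890abcdefg")  # hypothetical folder ID, taken from the folder's shareable link
f <- drive_ls(path = folder)            # only the files inside that folder
for (i in seq_len(nrow(f))) {
  drive_download(file = as_id(f$id[i]),
                 path = file.path("/Users/me/Desktop/Gdrive", f$name[i]),
                 overwrite = TRUE)
}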
I am downloading the zip files from this location:
http://nemweb.com.au/Data_Archive/Wholesale_Electricity/NEMDE/2019/NEMDE_2019_03/NEMDE_Market_Data/NEMDE_Files/NemPriceSetter_20190301_xml.zip
The zip file has multiple XML files inside that I am trying to read, but given the structure of the XML I cannot parse it properly or convert it into a data frame.
I have tried downloading the zip file into a temporary directory and then parsing one file at a time:
library(XML)

tdir <- tempdir()
tf <- tempfile(tmpdir = tdir)
download.file("http://nemweb.com.au/Data_Archive/Wholesale_Electricity/NEMDE/2019/NEMDE_2019_03/NEMDE_Market_Data/NEMDE_Files/NemPriceSetter_20190301_xml.zip",
              tf, mode = "wb")                 # binary mode so the zip is not corrupted
xml_files <- unzip(tf, exdir = tdir)           # extract all the XML files

doc <- xmlParse(xml_files[1])                  # parse the first file
a <- xmlToDataFrame(nodes = getNodeSet(doc, "//SolutionAnalysis/PriceSetting"))
unlink(tdir, recursive = TRUE, force = TRUE)   # clean up the temp directory
This is how the XML file looks (the screenshot is not reproduced here), and I am trying to put the information into specific columns of a data frame.
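Assuming every file in the archive has the same structure, one way to extend the single-file parse is to apply the same XPath to each extracted file and stack the results; a sketch building on the code above:
library(XML)
price_list <- lapply(xml_files, function(f) {
  doc <- xmlParse(f)
  xmlToDataFrame(nodes = getNodeSet(doc, "//SolutionAnalysis/PriceSetting"))
})
price_df <- do.call(rbind, price_list)  # one data frame, one row per PriceSetting node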
I have a folder of PDFs that I am supposed to perform text analytics on within R. Thus far the best method has been using R to convert these files to text files with pdftotext. After this, however, I am unable to perform any analytics, as the text files are placed in the same folder as the PDFs from which they are derived.
I am achieving this through:
dest <- "C:/PDF"
myfiles <- list.files(path = dest, pattern = "pdf", full.names = TRUE)
lapply(myfiles, function(i) system(paste('"C:/xpdfbin-win-3.04/bin64/pdftotext.exe"', paste0('"',i,'"')), wait= FALSE))
I was wondering about the best way to keep only the text files, whether by saving them to a newly created folder in this step or whether more must be done.
I have tried:
dir.create("C:/txtfiles")
new.folder <- "C:/txtfiles"
dest <- "C:/PDF"
list.of.files <-list.files(dest, ".txt$")
file.copy(list.of.files, new.folder)
However, this only fills the new folder 'txtfiles' with blank text files named after the ones created by the first few lines of code.
Use the following code:
files <- list.files(path = "current folder location", pattern = "\\.txt$")  # lists all .txt files
for (i in 1:length(files)) {
  file.copy(from = paste("~/current folder location/", files[i], sep = ""),
            to = "destination folder")
}
This should copy all text files in "current folder location" into a separate folder "destination folder".
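Alternatively, pdftotext accepts an explicit output file as its second argument, so the .txt files can be written straight into the new folder and the copy step disappears. A sketch reusing the paths from the question:
dir.create("C:/txtfiles", showWarnings = FALSE)
myfiles <- list.files("C:/PDF", pattern = "\\.pdf$", full.names = TRUE)
lapply(myfiles, function(i) {
  # build the output path in C:/txtfiles, swapping .pdf for .txt
  out <- file.path("C:/txtfiles", paste0(tools::file_path_sans_ext(basename(i)), ".txt"))
  system(paste0('"C:/xpdfbin-win-3.04/bin64/pdftotext.exe" "', i, '" "', out, '"'), wait = TRUE)
})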