Specifying pathname in map_dfr - r

The structure of my directory is as follows:
Extant_Data -> Data -> Raw
-> course_enrollment
-> frpm
I have a few different function to to read in some text files and excel files respectively.
read_fun = function(path){
test = read.delim(path, sep="\t", header=TRUE, fill = TRUE, colClasses = c(rep("character",23)))
test
}
read_fun_frpm= function(path){
test = read_excel(path, sheet = 2, col_names = frpm_names)
}
I feed this into map_dfr so that the function reads in each of the files and rowbinds them.
allfiles = list.files(path = "Extant_Data/Data/Raw/course_enrollment",
pattern = "CourseEnrollment.txt",
full.names=FALSE,
recursive = T)
# Rowbind all the course enrollment data
# !!! BUT I HAVE set the working directory to a subdirectory so that it finds those files
setwd("/Extant_Data/Data/Raw/course_enrollment")
course_combined <- map_dfr(allfiles,read_fun)
allfiles = list.files(path = "Extant_Data/Data/Raw/frpm/post12",
pattern = "frpm*",
full.names=FALSE,
recursive = T)
# Rowbind all the course enrollment data
# !!!I have to change the directory AGAIN
setwd(""Extant_Data/Data/Raw/frpm/post12")
frpm_combined <- map_dfr(allfiles,read_fun_frpm)
As mentioned in the comments, I have to keep changing the working directory so that map_dfr can locate the files. I don't think this is best practice, how might I work around this so I don't have to keep changing the directory? Any suggestions appreciated. Sorry it's hard to provide a re-producible example.
Note: This throws an error.
frpm_combined <- map_dfr(allfiles,read_fun_frpm('Extant_Data/Data/Raw/frpm/post12'))

Related

How to rbind similar csv files that are scattered in many different zip files, using a function?

Consider one file 'C:/ZFILE' that includes many zip files.
Now, consider that each of these zip includes many csv, among which one specific csv named 'NAME.CSV', all these scattered 'NAME.CSV' being similarly named and structured (i.e., same columns).
How to rbind all these scattered csv?
The script below allows that, but a function would be more appropriate.
How to do this?
Thanks
zfile <- "C:/ZFILE"
zlist <- list.files(path = zfile, pattern = "\\.zip$", recursive = FALSE, full.names = TRUE)
zlist # list all zip from the zfile file
zunzip <- lapply(zlist, unzip, exdir = zfile) # unzip all zip in the zfile file (may takes time depending on the number of zip)
library(data.table) # rbindlist & fread
csv_name <- "NAME.CSV"
csv_list <- list.files(path = zfile, pattern = paste0("\\", csv_name, "$"), recursive = TRUE, ignore.case = FALSE, full.names = TRUE)
csv_list # list all 'NAME.CSV' from the zfile file
csv_rbind <- rbindlist(sapply(csv_list, fread, simplify = FALSE), idcol = 'filename')
You can try this type of function ( you can pass the unzip call directly to the cmd param of data.table::fread())
get_zipped_csv <- function(path) {
fnames = list.files(path,full.names = T)
rbindlist(lapply(fnames, \(f) fread(cmd = paste0("unzip -p ",f))[,src:=f]))
}
Usage:
get_zipped_csv(path = "C:\ZFILE\")

Read several txt file from different directories in R

I have several txt files in different directories. I want to read each file separately in R that I will apply some analysis on each one later.
The directories are the same except the last folder as the following:
c:/Desktop/ATA/1/"files.txt"
c:/Desktop/ATA/2/"files.txt"
c:/Desktop/ATA/3/"files.txt"
...
...
The files in all directories have the same name and the last folder starts from 1 to last order.
Create all the filenames to read using sprintf or something similar. Then use read.table or whatever you use to read the text files.
lapply(sprintf("c:/Desktop/ATA/%d/files.txt", 1:10), function(x)
read.table(x, header = TRUE))
Replace 10 with the number of folders you have.
Maybe you can try:
list_file <- list.files(path = "c:/Desktop/ATA", recursive = T, pattern = ".txt", full.names = T)
This will return the list of text files contained in your folder. Then, you can create a for loop to open them and apply some functions on each.
for(i in 1:length(list_file))
{
data = read.table(list_file[i],header = T, sep = "\t")
... function to apply
}
First Thanks Guys, I mixed your codes and modified a little bit:
common_path = "c:/Desktop/ATA/"
primary_dirs = length(list.files(common_path)) # Gives no. of folders in path
list_file <- sprintf("c:/Desktop/ATA/%d/files.txt", 1:primary_dirs)
for(i in 1:length(list_file))
{
data = read.table(list_file[i],header = T, sep = "\t")
}
So, by this way the folders are sorted based on 1,2,3 not 1,10,11,2,3.

R rename files keeping part of original name

I'm trying to rename all files in a folder (about 7,000 files) files with just a portion of their original name.
The initial fip code is a 4 or 5 digit code that identifies counties, and is different for every file in the folder. The rest of the name in the original files is the state_county_lat_lon of every file.
For example:
Original name:
"5081_Illinois_Jefferson_-88.9255_38.3024_-88.75_38.25.wth"
"7083_Illinois_Jersey_-90.3424_39.0953_-90.25_39.25.wth"
"11085_Illinois_Jo_Daviess_-90.196_42.3686_-90.25_42.25.wth"
"13087_Illinois_Johnson_-88.8788_37.4559_-88.75_37.25.wth"
"17089_Illinois_Kane_-88.4342_41.9418_-88.25_41.75.wth"
And I need it to rename with just the initial code (fips):
"5081.wth"
"7083.wth"
"11085.wth"
"13087.wth"
"17089.wth"
I've tried by using the list.files and file.rename functions, but I do not know how to identify the code name out of he full name. Some kind of a "wildcard" could work, but don't know how to apply those properly because they all have the same pattern but differ in content.
This is what I've tried this far:
setwd("C:/Users/xxx")
Files <- list.files(path = "C:/Users/xxx", pattern = "fips_*.wth" all.files = TRUE)
newName <- paste("fips",".wth", sep = "")
for (x in length(Files)) {
file.rename(nFiles,newName)}
I've also tried with the "sub" function as follows:
setwd("C:/Users/xxxx")
Files <- list.files(path = "C:/Users/xxxx", all.files = TRUE)
for (x in length(Files)) {
sub("_*", ".wth", Files)}
but get Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
OR
setwd("C:/Users/xxxx")
Files <- list.files(path = "C:/Users/xxxx", all.files = TRUE)
for (x in length(Files)) {
sub("^(\\d+)_.*", "\\1.wth", file)}
Which runs without errors but does nothing to the names in the file.
I could use any help.
Thanks
Here is my example.
Preparation for data to use;
dir.create("test_dir")
data_sets <- c("5081_Illinois_Jefferson_-88.9255_38.3024_-88.75_38.25.wth",
"7083_Illinois_Jersey_-90.3424_39.0953_-90.25_39.25.wth",
"11085_Illinois_Jo_Daviess_-90.196_42.3686_-90.25_42.25.wth",
"13087_Illinois_Johnson_-88.8788_37.4559_-88.75_37.25.wth",
"17089_Illinois_Kane_-88.4342_41.9418_-88.25_41.75.wth")
setwd("test_dir")
file.create(data_sets)
Rename the files;
Files <- list.files(all.files = TRUE, pattern = ".wth")
newName <- sub("^(\\d+)_.*", "\\1.wth", Files)
file.rename(Files, newName)

exifr doesn't extract information from photoes

I have some pictures with longitude/latitude information. R finds them with the command list.files, but when I use exifr(files) it returns a dataset with 1 column and 0 observations. What am I doing wrong?
files <- list.files(path = "C:/Users/user1/Downloads/pictures", pattern = "*.jpg")
dat <- exifr(files)
I tried your code on my machine and got the same result. You need the full paths to the pictures. list.files, as you have called it, will return just the file name, e.g. photo.jpg. If the photos are not in R's working directory, exifr() will not read them. What you need to add to list.files is full.names = TRUE:
files <- list.files(path = "C:/Users/user1/Downloads/pictures", pattern = "*.jpg",
full.names = TRUE)
dat <- exifr(files)

How to return the same name of a file with a little change?

I am doing some calculations for some files and then I am writing them back. my code here reads the data,does simple calculations, and then write the results and take the same name of the original file.
dir1<- list.files("/data/fgoon", "*.img", full.names = TRUE)
for (files in seq_along(dir1)){
hefile <- readBin(dir1[files], numeric(), size = 4, n = 1300*500, signed = T)
results <- hefile+44.8
outputDir <- "/data/baie"
outputFile <- file.path(outputDir, basename(dir1[files]))
writeBin(as.double(results), outputFile, size = 4)
}
The original file name is for example
dert_E_lwe_20030102_yout.img
What I need is to return the same name (as done in the code) but change lwe to sh for all files:
dert_E_sh_20030102_yout.img
all parts of the name are the same for all files except the date.
Any help is appreciated!
You can use something like this (it's an idea):
str_replace_all(basename(dir1[files]),"lwe","sh")

Resources