I want to read the most recently modified (or created) CSV file in several different directories and then append them all to a pre-existing single data frame (df_total).
I have two kinds of directories to read:
A:/LogIIS/FOLDER01/files.csv
Others contain several subfolders, each with its own files.csv, as in the example below:
"A:/LogIIS/FOLDER02/FOLDER_A/"files.csv"
"A:/LogIIS/FOLDER02/FOLDER_B/"files.csv"
"A:/LogIIS/FOLDER02/FOLDER_C/"files.csv"
"A:/LogIIS/FOLDER03/FOLDER_A/"files.csv"
"A:/LogIIS/FOLDER03/FOLDER_B/"files.csv"
"A:/LogIIS/FOLDER03/FOLDER_C/"files.csv"
"A:/LogIIS/FOLDER03/FOLDER_D/"files.csv"
Something like this...
# Get a vector of all filenames
files <- list.files(path = "A:/LogIIS", pattern = "files.csv",
                    full.names = TRUE, recursive = TRUE)
# Get the directory names of these (for grouping)
dirs <- dirname(files)
# Find the last file in each directory (i.e. latest modified time)
lastfiles <- tapply(files, dirs, function(v) v[which.max(file.mtime(v))])
You can then loop through these and read them in.
If you just want the latest file overall, this will be files[which.max(file.mtime(files))].
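For the appending step, here is a minimal sketch, assuming every file shares df_total's column layout:
# Append each directory's most recent file to the pre-existing df_total
# (assumes all files have the same columns as df_total)
for (f in lastfiles) {
  df_total <- rbind(df_total, read.csv(f))
}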
Here is a tidyverse-friendly solution:
list.files("data/",full.names = T) %>%
enframe(name = NULL) %>%
bind_cols(pmap_df(., file.info)) %>%
filter(mtime==max(mtime)) %>%
pull(value)
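Since the question asks for the latest file per directory rather than overall, a variation of the same pipeline (a sketch, grouping on dirname()) might look like:
# Variation: latest file per directory instead of overall
list.files("data/", full.names = TRUE, recursive = TRUE) %>%
  enframe(name = NULL) %>%
  bind_cols(pmap_df(., file.info)) %>%
  group_by(dir = dirname(value)) %>%
  filter(mtime == max(mtime)) %>%
  pull(value)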
Consider creating a data frame of files, since file.info maintains OS file-system metadata per path, such as created time:
setwd("A:/LogIIS")
files <- list.files(getwd(), full.names = TRUE, recursive = TRUE)
# DATAFRAME OF FILE, DIR, AND METADATA
filesdf <- cbind(file=files,
dir=dirname(files),
data.frame(file.info(files), row.names =NULL),
stringsAsFactors=FALSE)
# SORT BY DIR AND CREATED TIME (DESC)
filesdf <- with(filesdf, filesdf[order(dir, -xtfrm(ctime)),])
# AGGREGATE LATEST FILE PER DIR
latestfiles <- aggregate(.~dir, filesdf, FUN=function(i) head(i)[[1]])
# LOOP THROUGH LATEST FILE VECTOR FOR IMPORT
df_total <- do.call(rbind, lapply(latestfiles$file, read.csv))
Here is a pipe-friendly way to get the most recent file in a folder. It uses an anonymous function, which in my view is slightly more readable than a one-liner. file.mtime() is faster than file.info(fpath)$ctime.
dir(path = "your_path_goes_here", full.names = T) %>% # on W, use pattern="^your_pattern"
(function(fpath){
ftime <- file.mtime(fpath) # file.info(fpath)$ctime for file CREATED time
return(fpath[which.max(ftime)]) # returns the most recent file path
})
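A quick usage sketch (the path is hypothetical), feeding the result straight into read.csv(); this assumes the magrittr pipe is loaded, as above:
# Hypothetical usage: read the most recent CSV in one folder
newest <- dir(path = "A:/LogIIS/FOLDER01", full.names = TRUE) %>%
  (function(fpath) fpath[which.max(file.mtime(fpath))])
df <- read.csv(newest)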
I have a folder with multiple *.rar and *.zip files.
Each *.rar and *.zip file contains one folder, and inside that folder there are multiple folders.
I would like to generate a dataset with the names of these multiple folders.
How can I do this using R?
I tried:
temp <- list.files(pattern = "\\.zip$")
lapply(temp, function(x) unzip(x, list = TRUE))
But it returns the full internal paths rather than what I want. I would like to get just the folder names: "Nova pasta1" and "Nova pasta2".
Thanks
Let's create a simple set of directories/files that are representative of your own. You described having a single .zip file that contains multiple zipped directories, which may contain unzipped files and/or subdirectories.
# Example main directory
dir.create("main_dir")
# Example directory with 1 file and a subdirectory with 1 file
dir.create("main_dir/example_dir1")
write.csv(data.frame(x = 5), file = "main_dir/example_dir1/example_file.csv")
dir.create("main_dir/example_dir1/example_subdir")
write.csv(data.frame(x = 5), file = "main_dir/example_dir1/example_subdir/example_subdirfile.csv")
# Example directory with 1 file
dir.create("main_dir/example_dir2")
write.csv(data.frame(x = "foo"), file = "main_dir/example_dir2/example_file2.csv")
# NOTE: I was having issues with using `zip()` to zip each directory
# then the main (top) directory, so I manually zipped them below.
# Manually zip example_dir1 and example_dir2, then zip main_dir at this point.
Given this structure, we can get the paths to all of the directories within the highest-level directory (main_dir) using unzip(list = TRUE), since we know the name of the single zipped directory containing all of these additional zipped subdirectories.
# Unzip the highest level directory available, get all of the .zip dirs within
ex_path <- "main_dir"
all_zips <- unzip(zipfile = paste0(ex_path, ".zip"), list = TRUE)
all_zips
# We can remove the main_path string if we want, so that we only
# see the zip files within our main directory instead of the full path.
library(dplyr)
all_zips %>%
  filter(Name != paste0(ex_path, "/")) %>%
  mutate(Name = sub(paste0(ex_path, "/"), "", Name))
If you had multiple zipped directories with nested directories similar to main_dir, you could just put their paths in a list and apply the function to each element of the list. Below I reproduce this.
# Example of multiple zip directory paths in a list
ziplist <- list(ex_path, ex_path, ex_path)

lapply(ziplist, function(x) {
  temp <- unzip(zipfile = paste0(x, ".zip"), list = TRUE)
  temp <- temp %>% mutate(main_path = x)
  temp <- temp %>%
    filter(Name != paste0(x, "/")) %>%
    mutate(Name = sub(paste0(x, "/"), "", Name))
  temp
})
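If you then want one data frame rather than a list, the per-zip listings can be collapsed with bind_rows(); a sketch building on the code above:
# Sketch: collapse the per-zip listings into a single data frame
zip_contents <- lapply(ziplist, function(x) {
  unzip(zipfile = paste0(x, ".zip"), list = TRUE) %>%
    mutate(main_path = x)
})
all_contents <- bind_rows(zip_contents)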
If all of the .zip files in the current working directory are files you want to do this for, you can get ziplist above via:
list.files(pattern = ".zip") %>% as.list()
I appreciate all the help, but I think I found a shorter way to solve my question.
library(stringr)

temp.zip <- list.files(pattern = "\\.zip$")
temp.rar <- list.files(pattern = "\\.rar$")
mydata <- lapply(c(temp.rar, temp.zip),
                 function(x) unique(na.omit(str_extract(
                   unlist(untar(tarfile = x, list = TRUE)),
                   "(?<=/).*(?=/)"))))
unlist(mydata)
Thanks all
I have folders with 4 .csv files in each folder. Currently I am batch reading the .csv files in the folder:
setwd("/Users/Drive/MS/Ma/Ec/Effort_variation_Ec/MES1/")
ecosmpr <-
list.files(pattern = "*.csv") %>%
map_df(~read_csv(.))
ecosmpr=data.frame(ecosmpr)
After batch reading in the 4 csvs in the folder as one data.frame, I need to do some formatting:
library(janitor)

ecosmpr1 <- ecosmpr[, -c(2:13)]
dim(ecosmpr1)
ecosmpr1 <- ecosmpr1 %>%
  row_to_names(row_number = 1)
names(ecosmpr1) <- rev(c("detritus", "phyto", "peri", "zoops", "amphipods",
                         "inverts", "leucisids", "lns", "yct5plus", "yct4",
                         "yct3", "yct2", "yctyoy", "lkt5plus", "lkt34",
                         "lkt2", "lkt7mo1yo", "lktyoy", "years"))
Then I want to export the formatted data.frame to a csv, but in a different location:
write.csv(ecosmpr1,"/Users/Drive/MS/Ma/Ec/Effort_variation_Ec/ecosmpr1_partialformat.csv",row.names = FALSE)
My issue is that I need to loop through the folders (setwd to each MESXX), rename each data frame ecosmprXX, and export each ecosmprXX_partialformat.csv. I am having trouble even starting this loop. My naming convention is: folders MESXX, data frames ecosmprXX, and exported files ecosmprXX_partialformat.csv (where XX is the number, 1:30). I have 30 different folders, so doing this without a loop is inefficient.
This should do the trick:
library(tidyverse)
library(janitor)

new_col_names <- rev(c("detritus", "phyto", "peri", "zoops", "amphipods",
                       "inverts", "leucisids", "lns", "yct5plus", "yct4",
                       "yct3", "yct2", "yctyoy", "lkt5plus", "lkt34", "lkt2",
                       "lkt7mo1yo", "lktyoy", "years"))

for (i in 1:30) {
  setwd(paste0("/Users/Drive/MS/Ma/Ec/Effort_variation_Ec/MES", i, "/"))
  ecosmpr <- list.files(pattern = "*.csv") %>%
    map_df(~read_csv(.x))
  ecosmpr <- ecosmpr %>%
    select(-c(2:13)) %>%
    row_to_names(row_number = 1)
  names(ecosmpr) <- new_col_names
  output_file <-
    paste0("/Users/Drive/MS/Ma/Ec/Effort_variation_Ec/ecosmpr", i, "_partialformat.csv")
  write.csv(ecosmpr, output_file, row.names = FALSE)
}
I'm doing a project and I have to import a ton of .csv files into RStudio. The files correspond to dates. Each date has a directory filled with files from that day. To get all the data for one day I've been using:
im <- list.files(pattern = "*.csv")
my_data <- lapply(im, read_csv)
The problem is I have hundreds of days' worth of files, so hundreds of directories to go through. Is there a way to pull all the files from all the directories at once into the same data table? Bonus if it can include the date (the title of the directory it's in) in the data table. Also, we use tidyverse and tibbles, if that makes a difference.
If all of the directories are in one root directory, try list.files(recursive = TRUE), which searches subdirectories as well. Additionally, look at the purrr trick of set_names() plus imap(), which iterates over both the contents and the names of an object and lets you bring the filenames in as a column. Something like the following; note that you'll have to use some string tools to get just the date from the end of the file paths (see the sketch after the code).
library(tidyverse)

all_files <- "path/to/root/folder" %>%
  list.files(pattern = "csv", recursive = TRUE, full.names = TRUE) %>%
  set_names() %>%
  imap_dfr(~ bind_cols(read_csv(.x), filepath = .y))
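For the bonus part of the question, one way to pull the date out of the new filepath column (a sketch, assuming each file sits directly inside its date-named folder):
# Derive the date from the name of the containing directory
all_files <- all_files %>%
  mutate(date = basename(dirname(filepath)))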
I had the same problem; the solution below worked for me :-)
library(tidyverse)

path <- "/my/root/"

read_plus <- function(flnm) {
  read_csv(flnm) %>%
    mutate(filename = flnm)
}

my_data <-
  list.files(path, pattern = "*.csv",
             recursive = TRUE,
             full.names = TRUE) %>%
  map_df(~read_plus(.))
I have a base folder and it has many folders in it. I want to go to each folder, find a file named table_amzn.csv (if it exists), then read all of those files in R and put them in a single data frame one after the other. I have verified that all files have the same columns. I know how to read CSVs into R, but how could I loop over all the folders within a base folder and concatenate the data?
This can also be done straightforwardly in base R:
## change `dir` to whatever your 'base folder' actually is
dir <- '~/base_folder'
ff <- list.files(dir, pattern = "table_amzn.csv", recursive = TRUE, full.names = TRUE)
out <- do.call(rbind, lapply(ff, read.csv))
In the event that your columns are the same but for whatever reason (typo, etc.) have different column names, you could modify the above like:
out <- do.call(rbind, lapply(ff, read.csv, header = FALSE, skip = 1))
names(out) <- c('stub1', 'stub2') # whatever they should be
Here is an implementation that was recently added to the package rio:
devtools::install_github("leeper/rio")
library(rio)

files <- list.files(pattern = "table_amzn.csv", recursive = TRUE, full.names = TRUE)
df <- import_list(files, rbind = TRUE)
This will load all the objects in files into a single data.frame object. Alternatively, if you call it with rbind = FALSE, a list of data.frames is returned.
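A brief usage sketch of the rbind = FALSE form, binding manually afterwards:
# Keep one data.frame per file, then bind them yourself
df_list <- import_list(files, rbind = FALSE)
df <- do.call(rbind, df_list)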
I feel I am very close to the solution, but at the moment I can't figure out how to get there.
I've got the following problem.
In my folder "Test" I've got stacked data files with the names M1_1, M1_2, M1_3 and so on: Test/M1_1.dat, for example.
Now I want to separate the files, so that I get M1_1[1].dat, M1_1[2].dat, M1_1[3].dat and so on. I'd like to save these files in specific subfolders: Test/M1/M1_1[1], Test/M1/M1_1[2] and so on, and Test/M2/M1_2[1], Test/M2/M1_2[2] and so on.
I have already created the subfolders, and I have the following command to split up the files so that I get M1_1.dat[1] and so on:
for (e in dir(path = "Test/", pattern = ".dat", full.names = TRUE, recursive = TRUE)) {
  data <- read.table(e, header = TRUE)
  df <- data[-c(2)]
  out <- split(df, f = df$.imp)
  lapply(names(out), function(z) {
    write.table(out[[z]], paste0(e, "[", z, "].dat"),
                sep = "\t", row.names = FALSE, col.names = FALSE)
  })
}
Now the paste0 command gets me my desired split-up data (although it is named M1_1.dat[1] instead of M1_1[1].dat), but I can't figure out how to get this data into my subfolders.
Maybe you've got an idea?
Thanks in advance.
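One possible tweak (a sketch, not tested against your data): build the destination path with file.path() and tools::file_path_sans_ext(), assuming the target subfolder is keyed by the digit after the underscore (M1_1 goes to Test/M1, M1_2 to Test/M2, per your examples):
# Inside the loop over e, replace the write.table() call with something like:
stem   <- tools::file_path_sans_ext(basename(e))                  # e.g. "M1_1"
subdir <- file.path("Test", paste0("M", sub("^.*_", "", stem)))   # e.g. "Test/M1"
lapply(names(out), function(z) {
  write.table(out[[z]],
              file.path(subdir, paste0(stem, "[", z, "].dat")),
              sep = "\t", row.names = FALSE, col.names = FALSE)
})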
I don't have any idea what your data looks like, so I am going to attempt to recreate the scenario with the gender datasets available at baby names.
Assuming all the files from the zip folder are stored in "inst/data", store all file paths in the all_fi variable:
all_fi <- list.files("inst/data",
                     full.names = TRUE,
                     recursive = TRUE,
                     pattern = "\\.txt$")
> head(all_fi, 3)
[1] "inst/data/yob1880.txt" "inst/data/yob1881.txt" "inst/data/yob1882.txt"
Define a function that will be applied to each file in the directory:
library(dplyr)
library(tools)     # for file_path_sans_ext()
library(jsonlite)  # for rbind.pages()

f.it <- function(f_in = NULL) {
  # Create the new folder based on the existing basename of the input file
  new_folder <- file_path_sans_ext(f_in)
  dir.create(new_folder)

  data.table::fread(f_in) %>%
    select(name = 1, gender = 2, freq = 3) %>%
    mutate(
      gender = ifelse(grepl("F", gender), "female", "male")
    ) %>% (function(x) {
      # Dataset contains names for males and females,
      # so that's what I'm using to mimic your split
      out <- split(x, x$gender)
      o <- rbind.pages(
        lapply(names(out), function(i) {
          # New filename for each iteration of the split data frames
          ###### THIS IS WHERE YOU NEED TO TWEAK FOR YOUR NEEDS
          new_dest_file <- sprintf("%s/%s.txt", new_folder, i)
          # Write the sub-data-frame to the new file
          data.table::fwrite(out[[i]], new_dest_file)
          # For our purposes, return a data frame with file info on the new files
          data.frame(
            file_name = new_dest_file,
            file_size = file.size(new_dest_file),
            stringsAsFactors = FALSE)
        })
      )
      o
    })
}
Now we can just loop through:
NOTE: for my purposes I'm not going to spend time looping through each file; for your purposes this would apply to each of your initial files, i.e. all_fi rather than all_fi[2:5].
> rbind.pages(lapply(all_fi[2:5], f.it))
                     file_name file_size
1 inst/data/yob1881/female.txt     16476
2   inst/data/yob1881/male.txt     15306
3 inst/data/yob1882/female.txt     18109
4   inst/data/yob1882/male.txt     16923
5 inst/data/yob1883/female.txt     18537
6   inst/data/yob1883/male.txt     15861
7 inst/data/yob1884/female.txt     20641
8   inst/data/yob1884/male.txt     17300