R list.files for multiple specific file names

I have a directory in Windows with multiple sub-directories. I also have a list of file names that I would like to search for within the directory structure (to retrieve the exact path).
This works fine for a single value as below (current.folder is a variable for the main directory):
files <- list.files(current.folder,
                    pattern = test,
                    recursive = TRUE,
                    full.names = TRUE)
I can then use the returned path to do file.copy.
The problem I am now having is applying that function to multiple file names stored in a dataframe or even a vector.
I've tried referencing them in the pattern argument (which only returns matches for the first value) and using a for loop over a set of just two filenames (which returns blank).
Am I using the wrong technique or just not finding the correct setup?
Edit for clarification: test refers to the value "1-5FX3C7P_1-5FX3C8T_JNJLFSPROD-ZZFDA-CDRH-AERS-3500A-01162017131543-1-5FX3C7P.xml"

Does this work:
sapply(test, function(x) list.files(current.folder,
                                    pattern = x,
                                    recursive = TRUE,
                                    full.names = TRUE))
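If the entries in test are exact file names rather than regular expressions, another option is to list everything once and filter afterwards; this also avoids regex metacharacters (the dots in the name above) matching more than intended. A minimal sketch, assuming test is a character vector of file names:
all.files <- list.files(current.folder, recursive = TRUE, full.names = TRUE)
files <- all.files[basename(all.files) %in% test]
This walks the directory tree a single time instead of once per file name, which can matter for large trees.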

Related

List of subfolders names, folder path and date modified field

Need to write a piece of code in R that'll create a list specifying:
Names of subfolders with a pre-set depth (e.g. 2 levels down)
Path
Date modified
I've tried to use the following generic functions but had no luck:
list.files(path, pattern = NULL, all.files = FALSE,
           full.names = FALSE)
dir(path, pattern = NULL, all.files = FALSE,
    full.names = FALSE)
Would very much appreciate your response.
I think what you are missing is the recursive = TRUE parameter in list.files().
One possible solution could be to list all files first and then limit the output to 2 levels accordingly.
files <- list.files(path = "D:/cmder/", recursive = TRUE)
Since R represents paths using "/", a simple approach is to remove every path that contains three or more slashes if you need a depth of 2.
files[!grepl(".*/.*/.*/.*", files)]
Be careful on Windows, as you might sometimes see a backslash "\" there, but only if your path information comes from somewhere other than R itself, e.g. a CSV import.
My grepl() statement can probably be improved as I'm not an expert there.
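Putting the pieces together for the original question (subfolder names, path and date modified): file.info() supplies the modification time. A sketch along these lines, reusing the "D:/cmder/" example path from above:
path <- "D:/cmder/"
dirs <- list.dirs(path, recursive = TRUE, full.names = FALSE)
dirs <- dirs[dirs != "" & lengths(strsplit(dirs, "/")) <= 2]  # keep at most 2 levels
info <- file.info(file.path(path, dirs))
data.frame(name = basename(dirs), path = rownames(info), modified = info$mtime)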

R: extracting files using dir() function with pattern being identified in a matrix

I am currently trying to create a function that will extract file paths located in a specific directory. These files are extracted if the pattern identified in the dir() function is present within a specific filepath. Each pattern that I am looking for is stored within a matrix. There are multiple files associated with each pattern. I am currently testing this function on 2 file names, but will be adding on several more once it is up and running.
I am running into two issues within this approach:
I don't believe the pattern being pulled from the matrix is identified as a string in the dir() function (pattern = data_sets[i,2]) - there are no quotes around it.
I receive the error: "replacement has length zero" (this may be related to the first issue listed above, but I am unsure).
I am wondering if someone can help me turn the pattern column within my matrix into a string that will be recognized as a pattern within the dir() function in my code or alternatively, identify an easier way to get the end result.
Here's the code I am currently using:
library(data.table)

data_sets <- c("data1", "data2")
extract_most_recent_files_to_source <- function(data_sets) {
  data_sets <- as.data.table(data_sets)
  data_sets[, pattern := paste0("\"^[", data_sets, "]\"")]
  data_sets[, pattern := gsub("\\\\", "", pattern)]
  data_sets <- as.matrix(data_sets)
  df_mod <- do.call("rbind", lapply(1:nrow(data_sets), function(i) {
    files <- dir("filepath", pattern = as.character(data_sets[i, 2]),
                 full.names = TRUE, ignore.case = TRUE)
    files <- as.data.table(files)
    return(files)
  }))
}
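An aside on the approach: if the goal is simply to collect the files matching each name, a plain character vector avoids the quoting gymnastics with the matrix entirely. A minimal sketch, assuming "filepath" is the directory to search:
data_sets <- c("data1", "data2")
files <- unlist(lapply(data_sets, function(p) {
  dir("filepath", pattern = p, full.names = TRUE, ignore.case = TRUE)
}))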

Parameter not passed to the function when using the walk function in the purrr package

I am using purrr::walk to read multiple Excel files and it failed. I have 3 questions:
(1) I used the function list.files to read the Excel file list in one folder, but the returned values also included the subfolders. I tried setting the parameters recursive= and include.dirs=, but it didn't work.
setwd(file_path)
files <- as_tibble(list.files(file_path, recursive = F, include.dirs = F)) %>%
  filter(str_detect(value, ".xlsx"))
files
(2) When I used the following piece of code, it can run without any error or warning message, but there is no returned data.
### read the excel data
file_read <- function(value1) {
  print(value1)
  file1 <- read_excel(value1, sheet = 1)
}
walk(files$value, file_read)
When I used the following, it worked. Not sure why.
test <- read_excel(files$value, sheet = 1)
(3) In Q2, actually I want to create file1 to file6, suppose there are 6 excel files. How can I dynamically assign the dataset name?
list.files has a pattern argument where you can specify what kind of files you are looking for. This will help you avoid the filter(str_detect(value, ".xlsx")) step. Also, list.files only returns the files that are included in the main directory (file_path) and not its subdirectories unless you specify recursive = TRUE.
library(readxl)
setwd(file_path)
files <- list.files(pattern = '\\.xlsx')
In the function you need to return the object.
file_read <- function(value1) {
  data <- read_excel(value1, sheet = 1)
  return(data)
}
Now you can use map/lapply to read the files.
result <- purrr::map(files, file_read)
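As for question (3): rather than creating file1 to file6 in the global environment, it is usually easier to keep the data frames in the named list that map returns; if separate objects are really needed, list2env() can export them. A sketch building on result above:
names(result) <- paste0("file", seq_along(result))
list2env(result, envir = .GlobalEnv)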

Why does "write.dat" (R) save data files within folders?

In order to conduct some analysis using a particular software, I am required to have separate ".dat" files for each participant, with each file named as the participant number, all saved in one directory.
I have tried to do this using the "write.dat" function in R (from the 'multiplex' package).
I have written a loop that outputs a ".dat" file for each participant in a dataset. I would like each file that is outputted to be named the participant number, and for them all to be stored in the same folder.
## Using write.dat
participants_ID <- unique(newdata$SJNB)
for (i in 1:length(participants_ID)) {
  data_list[[i]] <- newdata %>%
    filter(SJNB == participants_ID[i])
  write.dat(data_list[[i]], paste0("/Filepath/Directory/", participants_ID[i], ".dat"))
}
## Using write_csv this works perfectly:
participants_ID <- unique(newdata$SJNB)
for (i in 1:length(participants_ID)) {
  newdata %>%
    filter(SJNB == participants_ID[i]) %>%
    write_csv(paste0("/Filepath/Directory/", participants_ID[i], ".csv"), append = FALSE)
}
If I use the function "write_csv", this works perfectly (saving .csv files for each participant). However, if I use the function "write.dat", each participant file is saved inside a separate folder - the folder name is the participant number, and the file inside the folder is called "data_list[[i]]". In order to get all of the data_list files into the same directory, I then have to rename them, which is time-consuming.
I could theoretically output the files to .csv and then convert them to .dat, but I'm just intrigued to know if there's anything I could do differently to get the write.dat function to work the way I'm trying it :)
The documentation on write.dat is subminimal, but it would appear that you have confused a directory path with a file name. You have deliberately created a directory named "/Filepath/Directory/[participants_ID[i]].dat" and that's where each output file is placed. That you cannot assign a name to the x.dat file itself appears to be a defect in the package as supplied.
However, not all is lost. Inside your loop, replace your write.dat line with the following lines, or something similar (not tested):
edit
It occurs to me that there's a smoother solution, albeit using the dreaded eval:
Again inside the loop (assuming participants_ID[i] is a char string):
eval(parse(text = paste0(participants_ID[i], ' <- data_list[[i]]')))
eval(parse(text = paste0('write.dat(', participants_ID[i], ', "/Filepath/Directory/")')))
previous answer
write.dat(data_list[[i]], "/Filepath/Directory/")
thecommand <- paste0('mv "/Filepath/Directory/data_list[[i]]" /Filepath/Directory/', participants_ID[i], '.dat')
system(thecommand)
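If portability is a concern (the shell mv command is not available on Windows), file.rename() does the same job from within R. A sketch under the same assumptions, equally untested:
write.dat(data_list[[i]], "/Filepath/Directory/")
file.rename(from = "/Filepath/Directory/data_list[[i]]",
            to = paste0("/Filepath/Directory/", participants_ID[i], ".dat"))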

read .csv file with unknown path -- R

I know this might be a very stupid question, but I have been spending hours on this.
I want to read a .csv file that I don't have the full path for (*/*data.csv). I know the following would get the path of the current directory, but I don't know how to adapt it:
Marks <- read.csv(dir(path = '.', full.names = T, pattern = '^data.*\\.csv'))
I tried this one as well, but it's not working:
Marks <- read.csv(file = "*/*/data.csv", sep = ",", header = FALSE)
I can't identify a specific path as this will be used on different machines with different paths, but I am sure about the sub-folders of the main directory as they are the result of a bash script, and I am planning to call this from within Unix, which defines the workspace.
My data structure is:
lecture01/test/data.csv
lecture02/test/data.csv
lecture03/test/data.csv
Your comments -- though not currently your question itself -- indicate you expect to run your code in a working directory that contains some number of subdirectories (lecture01, lecture02, etc.), each of which contains a subdirectory 'marks' that in turn contains a data.csv file. If this is so, and your objective is to read the csv from within each subdirectory, then you have a couple of options depending on the remaining details.
Case 1: Specify the top-level directory names directly, if you know them all and they are potentially idiosyncratic:
dirs <- c("lecture01", "lecture02", "some_other_dir")
paths <- file.path(dirs, "marks/data.csv")
Case 2: Construct the top-level directory names, e.g. if they all start with "lecture", followed by a two-digit number, and you are able to (or specifically wish to) specify a numeric range, e.g. 01 through 15:
dirs <- sprintf("lecture%02d", 1:15)
paths <- file.path(dirs, "marks/data.csv")
Case 3: Determine the top-level directory names by matching a pattern, e.g. if you want to read data from within every directory starting with the string "lecture":
matched.names <- list.files(".", pattern="^lecture")
dirs <- matched.names[file.info(matched.names)$isdir]
paths <- file.path(dirs, "marks/data.csv")
Once you have a vector of the paths, I'd probably use lapply to read the data into a list for further processing, naming each one with the base directory name:
csv.data <- lapply(paths, read.csv)
names(csv.data) <- dirs
Alternatively, if whatever processing you do on each individual CSV is done just for its side effects, such as modifying the data and writing out a new version, and especially if you don't ever want all of them to be in memory at the same time, then use a loop.
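For instance, a minimal loop sketch of that side-effect pattern:
for (p in paths) {
  dat <- read.csv(p)
  # ... modify dat and write out a new version here ...
}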
If this answer misses the mark, or even if it doesn't, it would be great if you could clarify the question accordingly.
I have no code, but I would do a recursive glob from the root and do a preg_match to find the .csv file (use glob brace).
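In R terms, that recursive search might look like the following sketch, which finds every data.csv below the working directory and reads each one (header = FALSE, as in the original attempt):
paths <- list.files(".", pattern = "^data\\.csv$", recursive = TRUE, full.names = TRUE)
Marks <- lapply(paths, read.csv, header = FALSE)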
