How to read excel files based on partial file names

How do I read excel files just based on the first part of the file name? For example my file is "File_01_01_2019", where "File" is always the same but the date changes often, so I would want to read excel files that start with "File" in this scenario.

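This should help:
library(readxl)
# The regex keeps file names that start with "File" and end with ".xlsx"
files <- list.files(path = "your_path",
                    pattern = "^File.*\\.xlsx$",
                    full.names = TRUE)
# simplify = FALSE always returns a list of data frames, named by file path
sapply(files, read_excel, simplify = FALSE)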
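To check which files a pattern picks up before reading anything, it can help to try it on a scratch folder first. The following is a minimal sketch; the folder and the file names in it are made up for illustration. Note that list.files() applies the pattern to the file name only, even with full.names = TRUE, so the ^ anchor refers to the start of the name and not of the path.

```r
# Create a throwaway folder with a few empty files (hypothetical names)
demo_dir <- tempfile("excel_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("File_01_01_2019.xlsx",
                                  "File_02_01_2019.xlsx",
                                  "Other_01_01_2019.xlsx",
                                  "File_notes.txt")))

# Keep only names that start with "File" and end with ".xlsx"
matched <- list.files(demo_dir, pattern = "^File.*\\.xlsx$", full.names = TRUE)
basename(matched)
```

Only the two File_*.xlsx entries should be listed; the other two fail either the prefix or the extension test.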

Related

How to select specific files according to a spreadsheet criteria and then copy from directory to another directory in R?

I have a task that requires me to use a specific column in a CSV spreadsheet that stores the file names, for example:
File Name
CA-001
WV-001
ma-001
My task is to move some files from folder 'source' to folder 'target'.
I'm using this CSV spreadsheet as a crosswalk to select the files whose names match the entries in the 'File Name' column. The source folder contains these files as well as others that are not in the list (e.g. CO-001, SC-001...). All of the files are PDFs, so file type is not a concern. I want to copy only the files whose names match what's in the CSV spreadsheet. How can I do this?
I have some sample code below, but it doesn't run successfully.
source <- "C:/Users/53038/MovePDF/Test_From"
target <- "C:/Users/53038/MovePDF/Test_To"
all.files <- list.files(path = source)
csvfile <- read.csv('C:/Users/53038/MovePDF/Master.csv')
toCopy <- all.files[all.files %in% csvfile$Move]
file.copy(toCopy, target)
Thank you!

read.csv() converts the column header 'File Name' to File.Name, so with the provided code the names you want to match are in csvfile$File.Name, not csvfile$Move.
I'm assuming the source directory is potentially very large. Since the exact filenames are known, there is no need for slow regular-expression matching on substrings or for a complete file listing (which is also slow). Instead, I only check whether the wanted files exist before copying them:
source <- "C:/Users/53038/MovePDF/Test_From"
target <- "C:/Users/53038/MovePDF/Test_To"
csvfile <- read.csv('C:/Users/53038/MovePDF/Master.csv')
# add .pdf suffix
toCopy <- paste0(csvfile$File.Name,'.pdf')
# add source directory path
toCopy <- file.path(source, toCopy)
# optional: extract only the existing files from toCopy. You can skip this step if you're sure they exist and/or you don't mind receiving errors
toCopy <- toCopy[file.exists(toCopy)]
# copy the files into the target directory
file.copy(toCopy, target, overwrite = TRUE)
I would prefer to keep the .pdf extension in the filename at all times, including in the source CSV. On case-sensitive filesystems (almost all Linux installations, rarely macOS or Windows), the code above will miss files whose extension is .PDF, .Pdf, etc.
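If the case of names or extensions in the source folder is not consistent, one option is to match case-insensitively. The sketch below is only an illustration: find_pdfs is a hypothetical helper, and the folder contents are made up. Unlike the approach above, it does list the whole directory, so it trades speed for tolerance.

```r
# Hypothetical helper: return the files in `dir` whose base name (without
# extension) appears in `wanted`, ignoring case in both name and extension.
find_pdfs <- function(dir, wanted) {
  files <- list.files(dir, pattern = "\\.pdf$", ignore.case = TRUE,
                      full.names = TRUE)
  stems <- tolower(tools::file_path_sans_ext(basename(files)))
  files[stems %in% tolower(wanted)]
}

# Scratch folder with mixed-case names (made up for the demo)
src <- tempfile("pdf_demo")
dir.create(src)
file.create(file.path(src, c("CA-001.PDF", "ma-001.pdf", "CO-001.pdf")))

found <- find_pdfs(src, c("CA-001", "WV-001", "MA-001"))
basename(found)
```

The resulting vector can be passed to file.copy() in the same way as toCopy above. WV-001 is simply skipped because no such file exists in the scratch folder.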

read a single .xlsx file in R without the use of filename but utilizing the .xlsx

I download an xlsx file every day with a long unique name that includes the date. I need R to read the newest xlsx file saved in the directory without my typing the unique name every day. My idea is to use *.xlsx, but whenever I try it, it always says the path does not exist:
excel_df <- read_excel("C:/Home/User/dbd/*.xlsx")
The code above does not work. This code gives the same error:
base <- as.character("C:/Home/User/dbd/*.xlsx")
files <- file.info(list.files(path = base, pattern = '*.xlsx',
full.names = TRUE, no.. = TRUE))
daily_numebrs<-readxl::read_excel(rownames(files)[order(files$mtime)][nrow(files)])
Each line of the output shows:
...path does not exist.

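The path shouldn't contain the pattern. Pass the directory as path and the regular expression as pattern:
path <- "C:/Home/User/dbd"
# full paths of every file in the directory whose name ends in .xlsx
files <- list.files(path = path, full.names = TRUE, pattern = "\\.xlsx$")
files
# read each file into a list of data frames
lapply(files, readxl::read_excel)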
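Since the question asks for only the newest file, one possible variation is to pick the match with the latest modification time. This is a sketch under that assumption (that "newest" means most recently modified on disk); latest_file is a hypothetical helper name, and the demo uses empty placeholder files rather than real workbooks.

```r
# Hypothetical helper: path of the most recently modified file in `dir`
# whose name matches `pattern`.
latest_file <- function(dir, pattern = "\\.xlsx$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) stop("no matching files in ", dir)
  files[which.max(as.numeric(file.mtime(files)))]
}

# Demo with two empty placeholder files and explicit timestamps
demo_dir <- tempfile("latest_demo")
dir.create(demo_dir)
old_file <- file.path(demo_dir, "report_old.xlsx")
new_file <- file.path(demo_dir, "report_new.xlsx")
file.create(old_file, new_file)
Sys.setFileTime(old_file, Sys.time() - 3600)  # pretend it is an hour older
Sys.setFileTime(new_file, Sys.time())

newest <- latest_file(demo_dir)
basename(newest)
```

With real data, the returned path could then be passed to readxl::read_excel(), for example readxl::read_excel(latest_file("C:/Home/User/dbd")).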

How to combine path and variable in readr read_csv (list.files for loop)?

I need to mass-import some data for my R project. Following a guide, I wrote a simple for loop that goes like this:
for (for_variable in list.files(path = "./data", pattern = ".csv$")) {
  temp <- read_csv(for_variable)
  # Some data wrangling
  database <- rbind(database, temp)
  rm(temp)
}
My data is in the data folder inside my working directory, as specified in list.files(path = "./data"). But I can't use read_csv(for_variable), because I get an error:
'file_name.csv' does not exist in current working directory
And if I try to specify the path in read_csv, it doesn't understand what 'for_variable' is; it looks for a file literally named 'for_variable' in the data folder. So how can I combine the path and the variable name in read_csv? Or is there another way of solving the problem?

I would recommend reading this post, as it is helpful for importing multiple csv files.
As for your specific question, the error most likely occurs because read_csv needs the full path of each file, which you can get by setting full.names = TRUE in list.files(). Passing just the file name contained in for_variable to read_csv won't work, because that name is resolved relative to the working directory.
list.files(path = "./data", full.names = TRUE, pattern = "\\.csv$")
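Putting the pieces together, the whole import could look roughly like the sketch below. It makes a few assumptions that are not in the question: it uses base R's read.csv instead of readr::read_csv so that it runs without extra packages, it assumes every file has the same columns, and read_all_csv is a hypothetical helper name. Binding once at the end also avoids growing database inside the loop.

```r
# Hypothetical helper: read every .csv in `dir` and stack the rows.
read_all_csv <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv)  # one data frame per file
  do.call(rbind, tables)             # bind them all in a single step
}

# Demo with two small made-up files in a scratch folder
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
write.csv(data.frame(id = 1:2, value = c("a", "b")),
          file.path(demo_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3L, value = "c"),
          file.path(demo_dir, "part2.csv"), row.names = FALSE)

database <- read_all_csv(demo_dir)
database
```

Any per-file wrangling can go inside the lapply() call by replacing read.csv with a small function that reads and then transforms each file.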

Excluding a particular file extension while reading from a folder

I have many files inside a folder with xlm, xls and xlsx extensions, and I want to read only the xls and xlsx files.
xlsxfile.list <- list.files(path = path, pattern='*.xlsx', full.names = TRUE)
filePath <- list.files(path=path,recursive=T,pattern=".xlsx",full.names=T)
If I use the code above, the .xls files are not picked up, and if I change the pattern to .xls, the .xlm files are also included in the file list, which I don't want.
Is there a library or a simple way to achieve this? I am pretty new to R, so any help is appreciated.

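list.files(path = 'path', pattern = '\\.xls$|\\.xlsx$', full.names = TRUE)
In pattern, $ denotes end of string, | means 'or', and \\. matches a literal dot (pattern is a regular expression, not a shell glob, so a leading * is not needed).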

You can use the pattern \\.xlsx?$, which matches whether or not the extension has the final x:
list.files(path = 'path', pattern = '\\.xlsx?$', full.names = TRUE)
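A quick way to convince yourself that an extension pattern behaves as intended is to test it with grepl() on a few sample names before pointing list.files() at a real folder. The sketch below checks the anchored pattern \\.xlsx?$ (escaped dot, optional final x, end of string); the file names are made up.

```r
# Made-up names covering the cases from the question plus two near misses
files <- c("a.xls", "b.xlsx", "c.xlm", "d.xlsm", "xlsx_notes.txt")

# TRUE only where the name ends in ".xls" or ".xlsx"
is_excel <- grepl("\\.xlsx?$", files)
files[is_excel]
```

Without the trailing $, a name such as d.xlsm would also match, because ".xls" appears inside its extension.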

copy csv file from multiple directories to a new one in R

I am trying to extract many .csv files from multiple directories/subdirectories and copy them in a new folder, where I would like to end up with only .csv files.
The csv files are stored in subdirectories with the following structure:
D:\R data\main_folder\03\07\04\BBB_0120180307031414614.csv
D:\R data\main_folder\03\07\05\BBB_0120180307031414615.csv
I am trying the list.files function to extract the csv files names only.
my_dirs <- list.files("D:\\R data\\main_folder\\",pattern="\\.csv$" ,recursive = TRUE,
include.dirs = FALSE,full.names = FALSE)
The problem is that csv files are listed with the directory path, e.g.
03/07/03/BBB_0120180307031414614.csv
This happens even though full.names and include.dirs are both set to FALSE.
This prevents me from copying those files in a new folder, as the name is not recognized.
What am I doing wrong?
Thanks

Use the basename function together with list.files, as shown below.
If I understood you correctly, you want to fetch the names of .csv files that sit in different directories.
I made a temp folder in the Documents directory of my Windows machine. Inside it there are two folders, "one" and "two", and inside these folders I have csv files named "just_one.csv" and "just_two.csv".
To fetch the names "just_one.csv" and "just_two.csv", I can do this:
basename(list.files("C:/Users/C_Nfdl_99878314/Documents/temp", pattern = "\\.csv$", recursive = TRUE))
Which results in:
[1] "just_one.csv" "just_two.csv"
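For the copying step itself, the relative paths returned by a recursive listing are not usable on their own, but the full paths are, and basename() gives the flat destination names. The following is a rough sketch of that idea; flatten_copy is a hypothetical helper, and the folder layout is recreated in a scratch directory. One caveat: if two subfolders contain files with the same name, only the first is copied, because file.copy() does not overwrite by default.

```r
# Hypothetical helper: copy every file under `from` (recursively) whose name
# matches `pattern` into the single folder `to`, dropping the subfolder part.
flatten_copy <- function(from, to, pattern = "\\.csv$") {
  files <- list.files(from, pattern = pattern, recursive = TRUE,
                      full.names = TRUE)
  dir.create(to, showWarnings = FALSE, recursive = TRUE)
  file.copy(files, file.path(to, basename(files)))
}

# Recreate the temp/one and temp/two layout in a scratch location
src <- tempfile("nested_src")
dir.create(file.path(src, "one"), recursive = TRUE)
dir.create(file.path(src, "two"), recursive = TRUE)
file.create(file.path(src, "one", "just_one.csv"),
            file.path(src, "two", "just_two.csv"),
            file.path(src, "two", "readme.txt"))

dst <- tempfile("flat_dst")
copied <- flatten_copy(src, dst)
list.files(dst)
```

The return value is a logical vector with one entry per file, so all(copied) is a simple check that nothing was skipped.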
