Excluding a particular file extension while reading from a folder - r

I have many files in a folder with the extensions xlm, xls and xlsx, and I want to read only the xls and xlsx files.
xlsxfile.list <- list.files(path = path, pattern='*.xlsx', full.names = TRUE)
filePath <- list.files(path=path,recursive=T,pattern=".xlsx",full.names=T)
With the code above I cannot read the .xls files, and if I change the pattern to .xls, the .xlm files are also included in the file list, which I don't want.
Is there a library or a simple way to achieve this? I am pretty new to R, so any help is appreciated.

list.files(path = 'path', pattern = '\\.xls$|\\.xlsx$', full.names = TRUE)
In the pattern, \\. matches a literal dot (the backslash is doubled inside an R string), $ denotes the end of the string and | means 'or'.

You can use the pattern \\.xlsx?$, where the ? makes the final x optional, so it matches both .xls and .xlsx:
list.files(path = 'path', pattern = '\\.xlsx?$', full.names = TRUE)
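A quick way to check a pattern before pointing it at a real folder is to run it against a few sample names with grepl(), which takes the same kind of regular expression as the pattern argument of list.files(). This is only a sketch; the file names below are made up for illustration.

```r
# Hypothetical file names, used only to test the pattern
sample_files <- c("a.xls", "b.xlsx", "c.xlm", "d.xlsx.bak", "xlsnotes.txt")

# "\\." is a literal dot, "x?" makes the final x optional, "$" anchors the end
pattern <- "\\.xlsx?$"

# Keep only the names the pattern accepts
matched <- sample_files[grepl(pattern, sample_files)]
print(matched)
```

Anchoring with $ is what keeps names such as d.xlsx.bak out of the result.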

Related

read a single *.xlsx file in R without the use of filename but utilizing the *.xlsx

I download an xlsx file every day with a long, unique name that includes the date. I need R to read the new xlsx file saved in the directory each day without my typing the unique name every time. My idea is to use *.xlsx, but whenever I try it, R always says the path does not exist:
excel_df <- read_excel("C:/Home/User/dbd/*.xlsx")
The code above does not work.
This code gives the same error:
base <- as.character("C:/Home/User/dbd/*.xlsx")
files <- file.info(list.files(path = base, pattern = '*.xlsx',
full.names = TRUE, no.. = TRUE))
daily_numebrs<-readxl::read_excel(rownames(files)[order(files$mtime)][nrow(files)])
Each line of the results shows:
...path does not exist.
The path shouldn't contain the pattern. Pass the directory as path and the regular expression as pattern:
path <- "C:/Home/User/dbd"
files <- list.files(path= path, full.names=T, pattern ='\\.xlsx$')
files
lapply(files, function(file) readxl::read_excel(file))
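If only the most recently saved file is wanted, as in the question, one possible approach is to sort on the modification time reported by file.info(). The sketch below builds a throwaway folder with two empty files so that it runs anywhere; the folder and file names are hypothetical, and the final read_excel() call is left as a comment because the demo files are not real workbooks.

```r
# Build a throwaway folder with two files so the sketch is runnable as-is
demo_dir <- file.path(tempdir(), "dbd_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_file <- file.path(demo_dir, "report_old.xlsx")
new_file <- file.path(demo_dir, "report_new.xlsx")
file.create(old_file, new_file)

# Force distinct modification times, one hour apart
Sys.setFileTime(old_file, Sys.time() - 3600)
Sys.setFileTime(new_file, Sys.time())

files <- list.files(path = demo_dir, pattern = "\\.xlsx$", full.names = TRUE)

# file.info() returns one row per file; mtime is the last-modified time
mtimes <- as.numeric(file.info(files)$mtime)
newest <- files[which.max(mtimes)]
print(basename(newest))

# With a real workbook: daily_numbers <- readxl::read_excel(newest)
```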

How to combine path and variable in readr read_csv (list.files for loop)?

I need to mass-import some data for my R project. Following a guide, I wrote a simple for loop that goes like this:
for (for_variable in list.files(path = "./data", pattern = ".csv$")) {
temp <- read_csv(for_variable)
# Some data wrangling
database <- rbind(database, temp)
rm(temp)
}
My data is in the data folder inside my working directory, as specified in list.files(path = "./data"). However, I can't use read_csv(for_variable) because I get an error:
'file_name.csv' does not exist in current working directory
And if I try to specify the path in read_csv, it doesn't understand what 'for_variable' is; it tries to find a literal 'for_variable' file in the data folder. So how can I combine the path and the variable name in read_csv? Or is there another way of solving the problem?
I would recommend reading this post as it is helpful for importing multiple csv files.
But to help with your specific question: the error is most likely caused by passing only the bare file name to read_csv. By default list.files() returns names relative to path, so read_csv looks for them in the working directory. Use the full.names = TRUE argument of list.files() to get paths that include the folder:
list.files(path = "./data", full.names = TRUE, pattern = "\\.csv$")
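Put together, the import could look like the sketch below. It uses base read.csv() instead of readr's read_csv() so that it runs without extra packages, and it writes two small CSV files to a temporary folder first; the folder name, file names and columns are all made up for illustration.

```r
# Throwaway folder with two small CSV files so the sketch runs as-is
data_dir <- file.path(tempdir(), "csv_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, score = c(10, 20)),
          file.path(data_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, score = c(30, 40)),
          file.path(data_dir, "b.csv"), row.names = FALSE)

# full.names = TRUE returns paths that include data_dir
csv_paths <- list.files(path = data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file, then bind once instead of growing the data frame in a loop
database <- do.call(rbind, lapply(csv_paths, read.csv))
print(database)
```

If the bare file names are needed for something else, file.path("./data", for_variable) inside the original loop would build the same full path.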

list.files pattern for specific word and file extension

I'm trying to list all files that include a specific word ("students") and file extension (".csv") in their file name. I want to do this through pattern but I must be doing something wrong.
# (1) Create File List
pat=paste("students","*\\.csv$")
csv_files <- list.files (path = "R/win-library/Practice/Schoolprac/",
pattern = pat,
recursive = T,
full.names = T)
The file should include the word "students" and be a .csv file. What could I be doing wrong? Students doesn't need to be right before .csv nor at the beginning, just included. I'm not getting an error, just no results.
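paste() inserts a space between its arguments by default, so your pattern becomes 'students *\\.csv$', which only matches when "students" comes directly before ".csv". Try:
pattern = '.*students.*\\.csv$'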
You can test regular expressions in R with this tester
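The difference between the two patterns can also be checked directly in R with grepl(). The sketch below uses hypothetical file names and a pattern without the leading .*, which is not needed because grepl() and list.files() match anywhere in the name.

```r
# Hypothetical file names for illustration
candidates <- c("students.csv", "2019_students_final.csv",
                "students.xlsx", "teachers.csv", "students.csv.bak")

# paste() joins with a space by default, so this means:
# "students", then zero or more spaces, then ".csv" at the end
pasted <- paste("students", "*\\.csv$")
pasted_hits <- candidates[grepl(pasted, candidates)]

# "students" anywhere in the name, then anything, then ".csv" at the end
fixed <- "students.*\\.csv$"
fixed_hits <- candidates[grepl(fixed, candidates)]

print(pasted_hits)
print(fixed_hits)
```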

How to read excel files based on partial file names

How do I read Excel files based only on the first part of the file name? For example, my file is "File_01_01_2019", where "File" is always the same but the date changes often, so I would want to read the Excel files that start with "File" in this scenario.
This should help you:
library(readxl)
# simplify = FALSE keeps one data frame per file in a list named by path
sapply(list.files(path = "your_path",
                  # regex: name starts with "File" and ends with ".xlsx"
                  pattern = "^File.*\\.xlsx$",
                  full.names = TRUE),
       read_excel, simplify = FALSE)

Creating a list of files from a list of directories in R

I have a list of many directories, each of which contains 5 files. From the files in each directory I want to select one (say, the one with the .txt extension) and compile a list of these .txt files. How do I create a loop that selects the .txt files from a list of directories in R?
You can do:
dir(path = ".", pattern = "\\.txt$", full.names = TRUE, recursive = TRUE)
Here path is the root that contains all the folders you want to search, pattern is a regular expression that matches the files you are interested in (in the example, all files with the .txt extension), full.names returns the full path of each file, and recursive explores all the subfolders in path. This returns a vector with the full paths of the files that match your query.
list.files is a vectorized function already, so you can pass the vector of directories to it, no loop needed.
my_dirs <- c("foo/bar", "foo/baz")
all_text_files <- list.files(my_dirs, pattern = "\\.txt$", full.names = TRUE)
If you want a list separating files by directory...
split(all_text_files, dirname(all_text_files))
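The vectorised call and the split() step can be tried on a throwaway directory tree, as in the sketch below. The directory and file names are hypothetical and exist only to make the example runnable.

```r
# Create two hypothetical directories, each with one .txt and one .csv file
root <- file.path(tempdir(), "dirs_demo")
my_dirs <- file.path(root, c("bar", "baz"))
for (d in my_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(my_dirs, "notes.txt"), file.path(my_dirs, "data.csv"))

# One call handles every directory in the vector
all_text_files <- list.files(my_dirs, pattern = "\\.txt$", full.names = TRUE)

# Group the file names by the directory they came from
by_dir <- split(basename(all_text_files), basename(dirname(all_text_files)))
print(by_dir)
```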
If you have the list of directory names in dirs,
you can get the .txt files in all of them as a vector with:
files <- unlist(lapply(dirs, function(dir) list.files(path = dir, pattern = '\\.txt$')))
You can achieve the same using a loop as you asked,
but it's less elegant, and I don't recommend it:
files <- c()
for (dir in dirs) {
files <- c(files, list.files(path = dir, pattern = '\\.txt$'))
}
