Get the number of files in a directory in R? - r

In the shell, make a directory:
mkdir /home/test
Then create a file named ".test" in "/home/test" and list the directory from R:
a=list.files(path = "/home/test",include.dirs = FALSE)
a
character(0)
a=list.files(path = "/home/test",include.dirs = TRUE)
a
character(0)
a=list.files(path = "/home/test/",include.dirs = TRUE)
a
character(0)
list.files(path = '/home/test', all.files=TRUE,include.dirs=FALSE)
[1] "." ".." ".test"
a=list.files(path = '/home/test', all.files=TRUE)
length(a)
[1] 3
How can I get length(a) = 1, using the regular expression argument pattern= of list.files to prune . and ..?

Use all.files=TRUE to show all file names including hidden files.
list.files(path = '/home/test', all.files=TRUE)
To answer your edit, one way would be to use a negative number with tail
tail(list.files(path = '/home/test', all.files=TRUE), -2)
Using only the pattern argument:
list.files(path='/home/test', all.files=TRUE, pattern="^[^\\.]|\\.[^\\.]")
The pattern says "anything that starts with something other than a dot, or anything that contains a dot followed by a character other than a dot."
Although it breaks your requirement to use the pattern argument of list.files, I would probably wrap grep around list.files in this case.
grep("^\\.*\\.$", list.files(path='/home/test', all.files=TRUE),
invert=TRUE, value=TRUE)
The above will find any file names that only contain dots, then return everything else. invert=TRUE means "find the names that do not match", and value=TRUE means "return the names instead of their location."
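If the goal is only to drop . and .., list.files also has a no.. argument, which may be simpler than a pattern. A minimal sketch, using a temporary directory (an assumption for illustration) in place of /home/test:

```r
# Create a scratch directory containing a single hidden file.
d <- tempfile("test")
dir.create(d)
file.create(file.path(d, ".test"))

# all.files = TRUE includes hidden files; no.. = TRUE drops "." and "..".
a <- list.files(path = d, all.files = TRUE, no.. = TRUE)
length(a)
```

Unlike tail(..., -2), this does not depend on "." and ".." sorting first in the result.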

Related

How to use wildcard in file path name for pattern matching in R?

I have a directory with my rproj and a "data" folder for all outputs. There are 40 subdirectories within the data folder containing each an "output.csv". The subdirectories have completely different names but all end with 1 or 2.
data/****1/output.csv
data/****2/output.csv
The asterisks represent the varying part of the name (a different number of letters each time), and each csv I need has the exact same name.
I need to separately list all of the "output.csv" files based on whether their subdirectory ends with 1 or 2, and I have been trying with the grep() function:
allOutputFiles <- list.files(pattern = "output.csv", recursive = TRUE, full.names = TRUE)
files1 <- grep(pattern = "./data/1$", allOutputFiles, value = TRUE)
files2 <- grep(pattern = "./data/2$", allOutputFiles, value = TRUE)
But every time I run it, it returns character(0). If I add a '\' in front of the 1$, it returns invalid regular expression './data/\1$', reason 'Invalid back reference'
How do I properly apply wildcard to the varying file path?
We can use dirname to get the parent directory, then gsub to extract the last character of it. Then we use split to separate the filenames by this one letter.
# allOutputFiles <- c("data/****1/output.csv","data/****2/output.csv")
allOutputFiles <- list.files(pattern = "output.csv", recursive = TRUE, full.names = TRUE)
gsub(".*(.)$", "\\1", dirname(allOutputFiles))
# [1] "1" "2"
out <- split(allOutputFiles, gsub(".*(.)$", "\\1", dirname(allOutputFiles)))
out
# $`1`
# [1] "data/****1/output.csv"
# $`2`
# [1] "data/****2/output.csv"
If you want to index on that, index with out[["1"]] (while out[[1]] would conveniently work here, that's coincidence based on your choice of last-letters and should not be relied on).
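The original grep() calls return character(0) because "./data/1$" requires the whole path to end in "/data/1", while every path ends in "/output.csv". A minimal sketch of a pattern that anchors on the file name instead; the paths below are hypothetical stand-ins for the result of list.files():

```r
# Hypothetical paths standing in for the output of list.files().
allOutputFiles <- c("./data/abc1/output.csv",
                    "./data/xyz2/output.csv",
                    "./data/longer_name1/output.csv")

# Match a directory name ending in 1 (or 2) right before "/output.csv".
files1 <- grep("1/output\\.csv$", allOutputFiles, value = TRUE)
files2 <- grep("2/output\\.csv$", allOutputFiles, value = TRUE)
```

The dot is escaped so that it matches a literal "." rather than any character.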

R: list.files returning NA instead of file name

I am trying to retrieve the file name for the file "cg3-chem-djtayl18PSY101.txt" which exists in my working directory using the following commands.
regexName = "*chem-djtayl18*.txt"
fileName <- list.files(path = ".", pattern = regexName, ignore.case = TRUE)[1] # returning NA
However, it returns NA as the file name, although the same approach works for 100 other files in the same directory. Why does it behave this way?
Because your regex does not match the file name.
regexName = "*chem-djtayl18*.txt"
filename <- "cg3-chem-djtayl18PSY101.txt"
grepl(regexName, filename)
#[1] FALSE
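The pattern argument is a regular expression, not a shell wildcard, so * does not mean "any characters" on its own. Maybe you need this pattern instead: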
regexName = "chem-djtayl18.*\\.txt"
grepl(regexName, filename)
#[1] TRUE
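If you would rather keep writing shell-style wildcards, glob2rx() from the utils package can translate them into a regular expression. A small sketch, reusing the file name from the question:

```r
fileName <- "cg3-chem-djtayl18PSY101.txt"
globName <- "*chem-djtayl18*.txt"

# Translate the wildcard pattern into a regular expression.
regexName <- utils::glob2rx(globName)

# The translated pattern can be passed to grepl() or to list.files(pattern = ).
matched <- grepl(regexName, fileName, ignore.case = TRUE)
```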

R function to get directory name of a file as characters

I can create a list of csv files in folder_A:
list1 <- dir_ls("path to folder_A")
I can define a function to add a column with filenames and combine these files into one dataframe:
read_and_save_combo <- function(fileX){
read_csv(fileX) %>%
mutate(fileX = path_file(fileX))}
combo_df <- map_df(list1, read_and_save_combo)
I want to add another column with the enclosing folder name (it would be the same for all files: folder_A). If I use dirname() on an individual file, I get the full parent directory path to folder_A, but I only want the characters "folder_A". If I use dirname() as part of the function, I get another column, but it's filled with ".". Less importantly, I don't know why I get "." instead of the full path. More importantly, is there a function like path_parentfoldername that would let me add a new column containing only the name of the folder that holds each file, for each row of the combined dataframe?
Thanks!
Edit:
New function for clarity after answers:
read_and_save_combo <- function(fileX){
read_csv(fileX) %>%
mutate(filename = path_file(fileX), foldername = dirname(fileX) %>%
str_replace(pattern = ".*/", replacement = ""))}
This works because . is the wildcard but * modifies the meaning to 0-infinity characters, so ".*" is any character and any number of characters preceding /. Gregor said this but now I understand it.
Also, I was getting the column filled with ".", because in the function, I was reading one file, but then trying to mutate with dirname operating on the list, which is a vector with multiple elements (more than one file).
You can use dirname + basename :
list1 <- list.files('folder_A_path', full.names = TRUE)
read_and_save_combo <- function(fileX) {
readr::read_csv(fileX) %>%
dplyr::mutate(fileX = basename(dirname(fileX)))
}
combo_df <- purrr::map_df(list1, read_and_save_combo)
If your file is at the path 'Users/Downloads/FolderA/Filename.csv' :
dirname('Users/Downloads/FolderA/Filename.csv')
#[1] "Users/Downloads/FolderA"
basename(dirname('Users/Downloads/FolderA/Filename.csv'))
#[1] "FolderA"
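As a side note, dirname() returns "." when its input has no directory part. That may explain the column of "." in the question, assuming dirname() ended up being applied to names that path_file() had already stripped down to the bare file name. A quick sketch:

```r
bare <- "Filename.csv"
full <- "Users/Downloads/FolderA/Filename.csv"

# A bare file name has no directory component, so dirname() gives ".".
dir_bare <- dirname(bare)

# With the full path, the enclosing folder name is recoverable.
folder_full <- basename(dirname(full))
```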
"path to folder_A" is a bad example, use "path/to/folder_A". You need to delete everything from the start through the last /:
library(stringr)
str_replace("path/to/folder_A", pattern = ".*/", replacement = "")
# [1] "folder_A"
If you're worried about \\ or other non-standard things, use dirname() as the input.
Here are two ways to do what I wanted, using the helpful answers above:
read_and_save_combo <- function(file){
read_csv(file) %>%
mutate(filename = path_file(file), foldername = basename(dirname(file)))}
read_and_save_combo <- function(file){
read_csv(file) %>%
mutate(filename = path_file(file), foldername = dirname(file) %>%
str_replace(pattern = ".*/", replacement = ""))}
Other basic things I learned that could be helpful for other beginners:
(1) While writing the function, point all the functions (read_csv(), dirname(), etc.) at a uniform variable (here written as "file" but it could be just a letter "g" or whatever you choose). Then you will avoid the problem I had where part of the function is acting on one file and another part is acting on a list.
(2) filex and fileX appear far too similar to each other in certain fonts, which can trip you up (capitalization).

Using a function within list.files function in r

I want to create a program where I select files with a user defined prefix in list.files()
My folder will have files beginning with various characters. I want to define a variable or function at the beginning of the program which I can use in list.files in the program
List of files
MP201901 MP201902 MP201903 SG201901 SG201902 SG201903 XY201901 XY202001 XY202002
If I use
inpfiles1 <- list.files(path =Input, pattern = "*SG.*.csv", full.names = TRUE)
it gives correct output but I want to store the prefix somewhere so we can just change the prefix
Currently using code
A<-"SG"
inpfiles2 <- list.files(path =Input, pattern = "*A*.*.csv", full.names = TRUE)
but this gives an empty result.
With your current code, R doesn't know that A is a variable name, and so it's ignoring your variable and literally using the letter A.
You can use paste0 instead:
A <- "SG"
pattern <- paste0(A, '.*.csv')
You have to concatenate the user-inputted pattern in A with your own suffix. I.e.
A <- "SG"
pattern <- paste0(A, ".*.csv")
inpfiles2 <- list.files(path=Input, pattern=pattern, full.names=TRUE)
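A slightly stricter variant anchors the prefix to the start of the name and escapes the dot before the extension. This is only a sketch: the file names below are hypothetical (the question's listing shows no extensions, so .csv is assumed), and the prefix is assumed to contain no regex metacharacters.

```r
# Hypothetical file names for illustration.
files <- c("MP201901.csv", "SG201901.csv", "SG201902.csv",
           "XY202001.csv", "SG201903.txt")

prefix <- "SG"

# "^" pins the prefix to the start; "\\.csv$" requires a literal ".csv" ending.
pattern <- paste0("^", prefix, ".*\\.csv$")
matched <- grep(pattern, files, value = TRUE)
```

The same pattern string can be passed as list.files(path = Input, pattern = pattern, full.names = TRUE).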

search the directory to get the filename in R

There is a file named haha in C:\test, and haha contains the text "look for me". In Linux, I can search to get the file name:
find / -name "look for me"
Can I search for the file with some kind of R command on Windows XP?
If I don't know that the file containing "look for me" is named haha, how can I find it?
Or with plyr:
require(plyr) # uses plyr
textFiles <- list.files(pattern = "\\.txt$") # only looks at .txt files; you can change or omit
# alply reads each file and returns a list of the file names
# that pass the grep test, with the first matching line number
mylist <- alply(textFiles,
1,
function(f) {
fline <- grep("LOOK FOR ME", readLines(f)) # case-sensitive search
if (length(fline) > 0) paste(f, fline[1], sep = " - line:") else NA
})
Filter(is.character, mylist) # gives you a list of all files containing the term
This code will find a file name with the phrase 'haha' in it, and then check whether the string "look for me" occurs anywhere within that file. Is that what you want?
whichfile <- grep(
x = list.files(),
pattern = "haha",
value = TRUE
)
sum(
grepl(
x = readLines(whichfile),
pattern = 'look for me')
)
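If the file name is not known in advance, one option is to read every file under a directory and keep those whose contents contain the phrase. A minimal base R sketch; the temporary directory and the file "other" are assumptions made so the example is self-contained:

```r
# Scratch directory with two small files; only "haha" contains the phrase.
d <- tempfile("search")
dir.create(d)
writeLines(c("first line", "please look for me here"), file.path(d, "haha"))
writeLines("nothing to see", file.path(d, "other"))

# List every file under d, then test each file's lines for the literal phrase.
files <- list.files(d, full.names = TRUE, recursive = TRUE)
has_phrase <- vapply(
  files,
  function(f) any(grepl("look for me", readLines(f, warn = FALSE), fixed = TRUE)),
  logical(1)
)
hits <- basename(files[has_phrase])
```

Reading every file can be slow on a large tree, so narrowing the search with the pattern argument of list.files is worth considering where possible.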

Resources