Using a variable within the list.files() function in R

I want to create a program that selects files with a user-defined prefix using list.files().
My folder contains files beginning with various characters. I want to define a variable or function at the beginning of the program that I can then use in list.files() later in the program.
List of files:
MP201901 MP201902 MP201903 SG201901 SG201902 SG201903 XY201901 XY202001 XY202002
If I use
inpfiles1 <- list.files(path =Input, pattern = "*SG.*.csv", full.names = TRUE)
it gives the correct output, but I want to store the prefix somewhere so that we only have to change it in one place.
The code I am currently using:
A<-"SG"
inpfiles2 <- list.files(path =Input, pattern = "*A*.*.csv", full.names = TRUE)
but this gives an empty result.

With your current code, R doesn't know that A is a variable name. Inside the quoted pattern "*A*.*.csv" it ignores your variable and literally uses the letter A.
You can use paste0 instead:
A <- "SG"
pattern <- paste0(A, '.*.csv')

You have to concatenate the prefix stored in A with your own suffix, i.e.:
A <- "SG"
pattern <- paste0(A, ".*.csv")
inpfiles2 <- list.files(path=Input, pattern=pattern, full.names=TRUE)
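A slightly stricter variant, sketched under the assumption that all target files end in .csv: anchor the prefix with ^ and escape the dot, so that names like XSG201901.csv or SG201903.txt are not picked up. The helper name list_prefixed() and the demo files (created in a temporary folder) are hypothetical.

```r
# Hypothetical helper: list .csv files in `dir` whose names start with `prefix`.
list_prefixed <- function(dir, prefix) {
  # "^" anchors the prefix to the start of the file name;
  # "\\.csv$" requires a literal ".csv" at the very end.
  pattern <- paste0("^", prefix, ".*\\.csv$")
  list.files(path = dir, pattern = pattern, full.names = TRUE)
}

# Demo folder with some of the file names from the question, plus two decoys.
Input <- file.path(tempdir(), "prefix_demo")
dir.create(Input, showWarnings = FALSE)
demo_names <- c("MP201901.csv", "SG201901.csv", "SG201902.csv",
                "XY201901.csv", "XSG201901.csv", "SG201903.txt")
invisible(file.create(file.path(Input, demo_names)))

A <- "SG"  # change only this line to select another prefix
inpfiles2 <- list_prefixed(Input, A)
basename(inpfiles2)
# [1] "SG201901.csv" "SG201902.csv"
```

If a prefix could itself contain regex metacharacters (such as . or +), it would need escaping before being pasted into the pattern.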

Related

Copy and rename Specific Files based on parent directories in R

I am attempting to solve this issue in R, but I'll upvote answers in any programming language.
I have an example vector of file names, called file_list:
file_list <- c("D:/example/sub1/session1/OD/CD/text.txt", "D:/example/sub2/session1/OD/CD/text.txt",
"D:/example/sub3/session1/OD/CD/text.txt")
What I'm trying to do is copy and rename the text files based on the part of the parent directory that contains sub and session. So the first file would be renamed sub1_session1_text.txt and be copied, along with the other text files, into a single new directory called all_files.
I'm struggling with some of the specifics of how to rename the file. I'm trying to use substr combined with str_locate_all and paste0 to copy and rename the files based on these parent directories.
First, I locate the positions of the patterns in each element of file_list to construct the starting and ending positions for substr:
library(stringr)
ending<-str_locate_all(pattern="/OD",file_list)
starting <- str_locate_all(pattern="/sub", file_list)
I then want to pull out of those lists the starting and ending positions of the patterns for each element, feed them to substr to build the names, and in turn use paste0 to create the new paths.
What I'd like is something like
substr_naming_vector<-substr(file_list, start=starting[starting_position],stop=ending[starting_position])
but I don't know how to index the lists so that the correct starting position is used for each element. Once I figure that out, I'd fill in something like this:
#paste the filenames into a vector that represents them being renamed in a new directory
all_files <- paste0("D:/all_files/", substr_naming_vector)
#rename and copy the files
file.copy(from = file_list, to = all_files)
Here's an example using regular expressions, which makes it somewhat shorter:
library(stringr)
library(magrittr)
all_dirs <-
c("D:/example/sub1/session1/OD/CD/text.txt",
"D:/example/sub2/session1/OD/CD/text.txt",
"D:/example/sub3/session1/OD/CD/text.txt")
new_dirs <-
all_dirs %>%
# Match each group using regex
str_match_all("D:/example/(.+)/(.+)/OD/CD/(.+)") %>%
# Paste the matched groups into one path
vapply(function(x) paste0(x[2:4], collapse = "_"), character(1)) %>%
paste0("D:/all_files/", .)
# Copy them.
file.copy(all_dirs, new_dirs)
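If you would rather keep the substr() idea from the question, note that stringr's str_locate() (without _all) returns a matrix with one row per input string and columns "start" and "end", so it can be indexed directly. A sketch, assuming each path contains exactly one "/sub" and one "/OD" segment:

```r
library(stringr)

file_list <- c("D:/example/sub1/session1/OD/CD/text.txt",
               "D:/example/sub2/session1/OD/CD/text.txt",
               "D:/example/sub3/session1/OD/CD/text.txt")

# One row per path; columns "start" and "end" hold the first match positions.
starting <- str_locate(file_list, "/sub")
ending   <- str_locate(file_list, "/OD")

# Take the text after the "/" of "/sub" up to just before "/OD"
# (e.g. "sub1/session1"), then turn the remaining "/" into "_".
middle <- substr(file_list, starting[, "start"] + 1, ending[, "start"] - 1)
substr_naming_vector <- paste0(gsub("/", "_", middle, fixed = TRUE),
                               "_", basename(file_list))
substr_naming_vector
# [1] "sub1_session1_text.txt" "sub2_session1_text.txt" "sub3_session1_text.txt"
```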
This is one way of doing it. I assumed your file is always called text.txt.
library(stringr)
my_files <- c("D:/example/sub1/session1/OD/CD/text.txt",
"D:/example/sub2/session1/OD/CD/text.txt",
"D:/example/sub3/session1/OD/CD/text.txt")
# get the sub information ("sub" followed by one or more digits)
subs <- str_extract(string = my_files,
pattern = "sub[0-9]+")
# get the session information
sessions <- str_extract(string = my_files,
pattern = "session[0-9]+")
# paste it all together
new_file_names <- paste("D:/all_files/",
paste(subs,
sessions,
"text.txt",
sep = "_"),
sep = "")
file.copy(from = my_files,
to = new_file_names)
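A base-R sketch of the same idea, without stringr. It builds a hypothetical copy of the question's folder layout under tempdir() so that it can actually run. It also creates the target folder first, because file.copy() will not create a missing destination directory.

```r
# Hypothetical demo tree mimicking D:/example/subN/session1/OD/CD/text.txt
root <- file.path(tempdir(), "example")
src_dirs <- file.path(root, paste0("sub", 1:3), "session1", "OD", "CD")
for (d in src_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
my_files <- file.path(src_dirs, "text.txt")
for (i in seq_along(my_files)) writeLines(paste("file", i), my_files[i])

# Capture "subN" and "sessionN" (any number of digits) plus the file name.
new_names <- sub(".*/(sub[0-9]+)/(session[0-9]+)/.*/([^/]+)$",
                 "\\1_\\2_\\3", my_files)

# file.copy() does not create the target folder, so create it first.
target <- file.path(tempdir(), "all_files")
dir.create(target, showWarnings = FALSE)
ok <- file.copy(from = my_files, to = file.path(target, new_names),
                overwrite = TRUE)
```

Unlike "sub[0-9]", the "+" in the pattern also handles folders such as sub10.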

Appending a list in a loop (R)

I want to use a loop to read in multiple csv files and append a list in R.
path = "~/path/to/csv/"
file.names <- dir(path, pattern =".csv")
mylist=c()
for(i in 1:length(file.names)){
datatmp <- read.csv(file.names[i],header=TRUE, sep=";", stringsAsFactors=FALSE)
listtmp = datatmp[ ,6]
finallist <- append(mylist, listtmp)
}
finallist
For each csv file, the desired column has a different length.
In the end, I want to get the full appended list with all values in that certain column from all csv files.
I am fairly new to R, so I am not sure what I'm missing...
There are four errors in your approach.
First, file.names <- dir(path, pattern =".csv") returns only the file names, without the path. So when you try to import them, read.csv() can't find the files (unless path happens to be your working directory).
Building the path
You can build the full path with paste0():
path = "~/path/to/csv/"
file.names <- paste0(path, dir(path, pattern =".csv"))
Or with file.path(), which adds the separating slash automatically:
path = "~/path/to/csv"
file.names <- file.path(path, dir(path, pattern =".csv"))
Another way to get the paths, which I find more efficient, is the one suggested in the answer Tung pointed to in the comments:
file.names <- list.files(path = "~/path/to/csv", recursive = TRUE,
pattern = "\\.csv$", full.names = TRUE)
This is better because, besides doing everything in one step, it works in a directory containing files of various formats: the anchored pattern "\\.csv$" matches only names ending in .csv, and recursive = TRUE also searches subfolders.
Importing, selecting and creating the list
The second error is mylist <- c(). You want a list, but c() returns NULL, and append() on it builds an atomic vector. So the correct initialization is:
mylist <- list()
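The remaining errors are inside the loop. Instead of assigning to another object (finallist) on each iteration, which keeps only the last file's values, append to the same object created before the loop, and wrap listtmp in list() so that each file's column becomes its own list element:
for(i in seq_along(file.names)){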
datatmp <- read.csv(file.names[i], sep=";", stringsAsFactors=FALSE)
listtmp = datatmp[, 6]
mylist <- append(mylist, list(listtmp))
}
mylist
Another approach, easier and cleaner, is looping with lapply(). Just this:
mylist <- lapply(file.names, function(x) {
df <- read.csv(x, sep = ";", stringsAsFactors = FALSE)
df[, 6]
})
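The lapply() version gives a list with one element per file. Here is a small sketch, using hypothetical semicolon-separated demo files written to a temporary folder. It names the elements after the files and, in case you want one long vector of all the values rather than a list, flattens the result with unlist():

```r
# Two small semicolon-separated demo files with six columns (hypothetical data).
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
df1 <- data.frame(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3, f = c(10, 20, 30))
df2 <- data.frame(a = 1:2, b = 1:2, c = 1:2, d = 1:2, e = 1:2, f = c(40, 50))
write.table(df1, file.path(csv_dir, "one.csv"), sep = ";", row.names = FALSE)
write.table(df2, file.path(csv_dir, "two.csv"), sep = ";", row.names = FALSE)

file.names <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# One list element per file, each holding that file's 6th column.
mylist <- lapply(file.names, function(x) {
  df <- read.csv(x, sep = ";", stringsAsFactors = FALSE)
  df[, 6]
})
names(mylist) <- basename(file.names)

# A single vector with all values from all files, if that is what you need.
all_values <- unlist(mylist, use.names = FALSE)
all_values
# [1] 10 20 30 40 50
```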
Hope it helps!

How to insert text into a specific directory path in R

I am looking for an elegant way to insert a character string (a name) into a directory path and create a .csv file. I found one possible solution, but I am looking for another one that "inserts" text between specific characters instead of "replacing" it.
# let's start
df <-data.frame()
name <- c("John Johnson")
dir <- c("C:/Users/uzytkownik/Desktop/.csv")
#how to insert "name" vector between "Desktop/" and "." to get:
dir <- c("C:/Users/uzytkownik/Desktop/John Johnson.csv")
write.csv(df, file=dir)
#???
#I found the answer but it is not very elegant in my opinion
library(qdapRegex)
dir2 <- c("C:/Users/uzytkownik/Desktop/ab.csv")
dir2<-rm_between(dir2,'a','b', replacement = name)
dir2
# [1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
write.csv(df, file=dir2)
I like sprintf syntax for "fill-in-the-blank" style string construction:
name <- c("John Johnson")
sprintf("C:/Users/uzytkownik/Desktop/%s.csv", name)
# [1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
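Because sprintf() is vectorised over its arguments, the same template also builds several paths at once. Here is a sketch with hypothetical names and placeholder data, written to a temporary folder instead of the Desktop:

```r
# Hypothetical names; each one becomes its own .csv file.
people <- c("John Johnson", "Jane Doe")
# Temporary folder used here instead of C:/Users/uzytkownik/Desktop
out_dir <- tempdir()

# Same fill-in-the-blank template, one path per name.
out_files <- file.path(out_dir, sprintf("%s.csv", people))

df <- data.frame(x = 1:2)  # placeholder data
for (f in out_files) write.csv(df, file = f, row.names = FALSE)
basename(out_files)
```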
Another option, if you can't put the %s in the directory string, is to use sub. This is replacing, but it replaces .csv with <name>.csv.
dir <- c("C:/Users/uzytkownik/Desktop/.csv")
sub(".csv", paste0(name, ".csv"), dir, fixed = TRUE)
# [1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
This should get you what you need.
dir <- "C:/Users/uzytkownik/Desktop/.csv"
name <- "joe depp"
dirsplit <- strsplit(dir,"\\/\\.")
paste0(dirsplit[[1]][1],"/",name,".",dirsplit[[1]][2])
# [1] "C:/Users/uzytkownik/Desktop/joe depp.csv"
I find that paste0() is the way to go, so long as you store your directory and extension separately:
path <- "some/path/"
file <- "file"
ext <- ".csv"
write.csv(myobj, file = paste0(path, file, ext))
For those unfamiliar, paste0() is shorthand for paste(..., sep = "").
Let's suppose you have a vector with the desired names for some data structures you want to save, for instance:
names <- c("file_1", "file_2", "file_3")
Now, you want to build the path in which to save each file by adding the name plus the extension:
path <- "/Users/Documents/Test_Folder/"
extension <- ".csv"
A simple way to achieve this is to use paste0() inside lapply() to build the full path for write.csv(). Plain paste() would insert spaces between the pieces. For example:
lapply(names, function(x) {
write.csv(x = data,
file = paste0(path, x, extension))
})
The advantage of this approach is that you can iterate over the vector containing your file names, and the final path is updated automatically for each one. One possible extension is to define a vector of extensions and build the paths accordingly.
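The lapply() call above writes the same data object to every file. One way to carry the idea further, sketched with hypothetical data, is to keep the data frames in a named list and let Map() pair each name with its own data. The folder here is a temporary one, not the path above:

```r
# Hypothetical data: one data frame per output file, keyed by file name.
datasets <- list(file_1 = data.frame(x = 1:2),
                 file_2 = data.frame(x = 3:4),
                 file_3 = data.frame(x = 5:6))
path <- file.path(tempdir(), "Test_Folder")
dir.create(path, showWarnings = FALSE)
extension <- ".csv"

# Map() walks the names and the data frames in parallel and
# returns the path it wrote for each one.
written <- Map(function(nm, d) {
  out <- file.path(path, paste0(nm, extension))
  write.csv(d, file = out, row.names = FALSE)
  out
}, names(datasets), datasets)

unlist(written, use.names = FALSE)
```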

exifr doesn't extract information from photos

I have some pictures with longitude/latitude information. R finds them with the command list.files, but when I use exifr(files) it returns a dataset with 1 column and 0 observations. What am I doing wrong?
files <- list.files(path = "C:/Users/user1/Downloads/pictures", pattern = "*.jpg")
dat <- exifr(files)
I tried your code on my machine and got the same result. You need the full paths to the pictures. list.files, as you have called it, will return just the file name, e.g. photo.jpg. If the photos are not in R's working directory, exifr() will not read them. What you need to add to list.files is full.names = TRUE:
files <- list.files(path = "C:/Users/user1/Downloads/pictures", pattern = "*.jpg",
full.names = TRUE)
dat <- exifr(files)
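list.files() interprets pattern as a regular expression, not a shell wildcard. utils::glob2rx() converts a wildcard such as "*.jpg" into the equivalent regex "^.*\\.jpg$". Here is a sketch with hypothetical demo files in a temporary folder. It also uses ignore.case = TRUE to catch .JPG and keeps full.names = TRUE:

```r
# Hypothetical demo folder with two pictures and one non-picture.
pic_dir <- file.path(tempdir(), "pictures")
dir.create(pic_dir, showWarnings = FALSE)
invisible(file.create(file.path(pic_dir, c("a.jpg", "b.JPG", "notes.txt"))))

files <- list.files(path = pic_dir,
                    pattern = glob2rx("*.jpg"),  # regex "^.*\\.jpg$"
                    ignore.case = TRUE,          # also match .JPG
                    full.names = TRUE)           # usable from any working directory
basename(files)
```

The resulting files vector can then be passed to exifr() as in the answer above.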

How to extract part of a file name to a text file?

I have several files in a directory data.
these files are named like this:
file_file_sd_daf_800_800_log-(3-got)_20100101_20121012
All files share every part of the name except the sd part.
I want to extract only this part of each file name as one column and write it out to a text file.
I list all the files like this:
dir<- list.files("C:\\data", "*.txt", full.names = TRUE)
Okay, this should work (using regular expressions):
dir_ <- list.files("C:\\data", "*.txt", full.names = TRUE)
tmp <- regmatches(dir_, regexec("file_file_(.+)_daf.+", dir_))
sapply(tmp, "[", 2)
A quick test with your example:
x <- "file_file_sd_daf_800_800_log-(3-got)_20100101_20121012"
regmatches(x, regexec("file_file_(.+)_daf.+", x))[[1]][2]
# [1] "sd"
You can then write the extracted parts to a text file using write().
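Here is a sketch of the whole round trip with two hypothetical file names that follow the question's naming scheme. It writes the extracted parts as one column with a header to a temporary text file. write.table() is used here so that the column gets a header; write() or writeLines() would also write one value per line for a character vector.

```r
# Hypothetical file names following the question's scheme.
dir_ <- c("file_file_sd_daf_800_800_log-(3-got)_20100101_20121012.txt",
          "file_file_xy_daf_800_800_log-(3-got)_20100101_20121012.txt")

tmp <- regmatches(dir_, regexec("file_file_(.+)_daf.+", dir_))
parts <- sapply(tmp, "[", 2)  # element 2 = the first capture group

# One column with a header, written to a temporary text file.
out_file <- file.path(tempdir(), "parts.txt")
write.table(data.frame(part = parts), out_file,
            row.names = FALSE, quote = FALSE)
readLines(out_file)
```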
