R function to get directory name of a file as characters - r

I can create a list of csv files in folder_A:
list1 <- dir_ls("path to folder_A")
I can define a function to add a column with filenames and combine these files into one dataframe:
read_and_save_combo <- function(fileX){
read_csv(fileX) %>%
mutate(fileX = path_file(fileX)}
combo_df <- map_df(list1, read_and_save_combo)
I want to add another column with enclosing folder name (would be the same for all files, folder_A). If I use dirname() on an individual file, I get the full parent directory path to folder_A. I only want the characters "folder_A". If I use dirname() as part of the function, I get another column but its filled with "." Less importantly, I don't know why I get the "." instead of the full path, but more importantly is there a function like path_parentfoldername, that would let me add a new column with only the name of the folder containing each file to each row of the combined dataframe?
Thanks!
Edit:
New function for clarity after answers:
read_and_save_combo <- function(fileX){
read_csv(fileX) %>%
mutate(filename = path_file(fileX), foldername = dirname(fileX) %>%
str_replace(pattern = ".*/", replacement = ""))}
This works because . is the wildcard but * modifies the meaning to 0-infinity characters, so ".*" is any character and any number of characters preceding /. Gregor said this but now I understand it.
Also, I was getting the column filled with ".", because in the function, I was reading one file, but then trying to mutate with dirname operating on the list, which is a vector with multiple elements (more than one file).

You can use dirname + basename :
list1 <- list.files('folder_A_path', full.names = TRUE)
read_and_save_combo <- function(fileX) {
readr::read_csv(fileX) %>%
dplyr::mutate(fileX = basename(dirname(fileX)))
}
combo_df <- purrr::map_df(list1, read_and_save_combo)
If your file is at the path 'Users/Downloads/FolderA/Filename.csv' :
dirname('Users/Downloads/FolderA/Filename.csv')
#[1] "Users/Downloads/FolderA"
basename(dirname('Users/Downloads/FolderA/Filename.csv'))
#[1] "FolderA"

"path to folder_A" is a bad example, use "path/to/folder_A". You need to delete everything from the start through the last /:
library(stringr)
str_replace("path/to/folder_A", pattern = ".*/", replacement = "")
# [1] "folder_A"
If you're worried about \\ or other non-standard things, use dirname() as the input.

Here are two ways to do what I wanted, using the helpful answers above:
read_and_save_combo <- function(file){
read_csv(file) %>%
mutate(filename = path_file(file), foldername = basename(dirname(file)))}
read_and_save_combo <- function(file){
read_csv(file) %>%
mutate(filename = path_file(file), foldername = dirname(file) %>%
str_replace(pattern = ".*/", replacement = ""))}
Other basic things I learned that could be helpful for other beginners:
(1) While writing the function, point all the functions (read_csv(), dirname(), etc.) at a uniform variable (here written as "file" but it could be just a letter "g" or whatever you choose). Then you will avoid the problem I had where part of the function is acting on one file and another part is acting on a list.
(2)
filex and fileX
appear far too similar to each other using certain fonts, which can mess you up (capitalization).

Related

Copy and rename Specific Files based on parent directories in R

I am attempting to solve this issue in R, but I'll upvote answers in any programming language.
I have an example vector of filenames like so called file_list
c("D:/example/sub1/session1/OD/CD/text.txt", "D:/example/sub2/session1/OD/CD/text.txt",
"D:/example/sub3/session1/OD/CD/text.txt")
What I'm trying to do is move and rename the text files to be based on the part of the parent directory that contains the part about sub and session. So the first file would be renamed sub2_session1_text.txtand be copied along with the other text files to just 1 new directory called all_files
I'm struggling with some of the specifics of how to rename the file. I'm trying to use substr combined with str_locate_all and paste0 to copy and rename the files based on these parent directories.
Locate the position in each element of the vector file_list to construct starting and ending position for substr
library(stringr)
ending<-str_locate_all(pattern="/OD",file_list)
starting <- str_locate_all(pattern="/sub", file_list)
I then want to somehow pull out of those lists the starting and ending position of those patterns for each element and then feed it to substr to get the naming down and then in turn use paste0 to create
What I'd like is something like
substr_naming_vector<-substr(file_list, start=starting[starting_position],stop=ending[starting_position])
but I don't know how to index the list such that it can know how to correctly index for each element the starting_position. Once I figure that out I'd fill in something like this
#paste the filenames into a vector that represents them being renamed in a new directory
all_files <- paste0("D:/all_files/", substr_naming_vector)
#rename and copy the files
file.copy(from = file_list, to = all_files)
Here's an example using regular expression, which makes it somewhat shorter:
library(stringr)
library(magrittr)
all_dirs <-
c("D:/example/sub1/session1/OD/CD/text.txt",
"D:/example/sub2/session1/OD/CD/text.txt",
"D:/example/sub3/session1/OD/CD/text.txt")
new_dirs <-
all_dirs %>%
# Match each group using regex
str_match_all("D:/example/(.+)/(.+)/OD/CD/(.+)") %>%
# Paste the matched groups into one path
vapply(function(x) paste0(x[2:4], collapse = "_"), character(1)) %>%
paste0("D:/all_files/", .)
# Copy them.
file.copy(all_dirs, new_dirs)
This is one way of doing it. I assumed your file is always called text.txt.
library(stringr)
my_files <- c("D:/example/sub1/session1/OD/CD/text.txt",
"D:/example/sub2/session1/OD/CD/text.txt",
"D:/example/sub3/session1/OD/CD/text.txt")
# get the sub information
subs <- str_extract(string = my_files,
pattern = "sub[0-9]")
# get the session information
sessions <- str_extract(string = my_files,
pattern = "session[0-9]")
# paste it all together
new_file_names <- paste("D:/all_files/",
paste(subs,
sessions,
"text.txt",
sep = "_"),
sep = "")
file.copy(from = my_files,
to = new_file_names)

Append files based on their names

I am new in R and I have a lot of climate data files in text format with long names in the same folder, for example, "tasmax_SAM-44_ICHEC-EC-EARTH_rcp26_r12i1p1_SMHI-RCA4_v3_day_20060101-20101231.txt" where each term separated by "_" corresponds to a characteristic like the variable, domain, institute, scenario, etc.
What I want is a code that allows me to select all the files in my folder that have the same name as model name, scenario name, gcm name and append them by rows.
What I tried is to first create a list of the files and assigned variables for each part of their name like model_name, gcm_name, etc.
And then created a condition where I compare those variables through the files with a loop.
file <- list.files ( pattern = '*.txt' )
group <- function(input){
index = which(file == input)
df=read.table(input,header=FALSE,sep="")
fname= unlist((strsplit(input,"_")),use.names=FALSE)
model_name=fname[3]
sce_name=fname[4]
gcm_name=fname[6]
m=1
for (m in 1:length(file)) {
if (model_name[m]==model_name[m+1] & sce_name[m]==sce_name[m+1] & gcm_name[m]==gcm_name[m+1]) {
data=rbind(df[m],df[m+1])
} else {}
}
}
for (i in 1:length(file)) {
group(file[i])
}
The error I had with my code is this:
Error in if (model_name[m] == model_name[m + 1] & sce_name[m] ==
sce_name[m + : missing value where TRUE/FALSE needed
In the end, the code should append files that meet the if a condition like for example making a file out of these two files:
tasmax_SAM-44_ICHEC-EC-EARTH_rcp26_r12i1p1_SMHI-RCA4_v3_day_20060101-20101231.txt
tasmax_SAM-44_ICHEC-EC-EARTH_rcp26_r12i1p1_SMHI-RCA4_v3_day_20110101-20151231.txt
Any help and suggestions are very welcome!
I would suggest a completely different approach:
Get the list of all txt files:
file <- list.files ( pattern = '*.txt' )
Read all the files into a single dataframe:
library(dplyr)
library(readr)
df <- suppressMessages(do.call(bind_rows,lapply(file, read_csv, col_names = FALSE)))
Then group_by the fields you want and write each frame into a separate csv file
df %>%
group_by(X3, X4, X6) %>%
do(write_csv(., paste(.$X3, .$X4, .$X6, ".csv", sep = "_")))
Not sure if i get your question completely but this may help:
The code works as follows
Read the values of the file you give as input.
loop over all other files and append them if they match your conditions.
The If condition checks the values of your input and then compares it with the names of file[m] now. If true, it gets appended to your data. Another fix: you have to use return(data) at the end of your function.
file <- list.files ( pattern = '*.txt' )
group <- function(input){
index = which(file == input)
data=read.table(input,header=FALSE,sep="")
fname= unlist((strsplit(input,"_")),use.names=FALSE)
model_name=fname[3]
sce_name=fname[4]
gcm_name=fname[6]
for (m in 2:length(file)) {
index = file[m]
df_new=read.table(file[m],header=FALSE,sep="")
fname= unlist((strsplit(input,"_")),use.names=FALSE)
if (model_name==fname[3] & sce_name==fname[4] & gcm_name==fname[6]) {
data=rbind(data,df_new)
} else {}
}
return(data)
}
group(file[1])
Problems which still have to be solved: You have to fix if you don't input the first file. Since this code using the file you input in your group function. But the for loop goes with the second file. So if you use group(file[3]) the first file will be skipped and the third file will be doubled. You could use something like another if condition. if(file==input){skip} (not actual syntax, just for an idea, also make sure you get your loop range correct then)

How to replace the title of columns in a merged document with the file directory using R?

I have performed an experiment under different conditions. Each of those condition has its own Folder. In each of those folders, there is a subfolder for each replicate that containts a text file called DistList.txt. This then looks like this, where the folders "C1.1", "C1.2" and so on contain the mentioned .txt files:
I have now managed to combine all those single DistList.txt files using the following script:
setwd("~/Desktop/Experiment/.")
fileList <- list.files(path = ".", recursive = TRUE, pattern = "DistList.txt", full.names = TRUE)
listData <- lapply(fileList, read.table)
names(listData) <- gsub("DistList.txt","",basename(fileList))
library(tidyverse)
library(reshape2)
bind_rows(listData, .id = "FileName") %>%
group_by(FileName) %>%
mutate(rowNum = row_number()) %>%
dcast(rowNum~FileName, value.var = "V1") %>%
select(-rowNum) %>%
write.csv(file="Result.csv")
This then yields a .csv file that has just numbers a titles (marked in red), which are not that useful for me, as shown in this picture:
I would rather like to have the directory of the "DistList.txt" files or even better only the name of the folder they are in as a title. I thought that I could do that using the function list.dirs() and colnames, but I somehow didn't manage to get it to work.
I would be very grateful, if someone could help me with this issue!
I think this line
names(listData) <- gsub("DistList.txt", "", basename(fileList))
should be:
names(listData) <- gsub("DistList.txt", "", fileList)
Because by using basename we are removing all the folders, leaving us with filename "DistList.txt", and that filename gets replaced by empty string "" using gsub.
We might actually want below instead, extract the last directory, which should give in your case something like c("C1.1", "C1.2", ...):
names(listData) <- basename(dirname(fileList))

How to insert text in specific in directory in R

I am looking for an elegant way to insert character (name) into directory and create .csv file. I found one possible solution, however I am looking another without "replacing" but "inserting" text between specific charaktects.
#lets start
df <-data.frame()
name <- c("John Johnson")
dir <- c("C:/Users/uzytkownik/Desktop/.csv")
#how to insert "name" vector between "Desktop/" and "." to get:
dir <- c("C:/Users/uzytkownik/Desktop/John Johnson.csv")
write.csv(df, file=dir)
#???
#I found the answer but it is not very elegant in my opinion
library(qdapRegex)
dir2 <- c("C:/Users/uzytkownik/Desktop/ab.csv")
dir2<-rm_between(dir2,'a','b', replacement = name)
> dir2
[1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
write.csv(df, file=dir2)
I like sprintf syntax for "fill-in-the-blank" style string construction:
name <- c("John Johnson")
sprintf("C:/Users/uzytkownik/Desktop/%s.csv", name)
# [1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
Another option, if you can't put the %s in the directory string, is to use sub. This is replacing, but it replaces .csv with <name>.csv.
dir <- c("C:/Users/uzytkownik/Desktop/.csv")
sub(".csv", paste0(name, ".csv"), dir, fixed = TRUE)
# [1] "C:/Users/uzytkownik/Desktop/John Johnson.csv"
This should get you what you need.
dir <- "C:/Users/uzytkownik/Desktop/.csv"
name <- "joe depp"
dirsplit <- strsplit(dir,"\\/\\.")
paste0(dirsplit[[1]][1],"/",name,".",dirsplit[[1]][2])
[1] "C:/Users/uzytkownik/Desktop/joe depp.csv"
I find that paste0() is the way to go, so long as you store your directory and extension separately:
path <- "some/path/"
file <- "file"
ext <- ".csv"
write.csv(myobj, file = paste0(path, file, ext))
For those unfamiliar, paste0() is shorthand for paste( , sep="").
Let’s suppose you have list with the desired names for some data structures you want to save, for instance:
names = [“file_1”, “file_2”, “file_3”]
Now, you want to update the path in which you are going to save your files adding the name plus the extension,
path = “/Users/Documents/Test_Folder/”
extension = “.csv”
A simple way to achieve it is using paste() to create the full path as input for write.csv() inside a lapply, as follows:
lapply(names, function(x) {
write.csv(x = data,
file = paste(path, x, extension))
}
)
The good thing of this approach is you can iterate on your list which contain the names of your files and the final path will be updated automatically. One possible extension is to define a list with extensions and update the path accordingly.

For loop with file names in R

I have a list of files like:
nE_pT_sbj01_e2_2.csv,
nE_pT_sbj02_e2_2.csv,
nE_pT_sbj04_e2_2.csv,
nE_pT_sbj05_e2_2.csv,
nE_pT_sbj09_e2_2.csv,
nE_pT_sbj10_e2_2.csv
As you can see, the name of the files is the same with the exception of 'sbj' (the number of the subject) which is not consecutive.
I need to run a for loop, but I would like to retain the original number of the subject. How to do this?
I assume I need to replace length(file) with something that keeps the original number of the subject, but not sure how to do it.
setwd("/path")
file = list.files(pattern="\\.csv$")
for(i in 1:length(file)){
data=read.table(file[i],header=TRUE,sep=",",row.names=NULL)
source("functionE.R")
Output = paste("e_sbj", i, "_e2.Rdata")
save.image(Output)
}
The code above gives me as output:
e_sbj1_e2.Rdata,e_sbj2_e2.Rdata,e_sbj3_e2.Rdata,
e_sbj4_e2.Rdata,e_sbj5_e2.Rdata,e_sbj6_e2.Rdata.
Instead, I would like to obtain:
e_sbj01_e2.Rdata,e_sbj02_e2.Rdata,e_sbj04_e2.Rdata,
e_sbj05_e2.Rdata,e_sbj09_e2.Rdata,e_sbj10_e2.Rdata.
Drop the extension "csv", then add "Rdata", and use filenames in the loop, for example:
myFiles <- list.files(pattern = "\\.csv$")
for(i in myFiles){
myDf <- read.csv(i)
outputFile <- paste0(tools::file_path_sans_ext(i), ".Rdata")
outputFile <- gsub("nE_pT_", "e_", outputFile, fixed = TRUE)
save(myDf, file = outputFile)
}
Note: I changed your variable names, try to avoid using function names as a variable name.
If you use regular expressions and sprintf (or paste0), you can do it easily without a loop:
fls <- c('nE_pT_sbj01_e2_2.csv', 'nE_pT_sbj02_e2_2.csv', 'nE_pT_sbj04_e2_2.csv', 'nE_pT_sbj05_e2_2.csv', 'nE_pT_sbj09_e2_2.csv', 'nE_pT_sbj10_e2_2.csv')
sprintf('e_%s_e2.Rdata',regmatches(fls,regexpr('sbj\\d{2}',fls)))
[1] "e_sbj01_e2.Rdata" "e_sbj02_e2.Rdata" "e_sbj04_e2.Rdata" "e_sbj05_e2.Rdata" "e_sbj09_e2.Rdata" "e_sbj10_e2.Rdata"
You can easily feed the vector to a function (if possible) or feed the function to the vector with sapply or lapply
fls_new <- sprintf('e_%s_e2.Rdata',regmatches(fls,regexpr('sbj\\d{2}',fls)))
res <- lapply(fls_new,function(x) yourfunction(x))
If I understood correctly, you only change extension from .csv to .Rdata, remove last "_2" and change prefix from "nE_pT" to "e". If yes, this should work:
Output = sub("_2.csv", ".Rdata", sub("nE_pT, "e", file[i]))

Resources