I have a question about a loop for my script. I have different directories, and in each of them there are different files. My script analyzes two files at a time. Say I have file1, file2, file3 and file4. The script analyzes the pair file1-file2, then I have to do file1-file3, file1-file4, file2-file3 and so on, so each file must be compared with all the others without repetition. I was doing something like
dirs <- list.dirs()
for (d in dirs) {
  files <- list.files()
  a <- read.table("file1") ## what do I have to write here?
  b <- read.table("file2") ## and here?
  dm <- dist(a, b, method = "euclidean")
  write.table(dm, "file1-file2.csv")
}
My question is about how to refer to file1, file2 and the others after listing them. The file names have a format like "1abc_A.txt".
thank you :)
Try the following:
Use lapply to loop over each directory.
For each directory get all text filenames using list.files.
Create every combination of 2 text files and apply the dist function to every 2 files.
Write the output. If the directory is called A and the file names are f1 and f2, it should write a file called A_f1_f2.csv in the working directory.
dirs <- list.dirs()
lapply(dirs, function(y) {
  files <- list.files(y, pattern = '\\.txt$')
  if (length(files) < 2) return(NULL)
  combn(files, 2, function(x) {
    # read each file relative to its own directory
    a <- read.table(file.path(y, x[1]))
    b <- read.table(file.path(y, x[2]))
    # NB: stats::dist() takes a single matrix, so a cross-distance between two
    # tables needs something like the proxy package (assuming that is the intent)
    dm <- proxy::dist(a, b, method = "Euclidean")
    write.table(dm, sprintf('%s_%s.csv', basename(y), paste0(x, collapse = '_')))
  }, simplify = FALSE)
})
I want to apply a defined function morph to an entire folder /folder of .txt files and afterwards save all the new files to another folder /folder2.
How can I use lapply to cycle through every file and return another file that can be saved to the new folder? If possible, retaining the old name as well!
Currently a list of multiple data.frames is created, none of them named.
The code below is something I came up with, but it doesn't work as intended.
library(dplyr)  # needed for mutate() and n()

old.files <- list.files(path = "/Users/F/folder/", pattern = "*.txt",
                        full.names = TRUE, recursive = FALSE)
new.files <- paste0("/Users/F/folder2/New_Profile_", seq_along(old.files), ".txt")

morph <- function(x) {
  tx0 <- read.table(x, row.names = NULL, col.names = c("Time", "Stage"),
                    skip = 7, stringsAsFactors = FALSE)
  tx1 <- as.data.frame(ifelse(tx0 == "Wach", 0,
                       ifelse(tx0 == "N1", 2,
                       ifelse(tx0 == "N2", 1,
                       ifelse(tx0 == "N3", 3,
                       ifelse(tx0 == "N4", 4,
                       ifelse(tx0 == "Rem", 5,
                       ifelse(tx0 == "Bewegung", 0,
                       ifelse(tx0 == "A", 0,
                              tx0[tx0 == ""] <- tx0$Time)))))))))
  mutate(tx1, Epoch = 1:n())
}

files.list <- as.list(lapply(old.files, morph))
file.copy(from = files.list, to = new.files)
If we need to save the files:
out <- lapply(old.files, morph)
lapply(seq_along(out), function(i) write.table(out[[i]], file = new.files[i]))
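The "retaining the old name as well" part of the question can be handled by building new.files from the old file names instead of numbering them. A minimal sketch, reusing the asker's /Users/F paths and morph function:

```r
# Keep the original file names in folder2 instead of New_Profile_<n>.txt
old.files <- list.files(path = "/Users/F/folder/", pattern = "\\.txt$",
                        full.names = TRUE)
# basename() strips the directory, file.path() prepends the new one
new.files <- file.path("/Users/F/folder2", basename(old.files))

out <- lapply(old.files, morph)
lapply(seq_along(out), function(i) write.table(out[[i]], file = new.files[i]))
```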
I have multiple csv files in different folders, i.e. the main folder contains a week1 and a week2 folder. week1 in turn contains file1.csv and week2 contains file2.csv. All files have the same column names. There are hundreds of such files in different directories.
file1 <- data.frame(name = c("Bill","Tom"),Age = c(23,45))
file2 <- data.frame(name = c("Harry","John"),Age = c(34,56))
How can I load them, rbind them in R, and get them into a final data frame?
I got some clue here: How can I read multiple files from multiple directories into R for processing?
What I did is a slight modification of the function to do the row bind, as follows, but it is nowhere near what I want.
# Creating a function to process each file
empty_df <- data.frame()
processFile <- function(f) {
  df <- read.csv(f)
  rbind(empty_df, df)  # note: empty_df never grows; this just returns df
}
# Find all .csv files
files <- dir("/foo/bar/", recursive=TRUE, full.names=TRUE, pattern="\\.csv$")
# Apply the function to all files.
result <- sapply(files, processFile)
Any help is greatly appreciated!
I'd have tried to do something with a for loop on my side, such as:
temp <- read.csv('week1/file1.csv')
for (i in 2:n) {  # n being the number of weeks you have
  temp <- rbind(temp, read.csv(paste0('week', i, '/file', i, '.csv')))
}
I hope this helps.
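For hundreds of files, an alternative sketch (not tied to the week<i>/file<i>.csv naming; "mainfolder" is a placeholder path) lists every csv recursively and stacks them in one step:

```r
# Find every .csv under the main folder and bind the rows into one data frame
files <- list.files("mainfolder", pattern = "\\.csv$",
                    recursive = TRUE, full.names = TRUE)
result <- do.call(rbind, lapply(files, read.csv))
```

This works because all files share the same column names, as stated in the question.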
How can I read many CSV files and make each of them into data tables?
I have files 'A1.csv', 'A2.csv', 'A3.csv', … in folder 'A'.
So I tried this:
link <- c("C:/A")
filename <- list.files(link)
listA <- c()
for (x in filename) {
  temp <- read.csv(paste0(link, x), header = FALSE)
  listA <- list(unlist(listA, recursive = FALSE), temp)
}
And it doesn't work well. How can I do this job?
Write a regex to match the filenames
reg_expression <- "A[0-9]+"
files <- grep(reg_expression, list.files(directory), value = TRUE)
and then run the same loop, but use assign to dynamically name the data frames if you want:
for (file in files) {
  assign(paste0(file, "_df"), read.csv(file))
}
But in general, introducing unknown variables into the scope is bad practice, so it might be best to use a loop like:
dfs <- list()
for (index in seq_along(files)) {
  file <- files[index]
  dfs[[index]] <- read.csv(file)  # [[ ]] so each element holds a whole data frame
}
Unless each file is a completely different structure (i.e., different columns ... the number of rows does not matter), you can consider a more efficient approach of reading the files in using lapply and storing them in a list. One of the benefits is that whatever you do to one frame can be immediately done to all of them very easily using lapply.
files <- list.files(link, full.names = TRUE, pattern = "csv$")
list_of_frames <- lapply(files, read.csv)
# optional
names(list_of_frames) <- files # or basename(files), if filenames are unique
Something like sapply(list_of_frames, nrow) will tell you how many rows are in each frame. If you need something more complex:
new_list_of_frames <- lapply(list_of_frames, function(x) {
# do something with 'x', a single frame
})
The most immediate problem is that when pasting your file path together, you need a path separator. When composing file paths, it's best to use the function file.path, as it will determine the correct path separator for the operating system the code is running on. So you want to use:
read.csv(file.path(link, x), header = FALSE)
Better yet, just have the full path returned when listing the files (and filter for .csv at the same time):
filename <- list.files(link, full.names = TRUE, pattern = "csv$")
Combining with the idea to use assign to dynamically create the variables:
link <- c("C:/A")
files <- list.files(link, full.names = TRUE, pattern = "csv$")
for (file in files) {
  assign(paste0(basename(file), "_df"), read.csv(file))
}
I have a folder that contains many folders, and each folder contains one csv file. I want to split each file on the basis of CN into its own folder. This is the layout of the files:
home -> folder -> f_5324 -> f_5324.csv
               -> f_5674 -> f_5674.csv
               -> f_8769 -> f_8769.csv  and so on
I want to write code that will take the first folder (f_5324), read its csv file, split that file and save the pieces in that folder (f_5324); then take the second folder (f_5674), read its csv file, split and save in that folder (f_5674); and then do the same with all the folders.
This is my code in R:
dir <- "/home/folder"
my_dirs <- list.dirs(dir, recursive = FALSE)
for (i in my_dirs) {
  a <- list.files(path = i, full.names = TRUE, recursive = TRUE)
  df <- read.csv(a)
  a0 <- df[df$CN == "cn=0", ]
  a1 <- df[df$CN == "cn=1", ]
  a3 <- df[df$CN == "cn=3", ]
  a4 <- df[df$CN == "cn=4", ]
  write.csv(a0, "cn0.csv")
  write.csv(a1, "cn1.csv")
  write.csv(a3, "cn3.csv")
  write.csv(a4, "cn4.csv")
}
I am trying hard, but it's not working properly: it splits the file but creates just one file each for cn0, cn1, cn3 and cn4, overwriting the results every time. Please tell me how to pass the path of each folder so I get separate results for every csv file in its own folder.
Help will be appreciated.
Use:
dir <- "/home/folder"
my_dirs <- list.dirs(dir, recursive = FALSE)
for (i in my_dirs) {
  a <- list.files(path = i, full.names = TRUE, recursive = TRUE)
  df <- read.csv(a)
  a0 <- df[df$CN == "cn=0", ]
  a1 <- df[df$CN == "cn=1", ]
  a3 <- df[df$CN == "cn=3", ]
  a4 <- df[df$CN == "cn=4", ]
  write.csv(a0, paste(i, "cn0.csv", sep = "/"))
  write.csv(a1, paste(i, "cn1.csv", sep = "/"))
  write.csv(a3, paste(i, "cn3.csv", sep = "/"))
  write.csv(a4, paste(i, "cn4.csv", sep = "/"))
}
Explanation
In your initial implementation, write.csv(a0, "cn0.csv") writes a csv file named cn0.csv to your present working directory.
On each pass of the loop, it simply overwrites that existing file again and again.
To avoid this, you need to specify the target directory for each csv write, which is done by changing the call to write.csv(a0, paste(i, "cn0.csv", sep = "/")); that writes into the correct folder.
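If some files contain CN values other than cn=0 … cn=4, a sketch with split() writes one file per distinct CN level without hard-coding them (the column name CN and the cn=<k> naming are taken from the question):

```r
for (i in my_dirs) {
  a  <- list.files(path = i, full.names = TRUE, recursive = TRUE)
  df <- read.csv(a)
  # one output file per distinct CN value, written into folder i
  for (grp in split(df, df$CN)) {
    out <- file.path(i, paste0(sub("=", "", grp$CN[1]), ".csv"))  # "cn=0" -> "cn0.csv"
    write.csv(grp, out, row.names = FALSE)
  }
}
```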
I have a directory with a list of folders, each of which contains a folder named "ABC". Each "ABC" folder holds '.xlsm' files. I want to use R code to read the '.xlsm' files in the "ABC" folders under the different folders in the directory.
Thank you for your help.
If you already know the paths to each file, then simply use read_excel from the readxl package:
library(readxl)
mydata <- read_excel("ABC/myfile.xlsm")
If you first need to get the paths to each file, you can use a system command (I'm on Ubuntu 18.04) to find all of the paths and store them in a vector. You can then import them one at a time:
myshellcommand <- "find /path/to/top/directory -path '*/ABC/*' -type d"
mypaths <- system(command = myshellcommand, intern = TRUE)
Because of your directory requirements, one method for finding all of the files can be a double list.files:
ld <- list.files(pattern="^ABC$", include.dirs=TRUE, recursive=TRUE, full.names=TRUE)
lf <- list.files(ld, pattern="\\.xlsm$", ignore.case=TRUE, recursive=TRUE, full.names=TRUE)
To read them all into a list (good ref for dealing with a list-of-frames: http://stackoverflow.com/a/24376207/3358272):
lstdf <- sapply(lf, read_excel, simplify=FALSE)
This defaults to opening the first sheet in each workbook. Other options in readxl::read_excel that might be useful: sheet=, range=, skip=, n_max=.
Given a list of *.xlsm files in your working directory you can do the following:
list.files(
path = getwd(),
pattern = glob2rx(pattern = "*.xlsm"),
full.names = TRUE,
recursive = TRUE
) -> files_to_read
lst_dta <- lapply(
X = files_to_read,
FUN = function(x) {
cat("Reading:", x, fill = TRUE)
openxlsx::read.xlsx(xlsxFile = x)
}
)
Results
Given two files, one with columns C, D and the other with columns A, B, the generated list corresponds to:
>> lst_dta
[[1]]
C D
1 3 4
[[2]]
A B
1 1 2
Notes
This will read all .xlsm files found in the directory tree starting from getwd().
openxlsx is efficient due to its use of Rcpp. If you are going to be handling a substantial number of MS Excel files, this package is worth exploring, IMHO.
Edit
As pointed out by @r2evans in the comments, you may want to read *.xlsm files that reside only within an ABC folder, ignoring *.xlsm files outside it. You could filter your files_to_read vector in the following manner:
grep(pattern = "ABC", x = files_to_read, value = TRUE)
It's unlikely, but if you have *.xlsm files whose names contain the string ABC and that are saved outside an ABC folder, you may get extra matches.
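To rule those out, a stricter filter (a sketch) keeps only files whose immediate parent directory is named exactly ABC:

```r
# hypothetical paths standing in for the files_to_read vector
files_to_read <- c("/a/ABC/x.xlsm", "/a/ABCD/y.xlsm", "/b/ABC/z.xlsm")
# dirname() gives the parent directory, basename() its last component
in_abc <- files_to_read[basename(dirname(files_to_read)) == "ABC"]
# in_abc now contains only /a/ABC/x.xlsm and /b/ABC/z.xlsm
```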