Split files present in different Folders in R


I have a folder that contains many folders, and each folder contains one CSV file. I want to split each file on the basis of its CN column and save the parts in the file's own folder. This is how the files are organized:
home -> folder -> f_5324 -> f_5324.csv
               -> f_5674 -> f_5674.csv
               -> f_8769 -> f_8769.csv, and so on
I want to write code that takes the first folder (f_5324), reads its CSV file, splits that file and saves the parts in that folder (f_5324), then takes the second folder (f_5674), reads its CSV file, splits it and saves the parts in that folder (f_5674), and then does the same for all the remaining folders.
This is my code in R:
dir <- "/home/folder"
my_dirs <- list.dirs(dir, recursive = FALSE)
for(i in my_dirs){
  a <- list.files(path = i, full.names = TRUE, recursive = TRUE)
  df <- read.csv(a)
  a0 <- df[df$CN=="cn=0",]
  a1 <- df[df$CN=="cn=1",]
  a3 <- df[df$CN=="cn=3",]
  a4 <- df[df$CN=="cn=4",]
  write.csv(a0,"cn0.csv")
  write.csv(a1,"cn1.csv")
  write.csv(a3,"cn3.csv")
  write.csv(a4,"cn4.csv")
}
I have been trying hard, but it's not working properly: it splits the file, but it creates only one set of cn0, cn1, cn3 and cn4 files and overwrites them on every iteration. Please tell me how to pass the path of each folder so that I get separate results for each CSV file in its own folder.
Any help will be appreciated.

Use:
dir <- "/home/folder"
my_dirs <- list.dirs(dir, recursive = FALSE)
for(i in my_dirs){
  # read the CSV named after its folder (e.g. f_5324/f_5324.csv), so that
  # cn*.csv files written by an earlier run are not picked up as input
  a <- file.path(i, paste0(basename(i), ".csv"))
  df <- read.csv(a)
  a0 <- df[df$CN=="cn=0",]
  a1 <- df[df$CN=="cn=1",]
  a3 <- df[df$CN=="cn=3",]
  a4 <- df[df$CN=="cn=4",]
  # paste(i, ..., sep="/") places each output file inside folder i
  write.csv(a0,paste(i,"cn0.csv",sep="/"))
  write.csv(a1,paste(i,"cn1.csv",sep="/"))
  write.csv(a3,paste(i,"cn3.csv",sep="/"))
  write.csv(a4,paste(i,"cn4.csv",sep="/"))
}
Explanation
In your initial implementation, write.csv(a0,"cn0.csv") writes a CSV file named cn0.csv to your present working directory, not to the folder being processed.
On every subsequent iteration of the loop, that same file is simply overwritten again.
To avoid this, you need to give the target directory in each write.csv call. Changing it to write.csv(a0,paste(i,"cn0.csv",sep="/")) builds the path inside the current folder i, so each folder gets its own cn*.csv files.
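A more general variant of the same idea, sketched under the assumption that each sub-folder holds one CSV named after the folder (f_5324/f_5324.csv) with a CN column: split() groups the rows by every CN value present, so the cn=0, cn=1, ... subsets do not have to be listed by hand. The function name split_by_cn and the demo data are hypothetical.

```r
# Hypothetical helper: for every sub-folder of `root`, split the folder's CSV
# by its CN column and write one file per CN value into that same folder.
split_by_cn <- function(root) {
  for (d in list.dirs(root, recursive = FALSE)) {
    # assumed layout: f_5324/f_5324.csv
    src <- file.path(d, paste0(basename(d), ".csv"))
    if (!file.exists(src)) next                # skip folders without that file
    df <- read.csv(src, stringsAsFactors = FALSE)
    parts <- split(df, df$CN)                  # one data frame per CN value
    for (cn in names(parts)) {
      # "cn=0" -> "cn0.csv", written next to the source file
      out <- file.path(d, paste0(sub("=", "", cn, fixed = TRUE), ".csv"))
      write.csv(parts[[cn]], out, row.names = FALSE)
    }
  }
}

# Demo on made-up data in a temporary directory
root <- tempfile("folder")
dir.create(file.path(root, "f_5324"), recursive = TRUE)
write.csv(data.frame(id = 1:4, CN = c("cn=0", "cn=1", "cn=0", "cn=3")),
          file.path(root, "f_5324", "f_5324.csv"), row.names = FALSE)
split_by_cn(root)
list.files(file.path(root, "f_5324"))
```

Because the input file is located by its folder name rather than by listing every CSV, re-running the function does not read the cn*.csv outputs back in.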

Related

Read multiple “.xlsx” files

I am trying to read multiple Excel files stored in different folders with R.
Here is my solution:
setwd("D:/data")
filename <- list.files(getwd(),full.names = TRUE)
# Four folders "epdata1" "epdata2" "epdata3" "epdata4" were inside the folder "data"
dataname <- list.files(filename,pattern="*.xlsx$",full.names = TRUE)
# Every folder in the folder "data" contains five excel files
datalist <- lapply(dataname,read_xlsx)
Error: `path` does not exist:'D:/data/epidata1/出院舱随访1.xlsx'
But calling read_xlsx directly on that file runs successfully:
read_xlsx("D:/data/epidata1/出院舱随访1.xlsx")
All the folders are inside the "data" folder, so why does R fail to read those Excel files?
Your help will be much appreciated!

I don't see any reason why your code shouldn't work. Make sure your folder names are correct: in your comments you write "epdata1", but your error says "epidata1".
I tried it with a mix of csv and xlsx files.
This is what I would use to track down the error/typo:
library(readxl)
pp <- function(...){print(paste(...))}
main <- function(){
  # finding / setting up data main folder
  # You may change this to your needs
  main_dir <- file.path(getwd(), "data")
  pp("working directory:",main_dir)
  pp("Found following folders:")
  pp(list.files(main_dir,full.names = FALSE))
  data_folders <- list.files(main_dir,full.names = TRUE)
  pp("Found these files in folders:",list.files(data_folders,full.names = TRUE))
  # "\\.xlsx$" is a regular expression: a literal ".xlsx" at the end of the name
  pp("Filtering *.xlsx files",list.files(data_folders,pattern="\\.xlsx$",full.names = TRUE))
  files <- list.files(data_folders,pattern="\\.xlsx$",full.names = TRUE)
  datalist <- lapply(files,read_xlsx)
  print(datalist)
}
main()
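A compact alternative, sketched here as a hypothetical helper read_all(): list.files(recursive = TRUE) finds the files in all sub-folders in one call, a wrong top-level path fails immediately with a clear message, and each result is named after its sub-folder and file so a bad entry can be traced back. The demo uses CSV files with read.csv so it runs without readxl; the folder and file names are made up.

```r
# Hypothetical helper: read every file matching `pattern` anywhere below `root`.
# `reader` is any function that takes a path, e.g. readxl::read_xlsx.
read_all <- function(root, pattern, reader) {
  if (!dir.exists(root)) stop("folder not found: ", root)  # catch a wrong top-level path early
  files <- list.files(root, pattern = pattern, recursive = TRUE, full.names = TRUE)
  if (length(files) == 0) warning("no files matching ", pattern, " under ", root)
  # name each result "subfolder/file" so it can be traced back to its source
  names(files) <- file.path(basename(dirname(files)), basename(files))
  lapply(files, reader)                                     # lapply keeps the names
}

# Demo with CSV files and read.csv, so it runs without readxl installed
root <- tempfile("data")
dir.create(file.path(root, "epdata1"), recursive = TRUE)
dir.create(file.path(root, "epdata2"))
write.csv(data.frame(x = 1:2), file.path(root, "epdata1", "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(root, "epdata2", "b.csv"), row.names = FALSE)
datalist <- read_all(root, "\\.csv$", read.csv)
names(datalist)
```

For the Excel files in the question, the equivalent call would be read_all("D:/data", "\\.xlsx$", readxl::read_xlsx).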

How to save multiple files as .txt from a list of data frames

I want to apply a function I defined, morph, to an entire folder (/folder) of .txt files and afterwards save all the new files to another folder (/folder2).
How can I use lapply to cycle through every file and return another file that can be saved to the new folder? If possible, retaining the old names as well!
Currently, a list of multiple data.frames is created, none of them named.
Here is the code I came up with, but it doesn't work as intended:
old.files <- list.files(path="/Users/F/folder/", pattern="*.txt", full.names=TRUE, recursive=FALSE)
new.files <- paste0("/Users/F/folder2/New_Profile_",1:length(old.files),".txt")
morph <- function (x){
  tx0 <- read.table(x,row.names = NULL,col.names= c("Time","Stage"), skip=7, stringsAsFactors = F)
  tx1<- as.data.frame(ifelse(tx0 == "Wach",0,
                      ifelse(tx0=="N1",2,
                      ifelse(tx0=="N2",1,
                      ifelse(tx0=="N3",3,
                      ifelse(tx0=="N4",4,
                      ifelse(tx0=="Rem",5,
                      ifelse(tx0=="Bewegung",0,
                      ifelse(tx0=="A",0,
                      tx0[tx0==""]<-tx0$Time)))))))))
  mutate(tx1, Epoch=1:n())
}
files.list <- as.list(lapply (old.files, morph))
file.copy(from = files.list,to=new.files)

If you need to save the files, write each data frame in the resulting list to the matching entry of new.files:
out <- lapply(old.files, morph)
lapply(seq_along(out), function(i) write.table(out[[i]], file = new.files[i]))
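The question also asks to keep the old file names. One way to do that, sketched as a hypothetical helper process_folder(): build the output paths with file.path(out_dir, basename(old_files)), name the result list after the source files, and write with Map(). The add_epoch() function below is only a stand-in for morph() (it adds an Epoch column, like the mutate() call), and the demo file contents are made up.

```r
# Hypothetical helper: apply `transform` to every .txt file in `in_dir` and
# write each result to `out_dir` under the original file name.
process_folder <- function(in_dir, out_dir, transform) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  old_files <- list.files(in_dir, pattern = "\\.txt$", full.names = TRUE)
  new_files <- file.path(out_dir, basename(old_files))  # same name, new folder
  out <- lapply(old_files, transform)
  names(out) <- basename(old_files)                     # name the list after the sources
  # Map() walks the results and the target paths in parallel
  Map(function(df, path) write.table(df, path, row.names = FALSE), out, new_files)
  out
}

# Demo: a stand-in for morph() that only adds an Epoch column
add_epoch <- function(path) {
  df <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
  df$Epoch <- seq_len(nrow(df))
  df
}
in_dir <- tempfile("folder")
dir.create(in_dir)
writeLines(c("Time Stage", "22:00 Wach", "22:01 N1"), file.path(in_dir, "p1.txt"))
out_dir <- tempfile("folder2")
res <- process_folder(in_dir, out_dir, add_epoch)
list.files(out_dir)
```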

Loop for directories and files in R

I have a question about a loop in my script. I have several directories, and each of them contains different files. My script analyzes two files at a time. For example, with file1, file2, file3 and file4, my script analyzes the pair file1-file2, then I have to do file1-file3, file1-file4, file2-file3 and so on. So I have to analyze each file against all the others, without repetition. I was doing something like:
dirs <- list.dirs()
for (d in dirs) {
  files <- list.files()
  a <- read.table("file1") ## what do I have to write here?
  b <- read.table("file2") ## and here?
  dm <- dist(a, b , method = "euclidean")
  write.table(dm, "file1-file2.csv")
}
My question is how to refer to file1 and file2 (and the others) after listing them. The file names look like "1abc_A.txt".
Thank you :)

Try the following:
Use lapply to loop over each directory.
For each directory, get all text file names using list.files.
Create every combination of 2 text files and apply the dist function to each pair.
Write the output: if the directory is called A and the file names are f1 and f2, it writes a file called A_f1_f2.csv in the working directory.
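dirs <- list.dirs()
lapply(dirs, function(y) {
  # full.names = TRUE so read.table() finds the files inside y,
  # not in the working directory
  files <- list.files(y, pattern = '\\.txt$', full.names = TRUE)
  if(length(files) < 2) return(NULL)
  # combn() visits every unordered pair exactly once (f1-f2, f1-f3, ...)
  combn(files, 2, function(x) {
    a <- read.table(x[1])
    b <- read.table(x[2])
    # dist() measures distances between the rows of ONE matrix, so the two
    # tables are stacked; in dist(a, b) the second argument would be taken as `diag`
    dm <- dist(rbind(a, b), method = "euclidean")
    # as.matrix() because write.csv cannot write a "dist" object directly
    write.csv(as.matrix(dm), sprintf('%s_%s.csv', basename(y),
              paste0(tools::file_path_sans_ext(basename(x)), collapse = '_')))
  }, simplify = FALSE)
})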
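If "distance between two files" means something else, for example one vector per file, the same combn() pairing still applies. The sketch below assumes, hypothetically, that each .txt file holds a single numeric column of equal length and computes the Euclidean distance between those vectors for every unordered pair; the function name and demo values are made up.

```r
# Hypothetical sketch: treat each .txt file as one numeric vector (first column)
# and compute the Euclidean distance for every unordered pair of files.
pairwise_distances <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  if (length(files) < 2) return(numeric(0))
  pairs <- combn(files, 2)   # one column per pair: f1-f2, f1-f3, ..., never f2-f1
  d <- apply(pairs, 2, function(p) {
    a <- read.table(p[1])[[1]]
    b <- read.table(p[2])[[1]]
    sqrt(sum((a - b)^2))     # assumes both files have the same number of values
  })
  names(d) <- apply(pairs, 2, function(p) paste(basename(p), collapse = "-"))
  d
}

demo_dir <- tempfile("pairs")
dir.create(demo_dir)
writeLines(c("0", "0"), file.path(demo_dir, "1abc_A.txt"))
writeLines(c("3", "4"), file.path(demo_dir, "2abc_A.txt"))
writeLines(c("0", "0"), file.path(demo_dir, "3abc_A.txt"))
pairwise_distances(demo_dir)
```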

R rename files keeping part of original name

I'm trying to rename all the files in a folder (about 7,000 files), keeping just a portion of their original names.
Each name starts with a FIPS code, a 4- or 5-digit code that identifies a county and is different for every file in the folder. The rest of each original name is the state_county_lat_lon of that file.
For example:
Original name:
"5081_Illinois_Jefferson_-88.9255_38.3024_-88.75_38.25.wth"
"7083_Illinois_Jersey_-90.3424_39.0953_-90.25_39.25.wth"
"11085_Illinois_Jo_Daviess_-90.196_42.3686_-90.25_42.25.wth"
"13087_Illinois_Johnson_-88.8788_37.4559_-88.75_37.25.wth"
"17089_Illinois_Kane_-88.4342_41.9418_-88.25_41.75.wth"
And I need to rename them with just the initial code (FIPS):
"5081.wth"
"7083.wth"
"11085.wth"
"13087.wth"
"17089.wth"
I've tried using the list.files and file.rename functions, but I do not know how to extract the code from the full name. Some kind of "wildcard" could work, but I don't know how to apply one properly, because all the names follow the same pattern but differ in content.
This is what I've tried so far:
setwd("C:/Users/xxx")
Files <- list.files(path = "C:/Users/xxx", pattern = "fips_*.wth" all.files = TRUE)
newName <- paste("fips",".wth", sep = "")
for (x in length(Files)) {
file.rename(nFiles,newName)}
I've also tried with the "sub" function as follows:
setwd("C:/Users/xxxx")
Files <- list.files(path = "C:/Users/xxxx", all.files = TRUE)
for (x in length(Files)) {
sub("_*", ".wth", Files)}
but I get Error in as.character(x) :
cannot coerce type 'closure' to vector of type 'character'
OR
setwd("C:/Users/xxxx")
Files <- list.files(path = "C:/Users/xxxx", all.files = TRUE)
for (x in length(Files)) {
sub("^(\\d+)_.*", "\\1.wth", file)}
This runs without errors but does nothing to the file names.
I could use any help.
Thanks

Here is my example.
First, prepare some data to use:
dir.create("test_dir")
data_sets <- c("5081_Illinois_Jefferson_-88.9255_38.3024_-88.75_38.25.wth",
"7083_Illinois_Jersey_-90.3424_39.0953_-90.25_39.25.wth",
"11085_Illinois_Jo_Daviess_-90.196_42.3686_-90.25_42.25.wth",
"13087_Illinois_Johnson_-88.8788_37.4559_-88.75_37.25.wth",
"17089_Illinois_Kane_-88.4342_41.9418_-88.25_41.75.wth")
setwd("test_dir")
file.create(data_sets)
Then rename the files:
Files <- list.files(all.files = TRUE, pattern = "\\.wth$")
newName <- sub("^(\\d+)_.*", "\\1.wth", Files)
file.rename(Files, newName)
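With about 7,000 files it may be worth guarding the rename. The hypothetical helper below (a sketch, not part of the answer above) only touches names that start with digits followed by "_", refuses to run if two files would end up with the same FIPS name, and returns file.rename()'s TRUE/FALSE result per file. The demo uses two of the file names from the question in a temporary folder.

```r
# Hypothetical helper: shorten "<fips>_<state>_<county>_..._.wth" to "<fips>.wth"
# for every matching file in `dir`, refusing to run if two names would collide.
rename_to_fips <- function(dir) {
  old <- list.files(dir, pattern = "^[0-9]+_.*\\.wth$")  # names starting with digits and "_"
  new <- sub("^([0-9]+)_.*\\.wth$", "\\1.wth", old)       # keep only the leading digits
  if (anyDuplicated(new)) stop("two files would get the same FIPS name")
  ok <- file.rename(file.path(dir, old), file.path(dir, new))
  setNames(ok, old)                                      # TRUE for each successful rename
}

wth_dir <- tempfile("wth")
dir.create(wth_dir)
file.create(file.path(wth_dir, c("5081_Illinois_Jefferson_-88.9255_38.3024_-88.75_38.25.wth",
                                 "11085_Illinois_Jo_Daviess_-90.196_42.3686_-90.25_42.25.wth")))
rename_to_fips(wth_dir)
list.files(wth_dir)
```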

R script to read / execute many .csv files from the command line and write all the results to a .csv file

I have many .csv files in a folder. I want to get the binning result for each .csv file, one by one and automatically, by running an R script from the command line, and write the results of all files into a single result.csv file. For example, I have file01.csv, file02.csv, file03.csv, file04.csv and file05.csv. I want the R script to first read/process file01.csv and write the result into result.csv, then read/process file02.csv and write its result into result.csv, then file03.csv, and so on. This is like a loop over all the files, and I want to execute the R script from the command line.
Here is my starting R script:
data <- read.table("file01.csv",sep=",",header = T)
df.train <- data.frame(data)
library(smbinning) # Install if necessary
# Analysis by dwell:
df.train_amp <- rbind(df.train)
res.bin <- smbinning(df=df.train_amp, y="cvflg",x="dwell")
res.bin #Result
# Analysis by pv
df.train_amp <- rbind(df.train)
res.bin <- smbinning(df=df.train_amp, y="cvflg",x="pv")
res.bin #Result
Any suggestions and support would be highly appreciated.
Thanks

First, you will want to read in the files from your directory. Place all of your source files in the same directory. I am assuming here that your CSV files all have the same columns. Also, I am not doing anything special about headers here (read.csv() treats the first line of each file as a header by default).
directory <- "C:/temp" ## for example
# full.names = TRUE returns complete paths, so read.csv() works from any working directory
filenames <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
bigDF <- data.frame()
for (f in seq_along(filenames)){
  tmp <- read.csv(filenames[f], stringsAsFactors = FALSE)
  bigDF <- rbind(bigDF, tmp)
}
Each pass of the loop appends the rows of tmp to bigDF, so after the loop bigDF contains the rows of all the files.
Writing the data frame to a CSV file is just as simple in R, for example:
# Write to a file, suppress row names
write.csv(bigDF, "myData.csv", row.names=FALSE)
# Same, except that instead of "NA", output blank cells
write.csv(bigDF, "myData.csv", row.names=FALSE, na="")
# Use tabs, suppress row names and column names (tab-separated, hence .tsv)
write.table(bigDF, "myData.tsv", sep="\t", row.names=FALSE, col.names=FALSE)
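Growing bigDF with rbind() inside the loop copies it on every pass. A common alternative, sketched here as a hypothetical helper read_and_bind(), reads all files with lapply() and binds them once with do.call(rbind, ...). The extra source column recording the originating file is an addition for illustration, and the demo files are made up.

```r
# Sketch: read every CSV in `directory` and combine them with a single rbind().
# Like the loop above, it assumes all files share the same columns.
read_and_bind <- function(directory) {
  filenames <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  dfs <- lapply(filenames, function(f) {
    df <- read.csv(f, stringsAsFactors = FALSE)
    df$source <- rep(basename(f), nrow(df))  # hypothetical extra column: originating file
    df
  })
  do.call(rbind, dfs)  # one bind at the end instead of growing bigDF in the loop
}

csv_dir <- tempfile("csvs")
dir.create(csv_dir)
write.csv(data.frame(v = 1:2), file.path(csv_dir, "file01.csv"), row.names = FALSE)
write.csv(data.frame(v = 3:4), file.path(csv_dir, "file02.csv"), row.names = FALSE)
bigDF <- read_and_bind(csv_dir)
bigDF
```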

In the end, I found that the problem above can be solved as follows:
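library(smbinning) # Install if necessary.
files <- list.files(pattern = "\\.csv$") ## creates a vector with all file names in your folder
files <- setdiff(files, "result_dwell.csv") # skip the output of a previous run
cutpoint <- character(length(files))
bands <- character(length(files))
for(i in seq_along(files)){
  data <- read.csv(files[i], header = TRUE)
  df.train <- data.frame(data)
  df.train_amp <- rbind(df.train,df.train,df.train,df.train,df.train,df.train,df.train,df.train) # Just to multiply the data
  res <- smbinning(df=df.train_amp, y="cvflg",x="dwell") # smbinning is calculated here
  # smbinning() returns a list on success (with $cuts and $bands);
  # it may return a message string instead when no split is found
  if (is.list(res)) {
    cutpoint[i] <- paste(res$cuts, collapse = ";")
    bands[i] <- paste(res$bands, collapse = ";")
  }
}
result <- data.frame(files, cutpoint, bands) # one row per file: cut points and bands
write.csv(result,"result_dwell.csv", row.names = FALSE) # write result into csv file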
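To run this kind of loop from the command line, the script can read the input folder and the output file from commandArgs(trailingOnly = TRUE). The sketch below is hypothetical: the script name bin_all.R is made up, and analyse_file() is only a stand-in for the smbinning() analysis (it records the row count and the mean of the dwell column), so the driver can be tested without smbinning installed.

```r
# Sketch of a command-line driver, run as:
#   Rscript bin_all.R <input_dir> <result.csv>
# analyse_file() is a hypothetical stand-in for the smbinning() analysis:
# it only records the row count and the mean of the `dwell` column.
analyse_file <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  data.frame(file = basename(path), n = nrow(df), mean_dwell = mean(df$dwell))
}

run <- function(args) {
  in_dir <- args[1]
  out_file <- args[2]
  files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
  files <- files[basename(files) != basename(out_file)]   # never re-read the output
  results <- do.call(rbind, lapply(files, analyse_file))  # one row per input file
  write.csv(results, out_file, row.names = FALSE)
  invisible(results)
}

# Run only when both arguments were supplied on the command line
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 2) run(args)
```

Collecting one row per file and writing result.csv once at the end avoids reopening the output file for every input.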
