I have a directory containing 70 sub-directories, each holding different CSV files. The CSV files in each directory have names like Modified2-3.csv, added2_3.csv, Retired4_5.csv. My end goal is to join all CSV files whose names start with Modified, but first: how can I loop through all sub-directories and select only the files starting with Modified?
I have tried this, but it returns character(0):
list.files(pattern = "^Modified.*name.csv")
I want the result to be a list of the Modified CSV files, like this: Modified2_3.csv, Modified3_4.csv, Modified7_8.csv
You should be able to collect them without a loop by using the recursive argument of list.files().
list.files(pattern = "^Modified", recursive=TRUE)
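As a minimal sketch of the whole task (find the files, then join them), assuming every Modified file has the same columns. The helper name read_modified and its root argument are made up for illustration and are not part of the question:

```r
# Hypothetical helper: collect every CSV whose name starts with "Modified"
# anywhere below `root`, then stack the files into one data frame.
read_modified <- function(root = ".") {
  files <- list.files(
    path = root,
    pattern = "^Modified.*\\.csv$",  # matched against the file name, not the full path
    recursive = TRUE,                # descend into all sub-directories
    full.names = TRUE                # keep the path so read.csv() can find each file
  )
  if (length(files) == 0) return(NULL)
  tables <- lapply(files, read.csv)
  names(tables) <- basename(files)
  # Assumption: all Modified files share identical column names
  do.call(rbind, tables)
}

# Usage (not run): combined <- read_modified("path/to/main/directory")
```

If the file names may also start with a lowercase "modified", list.files() accepts ignore.case = TRUE.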
The main idea is that I have two folders/paths on my local machine. In each folder I have multiple csv files that I want to read into R. However, instead of appending them all together into one file, I want all folder1 files to end up in file1 and all folder2 files in file2. I only know how to append them all together, not how to append them into two separate files. Below is my code so far.
dirs<-list("path/folder1","path/folder2")
data<-list()
for(dir in dirs){
##read in the list of files in each folder
flist<-list.files(path=dir,pattern = "\\.csv$")
## a second for loop to read thru what's inside each folder
for (file in flist){message("working on",file)
indata<-fread(paste0(dir,file))
data<-rbind(data,indata)}
}
So far, I think data keeps everything in one file. How do I make it save them into two different files?
The quickest option I can think of is to use data[[dir]] to make each directory's data its own element of the data list. Then you can access them with data[["path/folder1"]] (or data$`path/folder1`) etc.
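library(data.table)  # provides fread()
dirs <- list("path/folder1", "path/folder2")
data <- list()
for (dir in dirs) {
  ## read in the list of files in each folder
  flist <- list.files(path = dir, pattern = "\\.csv$")
  ## a second for loop to read through what's inside each folder
  for (file in flist) {
    message("working on ", file)
    ## file.path() inserts the "/" that paste0(dir, file) would leave out
    indata <- fread(file.path(dir, file))
    ## data[[dir]] is NULL on the first pass, so only indata is kept
    data[[dir]] <- rbind(data[[dir]], indata)
  }
}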
(However, it might be much nicer (and faster) to use lapply instead of for loops)
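A minimal sketch of what the lapply variant could look like. It uses base read.csv() so that it runs without extra packages; fread() and rbindlist() from data.table could be swapped in. The helper names read_folder and read_folders are hypothetical:

```r
# Hypothetical helper: read and stack every CSV found directly in one folder
read_folder <- function(dir) {
  flist <- list.files(path = dir, pattern = "\\.csv$", full.names = TRUE)
  # Assumption: all files in a folder share the same columns
  do.call(rbind, lapply(flist, read.csv))
}

# Hypothetical helper: one combined data frame per folder, named by folder path
read_folders <- function(dirs) {
  lapply(setNames(dirs, dirs), read_folder)
}

# Usage (not run):
# data <- read_folders(c("path/folder1", "path/folder2"))
# data[["path/folder1"]]   # all folder1 files stacked together
```

Keeping the result as a named list avoids growing an object inside a loop and keeps the two folders separate.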
You could assign the files you read in to new R objects named by folder number. I changed list() to c() for dirs to make the assignment with assign() easier, and moved data <- list() into the first loop so that it is reset for each folder.
dirs <- c("path/folder1", "path/folder2")
for (dir in seq_along(dirs)) {
  ## read in the list of files in each folder
  flist <- list.files(path = dirs[dir], pattern = "\\.csv$")
  ## reset the accumulator for each folder
  data <- list()
  ## a second for loop to read through what's inside each folder
  for (file in flist) {
    message("working on ", file)
    indata <- read.csv(file.path(dirs[dir], file))
    data <- rbind(data, indata)
  }
  ## once the folder is done, store its data as data_1, data_2, ...
  assign(paste0("data_", dir), data)
}
I have a directory which contains different subfolders and other files. I need to access each subfolder, read its .tsv file and carry out the following R script. How can I loop this script over the subfolders and run it from the terminal?
for(i in my_files){
s <- read.csv('abundance.tsv',sep = '\t')
colnames(compare)[1] <- 'target_id'
colnames(s)[1] <- 'target_id'
s1 <- merge(compare, s, by = "target_id")
output.filename <- gsub("(.*?)", "\\1.csv", i)
write.table(s1, output.filename)
}
list.dirs() returns a character vector of the directories under a given path, and list.files() returns the files in a given path; see ?list.files for the documentation.
list.dirs() can be recursive or not, so you can either get only the first-level directories and then call list.dirs() again on each sub-directory (inside a loop), or get all sub-directories directly.
With these two functions you can build your my_files vector (since I do not know your exact directory structure, I can't give an example).
If you have multiple files and want to open only some of them, you can check whether the file name contains the sub-string you want (e.g. the file extension), for example with grepl() or the pattern argument of list.files().
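Assuming a layout where each first-level sub-directory holds one abundance.tsv (an assumption, since the question does not spell out the structure), the idea could be sketched as follows. The function names, and the choice to write each output next to its input, are illustrative only; compare is taken to be a data frame that already exists, as in the question:

```r
# Hypothetical helper: find root/<subfolder>/abundance.tsv for every subfolder
find_abundance_files <- function(root = ".") {
  subdirs <- list.dirs(root, recursive = FALSE)   # first-level sub-directories only
  candidates <- file.path(subdirs, "abundance.tsv")
  candidates[file.exists(candidates)]             # skip folders without the file
}

# Hypothetical helper: merge each abundance file with `compare` and save the result
merge_with_reference <- function(my_files, compare) {
  colnames(compare)[1] <- "target_id"
  for (f in my_files) {
    s <- read.delim(f)                            # tab-separated file with a header row
    colnames(s)[1] <- "target_id"
    s1 <- merge(compare, s, by = "target_id")
    # Name the output after the sub-directory that held the input
    out <- paste0(basename(dirname(f)), ".csv")
    write.csv(s1, file.path(dirname(f), out), row.names = FALSE)
  }
  invisible(my_files)
}

# Usage (not run):
# merge_with_reference(find_abundance_files("path/to/root"), compare)
```

Saved to a file, for example myscript.R (a placeholder name), this can be run from the terminal with Rscript myscript.R.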
list of subfolders and files within the subfolders
list.files(recursive = TRUE)
I used this code to get a list of files from the subfolders. It worked fine, but it lists only up to 1000 files and the remaining files are omitted.
In a folder I have some subfolders (i.e. A, B, C, D), and each of those subfolders contains further folders (A: A1, A2, A3, ...; B: B1, B2, B3, ...).
How can I list all the files in the subfolders?
From the resulting list, I need to find the files that share the same prefix and collect all of those files in a separate folder.
I think your code does list all files, but by default only the first 1000 are printed. You could change this by setting, for example, options(max.print = 1000000). However, if you assign the result of list.files(), for example mylist <- as.list(list.files()), the object will hold all 1000+ files anyway, without any need to adjust the max.print option. And if you want to select a certain pattern, you can add the pattern = "mypatternofinterest" argument to the list.files() call.
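For the second half of the question (collecting files with the same prefix in a separate folder), a rough sketch could look like this. It assumes the prefix is simply the first n characters of the file name, which the question does not specify; both helper names are hypothetical:

```r
# Hypothetical helper: group file paths by the first `n` characters of the file name
group_by_prefix <- function(files, n = 3) {
  split(files, substr(basename(files), 1, n))
}

# Hypothetical helper: copy each group into its own folder under `dest`
copy_groups <- function(groups, dest) {
  for (prefix in names(groups)) {
    target <- file.path(dest, prefix)
    dir.create(target, recursive = TRUE, showWarnings = FALSE)
    # Files with the same name from different subfolders are not overwritten
    file.copy(groups[[prefix]], target)
  }
  invisible(names(groups))
}

# Usage (not run):
# files <- list.files(recursive = TRUE, full.names = TRUE)
# length(files)                       # full count, independent of max.print
# copy_groups(group_by_prefix(files, n = 3), "sorted")
```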
Using this script I have created a specific folder for each csv file and then saved all my further analysis results in this folder. The name of the folder and csv file are same. The csv files are stored in the main/master directory.
Now, I have created a csv file in each of these folders which contains a list of all the fitted values.
I would now like to do the following:
Set the working directory to the particular filename
Read fitted values file
Add a row/column stating the name of the site/ unique ID
Add it to the masterfile which is stored in the main directory with a title specifying site name/filename. It can be stacked by rows or by columns it doesn't really matter.
Come to the main directory to pick the next file
Repeat the loop
Using the merge(), rbind(), cbind() combines all the data under one column name. I want to keep all the sites separate for comparison at a later on stage.
This is what I'm using at the moment and I'm lost on how to proceed further.
setwd( "path") # main directory
path <-"path" # need this for convenience while switching back to main directory
# import all files and create a character type array
files <- list.files(path=path, pattern="*.csv")
for(i in seq(1, length(files), by = 1)){
fileName <- read.csv(files[i]) # repeat to set the required working directory
base <- strsplit(files[i], ".csv")[[1]] # getting the filename
setwd(file.path(path, base)) # setting the working directory to the same filename
master <- read.csv(paste(base,"_fiited_values curve.csv"))
# read the fitted value csv file for the site and store it in a list
}
I want to construct a for loop to make one master file with the files in different directories. I do not want to merge all under one column name.
For example, If I have 50 similar csv files and each had two columns of data, I would like to have one csv file which accommodates all of it; but in its original format rather than appending to the existing row/column. So then I will have 100 columns of data.
Please tell me what further information can I provide?
For reading a group of files from a number of different directories, with path names patha, pathb, pathc:
paths = c('patha','pathb','pathc')
# pattern is a regular expression, so "\\.csv$" matches names ending in .csv
files = unlist(lapply(paths, function(path) list.files(path, pattern = "\\.csv$", full.names = TRUE)))
listContainingAllFiles = lapply(files, read.csv)
If you want to be really quick about it, you can grab fread from data.table:
library(data.table)
listContainingAllFiles = lapply(files, fread)
Either way this will give you a list of all objects, kept separate. If you want to join them together vertically/horizontally, then:
do.call(rbind, listContainingAllFiles)
do.call(cbind, listContainingAllFiles)
EDIT: Note that the latter (cbind) makes no sense unless row i of one file actually corresponds to row i of every other file. It makes far more sense to create a field tracking which location the data is from.
If you want to use the names of the files to determine the sample location (I don't see where you're getting this info from in your example), then you want to do this as you read in the files, so:
listContainingAllFiles = lapply(files,
function(file) data.frame(filename = file,
read.csv(file)))
Then later you can split that column to get your details (assuming, of course, that you have a standard naming convention).
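As a small sketch of that last step, assuming the file name without its extension is the site name (this matches the question's setup where folder and csv names are the same, but it is still an assumption). The helper name read_with_site is hypothetical:

```r
# Hypothetical helper: read one file and tag every row with its site name
read_with_site <- function(file) {
  # "some/dir/siteA.csv" -> "siteA"
  site <- tools::file_path_sans_ext(basename(file))
  data.frame(site = site, read.csv(file), stringsAsFactors = FALSE)
}

# Usage (not run):
# master <- do.call(rbind, lapply(files, read_with_site))
# by_site <- split(master, master$site)   # one data frame per site for later comparison
```

Stacking by rows with a site column keeps every site distinguishable without requiring the files to have matching row counts.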
I am stuck. I need a way to iterate through a bunch of subfolders in a directory, pull out 4 .csv files , bind the contents of those 4 .csv files, then write out the new .csv to a new directory using the name of the initial subfolder as the name of the new .csv.
I know R can do this, but I am stuck on how to iterate across the subfolders and bind the csv files together. My obstacle is that each subfolder contains the same 4 .csv files, named with the same 8-digit ids. For example, subfolder A contains 09061234.csv, 09061345.csv, 09061456.csv, and 09061560.csv. Subfolder B contains 09061234.csv, 09061345.csv, 09061456.csv, and 09061560.csv. (...). There are 42 subfolders, and hence 168 csv files with the same names. I want to compact the files down to 42.
I can use list.files to retrieve all the subfolders. But then what?
##Get Files from directory
TF = "H:/working/TC/TMS/Counts/June09"
##List Sub folders
SF <- list.files(TF)
##List of File names inside folders
FN <- list.files(SF)
#Returns list of 168 filenames
###?????###
#How to iterate through each subfolder, read each 8-digit integer id file,
#bind them all together into one single csv,
#Then write to new directory using
#the name of the subfolder as the name of the new csv?
There is probably a way to do this easily but I am a noob with R. Something involving functions, paste and write.table perhaps? Any hints/help/suggestions is greatly appreciated. Thanks!
You can use the recursive=TRUE option of list.files:
lapply(c('1234','1345','1456','1560'), function(x){
  ## the pattern is a regular expression matched against the file name: "0906" + id + ".csv"
  sources.files <- list.files(path=TF,
                              recursive=TRUE,
                              pattern=paste0('0906', x, '\\.csv$'),
                              full.names=TRUE)
  ## read all files with the id and bind them
  dat <- do.call(rbind, lapply(sources.files, read.csv))
  ## write the combined file for this id
  write.csv(dat, paste0('agg', x, '.csv'), row.names=FALSE)
})
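The question asks for one output file per subfolder, named after that subfolder. A minimal sketch of that variant, assuming all csv files within a subfolder share the same columns and that out_dir lies outside root; the function name combine_subfolders and its arguments are hypothetical:

```r
# Hypothetical helper: bind the csv files of each subfolder and write
# one combined csv per subfolder into `out_dir`
combine_subfolders <- function(root, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  subfolders <- list.dirs(root, recursive = FALSE)   # first-level subfolders only
  for (sf in subfolders) {
    # Only names ending in .csv are read, so other file types are ignored
    csvs <- list.files(sf, pattern = "\\.csv$", full.names = TRUE)
    if (length(csvs) == 0) next
    dat <- do.call(rbind, lapply(csvs, read.csv))
    # Output file is named after the subfolder, e.g. "A.csv"
    write.csv(dat, file.path(out_dir, paste0(basename(sf), ".csv")), row.names = FALSE)
  }
  invisible(subfolders)
}

# Usage (not run); the output path is a placeholder:
# combine_subfolders(TF, "path/to/output")
```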
After some tweaking of agstudy's code, I came up with the solution I was ultimately after. There were a couple of missing pieces that are more due to the nature of my specific problem, so I am leaving agstudy's answer as "accepted".
Turns out a function really wasn't needed. At least not for now. If I need to perform this same task again, I will create a function out of it. For now, I can solve this particular problem without it.
Also, in my case I needed a condition to handle the non-csv files (some folders contain .xls files) that lived in the subfolders, so that only the comma-separated files are read and everything else is skipped.
Code:
##Define directory path##
TF = "H:/working/TC/TMS/Counts/June09"
##Define the list of file ids to search for (this has to exist before the pattern below uses it)##
x <- c('1234', '1345', '1456', '1560')
##List the files in all subfolders whose names are "0906" followed by one of the ids##
SF <- list.files(TF, recursive=TRUE, full.names=TRUE, pattern=paste0("0906(", paste(x, collapse="|"), ")"))
##Conditional to skip over the non-csv files in each folder##
sources.files <- SF[grepl("\\.csv$", SF)]
if (length(sources.files) > 0) {
  dat <- do.call(rbind, lapply(sources.files, read.csv))
  #non-csv files such as the .xls files in some folders never reach read.csv
  write.table(dat, file="H:/working/TC/TMS/June09Output/June09Batched.csv", row.names=FALSE, sep=",")
}