the following code in R for all the files. Actually, I wrote a for loop for that, but when I run it, it is applied to only one file, not all of them. BTW, my files do not have a header.
You use [[ to subset something from peaks. However, once the file has been read, peaks is a plain data frame that no longer carries any reference to the file name. Thus, you just have to get rid of the [[i]].
for (i in filelist.coverages) {
peaks <- read.delim(i, sep='', header=F)
PeakSizes <- c(PeakSizes, peaks$V3 - peaks$V2)
}
By using the iterator i within read.delim() which holds a new file name each time, every time R goes through the loop, peaks will have the content of a new file.
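To see this behaviour end to end, here is a small self-contained sketch. The temporary directory, the two files, and their contents are made up for illustration; only the loop body mirrors the code above.

```r
# Create two small header-less, whitespace-separated files (made-up data).
tmp_dir <- tempfile("peaks_")
dir.create(tmp_dir)
writeLines(c("chr1 100 250", "chr1 400 900"), file.path(tmp_dir, "a_island.bed"))
writeLines("chr2 10 60", file.path(tmp_dir, "b_island.bed"))

filelist.coverages <- list.files(tmp_dir, pattern = "island.bed", full.names = TRUE)

PeakSizes <- c()  # start empty, then grow with the sizes from each file
for (i in filelist.coverages) {
  # 'i' is a full file path here, so it can be passed straight to read.delim()
  peaks <- read.delim(i, sep = "", header = FALSE)
  PeakSizes <- c(PeakSizes, peaks$V3 - peaks$V2)
}
PeakSizes
```

PeakSizes ends up holding one value per peak across both files, not one value per file.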
In your code, i refers to a file name. Use indices instead.
And, by the way, don't use setwd; use the full.names = TRUE option in list.files. Also preallocate PeakSizes. Because each file contributes a whole vector of sizes, preallocate a list with PeakSizes <- vector('list', length(filelist.coverages)) and flatten it with unlist() at the end.
So do:
filelist.coverages <- list.files('K:/prostate_cancer_porto/H3K27me3_ChIPseq/',
pattern = 'island.bed', full.names = TRUE)
##all 97 bed files
PeakSizes <- vector('list', length(filelist.coverages))
for (i in seq_along(filelist.coverages)) {
peaks <- read.delim(filelist.coverages[i], sep = '', header = FALSE)
# one vector of peak sizes per file
PeakSizes[[i]] <- peaks$V3 - peaks$V2
}
PeakSizes <- unlist(PeakSizes)
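Or you could simply use lapply (or purrr::map) and flatten the result, since each file yields a vector of sizes rather than a single number:
PeakSizes <- unlist(lapply(filelist.coverages, function(file) {
peaks <- read.delim(file, sep = '', header = FALSE)
peaks$V3 - peaks$V2
}))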
Related
I need to process all the files in a folder, and the files are named sequentially, so I think it is a good time for a loop. The code to process a single file is simple:
df<-read.table("CLIM0101.WTG", skip = 3, header = TRUE)
df<-df[,-1]
df$year<-2014
df$day<-c(1:365)
write.table(df, "clim201401.txt", rownames = "FALSE")
The 99 files to be read are "CLIM0101.WTG" through "CLIM9901.WTG" and they should be written to "clim201401.txt" through "clim201499.txt". Here's a link to the folder with the files:
https://www.dropbox.com/sh/y255e07wq5yj1nd/4dukOLxKgm
So what is the problem here? I don't understand how to write a loop, and haven't found a great description of how to do so. Previous loop questions have had non-loop answers, but it seems like this time it is really what I need.
I do that all the time. The basic idiom is
files <- list.files(....) # possibly with regexp
reslist <- lapply(files, function(f) { ... some expressions on f ... })
You simply need to encode your few steps into something like
myfun <- function(filename) {
df<-read.table(filename, skip = 3, header = TRUE)
df<-df[,-1]
df$year<-2014
df$day<-c(1:365)
newfile <- gsub(".WTG", ".txt", filename, fixed = TRUE)
write.table(df, newfile, row.names = FALSE) # don't quote FALSE
}
and now you use myfun, i.e. the above becomes
files <- list.files(....) # possibly with regexp
invisible(lapply(files, myfun))
Untested, obviously.
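Note that the function above only swaps the extension, whereas the question asks for names like "clim201401.txt". A minimal sketch of that renaming, assuming every input name has the form "CLIMnn01.WTG" as described in the question (the helper name wtg_to_txt is made up for illustration):

```r
# Hypothetical helper: map "CLIMnn01.WTG" to "clim2014nn.txt".
# Names that do not match the pattern are returned unchanged by sub().
wtg_to_txt <- function(filename) {
  sub("^CLIM([0-9]{2})01\\.WTG$", "clim2014\\1.txt", basename(filename))
}

new_names <- wtg_to_txt(c("CLIM0101.WTG", "CLIM9901.WTG"))
new_names
```

Inside myfun, newfile could then be computed with wtg_to_txt(filename) instead of gsub.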
I have several txt files in different directories. I want to read each file separately into R so that I can apply some analysis to each one later.
The directories are the same except for the last folder, as follows:
c:/Desktop/ATA/1/"files.txt"
c:/Desktop/ATA/2/"files.txt"
c:/Desktop/ATA/3/"files.txt"
...
...
The files in all directories have the same name, and the last folder is numbered from 1 up to the last one.
Create all the filenames to read using sprintf or something similar. Then use read.table or whatever you use to read the text files.
lapply(sprintf("c:/Desktop/ATA/%d/files.txt", 1:10), function(x)
read.table(x, header = TRUE))
Replace 10 with the number of folders you have.
Maybe you can try:
list_file <- list.files(path = "c:/Desktop/ATA", recursive = T, pattern = ".txt", full.names = T)
This will return the list of text files contained in your folder. Then, you can create a for loop to open them and apply some functions on each.
for(i in seq_along(list_file))
{
data = read.table(list_file[i], header = TRUE, sep = "\t")
# ... function to apply
}
First, thanks guys. I combined your code and modified it a little:
common_path = "c:/Desktop/ATA/"
primary_dirs = length(list.files(common_path)) # Gives no. of folders in path
list_file <- sprintf("c:/Desktop/ATA/%d/files.txt", 1:primary_dirs)
for(i in 1:length(list_file))
{
data = read.table(list_file[i],header = T, sep = "\t")
}
This way, the folders are ordered numerically (1, 2, 3) rather than alphabetically (1, 10, 11, 2, 3).
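If the folder names come from list.files() rather than from a known count, one way to get the same numeric ordering is to sort them as integers. This is only a sketch; the folder names below are made up, and it assumes every folder name is a plain number.

```r
# Made-up folder names, in the alphabetical order list.files() tends to return.
dirs <- c("1", "10", "11", "2", "3")

# Order by numeric value rather than by character.
dirs_sorted <- dirs[order(as.integer(dirs))]

# Build one path per folder, in numeric order.
paths <- file.path("c:/Desktop/ATA", dirs_sorted, "files.txt")
paths
```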
How can I read many CSV files and make each of them into data tables?
I have files of 'A1.csv' 'A2.csv' 'A3.csv'...... in Folder 'A'
So I tried this.
link <- c("C:/A")
filename<-list.files(link)
listA <- c()
for(x in filename) {
temp <- read.csv(paste0(link , x), header=FALSE)
listA <- list(unlist(listA, recursive=FALSE), temp)
}
And it doesn't work well. How can I do this job?
Write a regex to match the filenames
reg_expression <- "A[0-9]+"
files <- grep(reg_expression, list.files(directory), value = TRUE)
and then run the same loop but use assign to dynamically name the dataframes if you want
for(file in files){
assign(paste0(file, "_df"),read.csv(file))
}
But in general introducing unknown variables into the scope is bad practice so it might be best to do a loop like
dfs <- list()
for(index in seq_along(files)){
file <- files[index]
# use [[ so the whole data frame is stored, not just its first column
dfs[[index]] <- read.csv(file)
}
Unless each file has a completely different structure (i.e., different columns; the number of rows does not matter), you can consider a more efficient approach: read the files in using lapply and store them in a list. One of the benefits is that whatever you do to one frame can be done to all of them very easily using lapply.
files <- list.files(link, full.names = TRUE, pattern = "csv$")
list_of_frames <- lapply(files, read.csv)
# optional
names(list_of_frames) <- files # or basename(files), if filenames are unique
Something like sapply(list_of_frames, nrow) will tell you how many rows are in each frame. If you have something more complex,
new_list_of_frames <- lapply(list_of_frames, function(x) {
# do something with 'x', a single frame
})
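As a concrete illustration of the list-of-frames idea, here is a small sketch that runs on its own. The temporary directory and the two CSV files are made up; the final rbind step assumes all files share the same columns.

```r
# Write two small CSV files with identical columns (made-up data).
tmp_dir <- tempfile("csv_")
dir.create(tmp_dir)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(tmp_dir, "A1.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5, y = c("c", "d", "e")),
          file.path(tmp_dir, "A2.csv"), row.names = FALSE)

files <- list.files(tmp_dir, full.names = TRUE, pattern = "csv$")
list_of_frames <- lapply(files, read.csv)
names(list_of_frames) <- basename(files)

# One call applies the same operation to every frame.
row_counts <- sapply(list_of_frames, nrow)

# Stack all frames into one, if that is what you need.
combined <- do.call(rbind, list_of_frames)
```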
The most immediate problem is that when pasting your file path together, you need a path separator. When composing file paths, it's best to use the function file.path, which inserts the path separator for you. So you want to use:
read.csv(file.path(link, x), header=FALSE)
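The difference is easy to see without touching the disk. The values of link and x below are taken from the question purely as example strings:

```r
link <- "C:/A"
x <- "A1.csv"

# paste0() glues the pieces together with nothing in between.
bad <- paste0(link, x)

# file.path() puts a "/" between the components.
good <- file.path(link, x)

bad
good
```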
Better yet, have list.files return the full paths (you can also filter for .csv files):
filename <- list.files(link, full.names = TRUE, pattern = "csv$")
Combining with the idea to use assign to dynamically create the variables:
link <- c("C:/A")
files <-list.files(link, full.names = TRUE, pattern = "csv$")
for(file in files){
assign(paste0(basename(file), "_df"), read.csv(file))
}
I have a list of approximately 500 csv files each with a filename that consists of a six-digit number followed by a year (ex. 123456_2015.csv). I would like to append all files together that have the same six-digit number. I tried to implement the code suggested in this question:
Import and rbind multiple csv files with common name in R, but I want the appended data to be saved as new csv files in the same directory as the original files. I have also tried the code below; however, the csv files it produces contain no data.
rm(list=ls())
filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test")
NAPS_ID <- gsub('.+?\\([0-9]{5,6}?)\\_.+?$', '\\1', filenames)
Unique_NAPS_ID <- unique(NAPS_ID)
n <- length(Unique_NAPS_ID)
for(j in 1:n){
curr_NAPS_ID <- as.character(Unique_NAPS_ID[j])
NAPS_ID_pattern <- paste(".+?\\_(", curr_NAPS_ID,"+?)\\_.+?$", sep = "" )
NAPS_filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test", pattern = NAPS_ID_pattern)
write.csv(do.call("rbind", lapply(NAPS_filenames, read.csv, header = TRUE)),file = paste("C:/Users/smithma/Desktop/PM25_test/MERGED", "MERGED_", Unique_NAPS_ID[j], ".csv", sep = ""), row.names=FALSE)
}
Any help would be greatly appreciated.
Because you're not doing any data manipulation, you don't need to treat the files like tabular data. You only need to copy the file contents.
filenames <- list.files("C:/Users/smithma/Desktop/PM25_test", full.names = TRUE)
NAPS_ID <- substr(basename(filenames), 1, 6)
Unique_NAPS_ID <- unique(NAPS_ID)
for(curr_NAPS_ID in Unique_NAPS_ID){
NAPS_filenames <- filenames[startsWith(basename(filenames), curr_NAPS_ID)]
output_file <- paste0(
"C:/Users/smithma/Desktop/PM25_test/MERGED_", curr_NAPS_ID, ".csv"
)
for (fname in NAPS_filenames) {
line_text <- readLines(fname)
# Write the header from the first file
if (fname == NAPS_filenames[1]) {
cat(line_text[1], '\n', sep = '', file = output_file)
}
# Append every line in the file except the header
line_text <- line_text[-1]
cat(line_text, file = output_file, sep = '\n', append = TRUE)
}
}
My changes:
list.files(..., full.names = TRUE) is usually the best way to go.
Because the digits appear at the start of the filenames, I suggest substr. It's easier to get an idea of what's going on when skimming the code.
Instead of looping over the indices of a vector, loop over the values. It's more succinct and less likely to cause problems if the vector's empty.
startsWith and endsWith are relatively new functions, and they're great.
You only care about copying lines, so just use readLines to get them in and cat to get them out.
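The point about looping over values rather than indices can be checked with an empty vector. In this sketch the counters are made up for illustration; each one just records how many times its loop body runs.

```r
empty <- character(0)

# 1:length(empty) is 1:0, i.e. c(1, 0), so the body runs twice.
n_bad <- 0
for (j in 1:length(empty)) n_bad <- n_bad + 1

# seq_along() gives an empty sequence, so the body never runs.
n_idx <- 0
for (j in seq_along(empty)) n_idx <- n_idx + 1

# Looping over the values directly is just as safe.
n_val <- 0
for (v in empty) n_val <- n_val + 1
```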
You might consider something like this:
##will take the first 6 characters of each file name
six.digit.filenames <- substr(filenames, 1,6)
path <- "C:/Users/smithma/Desktop/PM25_test/"
unique.numbers <- unique(six.digit.filenames)
for(j in unique.numbers){
sub <- filenames[which(substr(filenames,1,6) == j)]
data.for.output <- c()
for(file in sub){
##now do your stuff with these files including read them in
data <- read.csv(paste0(path,file))
data.for.output <- rbind(data.for.output,data)
}
write.csv(data.for.output,paste0(path,j, '.csv'), row.names = F)
}
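The grouping step in both answers can also be expressed with split(), which returns one character vector of file names per six-digit prefix. This is a sketch with made-up file names; it assumes the prefix is always the first six characters.

```r
# Made-up file names following the "<six digits>_<year>.csv" pattern.
filenames <- c("123456_2015.csv", "123456_2016.csv", "654321_2015.csv")

# One list element per unique six-digit prefix.
groups <- split(filenames, substr(filenames, 1, 6))
groups
```

Each element of groups could then be read and combined in turn, for example with lapply over the list.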
My script reads in a list of text files from a folder. A calculation for all values in a few columns in each text file is made.
At the end I want to write the resulting data.frame into a new text file in a different location.
The problem is that the script keeps overwriting the file it created before, so I end up with only one file (the last one that was read in).
But I don't get what I am doing wrong here. The output file name is different each time, so in my head it should produce separate files.
The script looks as follows:
RAW <- "C:/path/tofiles"
files <- list.files(RAW, full.names = TRUE)
for(j in length(files)) {
if(file.exists(files[[j]])){
data <- read.csv(files[[j]], skip = 0, header=FALSE)
data[9] <- do.call(cbind,lapply(data[9], function(x){(data[9]*0.01701)/0.00848}))
data[11] <- do.call(cbind,lapply(data[11], function(x){(data[11]*0.01834)/0.00848}))
data[13] <- do.call(cbind,lapply(data[13], function(x){(data[13]*0.00982)/0.00848}))
data[15] <- do.call(cbind,lapply(data[15], function(x){(data[15]*0.01011)/0.00848}))
OUT <- paste("C:/path/to/destination_folder",basename(files[[j]]),sep="")
write.table(data, OUT, sep=",", row.names = FALSE, col.names = FALSE, append = FALSE)
}
}
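The problem is in your for loop. length(files) provides just one value, namely the length of your files vector, so the loop body runs only once, for the last file. I think you want a sequence of that length instead.
Try for(j in seq_along(files)), or iterate over the file names directly with for(f in files) and use f in place of files[[j]].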
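A quick way to convince yourself, using made-up file names and no file access at all:

```r
files <- c("a.csv", "b.csv", "c.csv")  # made-up names

# length(files) is the single number 3, so only files[[3]] is visited.
visited_wrong <- c()
for (j in length(files)) visited_wrong <- c(visited_wrong, files[[j]])

# seq_along(files) is 1, 2, 3, so every file is visited.
visited_right <- c()
for (j in seq_along(files)) visited_right <- c(visited_right, files[[j]])
```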