I have a series of files in a folder that look like this:
B_1.csv, B_1_2.csv, B_2.csv, B_2_2.csv, B_3.csv, B_4.csv, B_4_1.csv
Basically, I wish to merge any file whose name ends in '_2' into the file with the same number (i.e. B_1_2.csv merges with B_1.csv). A further complication is that some files (such as B_3.csv) do not have a second file (_2) and therefore need to be ignored. I cannot think of an easy way of doing this. Any help would be greatly appreciated. Many thanks.
Untested, of course, but this should work (or something close to it):
# identify CSV files
files = list.files(pattern = "csv$")
# look for ones that need merging
to_merge = grep("[0-9]_2\\.csv", files, value = TRUE)
# identify what they should be merged to
merge_target_file = sub("_2.csv", ".csv", to_merge, fixed = TRUE)
# make sure those exist
problems = setdiff(merge_target_file, files)
if(length(problems)) stop(paste(problems, collapse = ", "), " not found, need merge targets.")
# read in the data
merge_target = lapply(merge_target_file, read.csv, stringsAsFactors = FALSE)
to_merge = lapply(to_merge, read.csv, stringsAsFactors = FALSE)
# merge and write
for(i in seq_along(merge_target)) {
  # row.names = FALSE keeps write.csv from adding an extra row-number column
  write.csv(rbind(merge_target[[i]], to_merge[[i]]), file = merge_target_file[i], row.names = FALSE)
}
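To see which files the pattern picks up, here is a small check using the file names from the question. This is only a sketch: nothing is read from disk, and the `files` vector is typed in by hand for illustration.

```r
# File names copied from the question; nothing is read from disk here
files <- c("B_1.csv", "B_1_2.csv", "B_2.csv", "B_2_2.csv",
           "B_3.csv", "B_4.csv", "B_4_1.csv")

# Files ending in "<digit>_2.csv" are the ones to be merged
to_merge <- grep("[0-9]_2\\.csv$", files, value = TRUE)

# Strip the "_2" suffix to get the file each one merges into
merge_target_file <- sub("_2.csv", ".csv", to_merge, fixed = TRUE)

print(to_merge)           # "B_1_2.csv" "B_2_2.csv"
print(merge_target_file)  # "B_1.csv"   "B_2.csv"
```

B_3.csv, B_4.csv and B_4_1.csv match neither vector, so they are left alone.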
R beginner here.
I have a folder containing 150 csv files, named "student1", "student2", and so on.
Each file has 2 columns, Courses and Score.
I want to run a for loop over these files and store all of the data in a new data frame.
So far I have:
data_1 = dir(path_cwd, full.names = TRUE, pattern = "csv$")
for(i in data_1)
{
  b = read.csv(i, sep = ",", header = TRUE)
}
Please help me and explain it to me!
Much thanks
You can use lapply here. It is basically the same as a for loop, but it gives you more control over the operation. Here we use lapply to read each file and then, using do.call, bind all the data frames into one. Make sure all the csv files have the same number of columns and that the column names and order match; otherwise you may need some extra steps in between.
library(magrittr)  # provides the %>% pipe
data_1 = dir(path_cwd, full.names = TRUE, pattern = "csv$")
final_df <- lapply(data_1, function(i){
  read.csv(i, sep = ",", header = TRUE)
}) %>% do.call(what = rbind)
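A self-contained sketch of the same idea in base R, in case it helps to see it run end to end. The two example files, their contents, and the extra `student` column are made up for illustration; only the lapply plus do.call(rbind, ...) pattern comes from the answer above.

```r
# Create two small example files in a temporary folder (made-up data)
demo_dir <- file.path(tempdir(), "students_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Courses = c("Math", "Art"), Score = c(90, 75)),
          file.path(demo_dir, "student1.csv"), row.names = FALSE)
write.csv(data.frame(Courses = c("Math", "Art"), Score = c(60, 88)),
          file.path(demo_dir, "student2.csv"), row.names = FALSE)

csv_files <- dir(demo_dir, full.names = TRUE, pattern = "csv$")

# Read each file and record which student it came from
per_student <- lapply(csv_files, function(f) {
  d <- read.csv(f, header = TRUE, stringsAsFactors = FALSE)
  d$student <- sub("\\.csv$", "", basename(f))
  d
})

# Stack the list of data frames into one
all_students <- do.call(rbind, per_student)
print(all_students)
```

Tagging each row with its source file is optional, but without it the combined data frame no longer shows which student a score belongs to.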
I need to process all the files in a folder, and the files are named sequentially, so I think it is a good time for a loop. The code to process a single file is simple:
df<-read.table("CLIM0101.WTG", skip = 3, header = TRUE)
df<-df[,-1]
df$year<-2014
df$day<-c(1:365)
write.table(df, "clim201401.txt", rownames = "FALSE")
The 99 files to be read are "CLIM0101.WTG" through "CLIM9901.WTG" and they should be written to "clim201401.txt" through "clim201499.txt". Here's a link to the folder with the files:
https://www.dropbox.com/sh/y255e07wq5yj1nd/4dukOLxKgm
So what is the problem here? I don't understand how to write a loop, and haven't found a great description of how to do so. Previous loop questions have had non-loop answers, but it seems like this time it is really what I need.
I do that all the time. The basic idiom is
files <- list.files(....) # possibly with regexp
reslist <- lapply(files, function(f) { ... some expressions on f ... })
You simply need to encode your few steps into something like
myfun <- function(filename) {
  df <- read.table(filename, skip = 3, header = TRUE)
  df <- df[,-1]
  df$year <- 2014
  df$day <- c(1:365)
  newfile <- gsub(".WTG", ".txt", filename, fixed = TRUE)
  write.table(df, newfile, row.names = FALSE) # don't quote FALSE
}
and now you use myfun, i.e. the above becomes
files <- list.files(....) # possibly with regexp
invisible(lapply(files, myfun))
Untested, obviously.
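Note that swapping the extension turns "CLIM0101.WTG" into "CLIM0101.txt", not the "clim201401.txt" asked for in the question. One possible way to build the requested names is sketched below. It assumes the two-digit file number always sits in characters 5 and 6 of the input name, and `make_outname` is a hypothetical helper, not part of the answer above.

```r
# Hypothetical helper: map an input name such as "CLIM0101.WTG"
# to the output name the question asks for, "clim201401.txt"
make_outname <- function(filename) {
  # characters 5-6 of the base name hold the two-digit file number
  num <- substr(basename(filename), 5, 6)
  paste0("clim2014", num, ".txt")
}

# The 99 input names, CLIM0101.WTG through CLIM9901.WTG
infiles <- sprintf("CLIM%02d01.WTG", 1:99)
outfiles <- make_outname(infiles)

print(head(infiles, 2))   # "CLIM0101.WTG" "CLIM0201.WTG"
print(head(outfiles, 2))  # "clim201401.txt" "clim201402.txt"
```

Inside myfun, newfile could then be set with make_outname(filename) instead of the gsub call.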
So I have .csv's of nesting data that I need to trim. I wrote a series of functions in R and then spit out the new pretty .csv. The issue is that I need to do this with 59 .csv's and I would like to automate the name.
data1 <- read.csv("Nest001.csv", skip = 3, header=F)
functions functions functions
write.csv("Nest001_NEW.csv", file.path(out.path, edit), row.names=F)
So... is there any way for me to loop the name from Nest001 to Nest059 so that I don't have to delete and retype the name for every .csv?
EDIT to incorporate Gregor's suggestion:
One option:
filenames_in <- sprintf("Nest%03d.csv", 1:59)
filenames_out <- sub(pattern = "(\\d{3})(\\.)", replacement = "\\1_NEW\\2", filenames_in)
all_files <- matrix(c(filenames_in, filenames_out), ncol = 2)
And then loop through them:
for (i in seq_len(nrow(all_files))) {
  temp <- read.csv(all_files[[i, 1]], skip = 3, header = FALSE)
  # do stuff
  write.csv(temp, all_files[[i, 2]], row.names = FALSE)
}
To do this purrr-style, you would create two lists similar to the above, and then write a custom function to read in the file, perform all the functions, and then output it.
e.g.
purrr::walk2(
  .x = filenames_in,
  .y = filenames_out,
  .f = ~ my_function(.x, .y)
)
Consider .x and .y as the i in the for loop; it goes through both lists simultaneously, and performs the function on each item.
More info is available here.
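The same pairing can also be done in base R with Map, which, like walk2, steps through two vectors in parallel. A minimal sketch follows; `process_nest`, the two example files and the doubling step are made-up stand-ins for the real processing.

```r
# Work in a temporary folder with two made-up input files
demo_dir <- file.path(tempdir(), "nest_demo")
dir.create(demo_dir, showWarnings = FALSE)

in_names  <- sprintf("Nest%03d.csv", 1:2)
out_names <- sub("(\\d{3})(\\.)", "\\1_NEW\\2", in_names)
filenames_in  <- file.path(demo_dir, in_names)
filenames_out <- file.path(demo_dir, out_names)

for (f in filenames_in) {
  write.csv(data.frame(x = 1:3), f, row.names = FALSE)
}

# Hypothetical function: read one file, process it, write the result
process_nest <- function(infile, outfile) {
  temp <- read.csv(infile, header = TRUE)
  temp$x2 <- temp$x * 2   # stand-in for the real processing steps
  write.csv(temp, outfile, row.names = FALSE)
}

# Call process_nest once per (input, output) pair
invisible(Map(process_nest, filenames_in, filenames_out))
print(basename(filenames_out))  # "Nest001_NEW.csv" "Nest002_NEW.csv"
```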
Your best bet is to put all of these CSVs into one folder, without any other CSVs in that folder. Then, you can write a loop to go over every file in that folder, and read them in.
library(dplyr)
setwd("path to the folder with CSV's goes here")
combinedData = data.frame()
files = list.files()
for (file in files)
{
  # read the file, then append its rows to the running data frame
  currentData = read.csv(file)
  combinedData = bind_rows(combinedData, currentData)
}
EDIT: if there are other files in the folder that you don't want to read, you can add this line of code to only read in files that contain the word "Nest" in the title:
files = files[grepl("Nest", files)]
Note that grepl is case sensitive by default; pass ignore.case = TRUE if the match should ignore case.
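On the case-sensitivity question, a quick check with made-up file names shows the default behaviour of grepl and the effect of ignore.case:

```r
# Made-up file names for illustration
files <- c("Nest001.csv", "nest002.csv", "notes.csv")

# Default matching is case sensitive
strict <- grepl("Nest", files)
print(strict)   # TRUE FALSE FALSE

# ignore.case = TRUE also matches the lower-case "nest"
relaxed <- grepl("Nest", files, ignore.case = TRUE)
print(relaxed)  # TRUE TRUE FALSE
```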
I have several txt files in different directories. I want to read each file separately into R so that I can run some analysis on each one later.
The directories are the same except for the last folder, as follows:
c:/Desktop/ATA/1/"files.txt"
c:/Desktop/ATA/2/"files.txt"
c:/Desktop/ATA/3/"files.txt"
...
...
The files in all directories have the same name, and the last folder is numbered from 1 upward.
Create all the filenames to read using sprintf or something similar. Then use read.table or whatever you use to read the text files.
lapply(sprintf("c:/Desktop/ATA/%d/files.txt", 1:10), function(x)
read.table(x, header = TRUE))
Replace 10 with the number of folders you have.
Maybe you can try:
list_file <- list.files(path = "c:/Desktop/ATA", recursive = TRUE, pattern = "\\.txt$", full.names = TRUE)
This will return the list of text files contained in your folder. Then, you can create a for loop to open them and apply some functions on each.
for(i in seq_along(list_file))
{
  data = read.table(list_file[i], header = TRUE, sep = "\t")
  # ... function to apply
}
First, thanks guys. I combined your code and modified it a little:
common_path = "c:/Desktop/ATA/"
primary_dirs = length(list.files(common_path)) # Gives no. of folders in path
list_file <- sprintf("c:/Desktop/ATA/%d/files.txt", 1:primary_dirs)
for(i in 1:length(list_file))
{
data = read.table(list_file[i],header = T, sep = "\t")
}
This way the folders are processed in numeric order (1, 2, 3, ...) rather than alphabetical order (1, 10, 11, 2, 3).
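If the folder names come from a directory listing rather than from sprintf, they can be put into numeric order explicitly. A small sketch, with the folder names typed in by hand in the alphabetical order mentioned above:

```r
# Folder names in alphabetical order (typed in for illustration)
dirs <- c("1", "10", "11", "2", "3")

# Reorder by numeric value instead of by character string
ordered_dirs <- dirs[order(as.numeric(dirs))]
print(ordered_dirs)  # "1" "2" "3" "10" "11"

# Build the full path to files.txt in each folder
list_file <- file.path("c:/Desktop/ATA", ordered_dirs, "files.txt")
print(list_file[4])  # "c:/Desktop/ATA/10/files.txt"
```

This assumes every folder name is a plain number; as.numeric returns NA with a warning for anything else.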
My script reads in a list of text files from a folder. A calculation for all values in a few columns in each text file is made.
At the end I want to write the resulting data.frame into a new text file in a different location.
The problem is, that the script keeps overwriting the file it created before. So I end up with only one file (the last one that was read in).
But I don't get what I am doing wrong here. The output file name is different each time, so in my head it should produce separate files.
The script looks as follows:
RAW <- "C:/path/tofiles"
files <- list.files(RAW, full.names = TRUE)
for(j in length(files)) {
if(file.exists(files[[j]])){
data <- read.csv(files[[j]], skip = 0, header=FALSE)
data[9] <- do.call(cbind,lapply(data[9], function(x){(data[9]*0.01701)/0.00848}))
data[11] <- do.call(cbind,lapply(data[11], function(x){(data[11]*0.01834)/0.00848}))
data[13] <- do.call(cbind,lapply(data[13], function(x){(data[13]*0.00982)/0.00848}))
data[15] <- do.call(cbind,lapply(data[15], function(x){(data[15]*0.01011)/0.00848}))
OUT <- paste("C:/path/to/destination_folder",basename(files[[j]]),sep="")
write.table(data, OUT, sep=",", row.names = FALSE, col.names = FALSE, append = FALSE)
}
}
The problem is in your for loop. length(files) returns a single value, namely the length of your files vector, so the loop body runs only once, for the last file. What you want is a sequence of that length.
Try seq_along(files), or just for(j in files), in which case use j directly instead of files[[j]].
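The difference is easy to see on a short vector. This sketch uses made-up file names and only records which elements each loop visits:

```r
files <- c("a.csv", "b.csv", "c.csv")  # made-up names

# for(j in length(files)) iterates over the single value 3
visited_wrong <- character(0)
for (j in length(files)) {
  visited_wrong <- c(visited_wrong, files[[j]])
}
print(visited_wrong)  # "c.csv"

# for(j in seq_along(files)) iterates over 1, 2, 3
visited_right <- character(0)
for (j in seq_along(files)) {
  visited_right <- c(visited_right, files[[j]])
}
print(visited_right)  # "a.csv" "b.csv" "c.csv"
```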