Example Data
I'm writing a script to copy input files, each to multiple locations. Below is an example of working code that achieves this:
##### File 1 #####
output_paths_1 <- list(c(paste0(path_1, "file_1", ".xlsx"),
                         paste0(path_2, "file_1", ".xlsx"),
                         paste0(path_3, "file_1", " ", gsub("-", "", Sys.Date()), ".xlsx")))
lapply(output_paths_1, function(x) file.copy(paste0(input_path, "input_1.xlsx"), x, overwrite = T))
##### File 2 #####
output_paths_2 <- list(c(paste0(path_1, "file_2", ".xlsx"),
                         paste0(path_2, "file_2", ".xlsx"),
                         paste0(path_3, "file_2", " ", gsub("-", "", Sys.Date()), ".xlsx")))
lapply(output_paths_2, function(x) file.copy(paste0(input_path, "input_2.xlsx"), x, overwrite = T))
##### File 3 #####
output_paths_3 <- list(c(paste0(path_1, "file_3", ".xlsx"),
                         paste0(path_2, "file_3", ".xlsx"),
                         paste0(path_3, "file_3", " ", gsub("-", "", Sys.Date()), ".xlsx")))
lapply(output_paths_3, function(x) file.copy(paste0(input_path, "input_3.xlsx"), x, overwrite = T))
Reprex
But I suspect there are more efficient methods. In my latest attempt, which does not work, I use a nested for loop. I create data frames containing the input and output file names. Then (in theory), for each input, I build a data frame of output paths for each file, and I filter that data frame down to one file at a time using grepl. See the code below:
files <- data.frame(data = c("file_1", "file_2", "file_3"))
inputs <- data.frame(data = c("input_1.xlsx", "input_2.xlsx", "input_3.xlsx"))

for (i in seq_along(inputs)) {
  for (i in seq_along(files)) {
    output_paths <- data.frame(data = c(paste0(path_1, files[[i]], ".xlsx"),
                                        paste0(path_2, files[[i]], ".xlsx"),
                                        paste0(path_3, files[[i]], " ", gsub("-", "", Sys.Date()), ".xlsx"))) %>%
      filter(grepl(files[[i]], `data`))
    lapply(output_paths, function(x) file.copy(paste0(input_path, inputs[[i]]), x, overwrite = T))
  }
}
I expected this to copy the first file to three locations, then the next file to those same locations, etc. Instead, the following Warning appears, and only the first file is copied to the desired locations:
Warning message:
In grepl(files[[i]], data) :
argument 'pattern' has length > 1 and only the first element will be used
Running the code without including the grepl function does nothing at all - no files are copied to the desired locations.
Questions:
How might I tweak the code above to iterate for all elements, instead of the first element only?
Is there a more elegant approach entirely? (just looking for pointers, not reprex necessarily)
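On the first question: the two loops reuse the same index i, and seq_along() on a single-column data frame has length one, so the loop body runs only once and files[[i]] pulls out the whole column as the grepl pattern (hence the warning). A minimal sketch of the same idea with plain character vectors and one loop index, assuming path_1, path_2, path_3 and input_path are defined as in the first block; the grepl filter then becomes unnecessary because each iteration builds the paths for a single file:
files  <- c("file_1", "file_2", "file_3")
inputs <- c("input_1.xlsx", "input_2.xlsx", "input_3.xlsx")

for (i in seq_along(inputs)) {
  output_paths <- c(paste0(path_1, files[i], ".xlsx"),
                    paste0(path_2, files[i], ".xlsx"),
                    paste0(path_3, files[i], " ", gsub("-", "", Sys.Date()), ".xlsx"))
  # single input copied to all three destinations, as in the working code above
  file.copy(paste0(input_path, inputs[i]), output_paths, overwrite = TRUE)
}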
I don't understand what you are trying to accomplish with your "Reprex" approach. But if you want to do what your first bit of code does while writing less code, then you could do something like
files = c("file1", "file2", "file3") # file names
opaths = c("path1", "path2", "path3") # output paths
df = expand.grid(file = files, path = opaths, stringsAsFactors = F)
df$from = file.path(input_path, df$file)
df$to = file.path(df$path, df$file)
file.copy(from = df$from, to = df$to)
If you want the timestamp in the file name for path3, you could then do something like
df$to[df$path == "path3"] <- file.path(df$path[df$path == "path3"],
paste0(format(Sys.Date(), "%Y%m%d_"), df$file[df$path == "path3"])
)
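Applied to the naming in the question, this could look roughly like the sketch below. The stems vector and the match() pairing of output stems with input files are illustrative, and path_1, path_2, path_3 and input_path are assumed to be defined as before:
stems  <- c("file_1", "file_2", "file_3")    # output name stems from the question
inputs <- paste0("input_", 1:3, ".xlsx")     # matching input files
paths  <- c(path_1, path_2, path_3)          # destination folders, already defined

df <- expand.grid(stem = stems, path = paths, stringsAsFactors = FALSE)
df$from <- paste0(input_path, inputs[match(df$stem, stems)])
df$to   <- ifelse(df$path == path_3,
                  paste0(df$path, df$stem, " ", format(Sys.Date(), "%Y%m%d"), ".xlsx"),
                  paste0(df$path, df$stem, ".xlsx"))
file.copy(from = df$from, to = df$to, overwrite = TRUE)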
Related
I have a list of 50 text files all beginning with NEW.
I want to loop through each text file/data frame, run some function, and then output the results via the write.table function. So for each file, a function is applied and an output file should be created containing the original name with "output" appended at the end.
Here is my code.
fileNames <- Sys.glob("*NEW.*")

for (fileName in fileNames) {
  df <- read.table(fileName, header = TRUE)

  # FUNCTION (not shown, as this works)
  # ...

  result <- print(chr1$results)  # for each file a result would be printed
  write.table(result, file = paste0(fileName, "_output.txt"),
              quote = F, sep = "\t", row.names = F, col.names = T)
  # for each file a new separate file is created with the original output name retained
}
However, I only get one output file rather than 50. It seems like it's only looping through one file. What am I doing wrong?
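One thing worth checking, going by the snippet: result is taken from a fixed object (chr1) rather than from the data just read, so every iteration may write the same result. A minimal sketch of the intended pattern, with a hypothetical my_function() standing in for the unshown function:
fileNames <- Sys.glob("*NEW.*")   # "NEW*" may be what is meant for files beginning with NEW

for (fileName in fileNames) {
  df <- read.table(fileName, header = TRUE)
  result <- my_function(df)       # placeholder for the working (unshown) function
  write.table(result, file = paste0(fileName, "_output.txt"),
              quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)
}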
readme <- function(folder_name = "my_texts"){
  file_list <- list.files(path = folder_name, pattern = "*.txt",
                          recursive = TRUE, full.names = TRUE)
  # list files with .txt ending
  textdata <- lapply(file_list, function(x) {
    paste(readLines(x), collapse = " ")
  })
  # apply readLines over the file list
  data.table::setattr(textdata, "names", file_list)
  # add names attribute to textdata from file_list
  lapply(names(file_list), function(x){
    lapply(names(file_list[[x]]), function(y) setattr(DT[[x]], y,
                                                      file_list[[x]][[y]]))
  })
  # set names attribute over the list
  df1 <- data.frame(doc_id = rep(names(textdata), lengths(textdata)),
                    doc_text = unlist(textdata), row.names = NULL)
  # convert to dataframe where names attribute is doc_id and textdata is text
  return(df1)
}
I found this code here, and it worked to convert the '.txt' to '.csv', but the file is not broken into columns. I'm pretty sure there's an easy fix or a line to add here, but I'm not finding it. I'm still new to R and working through it, so any help or direction is appreciated.
EDIT: The file contains the following, a list of invasive plants:
Header: Noxious Weed List.
'(a) Abrus precatorius – rosary pea '
'(b) Aeginetia spp. – aeginetia'
'(c) Ageratina adenophora – crofton weed '
'(d) Alectra spp. – alectra '
And so I would like to get all the parts, i.e., genus, species, and common name, each in a separate column, and, if possible, delete the '(a)' labels and the ' – ' separating dash.
filelist = list.files(pattern = ".txt")
for (i in 1:length(filelist)) {
  input <- filelist[i]
  output <- paste0(gsub("\\.txt$", "", input), ".csv")
  print(paste("Processing the file:", input))
  data = read.delim(input, header = TRUE)
  write.table(data, file = output, sep = ",", col.names = TRUE, row.names = FALSE)
}
You'll need to adjust if you have common names with three or more words, but this is the general idea:
path <- "C:\\Your File Path Here\\"
file <- paste0(path, "WeedList.txt")
DT <- read.delim(file, header = FALSE, sep = " ")
DT <- DT[-c(1),-c(1,4,7)]
colnames(DT) <- c("Genus", "Species", "CommonName", "CommonName2")
DT$CommonName <- gsub("'", "", DT$CommonName)
DT$CommonName2 <- gsub("'", "", DT$CommonName2)
DT$CommonName <- paste(DT$CommonName, DT$CommonName2, sep = " ")
DT <- DT[,-c(4)]
write.csv(DT, paste0(path, "WeedList.csv"), row.names = FALSE)
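If some common names run to three or more words, an alternative sketch is to split on the ' – ' separator instead of spaces. This assumes the lines look exactly like the EDIT above, with an en dash between the Latin name and the common name (swap in a plain hyphen if that is what the file actually uses), and reuses the path and file objects defined above:
lines <- readLines(file)                       # 'file' as defined above
lines <- trimws(gsub("'", "", lines[-1]))      # drop the header line and the quotes
lines <- sub("^\\([a-z]+\\)\\s*", "", lines)   # drop the "(a)", "(b)", ... labels
parts <- strsplit(lines, " – ", fixed = TRUE)  # split on the en dash separator
latin <- trimws(sapply(parts, `[`, 1))
DT <- data.frame(Genus      = sub(" .*$", "", latin),
                 Species    = sub("^\\S+ ", "", latin),
                 CommonName = trimws(sapply(parts, `[`, 2)))
write.csv(DT, paste0(path, "WeedList.csv"), row.names = FALSE)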
I have code that reads two different csv files from a folder at the time of execution. I need to use a for loop in this context to execute this multiple times and write the output into a separate csv file of the form "bsc_.csv". The two input csv files follow the naming patterns "base_.csv" and "fut_.csv". The files are incrementally numbered, and that is the pattern I need to iterate over. The sample code is attached below.
library('CDFt')

d1 <- read.csv("base1.csv", header = TRUE)
d2 <- read.csv("fut1.csv", header = TRUE)
A1 <- d1[, 2]
A2 <- d1[, 3]
A3 <- d2[, 2]
CT <- CDFt(A1, A2, A3)
x <- CT$x
FGp <- CT$FGp
FGf <- CT$FGf
FRp <- CT$FRp
FRf <- CT$FRf
ds <- CT$DS
d <- round(ds, 3)
dat <- replace(d, d < 0, 0)
write.table(dat, "bsc1.csv", row.names = F, na = "NA", append = T,
            quote = FALSE, sep = ",", col.names = F)
Try this (untested):
bases <- list.files(pattern = "base[0-9]*\\.csv$")
futs <- list.files(pattern = "fut[0-9]*\\.csv$")

mismatches <- setdiff(gsub("^base", "", bases), gsub("^fut", "", futs))
if (length(mismatches)) {
  warning("'bases' files not in 'futs': ", paste(sQuote(mismatches), collapse = ", "))
  bases <- setdiff(bases, paste0("base", mismatches))
}
# and the reverse
mismatches <- setdiff(gsub("^fut", "", futs), gsub("^base", "", bases))
if (length(mismatches)) {
  warning("'futs' files not in 'bases': ", paste(sQuote(mismatches), collapse = ", "))
  futs <- setdiff(futs, paste0("fut", mismatches))
}

ign <- Map(function(fb, ff) {
  bdat <- read.csv(fb, header = TRUE)
  fdat <- read.csv(ff, header = TRUE)
  # ...
  newfn <- gsub("^base", "bsc", fb)
  write.table(dat, newfn, ...)
}, bases, futs)
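For reference, the body of the Map() call could carry over the CDFt steps from the question more or less verbatim. A sketch, untested, keeping the original column indices; the append = T from the original write.table is dropped here since each output file is written once:
library(CDFt)

ign <- Map(function(fb, ff) {
  d1 <- read.csv(fb, header = TRUE)
  d2 <- read.csv(ff, header = TRUE)
  CT <- CDFt(d1[, 2], d1[, 3], d2[, 2])
  d <- round(CT$DS, 3)
  dat <- replace(d, d < 0, 0)
  newfn <- gsub("^base", "bsc", fb)               # e.g. base3.csv -> bsc3.csv
  write.table(dat, newfn, row.names = FALSE, na = "NA",
              quote = FALSE, sep = ",", col.names = FALSE)
}, bases, futs)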
I have several .csv files that have to be reformatted and saved again using an R script.
The function needed to make the changes and reformat the files is already established and works perfectly fine. But as there are always lots of documents to change, I would like to have a for loop so that I don't have to adapt my code for every single document. Unfortunately, I don't have any experience with loops in R so far.
My code looks like this at the moment:
setwd("C:/users/Desktop/Raw/.")
df <- read.csv("A1.csv", sep= ",")
new_df <- wrap_frame(df, nr = 61, rownames = "", unique_names = FALSE)
write.csv(new_df, "C:/users/Desktop/Data/A1.csv", row.names = FALSE)
The original .csv files are always named the same way: a letter (A to Z) followed by a number from 1 to 12. The number of .csv files to change may vary, but their names always follow the rules mentioned.
I would be very grateful, if somebody could help me with this issue!
You can get a vector of all the filenames in your folder (assuming this folder contains no files other than the ones you want to edit) with
setwd( "C:/users/Desktop/Raw/" )
files <- Sys.glob( "*.csv" )
and then process them one by one with
for (i in files) {
  df <- read.csv(i)
  new_df <- wrap_frame(df, nr = 61, rownames = "", unique_names = FALSE)
  write.csv(new_df, paste("C:/users/Desktop/Data/", i, sep = ""), row.names = FALSE)
}
Try out:
# vector of file names
my.files <- paste0(c(outer(LETTERS, 1:12, FUN = "paste0")), ".csv")

# for loop
for (i in seq_along(my.files)) {
  df <- read.csv(my.files[i], sep = ",")                                   # open
  new_df <- wrap_frame(df, nr = 61, rownames = "", unique_names = FALSE)   # mutate
  write.csv(new_df, paste0("C:/users/Desktop/Data/", my.files[i]),
            row.names = FALSE)                                             # save
}
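Since the vector covers every combination from A1.csv to Z12.csv, read.csv would fail on any name that is not actually present. One way to guard against that (a sketch, assuming the working directory is the Raw folder as in the previous answer) is to keep only the files that exist before looping:
my.files <- my.files[file.exists(my.files)]   # drop combinations with no matching file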
I have a folder of files that are in .csv format. They have blank lines in them that are necessary (a blank line indicates an absence of a measure from a LiDAR unit, which is good and needs to stay in). But occasionally the first row is empty, which throws off the code and the package, and everything aborts.
Right now I have to open each .csv and see if the first line is empty.
I would like to do one of the following, but am at a loss how to:
1) write code that quickly scans through all of the files in the directory and tells me which ones are missing the first line
2) be able to skip the empty lines that are only at the beginning, which can vary; sometimes more than one line is empty
3) have code that cycles through all of the .csv files and inserts a dummy first line of numbers so the files all import with no problem.
Thanks!
Here's a bit of code that does 1 and 2 above. I'm not sure why you'd want to insert dummy line(s) given the ability to do 1 and 2; it's straightforward to do, but usually it's not a good idea to modify raw data files.
# Create some test files
cat("x,y", "1,2", sep = "\n", file = "blank0.csv")
cat("", "x,y", "1,2", sep = "\n", file = "blank1.csv")
cat("", "", "x,y", "1,2", sep = "\n", file = "blank2.csv")

files <- list.files(pattern = "*.csv", full.names = TRUE)
for (i in seq_along(files)) {
  filedata <- readLines(files[i])
  lines_to_skip <- min(which(filedata != "")) - 1
  cat(i, files[i], lines_to_skip, "\n")
  x <- read.csv(files[i], skip = lines_to_skip)
}
This prints
1 ./blank0.csv 0
2 ./blank1.csv 1
3 ./blank2.csv 2
and reads in each dataset correctly.
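To keep the datasets instead of overwriting x on each pass through the loop, the same skip logic can be wrapped in lapply (a sketch building on the files vector above):
datasets <- lapply(files, function(f) {
  filedata <- readLines(f)
  lines_to_skip <- min(which(filedata != "")) - 1   # blank lines before the header
  read.csv(f, skip = lines_to_skip)
})
names(datasets) <- basename(files)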
I believe that the two functions that follow can do what you want/need.
First, a function to determine which files have a blank second line.
second_blank <- function(path = ".", pattern = "\\.csv"){
  fls <- list.files(path = path, pattern = pattern)
  second <- sapply(fls, function(f) readLines(f, n = 2)[2])
  which(nchar(gsub(",", "", second)) == 0)
}
Then, a function to read in the files with such lines, one at a time. Note that I assume that the first line is the column header and that at least the second line is left blank. There is a dots argument, ..., for you to pass other arguments to read.table, such as stringsAsFactors = FALSE.
skip_blank <- function(file, ...){
  header <- readLines(file, n = 1)
  header <- strsplit(header, ",")[[1]]
  count <- 1L
  while(TRUE){
    txt <- scan(file, what = "character", skip = count, nlines = 1)
    if(nchar(gsub(",", "", txt)) > 0) break
    count <- count + 1L
  }
  dat <- read.table(file, skip = count, header = TRUE, sep = ",", dec = ".", fill = TRUE, ...)
  names(dat) <- header
  dat
}
Now, an example usage.
second_blank(pattern = "csv") # a first run as an example usage
inx <- second_blank() # this will be needed later
fl_names <- list.files(pattern = "\\.csv") # get all the CSV files
df_list <- lapply(fl_names[inx], skip_blank) # read the problem ones
names(df_list) <- fl_names[inx] # tidy up the result list
df_list