I have R code that loops over a list of search words and downloads data for each word from a website. The result for each search word is then saved as a csv file:
...some code...
x <- try(read.table(text=res, sep=",", col.names=c("Week", "TrendsCount"), skip=31, nrows=515))
for(i in 1:iterations){
...some code...
filename <- paste(wordlist[i], "csv", sep = ".")
write.table(x, file = filename, sep = ";", col.names = NA, qmethod = "double")
}
Sometimes the list contains a search word that returns an error when the code is executed, because the word does not exist on the website. This stops the loop. How can I make the loop skip the write.table part and just continue with the next word in the list?
Just wrap the write.table call in
try(..., silent = TRUE)
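If the error is raised before write.table is reached (for example by the download or read.table step), wrapping only write.table will not help. A minimal sketch of skipping the whole iteration with tryCatch and next follows. The fetch_trend() helper and the word list are hypothetical stand-ins for the elided download code, not part of the original.

```r
# Hypothetical stand-in for the download step: fails for unknown words.
fetch_trend <- function(word) {
  if (word == "missingword") stop("word not found on the website")
  data.frame(Week = 1:3, TrendsCount = c(10, 20, 30))
}

wordlist <- c("alpha", "missingword", "beta")  # made-up search words
out_dir <- tempdir()
written <- character(0)

for (word in wordlist) {
  # tryCatch returns NULL when fetch_trend() signals an error
  x <- tryCatch(fetch_trend(word), error = function(e) NULL)
  if (is.null(x)) next  # skip write.table and continue with the next word

  filename <- file.path(out_dir, paste(word, "csv", sep = "."))
  write.table(x, file = filename, sep = ";", col.names = NA, qmethod = "double")
  written <- c(written, filename)
}
```

Only the two words that could be fetched end up in written; the failing word produces no file and does not stop the loop.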
Related
I have a list of 50 text files all beginning with NEW.
I want to loop through each text file/data frame, run some function, and then write the results with the write.table function. So for each file a function is applied, and an output file should be created whose name is the original name with "_output" at the end.
Here is my code.
fileNames <- Sys.glob("*NEW.*")
for (fileName in fileNames) {
df <- read.table(fileName, header = TRUE)
FUNCTION (not shown as this works)
...
result <-print(chr1$results) #for each file a result would be printed.
write.table(result, file = paste0(fileName,"_output.txt"), quote = F, sep = "\t", row.names = F, col.names = T)
#for each file a new separate file is created with the original output name retained.
}
However, I only get one output file rather than 50. It seems like it's only looping through one file. What am I doing wrong?
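One possible cause, stated as an assumption because the actual file names are not shown: the glob "*NEW.*" matches names that have "NEW." somewhere before the extension, whereas files that begin with NEW need a pattern such as "NEW*". The sketch below uses three made-up files in a temporary folder and a placeholder computation in place of the real function.

```r
work_dir <- file.path(tempdir(), "new_files_demo")
unlink(work_dir, recursive = TRUE)
dir.create(work_dir)

# Three made-up input files whose names begin with NEW
for (nm in c("NEW_a.txt", "NEW_b.txt", "NEW_c.txt")) {
  write.table(data.frame(value = 1:3), file.path(work_dir, nm),
              row.names = FALSE, quote = FALSE)
}

# "NEW*" matches names that start with NEW; "*NEW.*" matches none of these
fileNames <- Sys.glob(file.path(work_dir, "NEW*.txt"))

for (fileName in fileNames) {
  df <- read.table(fileName, header = TRUE)
  result <- data.frame(total = sum(df$value))  # placeholder for the real function
  outName <- paste0(tools::file_path_sans_ext(fileName), "_output.txt")
  write.table(result, file = outName, quote = FALSE, sep = "\t",
              row.names = FALSE, col.names = TRUE)
}
```

Printing fileNames before the loop is a quick way to check how many files the pattern really matched.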
readme <- function(folder_name = "my_texts"){
  file_list <- list.files(path = folder_name, pattern = "*.txt",
                          recursive = TRUE, full.names = TRUE)
  #list files with .txt ending
  textdata <- lapply(file_list, function(x) {
    paste(readLines(x), collapse=" ")
  })
  #apply readlines over the file list.
  data.table::setattr(textdata, "names", file_list)
  #add names attribute to textdata from file_list.
  lapply(names(file_list), function(x){
    lapply(names(file_list[[x]]), function(y) setattr(DT[[x]], y,
                                                      file_list[[x]][[y]]))
  })
  #set names attribute over the list.
  df1 <- data.frame(doc_id = rep(names(textdata), lengths(textdata)),
                    doc_text = unlist(textdata), row.names = NULL)
  #convert to dataframe where names attribute is doc_id and textdata is text.
  return(df1)
}
I am running the following loop in R:
for (i in 1:4) {
do some operation on these files
output.file <- 'results.csv'
write.table(file = output.file, x = MYFILE, append=TRUE, col.names=FALSE, row.names = FALSE, sep = "\t")
}
Since all the outputs have the same header, I want to write the header only once, at the top of the file, and drop all the others. Alternatively, I could add a header afterwards. With col.names=FALSE I get no header at all. How can I do this in R?
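One way to get a single header is to let the first iteration create the file with column names and have the later iterations append without them. This is a minimal sketch; the small data frame is made up to stand in for MYFILE, and the output goes to a temporary folder.

```r
output.file <- file.path(tempdir(), "results.csv")
if (file.exists(output.file)) file.remove(output.file)

for (i in 1:4) {
  # Made-up data standing in for MYFILE
  MYFILE <- data.frame(run = i, value = i * 10)
  first <- (i == 1)
  write.table(MYFILE, file = output.file, sep = "\t",
              append = !first,     # overwrite on the first pass, append afterwards
              col.names = first,   # write the header only once
              row.names = FALSE, quote = FALSE)
}
```

Because append is FALSE on the first pass, write.table does not warn about appending column names to an existing file.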
I have a list of approximately 500 csv files, each with a filename that consists of a six-digit number followed by a year (e.g. 123456_2015.csv). I would like to append together all files that have the same six-digit number. I tried to implement the code suggested in the question
"Import and rbind multiple csv files with common name in R", but I want the appended data to be saved as new csv files in the same directory as the original files. I have also tried the code below; however, the csv files it produces contain no data.
rm(list=ls())
filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test")
NAPS_ID <- gsub('.+?\\([0-9]{5,6}?)\\_.+?$', '\\1', filenames)
Unique_NAPS_ID <- unique(NAPS_ID)
n <- length(Unique_NAPS_ID)
for(j in 1:n){
curr_NAPS_ID <- as.character(Unique_NAPS_ID[j])
NAPS_ID_pattern <- paste(".+?\\_(", curr_NAPS_ID,"+?)\\_.+?$", sep = "" )
NAPS_filenames <- list.files(path = "C:/Users/smithma/Desktop/PM25_test", pattern = NAPS_ID_pattern)
write.csv(do.call("rbind", lapply(NAPS_filenames, read.csv, header = TRUE)),file = paste("C:/Users/smithma/Desktop/PM25_test/MERGED", "MERGED_", Unique_NAPS_ID[j], ".csv", sep = ""), row.names=FALSE)
}
Any help would be greatly appreciated.
Because you're not doing any data manipulation, you don't need to treat the files like tabular data. You only need to copy the file contents.
filenames <- list.files("C:/Users/smithma/Desktop/PM25_test", full.names = TRUE)
NAPS_ID <- substr(basename(filenames), 1, 6)
Unique_NAPS_ID <- unique(NAPS_ID)
for(curr_NAPS_ID in Unique_NAPS_ID){
NAPS_filenames <- filenames[startsWith(basename(filenames), curr_NAPS_ID)]
output_file <- paste0(
"C:/Users/smithma/Desktop/PM25_test/MERGED_", curr_NAPS_ID, ".csv"
)
for (fname in NAPS_filenames) {
line_text <- readLines(fname)
# Write the header from the first file
if (fname == NAPS_filenames[1]) {
cat(line_text[1], '\n', sep = '', file = output_file)
}
# Append every line in the file except the header
line_text <- line_text[-1]
cat(line_text, file = output_file, sep = '\n', append = TRUE)
}
}
My changes:
list.files(..., full.names = TRUE) is usually the best way to go.
Because the digits appear at the start of the filenames, I suggest substr. It's easier to get an idea of what's going on when skimming the code.
Instead of looping over the indices of a vector, loop over the values. It's more succinct and less likely to cause problems if the vector's empty.
startsWith and endsWith are relatively new functions (added in R 3.3.0), and they're great.
You only care about copying lines, so just use readLines to get them in and cat to get them out.
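A small sketch of two of the points above, using made-up file names: why looping over 1:length(x) misbehaves on an empty vector, and how startsWith selects files by prefix without a regular expression.

```r
ids <- character(0)  # e.g. no files were found

# 1:length(ids) is 1:0, i.e. c(1, 0), so this loop body runs twice
index_runs <- 0
for (j in 1:length(ids)) index_runs <- index_runs + 1

# Looping over the values themselves runs zero times
value_runs <- 0
for (id in ids) value_runs <- value_runs + 1

# startsWith() is a plain prefix test, no regular expression needed
files <- c("123456_2015.csv", "123456_2016.csv", "654321_2015.csv")
same_id <- files[startsWith(files, "123456")]
```

When an index is genuinely needed, seq_along(ids) gives the same protection against empty vectors.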
You might consider something like this:
##will take the first 6 characters of each file name
six.digit.filenames <- substr(filenames, 1,6)
path <- "C:/Users/smithma/Desktop/PM25_test/"
unique.numbers <- unique(six.digit.filenames)
for(j in unique.numbers){
sub <- filenames[which(substr(filenames,1,6) == j)]
data.for.output <- c()
for(file in sub){
##now do your stuff with these files including read them in
data <- read.csv(paste0(path,file))
data.for.output <- rbind(data.for.output,data)
}
write.csv(data.for.output,paste0(path,j, '.csv'), row.names = F)
}
I have a large number of files to import which are all saved as zip files.
From reading other posts it seems I need to pass the zip file name and then the name of the file I want to open. Since I have a lot of them I thought I could loop through all the files and import them one by one.
Is there a way to pass the name dynamically or is there an easier way to do this?
Here is what I have so far:
Temp_Data <- NULL
Master_Data <- NULL
file.names <- c("f1.zip", "f2.zip", "f3.zip", "f4.zip", "f5.zip")
for (i in 1:length(file.names)) {
zipFile <- file.names[i]
dataFile <- sub(".zip", ".csv", zipFile)
Temp_Data <- read.table(unz(zipFile,
dataFile), sep = ",")
Master_Data <- rbind(Master_Data, Temp_Data)
}
I get the following error:
In open.connection(file, "rt") :
I can import them manually using:
dt <- read.table(unz("D:/f1.zip", "f1.csv"), sep = ",")
I can create the string dynamically, but it feels long-winded and doesn't work when I wrap it with read.table(unz(...)). It seems it can't find the file name and so throws an error:
cat(paste(toString(shQuote(paste("D:/",zipFile, sep = ""))),",",
toString(shQuote(dataFile)), sep = ""), "\n")
But if I then print this to the console I get:
"D:/f1.zip","f1.csv"
I can then paste this into read.table(unz(...)) and it works, so I feel like I am close.
I've tagged in data.table since this is what I almost always use so if it can be done with 'fread' that would be great.
Any help is appreciated
You can use the list.files command here.
First set your working directory to the folder where all your files are stored:
setwd("C:/Users/...")
Then list the zip files (pattern is a regular expression, not a glob):
file.names <- list.files(pattern = "\\.zip$", recursive = FALSE)
Then your for loop will be:
library(readr)  # provides write_delim
Master_Data <- NULL
for (i in seq_along(file.names)) {
  # open the file inside the archive
  zipFile <- file.names[i]
  dataFile <- sub("\\.zip$", ".csv", zipFile)
  Temp_Data <- read.table(unz(zipFile, dataFile), sep = ",")
  # your function for the opened file
  Master_Data <- rbind(Master_Data, Temp_Data)
  # finally write the file; a .txt name is used so the zip archive is not overwritten
  outFile <- sub("\\.zip$", ".txt", zipFile)
  write_delim(Master_Data, outFile, delim = "\t", col_names = TRUE)
}
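The loop in the question passes bare names such as "f1.zip", while the manual call that works passes "D:/f1.zip". A likely cause of the error (an assumption, since the error message is cut off) is that the archives are not in the working directory. The sketch below builds on full paths instead and uses base R only. The helper name read_zipped_csvs is made up, and it assumes each archive holds one csv with the same base name.

```r
# Read the csv stored inside each zip archive and stack the results.
# Assumes f1.zip contains f1.csv, f2.zip contains f2.csv, and so on.
read_zipped_csvs <- function(zip_paths) {
  pieces <- lapply(zip_paths, function(zip_path) {
    inner <- sub("\\.zip$", ".csv", basename(zip_path))
    read.table(unz(zip_path, inner), sep = ",")
  })
  do.call(rbind, pieces)
}

# Usage with full paths, so the working directory does not matter:
# zip_paths <- list.files("D:/", pattern = "\\.zip$", full.names = TRUE)
# Master_Data <- read_zipped_csvs(zip_paths)
```

Collecting the pieces in a list and calling rbind once also avoids growing Master_Data inside the loop.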
I tried some options from Stack Overflow, but they don't work either, so maybe there is a mistake in my code:
fileConn<-file("outputR.txt")
for (i in 1:length(lines)){
line = lines[i]
fields = strsplit(line, "\t")[[1]]
id = fields[1]
goIDs = fields[2:length(fields)]
list = as.list(GOCCANCESTOR[goIDs])
text = paste(toString(id), ":", toString(goIDs))
cat(text, file=fileConn, append=TRUE, sep = "\n")
}
close(fileConn)
When I run this code, it keeps overwriting the data in the outputR.txt file.
Any suggestions to fix this problem?
The problem is that you are passing a file connection to cat, and in that case the append argument has no effect. There are several options you could use; the easiest one is this:
First "create" the file, for example if you want to add a header:
header = "some header"
## if you don't want a header, leave it blank instead:
## header = ""
cat(header, file="outputR.txt", append=FALSE, sep = "\n")
Notice the append = FALSE: it is necessary if you want to clear the file when it already exists; otherwise use append = TRUE.
Then you can write text to it using:
text = paste(toString(id), ":", toString(goIDs))
cat(text, file="outputR.txt", append=TRUE, sep = "\n")
You have two options here:
1. Open the file in write mode:
lines <- c("aaaaa", "bbbb")
fileConn<-file("test.txt", "w")
for (i in 1:length(lines)){
line = lines[i]
cat(line, file=fileConn, append=TRUE, sep = "\n")
}
close(fileConn)
2. Use the write function with the append argument:
lines <- c("aaaaa", "bbbb")
for (i in 1:length(lines)){
line = lines[i]
write(line,file="test2",append=TRUE)
}
As the help page for cat states:
append: logical. Only used if the argument file is the name of file (and not a connection or "|cmd"). If TRUE output will be appended to file; otherwise, it will overwrite the contents of file.
Thus, if you pass a connection as the file argument, the value of the append argument is ignored.
Simply specify the file argument as the name of the file:
cat(text, file="outputR.txt", append=TRUE, sep = "\n")
Alternatively, you can open the file connection with the correct mode specified:
w+ - Open for reading and writing, truncating file initially.
fileConn <- file("outputR.txt", open = "w+")
for (i in 1:length(lines)){
text <- paste("my text in line", i)
cat(text, file = fileConn, sep = "\n")
}
close(fileConn)
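The difference between an unopened and an opened connection can be checked with a short experiment. This sketch uses temporary files and made-up lines. An explicit "\n" is written with sep = "" so that the result does not depend on how cat treats a newline separator.

```r
lines <- c("first", "second", "third")

# Unopened connection: cat() opens it in "wt" mode for each call and closes it
# again, so every call truncates the file and only the last line survives.
path_closed <- tempfile(fileext = ".txt")
con <- file(path_closed)
for (line in lines) cat(line, "\n", file = con, sep = "")
close(con)
closed_result <- readLines(path_closed)

# Connection opened once in write mode: each cat() continues at the current position.
path_open <- tempfile(fileext = ".txt")
con <- file(path_open, open = "w")
for (line in lines) cat(line, "\n", file = con, sep = "")
close(con)
open_result <- readLines(path_open)
```

Here closed_result holds only the last line, which matches the overwriting described in the question, while open_result holds all three lines.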