Temporarily write csv files during loop in R

I have a loop which downloads data from urls. Now I would like the information gathered so far to be written to disk every x iterations.
As such I have the following code:
baseurl <- "http://zoeken.kvk.nl/Address.ashx?site=handelsregister&partialfields=&q=010"
pages3 <- list()
for (i in 1:99999) {
  if (i < 10) {
    mydata <- RJSONIO::fromJSON(paste0(baseurl, "00000", i), flatten = TRUE)
  }
  if (i < 100 & i >= 10) {
    mydata <- RJSONIO::fromJSON(paste0(baseurl, "0000", i), flatten = TRUE)
  }
  if (i < 1000 & i >= 100) {
    mydata <- RJSONIO::fromJSON(paste0(baseurl, "000", i), flatten = TRUE)
  }
  if (i < 10000 & i >= 1000) {
    mydata <- RJSONIO::fromJSON(paste0(baseurl, "00", i), flatten = TRUE)
  }
  if (i < 100000 & i >= 10000) {
    mydata <- RJSONIO::fromJSON(paste0(baseurl, "0", i), flatten = TRUE)
  }
  if (i < 1000000 & i >= 100000) {
    mydata <- RJSONIO::fromJSON(paste0(baseurl, i), flatten = TRUE)
  }
  pages3[[i]] <- mydata$resultatenHR
  options(timeout = 4000000)
  if (i %% 100 == 0) { Sys.sleep(5) }
  if (i %% 1000 == 0) {
    final_df <- do.call(rbind, pages3)
    final <- Reduce(function(x, y) merge(x, y, all = TRUE), final_df)
    mytime <- format(Sys.time(), "%b_%d_%H_%M_%S_%Y")
    myfile <- file.path(R(), paste0(mytime, "_", i, ".csv"))
    write.csv2(final, file = myfile, sep = "", row.names = FALSE, col.names = FALSE,
               quote = FALSE, append = FALSE)
  }
}
However, nothing gets saved in the meantime. Where did I go wrong with the code? Thank you for your insights.

I think that your problem is probably in:
myfile <- file.path(R(), paste0(mytime, "_", i, ".csv"))
because R treats R() as a function call and fails:
Error in R() : could not find function "R"
You can change it to getwd() if you want (don't forget to set the working directory first with setwd()) or specify a different path.
In addition, here: write.csv2(final, file = myfile, sep = "", row.names = FALSE, col.names = FALSE, quote = FALSE, append = FALSE), the sep, col.names, and append arguments are ignored by write.csv2 (attempts to change them only produce warnings, since the csv2 conventions fix them), so you can drop them:
write.csv2(final, file = myfile, row.names = FALSE, quote = FALSE)
#Edit
This is probably not the most efficient way, but it should do the trick.
The main issue is that you index the pages3 list object/csv file by the url counter. If you give pages3 its own index, you can reset it every 1000 urls.
setwd("Your working directory path")
baseurl <- "http://zoeken.kvk.nl/Address.ashx?site=handelsregister&partialfields=&q=010"
pages3 <- list()
# Counter for the url loop
i <- 1
# Counter for the appended csv file / list object pages3
k <- 1
for (i in 1:99999) {
  # Read JSON file by i index
  mydata <- RJSONIO::fromJSON(paste0(baseurl, i), flatten = TRUE)
  # Append to the pages3 list object by k index
  pages3[[k]] <- mydata$resultatenHR
  # Increase the k counter
  k <- k + 1
  options(timeout = 4000000)
  if (i %% 100 == 0) { Sys.sleep(5) }
  if (i %% 1000 == 0) {
    final_df <- do.call(rbind, pages3)
    final <- Reduce(function(x, y) merge(x, y, all = TRUE), final_df)
    mytime <- format(Sys.time(), "%b_%d_%H_%M_%S_%Y")
    myfile <- file.path(getwd(), paste0(mytime, "_", i, ".csv"))
    write.csv2(final, file = myfile)
    # Reset the pages3 list object
    pages3 <- list()
    # Reset the k index counter
    k <- 1
  }
}
However, depending on your computer and the size of the files you are importing, it might be more efficient to import all the urls first and only then split and save to different csv files.
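As an aside, the six if branches in the original code exist only to zero-pad the id in the url; you can collapse them with a single sprintf call (a small sketch, reusing the fromJSON call from the question):
# zero-pad i to 6 digits: 1 -> "000001", 12345 -> "012345"
mydata <- RJSONIO::fromJSON(paste0(baseurl, sprintf("%06d", i)), flatten = TRUE)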

Related

Skip empty files when importing text files

I have a folder with about 700 text files that I want to import and add a column to. I've figured out how to do this using the following code:
files = list.files(pattern = "*c.txt")
DF <- NULL
for (f in files) {
  data <- read.table(f, header = F, sep = ",")
  data$species <- strsplit(f, split = "c.txt")  # column name is filename
  DF <- rbind(DF, data)
}
write.xlsx(DF, "B:/trends.xlsx")
Problem is, there are about 100 files that are empty, so the code stops at the first empty file and I get this error message:
Error in read.table(f, header = F, sep = ",") :
no lines available in input
Is there a way to skip over these empty files?
You can skip empty files by checking that file.size(some_file) > 0:
files <- list.files("~/tmp/tmpdir", pattern = "*.csv")
##
df_list <- lapply(files, function(x) {
  if (!file.size(x) == 0) {
    read.csv(x)
  }
})
##
R> dim(do.call("rbind", df_list))
#[1] 50 2
This skips over the 10 files that are empty, and reads in the other 10 that are not.
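One caveat (my note, not part of the original answer): list.files() returns bare file names by default, so file.size(x) and read.csv(x) only resolve if your working directory is ~/tmp/tmpdir; passing full.names = TRUE avoids depending on the working directory:
files <- list.files("~/tmp/tmpdir", pattern = "\\.csv$", full.names = TRUE)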
Data:
for (i in 1:10) {
  df <- data.frame(x = 1:5, y = 6:10)
  write.csv(df, sprintf("~/tmp/tmpdir/file%i.csv", i), row.names = FALSE)
  ## empty file
  system(sprintf("touch ~/tmp/tmpdir/emptyfile%i.csv", i))
}
For a different approach that introduces explicit error handling, consider wrapping your read.table in a tryCatch to handle anything else bad that might happen.
for (f in files) {
  data <- tryCatch({
    if (file.size(f) > 0) {
      read.table(f, header = F, sep = ",")
    }
  }, error = function(err) {
    # error handler picks up where error was generated
    print(paste("Read.table didn't work!: ", err))
  })
  data$species <- strsplit(f, split = "c.txt")
  DF <- rbind(DF, data)
}
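One more caveat: when the file is empty, the tryCatch above yields NULL, and the two lines after it still run on that NULL. A minimal guard (my addition, not part of the original answer) would be:
# only append when something was actually read
if (!is.null(data)) {
  data$species <- strsplit(f, split = "c.txt")
  DF <- rbind(DF, data)
}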

How to parse INI-like configuration files with R?

Is there an R function for parsing INI-like configuration files?
While searching I only found this discussion.
Here is an answer that was given to exactly the same question on r-help in 2007 (thanks to @Spacedman for pointing this out):
Parse.INI <- function(INI.filename)
{
  connection <- file(INI.filename)
  Lines <- readLines(connection)
  close(connection)

  Lines <- chartr("[]", "==", Lines)  # change section headers

  connection <- textConnection(Lines)
  d <- read.table(connection, as.is = TRUE, sep = "=", fill = TRUE)
  close(connection)

  L <- d$V1 == ""  # location of section breaks
  d <- subset(transform(d, V3 = V2[which(L)[cumsum(L)]])[1:3],
              V1 != "")

  ToParse <- paste("INI.list$", d$V3, "$", d$V1, " <- '",
                   d$V2, "'", sep = "")

  INI.list <- list()
  eval(parse(text = ToParse))

  return(INI.list)
}
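For illustration, a quick sanity check of Parse.INI on a small made-up file (the file name and contents are hypothetical):
writeLines(c("[owner]", "name=John", "[database]", "server=192.0.2.62"),
           "example.ini")
str(Parse.INI("example.ini"))
# List of 2
#  $ owner   :List of 1
#   ..$ name: chr "John"
#  $ database:List of 1
#   ..$ server: chr "192.0.2.62"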
Actually, I wrote a short and presumably buggy function (i.e. not covering all corner cases) which works for me now:
read.ini <- function(x) {
  if (length(x) == 1 && !any(grepl("\\n", x))) lines <- readLines(x) else lines <- x
  lines <- strsplit(lines, "\n", fixed = TRUE)[[1]]
  lines <- lines[!grepl("^;", lines) & nchar(lines) >= 2]  # strip comments & blank lines
  lines <- gsub("\\r$", "", lines)
  idx <- which(grepl("^\\[.+\\]$", lines))
  if (idx[[1]] != 1) stop("invalid INI file. Must start with a section.")
  res <- list()
  fun <- function(from, to) {
    tups <- strsplit(lines[(from + 1):(to - 1)], "[ ]*=[ ]*")
    for (i in 1:length(tups))
      if (length(tups[[i]]) > 2) tups[[i]] <- c(tups[[i]][[1]], gsub("\\=", "=", paste(tail(tups[[i]], -1), collapse = "=")))
    tups <- unlist(tups)
    keys <- strcap(tups[seq(from = 1, by = 2, length.out = length(tups) / 2)])
    vals <- tups[seq(from = 2, by = 2, length.out = length(tups) / 2)]
    sec <- strcap(substring(lines[[from]], 2, nchar(lines[[from]]) - 1))
    res[[sec]] <<- setNames(vals, keys)
  }
  mapply(fun, idx, c(tail(idx, -1), length(lines) + 1))
  return(res)
}
where strcap is a helper function that capitalizes a string:
strcap <- function(s) paste(toupper(substr(s,1,1)), tolower(substring(s,2)), sep="")
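And a quick sanity check of read.ini, here passing the INI content as a single string (my example; note that strcap capitalises the section and key names):
ini <- "[owner]\nname=John\n[database]\nserver=192.0.2.62"
read.ini(ini)
# $Owner
#   Name
# "John"
#
# $Database
#       Server
# "192.0.2.62"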
There are also some C solutions for this, like inih or libini that might be useful. I did not try them out, though.

Why is my loop in R running just once?

I asked a very similar question before, but the answers I got don't seem to apply in this case. The aim of my code is primarily to take a file, manipulate it, and then save the manipulated file over the old one. Unfortunately there are a lot of files, so I have incorporated a for loop, but it stops after just one run through the loop. I think my return function is in the right place, and my for statement worked in a previous, slightly different version of the script.
Here is my code:
AddLatLon <- function(num, condition) {
  # Set working directory
  # num is the number of files needing extraction, e.g. (3:5), c(2,7)
  connect <- file("./EOBS DATA/sources.txt", "r")
  locdata <- readLines(connect)
  close(connect)
  info <- locdata[24:length(locdata)]  # removes non-data before the data begins
  Infodata <- read.table(text = info, sep = ',', fill = TRUE, colClasses = 'character', header = TRUE)
  InfoTable <- read.csv("./EOBS DATA/sources.csv")
  InfoTable$STAID <- as.numeric(InfoTable$STAID)
  for (i in c(num)) {
    filename <- paste("./EOBS DATA/", condition, "_csv_data/", condition, i, ".csv", sep = "")
    #if(i < 10){
    #  filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID00000", i, ".txt", sep = "")
    #}
    #if(i >= 10 & i < 100){
    #  filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID0000", i, ".txt", sep = "")
    #}
    #if(i >= 100 & i < 1000){
    #  filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID000", i, ".txt", sep = "")
    #}
    #if(i >= 1000){
    #  filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID00", i, ".txt", sep = "")
    #}
    if (file.exists(filename) == FALSE) {
      next
    }
    #con <- file(filename, "r")
    #data <- readLines(con)
    #close(con)
    #q <- data[21:length(data)]  # removes non-data before the data begins
    #Impactdata <- read.table(text = q, sep = ',', fill = TRUE, colClasses = 'character', header = TRUE)
    x <- read.csv(filename)
    point <- match(i, InfoTable$STAID)
    Lat <- InfoTable[point, 5]
    Lon <- InfoTable[point, 6]
    Lat <- as.character(Lat)
    Lon <- as.character(Lon)
    x$Lat <- Lat
    x$Lon <- Lon
    x$X <- NULL
    x$DATE <- as.Date(as.character(x$DATE), format = '%Y%m%d')
    Savename <- paste("./EOBS DATA/", condition, "_csv_data/", condition, i, ".csv", sep = "")
    if (condition == "rr") {
      condition <- "Precipitation"
    }
    if (condition == "tn") {
      condition <- "Minimum Temperature"
    }
    if (condition == "tx") {
      condition <- "Maximum Temperature"
    }
    names(x) <- c("Station_ID", "Source_ID", "Date(yyyy-mm-dd)", condition,
                  "Quality_code(0='valid'; 1='suspect')", "Latitude", "Longitude")
    write.csv(x, Savename)
  }
  return(head(x))
}
num is not defined, but from the name I'm pretty sure you want to be looping over 1:num, not c(num). So just replace:
for(i in c(num)){
with
for(i in 1:num){
or
for(i in seq_len(num)){
Why seq_len? It will do the right thing if num is zero (no looping) or negative (throw an error).
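To see that boundary behaviour, a quick sketch:
n <- 0
1:n          # [1] 1 0  -- counts down, so the loop body would run twice
seq_len(n)   # integer(0) -- the loop body runs zero times
seq_len(-1)  # Error: argument must be coercible to non-negative integer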

Can anyone tell me why my for loop in R is running just once?

Can anyone tell me why my for loop in R is running just once? The script is just attempting to create csv files for a list of about 200 subfiles within about 5 major files. Here is my code:
ImpactGrid <- function(num, condition, CONDITION) {
  # Set working directory
  for (i in num) {
    if (i < 10) {
      filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID00000", i, ".txt", sep = "")
    }
    if (i >= 10 & i < 100) {
      filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID0000", i, ".txt", sep = "")
    }
    if (i >= 100) {
      filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID000", i, ".txt", sep = "")
    }
    con <- file(filename, "r")
    data <- readLines(con)
    close(con)
    q <- data[21:length(data)]  # removes non-data before the data begins
    Impactdata <- read.table(text = q, sep = ',', fill = TRUE, colClasses = 'character', header = TRUE)
    Savename <- paste("./EOBS DATA/", condition, "_csv_data/", condition, i, ".csv", sep = "")
    write.csv(Impactdata, Savename)
    x <- read.csv(paste("./EOBS DATA/", condition, "_csv_data/", condition, i, ".csv", sep = ""))
    return(head(x))
  }
}
If you are trying to go from 1 to num, the code is:
for(i in 1:num)
for loops iterate over a vector, but num has length 1, so the loop body runs only once.
You also need to remove the return statement from the body of the loop. Otherwise, it will always exit the first time it hits return.
While I think 1:num is a good answer and may be part of the problem, the for loop encompasses everything, including the final return() statement. So even if num were a vector, the loop would run once through all the code and then return() from the function after one iteration.
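Putting the two fixes together, here is a toy stand-in (not the OP's real code) showing the corrected control flow, with return() moved outside the loop:
f <- function(num) {
  for (i in seq_len(num)) {
    x <- i * 2  # stand-in for the real per-file work
  }
  return(x)     # returning here lets the loop finish all iterations
}
f(3)  # 6 -- the loop ran all three times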

Cross Product of specific file and rest of the folder -- R

I am having problems making R read a set of files in a folder and return the cross product of them.
I have a folder which contains one test.csv file and n train.csv files.
I need a loop to read through the folder and return a file that contains the cross product of test and each of the train files, so the rows of the file should look like this:
test*train01
test*train02
test*train03
...
I wrote a script that does this for two specific files, but I don't know how to adapt it for the whole folder and the pattern that I need.
data01 <- as.matrix(read.csv(file = "test.csv", sep = ",", header = FALSE))
data02 <- as.matrix(read.csv(file = "train.csv", sep = ",", header = FALSE))
test <- list()
test01 <- list()
test02 <- list()
i <- 1
while (i <= 25) {
  test01[[i]] <- c(data01[i, ])
  test02[[i]] <- c(data02[i, ])
  test[[i]] <- crossprod(test01[[i]], test02[[i]])
  i <- i + 1
}
write.csv(test, file = "testing.csv", row.names = FALSE)
Try:
test <- function(data) {
  data01 <- as.matrix(read.csv(file = "test.csv", sep = ",", header = FALSE))
  data02 <- as.matrix(read.csv(file = data, sep = ",", header = FALSE))
  test <- list()
  test01 <- list()
  test02 <- list()
  i <- 1
  while (i <= 25) {
    test01[[i]] <- c(data01[i, ])
    test02[[i]] <- c(data02[i, ])
    test[[i]] <- crossprod(test01[[i]], test02[[i]])
    i <- i + 1
  }
  return(test)
}
result <- lapply(list.files(pattern = 'Train.*'), test)
Then just loop over result to save the CSV files.
EDIT: How to save:
files <- list.files(pattern='Train.*')
for (i in seq(length(result))) {
  write.csv(result[[i]], paste0('result_', files[i]), row.names = FALSE)
}
EDIT: Saving in one file:
write.csv(do.call(rbind,result),'result.csv', row.names = FALSE) # Appending by row
or
write.csv(do.call(cbind,result),'result.csv', row.names = FALSE) # Appending by column
