I asked a very similar question before, but the answers I got don't seem to apply in this case. The aim of my code is to take a file, manipulate it, and then save the manipulated file over the old one. Unfortunately there are a lot of files, so I have incorporated a for loop, but it stops after just one pass through the loop. I think my return statement is in the right place, and my for statement worked in a previous, slightly different version of the script.
Here is my code:
AddLatLon <- function(num, condition){
  # Set working directory
  # num is the number of files needing extraction e.g. (3:5), c(2,7)
  connect <- file("./EOBS DATA/sources.txt", "r")
  locdata <- readLines(connect)
  close(connect)
  info <- locdata[24:length(locdata)] # removes non-data before the data begins
  Infodata <- read.table(text = info, sep = ',', fill = TRUE, colClasses = 'character', header = TRUE)
  InfoTable <- read.csv("./EOBS DATA/sources.csv")
  InfoTable$STAID <- as.numeric(InfoTable$STAID)
  for(i in c(num)){
    filename <- paste("./EOBS DATA/", condition, "_csv_data/", condition, i, ".csv", sep = "")
    #if(i < 10){
    #  filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID00000", i, ".txt", sep = "")
    #}
    #if(i >= 10 & i < 100){
    #  filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID0000", i, ".txt", sep = "")
    #}
    #if(i >= 100 & i < 1000){
    #  filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID000", i, ".txt", sep = "")
    #}
    #if(i >= 1000){
    #  filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID00", i, ".txt", sep = "")
    #}
    if(file.exists(filename) == FALSE) {
      next
    }
    #con <- file(filename, "r")
    #data <- readLines(con)
    #close(con)
    #q <- data[21:length(data)] # removes non-data before the data begins
    #Impactdata <- read.table(text = q, sep = ',', fill = TRUE, colClasses = 'character', header = TRUE)
    x <- read.csv(filename)
    point <- match(i, InfoTable$STAID)
    Lat <- InfoTable[point, 5]
    Lon <- InfoTable[point, 6]
    Lat <- as.character(Lat)
    Lon <- as.character(Lon)
    x$Lat <- Lat
    x$Lon <- Lon
    x$X <- NULL
    x$DATE <- as.Date(as.character(x$DATE), format = '%Y%m%d')
    Savename <- paste("./EOBS DATA/", condition, "_csv_data/", condition, i, ".csv", sep = "")
    if(condition == "rr"){
      condition <- "Precipitation"
    }
    if(condition == "tn"){
      condition <- "Minimum Temperature"
    }
    if(condition == "tx"){
      condition <- "Maximum Temperature"
    }
    names(x) <- c("Station_ID", "Source_ID", "Date(yyyy-mm-dd)", condition, "Quality_code(0='valid'; 1='suspect')", "Latitude", "Longitude")
    write.csv(x, Savename)
  }
  return(head(x))
}
num is not defined, but from the name I'm pretty sure you want to be looping over 1:num, not c(num). So just replace:
for(i in c(num)){
with
for(i in 1:num){
or
for(i in seq_len(num)){
Why seq_len? It will do the right thing if num is zero (no looping) or negative (throw an error).
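A quick illustration of the difference when num is zero:

num <- 0
for (i in seq_len(num)) print(i) # seq_len(0) is integer(0), so the body never runs
for (i in 1:num) print(i)        # but 1:0 is c(1, 0), so this prints 1 and then 0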
Hope you don't mind if this is too easy for you.
In R, I am using fromJSON() to read from 3 URLs (the tier-1 URLs). In each JSON file there is a "link" field which gives me another URL (the tier-2 URL), and I use that with read.table() to get my final data. My code now is like this:
# note, this code does not run
urlJohn <- www.foo1.com
urlJane <- www.foo2.com
urlJoe <- www.foo3.com
tempJohn <- fromJSON(urlJohn)
tempJohn[["data"]][["rows"]]$link %<>%
{clean up this data}
dataJohn <- read.table(tempJohn[["data"]][["rows"]]$link,
header = TRUE,
sep = ",")
tempJane <- fromJSON(urlJane)
tempJane[["data"]][["rows"]]$link %<>%
{clean up this data}
dataJane <- read.table(tempJane[["data"]][["rows"]]$link,
header = TRUE,
sep = ",")
tempJoe <- fromJSON(urlJoe)
tempJoe[["data"]][["rows"]]$link %<>%
{clean up this data}
dataJoe <- read.table(tempJoe[["data"]][["rows"]]$link,
header = TRUE,
sep = ",")
As you can see, I am just copying-n-pasting code blocks. What I wish is this:
# note, this code also does not run
urlJohn <- www.foo1.com
urlJane <- www.foo2.com
urlJoe <- www.foo3.com
source <- c("John", "Jane", "joe")
for (i in source){
temp <- paste(temp, i, sep = "")
url <- paste(url, i, sep = "")
data <- paste(data, i, sep = "")
temp <- fromJSON(url)
temp[["data"]][["rows"]]$link %<>%
{clean up this data}
data <- read.table(temp[["data"]][["rows"]]$link,
header = TRUE,
sep = ",")
}
What do I need to do to make the for loop work? If my question is not clear, please ask me to clarify it.
I usually find lapply more convenient than a for loop, although you can easily convert this to a for loop if needed.
URLs <- c('www.foo1.com', 'www.foo2.com', 'www.foo3.com')
lapply(URLs, function(x) {
  temp <- jsonlite::fromJSON(x)
  temp[["data"]][["rows"]]$link %<>% {clean up this data}
  read.table(temp[["data"]][["rows"]]$link, header = TRUE, sep = ",")
}) -> list_data

list_data
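If you prefer the for loop version, here is a sketch along the same lines (non-running like the original, since the clean-up step is still a placeholder):

list_data <- vector("list", length(URLs))
for (i in seq_along(URLs)) {
  temp <- jsonlite::fromJSON(URLs[i])
  temp[["data"]][["rows"]]$link %<>% {clean up this data} # placeholder
  list_data[[i]] <- read.table(temp[["data"]][["rows"]]$link,
                               header = TRUE, sep = ",")
}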
Thanks to @Ronak Shah. The R community strongly favors non-for-loop solutions, and the way to get my desired result is lapply.
Below is non-running code in mnemonics:
URLs <- c('www.foo1.com', 'www.foo2.com', 'www.foo3.com')
lapply(URLs, function(x) {
  temp <- jsonlite::fromJSON(x)
  x <- temp[["data"]][["rows"]]$link %<>% {clean up this data}
  y <- read.table(temp[["data"]][["rows"]]$link, header = TRUE, sep = ",")
  return(list(x, y))
})
And this is a running example.
x <- list(alpha = 1:10,
          beta = exp(-3:3),
          logic = c(TRUE, FALSE, FALSE, TRUE))
lapply(x, function(x){
  temp <- sum(x) / 2
  temp2 <- list(x, temp)
  return(temp2)
})
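One small addition: if you want to keep track of which result came from which source, you can name the list afterwards (hypothetical names matching the question):

names(list_data) <- c("John", "Jane", "Joe")
list_data$Jane # the data frame read from Jane's tier-2 url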
I have a loop which downloads data from URLs. I would now like the information collected so far to be written to disk every x iterations.
As such I have the following code:
baseurl <- "http://zoeken.kvk.nl/Address.ashx?site=handelsregister&partialfields=&q=010"
pages3 <- list()
for(i in 1:99999){
  if(i < 10){
    mydata <- RJSONIO::fromJSON(paste0(baseurl, "00000", i), flatten=TRUE)
  }
  if(i < 100 & i >= 10){
    mydata <- RJSONIO::fromJSON(paste0(baseurl, "0000", i), flatten=TRUE)
  }
  if(i < 1000 & i >= 100){
    mydata <- RJSONIO::fromJSON(paste0(baseurl, "000", i), flatten=TRUE)
  }
  if(i < 10000 & i >= 1000){
    mydata <- RJSONIO::fromJSON(paste0(baseurl, "00", i), flatten=TRUE)
  }
  if(i < 100000 & i >= 10000){
    mydata <- RJSONIO::fromJSON(paste0(baseurl, "0", i), flatten=TRUE)
  }
  if(i < 1000000 & i >= 100000){
    mydata <- RJSONIO::fromJSON(paste0(baseurl, i), flatten=TRUE)
  }
  mydata <- RJSONIO::fromJSON(paste0(baseurl, i), flatten=TRUE)
  pages3[[i]] <- mydata$resultatenHR
  options(timeout = 4000000)
  if(i %% 100 == 0){Sys.sleep(5)}
  if(i %% 1000 == 0){
    final_df <- do.call(rbind, pages3)
    final <- Reduce(function(x,y) merge(x, y, all=TRUE), final_df)
    mytime <- format(Sys.time(), "%b_%d_%H_%M_%S_%Y")
    myfile <- file.path(R(), paste0(mytime, "_", i, ".csv"))
    write.csv2(final, file = myfile, sep = "", row.names = FALSE, col.names = FALSE,
               quote = FALSE, append = FALSE)
  }
}
However, nothing gets saved in the meantime. Where did I go wrong with the code? Thank you for your insights.
I think that your problem is probably in:
myfile <- file.path(R(), paste0(mytime, "_", i, ".csv"))
because R parses R() as a function call, and no such function exists:
Error in R() : could not find function "R"
You can change it to getwd() if you want (don't forget to set the working directory first with setwd()), or specify a different path.
In addition, in write.csv2(final, file = myfile, sep = "", row.names = FALSE, col.names = FALSE, quote = FALSE, append = FALSE), the sep, col.names, and append arguments are fixed by write.csv2, and attempts to change them are ignored with a warning, so you can drop them:
write.csv2(final, file = myfile, row.names = FALSE, quote = FALSE)
Edit:
This is probably not the most efficient way, but it should do the trick.
The main issue is that you index the pages3 list object/csv file by the URL index. If you give pages3 a separate index, you can reset it after every save.
setwd("Your working directory path")
baseurl <- "http://zoeken.kvk.nl/Address.ashx?site=handelsregister&partialfields=&q=010"
pages3 <- list()
#Counter for the url loop
i <- 1
#Counter for the appended csv file/ list object pages3
k <- 1
for(i in 1:99999){
  # Read the JSON file at index i
  mydata <- RJSONIO::fromJSON(paste0(baseurl, i), flatten=TRUE)
  # Append to the pages3 list object by the k index
  pages3[[k]] <- mydata$resultatenHR
  # Increase the k counter
  k <- k + 1
  options(timeout = 4000000)
  if(i %% 100 == 0) {Sys.sleep(5)}
  if(i %% 1000 == 0) {
    final_df <- do.call(rbind, pages3)
    final <- Reduce(function(x,y) merge(x, y, all=TRUE), final_df)
    mytime <- format(Sys.time(), "%b_%d_%H_%M_%S_%Y")
    myfile <- file.path(getwd(), paste0(mytime, "_", i, ".csv"))
    write.csv2(final, file = myfile, row.names = FALSE, quote = FALSE)
    # Reset the pages3 list object
    pages3 <- NULL
    # Reset the k index counter
    k <- 1
  }
}
However, depending on your computer and the size of the files you are importing, it may be more efficient to save everything once and split it into separate csv files after you have finished importing all the URLs.
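One further note: the edit above drops the zero-padding that the original if chain built into the URLs. If the query id really does need a fixed width, the whole chain collapses into a single sprintf() call (a sketch, assuming six digits of padding as in the original):

baseurl <- "http://zoeken.kvk.nl/Address.ashx?site=handelsregister&partialfields=&q=010"
i <- 42 # example index
# sprintf("%06d", i) reproduces the "00000" through "" prefixes from the if chain
mydata <- RJSONIO::fromJSON(paste0(baseurl, sprintf("%06d", i)), flatten = TRUE)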
I used the following code to break down my large csv file (4 GB), and now I am trying to save the 2nd, 3rd, ... parts into csv files. However, I can only access the first chunk of my data.
Is there anything wrong with my code? How do I save the second chunk of my data into a csv? Any help will be appreciated.
rgfile <- 'filename.csv'
index <- 0
chunkSize <- 100000
con <- file(description = rgfile, open = "r")
dataChunk <- read.table(con, nrows = chunkSize, header = TRUE, fill = TRUE, sep = ",")
actualColumnNames <- names(dataChunk)
repeat {
  index <- index + 1
  print(paste('Processing rows:', index * chunkSize))
  if (nrow(dataChunk) != chunkSize){
    print('Processed all files!')
    break
  }
  dataChunk <- read.table(
    con, nrows = chunkSize, skip = 0, header = FALSE,
    fill = TRUE, sep = ",", col.names = actualColumnNames
  )
  break
}
library(tidyverse)
library(nycflights13)
# make the problem reproducible
rgfile <- 'flights.csv'
write_csv(flights, rgfile)
# now, get to work
# count the total number of lines (including the header)
lines <- as.numeric(R.utils::countLines(rgfile))
chunk_size <- 100000
# read a couple of rows just to capture the column names
hdr <- read_csv(rgfile, n_max = 2)
fnum <- 1
for (i in seq(1, lines, chunk_size)) {
  # read one chunk, skipping the header plus the rows already read,
  # and reusing the saved column names
  suppressMessages(
    read_csv(
      rgfile, col_names = colnames(hdr), skip = i, n_max = chunk_size
    )
  ) -> x
  if (i > 1) colnames(x) <- colnames(hdr)
  # write the chunk out with a zero-padded file number
  write_csv(x, sprintf("file%03d.csv", fnum))
  fnum <- fnum + 1
}
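For what it is worth, readr also ships a chunked reader that avoids the manual skip arithmetic; a minimal sketch of the same split, assuming the flights.csv written above:

library(readr)

fnum <- 0
read_csv_chunked(
  "flights.csv",
  # the callback receives each parsed chunk (x) and its starting row (pos)
  SideEffectChunkCallback$new(function(x, pos) {
    fnum <<- fnum + 1
    write_csv(x, sprintf("chunk%03d.csv", fnum))
  }),
  chunk_size = 100000
)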
Is there an R function for parsing INI like configuration files?
While searching I only found this discussion.
Here is an answer that was given to exactly the same question on r-help in 2007 (thanks to @Spacedman for pointing this out):
Parse.INI <- function(INI.filename)
{
connection <- file(INI.filename)
Lines <- readLines(connection)
close(connection)
Lines <- chartr("[]", "==", Lines) # change section headers
connection <- textConnection(Lines)
d <- read.table(connection, as.is = TRUE, sep = "=", fill = TRUE)
close(connection)
L <- d$V1 == "" # location of section breaks
d <- subset(transform(d, V3 = V2[which(L)[cumsum(L)]])[1:3],
V1 != "")
ToParse <- paste("INI.list$", d$V3, "$", d$V1, " <- '",
d$V2, "'", sep="")
INI.list <- list()
eval(parse(text=ToParse))
return(INI.list)
}
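A quick usage sketch, with a hypothetical test.ini written on the fly:

writeLines(c("[Database]",
             "user=admin",
             "host=localhost",
             "[Logging]",
             "level=debug"), "test.ini")
ini <- Parse.INI("test.ini")
ini$Database$user # "admin"
ini$Logging$level # "debug"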
Actually, I wrote a short and presumably buggy function (i.e. not covering all corner cases) which works for me now:
read.ini <- function(x) {
  # x may be a filename or the INI contents as a single string
  if(length(x) == 1 && !any(grepl("\\n", x))) lines <- readLines(x) else lines <- x
  lines <- unlist(strsplit(lines, "\n", fixed = TRUE))
  lines <- lines[!grepl("^;", lines) & nchar(lines) >= 2] # strip comments & blank lines
  lines <- gsub("\\r$", "", lines)
  idx <- which(grepl("^\\[.+\\]$", lines))
  if(idx[[1]] != 1) stop("invalid INI file. Must start with a section.")
  res <- list()
  fun <- function(from, to) {
    tups <- strsplit(lines[(from+1):(to-1)], "[ ]*=[ ]*")
    for (i in 1:length(tups))
      if(length(tups[[i]]) > 2) tups[[i]] <- c(tups[[i]][[1]], gsub("\\=", "=", paste(tail(tups[[i]], -1), collapse = "=")))
    tups <- unlist(tups)
    keys <- strcap(tups[seq(from = 1, by = 2, length.out = length(tups)/2)])
    vals <- tups[seq(from = 2, by = 2, length.out = length(tups)/2)]
    sec <- strcap(substring(lines[[from]], 2, nchar(lines[[from]]) - 1))
    res[[sec]] <<- setNames(vals, keys)
  }
  mapply(fun, idx, c(tail(idx, -1), length(lines) + 1))
  return(res)
}
where strcap is a helper function that capitalizes a string:
strcap <- function(s) paste(toupper(substr(s,1,1)), tolower(substring(s,2)), sep="")
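For illustration, a minimal call with an inline string (note that strcap capitalizes the section and key names):

ini_text <- "[owner]\nname=John\norganization=Acme"
read.ini(ini_text)
# $Owner
#         Name Organization
#       "John"       "Acme"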
There are also some C solutions for this, like inih or libini that might be useful. I did not try them out, though.
Can anyone tell me why my for loop in R runs just once? The script is attempting to create csv files for a list of about 200 subfiles within about 5 major files. Here is my code:
ImpactGrid <- function(num, condition, CONDITION){
  # Set working directory
  for(i in num){
    if(i < 10){
      filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID00000", i, ".txt", sep = "")
    }
    if(i >= 10 & i < 100){
      filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID0000", i, ".txt", sep = "")
    }
    if(i >= 100){
      filename <- paste("./EOBS DATA/ECA_blend_", condition, "/", CONDITION, "_STAID000", i, ".txt", sep = "")
    }
    con <- file(filename, "r")
    data <- readLines(con)
    close(con)
    q <- data[21:length(data)] # removes non-data before the data begins
    Impactdata <- read.table(text = q, sep = ',', fill = TRUE, colClasses = 'character', header = TRUE)
    Savename <- paste("./EOBS DATA/", condition, "_csv_data/", condition, i, ".csv", sep = "")
    write.csv(Impactdata, Savename)
    x <- read.csv(paste("./EOBS DATA/", condition, "_csv_data/", condition, i, ".csv", sep = ""))
    return(head(x))
  }
}
If you are trying to go from 1 to num, the code is:
for(i in 1:num)
A for loop iterates over a vector, but num has length 1, so the loop body runs only once.
You also need to remove the return statement from the body of the loop; otherwise, the function will exit the first time it hits return.
While I think 1:num is a good answer and may be part of the problem, the for loop encompasses everything, including the final return() statement. So even if num were a vector, the function would run the body once and then return() after the first iteration.
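Putting both fixes together, a sketch of the restructured function (a hypothetical simplification, assuming the same directory layout; the zero-padding if chain is collapsed into a sprintf() format that pads the station id to six digits):

ImpactGrid <- function(num, condition, CONDITION){
  results <- list()
  for(i in num){ # pass a vector, e.g. 1:200 or c(2, 7)
    filename <- sprintf("./EOBS DATA/ECA_blend_%s/%s_STAID%06d.txt",
                        condition, CONDITION, i)
    if(!file.exists(filename)) next
    data <- readLines(filename)
    q <- data[21:length(data)] # removes non-data before the data begins
    Impactdata <- read.table(text = q, sep = ',', fill = TRUE,
                             colClasses = 'character', header = TRUE)
    Savename <- paste0("./EOBS DATA/", condition, "_csv_data/", condition, i, ".csv")
    write.csv(Impactdata, Savename)
    results[[length(results) + 1]] <- head(Impactdata)
  }
  # return once, after the loop has visited every i
  results
}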