How can I ignore a data set if some column names don't exist in it?
I have a list of weather data from a stream, but I think certain key weather conditions are missing from some records, which gives me the error below with rbind:
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
My code:
weatherDf <- data.frame()
for(i in weatherData) {
# Get the airport code.
airport <- i$airport
# Get the date.
date <- as.POSIXct(as.numeric(as.character(i$timestamp))/1000, origin="1970-01-01", tz="UTC-1")
# Get the data in dailysummary only.
dailySummary <- i$dailysummary
weatherDf <- rbind(weatherDf, ldply(
list(dailySummary),
function(x) c(airport, format(as.Date(date), "%Y-%m-%d"), x[["meanwindspdi"]], x[["meanwdird"]], x[["meantempm"]], x[["humidity"]])
))
}
So how can I make sure these key conditions below exist in the data:
meanwindspdi
meanwdird
meantempm
humidity
If any of them does not exist, then ignore the whole record. Is that possible?
EDIT:
The content of weatherData is in jsfiddle (I can't post it here as it is too long, and I don't know the best place to share data publicly for R...)
EDIT 2:
I get some error when I try to export the data into a txt:
> write.table(weatherData,"/home/teelou/Desktop/data/data.txt",sep="\t",row.names=FALSE)
Error in data.frame(date = list(pretty = "January 1, 1970", year = "1970", :
arguments imply differing number of rows: 1, 0
What does it mean? It seems that there are some errors in the data...
EDIT 3:
I have exported my entire data in .RData to my google drive:
https://drive.google.com/file/d/0B_w5RSQMxtRSbjdQYWJMX3pfWXM/view?usp=sharing
If you use RStudio, then you can just import the data.
EDIT 4:
target_names <- c("meanwindspdi", "meanwdird", "meantempm", "humidity")
# If it has data then loop it.
if (!is.null(weatherData)) {
# Initialize a data frame.
weatherDf <- data.frame()
for(i in weatherData) {
if (!all(target_names %in% names(i)))
next
# Get the airport code.
airport <- i$airport
# Get the date.
date <- as.POSIXct(as.numeric(as.character(i$timestamp))/1000, origin="1970-01-01", tz="UTC-1")
# Get the data in dailysummary only.
dailySummary <- i$dailysummary
weatherDf <- rbind(weatherDf, ldply(
list(dailySummary),
function(x) c(airport, format(as.Date(date), "%Y-%m-%d"), x[["meanwindspdi"]], x[["meanwdird"]], x[["meantempm"]], x[["humidity"]])
))
}
# Rename column names.
colnames(weatherDf) <- c("airport", "key_date", "ws", "wd", "tempi", 'humidity')
# Convert certain weatherDf columns to numeric.
columns <- c("ws", "wd", "tempi", "humidity")
weatherDf[, columns] <- lapply(columns, function(x) as.numeric(weatherDf[[x]]))
}
Inspect the weatherDf:
> View(weatherDf)
Error in .subset2(x, i, exact = exact) : subscript out of bounds
You can use next to skip the current iteration of the loop and go to the next iteration:
target_names <- c("meanwindspdi", "meanwdird", "meantempm", "humidity")
for(i in weatherData) {
if (!all(target_names %in% names(i)))
next
# continue with loop...
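One thing worth checking: in the question's code the four fields are read from i$dailysummary, not from i itself, so the membership test may need to look at names(i$dailysummary). Below is a minimal sketch of that idea, assuming the fields do live inside dailysummary. The toy weatherData records and their values are made up for illustration; rows are collected in a list and bound once at the end.

```r
# Toy stand-in for weatherData (hypothetical values).
# The second record lacks "humidity" and should be skipped.
weatherData <- list(
  list(airport = "AAA",
       dailysummary = list(meanwindspdi = "5", meanwdird = "180",
                           meantempm = "12", humidity = "80")),
  list(airport = "BBB",
       dailysummary = list(meanwindspdi = "7", meanwdird = "90",
                           meantempm = "15"))
)

target_names <- c("meanwindspdi", "meanwdird", "meantempm", "humidity")

rows <- list()
for (i in weatherData) {
  summary_i <- i$dailysummary
  # Assumption: the target fields are stored inside dailysummary.
  if (!all(target_names %in% names(summary_i))) next
  rows[[length(rows) + 1]] <- data.frame(
    airport = i$airport,
    as.data.frame(summary_i[target_names], stringsAsFactors = FALSE),
    stringsAsFactors = FALSE
  )
}

# Bind all kept records at once, then convert the measurements to numeric.
weatherDf <- do.call(rbind, rows)
weatherDf[target_names] <- lapply(weatherDf[target_names], as.numeric)
```

If every record is skipped, rows stays empty and do.call(rbind, rows) returns NULL, so it is worth checking length(rows) before renaming or converting columns.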
I am new to R and am trying to run a for loop that produces a data frame on each iteration. I would like to store each data frame in a list and later concatenate them into one data frame. My attempt is below, but it throws an error. What is the best way to append data frames to a list in a for loop and then concatenate the list into a single data frame?
library(dplyr)
library(sf)
library(osrm)
df<- read.csv('csv path')
liist<-list()
for(i in df$h3_longitude) {
for (j in df$h3_latitude){
iso <- osrmIsochrone(loc = c(i, j), breaks = seq(0,90,30),osrm.profile='car')
liist[[iso]]<- iso
bind_rows(liist)
}
}
error:-
Error in `[[<-`(`*tmp*`, iso, value = iso) :
invalid subscript type 'list'
Error in `[[<-`(`*tmp*`, iso, value = iso): invalid subscript type 'list'
The problem is liist[[iso]]<- iso.
iso as the output of osrmIsochrone() is a dataframe.
You use this dataframe as an index to liist - that can't go well.
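A minimal sketch of the list-append pattern, without the osrm dependency: make_df() is a hypothetical stand-in for osrmIsochrone() that just returns a small data frame. The point is that the list is indexed by position, never by the data frame itself.

```r
# Hypothetical stand-in for osrmIsochrone(): returns a one-row data frame.
make_df <- function(i, j) data.frame(lon = i, lat = j, value = i + j)

results <- list()
for (i in c(1, 2)) {
  for (j in c(10, 20)) {
    res <- make_df(i, j)
    # Index by the next free position, not by the data frame.
    results[[length(results) + 1]] <- res
  }
}

# Bind once, after the loops have finished.
combined <- do.call(rbind, results)
```

Binding once after the loop also avoids calling bind_rows() on every iteration and discarding the result, as the original loop did.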
The for-loops
for(i in df$h3_longitude) {
for (j in df$h3_latitude){
}
}
show that you want all combinations of i and j from the longitude and latitude columns of df.
But what you want is a flat list.
In R, you do this better with expand.grid(vec1, vec2).
library(osrm)
df<- read.csv('csv path')
pairings <- expand.grid(df$h3_longitude, df$h3_latitude)
dfs <- lapply(data.frame(t(pairings)), function(v) {
osrmIsochrone(loc=c(v[1], v[2]),
breaks=seq(0,90,30),
osrm.profile='car')
})
res_df <- Reduce(rbind, dfs)
With a for-loop it would look like this:
library(osrm)
df<- read.csv('csv path')
pairings <- expand.grid(df$h3_longitude, df$h3_latitude)
dfs <- list()
for (i in seq_len(nrow(pairings))) {
v <- unlist(pairings[i, ], use.names = FALSE) # one row as a plain numeric vector
dfs[[i]] <- osrmIsochrone(loc=c(v[1], v[2]),
breaks=seq(0,90,30),
osrm.profile='car')
}
res_df <- Reduce(rbind, dfs)
I wrote the following scripts months ago and they worked without any issue. Over the last few days I have tried to rerun the same script but always get the same error. I have changed my script and updated the packages, but I'm unable to make it work again. The script should give me all the delays in the Belgian railway stations.
I have added my two separate scripts (one contains the functions) and the error/traceback.
library(httr)
library(jsonlite)
library(tidyverse)
load.stations <- function(){
a <- GET("https://api.irail.be/stations/?format=json") #get command for all stations from irail api
parsed <- jsonlite::fromJSON(content(a, "text"), flatten=TRUE) #parse json into r
stations <- parsed$station %>%
filter(grepl("^BE.NMBS.0088",id)) #keep only stations in Belgium. Regular expression ^ is begins with
return(stations)
}
get.time <- function(){
time <- paste(format(Sys.time(),"%d/%m/%y %H:%M:%S")) #formats system time as dd/mm/yy hh:mm:ss in a string
strpt <- strptime(time,"%d/%m/%y %H:%M:%S") #takes time-string and converts to interpretable date and time
return(strpt)
}
get.temp_df <- function(stations, i){
goget <- paste0("https://api.irail.be/liveboard/?format=json&id=",stations$id[i]) #http for get command, get liveboard (similar to screens in station i)
c <- GET(goget) #get the data
parsed_c <- jsonlite::fromJSON(content(c, "text"), flatten=TRUE) #parse from json
temp_df <- parsed_c$departures$departure #get the dataframe with departures from the parsed json
return(temp_df)
}
add.to.all <- function(all_df, temp_df){
all_df <- rbind(all_df,temp_df)%>% #add temporary dataframe to master dataframe
group_by(stationneke,time,vehicle)%>% # group departure times by station - remove doubles
top_n(1,importtime)%>% #only keep the most recent observation - remove doubles 2
ungroup() #lift grouping
return(all_df)
}
save.day <- function(all_df){
strpt <- get.time()
saveRDS(all_df,file = paste(strpt$mday, strpt$mon+1, strpt$year+1900,"Punct.rda",sep = "-"))
Sys.sleep(time = 3600-(strpt$min*60+strpt$sec)) #sleep one hour minus number of secs in the sleep time
return(data.frame())
}
library(httr)
library(jsonlite)
library(tidyverse)
## all departures - scraper
loop.scraper <- function(hour_of_pause =3){
source("NMBS-punctuality-functions.R")
all_df <- data.frame() #empty dataframe
stations <- load.stations()
while (TRUE) { #infinite loop
strpt <- get.time()
while(strpt$hour != hour_of_pause){ #enters loop when hour is not "hour_of_pause"
# startloop <- (strpt$min*60 + strpt$sec)
for (i in 1:nrow(stations)) { #second loop through the stations
temp_df <- get.temp_df(stations, i)
if(is.null(temp_df)) next #skip if dataframe is empty (some stations have been closed in recent years)
temp_df$stationneke <- stations$name[i] #add departure station name i to the dataframe
temp_df$importtime <- Sys.time() # add variable with the time of import of the observation
all_df <- add.to.all(all_df, temp_df)
strpt <- get.time()
} #end of loop through stations
# stoploop <- (strpt$min*60 + strpt$sec)
} #end of hour-check loop, code below only executed when no trains active (at night)
all_df <- save.day(all_df) #saves file and returns empty dataframe
}
}
Error: lexical error: invalid char in json text.
<br /> <b>Fatal error</b>: Unc
(right here) ------^
5.
parse_string(txt, bigint_as_char)
4.
parseJSON(txt, bigint_as_char)
3.
parse_and_simplify(txt = txt, simplifyVector = simplifyVector,
simplifyDataFrame = simplifyDataFrame, simplifyMatrix = simplifyMatrix,
flatten = flatten, ...)
2.
jsonlite::fromJSON(content(c, "text"), flatten = TRUE)
1.
loop.scraper(12)
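The text in the error (`<br /> <b>Fatal error</b>`) looks like an HTML error page, which suggests the server occasionally returns something other than JSON. One hedged way to keep the scraper running is to treat an unparsable body as "no data". The helper below is a hypothetical sketch, not part of the original scripts; it only relies on jsonlite::fromJSON() and base tryCatch().

```r
library(jsonlite)

# Hypothetical helper: returns NULL instead of stopping
# when the response body is not valid JSON.
parse_json_or_null <- function(txt) {
  tryCatch(
    jsonlite::fromJSON(txt, flatten = TRUE),
    error = function(e) NULL
  )
}

bad  <- parse_json_or_null("<br /> <b>Fatal error</b>: Unc")
good <- parse_json_or_null('{"departures": {"number": 0}}')
```

Inside get.temp_df() this could replace the direct fromJSON() call. Because `$` on NULL returns NULL in R, temp_df would then be NULL for a failed request, and the existing `if(is.null(temp_df)) next` check would skip that station.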
I'm basically trying to call an API to retrieve weather information from a government website.
library(data.table)
library(jsonlite)
library(httr)
base<-"https://api.data.gov.sg/v1/environment/rainfall"
date1<-"2020-01-25"
call1<-paste(base,"?","date","=",date1,sep="")
get_rainfall<-GET(call1)
get_rainfall_text<-content(get_rainfall,"text")
get_rainfall_json <- fromJSON(get_rainfall_text, flatten = TRUE)
get_rainfall_df <- as.data.frame(get_rainfall_json)
I'm getting an error
"Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 52, 287, 1"
I'm not sure how to resolve this. I'm trying to format the retrieved data into a data frame so I can make sense of the readings.
Your "get_rainfall_json" object comes back as a "list" whose elements have different lengths, so converting the whole list to a data frame is what raises the error. If you select the "items" element within the list, the error is resolved. (The result still has more data embedded within its columns, so you will have to parse that into the format you are interested in.)
get_rainfall_df <- as.data.frame(get_rainfall_json$items)
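The same error can be reproduced without the API. In this sketch the list `parsed` is made up for illustration (only the name `items` comes from the real response; `stations` and `status` are hypothetical), with three components of different sizes.

```r
# Hypothetical parsed response: three components of different lengths.
parsed <- list(
  stations = data.frame(id = c("S1", "S2", "S3")),
  items    = data.frame(timestamp = c("t1", "t2")),
  status   = "healthy"
)

# Converting the whole list fails because the components cannot be
# lined up as columns of equal length; capture the message instead of stopping.
whole <- tryCatch(as.data.frame(parsed),
                  error = function(e) conditionMessage(e))

# Converting a single component works.
items_df <- as.data.frame(parsed$items)
```

Here `whole` ends up holding the error message rather than a data frame, while `items_df` is an ordinary two-row data frame.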
Update
Here is one way to loop through the nested data frames. It goes through each row, extracts the list stored in that row, turns it into a data frame, and appends it to "df". You are then left with one final df with all the data in one place.
library(data.table)
library(jsonlite)
library(httr)
library(dplyr)
base <- "https://api.data.gov.sg/v1/environment/rainfall"
date1 <- "2020-01-25"
call1 <- paste(base, "?", "date", "=", date1, sep = "")
get_rainfall <- GET(call1)
get_rainfall_text <- content(get_rainfall,"text")
get_rainfall_json <- fromJSON(get_rainfall_text, flatten = TRUE)
get_rainfall_df <- as.data.table(get_rainfall_json$items)
df <- data.frame()
for (row in 1:nrow(get_rainfall_df)) {
new_date <- get_rainfall_df[row, ]$readings[[1]]
colnames(new_date) <- c("stationid", "value")
date <- get_rainfall_df[row, ]$timestamp
new_date$date <- date
df <- bind_rows(df, new_date)
}
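As an alternative sketch, the same flattening can be done without growing a data frame inside a loop. The toy `items` object below is made up to mimic the shape described above (a timestamp column plus a list column of per-timestamp readings); the real response may differ.

```r
# Toy data mimicking the assumed shape of get_rainfall_json$items.
items <- data.frame(timestamp = c("t1", "t2"), stringsAsFactors = FALSE)
items$readings <- list(
  data.frame(stationid = c("S1", "S2"), value = c(0, 0.2),
             stringsAsFactors = FALSE),
  data.frame(stationid = "S1", value = 0.4, stringsAsFactors = FALSE)
)

# Attach each timestamp to its readings, then bind everything once.
pieces <- Map(function(r, ts) {
  r$date <- ts
  r
}, items$readings, items$timestamp)

flat <- do.call(rbind, pieces)
```

The result has one row per station reading, with the timestamp repeated in the date column.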
I am trying to use the gmapsdistance package in R to calculate the journey time by public transport between a list of postcodes (origin) and a single destination postcode.
The output for a single query is:
$Time
[1] 5352
$Distance
[1] 34289
$Status
[1] "OK"
I actually have 2.5k postcodes to use but whilst I troubleshoot it I have set the iterations to 10. london1 is a dataframe containing a single column with 2500 postcodes in 2500 rows.
This is my attempt so far:
results <- for(i in 1:10) {
gmapsdistance::set.api.key("xxxxxx")
gmapsdistance::gmapsdistance(origin = "london1[i]"
destination = "WC1E 6BT"
mode = "transit"
dep_date = "2017-04-18"
dep_time = "09:00:00")}
When I run this loop I get
results <- for(i in 1:10) {
+ gmapsdistance::set.api.key("AIzaSyDFebeOppqSyUGSut_eGs8JcjdsgPBo8zk")
+ gmapsdistance::gmapsdistance(origin = "london1[i]"
+ destination = "WC1E 6BT"
Error: unexpected symbol in:
" gmapsdistance::gmapsdistance(origin = "london1[i]"
destination"
mode = "transit"
dep_date = "2017-04-18"
dep_time = "09:00:00")}
Error: unexpected ')' in " dep_time = "09:00:00")"
My questions are:
1) How can I fix this?
2) How do I need to format this so that the output is a dataframe or matrix containing the origin postcode and journey time?
Thanks
There are a few things going on here:
"london1[i]" is a string literal; it needs to be london1[i, 1]
you need to separate your arguments with commas ,
I get an error when using, e.g., "WC1E 6BT"; I found it necessary to replace the space with a dash, like "WC1E-6BT"
the loop needs to explicitly assign values to elements of results (a for loop itself returns NULL)
So your code would look something like:
library(gmapsdistance)
## some example data
london1 <- data.frame(postCode = c('WC1E-7HJ', 'WC1E-6HX', 'WC1E-7HY'))
## make an empty list to be filled in
results <- vector('list', 3)
for(i in 1:3) {
set.api.key("xxxxxx")
## fill in your results list
results[[i]] <- gmapsdistance(origin = london1[i, 1],
destination = "WC1E-6BT",
mode = "transit",
dep_date = "2017-04-18",
dep_time = "09:00:00")
}
It turns out you don't need a loop, and probably shouldn't use one, when using gmapsdistance (see the help doc). The output from multiple inputs also helps in quickly formatting your output into a data.frame:
set.api.key("xxxxxx")
temp1 <- gmapsdistance(origin = london1[, 1],
destination = "WC1E-6BT",
mode = "transit",
dep_date = "2017-04-18",
dep_time = "09:00:00",
combinations = "all")
The above returns a list of data.frame objects, one each for Time, Distance and Status. You can then easily make those into a data.frame containing everything you might want:
res <- data.frame(origin = london1[, 1],
destination = 'WC1E-6BT',
do.call(data.frame, lapply(temp1, function(x) x[, 2])))
lapply(temp1, function(x) x[, 2]) extracts the needed column from each data.frame in the list, and do.call puts them back together as columns in a new data.frame object.
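To see the lapply/do.call step in isolation, here is a small sketch with a made-up temp1. It assumes, as described above, that each element is a data.frame whose second column holds the values of interest; the column names are hypothetical.

```r
# Made-up stand-in for the gmapsdistance result (structure is an assumption).
temp1 <- list(
  Time     = data.frame(or = c("A", "B"), Time.X = c(100, 200)),
  Distance = data.frame(or = c("A", "B"), Distance.X = c(1000, 2000)),
  Status   = data.frame(or = c("A", "B"), status.X = c("OK", "OK"))
)

# Pull the second column out of each element; the list keeps its names.
cols <- lapply(temp1, function(x) x[, 2])

# Each named vector becomes one column of the new data.frame.
res <- do.call(data.frame, cols)
```

Because `cols` is a named list, do.call passes its elements as named arguments, so the columns of `res` are called Time, Distance and Status.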
I'm retrieving one-minute quotes from Google. After processing the data I try to create an xts object with one-minute intervals, but I get the same datetime repeated several times and don't understand why. Note that if I use the same data to build a vector of timestamps called my.dat2, it does work.
library(xts)
url <- 'https://www.google.com/finance/getprices?q=IBM&i=60&p=15d&f=d,o,h,l,c,v'
x <- read.table(url,stringsAsFactors = F)
mynam <- unlist(strsplit(unlist(strsplit(x[5,], split='=', fixed=TRUE))[2] , split=','))
interv <- as.numeric(unlist(strsplit(x[4,], split='=', fixed=TRUE))[2])
x2 <- do.call(rbind,strsplit(x[-(1:7),1],split=','))
rownames(x2) <- NULL
colnames(x2) <- mynam
ind <- which(nchar(x2[,1])>5)
x2[ind,1] <- unlist(strsplit(x2[ind,1], split='a', fixed=TRUE))[2]
# Convert the character matrix to numeric
class(x2) <- 'numeric'
my.dat <- rep(0,nrow(x2))
#Convert all to same format
for (i in 1:nrow(x2)) {
if (nchar(x2[i,1])>5) {
ini.dat <- x2[i,1]
my.dat[i] <- ini.dat
} else {
my.dat[i] <- ini.dat+interv*x2[i,1]
}
}
df <- xts(x2[,-1],as.POSIXlt(my.dat, origin = '1970-01-01'))
head(df,20)
my.dat2 <- as.POSIXlt(my.dat, origin = '1970-01-01')
head(my.dat2,20)
I tried a simpler example simulating the data and creating a sequence of dates by minute to create the xts object and it worked so it must be something that I'm missing when passing the dates to the xts function.
Your my.dat object has duplicated values and xts and zoo objects must be ordered, so all the duplicate values are being grouped together.
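The issue is this line: unlist() flattens all the split results into one vector, so [2] picks a single value (the second element of that vector) and assigns it to every row, rather than taking the second element of each split.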
x2[ind,1] <- unlist(strsplit(x2[ind,1], split='a', fixed=TRUE))[2]
# this should be
x2[ind,1] <- sapply(strsplit(x2[ind,1], split='a', fixed=TRUE), "[[", 2)
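The difference is easy to see on two made-up timestamp strings (the values are hypothetical, in the same "a" prefixed form the code expects):

```r
stamps <- c("a1490000000", "a1490100000")
parts <- strsplit(stamps, split = "a", fixed = TRUE)

# Flattening first and then taking [2] yields one value for all rows.
wrong <- unlist(parts)[2]

# Taking the second element of each split keeps one value per row.
right <- sapply(parts, "[[", 2)
```

Here `wrong` is the single string "1490000000", which gets recycled across every selected row on assignment, while `right` has one timestamp per input string.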