I have a working program which pulls addresses from a list in Excel and geocodes them using a Google API, but any time it reaches an address with an apartment, a unit, or an unfindable address, the program stops.
I can't get a workable tryCatch routine going inside my loop. :(
Here is the Code:
library("readxl")
library(ggplot2)
library(ggmap)
fileToLoad <- file.choose(new = TRUE)
origAddress <- read_excel(fileToLoad, sheet = "Sheet1")
geocoded <- data.frame(stringsAsFactors = FALSE)
for(i in 1:nrow(origAddress))
{
  # Print("Working...")
  result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
  origAddress$lon[i] <- as.numeric(result[1])
  origAddress$lat[i] <- as.numeric(result[2])
  origAddress$geoAddress[i] <- as.character(result[3])
}
write.csv(origAddress, "geocoded1.csv", row.names=FALSE)
And here is the Error message:
Warning: Geocoding "[removed address]" failed with error:
You must use an API key to authenticate each request to Google Maps Platform APIs. For additional information, please refer to http://g.co/dev/maps-no-account
Error: Can't subset columns that don't exist.
x Location 3 doesn't exist.
i There are only 2 columns.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: Unknown or uninitialised column: `lon`.
2: Unknown or uninitialised column: `lat`.
3: Unknown or uninitialised column: `geoAddress`.
Now, this is not an API key error because the key works in calls after the error -- and it stops at any address that ends in a number after the street name.
I'm going to be processing batches of thousands of addresses every month and they are not all going to be perfect, so what I need is to be able to skip these bad addresses, put "NA" in the lon/lat columns, and move on.
I'm new to R and can't put together a workable error-handling routine for these kinds of mistakes. Can anyone point me in the right direction? Thanks in advance.
When geocode fails to find an address with output = "latlona", the address field is not returned. Your code can be made to work with the following modification.
#
# example data
#
origAddress <- data.frame(addresses = c("white house, Washington",
"white house, # 100, Washington",
"white hose, Washington",
"Washington Apartments, Washington, DC 20001",
"1278 7th st nw, washington, dc 20001") )
#
# simple fix for fatal error
#
for(i in 1:nrow(origAddress))
{
  result <- geocode(origAddress$addresses[i], output = "latlona",
                    source = "google")
  origAddress$lon[i] <- result$lon[1]
  origAddress$lat[i] <- result$lat[1]
  origAddress$geoAddress[i] <- ifelse(is.na(result$lon[1]), NA, result$address[1])
}
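Since you asked specifically about tryCatch: if you also want to guard against hard failures (for example a transient network or quota error) rather than just a missing address field, you can additionally wrap each call in tryCatch and fall back to NA for that row. A minimal sketch, assuming your Google key is already registered; it is not required for the fix above:
for(i in 1:nrow(origAddress))
{
  result <- tryCatch(
    geocode(origAddress$addresses[i], output = "latlona", source = "google"),
    error = function(e) data.frame(lon = NA, lat = NA, address = NA)  # fall back to NA on any error
  )
  origAddress$lon[i] <- result$lon[1]
  origAddress$lat[i] <- result$lat[1]
  origAddress$geoAddress[i] <- ifelse(is.na(result$lon[1]), NA, result$address[1])
}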
However, you mention that some of your addresses may not be exact. Google's geocoding will try to interpret every address you supply. Sometimes it fails and returns NA, but other times its interpretation may not be correct, so you should always check geocode results.
A simple method which will catch many errors is to set output = "more" in geocode and then examine the values returned in the loctype column. If loctype != "rooftop", you may have a problem. Examining the type column will give you more information. This check isn't complete. To do a more thorough check, you could use output = "all" to return all the data Google supplies for an address, but this requires parsing a moderately complex list. You should read more about the data returned by Google geocoding at https://developers.google.com/maps/documentation/geocoding/overview
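For reference, a minimal sketch of checking a single address with output = "all" could look like the following. It assumes the standard fields of the Google geocoding response (status, results, geometry$location_type, partial_match); adapt the extraction to whatever fields you actually need.
res <- geocode("Washington Apartments, Washington, DC 20001",
               output = "all", source = "google")
if (!is.null(res$status) && res$status == "OK") {
  hit     <- res$results[[1]]
  loctype <- hit$geometry$location_type   # e.g. "ROOFTOP" vs "APPROXIMATE"
  matched <- hit$formatted_address        # the address Google actually matched
  partial <- isTRUE(hit$partial_match)    # TRUE when Google had to guess
} else {
  loctype <- matched <- partial <- NA     # nothing usable returned for this address
}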
Also, geocode will take at least tens of minutes to return results for thousands of addresses. To minimize the response time, you should supply the addresses to geocode as a single character vector. A data frame of results is then returned, which you can use to update your origAddress data frame and check for errors as shown below.
#
# Solution should check for wrongly interpreted addresses
#
# see https://developers.google.com/maps/documentation/geocoding/overview
# for more information on fields returned by google geocoding
#
# return all addresses in single call to geocode
#
origAddress <- data.frame(addresses = c("white house, Washington", # identified by name
"white hose, Washington", # misspelling
"Washington Apartments, apt 100, Washington, DC 20001", # identified by name of apartment building
"Washington Apartments, # 100, Washington, DC 20001", # invalid apartment number specification
"1206 7th st nw, washington, dc 20001") ) # address on street but no structure with that address
result <- suppressWarnings(geocode(location = origAddress$addresses,
                                   output = "more",
                                   source = "google"))
origAddress <- cbind(origAddress, result[, c("address", "lon", "lat", "type", "loctype")])
#
# Addresses which need to be checked
#
check_addresses <- origAddress[origAddress$loctype != "rooftop" |
                               is.na(origAddress$loctype), ]
I have some code that loops over a list of study IDs (ids) and turns them into separate polygons/spatial points. On the first execution of the loop it produces the following error:
Error in (function (x) : attempt to apply non-function
This is from the raster::rasterToPoints function. I've looked at the examples in the help section for this function, and passing fun=NULL seems to be an acceptable approach (it filters out all NA values). All the values are equal to 1 anyway, so I also tried passing a simple function like the help suggests, such as function(x){x==1}. When this didn't work, I tried to just suppress the error message, but without any luck using try() or tryCatch().
Main questions:
1. Why does this produce an error at all?
2. Why does it only display the error on the first run through the loop?
Reproducible example:
library(ggplot2)
library(raster)
library(sf)
library(dplyr)
pacific <- map_data("world2")
pac_mod <- pacific
coordinates(pac_mod) <- ~long+lat
proj4string(pac_mod) <- CRS("+init=epsg:4326")
pac_mod2 <- spTransform(pac_mod, CRS("+init=epsg:4326"))
pac_rast <- raster(pac_mod2, resolution=0.5)
values(pac_rast) <- 1
all_diet_density_samples <- data.frame(
lat_min = c(35, 35),
lat_max = c(65, 65),
lon_min = c(140, 180),
lon_max = c(180, 235),
sample_replicates = c(38, 278),
id= c(1,2)
)
ids <- all_diet_density_samples$id
for (idnum in ids){
  poly1 = all_diet_density_samples[idnum, ]
  pol = st_sfc(st_polygon(list(cbind(
    c(poly1$lon_min, poly1$lon_min, poly1$lon_max, poly1$lon_max, poly1$lon_min),
    c(poly1$lat_min, poly1$lat_max, poly1$lat_max, poly1$lat_min, poly1$lat_min)))))
  pol_sf = st_as_sf(pol)
  x <- rasterize(pol_sf, pac_rast)
  df1 <- raster::rasterToPoints(x, fun = NULL, spatial = FALSE) # ERROR HERE
  df2 <- as.data.frame(df1)
  density_poly <- all_diet_density_samples %>% filter(id == idnum) %>% pull(sample_replicates)
  df2$density <- density_poly
  write.csv(df2, paste0("pol_", idnum, ".csv"))
}
Any help would be greatly appreciated!
These are error messages, but not errors in the strict sense, as the script continues to run and the results are not affected. They are related to garbage collection (the removal from memory of objects that are no longer in use), which makes it tricky to pinpoint what causes them (below you can see a slightly modified example that suggests another culprit) and why they do not always happen at the same spot.
Edit (Oct 2022)
These annoying messages
Error in x$.self$finalize() : attempt to apply non-function
Error in (function (x) : attempt to apply non-function
will disappear with the next release of Rcpp, which is planned for Jan 2023. You can also install the development version of Rcpp like this:
install.packages("Rcpp", repos="https://rcppcore.github.io/drat")
I am trying to estimate the static yield curve for Brazil using the termstrc package in R. I am using the function estim_nss.couponbonds, setting 0% coupon rates and $0 cash flows, except for the last one, which is $1000 (the face value at maturity). As far as I know this is the right function for the job, because estim_nss.zeroyields only calculates the dynamic curve. The problem is that I receive the following error message:
"Error in (pos_cf[i] + 1):pos_cf[i + 1] : NA/NaN argument In addition: Warning message: In max(n_of_cf) : no non-missing arguments to max; returning -Inf "
I've tried to trace the problem using trace(estim_nss.couponbonds, edit=T), but I cannot find where pos_cf[i]+1 is calculated. Based on the name I figured it could come from the postpro_bond function and used trace(postpro_bond, edit=T), but I couldn't find the calculation there either. I believe "cf" comes from cashflow, so there could be some problem in the calculation of the cashflows. I used create_cashflows_matrix to test this theory, but it works fine, so I am not sure the problem is in the cashflows.
The code is:
#Creating the 'couponbond' class
ISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019', 'ltn_2021','ltn_2023')) #Bond's identification
MATURITYDATE <- as.Date(c(42736, 43101, 43466, 44197, 44927), origin='1899-12-30') #Dates are in system's format
ISSUEDATE <- as.Date(c(41288,41666,42395, 42073, 42395), origin='1899-12-30') #Dates are in system's format
COUPONRATE <- rep(0,5) #Coupon rates are 0 because these are zero-coupon bonds
PRICE <- c(969.32, 867.77, 782.48, 628.43, 501.95) #Prices seen 'TODAY'
ACCRUED <- rep(0.1,5) #There is no accrued interest in the brazilian bond's market
#Creating the cashflows sublist
CFISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019', 'ltn_2021', 'ltn_2023')) #Bond's identification
CF <- c(1000,1000,1000,1000,1000)# The face-values
DATE <- as.Date(c(42736, 43101, 43466, 44197, 44927), origin='1899-12-30') #Dates are in system's format
CASHFLOWS <- list(CFISIN,CF,DATE)
names(CASHFLOWS) <- c("ISIN","CF","DATE")
TODAY <- as.Date(42646, origin='1899-12-30')
brasil <- list(ISIN,MATURITYDATE,ISSUEDATE,
COUPONRATE,PRICE,ACCRUED,CASHFLOWS,TODAY)
names(brasil) <- c("ISIN","MATURITYDATE","ISSUEDATE","COUPONRATE",
"PRICE","ACCRUED","CASHFLOWS","TODAY")
mybonds <- list(brasil)
class(mybonds) <- "couponbonds"
#Estimating the zero-yield curve
ns_res <- estim_nss.couponbonds(mybonds, 'brasil', method = "ns")
#Testing the hypothesis that the error comes from the cashflow matrix
cf_p <- create_cashflows_matrix(mybonds[[1]], include_price = T)
m_p <- create_maturities_matrix(mybonds[[1]], include_price = T)
b <- bond_yields(cf_p,m_p)
Note that I am aware of this question which reports the same problem. However, it is for the dynamic curve. Besides that, there is no useful answer.
Your code has two problems: (1) it doesn't name the first list (this is the direct cause of the error, but if you fix only that, another error happens); (2) in the cashflows sublist, at least one ISIN level needs more than one entry.
# ...
CFISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019',
'ltn_2021', 'ltn_2023', 'ltn_2023')) # added a 6th element
CF <- c(1000,1000,1000,1000,1000, 1000) # added a 6th
DATE <- as.Date(c(42736,43101,43466,44197,44927, 44928), origin='1899-12-30') # added a 6th
CASHFLOWS <- list(CFISIN,CF,DATE)
names(CASHFLOWS) <- c("ISIN","CF","DATE")
TODAY <- as.Date(42646, origin='1899-12-30')
brasil <- list(ISIN,MATURITYDATE,ISSUEDATE,
COUPONRATE,PRICE,ACCRUED,CASHFLOWS,TODAY)
names(brasil) <- c("ISIN","MATURITYDATE","ISSUEDATE","COUPONRATE",
"PRICE","ACCRUED","CASHFLOWS","TODAY")
mybonds <- list(brasil = brasil) # named the list
class(mybonds) <- "couponbonds"
ns_res <- estim_nss.couponbonds(mybonds, 'brasil', method = "ns")
Note: the error came from these lines
bonddata <- bonddata[group] # prepro_bond()'s 1st line (the direct reason).
# cf <- lapply(bonddata, create_cashflows_matrix) # the additional error
create_cashflows_matrix(mybonds[[1]], include_price = F) # don't run
I'm using ggmap to find locations. Some locations generate an error. For example,
library(ggmap)
loc = 'Blue Grass Airport'
geocode(loc, output = c("more"))
results in
Error in data.frame(long_name = "Blue Grass Airport", short_name = "Blue Grass Airport", :
arguments imply differing number of rows: 1, 0
It's OK if I can't get results for some locations, but I'm trying to work through a list of 100 locations. So is there a way to get NA instead of an error and keep things going? E.g.,
library(ggmap)
loc = c('Blue Grass Airport', 'Boston MA', 'NYC')
geocode(loc, output = c("more"))
should generate
NA
Result for Boston
Result for New York City
You can make use of the R tryCatch() function to handle these errors gracefully:
loc = 'Blue Grass Airport'
x <- tryCatch(geocode(loc, output = c("more")),
              warning = function(w) {
                print("warning")
                # handle warning here
              },
              error = function(e) {
                print("error")
                # handle error here
              })
If you intend to loop over locations explicitly using a for loop or using an apply function, then tryCatch() should also come in handy.
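For example, a sketch of looping over a vector of locations and collecting NA for the ones that fail (this treats a warning as a failure too, and the column selection at the end is only illustrative; adjust it to the output you request):
library(ggmap)

locs <- c('Blue Grass Airport', 'Boston MA', 'NYC')

# geocode each location, falling back to NA coordinates on error or warning
results <- lapply(locs, function(loc) {
  tryCatch(
    geocode(loc, output = "more"),
    warning = function(w) data.frame(lon = NA, lat = NA),
    error   = function(e) data.frame(lon = NA, lat = NA)
  )
})

# collect just the coordinates into one data frame
coords <- do.call(rbind, lapply(results, function(r) data.frame(lon = r$lon[1], lat = r$lat[1])))
coords$location <- locs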
In looking for a way to modify the .Holidays object in the chron package, I discovered this solution:
How to define holidays for is.holiday() chron package in R
Which works very well in itself, except that when I include "GBNewYearsEve" in hlist, I receive an error:
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'GBNewYearsEve' of mode 'function' was not found
This error doesn't appear if GBNewYearsEve is removed from the list. What have I missed?
Example Working Code:
library(chron)
library(timeDate)
hlist <- c("GBMayDay", "GBBankHoliday", "GBSummerBankHoliday", "ChristmasEve", "ChristmasDay", "BoxingDay", "NewYearsDay")
(ss <- dates(sapply(sapply(hlist, holiday, year = c(2011)), as.Date)))
.Holidays <- ss
chron::.Holidays ##nochange
unlockBinding(".Holidays", as.environment("package:chron"))
assignInNamespace(".Holidays", .Holidays, ns="chron",
envir=as.environment("package:chron"))
assign(".Holidays", .Holidays, as.environment("package:chron"))
lockBinding(".Holidays", as.environment("package:chron"))
chron::.Holidays ##change
Example non-working code:
hlist <- c("GBMayDay", "GBBankHoliday", "GBSummerBankHoliday", "ChristmasEve", "ChristmasDay", "BoxingDay", "NewYearsDay", "GBNewYearsEve")
(ss <- dates(sapply(sapply(hlist,holiday,year=2011),as.Date)))
I'm not sure this is an answer that will suit you. I was curious about your problem, so I downloaded the timeDate package from CRAN. Although GBNewYearsEve seems to be documented in ?holiday, I don't think the code is ready for it.
If I run your code as it is I get:
> hlist <- c("GBMayDay", "GBBankHoliday", "GBSummerBankHoliday", "ChristmasEve", "ChristmasDay", "BoxingDay", "NewYearsDay", "GBNewYearsEve")
>
> (ss <- dates(sapply(sapply(hlist,holiday,year=2011),as.Date)))
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'GBNewYearsEve' of mode 'function' was not found
(My R session runs in Spanish; the error message, translated here, just says that GBNewYearsEve was not found.) I actually don't find it anywhere in the code of timeDate. However, if I add a definition like this:
GBNewYearsEve =
  function(year = getRmetricsOptions("currentYear")) {
    ans = year * 10000 + 1231
    timeDate(as.character(ans))
  }
(This is basically copied from DENewYearsEve, the only New Year's Eve definition present in the package.)
Then I get your code running:
> (ss <- dates(sapply(sapply(hlist,holiday,year=2011),as.Date)))
GBMayDay GBBankHoliday GBSummerBankHoliday ChristmasEve ChristmasDay BoxingDay
05/02/11 05/30/11 08/29/11 12/24/11 12/25/11 12/26/11
NewYearsDay GBNewYearsEve
01/01/11 12/31/11
However, I'm not sure how good a solution this is. Note that in timeDate some additional transformations are done so that, for example, when a holiday falls on a weekend it is moved to the following Monday. With the code above, you just get New Year's Eve on the 31st of December.
For example, this is in holiday-LONDON.R:
# New Year's Day: if it falls on Sat/Sun, then is
# moved to following Monday
posix1 <- as.POSIXlt(NewYearsDay(y))
if (posix1$wday == 0 | posix1$wday == 6) {
    lon <- timeDate(.on.or.after(y, 1, 1, 1), zone = "London",
                    FinCenter = "Europe/London")
    holidays <- c(holidays, as.character(lon))
} else {
    holidays <- c(holidays, as.character(posix1))
}
I guess the package is handling only official holidays for each country, and adding those additional rules?
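If you do want a similar adjustment for New Year's Eve, a minimal sketch along those lines could look like the code below. The shift-to-the-preceding-Friday rule is purely an assumption for illustration; use whatever observance rule you actually need. As with the GBNewYearsEve definition above, defining it in your workspace should be enough for holiday() to find it by name.
# Hypothetical variant: observe New Year's Eve on the preceding Friday
# when 31 December falls on a Saturday or Sunday (assumed rule)
GBNewYearsEveObserved <- function(year = getRmetricsOptions("currentYear")) {
  d    <- as.Date(paste0(year, "-12-31"))
  wday <- as.POSIXlt(d)$wday                        # 0 = Sunday, 6 = Saturday
  d    <- d - ifelse(wday == 0, 2, ifelse(wday == 6, 1, 0))
  timeDate(format(d, "%Y%m%d"))
}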