Geocoding with R: Errors stopping program altogether - r

I have a working program which pulls addresses from a list in Excel and geocodes them using a Google API, but anytime it gets to an address with an apartment, unit, or unfindable address, it stops the program.
I can't get a workable tryCatch routine going inside my loop. :(
Here is the Code:
library("readxl")
library(ggplot2)
library(ggmap)
fileToLoad <- file.choose(new = TRUE)
origAddress <- read_excel(fileToLoad, sheet = "Sheet1")
geocoded <- data.frame(stringsAsFactors = FALSE)
for(i in 1:nrow(origAddress))
{
# Print("Working...")
result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
origAddress$lon[i] <- as.numeric(result[1])
origAddress$lat[i] <- as.numeric(result[2])
origAddress$geoAddress[i] <- as.character(result[3])
}
write.csv(origAddress, "geocoded1.csv", row.names=FALSE)
And here is the Error message:
Warning: Geocoding "[removed address]" failed with error:
You must use an API key to authenticate each request to Google Maps Platform APIs. For additional information, please refer to http://g.co/dev/maps-no-account
Error: Can't subset columns that don't exist.
x Location 3 doesn't exist.
i There are only 2 columns.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: Unknown or uninitialised column: `lon`.
2: Unknown or uninitialised column: `lat`.
3: Unknown or uninitialised column: `geoAddress`.
Now, this is not an API key error because the key works in calls after the error -- and it stops at any address that ends in a number after the street name.
I'm going to be processing batches of thousands of addresses every month and they are not all going to be perfect, so what I need is to be able to skip these bad addresses, put "NA" in the lon/lat columns, and move on.
I'm new to R and can't make a workable error handling routine to handle these types of mistakes. Can anyone point me in the right direction? Thanks in advance.

When geocode fails to find an address and output = "latlona", the address field is not returned. Your code can be made to work with the following modification.
#
# example data
#
origAddress <- data.frame(addresses = c("white house, Washington",
"white house, # 100, Washington",
"white hose, Washington",
"Washington Apartments, Washington, DC 20001",
"1278 7th st nw, washington, dc 20001") )
#
# simple fix for fatal error
#
for(i in 1:nrow(origAddress))
{
result <- geocode(origAddress$addresses[i], output = "latlona",
source = "google")
origAddress$lon[i] <- result$lon[1]
origAddress$lat[i] <- result$lat[1]
origAddress$geoAddress[i] <- ifelse( is.na(result$lon[1]), NA, result$address[1] )
}
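Since the question asks for a tryCatch routine, here is a minimal sketch of one way to wrap the call so that a hard error on one address yields NA instead of stopping the loop. The helper safe_geocode and the stand-in fake_geocode are hypothetical names that exist only so the example runs without an API key; in real use you would pass a function that calls ggmap's geocode.

```r
# Hypothetical stand-in for a real geocoder so the sketch runs offline:
# it fails hard on any address containing "#" and otherwise returns
# made-up coordinates (not real geocoding results).
fake_geocode <- function(address) {
  if (grepl("#", address, fixed = TRUE)) stop("geocoding failed for: ", address)
  data.frame(lon = -77.0, lat = 38.9, address = tolower(address),
             stringsAsFactors = FALSE)
}

# Run `geocoder` on one address; on error, return a one-row data frame of NAs.
safe_geocode <- function(address, geocoder) {
  tryCatch(
    geocoder(address),
    error = function(e) {
      message("Skipping '", address, "': ", conditionMessage(e))
      data.frame(lon = NA_real_, lat = NA_real_, address = NA_character_,
                 stringsAsFactors = FALSE)
    }
  )
}

origAddress <- data.frame(
  addresses = c("white house, Washington", "white house, # 100, Washington"),
  stringsAsFactors = FALSE
)

# Create the output columns up front so assignment by index is well defined.
origAddress$lon <- NA_real_
origAddress$lat <- NA_real_
origAddress$geoAddress <- NA_character_

for (i in seq_len(nrow(origAddress))) {
  result <- safe_geocode(origAddress$addresses[i], fake_geocode)
  origAddress$lon[i] <- result$lon[1]
  origAddress$lat[i] <- result$lat[1]
  origAddress$geoAddress[i] <- result$address[1]
}
```

Creating the lon, lat, and geoAddress columns before the loop also avoids the "Unknown or uninitialised column" warnings shown in the question.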
However, you mention that some of your addresses may not be exact. Google's geocoding will try to interpret every address you supply. Sometimes it fails and returns NA, but other times its interpretation may simply be wrong, so you should always check geocode results.
A simple method that will catch many errors is to set output = "more" in geocode and then examine the values returned in the loctype column. If loctype != "rooftop", you may have a problem. Examining the type column will give you more information. This check isn't complete. For a more complete check, you could use output = "all" to return all data supplied by Google for an address, but this requires parsing a moderately complex list. You should read more about the data returned by Google geocoding at https://developers.google.com/maps/documentation/geocoding/overview
Also, geocode will take at least tens of minutes to return results for thousands of addresses. To minimize the response time, you should supply the addresses to geocode as a single character vector. A data frame of results is then returned which you can use to update your origAddress data frame and check for errors as shown below.
#
# Solution should check for wrongly interpreted addresses
#
# see https://developers.google.com/maps/documentation/geocoding/overview
# for more information on fields returned by google geocoding
#
# return all addresses in single call to geocode
#
origAddress <- data.frame(addresses = c("white house, Washington", # identified by name
"white hose, Washington", # misspelling
"Washington Apartments, apt 100, Washington, DC 20001", # identified by name of apartment building
"Washington Apartments, # 100, Washington, DC 20001", # invalid apartment number specification
"1206 7th st nw, washington, dc 20001") ) # address on street but no structure with that address
result <- suppressWarnings(geocode(location = origAddress$addresses,
output = "more",
source = "google") )
origAddress <- cbind(origAddress, result[, c("address", "lon","lat","type", "loctype")])
#
# Addresses which need to be checked
#
check_addresses <- origAddress[ origAddress$loctype != "rooftop" |
is.na(origAddress$loctype), ]
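The check above can also be wrapped in a small helper that adds a flag column instead of subsetting right away. This is only a sketch: flag_for_review is a hypothetical name, and the sample data frame uses made-up values shaped like the lon, lat, and loctype columns selected above, not real API output.

```r
# Flag rows whose location type is missing or is anything other than "rooftop".
flag_for_review <- function(df) {
  df$needs_check <- is.na(df$loctype) | df$loctype != "rooftop"
  df
}

# Made-up values for illustration only, not real geocoding results.
geocoded <- data.frame(
  addresses = c("address A", "address B", "address C"),
  lon       = c(-77.04, NA, -77.02),
  lat       = c(38.90, NA, 38.91),
  loctype   = c("rooftop", NA, "approximate"),
  stringsAsFactors = FALSE
)

flagged <- flag_for_review(geocoded)
check_addresses <- flagged[flagged$needs_check, ]
```

Keeping the flag as a column means the full table can still be written out with write.csv, with the rows to review marked rather than removed.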

Related

Reverse Geo Coding in R

I would like to reverse geo code address and pin code in R
These are the columns
A B C
15.3859085 74.0314209 7J7P92PJ+9H77QGCCCC
I have taken first four rows having columns A B and C among 1000's of rows
df<-ga.data[1:4,]
df <- cbind(df,do.call(rbind,
lapply(1:nrow(df),
function(i)
revgeocode(as.numeric(
df[i,3:1]), output = "more")
[c("administrative_area_level_1","locality","postal_code","address")])))
Error in revgeocode(as.numeric(df[i, 3:1]), output = "more") :
is.numeric(location) && length(location) == 2 is not TRUE
Also is there any other package or approach to find out the address and pincode most welcome
I also tried the following
When I tried using ggmap I got this error
In revgeocode(as.numeric(df[i, c("Latitude", "Longitude")]), output = "address") :
HTTP 400 Bad Request
Also i tried this
revgeocode(c(df$B[1], df$A[1]))
Warning message: In revgeocode(c(df$Longitude[1], df$Latitude[1])) : HTTP 400 Bad Request
Also I am from India and it does not work for me if i search for lat long of India. If I use lat long of US it gives me the exact address
seems fishy
data <- read.csv(text="ID, Longitude, Latitude
311175, 41.298437, -72.929179
292058, 41.936943, -87.669838
12979, 37.580956, -77.471439")
library(ggmap)
result <- do.call(rbind,
lapply(1:nrow(data),
function(i)revgeocode(as.numeric(data[i,3:2]))))
data <- cbind(data,result)
The current CRAN version of revgeo_0.15 does not have the revgeocode function. If you upgrade to this version, you'll find a revgeo function, which takes longitude, latitude arguments. Your column C should not be passed into the function.
revgeo::revgeo(latitude=df[, 'A'], longitude=df[, 'B'], output='frame')
[1] "Getting geocode data from Photon: http://photon.komoot.de/reverse?lon=74.0314209&lat=15.3859085"
housenumber street city state zip country
1 House Number Not Found Street Not Found Borim Goa Postcode Not Found India
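The first error in the question comes from passing three columns (including the text column C) where a numeric pair of length 2 is expected. Below is a minimal sketch of a guard that builds the longitude, latitude pair from one row and validates it before any reverse geocoding call. lonlat_from_row is a hypothetical helper, and the column roles (A as latitude, B as longitude) follow the revgeo call above.

```r
# Build a c(lon, lat) pair from row i of df, or return NAs if the values
# are missing or outside the valid coordinate ranges.
lonlat_from_row <- function(df, i, lat_col = "A", lon_col = "B") {
  loc <- c(lon = as.numeric(df[[lon_col]][i]),
           lat = as.numeric(df[[lat_col]][i]))
  valid <- !anyNA(loc) && abs(loc[["lon"]]) <= 180 && abs(loc[["lat"]]) <= 90
  if (!valid) return(c(lon = NA_real_, lat = NA_real_))
  loc
}

# One row from the question, plus a deliberately invalid row (latitude > 90).
df <- data.frame(
  A = c(15.3859085, 115.0),
  B = c(74.0314209, 74.0),
  C = c("7J7P92PJ+9H77QGCCCC", "invalid"),
  stringsAsFactors = FALSE
)

loc_ok  <- lonlat_from_row(df, 1)
loc_bad <- lonlat_from_row(df, 2)
```

The resulting pair is numeric and has length 2, which is what the failed check is.numeric(location) && length(location) == 2 in the error message requires.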

Geocoding Data Locations With Google in R

I am trying to use the very well written instructions from this blog: https://www.jessesadler.com/post/geocoding-with-r/ to geocode locational data in R, including specific sites and cities in Hawaii. I am having issues pulling information from Google. When running mutate_geocode my code runs but no output is gathered. I bypassed this for the time being by manually entering lat and lon for just one location of my dataset, in an attempt to troubleshoot. Now, when I use get_googlemap, I get the error message "Error in Download File".
I have tried using mutate_geocode as well as running a loop using geocode. I either do not get output or I get the OVER_QUERY_LIMIT error (which seems to be very classic). After checking my query limit I am nowhere near the limit.
Method 1:
BH <- rename(location, place = Location)
BH_df <- as.data.frame(BH)
location_df <- mutate_geocode(HB, Location)
Method 2:
origAddress <- read.csv("HSMBH.csv", stringsAsFactors = FALSE)
geocoded <- data.frame(stringsAsFactors = FALSE)
for(i in 1:nrow(origAddress))
{
result <- geocode(HB$Location[i], output = "latlona", source = "google")
HB$lon[i] <- as.character(result[1])
HB$lat[i] <- as.character(result[2])
HB$geoAddress[i] <- as.character(result[3])
}
Post Manual Entry of lon and lat points I run in to this error:
map <- get_googlemap(center = c(-158.114, 21.59), zoom = 4)
I am hoping to gather lat and lon points for my locations, and then be able to use get_googlemap to draft a map with which I can plot density points of occurrences (I have the code for the points already).
Alternatively, you can use a one-liner for rapid geocoding via tmaptools::geocode_OSM():
Data
library(tmaptools)
addresses <- data.frame(address = c("New York", "Berlin", "Huangpu Qu",
"Vienna", "St. Petersburg"),
stringsAsFactors = FALSE)
Code
result <- lapply(addresses[, 1], geocode_OSM)
> result
$address
query lat lon lat_min lat_max lon_min lon_max
1 New York 40.73086 -73.98716 40.47740 40.91618 -74.25909 -73.70018
2 Berlin 52.51704 13.38886 52.35704 52.67704 13.22886 13.54886
3 Huangpu Qu 31.21823 121.48030 31.19020 31.24653 121.45220 121.50596
4 Vienna 48.20835 16.37250 48.04835 48.36835 16.21250 16.53250
5 St. Petersburg 27.77038 -82.66951 27.64364 27.91390 -82.76902 -82.54062
This way, you have both
the centroids (lon, lat) that are important for Google Maps and
boundary boxes (lon_min, lat_min, lon_max, lat_max) that mapping services like OSM or Stamen need.

Problems with reverse geocoding loops with latitude longitude co-ordinates using googleway: r gives the same results for different co-ordinates

Here is my sample dataset (called origAddress):
lat lng
1.436316 103.8299
1.375093 103.8516
1.369347 103.8398
1.367353 103.8426
I have many more rows of latitude and longitude numbers (330) and I would like to find the address. I have used this for loop to do that:
for(i in 1:nrow(origAddress))
{
# Print("Working...")
result <- google_reverse_geocode(location = c(origAddress$lat[i],origAddress$lng[i]),
key = key,
location_type = "rooftop")
if(is.null(result) || length(dim(result)) < 2 || !nrow(result)) next
origAddress$venadd <- geocode_address(result)
}
It works for the first three or four rows but then returns the same address as the first row although the latitude and longitude numbers are definitely different. I have looked at other stackoverflow questions(here) and tried to copy their approach with similar bad results.
Please help!
It looks like the calls to google_reverse_geocode can return more than one address for each lat/longitude pair, so you could be overwriting your data in the output data frame.
Also, I am not sure your if statement is evaluating properly.
Here is my attempt on your problem:
library(googleway)
origAddress<-read.table(header = TRUE, text = "lat lng
1.436316 103.8299
1.375093 103.8516
1.369347 103.8398
1.367353 103.8426")
#add the output column
origAddress$venadd<-NA
for(i in 1:nrow(origAddress))
{
# Print("Working...")
result <- google_reverse_geocode(location = c(origAddress$lat[i],origAddress$lng[i]),
key=key,
location_type = "rooftop")
#add a slight pause so not to overload the call requests
Sys.sleep(1)
if(result$status =="OK" ){
#multiple addresses can be returned by one geocode request; this picks the first one
origAddress$venadd[i] <- result$results$formatted_address[1]
#use this to collect all addresses:
#paste(result$results$formatted_address, collapse = " ")
}
}
Since the call to google_reverse_geocode already returns the address, I just pull the first address from the result, which saves a second call to the internet (a performance improvement). Also, since the call returns a status, I check for "OK" and, if it is present, save the first address.
Hope this helps.
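The status check and first-address extraction can be tried out without an API key by running them against mock results. This is a sketch under assumptions: first_address is a hypothetical helper, and the mock lists only imitate the status and results$formatted_address fields used in the loop above, with made-up addresses.

```r
# Return the first formatted address when the status is "OK", otherwise NA.
first_address <- function(result) {
  if (!is.list(result) || !identical(result$status, "OK")) return(NA_character_)
  addresses <- result$results$formatted_address
  if (length(addresses) == 0) return(NA_character_)
  addresses[1]
}

# Mock responses with made-up content, shaped like the fields used above.
mock_ok <- list(
  status  = "OK",
  results = data.frame(
    formatted_address = c("1 Example Road, Singapore",
                          "Example District, Singapore"),
    stringsAsFactors = FALSE
  )
)
mock_none <- list(status = "ZERO_RESULTS", results = list())

responses <- list(mock_ok, mock_none)
venadd <- rep(NA_character_, length(responses))
for (i in seq_along(responses)) {
  # Assign to element i only; assigning to the whole vector would
  # overwrite every earlier result on each iteration.
  venadd[i] <- first_address(responses[[i]])
}
```

Note the indexed assignment venadd[i]: the loop in the question assigns to origAddress$venadd without an index, which replaces the whole column on every pass.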

eliminating words until the result is defined

I'm currently making code returning the lat and long of the address by using geocode.
library(ggmap)
name <- "720-37, Chorok-ro, Yanggam-myeon, Hwaseong-si, Gyeonggi-do, Republic of Korea"
address <- geocode(name)
df <- data.frame(lat = as.numeric(address[2]), lon = as.numeric(address[1]))
If the address in name has no result in Google, geocode automatically returns NA for lat and lon.
Therefore, how can I keep eliminating words until there is a result? (Usually, if the address is too specific, there is no result.) In this case, if "720-37, Chorok-ro" is eliminated, it works.
As I mentioned in my comment, I can use the geocode function from the ggmap package to geocode the address you provided, so it is not a good test case. I thus changed your test case by adding two new words in the beginning.
# Test case
name <- "Alpha, Beta, 720-37, Chorok-ro, Yanggam-myeon, Hwaseong-si, Gyeonggi-do, Republic of Korea"
Here I showed that the standard geocode function will not work.
library(ggmap)
geocode(name)
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Alpha,%20Beta,%20720-37,%20Chorok-ro,%20Yanggam-myeon,%20Hwaseong-si,%20Gyeonggi-do,%20Republic%20of%20Korea&sensor=false
lon lat
1 NA NA
Warning message:
geocode failed with status ZERO_RESULTS, location = "Alpha, Beta, 720-37, Chorok-ro, Yanggam-myeon, Hwaseong-si, Gyeonggi-do, Republic of Korea"
And then I designed a function to conduct "stepwise" geocode, which uses a while-loop to check if there are results. If not, remove the first word and then try again until there are results.
# A function to perform the geocode by step-wise eliminating the word from the top
geocode_step <- function(name){
# Perform geocode
coords <- geocode(name)
# Use while loop to check the result, if both lat and lon are NA
# Remove the first word and then try again
while (is.na(coords[[1]]) & is.na(coords[[2]])){
name_vec <- strsplit(name, split = ",")[[1]][-1]
# All words are eliminated, stop the function and return a data frame with NA and warning
if (length(name_vec) == 0){
break
}
# Re-combine all words
name <- paste(name_vec, collapse = ", ")
# Conduct geocode again
coords <- geocode(name)
}
dat <- data.frame(lon = coords[[1]], lat = coords[[2]], name = name)
return(dat)
}
We can test the function as follows.
geocode_step(name)
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Alpha,%20Beta,%20720-37,%20Chorok-ro,%20Yanggam-myeon,%20Hwaseong-si,%20Gyeonggi-do,%20Republic%20of%20Korea&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=%20Beta,%20%20720-37,%20%20Chorok-ro,%20%20Yanggam-myeon,%20%20Hwaseong-si,%20%20Gyeonggi-do,%20%20Republic%20of%20Korea&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=%20%20720-37,%20%20%20Chorok-ro,%20%20%20Yanggam-myeon,%20%20%20Hwaseong-si,%20%20%20Gyeonggi-do,%20%20%20Republic%20of%20Korea&sensor=false
lon lat
1 126.9827 37.11354
name
1 720-37, Chorok-ro, Yanggam-myeon, Hwaseong-si, Gyeonggi-do, Republic of Korea
Warning messages:
1: geocode failed with status ZERO_RESULTS, location = "Alpha, Beta, 720-37, Chorok-ro, Yanggam-myeon, Hwaseong-si, Gyeonggi-do, Republic of Korea"
2: geocode failed with status ZERO_RESULTS, location = " Beta, 720-37, Chorok-ro, Yanggam-myeon, Hwaseong-si, Gyeonggi-do, Republic of Korea"
Finally, if no combination of the remaining words works, the function still returns a data frame with NA.
geocode_step("aawsd")
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=aawsd&sensor=false
lon lat name
1 NA NA aawsd
Warning message:
geocode failed with status ZERO_RESULTS, location = "aawsd"
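The same stepwise idea can be sketched with the geocoder passed in as an argument, which makes it testable offline and lets the parts be trimmed so that spaces do not accumulate in the retried query. Both geocode_stepwise and the stand-in toy_geocode are hypothetical; toy_geocode "knows" a single place and returns made-up coordinates.

```r
# Hypothetical stand-in geocoder: recognises one string only and returns
# made-up coordinates for it; everything else yields NA, as a failed lookup would.
toy_geocode <- function(name) {
  if (identical(name, "Gyeonggi-do, Republic of Korea")) {
    data.frame(lon = 127.0, lat = 37.0)
  } else {
    data.frame(lon = NA_real_, lat = NA_real_)
  }
}

# Drop comma-separated parts from the front until the geocoder returns a result.
geocode_stepwise <- function(name, geocoder) {
  parts <- trimws(strsplit(name, ",", fixed = TRUE)[[1]])
  while (length(parts) > 0) {
    candidate <- paste(parts, collapse = ", ")
    coords <- geocoder(candidate)
    if (!is.na(coords$lon[1]) && !is.na(coords$lat[1])) {
      return(data.frame(lon = coords$lon[1], lat = coords$lat[1],
                        name = candidate, stringsAsFactors = FALSE))
    }
    parts <- parts[-1]  # remove the first part and try again
  }
  # Nothing matched: return NA coordinates with the original input.
  data.frame(lon = NA_real_, lat = NA_real_, name = name,
             stringsAsFactors = FALSE)
}

hit  <- geocode_stepwise("Alpha, Beta, Gyeonggi-do, Republic of Korea", toy_geocode)
miss <- geocode_stepwise("aawsd", toy_geocode)
```

With a real geocoder, each retry is a separate request, so a long address that never matches costs one call per comma-separated part.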

how to handle error from geocode (ggmap R)

I'm using ggmap to find locations. Some locations generate an error. For example,
library(ggmap)
loc = 'Blue Grass Airport'
geocode(loc, output = c("more"))
results in
Error in data.frame(long_name = "Blue Grass Airport", short_name = "Blue Grass Airport", :
arguments imply differing number of rows: 1, 0
It's OK if I can't get results for some locations, but I'm trying to work on 100 locations in a list. So is there a way to get NA instead of an error and keep things going? E.g.,
library(ggmap)
loc = c('Blue Grass Airport', 'Boston MA', 'NYC')
geocode(loc, output = c("more"))
should generate
NA
Result for Boston
Result for New York City
You can make use of the R tryCatch() function to handle these errors gracefully:
loc = 'Blue Grass Airport'
x <- tryCatch(geocode(loc, output = c("more")),
warning = function(w) {
print("warning");
# handle warning here
},
error = function(e) {
print("error");
# handle error here
})
If you intend to loop over locations explicitly using a for loop or using an apply function, then tryCatch() should also come in handy.
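One detail worth knowing is that the value of tryCatch() is whatever the handler returns, so in the snippet above x ends up holding the printed string rather than NA. A minimal sketch of the looped version that returns an NA row instead; geocode_or_na and the stand-in stub_geocode are hypothetical names, and stub_geocode returns made-up coordinates so the example runs without an API key. Only errors are caught here, because catching a warning with tryCatch abandons the call and discards its result.

```r
# Hypothetical stand-in for geocode(): raises an error for one location
# and returns made-up coordinates for any other.
stub_geocode <- function(loc) {
  if (loc == "Blue Grass Airport") {
    stop("arguments imply differing number of rows: 1, 0")
  }
  data.frame(lon = -71.06, lat = 42.36)
}

# Geocode one location; if the geocoder errors, return NA coordinates.
geocode_or_na <- function(loc, geocoder) {
  out <- tryCatch(
    geocoder(loc),
    error = function(e) data.frame(lon = NA_real_, lat = NA_real_)
  )
  cbind(data.frame(location = loc, stringsAsFactors = FALSE), out)
}

loc <- c("Blue Grass Airport", "Boston MA", "NYC")
results <- do.call(rbind, lapply(loc, geocode_or_na, geocoder = stub_geocode))
```

The result has one row per input location, in the same order, with NA in lon and lat for the location that failed.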
