I'm trying to reverse geocode with R. I first used ggmap but couldn't get it to work with my API key. Now I'm trying it with googleway.
newframe[, c("Front.lat", "Front.long")]
  Front.lat Front.long
1 -37.82681   144.9592
2 -37.82681   145.9592
newframe$address <- apply(newframe, 1, function(x){
  google_reverse_geocode(location = as.numeric(c(x["Front.lat"],
                                                 x["Front.long"])),
                         key = "xxxx")
})
This returns the results as a list, but I can't work out its structure. I'm struggling to extract the address components listed below as variables in newframe:
postal_code, administrative_area_level_1, administrative_area_level_2, locality, route, street_number
I would prefer each address component as a separate variable.
Google's API returns the response in JSON, which, when translated into R, naturally forms nested lists. Internally, googleway does this through jsonlite::fromJSON().
In googleway I've given you the choice of returning the raw JSON or a list, via the simplify argument.
I've deliberately returned ALL the data from Google's response and left it up to the user to extract the elements they're interested in through the usual list-subsetting operations.
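For example, with the default simplify = TRUE the response is a nested list that can be subset directly. A minimal sketch of the usual subsetting, assuming res holds a single response:
res <- google_reverse_geocode(location = c(-37.82681, 144.9592),
                              key = "xxxx")
## status of the call ("OK", "ZERO_RESULTS", ...)
res$status
## formatted address of the first (best) result
res$results$formatted_address[1]
## address components of the first result
res$results$address_components[[1]]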
Having said all that, in the development version of googleway I've written a few functions to help access elements of various API calls. Here are three of them that may be useful to you:
## Install the development version
# devtools::install_github("SymbolixAU/googleway")
res <- google_reverse_geocode(
location = c(df[1, 'Front.lat'], df[1, 'Front.long']),
key = apiKey
)
geocode_address(res)
# [1] "45 Clarke St, Southbank VIC 3006, Australia"
# [2] "Bank Apartments, 275-283 City Rd, Southbank VIC 3006, Australia"
# [3] "Southbank VIC 3006, Australia"
# [4] "Melbourne VIC, Australia"
# [5] "South Wharf VIC 3006, Australia"
# [6] "Melbourne, VIC, Australia"
# [7] "CBD & South Melbourne, VIC, Australia"
# [8] "Melbourne Metropolitan Area, VIC, Australia"
# [9] "Victoria, Australia"
# [10] "Australia"
geocode_address_components(res)
# long_name short_name types
# 1 45 45 street_number
# 2 Clarke Street Clarke St route
# 3 Southbank Southbank locality, political
# 4 Melbourne City Melbourne administrative_area_level_2, political
# 5 Victoria VIC administrative_area_level_1, political
# 6 Australia AU country, political
# 7 3006 3006 postal_code
geocode_type(res)
# [[1]]
# [1] "street_address"
#
# [[2]]
# [1] "establishment" "general_contractor" "point_of_interest"
#
# [[3]]
# [1] "locality" "political"
#
# [[4]]
# [1] "colloquial_area" "locality" "political"
After reverse geocoding into newframe$address, the address components can be extracted further as follows:
# Make a boolean array of the valid ("OK" status) responses (other statuses may be "NO_RESULTS", "REQUEST_DENIED" etc).
sel <- sapply(seq_len(nrow(newframe)), function(x){
  newframe$address[[x]]$status == 'OK'
})
# Get the address_components of the first result (i.e. best match) returned per geocoded coordinate.
# Index with which(sel) so that only the valid responses are used.
address.components <- sapply(which(sel), function(x){
  newframe$address[[x]]$results[1, ]$address_components
})
# Get all possible component types.
all.types <- unique(unlist(lapply(address.components, function(ac){
  unlist(lapply(ac$types, function(l) l[[1]]))
})))
# Get "long_name" values of the address_components for each type present (the other option is "short_name").
all.values <- lapply(address.components, function(ac){
  types <- unlist(lapply(ac$types, function(l) l[[1]]))
  ac$long_name[match(all.types, types)]
})
# Bind results into a dataframe.
all.values <- do.call("rbind", all.values)
all.values <- as.data.frame(all.values)
names(all.values) <- all.types
# Add columns and update original data frame.
newframe[, all.types] <- NA
newframe[sel,][, all.types] <- all.values
Note that I've only kept the first type given per component, effectively skipping the "political" type, which appears in multiple components and is likely superfluous (e.g. "administrative_area_level_1, political").
You can use ggmap::revgeocode easily; see below:
library(ggmap)

# revgeocode() expects c(lon, lat), hence the reversed column index 2:1
df <- cbind(df, do.call(rbind, lapply(1:nrow(df), function(i)
  revgeocode(as.numeric(df[i, 2:1]), output = "more")[
    c("administrative_area_level_1", "locality", "postal_code", "address")])))
#output:
df
# Front.lat Front.long administrative_area_level_1 locality
# 1 -37.82681 144.9592 Victoria Southbank
# 2 -37.82681 145.9592 Victoria Noojee
# postal_code address
# 1 3006 45 Clarke St, Southbank VIC 3006, Australia
# 2 3833 Cec Dunns Track, Noojee VIC 3833, Australia
You can add "route" and "street_number" to the variables that you want to extract but as you can see the second address does not have street number and that will cause an error.
Note: you may also use sub and extract the information from the formatted address directly.
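A quick sketch of the sub() idea, using the two addresses above (the patterns are tailored to this Australian address format and are purely illustrative):
addresses <- c("45 Clarke St, Southbank VIC 3006, Australia",
               "Cec Dunns Track, Noojee VIC 3833, Australia")

## postal code: the four digits before the final ", Australia"
sub(".* (\\d{4}), Australia$", "\\1", addresses)
# [1] "3006" "3833"

## locality: the text between the first comma and the state code
sub("^[^,]+, (.+) VIC .*$", "\\1", addresses)
# [1] "Southbank" "Noojee"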
Data:
df <- structure(list(Front.lat = c(-37.82681, -37.82681),
                     Front.long = c(144.9592, 145.9592)),
                .Names = c("Front.lat", "Front.long"),
                class = "data.frame", row.names = c(NA, -2L))
---
I have a list of ambiguous addresses that I need to return full geocode information for.
The only issue is that what I get back is a large list of nested lists (parsed JSON).
I want to be able to get a data frame that contains the key information, i.e.
IDEAL OUTPUT
Original_Address, StreetNum, StreetName, Suburb, town_city, locality, Postcode, geo_xCord, Country
I almost wonder if this is just too difficult and if there is an easier method that I haven't considered.
I basically just need to be able to spit out the key address elements for each address I have.
# Stack Overflow Example -------------------------------------------
random_addresses <- c('27 Hall Street, Wellington',
'52 Ethan Street, New Zealand',
'13 Epsom Street, Auckland',
'42 Elden Drive, New Zealand')
register_google(key = "MYAPIKEY")
place_lookup <- geocode(random_addresses, output = "all")
print(place_lookup[1])
Which prints (truncated):
[[1]]$results
[[1]]$results[[1]]
[[1]]$results[[1]]$address_components
[[1]]$results[[1]]$address_components[[1]]
[[1]]$results[[1]]$address_components[[1]]$long_name
[1] "27"
[[1]]$results[[1]]$address_components[[1]]$short_name
[1] "27"
[[1]]$results[[1]]$address_components[[1]]$types
[[1]]$results[[1]]$address_components[[1]]$types[[1]]
[1] "street_number"
[[1]]$results[[1]]$address_components[[2]]
[[1]]$results[[1]]$address_components[[2]]$long_name
[1] "Hall Street"
[[1]]$results[[1]]$address_components[[2]]$short_name
[1] "Hall St"
[[1]]$results[[1]]$address_components[[2]]$types
[[1]]$results[[1]]$address_components[[2]]$types[[1]]
[1] "route"
[[1]]$results[[1]]$address_components[[3]]
[[1]]$results[[1]]$address_components[[3]]$long_name
[1] "Newtown"
[[1]]$results[[1]]$address_components[[3]]$short_name
[1] "Newtown"
[[1]]$results[[1]]$address_components[[3]]$types
[[1]]$results[[1]]$address_components[[3]]$types[[1]]
[1] "political"
[[1]]$results[[1]]$address_components[[3]]$types[[2]]
[1] "sublocality"
[[1]]$results[[1]]$address_components[[3]]$types[[3]]
[1] "sublocality_level_1"
[[1]]$results[[1]]$address_components[[4]]
[[1]]$results[[1]]$address_components[[4]]$long_name
[1] "Wellington"
[[1]]$results[[1]]$address_components[[4]]$short_name
[1] "Wellington"
[[1]]$results[[1]]$address_components[[4]]$types
[[1]]$results[[1]]$address_components[[4]]$types[[1]]
[1] "locality"
[[1]]$results[[1]]$address_components[[4]]$types[[2]]
[1] "political"
[[1]]$results[[1]]$address_components[[5]]
[[1]]$results[[1]]$address_components[[5]]$long_name
[1] "Wellington"
[[1]]$results[[1]]$address_components[[5]]$short_name
[1] "Wellington"
[[1]]$results[[1]]$address_components[[5]]$types
[[1]]$results[[1]]$address_components[[5]]$types[[1]]
[1] "administrative_area_level_1"
[[1]]$results[[1]]$address_components[[5]]$types[[2]]
[1] "political"
[[1]]$results[[1]]$address_components[[6]]
[[1]]$results[[1]]$address_components[[6]]$long_name
[1] "New Zealand"
[[1]]$results[[1]]$address_components[[6]]$short_name
[1] "NZ"
[[1]]$results[[1]]$address_components[[6]]$types
[[1]]$results[[1]]$address_components[[6]]$types[[1]]
[1] "country"
[[1]]$results[[1]]$address_components[[6]]$types[[2]]
[1] "political"
[[1]]$results[[1]]$address_components[[7]]
[[1]]$results[[1]]$address_components[[7]]$long_name
[1] "6021"
[[1]]$results[[1]]$address_components[[7]]$short_name
[1] "6021"
[[1]]$results[[1]]$address_components[[7]]$types
[[1]]$results[[1]]$address_components[[7]]$types[[1]]
[1] "postal_code"
[[1]]$results[[1]]$formatted_address
[1] "27 Hall Street, Newtown, Wellington 6021, New Zealand"
[[1]]$results[[1]]$geometry
[[1]]$results[[1]]$geometry$bounds
[[1]]$results[[1]]$geometry$bounds$northeast
[[1]]$results[[1]]$geometry$bounds$northeast$lat
[1] -41.31066
[[1]]$results[[1]]$geometry$bounds$northeast$lng
[1] 174.7768
[[1]]$results[[1]]$geometry$bounds$southwest
[[1]]$results[[1]]$geometry$bounds$southwest$lat
[1] -41.31081
[[1]]$results[[1]]$geometry$bounds$southwest$lng
[1] 174.7766
[[1]]$results[[1]]$geometry$location
[[1]]$results[[1]]$geometry$location$lat
[1] -41.31074
[[1]]$results[[1]]$geometry$location$lng
[1] 174.7767
[[1]]$results[[1]]$geometry$location_type
[1] "ROOFTOP"
[[1]]$results[[1]]$geometry$viewport
[[1]]$results[[1]]$geometry$viewport$northeast
[[1]]$results[[1]]$geometry$viewport$northeast$lat
[1] -41.30932
[[1]]$results[[1]]$geometry$viewport$northeast$lng
[1] 174.778
[[1]]$results[[1]]$geometry$viewport$southwest
[[1]]$results[[1]]$geometry$viewport$southwest$lat
[1] -41.31202
[[1]]$results[[1]]$geometry$viewport$southwest$lng
[1] 174.7753
[[1]]$results[[1]]$place_id
[1] "ChIJiynBCOOvOG0RMx429ZNDR3A"
[[1]]$results[[1]]$types
[[1]]$results[[1]]$types[[1]]
[1] "premise"
[[1]]$status
[1] "OK"
---
You can explore the nested lists with the Viewer in RStudio or with listviewer::jsonedit, and drill down to the desired information. The approach below uses unnest_wider to spread the list into columns, selects the desired columns, and then uses unnest_longer to tease out the nested lists so they can be iterated through.
library(tidyverse)
map(random_addresses, ~ geocode(.x, output = "all") %>%
      # `results` is the element holding the desired information; wrap it
      # in a tibble for unnesting (the braces stop the pipe from also
      # passing the whole response as a first argument)
      { tibble(output = .$results) } %>%
      # spread each result into columns, with address_components as a list-column
      unnest_wider(output) %>%
      dplyr::select(address_components) %>%
      # address_components is a list of lists; unnest to one element per row
      unnest_longer(., col = "address_components") %>%
      map_dfr(., ~ .x) %>%
      # types is itself a list, so unlist it
      mutate(types = unlist(types)) %>%
      # choose the information to keep
      filter(types %in% c("street_number", "route")) %>%
      # choose the columns to keep
      select(long_name, types) %>%
      # put in wide form
      pivot_wider(names_from = "types", values_from = "long_name")
) %>%
  bind_rows() # combine into one master data frame
Before filtering, it gives you one tibble per address with the full information, e.g. for the fourth address:
[[4]]
# A tibble: 13 × 3
long_name short_name types
<chr> <chr> <chr>
1 New Zealand NZ country
2 New Zealand NZ political
3 42 42 street_number
4 Elden Drive Elden Dr route
5 Saddle River Saddle River locality
6 Saddle River Saddle River political
7 Bergen County Bergen County administrative_area_level_2
8 Bergen County Bergen County political
9 New Jersey NJ administrative_area_level_1
10 New Jersey NJ political
11 United States US country
12 United States US political
13 07458 07458 postal_code
Say I have a vector of cities and countries, which may or may not include names of places that have since changed names:
locations <- c("Paris, France", "Sarajevo, Yugoslavia", "Rome, Italy", "Leningrad, Soviet Union", "St Petersburg, Russia")
The problem is that I can't use something like ggmap::geocode since it doesn't appear to work well for locations whose names have changed:
ggmap::geocode(locations, source = "dsk")
lon lat
1 2.34880 48.85341 #Works for Paris
2 NA NA #Didn't work for Sarajevo
3 12.48390 41.89474 #Works for Rome
4 98.00000 60.00000 #Didn't work for Leningrad (the old name of St Petersburg); seems to just return the centre of Russia
5 30.26417 59.89444 #Worked for St Petersburg
Are there alternative functions I could use? If I have to "update" the names of the cities and countries, is there an easy way to do this? I have hundreds of locations for which I'm looking to collect longitude and latitude coordinates.
This might not be what you had in mind, but if you use the exact same code with only the city names (not the countries), at least the two cases you mentioned (Sarajevo and Leningrad) seem to work fine. You could try running the function with a modified locations vector containing just the city names, and see if you still get errors. Something like this:
(cities <- gsub(',.*', '', locations))
## [1] "Paris" "Sarajevo" "Rome" "Leningrad" "St Petersburg"
cbind(ggmap::geocode(cities, source = 'dsk'), cities)
## lon lat cities
## 1 2.34880 48.85341 Paris
## 2 18.35644 43.84864 Sarajevo
## 3 12.48390 41.89474 Rome
## 4 30.26417 59.89444 Leningrad
## 5 30.26417 59.89444 St Petersburg
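If some historical names still fail, another option is to recode known old names before geocoding. A minimal sketch, where the old_to_new table is purely illustrative and would need to cover the renamings in your data:
## illustrative lookup table of old name -> current name
old_to_new <- c("Leningrad"  = "St Petersburg",
                "Stalingrad" = "Volgograd")

cities_fixed <- ifelse(cities %in% names(old_to_new),
                       old_to_new[cities], cities)
cbind(ggmap::geocode(cities_fixed, source = 'dsk'), cities)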
I am trying to get the coordinates of businesses by their name. I have reviewed several questions on using 'geocode', but they all seem to work based on the address. Below are two examples trying to get the coordinates of The Westbury Hotel London:
library(ggmap)
geocode("London")
geocode("The Westbury Hotel London") # Returns coordinates of Westbury Road in London
A more complex approach:
library(RJSONIO)
library(ggmap)

geocodeAddress <- function(address) {
  url <- "http://maps.google.com/maps/api/geocode/json?address="
  url <- URLencode(paste(url, address, "&sensor=false", sep = ""))
  x <- fromJSON(url, simplify = FALSE)
  if (x$status == "OK") {
    out <- c(x$results[[1]]$geometry$location$lng,
             x$results[[1]]$geometry$location$lat)
  } else {
    out <- NA
  }
  Sys.sleep(0.2) # the API only allows 5 requests per second
  out
}
geocodeAddress("The Westbury Hotel London") # Returns London coordinates
Other questions mention that it is possible to get coordinates of places with 'geocode' but, at least in my case, it is not working. Any ideas on how to get coordinates by business name from Google Maps would be hugely appreciated.
You can use the Google Places API to search for places using my googleway package. You'll have to do some work with the results, or refine your query, if you want to get the exact business you're after, as the API usually returns multiple possible results.
You need a Google API key to use their service.
library(googleway)
## your API key
api_key <- "your_api_key_goes_here"
## general search on the name
general_result <- google_places(search_string = "The Westbury Hotel London",
key = api_key)
general_result$results$name
# [1] "The Westbury" "Polo Bar" "The Westbury"
general_result$results$geometry$location
# lat lng
# 1 53.34153 -6.2614740
# 2 51.51151 -0.1426609
# 3 51.59351 -0.0983930
## more refined search using a location
location_result <- google_places(search_string = "The Westbury Hotel London",
location = c(51.5,0),
key = api_key)
location_result$results$name
# [11] "The Marylebone" "The Chelsea Harbour Hotel"
# "Polo Bar" "The Westbury" "The Gallery at The Westbury"
location_result$results$geometry$location
# lat lng
# 1 51.51801 -0.1498050
# 2 51.47600 -0.1819235
# 3 51.51151 -0.1426609
# 4 51.59351 -0.0983930
# 5 51.51131 -0.1426318
location_result$results$formatted_address
# [1] "37 Conduit St, London W1S 2YF, United Kingdom" "37 Conduit St, London, Mayfair W1S 2YF, United Kingdom"
# [3] "57 Westbury Ave, London N22 6SA, United Kingdom"
I have GPS coordinates for several points and I want to know whether each is on a highway, a trunk road, or a minor road; it would be even better if I could identify the road name. I'm using R leaflet to draw maps, and I can see with OpenStreetMap that different road types are coloured differently, so I wonder how I can extract this information. It's not a problem to use Google Maps instead if that solves my problem.
I would appreciate any help.
You can use revgeocode() from ggmap:
library(ggmap)
gc <- c(-73.596706, 45.485501)
revgeocode(gc)
Which gives:
#[1] "4333 Rue Sherbrooke O, Westmount, QC H3Z 1E2, Canada"
Note: As mentioned in the comments, this method uses the Google Maps API, not OpenStreetMap. You have a limit of 2500 queries per day. You can always check how many queries you have left using geocodeQueryCheck().
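For example:
library(ggmap)
geocodeQueryCheck()
# e.g. "2500 geocoding queries remaining."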
From the package documentation:
reverse geocodes a longitude/latitude location using Google Maps. Note
that in most cases by using this function you are agreeing to the
Google Maps API Terms of Service at
https://developers.google.com/maps/terms.
Update
If you need more detailed information, use output = "all" and extract the components you need:
lst <- list(
  g1 = c(-73.681069, 41.433155),
  g2 = c(-73.643196, 41.416240),
  g3 = c(-73.653324, 41.464168)
)

# [[1]][[1]][[1]][[2]] drills down to results -> first result ->
# address_components -> second component (the route)
res <- lapply(lst, function(x) revgeocode(x, output = "all")[[1]][[1]][[1]][[2]])
Which gives:
#$g1
#$g1$long_name
#[1] "Highway 52"
#
#$g1$short_name
#[1] "NY-52"
#
#$g1$types
#[1] "route"
#
#
#$g2
#$g2$long_name
#[1] "Carmel Avenue"
#
#$g2$short_name
#[1] "US-6"
#
#$g2$types
#[1] "route"
#
#
#$g3
#$g3$long_name
#[1] "Wakefield Road"
#
#$g3$short_name
#[1] "Wakefield Rd"
#
#$g3$types
#[1] "route"
Using Google's API it's not possible to identify the type of road (yet - they may introduce that capability in the future).
But you can use their Roads API to get the road details for a given set of coordinates.
I've written the googleway package, which accesses the Roads API through the functions google_snapToRoads() and google_nearestRoads(); if you have a premium account you can also use google_speedLimits().
In all calls to Google's API you need a Google API key enabled for each API you are using.
library(googleway)

## your API key(s); here one key is assumed to be enabled for both
## the Maps and Roads APIs
map_key <- "your_api_key_goes_here"
api_key <- map_key

df_points <- data.frame(lat = c(60.1707, 60.172, 60.192),
                        lon = c(24.9426, 24.86, 24.89))

## plot the points on a map
google_map(key = map_key) %>%
  add_markers(df_points)
nearRoads <- google_nearestRoads(df_points, key = api_key)
nearRoads
# $snappedPoints
# location.latitude location.longitude originalIndex placeId
# 1 60.17070 24.94272 0 ChIJNX9BrM0LkkYRIM-cQg265e8
# 2 60.17229 24.86028 1 ChIJpf7azXMKkkYRsk5L-U5W4ZQ
# 3 60.17229 24.86028 1 ChIJpf7azXMKkkYRs05L-U5W4ZQ
# 4 60.19165 24.88997 2 ChIJN1s1vhwKkkYRKGm4l5KmISI
# 5 60.19165 24.88997 2 ChIJN1s1vhwKkkYRKWm4l5KmISI
In these results, the originalIndex value tells you which of the original df_points each row refers to, where 0 == the first row of df_points, 1 == the second row, and so on (the index is zero-based).
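Since R is 1-indexed, you can map the snapped points back to rows of df_points directly:
## 0-based originalIndex -> 1-based row numbers of df_points
nearRoads$snappedPoints$originalIndex + 1
# [1] 1 2 2 3 3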
The placeId value is Google's unique key that identifies each place in their database, so you can then use Google's Places API to get information about those places:
roadDetails <- lapply(nearRoads$snappedPoints$placeId, function(x){
google_place_details(place_id = x, key = api_key)
})
## road address
lapply(roadDetails, function(x){
x[['result']][['formatted_address']]
})
# [[1]]
# [1] "Rautatientori, 00100 Helsinki, Finland"
#
# [[2]]
# [1] "Svedjeplogsstigen 7-9, 00340 Helsingfors, Finland"
#
# [[3]]
# [1] "Svedjeplogsstigen 18-10, 00340 Helsingfors, Finland"
#
# [[4]]
# [1] "Meilahdentie, 00250 Helsinki, Finland"
#
# [[5]]
# [1] "Meilahdentie, 00250 Helsinki, Finland"
I have a large file with a variable state that has full state names. I would like to replace them with the state abbreviations (that is, "NY" for "New York"). Is there an easy way to do this (apart from using several if-else commands)? Maybe using a replace() statement?
R has two built-in constants that might help: state.abb with the abbreviations, and state.name with the full names. Here is a simple usage example:
> x <- c("New York", "Virginia")
> state.abb[match(x,state.name)]
[1] "NY" "VA"
1) grep the full name from state.name and use that to index into state.abb:
state.abb[grep("New York", state.name)]
## [1] "NY"
1a) or using which:
state.abb[which(state.name == "New York")]
## [1] "NY"
2) or create a vector of state abbreviations whose names are the full names and index into it using the full name:
setNames(state.abb, state.name)["New York"]
## New York
## "NY"
Unlike (1), this one works even if "New York" is replaced by a vector of full state names, e.g. setNames(state.abb, state.name)[c("New York", "Idaho")]
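which returns a named vector:
## New York    Idaho 
##     "NY"     "ID"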
Old post I know, but wanted to throw mine in there. I learned on tidyverse, so for better or worse I avoid base R when possible. I wanted one with DC too, so first I built the crosswalk:
library(tidyverse)
st_crosswalk <- tibble(state = state.name) %>%
bind_cols(tibble(abb = state.abb)) %>%
bind_rows(tibble(state = "District of Columbia", abb = "DC"))
Then I joined it to my data:
left_join(data, st_crosswalk, by = "state")
I found the built-in state.name and state.abb cover only the 50 states. I got a bigger table (including DC and so on) online (e.g. from this link: http://www.infoplease.com/ipa/A0110468.html) and pasted it into a .csv file named States.csv. I then load the state names and abbreviations from this file instead of using the built-ins. The rest is quite similar to @Aniko's answer.
library(dplyr)
library(stringr)
library(stringdist)

# load data
data = c("NY", "New York", "NewYork")
data = toupper(data)

# load state names and abbreviations
State.data = read.csv('States.csv')
State = toupper(State.data$State)
Stateabb = as.vector(State.data$Abb)

# match data with state names; a misspelling of 1 letter is allowed
match = amatch(data, State, maxDist = 1)
data[ !is.na(match) ] = Stateabb[ na.omit(match) ]
There's a small difference between match and amatch in how they calculate the distance from one word to another. See pp. 25-26 of http://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf
You can also use base::abbreviate if you don't have US state names. This won't give you equally sized abbreviations unless you increase minlength.
state.name %>% base::abbreviate(minlength = 1)
Here is another way of doing it in case you have more than one state in your data and you want to replace the names with the corresponding abbreviations.
# create a data frame of state names
states_df <- c("Alabama", "California", "Nevada", "New York",
               "Oregon", "Texas", "Utah", "Washington")
states_df <- as.data.frame(states_df)
The output is
> print(states_df)
states_df
1 Alabama
2 California
3 Nevada
4 New York
5 Oregon
6 Texas
7 Utah
8 Washington
Now, using the built-in state.abb constant with match(), you can easily convert the names into abbreviations, and vice versa.
states_df$state_code <- state.abb[match(states_df$states_df, state.name)]
> print(states_df)
states_df state_code
1 Alabama AL
2 California CA
3 Nevada NV
4 New York NY
5 Oregon OR
6 Texas TX
7 Utah UT
8 Washington WA
If matching state names to abbreviations, or the other way around, is something you have to do frequently, you could put Aniko's solution in a function in your .Rprofile or a package:
state_to_st <- function(x){
c(state.abb, 'DC')[match(x, c(state.name, 'District of Columbia'))]
}
st_to_state <- function(x){
c(state.name, 'District of Columbia')[match(x, c(state.abb, 'DC'))]
}
Using one of these functions as part of a dplyr chain:
enframe(state.name, value = 'state_name') %>%
mutate(state_abbr = state_to_st(state_name))