I'm trying to write a simple code to check if a street address exists:
In my first try I put the write address and it gives me the correct adress:
addr <- '2147 Newhall Street,Santa Clara,CA 95050'
url = paste('http://maps.google.com/maps/api/geocode/xml?address=', addr,'&sensor=false',sep='')
doc = xmlTreeParse(url)
root = xmlRoot(doc)
lat = xmlValue(root[['result']][['geometry']][['location']][['lat']])
long = xmlValue(root[['result']][['geometry']][['location']][['lng']])
lat
"37.3386004"
long
"-121.9405759"
But if I write a wrong street address it's still giving me co-ordinates:
addr <- 'xyz,Santa Clara,CA 95050' # set your address here
url = paste('http://maps.google.com/maps/api/geocode/xml?address=', addr,'&sensor=false',sep='')
doc = xmlTreeParse(url)
root = xmlRoot(doc)
lat = xmlValue(root[['result']][['geometry']][['location']][['lat']])
long = xmlValue(root[['result']][['geometry']][['location']][['lng']])
lat
"37.3539663"
long
"-121.9529992"
I'm sure the street address above does not exist, but I'm still getting some coordinates. Is there anyway I can return an NA value or some flag if there is no valid street address?
There's already a nice wrapper of the Google Maps geocoding API in the ggmap package. If you set its output parameter to more, it will return a loctype which indicates if the address is precisely matched (rooftop) or an approximation (approximate, range_interpolated, geometric_center). See the documentation for further detail.
library(ggmap)
addr <- '2147 Newhall Street,Santa Clara,CA 95050'
geocode(addr, 'more')
# Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=2147%20Newhall%20Street,Santa%20Clara,CA%2095050&sensor=false
# lon lat type loctype address north
# 1 -121.9406 37.3386 street_address rooftop 2147 newhall st, santa clara, ca 95050, usa 37.33995
# south east west street_number route locality
# 1 37.33725 -121.9392 -121.9419 2147 Newhall Street Santa Clara
# administrative_area_level_2 administrative_area_level_1 country postal_code
# 1 Santa Clara County California United States 95050
addr <- 'xyz,Santa Clara,CA 95050'
geocode(addr, 'more')
# Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=xyz,Santa%20Clara,CA%2095050&sensor=false
# lon lat type loctype address north south
# 1 -121.953 37.35397 postal_code approximate santa clara, ca 95050, usa 37.37448 37.32314
# east west postal_code locality administrative_area_level_2
# 1 -121.9309 -121.9703 95050 Santa Clara Santa Clara County
# administrative_area_level_1 country
# 1 California United States
Related
I recently update RStudio to the version RStudio 2022.07.1, working on Windows 10.
When I tried different geocode reverse functions(Which is input coordinate, output is the address), they all return no found.
Example 1:
library(revgeo)
revgeo(-77.016472, 38.785026)
Suppose return "146 National Plaza, Fort Washington, Maryland, 20745, United States of America". But I got
"Getting geocode data from Photon: http://photon.komoot.de/reverse?lon=-77.016472&lat=38.785026"
[[1]]
[1] "House Number Not Found Street Not Found, City Not Found, State Not Found, Postcode Not Found, Country Not Found"
Data from https://github.com/mhudecheck/revgeo
Example 2:
library(tidygeocoder)
library(dplyr)
path <- "filepath"
df <- read.csv (paste (path, "sample.csv", sep = ""))
reverse <- df %>%
reverse_geocode(lat = longitude, long = latitude, method = 'osm',
address = address_found, full_results = TRUE)
reverse
Where the sample.csv is
name
addr
latitude
longitude
White House
1600 Pennsylvania Ave NW, Washington, DC
38.89770
-77.03655
Transamerica Pyramid
600 Montgomery St, San Francisco, CA 94111
37.79520
-122.40279
Willis Tower
233 S Wacker Dr, Chicago, IL 60606
41.87535
-87.63576
Suppose to get
name
addr
latitude
longitude
address_found
White House
1600 Pennsylvania Ave NW, Washington, DC
38.89770
-77.03655
White House, 1600, Pennsylvania Avenue Northwest, Washington, District of Columbia, 20500, United States
Transamerica Pyramid
600 Montgomery St, San Francisco, CA 94111
37.79520
-122.40279
Transamerica Pyramid, 600, Montgomery Street, Chinatown, San Francisco, San Francisco City and County, San Francisco, California, 94111, United States
Willis Tower
233 S Wacker Dr, Chicago, IL 60606
41.87535
-87.63576
South Wacker Drive, Printer’s Row, Loop, Chicago, Cook County, Illinois, 60606, United States
But I got
# A tibble: 3 × 5
name addr latitude longitude address_found
<chr> <chr> <dbl> <dbl> <chr>
1 White House 1600 Pennsylvania Ave NW, Wash… 38.9 -77.0 NA
2 Transamerica Pyramid 600 Montgomery St, San Francis… 37.8 -122. NA
3 Willis Tower 233 S Wacker Dr, Chicago, IL 6… 41.9 -87.6 NA
Data source: https://cran.r-project.org/web/packages/tidygeocoder/readme/README.html
However, when I tried
reverse_geo(lat = 38.895865, long = -77.0307713, method = "osm")
I'm able to get
# A tibble: 1 × 3
lat long address
<dbl> <dbl> <chr>
1 38.9 -77.0 Pennsylvania Avenue, Washington, District of Columbia, 20045, United States
I had contact the tidygeocoder developer, he/she didn't find out any problem. Detail in https://github.com/jessecambon/tidygeocoder/issues/175
Not sure which part goes wrong. Anyone want try on their RStudio?
The updated revgeo needs to be submitted to CRAN. This has nothing to do with RStudio.
Going to http://photon.komoot.de/reverse?lon=-77.016472&lat=38.785026 in my browser also returns an error. However, I searched for the Photon reverse geocoder, and their example uses .io not .de in the URL, and https://photon.komoot.io/reverse?lon=-77.016472&lat=38.785026 works.
Photon also include a Note at the bottom of their examples:
Until October 2020 the API was available under photon.komoot.de. Requests still work as they redirected to photon.komoot.io but please update your apps accordingly.
Seems like that redirect is either broken or deprecated.
The version of revgeo on github has this change made already, so you can get a working version by using remotes::install_github("https://github.com/mhudecheck/revgeo")
I have a column that contains thousands of descriptions like this (example) :
Description
Building a hospital in the city of LA, USA
Building a school in the city of NYC, USA
Building shops in the city of Chicago, USA
I'd like to create a column with the first word after "city of", like that :
Description
City
Building a hospital in the city of LA, USA
LA
Building a school in the city of NYC, USA
NYC
Building shops in the city of Chicago, USA
Chicago
I tried with the following code after seeing this topic Extracting string after specific word, but my column is only filled with missing values
library(stringr)
df$city <- data.frame(str_extract(df$Description, "(?<=city of:\\s)[^;]+"))
df$city <- data.frame(str_extract(df$Description, "(?<=of:\\s)[^;]+"))
I took a look at the dput() and the output is the same than the descriptions i see in the dataframe directly.
Solution
This should make the trick for the data you showed:
df$city <- str_extract(df$Description, "(?<=city of )(\\w+)")
df
#> Description city
#> 1 Building a hospital in the city of LA, USA LA
#> 2 Building a school in the city of NYC, USA NYC
#> 3 Building shops in the city of Chicago, USA Chicago
Alternative
However, in case you want the whole string till the first comma (for example in case of cities with a blank in the name), you can go with:
df$city <- str_extract(df$Description, "(?<=city of )(.+)(?=,)")
Check out the following example:
df <- data.frame(Description = c("Building a hospital in the city of LA, USA",
"Building a school in the city of NYC, USA",
"Building shops in the city of Chicago, USA",
"Building a church in the city of Salt Lake City, USA"))
str_extract(df$Description, "(?<=the city of )(\\w+)")
#> [1] "LA" "NYC" "Chicago" "Salt"
str_extract(df$Description, "(?<=the city of )(.+)(?=,)")
#> [1] "LA" "NYC" "Chicago" "Salt Lake City"
Documentation
Check out ?regex:
Patterns (?=...) and (?!...) are zero-width positive and negative
lookahead assertions: they match if an attempt to match the ...
forward from the current position would succeed (or not), but use up
no characters in the string being processed. Patterns (?<=...) and
(?<!...) are the lookbehind equivalents: they do not allow repetition
quantifiers nor \C in ....
I was working with the googleway package and I had a bunch of addresses that I needed to parse out the various components of the addresses that were in a nested list of lists. Loops (not encouraged) and apply functions both seemed confusing and I was not sure if there was a tidy solution. I found the map function (specifically the pluck function that it calls on lists on the backend) could accomplish my goal so I will share my solution.
Problem:
I need to pull out certain information about the White House such as
Latitude
Longitude
You need to set up your Google Cloud API Key with googleway::set_key(API_KEY), but this is just an example of a nested list that I hope someone working with this package will see.
# Address for the White House and the Lincoln Memorial
address_vec <- c(
"1600 Pennsylvania Ave NW, Washington, DC 20006",
"2 Lincoln Memorial Cir NW, Washington, DC 20002"
)
address_vec <- pmap(list(address_vec), googleway::google_geocode)
outputs
[[1]]
[[1]]$results
address_components
1 1600, Pennsylvania Avenue Northwest, Northwest Washington, Washington, District of Columbia, United States, 20500, 1600, Pennsylvania Avenue NW, Northwest Washington, Washington, DC, US, 20500, street_number, route, neighborhood, political, locality, political, administrative_area_level_1, political, country, political, postal_code
formatted_address geometry.bounds.northeast.lat
1 1600 Pennsylvania Avenue NW, Washington, DC 20500, USA 38.8979
geometry.bounds.northeast.lng geometry.bounds.southwest.lat geometry.bounds.southwest.lng geometry.location.lat
1 -77.03551 38.89731 -77.03796 38.89766
geometry.location.lng geometry.location_type geometry.viewport.northeast.lat geometry.viewport.northeast.lng
1 -77.03657 ROOFTOP 38.89895 -77.03539
geometry.viewport.southwest.lat geometry.viewport.southwest.lng place_id
1 38.89626 -77.03808 ChIJGVtI4by3t4kRr51d_Qm_x58
types
1 establishment, point_of_interest, premise
[[1]]$status
[1] "OK"
[[2]]
[[2]]$results
address_components
1 2, Lincoln Memorial Circle Northwest, Southwest Washington, Washington, District of Columbia, United States, 20037, 2, Lincoln Memorial Cir NW, Southwest Washington, Washington, DC, US, 20037, street_number, route, neighborhood, political, locality, political, administrative_area_level_1, political, country, political, postal_code
formatted_address geometry.location.lat geometry.location.lng
1 2 Lincoln Memorial Cir NW, Washington, DC 20037, USA 38.88927 -77.05018
geometry.location_type geometry.viewport.northeast.lat geometry.viewport.northeast.lng
1 ROOFTOP 38.89062 -77.04883
geometry.viewport.southwest.lat geometry.viewport.southwest.lng place_id
1 38.88792 -77.05152 ChIJgRuEham3t4kRFju4R6De__g
plus_code.compound_code plus_code.global_code types
1 VWQX+PW Washington, DC, USA 87C4VWQX+PW street_address
[[2]]$status
[1] "OK"
Here's some code that I got from the Googleway Vignette:
df <- google_geocode(address = "Flinders Street Station",
key = key,
simplify = TRUE)
geocode_coordinates(df)
# lat lng
# 1 -37.81827 144.9671
It looks like what you need to do is:
df <- google_geocode("1600 Pennsylvania Ave")
geocode_coordinates(df)
The solution I came up with is a custom function that can access any section of the list:
geocode_accessor <- function(df, accessor, ...) {
unlist(map(df, list(accessor, ...)))
}
This has three important parts to understand:
The map function is calling the pluck function for us (it replaces the use of [[ ). You can read more about what is happening here, but just know this lets us access things by name
The "..." in the function's definition as well as in the list allows us to access multiple levels. Again, the use of list() to access further levels in a list is explained in the pluck documentation
The use of unlist converts the list to a vector (what I want in my instance)
Putting this all together, we can get the latitude of the White House & Lincoln Memorial:
geocode_accessor(address_vec, "results", "geometry", "location", "lat")
[1] 38.89766 38.88927
## install the package "placement"
## install.packages("placement")
library(placement)
##============= First Run:
## Without using Google API Key
## Get coordinates for the Empire State Building and Google
address <- c("350 5th Ave, New York, NY 10118, USA",
"1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA")
coordset <- geocode_url(address, auth ="standard_api", privkey= "",
clean=TRUE, add_date='today', verbose=TRUE)
## View the returns
print(coordset[ , 1:5])
##============= Output:
## lat lng location_type formatted_address status
## 1 NA NA <NA> <NA> OVER_QUERY_LIMIT
## 2 NA NA <NA> <NA> OVER_QUERY_LIMIT
I have used only one set of lat-long so I am not sure why the OVER_QUERY_LIMIT should kick in. Looking forward to a resolution on this. Thanks in advance.
I am trying to get the coordinates of businesses by their name. I have reviewed several questions on using 'geocode' but they all seem to work based on the address. See below two examples trying to get the coordinates of The Westbury Hotel London:
library(ggmap)
geocode("London")
geocode("The Westbury Hotel London") # Returns coordinates of Westbury Road in London
A more complex approach:
require(RJSONIO)
library(ggmap)
geocodeAddress <- function(address) {
require(RJSONIO)
url <- "http://maps.google.com/maps/api/geocode/json?address="
url <- URLencode(paste(url, address, "&sensor=false", sep = ""))
x <- fromJSON(url, simplify = FALSE)
if (x$status == "OK") {
out <- c(x$results[[1]]$geometry$location$lng,
x$results[[1]]$geometry$location$lat)
} else {
out <- NA
}
Sys.sleep(0.2) # API only allows 5 requests per second
out
}
geocodeAddress("The Westbury Hotel London") # Returns London coordinates
Other questions mentioned that it is possible to get coordinates from places with 'geocode' but, at least in my case, it is not working. Any idea on how to get coordinates by business name from google maps hugely appreciated.
You can use the Google Places API to search for places using my googleway package. You'll have to do some work with the results, or refine your query if you want to get the exact business you're after as the API usually returns multiple possible results.
You need a Google API key to use their service
library(googleway)
## your API key
api_key <- "your_api_key_goes_here"
## general search on the name
general_result <- google_places(search_string = "The Westbury Hotel London",
key = api_key)
general_result$results$name
# [1] "The Westbury" "Polo Bar" "The Westbury"
general_result$results$geometry$location
# lat lng
# 1 53.34153 -6.2614740
# 2 51.51151 -0.1426609
# 3 51.59351 -0.0983930
## more refined search using a location
location_result <- google_places(search_string = "The Wesbury Hotel London",
location = c(51.5,0),
key = api_key)
location_result$results$name
# [11] "The Marylebone" "The Chelsea Harbour Hotel"
# "Polo Bar" "The Westbury" "The Gallery at The Westbury"
location_result$results$geometry$location
# lat lng
# 1 51.51801 -0.1498050
# 2 51.47600 -0.1819235
# 3 51.51151 -0.1426609
# 4 51.59351 -0.0983930
# 5 51.51131 -0.1426318
location_result$results$formatted_address
# [1] "37 Conduit St, London W1S 2YF, United Kingdom" "37 Conduit St, London, Mayfair W1S 2YF, United Kingdom"
# [3] "57 Westbury Ave, London N22 6SA, United Kingdom"