I recently updated RStudio to version 2022.07.1, working on Windows 10.
When I tried different reverse geocoding functions (input is a coordinate, output is the address), they all returned "not found".
Example 1:
library(revgeo)
revgeo(-77.016472, 38.785026)
This is supposed to return "146 National Plaza, Fort Washington, Maryland, 20745, United States of America". But I got
"Getting geocode data from Photon: http://photon.komoot.de/reverse?lon=-77.016472&lat=38.785026"
[[1]]
[1] "House Number Not Found Street Not Found, City Not Found, State Not Found, Postcode Not Found, Country Not Found"
Data from https://github.com/mhudecheck/revgeo
Example 2:
library(tidygeocoder)
library(dplyr)
path <- "filepath"
df <- read.csv (paste (path, "sample.csv", sep = ""))
reverse <- df %>%
reverse_geocode(lat = longitude, long = latitude, method = 'osm',
address = address_found, full_results = TRUE)
reverse
Where sample.csv is

| name | addr | latitude | longitude |
| --- | --- | --- | --- |
| White House | 1600 Pennsylvania Ave NW, Washington, DC | 38.89770 | -77.03655 |
| Transamerica Pyramid | 600 Montgomery St, San Francisco, CA 94111 | 37.79520 | -122.40279 |
| Willis Tower | 233 S Wacker Dr, Chicago, IL 60606 | 41.87535 | -87.63576 |
I expected to get

| name | addr | latitude | longitude | address_found |
| --- | --- | --- | --- | --- |
| White House | 1600 Pennsylvania Ave NW, Washington, DC | 38.89770 | -77.03655 | White House, 1600, Pennsylvania Avenue Northwest, Washington, District of Columbia, 20500, United States |
| Transamerica Pyramid | 600 Montgomery St, San Francisco, CA 94111 | 37.79520 | -122.40279 | Transamerica Pyramid, 600, Montgomery Street, Chinatown, San Francisco, San Francisco City and County, San Francisco, California, 94111, United States |
| Willis Tower | 233 S Wacker Dr, Chicago, IL 60606 | 41.87535 | -87.63576 | South Wacker Drive, Printer’s Row, Loop, Chicago, Cook County, Illinois, 60606, United States |
But I got
# A tibble: 3 × 5
name addr latitude longitude address_found
<chr> <chr> <dbl> <dbl> <chr>
1 White House 1600 Pennsylvania Ave NW, Wash… 38.9 -77.0 NA
2 Transamerica Pyramid 600 Montgomery St, San Francis… 37.8 -122. NA
3 Willis Tower 233 S Wacker Dr, Chicago, IL 6… 41.9 -87.6 NA
Data source: https://cran.r-project.org/web/packages/tidygeocoder/readme/README.html
However, when I tried
reverse_geo(lat = 38.895865, long = -77.0307713, method = "osm")
I'm able to get
# A tibble: 1 × 3
lat long address
<dbl> <dbl> <chr>
1 38.9 -77.0 Pennsylvania Avenue, Washington, District of Columbia, 20045, United States
I contacted the tidygeocoder developer, who did not find any problem. Details are in https://github.com/jessecambon/tidygeocoder/issues/175
I'm not sure which part is going wrong. Could anyone try this in their own RStudio?
The updated revgeo needs to be submitted to CRAN. This has nothing to do with RStudio.
Going to http://photon.komoot.de/reverse?lon=-77.016472&lat=38.785026 in my browser also returns an error. However, I searched for the Photon reverse geocoder, and their example uses .io not .de in the URL, and https://photon.komoot.io/reverse?lon=-77.016472&lat=38.785026 works.
Photon also includes a note at the bottom of its examples:
Until October 2020 the API was available under photon.komoot.de. Requests still work as they redirected to photon.komoot.io but please update your apps accordingly.
Seems like that redirect is either broken or deprecated.
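To check the endpoint without going through revgeo, a minimal sketch like the following builds the request URL against the .io host. The helper name photon_reverse_url is made up for illustration and is not part of revgeo or Photon; only the lon and lat query parameters that appear in the URLs above are assumed.

```r
# Hypothetical helper (not part of revgeo): build a Photon reverse-geocoding
# URL on photon.komoot.io instead of the old photon.komoot.de host.
photon_reverse_url <- function(lon, lat, host = "https://photon.komoot.io") {
  sprintf(
    "%s/reverse?lon=%s&lat=%s",
    host,
    format(lon, digits = 15),  # keep the coordinate digits as typed
    format(lat, digits = 15)
  )
}

url <- photon_reverse_url(-77.016472, 38.785026)
print(url)
```

Opening the printed URL in a browser is a quick way to confirm whether the service itself responds, independent of any R package.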
The version of revgeo on GitHub already includes this change, so you can get a working version with remotes::install_github("https://github.com/mhudecheck/revgeo")
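Separately, in Example 2 the call passes lat = longitude and long = latitude, so the two columns are swapped; reverse_geocode(lat = latitude, long = longitude, ...) is probably what was intended. That would also be consistent with the single reverse_geo() call, which has the arguments in the right order, returning an address. A simple range check can catch some, though not all, swaps of this kind. The sketch below is base R only; coords_in_range is a made-up helper, and the data frame just repeats the coordinates from sample.csv.

```r
# Coordinates copied from sample.csv in the question.
df <- data.frame(
  name      = c("White House", "Transamerica Pyramid", "Willis Tower"),
  latitude  = c(38.89770, 37.79520, 41.87535),
  longitude = c(-77.03655, -122.40279, -87.63576)
)

# Hypothetical helper: TRUE where a (lat, long) pair is within valid bounds.
coords_in_range <- function(lat, long) {
  abs(lat) <= 90 & abs(long) <= 180
}

# Arguments as written in Example 2 (columns swapped).
ok_as_written <- coords_in_range(lat = df$longitude, long = df$latitude)

# Arguments in the intended order.
ok_corrected <- coords_in_range(lat = df$latitude, long = df$longitude)

print(ok_as_written)
print(ok_corrected)
```

Only the Transamerica Pyramid row fails the check as written, because -77 and -87 are still valid latitudes, so a passing range check does not prove the order is right.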
I am currently working with data that is formatted like this:
tribble(
  ~street1, ~street2, ~county, ~state,
  "N BENTON WY", "W TEMPLE ST", "LOS ANGELES", "CA",
  "11TH PL", "BLAINE ST", "LOS ANGELES", "CA",
  "W 6TH ST", "HOPE ST", "LOS ANGELES", "CA",
  "S GRAND AV", "W 18TH ST", "LOS ANGELES", "CA",
  "BROADWAY", "5TH ST", "LOS ANGELES", "CA"
)
This corresponds to a dataset containing around 825,000 observations with missing coordinates. These data have only the names of the nearest cross streets, the county, and the state (note that they do not include street numbers). I need to geocode these observations and recover coordinates so that my final data will look something like this:
tribble(
  ~street1, ~street2, ~county, ~state, ~latitude, ~longitude,
  "N BENTON WY", "W TEMPLE ST", "LOS ANGELES", "CA", XX.XXXX, -YY.YYYY,
  "11TH PL", "BLAINE ST", "LOS ANGELES", "CA", XX.XXXX, -YY.YYYY,
  "W 6TH ST", "HOPE ST", "LOS ANGELES", "CA", XX.XXXX, -YY.YYYY,
  "S GRAND AV", "W 18TH ST", "LOS ANGELES", "CA", XX.XXXX, -YY.YYYY,
  "BROADWAY", "5TH ST", "LOS ANGELES", "CA", XX.XXXX, -YY.YYYY
)
I have already researched a few possible solutions but haven't found a method that will work.
While the Google Maps API (ggmap package) is very good at identifying coordinates from cross streets as inputs, the cost to geocode this many observations (4.00 USD per 1000 queries according to their website) makes that option infeasible.
I've looked through the documentation of other packages such as RDSTK and tidygeocoder but they don't seem to support API queries using two street names as inputs. The Census Geocoder similarly does not have that option, allowing only single address inputs.
Using the OpenStreetMap API through the osmdata package seemed like a promising option after reading this very detailed StackOverflow answer, but attempting to replicate this code with much bigger bounding boxes has produced runtime errors every time.
See for example the following code using Los Angeles county, following the format of user hugh-allan in the above post:
library(sf)
library(tidyverse)
library(osmdata)
tribble(
~point, ~lat, ~lon,
1, 32.75004, -118.951721,
2, 34.823302, -118.951721,
3, 34.823302, -117.646374,
4, 32.75004, -117.646374,
) %>%
st_as_sf(
coords = c('lon', 'lat'),
crs = 4326
) %>%
{. ->> LA_bounds}
st_bbox(LA_bounds) %>%
opq %>%
add_osm_feature(key = 'highway') %>%
osmdata_sf %>%
`[[`('osm_lines') %>%
{. ->> LA_streets}
If anyone knows how to get around this error with OpenStreetMap, or how to adjust the syntax of another package to accept cross streets and counties as inputs, I would greatly appreciate it.
I don't have a solution for osmdata. However, I did try tidygeocoder. If you're looking for batch geocoding without an API key, the only free method would be the US Census Bureau geocoder in tidygeocoder, but it is computationally expensive. To do this, I combined street1 and street2 with an ampersand (&), then combined the result with the county and state into a single column called line_address instead of multiple columns:
library(dplyr) # provides tibble() and %>%
library(tidygeocoder)
examples_address <- tibble(line_address = c("N BENTON WY & W TEMPLE ST, LOS ANGELES, CA", "11TH PL & BLAINE ST, LOS ANGELES, CA", "W 6TH ST & HOPE ST, LOS ANGELES, CA", "S GRAND AV & W 18TH ST, LOS ANGELES, CA", "BROADWAY & 5TH ST, LOS ANGELES, CA"))
examples_address1 <- examples_address %>%
  tidygeocoder::geocode(address = line_address, method = "census", verbose = TRUE)
examples_address1
The output that I got:
line_address lat long
N BENTON WY & W TEMPLE ST, LOS ANGELES, CA 34.07289 -118.2757
11TH PL & BLAINE ST, LOS ANGELES, CA NA NA
W 6TH ST & HOPE ST, LOS ANGELES, CA 34.04944 -118.2563
S GRAND AV & W 18TH ST, LOS ANGELES, CA 34.03420 -118.2673
BROADWAY & 5TH ST, LOS ANGELES, CA 34.04808 -118.2507
Unfortunately, as you can see above, not every row got a lat and long back from the batch query.
We can use method = "arcgis" inside the function to get results for all rows, but for some reason the returned lat and long may differ. See the last entry:
line_address lat long
N BENTON WY & W TEMPLE ST, LOS ANGELES, CA 34.07290 -118.2757
11TH PL & BLAINE ST, LOS ANGELES, CA 34.04517 -118.2716
W 6TH ST & HOPE ST, LOS ANGELES, CA 34.04946 -118.2564
S GRAND AV & W 18TH ST, LOS ANGELES, CA 34.03417 -118.2673
BROADWAY & 5TH ST, LOS ANGELES, CA 34.01587 -118.4927
Note that arcgis does not support batch queries in tidygeocoder.
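For the full dataset, the single-line address can be built from the existing columns rather than typed by hand. A minimal sketch in base R, assuming the columns are named street1, street2, county and state as in the question (only two of the rows are repeated here):

```r
# Two rows copied from the question's example data.
crossings <- data.frame(
  street1 = c("N BENTON WY", "11TH PL"),
  street2 = c("W TEMPLE ST", "BLAINE ST"),
  county  = c("LOS ANGELES", "LOS ANGELES"),
  state   = c("CA", "CA"),
  stringsAsFactors = FALSE
)

# Join the cross streets with " & ", then append county and state.
crossings$line_address <- paste0(
  crossings$street1, " & ", crossings$street2, ", ",
  crossings$county, ", ", crossings$state
)

print(crossings$line_address)
```

The resulting line_address column can then be passed to geocode(address = line_address, ...) in the same way as the hand-written tibble.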
I'm trying to quantify the distance between localities using the google_distance() R function. I have run the examples for google_distance(), but the distance between localities is missing. The other fields are complete, but not the distance.
I have activated only the distance API from Google.
The code is:
key <- "your key here"
set_key(key = key, api = "distance")
google_keys()
df <- google_distance(origins = list(c("Melbourne Airport, Australia"),
c("MCG, Melbourne, Australia"),
c(-37.81659, 144.9841)),
destinations = c("Portsea, Melbourne, Australia"),
key = key)
head(df)
$destination_addresses
[1] "Portsea VIC 3944, Australia"
$origin_addresses
[1] "Melbourne Orlando International Airport (MLB), 1 Air Terminal Pkwy, Melbourne, FL 32901, USA"
[2] "Brunton Ave, Richmond VIC 3002, Australia"
[3] "Jolimont, Wellington Cres, East Melbourne VIC 3002, Australia"
$rows
*** Empty ***
$status
[1] "OK"
Perhaps I need to turn on other Google APIs, but I am not sure which ones. Do you have any idea how I could solve this issue?
All suggestions and improvements are welcome,
Thank you very much
The problem might be located in your API or settings. Are you sure you selected the correct one? The default code is working for me and returns rows as expected. I used the Distance Matrix API.
df <- google_distance(origins = list(c("Melbourne Airport, Australia"),
c("MCG, Melbourne, Australia"),
c(-37.81659, 144.9841)),
destinations = c("Portsea, Melbourne, Australia"),
key = key)
head(df)
$destination_addresses
[1] "Portsea VIC 3944, Australia"
$origin_addresses
[1] "Melbourne Airport (MEL), Melbourne Airport VIC 3045, Australia" "Brunton Ave, Richmond VIC 3002, Australia"
[3] "Jolimont, Wellington Cres, East Melbourne VIC 3002, Australia"
$rows
elements
1 133 km, 133291, 1 hour 41 mins, 6043, OK
2 108 km, 107702, 1 hour 24 mins, 5018, OK
3 108 km, 108451, 1 hour 25 mins, 5074, OK
$status
[1] "OK"
I have a column that contains thousands of descriptions like this (example):
Description
Building a hospital in the city of LA, USA
Building a school in the city of NYC, USA
Building shops in the city of Chicago, USA
I'd like to create a column with the first word after "city of", like this:

| Description | City |
| --- | --- |
| Building a hospital in the city of LA, USA | LA |
| Building a school in the city of NYC, USA | NYC |
| Building shops in the city of Chicago, USA | Chicago |
I tried with the following code after seeing this topic Extracting string after specific word, but my column is only filled with missing values
library(stringr)
df$city <- data.frame(str_extract(df$Description, "(?<=city of:\\s)[^;]+"))
df$city <- data.frame(str_extract(df$Description, "(?<=of:\\s)[^;]+"))
I took a look at the dput() output, and it is the same as the descriptions I see in the data frame directly.
Solution
This should do the trick for the data you showed:
df$city <- str_extract(df$Description, "(?<=city of )(\\w+)")
df
#> Description city
#> 1 Building a hospital in the city of LA, USA LA
#> 2 Building a school in the city of NYC, USA NYC
#> 3 Building shops in the city of Chicago, USA Chicago
Alternative
However, in case you want the whole string till the first comma (for example in case of cities with a blank in the name), you can go with:
df$city <- str_extract(df$Description, "(?<=city of )(.+)(?=,)")
Check out the following example:
df <- data.frame(Description = c("Building a hospital in the city of LA, USA",
"Building a school in the city of NYC, USA",
"Building shops in the city of Chicago, USA",
"Building a church in the city of Salt Lake City, USA"))
str_extract(df$Description, "(?<=the city of )(\\w+)")
#> [1] "LA" "NYC" "Chicago" "Salt"
str_extract(df$Description, "(?<=the city of )(.+)(?=,)")
#> [1] "LA" "NYC" "Chicago" "Salt Lake City"
Documentation
Check out ?regex:
Patterns (?=...) and (?!...) are zero-width positive and negative
lookahead assertions: they match if an attempt to match the ...
forward from the current position would succeed (or not), but use up
no characters in the string being processed. Patterns (?<=...) and
(?<!...) are the lookbehind equivalents: they do not allow repetition
quantifiers nor \C in ....
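One caveat with the (.+)(?=,) pattern from the Alternative section: .+ is greedy, so if a description contains more than one comma, the match runs up to the last one. Here is a sketch of a variant that stops at the first comma by using [^,]+ instead. It uses base R's regexpr() and regmatches() with perl = TRUE rather than stringr, and the "Utah" description is invented for illustration.

```r
descriptions <- c(
  "Building a hospital in the city of LA, USA",
  "Building a church in the city of Salt Lake City, Utah, USA"  # invented example
)

# [^,]+ stops at the first comma after "city of ".
# Note: regmatches() drops elements with no match; every element matches here.
first_comma <- regmatches(
  descriptions,
  regexpr("(?<=city of )[^,]+", descriptions, perl = TRUE)
)

# The greedy .+ with a lookahead runs up to the last comma instead.
last_comma <- regmatches(
  descriptions,
  regexpr("(?<=city of ).+(?=,)", descriptions, perl = TRUE)
)

print(first_comma)
print(last_comma)
```

The same [^,]+ pattern should also work inside str_extract(), which additionally returns NA for rows without a match instead of dropping them.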
I was working with the googleway package and had a bunch of addresses whose components I needed to pull out of a nested list of lists. Loops (not encouraged) and the apply functions both seemed confusing, and I was not sure whether there was a tidy solution. I found that the map function (specifically the pluck function that it calls on lists on the backend) could accomplish my goal, so I will share my solution.
Problem:
I need to pull out certain information about the White House such as
Latitude
Longitude
You need to set up your Google Cloud API Key with googleway::set_key(API_KEY), but this is just an example of a nested list that I hope someone working with this package will see.
library(purrr) # for pmap()
# Addresses for the White House and the Lincoln Memorial
address_vec <- c(
"1600 Pennsylvania Ave NW, Washington, DC 20006",
"2 Lincoln Memorial Cir NW, Washington, DC 20002"
)
address_vec <- pmap(list(address_vec), googleway::google_geocode)
outputs
[[1]]
[[1]]$results
address_components
1 1600, Pennsylvania Avenue Northwest, Northwest Washington, Washington, District of Columbia, United States, 20500, 1600, Pennsylvania Avenue NW, Northwest Washington, Washington, DC, US, 20500, street_number, route, neighborhood, political, locality, political, administrative_area_level_1, political, country, political, postal_code
formatted_address geometry.bounds.northeast.lat
1 1600 Pennsylvania Avenue NW, Washington, DC 20500, USA 38.8979
geometry.bounds.northeast.lng geometry.bounds.southwest.lat geometry.bounds.southwest.lng geometry.location.lat
1 -77.03551 38.89731 -77.03796 38.89766
geometry.location.lng geometry.location_type geometry.viewport.northeast.lat geometry.viewport.northeast.lng
1 -77.03657 ROOFTOP 38.89895 -77.03539
geometry.viewport.southwest.lat geometry.viewport.southwest.lng place_id
1 38.89626 -77.03808 ChIJGVtI4by3t4kRr51d_Qm_x58
types
1 establishment, point_of_interest, premise
[[1]]$status
[1] "OK"
[[2]]
[[2]]$results
address_components
1 2, Lincoln Memorial Circle Northwest, Southwest Washington, Washington, District of Columbia, United States, 20037, 2, Lincoln Memorial Cir NW, Southwest Washington, Washington, DC, US, 20037, street_number, route, neighborhood, political, locality, political, administrative_area_level_1, political, country, political, postal_code
formatted_address geometry.location.lat geometry.location.lng
1 2 Lincoln Memorial Cir NW, Washington, DC 20037, USA 38.88927 -77.05018
geometry.location_type geometry.viewport.northeast.lat geometry.viewport.northeast.lng
1 ROOFTOP 38.89062 -77.04883
geometry.viewport.southwest.lat geometry.viewport.southwest.lng place_id
1 38.88792 -77.05152 ChIJgRuEham3t4kRFju4R6De__g
plus_code.compound_code plus_code.global_code types
1 VWQX+PW Washington, DC, USA 87C4VWQX+PW street_address
[[2]]$status
[1] "OK"
Here's some code that I got from the Googleway Vignette:
df <- google_geocode(address = "Flinders Street Station",
key = key,
simplify = TRUE)
geocode_coordinates(df)
# lat lng
# 1 -37.81827 144.9671
It looks like what you need to do is:
df <- google_geocode("1600 Pennsylvania Ave")
geocode_coordinates(df)
The solution I came up with is a custom function that can access any section of the list:
geocode_accessor <- function(df, accessor, ...) {
unlist(map(df, list(accessor, ...)))
}
This has three important parts to understand:
The map function is calling the pluck function for us (it replaces the use of [[ ). You can read more about what is happening here, but just know this lets us access things by name
The "..." in the function's definition as well as in the list allows us to access multiple levels. Again, the use of list() to access further levels in a list is explained in the pluck documentation
The use of unlist converts the list to a vector (what I want in my instance)
Putting this all together, we can get the latitude of the White House & Lincoln Memorial:
geocode_accessor(address_vec, "results", "geometry", "location", "lat")
[1] 38.89766 38.88927
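To see what the accessor does without an API key, here is a base-R sketch on a hand-made nested list. The mock_results object only imitates the nesting shown in the output above (results, then geometry, location, lat/lng) and is not real API output; pluck_path is a made-up name, and it uses Reduce() with [[ in place of purrr's pluck().

```r
# Hand-made stand-in for the geocoding output (values copied from above).
mock_results <- list(
  list(
    results = list(geometry = list(location = list(lat = 38.89766, lng = -77.03657))),
    status = "OK"
  ),
  list(
    results = list(geometry = list(location = list(lat = 38.88927, lng = -77.05018))),
    status = "OK"
  )
)

# Hypothetical helper: walk down one element by name, one level at a time.
pluck_path <- function(x, ...) {
  Reduce(function(node, key) node[[key]], list(...), x)
}

# Apply the same path to every element and collect a numeric vector.
lats <- vapply(
  mock_results, pluck_path, numeric(1),
  "results", "geometry", "location", "lat"
)

print(lats)
```

Using vapply() with numeric(1) makes the call fail loudly if any element is missing the path or returns something other than a single number, which unlist() would silently paper over.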
I'm trying to write some simple code to check whether a street address exists.
In my first try I entered the right address and it gave me the correct coordinates:
addr <- '2147 Newhall Street,Santa Clara,CA 95050'
url = paste('http://maps.google.com/maps/api/geocode/xml?address=', addr,'&sensor=false',sep='')
doc = xmlTreeParse(url)
root = xmlRoot(doc)
lat = xmlValue(root[['result']][['geometry']][['location']][['lat']])
long = xmlValue(root[['result']][['geometry']][['location']][['lng']])
lat
"37.3386004"
long
"-121.9405759"
But if I write a wrong street address it's still giving me co-ordinates:
addr <- 'xyz,Santa Clara,CA 95050' # set your address here
url = paste('http://maps.google.com/maps/api/geocode/xml?address=', addr,'&sensor=false',sep='')
doc = xmlTreeParse(url)
root = xmlRoot(doc)
lat = xmlValue(root[['result']][['geometry']][['location']][['lat']])
long = xmlValue(root[['result']][['geometry']][['location']][['lng']])
lat
"37.3539663"
long
"-121.9529992"
I'm sure the street address above does not exist, but I'm still getting coordinates. Is there any way I can return an NA value or some flag if there is no valid street address?
There's already a nice wrapper of the Google Maps geocoding API in the ggmap package. If you set its output parameter to more, it will return a loctype which indicates if the address is precisely matched (rooftop) or an approximation (approximate, range_interpolated, geometric_center). See the documentation for further detail.
library(ggmap)
addr <- '2147 Newhall Street,Santa Clara,CA 95050'
geocode(addr, 'more')
# Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=2147%20Newhall%20Street,Santa%20Clara,CA%2095050&sensor=false
# lon lat type loctype address north
# 1 -121.9406 37.3386 street_address rooftop 2147 newhall st, santa clara, ca 95050, usa 37.33995
# south east west street_number route locality
# 1 37.33725 -121.9392 -121.9419 2147 Newhall Street Santa Clara
# administrative_area_level_2 administrative_area_level_1 country postal_code
# 1 Santa Clara County California United States 95050
addr <- 'xyz,Santa Clara,CA 95050'
geocode(addr, 'more')
# Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=xyz,Santa%20Clara,CA%2095050&sensor=false
# lon lat type loctype address north south
# 1 -121.953 37.35397 postal_code approximate santa clara, ca 95050, usa 37.37448 37.32314
# east west postal_code locality administrative_area_level_2
# 1 -121.9309 -121.9703 95050 Santa Clara Santa Clara County
# administrative_area_level_1 country
# 1 California United States
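Building on that, here is a sketch of turning anything that is not a rooftop match into NA. The data frame below is typed in by hand from the two outputs above rather than fetched from the API, and treating only rooftop as a valid match is an assumption you may want to relax (for example, to also accept range_interpolated).

```r
# Values copied by hand from the two geocode() outputs above.
geocoded <- data.frame(
  query   = c("2147 Newhall Street,Santa Clara,CA 95050", "xyz,Santa Clara,CA 95050"),
  lon     = c(-121.9406, -121.953),
  lat     = c(37.3386, 37.35397),
  loctype = c("rooftop", "approximate"),
  stringsAsFactors = FALSE
)

# Flag exact matches, then blank out coordinates for everything else.
exact <- geocoded$loctype == "rooftop"
geocoded$lat[!exact] <- NA
geocoded$lon[!exact] <- NA

print(geocoded)
```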