geocode cannot find a location that the OpenStreetMap website can - R

I am trying to use geocode from tidygeocoder to get the latitude and longitude of a list of addresses.
I have no issues locating my address through the OpenStreetMap webpage, but with tidygeocoder the same address returns no results.
The test address is "Building 13, Connewarra Avenue, Aspendale, Melbourne, City of Kingston, Victoria, 3195, Australia".
I initially tried "13 Connewarra Avenue, ASPENDALE, VIC, 3195" with no luck, then went to the webpage, which can locate it:
https://www.openstreetmap.org/search?query=13%20Connewarra%20Avenue%2C%20ASPENDALE%2C%20VIC%2C%203195#map=19/-38.02011/145.10469
So I copied the full address it found, "Building 13, Connewarra Avenue, Aspendale, Melbourne, City of Kingston, Victoria, 3195, Australia", but still had no luck with geocode.
test$text <- "Building 13, Connewarra Avenue, Aspendale, Melbourne, City of Kingston, Victoria, 3195, Australia"
test %>%
  geocode(address = text, method = "osm", verbose = TRUE)
...
How can I fix this?
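One thing worth trying (a debugging sketch, not a confirmed fix): query the single address directly with tidygeocoder's geo(), and compare the free-form query against Nominatim's structured search, which sometimes succeeds where the one-line query fails. Requires internet access.

```r
library(tidygeocoder)

# Free-form query, as in the pipeline above
geo(address = "13 Connewarra Avenue, Aspendale, VIC 3195, Australia",
    method = "osm", verbose = TRUE)

# Structured query: Nominatim treats the components separately,
# which can behave differently from the free-form search box
geo(street = "13 Connewarra Avenue", city = "Aspendale", state = "Victoria",
    postalcode = "3195", country = "Australia",
    method = "osm", verbose = TRUE)
```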

Unknown issue prevents geocode_reverse from working

I recently updated RStudio to version RStudio 2022.07.1, working on Windows 10.
When I tried different reverse-geocoding functions (which take coordinates as input and return an address), they all return nothing found.
Example 1:
library(revgeo)
revgeo(-77.016472, 38.785026)
It is supposed to return "146 National Plaza, Fort Washington, Maryland, 20745, United States of America", but instead I got:
"Getting geocode data from Photon: http://photon.komoot.de/reverse?lon=-77.016472&lat=38.785026"
[[1]]
[1] "House Number Not Found Street Not Found, City Not Found, State Not Found, Postcode Not Found, Country Not Found"
Data from https://github.com/mhudecheck/revgeo
Example 2:
library(tidygeocoder)
library(dplyr)
path <- "filepath"
df <- read.csv(paste(path, "sample.csv", sep = ""))

reverse <- df %>%
  reverse_geocode(lat = longitude, long = latitude, method = 'osm',
                  address = address_found, full_results = TRUE)
reverse
Where sample.csv is:
name,addr,latitude,longitude
White House,"1600 Pennsylvania Ave NW, Washington, DC",38.89770,-77.03655
Transamerica Pyramid,"600 Montgomery St, San Francisco, CA 94111",37.79520,-122.40279
Willis Tower,"233 S Wacker Dr, Chicago, IL 60606",41.87535,-87.63576
It is supposed to produce:
name                  addr                                         latitude  longitude   address_found
White House           1600 Pennsylvania Ave NW, Washington, DC     38.89770  -77.03655   White House, 1600, Pennsylvania Avenue Northwest, Washington, District of Columbia, 20500, United States
Transamerica Pyramid  600 Montgomery St, San Francisco, CA 94111   37.79520  -122.40279  Transamerica Pyramid, 600, Montgomery Street, Chinatown, San Francisco, San Francisco City and County, San Francisco, California, 94111, United States
Willis Tower          233 S Wacker Dr, Chicago, IL 60606           41.87535  -87.63576   South Wacker Drive, Printer’s Row, Loop, Chicago, Cook County, Illinois, 60606, United States
But I got
# A tibble: 3 × 5
  name                 addr                            latitude longitude address_found
  <chr>                <chr>                              <dbl>     <dbl> <chr>
1 White House          1600 Pennsylvania Ave NW, Wash…     38.9     -77.0 NA
2 Transamerica Pyramid 600 Montgomery St, San Francis…     37.8    -122.  NA
3 Willis Tower         233 S Wacker Dr, Chicago, IL 6…     41.9     -87.6 NA
Data source: https://cran.r-project.org/web/packages/tidygeocoder/readme/README.html
However, when I tried
reverse_geo(lat = 38.895865, long = -77.0307713, method = "osm")
I'm able to get
# A tibble: 1 × 3
    lat  long address
  <dbl> <dbl> <chr>
1  38.9 -77.0 Pennsylvania Avenue, Washington, District of Columbia, 20045, United States
I contacted the tidygeocoder developer, who couldn't find any problem; details are at https://github.com/jessecambon/tidygeocoder/issues/175.
I'm not sure which part goes wrong. Would anyone like to try this in their RStudio?
The updated revgeo needs to be submitted to CRAN; this has nothing to do with RStudio. (Separately, note that Example 2 swaps its arguments: lat = longitude, long = latitude passes the longitude column as the latitude, so values like -122.4 are out of range for a latitude and the lookup returns NA.)
Going to http://photon.komoot.de/reverse?lon=-77.016472&lat=38.785026 in my browser also returns an error. However, I looked up the Photon reverse geocoder, and its examples use .io rather than .de in the URL; https://photon.komoot.io/reverse?lon=-77.016472&lat=38.785026 works.
Photon also includes a note at the bottom of its examples:
Until October 2020 the API was available under photon.komoot.de. Requests still work as they redirected to photon.komoot.io but please update your apps accordingly.
Seems like that redirect is either broken or deprecated.
The version of revgeo on GitHub already has this change, so you can get a working version with remotes::install_github("mhudecheck/revgeo")
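The endpoint switch is easy to confirm from R itself (requires internet access; assumes the jsonlite package is installed):

```r
library(jsonlite)

# The old .de host fails, but the .io host answers with GeoJSON
res <- fromJSON("https://photon.komoot.io/reverse?lon=-77.016472&lat=38.785026")
res$features$properties  # street, housenumber, city, state, ... of the nearest match
```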

Quantifying distance using the google_distance R function

I'm trying to quantify the distance between localities using the google_distance R function. I have run the examples from the google_distance documentation, but the distance between localities is missing. The other fields are complete, but not the distance.
I have activated just the distance API from Google.
The code is:
library(googleway)

key <- "your key here"
set_key(key = key, api = "distance")
google_keys()

df <- google_distance(origins = list(c("Melbourne Airport, Australia"),
                                     c("MCG, Melbourne, Australia"),
                                     c(-37.81659, 144.9841)),
                      destinations = c("Portsea, Melbourne, Australia"),
                      key = key)
head(df)
$destination_addresses
[1] "Portsea VIC 3944, Australia"

$origin_addresses
[1] "Melbourne Orlando International Airport (MLB), 1 Air Terminal Pkwy, Melbourne, FL 32901, USA"
[2] "Brunton Ave, Richmond VIC 3002, Australia"
[3] "Jolimont, Wellington Cres, East Melbourne VIC 3002, Australia"

$rows
*** Empty ***

$status
[1] "OK"
Perhaps I need to enable other Google APIs, but I am not sure which ones. Do you have any idea how I could solve this issue?
All suggestions and improvements are welcome.
Thank you very much.
The problem is probably in your API key settings. Are you sure you enabled the correct API? The default code works for me and returns rows as expected; I used the Distance Matrix API.
df <- google_distance(origins = list(c("Melbourne Airport, Australia"),
                                     c("MCG, Melbourne, Australia"),
                                     c(-37.81659, 144.9841)),
                      destinations = c("Portsea, Melbourne, Australia"),
                      key = key)
head(df)
$destination_addresses
[1] "Portsea VIC 3944, Australia"
$origin_addresses
[1] "Melbourne Airport (MEL), Melbourne Airport VIC 3045, Australia" "Brunton Ave, Richmond VIC 3002, Australia"
[3] "Jolimont, Wellington Cres, East Melbourne VIC 3002, Australia"
$rows
elements
1 133 km, 133291, 1 hour 41 mins, 6043, OK
2 108 km, 107702, 1 hour 24 mins, 5018, OK
3 108 km, 108451, 1 hour 25 mins, 5074, OK
$status
[1] "OK"
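Once $rows is populated, the numbers can be pulled back out of the nested response. A sketch continuing the session above (so it assumes df holds the google_distance() response, and the field names follow googleway's response structure, which is not spelled out in the thread):

```r
# df$rows$elements is a list with one data frame per origin; each holds
# nested distance/duration data frames with text and value columns
first_leg <- df$rows$elements[[1]]
first_leg$distance$text   # human-readable, e.g. "133 km"
first_leg$distance$value  # metres
first_leg$duration$value  # seconds
```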

Extracting first word after a specific expression in R

I have a column that contains thousands of descriptions like this (example):
Description
Building a hospital in the city of LA, USA
Building a school in the city of NYC, USA
Building shops in the city of Chicago, USA
I'd like to create a column with the first word after "city of", like this:
Description                                 City
Building a hospital in the city of LA, USA  LA
Building a school in the city of NYC, USA   NYC
Building shops in the city of Chicago, USA  Chicago
I tried the following code after seeing this topic, Extracting string after specific word, but my column is only filled with missing values:
library(stringr)
df$city <- data.frame(str_extract(df$Description, "(?<=city of:\\s)[^;]+"))
df$city <- data.frame(str_extract(df$Description, "(?<=of:\\s)[^;]+"))
I took a look at the dput() output and it is the same as the descriptions I see in the dataframe directly.
Solution
This should do the trick for the data you showed:
df$city <- str_extract(df$Description, "(?<=city of )(\\w+)")
df
#> Description city
#> 1 Building a hospital in the city of LA, USA LA
#> 2 Building a school in the city of NYC, USA NYC
#> 3 Building shops in the city of Chicago, USA Chicago
Alternative
However, in case you want the whole string up to the first comma (for example, for cities with a space in the name), you can go with:
df$city <- str_extract(df$Description, "(?<=city of )(.+)(?=,)")
Check out the following example:
df <- data.frame(Description = c("Building a hospital in the city of LA, USA",
"Building a school in the city of NYC, USA",
"Building shops in the city of Chicago, USA",
"Building a church in the city of Salt Lake City, USA"))
str_extract(df$Description, "(?<=the city of )(\\w+)")
#> [1] "LA" "NYC" "Chicago" "Salt"
str_extract(df$Description, "(?<=the city of )(.+)(?=,)")
#> [1] "LA" "NYC" "Chicago" "Salt Lake City"
Documentation
Check out ?regex:
Patterns (?=...) and (?!...) are zero-width positive and negative
lookahead assertions: they match if an attempt to match the ...
forward from the current position would succeed (or not), but use up
no characters in the string being processed. Patterns (?<=...) and
(?<!...) are the lookbehind equivalents: they do not allow repetition
quantifiers nor \C in ....
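The same lookarounds are available in base R without stringr, provided you ask for the PCRE engine with perl = TRUE (the default TRE engine does not support lookbehind). A minimal equivalent of the first solution:

```r
x <- c("Building a hospital in the city of LA, USA",
       "Building a school in the city of NYC, USA",
       "Building shops in the city of Chicago, USA")

# perl = TRUE switches to PCRE, which supports the (?<=...) lookbehind
regmatches(x, regexpr("(?<=city of )\\w+", x, perl = TRUE))
#> [1] "LA"      "NYC"     "Chicago"
```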

Fuzzy Matching/Join Two Data Frames of University Names [duplicate]

This question already has answers here:
How can I match fuzzy match strings from two datasets?
(7 answers)
Closed 4 years ago.
I have a list of university names, entered with spelling errors and inconsistencies. I need to match them against an official list of university names to link my data together.
I know fuzzy matching/join is my way to go, but I'm a bit lost on the correct method. Any help would be greatly appreciated.
d <- data.frame(name = c("University of New Yorkk",
                         "The University of South Carolina",
                         "Syracuuse University",
                         "University of South Texas",
                         "The University of No Carolina"),
                score = c(1, 3, 6, 10, 4))
y <- data.frame(name = c("University of South Texas",
                         "The University of North Carolina",
                         "University of South Carolina",
                         "Syracuse University",
                         "University of New York"),
                distance = c(100, 400, 200, 20, 70))
And I desire an output that merges them together as closely as possible:
matched <- data.frame(name = c("University of New Yorkk",
                               "The University of South Carolina",
                               "Syracuuse University",
                               "University of South Texas",
                               "The University of No Carolina"),
                      correctmatch = c("University of New York",
                                       "University of South Carolina",
                                       "Syracuse University",
                                       "University of South Texas",
                                       "The University of North Carolina"))
I use adist() for things like this and have a little wrapper function called closest_match() to compare a value against a set of "good/permitted" values.
library(magrittr) # for the %>%

closest_match <- function(bad_value, good_values) {
  distances <- adist(bad_value, good_values, ignore.case = TRUE) %>%
    as.numeric() %>%
    setNames(good_values)
  distances[distances == min(distances)] %>%
    names()
}

sapply(d$name, function(x) closest_match(x, y$name)) %>%
  setNames(d$name)
           University of New Yorkk   The University of South Carolina
          "University of New York" "The University of North Carolina"
              Syracuuse University          University of South Texas
             "Syracuse University"        "University of South Texas"
     The University of No Carolina
"The University of North Carolina"
Note that plain edit distance prefers "The University of North Carolina" for "The University of South Carolina", since two substitutions beat deleting "The ".
adist() utilizes Levenshtein distance to compare similarity between two strings.
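As a minimal illustration of what adist() measures (the number of single-character insertions, deletions, and substitutions needed to turn one string into the other):

```r
# Deleting the doubled "u" is the single edit needed here
adist("Syracuuse University", "Syracuse University")[1, 1]
#> [1] 1

# ignore.case = TRUE stops case differences from counting as edits
adist("university of new york", "University of New York", ignore.case = TRUE)[1, 1]
#> [1] 0
```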

Removing similar characters from field

I have File1.csv, which contains 3000 records, from which I need to remove characters not related to the address.
Each record starts with "&" or "A/O".
I need to clean my "Address1" field; if there is no address-related info in the field,
I need the record to be empty.
Example:
File1.csv:
Address1
&&2340 Clemb Street
&&564 7th Street
&&&10th Street
A/O11th Street
A/ONorth Street
A/O/OSouth Street
A/Ocareof
A/Otttt
A/Oyuyuyu
A/Ouiuiuiuiui
A/O/yuyyuyuyuyugggh 4510th Street
&uhhhhhello 56 11th Street
I am expecting this result in File1 (without A/O, A/O/O, A/Ouiuiuiui, etc.):
File1.csv:
Address1
2340 Clemb Street
564 7th Street
10th Street
11th Street
North Street
South Street
<blank record>
<blank record>
<blank record>
<blank record>
4510th Street
56 11th Street
Thanks for the help!
There are almost certainly fancier matching patterns you could use, but gsub() and the following seem to get the job done with this dataset:
x <- c('&&2340 Clemb Street',
'&&564 7th Street',
'&&&10th Street',
'A/O11th Street',
'A/ONorth Street',
'A/O/OSouth Street')
gsub("&|A/O|/O", "", x)
#-----
[1] "2340 Clemb Street" "564 7th Street" "10th Street" "11th Street"
[5] "North Street" "South Street"
Intro to regex can be found here.
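The gsub() above strips the prefixes but leaves junk-only records (e.g. "careof") untouched and keeps the lowercase noise glued onto real addresses. One possible extension covering the full sample (a sketch that assumes every real address here contains "Street"):

```r
x <- c('&&2340 Clemb Street', '&&564 7th Street', '&&&10th Street',
       'A/O11th Street', 'A/ONorth Street', 'A/O/OSouth Street',
       'A/Ocareof', 'A/Otttt', 'A/Oyuyuyu', 'A/Ouiuiuiuiui',
       'A/O/yuyyuyuyuyugggh 4510th Street', '&uhhhhhello 56 11th Street')

# 1. strip the leading "&" / "A/O" markers (including the "A/O/O" variant)
y <- gsub("^(&+|A/O(/O)?/?)", "", x)
# 2. drop runs of lowercase junk sitting directly in front of a house number
y <- gsub("^[a-z]+\\s*(?=[0-9])", "", y, perl = TRUE)
# 3. blank out whatever still doesn't look like an address
y[!grepl("Street", y)] <- ""
y
```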
