Quantifying the distance using Google_distance r function - r

I'm trying to quantify the distance between localities, applying the Google_distance r function. I have run the examples of the Google_distance function, but the distance between localities is missing. The others fields are complete, but not the distance.
I have activated just the distance API from google.
The code is:
key <- "your key here"
set_key(key = key, api = "distance")
google_keys()
df <- google_distance(origins = list(c("Melbourne Airport, Australia"),
c("MCG, Melbourne, Australia"),
c(-37.81659, 144.9841)),
destinations = c("Portsea, Melbourne, Australia"),
key = key)
head(df)
$destination_addresses
\[1\] "Portsea VIC 3944, Australia"
$origin_addresses
\[1\] "Melbourne Orlando International Airport (MLB), 1 Air Terminal Pkwy, Melbourne, FL 32901, USA"
\[2\] "Brunton Ave, Richmond VIC 3002, Australia"
\[3\] "Jolimont, Wellington Cres, East Melbourne VIC 3002, Australia"
$rows
\*\*\* Empty \*\*\*
$status
\[1\] "OK"
Perhaps, I need to turn on others google APIs, but I am not sure which ones. Do you have any idea how I could solve this issue?
All suggestions and improvements are welcome,
Thank you very much

The problem might be located in your API or settings. Are you sure you selected the correct one? The default code is working for me and returns rows as expected. I used the Distance Matrix API.
df <- google_distance(origins = list(c("Melbourne Airport, Australia"),
c("MCG, Melbourne, Australia"),
c(-37.81659, 144.9841)),
destinations = c("Portsea, Melbourne, Australia"),
key = key)
head(df)
$destination_addresses
[1] "Portsea VIC 3944, Australia"
$origin_addresses
[1] "Melbourne Airport (MEL), Melbourne Airport VIC 3045, Australia" "Brunton Ave, Richmond VIC 3002, Australia"
[3] "Jolimont, Wellington Cres, East Melbourne VIC 3002, Australia"
$rows
elements
1 133 km, 133291, 1 hour 41 mins, 6043, OK
2 108 km, 107702, 1 hour 24 mins, 5018, OK
3 108 km, 108451, 1 hour 25 mins, 5074, OK
$status
[1] "OK"

Related

Unknown issue prevents geocode_reverse from working

I recently update RStudio to the version RStudio 2022.07.1, working on Windows 10.
When I tried different geocode reverse functions(Which is input coordinate, output is the address), they all return no found.
Example 1:
library(revgeo)
revgeo(-77.016472, 38.785026)
Suppose return "146 National Plaza, Fort Washington, Maryland, 20745, United States of America". But I got
"Getting geocode data from Photon: http://photon.komoot.de/reverse?lon=-77.016472&lat=38.785026"
[[1]]
[1] "House Number Not Found Street Not Found, City Not Found, State Not Found, Postcode Not Found, Country Not Found"
Data from https://github.com/mhudecheck/revgeo
Example 2:
library(tidygeocoder)
library(dplyr)
path <- "filepath"
df <- read.csv (paste (path, "sample.csv", sep = ""))
reverse <- df %>%
reverse_geocode(lat = longitude, long = latitude, method = 'osm',
address = address_found, full_results = TRUE)
reverse
Where the sample.csv is
name
addr
latitude
longitude
White House
1600 Pennsylvania Ave NW, Washington, DC
38.89770
-77.03655
Transamerica Pyramid
600 Montgomery St, San Francisco, CA 94111
37.79520
-122.40279
Willis Tower
233 S Wacker Dr, Chicago, IL 60606
41.87535
-87.63576
Suppose to get
name
addr
latitude
longitude
address_found
White House
1600 Pennsylvania Ave NW, Washington, DC
38.89770
-77.03655
White House, 1600, Pennsylvania Avenue Northwest, Washington, District of Columbia, 20500, United States
Transamerica Pyramid
600 Montgomery St, San Francisco, CA 94111
37.79520
-122.40279
Transamerica Pyramid, 600, Montgomery Street, Chinatown, San Francisco, San Francisco City and County, San Francisco, California, 94111, United States
Willis Tower
233 S Wacker Dr, Chicago, IL 60606
41.87535
-87.63576
South Wacker Drive, Printer’s Row, Loop, Chicago, Cook County, Illinois, 60606, United States
But I got
# A tibble: 3 × 5
name addr latitude longitude address_found
<chr> <chr> <dbl> <dbl> <chr>
1 White House 1600 Pennsylvania Ave NW, Wash… 38.9 -77.0 NA
2 Transamerica Pyramid 600 Montgomery St, San Francis… 37.8 -122. NA
3 Willis Tower 233 S Wacker Dr, Chicago, IL 6… 41.9 -87.6 NA
Data source: https://cran.r-project.org/web/packages/tidygeocoder/readme/README.html
However, when I tried
reverse_geo(lat = 38.895865, long = -77.0307713, method = "osm")
I'm able to get
# A tibble: 1 × 3
lat long address
<dbl> <dbl> <chr>
1 38.9 -77.0 Pennsylvania Avenue, Washington, District of Columbia, 20045, United States
I had contact the tidygeocoder developer, he/she didn't find out any problem. Detail in https://github.com/jessecambon/tidygeocoder/issues/175
Not sure which part goes wrong. Anyone want try on their RStudio?
The updated revgeo needs to be submitted to CRAN. This has nothing to do with RStudio.
Going to http://photon.komoot.de/reverse?lon=-77.016472&lat=38.785026 in my browser also returns an error. However, I searched for the Photon reverse geocoder, and their example uses .io not .de in the URL, and https://photon.komoot.io/reverse?lon=-77.016472&lat=38.785026 works.
Photon also include a Note at the bottom of their examples:
Until October 2020 the API was available under photon.komoot.de. Requests still work as they redirected to photon.komoot.io but please update your apps accordingly.
Seems like that redirect is either broken or deprecated.
The version of revgeo on github has this change made already, so you can get a working version by using remotes::install_github("https://github.com/mhudecheck/revgeo")

Extracting first word after a specific expression in R

I have a column that contains thousands of descriptions like this (example) :
Description
Building a hospital in the city of LA, USA
Building a school in the city of NYC, USA
Building shops in the city of Chicago, USA
I'd like to create a column with the first word after "city of", like that :
Description
City
Building a hospital in the city of LA, USA
LA
Building a school in the city of NYC, USA
NYC
Building shops in the city of Chicago, USA
Chicago
I tried with the following code after seeing this topic Extracting string after specific word, but my column is only filled with missing values
library(stringr)
df$city <- data.frame(str_extract(df$Description, "(?<=city of:\\s)[^;]+"))
df$city <- data.frame(str_extract(df$Description, "(?<=of:\\s)[^;]+"))
I took a look at the dput() and the output is the same than the descriptions i see in the dataframe directly.
Solution
This should make the trick for the data you showed:
df$city <- str_extract(df$Description, "(?<=city of )(\\w+)")
df
#> Description city
#> 1 Building a hospital in the city of LA, USA LA
#> 2 Building a school in the city of NYC, USA NYC
#> 3 Building shops in the city of Chicago, USA Chicago
Alternative
However, in case you want the whole string till the first comma (for example in case of cities with a blank in the name), you can go with:
df$city <- str_extract(df$Description, "(?<=city of )(.+)(?=,)")
Check out the following example:
df <- data.frame(Description = c("Building a hospital in the city of LA, USA",
"Building a school in the city of NYC, USA",
"Building shops in the city of Chicago, USA",
"Building a church in the city of Salt Lake City, USA"))
str_extract(df$Description, "(?<=the city of )(\\w+)")
#> [1] "LA" "NYC" "Chicago" "Salt"
str_extract(df$Description, "(?<=the city of )(.+)(?=,)")
#> [1] "LA" "NYC" "Chicago" "Salt Lake City"
Documentation
Check out ?regex:
Patterns (?=...) and (?!...) are zero-width positive and negative
lookahead assertions: they match if an attempt to match the ...
forward from the current position would succeed (or not), but use up
no characters in the string being processed. Patterns (?<=...) and
(?<!...) are the lookbehind equivalents: they do not allow repetition
quantifiers nor \C in ....

How to quickly access elements in a list of list without a loop in R? (Googleway package)

I was working with the googleway package and I had a bunch of addresses that I needed to parse out the various components of the addresses that were in a nested list of lists. Loops (not encouraged) and apply functions both seemed confusing and I was not sure if there was a tidy solution. I found the map function (specifically the pluck function that it calls on lists on the backend) could accomplish my goal so I will share my solution.
Problem:
I need to pull out certain information about the White House such as
Latitude
Longitude
You need to set up your Google Cloud API Key with googleway::set_key(API_KEY), but this is just an example of a nested list that I hope someone working with this package will see.
# Address for the White House and the Lincoln Memorial
address_vec <- c(
"1600 Pennsylvania Ave NW, Washington, DC 20006",
"2 Lincoln Memorial Cir NW, Washington, DC 20002"
)
address_vec <- pmap(list(address_vec), googleway::google_geocode)
outputs
[[1]]
[[1]]$results
address_components
1 1600, Pennsylvania Avenue Northwest, Northwest Washington, Washington, District of Columbia, United States, 20500, 1600, Pennsylvania Avenue NW, Northwest Washington, Washington, DC, US, 20500, street_number, route, neighborhood, political, locality, political, administrative_area_level_1, political, country, political, postal_code
formatted_address geometry.bounds.northeast.lat
1 1600 Pennsylvania Avenue NW, Washington, DC 20500, USA 38.8979
geometry.bounds.northeast.lng geometry.bounds.southwest.lat geometry.bounds.southwest.lng geometry.location.lat
1 -77.03551 38.89731 -77.03796 38.89766
geometry.location.lng geometry.location_type geometry.viewport.northeast.lat geometry.viewport.northeast.lng
1 -77.03657 ROOFTOP 38.89895 -77.03539
geometry.viewport.southwest.lat geometry.viewport.southwest.lng place_id
1 38.89626 -77.03808 ChIJGVtI4by3t4kRr51d_Qm_x58
types
1 establishment, point_of_interest, premise
[[1]]$status
[1] "OK"
[[2]]
[[2]]$results
address_components
1 2, Lincoln Memorial Circle Northwest, Southwest Washington, Washington, District of Columbia, United States, 20037, 2, Lincoln Memorial Cir NW, Southwest Washington, Washington, DC, US, 20037, street_number, route, neighborhood, political, locality, political, administrative_area_level_1, political, country, political, postal_code
formatted_address geometry.location.lat geometry.location.lng
1 2 Lincoln Memorial Cir NW, Washington, DC 20037, USA 38.88927 -77.05018
geometry.location_type geometry.viewport.northeast.lat geometry.viewport.northeast.lng
1 ROOFTOP 38.89062 -77.04883
geometry.viewport.southwest.lat geometry.viewport.southwest.lng place_id
1 38.88792 -77.05152 ChIJgRuEham3t4kRFju4R6De__g
plus_code.compound_code plus_code.global_code types
1 VWQX+PW Washington, DC, USA 87C4VWQX+PW street_address
[[2]]$status
[1] "OK"
Here's some code that I got from the Googleway Vignette:
df <- google_geocode(address = "Flinders Street Station",
key = key,
simplify = TRUE)
geocode_coordinates(df)
# lat lng
# 1 -37.81827 144.9671
It looks like what you need to do is:
df <- google_geocode("1600 Pennsylvania Ave")
geocode_coordinates(df)
The solution I came up with is a custom function that can access any section of the list:
geocode_accessor <- function(df, accessor, ...) {
unlist(map(df, list(accessor, ...)))
}
This has three important parts to understand:
The map function is calling the pluck function for us (it replaces the use of [[ ). You can read more about what is happening here, but just know this lets us access things by name
The "..." in the function's definition as well as in the list allows us to access multiple levels. Again, the use of list() to access further levels in a list is explained in the pluck documentation
The use of unlist converts the list to a vector (what I want in my instance)
Putting this all together, we can get the latitude of the White House & Lincoln Memorial:
geocode_accessor(address_vec, "results", "geometry", "location", "lat")
[1] 38.89766 38.88927

Extract date from texts in corpus R

I have a corpus object from which I want to extract data so I can add them as docvar.
The object looks like this
v1 <- c("(SE22-y -7 A go q ,, Document of The World Bank FOR OFFICIAL USE ONLY il I ( >I8.( )]i 1 t'f-l±E C 4'( | Report No. 9529-LSO l il .rt N ,- / . t ,!I . 1. 'i 1( T v f) (: AR.) STAFF APPRAISAL REPORT KINGDOM OF LESOTHO EDUCATION SECTOR DEVELOPMENT PROJECT JUNE 19, 1991 Population and Human Resources Division Southern Africa Department This document has a restricted distribution and may be used by reipients only in the performance of their official duties. Its contents may not otherwise be disclosed without World Bank authorization.",
"Document of The World Bank Report No. 13611-PAK STAFF APPRAISAL REPORT PAKISTAN POPULATION WELFARE PROGRAM PROJECT FREBRUARY 10, 1995 Population and Human Resources Division Country Department I South Asia Region",
"I Toward an Environmental Strategy for Asia A Summary of a World Bank Discussion Paper Carter Brandon Ramesh Ramankutty The World Bank Washliington, D.C. (C 1993 The International Bank for Reconstruction and Development / THiE WORLD BANK 1818 H Street, N.W. Washington, D.C. 20433 All rights reserved Manufactured in the United States of America First printing November 1993",
"Report No. PID9188 Project Name East Timor-TP-Emergency School (#) Readiness Project Region East Asia and Pacific Region Sector Other Education Project ID TPPE70268 Borrower(s) EAST TIMOR Implementing Agency Address UNTAET (UN TRANSITIONAL ADMINISTRATION FOR EAST TIMOR) Contact Person: Cecilio Adorna, UNTAET, Dili, East Timor Fax: 61-8 89 422198 Environment Category C Date PID Prepared June 16, 2000 Projected Appraisal Date May 27, 2000 Projected Board Date June 20, 2000",
"Page 1 CONFORMED COPY CREDIT NUMBER 2447-CHA (Reform, Institutional Support and Preinvestment Project) between PEOPLE'S REPUBLIC OF CHINA and INTERNATIONAL DEVELOPMENT ASSOCIATION Dated December 30, 1992")
c1 <- corpus(v1)
The first thing I want to do is extract the first occurring date, mostly it occurs as "Month Year" (December 1990) or "Month Day, Year" (JUNE 19, 1991) or with a typo FREBRUARY 10, 1995 in which case the month could be discarded.
My code is a combination of
Extract date text from string
&
Extract Dates in any format from Text in R:
lapply(c1$documents$texts, function(x) anydate(str_extract_all(c1$documents$texts, "[[:alnum:]]+[ /]*\\d{2}[ /]*\\d{4}")))
and get the error:
Error in anytime_cpp(x = x, tz = tz, asUTC = asUTC, asDate = TRUE, useR = useR, : Unsupported Type
However, I do not know how to supply the date format. Furthermore, I don't really get how to write the correct regular expressions.
https://www.regular-expressions.info/dates.html & https://www.regular-expressions.info/rlanguage.html
other questions on this subject are:
Extract date from text
Need to extract date from a text file of strings in R
http://r.789695.n4.nabble.com/Regexp-extract-first-occurrence-of-date-in-string-td997254.html
Extract date from given string in r
str_extract_all(texts(c1)
, "(\\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Nov(?:ember)?|Oct(?:ober)?|Dec(?:ember)?) (?:19[7-9]\\d|2\\d{3})(?=\\D|$))|(\\b(?:JAN(?:UARY)?|FEB(?:RUARY)?|MAR(?:CH)?|APR(?:IL)?|MAY|JUN(?:E)?|JUL(?:Y)?|AUG(?:UST)?|SEP(?:TEMBER)?|NOV(?:EMBER)?|OCT(?:OBER)?|DEC(?:EMBER)?) (?:19[7-9]\\d|2\\d{3})(?=\\D|$))|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\\s+\\d{1,2},\\s+\\d{4})|(\\b(JAN(UARY)?|FEB(RUARY)?|MAR(CH)?|APR(IL)?|MAY|JUN(E)?|JUL(Y)?|AUG(UST)?|SEP(TEMBER)?|OCT(OBER)?|NOV(EMBER)?|DEC(EMBER)?)\\s+\\d{1,2},\\s+\\d{4})"
, simplify = TRUE)[,1]
This gives the first occurrence of format JUNE 19, 1991 or December 1990

Extracting cities from string vector in R

I have a column in my dataset db, say db$affiliation, which looks like:
**db$affiliation**
[1] "[SCOTT, ALLEN J.] UNIV CALIF LOS ANGELES, DEPT GEOG, LOS ANGELES, CA 90095 USA"
[2] "[VAN DUINEN, RIANNE; VAN DER VEEN, ANNE] UNIV TWENTE, DEPT WATER ENGN & MANAGEMENT, DRIENERLOLAAN 5,POB 217, NL-7500 AE ENSCHEDE, NETHERLANDS."
[3] "[ANANTSUKSOMSRI, SUTEE] CHULALONGKORN UNIV, FAC ARCHITECTURE, BANGKOK, THAILAND."
[4] ...
I would like to create a column within the same dataset containing only the name of the city in db$affiliation, such as
**db$cities**
[1] LOS ANGELES
[2] TWENTE
[3] BANGKOK
[4] ...
If multiple city names are available, I'd like the command to return only the last one, if no city names are available I'd like to have NA. How can I do that?
I thought that I could use world.cities$name in data(world.cities) in the maps package but I can not figure out how.
I even tried to split the db$affiliation column such as:
db$affiliation <- gsub("\\[[^\\]]*\\]", "", db$affiliation, perl=TRUE) # remove content within brackets
db$affiliation[2] # check the separator
db <- cSplit(db, 'affiliation', sep=c(", "), type.convert=FALSE) # split after comma
Which results (I've truncated it after affiliation_3) in:
affiliation_1 affiliation_2 affiliation_3
[1] UNIV CALIF LOS ANGELES DEPT GEOG LOS ANGELES
[2] UNIV TWENTE DEPT WATER ENGN & MANAGEMENT DRIENERLOLAAN
[3] CHULALONGKORN UNIV FAC ARCHITECTURE BANGKOK
And then pass:
db$cities <- lapply(db$affiliation_1, function(x)x[which(x %in% world.cities$name)])
But I get an empty column.
Thanks for the help!
There are many cities in your sample string so you may need to think again if you still want to fetch the 'last city' in case multiple cities are found in affiliation column.
library(maps)
data(world.cities)
#sample data
df <- data.frame(affiliation = c("[SCOTT, ALLEN J.] UNIV CALIF LOS ANGELES, DEPT GEOG, LOS ANGELES, CA 90095 USA",
"[VAN DUINEN, RIANNE; VAN DER VEEN, ANNE] UNIV TWENTE, DEPT WATER ENGN & MANAGEMENT, DRIENERLOLAAN 5,POB 217, NL-7500 AE ENSCHEDE, NETHERLANDS.",
"[ANANTSUKSOMSRI, SUTEE] CHULALONGKORN UNIV, FAC ARCHITECTURE, BANGKOK, THAILAND.",
"Prem"), stringsAsFactors = F)
#fetch city and it's respective country from 'affiliation' column
cities_country <- lapply(gsub("\\[|\\]|[,;]|\\.","",df$affiliation), function(x)
paste(as.character(world.cities$name[sapply(world.cities$name, grepl, x, ignore.case=T)]),
as.character(world.cities$country.etc[sapply(world.cities$name, grepl, x, ignore.case=T)]),
sep="_"))
df$cities_country <- lapply(cities_country, function(x) if(identical(x, character(0))) NA_character_ else x)
df
Output is:
affiliation
1 [SCOTT, ALLEN J.] UNIV CALIF LOS ANGELES, DEPT GEOG, LOS ANGELES, CA 90095 USA
2 [VAN DUINEN, RIANNE; VAN DER VEEN, ANNE] UNIV TWENTE, DEPT WATER ENGN & MANAGEMENT, DRIENERLOLAAN 5,POB 217, NL-7500 AE ENSCHEDE, NETHERLANDS.
3 [ANANTSUKSOMSRI, SUTEE] CHULALONGKORN UNIV, FAC ARCHITECTURE, BANGKOK, THAILAND.
4 Prem
cities_country
1 Al_Norway, Alle_Switzerland, Allen_Philippines, Allen_USA, Angeles_Costa Rica, Angeles_Philippines, Cali_Colombia, Cot_Costa Rica, Li_Norway, Los Angeles_Chile, Los Angeles_USA, Os_Kyrgyzstan, Os_Norway, U_Micronesia, Usa_Japan
2 Ae_Marshall Islands, Ede_Netherlands, Ede_Nigeria, Enschede_Netherlands, Hede_China, Ine_Marshall Islands, Laa_Austria, Lola_Guinea, Man_Ivory Coast, Mana_French Guiana, Manage_Belgium, Nagem_Luxembourg, Ob_Russia, Ola_Panama, Po_Burkina Faso, U_Micronesia, Van_Turkey, Wa_Ghana, We_New Caledonia
3 Aila_Estonia, Al_Norway, Anan_Japan, Ba_Fiji, Bangkok_Thailand, Hit_Iraq, Ila_Nigeria, Ilan_Taiwan, Long_Thailand, Nan_Thailand, Tsu_Japan, U_Micronesia, Ula_Turkey
4 NA
(Note that in above output I have kept all occurrences of cities and for convenience also suffixed it with their respective countries)
From the few lines you have shown it looks like you might be able to do the following (note you missed aligning the casing):
tmpVec <- sapply(strsplit(db$affiliation, split = ","), function(x) {
cleanVec <- toupper(trimws(x))
cleanVec[max(which(cleanVec %in% toupper(maps::world.cities$name)))]
})
Or put a bit more code into the function to avoid the ugly warnings.
Let me leave a part of a solution. As far as I can tell from my own research, letters in the square brackets seem to indicate personal names. For example, I found that Sutee Anantsuksomsri is an actual name. This observation suggests that we probably want to remove texts in the brackets.
Once I removed the texts in the square brackets, I split the words using unnest_tokens() in the tidytext package. Note that the function converts all letters to small letters. If you do not like it, you can change that by specifying to_lower = FALSE. First, I split each city name into word. I also assigned an ID number for each city. Second, I cleaned up your data. As I said earlier, I removed texts in square brackets using gsub(). Then, I applied unnest_tokens() to the data. I subset words using the words from cities in filter(). The result we get up to this point is the following. Obviously, you have more work to do. I leave the sampling data, mydf below. I hope you can move on from here.
data(world.cities)
cities <- world.cities %>%
mutate(id = 1:n()) %>%
unnest_tokens(input = name, output = word, token = "words")
temp <- mydf %>%
mutate(affiliation = gsub(x = affiliation, pattern = "\\[.*\\]", replacement = "")) %>%
unnest_tokens(input = affiliation, output = word, token = "words") %>%
filter(word %in% cities$word)
id word
1 1 los
2 1 angeles
3 1 los
4 1 angeles
5 1 ca
6 1 usa
7 2 water
8 2 ae
9 2 enschede
10 3 bangkok
DATA
mydf <- structure(list(id = 1:3, affiliation = c("[SCOTT, ALLEN J.] UNIV CALIF LOS ANGELES, DEPT GEOG, LOS ANGELES, CA 90095 USA",
"[VAN DUINEN, RIANNE; VAN DER VEEN, ANNE] UNIV TWENTE, DEPT WATER ENGN & MANAGEMENT, DRIENERLOLAAN 5,POB 217, NL-7500 AE ENSCHEDE, NETHERLANDS.",
"[ANANTSUKSOMSRI, SUTEE] CHULALONGKORN UNIV, FAC ARCHITECTURE, BANGKOK, THAILAND."
)), .Names = c("id", "affiliation"), row.names = c(NA, -3L), class = "data.frame")

Resources