Calculating walking distance using Google Maps in R

I've been trying to get the distance between a list of home postcodes and a list of school postcodes for approximately 2,000 students. I'm using the gmapsdistance package in R to query the Google Maps Distance Matrix API. I am using a valid API key, which I've replaced in the code below for security reasons.
library(gmapsdistance)
set.api.key("valid API key")
results <- gmapsdistance(origin      = school$HomePostcode,
                         destination = school$SchoolPostcode,
                         mode        = "walking",
                         shape       = "long")
However, this gives the following error:
Error in function (type, msg, asError = TRUE) :
Unknown SSL protocol error in connection to maps.googleapis.com:443
Looking at the Google APIs console, it appears the query hasn't run for all the data: it reports only 219 requests. I know I'm limited in how many requests I can make in one day, but the limit is 2,500 and it's not even letting me get close to that.
I've tried running the code on a single pair of postcodes, like below:
test <- gmapsdistance(origin      = "EC4V+5EX",
                      destination = "EC4V+3AL",
                      mode        = "walking",
                      shape       = "long")
This gives the following, as I would expect:
$Time
[1] 384
$Distance
[1] 497
$Status
[1] "OK"
My data looks something like this; I've anonymised it and removed all variables that aren't needed. There are 1,777 pairs of postcodes.
head(school)
HomePostcode SchoolPostcode
1 EC4V+5EX EC4V+3AL
2 EC2V+7AD EC4V+3AL
3 EC2A+1WD EC4V+3AL
4 EC1V+3QG EC4V+3AL
5 EC2N+2PT EC4V+3AL
6 EC1M+5QA EC4V+3AL

I do not have enough reputation to comment, but have you tried setting the parameter combinations to "pairwise"? If it is set to "all", it computes the distances for every combination of origins and destinations, not just the matched pairs.
library(gmapsdistance)
from <- c("EC4V+5EX", "EC2V+7AD", "EC2A+1WD", "EC1V+3QG", "EC2N+2PT", "EC1M+5QA")
to   <- c("EC4V+3AL", "EC4V+3AL", "EC4V+3AL", "EC4V+3AL", "EC4V+3AL", "EC4V+3AL")
test <- gmapsdistance(origin       = from,
                      destination  = to,
                      combinations = "pairwise",
                      key          = "YOURAPIKEYHERE",
                      mode         = "walking")
test$Distance
or de Distance
1 EC4V+5EX EC4V+3AL 497
2 EC2V+7AD EC4V+3AL 995
3 EC2A+1WD EC4V+3AL 2079
4 EC1V+3QG EC4V+3AL 2492
5 EC2N+2PT EC4V+3AL 1431
6 EC1M+5QA EC4V+3AL 1892
With this small set of 6 origin-destination pairs it works. I have an API key, so if you send me a bigger set I can try it.
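Applied to the full data, a minimal sketch (assuming the pairwise call succeeds and that results$Distance keeps one row per student, in the same order as the input vectors) would be:
library(gmapsdistance)
set.api.key("valid API key")

# one origin-destination pair per student, rather than the full matrix
results <- gmapsdistance(origin       = school$HomePostcode,
                         destination  = school$SchoolPostcode,
                         combinations = "pairwise",
                         mode         = "walking")

# attach walking distance (metres) and time (seconds) back onto the data
school$WalkDistance <- results$Distance$Distance
school$WalkTime     <- results$Time$Time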
Another option is the googleway package, which also lets you supply an API key. Example:
library(googleway)
test <- google_distance(origins      = from,
                        destinations = to,
                        mode         = "walking",
                        key          = "YOURAPIKEYHERE")

Related

Geocoding with R: Errors stopping program altogether

I have a working program which pulls addresses from a list in Excel and geocodes them using a Google API, but any time it hits an address with an apartment or unit number, or an address it can't find, the program stops.
I can't get a workable tryCatch routine going inside my loop. :(
Here is the Code:
library("readxl")
library(ggplot2)
library(ggmap)
fileToLoad <- file.choose(new = TRUE)
origAddress <- read_excel(fileToLoad, sheet = "Sheet1")
geocoded <- data.frame(stringsAsFactors = FALSE)
for(i in 1:nrow(origAddress))
{
# Print("Working...")
result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
origAddress$lon[i] <- as.numeric(result[1])
origAddress$lat[i] <- as.numeric(result[2])
origAddress$geoAddress[i] <- as.character(result[3])
}
write.csv(origAddress, "geocoded1.csv", row.names=FALSE)
And here is the Error message:
Warning: Geocoding "[removed address]" failed with error:
You must use an API key to authenticate each request to Google Maps Platform APIs. For additional information, please refer to http://g.co/dev/maps-no-account
Error: Can't subset columns that don't exist.
x Location 3 doesn't exist.
i There are only 2 columns.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: Unknown or uninitialised column: `lon`.
2: Unknown or uninitialised column: `lat`.
3: Unknown or uninitialised column: `geoAddress`.
Now, this is not an API key error because the key works in calls after the error -- and it stops at any address that ends in a number after the street name.
I'm going to be processing batches of thousands of addresses every month and they are not all going to be perfect, so what I need is to be able to skip these bad addresses, put NA in the lon/lat columns, and move on.
I'm new to R and can't put together a workable error-handling routine for these kinds of mistakes. Can anyone point me in the right direction? Thanks in advance.
When geocode fails to find an address and output = "latlona" is used, the address field is not returned. Your code can be made to work with the following modification.
#
# example data
#
origAddress <- data.frame(addresses = c("white house, Washington",
                                        "white house, # 100, Washington",
                                        "white hose, Washington",
                                        "Washington Apartments, Washington, DC 20001",
                                        "1278 7th st nw, washington, dc 20001"))
#
# simple fix for fatal error
#
for (i in 1:nrow(origAddress)) {
  result <- geocode(origAddress$addresses[i], output = "latlona",
                    source = "google")
  origAddress$lon[i]        <- result$lon[1]
  origAddress$lat[i]        <- result$lat[1]
  origAddress$geoAddress[i] <- ifelse(is.na(result$lon[1]), NA, result$address[1])
}
However, you mention that some of your addresses may not be exact. Google's geocoding will try to interpret every address you supply. Sometimes it fails and returns NA, but other times its interpretation may not be correct, so you should always check geocode results.
A simple method which will catch many errors is to set output = "more" in geocode and then examine the values returned in the loctype column. If loctype != "rooftop", you may have a problem. Examining the type column will give you more information. This check isn't complete. For a more thorough check, you could use output = "all" to return all the data supplied by Google for an address, but this requires parsing a moderately complex list. You can read more about the data returned by Google geocoding at https://developers.google.com/maps/documentation/geocoding/overview
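For illustration, a rough sketch of pulling the location type out of output = "all" (assuming ggmap hands back Google's standard geocoding JSON as a nested list):
raw <- geocode("1600 Pennsylvania Ave NW, Washington, DC",
               output = "all", source = "google")
# the full response mirrors Google's JSON: a list with $status and $results
raw$status
raw$results[[1]]$geometry$location_type   # e.g. "ROOFTOP"
raw$results[[1]]$formatted_address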
Also, geocode will take at least tens of minutes to return results for thousands of addresses. To minimize the response time, you should supply the addresses to geocode as a single character vector. A data frame of results is then returned, which you can use to update your origAddress data frame and check for errors, as shown below.
#
# Solution should check for wrongly interpreted addresses
#
# see https://developers.google.com/maps/documentation/geocoding/overview
# for more information on fields returned by google geocoding
#
# return all addresses in single call to geocode
#
origAddress <- data.frame(addresses = c(
  "white house, Washington",                              # identified by name
  "white hose, Washington",                               # misspelling
  "Washington Apartments, apt 100, Washington, DC 20001", # identified by name of apartment building
  "Washington Apartments, # 100, Washington, DC 20001",   # invalid apartment number specification
  "1206 7th st nw, washington, dc 20001"))                # address on street but no structure with that address

result <- suppressWarnings(geocode(location = origAddress$addresses,
                                   output   = "more",
                                   source   = "google"))
origAddress <- cbind(origAddress, result[, c("address", "lon", "lat", "type", "loctype")])
#
# Addresses which need to be checked
#
check_addresses <- origAddress[origAddress$loctype != "rooftop" |
                               is.na(origAddress$loctype), ]
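If you still want an explicit guard around each call, as asked in the question, a minimal tryCatch sketch in the spirit of the original loop could look like this (the NA fallback data frame is an assumption, not something geocode itself returns on error):
for (i in 1:nrow(origAddress)) {
  result <- tryCatch(
    geocode(origAddress$addresses[i], output = "latlona", source = "google"),
    error = function(e) data.frame(lon = NA, lat = NA, address = NA)
  )
  origAddress$lon[i]        <- result$lon[1]
  origAddress$lat[i]        <- result$lat[1]
  origAddress$geoAddress[i] <- ifelse(is.na(result$lon[1]), NA, result$address[1])
}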

Reverse Geo Coding in R

I would like to reverse geocode addresses and pin codes in R.
These are the columns:
A           B           C
15.3859085  74.0314209  7J7P92PJ+9H77QGCCCC
I have taken the first four rows, with columns A, B and C, out of thousands of rows.
df <- ga.data[1:4, ]
df <- cbind(df, do.call(rbind,
  lapply(1:nrow(df),
         function(i)
           revgeocode(as.numeric(df[i, 3:1]),
                      output = "more")[c("administrative_area_level_1",
                                         "locality", "postal_code", "address")])))
Error in revgeocode(as.numeric(df[i, 3:1]), output = "more") :
is.numeric(location) && length(location) == 2 is not TRUE
Any other package or approach to find the address and pincode would also be most welcome.
I also tried using ggmap, and got this error:
In revgeocode(as.numeric(df[i, c("Latitude", "Longitude")]), output = "address") :
HTTP 400 Bad Request
I also tried this:
revgeocode(c(df$B[1], df$A[1]))
Warning message: In revgeocode(c(df$Longitude[1],
  df$Latitude[1])) : HTTP 400 Bad Request
Also, I am from India and it does not work when I search for a lat/long in India. If I use a lat/long in the US, it gives me the exact address.
seems fishy
data <- read.csv(text="ID, Longitude, Latitude
311175, 41.298437, -72.929179
292058, 41.936943, -87.669838
12979, 37.580956, -77.471439")

library(ggmap)
result <- do.call(rbind,
                  lapply(1:nrow(data),
                         function(i) revgeocode(as.numeric(data[i, 3:2]))))
data <- cbind(data, result)
The current CRAN version of revgeo (0.15) does not have a revgeocode function. If you use this package instead, you'll find a revgeo function, which takes longitude and latitude arguments. Your column C should not be passed into the function.
revgeo::revgeo(latitude=df[, 'A'], longitude=df[, 'B'], output='frame')
[1] "Getting geocode data from Photon: http://photon.komoot.de/reverse?lon=74.0314209&lat=15.3859085"
housenumber street city state zip country
1 House Number Not Found Street Not Found Borim Goa Postcode Not Found India
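To attach those fields back onto the original rows, a short sketch (assuming revgeo returns its rows in the same order as the coordinates passed in):
geo <- revgeo::revgeo(latitude = df[, "A"], longitude = df[, "B"], output = "frame")
df  <- cbind(df, geo[, c("street", "city", "state", "zip", "country")])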

R: How to use RegEx to search multiple words using a disjunction

Let me explain what I want to do. I have corpus data (15M words) about a political debate and I want to find the co-occurrence of two terms within, say, 10k words.
I create two vectors of positions of two terms: "false" and "law".
false.v <- c(133844, 133880, 145106, 150995, 152516, 152557, 153697, 155507)
law.v <- c(48064, 155644, 251315, 297303, 323417, 349576, 368052, 543487)
Then I want to gather them into a matrix to see the co-occurrences, using the outer function. The positions are taken from the same corpus, so I'm creating a matrix of differences:
distances <- outer(false.v, law.v, "-")
To make this easier to read, let's name them:
rownames(distances) <- paste0("False", false.v)
colnames(distances) <- paste0("Law", law.v)
Okay, so we have the matrix ready. To find which pairs of positions were within 10000 words of each other I just run:
abs(distances) <= 10000
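For example, the row/column indices of the co-occurring pairs can be read straight off that logical matrix:
near <- abs(distances) <= 10000
which(near, arr.ind = TRUE)   # which "false" position pairs with which "law" position
sum(near)                     # total number of co-occurring pairs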
So I have to identify those moments in the political debate where there is a greater frequency of those co-occurrences. Here comes the problem: I have to do it with more than one pair of words (in fact with 5 pairs of words or so), so it would be great if I could search multiple words instead of just two words at a time. So instead of searching "false" and "law", search "false OR lie OR whatever" and "law OR money OR whatever". I guess I have to use a regex for this task, right? I've tried everything and nothing worked.
The example I just gave is a simplification. The command I use to search for a word creates a vector of its positions in the corpus:
positions.law.v <- which(C1.corpus.v == "law")
So it would be great if I could just use something like
which(C1.corpus.v == "law OR money OR prison OR ...")
which(C1.corpus.v == "false OR lie OR country OR ...")
It's like telling R: "hey, give me the co-occurrence positions of any possible combination between the first group of words (law, money, prison, ...) and the second one (false, lie, country, ...)". I hope I'm explaining it clearly. I'm sorry for the language mistakes. Thank you!
I have an extended answer here as well, but it could be as simple as:
library(dplyr)
mywords <- c("law", "money", "prison", "false", "lie", "country")
which(C1.corpus.v %in% mywords)
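Combining that with the outer() idea from the question, a small sketch for two groups of words (the group vectors here are just example placeholders) could be:
group1 <- c("law", "money", "prison")      # first set of terms
group2 <- c("false", "lie", "country")     # second set of terms

pos1 <- which(C1.corpus.v %in% group1)     # positions of any word in group 1
pos2 <- which(C1.corpus.v %in% group2)     # positions of any word in group 2

# TRUE where a group-1 occurrence and a group-2 occurrence are within 10,000 words
near <- abs(outer(pos1, pos2, "-")) <= 10000
sum(near)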
Try:
library(quanteda)
I'll use the election manifestos of 9 UK political parties from 2010:
data_char_ukimmig2010
Create a tokens object (there are lots of settings - check out https://quanteda.io/)
mytoks <- data_char_ukimmig2010 %>%
char_tolower() %>%
tokens()
mywords = c("law", "money", "prison", "false", "lie", "country")
kwic "return[s] a list of a keyword supplied by the user in its immediate context, identifying the source text and the word index number within the source text" source
mykwic <- kwic(mytoks, mywords)
A kwic object is a data frame with various features, one of which, from, is the integer starting position of each match (a starting position rather than a single index, because kwic can also be used to look for phrases):
mykwic$from
Gives us:
> mykwic$from
[1] 130 438 943 1259 1281 1305 1339 1356 1743 1836 1859 2126 2187 2443 2546 2640 2763 2952 3186 3270 179 8 201
[24] 343 354 391 498 16 131 552 14 29 388 80 306 487 507
I think your problem is slightly more sophisticated than using regex. For instance, you may be willing to include law, legal and legislation in one group but do not include lawless. Regex like \blaw.*\b wouldn't help you much. In effect, you are interested in:
Creating feature co-occurrence matrix
Incorporating the semantic proximity of the words
Feature co-occurrence matrix
This is a well-established task and I would encourage you to use a tested solution like the fcm function. To introduce an example from the documentation:
txt <- "A D A C E A D F E B A C E D"
fcm(txt, context = "window", window = 2)
fcm(txt, context = "window", count = "weighted", window = 3)
fcm(txt, context = "window", count = "weighted", window = 3,
    weights = c(3, 2, 1), ordered = TRUE, tri = FALSE)
Your regex
To suggest a solution to your particular problem: instead of this:
which(C1.corpus.v == "law OR money OR prison OR ...")
where
C1.corpus.v <- c("law", "word", "something","legal", "stuff")
you could do
grep(
  pattern = paste("legal", "law", "som.*", sep = "|"),
  x       = C1.corpus.v,
  perl    = TRUE,
  value   = FALSE
)
where sep = "|" serves as your ...OR.... IMHO, this is not what you want, as it does not address semantic similarity. I would suggest you have a look at some of the good tutorials available on the net [1], [2].
[1] Taylor Arnold and Lauren Tilton, Basic Text Processing in R.
[2] Islam, Aminul & Inkpen, Diana (2008). Semantic Text Similarity Using Corpus-Based Word Similarity and String Similarity. ACM Transactions on Knowledge Discovery from Data (TKDD), 2(2). doi:10.1145/1376815.1376819.

Extraction of post from Facebook using RFacebook package

I succeeded in getting the text of the posts along with their share and like counts.
However, I am not able to get the likes of the comments associated with each post. If this information is not available, I would like to merge the like count of the post onto each of its comments.
Example: a post gets 900 likes and 80 comments. I would like to associate the value 900 with each of those comments (in a new column called post_like, maybe).
I would like to use this information to perform a sentiment analysis using the number of likes (complex likes, i.e. haha, sad, ...) in a logistic regression, with the frequency of the most frequent words as the x variable.
Here is my script so far:
library(Rfacebook)
library(reshape2)   # for melt()

token <- "**your token, get it at https://developers.facebook.com/tools/explorer/**"

# Function to download the comments of a post
download.post <- function(i, refetch = FALSE, path = ".") {
  post  <- getPost(post = fb_page$id[i], comments = TRUE, likes = TRUE, token = token)
  post1 <- as.data.frame(melt(post))
}
#----------------------- Request posts --- ALL
# Get posts for ALL
fb_page <- getPage(page = "**the page number u want**", token = token,
                   since = '2010/01/01', until = '2016/01/01',
                   n = 10000, reactions = TRUE)
fb_page$order <- 1:nrow(fb_page)

# Apply function to download comments
files <- data.frame(melt(lapply(fb_page$order, download.post)))

# Select only comments
files_c <- files[complete.cases(files$message), ]
So basically I get the page with the post IDs and create a function to fetch each post by its ID from that page.
As you can see, I get all the information I need EXCEPT the likes and share counts.
I hope I am clear. Thanks a lot for your help.
It's all there:
library(Rfacebook)
token <- "#############" # https://developers.facebook.com/tools/explorer
fb_page <- getPage(page="europeanparliament", token=token, n = 3)
transform(
  fb_page[, c("message", "likes_count", "comments_count", "shares_count")],
  message = sapply(message, toString, width = 30)
)
# message likes_count comments_count shares_count
# 1 This week members called o.... 92 73 21
# 2 Today we're all Irish, bea.... 673 133 71
# 3 European citizens will mee.... 1280 479 71
packageVersion("Rfacebook")
# [1] ‘0.6.12’
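To get the post_like column described in the question, one hedged sketch is to take likes_count from each post returned by getPost and repeat it across that post's comments (this assumes getPost returns a list with $post and $comments data frames, as in Rfacebook 0.6.x):
library(Rfacebook)

comments_with_post_likes <- function(post_id, token) {
  p <- getPost(post = post_id, token = token, comments = TRUE, likes = FALSE)
  if (is.null(p$comments) || nrow(p$comments) == 0) return(NULL)
  # attach the parent post's like count to every comment row
  cbind(p$comments, post_like = p$post$likes_count)
}

comments_all <- do.call(rbind,
                        lapply(fb_page$id, comments_with_post_likes, token = token))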

geocode result different from google maps

I'm trying to geocode various IATA airport codes in Italy with the following (rudimentary) code in ggmap (version 2.4):
#list of all IATA codes
geo_apt <- c("AOI", "BGY", "BLQ", "BRI", "CTA", "FCO", "LIN", "MXP", "NAP",
"PMF", "PSA", "PSR", "RMI", "TRN", "VCE", "VRN")
#preparing an empty dataframe to store the geocodes
apt_geo <- data.frame(IATA=rep(NA,16), lon=rep(NA,16), lat=rep(NA,16))
#geocoding the codes
for (i in seq_along(geo_apt)) {
  apt_geo[i, 1] <- geo_apt[i]
  apt_geo[i, 2] <- (geocode(paste(geo_apt[i], "airport")))[1]
  apt_geo[i, 3] <- (geocode(paste(geo_apt[i], "airport")))[2]
}
and the geocode function of ggmap works perfectly fine with all of these codes except "PSR"
IATA lon lat
1 AOI 13.363752 43.61654
2 BGY 9.703631 45.66957
3 BLQ 11.287859 44.53452
4 BRI 16.765202 41.13751
5 CTA 15.065775 37.46730
6 FCO 12.246238 41.79989
7 LIN 9.276308 45.45218
8 MXP 8.725531 45.63006
9 NAP 14.286579 40.88299
10 PMF 10.295935 44.82326
11 PSA 10.397884 43.68908
12 PSR -81.117259 33.94855 #<- doesn't work
13 RMI 12.618819 44.02289
14 TRN 7.647867 45.19654
15 VCE 12.339771 45.50506
16 VRN 10.890141 45.40000
I've tried to use revgeocode and those coordinates correspond to the following address:
revgeocode(as.numeric(apt_geo[12,2:3]))
#Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=33.948545,-81.1172588&sensor=false
[1] "Kentucky Avenue, West Columbia, SC 29170, USA"
On the contrary, if I go to Google maps, it works perfectly fine:
Does anybody have a clue on this apparently strange phenomenon?
EDIT
Following a suggestion in the comments below, I tried geocode("italy PSR airport") on version 2.4 again, and instead of a more accurate result (or even the same result), this is the warning I got:
geocode("italy PSR airport")
lon lat
1 NA NA
Warning message:
geocode failed with status ZERO_RESULTS, location = "italy PSR airport"
while with the attempt "airport PSR" the coordinates are different again from those of the PSR airport (at least this time it's an actual airport, although its IATA code is LEX, not PSR):
revgeocode(as.numeric(geocode("airport PSR")))
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.0381454,-84.5970727&sensor=false
[1] "3895 Terminal Drive, Lexington, KY 40510, USA"
This whole question may be a duplicate.
Nonetheless, I don't understand why the API and Google Maps would be using different datasets...
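A workaround worth trying is to geocode the full airport name rather than the bare IATA code; a minimal sketch, assuming PSR here is Pescara's Abruzzo Airport:
geocode("Abruzzo Airport, Pescara, Italy")
# if this resolves correctly, substitute the full airport names for the
# ambiguous IATA codes in the loop above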
