Turning Longitudes and Latitudes into UK Postcodes in R

I have a large set of data with longitudes and latitudes that I want to convert into UK postcodes. I first tried downloading all of the UK postcodes with their corresponding long/lat and then joining the data together. This worked for some of the data, but the majority didn't match because the postcode latitude and longitude mark the centre of each postcode, whereas my data is more precise.
I've also tried a bit of code that converts lat/long in America to the corresponding state (given by Josh O'Brien here: Latitude Longitude Coordinates to State Code in R), but I couldn't find a way to adapt it to UK postcodes.
I've also tried running a calculation that finds the closest postcode to each long/lat, but this creates a file too large for R to handle.
I've also seen some code that uses Google Maps geocoding, and it does seem to work, but I've read that it only allows around 2,000 calls a day, and I have far more than that (around 5 million rows of data).

You might want to try my PostcodesioR package which includes reverse geocoding functions. However, there is a limit to the number of API calls to postcodes.io.
devtools::install_github("ropensci/PostcodesioR")
library(PostcodesioR)
reverse_geocoding(0.127, 51.507)
Another option is to use this function for reverse geocoding more than one pair of geographical coordinates:
geolocations_list <- structure(
  list(
    geolocations = structure(
      list(
        longitude = c(-3.15807731271522, -1.12935802905177),
        latitude = c(51.4799900627036, 50.7186356978817),
        limit = c(NA, 100L),
        radius = c(NA, 500L)),
      .Names = c("longitude", "latitude", "limit", "radius"),
      class = "data.frame",
      row.names = 1:2)),
  .Names = "geolocations")
bulk_rev_geo <- bulk_reverse_geocoding(geolocations_list)
bulk_rev_geo[[1]]$result[[1]]
Given the size of your data set and usual limitations to the API calls, you might want to download the latest database with the UK geographical data and join it to your files.
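If you go down that route, one possible sketch (assuming you have downloaded the ONS Postcode Directory as "ONSPD.csv" with columns pcds, lat and long, and that your table my_data has longitude/latitude columns; all of these names are placeholders) is a nearest-centroid join with sf:
library(sf)
library(data.table)

onspd  <- fread("ONSPD.csv", select = c("pcds", "lat", "long"))
pc_sf  <- st_as_sf(onspd,   coords = c("long", "lat"),           crs = 4326)
pts_sf <- st_as_sf(my_data, coords = c("longitude", "latitude"), crs = 4326)

nearest <- st_nearest_feature(pts_sf, pc_sf)   # index of the nearest postcode centroid
my_data$postcode <- onspd$pcds[nearest]
This keeps everything offline, so the 5 million rows are not a problem, although each point still gets the postcode whose centroid is closest rather than the postcode whose boundary contains it.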

I believe you want to do "reverse geocoding" with the Google Maps API, that is, pass the latitude and longitude and get back the closest address. After that you can easily take just the postcode from the address (it is an item in the list you receive as an address from the Google Maps API).
The API (last time I checked) allows 2,500 free calls per day, but you can use several tricks (depending on your dataset) to match more records:
You can populate your dataset with 2400 records each day until it is complete or
You can change your IP and API key a few times to get more records in a single day or
You can always get a premium API key and pay for the number of requests you make
I did such geocoding in R a few years ago by following this popular tutorial: https://www.r-bloggers.com/using-google-maps-api-and-r/
Unfortunately the tutorial code is a bit out of date, so you will need to fix a few things to adapt it to your needs.
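For reference, a rough sketch of a single reverse-geocoding call that pulls out the postal code (the API key is a placeholder, and the response structure should be double-checked against the current Geocoding API documentation):
reverse_geocode_postcode <- function(lat, lng, key) {
  url <- sprintf(
    "https://maps.googleapis.com/maps/api/geocode/json?latlng=%f,%f&key=%s",
    lat, lng, key)
  res <- RJSONIO::fromJSON(url, simplify = FALSE)
  if (length(res$results) == 0) return(NA_character_)
  # Keep the address component whose type is "postal_code"
  for (comp in res$results[[1]]$address_components) {
    if ("postal_code" %in% unlist(comp$types)) return(comp$long_name)
  }
  NA_character_
}

reverse_geocode_postcode(51.5014, -0.1419, key = "your_api_key")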

Related

Street address map plotting

I have 30 thousand home addresses and want to geocode them (i.e., convert "123 ABC Street" to a latitude and longitude).
I've researched whether there is a good tool available, but it's all very confusing.
Can anyone suggest a resource?
Here's a function that will get you one address from the Google Maps geocoding API:
geocodeAddress <- function(address) {
  base <- "https://maps.googleapis.com/maps/api/geocode/json?address="
  key  <- "your_google_maps_api_key_here"
  url  <- URLencode(paste0(base, address, "&key=", key))
  RJSONIO::fromJSON(url, simplify = FALSE)
}
And how to use it:
result <- geocodeAddress("1600 Amphitheatre Parkway Mountain View, CA 94043")
You can pull out just the lat and lng with, e.g.:
result_lat <- result$results[[1]]$geometry$location$lat
result_lng <- result$results[[1]]$geometry$location$lng
For your 30k addresses, you can loop over them individually, as sketched below. More information is available at developers.google.com. Last I checked, there are limits on the number of requests per second and on the total number of free requests per day, but I suspect the cost for 30k isn't very high.
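For example, a simple loop over the addresses with the function above (the column names are placeholders, and the Sys.sleep pause is just a crude way to respect rate limits):
addresses <- my_data$address
results <- vector("list", length(addresses))
for (i in seq_along(addresses)) {
  results[[i]] <- geocodeAddress(addresses[i])
  Sys.sleep(0.1)   # crude rate limiting
}
get_coord <- function(r, which) {
  if (length(r$results) > 0) r$results[[1]]$geometry$location[[which]] else NA
}
my_data$lat <- sapply(results, get_coord, which = "lat")
my_data$lng <- sapply(results, get_coord, which = "lng")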
Alternatively, you can upload data in csv format to UCLA's geocoder: gis.ucla.edu/geocoder.
A third alternative is to use Texas A&M's geocoder: geoservices.tamu.edu.
I suggest the free tidygeocoder package (https://jessecambon.github.io/tidygeocoder/).
Depending on the size of your data frame, see my suggestion for parallelization: Is it possible to parallelize the geocode function from the tidygeocoder package in R?
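For reference, a minimal sketch of how tidygeocoder is typically used, with a made-up two-row data frame (the free Nominatim "osm" backend shown here is rate limited to roughly one request per second):
library(tidygeocoder)
library(dplyr)

addr_df <- tibble::tibble(
  address = c("1600 Amphitheatre Parkway, Mountain View, CA",
              "10 Downing Street, London"))

addr_df %>%
  geocode(address = address, method = "osm")   # adds lat/long columns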

Using latitude and longitude to generate timezone

I have about 9 million records with latitude and longitude, in addition to the timestamp in EST. I want to use the latitude and longitude to generate the appropriate regional timezone, from which I then want to adjust all these times for the relevant timezone.
I have tried using geonames
data_raw$tz <- mapply(GNtimezone, data$lat,data$lon)
However, this returns the following error:
Error in getJson("timezoneJSON", list(lat = lat, lng = lng, radius = 0)) :
error code 13 from server: ERROR: canceling statement due to statement timeout
I have tried to use a method described in this post.
data$tz_url <- sprintf("https://maps.googleapis.com/maps/api/timezone/%s?location=%s,%s&timestamp=%d&sensor=%s",
                       "xml",
                       data$lat,
                       data$lon,
                       as.numeric(data$time),
                       "false")
for (i in 1:100) {
  data$tz[i] <- xmlParse(readLines(data$tz_url[i]), isURL = TRUE)[["string(//time_zone_name)"]]
}
With this method I am able to get the URLs for the XML data, but when I try to pull the XML data in a for loop and append the timezone to the data frame, it doesn't do it for all the records (in fact, only about 10 records at a time, intermittently).
Does anyone know of any alternative methods or packages to get the three-character timezone (e.g. EST) for about 9 million records relatively quickly? Your help is much appreciated. Or better yet, if you have ideas on why the code above isn't working, I'd appreciate that too.
For a list of methods of converting latitude and longitude to time zone, see this post. These mechanisms will return the IANA/Olson time zone identifier, such as America/Los_Angeles.
However, you certainly don't want to make 9 million individual HTTP calls. You should attempt to group the records to distinct locations to minimize the number of lookups. If they are truly random, then you will still have a large number of locations, so you should consider the offline mechanisms described in the previous post (i.e. using the tz_world shapefile with some sort of geospatial lookup mechanism).
Once you have the IANA/Olson time zone identifier for the location, you can then use R's time zone functionality (as.POSIXct, format, etc.) with each corresponding timestamp to obtain the abbreviation.
However, do recognize that time zone abbreviations themselves can be somewhat ambiguous. They are useful for human readability, but not much else.
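For example (a small illustration using only base R; the timestamps and zone are made up, and note that the abbreviation depends on the date because of daylight saving time):
format(as.POSIXct("2015-07-01 12:00:00", tz = "UTC"),
       "%Z", tz = "America/New_York")   # "EDT" (summer)
format(as.POSIXct("2015-01-15 12:00:00", tz = "UTC"),
       "%Z", tz = "America/New_York")   # "EST" (winter)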
I've written the googleway package to access the Google Maps API. You'll need a valid API key (and, for Google to handle 9 million calls, you'll have to pay for it, as the free tier only covers 2,500 per day).
library(googleway)
key <- "your_api_key"
google_timezone(location = c(-37, 144),
                key = key)
$dstOffset
[1] 0
$rawOffset
[1] 36000
$status
[1] "OK"
$timeZoneId
[1] "Australia/Hobart"
$timeZoneName
[1] "Australian Eastern Standard Time"

Neighbourhood reverse geocoding: GeoNames API error code 15 in R

I am encountering a problem with the GeoNames reverse geocoding API package in R. I have a dataset of nearly 900k rows containing latitude and longitude, and I am using the GNneighbourhood(lat, lng)$name function to find the neighbourhood for every pair of coordinates (my dataset contains incidents in San Francisco).
Now, while the function works perfectly for the large majority of points, it sometimes gives the error code 15 message: "we are afraid we could not find a neighbourhood for latitude and longitude". The same procedure can be performed with the revgeocode function (Google reverse geocoding API) from the ggmap package, and it in fact works even for the points that fail with the geonames package. The reason I am not using it is its daily query limit.
Successful example
GNneighbourhood(37.7746,-122.4259)$name
[1] "Western Addition"
Failure
GNneighbourhood(37.76569,-122.4718)$name
Error in getJson("neighbourhoodJSON", list(lat = lat, lng = lng)) :
error code 15 from server: we are afraid we could not find a neighbourhood for latitude and longitude :37.76569,-122.4718
Searching for the above point in Google Maps works fine, and we can also see that the incident is not on water or in any other inappropriate location (unless the park nearby indicates something, I don't know).
Does anyone have experience with this procedure and this specific package? Is it possible that the API is incomplete? It clearly states that it can handle all US cities. Some help would be appreciated.
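One workaround (not a fix for the gaps in the GeoNames neighbourhood data, just a sketch) is to wrap the call so that error code 15 returns NA instead of stopping a bulk run:
library(geonames)
options(geonamesUsername = "your_geonames_username")   # if not already set

safe_neighbourhood <- function(lat, lng) {
  tryCatch(GNneighbourhood(lat, lng)$name, error = function(e) NA_character_)
}

safe_neighbourhood(37.7746, -122.4259)    # "Western Addition"
safe_neighbourhood(37.76569, -122.4718)   # NA instead of error code 15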

Dissolved polygons using R not plotting correctly

I'm trying to perform a dissolve in R. I've previously done this in QGIS but I want to achieve this in R to integrate with the rest of my workflow if possible.
I have an ESRI shapefile with small geographical polygons (output areas, if you're familiar with UK census geography). I also have a lookup table provided to me with a list of all OA codes with their associated aggregated geography code.
I can't provide the actual files I'm working on, but comparable files and a minimal reproducible example are below:
https://www.dropbox.com/s/4puoof8u5btigxq/oa-soa.csv?dl=1 (130kb csv)
https://www.dropbox.com/s/xqbi7ub2122q14r/soa.zip?dl=1 (~4MB shp)
And code:
require("rgdal") # for readOGR
require("rgeos") # for gUnion
require("maptools")
unzip("soa.zip")
soa <- readOGR(dsn = "soa", "england_oac_2011")
proj4string(soa) <- CRS("+init=epsg:27700") # British National Grid
lookup <- read.csv("oa-soa.csv")
slsoa <- gUnaryUnion(soa, id = lookup$LSOA11CD)
I've also tried:
slsoa <- unionSpatialPolygons(soa, lookup$LSOA11CD)
but my understanding is that since I have (R)GEOS installed this uses the gUnion methods from the rgeos package anyway.
So, my problem is that the dissolve appears to work; I don't get an error message and the length() function suggests I now have fewer polygons:
length(soa@polygons) # 1,817
length(slsoa@polygons) # should be 338
but the plots appear to be the same (i.e. the internal dissolves haven't worked), as demonstrated by the following two plots:
plot(soa)
plot(slsoa)
I've looked around on the internet and stackoverflow to see if I can solve my issue and found several articles but without success.
problems when unioning and dissolving polygons in R (I don't think the quality of the shapefile is the problem because I'm using a lookup table to match geographies).
https://www.nceas.ucsb.edu/scicomp/usecases/PolygonDissolveOperationsR (uses two sp objects, not lookup table).
https://gis.stackexchange.com/questions/93441/problem-with-merging-and-dissolving-a-shapefile-in-r (as far as I can tell I've followed the relevant steps)
Does anyone have any idea what I'm doing wrong and why the plots aren't working correctly?
Thanks muchly.
First, your soa shapefile has 1817 elements, each with a unique code (corresponding to lookup$OA11CD). But your lookup file has only 1667 rows. Obviously, lookup does not have "a list of all OA codes".
Second, unless lookup has the same codes as your shapefile in the same order, using gUnaryUnion(...) this way will yield garbage. You need to merge soa@data with lookup on the corresponding fields first.
Third, gUnaryUnion(...) cannot remove internal boundaries if the polygons are not contiguous (obviously).
This seems to work
soa <- merge(soa, lookup, by.x = "code", by.y = "OA11CD", all.x = TRUE)
slsoa <- gUnaryUnion(soa, id = soa$LSOA11CD)
length(slsoa)
# [1] 338
par(mfrow=c(1,2),mar=c(0,0,0,0))
plot(soa);plot(slsoa)
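For what it's worth, a roughly equivalent dissolve can be sketched with the sf package instead of rgdal/rgeos (this is not what the answer above used; the column and file names are assumed to match the example data):
library(sf)
library(dplyr)

soa_sf <- st_read("soa", "england_oac_2011")
lookup <- read.csv("oa-soa.csv")

slsoa_sf <- soa_sf %>%
  left_join(lookup, by = c("code" = "OA11CD")) %>%
  group_by(LSOA11CD) %>%
  summarise()           # summarise() unions the geometries within each group

nrow(slsoa_sf)          # should again be 338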

R/GIS: bulk processing, distance and elevation calculation

I have a dataframe df with three variables: city, state and country. I want to calculate three things:
Calculate the latitude/longitude for each row.
Calculate the distance of each city from an arbitrary point, say the country's capital.
Calculate the elevation of each point from the lat/long.
I can use the dismo package for 1, but can't figure out a way to "bulk process" from df, instead of copying and pasting the city, state and country names directly into the geocode(c()) code. As for 2 and 3, I am stumped completely. Any help would be appreciated.
EDIT NOTE: To other readers of this post... I appreciate the help from both Paul and Spacedman. The system won't allow me to mark more than one response as correct. I gave Paul the thumbs up because he responded before Spacedman. Please read what both of them have taken the time to write up. Thanks.
I'll share my thoughts on each of the points:
For 1. you can use paste to create an input vector for geocode:
df$geocode_string = with(df, paste(city, state, country, sep = ", "))
coords_latlong = geocode(df$geocode_string)
In regard to point 2: after converting df to one of the classes provided by the sp package (SpatialPointsDataFrame; look at the coordinates function from sp), you can use spDistsN1 to find the distance of all the points to one other point, as sketched after this answer.
The final point is a bit trickier: to find the elevation you need a DEM (digital elevation model). Maybe there is an easier way along the lines of geocode that I am not aware of.
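A small sketch of point 2, assuming the geocoding result above has lon/lat columns (adjust the names to whatever your geocoder returns) and an arbitrary reference point; spDistsN1 with longlat = TRUE returns great-circle distances in kilometres:
library(sp)

pts <- as.matrix(coords_latlong[, c("lon", "lat")])
capital <- c(-77.0369, 38.9072)   # lon/lat of the reference point (here Washington, DC)
df$dist_km <- spDistsN1(pts, capital, longlat = TRUE)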
You can use the geonames.org service to query a location against either the SRTM or ASTER elevation databases:
http://www.geonames.org/export/web-services.html
and you might even be able to use my geonames package:
https://r-forge.r-project.org/projects/geonames/
to make life easier.
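For point 3, something along these lines might work with the geonames package (a sketch only: it needs a free geonames username, makes one web call per point, and I'm assuming the srtm3 field of GNsrtm3's result holds the elevation in metres):
library(geonames)
options(geonamesUsername = "your_geonames_username")

df$elevation_m <- mapply(function(lat, lon) GNsrtm3(lat, lon)$srtm3,
                         coords_latlong$lat, coords_latlong$lon)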
