Street address map plotting - r

I have 30 thousand home addresses and want to geocode them (i.e., convert "123 ABC Street" to a latitude and longitude).
I researched whether a good tool is available, but found the options very confusing.
Can anyone suggest a resource?

Here's a function that will get you one address from the Google Maps geocoding API:
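geocodeAddress <- function(address) {
  base <- "https://maps.googleapis.com/maps/api/geocode/json?address="
  key <- "your_google_maps_api_key_here"
  # Encode only the address, so characters such as "&" or "#" in it cannot break the query string
  url <- paste0(base, URLencode(address, reserved = TRUE), "&key=", key)
  # simplify = FALSE keeps the response as a nested list
  RJSONIO::fromJSON(url, simplify = FALSE)
}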
And how to use it:
result <- geocodeAddress("1600 Amphitheatre Parkway Mountain View, CA 94043")
You can pull out just the lat and lng with, e.g.:
result_lat <- result$results[[1]]$geometry$location$lat
result_lng <- result$results[[1]]$geometry$location$lng
For your 30k addresses, you can loop over them individually. More info available at developers.google.com. Last I checked, there are limits on the number of requests per second and total number of free requests per day, but I suspect the cost for 30k isn't very high.
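Looping over the addresses could look roughly like the sketch below. It is not part of the original answer: the name geocode_all, the pause length, and the choice to store NA for failed lookups are assumptions for illustration. The geocoder is passed in as an argument, so any function that takes one address and returns a response shaped like the one above should work.

```r
# Geocode a vector of addresses one at a time, pausing between requests.
# `geocode_fn` takes one address and returns the parsed API response
# (a nested list with a `results` element, as in the example above).
geocode_all <- function(addresses, geocode_fn, pause = 0.1) {
  n <- length(addresses)
  out <- data.frame(address = addresses,
                    lat = rep(NA_real_, n),
                    lng = rep(NA_real_, n),
                    stringsAsFactors = FALSE)
  for (i in seq_along(addresses)) {
    # A failed request yields NULL instead of stopping the whole loop
    res <- tryCatch(geocode_fn(addresses[i]), error = function(e) NULL)
    # Leave NA in place when the request failed or returned no match
    if (!is.null(res) && length(res$results) > 0) {
      loc <- res$results[[1]]$geometry$location
      out$lat[i] <- loc$lat
      out$lng[i] <- loc$lng
    }
    Sys.sleep(pause)  # crude way to stay under a per-second request limit
  }
  out
}

# Usage (assumes geocodeAddress is defined and a valid key is set):
# coords <- geocode_all(my_addresses, geocodeAddress)
```

With 30k addresses it is probably worth saving the partial result every few thousand rows so an interrupted run does not have to start over.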
Alternatively, you can upload data in csv format to UCLA's geocoder: gis.ucla.edu/geocoder.
A third alternative is to use Texas A&M's geocoder: geoservices.tamu.edu.

I suggest the free tidygeocoder package (https://jessecambon.github.io/tidygeocoder/).
Depending on the size of your data frame, see my suggestion for parallelization in the question "Is it possible to parallelize the geocode function from the tidygeocoder package in R?"

Related

Turning Longitudes and Latitudes into UK Postcodes in R

I have a large set of data with longitudes and latitudes that I want to convert into UK postcodes. I first tried downloading all of the UK postcodes with their corresponding long/lat and then joining the data together. This worked for some of the data, but the majority didn't match because each postcode's latitude and longitude is the centre of that postcode, whereas my data is more precise.
I've also tried a bit of code that converts lat/long in America to the corresponding state (given by Josh O'Brien in "Latitude Longitude Coordinates to State Code in R"), but I couldn't find a way to adapt it to UK postcodes.
I've also tried running a calculation that finds the closest postcode to each long/lat, but this creates a file too large for R to handle.
I have also seen some code that uses Google Maps (geocoding), and this does seem to work, but I've read it only allows 2000 calculations a day, and I have much more than that (around 5 million rows of data).
You might want to try my PostcodesioR package which includes reverse geocoding functions. However, there is a limit to the number of API calls to postcodes.io.
devtools::install_github("ropensci/PostcodesioR")
library(PostcodesioR)
reverse_geocoding(0.127, 51.507)
Another option is to use this function for reverse geocoding more than one pair of geographical coordinates:
geolocations_list <- list(
  geolocations = data.frame(
    longitude = c(-3.15807731271522, -1.12935802905177),
    latitude = c(51.4799900627036, 50.7186356978817),
    limit = c(NA, 100L),
    radius = c(NA, 500L)
  )
)
bulk_rev_geo <- bulk_reverse_geocoding(geolocations_list)
bulk_rev_geo[[1]]$result[[1]]
Given the size of your data set and usual limitations to the API calls, you might want to download the latest database with the UK geographical data and join it to your files.
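If you go the offline route, an exact join on coordinates will not match, so each point has to be assigned to its nearest postcode centroid. Below is a rough base R sketch of that idea, not code from PostcodesioR: the function name and the column names (longitude, latitude, postcode) are assumptions. It works through the points in chunks so the full distance matrix is never held in memory, and it ranks candidates with a simple flat-earth approximation rather than true great-circle distance.

```r
# Assign each point to its nearest postcode centroid, chunk by chunk.
# `points` needs numeric `longitude` and `latitude` columns; `centroids`
# needs the same plus a `postcode` column (all column names are assumed).
nearest_postcode <- function(points, centroids, chunk_size = 100) {
  if (nrow(points) == 0) return(character(0))
  codes <- as.character(centroids$postcode)
  result <- character(nrow(points))
  for (s in seq(1, nrow(points), by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1, nrow(points))
    # Shrink longitude differences by cos(latitude) so that degrees of
    # longitude and latitude are roughly comparable
    scale <- cos(points$latitude[idx] * pi / 180)
    dlon <- outer(points$longitude[idx], centroids$longitude, "-") * scale
    dlat <- outer(points$latitude[idx], centroids$latitude, "-")
    d2 <- dlon^2 + dlat^2
    # Row-wise minimum of d2 is the row-wise maximum of -d2
    result[idx] <- codes[max.col(-d2, ties.method = "first")]
  }
  result
}
```

This is brute force: every point is compared with every centroid, so for millions of rows against the full UK postcode list it will be slow. Pre-filtering the centroids to the relevant area, or using a nearest-neighbour index from a spatial package, would likely scale much better.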
I believe you want to do "reverse geocoding" with the Google Maps API. That is, you pass in the latitude and longitude and get back the closest address. After that you can easily take just the post code from the address. (It is one of the items in the address list you receive from the Google Maps API.)
The API (last time I checked) allows 2500 free calls per day, but there are several tricks (depending on your dataset) for matching more records:
You can populate your dataset with 2400 records each day until it is complete or
You can change your IP and API key a few times to get more records in a single day or
You can always get a premium API key and pay for the number of requests you make
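Pulling the post code out of a reverse geocoding result could look something like the sketch below. The function name is made up for illustration, and it assumes the JSON response has already been parsed into a nested list without simplification, so that each entry of address_components is itself a list with long_name and types fields.

```r
# Extract the postal code from one parsed reverse-geocoding response.
# Returns NA when there is no result or no "postal_code" component.
extract_postcode <- function(response) {
  if (length(response$results) == 0) return(NA_character_)
  components <- response$results[[1]]$address_components
  for (comp in components) {
    # Each component carries a list of type labels; look for the postal code
    if ("postal_code" %in% unlist(comp$types)) return(comp$long_name)
  }
  NA_character_
}
```

Only the first result is inspected here; if it has no postal code component, a later result in the same response might, so checking the others may be worthwhile.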
I did such geocoding in R a few years ago by following this popular tutorial: https://www.r-bloggers.com/using-google-maps-api-and-r/
Unfortunately the tutorial code is a bit out-of-date, so you will need to fix a few things to adapt it to your needs.

How can I download daily summaries data from NOAA via the FTP link using R?

I would like to download daily summaries data in CSV format from all weather stations in a given US state between 01/01/1981 and 31/12/2016; however, this greatly exceeds the data limit that can be downloaded manually at once. I would like the data to be in metric units and include the station name and geographic location.
Is it possible to download this data via FTP link using R? If so, would anyone be able to explain how to do this, or point me in the right direction?
Any help would be greatly appreciated!
Assuming the FTP setup follows a standardized format (given it's NOAA and the data is longitudinal, I think this is a safe assumption), you can make a list of the URLs and then call download.file() using one of the many iterators like lapply or map. Here is some example code I've used to download Census LEHD data using map. Unfortunately, it is not a direct example using your data because I cannot get the link to work, so you'll have to modify it a bit. But the basic logic is: find which parts of the URL change, make those parts variables, provide the values you need, then call. It's relatively straightforward. In this case, the primary variables that change are the state abbreviation and the year. Because I only needed two years I can just type those in directly, but I use the tigris package to get the unique state abbreviations.
if (!require(pacman)) { install.packages("pacman"); library(pacman) }
p_load(tigris, purrr, dplyr)
# uses the tigris fips_codes data frame to get the unique state abbreviations
us_states <- tolower(unique(fips_codes$state)[1:51])
year <- c(2004, 2014)
get_lehd <- function(states, year) {
  # grabbing all private jobs WAC
  lehd_url <- paste0("https://lehd.ces.census.gov/data/lodes/LODES7/",
                     states, "/wac/", states, "_wac_S000_JT02_", year, ".csv.gz")
  filenames <- paste0(states, "_", year, ".csv.gz")
  download.file(lehd_url, destfile = filenames)
}
# use possibly so if it kicks an error it keeps going
possible_get_lehd <- possibly(get_lehd, otherwise = NA)
# download the files to current wd
map(us_states, possible_get_lehd, year = 2004)
map(us_states, possible_get_lehd, year = 2014)
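The same "make the changing parts variables" idea can also be written in base R by building every state and year combination up front. This is only a sketch using the LEHD URL pattern from above with two example states; download_one is a made-up helper name, and for NOAA you would substitute whatever URL pattern their server actually uses.

```r
# Build one row per state/year combination
states <- c("ak", "al")
years <- c(2004, 2014)
combos <- expand.grid(state = states, year = years, stringsAsFactors = FALSE)

# Fill the varying parts into the URL template and pick a local file name
combos$url <- paste0("https://lehd.ces.census.gov/data/lodes/LODES7/",
                     combos$state, "/wac/", combos$state,
                     "_wac_S000_JT02_", combos$year, ".csv.gz")
combos$destfile <- paste0(combos$state, "_", combos$year, ".csv.gz")

# Download a single file; return NA instead of stopping if it fails
download_one <- function(url, destfile) {
  tryCatch(download.file(url, destfile = destfile, mode = "wb"),
           error = function(e) NA)
}

# Uncomment to download everything into the current working directory:
# status <- mapply(download_one, combos$url, combos$destfile)
```

Keeping the URLs in a data frame makes it easy to inspect them before downloading and to re-run only the rows that failed.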

Using latitude and longitude to generate timezone

I have about 9 million records with latitude and longitude, in addition to the timestamp in EST. I want to use the latitude and longitude to generate the appropriate regional timezone, from which I then want to adjust all these times for the relevant timezone.
I have tried using geonames
data_raw$tz <- mapply(GNtimezone, data$lat,data$lon)
However, this returns the following error:
Error in getJson("timezoneJSON", list(lat = lat, lng = lng, radius = 0)) :
error code 13 from server: ERROR: canceling statement due to statement timeout
I have tried to use a method described in this post.
data$tz_url <- sprintf("https://maps.googleapis.com/maps/api/timezone/%s?location=%s,%s&timestamp=%d&sensor=%s",
"xml",
data$lat,
data$lon,
as.numeric(data$time),
"false")
for(i in 1:100){
data$tz[i] <- xmlParse(readLines(data$tz_url[i]), isURL=TRUE)[["string(//time_zone_name)"]]
}
With this method, I am able to get the urls for the XML data. But when I try to pull the XML data in a for loop, and append the timezone to the dataframe, it doesn't do it for all the records... (in fact, only 10 records at a time intermittently).
Does anyone know of any alternative methods or packages to get the three-character timezone (e.g. EST) for about 9 million records relatively quickly? Or better yet, if you have ideas on why the code I used above isn't working, I'd appreciate that too. Your help is much appreciated.
For a list of methods of converting latitude and longitude to time zone, see this post. These mechanisms will return the IANA/Olson time zone identifier, such as America/Los_Angeles.
However, you certainly don't want to make 9 million individual HTTP calls. You should attempt to group the records to distinct locations to minimize the number of lookups. If they are truly random, then you will still have a large number of locations, so you should consider the offline mechanisms described in the previous post (i.e. using the tz_world shapefile with some sort of geospatial lookup mechanism).
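Grouping the records into distinct locations could be sketched as follows. The function name, the rounding precision, and the lookup_fn argument are all assumptions for illustration; lookup_fn stands in for whichever lookup you end up using (online or offline) and is expected to take one latitude and one longitude and return a time zone identifier.

```r
# Look up each distinct (rounded) coordinate once, then map the results
# back onto every original row.
lookup_distinct <- function(lat, lon, lookup_fn, digits = 2) {
  rlat <- round(lat, digits)
  rlon <- round(lon, digits)
  key <- paste(rlat, rlon)
  first <- !duplicated(key)  # one representative row per distinct location
  values <- mapply(lookup_fn, rlat[first], rlon[first])
  names(values) <- key[first]
  unname(values[key])
}
```

Rounding to two decimal places merges points within roughly a kilometre of each other, which cuts the number of lookups a lot but can misassign points sitting right on a time zone boundary; more digits trade speed for accuracy.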
Once you have the IANA/Olson time zone identifier for the location, you can then use R's time zone functionality (as.POSIXct, format, etc.) with each corresponding timestamp to obtain the abbreviation.
However, do recognize that time zone abbreviations themselves can be somewhat ambiguous. They are useful for human readability, but not much else.
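As a sketch of that last step, assuming you already have a POSIXct vector of timestamps and a character vector of IANA identifiers of the same length (the function name is made up): a POSIXct vector carries only one time zone, so the formatting is done once per distinct zone.

```r
# Return the local time zone abbreviation for each timestamp, given the
# IANA identifier that applies to each row.
tz_abbreviation <- function(times, tz_ids) {
  stopifnot(inherits(times, "POSIXct"), length(times) == length(tz_ids))
  out <- character(length(times))
  for (tz in unique(tz_ids)) {
    idx <- tz_ids == tz
    # "%Z" prints the abbreviation in effect at that instant in zone `tz`
    out[idx] <- format(times[idx], format = "%Z", tz = tz)
  }
  out
}

# Timestamps recorded at a fixed UTC-5 offset (EST without daylight saving)
times <- as.POSIXct(c("2017-01-15 12:00:00", "2017-07-15 12:00:00"),
                    tz = "Etc/GMT+5")
tz_abbreviation(times, c("America/Los_Angeles", "America/Los_Angeles"))
```

Replacing "%Z" with a full date-time format gives the adjusted local time instead of the abbreviation. The exact abbreviations printed depend on the time zone database installed on your system.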
I've written the package googleway to access the Google Maps API. You'll need a valid API key (and for Google to handle 9 million calls you'll have to pay for it, as the free tier only covers 2500).
library(googleway)
key <- "your_api_key"
google_timezone(location = c(-37, 144),
key = key)
$dstOffset
[1] 0
$rawOffset
[1] 36000
$status
[1] "OK"
$timeZoneId
[1] "Australia/Hobart"
$timeZoneName
[1] "Australian Eastern Standard Time"

neighbourhood reverse geocoding-geonames api error code 15-R language

I am encountering a problem with the geonames reverse geocoding API package in R. I have a dataset of nearly 900k rows containing latitude and longitude, and I am using the GNneighbourhood(lat,lng)$name function to get the neighbourhood for every pair of coordinates (my dataset contains incidents in San Francisco).
Now, while the function works perfectly for the large majority of points, there are times when it gives an error code 15 message: we are afraid we could not find a neighbourhood for latitude and longitude. The same procedure can be performed with the revgeocode function (Google reverse geocoding API) of the ggmap package, and in fact it works even for the points that give an error with the geonames package. The reason I am not using it is the query limit per day.
Successful example
GNneighbourhood(37.7746,-122.4259)$name
[1] "Western Addition"
Failure
GNneighbourhood(37.76569,-122.4718)$name
Error in getJson("neighbourhoodJSON", list(lat = lat, lng = lng)) :
error code 15 from server: we are afraid we could not find a neighbourhood for latitude and longitude :37.76569,-122.4718
Searching for the above point in Google Maps works fine, and we can also see that the incident is not on water or in any other inappropriate location. (Unless the park nearby indicates something, I don't know.)
Does anyone have experience with this procedure and this specific package? Is it possible that the API is incomplete? It clearly states that it can handle all US cities. Some help would be appreciated.

R/GIS: bulk processing, distance and elevation calculation

I have a dataframe df with three variables: city, state and country. I want to calculate 3 things:
Calculate the latitude/longitude for each row.
Calculate the distance of each city from an arbitrary point, say country capital
Calculate the elevation of each point from the lat/long.
I can use the dismo package for 1, but can't figure out a way to "bulk process" from df, instead of copying and pasting the city, state and country names directly into the geocode(c()) code. As for 2 and 3, I am stumped completely. Any help would be appreciated.
EDIT NOTE: To other readers of this post... I appreciate the help from both Paul and Spacedman. The system won't allow me to mark more than one response as correct. I gave Paul the thumbs up because he responded before Spacedman. Please read what both of them have taken the time to write up. Thanks.
I'll share my thoughts on each of the points:
For 1. you can use paste to create an input vector for geocode:
df$geocode_string = with(df, paste(city, state, country, sep = ", "))
coords_latlong = geocode(df$geocode_string)
In regard to point number 2, after converting df to one of the classes provided by the sp package (SpatialPointsDataFrame, look at the coordinates function from sp), you can use spDistsN1 to find the distance from all the points to one other point.
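If you would rather not convert to sp classes, the distance to a single reference point can also be computed directly with the haversine formula. Note that this swaps in a different technique from spDistsN1; it is a base R sketch that assumes a spherical Earth with a mean radius of 6371 km, and the column names lat and lon in the example are hypothetical.

```r
# Great-circle distance in km from each (lat, lon) to one reference point
# (lat0, lon0), all in decimal degrees, using the haversine formula.
haversine_km <- function(lat, lon, lat0, lon0, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat - lat0) * to_rad
  dlon <- (lon - lon0) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat * to_rad) * cos(lat0 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Example: distance of each city from a capital (approximate coordinates)
cities <- data.frame(city = c("Lyon", "Marseille"),
                     lat = c(45.76, 43.30),
                     lon = c(4.84, 5.37))
cities$dist_to_capital_km <- haversine_km(cities$lat, cities$lon, 48.86, 2.35)
```

The function is vectorised over lat and lon, so it can be applied to the whole geocoded data frame in one call.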
The final point is a bit trickier: to find the height you need a DEM (digital elevation model). Maybe there is an easier way along the lines of geocode which I am not aware of.
You can use the geonames.org service to query a location on either SRTM or ASTER elevation databases:
http://www.geonames.org/export/web-services.html
and you might even be able to use my geonames package:
https://r-forge.r-project.org/projects/geonames/
to make life easier.
