I have a large data set of longitudes and latitudes that I want to convert into UK postcodes. I first tried downloading all UK postcodes with their corresponding long/lat and then joining the two data sets. This worked for some of the data, but the majority didn't match because each postcode's latitude and longitude is the centre of that postcode, whereas my data is more precise.
I've also tried a bit of code that converts Lat/long in America to give the corresponding state (given by Josh O'Brien here Latitude Longitude Coordinates to State Code in R), but I couldn't find a way to alter this to UK postcodes.
I've also tried running a calculation that finds the closest postcode to each long/lat, but this creates a file too large for R to handle.
I have also seen some code that uses Google Maps (geocoding). This does seem to work, but I've read that it only allows 2000 calculations a day, and I have far more than that (around 5 million rows of data).
You might want to try my PostcodesioR package which includes reverse geocoding functions. However, there is a limit to the number of API calls to postcodes.io.
devtools::install_github("ropensci/PostcodesioR")
library(PostcodesioR)
reverse_geocoding(0.127, 51.507)
Another option is to use this function for reverse geocoding more than one pair of geographical coordinates:
geolocations_list <- structure(
  list(
    geolocations = structure(
      list(
        longitude = c(-3.15807731271522, -1.12935802905177),
        latitude = c(51.4799900627036, 50.7186356978817),
        limit = c(NA, 100L),
        radius = c(NA, 500L)),
      .Names = c("longitude", "latitude", "limit", "radius"),
      class = "data.frame",
      row.names = 1:2)),
  .Names = "geolocations")
bulk_rev_geo <- bulk_reverse_geocoding(geolocations_list)
bulk_rev_geo[[1]]$result[[1]]
Given the size of your data set and usual limitations to the API calls, you might want to download the latest database with the UK geographical data and join it to your files.
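If you go the download-and-join route, an exact join on coordinates will miss most rows, so you need a nearest-centroid match instead. Below is a minimal base R sketch of that idea, processing the points in chunks so that the full distance matrix never has to exist at once. The centroid table, its column names and the helper nearest_postcode() are all made up for illustration, and the distance is a simple equirectangular approximation. For millions of rows a spatial index would be much faster; this only shows the chunking idea.

```r
# Toy centroid table standing in for a downloaded postcode file
# (postcodes and coordinates are invented for illustration).
centroids <- data.frame(
  postcode  = c("AA1 1AA", "BB2 2BB", "CC3 3CC"),
  longitude = c(-0.10, -1.50, -3.20),
  latitude  = c(51.50, 53.80, 55.95),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each point, return the postcode of the closest
# centroid. Points are handled chunk_size at a time, so only a
# chunk_size x nrow(centroids) distance matrix is held in memory.
nearest_postcode <- function(lon, lat, centroids, chunk_size = 1000L) {
  if (length(lon) == 0L) return(character(0))
  out <- character(length(lon))
  for (s in seq(1L, length(lon), by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1L, length(lon))
    # Equirectangular approximation: shrink longitude differences by
    # cos(latitude) so that degrees east-west and north-south are comparable.
    dx <- outer(lon[idx], centroids$longitude, "-") * cos(lat[idx] * pi / 180)
    dy <- outer(lat[idx], centroids$latitude, "-")
    d2 <- dx^2 + dy^2
    # Column of the smallest squared distance in each row
    out[idx] <- centroids$postcode[max.col(-d2, ties.method = "first")]
  }
  out
}

matched <- nearest_postcode(c(-0.12, -3.19), c(51.51, 55.95), centroids,
                            chunk_size = 1L)
print(matched)
```

Lowering chunk_size trades speed for memory, which is the knob to turn if R runs out of RAM.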
I believe you want to do "reverse geocoding" with the Google Maps API, that is, pass in the latitude and longitude and get back the closest address. After that you can easily take just the postcode from the address. (It is one of the items in the address list that the Google Maps API returns.)
The API (last time I checked) allows 2500 free calls per day, but there are several tricks (depending on your dataset) to match more records:
You can populate your dataset with 2400 records each day until it is complete or
You can change your IP and API key a few times to get more records in a single day or
You can always get a premium API key and pay for the number of requests you make
I did such geocoding in R a few years ago by following this popular tutorial: https://www.r-bloggers.com/using-google-maps-api-and-r/
Unfortunately the tutorial code is a bit out-of-date, so you will need to fix a few things to adapt it to your needs.
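To illustrate the "take just the postcode from the address" step, here is a small sketch that works on an already parsed result. The list below is a hand-written mock shaped like one reverse-geocoding result (an address_components list whose entries carry a types field); the values are invented, extract_postcode() is a hypothetical helper, and you should check the structure against what your own API call actually returns.

```r
# Mock of one parsed reverse-geocoding result (values invented for illustration).
result <- list(
  formatted_address = "1 Example Street, London AB1 2CD, UK",
  address_components = list(
    list(long_name = "Example Street", short_name = "Example St",
         types = list("route")),
    list(long_name = "London", short_name = "London",
         types = list("postal_town")),
    list(long_name = "AB1 2CD", short_name = "AB1 2CD",
         types = list("postal_code"))
  )
)

# Hypothetical helper: return the first component tagged "postal_code",
# or NA if the result has none.
extract_postcode <- function(result) {
  for (comp in result$address_components) {
    if ("postal_code" %in% unlist(comp$types)) return(comp$long_name)
  }
  NA_character_
}

postcode <- extract_postcode(result)
print(postcode)
```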
I would like to download daily summaries data in CSV format from all weather stations in a given US state between 01/01/1981 and 31/12/2016; however, this greatly exceeds the data limit that can be downloaded manually at once. I would like the data to be in metric units and include the station name and geographic location.
Is it possible to download this data via FTP link using R? If so, would anyone be able to explain how to do this, or point me in the right direction?
Any help would be greatly appreciated!
Assuming the FTP setup follows a standardized format (given that it's NOAA and the data is longitudinal, I think this is a safe assumption), you can make a list of the URLs and then call download.file() on each one using one of the many iterators, such as lapply or map. Here is some example code I've used to fetch Census LEHD data using map. Unfortunately, it is not a direct example using your data because I could not get the link to work, so you'll have to modify it a bit. The basic logic is: find which parts of the URL change, turn those parts into variables, provide the values you need, then make the calls. It's relatively straightforward. In this case, the parts that change are the state abbreviation and the year. Because I only needed two years I just typed those in directly, but I use the tigris package to get the unique state abbreviations.
if (!require(pacman)) { install.packages("pacman"); library(pacman) }
p_load(tigris, purrr, dplyr)
# use the tigris "fips_codes" data frame to get the unique state abbreviations
us_states <- tolower(unique(fips_codes$state)[1:51])
year <- c(2004, 2014)
get_lehd <- function(states, year) {
  # grabbing all private jobs WAC
  lehd_url <- paste0("https://lehd.ces.census.gov/data/lodes/LODES7/",
                     states, "/wac/", states, "_wac_S000_JT02_", year, ".csv.gz")
  filenames <- paste0(states, "_", year, ".csv.gz")
  # mode = "wb" keeps the gzipped file intact on Windows
  download.file(lehd_url, destfile = filenames, mode = "wb")
}
# use possibly() so that if a download throws an error the loop keeps going
possible_get_lehd <- possibly(get_lehd, otherwise = NA)
# download the files to the current working directory
map(us_states, possible_get_lehd, year = 2004)
map(us_states, possible_get_lehd, year = 2014)
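The same "find what changes in the URL" logic can be written in base R for a station-by-year layout. This is only a sketch: the base URL, the path pattern and the station IDs below are placeholders I made up, not the real NOAA layout, so swap in the actual pattern once you have a working link.

```r
# Hypothetical base URL and station IDs, used only to show the pattern.
base_url <- "https://example.org/daily-summaries"
stations <- c("STATION_A", "STATION_B")
years <- 1981:2016

# One row per station/year combination
grid <- expand.grid(station = stations, year = years, stringsAsFactors = FALSE)
grid$url <- sprintf("%s/%d/%s.csv", base_url, grid$year, grid$station)
grid$destfile <- sprintf("%s_%d.csv", grid$station, grid$year)

# Download one file; return TRUE on success and FALSE if anything fails,
# so that a single bad URL does not stop the whole run.
safe_download <- function(url, destfile) {
  tryCatch({
    download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
    TRUE
  }, error = function(e) FALSE)
}

# Uncomment to run the downloads:
# grid$ok <- mapply(safe_download, grid$url, grid$destfile)
print(head(grid$url, 3))
```

Keeping the URLs in a data frame makes it easy to rerun only the rows where the download failed.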
I am encountering a problem with reverse geocoding using the geonames API package in R. I have a dataset of nearly 900k rows containing latitude and longitude, and I am using the GNneighbourhood(lat,lng)$name function to look up the neighbourhood for every pair of coordinates (my dataset contains incidents in San Francisco).
The function works perfectly for the large majority of points, but sometimes it returns error code 15 with the message "we are afraid we could not find a neighbourhood for latitude and longitude". The same lookup can be performed with the revgeocode function (Google reverse geocoding API) of the ggmap package, and that works even for the points that fail with the geonames package. The reason I am not using it is the daily query limit.
Successful example
GNneighbourhood(37.7746,-122.4259)$name
[1] "Western Addition"
Failure
GNneighbourhood(37.76569,-122.4718)$name
Error in getJson("neighbourhoodJSON", list(lat = lat, lng = lng)) :
error code 15 from server: we are afraid we could not find a neighbourhood for latitude and longitude :37.76569,-122.4718
Searching for the above point in Google Maps works fine, and we can also see that the incident is not on water or in any other inappropriate location. (Unless the park nearby is indicating something, I don't know.)
Does anyone have experience with this procedure and this specific package? Is it possible that the API is incomplete? It clearly states that it can handle all US cities. Any help would be appreciated.
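This does not explain why the server has no neighbourhood for that point, but it may help to keep a 900k-row run alive when individual lookups fail. A minimal sketch: wrap the lookup in tryCatch() so that an error becomes NA. Here safe_lookup() is a hypothetical helper and fake_lookup() is a stand-in that only imitates the error; in real use you would pass something like function(lat, lng) GNneighbourhood(lat, lng)$name instead.

```r
# Wrap any lookup function so that an error yields NA instead of stopping.
safe_lookup <- function(lookup, lat, lng) {
  tryCatch(lookup(lat, lng), error = function(e) NA_character_)
}

# Stand-in for the real API call; it fails for one of the two test points
# purely to imitate the "error code 15" case.
fake_lookup <- function(lat, lng) {
  if (lng < -122.45) stop("error code 15 from server")
  "Western Addition"
}

coords <- data.frame(lat = c(37.7746, 37.76569),
                     lng = c(-122.4259, -122.4718))
coords$neighbourhood <- mapply(safe_lookup,
                               lat = coords$lat, lng = coords$lng,
                               MoreArgs = list(lookup = fake_lookup))
print(coords)
```

The rows left as NA can then be retried with a second service, which keeps the number of rate-limited calls small.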
Is there a way to use WMS GetFeatureInfo with a TIME period (e.g. 2014-01-01/2014-03-01) to extract a series of values from a raster layer served by a GeoServer instance?
Thanks in advance
Not at the moment, no. It may be added in the future though; it's not the first time I've heard this request. I don't have an ETA, as it depends on when funding to work on it shows up.
In the meantime, a somewhat complex workaround might be to configure the image mosaic index as a WFS feature type, query it by date, figure out the exact time values intersected by the interval, and then do N GetFeatureInfo requests, one for each of those values.
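The last step of that workaround (one GetFeatureInfo request per time value) could be sketched like this. Everything specific here is a placeholder: the endpoint, the layer name, the bounding box and pixel position, and the two time values, which in practice would come from querying the mosaic index first. The sketch only builds the request URLs and does not send them.

```r
# Hypothetical endpoint and time values (the latter would come from the
# WFS query against the mosaic index).
base_url <- "http://localhost:8080/geoserver/wms"
time_values <- c("2014-01-01T00:00:00.000Z", "2014-02-01T00:00:00.000Z")

# Request parameters shared by every call; layer, bbox and pixel are made up.
params <- c(
  SERVICE = "WMS", VERSION = "1.1.1", REQUEST = "GetFeatureInfo",
  LAYERS = "ws:mosaic", QUERY_LAYERS = "ws:mosaic",
  SRS = "EPSG:4326", BBOX = "-10,35,5,50", WIDTH = "256", HEIGHT = "256",
  X = "128", Y = "128", INFO_FORMAT = "text/plain"
)

# Build one URL for a single TIME value, percent-encoding each value.
build_url <- function(time) {
  all_params <- c(params, TIME = time)
  encoded <- vapply(all_params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base_url, "?", paste0(names(all_params), "=", encoded, collapse = "&"))
}

urls <- vapply(time_values, build_url, character(1), USE.NAMES = FALSE)
print(urls)
```

Each URL can then be fetched in a loop and the returned values bound together into the time series.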
All,
I'm looking to download stock data either from Yahoo or Google on 15 - 60 minute intervals for as much history as I can get. I've come up with a crude solution as follows:
library(RCurl)
tmp <- getURL('https://www.google.com/finance/getprices?i=900&p=1000d&f=d,o,h,l,c,v&df=cpct&q=AAPL')
tmp <- strsplit(tmp,'\n')
tmp <- tmp[[1]]
tmp <- tmp[-c(1:8)]
tmp <- strsplit(tmp,',')
tmp <- do.call('rbind',tmp)
tmp <- apply(tmp,2,as.numeric)
tmp <- tmp[!apply(tmp,1,function(x) any(is.na(x))),]
Given the amount of data I'm looking to import, I worry that this could be computationally expensive. I also don't, for the life of me, understand how the time stamps are coded in Yahoo and Google.
So my question is twofold--what's a simple, elegant way to quickly ingest data for a series of stocks into R, and how do I interpret the time stamping on the Google/Yahoo files that I would be using?
I will try to answer timestamp question first. Please note this is my interpretation and I could be wrong.
Using the link in your example https://www.google.com/finance/getprices?i=900&p=1000d&f=d,o,h,l,c,v&df=cpct&q=AAPL I get following data :
EXCHANGE%3DNASDAQ
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=900
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=-300
a1357828200,528.5999,528.62,528.14,528.55,129259
1,522.63,528.72,522,528.6499,2054578
2,523.11,523.69,520.75,522.77,1422586
3,520.48,523.11,519.6501,523.09,1130409
4,518.28,520.579,517.86,520.34,1215466
5,518.8501,519.48,517.33,517.94,832100
6,518.685,520.22,518.63,518.85,565411
7,516.55,519.2,516.55,518.64,617281
...
...
Note the first value of first column a1357828200, my intuition was that this has something to do with POSIXct. Hence a quick check :
> as.POSIXct(1357828200, origin = '1970-01-01', tz='EST')
[1] "2013-01-10 14:30:00 EST"
So my intuition seems to be correct, but the time seems to be off. The data contains one more piece of information: TIMEZONE_OFFSET=-300, which looks like an offset in minutes. So if we shift our timestamp by this amount we should get:
as.POSIXct(1357828200-300*60, origin = '1970-01-01', tz='EST')
[1] "2013-01-10 09:30:00 EST"
Note that I didn't know which day's data you had requested, but a quick check on Google Finance reveals that those were indeed the price levels on 10th Jan 2013.
Remaining values from first column seem to be some sort of offset from first row value.
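Here is a small sketch of that interpretation applied to the first few sample rows. It rests on two assumptions of mine that the data does not state outright: that the plain integers count INTERVAL-second steps from the most recent "a" row, and that the "a" value is ordinary Unix time (seconds since 1970-01-01 UTC). To stay independent of how a given R version treats tz together with origin, the timestamp is built in UTC and only displayed in EST; done that way the first row comes out at 09:30 EST without applying the offset by hand.

```r
# Data rows copied from the sample output above
rows <- c(
  "a1357828200,528.5999,528.62,528.14,528.55,129259",
  "1,522.63,528.72,522,528.6499,2054578",
  "2,523.11,523.69,520.75,522.77,1422586"
)
interval <- 900  # seconds, from the INTERVAL=900 header line

stamp <- sub(",.*$", "", rows)        # first column only
is_anchor <- startsWith(stamp, "a")   # rows carrying a full timestamp
value <- as.numeric(sub("^a", "", stamp))

# Carry the most recent anchor forward to the rows that follow it
# (assumes the very first row is an anchor row).
anchor <- value[is_anchor][cumsum(is_anchor)]
epoch <- ifelse(is_anchor, value, anchor + value * interval)

# Build the time in UTC, then display it in exchange time
times <- as.POSIXct(epoch, origin = "1970-01-01", tz = "UTC")
decoded <- format(times, tz = "EST", usetz = TRUE)
print(decoded)
# "2013-01-10 09:30:00 EST" "2013-01-10 09:45:00 EST" "2013-01-10 10:00:00 EST"
```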
So downloading and standardizing the data ended up being much more of a bear than I figured it would: about 150 lines of code. The problem is that while Google provides the past 50 trading days of data for all exchange-traded stocks, the time stamps within the days are not standardized: an index of '1', for example, could refer to either the first or the second time increment on the first trading day in the data set. Even worse, stocks that trade at low volumes only have entries where a transaction is recorded. For a high-volume stock like AAPL that's no problem, but for low-volume small caps it means that your series will be missing much, if not the majority, of the data. This was problematic because I need all the stock series to lie neatly on top of each other for the analysis I'm doing.
Fortunately, there is still a general structure to the data. Using this link:
https://www.google.com/finance/getprices?i=1800&p=1000d&f=d,o,h,l,c,v&df=cpct&q=AAPL
and changing the stock ticker at the end will give you the past 50 trading days in half-hourly increments. POSIX time stamps, very helpfully decoded by @geektrader, appear in the timestamp column at 3-week intervals. Though the timestamp indexes don't invariably correspond in a convenient 1:1 manner (I almost suspect this was intentional on Google's part), there is a pattern. For example, for the half-hourly series that I looked at, the first trading day of every three-week increment uniformly has timestamp indexes running in the 1:15 neighborhood. This could be 1:13, 1:14, 2:15; it all depends on the stock. I'm not sure what the 14th and 15th entries are: I suspect they are either daily summaries or after-hours trading info. The point is that there's no consistent pattern you can bank on.
The first stamp in a trading day, sadly, does not always contain the opening data. The same goes for the last entry and the closing data. I found that the only way to know what actually represents the trading data is to compare the numbers to the series on Google Finance.
After days of futilely trying to pry a 1:1 mapping pattern from the data, I settled on a "ballpark" strategy. I scraped AAPL's data (a very high-volume stock) and set its timestamp indexes within each trading day as the reference values for the entire market. All days had a minimum of 13 increments, corresponding to the 6.5-hour trading day, but some had 14 or 15. Where this was the case I just truncated by taking the first 13 indexes. From there I used a while loop to progress through the downloaded data of each stock ticker and compare its timestamp indexes within a given trading day to the AAPL timestamps. I kept the overlap, gap-filled the missing data, and cut out the non-overlapping portions.
Sounds like a simple fix, but for low-volume stocks with sparse transaction data there were literally dozens of special cases that I had to bake in and lots of data to interpolate. I got some pretty bizarre results for some of these that I know are incorrect. For high-volume, mid- and large-cap stocks, however, the solution worked brilliantly: for the most part the series synced up very neatly with the AAPL data and matched their Google Finance profiles perfectly.
There's no way around the fact that this method introduces some error, and I still need to fine-tune the method for sparse small-caps. That said, shifting a series by a half hour or gap-filling a single time increment introduces a very minor amount of error relative to the overall movement of the market and the stock. I am confident that the data set I have is "good enough" to allow me to get relevant answers to some questions that I have. Getting this stuff commercially costs literally thousands of dollars.
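For anyone wanting to try the same approach, the core of the "keep the overlap, gap-fill, cut the rest" step for a single trading day can be sketched in a few lines of base R. This is not my 150-line script, just the idea on toy data: the prices are invented, fill_forward() is a helper written for this example, and carrying the last observed value forward is only one possible gap-filling rule.

```r
# Reference timestamp indexes for one trading day (13 half-hour slots)
reference <- data.frame(slot = 1:13)

# Sparse series for a thinly traded ticker (prices invented for illustration)
sparse <- data.frame(slot = c(1, 4, 5, 14),
                     close = c(10.0, 10.2, 10.1, 10.3))

# Keep only slots present in the reference; missing slots become NA and
# slot 14, which is outside the reference, is dropped.
aligned <- merge(reference, sparse, by = "slot", all.x = TRUE)

# Gap fill by carrying the last observed value forward.
# Leading gaps (before the first observation) stay NA.
fill_forward <- function(x) {
  seen <- !is.na(x)
  c(NA, x[seen])[cumsum(seen) + 1L]
}
aligned$close <- fill_forward(aligned$close)
print(aligned)
```

Applying this per ticker and per trading day gives series of equal length that can be stacked side by side.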
Thoughts or suggestions?
Why not load the data from Quandl? E.g.
library(Quandl)
Quandl('YAHOO/AAPL')
Update: sorry, I have just realized that Quandl only fetches daily data, but I'll leave my answer here since Quandl is really easy to query in similar cases.
For the timezone offset, try:
as.POSIXct(1357828200, origin = '1970-01-01', tz=Sys.timezone(location = TRUE))
(The tz will automatically adjust according to your location)