Convert Lat/Lon to County Codes using FCC API - r

I previously figured out how to convert latitude/longitude to county FIPS codes using the FCC API (Apply an API Function over 2 columns of Dataframe, Output a Third Column) thanks to @caldwellst and @rohit. Unfortunately, the FCC modified the API and I can't figure out how to fix the code to work again.
Here is a link to the new API: https://geo.fcc.gov/api/census/
Here is my dataframe:
> head(df_coords)
# A tibble: 6 x 3
lon lat censusYear
<dbl> <dbl> <dbl>
1 -112. 33.4 2010
2 -73.2 44.5 2010
3 -88.2 41.9 2010
4 -88.2 41.9 2010
5 -88.4 41.9 2010
6 -77.1 39.0 2010
Here is the function I previously borrowed / adapted as well as the command to run it:
geo2fips <- function(latitude, longitude) {
url <- "https://geo.fcc.gov/api/census/block/find?format=json&latitude=%f&longitude=%f"
url <- sprintf(url, latitude, longitude)
json <- RCurl::getURL(url)
json <- RJSONIO::fromJSON(json)
as.character(json$County['FIPS'])
}
df_fips$county_fips <- mapply(geo2fips, df_fips$lat, df_fips$lon)
And here is the error message I get when I run it:
Error in function (type, msg, asError = TRUE) :
Unknown SSL protocol error in connection to geo.fcc.gov:443
Can anyone help me figure this out? I figured it may be related to a requirement for census year, so I tried to modify the code as follows but it returned the same error message:
geo2fips <- function(latitude, longitude, censusYear) {
+ url <- "https://geo.fcc.gov/api/census/block/find?format=json&latitude=%f&longitude=%f&censusYear=%f"
+ url <- sprintf(url, latitude, longitude, censusYear)
+ json <- RCurl::getURL(url)
+ json <- RJSONIO::fromJSON(json)
+ as.character(json$County['FIPS'])
+ }
> df_coords$county_fips <- mapply(geo2fips, df_coords$lat, df_coords$lon, df_coords$censusYear)
Error in function (type, msg, asError = TRUE) :
Unknown SSL protocol error in connection to geo.fcc.gov:443
>
Huge thank you to anyone who can help. -Mike

There's been a slight change to the URL and parameters - you can use:
geo2fips <- function(latitude, longitude) {
  url <- "https://geo.fcc.gov/api/census/area?lat=%f&lon=%f&format=json"
  res <- jsonlite::fromJSON(sprintf(url, latitude, longitude))[["results"]][["county_fips"]]
  unique(res)
}
You can also simplify things a little by using the jsonlite package instead of RJSONIO, since jsonlite::fromJSON() accepts a URL directly.
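To run the lookup over a whole data frame without one failed request aborting everything, a wrapper along these lines might help. This is only a sketch: the names fcc_area_url and safe_fips are made up for illustration, and the response fields (results, county_fips) are taken from the answer above rather than checked against the current API.

```r
# Hypothetical helper: build the request URL for one coordinate pair.
fcc_area_url <- function(latitude, longitude) {
  sprintf("https://geo.fcc.gov/api/census/area?lat=%f&lon=%f&format=json",
          latitude, longitude)
}

# Hypothetical wrapper: return NA instead of stopping when a lookup fails
# or when the response contains no county.
safe_fips <- function(latitude, longitude) {
  tryCatch({
    res <- jsonlite::fromJSON(fcc_area_url(latitude, longitude))
    fips <- unique(res[["results"]][["county_fips"]])
    if (length(fips) == 0) NA_character_ else as.character(fips[1])
  }, error = function(e) NA_character_)
}

# Usage (one HTTP request per row, so it is left commented out here):
# df_coords$county_fips <- mapply(safe_fips, df_coords$lat, df_coords$lon)
```

Rows that fail end up as NA, so they can be retried later instead of rerunning the whole column.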

Related

How to access Youtube Data API v3 with R

I am trying to use R to retrieve data from the YouTube API v3 and there are few/no tutorials out there that show the basic process. I have figured out this much so far:
# Youtube API query
base_url <- "https://youtube.googleapis.com/youtube/v3/"
my_yt_search <- function(search_term, max_results = 20) {
my_api_url <- str_c(base_url, "search?part=snippet&", "maxResults=", max_results, "&", "q=", search_term, "&key=",
my_api_key, sep = "")
result <- GET(my_api_url)
return(result)
}
my_yt_search(search_term = "salmon")
But I am just getting some general meta-data and not the search results. Help?
PS. I know there is a package 'tuber' out there but I found it very unstable and I just need to perform simple searches so I prefer to code the requests myself.
Sadly, there is no way to get the durations directly from the search endpoint. If you want them, you'll need to call the videos endpoint (with part=contentDetails) after doing the search. However, you can pass up to 50 IDs in a single call, so we can save some time by pasting all the IDs together.
library(httr)
library(jsonlite)
library(tidyverse)
my_yt_duration <- function(...){
  my_api_url <- paste0(base_url, "videos?part=contentDetails", paste0("&id=", ..., collapse=""), "&key=",
                       my_api_key )
  GET(my_api_url) -> resp
  fromJSON(content(resp, "text"))$items %>% as_tibble %>% select(id, contentDetails) -> tb
  tb$contentDetails$duration %>% tibble(id=tb$id, duration=.)
}
### getting the video IDs
my_yt_search(search_term = "salmon")->res
## Converting from JSON then selecting all the video ids
# fromJSON(content(res,as="text") )$items$id$videoId
my_yt_duration(fromJSON(content(res,as="text") )$items$id$videoId) -> tib.id.duration
# A tibble: 20 x 2
id duration
<chr> <chr>
1 -x2E7T3-r7k PT4M14S
2 b0ahREpQqsM PT3M35S
3 ROz8898B3dU PT14M17S
4 jD9VJ92xyzA PT5M42S
5 ACfeJuZuyxY PT3M1S
6 bSOd8r4wjec PT6M29S
7 522BBAsijU0 PT10M51S
8 1P55j9ub4es PT14M59S
9 da8JtU1YAyc PT3M4S
10 4MpYuaJsvRw PT8M27S
11 _NbbtnXkL-k PT2M53S
12 3q1JN_3s3gw PT6M17S
13 7A-4-S_k_rk PT9M37S
14 txKUTx5fNbg PT10M2S
15 TSSPDwAQLXs PT3M11S
16 NOHEZSVzpT8 PT7M51S
17 4rTMdQzsm6U PT17M24S
18 V9eeg8d9XEg PT10M35S
19 K4TWAvZPURg PT3M3S
20 rR9wq5uN_q8 PT4M53S
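The durations come back as ISO 8601 strings such as PT4M14S. If plain seconds are easier to work with, a base R conversion could look like the sketch below. The function name iso8601_to_seconds is hypothetical, and it only reads the hour, minute and second parts after the "T", so a duration with a day component would be undercounted.

```r
# Hypothetical helper: convert durations such as "PT4M14S" to seconds.
iso8601_to_seconds <- function(x) {
  time_part <- sub("^.*T", "", x)  # keep only what follows the "T"
  # Pull out the number in front of a unit letter, or 0 if the unit is absent
  component <- function(unit) {
    hits <- regmatches(time_part, regexec(paste0("([0-9]+)", unit), time_part))
    vapply(hits, function(h) if (length(h) == 2) as.numeric(h[2]) else 0, numeric(1))
  }
  component("H") * 3600 + component("M") * 60 + component("S")
}

iso8601_to_seconds(c("PT4M14S", "PT3M1S", "PT17M24S"))
# [1]  254  181 1044
```

With the tibble from above this could be applied as tib.id.duration$seconds <- iso8601_to_seconds(tib.id.duration$duration).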

How do I use rvest to sort text into different columns?

I am using rvest to (try to) scrape all the author affiliation data from a database of academic publications called RePEc. I have the authors' short IDs, which I'm using to scrape affiliation data. However, each time I try, it gives me the 404 error: Error in open.connection(x, "rb") : HTTP error 404
It must be an issue with my use of sapply because when I test it using an individual ID, it works. Here is the code I'm using:
df$author_reg <- c("paa6","paa2","paa1", "paa8", "pve266", "pya500")
df$websites <- paste0("https://ideas.repec.org/e/", df$author_reg, ".html")
df$affiliation <- sapply(df$websites, function(x) try(x %>% read_html %>% html_nodes("#affiliation h3") %>% html_text()))
I actually need to do this for six columns of authors and there are NA values I'd like to skip, so if anyone knows how to do that as well, I would be enormously grateful (but not a big deal if not). Thank you in advance for your help!
EDIT: I have just discovered that the error is in the formula for the websites. Sometimes it should be df$websites <- paste0("https://ideas.repec.org/e/", df$author_reg, ".html") and sometimes it should be df$websites <- paste0("https://ideas.repec.org/f/", df$author_reg, ".html")
Does anyone know how to get R to try both and give me the one that works?
You can build both links and use try on both of them. I am assuming only one of them leads to a valid page. Otherwise we can always edit the code to keep everything that works:
library(rvest)
library(purrr)
df = data.frame(id=1:6)
df$author_reg <- c("paa6","paa2","paa1", "paa8", "pve266", "pya500")
http1 <- "https://ideas.repec.org/e/"
http2 <- "https://ideas.repec.org/f/"
df$affiliation <- sapply(df$author_reg, function(x){
  links = c(paste0(http1, x, ".html"),paste0(http2, x, ".html"))
  # here we try both links and store under attempt
  attempts = links %>% map(function(i){
    try(read_html(i) %>% html_nodes("#affiliation h3") %>% html_text())
  })
  # the good ones will have "character" class, the failed ones, try-error
  gdlink = which(sapply(attempts,class) != "try-error")
  if(length(gdlink)>0){
    return(attempts[[gdlink[1]]])
  } else {
    return("True 404 error")
  }
})
Check the results:
df
id author_reg
1 1 paa6
2 2 paa2
3 3 paa1
4 4 paa8
5 5 pve266
6 6 pya500
affiliation
1 Statistisk SentralbyråGovernment of Norway
2 Department of EconomicsCollege of BusinessUniversity of Wyoming
3 (80%) Institutt for ØkonomiUniversitetet i Bergen, (20%) Gruppe for trygdeøkonomiInstitutt for ØkonomiUniversitetet i Bergen
4 Centraal Planbureau (CPB)Government of the Netherlands
5 Department of FinanceRotterdam School of Management (RSM Erasmus University)Erasmus Universiteit Rotterdam
6 Business SchoolSwinburne University of Technology
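The question also asked about skipping NA IDs. One possible way to combine that with the "try both links" idea is sketched below using tryCatch instead of try. The names first_success and affiliation_for are made up for this example, and joining several affiliations with "; " is just one choice.

```r
# Hypothetical helper: call `fetch` on each candidate in turn and return the
# first result that does not raise an error, or `default` if all of them fail.
first_success <- function(candidates, fetch, default = NA_character_) {
  for (candidate in candidates) {
    result <- tryCatch(fetch(candidate), error = function(e) NULL)
    if (!is.null(result)) return(result)
  }
  default
}

# Hypothetical wrapper for one author ID; NA IDs are skipped without a request.
affiliation_for <- function(author_id,
                            prefixes = c("https://ideas.repec.org/e/",
                                         "https://ideas.repec.org/f/")) {
  if (is.na(author_id)) return(NA_character_)
  urls <- paste0(prefixes, author_id, ".html")
  first_success(urls, function(u) {
    page <- rvest::read_html(u)
    text <- rvest::html_text(rvest::html_nodes(page, "#affiliation h3"))
    paste(text, collapse = "; ")
  })
}

# Usage (makes HTTP requests, so it is left commented out here):
# df$affiliation <- vapply(df$author_reg, affiliation_for, character(1))
```

Because every call returns a single string, vapply gives a plain character column, and the same function can be reused for each of the six author columns.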

Reverse Geo Coding in R

I would like to reverse geocode coordinates in R to get the address and PIN code.
These are the columns
A B C
15.3859085 74.0314209 7J7P92PJ+9H77QGCCCC
I took the first four rows (columns A, B and C) out of thousands of rows:
df<-ga.data[1:4,]
df <- cbind(df,do.call(rbind,
lapply(1:nrow(df),
function(i)
revgeocode(as.numeric(
df[i,3:1]), output = "more")
[c("administrative_area_level_1","locality","postal_code","address")])))
Error in revgeocode(as.numeric(df[i, 3:1]), output = "more") :
is.numeric(location) && length(location) == 2 is not TRUE
Any other package or approach for finding the address and PIN code would also be most welcome.
I also tried the following
When I tried using ggmap I got this error
In revgeocode(as.numeric(df[i, c("Latitude", "Longitude")]), output = "address") :
HTTP 400 Bad Request
I also tried this:
revgeocode(c(df$B[1], df$A[1]))
Warning message:
In revgeocode(c(df$Longitude[1], df$Latitude[1])) : HTTP 400 Bad Request
Also, I am in India, and it does not work when I look up coordinates in India. With US coordinates it returns the exact address, which seems fishy.
data <- read.csv(text="ID, Longitude, Latitude
311175, 41.298437, -72.929179
292058, 41.936943, -87.669838
12979, 37.580956, -77.471439")
library(ggmap)
result <- do.call(rbind,
lapply(1:nrow(data),
function(i)revgeocode(as.numeric(data[i,3:2]))))
data <- cbind(data,result)
The current CRAN version of revgeo_0.15 does not have the revgeocode function. If you upgrade to this version, you'll find a revgeo function, which takes longitude, latitude arguments. Your column C should not be passed into the function.
revgeo::revgeo(latitude=df[, 'A'], longitude=df[, 'B'], output='frame')
[1] "Getting geocode data from Photon: http://photon.komoot.de/reverse?lon=74.0314209&lat=15.3859085"
housenumber street city state zip country
1 House Number Not Found Street Not Found Borim Goa Postcode Not Found India
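The original error (is.numeric(location) && length(location) == 2 is not TRUE) comes from passing all three columns, including the text in column C, as the location. If you stay with ggmap::revgeocode, which expects c(longitude, latitude), a small check like the one sketched here could catch that before any request is made. The helper name lon_lat is hypothetical.

```r
# Hypothetical helper: build the c(longitude, latitude) vector for one row
# and fail early if either value is missing, non-numeric or out of range.
lon_lat <- function(latitude, longitude) {
  loc <- suppressWarnings(as.numeric(c(longitude, latitude)))
  stopifnot(length(loc) == 2, !anyNA(loc),
            abs(loc[1]) <= 180, abs(loc[2]) <= 90)
  loc
}

lon_lat(latitude = 15.3859085, longitude = 74.0314209)
# [1] 74.03142 15.38591
```

In the question's data, column A holds the latitude and column B the longitude, so the call for row i would be lon_lat(df$A[i], df$B[i]).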

Extract table from

I would like to extract the following table using rvest from http://finra-markets.morningstar.com/BondCenter/TRACEMarketAggregateStats.jsp (for any date):
I tried the following but failed to produce any result:
library(rvest)
url <- "http://finra-markets.morningstar.com/BondCenter/TRACEMarketAggregateStats.jsp"
htmlSession <-html_session(url) ## create session
goForm <- html_form(htmlSession)[[2]] ## pull form from session
#filledGoForm <- set_values(goForm, value="04/26/2017") # This does not work
filledGoForm <- goForm
filledGoForm$fields[[1]]$value <- "04/26/2017"
htmlSession <- submit_form(htmlSession, filledGoForm)
> htmlSession <- submit_form(htmlSession, filledGoForm)
Submitting with ''
Warning message:
In request_POST(session, url = url, body = request$values, encode = request$encode, :
Not Found (HTTP 404).
Any hints on how to do this would be highly appreciated.
That site uses many XHR requests to populate the tables. And, it establishes a server session with a hidden POST request which won't be replicated with html_session().
We'll need to add in httr for some help:
library(httr)
library(rvest)
library(dplyr) # for as_data_frame(), mutate_all(), rename() and mutate() used further down
The first thing we need to do is to just hit the site to get an initial qs_wid cookie into the implicit cookie jar curl/httr/rvest share:
init <- GET("http://finra-markets.morningstar.com/MarketData/Default.jsp")
Next, we need to mimic the hidden "login" that the web page does:
nxt <- POST(url = "http://finra-markets.morningstar.com/finralogin.jsp",
body = list(redirectPage = "/BondCenter/TRACEMarketAggregateStats.jsp"),
encode = "form")
That creates a session on the server back-end and places a few other cookies in our cookie jar.
Finally:
GET(
url = "http://finra-markets.morningstar.com/transferPage.jsp",
query = list(
`path`="http://muni-internal.morningstar.com/public/MarketBreadth/C",
`date`="04/24/2017",
`_`=as.numeric(Sys.time())
)
) -> res
makes the request. You can make a function out of all three steps (together) and parameterize that last GET.
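For example, the three steps could be wrapped roughly like this. It is only a sketch: the function names trace_date and get_trace_breadth are invented here, the URLs and parameters are copied from the calls above, and whether the site still responds this way is not something the sketch can guarantee.

```r
# Hypothetical helper: format a date the way the endpoint expects (mm/dd/yyyy).
trace_date <- function(date) format(as.Date(date), "%m/%d/%Y")

# Hypothetical wrapper around the three requests shown above.
get_trace_breadth <- function(date) {
  base <- "http://finra-markets.morningstar.com"
  # 1. hit the site once to pick up the initial cookie
  httr::GET(paste0(base, "/MarketData/Default.jsp"))
  # 2. mimic the hidden "login" so the server creates a session
  httr::POST(paste0(base, "/finralogin.jsp"),
             body = list(redirectPage = "/BondCenter/TRACEMarketAggregateStats.jsp"),
             encode = "form")
  # 3. request the table for the chosen date
  httr::GET(paste0(base, "/transferPage.jsp"),
            query = list(
              path = "http://muni-internal.morningstar.com/public/MarketBreadth/C",
              date = trace_date(date),
              `_`  = as.numeric(Sys.time())
            ))
}

# Usage (makes HTTP requests, so it is left commented out here):
# res <- get_trace_breadth("2017-04-24")
```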
Unfortunately, that returns a very broken HTML <table> that html_table() can't translate into a data frame automagically for you, but that shouldn't stop you:
content(res) %>%
html_nodes("td") %>%
html_text() %>%
matrix(ncol=4, byrow=TRUE) %>%
as_data_frame() %>%
mutate_all(as.numeric) %>%
rename(all_issues=V1, investment_grade=V2, high_yield=V3, convertible=V4) %>%
mutate(category = c("total_issues_traded", "advances", "declines", "unchanged", "high_52", "low_52", "dollar_volume"))
## # A tibble: 7 × 5
## all_issues investment_grade high_yield convertible category
## <dbl> <dbl> <dbl> <dbl> <chr>
## 1 7983 5602 2194 187 total_issues_traded
## 2 3025 1798 1100 127 advances
## 3 4448 3575 824 49 declines
## 4 124 42 75 7 unchanged
## 5 257 66 175 16 high_52
## 6 139 105 33 1 low_52
## 7 22601 16143 5742 715 dollar_volume
To get the other data tables, go to the Developer Tools option in your browser (switch to one that has it if yours doesn't … you're likely on Windows given that you're doing finance things and IE/Edge aren't very good browsers for introspection) and refresh the page to see the other requests that get made.

API request with R

I am trying to geocode French addresses. I'd like to use the following website: http://adresse.data.gouv.fr/
The website has an example of how the API works, but I think it's a Linux command and I'd like to translate it into R code. The aim is to submit a CSV file of addresses and get geographic coordinates back.
Linux command (example given on the website):
http --timeout 600 -f POST http://api-adresse.data.gouv.fr/search/csv/ data@path/to/file.csv
I tried to "translate" this in R with the following code
library(httr)
library(RCurl)
queryResults=POST("http://api-adresse.data.gouv.fr/search/csv/",body=list(data=fileUpload("file.csv")))
result_geocodage=content(queryResults)
But unfortunately I have a bad request error.
Does anybody know what I'm missing in the translation to R?
Thanks!
Here's an example. First, some example data plus the request:
library(httr)
df <- data.frame(c("13 Boulevard Chanzy", "Gloucester St"),
c("93100 Montreuil", "Jersey"))
write.csv2(df, tf <- tempfile(fileext = ".csv"))
res <- POST("http://api-adresse.data.gouv.fr/search/csv/",
timeout(600),
body = list(data = upload_file(tf)))
Then, the result:
content(res, sep = ";", row.names = 1)
# c..13.Boulevard.Chanzy....Gloucester.St.. c..93100.Montreuil....Jersey.. latitude longitude
# 1 13 Boulevard Chanzy 93100 Montreuil 48.85825 2.434462
# 2 Gloucester St Jersey 49.46712 1.145554
# result_label result_score result_type result_id result_housenumber
# 1 13 Boulevard Chanzy 93100 Montreuil 0.88 housenumber ADRNIVX_0000000268334929 13
# 2 2 Résidence le Jersey 76160 Saint-Martin-du-Vivier 0.24 housenumber ADRNIVX_0000000311480901 2
# result_name result_street result_postcode result_city result_context result_citycode
# 1 Boulevard Chanzy NA 93100 Montreuil 93, Seine-Saint-Denis, Île-de-France 93048
# 2 Résidence le Jersey NA 76160 Saint-Martin-du-Vivier 76, Seine-Maritime, Haute-Normandie 76617
Or, just the coordinates:
subset(content(res, sep = ";", row.names = 1, check.names = FALSE), select = c("latitude", "longitude"))
# latitude longitude
# 1 48.85825 2.434462
# 2 49.46712 1.145554
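The odd column names in the output above (c..13.Boulevard.Chanzy...) come from the unnamed data.frame columns. A variation that names the columns and drops the row names before uploading might look like the sketch below. The column names adresse and ville are arbitrary choices for this example, and with row names dropped the row.names = 1 argument used when reading the result would no longer apply.

```r
# Example data with explicit (arbitrary) column names
addresses <- data.frame(
  adresse = c("13 Boulevard Chanzy", "Gloucester St"),
  ville   = c("93100 Montreuil", "Jersey"),
  stringsAsFactors = FALSE
)

# Write a semicolon-separated file without the row-name column
tf <- tempfile(fileext = ".csv")
write.csv2(addresses, tf, row.names = FALSE)
readLines(tf)[1]
# [1] "\"adresse\";\"ville\""

# Upload as in the answer above (makes an HTTP request, so commented out):
# res <- httr::POST("http://api-adresse.data.gouv.fr/search/csv/",
#                   httr::timeout(600),
#                   body = list(data = httr::upload_file(tf)))
```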

Resources