R: Extracting JSON data from an OData API gives inconsistent results?

I'm attempting to extract data from the Welsh Government's StatsWales OData API. Details on the API can be found here, including an example of how to filter the data.
However, I seem to be getting a non-deterministic subset of the data, i.e. each attempt returns a different number of records.
Below is a simple reproducible example.
I have also tried:
using RJSONIO::fromJSON(), which gives the same result.
using the odata.nextLink URL, when it is returned in the JSON object, to keep extracting more data; again, each attempt results in a differently sized object (a paging sketch along these lines is included after the example below).
Any insight will be much appreciated.
## preliminaries
library(jsonlite)
# prepare filters
filter1 <- "Column_ItemName_ENG"
filter1.value <- "Gross%20expenditure"
filter2 <- "Row_ItemName_ENG"
filter2.value <- "Parking%20of%20vehicles"
query <- paste0("http://open.statswales.gov.wales/en-gb/dataset/lgfs0009?$filter=",
filter1, "%20eq%20%27", filter1.value, "%27%20and%20",
filter2, "%20eq%20%27", filter2.value, "%27")
# test 1
test1 <- jsonlite::fromJSON(query)
test1 <- test1[[2]]
# test 2
test2 <- jsonlite::fromJSON(query)
test2 <- test2[[2]]
# test 3
test3 <- jsonlite::fromJSON(query)
test3 <- test3[[2]]
# compare results
nrow(test1)
nrow(test2)
nrow(test3)
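For reference, here is a minimal paging sketch of the odata.nextLink approach mentioned above. It assumes the standard OData layout in which each page carries its rows in a value element (the question indexes this positionally as [[2]]) and an odata.nextLink URL until the last page has been served; the loop itself is my sketch, not something from the original post.
library(jsonlite)
# Follow odata.nextLink pages until the service stops returning one
fetch_all <- function(url) {
  pages <- list()
  while (!is.null(url)) {
    resp  <- jsonlite::fromJSON(url)
    pages <- c(pages, list(resp$value))   # rows of the current page (assumed "value" element)
    url   <- resp$odata.nextLink          # NULL once the last page is reached
  }
  do.call(rbind, pages)
}
all_rows <- fetch_all(query)
nrow(all_rows)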
PS: this question is cross-posted from RStudio Community, and I promise to update either post with relevant solutions found on the other one.

Related

How to use botornot function in R tweetbotornot package?

I am unable to run even the example code given in the botrnot documentation, and I'm unsure what's happening.
# libraries
library(rtweet)
library(tweetbotornot)
# authentication for twitter API
auth <- rtweet_app()
auth_setup_default()
users <- c("kearneymw", "geoffjentry", "p_barbera",
"tidyversetweets", "rstatsbot1234", "RStatsStExBot")
## get most recent 10 tweets from each user
tmls <- get_timeline(users, n = 10)
## pass the returned data to botornot()
data <- botornot(tmls)
I expect a data frame named data to be created, with an additional column giving the probability that each user is a bot. Instead I get this error:
Error in botornot.data.frame(tmls) : "user_id" %in% names(x) is not TRUE
The table at the bottom of the documentation is what I'm hoping to achieve.
https://www.rdocumentation.org/packages/botrnot/versions/0.0.2
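One possible cause, which is my assumption rather than something stated above: rtweet 1.0 and later no longer include user_id/screen_name columns in the data returned by get_timeline(), and that is exactly what botornot() checks for. A hedged workaround sketch follows; re-attaching those columns from users_data() may satisfy the check, but the column names id_str and screen_name are assumptions about the rtweet 1.x output.
# Hedged workaround sketch: re-attach the user columns botornot() expects
library(rtweet)
library(tweetbotornot)
tmls <- get_timeline(users, n = 10)
usr <- users_data(tmls)              # user metadata returned alongside the tweets
tmls$user_id <- usr$id_str           # assumed column name in rtweet >= 1.0
tmls$screen_name <- usr$screen_name  # assumed column name in rtweet >= 1.0
data <- botornot(tmls)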

Inputting data frame values into a GET function web query

I'm trying to input a list of values from a data frame into my GET function for the web query and then cycle through each iteration as I go. If somebody could link me to some further resources to read and learn from, it would be appreciated.
The following code draws the gauge names from the API server. I plan on using purrr iteration functions to loop over it; the input from the list would be inserted where rfg_select appears below.
library(httr)
library(purrr)
## Call up Query Development Script
## Calls up every single rainfall data gauge across the entirety of QLD
wmip_callup <- GET('https://water-monitoring.information.qld.gov.au/cgi/webservice.pl?{"function":"get_site_list","version":"1","params":{"site_list":"MERGE(GROUP(MGR_OFFICE_ALL,AYR),GROUP(MGR_OFFICE_ALL,BRISBANE),GROUP(MGR_OFFICE_ALL,BUNDABERG),GROUP(MGR_OFFICE_ALL,MACKAY),GROUP(MGR_OFFICE_ALL,MAREEBA),GROUP(MGR_OFFICE_ALL,ROCKHAMPTON),GROUP(MGR_OFFICE_ALL,SOUTH_JOHNSTONE),GROUP(MGR_OFFICE_ALL,TOOWOOMBA))"}}')
# Turns API server data into JSON data.
wmip_dataf <- content(wmip_callup, type = 'application/json')
# Returns the values of the rainfall gauge site names and is the directory function.
list_var <- wmip_dataf[["_return"]][["sites"]]
# Combines all of the rainfall gauge data together in a list (could be used for giving file names / looping the data).
rfg_bind <- do.call(rbind.data.frame, list_var)
# Sets the column name of the combination data frame.
rfg_bind <- setNames(rfg_bind, "Rainfall Gauge Name")
rfg_select <- rfg_bind$`Rainfall Gauge Name`
# Attempts to filter list into query:
wmip_input <- GET('https://water-monitoring.information.qld.gov.au/cgi/webservice.pl?{"function":"get_ts_traces","version":"1","params":{"site_list":**rfg_select**,"datasource":"AT","varfrom":"10","varto":"10","start_time":"0","end_time":"0","data_type":"mean","interval":"day","multiplier":"1"}}')
Hey there,
After some work I've found a solution using string substitution on a template URL.
I set up a dummy placeholder variable that helps me select a single data value.
# Dummy Variable string:
wmip_url <- 'https://water-monitoring.information.qld.gov.au/cgi/webservice.pl?{"function":"get_ts_traces","version":"1","params":{"site_list":"varinput","datasource":"AT","varfrom":"10","varto":"10","start_time":"0","end_time":"0","data_type":"mean","interval":"day","multiplier":"1"}}'
# Grab one value from the list.
rfg_individual <- rfg_select[2]
# Replaces the specified input
rfg_replace <- gsub("varinput", rfg_individual, wmip_url)
# Result
"https://water-monitoring.information.qld.gov.au/cgi/webservice.pl?{\"function\":\"get_ts_traces\",\"version\":\"1\",\"params\":{\"site_list\":\"001203A\",\"datasource\":\"AT\",\"varfrom\":\"10\",\"varto\":\"10\",\"start_time\":\"0\",\"end_time\":\"0\",\"data_type\":\"mean\",\"interval\":\"day\",\"multiplier\":\"1\"}}"

List of Data Frames to One Data Frame

Disclaimer: I know that this question has been asked before. The answer provided there worked for me in the past, but for some reason it has stopped working now.
I am pulling Marketing email statistics from the Mailchimp API. I have been doing this for the last half year or so. However, in the past 2 months, I believe the structure of what I pull has changed and thus, my code no longer works and I cannot figure out why. I believe it has something to do with the nested data frames within my list of data frames that I receive.
Here is an example of my code and the resulting list of data frames (sensitive information removed):
library(httr)
library(jsonlite)
library(plyr)
#Opens-----------
opens1 <- GET("https://us4.api.mailchimp.com/3.0/reports/***ReportNumber***/sent-to?count=4000",authenticate('***My Company***', '***My-Password***'))
opens1 <- content(opens1,"text")
opens1 <- fromJSON(opens1)
Then I run opens1 <- ldply(opens1, data.frame), and I receive the following error:
Error in allocate_column(df[[var]], nrows, dfs, var) :
Data frame column 'merge_fields' not supported by rbind.fill
I tried looking up and using rbind.fill() and the other methods described in the answer linked at the top of my post, to no avail. What am I misinterpreting about the merge_fields variable, or am I way off, and how do I correct it?
I'm just trying to get one data frame of all of the variables from the opens1 list.
Thanks for any and all help, and please, feel free to ask any clarification questions!
At a quick glance, this seems to work for me:
library(httr)
campaign_id <- "-------"
apikey <- "------"
url <- sprintf("https://us1.api.mailchimp.com/3.0/reports/%s/sent-to", campaign_id)
opens <- GET(url, query = list(apikey = apikey, count = 4000L))
lst <- rjson::fromJSON(content(opens, "text"))
df <- dplyr::bind_rows(
  lapply(lst$sent_to, function(x)
    as.data.frame(t(unlist(x)), stringsAsFactors = FALSE)
  )
)
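As an alternative, here is my own hedged sketch rather than part of the answer above: jsonlite::fromJSON() with flatten = TRUE expands nested objects such as merge_fields into prefixed columns (for example merge_fields.FNAME, a hypothetical field name), which avoids the rbind.fill error, assuming the members come back under a sent_to element as above.
library(httr)
library(jsonlite)
opens_raw <- GET(url, query = list(apikey = apikey, count = 4000L))
# flatten = TRUE turns nested data frames such as merge_fields into ordinary prefixed columns
opens_flat <- fromJSON(content(opens_raw, "text"), flatten = TRUE)
opens_df <- opens_flat$sent_to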

How to get company description and statistics using R from e.g. Yahoo Finance?

I am looking for ways to get the company description, key statistics, and chairman name from Yahoo Finance (or another financial website) using R, for example with the quantmod package.
There is plenty of information on how to get current and historical prices, etc., but this is not what I want.
Best,
This R package does not support queries for Asian bourses; the problem appears to be with the underlying Yahoo APIs.
You can get that using Intrinio's API. Their data tag directory allows you to look up the tags you want, in your case, "long_description" and "ceo" will get you the data you want:
#Install httr, which you need to request data via API
install.packages("httr")
require("httr")
#Create variables for your username and password; get those at intrinio.com/login
username <- "Your_API_Username"
password <- "Your_API_Password"
#Making an API call for the company description and CEO. This puts together the different parts of the API call
base <- "https://api.intrinio.com/"
endpoint <- "data_point"
stock <- "T"
item1 <- "long_description"
item2 <- "ceo"
#Pasting them together to make the API call
call1 <- paste(base,endpoint,"?","ticker","=", stock, "&","item","=",item1, sep="")
call2 <- paste(base,endpoint,"?","ticker","=", stock, "&","item","=",item2, sep="")
#Now we use the API call to request the data from Intrinio's database
ATT_description <- GET(call1, authenticate(username,password, type = "basic"))
ATT_CEO <- GET(call2, authenticate(username,password, type = "basic"))
#That gives us the raw responses, but they aren't in a good format so we parse them
test1 <- unlist(content(ATT_description,"parsed"))
test2 <- unlist(content(ATT_CEO,"parsed"))
#Then make your data frame:
df1 <- data.frame(test1)
df2 <- data.frame(test2)
#From here you can rbind or cbind, and create loops to get the same data for many tickers
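To illustrate the looping idea in the comment above, here is a hedged sketch. The ticker list is hypothetical, base, endpoint, item1, username, and password are the variables defined earlier, and it assumes every response returns the same set of fields so the rows can be bound together.
#Hypothetical list of tickers to loop over
tickers <- c("T", "VZ", "IBM")
desc_list <- lapply(tickers, function(tk) {
  call <- paste0(base, endpoint, "?ticker=", tk, "&item=", item1)
  resp <- GET(call, authenticate(username, password, type = "basic"))
  #One row per ticker; assumes each parsed response has the same fields
  data.frame(ticker = tk, t(unlist(content(resp, "parsed"))), stringsAsFactors = FALSE)
})
descriptions <- do.call(rbind, desc_list)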
You can get your API keys here. Full API documentation here.

How to extract sample titles (names) using GEOquery package?

GEOquery is a great R package to retrieve and analyze the Gene Expression data stored in NCBI Gene Expression Omnibus (GEO) database. I have used the following code provided from GEO2R service of GEO database (that generates the initial R script to analyze your desired data automatically) to extract some GEO series of experiments:
gset <- getGEO("GSE10246", GSEMatrix =TRUE)
if (length(gset) > 1) idx <- grep("GPL1261", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
gset # displays a summary of the data stored in this variable
The problem is that I cannot retrieve the sample titles from it. I have found the function Columns(), which works on GDS datasets and returns the sample names, but not on GSE series.
Please note I am not interested in the sample accession IDs (e.g. GSM258609, GSM258610, etc.); what I want is the human-readable sample titles.
Does anyone have any ideas? Thanks.
After
gset <- getGEO("GSE10246", GSEMatrix =TRUE)
gset is a simple list; its first element is an ExpressionSet, and the sample information is in the phenoData (accessed with pData()), so maybe you're looking for
pData(gset[[1]])
See ?ExpressionSet for more.
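As a concrete follow-up, here is my own hedged sketch: the human-readable sample titles usually sit in the title column of the phenoData for a GSEMatrix object, though that column name is an assumption rather than something stated above.
library(GEOquery)
library(Biobase)
gset <- getGEO("GSE10246", GSEMatrix = TRUE)
eset <- gset[[1]]
# "title" is assumed to be the phenoData column holding the human-readable sample titles
sample_titles <- pData(eset)$title
head(sample_titles)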
