Using GTmetrix REST API v2.0 in R

I am trying to integrate performance testing of certain websites using GTmetrix. With the API, I am able to run a test and pull the results using the SEO connector tool in Microsoft Excel. However, that tool uses the XML from an older version of the API, and some of the newer tests are not available in it. The latest API version is 2.0.
The link for the XML is here: GTmetrix XML for API 0.1.
I tried using the libraries httr and jsonlite, but I don't know how to authenticate with the API, run a test, and extract the results.
The documentation for the API is available at API Documentation.
library(httr)
library(jsonlite)
url <- "https://www.berkeley.edu" # URL to be tested
location <- 1 # testing Location
browser <- 3 # Browser to be used for testing
res <- GET("https://gtmetrix.com/api/gtmetrix-openapi-v2.0.json")
data <- fromJSON(rawToChar(res$content))

Update 2021-11-08:
I whipped up a small library to talk to GTmetrix via R. There's some basic sanity checking baked in, but obviously this is still a work in progress and there are (potentially critical) bugs. Feel free to check it out, though. Would love some feedback.
# Install and load library.
devtools::install_github("RomanAbashin/rgtmx")
library(rgtmx)
Update 2021-11-12: It's available on CRAN now. :-)
# Install and load library.
install.packages("rgtmx")
library(rgtmx)
Start test (and get results)
# Minimal example #1.
# Returns the final report after checking test status roughly every 3 seconds.
result <- start_test("google.com", "[API_KEY]")
This will start a test and wait for the report to be generated, returning the result as a data.frame. Optionally, you can return just the test ID and other metadata by setting the parameter wait_for_completion = FALSE.
# Minimal example #2.
# Returns just the test ID and some meta data.
result <- start_test("google.com", "[API_KEY]", wait_for_completion = FALSE)
Other optional parameters: location, browser, report, retention, httpauth_username, httpauth_password, adblock, cookies, video, stop_onload, throttle, allow_url, block_url, dns, simulate_device, user_agent, browser_width, browser_height, browser_dppx, browser_rotate (see the sketch below for how these can be passed).
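A hedged sketch of passing a few of these options, assuming they map directly to named arguments of start_test(); the parameter values shown are illustrative only:
# Sketch: optional parameters passed as named arguments (assumed mapping).
result <- start_test(
  "google.com", "[API_KEY]",
  location       = "1",    # testing location ID
  browser        = "3",    # browser ID
  adblock        = 1,      # block ad requests during the test
  browser_width  = 1366,
  browser_height = 768
)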
Show available browsers
show_available_browsers("[API_KEY]")
Show available locations
show_available_locations("[API_KEY]")
Get specific test
get_test("[TEST_ID]", "[API_KEY]")
Get specific report
get_report("[REPORT_ID]", "[API_KEY]")
Get all tests
get_all_tests("[API_KEY]")
Get account status
get_account_status("[API_KEY]")
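For a rough end-to-end flow, here is a sketch of starting a test without waiting and fetching it again later by its ID; the structure of the returned metadata is an assumption, so inspect it with str() first:
# Sketch: start a test, note its ID, and retrieve it again later.
meta <- start_test("google.com", "[API_KEY]", wait_for_completion = FALSE)
str(meta)                       # locate the test ID in the returned metadata
test <- get_test("[TEST_ID]", "[API_KEY]")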
Original answer:
Pretty straightforward, actually:
0. Set test parameters.
# Your api key from the GTmetrix console.
api_key <- "[Key]"
# All attributes except URL are optional, and the availability
# of certain options may depend on the tier of your account.
# URL to test.
url <- "https://www.worldwildlife.org/"
# Testing location ID.
location_id <- 1
# Browser ID.
browser_id <- 3
1. Start a test
res_test_start <- httr::POST(
  url = "https://gtmetrix.com/api/2.0/tests",
  httr::authenticate(api_key, ""),
  httr::content_type("application/vnd.api+json"),
  body = jsonlite::toJSON(
    list(
      "data" = list(
        "type" = "test",
        "attributes" = list(
          "url" = url,
          # Optional attributes go here.
          "location" = location_id,
          "browser" = browser_id
        )
      )
    ),
    auto_unbox = TRUE
  ),
  encode = "raw"
)
2. Get test ID
test_id <- jsonlite::fromJSON(rawToChar(res_test_start$content))$data$id
3. Get report ID
# Wait a bit, as generating the report can take some time.
res_test_status <- httr::GET(
  url = paste0("https://gtmetrix.com/api/2.0/tests/", test_id),
  httr::authenticate(api_key, ""),
  httr::content_type("application/vnd.api+json")
)
# If this still returns the test ID, the report is not ready yet.
report_id <- jsonlite::fromJSON(rawToChar(res_test_status$content))$data$id
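If you would rather wait programmatically, here is a minimal polling sketch. It assumes that, once the report exists, the test endpoint redirects to the report resource, so $data$type flips from "test" to "report":
# Minimal polling sketch (assumption: the finished test redirects to its report).
repeat {
  res_poll <- httr::GET(
    url = paste0("https://gtmetrix.com/api/2.0/tests/", test_id),
    httr::authenticate(api_key, ""),
    httr::content_type("application/vnd.api+json")
  )
  poll_data <- jsonlite::fromJSON(rawToChar(res_poll$content))$data
  if (identical(poll_data$type, "report")) break
  Sys.sleep(3)
}
report_id <- poll_data$id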
4. Get report
res_report <- httr::GET(
  url = paste0("https://gtmetrix.com/api/2.0/reports/", report_id),
  httr::authenticate(api_key, ""),
  httr::content_type("application/vnd.api+json")
)
# The report is a nested list with the results as you know them from GTmetrix.
report <- jsonlite::fromJSON(rawToChar(res_report$content))$data
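For orientation, a hedged sketch of pulling a few headline values out of that nested list; the attribute names are assumptions based on the v2.0 report schema and may differ depending on your plan:
# Attribute names below are assumptions from the GTmetrix v2.0 report schema.
report$attributes$gtmetrix_grade      # e.g. "A"
report$attributes$performance_score   # performance score
report$attributes$fully_loaded_time   # in milliseconds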
I'm kinda tempted to build something for this as there seems to be no R library for it...

Related

Trouble downloading and accessing zip files from API in R

I am trying to automate a process in R which involves downloading a zipped folder from an API* which contains a few .csv/.xml files, accessing its contents, and then extracting the .csv/.xml that I actually care about into a dataframe (or something else that is workable). However, I am having some problems accessing the contents of the API pull. From what I gather, the proper process for pulling from an API is to use GET() from the httr package to access the API's files, then the jsonlite package to process it. The second step in this process is failing me. The code I have been trying to use is roughly as follows:
library(httr)
library(jsonlite)
req <- "http://request.path.com/thisisanapi/SingleZip?option1=yes&option2=no"
res <- GET(url = req)
#this works as expected, with res$status_code == 200
#OPTION 1:
api_char <- rawToChar(res$content)
api_call <- fromJSON(api_char, flatten=T)
#OPTION 2:
api_char2 <- content(res, "text")
api_call2 <- fromJSON(api_char2, flatten=T)
In option 1, the first line fails with an "embedded nul in string" error. In option 2, the second line fails with a "lexical error: invalid char in json text" error.
I did some reading and found a few related threads. First, this person looks to be doing a very similar thing to me, but did not experience this error (which suggests that maybe the files are zipped/stored differently between the APIs the two of us are using, or that I have set up the GET() incorrectly?). Second, this person seems to be experiencing a similar problem with converting the raw data from the API. I attempted the fix from this thread, but it did not work. In option 1, the first line ran but the second line gave a similar "lexical error: invalid char in json text" error as before, and in option 2, the second line gave an "if (is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < : missing value where TRUE/FALSE needed" error, which I am not quite sure how to interpret. This may be because the content type differs between our API pulls: mine is application/x-zip-compressed and theirs is text/tab-separated-values; charset=utf-16le, so maybe removing the null characters is altogether inappropriate here.
There is some documentation on usage of the API I am using*, but a lot of it is a few years old now and seems to focus more on manual usage rather than integration with large automated downloads like I am working on (my end goal is a loop which executes the process described many times over slightly varying urls). I am most certainly a beginner to using APIs like this, and would really appreciate some insight!
* = specifically, I am pulling from CAISO's OASIS API. If you want to follow along with some real files, replace "http://request.path.com/thisisanapi/SingleZip?option1=yes&option2=no" with "http://oasis.caiso.com/oasisapi/SingleZip?resultformat=6&queryname=PRC_INTVL_LMP&version=3&startdatetime=20201225T09:00-0000&enddatetime=20201226T9:10-0000&market_run_id=RTM&grp_type=ALL"
I think the main issue here is that you don't have a JSON return from the API. You have a .zip file being returned, as binary (I think?) data. Your challenge is to process that data. I don't think fromJSON() will help you, as the data from the API isn't in JSON format.
Here's how I would do it. I prefer to use the httr2 package. The process below makes it nice and clear what the parameters of the query are.
library(httr2)
req <- httr2::request("http://oasis.caiso.com/oasisapi")
query <- req %>%
  httr2::req_url_path_append("SingleZip") %>%
  httr2::req_url_query(resultformat = 6) %>%
  httr2::req_url_query(queryname = "PRC_INTVL_LMP") %>%
  httr2::req_url_query(version = 3) %>%
  httr2::req_url_query(startdatetime = "20201225T09:00-0000") %>%
  httr2::req_url_query(enddatetime = "20201226T9:10-0000") %>%
  httr2::req_url_query(market_run_id = "RTM") %>%
  httr2::req_url_query(grp_type = "ALL")
# Check what our query looks like
query
#> <httr2_request>
#> GET
#> http://oasis.caiso.com/oasisapi/SingleZip?resultformat=6&queryname=PRC_INTVL_LMP&version=3&startdatetime=20201225T09%3A00-0000&enddatetime=20201226T9%3A10-0000&market_run_id=RTM&grp_type=ALL
#> Body: empty
resp <- query %>%
  httr2::req_perform()
# Check what content type and encoding we have
# All looks good
resp %>%
  httr2::resp_content_type()
#> [1] "application/x-zip-compressed"
resp %>%
  httr2::resp_encoding()
#> [1] "UTF-8"
Created on 2022-08-30 with reprex v2.0.2
Then you have a choice of what to do if you want to write the data to a zip file.
I discovered that the brio package will write raw data to a file nicely. Alternatively, you can just use download.file to download the .zip from the URL (you can do that without all of the httr2 work above); you need to use mode = "wb".
resp %>%
  httr2::resp_body_raw() %>%
  brio::write_file_raw(path = "out.zip")
# alternative using your original URL or query$url
download.file(query$url, "out.zip", mode = "wb")
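Either way, once out.zip exists you still need to get at the file inside it. A small sketch (the file names inside the archive are not known up front, so list the contents first):
# Sketch: list the archive contents, extract, and read the CSV of interest.
files_in_zip <- unzip("out.zip", list = TRUE)$Name
unzip("out.zip", exdir = "out")
dat <- read.csv(file.path("out", files_in_zip[1]))
head(dat)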

how to retrieve data from a web server using an oauth2 token in R?

I've successfully received an access token from an OAuth 2.0 request so that I can start obtaining some data from the server. However, I keep getting error 403 on each attempt. APIs are very new to me and I am only entry level in using R, so I can't figure out what's wrong with my request. I'm currently using the crul package, but I've tried making the request with the httr package as well, and I can't get anything through without encountering the 403 error. I have a Shiny app which, in the end, I'd like to refresh with data imported from this other application that actually stores the data, but first I want to pull data to my console locally so I can understand the basic process. I will post some of my current attempts.
library(crul)
(x <- HttpClient$new(
  url = 'https://us.castoredc.com',
  opts = list(exceptions = FALSE),
  headers = list())
)
res.token <- x$post('oauth/token',
                    body = list(client_id = "{id}",
                                client_secret = "{secret}",
                                grant_type = 'client_credentials'))
importantStuff <- jsonlite::fromJSON(res.token$parse("UTF-8"))
token <- paste("Bearer", importantStuff$access_token)
I obtain my token, but the following doesn't seem to work. I'm attempting to get the list of study codes so that I can call on them in further requests to actually get data from a study.
res.studies <- x$get('/api/study',
                     headers = list(Authorization = token,
                                    client_id = "{id}",
                                    client_secret = "{secret}",
                                    grant_type = 'client_credentials'),
                     body = list(content_type = 'application/json'))
Their support team gave me the above endpoint to access the content, but I get 403, so I think I'm not using my token correctly?
status: 403
access-control-allow-headers: Authorization
access-control-allow-methods: Get,Post,Options,Patch
I'm the CEO at Castor EDC and although it's pretty cool to see a Castor EDC question on here, I apologize for the time you lost trying to figure this out. Was our support team not able to provide more assistance?
Regardless, I have actually used our API quite a bit in R, and we also have an amazing R engineer in house if you need more help.
Reflecting on your answer: yes, you always need a Study ID to be able to do anything interesting with the API. One thing that could make your life A LOT easier is our R API wrapper, which you can find here: https://github.com/castoredc/castoRedc
With that you would:
remotes::install_github("castoredc/castoRedc")
library(castoRedc)
castor_api <- CastorData$new(key = Sys.getenv("CASTOR_KEY"),
                             secret = Sys.getenv("CASTOR_SECRET"),
                             base_url = "https://data.castoredc.com")
example_study_id <- studies[["study_id"]][1]
fields <- castor_api$getFields(example_study_id)
etc.
Hope that makes your life a lot easier in the future.
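For completeness, a hedged sketch of where the `studies` object in the snippet above would come from, assuming the wrapper exposes a getStudies() method that returns a data frame of your studies:
# Assumed wrapper call: list your studies first, then pick an ID from the result.
studies <- castor_api$getStudies()
example_study_id <- studies[["study_id"]][1]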
So, after some investigation, it turns out that you first have to make a request to obtain another ID for each Castor study under your username. I will post some example code that finally worked.
req.studyinfo <- httr::GET(url = "https://us.castoredc.com/api/study",
                           httr::add_headers(Authorization = token))
json <- httr::content(req.studyinfo, as = "text")
studies <- jsonlite::fromJSON(json)
This will give you a list of your studies in Castor, from which you can obtain the ID that you care about for your endpoints. It will be a list that contains a data frame with this information.
You use the same format with whatever endpoint from their documentation you like to retrieve data. Thank you for your observations! I will leave this here in case anyone is employed to develop anything from data used in Castor EDC. Their documentation was vague to me, so maybe it will help someone in the future.
Example for next step:
req.studydata <- httr::GET("https://us.castoredc.com/api/study/{study id obtained from previous step}/data-point-collection/study",
                           httr::add_headers(Authorization = token))
json.data <- httr::content(req.studydata,as = "text")
data <- fromJSON(json.data)
This worked for me, I removed the Sys.getenv() part
library(castoRedc)
castor_api <- CastorData$new(key = "CASTOR_KEY",
secret = "CASTOR_SECRET",
base_url = "https://data.castoredc.com")
example_study_id <- studies[["study_id"]][1]
fields <- castor_api$getFields(example_study_id)

readLines equivalent when using Azure Data Lakes and R Server together

Using R Server, I want to simply read raw text (like readLines in base) from an Azure Data Lake. I can connect and get data like so:
library(RevoScaleR)
rxSetComputeContext("local")
oAuth <- rxOAuthParameters(params)
hdFS <- RxHdfsFileSystem(params)
file1 <- RxTextData("/path/to/file.txt", fileSystem = hdFS)
RxTextData doesn't actually go and get the data once that line is executed; it works more like a symbolic link. When you run something like:
rxSummary(~. , data=file1)
Then the data is retrieved from the data lake. However, it is always read in and treated as a delimited file. I want to either:
Download the file and store it locally with R code (preferably not).
Use some sort of readLines equivalent to get the data, but read it in 'raw' so that I can do my own data quality checks.
Does this functionality exist yet? If so, how is this done?
EDIT: I have also tried:
returnDataFrame = FALSE
inside RxTextData. This returns a list, but as I've stated, the data isn't read from the data lake until I run something like rxSummary, which then attempts to read it as a regular delimited file.
Context: I have a "bad" CSV file containing line feeds inside double quotes. This causes RxTextData to break. However, my script detects these occurrences and fixes them accordingly. Therefore, I don't want RevoScaleR to read in the data and try and interpret the delimiters.
I found a method of doing this by calling the Azure Data Lake Store REST API (adapted from a demo from Hadley Wickham's httr package on GitHub):
library(httpuv)
library(httr)
# 1. Insert the app name ----
app_name <- 'Any name'
# 2. Insert the client Id ----
client_id <- 'clientId'
# 3. API resource URI ----
resource_uri <- 'https://management.core.windows.net/'
# 4. Obtain OAuth2 endpoint settings for azure. ----
azure_endpoint <- oauth_endpoint(
  authorize = "https://login.windows.net/<tenantId>/oauth2/authorize",
  access = "https://login.windows.net/<tenantId>/oauth2/token"
)
# 5. Create the app instance ----
myapp <- oauth_app(
  appname = app_name,
  key = client_id,
  secret = NULL
)
# 6. Get the token ----
mytoken <- oauth2.0_token(
  azure_endpoint,
  myapp,
  user_params = list(resource = resource_uri),
  use_oob = FALSE,
  as_header = TRUE,
  cache = FALSE
)
# 7. Get the file. --------------------------------------------------------
test <- content(GET(
  url = "https://accountName.azuredatalakestore.net/webhdfs/v1/<PATH>?op=OPEN",
  add_headers(
    Authorization = paste("Bearer", mytoken$credentials$access_token),
    `Content-Type` = "application/json"
  )
)) ## Returns as a binary body.
df <- data.table::fread(readBin(test, "character")) ## use readBin to convert the binary body to text (fread is from data.table).
You can do it with ScaleR functions like so. Set the delimiter to a character that doesn't occur in the data, and ignore column names. This will create a data frame containing a single character column which you can manipulate as necessary.
# assuming that ASCII 0xff/255 won't occur in the data
src <- RxTextData("file", fileSystem="hdfs", delimiter="\xff", firstRowIsColNames=FALSE)
dat <- rxDataStep(src)
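From there, a readLines-style character vector is one step away; a minimal sketch:
# Sketch: treat the single column as raw lines and run your own checks
# (e.g. for line feeds inside quoted fields) before parsing as CSV.
lines <- as.character(dat[[1]])
head(lines)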
Although given that Azure Data Lake is really meant for storing big datasets, and this one seems to be small enough to fit in memory, I wonder why you couldn't just copy it to your local disk....

R - Twitter - fromJSON - get list of tweets

I would like to retrieve a list of tweets from Twitter for a given hashtag using the package RJSONIO in R. I think I am pretty close to the solution, but I seem to be missing one step.
My code reads as follows (in this example, I use #NBA as a hashtag):
library(httr)
library(RJSONIO)
# 1. Find OAuth settings for twitter:
# https://dev.twitter.com/docs/auth/oauth
oauth_endpoints("twitter")
# Replace key and secret below
myapp <- oauth_app("twitter",
key = "XXXXXXXXXXXXXXX",
secret = "YYYYYYYYYYYYYYYYY"
)
# 3. Get OAuth credentials
twitter_token <- oauth1.0_token(oauth_endpoints("twitter"), myapp)
# 4. Use API
req=GET("https://api.twitter.com/1.1/search/tweets.json?q=%23NBA&src=typd",
config(token = twitter_token))
req <- content(req, as = "text")
response=fromJSON(req)
How can I get the list of tweets from object 'response'?
Eventually, I would like to get something like:
searchTwitter("#NBA", n=5000, lang="en")
Thanks a lot in advance!
The response object should be a list of length two: statuses and metadata. So, for example, to get the text of the first tweet, try:
response$statuses[[1]]$text
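To collect the text of every returned tweet into a character vector, a small sketch (assuming the default RJSONIO parsing above, where statuses is a list of per-tweet lists):
# Sketch: extract the text field from each status in the parsed response.
tweets <- sapply(response$statuses, function(s) s$text)
head(tweets)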
However, there are a couple of R packages designed to make just this kind of thing easier: Try streamR for the streaming API, and twitteR for the REST API. The latter has a searchTwitter function exactly as you describe.

automating the login to the uk data service website in R with RCurl or httr

I am in the process of writing a collection of freely-downloadable R scripts for http://asdfree.com/ to help people analyze the complex sample survey data hosted by the UK data service. In addition to providing lots of statistics tutorials for these data sets, I also want to automate the download and importation of this survey data. In order to do that, I need to figure out how to programmatically log into this UK data service website.
I have tried lots of different configurations of RCurl and httr to log in, but I'm making a mistake somewhere and I'm stuck. I have tried inspecting the elements as outlined in this post, but the websites jump around too fast in the browser for me to understand what's going on.
This website does require a login and password, but I believe I'm making a mistake before I even get to the login page.
Here's how the website works:
The starting page should be: https://www.esds.ac.uk/secure/UKDSRegister_start.asp
This page will automatically re-direct your web browser to a long URL that starts with: https://wayf.ukfederation.org.uk/DS002/uk.ds?[blahblahblah]
(1) For some reason, the SSL certificate does not work on this website. Here's the SO question I posted regarding this. The workaround I've used is simply ignoring the SSL:
library(httr)
set_config( config( ssl.verifypeer = 0L ) )
and then my first command on the starting website is:
z <- GET( "https://www.esds.ac.uk/secure/UKDSRegister_start.asp" )
this gives me back a z$url that looks a lot like the https://wayf.ukfederation.org.uk/DS002/uk.ds?[blahblahblah] page that my browser also re-directs to.
In the browser, then, you're supposed to type in "uk data archive" and click the continue button. When I do that, it re-directs me to the web page https://shib.data-archive.ac.uk/idp/Authn/UserPassword
I think this is where I'm stuck because I cannot figure out how to have cURL followlocation and land on this website. Note: no username/password has been entered yet.
When I use the httr GET command from the wayf.ukfederation.org.uk page like this:
y <- GET( z$url , query = list( combobox = "https://shib.data-archive.ac.uk/shibboleth-idp" ) )
the y$url string looks a lot like z$url (except it's got a combobox= on the end). Is there any way to get through to this uk data archive authentication page with RCurl or httr?
I can't tell if I'm just overlooking something or if I absolutely must use the SSL certificate described in my previous SO post or what?
(2) At the point I do make it through to that page, I believe the remainder of the code would just be:
values <- list( j_username = "your.username" ,
j_password = "your.password" )
POST( "https://shib.data-archive.ac.uk/idp/Authn/UserPassword" , body = values)
But I guess that page will have to wait...
The relevant data variables returned by the form are action and origin, not combobox. Give action the value "selection" and origin the value from the relevant entry in combobox:
y <- GET( z$url, query = list( action="selection", origin = "https://shib.data-archive.ac.uk/shibboleth-idp") )
> y$url
[1] "https://shib.data-archive.ac.uk:443/idp/Authn/UserPassword"
Edit
It looks as though the handle pool isn't keeping your session alive correctly. You therefore need to pass the handles directly rather than automatically. Also for the POST command you need to set multipart=FALSE as this is the default for HTML forms. The R command has a different default as it is mainly designed for uploading files. So:
y <- GET( handle=z$handle, query = list( action="selection", origin = "https://shib.data-archive.ac.uk/shibboleth-idp") )
POST(body=values,multipart=FALSE,handle=y$handle)
Response [https://www.esds.ac.uk/]
Status: 200
Content-type: text/html
...snipped...
<title>
Introduction to ESDS
</title>
<meta name="description" content="Introduction to the ESDS, home page" />
I think one way to address the "enter your organization" page goes like this:
library(tidyverse)
library(rvest)
library(stringr)
org <- "your_organization"
user <- "your_username"
password <- "your_password"
signin <- "http://esds.ac.uk/newRegistration/newLogin.asp"
handle_reset(signin)
# get to org page and enter org
p0 <- html_session(signin) %>%
  follow_link("Login")
org_link <- html_nodes(p0, "option") %>%
  str_subset(org) %>%
  str_match('(?<=\\")[^"]*') %>%
  as.character()
f0 <- html_form(p0) %>%
  first() %>%
  set_values(origin = org_link)
fake_submit_button <- list(name = "submit-btn",
                           type = "submit",
                           value = "Continue",
                           checked = NULL,
                           disabled = NULL,
                           readonly = NULL,
                           required = FALSE)
attr(fake_submit_button, "class") <- "btn-enabled"
f0[["fields"]][["submit"]] <- fake_submit_button
c0 <- cookies(p0)$value
names(c0) <- cookies(p0)$name
p1 <- submit_form(session = p0, form = f0, config = set_cookies(.cookies = c0))
Unfortunately, that doesn't solve the whole problem—(2) is harder than it looks. I've got more of what I think is a solution posted here: R: use rvest (or httr) to log in to a site requiring cookies. Hopefully someone will help us get the rest of the way.
