I've been trying to extract multiple DNA-sequence alignments in R (4.0.3) by calling Ensembl's alignment REST API endpoint. A toy example is below:
library(httr)
library(jsonlite)
tmp_chr = "16"
tmp_seq_str = "87187517"
tmp_seq_end = "87187717"
server = "http://rest.ensembl.org"
ext = paste0("/alignment/region/homo_sapiens/", tmp_chr, ":", tmp_seq_str, "-",
             tmp_seq_end, "?species_set_group=primates")
r = GET(paste(server, ext, sep = ""), content_type("application/json"))
json_object = fromJSON(toJSON(content(r)))[[1]]
The toJSON() call works for some genomic locations, but for others it fails with the error message below:
Error in toJSON(content(r)) : unable to convert R type 22 to JSON
I was wondering if I am doing something wrong or if this is an issue with jsonlite. Please let me know if you need any additional info to reproduce the error. Many thanks!
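One workaround worth noting (a sketch, reusing server and ext from above; it sidesteps the toJSON() round-trip rather than explaining the type-22 error) is to fetch the body as text and let fromJSON() parse it directly, so jsonlite never has to re-serialize an already-parsed R object:
r <- GET(paste0(server, ext), content_type("application/json"))
stop_for_status(r)
# Parse the raw JSON text in one step; content(r) alone returns an
# already-parsed R list, which toJSON() may fail to re-serialize
json_object <- fromJSON(content(r, as = "text", encoding = "UTF-8"))[[1]]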
I am unable to download or read a zip file from an API request using the httr package. Is there another package I can try that will allow me to download/read binary zip files stored within the response of a GET request in R?
I tried two ways:
1. I used GET to get an application/json response object (successful), then used fromJSON to extract the content via content(my_response, 'text'). The output includes a column called 'zip', which the documentation states is a base64-encoded binary file. This column is currently a really long string of seemingly random letters, and I'm not sure how to convert it to the actual dataset.
2. I tried bypassing fromJSON, because I noticed there is a field of class 'raw' within the response object itself. This object is a list of numbers which I suspect are the binary representation of the dataset. I tried rawToChar(my_response$content) to convert the raw data to character, but this results in the same long character string as in #1.
I noticed that with approach #1, if I use base64_dec() on the long character string, I get the same type of output as the 'raw' field within the response object itself.
getzip1 <- GET(getzip1_link)
getzip1 # successful response, status 200
df <- fromJSON(content(getzip1, "text"))
df$status # "OK"
df$dataset$zip # <- this is the very long string of letters (eg. "I1NC5qc29uUEsBAhQDFA...")
# Method 1: try to convert from the 'zip' object in the output of fromJSON
try1 <- base64_dec(df$dataset$zip)
#looks similar to getzip1$content (i.e. this produces the list of numbers/letters 50 4b 03 04 14 00, etc, perhaps binary representation)
# Method 2: try to get data directly from raw object
class(getzip1$content) # <- 'raw' class object directly from GET request
try2 <- rawToChar(getzip1$content) # returns the same output as df$dataset$zip
I should be able to use either the raw 'content' object from my response or the long character string in the 'zip' object of the output of fromJSON in order to view the dataset or somehow download it. I don't know how to do this. Please help!
Welcome!
Based on the documentation for the API, the response to the getDataset endpoint has the schema:
Dataset archive including meta information, the dataset itself is base64 encoded to allow for binary ZIP
transfers.
{
"status": "OK",
"dataset": {
"state_id": 5,
"session_id": 1624,
"session_name": "2019-2020 Regular Session",
"dataset_hash": "1c7d77fe298a4d30ad763733ab2f8c84",
"dataset_date": "2018-12-23",
"dataset_size": 317775,
"mime": "application\/zip",
"zip": "MIME 64 Encoded Document"
}
}
We can obtain the data with the following R code:
library(httr)
library(jsonlite)
library(stringr)
library(maditr)
token <- "" # Your API key
session_id <- 1253L # Obtained from the getDatasetList endpoint
access_key <- "2qAtLbkQiJed9Z0FxyRblu" # Obtained from the getDatasetList endpoint
destfile <- file.path("path", "to", "file.zip") # Modify
response <- str_c("https://api.legiscan.com/?key=",
                  token,
                  "&op=getDataset&id=",
                  session_id,
                  "&access_key=",
                  access_key) %>%
  GET()
status_code(x = response) == 200 # Good
body <- content(x = response,
                as = "text",
                encoding = "utf8") %>%
  fromJSON() # This contains some extra metadata
content(x = response,
        as = "text",
        encoding = "utf8") %>%
  fromJSON() %>%
  getElement(name = "dataset") %>%
  getElement(name = "zip") %>%
  base64_dec() %>%
  writeBin(con = destfile)
unzip(zipfile = destfile)
unzip() will extract the files, which in this case look like:
hash.md5 # Can be checked against the metadata
AL/2016-2016_1st_Special_Session/bill/*.json
AL/2016-2016_1st_Special_Session/people/*.json
AL/2016-2016_1st_Special_Session/vote/*.json
As always, wrap your code in functions and profit.
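For example, a minimal wrapper over the steps above might look like this (a sketch; the function name get_legiscan_dataset is mine, and it relies on the same libraries loaded earlier):
get_legiscan_dataset <- function(token, session_id, access_key, destfile) {
  # Build the request URL, fetch, decode the base64 payload, and unzip
  response <- str_c("https://api.legiscan.com/?key=", token,
                    "&op=getDataset&id=", session_id,
                    "&access_key=", access_key) %>%
    GET()
  stop_for_status(response)
  content(x = response, as = "text", encoding = "utf8") %>%
    fromJSON() %>%
    getElement(name = "dataset") %>%
    getElement(name = "zip") %>%
    base64_dec() %>%
    writeBin(con = destfile)
  unzip(zipfile = destfile)
}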
PS: Here is how the code would look in Julia, as a comparison.
using Base64, HTTP, JSON3, CodecZlib
token = "" # Your API key
session_id = 1253 # Obtained from the getDatasetList endpoint
access_key = "2qAtLbkQiJed9Z0FxyRblu" # Obtained from the getDatasetList endpoint
destfile = joinpath("path", "to", "file.zip") # Modify
response = string("https://api.legiscan.com/?",
                  join(["key=$token",
                        "op=getDataset",
                        "id=$session_id",
                        "access_key=$access_key"],
                       "&")) |>
    HTTP.get
@assert response.status == 200
JSON3.read(response.body) |>
    (content -> content.dataset.zip) |>
    base64decode |>
    (data -> write(destfile, data))
run(`unzip $destfile`)
I want to use this API:
http(s)://lindat.mff.cuni.cz/services/morphodita/api/
with the "tag" method, which tags and lemmatizes text input. It has worked fine with a text string (see below), but I need to send an entire file to the API.
Just to show that string as input works fine:
method <- "tag"
lemmatized_text <- RCurl::getForm(
  paste("http://lindat.mff.cuni.cz/services/morphodita/api/", method, sep = ""),
  .params = list(data = "Peter likes cakes. John likes lollypops.",
                 output = "json",
                 model = "english-morphium-wsj-140407-no_negation"),
  method = method)
This is the correct result:
[1] "{\n \"model\": \"english-morphium-wsj-140407-no_negation\",\n
\"acknowledgements\": [\n \"http://ufal.mff.cuni.cz
/morphodita#morphodita_acknowledgements\",\n \"http://ufal.mff.cuni.cz
/morphodita/users-manual#english-morphium-wsj_acknowledgements\"\n ],\n
\"result\": [[{\"token\":\"Peter\",\"lemma\":\"Peter\",\"tag\":\"NNP
\",\"space\":\" \"},{\"token\":\"likes\",\"lemma\":\"like\",\"tag\":\"VBZ
\",\"space\":\" \"},{\"token\":\"cakes\",\"lemma\":\"cake\",\"tag\":\"NNS
[truncated by me]
However, replacing the string with a vector whose elements correspond to lines of a text file does not work, since the API requires a single string as input. Only one vector element (by default the first) is processed:
method <- "tag"
mydata <- c("cakes.", "lollypops")
lemmatized_text <- RCurl::getForm(paste("http://lindat.mff.cuni.cz
/services/morphodita/api/", method, sep = ""),
.params = list(data = mydata, output = "json",
model = "english-morphium-wsj-140407-no_negation"))
[1] "{\n \"model\": \"english-morphium-wsj-140407-no_negation\",\n
[truncated by me]
\"result\": [[{\"token\":\"cakes\",\"lemma\":\"cake\",\"tag\":\"NNS
\"},{\"token\":\".\",\"lemma\":\".\",\"tag\":\".\"}]]\n}\n"
This issue can be alleviated with sapply and a function calling the API on each element of the vector, but then each element of the resulting vector contains a separate JSON document. To parse the output, I need all the data in one single JSON document, though.
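That per-element workaround looks roughly like this (a sketch, assuming method and mydata as above; it reproduces the one-JSON-document-per-element problem just described):
# One API call per vector element; each element of lemmatized_parts is
# its own separate JSON document
lemmatized_parts <- sapply(mydata, function(x)
  RCurl::getForm(
    paste("http://lindat.mff.cuni.cz/services/morphodita/api/", method, sep = ""),
    .params = list(data = x, output = "json",
                   model = "english-morphium-wsj-140407-no_negation")))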
Eventually I tried textConnection, but it returns erroneous output:
mydata <- c("cakes.", "lollypops")
mycon <- textConnection(mydata, encoding = "UTF-8")
lemmatized_text <- RCurl::getForm(
  paste("http://lindat.mff.cuni.cz/services/morphodita/api/", method, sep = ""),
  .params = list(data = mycon, output = "json",
                 model = "english-morphium-wsj-140407-no_negation"))
[1] "{\n \"model\": \"english-morphium-wsj-140407-no_negation\",\n
\"acknowledgements\": [\n \"http://ufal.mff.cuni.cz
/morphodita#morphodita_acknowledgements\",\n \"http://ufal.mff.cuni.cz
/morphodita/users-manual#english-morphium-wsj_acknowledgements\"\n ],\n
\"result\": [[{\"token\":\"5\",\"lemma\":\"5\",\"tag\":\"CD\"}]]\n}\n"
attr(,"Content-Type")
I should probably also say that I have already tried pasting and collapsing the vector into one single element, but that is very fragile: it works with dummy data, but not with larger files, and never with Czech files (although they are UTF-8 encoded). The API strictly requires UTF-8-encoded data, so I suspect encoding issues. I have tried this file:
mydata <- RCurl::getURI("https://ia902606.us.archive.org/4/items/maidmarian00966gut/maidm10.txt", .opts = list(.encoding = "UTF-8"))
and it said
Error: Bad Request
but when I used only a few lines, it suddenly worked. I also made a local copy of the file where I changed the newlines from Macintosh to Windows style. Maybe this helped a bit, but it was definitely not sufficient.
Eventually I should add that I work on Windows 8 Professional, running R 3.2.4 64-bit, with RStudio version 0.99.879.
I should have used RCurl::postForm instead of RCurl::getForm, with all other arguments remaining the same. The postForm function is not limited to writing files on the server, as I had wrongly believed. It also does not impose strict limits on the size of the data to be processed, since with postForm the data do not become part of the URL, unlike with getForm.
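Applied to the earlier toy example, the fix is essentially just swapping the verb (a sketch; myfile.txt is a placeholder for the file to be tagged):
method <- "tag"
# Collapse the file into a single UTF-8 string, since the API expects one string
mydata <- paste(readLines("myfile.txt", encoding = "UTF-8"), collapse = "\n")
lemmatized_text <- RCurl::postForm(
  paste("http://lindat.mff.cuni.cz/services/morphodita/api/", method, sep = ""),
  .params = list(data = mydata, output = "json",
                 model = "english-morphium-wsj-140407-no_negation"))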
This is my convenience function (requires RCurl, stringi, stringr, magrittr):
process_w_morphodita <- function(method, data, output = "json",
                                 model = "czech-morfflex-pdt-161115",
                                 guesser = "yes", ...) {
  # For formally optional but very important argument-value pairs, see the
  # MorphoDiTa REST API reference at
  # http://lindat.mff.cuni.cz/services/morphodita/api-reference.php
  pokus <- RCurl::postForm(
    paste("http://lindat.mff.cuni.cz/services/morphodita/api/", method, sep = ""),
    .params = list(data = stringi::stri_enc_toutf8(data), output = output,
                   model = model, guesser = guesser, ...))
  if (output == "vertical") {
    # Look for escaped backslash sequences (four backslashes in the pattern)
    # and replace them with real tab and newline characters to get the
    # vertical format in a text file
    pokus <- pokus %>%
      stringr::str_trim(side = "both") %>%
      stringr::str_conv("UTF-8") %>%
      stringr::str_replace_all(pattern = "\\\\t", replacement = "\t") %>%
      stringr::str_replace_all(pattern = "\\\\n", replacement = "\n")
  }
  return(pokus)
}
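A hypothetical call (the file name is assumed):
tagged <- process_w_morphodita(
  method = "tag",
  data = paste(readLines("czech_text.txt", encoding = "UTF-8"),
               collapse = "\n"))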
I am trying to obtain data from a website, and thanks to a helper I arrived at the following script:
require(httr)
require(rvest)
res <- httr::POST(url = "http://apps.kew.org/wcsp/advsearch.do",
                  body = list(page = "advancedSearch",
                              AttachmentExist = "",
                              family = "",
                              placeOfPub = "",
                              genus = "Arctodupontia",
                              yearPublished = "",
                              species = "scleroclada",
                              author = "",
                              infraRank = "",
                              infraEpithet = "",
                              selectedLevel = "cont"),
                  encode = "form")
pg <- content(res, as="parsed")
lnks <- html_attr(html_node(pg,"td"), "href")
However, in some cases, like the example above, it does not retrieve the right link because, for some reason, html_attr does not find URLs ("href") within the node detected by html_node. So far, I have tried different CSS selectors, like "td", "a.onwardnav", and ".plantname", but none of them generates an object that html_attr can handle correctly.
Any hint?
You are really close to getting the answer you were expecting. If you would like to pull the links off of the desired page, then:
lnks <- html_attr(html_nodes(pg,"a"), "href")
will return the "href" attribute of every "a" tag, i.e. all of the links. Notice the command is html_nodes and not html_node: there are multiple "a" tags, thus the plural.
If you are looking for the information from the table in the body of the page, then try this:
html_table(pg, fill=TRUE)
#or this
html_nodes(pg,"tr")
The second line will return a list of the 9 rows from the table, which one could then parse to obtain the row names ("th") and/or row values ("td").
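For instance, a sketch of pulling the header and value cells out of those rows:
rows <- html_nodes(pg, "tr")
row_names <- html_text(html_nodes(rows, "th"))  # header cells
row_values <- html_text(html_nodes(rows, "td")) # value cells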
Hope this helps.
I've successfully imported my data into R as transactions, but when I try targeting a specific website, I get this error:
Error in asMethod(object) : FACEBOOK.COM is an unknown item label
Is there any reason why this could be happening? Here is a snippet of code:
target.conf80 = apriori(trans,
                        parameter = list(supp = .002, conf = .8),
                        appearance = list(default = "lhs", rhs = "FACEBOOK.COM"),
                        control = list(verbose = F))
target.conf80 = sort(target.conf80,decreasing=TRUE,by="confidence")
inspect(target.conf80[1:10])
Thanks!
Here is what the transactions look like:
1 {V1=Google,
V2=Google Web Search,
V3=FACEBOOK.COM} 1
2 {V1=FACEBOOK.COM,
V2=MCAFEE.COM,
V3=7EER.NET,
V4=Google} 2
3 {V1=MCAFEE.COM,
The problem is the way you read/convert the data to transactions. The transactions should look like:
1 {Google,
Google Web Search,
FACEBOOK.COM} 1
2 {FACEBOOK.COM,
MCAFEE.COM,
7EER.NET,
Google} 2
3 {MCAFEE.COM,
...
Without the V1, V2, etc. prefixes. In your transactions, V1=Google and V4=Google are different items.
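A quick way to check (a sketch, assuming your transactions object is called trans) is to list the item labels and see whether FACEBOOK.COM appears on its own or only as V3=FACEBOOK.COM:
library(arules)
# The rhs label must match an item label exactly; check what is really there
grep("FACEBOOK", itemLabels(trans), value = TRUE)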
Error as(data, 'transactions') From Data Frames
I'm assuming that the dataset was transformed as follows: data <- as(data, 'transactions'). If you run that code without first performing some manipulations of your data, you will get those V1, V2, ... prefixes.
Cleaning Data Before Transactions
I want to include how to manipulate the data so it is ready for read.transactions(). After importing your data into R, convert your data frame to a matrix like so: d.matrix <- as.matrix(df). Then eliminate the headers, if you have any: colnames(d.matrix) <- NULL. Now you don't have headers. After that you want to....
write.table(x = d.matrix,
            file = 'clean_data.csv',
            sep = ',',
            col.names = FALSE,
            row.names = FALSE)
Finally, you want to import the data as transactions like so...
data <- read.transactions('clean_data.csv',
                          format = 'basket',
                          sep = ',',
                          rm.duplicates = TRUE)
Now you have a dataset with no V1, V2, V3, ... prefixes and no row IDs.
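With the cleaned transactions, the targeting call from the question should then run as intended (a sketch reusing the question's parameters):
# Re-run the targeting against the clean transactions object from above
target.conf80 <- apriori(data,
                         parameter = list(supp = .002, conf = .8),
                         appearance = list(default = "lhs",
                                           rhs = "FACEBOOK.COM"),
                         control = list(verbose = FALSE))
inspect(sort(target.conf80, decreasing = TRUE, by = "confidence")[1:10])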
I am trying to extract data from Localytics using R. Here is the snippet of code I'm using:
library(httr)
localytics_url = 'https://api.localytics.com/v1/query'
r <- POST(url = localytics_url,
          body = list(
            app_id = app_id,
            metrics = c("users", "revenue"),
            dimensions = c("day", "birth_day"),
            conditions = list(
              day = c("between", "2015-02-01", "2015-04-01")
            )
          ),
          encode = "json",
          authenticate(key, secret),
          accept("application/json"),
          content_type("application/json"))
stop_for_status(r)
content(r)
But the output I get from content() is binary, not JSON. I'm confused. Furthermore, if I look at the object 'r', I see:
Response [https://api.localytics.com/v1/query]
Date: 2015-04-14 15:18
Status: 200
Content-Type: application/vnd.localytics.v1+hal+json;type=ResultSet; charset=utf-8
Size: 1.02 MB
<BINARY BODY>
I don't understand why it's a binary body or how to convert it back. Can anyone give me any help/clues?
I've also tried this with RCurl, using the following code:
cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")
object <- getForm(uri = localytics_url, app_id = app_id, metrics = "customers",
                  dimensions = "day",
                  conditions = toJSON(list(day = c("between", "2015-01-01", "2015-04-09"))),
                  .opts = curlOptions(userpwd = sprintf("%s:%s", key, password)))
But that generates the error
Error in function (type, msg, asError = TRUE) :
SSL certificate problem: unable to get local issuer certificate
So I'm a bit stumped.
######## Added April 15, 2015
First, thanks to MrFlick for his help so far. I got it to work with:
contents=content(r, as="text")
Thanks very much for your help. I (think I) had tried that before and then went on to try extracting it to an R data format using fromJSON, but I was using the rjson library; the jsonlite package worked for me.
I appreciate your patience.
Here's a complete sample of code showing how you would get the data, then extract the results and view them as a table.
library(httr)
library(jsonlite)
response <- POST(url = 'https://api.localytics.com/v1/query',
                 body = list(
                   app_id = 'APP_ID',
                   metrics = 'sessions',
                   conditions = list(
                     day = c("between", format(Sys.Date() - 31, "%Y-%m-%d"),
                             format(Sys.Date() - 1, "%Y-%m-%d"))
                   ),
                   dimensions = c('new_device', 'day')
                 ),
                 encode = "json",
                 authenticate('KEY', 'SECRET'),
                 accept("application/json"),
                 content_type("application/json"))
stop_for_status(response)
# Convert the content of the result to a string that you can load with jsonlite
result <- paste(rawToChar(response$content), collapse = "")
# Useful for printing your result in case you are getting any errors
print(result)
# Load your data with jsonlite
document <- fromJSON(result)
# The results tag contains the table of data you need
View(document$results)
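Equivalently, httr can do the raw-to-text conversion for you, matching the content(r, as = "text") approach from the question's update:
result <- content(response, as = "text", encoding = "UTF-8")
document <- fromJSON(result)
View(document$results)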