getting binary data when using POST request in httr package - r

I am using the POST function in httr library to get some data and the code is shown below.
library(httr)
url = "https://xxxx:xxx#api.xxx/_search" #omitted for privacy
a = POST(url,body = query,encode = "json")
The query is shown below in the appendix. a$content is giving me a whole bunch of a hexadecimal numbers on which I have to use another function before I can get some useful data.
Ultimately I wish to get a data frame by using b = fromJSON(a$content). So far in order to get any data I have to use:
chr<-function(n){rawToChar(as.raw(n))}
b = jsonlite::fromJSON(chr(a$content))
data = b$hits$hits$`_source`
This seems inefficient considering that I am parsing in the data through a local function to get the final data. So my questions are as follows:
Am I using the POST function correctly to get the query?
Is there a more efficient (faster) way of getting my data into a data frame?
Appendix:
query = '
{
"_source": [
"start","source.country_codes",
"dest.country_codes"
],
"size": 100,
"query": {
"bool": {
"must": [
{
"bool": {
"must_not": [
{
"range": {
"start": {
"lte": "2013-01-01T00:00:00"
}
}
},
{
"range": {
"start": {
"gt": "2016-05-19T00:00:00"
}
}
}
]
}
}
]
}
}
}'

POST function looks good.
js<-fromJSON(content(a,as="text"))

Related

Postman Schema Validation using TV4

I'm having trouble validating a schema in Postman using tv4 inside the tests tab - it is always returning a true test, no matter what I feed it. I am at a complete loss and could really use a hand - here is my example JSON Response, and my tests:
I've tried a ton of variations from every Stack Overflow/tutorial I could find and nothing will work - it always returns true.
//Test Example
var jsonData = JSON.parse(responseBody);
const schema = {
"required" : ["categories"],
"properties": {
"categories": {
"required" : ["aStringOne", "aStringTwo", "aStringThree" ],
"type": "array",
"properties" : {
"aStringOne": {"type": "string" },
"aStringTwo": {"type": "null" },
"aStringThree": {"type": "boolean" }
}
}
}
};
pm.test('Schema is present and accurate', () => {
var result=tv4.validateMultiple(jsonData, schema);
console.log(result);
pm.expect(result.valid).to.be.true;
});
//Response Example
{
"categories": [
{
"aStringOne": "31000",
"aStringTwo": "Yarp",
"aStringThree": "More Yarp Indeed"
}
]
}
This should return false, as all three properties are strings but its passing. I'm willing to use a different validator or another technique as long as I can export it as a postman collection to use with newman in my CI/CD process. I look forward to any help you can give.
I would suggest moving away from using tv4 in Postman, the project isn't actively supported and Postman now includes a better (in my opinion), more actively maintained option called Ajv.
The syntax is slightly different but hopefully, this gives you an idea of how it could work for you.
I've mocked out your data and just added everything into the Tests tab - If you change the jsonData variable to pm.response.json() it will run against the actual response body.
var jsonData = {
"categories": [
{
"aStringOne": "31000",
"aStringTwo": "Yarp",
"aStringThree": "More Yarp Indeed"
}
]
}
var Ajv = require('ajv'),
ajv = new Ajv({logger: console, allErrors: true}),
schema = {
"type": "object",
"required": [ "categories"],
"properties": {
"categories": {
"type": "array",
"items": {
"type": "object",
"required": [ "aStringOne", "aStringTwo", "aStringThree" ],
"properties": {
"aStringOne": { "type": "string" },
"aStringTwo": { "type": "integer"},
"aStringThree": { "type": "boolean"},
}
}
}
}
}
pm.test('Schema is valid', function() {
pm.expect(ajv.validate(schema, jsonData), JSON.stringify(ajv.errors)).to.be.true
});
This is an example of it failing, I've included the allErrors flag so that it will return all the errors rather than just the first one it sees. In the pm.expect() method, I've added JSON.stringify(ajv.errors) so you can see the error in the Test Result tab. It's a little bit messy and could be tidied up but all the error information is there.
Setting the properties to string show the validation passing:
If one of the required Keys is not there, it will also error for this too:
Working with schemas is quite difficult and it's not easy to both create them (nested arrays and objects are tricky) and ensure they are doing what you want to do.
There are occasions where I thought something should fail and it passed the validation test. It just takes a bit of learning/practising and once you understand the schema structures, they can become extremely useful.

ElasticSearch - difference between two date fields

I have an index in ElasticSearch with two fields of date type (metricsTime & arrivalTime). A sample document is quoted below. In Kibana, I created a scripted field delay for the difference between those two fields. My painless script is:
doc['arrivalTime'].value - doc['metricsTime'].value
However, I got the following error message when navigating to Kibana's Discover tab: class_cast_exception: Cannot apply [-] operation to types [org.joda.time.MutableDateTime] and [org.joda.time.MutableDateTime].
This looks same as the error mentioned in https://discuss.elastic.co/t/problem-in-difference-between-two-dates/121655. But the answer in that page suggests that my script is correct. Could you please help?
Thanks!
{
"_index": "events",
"_type": "_doc",
"_id": "HLV274_1537682400000",
"_version": 1,
"_score": null,
"_source": {
"metricsTime": 1537682400000,
"box": "HLV274",
"arrivalTime": 1539930920347
},
"fields": {
"metricsTime": [
"2018-09-23T06:00:00.000Z"
],
"arrivalTime": [
"2018-10-19T06:35:20.347Z"
]
},
"sort": [
1539930920347
]
}
Check the list of Lucene Expressions to check what expressions are available for date field and how you could use them
Just for sake of simplicity, check the below query. I have created two fields metricsTime and arrivalTime in a sample index I've created.
Sample Document
POST mydateindex/mydocs/1
{
"metricsTime": "2018-09-23T06:00:00.000Z",
"arrivalTime": "2018-10-19T06:35:20.347Z"
}
Query using painless script
POST mydateindex/_search
{ "query": {
"bool": {
"must": {
"match_all": {
}
},
"filter": {
"bool" : {
"must" : {
"script" : {
"script" : {
"inline" : "doc['arrivalTime'].date.dayOfYear - doc['metricsTime'].date.dayOfYear > params.difference",
"lang" : "painless",
"params": {
"difference": 2
}
}
}
}
}
}
}
}
}
Note the below line in the query
"inline" : "doc['arrivalTime'].date.dayOfYear - doc['metricsTime'].date.dayOfYear > params.difference"
Now if you change the value of difference from 2 to 26 (which is one more than the difference in the dates) then you see that the above query would not return the document.
But nevertheless, I have mentioned the query as an example as how using scripting you can compare two different and please do refer to the link I've shared.

Querying Cosmos Nested JSON documents

I would like to turn this resultset
[
{
"Document": {
"JsonData": "{\"key\":\"value1\"}"
}
},
{
"Document": {
"JsonData": "{\"key\":\"value2\"}"
}
}
]
into this
[
{
"key": "value1"
},
{
"key": "value2"
}
]
I can get close by using a query like
select value c.Document.JsonData from c
however, I end up with
[
"{\"key\":\"value1\"}",
"{\"key\":\"value2\"}"
]
How can I cast each value to an individual JSON fragment using the SQL API?
As David Makogon said above, we need to transform such data within our app. We can do as below:
string data = "[{\"key\":\"value1\"},{\"key\":\"value2\"}]";
List<Object> t = JsonConvert.DeserializeObject<List<Object>>(data);
string jsonData = JsonConvert.SerializeObject(t);
Screenshot of result:

Insert date as epoch_seconds, output as formatted date

I have a set of timestamps formatted as seconds since the epoch. I'd like to insert to ElasticSearch as epoch_seconds but when querying would like to see the output as a pretty date, e.g. strict_date_optional_time.
My below mapping preserves the format that the input came in - is there any way to normalize the output to just one format via the mapping api?
Current Mapping:
PUT example
{
"mappings": {
"time": {
"properties": {
"time_stamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_second"
}
}
}
}
}
Example docs
POST example/time
{
"time_stamp": "2018-03-18T00:00:00.000Z"
}
POST example/time
{
"time_stamp": "1521389162" // Would like this to output as: 2018-03-18T16:05:50.000Z
}
GET example/_search output:
{
"total": 2,
"max_score": 1,
"hits": [
{
"_source": {
"time_stamp": "1521389162", // Stayed as epoch_second
}
},
{
"_source": {
"time_stamp": "2018-03-18T00:00:00.000Z"
}
}
]
}
Elasticsearch differentiates between the _source and the so called stored fields. The first one is supposed to represent your input.
If you actually use stored fields (by specifying store=true in your mapping) then specify multiple date formats this is easy: (emphasis mine)
Multiple formats can be specified by separating them with || as a separator. Each format will be tried in turn until a matching format is found. The first format will be used to convert the milliseconds-since-the-epoch value back into a string.
I have tested this with elasticsearch 5.6.4 and it works fine:
PUT /test -d '{ "mappings": {"doc": { "properties": {"post_date": {
"type":"date",
"format":"basic_date_time||epoch_millis",
"store":true
} } } } }'
PUT /test/doc/2 -d '{
"user" : "test1",
"post_date" : "20150101T121030.000+01:00"
}'
PUT /test/doc/1 -d '{
"user" : "test2",
"post_date" : 1525167490500
}'
Note how two different input-formats will result in the same format when using GET /test/_search?stored_fields=post_date&pretty=1
{
"hits" : [
{
"_index" : "test",
"_type" : "doc",
"_id" : "2",
"_score" : 1.0,
"fields" : {
"post_date" : [
"20150101T111030.000Z"
]
}
},
{
"_index" : "test",
"_type" : "doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"post_date" : [
"20180501T093810.500Z"
]
}
}
]
}
If you want to change the input (in _source) you're not so lucky, the mapping-transform feature has been removed:
This was deprecated in 2.0.0 because it made debugging very difficult. As of now there really isn’t a feature to use in its place other than transforming the document in the client application.
If, instead of changing the stored data you are interested in formatting the output, have a look at this answer to Format date in elasticsearch query (during retrieval)

elastic call and loops/apply in R

How do I run a loop/apply that will run an elastic search call with a different body each time?
I need to query (filtered query) an index for the word "local" for let's see, 2000 to 2010 and I want to store the results in R. Here I made an example with years from 2000 to 2001. I tried with paste but it wouldn't work because it messes up the JSON language in the body part. I can't share the es_base, unfortunately, but I would appreciate any help.
Thanks!
library(elastic)
connect(es_base = "http://...", es_port = "")
years <- list(
b <-'{
"query": {
"filtered": {
"query": {"term": { "signature_year": 2000}},
"filter": {
"term": { "text": "local"}
}
}
}
}',
c <- '{
"query": {
"filtered": {
"query": {"term": { "signature_year": 2001}},
"filter": {
"term": { "text": "local"}
}
}
}
}'
)
a<-for(i in years) {
Search( index = "index", b = i)$hits$total
}

Resources