I am trying to convert my JSON Elasticsearch query to Java using Spring Data. I am almost there, but the problem is that I cannot get "size": 0 into my query in Java.
GET someserver/_search
{
  "query": { ... },
  "size": 0,
  "aggregations": {
    "parent_aggregation": {
      "terms": {
        "field": "fs.id"
      },
      "aggs": {
        "sub_aggs": {
          "top_hits": {
            "sort": [
              {
                "fs.smallVersion": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}
In Java I am building a NativeSearchQuery object, on which I think it should be possible to set the size?
NativeSearchQuery searchQuery = createNativeSearchQuery(data, validIndices, query, filter);
es.getElasticsearchTemplate().query(searchQuery, response -> extractResult(data, response));
If you are using Elasticsearch before 2.0, you can use the search type feature to do what you want, which is to return no documents. This can be accomplished using the NativeSearchQueryBuilder: if you set the SearchType to COUNT, you will not get docs. Beware that in Elasticsearch 2.x this is deprecated and you should use "size": 0 instead. When the Spring Data Elasticsearch project supports Elasticsearch 2.0 this will most likely change, and the size attribute should be exposed in this builder as well.
SearchQuery searchQuery = new NativeSearchQueryBuilder()
    .withQuery(matchAllQuery())
    .withSearchType(SearchType.COUNT)
    .withIndices("yourindex")
    .addAggregation(terms("nameofagg").field("thefield"))
    .build();
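For completeness, the terms + top_hits aggregation from the question can be attached to the same builder. The following is only a sketch against the pre-2.0 Java client's AggregationBuilders (terms, topHits); treat the exact method names as assumptions to verify against your client version:

import org.elasticsearch.search.sort.SortOrder;
import static org.elasticsearch.search.aggregations.AggregationBuilders.terms;
import static org.elasticsearch.search.aggregations.AggregationBuilders.topHits;

SearchQuery searchQuery = new NativeSearchQueryBuilder()
    .withQuery(matchAllQuery())
    .withSearchType(SearchType.COUNT) // no hits in the response, only aggregations
    .withIndices("yourindex")
    .addAggregation(terms("parent_aggregation").field("fs.id")
        .subAggregation(topHits("sub_aggs")
            .setSize(1) // keep only the top document per bucket
            .addSort("fs.smallVersion", SortOrder.DESC)))
    .build();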
I'm having trouble validating a schema in Postman using tv4 inside the Tests tab - it always returns a passing test, no matter what I feed it. I am at a complete loss and could really use a hand - here are my example JSON response and my tests:
I've tried a ton of variations from every Stack Overflow answer and tutorial I could find, and nothing works - it always returns true.
//Test Example
var jsonData = JSON.parse(responseBody);
const schema = {
    "required": ["categories"],
    "properties": {
        "categories": {
            "required": ["aStringOne", "aStringTwo", "aStringThree"],
            "type": "array",
            "properties": {
                "aStringOne": { "type": "string" },
                "aStringTwo": { "type": "null" },
                "aStringThree": { "type": "boolean" }
            }
        }
    }
};
pm.test('Schema is present and accurate', () => {
    var result = tv4.validateMultiple(jsonData, schema);
    console.log(result);
    pm.expect(result.valid).to.be.true;
});
//Response Example
{
    "categories": [
        {
            "aStringOne": "31000",
            "aStringTwo": "Yarp",
            "aStringThree": "More Yarp Indeed"
        }
    ]
}
This should return false, as all three properties are strings, but it's passing. I'm willing to use a different validator or another technique, as long as I can export it as a Postman collection to use with Newman in my CI/CD process. I look forward to any help you can give.
I would suggest moving away from using tv4 in Postman; the project isn't actively supported, and Postman now includes a better (in my opinion) and more actively maintained option called Ajv. As an aside, the reason your tv4 test always passes is that your schema applies "required" and "properties" to an array: those keywords only constrain objects, so the items inside the array are never validated - that is what the "items" keyword is for, as used below.
The syntax is slightly different but hopefully this gives you an idea of how it could work for you.
I've mocked out your data and just added everything into the Tests tab. If you change the jsonData variable to pm.response.json(), it will run against the actual response body.
var jsonData = {
    "categories": [
        {
            "aStringOne": "31000",
            "aStringTwo": "Yarp",
            "aStringThree": "More Yarp Indeed"
        }
    ]
}

var Ajv = require('ajv'),
    ajv = new Ajv({ logger: console, allErrors: true }),
    schema = {
        "type": "object",
        "required": ["categories"],
        "properties": {
            "categories": {
                "type": "array",
                "items": {
                    "type": "object",
                    "required": ["aStringOne", "aStringTwo", "aStringThree"],
                    "properties": {
                        "aStringOne": { "type": "string" },
                        "aStringTwo": { "type": "integer" },
                        "aStringThree": { "type": "boolean" }
                    }
                }
            }
        }
    }

pm.test('Schema is valid', function() {
    pm.expect(ajv.validate(schema, jsonData), JSON.stringify(ajv.errors)).to.be.true
});
This is an example of it failing. I've included the allErrors flag so that it will return all the errors rather than just the first one it sees. In the pm.expect() method, I've added JSON.stringify(ajv.errors) so you can see the errors in the Test Result tab. It's a little bit messy and could be tidied up, but all the error information is there.
With the properties set to "string", the validation passes. If one of the required keys is missing, it will error for that as well.
Working with schemas is quite difficult; it's not easy to both create them (nested arrays and objects are tricky) and ensure they are doing what you want. There have been occasions where I thought something should fail and it passed the validation test. It just takes a bit of learning and practising; once you understand the schema structures, they can become extremely useful.
I have an index in ElasticSearch with two fields of date type (metricsTime & arrivalTime). A sample document is quoted below. In Kibana, I created a scripted field delay for the difference between those two fields. My painless script is:
doc['arrivalTime'].value - doc['metricsTime'].value
However, I got the following error message when navigating to Kibana's Discover tab: class_cast_exception: Cannot apply [-] operation to types [org.joda.time.MutableDateTime] and [org.joda.time.MutableDateTime].
This looks the same as the error mentioned in https://discuss.elastic.co/t/problem-in-difference-between-two-dates/121655, but the answer on that page suggests that my script is correct. Could you please help?
Thanks!
{
  "_index": "events",
  "_type": "_doc",
  "_id": "HLV274_1537682400000",
  "_version": 1,
  "_score": null,
  "_source": {
    "metricsTime": 1537682400000,
    "box": "HLV274",
    "arrivalTime": 1539930920347
  },
  "fields": {
    "metricsTime": [
      "2018-09-23T06:00:00.000Z"
    ],
    "arrivalTime": [
      "2018-10-19T06:35:20.347Z"
    ]
  },
  "sort": [
    1539930920347
  ]
}
Check the list of Lucene expressions to see which ones are available for date fields and how you can use them.
For the sake of simplicity, consider the query below. I have created the two fields metricsTime and arrivalTime in a sample index.
Sample Document
POST mydateindex/mydocs/1
{
  "metricsTime": "2018-09-23T06:00:00.000Z",
  "arrivalTime": "2018-10-19T06:35:20.347Z"
}
Query using painless script
POST mydateindex/_search
{ "query": {
"bool": {
"must": {
"match_all": {
}
},
"filter": {
"bool" : {
"must" : {
"script" : {
"script" : {
"inline" : "doc['arrivalTime'].date.dayOfYear - doc['metricsTime'].date.dayOfYear > params.difference",
"lang" : "painless",
"params": {
"difference": 2
}
}
}
}
}
}
}
}
}
Note the below line in the query
"inline" : "doc['arrivalTime'].date.dayOfYear - doc['metricsTime'].date.dayOfYear > params.difference"
Now if you change the value of difference from 2 to 26 (which is exactly the difference in days between the two dates), you will see that the above query no longer returns the document.
Nevertheless, I have included the query as an example of how you can compare two different date fields using scripting. Please do refer to the link I've shared.
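For the Kibana scripted field itself, a common way to avoid the class_cast_exception is to subtract the underlying epoch milliseconds rather than the Joda date objects. A minimal sketch, assuming both fields are mapped as date (the result is the delay in milliseconds):

doc['arrivalTime'].value.millis - doc['metricsTime'].value.millis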
I would like to turn this resultset
[
  {
    "Document": {
      "JsonData": "{\"key\":\"value1\"}"
    }
  },
  {
    "Document": {
      "JsonData": "{\"key\":\"value2\"}"
    }
  }
]
into this
[
  {
    "key": "value1"
  },
  {
    "key": "value2"
  }
]
I can get close by using a query like
select value c.Document.JsonData from c
however, I end up with
[
  "{\"key\":\"value1\"}",
  "{\"key\":\"value2\"}"
]
How can I cast each value to an individual JSON fragment using the SQL API?
As David Makogon said above, we need to transform such data within our app. We can do it as below:
string data = "[{\"key\":\"value1\"},{\"key\":\"value2\"}]";
List<Object> t = JsonConvert.DeserializeObject<List<Object>>(data);
string jsonData = JsonConvert.SerializeObject(t);
The result is the desired array of objects.
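If you want to go from the raw resultset (an array of escaped JSON strings) straight to the unescaped array of objects, a minimal sketch with Newtonsoft.Json could look like this; the hard-coded rows list stands in for the actual query result:

using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Raw resultset of: select value c.Document.JsonData from c
var rows = new List<string> { "{\"key\":\"value1\"}", "{\"key\":\"value2\"}" };

// Parse each escaped fragment into an object, then serialize the list once
var fragments = rows.Select(r => JObject.Parse(r)).ToList();
string jsonData = JsonConvert.SerializeObject(fragments);
// jsonData is now [{"key":"value1"},{"key":"value2"}]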
I can't seem to find how to correctly call PutItem for a StringSet in DynamoDB through API Gateway. If I call it like I would for a List of Maps, then I get objects returned. Example data is below.
{
  "eventId": "Lorem",
  "eventName": "Lorem",
  "companies": [
    {
      "companyId": "Lorem",
      "companyName": "Lorem"
    }
  ],
  "eventTags": [
    "Lorem",
    "Lorem"
  ]
}
And my example template call for companies:
"companies" : {
"L": [
#foreach($elem in $inputRoot.companies) {
"M": {
"companyId": {
"S": "$elem.companyId"
},
"companyName": {
"S": "$elem.companyName"
}
}
} #if($foreach.hasNext),#end
#end
]
}
I've tried to call it with the String Set type listed, but it still errors out and tells me "Start of structure or map found where not expected" or that serialization failed.
"eventTags" : {
"SS": [
#foreach($elem in $inputRoot.eventTags) {
"S":"$elem"
} #if($foreach.hasNext),#end
#end
]
}
What is the proper way to call PutItem for converting an array of strings to a String Set?
If you are using the JavaScript AWS SDK, you can use the Document Client API (docClient.createSet) to store the SET data type; docClient.createSet converts the array into a SET. Note also that in the low-level PutItem format your mapping template produces, "SS" expects an array of plain strings, e.g. "SS": ["Lorem", "Lorem"], rather than {"S": ...} objects; wrapping each element in a map is what triggers the "Start of structure or map found where not expected" error.
var docClient = new AWS.DynamoDB.DocumentClient();

var params = {
    TableName: table,
    Item: {
        "yearkey": year,
        "title": title,
        "product": docClient.createSet(['milk', 'veg'])
    }
};
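A sketch of actually writing the item with those params; docClient.put wraps the low-level PutItem call and handles the marshalling of the set:

docClient.put(params, function(err, data) {
    if (err) {
        console.error("Unable to add item:", JSON.stringify(err, null, 2));
    } else {
        console.log("PutItem succeeded");
    }
});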
I have a custom class in ES 2.5 with the following fields:
Title
DataSources
Content
Running a search is fine, except with the middle field - it's built/indexed using a delimiter of '|'.
ex: "|4|7|8|9|10|12|14|19|20|21|22|23|29|30"
I need to build a query that matches the search words across all fields AND matches at least one number in the DataSources field.
So to summarize what I currently have:
QueryBase query = new SimpleQueryStringQuery
{
    //DefaultOperator = !operatorOR ? Operator.And : Operator.Or,
    Fields = LearnAboutFields.FULLTEXT,
    Analyzer = "standard",
    Query = searchWords.ToLower()
};
_boolQuery.Must = new QueryContainer[] { query };
That's the search words query.
foreach (var datasource in dataSources)
{
    // Add DataSources with an OR
    queryContainer |= new WildcardQuery { Field = LearnAboutFields.DATASOURCE, Value = string.Format("*{0}*", datasource) };
}

// Add this Boolean Clause to our outer clause with an AND
_boolQuery.Filter = new QueryContainer[] { queryContainer };
That's for the datasources query. There can be multiple datasources.
It doesn't work, and returns no results once the filter query is added on. I think I need some work on the tokenizer/analyzer, but I don't know enough about ES to figure that out.
EDIT: Per Val's comments below I have attempted to recode the indexer like this:
_elasticClientWrapper.CreateIndex(_DataSource, i => i
    .Mappings(ms => ms
        .Map<LearnAboutContent>(m => m
            .Properties(p => p
                .String(s => s.Name(lac => lac.DataSources)
                    .Analyzer("classic_tokenizer")
                    .SearchAnalyzer("standard")))))
    .Settings(s => s
        .Analysis(an => an.Analyzers(a => a.Custom("classic_tokenizer", ca => ca.Tokenizer("classic"))))));

var indexResponse = _elasticClientWrapper.IndexMany(contentList);
The index builds successfully, with data. However, the query still isn't working right.
New query for DataSources:
foreach (var datasource in dataSources)
{
    // Add DataSources with an OR
    queryContainer |= new TermQuery { Field = LearnAboutFields.DATASOURCE, Value = datasource };
}

// Add this Boolean Clause to our outer clause with an AND
_boolQuery.Must = new QueryContainer[] { queryContainer };
And the JSON:
{"learnabout_index":{"aliases":{},"mappings":{"learnaboutcontent":{"properties":{"articleID":{"type":"string"},"content":{"type":"string"},"dataSources":{"type":"string","analyzer":"classic_tokenizer","search_analyzer":"standard"},"description":{"type":"string"},"fileName":{"type":"string"},"keywords":{"type":"string"},"linkURL":{"type":"string"},"title":{"type":"string"}}}},"settings":{"index":{"creation_date":"1483992041623","analysis":{"analyzer":{"classic_tokenizer":{"type":"custom","tokenizer":"classic"}}},"number_of_shards":"5","number_of_replicas":"1","uuid":"iZakEjBlRiGfNvaFn-yG-w","version":{"created":"2040099"}}},"warmers":{}}}
The Query JSON request:
{
  "size": 10000,
  "query": {
    "bool": {
      "must": [
        {
          "simple_query_string": {
            "fields": [
              "_all"
            ],
            "query": "\"housing\"",
            "analyzer": "standard"
          }
        }
      ],
      "filter": [
        {
          "terms": {
            "DataSources": [
              "1"
            ]
          }
        }
      ]
    }
  }
}
One way to achieve this is to create a custom analyzer with a classic tokenizer which will break your DataSources field into the numbers composing it, i.e. it will tokenize the field on each | character.
So when you create your index, you need to add this custom analyzer and then use it in your DataSources field:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "number_analyzer": {
          "type": "custom",
          "tokenizer": "number_tokenizer"
        }
      },
      "tokenizer": {
        "number_tokenizer": {
          "type": "classic"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "DataSources": {
          "type": "string",
          "analyzer": "number_analyzer",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
As a result, if you index the string "|4|7|8|9|10|12|14|19|20|21|22|23|29|30", your DataSources field will effectively contain the following array of tokens: [4, 7, 8, 9, 10, 12, 14, 19, 20, 21, 22, 23, 29, 30]
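You can verify the tokenization with the _analyze API, using the index and analyzer defined above, for example:

POST my_index/_analyze
{
  "analyzer": "number_analyzer",
  "text": "|4|7|8|9|10|12|14|19|20|21|22|23|29|30"
}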
Then you can get rid of your WildcardQuery and simply use a TermsQuery instead:
var terms = new TermsQuery { Field = LearnAboutFields.DATASOURCE, Terms = dataSources };

// Add this Boolean Clause to our outer clause with an AND
_boolQuery.Filter = new QueryContainer[] { terms };
At an initial glance at your code, I think one problem you might have is that the terms query you place in the filter clause is never analysed: the value is not broken down into tokens and is compared against the indexed terms in its entirety. It's easy to forget this, so any values that require analysis need to go through an analysed query type, such as match.