querying elasticsearch parent child documents - parent-child

We work with two types of documents in Elasticsearch (ES): items and slots, where items are the parents of slot documents.
We define the index with the following command:
curl -XPOST 'localhost:9200/items' -d @itemsdef.json
where itemsdef.json has the following definition:
{
"mappings" : {
"item" : {
"properties" : {
"id" : {"type" : "long" },
"name" : {
"type" : "string",
"_analyzer" : "textIndexAnalyzer"
},
"location" : {"type" : "geo_point" },
}
}
},
"settings" : {
"analysis" : {
"analyzer" : {
"activityIndexAnalyzer" : {
"alias" : ["activityQueryAnalyzer"],
"type" : "custom",
"tokenizer" : "whitespace",
"filter" : ["trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
},
"textIndexAnalyzer" : {
"type" : "custom",
"tokenizer" : "whitespace",
"filter" : ["word_delimiter_impl", "trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
},
"textQueryAnalyzer" : {
"type" : "custom",
"tokenizer" : "whitespace",
"filter" : ["trim", "lowercase", "asciifolding", "spanish_stop"]
}
},
"filter" : {
"spanish_stop" : {
"type" : "stop",
"ignore_case" : true,
"enable_position_increments" : true,
"stopwords_path" : "analysis/spanish-stopwords.txt"
},
"spanish_synonym" : {
"type" : "synonym",
"synonyms_path" : "analysis/spanish-synonyms.txt"
},
"word_delimiter_impl" : {
"type" : "word_delimiter",
"generate_word_parts" : true,
"generate_number_parts" : true,
"catenate_words" : true,
"catenate_numbers" : true,
"split_on_case_change" : false
}
}
}
}
}
Then we add the child document definition using the following command:
curl -XPOST 'localhost:9200/items/slot/_mapping' -d @slotsdef.json
Where slotsdef.json has the following definition:
{
"slot" : {
"_parent" : {"type" : "item"},
"_routing" : {
"required" : true,
"path" : "parent_id"
},
"properties": {
"id" : { "type" : "long" },
"parent_id" : { "type" : "long" },
"activity" : {
"type" : "string",
"_analyzer" : "activityIndexAnalyzer"
},
"day" : { "type" : "integer" },
"start" : { "type" : "integer" },
"end" : { "type" : "integer" }
}
}
}
Finally we perform a bulk index with the following command:
curl -XPOST 'localhost:9200/items/_bulk' --data-binary @testbulk.json
Where testbulk.json holds the following data:
{"index":{"_type": "item", "_id":35}}
{"location":[40.4,-3.6],"id":35,"name":"A Name"}
{"index":{"_type":"slot","_id":126,"_parent":35}}
{"id":126,"start":1330,"day":1,"end":1730,"activity":"An Activity","parent_id":35}
I'm trying to write the following query: search for all items within a certain distance of a location that have children (slots) on the specified days and within certain start and end ranges.
An item with more slots fulfilling the condition should score higher.
I tried starting from existing samples, but the docs are really scarce and it's hard to move forward.
Clues?

I don't think there is a way to write an efficient query for something like this without moving location onto the slots. You can do something like the following, but it can be quite inefficient for some data:
{
"query": {
"top_children" : {
"type": "blog_tag",
"query" : {
"constant_score" : {
"query" : {
... your query for children goes here ...
}
}
},
"score" : "sum",
"factor" : 5,
"incremental_factor" : 2
}
},
"filter": {
"geo_distance" : {
"distance" : "200km",
"location" : {
"lat" : 40,
"lon" : -70
}
}
}
}
Basically, this query takes your range query or filter for the children, plus whatever other conditions you need, and wraps it in a constant_score query to make sure that every child gets a score of 1.0. The top_children query collects all these children and accumulates their scores onto their parents, and the filter then removes the parents that are too far away.
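For illustration, the children placeholder could be filled in with the day/start/end conditions from the question, still using the same legacy DSL as the rest of this answer. A rough sketch (the day list, the 1300/1800 bounds, and the coordinates are just example values):
{
  "query": {
    "top_children": {
      "type": "slot",
      "query": {
        "constant_score": {
          "filter": {
            "and": [
              { "terms": { "day": [1, 2] } },
              { "range": { "start": { "gte": 1300 } } },
              { "range": { "end": { "lte": 1800 } } }
            ]
          }
        }
      },
      "score": "sum"
    }
  },
  "filter": {
    "geo_distance": {
      "distance": "200km",
      "location": { "lat": 40.4, "lon": -3.6 }
    }
  }
}
Each slot that matches the day and time ranges contributes 1.0 to its parent item, so items with more matching slots score higher, as requested.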

Related

Converting a painless script into a visualisation on Kibana (Logs from AWS Connect)

I have logs being shipped from AWS Connect to Kibana through AWS OpenSearch. I have written the following script to return the latest status of an Agent like so:
GET agent-logs-*/_search
{
"script_fields": {
"data": {
"script": {
"lang": "painless",
"source": "params._source.CurrentAgentSnapshot.Configuration.Username + ', ' + params._source.CurrentAgentSnapshot.AgentStatus.Name + ', ' + params._source.EventTimestamp"
}
}
},
"collapse": {
"field": "CurrentAgentSnapshot.Configuration.Username.keyword"
},
"sort": [
{
"EventTimestamp": {
"order": "desc"
}
}
]
}
This returns a value of:
{
"took" : 29,
"timed_out" : false,
"_shards" : {
"total" : 65,
"successful" : 65,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [
{
"_index" : "agent-logs-2022-06-28",
"_type" : "_doc",
"_id" : "",
"_score" : null,
"fields" : {
"data" : [
"al.pacino#email.com, Available, 2022-06-28T10:52:01.238Z"
],
"CurrentAgentSnapshot.Configuration.Username.keyword" : [
"al.pacino#email.com"
]
},
"sort" : [
1656413521238
]
},
{
"_index" : "agent-logs-2022-06-28",
"_type" : "_doc",
"_id" : "",
"_score" : null,
"fields" : {
"data" : [
"robert.deniro#email.com, Available, 2022-06-28T10:50:45.622Z"
],
"CurrentAgentSnapshot.Configuration.Username.keyword" : [
"robert.deniro#email.com"
]
},
"sort" : [
1656413445622
]
},
{
"_index" : "agent-logs-2022-06-26",
"_type" : "_doc",
"_id" : "",
"_score" : null,
"fields" : {
"data" : [
"marlon.brando#email.com, Offline, 2022-06-26T14:51:55.203Z"
],
"CurrentAgentSnapshot.Configuration.Username.keyword" : [
"marlon.brando#email.com"
]
},
"sort" : [
1656255115203
]
}
]
}
}
I want to take the data lines from the JSON, such as "al.pacino@email.com, Available, 2022-06-28T10:52:01.238Z", and represent them in a visualisation such as a Data Table, to get a list of agents with their corresponding status.
When using the raw agent-logs directly, status changes and heartbeats overlap with a delay, which causes an inaccurate count of the statuses, hence the need for this script.
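One way to get a per-agent "latest status" result that a table visualisation can consume is to let the cluster do the grouping with a terms aggregation and a top_hits sub-aggregation. A minimal sketch, assuming the same field names as above (the terms size of 500 is just an arbitrary upper bound on the number of agents):
GET agent-logs-*/_search
{
  "size": 0,
  "aggs": {
    "per_agent": {
      "terms": {
        "field": "CurrentAgentSnapshot.Configuration.Username.keyword",
        "size": 500
      },
      "aggs": {
        "latest_status": {
          "top_hits": {
            "size": 1,
            "sort": [ { "EventTimestamp": { "order": "desc" } } ],
            "_source": [
              "CurrentAgentSnapshot.AgentStatus.Name",
              "EventTimestamp"
            ]
          }
        }
      }
    }
  }
}
This returns one bucket per agent containing only the most recent document, which is closer to what a Data Table needs than the collapse plus script_fields output.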

Artifactory API - List all artifacts for a package version

What's the best way to list all assets for a given package via the Artifactory API?
I'm trying to write a script to get the assets for a package and I'd like for it to work with multiple repository types, like Maven and PyPI. I know I could use the Folder Info API to get what I need, but that relies on the repository layout, so it wouldn't work across repository types.
I'm currently using this AQL search:
curl -u user:password -X POST http://<artifactory_url>/artifactory/api/search/aql \
-H "Content-Type: text/plain" \
-d 'items.find({"repo": "libs-release-local"}, {"artifact.module.name": "com.foo.bar:fizz-buzz:1.2"})'
The response is almost what I want, but it seems to be including some assets from a different version of the package I'm searching for:
{
"results" : [ {
"repo" : "libs-release-local",
"path" : "com/foo/bar/fizz-buzz/1.0",
"name" : "fizz-buzz-1.0.properties",
"type" : "file",
"size" : 790,
"created" : "2020-09-29T15:35:59.233Z",
"created_by" : "user",
"modified" : "2020-09-29T15:35:59.181Z",
"modified_by" : "user",
"updated" : "2020-09-29T15:35:59.233Z"
},{
"repo" : "libs-release-local",
"path" : "com/foo/bar/fizz-buzz/1.1",
"name" : "fizz-buzz-1.1.properties",
"type" : "file",
"size" : 790,
"created" : "2020-09-29T15:42:34.982Z",
"created_by" : "user",
"modified" : "2020-09-29T15:42:34.931Z",
"modified_by" : "user",
"updated" : "2020-09-29T15:42:34.983Z"
},{
"repo" : "libs-release-local",
"path" : "com/foo/bar/fizz-buzz/1.2",
"name" : "fizz-buzz-1.2-javadoc.jar",
"type" : "file",
"size" : 391843,
"created" : "2020-09-30T18:54:41.599Z",
"created_by" : "user",
"modified" : "2020-09-30T18:54:40.650Z",
"modified_by" : "user",
"updated" : "2020-09-30T18:54:41.600Z"
},{
"repo" : "libs-release-local",
"path" : "com/foo/bar/fizz-buzz/1.2",
"name" : "fizz-buzz-1.2-sources.jar",
"type" : "file",
"size" : 1089,
"created" : "2020-09-30T18:54:41.764Z",
"created_by" : "user",
"modified" : "2020-09-30T18:54:41.710Z",
"modified_by" : "user",
"updated" : "2020-09-30T18:54:41.765Z"
},{
"repo" : "libs-release-local",
"path" : "com/foo/bar/fizz-buzz/1.2",
"name" : "fizz-buzz-1.2.jar",
"type" : "file",
"size" : 1410,
"created" : "2020-09-30T18:54:41.902Z",
"created_by" : "user",
"modified" : "2020-09-30T18:54:41.844Z",
"modified_by" : "user",
"updated" : "2020-09-30T18:54:41.903Z"
},{
"repo" : "libs-release-local",
"path" : "com/foo/bar/fizz-buzz/1.2",
"name" : "fizz-buzz-1.2.module",
"type" : "file",
"size" : 3481,
"created" : "2020-09-30T18:54:42.015Z",
"created_by" : "user",
"modified" : "2020-09-30T18:54:41.962Z",
"modified_by" : "user",
"updated" : "2020-09-30T18:54:42.015Z"
},{
"repo" : "libs-release-local",
"path" : "com/foo/bar/fizz-buzz/1.2",
"name" : "fizz-buzz-1.2.pom",
"type" : "file",
"size" : 781,
"created" : "2020-09-30T18:54:42.238Z",
"created_by" : "user",
"modified" : "2020-09-30T18:54:42.190Z",
"modified_by" : "user",
"updated" : "2020-09-30T18:54:42.238Z"
},{
"repo" : "libs-release-local",
"path" : "com/foo/bar/fizz-buzz/1.2",
"name" : "fizz-buzz-1.2.properties",
"type" : "file",
"size" : 790,
"created" : "2020-09-30T18:54:42.124Z",
"created_by" : "user",
"modified" : "2020-09-30T18:54:42.078Z",
"modified_by" : "user",
"updated" : "2020-09-30T18:54:42.125Z"
} ],
"range" : {
"start_pos" : 0,
"end_pos" : 8,
"total" : 8
}
}
Notice how it's including the properties file for fizz-buzz 1.0 and 1.1, even though I specified 1.2 in my search.
Is there a better way to get the information I'm looking for?
You can use the new GraphQL capability which was added in Artifactory 7.9.
This capability allows you to query the rich metadata Artifactory holds about packages, versions, artifacts and more using the GraphQL query language.
You can use the metadata REST API for these queries. Note that you need to use an admin access token for authentication. For example:
curl -H "Authorization: Bearer <Your Token>" -XPOST http://localhost:8082/metadata/api/v1/query -d '{"query":"..." }'
The following query, as an example, fetches all the files which are part of versions 1.0* of a package named hello-world. This query will work for any type of package that can be managed in Artifactory.
query {
packages(
filter: {
name: "hello-world"
}
) {
edges {
node {
name
packageType
versions (filter: {name : "1.0*"}) {
name
repos {
name
leadFilePath
}
files {
name
}
}
}
}
}
}
The result would look something like
{
"data": {
"packages": {
"edges": [
{
"node": {
"name": "hello-world",
"packageType": "maven",
"versions": [
{
"name": "1.0-SNAPSHOT",
"repos": [
{
"name": "kotlin-local-snapshots",
"leadFilePath": "org/jetbrains/kotlin/hello-world/1.0-SNAPSHOT/hello-world-1.0-20171225.112927-1.pom"
}
],
"files": [
{
"name": "hello-world-1.0-20171225.112927-1.jar"
},
{
"name": "hello-world-1.0-20171225.112927-1.pom"
}
]
}
]
}
},
{
"node": {
"name": "hello-world",
"packageType": "maven",
"versions": [
{
"name": "1.0-SNAPSHOT",
"repos": [
{
"name": "kotlin-local-snapshots",
"leadFilePath": "org/jetbrains/kotlin/examples/hello-world/1.0-SNAPSHOT/hello-world-1.0-20171225.112138-1.pom"
}
],
"files": [
{
"name": "hello-world-1.0-20171225.112138-1.jar"
},
{
"name": "hello-world-1.0-20171225.112138-1.pom"
}
]
}
]
}
}
]
}
}
}
Try the below. It uses the path to find only artifacts matching com/foo/bar/fizz-buzz in the repo libs-release-local, with some jq at the end to make the output nicer. Notice also the "type" : "file" condition, which eliminates some of the metadata noise.
You will need to define or replace USER, API_KEY, and ARTIFACTORY_URL.
curl -su "${USER}:${API_KEY}" -X POST "${ARTIFACTORY_URL}/artifactory/api/search/aql" \
-H "content-type: text/plain" \
-d "items.find({\"type\" : \"file\",\"\$and\":[{\"path\" : {\"\$match\" : \"com/foo/bar/fizz-buzz*\"}, \"repo\" : {\"\$match\" : \"libs-release-local\"} }]}).include(\"name\",\"repo\",\"path\",\"size\").sort({\"\$desc\": [\"size\"]})" \
| jq -r "to_entries|map(\"\(.key)=\(.value|tostring)\")|.[]" | grep results | cut -f 2 -d = | jq .
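If only a single known version is needed, the same path-based idea can be narrowed further by matching the exact version directory instead of a wildcard. A sketch along those lines, reusing the paths from the response above (credentials and URL as in the other examples):
curl -u user:password -X POST http://<artifactory_url>/artifactory/api/search/aql \
  -H "Content-Type: text/plain" \
  -d 'items.find({
        "repo": "libs-release-local",
        "path": "com/foo/bar/fizz-buzz/1.2",
        "type": "file"
      }).include("name", "repo", "path", "size")'
This avoids the stray 1.0 and 1.1 .properties entries because only items whose path is exactly the 1.2 directory are returned.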

Elasticsearch low indexing speed

I have a blog that contains 14k posts and tried to add these posts to the Elasticsearch index.
I indexed some of the posts, but it's extremely slow; I estimate it will take about 6 hours. I have applied all the performance optimization tips from the official site, and I have removed redundant data such as post meta. How can I increase indexing speed? The index configuration is below:
{
"test-post-1" : {
"aliases" : { },
"mappings" : {
"date_detection" : false,
"properties" : {
"ID" : {
"type" : "long"
},
"guid" : {
"type" : "keyword"
},
"menu_order" : {
"type" : "long"
},
"permalink" : {
"type" : "keyword"
},
"post_content" : {
"type" : "text"
},
"post_date" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss"
},
"post_excerpt" : {
"type" : "text"
},
"post_id" : {
"type" : "long"
},
"post_mime_type" : {
"type" : "keyword"
},
"post_modified" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss"
},
"post_name" : {
"type" : "text",
"fields" : {
"post_name" : {
"type" : "text"
},
"raw" : {
"type" : "keyword",
"ignore_above" : 10922
}
}
},
"post_parent" : {
"type" : "long"
},
"post_status" : {
"type" : "keyword"
},
"post_title" : {
"type" : "text",
"fields" : {
"post_title" : {
"type" : "text",
"analyzer" : "standard"
},
"raw" : {
"type" : "keyword",
"ignore_above" : 10922
},
"sortable" : {
"type" : "keyword",
"ignore_above" : 10922,
"normalizer" : "lowerasciinormalizer"
}
}
},
"post_type" : {
"type" : "text",
"fields" : {
"post_type" : {
"type" : "text"
},
"raw" : {
"type" : "keyword"
}
}
}
}
},
"settings" : {
"index" : {
"mapping" : {
"total_fields" : {
"limit" : "5000"
},
"ignore_malformed" : "true"
},
"number_of_shards" : "1",
"provided_name" : "test-post-1",
"max_shingle_diff" : "8",
"max_result_window" : "1000000",
"creation_date" : "1582745447768",
"analysis" : {
"filter" : {
"shingle_filter" : {
"max_shingle_size" : "5",
"min_shingle_size" : "2",
"type" : "shingle"
},
"edge_ngram" : {
"min_gram" : "3",
"side" : "front",
"type" : "edgeNGram",
"max_gram" : "10"
},
"ewp_word_delimiter" : {
"type" : "word_delimiter",
"preserve_original" : "true"
},
"ewp_snowball" : {
"type" : "snowball",
"language" : "russian"
}
},
"normalizer" : {
"lowerasciinormalizer" : {
"filter" : [
"lowercase",
"asciifolding"
],
"type" : "custom"
}
},
"analyzer" : {
"ewp_lowercase" : {
"filter" : [
"lowercase"
],
"type" : "custom",
"tokenizer" : "keyword"
},
"shingle_analyzer" : {
"filter" : [
"lowercase",
"shingle_filter"
],
"type" : "custom",
"tokenizer" : "standard"
},
"default" : {
"filter" : [
"ewp_word_delimiter",
"lowercase",
"stop",
"ewp_snowball"
],
"char_filter" : [
"html_strip"
],
"language" : "russian",
"tokenizer" : "standard"
}
}
},
"number_of_replicas" : "1",
"uuid" : "cWGjSF4FQ1Or0A_0oSlA2g",
"version" : {
"created" : "7050299"
}
}
}
}
}
WordPress version: 5.3.2
Elasticsearch version: 7.5.2
Enabled plugins: ElasticPress
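For reference, one of the standard bulk-loading tweaks from the official tuning tips is to disable refresh and replicas for the duration of the import and restore them afterwards. A sketch against the index above (1s is the default refresh interval; the replica count of 1 matches the settings in the dump):
PUT test-post-1/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}

# ... run the bulk indexing here ...

PUT test-post-1/_settings
{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": 1
  }
}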

How to set .indexOn in firebase?

This is my JSON structure:
{
"books" : {
"sample" : {
"eight" : {
"author" : "eighta",
"name" : "eight",
"sub" : {
"subauthor" : "eightauthor",
"subname" : "general"
}
},
"eleven" : {
"author" : "twelvea",
"name" : "twelve",
"sub" : {
"subauthor" : "elevenauthor",
"subname" : "general"
}
},
"five" : {
"author" : "fivea",
"name" : "five",
"sub" : {
"subauthor" : "fiveauthor",
"subname" : "fivesub"
}
},
"four" : {
"author" : "foura",
"name" : "four",
"sub" : {
"subauthor" : "fourauthor",
"subname" : "general"
}
},
"nine" : {
"author" : "ninea",
"name" : "nine",
"sub" : {
"subauthor" : "nineauthor",
"subname" : "ninesub"
}
},
"one" : {
"author" : "onea",
"name" : "one",
"sub" : {
"subauthor" : "oneauthor",
"subname" : "onesub"
}
},
"seven" : {
"author" : "seven",
"name" : "seven"
},
"six" : {
"author" : "sixa",
"name" : "six"
},
"ten" : {
"author" : "tena",
"name" : "ten"
},
"three" : {
"author" : "threea",
"name" : "three"
},
"two" : {
"author" : "twoa",
"name" : "two"
}
}
}
}
I want to fetch the entries whose subname is equal to general.
My index rules:
{
/* Visit https://firebase.google.com/docs/database/security to learn more about security rules. */
"rules": {
".read": true,
".write": true,
"books": {
"$user_id": {
".indexOn": ["subname", "subauthor"]
}
}
}
}
https://myprojectpath/books/sample/eight.json?orderBy="subname"&equalTo="general"&print=pretty
The rule above works fine, but I need a generic API call that fetches the data where subname is general. I can't pass eight.json, nine.json, ten.json on every call; I want a single API call that returns all the entries whose subname is general.
If I understand you correctly, you want to use a single query to search across all books for their sub/subname property.
In that case you can define an index on books/sample for the sub/subname property of each child node:
"books": {
"sample": {
".indexOn": ["sub/subname", "sub/subauthor"]
}
}
The sample could be a $ variable here (such as $user_id), but the paths in .indexOn have to be known.
This creates an index under sample with the value of sub/subname for each child node, which you can then query with:
https://myprojectpath/books/sample.json?orderBy="sub/subname"&equalTo="general"&print=pretty
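For completeness, dropping that snippet into the asker's existing rules file would look something like this:
{
  "rules": {
    ".read": true,
    ".write": true,
    "books": {
      "sample": {
        ".indexOn": ["sub/subname", "sub/subauthor"]
      }
    }
  }
}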

Can't write to Firebase database using rules

I'm trying to create a rule that allows some users, but not all, to write.
I need every user to be able to read 'menu' items, but only the users listed in the store data should be able to write.
My data structure:
{
"category" : [ null, "Burger", "Drinks" ],
"menu" : [ null, {
"available" : true,
"category" : "1",
"description" : "item1 description",
"image" : "chicken_maharaja",
"name" : "New Chicken Maharaja",
"price" : 1300,
"store" : 1
}, {
"available" : true,
"category" : "1",
"description" : "item2 description",
"image" : "big_spicy_chicken_wrap",
"name" : "Big Spicy Chicken Wrap",
"price" : 120,
"store" : 1
}, {
"available" : true,
"category" : "2",
"description" : "item3 description",
"image" : "thumsup",
"name" : "Thumsup 100ml",
"price" : 40,
"store" : 1
}, {
"available" : true,
"category" : "2",
"description" : "item4 description",
"image" : "mccafe_ice_coffee",
"name" : "Ice Coffee",
"price" : 140,
"store" : 1
}, {
"available" : true,
"category" : "1",
"description" : "item5 description",
"image" : "mc_chicken",
"name" : "MC Chicken",
"price" : 190,
"store" : 1
}, {
"available" : true,
"category" : "2",
"description" : "item6 description",
"image" : "Smoothie",
"name" : "Smoothie",
"price" : 70,
"store" : 2
}, {
"available" : true,
"category" : "1",
"description" : "item8 description",
"image" : "salad_wrap",
"name" : "Salad Wrap",
"price" : 150,
"store" : 2
} ],
"stores" : [ null, {
"location" : "Campinas - Taquaral",
"name" : "Store 1",
"user" : {
"pyixsRTw9qdiuESt62YnmEYXQt13" : true
}
}, {
"location" : "São Paulo - Perdises",
"name" : "Store 2",
"user" : {
"LBNZ8Dwp2rdJtlSh0NC1ApdtbAl2" : true,
"TLomOgrd3gbjDdpDAqGiwl0lBhn2" : true
}
} ],
"userProfile" : {
"LBNZ8Dwp2rdJtlSh0NC1ApdtbAl2" : {
"birthDate" : "1974-02-10",
"email" : "asd#asd.com",
"firstName" : "João",
"lastName" : "Silva"
},
"pyixsRTw9qdiuESt62YnmEYXQt13" : {
"birthDate" : "1974-02-10",
"email" : "leandro.garcias#gmail.com",
"firstName" : "Leandro",
"lastName" : "Garcia"
}
}
}
My rule:
{
"rules": {
"menu": {
"$items": {
".read": "true",
".write": "root.child('stores').child('1').child(data.child('user').val()).hasChild(auth.uid)"
}
},
"stores": {
"$store": {
".read": "true",
".write": "root.child('stores').child('$store').child(data.child('user').val()).hasChild(auth.uid)"
}
}
}
}
The read is ok. :-) But I can't write.
Your newData doesn't have a child user, so that check always fails. You probably meant:
"menu": {
"$items": {
".read": "true",
".write": "root.child('stores').child('1').child('user').hasChild(auth.uid)"
}
}
You're probably looking for this rule:
".write": "
root.child('stores')
.child(newData.child('store').val())
.child('user')
.hasChild(auth.uid)"
So this uses the store property from the new data to look up if the current user is in the store they're trying to modify.
Unfortunately this rule won't work with your current data structure, since the value of store is a number, while the key of a store is a string: "1" !== 1.
The simplest solution is to store the store as a string, e.g.:
"store": "1"
You might want to consider that anyway, since you're now getting Firebase's array coercion, which is not helpful. For more on this see our blog post on Best Practices: Arrays in Firebase. I'd recommend storing stores using either push IDs, or simply prefixing them, e.g.
"stores": {
"store1": {
...
}
}
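Putting the pieces together, and assuming store is stored as a string as suggested above, the menu rule would then sit in the rules file roughly like this:
{
  "rules": {
    "menu": {
      "$items": {
        ".read": "true",
        ".write": "root.child('stores').child(newData.child('store').val()).child('user').hasChild(auth.uid)"
      }
    }
  }
}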
