RethinkDB filtering Object Array

I'm new to RethinkDB and I want to filter documents, for example: get all documents with Kiwi or Strawberry as a preferred fruit.
{
  "id": "65dbaa34-f7d5-4a25-b01f-682032fc6e05",
  "fruits": {
    "favorite": "Mango",
    "preferred": [
      "Kiwi",
      "Watermelon"
    ]
  }
}
I tried something like this after reading the contains doc:
r.db('appname').table('food')
  .filter(r.row('fruits').contains(function(doc) {
    return doc('preferred').contains('Kiwi');
  }))
And I'm getting an "e: Cannot convert OBJECT to SEQUENCE in:" error.

This is what you're looking for:
r.db('appname').table('food')
  .filter((row) => {
    // Returns true if any of the following are true.
    // The explicit return is required because the arrow function has a block body.
    return r.or(
      row('fruits')('preferred').contains('Kiwi'),
      row('fruits')('preferred').contains('Strawberry')
    );
  });
You should also know that you can create your own index that calculates this for you; you'd then be able to run a .getAll query against that custom index and return all documents that fit this constraint very quickly.
Lastly, for something that would also work but is probably less efficient on large arrays:
r.db("appname").table('food')
  .filter((row) => {
    return row('fruits')('preferred').setIntersection(['Kiwi', 'Strawberry']).count().gt(0)
  })
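To reason about what both filters compute without a running server, here is a minimal plain-Python sketch of the same predicate. It does not use the RethinkDB driver; the helper name prefers_any and the second document are made up for illustration. It also shows why the original attempt failed: fruits is an object, so the array to test is fruits.preferred, not fruits itself.

```python
def prefers_any(doc, wanted):
    """Return True if the document's fruits.preferred array shares a value with `wanted`."""
    # `fruits` is an object (dict); the sequence to inspect is its `preferred` field.
    preferred = doc.get("fruits", {}).get("preferred", [])
    return len(set(preferred) & set(wanted)) > 0


food = [
    {
        "id": "65dbaa34-f7d5-4a25-b01f-682032fc6e05",
        "fruits": {"favorite": "Mango", "preferred": ["Kiwi", "Watermelon"]},
    },
    {
        # Hypothetical second document, added only to show a non-match.
        "id": "hypothetical-2",
        "fruits": {"favorite": "Kiwi", "preferred": ["Banana"]},
    },
]

matches = [doc["id"] for doc in food if prefers_any(doc, ["Kiwi", "Strawberry"])]
print(matches)
```

Only the first document matches: the second has Kiwi as its favorite, not as a preferred fruit.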

Related

Need to compare array in Marklogic with xquery

I need to compare arrays in MarkLogic with XQuery.
Query parameters:
{
"list": {
"bookNo": 13,
"BookArray":[20,21,22,23,24,25]
}
}
Sample Data:
{
"no":01'
"arrayList"[20,25]
}
{
"no":02'
"arrayList"[20,27]
}
{
"no":03'
"arrayList"[20,23,25]
}
Output:
"no":01
"no":03
I need to return "no" for every item whose arrayList values all appear in BookArray.
Ok. You do not explain if the actual data is in the system or not. So I did an example as if it is all in memory.
I chose to keep the sample in the MarkLogic JSON representation, which has some oddities like number-nodes and array-nodes under the hood. To make it more readable if you dig into it, I used fn:data() to make it less verbose. Realistically, if this were an in-memory operation and I could not use JavaScript, I would have converted the JSON structures to maps.
Here is a sample to help you explore. I cleaned up the JSON to be valid and for my sample wrapped the three samples in a single array.
xquery version "1.0-ml";
let $param-as-json := xdmp:unquote('{
"list": {
"bookNo": 13,
"BookArray":[20,21,22,23,24,25]
}
}')
let $list-as-json := xdmp:unquote('[
{
"no":"01",
"arrayList":[20,25]
},
{
"no":"02",
"arrayList":[20,27]
},
{
"no":"03",
"arrayList":[20,23,25]
}
]')
let $my-list := fn:data($param-as-json//BookArray)
return for $item in $list-as-json/*
let $local-list := fn:data($item//arrayList)
let $intersection := fn:data($item//arrayList)[.=$my-list]
where fn:deep-equal($intersection, $local-list)
return $item/no
Result:
01
03
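The check in the query above amounts to a subset test: an item qualifies when every value in its arrayList is also in BookArray. As a rough illustration of that logic outside MarkLogic, here is a minimal Python sketch using the same sample data. It treats the arrays as sets, so duplicates and ordering are ignored, which is an assumption about what "match" should mean here.

```python
book_array = [20, 21, 22, 23, 24, 25]

items = [
    {"no": "01", "arrayList": [20, 25]},
    {"no": "02", "arrayList": [20, 27]},
    {"no": "03", "arrayList": [20, 23, 25]},
]

allowed = set(book_array)

# Keep an item only when its arrayList is a subset of BookArray.
result = [item["no"] for item in items if set(item["arrayList"]) <= allowed]
print(result)
```

Item 02 is dropped because 27 is not in BookArray.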

Weaviate: using near_text with the exact property doesn't return a distance of 0

Here's a minimal example:
import weaviate
CLASS = "Superhero"
PROP = "superhero_name"
client = weaviate.Client("http://localhost:8080")
class_obj = {
"class": CLASS,
"properties": [
{
"name": PROP,
"dataType": ["string"],
"moduleConfig": {
"text2vec-transformers": {
"vectorizePropertyName": False,
}
},
}
],
"moduleConfig": {
"text2vec-transformers": {
"vectorizeClassName": False
}
}
}
client.schema.delete_all()
client.schema.create_class(class_obj)
batman_id = client.data_object.create({PROP: "Batman"}, CLASS)
by_text = (
client.query.get(CLASS, [PROP])
.with_additional(["distance", "id"])
.with_near_text({"concepts": ["Batman"]})
.do()
)
print(by_text)
batman_vector = client.data_object.get(
uuid=batman_id, with_vector=True, class_name=CLASS
)["vector"]
by_vector = (
client.query.get(CLASS, [PROP])
.with_additional(["distance", "id"])
.with_near_vector({"vector": batman_vector})
.do()
)
print(by_vector)
Please note that I specified both "vectorizePropertyName": False and "vectorizeClassName": False.
The code above returns:
{'data': {'Get': {'Superhero': [{'_additional': {'distance': 0.08034378, 'id': '05fbd0cb-e79c-4ff2-850d-80c861cd1509'}, 'superhero_name': 'Batman'}]}}}
{'data': {'Get': {'Superhero': [{'_additional': {'distance': 1.1920929e-07, 'id': '05fbd0cb-e79c-4ff2-850d-80c861cd1509'}, 'superhero_name': 'Batman'}]}}}
If I look up the exact vector I get 'distance': 1.1920929e-07, which I guess is actually 0 (for some floating point evil magic), as expected.
But if I use near_text to search for the exact property, I get a distance > 0.
This is leading me to believe that, when using near_text, the embedding is somehow different.
My question is:
Why does this happen?
With two corollaries:
Is 1.1920929e-07 actually 0 or do I need to read something deeper into that?
Is there a way to check the embedding created during the near_text search?
Here is some information that may help:
Is 1.1920929e-07 actually 0 or do I need to read something deeper into that?
Yes, this value 1.1920929e-07 should be interpreted as 0. I think there are some unfortunate float32/64 conversions going on that need to be rooted out.
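One way to sanity-check that reading: the reported distance coincides with the float32 machine epsilon, 2^-23, the gap between 1.0 and the next representable single-precision number. That is consistent with a rounding artifact in single precision rather than a real difference between vectors. A minimal sketch using only the Python standard library (the helper name to_float32 is made up for illustration):

```python
import struct

# Gap between 1.0 and the next larger float32 value.
FLOAT32_EPSILON = 2.0 ** -23


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


reported_distance = 1.1920929e-07

# The printed distance is the shortest decimal form of the float32 epsilon.
is_float32_epsilon = to_float32(reported_distance) == FLOAT32_EPSILON
print(is_float32_epsilon)
```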
Is there a way to check the embedding created during the near_text search?
The embeddings are either imported or generated during object creation, not at search-time. So performing multiple queries on an unchanged object will utilize the same search vector.
We are looking into both of these issues.

How to add a DynamoDB global secondary Index via Python/Boto3

Is it possible to add a Global Secondary Index to an existing DynamoDB table AFTER it has been created? I am using Python 3.x with Boto3 and have not been able to find any examples of one being added to a table after it was created.
In general, yes it is possible to add a Global Secondary Index (GSI) after the table is created.
However, it can take a long time for the change to come into effect, because building the GSI requires a table scan.
In the case of boto3, have a look at the documentation for update_table
For example, you could try something like this:
import boto3

client = boto3.client('dynamodb')

response = client.update_table(
    TableName='YourTableName',
    # ...snip...
    # (AttributeDefinitions must include the key attribute(s) of the new index)
    GlobalSecondaryIndexUpdates=[
        {
            'Create': {
                'IndexName': 'YourGSIName',
                'KeySchema': [
                    {
                        'AttributeName': 'YourGSIFieldName',
                        'KeyType': 'HASH'
                    }
                ],
                'Projection': {
                    'ProjectionType': 'ALL'
                },
                'ProvisionedThroughput': {
                    'ReadCapacityUnits': 1,
                    'WriteCapacityUnits': 1
                }
            }
        }
    ],
    # ...snip...
)
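As a sketch of what a complete request might look like, the helper below builds the keyword arguments for update_table, including the AttributeDefinitions entry that DynamoDB needs for the new index key. The function name build_gsi_create_request and the default string ("S") key type are assumptions for illustration; the table, index, and field names are the placeholders from the example above.

```python
def build_gsi_create_request(table_name, index_name, key_attribute, key_type="S"):
    """Build update_table keyword arguments that add one hash-key-only GSI."""
    return {
        "TableName": table_name,
        # The new index's key attribute has to be declared here as well.
        "AttributeDefinitions": [
            {"AttributeName": key_attribute, "AttributeType": key_type},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [
                        {"AttributeName": key_attribute, "KeyType": "HASH"},
                    ],
                    "Projection": {"ProjectionType": "ALL"},
                    # Only meaningful for tables using provisioned capacity.
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": 1,
                        "WriteCapacityUnits": 1,
                    },
                }
            }
        ],
    }


request = build_gsi_create_request("YourTableName", "YourGSIName", "YourGSIFieldName")

# With AWS credentials configured, the request could then be sent with:
#   import boto3
#   boto3.client("dynamodb").update_table(**request)
```

Building the request separately keeps it easy to inspect before anything is sent to AWS. Since the index is backfilled in the background, the IndexStatus reported by describe_table is one way to see when it becomes ACTIVE.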

Query to get exact matches of Elastic Field with multiple values in Array

I want to write a query in Elastic that applies a filter based on values I have in an array (in my R program). Essentially the query:
Matches a time range (time field in Elastic)
Matches "trackId" field in Elastic to any value in array oth_usr
Return 2 fields - "trackId", "propertyId"
I have the following primitive version of the query but do not know how to use the oth_usr array in a query (part 2 above).
query <- sprintf('{"query":{"range":{"time":{"gte":"%s","lte":"%s"}}}}',start_date,end_date)
view_list <- elastic::Search(index = "organised_recent",type = "PROPERTY_VIEW",size = 10000000,
body=query, fields = c("trackId", "propertyId"))$hits$hits
You need to add a terms query and embed it as well as the range one into a bool/must query. Try updating your query like this:
terms <- paste(sprintf("\"%s\"", oth_usr), collapse=", ")
query <- sprintf('{"query":{"bool":{"must":[{"terms": {"trackId": [%s]}},{"range": {"time": {"gte": "%s","lte": "%s"}}}]}}}',terms,start_date,end_date)
I'm not fluent in R syntax, but this is a raw JSON query that works.
It checks whether your time field falls within the given range (start_time and end_time) and whether one of your terms exactly matches trackId.
It returns only the trackId and propertyId fields, as per your request:
POST /indice/_search
{
"_source": {
"include": [
"trackId",
"propertyId"
]
},
"query": {
"bool": {
"must": [
{
"range": {
"time": {
"gte": "start_time",
"lte": "end_time"
}
}
},
{
"terms": {
"trackId": [
"terms"
]
}
}
]
}
}
}
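Assembling the body from a data structure and serializing it avoids hand-quoting the array values. A minimal Python sketch of the same bool/must query follows; the function name build_query, the sample IDs, and the dates are made up for illustration, and the list form of _source is assumed to be accepted by your Elasticsearch version.

```python
import json


def build_query(track_ids, start_date, end_date):
    """Build a bool/must query: time within a range AND trackId in track_ids."""
    return {
        # Return only these two fields for each hit.
        "_source": ["trackId", "propertyId"],
        "query": {
            "bool": {
                "must": [
                    {"range": {"time": {"gte": start_date, "lte": end_date}}},
                    {"terms": {"trackId": list(track_ids)}},
                ]
            }
        },
    }


# Hypothetical values standing in for oth_usr, start_date and end_date.
body = json.dumps(build_query(["u1", "u2"], "2017-01-01", "2017-01-31"))
print(body)
```

The resulting string can be passed as the request body in the same way as the sprintf-built query.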

Meteor.find multiple values in an array

I am using auto schema to define an array field. I need to find documents where that array contains multiple specific values. I know I can use the $in: operator, but $in: matches a record when any one value from the first array appears in the second, whereas I need to match only records that contain all values of the first array. How can I achieve this?
Schema Definition
Demands = new Mongo.Collection("demands");
var demandschema = new SimpleSchema({
  ability: {type: [String]},
  language: {type: [String]}
});
Demands.attachSchema(demandschema);
Contents Definition
DemandsSet=[
{ability: ["laser eye", "rocky skin", "fly"], language: ["english", "latin", "hindu"]},
{ability: ["sky-high jump", "rocky skin", "fly"], language: ["english", "latin", "japanese"]},
{ability: ["rocky skin", "sky-high jump"], language: ["english", "latin", "russian"]}
];
Target Set
var TargetAbility = ["rocky skin", "fly"];
var TargetLanguage = ["english", "hindu"];
When I do a $in operation
Demands.find({ $and: [
  { ability: { $in: TargetAbility }},
  { language: { $in: TargetLanguage }}
]}).fetch();
it returns all records, which is not what I want. How can I perform such a find operation?
$in: is not going to work for you because it looks for any match when comparing two arrays, not that all elements of one array must be present in the other.
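You can write complete JavaScript functions to execute the required comparisons inside the MongoDB query. See $where:
For example:
// The $where string is evaluated by the database, which cannot see the
// TargetAbility and TargetLanguage variables, so their values are inlined.
Demands.find({$where:
  "this.ability.indexOf(" + JSON.stringify(TargetAbility[0]) + ") > -1 && " +
  "this.ability.indexOf(" + JSON.stringify(TargetAbility[1]) + ") > -1 && " +
  "this.language.indexOf(" + JSON.stringify(TargetLanguage[0]) + ") > -1 && " +
  "this.language.indexOf(" + JSON.stringify(TargetLanguage[1]) + ") > -1" });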
If your candidates have other than 2 entries each then you can write a more general form of this of course.
Note that Meteor apparently does not support the function() form of $where: but that restriction may be dated.
Also note that $where: cannot take advantage of indexes so performance may not be suitable for large collections.
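The difference between "any value matches" (what $in: does) and "all values match" (what the question asks for) can be seen on the sample data with a small plain-Python sketch. The helper names are made up for illustration. MongoDB also has an $all operator with the all-values semantics; whether it suits this Meteor setup is an assumption worth checking against the documentation, and the query document at the end is only built, not executed.

```python
demands = [
    {"ability": ["laser eye", "rocky skin", "fly"], "language": ["english", "latin", "hindu"]},
    {"ability": ["sky-high jump", "rocky skin", "fly"], "language": ["english", "latin", "japanese"]},
    {"ability": ["rocky skin", "sky-high jump"], "language": ["english", "latin", "russian"]},
]
target_ability = ["rocky skin", "fly"]
target_language = ["english", "hindu"]


def contains_any(values, targets):
    """$in-style semantics: at least one target is present."""
    return any(t in values for t in targets)


def contains_all(values, targets):
    """All-of semantics: every target is present."""
    return all(t in values for t in targets)


any_matches = [
    i for i, d in enumerate(demands)
    if contains_any(d["ability"], target_ability)
    and contains_any(d["language"], target_language)
]
all_matches = [
    i for i, d in enumerate(demands)
    if contains_all(d["ability"], target_ability)
    and contains_all(d["language"], target_language)
]

# Query document expressing the all-of condition with MongoDB's $all operator.
all_query = {"ability": {"$all": target_ability}, "language": {"$all": target_language}}
print(any_matches, all_matches)
```

With the any-of check every record matches, which mirrors the behaviour described in the question; with the all-of check only the first record does.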

Resources