Weaviate: using near_text with the exact property doesn't return a distance of 0

Here's a minimal example:
import weaviate
CLASS = "Superhero"
PROP = "superhero_name"
client = weaviate.Client("http://localhost:8080")
class_obj = {
"class": CLASS,
"properties": [
{
"name": PROP,
"dataType": ["string"],
"moduleConfig": {
"text2vec-transformers": {
"vectorizePropertyName": False,
}
},
}
],
"moduleConfig": {
"text2vec-transformers": {
"vectorizeClassName": False
}
}
}
client.schema.delete_all()
client.schema.create_class(class_obj)
batman_id = client.data_object.create({PROP: "Batman"}, CLASS)
by_text = (
client.query.get(CLASS, [PROP])
.with_additional(["distance", "id"])
.with_near_text({"concepts": ["Batman"]})
.do()
)
print(by_text)
batman_vector = client.data_object.get(
uuid=batman_id, with_vector=True, class_name=CLASS
)["vector"]
by_vector = (
client.query.get(CLASS, [PROP])
.with_additional(["distance", "id"])
.with_near_vector({"vector": batman_vector})
.do()
)
print(by_vector)
Please note that I specified both "vectorizePropertyName": False and "vectorizeClassName": False.
The code above returns:
{'data': {'Get': {'Superhero': [{'_additional': {'distance': 0.08034378, 'id': '05fbd0cb-e79c-4ff2-850d-80c861cd1509'}, 'superhero_name': 'Batman'}]}}}
{'data': {'Get': {'Superhero': [{'_additional': {'distance': 1.1920929e-07, 'id': '05fbd0cb-e79c-4ff2-850d-80c861cd1509'}, 'superhero_name': 'Batman'}]}}}
If I look up the exact vector I get 'distance': 1.1920929e-07, which I guess is actually 0 (for some floating point evil magic), as expected.
But if I use near_text to search for the exact property, I get a distance > 0.
This is leading me to believe that, when using near_text, the embedding is somehow different.
My question is:
Why does this happen?
With two corollaries:
Is 1.1920929e-07 actually 0 or do I need to read something deeper into that?
Is there a way to check the embedding created during the near_text search?

Here is some information that may help:
Is 1.1920929e-07 actually 0 or do I need to read something deeper into that?
Yes, this value 1.1920929e-07 should be interpreted as 0. I think there are some unfortunate float32/64 conversions going on that need to be rooted out.
Is there a way to check the embedding created during the near_text search?
The embeddings are either imported or generated during object creation, not at search-time. So performing multiple queries on an unchanged object will utilize the same search vector.
We are looking into both of these issues.
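On the first corollary: 1.1920929e-07 is 2^-23, the machine epsilon of a 32-bit float, i.e. the gap between 1.0 and the next representable float32. A distance of exactly that size is what a single rounding step in float32 arithmetic looks like, which is consistent with reading it as 0. A small sketch that checks this with the Python standard library only (it makes no claim about where inside Weaviate the rounding happens; the helper name is made up for illustration):

```python
import struct


def next_float32_after_one() -> float:
    """Return the smallest float32 value greater than 1.0."""
    # 0x3F800000 is the bit pattern of 1.0 in IEEE 754 single precision;
    # adding 1 to the bit pattern gives the next representable float32.
    bits = 0x3F800000 + 1
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Gap between 1.0 and the next float32: the float32 machine epsilon.
float32_epsilon = next_float32_after_one() - 1.0

# Value printed by the near_vector query in the question.
reported_distance = 1.1920929e-07

print(float32_epsilon == 2.0 ** -23)
print(abs(reported_distance - float32_epsilon) < 1e-14)
```

Both checks print True: the reported distance is the float32 epsilon, printed with 8 significant digits.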

Need to compare array in Marklogic with xquery

I need to compare arrays in MarkLogic with XQuery.
Query parameters:
{
"list": {
"bookNo": 13,
"BookArray":[20,21,22,23,24,25]
}
}
Sample Data:
{
"no":01'
"arrayList"[20,25]
}
{
"no":02'
"arrayList"[20,27]
}
{
"no":03'
"arrayList"[20,23,25]
}
Output:
"no":01
"no":03
I need to return "no" for every item whose arrayList values all appear in BookArray.
OK. You do not say whether the actual data is stored in the system or not, so I wrote an example as if it is all in memory.
I chose to keep the sample in the MarkLogic JSON representation, which has some oddities under the hood such as number-nodes and array-nodes. To keep it readable if you dig into it, I used fn:data() to get less verbose values. In reality, if this were an in-memory operation and I could not use JavaScript, I would have converted the JSON structures to maps.
Here is a sample to help you explore. I cleaned up the JSON to be valid and, for my sample, wrapped the three items in a single array.
xquery version "1.0-ml";
let $param-as-json := xdmp:unquote('{
"list": {
"bookNo": 13,
"BookArray":[20,21,22,23,24,25]
}
}')
let $list-as-json := xdmp:unquote('[
{
"no":"01",
"arrayList":[20,25]
},
{
"no":"02",
"arrayList":[20,27]
},
{
"no":"03",
"arrayList":[20,23,25]
}
]')
let $my-list := fn:data($param-as-json//BookArray)
return for $item in $list-as-json/*
let $local-list := fn:data($item//arrayList)
let $intersection := fn:data($item//arrayList)[.=$my-list]
where fn:deep-equal($intersection, $local-list)
return $item/no
Result:
01
03
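The where clause above is a subset test: an item is kept when every value of its arrayList also occurs in BookArray. For readers who want to see that logic outside MarkLogic, here is a plain Python sketch over the same sample data (this is not MarkLogic code, and the function name is hypothetical):

```python
book_array = [20, 21, 22, 23, 24, 25]

items = [
    {"no": "01", "arrayList": [20, 25]},
    {"no": "02", "arrayList": [20, 27]},
    {"no": "03", "arrayList": [20, 23, 25]},
]


def matching_numbers(items, allowed):
    """Return "no" for each item whose arrayList values all occur in allowed."""
    allowed_set = set(allowed)
    # set <= set is Python's subset test.
    return [item["no"] for item in items if set(item["arrayList"]) <= allowed_set]


print(matching_numbers(items, book_array))  # ['01', '03']
```

Item 02 is dropped because 27 is not in BookArray.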

firebase realtime database query to find data

I wish to store data for some children's activities, where each activity suits a certain age range. Let's say activity A is good for 2 - 5 year olds and activity B is good for 0 - 1 year olds.
On the client side, there is a fixed set of choices like:
0 - 1 years,
1 - 3 years,
4 - 5 years,
6 - 13 years
Now the requirement is that activity A should come up for both the 1 - 3 and the 4 - 5 years selections, as 2 - 5 overlaps both ranges.
What would be a good way to store the activity data and then query it efficiently?
Assuming the fixed set of choices is a permanent feature of your application, I'd have a boolean field for each choice. For example, your activities would look like:
activities: {
activityA: {
range0to1: false,
range2to3: true,
range4to5: true,
range6to13: false
},
activityB: {
range0to1: true,
range2to3: false,
range4to5: false,
range6to13: false
}
}
And then when you want to query all activities that apply to, e.g., ages 2 to 3, you already have the field to query, with nothing too complicated.
But really, for longevity, I wouldn't assume that the fixed set of choices is permanent for the lifetime of an app, in which case I'd rather have something like:
activities: {
activityA: {
minAge: 2,
maxAge: 5,
},
activityB: {
minAge: 0,
maxAge: 1,
}
}
...and then if I want to query for the fixed choice of ages between x and y, my ideal query would be for all activities whose range overlaps it, that is, activities that start no later than y and end no earlier than x. This also catches an activity whose range fully contains the choice.
e.g. (pseudocode) where (minAge <= y and maxAge >= x)
But unfortunately, in practice, Firebase RTDB doesn't let you query by multiple fields, so if it's not too late, I'd recommend looking at Firestore, which may be better suited to your needs (personally, I'd typically recommend Firestore over RTDB for most use cases).
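If the activity list is small enough to read in full, the overlap test can also be applied in application code after fetching. A minimal Python sketch of that test on the minAge/maxAge shape above, assuming inclusive bounds (the function names are hypothetical, and no Firebase API is involved):

```python
activities = {
    "activityA": {"minAge": 2, "maxAge": 5},
    "activityB": {"minAge": 0, "maxAge": 1},
}


def overlaps(activity, x, y):
    """True if the inclusive range [minAge, maxAge] shares any age with [x, y]."""
    # Two ranges overlap when each one starts no later than the other ends.
    return activity["minAge"] <= y and activity["maxAge"] >= x


def activities_for_choice(activities, x, y):
    """Names of all activities whose age range overlaps the choice [x, y]."""
    return sorted(name for name, a in activities.items() if overlaps(a, x, y))


print(activities_for_choice(activities, 1, 3))   # ['activityA', 'activityB']
print(activities_for_choice(activities, 4, 5))   # ['activityA']
print(activities_for_choice(activities, 6, 13))  # []
```

Activity A (2 - 5) shows up for both the 1 - 3 and 4 - 5 choices, as the question requires. With inclusive bounds, activity B (0 - 1) also matches 1 - 3 because both include age 1.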
If you are stuck with RTDB, then another solution might be to create a lookup block at the root of your structure:
{
activities: {
activityA: {
// age range of 2-5 stored however you like
},
activityB: {
// age range of 0-1 stored however you like
},
activityC: {
// age range of 0-3 stored however you like
}
},
ageActivityLookup: {
age0: {
activityB: true,
activityC: true,
},
age1: {
activityB: true,
activityC: true,
},
age2: {
activityA: true,
activityC: true,
},
age3: {
activityA: true,
activityC: true,
},
age4: {
activityA: true,
},
age5: {
activityA: true,
}
}
}
So then you can simply query ageX and get your list of activities. This will mean multiple queries if you're looking for a range of ages, and does mean having to ensure your lookup block stays in sync. This should be OK if the rest of your application data structure isn't too complex.
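Keeping the lookup block in sync is easier if it is derived from the activity ranges rather than edited by hand. A Python sketch of that derivation, assuming each activity stores inclusive minAge/maxAge values (the function names are hypothetical, and writing the result back to the database is left out):

```python
activities = {
    "activityA": {"minAge": 2, "maxAge": 5},
    "activityB": {"minAge": 0, "maxAge": 1},
    "activityC": {"minAge": 0, "maxAge": 3},
}


def build_age_lookup(activities):
    """Map "age<N>" to {activity name: True} for every age an activity covers."""
    lookup = {}
    for name, age_range in activities.items():
        for age in range(age_range["minAge"], age_range["maxAge"] + 1):
            lookup.setdefault(f"age{age}", {})[name] = True
    return lookup


def activities_for_ages(lookup, first_age, last_age):
    """Union of activity names across the per-age entries (one read per age)."""
    names = set()
    for age in range(first_age, last_age + 1):
        names.update(lookup.get(f"age{age}", {}))
    return sorted(names)


lookup = build_age_lookup(activities)
print(lookup["age4"])                     # {'activityA': True}
print(activities_for_ages(lookup, 1, 3))  # ['activityA', 'activityB', 'activityC']
```

The generated structure matches the ageActivityLookup block shown above for ages 0 to 5.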
@hussein, taking inspiration from your idea, I simplified it a bit to fit my use case. Instead of a separate node, I added each age-group classification within the activity itself, like:
baby:true
teen:true
and so on.
This saves the overhead of maintaining and updating an entire node whose complexity increases as activities grow.

Golang syntax in "if" statement with a map

I am reading a tutorial here: http://www.newthinktank.com/2015/02/go-programming-tutorial/
On the "Maps in Maps" section it has:
package main
import "fmt"
func main() {
// We can store multiple items in a map as well
superhero := map[string]map[string]string{
"Superman": map[string]string{
"realname":"Clark Kent",
"city":"Metropolis",
},
"Batman": map[string]string{
"realname":"Bruce Wayne",
"city":"Gotham City",
},
}
// We can output data where the key matches Superman
if temp, hero := superhero["Superman"]; hero {
fmt.Println(temp["realname"], temp["city"])
}
}
I don't understand the "if" statement. Can someone walk me through the syntax on this line:
if temp, hero := superhero["Superman"]; hero {
Like if temp seems nonsensical to an outsider as temp isn't even defined anywhere. What would that even accomplish? Then hero := superhero["Superman"] looks like an assignment. But what is the semicolon doing? why is the final hero there?
Can someone help a newbie out?
Many thanks.
A two-value assignment tests for the existence of a key:
i, ok := m["route"]
In this statement, the first value (i) is assigned the value stored
under the key "route". If that key doesn't exist, i is the value
type's zero value (0). The second value (ok) is a bool that is true if
the key exists in the map, and false if not.
This check is typically used when we are not sure what data is in the map. We check for a particular key and, if it exists, the value is assigned to the variable. It is an O(1) check.
In your example, try to look up a key that does not exist in the map:
package main
import "fmt"
func main() {
// We can store multiple items in a map as well
superhero := map[string]map[string]string{
"Superman": map[string]string{
"realname": "Clark Kent",
"city": "Metropolis",
},
"Batman": map[string]string{
"realname": "Bruce Wayne",
"city": "Gotham City",
},
}
// We can output data where the key matches Superman
if temp, hero := superhero["Superman"]; hero {
fmt.Println(temp["realname"], temp["city"])
}
// Try to look up a key that does not exist
if value, ok := superhero["Hulk"]; ok {
fmt.Println(value)
} else {
fmt.Println("key not found")
}
}
if temp, hero := superhero["Superman"]; hero
in go is similar to writing:
temp, hero := superhero["Superman"]
if hero {
....
}
Here, if "Superman" is mapped to a value, hero will be true;
otherwise it will be false.
In Go, every map lookup can return an optional second value that tells you whether the key is present.
https://play.golang.org/p/Hl7MajLJV3T
It's more normal to use ok for the boolean variable name. This is equivalent to:
temp, ok := superhero["Superman"]
if ok {
fmt.Println(temp["realname"], temp["city"])
}
The ok is true if the key was in the map. So there are two forms of map access built into the language, and two forms of this statement. Personally I think this slightly more verbose form with one more line of code is much clearer, but you can use either. The other form would be:
if temp, ok := superhero["Superman"]; ok {
fmt.Println(temp["realname"], temp["city"])
}
As above. For more, see Effective Go:
For obvious reasons this is called the “comma ok” idiom. In this
example, if the key is present, the value will be set appropriately and ok
will be true; if not, the value will be set to zero and ok will be
false.
The two forms for accessing maps are:
// value and ok set if key is present, else ok is false
value, ok := map[key]
// value set if key is present
value := map[key]

Can I just style decimals of a number in Angular?

I am trying to style just the decimals to look just like this:
I didn't have any success. I guess I need to write my own filter; I tried that too, without success, I guess because I am using it inside a state.
Here the code I am using for the number:
<h2><sup>$</sup>{{salary | number:0}}<sub>.00</sub></h2>
Inside the app I am using this scope value:
$scope.salary = 9000;
The thing is, the number can be whatever the user's salary is; it comes from an input, and in other places I have more numbers with decimals too.
Possible solutions:
Extract only the decimals from the value and print them inside the <sub> tag.
Use a filter to do this?
Use a directive that will split the amount and generate the proper HTML. For example:
app.directive('salary', function(){
return {
restrict: 'E'
, scope: {
salary: '@'
}
, controller: controller
, controllerAs: 'dvm'
, bindToController: true
, template: '<h2><sup>$</sup>{{ dvm.dollar }}<sub>.{{ dvm.cents }}</sub></h2>'
};
function controller(){
var parts = parseFloat(this.salary).toFixed(2).split(/\./);
this.dollar = parts[0];
this.cents = parts[1];
}
});
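The controller's core step, formatting the amount to two decimals and then splitting the text on the decimal point, does not depend on Angular. A sketch of the same idea in Python (split_amount is a hypothetical name used only for illustration):

```python
def split_amount(value):
    """Return (whole, cents) strings for a number rounded to two decimals."""
    # Format first, then split the text, so the cents always have two digits.
    whole, cents = f"{float(value):.2f}".split(".")
    return whole, cents


print(split_amount(9000))         # ('9000', '00')
print(split_amount(90000.99111))  # ('90000', '99')
```

Splitting the formatted string avoids working with the raw fractional part, which for a binary float is usually not an exact two-digit value.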
The easiest solution would be to split the number into its decimal portion and its whole-number portion:
var number = 90000.99111;
console.log(number % 1);
Use this in your controller, and split your scope variable into an object:
$scope.salary = {
whole: Math.floor(salary),
decimal: salary % 1
}
Protip: Using an object like this is better than using two scope variables for performance

RethinkDB filtering Object Array

I'm new to rethinkdb and I wanted to filter something like... get all with Kiwi or Strawberry as preferred fruit
{
"id": "65dbaa34-f7d5-4a25-b01f-682032fc6e05" ,
"fruits": {
"favorite": "Mango" ,
"preferred": [
"Kiwi" ,
"Watermelon"
]
}
}
I tried something like this after reading contains doc:
r.db('appname').table('food')
.filter(r.row('fruits').contains(function(doc) {
return doc('preferred').contains('Kiwi');
}))
And I'm getting a e: Cannot convert OBJECT to SEQUENCE in: error.
This is what you're looking for:
r.db('appname').table('food')
.filter((row) => {
return r.or( // Returns true if any of the following are true
row('fruits')('preferred').contains('Kiwi'),
row('fruits')('preferred').contains('Strawberry')
)
});
You should also know that you can create your own index that calculates this for you; then you'd be able to do a .getAll query using your custom index and return all documents that fit this constraint very quickly.
Lastly, for something that would also work but is probably less efficient on large arrays:
r.db("appname").table('food')
.filter((row) => {
return row('fruits')('preferred').setIntersection(['Kiwi', 'Strawberry']).count().gt(0)
})
