weaviate aggregate by reference property - weaviate

I want to build an aggregate query on my data.
I have a Patent class that holds references to Paragraph objects (paragraphs with vectorized text).
I want to count the patents in each category (a property of Patent) whose paragraphs are near a given vector.
In pseudo-SQL:
select (count distinct Patent)
from myweaviate
where Paragraph.nearVector(vector, certainty=0.9)
group by category
I tried something like the following (which would be wrong even if it worked, because it counts paragraphs):
result = (client.query.aggregate("Paragraph") \
.with_group_by_filter(["inPatent{... on Patent{publicationID}"]) \
.with_fields('meta { count }') \
.with_fields('groupedBy {value}') \
.with_near_vector({'vector': vector, 'certainty': 0.8}) \
.do())
and getting:
{'data': {'Aggregate': {'Paragraph': None}}, 'errors': [{'locations': [{'column': 12, 'line': 1}], 'message': "could not extract groupBy path: Expected a valid property name in 'path' field for the filter, but got 'inPatent{... on Patent{publicationID}'", 'path': ['Aggregate', 'Paragraph']}]}
I couldn't find anything in the docs or on the internet about doing this (i.e. using aggregate on a reference property),
or about doing a count distinct (although in this case each Patent is distinct, of course).
Can anyone help?

Unfortunately, it is not possible to group by cross-references. The error in your case means that you did not construct a valid path: the path needs to be a list in which each item is a valid element, i.e. it should look like this: path: ["inPatent", "Patent", "publicationID"]. It goes property -> class name -> property -> class name -> ... until it reaches your desired field. Weaviate currently does not support Aggregate.groupBy across cross-references, so if you run your query again with the correct path you should get something like this:
"message": "shard 9wKKa18SJOiM: identify groups: grouping by cross-refs not supported"
Note that it is possible to use the cross-reference property itself as your groupBy path. You want to aggregate on the Patent ID, and the UUID (and beacon) of each Patent object is unique and maps one-to-one to its publicationID, so grouping by the reference gives the same groups. It should look like this:
result = (client.query.aggregate("Paragraph") \
.with_group_by_filter(["inPatent"]) \
.with_fields('meta { count }') \
.with_fields('groupedBy {value}') \
.with_near_vector({'vector': vector, 'certainty': 0.8}) \
.do())
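Since the aggregate above still counts paragraphs rather than patents, one option is to do the distinct count per category on the client. The following is only a minimal sketch and not part of the Weaviate API: it assumes the matching paragraphs have already been fetched (for example with a Get query using the same nearVector filter) and flattened into dicts with the hypothetical keys patent_id and category.

```python
from collections import defaultdict


def count_patents_per_category(hits):
    """Count distinct patents per category.

    `hits` is an assumed, already-flattened list of dicts, one per matching
    paragraph, with the hypothetical keys 'patent_id' and 'category'.
    """
    patents_by_category = defaultdict(set)
    for hit in hits:
        # A set drops repeated paragraphs that belong to the same patent.
        patents_by_category[hit["category"]].add(hit["patent_id"])
    return {category: len(ids) for category, ids in patents_by_category.items()}


# Made-up sample data: three paragraphs, two of them from the same patent.
sample_hits = [
    {"patent_id": "p1", "category": "A"},
    {"patent_id": "p1", "category": "A"},
    {"patent_id": "p2", "category": "B"},
]
category_counts = count_patents_per_category(sample_hits)
print(category_counts)
```

Collecting the patent IDs in a set is what makes the count distinct, so a patent with several nearby paragraphs is counted only once.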


How do I access a calculated field in Rails?

In my first attempt to develop something in Ruby on Rails :) ... I have a list of names stored in fields "first_name" and "last_name". In my Person model, I have defined something like this:
def sort_name
sort_name = last_name + ',' + first_name
end
Now I want to show all persons in a list, sorted by sort_name, but (in my controller) something like
@persons = Person.order(:sort_name)
doesn't work (Unknown column 'sort_name' in 'order clause'). How do I reference the calculated field sort_name in my controller?
I am sure this is an "oh my god I am so stupid" moment, but I'm happy for any advice!
If the model Person has the fields name, first_lastname and second_lastname, you can do the following:
Person.order(:name, :first_lastname, :second_lastname)
By default the ordering is ascending. You can also specify ascending or descending order for each field:
Person.order(name: :asc, first_lastname: :desc, second_lastname: :asc)
Additionally, if you want to add a column with the complete name, you can use select. With PostgreSQL the code would be:
people = Person.order(
name: :asc, first_lastname: :desc, second_lastname: :asc
).select(
"*, concat(name,' ', first_lastname, ' ',second_lastname) as sort_name"
)
people[0].sort_name
# the sort_name can be for example "Adán Saucedo Salas"

Use dictionary keys and associated values as wildcards in snakemake

I have a great number of analyses that need to be done in one go, so I thought I could make a dictionary and use its keys and values as wildcards (every snakemake run needs two wildcards).
My dict will look like this:
myDict = {
    "Apple": ["fruity", "red", "green"],
    "Banana": ["fruity", "yellow"]
}
Here the first key in the dictionary will be wildcard1, here {Apple}, with its first value as wildcard2, here {fruity}, and snakemake runs with these two until the final rule has been run.
Then the same key will be used again ({Apple} as wildcard1) with the second associated value, here {red}, as wildcard2, and snakemake runs until the last rule has been run.
Then, after the final value belonging to {Apple} has been used as wildcard2, it switches over to {Banana} as wildcard1 with its first value, {fruity}, as wildcard2.
This goes on until all keys and their associated values have been used as wildcards, and then snakemake stops. (That is, keys as wildcard1 and their values as wildcard2.)
My question is if this is possible, and if so, how can I achieve that?
I bet there is a way to do it with a single expand, but you can use a more verbose list comprehension. I'll take the files to be {wc1}_{wc2}.out for wildcards 1 and 2. Then you have
myDict = {
    "Apple": ["fruity", "red", "green"],
    "Banana": ["fruity", "yellow"]
}
inputs = [expand('{wc1}_{wc2}.out',
                 wc1=key, wc2=value)
          for key, value in myDict.items()]
# inputs = [['Apple_fruity.out', 'Apple_red.out', 'Apple_green.out'], ['Banana_fruity.out', 'Banana_yellow.out']]
rule all:
    input: inputs
Edited to address comment:
To make two lists, keys and values, you can use
keys = []
values = []
for key, value in myDict.items():
    for v in value:
        keys.append(key)
        values.append(v)
print(keys) # ['Apple', 'Apple', 'Apple', 'Banana', 'Banana']
print(values) # ['fruity', 'red', 'green', 'fruity', 'yellow']
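If a single flat list of targets is preferred over the nested one, the same pairing can be written as one comprehension in plain Python. This is only a sketch: it reuses the assumed {wc1}_{wc2}.out naming from above and does not need Snakemake itself. Inside a Snakefile, the keys and values lists should also work with expand('{wc1}_{wc2}.out', zip, wc1=keys, wc2=values), which pairs the lists element-wise instead of taking their product.

```python
my_dict = {
    "Apple": ["fruity", "red", "green"],
    "Banana": ["fruity", "yellow"],
}

# One target per (key, value) pair, in dictionary order.
targets = [
    f"{wc1}_{wc2}.out"
    for wc1, values in my_dict.items()
    for wc2 in values
]
print(targets)
```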

Invalid type for parameter error when using put_item dynamodb

I want to write the data in a dataframe to a DynamoDB table:
item = {}
for row in datasource_archived_df_join_repartition.rdd.collect():
    item['x'] = row.x
    item['y'] = row.y
    client.put_item(TableName='tryfail',
                    Item=item)
but I'm getting this error:
Invalid type for parameter Item.x, value: 478.2, type: <type 'float'>, valid types: <type 'dict'>
Invalid type for parameter Item.y, value: 696- 18C 12, type: <type 'unicode'>, valid types: <type 'dict'>
Old question, but it still comes up high in a search and hasn't been answered properly, so here we go.
When putting an item in a DynamoDB table it must be a dictionary in a particular nested form that indicates to the database engine the data type of the value for each attribute. The form looks like below. The way to think of this is that an AttributeValue is not a bare variable value but a combination of that value and its type. For example, an AttributeValue for the AlbumTitle attribute below is the dict {'S': 'Somewhat Famous'} where the 'S' indicates a string type.
response = client.put_item(
    TableName='Music',
    Item={
        'AlbumTitle': {  # <-------------- Attribute
            'S': 'Somewhat Famous',  # <-- Attribute Value with type string ('S')
        },
        'Artist': {
            'S': 'No One You Know',
        },
        'SongTitle': {
            'S': 'Call Me Today',
        },
        'Year': {
            'N': '2021'  # <----------- Note that numeric values are supplied as strings
        }
    }
)
In your case (assuming x and y are numbers) you might want something like this:
for row in datasource_archived_df_join_repartition.rdd.collect():
    item = {
        'x': {'N': str(row.x)},
        'y': {'N': str(row.y)}
    }
    client.put_item(TableName='tryfail', Item=item)
Two things to note here. First, each item corresponds to a row, so if you are putting items in a loop you must instantiate a new one on each iteration. Second, regarding the conversion of the numeric x and y into strings, the DynamoDB docs explain that the AttributeValue dict requires this "to maximize compatibility across languages and libraries. However, DynamoDB treats them as number type attributes for mathematical operations." For fuller documentation on the DynamoDB type system, see the DynamoDB developer guide, or the Boto3 documentation since you are using Python.
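To avoid writing the type descriptors by hand, the wrapping can be done by a small helper. This is a minimal sketch under stated assumptions: to_attribute_value and to_item are hypothetical names, not part of Boto3, and only a few scalar types are covered. Judging by the error message in the question, y may actually hold a string rather than a number, so the example picks 'S' or 'N' from the Python type.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Wrap a plain Python value in a DynamoDB AttributeValue dict.

    Hypothetical helper covering only a few scalar types.
    """
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(row):
    """Build a fresh Item dict from a mapping of attribute name to value."""
    return {name: to_attribute_value(value) for name, value in row.items()}


# Values taken from the error message in the question.
item = to_item({"x": 478.2, "y": "696- 18C 12"})
print(item)
```

The resulting dict has the nested form described above and could be passed as Item to client.put_item, building a new one for each row.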
The error message indicates that you are using the wrong type: each value assigned to item['x'] and item['y'] needs to be a dictionary whose key is a DynamoDB type descriptor, e.g.
item['x'] = {'N': str(row.x)}
item['y'] = {'S': row.y}

Update dictionary key inside list using map function -Python

I have a list of phone-number dictionaries in which the number is the key and the country is the value. I want to update each key by adding a country code based on the country value. I tried to use the map function for this:
print('**Example: Update phone book to add Country code using map function** ')
user=[{'952-201-3787':'US'},{'952-201-5984':'US'},{'9871299':'BD'},{'01632 960513':'UK'}]
#A function that takes a dictionary as arg, not list. List is the outer part
def add_Country_Code(aDict):
    for k,v in aDict.items():
        if(v == 'US'):
            aDict[( '1+'+k)]=aDict.pop(k)
        if(v == 'UK'):
            aDict[( '044+'+k)]=aDict.pop(k)
        if (v == 'BD'):
            aDict[('001+'+k)] =aDict.pop(k)
    return aDict
new_user=list(map(add_Country_Code,user))
print(new_user)
This works partially when I run it; the output is below:
[{'1+952-201-3787': 'US'}, {'1+1+1+952-201-5984': 'US'}, {'001+9871299': 'BD'}, {'044+01632 960513': 'UK'}]
Notice that the 2nd US number has two additional '1+' prefixes. What is causing that, and how can I fix it? Thanks a lot.
Issue
You are mutating a dict while iterating over it. Don't do this. Each pop-and-reinsert adds a new key that the running iterator may visit again, which is how the prefix can get applied more than once. The Pythonic convention would be:
Make a new_dict = {}
While iterating over the input a_dict, assign new items to new_dict.
Return the new_dict
IOW, create new things rather than change old things; changing old things is likely the source of your woes.
Some notes
Use lowercase with underscores when defining variable names (see PEP 8).
Look up values rather than change the input dict, e.g. a_dict[k] vs. a_dict.pop(k)
Indent the correct number of spaces (see PEP 8)
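The steps above can be sketched as follows. This is one possible rewrite, not the only one: the prefixes are copied from the question's code, while the COUNTRY_PREFIXES lookup table and the choice to leave unknown countries without a prefix are assumptions made for illustration.

```python
# Prefixes copied from the question's code.
COUNTRY_PREFIXES = {"US": "1+", "UK": "044+", "BD": "001+"}


def add_country_code(phone_book):
    """Return a new dict whose keys carry the country prefix."""
    new_book = {}
    for number, country in phone_book.items():
        # Assumption: a country without a known prefix keeps its number as-is.
        prefix = COUNTRY_PREFIXES.get(country, "")
        new_book[prefix + number] = country
    return new_book


users = [{"952-201-3787": "US"}, {"952-201-5984": "US"},
         {"9871299": "BD"}, {"01632 960513": "UK"}]
new_users = list(map(add_country_code, users))
print(new_users)
```

Because the function only reads from its argument and writes to new_book, each key is visited exactly once and the input dictionaries are left unchanged.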

How to access and mutate node property value by the property name string in Cypher?

My goal is to access and mutate a property of a node in a cypher query where the name of the property to be accessed and mutated is an unknown string value.
For example, consider a command:
Find all nodes containing two properties such that the name of the first property is lower-case and the name of the second is the upper-case form of the first. Then propagate the value of the property with the lower-case name to the property with the upper-case name.
The particular case is easy:
MATCH ( node )
WHERE has(node.age) AND has(node.AGE) AND node.age <> node.AGE
SET node.AGE = node.age
RETURN node;
But I can't seem to find a way to implement the general case in a single request.
Specifically, I am unable to:
Access the property of the node with a string and a value
Mutate the property of the node with a string and a value
For the sake of clarity, I'll include my attempt to handle the general case. Where I failed to modify the property of the node I was able to generate the cypher for a command that would accomplish my end goal if it were executed in a subsequent transaction.
MERGE ( justToMakeSureOneExists { age: 14, AGE : 140 } ) WITH justToMakeSureOneExists
MATCH (node)
WHERE ANY ( kx IN keys(node) WHERE kx = LOWER(kx) AND ANY ( ky in keys(node) WHERE ky = UPPER(kx) ) )
REMOVE node.name_conflicts // make sure results are current
FOREACH(kx in keys(node) |
SET node.name_conflicts
= COALESCE(node.name_conflicts,[])
+ CASE kx
WHEN lower(kx)
THEN []
+ CASE WHEN any ( ky in keys(node) WHERE ky = upper(kx) )
THEN ['match (node) where id(node) = ' + id(node)+ ' and node.' + upper(kx) + ' <> node.' + kx + ' set node.' + upper(kx) + ' = node.' + kx + ' return node;']
ELSE [] END
ELSE []
END )
RETURN node,keys(node)
Afterthought: It seems like the ability to mutate a node property by property name would be a pretty common requirement, but the lack of obvious support for the feature leads me to believe that the feature was omitted deliberately? If this feature is indeed unsupported is there any documentation to explain why and if there is some conflict between the approach and the recommended way of doing things in Neo/Cypher?
There is some discussion going on regarding improved support for dynamic property access in Cypher. I'm pretty confident that we will see support for this in the future, but I cannot comment on a target release or a date.
As a workaround I'd recommend implementing this in an unmanaged extension.
It appears that the desired language feature was added to Cypher in Neo4j 2.3.0 under the name "dynamic property". The Cypher docs from version 2.3.0 onward list the following as a valid Cypher expression:
A dynamic property: n["prop"], rel[n.city + n.zip], map[coll[0]].
This feature is documented for 2.3.0 but is absent from the previous version (2.2.9).
Thank you Neo4j Team!
