Computing ARI by finding patterns in a reduced graph

I have a graph in which items can be partitioned differently, i.e. clusters of items vary between two partitions. I'd like to calculate the Adjusted Rand Index between two different partitions to evaluate how much they differ.
The graph
This is the general structure of the graph:
(:Store)-[:SELLS]->(:Product)-[:SIMILAR]-(:Product)<-[:SELLS]-(:Store)

            --- red_beans ------ orange_beans ---
           /                   /                 \
supermart ------ yellow_beans ---                 --- ecomart
           \                   \                 /
            --- blue_beans ------ green_beans ---

                               --- blue_jello ---
                              /                  \
supermart --- purple_jello ---                    --- ecomart
                              \                  /
                               --- red_jello ---
Each store sells different colored products.
Each product from a store can be similar to products in another store.
Additionally, each store can bundle different products together:
(:Product)-[:BUNDLED]-(:Product)
purple_jello --- red_beans --- blue_beans (supermart bundle)
red_jello --- orange_beans (ecomart bundle)
The question
Regardless of product color (different colors of a product are considered to be the same product), for products available in both stores (similar products are actually the same product available in both stores), how much do bundles differ between stores?
The process
The first step would be to reduce the graph to correspond to the question. It should (virtually) look like this:
(:Store)-[:SELLS]->(:Product)-[:SIMILAR]-(:Product)<-[:SELLS]-(:Store)
supermart --- super_beans --- eco_beans --- ecomart
supermart --- super_jello --- eco_jello --- ecomart
I would like to reduce the different colors of a product into one by looking at their indirect similarities. Products from the same store that are indirectly linked by similarity will be considered the same:
(a:Store)-[:SELLS]->(:Product)-[:SIMILAR*]-(:Product)<-[:SELLS]-(a:Store)
The bundles will also be simplified like this:
(:Product)-[:BUNDLED]-(:Product)
super_jello --- super_beans (supermart bundle)
eco_jello --- eco_beans (ecomart bundle)
This should be a natural extension of the products reduction. Multiple edges will also be reduced into one.
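Outside Cypher, the product reduction described above amounts to finding connected components over the SIMILAR edges and then splitting each component by store. The following is a minimal Python sketch of that idea, not a Cypher solution. The function names (connected_components, reduce_products) are hypothetical, and the data is only the beans and jello part of the example graph. Products without any SIMILAR edge never enter a component, which matches the restriction to products available in both stores.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes that are linked, directly or indirectly, by undirected edges."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for left, right in edges:
        parent[find(left)] = find(right)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def reduce_products(sells, similar):
    """Return {(store, component_id): products of that store merged into one}."""
    store_of = {p: store for store, products in sells.items() for p in products}
    reduced = defaultdict(set)
    for component_id, component in enumerate(connected_components(similar)):
        for product in component:
            reduced[(store_of[product], component_id)].add(product)
    return dict(reduced)


# Beans and jello subset of the example graph.
SELLS = {
    "supermart": ["red_beans", "yellow_beans", "blue_beans", "purple_jello"],
    "ecomart": ["orange_beans", "green_beans", "blue_jello", "red_jello"],
}
SIMILAR = [
    ("red_beans", "orange_beans"), ("yellow_beans", "orange_beans"),
    ("yellow_beans", "green_beans"), ("blue_beans", "green_beans"),
    ("purple_jello", "blue_jello"), ("purple_jello", "red_jello"),
]

reduced = reduce_products(SELLS, SIMILAR)
for key, members in sorted(reduced.items()):
    print(key, sorted(members))
```

Each key of the result plays the role of one reduced node such as super_beans or eco_jello. Two reduced nodes that share a component id are the same product in the two stores.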
The second step would be to extract the information necessary to calculate the ARI between the two partitions of products. Three measures are needed to compute it:
1. The number of product pairs that are bundled together in both stores. Basically, the number of times this pattern is present in the graph:
(a:Store)-[:SELLS]->(:Product)-[:SIMILAR]-(:Product)<-[:SELLS]-(b:Store)
                        |                     |
                    [:BUNDLED]            [:BUNDLED]
                        |                     |
(a:Store)-[:SELLS]->(:Product)-[:SIMILAR]-(:Product)<-[:SELLS]-(b:Store)
2. The number of product pairs that are bundled together in one store. Basically, the number of times this pattern is present in the graph, for each store:
(a:Store)-[:SELLS]->(:Product)
                        |
                    [:BUNDLED]
                        |
(a:Store)-[:SELLS]->(:Product)
3. The number of possible product pairs in one store. Since we consider only products that are available in both stores, each store has the same number of products, and this would be:
MATCH (a:Store)-[:SELLS]->(p:Product)
RETURN a, count(p)*(count(p)-1)/2
The implementation
Translating this into Cypher queries isn't as natural as I expected. I am having trouble expressing my process as queries, especially the first step, and would appreciate any help.
Edit: This is the graph I am working on:
with [
  {
    store: 'supermart',
    products: ['red_beans','yellow_beans','blue_beans','purple_jello',
               'orange_icecream','pink_icecream','blue_candy','brown_cake']
  },
  {
    store: 'ecomart',
    products: ['orange_beans','green_beans','blue_jello','red_jello',
               'red_icecream','white_icecream','purple_candy','white_sugar']
  }
] as sells
unwind sells as sell
merge (s:Store {name: sell.store})
with s, sell
unwind sell.products as product
merge (p:Product {name: product})
merge (s)-[:SELLS]->(p);

with [
  ['red_beans', 'purple_jello'],
  ['red_beans', 'blue_beans'],
  ['blue_beans', 'purple_jello'],
  ['yellow_beans', 'brown_cake'],
  ['orange_icecream', 'pink_icecream'],
  ['pink_icecream', 'blue_candy'],
  ['orange_beans', 'red_jello']
] as bundles
unwind bundles as bundle
match (p1:Product {name: bundle[0]})
match (p2:Product {name: bundle[1]})
merge (p1)-[:BUNDLED]-(p2);

with [
  ['red_beans', 'orange_beans'],
  ['yellow_beans', 'orange_beans'],
  ['yellow_beans', 'green_beans'],
  ['blue_beans', 'green_beans'],
  ['purple_jello', 'blue_jello'],
  ['purple_jello', 'red_jello'],
  ['orange_icecream', 'red_icecream'],
  ['pink_icecream', 'red_icecream'],
  ['pink_icecream', 'white_icecream'],
  ['blue_candy', 'purple_candy']
] as similarities
unwind similarities as similar
match (p1:Product {name: similar[0]})
match (p2:Product {name: similar[1]})
merge (p1)-[:SIMILAR]-(p2);
The ARI for this graph is 0.571428
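As a cross-check of that value, the three measures from the second step can be plugged into the pair-counting form of the ARI: (both - expected) / (maximum - expected), where expected = pairs_a * pairs_b / total and maximum = (pairs_a + pairs_b) / 2. The sketch below is a Python illustration under assumptions rather than a Cypher answer. The reduced bundle pairs were worked out by hand from the sample graph (beans, jello, icecream and candy are the four products available in both stores). The names pair and adjusted_rand_index are hypothetical. Counting BUNDLED pairs only equals counting co-clustered pairs if every reduced bundle is fully connected.

```python
def pair(a, b):
    """Order-independent key for an undirected product pair."""
    return tuple(sorted((a, b)))


def adjusted_rand_index(pairs_a, pairs_b, n_items):
    """ARI from the sets of co-bundled pairs of two partitions of n_items products."""
    total = n_items * (n_items - 1) // 2        # measure 3: all possible pairs
    both = len(pairs_a & pairs_b)               # measure 1: pairs bundled in both stores
    count_a, count_b = len(pairs_a), len(pairs_b)  # measure 2, per store
    expected = count_a * count_b / total
    maximum = (count_a + count_b) / 2
    if maximum == expected:
        # Only happens when both partitions are identical and trivial
        # (no pairs at all, or every pair bundled).
        return 1.0
    return (both - expected) / (maximum - expected)


# Hand-reduced bundles of the sample graph (assumption, see lead-in).
SUPERMART = {pair("beans", "jello"), pair("icecream", "candy")}
ECOMART = {pair("beans", "jello")}

ari = adjusted_rand_index(SUPERMART, ECOMART, n_items=4)
print(round(ari, 6))
```

With 1 shared pair, 2 and 1 pairs per store, and 6 possible pairs, this gives (1 - 1/3) / (3/2 - 1/3) = 4/7, which agrees with the stated 0.571428.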


How to get recommendations based on previous orders in neo4j?

I am trying to work on neo4j for the first time. I have written the following:
LOAD CSV WITH HEADERS FROM "file:///restaurant_data.csv" AS data
MERGE(n1:Customer{Name:data.Name, Latitude:toFloat(data.Latitude),Longitude:toFloat(data.Longitude)})
MERGE(n2:Orders{OrderId:data.Order_ID,OrderTimestamp:data.Order_ts,FoodName:data.Food_Item})
MERGE(n3:Restaurant{RestaurantName:data.Restaurant, RestLat:toFloat(data.Rest_lat), RestLong:toFloat(data.Rest_long)})
MERGE (n1)-[r1:PLACES_ORDER]->(n2)
MERGE (n2)-[r2:BELONGS_TO]->(n3)
MERGE (n3)-[r3:SERVES]->(n2)
RETURN *;
I can share the CSV if needed.
I want to recommend restaurants to a customer based on their 5 most recent orders (by order timestamp), and the distance between the restaurant and the customer should be less than 5 km.
MATCH(n1:Customer{Name:"Angy"})
MATCH(n1)-[:PLACES_ORDER]->(n2:Orders)<-[:SERVES]-(rec:Restaurant)
RETURN n2.FoodName, n2.OrderTimestamp, rec.RestaurantName
ORDER BY n2.OrderTimestamp
LIMIT 5
This gives me only top 5 orders, how do I find restaurants serving those orders?
My file link: https://docs.google.com/spreadsheets/d/e/2PACX-1vTc35TBanV3Uk5gbCeEFeJkm2YAhbJPLnpS0KzmYErVRulvbXCWdSZ7xiKUfnCZQQUt-1ArabgmAGmL/pubhtml
Your query did not work. This worked:
MATCH(:Customer{Name:"Angy"})-[:PLACES_ORDER]->(o:Orders)
WITH o ORDER BY o.OrderTimestamp DESC LIMIT 5
WITH [o.FoodName] as foods
MATCH (r:Restaurant)-[:SERVES]->(n:Orders)
WHERE (n.FoodName in foods)
RETURN distinct r.RestaurantName,foods
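The query above does not apply the 5 km constraint from the question. As a minimal sketch of that check, the great-circle (haversine) distance can be computed from the Latitude/Longitude and RestLat/RestLong properties loaded earlier. This is plain Python rather than Cypher. The function names and the sample coordinates are made up for illustration, and coordinates are assumed to be in decimal degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(customer, restaurants, max_km=5.0):
    """Names of restaurants closer than max_km to the customer."""
    return [
        r["RestaurantName"]
        for r in restaurants
        if haversine_km(customer["Latitude"], customer["Longitude"],
                        r["RestLat"], r["RestLong"]) < max_km
    ]


# Hypothetical sample data using the property names from the LOAD CSV query.
customer = {"Name": "Angy", "Latitude": 12.9716, "Longitude": 77.5946}
restaurants = [
    {"RestaurantName": "Near", "RestLat": 12.9750, "RestLong": 77.6000},
    {"RestaurantName": "Far", "RestLat": 13.1000, "RestLong": 77.6000},
]
print(within_radius(customer, restaurants))
```

The same filter would be applied to the restaurants returned for the five most recent orders.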

Write to Firebase with serial number indexing

The data in LiveAutos is set using:
ref.child("hhh#hgh").child("latitude").setValue(location.latitude)
The data in LivelyAutos is set by a Python script that uploads a JSON file.
How can I write the data in LiveAutos the same way as in LivelyAutos, with serial numbers 0, 1, 2, 3?
The database will be updated with the locations of multiple devices.
Alternatively, how can I read the data from LiveAutos?
As @FrankvanPuffelen mentioned in his comment, storing elements under sequential numeric keys is not a recommended way of adding data to Firebase Realtime Database. It is rather an anti-pattern, since such a schema doesn't scale. What you can do instead is use the push() method:
ref.child("hhh#hgh").push().child("latitude").setValue(location.latitude)
Which will produce a schema that looks like this:
Firebase-root
   |
   --- hhh#hgh
         |
         --- $pushedId
               |
               --- latitude: 0.00
               |
               --- longitude: 0.00
In this way you can add as many locations as you want.

How to remove duplicate lines in YAML format configuration files?

I have a bunch of manifest/yaml files that may or may not have these key value pair duplicates:
...
app: activity-worker
app: activity-worker
...
I need to search through each of those files and find those duplicates so that I can remove one of them.
Note: I know that to replace a certain string (say, switch service: to app:) in all files of a directory (say, dev) I can run grep -l 'service:' dev/* | xargs sed -i "" 's/\service:/app:/g'. I'm looking for a relation between lines.
What you call YAML is not YAML. The YAML specification very explicitly states that keys in a mapping must be unique, and your keys are not:
The content of a mapping node is an unordered set of key: value node
pairs, with the restriction that each of the keys is unique. YAML
places no further restrictions on the nodes. In particular, keys may
be arbitrary nodes, the same node may be used as the value of
several key: value pairs, and a mapping could even contain itself as
a key or a value (directly or indirectly).
On the other hand, some libraries have implemented this incorrectly, choosing to overwrite any previous value associated with a key with a later value. In your case, since the values are the same, which value is taken doesn't really matter.
Also, your block style representation is not the only way to represent the key-value pairs of a mapping in "YAML". These duplicates could also be represented in a flow-style mapping, as
{...., app: activity-worker, app: activity-worker, .... }
The two occurrences need not be next to each other, nor on the same line. The following "YAML" is also semantically equivalent to your input:
{...., app: activity-worker, app:
activity-worker, .... }
If you have such faulty "YAML" files, the best way to clean them up is to use the round-trip capabilities of ruamel.yaml (disclaimer: I am the author of that package), and its ability to switch between raising an exception and warning on faulty input containing duplicate keys. You can install it for your Python (virtual environment) using:
pip install ruamel.yaml
Assuming your file is called input.yaml and it contains:
a: 1 # some duplicate keys follow
app: activity-worker
app: activity-worker
b: "abc"
You can run the following one-liner:
python -c "import sys; from ruamel.yaml import YAML; yaml = YAML(); yaml.preserve_quotes=yaml.allow_duplicate_keys=True; yaml.dump(yaml.load(open('input.yaml')), sys.stdout)"
to get:
a: 1 # some duplicate keys follow
app: activity-worker
b: "abc"
and if your input were like:
{a: 1, app: activity-worker, app:
activity-worker, b: "abc"}
the output would be:
{a: 1, app: activity-worker, b: "abc"}
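Before rewriting anything, it can help to list which files contain repeated lines at all. The following standard-library sketch is a line-based heuristic, not a YAML parser: it flags a line that exactly repeats an earlier line at the same indentation within the same block. It will not see duplicates inside flow-style mappings like the one above, and the function name find_duplicate_lines is made up for this example.

```python
def find_duplicate_lines(text):
    """Return (line_number, line) for lines repeating an earlier sibling line."""
    duplicates = []
    seen = {}  # indentation width -> lines seen in the currently open block
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.rstrip()
        stripped = line.lstrip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and full-line comments
        indent = len(line) - len(stripped)
        # A shallower line closes every deeper block.
        for deeper in [level for level in seen if level > indent]:
            del seen[deeper]
        if stripped.startswith("-"):
            continue  # sequence items may legitimately repeat
        block = seen.setdefault(indent, set())
        if line in block:
            duplicates.append((number, line))
        else:
            block.add(line)
    return duplicates


SAMPLE = """a: 1 # some duplicate keys follow
app: activity-worker
app: activity-worker
b: "abc"
"""

for number, line in find_duplicate_lines(SAMPLE):
    print(f"line {number}: {line}")
```

To scan a directory, the function could be applied to the text of each file and the file name printed whenever the returned list is non-empty.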

Can I use this CSV to load a neo4j graph with cypher?

I am a medical doctor trying to model a drugs to enzymes database and am starting with a CSV file I use to load my data into the Gephi graph layouting program. I understand the power of a graph db but am illiterate with cypher:
The current CSV has the following format:
source;target;arc_type; <- this is an header needed for Gephi import
artemisinin;2B6;induces;
...
amiodarone;1A2;represses;
...
3A457;carbamazepine;metabolizes;
These sample records show the three types of relationships. Drugs can repress or augment a cytochrome, and cytochromes metabolize drugs.
Is there a way to use this CSV as is to load into neo4j and create the graph?
Thank you very much.
In neo4j terminology, a relationship must have a "type", and a node can have any number of labels. It looks like your use case could benefit from labelling your nodes with either Drug or Cytochrome.
Here is a possible neo4j data model for your use case:
(:Drug)-[:MODULATES {induces: false}]->(:Cytochrome)
(:Cytochrome)-[:METABOLIZES]->(:Drug)
The induces property has a boolean value indicating whether a drug induces (true) or represses (false) the related cytochrome.
The following is a (somewhat complex) query that generates the above data model from your CSV file:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///Drugs.csv' AS line FIELDTERMINATOR ';'
WITH line,
  CASE line.arc_type
    WHEN 'metabolizes' THEN {a: [1]}
    WHEN 'induces' THEN {b: [true]}
    ELSE {b: [false]}
  END AS todo
FOREACH (ignored IN todo.a |
  MERGE (c:Cytochrome {id: line.source})
  MERGE (d:Drug {id: line.target})
  MERGE (c)-[:METABOLIZES]->(d)
)
FOREACH (induces IN todo.b |
  MERGE (d:Drug {id: line.source})
  MERGE (c:Cytochrome {id: line.target})
  MERGE (d)-[:MODULATES {induces: induces}]->(c)
)
The FOREACH clause does nothing if the value after the IN is null.
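The routing that the CASE expression and the two FOREACH clauses perform can be checked outside Neo4j. The sketch below mirrors that logic in Python for the three sample rows from the question (the "..." rows and the Gephi note after the header are left out). The names classify and load_edges are hypothetical, and nothing here talks to a database.

```python
import csv
import io


def classify(row):
    """Map one CSV row to (start_label, start_id, rel_type, props, end_label, end_id)."""
    if row["arc_type"] == "metabolizes":
        # Cytochrome is the source, Drug is the target.
        return ("Cytochrome", row["source"], "METABOLIZES", {}, "Drug", row["target"])
    # 'induces' and anything else (here: 'represses') become MODULATES.
    props = {"induces": row["arc_type"] == "induces"}
    return ("Drug", row["source"], "MODULATES", props, "Cytochrome", row["target"])


def load_edges(text):
    """Parse semicolon-separated text with a header row into classified edges."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [classify(row) for row in reader]


SAMPLE = """source;target;arc_type;
artemisinin;2B6;induces;
amiodarone;1A2;represses;
3A457;carbamazepine;metabolizes;
"""

for edge in load_edges(SAMPLE):
    print(edge)
```

The trailing semicolon on each line produces an extra empty column, which this sketch simply ignores.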
Yes, it's possible, but you will need to install APOC, a library of useful stored procedures for Neo4j. You can find it here: https://neo4j-contrib.github.io/neo4j-apoc-procedures/
Then put your CSV file into the import folder of Neo4j and run these queries.
The first one creates a unique constraint on :Node(name):
CREATE CONSTRAINT ON (n:Node) ASSERT n.name IS UNIQUE;
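And then this query to import your data (the sample file is semicolon-separated, hence the FIELDTERMINATOR):
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM 'file:///my-csv-file.csv' AS line FIELDTERMINATOR ';'
MERGE (n:Node {name: line.source})
MERGE (m:Node {name: line.target})
WITH n, m, line
CALL apoc.create.relationship(n, line.arc_type, {}, m) YIELD rel
RETURN count(rel)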

How to design three dimension nosql database using Firebase

I am using Firebase as database for an e-commerce app.
I got a problem in the product catalog design.
My idea is:
1. One product can have different sizes. (e.g. S, M, L, XL)
2. One product can also have different colors. (e.g. black, blue, bronze, red)
3. The price can vary depending on the size or the color, e.g. a medium size T-shirt in black is $100, while the same size T-shirt in blue is $150.
In other words, there can be up to 9 different prices for one T-shirt which has 3 sizes and 3 colors.
Below is the design I can come up with.
I stored the sizes, colors, and the prices in the child 'sku'.
Under this child, I put the price for different color in the child 'price'
But I don't think it is the best design, so I hope someone can advise a better solution.
For security reasons, I have hidden part of the unique key.
You should remodel your database according to your needs.
Products
   |
   -Kj53453453453453 //ProductId
   |     |
   |     --- Small_Size: true
   |     |
   |     --- Black_Color: true
   |     |
   |     --- Quantity: 7
   |     |
   |     --- Price: 100
   |     |
   |     --- ProductId: T_SHORT_ID // which must be the same for all t-shirts of same type
   |
   -Kj53453794677886 //ProductId
         |
         --- XL_Size: true
         |
         --- Red_Color: true
         |
         --- Quantity: 9
         |
         --- Price: 65
         |
         --- ProductId: T_SHORT_ID // which must be the same for all t-shirts of same type
Sizes
   |
   --- Small_Size: "S"
   |
   --- Medium_Size: "M"
   |
   --- Large_Size: "L"
   |
   --- XL_Size: "XL"
   |
   --- XXL_Size: "XXL"
Colors
   |
   --- Black_Color: "Black"
   |
   --- Blue_Color: "Blue"
   |
   --- Bronze_Color: "Bronze"
   |
   --- Red_Color: "Red"
Using this model, you'll have a separate node for each product variant. This means, let's say for the first product, which has -Kj53453453453453 as its id, you know that it is black and that its size is S. For this variant you also know that you have 7 pieces. When someone buys a piece, the only thing you need to do is decrease the quantity by one, that's it!
Creating the correct queries, you'll be able to display everything from your database, all products, all sizes, all colors, all products that are black, all products that have the size of XL and so on.
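To make the shape of such queries concrete, here is a small in-memory Python sketch of the flattened model, using the two variants from the tree above. It only illustrates filtering on the boolean flags and decrementing the quantity, and it does not use the Firebase SDK. The helper names find_variants and buy_one are hypothetical. In a real database, concurrent purchases would need the decrement to run as a transaction.

```python
import copy

# The two product variants from the schema, keyed by their push ids.
PRODUCTS = {
    "-Kj53453453453453": {
        "Small_Size": True, "Black_Color": True,
        "Quantity": 7, "Price": 100, "ProductId": "T_SHORT_ID",
    },
    "-Kj53453794677886": {
        "XL_Size": True, "Red_Color": True,
        "Quantity": 9, "Price": 65, "ProductId": "T_SHORT_ID",
    },
}


def find_variants(products, **flags):
    """Ids of variants where every given field (e.g. Black_Color=True) matches."""
    return [
        key for key, item in products.items()
        if all(item.get(field) == value for field, value in flags.items())
    ]


def buy_one(products, key):
    """Decrease the quantity of one variant by one and return what is left."""
    if products[key]["Quantity"] <= 0:
        raise ValueError("out of stock")
    products[key]["Quantity"] -= 1
    return products[key]["Quantity"]


stock = copy.deepcopy(PRODUCTS)  # work on a copy so the sample data stays intact
print(find_variants(stock, Black_Color=True))
print(find_variants(stock, ProductId="T_SHORT_ID"))
print(buy_one(stock, "-Kj53453453453453"))
```

Because size and color are stored as flags on each variant, every size and color combination carries its own price and quantity.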
Hope it helps.

Resources