Cypher Query - Excluding certain relationships - graph

I am querying my graph where it has the following nodes:
Customer
Account
Fund
Stock
With the following relationships:
HAS (a customer HAS an account)
PURCHASED (an account PURCHASES a fund or stock)
HOLDS (a fund HOLDS a stock)
The query I am trying to achieve is returning all Customers that have accounts that hold Microsoft through a fund. The following is my query:
MATCH (c:Customer)-[h:HAS]->(a:Account)-[p:PURCHASED]-(f:Fund)-[holds:HOLDS]->(s:Stock {ticker: 'MSFT'})
WHERE exists((f)-[:HOLDS]->(s:Stock))
AND exists ((f:Fund)-[holds]->(s:Stock))
AND NOT exists((a:Account {account_type: 'Individual'})-[p:PURCHASED]->(s:Stock))
RETURN *
This almost gets me the desired results but I keep getting 2 relationships out of the Microsoft stock that is tied to an Individual account where I do not want those included.
Any help would be greatly appreciated!
Result:
Desired Result:

There is duplications in your query. Lines 2 and 3 are the same. Line 2 is a subgraph of Line 1. Then you are using the variables a, p and s more than once in line 1 and line 4. Below query is not tested but give it a try. Please tell me if it works for you or not.
MATCH (c:Customer)-[h:HAS]->(a:Account)-[p:PURCHASED]-(f:Fund)-[holds:HOLDS]->(s:Stock {ticker: 'MSFT'})
WHERE NOT exists((:Account{account_type: 'Individual'})-[:PURCHASED]->(:Stock))
RETURN *

It seems to me that you should just uncheck the "Connect result nodes" option in the Neo4j Browser:

Related

How to query Gremlin when multiple connections between nodes are present

I'm trying to build a suggestion engine using Gremlin but I'm having a hard time trying to understand how to create a query when multiple nodes are connected by different intermediate nodes.
Playground:
https://gremlify.com/alxrvpfnlo9/2
Graph:
In this simple example I have two users, both like cheese and bread. But User2 also likes sandwiches, which seems a good suggestion for User1 as he shares some common interests with User2
The question I'm trying to answer is: "What can I suggest to User1 based on what other users like?"
The answer should be: Everything that other users that like the same things as User1 likes, but excluding what User1 already like. In this case it should return a sandwich
So far I have this query:
g.V(2448600).as('user1')
.out().as('user1Likes')
.in().where(neq('user1')) // to get to User2
.out().where(neq('user1Likes')) // to get to what User2 likes but excluding items that User1 likes
Which returns:
Sandwich, bread, Sandwich (again), cheese
I think that it returns that data because it walks through the graph by the Cheese node first, so Bread is not included in the 'user1Likes' list, thus not excluded in the final result. Then it walks through the Bread node, so cheese in this case is a good suggestion.
Any ideas/suggestions on how to write that query? Take into consideration that it should escalate to multiple users-ingredients
I suggest that you model your problem differently. Normally the vertex label is used to determine the type of the entity. Not to identify the entity. In your case, I think you need two vertex labels: "user" and "product".
Here is the code that creates the graph.
g.addV('user').property('name', 'User1').as('user1').
addV('user').property('name', 'User2').as('user2').
addV('product').property('name', 'Cheese').as('cheese').
addV('product').property('name', 'Bread').as('bread').
addV('product').property('name', 'Sandwiches').as('sandwiches').
addE('likes').from('user1').to('cheese').
addE('likes').from('user1').to('bread').
addE('likes').from('user2').to('cheese').
addE('likes').from('user2').to('bread').
addE('likes').from('user2').to('sandwiches')
And here is the traversal that gets the recommended products for "User1".
g.V().has('user', 'name', 'User1').as('user1').
out('likes').aggregate('user1Likes').
in('likes').
where(neq('user1')).
dedup().
out('likes').
where(without('user1Likes')).
dedup()
The aggregate step aggregates all the products liked by "User1" into a collection named "user1Likes".
The without predicate passes only the vertices that are not within the collection "user1Likes".

Query number of edges to each connected vertex

I have a dataset with several orders and their respective purchases (called products). The orders are linked to the products they purchased. For a given product P1, I would like to get a list of all "linked" products (i.e. products that were purchased in the same orders as P1) with the number of occurrences (i.e. the number of orders that purchased those "linked" products)
In gremlin, I have already constructed a graph connecting all orders to their respective purchases. I am using Cosmos DB in case that is important. I am able to query all orders that purchased P1:
g.V().hasLabel('PRODUCT').has('id', '2').in('purchased')
I am also able to query all linked products:
g.V().hasLabel('PRODUCT').has('id','2').in('purchased').out('purchased')
However, the first query only returns the orders that purchased P1, and the second only returns the "linked" products. I am unable to get the number of occurrences of each "linked" product. Does anyone have any advice? Thanks in advance...
That's a classic recommendation query. Basically, you're only missing the final step:
g.V().has('PRODUCT','id','2').
in('purchased').
out('purchased').
groupCount()
However, you probably don't want to include the initial product in your result:
g.V().has('PRODUCT','id','2').as('a').
in('purchased').
out('purchased').
where(neq('a')).
groupCount()

Get the counts of all first generation nodes neo4j

My data structure is very simple. One label called customers with a one-to-one, one directional relationship to another customer of being referred. What is the correct query to retrieve the counts for each node of all the degrees of referred nodes that resulted from it.
In other words, if the database consisted
CustomerA referred CustomerB,
CustomerB referred CustomerC
the resulting table should be:
Customer 1st gen referrals 2nd gen referrals
A 1 1
B 1 0
C 0 0
You could match on nodes and find the sizes of the desired patterns:
MATCH (c:Customer)
RETURN c as Customer,
size((c)-[:REFERRED]->()) as firstGenRef,
size((c)-[:REFERRED*2]->()) as secondGenRef
EDIT
As far as returning the counts of all levels of referrals, that's likely going to be an expensive query, depending on how interconnected your data is.
You can give this a try, and if it takes too long or hangs, you may want to switch to APOC Procedures, specifically apoc.path.spanningTree(), which uses NODE_GLOBAL uniqueness to only retain a single path to each node encountered, that usually performs better.
MATCH (c:Customer)-[r:REFERRED*]->(ref)
WITH DISTINCT c, size(r) as gen, ref
WITH c, gen, count(gen) as referrals
ORDER BY gen ASC
RETURN c as Customer, collect({gen:gen, referrals:referrals}) as referrals
This will get you each customer on a row, along with collected maps of the generation and number of referrals at each generation, down to the maximum generation depth per customer.

How do I create a running count of outcomes sequentially by date and unique to a specific person/ID?

I have a list of unique customers who have made transactions over a year (Jan – Dec). They have bought products using 3 different methods (card, cash, check). My goal is to build a multi-classification model to predict the method pf payment.
To do this I am engineering some Recency and Frequency features into my training data, but am having trouble with the following frequency count because the only way I know how to do it is in Excel using the Countifs and SUMIFs functions, which are inhibitingly slow. If someone can help and/or suggest another solution, it would be very much appreciated:
So I have a data set with 3 columns (Customer ID, Purchase Date, and Payment Type) that is sorted by Purchase Date then Customer ID. How do I then get a prior frequency count of payment type by date that does not include the count of the current row transaction or any future transactions that are > the Purchase Date. So basically I want to do a running count of each payment option, based on a unique Customer ID, and a date range that is < purchase date of that training row. In my head I see it as “crawling” backwards through the transactions and counting. Simplified screenshot of data frame is below with the 3 prior count columns I am looking to generate programmatically.
Screenshot
This gives you the answer as a list of CustomerID, PurchaseDate, PaymentMethod and prior counts
SELECT CustomerID, PurchaseDate, PaymentMethod,
(
select count(CustomerID) from History T
where
T.CustomerID=History.CustomerID
and T.PaymentMethod=History.PaymentMethod
and T.PurchaseDate<History.PurchaseDate
)
AS PriorCount
FROM History;
You can save this query and use it as the source for a crosstab query to get the columnar format you want
Some notes:
I assumed "History" as the source table name - you can change the query above to use the correct source
To use this as a query, open a new query in design view. Close the window that asks what tables the query is to be built on. Open the SQL view of the query design - like design view, but it shows the SQL instead of the normal design interface. Copy the above into the SQL view.
You should now be able to switch to datasheet view and see the results
When the query is working to your satisfaction, save it with any appropriate name
Open a new query in design view
When you get the list of tables to include, switch to the list of queries and include the query you just saved
Change the query type to crosstab and update the query as needed to select rows, columns and values - look up "access crosstab queries" if you need more help.
Another tip to see what is happening here:
You can take the subquery - the parts inside the () above - and make
just that statement into it's own query, excluding the opening and closing (). Then you can look at it's design view to see what it does
Save it with an appropriate name and put it into the query above in place of the statement in () - then you can look at the design view.
Sometimes it's easier to visualize and learn from 2 queries strung together this way than to work with sub queries.

Graph DB get the next best recommended node in Neo4j cypher

I have a graph using NEO4j and currently trying to build a simple recommendation system that is better than text based search.
Nodes are created such as: Album, People, Type, Chart
Relationship are created such as:
People - [:role] -> Album
where roles are: Artist, Producer, Songwriter
Album-[:is_a_type_of]->Type (type is basically Pop, Rock, Disco...)
People -[:POPULAR_ON]->Chart (Chart is which Billboard they might have been)
People -[:SIMILAR_TO]->People (Predetermined similarity connection)
I have written the following cypher:
MATCH (a:Album { id: { id } })-[:is_a_type_of]->(t)<-[:is_a_type_of]-(recommend)
WITH recommend, t, a
MATCH (recommend)<-[:ARTIST_OF]-(p)
OPTIONAL MATCH (p)-[:POPULAR_ON]->()
RETURN recommend, count(DISTINCT t) AS type
ORDER BY type DESC
LIMIT 25;
It works however, it easily repeats itself if it has only one type of music connected to it, therefore has the same neighbors.
Is there a suggested way to say:
Find me the next best album that has the most similar connected relationships to the starting Album from.
Any Recommendation for a tie breaker scenario? Right now it is order by type (so if an album has more than one type of music it is valued more but if everyone has the same number, there is no more
significant)
-I made the [:SIMILAR_TO] link to enforce a priority to consider that relationship as important, but I haven't had a working cypher with it
-Same goes for [:Popular_On] (Maybe Drop this relationship?)
You can use 4 configurations and order albums according to higher value in this order. Keep configuration between 0 to 1 (ex. 0.6)
a. People Popular on Chart and People are similar
b. People Popular on Chart and People are Not similar
c. People Not Popular on Chart and People are similar
d. People Not Popular on Chart and People are Not similar
Calculate and sum these 4 values with each album. Higher the value, higher recommended Album.
I have temporarily made config as a = 1, b =0.8, c=0.6, d = 0.4. And assumed some relationship present which suggests some People Likes Album. If you are making logic based on Chart only then use a & b only.
MATCH (me:People)
where id(me) = 123
MATCH (a:Album { id: 456 })-[:is_a_type_of]->(t:Type)<-[:is_a_type_of]-(recommend)
OPTIONAL MATCH (recommend)<-[:ARTIST_OF]-(a:People)-[:POPULAR_ON]->(:Chart)
WHERE exists((me)-[:SIMILAR_TO]->(a))
OPTIONAL MATCH (recommend)<-[:ARTIST_OF]-(b:People)-[:POPULAR_ON]->(:Chart)
WHERE NOT exists((me)-[:SIMILAR_TO]->(b))
OPTIONAL MATCH (recommend)<-[:LIKES]-(c:People)
WHERE exists((me)-[:SIMILAR_TO]->(a))
OPTIONAL MATCH (recommend)<-[:LIKES]-(d:People)
WHERE NOT exists((me)-[:SIMILAR_TO]->(a))
RETURN recommend, (count(a)*1 + count(b)*0.8 + count(c)* 0.6+count(d)*0.4) as rec_order
ORDER BY rec_order DESC
LIMIT 10;

Resources