Neo4j expand path and return distinct nodes and relationships - graph

I am given one or more start nodes (by ID), and I need to expand over one or more hops and return a single result with an array of distinct nodes and an array of distinct relationships.
I can do this with either apoc.path.expand() or apoc.path.subgraphAll(), but either way the result has multiple rows, so the combined output may contain duplicate nodes and relationships. To reduce the multiple rows to one row, I have used collect() with apoc.coll.flatten() and apoc.coll.toSet() to remove duplicates from the nodes and relationships arrays:
apoc.path.subgraphAll:
MATCH (n) WHERE id(n) IN $ids
CALL apoc.path.subgraphAll(n, { minLevel: 1, maxLevel: 2 }) YIELD nodes, relationships
WITH collect(nodes) as nodes, collect(relationships) as relationships
RETURN apoc.coll.toSet(apoc.coll.flatten(nodes)) as nodes, apoc.coll.toSet(apoc.coll.flatten(relationships)) as relationships
apoc.path.expand:
MATCH (n) WHERE id(n) IN $ids
CALL apoc.path.expand(n, null, null, 1, 2) YIELD path
WITH collect(nodes(path)) as nodes, collect(relationships(path)) as relationships
RETURN apoc.coll.toSet(apoc.coll.flatten(nodes)) as nodes, apoc.coll.toSet(apoc.coll.flatten(relationships)) as relationships
Is there another way to remove the duplicates from the two arrays or to query the nodes and relationships?

Related

Gremlin query to select common attributes in entities

Wanted to get the selected columns of entities using gremlin
Entity1 - Employee has id, name, out_address, out_department attributes
Entity2 - Department has id, name, in_Employee attributes
Entity3 - Address has id, street, city, state, in_address attributes
In plain SQL, it's quite simple using aliases
Select emp.id, emp.name, dept.id, dept.name, address.id, address.street, address.city, address.state
From Employee emp
INNER JOIN Department dept ON dept.id = emp.id
INNER JOIN Address address ON address.id = emp.id
where emp.id = "<some condition here>"
Trying the same thing in gremlin
g.V().has('Employee', 'id','<some condition here>').out('department').values('id', 'name', 'street', 'city')
But the value we are getting is id of the department.
I am new to Gremlin. Could you please help?
You have to think of Gremlin as a pipeline of filters and transformations, so when you do:
g.V().has('Employee', 'id','<some condition here>').
out('department').
values('id', 'name', 'street', 'city')
You first get your "employee" vertex, but then by traversing out() you've transformed that "employee" vertex into its related "department" vertices. At that point, values(...) transforms the "department" vertices into property values for each "department" vertex.
So, thinking in terms of transformations, one way to get what you want is to use project():
g.V().has('Employee', 'id','<some condition here>').
project('employeeName', 'employeeId', 'dept', 'addresses').
by('name').
by('id').
by(out('department').elementMap()).
by(out('address').elementMap().fold())
You start with the same "employee" vertex as before and you transform it with project() to a Map with the specified keys. Each key's value is defined by the following by() modulators. Therefore, for the by('name'), we're saying transform the "employee" vertex to grab the "name" property value and assign it to the "employeeName" key. We do the same for by('id') and the "employeeId" key. Then for the by() associated with "dept" we start with the "employee" vertex and traverse out on the "department" edge and transform that vertex to a Map using the elementMap() step (just now available on 3.4.4 - but you could use valueMap() or whatever transform you wanted to include the data in the "dept" key). Finally, for the last by() we do the same as "department" but note that I assumed you have multiple addresses for an employee and I added a fold() to the end to reduce the stream of addresses to a List for the "addresses" key.
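To make the "pipeline of transformations" idea concrete, here is a rough Python sketch that imitates both traversals on a plain dictionary. It is not Gremlin and does not use any TinkerPop API; the employee data, the helper names out, values and project, and the choice to return None when there is no department are all assumptions made purely for illustration.

```python
# Hypothetical in-memory stand-in for one "employee" vertex and its neighbours.
employee = {
    "id": 1,
    "name": "Alice",
    "out": {
        "department": [{"id": 10, "name": "Engineering"}],
        "address": [
            {"id": 100, "street": "1 Main St", "city": "Springfield", "state": "IL"},
            {"id": 101, "street": "2 Oak Ave", "city": "Shelbyville", "state": "IL"},
        ],
    },
}


def out(vertex, label):
    """Imitate out(label): the current vertex is replaced by its neighbours."""
    return vertex["out"].get(label, [])


def values(vertices, *keys):
    """Imitate values(...): emit only the properties that exist on each vertex."""
    return [v[k] for v in vertices for k in keys if k in v]


def project(vertex):
    """Imitate project(...).by(...): build one map while staying on the employee."""
    departments = out(vertex, "department")
    return {
        "employeeName": vertex["name"],
        "employeeId": vertex["id"],
        # a by() traversal contributes only its first result;
        # None for "no department" is a simplification of this sketch
        "dept": departments[0] if departments else None,
        # fold() gathers every address into a single list
        "addresses": list(out(vertex, "address")),
    }


# The original traversal: after out('department') only department properties remain.
original = values(out(employee, "department"), "id", "name", "street", "city")

# The project() version keeps employee, department and address data together.
projected = project(employee)
```

In the sketch, original holds only department values because the employee was already replaced by its department, whereas projected still has access to all three entities.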

Use linq in the where clause

I have 2 tables _customerRepository.GetAllQueryable() and _customerSettlementRepository.GetAllQueryable().
In table _customerSettlementRepository.GetAllQueryable(), I have column ApplyD (date), after joining these two together, I want to find out max ApplyD in the where clause. This is my code:
var settlements = from c in _customerRepository.GetAllQueryable()
join cs in _customerSettlementRepository.GetAllQueryable() on new {c.CustomerMainC, c.CustomerSubC}
equals new {cs.CustomerMainC, cs.CustomerSubC} into c1
where cs.ApplyD == (c1.Select(b=>b.ApplyD).Max())
select new CustomerSettlementViewModel()
{
TaxRate = cs.TaxRate
};
It's remarkable that quite often in these questions people come up with an SQL(-like) statement without specification of the goal they want to reach. Hence it is impossible to see whether the provided statement fulfills the requirements.
Anyway, it seems you have something like Customers (in CustomerRepository) and CustomerSettlements in CustomerSettlementRepository.
Both Customers and CustomerSettlements have a CustomerMainC and a CustomerSubC. You want to join Customers and CustomerSettlements on these two properties.
A CustomerSettlement also has an ApplyD and a TaxRate.
You only want to keep the join results where ApplyD has the maximum value of ApplyD
Finally, from every remaining join result you want to create one CustomerSettlementViewModel object with the value of the TaxRate in the join result that was taken from the CustomerSettlement.
Now that I wrote this, it baffles me why you need to join in the first place, because you only use values from the CustomerSettlements, not from the Customer.
Besides, if two Customers are joined with the same CustomerSettlements, this will result in two equal CustomerSettlementViewModel objects.
But let's assume this is really what you want.
In baby steps:
IQueryable<Customer> customers = ...
IQueryable<CustomerSettlement> customerSettlements = ...
var joinResults = customers.Join(customerSettlements,
customer => new {customer.CustomerMainC, customer.CustomerSubC},
settlement => new {settlement.CustomerMainC, settlement.CustomerSubC},
(customer, settlement) => new
{
settlement.ApplyD,
settlement.TaxRate,
// add other properties from customers and settlements you want in the end result
});
In words: take all Customers and all CustomerSettlements. From every Customer create an object having the values of the customer's CustomerMainC and CustomerSubC. Do the same from every CustomerSettlement. When these two objects are equal, create a new object, having the values of the CustomerSettlement's ApplyD and TaxRate (and other properties you need in the end result)
Note that this is still an IQueryable. No query is performed yet.
From this joinResult you only want to keep those objects that have the value of ApplyD that equals the maximum value of ApplyD.
This question on StackOverflow is about selecting the records with the max value. The idea is to group the records into groups with the same value for ApplyD. Then order the groups in descending Key order and take the first group.
var groupsWithSameApplyD = joinResults.GroupBy(
joinedItem => joinedItem.ApplyD,
joinedItem => new CustomerSettlementViewModel()
{
TaxRate = joinedItem.TaxRate,
// add other values from joinedItems as needed
});
Every group in groupsWithSameApplyD has a Key equal to ApplyD. The group consists of CustomerSettlementViewModel objects created from the joinedItems that all have the same ApplyD that is in the Key of the group.
Now order by descending:
var orderedGroups = groupsWithSameApplyD.OrderByDescending(group => group.Key);
The first group contains all elements that had the largest ApplyD. Your desired result is the sequence of elements in the group.
If there is no group at all, return an empty sequence. Note that if a sequence is requested as the result, it is always better to return an empty sequence instead of null, so callers can use the returned value in a foreach without having to check for a null return value.
var result = orderedGroups.FirstOrDefault() ??
// if no groups at all, return empty sequence:
Enumerable.Empty<CustomerSettlementViewModel>();
Note: the FirstOrDefault is the first step where the query is actually performed. If desired you could put everything in one big query. Not sure if this would improve readability and maintainability.
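The same join, group, and take-the-largest-key steps can be sketched in plain Python to check the logic outside of LINQ. This is only an illustration: the sample rows and the function name latest_tax_rates are hypothetical, and lists of dictionaries stand in for the two IQueryable sources.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; the property names follow the question.
customers = [
    {"CustomerMainC": "A", "CustomerSubC": "1"},
    {"CustomerMainC": "B", "CustomerSubC": "2"},
]
settlements = [
    {"CustomerMainC": "A", "CustomerSubC": "1", "ApplyD": date(2020, 1, 1), "TaxRate": 5},
    {"CustomerMainC": "A", "CustomerSubC": "1", "ApplyD": date(2021, 1, 1), "TaxRate": 8},
    {"CustomerMainC": "B", "CustomerSubC": "2", "ApplyD": date(2021, 1, 1), "TaxRate": 10},
]


def latest_tax_rates(customers, settlements):
    """Join on (CustomerMainC, CustomerSubC), group by ApplyD, keep the max group."""
    # inner join on the composite key
    joined = [
        s
        for c in customers
        for s in settlements
        if (c["CustomerMainC"], c["CustomerSubC"])
        == (s["CustomerMainC"], s["CustomerSubC"])
    ]
    # group the view models by ApplyD
    groups = defaultdict(list)
    for s in joined:
        groups[s["ApplyD"]].append({"TaxRate": s["TaxRate"]})
    # no groups at all: return an empty sequence rather than None
    if not groups:
        return []
    # the group with the largest key holds the rows with the max ApplyD
    return groups[max(groups)]


result = latest_tax_rates(customers, settlements)
```

With the sample rows, result contains the two view models whose ApplyD is 2021-01-01, and calling the function with empty inputs yields an empty list.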
This was my syntax error. I should have written it like this from the start:
var settlements = from c in _customerRepository.GetAllQueryable()
join cs in _customerSettlementRepository.GetAllQueryable() on new {c.CustomerMainC, c.CustomerSubC}
equals new {cs.CustomerMainC, cs.CustomerSubC}
select new CustomerSettlementViewModel()
{
TaxRate = cs.TaxRate
};
settlements = settlements.Where(p => p.ApplyD == settlements.Max(b => b.ApplyD));

Neo4j - querying N items per group

The following is my query:
MATCH (u:User{id:1})-[r:FOLLOWS]->(p:Publisher)<-[:PUBLISHED]-(i:Item)-[:TAGGED]->(t:Tag)<-[f:FOLLOWS]-(u)
RETURN i, count(t) ORDER BY count(t) DESC LIMIT 100
So a User can follow a Publisher and a Tag. The query finds the items that the user may like by counting matching tags.
Suppose there are two properties, MIN and MAX, on the relationship u-r->p. These properties specify how many items the user wants to see from each publisher. How can I rewrite the query to allow this?
Here is one thought. Say for instance that the FOLLOWS relationship has a min value and a max value set. You could use the following query to limit the data that is returned by the query based on those values. I have not rewritten the entire query to include the tags and a limit there either.
// find the user and the publisher and the relationship
// which has the min/max parameters
match (u:User {id: 1})-[r:FOLLOWS]->(p:Publisher)
with u, p, r
// match the items that the publisher published
match p-[:PUBLISHED]-(i:Item)
// order them just because we can
with u, p, r, i
order by i.name
// collect the ordered items as the total list of items
with u, p, r, collect(i.name) as items
// make sure the collection is >= the minimum size of the list
// if so then return the items in the collection up to the max length
// otherwise return an empty collection
// you might want to do something else
with u, p, r, case
when length(items) >= r.min then items[..r.max]
else []
end as items
return u.name, p.name, r.min, r.max, items
The unfortunate thing about this is that you have already performed the query to get the items and are just filtering them out for display purposes. It would be nice to know the person's preference beforehand so you could apply the max limit in the query for the items using limit and a parameter. This would eliminate unnecessary database hits. Depending on the publisher there could be many, many items, and limiting them up front might be advantageous.
Here are a couple of variations to experiment with too. You could also do something like this...
// slight variation where the minimum is enforced with where instead of case
match (u:User {id: 1})-[r:FOLLOWS]->(p:Publisher)
with u, p, r
match p-[:PUBLISHED]-(i:Item)
with u, p, r, i
order by i.name
with u, p, r, collect(i.name) as items
where length(items) >= r.min
return u.name, p.name, items[..r.max]
or even this...
// only results actually between the min and max are returned
match (u:User {id: 1})-[r:FOLLOWS]->(p:Publisher)
with u, p, r
match p-[:PUBLISHED]-(i:Item)
with u, p, r, i
order by i.name
with u, p, r, collect(i.name) as items
where length(items) >= r.min
and length(items) <= r.max
return u.name, p.name, items[..r.max]
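The three queries above differ only in what happens to a publisher whose item count falls outside the min/max bounds. The Python sketch below mirrors that difference for a single publisher; the function names and sample item names are hypothetical, and None stands in for "this row is filtered out by WHERE".

```python
def case_variant(items, lo, hi):
    """First query: the row is kept, but the list is emptied when below min."""
    return items[:hi] if len(items) >= lo else []


def where_min_variant(items, lo, hi):
    """Second query: the row is dropped (None here) when below min."""
    return items[:hi] if len(items) >= lo else None


def where_range_variant(items, lo, hi):
    """Third query: the row is dropped unless min <= item count <= max."""
    return items[:hi] if lo <= len(items) <= hi else None


# Hypothetical item names, ordered by name as in the queries.
many = sorted(["c", "a", "d", "b"])
few = ["a"]
```

For example, with min 2 and max 3, the first two variants return the first three names for many, while the third variant drops that publisher because it has four items.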

cypher: how to return distinct relationship types?

How to return the distinct relationship types from all paths in cypher?
Example query:
MATCH p=(a:Philosopher)-[*]->(b:SchoolType)
RETURN DISTINCT EXTRACT( r in RELATIONSHIPS(p)| type(r) ) as RelationshipTypes
This returns a collection for each path p.
I would like to return a single collection containing the distinct relationship types across all collections.
Here is a link to a graph gist to run the query-
http://gist.neo4j.org/?7851642
You might first collect all relationships on the matched paths into a collection "allr", and then build the collection of distinct type(r) values from that collection of all relationships:
MATCH p=(a:Philosopher)-[rel*]->(b:SchoolType)
WITH collect(rel) AS allr
RETURN Reduce(allDistR =[], rcol IN allr |
reduce(distR = allDistR, r IN rcol |
distR + CASE WHEN type(r) IN distR THEN [] ELSE type(r) END
)
)
Note, each element 'rcol' in the collection "allr" is in turn a collection of relationships on each matched path.
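The nested reduce can be hard to read, so here is the same accumulation written in Python as a rough analogue. The relationship type names are made up, and each inner list stands in for the types of the relationships along one matched path.

```python
from functools import reduce

# Hypothetical relationship types, one inner list per matched path.
paths = [
    ["INFLUENCED", "MEMBER_OF"],
    ["INFLUENCED", "INFLUENCED", "TYPE_OF"],
    ["MEMBER_OF"],
]


def distinct_types(all_paths):
    """Outer reduce walks the paths, inner reduce walks one path's types."""
    return reduce(
        lambda all_dist, path: reduce(
            # append the type only if it has not been seen yet
            lambda dist, rel_type: dist if rel_type in dist else dist + [rel_type],
            path,
            all_dist,  # the inner accumulator starts from the outer one
        ),
        all_paths,
        [],
    )


types = distinct_types(paths)
```

The key detail, as in the Cypher version, is that the inner reduce is seeded with the outer accumulator, so types seen on earlier paths are not added again.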

Cypher Order By Number of Paths

Let's say I have a graph of movies and directors, where movies are connected to each other by co-viewership. I want to find similar directors, i.e. directors whose films tend to be watched together.
START n=node:index(Name="Steven Spielberg") MATCH n-->m--l<--o RETURN o;
This gets me all of the related directors, but how do I order them by the number of paths that connect them? Bonus points if I can also take weight of the tie between films into consideration.
count(*) is the number of paths that start with n and end with o:
START n=node:index(Name="Steven Spielberg")
MATCH n-->m--l<--o
RETURN o,count(*)
order by count(*) desc;
With weights on the relationships:
START n=node:index(Name="Steven Spielberg")
MATCH path=n-->m--l<--o
RETURN o,sum(reduce(sum=0,r in rels(path) : sum+r.weight)) as weight
ORDER BY weight desc;
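As a quick way to see what the two aggregations compute, the sketch below ranks end nodes first by path count and then by summed relationship weight. The director labels and weights are placeholders, and each tuple is assumed to represent one matched path with the weights of its three relationships.

```python
from collections import Counter, defaultdict

# Hypothetical matched paths: (end director, weights of the relationships on the path).
paths = [
    ("Director A", [1, 5, 1]),
    ("Director A", [1, 2, 1]),
    ("Director B", [1, 3, 1]),
]


def rank_by_path_count(paths):
    """Like RETURN o, count(*) ORDER BY count(*) DESC."""
    return Counter(director for director, _ in paths).most_common()


def rank_by_weight(paths):
    """Like summing the per-path weight totals for each end node."""
    totals = defaultdict(int)
    for director, weights in paths:
        totals[director] += sum(weights)  # inner sum mirrors reduce over rels(path)
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


by_count = rank_by_path_count(paths)
by_weight = rank_by_weight(paths)
```

Both rankings group by the end node; the only difference is whether each path contributes 1 or the sum of its relationship weights.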
START n=node:index(Name="Steven Spielberg")
MATCH path=n-->m--l<--o
RETURN o
ORDER BY length(path);
