Adding a new user to neo4j - graph

A total Neo4j noob here.
I would like to create a graph to store a set of users. A typical user looks like this:
CREATE
(node_1 {FullName:"Peter Parker",FirstName:"peter",FamilyName:"parker"}),
(node_2 {Address:"Newyork",CountryCode:"US"}),
(node_3 {Location:"Hidden"}),
(node_4 {phoneNumber:11111}),
(node_5 {InternetEmailAddress:"peter#peterland.com"})
Now the problem is that every time I execute this, I add 5 more nodes.
I know I need to use a unique key, but all the examples I saw use a unique key for a single node. So how can I make sure a user doesn't get added if it already exists? (I can use the email address as the unique key.)
And how do I update the nodes if something changes? For example, after a week I want to update the graph to contain the following instead of the previous version (no duplicates):
CREATE
(node_1 {FullName:"Peter Parker",FirstName:"peter",FamilyName:"parker"}),
(node_2 {Address:"Newyork",CountryCode:"US"}),
(node_3 {Location:"public"}),
(node_4 {phoneNumber:11111}),
(node_5 {InternetEmailAddress:"peter#peterland.com"}),
(node_6 {status:"Jailed"})
(NOTE: the new update changed the location to "public" and added a new node for Peter.)

Seeing as you had a load of nodes anyway, here is a fuller take on the modelling.
Some of the data you have modelled as nodes are probably properties, as the other answer suggests; some are possibly correctly modelled as nodes; and one could probably form all or part of a relationship.
Location public/hidden can be modelled in one of three ways: as a property on the Person, as a property on the relationship between the Person and the Location, or as the relationship type. To understand that, you first need to have a relationship.
Your address is currently another node. I think this is correct, but you would possibly want two nodes, related something like this:
(s:State)-[:IN_COUNTRY]->(c:Country)
YMMV, and clearly that is a US-centric model, but you can extend it easily enough.
Now you could create Peter with a LIVES_IN relationship:
CREATE (p:Person{fullName:"Peter Parker"}), (s:State{name:"New York"}), (c:Country{code:"US"}),
(p)-[:LIVES_IN]->(s), (s)-[:IN_COUNTRY]->(c)
For speed, you are better off modelling two relationship types, for example LIVES_IN_PUBLIC and LIVES_IN_HIDDEN. To perform the update you describe above, you then have to delete one relationship and create the other. However, if speed is not of the essence, it is also common to use a property on the relationship:
CREATE (p:Person{fullName:"Peter Parker"}), (s:State{name:"New York"}), (c:Country{code:"US"}),
(p)-[:LIVES_IN{public:false}]->(s), (s)-[:IN_COUNTRY]->(c)
So your complete answer is the initial create followed later by the update, run as two separate statements:
CREATE (p:Person {fullName:"Peter Parker",firstName:"peter",familyName:"parker", phoneNumber:11111, internetEmailAddress:"peter#peterland.com"}),
(s:State {name:"New York"}), (c:Country {code:"US"}),
(p)-[:LIVES_IN{public:false}]->(s), (s)-[:IN_COUNTRY]->(c);
MATCH (p:Person {internetEmailAddress:"peter#peterland.com"})-[li:LIVES_IN]->()
SET li.public = true, p.status = "jailed";
When adding other people you probably do not want to recreate States and Countries; rather, you want to MATCH them, and possibly MERGE them, but we'll stick to CREATE.
MATCH (s:State{name:"New York"})
CREATE (p:Person{fullName:"John Smith", internetEmailAddress:"john#google.com"})-[:LIVES_IN{public:false}]->(s)
John Smith now implicitly lives in the US too as you can follow the relationship through the State Node.
Treatise complete.

I think you're modeling your data incorrectly here - you're setting up each property of the person as a separate node, which is not a good idea. You don't have any linkages between those nodes, so with this data pattern, later on you won't be able to tell what Peter Parker's address is. You're also not using node labels, which I think could really help here.
The quick answer to your question about updating nodes is that you have to MATCH them, then use SET to modify a property. So if you had a person, you might do this:
MATCH (p:Person { FullName: "Peter Parker" })
SET p.Address = "123 Fake Street"
RETURN p;
But notice I'm making assumptions about the way your data is structured. Taking the same data you provided, this might be a better way of creating it:
CREATE (node_1:Person {FullName:"Peter Parker",
FirstName:"peter",
FamilyName:"parker",
Address:"Newyork",CountryCode:"US",
Location:"Hidden",
phoneNumber:11111,
InternetEmailAddress:"peter#peterland.com"});
The difference with this suggestion is that I'm putting all the properties into a single node (instead of one property per node) and I'm applying the Person label to the node.
If you structured the data like this, then the update query I provided would work. With the data structured the way you have it, it's not possible to update Peter Parker's address, because there's no relationship between your node_1 and node_2.
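On the duplicate question itself: Cypher's MERGE matches on a key and creates the node only when nothing matches, so re-running the statement does not add another Peter. Below is a minimal Python sketch that only builds such a statement and its parameters. It assumes the single-node Person model above, the e-mail address as the unique key, and a Neo4j version that accepts the $param syntax. The helper name build_person_upsert is made up for illustration, and sending the query through a driver is left out.

```python
def build_person_upsert(props, key="InternetEmailAddress"):
    """Return (cypher, params) for a create-or-update of one Person.

    Hypothetical helper: MERGE looks the node up by the key property only,
    then SET += copies the remaining properties onto it, whether the node
    was just created or already existed. `key` must be a trusted constant,
    since it is placed directly into the query text.
    """
    if key not in props:
        raise ValueError("missing key property: " + key)
    cypher = (
        "MERGE (p:Person {" + key + ": $key_value}) "
        "SET p += $props "
        "RETURN p"
    )
    return cypher, {"key_value": props[key], "props": dict(props)}


query, params = build_person_upsert({
    "FullName": "Peter Parker",
    "InternetEmailAddress": "peter#peterland.com",
    "Location": "public",
    "status": "Jailed",
})
print(query)
print(params["key_value"])
```

A uniqueness constraint on the key property is usually added alongside, so that two concurrent runs cannot both create the node.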

Related

Gremlin query - how to eliminate nested coalesce

I have person vertex, has_vehicle edge and vehicle vertex which models vehicle ownership use case. The graph path is person -> has_vehicle -> vehicle.
I want to implement a Gremlin query which associates a vehicle to a person only if
The person does not have a vehicle
AND
The input vehicle is not associated with a person yet.
I followed the fold-coalesce-unfold pattern and came up with the following Gremlin query, which uses a nested coalesce:
g.V().hasLabel('person').has('name', 'Tom').as('Tom').outE('has_vehicle').fold().coalesce(
__.unfold(), // check if Tom already have a vehicle
g.V().has('vehicle', 123).as('Vehicle').inE('has_vehicle').fold().coalesce(
__.unfold(), // check if vehicle 123 is already associated with a person
__.addE('has_vehicle').from('Tom').to('Vehicle') // associate the vehicle to Tom
)
)
Is there a way to eliminate the nested coalesce? If I have multiple criteria, it would be too complex to write the query.
This might be a case where a couple of where(not(...)) patterns work better than nesting coalesce steps. For example, we might change the query as shown below.
g.V().hasLabel('person').has('name', 'Tom').as('Tom').
where(not(outE('has_vehicle'))).
V().has('vehicle', 123).as('Vehicle').
where(not(inE('has_vehicle'))).
addE('has_vehicle').from('Tom').to('Vehicle')
So long as the V steps do not fan out and yield multiple Tom or Vehicle nodes that should work and is easy to extend by adding more to the where filters as needed.
As a side note, the not steps used above should work even if not wrapped by where steps, but I tend to find it just reads better as written.
This rewrite does assume that you are able to tolerate the case where Tom already has a car and the query just ends there. In that case no vertex or edge will be returned. If you used toList to run the query, you would get an empty list back, indicating that nothing was done.

How to get a path from one node to another including all other nodes and relationships involved in between

I have designed a model in Neo4j in order to get paths from one station to another, including the platforms and legs involved. The model is depicted below. Basically, I need a query that takes me from NBW to RD and also shows the platforms and legs involved. I am struggling with the query and get no result. I would appreciate any help.
Here is my cypher statement:
MATCH p = (a:Station)-[r:Goto|can_board|can_alight|has_platfrom*0..]->(c:Station)
WHERE (a.name='NBW')
AND c.name='RD'
RETURN p
Model:
As mentioned in the comments, in Cypher you can't use a directed variable-length relationship that uses differing directions for some of the relationships.
However, APOC Procedures just added the ability to expand based on sequences of relationships. You can give this a try:
MATCH (start:station), (end:station)
WHERE start.name='NBW' AND end.name='THT'
CALL apoc.path.expandConfig(start, {terminatorNodes:[end], limit:1,
relationshipFilter:'has_platform>, can_board>, goto>, can_alight>, <has_platform'}) YIELD path
RETURN path
I added a limit so that only the first (and shortest) path to your end station will be returned. Removing the limit isn't advisable, since this will continue to repeat the relationships in the expansion, going from station to station, until it finds all possible ways to get to your end station, which could hang your query.
EDIT
Regarding the new model changes, the reason the above will not work is because relationship sequences can't contain a variable-length sequence within them. You have 2 goto> relationships to traverse, but only one is specified in the sequence.
Here's an alternative that doesn't use sequences, just a whitelisting of allowed relationships. The spanningTree() procedure uses NODE_GLOBAL uniqueness so there will only be a single unique path to each node found (paths will not backtrack or revisit previously-visited nodes).
MATCH (start:station), (end:station)
WHERE start.name='NBW' AND end.name='RD'
CALL apoc.path.spanningTree(start, {terminatorNodes:[end], limit:1,
relationshipFilter:'has_platform>|can_board>|goto>|can_alight>|<has_platform'}) YIELD path
RETURN path
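To make the idea of a per-type direction whitelist concrete, here is a small pure-Python sketch (not APOC itself) of a breadth-first expansion that follows each relationship type only in its allowed direction and never revisits a node, roughly what the filter string and NODE_GLOBAL uniqueness describe. The toy graph, the node names P1, L1 and so on, and the helper shortest_path are all invented for illustration, since the actual model image is not reproduced here.

```python
from collections import deque

# Toy data, invented for illustration: (source, relationship type, target).
EDGES = [
    ("NBW", "has_platform", "P1"),
    ("P1", "can_board", "L1"),
    ("L1", "goto", "L2"),
    ("L2", "goto", "L3"),
    ("L3", "can_alight", "P2"),
    ("RD", "has_platform", "P2"),
]

# Allowed directions per type, mirroring
# 'has_platform>|can_board>|goto>|can_alight>|<has_platform'.
ALLOWED = {
    "has_platform": {"out", "in"},
    "can_board": {"out"},
    "goto": {"out"},
    "can_alight": {"out"},
}


def neighbours(node, allowed):
    """Yield nodes reachable from `node` over one whitelisted relationship."""
    for src, rel, dst in EDGES:
        dirs = allowed.get(rel, set())
        if src == node and "out" in dirs:
            yield dst
        if dst == node and "in" in dirs:
            yield src


def shortest_path(start, end, allowed=ALLOWED):
    """Breadth-first search; each node is visited at most once."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in neighbours(path[-1], allowed):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path("NBW", "RD"))
# ['NBW', 'P1', 'L1', 'L2', 'L3', 'P2', 'RD']

# Outgoing-only traversal gets stuck at P2, like the fully directed pattern.
print(shortest_path("NBW", "RD", {rel: {"out"} for rel in ALLOWED}))
# None
```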
Your query is directed (-->), and not all of the relationships between your two stations run in the same direction. If you remove the relationship direction you will get a result.
Once you have a result, something like the query below could point you in the right direction for extracting particular details from the resulting path.
Essentially, I am assuming that everything you are interested in is in the returned path; you just need to filter out the different pieces.
As #InverseFalcon points out this query should be limited in a larger graph or it could easily run away.
MATCH p = (a:Station)-[r:Goto|can_board|can_alight|has_platfrom*0..]-(c:Station)
WHERE (a.name='NBW')
AND c.name='THT'
RETURN [n IN nodes(p) WHERE 'Platform' IN labels(n)] AS Platforms

How do I get all nodes in the graph on a certain relationship type

I have built a small graph where all the screens are connected, and the flow between screens varies based on the system/user. So the system/user is the relationship type.
I am looking to fetch all nodes that are linked by a certain relationship from a starting screen. I don't care about the depth, since I don't know the depth of the graph.
Something like the query below, but it takes forever to return a result, and it returns incorrect connections that do not match the attribute {path:'CC'}:
match (n:screen {isStart:true})-[r:NEXT*0..{path:'CC'}]-()
return r,n
A few suggestions:
Make sure you have created an index for :screen(isStart):
CREATE INDEX ON :screen(isStart);
Are you sure you want to include 0-length paths? If not, take out 0.. from your query.
You did not specify the directionality of the :NEXT relationships, so the DB has to look at both incoming and outgoing :NEXT relationships. If appropriate, specify the directionality.
To minimize the number of result rows, add a WHERE clause that ensures that the current path cannot be extended further.
Here is a proposed query that combines the last 3 suggestions (fix it up to suit your needs):
MATCH (n:screen {isStart:true})-[r:NEXT* {path:'CC'}]->(x)
WHERE NOT (x)-[:NEXT {path:'CC'}]->()
return r,n;

How to store and retrieve different types of Vertices with the Tinkerpop/Blueprints graph API?

Looking at the Tinkerpop Blueprints API, it is quite straightforward to use one type of vertex, but how can I store two, e.g. users and their interests?
And how can I get a vertex by id? There could be a user named 'timetabling' as well as an interest named 'timetabling'; how do I handle that id conflict?
-
I know that the first problem could be solved by introducing an index on a type property, and for the second problem I could auto-generate the id and create another index on the name property. BUT why would I then need the vertex id at all? E.g. the in-memory implementation keeps a HashMap of all vertices, which would be of no use and a waste of memory! (I could solve the problem differently by combining type and name as the id, but then it would be inefficient if I, e.g., list all users.)
Hmm, OK. For now I'm just using the combined id (name+type) for the vertices and a separate index for the type. Any better solutions?
In general it is best to rely on the automatic ID system of the underlying graph database (e.g. Neo4j, InfiniteGraph, OrientDB, etc.). The way in which you would add the information you want is as follows:
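// Pass null as the id so the underlying graph assigns one.
Vertex v = graph.addVertex(null);
v.setProperty("name", "timetabling");
Vertex marko = graph.addVertex(null);
graph.addEdge(null, marko, v, "hasInterest");
// A separate vertex represents the type, linked by a hasType edge.
Vertex aType = graph.addVertex(null);
graph.addEdge(null, aType, v, "hasType");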
In short, the ID of a vertex/edge is a non-domain-specific way of retrieving vertices/edges. Generally, it is best to use properties in your domain model for indexing.
Hope that speaks to your question,
Marko.
http://markorodriguez.com
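The same idea in a language-neutral form: a small Python sketch (not the Blueprints API) of a store that hands out its own ids and keeps a separate index on (type, name), so that a user and an interest can both be called 'timetabling'. The class TinyGraph and its methods are hypothetical names for illustration. Here the type is kept as a property rather than as a hasType edge, which is just one of several possible designs.

```python
import itertools
from collections import defaultdict


class TinyGraph:
    """Hypothetical in-memory store: opaque auto ids plus property indexes."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.vertices = {}                  # id -> property dict
        self.by_type_name = {}              # (type, name) -> id
        self.by_type = defaultdict(list)    # type -> [ids]

    def add_vertex(self, vtype, name):
        vid = next(self._ids)               # the store picks the id
        self.vertices[vid] = {"type": vtype, "name": name}
        self.by_type_name[(vtype, name)] = vid
        self.by_type[vtype].append(vid)
        return vid

    def find(self, vtype, name):
        """Look a vertex up by its domain properties, not by id."""
        return self.by_type_name.get((vtype, name))


g = TinyGraph()
user = g.add_vertex("user", "timetabling")
interest = g.add_vertex("interest", "timetabling")
print(user != interest)                               # True
print(g.find("interest", "timetabling") == interest)  # True
print(len(g.by_type["user"]))                         # 1
```

The id stays useful as the cheap internal handle that edges and indexes point at, while lookups from the application side go through the indexed properties.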

Drupal create views involving LEFT JOIN Sub-Select with non-existent node

I'm using Drupal 6.
I have this table relation, and I've translated it into CCK, complete with its relations.
Basically, when I view a Period node, I have tabs to display ALL Faculty nodes combined with their Presence Number.
here's the table diagram: http://i.stack.imgur.com/7Y5cU.png
Translated into CCK like these:
CCK Faculty (name),
CCK Period (desc,from,to) and
CCK Presence(node-reference-faculty, node-reference-period, presence_number)
Here's my simple manual SQL query that achieves this result: http://i.stack.imgur.com/oysd3.png
SELECT faculty.name, presence.presence_number FROM Faculty AS faculty
LEFT JOIN (SELECT * FROM Presence WHERE Period_id=1) AS presence ON faculty.id=presence.Faculty_id
The value of 1 for Period_id will be given by the Period Node ID from the url argument.
Now the hardest part is reproducing the simple SQL query above in Views. How can I build such a query in Views in Drupal 6 or Drupal 7?
Thanks for any help.
The main issue, which I think you've noticed, is that if you treat Faculty as the base for your join, then there is no way to join on the Presence nodes. Conversely, if you treat Presence as the base, then you will not see faculties that have no presence number.
There is no easy way, using your currently defined structure, to do these joins in views.
I would say your easiest option is to remove the 'node-reference-faculty' field from the presence node and add a node-reference-presence field to the faculty. Since CCK fields can have multiple values, you can still have your one-to-many relationship properly.
The one downside of this is that then you need to manage the presence-faculty relationship from the faculty nodes instead of the presence nodes. If that's a show stopper, which it could be depending on your workflow, you could have BOTH node-reference fields, and use a module like http://drupal.org/project/backreference to keep them in sync.
Once you have your reference from faculty -> presence, you will need to add a relationship in Views. Just like adding a field or a filter, open the list of relationships and find the one for your node-reference field.
Next, you will need to add an argument for period id and set it up to use a node id from the url. The key thing is that when you add the argument, it will ask which relationship to use in its options. You will want to tell it to use your newly added presence relationship.
You don't really need a subquery in your SQL like that. Putting the period condition in the ON clause of the LEFT JOIN gives the same result and won't make MySQL try to create a temporary table. (The condition has to stay in the ON clause; in a WHERE clause it would filter out the faculties that have no presence row for that period.) I mention it because you can't really do subqueries in Views unless you are writing a very custom Views handler, but in this case you don't need the subquery anyway.
Ex.
SELECT f.name, p.presence_number
FROM Faculty AS f
LEFT JOIN Presence AS p ON f.id=p.Faculty_id AND p.Period_id=1;
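A quick way to check where the period filter belongs is to run both variants against a tiny table. The sketch below uses Python's built-in sqlite3 rather than MySQL, with invented sample rows, so treat it as an illustration of LEFT JOIN semantics only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Faculty (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Presence (Faculty_id INTEGER, Period_id INTEGER,
                           presence_number INTEGER);
    -- Invented sample rows: Law has no presence row for period 1.
    INSERT INTO Faculty VALUES (1, 'Arts'), (2, 'Law');
    INSERT INTO Presence VALUES (1, 1, 40), (2, 2, 25);
""")

# Period filter inside the ON clause: every faculty is kept.
filter_in_on = conn.execute("""
    SELECT f.name, p.presence_number
    FROM Faculty AS f
    LEFT JOIN Presence AS p ON f.id = p.Faculty_id AND p.Period_id = 1
    ORDER BY f.id
""").fetchall()

# Period filter in WHERE: rows with no matching presence are dropped.
filter_in_where = conn.execute("""
    SELECT f.name, p.presence_number
    FROM Faculty AS f
    LEFT JOIN Presence AS p ON f.id = p.Faculty_id
    WHERE p.Period_id = 1
    ORDER BY f.id
""").fetchall()

print(filter_in_on)     # [('Arts', 40), ('Law', None)]
print(filter_in_where)  # [('Arts', 40)]
```

Only the ON-clause version reproduces what the original subquery returned, namely all faculties with a NULL presence number where none exists for the period.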
I wrote an article about how to achieve a similar outcome here. http://scottanderson.com.au/#joining-a-views-query-to-a-derived-table-or-subquery
Basically how to alter a Views query to left join on a sub-query.

Resources