OrientDB - Get network at level 2 - graph

I'm working for a large professional social network, and we are evaluating whether OrientDB meets our requirements for a social graph. So far, we have managed to deploy a cluster of 10 nodes, set up backup and restore, and migrate all of our data from MongoDB to OrientDB with no major issues.
Our data model is :
vertices :
Profile
Company
Job
Publication
...
edges :
Followed : one profile can follow another profile or a company
Applied : one profile can apply to a job
Posted : one profile can post some publication on the network
...
What I want to know is :
How to get all people connected to one given profile at depth 1 or 2. I've tried something like SELECT out('Followed').out('Followed') as friend FROM 14:4 where 14:4 is a Profile object. Unfortunately, it gives me Profiles as well as Companies, since a Profile can follow a Company. How can I filter to get Profiles only? I've tried SELECT out('Followed').out('Followed') as friend FROM 14:4 where @class = 'Profile' but it does not work :( Should I have multiple edge classes (FollowedProfile and FollowedCompany) to ease queries?
When a Profile creates an account using another social network (Facebook, Google, ...), we store their existing contacts and match them against our database so we can say "Profile A is connected to Profile B thanks to Facebook". How should I represent that in OrientDB? An attribute on the edge or a dedicated edge class?
Last one is : how can I get the shortest path between two Profiles ?
Thanks a lot.

1) You can try with
SELECT FROM (SELECT expand(out('Followed').out('Followed')) as friend FROM 14:4) WHERE @class = 'Profile'
2) I think an attribute on the edge (for example, one recording which network the connection is "thanks to") is better
3) You can use shortestPath function (http://www.orientechnologies.com/docs/last/orientdb.wiki/SQL-Functions.html) :
Example:
select shortestPath(#8:32, #8:10, 'BOTH')
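To make the logic behind answers 1) and 3) concrete, here is a minimal Python sketch over a tiny in-memory graph. It is not OrientDB code: the sample vertices, the `followed` adjacency map, and the helper names `profiles_within_two_hops` and `shortest_path` are assumptions made purely for illustration. The first helper collects vertices at depth 1 or 2 and keeps only those whose class is Profile; the second runs a breadth-first search that ignores edge direction, in the spirit of the 'BOTH' argument.

```python
from collections import deque

# Hypothetical sample data: vertex id -> class name (not from the real database).
vertex_class = {
    "A": "Profile", "B": "Profile", "C": "Profile",
    "D": "Profile", "X": "Company",
}
# Hypothetical outgoing 'Followed' edges.
followed = {
    "A": ["B", "X"],
    "B": ["C", "X"],
    "C": ["D"],
    "D": [],
    "X": [],
}

def profiles_within_two_hops(start):
    """Follow 'Followed' edges one and two hops out, keeping only Profile vertices."""
    depth1 = set(followed[start])
    depth2 = {w for v in depth1 for w in followed[v]}
    reached = (depth1 | depth2) - {start}
    return {v for v in reached if vertex_class[v] == "Profile"}

def shortest_path(source, target):
    """Breadth-first search that walks edges in both directions."""
    neighbours = {v: set(out) for v, out in followed.items()}
    for v, out in followed.items():
        for w in out:
            neighbours[w].add(v)
    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in sorted(neighbours[current]):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None  # no path between the two vertices
```

The class filter is applied after the traversal, which is the same idea as wrapping the expand() query in an outer SELECT with a WHERE clause.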

Related

Querying details from GraphDB

We are trying to model customer details in a graph database, where a single query can fetch the details of a customer such as their address, phone, email, etc. We have built it using edges such as 'has Address' and 'has Email'.
g.addV('member').property('id','CU10611972').property('CustomerId', 'CU10611972').property('TIN', 'xxxx').property('EntityType', 'Person').property('pk', 'pk')
g.addV('email').property('id','CU10611972E').property('pk', 'pk')
g.addV('primary').property('id','CU10611972EP').property('EmailPreference','Primary').property('EmailType', 'Home').property('EmailAddress', 'SNEHA#GMAIL.COM').property('pk', 'pk')
g.V('CU10611972').addE('has Email').to(g.V('CU10611972E'))
g.V('CU10611972E').addE('has Primary Email').to(g.V('CU10611972EP'))
This is how we have built the email relation for the customer. Similarly, we have relations for Address and Phone. Right now we use this command to fetch the JSON related to this customer's email:
g.V('CU10611972').out('has Email').out('has Primary Email')
And for the complete customer details we use a union over each vertex: Phone, Email, and Address.
Could you please suggest if there is an efficient way to query this detail?
This comes down really to two things.
General graph data modelling
Things the graph DB you are using does and does not support.
With Gremlin there are a few ways to model this data for a single vertex.
If the database supports it, have a list of names like ['home','mobile'] and use metaproperties to attach a phone number to each.
A lot of the Gremlin implementations I am aware of have chosen not to support meta properties. In these cases you have a couple of options.
(a) Have a property for 'Home' and another for 'Mobile'. If either is not known you could either not create that property or give it a value such as "unknown"
(b) Use prefixed strings such as ["Home:123456789","Mobile:123456789"] and store them in a set or list (multi properties) and access them in Gremlin using the startingWith predicate, such as g.V(id).properties('phone').hasValue(startingWith('Mobile')).value()
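As a rough illustration of option (b), the prefix filter can be sketched in plain Python. This is not Gremlin: the `phone_values` list and the `phones_of_type` helper are hypothetical and only mimic what a starts-with predicate does over a multi-valued property.

```python
# Hypothetical multi-valued 'phone' property stored as prefixed strings.
phone_values = ["Home:123456789", "Mobile:987654321"]

def phones_of_type(values, kind):
    """Return the numbers whose entry starts with '<kind>:' (prefix stripped)."""
    prefix = kind + ":"
    return [v[len(prefix):] for v in values if v.startswith(prefix)]
```

A call such as phones_of_type(phone_values, "Mobile") keeps only the entries carrying the Mobile prefix. The trade-off of this encoding is that every read has to parse the string, which is why metaproperties are preferable where the database supports them.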

Neo4j Match and Create takes too long in a 10000 node graph

I have a data model like this:
Person node
Email node
OWNS relationship
LISTS relationship
KNOWS relationship
Each Person can OWN one Email and LIST multiple Emails (like a contact list; 200 contacts are assumed per Person).
The query I am trying to perform is finding all the Persons that OWN an Email that a Contact LISTS and create a KNOWS relationship between them.
MATCH (n:Person {uid:'123'}) -[r1:LISTS]-> (m:Email) <-[r2:OWNS]- (l:Person)
CREATE UNIQUE (n)-[:KNOWS]->(l)
The counts in my current database are as follows:
Number of Person nodes: 10948
Number of Email nodes: 1951481
Number of OWNS rels: 21882
Number of LISTS rels: 4376340 (Each Person has 200 unique LISTS rels)
Now my problem is that running this query on the current database takes between 4.3 and 4.8 seconds, which is unacceptable for my needs. I wanted to know if this is normal timing considering my data model, or whether I am doing something wrong with the query (or even the model).
Any help would be much appreciated. Also if this is normal for Neo4j please feel free to suggest other graph databases that can handle this kind of model better.
Thank you very much in advance
UPDATE:
My query is: profile match (n: {uid: '4692'}) -[:LISTS]-> (:Email) <-[:OWNS]- (l) create unique (n)-[r:KNOWS]->(l)
The PROFILE command on my query returns this:
Cypher version: CYPHER 2.2, planner: RULE. 3919222 total db hits in 2713 ms.
Yes, 4.5 seconds to match one person from index along with its <=100 listed email addresses and merging a relationship from user to the single owner of each email, is slow.
The first thing is to make sure you have an index for uid property on nodes with :Person label. Check your indices with SCHEMA command and if missing create such an index with CREATE INDEX ON :Person(uid).
Secondly, CREATE UNIQUE may or may not do the work fine, but you will want to use MERGE instead. CREATE UNIQUE is deprecated and though they are sometimes equivalent, the operation you want performed should be expressed with MERGE.
Thirdly, to find out why the query is slow you can profile it:
PROFILE
MATCH (n:Person {uid:'123'})-[:LISTS]->(m:Email)<-[:OWNS]-(l:Person)
MERGE (n)-[:KNOWS]->(l)
See 1, 2 for details. You may also want to profile your query while forcing the use of one or other of the cost and rule based query planners to compare their plans.
CYPHER planner=cost
PROFILE
MATCH (n:Person {uid:'123'})-[:LISTS]->(m:Email)<-[:OWNS]-(l:Person)
MERGE (n)-[:KNOWS]->(l)
With these you can hopefully find and correct the problem, or update your question with the information to help others help you find it.
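The shape of the work the query has to do can be sketched in Python as follows. This is only an analogy, not Neo4j code: the sample uids and addresses are made up, the dictionaries stand in for index lookups (which is why the index on uid matters), and the `knows` set models the "create only if absent" behaviour expected from MERGE. The helper name `link_known_people` is hypothetical.

```python
# Hypothetical sample data; uids and addresses are made up for illustration.
lists = {"123": ["a@example.com", "b@example.com", "c@example.com"]}  # Person -LISTS-> Email
owner_of = {"a@example.com": "456", "b@example.com": "789"}           # Email <-OWNS- Person
knows = set()  # (from_uid, to_uid) pairs; a set stores each pair only once

def link_known_people(uid):
    """Add a KNOWS pair from uid to the owner of every email that uid lists.

    Returns how many new pairs were created by this call.
    """
    created = 0
    for email in lists.get(uid, []):      # keyed lookup instead of a full scan
        owner = owner_of.get(email)       # most listed emails have no owner
        if owner is None:
            continue
        if (uid, owner) not in knows:     # create only when missing
            knows.add((uid, owner))
            created += 1
    return created
```

With keyed lookups the cost is proportional to the roughly 200 listed emails of one person, so a plan reporting millions of db hits suggests the starting node or the email owners are being found by scanning instead.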

Good schema for neo4j DB

What is the best design for social sharing network based on phonebook?
Background:
The app syncs a user's phonebook contacts and social networks, builds a graph, and gives recommendations.
Use case:
1. Recommendation of friends, mutual friends to people in your phonebook.
2. Initially, few of the nodes a user connects to have a social linkage, so it may be the
case that you have a lot of friends but very few of them are social friends. Does it make sense
to have a new relationship for every social ID (be it BBM, FB, or LinkedIn)?
3. We will build the network through which an existing user gets a notification when a new user joins
any social network and registers with our app.
4. Pick of the day - based on mutual friends (degree, social interest, and country); users
can search social IDs by country.
5. Users always search for people from one country or its connected nodes (up to 4 degrees), filtered by
male/female and age.
6. Status updates would be notified to other connected nodes with social IDs.
Schema:
Country (1 relation or bucket per country).
Friend (all people who are in phonebook) - phonenumber as a key.
Social friend (1 relation or bucket per social networking company - will update the relation as soon as someone from your phonebook updates the social linkage on our site).
1 relation for male - help for filtering and pick suggestions.
1 relation for female help for filtering and pick suggestions.
Does it make sense to also add friends and social friends as a relation?
Usually you would do something like this:
(u:User { phone : "phone#" })-[:ON_PHONE]->(u2:User { phone : "phone#" })
(u:User { phone : "phone#" })-[:KNOWS]->(u2:User:Facebook { phone : "phone#", fbid: "fbid" })
If you want to keep the record information where the social network connections came from you can add the additional links to your graph like this:
(u:User { phone : "phone#" })-[:KNOWS_ON_FB]->(:FacebookUser {fbid:"fbid"})<-[:ON_FB]-(u2:User { phone : "phone#", fbid: "fbid" })
I think in most cases you don't need to keep those records and should just focus on the "realtime/current" structure of your graph, which you can update.
You can add additional labels to your users if you have facebook, linkedin etc. information for them and then add additional indexes:
create index on :User(phone)
create index on :Facebook(fbid)

Symfony2 dynamic relationship with a field

I am building a social website and I am laying out how the feed will work. I want to use the answer here: How to implement the activity stream in a social network and implement the database design mentioned:
id
user_id (int)
activity_type (tinyint)
source_id (int)
parent_id (int)
parent_type (tinyint)
time (datetime but a smaller type like int would be better)
The problem is I don't know how I would map the source_id based off activity_type. If a user registers, I want the source_id to be the user that registered. If someone creates a group the source_id will be the group. I know I can just use simple IDs without keys I just wanted to know if Symfony had some sort of way to do this built in.
If I fetch the feed and the activity_type is user_register I would like to be able to do this to get the source (user) without running an additional query:
$feedEntity->getSource()->getUsername(); //getSource() being the User entity
And if the source_type is "user_post":
$feedEntity->getSource()->getMessage(); //getSource() being the UserPost entity
I basically just want to find the best way to store this data and make it the fastest.
This is not easy to do with Doctrine, and I think it cannot be achieved 100% automatically.
However, the keyword is table inheritance:
http://docs.doctrine-project.org/en/2.0.x/reference/inheritance-mapping.html#single-table-inheritance
I think you could achieve your goal by doing something like this :
You create a discriminator map on the type column of the table, which tells Doctrine to load the row as a UserSource entity (for example).
This UserSource can be its own entity (it can inherit from a base class if you want), in which you map the source_id column to the real User entity.
You can use instanceof checks against the classes of the different entities mapped in your discriminator map to define different behaviours for the different sources.
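The discriminator idea can be sketched outside Doctrine in a few lines of Python. This is an analogy only, not Doctrine or Symfony code: the `User`, `UserPost`, and `FeedEntry` classes, the in-memory `users` and `posts` stores, and the `SOURCE_STORES` map are all hypothetical. The point is that the type column selects which kind of entity `source_id` refers to.

```python
from dataclasses import dataclass

@dataclass
class User:
    id: int
    username: str

@dataclass
class UserPost:
    id: int
    message: str

# Hypothetical in-memory stores standing in for database tables.
users = {1: User(1, "alice")}
posts = {7: UserPost(7, "hello")}

# Discriminator map: activity_type value -> where the source entity lives.
SOURCE_STORES = {"user_register": users, "user_post": posts}

@dataclass
class FeedEntry:
    activity_type: str
    source_id: int

    def get_source(self):
        """Resolve source_id against the store chosen by activity_type."""
        return SOURCE_STORES[self.activity_type][self.source_id]
```

Callers can then branch on the returned type, for example with isinstance(entry.get_source(), UserPost), which mirrors the instanceof checks suggested above. This sketch does not address how many queries an ORM would issue to load the source.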

Cypher Query to get connected nodes from two relations

I'm a newbie to Neo4j/graph databases and have created the following simple graph:
node[1]user1 which is 'friend' with node[2]user2 and node[3]user3
and all three users above have 'post' nodes connected to them as well.
The question is: how do I get user1's connected friends and their posts as well?
The following query returns only the friends of user1 and his own posts:
START user1=node(2) MATCH user1-->all_node RETURN all_node
Depending on the relationship types you have chosen, something like this should work:
START user1=node(2)
MATCH user1-[:FRIEND]->friend-[:POST]->post
RETURN friend,post
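The two-step pattern in that MATCH clause (user to friend, then friend to post) can be illustrated with a small Python sketch. The adjacency maps and the `friends_and_posts` helper are hypothetical sample data, not Cypher; each returned pair corresponds to one friend,post row.

```python
# Hypothetical adjacency maps, one per relationship type.
friend = {"user1": ["user2", "user3"]}
post = {"user1": ["p1"], "user2": ["p2"], "user3": ["p3a", "p3b"]}

def friends_and_posts(user):
    """Return (friend, post) pairs: follow FRIEND first, then POST from each friend."""
    return [(f, p) for f in friend.get(user, []) for p in post.get(f, [])]
```

Because the second hop starts from each friend and not from the original user, user1's own post p1 never appears in the result, which is the difference from the untyped user1-->all_node pattern.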
