Get the counts of all first generation nodes neo4j - graph

My data structure is very simple: a single label for customers, with a one-to-one, one-directional "referred" relationship from one customer to another. What is the correct query to retrieve, for each node, the counts of referred nodes at every degree that resulted from it?
In other words, if the database consisted of:
CustomerA referred CustomerB,
CustomerB referred CustomerC
the resulting table should be:
Customer    1st gen referrals    2nd gen referrals
A           1                    1
B           1                    0
C           0                    0

You could match on nodes and find the sizes of the desired patterns:
MATCH (c:Customer)
RETURN c as Customer,
size((c)-[:REFERRED]->()) as firstGenRef,
size((c)-[:REFERRED*2]->()) as secondGenRef
EDIT
As far as returning the counts of all levels of referrals, that's likely going to be an expensive query, depending on how interconnected your data is.
You can give the query below a try. If it takes too long or hangs, you may want to switch to APOC Procedures, specifically apoc.path.spanningTree(), which uses NODE_GLOBAL uniqueness to retain only a single path to each node encountered and usually performs better.
MATCH (c:Customer)-[r:REFERRED*]->(ref)
WITH DISTINCT c, size(r) as gen, ref
WITH c, gen, count(gen) as referrals
ORDER BY gen ASC
RETURN c as Customer, collect({gen:gen, referrals:referrals}) as referrals
This will get you each customer on a row, along with collected maps of the generation and number of referrals at each generation, down to the maximum generation depth per customer.
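To make the expected output concrete, here is a minimal Python sketch of the same per-generation count, run on the example from the question. It is not part of the original answer: the function name referrals_by_generation and the dictionary input format are assumptions made for illustration. It uses a breadth-first walk that counts each referred customer once, at its shortest depth, which matches the Cypher query whenever each customer is reachable by a single path (as in the one-to-one structure described).

```python
from collections import deque

def referrals_by_generation(referred):
    """Count referred customers per generation for every customer.

    `referred` maps a customer to the list of customers they referred
    (a hypothetical in-memory stand-in for the REFERRED relationships).
    """
    customers = set(referred) | {c for refs in referred.values() for c in refs}
    result = {}
    for start in customers:
        counts = {}              # generation -> number of referrals
        seen = {start}           # each customer is counted only once
        queue = deque([(start, 0)])
        while queue:
            node, gen = queue.popleft()
            for nxt in referred.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    counts[gen + 1] = counts.get(gen + 1, 0) + 1
                    queue.append((nxt, gen + 1))
        result[start] = counts
    return result

# The example from the question: A referred B, B referred C.
table = referrals_by_generation({"A": ["B"], "B": ["C"]})
for customer in sorted(table):
    print(customer, table[customer])
```

Generations with no referrals are simply absent from a customer's map, just as the Cypher query returns no entry for them.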


HERE Navstreets LINK_ID Range

My organization has acquired the HERE Navstreets data set. It wishes to update the content while still adhering to the HERE Navstreets data model and relationships.
In this context, it is deemed of value to:
Retain the LINK_ID column as the unique identifier for each street segment.
Make a distinction between the original HERE LINK_ID values and the one added by my organization.
Retain the ability to ingest streets updates from HERE should my organization decide to do so.
In this context, we would like to use a different range of LINK_ID values from the one used by HERE. As an example, if HERE uses values between 10,000,000 and 100,000,000, we would assign only LINK_ID values that are within the range 1,000-9,999,999 (this is only for illustration purposes).
Is this approach already accounted for by HERE for the street data they may get from Map Creator? Is there a specific HERE (for Review or Work in Progress) range of LINK_ID values we should consider?
Based on the HERE KB0015682: Permanent ID concept in HERE Data
Entities with Permanent IDs
Generally, the following features have permanent IDs in the HERE Map products:
Lane
Face
Point Features
Administrative Areas (for , Built-up Areas, Districts, and Administrative Areas)
Complex Features (this includes Complex Administrative Area Features as well as Complex Intersections and Complex Roads)
Permanent IDs are globally unique within a specific Object type, e.g., a Link ID occurs once globally. However, the same Permanent ID can be used across different Object types (e.g., Node, Link, Condition, etc.). Note: When a map is upgraded to Intermediate or HERE map, or when a country undergoes administrative restructuring, there may be a change in Permanent IDs.
The following are examples of permanent IDs in the RDF:
Address Point ID
Admin Place ID
Association ID
Building ID
Carto ID
Complex Feature ID
Condition ID
Country ID (which is one of the Admin Place IDs)
Face ID
Feature Point ID
Lane ID
Lane Nav Strand ID
Link ID
Name ID (with some exceptions)
Nav Strand ID
Node ID
POI ID
Road Link ID
Sign ID
Zone ID
Numeric Range of Permanent IDs
Map object IDs (PVIDs) in the extracts use 32-bit integer values to fit in an N(10) scheme. Note: Exceptions to the N(10) scheme can exist. For example, Lane ID is N(12) in length.
The entire range is divided as follows:
Range                       Designation
0000000001 to 0016777215    Non-permanent IDs
0016777215 to 2147483647    Permanent IDs
The range dedicated to permanent IDs is used for any entity.
The range dedicated to non-permanent IDs is used in rare situations where an update is made in a copy of the database instead of in the live database itself, and this update results in a new ID. This new ID in the database copy would be in the non-permanent range. The update would also be applied to the live database, where it would receive a permanent ID available in the next scheduled release. A cross-reference is not provided between non-permanent IDs and the eventual permanent ID from the live database.
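As a rough illustration of the range split quoted above, the following Python sketch classifies an ID by range. It is not taken from HERE documentation: the function name classify_pvid is hypothetical, and because the quoted table lists 0016777215 as both the end of one range and the start of the other, the sketch assumes that boundary value belongs to the non-permanent range.

```python
NON_PERMANENT_MAX = 16_777_215     # upper bound quoted for non-permanent IDs (2**24 - 1)
PERMANENT_MAX = 2_147_483_647      # upper bound quoted for permanent IDs (2**31 - 1)

def classify_pvid(pvid: int) -> str:
    """Return which quoted range an ID falls into.

    Assumption: the shared boundary value 16777215 is treated as
    non-permanent, since the quoted table is ambiguous about it.
    """
    if 1 <= pvid <= NON_PERMANENT_MAX:
        return "non-permanent"
    if NON_PERMANENT_MAX < pvid <= PERMANENT_MAX:
        return "permanent"
    return "out of range"

for value in (1_000, 16_777_215, 16_777_216, 2_147_483_647, 3_000_000_000):
    print(value, classify_pvid(value))
```

A check like this could be used to tell apart IDs assigned in-house from IDs in the range HERE describes as permanent, but whether a given sub-range is safe to reserve is something the quoted excerpt does not state.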

Invisible graphs cause report to slow

I have a report with a parameter where the end user chooses a practice name that corresponds to a group of people. Most of these groups have fewer than 10 people, but a small number of them have as many as 150. When there are more than 15 people in a given group, they want separate graphs, each with no more than 15 people. So for most of the groups, we only need one graph. For a few, we need a lot of graphs.
Behind the scenes, I created a graph for each multiple of 15 people, and set each one to be visible only if there are actually that many people in the group. This does what I need it to, but it makes the report super slow. As far as I can tell, when an end user runs the report it's still somehow rendering the hidden graphs and slowing it all to heck. (I did find this link, which I think suggests this is a known bug.)
I need to have one report where the end user selects the practice name, so I can't make two reports, "My practice is normal" and "My practice is ginormous". I thought maybe I could make a conditional sub-report split into those two reports based on the practice name parameter, but that doesn't appear to be possible; you can play around with visibility but I'm guessing that will still cause the invisible graph rendering problem and not help my speed.
Are there any other cool tips I can try to speed up my report, or is this just a case of too many graphs spoiling the broth?
The easiest way would be to generate a group number for every 15 people and then use a list control to repeat the chart for each group.
Here's a very quick example of this in action. I just used some sample data from one of the Adventure Works sample database.
Here's my query that returns every person in each selected department. Note that I have commented out the DECLAREs as these were just in there for testing.
--DECLARE @Department varchar(50) = ''
--DECLARE @chartMax int = 5
SELECT
GroupName, v.Department, v.FirstName, v.LastName
, ChartGroup = (ROW_NUMBER() OVER(PARTITION BY Department ORDER BY LastName, FirstName)-1) / @chartMax -- calc which chart number the person belongs to
, Salary = ((ABS(CHECKSUM(NewId())) % 100) * 500) + (ABS(CHECKSUM(NewId())) % 1000) + 10000 -- Just some random number to plot
FROM [HumanResources].[vEmployeeDepartment] v
WHERE Department IN (@Department)
ORDER BY Department
The key bit is the ChartGroup column:
ChartGroup = (ROW_NUMBER() OVER(PARTITION BY Department ORDER BY LastName, FirstName)-1) / @chartMax
This will give the first 5 rows in each department a ChartGroup of 0, the next 5 a ChartGroup of 1, and so on. I used 5 rather than 15 just so it's easier to demo.
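The arithmetic behind ChartGroup can be checked outside SQL. The short Python sketch below reproduces the integer division on row numbers; the function name chart_group is hypothetical and the 18-row example simply mirrors a department with 18 people and a chart size of 5.

```python
def chart_group(row_number: int, chart_max: int) -> int:
    """Zero-based chart index for a 1-based row number.

    Mirrors (ROW_NUMBER() - 1) / chartMax, where SQL Server performs
    integer division on integer operands.
    """
    return (row_number - 1) // chart_max

# 18 people, 5 per chart -> charts 0, 1, 2 are full and chart 3 holds the last 3.
groups = [chart_group(n, 5) for n in range(1, 19)]
print(groups)
```

Subtracting 1 before dividing is what keeps row 5 in chart 0 rather than pushing it into chart 1.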
Here's the dataset results
Now, in your report, add a List and set its dataset property to the dataset containing your main data (the query above in my case).
Now edit the 'details' rowgroup properties and add a grouping by Practice and ChartGroup (Department and ChartGroup in this example)
In the list box's textbox, right-click then insert a chart.
Set the chart up as required, in my example, I used salary as the values on a pie chart and the employee names as the labels.
Here's the final design ..
Note that I set the department as a multi-value parameter and also set the number of persons per chart (chartMax) as a report parameter.
When I preview the report I get this for 'Engineering' which has 6 employees
Sales has 18 employees so we get this
.... and so on, it will generate a new chart for every 15 people or part thereof.

Cypher Query - Excluding certain relationships

I am querying my graph where it has the following nodes:
Customer
Account
Fund
Stock
With the following relationships:
HAS (a customer HAS an account)
PURCHASED (an account PURCHASES a fund or stock)
HOLDS (a fund HOLDS a stock)
The query I am trying to achieve is returning all Customers that have accounts that hold Microsoft through a fund. The following is my query:
MATCH (c:Customer)-[h:HAS]->(a:Account)-[p:PURCHASED]-(f:Fund)-[holds:HOLDS]->(s:Stock {ticker: 'MSFT'})
WHERE exists((f)-[:HOLDS]->(s:Stock))
AND exists ((f:Fund)-[holds]->(s:Stock))
AND NOT exists((a:Account {account_type: 'Individual'})-[p:PURCHASED]->(s:Stock))
RETURN *
This almost gets me the desired results but I keep getting 2 relationships out of the Microsoft stock that is tied to an Individual account where I do not want those included.
Any help would be greatly appreciated!
Result:
Desired Result:
There are duplications in your query. Lines 2 and 3 are the same, and line 2 is a subgraph of line 1. You are also using the variables a, p and s more than once, in line 1 and line 4. The query below is not tested, but give it a try and please tell me whether it works for you.
MATCH (c:Customer)-[h:HAS]->(a:Account)-[p:PURCHASED]-(f:Fund)-[holds:HOLDS]->(s:Stock {ticker: 'MSFT'})
WHERE NOT exists((:Account{account_type: 'Individual'})-[:PURCHASED]->(:Stock))
RETURN *
It seems to me that you should just uncheck the "Connect result nodes" option in the Neo4j Browser settings.

How to determine trial booking conversion - google sheets

Introduction
Many sites use WooCommerce as a plugin for their WordPress site and so do we :). We've linked all purchases to Google Sheets, so I can do some analyses.
Our goal is to get as many children physically active as we can, and we have gym classes for very young children with their parents, to teach them the basics of the fun of physical activity.
What I would like to do
I would like to know how many free trial classes actually convert to paying customers, and what the average timespan is between booking a trial class and becoming a member.
The data that I have
I have the following columns which are necessary for this, I believe:
Datestamp
paymentID (is empty when booking a free trial class)
Price (is 0,00 when booking a free trial class)
childsName (Unique in combination with parentsEmailadress, but recurs every month in the list once a membership is active)
parentsEmailadress (may have several children)
OrderName (has the string "trial" or "Membership")
I've made some dummy data in the following sheet:
https://docs.google.com/spreadsheets/d/1lWzQbXMU4qRLp_2qiQ_qsq57nPMy2RG8AHDMGKW626E/edit?usp=sharing
Possible solution
My guess is that I should:
Make a column in which I combine the child's name and the email address
Make a TRUE or FALSE column to check if the order is a trial class or not
Make a column to find the first unique child-email address combination in previous orders (How do I do that?! - VLOOKUP?)
and then
If this is found, then check again if this is a trial class order.
If it is a trial class order, then it should determine the number of days between the trial class order and the non-trial class order and display the number of days
If this is another normal order, then leave it empty (it's just a membership order)
If the email address is not found, then display "direct" (it's a directly bought membership)
I did 1 and 2 and tried 3 with:
=ALS(H2=0;VERT.ZOEKEN(G2;A:G;1;ONWAAR);) (in Dutch)
=IF(H2=0;VLOOKUP(G2;A:G;1;FALSE);) (possible translation)
But this doesn't work.
Really hope someone can point me in the right direction!
Thank you very much in advance!
Given that trial classes have a price of 0, there's no need to create another column to identify those cases; just check the price. To the source data we'll add the "Client ID" column that you created in Column G of your sample sheet. (Ideally, you'll come up with a client ID system, but this works.) Now, create a new sheet that will be your dashboard and let's add a few columns:
Client ID: This grabs only the unique values from the Client IDs in Sheet2, as we don't want users to be repeated in our dashboard. (Column A, row 1. For all other columns, place the formulas in row 2.)
=UNIQUE(Sheet2!G:G)
Did they trial?: This will tell you TRUE/FALSE if the client did a trial. (Column B)
=ISNUMBER(MATCH(Dashboard!A2, FILTER(Sheet2!G:G, Sheet2!C:C=0), 0))
Did they convert?: This will tell you TRUE/FALSE if the client converted from trial to paid. (In cases where they did not do a trial, it will be blank.) (Column C)
=IF(Dashboard!B2, ISNUMBER(MATCH(Dashboard!A2, FILTER(Sheet2!$G:$G, Sheet2!$C:$C>0), 0)), "")
Date of First Trial: The date of the first trial. If none exists, it will be blank. (Column D)
=IFERROR(MIN(FILTER(Sheet2!$A:$A, Sheet2!$G:$G=Dashboard!A2, Sheet2!$C:$C=0)))
Date of First Paid Course: The date of the first paid course. If none exists, it will be blank. (Column E)
=IFERROR(MIN(FILTER(Sheet2!$A:$A, Sheet2!$G:$G=Dashboard!A2, Sheet2!$C:$C>0)))
Days from First Trial to First Paid: The number of days between the first trial and the first paid course. If one of those values doesn't exist, it will be blank. (Column F)
=IF(ISDATE(Dashboard!E2), Dashboard!E2-Dashboard!D2, "")
Now you can answer several questions:
How many clients used a free trial? =COUNTIF(Dashboard!B:B, TRUE)
How many free trials converted? =COUNTIF(Dashboard!C:C, TRUE)
Average number of days from first trial to first paid? =AVERAGE(Dashboard!F:F)
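For anyone who wants to sanity-check the dashboard numbers outside the spreadsheet, here is a minimal Python sketch of the same logic: first trial date, first paid date, conversion flag, and average days between them. It is only an illustration; the function name conversion_stats, the tuple layout and the sample rows are all made up and are not the data from the linked sheet.

```python
from datetime import date

def conversion_stats(orders):
    """orders: iterable of (order_date, client_id, price) tuples.

    Returns (clients_with_trial, clients_converted, average_days),
    where average_days is None when nobody converted.
    """
    first_trial, first_paid = {}, {}
    for order_date, client, price in orders:
        # A price of 0 marks a free trial; anything else is a paid order.
        target = first_trial if price == 0 else first_paid
        if client not in target or order_date < target[client]:
            target[client] = order_date
    converted = [c for c in first_trial if c in first_paid]
    days = [(first_paid[c] - first_trial[c]).days for c in converted]
    average = sum(days) / len(days) if days else None
    return len(first_trial), len(converted), average

# Hypothetical sample rows: one converting client, one trial-only, one direct.
orders = [
    (date(2020, 1, 5), "anna|p1@example.com", 0),
    (date(2020, 1, 19), "anna|p1@example.com", 25),
    (date(2020, 2, 19), "anna|p1@example.com", 25),
    (date(2020, 1, 10), "ben|p2@example.com", 0),
    (date(2020, 1, 12), "cas|p3@example.com", 25),
]
stats = conversion_stats(orders)
print(stats)
```

As in the sheet formulas, a client counts as converted when both a trial and at least one paid order exist, and the day count uses the earliest paid order.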

Graph DB get the next best recommended node in Neo4j cypher

I have a graph using NEO4j and currently trying to build a simple recommendation system that is better than text based search.
Nodes are created such as: Album, People, Type, Chart
Relationship are created such as:
People - [:role] -> Album
where roles are: Artist, Producer, Songwriter
Album-[:is_a_type_of]->Type (type is basically Pop, Rock, Disco...)
People -[:POPULAR_ON]->Chart (Chart is which Billboard they might have been)
People -[:SIMILAR_TO]->People (Predetermined similarity connection)
I have written the following cypher:
MATCH (a:Album { id: { id } })-[:is_a_type_of]->(t)<-[:is_a_type_of]-(recommend)
WITH recommend, t, a
MATCH (recommend)<-[:ARTIST_OF]-(p)
OPTIONAL MATCH (p)-[:POPULAR_ON]->()
RETURN recommend, count(DISTINCT t) AS type
ORDER BY type DESC
LIMIT 25;
It works however, it easily repeats itself if it has only one type of music connected to it, therefore has the same neighbors.
Is there a suggested way to say:
Find me the next best album that has the most similar connected relationships to the starting Album from.
Any recommendation for a tie-breaker scenario? Right now it is ordered by type (so if an album has more than one type of music it is valued more, but if every album has the same number, nothing is more significant).
-I made the [:SIMILAR_TO] link to enforce a priority to consider that relationship as important, but I haven't had a working cypher with it
-Same goes for [:Popular_On] (Maybe Drop this relationship?)
You can define four weighted cases and rank albums by their total score, with the cases weighted from highest to lowest in the order below. Keep each weight between 0 and 1 (e.g. 0.6).
a. People Popular on Chart and People are similar
b. People Popular on Chart and People are Not similar
c. People Not Popular on Chart and People are similar
d. People Not Popular on Chart and People are Not similar
Calculate these four values for each album and sum them. The higher the total, the more strongly the album is recommended.
For now I have set the weights to a = 1, b = 0.8, c = 0.6, d = 0.4, and assumed that some relationship is present which indicates that People like an Album. If your logic is based on Chart only, then use a and b only.
MATCH (me:People)
WHERE id(me) = 123
// the starting album is named "album" so it does not clash with the People variable a
MATCH (album:Album { id: 456 })-[:is_a_type_of]->(t:Type)<-[:is_a_type_of]-(recommend)
OPTIONAL MATCH (recommend)<-[:ARTIST_OF]-(a:People)-[:POPULAR_ON]->(:Chart)
WHERE exists((me)-[:SIMILAR_TO]->(a))
OPTIONAL MATCH (recommend)<-[:ARTIST_OF]-(b:People)-[:POPULAR_ON]->(:Chart)
WHERE NOT exists((me)-[:SIMILAR_TO]->(b))
OPTIONAL MATCH (recommend)<-[:LIKES]-(c:People)
WHERE exists((me)-[:SIMILAR_TO]->(c))
OPTIONAL MATCH (recommend)<-[:LIKES]-(d:People)
WHERE NOT exists((me)-[:SIMILAR_TO]->(d))
// DISTINCT avoids inflated counts from the row multiplication of several OPTIONAL MATCHes
RETURN recommend, (count(DISTINCT a)*1 + count(DISTINCT b)*0.8 + count(DISTINCT c)*0.6 + count(DISTINCT d)*0.4) AS rec_order
ORDER BY rec_order DESC
LIMIT 10;
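The weighted-sum idea can also be sketched in plain Python, independent of the graph. In this illustration each album carries a list of (popular on a chart, similar) flags for the people connected to it; the names WEIGHTS, score and rank and the sample albums are hypothetical, and the weights are the provisional values given above.

```python
# Weights for the four cases a-d, keyed by (popular_on_chart, similar).
WEIGHTS = {
    (True, True): 1.0,    # a: popular and similar
    (True, False): 0.8,   # b: popular, not similar
    (False, True): 0.6,   # c: not popular, similar
    (False, False): 0.4,  # d: not popular, not similar
}

def score(people):
    """Sum the case weights over the people connected to one album."""
    return sum(WEIGHTS[(popular, similar)] for popular, similar in people)

def rank(albums, limit=10):
    """Return album names ordered by descending score."""
    return sorted(albums, key=lambda name: score(albums[name]), reverse=True)[:limit]

# Made-up sample data: album name -> flags for each connected person.
albums = {
    "X": [(True, True), (False, False)],
    "Y": [(True, False)],
    "Z": [(False, True), (False, True)],
}
ranking = rank(albums)
print(ranking)
```

Because every connected person adds to the total, an album with many weakly related people can outrank one with a single strongly related person; whether that is desirable is a tuning decision for the weights.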
