How to source statments with neoj4 graph database? - graph

I'm looking forward to use NeoJ4 for some researchs. However I have to check if it can do what I want first.
I would like to build a graph that says :
StatementID1 = Cannabidiol hasPositiveEffectOn ChronicPain
StatementID1 isSupportedBy Study1
StatementID1 isSupportedBy Study2
StatementID1 isNotSupportedBy Study3
This is easy to add key:Value properties to a relationship in NeoJ4.
The difficulty is that I want Study1,2,3 to be nodes. So that these can have them own set of properties.
This can be done in a triplestore where each triple has an ID like "Statment1" here. This is a matter of adding triples where the object is another triple ID.
url:TripleID1 = url:Cannabidiol url:hasPositiveEffectOn url:ChronicPain
url:TripleID2 = url:TripleID1 url:isSupportedBy url:Study1
url:TripleID3 = url:TripleID1 url:isSupportedBy url:Study2
url:TripleID4 = url:TripleID1 url:isNotSupportedBy url:Study3
For the moment I can't find how to do it simplely in NeoJ4.
I could add the DOI of the study as a property :
Study 1 :
DOI:123/123
Then add the same DOI in the link :
isSupportedBy:
DOI:123/123
Since the DOI is unique it could be possible to make some searchs. However this will make things much more complex.
Is there a simpler method to achieve that?

I suppose this is a database design issue.
Would a node/relationship model something like the following fit your data and make your queries easy?

Neo4j doesn’t support edges going from an edge to a node. Edges are always between nodes. So you’ll have to convert your positiveEffect edge to a node (as proposed in rickhg12hs’s answer) or model the positiveEffect as a non-edge (as you yourself proposed).

Related

Do I need to create all nodes by hand in Neo4j?

I am probably missing something because I am very new to Neo4j, but looking at their Movie graph - probably the very first graph to play with when you are learning the platform - they give us a really big piece of code where every node and labels and properties are imputed by hand, one after the other. Ok, it seems fair to a small graph for learning purpose. But, how should I proceed when I want to import a CSV and create a graph from this data? I believe a hand-imput is not expected at all.
My data look something like this:
date
origin
destiny
value
type
balance
01-05-2021
A
B
500
transf
2500
It has more than 10 thousand rows like this.
I loaded it as:
LOAD CSV FROM "file:///MyData.csv" AS data
RETURN data;
and it worked. The data was loaded etc. But now I have some questions:
1- How do I proceeed if I want origin to be a node and destiny to be another node with type to be edges with value as property? I mean, I know how to create it like (a)->[]->(b) but how to create the entire graph without creating edge by edge, node by node, property by property etc...?
2- Am I able to select the date and see something like a time evolution for this graph? I want to see all transactions in 20-05-2021, 01-05-2021 etc and see how it evolves. Is it possible?
As example in the official docs says here: https://neo4j.com/docs/operations-manual/current/tutorial/neo4j-admin-import/#tutorial-neo4j-admin-import
You may want to create 3 separate files for the import:
First: you need the movies.csv to import nodes with label :Movie
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
Second: you need actors.csv to import nodes with label :Actor
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
Finally, you can import relationships
As you see, actors and movies are already imported. So now you just need to specify the relationships. In the example, you're importing ROLE relationship in the given format:
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
So as you see in the header, you've got values:
START_ID - where the relationship starts, from which node
role - property name (you can specify multiple properties here, just make sure the csv format contains data for it)
:END_IN - where the relationship ends, to which node
:TYPE - type of the relationship
That's all :)

Is it possible to aggregate data with varying nesting depth in Grafana?

I have data in Grafana with different nesting depths. It looks like this (the nesting depth differs depending on the message type):
foo.<host>.type.<type-id>
foo.<host>.type.<type-id>.<subtype-id>
foo.<host>.type.<type-id>.<subtype-id>.<more-nesting>
...
The <host> field can be the IP of the server sending the data and <type-id> is the type of message that it handled. There are quite a lot of message types but for the visualization I am only interested in the first level of <type-id> aggregated over all hosts.
For example, if I have this data:
foo.ip1.type.type1 = 3
foo.ip1.type.type2.subtype1 = 5
foo.ip2.type.type1 = 4
foo.ip2.type.type2.subtype1 = 9
foo.ip2.type.type2.subtype2 = 13
I would rather see it like this:
foo.*.type.type1 = 7 (3+4)
foo.*.type.type2 = 27 (5+9+13)
Then it would be easier to produce a graph where you can see which types of messages are most frequent.
I have not found a way to express that in Grafana. The only option that I see is to create a graph by manually creating queries for each message type. If there were only a handful of types that would be OK, but in my example, the number of types is quite high and even worse, they can change over time. When new message types are added, I would like to see them without having to change the graph.
Does Grafana support to aggregate the data in such a way? Can it visualize the data aggregated by one node and while summing up everything that comes after the node (like the --max-depth option in the Unix du command)?
I am not very experienced with Grafana, but I am starting to believe this functionality is not supported. Not sure whether Grafana allows to preprocess the data, but if the data could be transformed to
foo.ip1.type.type1 = 3
foo.ip1.type.type2_subtype1 = 5
foo.ip2.type.type1 = 4
foo.ip2.type.type2_subtype1 = 9
foo.ip2.type.type2_subtype2 = 13
it would also be valid workaround as the number of subtypes in very low in my data (often there is even only one subtype).
I think the groupByNode function might be useful to you. By doing something like:
groupByNode(foo.ip1.type.*.*,3,"sumSeries")
You'll need to repeat this for each level of nesting. Hope that helps.
More information is available here:
http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.groupByNode
If you want to do it the way you alluded to in your example you could use aliasSub

In R, looking for a more detailed str() showing full names or a tree

I want to change parts of a ggplot2 object made by a function and returned as a result, to remove the Y-axis label. No, the function does not allow that to be specified in the first place so I want to change it after the fact.
str(theObject) ## shows the nested structure with parts shortened to ".." and I want to be able to type something like:
theObject$A$B$C$myLabel <- ""
So how can I either make an str -like listing with full paths like that or perhaps draw a tree structure showing the inner working of the object?
Yes, I can figure things out using names(theObject) and finding which branch leads to what I am looking for, then switching to that branch and repeating but it looks like there could be a better automated way to find a leaf node such as:
leaf_str(obj=theObject, leaf="myLabel")
might return zero or more lines like:
theObject$A$B$C$myLabel
theObject$A$X$Y$Z$myLabel
Or, the entire structure could be put out as a series of such lines.
I have searched and found nothing quite like this. I can see lots of uses especially in teaching what an object is. Yes, S4 objects might also use # as well as $.
The
tree
function in the xfun package may be useful.
See here for more details
https://yihui.org/xfun/

Nodes Size Relative to Associated Value

I'm fairly new to R and haven't been able to find an answer for this. Someone else asked a similar question, but no solution was ever reported. If I should have posted this Q on a different stackexchange, I apologize and will delete if it can't be migrated.
Using data I pulled from the FDIC on US based financial institutions and their total asset holdings, I would like to create a basic network graph where each node is proportionally sized to each other node in the graph. Each node would also be labeled with the name of the financial institution.
The edges of the graph actually don't matter for now, but I want each node connected to the network by at least one edge.
As of now, I've already successfully created a very basic network with 8 banks, connected by edges I randomly assigned, as shown here (I apparently can't embed pictures yet, sorry about that):
My .csv file will be formatted as:
id, bank, assets
1, JP Morgan Chase, 16928000
2, Bank of America, 19075000
... ... ...
For the graph I already created, it is the same as above except without the asset column. It was also only 8 banks, where the file I hope to use will have 25.
Like I already said, as for edges, I just randomly assigned some. If someone knows an easier way of creating random edges that connect the nodes I create, please let me know. Otherwise, this is how my file is formatted as of now:
to, from
1, 2
1, 3
...
And I created the graph I linked with the following commands:
> nodes <- read.csv("~/foo/foo/foo.csv")
> links <- read.csv("~/blah/taco/burrito/blah.csv")
> net <- graph_from_data_frame(d=links, vertices = nodes, directed = F)
> class(net)
> net
IGRAPH UN-- 8 10 --
+ attr: name (v/c), bank (v/c)
+ edges (vertex names):
[1] 1--2 1--3 1--4 1--5 2--3 2--4 2--7 4--5 5--8 7--8
> plot(net, main = "Financial Intermediaries", edge.arrow.size=.4, vertex.size=25, vertex.label.cex=1.5, vertex.label.color="black", vertex.label=V(net)$bank)
I hope I was clear with my problem and gave the necessary details/code. If not, please just let me know and I'll post it up here. Like I said, I'm really new to R (I literally picked it up today, lol), and much of the code I've used so far was less or more taken from Katya Ognyanova's examples/presentations on her blog.
For the sake of clarity, I'm currently using RStudio (most recent stable) and R v3.2.5.
I have been only using the igraph package, but if what I want can't be done with that, I am more than willing to switch over to a different package. That said, I would like to stay with R (unless there really is something so much easier for this it can't be ignored. I would like to stick with and learn R).
Thank you for any and all help, I really appreciate it.
as #Osssan linked to in the comments, there was a partial solution floating around.
That said, I think I created more of a 'hack' solution than a proper one with what I gleaned from the previous question. Here is what I did.
In my csv file, I had four columns. In the third column, I had the asset's for a given bank. NOTE Since I don't know how to do data manipulation inside of R, I had to do some work to adjust the size of the asset value so that it did not result in nodes that covered the entirety of the graph. With my solution, you will NOT get nodes that are relative in size automatically. You must do that first.
Since I wanted to create a network with nodes(banks) that were variable in size according to their respective asset holdings, what I did was create a separate vector like so
> df <- read.csv("~/blah/blah/blah.csv", colClasses = c("NULL","NULL", NA, "NULL"))
What this command does is read in the csv file, looks at the headings with colClasses and tell the interpreter to vacuum up all columns specified (non-NULL). With this vector, I then plugged it into my the plot function as such:
> plot(net, main = "Financial Intermediaries", edge.arrow.size=.4, vertex.size=as.matrix(df), vertex.label.color="black")
where I make a matrix using the as.matrix(df) and set it to vertex.size=. Given a vector of only one dimension, R is able to quickly make the appropriate matrix (I guess).
I still have to do some relabeling and connecting with edges, but it worked in graphing. I graphed the largest 26 commercial banks by total asset holdings (and adjusted them to % of total commercial bank assets in the US), so you will see that the size of nodes increase from 26-1. Here's the output.
Like I said, this solution works, but I am far from sure whether it would be considered proper or kosher. I welcome anyone to edit this solution so that it clarifies what is actually happening with my code and or post a proper/optimized solution if it exists. I'm going to give this post a solid few days before marking it solved, as I would like to still get a solid answer on this confusing problem.
P.S. If anyone knows of a way to force nodes not to overlap, I would appreciate a comment explaining how to do that. If you look at my picture, you'll see that the effect of dwarfing the other nodes is diminished when the largest node is covered by it's closely sized peers.

Graph Database - How to deal with multilingual data

I'm trying to approach a multilingual graph database but I'm struggling on how to achieve an optimal model.
My current proposal is to make two node types: Movie and MovieTranslation.
Movie holds all relationships as likes, related, ratings and comments. MovieTranslation contains all translatable data (title, plot, genres). A Movie node does not contain these kind of properties, only the original_title.
Movie and MovieTranslation are tied together by a translation relationship.
When I query nodes, I would check if they have a translation relationship with the queried locale (en_US for example). If true, merge the translation with the main node as the result.
I think this way might not be the best, but I can't think on a better one.
Do you guys have a better suggestion for the database model? It would be very appreciated.
I'm using neo4j, if you need this information.
Thanks,
Vinicius.
I suggest moving the original title to its own node also, call it MovieTitle. "Complicating" your model in this way should actually "simplify" (or at least standardise) your queries because you're always looking in one place for film titles (also for indexing and searching).
You're assuming that films only have one original title which isn't the case. A Korea-Japan co-production will have at least two original titles. Whole genres of Japanese cinema were released with different original Japanese titles in cinemas and on VHS.
Distinct from the idea of an original title is that of specific language titles. The same film released in different Chinese-speaking countries will have different Chinese-language titles that are deemed more marketable to the specific local audiences.
To get the original title:
MATCH (c:Country)<-[HAS_NATIONALITY]-(m:Movie)-[HAS_TITLE]->(t:MovieTitle)-[HAS_NATIONALITY]->(c:Country)
WHERE m.id = 1
RETURN COLLECT(t.title, c.country_code)
To get the original title in China:
MATCH (m:Movie)-[HAS_TITLE]->(t:MovieTitle)-[HAS_NATIONALITY]->(c:Country)
WHERE c.country_code == "CN"
RETURN m, COLLECT(t.title, c.country_code)
To get all language titles:
MATCH (m:Movie)-[HAS_TITLE]->(t:MovieTitle)-[HAS_NATIONALITY]->(c:Country)-[HAS_LANGUAGE]->(l:Language)
RETURN m, COLLECT(t.title, l.language_code)
To get all Chinese-language titles:
MATCH (m:Movie)-[HAS_TITLE]->(t:MovieTitle)-[HAS_NATIONALITY]->(c:Country)-[HAS_LANGUAGE]->(l:Language)
WHERE l.language_code == "zh"
RETURN m, COLLECT(t.title, c.name)
I would separate plot and genre into their own nodes. There is an argument that different national cinemas have unique genres, but if westerns and samurai dramas are both sub-genres of period dramas then you want to find them both on a period drama search.
I would still have the idea of Translation nodes but don't confuse with them the domain you're modelling. It should be domain-ignorant and - for simple words/phrases like "romantic comedy" - should almost be a third-party graph plug-in released by GraphAware in 2025.
Get the French-language genre titles of a specific film:
MATCH (m:Movie)-[HAS_GENRE*]->(g:Genre)-[HAS_TRANSLATION]->(t:Translation)-[HAS_LANGUAGE]->(l:Language)
WHERE m.id = 100 AND l.language_code = "fr"
RETURN COLLECT(t.translation)
Get all romanic comedies:
MATCH (m:Movie)-[HAS_GENRE*]->(g:Genre)-[HAS_TRANSLATION]->(t:Translation)
WHERE t.translation = "comédie romantique"
RETURN m
Unlike movie titles and genres, plots are altogether more simple because you're modelling the film's story as a blob of text and not as domain objects in itself. Perhaps later you may do textual analysis on the plot texts to find themes, gender bias, etc, and model this in the graph as well.
Get the French language plot for a specific movie:
MATCH (m:Movie)-[HAS_PLOT]->(p:Plot)-[HAS_LANGUAGE]->(l:Language)-[HAS_TRANSLATION]->(t:Translation)
WHERE m.id = 100 AND t.translation = "French"
RETURN p.plot
(Please treat the Cypher queries as pseudo-code. I didn't make a graph and test them.)
I think the model is ok.
You can RETURN movie, translation or RETURN {movie:movie, translation:translation}
Currently converting nodes to maps and combining these maps is not yet supported, that's something on the roadmap.
How and where would you want to use the nodes? If for rendering, you can just access the two columns or entries. If for graph visualization you can also combine them into a node in the json source for the viz.

Resources