gremlin-python: getting nodes with more than two edges - jupyter-notebook

I am currently using gremlin-python to study a graph. I want to get all vertices that have more than two out edges. I am using an anonymous traversal to filter users by edge count, but I get the error below.
AttributeError: 'list' object has no attribute 'out'
I am new to this, not sure what I am doing wrong here. This is the way described in the limited gremlin-python tutorials/docs available.

It would be helpful if you could include a code snippet showing the imports you used as well as the query. In your Python code did you remember to import this class?
from gremlin_python.process.graph_traversal import __
I am able to run your query without any issues using one of my graphs (P is imported from gremlin_python.process.traversal):
g.V().hasLabel('airport').where(__.out().count().is_(P.gte(2))).count().next()
If you do not have that import you will see an error like the one you are seeing.
There is a list of the most commonly needed imports when using gremlin-python at this location
EDITED to add:
As Stephen points out in the comment below, since you only ever need to know whether there are at least two outgoing edges, you can reduce the work the query engine has to do (some optimizers may not need this) by adding a limit step:
g.V().hasLabel('airport').where(__.out().limit(2).count().is_(P.gt(1))).count().next()
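To see why the limit step helps, here is a plain-Python analogy (not gremlin-python code) of what limit(2) does in the query above. The adjacency dictionary and vertex names are made up for illustration; the point is only that counting stops after the second neighbour instead of walking every edge.

```python
from itertools import islice

# Toy adjacency list standing in for a graph: vertex -> outgoing neighbours.
# The vertex names are hypothetical and only serve the illustration.
out_edges = {
    "AUS": ["DFW", "LAX", "JFK"],
    "DFW": ["AUS"],
    "LAX": ["AUS", "JFK"],
    "JFK": [],
}


def has_at_least_two_out(vertex):
    # islice stops after two neighbours, like limit(2) in the traversal,
    # so a vertex with thousands of edges is never counted in full.
    seen = sum(1 for _ in islice(iter(out_edges[vertex]), 2))
    return seen > 1


matching = [v for v in out_edges if has_at_least_two_out(v)]
print(matching)
```

For "more than two" edges, as asked in the question, the same idea would use a limit of 3 and a comparison of greater than 2.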

from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
graph = Graph()
graph_db_uri = 'ws://localhost:8182/gremlin'
g = graph.traversal().withRemote(DriverRemoteConnection(graph_db_uri, 'g'))
c = g.V().hasLabel('node_name').count().next()
print(c)

Related

Do I need to create all nodes by hand in Neo4j?

I am probably missing something because I am very new to Neo4j, but looking at their Movie graph (probably the very first graph to play with when you are learning the platform), they give us a really big piece of code where every node, label, and property is entered by hand, one after the other. OK, that seems fair for a small graph meant for learning. But how should I proceed when I want to import a CSV and create a graph from that data? I assume entering it by hand is not what is expected.
My data looks something like this:
date        origin  destiny  value  type    balance
01-05-2021  A       B        500    transf  2500
It has more than 10 thousand rows like this.
I loaded it as:
LOAD CSV FROM "file:///MyData.csv" AS data
RETURN data;
and it worked. The data was loaded etc. But now I have some questions:
1- How do I proceed if I want origin to be a node and destiny to be another node, with type as the edge and value as a property? I mean, I know how to create it like (a)-[]->(b), but how do I create the entire graph without creating it edge by edge, node by node, property by property, etc.?
2- Am I able to select the date and see something like a time evolution for this graph? I want to see all transactions on 20-05-2021, 01-05-2021, etc. and see how it evolves. Is that possible?
As the example in the official docs shows here: https://neo4j.com/docs/operations-manual/current/tutorial/neo4j-admin-import/#tutorial-neo4j-admin-import
You may want to create 3 separate files for the import:
First: you need the movies.csv to import nodes with label :Movie
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
Second: you need actors.csv to import nodes with label :Actor
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
Finally, you can import relationships
As you see, actors and movies are already imported, so now you just need to specify the relationships. In the example, you're importing ACTED_IN relationships, each with a role property, in the given format:
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
So as you see in the header, you've got these values:
:START_ID - where the relationship starts, i.e. the ID of the source node
role - property name (you can specify multiple properties here, just make sure the CSV contains data for each of them)
:END_ID - where the relationship ends, i.e. the ID of the target node
:TYPE - type of the relationship
That's all :)
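Applied to the transaction data from the question, the same header format could be produced with a small script. This is only a sketch under assumptions: the CSV is assumed to have a header row with the column names listed in the question, "Account" and accountId are hypothetical names chosen for illustration, and the relationship type is taken from the type column.

```python
import csv
import io

# Sample input in the shape described in the question; the header row is an
# assumption, since the question lists the columns but not the file header.
raw = """date,origin,destiny,value,type,balance
01-05-2021,A,B,500,transf,2500
02-05-2021,B,C,120,transf,2380
"""


def build_import_files(source_text):
    """Return (nodes_csv, relationships_csv) as strings."""
    rows = list(csv.DictReader(io.StringIO(source_text)))

    # Every distinct origin or destiny becomes exactly one node.
    accounts = sorted({r["origin"] for r in rows} | {r["destiny"] for r in rows})

    nodes = io.StringIO()
    writer = csv.writer(nodes, lineterminator="\n")
    writer.writerow(["accountId:ID", ":LABEL"])
    for account in accounts:
        writer.writerow([account, "Account"])  # "Account" is a hypothetical label

    rels = io.StringIO()
    writer = csv.writer(rels, lineterminator="\n")
    writer.writerow([":START_ID", "value:int", "date", ":END_ID", ":TYPE"])
    for r in rows:
        # One relationship per transaction row, typed by the "type" column.
        writer.writerow([r["origin"], r["value"], r["date"], r["destiny"], r["type"]])

    return nodes.getvalue(), rels.getvalue()


nodes_csv, rels_csv = build_import_files(raw)
print(nodes_csv)
print(rels_csv)
```

Keeping date as a relationship property means the transactions for a given day can later be selected by filtering on that property.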

How do you change the turbulence intensity within FLORIS

I'd like to run a case where I calculate the power of a small farm for a range of TI values. Do I need to edit the JSON input file to do this?
import matplotlib.pyplot as plt
import floris.tools as wfct
# Initialize the FLORIS interface fi (using the example input)
fi = wfct.floris_utilities.FlorisInterface("example_input.json")
# Would now like to loop over TI from 6-15% here and collect powers...
You can access any of the parameters in the input file programmatically after the input file is loaded. There isn't a one-stop object in the FLORIS framework where all of the parameters live, but the floris.simulation API docs should give you some guidance.
With that in mind, the best method for iterating over a parameter within FLORIS is the FlorisInterface object. Once you know where you want to change a parameter in the simulation modules, you can see if there's a helper function in the FlorisInterface. Typically, you'll end up modifying one of the inputs to FlorisInterface.reinitialize_flow_field and using that function. This script provides a simple example. I think this is the solution in your case since you can change turbulence intensity through that function call.
When you want to change parameters in other objects like Wake, you can extract it from the FLORIS object, make changes, and then supply it back to reinitialize_flow_field.
I hope this answers your question!

How, with Gremlin, to return properties from in-vertices the same as I do from out-vertices? (Not as arrays)

I'm trying to start traversing from one set of labelled vertices, then get all their in-vertices connected by a particular kind of edge, then from there, return a property of those in-vertices as objects. I can do this same thing with some out-vertices starting from the same set of labelled vertices with no problem, but get a "The provided traverser does not map to a value:" error when I attempt it with some in-vertices.
I have found a workaround, but it is not ideal, as it returns the desired property values as arrays of length one.
Here is how I do the very similar task successfully with out-vertices:
g.V().hasLabel('TestCenter').project('address').by(out('physical').project('street').by(values('street1')))
This returns things like
==>{address={street=561 PLACE DE CEDARE}}
==>{address={street=370 N BLACK STATION AVE}}
This is great!
Then I try the same sort of query with some in-vertices, like this:
g.V().hasLabel('TestCenter').project('host').by(__.in('hosts').project('aCode').by(values('code')))
and get the above mentioned error.
The workaround I've been able to find is to add a .fold() to the final "by" like this:
g.V().hasLabel('TestCenter').project('host').by(__.in('hosts').project('aCode').by(values('code')).fold())
but then my responses are like this
==>{host=[{aCode=7387}]}
==>{host=[{aCode=9160}]}
What I would like is a response looking like this:
==>{host={aCode=4325}}
==>{host={aCode=1234}}
(Note: I am not sure if this is relevant, but I am connecting Gremlin to a Neptune DB Instance)
It seems to me from the error above and your workaround that not all of your 'TestCenter' vertices have an in edge of type 'hosts'. When using project, each by has to map to a valid value.
You can do one of two things:
1) make sure a value will be returned in the project:
g.V().hasLabel('TestCenter').project('host')
.by(coalesce(__.in('hosts').project('aCode').by(values('code')), constant('empty')))
2) filter out the vertices that have no such edge:
g.V().hasLabel('TestCenter').where(__.in('hosts'))
.project('host').by(__.in('hosts').project('aCode').by(values('code')))

bokeh update high level chart with javascript callback

I am using bokeh to plot a high-level chart (Line) from a dataframe.
I also have a widget to filter the data shown by the graph.
I know how to do it with basic glyphs but not with high-level charts.
Basic plots take a ColumnDataSource as input, and a JavaScript callback can receive it as an argument and trigger an update on it. With a dataframe as input, that does not seem possible.
I have the following error if I want to pass in args the dataframe df:
ValueError: expected an element of Dict(String, Instance(Model)), got {'df': ......
Any idea?
Thanks.
David
I think the only way to do that is by re-creating the entire plot each time and replacing it. See https://groups.google.com/a/continuum.io/forum/#!topic/bokeh/Hn14aDN_5lk for example. I would stick with using a ColumnDataSource.
I should also specify that I don't want to use a bokeh server but a standalone solution.
As a workaround I tried to replace the dataframe with one generated from a ColumnDataSource, like this:
plot=Line(source.data,x='x',y='y')
or
plot=Line(source.to_df(),x='x',y='y')
with source=ColumnDataSource(df)
There are no more errors, but nothing happens when I trigger the source in the JavaScript callback.
Is that expected?
Thanks.
David
If you use
plot=Line(source.data,x='x',y='y') or plot=Line(source.to_df(),x='x',y='y')
then it is normal that your JavaScript callback doesn't trigger anything. You didn't pass a source to the line; you gave it the "data" dictionary from the source as it was defined in your Python code at that time, and that copy will never change.
As said in the link of Okonomiyaki's answer, you should use bokeh.plotting if you want more interactions with js callbacks.

Python - Equivalent to scipy griddata because of an error

I would like to use an equivalent to scipy griddata because the following warning appears:
QH7074 qhull warning: more than 16777215 ridges. ID field overflows and two ridges
may have the same identifier. Otherwise output ok.
which sometimes kills my calculation or just makes it very slow.
Currently I have:
test_interp= griddata((xx_points.flatten(),yy_points.flatten()),values.flatten(), (xi+X,yi+V), method='linear',fill_value=nan)
I had exactly the same error message and it looks like a limitation of griddata for big data, according to this thread. The limit is 2^24=16777216 'ridges' (I don't know exactly what it refers to). I did exactly the same interpolation with less data points and this problem didn't occur. My guess is that you should find a way to split your dataset to be below the 2^24 threshold.
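One possible way to split the dataset, sketched here in plain Python, is to cut the source points into overlapping strips along x and interpolate each strip separately. Everything in this sketch is an assumption rather than part of griddata: the function name split_into_strips, the strip count, and the margin are hypothetical, and suitable values depend on the data.

```python
from typing import List, Sequence, Tuple


def split_into_strips(
    xs: Sequence[float], n_strips: int, margin: float
) -> List[Tuple[float, float, List[int]]]:
    """Partition point indices into vertical strips along x.

    Each strip is returned as (x_lo, x_hi, indices), where indices are the
    points whose x lies in [x_lo - margin, x_hi + margin]. The margin makes
    neighbouring strips overlap so that interpolation near a strip edge
    still sees points on both sides.
    """
    if n_strips < 1:
        raise ValueError("n_strips must be at least 1")
    x_min, x_max = min(xs), max(xs)
    width = (x_max - x_min) / n_strips
    strips = []
    for k in range(n_strips):
        x_lo = x_min + k * width
        # Use x_max for the last strip to avoid floating-point gaps.
        x_hi = x_max if k == n_strips - 1 else x_min + (k + 1) * width
        indices = [i for i, x in enumerate(xs) if x_lo - margin <= x <= x_hi + margin]
        strips.append((x_lo, x_hi, indices))
    return strips


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
strips = split_into_strips(xs, n_strips=2, margin=1.0)
for x_lo, x_hi, indices in strips:
    print(x_lo, x_hi, indices)
```

Each strip's indices would then select the points and values passed to a separate griddata call, with the target grid restricted to the strip's [x_lo, x_hi] range, so that no single triangulation has to handle the whole dataset.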
