I'm trying to create reusable components for my traversals using gremlin python by putting traversal components into functions and I'm running into a problem where some of the traversal components aren't working correctly.
As setup I'm running gremlin server using the docker container with the configuration file loading in the modern graph from the github repo
docker run -p 8182:8182 tinkerpop/gremlin-server:3.4.6 conf/gremlin-server-modern.yaml
My test python code looks like the following:
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
def connect_gremlin(endpoint='ws://localhost:8182/gremlin'):
return traversal().withRemote(DriverRemoteConnection(endpoint,'g'))
def n():
return __.values('name')
def r():
return __.range(2,4)
g = connect_gremlin()
# works as expected
g.V().map(n()).toList()
# returns an empty list
g.V().map(n()).filter(r()).toList()
# but using range step directly works as expected
g.V().map(n()).range(2,4).toList()
I can successfully move the values step into a function but when I try to do the same thing with the range step it returns an empty list rather than the 2nd through 4th items. Anyone know what I'm doing wrong?
The map step is intended to map the state of each traverser to a new state. In the context of a single traverser a range starting anywhere but zero is not going to do what you expect.
Here are some examples using Python:
>>> g.V().map(__.range(0,1)).limit(5).toList()
[v[1400], v[1401], v[1402], v[1403], v[1404]]
>>> g.V().map(__.range(0,2)).limit(5).toList()
[v[1400], v[1401], v[1402], v[1403], v[1404]]
>>> g.V().map(__.range(1,2)).limit(5).toList()
[]
This is why the values step works inside a map step and range does not.
Rather than inject code using a map step why not just incrementally add to the traversal and then iterate it when complete?
Related
This below piece of gremlin code runs perfectly well in the gremlin console (it finds the unique start and end points of an k-step ego network, along with the minimum distance to that endpoint):
g.V(42062000).as("from")
.repeat(both().as("to")).emit().times(3).path()
.count(local).as("pathlen")
.select("from", "to", "pathlen")
.dedup("from", "to").toList()
And gives an output similar to the following, which is as expected:
==>{from=v[42062000], to=v[83607800], plen=2}
==>{from=v[42062000], to=v[23683248], plen=3}
==>{from=v[42062000], to=v[41762840], plen=3}
==>{from=v[42062000], to=v[42062000], plen=3}
==>{from=v[42062000], to=v[83599456], plen=3}
However, when converting the code to conform to the gremlinpython wrapper
(i.e. after substituting as for as_), I'm given the error TypeError: Object of type GraphTraversal is not JSON serializable, even though it's the same query.
Has anyone faced similar issues?
I am using gremlinpython 3.4.2, but was originally using 3.3.3. My version of Python is 3.7.3.
import static classes using
from gremlin_python import statics
statics.load_statics(globals())
or
from gremlin_python.process.graph_traversal import elementMap, range_,
local, count
and common reserved words must end with _
Example:
as_, range_
I am trying to execute a gremlin script over https against a remote JanusGraph instance. I have filtered my problem to the part where I am trying to add an edge using vertex variables. I am trying to add two vertices, assign the results to a variable and use them to add an edge. Also I am also tying to avoid a single line script like g.V().addV(..).aaddV(..).addE(..), because of the program logic that is behind the script
The following gremlin works in the gremlin console (remote session)
def graph=ConfiguredGraphFactory.open("ga");
def g = graph.traversal();
v1=g.addV('node1');
v2=g.addV('node2');
v1.addE('test').to(v2);
But when I try to do the same over https (issued against a compose-janusgraph server), I get an error. I did add .iterate() to the addV() and the vertexes are getting added if I remove the addE(..) line. But when I try
{"gremlin":"def graph=ConfiguredGraphFactory.open('ga');
def g = graph.traversal();
v1=g.addV('node16').property('name','testn16').iterate();
v2=g.addV('node17').property('name','testn2').iterate();
v1.addE('test18').to(v2);
g.tx().commit()"}
I get the exception
The traversal strategies are complete and the traversal can no longer
be modulated","Exception-Class":"java.lang.IllegalStateException"
Also note that I am joining the whole gremlin into one single line before sending it over curl. I have split them to newlines here for readability. Any help would be great. -- Thank you
iterate() doesn't return a Vertex...it just iterates the traversal to generate side-effects (i.e. the graph gets a vertex added but no result is returned). You probably just need to do:
{"gremlin":"graph=ConfiguredGraphFactory.open('ga');
g = graph.traversal();
g.addV('node16').property('name','testn16').as('v1').
addV('node17').property('name','testn2').as('v2').
addE('test18').from('v1').to('v2').iterate();
g.tx().commit()"}
How can I define a simple cell magic that just executes the cell as if the %%mymagic wasn't there?
The context is that we are using the wonderful IPython parallel framework. In some places, we also use its defined %%px magic. But sometimes we'd like to run the same notebook without a cluster (local only). In that case, %%px isn't defined and I would have to comment it out. Instead, in that case I'd like to redefine %%px so that:
%%px: would be a no-op.
%%px --local: just runs the cell, no other side-effect.
Alternatively, all %%px (with --local or not) could just run the cell, if that's simpler.
Another approach would be to create an ipyparallel Client that is a fake one, i.e. with 0 nodes (but would still operate correctly, e.g. with regard to %%px --local). But that would be for another question.
Things I've tried:
%alias_magic px time (after all, I don't care if the cell is timed). Unfortunately, %%time doesn't take arguments and chokes on --local.
Define my own "no-op" magic:
if USE_CLIENT:
pass
else:
# temporarily define %%px cell magic
from IPython import get_ipython
def px(line, cell):
"""Do nothing"""
pass
get_ipython().register_magic_function(px, 'cell')
but that succeeds a little too well at doing really nothing (i.e. the cells are not executed).
Look into IPython/core/magics/execution.py to see if there is any hook I could reuse (something that would just execute the cell). I haven't found, but perhaps I haven't looked hard enough.
Any other idea?
I think the relevant command is
self.shell.run_cell(cell)
We can define a magic command that just does not have effect as:
def f(line, cell):
print('==> line:[{}]'.format(line))
print('==> cell:\n # {}'.format('\n # '.join(cell.split())))
print('==================================================================')
res = get_ipython().run_cell(cell)
get_ipython().register_magic_function(f, 'cell', 'cache')
Here is an example run:
In your case, try:
if USE_CLIENT:
pass
else:
# temporarily define %%px cell magic
from IPython import get_ipython
def px(line, cell):
res = get_ipython().run_cell(cell)
get_ipython().register_magic_function(px,'cell','px')
I have installed both Titan and Faunus and each seems to be working properly (titan-0.4.4 & faunus-0.4.4)
However, after ingesting a sizable graph in Titan and trying to import it in Faunus via
FaunusFactory.open( )
I am experiencing issues. To be more precise, I do seem to get a faunus graph from the call FaunusFactory.open( ),
faunusgraph[titanhbaseinputformat->titanhbaseoutputformat]
but then, even asking a simple
g.v(10)
I do get this error:
Task Id : attempt_201407181049_0009_m_000000_0, Status : FAILED
com.thinkaurelius.titan.core.TitanException: Exception in Titan
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.getAdminInterface(HBaseStoreManager.java:380)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.ensureColumnFamilyExists(HBaseStoreManager.java:275)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.openDatabase(HBaseStoreManager.java:228)
My property file is taken straight out of the Faunus page with Titan-HBase input, except of course changing the url of the hadoop cluster:
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname= my IP
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
faunus.graph.output.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseOutputFormat
faunus.graph.output.titan.storage.backend=hbase
faunus.graph.output.titan.storage.hostname= IP of my host
faunus.graph.output.titan.storage.port=2181
faunus.graph.output.titan.storage.tablename=titan
faunus.graph.output.titan.storage.batch-loading=true
faunus.output.location=output1
zookeeper.znode.parent=/hbase-unsecure
titan.graph.output.ids.block-size=100000
Anyone can help?
ADDENDUM:
To address the comment below, here is some context: as I have mentioned, I have a graph in Titan and can perform basic gremlin queries on it.
However, I do need to run a gremlin global query which, due to the size of the graph, needs Faunus and its underlying MR capabilities. Hence the need to import it. The error I get doesn't look to me as if it points to some inconsistency in the graph itself.
I'm not sure that you have your "flow" of Faunus right. If your end result is to do a global query of the graph, then consider this approach:
pull your graph to sequence file
issue your global query over the sequence file
More specifically create hbase-seq.properties:
# input graph parameters
faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
faunus.graph.input.titan.storage.backend=hbase
faunus.graph.input.titan.storage.hostname=localhost
faunus.graph.input.titan.storage.port=2181
faunus.graph.input.titan.storage.tablename=titan
# hbase.mapreduce.scan.cachedrows=1000
# output data (graph or statistic) parameters
faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=snapshot
faunus.output.location.overwrite=true
In Faunus, copy do:
g = FaunusFactory.open('hbase-seq.properties')
g._()
That will read the graph from hbase and write it to sequence file in HDFS. Next, create: seq-noop.properties with these contents:
# input graph parameters
faunus.graph.input.format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
faunus.input.location=snapshot/job-0
# output data parameters
faunus.graph.output.format=com.thinkaurelius.faunus.formats.noop.NoOpOutputFormat
faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
faunus.output.location=analysis
faunus.output.location.overwrite=true
The above configuration will read your sequence file from the previous step and without re-writing the graph (that's what NoOpOutputFormat is for). Now in Faunus do:
g = FaunusFactory.open('seq-noop.properties')
g.V.sideEffect('{it.degree=it.bothE.count()}').degree.groupCount()
This will execute a degree distribution, writing the results in HDFS to the 'analysis' directory. Obviously you can do whatever Faunus-flavored Gremlin you want here - I just wanted to provide an example. I think this is a pretty standard "flow" or pattern for using Faunus from a graph analysis perspective.
Is it possible to write Robot Framework tests in Python instead of the .txt format?
Behind the scenes it looks like the .txt test get converted into Python by pybot so I'm hoping that this is simply a matter of importing the right library and inheriting from the right class but I haven't been able to figure out how to do that.
(We already have a bunch of suites and have keywords written in both formats but sometimes the RF syntax makes it very difficult to do things that are simple in Python. I understand it would be possible to just write a Python keyword for each test plus 'wrap' setup and teardown functions the same way, but that seems cumbersome.)
Robot does not convert your test cases to python behind the scenes before running them. Instead, it parses the test cases, then iterates over each keyword, calling the code that implements the keyword. There isn't ever a stage where there's a completely pure python representation of a test case.
It is not possible to write tests in python, and have those tests run alongside traditional robot tests by the provided test runner. Like you said in your question, your only option is to put all of your logic for a single test case in a single keyword, and call that keyword from a test case.
It is possible to create and execute tests in python solely via the published API. This might not be what you're really asking for, because ultimately you're still creating keywords, you're just creating them via python.
from robot.api import TestSuite
suite = TestSuite('Activate Skynet')
suite.imports.library('OperatingSystem')
test = suite.tests.create('Should Activate Skynet', tags=['smoke'])
test.keywords.create('Set Environment Variable', args=['SKYNET', 'activated'], type='setup')
test.keywords.create('Environment Variable Should Be Set', args=['SKYNET'])
The above example was taken from here:
http://robot-framework.readthedocs.org/en/2.8.1/autodoc/robot.running.html
Well, you should not care if your python code represents tests or keywords as long as you code the logic of the tests in python.
The best you can do is to keep some html tables in robot format. Each line would be a call for a keyword. The keyword could be implemented in python, and, logically, represents a whole test (although in robot terminology it is still a "keyword").
This post shows how you can have access to the robot context from your python code.
robot variables
BuiltIn().get_variable_value("${USERNAME}")
java keywords
from com.mycompany.myproject.testtools import LoginRobotKeyword
LoginRobotKeywords().login(user, pwd)
robot keywords
BuiltIn().run_keyword("check user connected", user)
Robotframework does not support writting test cases in python directly. I've submitted an enhancement PR, check it here
https://github.com/robotframework/robotframework/issues/3128
But I've tried to do that by moving all the test cases logic to python code, and make RF test cases just a entry point to them.
Here is an example.
We could create a python file to include all testing logic and setup/teardown logic, like this
# *** case0001.py *****
from SchoolClass import SchoolClass
schCla = SchoolClass()
class case0001:
def steps(self):
print('''\n\n***** step 1 **** add school class \n''')
self.ret1 = schCla.add_school_class('grade#1', 'class#1', 60)
assert self.ret1['retcode'] == 0
print('''\n\n***** step 2 **** list school class to check\n''')
ret = schCla.list_school_class(1)
schCla.classlist_should_contain(ret['retlist'],
'grade#1',
'class#1',
60,
self.ret1['id'])
def setup(self):
pass
def teardown(self):
schCla.delete_school_class(self.ret1['id'])
And then we creat a Robot file. In which all RF test cases are in the same form and just work as entry points to python test cases above.
like this
*** Settings ***
Library cases/case0001.py WITH NAME C000001
Library cases/case0002.py WITH NAME C000002
*** Test Cases ***
add class - tc000001
[Setup] C000001.setup
C000001.steps
[Teardown] C000001.teardown
add class - tc000002
[Setup] C000002.setup
C000002.steps
[Teardown] C000002.teardown
You could see, in this way, the RF testcases are similar. We could even create a tool to auto generate them by scanning Python testcases.