How to add an edge between two existing vertices in gremlin python? - gremlin

Gremlin - working as expected
gremlin> vMarko = g.V().addV("person").property("name", "Marko").next()
==>v[1]
gremlin> vPeter = g.V().addV("person").property("name", "Peter").next()
==>v[6]
gremlin> g.V(vMarko).addE('knows').to(vPeter) //// (6)
==>e[22][1-knows->6]
Gremlin Python - working as expected
edge = g.add_v('person').property('name', 'Peter').as_('p2').add_v('person').property('name', 'Marko').addE('knows').to('p2').toList()
print(edge)
==> [e[74926][74924-knows->74922]]
Gremlin Python - Not working
v_marko = g.add_v('person').property('name', 'Marko').next()
v_peter = g.add_v('person').property('name', 'Peter').next()
print(type(v_marko))
edge = g.V(v_marko).addE('knows').to(v_peter).toList() # doesn't work
print(edge)
==> []

I was not able to reproduce this issue. Using a graph I have loaded and Gremlin Python running in the Python console:
>>> a=g.V('3').next()
>>> a
v[3]
>>> type(a)
<class 'gremlin_python.structure.graph.Vertex'>
>>> g.V(a).next()
v[3]
>>> b=g.V('4').next()
>>> g.addE('temp').from_(a).to(b).next()
e[62c2124a-f105-6558-343f-acd56ccfac66][3-temp->4]
>>> c=g.addV('temp').next()
>>> g.V(c).addE('temp').to(b).next()
e[0ac2124c-7230-dba0-9088-e66824d74b31][c0c2124b-f30c-bdea-7572-5127103f32c7-temp->4]
>>> g.E().hasLabel('temp').toList()
[e[62c2124a-f105-6558-343f-acd56ccfac66][3-temp->4], e[0ac2124c-7230-dba0-9088-e66824d74b31][c0c2124b-f30c-bdea-7572-5127103f32c7-temp->4]
]
I also tried using your queries (I changed add_v to addV as my client is slightly older I believe than yours and does not yet have that extra Python idiom support added).
>>> v_marko = g.addV('person').property('name', 'Marko').next()
>>> v_peter = g.addV('person').property('name', 'Peter').next()
>>> v_marko
v[0cc2124f-ec77-8d58-eaf2-7075c166fd5d]
>>> v_peter
v[a0c21250-0f11-b7d9-b2cb-2ddefc5e8749]
>>> edge = g.V(v_marko).addE('knows').to(v_peter).toList()
>>> edge
[e[3ac21250-3de9-9e13-24f5-b61854c4c46d][0cc2124f-ec77-8d58-eaf2-7075c166fd5d-knows->a0c21250-0f11-b7d9-b2cb-2ddefc5e8749]]
>>>
I ran my tests using Amazon Neptune as the database and the following client version.
>>> from gremlin_python import __version__
>>> __version__.version
'3.5.2'

Related

How to access Xcom value in a non airflow operator python function

I have a stored XCom value that I wanted to pass to another python function which is not called using PythonOperator.
def sql_file_template():
<some code which uses xcom variable>
def call_stored_proc(**kwargs):
#project = kwargs['row_id']
print("INSIDE CALL STORE PROC ------------")
query = """CALL `{0}.dataset_name.store_proc`(
'{1}' # source table
, ['{2}'] # row_ids
, '{3}' # pivot_col_name
, '{4}' # pivot_col_value
, 100 # max_columns
, 'MAX' # aggregation
);"""
query = query.format(kwargs['project'],kwargs['source_tbl'] ,kwargs['row_id'],kwargs['pivot_col'],kwargs['pivot_val'])
job = client.query(query, location="US")
for result in job.result():
task_instance = kwargs['task_instance']
task_instance.xcom_push(key='query_string', value=result)
print result
return result
bq_cmd = PythonOperator (
task_id= 'task1'
provide_context= True,
python_callable= call_stored_proc,
op_kwargs= {'project' : project,
'source_tbl' : source_tbl,
'row_id' : row_id,
'pivot_col' : pivot_col,
'pivot_val' : pivot_val
},
dag= dag
)
dummy_operator >> bq_cmd
sql_file_template()
The output of stored proc is a string which is captured using xcom.
Now I would like to pass this value to some python function sql_file_template without using PythonOperator.
As per Airflow documentation xcom can be accessed only between tasks.
Can anyone help on this?
If you have access to the Airflow installation you'd like to query (configuration, database access, and code) you can use Airflow's airflow.models.XCom:get_one class method:
from datetime import datetime
from airflow.models import XCom
execution_date = datetime(2020, 8, 28)
xcom_value = XCom.get_one(execution_date=execution_date,
task_id="the_task_id",
dag_id="the_dag_id")
So you want to access XCOM outside Airflow (probably a different project / module, without creating any Airflow DAGs / tasks)?
Airflow uses SQLAlchemy for mapping all it's models (including XCOM) to corresponding SQLAlchemy backend (meta-db) tables
Therefore this can be done in two ways
Leverage Airflow's SQLAlchemy model
(without having to create a task or DAG). Here's an untested code snippet for reference
from typing import List
from airflow.models import XCom
from airflow.settings import Session
from airflow.utils.db import provide_session
from pendulum import Pendulum
#provide_session
def read_xcom_values(dag_id: str,
task_id: str,
execution_date: Pendulum,
session: Optional[Session]) -> List[str]:
"""
Function that reads and returns 'values' of XCOMs with given filters
:param dag_id:
:param task_id:
:param execution_date: datetime object
:param session: Airflow's SQLAlchemy Session (this param must not be passed, it will be automatically supplied by
'#provide_session' decorator)
:return:
"""
# read XCOMs
xcoms: List[XCom] = session.query(XCom).filter(
XCom.dag_id == dag_id, XCom.task_id == task_id,
XCom.execution_date == execution_date).all()
# retrive 'value' fields from XCOMs
xcom_values: List[str] = list(map(lambda xcom: xcom.value, xcoms))
return xcom_values
Do note that since it is importing airflow packages, it still requires working airflow installation on python classpath (as well as connection to backend-db), but here we are not creating any tasks or dags (this snippet can be run in a standalone python file)
For this snippet, I have referred to views.py which is my favorite place to peek into Airflow's SQLAlchemy magic
Directly query Airflow's SQLAlchemy backend meta-db
Connect to meta db and run this query
SELECT value FROM xcom WHERE dag_id='' AND task_id='' AND ..

run the same method of a list of instances in pathos.multiprocessing

I am working on a traveling salesman problem. Given that all agents traverse the same graph to find their own path separately, i am trying to parallelize the path-finding action of agents. the task is for each iteration, all agents start from a start node to find their paths and collect all the paths to find the best path in the current iteration.
I am using pathos.multiprocessing.
the agent class has a traverse method as,
class Agent:
def find_a_path(self, graph):
# here is the logic to find a path by traversing the graph
return found_path
I create a helper function to wrap up the method
def do_agent_find_a_path(agent, graph):
return agent.find_a_path(graph)
then create a pool and employ amap by passing the helper function, a list of agent instance and the same graph,
pool = ProcessPool(nodes = 10)
res = pool.amap(do_agent_find_a_path, agents, [graph] * len(agents))
but, the processes are created in sequence and it runs very slow. I'd like to have some instructions on a correct/decent way to leverage pathos in this situation.
thank you!
UPDATE:
I am using pathos 0.2.3 on ubuntu,
Name: pathos
Version: 0.2.3
Summary: parallel graph management and execution in heterogeneous computing
Home-page: https://pypi.org/project/pathos
Author: Mike McKerns
i get the following error with the TreadPool sample code:
>import pathos
>pathos.pools.ThreadPool().iumap(lambda x:x*x, [1,2,3,4])
Traceback (most recent call last):
File "/opt/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-f8f5e7774646>", line 1, in <module>
pathos.pools.ThreadPool().iumap(lambda x:x*x, [1,2,3,4])
AttributeError: 'ThreadPool' object has no attribute 'iumap'```
I'm the pathos author. I'm not sure how long your method takes to run, but from your comments, I'm going to assume not very long. I'd suggest that, if the method is "fast", that you use a ThreadPool instead. Also, if you don't need to preserve the order of the results, the fastest map is typically uimap (unordered, iterative map).
>>> class Agent:
... def basepath(self, dirname):
... import os
... return os.path.basename(dirname)
... def slowpath(self, dirname):
... import time
... time.sleep(.2)
... return self.basepath(dirname)
...
>>> a = Agent()
>>> import pathos.pools as pp
>>> dirs = ['/tmp/foo', '/var/path/bar', '/root/bin/bash', '/tmp/foo/bar']
>>> import time
>>> p = pp.ProcessPool()
>>> go = time.time(); tuple(p.uimap(a.basepath, dirs)); print(time.time()-go)
('foo', 'bar', 'bash', 'bar')
0.006751060485839844
>>> p.close(); p.join(); p.clear()
>>> t = pp.ThreadPool(4)
>>> go = time.time(); tuple(t.uimap(a.basepath, dirs)); print(time.time()-go)
('foo', 'bar', 'bash', 'bar')
0.0005156993865966797
>>> t.close(); t.join(); t.clear()
and, just to compare against something that takes a bit longer...
>>> t = pp.ThreadPool(4)
>>> go = time.time(); tuple(t.uimap(a.slowpath, dirs)); print(time.time()-go)
('bar', 'bash', 'bar', 'foo')
0.2055649757385254
>>> t.close(); t.join(); t.clear()
>>> p = pp.ProcessPool()
>>> go = time.time(); tuple(p.uimap(a.slowpath, dirs)); print(time.time()-go)
('foo', 'bar', 'bash', 'bar')
0.2084510326385498
>>> p.close(); p.join(); p.clear()
>>>

Emitting dronekit.io vehicle's attribute changes using flask-socket.io

I'm trying to send data from my dronekit.io vehicle using flask-socket.io. Unfortunately, I got this log:
Starting copter simulator (SITL)
SITL already Downloaded and Extracted.
Ready to boot.
Connecting to vehicle on: tcp:127.0.0.1:5760
>>> APM:Copter V3.3 (d6053245)
>>> Frame: QUAD
>>> Calibrating barometer
>>> Initialising APM...
>>> barometer calibration complete
>>> GROUND START
* Restarting with stat
latitude -35.363261
>>> Exception in attribute handler for location.global_relative_frame
>>> Working outside of request context.
This typically means that you attempted to use functionality that needed
an active HTTP request. Consult the documentation on testing for
information about how to avoid this problem.
longitude 149.1652299
>>> Exception in attribute handler for location.global_relative_frame
>>> Working outside of request context.
This typically means that you attempted to use functionality that needed
an active HTTP request. Consult the documentation on testing for
information about how to avoid this problem.
Here is my code:
sample.py
from dronekit import connect, VehicleMode
from flask import Flask
from flask_socketio import SocketIO, emit
import dronekit_sitl
import time
sitl = dronekit_sitl.start_default()
connection_string = sitl.connection_string()
print("Connecting to vehicle on: %s" % (connection_string,))
vehicle = connect(connection_string, wait_ready=True)
def arm_and_takeoff(aTargetAltitude):
print "Basic pre-arm checks"
while not vehicle.is_armable:
print " Waiting for vehicle to initialise..."
time.sleep(1)
print "Arming motors"
vehicle.mode = VehicleMode("GUIDED")
vehicle.armed = True
while not vehicle.armed:
print " Waiting for arming..."
time.sleep(1)
print "Taking off!"
vehicle.simple_takeoff(aTargetAltitude)
while True:
if vehicle.location.global_relative_frame.alt>=aTargetAltitude*0.95:
print "Reached target altitude"
break
time.sleep(1)
last_latitude = 0.0
last_longitude = 0.0
last_altitude = 0.0
#vehicle.on_attribute('location.global_relative_frame')
def location_callback(self, attr_name, value):
global last_latitude
global last_longitude
global last_altitude
if round(value.lat, 6) != round(last_latitude, 6):
last_latitude = value.lat
print "latitude ", value.lat, "\n"
emit("latitude", value.lat)
if round(value.lon, 6) != round(last_longitude, 6):
last_longitude = value.lon
print "longitude ", value.lon, "\n"
emit("longitude", value.lon)
if round(value.alt) != round(last_altitude):
last_altitude = value.alt
print "altitude ", value.alt, "\n"
emit("altitude", value.alt)
app = Flask(__name__)
socketio = SocketIO(app)
if __name__ == '__main__':
socketio.run(app, host='0.0.0.0', port=5000, debug=True)
arm_and_takeoff(20)
I know because of the logs that I should not do any HTTP request inside "vehicle.on_attribute" decorator method and I should search for information on how to solve this problem but I didn't found any info about the error.
Hope you could help me.
Thank you very much,
Raniel
The emit() function by default returns an event back to the active client. If you call this function outside of a request context there is no concept of active client, so you get this error.
You have a couple of options:
indicate the recipient of the event and the namespace that you are using, so that there is no need to look them up in the context. You can do this by adding room and namespace arguments. Use '/' for the namespace if you are using the default namespace.
emit to all clients by adding broadcast=True as an argument, plus the namespace as indicated in #1.

Gremlin Python project By clause

Have a graph running on Datastax Enterprise Graph (5.1 Version), running on Cassandra storage.
Trying to run a query to get both the ID and the property.
In Gremlin Console I can do this:
gremlin> g.V(1).project("v", "properties").by().by(valueMap())
==>[v:v[1],properties:[name:[marko],age:[29]]]
How can I translate the valueMap call still using Python GraphTraversal API. I know I can run a direct query via Session Execution, like this.
session.execute_graph("g.V().has(\"Node_Name\",\"A\").project(\"v\", \"properties\").by().by(valueMap())",{"name":graph_name})
Below is my setup code.
from dse.cluster import Cluster, EXEC_PROFILE_GRAPH_DEFAULT
from dse_graph import DseGraph
from dse.cluster import GraphExecutionProfile, EXEC_PROFILE_GRAPH_SYSTEM_DEFAULT
from dse.graph import GraphOptions
from gremlin_python.process.traversal import T
from gremlin_python.process.traversal import Order
from gremlin_python.process.traversal import Cardinality
from gremlin_python.process.traversal import Column
from gremlin_python.process.traversal import Direction
from gremlin_python.process.traversal import Operator
from gremlin_python.process.traversal import P
from gremlin_python.process.traversal import Pop
from gremlin_python.process.traversal import Scope
from gremlin_python.process.traversal import Barrier
graph_name = "TEST"
graph_ip = ["127.0.0.1"]
graph_port = 9042
schema = """
schema.edgeLabel("Group").create();
schema.propertyKey("Version").Text().create();
schema.edgeLabel("Group").properties("Version").add()
schema.vertexLabel("Example").create();
schema.edgeLabel("Group").connection("Example", "Example").add()
schema.propertyKey("Node_Name").Text().create();
schema.vertexLabel("Example").properties("Node_Name").add()
schema.vertexLabel("Example").index("exampleByName").secondary().by("Node_Name").add();
"""
profile = GraphExecutionProfile(
graph_options=GraphOptions(graph_name=graph_name))
client = Cluster(
contact_points=graph_ip, port=graph_port,
execution_profiles={EXEC_PROFILE_GRAPH_DEFAULT: profile}
)
graph_name = graph_name
session = client.connect()
graph = DseGraph.traversal_source(session)
# force the schema to be clean
session.execute_graph(
"system.graph(name).ifExists().drop();",
{'name': graph_name},
execution_profile=EXEC_PROFILE_GRAPH_SYSTEM_DEFAULT
)
session.execute_graph(
"system.graph(name).ifNotExists().create();",
{'name': graph_name},
execution_profile=EXEC_PROFILE_GRAPH_SYSTEM_DEFAULT
)
session.execute_graph(schema)
session.shutdown()
session = client.connect()
graph = DseGraph.traversal_source(session)
Update:
I guess i have not made the problem clear. It is in python and not in gremlin console. So running code like graph.V().has("Node_Name","A").project("v","properties").by().by(valueMap()).toList()
will give following result. How to execute the gremlin query while still remain in the GLV level, not drop down to text serialized query to Gremlin-Server?
Traceback (most recent call last):
File "graph_test.py", line 79, in <module>
graph.V().has("Node_Name","A").project("v", "properties").by().by(valueMap()).toList()
NameError: name 'valueMap' is not defined
I may not fully understand your question but it seems like you largely have the answer most of the way there. This last line of code:
graph = DseGraph.traversal_source(session)
should probably be written as:
g = DseGraph.traversal_source(session)
The return value of traversal_source(session) is a TraversalSource and not a Graph instance and by convention TinkerPop tends to refer to such a variable as g. Once you have a TraversalSource, then you can just write your Gremlin.
g = DseGraph.traversal_source(session)
g.V().has("Node_Name","A").project("v", "properties").by().by(valueMap()).toList()

How to output json using HTSQL's python library

From the example:
>>> from htsql import HTSQL
>>> htsql = HTSQL("pgsql:///htsql_demo")
>>> rows = htsql.produce("/school{name, count(department)}")
How do I convert rows into JSON? Using the JSON formatter blows up:
>>> rows = htsql.produce("/school{name, count(department)}/:json")
UnsupportedActionError: unsupported action
While processing:
/school{name, count(department)}/:json
^^^^
I'm using HTSQL 2.3.3
It has to be done via internal API:
from htsql import HTSQL
demo = HTSQL('pgsql:///htsql_demo')
rows = demo.produce('/school{name, count(department)}')
from htsql.core.fmt.emit import emit
with demo:
text = ''.join(emit('x-htsql/json', rows))
print text
The credit goes to Kirill Simonov, from the HTSQL user's group.

Resources