Gremlin Python project By clause - graph

I have a graph running on DataStax Enterprise Graph (version 5.1), backed by Cassandra storage.
I'm trying to run a query that returns both the ID and the properties of a vertex.
In the Gremlin Console I can do this:
gremlin> g.V(1).project("v", "properties").by().by(valueMap())
==>[v:v[1],properties:[name:[marko],age:[29]]]
How can I translate the valueMap() call while still using the Python GraphTraversal API? I know I can run a direct query via session execution, like this:
session.execute_graph("g.V().has(\"Node_Name\",\"A\").project(\"v\", \"properties\").by().by(valueMap())",{"name":graph_name})
Below is my setup code.
from dse.cluster import Cluster, EXEC_PROFILE_GRAPH_DEFAULT
from dse_graph import DseGraph
from dse.cluster import GraphExecutionProfile, EXEC_PROFILE_GRAPH_SYSTEM_DEFAULT
from dse.graph import GraphOptions
from gremlin_python.process.traversal import T
from gremlin_python.process.traversal import Order
from gremlin_python.process.traversal import Cardinality
from gremlin_python.process.traversal import Column
from gremlin_python.process.traversal import Direction
from gremlin_python.process.traversal import Operator
from gremlin_python.process.traversal import P
from gremlin_python.process.traversal import Pop
from gremlin_python.process.traversal import Scope
from gremlin_python.process.traversal import Barrier
graph_name = "TEST"
graph_ip = ["127.0.0.1"]
graph_port = 9042
schema = """
schema.edgeLabel("Group").create();
schema.propertyKey("Version").Text().create();
schema.edgeLabel("Group").properties("Version").add()
schema.vertexLabel("Example").create();
schema.edgeLabel("Group").connection("Example", "Example").add()
schema.propertyKey("Node_Name").Text().create();
schema.vertexLabel("Example").properties("Node_Name").add()
schema.vertexLabel("Example").index("exampleByName").secondary().by("Node_Name").add();
"""
profile = GraphExecutionProfile(
    graph_options=GraphOptions(graph_name=graph_name))
client = Cluster(
    contact_points=graph_ip, port=graph_port,
    execution_profiles={EXEC_PROFILE_GRAPH_DEFAULT: profile}
)
graph_name = graph_name
session = client.connect()
graph = DseGraph.traversal_source(session)
# force the schema to be clean
session.execute_graph(
    "system.graph(name).ifExists().drop();",
    {'name': graph_name},
    execution_profile=EXEC_PROFILE_GRAPH_SYSTEM_DEFAULT
)
session.execute_graph(
    "system.graph(name).ifNotExists().create();",
    {'name': graph_name},
    execution_profile=EXEC_PROFILE_GRAPH_SYSTEM_DEFAULT
)
session.execute_graph(schema)
session.shutdown()
session = client.connect()
graph = DseGraph.traversal_source(session)
Update:
I guess I have not made the problem clear. The issue is in Python, not in the Gremlin Console. Running code like graph.V().has("Node_Name","A").project("v","properties").by().by(valueMap()).toList()
gives the following error. How can I execute the Gremlin query while staying at the GLV level, without dropping down to a text-serialized query sent to Gremlin Server?
Traceback (most recent call last):
  File "graph_test.py", line 79, in <module>
    graph.V().has("Node_Name","A").project("v", "properties").by().by(valueMap()).toList()
NameError: name 'valueMap' is not defined

I may not fully understand your question, but it seems like you largely have the answer already. This last line of code:
graph = DseGraph.traversal_source(session)
should probably be written as:
g = DseGraph.traversal_source(session)
The return value of traversal_source(session) is a TraversalSource and not a Graph instance and by convention TinkerPop tends to refer to such a variable as g. Once you have a TraversalSource, then you can just write your Gremlin.
g = DseGraph.traversal_source(session)
g.V().has("Node_Name","A").project("v", "properties").by().by(valueMap()).toList()

Related

Mocking SQLAlchemy test within Strawberry/FastAPI

I'm working on creating unit tests for a FastAPI, Strawberry, and SQLAlchemy setup. The current API is working and returning data correctly, but I cannot figure out how to mock the underlying database for unit tests. Would love any help/guidance to figure out this issue.
Below is the test code I'm currently working with, which I'm hoping will be enough to solve this issue, but I'm happy to post more if it helps. Running this currently produces an output of ExecutionResult(data=None, errors=[GraphQLError("'NoneType' object is not subscriptable", locations=[SourceLocation(line=3, column=13)], path=['biomarkers'])], extensions={}), which seems to indicate that it is almost working but not quite reaching the mocked data within UnifiedAlchemyMagicMock.
import uuid
import unittest
from unittest import mock
import strawberry
from strawberry.extensions import Extension
from alchemy_mock.mocking import UnifiedAlchemyMagicMock
from app.api.api_v1 import api
from app.models import biomarker as biomarker_models
class MockSession:
    '''Create Mock Session for Db'''
    session = UnifiedAlchemyMagicMock(data=[
        (
            [mock.call.query(biomarker_models.Biomarker)],
            [biomarker_models.Biomarker(
                name="hello",
                id=uuid.UUID('1a8d8791-946c-4fc4-8f5d-1b0c4f5ee2f5'),
                quest_biomarker_code="quest"),
             biomarker_models.Biomarker(
                name="test",
                id=uuid.uuid4(),
                quest_biomarker_code="palazo")]
        )
    ])
class MockRequest(Extension):
    '''Mock Request state for context'''
    def on_request_start(self):
        self.execution_context.context["db"] = MockSession()
    def on_request_end(self):
        self.execution_context.context["db"].close()
class BioMarkerTestCase(unittest.TestCase):
    '''Test Biomarker'''
    def setUp(self) -> None:
        self.strawberry_schema = strawberry.Schema(
            query=api.Query,
            mutation=api.Mutation,
            extensions=[MockRequest],
            types=api.QUERY_TYPE_LIST)
    def test_query_get_all(self) -> None:
        '''test biomarker query'''
        query = """
            query {
                biomarkers {
                    id
                    name
                    whyItMatters
                    questBiomarkerCode
                    modeOfAcquisition
                    questRefRangeLow
                    questRefRangeHigh
                    optimalRangeLow
                    optimalRangeHigh
                    withinRangeRecommendations
                    belowRangeRecommendations
                    aboveRangeRecommendations
                    crossReferenceBiomarkers
                    notes
                    resourcesCited
                    measurementUnits
                    isCritical
                    resultDataType
                    critical{
                        id
                        biomarkerId
                        isPriority1
                        priority1Range
                        isPriority2
                        priority2Range
                    }
                }
            }
        """
        query_result = self.strawberry_schema.execute_sync(query)
        self.assertIsNotNone(query_result.data)
When using
query_result = self.strawberry_schema.execute_sync(query)
the context_value defaults to None, which I think is the cause of your errors.
Try with:
query_result = self.strawberry_schema.execute_sync(query, context_value={})
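For example, the call in the test above would become something like this (a sketch; whether your resolvers read the db from the MockRequest extension or expect it directly in context_value depends on how api.Query is written):
query_result = self.strawberry_schema.execute_sync(query, context_value={})
self.assertIsNone(query_result.errors)
self.assertIsNotNone(query_result.data)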

How to change xcom in Airflow to accommodate large data?

I am using the following code in my Airflow operator:
import json
import pandas as pd
from airflow.exceptions import AirflowException
from airflow.hooks.http_hook import HttpHook
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
class HttpToGoogleCloudStorageOperator(BaseOperator):
    template_fields = ['endpoint', 'data', 'headers', ]
    template_ext = ()
    ui_color = '#f4a460'
    @apply_defaults
    def __init__(self,
                 endpoint,
                 project_id,
                 table_id,
                 data=None,
                 headers=None,
                 auth=None,
                 http_conn_id='http_default',
                 *args, **kwargs):
        super(HttpToGoogleCloudStorageOperator, self).__init__(*args, **kwargs)
        self.table_id = table_id
        self.http_conn_id = http_conn_id
        self.method = "GET"
        self.endpoint = endpoint
        self.headers = headers or {}
        self.auth = auth
        self.data = data or {}
    def execute(self, context):
        http = HttpHook(self.method, http_conn_id=self.http_conn_id)
        self.log.info("Calling HTTP method " + self.endpoint)
        response = http.run(self.endpoint, self.data, self.headers, auth=self.auth)
        self.log.info("Got response")
Unfortunately the data returned is too large (about 5k) to fit in the standard xcom and I get this error:
{taskinstance.py:1059} ERROR - (_mysql_exceptions.DataError) (1406, "Data too long for column 'value' at row 1")
Is there a way I can tell http_hook to use a different xcom, or (even better) not use xcom at all? I have looked around and I do not see a solution.
Thanks for any tips or pointers.
Edit: Here is how I call the operator. Note that nowhere do I specify xcom.
query_load_task = HttpToGoogleCloudStorageOperator(
    task_id="query_load_task",
    endpoint=endpoint,
    project_id="my_gcp_poroject_id",
    table_id="dataset.table",
    data=None,
    auth=(username, password))
It's preferable to store the data in a system designed for that purpose (e.g. the file system, AWS S3, Azure, etc.) and instead return a unique identifier that references the location of the data; for the file system this would likely be the full path (e.g. /tmp/acme_response_20200709.csv). That way you leverage the best of both the storage system and your database.
If you add your code I'd be happy to take a crack at writing up some pseudo-code as an example.
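Applied to the operator in the question, execute() could write the HTTP response to a local file, upload it to GCS, and return only the object path; whatever execute() returns is what ends up in XCom, so the large payload never touches the XCom table. A rough sketch against the code above (the bucket name, connection id, and object naming are assumptions, not part of the original operator):
import tempfile
def execute(self, context):
    http = HttpHook(self.method, http_conn_id=self.http_conn_id)
    self.log.info("Calling HTTP method " + self.endpoint)
    response = http.run(self.endpoint, self.data, self.headers, auth=self.auth)
    # write the payload to a local temp file instead of returning it directly
    with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as tmp:
        tmp.write(response.text)
        local_path = tmp.name
    # upload to GCS; the bucket and connection id below are assumed values
    gcs = GoogleCloudStorageHook(google_cloud_storage_conn_id="google_cloud_default")
    object_name = "responses/{}/{}.json".format(self.table_id, context["ds"])
    gcs.upload("my-staging-bucket", object_name, local_path)
    # only this small reference goes into XCom
    return "gs://my-staging-bucket/" + object_name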

How can I get a one line per test result with Robot Framework?

I want to take test case results from Robot Framework runs and import those results into other tools (ElasticSearch, ALM tools, etc).
Towards that end I would like to be able to generate a text file with one line per test. Here is an example line, pipe-delimited:
testcase name | time run | duration | status
There are other fields I would add, but those are the basic ones. Any help appreciated. I have been looking at robot.result (http://robot-framework.readthedocs.io/en/3.0.2/autodoc/robot.result.html) but haven't figured it out yet. If/when I do I will post the answer here.
Thanks,
The output.xml file is very easy to parse with normal XML parsing libraries.
Here's a quick example:
from __future__ import print_function
import xml.etree.ElementTree as ET
from datetime import datetime
def get_robot_results(filepath):
    results = []
    with open(filepath, "r") as f:
        xml = ET.parse(f)
    root = xml.getroot()
    if root.tag != "robot":
        raise Exception("expect root tag 'robot', got '%s'" % root.tag)
    for suite_node in root.findall("suite"):
        for test_node in suite_node.findall("test"):
            status_node = test_node.find("status")
            name = test_node.attrib["name"]
            status = status_node.attrib["status"]
            start = status_node.attrib["starttime"]
            end = status_node.attrib["endtime"]
            start_time = datetime.strptime(start, '%Y%m%d %H:%M:%S.%f')
            end_time = datetime.strptime(end, '%Y%m%d %H:%M:%S.%f')
            elapsed = str(end_time - start_time)
            results.append([name, start, elapsed, status])
    return results
if __name__ == "__main__":
    results = get_robot_results("output.xml")
    for row in results:
        print(" | ".join(row))
Bryan is right that it's easy to parse Robot's output.xml using standard XML parsing modules. Alternatively you can use Robot's own result parsing modules and the model you get from it:
from robot.api import ExecutionResult, SuiteVisitor
class PrintTestInfo(SuiteVisitor):
    def visit_test(self, test):
        print('{} | {} | {} | {}'.format(test.name, test.starttime,
                                         test.elapsedtime, test.status))
result = ExecutionResult('output.xml')
result.suite.visit(PrintTestInfo())
For more details about the APIs used above see http://robot-framework.readthedocs.io/.
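If you want the pipe-delimited text file rather than console output, the same visitor approach can write the lines directly (a sketch built on the robot.api model above; the output filename is just an example):
from robot.api import ExecutionResult, SuiteVisitor
class WriteTestInfo(SuiteVisitor):
    def __init__(self, out):
        self.out = out
    def visit_test(self, test):
        # one line per test: name | start time | elapsed milliseconds | status
        self.out.write('{} | {} | {} | {}\n'.format(
            test.name, test.starttime, test.elapsedtime, test.status))
result = ExecutionResult('output.xml')
with open('results.txt', 'w') as out:
    result.suite.visit(WriteTestInfo(out))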

How to get sys.exc_traceback from IPython shell.run_code?

My app interfaces with the IPython Qt shell with code something like this:
from IPython.core.interactiveshell import ExecutionResult
shell = self.kernelApp.shell # ZMQInteractiveShell
code = compile(script, file_name, 'exec')
result = ExecutionResult()
shell.run_code(code, result=result)
if result:
self.show_result(result)
The problem is: how can show_result show the traceback resulting from exceptions in code?
Neither the error_before_exec nor the error_in_exec ivars of ExecutionResult seem to give references to the traceback. Similarly, neither sys nor shell.user_ns.namespace.get('sys') have sys.exc_traceback attributes.
Any ideas? Thanks!
Edward
IPython/core/interactiveshell.py contains InteractiveShell._showtraceback:
def _showtraceback(self, etype, evalue, stb):
    """Actually show a traceback. Subclasses may override..."""
    print(self.InteractiveTB.stb2text(stb), file=io.stdout)
The solution is to monkey-patch IS._showtraceback so that it writes to sys.stderr, which shows up in the Qt console:
from __future__ import print_function
...
shell = self.kernelApp.shell  # ZMQInteractiveShell
code = compile(script, file_name, 'exec')
def show_traceback(etype, evalue, stb, shell=shell):
    print(shell.InteractiveTB.stb2text(stb), file=sys.stderr)
    sys.stderr.flush()  # <==== Oh, so important
old_show = getattr(shell, '_showtraceback', None)
shell._showtraceback = show_traceback
shell.run_code(code)
if old_show: shell._showtraceback = old_show
Note: there is no need to pass an ExecutionResult object to shell.run_code().
EKR

Create a portal_user_catalog and have it used (Plone)

I'm creating a fork of my Plone site (which has not been forked for a long time). This site has a special catalog object for user profiles (a special Archetypes-based object type) which is called portal_user_catalog:
$ bin/instance debug
>>> portal = app.Plone
>>> print [d for d in portal.objectMap() if d['meta_type'] == 'Plone Catalog Tool']
[{'meta_type': 'Plone Catalog Tool', 'id': 'portal_catalog'},
{'meta_type': 'Plone Catalog Tool', 'id': 'portal_user_catalog'}]
This looks reasonable because the user profiles don't have most of the indexes of the "normal" objects, but have a small set of own indexes.
Since I found no way to create this object from scratch, I exported it from the old site (as portal_user_catalog.zexp) and imported it into the new site. This seemed to work, but I can't add objects to the imported catalog, not even by explicitly calling the catalog_object method. Instead, the user profiles are added to the standard portal_catalog.
Now I found a module in my product which seems to serve the purpose (Products/myproduct/exportimport/catalog.py):
"""Catalog tool setup handlers.
$Id: catalog.py 77004 2007-06-24 08:57:54Z yuppie $
"""
from Products.GenericSetup.utils import exportObjects
from Products.GenericSetup.utils import importObjects
from Products.CMFCore.utils import getToolByName
from zope.component import queryMultiAdapter
from Products.GenericSetup.interfaces import IBody
def importCatalogTool(context):
    """Import catalog tool.
    """
    site = context.getSite()
    obj = getToolByName(site, 'portal_user_catalog')
    parent_path = ''
    if obj and not obj():
        importer = queryMultiAdapter((obj, context), IBody)
        path = '%s%s' % (parent_path, obj.getId().replace(' ', '_'))
        __traceback_info__ = path
        print [importer]
        if importer:
            print importer.name
            if importer.name:
                path = '%s%s' % (parent_path, 'usercatalog')
                print path
            filename = '%s%s' % (path, importer.suffix)
            print filename
            body = context.readDataFile(filename)
            if body is not None:
                importer.filename = filename  # for error reporting
                importer.body = body
        if getattr(obj, 'objectValues', False):
            for sub in obj.objectValues():
                importObjects(sub, path+'/', context)
def exportCatalogTool(context):
    """Export catalog tool.
    """
    site = context.getSite()
    obj = getToolByName(site, 'portal_user_catalog', None)
    if obj is None:
        logger = context.getLogger('catalog')
        logger.info('Nothing to export.')
        return
    parent_path = ''
    exporter = queryMultiAdapter((obj, context), IBody)
    path = '%s%s' % (parent_path, obj.getId().replace(' ', '_'))
    if exporter:
        if exporter.name:
            path = '%s%s' % (parent_path, 'usercatalog')
        filename = '%s%s' % (path, exporter.suffix)
        body = exporter.body
        if body is not None:
            context.writeDataFile(filename, body, exporter.mime_type)
    if getattr(obj, 'objectValues', False):
        for sub in obj.objectValues():
            exportObjects(sub, path+'/', context)
I tried to use it, but I have no idea how it is supposed to be done;
I can't call it TTW (should I try to publish the methods?!).
I tried it in a debug session:
$ bin/instance debug
>>> portal = app.Plone
>>> from Products.myproduct.exportimport.catalog import exportCatalogTool
>>> exportCatalogTool(portal)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File ".../Products/myproduct/exportimport/catalog.py", line 58, in exportCatalogTool
    site = context.getSite()
AttributeError: getSite
So, if this is the way to go, it looks like I need a "real" context.
Update: To get this context, I tried an External Method:
# -*- coding: utf-8 -*-
from Products.myproduct.exportimport.catalog import exportCatalogTool
from pdb import set_trace
def p(dt, dd):
    print '%-16s%s' % (dt+':', dd)
def main(self):
    """
    Export the portal_user_catalog
    """
    g = globals()
    print '#' * 79
    for a in ('__package__', '__module__'):
        if a in g:
            p(a, g[a])
    p('self', self)
    set_trace()
    exportCatalogTool(self)
However, when I called it, I got the same <PloneSite at /Plone> object as the argument to the main function, which didn't have the getSite attribute. Perhaps my site doesn't call such External Methods correctly?
Or would I need to mention this module somehow in my configure.zcml, and if so, how? I searched my directory tree (especially below Products/myproduct/profiles) for exportimport, the module name, and several other strings, but I couldn't find anything; perhaps there was an integration once but it was broken ...
So how do I make this portal_user_catalog work?
Thank you!
Update: Another debug session suggests the source of the problem to be some transaction matter:
>>> portal = app.Plone
>>> puc = portal.portal_user_catalog
>>> puc._catalog()
[]
>>> profiles_folder = portal.some_folder_with_profiles
>>> for o in profiles_folder.objectValues():
...     puc.catalog_object(o)
...
>>> puc._catalog()
[<Products.ZCatalog.Catalog.mybrains object at 0x69ff8d8>, ...]
This population of the portal_user_catalog doesn't persist; after termination of the debug session and starting fg, the brains are gone.
It looks like the problem was indeed related with transactions.
I had
import transaction
...
class Browser(BrowserView):
    ...
    def processNewUser(self):
        ....
        transaction.commit()
before, but apparently this was not good enough (and/or perhaps not done correctly).
Now I start the transaction explicitly with transaction.begin(), save intermediate results with transaction.savepoint(), abort the transaction explicitly with transaction.abort() in case of errors (try / except), and have exactly one transaction.commit() at the end, in the case of success. Everything seems to work.
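In code form, the pattern described above looks roughly like this (a structural sketch, not the original processNewUser body; the BrowserView import is the standard Plone one):
import transaction
from Products.Five.browser import BrowserView
class Browser(BrowserView):
    def processNewUser(self):
        transaction.begin()
        try:
            # ... create / update the profile object ...
            transaction.savepoint()
            # ... catalog it in portal_user_catalog ...
        except Exception:
            transaction.abort()
            raise
        else:
            transaction.commit()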
Of course, Plone still doesn't take this non-standard catalog into account; when I "clear and rebuild" it, it is empty afterwards. But for my application it works well enough.
