Preserve detailed Gremlin error message when running Gremlin query with eval()

In my script I do the following:
eval("query")
and get:
unexpected EOF while parsing (<string>, line 1)
In Jupyter I do:
query
and get:
GremlinServerError: 499: {"requestId":"2602387d-f9a1-4478-a90d-3612d1943b71","code":"ConstraintViolationException","detailedMessage":"Vertex with id already exists: ba48297665fc3da684627c0fcb3bb1fd6738e7ad8eb8768528123904b240aaa7b21f66624de1fea84c87e5e2707995fe52435f1fb5fc4c2f9eaf85a605c6877a"}
Is there a way to preserve the detailed error message whilst doing Gremlin queries with the eval("querystring") approach?
I need to concatenate many strings into one query, which is why I use this approach.
Also, the detailed error message allows me to catch specific errors like this ConstraintViolationException.
Details:
I am interacting with Neptune with Python.
I have this at the beginning of my script:
from gremlin_python import statics
statics.load_statics(globals())
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
which is from the official documentation on how to connect with Python.

There is insufficient info in the question to provide a good answer for this one. There should be no difference in the error message you see between a client program and a Jupyter notebook, as long as you're using the exact same code. From your messages, I suspect that there is a difference in either the serializer or the protocol (WebSocket vs HTTP) between your experiments. The response formats (and possibly the error formats too) differ between serializers and protocols, so that's probably where you should start looking.
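If the goal is to submit a concatenated query string, one way to keep the detailed message is to send the string through the driver's Client instead of Python's eval(). A minimal sketch, assuming the gremlinpython Client and a placeholder Neptune endpoint:

from gremlin_python.driver import client
from gremlin_python.driver.protocol import GremlinServerError

# Placeholder endpoint; replace with your Neptune cluster's websocket URL.
gremlin_client = client.Client('wss://your-neptune-endpoint:8182/gremlin', 'g')

query = "g.addV('person')" + ".property(id, 'some-id')"  # concatenated string
try:
    gremlin_client.submit(query).all().result()
except GremlinServerError as e:
    # str(e) carries the server's code and detailedMessage, so specific
    # errors such as ConstraintViolationException can be matched.
    if 'ConstraintViolationException' in str(e):
        print('Vertex already exists:', e)

Submitting the string to the server this way avoids the generic SyntaxError that Python's eval() raises on a string that is not valid Python, while the 499 response with the detailedMessage surfaces in the raised GremlinServerError.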

Multithreaded Flask application causes stack error in rpy2 R process

Essentially the same error as here, but those solutions do not provide enough information to replicate a working example: Rpy2 in a Flask App: Fatal error: unable to initialize the JIT
Within my Flask app, using the rpy2.rinterface module, whenever I initialize R I receive the same stack usage error:
import rpy2.rinterface as rinterface
from rpy2.rinterface_lib import openrlib
with openrlib.rlock:
    rinterface.initr()
Error: C stack usage 664510795892 is too close to the limit
Fatal error: unable to initialize the JIT
rinterface is the low-level R hook in rpy2, but the higher-level robjects module gives the same error. I've tried wrapping the context lock and R initialization in a Process from the multiprocessing module, but have the same issue. Docs say that a multithreaded environment will cause problems for R: https://rpy2.github.io/doc/v3.3.x/html/rinterface.html#multithreading
But the context manager doesn't seem to prevent the issue when interfacing with R.
rlock is an instance of Python's threading.RLock. It should take care of multithreading issues.
However, multiprocessing can cause a similar issue if the embedded R is shared across child processes. This demo script showing parallel processing with R and Python processes illustrates it: https://github.com/rpy2/rpy2/blob/master/doc/_static/demos/multiproc_lab.py
I think that the way around this is to configure Flask, or most likely your WSGI layer, to create isolated child processes, or to have all of your Flask processes delegate R calculations to a secondary process (created on the fly, or in a pool of processes waiting for tasks to perform).
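As an illustration of that delegation idea, a minimal sketch (the names and the single-worker pool are my assumptions, not tested against this setup) that confines R to one dedicated child process:

from concurrent.futures import ProcessPoolExecutor

def r_worker(script, args):
    # Import and initialize R inside the child process only, so the
    # embedded R never runs in a Flask/WSGI thread.
    import rpy2.rinterface as rinterface
    rinterface.initr()
    r_func = rinterface.baseenv['source'](script)
    r_func[0](args)
    return 'done'  # return plain Python data; R objects do not pickle

# A single worker keeps exactly one embedded R instance alive.
executor = ProcessPoolExecutor(max_workers=1)

def run_r(script, args):
    # Flask handlers call this; the R work happens in the child process.
    return executor.submit(r_worker, script, args).result()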
As other answers for similar questions have implied, Flask users will need to initialize and run rpy2 outside of the WSGI context to prevent the embedded R process from crashing. I accomplished this with Celery, where workers provide an environment separate from Flask to handle requests made in R.
I used the low-level rinterface library as mentioned in the question, and wrote Celery tasks using classes:
import rpy2.rinterface as rinterface
from celery import Celery, Task

celery = Celery('tasks', backend='redis://', broker='redis://')

class Rpy2Task(Task):
    def __init__(self):
        self.name = "rpy2"

    def run(self, args):
        # R is initialized inside the worker process, never in Flask itself.
        rinterface.initr()
        r_func = rinterface.baseenv['source']('your_R_script.R')
        r_func[0](args)

Rpy2Task = celery.register_task(Rpy2Task())
async_result = Rpy2Task.delay(args)
Calling rinterface.initr() anywhere but in the body of the task run by the worker results in the aforementioned crash. Celery is usually packaged with redis, and I found this a useful way to support exchanging information between R and Python, but of course Rpy2 also provides flexible ways of doing this.
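For context, a hypothetical Flask view that hands the work to the worker above (the route and timeout are made up):

from flask import Flask

app = Flask(__name__)

@app.route('/run-r/<args>')
def run_r_route(args):
    # Delegate the R call to the Celery worker; the Flask process
    # itself never touches the embedded R.
    async_result = Rpy2Task.delay(args)
    return {'result': async_result.get(timeout=600)}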

Redirect robot.api.logger calls to file as messages are invisible due to XML-RPC

In one of my projects we are using Robot Framework with custom keyword libraries in a complex test environment with assorted ECUs and PCs. One keyword library must be controlled by a Python remote server via XML-RPC, as it has to run on a different PC.
Now, all important messages from robot.api.logger calls, like logger.debug() or logger.console(), are swallowed by the XML-RPC layer. This is a known issue, which is also clearly stated in the docs:
For most parts these APIs work exactly like when using with Robot
Framework normally. The main limitation is that logging using
robot.api.logger or Python's logging module is currently not
supported.
Is it possible to write a thin wrapper or decorator for robot.api.logger, so that all debug messages are redirected to a simple txt file, like:
DEBUG HH:MM:SS > Message
WARN HH:MM:SS > Message
This would be really helpful in case of problems.
Of course, it would be easy to use the built-in Python logging module, but
I'm looking for a solution that changes the least amount of already existing code, and I also want the results to be written to the text file in addition to the normal robot.api.logger output in the Robot reports, as we are using the same library both locally and remotely.
So basically I need a way to extend/redirect robot.api.logger calls, by first using the normal Python logging module and then the normal robot.api.logger.
You can patch the write function of the robot.api.logger so it will write to a log file as well. This patching could be triggered by a library argument.
This would require you to only modify the constructor of your library.
RemoteLib.py
import sys
from robot.api import logger
from robot.output import librarylogger
from robotremoteserver import RobotRemoteServer
def write(msg, level='INFO', html=False):
    librarylogger.write(msg, level, html)
    with open('log.txt', 'a') as f:
        print(f'{level}\tHH:MM:SS > {msg}', file=f)

class RemoteLib():
    ROBOT_LIBRARY_SCOPE = 'GLOBAL'
    ROBOT_LIBRARY_VERSION = 0.1

    def __init__(self, to_file=True):
        if to_file:
            logger.write = write

    def log_something(self, msg):
        logger.info(msg)
        logger.debug(msg)
        logger.warn(msg)
        logger.trace(msg)
        logger.error(msg)
        logger.console(msg)

if __name__ == '__main__':
    RobotRemoteServer(RemoteLib(), *sys.argv[1:])
local_run.robot
*** Settings ***
Library    RemoteLib    to_file=False

*** Test Cases ***
Test
    Log Something    something
remote_run.robot
*** Settings ***
Library    Remote    http://127.0.0.1:8270

*** Test Cases ***
Test
    Log Something    something
You could use the Python logging module as well in the write patch, just as Robot Framework itself does to redirect robot.api.logger calls to Python logging when Robot is not running.
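For illustration, a minimal sketch of that variant (the logger name and the level mapping are my assumptions):

import logging
from robot.output import librarylogger

logging.basicConfig(filename='log.txt', level=logging.DEBUG,
                    format='%(levelname)s\t%(asctime)s > %(message)s')

# Assumed mapping from Robot Framework levels to Python logging levels.
LEVELS = {'TRACE': logging.DEBUG, 'DEBUG': logging.DEBUG,
          'INFO': logging.INFO, 'WARN': logging.WARNING,
          'ERROR': logging.ERROR, 'HTML': logging.INFO}

def write(msg, level='INFO', html=False):
    librarylogger.write(msg, level, html)  # keep the normal Robot logging
    logging.getLogger('RemoteLib').log(LEVELS.get(level, logging.INFO), msg)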

How come Stackdriver messes up my error grouping

In my experience the Stackdriver Error Reporting service groups unrelated errors together. This is a big problem for me on several levels:
The titles often do not correlate with the reported errors in "recent samples". So I have to look at the samples for each error to see what errors really happened, because the title can't be trusted.
I might set an error to "muted" and as a result other errors that are grouped under the same title don't get reported anymore. It might take me months to discover that certain errors have been happening that I wasn't aware of.
In general I have no overview of what errors are happening and at what rate.
This all seems to violate basic functionality for an error reporting system, so I think I must be missing something.
The code is running on Firebase Functions, i.e. the Firebase flavour of Google Cloud Functions, and is written in TypeScript (compiled to JavaScript with a Firebase predeploy script).
I log errors using console.error with arguments formatted as Error instances, like console.error(new Error('some error message')). AFAIK that is the correct way for code running on Node.js.
Is there anything special I can do to make Stackdriver understand my code better?
I have this in a root of my functions deployment:
import * as sourceMaps from "source-map-support";
sourceMaps.install();
Below is a screenshot of one error category. You can see that the error title is "The service is currently unavailable", yet the samples contain errors for "Request contains an invalid argument" and "This request was already locked...".
The errors about the service and the invalid argument could both be related to the FCM service, so there is some correlation, although I think these are very different errors.
The error about the request lock is really something completely unrelated. The word "request" means something quite different in that context, and the word itself is the only relationship I can see.
Error Reporting supports JavaScript but not TypeScript, as mentioned in the documentation for the product. Nevertheless, you should take a look at your logs and check whether they are properly formatted to be ingested by Error Reporting.
Also, keep in mind that errors are grouped based on the guidelines in this document, so the grouping you see may be a consequence of those rules.
Hope you find this useful.

How to prevent "Execution failed:[Errno 32] Broken pipe" in Airflow

I just started using Airflow to coordinate our ETL pipeline.
I encountered the broken pipe error when I run a DAG.
I've seen a general stackoverflow discussion here.
My case is more on the Airflow side. According to the discussion in that post, the possible root cause is:
The broken pipe error usually occurs if your request is blocked or
takes too long and after request-side timeout, it'll close the
connection and then, when the respond-side (server) tries to write to
the socket, it will throw a pipe broken error.
This might be the real cause in my case. I have a PythonOperator that starts another job outside of Airflow, and that job can be very lengthy (e.g. 10+ hours), so I wonder what mechanism Airflow provides that I can leverage to prevent this error.
Can anyone help?
UPDATE1 20190303-1:
Thanks to @y2k-shubham for the SSHOperator: I am able to use it to set up an SSH connection successfully and to run some simple commands on the remote site (indeed the default SSH connection has to be set to localhost because the job is on the localhost), and I can see the correct results of hostname and pwd.
However, when I attempted to run the actual job, I received the same error; again, the error comes from the pipeline job rather than the Airflow dag/task.
UPDATE2: 20190303-2
I had a successful run (airflow test) with no error, followed by another failed run (scheduler) with the same error from the pipeline.
While I'd suggest you keep looking for a more graceful way of achieving what you want, I'm putting up example usage as requested.
First you've got to create an SSHHook. This can be done in two ways:
The conventional way, where you supply all requisite settings like host, user, password (if needed) etc. from the client code where you are instantiating the hook. I'm hereby citing an example from test_ssh_hook.py, but you must thoroughly go through SSHHook as well as its tests to understand all possible usages.
ssh_hook = SSHHook(remote_host="remote_host",
                   port="port",
                   username="username",
                   timeout=10,
                   key_file="fake.file")
The Airflow way, where you put all connection details inside a Connection object that can be managed from the UI, and only pass its conn_id to instantiate your hook:
ssh_hook = SSHHook(ssh_conn_id="my_ssh_conn_id")
Of course, if you're relying on SSHOperator, then you can directly pass the ssh_conn_id to the operator.
ssh_operator = SSHOperator(ssh_conn_id="my_ssh_conn_id")
Now if you're planning to have a dedicated task for running a command over SSH, you can use SSHOperator. Again I'm citing an example from test_ssh_operator.py, but go through the sources for a better picture.
task = SSHOperator(task_id="test",
                   command="echo -n airflow",
                   dag=self.dag,
                   timeout=10,
                   ssh_conn_id="ssh_default")
But then you might want to run a command over SSH as only part of your bigger task. In that case, you don't need an SSHOperator; you can still use just the SSHHook. The get_conn() method of SSHHook provides you with an instance of paramiko's SSHClient, with which you can run a command using exec_command():
ssh_hook = SSHHook(ssh_conn_id="my_ssh_conn_id")
ssh_client = ssh_hook.get_conn()  # paramiko SSHClient
my_command = "echo airflow"
stdin, stdout, stderr = ssh_client.exec_command(
    command=my_command,
    get_pty=my_command.startswith("sudo"),
    timeout=10)
If you look at SSHOperator's execute() method, it is a rather complicated (but robust) piece of code trying to achieve a very simple thing. For my own usage, I have created some snippets that you might want to look at:
For using SSHHook independently of SSHOperator, have a look at ssh_utils.py
For an operator that runs multiple commands over SSH (you can achieve the same thing by using bash's && operator), see MultiCmdSSHOperator
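For instance, a hypothetical sketch of the bash && approach with a plain SSHOperator (the commands and task_id are made up):

# The task stops at the first failing command, mimicking one-by-one execution.
commands = ["cd /opt/etl", "./extract.sh", "./load.sh"]
task = SSHOperator(task_id="multi_cmd_over_ssh",
                   command=" && ".join(commands),
                   ssh_conn_id="ssh_default",
                   dag=dag)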

NiFi: GetHTTP Processor Regular Expression Invalid Error

I have a simple NiFi flow, with GetHTTP and PutFile processors. I am trying to connect the GetHTTP processor to the DC Metro data API, with this link:
https://api.wmata.com/TrainPositions/TrainPositions?contentType={contentType}
(The website can be found here)
I get the "Regular Expression Invalid" error from the question title.
I can't debug this error in the log, since the processor has not run yet. I also cannot find any other examples of this error. I put the link above in the URL part of the configuration, and gave it a sample Filename of wmata_data.json. Thanks.
I think you have a newline in the URL property value.
To resolve the issue, remove the newline in the URL property and try again.
