Is there a way of using Gremlin within an asyncio Python application? - gremlin

The TinkerPop documentation describes the GLV for Python. However, the examples presented there are built around synchronous code. There is the aiogremlin library, which was designed to enable the use of Gremlin in Python's asyncio code. Unfortunately, the project seems to be discontinued.
Does the official GLV support asyncio, or is there another way to use Gremlin in asynchronous Python applications?

I noticed that this question has sat unanswered so here goes...
The Gremlin Python client today uses Tornado. That may change in the future to just use aiohttp. Getting the event loops to play nicely together can be tricky. The easiest way I have found is to use the nest-asyncio library. With that installed you can write something like this. I don't show the g being created but this code assumes the connection to the server has been made and that g is the Graph Traversal Source.
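import asyncio
import nest_asyncio
# Patch asyncio to allow nested event loops
nest_asyncio.apply()
async def count_airports():
    # g is the Graph Traversal Source, created elsewhere
    c = g.V().hasLabel('airport').count().next()
    print(c)
async def run_tests(g):
    await count_airports()
    return
asyncio.run(run_tests(g))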
As you mentioned the other option is to use something like aiogremlin.
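One common symptom of event loops not playing nicely together can be reproduced with the standard library alone. The sketch below is not Gremlin-specific, and the exact error raised by the Gremlin client may differ; it only shows that plain asyncio refuses to run a loop from inside a loop that is already running, which is the restriction nest-asyncio is meant to lift. The function names are made up for the illustration.

```python
import asyncio


async def inner():
    return 42


async def outer():
    # Try to drive another coroutine synchronously from inside a running loop,
    # similar to what a synchronous client call might attempt internally.
    loop = asyncio.get_running_loop()
    coro = inner()
    try:
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(asyncio.run(outer()))
```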

Recent versions of the Gremlin Python library support asynchronous queries, but the usage is not straightforward, because the asynchronous query call (submitAsync in the code below) returns a Future (Future Documentation).
To make this easier to read and use, you can convert the Future object returned by the Gremlin API into an awaitable, which supports the await syntax:
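import asyncio
from os import environ
async def get_result(query, client):
    # Wrap the Future returned by submitAsync so that it can be awaited
    result = await asyncio.wrap_future(client.submitAsync(query))
    return result
# gremlin_connection is the answer author's own helper; it is not shown here
client = gremlin_connection(environ.get("url"), environ.get("username"), environ.get("password"))
# This line must run inside a coroutine
data = await get_result(query1, client)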
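The wrapping step can be tried without a Gremlin server. The following is a minimal sketch that assumes the client hands back a concurrent.futures.Future; fake_submit_async is a hypothetical stand-in for client.submitAsync that uses a thread pool, and the query strings are only labels.

```python
import asyncio
from concurrent.futures import Future, ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=2)


def fake_submit_async(query: str) -> Future:
    """Hypothetical stand-in for client.submitAsync (assumption, not the real API)."""
    return _pool.submit(lambda: [f"result of {query}"])


async def get_result(query: str):
    # asyncio.wrap_future turns a concurrent.futures.Future into an awaitable
    return await asyncio.wrap_future(fake_submit_async(query))


async def main():
    # Both queries are in flight at the same time
    return await asyncio.gather(
        get_result("g.V().count()"),
        get_result("g.E().count()"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because each call is awaited rather than blocked on, several queries can be gathered concurrently from one coroutine.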

Related

Background task in fastapi making blocking requests

I have a resource-intensive async method that I want to run as a background task. Example code for it looks like this:
@staticmethod
async def trigger_task(id: str, run_e2e: bool = False):
    try:
        add_status_for_task(id)
        result1, result2 = await task(id)
        update_status_for_task(id, result1, result2)
    except Exception:
        update_status_for_task(id, 'FAIL')
@router.post("/task")
async def trigger_task(background_tasks: BackgroundTasks):
    background_tasks.add_task(EventsTrigger.trigger_task)
    return {'msg': 'Task submitted!'}
When I trigger this endpoint, I expect an instant output: {'msg': 'Task submitted!'}. Instead, the API response is held back until the task completes. I am following this documentation from fastapi.
fastapi: v0.70.0
python: v3.8.10
I believe the issue is similar to what is described here.
Request help in making this a non-blocking call.
What I have learned from the GitHub issues:
You can't use async def for task functions (which will run in the background).
In the background process you can't access the coroutine, so your async/await will not work.
You can still try without async/await. If that also doesn't work, then you should go for an alternative.
Alternative Background Solution
Celery is a production-ready task scheduler. So you can easily configure and run the background task using your_task_function.delay(*args, **kwargs)
Note that Celery also doesn't support async in background tasks. So whatever you write to run in the background needs to be sync code.
Good Luck :)
Unfortunately you seem to have oversimplified your example so it is a little hard to tell what is going wrong.
But the important question is: are add_status_for_task() or update_status_for_task() blocking? Because if they are (and it seems like that is the case), then obviously you're going to have issues. When you run code with async/await all the code inside of it needs to be async as well.
This would make your code look more like:
async def trigger_task(id: str, run_e2e: bool = False):
    try:
        await add_status_for_task(id)
        result1, result2 = await task(id)
        await update_status_for_task(id, result1, result2)
    except Exception:
        await update_status_for_task(id, 'FAIL')
@router.post("/task/{task_id}")
async def trigger_task(task_id: str, background_tasks: BackgroundTasks):
    background_tasks.add_task(EventsTrigger.trigger_task, task_id)
    return {'msg': 'Task submitted!'}
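The effect of a blocking helper inside async code can be seen without FastAPI. The sketch below is only an illustration under assumptions: blocking_status_update is a hypothetical stand-in for a helper such as update_status_for_task, and a heartbeat coroutine counts how often the event loop gets to run other work while the helper executes, either directly or offloaded to a thread with run_in_executor.

```python
import asyncio
import time


def blocking_status_update(seconds: float) -> None:
    """Hypothetical blocking helper (stand-in for update_status_for_task)."""
    time.sleep(seconds)


async def heartbeat(counter: list, stop: asyncio.Event) -> None:
    # Each tick means the event loop was free to run other work
    while not stop.is_set():
        counter[0] += 1
        await asyncio.sleep(0.01)


async def measure(offload: bool) -> int:
    counter, stop = [0], asyncio.Event()
    beat = asyncio.create_task(heartbeat(counter, stop))
    await asyncio.sleep(0)  # let the heartbeat start
    if offload:
        # Run the blocking call in a worker thread; the loop stays responsive
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_status_update, 0.2)
    else:
        # Called directly inside the coroutine: blocks the whole event loop
        blocking_status_update(0.2)
    stop.set()
    await beat
    return counter[0]


if __name__ == "__main__":
    print("ticks while blocking:", asyncio.run(measure(False)))
    print("ticks while offloaded:", asyncio.run(measure(True)))
```

In the direct case the heartbeat barely advances, which is the same starvation that would keep other requests waiting.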
How are you running your app?
According to the uvicorn docs, it runs with 1 worker by default, which means only one process handles requests at a time.
Try configuring your uvicorn to run with more workers.
https://www.uvicorn.org/deployment/
$ uvicorn example:app --port 5000 --workers THE_AMOUNT_OF_WORKERS
or
uvicorn.run("example:app", host="127.0.0.1", port=5000, workers=THE_AMOUNT_OF_WORKERS)

How to unit test a gRPC server's asynchronous methods?

Following the suggestions in this question I was able to unit test the synchronous methods of my gRPC service (which is built with the grpc.aio API) using the grpc_testing library. However, when I follow this example on an asynchronous method of my gRPC service I get:
ERROR grpc_testing._server._rpc:_rpc.py:92 Exception calling application!
Traceback (most recent call last):
  File "/home/jp/venvs/grpc/lib/python3.8/site-packages/grpc_testing/_server/_service.py", line 63, in _stream_response
    response = copy.deepcopy(next(response_iterator))
TypeError: 'async_generator' object is not an iterator
Looking through the grpc_testing codebase and searching more broadly, I cannot find examples of unit testing async gRPC methods. The closest thing I could find is an unmerged branch of pytest-grpc, but the example service does not have any async methods.
Can anyone share an example of unit testing an asynchronous gRPC method in Python?
I followed @Lidi's recommendations (thank you) and implemented the tests using pytest-asyncio. For what it's worth, here is a basic example testing an async stream-stream method:
import mock
import grpc
import pytest
from tokenize_pb2 import MyMessage
from my_implementation import MyService
async def my_message_generator(messages):
    for message in messages:
        yield message
@pytest.mark.asyncio
async def test_my_async_stream_stream_method():
    service = MyService()
    my_messages = my_message_generator([MyMessage(), MyMessage()])
    mock_context = mock.create_autospec(spec=grpc.aio.ServicerContext)
    response = service.MyStreamStreamMethod(my_messages, mock_context)
    results = [x async for x in response]
    assert results == expected_results
gRPC Testing is a nice project. But we need engineering resources to make it support asyncio and, most importantly, adapt the existing APIs to asyncio's philosophy.
For testing gRPC asyncio, I would recommend just using pytest, which has pytest-asyncio to smoothly test asyncio features. Here is an example: code.
The solution given in Joshua's answer also works with python unittest framework utilizing the unittest.IsolatedAsyncioTestCase class. For example:
import mock
import grpc
import unittest
from example_pb2 import MyRequestMessage
from my_implementation import MyService
class Tests(unittest.IsolatedAsyncioTestCase):
    def setUp(self):
        self.service = MyService()
        self.mock_context = mock.create_autospec(spec=grpc.aio.ServicerContext)
    async def test_subscribe_unary_stream(self):
        response = self.service.MyUnaryStreamMethod(MyRequestMessage(), self.mock_context)
        async for result in response:
            self.assertIsNotNone(result)
While this allows testing of the actual business logic in the RPC functions, it falls short of grpcio-testing in terms of service features like response codes, timeouts etc.
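Part of that gap can be narrowed by asserting on what the method did to the mocked context. The sketch below runs without grpc installed, so everything in it is an assumption for illustration: MyService is a hypothetical servicer, the context is a plain MagicMock, and the status code is a string where real code would typically pass a grpc.StatusCode member.

```python
import asyncio
from unittest import mock


class MyService:
    """Hypothetical servicer; stands in for the MyService used above."""

    async def MyUnaryStreamMethod(self, request, context):
        if not request:
            # Real code would typically pass a grpc.StatusCode member here
            context.set_code("INVALID_ARGUMENT")
            context.set_details("empty request")
            return
        for item in request:
            yield item.upper()


async def collect(async_iterable):
    # Drain an async generator into a list
    return [x async for x in async_iterable]


async def run_checks():
    service = MyService()
    ok_context = mock.MagicMock()
    results = await collect(service.MyUnaryStreamMethod(["a", "b"], ok_context))
    bad_context = mock.MagicMock()
    empty = await collect(service.MyUnaryStreamMethod([], bad_context))
    return results, ok_context, empty, bad_context


if __name__ == "__main__":
    results, ok_context, empty, bad_context = asyncio.run(run_checks())
    ok_context.set_code.assert_not_called()
    bad_context.set_code.assert_called_once_with("INVALID_ARGUMENT")
    print(results, empty)
```

This only checks calls recorded on the mock; it does not exercise real timeouts or transport behaviour.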

Gremlin Javascript Traversal Never Resolves

I'm trying to use the gremlin npm module and connect to a Neptune database. During testing, I tried having gremlin connect to an inactive endpoint and an invalid URL to make the system more resilient. I expected some sort of error to be thrown. However, with invalid/inactive URLs, graph traversals never resolve and produce no error message.
const traversal = gremlin.process.AnonymousTraversalSource.traversal;
const DriverRemoteConnection = gremlin.driver.DriverRemoteConnection;
const dc = new DriverRemoteConnection('wss://localhost:80');
const g = traversal().withRemote(dc);
const data = await g.V().limit(1).toList();
console.log(data);
I'd expect g.V().limit(1).toList() to throw an error when using an invalid remote connection. Again, the promise never resolves and the console.log(data) on the next line is never run.
Any help with this would be much appreciated! I need some way to detect whether the database connection is valid and, if not, log errors.
There is an issue in the current JavaScript GLV; we've filed TINKERPOP-2381, which summarizes the problems.
This should not affect the GLV when the address points to a valid server.
Thanks for providing so much detail in the question.
I'm adding an answer so that this thread is preserved.
After investigation: if the host does not exist, the 3.4.1 JavaScript Gremlin client throws an exception and exits the process. With the 3.4.4 client (or later), the error seems to get silently swallowed.

Invoke R script on AWS Lambda from NodeJS

As a result of several hours of unfruitful searches, I am posting this question.
I suppose it is a duplicate of this one:
How do you run RServe on AWS Lambda with NodeJS?
But since it seems that the author of that question did not accomplish his/her goal successfully, I am going to try again.
What I currently have:
A NodeJS server, that invokes an R script through Rserve and passes data to evaluate through node-rio.
Function responsible for that looks like this:
const R = (arg1, arg2) => {
  return new Promise((resolve, reject) => {
    const args = {
      arg1, arg2
    };
    // send data to Rserve to evaluate
    rio.$e({
      filename: path.resolve('./r-scripts/street.R'),
      entrypoint: 'run',
      data: args,
    })
      .then((data) => {
        resolve(JSON.parse(data));
      })
      .catch((err) => {
        reject(`err: ${err}`);
      });
  });
};
And this works just fine. I am sending data over to my R instance and getting results back into my server.
What I am ultimately trying to achieve:
Every request seems to spawn its own R workspace, which has a considerable memory overhead. Thus, serving even hundreds of concurrent requests using this approach is impossible, as my AWS EC2 runs out of memory pretty quickly.
So, I am looking for a way to deploy all the memory intensive parts to AWS Lambda and thus get rid of the memory overhead.
I guess, the specific question in my case is if there is a way to package R and Rserve together with NodeJS lambda function. Or if there is a way for me to get convinced that this approach won't work using lambda and I should try to look for an alternative.
Note: I cannot use anything other than R, since these are external R scripts, that I have to invoke from my server.
Thanks in advance!

Telegram, tracking message edit/delete and editing my own messages (Client, not Bot API)

So I'm trying to implement the logging of Telegram chats into my ELK storage in a proper way, and the existing solution with tgcli is too old (I also have a PoC which logs message edits from the Android client via Xposed, but it's implemented on top of the UI level and is ineffective).
I need to receive edits/deletion of messages, and do it with client Telegram API.
Spent a day on researching it:
support for editing messages appeared on May 15, 2016 (telegram blog)
telegram-cli's tgl library is 2 years old and most likely has no support for that layer
I looked into the telegramdesktop source as it was very promising; unfortunately, their git history has no scheme changes pointing to edit support.
And the official layer version list is truncated. Security via obscurity eh.
from some tests done with the golang library used in shelomentsevd/telegramgo, edits in supergroups are handled by the TL_updateChannelTooLong message
Now I don't want to lose more time picking through libraries/sources. So I'm asking about experience with any of the following libraries. I'm looking for exactly one library that will let me implement the required features quickly, for someone who doesn't want to dive deep into MTProto's specifics.
sochix/TLSharp is missing explicit examples about getting edits. Probably would be hard
danog/MadelineProto seems like a good place to start
there are also tdlib, libqtelegram, TelegramAPI
It's much easier to do it in telethon.
Here is a sample code I've put together gathering snippets directly from the docs.
from telethon import TelegramClient, events
API_ID = ...
API_HASH = " ... "
client = TelegramClient('session', api_id=API_ID, api_hash=API_HASH)
@client.on(events.MessageDeleted)
async def handler(event):
    # Log all deleted message IDs
    for msg_id in event.deleted_ids:
        print('Message', msg_id, 'was deleted in', event.chat_id)
@client.on(events.MessageEdited)
async def handler(event):
    # Log the date of new edits
    print('Message', event.id, 'changed at', event.date)
with client:
    client.run_until_disconnected()
Docs for: MessageEdited, MessageDeleted

Resources