In Gremlin, how can I use the SubgraphStrategy when submitting a script?

I am issuing Gremlin queries to an AWS Neptune database as follows:
client = Client(f"wss://{self.host}:{self.port}/gremlin", "g")
client.submit("g.V()...")
This works fine and I get the expected results.
I would like to include a SubgraphStrategy when issuing these queries. (I'm using a SubgraphStrategy to ignore nodes marked as deleted.) I can do this when I build the query dynamically, like this:
g = traversal().withRemote(remoteConn).withStrategies(
    SubgraphStrategy(
        vertices=__.hasNot("is_deleted"), edges=__.hasNot("is_deleted")
    )
)
g.V()...
I can't figure out how to specify the SubgraphStrategy when issuing the query as a string. For example, I've tried this:
client = Client(f"wss://{self.host}:{self.port}/gremlin", "g")
client.submit('g.withStrategies(SubgraphStrategy.build().vertexProperties(hasNot("is_deleted"))).V()...')
Does anybody know how to do this?

Neptune doesn't allow for that Java syntax that uses .build() when creating strategies, but I think it will support the Groovy syntax that was introduced at TinkerPop 3.4.9:
g.withStrategies(ReadOnlyStrategy,
new SubgraphStrategy(vertexProperties: __.hasNot('endTime')))
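If Neptune accepts that Groovy form, it can be embedded directly in the submitted script string. A minimal sketch with gremlin-python, reusing the is_deleted filter from the question (host, port, and the limit(5) tail are placeholders, and I haven't verified this against Neptune):

from gremlin_python.driver.client import Client

client = Client(f"wss://{host}:{port}/gremlin", "g")
# the strategy is created inside the script itself, so no Python-side
# SubgraphStrategy object is needed
results = client.submit(
    "g.withStrategies(new SubgraphStrategy("
    "vertices: __.hasNot('is_deleted'), edges: __.hasNot('is_deleted')"
    ")).V().limit(5)"
).all().result()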

How can I log sql execution results in airflow?

I use Airflow Python operators to execute SQL queries against a Redshift/Postgres database. In order to debug, I'd like the DAG to return the results of the SQL execution, similar to what you would see when executing locally in a console.
I'm using psycopg2 to create a connection/cursor and execute the SQL. Having this logged would be extremely helpful to confirm the parsed parameterized SQL and confirm that data was actually inserted (I have painfully experienced issues where differences in environments caused unexpected behavior).
I do not have deep knowledge of Airflow or the low-level workings of the Python DB-API, but the psycopg2 documentation does seem to refer to some methods and connection configurations that may allow this.
I find it very perplexing that this is difficult to do, as I'd imagine it would be a primary use case of running ETLs on this platform. I've heard suggestions to simply create additional tasks that query the table before and after, but this seems clunky and ineffective.
Could anyone please explain how this may be possible, and if not, explain why? Alternate methods of achieving similar results welcome. Thanks!
So far I have tried the connection.status_message() method, but it only seems to return the first line of the SQL and not the results. I have also attempted to create a logging connection, which produces the SQL but not the console results:
import logging
import sys

import psycopg2 as pg
from psycopg2.extras import LoggingConnection

# LoggingConnection logs each executed statement to the logger
# passed to conn.initialize()
conn = pg.connect(
    connection_factory=LoggingConnection,
    ...
)
conn.autocommit = True

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
logger.addHandler(logging.StreamHandler(sys.stdout))
conn.initialize(logger)

cur = conn.cursor()
sql = """
INSERT INTO mytable (
    SELECT *
    FROM other_table
);
"""
cur.execute(sql)
I'd like the logger to return something like:
sql> INSERT INTO mytable (
SELECT ...
[2019-07-25 23:00:54] 912 rows affected in 4 s 442 ms
Let's assume you are writing an operator that uses a Postgres hook to do something in SQL.
Anything printed inside an operator is logged.
So, if you want to log the statement, just print the statement in your operator.
print(sql)
If you want to log the result, fetch the result and print the result.
E.g.
result = cur.fetchall()
for row in result:
    print(row)
Alternatively you can use self.log.info in place of print, where self refers to the operator instance.
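Putting that together, here is a minimal sketch of such an operator. The class name and conn id are made up for illustration, and the import paths assume Airflow 2.x with the Postgres provider installed:

from airflow.models.baseoperator import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook

class LoggingSqlOperator(BaseOperator):
    def __init__(self, sql, postgres_conn_id="postgres_default", **kwargs):
        super().__init__(**kwargs)
        self.sql = sql
        self.postgres_conn_id = postgres_conn_id

    def execute(self, context):
        hook = PostgresHook(postgres_conn_id=self.postgres_conn_id)
        self.log.info("Executing: %s", self.sql)  # the statement ends up in the task log
        for row in hook.get_records(self.sql):  # fetch and log every result row
            self.log.info("%s", row)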
OK, so after some trial and error I've found a method that works for my setup and objective. To recap, my goal is to run ETLs via Python scripts, orchestrated in Airflow. Referring to the documentation for statusmessage:
Read-only attribute containing the message returned by the last command:
The key is to manage logging in context with the transactions executed on the server. To do this, I had to explicitly set con.autocommit = False and wrap SQL blocks with BEGIN TRANSACTION; and END TRANSACTION;. If you read cur.statusmessage directly after a statement that deletes or inserts, you will get a response such as 'INSERT 0 92380'.
This still isn't as verbose as I would prefer, but it is much better than nothing, and is very useful for troubleshooting ETL issues within Airflow logs.
Side notes:
- When autocommit is set to False, you must explicitly commit transactions.
- It may not be necessary to state transaction begin/end in your SQL. It may depend on your DB version.
import logging

import psycopg2 as psy

con = psy.connect(...)
con.autocommit = False  # explicit transactions, per the note above
cur = con.cursor()
try:
    cur.execute(some_sql)
    # statusmessage holds the server's tag for the last command,
    # e.g. 'INSERT 0 92380'
    logging.info(f"Cursor statusmessage: {cur.statusmessage}")
    con.commit()  # autocommit is off, so commit explicitly
except Exception:
    con.rollback()
finally:
    con.close()
There is some buried functionality within psycopg2 that I'm sure can be utilized, but the documentation is pretty thin and there are no clear examples. If anyone has suggestions on how to utilize things such as log objects, or the connection PID, to retrieve additional information, please share.

Want to get the label name but it returns an ID; do I need to change the Gremlin query?

When using the Gremlin Console to connect to Gremlin Server and running:
gremlin> graph = ConfiguredGraphFactory.open('test'); mgmt = graph.openManagement(); mgmt.getVertexLabels()
it returns:
==>person
==>animal
but when I run the same Gremlin query from Java to fetch the vertex labels, it returns:
{result{object=v[525] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex},
result{object=v[2061] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex}}
I want to get the label name in Java. How can I do that?
The getVertexLabels() method returns a VertexLabel object. That object implements TinkerPop's Vertex interface. When you execute that code in Java (presumably via a remote script in JanusGraph Server - i.e. Gremlin Server) the VertexLabel is coerced to a DetachedVertex - that's just how Gremlin Server treats all Vertex instances. I would guess that if you wanted the actual "label" you would simply issue your script to get the label itself:
mgmt.getVertexLabels().collect{it.name()}
That will coerce the vertex labels to strings and then you'll get what you want.
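The same script can be submitted from any remote driver, not just Java, since the strings (unlike the VertexLabel objects) survive serialization. For example, a rough gremlin-python sketch, with the server address as a placeholder and the 'test' graph name taken from the question:

from gremlin_python.driver.client import Client

client = Client("ws://localhost:8182/gremlin", "g")
labels = client.submit(
    "graph = ConfiguredGraphFactory.open('test');"
    "mgmt = graph.openManagement();"
    "mgmt.getVertexLabels().collect{ it.name() }"
).all().result()
# labels is now a plain list of strings, e.g. ['person', 'animal']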

Have Gremlin-console show all the methods available for a specific object?

In gremlin-console, is there a way to show all the methods available for a specific object?
For example, if I type g.V().hasLabel("person") in the Gremlin Console, how do I see what methods I can chain/call on the object returned by g.V().hasLabel("person")?
The answer is to use the <Tab> key.
gremlin> "test".c
capitalize() center( charAt( chars() codePointAt( codePointBefore( codePointCount( codePoints() collectReplacements( compareTo(
compareToIgnoreCase( concat( contains( contentEquals( count(
However, I'm finding that it does not work for something like g.V().o, which I'd hoped would show out(). Apparently, the Groovy shell (which is what the Gremlin Console is based on) doesn't seem to want to auto-complete a fluent API. It seems to only work on the first object on which you are calling the method:
gremlin> g.
E( V( addV( addV() close() inject( tx() withBindings( withBulk(
withComputer( withComputer() withPath() withRemote( withSack( withSideEffect( withStrategies( withoutStrategies( anonymousTraversalClass
bytecode graph strategies
gremlin> x = g.V();[]
gremlin> x.o
option( optional( or( order( order() otherV() out( outE( outV()
gremlin> x.o
That stinks...but that's not really a TinkerPop issue - we rely on groovysh for that functionality. Not much we can do there, I don't think.
Of course, you are using DSE Graph which means you have access to DataStax Studio which not only has the auto-complete that you're looking for but also schema support (and more!). I'd suggest that you switch to that.

Paging or Using Skip in Azure Cosmos

Has anyone been able to determine the equivalent of Gremlin's skip() in Azure Cosmos DB? It's not listed in Microsoft's documentation, and I was thinking the documentation is just outdated. I did try a query such as g.V().hasLabel('the_label').has('the_property', eq('the_value')).skip(some_number) and it errors out with: Unable to find any method 'skip'.
From your link in the Apache TinkerPop documentation:
The skip()-step is analogous to range()-step save that the higher end range is set to -1.
with these examples:
gremlin> g.V().values('age').order().skip(2)
==>32
==>35
gremlin> g.V().values('age').order().range(2, -1)
==>32
==>35
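So, applied to the query from the question, skip(some_number) can be rewritten as a range() with an open upper bound; assuming Cosmos DB supports range() with -1 the same way (worth verifying against Microsoft's Gremlin support list), this should behave like the skip() you wanted:

g.V().hasLabel('the_label').has('the_property', eq('the_value')).range(some_number, -1)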

Linq2Dynamodb run expression string on Table

I have an ASP.NET Web API site which records all objects in AWS DynamoDB. I took a quick look at Linq2DynamoDb. It seems that the common way to use it is like this:
var moviesTable = ctx.GetTable<Movie>();
var inceptionMovie = moviesTable
    .Where(m => m.Genre == "Thriller" && m.Title == "Inception")
    .Single();
But I want some API like:
moviesTable.Execute(string querystring);
The reason is that from the Web API, I usually get some query like:
http://host/service.svc/Orders?$filter=ShipCountry eq 'France'
I'd like to pass the filter string "ShipCountry eq 'France'" here. Does anyone know if there is a way for me to do this? Thanks.
With Linq2DynamoDb you can expose your DynamoDb tables as OData resources. Please, see the doc here.
As soon as you have an OData resource, you can query it with OData queries of any valid kind.
And you can use any OData client library for that. E.g. with System.Data.Services.Client you would say something like:
var entities = myContext.Execute<MyEntityType>(new Uri("<here goes URI with valid OData query string>"), "GET", true);
or just construct a LINQ query on the client side.
The drawback is that you will need to host your OData service somewhere (DynamoDB itself doesn't support OData).
The advantages are:
- you will be able to cache your data in ElastiCache.
- you will be able to implement custom authentication inside your service.
Sorry for the slightly late answer.
