Encountering a warning when adding an edge in OrientDB - orientdb-2.1

Here is my code (it runs inside a loop over i, with graph a transactional OrientGraph instance):
OrientVertex luca = graph.addVertex(null);
luca.setProperty("name", "John" + Integer.toString(i));
OrientVertex marko = graph.addVertex(null);
marko.setProperty("name", "Van Ness Ave." + Integer.toString(i + 1));
OrientEdge lucaKnowsMarko = graph.addEdge(null, luca, marko, "knows");
graph.commit();
Then I encountered this warning:
WARNING: The command 'create edge type 'knows' as subclass of 'E''
must be executed outside an active transaction: the transaction will
be committed and reopen right after it. To avoid this behavior execute
it outside a transaction (db=test)
From searching around, the problem seems to be related to mixing non-transactional and transactional database operations.

You are working schema-less, so OrientDB creates classes for you the first time you create vertices/edges; in this case it was the edge class 'knows'. You can avoid this by creating the classes up front, outside the scope of a transaction. Try executing this before your code, only once:
OrientGraphNoTx graph = new OrientGraphNoTx(url);
graph.createEdgeType("knows");
graph.shutdown();
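If this setup may end up running more than once (for example at every application start), a minimal sketch, using the same Blueprints API as above, that creates the class only when it is missing:
OrientGraphNoTx setup = new OrientGraphNoTx(url);
// create the 'knows' edge class only if it does not exist yet
if (setup.getEdgeType("knows") == null) {
    setup.createEdgeType("knows");
}
setup.shutdown();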

Related

Getting error "not spawned anonymously - use the __ class rather than a TraversalSource" while adding or updating an edge (in Neptune using Gremlin)

I am using Amazon Neptune engine version 1.1.1.0 and I am trying to execute the query below:
g.E().has(id, '6529056485837422516').fold().coalesce(unfold().property('id','6529056485837422516').property('createdat', 1553676114).property('status', 0).property('edgeproperty', 'connects').property('source', '63').property('destination', '54'),g.V('54').addE('connects').to(__.V('63')).property(id,'6529056485837422516').property('id','6529056485837422516').property('createdat', 1553676114).property('status', 0).property('edgeproperty', 'connects').property('source', '63').property('destination', '54'))
I am executing this query from my Gremlin console, and after executing it I get an error like the one below:
{"detailedMessage":"The child traversal of [GraphStep(vertex,[54]), AddEdgeStep({label=[connects], createdat=[1553676114], edgeproperty=[connects], ~to=[[GraphStep(vertex,[63])]], destination=[54], id=[6529056485837422516], id=[6529056485837422516], source=[63], status=[0]})] was not spawned anonymously - use the __ class rather than a TraversalSource to construct the child traversal","code":"InternalFailureException","requestId":"ceee889e-c382-4bde-ad91-86ea1cb010c1"}
However, if I add the edge separately with the query below, then I am able to add it:
g.V('54').addE('connects').to(__.V('63')).property(id,'6529056485837422516').property('id','6529056485837422516').property('createdat', 1553676114).property('status', 0).property('edgeproperty', 'connects').property('source', '63').property('destination', '54')
Likewise, I am able to update the edge on its own with the query below:
g.E().has(id, '6529056485837422516').fold().coalesce(unfold().property('id','6529056485837422516').property('createdat', 1553676114).property('status', 0).property('edgeproperty', 'connects').property('source', '63').property('destination', '54'))
But if I try to update-or-add in one traversal using fold().coalesce(), I get the error "not spawned anonymously - use the __ class rather than a TraversalSource".
How can I solve this error?
The resolution is in the error message itself and is described in further detail here, but basically, as of TinkerPop 3.5.0 you can no longer use a child traversal spawned from g (i.e. a GraphTraversalSource). It must be spawned anonymously from __ (i.e. the double-underscore class).
In short, you need to spawn the g.V('54').addE(... child traversal as __.V('54').addE(....
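Applied to the query from the question, only the second argument to coalesce() changes: the child traversal must start with __ instead of g. A sketch, trimmed to a couple of properties for readability:
g.E().has(id, '6529056485837422516').fold().
  coalesce(unfold().property('createdat', 1553676114).property('status', 0),
           __.V('54').addE('connects').to(__.V('63')).
             property(id, '6529056485837422516').property('createdat', 1553676114).property('status', 0))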

How can I log SQL execution results in Airflow?

I use Airflow Python operators to execute SQL queries against a Redshift/Postgres database. In order to debug, I'd like the DAG to return the results of the SQL execution, similar to what you would see when executing locally in a console.
I'm using psycopg2 to create a connection/cursor and execute the SQL. Having this logged would be extremely helpful to confirm the parsed parameterized SQL, and to confirm that data was actually inserted (I have painfully experienced issues where differences in environments caused unexpected behavior).
I do not have deep knowledge of Airflow or the low-level workings of the Python DBAPI, but the psycopg2 documentation does seem to refer to some methods and connection configurations that may allow this.
I find it very perplexing that this is difficult to do, as I'd imagine it would be a primary use case of running ETLs on this platform. I've heard suggestions to simply create additional tasks that query the table before and after, but this seems clunky and ineffective.
Could anyone please explain how this may be possible, and if not, explain why? Alternate methods of achieving similar results are welcome. Thanks!
So far I have tried the cursor's statusmessage attribute, but it only seems to return the first line of the SQL and not the results. I have also attempted to create a logging connection, which logs the SQL, but not the console results:
import logging
import sys

import psycopg2 as pg
from psycopg2.extras import LoggingConnection

conn = pg.connect(
    connection_factory=LoggingConnection,
    ...
)
conn.autocommit = True

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
logger.addHandler(logging.StreamHandler(sys.stdout))
conn.initialize(logger)

cur = conn.cursor()
sql = """
INSERT INTO mytable (
    SELECT *
    FROM other_table
);
"""
cur.execute(sql)
I'd like the logger to return something like:
sql> INSERT INTO mytable (
SELECT ...
[2019-07-25 23:00:54] 912 rows affected in 4 s 442 ms
Let's assume you are writing an operator that uses the Postgres hook to do something in SQL.
Anything printed inside an operator is logged.
So, if you want to log the statement, just print the statement in your operator:
print(sql)
If you want to log the result, fetch the result and print it, e.g.:
result = cur.fetchall()
for row in result:
    print(row)
Alternatively you can use self.log.info in place of print, where self refers to the operator instance.
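Putting the two together, a minimal sketch of such an operator; the class name LoggingSqlOperator and its arguments are illustrative, not Airflow built-ins:
from airflow.hooks.postgres_hook import PostgresHook
from airflow.models import BaseOperator

class LoggingSqlOperator(BaseOperator):
    def __init__(self, sql, postgres_conn_id="postgres_default", **kwargs):
        super().__init__(**kwargs)
        self.sql = sql
        self.postgres_conn_id = postgres_conn_id

    def execute(self, context):
        hook = PostgresHook(postgres_conn_id=self.postgres_conn_id)
        conn = hook.get_conn()
        cur = conn.cursor()
        self.log.info("Executing: %s", self.sql)  # the statement ends up in the task log
        cur.execute(self.sql)
        if cur.description:                       # only statements that return rows can be fetched
            for row in cur.fetchall():
                self.log.info("Row: %s", row)
        conn.commit()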
OK, so after some trial and error I've found a method that works for my setup and objective. To recap, my goal is to run ETLs via Python scripts, orchestrated in Airflow. Referring to the documentation for statusmessage:
Read-only attribute containing the message returned by the last command:
The key is to manage logging in context with the transactions executed on the server. For me this meant explicitly setting con.autocommit = False and wrapping SQL blocks with BEGIN TRANSACTION; and END TRANSACTION;. If you log cur.statusmessage directly after a statement that deletes or inserts, you will get a response such as 'INSERT 0 92380'.
This still isn't as verbose as I would prefer, but it is much better than nothing, and is very useful for troubleshooting ETL issues within Airflow logs.
Side notes:
- When autocommit is set to False, you must explicitly commit transactions.
- It may not be necessary to state transaction begin/end in your SQL. It may depend on your DB version.
import logging

import psycopg2 as psy

con = psy.connect(...)
con.autocommit = False
cur = con.cursor()
try:
    cur.execute([some_sql])
    logging.info(f"Cursor statusmessage: {cur.statusmessage}")
    con.commit()  # autocommit is off, so commit explicitly
except Exception:
    con.rollback()
finally:
    con.close()
There is some buried functionality within psycopg2 that I'm sure can be utilized, but the documentation is pretty thin and there are no clear examples. If anyone has suggestions on how to use things such as log objects, or how to retrieve the backend PID to pull additional information, I'd welcome them.

Why doesn't AddEdgeStep work after a DropStep on edges in the same traversal using Gremlin?

I have this code, which essentially updates properties, removes all old IsOfType edges and adds new IsOfType edges (inlined here, with the method/class abstraction removed):
traversal = g.V("Entity:633471488:519").as("entity");
//update properties
traversal.property("text", "new text");
traversal.property("description", "new description");
//drop typeEdges
traversal.select("entity").outE("IsOfType").drop();
//even that causes the same issue(!): traversal.select("entity").outE("HasInner").drop();
System.out.println("traversal after type edges deletion: " +traversal);
//make new typeEdges
traversal.V("Entity:996942848:518").as("type-0").addE("IsOfType").from("entity").to("type-0");
System.out.println("traversal after type edges addition: " +traversal);
//storage
traversal.select("entity").forEachRemaining({})
Everything works (even the drop of the existing IsOfType edges), but the creation of the new IsOfType edges doesn't seem to result in new edges on the graph. If I comment out the drop, then the creation works fine(!). It is as if the DropStep, which comes before the AddEdgeStep, happens at the end. I even tried dropping another type of edge, and it causes the same issue(!). Could it be that implicit transaction handling decides to commit when a drop() happens, as it does with next(), iterate() and forEachRemaining()? If that is the case, then drops and creations can't happen within the same transaction using the Fluent API, which renders it not very useful for real applications :(
Here is the state of the traversal after the deletion and after the addition of two IsOfType edges in my run (I tried both Java and the DataStax Studio console):
traversal after type edges deletion:
[
GraphStep(vertex,[Entity:633471488:519])#[entity],
AddPropertyStep({value=[Entity], key=[atClass]}),
AddPropertyStep({value=[FilmWithSuperCategories aaa], key=[text]}),
AddPropertyStep({value=[dffsdfsd f2313], key=[description]}),
SelectOneStep(entity)#[entity],
VertexStep(OUT,[IsOfType],edge),
DropStep
]
traversal after type edges addition:
[
GraphStep(vertex,[Entity:633471488:519])#[entity],
AddPropertyStep({value=[Entity], key=[atClass]}),
AddPropertyStep({value=[FilmWithSuperCategories aaa], key=[text]}),
AddPropertyStep({value=[dffsdfsd f2313], key=[description]}),
SelectOneStep(entity)#[entity],
VertexStep(OUT,[IsOfType],edge),
DropStep,
GraphStep(vertex,[Entity:996942848:518])#[type-0],
AddEdgeStep({~from=[[SelectOneStep(entity)]], ~to=[[SelectOneStep(type-0)]], label=[IsOfType]}),
GraphStep(vertex,[Entity:1489781376:516])#[type-1],
AddEdgeStep({~from=[[SelectOneStep(entity)]], ~to=[[SelectOneStep(type-1)]], label=[IsOfType]})
]
Edit
From what I read here (http://tinkerpop.apache.org/docs/current/reference/#drop-step):
The drop()-step (filter/sideEffect) is used to remove element and properties from the graph (i.e. remove). It is a filter step because the traversal yields no outgoing objects.
There are no objects being returned, so it is not possible to do anything after a drop happens! So I am curious how I can do multiple drops/additions in a single transaction using the DSE Graph Fluent API.
Thanks!
You can wrap your drop in a sideEffect step, e.g.:
g.V(entity1).as("a").sideEffect(outE().filter(inV().is(entity2)).drop()).
V(entity2).addE("link").from("a")
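Applied to the traversal from the question, a sketch could look like this (vertex IDs copied from the question; when to() is omitted, addE uses the incoming vertex as the edge's head):
g.V("Entity:633471488:519").as("entity").
  property("text", "new text").
  property("description", "new description").
  sideEffect(__.outE("IsOfType").drop()).
  V("Entity:996942848:518").addE("IsOfType").from("entity").
  iterate()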

Getting a 409 error when trying to call table.CreateIfNotExists() for the first time

When starting my program for the first time since the associated table was deleted, I get this error:
An exception of type 'Microsoft.WindowsAzure.Storage.StorageException' occurred in Microsoft.WindowsAzure.Storage.dll but was not handled in user code
Additional information: The remote server returned an error: (409) Conflict.
However, if I refresh the crashed page, the table is created successfully.
Here is the code just in case:
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
    Microsoft.WindowsAzure.CloudConfigurationManager.GetSetting("StorageConnectionString"));
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("tableTesting");
table.CreateIfNotExists();
I don't really understand how or why I'd be getting a conflict error if there's nothing there.
These errors appear elsewhere in my code as well when I'm working with blob containers, but I can't reproduce them as easily.
If you look at the status codes here: http://msdn.microsoft.com/en-us/library/azure/dd179438.aspx, you will notice that you get a 409 error code in two scenarios:
- Table already exists
- Table is being deleted
If I understand correctly, table.CreateIfNotExists() only handles the first situation, not the second. Please check whether that is what is happening in your case. One way to check is to look at the details of the StorageException; somewhere in there you should find an error code matching one from the link above.
Also, one important thing to understand is that when you delete a table, it is only marked for deletion and is actually removed later by a background process (much like garbage collection). If you try to re-create the table between these two steps, you will get the second error.
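A sketch of what inspecting the exception and retrying could look like; the error-code check uses the SDK's TableErrorCodeStrings, and the fixed 30-second delay is just an assumption:
try
{
    table.CreateIfNotExists();
}
catch (StorageException ex)
{
    string errorCode = ex.RequestInformation.ExtendedErrorInformation.ErrorCode;
    if (errorCode == TableErrorCodeStrings.TableBeingDeleted)
    {
        // the old table is still being removed by the background process
        System.Threading.Thread.Sleep(TimeSpan.FromSeconds(30));
        table.CreateIfNotExists();
    }
    else
    {
        throw;
    }
}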

R and RJDBC: Using dbSendUpdate results in ORA-01000: maximum open cursors exceeded

I have encountered an error when trying to insert thousands of rows with R/RJDBC and the dbSendUpdate command on an Oracle database.
The problem can be reproduced by creating a test table with
CREATE TABLE mytest (ID NUMBER(10) not null);
and then executing the following R script:
library(RJDBC)
drv <- JDBC("oracle.jdbc.OracleDriver", "ojdbc-11.1.0.7.0.jar")  # place your JDBC driver here
conn <- dbConnect(drv, "jdbc:oracle:thin:@MYSERVICE", "myuser", "mypasswd")  # place your connection details here
for (i in 1:10000) {
  dbSendUpdate(conn, "INSERT INTO mytest VALUES (?)", i)
}
Searching the Internet suggests that one should close result cursors, which is obvious (e.g. see "java.sql.SQLException: - ORA-01000: maximum open cursors exceeded" or "Unable to resolve error - java.sql.SQLException: ORA-01000: maximum open cursors exceeded").
But the help file for ??dbSendUpdate claims that no result cursors are used at all:
.. that dbSendUpdate is used with DBML queries and thus doesn't return any result set.
Therefore this behavior doesn't make much sense to me :-(
Can anybody help?
Thanks a lot!
PS: Found something interesting in the RJDBC documentation (http://www.rforge.net/RJDBC/):
Note that the life time of a connection, result set, driver etc. is determined by the lifetime of the corresponding R object. Once the R handle goes out of scope (or if removed explicitly by rm) and is garbage-collected in R, the corresponding connection or result set is closed and released. This is important for databases that have limited resources (like Oracle) - you may need to add gc() by hand to force garbage collection if there could be many open objects. The only exception are drivers which stay registered in the JDBC even after the corresponding R object is released as there is currently no way to unload a JDBC driver (in RJDBC).
But again, even inserting gc() within the loop produces the same behavior.
Finally, we realized this is a bug in the RJDBC package. If you are willing to patch RJDBC, you can use the patched sources available at https://github.com/Starfox899/RJDBC (as long as they are not yet imported into the package).
