I have NebulaGraph database version 3.1.2 running in my AWS environment and I am testing basic openCypher.
If I run MATCH (n:Person{lastName:"Brown"})-[e:LIKES_COMMENT]-(m) RETURN m.locationIP, it fails to retrieve the user IP. I am not sure what went wrong. It should be a valid openCypher statement, and NebulaGraph supports openCypher.
Simply returning m works; a screenshot of the error is attached.
As the error message prompts, we should use the var.TagName.PropName form, for example:
m.Comment.locationIP
So the query becomes:
MATCH (n:Person{lastName:"Brown"})-[e:LIKES_COMMENT]-(m) RETURN m.Comment.locationIP
This is one of the places where NebulaGraph differs from vanilla openCypher.
I've sent a gremlin statement which fails with the following error:
{"code":"InternalFailureException","requestId":"d924c667-2c48-4a7e-a288-6f23bac6283a","detailedMessage":"null: .property(Cardinality, , String)"} (599)
I've enabled audit logs on the cluster side, but there is no indication of any error there, although I do see the request.
Are there any techniques for debugging such problems with AWS Neptune?
Since the Gremlin is built dynamically and constructs complicated graphs with thousands of operations, I'm looking for a way to better understand where the error is.
In one case it turned out the payload was too big, and in another the Gremlin bytecode failed even though it worked fine on a local TinkerPop Gremlin Server.
But with the generic internal failure exception it is very hard to pinpoint the issues.
I'm using the Node.js gremlin package (3.5.0).
Thanks for the clarifications. As a side note, Neptune is not yet at the 3.5.0 TinkerPop level, so there is always the possibility of mismatches between client and server. The audit log only shows the query received; it does not reflect the results of running the query.
When I am debugging issues like this one, I often create a text version of the query (the Node client has a translator that can do that for you) and use the Neptune REST API to first of all check the /explain plan - you can also generate that from the Neptune notebooks using %%gremlin explain. If nothing obvious shows up in the explain plan, I usually try to run the text version of the query just to see if anything shows up, to rule out the client driver. I also sometimes run my query against a TinkerGraph just to see if the issue is as simple as Neptune not returning the best error message it could.
If the text query works, changing the Node.js Gremlin client driver to 3.4.11 is worth trying.
UPDATED to add a link to the documentation on Gremlin translators.
https://tinkerpop.apache.org/docs/current/reference/#translators
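To make the /explain check concrete, here is a minimal sketch in Python using the requests library, assuming a hypothetical cluster endpoint and no IAM authentication (with IAM auth enabled the request would need to be SigV4-signed, for example with awscurl):

import json
import requests

# Hypothetical Neptune endpoint and port - replace with your own cluster's values.
neptune_endpoint = "https://your-neptune-cluster:8182"

# The text version of the traversal produced by the translator (or written by hand).
query = "g.V().limit(1)"

# POST to the Gremlin explain API; this returns the query plan without running the traversal.
response = requests.post(
    neptune_endpoint + "/gremlin/explain",
    data=json.dumps({"gremlin": query}),
    headers={"Content-Type": "application/json"},
)
print(response.status_code)
print(response.text)

Posting the same payload to the /gremlin endpoint instead runs the text version of the query, which can help rule out the client driver.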
I'm trying to export an entire remote graph into JSON. When I use the following code, it results in an empty file. I am using gremlin-driver 3.3.2, as this is the same version as the underlying graph database, AWS Neptune.
var traversal = EmptyGraph.instance().traversal().withRemote(DriverRemoteConnection.using(getCluster()));
traversal.getGraph().io(graphson()).writeGraph("my-graph.json");
How is one supposed to populate the graph with data such that it can be exported?
I also posted this to the Gremlin Users list.
Here's some code that will do it for you with Neptune, and I would think it should work with most Gremlin Server implementations.
https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-export
The results of the export can be used to load via Neptune's bulk loader if you choose to export as CSV.
Hope that is useful
If that is more than you needed, hopefully it will at least give you some pointers that help.
With hosted graphs, including Neptune, it is not uncommon to find that they do not expose the Graph object or give access to the io() classes.
The Gremlin io() step is not supported in Neptune. Here is Neptune's documentation, which describes the other differences between the Amazon Neptune implementation of Gremlin and the TinkerPop implementation.
Taking on board the valuable feedback from Ankit and Kelvin, I concentrated on using a local Gremlin Server to handle the data wrangling.
Once I had the data in a locally running server (by generating a Gremlin script from an in-memory entity model), I accessed it via a Gremlin Console and ran the following:
~/apache-tinkerpop-gremlin-console-3.3.7/bin/gremlin.sh
gremlin> :remote connect tinkerpop.server conf/remote.yaml
gremlin> :> graph.io(graphson()).writeGraph("my-graph.json")
==>null
This put the my-graph.json file in /opt/gremlin-server/ on the docker container.
I extracted it using docker cp $(docker container ls -q):/opt/gremlin-server/my-graph.json .
I can then use this data to populate a gremlin-server testcontainer for running integration tests against a graph database.
neptune-export doesn't support direct export to S3. You'll have to export to the local file system, and then separately copy the files to S3.
I am experimenting with using flyway repair, but I'm curious as to exactly what commands flyway is sending to the target database. Is there a way to configure it to echo the commands?
The -X command line option gives a lot more verbose information, but it doesn't echo the exact SQL used to repair the history table. All the information is there, though, as the log contains entries like
Repairing Schema History table for version 1.0.1 (Description: create table foo, Type: SQL, Checksum: 1456329846) ...
which translates into UPDATE flyway_schema_history SET description=#p0, type=#p1, checksum=#p2 where version=#p3 in the obvious way
(at least with SQL Server on which I've just run the profiler).
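For example, a repair run with debug output enabled might look like this (the JDBC URL and credentials below are placeholders for your own environment):
flyway -X -url="jdbc:sqlserver://localhost;databaseName=mydb" -user=sa -password=<your-password> repair
The DEBUG output shows considerably more detail about what repair is doing; for the exact SQL you would still rely on a profiler trace or the database's own statement logging, as above.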
You could submit it as a feature request: https://github.com/flyway/flyway/issues
We need to connect to on-premises Teradata from Azure Databricks.
Is that possible at all?
If yes, please let me know how.
I was looking for this information as well and I recently was able to access our Teradata instance from Databricks. Here is how I was able to do it.
Step 1. Check your cloud connectivity.
%sh nc -vz 'jdbcHostname' 'jdbcPort'
- 'jdbcHostname' is your Teradata server.
- 'jdbcPort' is your Teradata server's listening port. By default, Teradata listens on TCP port 1025.
Also check out Databricks' best practices on connecting to other infrastructure.
Step 2. Install Teradata JDBC driver.
The Teradata Downloads page provides JDBC drivers by version and archive type. You can also check the Teradata JDBC Driver Supported Platforms page to make sure you pick the right version of the driver.
Databricks offers multiple ways to install a JDBC library JAR for databases whose drivers are not available in Databricks. Please refer to the Databricks Libraries documentation to learn more and pick the option that is right for you.
Once installed, you should see it listed in the Cluster details page under the Libraries tab.
Terajdbc4.jar dbfs:/workspace/libs/terajdbc4.jar
Step 3. Connect to Teradata from Databricks.
You can define some variables so you can programmatically create these connections. Since my instance required LDAP, I added LOGMECH=LDAP to the URL. Without LOGMECH=LDAP it returns a "username or password invalid" error message.
(Replace the placeholder values with the values from your environment.)
driver = "com.teradata.jdbc.TeraDriver"
url = "jdbc:teradata://Teradata_database_server/Database=Teradata_database_name,LOGMECH=LDAP"
table = "Teradata_schema.Teradata_tablename_or_viewname"
user = "your_username"
password = "your_password"
Now that the connection variables are specified, you can create a DataFrame. You can also explicitly set a particular schema if you already have one. Please refer to the Spark SQL Guide for more information.
Now, let’s create a DataFrame in Python.
My_remote_table = spark.read.format("jdbc")\
  .option("driver", driver)\
  .option("url", url)\
  .option("dbtable", table)\
  .option("user", user)\
  .option("password", password)\
  .load()
Now that the DataFrame is created, it can be queried. For instance, you can select particular columns and display them within Databricks.
display(My_remote_table.select("EXAMPLE_COLUMN"))
Step 4. Create a temporary view or a permanent table.
My_remote_table.createOrReplaceTempView("YOUR_TEMP_VIEW_NAME")
or
My_remote_table.write.format("parquet").saveAsTable("MY_PERMANENT_TABLE_NAME")
Steps 3 and 4 can also be combined if the intention is simply to create a table in Databricks from Teradata. Check out the Databricks documentation SQL Databases Using JDBC for other options.
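Here is a minimal sketch of that combined step, reusing the connection variables defined in Step 3 (the table name MY_PERMANENT_TABLE_NAME is just a placeholder):

# Read from Teradata over JDBC and persist the result directly as a managed table in Databricks.
spark.read.format("jdbc")\
  .option("driver", driver)\
  .option("url", url)\
  .option("dbtable", table)\
  .option("user", user)\
  .option("password", password)\
  .load()\
  .write.format("parquet")\
  .saveAsTable("MY_PERMANENT_TABLE_NAME")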
Here is a link to the write-up I published on this topic.
Accessing Teradata from Databricks for Rapid Experimentation in Data Science and Analytics Projects
If you create a virtual network that can connect to on-premises resources, then you can deploy your Databricks instance into that VNet. See https://docs.azuredatabricks.net/administration-guide/cloud-configurations/azure/vnet-inject.html.
I assume that there is a Spark connector for Teradata. I haven't used it myself, but I'm sure one exists.
You can't. If you run Azure Databricks, all the data needs to be stored in Azure. But you can call the data using a REST API from Teradata and then save the data in Azure.
I am very new to AS400, and I am stuck. I have read the documentation but cannot find what I need.
I have an odbc connection to an AS400 server. When I run this command I get an Outfile with everything I need:
CALL QSYS.QCMDEXC('DSPUSRPRF USRPRF(*ALL) OUTPUT(*OUTFILE) OUTFILE(CHHFLE/TEST3)', 0000000061.00000)
Instead of the results going to an outfile, I need to receive the results of this command in my script, which is connecting through ODBC. If I change 'OUTPUT(*OUTFILE)' to 'OUTPUT(*)', I get no results when I try to 'fetchall()'.
Is there any way to get this information through the odbc connection to my script?
EDIT: I am on a linux server, in a python script using pyodbc to connect. I can run sql queries successfully using this connection, but I can't figure out how to get the results of a command to come through as some sort of record set.
I hope I'm interpreting what you're asking correctly. It looks like you're accessing user profile data and dumping it to a file, and that you then want to use the contents of that file in a script or something that's running on Windows. If that's the case:
In general, when accessing data in a file from the Windows world, whether through ODBC and VBScript, or .NET, the AS/400 is treated like a database. All files in libraries are exposed via the built-in DB2 database. It's all automatic, and part of the Universal DB2 database.
So, after creating this file, you should have a file named TEST3 in library CHHFLE
You'd create a connection and execute the following SQL statement to read the contents:
Select * From CHHFLE.TEST3
This, of course, assumes that you have proper permissions to access this. You should be able to test this using the iSeries Navigator tool, which includes the ability to run SQL Scripts against the database before doing it in your script.
Added after reading comments above
There's info at this question on connecting to DB2 from Python. I hope it's helpful.
OUTPUT(*) is not stdout, unfortunately. That means you won't be able to redirect OUTPUT(*) to an ODBC connection. Dumping to a DB2 table via OUTPUT(*OUTFILE) is a good plan. Once that's done, use a standard cursor / fetch loop as though you were working with any other DB2 table.
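Here is a minimal sketch of that pattern in Python with pyodbc, assuming a hypothetical DSN named AS400 (reuse whatever connection string already works for your SQL queries):

import pyodbc

# Hypothetical connection details - reuse the ODBC connection that already works for you.
conn = pyodbc.connect("DSN=AS400;UID=myuser;PWD=mypassword")
cursor = conn.cursor()

# Run the command from the question: dump the user profiles to an outfile.
cursor.execute(
    "CALL QSYS.QCMDEXC('DSPUSRPRF USRPRF(*ALL) OUTPUT(*OUTFILE) OUTFILE(CHHFLE/TEST3)', 0000000061.00000)"
)
conn.commit()

# Read the outfile back like any other DB2 table with a standard cursor/fetch loop.
cursor.execute("SELECT * FROM CHHFLE.TEST3")
for row in cursor.fetchall():
    print(row)

conn.close()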