Lineage feature in Cloudera Navigator - cloudera

Does Lineage work in the Enterprise trial version of Cloudera?
I see the Lineage tab, but I don't see the lineage of a Hive table that I derived from another Hive table.
Unfortunately, the Cloudera documentation is not very clear on this either.

Lineage does work in the Enterprise trial version, but the lineage has to be clear to Navigator.
In practice, this means that when a table is derived from another table, there must be no "drop table" at the end of the command. If you need a "drop table", put it at the start of the process.

Related

Create backup of bigquery cluster table

I have a clustered, partitioned table exported from GA 360 (an image of it is attached). I would like to create an exact replica of it. This isn't possible through the web UI, and creating a backup table with the bq command-line tool didn't work either.
Also, whenever we open the preview, it has a day filter.
When data is appended to the backup table, this filter does not appear there, even though I set this option to true when creating the table.
Any further context on handling this kind of table would be helpful.
Those are indeed sharded tables. As explained by #N. L, they follow a time-based naming approach, [PREFIX]_YYYYMMDD, and are then grouped together. The backup procedure described there seems correct. Still, I would recommend using partitioned tables instead, as they are easier to back up and generally perform better.
This is not a clustered/partitioned table. It is a sharded, non-partitioned set of tables sharing one common prefix. Once you create multiple tables with the same prefix, they are shown grouped under that prefix.
For example:
ga_session_20190101
ga_session_20190102
Both of these tables will be grouped together.
To back up these tables, write a script that copies each source table to a destination table with the same name, and run it with the bq command-line tool in the same project.
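A minimal sketch of such a script, assuming Python 3.9+ and that you want it to print the `bq cp SOURCE_TABLE DESTINATION_TABLE` commands for review rather than run them. The project name, dataset names and date range are hypothetical placeholders; the prefix is the one from the example above.

```python
"""Print `bq cp` commands that back up date-sharded BigQuery tables.

Hypothetical values: "my-project", "analytics", "analytics_backup" and the
date range are placeholders, not taken from the question.
"""
from datetime import date, timedelta


def shard_names(prefix: str, start: date, end: date) -> list[str]:
    """Return shard names of the form PREFIX_YYYYMMDD for each day in [start, end]."""
    names = []
    day = start
    while day <= end:
        names.append(f"{prefix}_{day:%Y%m%d}")
        day += timedelta(days=1)
    return names


def backup_commands(project: str, src_dataset: str, dst_dataset: str,
                    tables: list[str]) -> list[str]:
    """Return one `bq cp` command per shard, keeping the same table name."""
    return [
        f"bq cp {project}:{src_dataset}.{t} {project}:{dst_dataset}.{t}"
        for t in tables
    ]


if __name__ == "__main__":
    tables = shard_names("ga_session", date(2019, 1, 1), date(2019, 1, 2))
    for cmd in backup_commands("my-project", "analytics", "analytics_backup", tables):
        print(cmd)
```

Generating the commands first makes it easy to inspect them (or save them to a shell script) before running anything against the project.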

Bulk insert in datastage using teradata connector

I am new to DataStage. I created a simple job that reads data from a .ds file and loads it into Teradata using the Teradata connector. In the Teradata connector properties I set:
access_method=Bulk, max_session=2, min_session=1, load_type=load,
max_buffer_size=10000 and max_partition_sessions=1
but the job stays in the running state indefinitely without showing how many rows were transferred. When I set access_method=immediate, it does proceed. Can anyone suggest the right way to load in parallel?
Do you know how many nodes you are using? Add the APT_CONFIG_FILE environment variable and check in Director how many nodes are used. N nodes means 2N processes. This might help.
If this doesn't help, look at the source database connector stage and see whether increasing any of its option values helps.
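To check the node count outside Director, here is a small sketch (Python 3.9+) that counts the `node "..."` entries in the file named by APT_CONFIG_FILE. It assumes the usual layout where each node block starts with a line like `node "node1"`; the sample configuration, host name and paths are hypothetical, and the 2N figure is simply the rule of thumb from the answer above, not a measured value.

```python
import os
import re

# Matches node definition lines such as:  node "node1"
NODE_RE = re.compile(r'^\s*node\s+"([^"]+)"', re.MULTILINE)

# Hypothetical two-node configuration; host name and paths are placeholders.
SAMPLE_CONFIG = """\
{
    node "node1"
    {
        fastname "etlhost"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node2"
    {
        fastname "etlhost"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
}
"""


def node_names(config_text: str) -> list[str]:
    """Return the logical node names declared in an APT configuration file."""
    return NODE_RE.findall(config_text)


def estimated_processes(config_text: str) -> int:
    """Apply the answer's rule of thumb: N nodes -> roughly 2N processes."""
    return 2 * len(node_names(config_text))


if __name__ == "__main__":
    # Read the real configuration if APT_CONFIG_FILE is set, else use the sample.
    path = os.environ.get("APT_CONFIG_FILE")
    if path:
        with open(path, encoding="utf-8") as f:
            text = f.read()
    else:
        text = SAMPLE_CONFIG
    names = node_names(text)
    print(f"{len(names)} node(s): {', '.join(names)}")
    print(f"~{estimated_processes(text)} processes (rule of thumb)")
```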

How to ignore data loss warning while schema comparison?

While trying to update the database using SQL Schema Comparison in Visual Studio, I get the error below.
(48,1): SQL72014: .Net SqlClient Data Provider: Msg 50000, Level 16, State 127, Line 6 Rows were detected. The schema update is terminating because data loss might occur.
An error occurred while the batch was being executed.
I understand that the tool has detected that the update might cause data loss.
I was hoping there would be an option to ignore this.
After searching, I found the link below, but it applies to Visual Studio 2012:
https://social.msdn.microsoft.com/Forums/en-US/ce95ac1d-a31c-4e83-904e-78a8491d0761/shema-compare-force-update-with-data-loss?forum=vstsdb
I can't find any such option in my schema comparison options.
In Visual Studio 2015 the sequence is: create the comparison, click the gear icon, open the General tab, and uncheck "Block on data loss". I have to set this every time I create a new comparison; I haven't found a way to make it a persistent default other than saving the comparison.
I had the same problem, and unchecking "Block Incremental Deployment if data loss might occur" didn't fix it. I still got lots of errors about column size changes that I couldn't work around. I also had to uncheck the "Verify deployment" checkbox, the last item in the lower section.
If you deploy the dacpac with the sqlpackage.exe command-line utility (used to automate builds and deployments, e.g. in DevOps pipelines), pass the argument: /p:BlockOnPossibleDataLoss=False
More information: https://learn.microsoft.com/en-us/sql/tools/sqlpackage/sqlpackage?view=sql-server-ver15
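A sketch of how an automated deployment might build that command line, assuming Python 3.9+ and a `sqlpackage` executable on the PATH. The dacpac file name and connection string are hypothetical, and mapping `/p:VerifyDeployment` to the "Verify deployment" checkbox mentioned above is an assumption worth checking against the sqlpackage documentation.

```python
def publish_args(dacpac_path: str, connection_string: str,
                 block_on_data_loss: bool = False,
                 verify_deployment: bool = True) -> list[str]:
    """Build a sqlpackage Publish command line as an argument list."""
    return [
        "sqlpackage",
        "/Action:Publish",
        f"/SourceFile:{dacpac_path}",
        f"/TargetConnectionString:{connection_string}",
        # The property named in the answer: False lets the publish continue
        # even when the schema change might lose data.
        f"/p:BlockOnPossibleDataLoss={block_on_data_loss}",
        # Assumed counterpart of the "Verify deployment" checkbox.
        f"/p:VerifyDeployment={verify_deployment}",
    ]


if __name__ == "__main__":
    # Hypothetical dacpac and connection string.
    args = publish_args("MyDatabase.dacpac",
                        "Server=localhost;Database=MyDatabase;Integrated Security=True")
    print(args)
    # To actually deploy: subprocess.run(args, check=True)
```

Passing the arguments as a list (rather than one string) avoids shell-quoting problems with the semicolons in the connection string.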

how to generate sqlite database from CDM Model in PowerDesigner?

I have a Conceptual Data Model in PowerDesigner and I want to generate an SQLite database from it. How can I generate an SQLite database from a CDM in PowerDesigner?
I have started a DBMS for SQLite 3
When you generate the Physical Data Model, select ANSI Level 2 in the database selection drop-down. It works flawlessly that way. Confirm your choice when you generate the script.
Just make sure to remove or comment out the drop statements at the beginning of the resulting script, and you should not get any errors when running the script in a database client.
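The drop statements fail on a fresh database because the tables do not exist yet. Here is a sketch of commenting them out and running the script with Python's built-in `sqlite3` module. The SQL below is a hypothetical, hand-written excerpt in the style of a generated script, not actual PowerDesigner output, and the sketch assumes each DROP statement sits on a single line.

```python
import re
import sqlite3

# Hypothetical excerpt in the style of a generated ANSI script.
GENERATED_SQL = """\
drop table CUSTOMER;
drop table ORDERS;

create table CUSTOMER (
   CUSTOMER_ID          INTEGER              not null,
   NAME                 VARCHAR(50),
   constraint PK_CUSTOMER primary key (CUSTOMER_ID)
);

create table ORDERS (
   ORDER_ID             INTEGER              not null,
   CUSTOMER_ID          INTEGER,
   constraint PK_ORDERS primary key (ORDER_ID),
   constraint FK_ORDERS_CUSTOMER foreign key (CUSTOMER_ID)
      references CUSTOMER (CUSTOMER_ID)
);
"""

# A line whose first word is DROP (case-insensitive).
DROP_LINE = re.compile(r"^\s*drop\s", re.IGNORECASE)


def comment_out_drops(script: str) -> str:
    """Prefix every line that starts with DROP with an SQL comment marker."""
    return "\n".join(
        "-- " + line if DROP_LINE.match(line) else line
        for line in script.splitlines()
    )


def load_into_sqlite(script: str, path: str = ":memory:") -> sqlite3.Connection:
    """Run the cleaned script against an SQLite database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(comment_out_drops(script))
    return conn


if __name__ == "__main__":
    conn = load_into_sqlite(GENERATED_SQL)
    for (name,) in conn.execute("select name from sqlite_master where type = 'table'"):
        print(name)
```

Running `GENERATED_SQL` unmodified on an empty database would stop at the first `drop table` with a "no such table" error, which is the failure the answer warns about.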
Use the Tools->Generate Physical Data Model... command and select an appropriate database from the list (probably one of the ODBC or ANSI options, since SQLite isn't supported out of the box).
Alternatively, you could first create a database XEM for SQLite, but that's a fairly advanced task. I'd stick with the generic option if possible.

how to present graph with query results in orientdb

I am new to OrientDB.
I want to know how to show the node/edge graph from query results, as in the figure.
I searched online but still can't find any clue.
Use OrientDB Studio:
1. Start OrientDB as a server.
2. Connect to http://localhost:2480
3. Execute your query in the "Query" tab.
4. Select an item and click the "Graph" tab.
