How to create a Kudu table in the Cloudera Quickstart VM

I have been trying to create a Kudu table in Impala using the Cloudera Quickstart VM, following this example:
https://kudu.apache.org/docs/quickstart.html
CREATE TABLE sfmta
PRIMARY KEY (report_time, vehicle_tag)
PARTITION BY HASH(report_time) PARTITIONS 8
STORED AS KUDU
AS SELECT
UNIX_TIMESTAMP(report_time, 'MM/dd/yyyy HH:mm:ss') AS report_time,
vehicle_tag,
longitude,
latitude,
speed,
heading
FROM sfmta_raw;
I am getting the following error:
ERROR: AnalysisException: Table property 'kudu.master_addresses' is required when the impalad startup flag -kudu_master_hosts is not used.
The VM used is cloudera-quickstart-vm-5.13.0-0-virtualbox. Thanks in advance for your help.

From the documentation
If the -kudu_master_hosts configuration property is not set, you can
still associate the appropriate value for each table by specifying a
TBLPROPERTIES('kudu.master_addresses') clause in the CREATE TABLE
statement or changing the TBLPROPERTIES('kudu.master_addresses') value
with an ALTER TABLE statement.
So your table creation should look like:
CREATE TABLE sfmta
PRIMARY KEY (report_time, vehicle_tag)
PARTITION BY HASH(report_time) PARTITIONS 8
STORED AS KUDU
TBLPROPERTIES ('kudu.master_addresses'='localhost:7051')
AS SELECT
UNIX_TIMESTAMP(report_time, 'MM/dd/yyyy HH:mm:ss') AS report_time,
vehicle_tag,
longitude,
latitude,
speed,
heading
FROM sfmta_raw;
7051 is the default port for the Kudu master.
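As the quoted documentation notes, the property can also be set on an existing table with ALTER TABLE. A minimal sketch, assuming the table above and the same master address (on the Quickstart VM the hostname quickstart.cloudera may need to be used instead of localhost):
ALTER TABLE sfmta
SET TBLPROPERTIES ('kudu.master_addresses' = 'localhost:7051');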

Related

Informatica creates duplicate records on a Teradata SET table with a primary index (direct SQL cannot)

Very simple source->target insert: source records are treated as "Insert", and the target transformation uses the "Insert" function only. The target Teradata table is a SET table with a primary index defined. The Informatica target transformation also has a primary key defined. The Teradata Informatica relational connection does not have a "bulk" option.
So how can Informatica create a duplicate record when even a direct insert into Teradata cannot?
Any ideas?
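For readers unfamiliar with the behaviour the question relies on, here is a minimal sketch of the SET-table semantics in plain Teradata SQL (table and column names are hypothetical):
-- A SET table does not allow two fully identical rows.
CREATE SET TABLE demo_set_table
(
    col_a INTEGER,
    col_b VARCHAR(20)
)
PRIMARY INDEX (col_a);

INSERT INTO demo_set_table VALUES (1, 'x');
-- Inserting the identical row again is rejected with a duplicate-row
-- error on a single-row INSERT, and silently discarded by an
-- INSERT ... SELECT, which is why direct SQL cannot create a duplicate.
INSERT INTO demo_set_table VALUES (1, 'x');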

MariaDB: INSERT ... SELECT from an ODBC CONNECT engine table (SQL Server) keeps causing "error code 1406 data too long"

Objective: Using MariaDB, I want to read some data from MS SQL Server (via the ODBC CONNECT engine) and INSERT ... SELECT it into a local table.
Issue: I keep getting "error code 1406 data too long" even though the source and destination VARCHAR fields have the very same size (see further details below).
Details:
The query which I'm trying to execute is in the form:
INSERT INTO DEST_TABLE(NUMERO_DOCUMENTO)
SELECT SUBSTR(TRIM(NUMERO_DOCUMENTO),0,5)
FROM CONNECT_SRC_TABLE
The above is the very minimal subset of fields which causes the problem.
The source CONNECT table is actually a view inside SQL Server. The destination table has been defined to be identical to the ODBC CONNECT table (same field names, same NULL constraints, same field types and sizes).
There is no issue with a couple of other VARCHAR fields.
The issue happens with a field NUMERO_DOCUMENTO VARCHAR(14) DEFAULT NULL, where the maximum length in the input table is 14.
The same issue also happens with 2 other fields on the same table.
All in all, it seems to be an issue with the source data rather than the destination table.
Attempted workarounds:
I tried to force silent truncation but, reasonably, this does not make any difference: Error Code: 1406. Data too long for column - MySQL
I tried enlarging the destination field, with no appreciable effect: NUMERO_DOCUMENTO VARCHAR(100) DEFAULT NULL
I tried to TRIM the source field (hidden spaces?) and to limit its size at the source, to no avail: INSERT INTO DEST_TABLE(NUMERO_DOCUMENTO) SELECT SUBSTR(TRIM(NUMERO_DOCUMENTO),0,5) FROM CONNECT_SRC_TABLE, but the very same error is always returned
Workaround:
I tried performing the same thing using a FOR x IN (src_query) DO INSERT .... END FOR loop and this solution seems to work: this means that the problem is not in the data itself but in how the engine performs the INSERT ... SELECT query. A sketch of this workaround is shown below.
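A minimal sketch of that row-by-row workaround, assuming MariaDB 10.3 or later (which introduced FOR loops) and only the NUMERO_DOCUMENTO column from the question:
DELIMITER //
BEGIN NOT ATOMIC
    -- Read from the CONNECT (ODBC) table and insert one row at a time,
    -- instead of going through the failing INSERT ... SELECT path.
    FOR x IN (SELECT NUMERO_DOCUMENTO FROM CONNECT_SRC_TABLE) DO
        INSERT INTO DEST_TABLE (NUMERO_DOCUMENTO) VALUES (x.NUMERO_DOCUMENTO);
    END FOR;
END//
DELIMITER ;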

Using Teradata Volatile Table in SSIS ADO NET Source

Put simply, can I use an ADO NET Source task to query a Teradata VOLATILE TABLE? For context, using Teradata SQL Assistant, I can easily create a Teradata VOLATILE TABLE, insert data into it and select data from it. In Visual Studio, using SSIS SQL Tasks, I am also able to create and insert data into a Teradata VOLATILE TABLE. However, because the table does not actually exist yet, it appears we cannot use a separate ADO NET Source task to select data from it, meaning we also cannot map the columns. We get the error "[Teradata Database][3807] Object 'TABLE_NAME' does not exist." If the data in a VOLATILE TABLE, and more accurately the VOLATILE TABLE column definitions, are only available at run time, can an ADO NET Source task be used to query a Teradata VOLATILE TABLE? If so, how?
This is really old and I'm not sure whether it will still work, but you can set validation to false; that might do what you are wanting.
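For context, a volatile table only exists within the session that created it, which is why design-time validation in the ADO NET Source cannot see it. A sketch of the kind of statements the SSIS SQL Task would run (names are hypothetical):
CREATE VOLATILE TABLE vt_example
(
    id INTEGER,
    load_date DATE
)
ON COMMIT PRESERVE ROWS;  -- keep the rows after each transaction commits

INSERT INTO vt_example VALUES (1, DATE '2020-01-01');
SELECT * FROM vt_example;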

Copying/Appending non matching data between same table on different database servers

Following up on my earlier query, is there a way to specify the condition for the records to be copied over, using the column names on the same table? I.e. I want to copy all data from the sandbox server to the production server for all rows where COL_A in sandbox does not exist on the production server. So the intended select query would be:
SELECT * FROM <Sandbox><TABLE_C> WHERE <Sandbox><TABLE_C>COL_A NOT EXISTS (SELECT <production>COL_A FROM <production>TABLE_C)
i.e. all the records from sandbox to production where a matching COL_A could not be found
I am not sure about the Oracle-specific syntax, but something along these lines should work, assuming that you are able to access the production server through a database link:
INSERT INTO TABLE_C@prod_link
SELECT source.*
FROM TABLE_C source
LEFT JOIN TABLE_C@prod_link target
ON source.COL_A = target.COL_A
WHERE target.COL_A IS NULL
where prod_link is a database link:
CREATE PUBLIC DATABASE LINK prod_link
CONNECT TO remote_username IDENTIFIED BY mypassword
USING 'tns_service_name';
I do not have an Oracle instance running that I can try this on, but it should work.
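For reference, the same anti-join can also be written with NOT EXISTS, closer in shape to the pseudo-query in the question (a sketch under the same database-link assumption, untested):
INSERT INTO TABLE_C@prod_link
SELECT s.*
FROM TABLE_C s
WHERE NOT EXISTS (
    SELECT 1
    FROM TABLE_C@prod_link t
    WHERE t.COL_A = s.COL_A
);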

SQL scripts act on master database instead of practice database

I wrote some SQL scripts to create a database and store data. I just noticed that the new tables and data are going into the master database.
I found that I can address the correct database if I scope out the database, like so:
CREATE TABLE Practice1.dbo.Experiments
(
ID int IDENTITY (100,1) PRIMARY KEY,
CompanyName nvarchar (50)
)
but I'd rather not have to scope out each command. Is there a way to set the database in the script so I don't have to scope everything out?
INSERT INTO Practice1.dbo.EXPERIMENTS
VALUES
(
'hello world'
)
SELECT * FROM Practice1.dbo.EXPERIMENTS
You have a drop-down list on your toolbar that allows you to select which database you want the script to execute on. Also, you can state the database to use at the top of your script.
Example syntax:
USE {database}
http://doc.ddart.net/mssql/sql70/ua-uz_7.htm
On SQL Server 2005, to switch the database context, use the command:
USE DatabaseName
In the samples above, the database name is Practice1, hence:
USE Practice1
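Putting it together, a sketch of the script from the question rewritten with USE so that the objects no longer need to be fully qualified (assumes the batches are run in a client that understands GO separators):
USE Practice1;
GO

CREATE TABLE dbo.Experiments
(
    ID int IDENTITY (100,1) PRIMARY KEY,
    CompanyName nvarchar (50)
);

INSERT INTO dbo.Experiments (CompanyName)
VALUES ('hello world');

SELECT * FROM dbo.Experiments;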
