Case sensitivity in Cloudera Impala table column names

I have installed the Cloudera VM and tried to fetch data from an Impala database using the query editor. If I give an upper-case column name in the query, I always get the column name back in lower case. Is there some limitation on column names, i.e. must we use column names in lower case?
Sample Query:
select orderid as COLUMN1 from default.orders
Result:
column1
10248
10249
10278

From the Impala documentation:
Impala identifiers are always case-insensitive. That is, tables named
t1 and T1 always refer to the same table, regardless of quote
characters. Internally, Impala always folds all specified table and
column names to lowercase. This is why the column headers in query
output are always displayed in lowercase.

Try these table properties when creating the table. Make sure to put in your column names and types.
tblproperties (
  'avro.schema.literal'='
  {
    "type": "record",
    "name": "SchemaName",
    "fields": [
      {"name": "COLUMN1", "type": ["null", "long"]},
      {"name": "COLUMN2", "type": ["null", "string"]}
    ]
  }'
)
Inspired by https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-Useschema.literalandembedtheschemainthecreatestatement
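For context, a complete create statement built around those properties might look like the sketch below; the table name orders_avro and the STORED AS AVRO clause are assumptions for illustration, not part of the original answer:
CREATE TABLE default.orders_avro
STORED AS AVRO
TBLPROPERTIES (
  'avro.schema.literal'='
  {
    "type": "record",
    "name": "SchemaName",
    "fields": [
      {"name": "COLUMN1", "type": ["null", "long"]},
      {"name": "COLUMN2", "type": ["null", "string"]}
    ]
  }'
);
The column definitions are derived from the embedded Avro schema, so the mixed-case names are preserved in the schema even though Impala itself still folds identifiers to lowercase.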

Related

Compare XML of two tables [PENTAHO]

Do you have any idea how to do something like this in Pentaho?
I have two tables: the first is a source table in MSSQL and the second is a target table in DB2.
The first table has a column of type XML, and we load this data into the second table, which also has an XML column. I would like to check in Pentaho whether the XML value in the second table corresponds to what is in the first table.
You can use a combination of the "Multiway merge join" and "Switch/Case" steps to get the non-matching IDs from the source data. I have prepared a solution for you; you can get help from it here:
1. Get both table inputs from MSSQL & DB2.
2. Merge both tables' data with the condition source.XML = destination.XML using a FULL JOIN.
3. Keep the source IDs only when those IDs are not available in the destination table, using SWITCH/CASE.
4. Select the IDs and write them to a text file.
The equivalent logic is sketched in SQL after this list.
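For reference, the non-matching-ID logic those steps implement corresponds roughly to this SQL sketch. The names source_table, target_table, id, and xml_col are hypothetical; in Pentaho the join actually happens across the two databases, and an XML column usually has to be cast to a character type before it can be compared:
SELECT s.id
FROM source_table s
FULL JOIN target_table t
  ON CAST(s.xml_col AS VARCHAR(8000)) = CAST(t.xml_col AS VARCHAR(8000))
WHERE t.id IS NULL;  -- source rows with no matching XML in the target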

Impala add column with default value

I want to add a column to an existing Impala table (and view) with a default value, so that the existing rows also get a value. The column should not allow null values.
ALTER TABLE dbName.tblName ADD COLUMNS (id STRING NOT NULL '-1')
I went through the docs but could not find an example that specifically does this. How do I do this in Impala? Hue underlines / does not recognize the NOT NULL clause.
Are you using Kudu as the storage layer for your table? If not, then according to the Impala docs:
Note: Impala only allows PRIMARY KEY clauses and NOT NULL constraints on
columns for Kudu tables. These constraints are enforced on the Kudu
side.
...
For non-Kudu tables, Impala allows any column to contain NULL values,
because it is not practical to enforce a "not null" constraint on HDFS
data files that could be prepared using external tools and ETL
processes.
Impala's ALTER TABLE syntax also does not support specifying default column values for non-Kudu tables.
With Impala you could try the following workaround:
add the column
ALTER TABLE dbName.tblName ADD COLUMNS(id STRING);
once you've added the column, you can fill it using the same table:
INSERT OVERWRITE dbName.tblName SELECT col1,...,coln, '-1' FROM dbName.tblName;
where col1,...,coln are the columns that existed before the ADD COLUMNS command, and '-1' fills the new column. A concrete instance is sketched below.
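As a concrete sketch, assume a hypothetical table dbName.tblName whose existing columns are name and age:
-- add the new column (initially NULL for all existing rows)
ALTER TABLE dbName.tblName ADD COLUMNS (id STRING);
-- rewrite the table, backfilling the default value into the new column
INSERT OVERWRITE dbName.tblName SELECT name, age, '-1' FROM dbName.tblName;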

In Teradata, get the columns/fields used in join and where conditions, and their respective tables, without parsing the query

I am trying to automate some performance checks on queries in Teradata.
As part of that, I want to check whether the columns used in join conditions are the primary index of their respective tables, and similarly whether the columns used in where conditions are partitioning columns of their respective tables. Is there any direct Teradata query that can give this without parsing the whole query?
Yes, there are two DBC views you can query:
dbc.columnsv
dbc.indicesv
Primary index information is stored in the second view; just search with your table name and database name.
Partitioning information is stored in dbc.columnsv; the PartitioningColumn column has the flag value 'Y' for partitioning columns.
Example:
SELECT DatabaseName, TableName, ColumnName FROM DBC.ColumnsV WHERE PartitioningColumn='Y' AND TableName=<> AND DatabaseName=<>;
SELECT * FROM DBC.IndicesV WHERE TableName=<> AND DatabaseName=<>;
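To narrow the second query to just the primary index columns, one option is the sketch below; it assumes DBC.IndicesV flags a non-partitioned primary index with IndexType 'P' and a partitioned primary index with 'Q':
SELECT DatabaseName, TableName, ColumnName, IndexType
FROM DBC.IndicesV
WHERE IndexType IN ('P','Q')  -- primary index columns only
  AND TableName=<> AND DatabaseName=<>;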

sqlite3 - the philosophy behind sqlite design for this scenario

Suppose we have a file with just one table, named TableA, and this table has just one column, named Text.
Let's say we populate TableA with 3,000,000 strings like these (each line a record):
Many of our patients are incontinent.
Many of our patients are severely disturbed.
Many of our patients need help with dressing.
If I save the file at this point, it is ~326 MB.
Now let's say we want to speed up our queries, so we set our Text column as the primary key (or create an index on it).
If I save the file at this point, it is ~700 MB.
our query:
SELECT Text FROM "TableA" where Text like '% home %'
for the table without the index: ~5.545 s
for the indexed table: ~2.231 s
As far as I know, when we create an index on a column or make a column our primary key, the SQLite engine doesn't need to refer to the table itself (if no other column was requested in the query); it uses the index to answer the query, and hence query execution gets faster.
My question is: in the scenario above, where we have just one column and make that column the primary key as well, why does SQLite hold what seems to be unnecessary data (in this case, ~326 MB of it)? Why not keep just the index/primary-key data?
In SQLite, table rows are stored in the order of the internal rowid column.
Therefore, indexes must be stored separately.
In SQLite 3.8.2 or later, you can create a WITHOUT ROWID table which is stored in order of its primary key values.
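A minimal sketch of that approach for the table in the question:
CREATE TABLE TableA (
  Text TEXT PRIMARY KEY  -- the primary-key b-tree is the table itself
) WITHOUT ROWID;
Because the rows live directly in the primary-key b-tree, no separate index copy of Text is kept, which avoids the duplication the question describes.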

Strange sqlite3 behavior

I'm working on a small SQLite database using the Unix command-line sqlite3 tool. My schema is:
sqlite> .schema
CREATE TABLE status (id text, date integer, status text, mode text);
Now I want to set the column 'mode' to the string "Status" for all entries. However, if I type this:
sqlite> UPDATE status SET mode="Status";
Instead of setting column 'mode' to the string "Status", this sets every entry to the value currently in the column 'status'. However, if I type the following, it behaves as expected:
sqlite> UPDATE status SET mode='Status';
Is this normal behavior?
This is also a FAQ :-
My WHERE clause expression column1="column1" does not work. It causes every row of the table to be returned, not just the rows where column1 has the value "column1".
Use single-quotes, not double-quotes, around string literals in SQL. This is what the SQL standard requires. Your WHERE clause expression should read: column1='column1'
SQL uses double-quotes around identifiers (column or table names) that contain special characters or which are keywords. So double-quotes are a way of escaping identifier names. Hence, when you say column1="column1" that is equivalent to column1=column1, which is obviously always true.
http://www.sqlite.org/faq.html#q24
Yes, that's normal in SQL.
Single quotes are used for string values; double quotes are used for identifiers (like table or column names).
(See the documentation.)
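A quick way to see the difference in an sqlite3 session (the sample row is hypothetical):
CREATE TABLE status (id TEXT, status TEXT, mode TEXT);
INSERT INTO status VALUES ('1', 'open', NULL);
UPDATE status SET mode = "Status";  -- "Status" resolves to the status COLUMN, so mode becomes 'open'
UPDATE status SET mode = 'Status';  -- 'Status' is a string literal, so mode becomes 'Status'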
