Impact of add column on superprojection in vertica DB - projection

I have a conceptual question about Vertica. If I create a table 'abc' with columns a, b, c and ORDER BY a, b, Vertica automatically creates a superprojection for it.
Now, if I alter table 'abc' to add a column 'd', it will create a new superprojection.
The question is, will the 'order by a,b' be impacted in this new superprojection? Will vertica retain this order by in the new superprojection? Also, will it also include the column 'd' to this order by? What is the default behaviour?

Will vertica retain this order by in the new superprojection?
It will retain the order by specified in the initial CREATE TABLE statement.
Also, will it also include the column 'd' to this order by?
No. Vertica only adds the new column to the superprojection's column list; it does not add it to the ORDER BY (this is the default behavior).
Walk through
Let's create the table & add data:
CREATE TABLE public.abc (
a int,
b int,
c int
) ORDER BY a, b;
INSERT INTO public.abc (a, b, c) VALUES (1, 2, 3);
A super-projection is automatically added when data is added to the table:
CREATE PROJECTION public.abc /*+createtype(P)*/
(
a,
b,
c
)
AS
SELECT abc.a,
abc.b,
abc.c
FROM public.abc
ORDER BY abc.a,
abc.b
SEGMENTED BY hash(abc.a, abc.b, abc.c) ALL NODES KSAFE 1;
Let's add a new column to the table:
ALTER TABLE public.abc ADD COLUMN d int;
The new column is added only to the column list and the SELECT list of each superprojection, not to the ORDER BY:
CREATE PROJECTION public.abc /*+createtype(P)*/
(
a,
b,
c,
d -- Added here
)
AS
SELECT abc.a,
abc.b,
abc.c,
abc.d -- Added here
FROM public.abc
ORDER BY abc.a,
abc.b
SEGMENTED BY hash(abc.a, abc.b, abc.c) ALL NODES KSAFE 1;

Related

How to find cardinality of all columns in kusto?

I'm trying to find the number of distinct values in all columns for some query. I found that dcount works well, but you have to supply the specific column. I want to do this on all columns, where the column names and the number of columns are dynamic.
You'll have to explicitly include all columns of interest.
Note that any additional column you add to the query will increase the query's resource utilization, so if you have any knowledge about columns that are likely to be of high cardinality, consider including only those.
FWIW: you can generate the query (for all columns, with the caveat above) dynamically, then invoke the result of this:
let tableName = "my_table";
let datetime_column_name = "my_datetime_column";
let lookback_period = 1h;
let column_names = toscalar(
table(tableName)
| getschema
| summarize make_set(ColumnName)
);
print query = strcat(
tableName,
"\n| where ",
datetime_column_name,
" > ago(timespan(",
lookback_period,
"))\n| summarize dcount(",
strcat_array(column_names, "),\ndcount("),
")")
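The same query text can also be assembled on the client once the column names are known. A minimal Python sketch, assuming the column list has already been fetched (for example with getschema); the function name build_dcount_query and the sample table and column names are hypothetical:

```python
def build_dcount_query(table_name, datetime_column, lookback, column_names):
    """Return KQL text that computes dcount() for every listed column."""
    # One dcount(...) aggregate per column, mirroring the strcat_array call above.
    aggregates = ",\n    ".join(f"dcount({name})" for name in column_names)
    return (
        f"{table_name}\n"
        f"| where {datetime_column} > ago({lookback})\n"
        f"| summarize {aggregates}"
    )


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    print(build_dcount_query("my_table", "my_datetime_column", "1h", ["a", "b"]))
```

Column names that contain spaces or other special characters would need KQL's bracket quoting (['name']), which this sketch does not handle.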

How to add new keys and values to existing hash table in R?

Using the hash package in R, I created a hash table with keys and values. I want to add new keys and values to the existing hash table. Is there any way to do this?
Suppose
ht <- hash(keys = letters, values = 1:26)
I need to add new keys and values to ht.
Is there any way other than assigning them one at a time, for example:
ht$zzz <- 45
The documentation for the hash package provides a number of syntax varieties for adding new elements to a hash:
h <- hash()
.set( h, keys=letters, values=1:26 )
.set( h, a="foo", b="bar", c="baz" )
.set( h, c( aa="foo", ab="bar", ac="baz" ) )
The first .set option would seem to be the best for bulk inserts of key-value pairs. You would only need a pair of vectors, ordered so that each key lines up with its value.

Converting date to a varchar using "like" in pl/sql

I need to search through a few million rows for a year that is passed as a parameter to a method. The year comes in as a varchar.
This is the query I'm working with:
SELECT X,Y
FROM A
WHERE mch_code = 'KN'
AND contract = '15KTN'
AND to_char(cre_date, 'YYYY') = year_;
cre_date is of type DATE and year_ is of type VARCHAR.
This query takes around 25 minutes to complete.
Does anyone know a different approach that would execute faster?
Please help.
This didn't work out:
SELECT X,Y
FROM A
WHERE mch_code = 'KN'
AND contract = '15KTN'
AND cre_date LIKE '%2013';
The reason might be that 'cre_date' and '%2013' are of different types.
If you have an index on (mch_code, contract, cre_date) columns, you can improve performance by doing something like:
select x, y
from a
where mch_code = 'KN'
and contract = '15KTN'
and cre_date >= to_date('01/01/'||year_, 'dd/mm/yyyy')
and cre_date < add_months(to_date('01/01/'||year_, 'dd/mm/yyyy'), 12);
Even better would be to declare the start of the year as a DATE variable prior to running the sql, eg:
v_year_dt := to_date('01/01/'||year_, 'dd/mm/yyyy');
which would make the query:
select x, y
from a
where mch_code = 'KN'
and contract = '15KTN'
and cre_date >= v_year_dt
and cre_date < add_months(v_year_dt, 12);
If you don't have an index on those three columns, you could create a function-based index on (mch_code, contract, to_char(cre_date, 'yyyy')), which should help speed up your original query, depending on the percentage of rows you're expecting to select. It may help even more if you added the x and y columns to the index, so that no table access was required at all.
Alternatively, you could think about partitioning the table on cre_date, monthly or yearly.
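The effect of comparing the bare column against a date range, rather than wrapping it in a function, can be seen in a query plan. A rough sketch using Python's built-in sqlite3 module as a stand-in for Oracle: strftime plays the role of to_char, dates are stored as ISO text, and the index name a_idx is made up. Oracle's optimizer differs in detail, so this only illustrates the principle:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a (mch_code TEXT, contract TEXT, cre_date TEXT, x INT, y INT)"
)
# Hypothetical index matching the (mch_code, contract, cre_date) suggestion above.
conn.execute("CREATE INDEX a_idx ON a (mch_code, contract, cre_date)")


def plan(sql, params):
    """Return the detail text of each step in SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


year = "2013"

# Function applied to the column: cre_date cannot serve as an index key.
function_plan = plan(
    "SELECT x, y FROM a WHERE mch_code = 'KN' AND contract = '15KTN' "
    "AND strftime('%Y', cre_date) = ?",
    (year,),
)

# Half-open range on the bare column: all three index columns can be used.
range_plan = plan(
    "SELECT x, y FROM a WHERE mch_code = 'KN' AND contract = '15KTN' "
    "AND cre_date >= ? AND cre_date < ?",
    (f"{year}-01-01", f"{int(year) + 1}-01-01"),
)

print(function_plan)
print(range_plan)
```

In the printed plans, only the range version lists cre_date among the index constraints.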
The reason your query is slow is that you're applying a function to a column on every row in your table. Let's try it another way:
SELECT X,Y
FROM A
WHERE mch_code = 'KN' AND
contract = '15KTN' AND
CRE_DATE >= TO_DATE('01/01/' || year_, 'DD/MM/YYYY') AND
CRE_DATE < TO_DATE('01/01/' || year_, 'DD/MM/YYYY') + INTERVAL '1' YEAR; -- half-open range: BETWEEN would also match midnight on 1 January of the following year
This eliminates the need to apply a function against every row in the table, and should allow any indexes on CRE_DATE to be used.
Best of luck.
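You can also try the EXTRACT function (note that, like to_char, it is applied to every row, so a plain index on cre_date will not be used):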
SELECT X,Y
FROM A
WHERE mch_code = 'KN'
AND contract = '15KTN'
AND EXTRACT(YEAR FROM cre_date) = year_;

How to get count of all columns of a table, which are not null using PL/SQL?

Is there a PL/SQL function that takes a table name and returns the number of columns that contain no null values?
I have a huge number of columns and don't want to query each one individually. I'm new to PL/SQL and would highly appreciate your help.
As suggested in a comment on the question, one approach is the following query. It reads num_nulls from the optimizer statistics, so the counts are only as current as last_analyzed:
SELECT t.table_name,
t.num_rows,
c.column_name,
c.num_nulls,
t.num_rows - c.num_nulls num_not_nulls,
c.data_type,
c.last_analyzed
FROM all_tab_cols c
JOIN sys.all_all_tables t ON c.owner = t.owner AND c.table_name = t.table_name
WHERE c.table_name LIKE 'EXT%'
AND c.nullable = 'Y'
GROUP BY t.table_name,
t.num_rows,
c.column_name,
c.num_nulls,
c.data_type,
c.last_analyzed
ORDER BY t.table_name,
c.column_name
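If exact figures are needed rather than statistics, a different technique is to generate one query that compares COUNT(column) with COUNT(*) for every column. A minimal sketch of that idea, using Python's built-in sqlite3 module instead of Oracle so that it runs anywhere; the function name columns_without_nulls and the table ext_demo are hypothetical, and the table name is interpolated into the SQL, so it must come from a trusted source:

```python
import sqlite3


def columns_without_nulls(conn, table_name):
    """Return the names of the columns in table_name that contain no NULLs."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table_name}")')]
    if not columns:
        return []  # unknown table
    # COUNT(col) skips NULLs while COUNT(*) does not; equal counts mean no NULLs.
    select_list = ", ".join(f'COUNT("{col}") = COUNT(*)' for col in columns)
    flags = conn.execute(f'SELECT {select_list} FROM "{table_name}"').fetchone()
    return [col for col, no_nulls in zip(columns, flags) if no_nulls]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ext_demo (id INT, name TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO ext_demo VALUES (?, ?, ?)",
    [(1, "a", None), (2, "b", "x")],
)

result = columns_without_nulls(conn, "ext_demo")
print(result, len(result))
```

Unlike the statistics-based query, this scans the whole table, so it trades speed for accuracy.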

how to find the total sum of particular column using C#?

Suppose there is a column called INCOME_PER_DAY in the database. I load the data of this column into a GridView.
Now I want to find the total sum of the INCOME_PER_DAY column using C#. How can I do this?
Please tell me.
Do this on server-side (database).
Return 2 recordsets: one with details and the second one (one row) with SUM(INCOME_PER_DAY).
or use this query:
SELECT ROW_TYPE = 1, FIELD1, FIELD2, FIELD3, INCOME_PER_DAY FROM MYSALES
UNION ALL
SELECT ROW_TYPE = 2, NULL, NULL, NULL, INCOME_PER_DAY = SUM(INCOME_PER_DAY) FROM MYSALES
ROW_TYPE = 1 - detail row
ROW_TYPE = 2 - summary row
On the page, for example with a DataGrid, check ROW_TYPE in the ItemDataBound event handler to apply the appropriate CSS style (detail or summary).
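The detail-plus-summary result set can be tried out without a database server. A small sketch using Python's built-in sqlite3 module; the sample rows are invented, the table is reduced to two columns, and the column aliases use the standard "1 AS ROW_TYPE" form because SQLite does not accept the "ROW_TYPE = 1" alias syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MYSALES (FIELD1 TEXT, INCOME_PER_DAY REAL)")
# Invented sample data, for illustration only.
conn.executemany(
    "INSERT INTO MYSALES VALUES (?, ?)",
    [("mon", 100.0), ("tue", 250.5)],
)

rows = conn.execute(
    "SELECT 1 AS ROW_TYPE, FIELD1, INCOME_PER_DAY FROM MYSALES "
    "UNION ALL "
    "SELECT 2, NULL, SUM(INCOME_PER_DAY) FROM MYSALES "
    "ORDER BY ROW_TYPE"
).fetchall()

# ROW_TYPE = 1 rows are details; the single ROW_TYPE = 2 row carries the total.
details = [row for row in rows if row[0] == 1]
summary = [row for row in rows if row[0] == 2][0]
total = summary[2]

# Client-side alternative: add up the detail rows one by one.
client_total = sum(row[2] for row in details)

print(total, client_total)
```

Both totals agree; computing the sum in the database avoids a second pass over the rows on the client.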
Unfortunately, if you do it on the client side, you have to loop through the column and add up the rows one by one.
