Forcing a PI scan for a Teradata table

I have a table (say T1) which has a PI defined on columns c1, c2 and c3.
In one query I need to filter only on c1, and for some reason I can't create a secondary index on column c1.
I tried writing a query like this:
select *
from T1
where C1 = <some Value>
and c2 = c2
and c3 = c3
Given that the conditions on c2 and c3 are tautologies, the result set is not affected. However, I was expecting this to fool Teradata into using the PI for this query, which does not happen.
Any explanation?

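Teradata's PI is hash-based, so all three columns must be referenced with ANDed equality conditions. The row hash is computed from the values of all PI columns together, and a condition like c2 = c2 supplies no value to hash.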
There's no way to get PI-access when only a single column is known.
But of course you could create a Secondary Index (why did it fail?). If this is a recurring requirement, you should consider changing the PI and/or adding partitioning.
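To illustrate why a partial key cannot be used, here is a toy model in Python. It is only a sketch: the number of AMPs, the amp_for helper and the use of SHA-256 are assumptions made for illustration and are not Teradata's actual hashing algorithm. The point is just that the target AMP is derived from all PI values combined, so rows sharing the same c1 end up scattered.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size, chosen only for this illustration


def amp_for(*pi_values):
    """Toy stand-in for a row hash: map the full PI tuple to one AMP."""
    key = "|".join(str(v) for v in pi_values).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


# With all three PI values known, exactly one AMP has to be visited.
target_amp = amp_for(1, 2, 3)

# With only c1 known, matching rows can live on any AMP, because the
# unknown c2 and c3 values also feed into the hash.
rows_with_c1 = [(1, c2, c3) for c2 in range(5) for c3 in range(5)]
amps_holding_c1 = {amp_for(*row) for row in rows_with_c1}
```

In this model, amps_holding_c1 contains several AMPs, which is why a filter on c1 alone leads to an all-AMP scan unless some other index covers c1.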

You can create a hash index or a single-table join index with a primary index on column 'C1'. This gives you single-AMP or group-AMP access to the index, followed by row-ID access to the base table. The other option, if 'C1' is unique, is to create a USI on that column. This gives you at best a two-AMP operation for a single value, or a group-AMP operation if you supply multiple values when qualifying.
I am not aware of a method to query the base table as you have it defined and access the primary index without fully qualifying it.


SQLite: Add up integer tables based on shared indicators

I am aggregating numbers from different sqlite databases into a single output database table.
I need to add up integer columns i1,i2,i3 in the output table based on three indicating columns a,b,c that tell me which rows to update:
ATTACH DATABASE "out.db" AS output;
INSERT INTO output.rows(a,b,c,i1,i2,i3)
SELECT DISTINCT "some_value", b, c, 0, 0, 0 FROM main.rows
ON CONFLICT IGNORE;
#THE FOLLOWING LINES MIGHT SHOW WHAT I MEAN...
UPDATE output.rows SET i1=i1+i1_,i2=i2+i2_, i3=i3+i3_
WHERE a="some_value" AND b=b_ and c=c_
SELECT i1_, i2_, i3_, b_, c_ FROM main.rows;
I do not want to type in all the combinations of a,b,c. As you can see, a does not come from main but from external information (the filename).
In newer versions of SQLite that support UPSERT, the following seems to work:
ATTACH DATABASE "$out.db" AS output;
INSERT INTO output.rows(a,b,c,i1,i2,i3)
SELECT "some_value", b, c, i1, i2, i3 FROM main.rows WHERE true
ON CONFLICT (a,b,c) DO UPDATE SET i1=i1+i1, i2=i2+i2, i3=i3+i3;
In my case, the columns i1,i2,i3 coming from main actually had different names (say I1,I2,I3) than their counterparts in output, so the UPDATE was clearer (i1=i1+I1). I could not reference the source column as main.rows.i1 inside the UPDATE clause. If you know how to solve that ambiguity, please comment.
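On the ambiguity asked about above: SQLite's UPSERT syntax provides the excluded. qualifier, which refers to the row that would have been inserted, while an unqualified column name refers to the existing row. A minimal sketch using Python's sqlite3 module, assuming SQLite 3.24 or newer and a UNIQUE constraint on (a, b, c); the sample data is made up and an in-memory database stands in for out.db:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    ATTACH DATABASE ':memory:' AS output;
    CREATE TABLE main.rows(b TEXT, c TEXT, i1 INTEGER, i2 INTEGER, i3 INTEGER);
    CREATE TABLE output.rows(
        a TEXT, b TEXT, c TEXT, i1 INTEGER, i2 INTEGER, i3 INTEGER,
        UNIQUE(a, b, c)  -- ON CONFLICT (a, b, c) needs a matching unique index
    );
    INSERT INTO main.rows VALUES ('x', 'y', 1, 2, 3), ('x', 'z', 10, 20, 30);
""")

# excluded.i1 is the incoming value; plain i1 is the value already stored.
UPSERT = """
    INSERT INTO output.rows(a, b, c, i1, i2, i3)
    SELECT ?, b, c, i1, i2, i3 FROM main.rows WHERE true
    ON CONFLICT (a, b, c) DO UPDATE SET
        i1 = i1 + excluded.i1,
        i2 = i2 + excluded.i2,
        i3 = i3 + excluded.i3
"""

# Run the merge twice to simulate aggregating two source databases
# that carry the same indicator values.
conn.execute(UPSERT, ("some_value",))
conn.execute(UPSERT, ("some_value",))

totals = conn.execute(
    "SELECT b, c, i1, i2, i3 FROM output.rows ORDER BY c"
).fetchall()
```

After the second run each output row holds the sum of both inserts, rather than a doubled copy of its own previous value.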

Express conditions on two consecutive variable length relationships?

How can I express conditions on two consecutive variable-length relationships?
Consider this partial query:
MATCH(t1:Type{myID: 1})-[r:relType]->(:Type)-[rels:relType*0..]-(t2:Type{myID:100})
WHERE r.attr1>10
Basically I am trying to say that there could be one or more relations from t1 to t2. The first relation r should satisfy a given condition on its attribute.
If this is the only relation between the two nodes, then that is all that is needed.
If at least one more relation exists, I want to add another condition such as:
WHERE r.attr1>10 AND r_next.attr2> r_prev.attr2+r_prev.attr1
where r_next and r_prev are consecutive relations: ()-[r_prev]->()-[r_next]-(). Note that at the first step r_prev is the first relation r.
I know rels is a collection but I do not know how to express such a condition.
Consecutive comparison like this isn't easy at this time, and it can't currently be evaluated during expansion.
You can do some filtering on this after, but it will be ugly.
We'll use apoc.coll.pairsMin() from the APOC Procedures library, which takes a collection and returns a list of adjacent pairs.
MATCH (t1:Type{myID: 1}), (t2:Type{myID:100})
MATCH (t1)-[r:relType]->(:Type)-[rels:relType*0..]-(t2)
WHERE r.attr1>10
WITH t1, t2, apoc.coll.pairsMin([r] + rels) as pairs
WHERE all(pair in pairs WHERE pair[0].attr1 + pair[0].attr2 < pair[1].attr2)
RETURN t1, t2 //or whatever you want to return from this
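For readers unfamiliar with the adjacent-pairs idea, here is a rough Python equivalent of the filter. It is only a sketch: relationships are modelled as plain dicts with attr1 and attr2 keys, the helper names are made up, and the path is assumed to start with the first relation r from the question.

```python
def pairs_min(items):
    """Return adjacent pairs of a list (no trailing pair with a missing partner)."""
    return list(zip(items, items[1:]))


def path_satisfies(rels):
    """Check the question's conditions on an ordered list of relationships.

    rels[0] plays the role of r; every later relationship must satisfy
    r_next.attr2 > r_prev.attr2 + r_prev.attr1.
    """
    if not rels or rels[0]["attr1"] <= 10:
        return False
    return all(
        nxt["attr2"] > prev["attr2"] + prev["attr1"]
        for prev, nxt in pairs_min(rels)
    )


good_path = [
    {"attr1": 11, "attr2": 1},
    {"attr1": 0, "attr2": 13},
    {"attr1": 2, "attr2": 14},
]
bad_path = [
    {"attr1": 11, "attr2": 1},
    {"attr1": 0, "attr2": 12},  # 12 is not greater than 1 + 11
]
```

A path with a single relationship produces no pairs, so only the attr1 > 10 check on the first relation applies.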

Autofill based on list and value of a cell

I'm making a spreadsheet to help me with my personal accounting. I'm trying to create a formula in LibreOffice Calc that will search in a given cell for a number of different text strings and if found return a text string.
For example, the formula should search for "burger" or "McDonalds" in $C6 and then return "Food" to $E6. It should not be case sensitive. It also needs to match partial strings, as in the case of Burger King. I need it to be able to search for other keywords and return their values as well, like "AutoZone" returning "Auto" and "NewEgg" returning "Electronics".
I've had a tough time finding any kind of solution to this. The closest I could get was with a MATCH formula, but once I nested it in an IF it would not work. I've also tried nested IF with OR; no joy on either.
Examples:
=IF(OR(D10="*hulu*",D10="*netflix*",D10="*movie*",D10="*theature*",D10="*stadium*",D10="*google*music*")=1,"Entertainment",IF(OR(D10="*taco*",D10="*burger*",D10="*mcdonald*",D10="*dq*",D10="*tokyo*",D10="*wendy*",D10="*cafe*",D10="*wing*",D10="*tropical*",D10="*kfc*",D10="*olive*",D10="*caesar*",D10="*costa*vida*",D10="*Carl*",D10="*in*n*out*",D10="*golden*corral*",D10="*nija*",D10="*arby*",D10="*Domino*",D10="*Subway*",D10="*Iggy*",D10="*Pizza*Hut*",D10="*Rumbi*",D10="*Custard*",D10="*Jimmy*")=1,"Food",IF(OR(D10="*autozone*",D10="*Napa*",D10="*OREILLY*")=1,"AUTO","-")))
I can create a separate table and make a lookup reference. Put another way, I need something that does the opposite of what VLOOKUP and HLOOKUP do: return the header value for any match found in the given columns.
Something like:
=IF(NOT(ISNA(MATCH(A1,B3:B99))),B2,IF(NOT(ISNA(MATCH(A1,C3:C99))),c2,0))
Here A1 is the value being tested, B2 and C2 are the headers, and the search covers the cells below them.
As per my comments, try this:
=IF(SUM(LEN(G150)-LEN(SUBSTITUTE(LOWER(G150),{"hulu","netflix","movie","theater"," stadium"},"")))>0,"Entertainment",IF(SUM(LEN(G150)-LEN(SUBSTITUTE(LOWER(G150),{"burger","taco","vida","cafe","wing","dairy","mcdonald","wendy","kfc","pizza","carl","domino","ceaser","olive","jimmy","custard","subway","arby"},"")))>0,"Food",IF(SUM(LEN(G150)-LEN(SUBSTITUTE(LOWER(G150),{"autozone","napa","oreilly"},"")))>0,"AUTO","-")))
It is an Array formula and must be confirmed with Ctrl-Shift-Enter.
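The logic of that formula (lower-case the cell, then test whether any keyword occurs in it as a substring) may be easier to see outside the spreadsheet. A small Python sketch of the same idea, with a shortened keyword list and a made-up function name, offered as an illustration rather than a replacement for the formula:

```python
# Category -> keywords; checked in this order, first hit wins.
CATEGORIES = {
    "Entertainment": ["hulu", "netflix", "movie", "theater", "stadium"],
    "Food": ["burger", "taco", "mcdonald", "wendy", "kfc", "pizza"],
    "Auto": ["autozone", "napa", "oreilly"],
}


def categorize(description, default="-"):
    """Return the first category with a keyword contained in the description."""
    text = description.lower()  # makes the match case-insensitive
    for category, keywords in CATEGORIES.items():
        if any(keyword in text for keyword in keywords):
            return category
    return default
```

Keeping the keywords in lower case matters for the same reason as in the formula: the text being searched has already been lower-cased.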
You can do this in various ways using INDEX/MATCH/VLOOKUP formulae. A couple of caveats: I am using Excel and have never used LibreOffice, so I hope this works; and you will need a mapping table that maps McDonalds to Food, Google Music to Entertainment, and so on (for all possible cases).
Let's assume your mapping table in your screenshot is A6 to E9.
The formula in E10: =VLOOKUP(C10,$C$6:$E$9,3,0)
Explanation: it looks up C10 (Burger King) in the first column of the table $C$6:$E$9 and returns the value from the 3rd column of that table (E is the 3rd column counting from C). The 0 requests an exact match; entering 1 instead requests an approximate match, which expects the lookup column to be sorted.
Note: if your mapping table is in, say, columns G and H (service name in G and type of service in H), and you are unsure how many entries it will have, you can change the formula to =VLOOKUP(C10,$G:$H,2,0), or =VLOOKUP(C10,$G:$H,2,1) for an approximate match. Here, 3 is replaced by 2 because H is the 2nd column counting from G, where C10 is looked up.
EDIT: Doing the lookup with INDEX and MATCH for an approximate text match. This could be the solution you are asking about in your last comment(?)
Two things need to be done: (a) prepare the reference table entries, and (b) apply the INDEX/MATCH formula.
Part a - in your reference table, wrap each lookup value in two asterisks, the way you wrote them in the example in your question: *movie*, *wendy*, etc. That's really the trick that enables us to look up by cell reference. The corresponding return values, like Entertainment/Food/etc., need to be full words of their own. Let's assume you have this table prepared in G6:H26 (G is the lookup value, H is the return value).
Part b - in your cell F6 (as per your screenshot), you can try this formula: =INDEX($H$6:$H$26,MATCH(C6,$G$6:$G$26,0))
That really is just the INDEX/MATCH replacement for VLOOKUP.
As the values stored in column G are wrapped in asterisks, the MATCH on cell C6 will do a partial match.

SQLite - Selecting rows from table using a comparer function

I have a table in my SQLite database where one of the columns is just a free text.
Also, I have a custom function defined that calculates the Levenshtein distance between two given strings. Basically it is just a comparer function that returns an integer value (the distance between the two strings).
My goal is to retrieve ALL the rows from that table that share a distance lower than a given value D between them.
Is this possible using queries? I thought GROUP BY would be the answer but I haven't gotten any semi decent results I can share.
Thanks in advance for any help provided.
You have to join the table with itself:
SELECT *
FROM MyTable T1 JOIN
MyTable T2 ON T1.ID < T2.ID AND
LDist(T1.TextColumn, T2.TextColumn) < 42
(The ID comparison prevents returning two result records for the same pair.)
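For anyone who wants to try this end to end, here is a minimal sketch using Python's sqlite3 module. The Levenshtein implementation, the sample rows and the threshold of 2 are assumptions for illustration; the question's own LDist function is not shown, so a simple one is registered with create_function.

```python
import sqlite3


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


conn = sqlite3.connect(":memory:")
conn.create_function("LDist", 2, levenshtein)  # expose it to SQL as LDist(x, y)
conn.execute("CREATE TABLE MyTable(ID INTEGER PRIMARY KEY, TextColumn TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "kitten"), (2, "sitting"), (3, "mitten"), (4, "zebra")],
)

# Self-join; T1.ID < T2.ID keeps each pair once and skips self-pairs.
close_pairs = conn.execute(
    """
    SELECT T1.ID, T2.ID
    FROM MyTable T1 JOIN
         MyTable T2 ON T1.ID < T2.ID AND
                       LDist(T1.TextColumn, T2.TextColumn) < ?
    ORDER BY T1.ID, T2.ID
    """,
    (2,),
).fetchall()
```

Note that the self-join calls the function once per candidate pair, so the cost grows quadratically with the number of rows.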

How to create a view that returns a 2x2 (or NxN) matrix of results

So I know enough SQL just to be really dangerous (I don't normally work the back-end) but cannot get the following view to be created successfully ;) The result set I'm after is a data set that has rows assigned as a column alias from multiple tables (instead of a 1xN flat of all columns). There is a many-to-one relationship when looking at the main table, based on foreign keys associated to the row id of the appropriate related table.
Ideally I'd like a data set that looks like this in the return:
dataset.transaction_row[n]: col1, col2, col3, coln... (columns from the transaction table)
dataset.category_row[n]: col1, col2, col3, coln... (columns from the category table)
and so on...
I get the following error:
Query Error: near "AS": syntax error Unable to execute statement
From:
CREATE VIEW view_unreconciled_transactions
AS SELECT account_transaction.* AS transaction_row,
category.* AS category_row,
memorized.name_rule_replace OR account_transaction.name AS payee
FROM account_transaction
LEFT JOIN memorized ON account_transaction.memorized_key = memorized.id
LEFT JOIN category ON account_transaction.category_key = category.id
WHERE status != 2
ORDER BY account_transaction.dt_posted DESC
It seems easy enough since the result-column selector is repeatable which includes expressions (referencing sqlite's syntax diagrams). In reference to the error, I'm assuming it's complaining about the 2nd 'AS' where I'm trying to get table.* assigned as an alias. Any help in the right direction is appreciated. If I had to, I suppose I could explicitly state all columns but that feels like a kludge.
The AS modifier can only be applied to a single column, not to a collection such as the * you used. You will have to break them out into specific names (which is best practice IMHO anyway).
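A minimal sketch of what that looks like, using Python's sqlite3 module. The two tables are cut down to a few hypothetical columns (the real schema is not shown in the question), and the alias names are made up; prefixing each alias with its table name is just one way to keep the two sources apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of the question's tables.
    CREATE TABLE category(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE account_transaction(
        id INTEGER PRIMARY KEY, name TEXT, category_key INTEGER, status INTEGER
    );
    INSERT INTO category VALUES (1, 'Food');
    INSERT INTO account_transaction VALUES (10, 'Burger King', 1, 0);
""")

# Aliasing table.* is rejected by the parser, as in the question.
try:
    conn.execute(
        "CREATE VIEW bad AS "
        "SELECT account_transaction.* AS transaction_row FROM account_transaction"
    )
    star_alias_rejected = False
except sqlite3.OperationalError:
    star_alias_rejected = True

# Listing each column with its own alias works.
conn.execute("""
    CREATE VIEW view_unreconciled_transactions AS
    SELECT account_transaction.id   AS transaction_id,
           account_transaction.name AS transaction_name,
           category.id              AS category_id,
           category.name            AS category_name
    FROM account_transaction
    LEFT JOIN category ON account_transaction.category_key = category.id
    WHERE account_transaction.status != 2
""")

cursor = conn.execute("SELECT * FROM view_unreconciled_transactions")
columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()
```

The client code can then split each result row back into its transaction and category parts by the alias prefix.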
It looks like you want to make a "pivot table". They can be tricky to make in a database. I can say that if you want a data result where each row comes from a different source table, and the columns from each table are IDENTICAL, then you could try using a UNION statement to join the different results together as if they were one dataset.
NOTE that the columns all take their naming cue from the first dataset in a UNION and the datatype all need to be the same.
