Inserting Duplicate Records into a Temporary Table - teradata

I have a table ABC with duplicate records. I want to insert only the duplicate records into another table, ABC_DUPE, in the same schema using BTEQ.
Any suggestions?
Thanks,
Mukesh

You can use a QUALIFY clause to identify and output duplicates.
Since you didn't share your table, consider the following ABC table:
+----+----+----+
| f1 | f2 | f3 |
+----+----+----+
| 1 | a | x |
| 1 | b | y |
| 2 | a | z |
| 2 | b | w |
| 2 | a | n |
+----+----+----+
Where a unique record is determined by using fields f1 and f2. In this example the record where f1=2 and f2='a' is a duplicate with f3 values z and n. To output these we use qualify:
SELECT *
FROM ABC
QUALIFY COUNT(*) OVER (PARTITION BY f1, f2) > 1;
QUALIFY filters rows on the result of a window function, which determines the records included in the result set. Here we use the window function COUNT(*), partitioned by our composite key f1, f2, and keep only the records where the count over that partition is greater than 1.
This will output:
+----+----+----+
| f1 | f2 | f3 |
+----+----+----+
| 2 | a | z |
| 2 | a | n |
+----+----+----+
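You can use this in a CREATE TABLE ... AS statement (Teradata needs WITH DATA here to copy the rows and not just the structure):
CREATE TABLE ABC_DUPE AS
(
SELECT *
FROM ABC
QUALIFY COUNT(*) OVER (PARTITION BY f1, f2) > 1
) WITH DATA PRIMARY INDEX (f1, f2);
If ABC_DUPE already exists, the same SELECT can feed an INSERT INTO ABC_DUPE instead.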
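For anyone who wants to try the idea without a Teradata system, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. SQLite has no QUALIFY, so this is an assumption-laden translation rather than the Teradata syntax: the window count is computed in a subquery and filtered in the outer WHERE. The sample rows are the ones from the ABC table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ABC (f1 INTEGER, f2 TEXT, f3 TEXT)")
conn.executemany(
    "INSERT INTO ABC VALUES (?, ?, ?)",
    [(1, "a", "x"), (1, "b", "y"), (2, "a", "z"), (2, "b", "w"), (2, "a", "n")],
)

# SQLite stand-in for QUALIFY: compute the per-key count in a subquery,
# then keep only the rows whose (f1, f2) key occurs more than once.
conn.execute(
    """
    CREATE TABLE ABC_DUPE AS
    SELECT f1, f2, f3
    FROM (SELECT ABC.*, COUNT(*) OVER (PARTITION BY f1, f2) AS cnt FROM ABC)
    WHERE cnt > 1
    """
)

dupes = sorted(conn.execute("SELECT f1, f2, f3 FROM ABC_DUPE").fetchall())
print(dupes)
```

Only the two rows sharing the key (2, 'a') end up in ABC_DUPE, the same rows the QUALIFY query returns.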

Storing attributes with multiple values in relational database

I am storing product attributes in a relational table in a MariaDB database the following way:
I have a main table, called Products, which provides the name, description, and other simple information about a product, and another table, ProductAttributes, with the following structure: Id|ProductId|Attribute|Value, where Id is an autoincremented primary key and ProductId is a reference to a row in the Products table.
I can store simple attribute/value relations for a product this way, e.g. the height, weight, or length of a product. My problems start when a product's attribute, e.g. color, can have multiple possible values.
I could add multiple rows to the ProductAttributes table when storing multi-valued attributes, e.g.:
1|yy|color|red
2|yy|color|blue
With this schema I can easily retrieve a single product's attributes, but I am having trouble comparing two products based on their attributes.
Is there any other way to store multiple values for a single attribute in a relational database while keeping them searchable?
Right now, to find products with similar attributes, I run a query like this:
SELECT * FROM ProductAttributes base
INNER JOIN ProductAttributes compare ON compare.ProductId != base.ProductId
WHERE base.Attribute = compare.Attribute
AND base.Value = compare.Value
AND base.ProductId = 'x'
GROUP BY compare.ProductId
My problem is that this query returns products with both a red and a blue color as similar to products with only a blue color.
By the way, I cannot change my attributes table to a one-attribute-per-column representation, because I do not know up front how many attributes I will have, and even if I did, there are far too many possible attributes, and too many differences between product categories, to represent this in a traditional table.
A further complication is that I also want to compare products that have missing attributes. E.g., if one product has a length attribute specified but another has none, they could still be similar. Right now, to make this kind of comparison, I transpose my attributes table into a simple table in the background and run this query on it:
SELECT b.ProductId as BaseProduct, s.ProductId as SimProduct
FROM tmp_transposed_product_attributes b
CROSS JOIN tmp_transposed_product_attributes s ON b.ProductId != s.ProductId
WHERE (b.attribute1 = s.attribute1 OR b.attribute1 IS NULL OR s.attribute1 IS NULL)
AND (b.attribute2 = s.attribute2 OR b.attribute2 IS NULL OR s.attribute2 IS NULL) ...
If I'm following correctly for the product comparison, I like to use EXISTS or NOT EXISTS to help find things like that, which may also help avoid having to transpose the data.
For example, given this sample table data:
MariaDB [test]> select * from productattributes;
+----+-----------+-----------+-------+
| id | productID | attribute | value |
+----+-----------+-----------+-------+
| 1 | yy | height | 5 |
| 2 | yy | color | red |
| 3 | yy | weight | 10 |
| 4 | yy | length | 6 |
| 5 | yy | color | blue |
| 6 | zz | color | white |
| 7 | zz | height | 5 |
| 8 | zz | length | 8 |
+----+-----------+-----------+-------+
8 rows in set (0.00 sec)
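To find the attribute/value pairs of the two products that have no exact match in the other product (that is, to drop the pairs that are the same in both), use a NOT EXISTS query against the same table, like so. Attributes that only one product has, such as weight, also show up here: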
MariaDB [test]> SELECT * FROM `productattributes` pA
-> WHERE productID IN ('yy', 'zz')
-> AND NOT EXISTS (SELECT * FROM productattributes pB
-> WHERE pA.attribute = pB.attribute
-> AND pA.value = pB.value
-> AND pA.productID != pB.productID)
-> ORDER BY productID, attribute;
+----+-----------+-----------+-------+
| id | productID | attribute | value |
+----+-----------+-----------+-------+
| 2 | yy | color | red |
| 5 | yy | color | blue |
| 4 | yy | length | 6 |
| 3 | yy | weight | 10 |
| 6 | zz | color | white |
| 8 | zz | length | 8 |
+----+-----------+-----------+-------+
6 rows in set (0.00 sec)
Then to find attribute/value pairs that ARE the same between the two, simply remove the NOT portion of the query:
MariaDB [test]> SELECT * FROM `productattributes` pA
-> WHERE productID IN ('yy', 'zz')
-> AND EXISTS (SELECT * FROM productattributes pB
-> WHERE pA.attribute = pB.attribute
-> AND pA.value = pB.value
-> AND pA.productID != pB.productID)
-> ORDER BY productID, attribute;
+----+-----------+-----------+-------+
| id | productID | attribute | value |
+----+-----------+-----------+-------+
| 1 | yy | height | 5 |
| 7 | zz | height | 5 |
+----+-----------+-----------+-------+
2 rows in set (0.00 sec)
Here's the query without the command line junk:
SELECT * FROM `productattributes` pA
WHERE productID IN ('yy', 'zz')
AND NOT EXISTS (SELECT * FROM productattributes pB
WHERE pA.attribute = pB.attribute
AND pA.value = pB.value
AND pA.productID != pB.productID)
ORDER BY productID, attribute;
EDIT:
To cover the case where an attribute exists for one product but not the other, remove the value check from the query:
MariaDB [test]> SELECT * FROM `productattributes` pA
-> WHERE productID IN ('yy', 'zz')
-> AND NOT EXISTS (SELECT * FROM productattributes pB
-> WHERE pA.attribute = pB.attribute
-> AND pA.productID != pB.productID)
-> ORDER BY productID, attribute;
+----+-----------+-----------+-------+
| id | productID | attribute | value |
+----+-----------+-----------+-------+
| 3 | yy | weight | 10 |
+----+-----------+-----------+-------+
1 row in set (0.00 sec)
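The three queries above can be sketched as one parameterized helper. This is only an illustration: it uses Python's sqlite3 module with an in-memory SQLite database instead of MariaDB, and the compare() helper and its parameters are hypothetical names introduced here, not part of the original answer. It returns row ids so they can be checked against the id column in the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE productattributes "
    "(id INTEGER PRIMARY KEY, productID TEXT, attribute TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO productattributes VALUES (?, ?, ?, ?)",
    [
        (1, "yy", "height", "5"), (2, "yy", "color", "red"),
        (3, "yy", "weight", "10"), (4, "yy", "length", "6"),
        (5, "yy", "color", "blue"), (6, "zz", "color", "white"),
        (7, "zz", "height", "5"), (8, "zz", "length", "8"),
    ],
)


def compare(exists_keyword, match_value=True):
    """Return ids of rows that have (EXISTS) or lack (NOT EXISTS) a row in
    another product with the same attribute and, optionally, the same value."""
    value_check = "AND pA.value = pB.value" if match_value else ""
    sql = f"""
        SELECT pA.id FROM productattributes pA
        WHERE pA.productID IN ('yy', 'zz')
          AND {exists_keyword} (SELECT 1 FROM productattributes pB
                                WHERE pA.attribute = pB.attribute
                                  {value_check}
                                  AND pA.productID != pB.productID)
        ORDER BY pA.productID, pA.attribute, pA.id
    """
    return [row[0] for row in conn.execute(sql)]


shared = compare("EXISTS")                             # same attribute and value
differing = compare("NOT EXISTS")                      # no identical pair elsewhere
unmatched = compare("NOT EXISTS", match_value=False)   # attribute missing elsewhere
print(shared, differing, unmatched)
```

The exists_keyword argument is interpolated directly into the SQL, so it should only ever be one of the two literal keywords and never user input.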

query to transpose rows to columns SQLite

Good afternoon,
I would like to know whether it is possible to write a query that generates columns according to the number of rows I have in my table.
Example:
ID | COD | DIAG
111111111 | Z359 | D
111111112 | Z359 | D
111111112 | Z359 | D
111111113 | Z359 | R
111111113 | Z359 | P
111111113 | Z359 | R
111111114 | Z359 | D
111111114 | Z359 | D
111111114 | Z359 | D
111111115 | Z359 | D
Ideally, the columns would be created according to the number of rows for each ID; if that is not possible, a fixed number of columns would do.
Desired result:
ID | COD1 | DIAG1 | COD2 | DIAG2 | COD3 | DIAG3
111111111 | Z359 | D | | | |
111111112 | Z359 | D | Z359 | D | |
111111113 | Z359 | R | Z359 | P | Z359 | R
111111114 | Z359 | D | Z359 | D | Z359 | D
111111115 | Z359 | D | | | |
sorry my english
Thanks a Lot !!
This first query follows the pattern of the answer to the duplicate question, included here for comparison.
WITH numbered AS (
SELECT row_number() OVER
(PARTITION BY ID ORDER BY COD, DIAG)
AS seq,
t.*
FROM SO58566470 t)
SELECT ID,
max(CASE WHEN seq = 1 THEN COD END) AS COD1,
max(CASE WHEN seq = 1 THEN DIAG END) AS DIAG1,
max(CASE WHEN seq = 2 THEN COD END) AS COD2,
max(CASE WHEN seq = 2 THEN DIAG END) AS DIAG2,
max(CASE WHEN seq = 3 THEN COD END) AS COD3,
max(CASE WHEN seq = 3 THEN DIAG END) AS DIAG3
FROM numbered n
GROUP BY ID;
But that really is a naive use of window functions, since the window could have been used to calculate the other values at the same time. The CTE already collects and traverses the partitioned rows to get the row number, yet the outer query essentially repeats that work by collecting the values again with the aggregate max() functions.
The following query looks longer and perhaps more complicated, but it takes advantage of the partitioned data (i.e. the window) by collecting the transformed values in the same pass. Because window functions necessarily produce a value for each row, and each row sees only itself and the rows before it, the "incomplete" rows have to be filtered out so that only the last row of each partition remains. Note that lag() counts backwards from that last row, so COD1/DIAG1 hold the last row in the window order and COD3/DIAG3 the first. I did not do any kind of profiling on the queries, but I suspect this second query is much more efficient overall.
WITH transform AS (
SELECT id,
lag(COD, 0) OVER IDWin AS COD1,
lag(DIAG, 0) OVER IDWin AS DIAG1,
lag(COD, 1) OVER IDWin AS COD2,
lag(DIAG, 1) OVER IDWin AS DIAG2,
lag(COD, 2) OVER IDWin AS COD3,
lag(DIAG, 2) OVER IDWin AS DIAG3,
row_number() OVER IDWin AS seq
FROM SO58566470 t
WINDOW IDWin AS (PARTITION BY ID ORDER BY COD, DIAG)
ORDER BY ID, SEQ
),
last AS (
SELECT id, max(seq) as maxseq
FROM transform
GROUP BY id
)
SELECT transform.*
FROM transform
JOIN last
ON transform.id = last.id AND transform.seq = last.maxseq
ORDER BY id;
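To compare the two approaches on the sample data from the question, here is a small sketch using Python's sqlite3 module. It assumes an SQLite build with window function support (3.25 or later, 3.28 for the named WINDOW clause) and that ID is stored as an integer. The conditional-aggregation query uses distinct aliases COD1..DIAG3, and the second CTE is named final_row here purely as a hypothetical, more conservative identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SO58566470 (ID INTEGER, COD TEXT, DIAG TEXT)")
conn.executemany(
    "INSERT INTO SO58566470 VALUES (?, ?, ?)",
    [
        (111111111, "Z359", "D"),
        (111111112, "Z359", "D"), (111111112, "Z359", "D"),
        (111111113, "Z359", "R"), (111111113, "Z359", "P"), (111111113, "Z359", "R"),
        (111111114, "Z359", "D"), (111111114, "Z359", "D"), (111111114, "Z359", "D"),
        (111111115, "Z359", "D"),
    ],
)

# Approach 1: number the rows per ID, then pivot with conditional aggregation.
PIVOT_SQL = """
WITH numbered AS (
  SELECT row_number() OVER (PARTITION BY ID ORDER BY COD, DIAG) AS seq, t.*
  FROM SO58566470 t
)
SELECT ID,
       max(CASE WHEN seq = 1 THEN COD END)  AS COD1,
       max(CASE WHEN seq = 1 THEN DIAG END) AS DIAG1,
       max(CASE WHEN seq = 2 THEN COD END)  AS COD2,
       max(CASE WHEN seq = 2 THEN DIAG END) AS DIAG2,
       max(CASE WHEN seq = 3 THEN COD END)  AS COD3,
       max(CASE WHEN seq = 3 THEN DIAG END) AS DIAG3
FROM numbered
GROUP BY ID
ORDER BY ID
"""

# Approach 2: collect earlier rows with lag(), keep only the last row per ID.
LAG_SQL = """
WITH transform AS (
  SELECT ID,
         lag(COD, 0)  OVER IDWin AS COD1, lag(DIAG, 0) OVER IDWin AS DIAG1,
         lag(COD, 1)  OVER IDWin AS COD2, lag(DIAG, 1) OVER IDWin AS DIAG2,
         lag(COD, 2)  OVER IDWin AS COD3, lag(DIAG, 2) OVER IDWin AS DIAG3,
         row_number() OVER IDWin AS seq
  FROM SO58566470
  WINDOW IDWin AS (PARTITION BY ID ORDER BY COD, DIAG)
),
final_row AS (
  SELECT ID, max(seq) AS maxseq FROM transform GROUP BY ID
)
SELECT transform.ID, COD1, DIAG1, COD2, DIAG2, COD3, DIAG3
FROM transform
JOIN final_row
  ON transform.ID = final_row.ID AND transform.seq = final_row.maxseq
ORDER BY transform.ID
"""

pivot = {row[0]: row[1:] for row in conn.execute(PIVOT_SQL)}
lagged = {row[0]: row[1:] for row in conn.execute(LAG_SQL)}

for key in sorted(pivot):
    print(key, pivot[key], lagged[key])
```

Both queries return one row per ID with NULLs in the unused columns. They differ only in column order: for ID 111111113 the first query yields DIAG values P, R, R, while the lag() version yields R, R, P, because it reads backwards from the last row.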

DynamoDB: How to Set Up a "Reverse Lookup GSI"

I'm trying to figure out how to implement a reverse lookup GSI in DynamoDB. I attended an amazing talk about DynamoDB at re:Invent this year (https://youtu.be/HaEPXoXVf2k?t=2674). Around 44 minutes into the talk the idea of a reverse lookup GSI is presented. I can't figure out how to implement this in DynamoDB.
I want to add a single GSI to do a reverse lookup.
My current schema looks like:
I would like to be able to query on just the CXSK. I'm planning on overloading the CXSK and would love to be able to run a query with a begins_with condition on that key.
I'm not sure what I'm missing when I go to create the GSI. I'm not sure what should go in the following fields. I'm also curious whether it makes sense to have an overloaded sort key.
Let's say this is your original table
| pk | sk | prop1 | prop2 | ...
| a | b | xyz | abc
| a | c | lmn | opq
| b | x | rst | lme
| b | b | tuv | opq
On the table above you can run queries like these:
select * where pk = a returns rows 1 and 2
select * where pk = a and sk = b returns row 1
A reverse lookup means you want to query the same data by some other field.
Let's say we want to do it by sk. To do this, we create a GSI with sk as the partition key and pk as the sort key. The GSI's view of the table looks like this.
This will be your GSI1 table:
| pk | sk | prop1 | prop2 | ...
| b | a | xyz | abc
| c | a | lmn | opq
| x | b | rst | lme
| b | b | tuv | opq
On the GSI above you can run queries like these:
select * where pk = b returns rows 1 and 4
select * where pk = b and sk = a returns row 1
Given the above, in your case you should create a GSI with CXSK as the partition key and USERId as the sort key.
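To make the key swap concrete, here is a tiny in-memory model in plain Python. It is only a sketch of the idea and not the DynamoDB API: build_index() and query() are hypothetical helpers that mimic "exact match on the partition key, optional begins_with on the sort key" over the four example rows.

```python
from collections import defaultdict

# Items of the base table from the example above.
items = [
    {"pk": "a", "sk": "b", "prop1": "xyz", "prop2": "abc"},
    {"pk": "a", "sk": "c", "prop1": "lmn", "prop2": "opq"},
    {"pk": "b", "sk": "x", "prop1": "rst", "prop2": "lme"},
    {"pk": "b", "sk": "b", "prop1": "tuv", "prop2": "opq"},
]


def build_index(items, partition_attr, sort_attr):
    """Group items by the partition attribute, each group sorted by the sort attribute."""
    index = defaultdict(list)
    for item in items:
        index[item[partition_attr]].append(item)
    for group in index.values():
        group.sort(key=lambda item: item[sort_attr])
    return index


def query(index, sort_attr, partition_value, sort_prefix=""):
    """Model of a query: exact partition key, optional prefix match on the sort key."""
    return [
        item
        for item in index.get(partition_value, [])
        if item[sort_attr].startswith(sort_prefix)
    ]


base = build_index(items, "pk", "sk")   # base table: pk is the partition key
gsi1 = build_index(items, "sk", "pk")   # reverse lookup: the keys are swapped

by_pk = [i["prop1"] for i in query(base, "sk", "a")]
by_sk = [i["prop1"] for i in query(gsi1, "pk", "b")]
by_sk_and_pk = [i["prop1"] for i in query(gsi1, "pk", "b", sort_prefix="a")]
print(by_pk, by_sk, by_sk_and_pk)
```

The same items are reachable through either index; only the attribute that must be matched exactly changes, which is what the reverse lookup GSI buys you.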

SQLite UPDATE Value Using Foreign Key Reference

Given three tables, one of which is a junction table containing two foreign key columns, I'm trying to write an update so that, given a TableA.prefix, TableA.number, TableB.prefix, and TableB.number, I can set the JunctionTable.is_archived column for the matching row in JunctionTable.
So while the matching row in JunctionTable currently looks like:
+----------------------------------------------------------------------+
| id | tblA_id | tblB_id | is_archived |
| 3 | 7 | 98 | 0 |
+----------------------------------------------------------------------+
And matching rows in TableA and TableB look like:
TableA
+----------------------------------------------+
| id | prefix | number |
| 7 | CLA | 754 |
+----------------------------------------------+
TableB
+----------------------------------------------+
| id | prefix | number |
| 98 | RED | 221 |
+----------------------------------------------+
I'd like to UPDATE the is_archived value like so:
+----------------------------------------------------------------------+
| id | tblA_id | tblB_id | is_archived |
| 3 | 7 | 98 | 1 |
+----------------------------------------------------------------------+
I've tried a few different statements based on information found here but they aren't valid:
UPDATE JunctionTable
SET is_archived = "1"
WHERE tblAid =
(SELECT id FROM TableA WHERE prefix = "CLA" AND number = 754)
AND tblB.id =
(SELECT id FROM TableB WHERE prefix = "RED" AND number = 221)
UPDATE JunctionTable
SET is_archived = "1"
WHERE (
LEFT JOIN TableA ON JunctionTable.tblA_id=TableA.id
WHERE TableA.course_prefix = "CLA" AND TableA.course_number = 754
LEFT JOIN TableB ON JunctionTable.tblB_id=TableB.id
WHERE TableB.course_prefix = "RED" AND TableB.course_number = 221)
In the first query, it looks like the problems are the names of the ID columns in your Junction table ("tblAid" and "tblB.id"), and you're using double quotes instead of single quotes. This should work:
UPDATE JunctionTable
SET is_archived = 1
WHERE tblA_id =
(SELECT id FROM TableA WHERE prefix = 'CLA' AND number = 754)
AND tblB_id =
(SELECT id FROM TableB WHERE prefix = 'RED' AND number = 221)
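Here is a minimal runnable sketch of that statement using Python's sqlite3 module. The column types and the archive() helper are assumptions made for illustration; the table and column names follow the question. Bound parameters stand in for the literal values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, prefix TEXT, number INTEGER);
    CREATE TABLE TableB (id INTEGER PRIMARY KEY, prefix TEXT, number INTEGER);
    CREATE TABLE JunctionTable (
        id INTEGER PRIMARY KEY,
        tblA_id INTEGER REFERENCES TableA(id),
        tblB_id INTEGER REFERENCES TableB(id),
        is_archived INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO TableA VALUES (7, 'CLA', 754);
    INSERT INTO TableB VALUES (98, 'RED', 221);
    INSERT INTO JunctionTable VALUES (3, 7, 98, 0);
""")


def archive(conn, a_prefix, a_number, b_prefix, b_number):
    """Set is_archived = 1 on the junction row linking the two records.

    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        """
        UPDATE JunctionTable
        SET is_archived = 1
        WHERE tblA_id = (SELECT id FROM TableA WHERE prefix = ? AND number = ?)
          AND tblB_id = (SELECT id FROM TableB WHERE prefix = ? AND number = ?)
        """,
        (a_prefix, a_number, b_prefix, b_number),
    )
    return cur.rowcount


updated = archive(conn, "CLA", 754, "RED", 221)
missing = archive(conn, "CLA", 999, "RED", 221)  # no such TableA row
row = conn.execute("SELECT * FROM JunctionTable WHERE id = 3").fetchone()
print(updated, missing, row)
```

When a lookup finds no row, its subquery evaluates to NULL, the comparison is never true, and nothing is updated.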

sqlite extract two row from two tables to create a new table

I have an SQLite database with two tables:
table1
------------------------------
TIME | ElevationA| ElevationB|
-----|-----------|-----------|
T1 | eA1 | eB1 |
T2 | eA2 | eB2 |
table2
------------------------------
TIME | Temperat A| Temperat B|
-----|-----------|-----------|
T1 | tA1 | tB1 |
T2 | tA2 | tB2 |
I am looking for a "magic" command that builds a table of all parameters at a given time, e.g. something like:
SELECT WHERE TIME=T1 table1 AS ELEV ,table2 AS TEMP
and that would result in
table3
------------
ELEV | TEMP |
-----|----- |
eA1 | tA1 |
eB1 | tB1 |
Of course I could script it in bash, but I would prefer to create a view in SQLite, as that is more straightforward and avoids duplicating the data.
Any ideas are welcome.
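You can use scalar subqueries inside VALUES, one row per column pair (this assumes the temperature columns are named TemperatA and TemperatB and that TIME holds text values such as 'T1'):
CREATE TABLE TABLE3(ELEV,TEMP);
INSERT INTO TABLE3(ELEV,TEMP) VALUES
((SELECT ElevationA FROM TABLE1 WHERE TIME = 'T1'), (SELECT TemperatA FROM TABLE2 WHERE TIME = 'T1')),
((SELECT ElevationB FROM TABLE1 WHERE TIME = 'T1'), (SELECT TemperatB FROM TABLE2 WHERE TIME = 'T1'));
Each subquery is used as a single value, so it should match exactly one record; that holds as long as TIME is unique in each table.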
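Since the question asks for a view rather than a copied table, here is a hedged sketch of that variant using Python's sqlite3 module. The column names TemperatA and TemperatB, the text values for TIME, and the extra SITE label column are all assumptions introduced for the example; the view simply stacks one SELECT per A/B column pair with UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (TIME TEXT PRIMARY KEY, ElevationA TEXT, ElevationB TEXT);
    CREATE TABLE table2 (TIME TEXT PRIMARY KEY, TemperatA TEXT, TemperatB TEXT);
    INSERT INTO table1 VALUES ('T1', 'eA1', 'eB1'), ('T2', 'eA2', 'eB2');
    INSERT INTO table2 VALUES ('T1', 'tA1', 'tB1'), ('T2', 'tA2', 'tB2');

    -- One SELECT per A/B column pair; SITE is a hypothetical label column
    -- that records which pair a row came from.
    CREATE VIEW table3 AS
    SELECT t1.TIME AS TIME, 'A' AS SITE,
           t1.ElevationA AS ELEV, t2.TemperatA AS TEMP
    FROM table1 t1 JOIN table2 t2 ON t2.TIME = t1.TIME
    UNION ALL
    SELECT t1.TIME, 'B', t1.ElevationB, t2.TemperatB
    FROM table1 t1 JOIN table2 t2 ON t2.TIME = t1.TIME;
""")

# The view holds no data of its own; filter it by time when querying.
at_t1 = conn.execute(
    "SELECT ELEV, TEMP FROM table3 WHERE TIME = ? ORDER BY SITE", ("T1",)
).fetchall()
print(at_t1)
```

Because the view is evaluated at query time, new rows in table1 and table2 show up without any copying.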
