Query to transpose rows to columns in SQLite

Good afternoon,
I would like to know whether a query can generate columns according to the number of rows each ID has in my table.
Example:
ID COD DIAG
111111111 | Z359 | D
111111112 | Z359 | D
111111112 | Z359 | D
111111113 | Z359 | R
111111113 | Z359 | P
111111113 | Z359 | R
111111114 | Z359 | D
111111114 | Z359 | D
111111114 | Z359 | D
111111115 | Z359 | D
Ideally, the columns would be created according to the number of rows for each ID; if that is not possible, a fixed number of columns would do.
Expected query result:
ID | COD1 | DIAG1 | COD2 | DIAG2 | COD3 | DIAG3
111111111 | Z359 | D | | | |
111111112 | Z359 | D | Z359 | D | |
111111113 | Z359 | R | Z359 | P | Z359 | R
111111114 | Z359 | D | Z359 | D | Z359 | D
111111115 | Z359 | D | | | |
sorry my english
Thanks a Lot !!

This first query follows the pattern of the answer to the duplicate question, included here for comparison.
WITH numbered AS (
SELECT row_number() OVER
(PARTITION BY ID ORDER BY COD, DIAG)
AS seq,
t.*
FROM SO58566470 t)
SELECT ID,
max(CASE WHEN seq = 1 THEN COD END) AS COD1,
max(CASE WHEN seq = 1 THEN DIAG END) AS DIAG1,
max(CASE WHEN seq = 2 THEN COD END) AS COD2,
max(CASE WHEN seq = 2 THEN DIAG END) AS DIAG2,
max(CASE WHEN seq = 3 THEN COD END) AS COD3,
max(CASE WHEN seq = 3 THEN DIAG END) AS DIAG3
FROM numbered n
GROUP BY ID;
But that is a naive use of window functions, since the window could compute the other values at the same time. The first query already collects and traverses the partitioned rows to get the row number, yet it essentially repeats that work when the outer query collects values with the aggregate max() functions.
The following query looks longer and perhaps more complicated, but it takes advantage of the partitioned data (i.e. window data) by collecting the transformed values in the same pass. Because window functions produce a value for every row, the "incomplete" rows have to be filtered out, keeping only the last row of each partition. Note that lag(X, 0) is the current row's value, so on that last row COD1/DIAG1 hold the last row in sort order and the higher-numbered columns hold the earlier rows. I did not profile either query, but I suspect the second one is more efficient overall.
WITH transform AS (
SELECT id,
lag(COD, 0) OVER IDWin AS COD1,
lag(DIAG, 0) OVER IDWin AS DIAG1,
lag(COD, 1) OVER IDWin AS COD2,
lag(DIAG, 1) OVER IDWin AS DIAG2,
lag(COD, 2) OVER IDWin AS COD3,
lag(DIAG, 2) OVER IDWin AS DIAG3,
row_number() OVER IDWin AS seq
FROM SO58566470 t
WINDOW IDWin AS (PARTITION BY ID ORDER BY COD, DIAG)
ORDER BY ID, SEQ
),
last AS (
SELECT id, max(seq) as maxseq
FROM transform
GROUP BY id
)
SELECT transform.*
FROM transform
JOIN last
ON transform.id = last.id AND transform.seq = last.maxseq
ORDER BY id;
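Neither query above creates columns "according to the number of rows", because the column list is fixed when the SQL is written. One way around that, sketched here under assumptions, is to build the first query's column list from the application. The sketch uses Python's standard sqlite3 module (window functions need SQLite 3.25 or later) and the sample data from the question; the helper names build_pivot_sql and pivot are hypothetical, and the table is assumed to be non-empty.

```python
import sqlite3

# Sample rows copied from the question; the table name SO58566470 follows the answer.
ROWS = [
    (111111111, "Z359", "D"),
    (111111112, "Z359", "D"),
    (111111112, "Z359", "D"),
    (111111113, "Z359", "R"),
    (111111113, "Z359", "P"),
    (111111113, "Z359", "R"),
    (111111114, "Z359", "D"),
    (111111114, "Z359", "D"),
    (111111114, "Z359", "D"),
    (111111115, "Z359", "D"),
]


def build_pivot_sql(width):
    """Return the conditional-aggregation pivot with `width` COD/DIAG pairs."""
    cols = []
    for i in range(1, width + 1):
        # Only integers are interpolated here, never user-supplied text.
        cols.append(f"max(CASE WHEN seq = {i} THEN COD END) AS COD{i}")
        cols.append(f"max(CASE WHEN seq = {i} THEN DIAG END) AS DIAG{i}")
    return (
        "WITH numbered AS ("
        " SELECT row_number() OVER (PARTITION BY ID ORDER BY COD, DIAG) AS seq, t.*"
        " FROM SO58566470 t) "
        "SELECT ID, " + ", ".join(cols) + " FROM numbered GROUP BY ID ORDER BY ID"
    )


def pivot(conn):
    """Size the pivot from the largest ID group, then run it (non-empty table assumed)."""
    (width,) = conn.execute(
        "SELECT max(n) FROM (SELECT count(*) AS n FROM SO58566470 GROUP BY ID)"
    ).fetchone()
    cur = conn.execute(build_pivot_sql(width))
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SO58566470 (ID INTEGER, COD TEXT, DIAG TEXT)")
conn.executemany("INSERT INTO SO58566470 VALUES (?, ?, ?)", ROWS)
header, rows = pivot(conn)
print(header)
for row in rows:
    print(row)
```

Because the rows are numbered with ORDER BY COD, DIAG, ID 111111113 comes out as P, R, R rather than in insertion order; reproducing the R, P, R order from the question would need an explicit ordering column, which the sample table does not have.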

Related

Storing attributes with multiple values in relational database

I am storing product attributes in a relational table in a MariaDB database the following way:
I have a main table, called Products, which provides the name, description, and other simple information about a product, and another table, ProductAttributes, with the following structure: Id|ProductId|Attribute|Value where Id is an autoincremented primary key, and ProductId is a reference to a row in the Products table.
I can store simple attribute/value relations for a product this way, e.g. the height, weight, or length of a product. My problems start when a product's attribute, e.g. color, can have multiple possible values.
I could add multiple rows to the ProductAttributes table when storing multi-valued attributes, e.g.:
1|yy|color|red
2|yy|color|blue
With this schema I can easily retrieve a single product's attributes, but I am not sure how to proceed when comparing two products based on their attributes.
Is there any other way to store multiple values for a single attribute in a relational database while keeping them searchable?
Currently, to find products with similar attributes, I run a query like this:
SELECT * FROM ProductAttributes base
INNER JOIN ProductAttributes compare ON compare.ProductId != base.ProductId
WHERE base.Attribute = compare.Attribute
AND base.Value = compare.Value
AND base.ProductId = 'x'
GROUP BY compare.ProductId
My problem is that this query returns products that are both red and blue as similar to products that are only blue.
By the way, I cannot change my attributes table to a one-attribute-per-column representation, because I do not know up front how many attributes I will have, and even if I knew, there are far too many possible attributes and differences between product categories to represent in a traditional table.
A possible pitfall is that I also want to compare products with missing attributes. For example, if one product has a length attribute specified and another has none, they could still be similar. Right now, to make this kind of comparison, I transpose my attributes table into a simple table in the background and run this query on it:
SELECT b.ProductId as BaseProduct, s.ProductId as SimProduct
FROM tmp_transposed_product_attributes b
CROSS JOIN tmp_transposed_product_attributes s ON b.ProductId != s.ProductId
WHERE (b.attribute1 = s.attribute1 OR b.attribute1 IS NULL OR s.attribute1 IS NULL)
AND (b.attribute2 = s.attribute2 OR b.attribute2 IS NULL OR s.attribute2 IS NULL) ...
If I'm following the product comparison correctly, EXISTS or NOT EXISTS can help find matches like that, and may also avoid having to transpose the data.
For example, given this sample table data:
MariaDB [test]> select * from productattributes;
+----+-----------+-----------+-------+
| id | productID | attribute | value |
+----+-----------+-----------+-------+
| 1 | yy | height | 5 |
| 2 | yy | color | red |
| 3 | yy | weight | 10 |
| 4 | yy | length | 6 |
| 5 | yy | color | blue |
| 6 | zz | color | white |
| 7 | zz | height | 5 |
| 8 | zz | length | 8 |
+----+-----------+-----------+-------+
8 rows in set (0.00 sec)
To find the rows whose attribute/value pair is not shared with the other product (this drops the pairs that are identical in both), use a NOT EXISTS subquery against the same table, like so:
MariaDB [test]> SELECT * FROM `productattributes` pA
-> WHERE productID IN ('yy', 'zz')
-> AND NOT EXISTS (SELECT * FROM productattributes pB
-> WHERE pA.attribute = pB.attribute
-> AND pA.value = pB.value
-> AND pA.productID != pB.productID)
-> ORDER BY productID, attribute;
+----+-----------+-----------+-------+
| id | productID | attribute | value |
+----+-----------+-----------+-------+
| 2 | yy | color | red |
| 5 | yy | color | blue |
| 4 | yy | length | 6 |
| 3 | yy | weight | 10 |
| 6 | zz | color | white |
| 8 | zz | length | 8 |
+----+-----------+-----------+-------+
6 rows in set (0.00 sec)
Then to find attribute/value pairs that ARE the same between the two, simply remove the NOT portion of the query:
MariaDB [test]> SELECT * FROM `productattributes` pA
-> WHERE productID IN ('yy', 'zz')
-> AND EXISTS (SELECT * FROM productattributes pB
-> WHERE pA.attribute = pB.attribute
-> AND pA.value = pB.value
-> AND pA.productID != pB.productID)
-> ORDER BY productID, attribute;
+----+-----------+-----------+-------+
| id | productID | attribute | value |
+----+-----------+-----------+-------+
| 1 | yy | height | 5 |
| 7 | zz | height | 5 |
+----+-----------+-----------+-------+
2 rows in set (0.00 sec)
Here is the NOT EXISTS query without the command-line prompt:
SELECT * FROM `productattributes` pA
WHERE productID IN ('yy', 'zz')
AND NOT EXISTS (SELECT * FROM productattributes pB
WHERE pA.attribute = pB.attribute
AND pA.value = pB.value
AND pA.productID != pB.productID)
ORDER BY productID, attribute;
EDIT:
To find attributes that exist for one product but not for the other, remove the value comparison from the subquery:
MariaDB [test]> SELECT * FROM `productattributes` pA
-> WHERE productID IN ('yy', 'zz')
-> AND NOT EXISTS (SELECT * FROM productattributes pB
-> WHERE pA.attribute = pB.attribute
-> AND pA.productID != pB.productID)
-> ORDER BY productID, attribute;
+----+-----------+-----------+-------+
| id | productID | attribute | value |
+----+-----------+-----------+-------+
| 3 | yy | weight | 10 |
+----+-----------+-----------+-------+
1 row in set (0.00 sec)
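The queries above were run against a table that holds only the two products being compared. As a minimal sketch, assuming the table may hold other products as well, the subquery can also be restricted to the same two product IDs so that a third product cannot affect the result. The example uses Python's standard sqlite3 module instead of MariaDB purely so that it is self-contained; the function name attribute_rows is hypothetical.

```python
import sqlite3

# Sample data copied from the answer's productattributes table.
ATTRS = [
    (1, "yy", "height", "5"),
    (2, "yy", "color", "red"),
    (3, "yy", "weight", "10"),
    (4, "yy", "length", "6"),
    (5, "yy", "color", "blue"),
    (6, "zz", "color", "white"),
    (7, "zz", "height", "5"),
    (8, "zz", "length", "8"),
]

# {negate} becomes "NOT" or "" so one template serves both queries.
QUERY = """
SELECT id FROM productattributes pA
WHERE productID IN (?, ?)
  AND {negate} EXISTS (SELECT 1 FROM productattributes pB
                       WHERE pB.productID IN (?, ?)
                         AND pA.attribute = pB.attribute
                         AND pA.value = pB.value
                         AND pA.productID != pB.productID)
ORDER BY id
"""


def attribute_rows(conn, left, right, shared):
    """Return ids of rows for the two products whose (attribute, value) pair
    is shared with the other product (shared=True) or not (shared=False)."""
    sql = QUERY.format(negate="" if shared else "NOT")
    return [r[0] for r in conn.execute(sql, (left, right, left, right))]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE productattributes "
    "(id INTEGER PRIMARY KEY, productID TEXT, attribute TEXT, value TEXT)"
)
conn.executemany("INSERT INTO productattributes VALUES (?, ?, ?, ?)", ATTRS)

print(attribute_rows(conn, "yy", "zz", shared=True))
print(attribute_rows(conn, "yy", "zz", shared=False))
```

With only yy and zz in the table, the two calls return the same row ids as the MariaDB output above.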

Looking for a way to calculate aggregates without collapsing rows

As the title says, I'd like to find an efficient way to calculate aggregates over groups of rows without collapsing those rows together. For example, I want to create the mean column in the table below.
|------------|---------|-------------|
| category | value | mean(value) |
|------------|---------|-------------|
| A | 1 | 3 |
|------------|---------|-------------|
| A | 3 | 3 |
|------------|---------|-------------|
| A | 5 | 3 |
|------------|---------|-------------|
| B | 1 | 1.5 |
|------------|---------|-------------|
| B | 2 | 1.5 |
|------------|---------|-------------|
So far, the best way I've found to do this is:
T
| join kind=leftouter (T | summarize avg(value) by category) on category
This seems to cause performance problems. I'm also aware of an approach using partition by, but I need to support more than 64 categories.
Am I missing a good way of doing this task?
Here you go:
let MyTable = datatable(Category:string, value:long) [
"A", 1,
"A", 3,
"A", 5,
"B", 1,
"B", 2
];
let Avgs = MyTable | summarize avg(value) by Category;
MyTable | lookup (Avgs) on Category
This will output exactly what you want.
Explanation:
First you create a temporary table (using a let statement) named Avgs, where you'll have the average per Category.
Your main statement is to output MyTable, but for every category you want to also display the relevant value from Avgs, which you achieve by using the lookup operator.
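For readers without a Kusto cluster at hand, the two steps described above (summarize per category, then look the value up for every row) can be mimicked in plain Python. This is only an illustration of the logic, not KQL, and the function name with_group_mean is hypothetical.

```python
from collections import defaultdict

# Same sample rows as the datatable in the answer.
rows = [("A", 1), ("A", 3), ("A", 5), ("B", 1), ("B", 2)]


def with_group_mean(rows):
    """Append the per-category mean to every row without collapsing rows."""
    # Step 1 (the "summarize"): accumulate sum and count per category.
    totals = defaultdict(lambda: [0, 0])
    for category, value in rows:
        totals[category][0] += value
        totals[category][1] += 1
    avgs = {c: s / n for c, (s, n) in totals.items()}
    # Step 2 (the "lookup"): attach the matching average to each original row.
    return [(c, v, avgs[c]) for c, v in rows]


for row in with_group_mean(rows):
    print(row)
```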

SQLite Sum of Sums

Let's say I have two tables which look like this:
Games:
| AwayTeam | HomeTeam | AwayPoints | HomePoints |
------------------------------------------------------
| Aardvarks | Bobcats | 2 | 1 |
| Bobcats | Caterpillars | 20 | 10 |
| Aardvarks | Caterpillars | 200 | 100 |
Teams:
| Name |
----------------
| Aardvarks |
| Bobcats |
| Caterpillars |
How can I make a result which looks like this?
| Name | TotalPoints |
------------------------------
| Aardvarks | 202 |
| Bobcats | 21 |
| Caterpillars | 110 |
I think my real problem is how to splice statements together in SQL. These two statements work well individually:
SELECT SUM ( AwayPoints )
FROM Games
WHERE AwayTeam='Bobcats';
SELECT SUM ( HomePoints )
FROM Games
WHERE HomeTeam='Bobcats';
I suspect that I need a compound operator to splice the two SELECT statements together, and then pass the result into the aggregate expression below:
SELECT Name, SUM( aggregate_expression )
AS 'TotalPoints'
FROM Teams
GROUP BY Name;
If I had to just throw it all together, I think I'd end up with something like this:
SELECT Name, SUM (
SELECT SUM ( AwayPoints )
FROM Games
WHERE AwayTeam=Name
UNION
SELECT SUM ( HomePoints )
FROM Games
WHERE HomeTeam=Name
)
AS 'TotalPoints'
FROM Teams
GROUP BY Name;
However, that doesn't work because SELECT SUM ( SELECT ... is completely invalid.
Use a UNION ALL:
SELECT team AS Name, SUM(points) AS TotalPoints
FROM (
SELECT HomeTeam AS team, SUM(HomePoints) AS points
FROM Games
GROUP BY HomeTeam
UNION ALL
SELECT AwayTeam AS team, SUM(AwayPoints) AS points
FROM Games
GROUP BY AwayTeam
)
GROUP BY team
See SQLite documentation for
SELECT expr AS alias
SELECT ... FROM ( select-stmt ) AS table-alias
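To check the UNION ALL approach against the sample data, here is a small sketch using Python's standard sqlite3 module. It is a variation rather than the exact query above: it joins the Teams table as well, which is an addition of mine so that a team with no games still appears with a total of 0. The function name total_points is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Games (AwayTeam TEXT, HomeTeam TEXT, AwayPoints INTEGER, HomePoints INTEGER);
INSERT INTO Games VALUES
  ('Aardvarks', 'Bobcats', 2, 1),
  ('Bobcats', 'Caterpillars', 20, 10),
  ('Aardvarks', 'Caterpillars', 200, 100);
CREATE TABLE Teams (Name TEXT);
INSERT INTO Teams VALUES ('Aardvarks'), ('Bobcats'), ('Caterpillars');
""")

# Stack home and away points into one (team, points) list, then sum per team.
# The LEFT JOIN from Teams keeps teams that have not played any game.
TOTALS_SQL = """
SELECT t.Name, COALESCE(SUM(p.points), 0) AS TotalPoints
FROM Teams t
LEFT JOIN (
    SELECT HomeTeam AS team, HomePoints AS points FROM Games
    UNION ALL
    SELECT AwayTeam AS team, AwayPoints AS points FROM Games
) p ON p.team = t.Name
GROUP BY t.Name
ORDER BY t.Name
"""


def total_points(conn):
    """Return (Name, TotalPoints) for every team, ordered by name."""
    return conn.execute(TOTALS_SQL).fetchall()


for name, total in total_points(conn):
    print(name, total)
```

UNION ALL matters here: a plain UNION removes duplicate rows, so two games with the same team and the same score would be counted only once.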

SQLite: Selecting numeric columns with values >= 2^31 fails

According to the SQLite documentation, numeric columns can store integers up to 8 bytes in size. However, I am having trouble actually selecting the values once they are stored:
create table x (a integer, b integer, c integer);
insert into x values ('2147483647', '2147483647', '2147483647');
select * from x;
+------------+------------+------------+
| a | b | c |
+------------+------------+------------+
| 2147483647 | 2147483647 | 2147483647 |
+------------+------------+------------+
1 row in set
insert into x values ('2147483648', '2147483649', '2147483648');
select * from x;
Nothing happens - no rows are returned
select count(1) from x;
+----------+
| count(1) |
+----------+
| 2 |
+----------+
1 row in set
select substr(a, 1, 5) from x;
+-----------------+
| substr(a, 1, 5) |
+-----------------+
| 21474 |
| 21474 |
+-----------------+
How can I retrieve the actual values properly, while retaining the integer data type? (Changing it to REAL or TEXT works as expected, but is there no other choice?)
Edit: I used Navicat for the demonstration above, but also ran into the issue with my actual application (using node-sqlite3).
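One way to narrow this down is to run the same statements through a different client. The sketch below uses Python's standard sqlite3 module with an in-memory database; the extra row holding 2^63 - 1 is my own addition to probe the 8-byte limit. It does not identify what goes wrong in Navicat or node-sqlite3. It only checks whether the SQLite engine itself stores and returns these values as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table x (a integer, b integer, c integer)")
# Same inserts as in the question, including the quoted (text) literals.
conn.execute("insert into x values ('2147483647', '2147483647', '2147483647')")
conn.execute("insert into x values ('2147483648', '2147483649', '2147483648')")
# Extra row (not in the question): the largest signed 8-byte integer.
conn.execute("insert into x values (?, ?, ?)", (2**63 - 1, 0, 0))

# typeof() reports the storage class SQLite actually used for column a.
rows = conn.execute("select a, b, c, typeof(a) from x order by a").fetchall()
for row in rows:
    print(row)
```

If all rows come back here with typeof(a) reported as integer, the data is stored as intended and the missing rows are more likely a limitation of the client or driver reading them than of the column type.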

Inserting Duplicate Records into a Temporary Table

I have a table ABC with duplicate records. I want to insert only the duplicate records into another table, ABC_DUPE, in the same schema using BTEQ.
Any suggestions?
Thanks,
Mukesh
You can use the QUALIFY clause to identify and output duplicates.
Since you didn't share your table, consider the following ABC table:
+----+----+----+
| f1 | f2 | f3 |
+----+----+----+
| 1 | a | x |
| 1 | b | y |
| 2 | a | z |
| 2 | b | w |
| 2 | a | n |
+----+----+----+
Here a unique record is determined by fields f1 and f2. In this example the record with f1=2 and f2='a' is duplicated, with f3 values z and n. To output these rows, use QUALIFY:
SELECT *
FROM ABC
QUALIFY COUNT(*) OVER (PARTITION BY f1, f2) > 1;
QUALIFY uses window functions to determine which records to include in the result set. Here the window function COUNT(*) is partitioned by the composite key f1, f2, and only records whose count over that partition is greater than 1 are kept.
This will output:
+----+----+----+
| f1 | f2 | f3 |
+----+----+----+
| 2 | a | z |
| 2 | a | n |
+----+----+----+
You can use this in a CREATE TABLE statement like:
CREATE TABLE ABC_DUPE AS
(
SELECT *
FROM ABC
QUALIFY COUNT(*) OVER (PARTITION BY f1, f2) > 1
) WITH DATA PRIMARY INDEX (f1, f2);
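If ABC_DUPE already exists, the same filter can feed an INSERT ... SELECT instead of creating the table. The sketch below shows that logic with Python's standard sqlite3 module, which is an assumption made only so the example is runnable without Teradata. SQLite has no QUALIFY, so the window count is computed in a subquery and filtered in the outer WHERE; window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ABC (f1 INTEGER, f2 TEXT, f3 TEXT)")
conn.executemany(
    "INSERT INTO ABC VALUES (?, ?, ?)",
    [(1, "a", "x"), (1, "b", "y"), (2, "a", "z"), (2, "b", "w"), (2, "a", "n")],
)
conn.execute("CREATE TABLE ABC_DUPE (f1 INTEGER, f2 TEXT, f3 TEXT)")

# Count rows per (f1, f2) with a window function, then keep only the rows
# whose key occurs more than once. This mirrors QUALIFY COUNT(*) OVER (...) > 1.
conn.execute("""
INSERT INTO ABC_DUPE
SELECT f1, f2, f3
FROM (SELECT ABC.*, COUNT(*) OVER (PARTITION BY f1, f2) AS cnt FROM ABC)
WHERE cnt > 1
""")

dupes = conn.execute("SELECT f1, f2, f3 FROM ABC_DUPE ORDER BY f3").fetchall()
print(dupes)
```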
