KQL window functions - how to partition by multiple columns? - azure-data-explorer

Input table dimVehicleV1:
SaleStart | Product | Model
1/1/2020 | Car | 1
1/2/2020 | Bike | 1
2/1/2020 | Car | 2
3/1/2020 | Bike | 2
Desired output dimVehicleV2:
SaleStart | Product | Model | SaleEnd
1/1/2020 | Car | 1 | 2/1/2020
1/2/2020 | Bike | 1 | 3/1/2020
2/1/2020 | Car | 2 | null
3/1/2020 | Bike | 2 | null
I can see how to serialize the rows with order by and then use the next() function, but I don't see how to make it respect the Product column groupings.
Failing query:
let dimVehicleV2 =
dimVehicleV1
| order by Product asc, SaleStart asc
| extend SaleEnd = next(SaleStart, 1);
dimVehicleV2
How does one use the next() function so that it respects column groups?

If I understand your question correctly, this should work:
datatable(SaleStart:datetime, Product:string, Model:int)
[
datetime(1/1/2020), 'Car', 1,
datetime(1/2/2020), 'Bike', 1,
datetime(2/1/2020), 'Car', 2,
datetime(3/1/2020), 'Bike', 2,
]
| order by Product asc, SaleStart asc
| extend SaleEnd = iff(next(Product) == Product and next(Model) != Model, next(SaleStart), datetime(null))
SaleStart | Product | Model | SaleEnd
2020-01-01 00:00:00.0000000 | Car | 1 | 2020-02-01 00:00:00.0000000
2020-01-02 00:00:00.0000000 | Bike | 1 | 2020-03-01 00:00:00.0000000
2020-02-01 00:00:00.0000000 | Car | 2 |
2020-03-01 00:00:00.0000000 | Bike | 2 |
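To see why the guard in the iff() matters, here is a minimal plain-Python sketch of the same idea (an emulation, not KQL itself): sort by Product, then SaleStart (the serialization step), then look at the following row and keep its SaleStart only if it has the same Product and a different Model. The tuple layout and the helper name `add_sale_end` are assumptions for illustration. Because the sort is by Product first, the Bike rows come out before the Car rows.

```python
from datetime import date

# Sample rows from the question: (SaleStart, Product, Model)
rows = [
    (date(2020, 1, 1), "Car", 1),
    (date(2020, 1, 2), "Bike", 1),
    (date(2020, 2, 1), "Car", 2),
    (date(2020, 3, 1), "Bike", 2),
]

def add_sale_end(rows):
    """Emulate `order by Product asc, SaleStart asc | extend SaleEnd = iff(...)`."""
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # serialize the rows
    result = []
    for i, (start, product, model) in enumerate(ordered):
        # next() has no following row on the last record
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        # Use the next row's SaleStart only if it is the same Product but a different Model
        if nxt is not None and nxt[1] == product and nxt[2] != model:
            end = nxt[0]
        else:
            end = None
        result.append((start, product, model, end))
    return result

for row in add_sale_end(rows):
    print(row)
```

Without the `nxt[1] == product` check, the last Bike row would pick up the first Car row's SaleStart, which is exactly the leak in the failing query.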

I came to this post searching for an answer to the question actually in the title of this post: "How to partition by multiple columns?"
In case someone else needs it, here is what I ended up doing: extend the domain by creating a new column that combines the values of the multiple columns you want, and use that new column as the partition key.
You can combine the columns by using concatenation, or a hash, or something else.
dimVehicleV1
| extend PartitionKey = strcat(Product, ":", Model)
| partition hint.strategy=native by PartitionKey (top 1 by SaleStart) // or whatever partition transformation
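The same composite-key idea in plain Python, as a minimal sketch: group rows by a (Product, Model) tuple, then keep the top row by SaleStart in each group, roughly what `top 1 by SaleStart` does per partition (KQL's `top` sorts descending by default, so the latest row wins). The helper name `top1_per_key` and the extra fifth row are hypothetical, added only so one partition has more than one row.

```python
from collections import defaultdict
from datetime import date

rows = [
    {"SaleStart": date(2020, 1, 1), "Product": "Car", "Model": 1},
    {"SaleStart": date(2020, 1, 2), "Product": "Bike", "Model": 1},
    {"SaleStart": date(2020, 2, 1), "Product": "Car", "Model": 2},
    {"SaleStart": date(2020, 3, 1), "Product": "Bike", "Model": 2},
    {"SaleStart": date(2020, 4, 1), "Product": "Car", "Model": 1},  # hypothetical extra row
]

def top1_per_key(rows, key_cols, order_col):
    """Partition rows by a composite key and keep the row with the largest order_col."""
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in key_cols)  # composite partition key
        partitions[key].append(row)
    # Descending "top 1" is simply the max of each partition
    return {key: max(group, key=lambda r: r[order_col]) for key, group in partitions.items()}

latest = top1_per_key(rows, ["Product", "Model"], "SaleStart")
for key, row in sorted(latest.items()):
    print(key, row["SaleStart"])
```

A tuple key also sidesteps a pitfall of string concatenation: with `strcat(Product, ":", Model)`, two different value pairs can collide if the values themselves contain the separator.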

In case it's useful to anyone, I found a solution I prefer over Yoni's perfectly adequate one:
let MyTable = datatable(SaleStart:datetime, Product:string, Model:int)
[
datetime(1/1/2020), 'Car', 1,
datetime(1/2/2020), 'Bike', 1,
datetime(2/1/2020), 'Car', 2,
datetime(3/1/2020), 'Bike', 2,
];
MyTable
| partition by Product
(
order by Model asc
| extend SaleEnd = next(SaleStart)
)
This seems to me to abstract away the details of the logic required, expressing just the thought.
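For comparison, standard SQL window functions express the same thought with a PARTITION BY clause, which also accepts several columns directly. The sketch below runs such a query in Python's built-in sqlite3 (window functions need SQLite 3.25 or later); LEAD() plays the role of KQL's next(). This is an SQL analogue for illustration, not KQL, and the `Region` column mentioned in the comment is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dimVehicleV1 (SaleStart TEXT, Product TEXT, Model INTEGER)")
conn.executemany(
    "INSERT INTO dimVehicleV1 VALUES (?, ?, ?)",
    [("2020-01-01", "Car", 1), ("2020-01-02", "Bike", 1),
     ("2020-02-01", "Car", 2), ("2020-03-01", "Bike", 2)],
)

# LEAD() looks at the next row *within the same partition*, so Bike rows never see Car rows.
# More columns could be listed, e.g. PARTITION BY Product, Region (Region is hypothetical).
query = """
SELECT SaleStart, Product, Model,
       LEAD(SaleStart) OVER (PARTITION BY Product ORDER BY Model) AS SaleEnd
FROM dimVehicleV1
ORDER BY SaleStart
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```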

Related

Enforce uniqueness within a date range or based on the value of another column

I have a table with a large amount of data; moving forward, I would like to enforce uniqueness for a given column in this table. However, the table contains a large number of rows where that column is non-unique. I am not able to delete or alter these rows.
Is it possible to enforce uniqueness over a given date range, or since a specific date, or based on the value of another column (or something else like that) in MariaDB?
You can create a UNIQUE index on multiple columns, where one column is nullable. MariaDB treats a NULL as different from every other value (even another NULL) for the purposes of a UNIQUE index, so rows with a NULL in an indexed column never count as duplicates, even if the other columns of the UNIQUE index have the same values. Check the MariaDB documentation, Getting Started with Indexes - Unique Index:
The fact that a UNIQUE constraint can be NULL is often overlooked. In SQL any NULL is never equal to anything, not even to another NULL. Consequently, a UNIQUE constraint will not prevent one from storing duplicate rows if they contain null values:
CREATE TABLE t1 (a INT NOT NULL, b INT, UNIQUE (a,b));
INSERT INTO t1 values (3,NULL), (3, NULL);
SELECT * FROM t1;
+---+------+
| a | b |
+---+------+
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | NULL |
| 3 | NULL |
+---+------+
You can create such a UNIQUE index on the date column you already have and a new column that indicates whether the date value should be unique or not:
CREATE TABLE Foobar(
id INT AUTO_INCREMENT PRIMARY KEY NOT NULL,
createdAt DATE NOT NULL,
dateUniqueMarker BIT NULL DEFAULT 0,
UNIQUE KEY uq_createdAt(createdAt, dateUniqueMarker)
);
INSERT INTO Foobar(createdAt) VALUES ('2021-11-04'),('2021-11-05'),('2021-11-06');
SELECT * FROM Foobar;
+----+------------+------------------------------------+
| id | createdAt | dateUniqueMarker |
+----+------------+------------------------------------+
| 1 | 2021-11-04 | 0x00 |
| 2 | 2021-11-05 | 0x00 |
| 3 | 2021-11-06 | 0x00 |
+----+------------+------------------------------------+
INSERT INTO Foobar(createdAt) VALUES ('2021-11-05');
ERROR 1062 (23000): Duplicate entry '2021-11-05-\x00' for key 'Foobar.uq_createdAt'
UPDATE Foobar SET dateUniqueMarker = NULL WHERE createdAt = '2021-11-05';
INSERT INTO Foobar(createdAt, dateUniqueMarker) VALUES ('2021-11-05', NULL);
SELECT * FROM Foobar;
+----+------------+------------------------------------+
| id | createdAt | dateUniqueMarker |
+----+------------+------------------------------------+
| 1 | 2021-11-04 | 0x00 |
| 2 | 2021-11-05 | NULL |
| 5 | 2021-11-05 | NULL |
| 3 | 2021-11-06 | 0x00 |
+----+------------+------------------------------------+
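The same NULL-marker trick can be tried without a MariaDB server: SQLite also treats NULLs as distinct in a UNIQUE index. The sketch below uses Python's built-in sqlite3 to mirror the Foobar example, with an INTEGER 0/NULL marker in place of BIT; the helper `try_insert` is hypothetical and only reports whether the index accepted a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Foobar (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    createdAt TEXT NOT NULL,
    dateUniqueMarker INTEGER DEFAULT 0,   -- 0 = enforce uniqueness, NULL = exempt
    UNIQUE (createdAt, dateUniqueMarker)
)
""")
conn.executemany("INSERT INTO Foobar (createdAt) VALUES (?)",
                 [("2021-11-04",), ("2021-11-05",), ("2021-11-06",)])

def try_insert(created_at, marker=0):
    """Return True if the row was accepted, False if the UNIQUE index rejected it."""
    try:
        conn.execute("INSERT INTO Foobar (createdAt, dateUniqueMarker) VALUES (?, ?)",
                     (created_at, marker))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("2021-11-05"))          # duplicate (date, 0) -> False
conn.execute("UPDATE Foobar SET dateUniqueMarker = NULL WHERE createdAt = '2021-11-05'")
print(try_insert("2021-11-05", None))    # (date, NULL) never collides -> True
print(try_insert("2021-11-05", None))    # nor does a second one -> True
```

New rows default to marker 0 and are therefore unique per date, while the legacy rows you mark with NULL are exempt.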
Without sample data and an illustration of the scenario, it's hard to know. If you can update your question with that information, please do.
"Is it possible to enforce uniqueness over a given date range, or since a specific date, or based on the value of another column (or something else like that) in MariaDB?"
If by "enforce" you mean to create a new column and then populate it with a unique identifier, then yes, it is possible. If what you really mean is to generate a unique value based on another column, that's also possible. The question is, how unique do you want it to be?
Is it unique like this?
column1 | column2 | column3 | unique_val
2021-02-02 | ABC | DEF | 1
2021-02-02 | CBD | FEA | 1
2021-02-03 | BED | GER | 2
2021-02-04 | ART | TOY | 3
2021-02-04 | ZSE | KSL | 3
Here, rows with the same date in column1 get the same unique value, regardless of the data in column2 and column3.
Or like this?
column1 | column2 | column3 | unique_val
2021-02-02 | ABC | DEF | 1
2021-02-02 | CBD | FEA | 2
2021-02-03 | BED | GER | 3
2021-02-04 | ART | TOY | 4
2021-02-04 | ZSE | KSL | 5
Here, all (or certain) columns are taken into account when assigning the unique value.
Both scenarios above can be achieved in a query, without altering the table to add and populate a new column, but of course the latter is also possible.
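Both scenarios map onto standard window functions: DENSE_RANK() ordered by column1 gives the first numbering (same date, same value), and ROW_NUMBER() gives the second. MariaDB supports both from 10.2; the sketch below checks the idea with the sample rows above in Python's built-in sqlite3 (3.25 or later). The table name `t` and the `column1, column2` tie-breaker are assumptions that happen to reproduce the second table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT, column2 TEXT, column3 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("2021-02-02", "ABC", "DEF"),
    ("2021-02-02", "CBD", "FEA"),
    ("2021-02-03", "BED", "GER"),
    ("2021-02-04", "ART", "TOY"),
    ("2021-02-04", "ZSE", "KSL"),
])

query = """
SELECT column1, column2, column3,
       DENSE_RANK() OVER (ORDER BY column1)          AS same_date_val,  -- scenario 1
       ROW_NUMBER() OVER (ORDER BY column1, column2) AS per_row_val     -- scenario 2
FROM t
ORDER BY column1, column2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```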

How to select a limited amount of values in a complex column in Hive?

I have a table with an id, name and proficiency. The proficiency column is a complex column of the map data type. How do I limit the map to show only 2 of its entries?
Example table
ID | name | Proficiency
003 | John | {"Cooking":3, "Talking":6 , "Chopping":8, "Teaching":5}
005 | Lennon | {"Cooking":3, "Programming":6 }
007 | King | {"Chopping":8, "Boxing":5 ,"shooting": 4}
What I want the SELECT statement to show:
ID | name | Proficiency
003 | John | {"Cooking":3, "Talking":6 }
005 | Lennon | {"Cooking":3, "Programming":6 }
007 | King | {"Chopping":8, "Boxing":5 }
For a fixed number of required map elements, this can be done easily using the map_keys() and map_values() functions, which return arrays of keys and values. You can access each key and value by array index, then assemble the map again using the map() function:
with MyTable as -------use your table instead of this subquery
(select stack(3,
'003', 'John' , map("Cooking",3, "Talking",6 , "Chopping",8, "Teaching",5),
'005', 'Lennon', map("Cooking",3, "Programming",6 ),
'007', 'King' , map("Chopping",8, "Boxing",5 ,"shooting", 4)
) as (ID, name, Proficiency)
) -------use your table instead of this
select t.ID, t.name,
map(map_keys(t.Proficiency)[0], map_values(t.Proficiency)[0],
map_keys(t.Proficiency)[1], map_values(t.Proficiency)[1]
) as Proficiency
from MyTable t
Result:
t.id t.name proficiency
003 John {"Cooking":3,"Talking":6}
005 Lennon {"Cooking":3,"Programming":6}
007 King {"Boxing":5,"shooting":4}
By definition, a map does not guarantee order, and map_keys() and map_values() return unordered arrays, but they are in the same order when used in the same subquery, so the keys match their corresponding values.
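The index-based query hard-codes two entries. As a language-neutral sketch (plain Python, not Hive), the same idea for any n is to take the first n positions of the parallel key and value arrays and rebuild the map; it relies on the same assumption stated above, that the keys array and values array come back in matching order. The helper name `first_n_entries` is hypothetical. Note that Python dicts keep insertion order while Hive maps do not, so in Hive which n entries you get is still not guaranteed.

```python
def first_n_entries(mapping, n):
    """Rebuild a map from the first n positions of its keys/values arrays,
    mirroring map(map_keys(m)[0], map_values(m)[0], ...) for an arbitrary n."""
    keys = list(mapping.keys())      # analogue of map_keys()
    values = list(mapping.values())  # analogue of map_values(), same order as keys
    return dict(zip(keys[:n], values[:n]))

proficiency = {
    "003": {"Cooking": 3, "Talking": 6, "Chopping": 8, "Teaching": 5},
    "005": {"Cooking": 3, "Programming": 6},
    "007": {"Chopping": 8, "Boxing": 5, "shooting": 4},
}
trimmed = {id_: first_n_entries(m, 2) for id_, m in proficiency.items()}
print(trimmed)
```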

Interactive Grid: Process with PL/SQL only that isn't based off a table

Env: Oracle APEX v5.1 with Oracle 12c Release 2
Firstly, I have created an Interactive Grid that isn't based on an underlying table, as I will process it manually using PL/SQL.
I have been using the following as a guide:
https://apex.oracle.com/pls/apex/germancommunities/apexcommunity/tipp/6361/index-en.html
I basically have the following query:
select
level as id,
level as grid_row,
null as product,
null as product_item
from dual connect by level <= 1
Concentrating on just the product and product_item columns, where product_item will be a read-only column and only the product number can be entered, I would like to achieve the following:
Product Product Item
---------- -------------
123456 123456-1
123456 123456-2
556677 556677-1
654321 654321-1
654321 654321-2
654321 654321-3
123456 123456-3
From the above, as the user types in the Product and then tabs out of the field, I would like a Dynamic Action (DA) to fire that appends the sequence "-1" to the end of that product number. Then, if the user adds another row within the IG and enters the same product number, I want it to append "-2" to the end of it.
Only when the product number changes do I need the sequence to reset to "-1" for that new product, as with 556677, and so forth.
Other scenarios that should also be taken into consideration are as follows:
In the IG above, the user entered 123456 again, and this should calculate that the next sequence for 123456 is "-3".
The same needs to be catered for when a product is removed from the IG: it should always look at the max sequence number for that product.
I was thinking of possibly using APEX_COLLECTIONS as a means of storing what is currently in the grid, since no changes have been committed to the database.
Assuming you have a collection of product values (in this case, I am using the built-in SYS.ODCINUMBERLIST, which is a VARRAY data type), the SQL for your output would be:
SELECT id,
id AS grid_row,
product,
product || '-' || ROW_NUMBER() OVER ( PARTITION BY product ORDER BY id )
AS product_item
FROM (
SELECT ROWNUM AS id,
COLUMN_VALUE AS product
FROM TABLE(
SYS.ODCINUMBERLIST(
123456,
123456,
556677,
654321,
654321,
654321,
123456
)
)
)
ORDER BY id
Output:
ID | GRID_ROW | PRODUCT | PRODUCT_ITEM
-: | -------: | ------: | :-----------
1 | 1 | 123456 | 123456-1
2 | 2 | 123456 | 123456-2
3 | 3 | 556677 | 556677-1
4 | 4 | 654321 | 654321-1
5 | 5 | 654321 | 654321-2
6 | 6 | 654321 | 654321-3
7 | 7 | 123456 | 123456-3
db<>fiddle here
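The ROW_NUMBER() OVER (PARTITION BY product ORDER BY id) expression is standard SQL, so the numbering can be checked outside Oracle as well. The sketch below uses Python's built-in sqlite3, with a plain table standing in for SYS.ODCINUMBERLIST (which is Oracle-specific); the table name `grid` is hypothetical.

```python
import sqlite3

products = [123456, 123456, 556677, 654321, 654321, 654321, 123456]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grid (id INTEGER PRIMARY KEY, product INTEGER)")  # hypothetical stand-in
conn.executemany("INSERT INTO grid (id, product) VALUES (?, ?)",
                 list(enumerate(products, start=1)))

# Number the rows of each product separately, in entry order, and append the number
query = """
SELECT id,
       product,
       product || '-' || ROW_NUMBER() OVER (PARTITION BY product ORDER BY id) AS product_item
FROM grid
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```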
As you mentioned, the data you enter is not saved into the DB whilst you are inserting your products, so it is not in fact stored anywhere.
So you cannot check whether that value already exists and enter a -2 or another suffix.
One thing to consider would be saving the values into a temporary table, so that a function can check how many product_item values LIKE '123456-%' are in there and use that number + 1 as your new product_item.
Or you could take the even harder route and do it all in JavaScript. For this you would need to get all records in the IG, go through them all, count how many occurrences of 123456 there are, and then insert 123456-(number of occurrences + 1).
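Whichever side does the work (a PL/SQL function over a temporary table, or JavaScript over the grid's records), the core calculation is small. Here it is as a language-neutral Python sketch with a hypothetical `next_product_item` helper. It takes the highest existing suffix + 1 rather than the count + 1, because the question asks that removed rows be handled by looking at the max sequence: with a gap (123456-2 deleted), count + 1 would produce a duplicate 123456-3.

```python
def next_product_item(existing_items, product):
    """Return the next 'product-N' label, using the highest existing N for that product.

    existing_items: labels currently in the grid, e.g. ["123456-1", "556677-1"].
    """
    prefix = f"{product}-"
    suffixes = [
        int(item[len(prefix):])
        for item in existing_items
        if item.startswith(prefix) and item[len(prefix):].isdigit()
    ]
    return f"{product}-{max(suffixes, default=0) + 1}"

grid = ["123456-1", "123456-3", "556677-1"]   # 123456-2 was removed
print(next_product_item(grid, 123456))        # -> 123456-4 (max + 1, not count + 1)
print(next_product_item(grid, 654321))        # -> 654321-1 (first item for a new product)
```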

Creating a view in a non-relational database

I have an issue and I hope that someone can help me out. I work on a poorly designed database and I have no control to change things in it. I have a table "Books", and each book can have one or more authors. Unfortunately, the database is not fully relational (please don't ask me why; I have been asking the same question from the beginning). In the table "Books" there are fields called "Author_ID" and "Author_Name", so when a book was written by 2 or 3 authors, their IDs and their names are concatenated in the same record, separated by an asterisk. Here is a demonstration:
ID_BOOK | ID_AUTHOR | NAME AUTHOR | Adress | Country |
----------------------------------------------------------------------------------
001 |01 | AuthorU | AdrU | CtryU |
----------------------------------------------------------------------------------
002 |02*03*04 | AuthorX*AuthorY*AuthorZ | AdrX*NULL*AdrZ | NULL*NULL*CtryZ |
----------------------------------------------------------------------------------
I need to create a view against this table that would give me this result:
ID_BOOK | ID_AUTHOR | NAME AUTHOR | Adress | Country |
----------------------------------------------------------------------------------
001 |01 | AuthorU | AdrU | CtryU |
----------------------------------------------------------------------------------
002 |02 | AuthorX | AdrX | NULL |
----------------------------------------------------------------------------------
002 |03 | AuthorY | NULL | NULL |
----------------------------------------------------------------------------------
002 |04 | AuthorZ | AdrZ | CtryZ |
----------------------------------------------------------------------------------
I will continue trying, and I hope that someone can help me with at least some hints. Many thanks, guys.
After I applied the solutions you gave, I ran into this problem. I am trying to solve it and hopefully you can help me. When the SQL query runs, the CLOB fields become misaligned when some of them contain a NULL value. The result should be like the one above, but I got the result below:
ID_BOOK | ID_AUTHOR | NAME AUTHOR | Adress | Country |
----------------------------------------------------------------------------------
001 |01 | AuthorU | AdrU | CtryU |
----------------------------------------------------------------------------------
002 |02 | AuthorX | AdrX | CtryZ |
----------------------------------------------------------------------------------
002 |03 | AuthorY | AdrZ | NULL |
----------------------------------------------------------------------------------
002 |04 | AuthorZ | NULL | NULL |
----------------------------------------------------------------------------------
Why does it put the NULL values at the end? Thank you.
In 11g you can use recursive subquery factoring (a recursive WITH clause) for this:
with data (id_book, id_author, name, item_author, item_name, i)
as (select id_book, id_author, name,
regexp_substr(id_author, '[^\*]+', 1, 1) item_author,
regexp_substr(name, '[^\*]+', 1, 1) item_name,
2 i
from books
union all
select id_book, id_author, name,
regexp_substr(id_author, '[^\*]+', 1, i) item_author,
regexp_substr(name, '[^\*]+', 1, i) item_name,
i+1
from data
where regexp_substr(id_author, '[^\*]+', 1, i) is not null)
select id_book, item_author, item_name
from data;
fiddle
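SQLite has recursive CTEs but no regexp_substr, so as a rough analogue of the recursive query above, this sketch (run through Python's built-in sqlite3) peels one token off the front of each list per step using instr() and substr(), for the ID, name and address columns in lockstep. Splitting on the delimiter, rather than matching runs of non-asterisks, keeps empty fields in their positions. It assumes a missing address is stored as an empty field between asterisks and that all lists in a row have the same number of fields; lower-case column names are my own choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id_book TEXT, id_author TEXT, name_author TEXT, adress TEXT)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)", [
    ("001", "01", "AuthorU", "AdrU"),
    # assumption: AuthorY's missing address is an empty field between asterisks
    ("002", "02*03*04", "AuthorX*AuthorY*AuthorZ", "AdrX**AdrZ"),
])

query = """
WITH RECURSIVE split(id_book, n, id_author, name_author, adress, rest_id, rest_name, rest_adr) AS (
    -- anchor: no token yet; append a trailing '*' so every token ends with one
    SELECT id_book, 0, '', '', '', id_author || '*', name_author || '*', adress || '*'
    FROM books
    UNION ALL
    -- each step cuts the text before the first '*' off the front of every list
    SELECT id_book, n + 1,
           substr(rest_id,   1, instr(rest_id,   '*') - 1),
           substr(rest_name, 1, instr(rest_name, '*') - 1),
           substr(rest_adr,  1, instr(rest_adr,  '*') - 1),
           substr(rest_id,   instr(rest_id,   '*') + 1),
           substr(rest_name, instr(rest_name, '*') + 1),
           substr(rest_adr,  instr(rest_adr,  '*') + 1)
    FROM split
    WHERE rest_id <> ''
)
SELECT id_book, id_author, name_author, NULLIF(adress, '') AS adress
FROM split
WHERE n > 0
ORDER BY id_book, n
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```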
A couple of weeks ago I answered a similar question here. That answer has an explanation (I hope) of the general approach, so I'll skip the explanation here. This query will do the trick; it uses REGEXP_SUBSTR and leverages its "occurrence" parameter to pick out the individual author IDs and names:
SELECT
ID_Book,
REGEXP_SUBSTR(ID_Author, '[^*]+', 1, Counter) AS AuthID,
REGEXP_SUBSTR(Name_Author, '[^*]+', 1, Counter) AS AuthName
FROM Books
CROSS JOIN (
SELECT LEVEL Counter
FROM DUAL
CONNECT BY LEVEL <= (
SELECT MAX(REGEXP_COUNT(ID_Author, '[^*]+'))
FROM Books))
WHERE REGEXP_SUBSTR(Name_Author, '[^*]+', 1, Counter) IS NOT NULL
ORDER BY 1, 2
There's a Fiddle with your data plus another row here.
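A note on the misaligned output in the follow-up: the `[^*]+` pattern matches runs of one or more non-asterisk characters, so an empty field between two asterisks is not an occurrence at all, and the nth match stops being the nth field. Assuming the "NULL" entries are really empty fields (e.g. `AdrX**AdrZ` and `**CtryZ`), this reproduces exactly the shifted Adress and Country values shown there. Python's `re` applies the same matching rule to this pattern, which makes it easy to see:

```python
import re

address = "AdrX**AdrZ"   # assumed: AuthorY's address is an empty field

# Occurrence-based matching, like REGEXP_SUBSTR(..., '[^*]+', 1, n)
matches = re.findall(r"[^*]+", address)
print(matches)           # the empty field is skipped, so AdrZ slides up to occurrence 2

# Splitting on the delimiter keeps one slot per field
fields = address.split("*")
print(fields)
print([f or None for f in fields])   # treat empty fields as NULL
```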
Addendum: OP has Oracle 9, not 11, so regular expressions won't work. Following are instructions for doing the same task without regexes...
Without REGEXP_COUNT, the best way to count authors is to count the asterisks and add one. To count asterisks, take the length of the string, then subtract its length when all the asterisks are sucked out of it: LENGTH(ID_Author) - LENGTH(REPLACE(ID_Author, '*')).
Without REGEXP_SUBSTR, you need to use INSTR to find the positions of the asterisks, and then SUBSTR to pull out the author IDs and names. This gets a little complicated; consider these Author columns from your original post:
AuthorU
AuthorX*AuthorY*AuthorZ
AuthorX lies between the beginning of the string and the first asterisk.
AuthorY is surrounded by asterisks
AuthorZ lies between the last asterisk and the end of the string.
AuthorU is all alone and not surrounded by anything.
Because of this, the opening piece (WITH AuthorInfo AS... below) adds an asterisk to the beginning and the end so every author name (and ID) is surrounded by asterisks. It also grabs the author count for each row. For the sample data in your original post, the opening piece will yield this:
ID_Book AuthCount ID_Author Name_Author
------- --------- ---------- -------------------------
001 1 *01* *AuthorU*
002 3 *02*03*04* *AuthorX*AuthorY*AuthorZ*
Then comes the join with the "Counter" table and the SUBSTR machinations to pull out the individual names and IDs. The final query looks like this:
WITH AuthorInfo AS (
SELECT
ID_Book,
LENGTH(ID_Author) -
LENGTH(REPLACE(ID_Author, '*')) + 1 AS AuthCount,
'*' || ID_Author || '*' AS ID_Author,
'*' || Name_Author || '*' AS Name_Author
FROM Books
)
SELECT
ID_Book,
SUBSTR(ID_Author,
INSTR(ID_Author, '*', 1, Counter) + 1,
INSTR(ID_Author, '*', 1, Counter+1) - INSTR(ID_Author, '*', 1, Counter) - 1) AS AuthID,
SUBSTR(Name_Author,
INSTR(Name_Author, '*', 1, Counter) + 1,
INSTR(Name_Author, '*', 1, Counter+1) - INSTR(Name_Author, '*', 1, Counter) - 1) AS AuthName
FROM AuthorInfo
CROSS JOIN (
SELECT LEVEL Counter
FROM DUAL
CONNECT BY LEVEL <= (SELECT MAX(AuthCount) FROM AuthorInfo))
WHERE AuthCount >= Counter
ORDER BY ID_Book, Counter
The Fiddle is here
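The INSTR/SUBSTR arithmetic is easy to get off by one, so here is the same technique transcribed into Python as a sketch. The hypothetical `instr` helper mimics Oracle's INSTR(string, '*', 1, occurrence) (1-based, 0 when not found), `substr` mimics SUBSTR with a 1-based start and a length, and each token is the text between the nth and (n+1)th asterisk of the wrapped string, using the same start/length formula as the query.

```python
def instr(s, sub, occurrence):
    """1-based position of the nth occurrence of sub in s, or 0 (like INSTR(s, sub, 1, n))."""
    pos = -1
    for _ in range(occurrence):
        pos = s.find(sub, pos + 1)
        if pos == -1:
            return 0
    return pos + 1

def substr(s, start, length):
    """SUBSTR with a 1-based start position and a length."""
    return s[start - 1:start - 1 + length]

def split_authors(id_author, name_author):
    auth_count = len(id_author) - len(id_author.replace("*", "")) + 1   # asterisks + 1
    ids, names = f"*{id_author}*", f"*{name_author}*"                    # wrap every token
    result = []
    for counter in range(1, auth_count + 1):
        id_start, id_end = instr(ids, "*", counter), instr(ids, "*", counter + 1)
        nm_start, nm_end = instr(names, "*", counter), instr(names, "*", counter + 1)
        result.append((substr(ids, id_start + 1, id_end - id_start - 1),
                       substr(names, nm_start + 1, nm_end - nm_start - 1)))
    return result

print(split_authors("01", "AuthorU"))
print(split_authors("02*03*04", "AuthorX*AuthorY*AuthorZ"))
```

Because the positions come from the asterisks themselves, an empty field between two asterisks yields an empty token instead of shifting the later fields.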
If you have an authors table, you can do:
select b.id_book, a.id_author, a.name_author
from books b left outer join
authors a
on '*' || b.name_author || '*' like '%*' || a.name_author || '*%'
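Wrapping both sides in asterisks is what makes the LIKE match whole names only, so `AuthorX` cannot match inside a longer name such as `AuthorXY`. A minimal check of that join condition in Python's built-in sqlite3, assuming a hypothetical `authors` table with `id_author` and `name_author` columns (the `AuthorXY` row is hypothetical too, added to show a near-miss):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id_book TEXT, name_author TEXT)")
conn.execute("CREATE TABLE authors (id_author TEXT, name_author TEXT)")  # hypothetical lookup table
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [("001", "AuthorU"), ("002", "AuthorX*AuthorY*AuthorZ")])
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [("01", "AuthorU"), ("02", "AuthorX"), ("03", "AuthorY"),
                  ("04", "AuthorZ"), ("05", "AuthorXY")])   # AuthorXY must not match

# || binds tighter than LIKE, so each side is concatenated before the comparison
query = """
SELECT b.id_book, a.id_author, a.name_author
FROM books b
LEFT OUTER JOIN authors a
  ON '*' || b.name_author || '*' LIKE '%*' || a.name_author || '*%'
ORDER BY b.id_book, a.id_author
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

One caveat: `%` or `_` inside an author name would act as LIKE wildcards.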
In addition:
SELECT distinct id_book
, trim(regexp_substr(id_author, '[^*]+', 1, LEVEL)) id_author
, trim(regexp_substr(author_name, '[^*]+', 1, LEVEL)) author_name
FROM yourtable
CONNECT BY LEVEL <= regexp_count(id_author, '[^*]+')
ORDER BY id_book, id_author
/
ID_BOOK ID_AUTHOR AUTHOR_NAME
------------------------------------
001 01 AuthorU
002 02 AuthorX
002 03 AuthorY
002 04 AuthorZ
003 123 Jane Austen
003 456 David Foster Wallace
003 789 Richard Wright
No REGEXP:
SELECT str, SUBSTR(str, substr_start_pos, substr_end_pos) final_str
FROM
(
SELECT str, substr_start_pos
, (CASE WHEN substr_end_pos <= 0 THEN (Instr(str, '*', 1)-1)
ELSE substr_end_pos END) substr_end_pos
FROM
(
SELECT distinct '02*03*04' AS str
, (Instr('02*03*04', '*', LEVEL)+1) substr_start_pos
, (Instr('02*03*04', '*', LEVEL)-1) substr_end_pos
FROM dual
CONNECT BY LEVEL <= length('02*03*04')
)
ORDER BY substr_start_pos
)
/
STR FINAL_STR
---------------------
02*03*04 02
02*03*04 03
02*03*04 04

SQLite Compare two columns

I am creating a database for my Psych class and I am scoring a personality profile. I need to compare two test items and, if they match a condition, then copy into a separate table.
Example (SQLite3; the pseudocode is between the \\ markers):
INSERT INTO Scale
SELECT* FROM Questions
WHERE \\if Question 1 IS 'TRUE' AND Question 3 IS 'FALSE' THEN Copy this Question
and its response into the Scale table\\;
I have about 100 other questions that work like this. Sample format goes like this:
IF FirstQuestion IS value AND SecondQuestion IS value THEN
Copy both questions into the Scale TABLE.
---------- EDITED AFTER FIRST RESPONSE! EDITS FOLLOW-------------
Here is my TestItems table:
ItemID | ItemQuestion | ItemResponse
---------------------------------------------------
1 | Is the sky blue? | TRUE
2 | Are you a person? | TRUE
3 | 2 Plus 2 Equals Five | FALSE
What I want to do: If Question 1 is TRUE AND Question 3 is FALSE, then insert BOTH questions into the table 'Scale' (which is set up like TestItems). I tried this:
INSERT INTO Scale
SELECT * FROM TestItems
WHERE ((ItemID=1) AND (ItemResponse='TRUE'))
AND ((ItemID=3) AND (ItemResponse='FALSE'));
HOWEVER: The above INSERT copies neither.
The Resulting 'Scale' table should look like this:
ItemID | ItemQuestion | ItemResponse
---------------------------------------------------
1 | Is the sky blue? | TRUE
3 | 2 Plus 2 Equals Five | FALSE
There is nothing wrong with your query. You're almost there:
INSERT INTO Scale
SELECT * FROM Questions
WHERE `Question 1` = 1 AND `Question 3` = 0;
Here 1 and 0 are the values (in your first case, true and false). First of all, you should ensure there are fields Question 1 and Question 3 in your Questions table. Secondly, the column count as well as the data types of the Scale table should match the Questions table. Otherwise you will have to selectively choose the fields in your SELECT query.
Edit: To respond to your edit, I am not seeing an elegant solution. You could do this:
INSERT INTO Scale
SELECT * FROM (
SELECT * FROM TestItems WHERE ItemID = 1 AND ItemResponse = 'TRUE'
UNION
SELECT * FROM TestItems WHERE ItemID = 3 AND ItemResponse = 'FALSE'
) AS pair
WHERE (SELECT COUNT(*) FROM (
SELECT ItemID FROM TestItems WHERE ItemID = 1 AND ItemResponse = 'TRUE'
UNION
SELECT ItemID FROM TestItems WHERE ItemID = 3 AND ItemResponse = 'FALSE'
) AS t) >= 2;
Your insert did not work because ItemID can't be both 1 and 3 in the same row. My solution selects the records to be inserted into the Scale table, but first verifies that both records exist by checking the count. Additionally, you could (should) use the version below, since it can be marginally more efficient (the SQL above was written to show the logic clearly):
INSERT INTO Scale
SELECT * FROM TestItems
WHERE (ItemID = 1 AND ItemResponse = 'TRUE'
OR ItemID = 3 AND ItemResponse = 'FALSE')
AND (
SELECT COUNT(*)
FROM TestItems
WHERE ItemID = 1 AND ItemResponse = 'TRUE'
OR ItemID = 3 AND ItemResponse = 'FALSE'
) >= 2;
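A quick way to check this logic is to run it against the sample TestItems rows in Python's built-in sqlite3. The sketch uses the single-pass form (filter the two rows, and copy them only if both conditions hold). It assumes Scale is defined like TestItems, as the question says, and that ItemID is unique in TestItems; otherwise duplicate rows could satisfy the count on their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("TestItems", "Scale"):   # Scale is set up like TestItems
    conn.execute(f"CREATE TABLE {table} (ItemID INTEGER, ItemQuestion TEXT, ItemResponse TEXT)")
conn.executemany("INSERT INTO TestItems VALUES (?, ?, ?)", [
    (1, "Is the sky blue?", "TRUE"),
    (2, "Are you a person?", "TRUE"),
    (3, "2 Plus 2 Equals Five", "FALSE"),
])

# Copy items 1 and 3 only if item 1 is TRUE *and* item 3 is FALSE.
# No single row has ItemID 1 and 3 at once, so the pair condition is a COUNT over the table.
insert_pair = """
INSERT INTO Scale
SELECT * FROM TestItems
WHERE (ItemID = 1 AND ItemResponse = 'TRUE' OR ItemID = 3 AND ItemResponse = 'FALSE')
  AND (SELECT COUNT(*) FROM TestItems
       WHERE ItemID = 1 AND ItemResponse = 'TRUE'
          OR ItemID = 3 AND ItemResponse = 'FALSE') >= 2
"""
conn.execute(insert_pair)
copied = conn.execute("SELECT * FROM Scale ORDER BY ItemID").fetchall()
print(copied)

# If the condition fails (item 3 answered TRUE), nothing is copied.
conn.execute("DELETE FROM Scale")
conn.execute("UPDATE TestItems SET ItemResponse = 'TRUE' WHERE ItemID = 3")
conn.execute(insert_pair)
print(conn.execute("SELECT COUNT(*) FROM Scale").fetchone()[0])
```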
