How to merge records based on consecutive fields in Teradata - teradata

I have a source table like the one below:
+---------+--------+---------+------+
| ID | SEQ_NO | UNIT_ID | D_ID |
+---------+--------+---------+------+
| 7979092 | 1 | 99 | 759 |
| 7979092 | 2 | -1 | 869 |
| 7979092 | 3 | -1 | 927 |
| 7979092 | 4 | -1 | 812 |
| 7979092 | 5 | 99 | 900 |
| 7979092 | 6 | 99 | 891 |
| 7979092 | 7 | -1 | 785 |
| 7979092 | 8 | -1 | 762 |
| 7979092 | 9 | -1 | 923 |
+---------+--------+---------+------+
I have to merge rows when consecutive rows (ordered by SEQ_NO) have the same UNIT_ID, taking max(D_ID) for each merged group. The expected output is:
+---------+---------+------+
| ID | UNIT_ID | D_ID |
+---------+---------+------+
| 7979092 | 99 | 759 |
| 7979092 | -1 | 927 |
| 7979092 | 99 | 900 |
| 7979092 | -1 | 923 |
+---------+---------+------+
I tried to solve this with Teradata ordered analytical functions, but could not find a solution. I use Teradata 16.
Thank You.

This logic is a bit quirky: it's based on two sequences created by different sort orders.
SELECT
ID
,UNIT_ID
,Max(D_ID)
FROM
(
SELECT
ID
,SEQ_NO
,UNIT_ID
,D_ID
-- assign the same value to consecutive UNIT_IDs
,SEQ_NO -
Row_Number()
Over(PARTITION BY ID, UNIT_ID
ORDER BY SEQ_NO) AS grp
FROM tab
) AS dt
GROUP BY 1,2,grp
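To see why the difference of the two sequences works, here is a minimal sketch that runs the same query shape against the sample rows. It uses Python's built-in sqlite3 module as a stand-in for Teradata (an assumption made only so the example is runnable; it needs an SQLite build with window functions, 3.25 or later). The ORDER BY on MIN(SEQ_NO) is added here only to get a stable output order. Note that the trick relies on SEQ_NO having no gaps within an ID; with gaps, a ROW_NUMBER() over (PARTITION BY ID ORDER BY SEQ_NO) would have to take its place.

```python
import sqlite3

# Sample rows from the question: (ID, SEQ_NO, UNIT_ID, D_ID)
ROWS = [
    (7979092, 1, 99, 759), (7979092, 2, -1, 869), (7979092, 3, -1, 927),
    (7979092, 4, -1, 812), (7979092, 5, 99, 900), (7979092, 6, 99, 891),
    (7979092, 7, -1, 785), (7979092, 8, -1, 762), (7979092, 9, -1, 923),
]

QUERY = """
SELECT ID, UNIT_ID, MAX(D_ID) AS D_ID
FROM (
    SELECT ID, SEQ_NO, UNIT_ID, D_ID,
           -- SEQ_NO grows by 1 per row, ROW_NUMBER grows by 1 per row of the
           -- same UNIT_ID, so the difference is constant within a run
           SEQ_NO - ROW_NUMBER() OVER (PARTITION BY ID, UNIT_ID
                                       ORDER BY SEQ_NO) AS grp
    FROM tab
) AS dt
GROUP BY ID, UNIT_ID, grp
ORDER BY ID, MIN(SEQ_NO)
"""


def merge_consecutive(rows):
    """Load the rows into an in-memory table and return one row per run."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tab (ID INTEGER, SEQ_NO INTEGER, UNIT_ID INTEGER, D_ID INTEGER)"
    )
    con.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


merged = merge_consecutive(ROWS)
print(merged)
```

For the sample data, grp comes out as 0 for SEQ_NO 1, 1 for SEQ_NO 2 to 4, and 3 for both remaining runs; the two runs that share grp = 3 are still kept apart because UNIT_ID is part of the GROUP BY.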

You can use RESET WHEN to dynamically create groups within the window. Here's one way to do it:
select ID, UNIT_ID,
max(D_ID) over(
partition by ID order by SEQ_NO
reset when UNIT_ID <> UNIT_ID_prev -- Create new group for new value
) as D_ID
from (
select ID, SEQ_NO, UNIT_ID, D_ID,
lag(UNIT_ID) over(partition by ID order by SEQ_NO) as UNIT_ID_prev -- Previous value
from MY_TABLE
) src
qualify row_number() over(
partition by ID order by SEQ_NO
reset when UNIT_ID <> UNIT_ID_prev -- Match original max() window
) = 1 -- One row per group (similar to DISTINCT)
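RESET WHEN is Teradata-specific. As a point of comparison, a commonly used portable variant of the same idea flags each row where UNIT_ID changes and then takes a running sum of the flags as a group number. The sketch below is not the query above; it is a different formulation, run through Python's sqlite3 module (assumed to be built with SQLite 3.25 or later) purely for illustration, with the CTE names flagged and numbered chosen for this example.

```python
import sqlite3

# Sample rows from the question: (ID, SEQ_NO, UNIT_ID, D_ID)
ROWS = [
    (7979092, 1, 99, 759), (7979092, 2, -1, 869), (7979092, 3, -1, 927),
    (7979092, 4, -1, 812), (7979092, 5, 99, 900), (7979092, 6, 99, 891),
    (7979092, 7, -1, 785), (7979092, 8, -1, 762), (7979092, 9, -1, 923),
]

QUERY = """
WITH flagged AS (
    SELECT ID, SEQ_NO, UNIT_ID, D_ID,
           -- 1 on the first row of each run (LAG is NULL on the very first row,
           -- so the comparison is not true and the ELSE branch applies)
           CASE WHEN UNIT_ID = LAG(UNIT_ID) OVER (PARTITION BY ID ORDER BY SEQ_NO)
                THEN 0 ELSE 1 END AS is_new_run
    FROM MY_TABLE
),
numbered AS (
    SELECT ID, SEQ_NO, UNIT_ID, D_ID,
           -- running count of run starts = run number
           SUM(is_new_run) OVER (PARTITION BY ID ORDER BY SEQ_NO) AS run_no
    FROM flagged
)
SELECT ID, UNIT_ID, MAX(D_ID) AS D_ID
FROM numbered
GROUP BY ID, run_no, UNIT_ID
ORDER BY ID, run_no
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE MY_TABLE (ID INTEGER, SEQ_NO INTEGER, UNIT_ID INTEGER, D_ID INTEGER)"
)
con.executemany("INSERT INTO MY_TABLE VALUES (?, ?, ?, ?)", ROWS)
runs = con.execute(QUERY).fetchall()
con.close()
print(runs)
```

Unlike the SEQ_NO minus ROW_NUMBER approach, this variant does not depend on SEQ_NO being gapless, at the cost of one extra window pass.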


change column type and convert the existing values from string to integer in mariadb

I have a table name employees
MariaDB [company]> describe employees;
+----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| employee_id | char(10) | NO | | NULL | |
| first_name | varchar(20) | NO | | NULL | |
| last_name | varchar(20) | NO | | NULL | |
| email | varchar(60) | NO | | NULL | |
| phone_number | char(14) | NO | | NULL | |
| hire_date | date | NO | | NULL | |
| job_id | int(11) | NO | | NULL | |
| salary | varchar(30) | NO | | NULL | |
| commission_pct | char(10) | NO | | NULL | |
| manager_id | char(10) | NO | | NULL | |
| department_id | char(10) | NO | | NULL | |
+----------------+-------------+------+-----+---------+-------+
MariaDB [company]> select * from employees;
+-------------+-------------+-------------+--------------------+---------------+------------+--------+----------+----------------+------------+---------------+
| employee_id | first_name | last_name | email | phone_number | hire_date | job_id | salary | commission_pct | manager_id | department_id |
+-------------+-------------+-------------+--------------------+---------------+------------+--------+----------+----------------+------------+---------------+
| 100 | Steven | King | sking#gmail.com | 515.123.4567 | 2003-06-17 | 1 | 24000.00 | 0.00 | 0 | 90 |
| 101 | Neena | Kochhar | nkochhar#gmail.com | 515.123.4568 | 2005-09-21 | 2 | 17000.00 | 0.00 | 100 | 90 |
| 102 | Lex | Wow | Lwow#gmail.com | 515.123.4569 | 2001-01-13 | 2 | 17000.00 | 0.00 | 100 | 9 |
| 103 | Alexander | Hunold | ahunold#gmail.com | 590.423.4567 | 2006-01-03 | 3 | 9000.00 | 0.00 | 102 | 60 |
| 104 | Bruce | Ernst | bernst#gmail.com | 590.423.4568 | 2007-05-21 | 3 | 6000.00 | 0.00 | 103 | 60 |
| 105 | David | Austin | daustin#gmail.com | 590.423.4569 | 2005-06-25 | 3 | 4800.00 | 0.00 | 103 | 60 |
| 106 | Valli | Pataballa | vpatabal#gmail.com | 590.423.4560 | 2006-02-05 | 3 | 4800.00 | 0.00 | 103 | 60 |
| 107 | Diana | Lorentz | dlorentz#gmail.com | 590.423.5567 | 2007-02-07 | 3 | 4200.00 | 0.00 | 103 | 60 |
| 108 | Nancy | Greenberg | ngreenbe#gmail.com | 515.124.4569 | 2002-08-17 | 4 | 12008.00 | 0.00 | 101 | 100 |
| 109 | Daniel | Faviet | dfaviet#gmail.com | 515.124.4169 | 2002-08-16 | 5 | 9000.00 | 0.00 | 108 | 100 |
| 110 | John | Chen | jchen#gmail.com | 515.124.4269 | 2005-09-28 | 5 | 8200.00 | 0.00 | 108 | 100 |
| 111 | Ismael | Sciarra | isciarra#gmail.com | 515.124.4369 | 2005-09-30 | 5 | 7700.00 | 0.00 | 108 | 100 |
| 112 | Jose | Urman | jurman#gmail.com | 515.124.4469 | 2006-03-07 | 5 | 7800.00 | 0.00 | 108 | 100 |
| 113 | Luis | Popp | lpopp#gmail.com | 515.124.4567 | 2007-12-07 | 5 | 6900.00 | 0.00 | 108 | 100 |
| 114 | Den | Raphaely | drapheal#gmail.com | 515.127.4561 | 2002-12-07 | 6 | 11000.00 | 0.00 | 100 | 30 |
| 115 | Alexander | Khoo | akhoo#gmail.com | 515.127.4562 | 2003-05-18 | 7 | 3100.00 | 0.00 | 114 | 30 |
+-------------+-------------+-------------+--------------------+---------------+------------+--------+----------+----------------+------------+---------------+
I wanted to change the salary column from string to integer, so I ran this command:
MariaDB [company]> alter table employees modify column salary int;
ERROR 1292 (22007): Truncated incorrect INTEGER value: '24000.00'
As you can see, it gave me a truncation error. I found some previous questions that showed how to use convert() and trim(), but those didn't actually answer my question.
The SQL code and data can be found here: https://0x0.st/oYoB.com_5zfu
I tested this on MySQL and it worked fine. So it is apparently an issue only with MariaDB.
The problem is that a string like '24000.00' is not an integer. Integers don't have a decimal place. So in strict mode, the implicit type conversion fails.
I was able to work around this by running this update first:
update employees set salary = round(salary);
The column is still a string, but '24000.00' has been changed to '24000' (with no decimal point character or following digits).
Then you can alter the data type, and implicit type conversion to integer works:
alter table employees modify column salary int;
See demonstration using MariaDB 10.6:
https://dbfiddle.uk/V6LrEMKt
P.S.: You misspelled the column name "commission_pct" as "comission_pct" in your sample DDL, and I had to edit that to test. In the future, please use one of the db fiddle sites to share samples, because they will test your code.
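One caveat with the workaround: round() changes any value that has a non-zero fractional part, so it may be worth checking the stored strings first. The following is a small sketch in Python, outside the database, of what such a check looks like. The first three values come from the question's table; "1234.56" is a hypothetical value added only to show a case that would be altered.

```python
from decimal import Decimal

# Salary strings as stored in the varchar column. "1234.56" is hypothetical,
# added to demonstrate a value that rounding would change.
salaries = ["24000.00", "17000.00", "4800.00", "1234.56"]


def is_whole_number(text):
    """Return True when the decimal string has no non-zero fractional part."""
    value = Decimal(text)
    return value == value.to_integral_value()


# Values that would lose information if rounded to an integer
lossy = [s for s in salaries if not is_whole_number(s)]
print(lossy)
```

If the list is empty, rounding only strips the trailing ".00" and no salary information is lost.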

how to solve problem running code for MySQL8 on MySQL 5.7?

I have the following data:
+---------+--------+----------+------+-------+--------+-----------+
| xType | xAccID | xAccName | xCat | xYear | xMonth | xRaseed |
+---------+--------+----------+------+-------+--------+-----------+
| Amounts | 52 | Acc1 | Rs | 2020 | 11 | 3144.83 |
| Amounts | 52 | Acc1 | Rs | 2020 | 12 | -15199.64 |
| Amounts | 53 | Acc2 | Cus | 2020 | 12 | 5306.04 |
| Amounts | 53 | Acc2 | Cus | 2020 | 11 | 1090.64 |
+---------+--------+----------+------+-------+--------+-----------+
I want to add the xRaseed of the current row to the xRaseed of the previous row, separately for each xAccID.
The result that I want:
+---------+--------+----------+------+-------+--------+--------------------------------+
| xType | xAccID | xAccName | xCat | xYear | xMonth | xRaseed |
+---------+--------+----------+------+-------+--------+--------------------------------+
| Amounts | 52 | Acc1 | Rs | 2020 | 11 | 3144.83 |
| Amounts | 52 | Acc1 | Rs | 2020 | 12 | Not -15199.64 But (-12,054.81) |
| Amounts | 53 | Acc2 | Cus | 2020 | 12 | 5306.04 |
| Amounts | 53 | Acc2 | Cus | 2020 | 11 | Not 1090.64 But (6,396.68) |
+---------+--------+----------+------+-------+--------+--------------------------------+
I applied the following solution that I got from somebody here:
select t.*,
sum(xRaseed) over (partition by xAccID order by xYear, xMonth) as running_xRaseed
from t;
Everything worked on my local server, but when I applied the solution on my hosting it did not work. Locally I use XAMPP (10.4.17-MariaDB), and my hosting uses MySQL 5.7.23-23. What is the problem, please?
Here is a db<>fiddle
On versions of MySQL earlier than 8.0, which do not support window functions, we can use a correlated subquery to compute the rolling sum:
SELECT xType, xAccID, xAccName, xCat, xYear, xMonth,
(SELECT SUM(t2.xRaseed) FROM yourTable t2
WHERE t2.xAccID = t1.xAccID AND
(t2.xYear < t1.xYear OR
(t2.xYear = t1.xYear AND t2.xMonth <= t1.xMonth))) AS xRaseed
FROM yourTable t1
ORDER BY
xAccId,
xYear,
xMonth;
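As a quick check of the correlated-subquery approach, here is a sketch that runs the same query shape on the four sample rows. It uses Python's sqlite3 module instead of MySQL 5.7 (an assumption made for runnability; the query uses no window functions, so the SQLite version does not matter). The column alias running_xRaseed is borrowed from the window-function version in the question.

```python
import sqlite3

# Sample rows from the question
ROWS = [
    ("Amounts", 52, "Acc1", "Rs", 2020, 11, 3144.83),
    ("Amounts", 52, "Acc1", "Rs", 2020, 12, -15199.64),
    ("Amounts", 53, "Acc2", "Cus", 2020, 12, 5306.04),
    ("Amounts", 53, "Acc2", "Cus", 2020, 11, 1090.64),
]

QUERY = """
SELECT t1.xAccID, t1.xYear, t1.xMonth,
       -- sum every row of the same account up to and including this month
       (SELECT SUM(t2.xRaseed)
        FROM yourTable t2
        WHERE t2.xAccID = t1.xAccID
          AND (t2.xYear < t1.xYear
               OR (t2.xYear = t1.xYear AND t2.xMonth <= t1.xMonth))) AS running_xRaseed
FROM yourTable t1
ORDER BY t1.xAccID, t1.xYear, t1.xMonth
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE yourTable (xType TEXT, xAccID INTEGER, xAccName TEXT, "
    "xCat TEXT, xYear INTEGER, xMonth INTEGER, xRaseed REAL)"
)
con.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
# Round to 2 decimals to hide binary floating-point noise
running = [
    (acc, year, month, round(total, 2))
    for acc, year, month, total in con.execute(QUERY)
]
con.close()
print(running)
```

The December row of account 52 and the December row of account 53 carry the cumulative totals the question asks for (-12054.81 and 6396.68).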

add and subtract by type

I have a SQLite table payments:
+------+--------+-------+
| user | amount | type |
+------+--------+-------+
| AAA | 100 | plus |
| AAA | 200 | plus |
| AAA | 50 | minus |
| BBB | 100 | plus |
| BBB | 20 | minus |
| BBB | 5 | minus |
| CCC | 200 | plus |
| CCC | 300 | plus |
| CCC | 25 | minus |
+------+--------+-------+
I need to calculate the sum with type 'plus' and subtract from it the sum with type 'minus' for each user.
The result table should look like this:
+------+--------+
| user | total |
+------+--------+
| AAA | 250 |
| BBB | 75 |
| CCC | 475 |
+------+--------+
I think that my query is terrible, and I need help to improve it:
select user,
(select sum(amount) from payments as TABLE1 WHERE TABLE1.type = 'plus' AND
TABLE1.user= TABLE3.user) -
(select sum(amount) from payments as TABLE2 WHERE TABLE2.type = 'minus' AND
TABLE2.user= TABLE3.user) as total
from payments as TABLE3
group by client
order by id asc
The type is more easily handled with a CASE expression. You can then fold the aggregation into the outer query:
SELECT user,
SUM(CASE type
WHEN 'plus' THEN amount
WHEN 'minus' THEN -amount
END) AS total
FROM payments
GROUP BY user
ORDER BY user;
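A minimal sketch that checks the CASE-based aggregation against the sample data, using Python's sqlite3 module. It groups and orders by user, the only identifying column shown in the sample table; that choice is an assumption about the real schema.

```python
import sqlite3

# Sample rows from the question: (user, amount, type)
ROWS = [
    ("AAA", 100, "plus"), ("AAA", 200, "plus"), ("AAA", 50, "minus"),
    ("BBB", 100, "plus"), ("BBB", 20, "minus"), ("BBB", 5, "minus"),
    ("CCC", 200, "plus"), ("CCC", 300, "plus"), ("CCC", 25, "minus"),
]

QUERY = """
SELECT user,
       -- 'plus' rows count positively, 'minus' rows negatively
       SUM(CASE type
               WHEN 'plus' THEN amount
               WHEN 'minus' THEN -amount
           END) AS total
FROM payments
GROUP BY user
ORDER BY user
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payments (user TEXT, amount INTEGER, type TEXT)")
con.executemany("INSERT INTO payments VALUES (?, ?, ?)", ROWS)
totals = con.execute(QUERY).fetchall()
con.close()
print(totals)
```

Because the table is scanned once instead of twice per user, this form also avoids the correlated subqueries of the original attempt.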

How do I take out data from an event for multiple parameters with value of one parameter being the same in the event

Take for example,
event_dim.name = "Start_Level"
event_dim.params.key = "Chapter_Name"
event_dim.params.value.string_value = "chapter_1" (or "chapter_2" or "chapter_3" and so on)
event_dim.params.key = "Level"
event_dim.params.value.int_value = 1 or 2 or 3 or 4 and so on
event_dim.params.key = "Opening_Balance"
event_dim.params.value = 1000 or 1200 or 300 or so on
How do I take out the data if I want to:
- Look at unique users who've played "Level" only for event_dim.params.string_value = "chapter_1" (meaning for levels in Chapter 1)
- Look at the "Opening_Balance" per "Level" only the levels in the chapter where event_dim.params.key = "Chapter_Name" and event_dim.params.value.string_value = "chapter_2"
Currently I am trying the query below, but I don't think it returns the right data. I am trying to extract level data for users who installed the game within a particular date range (via first_open) and from a particular source:
SELECT
COUNT(DISTINCT(app_instance)),
event_value.int_value
FROM (
SELECT
user_dim.app_info.app_instance_id AS app_instance,
event.name AS event,
(
SELECT
user_prop.value.value.int_value
FROM
UNNEST(user_dim.user_properties) AS user_prop
WHERE
user_prop.key = 'first_open_time') AS first_open,
params.key AS event_param,
params.value AS event_value
FROM
`app_package.app_events_*`,
UNNEST(event_dim) AS event,
UNNEST(event.params) AS params
WHERE
event.name = "start_level"
AND user_dim.traffic_source.user_acquired_source = "source"
AND params.key != 'firebase_event_origin'
AND params.key != 'firebase_screen_class'
AND params.key != 'firebase_screen_id' )
WHERE
event_param = "Level"
AND (first_open >= 1516579200000 AND first_open <= 1516924800000)
GROUP BY
event_value.int_value
However, I am not able to segregate events which are specific to when chapter_name = "chapter_1" in the event. (I don't know how to do it unfortunately and hence the question)
Update: (Some additional information added as requested by Mikhail)
Sample Input events would be as follows:
+-----------------+-------------+-----------------+--------------+-----------+
| app_instance_id | event_name | param_key | string_value | int_value |
+-----------------+-------------+-----------------+--------------+-----------+
| 100001 | start_level | chapter_name | chapter_1 | null |
| | | level | null | 1 |
| | | opening_balance | null | 2000 |
| | start_level | chapter_name | chapter_1 | null |
| | | level | null | 2 |
| | | opening_balance | null | 2500 |
| | start_level | chapter_name | chapter_1 | null |
| | | level | null | 2 |
| | | opening_balance | null | 2750 |
| | start_level | chapter_name | chapter_1 | null |
| | | level | null | 3 |
| | | opening_balance | null | 3000 |
| | start_level | chapter_name | chapter_2 | null |
| | | level | null | 1 |
| | | opening_balance | null | 3100 |
| | start_level | chapter_name | chapter_2 | null |
| | | level | null | 2 |
| | | opening_balance | null | 3500 |
| | start_level | chapter_name | chapter_2 | null |
| | | level | null | 3 |
| | | opening_balance | null | 3800 |
| 100002 | start_level | chapter_name | chapter_1 | null |
| | | level | null | 1 |
| | | opening_balance | null | 2000 |
| | start_level | chapter_name | chapter_1 | null |
| | | level | null | 2 |
| | | opening_balance | null | 2250 |
| | start_level | chapter_name | chapter_1 | null |
| | | level | null | 2 |
| | | opening_balance | null | 2400 |
| | start_level | chapter_name | chapter_1 | null |
| | | level | null | 3 |
| | | opening_balance | null | 2800 |
| | start_level | chapter_name | chapter_2 | null |
| | | level | null | 1 |
| | | opening_balance | null | 3000 |
| | start_level | chapter_name | chapter_2 | null |
| | | level | null | 2 |
| | | opening_balance | null | 3200 |
+-----------------+-------------+-----------------+--------------+-----------+
Output required is as follows:
+-----------+-------+--------------+-------------------+---------------+
| Chapter | Level | Unique Users | Total Level Start | Avg. Open Bal |
+-----------+-------+--------------+-------------------+---------------+
| chapter_1 | 1 | 2 | 2 | 2000 |
| chapter_1 | 2 | 2 | 3 | 2383 |
| chapter_1 | 3 | 2 | 3 | 2850 |
| chapter_2 | 1 | 2 | 2 | 3050 |
| chapter_2 | 2 | 2 | 2 | 3350 |
| chapter_2 | 3 | 1 | 1 | 3800 |
+-----------+-------+--------------+-------------------+---------------+
For anyone looking for an answer to this question, you can try the standard SQL query below:
SELECT
chapter,
level,
count(distinct id) as Unique_Users,
count(id) as Level_start,
avg(open_bal) as Avg_Open_Bal
FROM(
SELECT
user_dim.app_info.app_instance_id AS id,
event.date,
event.name,
(SELECT value.string_value FROM UNNEST(event.params) WHERE key = "chapter_name") AS chapter,
(SELECT value.int_value FROM UNNEST(event.params) WHERE key = "level") AS level,
(SELECT value.int_value FROM UNNEST(event.params) WHERE key = "opening_coin_balance") AS open_bal
FROM
`<table_name>`,
UNNEST(event_dim) AS event
WHERE
event.name = "start_level"
)
GROUP BY
chapter,
level
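The key step in the query is turning each event's key/value parameters into one row with a chapter, a level, and a balance, and only then aggregating. The following sketch shows that pivot-then-group logic in plain Python on a few events taken from the sample input. The in-memory EVENTS layout and the summarize helper are hypothetical stand-ins for the BigQuery table, not part of any Firebase or BigQuery API.

```python
from collections import defaultdict

# (app_instance_id, [(param_key, value), ...]) for a few start_level events
# taken from the sample input; the layout itself is hypothetical.
EVENTS = [
    ("100001", [("chapter_name", "chapter_1"), ("level", 1), ("opening_balance", 2000)]),
    ("100001", [("chapter_name", "chapter_1"), ("level", 2), ("opening_balance", 2500)]),
    ("100002", [("chapter_name", "chapter_1"), ("level", 1), ("opening_balance", 2000)]),
    ("100002", [("chapter_name", "chapter_1"), ("level", 2), ("opening_balance", 2250)]),
]


def summarize(events):
    """Return {(chapter, level): (unique_users, level_starts, avg_open_bal)}."""
    groups = defaultdict(lambda: {"users": set(), "starts": 0, "balance": 0})
    for user, params in events:
        # Pivot: one dict per event, like the three scalar subqueries in the SQL
        row = dict(params)
        group = groups[(row["chapter_name"], row["level"])]
        group["users"].add(user)
        group["starts"] += 1
        group["balance"] += row["opening_balance"]
    return {
        key: (len(g["users"]), g["starts"], g["balance"] / g["starts"])
        for key, g in sorted(groups.items())
    }


summary = summarize(EVENTS)
print(summary)
```

Filtering to a single chapter then becomes a plain condition on the pivoted chapter value, which is what the original query could not express while the parameters were still separate rows.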

Query performance - 'Left join is null' vs 'Not exists select'

I have a question about a query that I want to execute, but I don't know which form performs best. I need to get all the words except those that have a related row in the table wordfilter.
The output of both queries is correct, but maybe there is a better solution. I have almost no knowledge of query plans; I'm trying to understand them now.
SELECT CONCAT(SPACE(1), UCASE(stocknews.word.word), SPACE(1)) AS word, stocknews.word.language
FROM stocknews.word
WHERE NOT EXISTS (SELECT word_id FROM stocknews.wordfilter WHERE stocknews.word.id = word_id)
AND user_id = 1
+----+--------------+------------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |
+----+--------------+------------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | PRIMARY | word | ref | user_id | user_id | 4 | const | 843 | Using where |
| 2 | MATERIALIZED | wordfilter | index | PRIMARY | PRIMARY | 756 | | 16 | Using index |
+----+--------------+------------+-------+---------------+---------+---------+-------+------+-------------+
Compared with:
SELECT CONCAT(SPACE(1), UCASE(stocknews.word.word), SPACE(1)) AS word, stocknews.word.language
FROM stocknews.word
LEFT JOIN stocknews.wordfilter ON stocknews.word.id = stocknews.wordfilter.word_id
WHERE stocknews.wordfilter.word_id IS NULL AND user_id = 1
+----+-------------+------------+------+---------------+---------+---------+---------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |
+----+-------------+------------+------+---------------+---------+---------+---------+------+--------------------------------------+
| 1 | SIMPLE | word | ref | user_id | user_id | 4 | const | 843 | |
| 1 | SIMPLE | wordfilter | ref | PRIMARY | PRIMARY | 4 | word.id | 1 | Using where; Using index; Not exists |
+----+-------------+------------+------+---------------+---------+---------+---------+------+--------------------------------------+
Any help is welcome! An explanation would be nice.
Edit:
For query 1:
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 0 |
| Handler_icp_attempts | 0 |
| Handler_icp_match | 0 |
| Handler_mrr_init | 0 |
| Handler_mrr_key_refills | 0 |
| Handler_mrr_rowid_refills | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 1044 |
| Handler_read_last | 0 |
| Handler_read_next | 859 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_deleted | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_tmp_update | 0 |
| Handler_tmp_write | 215 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+-------+
25 rows in set (0.00 sec)
For query 2:
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 0 |
| Handler_icp_attempts | 0 |
| Handler_icp_match | 0 |
| Handler_mrr_init | 0 |
| Handler_mrr_key_refills | 0 |
| Handler_mrr_rowid_refills | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 844 |
| Handler_read_last | 0 |
| Handler_read_next | 843 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_deleted | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_tmp_update | 0 |
| Handler_tmp_write | 0 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+-------+
It seems to be a close race between the two formulations. (Some other example may show a clearer winner.)
From the Handler values: Query 1 did more read_keys (1044 vs. 844) and some temporary-table writing (215 Handler_tmp_write, which goes along with MATERIALIZED). The other numbers were about the same. So, I conclude that Query 1 is slower, although possibly not by enough to make much difference.
I vote for LEFT JOIN as the better query pattern (in this case).
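Whichever form is chosen, it is worth confirming that both return the same rows before comparing their cost. Here is a minimal sketch of that check using Python's sqlite3 module with a tiny, made-up data set; the table contents are hypothetical, and SQLite's planner says nothing about how MySQL or MariaDB will execute the queries, so this verifies only logical equivalence.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical miniature versions of the two tables from the question
con.executescript("""
CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT, language TEXT, user_id INTEGER);
CREATE TABLE wordfilter (word_id INTEGER PRIMARY KEY);
INSERT INTO word VALUES (1, 'apple', 'en', 1), (2, 'bank', 'en', 1),
                        (3, 'cash', 'en', 1), (4, 'debt', 'en', 2);
INSERT INTO wordfilter VALUES (2), (4);
""")

NOT_EXISTS = """
SELECT w.word
FROM word w
WHERE NOT EXISTS (SELECT 1 FROM wordfilter f WHERE f.word_id = w.id)
  AND w.user_id = 1
ORDER BY w.id
"""

LEFT_JOIN = """
SELECT w.word
FROM word w
LEFT JOIN wordfilter f ON f.word_id = w.id
WHERE f.word_id IS NULL
  AND w.user_id = 1
ORDER BY w.id
"""

not_exists_rows = con.execute(NOT_EXISTS).fetchall()
left_join_rows = con.execute(LEFT_JOIN).fetchall()
con.close()
print(not_exists_rows, left_join_rows)
```

Both queries keep the words of user 1 that have no wordfilter row; the performance comparison itself still has to be done on the real server with EXPLAIN and the Handler counters, as above.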
