How do I run a MySQL 8 window-function query on MySQL 5.7?

I have the following data:
+---------+--------+----------+------+-------+--------+-----------+
| xType | xAccID | xAccName | xCat | xYear | xMonth | xRaseed |
+---------+--------+----------+------+-------+--------+-----------+
| Amounts | 52 | Acc1 | Rs | 2020 | 11 | 3144.83 |
| Amounts | 52 | Acc1 | Rs | 2020 | 12 | -15199.64 |
| Amounts | 53 | Acc2 | Cus | 2020 | 12 | 5306.04 |
| Amounts | 53 | Acc2 | Cus | 2020 | 11 | 1090.64 |
+---------+--------+----------+------+-------+--------+-----------+
I want to add the xRaseed in the current row to the xRaseed in the previous row, for each xAccID separately.
The result that I want:
+---------+--------+----------+------+-------+--------+--------------------------------+
| xType | xAccID | xAccName | xCat | xYear | xMonth | xRaseed |
+---------+--------+----------+------+-------+--------+--------------------------------+
| Amounts | 52 | Acc1 | Rs | 2020 | 11 | 3144.83 |
| Amounts | 52 | Acc1 | Rs | 2020 | 12 | Not -15199.64 But (-12,054.81) |
| Amounts | 53 | Acc2 | Cus | 2020 | 12 | 5306.04 |
| Amounts | 53 | Acc2 | Cus | 2020 | 11 | Not 1090.64 But (6,396.68) |
+---------+--------+----------+------+-------+--------+--------------------------------+
I applied the following solution that I got from somebody here:
select t.*,
sum(xRaseed) over (partition by xAccID order by xYear, xMonth) as running_xRaseed
from t;
Everything worked on my local server, but when I applied the solution on my hosting it didn't work. Locally I use XAMPP (10.4.17-MariaDB), and on my hosting I use MySQL 5.7.23-23. What's the problem, please?
Here is a db<>fiddle

On versions of MySQL earlier than 8.0, which do not support window functions, we can use a correlated subquery to compute the running sum:
SELECT xType, xAccID, xAccName, xCat, xYear, xMonth,
       (SELECT SUM(t2.xRaseed)
        FROM yourTable t2
        WHERE t2.xAccID = t1.xAccID AND
              (t2.xYear < t1.xYear OR
               (t2.xYear = t1.xYear AND t2.xMonth <= t1.xMonth))) AS xRaseed
FROM yourTable t1
ORDER BY
    xAccId,
    xYear,
    xMonth;
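The logic of the correlated subquery can be checked against the sample rows from the question. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in (an assumption made for portability, it is not MySQL 5.7), and the alias running_xRaseed is borrowed from the question's original query.

```python
import sqlite3

# In-memory SQLite is only a stand-in for checking the query logic;
# it is not MySQL 5.7. The table name yourTable follows the answer above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (xType TEXT, xAccID INTEGER, xAccName TEXT,"
    " xCat TEXT, xYear INTEGER, xMonth INTEGER, xRaseed REAL)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Amounts", 52, "Acc1", "Rs", 2020, 11, 3144.83),
        ("Amounts", 52, "Acc1", "Rs", 2020, 12, -15199.64),
        ("Amounts", 53, "Acc2", "Cus", 2020, 12, 5306.04),
        ("Amounts", 53, "Acc2", "Cus", 2020, 11, 1090.64),
    ],
)

# For each row, sum every row of the same account up to and including
# the current (year, month).
query = """
SELECT t1.xAccID, t1.xYear, t1.xMonth,
       (SELECT SUM(t2.xRaseed)
        FROM yourTable t2
        WHERE t2.xAccID = t1.xAccID
          AND (t2.xYear < t1.xYear
               OR (t2.xYear = t1.xYear AND t2.xMonth <= t1.xMonth))
       ) AS running_xRaseed
FROM yourTable t1
ORDER BY t1.xAccID, t1.xYear, t1.xMonth
"""

# Round to two decimals so floating-point noise does not obscure the totals.
running = [
    (acc, year, month, round(total, 2))
    for acc, year, month, total in conn.execute(query)
]
for row in running:
    print(row)
```

For account 52 the December row comes out as -12054.81, and for account 53 the later month carries 6396.68, which are the totals the question asks for.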

Related

change column type and convert the existing values from string to integer in mariadb

I have a table named employees:
MariaDB [company]> describe employees;
+----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| employee_id | char(10) | NO | | NULL | |
| first_name | varchar(20) | NO | | NULL | |
| last_name | varchar(20) | NO | | NULL | |
| email | varchar(60) | NO | | NULL | |
| phone_number | char(14) | NO | | NULL | |
| hire_date | date | NO | | NULL | |
| job_id | int(11) | NO | | NULL | |
| salary | varchar(30) | NO | | NULL | |
| commission_pct | char(10) | NO | | NULL | |
| manager_id | char(10) | NO | | NULL | |
| department_id | char(10) | NO | | NULL | |
+----------------+-------------+------+-----+---------+-------+
MariaDB [company]> select * from employees;
+-------------+-------------+-------------+--------------------+---------------+------------+--------+----------+----------------+------------+---------------+
| employee_id | first_name | last_name | email | phone_number | hire_date | job_id | salary | commission_pct | manager_id | department_id |
+-------------+-------------+-------------+--------------------+---------------+------------+--------+----------+----------------+------------+---------------+
| 100 | Steven | King | sking@gmail.com | 515.123.4567 | 2003-06-17 | 1 | 24000.00 | 0.00 | 0 | 90 |
| 101 | Neena | Kochhar | nkochhar@gmail.com | 515.123.4568 | 2005-09-21 | 2 | 17000.00 | 0.00 | 100 | 90 |
| 102 | Lex | Wow | Lwow@gmail.com | 515.123.4569 | 2001-01-13 | 2 | 17000.00 | 0.00 | 100 | 9 |
| 103 | Alexander | Hunold | ahunold@gmail.com | 590.423.4567 | 2006-01-03 | 3 | 9000.00 | 0.00 | 102 | 60 |
| 104 | Bruce | Ernst | bernst@gmail.com | 590.423.4568 | 2007-05-21 | 3 | 6000.00 | 0.00 | 103 | 60 |
| 105 | David | Austin | daustin@gmail.com | 590.423.4569 | 2005-06-25 | 3 | 4800.00 | 0.00 | 103 | 60 |
| 106 | Valli | Pataballa | vpatabal@gmail.com | 590.423.4560 | 2006-02-05 | 3 | 4800.00 | 0.00 | 103 | 60 |
| 107 | Diana | Lorentz | dlorentz@gmail.com | 590.423.5567 | 2007-02-07 | 3 | 4200.00 | 0.00 | 103 | 60 |
| 108 | Nancy | Greenberg | ngreenbe@gmail.com | 515.124.4569 | 2002-08-17 | 4 | 12008.00 | 0.00 | 101 | 100 |
| 109 | Daniel | Faviet | dfaviet@gmail.com | 515.124.4169 | 2002-08-16 | 5 | 9000.00 | 0.00 | 108 | 100 |
| 110 | John | Chen | jchen@gmail.com | 515.124.4269 | 2005-09-28 | 5 | 8200.00 | 0.00 | 108 | 100 |
| 111 | Ismael | Sciarra | isciarra@gmail.com | 515.124.4369 | 2005-09-30 | 5 | 7700.00 | 0.00 | 108 | 100 |
| 112 | Jose | Urman | jurman@gmail.com | 515.124.4469 | 2006-03-07 | 5 | 7800.00 | 0.00 | 108 | 100 |
| 113 | Luis | Popp | lpopp@gmail.com | 515.124.4567 | 2007-12-07 | 5 | 6900.00 | 0.00 | 108 | 100 |
| 114 | Den | Raphaely | drapheal@gmail.com | 515.127.4561 | 2002-12-07 | 6 | 11000.00 | 0.00 | 100 | 30 |
| 115 | Alexander | Khoo | akhoo@gmail.com | 515.127.4562 | 2003-05-18 | 7 | 3100.00 | 0.00 | 114 | 30 |
+-------------+-------------+-------------+--------------------+---------------+------------+--------+----------+----------------+------------+---------------+
I wanted to change the salary column from string to integer, so I ran this command:
MariaDB [company]> alter table employees modify column salary int;
ERROR 1292 (22007): Truncated incorrect INTEGER value: '24000.00'
As you can see, it gave me a truncation error. I found some previous questions that showed how to use convert() and trim(), but those didn't actually answer my question.
The SQL code and data can be found here: https://0x0.st/oYoB.com_5zfu
I tested this on MySQL and it worked fine. So it is apparently an issue only with MariaDB.
The problem is that a string like '24000.00' is not an integer. Integers don't have a decimal place. So in strict mode, the implicit type conversion fails.
I was able to work around this by running this update first:
update employees set salary = round(salary);
The column is still a string, but '24000.00' has been changed to '24000' (with no decimal point character or following digits).
Then you can alter the data type, and implicit type conversion to integer works:
alter table employees modify column salary int;
See demonstration using MariaDB 10.6:
https://dbfiddle.uk/V6LrEMKt
P.S.: You misspelled the column name "commission_pct" as "comission_pct" in your sample DDL, and I had to edit that to test. In the future, please use one of the db fiddle sites to share samples, because they will test your code.
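The same idea can be shown outside the database. The sketch below is only an analogy in Python, not MariaDB behaviour: Python's int() is strict in a similar way and rejects '24000.00', but accepts the string once the fractional part has been rounded away. The helper name to_integer_string and the short salary list are illustrative assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_integer_string(value: str) -> str:
    """Round a decimal string to the nearest whole number, returned as a string."""
    return str(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# A few salary strings taken from the sample table.
salaries = ["24000.00", "17000.00", "12008.00", "3100.00"]

# A strict integer parse rejects the raw string, much like the failed ALTER.
try:
    int(salaries[0])
    strict_parse_ok = True
except ValueError:
    strict_parse_ok = False

# After rounding, every value is a plain integer string and converts cleanly.
converted = [int(to_integer_string(s)) for s in salaries]
print(strict_parse_ok, converted)
```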

How to merge records based on consecutive fields in Teradata

I have a source table like the one below:
+---------+--------+---------+------+
| ID | SEQ_NO | UNIT_ID | D_ID |
+---------+--------+---------+------+
| 7979092 | 1 | 99 | 759 |
| 7979092 | 2 | -1 | 869 |
| 7979092 | 3 | -1 | 927 |
| 7979092 | 4 | -1 | 812 |
| 7979092 | 5 | 99 | 900 |
| 7979092 | 6 | 99 | 891 |
| 7979092 | 7 | -1 | 785 |
| 7979092 | 8 | -1 | 762 |
| 7979092 | 9 | -1 | 923 |
+---------+--------+---------+------+
I have to merge rows when consecutive UNIT_ID values are the same, taking max(D_ID) for each merged group. The expected output is:
+---------+---------+------+
| ID | UNIT_ID | D_ID |
+---------+---------+------+
| 7979092 | 99 | 759 |
| 7979092 | -1 | 927 |
| 7979092 | 99 | 900 |
| 7979092 | -1 | 923 |
+---------+---------+------+
I tried to solve this with Teradata's ordered analytical functions but could not find a solution. I use Teradata 16.
Thank You.
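This logic is a bit quirky: it is based on two sequences created by different sort orders. Within a run of consecutive rows that share a UNIT_ID, SEQ_NO and the per-UNIT_ID row number both increase by one (as long as SEQ_NO has no gaps), so their difference stays constant and identifies the run: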
SELECT
   ID
  ,UNIT_ID
  ,Max(D_ID)
FROM
 (
   SELECT
      ID
     ,SEQ_NO
     ,UNIT_ID
     ,D_ID
      -- assign the same value to consecutive UNIT_IDs
     ,SEQ_NO -
      Row_Number()
      Over(PARTITION BY ID, UNIT_ID
           ORDER BY SEQ_NO) AS grp
   FROM tab
 ) AS dt
GROUP BY 1,2,grp
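To see why the difference of the two sequences identifies each run, here is a small sketch in plain Python (not Teradata) that replays the same arithmetic on the sample rows. It assumes, as the query does, that SEQ_NO is gap-free within each ID; the variable names are illustrative.

```python
from collections import defaultdict

# Rows from the question: (ID, SEQ_NO, UNIT_ID, D_ID)
rows = [
    (7979092, 1, 99, 759), (7979092, 2, -1, 869), (7979092, 3, -1, 927),
    (7979092, 4, -1, 812), (7979092, 5, 99, 900), (7979092, 6, 99, 891),
    (7979092, 7, -1, 785), (7979092, 8, -1, 762), (7979092, 9, -1, 923),
]

row_number = defaultdict(int)  # running ROW_NUMBER() per (ID, UNIT_ID)
groups = {}                    # (ID, UNIT_ID, grp) -> [first SEQ_NO, max D_ID]

for id_, seq_no, unit_id, d_id in sorted(rows, key=lambda r: (r[0], r[1])):
    row_number[(id_, unit_id)] += 1
    # SEQ_NO minus the per-UNIT_ID row number is constant within a run.
    grp = seq_no - row_number[(id_, unit_id)]
    key = (id_, unit_id, grp)
    if key not in groups:
        groups[key] = [seq_no, d_id]
    else:
        groups[key][1] = max(groups[key][1], d_id)

# One output row per run, listed in order of first appearance.
merged = [
    (key[0], key[1], first_and_max[1])
    for key, first_and_max in sorted(
        groups.items(), key=lambda kv: (kv[0][0], kv[1][0])
    )
]
for row in merged:
    print(row)
```

The four resulting rows match the expected output in the question: 759, 927, 900 and 923.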
You can use RESET WHEN to dynamically create groups within the window. Here's one way to do it:
select ID, UNIT_ID,
  max(D_ID) over(
    partition by ID order by SEQ_NO
    reset when UNIT_ID <> UNIT_ID_prev -- Create new group for new value
  ) as D_ID
from (
  select ID, SEQ_NO, UNIT_ID, D_ID,
    lag(UNIT_ID) over(partition by ID order by SEQ_NO) as UNIT_ID_prev -- Previous value
  from MY_TABLE
) src
qualify row_number() over(
  partition by ID order by SEQ_NO
  reset when UNIT_ID <> UNIT_ID_prev -- Match original max() window
) = 1 -- One row per group (similar to DISTINCT)

why does the frequency of my Gnocchi measurements not match the set granularity

I'm running OpenStack and am trying to get my Gnocchi meters to come through more frequently so that I can run a scaling demo without lots of 5-minute lags. In Gnocchi I have changed the archive policy to a custom policy with granularity set to 30 seconds. (I've also tried the following using the existing 'medium' policy, with the same result.)
+---------------------+--------------------------------------------------------+
| Field | Value |
+---------------------+--------------------------------------------------------+
| aggregation_methods | std, count, min, max, sum, mean |
| back_window | 0 |
| definition | - points: 120, granularity: 0:00:30, timespan: 1:00:00 |
| name | test |
+---------------------+--------------------------------------------------------+
The cpu_util meter is picking it up correctly:
+------------------------------------+-------------------------------------------------------------------+
| Field | Value |
+------------------------------------+-------------------------------------------------------------------+
| archive_policy/aggregation_methods | std, count, min, max, sum, mean |
| archive_policy/back_window | 0 |
| archive_policy/definition | - points: 120, granularity: 0:00:30, timespan: 1:00:00 |
| archive_policy/name | test |
| created_by_project_id | e499d0c2e0fb4a05ac39c3f8c260052b |
| created_by_user_id | 21759a51f3834b9bbae49c3ed17a13e4 |
| creator | 21759a51f3834b9bbae49c3ed17a13e4:e499d0c2e0fb4a05ac39c3f8c260052b |
| id | e5a02f3a-9fbe-4e44-bb91-e1cfe6b86143 |
| name | cpu_util |
| resource/created_by_project_id | e499d0c2e0fb4a05ac39c3f8c260052b |
| resource/created_by_user_id | 21759a51f3834b9bbae49c3ed17a13e4 |
| resource/creator | 21759a51f3834b9bbae49c3ed17a13e4:e499d0c2e0fb4a05ac39c3f8c260052b |
| resource/ended_at | None |
| resource/id | 243b9715-95ba-4532-9728-3e61776e1c29 |
| resource/original_resource_id | 243b9715-95ba-4532-9728-3e61776e1c29 |
| resource/project_id | 43a7db62d5d54c4590e363868fff49e2 |
| resource/revision_end | None |
| resource/revision_start | 2018-08-08T14:05:09.770765+00:00 |
| resource/started_at | 2018-08-08T13:20:45.948842+00:00 |
| resource/type | instance |
| resource/user_id | 4e5015006b304e7ca57edc5419b42be3 |
| unit | % |
+------------------------------------+-------------------------------------------------------------------+
But the measurements are still only coming out every 5 minutes:
gnocchi measures show e5a02f3a-9fbe-4e44-bb91-e1cfe6b86143
+---------------------------+-------------+--------------+
| timestamp | granularity | value |
+---------------------------+-------------+--------------+
| 2018-08-08T13:30:00+00:00 | 30.0 | 0.0400002375 |
| 2018-08-08T13:35:00+00:00 | 30.0 | 0.0366666763 |
| 2018-08-08T13:40:00+00:00 | 30.0 | 0.0366667101 |
| 2018-08-08T13:45:00+00:00 | 30.0 | 0.0399999545 |
| 2018-08-08T13:50:00+00:00 | 30.0 | 0.0366664861 |
| 2018-08-08T13:55:00+00:00 | 30.0 | 0.0400000543 |
| 2018-08-08T14:00:00+00:00 | 30.0 | 0.0366665877 |
+---------------------------+-------------+--------------+
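Any ideas what I am missing?
I had the same issue. In Gnocchi-backed Ceilometer there is a new configuration file, polling.yaml, and the polling interval for resources is set there. The archive policy's granularity only controls how stored measures are aggregated; new points cannot arrive more often than Ceilometer polls.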
https://review.opendev.org/#/c/405682/
https://docs.openstack.org/ceilometer/pike/admin/telemetry-best-practices.html

change column value only if two other columns are duplicates

I am having a hard time figuring this out in R.
This is what I would like to do.
In a data frame like the one below, if both Name and Class are duplicated, I would like to add the two rows' scores together; if not, leave the row as it is.
+------------------+-----------+-------+
| Name | Class | Score |
+------------------+-----------+-------+
| Sara | Sophomore | 10 |
| John | Freshman | 20 |
| Taylor | Sophomore | 30 |
| Tyler | Junior | 10 |
| Keith | Junior | 20 |
| Andrew | Senior | 30 |
| Victor | Senior | 10 |
| Nancy | Sophomore | 20 |
| Taylor | Junior | 30 |
| John | Senior | 10 |
| Victor | Freshman | 20 |
| Sara | Sophomore | 30 |
| John | Freshman | 10 |
| Taylor | Sophomore | 20 |
| John | Senior | 30 |
+------------------+-----------+-------+
So basically, the end result should look like:
+--------+-----------+-------+
| Name | Class | Score |
+--------+-----------+-------+
| Sara | Sophomore | 40 |
| John | Freshman | 30 |
| Taylor | Sophomore | 50 |
| Tyler | Junior | 10 |
| Keith | Junior | 20 |
| Andrew | Senior | 30 |
| Victor | Senior | 10 |
| Nancy | Sophomore | 20 |
| Taylor | Junior | 30 |
| John | Senior | 40 |
| Victor | Freshman | 20 |
+--------+-----------+-------+
As you can see, if Name is the only duplicated value, nothing changes (for example, John Freshman and John Senior). If Class is the only duplicated value, nothing changes either. Both columns in a row have to be duplicated to trigger the change.
My attempt is below, but it is not working and I am getting this error message:
'Error in if ((experiment[i, 1] == experiment[j, 1]) & (experiment[i, 2] == : missing value where TRUE/FALSE needed'
My code:
# creating an empty data frame
experiment1<-data.frame(matrix(ncol=3, nrow=15))
for(i in 1: nrow(experiment)){
for(j in i+1: nrow(experiment)){
if((experiment[i,1] == experiment[j,1]) & (experiment[i,2] == experiment[j,2])){
experiment1[i,1] <- experiment[i,1]
experiment1[i,2] <- experiment[i,2]
experiment1[i,3] <- experiment[i,3] + experiment[j,3]}
else{
experiment1[i,1] <- experiment[i,1]
experiment1[i,2] <- experiment[i,2]
experiment1[i,3] <- experiment[i,3]}}}
Could anyone help fixing my code or figuring out "nobler" code?
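Aggregation is one of the first things explained in any basic R tutorial, so I suggest you work through one. Your loop fails because i+1: nrow(experiment) is parsed as i + (1:nrow(experiment)), so j runs past the last row and the comparison returns NA. A grouped sum over Name and Class does what you want: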
base R
aggregate(Score ~ Name + Class, data = mydf, FUN = sum)
dplyr
library(dplyr)
mydf %>% group_by(Name, Class) %>% summarize(scoreSum = sum(Score))
data.table
library(data.table)
setDT(mydf)[ , .(scoreSum = sum(Score)), by = .(Name, Class)]
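As a language-neutral check of what the grouped sum should produce, here is a minimal sketch in Python (not R) over the rows from the question. The names scores, totals and result are illustrative; the key point is that rows are grouped on the pair (Name, Class), not on either column alone.

```python
# Rows from the question: (Name, Class, Score)
scores = [
    ("Sara", "Sophomore", 10), ("John", "Freshman", 20),
    ("Taylor", "Sophomore", 30), ("Tyler", "Junior", 10),
    ("Keith", "Junior", 20), ("Andrew", "Senior", 30),
    ("Victor", "Senior", 10), ("Nancy", "Sophomore", 20),
    ("Taylor", "Junior", 30), ("John", "Senior", 10),
    ("Victor", "Freshman", 20), ("Sara", "Sophomore", 30),
    ("John", "Freshman", 10), ("Taylor", "Sophomore", 20),
    ("John", "Senior", 30),
]

# Accumulate scores per (Name, Class) pair; a dict keeps first-seen order.
totals = {}
for name, cls, score in scores:
    totals[(name, cls)] = totals.get((name, cls), 0) + score

result = [(name, cls, total) for (name, cls), total in totals.items()]
for row in result:
    print(row)
```

This yields the 11 rows of the desired table, for example Sara/Sophomore 40 and John/Senior 40, while John/Freshman stays separate at 30.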

Oracle 11g r2: strange behavior on index

I have the query:
SELECT count(*)
FROM
(
SELECT
TBELENCO.DATA_PROC, TBELENCO.POD, TBELENCO.DESCRIZIONE, TBELENCO.ERROR, TBELENCO.STATO,
TBELENCO.SEZIONE, TBELENCO.NOME_FILE, TBELENCO.ID_CARICAMENTO, TBELENCO.ESITO_OPERAZIONE,
TBELENCO.DES_TIPO_MISURA,
--TBELENCO.RAGIONE_SOCIALE,
--ROW_NUMBER() OVER (ORDER BY TBELENCO.DATA_PROC DESC) R
ROWNUM R
FROM(
SELECT
LOG.DATA_PROC, LOG.POD, LOG.DESCRIZIONE, LOG.ERROR, LOG.STATO,
LOG.SEZIONE, LOG.NOME_FILE, LOG.ID_CARICAMENTO, LOG.ESITO_OPERAZIONE, TM.DES_TIPO_MISURA
--,C.RAGIONE_SOCIALE
--ROW_NUMBER() OVER (ORDER BY LOG.DATA_PROC DESC) R
FROM
MS042_LOADING_LOGS LOG JOIN MS116_MEASURE_TYPES TM ON
TM.ID_TIPO_MISURA=LOG.SEZIONE
-- LEFT JOIN(
-- SELECT CUST.RAGIONE_SOCIALE,STR.POD,RSC.DATA_DA, RSC.DATA_A
-- FROM
-- MS038_METERS STR JOIN MS036_REL_SITES_CUSTOMERS RSC ON
-- STR.ID_SITO=RSC.ID_SITO
-- JOIN MS030_CUSTOMERS CUST ON
-- CUST.ID_CLIENTE=RSC.ID_CLIENTE
-- ) C ON
-- C.POD=LOG.POD
--AND LOG.DATA_PROC BETWEEN C.DATA_DA AND C.DATA_A
WHERE
1=1
--AND LOG.DATA_PROC>=TRUNC(SYSDATE)
AND LOG.DATA_PROC>=TRUNC(SYSDATE)-3
--TO_DATE('01/11/2014', 'DD/MM/YYYY')
) TBELENCO
)
WHERE
R BETWEEN 1 AND 200;
If I execute the query with AND LOG.DATA_PROC>=TRUNC(SYSDATE)-3, Oracle uses the index on the DATA_PROC column of the MS042_LOADING_LOGS (LOG) table. If I instead use AND LOG.DATA_PROC>=TRUNC(SYSDATE)-4 (or -5, -6, etc.), it does a full table access. Why does it behave this way?
I also ran:
ALTER INDEX MS042_DATA_PROC_IDX REBUILD;
but nothing changed.
Thanks,
Igor
--***********************************************************
SELECT count(*)
FROM
(
SELECT
TBELENCO.DATA_PROC, TBELENCO.POD, TBELENCO.DESCRIZIONE, TBELENCO.ERROR, TBELENCO.STATO,
TBELENCO.SEZIONE, TBELENCO.NOME_FILE, TBELENCO.ID_CARICAMENTO, TBELENCO.ESITO_OPERAZIONE,
TBELENCO.DES_TIPO_MISURA,
ROWNUM R
FROM(
SELECT
LOG.DATA_PROC, LOG.POD, LOG.DESCRIZIONE, LOG.ERROR, LOG.STATO,
LOG.SEZIONE, LOG.NOME_FILE, LOG.ID_CARICAMENTO, LOG.ESITO_OPERAZIONE, TM.DES_TIPO_MISURA
FROM
MS042_LOADING_LOGS LOG JOIN MS116_MEASURE_TYPES TM ON
TM.ID_TIPO_MISURA=LOG.SEZIONE
WHERE
1=1
AND LOG.DATA_PROC>=TRUNC(SYSDATE)-1
) TBELENCO
)
WHERE
R BETWEEN 1 AND 200;
Plan hash value: 2191058229
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 30866 (2)| 00:06:11 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | VIEW | | 94236 | 1196K| 30866 (2)| 00:06:11 |
| 3 | COUNT | | | | | |
|* 4 | HASH JOIN | | 94236 | 1104K| 30866 (2)| 00:06:11 |
| 5 | INDEX FULL SCAN | P087_TIPI_MISURE_PK | 15 | 30 | 1 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| MS042_LOADING_LOGS | 94236 | 920K| 30864 (2)| 00:06:11 |
|* 7 | INDEX RANGE SCAN | MS042_DATA_PROC_IDX | 94236 | | 25742 (2)| 00:05:09 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("R"<=200 AND "R">=1)
4 - access("TM"."ID_TIPO_MISURA"="LOG"."SEZIONE")
7 - access(SYS_OP_DESCEND("DATA_PROC")<=SYS_OP_DESCEND(TRUNC(SYSDATE@!)-1))
filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("DATA_PROC"))>=TRUNC(SYSDATE@!)-1)
Plan hash value: 69930686
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 95921 (1)| 00:19:12 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | VIEW | | 1467K| 18M| 95921 (1)| 00:19:12 |
| 3 | COUNT | | | | | |
|* 4 | HASH JOIN | | 1467K| 16M| 95921 (1)| 00:19:12 |
| 5 | INDEX FULL SCAN | P087_TIPI_MISURE_PK | 15 | 30 | 1 (0)| 00:00:01 |
|* 6 | TABLE ACCESS FULL| MS042_LOADING_LOGS | 1467K| 13M| 95912 (1)| 00:19:11 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("R"<=200 AND "R">=1)
4 - access("TM"."ID_TIPO_MISURA"="LOG"."SEZIONE")
6 - filter("LOG"."DATA_PROC">=TRUNC(SYSDATE@!)-4)
The larger the fraction of rows that will be returned, the more efficient a table scan is and the less efficient it is to use an index. Apparently, Oracle expects that inflection point to come when the query returns more than 3 days of data. If that is inaccurate, I would expect that the statistics on your table or indexes are inaccurate.
