Problem Using ROW_NUMBER() function in MariaDB

I would like to have a row number column in a select table output, but when I try using the ROW_NUMBER() function MariaDB throws a syntax error. There are several references on the web (http://www.mysqltutorial.org/mysql-window-functions/mysql-row_number-function/ ) but so far I have not been successful. Here is a segment of my MariaDB table:
+---------------------+------------+
| date_reading | temp_patio |
+---------------------+------------+
| 2019-09-03 06:26:00 | 17.6 |
| 2019-09-03 06:33:00 | 17.5 |
| 2019-09-03 06:40:00 | 17.5 |
| 2019-09-03 06:46:00 | 17.5 |
| 2019-09-03 06:53:00 | 17.4 |
| 2019-09-03 07:00:00 | 17.4 |
| 2019-09-03 07:07:00 | 17.4 |
| 2019-09-03 07:13:00 | 17.4 |
The documentation says that the arguments inside the OVER () clause are optional, but I have tried both with and without an OVER () clause and with and without an ORDER BY clause.
Here is my select command:
select ROW_NUMBER() OVER ( ) as Therow, * from MyData where
Date_Reading > Now()- INTERVAL 3 HOUR;
Optionally I have tried without the OVER () clause and also using OVER ( ORDER BY ID).
My MariaDB version is
Server version: 10.1.38-MariaDB-0+deb9u1 Raspbian 9.0
Can someone assist?...RDK

Window functions such as ROW_NUMBER() are only supported in MariaDB 10.2 and later. Your server is 10.1.38, which is why the OVER clause is reported as a syntax error.
MariaDB 10.2 or higher:
SELECT
MyData.*,
ROW_NUMBER() OVER ( ORDER BY ID ) as Therow
FROM MyData
WHERE Date_Reading > Now()- INTERVAL 3 HOUR;
For lower versions:
A MySQL user variable can do the job.
SELECT
MyData.*,
@row_num:= @row_num + 1 AS Therow
FROM
MyData,
(SELECT @row_num:= 0 AS num) AS c
WHERE Date_Reading > Now()- INTERVAL 3 HOUR
ORDER BY MyData.Date_Reading ASC;
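The two approaches above can be compared on a small in-memory table. This is a minimal sketch that uses Python's sqlite3 module as a stand-in for MariaDB (an assumption made so the example runs anywhere; SQLite needs 3.25+ for window functions). The fixed cutoff string replaces Now() - INTERVAL 3 HOUR to keep the result reproducible, and the client-side numbering with enumerate() is a swapped-in alternative to the user-variable trick, not something from the answer.

```python
import sqlite3

# In-memory stand-in for the MyData table (SQLite, not MariaDB).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyData (ID INTEGER PRIMARY KEY, Date_Reading TEXT, temp_patio REAL)"
)
conn.executemany(
    "INSERT INTO MyData (Date_Reading, temp_patio) VALUES (?, ?)",
    [
        ("2019-09-03 06:26:00", 17.6),
        ("2019-09-03 06:33:00", 17.5),
        ("2019-09-03 06:40:00", 17.5),
        ("2019-09-03 06:46:00", 17.5),
    ],
)

# Fixed cutoff instead of Now() - INTERVAL 3 HOUR, so the output is stable.
cutoff = "2019-09-03 06:30:00"

# Window-function form, as available in MariaDB 10.2+ (and SQLite 3.25+).
windowed = conn.execute(
    "SELECT ROW_NUMBER() OVER (ORDER BY ID) AS Therow, Date_Reading, temp_patio "
    "FROM MyData WHERE Date_Reading > ? ORDER BY ID",
    (cutoff,),
).fetchall()

# Fallback for servers without window functions: number the rows in the client.
plain = conn.execute(
    "SELECT Date_Reading, temp_patio FROM MyData WHERE Date_Reading > ? ORDER BY ID",
    (cutoff,),
).fetchall()
numbered = [(n, *row) for n, row in enumerate(plain, start=1)]

for row in numbered:
    print(row)
```

Both lists hold the same rows, so numbering in the application is a reasonable option when upgrading the server is not possible.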

Related

How to use MERGE keyword in pl/sql?

I am updating a table, but I keep getting the following error
ERROR: syntax error at or near "MERGE"
LINE 3: MERGE into
when I try to use a MERGE statement. I don't see anything obviously wrong with the syntax. Can someone point out the obvious?
MERGE into Table2 t2
using (select name, max(id) max_id from Table1 t1 group by name ) t1
on (t2.project_name=t1.name)
when matched then update set projectid=max_id where status='ongoing' ;
Table1
1 | alpha | 2021 |
2 | groundwork | 2020 |
3 | NETOS | 2021 |
5 | WebOPD | 2019 |
Table2
id | name | year | status | project name | projectID
1 | john | 2021 | ongoing | alpha | 1
2 | linda | 2021 | completed | NETOS | 3
3 | pat | 2021 | completed | WebOPD | 5
4 | tom | 2021 | ongoing | alpha | 1
version : PostgreSQL 13.6
The last line of your message says you use PostgreSQL, but the tag you used (plsql) means Oracle. Which one is it? I presume the former, but the syntax you used is Oracle's.
PostgreSQL only gained MERGE in version 15, so a 13.6 server rejects the keyword itself, which is the error you see. On a version that supports it, the PostgreSQL MERGE documentation says that
a WHERE clause can't be used after UPDATE SET; put the extra condition in WHEN MATCHED AND ... instead
the target column in SET must not be qualified with the table alias
See if something like this helps:
MERGE INTO Table2 t2
using (select t1.name,
max(t1.id) max_id
from Table1 t1
group by t1.name
) x
on t2.project_name = x.name
when matched and t2.status = 'ongoing' then update set
projectid = x.max_id ;
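On a server where MERGE is not available, a matched-only update like this one can also be written as a plain UPDATE with a correlated subquery. The following is a minimal sketch, not the answerer's method: it uses Python's sqlite3 module as a stand-in engine, and the extra Table1 row (7, 'alpha', 2022) is hypothetical, added only so that the update changes something visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id INTEGER, name TEXT, year INTEGER);
CREATE TABLE Table2 (id INTEGER, name TEXT, year INTEGER,
                     status TEXT, project_name TEXT, projectid INTEGER);
INSERT INTO Table1 VALUES (1, 'alpha', 2021), (2, 'groundwork', 2020),
                          (3, 'NETOS', 2021), (5, 'WebOPD', 2019),
                          (7, 'alpha', 2022);  -- hypothetical extra row
INSERT INTO Table2 VALUES (1, 'john', 2021, 'ongoing', 'alpha', 1),
                          (2, 'linda', 2021, 'completed', 'NETOS', 3),
                          (4, 'tom', 2021, 'ongoing', 'alpha', 1);
""")

# Equivalent of "WHEN MATCHED AND status = 'ongoing' THEN UPDATE":
# the EXISTS guard keeps rows without a matching project untouched.
conn.execute("""
UPDATE Table2
SET projectid = (SELECT max(t1.id) FROM Table1 t1
                 WHERE t1.name = Table2.project_name)
WHERE status = 'ongoing'
  AND EXISTS (SELECT 1 FROM Table1 t1
              WHERE t1.name = Table2.project_name)
""")

result = conn.execute("SELECT id, projectid FROM Table2 ORDER BY id").fetchall()
print(result)
```

Only the two ongoing rows pick up the new maximum id; the completed row keeps its projectid.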

SQLite time duration calculation from rows

I want to calculate duration between rows with datetime data in SQLite.
Let's consider this for the base data (named intervals):
| id | date | state |
| 1 | 2020-07-04 10:11 | On |
| 2 | 2020-07-04 10:22 | Off |
| 3 | 2020-07-04 11:10 | On |
| 4 | 2020-07-04 11:25 | Off |
I'd like to calculate the duration for both On and Off state:
| Total On | 26mins |
| Total Off | 48mins |
Then I wrote this query:
SELECT
"Total " || interval_start.state AS state,
(SUM(strftime('%s', interval_end.date)-strftime('%s', interval_start.date)) / 60) || "mins" AS duration
FROM
intervals interval_start
INNER JOIN
intervals interval_end ON interval_end.id =
(
SELECT id FROM intervals WHERE
id > interval_start.id AND
state = CASE WHEN interval_start.state = 'On' THEN 'Off' ELSE 'On' END
ORDER BY id
LIMIT 1
)
GROUP BY
interval_start.state
However, if the base data is not in strict On/Off order:
| id | date | state |
| 1 | 2020-07-04 10:11 | On |
| 2 | 2020-07-04 10:22 | On | !!!
| 3 | 2020-07-04 11:10 | On |
| 4 | 2020-07-04 11:25 | Off |
My query calculates the wrong result, as it pairs the only Off row with each of the On rows and sums the durations together.
The desired behavior would give something like this:
| Total On | 74mins |
| Total Off | 0mins | --this line can be omitted, or can be N/A
I have two questions:
How can I rewrite the query to handle these wrong data situations?
I feel my query is not the best in terms of performance, is it possible to improve it?
Use a CTE where you return only the starting rows of each state and then aggregate:
with cte as (
select *, lead(id) over (order by date) next_id
from (
select *, lag(state) over (order by date) prev_state
from intervals
)
where state <> coalesce(prev_state, '')
)
select c1.state,
(sum(strftime('%s', c2.date) - strftime('%s', c1.date)) / 60) || 'mins' duration
from cte c1 inner join cte c2
on c2.id = c1.next_id
group by c1.state
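The CTE query can be checked against both data sets from the question. This is a minimal sketch using Python's sqlite3 module (SQLite 3.25+ for lag/lead); the helper name durations and the added ORDER BY are assumptions for the example. The division is parenthesised because || binds more tightly than / in SQLite, so without the parentheses the 'mins' suffix would be folded into the divisor instead of being appended to the quotient.

```python
import sqlite3

QUERY = """
with cte as (
  select *, lead(id) over (order by date) next_id
  from (
    select *, lag(state) over (order by date) prev_state
    from intervals
  )
  where state <> coalesce(prev_state, '')
)
select c1.state,
       (sum(strftime('%s', c2.date) - strftime('%s', c1.date)) / 60) || 'mins' duration
from cte c1 inner join cte c2
  on c2.id = c1.next_id
group by c1.state
order by c1.state
"""


def durations(rows):
    """Load rows into a fresh intervals table and run the CTE query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE intervals (id INTEGER, date TEXT, state TEXT)")
    conn.executemany("INSERT INTO intervals VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


# Strictly alternating data from the question.
clean = durations([
    (1, "2020-07-04 10:11", "On"),
    (2, "2020-07-04 10:22", "Off"),
    (3, "2020-07-04 11:10", "On"),
    (4, "2020-07-04 11:25", "Off"),
])

# Data with repeated On rows: only the first On of the run is kept.
repeated = durations([
    (1, "2020-07-04 10:11", "On"),
    (2, "2020-07-04 10:22", "On"),
    (3, "2020-07-04 11:10", "On"),
    (4, "2020-07-04 11:25", "Off"),
])

print(clean)
print(repeated)
```

The last state change has no following row to pair with, so in the second data set no Off line is produced at all, which the question says is acceptable.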

Top 3 of each group by time of day - SQL Lite

I'm currently learning how to use SQL Lite, and would like to find the top 3 most popular pickup locations by hour. I have millions of rows of data, with the columns of interest being lpep_pickup_datetime (pickup time) and PULocationID (pickup location).
Here is a sample of the data:
+----------------------+--------------+-----------------+
| lpep_pickup_datetime | PULocationID | passenger_count |
+----------------------+--------------+-----------------+
| 1/1/2017 0:01 | 42 | 1 |
| 1/1/2017 0:03 | 75 | 1 |
| 1/1/2017 0:04 | 82 | 5 |
| 1/1/2017 0:01 | 255 | 1 |
| 1/1/2017 0:00 | 166 | 1 |
| 1/1/2017 0:00 | 179 | 1 |
| 1/1/2017 0:02 | 74 | 1 |
| 1/1/2017 0:15 | 112 | 1 |
| 1/1/2017 0:06 | 36 | 1 |
| 1/1/2017 0:14 | 127 | 5 |
| 1/1/2017 0:01 | 41 | 1 |
| 1/1/2017 0:31 | 97 | 1 |
| 1/1/2017 0:01 | 255 | 5 |
| 1/1/2017 0:00 | 70 | 1 |
| 1/1/2017 0:03 | 255 | 1 |
| 1/1/2017 0:03 | 82 | 1 |
| 1/1/2017 0:00 | 36 | 1 |
| 1/1/2017 0:01 | 7 | 1 |
+----------------------+--------------+-----------------+
Trying this on SQLLiteStudio 3.2.1 - might I just need to use a full MySQL suite in order to be able to use the proper functions?
SELECT
PULocationID, count(PULocationID)
FROM GreenCabs2017
GROUP BY PULocationID
ORDER BY count(PULocationID) DESC
LIMIT 3
The query I've tried only returns top 3 pickup locations across the entire dataset and not by hour of day - how would I be able to group by hour of day? Other solutions on StackExchange reference date_time and date_format functions that won't execute when I try them on SQL Lite - what's a query that would work on SQL Lite?
Ideally would have something like the below:
+-------------+--------------+-----------------+
| Time of Day | PULocationID | PULocationCount |
+-------------+--------------+-----------------+
| 0:00 | 74 | 677 |
| 0:00 | 65 | 333 |
| 0:00 | 55 | 220 |
+-------------+--------------+-----------------+
This would be the output for top 3 pickup locations from midnight to 1:00 AM. This time range would have to apply across all the dates, i.e. 1/1 to 1/31 and not just 1/1 like the sample I provided.
UPDATE:
Changed the format of the timestamps to be YYYY-MM-DD HH:MM:SS format, so I can use the datetime functions now.
Was able to run a query which I think may bring me much closer to what I'm looking for:
SELECT lpep_pickup_datetime, PULocationID, count(PULocationID)
FROM GreenCabs2017
WHERE STRFTIME('%Y', lpep_pickup_datetime) = '2017' AND
STRFTIME('%H', lpep_pickup_datetime) <= '01' AND
STRFTIME('%H', lpep_pickup_datetime) >= '00'
GROUP BY PULocationID
ORDER BY count(PULocationID) DESC
LIMIT 3
That gave an output of
+----------------------+--------------+---------------------+
| lpep_pickup_datetime | PULocationID | count(PULocationID) |
+----------------------+--------------+---------------------+
| 1/31/2017 1:13 | 255 | 7845 |
| 1/31/2017 1:04 | 7 | 4596 |
| 1/31/2017 1:07 | 82 | 3892 |
+----------------------+--------------+---------------------+
But the lpep_pickup_datetime column still indicates that this would be in between 1:00 AM and 2:00 AM and not 12:00 AM and 1:00 AM? Removing the "=" sign in the query results in no results being returned. And I would prefer to not do this for every hour in the day - would there be a way to have an output by hour through one query?
The timestamp string format your data is using, m/d/YYYY H:MM, is not very good. It can't be used by sqlite date and time functions, can't be meaningfully ordered for sorting, and in general is very hard to work with in sqlite. Remember, sqlite does not have dedicated date or time types, just strings or numbers, so the format you're using has to obey the rules of those types. So your first step is to, by whatever means, fix those timestamps. The following assumes you changed them to YYYY-mm-dd HH:MM strings like 2017-01-01 00:01, or another compatible format. It also assumes you're using a fairly recent sqlite release, as it uses window functions which were added in 3.25.
(Edit: You appear to be using NYC taxi data from here, which has timestamps in a good format already, and is suitable for easy importing into sqlite. That makes it trivial to fix.)
Given all that, this query:
WITH ranked AS
(SELECT hour, PULocationID, pickups
, row_number() OVER (PARTITION BY hour ORDER BY pickups DESC) AS rn
FROM (SELECT strftime('%H:00', lpep_pickup_datetime) AS hour
, PULocationID
, count(*) AS pickups
FROM GreenCabs2017
GROUP BY strftime('%H:00', lpep_pickup_datetime), PULocationID))
SELECT * FROM ranked
WHERE rn <= 3
ORDER BY hour, rn
will give, for NYC Green Cab data for January 2017
hour PULocationID pickups rn
---------- ------------ ---------- ----------
00:00 255 4224 1
00:00 7 2518 2
00:00 82 2135 3
01:00 255 3621 1
01:00 7 2078 2
01:00 256 1870 3
02:00 255 3261 1
02:00 256 1798 2
02:00 7 1676 3
03:00 255 2854 1
03:00 256 1589 2
03:00 7 1475 3
and so on.
Basically, it counts the number of times each location appears in each hour, and for each hour, assigns each location a row number based on sorting by that number. Then only the first three rows of each hour are returned in the final outer select. You can also use rank() or dense_rank() instead of row_number(), which will potentially return more than 3 rows per hour in case of ties but also more accurately reflect the most popular locations in those cases.
(This query benefits a lot from having an index on the group by expression:
CREATE INDEX greencabs2017_idx_hour_loc ON GreenCabs2017(strftime('%H:00', lpep_pickup_datetime), PULocationID);
)
Test table created from the sqlite3 shell via:
sqlite> .mode csv
sqlite> .import '|curl -s https://s3.amazonaws.com/nyctlc/trip+data/green_tripdata_2017-01.csv | sed 2d' GreenCabs2017
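The ranking query from this answer can be tried on a tiny in-memory table before running it against millions of rows. This is a minimal sketch using Python's sqlite3 module (SQLite 3.25+ for window functions); the pickup counts in sample are hypothetical and chosen so that no hour has ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GreenCabs2017 (lpep_pickup_datetime TEXT, PULocationID INTEGER)"
)

# Hypothetical pickups: (hour of day, location, number of trips).
sample = [(0, 255, 4), (0, 7, 3), (0, 82, 2), (0, 74, 1), (1, 255, 2), (1, 7, 1)]

# Spread each location's trips over different days of January 2017,
# so the hourly grouping has to work across dates.
rows = [
    (f"2017-01-{day + 1:02d} {hour:02d}:15:00", loc)
    for hour, loc, trips in sample
    for day in range(trips)
]
conn.executemany("INSERT INTO GreenCabs2017 VALUES (?, ?)", rows)

top3 = conn.execute("""
WITH ranked AS
 (SELECT hour, PULocationID, pickups
       , row_number() OVER (PARTITION BY hour ORDER BY pickups DESC) AS rn
  FROM (SELECT strftime('%H:00', lpep_pickup_datetime) AS hour
             , PULocationID
             , count(*) AS pickups
        FROM GreenCabs2017
        GROUP BY strftime('%H:00', lpep_pickup_datetime), PULocationID))
SELECT * FROM ranked
WHERE rn <= 3
ORDER BY hour, rn
""").fetchall()

for row in top3:
    print(row)
```

Location 74 is the fourth most popular in the 00:00 hour and is dropped, while the 01:00 hour has only two locations and returns both.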

SQLite pivot producing alternate NULLS

I'm having a problem with my SQLite pivot code, mainly taken from McPeppr's answer here: Pivot in SQLite
Creating my temp table:
WITH t1 AS (
SELECT band,
p.name,
status,
strftime('%Y-%m', time_start) AS Month,
AVG(time) AS Avg
FROM person p
JOIN action a ON p.person_id = a.person_id
JOIN log l ON p.log_id = l.log_id
WHERE p.person = 'Joe' AND opps = '2'
GROUP BY band, Month, status, strftime('%Y-%m', time_stamp_start)
ORDER BY Month, CASE status
WHEN 'one' THEN 0
WHEN 'two' THEN 1
WHEN 'three' THEN 2
WHEN 'four' THEN 3
END
),
t1 looks like:
band | name | status | month | AVG
------+--------+--------+-----------+---------------
1 | Joe | one | 2018-01 | 3.33
2 | Joe | one | 2018-01 | 4.11
1 | Joe | two | 2018-02 | 2.55
2 | Joe | two | 2018-02 | 3.45
..........
When I try pivot in a select I get:
Select band, Month,
case when status = 'one' then response_avg end as One,
case when status = 'two' then response_avg end as Two,
...,
from t1
This:
band | month | One | Two
------+------------+-------+---------
1 | 2018-01 | 3.41 | NULL
2 | 2018-01 | 3.55 | NULL
1 | 2018-01 | NULL | 2.55
2 | 2018-01 | NULL | 4.61
1 | 2018-02 | 1.55 | NULL
2 | 2018-02 | 2.43 | NULL
1 | 2018-02 | NULL | 4.33
2 | 2018-02 | NULL | 3.44
Whereas I want
band | month | One | Two
------+------------+-------+---------
1 | 2018-01 | 3.41 | 2.55
2 | 2018-01 | 3.55 | 4.61
1 | 2018-02 | 1.55 | 2.55
2 | 2018-02 | 2.43 | 4.61
I understand that the status column is causing this but can't figure out how to fix it.
I've tried a good few methods (multiple temp tables, sub-selects to remove the "status" due to default grouping) from different questions I found on here but keep ending up with the same result. Any help appreciated
The trick when you are using CASE/WHEN is to wrap each CASE in an aggregate function like MAX and then group by all the non-aggregated columns. MAX ignores the NULLs, so each band and month collapses into a single row:
SELECT
band,
Month,
MAX(CASE
when status = 'one' then response_avg
END) as One,
MAX(CASE
when status = 'two' then response_avg
END) as Two
FROM t1
GROUP BY band,
Month
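The effect of MAX(CASE ...) with GROUP BY can be seen on the rows from the question's NULL-filled output. This is a minimal sketch using Python's sqlite3 module; it assumes t1 is materialised as a plain table with the columns band, Month, status and response_avg, and adds an ORDER BY so the result order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (band INTEGER, Month TEXT, status TEXT, response_avg REAL)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        (1, "2018-01", "one", 3.41), (2, "2018-01", "one", 3.55),
        (1, "2018-01", "two", 2.55), (2, "2018-01", "two", 4.61),
        (1, "2018-02", "one", 1.55), (2, "2018-02", "one", 2.43),
        (1, "2018-02", "two", 4.33), (2, "2018-02", "two", 3.44),
    ],
)

# Each CASE yields NULL for the other statuses; MAX skips those NULLs,
# leaving one value per (band, Month) group.
pivot = conn.execute("""
SELECT band,
       Month,
       MAX(CASE WHEN status = 'one' THEN response_avg END) AS One,
       MAX(CASE WHEN status = 'two' THEN response_avg END) AS Two
FROM t1
GROUP BY band, Month
ORDER BY Month, band
""").fetchall()

for row in pivot:
    print(row)
```

Eight input rows become four, one per band and month, with both status columns filled.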

Do Timestamp operations work on UNIX_TIMESTAMP or Visual Representation of Timestamp

I read this, and as per the documentation for TIMESTAMP:
It can hold values starting at '1970-01-01 00:00:01' (UTC) to
'2038-01-19 05:14:07' (UTC) . This range is caused by MariaDB storing
the TIMESTAMP values as the number of seconds since '1970-01-01
00:00:00' (UTC).
So my understanding was that all timestamp-related operations are done on the UNIX_TIMESTAMP value, as no time zone information is stored in a TIMESTAMP.
My current time zone: IST (+05:30)
After midnight, when the date had changed in IST but not in UTC, I did an insert. I expected that filtering on DATE(now()) in IST would also return the records stored the previous day, since I thought the UNIX_TIMESTAMP value, which stays the same, would be used for the comparison.
In the code block below you can see the time zone offset +05:30, which is IST. The record with ID = 5 was inserted at 00:04 on 13 Feb 2017 IST, which in UTC is 18:34 on 12 Feb 2017 (5 h 30 min earlier), so the date had not yet changed in UTC. I ran a select for date = 13 Feb 2017 and expected records (1, 2, 3, 5), because 00:00 on 13 Feb 2017 IST still falls on 12 Feb 2017 in UTC. But I got only the record with ID = 5.
Q.1
In short, I was thinking that the value 2017-02-13 would be converted to a UNIX_TIMESTAMP (a numerical value) before the comparison. What am I missing? Otherwise, what steps does the database take to generate the result below? I hope I was able to explain myself.
Q.2 How does java.sql.Timestamp behave? Does it work as shown in the code block, does it first convert timestamp values to unix_timestamp and then compare, or does the database internally compare the long values of the timestamps?
MariaDB [test]> SELECT @@global.time_zone, @@session.time_zone;
+--------------------+---------------------+
| @@global.time_zone | @@session.time_zone |
+--------------------+---------------------+
| SYSTEM | +05:30 |
+--------------------+---------------------+
MariaDB [test]> desc t;
+-------+-----------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+-------------------+-----------------------------+
| id | int(11) | YES | | NULL | |
| ts | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-------+-----------+------+-----+-------------------+-----------------------------+
MariaDB [test]> select * from t;
+------+---------------------+
| id | ts |
+------+---------------------+
| 1 | 2017-02-12 22:10:35 |
| 2 | 2017-02-12 22:10:35 |
| 3 | 2017-02-12 22:13:06 |
| 4 | 2001-07-22 12:12:12 |
| 5 | 2017-02-13 00:04:01 |
+------+---------------------+
MariaDB [test]> select * from t where date(ts) = '2017-02-13';
+------+---------------------+
| id | ts |
+------+---------------------+
| 5 | 2017-02-13 00:04:01 |
+------+---------------------+
MariaDB [test]> set time_zone = '+00:00';
MariaDB [test]> SELECT @@global.time_zone, @@session.time_zone;
+--------------------+---------------------+
| @@global.time_zone | @@session.time_zone |
+--------------------+---------------------+
| SYSTEM | +00:00 |
+--------------------+---------------------+
MariaDB [test]> select * from t;
+------+---------------------+
| id | ts |
+------+---------------------+
| 1 | 2017-02-12 16:40:35 |
| 2 | 2017-02-12 16:40:35 |
| 3 | 2017-02-12 16:43:06 |
| 4 | 2001-07-22 06:42:12 |
| 5 | 2017-02-12 18:34:01 |
+------+---------------------+
MariaDB [test]> select * from t where date(ts) = '2017-02-12';
+------+---------------------+
| id | ts |
+------+---------------------+
| 1 | 2017-02-12 16:40:35 |
| 2 | 2017-02-12 16:40:35 |
| 3 | 2017-02-12 16:43:06 |
| 5 | 2017-02-12 18:34:01 |
+------+---------------------+
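The two session outputs for the row with id = 5 can be reproduced outside the database. This is a minimal sketch in Python that only illustrates the arithmetic: one zone-free count of seconds since the epoch, rendered in two time zones, lands on two different calendar dates. The variable names are hypothetical, and the sketch makes no claim about MariaDB's internal implementation; it is merely consistent with the transcript above.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))
UTC = timezone.utc

# Row id = 5 as displayed in the +05:30 session.
shown_in_ist = datetime(2017, 2, 13, 0, 4, 1, tzinfo=IST)

# Zone-free representation: seconds since 1970-01-01 00:00:00 UTC.
epoch_seconds = int(shown_in_ist.timestamp())

# Render the same instant in each session time zone.
as_ist = datetime.fromtimestamp(epoch_seconds, IST)
as_utc = datetime.fromtimestamp(epoch_seconds, UTC)

print(epoch_seconds)
print(as_ist.strftime("%Y-%m-%d %H:%M:%S"), as_ist.date())
print(as_utc.strftime("%Y-%m-%d %H:%M:%S"), as_utc.date())
```

The date extracted from the IST rendering is 2017-02-13 and from the UTC rendering 2017-02-12, matching which of the two date(ts) filters returned the row in each session.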
EDIT1: I tried again with the database server in the UTC time zone and the application server in IST. After midnight, when the date had changed in IST but not in UTC, I repeated the create and insert operations described above. Below are the records and info:
MariaDB [test]> SELECT @@global.time_zone, @@session.time_zone;
+--------------------+---------------------+
| @@global.time_zone | @@session.time_zone |
+--------------------+---------------------+
| UTC | UTC |
+--------------------+---------------------+
1 row in set (0.30 sec)
MariaDB [test]> select * from t ;
+----+---------------------+
| id | ts |
+----+---------------------+
| 1 | 2017-02-13 19:22:15 |
| 2 | 2017-02-13 19:22:15 |
| 3 | 2017-02-13 19:21:40 |
| 4 | 2001-07-22 12:12:12 |
| 5 | 2017-02-14 00:56:13 |
+----+---------------------+
5 rows in set (0.40 sec)
MariaDB [test]> select UTC_TIMESTAMP;
+---------------------+
| UTC_TIMESTAMP |
+---------------------+
| 2017-02-13 19:21:22 |
+---------------------+
1 row in set (0.38 sec)
Then I ran this query through JDBC:
SELECT * FROM t WHERE date(ts) = date(:currentDate);
where currentDate = Timestamp.from(Instant.now()); from Java.
The response was:
[
{
"id": 1,
"timestamp1": 1486993935000
},
{
"id": 2,
"timestamp1": 1486993935000
},
{
"id": 3,
"timestamp1": 1486993900000
}
]
Why was the record with id = 5 not returned? Doesn't this mean that the database compares the visual representation rather than extracting the date from the numerical UTC timestamp value? Had it done the latter, it would have fetched the record with id = 5.
