How to query SQLite data hourly? - sqlite

The data table looks like the following:
ID DATE
1 2020-12-31 10:10:00
2 2020-12-31 20:30:00
3 2020-12-31 20:50:00
4 2021-01-02 17:10:00
5 2021-01-02 17:20:00
6 2021-01-02 17:30:00
7 2021-01-03 23:10:00
..
I would like to query only the last entry per hour of each day, so that the result looks like:
ID DATE
1 2020-12-31 10:10:00
3 2020-12-31 20:50:00
6 2021-01-02 17:30:00
7 2021-01-03 23:10:00
..
I looked for an hourly query and found the following:
strftime('%H', " + DATE + ", '+1 hours')
However, I am not sure how to use it properly (e.g. with GROUP BY? If so, how do I make sure it takes the latest entry of the hour?), so some help here would be great!

You can do it with the ROW_NUMBER() window function:
SELECT ID, DATE
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY strftime('%Y%m%d%H', DATE) ORDER BY DATE DESC) rn
FROM tablename
)
WHERE rn = 1
ORDER BY ID
Instead of strftime('%Y%m%d%H', DATE) you could also use substr(DATE, 1, 13).
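To try the window-function query end to end, here is a minimal sketch that runs it through Python's built-in sqlite3 module on the sample rows from the question. It assumes the linked SQLite library is 3.25.0 or newer (needed for ROW_NUMBER()); the helper name last_entry_per_hour is hypothetical.

```python
import sqlite3

# Sample rows from the question: (ID, DATE)
ROWS = [
    (1, "2020-12-31 10:10:00"),
    (2, "2020-12-31 20:30:00"),
    (3, "2020-12-31 20:50:00"),
    (4, "2021-01-02 17:10:00"),
    (5, "2021-01-02 17:20:00"),
    (6, "2021-01-02 17:30:00"),
    (7, "2021-01-03 23:10:00"),
]

QUERY = """
SELECT ID, DATE
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY strftime('%Y%m%d%H', DATE)  -- one partition per day + hour
           ORDER BY DATE DESC                        -- latest entry gets rn = 1
         ) AS rn
  FROM tablename
)
WHERE rn = 1
ORDER BY ID
"""


def last_entry_per_hour(rows):
    """Return (ID, DATE) of the latest row in each calendar hour."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tablename (ID INTEGER PRIMARY KEY, DATE TEXT)")
        con.executemany("INSERT INTO tablename VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in last_entry_per_hour(ROWS):
        print(row)
```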
For SQLite versions earlier than 3.25.0, which do not support window functions, you can use NOT EXISTS instead:
SELECT t1.*
FROM tablename t1
WHERE NOT EXISTS (
SELECT 1
FROM tablename t2
WHERE strftime('%Y%m%d%H', t2.DATE) = strftime('%Y%m%d%H', t1.DATE)
AND t2.DATE > t1.DATE
)
Results:
| ID | DATE                |
| -: | :------------------ |
| 1  | 2020-12-31 10:10:00 |
| 3  | 2020-12-31 20:50:00 |
| 6  | 2021-01-02 17:30:00 |
| 7  | 2021-01-03 23:10:00 |
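For older SQLite builds, the NOT EXISTS form can be checked the same way. This sketch uses the substr(DATE, 1, 13) hour key mentioned in the first answer in place of strftime; the helper name last_entry_per_hour_legacy is hypothetical. One difference from ROW_NUMBER(): if two rows in the same hour share exactly the same latest DATE, both survive this filter, because neither has a strictly later sibling.

```python
import sqlite3

# Sample rows from the question: (ID, DATE)
ROWS = [
    (1, "2020-12-31 10:10:00"),
    (2, "2020-12-31 20:30:00"),
    (3, "2020-12-31 20:50:00"),
    (4, "2021-01-02 17:10:00"),
    (5, "2021-01-02 17:20:00"),
    (6, "2021-01-02 17:30:00"),
    (7, "2021-01-03 23:10:00"),
]

# Keep a row only if no other row in the same 'YYYY-MM-DD HH' has a later DATE.
QUERY = """
SELECT t1.ID, t1.DATE
FROM tablename t1
WHERE NOT EXISTS (
  SELECT 1
  FROM tablename t2
  WHERE substr(t2.DATE, 1, 13) = substr(t1.DATE, 1, 13)
    AND t2.DATE > t1.DATE
)
ORDER BY t1.ID
"""


def last_entry_per_hour_legacy(rows):
    """Same result as the ROW_NUMBER() query, without window functions."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tablename (ID INTEGER PRIMARY KEY, DATE TEXT)")
        con.executemany("INSERT INTO tablename VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in last_entry_per_hour_legacy(ROWS):
        print(row)
```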

Related

Kusto Query Language - Round datetime to nearest month using bin

I have plenty of logs, each with its own timestamp, and I am trying to count the logs on a monthly basis.
Here is a sample table and query using bin(30d):
datatable(Date:datetime, Log:string)[
datetime(2018-02-02T15:14),"log1",
datetime(2018-03-23T12:14),"log2",
datetime(2018-03-24T16:14),"log3",
datetime(2019-04-26T15:14),"log4"]
| summarize count(Log) by bin(Date,30d)
The output I want:
Date count_Log
2018-02 00:00:00.0000000 1
2018-03 00:00:00.0000000 2
2019-04 00:00:00.0000000 1
The output I get:
Date count_Log
2018-01-17 00:00:00.0000000 1 // note the date: it shows January, but the log is from February
2018-03-18 00:00:00.0000000 2
2019-04-12 00:00:00.0000000 1
I need a monthly summary. How can I do that? The bin size is passed in as a parameter with values such as 1h, 1d, 7d, 10d, etc., but there is no 1-month timespan.
So how can I do it without having to extract the month and year manually?
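Why does bin(Date, 30d) report 2018-01-17 for a February log? The dates in the output above are consistent with bin() flooring a datetime to a multiple of 30 days counted from 0001-01-01, so the bins have nothing to do with calendar months. The Python sketch below reproduces that arithmetic; it is an emulation inferred from the question's output, not Kusto's implementation, and bin_days is a hypothetical helper.

```python
from datetime import date, timedelta

# Assumption (inferred from the question's output, not from the Kusto docs):
# bin(Date, Nd) floors the number of days since 0001-01-01 to a multiple of N.
ORIGIN = date(1, 1, 1)


def bin_days(d: date, size_days: int) -> date:
    """Emulate a fixed-width day bin anchored at 0001-01-01."""
    offset = (d - ORIGIN).days
    return ORIGIN + timedelta(days=offset - offset % size_days)


if __name__ == "__main__":
    for d in [date(2018, 2, 2), date(2018, 3, 23), date(2018, 3, 24), date(2019, 4, 26)]:
        # A 30-day bin rarely starts on the 1st, so it cannot line up with months
        print(d, "->", bin_days(d, 30))
```

Since no fixed-width timespan matches every month (28 to 31 days), a calendar-aware function is needed instead of bin().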
You can use the startofmonth() function, for example:
datatable(Date: datetime, Log: string)
[
datetime(2018-02-02T15:14), "log1",
datetime(2018-03-23T12:14), "log2",
datetime(2018-03-24T16:14), "log3",
datetime(2019-04-26T15:14), "log4"
]
| summarize count() by startofmonth(Date)
| Column1                     | count_ |
| --------------------------- | ------ |
| 2018-02-01 00:00:00.0000000 | 1      |
| 2018-03-01 00:00:00.0000000 | 2      |
| 2019-04-01 00:00:00.0000000 | 1      |
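The grouping that startofmonth() performs can be mirrored in plain Python for a quick sanity check: truncate each timestamp to midnight on the first day of its month and count per bucket. This is an illustrative emulation, not Kusto code; start_of_month and count_by_month are hypothetical names.

```python
from collections import Counter
from datetime import datetime

# Sample logs from the question: (Date, Log)
LOGS = [
    (datetime(2018, 2, 2, 15, 14), "log1"),
    (datetime(2018, 3, 23, 12, 14), "log2"),
    (datetime(2018, 3, 24, 16, 14), "log3"),
    (datetime(2019, 4, 26, 15, 14), "log4"),
]


def start_of_month(ts: datetime) -> datetime:
    """Truncate a timestamp to midnight on the first day of its month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def count_by_month(logs):
    """Emulate `summarize count() by startofmonth(Date)`, sorted by month."""
    counts = Counter(start_of_month(ts) for ts, _log in logs)
    return sorted(counts.items())


if __name__ == "__main__":
    for month, count in count_by_month(LOGS):
        print(month, count)
```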

SQL - How to append a period

I have an existing SQLite table like this:
startdate - enddate
2018-01-01 - 2018-06-30
2018-07-01 - 2018-12-31
2019-01-01 - 2019-06-30
2019-07-01 - 2019-12-31
2020-01-01 - 2020-06-30
2020-07-01 - 2020-12-31
2021-01-01 - 2021-06-30
What SQL statement produces the following result?
2019-11-01 2019-12-31 // 60 Days difference
2020-01-01 2020-06-30 // 180 Days difference
2020-07-01 2020-12-31 // 180 Days difference
2021-01-01 2021-06-30 // 180 Days difference
'2019-11-01' is entered as input via the search field in my Android app.
The tricky part is producing the shorter period at the beginning, i.e. inserting that first period into the statement.
I tried UNION and it gives me an error.
How can I do this with a query?
I am thankful for any help.
And what is the SQL statement if it should return data from '2019-11-01' until 'now'?
2019-11-01 2019-12-31 // 60 Days difference
2020-01-01 2020-06-30 // 180 Days difference
2020-07-01 2020-12-31 // 180 Days difference
2021-01-01 2021-02-16 // 46 Days difference
Thanks!
You need all the rows where enddate is greater than or equal to '2019-11-01', with startdate clamped so that it is never earlier than '2019-11-01':
SELECT MAX(startdate, '2019-11-01') startdate, enddate
FROM tablename
WHERE enddate >= '2019-11-01'
Results:
| startdate  | enddate    |
| ---------- | ---------- |
| 2019-11-01 | 2019-12-31 |
| 2020-01-01 | 2020-06-30 |
| 2020-07-01 | 2020-12-31 |
| 2021-01-01 | 2021-06-30 |
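Here is a minimal sketch of that query run through Python's sqlite3 module. It assumes the input date comes from the app's search field and binds it as the named parameter :start instead of splicing it into the SQL text; periods_from is a hypothetical helper. SQLite's MAX() with two arguments is a scalar function that returns the larger value, and ISO-formatted date strings compare correctly as text, which is what clamps the first period.

```python
import sqlite3

# Periods from the question's table: (startdate, enddate)
PERIODS = [
    ("2018-01-01", "2018-06-30"),
    ("2018-07-01", "2018-12-31"),
    ("2019-01-01", "2019-06-30"),
    ("2019-07-01", "2019-12-31"),
    ("2020-01-01", "2020-06-30"),
    ("2020-07-01", "2020-12-31"),
    ("2021-01-01", "2021-06-30"),
]

QUERY = """
SELECT MAX(startdate, :start) AS startdate,  -- scalar MAX: clamp the first period
       enddate
FROM tablename
WHERE enddate >= :start                      -- skip periods that end before :start
ORDER BY enddate
"""


def periods_from(start):
    """Return the periods from `start` onward, with the first one shortened."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tablename (startdate TEXT, enddate TEXT)")
        con.executemany("INSERT INTO tablename VALUES (?, ?)", PERIODS)
        return con.execute(QUERY, {"start": start}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in periods_from("2019-11-01"):
        print(row)
```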
Edit: for your second question:
SELECT MAX(startdate, '2019-11-01') startdate,
MIN(enddate, CURRENT_DATE) enddate
FROM tablename
WHERE enddate >= '2019-11-01' AND startdate <= CURRENT_DATE
Results:
| startdate  | enddate    |
| ---------- | ---------- |
| 2019-11-01 | 2019-12-31 |
| 2020-01-01 | 2020-06-30 |
| 2020-07-01 | 2020-12-31 |
| 2021-01-01 | 2021-02-16 |
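The second query depends on CURRENT_DATE, so its output changes every day. For a reproducible check, this sketch binds "today" as a parameter (:today) in place of CURRENT_DATE; periods_between is a hypothetical helper. Keep in mind that SQLite evaluates CURRENT_DATE in UTC, so near midnight it can differ from the device's local date.

```python
import sqlite3

# Periods from the question's table: (startdate, enddate)
PERIODS = [
    ("2018-01-01", "2018-06-30"),
    ("2018-07-01", "2018-12-31"),
    ("2019-01-01", "2019-06-30"),
    ("2019-07-01", "2019-12-31"),
    ("2020-01-01", "2020-06-30"),
    ("2020-07-01", "2020-12-31"),
    ("2021-01-01", "2021-06-30"),
]

# :today stands in for CURRENT_DATE so the result is deterministic
QUERY = """
SELECT MAX(startdate, :start) AS startdate,  -- clamp the first period
       MIN(enddate, :today)   AS enddate     -- clamp the last period
FROM tablename
WHERE enddate >= :start AND startdate <= :today
ORDER BY startdate
"""


def periods_between(start, today):
    """Return the periods overlapping [start, today], clamped at both ends."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE tablename (startdate TEXT, enddate TEXT)")
        con.executemany("INSERT INTO tablename VALUES (?, ?)", PERIODS)
        return con.execute(QUERY, {"start": start, "today": today}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in periods_between("2019-11-01", "2021-02-16"):
        print(row)
```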

Creating a SQLite query

I have a SQLite database and I want to create a query that groups records whose DateTime values are within 60 minutes of each other. The hard part is that the grouping is cumulative: if we have 3 records with DateTimes 2019-12-14 15:40:00, 2019-12-14 15:56:00 and 2019-12-14 16:55:00, they all fall into one group, because each record is within 60 minutes of the previous one. Please see the Hands table and the desired output of the query below to understand the requirement.
Database Table "Hands"
ID DateTime Result
1 2019-12-14 15:40:00 -100
2 2019-12-14 15:56:00 1000
3 2019-12-14 16:55:00 -2000
4 2012-01-12 12:00:00 400
5 2016-10-01 21:00:00 900
6 2016-10-01 20:55:00 1000
Desired output of query
StartTime Count Result
2019-12-14 15:40:00 3 -1100
2012-01-12 12:00:00 1 400
2016-10-01 20:55:00 2 1900
You can use window functions to mark the record at which a new group starts (because its datetime is 60 minutes or more after the previous record's), and then turn those markers into a unique group number with a running sum. Finally, group by that group number and apply the aggregate functions:
with base as (
select DateTime, Result,
coalesce(cast((
julianday(DateTime) - julianday(
lag(DateTime) over (order by DateTime)
)
) * 24 >= 1 as integer), 1) as firstInGroup
from Hands
), step as (
select DateTime, Result,
sum(firstInGroup) over (
order by DateTime rows
between unbounded preceding and current row) as grp
from base
)
select min(DateTime) DateTime,
count(*) Count,
sum(Result) Result
from step
group by grp;
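The gaps-and-islands query can be exercised against the Hands rows from the question with Python's sqlite3 module. The changes to the answer's SQL are cosmetic: the first column is named StartTime as in the desired output, and a final ORDER BY returns the groups in chronological order. It needs SQLite 3.25.0 or newer for lag() and the windowed sum(); group_hands is a hypothetical helper name.

```python
import sqlite3

# Rows from the question's "Hands" table: (ID, DateTime, Result)
HANDS = [
    (1, "2019-12-14 15:40:00", -100),
    (2, "2019-12-14 15:56:00", 1000),
    (3, "2019-12-14 16:55:00", -2000),
    (4, "2012-01-12 12:00:00", 400),
    (5, "2016-10-01 21:00:00", 900),
    (6, "2016-10-01 20:55:00", 1000),
]

QUERY = """
WITH base AS (
  SELECT DateTime, Result,
         -- 1 if this row is 60+ minutes after the previous one (or is the first row)
         coalesce(
           cast((julianday(DateTime)
                 - julianday(lag(DateTime) OVER (ORDER BY DateTime))) * 24 >= 1 AS integer),
           1) AS firstInGroup
  FROM Hands
), step AS (
  SELECT DateTime, Result,
         -- running total of the start markers gives each chain its own number
         sum(firstInGroup) OVER (
           ORDER BY DateTime
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
  FROM base
)
SELECT min(DateTime) AS StartTime, count(*) AS Count, sum(Result) AS Result
FROM step
GROUP BY grp
ORDER BY StartTime
"""


def group_hands(rows):
    """Return (StartTime, Count, Result) for each chain of hands < 60 min apart."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Hands (ID INTEGER PRIMARY KEY, DateTime TEXT, Result INTEGER)")
        con.executemany("INSERT INTO Hands VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for group in group_hands(HANDS):
        print(group)
```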

How to get count of multiple distinct columns with one column as date

I have a table with columns id, date and name:
id date name
1 2019-08-01 00:00:00 abc
1 2019-08-01 00:00:00 def
2 2019-08-01 00:00:00 pqr
1 2019-08-31 00:00:00 def
I want to get the count of ids for a given month.
The expected count of ids for month 8 is 3.
SELECT strftime('%Y/%m/%d', date) as vdate,count(DISTINCT vdate,id) AS totalcount FROM cardtable WHERE date BETWEEN date('" + $rootScope.mydate + "', 'start of month') AND date('" + $rootScope.mydate + "','start of month','+1 month','-1 day') group by vdate
Basically, I want to count the distinct combinations of id and date. For example, if there are 2 entries on 2019-08-01 with the same id, the count should be 1. If there are 3 entries on 2019-08-01, 2 with id 1 and the 3rd with id 2, the count should be 2. And if there are 2 entries with id 1 on different dates, say one on 2019-08-01 and the other on 2019-08-31, the count of ids for month 8 must be 2. How can I modify the above query?
Use a subquery which returns the distinct values that you want to count:
SELECT COUNT(*) AS totalcount
FROM (
SELECT DISTINCT strftime('%Y/%m/%d', date), id
FROM cardtable
WHERE date(date) BETWEEN
date('" + $rootScope.mydate + "', 'start of month')
AND
date('" + $rootScope.mydate + "','start of month','+1 month','-1 day')
)
Results:
| totalcount |
| ---------- |
| 3 |
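The answer builds its SQL by concatenating $rootScope.mydate into the string. A safer variant binds the date as a parameter, which avoids quoting problems and SQL injection. The sketch below runs the same DISTINCT subquery through Python's sqlite3 with a named parameter :mydate; the helper name count_ids_in_month and the use of Python are assumptions for illustration, and the app's own SQLite driver would use its own placeholder syntax.

```python
import sqlite3

# Sample rows from the question: (id, date, name)
CARDS = [
    (1, "2019-08-01 00:00:00", "abc"),
    (1, "2019-08-01 00:00:00", "def"),
    (2, "2019-08-01 00:00:00", "pqr"),
    (1, "2019-08-31 00:00:00", "def"),
]

QUERY = """
SELECT COUNT(*) AS totalcount
FROM (
  -- one row per distinct (day, id) pair inside the month of :mydate
  SELECT DISTINCT strftime('%Y/%m/%d', date), id
  FROM cardtable
  WHERE date(date) BETWEEN date(:mydate, 'start of month')
                       AND date(:mydate, 'start of month', '+1 month', '-1 day')
)
"""


def count_ids_in_month(rows, mydate):
    """Count distinct (day, id) pairs in the calendar month containing mydate."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE cardtable (id INTEGER, date TEXT, name TEXT)")
        con.executemany("INSERT INTO cardtable VALUES (?, ?, ?)", rows)
        (total,) = con.execute(QUERY, {"mydate": mydate}).fetchone()
        return total
    finally:
        con.close()


if __name__ == "__main__":
    print(count_ids_in_month(CARDS, "2019-08-15"))
```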

Hive: No results are displayed from the query

I am writing a query on this table to get the sum of size for each directory, grouped by directory, where the date is yesterday. The query below returns no output.
test.id test.path test.size test.date
1 this/is/the/path1/fil.txt 232.24 2019-06-01
2 this/is/the/path2/test.txt 324.0 2016-06-01
3 this/is/the/path3/index.txt 12.3 2017-05-01
4 this/is/the/path4/test2.txt 134.0 2019-03-23
5 this/is/the/path1/files.json 2.23 2018-07-23
6 this/is/the/path1/code.java 1.34 2014-03-23
7 this/is/the/path2/data.csv 23.42 2016-06-23
8 this/is/the/path3/test.html 1.33 2018-09-23
9 this/is/the/path4/prog.js 6.356 2019-06-23
4 this/is/the/path4/test2.txt 134.0 2019-04-23
SELECT regexp_replace(path,'[^/]+$',''), sum(cast(size as decimal))
from test WHERE date > date_sub(current_date, 1) GROUP BY path,size;
You must not group by size, only by regexp_replace(path,'[^/]+$','').
Also, since you only want yesterday's rows, why do you use WHERE date > '2019%'?
You can get yesterday's date with date_sub(current_date, 1):
select
regexp_replace(path,'[^/]+$',''),
sum(cast(size as decimal))
from test
where date = date_sub(current_date, 1)
group by regexp_replace(path,'[^/]+$','');
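The fix boils down to grouping by the directory expression alone. The Python sketch below mirrors that logic on the sample rows so the result can be checked without a Hive cluster: re.sub with the same pattern plays the role of regexp_replace, and Decimal stands in for cast(size as decimal). It is an emulation, not Hive code; size_per_directory is hypothetical, and today is passed in explicitly instead of current_date.

```python
import re
from collections import defaultdict
from datetime import date, timedelta
from decimal import Decimal

# Sample rows from the question: (id, path, size, date); sizes kept as strings
ROWS = [
    (1, "this/is/the/path1/fil.txt", "232.24", "2019-06-01"),
    (2, "this/is/the/path2/test.txt", "324.0", "2016-06-01"),
    (3, "this/is/the/path3/index.txt", "12.3", "2017-05-01"),
    (4, "this/is/the/path4/test2.txt", "134.0", "2019-03-23"),
    (5, "this/is/the/path1/files.json", "2.23", "2018-07-23"),
    (6, "this/is/the/path1/code.java", "1.34", "2014-03-23"),
    (7, "this/is/the/path2/data.csv", "23.42", "2016-06-23"),
    (8, "this/is/the/path3/test.html", "1.33", "2018-09-23"),
    (9, "this/is/the/path4/prog.js", "6.356", "2019-06-23"),
    (4, "this/is/the/path4/test2.txt", "134.0", "2019-04-23"),
]


def size_per_directory(rows, today):
    """Sum sizes per directory for rows dated the day before `today`."""
    yesterday = (today - timedelta(days=1)).isoformat()  # like date_sub(current_date, 1)
    totals = defaultdict(Decimal)
    for _id, path, size, day in rows:
        if day == yesterday:
            directory = re.sub(r"[^/]+$", "", path)  # strip the file name, keep trailing "/"
            totals[directory] += Decimal(size)       # group key is the directory only
    return dict(totals)


if __name__ == "__main__":
    print(size_per_directory(ROWS, date(2019, 6, 24)))
```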
You probably want WHERE date >= '2019-01-01'. Using % as a wildcard in a string, as in your 2019%, only works with LIKE, not with inequality comparisons.
The example you gave looks like you want all rows in calendar year 2019.
For yesterday, you want
WHERE date >= DATE_SUB(current_date, 1)
AND date < current_date
This works even if your date column contains timestamps.
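Two details of that filter can be checked with plain dates. Hive's date_sub(start_date, num_days) subtracts num_days, so a negative second argument moves forward in time; the lower bound for yesterday therefore needs date_sub(current_date, 1). And the half-open range [yesterday, today) still matches rows whose date column holds a time of day. The Python functions date_sub and is_yesterday below are hypothetical stand-ins that mimic that behavior, not Hive code.

```python
from datetime import date, datetime, time, timedelta


def date_sub(start: date, num_days: int) -> date:
    """Stand-in for Hive's date_sub: subtracts num_days (negative moves forward)."""
    return start - timedelta(days=num_days)


def is_yesterday(ts: datetime, today: date) -> bool:
    """Half-open test: date_sub(today, 1) <= ts < today, so any time of day matches."""
    start = datetime.combine(date_sub(today, 1), time.min)
    end = datetime.combine(today, time.min)
    return start <= ts < end


if __name__ == "__main__":
    today = date(2019, 6, 24)
    print(date_sub(today, 1))   # the day before `today`
    print(date_sub(today, -1))  # the day after `today`, not yesterday
    print(is_yesterday(datetime(2019, 6, 23, 23, 59), today))
```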
