I have data in a table with this schema:
Id INTEGER,
date DATETIME,
value REAL
Id is the primary key, and I have an index on the date column to speed up querying values within a specific date range.
What should I do if I need N equal date ranges between specific start and end dates, and want to query aggregated data for each date range?
For example:
Start date: 2015-01-01
End date: 2019-12-31
N: 5
In this case equal date intervals should be:
2015-01-01 ~ 2015-12-31
2016-01-01 ~ 2016-12-31
2017-01-01 ~ 2017-12-31
2018-01-01 ~ 2018-12-31
2019-01-01 ~ 2019-12-31
And the query should aggregate all values (AVG) in between those intervals, so I would like to have 5 total rows after the execution.
Maybe something with CTE?
There are two ways to do it.
Both use recursive CTEs but return different results.
The first one uses NTILE():
with
  dates as (select '2015-01-01' mindate, '2019-12-31' maxdate),
  alldates as (
    select mindate date from dates
    union all
    select date(a.date, '1 day')
    from alldates a cross join dates d
    where a.date < d.maxdate
  ),
  groups as (
    select *, ntile(5) over (order by date) grp
    from alldates
  ),
  cte as (
    select min(date) date1, max(date) date2
    from groups
    group by grp
  )
select * from cte;
Results:
| date1 | date2 |
| ---------- | ---------- |
| 2015-01-01 | 2016-01-01 |
| 2016-01-02 | 2016-12-31 |
| 2017-01-01 | 2017-12-31 |
| 2018-01-01 | 2018-12-31 |
| 2019-01-01 | 2019-12-31 |
And the second builds the groups with date arithmetic:
with
  dates as (select '2015-01-01' mindate, '2019-12-31' maxdate),
  cte1 as (
    select mindate date from dates
    union all
    select date(
      c.date,
      ((strftime('%s', d.maxdate) - strftime('%s', d.mindate)) / 5) || ' second'
    )
    from cte1 c inner join dates d
    on c.date < d.maxdate
  ),
  cte2 as (
    select date date1, lead(date) over (order by date) date2
    from cte1
  ),
  cte as (
    select date1,
      case
        when date2 = (select maxdate from dates) then date2
        else date(date2, '-1 day')
      end date2
    from cte2
    where date2 is not null
  )
select * from cte;
Results:
| date1 | date2 |
| ---------- | ---------- |
| 2015-01-01 | 2015-12-31 |
| 2016-01-01 | 2016-12-30 |
| 2016-12-31 | 2017-12-30 |
| 2017-12-31 | 2018-12-30 |
| 2018-12-31 | 2019-12-31 |
In both cases you can get the averages by joining the table to the cte:
select c.date1, c.date2, avg(t.value) avg_value
from cte c inner join tablename t
on t.date between c.date1 and c.date2
group by c.date1, c.date2
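For completeness, here is a minimal end-to-end sketch of the second approach, assuming the table is named tablename (as in the join above) and has the schema from the question; the CREATE TABLE is hypothetical DDL for illustration only:

-- hypothetical DDL matching the schema described in the question
create table if not exists tablename (
  Id INTEGER PRIMARY KEY,
  date DATETIME,
  value REAL
);

-- split [mindate, maxdate] into 5 equal spans and average value per span
with recursive
  dates as (select '2015-01-01' mindate, '2019-12-31' maxdate),
  cte1 as (
    select mindate date from dates
    union all
    select date(
      c.date,
      ((strftime('%s', d.maxdate) - strftime('%s', d.mindate)) / 5) || ' second'
    )
    from cte1 c inner join dates d
    on c.date < d.maxdate
  ),
  cte2 as (
    select date date1, lead(date) over (order by date) date2
    from cte1
  ),
  cte as (
    select date1,
      case
        when date2 = (select maxdate from dates) then date2
        else date(date2, '-1 day')
      end date2
    from cte2
    where date2 is not null
  )
select c.date1, c.date2, avg(t.value) avg_value
from cte c inner join tablename t
  on t.date between c.date1 and c.date2
group by c.date1, c.date2
order by c.date1;

Note that if the date column stores a time-of-day component, a value like '2019-12-31 10:00:00' compares greater than the bare string '2019-12-31', so you may want to compare on date(t.date) instead of t.date in the join condition.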
The data table looks like the following:
ID DATE
1 2020-12-31 10:10:00
2 2020-12-31 20:30:00
3 2020-12-31 20:50:00
4 2021-01-02 17:10:00
5 2021-01-02 17:20:00
6 2021-01-02 17:30:00
7 2021-01-03 23:10:00
..
And I would like to query only the last entry per hour per day, and to have the result look like:
ID DATE
1 2020-12-31 10:10:00
3 2020-12-31 20:50:00
6 2021-01-02 17:30:00
7 2021-01-03 23:10:00
..
I tried to look for an hourly query and found the following:
strftime('%H', " + DATE + ", '+1 hours')
However, I'm not sure how to use it properly (e.g. with GROUP BY? And then how to ensure it takes the latest entry of the hour), so it would be great to have some help here!
You can do it with ROW_NUMBER() window function:
SELECT ID, DATE
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY strftime('%Y%m%d%H', DATE) ORDER BY DATE DESC) rn
  FROM tablename
)
WHERE rn = 1
ORDER BY ID
Instead of strftime('%Y%m%d%H', DATE) you could also use substr(DATE, 1, 13).
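For example, the substr variant of the same query would be (a sketch, assuming the dates are stored as 'YYYY-MM-DD HH:MM:SS' text, so the first 13 characters identify the hour):

SELECT ID, DATE
FROM (
  SELECT *,
         -- substr(DATE, 1, 13) keeps 'YYYY-MM-DD HH'
         ROW_NUMBER() OVER (PARTITION BY substr(DATE, 1, 13) ORDER BY DATE DESC) rn
  FROM tablename
)
WHERE rn = 1
ORDER BY ID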
For versions of SQLite prior to 3.25.0, which do not support window functions, you can do it with NOT EXISTS:
SELECT t1.*
FROM tablename t1
WHERE NOT EXISTS (
  SELECT 1
  FROM tablename t2
  WHERE strftime('%Y%m%d%H', t2.DATE) = strftime('%Y%m%d%H', t1.DATE)
    AND t2.DATE > t1.DATE
)
See the demo.
Results:
| ID | DATE                |
| -- | ------------------- |
| 1  | 2020-12-31 10:10:00 |
| 3  | 2020-12-31 20:50:00 |
| 6  | 2021-01-02 17:30:00 |
| 7  | 2021-01-03 23:10:00 |
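As an aside (not part of the original answer), SQLite documents that when a query contains a single min() or max() aggregate, the other bare columns in the SELECT are taken from the row holding that minimum or maximum, so a shorter SQLite-specific sketch for the same result would be:

-- relies on SQLite's documented "bare columns in an aggregate query" behavior
SELECT ID, MAX(DATE) AS DATE
FROM tablename
GROUP BY strftime('%Y%m%d%H', DATE)
ORDER BY ID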
I have a table with columns id, date, name:
id date name
1 2019-08-01 00:00:00 abc
1 2019-08-01 00:00:00 def
2 2019-08-01 00:00:00 pqr
1 2019-08-31 00:00:00 def
I want to get the count of ids for a given month.
The expected result for month 8 must be 3.
SELECT strftime('%Y/%m/%d', date) as vdate, count(DISTINCT vdate, id) AS totalcount
FROM cardtable
WHERE date BETWEEN date('" + $rootScope.mydate + "', 'start of month')
               AND date('" + $rootScope.mydate + "', 'start of month', '+1 month', '-1 day')
GROUP BY vdate
Basically I want to count rows where the combination of id and date is distinct. For example, if there are 2 entries on 2019-08-01 with the same id, it should count 1; if there are 3 entries on 2019-08-01, of which 2 have id 1 and the 3rd has id 2, it should count 2; and if there are 2 entries with id 1 on different dates, say one on 2019-08-01 and the other on 2019-08-31, then the count for month 8 must be 2. How can I modify the above query?
Use a subquery which returns the distinct values that you want to count:
SELECT COUNT(*) AS totalcount
FROM (
  SELECT DISTINCT strftime('%Y/%m/%d', date), id
  FROM cardtable
  WHERE date(date) BETWEEN
        date('" + $rootScope.mydate + "', 'start of month')
        AND
        date('" + $rootScope.mydate + "', 'start of month', '+1 month', '-1 day')
)
See the demo.
Results:
| totalcount |
| ---------- |
| 3 |
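For illustration, with the $rootScope.mydate placeholder replaced by a concrete date (any day in August 2019 works, since 'start of month' normalizes it to 2019-08-01), the query reads:

SELECT COUNT(*) AS totalcount
FROM (
  SELECT DISTINCT strftime('%Y/%m/%d', date), id
  FROM cardtable
  WHERE date(date) BETWEEN date('2019-08-15', 'start of month')                          -- 2019-08-01
                       AND date('2019-08-15', 'start of month', '+1 month', '-1 day')    -- 2019-08-31
)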
I'm trying to list all days of the next week starting from a given date.
Example:
If today is 2019-09-24 then the result should be:
DAY_OF_WEEK
2019-09-24
2019-09-25
2019-09-26
2019-09-27
2019-09-28
2019-09-29
2019-09-30
This is the query I came up with, and I wonder if there is a more elegant way to achieve the same results:
SELECT date('now') AS DAY_OF_WEEK
UNION
SELECT date('now', '+1 day') AS DAY_OF_WEEK
UNION
SELECT date('now', '+2 day') AS DAY_OF_WEEK
UNION
SELECT date('now', '+3 day') AS DAY_OF_WEEK
UNION
SELECT date('now', '+4 day') AS DAY_OF_WEEK
UNION
SELECT date('now', '+5 day') AS DAY_OF_WEEK
UNION
SELECT date('now', '+6 day') AS DAY_OF_WEEK
Your code is correct.
If you want, you can use a CTE that returns only the numbers 0 to 6 and select from it the number of days to add to the current date:
WITH days as (
  SELECT 0 AS day UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
  SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
)
SELECT date('now', '+' || day || ' day') AS DAY_OF_WEEK FROM days
See the demo.
Or with a RECURSIVE CTE:
WITH RECURSIVE days(day) AS (
  SELECT 0
  UNION ALL
  SELECT day + 1 FROM days
  LIMIT 7
)
SELECT date('now', '+' || day || ' day') AS DAY_OF_WEEK FROM days;
See the demo.
Results:
| DAY_OF_WEEK |
| ----------- |
| 2019-09-24 |
| 2019-09-25 |
| 2019-09-26 |
| 2019-09-27 |
| 2019-09-28 |
| 2019-09-29 |
| 2019-09-30 |
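If you need the list for a given date rather than for today, the same idea works with a date literal in place of 'now', e.g. using the date from the example:

WITH RECURSIVE days(day) AS (
  SELECT 0
  UNION ALL
  SELECT day + 1 FROM days
  LIMIT 7
)
-- list the 7 days starting at the given date
SELECT date('2019-09-24', '+' || day || ' day') AS DAY_OF_WEEK FROM days;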
A data.table join is not selecting the maximum date, but it does select the maximum value. See the following example:
table1 <- fread(
"individual_id | date
1 | 2018-01-06
2 | 2018-01-06",
sep ="|"
)
table1$date = as.IDate(table1$date)
table2 <- fread(
"individual_id | date_second | company_id | value
1 | 2018-01-02 | 62 | 1
1 | 2018-01-04 | 62 | 1.5
1 | 2018-01-05 | 63 | 1
2 | 2018-01-01 | 71 | 2
2 | 2018-01-02 | 74 | 1
2 | 2018-01-05 | 74 | 4",
sep = "|"
)
table2$date_second = as.IDate(table2$date_second)
The following join should select the maximum value by company_id and then return the maximum of all these values for each individual.
The join to select max value:
table2[table1, on=.(individual_id, date_second<=date),
#for each row of table1,
by=.EACHI,
# get the maximum value by company_id and the max of all of these
max(.SD[,max(value), by=.(company_id)]$V1)]
output:
individual_id date_second V1
1: 1 2018-01-06 1.5
2: 2 2018-01-06 4.0
same join, selecting max date:
table2[table1, on=.(individual_id, date_second<=date),
#for each row of table1,
by=.EACHI,
# get the maximum date by company_id and the max of all of these
max(.SD[,max(date_second), by=.(company_id)]$V1)]
output:
individual_id date_second V1
1: 1 2018-01-06 2018-01-02
2: 2 2018-01-06 2018-01-01
Why is it not returning the max date like it did the max value?
I guess you are looking for an update join:
table1[table2
, on = .(individual_id, date >= date_second)
, by = .EACHI
, second_date := max(i.date_second)][]
which gives:
> table1
individual_id date second_date
1: 1 2018-01-06 2018-01-05
2: 2 2018-01-06 2018-01-05
OK, it turns out you cannot select based on one of the join criteria, so I have to create a new column date_second_copy and then select based on this, e.g.:
table2$date_second_copy = table2$date_second
table2[table1, on=.(individual_id, date_second<=date),
#for each row of table1,
by=.EACHI,
# get the maximum date by company_id and the max of all of these
max(.SD[,max(date_second_copy), by=.(company_id)]$V1)]
I need to compare a value in one column with the previous row's value in another column. For example, I have this table:
id | create_date | end_date
1 | 2016-12-31 | 2017-01-25
2 | 2017-01-26 | 2017-05-21
3 | 2017-05-22 | 2017-08-26
4 | 2017-09-01 | 2017-09-02
I need to compare create_date for id = 2 with end_date for id = 1
and compare create_date for id = 3 with end_date for id = 2 etc.
Result: show me the ids where create_date (id = n) <> end_date (id = n-1) + interval '1' day.
Should I use the lag() function? How can I compare them? Which function should I use, and how?
Thank you
Teradata doesn't have lag/lead, but you can still get the same functionality:
select
  id,
  create_date,
  end_date,
  max(end_date) over (order by id rows between 1 preceding and 1 preceding) as prev_end_date
...
qualify
  create_date <> prev_end_date + INTERVAL '1' day;
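For reference, a hypothetical complete statement would look like the following (mytable is an assumed table name, since the original omits the FROM clause):

select
  id,
  create_date,
  end_date,
  -- frame of exactly one preceding row emulates lag(end_date)
  max(end_date) over (order by id rows between 1 preceding and 1 preceding) as prev_end_date
from mytable  -- assumed table name
qualify
  create_date <> prev_end_date + interval '1' day;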