Count All Rows That Meet Condition For A Given Month - sqlite

I need to make a SQLite query for something I just can't quite wrap my brain around.
I have a database table with a bunch of "issues". Each "issue" has a createddate and a resolutiondate.
id createddate resolutiondate
-------------------------------------------
1 2019-04-18 2019-08-18
2 2019-04-20 2019-04-21
3 2019-05-08 2019-06-05
etc....
What I need to do is count, for each of the past 12 months, how many "issues" had a createddate <= that month and a resolutiondate > that month. I want a table that looks like this:
Month No. Of Issues Not Resolved But Existed That Month
---------------------------------------------------------------
2019-04 20
2019-05 17
2019-06 15
etc...
I'm struggling because I essentially need to check every row multiple times: once for every month where its created date is <= that month and it hasn't been resolved yet. A single issue could increase the value of "No. Of Issues" for both April 2019 AND May 2019, for example, if it wasn't resolved for 2 months. I'm not sure how to check all rows multiple times.
I have to do it in SQLite.
My current attempt that doesn't seem to be working:
SELECT * FROM(
SELECT substr(createddate, 1, 7) AS created
FROM {{ project_key }}
GROUP BY substr(createddate, 1, 7)
) a JOIN (
SELECT substr(createddate, 1, 7) AS created,
COUNT(CASE WHEN julianday(substr(resolutiondate, 1, 10)) >= julianday(substr(created, 1, 10)) THEN 1 ELSE NULL END) as "No. Issues Not Resolved"
FROM {{ project_key }}
GROUP BY substr(createddate, 1, 7)
) b ON b.created = a.created

You can do it with a recursive CTE that returns the past 12 months and a left join to the table:
with months as (
select strftime('%Y-%m', 'now', '-1 year') month
union all
select strftime('%Y-%m', strftime('%Y-%m-%d', month || '-01', '+1 month') )
from months
where month < strftime('%Y-%m', 'now', '-1 month')
)
select m.month,
count(id) [No. Of Issues Not Resolved But Existed That Month]
from months m left join tablename t
on strftime('%Y-%m', t.createddate) <= m.month and strftime('%Y-%m', t.resolutiondate) > m.month
group by m.month
See the demo.
Results:
| month | No. Of Issues Not Resolved But Existed That Month |
| ------- | ------------------------------------------------- |
| 2019-02 | 0 |
| 2019-03 | 0 |
| 2019-04 | 1 |
| 2019-05 | 2 |
| 2019-06 | 1 |
| 2019-07 | 1 |
| 2019-08 | 0 |
| 2019-09 | 0 |
| 2019-10 | 0 |
| 2019-11 | 0 |
| 2019-12 | 0 |
| 2020-01 | 0 |
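To try the query above without a live database, here is a minimal Python sketch using the standard-library sqlite3 module. The table name issues and the :ref parameter are assumptions made for illustration: :ref stands in for 'now' so that the output is reproducible, and the value 2020-02-15 is picked only because it yields the same 12 months as the results shown.

```python
import sqlite3

# In-memory database holding the three sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issues (id INTEGER, createddate TEXT, resolutiondate TEXT)"
)
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?)",
    [
        (1, "2019-04-18", "2019-08-18"),
        (2, "2019-04-20", "2019-04-21"),
        (3, "2019-05-08", "2019-06-05"),
    ],
)

# Same structure as the answer's query; :ref replaces 'now' (an assumption
# made here so the result does not depend on the day the script is run).
QUERY = """
WITH RECURSIVE months(month) AS (
    SELECT strftime('%Y-%m', :ref, '-1 year')
    UNION ALL
    SELECT strftime('%Y-%m', month || '-01', '+1 month')
    FROM months
    WHERE month < strftime('%Y-%m', :ref, '-1 month')
)
SELECT m.month, COUNT(t.id) AS open_issues
FROM months m
LEFT JOIN issues t
  ON strftime('%Y-%m', t.createddate) <= m.month
 AND strftime('%Y-%m', t.resolutiondate) > m.month
GROUP BY m.month
ORDER BY m.month
"""

rows = conn.execute(QUERY, {"ref": "2020-02-15"}).fetchall()
for month, open_issues in rows:
    print(month, open_issues)
```

The left join is what lets one issue be counted in several months: each month row is matched against every issue that was created on or before it and resolved after it, and months with no match still appear with a count of 0.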


MariaDB / MySQL: Partial "total" number of rows for moving time-window

I have a MariaDB Database with Users and their appropriate registration date, something like:
+----+----------+----------------------+
| ID | Username | RegistrationDatetime |
+----+----------+----------------------+
| 1 | A | 2022-01-03 12:00:00 |
| 2 | B | 2022-01-03 14:00:00 |
| 3 | C | 2022-01-04 23:00:00 |
| 4 | D | 2022-01-04 14:00:00 |
| 5 | E | 2022-01-05 14:00:00 |
+----+----------+----------------------+
I want to know the total number of users in the system at the end of every date with just one query - is that possible?
So the result should be something like:
+------------+-------+
| Date | Count |
+------------+-------+
| 2022-01-03 | 2 |
| 2022-01-04 | 4 |
| 2022-01-05 | 5 |
+------------+-------+
Yes, it's easy with one query per date in a PHP loop, but how can I do it with just one query?
EDIT
Thanks for all the replies. Yes, users can get cancelled / deleted, so going by the max(ID) for a specific time period is NOT possible: there can be gaps in the ID column.
Use the COUNT() window function:
SELECT DISTINCT
DATE(RegistrationDatetime) AS Date,
COUNT(*) OVER (ORDER BY DATE(RegistrationDatetime)) AS Count
FROM tablename;
See the demo.
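As a quick check of the window-function approach, the following sketch runs the same query shape against the sample data. It uses SQLite through Python's sqlite3 module as a stand-in for MariaDB, which is an assumption: it needs an SQLite build with window-function support (3.25 or later), and the table name users and the aliases reg_date and total are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (ID INTEGER, Username TEXT, RegistrationDatetime TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "A", "2022-01-03 12:00:00"),
        (2, "B", "2022-01-03 14:00:00"),
        (3, "C", "2022-01-04 23:00:00"),
        (4, "D", "2022-01-04 14:00:00"),
        (5, "E", "2022-01-05 14:00:00"),
    ],
)

# With only ORDER BY in the window, the default frame runs from the first
# row up to and including all rows that share the current date, so every
# row of a given day sees the same running total; DISTINCT then collapses
# those duplicates to one row per day.
running = conn.execute(
    """
    SELECT DISTINCT
        DATE(RegistrationDatetime) AS reg_date,
        COUNT(*) OVER (ORDER BY DATE(RegistrationDatetime)) AS total
    FROM users
    ORDER BY reg_date
    """
).fetchall()

for reg_date, total in running:
    print(reg_date, total)
```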
SELECT
date(RegistrationDatetime ),
sum(count(*)) over (order by date(RegistrationDatetime ))
FROM
mytable
GROUP BY
date(RegistrationDatetime );
output:
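| date(RegistrationDatetime ) | sum(count(*)) over (order by date(RegistrationDatetime )) |
| --------------------------- | --------------------------------------------------------- |
| 2022-01-03 | 2 |
| 2022-01-04 | 4 |
| 2022-01-05 | 5 |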
see: DBFIDDLE
-- Compare calendar dates rather than full datetimes, so each day yields one row
SELECT d.Date,
(SELECT COUNT(*) FROM users t2 WHERE DATE(t2.RegistrationDatetime) <= d.Date) AS Count
FROM (SELECT DISTINCT DATE(RegistrationDatetime) AS Date FROM users) d
ORDER BY d.Date
If you have no cancelled users, you can do:
SELECT DATE(RegistrationDatetime) AS date_, MAX(Id) AS cnt
FROM tab
GROUP BY DATE(RegistrationDatetime)
Check the demo here.
Otherwise, you may need to use ROW_NUMBER() to generate that ranking:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(ORDER BY RegistrationDatetime) AS rn
FROM tab
)
SELECT DATE(RegistrationDatetime) AS date_, MAX(rn) AS cnt
FROM cte
GROUP BY DATE(RegistrationDatetime)
Check the demo here.
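To see why MAX(Id) breaks once users have been deleted, here is a small sketch that runs both queries on data with gaps in the ID column. SQLite (via Python's sqlite3, version 3.25 or later for ROW_NUMBER) is used as a stand-in for MariaDB, and the IDs 4, 7 and 9 are made up to simulate deletions; neither comes from the original posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (ID INTEGER, Username TEXT, RegistrationDatetime TEXT)"
)
# Hypothetical data: same dates as the question, but IDs 3, 5, 6 and 8
# are missing, as if those users had been deleted.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "A", "2022-01-03 12:00:00"),
        (2, "B", "2022-01-03 14:00:00"),
        (4, "C", "2022-01-04 23:00:00"),
        (7, "D", "2022-01-04 14:00:00"),
        (9, "E", "2022-01-05 14:00:00"),
    ],
)

# MAX(ID) per day: only correct when IDs are gap-free.
by_max_id = conn.execute(
    """
    SELECT DATE(RegistrationDatetime) AS date_, MAX(ID) AS cnt
    FROM users
    GROUP BY DATE(RegistrationDatetime)
    ORDER BY date_
    """
).fetchall()

# ROW_NUMBER() assigns a gap-free position in registration order,
# so its maximum per day is the running user count.
by_row_number = conn.execute(
    """
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (ORDER BY RegistrationDatetime) AS rn
        FROM users
    )
    SELECT DATE(RegistrationDatetime) AS date_, MAX(rn) AS cnt
    FROM cte
    GROUP BY DATE(RegistrationDatetime)
    ORDER BY date_
    """
).fetchall()

print(by_max_id)
print(by_row_number)
```

With these rows the MAX(ID) query reports 2, 7 and 9, while the ROW_NUMBER() query reports the actual running totals 2, 4 and 5.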

Swap results of a value from last years value in same month if the month-year combination is not equal to month-year of system

I have a table as follows:
+----+-------+---------+
| ID | VALUE | DATE |
+----+-------+---------+
| 1 | 10 | 2019-09 |
| 1 | 12 | 2018-09 |
| 2 | 13 | 2019-10 |
| 2 | 14 | 2018-10 |
| 3 | 67 | 2019-01 |
| 3 | 78 | 2018-01 |
+----+-------+---------+
For every ID whose DATE is not the year-month of the system date, I want to replace this year's VALUE with last year's VALUE for the same month.
If the DATE equals the year-month of the system date, just keep this year's value.
The resulting table I need is:
+----+-------+---------+
| ID | VALUE | DATE |
+----+-------+---------+
| 1 | 12 | 2019-09 |
| 2 | 13 | 2019-10 |
| 3 | 78 | 2019-01 |
+----+-------+---------+
As Jon and Maurits noticed, your example is unclear: you give no example of what you consider a wrong format, and you mention "current year" but do not describe the expected output for the next year, for instance.
Here is an attempt at code that answers your question:
library(dplyr)
x = read.table(text = "
ID VALUE DATE
1 10 2019-09
1 12 2018-09
1 12 2018-09-04
1 12 2018-99
2 13 2019-10
2 14 2018-10
3 67 2019-01
3 78 2018-01
", header=T)
x %>%
mutate(DATE = paste0(DATE, "-01") %>% as.Date("%Y-%m-%d")) %>%
group_by(ID) %>%
filter(DATE==max(DATE, na.rm=T))
I inserted two lines with a "wrong" format (in my view) and treated "current year" as the maximum year found in the column for each ID.
These assumptions may be wrong, but I'd need more information to give a better answer.
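For comparison, here is a minimal Python sketch of one possible reading of the rule stated in the question: keep each ID's latest DATE, and take last year's VALUE for that month unless the DATE equals the system year-month. The function name swap_values, the current_ym parameter, and the choice of 2019-10 as the "system" month are all assumptions made for illustration; 2019-10 is used only because it is consistent with the expected table.

```python
def swap_values(rows, current_ym):
    """rows: (id, value, 'YYYY-MM') tuples; current_ym: system year-month."""
    # Collect each ID's values keyed by year-month.
    by_id = {}
    for id_, value, ym in rows:
        by_id.setdefault(id_, {})[ym] = value

    result = []
    for id_, months in by_id.items():
        latest = max(months)  # 'YYYY-MM' strings sort chronologically
        previous = f"{int(latest[:4]) - 1}{latest[4:]}"  # same month, last year
        if latest == current_ym or previous not in months:
            value = months[latest]    # keep this year's value
        else:
            value = months[previous]  # swap in last year's value
        result.append((id_, value, latest))
    return result


table = [
    (1, 10, "2019-09"), (1, 12, "2018-09"),
    (2, 13, "2019-10"), (2, 14, "2018-10"),
    (3, 67, "2019-01"), (3, 78, "2018-01"),
]
swapped = swap_values(table, current_ym="2019-10")
for row in swapped:
    print(row)
```

Falling back to this year's value when no row exists for the previous year is a design choice of the sketch, not something the question specifies.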

Conditionally rolling up dates in R

Hi I am trying to work out a way to conditionally roll up dates in R.
Suppose I have the table below and I want to roll dates up using one of the Flag variables. A Flag can be either 1 or 2 and dictates which subsequent dates can be linked up.
library(dplyr)  # provides %>% and arrange() used below
DateStart <- c("2018-01-01", "2018-01-04", "2018-01-05", "2018-01-09", "2018-01-12", "2018-01-20")
DateEnd <- c("2018-01-05", "2018-01-09", "2018-01-12", "2018-01-15", "2018-01-20", "2018-01-21")
IndexRecord <- c(1, NA, NA, NA, NA, NA)
Flag1 <- c(1,1,1,1,1,1)
Flag2 <- c(2,1,1,1,1,1)
Flag3 <- c(1,1,2,1,2,1)
df1 <- data.frame(DateStart = as.Date(DateStart),
DateEnd = as.Date(DateEnd),
IndexRecord = IndexRecord,
Flag1 = Flag1,
Flag2 = Flag2,
Flag3 = Flag3) %>%
arrange(DateStart)
df1
| | DateStart | DateEnd | IndexRecord | Flag1 | Flag2 | Flag3 |
|---|------------|------------|-------------|-------|-------|-------|
| 1 | 2018-01-01 | 2018-01-05 | 1 | 1 | 2 | 1 |
| 2 | 2018-01-04 | 2018-01-09 | NA | 1 | 1 | 1 |
| 3 | 2018-01-05 | 2018-01-12 | NA | 1 | 1 | 2 |
| 4 | 2018-01-09 | 2018-01-15 | NA | 1 | 1 | 1 |
| 5 | 2018-01-12 | 2018-01-20 | NA | 1 | 1 | 2 |
| 6 | 2018-01-20 | 2018-01-21 | NA | 1 | 1 | 1 |
A Flag with value of 1 for the current period means that for a subsequent row to have a valid link, the subsequent row must have DateStart occurring before DateEnd of the current row. Using Flag1 as the column of interest, the result would look like:
| | DateStart | DateEnd | IndexRecord | Flag1 |
|---|------------|------------|-------------|-------|
| 1 | 2018-01-01 | 2018-01-05 | 1 | 1 |
| 2 | 2018-01-04 | 2018-01-09 | NA | 1 |
| 3 | 2018-01-05 | 2018-01-12 | NA | 1 |
| 4 | 2018-01-09 | 2018-01-15 | NA | 1 |
| 5 | 2018-01-12 | 2018-01-20 | NA | 1 |
A Flag with value of 2 for the current period means that for a subsequent row to have a valid link, the subsequent row must have DateStart occurring on the DateEnd of the current row. Using Flag2 as the column of interest, the result would look like:
| | DateStart | DateEnd | IndexRecord | Flag2 |
|---|------------|------------|-------------|-------|
| 1 | 2018-01-01 | 2018-01-05 | 1 | 2 |
| 3 | 2018-01-05 | 2018-01-12 | NA | 1 |
| 4 | 2018-01-09 | 2018-01-15 | NA | 1 |
| 5 | 2018-01-12 | 2018-01-20 | NA | 1 |
One of the more complex cases could occur with patterns such as seen in Flag3 with the desired results:
| | DateStart | DateEnd | IndexRecord | Flag3 |
|---|------------|------------|-------------|-------|
| 1 | 2018-01-01 | 2018-01-05 | 1 | 1 |
| 2 | 2018-01-04 | 2018-01-09 | NA | 1 |
| 3 | 2018-01-05 | 2018-01-12 | NA | 2 |
| 5 | 2018-01-12 | 2018-01-20 | NA | 2 |
| 6 | 2018-01-20 | 2018-01-21 | NA | 1 |
Cheers,
J
Edits:
Since this was perhaps not clear let me clarify step by step.
Flag1.
We see that in Row 1 it ends on the 2018-01-05 and Flag1 is 1.
This means that for a subsequent row to be linked to this episode, its DateStart must occur before the DateEnd of Row 1. Row 2 satisfies this condition, since 2018-01-04 occurs before 2018-01-05, and is therefore a valid link.
If we look at the remaining rows, all of these dates overlap except for Row 6. Since Flag1 of Row 5 is 1 and Row 6 starts on, not before, the DateEnd of Row 5, we cannot count Row 6, which is why the table stops at Row 5.
Total elapsed time is from 2018-01-01 to 2018-01-20.
Flag2.
Row 1 has Flag2 equal to 2, which means that only a subsequent DateStart of 2018-01-05 can be linked to this row. Therefore Row 2 is dropped. If we keep moving down, we see that Row 3 has a DateStart of 2018-01-05 and can therefore be linked to Row 1.
The remaining rows follow the same pattern as for Flag1, since Flag1 and Flag2 are identical from this point onwards.
As with Flag1, total elapsed time is from 2018-01-01 to 2018-01-20.
The elapsed time is the same as for Flag1, but the path taken differs.
Flag3.
Flag3 for Row 1 and Row 2 is the same as Flag1, which means that up to this point Row 1, Row 2, and Row 3 are kept, as in the Flag1 example.
Flag3 for Row 3 however is 2. Since the DateEnd of Row 3 is 2018-01-12, only Row 5 can be linked and Row 4 is removed.
Since Row 5 has Flag3 of 2 and a DateEnd of 2018-01-20, Row 6 can also be linked to this set.
The total elapsed time for this set is from 2018-01-01 to 2018-01-21.
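The walk-through above can be read as a single pass over the rows in DateStart order, where each candidate row is compared against the most recently kept row. The following Python sketch implements that reading; it is only one interpretation of the rules, and the function name roll_up and the dictionary keys are made up for illustration.

```python
from datetime import date


def roll_up(rows, flag_key):
    """Return the row numbers kept when linking rows by the given flag.

    Assumes rows are sorted by start date and the first row is the index
    record. A flag of 1 on the last kept row requires the next row to start
    strictly before that row's end; a flag of 2 requires it to start exactly
    on that row's end. Rows that do not link are skipped.
    """
    kept = [rows[0]]
    for row in rows[1:]:
        current = kept[-1]
        if current[flag_key] == 1:
            linked = row["start"] < current["end"]
        else:
            linked = row["start"] == current["end"]
        if linked:
            kept.append(row)
    return [row["row"] for row in kept]


starts = ["2018-01-01", "2018-01-04", "2018-01-05",
          "2018-01-09", "2018-01-12", "2018-01-20"]
ends = ["2018-01-05", "2018-01-09", "2018-01-12",
        "2018-01-15", "2018-01-20", "2018-01-21"]
flag1 = [1, 1, 1, 1, 1, 1]
flag2 = [2, 1, 1, 1, 1, 1]
flag3 = [1, 1, 2, 1, 2, 1]

data = [
    {"row": i + 1,
     "start": date.fromisoformat(s),
     "end": date.fromisoformat(e),
     "Flag1": f1, "Flag2": f2, "Flag3": f3}
    for i, (s, e, f1, f2, f3) in enumerate(zip(starts, ends, flag1, flag2, flag3))
]

print(roll_up(data, "Flag1"))
print(roll_up(data, "Flag2"))
print(roll_up(data, "Flag3"))
```

On the sample data this keeps rows 1 to 5 for Flag1, rows 1, 3, 4, 5 for Flag2, and rows 1, 2, 3, 5, 6 for Flag3, matching the three desired tables.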

Index by category with sorting by column in R sqldf package

How can I add an index by category in R, sorted by a column, using the sqldf package? I am looking for the equivalent of this SQL:
ROW_NUMBER() over(partition by [Category] order by [Date] desc)
Suppose we have a table:
+----------+-------+------------+
| Category | Value | Date |
+----------+-------+------------+
| apples | 3 | 2018-07-01 |
| apples | 2 | 2018-07-02 |
| apples | 1 | 2018-07-03 |
| bananas | 9 | 2018-07-01 |
| bananas | 8 | 2018-07-02 |
| bananas | 7 | 2018-07-03 |
+----------+-------+------------+
Desired results are:
+----------+-------+------------+-------------------+
| Category | Value | Date | Index by category |
+----------+-------+------------+-------------------+
| apples | 3 | 2018-07-01 | 3 |
| apples | 2 | 2018-07-02 | 2 |
| apples | 1 | 2018-07-03 | 1 |
| bananas | 9 | 2018-07-01 | 3 |
| bananas | 8 | 2018-07-02 | 2 |
| bananas | 7 | 2018-07-03 | 1 |
+----------+-------+------------+-------------------+
Thank you for the hints in the comments on how this can be done in many packages other than sqldf: Numbering rows within groups in a data frame
1) PostgreSQL This can be done with the PostgreSQL backend to sqldf:
library(RPostgreSQL)
library(sqldf)
sqldf('select *,
ROW_NUMBER() over (partition by "Category" order by "Date" desc) as seq
from "DF"
order by "Category", "Date" ')
giving:
Category Value Date seq
1 apples 3 2018-07-01 3
2 apples 2 2018-07-02 2
3 apples 1 2018-07-03 1
4 bananas 9 2018-07-01 3
5 bananas 8 2018-07-02 2
6 bananas 7 2018-07-03 1
2) SQLite To do it with the SQLite backend (which is the default backend) we need to revise the SQL statement appropriately. Be sure that RPostgreSQL is NOT loaded before doing this. We have assumed that the data is already sorted by Date within each Category based on the data shown in the question but if that were not the case it would be easy enough to extend the SQL to sort it first.
library(sqldf)
sqldf("select a.*, count(*) seq
from DF a left join DF b on a.Category = b.Category and b.rowid >= a.rowid
group by a.rowid
order by a.Category, a.Date")
Note
The input DF in reproducible form is:
Lines <- "
Category Value Date
apples 3 2018-07-01
apples 2 2018-07-02
apples 1 2018-07-03
bananas 9 2018-07-01
bananas 8 2018-07-02
bananas 7 2018-07-03
"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
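The self-join used for the SQLite backend can also be tried directly against SQLite, outside of sqldf. The sketch below does this with Python's sqlite3 module; creating the DF table by hand is an assumption standing in for what sqldf does when it uploads the data frame.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DF (Category TEXT, Value INTEGER, Date TEXT)")
# Rows are inserted already sorted by Date within each Category, so rowid
# order matches date order, as the answer assumes.
conn.executemany(
    "INSERT INTO DF VALUES (?, ?, ?)",
    [
        ("apples", 3, "2018-07-01"),
        ("apples", 2, "2018-07-02"),
        ("apples", 1, "2018-07-03"),
        ("bananas", 9, "2018-07-01"),
        ("bananas", 8, "2018-07-02"),
        ("bananas", 7, "2018-07-03"),
    ],
)

# For each row a, count the rows b in the same category whose rowid is
# at or after a's rowid; this emulates a descending ROW_NUMBER().
indexed = conn.execute(
    """
    select a.*, count(*) seq
    from DF a left join DF b on a.Category = b.Category and b.rowid >= a.rowid
    group by a.rowid
    order by a.Category, a.Date
    """
).fetchall()

for row in indexed:
    print(row)
```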

How to use Row_Number based on Number of Days?

How can I group records into 7-day windows and rank them within each window?
Call 1 - 06-Jun-14 16.39.14 Rank 1
Call 7 - 10-Jun-14 14.28.40 Rank 7
After 7 days, whenever the next call date occurs,
I need to watch the next 7 days and rank accordingly.
Call 1 - 27-Jun-14 11.44.35 Rank 1
Call 4 - 03-Jul-14 14.23.39 Rank 4
CALL_DATE ROW_NUMBER
06-Jun-14 16.39.14 1
06-Jun-14 17.29.27 2
07-Jun-14 09.13.18 3
07-Jun-14 14.45.52 4
08-Jun-14 13.05.44 5
08-Jun-14 13.14.49 6
10-Jun-14 14.28.40 7
27-Jun-14 11.44.35 1
27-Jun-14 11.46.27 2
27-Jun-14 12.00.21 3
03-Jul-14 14.23.39 4
You can calculate the day number within the range by using the first_value() analytic function and taking the difference; then divide that by seven to get the week number (within the data); and then use that to calculate the row_number() of each date within its calculated week number.
select call_date,
row_number() over (partition by week_num order by call_date) as row_num
from (
select call_date,
ceil((trunc(call_date)
- trunc(first_value(call_date) over (order by call_date))
+ 1) / 7) as week_num
from t42
)
order by call_date;
Which gives:
| CALL_DATE | ROW_NUM |
|-----------------------------|---------|
| June, 06 2014 16:39:14+0000 | 1 |
| June, 06 2014 17:29:27+0000 | 2 |
| June, 07 2014 09:13:18+0000 | 3 |
| June, 07 2014 14:45:52+0000 | 4 |
| June, 08 2014 13:05:44+0000 | 5 |
| June, 08 2014 13:14:49+0000 | 6 |
| June, 10 2014 14:28:40+0000 | 7 |
| June, 27 2014 11:44:35+0000 | 1 |
| June, 27 2014 11:46:27+0000 | 2 |
| June, 27 2014 12:00:21+0000 | 3 |
| July, 03 2014 14:23:39+0000 | 4 |
SQL Fiddle showing some of the intermediate steps and the final result.
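The same bucketing arithmetic can be sketched in plain Python to make the intermediate week numbers visible. This is an illustration rather than a translation of the Oracle query: the function name rank_in_week_buckets is made up, and the call dates are rewritten in ISO format to avoid locale-dependent month parsing.

```python
from datetime import datetime
from math import ceil


def rank_in_week_buckets(call_dates):
    """Return (call_date, week_num, row_num) for ISO datetime strings."""
    parsed = sorted(datetime.fromisoformat(c) for c in call_dates)
    first_day = parsed[0].date()  # plays the role of first_value(call_date)
    seen_per_week = {}
    ranked = []
    for dt in parsed:
        # Day number within the data (1-based), divided by 7 and rounded up.
        week_num = ceil(((dt.date() - first_day).days + 1) / 7)
        seen_per_week[week_num] = seen_per_week.get(week_num, 0) + 1
        ranked.append((dt, week_num, seen_per_week[week_num]))
    return ranked


calls = [
    "2014-06-06 16:39:14", "2014-06-06 17:29:27", "2014-06-07 09:13:18",
    "2014-06-07 14:45:52", "2014-06-08 13:05:44", "2014-06-08 13:14:49",
    "2014-06-10 14:28:40", "2014-06-27 11:44:35", "2014-06-27 11:46:27",
    "2014-06-27 12:00:21", "2014-07-03 14:23:39",
]
ranked = rank_in_week_buckets(calls)
for call_date, week_num, row_num in ranked:
    print(call_date, week_num, row_num)
```

Note that the buckets are fixed 7-day blocks counted from the first call in the data. For this sample they line up with the "next call after 7 days" windows described in the question, but that need not hold for other data.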
