MariaDB / MySQL: Partial "total" number of rows for moving time-window - count

I have a MariaDB database with users and their respective registration dates, something like:
+----+----------+----------------------+
| ID | Username | RegistrationDatetime |
+----+----------+----------------------+
|  1 | A        | 2022-01-03 12:00:00  |
|  2 | B        | 2022-01-03 14:00:00  |
|  3 | C        | 2022-01-04 23:00:00  |
|  4 | D        | 2022-01-04 14:00:00  |
|  5 | E        | 2022-01-05 14:00:00  |
+----+----------+----------------------+
I want to know the total number of users in the system at the end of each day, with just one query - is that possible?
So the result should be something like:
+------------+-------+
| Date       | Count |
+------------+-------+
| 2022-01-03 |     2 |
| 2022-01-04 |     4 |
| 2022-01-05 |     5 |
+------------+-------+
Yes, it's easy with separate queries and looping over the dates in PHP, but how can it be done with just one query?
EDIT
Thanks for all the replies. Yes, users can get cancelled/deleted, i.e. going by MAX(ID) for a specific time period is NOT possible; there can be gaps in the ID column.

Use the COUNT() window function:
-- With ORDER BY in the window, the default frame is RANGE UNBOUNDED PRECEDING TO CURRENT ROW,
-- so all rows sharing a date get the same running total; DISTINCT keeps one row per date.
SELECT DISTINCT
       DATE(RegistrationDatetime) AS Date,
       COUNT(*) OVER (ORDER BY DATE(RegistrationDatetime)) AS Count
FROM tablename;
See the demo.
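If you want to try this out, here is a minimal sketch of the assumed table with the sample data from the question (tablename and the VARCHAR length are placeholders; adjust them to your schema):
CREATE TABLE tablename (
    ID INT PRIMARY KEY,
    Username VARCHAR(50),
    RegistrationDatetime DATETIME
);

INSERT INTO tablename (ID, Username, RegistrationDatetime) VALUES
    (1, 'A', '2022-01-03 12:00:00'),
    (2, 'B', '2022-01-03 14:00:00'),
    (3, 'C', '2022-01-04 23:00:00'),
    (4, 'D', '2022-01-04 14:00:00'),
    (5, 'E', '2022-01-05 14:00:00');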

SELECT
    date(RegistrationDatetime),
    sum(count(*)) over (order by date(RegistrationDatetime))
FROM
    mytable
GROUP BY
    date(RegistrationDatetime);
output:
| date(RegistrationDatetime) | sum(count(*)) over (order by date(RegistrationDatetime)) |
| -------------------------- | --------------------------------------------------------- |
| 2022-01-03                 | 2                                                         |
| 2022-01-04                 | 4                                                         |
| 2022-01-05                 | 5                                                         |
see: DBFIDDLE

SELECT DATE(t1.RegistrationDatetime) AS Date,
       (SELECT COUNT(*)
        FROM users t2
        WHERE DATE(t2.RegistrationDatetime) <= DATE(t1.RegistrationDatetime)) AS Count
FROM users t1
GROUP BY DATE(t1.RegistrationDatetime)

If you have no cancelled users, you can do:
SELECT DATE(RegistrationDatetime) AS date_, MAX(Id) AS cnt
FROM tab
GROUP BY DATE(RegistrationDatetime)
Check the demo here.
Otherwise you may need to use a ROW_NUMBER to generate that ranking:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY RegistrationDatetime) AS rn
    FROM tab
)
SELECT DATE(RegistrationDatetime) AS date_, MAX(rn) AS cnt
FROM cte
GROUP BY DATE(RegistrationDatetime)
Check the demo here.

Related

Retrieve items by latest date, item code and comparison to additional table

I have two tables:
table "A" with various items identified by item codes (integer). each item appears several times, with different upload dates. the tables also show the store under which the item is sold (store ID- integer)
table "B" with a list of the desired item codes (50 items) to draw.
I am interested in extracting (showing) items from the first table, according to the item codes on the second table. the items chosen should also have the highest upload date and belong to a specific store id (as I choose).
for example: the item "rice", has an item code of - 77893. this item code is on table "B", meaning I want to show it. in table "A", there are multiple entries for "rice":
table exapmle
table "A":
item_name | item_code | upload_date | store_id
rice | 77893 | 2021-11-18 | 001
rice | 77893 | 2020-05-30 | 011
rice | 77893 | 2020-11-02 | 002
apple | 90837 | 2020-05-14 | 002
apple | 90837 | 2020-05-14 | 020
rice | 77893 | 2020-05-15 | 002
apple | 90837 | 2020-01-08 | 002
rice | 77893 | 2020-05-15 | 005
table "B":
item_code
90837
77893
output:
item_name | item_code | upload_date | store_id
rice | 77893 | 2020-11-02 | 002
apple | 90837 | 2020-05-14 | 002
"rice" and "apple" have item codes that are also on table "B". in this example, I am interested in items that are sold at store 002.
so far I only managed to return the item by its latest upload date. however, I inserted the item code manually and also was not able to filter store_id's.
any help or guidelines on how to execute this idea will be very helpful.
thank you!
Filter the rows of table A for the store that you want and the items that are in table B, then aggregate to get the rows with the max date:
SELECT item_name, item_code, MAX(upload_date) AS upload_date, store_id
FROM A
WHERE store_id = '002' AND item_code IN (SELECT item_code FROM B)
GROUP BY item_code;
or, with a join:
SELECT A.item_name, A.item_code, MAX(A.upload_date) AS upload_date, A.store_id
FROM A INNER JOIN B
ON B.item_code = A.item_code
WHERE A.store_id = '002'
GROUP BY A.item_code;
See the demo.
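If you also need the full latest row per item, or you run with ONLY_FULL_GROUP_BY enabled, a window-function sketch (assuming MySQL 8.0+ / MariaDB 10.2+ and the same table and column names) would be:
-- rank each item's rows for store 002 by upload_date, newest first, then keep rank 1
SELECT item_name, item_code, upload_date, store_id
FROM (
    SELECT A.*,
           ROW_NUMBER() OVER (PARTITION BY A.item_code
                              ORDER BY A.upload_date DESC) AS rn
    FROM A
    INNER JOIN B ON B.item_code = A.item_code
    WHERE A.store_id = '002'
) AS ranked
WHERE rn = 1;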

Count All Rows That Meet Condition For A Given Month

I need to make a SQLite query for something I just can't quite wrap my brain around.
I have a database table with a bunch of "issues". And each "issue" has a createddate, and resolutiondate.
id createddate resolutiondate
-------------------------------------------
1 2019-04-18 2019-08-18
2 2019-04-20 2019-04-21
3 2019-05-08 2019-06-05
etc....
What I need to do is count, for every month in the past 12 months, how many "issues" had a created date <= that month and a resolution date > that month. I want a table that looks like this:
Month No. Of Issues Not Resolved But Existed That Month
---------------------------------------------------------------
2019-04 20
2019-05 17
2019-06 15
etc...
I'm struggling because I essentially need to check every row multiple times: for every month, whether its created date is <= that month and it hasn't been resolved yet. A particular issue could increase the "No. Of Issues" value for both April 2019 AND May 2019, for example, if it wasn't resolved for 2 months. I'm not sure how to check all rows multiple times.
I have to do it in SQLite.
My current attempt that doesn't seem to be working:
SELECT * FROM(
SELECT substr(createddate, 1, 7) AS created
FROM {{ project_key }}
GROUP BY substr(createddate, 1, 7)
) a JOIN (
SELECT substr(createddate, 1, 7) AS created,
COUNT(CASE WHEN julianday(substr(resolutiondate, 1, 10)) >= julianday(substr(created, 1, 10)) THEN 1 ELSE NULL END) as "No. Issues Not Resolved"
FROM {{ project_key }}
GROUP BY substr(createddate, 1, 7)
) b ON b.created = a.created
With a recursive CTE that returns the past 12 months and a left join to the table:
-- months: recursive CTE generating the past 12 months as 'YYYY-MM' strings
with months as (
  select strftime('%Y-%m', 'now', '-1 year') month
  union all
  select strftime('%Y-%m', strftime('%Y-%m-%d', month || '-01', '+1 month'))
  from months
  where month < strftime('%Y-%m', 'now', '-1 month')
)
-- the left join keeps months with no matching issues; count(id) ignores their NULLs
select m.month,
       count(id) [No. Of Issues Not Resolved But Existed That Month]
from months m
left join tablename t
  on strftime('%Y-%m', t.createddate) <= m.month
 and strftime('%Y-%m', t.resolutiondate) > m.month
group by m.month
See the demo.
Results:
| month | No. Of Issues Not Resolved But Existed That Month |
| ------- | ------------------------------------------------- |
| 2019-02 | 0 |
| 2019-03 | 0 |
| 2019-04 | 1 |
| 2019-05 | 2 |
| 2019-06 | 1 |
| 2019-07 | 1 |
| 2019-08 | 0 |
| 2019-09 | 0 |
| 2019-10 | 0 |
| 2019-11 | 0 |
| 2019-12 | 0 |
| 2020-01 | 0 |
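If you want a quick sanity check of what the months CTE generates on its own (the exact rows depend on the current date), you can run just that part:
with months as (
  select strftime('%Y-%m', 'now', '-1 year') month
  union all
  select strftime('%Y-%m', strftime('%Y-%m-%d', month || '-01', '+1 month'))
  from months
  where month < strftime('%Y-%m', 'now', '-1 month')
)
-- 12 rows: from 12 months ago through last month
select * from months;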

Index by category with sorting by column in R sqldf package

How can I add an index by category in R, with sorting by a column, using the sqldf package? I am looking for the equivalent of this SQL:
ROW_NUMBER() OVER (PARTITION BY [Category] ORDER BY [Date] DESC)
Suppose we have a table:
+----------+-------+------------+
| Category | Value | Date       |
+----------+-------+------------+
| apples   |     3 | 2018-07-01 |
| apples   |     2 | 2018-07-02 |
| apples   |     1 | 2018-07-03 |
| bananas  |     9 | 2018-07-01 |
| bananas  |     8 | 2018-07-02 |
| bananas  |     7 | 2018-07-03 |
+----------+-------+------------+
Desired results are:
+----------+-------+------------+-------------------+
| Category | Value | Date       | Index by category |
+----------+-------+------------+-------------------+
| apples   |     3 | 2018-07-01 |                 3 |
| apples   |     2 | 2018-07-02 |                 2 |
| apples   |     1 | 2018-07-03 |                 1 |
| bananas  |     9 | 2018-07-01 |                 3 |
| bananas  |     8 | 2018-07-02 |                 2 |
| bananas  |     7 | 2018-07-03 |                 1 |
+----------+-------+------------+-------------------+
Thank you for the hints in the comments about how it can be done with lots of packages other than sqldf: Numbering rows within groups in a data frame
1) PostgreSQL This can be done with the PostgreSQL backend to sqldf:
library(RPostgreSQL)
library(sqldf)
sqldf('select *,
ROW_NUMBER() over (partition by "Category" order by "Date" desc) as seq
from "DF"
order by "Category", "Date" ')
giving:
Category Value Date seq
1 apples 3 2018-07-01 3
2 apples 2 2018-07-02 2
3 apples 1 2018-07-03 1
4 bananas 9 2018-07-01 3
5 bananas 8 2018-07-02 2
6 bananas 7 2018-07-03 1
2) SQLite To do it with the SQLite backend (which is the default backend) we need to revise the SQL statement appropriately. Be sure that RPostgreSQL is NOT loaded before doing this. We have assumed that the data is already sorted by Date within each Category based on the data shown in the question but if that were not the case it would be easy enough to extend the SQL to sort it first.
library(sqldf)
sqldf("select a.*, count(*) seq
from DF a left join DF b on a.Category = b.Category and b.rowid >= a.rowid
group by a.rowid
order by a.Category, a.Date")
Note
The input DF in reproducible form is:
Lines <- "
Category Value Date
apples 3 2018-07-01
apples 2 2018-07-02
apples 1 2018-07-03
bananas 9 2018-07-01
bananas 8 2018-07-02
bananas 7 2018-07-03
"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)

R - Performing a CountIF for a multiple rows data frame

I've googled lots of examples of how to perform a CountIF in R; however, I still haven't found a solution for what I want.
I basically have 2 dataframes:
df1: customer_id | date_of_export - here, we have only 1 date of export per customer
df2: customer_id | date_of_delivery - here, a customer can have different delivery dates (which means, same customer will appear more than once in the list)
And I need to count, for each customer_id in df1, how many deliveries they got after the export date. So, I need to count if df1$customer_id = df2$customer_id AND df1$date_of_export <= df2$date_of_delivery
To understand better:
customer_id | date_of_export
1 | 2018-01-12
2 | 2018-01-12
3 | 2018-01-12
customer_id | date_of_delivery
1 | 2018-01-10
1 | 2018-01-17
2 | 2018-01-13
2 | 2018-01-20
3 | 2018-01-04
My output should be:
customer_id | date_of_export | deliveries_after_export
1 | 2018-01-12 | 1 (one delivery after the export date)
2 | 2018-01-12 | 2 (two deliveries after the export date)
3 | 2018-01-12 | 0 (no delivery after the export date)
It doesn't seem that complicated, but I haven't found a good approach. I've been struggling for 2 days with nothing accomplished.
I hope I made myself clear here. Thank you!
I would suggest merging the two data.frames together and then it's a simple sum():
library(data.table)
df3 <- merge(df1, df2)
setDT(df3)[, .(deliveries_after_export = sum(date_of_delivery > date_of_export)), by = .(customer_id, date_of_export)]
# customer_id date_of_export deliveries_after_export
#1: 1 2018-01-12 1
#2: 2 2018-01-12 2
#3: 3 2018-01-12 0
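For reference, the same logic expressed as a single SQL query (a sketch only; it could be run through the sqldf package shown in the previous question, assuming df1 and df2 hold the data above and the dates compare correctly as dates or ISO strings):
-- one row per customer from df1; count deliveries in df2 strictly after the export date
select df1.customer_id,
       df1.date_of_export,
       count(df2.customer_id) as deliveries_after_export
from df1
left join df2
  on df2.customer_id = df1.customer_id
 and df2.date_of_delivery > df1.date_of_export
group by df1.customer_id, df1.date_of_export;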

Copy over the previous row's data until there is a mismatch found between current and previous data

Our DB is Oracle 11g. I have a table whose structure is as shown below:
Person_Id | Eff_Dt | Wid | Prv_Wid
P1 | 1/1/2001 | 2001 |
P1 | 1/10/2001 | 2001 |
P1 | 10/10/2001 | 2002 |
P1 | 1/1/2002 | 2003 |
P1 | 5/4/2002 | 2003 |
P1 | 8/6/2002 | 2002 |
P1 | 1/1/2005 | 2001 |
P1 | 1/10/2006 | 2001 |
I have a requirement that the Prv_Wid should be derived from the previous row's WID. But if the current and previous WIDs are the same, then I have to go back to the most recent previous row where the WID is different and put that WID in the current row's Prv_Wid.
It should be as shown below:
Person_Id | Eff_Dt | Wid| Prv_Wid
P1 | 1/1/2001 | 2001 | 0
P1 | 1/10/2001 | 2001 | 0
P1 | 10/10/2001 | 2002 | 2001
P1 | 1/1/2002 | 2003 | 2002
P1 | 5/4/2002 | 2003 | 2002
P1 | 8/6/2002 | 2002 | 2003
P1 | 1/1/2005 | 2001 | 2002
P1 | 1/10/2006 | 2001 | 2002
I have tried several approaches, like LEAD, LAG, FIRST_VALUE, LAST_VALUE, and a procedure, but I have not been successful. Could you please suggest a solution?
Thank you so much.
Something like this... you can change it to an insert or merge statement as needed. The trick is to prepare your data first, by adding several nulls as demonstrated in the prep CTE (you may want to select from prep to see what it does - not needed for the final solution, but needed so that you understand how this works).
EFF_DT in the output uses my date format model settings.
with
  tbl ( Person_Id, Eff_Dt, Wid, Prv_Wid ) as (
    select 'P1', to_date( '1/1/2001',  'dd/mm/yyyy'), 2001, null from dual union all
    select 'P1', to_date( '1/10/2001', 'dd/mm/yyyy'), 2001, null from dual union all
    select 'P1', to_date('10/10/2001', 'dd/mm/yyyy'), 2002, null from dual union all
    select 'P1', to_date( '1/1/2002',  'dd/mm/yyyy'), 2003, null from dual union all
    select 'P1', to_date( '5/4/2002',  'dd/mm/yyyy'), 2003, null from dual union all
    select 'P1', to_date( '8/6/2002',  'dd/mm/yyyy'), 2002, null from dual union all
    select 'P1', to_date( '1/1/2005',  'dd/mm/yyyy'), 2001, null from dual union all
    select 'P1', to_date( '1/10/2006', 'dd/mm/yyyy'), 2001, null from dual
  ),
  prep ( person_id, eff_dt, wid, prv_wid ) as (
    select person_id, eff_dt, wid,
           case when wid != lag(wid) over (partition by person_id order by eff_dt)
                then lag(wid) over (partition by person_id order by eff_dt)
           end
    from tbl
  )
select person_id, eff_dt, wid,
       nvl(last_value(prv_wid ignore nulls)
             over (partition by person_id order by eff_dt), 0) as prv_wid
from prep
;
PE EFF_DT WID PRV_WID
-- ------------------- ---------- ----------
P1 2001-01-01 00:00:00 2001 0
P1 2001-10-01 00:00:00 2001 0
P1 2001-10-10 00:00:00 2002 2001
P1 2002-01-01 00:00:00 2003 2002
P1 2002-04-05 00:00:00 2003 2002
P1 2002-06-08 00:00:00 2002 2003
P1 2005-01-01 00:00:00 2001 2002
P1 2006-10-01 00:00:00 2001 2002
8 rows selected
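To see the intermediate result the answer refers to ("select from prep to see what it does"), you can keep the same tbl and prep CTEs and only change the final SELECT. A minimal sketch using just the first four sample rows; the expected pattern is that prv_wid is non-null only on rows where the WID changes:
with
  tbl ( person_id, eff_dt, wid, prv_wid ) as (
    select 'P1', to_date( '1/1/2001',  'dd/mm/yyyy'), 2001, null from dual union all
    select 'P1', to_date( '1/10/2001', 'dd/mm/yyyy'), 2001, null from dual union all
    select 'P1', to_date('10/10/2001', 'dd/mm/yyyy'), 2002, null from dual union all
    select 'P1', to_date( '1/1/2002',  'dd/mm/yyyy'), 2003, null from dual
  ),
  prep ( person_id, eff_dt, wid, prv_wid ) as (
    select person_id, eff_dt, wid,
           case when wid != lag(wid) over (partition by person_id order by eff_dt)
                then lag(wid) over (partition by person_id order by eff_dt)
           end
    from tbl
  )
-- intermediate result: prv_wid comes out as NULL, NULL, 2001, 2002 for these four rows
select person_id, eff_dt, wid, prv_wid
from prep
order by person_id, eff_dt;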
