SQLite Group by last seven days (not weekly!) - sqlite

I looked through some GROUP BY questions here, but none of them matches my case.
I have a current table like this:
|day | client|
----------------
|2020-01-07|id11|
|2020-01-07|id10|
|2020-01-06|id09|
|2020-01-06|id08|
|2020-01-05|id07|
|2020-01-04|id06|
|2020-01-03|id05|
|2020-01-03|id04|
|2020-01-02|id03|
|2020-01-01|id02|
|2020-01-01|id01|
I want to create a new column with the count of unique clients over the last seven days (from day - 6 through day, not weekly!) and show it by day:
|day |last 7 day clients|
----------------
|2020-01-07|11|
|2020-01-06| 9|
|2020-01-05| 7|
|2020-01-04| 6|
|2020-01-03| 5|
|2020-01-02| 3|
|2020-01-01| 2|
All the answers here group weekly!
What I tried:
pd.read_sql_query("""SELECT DATE(day) dateColumn,
COUNT(DISTINCT client) AS seven_day_users
FROM table
GROUP BY date(dateColumn, '-6 days')
ORDER BY dateColumn DESC;
""", conn)
But the result is grouping by day, not into the interval.

Use a self join of the table and aggregation:
SELECT t1.day,
COUNT(DISTINCT t2.client) `last 7 day clients`
FROM tablename t1 INNER JOIN tablename t2
ON t2.day BETWEEN date(t1.day, '-6 day') AND t1.day
GROUP BY t1.day
ORDER BY t1.day DESC;
See the demo.
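As a quick check, the self join above can be run against the sample data from the question. This is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the table name `tablename` follows the answer, and the output alias `last_7_day_clients` is an assumption chosen to avoid quoting an identifier with spaces.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (day TEXT, client TEXT)")
sample = [
    ("2020-01-07", "id11"), ("2020-01-07", "id10"),
    ("2020-01-06", "id09"), ("2020-01-06", "id08"),
    ("2020-01-05", "id07"), ("2020-01-04", "id06"),
    ("2020-01-03", "id05"), ("2020-01-03", "id04"),
    ("2020-01-02", "id03"),
    ("2020-01-01", "id02"), ("2020-01-01", "id01"),
]
conn.executemany("INSERT INTO tablename VALUES (?, ?)", sample)

# For each day t1.day, join every row whose day falls in [t1.day - 6, t1.day]
# and count the distinct clients in that rolling window.
query = """
SELECT t1.day,
       COUNT(DISTINCT t2.client) AS last_7_day_clients
FROM tablename t1
INNER JOIN tablename t2
        ON t2.day BETWEEN date(t1.day, '-6 day') AND t1.day
GROUP BY t1.day
ORDER BY t1.day DESC;
"""
result = conn.execute(query).fetchall()
for day, clients in result:
    print(day, clients)
```

The same query string can be passed to pd.read_sql_query(query, conn) to get a DataFrame instead of a list of tuples.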

Related

Calculate Count of users every month in Kusto query language

I have a table named tab1:
Timestamp   Username  sessionid
12-12-2020  Ravi      abc123
12-12-2020  Hari      oipio878
12-12-2020  Ravi      ytut987
11-12-2020  Ram       def123
10-12-2020  Ravi      jhgj54
10-12-2020  Shiv      qwee090
10-12-2020  bob       rtet4535
30-12-2020  sita      jgjye56
I want to count the number of distinct Usernames per day, so that the output would be:
day         count
10-12-2020  3
11-12-2020  1
12-12-2020  2
30-12-2020  1
Tried query:
tab1
| where timestamp > datetime(01-08-2020)
| range timestamp from datetime(01-08-2020) to now() step 1d
| extend day = dayofmonth(timestamp)
| distinct Username
| count
| project day, count
To get a very close estimate of the number of distinct Usernames per day, just run this (dcount() returns an estimate, so the number may not be exact):
tab1
| summarize dcount(Username) by bin(Timestamp, 1d)
If you want accurate results, then you should do this (just note that the query will be less performant than the previous one, and will only work if you have up to 1,000,000 usernames / day):
tab1
| summarize make_set(Username) by bin(Timestamp, 1d)
| project Timestamp, Count = array_length(set_Username)
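To make the exact variant concrete, here is a minimal Python sketch of the same idea as make_set followed by array_length: collect the usernames of each day into a set, then take the set's size. It is not Kusto code, only an illustration on the sample rows from the question; the variable names are assumptions.

```python
from collections import defaultdict

# Sample rows from the question: (Timestamp, Username, sessionid).
rows = [
    ("12-12-2020", "Ravi", "abc123"),
    ("12-12-2020", "Hari", "oipio878"),
    ("12-12-2020", "Ravi", "ytut987"),
    ("11-12-2020", "Ram", "def123"),
    ("10-12-2020", "Ravi", "jhgj54"),
    ("10-12-2020", "Shiv", "qwee090"),
    ("10-12-2020", "bob", "rtet4535"),
    ("30-12-2020", "sita", "jgjye56"),
]

# Equivalent of make_set(Username) by day: one set of usernames per day.
users_by_day = defaultdict(set)
for day, username, _session in rows:
    users_by_day[day].add(username)

# Equivalent of array_length(set_Username): the exact distinct count.
counts = {day: len(users) for day, users in sorted(users_by_day.items())}
print(counts)
```

Because every distinct value is held in memory, this approach is exact but costs more than an estimate, which is the same trade-off the answer describes.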

SQLite: Calculate how a counter has increased in current day and week

I have a SQLite database with a counter and a timestamp in Unix time, as shown below:
+---------+------------+
| counter | timestamp  |
+---------+------------+
| 1       | 1582933500 |
+---------+------------+
| 2       | 1582933800 |
+---------+------------+
| ...     | ...        |
+---------+------------+
I would like to calculate how much 'counter' has increased in the current day and the current week.
Is it possible in a SQLite query?
Thanks!
Provided you have SQLite version >= 3.25.0 the SQLite window functions will help you achieve this.
Use the LAG function to retrieve the value from the previous record. If there is none (which is the case for the first row), the supplied default value is used, which here is the current row's own value.
For the purpose of demonstration this code:
SELECT counter, timestamp,
LAG (timestamp, 1, timestamp) OVER (ORDER BY counter) AS previous_timestamp,
(timestamp - LAG (timestamp, 1, timestamp) OVER (ORDER BY counter)) AS diff
FROM your_table
ORDER BY counter ASC
will give this result:
1 1582933500 1582933500 0
2 1582933800 1582933500 300
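The LAG query can be tried from Python with the built-in sqlite3 module. This is a minimal sketch that assumes the bundled SQLite library is version 3.25.0 or later (window functions are unavailable before that) and uses the two sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (counter INTEGER, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, 1582933500), (2, 1582933800)],
)

# LAG(timestamp, 1, timestamp) returns the previous row's timestamp,
# falling back to the current row's own timestamp for the first row.
query = """
SELECT counter, timestamp,
       LAG(timestamp, 1, timestamp) OVER (ORDER BY counter) AS previous_timestamp,
       timestamp - LAG(timestamp, 1, timestamp) OVER (ORDER BY counter) AS diff
FROM your_table
ORDER BY counter ASC;
"""
lag_rows = conn.execute(query).fetchall()
for row in lag_rows:
    print(row)
```

sqlite3.sqlite_version reports the version of the SQLite library that Python is linked against, which is the version that matters here.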
In a CTE get the min and max timestamp for each day and join it twice to the table:
with cte as (
select date(timestamp, 'unixepoch', 'localtime') day,
min(timestamp) mindate, max(timestamp) maxdate
from tablename
group by day
)
select c.day, t2.counter - t1.counter difference
from cte c
inner join tablename t1 on t1.timestamp = c.mindate
inner join tablename t2 on t2.timestamp = c.maxdate;
With similar code get the results for each week:
with cte as (
select strftime('%W', date(timestamp, 'unixepoch', 'localtime')) week,
min(timestamp) mindate, max(timestamp) maxdate
from tablename
group by week
)
select c.week, t2.counter - t1.counter difference
from cte c
inner join tablename t1 on t1.timestamp = c.mindate
inner join tablename t2 on t2.timestamp = c.maxdate;
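The per-day CTE can be exercised the same way. The sketch below is an illustration under stated assumptions: the counter values and timestamps are made up, and the 'localtime' modifier is dropped so that days are computed in UTC and the result does not depend on the machine's time zone.

```python
import sqlite3
from datetime import datetime, timezone

def ts(year, month, day, hour, minute):
    """Unix timestamp for a UTC wall-clock time (helper for the sample data)."""
    return int(datetime(year, month, day, hour, minute, tzinfo=timezone.utc).timestamp())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (counter INTEGER, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [
        (10, ts(2020, 3, 1, 0, 5)),   # first reading on 2020-03-01
        (15, ts(2020, 3, 1, 12, 0)),
        (22, ts(2020, 3, 1, 23, 0)),  # last reading on 2020-03-01
        (25, ts(2020, 3, 2, 1, 0)),   # first reading on 2020-03-02
        (40, ts(2020, 3, 2, 20, 0)),  # last reading on 2020-03-02
    ],
)

# The CTE finds the first and last timestamp of each day; joining the table
# twice fetches the counter at those two moments so they can be subtracted.
query = """
WITH cte AS (
    SELECT date(timestamp, 'unixepoch') AS day,
           MIN(timestamp) AS mindate, MAX(timestamp) AS maxdate
    FROM tablename
    GROUP BY day
)
SELECT c.day, t2.counter - t1.counter AS difference
FROM cte c
INNER JOIN tablename t1 ON t1.timestamp = c.mindate
INNER JOIN tablename t2 ON t2.timestamp = c.maxdate
ORDER BY c.day;
"""
daily = conn.execute(query).fetchall()
print(daily)
```

Note that each difference covers only the span between the first and last reading of that day, so an increase that happens between the last reading of one day and the first reading of the next is not attributed to either day.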

Make partitions based on difference in date in Postgres window function

I have data in the following format
id | first_name | last_name | birth_date
abc | Jared | Pollard | 1970-01-01
def | Jared | Pollard | 1972-02-02
ghi | Jared | Pollard | 1980-01-01
klm | Jared | Pollard | 2015-01-01
and I would like a query which groups data based on the following rule
If first_name and last_name are equal and the birth_dates are within 5 years of each other, then the records belong to the same group
So the above data contains three groups group1=(abc, def), group2=(ghi) and group3=(klm)
Currently I have the following query which incorrectly creates only 2 groups, group1=(abc, def) and group2=(ghi, klm)
SELECT
g.id,
FIRST_VALUE(g.id) OVER (PARTITION BY lower(trim(g.last_name)), lower(trim(g.first_name)),
CASE WHEN g.birth_date between g.fv_birth_date - interval '5 year' AND g.fv_birth_date + interval '5 year' THEN 1 ELSE 0 END
ORDER BY g.last_used_dt DESC NULLS LAST) AS cluster_id
FROM (
SELECT id, last_used_dt, last_name, first_name, birth_date,
FIRST_VALUE(birth_date)
OVER (PARTITION BY
lower(trim(last_name)),
lower(trim(first_name))
ORDER BY last_used_dt DESC NULLS LAST) AS fv_birth_date
FROM guest
) g;
I understand this is because of the CASE expression within the PARTITION BY clause, but I am unable to come up with any other query.
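One way to pin down the grouping rule before translating it to SQL is to write it procedurally. The sketch below is only an illustration, and it assumes one particular reading of the rule: records with the same normalized name are sorted by birth_date, and a new group starts whenever the gap to the previous record exceeds 5 years (the usual "gaps and islands" approach). The function names are hypothetical, and last_used_dt is ignored because it does not appear in the sample data.

```python
from datetime import date

def add_years(d, years):
    """Return d shifted by whole years, mapping Feb 29 to Feb 28 if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def cluster(records, max_years=5):
    """Group (id, first_name, last_name, birth_date) records into lists of ids."""
    def name_key(rec):
        return (rec[2].strip().lower(), rec[1].strip().lower())

    groups = []
    prev = None
    for rec in sorted(records, key=lambda r: (name_key(r), r[3])):
        same_name = prev is not None and name_key(prev) == name_key(rec)
        close = same_name and rec[3] <= add_years(prev[3], max_years)
        if close:
            groups[-1].append(rec[0])   # continue the current group
        else:
            groups.append([rec[0]])     # gap too large or new name: start a group
        prev = rec
    return groups

guests = [
    ("abc", "Jared", "Pollard", date(1970, 1, 1)),
    ("def", "Jared", "Pollard", date(1972, 2, 2)),
    ("ghi", "Jared", "Pollard", date(1980, 1, 1)),
    ("klm", "Jared", "Pollard", date(2015, 1, 1)),
]
clusters = cluster(guests)
print(clusters)
```

Under this reading a chain of records each within 5 years of the next ends up in one group even if its first and last members are further apart; anchoring every record to the first birth date of its group instead would need a different comparison.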

Order of columns after pivot in application insights

User wants a count of unique sessions per week in application insights. I have the query working, including a pivot, but the Week columns are out of order. I would prefer if they were in order.
pageViews
| where timestamp < now()
| summarize Sessions= dcount(session_Id)
by Week=bin(datepart("weekOfYear", timestamp), 1), user_AuthenticatedId
| order by Week
| evaluate pivot(Week, sum(Sessions))
| join kind=innerunique (pageViews
| summarize MostRecentRequest = max(timestamp) by user_AuthenticatedId)
on $right.user_AuthenticatedId == $left.user_AuthenticatedId
| project-away user_AuthenticatedId1
I've tried ordering by timestamp before the summarize, and ordering by week after the summarize (still in there) and no luck.
There's currently a "trick" that will work: serialize right after your order by
pageViews
| where timestamp < now()
| where isnotempty(user_AuthenticatedId)
| summarize Sessions= dcount(session_Id)
by Week=bin(datepart("weekOfYear", timestamp), 1), user_AuthenticatedId
| order by Week
| serialize // <--------------------------------- RIGHT HERE
| evaluate pivot(Week, sum(Sessions))
| join kind=innerunique (pageViews
| summarize TotalSessions=dcount(session_Id), MostRecentRequest = max(timestamp) by user_AuthenticatedId)
on $right.user_AuthenticatedId == $left.user_AuthenticatedId
| project-away user_AuthenticatedId1
| top 100 by TotalSessions desc
This gives me the weeks in descending order in workbooks. I also added a total session count to sort/top by, with some custom column settings.
The custom column settings I use in workbooks:
delete all the #'d columns that are there by default and add one for ^[0-9]+$ set to heatmap.
I refactored the query a bit for my own comprehension. I took the left and right sides into "views". Thought I'd share.
let users_MostRecent_Session =
pageViews
| summarize
TotalSessions=dcount(session_Id)
, MostRecentRequest = max(timestamp)
by
user_AuthenticatedId
;
//
let users_sessions_ByWeek =
pageViews
| where timestamp < now()
| where isnotempty(user_AuthenticatedId)
| summarize
Sessions= dcount(session_Id)
by
Week=bin(datepart("weekOfYear", timestamp), 1)
, user_AuthenticatedId
| order by Week
| serialize
| evaluate pivot(Week, sum(Sessions))
;
//
//
users_sessions_ByWeek
| join kind=innerunique
users_MostRecent_Session
on user_AuthenticatedId
| project-away user_AuthenticatedId1
| top 100 by TotalSessions desc

Is there a way to reuse subqueries in the same query?

See Update at end of question for solution thanks to marked answer!
I'd like to treat a subquery as if it were an actual table that can be reused in the same query. Here's the setup SQL:
create table mydb.mytable
(
id integer not null,
fieldvalue varchar(100),
ts timestamp(6) not null
)
unique primary index (id, ts);
insert into mydb.mytable(0,'hello',current_timestamp - interval '1' minute);
insert into mydb.mytable(0,'hello',current_timestamp - interval '2' minute);
insert into mydb.mytable(0,'hello there',current_timestamp - interval '3' minute);
insert into mydb.mytable(0,'hello there, sir',current_timestamp - interval '4' minute);
insert into mydb.mytable(0,'hello there, sir',current_timestamp - interval '5' minute);
insert into mydb.mytable(0,'hello there, sir. how are you?',current_timestamp - interval '6' minute);
insert into mydb.mytable(1,'what up',current_timestamp - interval '1' minute);
insert into mydb.mytable(1,'what up',current_timestamp - interval '2' minute);
insert into mydb.mytable(1,'what up, mr man?',current_timestamp - interval '3' minute);
insert into mydb.mytable(1,'what up, duder?',current_timestamp - interval '4' minute);
insert into mydb.mytable(1,'what up, duder?',current_timestamp - interval '5' minute);
insert into mydb.mytable(1,'what up, duder?',current_timestamp - interval '6' minute);
What I want to do is return only rows where FieldValue differs from the previous row. This SQL does just that:
locking row for access
select id, fieldvalue, ts from
(
--locking row for access
select
id, fieldvalue,
min(fieldvalue) over
(
partition by id
order by ts, fieldvalue rows
between 1 preceding and 1 preceding
) fieldvalue2,
ts
from mydb.mytable
) x
where
hashrow(fieldvalue) <> hashrow(fieldvalue2)
order by id, ts desc
It returns:
+----+---------------------------------+----------------------------+
| id | fieldvalue | ts |
+----+---------------------------------+----------------------------+
| 0 | hello | 2015-05-06 10:13:34.160000 |
| 0 | hello there | 2015-05-06 10:12:34.350000 |
| 0 | hello there, sir | 2015-05-06 10:10:34.750000 |
| 0 | hello there, sir. how are you? | 2015-05-06 10:09:34.970000 |
| 1 | what up | 2015-05-06 10:13:35.470000 |
| 1 | what up, mr man? | 2015-05-06 10:12:35.690000 |
| 1 | what up, duder? | 2015-05-06 10:09:36.240000 |
+----+---------------------------------+----------------------------+
The next step is to return only the last row per ID. If I were to use this SQL to write the previous SELECT to a table...
create table mydb.reusetest as (above sql) with data;
...I could then do this to get the last row per ID:
locking row for access
select t1.* from mydb.reusetest t1,
(
select id, max(ts) ts from mydb.reusetest
group by id
) t2
where
t2.id = t1.id and
t2.ts = t1.ts
order by t1.id
It would return this:
+----+------------+----------------------------+
| id | fieldvalue | ts |
+----+------------+----------------------------+
| 0 | hello | 2015-05-06 10:13:34.160000 |
| 1 | what up | 2015-05-06 10:13:35.470000 |
+----+------------+----------------------------+
If I could reuse the subquery in my initial SELECT, I could achieve the same results. I could copy/paste the entire query SQL into another subquery to create a derived table, but this would just mean I'd need to change the SQL in two places if I ever needed to modify it.
Update
Thanks to Kristján, I was able to implement the WITH clause into my SQL like this for perfect results:
locking row for access
with items (id, fieldvalue, ts) as
(
select id, fieldvalue, ts from
(
select
id, fieldvalue,
min(fieldvalue) over
(
partition by id
order by ts, fieldvalue
rows between 1 preceding and 1 preceding
) fieldvalue2,
ts
from mydb.mytable
) x
where
hashrow(fieldvalue) <> hashrow(fieldvalue2)
)
select t1.* from items t1,
(
select id, max(ts) ts from items
group by id
) t2
where
t2.id = t1.id and
t2.ts = t1.ts
order by t1.id
Does WITH help? That lets you define a result set you can use multiple times in the SELECT.
From their example:
WITH orderable_items (product_id, quantity) AS
( SELECT stocked.product_id, stocked.quantity
FROM stocked, product
WHERE stocked.product_id = product.product_id
AND product.on_hand > 5
)
SELECT product_id, quantity
FROM orderable_items
WHERE quantity < 10;
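The reuse pattern can be tried outside Teradata as well. The following is a minimal sketch in SQLite via Python's sqlite3 module, not the original Teradata SQL: it assumes SQLite 3.25.0 or later, replaces the min(...) over a one-row window with LAG, replaces the hashrow comparison with the NULL-safe IS NOT operator, and uses small integers for ts instead of real timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, fieldvalue TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (0, "hello", 6), (0, "hello", 5), (0, "hello there", 4),
        (0, "hello there, sir", 3), (0, "hello there, sir", 2),
        (0, "hello there, sir. how are you?", 1),
        (1, "what up", 6), (1, "what up", 5), (1, "what up, mr man?", 4),
        (1, "what up, duder?", 3), (1, "what up, duder?", 2),
        (1, "what up, duder?", 1),
    ],
)

# "items" is defined once in the WITH clause and referenced twice below:
# once as t1 and once inside the derived table t2.
query = """
WITH items (id, fieldvalue, ts) AS (
    SELECT id, fieldvalue, ts
    FROM (
        SELECT id, fieldvalue, ts,
               LAG(fieldvalue) OVER (PARTITION BY id ORDER BY ts) AS fieldvalue2
        FROM mytable
    ) x
    WHERE fieldvalue2 IS NOT fieldvalue   -- keep rows that differ from the previous row
)
SELECT t1.id, t1.fieldvalue, t1.ts
FROM items t1
JOIN (SELECT id, MAX(ts) AS ts FROM items GROUP BY id) t2
  ON t2.id = t1.id AND t2.ts = t1.ts
ORDER BY t1.id;
"""
last_changes = conn.execute(query).fetchall()
print(last_changes)
```

Changing the change-detection logic now means editing only the body of items, which is the maintenance benefit the question was after.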

Resources