SQLite: Calculate how a counter has increased in current day and week - sqlite

I have a SQLite database with a counter and a timestamp in Unix time, as shown below:
+---------+------------+
| counter | timestamp  |
+---------+------------+
| 1       | 1582933500 |
+---------+------------+
| 2       | 1582933800 |
+---------+------------+
| ...     | ...        |
+---------+------------+
I would like to calculate how much 'counter' has increased in the current day and the current week.
Is this possible in a SQLite query?
Thanks!

Provided you have SQLite version >= 3.25.0 the SQLite window functions will help you achieve this.
Use the LAG function to retrieve the value from the previous record. If there is no previous record (which is the case for the first row), the default value supplied as the third argument is used; here it is the current row's own value.
For demonstration, this code:
SELECT counter, timestamp,
LAG (timestamp, 1, timestamp) OVER (ORDER BY counter) AS previous_timestamp,
(timestamp - LAG (timestamp, 1, timestamp) OVER (ORDER BY counter)) AS diff
FROM your_table
ORDER BY counter ASC
will give this result:
1 1582933500 1582933500 0
2 1582933800 1582933500 300
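To try the LAG query without setting up a database file, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The third sample row is made up for illustration, and the bundled SQLite library is assumed to be 3.25.0 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (counter INTEGER, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    # The first two rows come from the question; the third is hypothetical.
    [(1, 1582933500), (2, 1582933800), (3, 1582934400)],
)

# LAG(timestamp, 1, timestamp) returns the previous row's timestamp and
# falls back to the current row's timestamp when no previous row exists.
rows = conn.execute(
    """
    SELECT counter,
           timestamp,
           LAG(timestamp, 1, timestamp) OVER (ORDER BY counter) AS previous_timestamp,
           timestamp - LAG(timestamp, 1, timestamp) OVER (ORDER BY counter) AS diff
    FROM your_table
    ORDER BY counter
    """
).fetchall()

for row in rows:
    print(row)
```

The first row gets a diff of 0 because its "previous" timestamp defaults to its own.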

In a CTE get the min and max timestamp for each day and join it twice to the table:
with cte as (
select date(timestamp, 'unixepoch', 'localtime') day,
min(timestamp) mindate, max(timestamp) maxdate
from tablename
group by day
)
select c.day, t2.counter - t1.counter difference
from cte c
inner join tablename t1 on t1.timestamp = c.mindate
inner join tablename t2 on t2.timestamp = c.maxdate;
With similar code get the results for each week:
with cte as (
select strftime('%W', date(timestamp, 'unixepoch', 'localtime')) week,
min(timestamp) mindate, max(timestamp) maxdate
from tablename
group by week
)
select c.week, t2.counter - t1.counter difference
from cte c
inner join tablename t1 on t1.timestamp = c.mindate
inner join tablename t2 on t2.timestamp = c.maxdate;
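The two queries above differ only in how rows are bucketed, so they can be sketched as one helper that takes the strftime format as a parameter. This is an illustration under some assumptions: the helper name increase_per_bucket and the sample counter values are invented, timestamps are interpreted as UTC (the 'localtime' modifier is dropped so the result does not depend on the machine's time zone), and the week bucket includes the year so that the same week number in different years is not merged.

```python
import sqlite3

def increase_per_bucket(conn, fmt):
    """Return (bucket, last counter - first counter) for each strftime bucket."""
    return conn.execute(
        """
        WITH cte AS (
            SELECT strftime(?, timestamp, 'unixepoch') AS bucket,
                   MIN(timestamp) AS mindate,
                   MAX(timestamp) AS maxdate
            FROM tablename
            GROUP BY bucket
        )
        SELECT c.bucket, t2.counter - t1.counter AS difference
        FROM cte c
        JOIN tablename t1 ON t1.timestamp = c.mindate
        JOIN tablename t2 ON t2.timestamp = c.maxdate
        ORDER BY c.bucket
        """,
        (fmt,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (counter INTEGER, timestamp INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [
        (1, 1582933500),  # 2020-02-28 23:45:00 UTC
        (2, 1582933800),  # 2020-02-28 23:50:00 UTC
        (5, 1582934700),  # 2020-02-29 00:05:00 UTC (hypothetical row)
        (9, 1582977600),  # 2020-02-29 12:00:00 UTC (hypothetical row)
    ],
)

daily = increase_per_bucket(conn, "%Y-%m-%d")
weekly = increase_per_bucket(conn, "%Y-%W")
print(daily)
print(weekly)
```

Note that this measures the increase between the first and last reading inside each bucket; any increase between the last reading of one day and the first reading of the next is not attributed to either day.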

Related

SQLite Group by last seven days (not weekly!)

I looked through some GROUP BY questions here, but none of them is like mine.
I have a current table like this:
|day | client|
----------------
|2020-01-07|id11|
|2020-01-07|id10|
|2020-01-06|id09|
|2020-01-06|id08|
|2020-01-05|id07|
|2020-01-04|id06|
|2020-01-03|id05|
|2020-01-03|id04|
|2020-01-02|id03|
|2020-01-01|id02|
|2020-01-01|id01|
I want to create a new column with the number of unique clients over the last seven days (day - 6, not weekly!) and show it by day:
|day |last 7 day clients|
----------------
|2020-01-07|11|
|2020-01-06| 9|
|2020-01-05| 7|
|2020-01-04| 6|
|2020-01-03| 5|
|2020-01-02| 3|
|2020-01-01| 2|
All the answers here group weekly!
What I tried:
pd.read_sql_query("""SELECT DATE(day) dateColumn,
COUNT(DISTINCT client) AS seven_day_users
FROM table
GROUP BY date(dateColumn, '-6 days')
ORDER BY dateColumn DESC;
""", conn)
But the result is grouping by day, not into the interval.
Use a self join of the table and aggregation:
SELECT t1.day,
COUNT(DISTINCT t2.client) `last 7 day clients`
FROM tablename t1 INNER JOIN tablename t2
ON t2.day BETWEEN date(t1.day, '-6 day') AND t1.day
GROUP BY t1.day
ORDER BY t1.day DESC;
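The self join can be checked against the sample data from the question. The following is a small sketch using Python's sqlite3 module with an in-memory table; the column alias last_7_day_clients is a renamed, unquoted stand-in for the alias used in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (day TEXT, client TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [
        ("2020-01-07", "id11"), ("2020-01-07", "id10"),
        ("2020-01-06", "id09"), ("2020-01-06", "id08"),
        ("2020-01-05", "id07"), ("2020-01-04", "id06"),
        ("2020-01-03", "id05"), ("2020-01-03", "id04"),
        ("2020-01-02", "id03"),
        ("2020-01-01", "id02"), ("2020-01-01", "id01"),
    ],
)

# Each row of t1 is matched with every row of t2 that falls in the
# seven-day window ending on t1.day; DISTINCT removes the duplicates
# caused by several t1 rows sharing the same day.
result = conn.execute(
    """
    SELECT t1.day, COUNT(DISTINCT t2.client) AS last_7_day_clients
    FROM tablename t1
    JOIN tablename t2
      ON t2.day BETWEEN date(t1.day, '-6 day') AND t1.day
    GROUP BY t1.day
    ORDER BY t1.day DESC
    """
).fetchall()

for day, clients in result:
    print(day, clients)
```

The counts match the expected output in the question (11, 9, 7, 6, 5, 3, 2).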

calculate percentages with postgresql join queries

I am trying to calculate percentages by joining 3 tables data to get the percentages of positive_count, negative_count, neutral_count of each user's tweets. I have succeeded in getting positive, negative and neutral counts, but failing to get the same as percentages instead of counts. Here is the query to get counts:
SELECT
t1.u_id, count(*) as total_tweets_count,
(
SELECT count(*) from t1,t2,t3
WHERE
t1.u_id='18839785' AND
t1.u_id=t2.u_id AND
t2.ts_id=t3.ts_id AND
t3.sentiment='Positive'
) as pos_count ,
(
SELECT count(*) from t1,t2,t3
WHERE
t1.u_id='18839785' AND
t1.u_id=t2.u_id AND
t2.ts_id=t3.ts_id AND
t3.sentiment='Negative'
) as neg_count ,
(
SELECT count(*) from t1,t2,t3
WHERE
t1.u_id='18839785' AND
t1.u_id=t2.u_id AND
t2.ts_id=t3.ts_id AND
t3.sentiment='Neutral'
) as neu_count
FROM t1,t2,t3
WHERE
t1.u_id='18839785' AND
t1.u_id=t2.u_id AND
t2.ts_id=t3.ts_id
GROUP BY t1.u_id;
**OUTPUT:**
u_id | total_tweets_count | pos_count | neg_count | neu_count
-----------------+--------------------+-----------+-----------+-------
18839785| 88 | 38 | 25 | 25
(1 row)
Now I want the same in percentages instead of counts. I have written the query in the following way but failed.
SELECT
total_tweets_count,pos_count,
round((pos_count * 100.0) / total_tweets_count, 2) AS pos_per,neg_count,
round((neg_count * 100.0) / total_tweets_count, 2) AS neg_per,
neu_count, round((neu_count * 100.0) / total_tweets_count, 2) AS neu_per
FROM (
SELECT
count(*) as total_tweets_count,
count(
a.u_id='18839785' AND
a.u_id=b.u_id AND
b.ts_id=c.ts_id AND
c.sentiment='Positive'
) AS pos_count,
count(
a.u_id='18839785' AND
a.u_id=b.u_id AND
b.ts_id=c.ts_id AND
c.sentiment='Negative'
) AS neg_count,
count(
a.u_id='18839785' AND
a.u_id=b.u_id AND
b.ts_id=c.ts_id AND
c.sentiment='Neutral') AS neu_count
FROM t1,t2, t3
WHERE
a.u_id='18839785' AND
a.u_id=b.u_id AND
b.ts_id=c.ts_id
GROUP BY a.u_id
) sub;
Can anyone help me out in achieving as percentages for each user data as below?
u_id | total_tweets_count | pos_count | neg_count | neu_count
------------------+--------------------+-----------+-----------+-----
18839785| 88 | 43.18 | 28.4 | 28.4
(1 row)
I am not entirely sure what you are looking for.
For starters, you can simplify your query by using conditional aggregation instead of three scalar subqueries (which, by the way, do not need to repeat the WHERE condition on u_id).
You state you want to "count for all users", so you need to remove the WHERE clause in the main query. The simplification also gets rid of the repeated WHERE condition.
select u_id,
total_tweets_count,
pos_count,
round((pos_count * 100.0) / total_tweets_count, 2) AS pos_per,
neg_count,
round((neg_count * 100.0) / total_tweets_count, 2) AS neg_per,
neu_count,
round((neu_count * 100.0) / total_tweets_count, 2) AS neu_per
from (
SELECT
t1.u_id,
count(*) as total_tweets_count,
count(case when t3.sentiment='Positive' then 1 end) as pos_count,
count(case when t3.sentiment='Negative' then 1 end) as neg_count,
count(case when t3.sentiment='Neutral' then 1 end) as neu_count
FROM t1
JOIN t2 ON t1.u_id=t2.u_id
JOIN t3 ON t2.ts_id=t3.ts_id
-- no WHERE condition on the u_id here
GROUP BY t1.u_id
) t
Note that I replaced the outdated and fragile implicit joins in the WHERE clause with "modern" explicit JOIN operators.
With a more up-to-date Postgres version (9.4 or later), the expression count(case when t3.sentiment='Positive' then 1 end) as pos_count can also be rewritten as:
count(*) filter (where t3.sentiment='Positive') as pos_count
which is a bit more readable (and understandable I think).
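To see conditional aggregation produce percentages for every user at once, here is a small runnable sketch. It uses SQLite through Python's sqlite3 module as a stand-in for Postgres (the CASE form works the same way in both), and the table contents and user IDs are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (u_id TEXT);
    CREATE TABLE t2 (u_id TEXT, ts_id INTEGER);
    CREATE TABLE t3 (ts_id INTEGER, sentiment TEXT);
    INSERT INTO t1 VALUES ('u1'), ('u2');
    INSERT INTO t2 VALUES ('u1', 1), ('u1', 2), ('u1', 3), ('u1', 4),
                          ('u2', 5), ('u2', 6), ('u2', 7);
    INSERT INTO t3 VALUES (1, 'Positive'), (2, 'Positive'), (3, 'Negative'),
                          (4, 'Neutral'), (5, 'Positive'), (6, 'Negative'),
                          (7, 'Negative');
    """
)

# One pass over the joined rows: each CASE yields 1 for a matching
# sentiment and NULL otherwise, and count() ignores the NULLs.
percentages = conn.execute(
    """
    SELECT u_id,
           total_tweets_count,
           round(pos_count * 100.0 / total_tweets_count, 2) AS pos_per,
           round(neg_count * 100.0 / total_tweets_count, 2) AS neg_per,
           round(neu_count * 100.0 / total_tweets_count, 2) AS neu_per
    FROM (
        SELECT t1.u_id,
               count(*) AS total_tweets_count,
               count(CASE WHEN t3.sentiment = 'Positive' THEN 1 END) AS pos_count,
               count(CASE WHEN t3.sentiment = 'Negative' THEN 1 END) AS neg_count,
               count(CASE WHEN t3.sentiment = 'Neutral' THEN 1 END) AS neu_count
        FROM t1
        JOIN t2 ON t1.u_id = t2.u_id
        JOIN t3 ON t2.ts_id = t3.ts_id
        GROUP BY t1.u_id
    ) t
    ORDER BY u_id
    """
).fetchall()

for row in percentages:
    print(row)
```

Multiplying by 100.0 rather than 100 forces floating-point division, which matters in both SQLite and Postgres when the counts are integers.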
In your query you can avoid repeating the global WHERE condition on the u_id by using a correlated subquery, e.g.:
(
SELECT count(*)
FROM t1 inner_t1 --<< use different aliases than in the outer query
JOIN t2 inner_t2 ON inner_t2.u_id = inner_t1.u_id
JOIN t3 inner_t3 ON inner_t3.ts_id = inner_t2.ts_id
-- referencing the outer t1 removes the need to repeat the hardcoded ID
WHERE inner_t1.u_id = t1.u_id
AND inner_t3.sentiment = 'Positive'
) as pos_count
The repetition of the table t1 isn't necessary either, so the above could be re-written to:
(
SELECT count(*)
FROM t2 inner_t2
JOIN t3 inner_t3 ON inner_t3.ts_id = inner_t2.ts_id
WHERE inner_t2.u_id = t1.u_id --<< this references the outer t1 table
AND inner_t3.sentiment = 'Positive'
) as pos_count
But the version with conditional aggregation will still be a lot faster than using three scalar sub-queries (even if you remove the unnecessary repetition of the t1 table).

Is there a way to reuse subqueries in the same query?

See Update at end of question for solution thanks to marked answer!
I'd like to treat a subquery as if it were an actual table that can be reused in the same query. Here's the setup SQL:
create table mydb.mytable
(
id integer not null,
fieldvalue varchar(100),
ts timestamp(6) not null
)
unique primary index (id, ts);
insert into mydb.mytable(0,'hello',current_timestamp - interval '1' minute);
insert into mydb.mytable(0,'hello',current_timestamp - interval '2' minute);
insert into mydb.mytable(0,'hello there',current_timestamp - interval '3' minute);
insert into mydb.mytable(0,'hello there, sir',current_timestamp - interval '4' minute);
insert into mydb.mytable(0,'hello there, sir',current_timestamp - interval '5' minute);
insert into mydb.mytable(0,'hello there, sir. how are you?',current_timestamp - interval '6' minute);
insert into mydb.mytable(1,'what up',current_timestamp - interval '1' minute);
insert into mydb.mytable(1,'what up',current_timestamp - interval '2' minute);
insert into mydb.mytable(1,'what up, mr man?',current_timestamp - interval '3' minute);
insert into mydb.mytable(1,'what up, duder?',current_timestamp - interval '4' minute);
insert into mydb.mytable(1,'what up, duder?',current_timestamp - interval '5' minute);
insert into mydb.mytable(1,'what up, duder?',current_timestamp - interval '6' minute);
What I want to do is return only rows where FieldValue differs from the previous row. This SQL does just that:
locking row for access
select id, fieldvalue, ts from
(
--locking row for access
select
id, fieldvalue,
min(fieldvalue) over
(
partition by id
order by ts, fieldvalue rows
between 1 preceding and 1 preceding
) fieldvalue2,
ts
from mydb.mytable
) x
where
hashrow(fieldvalue) <> hashrow(fieldvalue2)
order by id, ts desc
It returns:
+----+---------------------------------+----------------------------+
| id | fieldvalue | ts |
+----+---------------------------------+----------------------------+
| 0 | hello | 2015-05-06 10:13:34.160000 |
| 0 | hello there | 2015-05-06 10:12:34.350000 |
| 0 | hello there, sir | 2015-05-06 10:10:34.750000 |
| 0 | hello there, sir. how are you? | 2015-05-06 10:09:34.970000 |
| 1 | what up | 2015-05-06 10:13:35.470000 |
| 1 | what up, mr man? | 2015-05-06 10:12:35.690000 |
| 1 | what up, duder? | 2015-05-06 10:09:36.240000 |
+----+---------------------------------+----------------------------+
The next step is to return only the last row per ID. If I were to use this SQL to write the previous SELECT to a table...
create table mydb.reusetest as (above sql) with data;
...I could then do this to get the last row per ID:
locking row for access
select t1.* from mydb.reusetest t1,
(
select id, max(ts) ts from mydb.reusetest
group by id
) t2
where
t2.id = t1.id and
t2.ts = t1.ts
order by t1.id
It would return this:
+----+------------+----------------------------+
| id | fieldvalue | ts |
+----+------------+----------------------------+
| 0 | hello | 2015-05-06 10:13:34.160000 |
| 1 | what up | 2015-05-06 10:13:35.470000 |
+----+------------+----------------------------+
If I could reuse the subquery in my initial SELECT, I could achieve the same results. I could copy/paste the entire query SQL into another subquery to create a derived table, but this would just mean I'd need to change the SQL in two places if I ever needed to modify it.
Update
Thanks to Kristján, I was able to implement the WITH clause into my SQL like this for perfect results:
locking row for access
with items (id, fieldvalue, ts) as
(
select id, fieldvalue, ts from
(
select
id, fieldvalue,
min(fieldvalue) over
(
partition by id
order by ts, fieldvalue
rows between 1 preceding and 1 preceding
) fieldvalue2,
ts
from mydb.mytable
) x
where
hashrow(fieldvalue) <> hashrow(fieldvalue2)
)
select t1.* from items t1,
(
select id, max(ts) ts from items
group by id
) t2
where
t2.id = t1.id and
t2.ts = t1.ts
order by t1.id
Does WITH help? That lets you define a result set you can use multiple times in the SELECT.
From their example:
WITH orderable_items (product_id, quantity) AS
( SELECT stocked.product_id, stocked.quantity
FROM stocked, product
WHERE stocked.product_id = product.product_id
AND product.on_hand > 5
)
SELECT product_id, quantity
FROM orderable_items
WHERE quantity < 10;
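The same idea, one named result set referenced twice, can be tried outside Teradata. The sketch below uses SQLite through Python's sqlite3 module, so several things are substitutions rather than the original syntax: LAG replaces the min(...) over (rows between 1 preceding and 1 preceding) window, the null-safe IS NOT comparison replaces hashrow, ts is a plain integer, and the sample rows are a shortened version of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, fieldvalue TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (0, "hello there, sir", 1),
        (0, "hello there", 2),
        (0, "hello", 3),
        (0, "hello", 4),
        (1, "what up, duder?", 1),
        (1, "what up", 2),
        (1, "what up", 3),
    ],
)

# "items" keeps only rows whose value differs from the previous row of the
# same id. It is defined once and then used twice in the final SELECT.
latest_changes = conn.execute(
    """
    WITH items AS (
        SELECT id, fieldvalue, ts
        FROM (
            SELECT id, fieldvalue, ts,
                   LAG(fieldvalue) OVER (PARTITION BY id ORDER BY ts) AS prev_value
            FROM mytable
        ) x
        WHERE fieldvalue IS NOT prev_value
    )
    SELECT t1.id, t1.fieldvalue, t1.ts
    FROM items t1
    JOIN (SELECT id, MAX(ts) AS ts FROM items GROUP BY id) t2
      ON t2.id = t1.id AND t2.ts = t1.ts
    ORDER BY t1.id
    """
).fetchall()

print(latest_changes)
```

Changing the change-detection logic now means editing only the body of items, which is the maintenance benefit the question was after.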

Fetching multiple records while sampling epoch timestamp field from MySQL table?

I have a table with fields like
TimeStamp | Feild1 | Feild 2
--------------------------------------
1902909002 | xyddtz | 233447
1902909003 | xytzff | 233442
1902909005 | xytzdd | 233443
1902909007 | xytzdd | 233443
1902909009 | xytsqz | 233436
Now I want to query it and fetch records between, say, 1902909002 and 1902909007, which is easily done with:
Select * from table where timestamp > 1902909001 AND timestamp < 1902909008
But there are two more things I want to do:
If a particular timestamp is not there, I have to find the nearest value.
If there are 200 records in that range but I want to fetch only 20, I want to skip every 19 records in a row and fetch the 20th, 40th, etc. records.
I will have the date in datetime format. I know I can convert it before querying, but if there is some option in the query itself, that would be better.
Try this query
select * from(
select @rn:=if(@rn < rid, rid, @rn) as rn, rid, timestamp, feild1, feild2
from
(select @rn:=@rn+1 as rId, tbl.*
from tbl
join
(select @rn:=0) tmp
where timestamp between 1902909002 and 1902909024 order by rid desc)a
join
(select @rn:=0)tmp)tmp
where rid%(rn div 6)=0
Try this (it is crucial that both statements are executed in the same MySQL session):
SET @c:=0;
SELECT
*
FROM (
SELECT
* ,
@c:=@c+1 as counter
FROM
table
WHERE
timestamp > 1902909001
AND timestamp < 1902909008
) as tmp
WHERE
counter % 20 =1;
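The row-numbering idea behind both answers can also be expressed with the ROW_NUMBER() window function, which MySQL 8.0 and later support. The sketch below runs the equivalent query on SQLite through Python's sqlite3 module so that it is self-contained; the table name readings, the column field1, the helper name sample_every_nth, and the generated rows are all assumptions made for the example.

```python
import sqlite3

def sample_every_nth(conn, lo, hi, n):
    """Return every n-th row (n-th, 2n-th, ...) with lo <= timestamp <= hi."""
    return conn.execute(
        """
        SELECT timestamp, field1
        FROM (
            SELECT timestamp, field1,
                   ROW_NUMBER() OVER (ORDER BY timestamp) AS rn
            FROM readings
            WHERE timestamp BETWEEN ? AND ?
        ) numbered
        WHERE rn % ? = 0
        ORDER BY timestamp
        """,
        (lo, hi, n),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (timestamp INTEGER, field1 TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1902909000 + i, f"row{i}") for i in range(1, 11)],
)

# Ten rows fall in the range; keep the 5th and the 10th.
sampled = sample_every_nth(conn, 1902909001, 1902909010, 5)
print(sampled)
```

Unlike the session-variable approach, this needs no separate SET statement, and the numbering restarts for every query because it is computed inside the query itself.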

Query to perform date arithmetic on same field depending upon separate status field?

I have an Oracle table that contains data similar to the following:
ID | STATUS | TIME
-------------------------------
1 | IN | 2013/26/03 00:00
1 | OUT | 2013/26/03 07:00
1 | IN | 2013/27/03 03:00
2 | IN | 2013/26/03 01:00
2 | OUT | 2013/26/03 06:00
3 | IN | 2013/26/03 01:30
.
.
The STATUS represents check-in and check-out, where the ID represents individuals.
I've come up with a query using sub-queries but it seems inelegant and inefficient. Is it possible to write a single query (meaning no sub-queries) to calculate an elapsed time (IN -> OUT) for each ID?
UPDATE: Also, would it be possible to display the elapsed time the individual is OUT? For example in the data listed above Individual #1 is IN for 7 hours, but OUT for 20 hours (2013/27/03 03:00 - 2013/26/03 07:00). Since this would be calculated across records I'm not sure how this can be written.
Try this:
select timein.id, 24 * (timeout.time - timein.time) ElapsedTime
from t timein
left outer join t timeout on timein.id = timeout.id
where timein.status = 'IN' and timeout.status = 'OUT'
If your time field is a char datatype, then you need to do this:
select timein.id, 24 * (TO_DATE(timeout.time, 'YYYY-DD-MM hh24:mi')
- TO_DATE(timein.time, 'YYYY-DD-MM hh24:mi')) ElapsedTime
from t timein
left outer join t timeout on timein.id = timeout.id
where timein.status = 'IN' and timeout.status = 'OUT'
Try this for days with time:
select timein.id, NUMTODSINTERVAL((timeout.time - timein.time),'day') ElapsedTime
from t timein
left outer join t timeout on timein.id = timeout.id
where timein.status = 'IN' and timeout.status = 'OUT'
For both IN and OUT time you can use this and modify it according to your data:
with cte as
(
select t.id, status,
24 * (t.time - LAG(t.time)
OVER (partition by id ORDER BY t.time)) AS diff
from t
)
select t1.id, t1.diff timeIn, t2.diff timeOut
from cte t1
LEFT OUTER JOIN
cte t2 on t1.id = t2.id and t2.status = 'IN' and t2.diff is not null
where t1.status = 'OUT'
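The LAG step inside the CTE is what makes the cross-record calculation possible. Here is a rough, runnable illustration of just that step, using SQLite through Python's sqlite3 module instead of Oracle. The assumptions: times are stored as ISO-style text, hours are derived from epoch seconds via strftime('%s', ...) rather than Oracle date subtraction, and the column name hours_since_previous is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, status TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "IN", "2013-03-26 00:00"),
        (1, "OUT", "2013-03-26 07:00"),
        (1, "IN", "2013-03-27 03:00"),
        (2, "IN", "2013-03-26 01:00"),
        (2, "OUT", "2013-03-26 06:00"),
    ],
)

# For each row, subtract the previous event time of the same id.
# On an OUT row the result is the time spent IN; on an IN row it is
# the time spent OUT. The first event of each id has no predecessor.
elapsed = conn.execute(
    """
    SELECT id, status, time,
           (CAST(strftime('%s', time) AS INTEGER)
            - LAG(CAST(strftime('%s', time) AS INTEGER))
                  OVER (PARTITION BY id ORDER BY time)) / 3600.0
               AS hours_since_previous
    FROM t
    ORDER BY id, time
    """
).fetchall()

for row in elapsed:
    print(row)
```

For individual 1 this gives 7 hours IN and 20 hours OUT, matching the figures in the question's update.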
