I'm using multiple CASE WHEN expressions to find device actions on selected days, but instead of getting only the abbreviated names (like V or C), I sometimes get the full action name. If I replace 'ELSE action' with ELSE '', I get some blanks, even though there aren't any blank actions... How can I improve my query?
SELECT device,
CASE
WHEN action='Vaccum' AND strftime('%d', timestamp_action) = '25' THEN 'V'
WHEN action='Cooling' AND strftime('%d', timestamp_action) = '25' THEN 'C' ELSE action END AS '25',
CASE
WHEN action='Vaccum' AND strftime('%d', timestamp_action) = '26' THEN 'V'
WHEN action='Cooling' AND strftime('%d', timestamp_action) = '26' THEN 'C' ELSE action END AS '26'
FROM diary WHERE strftime('%m', timestamp_action) = '08'
GROUP BY device
ORDER BY device
I want to get the latest action on selected days for all devices. I have around 100 devices and I need the actions for the entire month.
Example table:
timestamp_action | device | action
------------------------+---------------+-----------
2022-08-25 11:08 | 1 | Cooling
2022-08-25 11:09 | 1 | Vaccum
2022-08-25 11:08 | 2 | Cooling
2022-08-26 11:10 | 2 | Vaccum
2022-08-26 11:11 | 2 | Cooling
2022-08-26 12:30 | 1 | Vaccum
So the result i'm looking for is:
device | 25 | 26 .....
-----------+-----------+--------------
1 | V | V
2 | C | C
Use 2 levels of aggregation:
WITH cte AS (
SELECT device,
strftime('%d', timestamp_action) day,
CASE action WHEN 'Vaccum' THEN 'V' WHEN 'Cooling' THEN 'C' ELSE action END action,
MAX(timestamp_action) max_timestamp_action
FROM diary
WHERE strftime('%Y-%m', timestamp_action) = '2022-08'
GROUP BY device, day
)
SELECT device,
MAX(CASE WHEN day = '25' THEN action END) `25`,
MAX(CASE WHEN day = '26' THEN action END) `26`
FROM cte
GROUP BY device;
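To check the two-level aggregation against the sample table from the question, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. It assumes SQLite, relies on SQLite's documented behaviour that bare columns in a query with a single MAX() come from the row holding the maximum, and uses double-quoted column aliases instead of backticks.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diary (timestamp_action TEXT, device INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO diary VALUES (?, ?, ?)",
    [
        ("2022-08-25 11:08", 1, "Cooling"),
        ("2022-08-25 11:09", 1, "Vaccum"),
        ("2022-08-25 11:08", 2, "Cooling"),
        ("2022-08-26 11:10", 2, "Vaccum"),
        ("2022-08-26 11:11", 2, "Cooling"),
        ("2022-08-26 12:30", 1, "Vaccum"),
    ],
)

query = """
WITH cte AS (
  -- Level 1: one row per device and day, taken from the latest timestamp.
  SELECT device,
         strftime('%d', timestamp_action) AS day,
         CASE action WHEN 'Vaccum' THEN 'V' WHEN 'Cooling' THEN 'C' ELSE action END AS action,
         MAX(timestamp_action) AS max_timestamp_action
  FROM diary
  WHERE strftime('%Y-%m', timestamp_action) = '2022-08'
  GROUP BY device, day
)
-- Level 2: pivot the days into columns, one row per device.
SELECT device,
       MAX(CASE WHEN day = '25' THEN action END) AS "25",
       MAX(CASE WHEN day = '26' THEN action END) AS "26"
FROM cte
GROUP BY device
ORDER BY device
"""
rows = conn.execute(query).fetchall()
print(rows)
```

For a whole month you would repeat the MAX(CASE ...) line once per day; the first level already covers every day in the WHERE range.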
Related
I am a beginner in BigQuery with Firebase and I am trying to get the number of users and sessions grouped by app version (device.operating_system_version).
So far, I have this code working (but without the GROUP BY):
SELECT
COUNT(DISTINCT user_pseudo_id) AS all_users,
(
SELECT
COUNT(user_pseudo_id)
FROM
`xxxx.analytics_xxxx.events_20191105`
WHERE
event_name = 'session_start'
) AS session,
(
SELECT
COUNT(event_name) as totalScreen
FROM
`xxxx.analytics_xxxx.events_20191105`
WHERE
event_name = 'newScreen'
) AS screenView
FROM `xxxx.analytics_xxxx.events_20191105`
The result is fine:
Row | all_users | session | screenView
1 | 80 | 150 | 550
But when I try to group by app version, the number of users is correct and different for each app version, but the numbers of sessions and screenViews are always the same for all app versions. The code is the following:
SELECT
device.operating_system_version AS os_Version,
COUNT(DISTINCT user_pseudo_id) AS all_users,
(
SELECT
COUNT(user_pseudo_id)
FROM
`xxxx.analytics_xxxx.events_20191105`
WHERE
event_name = 'session_start'
) AS session,
(
SELECT
COUNT(event_name) as totalScreen
FROM
`xxxx.analytics_xxxx.events_20191105`
WHERE
event_name = 'newScreen'
) AS screenView
FROM `xxxx.analytics_xxxx.events_20191105`
GROUP BY os_Version
The result is :
Row | os_Version | all_users | session | screenView
1 | 9 | 14 | 150 | 550
2 | 6.0.1 | 4 | 150 | 550
3 | 8.0.0 | 9 | 150 | 550
4 | 7.0 | 3 | 150 | 550
...
I don't understand this behaviour. It is like if the "Group BY" was only applied on the "user" and not on the subqueries.
Thank you
It is like if the "Group BY" was only applied on the "user" and not on the subqueries.
That is exactly what is happening: the subqueries are uncorrelated, so each one is evaluated against the whole table and returns the same result for every os_Version group.
I would rewrite the query like this:
select
device.operating_system_version as os_Version,
count(distinct user_pseudo_id) as all_users,
count(case when event_name = 'session_start' then 1 else null end) as session,
count(case when event_name = 'newScreen' then 1 else null end) as screenView
from `xxxx.analytics_xxxx.events_20191105`
group by 1
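The difference between the two forms can be reproduced on a tiny data set. This is only a sketch: it uses SQLite through Python's sqlite3 instead of BigQuery, and a hypothetical flat `events` table with a plain `os_version` column standing in for `device.operating_system_version`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, flattened stand-in for the Firebase events table.
conn.execute("CREATE TABLE events (user_pseudo_id TEXT, os_version TEXT, event_name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "9", "session_start"),
        ("u1", "9", "newScreen"),
        ("u1", "9", "newScreen"),
        ("u2", "9", "session_start"),
        ("u3", "7.0", "session_start"),
        ("u3", "7.0", "newScreen"),
    ],
)

# Uncorrelated scalar subquery: counted over the whole table,
# so every group shows the same session total.
subquery_rows = conn.execute("""
SELECT os_version,
       COUNT(DISTINCT user_pseudo_id) AS all_users,
       (SELECT COUNT(*) FROM events WHERE event_name = 'session_start') AS session
FROM events
GROUP BY os_version
ORDER BY os_version
""").fetchall()

# Conditional aggregation: COUNT ignores NULLs, so only matching
# rows inside each group are counted.
grouped_rows = conn.execute("""
SELECT os_version,
       COUNT(DISTINCT user_pseudo_id) AS all_users,
       COUNT(CASE WHEN event_name = 'session_start' THEN 1 END) AS session,
       COUNT(CASE WHEN event_name = 'newScreen' THEN 1 END) AS screenView
FROM events
GROUP BY os_version
ORDER BY os_version
""").fetchall()

print(subquery_rows)
print(grouped_rows)
```

The first query repeats the table-wide total of 3 sessions on every row, while the second splits the counts per version.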
I have a case statement to sum, round and label amounts that works fine, but the data ends up in horizontal format (multiple money amounts per record), causing me to do an unpivot in a subsequent statement to format the data vertically (one money amount per record). I would like to accomplish this in one statement if possible. My code is as follows:
SELECT
Field,
ROUND(SUM(CASE
WHEN TYPE = 'Paid Loss'
THEN AMOUNT
ELSE 0
END
), 2
) PAID,
ROUND(SUM(CASE
WHEN TYPE = 'OS'
THEN AMOUNT
ELSE 0
END
), 2
) OS,
ROUND(SUM(CASE
WHEN TYPE <> 'Paid Exp'
THEN AMOUNT
ELSE 0
END
), 2
) INCURRED
FROM dbo.mydatabase
GROUP BY Field;
The result is:
Field |PAID |OS |INCURRED
----------------------------
result1 | 1 | 20 | 10
result2 | 5 | 30 | 15
When what I really want is:
Field | DATA_TYPE | AMOUNT
---------------------------
result1 | PAID | 1
result2 | PAID | 5
result3 | OS | 20
result4 | OS | 30
result5 | INCURRED | 10
result6 | INCURRED | 15
Keys will be unique so that isn't an issue. Anyone know how to rearrange the CASE so this can be done in one statement? Thanks!
Wouldn't this work?
SELECT Field, 'PAID' DATA_TYPE, ROUND(SUM(AMOUNT), 2) AMOUNT
FROM dbo.mydatabase
WHERE TYPE = 'Paid Loss'
GROUP BY Field
UNION ALL
SELECT Field, 'OS' DATA_TYPE, ROUND(SUM(AMOUNT), 2) AMOUNT
FROM dbo.mydatabase
WHERE TYPE = 'OS'
GROUP BY Field
UNION ALL
SELECT Field, 'INCURRED' DATA_TYPE, ROUND(SUM(AMOUNT), 2) AMOUNT
FROM dbo.mydatabase
WHERE TYPE <> 'Paid Exp'
GROUP BY Field
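or even like this (note that this variant assigns each row to a single category, so INCURRED no longer includes the 'Paid Loss' and 'OS' amounts, the labels are the raw TYPE values, and 'Paid Exp' rows form a group with a NULL DATA_TYPE):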
SELECT Field
, (CASE
WHEN TYPE IN ('Paid Loss', 'OS') THEN TYPE
WHEN TYPE <> 'Paid Exp' THEN 'INCURRED'
END) DATA_TYPE
, ROUND(SUM(AMOUNT), 2) AMOUNT
FROM dbo.mydatabase
GROUP BY Field
, (CASE
WHEN TYPE IN ('Paid Loss', 'OS') THEN TYPE
WHEN TYPE <> 'Paid Exp' THEN 'INCURRED'
END)
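To see how the two variants behave, here is a small sketch using SQLite through Python's sqlite3 rather than SQL Server. The `claims` table and its three rows are made up for illustration, and the GROUP BY uses the column alias, which SQLite accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for dbo.mydatabase with made-up amounts.
conn.execute("CREATE TABLE claims (Field TEXT, TYPE TEXT, AMOUNT REAL)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [
        ("result1", "Paid Loss", 1.0),
        ("result1", "OS", 20.0),
        ("result1", "Paid Exp", 5.0),
    ],
)

# Variant 1: one SELECT per category, stacked with UNION ALL.
# A row can contribute to several categories (INCURRED overlaps PAID and OS).
union_rows = conn.execute("""
SELECT Field, 'PAID' AS DATA_TYPE, ROUND(SUM(AMOUNT), 2) AS AMOUNT
FROM claims WHERE TYPE = 'Paid Loss' GROUP BY Field
UNION ALL
SELECT Field, 'OS', ROUND(SUM(AMOUNT), 2)
FROM claims WHERE TYPE = 'OS' GROUP BY Field
UNION ALL
SELECT Field, 'INCURRED', ROUND(SUM(AMOUNT), 2)
FROM claims WHERE TYPE <> 'Paid Exp' GROUP BY Field
ORDER BY Field, DATA_TYPE
""").fetchall()

# Variant 2: a single pass with CASE in the GROUP BY.
# Each row lands in exactly one category.
case_rows = conn.execute("""
SELECT Field,
       CASE WHEN TYPE IN ('Paid Loss', 'OS') THEN TYPE
            WHEN TYPE <> 'Paid Exp' THEN 'INCURRED' END AS DATA_TYPE,
       ROUND(SUM(AMOUNT), 2) AS AMOUNT
FROM claims
GROUP BY Field, DATA_TYPE
ORDER BY Field, DATA_TYPE
""").fetchall()

print(union_rows)
print(case_rows)
```

With this data the UNION ALL form reports INCURRED as 21.0 (PAID plus OS), whereas the CASE form produces no INCURRED row at all and puts the 'Paid Exp' amount in a NULL group.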
I have data in the following format
id | first_name | last_name | birth_date
abc | Jared | Pollard | 1970-01-01
def | Jared | Pollard | 1972-02-02
ghi | Jared | Pollard | 1980-01-01
klm | Jared | Pollard | 2015-01-01
and I would like a query which groups data based on the following rule:
If first_name and last_name are equal and the birth_dates are within 5 years of each other, then the records belong to the same group.
So the above data contains three groups: group1=(abc, def), group2=(ghi) and group3=(klm).
Currently I have the following query, which incorrectly creates only 2 groups, group1=(abc, def) and group2=(ghi, klm):
SELECT
g.id,
FIRST_VALUE(g.id) OVER (PARTITION BY lower(trim(g.last_name)), lower(trim(g.first_name)),
CASE WHEN g.birth_date between g.fv_birth_date - interval '5 year' AND g.fv_birth_date + interval '5 year' THEN 1 ELSE 0 END
ORDER BY g.last_used_dt DESC NULLS LAST) AS cluster_id
FROM (
SELECT id, last_used_dt, last_name, first_name, birth_date,
FIRST_VALUE(birth_date)
OVER (PARTITION BY
lower(trim(last_name)),
lower(trim(first_name))
ORDER BY last_used_dt DESC NULLS LAST) AS fv_birth_date
FROM guest
) g;
I understand this is because of the CASE expression within the PARTITION BY clause, but I am unable to come up with any other query.
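One possible direction is a "gaps and islands" pattern: flag a row as the start of a new group when its birth_date is more than 5 years after the previous one for the same name, then take a running sum of the flags. The sketch below is only an illustration under assumptions: it runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 rather than on the original database, it omits last_used_dt, and it chains records together, so a long run of closely spaced dates can span more than 5 years in total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE guest (id TEXT, first_name TEXT, last_name TEXT, birth_date TEXT)")
conn.executemany(
    "INSERT INTO guest VALUES (?, ?, ?, ?)",
    [
        ("abc", "Jared", "Pollard", "1970-01-01"),
        ("def", "Jared", "Pollard", "1972-02-02"),
        ("ghi", "Jared", "Pollard", "1980-01-01"),
        ("klm", "Jared", "Pollard", "2015-01-01"),
    ],
)

query = """
WITH flagged AS (
  SELECT id, first_name, last_name, birth_date,
         -- Previous birth date for the same (normalised) name.
         LAG(birth_date) OVER (
           PARTITION BY lower(trim(last_name)), lower(trim(first_name))
           ORDER BY birth_date
         ) AS prev_birth_date
  FROM guest
)
SELECT id,
       -- Running count of group starts: 1 when there is no previous row
       -- or the gap to the previous birth date exceeds 5 years.
       SUM(CASE WHEN prev_birth_date IS NULL
                  OR birth_date > date(prev_birth_date, '+5 years')
                THEN 1 ELSE 0 END) OVER (
         PARTITION BY lower(trim(last_name)), lower(trim(first_name))
         ORDER BY birth_date
       ) AS group_no
FROM flagged
ORDER BY birth_date
"""
groups = conn.execute(query).fetchall()
print(groups)
```

On the sample data this yields group 1 for abc and def, group 2 for ghi and group 3 for klm. Picking a cluster_id per group (for example with FIRST_VALUE ordered by last_used_dt) could then partition by the name plus group_no.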
This may be a variant of the knapsack problem.
I need to traverse a data table, group it by a column, and choose the row with the best (lowest) time in each group.
Then I repeat the previous step until the limit given by the CAPACITY column is reached.
This is the demo scenario:
create table if not exists data( vid num, size num, epid num, sid num, capacity num, dt );
delete from data;
insert into data(vid,size,epid,sid,capacity,dt)
values
(0,20,1,1,50,1100), -- 2nd choice
(0,20,1,1,50,1000), -- 1st choice
(0,20,1,1,50,1200), -- last choice excluded because out of capacity
(1,20,2,2,50,1100), -- 2nd choice
(1,20,2,2,50,1000), -- 1st choice
(1,20,2,2,50,1200); -- last choice excluded because out of capacity
This is the non-recursive solution:
with best0 as (
select a.rowid as tid,a.vid,a.sid,a.size,a.dt,a.capacity-a.size as remains,0 as level
from data a
group by a.sid
having min(a.dt)
),
best1 as (
select a.tid,a.vid,a.sid,a.size,a.dt,a.remains, a.level
from (
select
a.rowid as tid,a.sid,a.vid,a.size,a.capacity,a.dt,b.remains-a.size as remains,
b.level+1 as level
from data a
join best0 b on b.sid=a.sid -- and b.level=a.level-1
where not a.rowid in (select tid from best0)
and b.remains-a.size>0
) a group by a.sid having min(a.dt)
),
best2 as (
select a.tid,a.vid,a.sid,a.size,a.dt,a.remains, a.level
from (
select
a.rowid as tid,a.sid,a.vid,a.size,a.capacity,a.dt,b.remains-a.size as remains,
b.level+1 as level
from data a
join best1 b on b.sid=a.sid -- and b.level=a.level-1
where not a.rowid in (select tid from best0 union all select tid from best1)
and b.remains-a.size>0
) a group by a.sid having min(a.dt)
)
select * from best0
union all
select * from best1
union all
select * from best2
And this is the result:
tid | vid | sid | size | dt    | remains  | level
--- | --- | --- | ---- | ----- | -------- | -----------
2 | 0 | 1 | 20 | 1000 | 30 | 0
5 | 1 | 2 | 20 | 1000 | 30 | 0
1 | 0 | 1 | 20 | 1100 | 10 | 1
4 | 1 | 2 | 20 | 1100 | 10 | 1
This is the recursive version, which gives the error "recursive reference in a subquery: best":
with recursive best(tid,vid,sid,size,dt,remains,level)
as (
select a.rowid as tid,a.vid,a.sid,a.size,a.dt,a.capacity-a.size as remains,0 as level
from data a
group by a.sid
having min(a.dt)
union all
select a.tid,a.vid,a.sid,a.size,a.dt,a.remains, a.level
from (
select
a.rowid as tid,a.sid,a.vid,a.size,a.dt,b.remains-a.size as remains,
b.level+1 as level
from data a
join best b on b.sid=a.sid -- and b.level=a.level-1
where not a.rowid in (select tid from best) and b.remains-a.size>0
) a group by a.sid having min(a.dt)
)
select * from best
I tried different solutions, even using a loop counter, but every one gives the same error.
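For this particular demo data, the same rows can be obtained without recursion by using a running total per sid. The sketch below assumes SQLite 3.25 or later (window functions) and is run through Python's sqlite3. It is not an exact replacement for the recursive idea: it stops at the first row that does not fit, whereas the level-by-level queries could skip an oversized row and pick a later, smaller one. It also returns sid, vid, size, dt, remains and level but not the rowid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table if not exists data( vid num, size num, epid num, sid num, capacity num, dt )"
)
conn.executemany(
    "insert into data(vid,size,epid,sid,capacity,dt) values (?,?,?,?,?,?)",
    [
        (0, 20, 1, 1, 50, 1100),
        (0, 20, 1, 1, 50, 1000),
        (0, 20, 1, 1, 50, 1200),
        (1, 20, 2, 2, 50, 1100),
        (1, 20, 2, 2, 50, 1000),
        (1, 20, 2, 2, 50, 1200),
    ],
)

query = """
select sid, vid, size, dt, capacity - used as remains, level
from (
  select sid, vid, size, dt, capacity,
         -- cumulative size of all rows up to this one, earliest dt first
         sum(size) over (
           partition by sid order by dt
           rows between unbounded preceding and current row
         ) as used,
         -- 0 for the first choice, 1 for the second, and so on
         row_number() over (partition by sid order by dt) - 1 as level
  from data
)
where capacity - used > 0   -- same strict test as remains-size>0 above
order by level, sid
"""
picked = conn.execute(query).fetchall()
print(picked)
```

The third row of each sid is dropped because the running total of 60 exceeds the capacity of 50.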
See Update at end of question for solution thanks to marked answer!
I'd like to treat a subquery as if it were an actual table that can be reused in the same query. Here's the setup SQL:
create table mydb.mytable
(
id integer not null,
fieldvalue varchar(100),
ts timestamp(6) not null
)
unique primary index (id, ts);
insert into mydb.mytable(0,'hello',current_timestamp - interval '1' minute);
insert into mydb.mytable(0,'hello',current_timestamp - interval '2' minute);
insert into mydb.mytable(0,'hello there',current_timestamp - interval '3' minute);
insert into mydb.mytable(0,'hello there, sir',current_timestamp - interval '4' minute);
insert into mydb.mytable(0,'hello there, sir',current_timestamp - interval '5' minute);
insert into mydb.mytable(0,'hello there, sir. how are you?',current_timestamp - interval '6' minute);
insert into mydb.mytable(1,'what up',current_timestamp - interval '1' minute);
insert into mydb.mytable(1,'what up',current_timestamp - interval '2' minute);
insert into mydb.mytable(1,'what up, mr man?',current_timestamp - interval '3' minute);
insert into mydb.mytable(1,'what up, duder?',current_timestamp - interval '4' minute);
insert into mydb.mytable(1,'what up, duder?',current_timestamp - interval '5' minute);
insert into mydb.mytable(1,'what up, duder?',current_timestamp - interval '6' minute);
What I want to do is return only rows where FieldValue differs from the previous row. This SQL does just that:
locking row for access
select id, fieldvalue, ts from
(
--locking row for access
select
id, fieldvalue,
min(fieldvalue) over
(
partition by id
order by ts, fieldvalue rows
between 1 preceding and 1 preceding
) fieldvalue2,
ts
from mydb.mytable
) x
where
hashrow(fieldvalue) <> hashrow(fieldvalue2)
order by id, ts desc
It returns:
+----+---------------------------------+----------------------------+
| id | fieldvalue | ts |
+----+---------------------------------+----------------------------+
| 0 | hello | 2015-05-06 10:13:34.160000 |
| 0 | hello there | 2015-05-06 10:12:34.350000 |
| 0 | hello there, sir | 2015-05-06 10:10:34.750000 |
| 0 | hello there, sir. how are you? | 2015-05-06 10:09:34.970000 |
| 1 | what up | 2015-05-06 10:13:35.470000 |
| 1 | what up, mr man? | 2015-05-06 10:12:35.690000 |
| 1 | what up, duder? | 2015-05-06 10:09:36.240000 |
+----+---------------------------------+----------------------------+
The next step is to return only the last row per ID. If I were to use this SQL to write the previous SELECT to a table...
create table mydb.reusetest as (above sql) with data;
...I could then do this to get the last row per ID:
locking row for access
select t1.* from mydb.reusetest t1,
(
select id, max(ts) ts from mydb.reusetest
group by id
) t2
where
t2.id = t1.id and
t2.ts = t1.ts
order by t1.id
It would return this:
+----+------------+----------------------------+
| id | fieldvalue | ts |
+----+------------+----------------------------+
| 0 | hello | 2015-05-06 10:13:34.160000 |
| 1 | what up | 2015-05-06 10:13:35.470000 |
+----+------------+----------------------------+
If I could reuse the subquery in my initial SELECT, I could achieve the same results. I could copy/paste the entire query SQL into another subquery to create a derived table, but this would just mean I'd need to change the SQL in two places if I ever needed to modify it.
Update
Thanks to Kristján, I was able to implement the WITH clause into my SQL like this for perfect results:
locking row for access
with items (id, fieldvalue, ts) as
(
select id, fieldvalue, ts from
(
select
id, fieldvalue,
min(fieldvalue) over
(
partition by id
order by ts, fieldvalue
rows between 1 preceding and 1 preceding
) fieldvalue2,
ts
from mydb.mytable
) x
where
hashrow(fieldvalue) <> hashrow(fieldvalue2)
)
select t1.* from items t1,
(
select id, max(ts) ts from items
group by id
) t2
where
t2.id = t1.id and
t2.ts = t1.ts
order by t1.id
Does WITH help? That lets you define a result set you can use multiple times in the SELECT.
From their example:
WITH orderable_items (product_id, quantity) AS
( SELECT stocked.product_id, stocked.quantity
FROM stocked, product
WHERE stocked.product_id = product.product_id
AND product.on_hand > 5
)
SELECT product_id, quantity
FROM orderable_items
WHERE quantity < 10;
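The same idea of defining a result set once and referencing it twice can be tried out quickly outside Teradata. This is a sketch under assumptions: it uses SQLite (3.25 or later) through Python's sqlite3, a hypothetical `mytable` with integer timestamps, and LAG in place of the `min(...) over (rows between 1 preceding and 1 preceding)` and `hashrow` comparison from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of mydb.mytable; ts is a plain integer here.
conn.execute("CREATE TABLE mytable (id INTEGER, fieldvalue TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (0, "hello there", 1),
        (0, "hello", 2),
        (0, "hello", 3),
        (1, "what up, duder?", 1),
        (1, "what up", 2),
        (1, "what up", 3),
    ],
)

query = """
WITH items AS (
  -- Rows whose value differs from the previous row of the same id
  -- (the first row of each id is kept as well).
  SELECT id, fieldvalue, ts
  FROM (
    SELECT id, fieldvalue, ts,
           LAG(fieldvalue) OVER (PARTITION BY id ORDER BY ts) AS prev_value
    FROM mytable
  ) x
  WHERE prev_value IS NULL OR fieldvalue <> prev_value
)
-- items is referenced twice: once for the rows, once for MAX(ts) per id.
SELECT t1.id, t1.fieldvalue, t1.ts
FROM items t1
JOIN (SELECT id, MAX(ts) AS ts FROM items GROUP BY id) t2
  ON t2.id = t1.id AND t2.ts = t1.ts
ORDER BY t1.id
"""
last_changes = conn.execute(query).fetchall()
print(last_changes)
```

Changing the definition of `items` in one place affects both references, which is the maintenance benefit the question was after.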