I'm having a problem with my SQLite pivot code, mainly taken from McPeppr's answer here: Pivot in SQLite
Creating my temp table:
WITH t1 AS (
SELECT band,
p.name,
status,
strftime('%Y-%m', time_start) AS Month,
AVG(time) AS Avg
FROM person p
JOIN action a ON p.person_id = a.person_id
JOIN log l ON p.log_id = l.log_id
WHERE p.person = 'Joe' AND opps = '2'
GROUP BY band, Month, status, strftime('%Y-%m', time_stamp_start)
ORDER BY Month, CASE status
WHEN 'one' THEN 0
WHEN 'two' THEN 1
WHEN 'three' THEN 2
WHEN 'four' THEN 3
END
),
t1 looks like:
band | name | status | month | AVG
------+--------+--------+-----------+---------------
1 | Joe | one | 2018-01 | 3.33
2 | Joe | one | 2018-01 | 4.11
1 | Joe | two | 2018-02 | 2.55
2 | Joe | two | 2018-02 | 3.45
..........
When I try to pivot with this select:
Select band, Month,
case when status = 'one' then response_avg end as One,
case when status = 'two' then response_avg end as Two,
...,
from t1
I get this:
band | month | One | Two
------+------------+-------+---------
1 | 2018-01 | 3.41 | NULL
2 | 2018-01 | 3.55 | NULL
1 | 2018-01 | NULL | 2.55
2 | 2018-01 | NULL | 4.61
1 | 2018-02 | 1.55 | NULL
2 | 2018-02 | 2.43 | NULL
1 | 2018-02 | NULL | 4.33
2 | 2018-02 | NULL | 3.44
Whereas I want
band | month | One | Two
------+------------+-------+---------
1 | 2018-01 | 3.41 | 2.55
2 | 2018-01 | 3.55 | 4.61
1 | 2018-02 | 1.55 | 2.55
2 | 2018-02 | 2.43 | 4.61
I understand that the status column is causing this but can't figure out how to fix it.
I've tried a good few methods (multiple temp tables, sub-selects to remove the "status" due to default grouping) from different questions I found on here but keep ending up with the same result. Any help appreciated
The trick when using CASE/WHEN is to wrap each expression in an aggregate function such as MAX and then group by all the non-aggregated columns. MAX ignores NULLs, so each group collapses to the one non-NULL value per status:
SELECT
band,
Month,
MAX(CASE
when status = 'one' then response_avg
END) as One,
MAX(CASE
when status = 'two' then response_avg
END) as Two
FROM t1
GROUP BY band,
Month
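To see the effect end to end, here is a minimal sketch that runs the same conditional aggregation through Python's built-in sqlite3 module. The table contents are illustrative stand-ins for the t1 result (not the asker's real data), and the names conn and pivot_rows are only for this example.

```python
import sqlite3

# In-memory database with a small stand-in for the t1 result set.
# The rows are illustrative values, not the asker's real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (band INTEGER, Month TEXT, status TEXT, response_avg REAL)"
)
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        (1, "2018-01", "one", 3.41),
        (2, "2018-01", "one", 3.55),
        (1, "2018-01", "two", 2.55),
        (2, "2018-01", "two", 4.61),
    ],
)

# MAX skips NULLs, so each (band, Month) group keeps the single non-NULL
# value produced by each CASE expression.
pivot_rows = conn.execute(
    """
    SELECT band,
           Month,
           MAX(CASE WHEN status = 'one' THEN response_avg END) AS One,
           MAX(CASE WHEN status = 'two' THEN response_avg END) AS Two
    FROM t1
    GROUP BY band, Month
    ORDER BY Month, band
    """
).fetchall()

for row in pivot_rows:
    print(row)
```

Each band and month now appears once, with the values for both statuses on the same row.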
I have two tables that look something like this:
WORKDAYS
DATE | WORKDAY_LENGHT |
-----------+----------------+
12-05-2018 | 8 |
13-05-2018 | 6.5 |
14-05-2018 | 7.5 |
15-05-2018 | 8 |
ACCIDENTS
TOD | SEVERITY |
-----------------+-----------+
12-05-2018 12:00 | minor |
12-05-2018 15:00 | minor |
13-05-2018 08:00 | severe |
13-05-2018 12:00 | severe |
14-05-2018 10:30 | severe |
And I need a result that is as follows:
WORKDAYS
DATE | WORKDAY_LENGHT | ACCIDENTS_COUNT|
-----------+----------------+----------------+
12-05-2018 | 8 | 2 |
13-05-2018 | 6.5 | 2 |
14-05-2018 | 7.5 | 1 |
15-05-2018 | 8 | 0 |
What I have tried so far is this:
SELECT DISTINCT
w.date,
(
SELECT
COUNT(*)
FROM
accidents a
WHERE
date(w.date) = date(a.tod)
)
AS accidents_count
FROM
workdays w
Which gives me an answer that is somewhat in the right direction. Something like this:
WORKDAYS
DATE | WORKDAY_LENGHT | ACCIDENTS_COUNT|
-----------+----------------+----------------+
12-05-2018 | 8 | 1 |
12-05-2018 | 8 | 1 |
13-05-2018 | 6.5 | 1 |
13-05-2018 | 6.5 | 1 |
14-05-2018 | 7.5 | 1 |
15-05-2018 | 8 | 0 |
This is sqlite, so the date values are stored as strings. The date function therefore should make them just dates, right? Or is that the one causing problems?
I was missing a group by and feel ashamed for opening a question before figuring this out.
Adding GROUP BY date(w.date) is the solution here.
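For comparison, a per-day count can also be written as a LEFT JOIN with GROUP BY instead of the correlated subquery. The sketch below is one possible way to do it, not the query from the question, and it rests on an assumption: the dates are stored as ISO-8601 strings (YYYY-MM-DD), because SQLite's date() function only understands that style. The column name workday_lenght is spelled as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumption: dates are ISO-8601 text, which SQLite's date() can parse.
    CREATE TABLE workdays (date TEXT, workday_lenght REAL);
    CREATE TABLE accidents (tod TEXT, severity TEXT);
    INSERT INTO workdays VALUES
        ('2018-05-12', 8), ('2018-05-13', 6.5),
        ('2018-05-14', 7.5), ('2018-05-15', 8);
    INSERT INTO accidents VALUES
        ('2018-05-12 12:00', 'minor'), ('2018-05-12 15:00', 'minor'),
        ('2018-05-13 08:00', 'severe'), ('2018-05-13 12:00', 'severe'),
        ('2018-05-14 10:30', 'severe');
    """
)

# COUNT(a.tod) counts only matched accident rows, so a workday without
# accidents yields 0 rather than 1.
accident_counts = conn.execute(
    """
    SELECT w.date, w.workday_lenght, COUNT(a.tod) AS accidents_count
    FROM workdays w
    LEFT JOIN accidents a ON date(a.tod) = date(w.date)
    GROUP BY w.date, w.workday_lenght
    ORDER BY w.date
    """
).fetchall()

for row in accident_counts:
    print(row)
```

Counting a.tod instead of * is what keeps the last workday at zero, since the LEFT JOIN still produces one row for it with NULL accident columns.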
I'm working with SQL Server and I have these 3 tables:
STUDENTS
| id | student |
-------------
| 1 | Ronald |
| 2 | Jenny |
SCORES
| id | score | period | student |
| 1 | 8 | 1 | 1 |
| 2 | 9 | 2 | 1 |
PERIODS
| id | period |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
And I want a query that returns this result:
| student | score1 | score2 | score3 | score4 |
| Ronald | 8 | 9 | null | null |
| Jenny | null | null | null | null |
As you can see, the number of score columns depends on the number of periods, because sometimes there are 4 periods and sometimes 3.
I don't know whether I have the wrong idea or whether I should do this in the application, but I would like some help.
You need to PIVOT your data, e.g.:
select Y.Student, [1], [2], [3], [4]
from (
select T.Student, P.[Period], S.Score
from Students T
cross join [Periods] P
left join Scores S on S.[Period] = P.id and S.Student = T.id
) X
pivot
(
sum(Score)
for [Period] in ([1],[2],[3],[4])
) Y
Reference: https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-20
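The IN list of the PIVOT above is fixed at four periods, while the question says the number of periods can vary. One way to handle that is to build the column list from the periods table at run time. The sketch below illustrates the idea with Python's sqlite3 module and conditional aggregation (MAX over CASE) rather than SQL Server's PIVOT, so treat it as a stand-in for the technique, not as T-SQL. The names period_ids, columns and score_rows are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (id INTEGER, student TEXT);
    CREATE TABLE periods (id INTEGER, period INTEGER);
    CREATE TABLE scores (id INTEGER, score INTEGER, period INTEGER, student INTEGER);
    INSERT INTO students VALUES (1, 'Ronald'), (2, 'Jenny');
    INSERT INTO periods VALUES (1, 1), (2, 2), (3, 3), (4, 4);
    INSERT INTO scores VALUES (1, 8, 1, 1), (2, 9, 2, 1);
    """
)

# Read the period ids first; int() ensures only numbers reach the SQL text.
period_ids = [int(r[0]) for r in conn.execute("SELECT id FROM periods ORDER BY id")]

# One MAX(CASE ...) column per period, generated from the data.
columns = ",\n       ".join(
    f"MAX(CASE WHEN s.period = {p} THEN s.score END) AS score{p}"
    for p in period_ids
)
query = f"""
SELECT t.student,
       {columns}
FROM students t
LEFT JOIN scores s ON s.student = t.id
GROUP BY t.id, t.student
ORDER BY t.id
"""
score_rows = conn.execute(query).fetchall()

for row in score_rows:
    print(row)
```

Adding or removing a row in periods changes the number of score columns without touching the query text. In SQL Server the same idea would require dynamic SQL around the PIVOT.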
I have the following table in an sqlite database
+----+-------------+-------+
| ID | Week Number | Count |
+----+-------------+-------+
| 1 | 1 | 31 |
| 2 | 2 | 16 |
| 3 | 3 | 73 |
| 4 | 4 | 59 |
| 5 | 5 | 44 |
| 6 | 6 | 73 |
+----+-------------+-------+
I want to produce the following table, with this week's sales in one column and last week's sales in the next:
+-------------+-----------+-----------+
| Week Number | This_Week | Last_Week |
+-------------+-----------+-----------+
| 1 | 31 | null |
| 2 | 16 | 31 |
| 3 | 73 | 16 |
| 4 | 59 | 73 |
| 5 | 44 | 59 |
| 6 | 73 | 44 |
+-------------+-----------+-----------+
This is the select statement I was going to use:
select
id, week_number, count,
(select count from tempTable
where week_number = (week_number-1))
from
tempTable;
You are comparing values in two different rows. When you write just week_number, the database resolves both occurrences to the table in the subquery, so the condition week_number = week_number - 1 compares each row with itself and is never true.
To refer to a column in a specific table, you have to prefix it with the table name: tempTable.week_number.
And since both tables have the same name here, you have to give at least one of them an alias:
SELECT id,
week_number,
count AS This_Week,
(SELECT count
FROM tempTable AS T2
WHERE T2.week_number = tempTable.week_number - 1
) AS Last_Week
FROM tempTable;
If you want to query the same table twice, you have to give each reference its own alias so they can be told apart:
select a.week_number,a.count this_week,
(select b.count from tempTable b
where b.week_number=(a.week_number-1)) last_week
from tempTable a;
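As a quick check of the aliasing approach, the sketch below loads the six rows from the question into an in-memory SQLite database through Python's sqlite3 module and runs the aliased query. The variable names conn and weekly_rows are only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tempTable (id INTEGER, week_number INTEGER, count INTEGER)"
)
conn.executemany(
    "INSERT INTO tempTable VALUES (?, ?, ?)",
    [(1, 1, 31), (2, 2, 16), (3, 3, 73), (4, 4, 59), (5, 5, 44), (6, 6, 73)],
)

# The aliases a (outer row) and b (previous week's row) make it explicit
# which week_number each side of the comparison refers to.
weekly_rows = conn.execute(
    """
    SELECT a.week_number,
           a.count AS This_Week,
           (SELECT b.count
            FROM tempTable AS b
            WHERE b.week_number = a.week_number - 1) AS Last_Week
    FROM tempTable AS a
    ORDER BY a.week_number
    """
).fetchall()

for row in weekly_rows:
    print(row)
```

Week 1 has no predecessor, so its Last_Week comes back as NULL (None in Python), which matches the desired table.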
I have a dataset like:
+----+-------+---------+----------+
| id | time  | event   | timediff |
+----+-------+---------+----------+
| 1  | 15.00 | install | -        |
| 1  | 15.30 | sale    | 00.30    |
| 1  | 16.00 | sale    | 00.30    |
| 2  | 15.00 | sale    | -        |
| 2  | 15.30 | sale    | 0.30     |
| 3  | 16.00 | install | -        |
| 4  | 15.00 | install | -        |
| 5  | 13.00 | install | -        |
| 5  | 14.00 | sale    | 01.00    |
+----+-------+---------+----------+
I want to clean this dataset:
I want to exclude the ids for which the first (and the next n..) events are sales but not installs.
I want to exclude the ids for which there is an install but no sales (those ids are indeed the unique ones)
Obtaining finally a result like:
+----+-------+---------+----------+
| id | time | event | timediff |
+----+-------+---------+----------+
| 1 | 15.00 | install | - |
| 1 | 15.30 | sale | 0.30 |
| 1 | 16.00 | sale | 0.30 |
| 5 | 13.00 | install | - |
| 5 | 14.00 | sale | 01.00 |
+----+-------+---------+----------+
How can I do that in R? Is there a specific package for data manipulation, or can I just use if statements? Should I use tapply?
Based on the example, we can group by 'id' and keep only the groups whose first 'event' is 'install' and whose second is 'sale'; this gives the expected output.
library(dplyr)
df1 %>%
  group_by(id) %>%
  filter(first(event)=='install' & event[2L]=='sale')
#     id  time   event timediff
# (int) (dbl) (chr) (dbl)
#1 1 15.0 install NA
#2 1 15.3 sale 0.3
#3 1 16.0 sale 0.3
#4 5 13.0 install NA
#5 5 14.0 sale 1.0
Or, if all the elements except the first one should be 'sale', we create a logical variable ('ind') that checks that the first element is 'install' and that every following element is 'sale' (using lead), then keep the groups that have more than one row and where all the 'ind' are TRUE. The n() > 1 check is needed because a group with a single 'install' row would otherwise pass, since lead fills its missing next element with the default 'sale'. If needed, we can remove the 'ind' column using select.
df1 %>%
  group_by(id) %>%
  mutate(ind = first(event)=='install' & lead(event, default='sale')=='sale') %>%
  filter(n() > 1, all(ind)) %>%
  ungroup() %>%
  select(-ind)
Or we can use data.table: grouped by 'id', if the number of rows is greater than 1 (.N > 1), the first element is 'install' (event[1L]=='install') and all the remaining elements are 'sale', we return the Subset of Data.table (.SD).
library(data.table)
setDT(df1)[, if(.N > 1 & event[1L]=='install' & all(event[2:.N]=='sale')) .SD, by = id]
# id time event timediff
#1: 1 15.0 install NA
#2: 1 15.3 sale 0.3
#3: 1 16.0 sale 0.3
#4: 5 13.0 install NA
#5: 5 14.0 sale 1.0
I have these two tables, members and water_meter:
members
id | name
=========
1 | Dani
2 | Dina
3 | Roni
water_meter
id | member_id | date | start | finish | paid | paid_at
===+============+===========+=======+===========+=======+=====================+
1 | 1 |2014-07-01 | 12.3 | 38.7 | 1 | 2014-12-29 18:28:30
2 | 2 |2014-07-01 | 57.2 | 64.3 | 0 | null
3 | 3 |2014-07-01 | 14.6 | 52.3 | 0 | null
Members need to pay for their water usage every month. What I want is for the 'start' value of each month to be the 'finish' value of the previous month. This is my query to check the water usage for August:
SELECT m.id, m.name,
ifnull(t.start, (SELECT ifnull(finish, 0) FROM members m2
LEFT JOIN water_meter t2 ON m2.id = t2.member_id AND t2.date = '2014-07-01') ) as start,
t.finish, paid
FROM members m
LEFT JOIN water_meter t ON m.id = t.member_id AND t.date = '2014-08-01'
Result :
id | name | start | finish |
===+========+========+=========+
1 | Dani | 38.7 | null |
2 | Dina | 38.7 | null |
3 | Roni | 38.7 | null |
As you can see, the "start" value is not right. What is the right query for this case?
What I want is like this
id | name | start | finish |
===+========+========+=========+
1 | Dani | 38.7 | null |
2 | Dina | 64.3 | null |
3 | Roni | 52.3 | null |
Check : http://sqlfiddle.com/#!7/29a4c/2
Your inner query is missing a WHERE condition that ties it to the current member of the outer query:
SELECT m.id, m.name,
ifnull(t.start,
(SELECT ifnull(finish, 0) FROM members m2
LEFT JOIN water_meter t2
ON m2.id = t2.member_id AND t2.date = '2014-07-01'
where m2.id = m.id)) as start,
t.finish, paid
FROM members m
LEFT JOIN water_meter t ON m.id = t.member_id AND t.date = '2014-08-01'
WHERE m.active = 1
I don't like the query itself, but it produces the output you wanted.
A slightly better solution (no subqueries, which may be slow on a large dataset):
select
members.id,
name,
coalesce(wm_cur.start, wm_prev.finish),
wm_cur.finish
from members
left join water_meter wm_cur
on members.id = wm_cur.member_id
and wm_cur.date between '2014-08-01' and date('2014-08-01','start of month','+1 month','-1 day')
left join water_meter wm_prev
on members.id = wm_prev.member_id
and wm_prev.date between '2014-07-01' and date('2014-07-01','start of month','+1 month','-1 day')
where members.active = 1
You can replace coalesce with ifnull if you wish. This version also matches readings from anywhere in the month, not only the first day, which may or may not be what you want.
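The two-join approach can be tried out with the sample data from the question. The sketch below uses Python's sqlite3 module and makes two assumptions that are not in the original queries: the month is passed as a named parameter (:month) so the previous month is derived from it, and the filter on an active column is left out because the tables shown in the question do not have one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE members (id INTEGER, name TEXT);
    CREATE TABLE water_meter (
        id INTEGER, member_id INTEGER, date TEXT,
        start REAL, finish REAL, paid INTEGER, paid_at TEXT
    );
    INSERT INTO members VALUES (1, 'Dani'), (2, 'Dina'), (3, 'Roni');
    INSERT INTO water_meter VALUES
        (1, 1, '2014-07-01', 12.3, 38.7, 1, '2014-12-29 18:28:30'),
        (2, 2, '2014-07-01', 57.2, 64.3, 0, NULL),
        (3, 3, '2014-07-01', 14.6, 52.3, 0, NULL);
    """
)

# cur matches readings in the requested month, prev those in the month before.
# COALESCE falls back to last month's finish when this month has no reading.
usage_rows = conn.execute(
    """
    SELECT m.id, m.name,
           COALESCE(cur.start, prev.finish) AS start,
           cur.finish
    FROM members m
    LEFT JOIN water_meter cur
           ON cur.member_id = m.id
          AND cur.date BETWEEN :month
                           AND date(:month, 'start of month', '+1 month', '-1 day')
    LEFT JOIN water_meter prev
           ON prev.member_id = m.id
          AND prev.date BETWEEN date(:month, '-1 month')
                            AND date(:month, '-1 day')
    ORDER BY m.id
    """,
    {"month": "2014-08-01"},
).fetchall()

for row in usage_rows:
    print(row)
```

Note that if a member has several readings in the same month, the joins return one row per combination, so this only works as-is with at most one reading per member per month.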