I have a table like this:
id color date
1 red 01/01
1 red 01/02
1 yellow 01/03
1 red 02/01
2 red 01/01
2 blue 01/02
2 blue 02/02
3 red 01/01
4 red 02/01
The ideal output should be:
id pattern
1 (red, yellow) to (red)
2 (red, blue) to (blue)
3 (red)
4 (red)
The result should show two things:
The unique pattern within each month
The monthly patterns aggregated in chronological order across the time range
For example, id 1 has the pattern (red, yellow) in January but only (red) in February, so the final output should be (red, yellow) to (red).
The query that I have now is
select drv.id, extract(month from date) as month,
trim(trailing ',' from (XMLAGG(TRIM(color)||',' order by date)(varchar(10000)))) as pattern from
(select id, color, date,
lag(color)over(partition by id, extract(month from date) order by date) as prev_color
from table
qualify prev_color <> color
) as drv
group by id, extract(month from date)
The query is incomplete because it does not capture the movement from month to month. Is it possible to return the pattern by month?
Is there any way to combine XMLAGG with PARTITION BY, or does anyone have other ideas?
Using Fred's method 1 from the comments, the answer can be achieved with the following query:
select drv1.id,
trim(trailing '-' from (XMLAGG(TRIM(pattern)||'-' order by month)(varchar(10000)))) as pattern_mthly from
(
select drv.id, extract(month from date) as month,
trim(trailing ',' from (XMLAGG(TRIM(color)||',' order by date)(varchar(10000)))) as pattern from
(select id, color, date,
lag(color)over(partition by id, extract(month from date) order by date) as prev_color
from table
qualify prev_color <> color
) as drv
group by id, extract(month from date)
) as drv1
group by id
Thanks
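For readers without a Teradata system at hand, the intended logic (unique colors per month, then the monthly patterns joined in order) can be sketched in plain Python. This is only an illustration of the target output, not a translation of the XMLAGG query; the function name monthly_patterns and the assumption that dates are MM/DD strings are mine.

```python
from collections import defaultdict

# Sample rows from the question: (id, color, date as MM/DD).
rows = [
    (1, "red", "01/01"), (1, "red", "01/02"), (1, "yellow", "01/03"),
    (1, "red", "02/01"),
    (2, "red", "01/01"), (2, "blue", "01/02"), (2, "blue", "02/02"),
    (3, "red", "01/01"),
    (4, "red", "02/01"),
]


def monthly_patterns(rows):
    """Return {id: "(colors in month 1) to (colors in month 2) ..."}."""
    # id -> month -> colors; a dict keeps first-seen order and drops repeats.
    per_month = defaultdict(lambda: defaultdict(dict))
    for id_, color, date in sorted(rows, key=lambda r: (r[0], r[2])):
        month = date.split("/")[0]
        per_month[id_][month][color] = None
    return {
        id_: " to ".join(
            "(" + ", ".join(colors) + ")"
            for _, colors in sorted(months.items())
        )
        for id_, months in per_month.items()
    }


patterns = monthly_patterns(rows)
for id_, pattern in patterns.items():
    print(id_, pattern)
```

With the sample data this reproduces the four rows of the ideal output shown in the question.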
I have the following requirement: I have a table in the following format.
and this is what I want it to be transformed into:
Basically, I want the number of users for each combination of activities.
I want this format because I want to create a TreeMap visualization from it.
This is what I have done so far.
First, find the number of users per activity grouping:
WITH lookup AS
(
SELECT listagg(name,',') AS groupings,
processed_date,
guid
FROM warehouse.test
GROUP BY processed_date,
guid
)
SELECT groupings AS activity_groupings,
LENGTH(groupings) -LENGTH(REPLACE(groupings,',','')) + 1 AS count,
processed_date,
COUNT( guid) AS users
FROM lookup
GROUP BY processed_date,
groupings
I put the results in a separate table.
Then I split and coalesce like this:
SELECT NULLIF(SPLIT_PART(groupings,',', 1),'') AS grouping_1,
COALESCE(NULLIF(SPLIT_PART(groupings,',', 2),''), grouping_1) AS grouping_2,
COALESCE(NULLIF(SPLIT_PART(groupings,',', 3),''), grouping_2, grouping_1) AS grouping_3,
num_users
FROM warehouse.groupings) AS expr_qry
GROUP BY grouping_1,
grouping_2,
grouping_3
The problem is that the first query takes more than 90 minutes to execute, as I have more than 250M rows.
There must be a better, more efficient way to do this.
Any pointers would be greatly appreciated.
Thanks
You do not need to use complex string manipulation functions (LISTAGG(), SPLIT_PART()). You can achieve what you're after with the ROW_NUMBER() function and simple aggregates.
-- Create sample data
CREATE TEMP TABLE test_data (id, guid, name)
AS SELECT 1::INT, 1::INT, 'cooking'
UNION ALL SELECT 2::INT, 1::INT, 'cleaning'
UNION ALL SELECT 3::INT, 2::INT, 'washing'
UNION ALL SELECT 4::INT, 4::INT, 'cooking'
UNION ALL SELECT 6::INT, 5::INT, 'cooking'
UNION ALL SELECT 7::INT, 3::INT, 'cooking'
UNION ALL SELECT 8::INT, 3::INT, 'cleaning'
;
-- Assign a row number to each name per guid
WITH name_order AS (
SELECT guid
, name
, ROW_NUMBER() OVER(PARTITION BY guid ORDER BY id) row_n
FROM test_data
) -- Use MAX() to collapse each guid's data to 1 row
, groupings AS (
SELECT guid
, MAX(CASE WHEN row_n = 1 THEN name END) grouping_1
, MAX(CASE WHEN row_n = 2 THEN name END) grouping_2
FROM name_order
GROUP BY guid
) -- Count the guids per each grouping
SELECT grouping_1
, COALESCE(grouping_2, grouping_1) AS grouping_2
, COUNT(guid) num_users
FROM groupings
GROUP BY 1,2
;
-- Output
grouping_1 | grouping_2 | num_users
------------+------------+-----------
washing | washing | 1
cooking | cleaning | 2
cooking | cooking | 2
I'm trying to select a distinct HourlyRate from a table, and then group the resulting HourlyRate by a FECode (basically a person). One person may have 2 or 3 rates over time, but the results that are returning involve the same HourlyRate being repeated for the same FECode.
SELECT DISTINCT Cost/Hours As HourlyRate, Date, FECode
FROM Table1
WHERE HourlyRate != ''
GROUP BY HourlyRate, FECode
ORDER BY FECode
The result looks as follows:
HourlyRate, Date, FECode
215.00, 2017-04-06, AAA
215.00, 2017-04-27, AAA
225.00, 2017-06-16, AAA
The data from Table1 is as follows:
Date, FECode, Cost, Hours
2017-04-06, AAA, 236.5, 1.1
2017-04-27, AAA, 43, 0.2
2017-06-16, AAA, 247.5, 1.1
Clearly, in this example, the second result of 215.00 should not be returned, but it is. How do I stop this from happening?
The result is correct, because DISTINCT only removes rows that match on the full set of columns. Cost/Hours is a floating-point division, so the results look like the same round number but are not exactly equal, and therefore they are not treated as duplicates. Try this, and do not forget to remove the Date column:
SELECT DISTINCT cast(Cost/Hours as text) As HourlyRate, FECode
FROM Table1
WHERE HourlyRate != ''
ORDER BY FECode
These two values are not equal:
SELECT 236.5/1.1 = 43/0.2;
0
There actually is a difference:
SELECT 236.5/1.1 - 43/0.2;
-2.8421709430404e-14
See Is floating point math broken?
You have to round the result.
(And using the column Date with this GROUP BY does not make sense.)
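The comparison and the rounding fix can be checked with a small script. This is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory copy of the three sample rows from the question; the column types are my assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The raw quotients differ in the last bit, so they do not compare equal.
raw_equal = conn.execute("SELECT 236.5/1.1 = 43/0.2").fetchone()[0]

# After rounding to two decimals they do.
rounded_equal = conn.execute(
    "SELECT ROUND(236.5/1.1, 2) = ROUND(43/0.2, 2)"
).fetchone()[0]

# Sample data from the question (column types are assumed).
conn.execute(
    "CREATE TABLE Table1 (Date TEXT, FECode TEXT, Cost REAL, Hours REAL)"
)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        ("2017-04-06", "AAA", 236.5, 1.1),
        ("2017-04-27", "AAA", 43, 0.2),
        ("2017-06-16", "AAA", 247.5, 1.1),
    ],
)

# DISTINCT on the rounded rate, without the Date column.
rates = conn.execute(
    """
    SELECT DISTINCT ROUND(Cost/Hours, 2) AS HourlyRate, FECode
    FROM Table1
    ORDER BY FECode, HourlyRate
    """
).fetchall()

print(raw_equal, rounded_equal, rates)
```

Leaving Date out of the select list matters as much as the rounding: with Date included, every row is distinct regardless of the rate.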
The following query returns the expected result:
SELECT ROUND(Cost/Hours, 2) AS HourlyRate, Date, FECode
FROM Table1
WHERE HourlyRate != ''
GROUP BY FECode, HourlyRate
ORDER BY FECode ASC
I have data like this:
I am trying to transform it to this (using SQLite). In the desired result, within each id, each start should be on the same row as the chronologically closest end. If an id has a start but no end (like id=4), then the corresponding end will be empty (as shown below).
I have tried this:
select
id,
max(case when start_end = 'start' then date end) as start,
max(case when start_end = 'end' then date end) as end
from df
group by id
But the result is this, which is wrong because id=5 has only one row when it should have two:
id start end
1 2 1994-05-01 1996-11-04
2 4 1979-07-18 <NA>
3 5 2010-10-01 2012-10-06
Any help is much appreciated.
CREATE TABLE mytable(
id INTEGER NOT NULL PRIMARY KEY
,start_end VARCHAR(5) NOT NULL
,date DATE NOT NULL
);
INSERT INTO mytable(id,start_end,date) VALUES (2,'start','1994-05-01');
INSERT INTO mytable(id,start_end,date) VALUES (2,'end','1996-11-04');
INSERT INTO mytable(id,start_end,date) VALUES (4,'start','1979-07-18');
INSERT INTO mytable(id,start_end,date) VALUES (5,'start','2005-02-01');
INSERT INTO mytable(id,start_end,date) VALUES (5,'end','2009-09-17');
INSERT INTO mytable(id,start_end,date) VALUES (5,'start','2010-10-01');
INSERT INTO mytable(id,start_end,date) VALUES (5,'end','2012-10-06');
select
s.id as id,
s.date as 'start',
min(e.date) as 'end' -- earliest end date from "same id&start"
from
-- only start dates
(select id, date
from intable
where start_end='start'
) as s
left join -- keep the start-only lines
-- only end dates
(select id, date
from intable
where start_end='end'
) as e
on s.id = e.id
and s.date < e.date -- not too early
group by s.id, s.date -- "same id&start"
order by s.id, s.date; -- ensure sequence
Left join two on-the-fly tables, one with the start dates and one with the end dates; the left join keeps the start-only line for id 4.
For each start, take the earliest end date that is later than the start date (same id), using min() and group by.
Order by id, then start date.
I tested this on a test table that is similar to your dump but has no "NOT NULL" and no "PRIMARY KEY". I assume that is irrelevant for this test; if not, please explain the effect.
Note:
Internally, three pairs of dates are found for id 5 (those where end > start), but for each combination of id and start (group by id, start) only the pair with the lowest end (min(end)) is returned. The pair where end > start but end is not the minimum is therefore dropped, which leaves two lines with start/end pairs, as desired.
Output (with .headers on):
id|start|end
2|1994-05-01|1996-11-04
4|1979-07-18|
5|2005-02-01|2009-09-17
5|2010-10-01|2012-10-06
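The note about the three internal pairs for id 5 can be reproduced with a short script. This is a sketch, assuming Python's built-in sqlite3 module and a test table named intable without the PRIMARY KEY constraint, as in the answer; the aliases start_date and end_date are mine, chosen to avoid the END keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE intable (id INTEGER, start_end TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO intable VALUES (?, ?, ?)",
    [
        (2, "start", "1994-05-01"), (2, "end", "1996-11-04"),
        (4, "start", "1979-07-18"),
        (5, "start", "2005-02-01"), (5, "end", "2009-09-17"),
        (5, "start", "2010-10-01"), (5, "end", "2012-10-06"),
    ],
)

# The join without aggregation: every end that is later than a start.
join_sql = """
    FROM (SELECT id, date FROM intable WHERE start_end = 'start') AS s
    LEFT JOIN (SELECT id, date FROM intable WHERE start_end = 'end') AS e
      ON s.id = e.id AND s.date < e.date
"""

# Before aggregation, id 5 yields three candidate pairs.
pairs_for_5 = conn.execute(
    "SELECT s.date, e.date" + join_sql + "WHERE s.id = 5 ORDER BY 1, 2"
).fetchall()

# MIN(e.date) per (id, start) keeps only the closest end.
result = conn.execute(
    "SELECT s.id, s.date AS start_date, MIN(e.date) AS end_date"
    + join_sql
    + "GROUP BY s.id, s.date ORDER BY s.id, s.date"
).fetchall()

print(pairs_for_5)
print(result)
```

The start for id 4 has no matching end, so the left join keeps it with a NULL end, which Python shows as None.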
UPDATE: Incorporated helpful comments by @MatBailie.
Thank you! This is exactly what I needed to do, only with a few changes:
SELECT
s.value AS 'url',
'AVGDATE' AS 'fieldname',
sum(e.value)/count(*) AS 'value'
FROM
(SELECT url, value
FROM quicktag
WHERE fieldname='NAME'
) AS s
LEFT JOIN
(SELECT url, substr(value,1,4) AS value
FROM quicktag
WHERE fieldname='DATE'
) AS e
ON s.url = e.url
WHERE e.value != ''
GROUP BY s.value;
I had a table like this:
url fieldname value
---------- ---------- ----------
1000052801 NAME Thomas
1000052801 DATE 2007
1000131579 NAME Morten
1000131579 DATE 2005
1000131929 NAME Tanja
1000131929 DATE 2014
1000158449 NAME Knud
1000158449 DATE 2007
1000158450 NAME Thomas
1000158450 DATE 2003
I needed to correlate NAME and DATE in columns based on url as a key, and generate a field with average DATE grouped by multiple NAME fields.
So my result looks like this:
url fieldname value
---------- ---------- ----------
Thomas AVGDATE 2005
Morten AVGDATE 2005
Tanja AVGDATE 2014
Knud AVGDATE 2007
Unfortunately, I do not have enough posts for my vote to count yet.
Oracle 10g
My requirements are to:
Select each department
Select each individual item per department (each item gets its own row, but combine if duplicates)
Select each color per distinct department AND item (if duplicate, select the lowest number)
Select each user per distinct department AND item (aggregate if multiple)
DB data
Department  Item_List                Color       User
Research    Item 1                   1. Blue     John
Research    Item 1;Item 2            2. Blue     Mike
Research    Item 1;Item 2; Item 3    1. Red      Steve
Research    Item 2                   1. Purple   John
Research    Item 1;Item 4            2. Red      Bill
Ops         Item 1;Item 2            3. Silver   John
Ops         Item 1;Item 3            3. Silver   Mike
Ops         Item 4                   4. Yellow   Mark
Expected Results
Department  Item_List  Color       User
Research    Item 1     1. Blue     John, Mike
Research    Item 2     1. Blue     Mike
Research    Item 1     1. Red      Steve, Bill
Research    Item 2     1. Red      Steve
Research    Item 3     1. Red      Steve
Research    Item 2     1. Purple   John
Research    Item 4     1. Red      Bill
Ops         Item 1     3. Silver   John, Mike
Ops         Item 2     3. Silver   John
Ops         Item 3     3. Silver   Mike
Ops         Item 4     4. Yellow   Mark
I am using the following SQL, but it is not working:
with data as
(
select
DEPARTMENT,
ITEM_LIST,
(length(ITEM_LIST)-length(replace(ITEM_LIST,';','')))+1 cnt,
MIN(Color) as Color,
wm_concat(USER) as USER
from DataBase_Table
Group by
DEPARTMENT,
ITEM_LIST
)
select
DEPARTMENT
ITEM_LIST,
Color,
User
from
(
select distinct
DEPARTMENT,
ltrim(regexp_substr(ITEM_LIST,'[^;]+',1,level)) ITEM_LIST,
Color,
level,
User
from data
connect by level <= cnt
order by DEPARTMENT
)
;
with
t1 as (
select
department,
trim(regexp_substr(item_list, '[^;]+', 1, occ)) as item,
to_number(regexp_substr(color, '^\d+')) as color_no,
regexp_replace(color, '^\d+(.*)$', '\1') as color_name,
user_
from
(
select level as occ from dual
connect by level <= (select max(length(item_list)) from source)
),
source
where regexp_substr(item_list, '[^;]+', 1, occ) is not null
)
select
department,
item,
color,
listagg(user_, ',') within group (order by user_)
from
(
select
min(color_no)||color_name as color,
color_name
from t1
group by color_name
)
natural join t1
group by
department,
item,
color
order by 1 desc, 2, 3
SQLFiddle
In Oracle 10 you should use
wm_concat(distinct user_)
instead of
listagg(user_, ',') within group (order by user_)
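To make the steps of the query above easier to follow, here is the same logic sketched in plain Python rather than Oracle SQL: split each Item_List on ';', replace each color number with the lowest number seen for that color name, then collect the distinct users per department, item, and color. The helper name split_color and the variable names are hypothetical; the data is the sample from the question.

```python
import re
from collections import defaultdict

# Sample rows from the question: (department, item_list, color, user).
source = [
    ("Research", "Item 1", "1. Blue", "John"),
    ("Research", "Item 1;Item 2", "2. Blue", "Mike"),
    ("Research", "Item 1;Item 2; Item 3", "1. Red", "Steve"),
    ("Research", "Item 2", "1. Purple", "John"),
    ("Research", "Item 1;Item 4", "2. Red", "Bill"),
    ("Ops", "Item 1;Item 2", "3. Silver", "John"),
    ("Ops", "Item 1;Item 3", "3. Silver", "Mike"),
    ("Ops", "Item 4", "4. Yellow", "Mark"),
]


def split_color(color):
    """Split '2. Blue' into (2, '. Blue'), like the two regexp calls in t1."""
    match = re.match(r"(\d+)(.*)$", color)
    return int(match.group(1)), match.group(2)


# Lowest number per color name (the min(color_no) subquery).
lowest = {}
for _, _, color, _ in source:
    number, name = split_color(color)
    lowest[name] = min(number, lowest.get(name, number))

# One entry per (department, item, color); a set removes duplicate users.
users = defaultdict(set)
for department, item_list, color, user in source:
    _, name = split_color(color)
    for item in item_list.split(";"):
        key = (department, item.strip(), f"{lowest[name]}{name}")
        users[key].add(user)

# Comma-separated users in alphabetical order, like listagg ... order by.
result = {key: ",".join(sorted(names)) for key, names in users.items()}
for key in sorted(result):
    print(*key, result[key], sep=" | ")
```

With the sample data this yields 11 rows, one per department, item, and color combination, matching the row count of the expected results.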
Here is a table named commonprofit:
name date turnover
1 2011/12 42359
1 2010/12 32863
1 2009/12 24293
1 2008/12 16436
1 2007/12 15442
2 2011/12 91634
2 2010/12 58410
2 2009/12 50668
2 2008/12 54297
3 2009/12 12352
3 2008/12 12352
3 2007/12 14226
select name,max(date) as date, turnover from commonprofit group by name
union
select name,min(date) as date,turnover from commonprofit group by name;
The result is:
name|date|turnover
00001|2007/12|15442
00001|2011/12|42359
00002|2008/12|54297
00002|2011/12|91634
00003|2007/12|14226
00003|2009/12|12352
Why is the result not the following?
name|date|turnover
00001|2011/12|42359
00002|2011/12|91634
00003|2009/12|12352
00001|2007/12|15442
00002|2008/12|54297
00003|2007/12|14226
I want to know why the SQLite query does not return the rows in the order I expected.
If you want a specific order you must provide an order by clause. E.g.,
select o, name, date, turnover from
(
select 'x' as o, name, max(date) as date, turnover from commonprofit group by name
union
select 'n' as o, name, min(date) as date, turnover from commonprofit group by name
)
order by o desc, name;
x|1|2011/12|42359
x|2|2011/12|91634
x|3|2009/12|12352
n|1|2007/12|15442
n|2|2008/12|54297
n|3|2007/12|14226
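The ordered query can be tried end to end with a short script. This is a sketch, assuming Python's built-in sqlite3 module and an in-memory copy of the sample table with assumed column types. Note that selecting the bare turnover column next to max(date) or min(date) relies on SQLite-specific behavior, where the bare column takes its value from the row that holds the maximum or minimum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commonprofit (name INTEGER, date TEXT, turnover INTEGER)"
)
conn.executemany(
    "INSERT INTO commonprofit VALUES (?, ?, ?)",
    [
        (1, "2011/12", 42359), (1, "2010/12", 32863), (1, "2009/12", 24293),
        (1, "2008/12", 16436), (1, "2007/12", 15442),
        (2, "2011/12", 91634), (2, "2010/12", 58410), (2, "2009/12", 50668),
        (2, "2008/12", 54297),
        (3, "2009/12", 12352), (3, "2008/12", 12352), (3, "2007/12", 14226),
    ],
)

# The marker column o tags each half of the union so that the outer
# ORDER BY can put the max(date) rows first, then sort by name.
ordered = conn.execute(
    """
    SELECT name, date, turnover FROM (
        SELECT 'x' AS o, name, max(date) AS date, turnover
        FROM commonprofit GROUP BY name
        UNION
        SELECT 'n' AS o, name, min(date) AS date, turnover
        FROM commonprofit GROUP BY name
    )
    ORDER BY o DESC, name
    """
).fetchall()

for row in ordered:
    print(row)
```

Only the explicit ORDER BY makes this order something to rely on; the order of a bare UNION is an implementation detail.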
The reason is that UNION removes records that are duplicated in its subqueries.
To make finding duplicates easier, SQLite sorts the result.
To avoid this step, use UNION ALL instead of UNION.
(This implies that if there is a name with only one date, it will appear twice.)