SQL Concatenate multiple rows - Teradata

I'm using Teradata. I have a table like this:
ID String
123 Jim
123 John
123 Jane
321 Jill
321 Janine
321 Johan
I want to query the table so that I get:
ID String
123 Jim, John, Jane
321 Jill, Janine, Johan
I tried partitioning, but there can be many names.
How do I get this result? Even a pointer in the right direction would be great.

Unfortunately there's no PIVOT in Teradata (only a TD_UNPIVOT in 14.10).
If you're lucky, there's an aggregate UDF installed at your site to do a group concat (probably a low possibility).
Otherwise there are two options: recursion or aggregation.
If the maximum number of rows per id is known, aggregation is normally faster. It's a lot of code, but most of it is just cut & paste.
SELECT
id,
MAX(CASE WHEN rn = 1 THEN string END)
|| MAX(CASE WHEN rn = 2 THEN ',' || string ELSE '' END)
|| MAX(CASE WHEN rn = 3 THEN ',' || string ELSE '' END)
|| MAX(CASE WHEN rn = 4 THEN ',' || string ELSE '' END)
|| ... -- repeat up to the known maximum
FROM
(
SELECT
id, string,
ROW_NUMBER()
OVER (PARTITION BY id
ORDER BY string) AS rn
FROM t
) AS dt
GROUP BY 1;
For large tables it's much more efficient to materialize the result of the derived table in a Volatile Table first, using the GROUP BY column as the PI, as sketched below.
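A minimal sketch of that materialization, assuming the same source table t (vt_rn is an illustrative name):
CREATE VOLATILE TABLE vt_rn AS
(
  SELECT
    id, string,
    ROW_NUMBER()
      OVER (PARTITION BY id
            ORDER BY string) AS rn
  FROM t
) WITH DATA
PRIMARY INDEX (id) -- the GROUP BY column as PI keeps each group on a single AMP
ON COMMIT PRESERVE ROWS;
The MAX(CASE ...) aggregation then simply selects FROM vt_rn instead of the derived table, so no redistribution is needed for the GROUP BY.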
For recursion you should use a Volatile Table, too, as OLAP functions are not allowed in the recursive part. Using a view instead will repeatedly calculate the OLAP function and thus result in bad performance.
CREATE VOLATILE TABLE vt AS
(
SELECT
id
,string
,ROW_NUMBER()
OVER (PARTITION BY id
ORDER BY string DESC) AS rn -- reverse order!
,COUNT(*)
OVER (PARTITION BY id) AS cnt
FROM t
) WITH DATA
UNIQUE PRIMARY INDEX(id, rn)
ON COMMIT PRESERVE ROWS;
WITH RECURSIVE cte
(id, list, rn) AS
(
SELECT
id
,CAST(string AS VARCHAR(1000)) -- define maximum size based on maximum number of rows
,rn
FROM vt
WHERE rn = cnt
UNION ALL
SELECT
vt.id
,cte.list || ',' || vt.string
,vt.rn
FROM vt
JOIN cte
ON vt.id = cte.id
AND vt.rn = cte.rn - 1
)
SELECT id, list
FROM cte
WHERE rn = 1;
There's one problem with this approach: it might need a lot of spool, which is easy to see when you omit the WHERE rn = 1.
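For the sample id 123, the unfiltered CTE keeps every intermediate prefix (with the DESC ordering above):
id  list           rn
123 Jane           3
123 Jane,Jim       2
123 Jane,Jim,John  1
so spool grows roughly quadratically with the number of rows per id.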

SELECT ID,
TRIM(TRAILING ',' FROM (XMLAGG(TRIM(String)|| ',' ORDER BY String) (VARCHAR(10000)))) as Strings
FROM db.table
GROUP BY 1
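Applied to the sample table at the top of this question, this returns one row per ID with the names sorted by the ORDER BY String clause:
ID  Strings
123 Jane,Jim,John
321 Janine,Jill,Johan
The TRIM(TRAILING ',' ...) strips the separator that is appended after the last element.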

SQL Server 2017+ and SQL Azure: STRING_AGG
Starting with SQL Server 2017 and Azure SQL Database, we can finally concatenate across rows without having to resort to any variable or XML witchery.
STRING_AGG (Transact-SQL)
SELECT ID, STRING_AGG(String, ', ') AS Strings
FROM TableName
GROUP BY ID
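If the order of the concatenated values matters, STRING_AGG also accepts a WITHIN GROUP clause (SQL Server 2017+):
SELECT ID, STRING_AGG(String, ', ') WITHIN GROUP (ORDER BY String) AS Strings
FROM TableName
GROUP BY ID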

Related

Grouping query in Redshift takes a huge amount of time

I have the following requirement: I have a table in the following format,
and this is what I want it to be transformed into:
Basically I want the number of users for each combination of activities.
I want this format because I want to create a TreeMap visualization out of it.
This is what I have done till now.
First, find the number of users per activity grouping:
WITH lookup AS
(
SELECT listagg(name,',') AS groupings,
processed_date,
guid
FROM warehouse.test
GROUP BY processed_date,
guid
)
SELECT groupings AS activity_groupings,
LENGTH(groupings) -LENGTH(REPLACE(groupings,',','')) + 1 AS count,
processed_date,
COUNT( guid) AS users
FROM lookup
GROUP BY processed_date,
groupings
I put the results in a separate table.
Then I do a split and coalesce like this:
SELECT grouping_1,
       grouping_2,
       grouping_3,
       SUM(num_users) AS num_users
FROM (SELECT NULLIF(SPLIT_PART(groupings,',', 1),'') AS grouping_1,
             COALESCE(NULLIF(SPLIT_PART(groupings,',', 2),''), grouping_1) AS grouping_2,
             COALESCE(NULLIF(SPLIT_PART(groupings,',', 3),''), grouping_2, grouping_1) AS grouping_3,
             num_users
      FROM warehouse.groupings) AS expr_qry
GROUP BY grouping_1,
         grouping_2,
         grouping_3
The problem is that the first query takes more than 90 minutes to execute, as I have more than 250M rows.
There must be a better and more efficient way to do this.
Any heads-up would be greatly appreciated.
Thanks
You do not need to use complex string manipulation functions (LISTAGG(), SPLIT_PART()). You can achieve what you're after with the ROW_NUMBER() function and simple aggregates.
-- Create sample data
CREATE TEMP TABLE test_data (id, guid, name)
AS SELECT 1::INT, 1::INT, 'cooking'
UNION ALL SELECT 2::INT, 1::INT, 'cleaning'
UNION ALL SELECT 3::INT, 2::INT, 'washing'
UNION ALL SELECT 4::INT, 4::INT, 'cooking'
UNION ALL SELECT 6::INT, 5::INT, 'cooking'
UNION ALL SELECT 7::INT, 3::INT, 'cooking'
UNION ALL SELECT 8::INT, 3::INT, 'cleaning'
;
-- Assign a row number to each name per guid
WITH name_order AS (
SELECT guid
, name
, ROW_NUMBER() OVER(PARTITION BY guid ORDER BY id) row_n
FROM test_data
) -- Use MAX() to collapse each guid's data to 1 row
, groupings AS (
SELECT guid
, MAX(CASE WHEN row_n = 1 THEN name END) grouping_1
, MAX(CASE WHEN row_n = 2 THEN name END) grouping_2
FROM name_order
GROUP BY guid
) -- Count the guids per each grouping
SELECT grouping_1
, COALESCE(grouping_2, grouping_1) AS grouping_2
, COUNT(guid) num_users
FROM groupings
GROUP BY 1,2
;
-- Output
grouping_1 | grouping_2 | num_users
------------+------------+-----------
washing | washing | 1
cooking | cleaning | 2
cooking | cooking | 2
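If you also need a third activity per guid, as in the original SPLIT_PART query, the pattern extends with one more conditional aggregate in the groupings CTE, a sketch:
, MAX(CASE WHEN row_n = 3 THEN name END) grouping_3
and a matching fallback in the final SELECT, mirroring the original COALESCE chain:
, COALESCE(grouping_3, grouping_2, grouping_1) AS grouping_3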

Cannot replace a string with several random strings taken from another table in sqlite

I'm trying to replace a placeholder string inside a selection of 10 random records with a random string (a name) taken from another table, using only sqlite statements.
I've used replace() with a subquery to substitute the placeholder. I thought each subquery invocation would load a fresh random name from the names table, but I've found that's not the case and each placeholder is replaced with the same string.
select id, (replace (snippet, "%NAME%", (select
name from names
where gender = "male"
) )
) as snippet
from imagedata
where timestamp is not NULL
order by random()
limit 10
I was expecting each row of the SELECT to get a different random replacement every time the subquery is invoked.
hello i'm %NAME% and this is my house
This is the car of %NAME%, let me know what you think
Instead, each row has the same replacement:
hello i'm david and this is my house
This is the car of david, let me know what you think
and so on...
I'm not sure whether it can be done inside sqlite or if I have to do it in PHP over two different database queries.
Thanks in advance!
Seems that random() in the subquery is only evaluated once.
Try this:
select
  i.id,
  replace(i.snippet, '%NAME%', n.name) snippet
from (
  select
    id,
    snippet,
    -- pick a random rank between 1 and the number of male names,
    -- evaluated once per row of imagedata
    abs(random()) % (select count(*) from names where gender = 'male') + 1 num
  from imagedata
  where timestamp is not NULL
  order by random() limit 10
) i inner join (
  select
    n.name,
    -- rank each male name 1..N alphabetically (assumes names are unique)
    (select count(*) from names where name < n.name and gender = 'male') + 1 num
  from names n
  where gender = 'male'
) n on n.num = i.num

Counting 2 Columns from Separate Tables Between 2 Dates

I have a query that counts 2 columns from 2 separate tables using subqueries, which works. Now I have to add to this query the ability to filter these results based on the Date of a Call record. Here is the query I am working with:
SELECT (m.FirstName || " " || m.LastName) AS Members,
(
SELECT count(CallToLineOfficers.MemberID)
FROM CallToLineOfficers
WHERE CallToLineOfficers.MemberID = m.MemberID
)
+ (
SELECT count(CallToMembers.MemberID)
FROM CallToMembers
WHERE CallToMembers.MemberID = m.MemberID
) AS Tally
FROM Members AS m, Call, CallToMembers, CallToLineOfficers
Join Call on CallToMembers.CallID = Call.CallID
and CallToLineOfficers.CallID = Call.CallID
WHERE m.FirstName <> 'None'
-- and Call.Date between '2017-03-21' and '2017-03-22'
GROUP BY m.MemberID
ORDER BY m.LastName ASC;
OK, so the table Call stores the Date and its PK is CallID. Both CallToLineOfficers and CallToMembers are bridge tables that contain only CallID and MemberID. With the current query, where the Date is commented out, all names are returned, but with that Date range applied a count of 1 should appear under only 1 person's name.
I have tried joining Call.CallID with both Bridge Tables' CallIDs without any luck, though I think this is the right way to do it. Could someone help point me in the right direction? I am lost. (I tried explaining this the best I could, so if you need more info, let me know.)
UPDATED: Here is a screenshot of what I am getting:
Based on the date provided in the sample, the new results, with the Date filter, should be:
Bob Clark - 1
Rob Catalano - 1
Matt Butler - 1
Danielle Davidson - 1
Jerry Chuska - 1
Tom Cramer - 1
Everyone else should be 0.
At the moment, the subqueries filter only on the member ID. So for any member ID in the outer query, they return the full count.
To reduce the count, you have to filter in the subqueries:
SELECT (FirstName || " " || LastName) AS Members,
(
SELECT count(*)
FROM CallToLineOfficers
JOIN Call USING (CallID)
WHERE MemberID = m.MemberID
AND Date BETWEEN '2017-03-21' AND '2017-03-22'
)
+ (
SELECT count(*)
FROM CallToMembers
JOIN Call USING (CallID)
WHERE MemberID = m.MemberID
AND Date BETWEEN '2017-03-21' AND '2017-03-22'
) AS Tally
FROM Members AS m
WHERE FirstName <> 'None'
ORDER BY LastName ASC;
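An alternative single-pass sketch (assuming the engine lets you select the name columns while grouping by the key, as SQLite does): merge the two bridge tables with UNION ALL and join the date-filtered calls once:
SELECT (m.FirstName || " " || m.LastName) AS Members,
       COUNT(c.CallID) AS Tally
FROM Members AS m
LEFT JOIN (
    SELECT MemberID, CallID FROM CallToLineOfficers
    UNION ALL
    SELECT MemberID, CallID FROM CallToMembers
) AS b ON b.MemberID = m.MemberID
LEFT JOIN Call AS c ON c.CallID = b.CallID
    AND c.Date BETWEEN '2017-03-21' AND '2017-03-22'
WHERE m.FirstName <> 'None'
GROUP BY m.MemberID
ORDER BY m.LastName ASC;
The date condition has to stay in the join rather than the WHERE clause, so that members with no calls in the range still appear with a Tally of 0.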

Query fails to execute after converting a column from Varchar2 to CLOB

I have an Oracle query:
select id from (
select ID, ROW_NUMBER() over (partition by LATEST_RECEIPT order by ID) rownumber
from Table
where LATEST_RECEIPT in
(
select LATEST_RECEIPT from Table
group by LATEST_RECEIPT
having COUNT(1) > 1
)
) t
where rownumber <> 1;
The data type of LATEST_RECEIPT was earlier VARCHAR2(4000) and this query worked fine. Since the length of the column needed to be extended, I changed it to CLOB, after which the query fails. Could anyone help me fix this issue or provide a workaround?
You can change your inner query to look for other rows with the same LATEST_RECEIPT value but a different ID (assuming ID is unique); if another row exists then that is equivalent to your count returning greater than one. But you can't simply test two CLOB values for equality; you need to use dbms_lob.compare:
select ID
from your_table t1
where exists (
select null from your_table t2
where dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
and t2.ID != t1.ID
-- or if ID isn't unique: and t2.ROWID != t1.ROWID
);
Applying the row number filter is trickier, as you also can't use a CLOB in the analytic partition by clause. As André Schild suggested, you can use a hash; here passing the integer value 3, which is the equivalent of dbms_crypto.hash_sh1 (though in theory that could change in a future release!):
select id from (
select ID, ROW_NUMBER() over (partition by dbms_crypto.hash(LATEST_RECEIPT, 3)
order by ID) rownumber
from your_table t1
where exists (
select null from your_table t2
where dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
and t2.ID != t1.ID
-- or if ID isn't unique: and t2.ROWID != t1.ROWID
)
)
where rownumber > 1;
It is of course possible to get a hash collision, and if that happened - you had two latest_receipt values which both appeared more than once and both hashed to the same value - then you could get too many rows back. That seems pretty unlikely, but it's something to consider.
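Note that EXECUTE on DBMS_CRYPTO is not granted to PUBLIC by default, so the hash variant may first need a one-off grant from a DBA (your_user is a placeholder):
GRANT EXECUTE ON SYS.DBMS_CRYPTO TO your_user;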
So rather than ordering, you can simply look for rows which have the same LATEST_RECEIPT and a lower ID:
select ID
from your_table t1
where exists (
select null from your_table t2
where dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
and t2.ID < t1.ID
);
Again, that assumes ID is unique. If it isn't, then you could still use ROWID instead, but you would have less control over which rows were found - the lowest ROWID isn't necessarily the lowest ID. Presumably you're using this to find rows to delete. If you actually don't mind which row you keep and which you delete, then you could still do:
and t2.ROWID < t1.ROWID
But since you are currently ordering, that probably isn't acceptable, and hashing might be preferable, despite the small risk.
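If deleting the duplicates is the end goal, the EXISTS form translates directly into a DELETE; a sketch, again assuming ID is unique:
DELETE FROM your_table t1
WHERE EXISTS (
  SELECT null FROM your_table t2
  WHERE dbms_lob.compare(t2.LATEST_RECEIPT, t1.LATEST_RECEIPT) = 0
  AND t2.ID < t1.ID
);
This keeps the row with the lowest ID in each group of equal LATEST_RECEIPT values.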

PL/SQL - Oracle 9i - Manual Pivoting

We have a table which has three columns in it:
Customer_name, Age_range, Number_of_people.
1 1-5 10
1 5-10 15
We need to return the number of people in each age range as columns of a single row. If we search for customer #1, the query should return just one row:
Age Range (1-5)  Age Range (5-10)
10               15
That is, when I query for customer 1, the result should be the number of people per age_range, all in a single row.
What would be the best way to approach this?
You need to manually perform a pivot:
SELECT SUM(CASE WHEN age_range = '1-5'
                THEN number_of_people
                ELSE NULL END) AS nop_1_5,
       SUM(CASE WHEN age_range = '5-10'
                THEN number_of_people
                ELSE NULL END) AS nop_5_10
FROM customers
WHERE customer_name = 1;
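To pivot for all customers at once, drop the filter and group by the customer; a sketch (a CASE without ELSE defaults to NULL):
SELECT customer_name,
       SUM(CASE WHEN age_range = '1-5' THEN number_of_people END) AS nop_1_5,
       SUM(CASE WHEN age_range = '5-10' THEN number_of_people END) AS nop_5_10
FROM customers
GROUP BY customer_name;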
There are easy solutions with 10g and 11g using LISTAGG, COLLECT, or other capabilities added after 9i, but I believe that the following will work in 9i.
Source: http://www.williamrobertson.net/documents/one-row.html
You will just need to replace deptno with customer_name and ename with number_of_people (an adapted version follows the example output below).
SELECT deptno,
       LTRIM(SYS_CONNECT_BY_PATH(ename, ','), ',') AS concatenated
FROM ( SELECT deptno,
              ename,
              ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY ename) AS seq
       FROM emp )
WHERE connect_by_isleaf = 1
CONNECT BY seq = PRIOR seq + 1 AND deptno = PRIOR deptno
START WITH seq = 1;
DEPTNO CONCATENATED
---------- --------------------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
3 rows selected.
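Adapted to the table in this question (the_table is a placeholder name), the substitution described above becomes:
SELECT customer_name,
       LTRIM(SYS_CONNECT_BY_PATH(number_of_people, ','), ',') AS concatenated
FROM ( SELECT customer_name,
              number_of_people,
              ROW_NUMBER() OVER (PARTITION BY customer_name ORDER BY age_range) AS seq
       FROM the_table )
WHERE connect_by_isleaf = 1
CONNECT BY seq = PRIOR seq + 1 AND customer_name = PRIOR customer_name
START WITH seq = 1;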
This will create a stored FUNCTION which means you can access it at any time.
CREATE OR REPLACE FUNCTION number_of_people(p_customer_name VARCHAR2)
RETURN VARCHAR2
IS
v_number_of_people NUMBER;
v_result VARCHAR2(500);
CURSOR c1
IS
SELECT Number_of_people FROM the_table WHERE Customer_name = p_customer_name;
BEGIN
OPEN c1;
LOOP
FETCH c1 INTO v_number_of_people;
EXIT WHEN c1%NOTFOUND;
v_result := v_result || v_number_of_people || ' ' || CHR(13);
END LOOP;
CLOSE c1;
RETURN v_result;
END;
To run it, use:
SELECT number_of_people(1) FROM dual;
Hope this helps, and please let me know if there are any errors; I didn't test-run the function myself.
Just do
select Number_of_people
from table
where Customer_name = 1
Are we missing some detail?
