RANK against multiple columns (Teradata)

Q: Is there a way to qualify a single column, while ignoring the rest of the columns present in the final report output?
I had to move this question to a Teradata-only topic, since a solution already exists for SQL Server.
My final output format looks like (without ranking restriction) :
PRODUCT1 STATE1 CITY_A 20
PRODUCT2 STATE1 CITY_A 10
PRODUCT3 STATE1 CITY_A 25
PRODUCT1 STATE1 CITY_B 20
PRODUCT2 STATE1 CITY_B 10
PRODUCT3 STATE1 CITY_B 1
PRODUCT1 STATE2 CITY_C 20
PRODUCT2 STATE2 CITY_C 10
PRODUCT3 STATE2 CITY_C 1
I'm banging my head over how to qualify the 2 highest-grossing PRODUCTs *overall* in the output produced by a dynamic SQL expression.
Desired output:
PRODUCT1 STATE1 CITY_A 20
PRODUCT2 STATE1 CITY_A 10
PRODUCT1 STATE1 CITY_B 20
PRODUCT2 STATE1 CITY_B 10
PRODUCT1 STATE2 CITY_C 20
PRODUCT2 STATE2 CITY_C 10
PRODUCT3 should not qualify under RANK <= 2, because the ranking is meant to be done overall, not broken down by state and city.
I'd like to see if an analytical function can be used at the external select level where the final formatted output is created.
Using Teradata 14.00, so no access to DENSE_RANK yet...
UPDATE:
Mr.ZLK suggested a non-OLAP solution, which works in SQL Server, but not in TD, unfortunately:
select * from products sqlMain
where product_id in (1,2,3) /* dynamic pre-condition */
and product_id in
(select top 2 product_id from products t1
where sqlMain.product_id=t1.product_id
group by product_id order by sum(total) desc
)
This would give PRODUCT1 and PRODUCT2
select * from products sqlMain
where product_id in (3) /* another dynamic pre-condition */
and product_id in
(select top 2 product_id from products t1
where sqlMain.product_id=t1.product_id
group by product_id order by sum(total) desc
)
This would give PRODUCT3, correctly.

If your Teradata version is 14.10 or higher, you can use the DENSE_RANK analytic function in a QUALIFY clause.
Assuming that the columns of your table are product, state, city and gross, it would be something like this:
SELECT
product,
state,
city,
gross
FROM (SELECT
product,
state,
city,
gross,
SUM(gross) OVER(PARTITION BY product) AS total
FROM yourTable) t
QUALIFY DENSE_RANK() OVER(ORDER BY total DESC) <= 2
;
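Since the question mentions Teradata 14.00, where DENSE_RANK is not yet available, a possible workaround is to rank the per-product totals in a derived table with plain RANK() and join back to the detail rows. This is only a sketch, assuming the same table and column names as above; RANK and DENSE_RANK only differ here if two products tie on total gross:
SELECT t.product,
       t.state,
       t.city,
       t.gross
FROM yourTable t
INNER JOIN (SELECT product
            FROM yourTable
            GROUP BY product
            QUALIFY RANK() OVER(ORDER BY SUM(gross) DESC) <= 2) top2
ON t.product = top2.product;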

Related

How do I get duplicate rows?

I have got a Teradata table; I have attached a part of it for reference.
I need to print out the rows which have exactly the same values.
Table Values:
id Name City Country
1 John Berlin Germany
2 Mike Warsaw Poland
3 Neil London England
1 John Berlin Germany
2 Mike Warsaw Poland
4 Alan Moscow Russia
The output that I am expecting is
id Name City Country
1 John Berlin Germany
2 Mike Warsaw Poland
This might solve your problem.
SELECT *
FROM TableName
group by id, Name, city, country
having count(*) > 1;
#ManguYogi's solution works fine (and is probably more efficient), but I wanted to add another solution, because this is one of those rare cases where EXCEPT ALL is actually useful:
select * from mytab
EXCEPT ALL
select distinct * from mytab
If a row exists more than twice it will be returned multiple times. Of course, if you're interested in the count, you can simply add it to #ManguYogi's SELECT, as shown below.
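For reference, a version of that query with the count added might look like this (same table and column names as in the question):
SELECT id, Name, City, Country, COUNT(*) AS occurrences
FROM TableName
GROUP BY id, Name, City, Country
HAVING COUNT(*) > 1;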
-- this returns the extra copies of rows that share the same id
-- (partition by every column instead if duplicates must match on all columns)
select * from johns_table
qualify row_number() over (partition by id order by id) > 1

Transforming a data frame in R [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 4 years ago.
I have a dataframe in R with client information and sales per product. Product is a field with multiple values. Sales is a separate field. I would like to convert the table so the sales from each product has its own column so that I have one row per client (rather than one row per client per product). I have seen information on how to transpose a table, but this is different. Below are two simplified examples of what I am starting with and the desired end result. The real situation will have many more columns, clients and products.
Starting point:
start <- data.frame(client = c(1,1,1,2,2,2),
                    product = c("Product1","Product2","Product3","Product1","Product2","Product3"),
                    sales = c(100,500,300,200,400,600))
Output:
client product sales
1 1 Product1 100
2 1 Product2 500
3 1 Product3 300
4 2 Product1 200
5 2 Product2 400
6 2 Product3 600
Following is the desired end result:
end <- data.frame(client = c(1,2),
                  Product1 = c(100,200), Product2 = c(500,400),
                  Product3 = c(300,600))
Output:
client Product1 Product2 Product3
1 1 100 500 300
2 2 200 400 600
How can I transform this data from the start to end in R? Thanks in advance for any assistance!
> install.packages("reshape2") # to install 'reshape2'.
> library(reshape2)
> dcast(start, client ~ product)
Using sales as value column: use value.var to override.
client Product1 Product2 Product3
1 1 100 500 300
2 2 200 400 600
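If reshape2 is not an option, the same reshape can be done with tidyr's pivot_wider (a sketch, assuming tidyr >= 1.0.0 is installed; the result is the same data, returned as a tibble):
> library(tidyr)
> pivot_wider(start, names_from = product, values_from = sales)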

LEAD function with GROUP BY in Oracle

I want to use the LEAD function grouped by two columns. Here is my table data:
Id Name_Id Name Item_Id Item_Name date
1 1 Car 1 SUV 1-Jan-2015
2 1 Car 1 SUV 12-March-2015
3 1 Car 1 SUV 20-April-2015
4 1 Car 2 Sport 23-April-2015
5 2 Bike 1 SUV 18-July-2015
6 2 Bike 1 SUV 20-Aug-2015
7 2 Bike 2 Sport 18-Sept-2015
8 2 Bike 3 Honda 20-OCT-2015
And I need the result from the above table to look like this:
Id Name_Id Name Item_Id Item_Name start date end date
1 1 Car 1 SUV 1-Jan-2015 20-April-2015
2 1 Car 2 Sport 20-April-2015 23-April-2015
3 2 Bike 1 SUV 18-July-2015 20-Aug-2015
4 2 Bike 2 Sport 20-Aug-2015 18-Sept-2015
5 2 Bike 3 Honda 18-Sept-2015 20-OCT-2015
Any suggestion is really appreciated.
I don't think you need to use LEAD here. The CTE below computes, for each Name and Item_Id, the earliest and latest dates. This is then joined back to your original table to keep only the records matching the earliest date for each item; the end date is pulled in during the same join.
WITH cte AS (
SELECT Name,
Item_Id,
MIN(date) AS start_date,
MAX(date) AS end_date
FROM yourTable
GROUP BY Name, Item_Id
)
SELECT t1.Id, t1.Name_Id, t1.Name, t1.Item_Id, t1.Item_Name,
t2.start_date,
t2.end_date
FROM yourTable t1
INNER JOIN cte t2
ON t1.Item_Id = t2.Item_Id AND
t1.Name = t2.Name AND
t1.date = t2.start_date
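If the start date of each new item should instead be the end date of the previous item for the same Name (as in the desired output, where Sport for Car starts on 20-April-2015), a LAG over the grouped dates gets closer. This is only a sketch, reusing yourTable from the answer above and keeping the column name date unquoted as written there, although a real Oracle table would need that column quoted or renamed:
WITH grouped AS (
    SELECT Name_Id, Name, Item_Id, Item_Name,
           MIN(date) AS first_date,
           MAX(date) AS end_date
    FROM yourTable
    GROUP BY Name_Id, Name, Item_Id, Item_Name
)
SELECT Name_Id, Name, Item_Id, Item_Name,
       COALESCE(LAG(end_date) OVER (PARTITION BY Name_Id ORDER BY end_date),
                first_date) AS start_date,
       end_date
FROM grouped
ORDER BY Name_Id, end_date;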

Perform JOIN in SQLITE on two SELECT statements from the same table

This is a sample table I have in SQLite:
ID NAME AGE ADDRESS SALARY
1 Paul 32 California 20000.0
2 Allen 25 Texas 15000.0
3 Teddy 23 Norway 20000.0
4 Mark 25 Rich-Mond 65000.0
5 David 27 Texas 85000.0
6 Kim 22 South-Hall 45000.0
7 Paul 32 California 20000.0
8 Allen 25 Texas 15000.0
9 Teddy 23 Norway 20000.0
What I want to achieve is a join on my SQLITE table on these two queries
select AGE, count(*) as SALARYLESSTHAN45 from company where salary < 45000 group by salary
select AGE, count(*) as SALARYMORETHAN45 from company where salary > 45000 group by salary
I tried the following
select AGE, count(*) as SALARYLESSTHAN45 from company where salary < 45000 group by salary ) T1
INNER JOIN
select AGE, count(*) as SALARYMORETHAN45 from company where salary > 45000 group by salary ) T2
ON T1.AGE = T2.AGE
but cannot get this to work...
Can someone share an example of how to achieve this in SQLite?
A join on two different tables would look like this:
SELECT ... FROM Tab1 JOIN Tab2 ON ...
To do the join on the result of a query, you have to replace the table name with a subquery:
select AGE,
SALARYLESSTHAN45,
SALARYMORETHAN45
from (select AGE,
count(*) as SALARYLESSTHAN45
from company
where salary < 45000
group by salary)
join (select AGE,
count(*) as SALARYMORETHAN45
from company
where salary > 45000
group by salary)
using (AGE);
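One detail worth noting: both subqueries group by salary while selecting AGE, so each count is the number of rows per distinct salary, and SQLite fills in AGE with a value from an arbitrary row of that group. If the counts are meant to be per age, grouping by AGE instead is probably what's wanted (a sketch of the same join):
select AGE,
       SALARYLESSTHAN45,
       SALARYMORETHAN45
from (select AGE, count(*) as SALARYLESSTHAN45
      from company
      where salary < 45000
      group by AGE)
join (select AGE, count(*) as SALARYMORETHAN45
      from company
      where salary > 45000
      group by AGE)
using (AGE);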

The logic of WHERE Clause along with > operator and the sub-query

I don't get the logic of query 3 below, and hope someone could give me some insight.
For the query 3,
SELECT ID, NAME, AGE, SALARY FROM COMPANY WHERE AGE > (SELECT AGE FROM COMPANY WHERE SALARY < 20000);
The sub-query first finds the rows where salary < 20000, which is what query 2 below shows. Then, as I understood it, the parent query would compare every age in the table COMPANY (7 records in total: 18, 19, 22, 23, 24, 29, 37) against the ages returned by the sub-query (4 records in total: 18, 19, 23, 29) and show the records with the greater age.
I expected the result to show only ID 7, as below, since only that record meets the condition: the greatest age in the sub-query result (query 2) is 29, and only this record has an age over 29.
ID NAME AGE SALARY
7 Vicky 37 32500.0
Unfortunately my expectation is not met, and I get the result shown under query 3 below.
I hope to understand how the logic works for query 3, and hope someone can assist.
1.sqlite> SELECT ID, NAME, AGE, SALARY FROM COMPANY;
ID NAME AGE SALARY
1 John 24 21000.0
2 Davy 22 20000.0
3 Kenny 19 9700.0
4 Henry 23 13555.0
5 Sam 18 17000.0
6 Ray 29 8000.0
7 Vicky 37 32500.0
2.sqlite> SELECT ID, NAME, AGE, SALARY FROM COMPANY WHERE SALARY < 20000;
ID NAME AGE SALARY
3 Kenny 19 9700.0
4 Henry 23 13555.0
5 Sam 18 17000.0
6 Ray 29 8000.0
3.sqlite> SELECT ID, NAME, AGE, SALARY FROM COMPANY WHERE AGE > (SELECT AGE FROM COMPANY WHERE SALARY < 20000);
ID NAME AGE SALARY
1 John 24 21000.0
2 Davy 22 20000.0
4 Henry 23 13555.0
6 Ray 29 8000.0
7 Vicky 37 32500.0
At a guess, since it doesn't throw an error (which seems a better idea; see also Col. 32's comment):
SQLite just picks the first returned age. Which age comes first is not guaranteed, but going by the results shown in your query 2 and assuming some consistency, the first result is likely 19. It then picks all ages larger than 19, which is what you see in the results of query 3.
Shuffle things around or create another set of data, and see whether what you then get from queries 2 and 3 is still consistent with this assumption.
Someone else may know the internals of SQLite well enough to explain why this happens.
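If the intent was to compare against the greatest age returned by the sub-query, which would give only ID 7 as expected, making that explicit with MAX avoids depending on which row SQLite happens to pick:
SELECT ID, NAME, AGE, SALARY
FROM COMPANY
WHERE AGE > (SELECT MAX(AGE) FROM COMPANY WHERE SALARY < 20000);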
