I have a coworker who doesn't want to include null rows in a percentile rank. The default Teradata function seems to just treat null as the lowest number in the set, so I decided to do the math manually. I started with the query below to test out my equation:
drop table tmp;
create multiset volatile table tmp (
num byteint
) primary index (num)
on commit preserve rows
;
insert into tmp
values (1)
;insert into tmp
values (2)
;insert into tmp
values (1)
;insert into tmp
values (4)
;insert into tmp
values (null)
;insert into tmp
values (4)
;insert into tmp
values (null)
;insert into tmp
values (2)
;insert into tmp
values (9)
;insert into tmp
values (null)
;insert into tmp
values (10)
;insert into tmp
values (10)
;insert into tmp
values (11)
;
select
num,
case
when num is null then 0
else cast(dense_rank() over (partition by case when num is not null then 1 else 2 end order by num) as number)
end as str_rnk,
q.nn,
str_rnk/q.nn as pct_rnk
from tmp
cross join (
select cast(count(num) as number) as nn from tmp
) q
order by num
;
So what I expect to see in the result set is this:
num str_rnk nn pct_rnk
null 0 10 0
null 0 10 0
null 0 10 0
1 1 10 0.1
1 1 10 0.1
2 2 10 0.2
2 2 10 0.2
4 3 10 0.3
4 3 10 0.3
9 4 10 0.4
10 5 10 0.5
10 5 10 0.5
But I'm getting a result that looks like it did a regular rank instead of a dense_rank, like this:
num str_rnk nn pct_rnk
null 0 10 0
null 0 10 0
null 0 10 0
1 1 10 0.1
1 1 10 0.1
2 2 10 0.3
2 2 10 0.3
4 3 10 0.5
4 3 10 0.5
9 4 10 0.7
10 5 10 0.8
10 5 10 0.8
I know I could set the rank in a subquery and it would calculate the way I expect it to, but why isn't it doing it the way I have it now?
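For reference, this is the kind of subquery rewrite I mean (a sketch; it just moves the ranking into a derived table before dividing):
select
d.num,
d.str_rnk,
q.nn,
d.str_rnk/q.nn as pct_rnk
from (
select
num,
case
when num is null then 0
else cast(dense_rank() over (partition by case when num is not null then 1 else 2 end order by num) as number)
end as str_rnk
from tmp
) d
cross join (
select cast(count(num) as number) as nn from tmp
) q
order by d.num
;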
While this doesn't answer your question: it's not the division that's the issue; it seems to be some oddball problem with running that CAST and DENSE_RANK twice in the same SELECT.
Consider:
select
num,
case
when num is null then 0
else cast(dense_rank() over (partition by case when num is not null then 1 else 2 end order by num) as number)
end as str_rnk,
case
when num is null then 0
else cast(dense_rank() over (partition by case when num is not null then 1 else 2 end order by num) as number)
end as str_rnk2
from tmp
cross join (
select cast(count(num) as number) as nn from tmp
) q;
+--------+---------+----------+
| num | str_rnk | str_rnk2 |
+--------+---------+----------+
| 1 | 1 | 1 |
| 1 | 1 | 1 |
| 2 | 2 | 3 |
| 2 | 2 | 3 |
| 4 | 3 | 5 |
| 4 | 3 | 5 |
| 9 | 4 | 7 |
| 10 | 5 | 8 |
| 10 | 5 | 8 |
| 11 | 6 | 10 |
| <null> | 0 | 0 |
| <null> | 0 | 0 |
| <null> | 0 | 0 |
+--------+---------+----------+
Since the CAST isn't necessary here:
select
num,
case
when num is null then 0
else dense_rank() over (partition by case when num is not null then 1 else 2 END order by num)
end as str_rnk,
case
when num is null then 0
else dense_rank() over (partition by case when num is not null then 1 else 2 END order by num)
end as str_rnk2
from tmp
cross join (
select cast(count(num) as number) as nn from tmp
) q;
+--------+---------+----------+
| num | str_rnk | str_rnk2 |
+--------+---------+----------+
| 1 | 1 | 1 |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 2 | 2 | 2 |
| 4 | 3 | 3 |
| 4 | 3 | 3 |
| 9 | 4 | 4 |
| 10 | 5 | 5 |
| 10 | 5 | 5 |
| 11 | 6 | 6 |
| <null> | 0 | 0 |
| <null> | 0 | 0 |
| <null> | 0 | 0 |
+--------+---------+----------+
Your query, with a quick rewrite:
select
num,
case
when num is null then 0
else dense_rank() over (partition by num * 0 order by num)
end as str_rnk,
str_rnk * 1.0/COUNT(*) OVER (PARTITION BY num * 0) as pct_rnk
from tmp
order by num
;
+--------+---------+---------+
| num | str_rnk | pct_rnk |
+--------+---------+---------+
| <null> | 0 | 0.0 |
| <null> | 0 | 0.0 |
| <null> | 0 | 0.0 |
| 1 | 1 | 0.1 |
| 1 | 1 | 0.1 |
| 2 | 2 | 0.2 |
| 2 | 2 | 0.2 |
| 4 | 3 | 0.3 |
| 4 | 3 | 0.3 |
| 9 | 4 | 0.4 |
| 10 | 5 | 0.5 |
| 10 | 5 | 0.5 |
| 11 | 6 | 0.6 |
+--------+---------+---------+
Or if you want to get the CASE expression out of there completely (note the COALESCE, so the null rows still come out as 0 instead of NULL):
select
num,
coalesce(dense_rank() over (partition by num * 0 order by num) * (num * 0 + 1.0), 0) as str_rnk,
str_rnk/COUNT(*) OVER (PARTITION BY num * 0) as pct_rnk
from tmp
order by num;
As JNevill noted, this is a bug; you should open an incident with Teradata support:
SELECT
num,
-- cast to FLOAT or DECIMAL works as expected
Cast(Dense_Rank() Over (ORDER BY num) AS NUMBER) AS a,
a AS b
FROM tmp
num a b
---- --- ---
? 1 1
? 1 1
? 1 1
1 2 4
1 2 4
2 3 6
2 3 6
4 4 8
4 4 8
9 5 10
10 6 11
10 6 11
11 7 13
But adding QUALIFY a<>b returns an empty result :-)
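For reference, that check is just the same SELECT with a QUALIFY clause bolted on (a sketch against the same tmp table):
SELECT
num,
Cast(Dense_Rank() Over (ORDER BY num) AS NUMBER) AS a,
a AS b
FROM tmp
QUALIFY a <> b -- comes back empty, even though the plain SELECT prints rows where a and b differ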
The original calculation for PERCENT_RANK is based on
Cast(Rank() Over (ORDER BY num) -1 AS DEC(18,6)) / Count(*) Over ()
If you want to exclude NULLs you can switch to Count(num) and NULLS LAST:
SELECT
num,
CASE
WHEN num IS NOT NULL
THEN Cast(Dense_Rank() Over (ORDER BY num NULLS LAST) AS DECIMAL(18,6))
ELSE 0
END AS str_rnk,
str_rnk / Count(num) Over ()
FROM tmp
Or using that slick num * 0 trick:
SELECT
num,
Coalesce(Dense_Rank()
Over (ORDER BY num NULLS LAST)
* (num * 0 +1.000000), 0) AS str_rnk,
str_rnk / Count(num) Over ()
FROM tmp
Related
I want to subtract values in the "place" column for each record returned in a "race", "bib", "split" group by so that a "diff" column appears like so.
Desired Output:
race | bib | split | place | diff
----------------------------------
10 | 514 | 1 | 5 | 0
10 | 514 | 2 | 3 | 2
10 | 514 | 3 | 2 | 1
10 | 17 | 1 | 8 | 0
10 | 17 | 2 | 12 | -4
10 | 17 | 3 | 15 | -3
I'm new to using the coalesce statement, and the closest I have come to the desired output is the following:
select a.race,a.bib,a.split, a.place,
coalesce(a.place -
(select b.place from ranking b where b.split < a.split), a.place) as diff
from ranking a
group by race,bib, split
which produces:
race | bib | split | place | diff
----------------------------------
10 | 514 | 1 | 5 | 5
10 | 514 | 2 | 3 | 2
10 | 514 | 3 | 2 | 1
10 | 17 | 1 | 8 | 8
10 | 17 | 2 | 12 | 11
10 | 17 | 3 | 15 | 14
Thanks for looking!
To compute the difference, you have to look up the value in the row that has the same race and bib values, and the next-smaller split value:
SELECT race, bib, split, place,
coalesce((SELECT r2.place
FROM ranking AS r2
WHERE r2.race = ranking.race
AND r2.bib = ranking.bib
AND r2.split < ranking.split
ORDER BY r2.split DESC
LIMIT 1
) - place,
0) AS diff
FROM ranking;
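If your SQL engine supports window functions, the same lookup can be written with LAG() instead of a correlated subquery (a sketch, assuming a version with window-function support, e.g. SQLite 3.25+):
SELECT race, bib, split, place,
       coalesce(lag(place) OVER (PARTITION BY race, bib ORDER BY split) - place,
                0) AS diff
FROM ranking;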
I have table1 like this.
id | name | sum1     | sum2 | bonus
---|------|----------|------|------
9  | X    | 225      | 0,68 | 3
10 | X    | 30       | 0,85 | 3
11 | X    | 3384,73  | 0,8  | 3
15 | Y    | 2800     | 2    | 3
16 | Y    | 500      | 0    | 0
17 | Y    | 2077,49  | 0,8  | 3
18 | Y    | 26736,96 | 0,7  | 8
19 | Z    | 209,9    | 1,5  | 3
20 | Z    | 700      | 1    | 3
21 | Z    | 6550     | 0    | 0
I want to sum the bonus column for each "name" subgroup and get table2 as the query result:
id     | name | sum1     | sum2 | bonus
-------|------|----------|------|------
9      | X    | 225      | 0,68 | 3
10     | X    | 30       | 0,85 | 3
11     | X    | 3384,73  | 0,8  | 3
totalX | null | null     | null | 9
15     | Y    | 2800     | 2    | 3
16     | Y    | 500      | 0    | 0
17     | Y    | 2077,49  | 0,8  | 3
18     | Y    | 26736,96 | 0,7  | 8
totalY | null | null     | null | 14
19     | Z    | 209,9    | 1,5  | 3
20     | Z    | 700      | 1    | 3
21     | Z    | 6550     | 0    | 0
totalZ | null | null     | null | 6
I did try "over (partition by)":
SELECT table1.*, sum(bonus) over (PARTITION by name) as bonus_total FROM table1
It got me an extra column with the bonus sum for each subgroup, but this is not exactly what I want:
id | name | sum1     | sum2 | bonus | bonus_total
---|------|----------|------|-------|------------
9  | X    | 225      | 0,68 | 3     | 9
10 | X    | 30       | 0,85 | 3     | 9
11 | X    | 3384,73  | 0,8  | 3     | 9
15 | Y    | 2800     | 2    | 3     | 14
16 | Y    | 500      | 0    | 0     | 14
17 | Y    | 2077,49  | 0,8  | 3     | 14
18 | Y    | 26736,96 | 0,7  | 8     | 14
19 | Z    | 209,9    | 1,5  | 3     | 6
20 | Z    | 700      | 1    | 3     | 6
21 | Z    | 6550     | 0    | 0     | 6
You can do this by doing a partial group by rollup plus some conditional clauses:
with table1 as (select 9 id, 'X' name, 225 sum1, 0.68 sum2, 3 bonus from dual union all
select 10 id, 'X' name, 30 sum1, 0.85 sum2, 3 bonus from dual union all
select 11 id, 'X' name, 3384.73 sum1, 0.8 sum2, 3 bonus from dual union all
select 15 id, 'Y' name, 2800 sum1, 2 sum2, 3 bonus from dual union all
select 16 id, 'Y' name, 500 sum1, 0 sum2, 0 bonus from dual union all
select 17 id, 'Y' name, 2077.49 sum1, 0.8 sum2, 3 bonus from dual union all
select 18 id, 'Y' name, 26736.96 sum1, 0.7 sum2, 8 bonus from dual union all
select 19 id, 'Z' name, 209.9 sum1, 1.5 sum2, 3 bonus from dual union all
select 20 id, 'Z' name, 700 sum1, 1 sum2, 3 bonus from dual union all
select 21 id, 'Z' name, 6550 sum1, 0 sum2, 0 bonus from dual)
select case when id is null then 'total'||name else to_char(id) end id,
case when id is not null then name end name,
case when id is not null then sum(sum1) end sum1,
case when id is not null then sum(sum2) end sum2,
sum(bonus) bonus
from table1 t1
group by name, rollup (id)
order by t1.name, t1.id;
ID NAME SUM1 SUM2 BONUS
-------- ---- ---------- ---------- ----------
9 X 225 .68 3
10 X 30 .85 3
11 X 3384.73 .8 3
totalX 9
15 Y 2800 2 3
16 Y 500 0 0
17 Y 2077.49 .8 3
18 Y 26736.96 .7 8
totalY 14
19 Z 209.9 1.5 3
20 Z 700 1 3
21 Z 6550 0 0
totalZ 6
The CASE expressions are required purely to get the formatting you asked for. I had to wrap the sum1 and sum2 columns in SUM() to get them to appear in the results as you wanted; the CASE then blanks them out to null on the total rows.
Also, I am assuming that the id column is set to disallow null values.
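If you'd rather not depend on that assumption, Oracle's GROUPING() function identifies the rollup rows directly; a sketch of the same query with that one change:
select case when grouping(id) = 1 then 'total'||name else to_char(id) end id,
       case when grouping(id) = 0 then name end name,
       case when grouping(id) = 0 then sum(sum1) end sum1,
       case when grouping(id) = 0 then sum(sum2) end sum2,
       sum(bonus) bonus
from table1 t1
group by name, rollup (id)
order by t1.name, t1.id;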
I have the following tables:
book_tbl:
book_instance_id | book_type_id | library_instance_id | location_id | book_index
1 | 70000 | 2 | 0 | 1
2 | 70000 | 2 | 0 | 2
3 | 70000 | 2 | 0 | 3
4 | 70000 | 3 | 0 | 1
5 | 70000 | 3 | 0 | 2
6 | 70000 | 3 | 0 | 3
7 | 70000 | 4 | 1 | 1
8 | 70000 | 4 | 1 | 2
9 | 70000 | 4 | 1 | 3
and library_tbl:
library_instance_id | library_type_id | location_id
2 | 1000 | 0
3 | 1001 | 0
4 | 1000 | 1
I would like to update the field book_type_id in book_tbl only for the first element (book_index = 1) of libraries with library_type_id 1000.
To retrieve this information I used this SQLite query:
SELECT * FROM ( ( SELECT *
FROM library_tbl
WHERE library_type_id=1000 ) t1
join book_tbl t2 on t1.location_id=t2.location_id
AND t1.library_instance_id=t2.library_instance_id
AND book_index=1 )
How could I use the query above in an UPDATE statement to update rows 1 and 7?
UPDATE book_tbl SET book_type_id=15000 WHERE ????
Use EXISTS with a correlated subquery to check whether the corresponding library row exists:
UPDATE book_tbl
SET book_type_id = 15000
WHERE EXISTS (SELECT 1
FROM library_tbl
WHERE library_type_id = 1000
AND location_id = book_tbl.location_id
AND library_instance_id = book_tbl.library_instance_id)
AND book_index = 1;
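As a quick sanity check afterwards, you can confirm which rows changed (a hypothetical verification query, assuming no other rows already had that type):
SELECT book_instance_id, book_type_id
FROM book_tbl
WHERE book_type_id = 15000;
-- expected: book_instance_id 1 and 7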
The following query works in SQL Server but not in SQLite 3.8.7 and I would like to know why.
Table
l | r
0 | 10
0 | 2
8 | 10
Query
SELECT * FROM Segments AS s1
LEFT JOIN Segments AS s2
ON ((s2.l <= s1.l AND s2.r > s1.r)
OR (s2.l < s1.l AND s2.r >= s1.r));
Expected output
s1.l | s1.r | s2.l | s2.r
0 | 10 | null | null
0 | 2 | 0 | 10
8 | 10 | 0 | 10
However I got
s1.l | s1.r | s2.l | s2.r
0 | 10 | 0 | 2
0 | 2 | 0 | 10
8 | 10 | 0 | 10
And when I switched the expression order, i.e.
((s2.l < s1.l AND s2.r >= s1.r) OR (s2.l <= s1.l AND s2.r > s1.r))
I got
s1.l | s1.r | s2.l | s2.r
0 | 10 | 8 | 10
0 | 2 | 0 | 10
8 | 10 | 0 | 10
This was solved by using | instead of OR, but I am wondering why OR did not work?
Here's the example on SQLFiddle:
http://sqlfiddle.com/#!7/15859/22/1
Thanks
This is a bug that was fixed in SQLite 3.8.7.2.
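Until you can upgrade, the OR can also be factored out of the join condition entirely. The two branches differ only in which comparison is strict, so they collapse to "contains, but not identical" (a sketch, logically equivalent to the original predicate):
SELECT * FROM Segments AS s1
LEFT JOIN Segments AS s2
ON (s2.l <= s1.l AND s2.r >= s1.r
    AND NOT (s2.l = s1.l AND s2.r = s1.r));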
I have a dataset that looks like
| ID | Category | Failure |
|----+----------+---------|
| 1 | a | 0 |
| 1 | b | 0 |
| 1 | b | 0 |
| 1 | a | 0 |
| 1 | c | 0 |
| 1 | d | 0 |
| 1 | c | 0 |
| 1 | failure | 1 |
| 2 | c | 0 |
| 2 | d | 0 |
| 2 | d | 0 |
| 2 | b | 0 |
This is data where each ID potentially ends in a failure event, reached through an intermediate sequence of events {a, b, c, d}. I want to count the number of IDs in which each of those intermediate events occurs, broken down by whether the ID ended in failure.
So, I would like a table of the form
| | a | b | c | d |
|------------+---+---+---+---|
| Failure | 4 | 5 | 6 | 2 |
| No failure | 9 | 8 | 6 | 9 |
where, for example, the number 4 indicates that 4 of the IDs in which a occurred ended in failure.
How would I go about doing this in R?
You can use table, for example:
dat <- data.frame(categ=sample(letters[1:4],20,rep=T),
failure=sample(c(0,1),20,rep=T))
res <- table(dat$failure,dat$categ)
res <- res[c('1','0'),]  # reorder so the failure == 1 row comes first
rownames(res) <- c('Failure','No failure')
res
           a b c d
Failure    1 2 4 5
No failure 3 2 2 1
you can plot it using barplot:
barplot(res)
EDIT: to get this by ID, you can use by, for example:
dat <- data.frame(ID=c(rep(1,9),rep(2,11)),categ=sample(letters[1:4],20,rep=T),
failure=sample(c(0,1),20,rep=T))
by(dat,dat$ID,function(x)table(x$failure,x$categ))
dat$ID: 1
a b c d
0 1 2 1 3
1 1 1 0 0
---------------------------------------------------------------------------------------
dat$ID: 2
a b c d
0 1 2 3 0
1 1 3 1 0
EDIT: another way to get this is using tapply:
with(dat,tapply(categ,list(failure,categ,ID),length))