filter NULL rows if non-null exists in groupby query in SQL (Teradata) - teradata

Given the following table:
ID Price Agg_ID
1 34 a
1 42 NULL
2 34 c
I would like to keep one row per ID: a row whose Agg_ID is not null if such a row exists, otherwise one of the null rows. Either way, exactly one row per ID should remain after the filter.
ID Price Agg_ID
1 34 a
2 34 c
I tried following a related question that suggests using ORDER BY, but I couldn't confirm that the non-null rows come first after the ORDER BY.
Thanks.
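One way to make the ordering explicit is to rank the rows within each ID so that non-null Agg_ID values sort first, then keep only the first-ranked row. The following is a minimal sketch under stated assumptions, not a tested Teradata query: it runs the SQL through Python's sqlite3 module (window functions need SQLite 3.25 or newer), the table name prices is hypothetical, and the row for ID 3 is an extra row added only to show the all-NULL fallback. In Teradata, the same ROW_NUMBER() expression could be placed in a QUALIFY clause instead of a subquery.

```python
import sqlite3

# In-memory database; the table name "prices" is a hypothetical stand-in.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (ID INTEGER, Price INTEGER, Agg_ID TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (1, 34, "a"),
        (1, 42, None),
        (2, 34, "c"),
        (3, 50, None),  # extra row (not in the question): an ID with only NULLs
    ],
)

# The CASE expression maps non-null Agg_ID to 0 and NULL to 1, so non-null
# rows rank first no matter how the engine sorts NULLs by default.
query = """
    SELECT ID, Price, Agg_ID
    FROM (
        SELECT ID, Price, Agg_ID,
               ROW_NUMBER() OVER (
                   PARTITION BY ID
                   ORDER BY CASE WHEN Agg_ID IS NULL THEN 1 ELSE 0 END
               ) AS rn
        FROM prices
    ) AS ranked
    WHERE rn = 1
    ORDER BY ID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Sorting on the CASE expression rather than on Agg_ID itself avoids depending on where a given database places NULLs in an ascending sort.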

Related

Return no row if none meets criteria using SUM()

According to the documentation, SQLite's SUM() function returns NULL if no row in the table meets the criteria.
I don't want any rows returned if the id does not exist, as in the query:
SELECT SUM(tMoney.money),tCustomer.name FROM tMoney JOIN tCustomer ON tMoney.id = tCustomer.id WHERE tCustomer.id = 3
tMoney
id money
--- ------
1 210
2 400
1 150
tCustomer
name id
--- ------
bob 1
dan 2
Just filter out rows with NULL using a HAVING clause:
SELECT SUM(tMoney.money), tCustomer.name
FROM tMoney
JOIN tCustomer
ON tMoney.id = tCustomer.id
WHERE tCustomer.id = 3
group by tCustomer.name
having SUM(tMoney.money) is not null;
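To see the difference in practice, the sketch below loads the sample tables from the question into an in-memory SQLite database via Python's sqlite3 module and runs both the original query and the grouped one. The variable names are illustrative only. Note that once GROUP BY is present, an empty input already produces no groups, so the HAVING clause acts as an extra guard here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tMoney (id INTEGER, money INTEGER);
    CREATE TABLE tCustomer (name TEXT, id INTEGER);
    INSERT INTO tMoney VALUES (1, 210), (2, 400), (1, 150);
    INSERT INTO tCustomer VALUES ('bob', 1), ('dan', 2);
""")

# Original query: an aggregate without GROUP BY always yields exactly one row.
ungrouped = """
    SELECT SUM(tMoney.money), tCustomer.name
    FROM tMoney JOIN tCustomer ON tMoney.id = tCustomer.id
    WHERE tCustomer.id = ?
"""
# Answer's query: grouping plus HAVING returns nothing for a missing id.
grouped = ungrouped + """
    GROUP BY tCustomer.name
    HAVING SUM(tMoney.money) IS NOT NULL
"""

no_match_ungrouped = conn.execute(ungrouped, (3,)).fetchall()
no_match_grouped = conn.execute(grouped, (3,)).fetchall()
match_grouped = conn.execute(grouped, (1,)).fetchall()

print(no_match_ungrouped)  # one row of NULLs
print(no_match_grouped)    # no rows
print(match_grouped)       # bob's total
```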

SQLITE3 - Deleting rows that have multiple columns of the same value

I have an SQLite3 database from which I want to remove rows that have two fields of the same value.
It seems that I am able to select such values with this query:
SELECT * FROM mydb GROUP BY user_id, num HAVING COUNT(*) > 1
However I am not able to delete them.
DELETE FROM mydb WHERE user_id IN (SELECT * FROM mydb GROUP BY user_id, num HAVING COUNT(*) > 1)
returns a syntax error.
This is what I expect:
Example:
id user_id num
1 1 1
2 1 1
3 2 1
4 1 2
5 2 2
In this example id 1 and 2 have both columns (user_id and num) of the same value so they should be removed. Preferably, but not necessarily I would like to have a solution that would leave only one such row (doesn't matter which one).
Result:
id user_id num
2 1 1
3 2 1
4 1 2
5 2 2
Note: id is a primary key. user_id is a foreign key. num is an INTEGER.
You got a syntax error because your IN operator has a single value on the left (user_id) but a subquery returning multi-column rows on the right (SELECT *). Compare like with like: WHERE user_id IN (SELECT user_id ...) avoids it.
Anyway, here's a query to delete all-but-newest:
DELETE FROM mydb
WHERE id NOT IN (
SELECT MAX(id) FROM mydb
GROUP BY user_id, num
);
The subquery will return the highest id for every unique (user_id, num) combination. Then we just delete all the other rows. I.e. in your example, the subquery would return 2, 3, 4, 5 as "correct", which would result in deletion of row 1.
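As a quick check of that reasoning, the sketch below replays the example rows from the question in an in-memory SQLite database using Python's sqlite3 module. The foreign key on user_id is omitted for brevity, and the Python variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mydb (id INTEGER PRIMARY KEY, user_id INTEGER, num INTEGER)"
)
conn.executemany(
    "INSERT INTO mydb VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 1), (3, 2, 1), (4, 1, 2), (5, 2, 2)],
)

# Ids the subquery keeps: the highest id per (user_id, num) pair.
kept_ids = sorted(
    row[0]
    for row in conn.execute("SELECT MAX(id) FROM mydb GROUP BY user_id, num")
)

# Delete every row whose id is not one of the kept ids.
deleted = conn.execute(
    """
    DELETE FROM mydb
    WHERE id NOT IN (
        SELECT MAX(id) FROM mydb
        GROUP BY user_id, num
    )
    """
).rowcount

remaining = conn.execute("SELECT id, user_id, num FROM mydb ORDER BY id").fetchall()
print(kept_ids, deleted, remaining)
```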

BQ array lookup: similar to NTH, but based on index, not position

The NTH function is really useful for extracting nested array elements in BQ, but its utility for a given table depends on each row's nested array containing the same amount of elements, and in the same order. If I have a 2+ column nested array where one column is variable name/ID, and the different instances of the array in different rows have inconsistent naming and/or ordering, is there an elegant way to fetch/pivot a variable based on the variable name/ID?
For example, if row1 has customDimensions array:
index value
4 aaa
23 bbb
70 ccc
and row2 has customDimensions array:
index value
4 ddd
70 eee
I'd want to run something like
SELECT
NTHLOOKUP(70, customdims.index, customdims.value) as val70,
NTHLOOKUP(4, customdims.index, customdims.value) as val4,
NTHLOOKUP(23, customdims.index, customdims.value) as val23
from my_table;
And get:
val70 val4 val23
ccc aaa bbb
eee ddd (null)
I've been able to get this sort of result by making a subquery for each desired index value, unnesting the array in each and filtering WHERE index = (value), but that gets really ugly as the variables pile up. Is there an alternative?
EDIT: Based on Mikhail's answer below (thank you!!) I was able to write my query more elegantly. Not quite as slick as an NTHLOOKUP, but I'll take it:
select id,
max(case when index = 41 then value[OFFSET(0)] else '' end) as val41,
max(case when index = 59 then value[OFFSET(0)] else '' end) as val59
from
(select
concat(array1.thing1, array1.thing2) as id,
cd.index,
ARRAY_AGG(distinct cd.value) as value
FROM my_table g
,unnest(array1) as array1
,unnest(array1.customDimensions) as cd
where index in (41,59)
group by 1,2
order by 1,2
) x
group by 1
order by 1
The best I can "offer" is below (BigQuery Standard SQL)
#standardSQL
WITH `project.dataset.my_table` AS (
SELECT ARRAY<STRUCT<index INT64, value STRING>>
[(4, 'aaa'), (23, 'bbb'), (70, 'ccc')] customDimensions
UNION ALL
SELECT ARRAY<STRUCT<index INT64, value STRING>>
[(4, 'ddd'), (70, 'eee')] customDimensions
)
SELECT cd.index, ARRAY_AGG(cd.value) VALUES
FROM `project.dataset.my_table`,
UNNEST(customDimensions) cd
GROUP BY cd.index
with result as below
Row index values
1 4 aaa
ddd
2 23 bbb
3 70 ccc
eee
I would recommend staying with this flattened version, as it serves most of the practical cases I can think of.
But if you still want to pivot this further, there are quite a number of posts on how to pivot in BigQuery.
"I've been able to get this sort of result by making a subquery for each desired index value, unnesting the array in each and filtering WHERE index = (value), but that gets really ugly as the variables pile up. Is there an alternative?"
Yes, you can use a user-defined function to encapsulate the common logic. For example,
CREATE TEMP FUNCTION NTHLOOKUP(
targetIndex INT64,
customDimensions ARRAY<STRUCT<index INT64, value STRING>>
) AS (
(SELECT value FROM UNNEST(customDimensions)
WHERE index = targetIndex)
);
SELECT
NTHLOOKUP(70, customDimensions) as val70,
NTHLOOKUP(4, customDimensions) as val4,
NTHLOOKUP(23, customDimensions) as val23
from my_table;

SQLite get MAX value BETWEEN range with some null fields

I have a table, two columns some rows:
c1 | c2
--------
10 | 90
11 | 89.5
12 | 89
13 | 87
14 | null
15 | 86
I want to get the MAX value of column c2 but only between c1's values 12 and 15
I try with:
SELECT MAX(IFNULL(c2, 0)) AS max_value FROM mytable WHERE data BETWEEN 12 AND 15
but not working. Is there a way to ignore the null value ?
The MAX aggregate function automatically skips NULL values, so the following query should be fine:
SELECT MAX(c2) FROM tablename WHERE c1 BETWEEN 12 AND 15;
Please see fiddle here. I see your where condition is
WHERE data BETWEEN 12 AND 15
are you sure you are applying the WHERE clause to the correct column? I think you need to change it to:
WHERE c1 BETWEEN 12 AND 15
"The max() aggregate function returns the maximum value of all values in the group. The maximum value is the value that would be returned last in an ORDER BY on the same column. Aggregate max() returns NULL if and only if there are no non-NULL values in the group." - http://sqlite.org/lang_aggfunc.html
Assuming you have no negative values and you don't want NULL to be treated as 0, you can simply use
SELECT MAX(c2) AS max_value FROM mytable ...
If all rows for c2 are NULL, MAX(c2) will return NULL.
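Both behaviours quoted above can be checked with a short sketch that loads the question's table into an in-memory SQLite database through Python's sqlite3 module. The column types (INTEGER and REAL) are an assumption, since the question does not state them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the question only shows the values.
conn.execute("CREATE TABLE mytable (c1 INTEGER, c2 REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(10, 90), (11, 89.5), (12, 89), (13, 87), (14, None), (15, 86)],
)

# MAX() ignores the NULL in row c1 = 14 and filters on c1, not "data".
max_in_range = conn.execute(
    "SELECT MAX(c2) FROM mytable WHERE c1 BETWEEN 12 AND 15"
).fetchone()[0]

# When every value in the group is NULL, MAX() itself returns NULL.
max_only_null = conn.execute(
    "SELECT MAX(c2) FROM mytable WHERE c1 = 14"
).fetchone()[0]

print(max_in_range, max_only_null)
```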

How to create a column for even and odd records dynamically?

I have a query in Teradata. I want to add an additional column that would be a VARCHAR.
It should say whether the selected record is even or odd
select id, name, CASE newColumn WHEN --- ???
from my table
Like this
id name newColumn
1 asdf odd
2 ts df even
32 htssdf odd
4 asdfsd even
23 gftht odd
How can I do this?
Based on your example, I can't tell how you are sorting the results. You would need to define a sort order. Let's assume you would do it based on the id number.
SELECT id, name,
ROW_NUMBER() OVER (ORDER BY id) AS row_id,
CASE WHEN ROW_NUMBER() OVER (ORDER BY id) MOD 2 = 0 THEN 'Even' ELSE 'Odd' END AS newColumn
FROM my_table
The row_id is incrementally assigned based on the id field being sorted ascending. You then use the MOD function to determine if there's a remainder after dividing the number by a value (in this case 2). Result would look like the following:
id name row_id newColumn
1 asdf 1 Odd
2 ts df 2 Even
4 asdfsd 3 Odd
23 gftht 4 Even
32 htssdf 5 Odd
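The result above can be reproduced with a small sketch that runs the same idea in SQLite through Python's sqlite3 module. Two things differ from the Teradata query and are assumptions of this sketch: SQLite spells the remainder operator % rather than MOD, and the table name my_table stands in for the question's unnamed table. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "my_table" is a stand-in name for the table in the question.
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "asdf"), (2, "ts df"), (32, "htssdf"), (4, "asdfsd"), (23, "gftht")],
)

# Number the rows by ascending id, then label each row by the parity of
# its row number (SQLite uses % where Teradata uses MOD).
query = """
    SELECT id, name,
           ROW_NUMBER() OVER (ORDER BY id) AS row_id,
           CASE WHEN ROW_NUMBER() OVER (ORDER BY id) % 2 = 0
                THEN 'Even' ELSE 'Odd' END AS newColumn
    FROM my_table
    ORDER BY id
"""
labelled = conn.execute(query).fetchall()
for row in labelled:
    print(row)
```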
