How to set all values for a column to a max value on a certain WHERE? - sqlite

If I have:
2 baskets of oranges with 7 and 10 each
3 baskets of peaches with 12 and 15 each
Then I want to set maxfruit to 10 for every orange basket and to 15 for every peach basket.
I tried:
update baskets set maxfruit = (select max(fruitCount) from baskets b where b.fruit = fruit)
but it just sets everything to 15.

In SQL, when you are referencing a column by its name, the table instance that you end up with is the innermost one, unless you use a table prefix.
So fruit refers to the innermost instance, b. This means that b.fruit and fruit are always the same value.
To refer to the outer table instance, you must qualify the column with the name of the outer table, as in baskets.fruit here:
update baskets
set maxfruit = (select max(fruitCount)
from baskets b
where b.fruit = baskets.fruit);
(And instead of b.fruit, you could write just fruit, but that could be unclear.)
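A minimal sketch of the fix above, run against an in-memory SQLite database with Python's sqlite3 module. The table layout (fruit, fruitCount, maxfruit) and the two rows per fruit are assumptions based on the question's description. The qualified baskets.fruit makes the subquery correlated with the outer row, so each basket gets the maximum of its own fruit.

```python
import sqlite3

def set_max_per_fruit(conn: sqlite3.Connection) -> None:
    # baskets.fruit refers to the outer row being updated; an unqualified
    # "fruit" would resolve to the inner alias b and compare b.fruit with itself.
    conn.execute("""
        UPDATE baskets
        SET maxfruit = (SELECT max(fruitCount)
                        FROM baskets b
                        WHERE b.fruit = baskets.fruit)
    """)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (fruit TEXT, fruitCount INTEGER, maxfruit INTEGER)")
conn.executemany(
    "INSERT INTO baskets (fruit, fruitCount) VALUES (?, ?)",
    [("orange", 7), ("orange", 10), ("peach", 12), ("peach", 15)],
)
set_max_per_fruit(conn)
rows = conn.execute(
    "SELECT fruit, fruitCount, maxfruit FROM baskets ORDER BY rowid"
).fetchall()
print(rows)
```

Running the unqualified version from the question on the same table sets maxfruit to 15 everywhere, because the inner condition is then always true.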

Your update is just pulling the max from the whole table. You can use a subquery to pull out the max for each fruit and join it back to the table. In SQLite 3.33.0 or later, which supports UPDATE ... FROM, that looks like this:
UPDATE baskets
SET maxfruit = b2.fruitCount
FROM (SELECT fruit, MAX(fruitCount) AS fruitCount
FROM baskets
GROUP BY fruit) AS b2
WHERE baskets.fruit = b2.fruit;
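A sketch of the group-then-join idea, assuming the same hypothetical baskets table as in the question. The derived table b2 is computed once with GROUP BY, which gives one row per fruit with its maximum. Each maximum is then written back with a plain parameterised UPDATE. This sketch uses executemany instead of UPDATE ... FROM so it also runs on SQLite versions older than 3.33.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (fruit TEXT, fruitCount INTEGER, maxfruit INTEGER)")
conn.executemany(
    "INSERT INTO baskets (fruit, fruitCount) VALUES (?, ?)",
    [("orange", 7), ("orange", 10), ("peach", 12), ("peach", 15)],
)

# The derived table b2: one row per fruit with its maximum count.
per_fruit_max = conn.execute(
    "SELECT fruit, MAX(fruitCount) AS fruitCount FROM baskets GROUP BY fruit"
).fetchall()

# Write each group's maximum back to every basket of that fruit
# (this stands in for the join between baskets and b2).
conn.executemany(
    "UPDATE baskets SET maxfruit = ? WHERE fruit = ?",
    [(mx, fruit) for fruit, mx in per_fruit_max],
)
rows = conn.execute(
    "SELECT fruit, fruitCount, maxfruit FROM baskets ORDER BY rowid"
).fetchall()
print(rows)
```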


Creating even ranges based on values in an oracle table

I have a big table, 100k rows in size, whose PRIMARY KEY is of datatype NUMBER. The values in this column are populated using a random number generator.
So my question is: is there a SQL query that can help me partition the table evenly into ranges of values? For example, if my column values are:
1
2
3
4
5
6
7
8
9
10
And I would like this to be broken into three partitions, then I would expect an output like this:
Range 1 1-3
Range 2 4-7
Range 3 8-10
It sounds like you want the WIDTH_BUCKET() function.
This query will give you the start and end range for a table of 1250 rows split into 20 buckets based on id:
with bkt as (
select id
, width_bucket(id, 1, 1251, 20) as id_bucket
from t23
)
select id_bucket
, min(id) as bkt_start
, max(id) as bkt_end
, count(*)
from bkt
group by id_bucket
order by 1
;
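The two middle parameters specify the minimum and maximum values; the last parameter specifies the number of buckets. WIDTH_BUCKET divides the range between the minimum and maximum bounds into equal-width intervals, so with consecutive IDs like these each bucket gets roughly the same number of rows. Be careful with the min and max parameters: the lower bound is inclusive and the upper bound is exclusive (which is why 1251 is used for 1250 rows), values below the minimum go to bucket 0, and values at or above the maximum go to an overflow bucket numbered one higher than the bucket count. I've found poorly chosen bounds can have an odd effect on the split.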
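A small Python model of equal-width bucketing can help with choosing the bounds. This is a sketch that assumes integer inputs and an ascending range (low < high). It follows Oracle's documented WIDTH_BUCKET rules as I understand them: an inclusive lower bound, an exclusive upper bound, and underflow and overflow buckets. It is not Oracle's implementation.

```python
from collections import Counter

def width_bucket(expr: int, low: int, high: int, num_buckets: int) -> int:
    """Equal-width bucket number for an ascending integer range (low < high).

    low is inclusive and high is exclusive; values below low fall into
    bucket 0 and values at or above high fall into bucket num_buckets + 1.
    """
    if expr < low:
        return 0
    if expr >= high:
        return num_buckets + 1
    # Integer arithmetic avoids floating-point surprises at bucket edges.
    return (expr - low) * num_buckets // (high - low) + 1

# Model of the query above: ids 1..1250 split into 20 buckets.
counts = Counter(width_bucket(i, 1, 1251, 20) for i in range(1, 1251))
print(sorted(counts.items()))
```

With bounds 1 and 1251 the bucket width is 62.5, so the row counts alternate between 63 and 62. If the upper bound were 1250, id 1250 would fall into overflow bucket 21, which is one example of the odd effects of poorly chosen bounds.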
This solution works without the WIDTH_BUCKET function. While it is more verbose and certainly less efficient, it splits the data as evenly as possible by row count, even if some ID values are missing.
CREATE TABLE t AS
SELECT rownum AS id
FROM dual
CONNECT BY level <= 10;
WITH
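data AS (
-- ROW_NUMBER numbers the rows in id order; plain ROWNUM gives no guaranteed order here
SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS row_num
FROM t
),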
total AS (
SELECT count(*) AS total_rows
FROM data
),
parts AS (
SELECT rownum as part_no, total.total_rows, total.total_rows / 3 as part_rows
FROM dual, total
CONNECT BY level <= 3
),
bounds AS (
SELECT parts.part_no,
parts.total_rows,
parts.part_rows,
COALESCE(LAG(data.row_num) OVER (ORDER BY parts.part_no) + 1, 1) AS start_row_num,
data.row_num AS end_row_num
FROM data
JOIN parts
ON data.row_num = ROUND(parts.part_no * parts.part_rows, 0)
)
SELECT bounds.part_no, d1.ID AS start_id, d2.ID AS end_id
FROM bounds
JOIN data d1
ON d1.row_num = bounds.start_row_num
JOIN data d2
ON d2.row_num = bounds.end_row_num
ORDER BY bounds.part_no;
PART_NO START_ID END_ID
---------- ---------- ----------
1 1 3
2 4 7
3 8 10
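The same row-count split can be sketched in Python to check the boundary arithmetic. Part k ends at row round(k * total / parts), and each part starts one row after the previous part ends. Oracle's ROUND rounds halves away from zero, while Python's round() rounds halves to even, so this sketch uses an integer round-half-up instead. The function name even_ranges is hypothetical, and the sketch assumes there are at least as many rows as parts.

```python
def even_ranges(ids, parts):
    """Split ids (sorted here) into `parts` contiguous ranges of near-equal row counts.

    Returns (part_no, start_id, end_id) tuples, mirroring the query's output.
    """
    ordered = sorted(ids)
    total = len(ordered)
    if total < parts:
        raise ValueError("need at least as many rows as parts")
    result = []
    start_row = 1
    for k in range(1, parts + 1):
        # Round half up of k * total / parts, computed with integers only.
        end_row = (2 * k * total + parts) // (2 * parts)
        result.append((k, ordered[start_row - 1], ordered[end_row - 1]))
        start_row = end_row + 1
    return result

print(even_ranges(range(1, 11), 3))
print(even_ranges([2, 5, 9, 14, 20, 21, 30], 3))  # ids with gaps
```

For ids 1 to 10 this reproduces the ranges 1-3, 4-7 and 8-10. With gaps in the ids, the ranges still contain near-equal numbers of rows rather than equal spans of values.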

BQ array lookup: similar to NTH, but based on index, not position

The NTH function is really useful for extracting nested array elements in BQ, but its usefulness for a given table depends on each row's nested array containing the same number of elements, in the same order. If I have a nested array with two or more columns, where one column is a variable name/ID, and the instances of the array in different rows have inconsistent naming and/or ordering, is there an elegant way to fetch/pivot a variable based on the variable name/ID?
For example, if row1 has customDimensions array:
index value
4 aaa
23 bbb
70 ccc
and row2 has customDimensions array:
index value
4 ddd
70 eee
I'd want to run something like
SELECT
NTHLOOKUP(70, customdims.index, customdims.value) as val70,
NTHLOOKUP(4, customdims.index, customdims.value) as val4,
NTHLOOKUP(23, customdims.index, customdims.value) as val23
from my_table;
And get:
val70 val4 val23
ccc aaa bbb
eee ddd (null)
I've been able to get this sort of result by making a subquery for each desired index value, unnesting the array in each and filtering WHERE index = (value), but that gets really ugly as the variables pile up. Is there an alternative?
EDIT: Based on Mikhail's answer below (thank you!!) I was able to write my query more elegantly. Not quite as slick as an NTHLOOKUP, but I'll take it:
select id,
max(case when index = 41 then value[OFFSET(0)] else '' end) as val41,
max(case when index = 59 then value[OFFSET(0)] else '' end) as val59
from
(select
concat(array1.thing1, array1.thing2) as id,
cd.index,
ARRAY_AGG(distinct cd.value) as value
FROM my_table g
,unnest(array1) as array1
,unnest(array1.customDimensions) as cd
where index in (41,59)
group by 1,2
order by 1,2
) x
group by 1
order by 1
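The query above pivots with conditional aggregation: each output column takes MAX over a CASE that is non-NULL only for one index. Below is a minimal sketch of the same pattern in SQLite. It assumes the arrays have already been flattened into a hypothetical custom_dims(row_id, idx, value) table, which is roughly what UNNEST produces. Leaving out the ELSE branch yields NULL rather than '' for a missing index, and MAX ignores NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flattened form: one row per (row_id, idx, value), as UNNEST would produce.
conn.execute("CREATE TABLE custom_dims (row_id INTEGER, idx INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO custom_dims VALUES (?, ?, ?)",
    [(1, 4, "aaa"), (1, 23, "bbb"), (1, 70, "ccc"),
     (2, 4, "ddd"), (2, 70, "eee")],
)

# One MAX(CASE ...) per wanted index turns rows into columns.
pivot = conn.execute("""
    SELECT row_id,
           MAX(CASE WHEN idx = 70 THEN value END) AS val70,
           MAX(CASE WHEN idx = 4  THEN value END) AS val4,
           MAX(CASE WHEN idx = 23 THEN value END) AS val23
    FROM custom_dims
    GROUP BY row_id
    ORDER BY row_id
""").fetchall()
print(pivot)
```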
The best I can "offer" is below (BigQuery Standard SQL)
#standardSQL
WITH `project.dataset.my_table` AS (
SELECT ARRAY<STRUCT<index INT64, value STRING>>
[(4, 'aaa'), (23, 'bbb'), (70, 'ccc')] customDimensions
UNION ALL
SELECT ARRAY<STRUCT<index INT64, value STRING>>
[(4, 'ddd'), (70, 'eee')] customDimensions
)
SELECT cd.index, ARRAY_AGG(cd.value) VALUES
FROM `project.dataset.my_table`,
UNNEST(customDimensions) cd
GROUP BY cd.index
with the result below:
Row index values
1 4 [aaa, ddd]
2 23 [bbb]
3 70 [ccc, eee]
I would recommend staying with this flattened version, as it serves most of the practical cases I can think of.
But if you still want to pivot this further, there are quite a number of posts on how to pivot in BigQuery.
I've been able to get this sort of result by making a subquery for each desired index value, unnesting the array in each and filtering WHERE index = (value), but that gets really ugly as the variables pile up. Is there an alternative?
Yes, you can use a user-defined function to encapsulate the common logic. For example,
CREATE TEMP FUNCTION NTHLOOKUP(
targetIndex INT64,
customDimensions ARRAY<STRUCT<index INT64, value STRING>>
) AS (
(SELECT value FROM UNNEST(customDimensions)
WHERE index = targetIndex)
);
SELECT
NTHLOOKUP(70, customDimensions) as val70,
NTHLOOKUP(4, customDimensions) as val4,
NTHLOOKUP(23, customDimensions) as val23
from my_table;
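For reference, here is what the NTHLOOKUP UDF computes, modelled in plain Python over a list of (index, value) pairs. This is a sketch of the semantics, not BigQuery code. A missing index gives NULL (None here). More than one match is treated as an error, because in BigQuery a scalar subquery that returns more than one row fails. The function name nth_lookup is hypothetical.

```python
from typing import Optional, Sequence, Tuple

def nth_lookup(target_index: int, dims: Sequence[Tuple[int, str]]) -> Optional[str]:
    """Return the value paired with target_index, or None if it is absent."""
    matches = [value for index, value in dims if index == target_index]
    if len(matches) > 1:
        # Mirrors a scalar subquery producing more than one element.
        raise ValueError(f"index {target_index} appears more than once")
    return matches[0] if matches else None

# The two customDimensions arrays from the question.
table = [
    [(4, "aaa"), (23, "bbb"), (70, "ccc")],
    [(4, "ddd"), (70, "eee")],
]
result = [(nth_lookup(70, dims), nth_lookup(4, dims), nth_lookup(23, dims))
          for dims in table]
print(result)
```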

Zipping rows with the same "key" while joining tables

I have two tables, one with objects and one with properties of the objects. Both tables have a person ID and a date as the "key", but since one person can order multiple objects on a single day, the key is not unique. I do know, however, that the entries are inserted in the same order in both tables, so it should be possible to join on that order when the PersonID and date are the same.
This is what I want to accomplish:
Table 1:
PersonID Date Object
1 20-08-2013 A
2 13-11-2013 B
2 13-11-2013 C
2 13-11-2013 D
3 21-11-2013 E
Table 2:
PersonID Date Property
4 05-05-2013 $
1 20-08-2013 ^
2 13-11-2013 /
2 13-11-2013 *
2 13-11-2013 +
3 21-11-2013 &
Result:
PersonID Date Object Property
4 05-05-2013 $
1 20-08-2013 A ^
2 13-11-2013 B /
2 13-11-2013 C *
2 13-11-2013 D +
3 21-11-2013 E &
So what I want to do is join the two tables and "zip" the group of entries that have the same (PersonID, Date) "key".
Something called "Slick" seems to have this, but I'd like to do it in SQLite.
Any advice would be amazing!
You are on the right track. Why not just do a LEFT JOIN between the tables, like this:
select t2.PersonID,
t2.Date,
t1.Object,
t2.Property
from table2 t2
left join table1 t1 on t2.PersonID = t1.PersonID
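order by t2.PersonID
Note that joining on PersonID alone pairs each of person 2's objects with each of person 2's properties, which gives 9 rows for that person instead of 3; the position of each row within its (PersonID, Date) group is also needed to zip them.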
Use an additional column to make every key unique in both tables. For example, in SQLite you could use the rowid to keep track of the order of insertion. Storing this additional column in the database itself might be useful for other queries as well, but you do not have to store it.
First add the column ID to both tables. The DDL should now look like this (make sure you do not add a primary key constraint until both tables are filled):
CREATE TABLE table1 (
ID,
PersonID,
Date,
Object
);
CREATE TABLE table2 (
ID,
PersonID,
Date,
Property
);
Now populate the ID column. You can adjust the ID to your liking. Make sure you do this for table2 as well:
UPDATE table1
SET ID =(
SELECT table1.PersonID || '-' || table1.Date || '-' || count( * )
FROM table1 tB
WHERE table1.RowID >= tB.RowID
AND
table1.PersonID == tB.PersonID
AND
table1.Date == tB.Date
);
Now you can join them:
SELECT t2.PersonID,
t2.Date,
t1.Object,
t2.Property
FROM table2 t2
LEFT JOIN table1 t1
ON t2.ID = t1.ID;
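An end-to-end check of this approach in SQLite, using the sample rows from the question. This sketch assumes the rows are inserted in the order shown, so rowid reflects insertion order. It runs the same ID-generating UPDATE on both tables, then the LEFT JOIN, with an ORDER BY added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID, PersonID, Date, Object)")
conn.execute("CREATE TABLE table2 (ID, PersonID, Date, Property)")
conn.executemany("INSERT INTO table1 (PersonID, Date, Object) VALUES (?, ?, ?)", [
    (1, "20-08-2013", "A"), (2, "13-11-2013", "B"), (2, "13-11-2013", "C"),
    (2, "13-11-2013", "D"), (3, "21-11-2013", "E"),
])
conn.executemany("INSERT INTO table2 (PersonID, Date, Property) VALUES (?, ?, ?)", [
    (4, "05-05-2013", "$"), (1, "20-08-2013", "^"), (2, "13-11-2013", "/"),
    (2, "13-11-2013", "*"), (2, "13-11-2013", "+"), (3, "21-11-2013", "&"),
])

# ID = PersonID-Date-position, where position counts the rows with the same
# (PersonID, Date) whose rowid is less than or equal to the current row's.
for table in ("table1", "table2"):
    conn.execute(f"""
        UPDATE {table}
        SET ID = (SELECT {table}.PersonID || '-' || {table}.Date || '-' || count(*)
                  FROM {table} tB
                  WHERE {table}.rowid >= tB.rowid
                    AND {table}.PersonID = tB.PersonID
                    AND {table}.Date = tB.Date)
    """)

zipped = conn.execute("""
    SELECT t2.PersonID, t2.Date, t1.Object, t2.Property
    FROM table2 t2
    LEFT JOIN table1 t1 ON t2.ID = t1.ID
    ORDER BY t2.rowid
""").fetchall()
for row in zipped:
    print(row)
```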

How to create a column for even and odd records dynamically?

I have a query in Teradata. I want to add an additional column that would be a VARCHAR.
It should say whether the selected record is even or odd.
select id, name, CASE newColumn WHEN --- ???
from my table
Like this:
id name newColumn
1 asdf odd
2 ts df even
32 htssdf odd
4 asdfsd even
23 gftht odd
How can I do this?
Based on your example, I can't tell how you are sorting the results. You would need to define a sort order. Let's assume you would do it based on the id number.
SELECT id, name,
ROW_NUMBER() OVER(ORDER BY id) row_id,
CASE WHEN ROW_NUMBER() OVER(ORDER BY id) MOD 2 = 0 THEN 'Even' ELSE 'Odd' END newColumn
FROM my_table
The row_id is incrementally assigned based on the id field sorted ascending. You then use the MOD operator to check whether there's a remainder after dividing the row number by 2. The result would look like the following:
id name row_id newColumn
1 asdf 1 Odd
2 ts df 2 Even
4 asdfsd 3 Odd
23 gftht 4 Even
32 htssdf 5 Odd
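The same idea runs in SQLite, which is convenient for trying it out locally. This sketch assumes SQLite 3.25 or later for window functions, and uses % because SQLite has no MOD operator. The table name my_table and the rows are taken from the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", [
    (1, "asdf"), (2, "ts df"), (32, "htssdf"), (4, "asdfsd"), (23, "gftht"),
])

# Number the rows in id order, then label them by the parity of that number.
rows = conn.execute("""
    SELECT id, name,
           ROW_NUMBER() OVER (ORDER BY id) AS row_id,
           CASE WHEN ROW_NUMBER() OVER (ORDER BY id) % 2 = 0
                THEN 'Even' ELSE 'Odd' END AS newColumn
    FROM my_table
    ORDER BY id
""").fetchall()
for row in rows:
    print(row)
```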

SQLITE query, if last row matches criteria, check row preceding it matches different criteria

I'm finding it hard to get my head around this problem, and I couldn't find any answers to this specific problem anywhere:
Say I have a table like this (I'm just using fruit as an example):
Fruit | Date | Value
=================================
Apple | 1 | other_random_value
Apple | 2 | some_value_1
Apple | 3 | some_value_2
Pear | 1 | other_random_value
Pear | 2 | unexpected_value_1
Pear | 3 | some_value_2
Everything will be ordered by Fruit, then Date.
Basically, if the last row (for each fruit) is some_value_2, but the one preceding it is not some_value_1, I want to match just those fruits (i.e. in this case, Pear).
So I always expect some_value_2 to come after a row with a certain value for that particular fruit, and if it doesn't, I want to flag errors against those fruits. It would also be nice to match cases where nothing precedes some_value_2, though if this is too complicated I could match it separately and just check that some_value_2 is not the first row, which I don't imagine would be a difficult query.
EDIT: Also, being able to match any consecutive rows where the preceding value is unexpected would be nice, though I mainly care about the last 2 rows. So if being able to match all consecutive rows results in a simpler and better performing query, then I might go with that. I'm going to be doing an INSERT at the same time (into an alert table), so if I could flag it as an ERROR if it's the last two rows and a WARNING if it's not, that would be really nifty. Though I wouldn't know where to start with writing a query that does that. Also having a query that performs well is a must, as I will be using this across a large dataset.
EDIT:
This is what I used in the end. It's quite slow, but if I index Date, it's not so bad:
SELECT c.Id AS CId, c.Fruit AS CFruit,
c.Date AS CDate, c.Value AS CValue,
(SELECT Id
FROM fruits
WHERE Fruit = c.Fruit
AND Date >= c.Date
AND Id > c.Id
ORDER BY Date, Id LIMIT 1) AS NId, n.Fruit AS NFruit,
n.Date AS NDate, n.Value AS NValue
FROM fruits AS c
JOIN fruits AS n ON n.Id = NId
ORDER BY c.Date, c.Id
I might try Joachim's method again at some point, as I realised I'm getting a lot of results I don't really care much about. Or I might even try incorporating the two somehow and delegate to INFO/ERROR as appropriate...
Solved: I used the same SELECT statement that I used to get NId, and used SELECT COUNT(*) instead of SELECT Id. This told me the number of results after the current one. Then I just used a CASE operator to turn it into a boolean field called Latest :). So I effectively combined Nicolas' and Joachim's methods. Performance still seems OK, probably because SQLite caches the results.
SQLite is (as far as I know) a bit low on efficient operators for this, so this is the best I can come up with for now :)
SELECT Fruit FROM fruits
WHERE ( SELECT COUNT(*) FROM fruits f
WHERE f.fruit=fruits.fruit
AND f.date > fruits.date ) = 1
AND fruits.value <> 'some_value_1'
INTERSECT
SELECT Fruit FROM fruits
WHERE ( SELECT COUNT(*) FROM fruits f
WHERE f.fruit=fruits.fruit
AND f.date > fruits.date ) = 0
AND fruits.value = 'some_value_2'
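A quick way to try the INTERSECT query is an in-memory SQLite database loaded with the sample rows. The table layout and the integer Date column follow the question's example. The extra single-row "Plum" is an assumption added for illustration. It shows that a some_value_2 with nothing before it is not flagged by this query, which matches the asker's plan to check that case separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (Fruit TEXT, Date INTEGER, Value TEXT)")
conn.executemany("INSERT INTO fruits VALUES (?, ?, ?)", [
    ("Apple", 1, "other_random_value"), ("Apple", 2, "some_value_1"),
    ("Apple", 3, "some_value_2"),
    ("Pear", 1, "other_random_value"), ("Pear", 2, "unexpected_value_1"),
    ("Pear", 3, "some_value_2"),
    ("Plum", 1, "some_value_2"),  # hypothetical: nothing precedes some_value_2
])

# Second-to-last row is not some_value_1, AND last row is some_value_2.
flagged = conn.execute("""
    SELECT Fruit FROM fruits
    WHERE (SELECT COUNT(*) FROM fruits f
           WHERE f.fruit = fruits.fruit AND f.date > fruits.date) = 1
      AND fruits.value <> 'some_value_1'
    INTERSECT
    SELECT Fruit FROM fruits
    WHERE (SELECT COUNT(*) FROM fruits f
           WHERE f.fruit = fruits.fruit AND f.date > fruits.date) = 0
      AND fruits.value = 'some_value_2'
""").fetchall()
print(flagged)
```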
I named the table fruits. This query gets you the preceding date for each "key" (fruit + date):
select fruit, date, value currvalue,
(select max(date) precedingDate
from fruits p
where p.fruit = c.fruit
and p.date < c.date) precedingdate
from fruits c ;
From there we can get the preceding value for each key:
select f2.fruit, f2.date, f2.value currvalue, f2.precedingdate, f1.value precedingvalue
from
fruits f1 join
(select fruit, date, value,
(select max(date) precedingDate
from fruits p
where p.fruit = c.fruit
and p.date < c.date) precedingdate
from fruits c) f2
on f1.fruit = f2.fruit and f1.date = precedingdate ;
For all the rows that have a previous row, you get both the current and preceding date and the current and preceding value.
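Here is the preceding-date join run on the sample data. In this sketch f2 is the current row and f1 is the row on the preceding date, so the current value comes from f2 and the preceding value from f1. The final list comprehension applies the question's rule in Python for illustration. The table layout is an assumption based on the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (fruit TEXT, date INTEGER, value TEXT)")
conn.executemany("INSERT INTO fruits VALUES (?, ?, ?)", [
    ("Apple", 1, "other_random_value"), ("Apple", 2, "some_value_1"),
    ("Apple", 3, "some_value_2"),
    ("Pear", 1, "other_random_value"), ("Pear", 2, "unexpected_value_1"),
    ("Pear", 3, "some_value_2"),
])

rows = conn.execute("""
    SELECT f2.fruit, f2.date, f2.value AS currvalue,
           f2.precedingdate, f1.value AS precedingvalue
    FROM fruits f1
    JOIN (SELECT fruit, date, value,
                 (SELECT max(date) FROM fruits p
                  WHERE p.fruit = c.fruit AND p.date < c.date) AS precedingdate
          FROM fruits c) f2
      ON f1.fruit = f2.fruit AND f1.date = f2.precedingdate
    ORDER BY f2.fruit, f2.date
""").fetchall()

# some_value_2 whose preceding value is not some_value_1
flagged = [r[0] for r in rows if r[2] == "some_value_2" and r[4] != "some_value_1"]
print(flagged)
```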
Edit: we add an id, used to choose between rows when several share the same previous date.
I will be using intermediate views for the sake of clarity, but you could write one big query.
As before, the previous date:
create view VFruitsWithPreviousDate
as select fruit, date, value, id,
(select max(date)
from fruits p
where p.fruit = c.fruit
and p.date < c.date) previousdate
from fruits c ;
The previous id:
create view VFruitsWithPreviousId
as select fruit, date, value,
(select max(id)
from fruits f
where v.fruit = f.fruit AND
v.previousdate = f.date) previousID
from VFruitsWithPreviousDate v ;
A query for all consecutive pairs of rows (f is the earlier row, v the later one):
select f.*, v.value
from fruits f
join VFruitsWithPreviousId v on f.id = v.previousid ;
You can then add the condition WHERE v.value = 'some_value_2' AND f.value != 'some_value_1' to keep only the pairs where some_value_2 is not preceded by some_value_1.
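The views and the final condition can be exercised together in SQLite. This sketch assumes an integer id column that follows insertion order, plus the sample rows from the question. In the join, f is the earlier row and v is the later one, so the filter tests v.value for some_value_2 and f.value for the expected predecessor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (id INTEGER PRIMARY KEY, fruit TEXT, date INTEGER, value TEXT)")
conn.executemany("INSERT INTO fruits (fruit, date, value) VALUES (?, ?, ?)", [
    ("Apple", 1, "other_random_value"), ("Apple", 2, "some_value_1"),
    ("Apple", 3, "some_value_2"),
    ("Pear", 1, "other_random_value"), ("Pear", 2, "unexpected_value_1"),
    ("Pear", 3, "some_value_2"),
])
conn.executescript("""
    CREATE VIEW VFruitsWithPreviousDate AS
    SELECT fruit, date, value, id,
           (SELECT max(date) FROM fruits p
            WHERE p.fruit = c.fruit AND p.date < c.date) AS previousdate
    FROM fruits c;

    -- Highest id on the previous date breaks ties between same-date rows.
    CREATE VIEW VFruitsWithPreviousId AS
    SELECT fruit, date, value,
           (SELECT max(id) FROM fruits f
            WHERE v.fruit = f.fruit AND v.previousdate = f.date) AS previousID
    FROM VFruitsWithPreviousDate v;
""")

flagged = conn.execute("""
    SELECT v.fruit, v.date, f.value AS previousvalue, v.value
    FROM fruits f
    JOIN VFruitsWithPreviousId v ON f.id = v.previousid
    WHERE v.value = 'some_value_2' AND f.value != 'some_value_1'
""").fetchall()
print(flagged)
```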
