Oracle complex update statement - oracle11g

I have a table where data is as given below
My requirement is to update this table in such a way that, within a group (grouping done on column A), if there is a value in column B, the same value should be applied to the other rows in that group whose column B is null. If column B is null for all the records within that group, then a new sequence value should be generated. Also, I can't use a PL/SQL block for this; I need to write a SQL statement to do it.
My expected output is given below

You won't be able to use sequence_name.nextval directly in your update statement, as the value increases with every row, meaning you would end up with different b values within the same a group.
The best way round that I can think of is to first ensure that every group whose b values are all null gets a single value, which you can do as follows:
merge into t1 tgt
using (select a,
              b,
              rid,
              row_number() over (partition by a order by b) rn
       from (select a,
                    b,
                    rowid rid,
                    max(b) over (partition by a) max_b
             from t1)
       where max_b is null) src
on (tgt.rowid = src.rid and src.rn = 1)
when matched then
  update set tgt.b = t1_seq.nextval;
This finds the rows which have all the b values as null for a given a, and then updates one of them to have the next sequence value.
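The MERGE assumes the sequence already exists; if it doesn't, a minimal definition, reusing the name from the statement above, would be:
create sequence t1_seq;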
Once you've done that, you can then go ahead and populate the null values based on the max b value for that group, like so:
update t1
set b = (select max(b) from t1 t2 where t1.a = t2.a)
where b is null;
See this LiveSQL script for evidence that this works.

Something like this:
update t1
set b = (select nvl(max(b), sequence_name.nextval) from t1 t2 where t2.a = t1.a);
PS: I couldn't test this.
Indeed we can't use sequences in correlated subqueries... :(
One workaround is to use merge:
merge into teste t1
using (select max(b) as m, a from teste group by a) t2
on (t1.a = t2.a)
when matched then
  update set b = nvl(t2.m, seq_teste.nextval);
One thing: that nextval will ALWAYS be consumed, even when its value isn't actually used. If you don't want that, you might need some PL/SQL code.
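For instance (a hedged sketch, not from the original thread): you could hide nextval behind a function with a made-up name like next_b and call it from a CASE expression, which Oracle evaluates with short-circuiting, so the sequence is only touched for the groups that need a new value:
create or replace function next_b return number as
begin
  -- consumes a sequence value only when actually invoked
  return seq_teste.nextval;
end;
/

merge into teste t1
using (select max(b) as m, a from teste group by a) t2
on (t1.a = t2.a)
when matched then
  update set b = case when t2.m is null then next_b() else t2.m end;
Note this does use PL/SQL (for the function), which the question ruled out for the statement itself.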

Related

SQLite order results by smallest difference

In many ways this question follows on from my previous one. I have a table that is pretty much identical:
CREATE TABLE IF NOT EXISTS test
(
    id INTEGER PRIMARY KEY,
    a INTEGER NOT NULL,
    b INTEGER NOT NULL,
    c INTEGER NOT NULL,
    d INTEGER NOT NULL,
    weather INTEGER NOT NULL
);
in which I would typically have entries such as
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30100306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30140306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,5,5,10100306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,5,5,5,11100306);
INSERT INTO test (a,b,c,d,weather) VALUES(5,5,5,5,21101306);
Typically this table would have multiple rows with some or all of the b, c and d values being identical, but with different a and weather values. As per the answer to my other question I can certainly issue
WITH cte AS (SELECT *, DENSE_RANK() OVER (ORDER BY (b=2) + (c=3) + (d=4) DESC) rn FROM test where a = 1) SELECT * FROM cte WHERE rn < 3;
No issues thus far. However, I have one further requirement which arises as a result of the weather column. Although this value is an integer it is in fact a composite where each digit represents a "banded" weather condition. Take for example weather = 20100306. Here 2 represents the wind direction divided up into 45 degree bands on the compass, 0 represents a wind speed range, 1 indicates precipitation as snow etc. What I need to do now while obtaining my ordered results is to allow for weather differences. Take for example the first two rows
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30100306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30140306);
Though otherwise similar, they represent rather different weather conditions - the fourth digit is 4 as opposed to 0, indicating a higher precipitation intensity band. The WITH cte... above would rank the first two rows at the top, which is fine. But what if I would rather have the row that differs the least from an incoming "weather condition" of 30130306? I would clearly like to have the second row appearing at the top. Once again, I can live with the "raw" result returned by WITH cte... and then drill down to the right row based on my current "weather condition" in Java. However, once again I find myself thinking that there is perhaps a rather neat way of doing this in SQL that is outwith my skill set. I'd be most obliged to anyone who might be able to tell me how/whether this can be done using just SQL.
You can sort the results first by DENSE_RANK() and then by the absolute difference between weather and the incoming "weather condition":
WITH cte AS (
SELECT *,
DENSE_RANK() OVER (ORDER BY (b=2) + (c=3) + (d=4) DESC) rn
FROM test
WHERE a = 1
)
SELECT a,b,c,d,weather
FROM cte
WHERE rn < 3
ORDER BY rn, ABS(weather - ?);
Replace ? with the value of that incoming "weather condition".
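With the incoming "weather condition" from the question, the final lines would read (a concrete instance of the query above):
SELECT a,b,c,d,weather
FROM cte
WHERE rn < 3
ORDER BY rn, ABS(weather - 30130306);
For the two sample rows this puts 30140306 first, since ABS(30140306 - 30130306) = 10000 is smaller than ABS(30100306 - 30130306) = 30000.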

SQL Query assistance - Looping through data query

I have two tables. Config and Data. Config table has info to define what I call "Predefined Points". The columns are configId, machineId, iotype, ioid, subfield and predeftype. I have a second table that contains all the data for all the items in the config table linked by configId. Data table contains configId, timestamp, value.
I am trying to return each row from the config table with 2 new columns in the result: the min timestamp and the max timestamp of that particular predefined point.
Pseudocode would be
select a.*, min(b.timestamp), max(b.timestamp) from TrendConfig a join TrendData b on a.configId = b.configId where configId = (select configId from TrendConfig)
Where the subquery would return multiple values.
Any idea how to formulate this?
Try an inner join:
select a.*, min(b.timestamp), max(b.timestamp)
from config a
inner join data b
on a.configId = b.configID
I was able to find an answer using: Why can't you mix Aggregate values and Non-Aggregate values in a single SELECT?
The solution was indeed GROUP BY as CL mentioned above.
select a.configId, a.machineId, a.iotype, a.ioid, a.subfield, a.predeftype,
       min(b.timestamp), max(b.timestamp)
from TrendConfig a
join TrendData b on a.configId = b.configId
group by a.configId, a.machineId, a.iotype, a.ioid, a.subfield, a.predeftype
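If repeating every config column in the GROUP BY feels brittle, an equivalent sketch (same tables and columns as above) aggregates the data table first and joins the result back:
select a.*, m.min_ts, m.max_ts
from TrendConfig a
join (select configId,
             min(timestamp) as min_ts,
             max(timestamp) as max_ts
      from TrendData
      group by configId) m on m.configId = a.configId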

How to get the total quantity of results using count(*)?

I need to get the total quantity of results for each person, but I get the wrong counts (the original post showed a screenshot, "resultado", which is not reproduced here).
My query:
select t.fecha_hora_timbre,e.nombre,e.apellido,d.descripcion as departamento_trabaja, t.fecha,count(*)
from fulltime.timbre t, fulltime.empleado e, fulltime.departamento d
where d.depa_id=e.depa_id and t.codigo_empleado=e.codigo_empleado and
trunc(t.fecha) between trunc(to_date('15/02/2017','dd/mm/yyyy')) and trunc(to_date('14/03/2017','dd/mm/yyyy'))
group by t.fecha_hora_timbre,e.nombre,e.apellido,d.descripcion, t.fecha
Expected data:
NOMBRE            | APELLIDO           | DEPARTAMENTO_TRABAJA | VECES_MARCADAS(count)
MARIA TARCILA     | IGLESIAS BECERRA   | ALCALDIA             | 4
KATHERINE TATIANA | SEGOVIA FERNANDEZ  | ALCALDIA             | 10
FREDDY AGUSTIN    | VALDIVIESO VALLEJO | ALCALDIA             | 3
UPDATE:
select e.nombre,e.apellido,d.descripcion as departamento_trabaja,COUNT(*)
from fulltime.timbre t, fulltime.empleado e, fulltime.departamento d
where d.depa_id=e.depa_id and t.codigo_empleado=e.codigo_empleado and
trunc(t.fecha) between trunc(to_date('15/02/2017','dd/mm/yyyy')) and trunc(to_date('14/03/2017','dd/mm/yyyy'))
group by t.fecha_hora_timbre,e.nombre,e.apellido,d.descripcion, t.fecha
You should only select and group by the non-aggregate columns you actually want to count against. At the moment you're including the fecha_hora_timbre and fecha columns in each row, so you're counting the unique combinations of those columns as well as the name/department information you actually want to count.
select e.nombre, e.apellido, d.descripcion as departamento_trabaja,
count(*) as veces_marcadas
from fulltime.timbre t
join fulltime.empleado e on t.codigo_empleado=e.codigo_empleado
join fulltime.departamento d on d.depa_id=e.depa_id
where t.fecha >= to_date('15/02/2017','dd/mm/yyyy')
and t.fecha < to_date('15/03/2017','dd/mm/yyyy')
group by e.nombre, e.apellido, d.descripcion
I've removed the extra columns. Notice that they have gone from both the select list and the group-by clause. If you have a non-aggregate column in the select list that isn't in the group-by you'll get an ORA-00937 error; but if you have a column in the group-by that isn't in the select list then it will still group by that even though you can't see it and you just won't get the results you expect.
I've also changed from old-style join syntax to modern syntax. And I've changed the date comparison; firstly because doing trunc() as part of trunc(to_date('15/02/2017','dd/mm/yyyy')) is pointless - you already know the time part is midnight, so the trunc achieves nothing. But mostly so that if there is an index on fecha, that index can be used. If you do trunc(t.fecha) then the column value for every row has to be truncated, which stops the index being used (unless you have a function-based index). Since between is inclusive, using >= and < with one day later as the higher limit has the same overall effect.
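If you did want to keep the trunc(t.fecha) comparison instead, the function-based index mentioned above would look something like this (the index name is made up):
create index timbre_trunc_fecha_ix on fulltime.timbre (trunc(fecha));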

Advanced SQLite Update table query

I am trying to update Table B of a database looking like this:
Table A:
id, amount, date, b_id
1,200,6/31/2012,1
2,300,6/31/2012,1
3,400,6/29/2012,2
4,200,6/31/2012,1
5,200,6/31/2012,2
6,200,6/31/2012,1
7,200,6/31/2012,2
8,200,6/31/2012,2
Table B:
id, b_amount, b_date
1,0,0
2,0,0
3,0,0
Now with this query I get all the data I need in one select:
SELECT A.*,B.* FROM A LEFT JOIN B ON B.id=A.b_id WHERE A.b_id>0 GROUP BY B.id
id, amount, date, b_id, id, b_amount, b_date
1,200,6/31/2012,1,1,0,0
3,400,6/29/2012,2,2,0,0
Now, I just want to copy the selected column amount to b_amount and date to b_date
b_amount=amount, b_date=date
resulting in
id, amount, date, b_id, id, b_amount, b_date
1,200,6/31/2012,1,1,200,6/31/2012
3,400,6/29/2012,2,2,400,6/29/2012
I've tried COALESCE() without success.
Does someone experienced have a solution for this?
Solution:
Thanks to the answers below, I managed to come up with this. It is probably not the most efficient way but it is fine for a one time only update. This will insert for you the first corresponding entry of each group.
REPLACE INTO B SELECT Bid, amount, date FROM
(SELECT B.id AS Bid, A.amount, A.date FROM A INNER JOIN B ON (B.id=A.b_id)
ORDER BY A.id DESC)
GROUP BY Bid;
So what you are looking for seems to be a JOIN inside an UPDATE query. In MySQL you would use
UPDATE B INNER JOIN A ON B.id=A.b_id SET B.amount=A.amount, B.date=A.date;
but this is not supported by SQLite, as this probably-related question points out. However, there is a workaround using REPLACE:
REPLACE INTO B
SELECT B.id, A.amount, A.date FROM A
LEFT JOIN B ON B.id=A.b_id
WHERE A.b_id>0 GROUP BY B.id;
The query will simply fill in the values of table B for all columns which should keep their state, and fill in the values of table A for the copied values. Make sure the order of the columns in the SELECT statement matches the column order of table B and that all columns are present, or you will lose those fields' data. This is somewhat fragile with respect to future changes to table B, so remember to update the column order/presence in this query whenever you change table B.
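Worth noting: SQLite 3.33.0 and later do support UPDATE ... FROM, so on a recent version the join-update can be written directly and the REPLACE workaround isn't needed; a sketch against the tables above:
UPDATE B
SET b_amount = A.amount,
    b_date = A.date
FROM A
WHERE A.b_id = B.id;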
Something a bit off topic, because you did not ask for that: A.b_id is obviously a foreign key to B.id. It seems you are using the value 0 in the foreign key to express that there is no corresponding entry in B (inferred from your SELECT with WHERE A.b_id>0). You should consider using a null value for that. If you then use INNER JOIN instead of LEFT JOIN, you can drop the WHERE clause entirely; the DBS will sort out all unsatisfied relations.
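Applying that advice, the INNER JOIN version of the REPLACE above needs no WHERE clause:
REPLACE INTO B
SELECT B.id, A.amount, A.date
FROM A
INNER JOIN B ON B.id = A.b_id
GROUP BY B.id;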
WARNING: Some RDBMS will return 2 rows as shown above. Others will return the Cartesian product of the rows, i.e. A rows times B rows.
One tricky method is to generate SQL that is then executed:
SELECT 'update B set b_amount = ' || A.amount ||
       ', b_date = ''' || A.date || ''' where id = ' || A.b_id || ';'
FROM A LEFT JOIN B ON B.id=A.b_id WHERE A.b_id>0 GROUP BY B.id;
Now execute the generated statements. The query result should look like this:
update B set b_amount = 200, b_date = '6/31/2012' where id = 1;
update B set b_amount = 400, b_date = '6/29/2012' where id = 2;
NOTE: Some RDBMS will handle dates differently. Some require quotes.

Fastest Way to Count Distinct Values in a Column, Including NULL Values

The Transact-SQL COUNT(DISTINCT ...) operation counts all distinct non-null values in a column. I need to count the number of distinct values per column in a set of tables, including null values (so if there is a null in the column, the result should be (Select Count(Distinct COLNAME) From TABLE) + 1).
This is going to be repeated over every column in every table in the DB: hundreds of tables, some of which have over 1M rows. Because this needs to be done over every single column, adding an index for every column is not a good option.
This will be done as part of an ASP.net site, so integration with code logic is also ok (i.e.: this doesn't have to be completed as part of one query, though if that can be done with good performance, then even better).
What is the most efficient way to do this?
Update After Testing
I tested the different methods from the answers given on a good representative table. The table has 3.2 million records, dozens of columns (a few with indexes, most without). One column has 3.2 million unique values. Other columns range from all Null (one value) to a max of 40K unique values. For each method I performed four tests (with multiple attempts at each, averaging the results): 20 columns at one time, 5 columns at one time, 1 column with many values (3.2M) and 1 column with a small number of values (167). Here are the results, in order of fastest to slowest
1) Count/GroupBy (Cheran)
2) CountDistinct+SubQuery (Ellis)
3) dense_rank (Eriksson)
4) Count+Max (Andriy)
Testing results (in seconds):
Method           | 20_Columns | 5_Columns | 1_Column (Large) | 1_Column (Small)
1) Count/GroupBy |       10.8 |       4.8 |              2.8 |             0.14
2) CountDistinct |       12.4 |       4.8 |                3 |              0.7
3) dense_rank    |        226 |        30 |                6 |             4.33
4) Count+Max     |       98.5 |        44 |               16 |             12.5
Notes:
Interestingly enough, the two methods that were fastest (by far, with only a small difference between them) were both methods that submitted separate queries for each column (and in the case of result #2, the query included a subquery, so there were really two queries submitted per column). Perhaps the gains achieved by limiting the number of table scans are small in comparison to the performance hit taken in terms of memory requirements (just a guess).
Though the dense_rank method is definitely the most elegant, it seems that it doesn't scale well (see the result for 20 columns, which is by far the worst of the four methods), and even on a small scale it just cannot compete with the performance of Count.
Thanks for the help and suggestions!
SELECT COUNT(*)
FROM (SELECT ColumnName
FROM TableName
GROUP BY ColumnName) AS s;
GROUP BY selects distinct values including NULL. COUNT(*) will include NULLs, as opposed to COUNT(ColumnName), which ignores NULLs.
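To see the difference on a throwaway data set of three values (1, 2 and NULL):
SELECT COUNT(*) AS cnt          -- returns 3: NULL forms its own group
FROM (SELECT x
      FROM (VALUES (1), (2), (NULL)) AS v(x)
      GROUP BY x) AS s;
-- COUNT(DISTINCT x) over the same values would return 2.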
I think you should try to keep the number of table scans down and count all columns in one table in one go. Something like this could be worth trying.
;with C as
(
select dense_rank() over(order by Col1) as dnCol1,
dense_rank() over(order by Col2) as dnCol2
from YourTable
)
select max(dnCol1) as CountCol1,
max(dnCol2) as CountCol2
from C
Test the query at SE-Data
A development of the OP's own solution:
SELECT
COUNT(DISTINCT acolumn) + MAX(CASE WHEN acolumn IS NULL THEN 1 ELSE 0 END)
FROM atable
Run one query that counts the number of distinct values and adds 1 if there are any NULLs in the column (using a subquery):
Select Count(Distinct COLUMNNAME) +
Case When Exists
(Select * from TABLENAME Where COLUMNNAME is Null)
Then 1 Else 0 End
From TABLENAME
You can try:
count(
    distinct coalesce(
        your_table.column_1, your_table.column_2
        -- cast them if the two columns are not the same type
    )
) as COUNT_TEST
The coalesce function lets you combine two columns, replacing nulls in the first with the value from the second. I used this in my case and it produced the correct result.
Not sure this would be the fastest, but it might be worth testing. Use CASE to give null a value; clearly you would need to pick a value for null that does not occur in the real data. According to the query plan this is a dead heat with the count(*) (group by) solution proposed by Cheran S.
SELECT
COUNT( distinct
(case when [testNull] is null then 'dbNullValue' else [testNull] end)
)
FROM [test].[dbo].[testNullVal]
With this approach you can also count more than one column:
SELECT
COUNT( distinct
(case when [testNull1] is null then 'dbNullValue' else [testNull1] end)
),
COUNT( distinct
(case when [testNull2] is null then 'dbNullValue' else [testNull2] end)
)
FROM [test].[dbo].[testNullVal]
