I have the code below to do matching between two different tables, but it only updates the first record as "Matched".
I want to check each record's ID in T1 against T2: e.g. check whether A is present in T2, then move to the next record in T1 and check B, and so on in a loop until every record in T1 has been checked.
Table 1
ID
A
B
C
Table 2
ID
A
B
Expected Matching Results
ID
A
B
Any help please
If rs2("ID").Value = rs1("ID").Value Then
rs2.MoveNext()
Do While Not rs2.EOF()
rs1("Matching").Value = "Matched"
rs1.Update()
rs2.MoveFirst()
Loop
End If
Assuming I'm understanding your problem correctly, you just want to update the "Matching" column in T1 if there's a corresponding ID record in T2. Doing this in SQL would be ideal, as Peter said.
If you really want to loop, Peter is correct that you have an endless loop: you exit when you hit the end of the record set, but at the end of each iteration you are resetting back to the first record.
I don't know why you are looping through rs2 when you already found a match. Once you know that rs1("ID").Value = rs2("ID").Value, just update the Matching field in rs1 and call rs2.MoveFirst() so you can start back at the top when looking for a match for the next ID in T1. No loop is required here.
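For reference, a minimal sketch of the set-based approach, assuming the tables are literally named T1 and T2 and the flag column is called Matching (adjust names to your schema; untested):

-- Hedged sketch: table and column names are taken from the question's description
UPDATE T1
SET    Matching = 'Matched'
WHERE  EXISTS (SELECT 1 FROM T2 WHERE T2.ID = T1.ID);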
I have a table where data is as given below
My requirement is to update this table so that, within a group (grouping is done on column A), if any row has a value in column B, that same value should be copied to the other rows in that group whose column B is null. If column B is null for every record in the group, then a new sequence value should be generated. Also, I can't use a PL/SQL block for this; I need a plain SQL query to do it.
My expected output is given below
You won't be able to use sequence_name.nextval directly in your update statement, as the value will increase with every row, meaning that you would end up with different values in your b column within each a group.
The best way round that I can think of doing this is to first of all ensure every set of all-null b values has a single value in it, which you can do as follows:
merge into t1 tgt
using (select a,
              b,
              rid,
              row_number() over (partition by a order by b) rn
       from (select a,
                    b,
                    rowid rid,
                    max(b) over (partition by a) max_b
             from t1)
       where max_b is null) src
on (tgt.rowid = src.rid and src.rn = 1)
when matched then
  update set tgt.b = t1_seq.nextval;
This finds the rows which have all the b values as null for a given a, and then updates one of them to have the next sequence value.
Once you've done that, you can then go ahead and populate the null values based on the max b value for that group, like so:
update t1
set b = (select max(b) from t1 t2 where t1.a = t2.a)
where b is null;
See this LiveSQL script for evidence that this works.
Something like this:
update t1
set b = (select nvl(max(t2.b), sequence_name.nextval) from t1 t2 where t2.a = t1.a)
PS: I couldn't test this.
Indeed we can't use sequences in correlated subqueries... :(
One workaround is to use merge:
merge into teste t1
using (select max(b) as m, a from teste group by a) t2
on (t1.a = t2.a)
when matched then update set b = nvl(t2.m, seq_teste.nextval);
One thing: that nextval will ALWAYS be consumed, even when it isn't used. If you don't want that, you might need some PL/SQL code.
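To illustrate that last point, here is a minimal PL/SQL sketch reusing the teste / seq_teste names from the merge above. The asker ruled out PL/SQL, so this is only to show how the sequence could be consumed once per all-null group rather than on every merge row:

declare
  v_next number;
begin
  -- one sequence value per group in which every b is null
  for r in (select a from teste group by a having max(b) is null) loop
    select seq_teste.nextval into v_next from dual;
    update teste set b = v_next where a = r.a;
  end loop;
  -- groups that already had a value: copy it onto the remaining null rows
  update teste t
     set b = (select max(b) from teste t2 where t2.a = t.a)
   where b is null;
end;
/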
I need to get the total count of results for each person, but instead I get the output shown in the attached "resultado" screenshot.
MY QUERY..
select t.fecha_hora_timbre,e.nombre,e.apellido,d.descripcion as departamento_trabaja, t.fecha,count(*)
from fulltime.timbre t, fulltime.empleado e, fulltime.departamento d
where d.depa_id=e.depa_id and t.codigo_empleado=e.codigo_empleado and
trunc(t.fecha) between trunc(to_date('15/02/2017','dd/mm/yyyy')) and trunc(to_date('14/03/2017','dd/mm/yyyy'))
group by t.fecha_hora_timbre,e.nombre,e.apellido,d.descripcion, t.fecha
Expected data...
NOMBRE            | APELLIDO           | DEPARTAMENTO_TRABAJA | VECES_MARCADAS (count)
MARIA TARCILA     | IGLESIAS BECERRA   | ALCALDIA             | 4
KATHERINE TATIANA | SEGOVIA FERNANDEZ  | ALCALDIA             | 10
FREDDY AGUSTIN    | VALDIVIESO VALLEJO | ALCALDIA             | 3
UPDATE..
select e.nombre,e.apellido,d.descripcion as departamento_trabaja,COUNT(*)
from fulltime.timbre t, fulltime.empleado e, fulltime.departamento d
where d.depa_id=e.depa_id and t.codigo_empleado=e.codigo_empleado and
trunc(t.fecha) between trunc(to_date('15/02/2017','dd/mm/yyyy')) and trunc(to_date('14/03/2017','dd/mm/yyyy'))
group by t.fecha_hora_timbre,e.nombre,e.apellido,d.descripcion, t.fecha
You should only select and group by the non-aggregate columns you actually want to count against. At the moment you're including the fecha_hora_timbre and fecha columns in each row, so you're counting the unique combinations of those columns as well as the name/department information you actually want to count.
select e.nombre, e.apellido, d.descripcion as departamento_trabaja,
count(*) as veces_marcadas
from fulltime.timbre t
join fulltime.empleado e on t.codigo_empleado=e.codigo_empleado
join fulltime.departamento d on d.depa_id=e.depa_id
where t.fecha >= to_date('15/02/2017','dd/mm/yyyy')
and t.fecha < to_date('15/03/2017','dd/mm/yyyy')
group by e.nombre, e.apellido, d.descripcion
I've removed the extra columns. Notice that they have gone from both the select list and the group-by clause. If you have a non-aggregate column in the select list that isn't in the group-by you'll get an ORA-00979 error; but if you have a column in the group-by that isn't in the select list then it will still group by it, even though you can't see it, and you just won't get the results you expect.
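A quick sketch of both situations, using the empleado table from the question (illustrative only):

-- (a) raises ORA-00979: nombre is selected but not in the group-by
select e.nombre, count(*) from fulltime.empleado e group by e.apellido;

-- (b) runs, but silently groups by apellido as well, so you get more rows than expected
select e.nombre, count(*) from fulltime.empleado e group by e.nombre, e.apellido;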
I've also changed from old-style join syntax to modern syntax, and I've changed the date comparison: firstly because doing trunc() as part of trunc(to_date('15/02/2017','dd/mm/yyyy')) is pointless - you already know the time part is midnight, so the trunc achieves nothing. But mostly so that if there is an index on fecha, that index can be used. If you do trunc(t.fecha) then the value in every row has to be truncated, which stops the index being used (unless you have a function-based index; see the sketch below). Since between is inclusive, using >= and < with the day after the upper limit has the same overall effect.
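Purely to illustrate that parenthetical, a function-based index would let a trunc(fecha) predicate stay index-friendly; the index name here is made up:

-- Hypothetical function-based index so that trunc(fecha) predicates can still use an index
create index ix_timbre_trunc_fecha on fulltime.timbre (trunc(fecha));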
I have following query :
SELECT distinct A1 ,sum(total) as sum_total FROM
(
SELECT A1, A2,A3,A4,A5,A6,COUNT(A7) AS total,A8
FROM (
select a.* from table1 a
left join (select * from table_reject where name = 'smith') b on A.A3 = B.B3 and A.A9 =B.B2
where B.ID is null
) t1
WHERE A8 >= NEXT_DAY ( trunc(to_date('17/09/2013 12:00:00','dd/mm/yyyy hh24:mi:ss')) ,'SUN' )
GROUP BY
CUBE(A1, A2,A3,A4,A5,A6,A8)
)INN
WHERE
INN.A1 IS NOT NULL AND
INN.A2 IS NULL AND
INN.A3 IS NULL AND
INN.A4 IS NULL AND
INN.A5 IS NULL AND
INN.A6 is NULL AND
INN.A8 IS NOT NULL
GROUP BY A1
ORDER BY sum_total DESC ;
Total number records in table1 is around 8 million.
My problem is that I need to optimize the above query as much as possible. I tried creating an index on column A8 of table1, and the index did decrease the cost of the query, but the execution time is more or less the same as with no index on table1.
Any help would be appreciated.
Thanks
A CUBE operation on a large data set is really expensive, so you need to check whether you really need all that data in the inner query. I see you are doing COUNT in the inner query and then SUM of those counts in the outer query; in other words, "give me the row count of A7 for every combination of A1-A8 (excluding A7), then SUM only the combinations selected by the WHERE clause." We can certainly optimize this by limiting the CUBE to certain columns, but the most obvious things I have noticed so far are as follows.
If you use the query below and have the right indexes on Table1 and Table_Reject, both queries can use the indexes and reduce the data set that needs to be joined and processed further.
I am not 100% sure, but partial CUBE processing seems possible and is worth checking.
Clustered index --> Table1 needs one on A8, and Table_Reject needs a clustered index on NAME.
Non-clustered index --> Table1 needs one on A3, A9, and Table_Reject needs one on B3, B2 (see the index sketch below).
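A minimal sketch of that index DDL, written in SQL Server syntax to match the clustered/non-clustered terminology above; the index names are made up, and on Oracle you would use plain CREATE INDEX (or an index-organized table) instead:

-- Illustrative only; index names are assumptions
create clustered index ix_table1_a8 on table1 (a8);
create clustered index ix_table_reject_name on table_reject (name);
create nonclustered index ix_table1_a3_a9 on table1 (a3, a9);
create nonclustered index ix_table_reject_b3_b2 on table_reject (b3, b2);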
SELECT qry1.*
FROM
(
    -- pre-filter table1 on A8 before the anti-join; keep A9 because the join below needs it
    SELECT A1, A2, A3, A4, A5, A6, A7, A8, A9
    FROM table1
    WHERE A8 >= NEXT_DAY ( trunc(to_date('17/09/2013 12:00:00','dd/mm/yyyy hh24:mi:ss')) ,'SUN' )
) qry1
LEFT JOIN
(
    -- only the reject rows we care about
    select B3, B2, ID
    from table_reject
    where name = 'smith'
) qry2
ON qry1.A3 = qry2.B3 and qry1.A9 = qry2.B2
WHERE qry2.ID IS NULL
EDIT1:
I tried to find out what difference it makes to the CUBE result if you run it on all columns versus only the columns you need in the result set. What I found is that, given the way CUBE works, you do not need to perform CUBE on all columns, because in the end you only care about the combinations generated by CUBE where A1 and A8 are NOT NULL.
Try this link and see the output.
Query 1 and Query 2 are just the innermost queries, to compare the CUBE result sets.
Query 3 and Query 4 are the same query you are running, and you can see the results are the same in both cases.
DECLARE #NEXT_DAY DATE = NEXT_DAY ( trunc(to_date('17/09/2013 12:00:00','dd/mm/yyyy hh24:mi:ss')) ,'SUN' )
SELECT distinct A1 ,sum(total) as sum_total FROM
(
SELECT A1,COUNT(A7) AS total,A8
FROM (
select a.a1,a.a7,a.a8
from table1 a
left join (select * from table_reject where name = 'smith') b
on A.A3 = B.B3 and A.A9 =B.B2
where B.ID is null
) t1
WHERE A8 >= #NEXT_DAY
GROUP BY
CUBE(A1,A8)
)INN
WHERE INN.A1 IS NOT NULL AND
INN.A8 IS NOT NULL
GROUP BY A1
ORDER BY sum_total DESC ;
EDIT3
As I mentioned in the comment, this is a Round 3 update; I can't change the comment, but I meant Edit 3 instead of Round 3.
The new change in your query is adding the WHERE A8 >= #NEXT_DAY condition to the innermost left-join select, alongside B.ID IS NULL; that has improved the selection very much.
In your last comment you mentioned that the query takes 30-35 seconds and that the time keeps increasing as you change the value of A8, but along with the execution time you didn't mention how much data is in the result set. Why does that matter? Because if my query returns 5M rows as the final result set, it will spend 90% of the time just dropping that data onto the UI or into an output file, whatever output method you are using. Actual performance should be measured by how soon the query starts returning the first rows, because by that time the optimizer has already decided the execution plan and the DB is executing it. That said, I agree that if a query returns 100 rows and takes 10 seconds, something may be wrong with the execution plan.
To demonstrate, I created dummy data and ran your query against it.
I have a table Test_CubeData with 9M rows in it, with the same column count and data types you described for your Table1. I have a second table Table_Reject with 80K rows, whose columns and data types I inferred from the query. To test the extreme case for this table, the name column has only the one value "smith" and ID is null for all 80K rows, so the only column values that can affect the inner left-join result are B2 and B3.
In these tests I have no index on either table; both are heaps. You can see the results still come back in a few seconds for a reasonable amount of data in the result set; as my result set grows, the completion time grows. If I create the indexes described above, I get an Index Seek operation for all of these test cases, but at a certain point that index will be exhausted and it becomes an Index Scan.
One sure example: if my filter value for the A8 column is the smallest date in that column, the optimizer sees that all 9M rows have to participate in the inner select and the CUBE, and a lot of data gets processed in memory, which is expected. On the other hand, consider another set of example queries: I have 32,873 unique values in the A8 column, distributed almost evenly across the 9M rows, so there are 260 to 300 rows per A8 value. If I execute the query for any single value - smallest, largest, or anything in between - the execution time should not change much.
Note the highlighted text in each image below, which indicates: the chosen A8 filter value, only the needed columns in the select list instead of *, the A8 filter added to the inner left-join query, the execution plan showing a TableScan operation on both tables, the query execution time in seconds, and the total number of rows returned by the query.
I hope that this will clear some doubts on performance of your query and will help you to set right expectation.
**Table Row Counts**
**TableScan_InnerLeftJoin**
**TableScan_FullQuery_248Rows**
**TableScan_FullQuery_5K**
**TableScan_FullQuery_56K**
**TableScan_FullQuery_480k**
You're calculating a potentially very large cube result on seven columns, and then discarding all the results except those that are logically just a GROUP BY on column A1.
I suggest that you rewrite the query to just group by A1.
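A minimal sketch of that rewrite, keeping the join and A8 filter from the original query (untested, and assuming the sum of the per-(A1, A8) counts is all you actually need):

select inn.a1, count(inn.a7) as sum_total
from (
    select a.a1, a.a7
    from table1 a
    left join (select b3, b2, id from table_reject where name = 'smith') b
      on a.a3 = b.b3 and a.a9 = b.b2
    where b.id is null
      and a.a8 >= next_day(trunc(to_date('17/09/2013 12:00:00','dd/mm/yyyy hh24:mi:ss')), 'SUN')
) inn
group by inn.a1
order by sum_total desc;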
I'm making a report in Cognos Report Studio and I'm having a bit of trouble getting a count that I need. What I need to do is count the number of IDs for a department, but I need to split the count between initiated and completed. If an ID occurs more than once, it is to be counted as completed; the others, of course, are initiated. So I'm trying to count the number of occurrences of each distinct ID. Here is the query I've made in SQL Developer:
SELECT
COUNT((CASE WHEN COUNT(S.RFP_ID) > 8 THEN MAX(CT.GCT_STATUS_HISTORY_CLOSE_DT) END)) AS "Sales Admin Completed"
,COUNT((CASE WHEN COUNT(S.RFP_ID) = 8 THEN MIN(CT.GCT_STATUS_HISTORY_OPEN_DT) END)) as "Sales Admin Initiated"
FROM
ADM.B_RFP_WC_COVERAGE_DIM S
JOIN ADM.B_GROUP_CHANGE_REQUEST_DIM CR
ON S. RFP_ID = CR.GCR_RFP_ID
JOIN ADM.GROUP_CHANGE_TASK_FACT CT
ON CR.GROUP_CHANGE_REQUEST_KEY = CT.GROUP_CHANGE_REQUEST_KEY
JOIN ADM.B_DEPARTMENT_DIM D
ON D.DEPARTMENT_KEY = CT.DEPARTMENT_RESP_KEY
WHERE CR.GCR_CHANGE_TYPE_ID = '20'
AND S.RFP_LOB_IND = 'WC'
AND S.RFP_AUDIT_IND = 'N'
AND CR.GCR_RECEIVED_DT BETWEEN '01-JAN-13' AND '31-DEC-13'
AND D.DEPARTMENT_DESC = 'Sales'
AND CT.GCT_STATUS_IND = 'C'
GROUP BY S.RFP_ID ;
Now this works, but I'm not sure how to translate that into Cognos. I tried a CASE that looked like this (this code uses basic names such as dept instead of D.DEPARTMENT_DESC):
CASE WHEN dept = 'Sales' AND count(ID for {DISTINCT ID}) > 1 THEN count(distinct ID)END)
I'm using count(distinct ID) instead of count(maximum(close_date)), but the results would be the same either way. The "AND" is where I think it's getting lost; it obviously isn't the proper way to count occurrences, but I'm hoping I'm close. Is there a way to do this with a CASE? Or at all?
--EDIT--
To make my question more clear, here is an example:
Say I have this data in my table
ID
---
1
2
3
4
2
5
5
6
2
My desired count output would be:
Initiated Completed
--------- ---------
4 2
This is because two of the distinct IDs (2 and 5) occur more than once, so they are counted as Completed. The ones that occur only once are counted as Initiated. I am able to do this in SQL Developer, but I can't figure out how to do it in Cognos Report Studio. I hope this helps to better explain my issue.
Oh, I didn't quite get it originally; amending the answer.
But it's still easiest to do with 2 queries in Report Studio. The key point is that you can use a query as the source for another query, guaranteeing proper group-bys and calculations.
So if you have ID list in the table in Report Studio you create:
Query 1 with dataitems:
ID,
count(*) or count (1) as count_occurences
status (initiated or completed) with a formula: if (count_occurences > 1) then ('completed') else ('initiated').
After that you create a Query 2, using Query 1 as its source, with just 2 data items:
[Query1].[Status]
Count with formula: count([Query1].[ID])
That will give you the result you're after.
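For comparison, the same two-step logic expressed as plain SQL (the table name is a placeholder; this just shows the shape of the nested aggregation, not Cognos syntax):

select case when cnt > 1 then 'Completed' else 'Initiated' end as status,
       count(*) as id_count
from (
    select id, count(*) as cnt
    from your_id_table        -- placeholder for the ID list feeding Query 1
    group by id
)
group by case when cnt > 1 then 'Completed' else 'Initiated' end;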
Here's a link to doco on how to nest queries:
http://pic.dhe.ibm.com/infocenter/cx/v10r1m0/topic/com.ibm.swg.ba.cognos.ug_cr_rptstd.10.1.0.doc/c_cr_rptstd_wrkdat_working_with_queries_rel.html?path=3_3_10_6#cr_rptstd_wrkdat_working_with_queries_rel
I was wondering if anyone can point me in the right direction, please? Say I am pulling the following table; I would then like to select an ID randomly. I understand how to select a random number using a Randomize() call followed by the relevant syntax, but I want to pre-define the range.
i.e. Table Data
ID   | Name
4345 | Mike
3456 | Lee
4567 | John
There will be many more names but for this example you could use 3 or 4 etc..
Please help I'm starting to itch :o|
Just to make sure I understand what you want:
Given a table, you want to randomly select one of the ID values from that table.
If so, this should do it:
Dim rand As New Random()
Dim record As Integer = rand.Next(0, myDataTable.Rows.Count)
Dim randomID As Integer = CInt(myDataTable.Rows(record)("ID"))
We have all the information we need to randomly select a row, and by extension randomly select one of the ID values in the table.
In old VB you would do:
Dim i As Integer
i = Int((maxval - minval + 1) * Rnd) + minval
Since Rnd returns a random number between 0 and 1 (excluding 1), you scale it to the right range; Int truncates, so every value from minval to maxval is equally likely.