I have 2 tables, say T1 and T2, with a 1-n relationship (n can be 0). I need to join the 2 tables, but only on the latest T2 row. So the query I made was like:
select * from t1 left join t2 on t1.a = t2.b group by t1.a having t2.c=max(t2.c)
Problem is that if there are no rows in T2, the query does not return a row, despite the LEFT JOIN. I think this is incorrect with regard to the SQL standard.
So does anyone know how to get a result even when n=0?
HAVING is executed after the grouping.
And any comparison with NULL fails, so this filters out rows with n=0.
In any case, t2.c=max(t2.c) does not make sense and does not do what you want.
If you have SQLite 3.7.11 or later, you can use max() in the result to select the row from which the GROUP BY results for the other columns come:
SELECT *, max(t2.c)
FROM t1 LEFT JOIN t2 ON t1.a = t2.b
GROUP BY t1.a
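A minimal sketch using Python's built-in sqlite3 module, with made-up rows and a hypothetical note column on t2 (t1 row 2 has no t2 rows, so n = 0). It runs the original HAVING query and the max() version side by side.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (a INTEGER PRIMARY KEY);
    CREATE TABLE t2 (b INTEGER, c INTEGER, note TEXT);  -- note is hypothetical
    INSERT INTO t1 (a) VALUES (1), (2);                 -- a = 2 has no t2 rows
    INSERT INTO t2 (b, c, note) VALUES (1, 10, 'old'), (1, 20, 'latest');
""")

# Original query: for a = 2, t2.c and max(t2.c) are both NULL, and NULL = NULL
# is not true, so HAVING removes that group.
having_rows = con.execute("""
    SELECT t1.a, t2.note FROM t1 LEFT JOIN t2 ON t1.a = t2.b
    GROUP BY t1.a HAVING t2.c = max(t2.c)
""").fetchall()

# SQLite 3.7.11+: with a single max() in the result, the bare column t2.note
# comes from the row holding the maximum c.
max_rows = con.execute("""
    SELECT t1.a, t2.note, max(t2.c) FROM t1 LEFT JOIN t2 ON t1.a = t2.b
    GROUP BY t1.a ORDER BY t1.a
""").fetchall()

print(having_rows)
print(max_rows)  # [(1, 'latest', 20), (2, None, None)]
```

The max() version keeps a = 2 with NULLs for the t2 columns, which is what the LEFT JOIN promised.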
I know there are many topics regarding this question but none actually helped me solve my problem. I am still sort of new when it comes to databases and I came across this problem.
I have a table named tests which contains two columns: id and date.
I want to calculate the average difference in days between a set of dates.
Say select date from tests where id=1, which gives me a list of dates. I want to calculate the average difference between those days.
Table "tests" (id|date):
1|2018-03-13
1|2018-03-01
2|2018-03-13
2|2018-03-01
3|2018-03-13
3|2018-03-01
1|2018-03-17
2|2018-03-17
3|2018-03-17
Select date from tests where id=1
2018-03-13
2018-03-01
2018-03-17
Now I am looking to calculate the average difference in days between those three dates.
Can really use some help, thank you!
Edit:
Sorry for being unclear, I'll clarify my question.
So student one had a test on 01/03, then on 13/03, and then on 17/03. What I want to calculate is the average difference in days from one test to the next, so:
The difference between the first and second is 12 days. The difference between the second and third is 4 days.
(12 + 4) divided by two, since we have two gaps, is 8.
"I am looking to calculate the average difference in days between those three dates."
And by average difference we mean "take the average of the absolute value of the difference between all dates". That's (12 + 16 + 4) / 3, or about 10.6667.
We need all combinations of dates. For this we need a self-join with no repeats. That's accomplished by joining on id and adding a < (or >) comparison on date to the ON clause, so each pair appears only once.
select t1.date, t2.date
from tests as t1
join tests as t2 on t1.id = t2.id and t1.date < t2.date
where t1.id = 1;
2018-03-01|2018-03-13
2018-03-01|2018-03-17
2018-03-13|2018-03-17
Now that we have all combinations, we can take the difference. But not by simply subtracting the dates: SQLite has no dedicated date type, so these are text values, and subtracting them does not give a number of days. First, convert them to Julian Days.
sqlite> select julianday(t1.date), julianday(t2.date) from tests as t1 join tests as t2 on t1.id = t2.id and t1.date < t2.date where t1.id = 1;
2458178.5|2458190.5
2458178.5|2458194.5
2458190.5|2458194.5
Now that we have numbers we can take the absolute value of the difference and do an average.
select avg(abs(julianday(t1.date) - julianday(t2.date)))
from tests as t1
join tests as t2 on t1.id = t2.id and t1.date < t2.date
where t1.id = 1;
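The pairwise average can be checked by running the query above through Python's sqlite3 module on the sample rows from the question; Python is only a harness here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tests (id INTEGER, date TEXT);
    INSERT INTO tests (id, date) VALUES
        (1, '2018-03-13'), (1, '2018-03-01'),
        (2, '2018-03-13'), (2, '2018-03-01'),
        (3, '2018-03-13'), (3, '2018-03-01'),
        (1, '2018-03-17'), (2, '2018-03-17'), (3, '2018-03-17');
""")

# Self-join with t1.date < t2.date yields each pair of dates exactly once.
(avg_days,) = con.execute("""
    SELECT avg(abs(julianday(t1.date) - julianday(t2.date)))
    FROM tests AS t1
    JOIN tests AS t2 ON t1.id = t2.id AND t1.date < t2.date
    WHERE t1.id = 1
""").fetchone()

print(round(avg_days, 4))  # 10.6667
```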
UPDATE
What I want to calculate is the avg difference in days between test to test, so: Diff between first to second is 12 days. Diff between second to third is 4 days. Then (12+4)/2=8 which should be the result.
For this twist on the problem you want to compare each row with the next one. You want a table like this:
2018-03-01|2018-03-13
2018-03-13|2018-03-17
Other databases have window functions such as lag() and lead() to accomplish this. SQLite versions before 3.25.0 don't have them. Again, we'll use a self-join, but we have to do it per row. This is a correlated subquery.
select t1.date as date, (
select t2.date
from tests t2
where t1.id = t2.id and t2.date > t1.date
order by t2.date
limit 1
) as next
from tests t1
where id = 1
and next is not null
The subquery-as-column finds the next date for each row.
This is a bit unwieldy, so let's turn it into a view. Then we can use it as a table. Just take out the where id = 1 (and add t1.id to the selected columns) so it's generally useful.
create view test_and_next as
select t1.id, t1.date as date, (
select t2.date
from tests t2
where t1.id = t2.id and t2.date > t1.date
order by t2.date
limit 1
) as next
from tests t1
where next is not null
Now we can treat test_and_next as a table with the columns id, date, and next. Then it's the same as before: turn them into Julian Days, subtract, and take the average.
select avg(julianday(next) - julianday(date))
from test_and_next
where id = 1;
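A minimal sketch of the view approach, again driving SQLite from Python's sqlite3 module. Only ids 1 and 2 of the sample data are loaded, an assumption made to keep it short.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tests (id INTEGER, date TEXT);
    INSERT INTO tests (id, date) VALUES
        (1, '2018-03-13'), (1, '2018-03-01'), (1, '2018-03-17'),
        (2, '2018-03-13'), (2, '2018-03-01'), (2, '2018-03-17');

    -- For each row, the correlated subquery finds the next later date of the same id.
    CREATE VIEW test_and_next AS
    SELECT t1.id, t1.date AS date, (
        SELECT t2.date
        FROM tests t2
        WHERE t1.id = t2.id AND t2.date > t1.date
        ORDER BY t2.date
        LIMIT 1
    ) AS next
    FROM tests t1
    WHERE next IS NOT NULL;
""")

pairs = con.execute(
    "SELECT date, next FROM test_and_next WHERE id = 1 ORDER BY date"
).fetchall()
(avg_gap,) = con.execute(
    "SELECT avg(julianday(next) - julianday(date)) FROM test_and_next WHERE id = 1"
).fetchone()

print(pairs)    # [('2018-03-01', '2018-03-13'), ('2018-03-13', '2018-03-17')]
print(avg_gap)  # 8.0
```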
Note that this will go sideways when you have two rows with the same date: there's no way for SQL to know which is the "next" one. For example, if there were two tests for ID 1 on "2018-03-13" they'll both choose "2018-03-17" as the "next" one.
2018-03-01|2018-03-13
2018-03-13|2018-03-17
2018-03-13|2018-03-17
I'm not sure how to fix this.
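One possible fix, not part of the original answer, is the window function lead(), available in SQLite 3.25.0 and later. In this sketch the duplicate 2018-03-13 row and the rowid tie-breaker are assumptions for illustration: each same-day row gets its own successor, so the gaps become 12, 0 and 4 days instead of both rows jumping to 2018-03-17.

```python
import sqlite3

# lead() is a window function; SQLite supports these from version 3.25.0 on.
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("this sketch needs SQLite 3.25.0 or later")

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tests (id INTEGER, date TEXT);
    -- Hypothetical data: id 1 has two tests on 2018-03-13.
    INSERT INTO tests (id, date) VALUES
        (1, '2018-03-01'), (1, '2018-03-13'), (1, '2018-03-13'), (1, '2018-03-17');

    -- lead() pairs each row with the following row of the same id; rowid breaks
    -- ties so each same-day row gets its own successor.
    CREATE VIEW test_and_next AS
    SELECT id, date, next FROM (
        SELECT id, date,
               lead(date) OVER (PARTITION BY id ORDER BY date, rowid) AS next
        FROM tests
    )
    WHERE next IS NOT NULL;
""")

pairs = con.execute(
    "SELECT date, next FROM test_and_next WHERE id = 1 ORDER BY date, next"
).fetchall()
(avg_gap,) = con.execute(
    "SELECT avg(julianday(next) - julianday(date)) FROM test_and_next WHERE id = 1"
).fetchone()

print(pairs)
print(avg_gap)
```

Whether a 0-day gap should count toward the average is a modelling choice; filter it out with an extra WHERE if it should not.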
I have this SQL statement:
SELECT a.*, c.*
FROM ALUMNOS AS a
JOIN calif AS c
where a.curp=c.curp
If I select the data from each table individually, it is shown, but when I do the join, 0 rows are returned. Can you help me, or do you need more information? Thanks.
Edit: Already solved: rows are shown only when both tables have at least one row with the same curp value.
Edit 2: I didn't realize that the join returns nothing if the joined column has no matching values. My bad.
Your query is not standard SQL: an inner JOIN needs an ON or USING clause (SQLite and MySQL accept the form above anyway and filter the resulting cross join with WHERE). What you want is an implicit join:
SELECT a.*, c.*
FROM ALUMNOS AS a, calif AS c
where a.curp=c.curp
You can also rewrite it as an explicit join:
SELECT a.*, c.*
FROM ALUMNOS AS a JOIN calif AS c USING (curp)
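A minimal sketch of the asker's situation in SQLite via Python's sqlite3 module. The columns nombre and calificacion and all rows are hypothetical; the point is that the inner join returns nothing until both tables share a curp value.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical columns and rows; only curp comes from the question.
    CREATE TABLE ALUMNOS (curp TEXT, nombre TEXT);
    CREATE TABLE calif (curp TEXT, calificacion INTEGER);
    INSERT INTO ALUMNOS VALUES ('AAA', 'Ana'), ('BBB', 'Beto');
    INSERT INTO calif VALUES ('ZZZ', 90);  -- no curp shared with ALUMNOS yet
""")

query = "SELECT a.*, c.* FROM ALUMNOS AS a JOIN calif AS c USING (curp)"

before = con.execute(query).fetchall()  # inner join: no matching curp, no rows

con.execute("INSERT INTO calif VALUES ('AAA', 85)")
after = con.execute(query).fetchall()   # one row for the shared curp 'AAA'

print(before)
print(after)
```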
Can someone please help me solve my problem?
I have three tables to be joined in Teradata, and I want to use indexes to improve performance. The query is below:
Select b.Id, b.First_name, b.Last_name, c.Id,
c.First_name, c.Last_name, c.Result
from
(
select a.Id, a.First_name, a.Last_name, a.Approver1, a.Approver2
From table1 a
Inner join table2 d
On a.Id = d.Id
and a.Approver1 = d.Approver1
And a.Approver2 = d.Approver2
) b
Left join
(
select * from table3
where result is not null
and application like 'application1'
) c
On c.Id = b.Id
Group by b.Id, b.First_name, b.Last_name, c.Id,
c.First_name, c.Last_name, c.Result
The query above takes a very long time because the primary index (PI) is not defined correctly.
The first two tables (table1 and table2) have the same set of columns, so the PI can be defined on Id, Approver1, Approver2.
However, when joining with table3 I'm confused and need to understand how to define the PI. Does a PI only work when the tables have the same set of columns?
The structure of table3 is:
Id, First name, Last name, Result
and the structure of table1 and table2 is:
Id, First name, Last name, Approved 1, Approved 2, Results
Can you please help me define primary indexes so that the query can be optimised?
Teradata will usually not use secondary indexes for joins. The best PI would be Id for all three tables; of course, you need to check that there are not too many rows per value and that the distribution is not too skewed.
Your GROUP BY lists every selected column, so it can be simplified to a DISTINCT. Why do you need it? Can you show the primary keys of those tables?
Edit based on comment:
PI-based joins are by far the fastest way. But you should be able to get rid of the DISTINCT too; it always adds a huge overhead.
Try replacing the first join with an EXISTS:
Select b.Id, b.First_name, b.Last_name, c.Id,
c.First_name, c.Last_name, c.Result
from
(
select a.Id, a.First_name, a.Last_name, a.Approver1, a.Approver2
From table1 a
WHERE EXISTS
(
SELECT *
FROM table2 d
WHERE a.Id = d.Id
and a.Approver1 = d.Approver1
And a.Approver2 = d.Approver2
)
) b
Left join
(
select * from table3
where result is not null
and application like 'application1'
) c
On c.Id = b.Id
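Why EXISTS removes the need for DISTINCT can be sketched in SQLite through Python's sqlite3 module. SQLite only stands in for Teradata here, so this shows which rows come back, not the plan or the speed; the two identical table2 rows are made-up data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table1 (Id INTEGER, First_name TEXT, Last_name TEXT,
                         Approver1 TEXT, Approver2 TEXT);
    CREATE TABLE table2 (Id INTEGER, Approver1 TEXT, Approver2 TEXT);
    INSERT INTO table1 VALUES (1, 'Ana', 'Diaz', 'X', 'Y');
    -- Hypothetical: two table2 rows match the same table1 row.
    INSERT INTO table2 VALUES (1, 'X', 'Y'), (1, 'X', 'Y');
""")

# Inner join: one output row per matching table2 row, hence the DISTINCT/GROUP BY.
joined = con.execute("""
    SELECT a.Id FROM table1 a
    INNER JOIN table2 d
      ON a.Id = d.Id AND a.Approver1 = d.Approver1 AND a.Approver2 = d.Approver2
""").fetchall()

# EXISTS (a semi-join): each table1 row appears at most once.
semi_joined = con.execute("""
    SELECT a.Id FROM table1 a
    WHERE EXISTS (
        SELECT * FROM table2 d
        WHERE a.Id = d.Id AND a.Approver1 = d.Approver1 AND a.Approver2 = d.Approver2
    )
""").fetchall()

print(joined)       # [(1,), (1,)]
print(semi_joined)  # [(1,)]
```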
I would like to know how to get all the rows from table1 that have a matching row in table3.
The structure of the tables is:
table1:
k1 k2
table2:
k1 k2 t1 t2 date type
table3:
t1 t2 date status
The conditions are:
k1 and k2 have to match with the corresponding columns in table2.
In table2 I will only check those rows where date='today' and type='a'.
That can return 0, 1 or many rows in table2.
Looking at t1 and t2 from table 2, I get the rows that match in table3.
If in table3 date='today' and status='ok', I will return the original row from table1, that is, k1 and k2.
How can I write this query (inner joins, EXISTS, whatever), taking into account that the three tables have millions of rows, so it must be as efficient as possible?
I have a query that I'm sure is correct, but it has too many conditions for Teradata to come up with the answer. Too many joins, I think.
I would not consider three tables and a few millions of rows a complex query.
In Teradata you usually don't have to think that much about JOIN/IN/EXISTS; all of them are rewritten to joins internally. But there is a one-to-many-to-one relation here, so you should avoid a join, as it will need a final DISTINCT.
Better use IN or EXISTS instead:
SELECT
K1,K2
FROM Table1
WHERE (K1,K2) IN
(
SELECT K1,K2
FROM Table2
WHERE datecol = CURRENT_DATE
AND typecol = 'a'
AND (T1,T2) IN
(
SELECT T1,T2
FROM Table3
WHERE datecol = CURRENT_DATE
AND status = 'ok'
)
)
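The nested IN form can be tried in SQLite from Python's sqlite3 module (row values such as (K1, K2) IN (...) need SQLite 3.15.0 or later). The rows are hypothetical and SQLite is only a stand-in for Teradata: one table1 row matches through two table2 rows, so a plain join returns it twice while IN returns it once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Table1 (K1 INTEGER, K2 INTEGER);
    CREATE TABLE Table2 (K1 INTEGER, K2 INTEGER, T1 INTEGER, T2 INTEGER,
                         datecol TEXT, typecol TEXT);
    CREATE TABLE Table3 (T1 INTEGER, T2 INTEGER, datecol TEXT, status TEXT);
    -- Hypothetical rows: (1, 1) matches through two Table2 rows,
    -- (2, 2) has the wrong type, (3, 3) has a Table3 status that is not 'ok'.
    INSERT INTO Table1 VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO Table2 VALUES
        (1, 1, 10, 20, CURRENT_DATE, 'a'),
        (1, 1, 11, 21, CURRENT_DATE, 'a'),
        (2, 2, 30, 40, CURRENT_DATE, 'b'),
        (3, 3, 50, 60, CURRENT_DATE, 'a');
    INSERT INTO Table3 VALUES
        (10, 20, CURRENT_DATE, 'ok'),
        (11, 21, CURRENT_DATE, 'ok'),
        (50, 60, CURRENT_DATE, 'failed');
""")

# Nested IN with row values, as in the answer above.
in_rows = con.execute("""
    SELECT K1, K2 FROM Table1
    WHERE (K1, K2) IN (
        SELECT K1, K2 FROM Table2
        WHERE datecol = CURRENT_DATE AND typecol = 'a'
          AND (T1, T2) IN (
              SELECT T1, T2 FROM Table3
              WHERE datecol = CURRENT_DATE AND status = 'ok'))
""").fetchall()

# Plain inner joins: the one-to-many step duplicates the Table1 row.
join_rows = con.execute("""
    SELECT Table1.* FROM Table1
    INNER JOIN Table2 ON Table1.K1 = Table2.K1 AND Table1.K2 = Table2.K2
        AND Table2.datecol = CURRENT_DATE AND Table2.typecol = 'a'
    INNER JOIN Table3 ON Table2.T1 = Table3.T1 AND Table2.T2 = Table3.T2
        AND Table3.datecol = CURRENT_DATE AND Table3.status = 'ok'
""").fetchall()

print(in_rows)    # [(1, 1)]
print(join_rows)  # [(1, 1), (1, 1)]
```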
Regarding the actual plan: if the necessary statistics are in place, the optimizer should choose a good plan; check the confidence levels in the EXPLAIN output. You can also run diagnostic helpstats on for session; before running EXPLAIN to see whether statistics are missing.
Something like the following should work.
SELECT
Table1.*
FROM
Table1
INNER JOIN Table2 ON
Table1.K1 = Table2.K1 AND
Table1.K2 = Table2.K2 AND
Table2.date = CURRENT_DATE and
Table2.type = 'a'
INNER JOIN Table3 ON
Table2.T1 = Table3.T1 AND
Table2.T2 = Table3.T2 AND
Table3.date = CURRENT_DATE and
Table3.status = 'ok'
Update:
To speak more to the optimization part of the question, the execution steps that Teradata will most likely take here are:
In parallel it will select all records from Table1, Records from Table2 where the date is CURRENT_DATE and the type is a, and Records from Table3 where the date is CURRENT_DATE and the status is OK.
It will then join the results from the SELECT of Table2 to the results of the SELECT from table1.
It will then join the results from that to the results from the SELECT of table3.
You can get more information by putting EXPLAIN before your SELECT query. The database will then return an explanation of how your Teradata server will execute the query, which can be very enlightening when trying to optimize a big, slow query.
Unfortunately the steps above are the best you can hope for. Parallel execution of all three tables with the filters applied, and then a join of the results. With big data, the slowest part of a query is often the join, so filtering before you get to that step is a big plus.
There's more that can be done to optimize, like making sure your indexes are in order and collecting statistics, especially on the columns you filter on. But without admin access to do that, your hands are tied.
I am trying to update Table B of a database looking like this:
Table A:
id, amount, date, b_id
1,200,6/31/2012,1
2,300,6/31/2012,1
3,400,6/29/2012,2
4,200,6/31/2012,1
5,200,6/31/2012,2
6,200,6/31/2012,1
7,200,6/31/2012,2
8,200,6/31/2012,2
Table B:
id, b_amount, b_date
1,0,0
2,0,0
3,0,0
Now with this query I get all the data I need in one select:
SELECT A.*,B.* FROM A LEFT JOIN B ON B.id=A.b_id WHERE A.b_id>0 GROUP BY B.id
id, amount, date, b_id, id, b_amount, b_date
1,200,6/31/2012,1,1,0,0
3,400,6/29/2012,2,2,0,0
Now, I just want to copy the selected column amount to b_amount and date to b_date
b_amount=amount, b_date=date
resulting in
id, amount, date, b_id, id, b_amount, b_date
1,200,6/31/2012,1,1,200,6/31/2012
3,400,6/29/2012,2,2,400,6/29/2012
I've tried COALESCE() without success.
Does someone experienced have a solution for this?
Solution:
Thanks to the answers below, I managed to come up with this. It is probably not the most efficient way, but it is fine for a one-time update. It writes one corresponding entry of each group into B (the intent is the first one, but SQLite does not guarantee which row the non-aggregated columns come from).
REPLACE INTO B SELECT Bid, amount, date FROM
(SELECT A.id, A.amount, A.date, B.id AS Bid FROM A INNER JOIN B ON (B.id=A.b_id)
ORDER BY A.id DESC)
GROUP BY Bid;
So what you are looking for seems to be a JOIN inside an UPDATE query. In MySQL you would use
UPDATE B INNER JOIN A ON B.id=A.b_id SET B.b_amount=A.amount, B.b_date=A.date;
but SQLite does not support this syntax (SQLite 3.33.0 and later do accept the related UPDATE ... FROM form). However, there is a workaround using REPLACE:
REPLACE INTO B
SELECT B.id, A.amount, A.date FROM A
LEFT JOIN B ON B.id=A.b_id
WHERE A.b_id>0 GROUP BY B.id;
The query simply fills in the values of table B for all columns that should keep their state, and the values of table A for the copied columns. Make sure the column order in the SELECT statement matches the column order of table B and that all columns are listed, or you will lose those fields' data. Also note that REPLACE only overwrites an existing row when B.id is a PRIMARY KEY or has a UNIQUE constraint; otherwise it inserts new rows. This is fragile with respect to future changes of table B, so remember to update the column order and list in this query whenever you change table B.
Something a bit off topic, because you did not ask for it: A.b_id is obviously a foreign key to B.id. It seems you are using the value 0 for the foreign key to express that there is no corresponding entry in B (inferred from your SELECT with WHERE A.b_id>0). You should consider using NULL for that. If you then use an INNER JOIN instead of a LEFT JOIN, you can drop the WHERE clause entirely; the DBMS will filter out all rows without a matching entry.
WARNING: Some RDBMS (for example SQLite, or MySQL without ONLY_FULL_GROUP_BY) will return the 2 rows you show above, taking the A columns from an arbitrary row of each group. Others will reject the query, because the A columns are neither grouped nor aggregated.
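A minimal sketch of the REPLACE workaround with the question's data, run through Python's sqlite3 module. It assumes B.id is declared INTEGER PRIMARY KEY and stores the placeholder dates as the text '0'. Which A row SQLite picks per group is not guaranteed, so the comment lists the possible values rather than one exact result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, amount INTEGER, date TEXT, b_id INTEGER);
    -- Assumption: B.id is the primary key, so REPLACE overwrites rows in place.
    CREATE TABLE B (id INTEGER PRIMARY KEY, b_amount INTEGER, b_date TEXT);
    INSERT INTO A VALUES
        (1, 200, '6/31/2012', 1), (2, 300, '6/31/2012', 1),
        (3, 400, '6/29/2012', 2), (4, 200, '6/31/2012', 1),
        (5, 200, '6/31/2012', 2), (6, 200, '6/31/2012', 1),
        (7, 200, '6/31/2012', 2), (8, 200, '6/31/2012', 2);
    INSERT INTO B VALUES (1, 0, '0'), (2, 0, '0'), (3, 0, '0');
""")

# The workaround from the answer: B.id keeps its value, amount and date come from A.
con.execute("""
    REPLACE INTO B
    SELECT B.id, A.amount, A.date FROM A
    LEFT JOIN B ON B.id = A.b_id
    WHERE A.b_id > 0 GROUP BY B.id
""")

b_rows = con.execute("SELECT id, b_amount, b_date FROM B ORDER BY id").fetchall()
# id 1 gets amount/date from one of its A rows (200 or 300), id 2 from one of
# its A rows (400 or 200); id 3 has no A rows and keeps (0, '0').
print(b_rows)
```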
One tricky method is to generate SQL that is then executed
SELECT 'update B set b.b_amount = ', a.amount, ', b.b_date = ', a.date,
' where b.id = ', a.b_id
FROM A LEFT JOIN B ON B.id=A.b_id WHERE A.b_id>0 GROUP BY B.id
Now add the batch terminator and execute the generated SQL. The query result should look like this:
update B set b.b_amount = 200, b.b_date = 6/31/2012 where b.id = 1
update B set b.b_amount = 400, b.b_date = 6/29/2012 where b.id = 2
NOTE: Some RDBMS will handle dates differently. Some require quotes.
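In SQLite the same generate-then-execute idea looks slightly different: strings are concatenated with || and the date needs quotes. This sketch is an adaptation, not the answer's exact query; it adds min(A.id) so that SQLite takes the bare columns from the lowest A.id in each group, which makes the generated statements predictable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, amount INTEGER, date TEXT, b_id INTEGER);
    CREATE TABLE B (id INTEGER PRIMARY KEY, b_amount INTEGER, b_date TEXT);
    INSERT INTO A VALUES
        (1, 200, '6/31/2012', 1), (2, 300, '6/31/2012', 1),
        (3, 400, '6/29/2012', 2), (4, 200, '6/31/2012', 1),
        (5, 200, '6/31/2012', 2), (6, 200, '6/31/2012', 1),
        (7, 200, '6/31/2012', 2), (8, 200, '6/31/2012', 2);
    INSERT INTO B VALUES (1, 0, '0'), (2, 0, '0'), (3, 0, '0');
""")

# Step 1: generate one UPDATE per B.id. With min(A.id) as the only aggregate,
# SQLite takes A.amount, A.date and A.b_id from the row with the lowest A.id.
rows = con.execute("""
    SELECT min(A.id),
           'UPDATE B SET b_amount = ' || A.amount
           || ', b_date = ''' || A.date || ''''
           || ' WHERE id = ' || A.b_id || ';'
    FROM A JOIN B ON B.id = A.b_id
    WHERE A.b_id > 0
    GROUP BY B.id
    ORDER BY B.id
""").fetchall()
statements = [stmt for _, stmt in rows]

# Step 2: execute the generated SQL.
con.executescript("\n".join(statements))

b_rows = con.execute("SELECT id, b_amount, b_date FROM B ORDER BY id").fetchall()
print(statements)
print(b_rows)  # [(1, 200, '6/31/2012'), (2, 400, '6/29/2012'), (3, 0, '0')]
```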