MariaDB performance difference between implicit join (using WHERE ...) and INNER JOIN

In MariaDB, I have a query that takes seconds to run with an implicit join (using WHERE ...):
SELECT table1_entry.xxx,
table2_entry.xxx
FROM ((select ...
FROM TABLE 1) table1_entry ...
(SELECT *
FROM table2
WHERE table1_entry.id = table2_entry.id) table2_entry
but takes many minutes to run with an INNER JOIN. The row count of the final result is the same as the row count of table1_entry (9000+ rows); table2_entry has 2000+ rows. Is there anything special to note about MariaDB with regard to implicit joins versus inner joins?
I ran the individual queries that produce table1_entry and table2_entry; without the join, each took seconds to run.
I expected the inner join to be faster, so it is strange that the implicit join was so much faster.

Related

mariadb most efficient way to select several columns from a subquery

I'm generating a table which will in turn be used to format several different statistics and graphs.
Some columns of this table are the result of subqueries with a nearly identical structure. My query works, but it is very inefficient, even in a simplified example like the following one.
SELECT
o.order,
o.date,
c.clienttype,
o.producttype,
(SELECT date FROM orders_interactions LEFT JOIN categories WHERE order=o.order AND category=3) as completiondate,
(SELECT amount FROM orders_interactions LEFT JOIN categories WHERE order=o.order AND category=3) as amount,
DATEDIFF((select date from orders_interactions LEFT JOIN categories where order=o.order AND category=3),o.date) as elapseddays
FROM orders o
LEFT JOIN clients c ON c.idClient=o.idClient
Since this is a simplified example of a much more complex query, I would like to know the recommended approaches for a query like this one, taking into account query times and readability.
As the example shows, I had to repeat a subquery (the one with date) just to calculate a datediff, since I cannot directly reference the column 'completiondate'.
Thank you

You can try a left join.
SELECT o.order,
o.date,
o.producttype,
oi.date completiondate,
oi.amount,
datediff(oi.date, o.date) elapseddays
FROM orders o
LEFT JOIN orders_interactions oi
ON oi.order = o.order
AND oi.category = 3;
That doesn't necessarily perform better, but there is a good chance it will. For performance, an index on orders_interactions (order, category) might help in any case.
Whether you consider it more readable is up to you, but at least it's less repetitive. (That doesn't necessarily translate to better performance: just because an expression is repeated in a query doesn't necessarily mean it is calculated repeatedly.)
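As a rough check of the LEFT JOIN rewrite, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MariaDB. The table contents and the index name idx_oi_order_category are invented for illustration, order and date are quoted because they collide with SQL keywords, and julianday() subtraction stands in for DATEDIFF, which SQLite does not have. It only shows that the two forms return the same rows on this data; it says nothing about MariaDB's actual execution plans.

```python
import sqlite3

# Hypothetical miniature schema: column names follow the question,
# the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders ("order" INTEGER PRIMARY KEY, "date" TEXT, producttype TEXT);
CREATE TABLE orders_interactions ("order" INTEGER, category INTEGER, "date" TEXT, amount REAL);
-- The composite index suggested in the answer (name is an assumption).
CREATE INDEX idx_oi_order_category ON orders_interactions ("order", category);

INSERT INTO orders VALUES (1, '2024-01-01', 'A'), (2, '2024-01-05', 'B');
-- Order 1 has a category-3 (completion) row, order 2 has none.
INSERT INTO orders_interactions VALUES
    (1, 1, '2024-01-02', 0.0),
    (1, 3, '2024-01-10', 99.5);
""")

# Form 1: the same correlated subquery repeated three times.
repeated_subqueries = conn.execute("""
SELECT o."order",
       (SELECT oi."date" FROM orders_interactions oi
         WHERE oi."order" = o."order" AND oi.category = 3),
       (SELECT oi.amount FROM orders_interactions oi
         WHERE oi."order" = o."order" AND oi.category = 3),
       julianday((SELECT oi."date" FROM orders_interactions oi
                   WHERE oi."order" = o."order" AND oi.category = 3))
         - julianday(o."date")
FROM orders o
ORDER BY o."order"
""").fetchall()

# Form 2: one LEFT JOIN, with the category filter in the ON clause.
single_left_join = conn.execute("""
SELECT o."order", oi."date", oi.amount,
       julianday(oi."date") - julianday(o."date")
FROM orders o
LEFT JOIN orders_interactions oi
       ON oi."order" = o."order" AND oi.category = 3
ORDER BY o."order"
""").fetchall()

print(repeated_subqueries == single_left_join)
```

The two forms diverge if an order can have more than one category-3 interaction: the join then returns one row per interaction, which the scalar-subquery form would not do.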

It seems I might have found the answer.
In my opinion, it improves readability quite a bit, and in my real usage scenario, both profile and execution plans are way more efficient, and results are returned in less than 1/3 of the time.
My answer relies on using a SELECT inside the LEFT JOIN, that is, using a subquery as the JOIN's input.
SELECT
o.order,
o.date,
c.clienttype,
o.producttype,
tmp.date,
tmp.amount,
DATEDIFF(tmp.date,o.date) as elapseddays
FROM orders o
LEFT JOIN clients c ON c.idClient=o.idClient
LEFT JOIN (
SELECT order,date,amount
FROM orders_interactions oi
LEFT JOIN categories ct ON ct.order=oi.order AND category=3
) AS tmp ON tmp.order=o.order
The idea for this answer, and the explanation of how and why it works, came from this post: Mysql Reference subquery result in parent where clause

SQLITE select unique rows

I have a table where rows appear to be "duplicates" but they are actually not (they have different date).
Suppose each record has a column A that is supposed to be unique. However, because a given value of A may appear again later with updated information (with A itself unchanged), the column is no longer unique even though it should be.
Therefore I want the table with the latest information only. Currently this table contains 500k entries; however, the "true" number of unique entries is less than half of that.
I have tried
SELECT *
FROM TABLE
WHERE A = A
AND Date = (SELECT MAX(Date) from TABLE)
ORDER BY DATE
However this only returns 2 results. How do I achieve that?

The subquery on the date is the correct idea, but you must include the column A in the subquery and relate it back to the main table. I prefer to use explicit joins rather than embedding the subquery in the WHERE clause. This is usually more efficient anyway.
SELECT TABLE.*
FROM TABLE INNER JOIN
(SELECT A, MAX(Date) AS MaxDate FROM TABLE GROUP BY A) AS latest
ON TABLE.A = latest.A AND TABLE.date = latest.MaxDate
ORDER BY A, date
Or even better, I prefer CTE (Common Table Expression) syntax, since it makes the individual queries easier to read:
WITH latest AS (
SELECT A, MAX(Date) AS MaxDate
FROM TABLE
GROUP BY A
)
SELECT TABLE.*
FROM TABLE INNER JOIN latest
ON TABLE.A = latest.A AND TABLE.date = latest.MaxDate
ORDER BY TABLE.A, TABLE.date
Comparison to other answer
The answer by MikeT relies on a non-standard feature of SQLite. That is okay in itself, as long as you are aware that the solution is not compatible with other database engines/servers and SQL dialects.
The next possible gotcha depends on your actual data and table schema (neither of which you shared in the question details). If your data allows multiple rows with the same date for a single value of column A, then the conditions in your question are not enough to definitively remove all duplicates. You would need to identify another column by which to resolve any remaining duplicates, but once again your question did not do that.
In testing, I found that my solution allows such unresolved duplicates to remain in the results. MikeT's solution eliminates all duplicates, but it does so by arbitrarily excluding one of them. There are ways to fix either solution so that it deterministically selects which duplicate to keep, but I will not attempt that unless you post actual data and the table schema, so that my answer is not mere guessing. I'm glad that my answer was useful thus far, but you need to understand your data better (than revealed in the question) to decide which solution is actually best.
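The tie behaviour described above can be reproduced with a small sketch using Python's sqlite3 module. The table name t, the info column and the rows are hypothetical; the query is the CTE form from this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical data: 'x' has a clear latest row, 'y' has two rows
-- sharing the same latest date.
CREATE TABLE t (A TEXT, "Date" TEXT, info TEXT);
INSERT INTO t VALUES
    ('x', '2024-01-01', 'old'),
    ('x', '2024-02-01', 'new'),
    ('y', '2024-03-01', 'tie-1'),
    ('y', '2024-03-01', 'tie-2');
""")

latest_rows = conn.execute("""
WITH latest AS (
    SELECT A, MAX("Date") AS MaxDate
    FROM t
    GROUP BY A
)
SELECT t.A, t."Date", t.info
FROM t INNER JOIN latest
  ON t.A = latest.A AND t."Date" = latest.MaxDate
ORDER BY t.A, t.info
""").fetchall()

for row in latest_rows:
    print(row)
```

The older row for 'x' is dropped, but both 'y' rows survive because they tie on the latest date; resolving that needs an additional tie-breaking column, as noted above.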
Bonus
Against my better judgement to just keep expanding on answers... since you should really research this separately... here's an example of how you would continue joining this with other queries...
WITH latest AS (
SELECT A, MAX(Date) AS MaxDate
FROM TABLE
GROUP BY A
),
firstResults AS (
SELECT TABLE.*
FROM TABLE INNER JOIN latest
ON TABLE.A = latest.A AND TABLE.date = latest.MaxDate
ORDER BY TABLE.A, TABLE.date
)
SELECT otherTable.*
FROM firstResults JOIN otherTable
ON firstResults.A = otherTable.A
WHERE somecondition = 'foobar'

Another approach, if you're using a somewhat recent version of SQLite (3.25 or newer), is to use the row_number() window function to rank the rows in each group of the same a value by date and pick the first one:
WITH cte AS
(SELECT a, date, row_number() OVER (PARTITION BY a ORDER BY date DESC) AS rn
FROM yourtable)
SELECT a, date
FROM cte
WHERE rn = 1;
One important thing to note, since I noticed you mentioning that another answer was slow: this query needs an index on mytable(a, date DESC) for best results, and an index on mytable(a, date) will speed up the other answers given.
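A minimal sketch of the window-function approach with the suggested index, run through Python's sqlite3 module. It assumes the bundled SQLite is 3.25 or newer; the index name idx_mytable_a_date, the other column and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (a TEXT, date TEXT, other TEXT);
-- Index suggested in the answer (the name is an assumption).
CREATE INDEX idx_mytable_a_date ON mytable (a, date DESC);
INSERT INTO mytable VALUES
    ('x', '2024-01-01', 'old'),
    ('x', '2024-02-01', 'new'),
    ('y', '2024-03-01', 'only');
""")

query = """
WITH cte AS (
    SELECT a, date, other,
           row_number() OVER (PARTITION BY a ORDER BY date DESC) AS rn
    FROM mytable
)
SELECT a, date, other
FROM cte
WHERE rn = 1
ORDER BY a
"""

latest = conn.execute(query).fetchall()
print(latest)

# Print the planner's description so you can see whether the index is
# used; the exact wording differs between SQLite versions.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step[-1])
```

Unlike the join on MAX(date), row_number() always keeps exactly one row per a; when two rows tie on date, which of them gets rn = 1 is not determined unless the ORDER BY inside OVER includes a tie-breaker.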

I believe, if I understand what you have written, that you could use :-
SELECT a,max(date), other FROM mytable GROUP BY a ORDER BY date;
Note that the other column stands for any other columns (if present).
However, the other column will be an arbitrary value (from one of the grouped rows), which may well be the required value (in the example it is).
As per :-
Each expression in the result-set is then evaluated once for each
group of rows. If the expression is an aggregate expression, it is
evaluated across all rows in the group. Otherwise, it is evaluated
against a single arbitrarily chosen row from within the group. If
there is more than one non-aggregate expression in the result-set,
then all such expressions are evaluated for the same row.
SQL As Understood By SQLite - SELECT
More correctly, to avoid an arbitrary value for the other column, you could use :-
SELECT
a /* will always be the same and isn't arbitrary */,
max(date) /* will be the maximum date */ AS date,
(SELECT other FROM mytable WHERE a = m.a AND date = m.date) AS other
FROM mytable AS m /* AS m allows the outer query to be distinguished from the inner query */
GROUP BY a /* this effectively removes duplicates on the a column */
ORDER BY date
;
The example below appears to produce the same result.
Example :-
Using the following to populate the table with some generated testing data :-
CREATE TABLE IF NOT EXISTS mytable (a TEXT, date TEXT, other);
WITH cte(count,a,date,other) AS
(
SELECT 1,1,date('now','+'||(abs(random()) % 30)||' days'),'other1'
UNION ALL SELECT count+1,abs(random()) % 20,date('now','+'||(abs(random()) % 30)||' days'), 'other'||(count+1) FROM cte LIMIT 100
)
INSERT INTO mytable (a,date,other) SELECT a,date,other FROM cte
;
SELECT * FROM mytable ORDER BY DATE DESC;
[Screenshot of the generated rows; the highlighted rows are those required to be extracted.]
Then after the above has been run the following is run
SELECT * FROM mytable WHERE a = a AND date = (SELECT MAX(date) FROM mytable);
SELECT * FROM mytable WHERE /*a = a AND*/ date = (SELECT MAX(date) FROM mytable);
/* Will only select 1 row per unique value of a BUT other will be an arbitrary value, not necessarily the latest */
SELECT a,max(date), other FROM mytable GROUP BY a /* GROUP BY effectively returns one row per unique a */;
SELECT
a /* will always be the same and isn't arbitrary */,
max(date) /* will be the maximum date */ AS date,
(SELECT other FROM mytable WHERE a = m.a AND date = m.date) AS other
FROM mytable AS m
GROUP BY a
;
The first two results show that a = a does nothing, as it is always true.
The third query produces (unordered) :-
[Screenshot of the third result; ticks were assigned by checking the value of other against the previous result.]
In this case the shorter query works OK even though the values of other are arbitrary (they aren't really, as it depends upon how the query planner plans the query).
The fourth, the more correct, produces the same results.
Result 2 (your original query) and 3 (original without a = a) produce :-
[Screenshots of results 2 and 3.]
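The bare-column behaviour discussed in this answer can be checked with a short sketch using Python's sqlite3 module. As I read the SQLite SELECT documentation, there is a special case beyond the passage quoted above: when an aggregate query uses a single min() or max(), bare columns take their values from the row that holds that minimum or maximum. The rows below are invented, with the latest row deliberately neither first nor last in insertion order, and the result is compared against a join on the per-group maximum. This is SQLite-specific and should not be assumed for other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (a TEXT, date TEXT, other TEXT);
-- For a = 'x' the latest row is the second one inserted.
INSERT INTO mytable VALUES
    ('x', '2024-01-01', 'first-inserted'),
    ('x', '2024-03-01', 'latest'),
    ('x', '2024-02-01', 'last-inserted'),
    ('y', '2024-01-15', 'only');
""")

# Short form: `other` is a bare (non-aggregated, non-grouped) column.
grouped = conn.execute(
    "SELECT a, max(date), other FROM mytable GROUP BY a ORDER BY a"
).fetchall()

# Portable form: join back to the per-group maximum date.
joined = conn.execute("""
SELECT m.a, m.date, m.other
FROM mytable AS m
INNER JOIN (SELECT a, max(date) AS maxdate FROM mytable GROUP BY a) AS latest
        ON m.a = latest.a AND m.date = latest.maxdate
ORDER BY m.a
""").fetchall()

print(grouped)
print(grouped == joined)
```

There are no ties on the latest date in this data; with ties, the short form still returns one row per a, while the join returns every tied row.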

Convert RIGHT JOIN in LEFT JOIN

I have been tasked with making queries in SQL Server stored procedures compatible with SQLite.
The following query contains RIGHT JOINs, which are not compatible with SQLite:
SELECT ...
FROM Operatori
INNER JOIN Preventivi
ON Operatori.Pk = Preventivi.FkOperatori
RIGHT JOIN Operatori AS Operatori_1
INNER JOIN Soluzioni
ON Operatori_1.Pk = Soluzioni.FkOperatore
INNER JOIN Reparti
ON Operatori_1.FkReparto = Reparti.Pk
RIGHT JOIN Progetti
INNER JOIN Anagrafica
ON Progetti.FkClienti = Anagrafica.Pk
AND Soluzioni.FkProgetti = Progetti.Pk
AND Preventivi.FkSoluzioni = Soluzioni.Pk
I know that, normally, switching the tables works. For example,
FROM Operatori
RIGHT JOIN Progetti
becomes:
FROM Progetti
LEFT JOIN Operatori
But how can I do the conversion if I have other JOINs in the same query?

Compared to an inner join, an outer join returns additional result rows for table rows without a match.
However, this applies only to the matching done with that join's ON clause.
So this query does not actually have any outer joins that make sense, because:
1. the two right joins do not have an ON clause, so there are no matches that could be handled specially; and
2. even if there were ON clauses on the two outer joins, they are followed by inner joins on the matched columns, so any rows with NULLs in those columns would be filtered out by the inner joins.
So in this query, you can simply replace RIGHT with INNER (in both databases), and the result is guaranteed to be the same.
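The second point can be illustrated with a small sketch using Python's sqlite3 module. The three tables reuse names from the question, but the Nome column, the rows and the simplified join shape are assumptions. Soluzioni RIGHT JOIN Progetti is written as Progetti LEFT JOIN Soluzioni, which is the table swap described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical cut-down schema; only the key columns are modelled.
CREATE TABLE Progetti  (Pk INTEGER PRIMARY KEY, Nome TEXT);
CREATE TABLE Operatori (Pk INTEGER PRIMARY KEY, Nome TEXT);
CREATE TABLE Soluzioni (Pk INTEGER PRIMARY KEY, FkProgetti INTEGER, FkOperatore INTEGER);
INSERT INTO Progetti  VALUES (1, 'with solution'), (2, 'without solution');
INSERT INTO Operatori VALUES (10, 'Anna');
INSERT INTO Soluzioni VALUES (100, 1, 10);
""")

# The outer join alone keeps project 2 with NULLs on the Soluzioni side.
outer_only = conn.execute("""
SELECT p.Pk, s.Pk
FROM Progetti p
LEFT JOIN Soluzioni s ON s.FkProgetti = p.Pk
ORDER BY p.Pk
""").fetchall()

# A following inner join on a Soluzioni column discards that NULL row ...
outer_then_inner = conn.execute("""
SELECT p.Pk, s.Pk, o.Nome
FROM Progetti p
LEFT JOIN Soluzioni s ON s.FkProgetti = p.Pk
INNER JOIN Operatori o ON o.Pk = s.FkOperatore
ORDER BY p.Pk
""").fetchall()

# ... so the result matches the query written with inner joins only.
all_inner = conn.execute("""
SELECT p.Pk, s.Pk, o.Nome
FROM Progetti p
INNER JOIN Soluzioni s ON s.FkProgetti = p.Pk
INNER JOIN Operatori o ON o.Pk = s.FkOperatore
ORDER BY p.Pk
""").fetchall()

print(outer_only)
print(outer_then_inner == all_inner)
```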

1 Outer Join vs Left join following with Right Join

If Table A has 7k records and Table B has 5 records, then a LEFT OUTER JOIN with Table A as the driver gives 7k records and a RIGHT OUTER JOIN with Table B as the driver gives 5 records.
However, according to my understanding, a FULL OUTER JOIN gives 7k*5 records. Is that always the case, or does it vary with the join clause? If it is always the case, then to merge the data between Table A and Table B, is it not a better option to UNION ALL the LEFT OUTER JOIN results with the RIGHT OUTER JOIN results instead of using a FULL OUTER JOIN? Forgive my numbness, I'm very tired. :-)
The numbers shown here give the relative difference in record counts between these tables; Table B will actually grow in future.

I think what you are thinking of as FULL OUTER JOIN is actually CROSS JOIN, or the Cartesian join operation. CROSS JOIN does return N*M records, where N is the number of records in the left table and M is the number of records in the right table.
FULL OUTER JOIN is different -- it's a combination of LEFT JOIN and RIGHT JOIN in a single operation: first the tables are joined and every record resulting from the join operation is returned.
In addition, like a LEFT JOIN, rows from the left table that don't have a match on the right will be returned as well and show NULL columns for the columns in the right table, and like a RIGHT JOIN, rows from the right table that don't have a match on the left will be returned and show NULL columns for the columns in the left table.
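To make the row counts concrete, here is a small sketch using Python's sqlite3 module with scaled-down, hypothetical tables: a has 7 rows, b has 5, and they share 3 key values. Because older SQLite versions lack RIGHT JOIN and FULL OUTER JOIN, the right join is written as b LEFT JOIN a and the full outer join is emulated as the left join plus the unmatched rows of b.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (id INTEGER PRIMARY KEY);
INSERT INTO a VALUES (1),(2),(3),(4),(5),(6),(7);   -- 7 rows
INSERT INTO b VALUES (5),(6),(7),(8),(9);           -- 5 rows, 3 of them match a
""")

def count(sql):
    """Return the number of rows produced by the given SELECT."""
    return conn.execute(f"SELECT count(*) FROM ({sql})").fetchone()[0]

left_join  = "SELECT a.id AS a_id, b.id AS b_id FROM a LEFT JOIN b ON a.id = b.id"
# Equivalent of `a RIGHT JOIN b`, written with the tables swapped.
right_join = "SELECT a.id AS a_id, b.id AS b_id FROM b LEFT JOIN a ON a.id = b.id"
cross_join = "SELECT a.id AS a_id, b.id AS b_id FROM a CROSS JOIN b"
# Emulated FULL OUTER JOIN: left join plus the rows of b without a match.
full_outer = (
    left_join
    + " UNION ALL SELECT NULL, b.id FROM b"
    + " WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.id = b.id)"
)
# The naive merge from the question counts every matched pair twice.
left_union_all_right = left_join + " UNION ALL " + right_join

for name, sql in [("left", left_join), ("right", right_join),
                  ("cross", cross_join), ("full outer", full_outer),
                  ("left UNION ALL right", left_union_all_right)]:
    print(name, count(sql))
```

On this data the full outer join yields 9 rows (3 matches, 4 left-only, 2 right-only), the cross join yields 35, and LEFT UNION ALL RIGHT yields 12 because the 3 matched pairs appear twice.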

Linking 3 tables with a Left Outer Join

I have 3 tables in a SQLite database for an Android app. The picture below shows the relevant tables that I'm working with.
[Image: Tables]
I'm trying to get two fields, value and name, from measurement_lines and competences respectively, tied to a specific person_id in measurements. I'm trying to make a query that returns these fields but I'm having little luck. The best I've got so far is the following query:
SELECT name, value
FROM measurements, measurement_lines, competences
WHERE measurements.id = measurement_lines.measurements_id
AND measurement_lines.competences_id = competences.id
AND measurements.persons_id = 1
This, however, has one issue. This query won't return any records when a person has no entries in measurements (and subsequently, nothing in measurement_lines). What I want is to always get a list of competence names, even if the value column is empty. I'm guessing I need a Left Outer Join for this but I can't seem to make it work. The following query just returns no records:
SELECT name, value
FROM measurements AS m, competences AS c
LEFT OUTER JOIN measurement_lines AS ml ON c._id = ml.competence_id
WHERE ml.measurement_id = m._id AND m.persons_id = 1

For inner joins, you can be sloppy with the distinction between join conditions and selection predicates, but when outer joins are involved that makes a difference. Any criterion appearing in the WHERE clause filters your result rows after all joins are performed (logically, at least), which can remove result rows associated with outer tables.
In addition, if you're ever uncertain about join order, you can use parentheses to make your intent clear, at least in many DBMSs. It looks like SQLite doesn't support them.
It looks like you may want this: (edited to avoid use of parentheses)
SELECT c.name, pm.value
FROM competences c
LEFT OUTER JOIN (
SELECT ml.competences_id AS cid,
ml.value AS value
FROM measurement_lines ml
INNER JOIN measurements m
ON m.id = ml.measurements_id
WHERE m.persons_id = 1
) pm
ON pm.cid = c.id
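The difference between filtering in the WHERE clause and filtering inside the joined subquery can be seen in a small sketch using Python's sqlite3 module. The schema is an assumption pieced together from the column names in the question's first query (id, measurements_id, competences_id, persons_id), and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schema; the original table diagram is not available.
CREATE TABLE competences (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE measurements (id INTEGER PRIMARY KEY, persons_id INTEGER);
CREATE TABLE measurement_lines (measurements_id INTEGER, competences_id INTEGER, value INTEGER);
INSERT INTO competences VALUES (1, 'reading'), (2, 'writing');
INSERT INTO measurements VALUES (10, 1);          -- person 1 has one measurement
INSERT INTO measurement_lines VALUES (10, 1, 7);  -- covering 'reading' only
""")

# Person filter in WHERE: applied after the outer joins, so rows whose
# measurement side is NULL are removed again.
FILTER_IN_WHERE = """
SELECT c.name, ml.value
FROM competences c
LEFT OUTER JOIN measurement_lines ml ON ml.competences_id = c.id
LEFT OUTER JOIN measurements m ON m.id = ml.measurements_id
WHERE m.persons_id = ?
ORDER BY c.id
"""

# Person filter inside the derived table, as in the answer: every
# competence is kept, with NULL where the person has no value.
FILTER_IN_SUBQUERY = """
SELECT c.name, pm.value
FROM competences c
LEFT OUTER JOIN (
    SELECT ml.competences_id AS cid, ml.value AS value
    FROM measurement_lines ml
    INNER JOIN measurements m ON m.id = ml.measurements_id
    WHERE m.persons_id = ?
) pm ON pm.cid = c.id
ORDER BY c.id
"""

def run(sql, person_id):
    """Run one of the two queries for the given person id."""
    return conn.execute(sql, (person_id,)).fetchall()

for person in (1, 2):
    print(person, run(FILTER_IN_WHERE, person), run(FILTER_IN_SUBQUERY, person))
```

For person 2, who has no measurements at all, the WHERE version returns nothing while the subquery version still lists both competences with NULL values.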
