MariaDB: most efficient way to select several columns from a subquery

I'm generating a table which will in turn be used to format several different statistics and graphs.
Some columns of this table are the result of subqueries that use a nearly identical structure. My query works, but it is very inefficient, even in a simplified example like the following one.
SELECT
o.order,
o.date,
c.clienttype,
o.producttype,
(SELECT oi.date FROM orders_interactions oi LEFT JOIN categories ct ON ct.`order`=oi.`order` WHERE oi.`order`=o.`order` AND category=3) as completiondate,
(SELECT oi.amount FROM orders_interactions oi LEFT JOIN categories ct ON ct.`order`=oi.`order` WHERE oi.`order`=o.`order` AND category=3) as amount,
DATEDIFF((SELECT oi.date FROM orders_interactions oi LEFT JOIN categories ct ON ct.`order`=oi.`order` WHERE oi.`order`=o.`order` AND category=3),o.date) as elapseddays
FROM orders o
LEFT JOIN clients c ON c.idClient=o.idClient
Since this is a simplified example of a much more complex query, I would like to know the recommended approach for a query like this one, taking into account both query time and readability.
As the example shows, I had to repeat a subquery (the one selecting date) just to calculate a DATEDIFF, since I cannot reference the alias completiondate directly in the same SELECT list.
Thank you

You can try a left join.
SELECT o.order,
o.date,
o.producttype,
oi.date completiondate,
oi.amount,
datediff(oi.date, o.date) elapseddays
FROM orders o
LEFT JOIN orders_interactions oi
ON oi.order = o.order
AND oi.category = 3;
That isn't guaranteed to perform better, but there's a good chance it will. For performance, an index on orders_interactions (order, category) might help in any case.
Whether you consider it more readable is up to you, but at least it's less repetitive. (That doesn't necessarily translate into better performance: just because an expression is repeated in a query doesn't mean it is calculated repeatedly.)
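The sketch below shows how the LEFT JOIN from this answer behaves. It uses Python's built-in sqlite3 module as a stand-in for MariaDB, under these assumptions:

- `category` is a column of orders_interactions, as the query above assumes.
- The categories and clients tables are left out.
- A SQLite `julianday()` difference replaces MariaDB's `DATEDIFF`.
- The sample rows and the index name `oi_order_category` are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MariaDB; "order" is quoted because ORDER
# is a reserved word (MariaDB would normally use backticks instead).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders ("order" INTEGER PRIMARY KEY, date TEXT, producttype TEXT);
    CREATE TABLE orders_interactions ("order" INTEGER, category INTEGER,
                                      date TEXT, amount REAL);
    -- Composite index on (order, category), as suggested in the answer.
    CREATE INDEX oi_order_category ON orders_interactions ("order", category);

    INSERT INTO orders VALUES (1, '2023-01-01', 'A'), (2, '2023-01-05', 'B');
    INSERT INTO orders_interactions VALUES
        (1, 1, '2023-01-02', 10.0),
        (1, 3, '2023-01-11', 99.5);
""")

rows = conn.execute("""
    SELECT o."order",
           oi.date AS completiondate,
           oi.amount,
           -- SQLite has no DATEDIFF, so a julianday() difference stands in for it
           CAST(julianday(oi.date) - julianday(o.date) AS INTEGER) AS elapseddays
    FROM orders o
    LEFT JOIN orders_interactions oi
      ON oi."order" = o."order"
     AND oi.category = 3
    ORDER BY o."order"
""").fetchall()

for row in rows:
    print(row)
```

Because `oi.category = 3` sits in the ON clause, order 2 is kept with NULLs instead of disappearing. That matches what the question's scalar subqueries return for an order with no category-3 interaction.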

It seems I might have found the answer.
In my opinion, it improves readability quite a bit. In my real usage scenario, both the profile and the execution plan are far more efficient, and results are returned in less than 1/3 of the time.
My answer relies on using a SELECT inside the LEFT JOIN, i.e. using a subquery (a derived table) as the JOIN's input.
SELECT
o.order,
o.date,
c.clienttype,
o.producttype,
tmp.date,
tmp.amount,
DATEDIFF(tmp.date,o.date) as elapseddays
FROM orders o
LEFT JOIN clients c ON c.idClient=o.idClient
LEFT JOIN (SELECT oi.`order`, oi.date, oi.amount FROM orders_interactions oi LEFT JOIN categories ct ON ct.`order`=oi.`order` WHERE category=3) AS tmp ON tmp.`order`=o.`order`
The idea, and the explanation of how and why it works, came from this post: Mysql Reference subquery result in parent where clause
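The derived-table idea can be sketched the same way, again with Python's sqlite3 standing in for MariaDB, so `julianday()` replaces `DATEDIFF`. This sketch assumes:

- The category filter is applied directly on orders_interactions rather than through the categories table.
- All tables and rows are hypothetical.

It shows that the date is selected once inside `tmp` and then reused by name in the outer SELECT. An alias in the same SELECT list cannot be reused that way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (idClient INTEGER PRIMARY KEY, clienttype TEXT);
    CREATE TABLE orders ("order" INTEGER PRIMARY KEY, date TEXT, idClient INTEGER);
    CREATE TABLE orders_interactions ("order" INTEGER, category INTEGER,
                                      date TEXT, amount REAL);
    INSERT INTO clients VALUES (1, 'retail'), (2, 'wholesale');
    INSERT INTO orders VALUES (1, '2023-01-01', 1), (2, '2023-01-05', 2);
    INSERT INTO orders_interactions VALUES
        (1, 1, '2023-01-02', 10.0),
        (1, 3, '2023-01-11', 99.5);
""")

rows = conn.execute("""
    SELECT o."order",
           c.clienttype,
           tmp.completiondate,
           tmp.amount,
           -- completiondate is produced once inside tmp and simply reused here
           CAST(julianday(tmp.completiondate) - julianday(o.date) AS INTEGER)
               AS elapseddays
    FROM orders o
    LEFT JOIN clients c ON c.idClient = o.idClient
    LEFT JOIN (
        -- Derived table: keep only category-3 rows before the outer join,
        -- so orders without a match still appear (with NULLs)
        SELECT oi."order", oi.date AS completiondate, oi.amount
        FROM orders_interactions oi
        WHERE oi.category = 3
    ) AS tmp ON tmp."order" = o."order"
    ORDER BY o."order"
""").fetchall()

print(rows)
```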

Related

Difference between dates in SQLDF in R

I am using the R package SQLDF and am having trouble finding the number of days between two date-time variables. The variables ledger_entry_created_at and created_at are Unix epoch timestamps, and when I try to subtract them after converting them with julianday, I get a vector of NAs.
I've looked at this previous question and didn't find it useful, since I need the answer to be in SQL for reasons that are outside the scope of this question.
If anyone could help me figure out a way to do this inside SQLDF I would be grateful.
EDIT:
SELECT strftime('%Y-%m-%d %H:%M:%S', l.created_at, 'unixepoch') ledger_entry_created_at,
l.ledger_entry_id, l.account_id, l.amount, a.user_id, u.created_at
FROM ledger l
LEFT JOIN accounts a
ON l.account_id = a.account_id
LEFT JOIN users u
ON a.user_id = u.user_id
This answer is trivial, but if you already have two UNIX timestamps, and you want to find out how many days have elapsed between them, you can simply take the difference in seconds (their original unit), and convert to days, e.g.
SELECT
(l.created_at - u.created_at) / (3600*24) AS diff
-- and maybe other columns here
FROM ledger l
LEFT JOIN accounts a
ON l.account_id = a.account_id
LEFT JOIN users u
ON a.user_id = u.user_id;
I don't know why your current approach is failing, as the timestamps you have in the screen capture should be valid inputs to SQLite's julianday function. But, again, you may not need such a complicated route to get the result you want.
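sqldf runs its queries on SQLite by default, so the arithmetic can be checked with Python's sqlite3 module. This sketch uses two hypothetical timestamps 3 days and 1 hour apart.

One detail worth knowing: when both values are integers, SQLite's `/` is integer division, so `/ (3600*24)` truncates to whole days. Dividing by `86400.0` keeps the fraction. Converting each value with the `'unixepoch'` modifier before calling `julianday()` gives the same fractional result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical Unix timestamps, 3 days and 1 hour apart.
user_created = 1_600_000_000
ledger_created = user_created + 3 * 86400 + 3600

row = conn.execute(
    """
    SELECT (? - ?) / (3600*24) AS whole_days,       -- integer division truncates
           (? - ?) / 86400.0   AS fractional_days,  -- real division keeps the hour
           julianday(?, 'unixepoch') - julianday(?, 'unixepoch') AS jd_days
    """,
    (ledger_created, user_created,
     ledger_created, user_created,
     ledger_created, user_created),
).fetchone()

print(row)
```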

What's the proper way to do a spatial join in Oracle?

Oracle has a table function called SDO_JOIN that is used to join tables based on spatial relationships. An example query to find which neighbourhood a house is in looks something like this:
select
house.address,
neighbourhood.name
from table(sdo_join('HOUSE', 'GEOMETRY', 'NEIGHBOURHOOD', 'GEOMETRY', 'mask=INSIDE')) a
inner join house
on a.rowid1 = house.rowid
inner join neighbourhood
on a.rowid2 = neighbourhood.rowid;
But I get the same result by just doing a regular join with a spatial relation in the on clause:
select
house.address,
neighbourhood.name
from house
inner join neighbourhood
on sdo_inside(house.geometry, neighbourhood.geometry) = 'TRUE';
I prefer the second method because I think it's easier to understand what exactly is happening, but I wasn't able to find any Oracle documentation on whether or not this is the proper way to do a spatial join.
Is there any difference between these two methods? If there is, what? If there isn't, which style is more common?
The difference is in performance.
The first approach (SDO_JOIN) isolates the candidates by matching the RTREE indexes on each table.
The second approach will search the HOUSE table for each geometry of the NEIGHBOURHOOD table.
A lot depends on how large your tables are, and in particular how large the NEIGHBOURHOOD table is, or more precisely, how many rows of the NEIGHBOURHOOD table your query actually uses. If the NEIGHBOURHOOD table is small (fewer than 1,000 rows), then the second approach is good (and the size of the HOUSE table does not matter).
On the other hand, if you need to match millions of houses against millions of neighbourhoods, then the SDO_JOIN approach will be more efficient.
Note that the SDO_INSIDE approach can be efficient too: just make sure you enable SPATIAL_VECTOR_ACCELERATION (only available if you use Oracle 12.1 or 12.2 and have the proper license for Oracle Spatial and Graph) and use parallelism.

Why is left outer join not producing the right results here?

I have the following code and for some reason the left outer join is not producing the correct results.
Dim StudentCourseList = From stud in students
Group Join cour in courses
on stud.id equals cour.id into joinedlist = Group
From j in joinedlist.defaultifempty
select stud
Before the left outer join, students has 12 items and courses has 4. However, after the join the student count is 14 for some reason. It should be 12, if not fewer. Am I doing something wrong here?
Edit: The query is fine. The problem is with the courses list: it has duplicate items in it. The question now is how to get distinct results.
You have not posted your table data, so this is just an assumption.
Why is the count 14?
If one student has more than one course, for example two courses, then the join produces two records for that student.
Use group by and then select the first item of each group, as in the query below.
var query = from person in people
join pet in pets on person.Id equals pet.OwnerId into gj
from subpet in gj.DefaultIfEmpty()
group person by person.Id into temp1
select temp1.First();
Here is a working fiddle. Sorry, I'm not a VB person, so I posted the answer in C#.
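The answer's explanation can be reproduced outside LINQ: one student with two matching course rows produces two output rows. The sketch below is a plain-Python analogue, not VB or C#:

- A hypothetical `left_outer_join` helper mimics `Group Join ... DefaultIfEmpty`.
- `itertools.groupby` plus taking the first element mimics `group ... by ... into g select g.First()`.
- The student and course data are invented, with one duplicated course row.

```python
from itertools import groupby

# Hypothetical data: course id 2 appears twice (a duplicate row),
# and student 3 has no matching course at all.
students = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
courses = [{"id": 1, "title": "Math"}, {"id": 2, "title": "Art"}, {"id": 2, "title": "Art"}]


def left_outer_join(left, right, key):
    """Mimic Group Join ... DefaultIfEmpty: one (left, right) pair per match,
    or a single (left, None) pair when nothing matches."""
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        for r in matches or [None]:
            out.append((l, r))
    return out


joined = left_outer_join(students, courses, "id")
print(len(joined))  # Bob appears twice because of the duplicate course row

# Equivalent of `group person by person.Id into g select g.First()`.
# groupby only groups consecutive items; the join above emits each
# student's pairs consecutively, so that holds here.
first_per_student = [next(group)[0]
                     for _, group in groupby(joined, key=lambda pair: pair[0]["id"])]
print([s["name"] for s in first_per_student])
```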

Linking 3 tables with a Left Outer Join

I have 3 tables in a SQLite database for an Android app. This picture below shows the relevant tables that I'm working with.
[Image: Tables]
I'm trying to get two fields, value and name, from measurement_lines and competences respectively, tied to a specific person_id in measurements. I'm trying to make a query that returns these fields but I'm having little luck. The best I've got so far is the following query:
SELECT name, value
FROM measurements, measurement_lines, competences
WHERE measurements.id = measurement_lines.measurements_id
AND measurement_lines.competences_id = competences.id
AND measurements.persons_id = 1
This, however, has one issue. This query won't return any records when a person has no entries in measurements (and consequently nothing in measurement_lines). What I want is to always get a list of competence names, even if the value column is empty. I'm guessing I need a Left Outer Join for this, but I can't seem to make it work. The following query just returns no records:
SELECT name, value
FROM measurements AS m, competences AS c
LEFT OUTER JOIN measurement_lines AS ml ON c._id = ml.competence_id
WHERE ml.measurement_id = m._id AND m.persons_id = 1
For inner joins, you can be sloppy with the distinction between join conditions and selection predicates, but when outer joins are involved that makes a difference. Any criterion appearing in the WHERE clause filters your result rows after all joins are performed (logically, at least), which can remove result rows associated with outer tables.
In addition, if you're ever uncertain about join order, you can use parentheses to make your intent clear, at least in many DBMSs. It looks like SQLite doesn't support them, though.
It looks like you may want this: (edited to avoid use of parentheses)
SELECT c.name, pm.value
FROM competences c
LEFT OUTER JOIN (
SELECT ml.competences_id AS cid,
ml.value AS value
FROM measurement_lines ml
INNER JOIN measurements m
ON m.id = ml.measurements_id
WHERE m.persons_id = 1
) pm
ON pm.cid = c.id
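The WHERE-versus-ON point can be demonstrated with Python's sqlite3 module. This sketch assumes:

- The column names follow the asker's first query (`id`, `measurements_id`, `competences_id`, `persons_id`).
- The rows are hypothetical, and only person 1 has measurements.

The first query puts the person filter in WHERE after the outer joins. The second has the shape of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE competences (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE measurements (id INTEGER PRIMARY KEY, persons_id INTEGER);
    CREATE TABLE measurement_lines (measurements_id INTEGER,
                                    competences_id INTEGER, value INTEGER);
    INSERT INTO competences VALUES (1, 'Listening'), (2, 'Writing');
    INSERT INTO measurements VALUES (10, 1);
    INSERT INTO measurement_lines VALUES (10, 1, 4);
""")

# Person filter in WHERE: it runs after the joins, so rows whose
# outer-joined columns are NULL are discarded (effectively an inner join).
where_filtered = """
    SELECT c.name, ml.value
    FROM competences c
    LEFT JOIN measurement_lines ml ON ml.competences_id = c.id
    LEFT JOIN measurements m ON m.id = ml.measurements_id
    WHERE m.persons_id = ?
    ORDER BY c.id
"""

# The answer's shape: filter inside a derived table, then outer-join it.
subquery_filtered = """
    SELECT c.name, pm.value
    FROM competences c
    LEFT OUTER JOIN (
        SELECT ml.competences_id AS cid, ml.value AS value
        FROM measurement_lines ml
        INNER JOIN measurements m ON m.id = ml.measurements_id
        WHERE m.persons_id = ?
    ) pm ON pm.cid = c.id
    ORDER BY c.id
"""

results = {}
for person in (1, 2):
    results[person] = (conn.execute(where_filtered, (person,)).fetchall(),
                       conn.execute(subquery_filtered, (person,)).fetchall())
    print(person, *results[person])
```

For person 2 the WHERE version returns nothing at all. The subquery version still lists every competence with a NULL value, which is the behavior the question asks for.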

sqlite subqueries with group_concat as columns in select statements

I have two tables: one, called watch_list, contains a list of items with some important attributes, and the other, called price_history, is just a list of prices. What I would like to do is group the 10 lowest prices into a single column with a group_concat operation, and then create a row with the item attributes from watch_list along with the 10 lowest prices for each item in watch_list. First I tried joins, but then I realized that the operations were happening in the wrong order, so there was no way I could get the desired result with a join. Then I tried the obvious thing: I queried price_history for every row in watch_list and glued everything together in the host environment, which worked but seemed very inefficient. Now I have the following query, which looks like it should work, but it's not giving me the results I want. What is wrong with the following statement?
select w.asin,w.title,
(select group_concat(lowest_used_price) from price_history as p
where p.asin=w.asin limit 10)
as lowest_used
from watch_list as w
Basically I want the limit operation to happen before group_concat does anything but I can't think of a sql statement that will do that.
Figured it out. As somebody once said, "All problems in computer science can be solved by another level of indirection," and in this case an extra select subquery did the trick:
select w.asin,w.title,
(select group_concat(lowest_used_price)
from (select lowest_used_price from price_history as p
where p.asin=w.asin order by lowest_used_price limit 10)) as lowest_used
from watch_list as w
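The extra level of indirection can be checked with Python's sqlite3 module. The sample uses a hypothetical item with five prices and a limit of 3 instead of 10.

- The first query reproduces the problem: LIMIT applies to the aggregated result, which is already a single row.
- The second applies ORDER BY and LIMIT in the innermost query, so only the three lowest prices reach `group_concat`. Without the ORDER BY, LIMIT would keep an arbitrary set of rows rather than the lowest ones.

SQLite does not document the order in which `group_concat` concatenates its inputs, so it is safer not to rely on the order of the values in the resulting string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE watch_list (asin TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE price_history (asin TEXT, lowest_used_price REAL);
    INSERT INTO watch_list VALUES ('B001', 'Some book');
    INSERT INTO price_history VALUES
        ('B001', 9.5), ('B001', 3.0), ('B001', 7.25), ('B001', 1.0), ('B001', 4.0);
""")

N = 3  # the question uses 10; 3 keeps the sample small

# LIMIT belongs to the aggregate query, whose result is already one row,
# so it limits nothing and every price is concatenated.
limit_after = conn.execute("""
    SELECT w.asin,
           (SELECT group_concat(lowest_used_price) FROM price_history AS p
            WHERE p.asin = w.asin LIMIT ?) AS lowest_used
    FROM watch_list AS w
""", (N,)).fetchone()

# Extra level of indirection: pick the N lowest rows first, then aggregate.
limit_before = conn.execute("""
    SELECT w.asin,
           (SELECT group_concat(lowest_used_price)
            FROM (SELECT lowest_used_price FROM price_history AS p
                  WHERE p.asin = w.asin
                  ORDER BY lowest_used_price
                  LIMIT ?)) AS lowest_used
    FROM watch_list AS w
""", (N,)).fetchone()

print(limit_after)
print(limit_before)
```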
