Sqlite merge columns in join - sqlite

I have two tables:
emails
raw_id | email | score
1, email1, 1
1, email2, 2
2, email3, 3
3, email4, 4
merged
raw_id1 | raw_id2
1, 2
How can I write a query that returns one row for each distinct raw_id, with the highest score? Also, if two raw_ids are merged, they should be treated as the same one.
So for the above data, here is my expected result:
select score, email from emails join...
3, email3
4, email4
-

SELECT MAX(e1.raw_id, COALESCE(e2.raw_id, -1)),
MAX(e1.email),
MAX(MAX(e1.score, COALESCE(e2.score, -1)))
FROM emails e1 LEFT JOIN merged m
ON e1.raw_id = m.raw_id1
LEFT JOIN emails e2
ON e2.raw_id = m.raw_id2
GROUP BY MAX(e1.raw_id, COALESCE(e2.raw_id, -1))
Follow the link below for a running demo:
SQLFiddle
Note that this demo is for MySQL, because Fiddle doesn't offer the option for SQLite. The only change I had to make to the query was to replace SQLite's scalar MAX function with MySQL's GREATEST function.

SELECT max(score), email
FROM emails
LEFT JOIN
merged ON emails.raw_id = merged.raw_id1
GROUP BY coalesce(raw_id2, raw_id);
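The shorter query above relies on SQLite-specific behaviour: when a query has a single max() aggregate, a bare column such as email is taken from the row that holds the maximum. Here is a minimal sketch to check it against the sample data from the question, assuming Python's built-in sqlite3 module and an in-memory database (the Python wrapper is my addition, not part of the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emails (raw_id INTEGER, email TEXT, score INTEGER);
    INSERT INTO emails VALUES (1, 'email1', 1), (1, 'email2', 2),
                              (2, 'email3', 3), (3, 'email4', 4);
    CREATE TABLE merged (raw_id1 INTEGER, raw_id2 INTEGER);
    INSERT INTO merged VALUES (1, 2);
""")

# coalesce(raw_id2, raw_id) maps raw_id 1 onto its merge target 2,
# so rows for raw_id 1 and 2 end up in the same group.
rows = conn.execute("""
    SELECT max(score), email
    FROM emails
    LEFT JOIN merged ON emails.raw_id = merged.raw_id1
    GROUP BY coalesce(raw_id2, raw_id)
""").fetchall()

# Sort in Python so the output order does not depend on the query plan.
rows = sorted(rows)
print(rows)
```

Note that this only follows merges in one direction (raw_id1 to raw_id2) and does not handle chains of merges; other databases would generally reject or mis-evaluate the bare email column.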

Related

Count field from 2nd table

Here's a sample of what I'm trying to do:
select
t1.Field1,
t1.Field2
from Table1 t1
inner join Table2 t2 on t2.Field1 = t1.Field1
where
t1.Field3 like '123-%'
and t1.CreateDate >= '01/01/2021'
having
Count(t2.Field4 = 419) >= 1
2 tables
Table 1 has unique records with an ID (Field 1)
Table 2 has multiple records, with ID Field 1 as well. Table2 may have 10, 20, etc. records for Field1. I want to pull the Table1 records where Table2 has at least 1 occurrence of Field4 = 419. Table2 may not have any Field4 = 419, it may have 1, or it may have 2 or more.
Pretty straightforward I think, but unfortunately I'm new to writing SQL, which is why I'm posting for help, as I've tried several ways to get this to work without any luck.
Normally you use having when you have a group by in your query.
There will be multiple ways of writing what you want, but one way that I think is easy to understand is selecting from Table1 and doing the count of Table2 in a subquery of the where clause. Depending on the size of your tables and the indexing, you might want to explore other options, but I think this is at least a good starting point that does what you want.
select
t1.Field1,
t1.Field2
from Table1 t1
where
t1.Field3 like '123-%'
and t1.CreateDate >= '01/01/2021'
and (SELECT count(*) FROM Table2 t2 WHERE t2.Field1 = t1.Field1 AND t2.Field4 = 419) >= 1
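One of the "other options" mentioned above is EXISTS, which only asks whether at least one matching row is present instead of counting them all. A minimal sketch, assuming SQLite via Python's sqlite3 module; the sample rows are made up, and the dates are stored as ISO-8601 text (an assumption) so that string comparison orders them correctly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Field1 INTEGER PRIMARY KEY, Field2 TEXT,
                         Field3 TEXT, CreateDate TEXT);
    CREATE TABLE Table2 (Field1 INTEGER, Field4 INTEGER);
    -- Hypothetical sample data, not taken from the question.
    INSERT INTO Table1 VALUES (1, 'a', '123-001', '2021-02-01'),
                              (2, 'b', '123-002', '2021-03-01'),
                              (3, 'c', '999-003', '2021-04-01');
    INSERT INTO Table2 VALUES (1, 419), (1, 419), (1, 7), (2, 7), (3, 419);
""")

# EXISTS is true as soon as one Table2 row with Field4 = 419 is found,
# and each Table1 row is returned at most once.
rows = conn.execute("""
    SELECT t1.Field1, t1.Field2
    FROM Table1 t1
    WHERE t1.Field3 LIKE '123-%'
      AND t1.CreateDate >= '2021-01-01'
      AND EXISTS (SELECT 1 FROM Table2 t2
                  WHERE t2.Field1 = t1.Field1 AND t2.Field4 = 419)
    ORDER BY t1.Field1
""").fetchall()
print(rows)
```

Row 1 is returned once even though it has two matching Table2 records; row 2 has no 419 record and row 3 fails the LIKE filter.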

SQLite - Create dummy variable vector/string from multiple columns

I have some data that looks like this:
UserID Category
------ --------
1 a
1 b
2 c
3 b
3 a
3 c
I'd like to binary-encode this grouped by UserID: three different values exist in Category, so a binary encoding would be something like:
UserID encoding
------ --------
1 "1, 1, 0"
2 "0, 0, 1"
3 "1, 1, 1"
i.e., all three values are present for UserID = 3, so the corresponding vector is "1, 1, 1".
Is there a way to do this without writing a bunch of CASE WHEN statements? There may be dozens of possible values in Category.
Cross join the distinct users to distinct categories and left join to the table.
Then use the GROUP_CONCAT() window function, which supports an ORDER BY clause, to collect the 0s and 1s:
WITH
users AS (SELECT DISTINCT UserID FROM tablename),
categories AS (
SELECT DISTINCT Category, DENSE_RANK() OVER (ORDER BY Category) rn
FROM tablename
),
cte AS (
SELECT u.UserID, c.rn,
'"' || GROUP_CONCAT(t.UserID IS NOT NULL)
OVER (PARTITION BY u.UserID ORDER BY c.rn) || '"' encoding
FROM users u CROSS JOIN categories c
LEFT JOIN tablename t
ON t.UserID = u.UserID AND t.Category = c.Category
)
SELECT DISTINCT userID,
FIRST_VALUE(encoding) OVER (PARTITION BY UserID ORDER BY rn DESC) encoding
FROM cte
ORDER BY userID
This will work for any number of categories.
See the demo.
Results:
UserID encoding
------ --------
1 "1,1,0"
2 "0,0,1"
3 "1,1,1"
First create an encoding table to explicitly establish the order of categories in the bitmap:
create table e (Category text, Encoding int);
insert into e values ('a', 1), ('b', 2), ('c', 4);
Next, generate a list of users u (cross) joined with the encoding table e to get a fully populated (UserId, Category, Encoding) table. Then left join the fully populated table with the user-supplied data t. The right-hand side t can now be used to decide whether a bit needs to be set or not:
select
u.UserId,
'"' ||
group_concat(case when t.UserId is null then 0 else 1 end, ', ')
|| '"' 'encoding'
from
(select distinct UserID from t) u
join e
left natural join t
group by 1
order by e.Encoding
and it gives the expected result:
1|"1, 1, 0"
2|"0, 0, 1"
3|"1, 1, 1"
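Both answers build on the same idea: cross join the distinct users with the distinct categories, then left join the data to see which pairs exist. Here is a minimal sketch of that idea, assuming Python's sqlite3 module and that each (UserID, Category) pair appears at most once. It is a variation rather than either answer's query: the join runs in SQL with an explicit ORDER BY, and the string is assembled in Python, so it needs neither window functions nor a guaranteed GROUP_CONCAT order.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (UserID INTEGER, Category TEXT);
    INSERT INTO tablename VALUES (1, 'a'), (1, 'b'), (2, 'c'),
                                 (3, 'b'), (3, 'a'), (3, 'c');
""")

# One row per (user, category) pair; "present" is 1 when the pair exists.
rows = conn.execute("""
    SELECT u.UserID, t.UserID IS NOT NULL AS present
    FROM (SELECT DISTINCT UserID FROM tablename) u
    CROSS JOIN (SELECT DISTINCT Category FROM tablename) c
    LEFT JOIN tablename t
           ON t.UserID = u.UserID AND t.Category = c.Category
    ORDER BY u.UserID, c.Category
""").fetchall()

# Rows arrive sorted by user, then category, so groupby sees each user once.
encoding = {
    user: ", ".join(str(flag) for _, flag in group)
    for user, group in groupby(rows, key=lambda r: r[0])
}
print(encoding)
```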

SQLite Duplicates NULL Perl

I use the following code in Perl to find duplicates in my SQLite database on the basis of 5 columns (Term1 to Term5).
my $dublette = $dbh->selectall_arrayref("
SELECT t.ID, t.Tag1, t.Tag2, t.Term1, t.Term2, t.Term3, t.Term4, t.Term5
FROM Data t inner join (
SELECT ID, Tag1, Tag2, Term1, Term2, Term3, Term4, Term5, COUNT(*) c FROM Data GROUP BY Term1, Term2, Term3, Term4, Term5 HAVING c > 1)
x on t.Term1=x.Term1 AND t.Term2=x.Term2 AND t.Term3=x.Term3 AND t.Term4=x.Term4 AND t.Term5=x.Term5
");
This seems to work, except when my unused values are NULL (it works if they are empty strings). Any idea how I can change it to get all duplicates?
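A likely cause is that GROUP BY puts NULLs into the same group, while the = in the join condition never matches NULL against NULL. In SQLite the IS operator behaves like = but treats two NULLs as equal. A minimal sketch of the difference, written in Python's sqlite3 module rather than Perl and shortened to two Term columns with made-up rows (both are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data (ID INTEGER PRIMARY KEY, Term1 TEXT, Term2 TEXT);
    -- Hypothetical rows: IDs 1 and 2 are duplicates with a NULL Term2.
    INSERT INTO Data (Term1, Term2) VALUES
        ('x', NULL), ('x', NULL), ('y', 'z'), ('y', 'q');
""")

# Same duplicate-finding join, parameterised only by the comparison operator.
query = """
    SELECT t.ID, t.Term1, t.Term2
    FROM Data t
    INNER JOIN (
        SELECT Term1, Term2
        FROM Data
        GROUP BY Term1, Term2
        HAVING COUNT(*) > 1
    ) x ON t.Term1 {op} x.Term1 AND t.Term2 {op} x.Term2
    ORDER BY t.ID
"""

with_equals = conn.execute(query.format(op="=")).fetchall()
with_is = conn.execute(query.format(op="IS")).fetchall()

print(with_equals)  # NULL = NULL is not true, so the duplicates are lost
print(with_is)      # IS matches NULL with NULL, so both rows come back
```

The same change should carry over to the five-column query by replacing each = in the ON clause with IS.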

Join 3 tables without losing ability to refer each table

I have 3 tables with this mock data
Item *id, name*
1, coke
2, fanta
3, juice
Branch *id, name*
1, store
2, warehouse
3, shop
BranchItem *item_id, branch_id, qty*
1, 1, 100
1, 2, 30
2, 2, 10
I want to query for an item (coke, for example) and get its quantity in all branches, even the ones it doesn't exist in; those should have NULL in the qty column.
So the result should look like
1, coke, store, 100
1, coke, warehouse, 30
1, coke, shop, NULL
I have a query that can do this, but because of aliasing tables, I lose the ability to refer to the columns of the result table. The parsing of the result is done in an ORM object, which preferably shouldn't be rewritten.
The query I have
Select * from item left join (select * from branch left join ( select * from branchitem where item_id = 1) branchitem on branch.id = branchitem.branch_id) JOINEDNAME on true where item.id = 1;
My problem is that I don't want to alias the join of Branch and BranchItem, because I then lose the ability to refer to them separately in the ORM. How can this query be rewritten so the tables retain their names?
You don't need to use subqueries:
SELECT Item.id,
Item.name,
Branch.name,
BranchItem.qty
FROM Item
CROSS JOIN Branch
LEFT JOIN BranchItem ON Item.id = BranchItem.item_id
AND Branch.id = BranchItem.branch_id
WHERE Item.id = 1; -- or put it into the branch join
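To check the query above against the mock data from the question, here is a minimal sketch using Python's sqlite3 module. The question does not say which database is in use, so SQLite is an assumption, and the ORDER BY is added only to make the output order predictable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Branch (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE BranchItem (item_id INTEGER, branch_id INTEGER, qty INTEGER);
    INSERT INTO Item VALUES (1, 'coke'), (2, 'fanta'), (3, 'juice');
    INSERT INTO Branch VALUES (1, 'store'), (2, 'warehouse'), (3, 'shop');
    INSERT INTO BranchItem VALUES (1, 1, 100), (1, 2, 30), (2, 2, 10);
""")

# CROSS JOIN pairs the item with every branch; the LEFT JOIN then fills in
# qty where a BranchItem row exists and leaves NULL (None) where it does not.
rows = conn.execute("""
    SELECT Item.id, Item.name, Branch.name, BranchItem.qty
    FROM Item
    CROSS JOIN Branch
    LEFT JOIN BranchItem ON Item.id = BranchItem.item_id
                        AND Branch.id = BranchItem.branch_id
    WHERE Item.id = 1
    ORDER BY Branch.id
""").fetchall()

for row in rows:
    print(row)
```

All three tables keep their own names in the query, so columns can still be referenced as Item.x, Branch.x and BranchItem.x.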

Retrieve a table of tallied numbers, best way

I have a query that runs as part of a function and produces a one-row table full of counts, averages, and comma-separated lists, like this:
select
(select
count(*)
from vw_disp_details
where round = 2013
and rating = 1) applicants,
(select
count(*)
from vw_disp_details
where round = 2013
and rating = 1
and applied != 'yes') s_applicants,
(select
LISTAGG(discipline, ',')
WITHIN GROUP (ORDER BY discipline)
from (select discipline,
count(*) discipline_number
from vw_disp_details
where round = 2013
and rating = 1
group by discipline)) disciplines,
(select
LISTAGG(discipline_count, ',')
WITHIN GROUP (ORDER BY discipline)
from (select discipline,
count(*) discipline_count
from vw_disp_details
where round = 2013
and rating = 1
group by discipline)) disciplines_count,
(select
round(avg(util.getawardstocols(application_id,'1','AWARD_NAME')), 2)
from vw_disp_details
where round = 2013
and rating = 1) average_award_score,
(select
round(avg(age))
from vw_disp_details
where round = 2013
and rating = 1) average_age
from dual;
Except that instead of 6 main sub-queries there are 23.
This returns something like this (if it were a CSV):
applicants | s_applicants | disciplines | disciplines_count | average_award_score | average_age
107 | 67 | "speed,accuracy,strength" | 3 | 97 | 23
Now I am programmatically swapping out the "rating = 1" part of the where clauses for other expressions. They all work rather quickly except for the "rating = 1" one which takes about 90 seconds to run and that is because the rating column in the vw_disp_details view is itself compiled by a sub-query:
(SELECT score
FROM read r,
eval_criteria_lookup ecl
WHERE r.criteria_id = ecl.criteria_id
AND r.application_id = a.lgo_application_id
AND criteria_description = 'Overall Score'
AND type = 'ABC'
) reader_rank
So when the function runs this extra query seems to slow everything down dramatically.
My question is: is there a better (more efficient) way to run a query like this, which is basically just a series of counts and averages? And how can I refactor it so that the rating = 1 query doesn't take 90 seconds to run?
You could choose to MATERIALIZE the vw_disp_details VIEW. That would pre-calculate the value of the rating column. There are various options for how up to date a materialized view is kept; you would probably want to use the ON COMMIT refresh option so that vw_disp_details is always correct.
Have a look at the official documentation and see if that would work for you.
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_6002.htm
Do all, or at least most, of your queries in a single one. Instead of doing:
select
(select count(*) from my_tab) as count_all,
(select avg(age) from my_tab) as avg_age,
(select avg(mypkg.get_award(application_id)) from my_tab) as avg_app_id
from dual;
Just do:
select count(*), avg(age),avg(mypkg.get_award(application_id)) from my_tab;
And then, maybe you can do some union all for the other results. But this step all by itself should help.
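Counts that have an extra condition (such as applied != 'yes' in the question) can usually be folded into the same pass with a CASE expression inside the aggregate. A minimal sketch of the idea, assuming SQLite through Python's sqlite3 module instead of Oracle, with a hypothetical my_tab table and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_tab (application_id INTEGER, age INTEGER, applied TEXT);
    -- Hypothetical sample rows for illustration only.
    INSERT INTO my_tab VALUES (1, 20, 'yes'), (2, 24, 'no'), (3, 25, 'no');
""")

# One scan of my_tab yields all three figures; the CASE expression turns
# the conditional count into a sum of 1s and 0s.
row = conn.execute("""
    SELECT count(*) AS count_all,
           sum(CASE WHEN applied != 'yes' THEN 1 ELSE 0 END) AS count_not_applied,
           avg(age) AS avg_age
    FROM my_tab
""").fetchone()
print(row)
```

The CASE-inside-aggregate form is standard SQL, so the same shape should carry over to Oracle, although I have only sketched it here against SQLite.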
I was able to solve this issue by doing two things. First, I created a new view that displays only the results I need, which gave me marginal gains in speed. Second, in that view I moved the where clause of the sub-query that caused the lag into the where clause of the view itself, and added the result of the sub-query as a column of the view. This still returns the same results, because the table the sub-query accessed always contains records for each row of the view query.
SELECT
a.application_id,
util.getstatus (a.application_id) status,
(SELECT score
FROM applicant_read ar,
eval_criteria_lookup ecl
WHERE ar.criteria_id = ecl.criteria_id
AND ar.application_id = a.application_id
AND criteria_description = 'Overall Score' -- THESE TWO FIELDS
AND type = 'ABC' -- ARE CRITERIA_ID = 15
) score,
s.test_total test_total
FROM application a,
applicant_scores s
WHERE a.application_id = s.application_id(+);
Became
SELECT
a.application_id,
util.getstatus (a.application_id) status,
ar.score,
s.test_total test_total
FROM application a,
applicant_scores s,
applicant_read ar
WHERE a.application_id = s.application_id(+)
AND ar.application_id = a.application_id(+)
AND ar.criteria_id = 15;
