SQLite - Group by multiple columns, count distinct of one

Say I have the following table:
Commit File Past_Author
1 a Alice
1 a Bob
1 a Bob
1 b Alice
I'm looking to count the number of distinct authors for each file, summed by commit. In this case, I'd want 3 authors for commit 1 (2 for file a + 1 for file b).
I guess something like: SELECT Commit, SUM(NUM_AUTHORS) FROM (SELECT Commit, File, COUNT(DISTINT Past_Author) as NA FROM COMMITS GROUP BY File) GROUP BY Commit

This should do the trick: group the inner query by both Commit and File, then sum the per-file counts for each commit.
SELECT Commit, SUM(NUM_AUTHORS) FROM
(
SELECT
Commit,
File,
COUNT(DISTINCT Past_Author) AS NUM_AUTHORS
FROM COMMITS
GROUP BY Commit, File
) AS AUTHOR_COUNT
GROUP BY Commit;
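
To check the query against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. Table and column names follow the question; quoting "Commit" is a precaution on my part, since COMMIT is also an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Commit" is double-quoted because COMMIT is an SQL keyword.
conn.execute('CREATE TABLE COMMITS ("Commit" INTEGER, File TEXT, Past_Author TEXT)')
conn.executemany(
    "INSERT INTO COMMITS VALUES (?, ?, ?)",
    [(1, "a", "Alice"), (1, "a", "Bob"), (1, "a", "Bob"), (1, "b", "Alice")],
)

query = """
SELECT "Commit", SUM(NUM_AUTHORS)
FROM (
    -- one row per (Commit, File) with its number of distinct authors
    SELECT "Commit", File, COUNT(DISTINCT Past_Author) AS NUM_AUTHORS
    FROM COMMITS
    GROUP BY "Commit", File
) AS AUTHOR_COUNT
GROUP BY "Commit"
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 3)]: 2 authors for file a + 1 for file b
```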

Related

How to exclude the records Using Qualify statement in Teradata

I need to build a population of people who have only one product association (ABC), using a QUALIFY clause.
For example I have the data
Id Code Prod Date
101 202 ABC 2017-05-31
101 203 DEF 2017-04-30
102 302 ABC 2018-06-30
From the above data I need the row for Id=102, because this id has only one prod relation, whereas id 101 has both ABC and DEF and should be excluded.
I tried the following
Select id,prod from table1
Qualify row_number() over (partition by id order by Date)=1
Where prod='ABC'
With this, I get the two records in my data which I don’t want. Appreciate your help.
Select *
from table1
Qualify min(Prod) over (partition by id)='ABC'
and max(Prod) over (partition by id)='ABC'
If both MIN and MAX return the same value, ABC, then the id has no other Prod value.
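
The MIN/MAX reasoning does not depend on QUALIFY itself. As a rough sketch, the same check can be tried in SQLite, which has no QUALIFY clause, by using GROUP BY and HAVING instead. This is a swapped-in illustration of the idea run through Python's sqlite3 module, not Teradata syntax; the sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Id INTEGER, Code INTEGER, Prod TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (101, 202, "ABC", "2017-05-31"),
        (101, 203, "DEF", "2017-04-30"),
        (102, 302, "ABC", "2018-06-30"),
    ],
)

# An id qualifies only if its smallest and largest Prod are both 'ABC',
# which means every Prod value for that id is 'ABC'.
ids = conn.execute(
    """
    SELECT Id
    FROM table1
    GROUP BY Id
    HAVING MIN(Prod) = 'ABC' AND MAX(Prod) = 'ABC'
    """
).fetchall()
print(ids)  # [(102,)]
```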
If you want to return the id's that have one prod value (ABC) in the table, you can do something like this:
SELECT id, prod
FROM (
SELECT id, prod
FROM table1
GROUP BY id, prod -- Get unique (id, prod) combinations
QUALIFY COUNT(prod) OVER(PARTITION BY id) = 1 -- Get id's with only one prod
) src
WHERE prod = 'ABC' -- Only get rows with "ABC" prod
The key here is the order in which Teradata processes the query:
1. Aggregate - GROUP BY
2. OLAP - COUNT(prod) OVER()
3. QUALIFY
You may be able to move the WHERE prod = 'ABC' into the QUALIFY clause and get rid of the outer SELECT, not 100% sure.
Just use HAVING instead of QUALIFY. I don't see any need for window functions. Something like:
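Select id,
min(prod) as prod
from
table1
group by
id -- one group per id, so all of its prods are checked together
having count(distinct prod) = 1 -- the id has a single prod value
and min(prod) = 'ABC' -- and that value is ABC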

Why does COUNT return NULL instead of `0` in this query?

I have the query
select d.did, count ( h.did ), unique_interested
from dealer as d
left outer join house as h
on h.did = d.did
left outer join (
-- cid = customer id
select hid, count (cid) as unique_interested
from is_interested
group by hid
) as ok
on h.hid = ok.hid
group by d.did
order by d.did asc
;
which is supposed to select the number of houses that each dealer is dealing, and the number of unique customers interested in said houses (as in the number of customers per dealer). This should happen even if the dealers have no houses to deal at the moment, which is why I'm using left outer joins when constructing the table the columns will be picked from.
Now, running this query against my database produces the following output:
d.did count ( h.did) unique_interested
----- -------------- ----------------
1 3
2 3 1
3 0
As you can see, instead of printing 0 in the last column, count returns null when there is a null in one of the apartments produced by the last part of the join (as in, cid is null):
select hid, count ( cid ) as unique_interested
from is_interested
group by hid
I know this is because there are apartments in the table produced by from, that no-one is interested in. But shouldn't count produce 0 instead of the actual column value null in every case?
Any explanation as to why this is happening would be appreciated, as it would lead me towards an answer to another question, which is "Why am I not getting the right number of unique interested customers per dealer from the table is_interested?", as with the current state of my database, the output should look more like:
d.did count ( h.did) unique_interested
----- -------------- ----------------
1 3 2
2 3 2
3 0 0
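
One likely explanation, assuming an engine such as SQLite that accepts a bare column next to GROUP BY: unique_interested in the outer SELECT is not an aggregate. It is a plain column copied from one of the joined rows, and the LEFT JOIN leaves it NULL for any house with no row in is_interested. COUNT itself never returns NULL. The sketch below uses Python's sqlite3 module with hypothetical sample data (the real table contents are not shown in the question); it joins is_interested directly and counts distinct values per dealer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dealer (did INTEGER);
    CREATE TABLE house (hid INTEGER, did INTEGER);
    CREATE TABLE is_interested (hid INTEGER, cid INTEGER);
    """
)
# Hypothetical data: dealer 3 has no houses; houses 3 and 6 have no interest.
conn.executemany("INSERT INTO dealer VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO house VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2)],
)
conn.executemany(
    "INSERT INTO is_interested VALUES (?, ?)",
    [(1, 10), (2, 11), (2, 10), (4, 10), (5, 12)],
)

query = """
SELECT d.did,
       COUNT(DISTINCT h.hid) AS houses,            -- DISTINCT: the second join repeats houses
       COUNT(DISTINCT i.cid) AS unique_interested  -- NULLs are skipped, so no match gives 0
FROM dealer AS d
LEFT OUTER JOIN house AS h ON h.did = d.did
LEFT OUTER JOIN is_interested AS i ON i.hid = h.hid
GROUP BY d.did
ORDER BY d.did ASC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 3, 2), (2, 3, 2), (3, 0, 0)]
```

Because both columns are now aggregates over the group, a dealer with no houses or no interested customers gets 0 rather than NULL.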

how can you add multiple rows in sqlcl?

I am trying to add multiple rows to my table. I tried to follow some of the online solutions, but I keep getting ORA-00933: SQL command not properly ended.
How do I add multiple rows at once?
insert into driver_detail values(1003,'sajuman','77f8s0990',1),
(1004,'babu ram coi','2g64s8877',8);
INSERT ALL is one way to go.
SQL> create table driver_detail (id integer, text1 varchar2(20), text2 varchar2(20), some_num integer);
Table DRIVER_DETAIL created.
SQL> insert all
2 into driver_detail (id, text1, text2, some_num) values (1003, 'sajuman', '77f8s0090', 1)
3 into driver_detail (id, text1, text2, some_num) values (1004, 'babu ram coi', '2g64s887', 8)
4* select * from dual;
2 rows inserted.
SQL> commit;
Commit complete.
SQL> select * from driver_detail;
ID TEXT1 TEXT2 SOME_NUM
_______ _______________ ____________ ___________
1003 sajuman 77f8s0090 1
1004 babu ram coi 2g64s887 8
But SQLcl is a modern CLI for the Oracle Database, surely there might be a better way?
Yes.
Put your rows into a CSV.
Use the LOAD command.
SQL> delete from driver_detail;
0 rows deleted.
SQL> help load
LOAD
-----
Loads a comma separated value (csv) file into a table.
The first row of the file must be a header row. The columns in the header row must match the columns defined on the table.
The columns must be delimited by a comma and may optionally be enclosed in double quotes.
Lines can be terminated with standard line terminators for windows, unix or mac.
File must be encoded UTF8.
The load is processed with 50 rows per batch.
If AUTOCOMMIT is set in SQLCL, a commit is done every 10 batches.
The load is terminated if more than 50 errors are found.
LOAD [schema.]table_name[#db_link] file_name
SQL> load hr.driver_detail /Users/thatjeffsmith/load_example.csv
--Number of rows processed: 4
--Number of rows in error: 0
0 - SUCCESS: Load processed without errors
SQL> select * from driver_detail;
ID TEXT1 TEXT2 SOME_NUM
_______ _________________ ______________ ___________
1003 'sajuman' '77f8s0990' 1
1004 'babu ram coi' '2g64s8877' 8
1 'hello' 'there' 2
2 'nice to' 'meet you' 3
SQL>
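
The CSV for LOAD can also be generated rather than typed by hand. A minimal sketch in Python, assuming the four columns from the CREATE TABLE above; the file name and location are illustrative choices, not anything SQLcl requires. Values are written without single quotes, since the output above suggests that single quotes in the file are loaded as part of the data.

```python
import csv
import os
import tempfile

rows = [
    (1003, "sajuman", "77f8s0990", 1),
    (1004, "babu ram coi", "2g64s8877", 8),
]

# Illustrative path; pass whatever path you write here to the LOAD command.
path = os.path.join(tempfile.gettempdir(), "load_example.csv")

with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    # The first row must be a header that matches the table's columns.
    writer.writerow(["ID", "TEXT1", "TEXT2", "SOME_NUM"])
    writer.writerows(rows)

with open(path, encoding="utf-8", newline="") as f:
    lines = f.read().splitlines()
print(lines)
```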

Common Table Expression in sqlite using rowid

I found a good article on converting adjacency to nested sets at http://dataeducation.com/the-hidden-costs-of-insert-exec/
The SQL language used is Microsoft SQL Server (I think) and I am trying to convert the examples given in the article to sqlite (as this is what I have easy access to on my Macbook).
The problem I appear to be having is converting the part of the overall CTE query to do with the Employee Rows
EmployeeRows AS
(
SELECT
EmployeeLevels.*,
ROW_NUMBER() OVER (ORDER BY thePath) AS Row
FROM EmployeeLevels
)
I converted this to
EmployeeRows AS
(
SELECT
EmployeeLevels.*,
rowid AS Row
FROM EmployeeLevels
ORDER BY thePath
)
and the CTE query runs (no syntax errors) but the output I get is a table without the Row and Lft and Rgt columns populated
ProductName ProductID ParentProductID TreePath HLevel Row Lft Rgt
----------- ---------- --------------- ---------- ---------- ---------- ---------- ----------
Baby Goods 0 0 1
Baby Food 10 0 0.10 2
All Ages Ba 100 10 0.10.100 3
Strawberry 200 100 0.10.100.2 4
Baby Cereal 250 100 0.10.100.2 4
Beginners 150 10 0.10.150 3
Formula Mil 300 150 0.10.150.3 4
Heinz Formu 310 300 0.10.150.3 5
Nappies 20 0 0.20 2
Small Pack 400 20 0.20.400 3
Bulk Pack N 450 20 0.20.450 3
I think the start of the problem is the Row is not getting populated and therefore the Lft and Rgt columns do not get populated by the following parts of the query.
Are there any sqlite experts out there to tell me:
am I translating the rowid part of the query correctly
does sqlite support a rowid in a part of a CTE query
is there a better way? :)
Any help appreciated :)
am I translating the rowid part of the query correctly
No.
The SQL:
SELECT
EmployeeLevels.*,
rowid AS Row
FROM EmployeeLevels
ORDER BY thePath
defines Row as the rowid of EmployeeLevels in SQLite and ignores the ORDER BY clause, which is different from the intent of ROW_NUMBER() OVER (ORDER BY thePath) AS Row.
does sqlite support a rowid in a part of a CTE query
Unfortunately no. I assume you mean this:
WITH foo AS (
SELECT * FROM bar ORDER BY col_a
)
SELECT rowid, *
FROM foo
but SQLite will report that there is no such column rowid in foo.
is there a better way?
Not sure it is better, but at least it works. SQLite has temporary tables, which exist until you drop them or close the connection. Rewriting the SQL from my example above:
CREATE TEMP TABLE foo AS
SELECT * FROM bar ORDER BY col_a
;
SELECT rowid, *
FROM foo
;
DROP TABLE foo
;
This one will run without SQLite complaining.
update:
As of SQLite version 3.25.0, window functions are supported. Hence you can use a row_number() over (order by x) expression in your CTE if you are on a newer SQLite.
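
As a small sketch of the window-function route, run through Python's sqlite3 module: it assumes the SQLite library bundled with your Python is 3.25.0 or newer, and it reuses the placeholder names bar, foo, and col_a from the example above. The column alias row_num is my own choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (col_a TEXT)")
# Inserted out of order on purpose, so rowid order differs from col_a order.
conn.executemany("INSERT INTO bar VALUES (?)", [("c",), ("a",), ("b",)])

query = """
WITH foo AS (
    SELECT row_number() OVER (ORDER BY col_a) AS row_num, col_a
    FROM bar
)
SELECT row_num, col_a
FROM foo
ORDER BY row_num
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The numbering follows col_a rather than insertion order, which is the behaviour the original ROW_NUMBER() OVER (ORDER BY thePath) relies on.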

Zipping rows with the same "key" while joining tables

I have two tables, one with objects, one with properties of the objects. Both tables have a personal ID and a date as "key", but since multiple orders of objects can be done by one person on a single day, it doesn't match well. I do know however, that the entries are entered in the same order in both tables, so it is possible to join on the order, if the personID and date are the same.
This is what I want to accomplish:
Table 1:
PersonID Date Object
1 20-08-2013 A
2 13-11-2013 B
2 13-11-2013 C
2 13-11-2013 D
3 21-11-2013 E
Table 2:
PersonID Date Property
4 05-05-2013 $
1 20-08-2013 ^
2 13-11-2013 /
2 13-11-2013 *
2 13-11-2013 +
3 21-11-2013 &
Result:
PersonID Date Object Property
4 05-05-2013 $
1 20-08-2013 A ^
2 13-11-2013 B /
2 13-11-2013 C *
2 13-11-2013 D +
3 21-11-2013 E &
So what I want to do, is join the two tables and "zip" the group of entries that have the same (PersonID,Date) "key".
Something called "Slick" seems to have this, but I'd like to do it in SQLite.
Any advice would be amazing!
You are on the right track. Why not just do a LEFT JOIN between the tables like
select t2.PersonID,
t2.Date,
t1.Object,
t2.Property
from table2 t2
left join table1 t1 on t2.PersonID = t1.PersonID
order by t2.PersonID
Use an additional column to make every key unique in both tables. For example, in SQLite you could use RowIDs to keep track of the order of insertion. Storing this additional column in the database itself might be useful for other queries as well, but you do not have to store it.
First add the column ID to both tables. The DDL queries should now look like this (make sure you do not add the primary key constraint until both tables are filled):
CREATE TABLE table1 (
ID,
PersonID,
Date,
Object
);
CREATE TABLE table2 (
ID,
PersonID,
Date,
Property
);
Now populate the ID column. You can adjust the ID to your liking. Make sure you do this for table2 as well:
UPDATE table1
SET ID =(
SELECT table1.PersonID || '-' || table1.Date || '-' || count( * )
FROM table1 tB
WHERE table1.RowID >= tB.RowID
AND
table1.PersonID == tB.PersonID
AND
table1.Date == tB.Date
);
Now you can join them:
SELECT t2.PersonID,
t2.Date,
t1.Object,
t2.Property
FROM table2 t2
LEFT JOIN table1 t1
ON t2.ID = t1.ID;
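
The three steps above can be tried end to end with Python's sqlite3 module. This is a sketch under the same assumption the question makes, namely that rows were inserted in matching order in both tables; the data is the sample from the question, and the final ORDER BY t2.RowID is my addition to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (ID, PersonID, Date, Object);
    CREATE TABLE table2 (ID, PersonID, Date, Property);
    """
)
conn.executemany(
    "INSERT INTO table1 (PersonID, Date, Object) VALUES (?, ?, ?)",
    [(1, "20-08-2013", "A"), (2, "13-11-2013", "B"), (2, "13-11-2013", "C"),
     (2, "13-11-2013", "D"), (3, "21-11-2013", "E")],
)
conn.executemany(
    "INSERT INTO table2 (PersonID, Date, Property) VALUES (?, ?, ?)",
    [(4, "05-05-2013", "$"), (1, "20-08-2013", "^"), (2, "13-11-2013", "/"),
     (2, "13-11-2013", "*"), (2, "13-11-2013", "+"), (3, "21-11-2013", "&")],
)

# ID = PersonID-Date-n, where n is the row's position (by RowID)
# among the rows sharing the same PersonID and Date.
for table in ("table1", "table2"):
    conn.execute(
        f"""
        UPDATE {table}
        SET ID = (
            SELECT {table}.PersonID || '-' || {table}.Date || '-' || count(*)
            FROM {table} tB
            WHERE {table}.RowID >= tB.RowID
              AND {table}.PersonID = tB.PersonID
              AND {table}.Date = tB.Date
        )
        """
    )

rows = conn.execute(
    """
    SELECT t2.PersonID, t2.Date, t1.Object, t2.Property
    FROM table2 t2
    LEFT JOIN table1 t1 ON t2.ID = t1.ID
    ORDER BY t2.RowID
    """
).fetchall()
for row in rows:
    print(row)
```

With the sample data this yields six rows: person 4 has no Object (None), and the three rows for person 2 pair up as B with /, C with *, and D with +.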
