I found a good article on converting adjacency to nested sets at http://dataeducation.com/the-hidden-costs-of-insert-exec/
The SQL dialect used is Microsoft SQL Server (I think), and I am trying to convert the examples given in the article to SQLite (as this is what I have easy access to on my MacBook).
The problem I appear to be having is converting the part of the overall CTE query that deals with the Employee Rows:
EmployeeRows AS
(
SELECT
EmployeeLevels.*,
ROW_NUMBER() OVER (ORDER BY thePath) AS Row
FROM EmployeeLevels
)
I converted this to
EmployeeRows AS
(
SELECT
EmployeeLevels.*,
rowid AS Row
FROM EmployeeLevels
ORDER BY thePath
)
and the CTE query runs (no syntax errors) but the output I get is a table without the Row and Lft and Rgt columns populated
ProductName ProductID ParentProductID TreePath HLevel Row Lft Rgt
----------- ---------- --------------- ---------- ---------- ---------- ---------- ----------
Baby Goods 0 0 1
Baby Food 10 0 0.10 2
All Ages Ba 100 10 0.10.100 3
Strawberry 200 100 0.10.100.2 4
Baby Cereal 250 100 0.10.100.2 4
Beginners 150 10 0.10.150 3
Formula Mil 300 150 0.10.150.3 4
Heinz Formu 310 300 0.10.150.3 5
Nappies 20 0 0.20 2
Small Pack 400 20 0.20.400 3
Bulk Pack N 450 20 0.20.450 3
I think the start of the problem is the Row is not getting populated and therefore the Lft and Rgt columns do not get populated by the following parts of the query.
Are there any sqlite experts out there to tell me:
am I translating the rowid part of the query correctly
does sqlite support a rowid in a part of a CTE query
is there a better way? :)
Any help appreciated :)
am I translating the rowid part of the query correctly
No.
The SQL:
SELECT
EmployeeLevels.*,
rowid AS Row
FROM EmployeeLevels
ORDER BY thePath
defines Row as the rowid of EmployeeLevels in SQLite and ignores the ORDER BY clause when numbering. That is different from the intention of ROW_NUMBER() OVER (ORDER BY thePath) AS Row, which numbers the rows in thePath order.
does sqlite support a rowid in a part of a CTE query
Unfortunately no. I assume you mean this:
WITH foo AS (
SELECT * FROM bar ORDER BY col_a
)
SELECT rowid, *
FROM foo
but SQLite will report that there is no such column as rowid in foo.
is there a better way?
Not sure it is better, but at least it works. SQLite has temporary tables, which exist for as long as your connection stays open unless you drop them explicitly. Rewriting the SQL from my example above:
CREATE TEMP TABLE foo AS
SELECT * FROM bar ORDER BY col_a
;
SELECT rowid, *
FROM foo
;
DROP TABLE foo
;
This one will run without SQLite complaining.
update:
As of SQLite version 3.25.0, window functions are supported. Hence you can use a row_number() over (order by x) expression in your CTE if you happen to use a newer SQLite.
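A minimal sketch of the window-function version, run through Python's standard sqlite3 module. It assumes the bundled SQLite library is 3.25.0 or newer. The two-column EmployeeLevels table and the sample rows are made up for illustration, and the alias is RowNum rather than Row only to stay clear of keyword clashes.

```python
import sqlite3


def number_rows_by_path(rows):
    """Return (name, thePath, RowNum) tuples numbered in thePath order."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the EmployeeLevels CTE of the article.
    conn.execute("CREATE TABLE EmployeeLevels (name TEXT, thePath TEXT)")
    conn.executemany("INSERT INTO EmployeeLevels VALUES (?, ?)", rows)
    query = """
        WITH EmployeeRows AS (
            SELECT
                EmployeeLevels.*,
                ROW_NUMBER() OVER (ORDER BY thePath) AS RowNum
            FROM EmployeeLevels
        )
        SELECT name, thePath, RowNum
        FROM EmployeeRows
        ORDER BY RowNum
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Rows are inserted out of order on purpose: the numbering follows thePath,
# not the insertion order (which is what rowid would reflect).
sample = [("Nappies", "0.20"), ("Baby Goods", "0"), ("Baby Food", "0.10")]
numbered = number_rows_by_path(sample)
```

Because the numbering comes from the ORDER BY inside OVER (...), it does not depend on how the rows happen to be stored.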
There is an overwhelming chance that this might be an incredibly stupid question, so bear with me :)
Over the last couple of weeks I have been learning and implementing SQLite on some data for a project. I love the concept of keys, but there is one thing that I cannot wrap my head around.
How do you reference the foreign key when inserting a big dataset into the db? I'll give you an example:
I'm inserting, say, 300 rows of data, each row containing ("a","b","c","d","e","f","g"). Everything is going into the same table (original_table).
Now that I have my data in the db, I want to create another table (secondary_table) for the values "c". I then naturally want original_table to have a foreign key which links to secondary_table's primary key.
I understand that you can create a foreign key before inserting, and then replace "c" with the corresponding integer before you insert. This however seems very inefficient, as you would have to replace huge amounts of data before inserting.
So my question is: how can I have the foreign key replace the text in an already created table?
Cheers
So my question is how can I have the foreign key replace the text in
an already created table?
Yes and no.
That is, you can replace the values in column C with references to the secondary table (as has been done below, in addition to adding the suggested new column). BUT without dropping and recreating the table you CANNOT redefine the column's attributes, so you cannot give it a type affinity of INTEGER (not really an issue) or specify that it has the FOREIGN KEY constraint.
A mass update is probably not an issue for something like 300 rows (it is not even done within a transaction here).
How do you reference the foreign key when inserting a big dataset in
the db?
Here's the SQL for how you could do this. Instead of trying to rework column C in place, it adds a new column that effectively makes column C redundant; the new column has INTEGER type affinity and also has the FOREIGN KEY constraint applied.
300 rows is nothing, the example code uses 3000 rows, although column C only contains a short text value.
:-
-- Create the original table with column c having a finite number of values (0-25)
DROP TABLE IF EXISTS original_table;
CREATE TABLE IF NOT EXISTS original_table (A TEXT, B TEXT, C TEXT, D TEXT, E TEXT, F TEXT, G TEXT);
-- Load the original table with some data
WITH RECURSIVE counter(cola,colb,colc,cold,cole,colf,colg) AS (
SELECT random() % 26 AS cola, random() % 26 AS colb,abs(random() % 26) AS colc,random() % 26 AS cold,random() % 26 AS cole,random() % 26 AS colf,random() % 26 AS colg
UNION ALL
SELECT random() % 26 AS cola, random() % 26 AS colb,abs(random() % 26) AS colc,random() % 26 AS cold,random() % 26 AS cole,random() % 26 AS colf,random() % 26 AS colg
FROM counter LIMIT 3000
)
INSERT INTO original_table SELECT * FROM counter;
SELECT * FROM original_table ORDER BY C ASC; -- Query 1 the original original_table
-- Create the secondary table by extracting values from the C column of the original table
DROP TABLE IF EXISTS secondary_table;
CREATE TABLE IF NOT EXISTS secondary_table (id INTEGER PRIMARY KEY, c_value TEXT);
INSERT INTO secondary_table (c_value) SELECT DISTINCT C FROM original_table ORDER BY C ASC;
SELECT * FROM secondary_table; -- Query 2 the new secondary table
-- Add the new column as a Foreign key to reference the new secondary_table
ALTER TABLE original_table ADD COLUMN secondary_table_reference INTEGER REFERENCES secondary_table(id);
SELECT * FROM original_table; -- Query 3 the altered original_table but without any references
-- Update the original table to apply the references to the secondary_table
UPDATE original_table
SET secondary_table_reference = (SELECT id FROM secondary_table WHERE c_value = C)
-- >>>>>>>>>> NOTE USE ONLY 1 OR NONE OF THE FOLLOWING 2 LINES <<<<<<<<<<
, C = null -- OPTIONAL TO CLEAR COLUMN C
-- , C = (SELECT id FROM secondary_table WHERE c_value = C) -- ANOTHER OPTION SET C TO REFERENCE SECONDARY TABLE
;
SELECT * FROM original_table; -- Query 4 the final original table i.e. with references applied (column C now not needed)
Hopefully comments explain.
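The same steps can be tried on a tiny, deterministic data set. The following is a sketch using Python's standard sqlite3 module; the two-column original_table and the sample values are assumptions made for brevity. Note that SQLite only enforces foreign keys when PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3


def normalise_column_c(rows):
    """Move the distinct C values into secondary_table and link back to them."""
    conn = sqlite3.connect(":memory:")
    # Foreign key enforcement is off by default and is set per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE original_table (A TEXT, C TEXT)")
    conn.executemany("INSERT INTO original_table VALUES (?, ?)", rows)

    # Build the lookup table from the values already stored in column C.
    conn.execute(
        "CREATE TABLE secondary_table (id INTEGER PRIMARY KEY, c_value TEXT)"
    )
    conn.execute(
        "INSERT INTO secondary_table (c_value) "
        "SELECT DISTINCT C FROM original_table ORDER BY C ASC"
    )

    # Add the referencing column (it starts out NULL), then fill it in.
    conn.execute(
        "ALTER TABLE original_table ADD COLUMN secondary_table_reference "
        "INTEGER REFERENCES secondary_table(id)"
    )
    conn.execute(
        "UPDATE original_table SET secondary_table_reference = "
        "(SELECT id FROM secondary_table WHERE c_value = original_table.C)"
    )

    joined = conn.execute(
        "SELECT o.A, s.id, s.c_value "
        "FROM original_table AS o "
        "JOIN secondary_table AS s ON o.secondary_table_reference = s.id "
        "ORDER BY o.rowid"
    ).fetchall()
    conn.close()
    return joined


linked = normalise_column_c([("x", "red"), ("y", "blue"), ("z", "red")])
```

Here "blue" and "red" each get one row in secondary_table, and every original row points at the matching id.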
Results :-
Query 1 The original table without the secondary table :-
Query 2 The secondary table as generated from the original table :-
Query 3 The altered original_table without references applied :-
Query 4 The original table after application of references (applied to new column and old C column) :-
Timings (would obviously depend on numerous factors) :-
-- Create the original table with column c having a finite number of values (0-25)
DROP TABLE IF EXISTS original_table
> OK
> Time: 0.94s
CREATE TABLE IF NOT EXISTS original_table (A TEXT, B TEXT, C TEXT, D TEXT, E TEXT, F TEXT, G TEXT)
> OK
> Time: 0.353s
-- Load the original table with some data
WITH RECURSIVE counter(cola,colb,colc,cold,cole,colf,colg) AS (
SELECT random() % 26 AS cola, random() % 26 AS colb,abs(random() % 26) AS colc,random() % 26 AS cold,random() % 26 AS cole,random() % 26 AS colf,random() % 26 AS colg
UNION ALL
SELECT random() % 26 AS cola, random() % 26 AS colb,abs(random()) % 26 AS colc,random() % 26 AS cold,random() % 26 AS cole,random() % 26 AS colf,random() % 26 AS colg
FROM counter LIMIT 3000
)
INSERT INTO original_table SELECT * FROM counter
> Affected rows: 3000
> Time: 0.67s
SELECT * FROM original_table ORDER BY C ASC
> OK
> Time: 0.012s
-- Query 1 the original original_table
-- Create the secondary table by extracting values from the C column of the original table
DROP TABLE IF EXISTS secondary_table
> OK
> Time: 0.328s
CREATE TABLE IF NOT EXISTS secondary_table (id INTEGER PRIMARY KEY, c_value TEXT)
> OK
> Time: 0.317s
INSERT INTO secondary_table (c_value) SELECT DISTINCT C FROM original_table ORDER BY C ASC
> Affected rows: 26
> Time: 0.24s
SELECT * FROM secondary_table
> OK
> Time: 0s
-- Query 2 the new secondary table
-- Add the new column as a Foreign key to reference the new secondary_table
ALTER TABLE original_table ADD COLUMN secondary_table_reference INTEGER REFERENCES secondary_table(id)
> OK
> Time: 0.31s
SELECT * FROM original_table
> OK
> Time: 0.01s
-- Query 3 the altered original_table but without any references
-- Update the original table to apply the references to the secondary_table
UPDATE original_table
SET secondary_table_reference = (SELECT id FROM secondary_table WHERE c_value = C)
-- , C = null; -- OPTIONAL TO CLEAR COLUMN C
, C = (SELECT id FROM secondary_table WHERE c_value = C)
> Affected rows: 3000
> Time: 0.743s
SELECT * FROM original_table
> OK
> Time: 0.01s
-- Query 4 the final original table i.e. with references applied (column C now not needed)
> not an error
> Time: 0s
Supplementary Query
The following query utilises the combined tables :-
SELECT A,B,D,E,F,G, secondary_table.c_value FROM original_table JOIN secondary_table ON secondary_table_reference = secondary_table.id;
To result in :-
Note the data will not correlate with the previous results as this was run as a separate run and the data is generated randomly.
I have a database with "num" table like this
user_id | number | unix_time
-----------------------------
123 2 xxxxxxxx
123 40 xxxxxxxx
123 24 xxxxxxxx
333 23 xxxxxxxx
333 67 xxxxxxxx
854 90 xxxxxxxx
I'd like to select the last 5 numbers inserted by each user_id, but I can't figure out how to do it.
I tried:
SELECT b.n, a.user_id
FROM num a
JOIN num b on a.user_id = b.user_id
WHERE (
SELECT COUNT(*)
FROM num b2
WHERE b2.n <= b.n
AND b2.user_id = b.user_id
) <= 5
I am adapting the answer from (sql query - how to apply limit within group by).
I use "2" instead of "5" to make the effect visible within your sample data.
Note that I used actual dates instead of your "xxxxxxxx", assuming that most likely you mean "most recent 5" when you write "last 5" and that only works for actual times.
select * from toy a
where a.ROWID IN
( SELECT b.ROWID FROM toy b
WHERE b.user_id = a.user_id
ORDER by unix_time DESC
LIMIT 2
) ;
How is it done:
make on-the-fly tables (i.e. the part within ())
one for each user_id, WHERE b.user_id = a.user_id
order each on-the-fly table separately (that is the first trick),
by doing the ordering inside the ()
order chronologically backwards ORDER by unix_time DESC
limit to 5 (in the example 2) entries LIMIT 2
limit each on-the-fly table separately (that is the second trick),
by doing the limiting inside the ()
select everything from the actual table, select * from toy,
but only select from the actual table those lines which occur in the total of all on-the-fly tables,
where a.ROWID IN (
introduce the distinguishing alias "a" for the total view of the table,
toy a
introduce the distinguishing alias "b" for the single-user_id view of the table,
toy b
By the way, here is the dump of what I used for testing
(it is a convenient way of making most of an MCVE):
BEGIN TRANSACTION;
CREATE TABLE toy (user_id int, number int, unix_time date);
INSERT INTO toy VALUES(123,2,'1970-01-01 05:33:20');
INSERT INTO toy VALUES(123,40,'1970-01-01 06:56:40');
INSERT INTO toy VALUES(123,24,'1970-01-01 08:20:00');
INSERT INTO toy VALUES(333,23,'1970-01-01 11:06:40');
INSERT INTO toy VALUES(333,67,'1970-01-01 12:30:00');
INSERT INTO toy VALUES(854,90,'1970-01-01 13:53:20');
COMMIT;
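To check the query against this dump, here is a small sketch using Python's standard sqlite3 module. The function name latest_per_user and the bound LIMIT parameter are illustrative choices, not part of the original answer.

```python
import sqlite3

TOY_ROWS = [
    (123, 2, "1970-01-01 05:33:20"),
    (123, 40, "1970-01-01 06:56:40"),
    (123, 24, "1970-01-01 08:20:00"),
    (333, 23, "1970-01-01 11:06:40"),
    (333, 67, "1970-01-01 12:30:00"),
    (854, 90, "1970-01-01 13:53:20"),
]


def latest_per_user(rows, n):
    """Return (user_id, number) for the n most recent rows of each user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE toy (user_id INT, number INT, unix_time DATE)")
    conn.executemany("INSERT INTO toy VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT a.user_id, a.number
        FROM toy AS a
        WHERE a.rowid IN (
            -- correlated subquery: ordered and limited once per user_id
            SELECT b.rowid
            FROM toy AS b
            WHERE b.user_id = a.user_id
            ORDER BY b.unix_time DESC
            LIMIT ?
        )
        ORDER BY a.user_id, a.unix_time DESC
        """,
        (n,),
    ).fetchall()
    conn.close()
    return result


latest_two = latest_per_user(TOY_ROWS, 2)
```

With n = 2, user 123 loses its oldest entry (number 2), while users 333 and 854 keep all of their rows because they have at most two.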
If you only need the last 5 records of the whole table (not the last 5 per user_id), order by the time column and limit the result:
SELECT * FROM num ORDER BY unix_time DESC LIMIT 5;
This returns the n most recent rows overall; it does not group them by user.
The PROD_AMT I'd like to get is as follows: when ACCT_NBR, PROD_NBR and PROD_AMT are all the same, I only need one PROD_AMT, which is 100 (as with DISTINCT); when ACCT_NBR is the same but PROD_NBR is different, the PROD_AMT I need is the sum, 90 (30+60).
SELECT ACCT_NBR
,COUNT(DISTINCT CASE WHEN PROD_NBR = 1 THEN SUM(DISTINCT PROD_AMT)
WHEN PROD_NBR > 1 THEN SUM(PROD_AMT)
END) AS AMT
FROM TABLE
ACCT_NBR PROD_NBR PROD_AMT
3007 001 30
3007 002 60
1000 003 100
1000 003 100
There are probably a few ways to solve this. Using a subquery to determine which records should be summed and which ones should be treated as distinct, you could use:
SELECT
acct_nbr,
CASE WHEN sumflag = 'X' THEN SUM(prod_amt) ELSE MAX(prod_amt) END as amt
FROM
(
SELECT
acct_nbr,
prod_nbr,
prod_amt,
CASE WHEN COUNT(*) OVER (PARTITION BY Acct_nbr, prod_nbr, prod_amt) = 1 THEN 'X' ELSE NULL END AS sumflag
FROM
table
)t1
GROUP BY acct_nbr, sumflag
I'm just using MAX() here since it doesn't matter: all the values that will be aggregated with MAX() are known to be duplicates, so any of them will do.
You could get similar results with a UNION query where one query would do the summing in the event that the records are distinct, and the other would just return distinct prod_amt's where the records are duplicates.
While the above example is nice if you truly have different aggregation needs depending on complex logic, for your question there's a simpler way of doing the same thing that doesn't use window functions:
SELECT
acct_nbr,
sum(prod_amt) AS amt
FROM
(
SELECT DISTINCT
acct_nbr,
prod_nbr,
prod_amt
FROM
table
)t1
GROUP BY 1
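A quick sketch of the DISTINCT-then-SUM approach on the sample data from the question, using Python's standard sqlite3 module. The table name products is a placeholder, and including prod_nbr in the DISTINCT list is an assumption made here so that two different products with the same amount are not collapsed into one.

```python
import sqlite3

SAMPLE = [
    (3007, "001", 30),
    (3007, "002", 60),
    (1000, "003", 100),
    (1000, "003", 100),
]


def amount_per_account(rows):
    """Sum prod_amt per account after removing exact duplicate rows."""
    conn = sqlite3.connect(":memory:")
    # "products" is a hypothetical name standing in for the real table.
    conn.execute(
        "CREATE TABLE products (acct_nbr INT, prod_nbr TEXT, prod_amt INT)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT acct_nbr, SUM(prod_amt) AS amt
        FROM (
            -- duplicates (same account, product and amount) count once
            SELECT DISTINCT acct_nbr, prod_nbr, prod_amt
            FROM products
        ) AS t1
        GROUP BY acct_nbr
        ORDER BY acct_nbr
        """
    ).fetchall()
    conn.close()
    return result


totals = amount_per_account(SAMPLE)
```

Account 1000 yields 100 (the duplicate row is dropped) and account 3007 yields 90 (30 + 60).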
If you need to adapt this to a complex statement, you can put that statement in as a subquery in place of table above, like this:
SELECT
acct_nbr,
sum(prod_amt) AS amt
FROM
(
SELECT DISTINCT
acct_nbr,
prod_nbr,
prod_amt
FROM
(
YOUR REALLY COMPLEX QUERY GOES IN HERE
)t2
)t1
GROUP BY 1
I would like to determine particular IDs that are not present in a table.
For example, I have the IDs 1, 2 and 3 and want to know if they exist in the table.
Essentially this would boil down to:
SELECT id FROM (
SELECT 1 AS id
UNION
SELECT 2 AS id
UNION
SELECT 3 AS id
) AS wanted
WHERE
NOT EXISTS (SELECT * FROM table WHERE table.id = wanted.id)
Suppose table had the IDs 1 and 4, then this would yield 2 and 3.
Are there more elegant / concise / faster ways to get those IDs in SQLite ?
The compound SELECT operator EXCEPT allows you to do something similar to NOT EXISTS:
SELECT 1 AS id UNION ALL
SELECT 2 UNION ALL
SELECT 3
EXCEPT
SELECT id FROM MyTable
Beginning with SQLite 3.8.3, you can use VALUES everywhere you could use SELECT, but this is just a different syntax:
VALUES (1),
(2),
(3)
EXCEPT
SELECT id FROM MyTable
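As an illustration, the VALUES ... EXCEPT form can be built for an arbitrary list of candidate IDs with bound parameters. This is a sketch using Python's standard sqlite3 module; the helper name missing_ids and the single-column MyTable schema are assumptions.

```python
import sqlite3


def missing_ids(existing, wanted):
    """Return the IDs from `wanted` that are not present in MyTable."""
    if not wanted:
        return []  # an empty VALUES list would be a syntax error
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO MyTable (id) VALUES (?)", [(i,) for i in existing]
    )
    # One "(?)" row per candidate ID, e.g. VALUES (?), (?), (?)
    placeholders = ", ".join("(?)" for _ in wanted)
    query = f"VALUES {placeholders} EXCEPT SELECT id FROM MyTable"
    rows = conn.execute(query, list(wanted)).fetchall()
    conn.close()
    # Sort in Python rather than relying on the order EXCEPT happens to return.
    return sorted(row[0] for row in rows)


not_found = missing_ids(existing=[1, 4], wanted=[1, 2, 3])
```

With IDs 1 and 4 in the table, the candidates 1, 2 and 3 reduce to 2 and 3, matching the example in the question.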
I'm adding an 'index' column to a table in SQLite3 to allow the users to easily reorder the data, by renaming the old database and creating a new one in its place with the extra columns.
The problem I have is that I need to give each row a unique number in the 'index' column when I INSERT...SELECT the old values.
A search I did turned up a useful term in Oracle called ROWNUM, but SQLite3 doesn't have that. Is there something equivalent in SQLite?
You can use one of the special column names ROWID, OID or _ROWID_ to get the rowid of a row. See http://www.sqlite.org/lang_createtable.html#rowid for further details (and note that these names can be hidden by normal columns called ROWID and so on).
Many people here seem to mix up ROWNUM with ROWID. They are not the same concept, and Oracle has both.
ROWID is a unique ID of a database row. It's almost invariant (it can change during import/export, but it is the same across different SQL queries).
ROWNUM is a calculated field corresponding to the row number in the query result. It's always 1 for the first row, 2 for the second, and so on. It is absolutely not linked to any table row, and the same table row could have very different ROWNUMs depending on how it is queried.
SQLite has a ROWID but no ROWNUM. The only equivalent I found is the ROW_NUMBER() window function (see http://www.sqlitetutorial.net/sqlite-window-functions/sqlite-row_number/).
You can achieve what you want with a query like this:
insert into new
select *, row_number() over ()
from old;
No, SQLite doesn't have a direct equivalent to Oracle's ROWNUM.
If I understand your requirement correctly, you should be able to add a numbered column based on ordering of the old table this way:
create table old (col1, col2);
insert into old values
('d', 3),
('s', 3),
('d', 1),
('w', 45),
('b', 5465),
('w', 3),
('b', 23);
create table new (colPK INTEGER PRIMARY KEY AUTOINCREMENT, col1, col2);
insert into new select NULL, col1, col2 from old order by col1, col2;
The new table contains:
.headers on
.mode column
select * from new;
colPK col1 col2
---------- ---------- ----------
1 b 23
2 b 5465
3 d 1
4 d 3
5 s 3
6 w 3
7 w 45
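Inserting NULL into the INTEGER PRIMARY KEY column makes SQLite assign the next integer, so each additional row gets the previous row's value incremented by 1, in the order given by the ORDER BY. AUTOINCREMENT additionally guarantees that a value is never reused after a row has been deleted.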
I believe you want to use the LIMIT clause in SQLite.
SELECT * FROM TABLE can return thousands of records.
However, you can constrain this by adding the LIMIT keyword.
SELECT * FROM TABLE LIMIT 5;
This will return the first 5 records produced by your query, if that many are available.
Use this query to create a zero-based row number (0 ... count - 1), with ties on col1 broken by rowid:
SELECT (SELECT COUNT(*)
FROM Table_name AS t2
WHERE t2.col1 < t1.col1) + (SELECT COUNT(*)
FROM Table_name AS t3
WHERE t3.col1 = t1.col1 AND t3.rowid < t1.rowid) AS rowNum, * FROM Table_name AS t1 ORDER BY rowNum ASC
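For SQLite builds older than 3.25.0, which lack window functions, a correlated COUNT(*) can stand in for a row number. The sketch below, using Python's standard sqlite3 module, counts the rows that sort before the current one and uses rowid as a tie-breaker for equal col1 values. The tie-breaker and the sample data are assumptions made for illustration.

```python
import sqlite3


def zero_based_row_numbers(values):
    """Return (rowNum, col1) pairs numbered 0..count-1 in col1 order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table_name (col1 TEXT)")
    conn.executemany(
        "INSERT INTO Table_name (col1) VALUES (?)", [(v,) for v in values]
    )
    result = conn.execute(
        """
        SELECT
            -- rows that sort strictly before this one
            (SELECT COUNT(*) FROM Table_name AS t2
             WHERE t2.col1 < t1.col1)
            -- plus rows with the same col1 that were inserted earlier
          + (SELECT COUNT(*) FROM Table_name AS t3
             WHERE t3.col1 = t1.col1 AND t3.rowid < t1.rowid) AS rowNum,
            t1.col1
        FROM Table_name AS t1
        ORDER BY rowNum
        """
    ).fetchall()
    conn.close()
    return result


row_numbers = zero_based_row_numbers(["d", "b", "d", "a"])
```

Each row triggers two extra scans of the table, so this approach is best suited to small tables; on newer SQLite versions ROW_NUMBER() is the simpler choice.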