I am trying to understand what's the best way to get specific rows out of a table in one query.
Let's say my table columns are:
Name, age, country
Let's say it has 1000 rows. I then have an array of indexes that represent the rows:
[2,10,20,34,50,120,400, 410,444,810]
How can I get the corresponding names? So, [name at row 2, name at row 10, ....]
I saw something like this online:
SELECT * FROM `posts` WHERE `id` IN (5,6,7,8,9,10)
but, I don't have an id column...do I need to manipulate the table or DB before pulling something like that off?
The following would work (after a fashion) :-
SELECT * FROM `posts` WHERE rowid IN (5,6,7,8,9,10)
This works because the ubiquitous id column is usually an alias of the special (normally hidden) rowid column.
That is, a table that is not defined using WITHOUT ROWID will have a rowid column with automatically generated ids (unless the rowid is specified explicitly, or implicitly via a column that is an alias of the rowid column).
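To tie this back to the question's array of indexes, here is a minimal sketch using Python's standard sqlite3 module. The table name people and the generated names are made up for illustration. It assumes the rows were inserted in order into a fresh table and never deleted, which is the only case where a rowid happens to match a row's position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No id column is declared; the hidden rowid is still available.
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO people (name, age, country) VALUES (?, ?, ?)",
    [(f"person{i}", 20 + i % 50, "Nowhere") for i in range(1, 1001)],
)

wanted = [2, 10, 20, 34, 50, 120, 400, 410, 444, 810]

# Build one "?" placeholder per wanted rowid so the values are bound, not pasted in.
placeholders = ",".join("?" * len(wanted))
rows = conn.execute(
    f"SELECT rowid, name FROM people WHERE rowid IN ({placeholders}) ORDER BY rowid",
    wanted,
).fetchall()
names = [name for _, name in rows]
print(names)
```

If rows have been deleted or rowids were assigned explicitly, the rowid no longer equals the row's position, so this only works "after a fashion".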
An alias of the rowid column is defined by specifying ?? INTEGER PRIMARY KEY (where ?? represents a valid column name, often id). The keyword AUTOINCREMENT can complement INTEGER PRIMARY KEY; it adds a constraint that the generated id (i.e. the rowid) MUST increase, whilst without it unused lower numbers can be used (only relevant when the highest id is 9223372036854775807). With AUTOINCREMENT, should this highest id be reached, an SQLITE_FULL error will be encountered (without it, an attempt will be made to select a lower free number first).
The above is a summary of some of the points from SQLite Autoincrement.
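The difference AUTOINCREMENT makes at the highest id can be sketched as follows, again with Python's sqlite3 module. The table names without_ai and with_ai are hypothetical, and the sketch assumes that the Python binding surfaces SQLITE_FULL as a subclass of sqlite3.Error.

```python
import sqlite3

MAX_ROWID = 9223372036854775807  # largest possible rowid (2**63 - 1)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE without_ai (id INTEGER PRIMARY KEY, v TEXT)")
conn.execute("CREATE TABLE with_ai (id INTEGER PRIMARY KEY AUTOINCREMENT, v TEXT)")

# Force the highest possible id into both tables.
for table in ("without_ai", "with_ai"):
    conn.execute(f"INSERT INTO {table} (id, v) VALUES (?, 'top')", (MAX_ROWID,))

# Without AUTOINCREMENT, SQLite tries to find an unused lower id.
conn.execute("INSERT INTO without_ai (v) VALUES ('next')")
plain_ids = [r[0] for r in conn.execute("SELECT id FROM without_ai ORDER BY id")]

# With AUTOINCREMENT, the id must increase, so the insert is expected to fail.
try:
    conn.execute("INSERT INTO with_ai (v) VALUES ('next')")
    ai_error = None
except sqlite3.Error as exc:
    ai_error = exc

print(plain_ids, ai_error)
```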
Example Demonstration
Consider the following (the last two SELECT queries being the equivalent of the answer) :-
DROP TABLE IF EXISTS rowid_demo1;
DROP TABLE IF EXISTS rowid_demo2;
CREATE TABLE IF NOT EXISTS rowid_demo1 (name TEXT, age INTEGER, country TEXT);
CREATE TABLE IF NOT EXISTS rowid_demo2 (id_column INTEGER PRIMARY KEY, name TEXT, age INTEGER, country TEXT);
INSERT INTO rowid_demo1 (name,age,country) VALUES
('Fred',22,'England'),('Mary',18,'Scotland'),('Heather',19,'Wales');
INSERT INTO rowid_demo2 (name,age,country) VALUES
('Fred',22,'England'),('Mary',18,'Scotland'),('Heather',19,'Wales');
SELECT *,rowid as rowid_column FROM rowid_demo1;
SELECT *,rowid as rowid_column FROM rowid_demo2;
INSERT INTO rowid_demo1 (rowid,name,age,country) VALUES
(100,'George',21,'France');
INSERT INTO rowid_demo2 (rowid,name,age,country) VALUES
(100,'George',21,'France');
SELECT *,rowid as rowid_column FROM rowid_demo1;
SELECT *,rowid as rowid_column FROM rowid_demo2;
-- Ridiculous to do the following but ??????
INSERT INTO rowid_demo2 (rowid,id_column,name,age,country) VALUES
(500,501,'Beth',20,'Spain');
SELECT *,rowid as rowid_column FROM rowid_demo1;
SELECT *,rowid as rowid_column FROM rowid_demo2;
SELECT *,rowid AS rowid_column FROM rowid_demo1 WHERE rowid IN(2,3,100);
SELECT *,rowid AS rowid_column FROM rowid_demo2 WHERE rowid IN(2,3,100,501);
Explanation
This :-
creates 2 tables (identical except that rowid_demo2 has an alias of the rowid column) and loads some data
runs 2 SELECT queries (one for each table) to show the rows including the rowid column (results 1 and 2)
loads some more data, this time specifying values for the rowid
runs the same 2 queries (results 3 and 4)
adds a ridiculous row to the 2nd table (both the rowid and its alias are given values)
runs the same 2 queries again (results 5 and 6)
runs queries equivalent to the answer (results 7 and 8).
Demonstration results :-
Related
I.e., are the following two SQL statements equivalent in SQLite?
CREATE TABLE posts (
id INTEGER PRIMARY KEY
);
CREATE TABLE posts (
id INTEGER PRIMARY KEY ASC
);
Yes they are.
There is no need to specify ASC. Beware, though, that if you were to specify DESC then they are NOT equivalent (see 4 below), as id INTEGER PRIMARY KEY DESC is an exception to the column being an alias of the rowid column, as per :-
The exception mentioned above is that if the declaration of a column
with declared type "INTEGER" includes an "PRIMARY KEY DESC" clause, it
does not become an alias for the rowid and is not classified as an
integer primary key. This quirk is not by design. It is due to a bug
in early versions of SQLite. But fixing the bug could result in
backwards incompatibilities. Hence, the original behavior has been
retained (and documented) because odd behavior in a corner case is far
better than a compatibility break.
ROWIDs and the INTEGER PRIMARY KEY
You can use id INTEGER, PRIMARY KEY (id DESC), but the order still defaults to ASC when retrieving the column, as it is an alias of the rowid (see 5 below).
Perhaps consider the following :-
DROP TABLE IF EXISTS posts1;
CREATE TABLE posts1 (
id INTEGER PRIMARY KEY
);
DROP TABLE IF EXISTS posts2;
CREATE TABLE posts2 (
id INTEGER PRIMARY KEY ASC
);
DROP TABLE IF EXISTS posts3;
CREATE TABLE posts3 (
id INTEGER PRIMARY KEY DESC
);
DROP TABLE IF EXISTS posts4;
CREATE TABLE posts4 (
id INTEGER, PRIMARY KEY (id DESC)
);
INSERT INTO posts1 VALUES(null),(null),(null);
INSERT INTO posts2 VALUES(null),(null),(null);
INSERT INTO posts3 VALUES(null),(null),(null);
INSERT INTO posts4 VALUES(null),(null),(null);
SELECT * FROM sqlite_master WHERE name LIKE '%posts%';
SELECT * FROM posts1;
SELECT * FROM posts2;
SELECT * FROM posts3;
SELECT * FROM posts4;
Results
1
The query SELECT * FROM sqlite_master WHERE name LIKE '%posts%'; results in :-
As you can see posts3 is significantly different as the index sqlite_autoindex_posts3_1 has been created
The others do not have a specific index created as the id column is an alias of the rowid column
The data for rowid tables is stored as a B-Tree structure containing
one entry for each table row, using the rowid value as the key. This
means that retrieving or sorting records by rowid is fast. Searching
for a record with a specific rowid, or for all records with rowids
within a specified range is around twice as fast as a similar search
made by specifying any other PRIMARY KEY or indexed value.
ROWIDs and the INTEGER PRIMARY KEY
2
The query SELECT * FROM posts1; results in :-
3
The query SELECT * FROM posts2;, confirms the initial YES answer as per :-
4
The query SELECT * FROM posts3;, may be a little confusing, but shows that id INTEGER PRIMARY KEY DESC does not result in an alias of the rowid and in the case of no value or null being inserted into the column, the value is null rather than an auto generated value. There is no UNIQUE constraint conflict (as nulls are considered as being different values).
5
The query SELECT * FROM posts4; produces the same result as for posts1 and posts2 (results 2 and 3) even though id INTEGER, PRIMARY KEY (id DESC) was used. This confirms that even if DESC is applied in the table-level PRIMARY KEY clause, the sort order still defaults to ASC (unless an ORDER BY clause is used).
Note that this peculiarity is specific to the rowid column or an alias thereof.
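The behaviour described in results 1 to 5 can be checked with a small Python sketch using the standard sqlite3 module. It reuses the table names from the example above (posts2 is left out because it behaves like posts1); the variable names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts1 (id INTEGER PRIMARY KEY);
    CREATE TABLE posts3 (id INTEGER PRIMARY KEY DESC);
    CREATE TABLE posts4 (id INTEGER, PRIMARY KEY (id DESC));
    INSERT INTO posts1 VALUES (NULL), (NULL), (NULL);
    INSERT INTO posts3 VALUES (NULL), (NULL), (NULL);
    INSERT INTO posts4 VALUES (NULL), (NULL), (NULL);
""")

def ids(table):
    """Return the id values of a table in the order SQLite yields them."""
    return [row[0] for row in conn.execute(f"SELECT id FROM {table}")]

results = {t: ids(t) for t in ("posts1", "posts3", "posts4")}

# Only a table whose primary key is NOT a rowid alias needs an automatic index.
auto_indexes = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]

print(results)
print(auto_indexes)
```

posts1 and posts4 get generated ids because id is a rowid alias, while posts3 stores the NULLs as given and is the only table with an automatic index.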
See both https://www.sqlite.org/lang_createtable.html#rowid and https://www.sqlite.org/lang_createindex.html for a more complete answer. Shawn's link is specific to INTEGER PRIMARY KEY, which matches the example code; the more general question is not answered explicitly in either location, but can be deduced by reading both.
Under SQL Data Constraints, the first link says
In most cases, UNIQUE and PRIMARY KEY constraints are implemented by creating a unique index in the database. (The exceptions are INTEGER PRIMARY KEY and PRIMARY KEYs on WITHOUT ROWID tables.)
The CREATE INDEX page explains that originally the sort order was ignored and all indices were generated in ascending order. Only as of version 3.3.0 is the DESC order "understood". Even that description is somewhat vague, but taken altogether it is apparent that ASC is the default.
I am learning SQLite and constructed a line which I thought would delete dups but it deletes all rows instead.
DELETE from tablename WHERE rowid not in (SELECT distinct(timestamp) from tablename);
I expected this to delete rows with a duplicate (leaving one). I know I can simply create a new table with the distinct rows, but why does what I have done not work? Thanks
If timestamp is a column in the table and this is what you want to compare in order to delete duplicates, then do this:
delete from tablename
where exists (
select 1 from tablename t
where t.rowid < tablename.rowid and t.timestamp = tablename.timestamp
)
With recent versions of sqlite, the following is an alternative:
DELETE FROM tablename
WHERE rowid IN (SELECT rowid
FROM (SELECT rowid, row_number() OVER (PARTITION BY timestamp) AS rownum
FROM tablename)
WHERE rownum >= 2);
why does what I have done not work?
Consider the WHERE condition:
rowid not in (SELECT distinct(timestamp) from tablename)
The simple answer is that you are not comparing data in the same columns, nor are they columns with the same type of data. rowid is an automatically-incremented integer column and I assume that the timestamp column is either a numeric or string column containing time values, or perhaps custom-generated sequential numeric values. Because rowid likely never matches a value in timestamp, the NOT IN operation will always return true, and so every row of the table is deleted.
SQL is rather explicit and so there are no hidden/mysterious column comparisons. It will not automatically compare the rowid's from one query with another. Notice that the various alternative statements do something to distinguish rows with duplicate key values (timestamp in your case), either by direct comparison between main query and subquery, or using windowing functions to uniquely label rows with duplicate values, etc.
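A small Python sketch (standard sqlite3 module) makes the mismatch visible. The sample rows and the payload column are made up, and timestamp is assumed to hold text values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (timestamp TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [
        ("2020-01-01 10:00", "a"),
        ("2020-01-01 10:00", "b"),  # duplicate timestamp
        ("2020-01-02 11:00", "c"),
    ],
)

# The two value sets the original statement compares against each other.
rowids = [r[0] for r in conn.execute("SELECT rowid FROM tablename")]
distinct_ts = [r[0] for r in conn.execute("SELECT DISTINCT timestamp FROM tablename")]
print(rowids)       # small integers
print(distinct_ts)  # timestamp strings, none of which equals a rowid

# The original statement: no rowid is ever found in the list, so every row goes.
conn.execute(
    "DELETE FROM tablename WHERE rowid NOT IN (SELECT DISTINCT timestamp FROM tablename)"
)
remaining = conn.execute("SELECT COUNT(*) FROM tablename").fetchone()[0]
print(remaining)
```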
Just for kicks, here's another alternative that uses NOT IN like your original code.
DELETE FROM tablename
WHERE rowid NOT IN (
SELECT max(t.rowid) FROM tablename t
GROUP BY t.timestamp )
First notice that this is comparing rowid with max(t.rowid), values which derive from the same column.
Because the subquery groups on t.timestamp, the aggregate function max() will return the greatest/last t.rowid separately for each set of rows with the same t.timestamp value. The resultant list will exclude t.rowid values that are less than the maximum. Thus, the NOT IN operation will not find those lesser values and will return true so they will be deleted.
It also uses basic SQL (no window functions... the OVER keyword). It will likely be more efficient than the alternative that references the outer query from the subquery, because this statement can execute the subquery just once and then use an efficient index to match individual records... it doesn't need to rerun the query for each row. For that matter, it should also be more efficient than the windowing function, because the window partition essentially "groups" on the partitioned columns, but must then execute the windowing function for each row, an extra step not present in the basic aggregate query. Efficiency is not always critical, but something important to consider.
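To see which rows each statement keeps, here is a sketch in Python (standard sqlite3 module) that runs the EXISTS version and the NOT IN version on identical, made-up data. The helper name survivors is hypothetical.

```python
import sqlite3

# (timestamp, payload) rows; rowids will be 1..6 in this order.
ROWS = [("t1", "a"), ("t1", "b"), ("t2", "c"), ("t1", "d"), ("t3", "e"), ("t3", "f")]

def survivors(delete_sql):
    """Load the sample rows into a fresh table, run the DELETE, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (timestamp TEXT, payload TEXT)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?)", ROWS)
    conn.execute(delete_sql)
    left = conn.execute(
        "SELECT rowid, timestamp FROM tablename ORDER BY rowid"
    ).fetchall()
    conn.close()
    return left

# Correlated EXISTS: a row is deleted if an earlier row has the same timestamp.
keep_first = survivors("""
    DELETE FROM tablename
    WHERE EXISTS (SELECT 1 FROM tablename t
                  WHERE t.rowid < tablename.rowid
                    AND t.timestamp = tablename.timestamp)
""")

# NOT IN with max(rowid): only the last row of each timestamp group survives.
keep_last = survivors("""
    DELETE FROM tablename
    WHERE rowid NOT IN (SELECT max(t.rowid) FROM tablename t GROUP BY t.timestamp)
""")

print(keep_first)
print(keep_last)
```

Both leave exactly one row per timestamp; they differ only in whether the lowest or the highest rowid of each group is the one that remains.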
By the way, the distinct keyword is not a function and does not need/accept parentheses. It is a directive that applies to the entire select statement. The subquery is being interpreted as
SELECT DISTINCT (timestamp) FROM tablename
where DISTINCT is interpreted in isolation and the parentheses are interpreted as a separate expression.
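A quick way to convince yourself of this is to add a second column, as in the following Python sketch (standard sqlite3 module; the table t and its columns a and b are made up). If DISTINCT(a) were a function applied to a alone, only one row could come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (1, 20), (1, 10)])

# The parentheses around a are just a parenthesised expression;
# DISTINCT still applies to the whole (a, b) row.
with_parens = conn.execute("SELECT DISTINCT (a), b FROM t ORDER BY b").fetchall()
without_parens = conn.execute("SELECT DISTINCT a, b FROM t ORDER BY b").fetchall()

print(with_parens)
print(without_parens)
```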
Update
These two queries will return the same data:
SELECT DISTINCT timestamp FROM tablename;
SELECT timestamp FROM tablename GROUP BY timestamp;
Both results eliminate duplicate rows from the output by showing only unique/distinct values, but neither has a "handle" (other data column) which indicates which rows to keep and which rows to eliminate. In other words, these queries return distinct values, but the results lose all relationship to the source rows and so have no use in specifying which source rows to delete (or keep). To understand better, you should run subqueries separately to inspect what they return so that you can understand and verify what data you're working with.
To make those queries useful, we need to do something to distinguish rows with duplicate key values. The rows need a "handle"--some other key value to select for either deleting or keeping those rows. Try this...
SELECT DISTINCT rowid, timestamp FROM tablename;
But that won't work, because it applies the DISTINCT keyword to ALL returned columns, but since rowid is already unique it will necessarily output each row separately and so there is no use to the query.
SELECT max(rowid), timestamp FROM tablename GROUP BY timestamp;
That query preserves the unique grouping, but provides just one rowid per timestamp as the "handle" to include/exclude for deletion.
Try this:
DELETE liens from liens where
id in
( SELECT * FROM (SELECT min(id) FROM liens group by lkey having count(*) > 1 ) AS c)
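Each run removes only one row (the one with the lowest id) from every group of duplicates, so run it repeatedly until no rows are affected.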
I have a bank transfer page that contains a column "Transfer Number" in my ASP.NET application. This is an auto-generated column. I am taking this value based on the query:
IF (SELECT COUNT(1) FROM tableName) = 0
SELECT IDENT_CURRENT('tableName')
ELSE
SELECT IDENT_CURRENT('tableName') + 1
The problem is that when two users log in, the page shows the same transfer number to both. How can I show a different transfer number to each user before inserting a record?
Thanks.
Set the column as an identity column and the values will be generated automatically; after the insert, you can get the inserted id from @@IDENTITY. Like this:
Table Design
CREATE TABLE YourTable
(
SeqNo INT IDENTITY(1,1) PRIMARY KEY,
Col1 VARCHAR(50),
Col2 INT,
...
)
I have set the column SeqNo as Identity and Primary key.
Identity(1,1) means the first value will be 1 and then for each row the increment will be +1 and so on.
Now I insert a record to the table
INSERT INTO YourTable(Col1,Col2)
values('abc',1)
SELECT @@IDENTITY
Now after the insert, the 2nd select will return me the value 1 as the value of the identity field is 1.
If I run this one more time, I will get 2 and so on
You can also call the system-defined function SCOPE_IDENTITY() (in case triggers are involved) instead of @@IDENTITY
You can avoid conflicts using this since the values are generated by the database itself
I have an SQLite database where I need to insert spatial information along with metadata into an R*tree and an accompanying regular table. Each entry needs to be uniquely defined for the lifetime of the database. Therefore the regular table has an INTEGER PRIMARY KEY AUTOINCREMENT column and my plan was to start with the insert into this table, extract the last inserted rowids and use these for the insert into the R*tree. Alas this doesn't seem possible:
>testCon <- dbConnect(RSQLite::SQLite(), ":memory:")
>dbGetQuery(testCon, 'CREATE TABLE testTable (x INTEGER PRIMARY KEY, y INTEGER)')
>dbGetQuery(testCon, 'INSERT INTO testTable (y) VALUES ($y)', bind.data=data.frame(y=1:5))
>dbGetQuery(testCon, 'SELECT last_insert_rowid() FROM testTable')
last_insert_rowid()
1 5
2 5
3 5
4 5
5 5
Only the last inserted rowid seems to be kept (probably for performance reasons). As the number of records to be inserted is hundreds of thousands, it is not feasible to do the insert line by line.
So the question is: Is there any way to make the last_insert_rowid() bend to my will? And if not, what is the best failsafe alternative? Some possibilities:
Record highest rowid before insert and 'SELECT rowid FROM testTable WHERE rowid > prevRowid'
Get the number of rows to insert, fetch the last_insert_rowid() and use seq(to=lastRowid, length.out=nInserts)
While the two suggestions above should intuitively work, I don't feel confident enough in SQLite to know if they are failsafe.
The algorithm for generating autoincrementing IDs is documented.
For an INTEGER PRIMARY KEY column, you can simply get the current maximum value:
SELECT IFNULL(MAX(x), 0) FROM testTable
and then use the next values.
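As a rough sketch of that idea, here it is in Python's standard sqlite3 module rather than RSQLite, so the calls differ from the question's R code. It assumes this connection is the only writer while the batch runs; the pre-existing row and the variable names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testTable (x INTEGER PRIMARY KEY, y INTEGER)")
conn.execute("INSERT INTO testTable (y) VALUES (100)")  # a pre-existing row
conn.commit()

new_values = [(y,) for y in range(1, 6)]

with conn:  # commits if the block succeeds, rolls back on an exception
    # Remember the highest key before the batch (0 if the table is empty).
    prev_max = conn.execute("SELECT IFNULL(MAX(x), 0) FROM testTable").fetchone()[0]
    conn.executemany("INSERT INTO testTable (y) VALUES (?)", new_values)
    # Every key above the remembered maximum belongs to the batch just inserted.
    new_ids = [
        row[0]
        for row in conn.execute(
            "SELECT x FROM testTable WHERE x > ? ORDER BY x", (prev_max,)
        )
    ]

print(prev_max, new_ids)
```

The collected keys can then be used for the matching R*tree inserts. If other connections may write at the same time, this is not safe without additional locking.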
I have a sqlite statement that will only insert one row.
INSERT INTO queue (TransKey, CreateDateTime, Transmitted)
VALUES (
(SELECT Id from trans WHERE Id != (SELECT TransKey from queue)),
'2013-12-19T19:47:33',
0
)
How would I have it insert every row where Id from trans != (SELECT TransKey from queue) in one statement?
INSERT INTO queue (TransKey, CreateDateTime, Transmitted)
SELECT Id, '2013-12-19T19:47:33', 0
FROM trans WHERE Id != (SELECT TransKey from queue)
There are two different "flavors" of INSERT. The one you're using (VALUES) inserts one or more rows that you "create" in the INSERT statement itself. The other flavor (SELECT) inserts a variable number of rows that are retrieved from one or more other tables in the database.
While it's not immediately obvious, the SELECT version allows you to include expressions and simple constants -- as long as the number of columns lines up with the number of columns you're inserting, the statement will work (in other databases, the types of the values must match the column types as well).
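One caveat, shown here as a hedged variant rather than as the answer's own method: a scalar subquery such as (SELECT TransKey from queue) is compared using only the first row it returns. If the goal is "every Id that is not yet in queue", NOT IN is the usual way to write it. The sketch below uses Python's standard sqlite3 module with made-up sample data and assumed column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trans (Id INTEGER PRIMARY KEY);
    CREATE TABLE queue (TransKey INTEGER, CreateDateTime TEXT, Transmitted INTEGER);
    INSERT INTO trans (Id) VALUES (1), (2), (3), (4);
    INSERT INTO queue VALUES (1, '2013-12-18T08:00:00', 1),
                             (3, '2013-12-18T09:00:00', 1);
""")

# INSERT ... SELECT adds one queue row per trans row that is not queued yet.
conn.execute("""
    INSERT INTO queue (TransKey, CreateDateTime, Transmitted)
    SELECT Id, '2013-12-19T19:47:33', 0
    FROM trans
    WHERE Id NOT IN (SELECT TransKey FROM queue)
""")

queued = [r[0] for r in conn.execute("SELECT TransKey FROM queue ORDER BY TransKey")]
newly_queued = [
    r[0]
    for r in conn.execute(
        "SELECT TransKey FROM queue WHERE Transmitted = 0 ORDER BY TransKey"
    )
]
print(queued, newly_queued)
```

Note that NOT IN matches nothing if the subquery returns a NULL TransKey, so this variant assumes TransKey is never NULL.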