Date difference between two separate rows in SQLite with no ID

I have data in SQLite like this (a few thousand rows):
1536074432|startRecording
1536074434|stopRecording
1536074443|startRecording
1536074447|stopRecording
1536074458|startRecording
1536074462|stopRecording
And I'd like to get the number of seconds that passed between two consecutive distinct events (basically, how many seconds of video I've recorded).
I know about a similar question (Date Difference between consecutive rows), but my case is different because I cannot get the "next" row by ID; I have to get it based on a different event name.
There is an answer that works magic, but it's specific to SQL Server (Query to find the time difference between successive events), and I need this for SQLite.
I could do this in Oracle with the LAG / LEAD functions, but I have no idea how to do it in SQLite.
I could also do this with a separate parsing script, but I think it would be more efficient to be able to do this directly from a query.

Even though there is no ID in the table, SQLite stores a rowid (from the SQLite CREATE TABLE documentation):
ROWIDs and the INTEGER PRIMARY KEY
Except for WITHOUT ROWID tables, all rows within SQLite tables have a 64-bit signed integer key that uniquely identifies the row within its table. This integer is usually called the "rowid". The rowid value can be accessed using one of the special case-independent names "rowid", "oid", or "_rowid_" in place of a column name. If a table contains a user defined column named "rowid", "oid" or "_rowid_", then that name always refers to the explicitly declared column and cannot be used to retrieve the integer rowid value.
Assuming perfectly clean data as described :) how about:
select a.rowid, a.time, a.event, b.rowid, b.time, b.event, b.time - a.time as elapsed --, sum(b.time - a.time)
from t2 a, t2 b
where a.rowid % 2 = 1
  and b.rowid = a.rowid + 1
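Since version 3.25, SQLite also has window functions, so a LAG-based variant is possible. A minimal sketch, assuming the same table and column names as above (t2 with time and event) and strictly alternating start/stop rows:
-- Pair each stopRecording row with the row immediately before it
-- (assumed to be the matching startRecording) and sum the gaps.
SELECT sum(time - prev_time) AS total_seconds
FROM (
  SELECT time, event,
         lag(time)  OVER (ORDER BY time) AS prev_time,
         lag(event) OVER (ORDER BY time) AS prev_event
  FROM t2
)
WHERE event = 'stopRecording' AND prev_event = 'startRecording';
This avoids relying on rowids being consecutive, at the cost of requiring a recent SQLite.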

Related

Deleting duplicate rows

I am learning SQLite and constructed a line which I thought would delete dups but it deletes all rows instead.
DELETE from tablename WHERE rowid not in (SELECT distinct(timestamp) from tablename);
I expected this to delete rows with a duplicate (leaving one). I know I can simply create a new table with the distinct rows, but why does what I have done not work? Thanks
If timestamp is a column in the table and it is what you want to compare in order to delete duplicates, then do this:
delete from tablename
where exists (
  select 1
  from tablename t
  where t.rowid < tablename.rowid
    and t.timestamp = tablename.timestamp
)
With recent versions of SQLite (3.25 or later, which added window functions), the following is an alternative:
DELETE FROM tablename
WHERE rowid IN (
  SELECT rowid
  FROM (SELECT rowid, row_number() OVER (PARTITION BY timestamp) AS rownum
        FROM tablename)
  WHERE rownum >= 2
);
why does what I have done not work?
Consider the WHERE condition:
rowid not in (SELECT distinct(timestamp) from tablename)
The simple answer is that you are not comparing data from the same columns, nor even columns with the same kind of data. rowid is an automatically assigned integer key, while I assume the timestamp column is either a numeric or string column containing time values, or perhaps custom-generated sequential numeric values. Because a rowid is very unlikely to ever match a value in timestamp, the NOT IN operation returns true for every row, so every row in the table is deleted.
SQL is rather explicit, so there are no hidden or mysterious column comparisons; it will not automatically compare the rowids from one query with another. Notice that the various alternative statements all do something to distinguish rows with duplicate key values (timestamp in your case), either by a direct comparison between the main query and the subquery, or by using window functions to uniquely label rows with duplicate values, etc.
Just for kicks, here's another alternative that uses NOT IN like your original code.
DELETE FROM tablename
WHERE rowid NOT IN (
  SELECT max(t.rowid)
  FROM tablename t
  GROUP BY t.timestamp
);
First notice that this is comparing rowid with max(t.rowid), values which derive from the same column.
Because the subquery groups on t.timestamp, the aggregate function max() will return the greatest/last t.rowid separately for each set of rows with the same t.timestamp value. The resultant list will exclude t.rowid values that are less than the maximum. Thus, the NOT IN operation will not find those lesser values and will return true so they will be deleted.
It also uses only basic SQL (no window functions, i.e., no OVER keyword). It will likely be more efficient than the alternative that references the outer query from the subquery, because this statement can execute the subquery just once and then use an index to match individual records; it doesn't need to rerun the subquery for each row. For that matter, it should also be more efficient than the window-function version, because the window partition essentially "groups" on the partitioned columns but must then execute the windowing function for each row, an extra step not present in the basic aggregate query. Efficiency is not always critical, but it is something worth considering.
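For example, a minimal sketch of such an index, assuming the timestamp column from the question (the index name is made up):
-- Lets the GROUP BY t.timestamp / max(t.rowid) subquery be driven by an
-- index scan instead of a full table scan.
CREATE INDEX IF NOT EXISTS idx_tablename_timestamp ON tablename(timestamp);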
By the way, the distinct keyword is not a function and does not need/accept parentheses. It is a directive that applies to the entire SELECT statement. The subquery is being interpreted as
SELECT DISTINCT (timestamp) FROM tablename
where DISTINCT is interpreted on its own and the parentheses are interpreted as a separate expression.
Update
These two queries will return the same data:
SELECT DISTINCT timestamp FROM tablename;
SELECT timestamp FROM tablename GROUP BY timestamp;
Both results eliminate duplicate rows from the output by showing only unique/distinct values, but neither has a "handle" (another data column) that indicates which rows to keep and which to eliminate. In other words, these queries return distinct values, but the results lose all relationship to the source rows and so are of no use in specifying which source rows to delete (or keep). To understand this better, run the subqueries separately and inspect what they return, so that you can verify what data you're working with.
To make those queries useful, we need to do something to distinguish rows with duplicate key values. The rows need a "handle"--some other key value to select for either deleting or keeping those rows. Try this...
SELECT DISTINCT rowid, timestamp FROM tablename;
But that won't work: DISTINCT applies to ALL returned columns, and since rowid is already unique, every row is output separately, so the query is of no use.
SELECT max(rowid), timestamp FROM tablename GROUP BY timestamp;
That query preserves the unique grouping, but provides just one rowid per timestamp as the "handle" to include/exclude for deletion.
Try this (adapting the table and column names to your schema):
DELETE FROM liens
WHERE id IN (
  SELECT min(id)
  FROM liens
  GROUP BY lkey
  HAVING count(*) > 1
);
Each pass removes only one duplicate per group, so run it repeatedly until no more rows are deleted.

Make select query return in order of arguments

I have a relatively simple select query which asks for rows by a column value (this is not controlled by me). I pass in a variable list of id values to be returned. Here's an example:
select * from team where id in (2, 1, 3)
I'm noticing that as the database changes its order over time, my results are changing order as well. Is there a way to make SQLite guarantee results in the same order as the arguments?
If you could have so many IDs that the query becomes unwieldy, use a temporary table to store them:
CREATE TEMPORARY TABLE SearchIDs (
ID,
OrderNr INTEGER PRIMARY KEY
);
(The OrderNr column is autoincrementing so that it automatically gets proper values when you insert values.)
To do the search, you have to fill this table:
INSERT INTO SearchIDs(ID) VALUES (2), (1), (3) ... ;
SELECT Team.*
FROM Team
JOIN SearchIDs USING (ID)
ORDER BY SearchIDs.OrderNr;
DELETE FROM SearchIDs;
Try this!
select * from team
where id in (2, 1, 3)
order by case id
  when 2 then 0
  when 1 then 1
  when 3 then 2
end
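If you would rather not maintain a temporary table and your SQLite supports common table expressions (3.8.3 or later), an inline alternative is to join against a VALUES list that carries the desired position; a sketch, where ids and ord are made-up names:
WITH ids(id, ord) AS (
  VALUES (2, 1), (1, 2), (3, 3)  -- (id, desired position)
)
SELECT team.*
FROM team
JOIN ids USING (id)
ORDER BY ids.ord;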

Insert or ignore every column

I have a problem with a sqlite command.
I have a table with three columns: Id, user, number.
The id auto-increments. Now if I put a user and a number into my list, my app should check whether such a user with this number already exists. The problem is, if I use a standard "insert or ignore" command, the Id column is not fixed, so I get a new entry every time.
So is it possible to compare just two of the three columns to see if they are equal?
Or do I have to use a temporary table in which only those two columns exist?
The INSERT OR IGNORE statement ignores the new record if it would violate a UNIQUE constraint.
Such a constraint is created implicitly for the PRIMARY KEY, but you can also create one explicitly for any other columns:
CREATE TABLE MyTable (
  ID integer PRIMARY KEY,
  User text,
  Number number,
  UNIQUE (User, Number)
);
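With that table-level UNIQUE constraint in place, a quick sketch of the insert (the values are made up):
-- Running the same statement a second time is silently ignored because
-- the (User, Number) pair already exists; the existing row is untouched.
INSERT OR IGNORE INTO MyTable (User, Number) VALUES ('alice', 42);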
You shouldn't use INSERT OR IGNORE unless you are specifying the key, which you aren't, and in my opinion never should when your key is an identity (auto-number) column.
Based on User and Number making a record in your table unique, you don't need the id column and your primary key should be user,number.
If for some reason you don't want to do that (bearing in mind that in that case you are saying User, Number is not your uniqueness constraint), then something like
if not exists(Select 1 From MyTable Where user = 10 and Number = 15)
Insert MyTable(user,number) Values(10,15)
would do the job. I'm not an SQLite guy, so you might have to fiddle with the syntax and escape/quote your column names.
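For a plain-SQLite version of that guard, the same idea can be written as a conditional INSERT ... SELECT; a sketch, assuming the MyTable schema above and the example values 10 and 15:
INSERT INTO MyTable (User, Number)
SELECT 10, 15
WHERE NOT EXISTS (
  SELECT 1 FROM MyTable WHERE User = 10 AND Number = 15
);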

SQLite - Selecting not indexed column in GROUP BY

I have a situation similar to the question below.
Mysql speed up max() group by
SELECT MAX(id) id, cid FROM table GROUP BY cid
To optimize the above query (shown in that question), creating an index on (cid, id) does the trick.
However, when I add a non-indexed column to the SELECT, the query slows down drastically.
For example,
SELECT MAX(id) id, cid, newcolumn FROM table GROUP BY cid
If I create an index on (cid, id, newcolumn), the query time drops back to minimal. It seems I have to index all the columns I select when using GROUP BY.
Is there any way other than indexing all the columns to be selected?
When all the columns used in the query are part of the index (which is then called a covering index), SQLite can get all values from the index and does not need to access the table itself.
When adding a column that is not indexed, each record must be looked up in both the index and the table.
Furthermore, the order of the records in the table is unlikely to be the same as the order in the index, so the table's pages are not read in order, and are read multiple times, which means that caching will not work as well.
The newcolumn values must be read from either the table or an index; there is no other mechanism to store data.
tl;dr: no
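That said, for reference, here is a sketch of the covering index the question describes (the table name is quoted because "table" is a keyword; the index name is made up):
-- With all three referenced columns in the index, SQLite can answer the
-- query from the index alone (a covering index) and skip the table lookups.
CREATE INDEX idx_cid_id_newcolumn ON "table"(cid, id, newcolumn);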

Speed up SQL select in SQLite

I'm making a large database that, for the sake of this question, let's say, contains 3 tables:
A. Table "Employees" with fields:
id = INTEGER PRIMARY KEY AUTOINCREMENT
Others don't matter
B. Table "Job_Sites" with fields:
id = INTEGER PRIMARY KEY AUTOINCREMENT
Others don't matter
C. Table "Workdays" with fields:
id = INTEGER PRIMARY KEY AUTOINCREMENT
emp_id = is a foreign key to Employees(id)
job_id = is a foreign key to Job_Sites(id)
datew = INTEGER that stands for the actual workday, represented by a Unix date in seconds since midnight of Jan 1, 1970
The most common operation in this database is to display workdays for a specific employee. I perform the following select statement:
SELECT * FROM Workdays WHERE emp_id='Actual Employee ID' AND job_id='Actual Job Site ID' AND datew>=D1 AND datew<D2
I need to point out that D1 and D2 are calculated as the beginning of the month being searched and the beginning of the next month, respectively.
I actually have two questions:
Should I index any fields besides the primary keys? (Sorry, I seem to misunderstand the whole indexing concept.)
Is there any way to re-write the Select statement to maybe speed it up. For instance, most of the checks in it would be to see that the actual employee ID and job site ID match. Maybe there's a way to split it up?
PS. Forgot to say, I use SQLite in a Windows C++ application.
If you use the above query often, then you may get better performance by creating a multicolumn index containing the columns in the query:
CREATE INDEX WorkdaysLookupIndex ON Workdays (emp_id, job_id, datew);
Sometimes you just have to create the index and try your queries to see what is faster.
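One way to check what the planner actually chooses is EXPLAIN QUERY PLAN; a sketch using the index above, with placeholder values for the IDs and the date range:
EXPLAIN QUERY PLAN
SELECT * FROM Workdays
WHERE emp_id = 1 AND job_id = 2 AND datew >= 1535760000 AND datew < 1538352000;
-- The output should mention WorkdaysLookupIndex rather than a full table scan.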