I'm writing an SQLite SELECT statement and want to pick out only the first hit that satisfies my criterion.
My problem is that I'm writing code inside a simulation framework that wraps my SQLite code before sending it to the database, and this wrapping already appends 'LIMIT 100' to the end of the statement.
What I want to do:
SELECT x, y, z FROM myTable WHERE a = 0 ORDER BY y LIMIT 1
What happens when this simulation development framework has done its job:
SELECT x, y, z FROM myTable WHERE a = 0 ORDER BY y LIMIT 1 LIMIT 100
exec error near "LIMIT": syntax error
So my question is: how do I work around this limitation? Is there any way to still limit my results to a single hit even though the statement will end in 'LIMIT 100'? I'm thinking of something like creating a temporary table, adding an index, and filtering on that, but my knowledge is limited to simple database queries.
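One workaround that might be worth trying (a sketch only, assuming the framework simply appends 'LIMIT 100' to whatever statement it is given): wrap the real query in a subquery, so the appended LIMIT applies to the outer SELECT while the inner LIMIT 1 still takes effect:
SELECT * FROM (SELECT x, y, z FROM myTable WHERE a = 0 ORDER BY y LIMIT 1)
After the framework has done its job, this becomes:
SELECT * FROM (SELECT x, y, z FROM myTable WHERE a = 0 ORDER BY y LIMIT 1) LIMIT 100
which is valid SQLite and still returns at most one row.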
I am using the following query to obtain the current component serial number (tr_sim_sn) installed on the host device (tr_host_sn) from the most recent record in a transaction history table (PUB.tr_hist):
SELECT tr_sim_sn FROM PUB.tr_hist
WHERE tr_trnsactn_nbr = (SELECT max(tr_trnsactn_nbr)
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
AND tr_lot = '99524136'
AND tr_part = '6684112-001')
The actual table has ~190 million records. The excerpt below contains only a few sample records, and only the fields relevant to the search, to illustrate the query above:
tr_sim_sn |tr_host_sn* |tr_host_pn |tr_domain |tr_trnsactn_nbr |tr_qty_loc
_______________|____________|_______________|___________|________________|___________
... |
356136072015140|99524135 |6684112-000 |vattal_us |178415271 |-1.0000000000
356136072015458|99524136 |6684112-001 |vattal_us |178424418 |-1.0000000000
356136072015458|99524136 |6684112-001 |vattal_us |178628048 |1.0000000000
356136072015050|99524136 |6684112-001 |vattal_us |178628051 |-1.0000000000
356136072015836|99524137 |6684112-005 |vattal_us |178645337 |-1.0000000000
...
* = key field
The excerpt illustrates multiple occurrences of tr_trnsactn_nbr for a single value of tr_host_sn. The largest value for tr_trnsactn_nbr corresponds to the current tr_sim_sn installed within tr_host_sn.
This query works, but it is very slow, taking ~8 minutes.
I would appreciate suggestions to improve or refactor this query to improve its speed.
Check with your admins to determine when they last updated the SQL statistics. If the answer is "we don't know" or "never", then you might want to ask them to run the following 4GL program, which will generate a SQL script to accomplish that:
/* genUpdateSQL.p
*
* mpro dbName -p util/genUpdateSQL.p -param "tmp/updSQLstats.sql"
*
* sqlexp -user userName -password passWord -db dbName -S servicePort -infile tmp/updSQLstats.sql -outfile tmp/updSQLstats.log
*
*/
/* Write the script to the file named by -param, or to a default file. */
output to value( ( if session:parameter <> "" then session:parameter else "updSQLstats.sql" )).
/* Emit one UPDATE STATISTICS statement per application (non-hidden) table. */
for each _file no-lock where _hidden = no:
put unformatted
"UPDATE TABLE STATISTICS AND INDEX STATISTICS AND ALL COLUMN STATISTICS FOR PUB."
'"' _file._file-name '"' ";"
skip
.
put unformatted "commit work;" skip.
end.
output close.
return.
This will generate a script that updates statistics for all tables and all indexes. You could edit the output to only update the tables and indexes that are part of this query if you want.
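For reference, each line of the generated script will look like this (using the question's tr_hist table as an example):
UPDATE TABLE STATISTICS AND INDEX STATISTICS AND ALL COLUMN STATISTICS FOR PUB."tr_hist";
commit work;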
Also, if the admins are nervous they could, of course, try this on a test db or a restored backup before implementing in a production environment.
I am posting this as a response to my request for an improved query.
As it turns out, the following query includes two distinct changes that greatly improved its speed. One is to include the tr_domain search criterion in both the main and nested portions of the query. The second is to narrow the search by increasing the number of search criteria, which below are all placed in the nested section:
SELECT tr_sim_sn
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
AND tr_trnsactn_nbr IN (
SELECT MAX(tr_trnsactn_nbr)
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
AND tr_part = '6684112-001'
AND tr_lot = '99524136'
AND tr_type = 'ISS-WO'
AND tr_qty_loc < 0)
This query results in a ~0.5 s response time (credit to my colleague, Daniel V.).
To be fair, this query uses criteria beyond the parameters stated in the original post, making it difficult, if not impossible, for others to attempt a reasonable answer. The omission was not on purpose, of course, but rather due to my being fairly new to the fundamentals of good query design. This query is partly the result of learning that when too few, or non-indexed, fields are used as search criteria on a large table, it can help to narrow the search by increasing the number of criteria. The original had 3; this one has 5.
I'm making a flight-tracking map that will need to pull live data from an SQLite DB. I'm currently just using the sqlite3 executable to navigate the DB and understand how to interact with it. Each aircraft is identified by a unique hex_ident. I want to get a list of all aircraft that have sent out a signal in the last minute, as a way of identifying which aircraft are actually active right now. I tried
select distinct hex_ident, parsed_time
from squitters
where parsed_time >= Datetime('now','-1 minute')
I expected a list of only 4 or 5 hex_idents, but I'm getting a list of every entry (today's entries only), and some are outside the one-minute bound. I'm new to SQL, so I don't really know how to do this yet. Here's what each entry looks like; the table is called squitters.
{
"message_type":"MSG",
"transmission_type":8,
"session_id":"111",
"aircraft_id":"11111",
"hex_ident":"A1B4FE",
"flight_id":"111111",
"generated_date":"2021/02/12",
"generated_time":"14:50:42.403",
"logged_date":"2021/02/12",
"logged_time":"14:50:42.385",
"callsign":"",
"altitude":"",
"ground_speed":"",
"track":"",
"lat":"",
"lon":"",
"vertical_rate":"",
"squawk":"",
"alert":"",
"emergency":"",
"spi":"",
"is_on_ground":0,
"parsed_time":"2021-02-12T19:50:42.413746"
}
Any ideas?
You must remove the 'T' from the value of parsed_time, or apply datetime() to it as well, to make the comparison work:
where datetime(parsed_time) >= datetime('now', '-1 minute')
Note that the datetime() function does not take microseconds into account, so if you need 100% accuracy, you must append them yourself with concatenation:
where replace(parsed_time, 'T', ' ') >=
datetime('now', '-1 minute') || substr(parsed_time, instr(parsed_time, '.'))
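If the goal is one row per active aircraft rather than one row per message, a grouped variant of the corrected query might look like this (a sketch, using the hex_ident and parsed_time fields from the sample record):
SELECT hex_ident, MAX(datetime(parsed_time)) AS last_seen
FROM squitters
WHERE datetime(parsed_time) >= datetime('now', '-1 minute')
GROUP BY hex_ident;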
Using SQLite 3.19.2, I'm running into an odd situation.
One of the queries my application performs takes an enormously long time (100+ seconds) when running inside my app. Using the sqlite3 shell, the same query takes 0.5s.
I'm using a custom build of SQLite, statically linked into my app. The version of the shell is from my custom compilation, so this isn't an issue with the compilation.
I was using multiple threads; however, I've since managed to reproduce the issue single-threaded.
Using perf, I've determined that the majority of the CPU time is spent in sqlite3VdbeExec and not in any of my code (for instance, the code that reads the fields of each returned row).
The query is prepared with sqlite3_prepare_v2 and uses bound parameters. I've reproduced it below, along with a similar query that doesn't exhibit the performance problem.
Has anyone else seen anything like this?
SLOW QUERY (100+s in the app, 0.5s in the shell):
SELECT DISTINCT
Track.*
FROM
TrackGenres,
TrackFirstArtist,
Track
WHERE
TrackFirstArtist.id = Track.id AND
TrackGenres.id = Track.id AND
TrackGenres.genreID = 328
ORDER BY
(CASE WHEN 1 = 1 THEN TrackFirstArtist.artistName COLLATE ENGLISH END) ASC,
(CASE WHEN 1 != 1 THEN TrackFirstArtist.artistName COLLATE ENGLISH END) DESC
LIMIT 50
OFFSET 0;
QUERY PLAN (the plans are identical whether run in the app or in the shell):
2 0 0 SEARCH TABLE TrackGenre USING COVERING INDEX sqlite_autoindex_TrackGenre_1 (genreID=?)
3 0 1 SEARCH TABLE WorkGenre USING COVERING INDEX sqlite_autoindex_WorkGenre_1 (genreID=?)
3 1 0 SEARCH TABLE TrackWork USING COVERING INDEX sqlite_autoindex_TrackWork_1 (workID=?)
1 0 0 COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE (UNION)
4 0 0 SCAN TABLE TrackArtist USING INDEX idx_TrackArtist_trackID
4 1 1 SEARCH TABLE Artist USING INTEGER PRIMARY KEY (rowid=?)
0 0 0 SCAN SUBQUERY 1
0 1 2 SEARCH TABLE Track USING INTEGER PRIMARY KEY (rowid=?)
0 2 1 SEARCH SUBQUERY 4 USING AUTOMATIC COVERING INDEX (id=?)
0 0 0 USE TEMP B-TREE FOR DISTINCT
0 0 0 USE TEMP B-TREE FOR ORDER BY
SIMILAR QUERY (0.5s in app, 0.5s in shell):
SELECT
COUNT (Track.id)
FROM
TrackGenres,
TrackFirstArtist,
Track
WHERE
TrackFirstArtist.id = Track.id AND
TrackGenres.id = Track.id AND
TrackGenres.genreID = 328 AND
( TrackFirstArtist.artistName >= 'a' AND TrackFirstArtist.artistName < 'b' );
I believe this is a bug in SQLite. I tracked the problem down to the CASE expressions in the ORDER BY clause when the query is used in a prepared statement.
When I removed either one of them, the query ran quickly again. The collation made no difference.
This doesn't seem to be a general issue with prepared statements; I have other queries with a similar structure and they work properly. As such, I've been unable to create a simple, reproducible example that illustrates the problem.
Ultimately I had to solve this by replacing the variables in the query manually to avoid the use of a prepared statement.
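For illustration, the manual substitution amounts to building the statement with the sort direction already resolved, so the CASE trick in the ORDER BY disappears entirely (a sketch based on the slow query above, shown for the ascending case):
SELECT DISTINCT Track.*
FROM TrackGenres, TrackFirstArtist, Track
WHERE TrackFirstArtist.id = Track.id
  AND TrackGenres.id = Track.id
  AND TrackGenres.genreID = 328
ORDER BY TrackFirstArtist.artistName COLLATE ENGLISH ASC
LIMIT 50 OFFSET 0;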
I want to report on various statistics on a specific Teradata database, particularly "free space". Should table skew be included in the calculation? For example, someone suggested the following query:
SELECT databasename
, SUM(maxperm)/1024/1024/1024 (DECIMAL(10,2)) AS space_allocated
, SUM(currentperm)/1024/1024/1024 (DECIMAL(10,2)) AS space_Used
, (MAX(currentperm)*COUNT(*)-SUM(currentperm))
/1024/1024/1024 (DECIMAL(10, 2)) AS skew_Size
, (space_used + skew_size) AS total_space_used
, (MIN(maxperm-currentperm)/1024/1024/1024) * COUNT(*) (DECIMAL(10,2))
AS free_Space
, CAST(total_space_used AS DECIMAL(10,2)) * 100
/ CAST(space_allocated AS DECIMAL(10,2)) AS pct_used
FROM DBC.diskspace
WHERE databasename = 'MyDatabase'
AND maxperm > 0
GROUP BY 1;
I'm particularly curious about the calculation of total_space_used and pct_used. Is it "proper" to account for skewed tables like this?
Definitely take skew into account. Your free_Space column gives you exactly the free space with the skew of all existing tables considered.
Note, though, that free_Space assumes all future tables will be perfectly distributed, i.e., without skew.
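For intuition (a simplified example, assuming one DBC.diskspace row per AMP): because space is allocated per AMP, a skewed table effectively occupies MAX(currentperm) on every AMP. With 4 AMPs holding 10, 10, 10 and 20 GB of a table, the skew overhead is 20 * 4 - 50 = 30 GB, which is exactly what the skew_Size term computes.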
I am using PyQt to insert records into a MySQL database. The code basically looks like this:
self.table = QSqlTableModel()
self.table.setTable('mytable')
while True:
    rec = self.table.record()          # empty record with the table's fields
    values = getValueDictionary()      # maps field name -> value
    for k, v in values.items():
        rec.setValue(k, QVariant(v))
    self.table.insertRecord(-1, rec)   # -1 appends the record at the end
The table currently has ~ 50,000 rows in it.
I have timed each line and found that the insertRecord function is taking ~5 seconds to execute, which is unacceptably slow. Everything else is fast.
For comparison, I also made a version of the code that uses
query = QSqlQuery()
query.prepare("INSERT INTO mytable (f1, f2, ...) VALUES (:f1, :f2, ...)")
query.bindValue(":f1", blah)
query.exec_()
In this case, the whole thing takes only ~20 milliseconds, so as far as I can tell the delay is not in the database connection.
I'd really prefer to use the QtSql classes instead of awkward raw MySQL commands. Any ideas on how to add a bunch of rows to a MySQL database with QtSql, at a reasonable speed?
Thanks,
G
Things to try:
set your EditStrategy to QSqlTableModel.OnManualSubmit (see the sketch after this list)
mass insert rows with insertRows
and see if it helps...
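A minimal sketch of the manual-submit approach, assuming PyQt4-style imports to match the question (rows_to_insert is a hypothetical stand-in for however the value dictionaries are produced):
from PyQt4.QtCore import QVariant
from PyQt4.QtSql import QSqlTableModel

model = QSqlTableModel()
model.setTable('mytable')
# Cache all edits in the model; nothing is written until submitAll().
model.setEditStrategy(QSqlTableModel.OnManualSubmit)
model.select()

for values in rows_to_insert:        # hypothetical iterable of field dicts
    rec = model.record()
    for k, v in values.items():
        rec.setValue(k, QVariant(v))
    model.insertRecord(-1, rec)      # queued in the model, not yet written

model.submitAll()                    # one batched write for all queued rows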
You should issue BEGIN before the loop and COMMIT after it, or turn off MySQL's autocommit feature.
This will usually give you a performance increase of 50% or more.
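With QtSql you can do that through the connection object; a minimal sketch, assuming the default database connection:
from PyQt4.QtSql import QSqlDatabase

db = QSqlDatabase.database()   # the default connection
db.transaction()               # BEGIN: suspends per-insert autocommit
# ... run the insert loop here ...
db.commit()                    # a single COMMIT for the whole batch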