sqlite3 bulk insert from C?

I came across the .import command for doing this (bulk insert), but is there a query version of it that I can execute using sqlite3_exec()?
I would just like to copy the contents of a small text file into a table.
A query version of the command below:
".import demotab.txt mytable"

SQLite's performance doesn't benefit from bulk insert. Simply performing the inserts separately (but within a single transaction!) provides very good performance.
You might benefit from increasing SQLite's page cache size; that depends on the number of indexes and/or the order in which the data is inserted. If you don't have any indexes, the cache size is likely not to matter much for a pure insert.
Be sure to use a prepared query, as opposed to regenerating a query plan in the innermost loop. It's extremely important to wrap the statements in a transaction, since this avoids the need for the filesystem to sync the database to disk after every statement - after all, a partially written transaction is atomically aborted anyhow, meaning that all fsync()s are delayed until the transaction completes.
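The .import command is a feature of the sqlite3 command-line shell, not of the SQL engine itself, so there is no query equivalent to hand to sqlite3_exec(); the approach above is the usual substitute. A minimal C sketch of it, assuming the question's mytable has a single TEXT column (the column name "name" is a placeholder) and an already-open sqlite3 *db; error handling is abbreviated:

    #include <stdio.h>
    #include <string.h>
    #include <sqlite3.h>

    static int bulk_insert(sqlite3 *db, FILE *fp)
    {
        sqlite3_stmt *stmt = NULL;
        char line[1024];
        int rc;

        /* One transaction around all inserts: fsync() happens once, at COMMIT. */
        rc = sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
        if (rc != SQLITE_OK) return rc;

        /* Optional: enlarge the page cache (a negative value means KiB). */
        sqlite3_exec(db, "PRAGMA cache_size = -64000", NULL, NULL, NULL);

        /* Prepare once; the query plan is reused for every row. */
        rc = sqlite3_prepare_v2(db, "INSERT INTO mytable(name) VALUES (?1)",
                                -1, &stmt, NULL);
        if (rc != SQLITE_OK) goto fail;

        while (fgets(line, sizeof line, fp)) {
            line[strcspn(line, "\r\n")] = '\0';      /* strip the newline */
            sqlite3_bind_text(stmt, 1, line, -1, SQLITE_TRANSIENT);
            if (sqlite3_step(stmt) != SQLITE_DONE) { rc = SQLITE_ERROR; goto fail; }
            sqlite3_reset(stmt);                     /* reuse, don't re-prepare */
        }

        sqlite3_finalize(stmt);
        return sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);

    fail:
        sqlite3_finalize(stmt);
        sqlite3_exec(db, "ROLLBACK", NULL, NULL, NULL);
        return rc;
    }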
Finally, indexes will limit your insert performance, since maintaining them is somewhat expensive. If you're really dealing with a lot of data and start off with an empty table, it may be beneficial to add the indexes after the data - though this isn't a huge factor.
Oh, and you might want to get one of those Intel X25-E SSDs and ensure you have an AHCI controller ;-).
I'm maintaining an app with SQLite DBs containing about 500,000,000 rows (spread over several tables) - much of which was bulk inserted using plain old begin-insert-commit: it works fine.

Related

Issue with rollback in MariaDB [duplicate]

I have groups of queries that include creating tables, fields, etc. How can I implement a mechanism by which a group of queries is either executed in full, or, if an error occurs anywhere, cancelled in full? That is, transactional semantics, but with queries such as ALTER TABLE (which are committed implicitly).
You are talking about Commit and Rollback.
However, if you are mixing normal SQL with DDL (CREATE/ALTER/DROP etc.), the DDL commands have an implied COMMIT as part of their execution; you cannot avoid it. So this will likely cause you problems if mixed with your normal INSERT/UPDATE/DELETE type queries.
Transactional DDL is not possible until MySQL 8.0. In that version, there is a "Data Dictionary" that is stored in InnoDB tables. (I don't want to think about the bootstrap required!) The DD allows, at least in theory, the ROLLBACK of DDL actions such as ALTER and DROP TABLE. Study the 8.0 docs to see if it does enough for your needs.
In the past (pre-8.0), many DDL operations were at least reasonably "crash safe". For example, an ALTER used to copy the table over, then quickly swap in the new table. This provided a reasonably good way to recover from a crash during an ALTER.
There have been a lot of major improvements to ALTER since 5.6. Before then, the only really "instant" ALTER was adding an option to an ENUM, and even that had caveats. There are still things that mandate a complete, time-consuming rebuild of the table -- such as any change to the PRIMARY KEY.
DDL operations should be minimized; they are not designed for frequent use.
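To make the implicit-COMMIT pitfall concrete, here is a minimal sketch using the classic MySQL C API; the table name t1 and the connection parameters are placeholders, and error checking is mostly elided:

    #include <stdio.h>
    #include <mysql.h>

    int main(void)
    {
        MYSQL *conn = mysql_init(NULL);

        /* connection parameters are placeholders */
        if (!mysql_real_connect(conn, "localhost", "user", "password",
                                "test", 0, NULL, 0)) {
            fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
            return 1;
        }

        mysql_query(conn, "CREATE TABLE t1 (a INT)");
        mysql_query(conn, "START TRANSACTION");
        mysql_query(conn, "INSERT INTO t1 VALUES (1)");

        /* DDL: this implicitly COMMITs the open transaction ... */
        mysql_query(conn, "ALTER TABLE t1 ADD COLUMN b INT");

        /* ... so this ROLLBACK undoes nothing; the row (1, NULL) stays. */
        mysql_query(conn, "ROLLBACK");

        mysql_close(conn);
        return 0;
    }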

Is it possible and meaningful to execute an Array DML command containing BLOB data?

Is it possible to execute an Array DML INSERT or UPDATE statement passing BLOB field data in the parameter array? And, the more important part of my question: if it is possible, will an Array DML command containing BLOB data still be more efficient than executing the commands one by one?
I have noticed that TADParam has an AsBlobs indexed property, so I assume it might be possible, but I haven't tried it yet because there's no mention of performance nor any example showing this, and because the indexed property is of type RawByteString, which is not well suited to my needs.
I'm using FireDAC and working with a SQLite database (Params.BindMode = pbByNumber, so I'm using a native SQLite INSERT with multiple VALUES). My aim is to store about 100k records containing pretty small BLOB data (about 1 kB per record) as fast as possible (at the cost of FireDAC's abstraction).
The main point in your case is that you are using a SQLite3 database.
With SQLite3, Array DML is "emulated" by FireDAC. Since it is a local instance, not a client-server instance, there is no need to prepare a bunch of rows and then send them all at once to avoid network latency (as with Oracle or MS SQL).
Using Array DML may speed up your insertion process a little bit with SQLite3, but I doubt the gain will be very high. A good plain INSERT with binding per number will work just fine.
The main tips about performance in your case will be:
Nest your process within a single transaction (or even better, use one transaction per 1000 rows of data);
Prepare an INSERT statement, then re-execute it with bound parameters each time;
By default, FireDAC initializes SQLite3 with the fastest options (e.g. disabling LOCK), so let it be.
SQLite3 is very good at BLOB processing.
From my tests, FireDAC insertion timing is pretty good, very close to direct SQLite3 access. Only reading is slower than through a direct SQLite3 link, due to the overhead of the Delphi TDataSet class.
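For reference, the first two tips look roughly like this at the SQLite3 C-API level (which is what FireDAC drives underneath); the records table, its data column, and the function name are hypothetical, and error handling is elided:

    #include <sqlite3.h>

    /* blobs[i] points to sizes[i] bytes of data; names are hypothetical. */
    int insert_blobs(sqlite3 *db, const void *const *blobs,
                     const int *sizes, int count)
    {
        sqlite3_stmt *stmt;
        int i;

        /* tip 2: prepare once, re-execute with bound parameters */
        if (sqlite3_prepare_v2(db, "INSERT INTO records(data) VALUES (?1)",
                               -1, &stmt, NULL) != SQLITE_OK)
            return SQLITE_ERROR;

        sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
        for (i = 0; i < count; i++) {
            sqlite3_bind_blob(stmt, 1, blobs[i], sizes[i], SQLITE_STATIC);
            sqlite3_step(stmt);      /* error handling elided for brevity */
            sqlite3_reset(stmt);

            /* tip 1: commit every 1000 rows, then open a fresh transaction */
            if ((i + 1) % 1000 == 0) {
                sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
                sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
            }
        }
        sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
        sqlite3_finalize(stmt);
        return SQLITE_OK;
    }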

Optimizing select with transaction under SQLite 3

I read that wrapping a lot of SELECTs in BEGIN TRANSACTION/COMMIT is an interesting optimization.
But are these commands really necessary if I use "PRAGMA journal_mode = OFF" beforehand? (Which, if I remember correctly, disables the log and obviously the transaction system too.)
Note that I don't agree with BigMacAttack.
For SQLite, wrapping SELECTs in a transaction does do something:
It reduces the number of SHARED locks that are obtained and then dropped.
Reference:
http://www.mail-archive.com/sqlite-users%40sqlite.org/msg79839.html
So I think the transaction would also be beneficial even if you had journal_mode turned off, because there is still the locking overhead to consider.
Maybe read_uncommitted would be something you could consider - I would guess that it would disable the SHARED locking.
"Use transactions – even if you’re just reading the data. This may yield a few milliseconds."
I'm not sure where the Katashrophos.net blog is getting this information, but wrapping SELECT statements in transactions does nothing. Transactions are always and only used when making changes to the database, and transactions cannot be disabled. They are a requirement. What many don't understand is that unless you manually BEGIN and COMMIT a transaction, each statement is automatically put in its own unique transaction.
Be sure to read the de facto SO question on improving sqlite performance. What the author of the blog might have been trying to say is that if you plan to do an INSERT, then a SELECT, then another INSERT, it would increase performance to manually wrap these statements in a single transaction. Otherwise sqlite will automatically put the two INSERT statements in separate unique transactions.
According to the "SQL as Understood by SQLite" documentation concerning transactions:
"No changes can be made to the database except within a transaction. Any command that changes the database (basically, any SQL command other than SELECT) will automatically start a transaction if one is not already in effect."
Lastly, disabling journaling via PRAGMA journal_mode = OFF does not disable transactions, only logging. But disabling the log is a good way to increase performance as well. Normally after each transaction, sqlite will document the transaction in the journal. When it doesn't have to do this, you get a performance boost.
UPDATE:
So it has been brought to my attention by "elegant dice" that the SQLite documentation statement I quote above is misleading. SELECT statements do in fact use the transaction system. This is used to acquire and release a SHARED lock on the database. As a result, it is indeed more efficient to wrap multiple SELECT statements in a single transaction. By doing so, the lock is only acquired and released once, rather than for each individual SELECT statement. This ends up being slightly more efficient while also assuring that all SELECT statements will access the same version of the database in case something has been added/deleted by some other program.
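A minimal sketch of that advice in C, assuming an open sqlite3 *db; the tables t1 and t2 are hypothetical, and result callbacks are omitted for brevity:

    #include <sqlite3.h>

    void read_batch(sqlite3 *db)
    {
        /* a deferred BEGIN: the SHARED lock is acquired at the first
           SELECT and held until COMMIT, instead of once per statement */
        sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
        sqlite3_exec(db, "SELECT count(*) FROM t1", NULL, NULL, NULL);
        sqlite3_exec(db, "SELECT count(*) FROM t2", NULL, NULL, NULL);
        /* ... every read in here sees the same snapshot of the database ... */
        sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
    }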

What data store technology/solution allows very fast inserts, lookups and 'selects'

Here's my problem.
I want to ingest lots and lots of data .... right now millions and later billions of rows.
I have been using MySQL and I am playing around with PostgreSQL for now.
Inserting is easy, but before I insert I want to check whether that particular record exists; if it does, I don't want to insert it. As the DB grows, this operation (obviously) takes longer and longer.
If my data were in a hashmap, the lookup would be O(1), so I thought I'd create a hash index to help with lookups. But then I realised that if I have to compute the hash again every time, I will slow the process down massively (and if I don't compute the index, I don't have the O(1) lookup).
So I am in a quandary. Is there a simple solution? Or a complex one? I am happy to try other datastores; however, I need to be able to do reasonably complex queries, e.g. something similar to SELECT statements with WHERE clauses, so I am not sure whether NoSQL solutions are applicable.
I am very much a novice, so I wouldn't be surprised if there is a trivial solution.
NoSQL stores are good at handling huge inserts and updates.
MongoDB has a really good feature for update/insert (called upsert) based on whether the document already exists.
Check out this page from the Mongo docs:
http://www.mongodb.org/display/DOCS/Updating#Updating-UpsertswithModifiers
Also, you can check out safe mode in the Mongo connection settings. You can set it to false to get more efficiency on inserts.
http://www.mongodb.org/display/DOCS/Connections
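The same check-on-insert idea exists in SQL engines as well: a unique key plus a conditional insert lets the engine do the existence check for you, with no separate lookup. A sketch using SQLite's C API (the engine used elsewhere on this page; the pages table is hypothetical); MySQL's INSERT IGNORE / INSERT ... ON DUPLICATE KEY UPDATE and PostgreSQL's INSERT ... ON CONFLICT are the equivalents:

    #include <sqlite3.h>

    /* The unique key does the existence check inside the engine, so a
       separate SELECT-then-INSERT round trip isn't needed. */
    void insert_if_absent(sqlite3 *db)
    {
        sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS pages(url TEXT PRIMARY KEY, body TEXT)",
            NULL, NULL, NULL);

        /* a duplicate url is silently skipped instead of raising an error */
        sqlite3_exec(db,
            "INSERT OR IGNORE INTO pages(url, body) "
            "VALUES ('http://example.com/', '...')",
            NULL, NULL, NULL);
    }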
You could use CouchDB. It's NoSQL, so you can't do queries per se, but you can create design documents that allow you to run map/reduce functions on your data.

How can I improve the performance of the SQLite database?

Background: I am using a SQLite database in my Flex application. The database is 4 MB in size and has 5 tables:
table 1 has 2500 records
table 2 has 8700 records
table 3 has 3000 records
table 4 has 5000 records
table 5 has 2000 records.
Problem: Whenever I run a SELECT query on any table, it takes around 50 seconds to fetch data from the database tables. This has made the application quite slow and unresponsive while it fetches the data.
How can I improve the performance of the SQLite database so that the time taken to fetch data from the tables is reduced?
Thanks
As I said in a comment, without knowing what structures your database consists of and what queries you run against the data, there is nothing we can infer to suggest why your queries take so much time.
However, here is an interesting read about indexes: Use the index, Luke!. It tells you what an index is, how you should design your indexes, and what benefits you can harvest.
Also, if you can post the queries and the table schemas and cardinalities (not the contents), maybe that could help.
Are you using asynchronous or synchronous execution modes? The difference between them is that asynchronous execution runs in the background while your application continues to run. Your application will then have to listen for a dispatched event and then carry out any subsequent operations. In synchronous mode, however, the user will not be able to interact with the application until the database operation is complete since those operations run in the same execution sequence as the application. Synchronous mode is conceptually simpler to implement, but asynchronous mode will yield better usability.
The first time SQLStatement.execute() is called on a SQLStatement instance, the statement is prepared automatically before executing. Subsequent calls will execute faster as long as the SQLStatement.text property has not changed. Reusing the same SQLStatement instance is better than creating new instances again and again. If you need to change your queries, then consider using parameterized statements.
You can also use techniques such as deferring what data you need at runtime. If you only need a subset of data, pull that back first and then retrieve other data as necessary. This may depend on your application scope and what needs you have to fulfill though.
Specifying the database along with the table names will prevent the runtime from checking each database to find a matching table if you have multiple databases. It also prevents the runtime from choosing the wrong database when the database isn't specified. Do SELECT email FROM main.users; instead of SELECT email FROM users; even if you only have one database. (main is automatically assigned as the database name when you call SQLConnection.open.)
If you happen to be writing lots of changes to the database (multiple INSERT or UPDATE statements), then consider wrapping them in a transaction. Changes will be made in memory by the runtime and then written to disk. If you don't use a transaction, each statement will result in multiple disk writes to the database file, which can be slow and consume a lot of time.
Try to avoid any schema changes. The table definition data is kept at the start of the database file. The runtime loads these definitions when the database connection is opened. Data added to tables is kept after the table definition data in the database file. If you make changes such as adding columns or tables, the new table definition data will be mixed in with the table data in the database file. The effect of this is that the runtime has to read the table definition data from different parts of the file rather than just from the beginning. The SQLConnection.compact() method restructures the table definition data so that it is at the beginning of the file, but its downside is that it can also consume a lot of time, more so if the database file is large.
Lastly, as Benoit pointed out in his comment, consider improving your own SQL queries and the table structure that you're using. It would be helpful to know whether your database structure and queries are the actual cause of the slow performance. My guess is that you're using synchronous execution; if you switch to asynchronous mode, you'll see better performance, but that doesn't mean it has to stop there.
The Adobe Flex documentation online has more information on improving database performance and best practices working with local SQL databases.
You could try indexing some of the columns used in the WHERE clauses of your SELECT statements. You might also try minimizing usage of the LIKE keyword.
If you are joining your tables together, you might try simplifying the table relationships.
Like others have said, it's hard to get specific without knowing more about your schema and the SQL you are using.
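As a rough illustration of the indexing suggestion, a sketch using SQLite's C API; the users table, the email column, and the index name are all hypothetical. EXPLAIN QUERY PLAN lets you confirm the index is actually being used:

    #include <stdio.h>
    #include <sqlite3.h>

    /* print each row of the query-plan output */
    static int print_row(void *unused, int ncols, char **vals, char **names)
    {
        (void)unused; (void)names;
        for (int i = 0; i < ncols; i++)
            printf("%s ", vals[i] ? vals[i] : "NULL");
        printf("\n");
        return 0;
    }

    void check_index(sqlite3 *db)
    {
        sqlite3_exec(db, "CREATE INDEX IF NOT EXISTS idx_users_email "
                         "ON users(email)", NULL, NULL, NULL);

        /* should now report a SEARCH using idx_users_email rather
           than a full table SCAN */
        sqlite3_exec(db, "EXPLAIN QUERY PLAN "
                         "SELECT * FROM users WHERE email = 'a@b.c'",
                     print_row, NULL, NULL);
    }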
