Are there downsides to having many SQLite prepared statements at once?

I'm working with some SQLite code in a C++ project that has several hundred prepared statements compiled at once performing operations on a comparable number of tables. All of the statements are simple selects and updates, but the individualized nature of the tables necessitates correspondingly specific SQL, so attempting the reuse of fewer statements for multiple tables is unrealistic. The statements are generally compiled once for the lifetime of the program and finalized on exit. Insofar as concurrency is concerned, at most two or three statements will ever be executed simultaneously on their own threads.
With the number of tables (and therefore, statements) expected to grow continually throughout development, I'd like to be aware of any potential problems with this design before things get any more complex. Having so many statements feels like code smell to me, not to mention a potential debugging nightmare.
I haven't found anything in the docs about prepared statement limits. Are there any practical limits to the number of prepared statements for a single SQLite database connection? Can high numbers of prepared statements cause performance issues?

Prepared statements do not need much memory.
While keeping them prepared to avoid the SQL parsing overhead is probably not worth the effort, it will not hurt either.
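To make the setup concrete, here is a minimal sketch of the kind of per-table statement cache the question describes, assuming the standard SQLite C API; the class name, table layout, and query are hypothetical. Each statement is prepared once with sqlite3_prepare_v2, rewound with sqlite3_reset before reuse, and finalized on shutdown.

```cpp
#include <sqlite3.h>
#include <string>
#include <unordered_map>

// Hypothetical cache: one prepared SELECT per table, built lazily and kept
// for the lifetime of the connection.
class StatementCache {
public:
    explicit StatementCache(sqlite3* db) : db_(db) {}

    ~StatementCache() {
        for (auto& entry : stmts_)
            sqlite3_finalize(entry.second);    // finalize everything on exit
    }

    sqlite3_stmt* selectFor(const std::string& table) {
        auto it = stmts_.find(table);
        if (it != stmts_.end()) {
            sqlite3_reset(it->second);         // rewind before re-execution
            return it->second;
        }
        // Table names cannot be bound as parameters, hence per-table SQL.
        const std::string sql = "SELECT * FROM \"" + table + "\" WHERE id = ?1;";
        sqlite3_stmt* stmt = nullptr;
        if (sqlite3_prepare_v2(db_, sql.c_str(), -1, &stmt, nullptr) != SQLITE_OK)
            return nullptr;                    // caller checks sqlite3_errmsg(db_)
        stmts_.emplace(table, stmt);
        return stmt;
    }

private:
    sqlite3* db_;
    std::unordered_map<std::string, sqlite3_stmt*> stmts_;
};
```

If you want a number rather than a feeling, sqlite3_db_status() with the SQLITE_DBSTATUS_STMT_USED verb reports how much memory a connection's prepared statements are actually using, so you can track the figure as the number of tables grows.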

Related

Getting large number of entities from datastore

Following up on this question, I am able to store a large number (>50k) of entities in the datastore. Now I want to access all of them in my application and perform mathematical operations on them, but it always times out. One way is to use the TaskQueue again, but that would be an asynchronous job. I need a way to access these 50k+ entities in my application and process them without timing out.
Part of the accepted answer to your original question may still apply, for example a manually scaled instance with 24h deadline. Or a VM instance. For a price, of course.
Some speedup may be achieved by using memcache.
Side note: depending on the size of your entities you may need to keep an eye on the instance memory usage as well.
Another possibility would be to switch to a faster instance class (and with more memory as well, but also with extra costs).
But all such improvements might still not be enough. The best approach would still be to give your entity data processing algorithm a deeper thought - to make it scalable.
I'm having a hard time imagining a computation so monolithic that it can't be broken into smaller pieces, none of which needs all the data at once. I'm almost certain there has to be some way of using partial computations, perhaps storing intermediate results, so that you can split the problem and handle it in smaller pieces across multiple requests.
As an extreme (academic) example think about CPUs doing pretty much any super-complex computation fundamentally with just sequences of simple, short operations on a small set of registers - it's all about how to orchestrate them.
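As a concrete (if deliberately toy) illustration of that pattern, here is a sketch in C++ with entirely made-up names and no datastore API: process a bounded chunk per request, persist a small checkpoint holding the cursor position and the partial result, and resume from it on the next request.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical checkpoint persisted between requests (e.g. in the datastore
// or memcache): where processing stopped and the partial result so far.
struct Checkpoint {
    std::size_t nextIndex = 0;
    double partialSum = 0.0;
};

// Process at most chunkSize entities per invocation and return the updated
// checkpoint; the caller re-invokes (in a later request) until finished().
Checkpoint processChunk(const std::vector<double>& values,
                        Checkpoint cp, std::size_t chunkSize) {
    const std::size_t end = std::min(values.size(), cp.nextIndex + chunkSize);
    for (std::size_t i = cp.nextIndex; i < end; ++i)
        cp.partialSum += values[i];            // stand-in for the real math
    cp.nextIndex = end;
    return cp;
}

bool finished(const Checkpoint& cp, std::size_t totalEntities) {
    return cp.nextIndex >= totalEntities;
}
```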
Here's a nice article describing a drastic reduction in the overall duration of a computation (no clue if it's anything like yours) through a clever approach (also interesting because it uses the GAE Pipeline API).
If you post your code you might get some more specific advice.

How well does UnQLite perform? How does it compare to SQLite (in performance)?

I've researched what I can about SQLite and UnQLite, but a few things still haven't been answered. UnQLite appears to have been released only within the past few years, which would account for the lack of benchmarks. "Performance" comparisons (read/write speed, querying, average database size before significant slowdown, etc.) may be somewhat apples-to-oranges here.
From all that I have seen the two have very few differences comparatively speaking, namely that SQLite is a relational database whereas UnQLite is a key-value pair and document (via Jx9) database. They're both portable, cross-platform, and 32/64-bit friendly, and can have single-write and multi-read connections. Very little can be found on UnQLite benchmarks while SQLite has quite a few with different implementations across various (scripting) languages. SQLite has some varied performance across in-memory databases, indexed data, and read/write modes with varying data size. Overall SQLite appears quick and reliable.
All that I can find on UnQLite is unreliable and confusing; I cannot seem to find anything helpful. What read/write speeds does UnQLite seem to peak at? Which languages are (not) recommended when using UnQLite? What are some known disadvantages and bugs?
If it helps at all to explain my interest: I'm developing a network utility that will be reading and processing packets, with hot-swapping between network interfaces. Since the connections can, however unlikely, reach speeds of up to 1 Gbps, there will be a lot of raw data being written out to a database. It's still in the early stages of development and I'm having to find a way to balance performance. There are a lot of factors: missed packets, how large each write is, how quickly data can be processed and moved, how much organization will be required, how many tables will be needed, whether I can implement multiprocessing, how reliant each database is on HDD speed, and so on. My data will need tables, but whether or not I have to store them relationally is still up in the air. Seeing how the two stack up, with their own pros and cons (aside from the usual KVP vs. relational debate), may push me towards one or, if I'm crazy enough, a mix of both.
I've done a bit of fooling around with UnQLite using Python bindings I wrote. The bindings use Cython and are quite fast.
What I've found from my experimentation is that UnQLite's key/value APIs are pretty damn fast, comparable to other DBMs. Things slow down a bit when you start using Jx9 and the document store, though.
Basically depends on what you need...
If you want SQL and ad-hoc querying, I'd suggest using SQLite. It is plenty fast and quite flexible.
If you want just keys and values, I'd use something like leveldb or rocksdb.
If you want a lightweight JSON document store, or key/value with a bit "extra", then UnQLite may be a good fit.
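For reference, here is a rough sketch of the plain key/value path described above, based on my reading of UnQLite's C API; treat the exact calls, flags, and signatures as something to verify against the UnQLite documentation rather than as gospel.

```cpp
#include <unqlite.h>
#include <cstdio>
#include <cstring>

int main() {
    unqlite* db = nullptr;
    // Open (or create) an on-disk database file.
    if (unqlite_open(&db, "test.unqlite", UNQLITE_OPEN_CREATE) != UNQLITE_OK)
        return 1;

    // Plain key/value store: this is the fast path mentioned above.
    const char* value = "42";
    unqlite_kv_store(db, "answer", -1, value, std::strlen(value));

    // Fetch it back into a fixed-size buffer.
    char buf[16] = {0};
    unqlite_int64 len = sizeof(buf) - 1;
    if (unqlite_kv_fetch(db, "answer", -1, buf, &len) == UNQLITE_OK)
        std::printf("answer = %.*s\n", static_cast<int>(len), buf);

    unqlite_close(db);
    return 0;
}
```

The Jx9/document-store layer adds a scripting engine on top of this, which is where the answer above saw things slow down.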

How does executemany() work

I have been using C++ and working with SQLite. In Python, the library has an executemany operation, but the C++ library I am using does not have that operation.
I was wondering how the executemany operation optimizes queries to make them faster.
I was looking at the SQLite C/C++ API and saw that there are two functions, sqlite3_reset and sqlite3_clear_bindings, that can be used to clear and reuse prepared statements.
Is this what python does to batch and speedup executemany queries (at least for inserts)? Thanks for your time.
executemany just binds the parameters, executes the statements, and calls sqlite3_reset, in a loop.
Python does not give you direct access to the statement after it has been prepared, so this is the only way to reuse it.
However, SQLite does not take much time for preparing statements, so this is unlikely to have much of an effect on performance.
The most important thing for performance is to batch statements in a transaction; Python tries to be clever and do this automatically (independently of executemany).
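A rough C++ equivalent of that loop for an INSERT might look like the sketch below (the table and column names are made up, and error handling is kept minimal). The statement is prepared once, each row goes through bind/step/reset, and the whole batch sits inside a single transaction, which is where most of the speedup comes from.

```cpp
#include <sqlite3.h>
#include <string>
#include <utility>
#include <vector>

// Insert many rows by preparing once and looping bind/step/reset inside a
// single transaction -- roughly what executemany amounts to for an INSERT.
bool insertMany(sqlite3* db,
                const std::vector<std::pair<int, std::string>>& rows) {
    sqlite3_exec(db, "BEGIN;", nullptr, nullptr, nullptr);

    sqlite3_stmt* stmt = nullptr;
    const char* sql = "INSERT INTO items(id, name) VALUES (?1, ?2);";
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) != SQLITE_OK) {
        sqlite3_exec(db, "ROLLBACK;", nullptr, nullptr, nullptr);
        return false;
    }

    for (const auto& row : rows) {
        sqlite3_bind_int(stmt, 1, row.first);
        sqlite3_bind_text(stmt, 2, row.second.c_str(), -1, SQLITE_TRANSIENT);
        if (sqlite3_step(stmt) != SQLITE_DONE) {   // execute one INSERT
            sqlite3_finalize(stmt);
            sqlite3_exec(db, "ROLLBACK;", nullptr, nullptr, nullptr);
            return false;
        }
        sqlite3_reset(stmt);            // rewind for the next row
        sqlite3_clear_bindings(stmt);   // optional: drop the old parameter values
    }

    sqlite3_finalize(stmt);
    sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);
    return true;
}
```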
I looked into some of the related posts and found the following, which goes into great detail on ways to improve SQLite batch insert performance. These principles could effectively be used to create an executemany function.
Improve INSERT-per-second performance of SQLite?
The biggest improvement was indeed, as CL. said, turning it all into one transaction. The author of the other post also found significant improvement by creating and reusing prepared statements and playing with some PRAGMA settings.
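If I recall that post correctly, the PRAGMA experiments were along these lines. Note that they trade durability for speed (a crash mid-write can lose or corrupt data), so this is a sketch for rebuildable bulk loads rather than a general recommendation.

```cpp
#include <sqlite3.h>

// Loosen durability guarantees before a bulk insert, along the lines of the
// PRAGMA experiments in the linked post. Only sensible when the data can be
// regenerated if a crash corrupts or loses the database.
void applyBulkLoadPragmas(sqlite3* db) {
    sqlite3_exec(db, "PRAGMA synchronous = OFF;", nullptr, nullptr, nullptr);
    sqlite3_exec(db, "PRAGMA journal_mode = MEMORY;", nullptr, nullptr, nullptr);
}
```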

Does it make sense to make multiple SQLite databases to improve performance?

I'm just learning SQL/SQLite, and plan to use SQLite 3 for a new website I'm building. It's replacing XML, so concurrency isn't a big concern. But I would like to make it as performant as possible with the technology I'm using. Are there any benefits to using multiple databases for performance, or is the best performance keeping all the data for the site in one file? I ask because 99% of the data will be read-only 99% of the time, but that last 1% will be written to 99% of the time. I know databases don't read in and re-write the whole file for every little change, but I guess I'm wondering if the writes will be much faster if the data is going to a separate 5KB database, rather than part of the ~ 250MB main database.
With proper performance tuning, SQLite can do around 63,300 inserts per second. Unless you're planning on some really heavy volume, I would avoid pre-optimizing. Splitting into two databases doesn't feel right to me, and if you're planning on doing joins in the future, you'll be hosed. Especially since you say concurrency isn't a big problem, I would avoid complicating the database design.
Actually, with 50,000 databases you will get very bad performance.
You should try several tables in a single database instead; sometimes that really can speed things up. But since the description of the initial task is very general, it's hard to say exactly what you need. Try a single table and multiple tables, and measure the speed.
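If you do end up measuring the split-database variant, note that SQLite can ATTACH the small write-heavy file to the same connection as the large read-mostly one, so queries can still reach both. A minimal sketch, with made-up file and table names:

```cpp
#include <sqlite3.h>

// Sketch: the big read-mostly data lives in the main database already open
// on `db`; the small write-heavy tables live in a separate file attached to
// the same connection, so joins across both remain possible.
bool attachWriteDatabase(sqlite3* db) {
    const char* sql =
        "ATTACH DATABASE 'writes.db' AS writes;"
        "CREATE TABLE IF NOT EXISTS writes.page_hits("
        "  page_id INTEGER PRIMARY KEY,"
        "  hits    INTEGER NOT NULL DEFAULT 0);";
    return sqlite3_exec(db, sql, nullptr, nullptr, nullptr) == SQLITE_OK;
}
```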

Which is better performance-wise: stored procedure or executing a query with dataadapter?

I am reworking a .NET application that so far has been running slowly. Our databases are Oracle, and the code is written in VB. When writing queries, I typically pass the parameters to a middle tier function which builds the raw SQL. I have a database class that has a function ExecuteQuery which takes in a SQL string and returns a DataTable. This uses an OleDbDataAdapter to run the query on the database.
I found some existing code that sends the SQL and a parameter to a stored procedure which, as far as I can tell, opens the query and outputs it to a SYS_REFCURSOR / DataSet.
I don't know why it's set up this way, but could someone tell me which is better performance-wise? Or the pros/cons to doing it this way?
Thanks in advance
Stored procedures vs. dynamic SQL have exactly the same performance. In other words, there is no performance advantage of one over the other. (Incidentally, I am a HUGE believer in using stored procs for everything, for a host of other reasons, but that's not the topic at hand.)
Bottlenecks can occur for many reasons.
For one, if you are actually code-generating SELECT statements, it is highly probable that those statements are very unoptimized for the data the app needs. For example, doing a SELECT * that pulls back 50 columns versus a SELECT ID, Description that pulls just the two you need in your application at that point. In this example, the amount of data that has to be read from disk, transferred over the network wire, and pushed into objects in the web server's memory isn't trivial.
These will have to be evaluated on a case by case basis.
I would highly suggest that if you have a "slow" application whose performance you need to improve, the very first thing you ought to do is profile it. Which part is running slow? It might be inside the database server, it might be in your middle tier, it may even be a function of your network bandwidth or the memory / load limitations on your web server. Heck, there might even be a WAIT command lurking somewhere in there, placed by some previous programmer who left the company...
In short, you have at this point absolutely no idea on where to begin. So looking at actual code is premature. Go profile the app and see where things are slowing down. You might find that performance may radically improve simply by putting more memory in the database server.... Which is a much cheaper alternative than rewriting, testing and deploying vast amounts of code.
A stored procedure will definitely perform better than building a raw query in code and executing it, but the important thing to realize is that this difference won't be your performance issue. There are many other things that affect performance far more than moving a query into a stored procedure. Even if you run a stored procedure and process the results using adapters, DataTables, and DataSets, you're still paying a lot of overhead, especially if you pass those large objects around (I have seen cases where DataSets are returned wrapped in web service calls). So don't focus on that. Focus on caching data, writing good queries, creating the proper indexes, and minimizing the use of DataSets and DataTables; that will yield better benefits than just moving queries to stored procedures.
