I have some groups of queries that create tables, add fields, etc. How can I implement a mechanism so that a group of queries is either executed in full or, if an error occurs anywhere, cancelled in full? In other words, transactional behaviour for queries such as ALTER TABLE (which are committed implicitly).
You are talking about Commit and Rollback.
However, if you are mixing normal SQL with DDL (CREATE/ALTER/DROP etc.), the DDL commands have an implied COMMIT as part of their execution; you cannot avoid it. So this will likely cause you problems if mixed with your normal insert/update/delete type queries.
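To see what that implied COMMIT means in practice, here is a minimal sketch (the table and column names are made up for illustration):

    -- MySQL: any DDL statement inside a transaction triggers an implicit COMMIT
    START TRANSACTION;
    INSERT INTO accounts (id, balance) VALUES (1, 100);
    CREATE TABLE audit_log (id INT PRIMARY KEY, msg TEXT);  -- implicitly commits the INSERT above
    ROLLBACK;  -- too late: neither the INSERT nor the CREATE TABLE is undone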
Transactional DDL is not possible until MySQL 8.0. In that version, there is a "Data Dictionary" that is stored in InnoDB tables. (I don't want to think about the bootstrap required!) The DD allows, at least in theory, the ROLLBACK of DDL actions such as ALTER and DROP TABLE. Study the 8.0 docs to see if it does enough for your needs.
In the past (pre-8.0), many DDL operations were at least reasonably "crash safe". For example, an ALTER used to copy the table over, then quickly swap in the new table. This provided a reasonably good way to recover from a crash during an ALTER.
There have been a lot of major improvements to ALTER since 5.6. Before then, the only really "instant" ALTER was adding an option to an ENUM, and even that had caveats. There are still things that mandate a complete, time-consuming rebuild of the table, such as any change to the PRIMARY KEY.
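If you want to be sure an ALTER stays cheap, 5.6+ lets you state the algorithm you expect, so the statement errors out instead of silently falling back to a full table copy. A sketch with a hypothetical table:

    -- MySQL 5.6+: request an online, in-place ALTER; fails if a table copy would be needed
    ALTER TABLE orders ADD COLUMN note VARCHAR(100), ALGORITHM=INPLACE, LOCK=NONE;
    -- MySQL 8.0+: many ADD COLUMN operations can be metadata-only
    ALTER TABLE orders ADD COLUMN flag TINYINT, ALGORITHM=INSTANT;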
DDL operations should be minimized; they are not designed for frequent use.
I'm starting a new project and one of the requirements is to use Teradata. I'm proficient in many different database systems but Teradata is fairly new to me.
On the client end they have removed all foreign keys from their database under the recommendations of "a consultant".
Every part of me cringes.
I'm using a new database instance so I'm not constrained by what they've already done on other databases. I haven't been explicitly told not to use foreign keys and my relation with the customer is such that they will at the very least hear me out. However, my decision and case should be well-informed.
Is there any intrinsic, technological reason not to use FKs in Teradata to maintain referential integrity, based on Teradata's design, performance, side effects, etc.?
Of note, I'm accessing Teradata using the .Net Data Provider v16 which only supports up to EF5.
Assuming that the new project is implementing a Data Warehouse, there's a simple reason (and this is true for any DWH, not only Teradata): a DWH is not the same as an OLTP system.
Of course you still have Primary & Foreign Keys in the logical data model, but they may not be implemented in the physical model (although they are supported by Teradata). There are several reasons:
Data is usually loaded in batches into a DWH, and both PKs & FKs must be validated by the loading process before any Insert/Update/Delete. Otherwise you load 1,000,000 rows and there's a single row failing the constraints; now you've got a rollback plus an error message and have to track down the bad data, good luck. But when all the validation is already done during load, there's no reason to do the same checks a second time within the database.
Some tables in the DWH will be Slowly Changing Dimensions, and there's no way to define a PK/FK on those using standard SQL syntax; you need something like TableA.column references TableB.column and TableA.Timestamp between TableB.ValidFrom and TableB.ValidTo (it is possible if you create a temporal table).
Sometimes a table is recreated or reloaded from scratch, which is hard to do if there's a FK referencing it.
Some PKs are never used for any access/join, so why implement them physically? It's just a huge overhead in CPU/IO/storage.
Knowledge about PKs/FKs is important for the optimizer, so there's a so-called Soft Foreign Key (REFERENCES WITH NO CHECK OPTION), which is a kind of dummy: it is applied during optimization, but never actually checked by the DBMS (it's like telling the optimizer "trust me, it's correct").
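For illustration, a sketch of such a soft foreign key; the table and column names are made up:

    -- Teradata: soft RI - available to the optimizer, but never enforced against the data
    ALTER TABLE fact_sales
      ADD FOREIGN KEY (customer_id)
      REFERENCES WITH NO CHECK OPTION dim_customer (customer_id);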
I've been recently assigned on a project using Teradata.
I've been told to strictly use DROP+CREATE instead of DELETE ALL, because the latter "leaves some space allocated someway". This is counter-intuitive to me, and I think it's probably wrong. I searched the web for a comparison between the two methods, but I found nothing.
This only reinforces my belief that DELETE ALL doesn't suffer from the issue above.
However, if this is the case, I must prove it (both practically and theoretically).
So, my question is: is there a difference in space allocation between the two methods? If not, is there an official document (user guide, technical specification, whatever else) that proves it?
Thank you!
There's a discussion here: http://teradataforum.com/teradata/20120403_105705.htm about the very same subject (although it does not really answer the "leaves some space allocated someway" part). They actually recommend DELETE ALL but for other (performance) reasons:
I'll quote just in case the link goes dead:
"Delete all" will be quicker, although being practical there often isn't a lot of difference in the performance of them.
However, especially for a process that is run regularly (say a daily batch process) then I recommend the "delete all" approach. This will do less work as it only removes the data and leaves the definition in place. Remember that if you remove the definition then this requires accessing multiple dictionary tables, and of course you then have to access those same tables (typically) when you re-create the object.
Apart from the performance aspect, the downside of the drop/create approach is that every time you create an object Teradata inserts "default rows" into the AccessRights table, even if subsequent access to the object is controlled via Role security and/or database level security. As you may well know the AccessRights table can easily get large and very skewed. In my experience many sites have a process which cleans this table on a regular basis, removing redundant rows. If your (typically batch) processes regularly drop/create objects, then you're simply adding rows into the table which have previously been removed by a clean process, and which will be removed in the future by the same process. This all sounds like a complete waste of time to me.
Your impression is correct: you didn't find any reference to "DELETE leaves some space allocated" anywhere because it's simply wrong :-)
DELETE ALL is similar to a TRUNCATE in other DBMSes and in most cases uses fast-path processing.
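For reference, the statement in question looks like this (the table name is hypothetical):

    -- Teradata: unconditional delete of every row; the table definition
    -- (and everything hanging off it, like access rights) stays in place
    DELETE FROM stage_sales ALL;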
First of all, you cannot do DROP/CREATE in one transaction in Teradata (in Oracle there are other problems with everyday DDL), so when ETL processes become complicated you might end up with dependencies where more important business processes depend on less important ones (for example, you might see the customers table empty just because the interest rates were not refreshed, or because one minor column received an oversized VARCHAR value).
My opinion: Use transactions and modular programming. In Teradata this means avoiding DDL where possible and using DELETE/UPDATE/MERGE/INSERT instead of DROP/CREATE.
We have a slightly different situation in Postgres where DDL statements are transactional.
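For contrast, here's a minimal sketch of what transactional DDL buys you in Postgres (the table and column names are made up):

    -- PostgreSQL: DDL takes part in the transaction and can be rolled back
    BEGIN;
    CREATE TABLE interest_rates (id INT PRIMARY KEY, rate NUMERIC);
    ALTER TABLE interest_rates ADD COLUMN valid_from DATE;
    ROLLBACK;  -- as far as any other session is concerned, the table never existed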
I read that wrapping a lot of SELECTs into BEGIN TRANSACTION/COMMIT was an interesting optimization.
But are these commands really necessary if I use "PRAGMA journal_mode = OFF" beforehand? (Which, if I remember correctly, disables the log and obviously the transaction system too.)
Note that I don't agree with BigMacAttack.
For SQLite, wrapping SELECTs in a transaction does do something:
it reduces the number of SHARED locks that are obtained and then dropped.
Reference:
http://www.mail-archive.com/sqlite-users%40sqlite.org/msg79839.html
So I think the transaction would also be beneficial even if you had journal_mode turned off, because there is still the locking overhead to consider.
Maybe read_uncommitted would be something you could consider - I would guess that it would disable the SHARED locking.
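To make the pattern concrete, a sketch (the table name is just a placeholder):

    -- SQLite: the SHARED lock is taken once for the whole batch of reads
    BEGIN;                            -- deferred by default; the lock is acquired on the first SELECT
    SELECT COUNT(*) FROM readings;
    SELECT MAX(value) FROM readings;
    COMMIT;                           -- lock released once, instead of once per statement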
"Use transactions – even if you’re just reading the data. This may yield a few milliseconds."
I'm not sure where the Katashrophos.net blog is getting this information, but wrapping SELECT statements in transactions does nothing. Transactions are always and only used when making changes to the database, and transactions cannot be disabled. They are a requirement. What many don't understand is that unless you manually BEGIN and COMMIT a transaction, each statement will automatically be put in its own unique transaction. Be sure to read the de facto SO question on improving sqlite performance. What the author of the blog might have been trying to say is that if you plan to do an INSERT, then a SELECT, then another INSERT, it would increase performance to manually wrap these statements in a single transaction. Otherwise sqlite will automatically put the two insert statements in separate unique transactions.
According to the "SQL as Understood by SQLite" documentation concerning transactions:
"No changes can be made to the database except within a transaction. Any command that changes the database (basically, any SQL command other than SELECT) will automatically start a transaction if one is not already in effect."
Lastly, disabling journaling via PRAGMA journal_mode = OFF does not disable transactions, only logging. But disabling the log is a good way to increase performance as well. Normally after each transaction, sqlite will document the transaction in the journal. When it doesn't have to do this, you get a performance boost.
UPDATE:
So it has been brought to my attention by "elegant dice" that the SQLite documentation statement I quote above is misleading. SELECT statements do in fact use the transaction system. This is used to acquire and release a SHARED lock on the database. As a result, it is indeed more efficient to wrap multiple SELECT statements in a single transaction. By doing so, the lock is only acquired and released once, rather than for each individual SELECT statement. This ends up being slightly more efficient while also assuring that all SELECT statements will access the same version of the database in case something has been added/deleted by some other program.
Background: I am using an SQLite database in my Flex application. The database is 4 MB in size and has 5 tables:
table 1 has 2,500 records
table 2 has 8,700 records
table 3 has 3,000 records
table 4 has 5,000 records
table 5 has 2,000 records.
Problem: Whenever I run a select query on any table, it takes around 50 seconds to fetch the data. This has made the application quite slow and unresponsive while it fetches data from the tables.
How can I improve the performance of the SQLite database so that the time taken to fetch the data is reduced?
Thanks
As I told you in a comment, without knowing what structures your database consists of and what queries you run against the data, there is nothing we can infer to suggest why your queries take so much time.
However, here is an interesting read about indexes: Use the Index, Luke!. It tells you what an index is, how you should design your indexes, and what benefits you can harvest.
Also, if you can post the queries and the table schemas and cardinalities (not the contents), maybe it could help.
Are you using asynchronous or synchronous execution mode? The difference between them is that asynchronous execution runs in the background while your application continues to run; your application then has to listen for a dispatched event and carry out any subsequent operations. In synchronous mode, however, the user will not be able to interact with the application until the database operation is complete, since those operations run in the same execution sequence as the application. Synchronous mode is conceptually simpler to implement, but asynchronous mode will yield better usability.
The first time you call SQLStatement.execute() on a SQLStatement instance, the statement is prepared automatically before executing. Subsequent calls will execute faster as long as the SQLStatement.text property has not changed. Reusing the same SQLStatement instances is better than creating new instances again and again. If you need to change your queries, then consider using parameterized statements.
You can also use techniques such as deferring what data you need at runtime. If you only need a subset of data, pull that back first and then retrieve other data as necessary. This may depend on your application scope and what needs you have to fulfill though.
Specifying the database along with the table names will prevent the runtime from checking each database to find a matching table if you have multiple databases, and it also prevents the runtime from choosing the wrong database. Do SELECT email FROM main.users; instead of SELECT email FROM users; even if you only have one single database. (main is automatically assigned as the database name when you call SQLConnection.open.)
If you happen to be writing lots of changes to the database (multiple INSERT or UPDATE statements), then consider wrapping them in a transaction. Changes will be made in memory by the runtime and then written to disk. If you don't use a transaction, each statement will result in multiple disk writes to the database file, which can be slow and consume lots of time.
Try to avoid any schema changes. The table definition data is kept at the start of the database file, and the runtime loads these definitions when the database connection is opened. Data added to tables is kept after the table definition data in the database file. If you make changes such as adding columns or tables, the new table definitions will be mixed in with the table data in the database file. The effect of this is that the runtime has to read the table definition data from different parts of the file rather than just from the beginning. The SQLConnection.compact() method restructures the table definition data so it is at the beginning of the file, but its downside is that it can also consume a lot of time, more so if the database file is large.
Lastly, as Benoit pointed out in his comment, consider improving the SQL queries and table structure that you're using. It would be helpful to know whether your database structure and queries are the actual cause of the slow performance or not. My guess is that you're using synchronous execution. If you switch to asynchronous mode, you'll see better performance, but that doesn't mean it has to stop there.
The Adobe Flex documentation online has more information on improving database performance and best practices working with local SQL databases.
You could try indexing some of the columns used in the WHERE clause of your SELECT statements. You might also try minimizing usage of the LIKE keyword.
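For example, something along these lines (the table and column names are just placeholders):

    -- Index the column(s) your WHERE clause filters on
    CREATE INDEX idx_users_email ON users (email);
    -- This lookup can now use the index instead of scanning the whole table
    SELECT id, name FROM users WHERE email = 'someone@example.com';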
If you are joining your tables together, you might try simplifying the table relationships.
Like others have said, it's hard to get specific without knowing more about your schema and the SQL you are using.
I came across the .import command to do this (bulk insert), but is there a query version of it which I can execute using sqlite3_exec()?
I would just like to copy a small text file contents into a table.
A query version of the one below:
".import demotab.txt mytable"
SQLite's performance doesn't benefit from bulk insert. Simply performing the inserts separately (but within a single transaction!) provides very good performance.
You might benefit from increasing sqlite's page cache size; that depends on the number of indexes and/or the order in which the data is inserted. If you don't have any indexes, for a pure insert, the cache size is likely not to matter much.
Be sure to use a prepared query, as opposed to regenerating a query plan in the innermost loop. It's extremely important to wrap the statements in a transaction, since this avoids the need for the filesystem to sync the database to disk - after all, a partially written transaction is atomically aborted anyhow, meaning that all fsync()'s are delayed until the transaction completes.
Finally, indexes will limit your insert performance, since maintaining them is somewhat expensive. If you're really dealing with a lot of data and start off with an empty table, it may be beneficial to add the indexes after loading the data - though this isn't a huge factor.
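As far as I know, .import is a feature of the sqlite3 command-line shell rather than of the SQL language, so there is no direct query equivalent. But since sqlite3_exec() will happily run several semicolon-separated statements, you can get the same effect by generating a script like the following from your text file (the column names are made up):

    -- One transaction around all the inserts: a single sync to disk at COMMIT
    BEGIN;
    INSERT INTO mytable (col1, col2) VALUES ('first line', 1);
    INSERT INTO mytable (col1, col2) VALUES ('second line', 2);
    -- ... one INSERT per line of demotab.txt ...
    COMMIT;
    -- If the table starts out empty, build indexes after the load
    CREATE INDEX idx_mytable_col1 ON mytable (col1);

If you drive this through the C API instead, binding parameters to a single prepared INSERT (sqlite3_prepare_v2 / sqlite3_bind_*) avoids re-parsing the SQL for every row, which is what the "prepared query" advice above refers to.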
Oh, and you might want to get one of those intel X25-E SSD's and ensure you have an AHCI controller ;-).
I'm maintaining an app with SQLite DBs holding about 500,000,000 rows (spread over several tables) - much of which was bulk inserted using plain old begin-insert-commit: it works fine.