I need to write single lines to a file placed on a file server. I am considering using SQLite to ensure that each line is written completely rather than only partially. However, insert performance is essential. My question is: what is the exact process (read this, write that, read this, and so on) that SQLite goes through when it inserts a row into a table? The table has no indexes, primary keys, constraints, or anything else.
This is the most common bottleneck in my experience:
http://www.sqlite.org/faq.html#q19
Except for that, SQLite is really fast. :)
You should use transactions to avoid an fsync() on every INSERT; take a look here for some benchmarks. (A short sketch combining transactions with the pragmas listed below follows after the list.)
Also, be sure to properly set the two important pragmas:
synchronous (NORMAL)
journal_mode (WAL)
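For illustration, here is a minimal sketch that combines both suggestions (one transaction around a batch of INSERTs, plus the two pragmas), using the better-sqlite3 Node.js bindings; any other SQLite driver works the same way. The lines.db file and the log table are made up for the example.

// Sketch only: batching INSERTs in one transaction, with WAL + synchronous=NORMAL.
// Assumes better-sqlite3 is installed; the file and table names are hypothetical.
import Database from 'better-sqlite3';

const db = new Database('lines.db');
db.pragma('journal_mode = WAL');      // write-ahead logging
db.pragma('synchronous = NORMAL');    // fewer syncs than the default FULL

db.exec('CREATE TABLE IF NOT EXISTS log (line TEXT)');
const insert = db.prepare('INSERT INTO log (line) VALUES (?)');

// db.transaction() wraps the callback in BEGIN/COMMIT, so the whole batch
// is committed once instead of once per INSERT.
const insertMany = db.transaction((lines: string[]) => {
  for (const line of lines) insert.run(line);
});

insertMany(['first line', 'second line', 'third line']);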
You can use EXPLAIN to see the details of what happens when you execute a statement: http://www.sqlite.org/lang_explain.html
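For example (just a sketch, reusing the hypothetical log table from the snippet above), EXPLAIN can be run through any driver and returns the bytecode program as ordinary rows:

// Sketch: inspecting the VDBE program SQLite runs for an INSERT.
// Assumes better-sqlite3 and a hypothetical log(line TEXT) table.
import Database from 'better-sqlite3';

const db = new Database('lines.db');
db.exec('CREATE TABLE IF NOT EXISTS log (line TEXT)');

const program = db.prepare("EXPLAIN INSERT INTO log (line) VALUES ('hello')").all();
for (const op of program) console.log(op); // each row has addr, opcode, p1..p5, etc.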
What's the difference between the sqlite and better-sqlite3 implementations? I have to use better-sqlite3 to create a database for a form (with only Node.js and Express), but the only clear example I found uses sqlite. Is there any difference? If not, thanks. Otherwise, do you know any useful links for databases and forms with better-sqlite3?
Thanks
In better-sqlite3, you can register custom functions and aggregate functions written in JavaScript, which you can run from within SQL queries.
In better-sqlite3, you can iterate through the cursor of a result set, and then stop whenever you want (you don't have to load the entire result set into memory).
In better-sqlite3, you can conveniently receive query results in many different formats (here, here, and here).
In better-sqlite3, you can safely work with SQLite's 64-bit integers, without losing precision due to JavaScript's number format.
See https://github.com/JoshuaWise/better-sqlite3/issues/262. A small sketch of these features follows below.
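Here is a small, non-authoritative sketch of those four points with better-sqlite3; the people table and its data are invented for the example.

// Sketch of the better-sqlite3 features listed above; table and data are hypothetical.
import Database from 'better-sqlite3';

const db = new Database('demo.db');
db.exec('CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER, balance INTEGER)');

// 1. Custom JavaScript functions callable from within SQL.
db.function('shout', (s: string) => s.toUpperCase());
db.prepare("INSERT INTO people VALUES (shout('alice'), 30, 9007199254740993)").run();

// 2. Iterate through a result set and stop early, without loading it all into memory.
for (const row of db.prepare('SELECT * FROM people').iterate()) {
  console.log(row);
  break; // stop whenever you want
}

// 3. Receive results in different formats: single column values or raw arrays.
const names = db.prepare('SELECT name FROM people').pluck().all();
const rows = db.prepare('SELECT * FROM people').raw().all();

// 4. Safe 64-bit integers: values come back as BigInt instead of a lossy Number.
const big = db.prepare('SELECT balance FROM people').safeIntegers().get();
console.log(names, rows, big);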
One important difference is that better-sqlite3 allows synchronous SQLite queries; with sqlite, you can't do that.
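As a rough comparison (the file, table, and query are made up; assumes the sqlite3 and better-sqlite3 packages plus their type definitions), the same lookup looks like this in each library:

// Sketch: callback-based sqlite3 vs synchronous better-sqlite3; names are hypothetical.
import sqlite3 from 'sqlite3';
import Database from 'better-sqlite3';

// better-sqlite3: synchronous, rows are returned directly.
const syncDb = new Database('compare.db');
syncDb.exec('CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER)');
syncDb.prepare('INSERT INTO people VALUES (?, ?)').run('alice', 30);
const row = syncDb.prepare('SELECT name FROM people WHERE age > ?').get(18);
console.log('sync result:', row);

// sqlite3: asynchronous, the result arrives in a callback.
const asyncDb = new sqlite3.Database('compare.db');
asyncDb.get('SELECT name FROM people WHERE age > ?', [18], (err: Error | null, r: any) => {
  if (err) throw err;
  console.log('async result:', r);
});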
Is it possible to insert several entities into the DB with a single query?
When I use an example from here, I can see several queries in the web debugger.
UPDATED 23.08.2012
I found the following related links. I hope they will help someone understand batch processing:
http://www.doctrine-project.org/blog/doctrine2-batch-processing.html
doctrine2 - How to improve flush efficiency?
Doctrine 2: weird behavior while batch processing inserts of entities that reference other entities
The main things:
Some people seem to be wondering why Doctrine does not use multi-inserts (insert into (...) values (...), (...), (...), ...).
First of all, this syntax is only supported on MySQL and newer PostgreSQL versions. Secondly, there is no easy way to get hold of all the generated identifiers in such a multi-insert when using AUTO_INCREMENT or SERIAL, and an ORM needs the identifiers for identity management of the objects. Lastly, insert performance is rarely the bottleneck of an ORM. Normal inserts are more than fast enough for most situations, and if you really want to do fast bulk inserts, then a multi-insert is not the best way anyway; Postgres COPY or MySQL LOAD DATA INFILE are several orders of magnitude faster.
These are the reasons why it is not worth the effort to implement an abstraction that performs multi-inserts on MySQL and PostgreSQL in an ORM.
I hope that clears up some question marks.
I think that there will be several INSERT statements, but only one transaction against the database per "flush" call.
As mentioned here
http://doctrine-orm.readthedocs.org/en/2.0.x/reference/working-with-objects.html
Each "persist" will add an operation to the current UnitOfWork, then it is the call to EntityManager#flush() which will actually write to the database (encapsulating all the operation of the UnitOfWork in a single transaction).
But I have not checked that the behavior I describe above is the actual behavior.
Best regards,
Christophe
In looking up how to perform an equivalent to SELECT TOP 5 with LINQ-to-SQL, all the answers I've seen suggest using .Take(), like so:
var myObjects = (
    from myObject in repository.GetAllMyObjects()
    select myObject)
    .Take(10);
I don't yet understand most of how LINQ works behind the scenes, but to my understanding of C-like languages this would resolve by first building a temporary array containing ALL records and then copying the first 10 elements of that array into the variable. That's not such a problem if you're working on a small dataset or without any performance constraints, but it seems horribly inefficient to me if you're, for example, selecting the most recent 5 log entries from a table that can contain millions of records.
Is my understanding of how this works wrong? If so, could someone explain what actually happens? Otherwise, what (if any) better (i.e. more efficient) way is there of selecting only x records through LINQ-to-SQL?
[edit]
I now have the hypothetical myObject class sending its LINQ-to-SQL output to the debug output, as per the suggestion in the accepted answer. I ended up using the DebuggerWriter from here: http://www.u2u.info/Blogs/Kris/Lists/Posts/Post.aspx?ID=11
Your assumption is incorrect. With LINQ-to-SQL, the query evaluates to an Expression<Func<...>>, which can then be translated into the proper SQL. You do not need to worry about it loading all the records.
Also, see this following question. You can attach a stream to your DataContext and see the SQL generated.
How to get the TSQL Query from LINQ DataContext.SubmitChanges()
LINQ uses deferred execution, and, for LINQ-to-SQL, expression trees.
No query will be executed until you enumerate the result of the Take call, so you don't need to worry about anything.
I just went through this last week! I opened the SQL profiler on my dev database and stepped through the code. It was very interesting to see the generated SQL for the various queries. I recommend you do the same. It may not be an exact answer to your question, but it was certainly enlightening to see how your various components generate entirely different SQL statements depending on the contents of the call.
I believe the MSDN reading on "deferred query execution" would be enlightening as well.
In our application, many pages include an "update", and when we update a table we also update unnecessary columns that don't change.
I want to know whether there is a way to avoid these unnecessary column updates. We use stored procedures in .NET 2003. In the following link I found a solution, but it is not for stored procedures:
http://blogs.msdn.com/alexj/archive/2009/04/25/tip-15-how-to-avoid-loading-unnecessary-properties.aspx
Thanks
You can really only accomplish this with a good ORM tool that generates the update query for you. It will typically look at what changed and generate the query for only the columns that changed.
If you're using a stored procedure, then all of the column values get sent over to the database anyway when you call it, so you can't save anything there. The SP will probably just execute a run-of-the-mill UPDATE statement. The RDBMS then takes over; it won't physically change the data on disk if it's not different. It's smart enough for that.
So my answer, in short: don't worry about it. It's not really a big deal, it requires drastic changes to get what you want, and you won't even see performance benefits.
When I was working at a financial software company, performance was vital. Some tables had hundreds of columns, and the update statements were costly. We created our own ORM layer (in java) which included an object cache. When we generated the update statement, we compared the current values of every field to the values as they were on load and only updated the changed fields.
Our DB was SQL Server. I do not remember the exact performance improvement, but it was substantial and worth the investment. We also did bulk inserts and updates where possible.
I believe that Hibernate and the other big ORMs all do this sort of thing for you, if you do not want to write one yourself.
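As a rough illustration of that idea (this is not the ORM described above; the class, table, and columns are invented), a dirty-checking layer can keep a snapshot of the values as loaded and emit an UPDATE containing only the columns that actually changed:

// Sketch of dirty checking: compare current values to a snapshot taken at load time
// and generate an UPDATE containing only the changed columns. All names are invented.
type Row = Record<string, string | number | null>;

class TrackedRow {
  private snapshot: Row;

  constructor(public table: string, public idColumn: string, public values: Row) {
    this.snapshot = { ...values }; // remember values as they were on load
  }

  // Build an UPDATE statement (with bind parameters) for changed columns only.
  buildUpdate(): { sql: string; params: unknown[] } | null {
    const changed = Object.keys(this.values).filter(
      (col) => col !== this.idColumn && this.values[col] !== this.snapshot[col],
    );
    if (changed.length === 0) return null; // nothing to do

    const setClause = changed.map((col) => `${col} = ?`).join(', ');
    const params = [...changed.map((col) => this.values[col]), this.values[this.idColumn]];
    return {
      sql: `UPDATE ${this.table} SET ${setClause} WHERE ${this.idColumn} = ?`,
      params,
    };
  }
}

// Usage: only `balance` changed, so only that column appears in the UPDATE.
const account = new TrackedRow('accounts', 'id', { id: 1, owner: 'alice', balance: 100 });
account.values.balance = 250;
console.log(account.buildUpdate());
// -> { sql: 'UPDATE accounts SET balance = ? WHERE id = ?', params: [ 250, 1 ] }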
As the title says: is it possible? I accidentally deleted a record due to my ugly HTML interface in Firefox. The bad part is that the deleted record was a root folder, so the program automatically cascade-deleted everything under it :(
Take a look at undark; I have already used it. It can export the rows (deleted or not) from an SQLite db file if the records were not overwritten. The latest version is here.
The SQLite-Deleted-Records-Parser does not give the same type of output, but can be useful.
And there are also some products like the SQLite Forensic Explorer, SQLite Repair, Sqlite Database Recovery and SQLiteDoctor.
If you are a developer, you can avoid having the same problem again by using litereplica. It adds single-master replication to SQLite.
But remember to enable point-in-time recovery, because the transactions are replicated to the replicas, so an accidental command like DROP TABLE or DELETE FROM will be replicated as well. With PITR you will be able to go back to a previous point in time.
Or use the Backup API regularly (a small sketch follows below), although it transfers the entire db on each backup.
And remember: if you copy an SQLite file or use a regular backup approach while a transaction is active, the copy can be corrupted.
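Here is a minimal sketch of such a regular backup, assuming the better-sqlite3 bindings, whose backup() wraps SQLite's Online Backup API and therefore produces a consistent copy even while the database is in use; the file names and the interval are made up.

// Sketch: periodic backup using better-sqlite3's backup(). Names are hypothetical.
import Database from 'better-sqlite3';

const db = new Database('app.db');

async function backupNow(): Promise<void> {
  const target = `backup-${Date.now()}.db`;
  await db.backup(target); // copies the whole database file
  console.log('backup written to', target);
}

// e.g. once per hour
setInterval(() => {
  backupNow().catch((err) => console.error('backup failed:', err));
}, 60 * 60 * 1000);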
Sorry -- nope. Backups are the only option I know of.
In the future, consider never issuing DELETE queries, especially from user-accessible forms (let only the DB admin do it, if anyone) -- just include a field in your tables that marks a record as inactive and then factor that into your queries in the WHERE clause (see the sketch below).
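A minimal sketch of that soft-delete idea (the folders table, the is_active column, and the better-sqlite3 bindings are only for illustration):

// Sketch: mark rows inactive instead of deleting them. Names are hypothetical.
import Database from 'better-sqlite3';

const db = new Database('app.db');
db.exec('CREATE TABLE IF NOT EXISTS folders (id INTEGER PRIMARY KEY, name TEXT, is_active INTEGER DEFAULT 1)');

// Instead of DELETE FROM folders WHERE id = ?, flip the flag:
const softDelete = db.prepare('UPDATE folders SET is_active = 0 WHERE id = ?');

// ...and factor the flag into every read:
const listActive = db.prepare('SELECT id, name FROM folders WHERE is_active = 1');

softDelete.run(42);           // the row is still there, just hidden
console.log(listActive.all());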
Unfortunately I don't know of a way either. However, until you run a VACUUM on the SQLite database file, the deleted data is generally not technically removed. You might still be able to recover some of the data using some sort of hex editor on the file.
It might be possible to go in and see the data via a hex editor. The only info I could find said that the metadata was gone, so the records weren't going to come back, but the data itself might still be there. It comes down to how important the data is; I suspect it's not important enough for you to dig out a hex editor.
The data isn't always removed from the file straightaway. If there's lots of it and you're desperate, you could use the UNIX command strings on the file. This may help you to recover various bits and pieces of human-readable data, but it'll be a hard and inaccurate process.
No way. Without a working backup you won't be able to restore this.