I have an SQLite database which is about 75 GB. It takes almost 4 hours to create an index on the database. After indexing, the file size is about 100 GB.
Often, I have to modify (insert/delete/update) large chunks (a few GB) of data. As of now, I am deleting the index before modifying the tables. After the modifications are complete, the indexes are recreated.
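For reference, the modification cycle currently looks roughly like this (the table and index names are just placeholders):
DROP INDEX IF EXISTS idx_data_key;
-- bulk INSERT / UPDATE / DELETE statements against the large table go here
CREATE INDEX idx_data_key ON data(key);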
Dropping an index takes a huge amount of time (of the same order as creating it).
In some very special cases (when the entire data set needs to be regenerated), I am able to write to a new database file and replace the original one with it. This strategy does not require me to drop the indexes.
What can I do to speed up index deletion in cases where I cannot just switch the database files? Any suggestions/ideas are welcome.
This is, I think, one of the limitations of single-file databases. If tables/indexes were stored in separate files, those files could simply be marked as deleted.
I am using SQLite because this needs to be cross-platform. I have about 10 tables with a small amount of data (maybe a few dozen rows each), but I also have a set of data which might have a million or more rows.
The small dataset isn't really modified that much, just queried, but the large data set will be queried and modified frequently.
Rather than have a single SQLite database with all the tables in it, I was wondering if splitting it into two databases might be smartest.
Basically I'd have one database, let's call it "settings", with the 10 tables in it. I'd then have another database, let's call it "userdata", with the million rows.
I'll be creating a third database called "audits" where I record each change to the "userdata" database. This database is expected to grow (for a short time period).
I am just wondering if people have an opinion as to whether it is a good idea to split my data into multiple databases or if I should just have one massive one.
My thinking is the queries on the "userdata" database might be slightly more efficient since it will only have one table.
Note, this is not for the long term; it is for a short period of time. The data will be queried and edited for about a week, and then it is done.
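If it affects the decision: from what I understand, even with separate files SQLite can still query across them from one connection via ATTACH. A minimal sketch of what I have in mind (file, table, and column names are hypothetical):
ATTACH DATABASE 'userdata.db' AS userdata;
-- a query that spans both files in a single statement
SELECT u.id, u.value
FROM userdata.records AS u
JOIN main.preferences AS p ON p.key = u.setting_key;
DETACH DATABASE userdata;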
I have been playing with a database: I imported over a million records, played with some functions, and at the end I selected only one value and deleted the rest. Now my database is over 200 MB in size. I am doing this in sqlite3. How can I reduce its size?
Your database is probably still reserving the space from your previous records.
This is essentially the same question:
Why does clearing an SQLite database not reduce its size?
The accepted answer:
When an object (table, index, trigger, or view) is dropped from the database, it leaves behind empty space. This empty space will be reused the next time new information is added to the database. But in the meantime, the database file might be larger than strictly necessary. Also, frequent inserts, updates, and deletes can cause the information in the database to become fragmented - scattered out all across the database file rather than clustered together in one place.
The VACUUM command cleans the main database by copying its contents to a temporary database file and reloading the original database file from the copy. This eliminates free pages, aligns table data to be contiguous, and otherwise cleans up the database file structure.
Edit: you may want to research the pragma command 'auto_vacuum' if you expect to be doing this regularly. It will keep your file size down, but it has some pros and cons. In a production environment, it is best to reserve more space than you need, as this reduces the risk of running out of disk space on the server.
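For reference, a minimal sketch of both options (how much space is reclaimed will depend on your data):
VACUUM;                      -- rebuilds the file and releases the free pages
PRAGMA auto_vacuum = FULL;   -- reclaims space automatically after future deletes;
VACUUM;                      -- on an existing database this only takes effect after a VACUUM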
I am working with a sqlite3 database of around 70 gigabytes right now. This db has three tables: one with about 30 million rows, and two more with ~150 and ~300 million each, with each table running from 6-11 columns.
The table with the fewest rows is consuming the bulk of the space, as it contains a raw data column of zipped BLOBs, generally running between 1 and 6 kilobytes per row; all other columns in the database are numeric, and the zipped data is immutable so inefficiency in modification is not a concern.
I have noticed that creating indexes on the numeric columns of this table:
[15:52:36] Query finished in 723.253 second(s).
takes several times as long as creating a comparable index on the table with five times as many rows:
[15:56:24] Query finished in 182.009 second(s).
[16:06:40] Query finished in 201.977 second(s).
Would it be better practice to store the BLOB data in a separate table accessed with JOINs? The extra width of each row is the most likely culprit for the slow scan rate of this table.
My current suspicions are:
This is mostly due to the way data is read from disk, which makes skipping medium-sized amounts of data impractical and yields a very low ratio of usable data per sector read from the disk by the operating system, and
It is therefore probably standard practice (one I did not know about, as a relative newcomer to relational databases) to avoid putting large, variable-width data in the same table as other data that may need to be scanned without indices,
but I would appreciate some feedback from someone with more knowledge in the field.
In the SQLite file format, all the column values in a row are simply appended together, and stored as the row value. If the row is too large to fit into one database page, the remaining data is stored in a linked list of overflow pages.
When SQLite reads a row, it reads only as much as needed, but must start at the beginning of the row.
Therefore, when you have a blob (or a large text value), you should move it to the end of the column list so that it is possible to read the other columns' values without having to go through the overflow page list:
CREATE TABLE t (
id INTEGER PRIMARY KEY,
a INTEGER,
[...],
i REAL,
data BLOB NOT NULL
);
With a single table, the first bytes of the blob value are still stored inside the table's database pages, which decreases the number of rows that can be stored in one page.
If the other columns are accessed often, then it might make sense to move the blob to a separate table (a separate file should not be necessary). This allows the database to go through more rows at once when reading a page, but increases the effort needed to look up the blob value.
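A minimal sketch of that layout (the table and column names are just placeholders):
CREATE TABLE t (
id INTEGER PRIMARY KEY,
a INTEGER,
i REAL
);
CREATE TABLE t_blob (
id INTEGER PRIMARY KEY REFERENCES t(id),
data BLOB NOT NULL
);
-- scans over the numeric columns never touch the blob pages
SELECT a, i FROM t WHERE a > 100;
-- the blob is read only when it is actually needed
SELECT t.a, b.data FROM t JOIN t_blob AS b ON b.id = t.id WHERE t.id = 42;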
I created a SQLite database as a temporary database (by specifying an empty filename in sqlite3_open()). I also limited the amount of memory that can be used by the database to about 200 KB using sqlite3_soft_heap_limit64().
As I keep inserting rows, I see that the memory used by the database (obtained via sqlite3_memory_used()) goes up until it reaches 200 KB, and after that stays constant, even with continued row insertions.
Where is the extra row space for insertions after reaching the limit allocated from? Is it a temporary file created on disk? I want to know the location so that I can control/configure any space issues that may result from such allocation.
I dug into SQLite code to see how this is done. On Unix platforms, SQLite will try to create a temporary file in either:
the location pointed to by the SQLite global variable: sqlite3_temp_directory
the directory pointed to by any of the environment variables 'TEMP', 'TMP', or 'TMPDIR', in that order.
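If you want to steer this yourself, there are a couple of knobs; a minimal sketch (the path is just an example, and the directory pragma is deprecated in favor of setting the sqlite3_temp_directory global from C before opening any connections):
PRAGMA temp_store = MEMORY;                       -- keep temporary tables and indices in memory
PRAGMA temp_store_directory = '/path/with/space'; -- deprecated; sets sqlite3_temp_directory for on-disk temp files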
I'm writing an application which produces a lot of data to store in a database.
The DB schema is very simple: it's a table with just 4 columns, but I must fill it with more than 30000 rows.
I'm using SQLite and QSql as API.
Data is produced very fast (no sleeps) and I'm using QSqlQuery to insert one row at a time.
However, it seems to take 7-8 seconds to store 100 rows (I'm using QTime for timing).
I tried using QSqlTableModel but I noticed no performance improvements, even calling QSqlTableModel::submitAll every 1000 rows (QTime shows 70-80 seconds for 1000 rows).
Is there any way to store rows faster? What is the fastest way to fill a table with SQLite?
You could try looking at whether you've got transactions set up correctly; they're expensive because they have to sync to disk to commit.
Also bear in mind that SQLite is more heavily optimized for reading anyway.
You might try dropping any indexes at the start and then adding them back after all records have been imported. Results will of course vary depending on whether you're emptying the table first or just appending new records.
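A minimal sketch of both suggestions in plain SQL (with Qt, QSqlDatabase::transaction() and commit() issue the equivalent statements; the table and index names are placeholders):
DROP INDEX IF EXISTS idx_mytable_value;        -- optional: drop indexes before the bulk load
BEGIN;                                         -- one transaction for the whole batch
INSERT INTO mytable VALUES (1, 'a', 2.0, 'x');
INSERT INTO mytable VALUES (2, 'b', 3.0, 'y');
-- ... thousands more inserts ...
COMMIT;                                        -- one sync to disk instead of one per row
CREATE INDEX idx_mytable_value ON mytable(value);   -- recreate the index afterwards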