Limiting number of rows in a table - sqlite

My table contains around 16 columns. I want to limit the number of rows to 10000. I can either:
1. Insert only if the current row count is less than 10000, or
2. Put a configuration limit (either through some external configuration file or through some dynamic parameter) on the maximum number of rows.
I prefer option 2 because it avoids checking the table size on every insert (my table is insert-intensive; reads are occasional). It would be useful if this limit could be set dynamically (for example via an sqlite3_limit()-like API), but an /etc/*-style configuration file would also do.
Is this possible on SQLite 3.7.7.1 and Linux (SLES 11)?
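As far as I know, SQLite does not provide a built-in per-table row-count limit, so option 1 is usually enforced with a trigger. A minimal sketch, using a hypothetical events table:

-- Hypothetical table for illustration.
CREATE TABLE events (
    id      INTEGER PRIMARY KEY,
    payload TEXT
);

-- Reject any insert that would push the table past 10000 rows.
-- Note: the COUNT(*) runs on every insert, which is exactly the overhead option 2 is meant to avoid.
CREATE TRIGGER events_row_cap
BEFORE INSERT ON events
WHEN (SELECT COUNT(*) FROM events) >= 10000
BEGIN
    SELECT RAISE(ABORT, 'row limit of 10000 reached');
END;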

Related

Data Insert into Multi Set Table Taking High CPU

Trying to run an initial bulk load into a MULTISET table, split into 10 INSERT SQLs based on a MOD of the identifier column. The first and second inserts run, but the third fails due to high CPU skew.
DBQLOGTBL shows the first SQL took about 10% CPU, the second took 30%, and the third was taking 50% CPU when it failed.
The number of records loaded in each split is roughly the same. According to the Explain plan, the step that fails is where Teradata does a MERGE into the main table using spool.
What could be the solution to this problem?
The table is a MULTISET table with a NUPI
Partitioned on a date column
Post-initial-load data volume will be 6 TB, so roughly 600 GB is being inserted in each of the 10 splits
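For context, a sketch of what one of the 10 MOD-based split INSERTs might look like (the table and column names here are hypothetical):

-- One of the 10 splits, picking rows whose identifier falls in bucket 2.
-- The other nine statements use the remainders 0, 1, 3, ..., 9.
INSERT INTO main_table
SELECT *
FROM   staging_table
WHERE  identifier_col MOD 10 = 2;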

How to overcome Row size too large (> 8126) error on Google-Cloud MySQL5.7 Second Generation

Google Cloud MySQL Engine supports the InnoDB storage engine only.
I am getting the following error when creating a table with 300 columns.
[Err] 1118 - Row size too large (> 8126).
Changing some columns to TEXT or BLOB may help. In the current row format, the BLOB prefix of 0 bytes is stored inline.
I tried creating the table with some columns as TEXT types and others as BLOB types as well, but it did not work.
Even modifying innodb_log_file_size is not possible, as it is not allowed on the Google Cloud-SQL Platform.
"Vertical Partitioning"
A table with lots of columns is pushing several limits; you hit one of them. There are several reasonable workarounds; vertical partitioning may be the best, especially if many of the columns are TEXT/BLOB.
Instead of a single table, have multiple tables with the same PRIMARY KEY, except that one may be AUTO_INCREMENT. JOIN them together as needed to collect the columns. You could even have VIEWs to hide the fact that you split up the table. I recommend grouping the columns logically, based on the application and which columns are needed 'together'.
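A minimal sketch of that layout, with hypothetical table and column names:

-- Frequently used, short columns in one table.
CREATE TABLE widget_core (
    widget_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name      VARCHAR(100) NOT NULL,
    status    TINYINT NOT NULL
) ENGINE=InnoDB;

-- Bulky TEXT/BLOB columns in a second table sharing the same PRIMARY KEY.
CREATE TABLE widget_details (
    widget_id   INT UNSIGNED NOT NULL PRIMARY KEY,   -- same key, no AUTO_INCREMENT here
    description TEXT,
    notes       TEXT,
    FOREIGN KEY (widget_id) REFERENCES widget_core (widget_id)
) ENGINE=InnoDB;

-- Optional view to hide the split from application queries.
CREATE VIEW widget AS
SELECT c.widget_id, c.name, c.status, d.description, d.notes
FROM   widget_core AS c
LEFT JOIN widget_details AS d USING (widget_id);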
Do not splay an array of things across columns (for example address1, state1, country1, address2, state2, country2); instead, have another table with multiple rows to handle the repetition.
Do not use CHAR or BINARY except for truly fixed-length columns. Most such columns are very short. Also, most CHAR columns should be CHARACTER SET ascii, not utf8. (Think country_code, zipcode, md5.)
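A sketch illustrating both points, with a hypothetical child table replacing the repeated address columns:

-- One row per address instead of address1/state1/country1, address2/state2/country2.
CREATE TABLE person_address (
    person_id    INT UNSIGNED NOT NULL,
    seq          TINYINT UNSIGNED NOT NULL,       -- 1 for the first address, 2 for the second, ...
    address      VARCHAR(255),
    state        VARCHAR(50),
    country_code CHAR(2) CHARACTER SET ascii,     -- truly fixed-length, so CHAR + ascii is appropriate
    PRIMARY KEY (person_id, seq)
) ENGINE=InnoDB;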
innodb_log_file_size is only indirectly related to your question. What is its value?
Directly related is innodb_page_size, which defaults to 16K, and virtually no one ever changes. I would expect Cloud Engines to prohibit changing it.
(I'm with Bill on desiring more info about your schema -- so we can be more specific about how to help you.)
You don't have much option here. InnoDB default page size is 16KB, and you must design your tables so at least two rows fit in a page. That's where the limit of 8126 bytes per row comes from.
Variable-length columns like VARCHAR, VARBINARY, BLOB, and TEXT can be longer, because data exceeding the row size limit can be stored on extra pages. To take advantage of this, you must enable the Barracuda table format, and choose ROW_FORMAT=DYNAMIC.
In config:
[mysqld]
innodb_file_per_table = ON
innodb_file_format = Barracuda
innodb_default_row_format = DYNAMIC
I don't know if these settings are already enabled in Google Cloud SQL, or if they allow you to change these settings.
Read https://dev.mysql.com/doc/refman/5.7/en/innodb-row-format.html for more information.
Again, the advantage of DYNAMIC row format only applies to variable-length data types. If you have 300 columns that are fixed-length, like CHAR, then it doesn't help.
By the way, innodb_log_file_size has nothing to do with this error about row size.
In order to do what you want to do on a Cloud SQL instance, first off run this to set the innodb_strict_mode variable:
SET innodb_strict_mode = 0;
After that you should be able to create your table.

Organizing tables with data-heavy rows to optimize access times

I am working with a sqlite3 database of around 70 gigabytes right now. This db has three tables: one with about 30 million rows, and two more with ~150 and ~300 million each, with each table running from 6-11 columns.
The table with the fewest rows is consuming the bulk of the space, as it contains a raw data column of zipped BLOBs, generally running between 1 and 6 kilobytes per row; all other columns in the database are numeric, and the zipped data is immutable so inefficiency in modification is not a concern.
I have noticed that creating indexes on the numeric columns of this table:
[15:52:36] Query finished in 723.253 second(s).
takes several times as long as creating a comparable index on the table with five times as many rows:
[15:56:24] Query finished in 182.009 second(s).
[16:06:40] Query finished in 201.977 second(s).
Would it be better practice to store the BLOB data in a separate table to access with JOINs? The extra width of each row is the most likely candidate for the slow scan rate of this table.
My current suspicions are:
This is mostly due to the way data is read from disk, making skipping medium-sized amounts of data impractical and yielding a very low ratio of usable data per sector read from the disk by the operating system, and
It is therefore probably standard practice (one that I, as a relative newcomer to relational databases, did not know about) to avoid putting larger, variable-width data in the same table as other data that may need to be scanned without indices,
but I would appreciate some feedback from someone with more knowledge in the field.
In the SQLite file format, all the column values in a row are simply appended together, and stored as the row value. If the row is too large to fit into one database page, the remaining data is stored in a linked list of overflow pages.
When SQLite reads a row, it reads only as much as needed, but must start at the beginning of the row.
Therefore, when you have a blob (or a large text value), you should move it to the end of the column list so that it is possible to read the other columns' values without having to go through the overflow page list:
CREATE TABLE t (
    id   INTEGER PRIMARY KEY,
    a    INTEGER,
    [...],
    i    REAL,
    data BLOB NOT NULL
);
With a single table, the first bytes of the blob value are still stored inside the table's database pages, which decreases the number of rows that can be stored in one page.
If the other columns are accessed often, then it might make sense to move the blob to a separate table (a separate file should not be necessary). This allows the database to go through more rows at once when reading a page, but increases the effort needed to look up the blob value.
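A sketch of that split, reusing the hypothetical column names from above:

-- Numeric columns in one table; the blob keyed by the same id in another.
CREATE TABLE t_meta (
    id INTEGER PRIMARY KEY,
    a  INTEGER,
    i  REAL
);

CREATE TABLE t_data (
    id   INTEGER PRIMARY KEY REFERENCES t_meta(id),
    data BLOB NOT NULL
);

-- Scans over the numeric columns never touch the blob pages:
--   SELECT id, a, i FROM t_meta WHERE a BETWEEN 10 AND 20;
-- The blob is read only when actually needed:
--   SELECT data FROM t_data WHERE id = ?;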

Is there a row limit when using a SQLite's INSERT INTO with SELECT query?

I'm attaching a database (B) to another database (A) and trying to populate an empty table in A by doing something like:
INSERT INTO table SELECT * FROM B.table
SQLite's documentation mentions this, but it doesn't mention any limit on the number of rows returned by the SELECT statement (or processable by an INSERT statement in this particular scenario).
Is there any limit on this number of rows, or can I assume that all rows returned by the SELECT query will indeed be inserted?
(please note that I'm not looking for alternative ways of copying the data, I really just want to know whether or not I may bump into any unexpected limits here)
There is no limit, other than the general limits for SQLite, which can be seen on this page: https://www.sqlite.org/limits.html. For instance:
The theoretical maximum number of rows in a table is 2^64 (18446744073709551616 or about 1.8e+19). This limit is unreachable since the maximum database size of 140 terabytes will be reached first. A 140 terabytes database can hold no more than approximately 1e+13 rows, and then only if there are no indices and if each row contains very little data.
And since you are getting rows from a SQLite table, there is no practical limit.
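For completeness, a sketch of the attach-and-copy flow described in the question (the database file and table names are hypothetical):

-- Copy every row from the attached database B into the empty table in A.
ATTACH DATABASE 'b.db' AS B;

BEGIN;
INSERT INTO main.items SELECT * FROM B.items;
COMMIT;

DETACH DATABASE B;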

Should I be worried about the settings table getting huge?

I have a pretty fat settings table in SQL Server 2012, now with over 100 columns. As the name suggests, this table keeps track of all kinds of setting values within our website. It used to have fewer than 50 columns, but its size has since doubled.
The reason why I store setting values into database is because users will need to have ability to change these settings via UI.
Should I really be worried about this table getting bigger and bigger over time? Or will I have to find some other way to store settings data, e.g. saving it into files, perhaps?
First, you don't need to store settings in a database for users to be able to update them at runtime. You can simply store them in a settings file that gets updated whenever the user makes changes; an XML config file works well for this.
If, however, the application is network based, and you want the settings to follow the user from machine to machine, it makes more sense to put it in a database.
Second, yes... 100 columns is huge. Instead of storing each setting in a separate column, you might consider storing each setting in a separate row, and then have a common row format which is ID, SettingName, SettingValue, (maybe) DefaultValue. Then your table can grow as large as you like.
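A minimal sketch of that row-per-setting layout, in SQL Server syntax with hypothetical names:

CREATE TABLE dbo.Settings (
    SettingId    INT IDENTITY(1,1) PRIMARY KEY,
    SettingName  NVARCHAR(100) NOT NULL UNIQUE,
    SettingValue NVARCHAR(MAX) NULL,
    DefaultValue NVARCHAR(MAX) NULL
);

-- Reading a single setting:
-- SELECT SettingValue FROM dbo.Settings WHERE SettingName = 'HomePageTitle';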
We are using JSON to store user settings. The table contains only two columns: the user ID and the settings string. This string is quite long, but that doesn't matter. You can also use XML to store this data.
This approach is harder when you need to modify the data by hand, but it is faster to fetch from your DB and process on the client or on the ASP.NET server.
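A sketch of that two-column layout, again in SQL Server syntax with hypothetical names and sample JSON:

CREATE TABLE dbo.UserSettings (
    UserId       INT NOT NULL PRIMARY KEY,
    SettingsJson NVARCHAR(MAX) NOT NULL    -- e.g. '{"theme":"dark","pageSize":25}'
);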
I am imagining that you are concerned about performance on huge tables?
One question is how many rows are in this table. 100 columns with 10,000 rows is not a real problem; 100 columns over 10 million rows is a slightly different ballgame. Not worse or better, just different.
The same considerations apply for small and large tables:
1. Are you indexing properly?
2. Is your IO fine?
3. Is your space fine?
4. Are you querying efficiently?
There is no right answer to this; it depends on why you have big column counts and whether that is hitting your overall performance.
We run thousands of tables with more than 150 columns with no problems, even with millions of rows among them, and I can't complain about performance.
And this is relatively de-normalized data, so lots of text.
