I need a sqlite3 database with over 1000 attached databases, but the documentation says "The maximum number of attached databases cannot be increased above 125".
Some earlier versions of the documentation mention a limit of 63.
Is this upper limit artificial? I am able to compile SQLite after removing the code that enforces the limit, but is it safe?
Is the limit there purely for performance, or is there a deeper, hidden problem once you cross 125?
There are lots of deeper, hidden problems. For example, a cursor uses a signed 8-bit integer to hold the index of the database it refers to. 127 is the maximum value for a signed 8-bit integer.
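For reference, you can probe the limit at runtime before patching anything; a minimal sketch using Python's built-in sqlite3 module (Connection.getlimit requires Python 3.11+), attaching in-memory databases until SQLite refuses:

import sqlite3

conn = sqlite3.connect(":memory:")
# A stock build defaults to 10 attached databases; SQLITE_MAX_ATTACHED can be
# raised at compile time, but never above the hard ceiling of 125.
print("attached-db limit:", conn.getlimit(sqlite3.SQLITE_LIMIT_ATTACHED))

count = 0
try:
    while True:
        conn.execute(f"ATTACH DATABASE ':memory:' AS aux{count}")
        count += 1
except sqlite3.OperationalError as exc:
    print(f"attached {count} databases before: {exc}")  # "too many attached databases"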
I'm working on a large-scale component that generates unique/opaque tokens representing business entities. Over time there will be many billions of these records, but for the first year we're not expecting growth to exceed 2 billion individual items (probably less than 500 million).
The system itself is horizontally scaled but needs token generation to be idempotent; data integrity is maintained by using a contained but reasonably complex combination of transactional writes with embedded condition expressions AND standalone condition check write items.
The tokens themselves are UUIDs and, to be efficient, are persisted as Binary attribute values (16 bytes) rather than the string representation (36 bytes). The downside is that the data doesn't visualise nicely in query consoles, making support hard if we encounter any bugs and/or broken data. Note there is no extra code complexity, since we implement the attributevalue.Marshaler interface to bind UUID (language) types to DynamoDB Binary attributes, and we do the same for any composite attributes.
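For illustration, the persistence boils down to something like this, sketched in Python with boto3 rather than our actual Go marshaller (the table and attribute names here are made up); a bytes value is stored as a DynamoDB Binary (B) attribute:

import uuid
import boto3

# Table and attribute names are illustrative only.
table = boto3.resource("dynamodb").Table("tokens")

token = uuid.uuid4()
table.put_item(Item={"token": token.bytes})            # 16-byte Binary (B) attribute

item = table.get_item(Key={"token": token.bytes})["Item"]
print(uuid.UUID(bytes=item["token"].value))            # back to the 36-char readable form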
My question relates (mostly) to data size/saving, since the tokens are the partition keys and some mapping columns are [token] -> [other token composite attributes], for example two UUIDs concatenated together into 32 bytes.
I wanted to keep really tight control over storage costs knowing that, over time, we will be spending ~$0.25/GB per month for this. My question is really three parts:
Are the PK/SK index sizes 'reserved' (i.e. padded), so that it would make no difference at all to storage cost if we compress the overall field sizes down to the minimum possible size? (I read somewhere that 100 bytes is typically reserved.)
If they ARE padded, the cost savings for the data would be reasonably high, because each (tree) index node will be nearly as big as the data being mapped. (I assume a tree index is used once the hashed PK has routed the query to the right server node/disk etc.)
Is there any observable query time performance benefit to compacting 36 bytes into 16 (beyond saving a few bytes across the network)? i.e. if Dynamo has to read fewer pages it'll work faster, but in practice are we talking microseconds at best?
This is a secondary concern, but is worth considering if there is a lot of concurrent access to the data. UUIDs will distribute partitions but inevitably sometimes we will have some more active partitions than others.
Are there any tools that can parse the bytes back into human-readable UUIDs (or that we can customise to inject behaviour to do this)?
This is a concern because, while making things small and efficient is fine, supporting and resolving data issues will be difficult without significant tooling investment, and (unsurprisingly) the DynamoDB console, the DynamoDB IntelliJ plugin and AWS NoSQL Workbench all garble the binary into unreadable characters.
No, the PK/SK types are not padded. There's 100 bytes of overhead per item stored.
Sending less data certainly won't hurt your performance. Don't expect a noticeable improvement though. If shorter values can keep your items at 1,024 bytes instead of 1,025 bytes then you save yourself a Write Unit during the save.
For the "garbled" binary values I assume you're looking at the base64 encoded values, which is a standard binary encoding standard which can be reversed by lots of tooling (now that you know the name of it).
I'm building a system that will eventually store extremely large number values (double or floating point types, not int), and I'm using Firestore to store my data. Is there a cap on how big numbers can grow in a Firestore database? I can't seem to find any documentation anywhere.
As per the documentation, Firestore only supports 64-bit signed integers: a minimum value of -9,223,372,036,854,775,808 and a maximum value of 9,223,372,036,854,775,807 (inclusive).
The same goes for floating point: Firestore uses 64-bit double precision (IEEE 754).
I just tried adding a random number over that limit, and this is what you get:
Then I tried requesting the document, and this is the data you get back:
It'll be hard to get the precise value back if you store it as a number. You may be interested in storing the number as a string if the numbers are that large. But do keep the 1 MiB (1,048,576 bytes) per-document limit in mind.
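If you go the string route, a minimal sketch with the google-cloud-firestore client (the collection and field names are made up) would be the following, converting back with Decimal on read to avoid float rounding:

from decimal import Decimal
from google.cloud import firestore  # pip install google-cloud-firestore

db = firestore.Client()
doc = db.collection("measurements").document("sample")  # illustrative names

big = Decimal("123456789012345678901234567890.5")
doc.set({"value": str(big)})                 # stored as a string, no precision loss

snapshot = doc.get()
print(Decimal(snapshot.to_dict()["value"]))  # exact value back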
I have created a database with one single table (see the code below). I plan to insert 10 rows per minute, which is about 52 million rows ten years from now.
My question is: what can I expect in terms of database capacity, and how long will it take to execute a select query? Of course, I know you cannot give me absolute values, but if you can give me any tips on growth/speed rates, traps, etc. I would be very glad.
I should mention that there will be 10 different observations (this is why I will insert ten rows per minute).
create table if not exists my_table (
    date_observation default current_timestamp,
    observation_name text,
    value_1 real(20),
    value_1_name text,
    value_2 real(20),
    value_2_name text,
    value_3 real(20),
    value_3_name text);
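As a quick sanity check on the row-count estimate above (Python):

rows_per_minute = 10
rows_per_year = rows_per_minute * 60 * 24 * 365   # 5,256,000 rows per year
print(rows_per_year * 10)                         # 52,560,000 rows in ten years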
Database capacity exceeds known storage device capacity as per Limits In SQLite.
The more pertinent paragraphs are:
Maximum Number Of Rows In A Table
The theoretical maximum number of rows in a table is 2^64
(18446744073709551616 or about 1.8e+19). This limit is unreachable
since the maximum database size of 140 terabytes will be reached
first. A 140 terabytes database can hold no more than approximately
1e+13 rows, and then only if there are no indices and if each row
contains very little data.
Maximum Database Size
Every database consists of one or more "pages". Within a single
database, every page is the same size, but different databases can have
page sizes that are powers of two between 512 and 65536, inclusive.
The maximum size of a database file is 2147483646 pages. At the
maximum page size of 65536 bytes, this translates into a maximum
database size of approximately 1.4e+14 bytes (140 terabytes, or 128
tebibytes, or 140,000 gigabytes or 128,000 gibibytes).
This particular upper bound is untested since the developers do not
have access to hardware capable of reaching this limit. However, tests
do verify that SQLite behaves correctly and sanely when a database
reaches the maximum file size of the underlying filesystem (which is
usually much less than the maximum theoretical database size) and when
a database is unable to grow due to disk space exhaustion.
Speed has many aspects and is thus not a simple matter of how fast it will go, like a car. The file system, the memory and optimisation are all factors that need to be taken into consideration. As such, the answer is akin to asking how long the proverbial piece of string is.
Note that 18446744073709551616 applies if you utilise negative numbers; otherwise the more frequently mentioned figure of 9223372036854775807 is the limit (i.e. a 64-bit signed integer).
To utilise negative rowid numbers, and therefore the higher range, you have to insert at least one negative value explicitly into a rowid (or an alias thereof), as per: "If no negative ROWID values are inserted explicitly, then automatically generated ROWID values will always be greater than zero."
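A minimal sketch of that with Python's sqlite3 module (the table name is made up); explicitly inserting a negative value into the rowid alias is what opens up the lower half of the 64-bit range:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")  # id is a rowid alias

# Explicitly use the negative end of the 64-bit rowid range.
conn.execute("INSERT INTO t (id, payload) VALUES (?, ?)", (-9223372036854775808, "first"))

# Subsequent automatic rowids are assigned by SQLite as usual.
conn.execute("INSERT INTO t (payload) VALUES (?)", ("second",))
print(conn.execute("SELECT id, payload FROM t ORDER BY id").fetchall())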
On Riak:
Is there a way to process data expiration or to dump old data to free some space?
Is it efficient ?
Edit: Thanks to Joe for providing the answer and its workaround (see his answer below).
Data expiration should be thought about from the very beginning, as it requires an additional index together with a map-reduce algorithm.
Short answer: No, there is no publisher-provided expiry.
Longer answer: Include the write time, in an integer representation like the Unix epoch, in a secondary index entry on each value that you want to be subject to expiry. Then run a periodic job during off-peak times to do a ranged 2i query for any entries from 0 to (now - TTL). This could be used as the input to a map/reduce job that does the actual deletes.
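As a rough sketch of that workflow over Riak's HTTP interface (the bucket name, index name, host and TTL below are made up; a client library or a map/reduce job can do the same thing):

import time
import requests

RIAK = "http://localhost:8098"          # assumed Riak HTTP endpoint
BUCKET, TTL = "tokens", 30 * 24 * 3600  # illustrative bucket name and 30-day TTL

def put_with_write_time(key, value):
    # Tag each value with its write time via a secondary index entry.
    requests.put(f"{RIAK}/buckets/{BUCKET}/keys/{key}",
                 data=value,
                 headers={"Content-Type": "text/plain",
                          "x-riak-index-written_int": str(int(time.time()))})

def expire_old_values():
    # Ranged 2i query: everything written between epoch 0 and (now - TTL).
    cutoff = int(time.time()) - TTL
    resp = requests.get(f"{RIAK}/buckets/{BUCKET}/index/written_int/0/{cutoff}")
    for key in resp.json().get("keys", []):
        requests.delete(f"{RIAK}/buckets/{BUCKET}/keys/{key}")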
As to recovering disk space, leveldb is very slow about that. When a value is written to leveldb it starts in level 0, then as each level fills, compaction moves values to the next level, so your least recently written data resides on disk in the lowest levels. When you delete a value, a tombstone is written to level 0, which masks the previous value in the lower level, and as normal compaction occurs the tombstone is moved down as any other value would be. The disk space consumed by the old value is not reclaimed until the tombstone reaches the same level.
I have written a little C++ tool that uses the leveldb internal function CompactRange to perform this task. You can read the article about it here.
With this we are able to delete an entire bucket (key by key) and wipe all the tombstones. 50 GB out of 75 GB were freed!
Unfortunately, this only works if leveldb is used as backend.
I have read their limits FAQ; it talks about many limits, but not about the limit on the whole database.
This is fairly easy to deduce from the implementation limits page:
An SQLite database file is organized as pages. The size of each page is a power of 2 between 512 and SQLITE_MAX_PAGE_SIZE. The default value for SQLITE_MAX_PAGE_SIZE is 32768.
...
The SQLITE_MAX_PAGE_COUNT parameter, which is normally set to 1073741823, is the maximum number of pages allowed in a single database file. An attempt to insert new data that would cause the database file to grow larger than this will return SQLITE_FULL.
So we have 32768 * 1073741823, which is 35,184,372,056,064 (35 trillion bytes)!
You can modify SQLITE_MAX_PAGE_COUNT or SQLITE_MAX_PAGE_SIZE in the source, but this of course will require a custom build of SQLite for your application. As far as I'm aware, there's no way to set a limit programmatically other than at compile time (but I'd be happy to be proven wrong).
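For what it's worth, both values can at least be inspected on a live connection, so you can see the ceiling your particular build enforces; a minimal sketch with Python's sqlite3 module (the file name is illustrative):

import sqlite3

conn = sqlite3.connect("example.db")  # illustrative file name
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
max_pages = conn.execute("PRAGMA max_page_count").fetchone()[0]
print(f"effective size ceiling: {page_size * max_pages:,} bytes")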
The documentation now lists new limits; the database size limit is 281 TB (256 TiB):
Every database consists of one or more "pages". Within a single database, every page is the same size, but different databases can have page sizes that are powers of two between 512 and 65536, inclusive. The maximum size of a database file is 4294967294 pages. At the maximum page size of 65536 bytes, this translates into a maximum database size of approximately 2.8e+14 bytes (281 terabytes, or 256 tebibytes, or 281474 gigabytes or 256,000 gibibytes).
This particular upper bound is untested since the developers do not have access to hardware capable of reaching this limit. However, tests do verify that SQLite behaves correctly and sanely when a database reaches the maximum file size of the underlying filesystem (which is usually much less than the maximum theoretical database size) and when a database is unable to grow due to disk space exhaustion.
The new limit is 281 terabytes. https://www.sqlite.org/limits.html
Though this is an old question, let me share my findings for people who reach it.
Although the SQLite documentation states that the maximum size of a database file is ~140 terabytes, your OS imposes its own restrictions on the maximum file size for any type of file.
For example, on a FAT32 disk on Windows, the maximum file size I could achieve for an SQLite db file was 2 GB. (According to the Microsoft site, the limit on a FAT32 system is 4 GB, but my SQLite db was still restricted to 2 GB.) On Linux, I was able to reach 3 GB (which is where I stopped; it could have grown further).
NOTE: I had written a small Java program that starts populating the SQLite db from 0 rows and keeps populating it until a stop command is given.
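For anyone who wants to repeat the experiment, a rough Python equivalent of such a filler program (the file and table names are made up) just inserts batches until you interrupt it:

import sqlite3

conn = sqlite3.connect("growth_test.db")  # illustrative file name
conn.execute("CREATE TABLE IF NOT EXISTS filler (id INTEGER PRIMARY KEY, payload TEXT)")

try:
    while True:
        # Insert in batches inside a transaction so the file grows quickly.
        with conn:
            conn.executemany("INSERT INTO filler (payload) VALUES (?)",
                             [("x" * 500,)] * 10_000)
except KeyboardInterrupt:
    print("stopped; check the size of growth_test.db")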
The maximum number of bytes in a string or BLOB in SQLite is defined by the preprocessor macro SQLITE_MAX_LENGTH. The default value of this macro is 1 billion (1 thousand million or 1,000,000,000).
The current implementation will only support a string or BLOB length up to 2^31-1 or 2147483647.
The default setting for SQLITE_MAX_COLUMN is 2000. You can change it at compile time to values as large as 32767. On the other hand, many experienced database designers will argue that a well-normalized database will never need more than 100 columns in a table.
SQLite does not support joins containing more than 64 tables.
The theoretical maximum number of rows in a table is 2^64 (18446744073709551616 or about 1.8e+19). This limit is unreachable since the maximum database size of 140 terabytes will be reached first.
Max size of DB: 140 terabytes
Please check this URL for more info: https://www.sqlite.org/limits.html
I'm just starting to explore SQLite for a project I'm working on, but it seems to me that the effective size of a database is actually more flexible than the file system would seem to allow.
By utilizing the 'attach' capability, a database could be assembled that would exceed the file system's max file size by up to 125 times... so the effective limit on FAT32 would actually be 500 GB (125 x 4 GB), if the data could be balanced perfectly between the various files.
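A minimal sketch of that idea with Python's sqlite3 module (the file names and the balancing rule are made up); ATTACH several files to one connection and spread the rows between them yourself, since SQLite will not balance the data for you:

import sqlite3

conn = sqlite3.connect("shard0.db")  # main database; file names are illustrative
conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")
for i in range(1, 4):                # a stock build allows up to 125 attached files
    conn.execute(f"ATTACH DATABASE 'shard{i}.db' AS shard{i}")
    conn.execute(f"CREATE TABLE IF NOT EXISTS shard{i}.events (id INTEGER PRIMARY KEY, body TEXT)")

def insert(event_id, body):
    # Naive balancing rule: route each row to a shard by id.
    shard = event_id % 4
    prefix = "" if shard == 0 else f"shard{shard}."
    with conn:
        conn.execute(f"INSERT INTO {prefix}events (id, body) VALUES (?, ?)", (event_id, body))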