Limit column length to character length instead of byte length - MariaDB

I use the utf8mb4 character set and collate my databases as utf8mb4_unicode_520_ci; I'm well aware that some characters may take up to 4 bytes each.
In MariaDB is there a way to limit a value based on characters instead of bytes?
For example, if I want to require a maximum and minimum number of characters (for something like a page's meta description), I'm only interested in the character length. I'm well aware that if every single character used 4 bytes, the column would need to support up to four times that many bytes, and that column types may (or may not) come into play. I can obviously handle this in the programming language, though I'd like to become more familiar with this aspect of databases.

VARCHAR(100) CHARACTER SET utf8mb4
is limited to 100 characters, not bytes. That example will take up to 400 bytes of storage, plus a 2-byte length field.
Enforcing a minimum length requires a CHECK constraint, as sketched below.
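A minimal sketch (the table and column names are hypothetical, and it assumes MariaDB 10.2.1 or later, where CHECK constraints are actually enforced):

CREATE TABLE page_meta (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    -- up to 100 characters, i.e. up to 400 bytes of utf8mb4 storage
    meta_description VARCHAR(100)
        CHARACTER SET utf8mb4
        COLLATE utf8mb4_unicode_520_ci,
    -- CHAR_LENGTH counts characters; LENGTH would count bytes
    CONSTRAINT chk_meta_min CHECK (CHAR_LENGTH(meta_description) >= 50)
);

The 50-character minimum here is just an example; an INSERT or UPDATE that violates it is rejected by the database itself.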

Related

What is the DynamoDB number and string data type storage space?

What is the storage space for the number type in DynamoDB, versus the string type?
Say I have a number (1234789). If I store it as the number type, will it take just 4 bytes, and as a string, 7 bytes?
Does DynamoDB store all numbers as BigDecimal?
DynamoDB is a managed cloud service, so how it stores data internally is not public.
However, it transfers numbers as strings for cross-language compatibility, and transfer size is one of the things that affects RCU/WCU.
So, as far as your concern is calculating provisioned throughput and costs, a number's size should be considered the same as the corresponding string's size.
Per the DynamoDB documentation on data types:
String
Strings are Unicode with UTF-8 binary encoding. The length of a string must be greater than zero, and is constrained by the maximum DynamoDB item size limit of 400 KB.
If you define a primary key attribute as a string type attribute, the following additional constraints apply:
For a simple primary key, the maximum length of the first attribute value (the partition key) is 2048 bytes.
For a composite primary key, the maximum length of the second attribute value (the sort key) is 1024 bytes.
Number
Numbers can be positive, negative, or zero. Numbers can have up to 38 digits of precision; exceeding this will result in an exception.
Positive range: 1E-130 to 9.9999999999999999999999999999999999999E+125
Negative range: -9.9999999999999999999999999999999999999E+125 to -1E-130
In DynamoDB, numbers are represented as variable length. Leading and trailing zeroes are trimmed.
All numbers are sent across the network to DynamoDB as strings, to maximize compatibility across languages and libraries. However, DynamoDB treats them as number type attributes for mathematical operations.
Note: if number precision is important, you should pass numbers to DynamoDB as strings that you convert from your language's number type.
I hope this helps you find your answer.
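As a hedged illustration of that note, using DynamoDB's PartiQL dialect (the table name Readings and the key device_id are hypothetical):

INSERT INTO "Readings" VALUE {'device_id': 'sensor-1', 'reading': 1234789}
INSERT INTO "Readings" VALUE {'device_id': 'sensor-2', 'reading': '1234789'}

The first statement stores a Number, which travels as a string but is parsed and subject to the 38-digit precision limit; the second stores a String, which keeps the exact digits at one byte per character.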

How to optimize SQLite DB file size for INTEGERs?

I want to build a 20,000,000-record table in SQLite, but the file size is slightly larger than its tab-separated plaintext representation.
What are the ways to optimize data size storage, specific to sqlite?
Details:
Each record has:
8 integers
3 enums (represented for now as 1-byte text)
7 text columns
I suspect that the numbers are not stored efficiently (value range 10,000,000 to 900,000,000).
According to the docs, I expect them to take 3-4 bytes if stored as numbers, and 8-9 bytes if stored as text (perhaps with an additional termination or size byte), i.e. roughly a 1:2 ratio between storing as int and storing as text.
But it doesn't appear so.
Your integers should take 3-4 bytes each (3 bytes covers values up to 2^24, roughly 16,000,000). Additionally, SQLite always stores at least one byte of type/size information per column in the record header, so each of your 1-byte texts actually costs 2 bytes in total.
Some questions:
Do you use a compound primary key or a primary key other than a plain integer?
Do you use other indexes?
Did you try to vacuum the database (command VACUUM)? An SQLite database is not necessarily auto-vacuumed, so when data is deleted, the space stays reserved.
One more thing:
Do you already have all 20,000,000 entries, or fewer? For small databases, the storage overhead can be much larger than the real content size.
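One way to test the integer suspicion directly: SQLite keeps whatever storage class a value was given at insert time, and a column declared TEXT will store your numbers as text. A quick sketch (table and column names are hypothetical):

-- typeof() reveals the storage class actually used in each row
SELECT typeof(some_number), COUNT(*)
FROM records
GROUP BY typeof(some_number);

-- rebuild the file and reclaim pages left behind by deletes
VACUUM;

If the first query reports 'text' rather than 'integer', the numbers are occupying 8-9 bytes as strings instead of 3-4 bytes as integers.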

Triple DES result length

If I encrypt emails so that I can store them in a database, the resulting string is longer than the email itself. Is there a maximum length for this resulting encoded string? If so, does it depend on both the key length and the email length? I need to know this so I can set my database fields to the correct length.
Thanks.
As Alex K. notes, for block ciphers (like DES), common modes will pad the plaintext out to a multiple of the block size. The block size for 3DES is 64 bits (8 bytes). The most common padding scheme is PKCS7, which appends n bytes, each with value n: if you need one byte of padding, it pads with 0x01; if you need four bytes, it pads with 0x04 0x04 0x04 0x04. If your data is already a multiple of the block length, it pads with a full extra block (eight 0x08 bytes for 3DES).
The short version is that the padded ciphertext for 3DES can be up to 8 bytes longer than the plaintext. If your encryption scheme is a typical, insecure implementation, that is the entire overhead. The fact that you're using 3DES (an obsolete cipher) makes it a bit more likely that it is also insecurely implemented, in which case this is the answer.
But if your scheme is implemented well, then there could be quite a few other things attached to the message: 8 bytes of initialization vector, a salt of arbitrary length if you're using a password, an HMAC, and other elements that each add an arbitrary amount of space. (The RNCryptor format, for example, adds up to 82 bytes to the message.) So you need to know how your format is implemented.
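If you only need to size a column for the padding itself, the padded length is the next multiple of 8 strictly greater than the plaintext length, since PKCS7 always adds at least one byte. A sketch of that arithmetic in SQL, using a hypothetical address and ignoring any IV, salt, or HMAC overhead:

SELECT LENGTH('alice@example.com') AS plain_len,                       -- 17
       (FLOOR(LENGTH('alice@example.com') / 8) + 1) * 8 AS padded_len; -- 24

Also note that the ciphertext is raw bytes: if you store it Base64- or hex-encoded rather than in a binary column, the field must be larger still.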

NVARCHAR2 data types in an Oracle database with AL32UTF8 character set

I have inherited an Oracle 11g database that has a number of tables with NVARCHAR2 columns.
Is the NVARCHAR2 data type necessary to store Unicode if the database already has the AL32UTF8 character set, and if not, can these columns be converted to VARCHAR2?
Thanks.
If the database character set is AL32UTF8, a VARCHAR2 column will store Unicode data. Most likely, the columns should be converted to VARCHAR2.
Assuming that the national character set is AL16UTF16, which is the default and the only sensible national character set when the database character set already supports Unicode, it is possible that the choice to use NVARCHAR2 was intentional because there is some benefit to the UTF-16 encoding. For example, if those columns are storing primarily Japanese or Chinese data, UTF-16 would generally use 2 bytes of storage per character rather than 3 bytes in UTF-8. There may be other reasons that one would prefer one Unicode encoding to another that might come into play here as well. Most of the time, though, people creating NVARCHAR2 columns in a database that supports Unicode are doing so unintentionally, not because they did a thorough analysis of the benefits of different Unicode encodings.
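If you do decide to convert, note that Oracle will not change the datatype of a populated column in place (it raises ORA-01439), so the usual route is copy-and-swap. A sketch with hypothetical table and column names; test on a copy of the data first:

ALTER TABLE customers ADD (notes_new VARCHAR2(400 CHAR));
UPDATE customers SET notes_new = notes;
ALTER TABLE customers DROP COLUMN notes;
ALTER TABLE customers RENAME COLUMN notes_new TO notes;

Declaring the new column with CHAR length semantics (400 CHAR) keeps it sized in characters rather than bytes, which matters under AL32UTF8, where a character may take up to 4 bytes.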

Setting a column to varchar(256): will this waste more space if all content is at most 100 characters?

If I know that a table column only really needs to be varchar(100), i.e. the data will never be longer than 100 characters, will setting the column to varchar(256) make any difference?
From what I understand, since the column allows variable-length data, having it at either 100 or 256 won't make any difference so long as the data is never larger than 100.
Is this correct?
A varchar only uses as much space as the data put into it, as opposed to a char, which pads with whitespace. Traffic on a varchar column is therefore considerably smaller than on a char column.
Correct, it will not make any difference.
I'd set it to the business rule of 100 characters: the size acts as a constraint and will preserve the integrity of the data. You are just a rogue script or application bug away from breaking the 100-character limit and then possibly having invalid 100+ character values stored in the field.
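For example, with hypothetical names, declaring the business rule directly:

CREATE TABLE pages (
    id INT PRIMARY KEY,
    -- business rule: summaries are at most 100 characters
    summary VARCHAR(100) NOT NULL
);

An attempt to store more than 100 characters then fails (or is at least flagged) at the database layer, depending on the server's strictness settings, instead of silently accepting invalid data.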
