Firebase realtime DB - limitations of paths and keys?

In Firebase Realtime DB, what are the limits on:
keys
paths
nesting level
?
Meaning restrictions on lengths as well as disallowed/special chars/values.
And any other restrictions (or discouragements) there might be.
Is this deprecated pre-Google-integration document (link here) still up to date?
Length of a key: 768 bytes
Depth of child nodes: 32
I don't see max path length mentioned there.
What is the non-deprecated location for this documentation?
I cannot find an equivalent in https://firebase.google.com/docs/ .
As if some of the docs "got lost in the shuffle"...
Thanks for any hints.
EDIT: I've broadened it slightly - not just lengths but any restrictions that might apply.

The Firebase documentation says 768 bytes is still the limit for a key, and that keys use UTF-8 encoding. In UTF-8, a character is 1-4 bytes.
However, most characters are 1 byte, unless you use a character such as ♥, which is 3 bytes. So for typical keys the practical limit is 768 characters; if you want to anticipate some outlandish characters, it may be best to be conservative and cap keys at 500, 600, or 700 characters, depending on how you want to use them.
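If you'd rather check this in code than eyeball it, here is a minimal sketch (plain Python, nothing Firebase-specific) of counting a key's UTF-8 bytes against the 768-byte limit:

def fits_key_limit(key, limit=768):
    # A key's cost is its UTF-8 byte length, not its character count.
    return len(key.encode("utf-8")) <= limit

print(fits_key_limit("a" * 768))  # True: 768 one-byte characters
print(fits_key_limit("♥" * 300))  # False: 300 * 3 bytes = 900 bytes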
Test your characters and strings here:
https://mothereff.in/byte-counter
Documentation here:
https://firebase.google.com/docs/database/usage/limits

This documentation mentions that a Firebase Realtime Database can be nested up to 32 levels deep, but, as noted there, deeply nesting your data is not good practice. Although denormalising data can feel redundant, it gives you more flexibility when writing security rules and when querying the database; a sketch of the flattened layout follows below.
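To illustrate, a minimal sketch of that flattening, with made-up node names (shown as Python dicts standing in for the JSON tree):

# Nested: reading /chats/chat1 pulls every message along with the metadata.
nested = {
    "chats": {
        "chat1": {
            "title": "Lunch plans",
            "messages": {"m1": {"text": "12:30?"}, "m2": {"text": "ok"}},
        }
    }
}

# Flattened: metadata and messages live in parallel top-level nodes that
# share the chat ID, so each can be fetched and secured independently.
flattened = {
    "chats": {"chat1": {"title": "Lunch plans"}},
    "messages": {"chat1": {"m1": {"text": "12:30?"}, "m2": {"text": "ok"}}},
}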

Related

Storage cost / supportability / performance tradeoffs using compact attributes in DynamoDB

I'm working on a large-scale component that generates unique/opaque tokens representing business entities. Over time there will be many billions of these records, but for the first year we're not expecting to exceed 2 billion individual items (probably less than 500 million).
The system itself is horizontally scaled, but needs token generation to be idempotent; data integrity is maintained by a contained but reasonably complex combination of transactional writes with embedded condition expressions AND standalone condition-check write items.
The tokens themselves are UUIDs and, to be space-efficient, are persisted as Binary attribute values (16 bytes) rather than as the string representation (36 bytes). The downside is that the data doesn't visualise nicely in query consoles, making support hard if we encounter any bugs and/or broken data. Note there is no extra code complexity, since we implement the attributevalue.Marshaler interface to bind UUID (language) types to DynamoDB Binary attributes, and similarly do the same for any composite attributes.
My question relates mostly to data size and cost savings: the tokens are the partition keys, and some mapping columns are [token] -> [other token composite attributes], for example two UUIDs concatenated together into 32 bytes.
I wanted to keep really tight control over storage costs knowing that, over time, we will be spending ~$0.25/GB per month for this. My question is really three parts:
Are the PK/SK index sizes 'reserved' (i.e. padded), so that it would make no difference at all to storage cost if we compress the overall field sizes down to the minimum possible? (I read somewhere that 100 bytes is typically reserved.)
If they ARE padded, the cost savings for the data would be reasonably high, because each (tree) index node would be nearly as big as the data being mapped. (I assume a tree index is used once the hashed PK has routed the query to the right server node/disk, etc.)
Is there any observable query-time performance benefit to compacting 36 bytes into 16 (beyond saving a few bytes across the network)? i.e. if Dynamo has to read fewer pages it'll work faster, but in practice are we talking microseconds at best?
This is a secondary concern, but worth considering if there is a lot of concurrent access to the data. UUIDs will distribute writes across partitions, but inevitably some partitions will be more active than others.
Are there any tools that can parse bytes back into human-readable UUIDs (or that we customise to inject behaviour to do this)?
This is a concern, because making things small and efficient is fine, but supporting and resolving data issues will be difficult without significant tooling investment, and (unsurprisingly) the DynamoDB console, the DynamoDB IntelliJ plugin, and AWS NoSQL Workbench all garble the binary into unreadable characters.
No, the PK/SK types are not padded. There's 100 bytes of overhead per item stored.
Sending less data certainly won't hurt your performance. Don't expect a noticeable improvement though. If shorter values can keep your items at 1,024 bytes instead of 1,025 bytes then you save yourself a Write Unit during the save.
For the "garbled" binary values, I assume you're looking at base64-encoded output; base64 is a standard binary-to-text encoding that can be reversed by lots of tooling (now that you know the name of it).
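As a concrete sketch of that reversal (assuming what you copied from the console is base64, and using a made-up value), a few lines of Python recover the readable UUID:

import base64
import uuid

shown = "yRZ0T6WcTk2Z0Ljs0V9D5g=="  # hypothetical base64 value copied from the console
raw = base64.b64decode(shown)       # back to the original 16 bytes
print(uuid.UUID(bytes=raw))         # c916744f-a59c-4e4d-99d0-b8ecd15f43e6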

storing as integer vs string size

I have checked the docs but got a bit confused. When storing a long integer such as 265628143862153200, would it be more efficient to store it as a string or as an integer?
What I need help with: is the calculation below correct?
Integer:
length of 265628143862153200 * 8?
String:
length of 265628143862153200 + 1?
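For reference, my reading of Firestore's storage-size rules (hedged; check the current billing docs) is that an integer field always costs 8 bytes regardless of its value, while a string costs its UTF-8 byte length plus 1, so the comparison would be:

n = 265628143862153200
int_size = 8                                # integers: fixed 8 bytes
str_size = len(str(n).encode("utf-8")) + 1  # strings: UTF-8 bytes + 1
print(int_size, str_size)                   # 8 19  (18 digits + 1)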
The Firebase console is not meant to be part of administrative workflows; it's just for discovery and development. If you have production-grade procedures to follow, you should write code for that using the provided SDKs. Typically, developers build their own admin sites to deal with data in Firestore.
Also, you should know that JavaScript numbers can only represent integers exactly up to 2^53 - 1, which is not "big" enough to store data to the full size provided by Firestore. If you want to use the full size of a number field, you will have to use a system that supports Firestore's full 64-bit signed integer type.
If you must store numbers bigger than either limit and still be able to modify them by hand, consider instead storing multiple numbers, similar to the way Firestore's timestamp stores seconds and nanoseconds as separate numbers, so that the full value can exceed the signed 64-bit range.
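A minimal sketch of that split-field idea (the field names are made up): store two parts that each fit comfortably in a signed 64-bit integer and recombine them in code, just like the timestamp's seconds/nanoseconds pair.

def split_big(value, base=2**62):
    # Both parts stay well inside Firestore's signed 64-bit range.
    return {"hi": value // base, "lo": value % base}

def join_big(doc, base=2**62):
    return doc["hi"] * base + doc["lo"]

doc = split_big(2**100 + 12345)
assert join_big(doc) == 2**100 + 12345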

What is the maximum length of a FCM getToken? [duplicate]

Working with the "new" Firebase Cloud Messaging, I would like to reliably save client device registration_id tokens to the local server database so that the server software can send them push notifications.
What is the smallest size of database field that I should use to save 100% of client registration tokens generated?
I have found two different libraries that use TextField and VarChar(255) but nothing categorically defining the max length. In addition, I would like the server code to do a quick length check when receiving tokens to ensure they "look" right - what would be a good min length and set of characters to check for?
I think this part of FCM is still the same as GCM. Therefore, you should refer to this answer by @TrevorJohns:
The documentation doesn't specify any pattern, therefore any valid string is allowed. The format may change in the future; please do not validate this input against any pattern, as this may cause your app to break if this happens.
As with the "registration_id" field, the upper bound on size is the max size for a cookie, which is 4K (4096 bytes).
Emphasizing the "The format may change in the future" part, I would suggest playing it safe and allowing for lengths beyond the usual maximum mentioned above, since the format and length of a registration token may vary.
For the usual length and characters, you can refer to these two answers, the latter being much more definitive:
I haven't seen any official information about the format of a GCM registrationId, but I've analyzed our database of such IDs and can draw the following conclusions (a loose sanity-check sketch follows after the list):
in most cases the length of a registrationID is 162 symbols, but there are variations down to 119 symbols, and possibly other lengths too;
it consists only of the characters [0-9a-zA-Z\-\_]*
every regID contains one or both of the "delimiters": - (minus) or _ (underscore)
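Taking the "format may change" warning seriously, a deliberately loose sanity check based on the observations in this thread (the bounds are assumptions, not anything official) might look like:

import re

# Loose on purpose: printable ASCII only, with generous length bounds
# (observed tokens run roughly 119-163 chars; 4096 as a defensive ceiling).
TOKEN_RE = re.compile(r"^[\x21-\x7e]{100,4096}$")

def looks_like_fcm_token(token):
    return bool(TOKEN_RE.match(token))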
I am now using Firebase Cloud Messaging instead of GCM.
The length of the registration_id I've got is 152.
I've also got ":" at the very beginning each time like what jamesc mentioned (e.g. bk3RNwTe3H0:CI2k_HHwgIpoDKCIZvvDMExUdFQ3P1).
I made the token column varchar(255), which is working for me.
However, the length of the registration_id has no relationship to the 4K size. You are allowed to send whatever size of data over the network. Usually, cookies are limited to 4096 bytes, which includes the name, value, expiry date, etc.
This is a real fcm token:
c2aK9KHmw8E:APA91bF7MY9bNnvGAXgbHN58lyDxc9KnuXNXwsqUs4uV4GyeF06HM1hMm-etu63S_4C-GnEtHAxJPJJC4H__VcIk90A69qQz65toFejxyncceg0_j5xwoFWvPQ5pzKo69rUnuCl1GSSv
As you can see, the length of the token is 152.
I don't think the upper limit for a registration ID is 4K. It should be safe to assume that it is much lower than that.
The upper limit for a notification payload is 4KB (link), and the notification payload includes the token (link). Since the payload also needs to include the title, body, and other data too, the registration ID should be small.
That's what I understand from the docs ¯\_(ツ)_/¯
The last tokens I got were 163 chars long. I think it's safe to assume that they will never exceed 255 chars. Some comments in the other answer reported much higher lengths!
Update
So far, in the 4 months I've been running my app, there are over 100k registration IDs, and every single one of them is 163 chars long. It's very possible that Google keeps the ID length stable in order not to break apps. Hence, I'd suggest:
getting a few registration IDs in your local machine
measuring their length and verifying it's constant (or at least it doesn't change significantly)
picking a safe initial value, slightly higher than the ID length
I think it's unlikely for the length to change now, but I'll keep an eye. Please let me know if you noticed IDs of different lengths in your apps!

Google Translation API

Has anyone used the Google Translation API? What is the max length limit for using it?
The limit was 500... now it is 5000 chars.
source
500 characters
source
At the moment, the throttle limit is 100,000 characters per day. Looks like you can apply to have that limit increased/removed.
I've used it to translate Japanese to English.
I don't believe the 500-char limit applies if you use http://code.google.com/p/jquery-translate/, but one thing that is true is that you're restricted in the number of requests you can make within a certain period of time. They also try to detect whether you're sending a lot of requests within a similar period, almost like a mini "denial of service" attack.
So when I did this I wrote a client that sleeps for a random interval between requests. I also ran it on a grid so the requests didn't all come from a single IP address.
I had to translate ~2000 Java messages from a resource bundle from Japanese to English. It worked out pretty nicely, as long as the text was single words. Longer phrases with context came out awkwardly.
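A rough sketch of that throttled client (translate_one is a stand-in for whatever call performs a single translation):

import random
import time

def translate_all(texts, translate_one):
    results = []
    for text in texts:
        results.append(translate_one(text))
        # Random-length sleep so requests don't arrive on a regular,
        # bot-like schedule.
        time.sleep(random.uniform(1.0, 5.0))
    return results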
Please have a look at this link; it gives the correct answer at the bottom of the page.
https://developers.google.com/translate/v2/faq
What is the maximum number of characters per request?
The maximum size of each text to be translated is 5000 characters, not including any HTML tags.
You can send source strings of up to 5,000 characters, but there are a few provisos that are sometimes lost.
You can only send the 5,000 characters via the POST method.
If you use the GET method, you are limited by the 2,000-character length limit on URLs. If a URL is longer than that, Google's servers will just reject it.
Note: the 2,000-character limit includes the path and the rest of the query string, and you must count URI encoding (for instance, every space becomes %20 and every quotation mark %22).
The Cloud Translation API is optimized for translating smaller requests. The recommended maximum length for each request is 5K characters (code points). However, the more characters you include, the higher the response latency. For Cloud Translation - Advanced, the maximum number of code points for a single request is 30K. Cloud Translation - Basic has a maximum request size of 100K bytes.
https://cloud.google.com/translate/quotas
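For completeness, a minimal sketch of sending a string via POST to the v2 endpoint, as I understand the API (YOUR_API_KEY is a placeholder):

import requests

resp = requests.post(
    "https://translation.googleapis.com/language/translate/v2",
    params={"key": "YOUR_API_KEY"},
    json={"q": "Bonjour le monde", "source": "fr", "target": "en"},
)
print(resp.json()["data"]["translations"][0]["translatedText"])  # e.g. "Hello world"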

Which "good" block encryption algorithm has the shortest output?

I would like to give customers a random-looking order number but use 0, 1, 2, ... in the backend. That way the customer gets a non-password-protected order status URL with the encrypted order number and they cannot look at other customers' order numbers by adding or subtracting 1. This might replace a scheme where random order keys are generated, checked for uniqueness among all the previous orders, and re-generated until unique. When the web server gets a request to view an order, it decrypts the order number and retrieves the order.
To keep the URL short, what "good" encryption algorithm has the shortest block size? Is this scheme a good idea? (What if I were encrypting Apple, Inc. employee IDs to keep Steve Jobs from asking for Employee #0?)
Observe that all the package tracking websites allow you to track packages without authentication. It would be fine to limit the amount of information shown on the password-free order status page.
Most block ciphers use blocks larger than 32 bits, for security reasons.
However, I found one that is made specifically for what you are doing: Skip32
You may consider using a GUID, but perhaps you have reasons you want to avoid that. (Say, your app is done already.)
Edit:
Actually, if a GUID is permissible, that gives you a range of 128 bits, and you could easily use any other block cipher. The benefit of having a larger space (at the cost of long ID strings) is that you'll have much more protection from people guessing IDs. (Not that an order ID by itself should be a security token anyway...)
If your idea is that just knowing the order number (or URL) is enough to get information on the order then:
The order number space needs to be extremely large, otherwise attackers and/or customers will conceivably search the order space, to see what can be seen.
You should consider that an attacker may launch gradual probing from numerous machines, and may be patient.
Probing the order number space can be mitigated by rate limiting, but that's very hard to apply in a web environment -- it's hard to distinguish your customer access from attacker access.
Consider also that the order number is not much of a secret; people could be sending it around in emails, and once it's out, it's impossible to retract.
So, for the convenience of one-click check-my-order-without-logging-in, you have created a permanent security risk.
Even if you make the order number space huge, you still have the problem that those URLs are floating around out there, maybe in possession of folks who shouldn't have gotten them.
It would be much much better to require a login session in order to see anything, then only show them the orders they're authorized to see. Then you don't have to worry about hiding order numbers or attackers guessing order numbers, because just the order number isn't enough information to access anything.
Recently I started using Hashids, a set of small libraries. The idea is to encrypt a number or list of numbers into a short hashed string like:
12345 => "NkK9"
[683, 94108, 123, 5] => "aBMswoO2UB3Sj"
The libraries are implemented in popular programming languages by various authors. They are also cross-compatible, which means you can encode a number in Python and then decode it in JavaScript. Hashids supports salts, custom alphabets, and even exclusion of bad words.
Python:
from hashids import Hashids
hashids = Hashids(salt="this is my salt")
id = hashids.encode(683, 94108, 123, 5)  # -> "aBMswoO2UB3Sj", as above
JS:
var hashids = new Hashids("this is my salt"),
    numbers = hashids.decode("aBMswoO2UB3Sj");  // [683, 94108, 123, 5]
This is not government-proof encryption, but it's totally sufficient for non-predictable permalink-sharing sites.
Issues of whether you should actually be doing this aside, here's a very simple block cipher with a fixed key (since you only seem to need one permutation anyway).
static uint permute(uint id)
{
    // Three-round Feistel network over 16-bit halves with fixed round constants.
    uint R = id & 0xFFFF;
    uint L = (id >> 16) ^ (((((R >> 5) ^ (R << 2)) + ((R >> 3) ^ (R << 4))) ^ ((R ^ 0x79b9) + R)) & 0xFFFF);
    R ^= ((((L >> 5) ^ (L << 2)) + ((L >> 3) ^ (L << 4))) ^ ((L ^ 0xf372) + L)) & 0xFFFF;
    return ((L ^ ((((R >> 5) ^ (R << 2)) + ((R >> 3) ^ (R << 4))) ^ ((R ^ 0x6d2b) + R))) << 16) | R;
}
Skip32 is much better as far as 32-bit block ciphers go, but it's a bit heavyweight when three (long) lines would do. :-)
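To make the structure concrete, here is a rough Python port of the same permutation plus its inverse (the inverse simply applies the three Feistel rounds in reverse order); it should match the C# above for 32-bit inputs, but treat it as a sketch:

MASK16 = 0xFFFF

def f(r, k):
    # Round function: mixes a 16-bit half with a fixed round constant k.
    return ((((r >> 5) ^ (r << 2)) + ((r >> 3) ^ (r << 4))) ^ ((r ^ k) + r)) & MASK16

def permute(n):
    r = n & MASK16
    l = (n >> 16) ^ f(r, 0x79b9)
    r ^= f(l, 0xf372)
    return ((l ^ f(r, 0x6d2b)) << 16) | r

def unpermute(n):
    r = n & MASK16
    l = (n >> 16) ^ f(r, 0x6d2b)  # undo round 3
    r ^= f(l, 0xf372)             # undo round 2
    return ((l ^ f(r, 0x79b9)) << 16) | r  # undo round 1

assert all(unpermute(permute(i)) == i for i in range(100000))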
I prototyped this idea using Blowfish, a block cipher with 64-bit blocks.
I don't think this scheme is that great of an idea. Why aren't you verifying that a user is logged in and has access to view a specified order?
If you REALLY want to just have all orders out there without any authentication, a GUID would be best.
Or, you could try to come up with order numbers prefixed with something about the customer. Like (PhoneNumber)(1...100)
To meet the requirement you could simply use a hash such as SHA-1 or MD5 on your indexes. These will provide adequate security for this purpose.
To bring the size down you can switch to a different encoding, such as Base64.
I'd also very strongly recommend using a salt; otherwise the hash values could easily be broken.
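A sketch of that suggestion (the salt value is made up; note a hash is one-way, so you'd store the token alongside the order and look it up, rather than decrypting it):

import base64
import hashlib

SALT = b"some-long-random-secret"  # made-up salt; keep it out of source control

def order_token(order_id):
    digest = hashlib.sha1(SALT + str(order_id).encode()).digest()
    # URL-safe Base64 keeps the token short enough for a link.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

print(order_token(0))  # a 27-character URL-safe token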
