How does Firebase handle longs and doubles?

The Firebase Java API specifies that Long is a valid type to pass to setValue(). JavaScript only supports a single number type, the equivalent of Java's "double". So if I insert a number from JavaScript and retrieve it later from Java, am I going to get a Long or a Double? Is it a bad idea to use Longs in any cross-platform Firebase code, seeing as how a JavaScript client has no way of creating this type?

Numbers are slotted into either Longs or Doubles on the server. If the number maps exactly to a Long (i.e. is within the range of Longs, and does not have a decimal point), it will be stored as a Long. Otherwise, it will be stored as a Double.
JavaScript does have less integer precision than Java's Long: it represents integers exactly only up to 2^53 - 1 (Number.MAX_SAFE_INTEGER). If you remain within JavaScript's limits, you shouldn't have a problem using Longs cross-platform.
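As a rough illustration of the rule described above, here is a minimal Python sketch. It is not Firebase code; the helper names stored_type and is_js_safe are made up for this example, and the only facts assumed are the 64-bit range of a Java long and JavaScript's exact-integer limit of 2^53 - 1.

```python
import math

LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1   # range of a Java long
JS_MAX_SAFE_INTEGER = 2**53 - 1            # Number.MAX_SAFE_INTEGER in JavaScript


def stored_type(value: float) -> str:
    """Hypothetical helper: which Java type the rule above would pick."""
    if math.isfinite(value) and value == int(value) and LONG_MIN <= int(value) <= LONG_MAX:
        return "Long"    # whole number inside the long range
    return "Double"      # fractional, out of range, NaN or infinity


def is_js_safe(n: int) -> bool:
    """True if a JavaScript client can represent the integer n exactly."""
    return abs(n) <= JS_MAX_SAFE_INTEGER


print(stored_type(42.0), stored_type(42.5))   # Long Double
print(is_js_safe(2**53 - 1), is_js_safe(2**53 + 1))   # True False
```

Values for which is_js_safe returns False can still be written from Java, but a JavaScript client may read back a rounded number.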

Related

storing as integer vs string size

I have checked the docs but got a bit confused. When storing a long integer such as 265628143862153200, would it be more efficient to store it as a string or as an integer?
What I need help with: is the calculation below correct?
Integer:
length of 265628143862153200 *8 ?
String:
length of 265628143862153200 +1 ?
The Firebase console is not meant to be part of administrative workflows. It's just for discovery and development. If you have production-grade procedures to follow, you should only write code for that using the provided SDKs. Typically, developers make their own admin sites to deal with data in Firestore.
Also, you should know that JavaScript integers are not "big" enough to store data to the full size provided by Firestore. If you want to use the full size of a number field, you will have to use a system that supports Firestore's full 64-bit signed integer type.
If you must store numbers bigger than either limit, and be able to modify them by hand, consider instead storing multiple numbers, similar to the way Firestore's timestamp stores seconds and nanoseconds as separate numbers, so that the full value can overflow greater than signed 64 bits.
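One way to picture the "multiple numbers" idea is the following Python sketch. The base of 10^9 and the names split_number and join_number are assumptions chosen for this example, not anything Firestore prescribes; the two parts would be stored as two separate number fields.

```python
from typing import Tuple

BASE = 10**9  # assumption: the low part holds nine decimal digits


def split_number(n: int) -> Tuple[int, int]:
    """Split a non-negative integer into (high, low) parts."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return divmod(n, BASE)


def join_number(high: int, low: int) -> int:
    """Rebuild the original integer from its two stored parts."""
    return high * BASE + low


high, low = split_number(265628143862153200)
print(high, low)   # 265628143 862153200
```

Both parts of the example value stay well below 2^53, so a JavaScript client can read and edit each of them without rounding.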

Why is a buffer used in Win32 API syscall cast to [1<<20]<type> array?

I'm writing a golang application which interacts with Windows Services using the windows/svc package.
Looking at the package source code to see how syscalls are made, I noticed an interesting cast construct:
name := syscall.UTF16ToString((*[1 << 20]uint16)(unsafe.Pointer(s.ServiceName))[:])
Extracted from mgr.go
This is a common pattern when dealing with the Win32 API when one needs to pass a pre-allocated buffer to receive a value from a Win32 API function, usually an array or a structure.
I understand that the Win32 API returns a Unicode string represented by its pointer, and that it is passed to the syscall.UTF16ToString(s []uint16) function to convert it to a Go string in this case.
I'm confused by the part where an unsafe pointer is cast to a pointer to a 1M-element array, *[1<<20]uint16.
Why is the size 1M, [1<<20]?
The buffer for the value is allocated dynamically, not with a fixed size of 1M.
You need to choose a static size for the array type, so 1<<20 is chosen to be large enough to allow for any reasonable buffer returned by the call.
There is nothing special about this size, sometimes you'll see 1<<31-1 since it's the largest array for 32bit platforms, or 1<<30 since it looks nicer. It really doesn't matter as long as the type can contain the returned data.
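The same idea can be sketched in Python with ctypes, purely as an analogy to the Go cast and not as the windows/svc implementation. The pointer is viewed as a pointer to a large fixed-size array type, nothing is copied, and only the elements up to the NUL terminator are actually read. The names MAX_UNITS, make_utf16_buffer and utf16_ptr_to_str are hypothetical.

```python
import ctypes

MAX_UNITS = 1 << 20  # arbitrary upper bound, mirroring the Go array type


def make_utf16_buffer(text: str):
    """Build a NUL-terminated buffer of 16-bit units (BMP characters only)."""
    # ctypes zero-initialises the array, so the last element is the terminator
    return (ctypes.c_uint16 * (len(text) + 1))(*map(ord, text))


def utf16_ptr_to_str(ptr) -> str:
    """Read a NUL-terminated UTF-16 string through a pointer of unknown length."""
    # View the memory as one big array; this creates a view, not a copy
    big = ctypes.cast(ptr, ctypes.POINTER(ctypes.c_uint16 * MAX_UNITS)).contents
    units = []
    for i in range(MAX_UNITS):
        unit = big[i]
        if unit == 0:          # stop at the terminator, never read past it
            break
        units.append(unit)
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le")


buf = make_utf16_buffer("Spooler")
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_uint16))
print(utf16_ptr_to_str(ptr))   # Spooler
```

The array size only has to be large enough for the type to cover the data; as in the Go code, memory beyond the terminator is never touched.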

What is the Realm native integer size? Int vs Int8, Int16, Int32

TL;DR: I'm building a data set to share between iOS and Android. Should I tweak the integer sizes to match their actual ranges, or just make everything Integer and use Int in Swift and long in Java?
In a typical SQL database, storing a large number of 4-byte integers would take ~4x more space than a 1-byte integer[1]. However, I read in this answer that integers are stored bit-packed, and in the Realm Java help that "The integer types byte, short, int, and long are all mapped to the same type (long actually) within Realm." So, reading between the lines, it seems that the on-disk storage will be the same regardless of what integer sub-type I use.
So, from a pure Realm / database perspective should I just use Int & long in Swift & Java respectively? (I.e. leaving aside language differences, like casting, in-memory size etc.)
If an integer field is indexed, does that make any difference to the type chosen?
PS: Many thanks to the Realm team for their great docs and good support here on SO!
[1] Yes, I know it's more complicated than that.
Your interpretation is correct: the same underlying storage type is used for all integer types in Realm, and that storage type adjusts the number of bits it uses per value based on the range of values stored. For example, if you only store values in the range 0-15 then each value will use fewer bits than if you store values in the range 0-65,535. Similarly, all indexes on integer properties use a common storage type.
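To make the range-based sizing concrete, here is a small Python sketch of the general bit-packing idea. It does not reproduce Realm's actual on-disk layout (which may round widths differently); bits_per_value and packed_bytes are hypothetical helpers for illustration only.

```python
import math


def bits_per_value(values) -> int:
    """Smallest bit width that can hold every non-negative value given."""
    values = list(values)
    if any(v < 0 for v in values):
        raise ValueError("this sketch only handles non-negative integers")
    return max((v.bit_length() for v in values), default=0)


def packed_bytes(values) -> int:
    """Bytes needed if every value is stored at the common bit width."""
    values = list(values)
    return math.ceil(len(values) * bits_per_value(values) / 8)


print(bits_per_value(range(16)))   # 4
print(bits_per_value([0, 65535]))  # 16
```

Under this simplified model, 1000 values in the range 0-15 need 500 bytes, while 1000 values reaching 65,535 need 2000 bytes, regardless of the integer type declared in the model class.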

database datatype performance: int or string

I'm storing phone country codes. They range from 1 to about 300. What's going to be more performant as a datatype: int or string? I'm using SQL Server 2008 and LINQ to SQL.
Thanks.
Note: Whoa, really weird - you asked about phone codes and I wrote about ZIP codes. Sorry about that! I think the advice still stands though...
Original answer: Performance will most likely be negligible - assign the proper type based on what the data is. ZIP codes, while numeric (in the US at least), aren't numbers - they should be stored as strings.
It is very important to understand the semantic nature of the data you are storing. Once you understand what something is then you can begin to reason about how it should be stored. I am assuming that currently you are storing only the first 5 numbers of a US postal code (like this: 12345).
If you were to store this data as a number this would work. Then imagine that your manager tells you that there is a new requirement that the app you are building will start to collect ZIP codes in the ZIP+4 format (which looks like this: 12345-6789). Now you are stuck with a nasty refactoring that involves either changing the type in the database to varchar(10) or doing some crazy voodoo in your app to strip out the dash when you save the ZIP code and then add it back in for display later.
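To show what that "voodoo" might look like, here is a hedged Python sketch. The function names are invented for this example, and the ZIP code used is just a sample value with a leading zero.

```python
def zip_to_int(zip_code: str) -> int:
    """Strip the dash and store the ZIP code as a number."""
    return int(zip_code.replace("-", ""))


def int_to_zip(number: int, plus4: bool) -> str:
    """Rebuild the display form; the caller must remember which format it was."""
    if plus4:
        digits = f"{number:09d}"           # restore leading zeros, 9 digits
        return f"{digits[:5]}-{digits[5:]}"
    return f"{number:05d}"                 # restore leading zeros, 5 digits


print(zip_to_int("02134"))                           # 2134
print(int_to_zip(zip_to_int("02134-1234"), True))    # 02134-1234
```

Note that the number alone no longer says whether it was a 5-digit or a ZIP+4 value, and the leading zero is lost unless the code pads it back. Storing the string avoids both problems.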
If you're really worried about space and performance then you could use a smallint (which equates to an Int16). This will mean that the data will only take 2 bytes of storage (and 2 bytes in memory).
Given an option where I know the datatype will always be integer, I'll go for integer albeit smaller size - smallint / tinyint (depending on the required range).
I don't expect much difference in performance though.
How are you going to be using them and do any have leading zeros?
If you are going to be combining with phone numbers that are usually stored as string, you want to store them as a string as well or you will waste processing power converting them in every query.
If you aren't planning on doing math or joins with it, it is probably a bad idea to store it as a number. Your data set is likely so small and the strings so tiny (300 is the max value) that using an int would probably gain you nothing in a join either.
Country codes are strings (notwithstanding that they use only the characters 0..9) and should be stored as such.
They are so few that you don't need to be concerned about this, though it would be simpler to apply a check constraint with an integer type.
My rule of thumb has always been: do I need an average? For example, you can store a zip code as an integer, but are you ever going to need the average zip code? Probably not. As such, store it as char, unless you may need more than 5 characters, in which case store it as varchar.

What are the best ways to use decimals and datetimes with protocol buffers?

I would like to find out the optimum way of storing some common data types that are not included in the list supported by protocol buffers:
datetime (seconds precision)
datetime (milliseconds precision)
decimals with fixed precision
decimals with variable precision
lots of bool values (if you have lots of them it looks like you'll have 1-2 bytes overhead for each of them due to their tags)
Also, the idea is to map them very easily to the corresponding C++/Python/Java data types.
The protobuf design rationale is most likely to keep data type support as "native" as possible, so that it's easy to adopt new languages in future. I suppose they could provide built-in message types, but where do you draw the line?
My solution was to create two message types:
DateTime
TimeSpan
This is only because I come from a C# background, where these types are taken for granted.
In retrospect, TimeSpan and DateTime may have been overkill, but it was a "cheap" way of avoiding conversion from h/m/s to s and vice versa; that said, it would have been simple to just implement a utility function such as:
int TimeUtility::ToSeconds(int h, int m, int s)
Bklyn pointed out that heap memory is used for nested messages; in some cases this is clearly a very valid concern - we should always be aware of how memory is used. But in other cases this can be of less concern, where we're worried more about ease of implementation (this is the Java/C# philosophy I suppose).
There's also a small disadvantage to using non-intrinsic types with the protobuf TextFormat::Printer; you cannot specify the format in which it is displayed, so it'll look something like:
my_datetime {
seconds: 10
minutes: 25
hours: 12
}
... which is too verbose for some. That said, it would be harder to read if it were represented in seconds.
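For comparison, a utility along the lines of the ToSeconds function mentioned earlier could look like this Python sketch. The names to_seconds and from_seconds are illustrative only; the values are the ones from the printed message above.

```python
def to_seconds(h: int, m: int, s: int) -> int:
    """Collapse hours/minutes/seconds into a single seconds count."""
    return h * 3600 + m * 60 + s


def from_seconds(total: int):
    """Expand a seconds count back into (hours, minutes, seconds)."""
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return h, m, s


print(to_seconds(12, 25, 10))   # 44710
print(from_seconds(44710))      # (12, 25, 10)
```

A single field holding 44710 is compact on the wire, but much less obvious to a reader than the three named fields.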
To conclude, I'd say:
If you're worried about memory/parsing efficiency, use seconds/milliseconds.
However, if ease of implementation is the objective, use nested messages (DateTime, etc).
Here are some ideas based on my experience with a wire protocol similar to Protocol Buffers.
datetime (seconds precision)
datetime (milliseconds precision)
I think the answer to these two would be the same, you would just typically be dealing with a smaller range of numbers in the case of seconds precision.
Use a sint64/sfixed64 to store the offset in seconds/milliseconds from some well-known epoch like midnight GMT 1/1/1970. This is how Date objects are internally represented in Java. I'm sure there are analogs in Python and C++.
If you need time zone information, pass around your date/times in terms of UTC and model the pertinent time zone as a separate string field. For that, you can use the identifiers from the Olson Zoneinfo database since that has become somewhat standard.
This way you have a canonical representation for date/time, but you can also localize to whatever time zone is pertinent.
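A minimal Python sketch of this representation follows. The Moment class and its field names are assumptions standing in for whatever message you would define; the zone is carried as an opaque identifier string and is not resolved here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt: datetime) -> int:
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(ms: int) -> datetime:
    """UTC datetime for a millisecond offset from the Unix epoch."""
    return EPOCH + timedelta(milliseconds=ms)


@dataclass
class Moment:
    """Hypothetical stand-in for a message with an int64 and a string field."""
    epoch_millis: int   # canonical instant, always UTC
    zone: str           # Olson identifier, e.g. "Europe/Berlin"


m = Moment(to_epoch_millis(datetime(1970, 1, 2, tzinfo=timezone.utc)), "Europe/Berlin")
print(m.epoch_millis)   # 86400000
```

For seconds precision the same code applies with timedelta(seconds=1) in place of the millisecond unit.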
decimals with fixed precision
My first thought is to use a string similar to how one constructs Decimal objects from Python's decimal package. I suppose that could be inefficient relative to some numerical representation.
There may be better solutions depending on what domain you're working with. For example, if you're modeling a monetary value, maybe you can get away with using a uint32/64 to communicate the value in cents as opposed to fractional dollar amounts.
There are also some useful suggestions in this thread.
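Both options mentioned above, a string and an integer count of cents, can be sketched with Python's decimal module. The helper names to_cents and from_cents are made up for this example, and two decimal places are assumed.

```python
from decimal import Decimal


def to_cents(amount: Decimal) -> int:
    """Convert a two-decimal amount to an integer number of cents."""
    scaled = amount.scaleb(2)                    # shift the decimal point
    if scaled != scaled.to_integral_value():
        raise ValueError("amount has more than two decimal places")
    return int(scaled)


def from_cents(cents: int) -> Decimal:
    """Convert an integer number of cents back to a Decimal amount."""
    return Decimal(cents).scaleb(-2)


print(to_cents(Decimal("12.34")))   # 1234
print(from_cents(1234))             # 12.34

# The string route is lossless as well and keeps trailing zeros
print(Decimal(str(Decimal("12.340"))))   # 12.340
```

The integer form is smaller on the wire; the string form also works when the number of decimal places is not fixed in advance.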
decimals with variable precision
Doesn't Protocol Buffers already support this with float/double scalar types? Maybe I've misunderstood this bullet point.
Anyway, if you had a need to go around those scalar types, you can encode using IEEE-754 to uint32 or uint64 (float vs double respectively). For example, Java allows you to extract the IEEE-754 representation and vice versa from Float/Double objects. There are analogous mechanisms in C++/Python.
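In Python the analogous mechanism is the struct module. The sketch below reinterprets the bytes of a float or double as an unsigned integer and back; the function names are illustrative.

```python
import struct


def double_to_bits(x: float) -> int:
    """IEEE-754 binary64 bit pattern of x as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_double(bits: int) -> float:
    """Inverse of double_to_bits."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def float_to_bits(x: float) -> int:
    """IEEE-754 binary32 bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


print(hex(double_to_bits(1.0)))   # 0x3ff0000000000000
print(hex(float_to_bits(1.0)))    # 0x3f800000
```

The round trip through double_to_bits and bits_to_double is exact, but it does not make the value any more precise than the double it started from.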
lots of bool values (if you have lots of them it looks like you'll have 1-2 bytes overhead for each of them due to their tags)
If you are concerned about wasted bytes on the wire, you could use bit-masking techniques to compress many booleans into a single uint32 or uint64.
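A bit-masking scheme of this kind might look like the following Python sketch. The convention that flag i lives in bit i, and the names pack_flags and unpack_flags, are assumptions that sender and receiver would have to agree on.

```python
def pack_flags(flags) -> int:
    """Pack up to 64 booleans into one integer; flag i goes into bit i."""
    flags = list(flags)
    if len(flags) > 64:
        raise ValueError("a uint64 can hold at most 64 flags")
    mask = 0
    for i, flag in enumerate(flags):
        if flag:
            mask |= 1 << i
    return mask


def unpack_flags(mask: int, count: int):
    """Recover the first `count` booleans from a packed mask."""
    return [bool((mask >> i) & 1) for i in range(count)]


print(pack_flags([True, False, True]))   # 5
print(unpack_flags(5, 3))                # [True, False, True]
```

The number of flags is not stored in the mask itself, so it has to be fixed by convention or sent alongside.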
Because there isn't first-class support in Protocol Buffers, all of these techniques require a bit of a gentlemen's agreement between agents. Perhaps using a naming convention on your fields like "_dttm" or "_mask" would help communicate when a given field has additional encoding semantics above and beyond the default behavior of Protocol Buffers.
Sorry, not a complete answer, but a "me too".
I think this is a great question, one I'd love an answer to myself. The inability to natively describe fundamental types like datetimes and (for financial applications) fixed point decimals, or map them to language-specified or user-defined types is a real killer for me. It's more or less prevented me from being able to use the library, which I otherwise think is fantastic.
Declaring your own "DateTime" or "FixedPoint" message in the proto grammar isn't really a solution, because you'll still need to convert your platform's representation to/from the generated objects manually, which is error prone. Additionally, these nested messages get stored as pointers to heap-allocated objects in C++, which is wildly inefficient when the underlying type is basically just a 64-bit integer.
Specifically, I'd want to be able to write something like this in my proto files:
message Something {
required fixed64 time = 1 [cpp_type="boost::posix_time::ptime"];
required int64 price = 2 [cpp_type="fixed_point<int64_t, 4>"];
...
};
And I would be required to provide whatever glue was necessary to convert these types to/from fixed64 and int64 so that the serialization would work. Maybe through something like adobe::promote?
For datetime with millisecond resolution I used an int64 that has the datetime as YYYYMMDDHHMMSSmmm. This makes it both concise and readable, and surprisingly, will last a very long time.
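The YYYYMMDDHHMMSSmmm encoding can be sketched in Python as follows. The function names are hypothetical, and the sketch ignores time zones, as the description above does.

```python
from datetime import datetime


def encode_datetime(dt: datetime) -> int:
    """Encode a datetime as the integer YYYYMMDDHHMMSSmmm."""
    value = dt.year
    for part in (dt.month, dt.day, dt.hour, dt.minute, dt.second):
        value = value * 100 + part            # two decimal digits per field
    return value * 1000 + dt.microsecond // 1000   # three digits of milliseconds


def decode_datetime(value: int) -> datetime:
    """Decode an integer produced by encode_datetime."""
    value, millis = divmod(value, 1000)
    value, second = divmod(value, 100)
    value, minute = divmod(value, 100)
    value, hour = divmod(value, 100)
    value, day = divmod(value, 100)
    year, month = divmod(value, 100)
    return datetime(year, month, day, hour, minute, second, millis * 1000)


print(encode_datetime(datetime(2009, 6, 15, 12, 25, 10, 123000)))   # 20090615122510123
```

Even the last millisecond of the year 9999 encodes to a 17-digit number, which is below the signed 64-bit maximum of roughly 9.22 x 10^18.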
For decimals, I used byte[], knowing that there's no better representation that won't be lossy.
