Firebase transaction precision loss

In the Firebase Realtime Database I have a path like ../funds/balance with a value of 10.
I have a function that runs a transaction which decreases the balance by 2 cents.
The first time I remove 2 cents everything works and the balance is saved as 9.98.
I do it one more time: the console logs show that I read 9.98 from the database, and after the data is updated the Firebase console shows 9.96.
The next time I do it, remember that the Firebase console shows 9.96, but the value received in the transaction, when I console.log it, is actually 9.96000001.
Does someone know why?

This is a well-known property of floating-point representations.
The issue arises because, in this case, 0.02 cannot be represented exactly as a binary floating-point number. It can be represented exactly in decimal (2x10^-2), but in binary its expansion is an infinitely recurring fraction (1.0100011110101110000101...x2^-6), so it has to be rounded to the nearest representable double.
What you are seeing is analogous to representing 1/3 in decimal as 0.333. You would expect that subtracting 1/3 from 1 three times leaves 0, but subtracting 0.333 from 1 three times leaves 0.001.
The solution is not to represent money as floating-point. In SQL you'd use a MONEY, NUMERIC or DECIMAL type or, more generally, you can represent money as an integer number of pennies (or hundredths of a penny, etc., according to your needs).
There's an article called "What Every Computer Scientist Should Know About Floating-Point Arithmetic" that you can find online that explains the issue.
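Here is a minimal sketch of both the rounding drift and the integer-cents fix, using Python purely for illustration (the amounts and variable names are made up):

# Binary floating point cannot store most decimal fractions exactly, so sums drift:
print(0.1 + 0.2)              # 0.30000000000000004

# Storing the balance as an integer number of cents keeps the arithmetic exact.
balance_cents = 1000          # 10.00 stored as 1000 cents
balance_cents -= 2            # subtract 2 cents
print(balance_cents / 100)    # 9.98, converted to a decimal amount only for display

The same idea carries over to the Firebase transaction: keep the stored value in cents and only format it as a decimal amount when presenting it to the user.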

Point of check digits in MRZs?

Not sure if this is the right subreddit to ask this question, but I will give it a shot. There is the ICAO standard for Machine Readable Zones as described here https://en.wikipedia.org/wiki/Machine-readable_passport. I don't see the point for check digits there.
If, for example, I have an F instead of a 5 somewhere in the second line of the MRZ, all the check digits stay the same. What is the point of those check digits in the ICAO standard in the first place? In particular, I don't see the point of the final check digit's calculation, since you could also compute it from the check digits of the second line rather than from all of the letters/numbers.
Could someone explain why we need those check digits?
To be fair, this is not a subreddit. Anyway, there are multiple reasons why there are check digits inside the MRZ. The first reason is that automatic readers can check whether the code was read well enough. The second reason is that it prevents a lot of fraud and identity theft. Some people who alter their travel documents do not know that there are check digits in place, so some of them will get caught because they fail to edit the check digits accordingly.
Some countries now include PDF417 barcodes and/or QR codes to achieve better machine reads. But keep in mind that not all governments/countries have access to high-tech devices, so the machine readable zone is still mandatory for a check with the naked eye.
Source: I work for a travel document verification company.
MRZ check digits are calculated over subsections of the entire MRZ, and each calculation serves as a check for its own section. A final (composite) check digit is calculated over those sections and their check digits taken together, and it serves as a double check on the individual checks.
Both of the strings below have the same check digit, 8:
123456780
128456785
Here the subsection check digit still matches after the tampering, but the final check digit can detect it. Therefore, the final check digit adds additional robustness.
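For reference, here is a small sketch of the ICAO 9303 check digit calculation (weights 7, 3, 1 repeating; digits keep their value, letters A-Z map to 10-35, and the filler '<' counts as 0), written in Python just for illustration; it confirms that both strings above give 8:

def mrz_check_digit(field):
    # Weights cycle 7, 3, 1 across the characters of the field.
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch.isdigit():
            value = int(ch)
        elif ch.isalpha():
            value = ord(ch.upper()) - ord("A") + 10   # A=10 ... Z=35
        else:
            value = 0                                  # '<' filler
        total += value * weights[i % 3]
    return total % 10

print(mrz_check_digit("123456780"))   # 8
print(mrz_check_digit("128456785"))   # 8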
That said, I do wonder whether this visual check digit needs to be mandatory, given that the BAC protocol of an eMRTD NFC chip also performs a much stronger cryptographic check of the MRZ value.
UPDATE: My original claim that the composite check digit adds robustness against tampering is incorrect. Given the TD1 MRZ below:
IDSLV0012345678<<<<<<<<<<<<<<<
9306026F2708252SLV<<<<<<<<<<<4
JOHN<SMEAGOL<<WENDY<LIESSETTEF
An OCR scanner can give either 0012345678 or OO12345678 for the document number portion, and all of the check digits pass, including the composite check digit. But there is no way to tell which document number is correct. It seems that MRZ check digits have edge cases that cannot be helped.

Python 3.6+ Local Time Disambiguation (PEP 495)

I have some questions regarding PEP 495.
The classes in the datetime module (well, some of them) now accept a fold argument whose value is 0 by default and which can also be set to 1.
For example, datetime.datetime(..., fold=1).
In my country (Slovenia, Central Europe) we set our time one hour ahead and one hour back between 2 AM and 3 AM. In other countries it's between 1 AM and 2 AM, I think.
1st question: Is this fold smart enough to determine if daylight-saving time is set between 2 AM and 3 AM or if it is set between 1 AM and 2 AM?
2nd question: So setting fold to 1 takes into account the daylight-saving time, right? Is that what it does?
3rd question: Is my understanding of the fold argument even correct?
What is a fold?
A fold is a local time that's ambiguous. This happens whenever the clock is moved back. Take the next time change in Germany as an example:
On the 29th of October 2017 at 3 AM, the clocks will be set back to 2 AM.
Now imagine that you tell someone you'd like to meet on Oct 29th 2017 at 2:30 AM. Do you mean before or after the time change? It's ambiguous, because there are two points in time at which the clocks show exactly this time.
The fold attribute that PEP 495 adds provides exactly that information. It's 0 for the time before the time change and 1 for the time after it.
This is how it is described in PEP 495:
This PEP adds a new attribute fold to instances of the datetime.time and datetime.datetime classes that can be used to differentiate between two moments in time for which local times are the same. The allowed values for the fold attribute will be 0 and 1 with 0 corresponding to the earlier and 1 to the later of the two possible readings of an ambiguous local time.
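To make the ambiguity concrete, here is a small sketch built on the German example above. It uses the zoneinfo module, which was only added later in Python 3.9, so treat it as an illustration of what fold means rather than as part of the 3.6 standard library:

from datetime import datetime
from zoneinfo import ZoneInfo   # Python 3.9+

berlin = ZoneInfo("Europe/Berlin")

# 2:30 AM on 2017-10-29 happens twice in Germany; fold selects which one is meant.
early = datetime(2017, 10, 29, 2, 30, tzinfo=berlin, fold=0)
late = datetime(2017, 10, 29, 2, 30, tzinfo=berlin, fold=1)

print(early.utcoffset())   # 2:00:00 -> still summer time, before the clocks go back
print(late.utcoffset())    # 1:00:00 -> standard time, after the clocks go back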
Daylight saving time and datetime objects
From the python datetime docs:
There are two kinds of date and time objects: “naive” and “aware”.
An aware object has sufficient knowledge of applicable algorithmic and political time adjustments, such as time zone and daylight saving time information, to locate itself relative to other aware objects. An aware object is used to represent a specific moment in time that is not open to interpretation [1].
A naive object does not contain enough information to unambiguously locate itself relative to other date/time objects. Whether a naive object represents Coordinated Universal Time (UTC), local time, or time in some other timezone is purely up to the program, just like it is up to the program whether a particular number represents metres, miles, or mass. Naive objects are easy to understand and to work with, at the cost of ignoring some aspects of reality.
For applications requiring aware objects, datetime and time objects have an optional time zone information attribute, tzinfo, that can be set to an instance of a subclass of the abstract tzinfo class. These tzinfo objects capture information about the offset from UTC time, the time zone name, and whether Daylight Saving Time is in effect. Note that only one concrete tzinfo class, the timezone class, is supplied by the datetime module. The timezone class can represent simple timezones with fixed offset from UTC, such as UTC itself or North American EST and EDT timezones. Supporting timezones at deeper levels of detail is up to the application. The rules for time adjustment across the world are more political than rational, change frequently, and there is no standard suitable for every application aside from UTC.
TL;DR:
The standard library provides:
Unambiguous local times (using the fold argument for times that would otherwise be ambiguous).
A tzinfo attribute for datetime objects that can be used to implement time zone and daylight saving time information.
The standard library does not provide:
Time conversion between different time zones that is aware of daylight saving time.
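As a small example of what the standard library itself ships, its only concrete tzinfo class is the fixed-offset timezone (no DST rules attached); the offset and name below are chosen just for illustration:

from datetime import datetime, timezone, timedelta

utc_now = datetime.now(timezone.utc)             # an aware datetime in UTC
cest = timezone(timedelta(hours=2), "CEST")      # fixed +02:00 offset, no DST logic
print(utc_now.astimezone(cest))                  # the same instant, displayed at +02:00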
Answering your questions
Is this fold smart enough to determine if daylight-saving time is set between 2 AM and 3 AM or if it is set between 1 AM and 2 AM?
No, it's not; even the Python standard library doesn't provide this. I'm sure that there are third-party libraries that can do it, but I haven't used any of them so far.
So setting fold to 1 takes into account the daylight-saving time, right? Is that what it does?
Not really. The meaning of the fold argument is more like "is this the first or the second time that the clock shows this time?". You only need it for the (usually) one hour during which the clocks are moved back. Apart from that case, local time is unambiguous, so you don't need the fold argument. Whether or not daylight saving time is in effect isn't relevant to the argument.
Is my understanding of the fold argument even correct?
Not really, but I hope that I could shed some light with my answers above.

Siemens DICOM Individual Slice Time (Private_0019_1029)

I'm seeing that the individual slice time information from the Private_0019_1029 field of the DICOM header has negative values, and sometimes only positive values.
I assumed that these times are with respect to the Volume Acquisition time recorded in the header.
Going by that assumption, it would mean that the Acquisition time varies. But upon checking the difference between successive volume acquisition times, I see that it's equal to TR.
So I'm at a loss about what's happening.
I'm trying to look at the raw fMRI data without slice time correction; hence it's necessary to have the individual slice times.
Does the moco series do time shifting in addition to motion correction? (I don't believe it used to, but your experience may show otherwise).
This indicates how their slice timing is measured. Try the computations with the raw and the moco series and see if the times line up. That may give you your answer.
When dealing with a private tag, you should really include the private creator, in your case the value of tag (0019,0010).
You may also want to have a look at the output of:
gdcmdump --csa input.dcm
This will dump the SIEMENS CSA header directly from the DICOM attribute.
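If you prefer to inspect the tags from Python, a minimal sketch along the same lines (assuming the third-party pydicom package and a local file named input.dcm) would be:

import pydicom

ds = pydicom.dcmread("input.dcm")

# The private creator (0019,0010) identifies which vendor block the private tags belong to.
print(ds[0x0019, 0x0010].value)

# (0019,1029) holds the per-slice timing values discussed in the question;
# compare them between the raw and the moco series as suggested above.
print(ds[0x0019, 0x1029].value)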

How to handle centimeter from/to inches

I have an international application that handles lengths and weights of people, and stores these in a database. I was wondering how to deal with this in case users can switch between using centimeters/inches in the application.
I was thinking of always using centimeters in the database, and converting to inches if the user chooses to use inches. But of course, if the user enters a length in inches and it is converted to and stored as centimeters, the value may change slightly because of rounding errors.
How would you handle this scenario?
There is much to consider in your question beyond the information that is available. Before deciding how to store and convert the information, you must know what your acceptable error is. For instance if you are calculating trajectory to intercept an incoming missile with another missile, extremely minute precision is necessary to be successful. If this is a medical application and being used to precisely control medication formulation it could be more important to be precise than if you are simply calculating BMI.
In short, pick a standard, whether metric or otherwise, and stick with it for your storage type. Depending on the precision required, choose the smallest unit of the measurement system that will give you the accuracy you need. Any display in a different measurement system would then be converted from this base measurement.
And try not to over-engineer the solution. If it will not conceivably be important to measure out to 52 decimal places you are wasting effort and injecting unnecessary complication accounting for that scenario.
Personally I would use one of two methods.
Always store it in the same unit of measure
Store a unit of measure in a separate field so that you know whether the unit is cm or inches.
I prefer the first method since it makes it easier to process.
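A minimal sketch of the first method (one canonical unit in the database, conversion only at display time); the millimetre base, function names and rounding below are just illustrative choices:

MM_PER_INCH = 25.4

def to_mm(value, unit):
    # Normalize user input ("cm" or "in") to whole millimetres before storing.
    mm = value * 10 if unit == "cm" else value * MM_PER_INCH
    return round(mm)

def for_display(mm, unit):
    # Convert the stored millimetres back to the user's preferred unit.
    return round(mm / 10, 1) if unit == "cm" else round(mm / MM_PER_INCH, 1)

stored = to_mm(69.5, "in")         # stored as 1765 mm
print(for_display(stored, "in"))   # 69.5 -> the user's original value survives the round trip
print(for_display(stored, "cm"))   # 176.5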
You would have rounding errors if you convert from inches to centimeters, and also if you convert from centimeters to inches; the problem is the same no matter what you store in the DB.
You could store the values in the database in millimeters instead of centimeters. The smaller the unit, the more exact the stored value will be, even in the case of conversion.
If the same user should be able to switch units in the front end, you should definitely store a single field representing the value in one unit you have decided on, because the rounding errors will happen anyway.
If you have one group of users dealing only with inches and another dealing only with cm, and each of these groups has its own database or at least its own values, then go for two fields, value/unit (e.g. the same software, but different customer installations in different countries).
I'd store non-float values representing, for example, micrometers (with an unsigned 32-bit integer you can represent everything from about 4.2 km down to 0.001 mm).
Not sure why you would need a database unless you were storing your conversion rates.
There would be no way to detect metric or imperial, because they are just numbers.
Your rounding errors will happen in accordance with the degree of accuracy you wish to display.
Depending on what you're going to be doing with those values (whether you need to do much aggregation at the DB layer, etc.), the best way to ensure there is no cumulative rounding error is to store the value in its original unit of measure with that unit of measure (id) in a separate column, and have a separate conversion table that you use for on-the-fly calculations when comparing, aggregating, etc.
This will not be super-efficient or convenient, however: you will always have to join to a conversion table before doing any work with the values stored.
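A rough sketch of that design, with the conversion table reduced to a dictionary and all names invented for the illustration:

# Conversion table: unit id -> factor to a common base unit (mm here).
CONVERSIONS = {"cm": 10.0, "in": 25.4}

# Rows keep the original value plus its unit id, so nothing is lost to rounding.
rows = [(69.5, "in"), (176.0, "cm")]

# "Join" to the conversion table on the fly when comparing or aggregating.
in_mm = [value * CONVERSIONS[unit] for value, unit in rows]
print(max(in_mm))   # compare in the common unit without ever rewriting the stored rows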
We can do the task very simply. There is only one coefficient for converting inches to cm, and the calculations and results do not have to be saved into the DB at all. You only multiply or divide the number by the ratio to get the result, whichever direction you are converting in. You can see how it works in this example, which I found in 2 minutes: http://inchpro.com/metric-system/convert-inches-to-centimeters
I think you know that storing all the values in a database occupies a lot of space.

How do computers figure out date information?

Most languages have some sort of date function where you really don't have to do any programming to get date information; you just get it from the date function/object. I am curious what goes on behind the scenes to make this happen.
Every computer has a system clock which keeps track of the date and time. At the lowest level, date and time information is retrieved from there. On top of that, add time zone information, etc., from the operating system and you get a Date object or something similar.
Depending on your language/environment, Date objects can either perform date calculations themselves or you have to use other functions to achieve that. Those ensure that leap years are handled correctly and that no invalid date can be created.
But maybe I got your question wrong.
Typically a computer stores a count of how many of a certain unit of time have gone by since a specific time and date in the past. In Unix systems, for example, this could be the number of seconds since the Unix epoch, which is midnight, Jan 1st 1970 GMT. In Windows, this is the number of 100 ns intervals since 1601-01-01 (thanks Johannes Rössel). Or it could be something as simple as the number of seconds since the computer was powered on.
So from the number of units that have gone by since that time/date, an OS can calculate the number of years, months, days, etc. that have gone by. Of course, all sorts of fun stuff like leap years and leap seconds have to be taken into account for this to happen.
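As a small illustration of that idea (Python here, purely as an example): the OS hands you a single count of seconds since the Unix epoch, and library code turns it into calendar fields, taking care of leap years and month lengths for you:

import time
from datetime import datetime, timezone

seconds_since_epoch = time.time()   # seconds since 1970-01-01 00:00:00 UTC
dt = datetime.fromtimestamp(seconds_since_epoch, tz=timezone.utc)

# All of the calendar fields are derived from that single count.
print(dt.year, dt.month, dt.day, dt.hour, dt.minute)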
Systems such as NTP (Network Time Protocol) can be used to synchronize a computer's internal count to atomic clocks via an NTP server over a network. To do this, NTP takes into account the round-trip time and learns about the kinds of errors on the link to the NTP server.
Date and time information is usually provided by the operating system, so it's a system call. The operating system deals with the real-time clock mounted on the computer's mainboard and powered by a small battery (which lasts for years).
Well... most computers contain a "real-time clock", which counts time on the human scale of seconds, minutes, etc. Traditionally, there is a small battery on the motherboard that lets the chip either remember the time, or even keep counting it, while the rest of the computer is powered off.
Many computers today use services like the network time protocol to periodically query a centralized high-precision clock, to set the current time. In this way, even if the battery is removed (or just fails), the computer will still know what time and date it is, and be able to update (to correct for errors in the real-time chip's time-keeping) that information as often as necessary.
Aside from the real-time clock, date calculations are mostly a software library function.
Dates are rather irregular, so behind the scenes a mixture of approximations, corrections and lookup tables is used.
The representation of a date can vary as well; usually some (arbitrary) start date is used. A common system, also used by astronomers, is the Julian day number (not to be confused with the Julian calendar). Dates can be stored as seconds-since-start or as days-since-start (the latter is usually a floating-point number).
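To get a concrete feel for the days-since-a-start-date idea, Python's date.toordinal() counts days since 0001-01-01 in the proleptic Gregorian calendar; it is not the astronomers' Julian day number, but it works on the same principle:

from datetime import date

d = date(2017, 10, 29)
n = d.toordinal()                  # days elapsed since 0001-01-01

print(n)
print(date.fromordinal(n) == d)    # True: the mapping is reversible
print(date.fromordinal(n + 90))    # date arithmetic becomes integer arithmetic on the count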
A surprising amount of surprisingly complicated code is required for date parsing, computation, creation etc.
For example, in Java, dates are computed, modified, stored etc. via the Date and Calendar classes, and specifically and typically the GregorianCalendar implementation of Calendar. (You can download the SDK/JDK and look at the source for yourself.)
In short, what I took from a quick perusal of the source is: Date handling is non-trivial and not something you want to attempt on your own. Find a good library if at all possible, else you will almost certainly be reinventing the square wheel.
Your computer has a system clock and the BIOS has a timer function that can be updated from your OS. Languages only take the information from there and some can update it too.
Buy any of the Calendrical Calculations books. They'll fill you in on how the date libraries work under the hood.
The date/time is often stored in terms of time elapsed since a certain date, for example ticks (100-nanosecond intervals) since January 1, 0001. It is also usually stored in reference to UTC. The underlying methods in the OS, database, framework, application, etc. can then convert this to a more usable representation. Back in the day, systems would store the component parts of the date (day, month, year, etc.) as part of the data structure, but we learned our lesson from the Y2K mess that this probably isn't the best approach.
Most replies have dealt with how the current date is obtained, i.e. from the system clock and so on.
If you want to know how it is stored and used, there are many different implementations and it depends on the system.
I believe a common one is the use of a 64-bit signed integer where 01/01/1970 is 0, so negative numbers are pre-1970 and positive numbers come after it, with each increment adding a 100th of a second (I think it's a 100th; I'd need to check).
Why 01/01/1970, you may ask? This is because the Gregorian calendar is on a 400-year cycle, 01/01/1970 being the closest start of a cycle to the current date.
This is because "Every year that is exactly divisible by four is a leap year, except for years that are exactly divisible by 100; the centurial years that are exactly divisible by 400 are still leap years. For example, the year 1900 is not a leap year; the year 2000 is a leap year." That makes it very complicated. I believe the 400-year cycle coincides with the days of the week repeating as well, but I would need to check. Basically, it's very complicated.
Internally it is incredibly difficult to write a datetime library accounting for all these variations, such as leap years and the fact that there is no year zero, not to mention UTC, GMT and UT1 times.
We had occasion when debugging a client problem to look at how SQL stores datetimes... fairly interesting and makes pretty good sense once you see it.
SQL Server's datetime uses two 4-byte integers...
The first 4 bytes are the date, as a signed count of days relative to Jan 1st, 1900 (with Jan 1st, 1753 as the minimum allowed value). I believe the maximum year is supposed to be 9999, which doesn't exactly line up with the number of values available in 4 bytes, but there you go.
The second 4 bytes are the time since midnight, counted in 1/300-second ticks (which is why datetime values only have roughly 3 ms precision).
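As a toy illustration of that layout, here is a decode of a made-up 8-byte value (the big-endian byte order simply mirrors how such values print as hex literals):

import struct
from datetime import datetime, timedelta

raw = bytes.fromhex("00008EAC00C5C100")    # hypothetical 8-byte datetime value
days, ticks = struct.unpack(">ii", raw)    # 4-byte day count, 4-byte tick count

decoded = datetime(1900, 1, 1) + timedelta(days=days, seconds=ticks / 300)
print(decoded)                             # 2000-01-01 12:00:00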
