In Apache Thrift How Should Date Objects Be Represented - datetime

In the Thrift IDL there isn't a Date type. What's the best cross-language mechanism to represent a date object? I think there are two ideal candidates, but I'd love to hear other ideas.
String - in each language you could use something like strftime to convert the date back.
i32 - Time since epoch can be converted back.
I'm sure there are other things to think about besides conversion. Hoping people out there have some good feedback.

tl;dr: use an appropriately encoded string unless there is a reason to do otherwise.
It depends on what is required. Here are some differences - keep in mind that modern computers are fast and conversion is likely only a small fraction of overall application time, so "more processing" is generally not even meaningfully measurable!
String (with ISO 8601 or the stricter XML dateTime):
"more space" / "more processing" (see above) / fixed size or variable size
standardized culture-neutral format
human readable and easily identifiable
supports timezones
more range (years -9999 to 9999)
more/arbitrary precision (down to 1 µs or finer)
lexicographically ordered (within same timezone and compatible format)
Epoch (UNIX variant):
"less space" / "less processing" / fixed size
standardized culture-neutral format
not human readable (a diligent coder should be able to identify "about now")
no timezones (can't even distinguish between "local" and UTC)
less range (1970 to 2038 with a signed 32-bit number)
less/fixed precision (1 second)
numerically ordered
(The Julian day is another encoding with many similarities to an Epoch time.)
Conclusion:
Unless space/performance is a proven issue - this requires a performance analysis and functional requirements - I'd pick the former. Computers today are a good bit faster than computers of just a few years ago, and much, much faster than decades-old machines.
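The choice itself is language-agnostic; as a rough illustration (sketched here in R, with made-up variable names), the same instant can be carried either way and recovered on the other end:

# Candidate 1: a string field carrying an ISO 8601 / XML dateTime value.
ts <- as.POSIXct("2017-11-22 04:21:13", tz = "UTC")
iso_field <- format(ts, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")    # "2017-11-22T04:21:13Z"

# Candidate 2: an i32/i64 field carrying seconds since the UNIX epoch.
epoch_field <- as.integer(as.numeric(ts))                    # 1511324473

# Either encoding round-trips back to a native date-time object:
as.POSIXct(epoch_field, origin = "1970-01-01", tz = "UTC")
as.POSIXct(iso_field, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")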

Just for posterity, you may be interested in temporenc (http://temporenc.org), a comprehensive binary encoding format for dates and times.

Related

How to make an R class to support currency as an integer?

Floating point is bad for storing currency values such as 3.33 or 3.10, because doing math on floating point loses precision; for example, 74.20 + 153.20 == 227.40 is TRUE in real life but FALSE in R.
This Q&A thread talks about making a 'cents' field, such that dollars_float 123.45 becomes cents_int 12345.
Why not use Double or Float to represent currency?
A solution that works in just about any language is to use integers instead, and count cents. For instance, 1025 would be $10.25. Several languages also have built-in types to deal with money. Among others, Java has the BigDecimal class, and C# has the decimal type.
How can we make an R class to store currency as an integer?
It would be a nice bonus if the class had a print method that automatically prints in a nice format like 2,222.22.
Here is what I use to print floats as currency:
paste("$", round(number_i_want_as_currency, 2))
I use this at the very end of the calculations just before printing to minimize rounding errors. The only thing it is missing from your format request is the commas every three digits.
If you wanted to store the values I would recommend leaving out the paste("$"...) and just doing...
currency_storage <- round(number_i_want_as_currency, 2)
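Building on that, a minimal S3 sketch of the requested class - whole cents stored as an integer, with a print method that adds the thousands separators - might look like this (the class and function names are just illustrative):

# Store whole cents as an integer, print as dollars with commas.
currency <- function(dollars) {
  structure(as.integer(round(dollars * 100)), class = "currency")
}

print.currency <- function(x, ...) {
  # formatC adds the big.mark separators requested in the question.
  cat(formatC(unclass(x) / 100, format = "f", digits = 2, big.mark = ","), "\n")
  invisible(x)
}

x <- currency(2222.22)
x            # prints 2,222.22
unclass(x)   # the underlying integer: 222222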

Human readable alternative for UUIDs

I am working on a system that makes heavy use of pseudonyms to make privacy-critical data available to researchers. These pseudonyms should have the following properties:
1. They should not contain any information (e.g. time of creation, relation to other pseudonyms, encoded data, …).
2. It should be easy to create unique pseudonyms.
3. They should be human readable. That means they should be easy for humans to compare, copy, and understand when read aloud.
My first idea was to use UUID4. They are quite good on (1) and (2), but not so much on (3).
A variant is to encode UUIDs with a wider alphabet, resulting in shorter strings (see for example shortuuid). But I am not sure whether this actually improves readability.
Another approach I am currently looking into is a paper from 2005 titled "An optimal code for patient identifiers" which aims to tackle exactly my problem. The algorithm described there creates 8-character pseudonyms with 30 bits of entropy. I would prefer to use a more widely reviewed standard though.
Then there is also the git approach: only display the first few characters of the actual pseudonym. But this would mean that a pseudonym could lose its uniqueness after some time.
So my question is: Is there any widely-used standard for human-readable unique ids?
Not aware of any widely-used standard for this. Here’s a non-widely-used one:
Proquints
https://arxiv.org/html/0901.4016
https://github.com/dsw/proquint
A UUID4 (128 bits) would be converted into 8 proquints. If that's too much, you can take the last 64 bits of the UUID4 (i.e. just take 64 random bits). This doesn't magically make it lose uniqueness; it only increases the likelihood of collisions, which was non-zero to begin with, and which you can estimate mathematically to decide whether it's still OK for your purposes.
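To make the scheme concrete, here is a rough R sketch of proquint encoding for a single 16-bit chunk, following the alphabet and bit layout described in the paper (a 64-bit value would simply be four such chunks joined by dashes); this is an illustration, not the reference implementation:

# Proquint alphabet: 16 consonants (4 bits each) and 4 vowels (2 bits each).
consonants <- strsplit("bdfghjklmnprstvz", "")[[1]]
vowels     <- strsplit("aiou", "")[[1]]

# Encode one 16-bit unsigned integer as a five-letter word (CVCVC layout).
encode_proquint16 <- function(x) {
  stopifnot(x >= 0, x < 65536)
  c1 <- bitwAnd(bitwShiftR(x, 12), 0xF)
  v1 <- bitwAnd(bitwShiftR(x, 10), 0x3)
  c2 <- bitwAnd(bitwShiftR(x, 6), 0xF)
  v2 <- bitwAnd(bitwShiftR(x, 4), 0x3)
  c3 <- bitwAnd(x, 0xF)
  paste0(consonants[c1 + 1], vowels[v1 + 1],
         consonants[c2 + 1], vowels[v2 + 1],
         consonants[c3 + 1])
}

encode_proquint16(0x7f00)   # "lusab" (the first half of the 127.0.0.1 example)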
Here you go: UUID Readable
Generates easy-to-remember, readable UUIDs that are Shakespearean and grammatically correct sentences.
This article suggests using the first few characters from a SHA-256 hash, similarly to what git does. Name-based UUIDs (version 5) are likewise built from a hash (SHA-1), so this is not all that different. The trade-off between properties (2) and (3) is in the number of characters.
With d being the number of hex digits, you get 2 ** (4 * d) identifiers in total, but the first collision is expected after about 2 ** (2 * d).
The big question is really not about the kind of identifier you use, it is how you handle collisions.
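To put numbers on that formula (sketched in R; the sample sizes are just examples), an approximate birthday bound shows how quickly short prefixes start colliding:

# Probability of at least one collision after drawing n identifiers
# uniformly at random from a space of size N (birthday approximation).
p_collision <- function(n, N) 1 - exp(-n * (n - 1) / (2 * N))

# 8 hex digits -> N = 2^(4*8) = 2^32 possible identifiers.
p_collision(n = 2^16, N = 2^32)   # ~0.39: a collision is already likely
p_collision(n = 1e4,  N = 2^32)   # ~0.012

# 64 random bits (half a UUID4) keep collisions negligible much longer.
p_collision(n = 1e6,  N = 2^64)   # ~2.7e-8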

Why use ISO 8601 format for datetime in API instead of numeric milliseconds? [duplicate]

For passing times in JSON to/from a web API, why would I choose to use an ISO8601 string instead of simply the UTC epoch value? For example, both of these are the same:
Epoch = 1511324473
iso8601 = 2017-11-22T04:21:13Z
The epoch value is obviously shorter in length, which is always good for mobile data usage, and it's pretty simple to convert between epoch values and the language's local Date type variable.
I'm just not seeing the benefit to using an ISO string value.
Both are unambiguous and easy to parse in programs. The benefit of epoch, as you mentioned, is that it is smaller and will be faster to process in your program. The downside is that it means nothing to humans.
ISO 8601 dates are easy to read on their own and don't require the user to translate a number into a recognizable date. The size increase of ISO 8601 is unnoticeable compared to much, much larger things like images.
Personally I would pick ease of reading over speed for an API as it will cut down on debugging time while inspecting values sent and received. In another situation such as passing times around internally you may wish to choose the speed of an integer over text so it depends which you think will be more useful.
Unix/Epoch Time
+ Compact
+ Easy to do arithmetic actions without any libraries, e.g. var tomorrow = now() + 60*60*24
- Not human-readable
- Cannot represent dates before 1 January 1970 (unless negative values are used)
- Cannot represent dates after 19 January 2038 (if using Int32)
- Timezone and offset are "external" info, there is ambiguity if the value is UTC or any other offset.
- Officially the spec supports only seconds.
- When someone changes the value to milliseconds for better resolution, there is an ambiguity if the value is seconds or milliseconds.
- Older than ISO 8601 format
- Represents seconds since 1970 (as opposed to instant in time)
- Precision of seconds
ISO 8601 Time
+ Human readable
+ Represents instant in time, as opposed to seconds since 1970
+ Newer than the Unix time format
+ Specifies representation of date, time, date-time, duration and interval!
+ Supports an offset representation
+ Precision of nanoseconds
- Less compact
- For any arithmetic actions, a rich library is required (like java.time.OffsetDateTime)
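As a quick sanity check (sketched in R), the two example values from the question really are the same instant, and the signed Int32 boundary from the list above is easy to reproduce:

# Epoch seconds -> ISO 8601 (UTC), using the values from the question.
t <- as.POSIXct(1511324473, origin = "1970-01-01", tz = "UTC")
format(t, "%Y-%m-%dT%H:%M:%SZ")                       # "2017-11-22T04:21:13Z"

# ISO 8601 -> epoch seconds.
as.numeric(as.POSIXct("2017-11-22T04:21:13Z",
                      format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"))   # 1511324473

# The signed 32-bit rollover (the "year 2038 problem"):
as.POSIXct(.Machine$integer.max, origin = "1970-01-01", tz = "UTC")
# "2038-01-19 03:14:07 UTC"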

Are floating point numbers the same in different systems conforming to IEEE754?

This is related to my latest question: What can I do about the difference between real numbers in R versus PostgreSQL?
I know very little about precision issues and the IEEE754 standard. I read this link, from which I quote (emphasis mine):
Because of its wide use, the format used to store Floating Point numbers in memory has been standardized by the Institute of Electrical and Electronic Engineers in something called IEEE 754.
To me, that means the number 4104.694 should be represented identically in two different systems conforming to the standard. However, from my previous question, R and Postgres seem to represent this number differently:
des_num <- 4094.694
sprintf("%.64f", des_num)
# "4094.6939999999999599822331219911575317382812500000000000000000000000"
psql_num <- RPostgreSQL::dbGetQuery(con, "select 4104.694;")
sprintf("%.64f", psql_num)
# [1] "4104.6940000000004147295840084552764892578125000000000000000000000000"
Should I expect the same floating point number to be stored in exactly the same way in different systems conforming to the standard?
I don't know if R has an exact numeric type, but Postgres certainly does: the NUMERIC (also called DECIMAL) column type stores exact decimal values, see here (note that REAL and DOUBLE PRECISION are floating-point types and are not exact). If you are working with Postgres as a data store behind your scripts, then if you use numeric you should be able to store something from R and retrieve the exact same thing later on.
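If you want to see exactly which IEEE 754 double R stored, you can inspect it directly; here is a small R sketch (the %a conversion is a C99 feature and may depend on platform support):

x <- 4104.694

# Exact decimal expansion of the nearest IEEE 754 double to 4104.694:
sprintf("%.30f", x)

# Hexadecimal floating-point form (exact significand and exponent):
sprintf("%a", x)

# The raw 8 bytes of the double; two systems that store the same IEEE 754
# double must agree on these bits (byte order aside):
writeBin(x, raw())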

Hexadecimal Calculator Features. What do you want?

I plan on making a Hex Calculator one of these weekends for my Android phone. I would like to put it up as a free application on the Android Market when I'm done. As a programmer, what do you think are some valuable features I should consider?
Conversion between hex, binary and decimal would be nice
Showing current date and time in hex
Coloring of inputs like (FF, 00, 00) as RGB values
Usual arithmetic
Stack based calculation
Registers for saving of values for some future time
Defining variables for easier re-use
Too much? Doable?
Convert to/from decimal and binary
AND / OR / NOT / XOR / 2s Complement
Basic arithmetic (plus, minus, multiply, divide)
Multiple memories
Besides the obvious adding and subtracting hex color values, the next hex operation I perform the most is averaging two (or even an array of) hex color values. Good luck with the project!
Overflow flag for assembly programming.
Whenever there is an arithmetic operation on two numbers (in two's-complement mode), this flag is raised when the calculation falls out of range.
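To make the overflow flag concrete, here is a rough R sketch of the usual sign-based check for 8-bit two's-complement addition (the function name add8 is made up):

# Signed 8-bit two's-complement addition with an overflow flag.
# Overflow: both operands have the same sign but the result's sign differs,
# e.g. 0x7F + 0x01 wraps from +127 to -128.
add8 <- function(a, b) {
  raw_sum <- bitwAnd(a + b, 0xFF)                       # keep the low 8 bits
  signed  <- if (raw_sum >= 128) raw_sum - 256 else raw_sum
  overflow <- (a < 128) == (b < 128) && (a < 128) != (raw_sum < 128)
  list(result = signed, overflow = overflow)
}

add8(0x7F, 0x01)   # result -128, overflow TRUE
add8(0x10, 0x20)   # result 48, overflow FALSE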
add octal to the conversions
