Timestamp field in a dbf file (dBase 7 format) is not making sense - dbase

I've looked at both [1] and [2] and I'm completely confused (and since the dbf file is a version
4 file, [1] should apply well). For one thing, why does [1] state that the timestamp's date portion is the number of days since 1/1/4713 BC? That's just very puzzling. Secondly, assuming that it is the number of days since 4713 BC, I'm having some trouble with the value I am getting.
First off, my dbf file has a timestamp field holding an 8-byte value. The actual date is 2000/8/16 17:21:41. In the dbf file, the 8-byte sequence is 0x42ccb20e0340df00.
From [1], the first 4 bytes are for the date and the second 4 bytes for the time. If the original byte sequence is actually little-endian, then 0x42ccb20e should be read as 0x0eb2cc42, which comes to 246598722. So the date would be 0x0eb2cc42 (246598722) and the time 0x00df4003 (14630915).
I must be missing something here or calculating something wrong. 246598722 days is equivalent to about 675,612 years (assuming 1 year = 365 days; adding leap years would only confuse me and shouldn't change things by much).
From [2], I shouldn't use 01/01/4713 BC as the basis but 12/31/1899 (well, 1/1/1900). But then the date value I have isn't even in the range of what [2] shows.
Now if I take the actual value (2000/8/16) and use [1] and [2], I get the following:
method [1]: 2450501 days : (2000 - -4713) * 365 + (8 * 30) + 16
method [2]: 36756 days : [100 * 365 + 8 * 30 + 16] (over counting the # of days)
The dbf file isn't corrupted (otherwise, when I look at the timestamp in dBase, it would choke and display something crazy).
I've thought of using big-endian, but that makes even less sense, as the values become even larger. I've even considered the possibility that it's actually the number of seconds elapsed since either date, but that makes the values make even less sense: treating 246598722 as seconds counted back from 2000/8/16 would put the base year around 1812 (calculation: 246598722 / (3600 * 365) ≈ 188, so 2000 - 188 = 1812).
Can someone point out where I'm doing this wrong?
Thanks!
[1] - https://www.dbase.com/Knowledgebase/INT/db7_file_fmt.htm
[2] - Convert dBase Timestamp

For any dBASE questions, I would recommend going to the dBASE newsgroups; they have a very helpful and knowledgeable community.

I've finally found the answer thanks to [3].
Basically, the 8-byte timestamp sequence is used as a whole, with the following notes:
- It's stored big-endian.
- The last byte is not used.
- It's a Julian Day Number.
So in my case the value is 0x42ccb20e0340df00, and truncating the last byte I get 0x42ccb20e0340df.
Then the following Python code gets the correct info:
import datetime

# 7-byte value (big-endian, trailing byte dropped) corresponding to 1970-01-01 00:00:00
base = 0x42cc418ba99a00
# 7-byte value taken from the timestamp field in the dbf file
frm_date = int('42ccb20e0340df', 16)
# the difference is in units of 1/500 second, so dividing by 500 gives Unix seconds
final_ts = (frm_date - base) / 500
final_date = datetime.datetime.utcfromtimestamp(final_ts)
which outputs 2000-8-16 17:21:41 and some milliseconds, which I just ignore.
So I'm guessing the theory is that the above code moves the 'base' date to
1970/1/1 from 1/1/1, which helps since utcfromtimestamp() doesn't
work with any value prior to 1970/1/1.
My confusion stems from the fact that it doesn't use 4713 BC as the base year but apparently 1/1/1 instead, though I'm still trying to figure out how to derive the value 0x42cc418ba99a00 for 1970/1/1.
[3] - https://stackoverflow.com/a/60424157/10860403

Related

Informix FROM_UNIXTIME alternative

I was searching for a way to group data by interval (e.g. every 30 minutes) using the date defined in that table, so I need to convert that datetime to milliseconds so that I can divide it by the interval I need, as in this query:
SELECT FLOOR(UNIX_TIMESTAMP(timestamp)/(15 * 60 * 1000)) AS timekey
FROM table
GROUP BY timekey;
This query runs perfectly on SQL Server, but on Informix it gives me the error
Routine (unix_timestamp) can not be resolved.
as it's not defined in IBM Informix server.
So I need a direct way to get the Unix epoch time from a timestamp DATETIME YEAR TO FRACTION(3) column in IBM Informix server, like UNIX_TIMESTAMP in SQL Server.
If the timestamp column is of type DATETIME YEAR TO SECOND or similar, then you can convert it to a DECIMAL(18,5) number of seconds since the Unix Epoch, aka 1970-01-01 00:00:00Z (UTC; time zone offset +00:00) using a procedure such as this:
{
# "#(#)$Id: tounixtime.spl,v 1.6 2002/09/25 18:10:48 jleffler Exp $"
#
# Stored procedure TO_UNIX_TIME written by Jonathan Leffler (previously
# jleffler#informix.com and now jleffler#us.ibm.com). Includes fix for
# bug reported by Tsutomu Ogiwara <Tsutomu.Ogiwara#ctc-g.co.jp> on
# 2001-07-13. Previous version used DATETIME(0) SECOND TO SECOND
# instead of DATETIME(0:0:0) HOUR TO SECOND, and when the calculation
# extended the shorter constant to DATETIME HOUR TO SECOND, it added the
# current hour and minute fields, as documented in the Informix Guide to
# SQL: Syntax manual under EXTEND in the section on 'Expression'.
# Amended 2002-08-23 to handle 'eternity' and annotated more thoroughly.
# Amended 2002-09-25 to handle fractional seconds, as companion to the
# new stored procedure FROM_UNIX_TIME().
#
# If you run this procedure with no arguments (use the default), you
# need to worry about the time zone the database server is using because
# the value of CURRENT is determined by that, and you need to compensate
# for it if you are using a different time zone.
#
# Note that this version works for dates after 2001-09-09 when the
# interval between 1970-01-01 00:00:00+00:00 and current exceeds the
# range of INTERVAL SECOND(9) TO SECOND. Returning DECIMAL(18,5) allows
# it to work for all valid datetime values including fractional seconds.
# In the UTC time zone, the 'Unix time' of 9999-12-31 23:59:59 is
# 253402300799 (12 digits); the equivalent for 0001-01-01 00:00:00 is
# -62135596800 (11 digits). Both these values are unrepresentable in
# 32-bit integers, of course, so most Unix systems won't handle this
# range, and the so-called 'Proleptic Gregorian Calendar' used to
# calculate the dates ignores locale-dependent details such as the loss
# of days that occurred during the switch between the Julian and
# Gregorian calendar, but those are minutiae that most people can ignore
# most of the time.
}
CREATE PROCEDURE to_unix_time(d DATETIME YEAR TO FRACTION(5)
                                  DEFAULT CURRENT YEAR TO FRACTION(5))
    RETURNING DECIMAL(18,5);
    DEFINE n DECIMAL(18,5);
    DEFINE i1 INTERVAL DAY(9) TO DAY;
    DEFINE i2 INTERVAL SECOND(6) TO FRACTION(5);
    DEFINE s1 CHAR(15);
    DEFINE s2 CHAR(15);
    LET i1 = EXTEND(d, YEAR TO DAY) - DATETIME(1970-01-01) YEAR TO DAY;
    LET s1 = i1;
    LET i2 = EXTEND(d, HOUR TO FRACTION(5)) -
             DATETIME(00:00:00.00000) HOUR TO FRACTION(5);
    LET s2 = i2;
    LET n = s1 * (24 * 60 * 60) + s2;
    RETURN n;
END PROCEDURE;
Some of the commentary about email addresses is no longer valid – things have changed in the decade and a half since I wrote this.
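As a quick cross-check of what the procedure computes, here is a rough Python sketch of the same calculation: whole days since 1970-01-01 times 86400, plus the seconds (and fraction) since midnight, with the datetime taken at face value (no time zone handling), just like the SPL code above.

from datetime import date, datetime
from decimal import Decimal

def to_unix_time(d):
    # Days between the date part and 1970-01-01 (the SPL interval i1).
    days = (d.date() - date(1970, 1, 1)).days
    # Seconds (and fractions) since midnight (the SPL interval i2).
    midnight = datetime.combine(d.date(), datetime.min.time())
    seconds = Decimal(str((d - midnight).total_seconds()))
    return Decimal(days) * 24 * 60 * 60 + seconds

print(to_unix_time(datetime(2002, 9, 25, 18, 10, 48)))  # 1032977448.0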

R: data.table. How to save dates properly with fwrite?

I have a dataset. I can choose to load it into R from a Stata file or from an SPSS file. In both cases it loads properly with the haven package, and the dates are recognized properly.
But when I save it to disk with data.table's fwrite function,
fwrite(ppp, "ppp.csv", sep=",", col.names = TRUE)
I have a problem: the dates disappear and are converted to different numbers. For example, the date 1967-08-06 is saved in the csv file as -879.
I've also tried playing with fwrite options, such as quote=FALSE, with no success.
I've uploaded a small sample of the files (the SPSS file, the Stata file and the saved csv), and this is the code, to make things easier for you:
library(haven)
library(data.table)
ppp <- read_sav("pspss.sav") # choose one of these two.
ppp <- read_dta("pstata.dta") # choose one of these two.
fwrite(ppp, "ppp.csv", sep=",", col.names = TRUE)
The real table has more than a thousand variables and a million individuals, which is why I would like a fast way to do this.
http://www73.zippyshare.com/v/OwzwbyQq/file.html
This is for @ArtificialBreeze:
> head(my)
# A tibble: 6 x 9
ID_2006_2011 TIS FECHA_NAC_2006 año2006 Edad_31_12_2006 SEXO_2006
<dbl> <chr> <date> <date> <dbl> <chr>
1 1.60701e+11 BBNR670806504015 1967-08-06 2006-12-31 39 M
2 1.60701e+11 BCBD580954916014 1958-09-14 2006-12-31 48 F
3 1.60701e+11 BCBL451245916015 1945-12-05 2006-12-31 61 F
4 1.60701e+11 BCGR610904916012 1961-09-04 2006-12-31 45 M
5 1.60701e+11 BCMR580148916015 1958-01-08 2006-12-31 48 F
6 1.60701e+11 BCMX530356917018 1953-03-16 2006-12-31 53 F
# ... with 3 more variables: PAIS_NAC_2006 <dbl>, FECHA_ALTA_TIS_2006 <date>,
# FECHA_ALTA_TIS_2006n <date>
Since this question was asked 6 months ago, fwrite has improved and been released to CRAN. I believe it should work as you wanted now; i.e. fast, direct and convenient date formatting. It now has the dateTimeAs argument as follows, copied from fwrite's manual page for v1.10.0 as on CRAN now. As time progresses, please check the latest version of the manual page.
====
dateTimeAs : How Date/IDate, ITime and POSIXct items are written.
"ISO" (default) - 2016-09-12, 18:12:16 and 2016-09-12T18:12:16.999999Z. 0, 3 or 6 digits of fractional seconds are printed if and when present for convenience, regardless of any R options such as digits.secs. The idea being that if milli and microseconds are present then you most likely want to retain them. R's internal UTC representation is written faithfully to encourage ISO standards, stymie timezone ambiguity and for speed. An option to consider is to start R in the UTC timezone simply with "$ TZ='UTC' R" at the shell (NB: it must be one or more spaces between TZ='UTC' and R, anything else will be silently ignored; this TZ setting applies just to that R process) or Sys.setenv(TZ='UTC') at the R prompt and then continue as if UTC were local time.
"squash" - 20160912, 181216 and 20160912181216999. This option allows fast and simple extraction of yyyy, mm, dd and (most commonly to group by) yyyymm parts using integer div and mod operations. In R for example, one line helper functions could use %/%10000, %/%100%%100, %%100 and %/%100 respectively. POSIXct UTC is squashed to 17 digits (including 3 digits of milliseconds always, even if 000) which may be read comfortably as integer64 (automatically by fread()).
"epoch" - 17056, 65536 and 1473703936.999999. The underlying number of days or seconds since the relevant epoch (1970-01-01, 00:00:00 and 1970-01-01T00:00:00Z respectively), negative before that (see ?Date). 0, 3 or 6 digits of fractional seconds are printed if and when present.
"write.csv" - this currently affects POSIXct only. It is written as write.csv does by using the as.character method which heeds digits.secs and converts from R's internal UTC representation back to local time (or the "tzone" attribute) as of that historical date. Accordingly this can be slow. All other column types (including Date, IDate and ITime which are independent of timezone) are written as the "ISO" option using fast C code which is already consistent with write.csv.
The first three options are fast due to new specialized C code. The epoch to date-part conversion uses a fast approach by Howard Hinnant (see references) using a day-of-year starting on 1 March. You should not be able to notice any difference in write speed between those three options. The date range supported for Date and IDate is [0000-03-01, 9999-12-31]. Every one of these 3,652,365 dates have been tested and compared to base R including all 2,790 leap days in this range. This option applies to vectors of date/time in list column cells, too. A fully flexible format string (such as "%m/%d/%Y") is not supported. This is to encourage use of ISO standards and because that flexibility is not known how to make fast at C level. We may be able to support one or two more specific options if required.
====
I had the same problem, and I just changed the date column to as.character before writing, and then changed it back to as.Date after reading. I don't know how it influences read and write times, but it was a good enough solution for me.
These numbers make sense :) It seems that fwrite changes the date format into a numeric encoding ("Matlab coding") whose origin is "1970-01-01".
When you read your data back, you can simply change the numbers into dates using this code:
my$FECHA_NAC_2006<-as.Date(as.numeric(my$FECHA_NAC_2006),origin="1970-01-01")
For example
as.Date(-879,origin="1970-01-01")
[1] "1967-08-06"
Since it seems there is no simple solution, I'm storing the column classes and changing them back again after reading.
I take the original dataset ppp,
areDates <- (sapply(ppp, class) == "Date")
I save it to a file, and I can read it back next time.
ppp <- fread("ppp.csv", encoding="UTF-8")
And now I change the classes of the newly read dataset back to the original ones.
ppp[,names(ppp)[areDates] := lapply(.SD,as.Date),
.SDcols = areDates ]
Maybe someone can write it better with a for loop and the command set:
ppp[,lapply(.SD, setattr, "class", "Date") ,
.SDcols = areDates]
It can also be written with positions instead of a vector of TRUE and FALSE values.
You need to add the argument dateTimeAs = "ISO".
By adding the dateTimeAs argument and specifying the appropriate option, you will get dates written to your csv file in the desired format AND with their respective time zone.
This is particularly important when dealing with POSIXct variables, which are time zone dependent. Omitting this argument might shift the dates and times written to the csv file according to the difference in hours between time zones. For POSIXct date/time variables, you will need to add dateTimeAs = "write.csv"; unfortunately this option can be slow (https://www.rdocumentation.org/packages/data.table/versions/1.10.0/topics/fwrite?). Good luck!!!

Julia: conversion between different time periods

Full disclosure: I've only been using Julia for about a day, so it may be too soon to ask questions.
I'm not really understanding the utility of the Dates module's Period types. Let's say I had two times and I wanted to find the number of minutes between them. It seems like the natural thing to do would be to subtract the times and then convert the result to minutes. I can deal with not having a Minute constructor (which seems most natural to my Python-addled brain), but it seems like convert should be able to do something.
The "solution" of converting from Millisecond to Int to Minute seems a little gross. What's the better/right/idiomatic way of doing this? (I did RTFM, but maybe the answer is there and I missed it.)
y, m, d = (2015, 03, 16)
hr1, min1, sec1 = (8, 14, 00)
hr2, min2, sec2 = (9, 23, 00)
t1 = DateTime(y, m, d, hr1, min1, sec1)
t2 = DateTime(y, m, d, hr2, min2, sec2)
# println(t2 - t1) # 4140000 milliseconds
# Minute(t2 - t1) # ERROR: ArgumentError("Can't convert Millisecond to Minute")
# minute(t2 - t1) # ERROR: `minute` has no method matching
# minute(::Millisecond)
# convert(Minute, (t2-t1)) # ERROR: `convert` has no method matching
# convert(::Type{Minute}, ::Millisecond)
delta_t_ms = convert(Int, t2 - t1)
function ms_to_min(time_ms)
    MS_PER_S = 1000
    S_PER_MIN = 60
    # recall that division is floating point unless you use div function
    return div(time_ms, (MS_PER_S * S_PER_MIN))
end
delta_t_min = ms_to_min(delta_t_ms)
println(Minute(delta_t_min)) # 69 minutes
(My apologies for choosing a snicker-inducing time interval. I happened to convert two friends' birthdays into hours and minutes without really thinking about it.)
Good question; seems like we should add it! (Disclosure: I made the Dates module).
For real, we had conversions in there at one point, but then for some reason or another they were taken out (I think it revolved around whether inexact conversions should throw errors or not, which has recently been cleaned up quite a bit in Base for Ints/Floats). I think it definitely makes sense to add them back in. We actually have a handful in there for other operations, so obviously they're useful.
As always, it's also a matter of who has the time to code/test/submit, and hopefully that's driven by people with real needs for the functionality. Feel free to submit a PR if you're feeling ambitious!

Converting a 19 digits time stamp to a real time (from .zvi file format)

After a long day of research: does anybody know how to convert a 19-digit timestamp from the metadata of a .zvi file (produced by AxioVision, Zeiss) to a real time format? (The output probably includes milliseconds.)
An example timestamp is: 4675873294709522577
Thanks!
Arnon
Matlab solution:
The main issue is not the x2mdate conversion (which simply adds the number of days between the year zero, when Matlab starts counting, and the year 1900, when Excel/zvi starts counting), but the same class issue described in the other answer: the value has been read as a 64-bit integer rather than as a double. The conversion to double can be done with typecast in Matlab:
myZVI = 4675946358764751269;
timestampDouble = typecast(int64(myZVI),'double');
myTime = datestr(timestampDouble + 693960, 'dd-mmm-yyyy HH:MM:SS.FFF');
693960 is the number of days between year zero and 1900. If you don't need an absolute date but just the difference between two timestamps, you don't even need this; for instance, the interval between two of my video frames can be calculated like this:
myZVI2 = 4675946358764826427;
timestampDouble2 = typecast(int64(myZVI2),'double');
myTimeDifference = datestr(timestampDouble2 - timestampDouble,'SS.FFF');
hope this helps:-)
This is a Microsoft OLE Automation Date. But you've read it as a 64-bit long integer instead of the 64-bit double that it should be.
You didn't specify a language, so I will pick C#:
long l = 4675873294709522577L;
byte[] b = BitConverter.GetBytes(l);
double d = BitConverter.ToDouble(b, 0);
Debug.WriteLine(d); // 41039.901598693
DateTime dt = DateTime.FromOADate(d);
Debug.WriteLine(dt); // 5/10/2012 9:38:18 PM
More information in this thread.
An OLE Automation Date is basically the number of days since the epoch 1899-12-30, with the fractional part representing the time of day, and without any particular time zone reference.
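The C# answer notes that no language was specified, so here is the same reinterpretation sketched in Python; the struct round-trip mirrors the BitConverter calls, and 1899-12-30 is the standard OLE Automation date origin (OADate 0.0).

import struct
from datetime import datetime, timedelta

raw = 4675873294709522577  # the 19-digit timestamp from the question

# Reinterpret the 64-bit integer's bytes as an IEEE 754 double (an OLE Automation date).
oadate = struct.unpack('<d', struct.pack('<q', raw))[0]  # ~41039.901598693

# OADate 0.0 is midnight on 1899-12-30; the fractional part is the time of day.
dt = datetime(1899, 12, 30) + timedelta(days=oadate)
print(dt)  # 2012-05-10 21:38:18..., matching the C# output (5/10/2012 9:38:18 PM)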

Convert 64bit timestamp to a readable value

In my dataset I have two timestamp columns. The first is microseconds since the application was started, e.g. 1400805323. The second is described as a 64-bit timestamp, which I'm hoping will indicate clock time, using the NTP format of the number of seconds from 1/1/1901.
Example of '64bit' timestamps:
129518309081725000
129518309082059000
129518309082393000
129518309082727000
129518309083060000
129518309083394000
129518309083727000
Is there any matlab/python code that could convert this into a readable format?
Any help much appreciated,
Steve
Assuming that these values were generated today, June 6th 2011, they look like the number of 100-nanosecond intervals since January 1st, 1601. This is how Windows NT stores FILETIME. For more detailed info, read Raymond Chen's blog post on the subject; it also shows how to convert the value to anything else.
See edit below for updated answer:
For NTP time, the 64 bits are broken into seconds and a fraction of a second. The top 32 bits are the seconds; the bottom 32 bits are the fraction. You get the fraction by dividing the bottom 32 bits by 2^32.
So step one, convert to a double.
If you like Python, that's easy enough (I didn't add any bounds checking):
def to_seconds(h):
    return (h >> 32) + float(h & 0xffffffff) / 2**32
>>> to_seconds(129518309081725000)
30155831.26845886
The time module can convert that float to a readable time format.
import time
time.ctime(to_seconds(ntp_timestamp))
You'll need to worry about where the timestamp originated, though. time.ctime assumes the value is relative to Jan 1, 1970. So if your program bases the NTP-format time on when the program started running, you'd need to add an offset to the seconds to normalize the timestamp for ctime.
>>> time.ctime(to_seconds(129518309081725000))
'Tue Dec 15 17:37:11 1970'
EDIT:
PyGuy is right: the original timestamps are not NTP time numbers, they are Windows 64-bit timestamps.
Here is the new to_seconds method to convert the count of 100 ns intervals since 1/1/1601 to seconds since 1970:
def to_seconds(h):
    s = float(h) / 1e7      # convert 100 ns units to seconds
    return s - 11644473600  # number of seconds from 1601 to 1970
And the new output:
import time
time.ctime(to_seconds(129518309081725000))
'Mon Jun 6 04:48:28 2011'
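If a timezone-independent readable value is preferred (time.ctime prints local time), here is a small sketch using datetime directly; it yields UTC and keeps microsecond precision, relying only on the 1601-01-01 epoch and 100 ns unit described above.

from datetime import datetime, timedelta, timezone

def filetime_to_datetime(ft):
    # ft counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC;
    # ft // 10 converts that to whole microseconds.
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=ft // 10)

print(filetime_to_datetime(129518309081725000))
# 2011-06-06 10:48:28.172500+00:00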
