How to convert chararray to datetime with milliseconds in pig latin - datetime

I wish to convert following value which is a chararray in pig
2016-05-11 23:59:57.628197
to
2016-05-11T23:59:57.628-05:00
How can I do it ?
Following is what I tried considering alias 'a2' contains list of datetime values in chararray in the column named 'input_date_value'
FOREACH a2 GENERATE input_date_value AS input_date:chararray,
ToDate(input_date_value,'YYYY-MM-DD HH:mm:ss.SSSSSS') AS modification_datetime:datetime;
For input -
2002-07-11 16:58:40.249764
Output is -
2002-01-11T16:58:40.249-05:00
The month values like '07' are not getting picked up.
The created timestamp has the month set to '01', i.e. January, every time for all dates.
Can someone help? What am I doing wrong?

ToDate takes a SimpleDateFormat-style pattern (https://pig.apache.org/docs/r0.11.1/func.html#to-date), and SimpleDateFormat supports only milliseconds (http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html).
The -05:00 you see is the time zone. ToDate is truncating the fractional seconds to 3 digits because it supports only millisecond precision.

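Use the lowercase character d instead of the uppercase D for parsing the day of the month. Uppercase D means day of the year, so 'DD' reads 11 as the 11th day of the year, which is why the month always comes out as January.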
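The effect can be reproduced outside Pig. Below is a minimal Python sketch, not Pig code, assuming that D is the day-of-year field in the pattern syntax ToDate accepts (as it is in Java's SimpleDateFormat). Python's %j directive stands in for D and %d for d.

```python
from datetime import datetime

# %j is Python's day-of-year directive; it plays the role of uppercase D.
as_day_of_year = datetime.strptime("2002-11", "%Y-%j")

# %d is day-of-month; it plays the role of lowercase d.
as_day_of_month = datetime.strptime("2002-07-11", "%Y-%m-%d")

print(as_day_of_year.date())   # 2002-01-11
print(as_day_of_month.date())  # 2002-07-11
```

Reading 11 as a day of the year lands on 11 January regardless of the month in the input, which matches the symptom in the question.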
I have now managed to fix it myself (in Pig 0.11).
Apparently Pig 0.11 does not support the date format components I used earlier for parsing the month and day.
I found the following reference, which hints at the incompatibility: https://www.w3.org/TR/NOTE-datetime
Use:
'YYYY-MM-dd HH:mm:ss.SSSSSS'
instead of 'YYYY-MM-DD HH:mm:ss.SSSSSS'
It now gives correct output.
Input:
2001-11-28 16:04:49.22388
Output:
2001-11-28T16:04:49.223-05:00
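For comparison, the same conversion (six fractional digits in, three out, with an offset appended) can be sketched in Python. This is only an illustration of the truncation, not what Pig does internally; the function name to_iso_millis and the fixed -5 hour offset are assumptions chosen to match the output shown above.

```python
from datetime import datetime, timedelta, timezone

def to_iso_millis(text, utc_offset_hours=-5):
    """Parse 'YYYY-MM-DD HH:MM:SS.ffffff' and format it as ISO 8601 with milliseconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    # Attach a fixed offset; -5 is assumed only to match the output in the question.
    zoned = parsed.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    # timespec="milliseconds" truncates (does not round) the fractional part.
    return zoned.isoformat(timespec="milliseconds")

print(to_iso_millis("2016-05-11 23:59:57.628197"))  # 2016-05-11T23:59:57.628-05:00
```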

Related

Converting 2 digit date year in string to a Date Object

Hi, I want to convert a date with a 2-digit year, given as a string, to a Date object, but I am not sure what to use as the format.
For 4 digit Dates it works fine, i.e
using Dates
t1 = "01/01/2017"
Date(t1, "dd/mm/yyyy")
# Out > 2017-01-01
But for 2 digit year
t2 = "27/01/17"
Date(t2, "dd/mm/yy")
# Out > 0017-01-27
Any idea what to use as the formatting?
This does not seem to be implemented (yet). See the discussion here or the (open) pull request to implement it here.
It is a debated topic: the default in other languages is to assign years >68 to the twentieth century and those <=68 to the twenty-first century, which is a bit subjective, so the Julia developers preferred the explicit approach in which the missing digits must be added by the user.
So for now just add 2000 years:
t2 = "27/01/17"
Date(t2, "dd/mm/yy") + Dates.Year(2000)
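The windowing rule mentioned above (two-digit years up to 68 go to 20xx, the rest to 19xx) can be written out explicitly. Here is a small Python sketch of that rule, not Julia code; expand_two_digit_year and its pivot parameter are hypothetical names for illustration.

```python
from datetime import datetime

def expand_two_digit_year(yy, pivot=68):
    """Map a two-digit year to four digits: 00..pivot -> 20xx, pivot+1..99 -> 19xx."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy <= pivot else 1900 + yy

print(expand_two_digit_year(17))  # 2017
print(expand_two_digit_year(69))  # 1969

# Python's own %y directive applies the same window.
print(datetime.strptime("27/01/17", "%d/%m/%y").date())  # 2017-01-27
```

Making the pivot an explicit argument keeps the choice visible, which is the point the Julia developers were making.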

how to convert a string date variable into datetime variable in sas

I have a string variable for datetime. Sometimes it is a whole number like 3040000 sometimes a decimal value like this 3130215.123.
I would like to convert this into a date time variable like mm-dd-yyyy.
Thanks in advance.
add: I think the value 3130215.123 refers to feb-15-2013 12:30:00.
I think these are
3YYMMDD.HHMM
3130215.123 -> 3|13|02|15|12|30
(the last 0 is assumed).
So, you need:
length year month day hour minute second $2;
year=substr(dtvar,2,2);
month=substr(dtvar,4,2);
day=substr(dtvar,6,2);
hour=translate(subpad(dtvar,9,2),'0',' ');
minute=translate(subpad(dtvar,11,2),'0',' ');
second=translate(subpad(dtvar,13,2),'0',' ');
then
new_dtvar=dhms(mdy(month,day,year),hour,minute,second);
Dates, Times, and Datetimes are stored as doubles. Only the format matters for display.
try
var1 = input(var,best.);
format var1 datetime19.;
in a data step to apply the format. Then look at the results.
Based on comments on the original question, the statements would be:
format var_date ddmmyyd10. var_time time5. var_datetime datetime19.;
var_date=input(put(int(17000000+(var)),10.),yymmdd10.);
var_time=input(put((var-int(var))*1e6,z6.),hhmmss6.);
var_datetime=dhms(var_date,hour(var_time),minute(var_time),0);
Haven't had a chance to test, so feel free to comment with any errors you get.
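To make the decoding steps easier to check, here is the same arithmetic as a Python sketch rather than SAS. It assumes, as the answers above do, that the value has the layout 3YYMMDD.HHMMSS, that adding 17000000 gives the full date, and that missing trailing time digits are zeros. decode_stamp is a hypothetical helper name.

```python
from datetime import datetime

def decode_stamp(value):
    """Decode a string such as '3130215.123' into 2013-02-15 12:30:00."""
    whole, _, fraction = value.partition(".")
    # Assumption from the answers: 3130215 + 17000000 = 20130215 (YYYYMMDD).
    date_digits = str(int(whole) + 17000000)
    # Missing trailing time digits are assumed to be zeros: '123' -> '123000'.
    time_digits = fraction.ljust(6, "0")[:6]
    return datetime.strptime(date_digits + time_digits, "%Y%m%d%H%M%S")

print(decode_stamp("3130215.123"))  # 2013-02-15 12:30:00
```

The value is handled as a string throughout, so no floating-point rounding affects the time digits. A value such as 3040000 has month and day 00 and is rejected with a ValueError under this sketch.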

convert string to time in r

I have an array of time strings, for example 115521.45 which corresponds to 11:55:21.45 in terms of an actual clock.
I have another array of time strings in the standard format (HH:MM:SS.0) and I need to compare the two.
I can't find any way to convert the original time format into something useable.
I've tried using strptime but all it does is add a date (the wrong date) and get rid of time decimal places. I don't care about the date and I need the decimal places:
for example
t <- strptime(105748.35, '%H%M%OS') = ... 10:57:48
using %OSn (n = 1,2 etc) gives NA.
Alternatively, is there a way to convert a time such as 10:57:48 to 105748?
Set the options to allow digits in seconds, and then add the date you wish before converting (so that the start date is meaningful).
options(digits.secs=3)
strptime(paste0('2013-01-01 ',105748.35), '%Y-%m-%d %H%M%OS')
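The same comparison can be sketched in Python, where the date part can simply be dropped. It assumes both arrays hold strings and that every value carries a fractional part; parse_compact and parse_clock are hypothetical names.

```python
from datetime import datetime

def parse_compact(text):
    """Parse 'HHMMSS.ff' (e.g. '115521.45') into a datetime.time."""
    return datetime.strptime(text, "%H%M%S.%f").time()

def parse_clock(text):
    """Parse 'HH:MM:SS.ff' (e.g. '11:55:21.45') into a datetime.time."""
    return datetime.strptime(text, "%H:%M:%S.%f").time()

compact = parse_compact("115521.45")
clock = parse_clock("11:55:21.45")

print(compact == clock)             # True
print(compact.strftime("%H%M%S"))   # 115521
```

The last line covers the reverse direction asked about in the question, turning a clock time back into the compact HHMMSS form.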

cast string directly to IDateTime

I am using the new version of data.table and especially the AWESOME fread function. My files contain dates that are loaded as strings (because I don't know how to do it otherwise) looking like 01APR2008:09:00:00.
I need to sort the data.table on those datetimes and, for the sort to be efficient, cast them to the IDateTime format (or anything else I don't know about yet).
> strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
[1] "2008-04-01 09:00:00"
> IDateTime(strptime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S"))
        idate    itime
1: 2008-04-01 09:00:00
> IDateTime("01APR2008:09:00:00","%d%b%Y:%H:%M:%S")
Error in charToDate(x) :
character string is not in a standard unambiguous format
It looks like I cannot do DT[ , newType := IDateTime(strptime(oldType, "%d%b%Y:%H:%M:%S"))].
My questions are then:
Is there a way to cast directly to IDateTime from fread, such that I can sort afterward efficiently?
If not, what is the most efficient way to go knowing that I would like to be able to sort DT by this datetime column
Unfortunately (for efficiency) strptime produces a POSIXlt type, which is unsupported by data.table and always will be due to its size (40 bytes per date!) and structure. Although as.POSIXct produces the much better POSIXct, it still does it via POSIXlt. More info here:
http://stackoverflow.com/a/12788992/403310
Looking to base functions such as as.Date, it uses strptime too, creating an integer offset from epoch (oddly) stored as double. The IDate (and friends) class in data.table aims to achieve integer epoch offsets stored as, um, integer. Suitable for fast sorting by base::sort.list(method = "radix") (which is really a counting sort). IDate doesn't really aim to be fast at (usually one off) conversion.
So to convert string dates/times, rightly or wrongly, I tend to roll my own helper function.
If the string date is "2012-12-24" I'd lean towards: as.integer(gsub("-", "", col)) and proceed with YYYYMMDD integer dates. Similarly times can be HHMMSS as an integer. Two columns: date and time separately can be useful if you generally want to roll = TRUE within a day, but not to the previous day. Grouping by month is simple and fast: by = date %/% 100L. Adding and subtracting days is troublesome, but it is anyway because rarely do you want to add calendar days, rather weekdays or business days. So that's a lookup to your business day vector anyway.
In your case the character month would need a conversion to 1:12. There isn't a separator in your dates "01APR2008", so a substring would be one way followed by a match or fmatch on the month name. Are you in control of the file format? If so, numbers are better in an unambiguous format that sorts naturally such as %Y-%m-%d, or %Y%m%d.
I haven't yet got to how best do this in fread, so date/times are left as character currently because I'm not yet sure how to detect the date format or which type to output. It does need to output either integer or double dates though, rather than inefficient character. I suspect that my use of YYYYMMDD integers are seen as unconventional, so I'm a little hesitant to make that the default. They have their place, and there are pros and cons of epoch based dates too. Dates don't have to be always epoch based is all I'm suggesting.
What do you think? Btw, thanks for encouragement on fread; was nice to see.
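The integer-date idea above (substring the parts, look up the month name, build YYYYMMDD and HHMMSS integers) can be sketched as follows. This is a Python illustration of the approach, not data.table code; MONTHS and to_int_date_time are hypothetical names, and English three-letter month abbreviations are assumed.

```python
# Lookup from three-letter month abbreviation to month number (1..12).
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

def to_int_date_time(stamp):
    """Split '01APR2008:09:00:00' into the integers (20080401, 90000)."""
    date_part, hh, mm, ss = stamp.split(":")
    day = int(date_part[:2])
    month = MONTHS[date_part[2:5].upper()]
    year = int(date_part[5:])
    return year * 10000 + month * 100 + day, int(hh) * 10000 + int(mm) * 100 + int(ss)

stamps = ["03APR2008:09:00:00", "01APR2008:09:30:00", "15MAR2008:17:45:10"]
keys = sorted(to_int_date_time(s) for s in stamps)

print(keys)             # [(20080315, 174510), (20080401, 93000), (20080403, 90000)]
print(keys[0][0] // 100)  # 200803, the year-month of the earliest date
```

Plain integer comparison sorts these chronologically, and integer division by 100 gives the year-month used for grouping.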
I don't know how your file is structured, but from your comment you want to use the date field as a key. Why not read it as a time series and format it while reading?
Here I use zoo to do it. (Here I assume that the date column is the first one; otherwise see the index.column argument.)
ff <- function(x) as.POSIXct(strptime(x,"%d%b%Y:%H:%M:%S"))
h <- read.zoo(text = "03avril2008:09:00:00 125
02avril2008:09:30:00 126
05avril2008:09:10:00 127
04avril2008:09:20:00 128
01avril2008:09:00:00 128"
,FUN=ff)
You get your dates in the right format and sorted.
The conversion from POSIXct to IDateTime is then natural:
IDateTime(index(h))
        idate    itime
1: 2008-04-01 09:00:00
2: 2008-04-02 09:30:00
3: 2008-04-03 09:00:00
4: 2008-04-04 09:20:00
5: 2008-04-05 09:10:00
Sure, you still do two conversions here, but the first happens while reading the data, and the second does not involve any format handling.

Converting time format to numeric with R

In most cases, we convert numeric time to POSIXct format using R. However, if we want to compare two time points, then we would prefer the numeric time format. For example, I have a date format like "2001-03-13 10:31:00",
begin <- "2001-03-13 10:31:00"
Using R, I want to convert this into a numeric (e.g., the Julian time), perhaps something like the number of seconds elapsed between 1970-01-01 00:00:00 and 2001-03-13 10:31:00.
Do you have any suggestions?
The Julian calendar began in 45 BC (709 AUC) as a reform of the Roman calendar by Julius Caesar. It was chosen after consultation with the astronomer Sosigenes of Alexandria and was probably designed to approximate the tropical year (known at least since Hipparchus). see http://en.wikipedia.org/wiki/Julian_calendar
If you just want to remove ":" , " ", and "-" from a character vector then this will suffice:
end <- gsub("[: -]", "" , begin, perl=TRUE)
#> end
#[1] "20010313103100"
You should read the section about 1/4 of the way down in ?regex about character classes. Since the "-" is special in that context as a range operator, it needs to be placed first or last.
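The placement rule for the hyphen is not specific to R. As a cross-check, here is the same character class in Python's re module, where a hyphen placed last inside the brackets is also taken literally.

```python
import re

begin = "2001-03-13 10:31:00"

# The hyphen is last inside the brackets, so it is a literal rather than a range.
digits_only = re.sub(r"[: -]", "", begin)

print(digits_only)  # 20010313103100
```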
After your edit, the answer is clearly what @joran wrote, except that you would first need to convert to a DateTime class:
as.numeric(as.POSIXct(begin))
#[1] 984497460
The other point to make is that comparison operators do work for Date and DateTime classed variables, so the conversion may not be necessary at all. This compares 'begin' to a time one second later and correctly reports that begin is earlier:
as.POSIXct(begin) < as.POSIXct(begin) +1
#[1] TRUE
Based on the revised question this should do what you want:
begin <- "2001-03-13 10:31:00"
as.numeric(as.POSIXct(begin))
The result is a unix timestamp, the number of seconds since epoch, assuming the timestamp is in the local time zone.
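Because the result depends on the time zone, the same string yields different numbers in different zones. Here is a short Python sketch of that dependence; the two zones (UTC and a fixed UTC-5 offset) are assumptions picked for illustration.

```python
from datetime import datetime, timedelta, timezone

begin = datetime.strptime("2001-03-13 10:31:00", "%Y-%m-%d %H:%M:%S")

# Interpret the same wall-clock time in two different zones.
as_utc = begin.replace(tzinfo=timezone.utc)
as_minus_5 = begin.replace(tzinfo=timezone(timedelta(hours=-5)))

print(int(as_utc.timestamp()))      # 984479460
print(int(as_minus_5.timestamp()))  # 984497460
```

The second value is the 984497460 shown earlier, which is consistent with that session having run at a UTC-5 offset. The two results differ by 18000 seconds, i.e. five hours.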
Maybe this could also work:
library(lubridate)
...
df <- '24:00:00'
as.numeric(hms(df))
hms() parses your data into a period object, which can then be converted into seconds. See the full documentation.
I tried this because I had trouble with data that was in that format but went over 24 hours.
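For values that run past 24 hours, the arithmetic itself is simple enough to do by hand. Here is a minimal Python sketch, assuming the input is always 'H:MM:SS' with integer fields; hms_to_seconds is a hypothetical helper, not part of lubridate.

```python
def hms_to_seconds(text):
    """Convert 'HH:MM:SS' to seconds; the hours field may exceed 23."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

print(hms_to_seconds("24:00:00"))  # 86400
print(hms_to_seconds("36:15:30"))  # 130530
```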
The example from ?as.POSIX help gives
as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S"))
so for you it would be
as.numeric(as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S")))
