The MomentJS library provides a way to add a number of days to a specific date:
//date math
moment('2016-03-12 13:00:00').add(1, 'day').format('LLL')
"March 13, 2016 1:00 PM"
Is it possible to add a specific number of days to a date stored in milliseconds?
I tried converting the number of days to milliseconds and adding that number, but it gives me the wrong date in the end.
I'm able to get the expected result by adding the number of milliseconds:
moment('2016-03-12 13:00:00').add(1, 'day').format('LLL')
'March 13, 2016 1:00 PM'
moment('2016-03-12 13:00:00').add(86400000, 'milliseconds').format('LLL')
'March 13, 2016 1:00 PM'
moment('2016-03-12 13:00:00').add(2, 'day').format('LLL')
'March 14, 2016 1:00 PM'
moment('2016-03-12 13:00:00').add(86400000*2, 'milliseconds').format('LLL')
'March 14, 2016 1:00 PM'
With the start date in milliseconds:
moment(1457784000000).add(86400000*2, 'milliseconds').format('LLL')
'March 14, 2016 1:00 PM'
moment('2016-03-12 13:00:00').add(86400000*2, 'milliseconds').format('LLL')
'March 14, 2016 1:00 PM'
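The same idea can be sketched outside JavaScript. Below is a minimal Python version, assuming the start value is milliseconds since the Unix epoch and that calendar days should be counted in a chosen time zone (UTC by default); the function name add_days_to_epoch_ms is hypothetical. In moment itself, moment(1457784000000).add(2, 'days') also works directly on a millisecond start value, since moment accepts a number of milliseconds.

```python
from datetime import datetime, timedelta, timezone

# Reference point for epoch milliseconds (1970-01-01T00:00:00Z).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def add_days_to_epoch_ms(epoch_ms, days, tz=timezone.utc):
    """Add calendar days to an epoch-milliseconds value, counting days in zone tz."""
    # Turn the milliseconds into an aware datetime, then view it in the chosen zone.
    local = (EPOCH + timedelta(milliseconds=epoch_ms)).astimezone(tz)
    # Adding a timedelta to an aware datetime keeps the wall-clock time of day;
    # tz then reports the UTC offset for the new wall time (matters for DST zones).
    shifted = local + timedelta(days=days)
    # Subtracting aware datetimes compares them in UTC; floor-divide to whole ms.
    return (shifted - EPOCH) // timedelta(milliseconds=1)


print(add_days_to_epoch_ms(1457784000000, 2))  # 1457956800000
```

With a DST-observing zone the two approaches diverge. Assuming the question's dates are in a US zone such as zoneinfo.ZoneInfo("America/New_York") (which needs the IANA time zone data installed), adding one calendar day to 2016-03-12 13:00 gives 2016-03-13 13:00 EDT, only 23 hours later, because US daylight saving time began on March 13, 2016; adding a fixed 86,400,000 ms lands on 14:00 instead. That is one likely reason converting days to milliseconds gave the wrong result.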
I have several rows in my table like the following:
row1: abc changed on 12 November, 2008 11:30 AM and its abc..region1
row2: defg updated 14 January, 2012 08:20 PM ......region2
row3: ghijkl corrected by 18 august, 2013 9:30 AM ..something..region3
My requirement is as follows:
All the dates above are in the EST time zone, and the date format is always exactly as shown.
I want to convert the date in each row from EST to the time zone of that row's region, and change the format to something like 12 dec 2016 7:30 AM.
Taking row1 as an example, the query I wrote is:
select regexp_replace(
'abc changed on 12 November, 2008 11:30 AM and its abc..region1',
'([0-9]{2})([[:blank:]]) (January|February|March|April|May|June|July|August|September|October|November|December)(,[[:blank:]])([0-9]{4})([[:blank:]])([0-9]{2}:[0-9]{2})([[:blank:]])(AM|PM)','\1-\3-\5 \7 \9',1,0,'i')
output:
abc changed on 12-November-2008 11:30 AM and its abc..region1
So far I am happy with the above query, because I get a string
with the formatted date. Even though this is not the final date
format, I can pass this date to a function that converts it
according to the region, does some processing, and finally
returns a DATE value. As a first step toward that, I wrapped the
replacement string in substr():
select regexp_replace(
'abc changed on 12 November, 2008 11:30 AM and its abc..region1',
'([0-9]{2})([[:blank:]]) (January|February|March|April|May|June|July|August|September|October|November|December)(,[[:blank:]])([0-9]{4})([[:blank:]])([0-9]{2}:[0-9]{2})([[:blank:]])(AM|PM)',
substr('\1-\3-\5 \7 \9',1),
1,0,'i')
output:
abc changed on 12-November-2008 11:30 AM and its abc..region1 --> works fine up to here
Now I add to_date to convert the date string into a real DATE value
so that I can do some processing on it:
select regexp_replace(
'abc changed on 12 November, 2008 11:30 AM and its abc..region1',
'([0-9]{2})([[:blank:]]) (January|February|March|April|May|June|July|August|September|October|November|December)(,[[:blank:]])([0-9]{4})([[:blank:]])([0-9]{2}:[0-9]{2})([[:blank:]])(AM|PM)',
to_date(substr('\1-\3-\5 \7 \9',1),'dd-mon-yyyy HH:MI AM'),
1,0,'i')
This query is giving me an error:
ORA-01858: a non-numeric character found where a numeric was expected
I checked whether wrong parameters were being passed to
to_date() by running the query below, and it worked fine:
Select to_date('12-November-2008 11:30 AM','dd-mon-yyyy HH:MI AM')
from dual;
output:
12-Nov-2008
(I am not worried about the time portion, because internally it will be stored in this date anyway.)
To avoid confusion I have numbered the substrings of the regular expression above:
([0-9]{2})-->1
([[:blank:]])-->2
(January|February|March|April|May|June|July|August|September|October|November|December)-->3
(,[[:blank:]])-->4
([0-9]{4})-->5
([[:blank:]])-->6
([0-9]{2}:[0-9]{2})-->7
([[:blank:]])-->8
(AM|PM)-->9
select regexp_replace(
'abc changed on 12 November, 2008 11:30 AM and its abc..region1',
'([0-9]{2})([[:blank:]]) (January|February|March|April|May|June|July|August|September|October|November|December)(,[[:blank:]])([0-9]{4})([[:blank:]])([0-9]{2}:[0-9]{2})([[:blank:]])(AM|PM)','\1-\3-\5 \7 \9',1,0,'i')
Assuming your string always contains the date in that particular format (and that there are no invalid dates, etc.), the following should work for you:
WITH sample_data AS (SELECT ' the date is 12 November, 2008 11:30 AM' str FROM dual UNION ALL
SELECT 'Here''s a date of 1 March, 2015 1:43 pm' str FROM dual UNION ALL
SELECT '1 February,2016 9:43 AM' str FROM dual UNION ALL
SELECT 'And again it''s 21 May, 2016 9:43 AM and a little bit extra' str FROM dual)
SELECT str,
to_date(regexp_replace(str, '^.*?([[:digit:]]{1,2} [[:alpha:]]{3,9}, ?[[:digit:]]{4} [[:digit:]]{1,2}\:[[:digit:]]{2} (A|P)M).*$', '\1', 1, 1, 'i'), 'dd Month yyyy, hh:mi am') dt
FROM sample_data;
STR DT
---------------------------------------------------------- -------------------
the date is 12 November, 2008 11:30 AM 12/11/2008 11:30:00
Here's a date of 1 March, 2015 1:43 pm 01/03/2015 13:43:00
1 February,2016 9:43 AM 01/02/2016 09:43:00
And again it's 21 May, 2016 9:43 AM and a little bit extra 21/05/2016 09:43:00
The regular expression can be broken down as follows:
^.*? - match any character (except new line) from the start of the line as few times as possible, which may be 0 or more.
([[:digit:]]{1,2} [[:alpha:]]{3,9}, ?[[:digit:]]{4} [[:digit:]]{1,2}\:[[:digit:]]{2} (A|P)M) - this is the pattern we're looking for. It is captured as group 1, so passing \1 as the replace string replaces the whole string with just the date.
.*$ - match any character up to the end of the string
The second part of the pattern can be further broken down as:
[[:digit:]]{1,2} - one or two digits
' ' - a single space character
[[:alpha:]]{3,9} - three to nine letters (upper or lower case)
, ? - a comma followed by 0 or 1 spaces
[[:digit:]]{4} - four digits
' ' - a single space character
[[:digit:]]{1,2} - one or two digits
\: - a single colon character
[[:digit:]]{2} - two digits
' ' - a single space character
(A|P)M - either the letter A or P followed by an M (matched case-insensitively because of the 'i' match parameter)
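To try the same extraction outside Oracle, here is a rough Python translation of the pattern, a sketch rather than an exact equivalent. Python's re module does not support POSIX classes such as [[:digit:]], so \d and [a-z] with re.IGNORECASE stand in for them, and the date parts are captured separately and rejoined so that strptime does not have to cope with the optional space after the comma. The function name extract_datetime is hypothetical.

```python
import re
from datetime import datetime

# Non-greedy prefix, then day, month name, optional space after the comma,
# year, time and AM/PM, then the rest of the string (mirrors the Oracle pattern).
DATE_IN_TEXT = re.compile(
    r"^.*?(\d{1,2}) ([a-z]{3,9}), ?(\d{4}) (\d{1,2}:\d{2}) ([ap]m).*$",
    re.IGNORECASE,
)


def extract_datetime(text):
    """Return the first 'D Month, YYYY H:MM AM' datetime found in text, or None."""
    m = DATE_IN_TEXT.match(text)
    if m is None:
        return None
    day, month, year, hhmm, ampm = m.groups()
    # Rebuild a canonical string so the comma spacing no longer matters.
    return datetime.strptime(f"{day} {month} {year} {hhmm} {ampm}", "%d %B %Y %I:%M %p")


for s in [" the date is 12 November, 2008 11:30 AM",
          "Here's a date of 1 March, 2015 1:43 pm",
          "1 February,2016 9:43 AM",
          "And again it's 21 May, 2016 9:43 AM and a little bit extra"]:
    print(extract_datetime(s))
```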
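This should do the trick for you. Note that to_date() cannot be applied to the backreferences inside the replacement argument: it is evaluated before regexp_replace runs, on the literal string '\1-\3-\5 \7 \9', which is what raises ORA-01858. Keep the replacement string as plain backreferences: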
WITH sample_data AS (SELECT 'abc changed on 12 November, 2008 11:30 AM and its abc..region1' str FROM dual UNION ALL
SELECT 'defg updated 14 January, 2012 08:20 PM ......region2' str FROM dual UNION ALL
SELECT 'ghijkl corrected by 18 august, 2013 9:30 AM ..something..region3' str FROM dual)
SELECT str,
regexp_replace(str,
'(^.*?)(([[:digit:]]{1,2}) (January|February|March|April|May|June|July|August|September|October|November|December), ?([[:digit:]]{4} [[:digit:]]{1,2}\:[[:digit:]]{2} (A|P)M))(.*$)',
'\1\3-\4-\5\7', 1, 1, 'i') dt
FROM sample_data;
STR DT
------------------------------------------------------------------- --------------------------------------------------------------------------------
abc changed on 12 November, 2008 11:30 AM and its abc..region1 abc changed on 12-November-2008 11:30 AM and its abc..region1
defg updated 14 January, 2012 08:20 PM ......region2 defg updated 14-January-2012 08:20 PM ......region2
ghijkl corrected by 18 august, 2013 9:30 AM ..something..region3 ghijkl corrected by 18-august-2013 9:30 AM ..something..region3
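The query above only reformats the text; the original requirement also converts each date from EST to the region's time zone and renders it like 12 dec 2016 7:30 AM. The following Python sketch shows that whole flow under stated assumptions: EST is treated as a fixed UTC-05:00 offset, as the question says all dates are EST (a real America/New_York zone would switch to EDT in summer), and the REGION_OFFSETS mapping and convert_row function are hypothetical placeholders for your real regions. Inside Oracle, FROM_TZ and AT TIME ZONE are the usual tools for the conversion step.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: "EST" means a fixed UTC-05:00 offset with no daylight saving.
EST = timezone(timedelta(hours=-5), "EST")

# Hypothetical region -> UTC offset mapping; replace with your real regions.
REGION_OFFSETS = {
    "region1": timedelta(hours=0),
    "region2": timedelta(hours=1),
    "region3": timedelta(hours=5, minutes=30),
}

# "12 November, 2008 11:30 AM": day, month name, optional space after comma, year, time, AM/PM.
DATE_RE = re.compile(r"(\d{1,2}) ([a-z]{3,9}), ?(\d{4}) (\d{1,2}:\d{2}) ([ap]m)", re.IGNORECASE)
# Assumption: the region name is the last token of the row.
REGION_RE = re.compile(r"(region\d+)\s*$")


def convert_row(row):
    """Replace the row's EST date with the same instant in the row's region."""
    date_match = DATE_RE.search(row)
    region_match = REGION_RE.search(row)
    if date_match is None or region_match is None:
        raise ValueError(f"no date or region found in {row!r}")
    day, month, year, hhmm, ampm = date_match.groups()
    est_dt = datetime.strptime(f"{day} {month} {year} {hhmm} {ampm}",
                               "%d %B %Y %I:%M %p").replace(tzinfo=EST)
    # A KeyError here means the region is missing from REGION_OFFSETS.
    local = est_dt.astimezone(timezone(REGION_OFFSETS[region_match.group(1)]))
    hour12 = local.hour % 12 or 12  # 12-hour clock without a leading zero
    # %b and %p are locale-dependent; in the default C locale they give "Nov" and "AM"/"PM".
    formatted = (f"{local.day} {local.strftime('%b').lower()} {local.year} "
                 f"{hour12}:{local.minute:02d} {local.strftime('%p')}")
    return row[:date_match.start()] + formatted + row[date_match.end():]


rows = [
    "abc changed on 12 November, 2008 11:30 AM and its abc..region1",
    "defg updated 14 January, 2012 08:20 PM ......region2",
    "ghijkl corrected by 18 august, 2013 9:30 AM ..something..region3",
]
for r in rows:
    print(convert_row(r))
```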
I have a huge dataframe that looks something like this:
Insider Trading Relationship Date \
SEC Form 4
Nov 16 04:06 PM Silverman Gene Director Nov 14
Oct 27 07:00 AM RAKOLTA JOHN JR Director Oct 26
Nov 16 04:09 PM LEIGHTON F THOMSON Chief Executive Officer Nov 15
Nov 02 04:20 PM Blumofe Robert EVP Platform Nov 01
Oct 28 04:03 PM MCCONNELL RICK M President Prods & Development Oct 28
I'm trying to convert the index to a datetime dtype with this code:
pd.to_datetime(df2.index, format = '%b %d %I:%M %p')
but it raises the following error:
Traceback (most recent call last):
File "<pyshell#126>", line 1, in <module>
pd.to_datetime(df2.index, format = '%b %d %I:%M %p')
File "C:\Python27\lib\site-packages\pandas\util\decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Python27\lib\site-packages\pandas\tseries\tools.py", line 420, in to_datetime
return _convert_listlike(arg, box, format, name=arg.name)
File "C:\Python27\lib\site-packages\pandas\tseries\tools.py", line 407, in _convert_listlike
raise e
Is there a way to find the index entry where the error occurs?
It seems I can set errors='coerce', which would just return NaT for that date, but I would like to avoid that.
Thanks!
You are on the right track; just finish the logic. Set errors='coerce', then filter the original index with isnull() on the result to find all the entries that failed to parse.
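In pandas, that coerce-and-filter step can be written as df2.index[pd.to_datetime(df2.index, format='%b %d %I:%M %p', errors='coerce').isnull()], which returns the index labels that failed to parse. If you also want to see why each one fails, the sketch below is a standard-library cross-check (the helper find_unparseable is hypothetical). It uses datetime.strptime, which is not the parser pandas uses, so its verdicts may differ slightly on edge cases. One pitfall it will surface: with no year in the format, strptime assumes 1900, which is not a leap year, so an entry such as Feb 29 is rejected.

```python
from datetime import datetime


def find_unparseable(values, fmt="%b %d %I:%M %p"):
    """Return (position, value, error message) for each value strptime rejects."""
    failures = []
    for pos, value in enumerate(values):
        try:
            datetime.strptime(value, fmt)
        except (TypeError, ValueError) as exc:  # TypeError: non-string entries such as None or NaN
            failures.append((pos, value, str(exc)))
    return failures


# With pandas you would pass the index values, e.g. find_unparseable(list(df2.index)).
sample = ["Nov 16 04:06 PM", "Oct 27 07:00 AM", "Nov 31 04:09 PM", "Feb 29 10:00 AM", None]
for pos, value, reason in find_unparseable(sample):
    print(pos, repr(value), reason)
```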
02/07/2016 12:43:23.324 PM
Current format (12-hour clock): mm/dd/yyyy hh:mm:ss.SSS AM/PM
Please help me convert this to 24-hour format in Pig.
File in HDFS:
02/07/2016 12:43:23.324 PM
03/08/2016 08:12:15.123 AM
Commands in Pig:
date_data = LOAD 'hdfs path' as (date: chararray);
todate_data = foreach date_data generate ToDate(date,'yyyy/MM/dd HH:mm:ss.SSS');
dump todate_data;
This gives the following exception:
java.lang.IllegalArgumentException: Invalid format: "02/07/2016 12:43:23.324 PM" is malformed at "16 12:43:23.324 PM"
You have to specify the format of the input. Your data is in 'MM/dd/yyyy hh:mm:ss.SSS aa' format, so use the script below:
date_data = LOAD 'hdfs path' as (date: chararray);
todate_data = foreach date_data generate ToDate(date,'MM/dd/yyyy hh:mm:ss.SSS aa');
dump todate_data;
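ToDate only parses the string; to get 24-hour text you still need to format the result, and I believe Pig's ToString(datetime, format) with a pattern such as 'MM/dd/yyyy HH:mm:ss.SSS' handles that, though please check it against your Pig version. The pattern letters correspond to Python's strptime directives (hh to %I, HH to %H, aa to %p, SSS to a fractional-seconds field), so a minimal Python sketch of the same 12-hour to 24-hour conversion, with a hypothetical helper to_24h, looks like this:

```python
from datetime import datetime


def to_24h(value, in_fmt="%m/%d/%Y %I:%M:%S.%f %p", out_fmt="%m/%d/%Y %H:%M:%S"):
    """Convert a 12-hour timestamp with milliseconds into a 24-hour string."""
    # %I with %p maps 12 AM to hour 0 and 12 PM to hour 12; %f reads the digits after the dot.
    dt = datetime.strptime(value, in_fmt)
    # strftime has no millisecond directive, so append milliseconds from microseconds.
    return f"{dt.strftime(out_fmt)}.{dt.microsecond // 1000:03d}"


print(to_24h("02/07/2016 12:43:23.324 PM"))  # 02/07/2016 12:43:23.324
print(to_24h("30/06/2016 02:43:23.324 PM", "%d/%m/%Y %I:%M:%S.%f %p", "%d/%m/%Y %H:%M:%S"))  # 30/06/2016 14:43:23.324
```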
Below is a working example. The four input dates are in 'dd/MM/yyyy hh:mm:ss.SSS aa' format.
INPUT
30/06/2016 02:43:23.324 PM
01/12/2016 12:43:23.324 AM
21/08/2016 06:43:23.324 PM
13/07/2016 12:43:23.324 AM
SCRIPT
A = LOAD 'test4.txt' AS (create_dt:chararray);
B = FOREACH A GENERATE ToDate(create_dt,'dd/MM/yyyy hh:mm:ss.SSS aa') AS create_dt;
DUMP B;
OUTPUT
As a beginner, creating timestamps or formatted dates ended up being a little more of a challenge than I would have expected. What are some basic examples for reference?
Ultimately you want to review the datetime documentation and become familiar with the formatting variables, but here are some examples to get you started:
import datetime
print('Timestamp: {:%Y-%m-%d %H:%M:%S}'.format(datetime.datetime.now()))
print('Timestamp: {:%Y-%b-%d %H:%M:%S}'.format(datetime.datetime.now()))
print('Date now: %s' % datetime.datetime.now())
print('Date today: %s' % datetime.date.today())
today = datetime.date.today()
print("Today's date is {:%b, %d %Y}".format(today))
schedule = '{:%b, %d %Y}'.format(today) + ' - 6 PM to 10 PM Pacific'
schedule2 = '{:%B, %d %Y}'.format(today) + ' - 1 PM to 6 PM Central'
print('Maintenance: %s' % schedule)
print('Maintenance: %s' % schedule2)
The output:
Timestamp: 2014-10-18 21:31:12
Timestamp: 2014-Oct-18 21:31:12
Date now: 2014-10-18 21:31:12.318340
Date today: 2014-10-18
Today's date is Oct, 18 2014
Maintenance: Oct, 18 2014 - 6 PM to 10 PM Pacific
Maintenance: October, 18 2014 - 1 PM to 6 PM Central
Reference link: https://docs.python.org/3.4/library/datetime.html#strftime-strptime-behavior
>>> import time
>>> print(time.strftime('%a %H:%M:%S'))
Mon 06:23:14
from datetime import datetime
dt = datetime.now() # for date and time
ts = dt.timestamp() # for timestamp (a naive datetime is interpreted as local time)
print("Date and time is:", dt)
print("Timestamp is:", ts)
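One caveat with the example above: datetime.now() returns a naive datetime, and timestamp() interprets a naive value as local time. If you want timestamps that do not depend on the machine's time zone, a minimal sketch is to work with aware UTC datetimes; the helper utc_timestamp_to_iso is hypothetical.

```python
from datetime import datetime, timezone


def utc_timestamp_to_iso(ts):
    """Convert POSIX seconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


now_utc = datetime.now(timezone.utc)  # aware datetime, independent of the local zone
ts = now_utc.timestamp()              # float seconds since 1970-01-01 00:00 UTC
print("UTC now:", now_utc.isoformat())
print("Timestamp:", ts)
print("Round trip:", utc_timestamp_to_iso(ts))
print(utc_timestamp_to_iso(1457784000))  # 2016-03-12T12:00:00+00:00
```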
You might also want to look at string-to-datetime conversion with strptime:
from datetime import datetime
datetime_str = '09/19/18 13:55:26'
datetime_object = datetime.strptime(datetime_str, '%m/%d/%y %H:%M:%S')
print(type(datetime_object))
print(datetime_object) # printed in default format
Output:
<class 'datetime.datetime'>
2018-09-19 13:55:26
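Combining strptime and strftime converts a date string from one format to another. A small sketch, with a hypothetical helper convert_format, that turns the string above into other layouts:

```python
from datetime import datetime


def convert_format(value, in_fmt, out_fmt):
    """Parse value with in_fmt, then render it with out_fmt."""
    return datetime.strptime(value, in_fmt).strftime(out_fmt)


# %b and %p follow the current locale; the comments assume the default C locale.
print(convert_format("09/19/18 13:55:26", "%m/%d/%y %H:%M:%S", "%Y-%m-%d %I:%M %p"))  # 2018-09-19 01:55 PM
print(convert_format("09/19/18 13:55:26", "%m/%d/%y %H:%M:%S", "%d %b %Y"))  # 19 Sep 2018
```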