Conversion to datetime is yielding error "ValueError: day is out of range for month". How to find the offending rows?

I have a huge dataframe that looks something like this:
                 Insider Trading     Relationship                   Date  \
SEC Form 4
Nov 16 04:06 PM  Silverman Gene      Director                       Nov 14
Oct 27 07:00 AM  RAKOLTA JOHN JR     Director                       Oct 26
Nov 16 04:09 PM  LEIGHTON F THOMSON  Chief Executive Officer        Nov 15
Nov 02 04:20 PM  Blumofe Robert      EVP Platform                   Nov 01
Oct 28 04:03 PM  MCCONNELL RICK M    President Prods & Development  Oct 28
I'm trying to convert the index to a datetime dtype with this code:
pd.to_datetime(df2.index, format = '%b %d %I:%M %p')
but it's yielding the error:
Traceback (most recent call last):
File "<pyshell#126>", line 1, in <module>
pd.to_datetime(df2.index, format = '%b %d %I:%M %p')
File "C:\Python27\lib\site-packages\pandas\util\decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Python27\lib\site-packages\pandas\tseries\tools.py", line 420, in to_datetime
return _convert_listlike(arg, box, format, name=arg.name)
File "C:\Python27\lib\site-packages\pandas\tseries\tools.py", line 407, in _convert_listlike
raise e
Is there a way I can find the index entries where the error occurs?
It seems I can set errors to 'coerce', which would just return NaT for the dates that fail to parse, but I would like to avoid that.
Thanks!

You are on the right track; just finish the logic. Call to_datetime with errors='coerce', then filter the original index with isnull() on the converted result to find all the entries that failed to parse.
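A minimal sketch of that approach. The index values here are made up for illustration (the real failing row is not shown in the question), with "Nov 31" standing in for an impossible date. A value such as "Feb 29" is another plausible culprit, since with no year in the format the parser most likely falls back to 1900, which is not a leap year.

```python
import pandas as pd

# Made-up sample index; "Nov 31" is deliberately an impossible date.
idx = pd.Index(["Nov 16 04:06 PM", "Nov 31 04:06 PM", "Oct 28 04:03 PM"])

# errors="coerce" turns anything unparseable into NaT instead of raising.
parsed = pd.to_datetime(idx, format="%b %d %I:%M %p", errors="coerce")

# Boolean mask of the failures, applied to the ORIGINAL strings.
bad = idx[parsed.isnull()]
print(list(bad))
```

Once the offending strings are known, they can be fixed or dropped and the conversion rerun without coerce.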

Related

Unable to convert EST/EDT timezone to another format

Using Python, I'm trying to parse a future date and subtract the current date from it, but it throws an error.
Python version = Python 3.6.8
from datetime import datetime
enddate = 'Thu Jun 02 08:00:00 EDT 2022'
todays = datetime.today()
print ('Today =',todays)
Modified_date1 = datetime.strptime(enddate, ' %a %b %d %H:%M:%S %Z %Y')
subtract_days= Modified_date1 - todays
print (subtract_days.days)
Output
Today = 2022-02-02 08:06:53.687342
Traceback (most recent call last):
File "1.py", line 106, in trusstore_output
Modified_date1 = datetime.strptime(enddate1, ' %a %b %d %H:%M:%S %Z %Y')
File "/usr/lib64/python3.6/_strptime.py", line 565, in _strptime_datetime
tt, fraction = _strptime(data_string, format)
File "/usr/lib64/python3.6/_strptime.py", line 362, in _strptime
(data_string, format))
ValueError: time data ' Thu Jun 02 08:00:00 EDT 2022' does not match format ' %a %b %d %H:%M:%S %Z %Y'
During handling of the above exception, another exception occurred:
Linux server date
$ date
Wed Feb 2 08:08:36 CST 2022
Point 6 in the documentation notes that strptime cannot parse every time zone abbreviation:
%Z [...]
So someone living in Japan may have JST, UTC, and GMT as valid values, but probably not EST. It will raise ValueError for invalid values.
If possible, you could get the server date with the -u flag and parse the UTC timestamp.
date -u
Mi 2. Feb 14:39:11 UTC 2022
PS:
Also watch out for the leading whitespace in your strings.
If EDT is available on your system, the ValueError could be the result of a mix-up between enddate and enddate1:
' Thu Jun 02 08:00:00 EDT 2022' vs. enddate = 'Thu Jun 02 08:00:00 EDT 2022'
Unfortunately, only a subset of timezones is supported by strptime.
If you can ensure that the input does not contain any other timezones than EDT or EST, you could replace these by the corresponding UTC offsets and use %z instead of %Z:
from datetime import datetime
date_str = "Thu Jun 02 08:00:00 EDT 2022"
date_str = date_str.replace("EDT", "-0400")
date_str = date_str.replace("EST", "-0500")
date_parsed = datetime.strptime(date_str, "%a %b %d %H:%M:%S %z %Y")
# 2022-06-02 08:00:00-04:00
print(date_parsed)
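To get back to the original goal of counting the days until the end date: the value parsed with %z is timezone-aware, so it has to be subtracted from an aware "now" (mixing it with a naive datetime.today() raises TypeError). A minimal sketch, assuming only EDT and EST occur in the input. The helper parse_with_abbrev and the TZ_OFFSETS table are made-up names, and "now" is fixed so the result is reproducible.

```python
from datetime import datetime, timezone

# Hypothetical lookup table: only the abbreviations assumed to occur in the input.
TZ_OFFSETS = {"EDT": "-0400", "EST": "-0500"}

def parse_with_abbrev(text):
    """Swap a known abbreviation for its UTC offset, then parse with %z."""
    text = text.strip()  # drop stray leading/trailing whitespace
    for abbrev, offset in TZ_OFFSETS.items():
        text = text.replace(" " + abbrev + " ", " " + offset + " ")
    return datetime.strptime(text, "%a %b %d %H:%M:%S %z %Y")

end = parse_with_abbrev(" Thu Jun 02 08:00:00 EDT 2022")
# Fixed "now" for a reproducible example; real code would use datetime.now(timezone.utc).
now = datetime(2022, 2, 2, 14, 0, tzinfo=timezone.utc)
days_left = (end - now).days
print(days_left)  # 119
```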

How to Extract logs between 2 timestamps in Unix

I need to extract the logs between two timestamps from a file in Unix. I need those logs written to a separate file so that I can copy them.
Is there an efficient way to do this? The log format looks like this; each timestamp is on a separate line from the log entry it belongs to.
Tue 21 Apr 14:00:00 GMT 2020
{"items":[{"cpu.load": "0.94","total.memory": "6039.798 MB","free.memory": "4367.152 MB","used.memory": "1672.646 MB","total.physical.system.memory": "16.656 GB","total.free.physical.system.memory": "3860.197 MB","total.used.physical.system.memory": "12.796 GB","number.of.cpus": "8"}]}
Tue 21 Apr 18:00:00 GMT 2020
{"items":[{"cpu.load": "0.76","total.memory": "6039.798 MB","free.memory": "4352.656 MB","used.memory": "1687.142 MB","total.physical.system.memory": "16.656 GB","total.free.physical.system.memory": "3858.203 MB","total.used.physical.system.memory": "12.798 GB","number.of.cpus": "8"}]}
I am doing this, but it only prints the timestamps and not the actual logs:
cat file.txt | awk -F, '{ if ($1>"Fri 21 Aug 14:00:00 GMT 2020" && $1<"Sat 22 Aug 18:00:00 GMT 2020") print }'
Can someone advise?
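One possible approach, sketched in Python rather than awk: parse each timestamp line into a real datetime (comparing the strings directly only compares them alphabetically) and keep every line that follows a timestamp inside the window. The function names, the inline sample and the window bounds are made up for illustration, and the format string assumes the zone is always the literal text GMT and that day and month names are in English.

```python
from datetime import datetime

# Matches lines such as "Tue 21 Apr 14:00:00 GMT 2020" (literal "GMT" assumed).
STAMP_FORMAT = "%a %d %b %H:%M:%S GMT %Y"

def parse_stamp(line):
    """Return a datetime if the line is a timestamp line, otherwise None."""
    try:
        return datetime.strptime(line.strip(), STAMP_FORMAT)
    except ValueError:
        return None

def extract(lines, start, end):
    """Yield each timestamp line within [start, end] plus the log lines under it."""
    keep = False
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None:
            keep = start <= stamp <= end  # a new timestamp switches copying on or off
        if keep:
            yield line

sample = [
    "Tue 21 Apr 14:00:00 GMT 2020\n",
    '{"items":[{"cpu.load": "0.94"}]}\n',
    "Tue 21 Apr 18:00:00 GMT 2020\n",
    '{"items":[{"cpu.load": "0.76"}]}\n',
]
selected = list(extract(sample, datetime(2020, 4, 21, 13), datetime(2020, 4, 21, 15)))
print("".join(selected), end="")
```

For a real file, iterate over the open file object instead of the sample list and write the selected lines to a second file.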

ora-01858 error with to_date in regexp_replace

I have several rows in my table like below:
row1: abc changed on 12 November, 2008 11:30 AM and its abc..region1
row2: defg updated 14 January, 2012 08:20 PM ......region2
row3: ghijkl corrected by 18 august, 2013 9:30 AM ..something..region3
My requirement is as follows:
All the above dates are in EST time zone and date format is exactly as above and does not change.
I want to update the dates in these rows from EST to different time zones as per the region in that row, and the format should be changed to something like 12 dec 2016 7:30 AM.
So the query I framed is (taking row1 as example) as below:
select regexp_replace(
'abc changed on 12 November, 2008 11:30 AM and its abc..region1',
'([0-9]{2})([[:blank:]]) (January|February|March|April|May|June|July|August|September|October|November|December)(,[[:blank:]])([0-9]{4})([[:blank:]])([0-9]{2}:[0-9]{2})([[:blank:]])(AM|PM)','\1-\3-\5 \7 \9',1,0,'i')
output:
abc changed on 12-November-2008 11:30 AM and its abc..region1
So I am happy with the above query so far because I get a string
with the formatted date. Even though this is not the final date
format, I can pass this date to some function which converts
it according to the region, does some processing and finally
returns a date type. For the same purpose I add to_date in the above
query:
select regexp_replace(
'abc changed on 12 November, 2008 11:30 AM and its abc..region1',
'([0-9]{2})([[:blank:]]) (January|February|March|April|May|June|July|August|September|October|November|December)(,[[:blank:]])([0-9]{4})([[:blank:]])([0-9]{2}:[0-9]{2})([[:blank:]])(AM|PM)',
substr('\1-\3-\5 \7 \9',1),
1,0,'i')
output:
abc changed on 12-November-2008 11:30 AM and its
abc..region1 --> works fine till here
Now I am adding to_date to convert the date string type to real date
type to do some processing on it:
select regexp_replace(
'abc changed on 12 November, 2008 11:30 AM and its abc..region1',
'([0-9]{2})([[:blank:]]) (January|February|March|April|May|June|July|August|September|October|November|December)(,[[:blank:]])([0-9]{4})([[:blank:]])([0-9]{2}:[0-9]{2})([[:blank:]])(AM|PM)',
to_date(substr('\1-\3-\5 \7 \9',1),'dd-mon-yyyy HH:MI AM'),
1,0,'i')
This query is giving me an error:
ORA-01858: a non-numeric character found where a numeric was expected
I checked whether wrong parameters were being passed to
to_date(), and fired the query below, but it worked fine.
Select to_date('12-November-2008 11:30 AM','dd-mon-yyyy HH:MI AM')
from dual;
output:
12-Nov-2008
(I am not worried about the time not being displayed, because internally it is part of this date anyway.)
To avoid confusion I have numbered the substrings of the regular expression above:
([0-9]{2})-->1 ([[:blank:]])-->2
(January|February|March|April|May|June|July|August|September|October|November|December)-->3
(,[[:blank:]])-->4 ([0-9]{4})-->5 ([[:blank:]])-->6
([0-9]{2}:[0-9]{2})-->7 ([[:blank:]])-->8 (AM|PM)-->9
select regexp_replace(
'abc changed on 12 November, 2008 11:30 AM and its abc..region1',
'([0-9]{2})([[:blank:]]) (January|February|March|April|May|June|July|August|September|October|November|December)(,[[:blank:]])([0-9]{4})([[:blank:]])([0-9]{2}:[0-9]{2})([[:blank:]])(AM|PM)','\1-\3-\5 \7 \9',1,0,'i')
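The error occurs because to_date is evaluated before regexp_replace substitutes the backreferences, so it is handed the literal string '\1-\3-\5 \7 \9' rather than a date. Extract the date string first and convert it afterwards. Assuming your string always has the date in it in that particular format (and that there are no invalid dates etc.), the following should work for you: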
WITH sample_data AS (SELECT ' the date is 12 November, 2008 11:30 AM' str FROM dual UNION ALL
SELECT 'Here''s a date of 1 March, 2015 1:43 pm' str FROM dual UNION ALL
SELECT '1 February,2016 9:43 AM' str FROM dual UNION ALL
SELECT 'And again it''s 21 May, 2016 9:43 AM and a little bit extra' str FROM dual)
SELECT str,
to_date(regexp_replace(str, '^.*?([[:digit:]]{1,2} [[:alpha:]]{3,9}, ?[[:digit:]]{4} [[:digit:]]{1,2}\:[[:digit:]]{2} (A|P)M).*$', '\1', 1, 1, 'i'), 'dd Month yyyy, hh:mi am') dt
FROM sample_data;
STR DT
---------------------------------------------------------- -------------------
the date is 12 November, 2008 11:30 AM 12/11/2008 11:30:00
Here's a date of 1 March, 2015 1:43 pm 01/03/2015 13:43:00
1 February,2016 9:43 AM 01/02/2016 09:43:00
And again it's 21 May, 2016 9:43 AM and a little bit extra 21/05/2016 09:43:00
The regular expression can be broken down as follows:
^.*? - match any character (except new line) from the start of the line as few times as possible, which may be 0 or more.
([[:digit:]]{1,2} [[:alpha:]]{3,9}, ?[[:digit:]]{4} [[:digit:]]{1,2}\:[[:digit:]]{2} (A|P)M) - this is the pattern we're looking for, and which we'll use to replace the whole string with (this is aliased as \1, which we can then pass into the replace string parameter).
.*$ - match any character up to the end of the string
The second part of the pattern can be further broken down as:
[[:digit:]]{1,2} - one or two digits
' ' - a single space character
[[:alpha:]]{3,9} - three to nine letters (upper or lower case)
, ? - a comma followed by 0 or 1 spaces
[[:digit:]]{4} - four digits
' ' - a single space character
[[:digit:]]{1,2} - one or two digits
\: - a single colon character
[[:digit:]]{2} - two digits
' ' - a single space character
(A|P)M - either the letter A or P followed by an M
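The same breakdown can be tried outside the database. Below is a rough Python analogue using the re module (not Oracle syntax; the function name extract_date is made up), which captures the date parts piece by piece and converts them with strptime. It assumes English month names, as in the sample rows.

```python
import re
from datetime import datetime

# day, space, month name, comma + optional space, year, space, h:mm, space, AM/PM
PATTERN = re.compile(
    r"(\d{1,2}) ([A-Za-z]{3,9}), ?(\d{4}) (\d{1,2}:\d{2} [AP]M)",
    re.IGNORECASE,
)

def extract_date(text):
    """Return the first date found in text as a datetime, or None if absent."""
    match = PATTERN.search(text)
    if match is None:
        return None
    day, month, year, time_part = match.groups()
    # %B (full month name) and %p (AM/PM) are matched case-insensitively by strptime.
    return datetime.strptime(
        "{} {} {} {}".format(day, month, year, time_part), "%d %B %Y %I:%M %p"
    )

print(extract_date("Here's a date of 1 March, 2015 1:43 pm"))
print(extract_date("1 February,2016 9:43 AM"))
```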
This should do the trick for you:
WITH sample_data AS (SELECT 'abc changed on 12 November, 2008 11:30 AM and its abc..region1' str FROM dual UNION ALL
SELECT 'defg updated 14 January, 2012 08:20 PM ......region2' str FROM dual UNION ALL
SELECT 'ghijkl corrected by 18 august, 2013 9:30 AM ..something..region3' str FROM dual)
SELECT str,
regexp_replace(str,
'(^.*?)(([[:digit:]]{1,2}) (January|February|March|April|May|June|July|August|September|October|November|December), ?([[:digit:]]{4} [[:digit:]]{1,2}\:[[:digit:]]{2} (A|P)M))(.*$)',
'\1\3-\4-\5\7', 1, 1, 'i') dt
FROM sample_data;
STR DT
------------------------------------------------------------------- --------------------------------------------------------------------------------
abc changed on 12 November, 2008 11:30 AM and its abc..region1 abc changed on 12-November-2008 11:30 AM and its abc..region1
defg updated 14 January, 2012 08:20 PM ......region2 defg updated 14-January-2012 08:20 PM ......region2
ghijkl corrected by 18 august, 2013 9:30 AM ..something..region3 ghijkl corrected by 18-august-2013 9:30 AM ..something..region3

Moment js timezone off by 1 hour

I can't figure out what I am doing wrong here.
I'm passing a string to moment along with its format and then calling .toDate().
toDate() ends up returning a time that is off by 1 hour:
moment("2015-11-19T18:34:00-07:00", "YYYY-MM-DDTHH:mm:ssZ").toDate()
> Thu Nov 19 2015 17:34:00 GMT-0800 (PST)
The time should be 18:34, not 17:34. The timezone is showing -08, when it should be showing -07
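For what it's worth, the two values describe the same instant: 18:34 at UTC-07:00 is 17:34 at UTC-08:00, and toDate() returns a plain JavaScript Date, which is displayed in the machine's local zone (PST here). The arithmetic can be checked with a small sketch, written in Python rather than moment purely to illustrate the offsets. The fixed -08:00 zone stands in for the machine's local zone.

```python
from datetime import datetime, timedelta, timezone

# The instant from the question: 18:34 at UTC-07:00.
parsed = datetime(2015, 11, 19, 18, 34, tzinfo=timezone(timedelta(hours=-7)))

# The same instant viewed from a UTC-08:00 zone (stand-in for the local PST zone).
pst = timezone(timedelta(hours=-8), "PST")
local = parsed.astimezone(pst)

print(local.isoformat())  # 2015-11-19T17:34:00-08:00
print(local == parsed)    # True: same instant, different wall-clock display
```

So nothing is lost in the conversion; to display the original offset, format the moment object itself rather than the Date returned by toDate().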

How can I create basic timestamps or dates? (Python 3.4)

As a beginner, creating timestamps or formatted dates ended up being a little more of a challenge than I would have expected. What are some basic examples for reference?
Ultimately you want to review the datetime documentation and become familiar with the formatting variables, but here are some examples to get you started:
import datetime
print('Timestamp: {:%Y-%m-%d %H:%M:%S}'.format(datetime.datetime.now()))
print('Timestamp: {:%Y-%b-%d %H:%M:%S}'.format(datetime.datetime.now()))
print('Date now: %s' % datetime.datetime.now())
print('Date today: %s' % datetime.date.today())
today = datetime.date.today()
print("Today's date is {:%b, %d %Y}".format(today))
schedule = '{:%b, %d %Y}'.format(today) + ' - 6 PM to 10 PM Pacific'
schedule2 = '{:%B, %d %Y}'.format(today) + ' - 1 PM to 6 PM Central'
print('Maintenance: %s' % schedule)
print('Maintenance: %s' % schedule2)
The output:
Timestamp: 2014-10-18 21:31:12
Timestamp: 2014-Oct-18 21:31:12
Date now: 2014-10-18 21:31:12.318340
Date today: 2014-10-18
Today's date is Oct, 18 2014
Maintenance: Oct, 18 2014 - 6 PM to 10 PM Pacific
Maintenance: October, 18 2014 - 1 PM to 6 PM Central
Reference link: https://docs.python.org/3.4/library/datetime.html#strftime-strptime-behavior
>>> import time
>>> print(time.strftime('%a %H:%M:%S'))
Mon 06:23:14
from datetime import datetime
dt = datetime.now() # for date and time
ts = datetime.timestamp(dt) # for timestamp
print("Date and time is:", dt)
print("Timestamp is:", ts)
You might want to check string to datetime operations for formatting.
from datetime import datetime
datetime_str = '09/19/18 13:55:26'
datetime_object = datetime.strptime(datetime_str, '%m/%d/%y %H:%M:%S')
print(type(datetime_object))
print(datetime_object) # printed in default format
Output:
<class 'datetime.datetime'>
2018-09-19 13:55:26
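The reverse directions are worth having for reference too: formatting a parsed datetime back into a string with strftime, and turning a numeric timestamp back into a datetime with fromtimestamp. A small sketch, assuming the parsed value should be treated as UTC (that choice is only for illustration):

```python
from datetime import datetime, timezone

parsed = datetime.strptime("09/19/18 13:55:26", "%m/%d/%y %H:%M:%S")

# datetime -> string in a chosen format
text = parsed.strftime("%Y-%m-%d %H:%M:%S")
print(text)  # 2018-09-19 13:55:26

# datetime -> POSIX timestamp -> datetime, pinned to UTC so the round trip
# does not depend on the machine's local time zone
aware = parsed.replace(tzinfo=timezone.utc)
ts = aware.timestamp()
restored = datetime.fromtimestamp(ts, tz=timezone.utc)
print(restored == aware)  # True
```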
