Extract date from a string column containing timestamp in Pyspark - datetime

I have a dataframe which has a date in the following format:
+----------------------+
|date |
+----------------------+
|May 6, 2016 5:59:34 AM|
+----------------------+
I intend to extract the date from this in the format YYYY-MM-DD, so the result for the above date should be 2016-05-06.
But when I extract it using the following:
df.withColumn('part_date', from_unixtime(unix_timestamp(df.date, "MMM dd, YYYY hh:mm:ss aa"), "yyyy-MM-dd"))
I get the following date
2015-12-27
Can anyone please advise on this? I do not want to convert my df to an rdd to use Python's datetime functions; I want to do this in the dataframe itself.

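There are some errors in your pattern. The main one is YYYY, which is the week-based year rather than the calendar year; use yyyy instead. The day and hour in your data are also not zero-padded, so d and h match them more closely than dd and hh. Here's a suggestion: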
from_pattern = 'MMM d, yyyy h:mm:ss aa'
to_pattern = 'yyyy-MM-dd'
df.withColumn('part_date', from_unixtime(unix_timestamp(df['date'], from_pattern), to_pattern)).show()
+----------------------+----------+
|date |part_date |
+----------------------+----------+
|May 6, 2016 5:59:34 AM|2016-05-06|
+----------------------+----------+
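The odd 2015-12-27 result can be reproduced outside Spark. The following is a minimal sketch in plain Python, assuming the legacy Java-style parser resolves a week-based year with no week or weekday to the first day of that week-year under US-style rules (weeks start on Sunday, week 1 contains January 1). The helper name first_day_of_week_year is made up for this illustration; the second half is a plain-Python cross-check of the corrected pattern, not a replacement for the DataFrame code.

```python
from datetime import date, datetime, timedelta

def first_day_of_week_year(year):
    """First day of a week-based year, assuming weeks start on Sunday
    and week 1 is the week containing January 1 (US-style rules)."""
    jan1 = date(year, 1, 1)
    # date.weekday() is Monday=0 ... Sunday=6, so this is days since Sunday.
    days_since_sunday = (jan1.weekday() + 1) % 7
    return jan1 - timedelta(days=days_since_sunday)

# What a week-year of 2016 with no week or weekday would resolve to.
week_year_start = first_day_of_week_year(2016)

# Plain-Python counterpart of 'MMM d, yyyy h:mm:ss aa' -> 'yyyy-MM-dd'.
# %b and %p depend on the locale; an English/C locale is assumed here.
parsed = datetime.strptime("May 6, 2016 5:59:34 AM", "%b %d, %Y %I:%M:%S %p")
part_date = parsed.strftime("%Y-%m-%d")

print(week_year_start)  # 2015-12-27
print(part_date)        # 2016-05-06
```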

Related

Just for SQLite, is there an easy way to convert a column of text (like 21-Sep-2022) into a valid date format while query?

Just for SQLite, is there an easy way to convert a column of text (like 21-Sep-2022) into a valid date format in a query?
I know it's easy in other DBs, such as SQL Server and Oracle, because they have built-in functions for this. I'm now facing the same situation in SQLite, but I did not find any "cast", "convert" or "date" function that works and returns a proper result.
I've tried DATE(), but the text is not recognized and it only returns NULL.
Something like this should do the job. Field name "f", table name "x".
select
-- YEAR
printf('%04d-',substr( f ,-4)) ||
-- LOOKUP FUNCTION for MONTH
printf('%02d-',
CASE substr(f, instr(f,'-')+1,3 )
WHEN 'Jan' THEN 1
WHEN 'Feb' THEN 2
WHEN 'Mar' THEN 3
WHEN 'Apr' THEN 4
WHEN 'May' THEN 5
WHEN 'Jun' THEN 6
WHEN 'Jul' THEN 7
WHEN 'Aug' THEN 8
WHEN 'Sep' THEN 9
WHEN 'Oct' THEN 10
WHEN 'Nov' THEN 11
WHEN 'Dec' THEN 12
END)
||
-- DAY
printf('%02d', substr(f, 1, instr(f,'-') - 1) )
as thedate
from x
+-------------+
| Table f |
+-------------+
| 1-Jan-2023 |
| 19-Sep-2022 |
| 24-Dec-1989 |
+-------------+
+------------+
| thedate |
+------------+
| 2023-01-01 |
| 2022-09-19 |
| 1989-12-24 |
+------------+
The result is formatted as YYYY-MM-DD and can be processed as a date in SQLite.
The expression will give wrong results if some dates are not formatted as expected.
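If the query is issued from Python, another option is to register a small conversion function with the connection. This is a minimal sketch under that assumption; the function name to_iso_date is hypothetical, and the table x and column f follow the answer above. Unlike the pure-SQL expression, it returns NULL for text that does not parse.

```python
import sqlite3
from datetime import datetime

def to_iso_date(text):
    """Convert 'D-Mon-YYYY' text to 'YYYY-MM-DD'; return None if it does not parse."""
    try:
        # %b matches English month abbreviations under the default C locale.
        return datetime.strptime(text, "%d-%b-%Y").strftime("%Y-%m-%d")
    except (TypeError, ValueError):
        return None

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a one-argument scalar function.
conn.create_function("to_iso_date", 1, to_iso_date)
conn.execute("CREATE TABLE x (f TEXT)")
conn.executemany(
    "INSERT INTO x VALUES (?)",
    [("1-Jan-2023",), ("19-Sep-2022",), ("24-Dec-1989",), ("not a date",)],
)

rows = conn.execute("SELECT f, to_iso_date(f) AS thedate FROM x").fetchall()
for original, converted in rows:
    print(original, "->", converted)
```

Because the function lives in the Python process, it is only available on connections where it has been registered, not in the sqlite3 command-line shell.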

I want to find the day difference between 2 date column in azure app insight?

We have a log file where we store the searches happening on our platform. Each search has a departure date, and I want to find the searches where the departure date is more than 330 days after today.
I am trying to run a query to find the difference between the departure date column and logtime (the time the event was written to the log), but I am getting the error below:
Query could not be parsed at 'datetime("departureDate")' on line [5,54]
Token: datetime("departureDate")
Line: 5
Position: 54
The format of the departure date is mm/dd/yyyy, and logtime uses the typical datetime format of App Insights.
The query that I am running is below:
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',datetime("departureDate"),datetime("logTime")) > 200
As suggested, I ran the query below, but now I am getting 0 results even though there is data that satisfies the given criteria.
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',todatetime(departureDate),todatetime(logTime)) > 200
Example:
departureDate
04/09/2020
logTime
8/13/2019 8:45:39 AM -04:00
I also tried the query below to check whether the date format is supported, and it gave the correct response.
customEvents
| project datetime_diff('day', datetime('04/30/2020'),datetime('8/13/2019 8:25:51 AM -04:00'))
Please use the query below. Use todatetime() to convert the strings to datetime values.
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',todatetime(departureDate),todatetime(logTime)) > 200
The double quotes inside datetime(...) in the where clause should be removed; with the quotes, the column names are treated as string literals.
Your code should look like:
where datetime_diff('day',datetime(departureDate),datetime(logTime)) > 200
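To check what the filter should return for the example row, the same difference can be computed by hand. The following is a rough cross-check in plain Python using the two sample values quoted in the question; it compares calendar dates only and makes no claim about how datetime_diff rounds on the service side.

```python
from datetime import datetime

# Sample values copied from the question.
departure = datetime.strptime("04/09/2020", "%m/%d/%Y")
log_time = datetime.strptime(
    "8/13/2019 8:45:39 AM -04:00", "%m/%d/%Y %I:%M:%S %p %z"
)

# Compare calendar dates so the naive and offset-aware values can be subtracted.
day_diff = (departure.date() - log_time.date()).days

print(day_diff)        # 240
print(day_diff > 200)  # True
```

Since this row is expected to pass the filter, an empty result suggests the strings are not being converted to datetime values inside the query.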

How to get month format with 0 after evaluate using Robot framework

I am trying to get the month and year in a format like "06/19" after evaluating the month, but I get just "6/19".
Month and year
    ${currentYear}=    Get Current Date    result_format=%y
    ${currentDate}=    Get Current Date
    ${datetime} =    Convert Date    ${currentDate}    datetime
    ${getMonth}=    evaluate    ${datetime.month} - 1
    log to console    ${getMonth}/${currentYear}
I already tried another way: I created a list variable @{MONTHSNO} ${EMPTY} 01 02 03 04 05 06 07 08 09 10 11 12 and returned ${MONTHSNO}[${getMonth}]/${currentYear}, which gives 06/19. But I would like to know whether Robot Framework has another way to convert the month to "06" without defining a variable like this.
You can achieve this with a custom keyword that returns the date in month/year format.
In the keyword you can use relativedelta() to subtract a month from your date.
To install dateutil:
pip install python-dateutil
test.py
from datetime import datetime
from dateutil.relativedelta import relativedelta

def return_current_date_minus_one_month():
    strDate = datetime.today()
    Subtracted_date = strDate + relativedelta(months=-1)
    Date = Subtracted_date.strftime('%m/%y')
    return Date
test.robot
*** Settings ***
Library    test.py

*** Test Cases ***
Month and year
    ${current_date} =    Test.Return Current Date Minus One Month
    log    ${current_date}
result = ${current_date} = 06/19
When you run the Evaluate keyword, you are running Python expressions. So let's take a look at the datetime docs:
https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
At the end they mention that the time library can be useful. So I suggest using this command to return the month in 0X format:
    ${getMonth}=    evaluate    time.strftime("%m")    modules=time
This returns 07 for me (because it is July now).
You can subtract roughly one month from your current date like this with the DateTime library (30 days is only an approximation of a month):
    ${date1}=    Get Current Date    result_format=%d.%m.%Y
    ${date2}=    Subtract Time From Date    ${date1}    30 days    date_format=%d.%m.%Y    result_format=%m.%Y
Maybe this helps.
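The month arithmetic itself can also be done with the standard library alone. This is a minimal sketch, assuming a custom keyword library is acceptable as in the first answer; the function name previous_month_label is hypothetical. It handles the January case, and the last line shows why subtracting a fixed 30 days does not always land in the previous month.

```python
from datetime import date, timedelta

def previous_month_label(today):
    """Return the month before `today` as zero-padded 'MM/YY'."""
    year, month = today.year, today.month - 1
    if month == 0:
        # January rolls back to December of the previous year.
        year, month = year - 1, 12
    return f"{month:02d}/{year % 100:02d}"

print(previous_month_label(date(2019, 7, 15)))  # 06/19
print(previous_month_label(date(2019, 1, 10)))  # 12/18

# 30 days before March 31 is March 1, which is still in March.
print(date(2019, 3, 31) - timedelta(days=30))   # 2019-03-01
```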

Pyspark: Convert String Datetime in 12 hour Clock to Date time with 24 hour clock (Time Zone Change)

Edit: Apologies, the sample data frame is a little off. Below is the corrected sample dataframe I'm trying to convert:
Timestamp (CST)
12/8/2018 05:23 PM
11/29/2018 10:20 PM
I tried the following code based on recommendation below but got null values returned.
df = df.withColumn('Timestamp (CST)_2', from_unixtime(unix_timestamp(col(('Timestamp (CST)')), "yyyy/MM/dd hh:mm:ss aa"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
df = df.withColumn("Timestamp (CST)_3", F.to_timestamp(F.col("Timestamp (CST)_2")))
--------------------------------------------------------------------------------
I have a field called "Timestamp (CST)" that is a string. It is in Central Standard Time.
Timestamp (CST)
2018-11-21T5:28:56 PM
2018-11-21T5:29:16 PM
How do I create a new column that takes "Timestamp (CST)" and change it to UTC and convert it to a datetime with the time stamp on the 24 hour clock?
Below is my desired table and I would like the datatype to be timestamp:
Timestamp (CST)_2
2018-11-21T17:28:56.000Z
2018-11-21T17:29:16.000Z
I tried the following code but all the results came back null:
df = df.withColumn("Timestamp (CST)_2", to_timestamp("Timestamp (CST)", "yyyy/MM/dd h:mm p"))
Firstly, import from_unixtime, unix_timestamp and col using
from pyspark.sql.functions import from_unixtime, unix_timestamp, col
Then, reconstructing your scenario in a DataFrame df_time
>>> cols = ['Timestamp (CST)']
>>> vals = [
... ('2018-11-21T5:28:56 PM',),
... ('2018-11-21T5:29:16 PM',)]
>>> df_time = spark.createDataFrame(vals, cols)
>>> df_time.show(2, False)
+---------------------+
|Timestamp (CST) |
+---------------------+
|2018-11-21T5:28:56 PM|
|2018-11-21T5:29:16 PM|
+---------------------+
Then, my approach would be
>>> df_time_twenfour = df_time.withColumn('Timestamp (CST)', \
... from_unixtime(unix_timestamp(col(('Timestamp (CST)')), "yyyy-MM-dd'T'hh:mm:ss aa"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
>>> df_time_twenfour.show(2, False)
+------------------------+
|Timestamp (CST) |
+------------------------+
|2018-11-21T17:28:56.000Z|
|2018-11-21T17:29:16.000Z|
+------------------------+
Notes
If you want the time in 24-hour format, use HH instead of hh.
Since your input has an AM/PM marker, use aa in yyyy-MM-dd'T'hh:mm:ss aa to parse it.
Your input string has a literal T in it, so you have to quote it in the pattern as shown above.
The option aa as mentioned in @pyy4917's answer might give legacy parser errors. To fix it, replace aa with a.
The full code is below:
df_time_twenfour = df_time.withColumn('Timestamp (CST)',
    from_unixtime(unix_timestamp(col('Timestamp (CST)'),
        "yyyy-MM-dd'T'hh:mm:ss a"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
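Note that the output pattern above appends a literal 'Z' without moving the clock, so 5:28:56 PM stays 17:28:56. If the values really need to be shifted from Central Standard Time to UTC, the expected result can be checked in plain Python. This is a minimal sketch, assuming "CST" means a fixed UTC-6 offset with daylight saving time ignored; the function name cst_string_to_utc_iso is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CST" is a fixed UTC-6 offset (no daylight saving handling).
CST = timezone(timedelta(hours=-6), "CST")

def cst_string_to_utc_iso(text):
    """Parse a 12-hour 'yyyy-MM-ddTh:mm:ss AM/PM' string as CST and
    return it as a 24-hour UTC string with millisecond precision."""
    local = datetime.strptime(text, "%Y-%m-%dT%I:%M:%S %p").replace(tzinfo=CST)
    utc = local.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"

print(cst_string_to_utc_iso("2018-11-21T5:28:56 PM"))  # 2018-11-21T23:28:56.000Z
print(cst_string_to_utc_iso("2018-11-21T5:29:16 PM"))  # 2018-11-21T23:29:16.000Z
```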

convert the date into a specific format using moment js?

I have a date string like Wed Aug 30 2017 00:00:00 GMT+0530 (IST) and I want to convert it to this: 2017-8-30
Now I am doing this:
moment($scope.date.selectedDate).format('YYYY-M-DD') and it gives the right date but throws this warning:
moment construction falls back to js date
Since the input is a JS date string rather than an ISO string, you need to pass the input format as well. This can be done with:
moment('Wed Aug 30 2017 00:00:00 GMT+0530', 'ddd MMM DD YYYY HH:mm:ss GMT+-HH:mm').format('YYYY-M-DD');
https://jsfiddle.net/o01ktajp/1/
Regarding the warning, you can refer to this post: Deprecation warning: moment construction falls back to js Date.
The easiest solution would be to pass the date string in the ISO format.
As for the date, if you simply want to display it in the UI with that format, you can use the 'date' angular filter: https://docs.angularjs.org/api/ng/filter/date.
In your case you could use it like this (the filter uses lowercase yyyy and dd):
$scope.date.selectedDate | date: 'yyyy-M-dd'
You can do:
var d = new Date('Wed Aug 30 2017 00:00:00 GMT+0530');
var formated = moment(d).format('YYYY-M-DD');
