PySpark Convert String Column to Datetime Type - datetime

I have timestamp data like:
[29:23:59:45]
This stands for day 29 (of whatever month), 23:59:45.
How can I convert this in PySpark to something like DAY 29, TIME:23:59:45?
Possibly using something like
from datetime import datetime
dVal = datetime.strptime('[29:23:59:45]', '%d/%h/%m/%s')

This is a classic case for a user-defined function (UDF).
from datetime import datetime
from pyspark.sql import functions as F
from pyspark.sql.types import TimestampType

def toDate(x):
    # '[29:23:59:45]' -> day 29, 23:59:45 (year and month default to 1900-01)
    return datetime.strptime(x, '[%d:%H:%M:%S]')

# Declare the return type; without it the UDF returns a string column
toDate = F.udf(toDate, TimestampType())
new_df = df.withColumn('date', toDate(F.col('timestamp')))
where df is the existing DataFrame containing a column named 'timestamp', as you described.
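The parsing step inside the UDF can be checked in plain Python before involving Spark. The sketch below assumes the input always has the bracketed day:hour:minute:second shape shown in the question; the names parse_day_time and describe are hypothetical helpers, not part of PySpark.

```python
from datetime import datetime


def parse_day_time(raw):
    """Parse a string such as '[29:23:59:45]' into a datetime.

    Only day and time are present, so strptime fills in its defaults
    for the missing fields (year 1900, month 1).
    """
    return datetime.strptime(raw, "[%d:%H:%M:%S]")


def describe(raw):
    """Render the parsed value in the 'DAY dd, TIME:HH:MM:SS' form asked for."""
    parsed = parse_day_time(raw)
    return parsed.strftime("DAY %d, TIME:%H:%M:%S")


print(describe("[29:23:59:45]"))
```

Because the month is unknown, a day value such as 31 only parses because the default month (January) has 31 days; treat the year and month of the result as placeholders.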

Related

Convert string to date with date and time together in Hive

I need to convert a string to a datetime (date and time together).
I tried this:
cast(to_date(from_unixtime(unix_timestamp('20190303164305', 'yyyyMMddHHmmss'))) as date) as date_data_chamada
timezone: Brazil
But this returns just the date, like 2019-03-03, and I need 2019-03-03 16:43:05.
Thanks!
Full code:
INSERT INTO p_b.este PARTITION (dt_originacao_fcdr)
SELECT
tp_registro_fcdr,
seq_registro_fcdr,
tp_cdr_fcdr,
dt_atendimento_fcdr,
data_atendimento_completa_fcdr,
cast(from_unixtime(unix_timestamp(data_atendimento_completa_fcdr, 'yyyyMMddHHmmss'),"yyyy-MM-dd HH:mm:ss")as timestamp) as date_data_atendimento_fcdr,
hr_atendimento_fcdr,
duracao_atend_fcdr,
hr_originacao_fcdr,
duracao_total_fcdr,
duracao_chamada_tarifada_fcdr,
st_chamada_fcdr,
fim_sel_orig_fcdr
FROM p_b.norm;
Remove the date cast and the to_date function, since you are expecting a timestamp.
Example:
hive> select from_unixtime(unix_timestamp('20190303164305', 'yyyyMMddHHmmss'),"yyyy-MM-dd HH:mm:ss") as date_data_chamada;
RESULT:
2019-03-03 16:43:05
If you use to_date or cast('string' as date), Hive returns only the date (yyyy-MM-dd).
Ex:
hive> select to_date(from_unixtime(unix_timestamp('20190303164305', 'yyyyMMddHHmmss'),"yyyy-MM-dd HH:mm:ss")) as date_data_chamada;
--2019-03-03
Pass the format string as the second argument to from_unixtime. Note that the returned type is string.
from_unixtime(unix_timestamp('20190303164305','yyyyMMddHHmmss'),'yyyy-MM-dd HH:mm:ss')
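For illustration only, the same parse-then-format round trip can be sketched in Python. This is an analogue of the Hive expression, not Hive code: the function name reformat_timestamp is hypothetical, and the sketch ignores time zones, whereas Hive's unix_timestamp interprets the string in the session time zone.

```python
from datetime import datetime


def reformat_timestamp(raw):
    """Turn a compact 'yyyyMMddHHmmss' string into 'yyyy-MM-dd HH:mm:ss'.

    Parsing keeps both date and time; truncating to a date (as to_date
    does in Hive) is what drops the time portion.
    """
    parsed = datetime.strptime(raw, "%Y%m%d%H%M%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(reformat_timestamp("20190303164305"))
```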

Ecto/Elixir, How can I query by date?

I am working on statistics page of my app and trying to query data by date.
To get the date range, I use Calendar.Date
date_range = Date.days_after_until(start_date, end_date, true)
|> Enum.to_list
It returns a list of dates, each of which looks like "2017-04-07". When I query with a date from date_range, it triggers the error below.
where cannot be cast to type Ecto.DateTime in query: from o in Myapp.Order,
where: o.created_date >= ^~D[2017-04-07]
I defined the created_date field of Order like this:
field :created_date, Ecto.DateTime
If I want to query by date, how can I do it?
Thanks in advance.
It looks like you're trying to compare a date and datetime. You need to cast one of them to the other type so the comparison works, e.g. convert the datetime in the database to a date:
date = ~D[2017-01-01]
from p in Post, where: fragment("?::date", p.inserted_at) >= ^date
or convert the Elixir Date to NaiveDateTime:
{:ok, datetime} = NaiveDateTime.new(date, ~T[00:00:00])
from p in Post, where: p.inserted_at >= ^datetime
If you have a start and end date, you just need to add an "and" condition to either query. You don't need to generate the whole list of dates using any library.
from p in Post,
  where: fragment("?::date", p.inserted_at) >= ^start_date and
         fragment("?::date", p.inserted_at) <= ^end_date
or
from p in Post,
  where: p.inserted_at >= ^start_datetime and
         p.inserted_at <= ^end_datetime

Reformat dates in column

I have some data in an SQLite DB of the form:
id column1 date
111 280 1/1/2014
114 275 1/2/2014
The date field is of type TEXT. I've been made aware (https://www.sqlite.org/lang_datefunc.html) that I should have the dates formatted like YYYY-MM-DD to take advantage of SQLite's datetime functionality. Is there a query I could run to change the format from
mm/dd/yyyy
to
YYYY-MM-DD
in place?
Your current date format has four possible forms:
m/d/yyyy
m/dd/yyyy
mm/d/yyyy
mm/dd/yyyy
To rearrange the fields, extract them with substr() and then combine them again.
It might be possible to determine the positions of the slashes with instr(), but for a one-off conversion, just using four queries is simpler:
UPDATE MyTable
SET date = substr(date, 6, 4) || '-' ||
substr(date, 1, 2) || '-' || '0' ||
substr(date, 4, 1)
WHERE date LIKE '__/_/____';
-- this is mm/d/yyyy; similarly for the other forms, modify positions and zeros
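The four queries can be written out and run in one pass, for example from Python's sqlite3 module. This is a sketch under the assumptions of the answer above: a table named MyTable with a TEXT column named date holding only the four listed forms. The names FORMS and reformat_dates are hypothetical.

```python
import sqlite3

# One (LIKE pattern, replacement expression) pair per input form.
# substr() positions are 1-based; single-digit fields get a leading '0'.
FORMS = [
    # m/d/yyyy
    ("_/_/____",
     "substr(date, 5, 4) || '-0' || substr(date, 1, 1) || '-0' || substr(date, 3, 1)"),
    # m/dd/yyyy
    ("_/__/____",
     "substr(date, 6, 4) || '-0' || substr(date, 1, 1) || '-' || substr(date, 3, 2)"),
    # mm/d/yyyy
    ("__/_/____",
     "substr(date, 6, 4) || '-' || substr(date, 1, 2) || '-0' || substr(date, 4, 1)"),
    # mm/dd/yyyy
    ("__/__/____",
     "substr(date, 7, 4) || '-' || substr(date, 1, 2) || '-' || substr(date, 4, 2)"),
]


def reformat_dates(con, table="MyTable"):
    """Rewrite every mm/dd/yyyy-style value in `table`.date as YYYY-MM-DD."""
    for pattern, expr in FORMS:
        # Already-converted rows contain no slashes, so no later pattern
        # can match them again.
        con.execute(f"UPDATE {table} SET date = {expr} WHERE date LIKE ?", (pattern,))
    con.commit()
```

A quick way to try it is against an in-memory database (sqlite3.connect(':memory:')) seeded with one row per form before running it on real data.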
Here is a Python approach, without any frills such as exception handling.
It is slightly simpler because strptime doesn't care whether days and months have leading zeros.
>>> from datetime import datetime
>>> import sqlite3
>>> con = sqlite3.connect(':memory:')
>>> cur = con.cursor()
>>> cur.execute('CREATE TABLE EXAMPLE (date_column text)')
<sqlite3.Cursor object at 0x00000000038D07A0>
>>> cur.execute('INSERT INTO EXAMPLE VALUES ("1/1/2014")')
<sqlite3.Cursor object at 0x00000000038D07A0>
>>> def transformDate(aDate):
...     # Input is mm/dd/yyyy, so the month comes first
...     tempDate = datetime.strptime(aDate, '%m/%d/%Y')
...     return tempDate.strftime('%Y-%m-%d')
...
>>> transformDate('1/1/2014')
'2014-01-01'
>>> con.create_function('transformDate', 1, transformDate)
>>> cur.execute('UPDATE EXAMPLE SET date_column = transformDate(date_column)')
<sqlite3.Cursor object at 0x00000000038D07A0>
>>> con.commit()

How to get objects which were created today? (Django-RestFramework)

Field of Model :
time = models.DateTimeField()
How to get objects which were created (only) today (from 00:00:00 to 23:59:59)
like:
objects = Model.objects.filter(time__gt=?????????)
or ?
Thanks
You can use datetime.date.today() to get the current date and then filter objects based on today's date.
You can do something like:
import datetime
today = datetime.date.today() # date representing today's date
qs = MyModel.objects.filter(time__gte=today) # objects with a time at or after today's midnight
Here, qs represents the objects which were created today, assuming no object has a time in the future.
Another solution is to use the range lookup, which matches values between two datetimes (inclusive).
Here, start_date represents 00:00:00 and end_date represents 23:59:59.
import datetime
today = datetime.datetime.today()
start_date = datetime.datetime(year=today.year, month=today.month, day=today.day, hour=0, minute=0, second=0) # represents 00:00:00
end_date = datetime.datetime(year=today.year, month=today.month, day=today.day, hour=23, minute=59, second=59) # represents 23:59:59
qs = MyModel.objects.filter(time__range=(start_date, end_date)) # today's objects
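The start and end of the day can also be computed as a half-open interval, which avoids missing values between 23:59:59 and midnight. The sketch below is plain Python with naive datetimes; the helper name day_bounds is hypothetical, and a project with time zone support enabled would need aware datetimes instead.

```python
import datetime


def day_bounds(day):
    """Return (start, end) where start is 00:00:00 on `day`
    and end is 00:00:00 on the following day."""
    start = datetime.datetime.combine(day, datetime.time.min)
    end = start + datetime.timedelta(days=1)
    return start, end


start, end = day_bounds(datetime.date.today())
# Assumed usage with the model from the question:
# qs = MyModel.objects.filter(time__gte=start, time__lt=end)
```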
Since you are using Django Rest Framework, you might need to override the get_queryset() method in your view and return the queryset containing the objects that were created today.
class MyView(..):
    def get_queryset(self):
        ..
        return qs  # return the queryset created using the above logic
Alternatively, use the __date lookup (available since Django 1.9):
import datetime
today = datetime.date.today()
qs = MyModel.objects.filter(time__date=today)

Python TimeDelta Add Day to Supplied Argument

Not sure how to approach this one.
User supplies an argument, ie, program.exe '2001-08-12'
I need to add a single day to that argument - this will represent a date range for another part of the program. I am aware that you can add or subtract from the current day but how does one add or subtract from a user supplied date?
import datetime
...
date=time.strptime(argv[1], "%y-%m-%d");
newdate=date + datetime.timedelta(days=1)
Arnaud's code is valid; here is how to use it:
>>> import datetime
>>> x=datetime.datetime.strptime('2001-08-12','%Y-%m-%d')
>>> newdate=x + datetime.timedelta(days=1)
>>> newdate
datetime.datetime(2001, 8, 13, 0, 0)
>>>
Okay, here's what I've got:
import sys
from datetime import datetime, timedelta
user_input = sys.argv[1] # Get their date string
year_month_day = user_input.split('-') # Split it into [year, month, day]
year = int(year_month_day[0])
month = int(year_month_day[1])
day = int(year_month_day[2])
# Add a timedelta rather than using day+1, which fails on the last day of a month
date_plus_a_day = datetime(year, month, day) + timedelta(days=1)
I understand this is a little long, but I wanted to make sure each step was clear. I'll leave shortening it up to you if you want it shorter.
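A shorter variant combines strptime with timedelta, which also handles month and year boundaries. This is a minimal sketch assuming the argument is always in YYYY-MM-DD form; the function name next_day is hypothetical.

```python
import datetime


def next_day(date_string):
    """Parse a 'YYYY-MM-DD' string and return the date one day later."""
    parsed = datetime.datetime.strptime(date_string, "%Y-%m-%d").date()
    return parsed + datetime.timedelta(days=1)


print(next_day("2001-08-12"))  # the example argument from the question
print(next_day("2001-08-31"))  # rolls over into the next month
```

In the program itself the string would come from sys.argv[1]; strptime raises ValueError if the argument does not match the format.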

Resources