Date Time Conversions in PySpark - datetime

Can someone please explain to me how the epoch time below
epoch time/unix timestamp: 1668443121840
converts to the date: 2022-11-14T16:25:21.840+0000
How does the conversion take place, and how can I tell whether an epoch timestamp is expressed in seconds, milliseconds, microseconds or nanoseconds?
Additionally, is there a function in PySpark to convert the date back to an epoch timestamp?
Thanks in advance!
I tried a number of methods, but I am not achieving the expected result:
t = datetime.datetime.strptime('2021-11-12 02:12:23', '%Y-%m-%d %H:%M:%S')
print(t.strftime('%s'))
With this approach I am not able to control the format or the precision in terms of seconds, milliseconds, microseconds or nanoseconds.

The epoch time/unix timestamp uses a reference date: 00:00:00 UTC on 1 January 1970. It counts the seconds (or milliseconds) elapsed since that date.
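To identify the unit, a useful heuristic is the magnitude of the value: present-day epoch timestamps have roughly 10 digits in seconds, 13 in milliseconds, 16 in microseconds and 19 in nanoseconds. A minimal Python sketch of that heuristic (the 1e11 cutoff is an assumption, corresponding to dates up to roughly the year 5138):
from datetime import datetime, timezone

def epoch_to_datetime(epoch):
    # Try seconds, milliseconds, microseconds, nanoseconds in turn
    # and accept the first divisor that lands in a plausible range.
    for divisor in (1, 1e3, 1e6, 1e9):
        seconds = epoch / divisor
        if seconds < 1e11:  # heuristic upper bound, ~year 5138
            return datetime.fromtimestamp(seconds, tz=timezone.utc)
    raise ValueError("value too large even for a nanosecond epoch")

print(epoch_to_datetime(1668443121840))
# 2022-11-14 16:25:21.840000+00:00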
The value you are looking for is in milliseconds, so you would have to calculate the milliseconds and concatenate them with the epoch seconds:
import pyspark.sql.functions as F
df = spark.createDataFrame([('2022-11-14T16:25:21.840+0000',)]).toDF("timestamp")
df\
.withColumn("timestamp", F.to_timestamp(F.col("timestamp")))\
.withColumn("epoch_seconds", F.unix_timestamp("timestamp"))\
.withColumn("epoch_milliseconds", F.concat(F.unix_timestamp("timestamp"), F.date_format("timestamp", "SSS")))\
.show(truncate=False)
# +----------------------+-------------+------------------+
# |timestamp             |epoch_seconds|epoch_milliseconds|
# +----------------------+-------------+------------------+
# |2022-11-14 16:25:21.84|1668443121   |1668443121840     |
# +----------------------+-------------+------------------+
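If you prefer arithmetic over string concatenation, an alternative sketch (this relies on Spark casting a timestamp to double yielding fractional epoch seconds, applied to the df built above):
df\
.withColumn("timestamp", F.to_timestamp(F.col("timestamp")))\
.withColumn("epoch_milliseconds", (F.col("timestamp").cast("double") * 1000).cast("long"))\
.show(truncate=False)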

A unix timestamp counts the seconds that have elapsed since 00:00:00 UTC on 1 January 1970.
To convert dates to a unix timestamp in PySpark, you can use the unix_timestamp function.
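For example, a minimal sketch (the column name and input format are illustrative; note that unix_timestamp interprets the string in the Spark session time zone):
import pyspark.sql.functions as F

df = spark.createDataFrame([('2021-11-12 02:12:23',)]).toDF("date_str")
df.withColumn(
    "epoch_seconds",
    F.unix_timestamp("date_str", "yyyy-MM-dd HH:mm:ss")
).show()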

Related

Convert UTC micros to Julia datetime

I have a datetime I'm getting from golang that is in unix microseconds (the number of microseconds since 1 January 1970):
1652681499679534
I want to get it into a Julia DateTime. What is the proper calculation for that?
julia> unix2datetime(1652681499679534 / 10^6)  # unix2datetime expects seconds, so divide the microsecond count by 10^6
2022-05-16T06:11:39.680

How to convert epoch time to human-readable time in milliseconds

I have data containing epoch times, and I need to extract the human-readable time as year, month, day, hours, minutes, seconds and milliseconds.
epoch time before conversion:
1517166673385
After conversion I need it to be in this format:
20180128191113385
I have written the following function and it works well, but it takes a long time. I am searching for a faster function because I have thousands of files to process.
getDTI <- function(echotime){
  DTItemp <- as.POSIXct(as.numeric(substr(echotime, 1, 10)), origin="1970-01-01", tz="GMT")
  DTI <- paste0(substr(DTItemp, 1, 4), substr(DTItemp, 6, 7), substr(DTItemp, 9, 10),
                substr(DTItemp, 12, 13), substr(DTItemp, 15, 16), substr(DTItemp, 18, 19),
                substr(echotime, 11, 13))
  return(DTI)
}
a = 1517166673385
paste0(format(as.POSIXct(a/1000,origin="1970-01-01", tz="GMT"),"%Y%m%d%H%M%S"),sprintf("%03d",a%%1000))
[1] "20180128191113385"
In function form (as.POSIXct, format and sprintf are all vectorised, so this converts whole vectors of timestamps at once):
fun <- function(a){
  paste0(format(as.POSIXct(a/1000, origin="1970-01-01", tz="GMT"), "%Y%m%d%H%M%S"),
         sprintf("%03d", a %% 1000))
}
d=c(1517166673385, 1517701556075)
fun(d)
[1] "20180128191113385" "20180203234556075"

R - UTC to LOCAL time given Olson timezones

I have time series data from 1974-2013 with a column for datetimeUTC (YYYY-MM-DD hh:mm +0000) and a column for the time zones in Olson format (e.g., Canada/Pacific, Canada/Eastern). I can convert the whole datetimeUTC column to a common timezone like this:
dataset$datetimeEST <- strptime(
  dataset$datetimeUTC, format="%Y-%m-%d %H:%M:%S%z", tz="Canada/Eastern"
)
How do I convert datetimeUTC to datetimeLOCAL, given the corresponding timezone in each row?
Let me back up a bit. I have data from across the country (6 timezones) formatted in ISO8601 representation for 1974-2013. The timestamps are in local standard time throughout the year (i.e. DST is disregarded even if civilian time in the region observes DST). I need to do datetime calculations which are probably safest to do in UTC time, so that's easy. But, I also need to pull data for specific civil time periods, taking into account DST, and do calculations and plots (e.g., all the data for rush hour at locations across all 6 timezones) for that subsetted data.
The datetimeCLOCKTIME that I calculated below appears to be doing what I want for plotting, but gives the wrong answer when doing datetime calculations, because it stored the datetime in the timezone of my local machine without having actually converted the time. The solution offered by @thelatemail is what I'm looking for, but I haven't been able to get it to work in Windows on the test dataset for 2012 (see below). Also, I was using strptime, which converts to POSIXlt, and his solution is in POSIXct. I'm new to R, so any help would be infinitely appreciated.
Test dataset:
dataset <- data.frame(
  timestampISO8601 = c(
    "2012-04-25T22:00:00-08:00","2012-04-25T22:15:00-08:00","2012-04-25T22:30:00-08:00",
    "2012-04-25T22:45:00-08:00","2012-04-25T23:00:00-08:00","2012-04-25T23:15:00-08:00",
    "2012-04-25T23:30:00-08:00","2012-04-25T23:45:00-08:00","2012-04-26T00:00:00-08:00",
    "2012-04-26T00:15:00-08:00","2012-04-26T00:30:00-08:00","2012-04-26T00:45:00-08:00",
    "2012-04-26T01:00:00-08:00","2012-04-26T01:15:00-08:00","2012-04-26T01:30:00-08:00",
    "2012-04-26T01:45:00-08:00","2012-04-26T02:00:00-08:00",
    "2012-04-25T22:00:00-03:30","2012-04-25T22:15:00-03:30","2012-04-25T22:30:00-03:30",
    "2012-04-25T22:45:00-03:30","2012-04-25T23:00:00-03:30","2012-04-25T23:15:00-03:30",
    "2012-04-25T23:30:00-03:30","2012-04-25T23:45:00-03:30","2012-04-26T00:00:00-03:30",
    "2012-04-26T00:15:00-03:30","2012-04-26T00:30:00-03:30","2012-04-26T00:45:00-03:30",
    "2012-04-26T01:00:00-03:30","2012-04-26T01:15:00-03:30","2012-04-26T01:30:00-03:30",
    "2012-04-26T01:45:00-03:30","2012-04-26T02:00:00-03:30"),
  olson = c(rep("Canada/Pacific", 17), rep("Canada/Newfoundland", 17)),
  value = c(0,0,1,2,5,11,17,19,20,19,17,11,5,2,1,0,0,
            -3,-3,-2,-1,2,8,14,16,17,16,14,8,2,-1,-2,-3,-3),
  stringsAsFactors = FALSE)
Remove the ":" from the UTC offset. (R is expecting the format nnnn for the UTC offset):
dataset$timestampR<- paste(substr(dataset$timestampISO8601,1,22),substr(dataset$timestampISO8601,24,25),sep="")
When converting to UTC, R applies the UTC offset, so a timestamp with a negative offset is shifted forward by that amount:
dataset$datetimeUTC <- strptime(dataset$timestampR, format="%Y-%m-%dT%H:%M:%S%z", tz="UTC")
When converting to MACHINE time like this, R reads the input time and converts it to the time in the timezone of the local machine - in my case, this is Canada/Eastern:
dataset$datetimeMACHINE <- strptime(dataset$timestampR, format="%Y-%m-%dT%H:%M:%S%z")
When converting to CLOCKTIME time like this, R reads the input time and assigns the time zone of the local machine (currently EDT on my machine) without doing any time conversions:
dataset$datetimeCLOCKTIME <- strptime(dataset$timestampR,format="%Y-%m-%dT%H:%M:%S")
See the structure of the dataset:
str(dataset)
Plotting behaviours differ:
library(ggplot2)
qplot(data=dataset,x=datetimeUTC,y=value)
qplot(data=dataset,x=datetimeMACHINE,y=value)
qplot(data=dataset,x=datetimeCLOCKTIME,y=value)
Calculation results differ. Incorrect calculation result for datetimeCLOCKTIME:
range(dataset$datetimeUTC)
range(dataset$datetimeMACHINE)
range(dataset$datetimeCLOCKTIME)
dataset$datetimeUTC[34] - dataset$datetimeUTC[1]
dataset$datetimeMACHINE[34] - dataset$datetimeMACHINE[1]
dataset$datetimeCLOCKTIME[34] - dataset$datetimeCLOCKTIME[1]
You could format back and forth a bit to get a local time representation in a character format. E.g.:
dataset <- data.frame(
datetimeUTC=c("2014-01-01 00:00 +0000","2014-01-01 00:00 +0000"),
olson=c("Canada/Eastern", "Canada/Pacific"),
stringsAsFactors=FALSE
)
# datetimeUTC olson
#1 2014-01-01 00:00 +0000 Canada/Eastern
#2 2014-01-01 00:00 +0000 Canada/Pacific
dataset$localtime <- with(dataset,
  mapply(function(dt, ol) format(
    as.POSIXct(dt, "%Y-%m-%d %H:%M %z", tz=ol),
    "%Y-%m-%d %H:%M %z"),
    datetimeUTC, olson
  )
)
# datetimeUTC olson localtime
#1 2014-01-01 00:00 +0000 Canada/Eastern 2013-12-31 19:00 -0500
#2 2014-01-01 00:00 +0000 Canada/Pacific 2013-12-31 16:00 -0800
If you have only two time zones to convert to and know their offsets from UTC, you can subtract the offsets directly. Using @thelatemail's dataset:
transform(dataset,
  localtime=as.POSIXct(datetimeUTC, format="%Y-%m-%d %H:%M %z") -
    c(5*3600, 8*3600)[as.numeric(factor(olson))])
# datetimeUTC olson localtime
#1 2014-01-01 00:00 +0000 Canada/Eastern 2013-12-31 19:00:00
#2 2014-01-01 00:00 +0000 Canada/Pacific 2013-12-31 16:00:00

In what date format is 1339698600000 = 15 June 2012?

I am using bootstrap-datepicker and get a value of 1339698600000 for the selected date of 15th June 2012.
What date format is this? How do I convert it to a human-readable format?
Is there any resource where I can find many more formats?
That is the number of milliseconds since January 1, 1970 (POSIX epoch). You can divide it by 1000 to get the number of seconds since epoch which is a standard way to represent time.
It's the number of milliseconds since 1/1/1970. To convert it to a human-readable date, add that many milliseconds to a 1/1/1970 date object.
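For instance, a quick Python check (the wall-clock date depends on the time zone; this UTC instant on 14 June reads as 15 June at UTC+05:30):
from datetime import datetime, timezone

ms = 1339698600000
print(datetime.fromtimestamp(ms / 1000, tz=timezone.utc))
# 2012-06-14 18:30:00+00:00, i.e. 2012-06-15 00:00:00 at UTC+05:30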

Converting datetime character string to double value of milliseconds since 1 Jan 1960

From a related question (see below), I've found out how to convert a Stata datetime, stored as milliseconds since 1 January 1960, in R:
as.POSIXct(874022400000/1000, origin="1960-01-01")
I am looking to do the opposite in R: i.e. given a datetime expressed as a character string, find out how to return the datetime value as milliseconds since 01 Jan 1960 00:00:00. Any suggestions would be much appreciated.
Use as.numeric to coerce the date-time back into seconds since the epoch. Since R uses 1970 as its origin, you have to additionally account for the 1960-1970 offset. Lastly, of course, take care of the seconds to milliseconds conversion.
> mydate = as.POSIXct(874022400000/1000, origin="1960-01-01")
> 1000 * (as.numeric(mydate) - as.numeric(as.POSIXct('1960-01-01')))
[1] 874022400000
