Separating date and time from timestamp - r

I have looked at the options in previous answers, but none has given me the correct output.
I would like to split a timestamp into separate date and time columns in R:
sorted_transactions_table$TRANSACTION_DATE <- as.Date(sorted_transactions_table$TRANSACTION_TIME)
I have tried this, but I get an error:
Error in charToDate(x) : character string is not in a standard unambiguous format
The timestamps in my dataset are in the format:
01-OCT-18 12.01.23.000000 AM

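Convert it to a standard date-time class first, then use format. The timestamp uses a 12-hour clock with an AM/PM marker, so the hour has to be parsed with %I; %p only takes effect together with %I, not with %H. Setting tz explicitly keeps as.Date from shifting the day across time zones.
df$TRANSACTION_DATE <- as.POSIXct(df$TRANSACTION_DATE,
                                  format = "%d-%b-%y %I.%M.%OS %p", tz = "UTC")
transform(df, Date = as.Date(TRANSACTION_DATE),
          # Date = format(TRANSACTION_DATE, "%Y-%m-%d") would also work (as character)
          time = format(TRANSACTION_DATE, "%T"))
#   col1    TRANSACTION_DATE       Date     time
# 1    1 2018-10-01 00:01:23 2018-10-01 00:01:23
# 2    2 2018-10-01 00:02:23 2018-10-01 00:02:23
# 3    3 2018-10-01 00:03:23 2018-10-01 00:03:23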
You could also do this in a dplyr chain (note %I rather than %H for the 12-hour clock):
library(dplyr)
df %>%
  mutate(TRANSACTION_DATE = as.POSIXct(TRANSACTION_DATE,
                                       format = "%d-%b-%y %I.%M.%OS %p", tz = "UTC"),
         Date = as.Date(TRANSACTION_DATE),
         time = format(TRANSACTION_DATE, "%T"))
Read ?strptime for all formatting options.
data
Using a reproducible example
df <- data.frame(col1 = 1:3, TRANSACTION_DATE = c("01-OCT-18 12.01.23.000000 AM",
"01-OCT-18 12.02.23.000000 AM", "01-OCT-18 12.03.23.000000 AM"))
df
# col1 TRANSACTION_DATE
#1 1 01-OCT-18 12.01.23.000000 AM
#2 2 01-OCT-18 12.02.23.000000 AM
#3 3 01-OCT-18 12.03.23.000000 AM
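The difference between %I and %H matters for this data, because every sample timestamp falls in the 12 AM hour. The sketch below is only an illustration on one of the sample strings; it assumes an English or C time locale (set explicitly here) so that "OCT" and "AM" are recognised.

```r
# Assumption: an English/C time locale, so that "OCT" and "AM" are recognised.
invisible(Sys.setlocale("LC_TIME", "C"))

x <- "01-OCT-18 12.01.23.000000 AM"

# %I reads a 12-hour clock, so "12 ... AM" becomes hour 0.
with_I <- as.POSIXct(x, format = "%d-%b-%y %I.%M.%OS %p", tz = "UTC")

# %H reads a 24-hour clock; the AM/PM marker has no effect on it.
with_H <- as.POSIXct(x, format = "%d-%b-%y %H.%M.%OS %p", tz = "UTC")

format(with_I, "%T")  # "00:01:23"
format(with_H, "%T")  # "12:01:23"
```

If the timestamps are known to be in a particular zone, passing that zone as tz to both as.POSIXct and as.Date keeps the extracted date consistent with the clock time.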

I would use the lubridate package:
library(lubridate)
library(dplyr)
df %>%
mutate(TRANSACTION_DATE = dmy_hms(TRANSACTION_DATE),
Date = date(TRANSACTION_DATE),
time = format(TRANSACTION_DATE, "%T"))

Related

How to change Time format in R?

I have added a new column RIDE_LENGTH using the mutate function as follows.
df2 <- mutate(df2, RIDE_LENGTH = ENDED_AT - STARTED_AT)
ENDED_AT and STARTED_AT are in HH:MM:SS format, but my new column shows the result in seconds only,
for example: 12:05:00 - 12:03:00 = 120 secs.
I need the answer in the same format, i.e. 00:02:00.
If anyone can tell me how to do that, it would be a great help.
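You can use seconds_to_period() from lubridate. It expects a number of seconds, so convert the difference explicitly first:
library(lubridate)
RIDE_LENGTH <- seconds_to_period(as.numeric(RIDE_LENGTH, units = "secs"))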
There are a few ways in the lubridate package, depending on your desired output. Take your pick:
library(dplyr)
df <- data.frame(
  STARTED_AT = as.POSIXct("2022-06-06 12:03:00", tz = "UTC"),
  ENDED_AT = as.POSIXct("2022-06-06 12:05:00", tz = "UTC")
)
df |>
  mutate(
    RIDE_LENGTH_base = ENDED_AT - STARTED_AT,
    RIDE_LENGTH_lubridate_difftime = lubridate::as.difftime(ENDED_AT - STARTED_AT),
    RIDE_LENGTH_period = lubridate::as.period(ENDED_AT - STARTED_AT),
    RIDE_LENGTH_duration = lubridate::as.duration(ENDED_AT - STARTED_AT)
  )
# STARTED_AT ENDED_AT RIDE_LENGTH_base RIDE_LENGTH_lubridate_difftime RIDE_LENGTH_period RIDE_LENGTH_duration
# 1 2022-06-06 12:03:00 2022-06-06 12:05:00 2 mins 2 mins 2M 0S 120s (~2 minutes)
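None of the classes above prints as 00:02:00, which is what the question asks for. One way to get that exact layout is to build the string by hand in base R; this is a minimal sketch, and the variable names started_at, ended_at and ride_length are made up for the illustration. The result is a character vector, so it is for display rather than further arithmetic.

```r
started_at <- as.POSIXct("2022-06-06 12:03:00", tz = "UTC")
ended_at   <- as.POSIXct("2022-06-06 12:05:00", tz = "UTC")

# Force the unit to seconds so the number does not depend on the
# unit that difftime would pick automatically.
secs <- round(as.numeric(difftime(ended_at, started_at, units = "secs")))

# Split the total into hours, minutes and seconds, then zero-pad.
ride_length <- sprintf("%02d:%02d:%02d",
                       as.integer(secs %/% 3600),
                       as.integer((secs %% 3600) %/% 60),
                       as.integer(secs %% 60))

ride_length  # "00:02:00"
```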

R: readxl and date format

I read in an Excel file in which one column contains dates in two different formats: Excel serial numbers (e.g. 43596) and text (e.g. "01.01.2020").
To convert the Excel serial numbers one can use as.Date(as.numeric(df$date), origin = "1899-12-30");
to convert the text one can use as.Date(df$date, format = "%d.%m.%Y").
These work for individual values, but when I try ifelse:
df$date <- ifelse(length(df$date)==5,
as.Date(as.numeric(df$date), origin = "1899-12-30"),
as.Date(df$date, format = "%d.%m.%Y"))
or a for loop:
for (i in length(x)) {
  if (nchar(x[i]) == 5) {
    y[i] <- as.Date(as.numeric(x[i]), origin = "1899-12-30")
  } else {
    x[i] <- as.Date(x[i], format = "%d.%m.%Y")
  }
}
print(x)
it does not work because of:
"character string is not in a standard unambiguous format"
Could you advise a better way to convert the different date formats into a single one?
I have two solutions.
1) Change the code, which I don't like because it depends on the xlsx date formats. The test has to be nchar(), which checks each value, rather than length(), which returns a single number for the whole column. Because ifelse() strips the Date class, the result is converted back with as.Date():
> df <- tibble(date = c("01.01.2020","43596"))
>
> df$date <- as.Date(ifelse(nchar(df$date)==5,
+ as.Date(as.numeric(df$date), origin = "1899-12-30"),
+ as.Date(df$date, format = "%d.%m.%Y")), origin = "1970-01-01")
Warning message:
In as.Date(as.numeric(df$date), origin = "1899-12-30") :
NAs introduced by coercion
>
> df$date
[1] "2020-01-01" "2019-05-11"
>
2) Save the document as CSV and read it with the read_csv() function from the readr package. That sidesteps the problem.
You could use sapply to apply ifelse to each value:
df$date <- as.Date(sapply(df$date,function(date) ifelse(nchar(date)==5,
as.Date(as.numeric(date), origin = "1899-12-30"),
as.Date(date, format = "%d.%m.%Y"))),
origin="1970-01-01")
df
# A tibble: 6 x 2
contract date
<dbl> <date>
1 231429 2019-05-11
2 231437 2020-01-07
3 231449 2021-01-01
4 231459 2020-03-03
5 231463 2020-10-27
6 231466 2011-03-17
A tidyverse solution using rowwise
library(dplyr)
library(lubridate)
df %>%
rowwise() %>%
mutate(date_new=as.Date(ifelse(grepl("\\.",date),
as.character(dmy(date)),
as.character(as.Date(as.numeric(date), origin="1899-12-30"))))) %>%
ungroup()
# A tibble: 6 × 3
contract date date_new
<dbl> <chr> <date>
1 231429 43596 2019-05-11
2 231437 07.01.2020 2020-01-07
3 231449 01.01.2021 2021-01-01
4 231459 03.03.2020 2020-03-03
5 231463 44131 2020-10-27
6 231466 40619 2011-03-17
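The answers above all work around the fact that ifelse() returns plain numbers and drops the Date class. An alternative that avoids ifelse() altogether is to fill a Date vector by logical index. This is a sketch in base R on made-up sample values taken from the tables above; the names x, is_serial and out are illustrative, and it assumes that any value consisting only of digits is an Excel serial number.

```r
x <- c("43596", "07.01.2020", "01.01.2021", "44131")

# Assumption: values made up only of digits are Excel serial numbers.
is_serial <- grepl("^[0-9]+$", x)

# Start from an all-NA Date vector so the class is kept throughout.
out <- rep(as.Date(NA), length(x))

# Convert each group separately; no coercion warnings, no class loss.
out[is_serial]  <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
out[!is_serial] <- as.Date(x[!is_serial], format = "%d.%m.%Y")

out
# "2019-05-11" "2020-01-07" "2021-01-01" "2020-10-27"
```

Values that match neither pattern stay NA, which makes unexpected formats easy to spot afterwards with is.na(out).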

Change the format of date into default format and add weekday column to the respective timestamps

I have a column called trip_start_timestamp with values like "01/23/2020 03:00:00 PM"; the column's data type is factor. I would like the values to read "2020/01/23 15:00:00" and to add the weekday for each value, e.g. "thursday".
You can use mdy_hms from lubridate to get data into POSIXct and use weekdays to get day of the week.
library(dplyr)
library(lubridate)
df %>%
mutate(trip_start_timestamp = mdy_hms(trip_start_timestamp),
weekday = weekdays(trip_start_timestamp))
# trip_start_timestamp weekday
#1 2020-01-23 15:00:00 Thursday
#2 2020-01-25 01:00:00 Saturday
In base R :
df$trip_start_timestamp <- as.POSIXct(df$trip_start_timestamp,
format = '%m/%d/%Y %I:%M:%S %p', tz = 'UTC')
df$weekday <- weekdays(df$trip_start_timestamp)
df
data
df <- data.frame(trip_start_timestamp = factor(c("01/23/2020 03:00:00 PM",
"01/25/2020 01:00:00 AM")))
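Both answers leave the column as POSIXct, which prints with dashes. If the slash-separated text and a lower-case weekday from the question are wanted literally, they can be produced with format() and tolower() as a last step. A minimal sketch, assuming an English or C time locale (set explicitly here) so that AM/PM parsing and weekday names come out in English; the names x and ts are illustrative:

```r
# Assumption: English/C time locale for AM/PM and weekday names.
invisible(Sys.setlocale("LC_TIME", "C"))

x <- factor(c("01/23/2020 03:00:00 PM", "01/25/2020 01:00:00 AM"))

# Parse the 12-hour timestamps; as.character() makes the factor explicit text.
ts <- as.POSIXct(as.character(x), format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Display strings in the layout asked for (these are character, not date-time).
format(ts, "%Y/%m/%d %H:%M:%S")  # "2020/01/23 15:00:00" "2020/01/25 01:00:00"
tolower(weekdays(ts))            # "thursday" "saturday"
```

Keeping the POSIXct column for calculations and adding the formatted strings as separate columns avoids having to re-parse later.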

Mutate and format multiple date columns [duplicate]

I have a tibble containing some date columns formatted as strings:
library(tidyverse)
df<-tibble(dates1 = c("2020-08-03T00:00:00.000Z", "2020-08-03T00:00:00.000Z"),
dates2 = c("2020-08-05T00:00:00.000Z", "2020-08-05T00:00:00.000Z"))
I want to convert the strings from YMD-HMS to DMY-HMS. Can someone explain to me why this doesn't work:
df %>%
mutate_at(vars(starts_with("dates")), as.Date, format="%d/%m/%Y %H:%M:%S")
Whereas this does?
df %>% mutate(dates1 = format(as.Date(dates1), "%d/%m/%Y %H:%M:%S")) %>%
mutate(dates2 = format(as.Date(dates2), "%d/%m/%Y %H:%M:%S"))
Finally, is it possible to assign these columns as 'datetime' columns (e.g. dttm) rather than chr once the date formatting has taken place?
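In the mutate_at call the format argument is passed to as.Date, where it describes how the input string is laid out; the strings do not match that layout, so the result is NA. What you want is to parse with as.Date first and then pass the format to the format function, which controls the output. You can use an anonymous function or the formula syntax for that.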
library(dplyr)
df %>%
mutate(across(starts_with("dates"), ~format(as.Date(.), "%d/%m/%Y %H:%M:%S")))
# A tibble: 2 x 2
# dates1 dates2
# <chr> <chr>
#1 03/08/2020 00:00:00 05/08/2020 00:00:00
#2 03/08/2020 00:00:00 05/08/2020 00:00:00
R always displays date and date-time objects in the standard Y-M-D H:M:S layout. You can change the representation with format, but the result is then character, as above. To get real date-time (dttm) columns, parse the strings and keep the standard layout:
df %>%
mutate(across(starts_with("dates"), lubridate::ymd_hms))
# dates1 dates2
# <dttm> <dttm>
#1 2020-08-03 00:00:00 2020-08-05 00:00:00
#2 2020-08-03 00:00:00 2020-08-05 00:00:00
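The distinction between the format used for parsing and the format used for printing can be seen on a single vector. This is a base R sketch on the sample strings from the question; the name parsed is illustrative, and it relies on strptime ignoring the trailing "Z" that the parsing format does not cover.

```r
x <- c("2020-08-03T00:00:00.000Z", "2020-08-05T00:00:00.000Z")

# Here format describes the INPUT, which does not look like d/m/Y, so this is NA.
as.Date(x, format = "%d/%m/%Y %H:%M:%S")

# Parse with a format that matches the input ...
parsed <- as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

# ... then choose the OUTPUT layout with format(); the result is character.
format(parsed, "%d/%m/%Y %H:%M:%S")
# "03/08/2020 00:00:00" "05/08/2020 00:00:00"
```

Parsing to POSIXct rather than Date also keeps the time of day, which as.Date would discard for timestamps that are not at midnight.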

R converting a factor YYYY-MM to a date

I have a dataframe with a date in the form YYYY-MM, class factor and I am trying to convert it to class date.
I tried:
Date <- c("2015-08","2015-09","2015-08")
Val <- c(1,2,3)
df <- data.frame(Date,Val)
df[,1] <- as.POSIXct(as.character(df[,1]), format = "%Y-%m") 
df
But this does not work. I would be grateful for your help.
1) Convert the dates to zoo's "yearmon" class and then to "Date" class:
> library(zoo)
> transform(df, Date = as.Date(as.yearmon(Date)))
Date Val
1 2015-08-01 1
2 2015-09-01 2
3 2015-08-01 3
The question did not specify which date to convert to so we used the first of the month. Had the last of the month been wanted we could have used this instead:
transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
2) Another possibility not using zoo is to just add the day of the month yourself and then convert to "Date" class.
> transform(df, Date = as.Date(paste(Date, 1, sep = "-")))
Date Val
1 2015-08-01 1
2 2015-09-01 2
3 2015-08-01 3
3) Alternatively, you might want to just use "yearmon" directly, since that models year and month with no day.
> library(zoo)
> transform(df, Date = as.yearmon(Date))
Date Val
1 Aug 2015 1
2 Sep 2015 2
3 Aug 2015 3
Note: Do not use "POSIXct" class as this gives a time zone dependent result that can cause subtle errors if you are not careful. A date in one time zone is not necessarily the same as in another time zone.
R cannot build a date from the format "%Y-%m" alone; a day is needed.
You can append one:
as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
which results in (the zone shown depends on your local time zone):
"2015-08-01 CEST" "2015-09-01 CEST" "2015-08-01 CEST"
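The same idea also works with the Date class instead of POSIXct, which avoids the time zone dependence mentioned in the note above. A minimal base R sketch on the question's values; the names ym and first_day are illustrative:

```r
ym <- factor(c("2015-08", "2015-09", "2015-08"))

# Without a day there is nothing to build a Date from, so this is NA.
as.Date(as.character(ym), format = "%Y-%m")

# Append the first of the month, then parse as a plain Date (no time zone).
first_day <- as.Date(paste0(as.character(ym), "-01"), format = "%Y-%m-%d")
first_day
# "2015-08-01" "2015-09-01" "2015-08-01"

# The original year-month text can be recovered for display.
format(first_day, "%Y-%m")
# "2015-08" "2015-09" "2015-08"
```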

Resources