How to convert a "char" column to datetime column in large datasets - r

I am working with large datasets in which one column is stored as character instead of a date-time type. I am trying to convert it but have not been able to.
Could you please offer any suggestions for this problem? It would be very helpful for me.
Thanks in advance.
This is the code I am using right now:
c_data$dt_1 <- lubridate::parse_date_time(c_data$started_at,"ymd HMS")
The output I am getting is:
2027-05-20 20:10:03
but the desired output is:
2020-05-20 10:03

Here is another way using lubridate. The input has no seconds, so use dmy_hm() rather than dmy_hms():
library(dplyr)
library(lubridate)
df <- tibble(start_at = c("27/05/2020 10:03", "25/05/2020 10:47"))
df %>%
  mutate(start_at = dmy_hm(start_at))
# A tibble: 2 x 1
  start_at
  <dttm>
1 2020-05-27 10:03:00
2 2020-05-25 10:47:00

In R, date-time objects always print in one standard format. You can change the format to the one you need, but the result will then be of type character.
If you want to show the data as year-month-day hour:minute, you can use format as -
format(Sys.time(), '%Y-%m-%d %H:%M')
#[1] "2021-08-27 17:54"
For the entire column you can apply this as -
c_data$dt_2 <- format(c_data$dt_1, '%Y-%m-%d %H:%M')
Read ?strptime for different formatting options.
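To make the difference between parsing and formatting concrete, here is a minimal base-R sketch. It assumes the column holds strings such as "27/05/2020 10:03"; the vector started_at below is made-up sample data, not the asker's. Supplying the format explicitly means R does not have to guess it, which may also help on large columns, although that is not measured here.

```r
# Hypothetical sample values in day/month/year hour:minute form
started_at <- c("27/05/2020 10:03", "25/05/2020 10:47")

# Parse with an explicit input format; tz is fixed so the result does
# not depend on the machine's local time zone
dt <- as.POSIXct(started_at, format = "%d/%m/%Y %H:%M", tz = "UTC")
class(dt)     # a date-time class (POSIXct)

# format() only changes how the values are shown and returns character
shown <- format(dt, "%Y-%m-%d %H:%M")
class(shown)  # character
shown
```

Keep dt for any calculation or sorting, and use the formatted character version only for display or export.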

Using anytime
library(dplyr)
library(anytime)
addFormats("%d/%m/%Y %H:%M")
df %>%
  mutate(start_at = anytime(start_at))
Output:
# A tibble: 2 x 1
start_at
<dttm>
1 2020-05-27 10:03:00
2 2020-05-25 10:47:00

Related

Transform number chain into date

I am trying to transform a list of numbers (e.g. 20200119) into a valid date (here: 2020-01-19)
This is my trial data:
df <- data.frame(c(20200119, 20180718, 20180729, 20150502, 20010301))
colnames(df)[1] = "Dates"
And this is what I tried so far:
df <- as_date(df)
df <- as.Date.numeric(df)
df <- as.Date.factor(df)
None of them works, unfortunately.
I also tried to separate the numbers, but I couldn't manage that either.
Can somebody help me?
Convert the column to character, then convert that to a Date with the format %Y%m%d:
as.Date(as.character(df$Dates), "%Y%m%d")
#[1] "2020-01-19" "2018-07-18" "2018-07-29" "2015-05-02" "2001-03-01"
Another option is strptime with the matching format (note that strptime returns POSIXlt rather than Date):
df <- data.frame(c(20200119, 20180718, 20180729, 20150502, 20010301))
colnames(df)[1] = "Dates"
df$Dates2 <- strptime(df$Dates, format = "%Y%m%d")
df
#> Dates Dates2
#> 1 20200119 2020-01-19
#> 2 20180718 2018-07-18
#> 3 20180729 2018-07-29
#> 4 20150502 2015-05-02
#> 5 20010301 2001-03-01
Created on 2023-01-12 with reprex v2.0.2
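As a rough illustration of why the numeric attempts fail while the character route works, the sketch below uses only base R. The variable names dates_num, dates and bad are introduced here for illustration, and the invalid value "20201340" is a made-up example.

```r
dates_num <- c(20200119, 20180718, 20180729, 20150502, 20010301)

# A number passed to as.Date() is a count of days from the origin,
# not a string of year/month/day digits
as.Date(19, origin = "1970-01-01")  # 19 days after the origin

# Going through character lets the digits be read with %Y%m%d
dates <- as.Date(as.character(dates_num), format = "%Y%m%d")

# A value that is not a real calendar date (month 13) becomes NA
bad <- as.Date("20201340", format = "%Y%m%d")
```

Checking for NA after the conversion is a cheap way to spot malformed entries in a longer column.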

Automatically convert formats of date-time data to date only in r

I have a dataframe that has been put together by binding the data together after reading in multiple .csv files. The data comprises 6 variables and approx. 560,000 observations.
One variable 'date.time' unfortunately is currently in two formats dd/mm/yyyy hh:mm:ss and dd/mm/yy hh:mm. What I would like to do is mutate() the variable to a date only format.
I have tried df %>% mutate(date = as.Date(dmy_hms(date.time))) but I get a "failed to parse" error, as you would expect given that there are two date/time formats in the same column.
Another way I have tried is df %>% mutate(date = anydate(date.time)) using anydate() from the anytime package, but this is far too slow, and it uses all the available memory in the environment I'm working in, given the size of the dataframe.
I'm hoping there is a swift and easy way of addressing this.
Thanks.
How about this:
``` r
library(tidyverse)
library(lubridate)

# Reprex data: both date-time formats in one column
df <- tibble(
  time = c("01/01/2020 00:00:01", "02/01/2020 00:01:01", "04/01/20 00:02", "03/01/20 01:02")
)

# Try dmy_hms() first, then fall back to dmy_hm() where that returned NA
df %>%
  mutate(time_temp = dmy_hms(time, quiet = TRUE)) %>%
  mutate(time = if_else(is.na(time_temp),
                        dmy_hm(time, quiet = TRUE),
                        time_temp)) %>%
  select(-time_temp)
#> # A tibble: 4 x 1
#>   time
#>   <dttm>
#> 1 2020-01-01 00:00:01
#> 2 2020-01-02 00:01:01
#> 3 2020-01-04 00:02:00
#> 4 2020-01-03 01:02:00
```
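Since the question asks for a date-only result, a base-R alternative is sketched below. It is not the approach from the answer above: it assumes every value starts with either dd/mm/yyyy or dd/mm/yy followed by a space, and it picks the format from the length of the date part. The names x, date_part, four_digit and out are made up for this example.

```r
x <- c("01/01/2020 00:00:01", "02/01/2020 00:01:01",
       "04/01/20 00:02", "03/01/20 01:02")

# Keep only the date part (everything before the first space)
date_part <- sub(" .*$", "", x)

# dd/mm/yyyy has 10 characters, dd/mm/yy has 8
four_digit <- nchar(date_part) == 10

# Start with an all-NA Date vector and fill each group with its own format
out <- rep(as.Date(NA), length(x))
out[four_digit]  <- as.Date(date_part[four_digit],  format = "%d/%m/%Y")
out[!four_digit] <- as.Date(date_part[!four_digit], format = "%d/%m/%y")
out
```

Choosing the format per row matters here: %Y applied to "20" would be read as the year 20, and %y applied to "2020" would only read the first two digits. Note also that %y maps 00 to 68 onto 20xx and 69 to 99 onto 19xx.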

Mutate and format multiple date columns [duplicate]

This question already has an answer here:
Convert multiple character columns to as.Date and time in R
(1 answer)
Closed 2 years ago.
I have a tibble containing some date columns formatted as strings:
library(tidyverse)
df<-tibble(dates1 = c("2020-08-03T00:00:00.000Z", "2020-08-03T00:00:00.000Z"),
dates2 = c("2020-08-05T00:00:00.000Z", "2020-08-05T00:00:00.000Z"))
I want to convert the strings from YMD-HMS to DMY-HMS. Can someone explain to me why this doesn't work:
df %>%
mutate_at(vars(starts_with("dates")), as.Date, format="%d/%m/%Y %H:%M:%S")
Whereas this does?
df %>% mutate(dates1 = format(as.Date(dates1), "%d/%m/%Y %H:%M:%S")) %>%
mutate(dates2 = format(as.Date(dates2), "%d/%m/%Y %H:%M:%S"))
Finally, is it possible to assign these columns as 'datetime' columns (e.g. dttm) rather than chr once the date formatting has taken place?
The format argument you are passing goes to as.Date, where it is treated as the format of the input strings, whereas what you really want is to pass it to the format function. You can use an anonymous function for that or use formula syntax.
library(dplyr)
df %>%
mutate(across(starts_with("dates"), ~format(as.Date(.), "%d/%m/%Y %H:%M:%S")))
# A tibble: 2 x 2
# dates1 dates2
# <chr> <chr>
#1 03/08/2020 00:00:00 05/08/2020 00:00:00
#2 03/08/2020 00:00:00 05/08/2020 00:00:00
R prints date and date-time values in a standard representation, Y-M-D H:M:S. You can change the representation using format, but the output is then character, as above. To get actual date-time (dttm) columns, parse the strings instead:
df %>%
mutate(across(starts_with("dates"), lubridate::ymd_hms))
# dates1 dates2
# <dttm> <dttm>
#1 2020-08-03 00:00:00 2020-08-05 00:00:00
#2 2020-08-03 00:00:00 2020-08-05 00:00:00
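The same point can be checked in base R. The sketch below is only an illustration under the assumption that the strings look like the ones in the question; the names x, wrong, parsed and shown are introduced here.

```r
x <- c("2020-08-03T00:00:00.000Z", "2020-08-05T00:00:00.000Z")

# In as.Date(), format= describes the INPUT. It does not match these
# strings, so every value becomes NA
wrong <- as.Date(x, format = "%d/%m/%Y %H:%M:%S")

# Parse with a format that matches the input; the trailing "Z" is
# ignored, so the time zone is set explicitly
parsed <- as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

# Use format() only for display; the result is character
shown <- format(parsed, "%d/%m/%Y %H:%M:%S")
shown
```

So a column can be either a real date-time (printed in the standard layout) or a character column in day/month/year layout, but not both at once.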

R lubridate as_date does not convert datetime to date [duplicate]

This question already has an answer here:
Convert factor to date class for multiple columns
(1 answer)
Closed 2 years ago.
I read in an array from Excel using read_excel, and get two datetime columns, but what I need is two columns of dates
User DOB Answer_dt Question Answer
<chr> <dttm> <dttm> <int> <int>
1 User1 1900-01-01 00:00:00 2017-01-26 00:00:00 1 7
2 User2 1900-01-01 00:00:00 2017-01-26 00:00:00 2 8
I would like the datetime columns to be converted to dates (the times are irrelevant), and have tried using mutate and lubridate in various combinations, but have succeeded only in getting an error message that I don't understand:
> library(lubridate)
> dt <- eML_daily[1, "DOB"]
> dt
# A tibble: 1 x 1
DOB
<dttm>
1 1900-01-01 00:00:00
Warning message:
`...` is not empty.
These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
> as_date(dt)
Error in as.Date.default(x, ...) :
do not know how to convert 'x' to class “Date”
> as_date(df[,"DOB"])
Error in as.Date.default(x, ...) :
do not know how to convert 'x' to class “Date”
I don't understand the warning messages, and can't quite see what I am doing wrong. Surely it should be a simple matter to convert from dttm to date and discard the time, which I don't need.
I'd be very appreciative for a pointer.
Sincerely and with many thanks in advance
Thomas Philips
In as_date(dt) you are attempting to convert a whole tibble, not a column, to a date. That unsurprisingly fails. In as_date(df[,"DOB"]), I can't say what you are trying to do, as you haven't given us df.
Working example:
library(tidyverse)
library(lubridate)
dt <- tibble(x=as_datetime("2017-01-26 00:00:00"))
dt
# A tibble: 1 x 1
x
<dttm>
1 2017-01-26 00:00:00
dt %>% mutate(x=as_date(x))
# A tibble: 1 x 1
x
<date>
1 2017-01-26
You can use as.Date to convert date-time columns to dates.
If you want to change columns 2 and 3 to dates, you can do:
eML_daily[2:3] <- lapply(eML_daily[2:3], as.Date)
Or with dplyr :
library(dplyr)
eML_daily %>% mutate(across(2:3, as.Date))
#For dplyr < 1.0.0
#eML_daily %>% mutate_at(2:3, as.Date)
Have you tried to convert it to character first?
Here's a quick sample:
x <- tibble(dt = c(Sys.time(),Sys.time() - 345767)) %>%
mutate(dt = as_date(as.character(dt)))
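The table-versus-column distinction behind the error can be reproduced without any packages. This is a small sketch using a plain data frame as a stand-in for the tibble from the question; the object names dt, whole_table and one_column are made up.

```r
# One-row data frame with a date-time column, similar to eML_daily[1, "DOB"]
dt <- data.frame(DOB = as.POSIXct("1900-01-01 00:00:00", tz = "UTC"))

# Passing the whole table fails: as.Date() has no method for data frames
whole_table <- tryCatch(as.Date(dt), error = function(e) conditionMessage(e))
whole_table   # the error message, returned as a string

# Passing the column itself (a POSIXct vector) works
one_column <- as.Date(dt$DOB)
one_column
```

With a tibble, `[` with a single column name still returns a tibble, so use `$` or `[[` to get the vector before converting it.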

R aggregate a dataframe by hours from a date with time field

I'm relatively new to R but I am very familiar with Excel and T-SQL.
I have a simple dataset that has a date with time and a numeric value associated with it. What I'd like to do is summarize the numeric values by hour of the day. I've found a couple of resources for working with time types in R, but I was hoping to find a solution similar to what Excel offers (where I can call a function, pass in my date/time data, and have it return the hour of the day).
Any suggestions would be appreciated - thanks!
library(readr)
library(dplyr)
library(lubridate)
df <- read_delim('DateTime|Value
3/14/2015 12:00:00|23
3/14/2015 13:00:00|24
3/15/2015 12:00:00|22
3/15/2015 13:00:00|40',"|")
df %>%
mutate(hour_of_day = hour(as.POSIXct(strptime(DateTime, "%m/%d/%Y %H:%M:%S")))) %>%
group_by(hour_of_day) %>%
summarise(meanValue = mean(Value))
Breakdown:
Convert the DateTime column (character) into a date-time, then use hour() from lubridate to pull out just the hour value and put it into a new column named hour_of_day.
> df %>%
mutate(hour_of_day = hour(as.POSIXct(strptime(DateTime, "%m/%d/%Y %H:%M:%S"))))
Source: local data frame [4 x 3]
DateTime Value hour_of_day
1 3/14/2015 12:00:00 23 12
2 3/14/2015 13:00:00 24 13
3 3/15/2015 12:00:00 22 12
4 3/15/2015 13:00:00 40 13
The group_by(hour_of_day) call sets the groups over which mean(Value) is computed in the summarise(...) call.
This gives the result:
hour_of_day meanValue
1 12 22.5
2 13 32.0
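For comparison, the same hourly summary can be sketched in base R with no extra packages. This is an alternative to the dplyr pipeline above, not part of the original answer; the data frame is retyped from the example and the names parsed and hourly are introduced here.

```r
df <- data.frame(
  DateTime = c("3/14/2015 12:00:00", "3/14/2015 13:00:00",
               "3/15/2015 12:00:00", "3/15/2015 13:00:00"),
  Value = c(23, 24, 22, 40)
)

# Parse the month/day/year strings; tz is fixed so the hour does not
# shift with the local time zone
parsed <- as.POSIXct(df$DateTime, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Extract the hour of day as an integer
df$hour_of_day <- as.integer(format(parsed, "%H"))

# Mean of Value within each hour
hourly <- aggregate(Value ~ hour_of_day, data = df, FUN = mean)
hourly
```

The format(parsed, "%H") step plays the role of Excel's HOUR() function: it takes a date-time and returns the hour of the day.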
