Split Date and time variable with sparklyr - r

I'm trying to work with a date-time variable (dttm) in a Spark data frame, using sparklyr and dplyr. Here is my issue.
Each row of the column in question is in this format:
2018-06-11 22:06:45
I want to split this date-time column (dttm) into two columns:
the first one with the date : 2018-06-11 (yyyy-mm-dd)
the second one with the time : 22:06:45 (hh:mm:ss)
First, I used regexp_replace() inside mutate() to create the time column:
spark_df %>% mutate(time = regexp_replace(date_and_time, "^[^_]* ", ""))
Here is what I obtain in my new column "time":
00:06:45
So the code nearly works; the only issue is that the first two digits are converted to 00.

This may not solve your problem completely, but it could be a good starting point.
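library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
dates <- data.frame(date =
  c("2018-06-11 22:06:45", "2018-06-11 22:07:45", "2019-06-11 22:06:45"))
tbl <- copy_to(sc, dates)
# hour(), minute() and second() are not R functions here: sparklyr passes them to Spark SQL
tbl %>% mutate(new_date = as.POSIXct(date)) %>%
  mutate(day = as.Date(new_date),
         time = paste0(hour(new_date), ":", minute(new_date), ":",
                       second(new_date)))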
# date new_date day time
# <chr> <dttm> <date> <chr>
# 1 2018-06-11 22:06:45 2018-06-11 12:06:45 2018-06-11 22:6:45
# 2 2018-06-11 22:07:45 2018-06-11 12:07:45 2018-06-11 22:7:45
# 3 2019-06-11 22:06:45 2019-06-11 12:06:45 2019-06-11 22:6:45
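In the output above the `time` column reads `22:6:45` instead of `22:06:45`: `hour()`, `minute()` and `second()` return numbers, and concatenating them drops the leading zero. A minimal local sketch of the fix, in plain R on a small hypothetical vector (no Spark connection), pads each component with `sprintf("%02d", ...)`. Inside Spark itself I would expect the Spark SQL function `date_format(new_date, "HH:mm:ss")`, which sparklyr passes through untranslated, to give the same padded string, but that part is an untested assumption.

```r
# Hypothetical local vector standing in for collected data (no Spark needed)
x <- as.POSIXct(c("2018-06-11 22:06:45", "2019-06-11 22:06:45"), tz = "UTC")
lt <- as.POSIXlt(x)  # exposes the $hour, $min and $sec components

# Naive concatenation: a single-digit minute loses its leading zero
naive <- paste0(lt$hour, ":", lt$min, ":", lt$sec)

# %02d pads each component to two digits
padded <- sprintf("%02d:%02d:%02d", lt$hour, lt$min, as.integer(lt$sec))

# Date part on its own
day <- as.Date(x)

naive
#> [1] "22:6:45" "22:6:45"
padded
#> [1] "22:06:45" "22:06:45"
```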

Related

changing date/time variable to time that starts at 00:00:00 in r

I'm looking for a simple and correct way to change the date/time (POSIXct) format into a time that starts at 00:00:00.
I couldn't find an answer to this in R language, but if I overlooked one, please tell me :)
So I have this:
date/time            v1
2022-02-16 15:07:15  38937
2022-02-16 15:07:17  39350
And I would like this:
time      v1
00:00:00  38937
00:00:02  39350
Can somebody help me with this?
Thanks :)
You can calculate the difference between the two datetimes in seconds, add it to an arbitrary date at "00:00:00", and then format the result so that it only shows the time. See the time column in the reprex below:
library(dplyr)
library(lubridate)
df %>%
  mutate(
    date = lubridate::ymd_hms(date),
    seconds = as.numeric(date - first(date)),
    time = format(
      lubridate::ymd_hms("2022-01-01 00:00:00") + seconds,
      format = "%H:%M:%S"
    )
  )
#> # A tibble: 2 × 4
#> date v1 seconds time
#> <dttm> <dbl> <dbl> <chr>
#> 1 2022-02-16 15:07:15 38937 0 00:00:00
#> 2 2022-02-16 15:07:17 39350 2 00:00:02
Created on 2022-03-30 by the reprex package (v2.0.1)
Note that this will be misleading if you ever have over 24 hours between two datetimes. In these cases you should probably include the date.
Data
df <- tibble::tribble(
~date, ~v1,
"2022-02-16 15:07:15", 38937,
"2022-02-16 15:07:17", 39350
)
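As noted above, formatting a clock time wraps around after 24 hours. A base R sketch (my own variant, not from either answer) computes the elapsed seconds with `difftime(..., units = "secs")` and formats them with integer division, so the hours keep counting past 24. `fmt_elapsed()` is a hypothetical helper name, `tz = "UTC"` is an assumption to sidestep daylight-saving shifts, and the third timestamp (2022-02-18 15:07:17) is there only to show a gap longer than a day.

```r
# Timestamps from the example data; tz = "UTC" is an assumption to avoid DST shifts
stamps <- as.POSIXct(c("2022-02-16 15:07:15", "2022-02-16 15:07:17",
                       "2022-02-18 15:07:17"), tz = "UTC")

# Elapsed seconds since the first timestamp, with the unit stated explicitly
secs <- as.numeric(difftime(stamps, stamps[1], units = "secs"))

# Clock-face formatting wraps: 48 h 2 s after midnight prints as "00:00:02"
wrapped <- format(as.POSIXct("2022-01-01 00:00:00", tz = "UTC") + secs[3],
                  format = "%H:%M:%S")

# Hypothetical helper: HH:MM:SS where the hours keep counting past 24
fmt_elapsed <- function(s) {
  s <- as.integer(round(s))
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

fmt_elapsed(secs)
#> [1] "00:00:00" "00:00:02" "48:00:02"
```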
You can subtract the first date/time from every date/time and convert the result to a time type with the hms() function from the hms package.
library(dplyr)
library(hms)
df %>%
mutate(`date/time` = hms::hms(as.numeric(as.POSIXct(`date/time`) - as.POSIXct(first(`date/time`)))))
date/time v1
1 00:00:00 38937
2 00:00:02 39350
Note that in this method, even if the time difference is greater than 1 day, it'll be reflected in the result, for example:
df <- data.frame(
  `date/time` = c("2022-02-16 15:07:15", "2022-02-18 15:07:17"),
  v1 = c(38937, 39350),
  check.names = FALSE
)
df %>%
mutate(`date/time` = hms::hms(as.numeric(as.POSIXct(`date/time`) - as.POSIXct(first(`date/time`)))))
date/time v1
1 00:00:00 38937
2 48:00:02 39350

How to convert a "char" column to datetime column in large datasets

I am working with large datasets in which one column is stored as the character data type instead of a date-time type. I am trying to convert it, but without success.
Could you please suggest a solution to this problem? It would be very helpful.
Thanks in advance.
Code I am using right now:
c_data$dt_1 <- lubridate::parse_date_time(c_data$started_at,"ymd HMS")
Output I get:
2027-05-20 20:10:03
but the desired output is
2020-05-20 10:03
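Here is another way using lubridate. Because the strings are day/month/year followed by hour:minute (no seconds), the matching parser is dmy_hm():
library(dplyr)
library(lubridate)
df <- tibble(start_at = c("27/05/2020 10:03", "25/05/2020 10:47"))
df %>%
  mutate(start_at = dmy_hm(start_at))
# A tibble: 2 x 1
  start_at
  <dttm>
1 2020-05-27 10:03:00
2 2020-05-25 10:47:00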
In R, date-time values are always printed in one standard format (year-month-day hour:minute:second). You can change how a value is displayed, but the result is then of type character.
If you want to show the data as year-month-day followed by minute:second, you can use format():
format(Sys.time(), '%Y-%m-%d %M:%S')
#[1] "2021-08-27 17:54"
To apply this to the entire column:
c_data$dt_2 <- format(c_data$dt_1, '%Y-%m-%d %M:%S')
See ?strptime for the available formatting options.
Using anytime:
library(dplyr)
library(anytime)
# register the day/month/year hour:minute layout so anytime() tries it
addFormats("%d/%m/%Y %H:%M")
df %>%
  mutate(start_at = anytime(start_at))
Output:
# A tibble: 2 x 1
start_at
<dttm>
1 2020-05-27 10:03:00
2 2020-05-25 10:47:00
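If the strings are day/month/year hour:minute, as in the lubridate and anytime examples, then "10:03" is hours and minutes, so the display format would be `%H:%M` rather than `%M:%S`. A base R sketch that parses with an explicit format and only formats for display; `tz = "UTC"` is an assumption, so substitute your data's time zone.

```r
# Sample strings in the dd/mm/yyyy HH:MM layout
started_at <- c("27/05/2020 10:03", "25/05/2020 10:47")

# Parse with an explicit format; tz = "UTC" is an assumption
dt <- as.POSIXct(started_at, format = "%d/%m/%Y %H:%M", tz = "UTC")

# dt stays a date-time (POSIXct); format() only produces display text
shown <- format(dt, "%Y-%m-%d %H:%M")
shown
#> [1] "2020-05-27 10:03" "2020-05-25 10:47"
```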

Automatically convert formats of date-time data to date only in r

I have a dataframe that has been put together by binding the data together after reading in multiple .csv files. The data comprises 6 variables and approx. 560,000 observations.
One variable 'date.time' unfortunately is currently in two formats dd/mm/yyyy hh:mm:ss and dd/mm/yy hh:mm. What I would like to do is mutate() the variable to a date only format.
I have tried df %>% mutate(date = as.Date(dmy_hms(date.time))), but, as you would expect, I get a "failed to parse" error, because the column contains two date/time formats.
Another way I have tried is df %>% mutate(date = anydate(date.time)) using anydate() from the anytime package, but this is far too slow and the CPU environment I'm working in uses all available memory given the size of the dataframe.
I'm hoping there is a swift and easy way of addressing this.
Thanks.
How about this:
library(tidyverse)
library(lubridate)
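df %>%
  # first pass: dd/mm/yyyy hh:mm:ss; other strings become NA (quiet = TRUE suppresses the message)
  mutate(time_temp = dmy_hms(time, quiet = TRUE)) %>%
  # where the first pass failed, use the dd/mm/yy hh:mm parse instead
  mutate(time = if_else(is.na(time_temp),
                        dmy_hm(time, quiet = TRUE),
                        time_temp)) %>%
  select(-time_temp)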
#> # A tibble: 4 x 1
#> time
#> <dttm>
#> 1 2020-01-01 00:00:01
#> 2 2020-01-02 00:01:01
#> 3 2020-01-04 00:02:00
#> 4 2020-01-03 01:02:00
Data used in the reprex:
df <- tibble(
time = c("01/01/2020 00:00:01", "02/01/2020 00:01:01", "04/01/20 00:02", "03/01/20 01:02")
)
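A base R sketch of the same two-pass idea that goes straight to a Date: parse every string with the four-digit-year format, then re-parse only the failures with the two-digit-year format. Each pass is one vectorized call with an explicit format, so no format guessing is involved, though I have not benchmarked it on 560,000 rows. It relies on strptime returning NA when the seconds are missing; `tz = "UTC"` and the variable names are assumptions for illustration.

```r
# Reprex strings mixing dd/mm/yyyy hh:mm:ss and dd/mm/yy hh:mm
x <- c("01/01/2020 00:00:01", "02/01/2020 00:01:01", "04/01/20 00:02", "03/01/20 01:02")

# First pass: the long format; strings without seconds come back NA
parsed <- as.POSIXct(x, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Second pass: re-parse only the failures with the two-digit-year format
failed <- is.na(parsed)
parsed[failed] <- as.POSIXct(x[failed], format = "%d/%m/%y %H:%M", tz = "UTC")

# Keep only the calendar date
dates <- as.Date(parsed, tz = "UTC")
dates
#> [1] "2020-01-01" "2020-01-02" "2020-01-04" "2020-01-03"
```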

Mutate and format multiple date columns [duplicate]

This question already has an answer here:
Convert multiple character columns to as.Date and time in R
(1 answer)
Closed 2 years ago.
I have a tibble containing some date columns formatted as strings:
library(tidyverse)
df<-tibble(dates1 = c("2020-08-03T00:00:00.000Z", "2020-08-03T00:00:00.000Z"),
dates2 = c("2020-08-05T00:00:00.000Z", "2020-08-05T00:00:00.000Z"))
I want to convert the strings from YMD-HMS to DMY-HMS. Can someone explain to me why this doesn't work:
df %>%
mutate_at(vars(starts_with("dates")), as.Date, format="%d/%m/%Y %H:%M:%S")
Whereas this does?
df %>% mutate(dates1 = format(as.Date(dates1), "%d/%m/%Y %H:%M:%S")) %>%
mutate(dates2 = format(as.Date(dates2), "%d/%m/%Y %H:%M:%S"))
Finally, is it possible to assign these columns as 'datetime' columns (e.g. dttm) rather than chr once the date formatting has taken place?
The format argument you are passing goes to as.Date(), where it describes how to parse the input, whereas you really want to pass it to the format() function. You can use an anonymous function for that, or the formula (~) syntax.
library(dplyr)
df %>%
mutate(across(starts_with("dates"), ~format(as.Date(.), "%d/%m/%Y %H:%M:%S")))
# A tibble: 2 x 2
# dates1 dates2
# <chr> <chr>
#1 03/08/2020 00:00:00 05/08/2020 00:00:00
#2 03/08/2020 00:00:00 05/08/2020 00:00:00
R represents dates and date-times in one standard way (Y-M-D H:M:S). You can change the representation with format(), but then the output is character, as above. To keep the columns as date-times (dttm), parse them instead:
df %>%
mutate(across(starts_with("dates"), lubridate::ymd_hms))
# dates1 dates2
# <dttm> <dttm>
#1 2020-08-03 00:00:00 2020-08-05 00:00:00
#2 2020-08-03 00:00:00 2020-08-05 00:00:00
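On the last part of the question, whether the columns can stay date-times: yes, as long as formatting is applied only for display. A base R sketch of the same idea as the ymd_hms() call: parse each column once into POSIXct, then call format() when you need dd/mm/yyyy text. It assumes the trailing "Z" means UTC; the ".000Z" tail is not matched by `%S` and is simply ignored, because strptime skips trailing characters.

```r
# Reprex strings: ISO 8601 with a "T" separator and a trailing "Z" (UTC)
df <- data.frame(
  dates1 = c("2020-08-03T00:00:00.000Z", "2020-08-03T00:00:00.000Z"),
  dates2 = c("2020-08-05T00:00:00.000Z", "2020-08-05T00:00:00.000Z"),
  stringsAsFactors = FALSE
)

# Parse every column into POSIXct; the unmatched ".000Z" tail is ignored
df[] <- lapply(df, as.POSIXct, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

# The columns stay date-times; dd/mm/yyyy is applied only when printing
shown <- format(df$dates1, "%d/%m/%Y %H:%M:%S")
shown
#> [1] "03/08/2020 00:00:00" "03/08/2020 00:00:00"
```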

R Dplyr and string values, how to split and get the second element? vapply/sapply

I've been having difficulty with one data frame manipulation in R.
I have two columns for well height and a date-time string ("yyyy-mm-dd HH:MM:ss").
I would like to extract all the rows from this table that occur at midnight (00:00:00).
I could manipulate this table in seconds with python, but I want to figure it out in R using strsplit() instead of POSIXct.
How do I mutate the table so that I split the date-time string and extract just the time value into a new column?
I think the answer involves vapply, but I have been poring over manuals for the last couple of weeks and still can't figure it out.
Welcome to SO. It can be done in multiple ways. Try this:
## some data
df <- data.frame(height=c(11,12),time = c("1999-9-9 00:00:00","1999-9-9 00:00:02"),stringsAsFactors = FALSE)
df
#> height time
#> 1 11 1999-9-9 00:00:00
#> 2 12 1999-9-9 00:00:02
## In base R
df2<- df
df2$hms <- do.call(rbind,strsplit(df2$time," "))[,2]
df2[df2$hms=="00:00:00",]
#> height time hms
#> 1 11 1999-9-9 00:00:00 00:00:00
## In tidyverse
library(dplyr)
df3 <- df %>%
mutate(hms = gsub(".*(..:..:..).*","\\1",time)) %>%
filter(hms == "00:00:00")
df3
#> height time hms
#> 1 11 1999-9-9 00:00:00 00:00:00
Created on 2018-10-04 by the reprex package (v0.2.1)
You don't provide an example, so here is my guess:
Let's say you have a character vector (could be a column):
dateTimes <- c("1999-01-01 11:11:11", "1999-01-01 12:12:12", "1999-01-01 13:13:13")
Extract the times at the end of each string:
ans <- sub(".*-\\d+\\s", "", dateTimes, perl = TRUE)
#[1] "11:11:11" "12:12:12" "13:13:13"
Save them into a new variable or column, for example:
df1$time <- ans
To extract the rows that occur at 00:00:00, use a string comparison and subset your data:
df1[ans == "00:00:00",]
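Since the question mentions vapply: strsplit() returns a list with one character vector per string, and vapply() can pull the second piece out of each one while guaranteeing a character result. A sketch with hypothetical data in the asker's layout (well height plus a "yyyy-mm-dd HH:MM:SS" string):

```r
# Hypothetical data: well height plus a date-time string
df <- data.frame(height = c(11, 12, 13),
                 time = c("1999-09-09 00:00:00", "1999-09-09 00:00:02",
                          "1999-09-10 00:00:00"),
                 stringsAsFactors = FALSE)

# strsplit() gives a list; vapply() takes element 2 (the time) of each piece
df$hms <- vapply(strsplit(df$time, " ", fixed = TRUE), `[`, character(1), 2)

# Keep only the rows recorded exactly at midnight
midnight <- df[df$hms == "00:00:00", ]
midnight$height
#> [1] 11 13
```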
