Adding a new column and calculating ride length from start to finish - r

I'm trying to add a new column to my data set and calculate the ride_length from start to finish.
Example of what glimpse returns:
$ started_at <chr> "23/01/2021 16:14",
$ ended_at <chr> "23/01/2021 16:24",
My code:
data_trip_cleaned$ride_length <- difftime(data_trip_cleaned$started_at,data_trip_cleaned$ended_at,units = "mins")
Error:
Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format

Your error suggests difftime can't interpret the format of your date/time automatically. From ?difftime:
"Function difftime calculates a difference of two date/time objects
and returns an object of class difftime with an attribute indicating
the units."
Are your started_at and ended_at columns already date-time objects (class POSIXct)? Your glimpse output shows <chr>, so they are still character strings; look at ?as.POSIXct. Confirm this works as you expect:
as.POSIXct("23/01/2021 16:24", format = "%d/%m/%Y %H:%M")
# "2021-01-23 16:24:00 EST"
For each column:
data_trip_cleaned$started_at <- as.POSIXct(
data_trip_cleaned$started_at, format = "%d/%m/%Y %H:%M")
data_trip_cleaned$ended_at <- as.POSIXct(
data_trip_cleaned$ended_at, format = "%d/%m/%Y %H:%M")
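# or many columns at once (assign the result back so the conversion is kept)
datetimes <- c("started_at", "ended_at")
data_trip_cleaned[datetimes] <- lapply(data_trip_cleaned[datetimes], FUN = function(x) as.POSIXct(x, format = "%d/%m/%Y %H:%M"))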
# Then calculate the difference (end minus start, so ride lengths are positive)
data_trip_cleaned$ride_length <- data_trip_cleaned$ended_at - data_trip_cleaned$started_at
# Alternatively, with explicit units
data_trip_cleaned$ride_length <- difftime(data_trip_cleaned$ended_at, data_trip_cleaned$started_at, units = "mins")
# See ?difftime for the other options for units=
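Putting the steps above together, here is a minimal, self-contained sketch. The data frame `trips` and its two rows are made up for illustration (they only mirror the day/month/year format shown in the question), and `tz = "UTC"` is an assumption; use whatever time zone your data was recorded in.

```r
# Hypothetical sample data in the same "%d/%m/%Y %H:%M" layout as the question
trips <- data.frame(
  started_at = c("23/01/2021 16:14", "24/01/2021 09:05"),
  ended_at   = c("23/01/2021 16:24", "24/01/2021 09:50"),
  stringsAsFactors = FALSE
)

fmt <- "%d/%m/%Y %H:%M"

# Parse the character columns into POSIXct using the explicit format
trips$started_at <- as.POSIXct(trips$started_at, format = fmt, tz = "UTC")
trips$ended_at   <- as.POSIXct(trips$ended_at,   format = fmt, tz = "UTC")

# A format that does not match produces NA, so check before going further
stopifnot(!anyNA(trips$started_at), !anyNA(trips$ended_at))

# Ride length in minutes: end minus start
trips$ride_length <- difftime(trips$ended_at, trips$started_at, units = "mins")

as.numeric(trips$ride_length)
# [1] 10 45
```

The `anyNA()` check is worth keeping in real code, because a wrong format string does not raise an error; it quietly turns the values into NA.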

Related

Caused by error in `as.POSIXlt.character()`: ! character string is not in a standard unambiguous format. How can I change CHR to POSIXCT?

I have a huge data frame called 'cyclist_trip_data_all'
head(cyclist_trip_data_all)
The most important columns of dates have chr class and I need it to be converted to POSIXct for calculations.
combined_ctd <-
mutate(cyclist_trip_data_all, ride_length =
difftime(ended_at,started_at,units='mins'))
The above throws the "not in a standard unambiguous format" error, and I'm fairly sure that's because the two columns are of class chr.
cyclist_trip_data_all %>%
transmute(ended_at = as.POSIXct(ended_at,tz="",tryFormats=
c("%Y-%m-%d %H:%M:%OS",
"%Y/%m/%d %H:%M:%OS",
"%Y-%m-%d %H:%M",
"%Y/%m/%d %H:%M",
"%Y-%m-%d",
"%Y/%m/%d")))
head(cyclist_trip_data_all$ended_at)
Using the above code throws a similar "unambiguous format" error:
Error in `transmute()`:
! Problem while computing `ended_at = as.POSIXct(...)`.
Caused by error in `as.POSIXlt.character()`:
! character string is not in a standard unambiguous format
Traceback:
1. cyclist_trip_data_all %>% transmute(ended_at = as.POSIXct(ended_at,
. tz = "", tryFormats = c("%Y-%m-%d %H:%M:%OS", "%Y/%m/%d %H:%M:%OS",
. "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M", "%Y-%m-%d", "%Y/%m/%d")))
2. transmute(., ended_at = as.POSIXct(ended_at, tz = "", tryFormats = c("%Y-%m-%d %H:%M:%OS",
. "%Y/%m/%d %H:%M:%OS", "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M",
. "%Y-%m-%d", "%Y/%m/%d")))
3. transmute.data.frame(., ended_at = as.POSIXct(ended_at, tz = "",
. tryFormats = c("%Y-%m-%d %H:%M:%OS", "%Y/%m/%d %H:%M:%OS",
. "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M", "%Y-%m-%d", "%Y/%m/%d")))
4. mutate_cols(.data, dots, caller_env = caller_env())
5. withCallingHandlers({
. for (i in seq_along(dots)) {
I've also tried as.POSIXct(as.numeric(as.character())) instead, but it changes every date value in the column to NA, with an "NAs introduced by coercion" warning.
Also tried:
cyclist_trip_data_all$ended_at <- as.POSIXct(cyclist_trip_data_all$ended_at,format="%Y-%m-%d %H:%M:%S",tz="UTC")
cyclist_trip_data_all$started_at <- as.POSIXct(cyclist_trip_data_all$started_at,format="%Y-%m-%d %H:%M:%S",tz="UTC")
And it also changes every value to NA.
How do I actually change it to a POSIXct class without it becoming NA?
To note, this error doesn't happen in RStudio but it's happening in my Kaggle notebook for R.
There are alternatives -- at some point I found having to chase (known) formats to be too repetitive and boring and wrote a package that does it for me (and fast):
> library(anytime)
> datevec <- c("5/30/2021 11:58", "5/30/2021 12:10", "5/30/2021 11:29")
> anytime(datevec)
[1] "2021-05-30 11:58:00 CDT" "2021-05-30 12:10:00 CDT" "2021-05-30 11:29:00 CDT"
>
The example just shows the first three of your (non-reproducibly presented) dates. The package has other functions too for converting dates, or specific timezones as well as formatters. Take a look: anytime at CRAN -- and yes it of course also works in pipes and with other packages and whatnot. It "just" aims to take care of converting 'any time or date in any format' to POSIXct or Date.
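If you would rather stay in base R, the NA results in the question are what you get when the format string does not match the text. The sketch below assumes the column really looks like the three sample values above (month/day/year, 24-hour time, no seconds); `tz = "UTC"` is also an assumption, so adjust both to your data.

```r
# Sample values copied from the anytime example above
datevec <- c("5/30/2021 11:58", "5/30/2021 12:10", "5/30/2021 11:29")

# A format that does not match the strings yields NA rather than an error
mismatch <- as.POSIXct(datevec, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
all(is.na(mismatch))
# [1] TRUE

# A format that matches the actual layout parses cleanly
parsed <- as.POSIXct(datevec, format = "%m/%d/%Y %H:%M", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M")
# [1] "2021-05-30 11:58" "2021-05-30 12:10" "2021-05-30 11:29"
```

It is worth printing a few raw values with head() first and writing the format to match them, rather than guessing from how the data looks in another tool.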

as.POSIXct returning a double when used in a function instead of DateTime

I have a messy database to deal with where the date time was sometimes stored as 24 hour format with no seconds and other times it was stored as 12 hour time format with AM/PM at the end (could have happened during a Windows update of our measurement computer or something, I don't know).
I want to convert the DateTime string to a usable DateTime object with as.POSIXct, but when I try the following code it is converted into a double (I checked the class; it is numeric).
main_function <- function(res_df)
{
res_df <- res_df %>%
mutate(DateTime = sapply(DateTime, date_time_convert))
}
date_time_convert <- function(dt_string, tz="Europe/Amsterdam")
{
if(str_detect(dt_string, "M")){
dt_format <- "%m/%d/%Y %I:%M:%S %p"
}else
{
dt_format <- "%m/%d/%Y %H:%M"
}
as.POSIXct(dt_string, format=dt_format, tz=tz)
}
When I debug, the code executes properly in the function (returns a DateTime object), but when it moves into my dataframe the dates are all converted into doubles.
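sapply and similar do not always play well with POSIXt as an output: sapply simplifies its result to a plain vector, which drops attributes such as the class, so POSIXct values come back as bare numbers (seconds since the epoch). Here's an alternative: use do.call(c, lapply(..., date_time_convert)).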
Demo with sample data:
vec <- c("2021-01-01", "2022-01-01")
### neither 'sapply(..)' nor 'unlist(lapply(..))' work
sapply(vec, as.POSIXct)
# 2021-01-01 2022-01-01
# 1609477200 1641013200
unlist(lapply(vec, as.POSIXct))
# [1] 1609477200 1641013200
do.call(c, lapply(vec, as.POSIXct))
# [1] "2021-01-01 EST" "2022-01-01 EST"
which means your code would be
res_df %>%
mutate(DateTime = do.call(c, lapply(DateTime, date_time_convert)))
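Another option is to avoid the per-element loop altogether, since as.POSIXct is already vectorised. The function below is a hypothetical variant of date_time_convert (the name `date_time_convert_vec` and the sample strings are mine, not from the question): it tries the 12-hour format on the whole vector and falls back to the 24-hour format wherever that fails. Note that `%p` depends on the locale, so this assumes a locale in which AM/PM is recognised.

```r
# Hypothetical vectorised variant of date_time_convert
date_time_convert_vec <- function(dt_string, tz = "Europe/Amsterdam") {
  # First attempt: 12-hour clock with seconds and an AM/PM marker
  result <- as.POSIXct(dt_string, format = "%m/%d/%Y %I:%M:%S %p", tz = tz)
  # Second attempt: 24-hour clock without seconds
  twenty_four <- as.POSIXct(dt_string, format = "%m/%d/%Y %H:%M", tz = tz)
  # Use the 24-hour parse only where the 12-hour parse failed
  fallback <- is.na(result)
  result[fallback] <- twenty_four[fallback]
  result
}

# Made-up sample values, one in each format
converted <- date_time_convert_vec(c("01/15/2021 14:30", "01/15/2021 02:30:00 PM"))
class(converted)
# [1] "POSIXct" "POSIXt"
format(converted, "%Y-%m-%d %H:%M:%S")
```

Because the function takes and returns a whole vector, it can be used directly as mutate(DateTime = date_time_convert_vec(DateTime)), with no sapply or lapply involved.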

convert date with time formats [R]

How can I convert a date-time from a format like yyyymmdd H:M to yyyy-mm-dd H:M, i.e. from 20200101 00:00 to 2020-01-01 00:00? I have tried multiple as.Date formats and can't get the result I want.
Example of what I have:
dates <- c("20200101 00:00", "20200101 01:00")
want <- as.Date(dates, format="%Y%m%d %H:%M")
The output:
> want<- as.Date(dates, format="%Y%m%d %H:%M")
> want
[1] "2020-03-01" "2020-03-01"
There are two pieces here. One is converting to a date-time class, such as POSIXct. The other is how this object is printed. Under the hood it's all represented the same way, but you can control how it's displayed. Also note that as.Date returns a Date, which has no time-of-day component, so the hours and minutes are dropped.
The format argument in any of the conversion functions (as.Date or as_datetime) describes how to parse the string representation into the components of a date-time object (e.g. where in the string to find the minutes). You then need something like format or strftime to control how the values are printed/displayed.
Below is what I think you're aiming for:
dates_as_strings <- c("20200101 00:00", "20200101 01:00")
dates_as_datetime_objs <- lubridate::as_datetime(dates_as_strings, format="%Y%m%d %H:%M")
strftime(dates_as_datetime_objs, "%Y-%m-%d %H:%M", tz = "UTC")
#> [1] "2020-01-01 00:00" "2020-01-01 01:00"
Created on 2021-05-21 by the reprex package (v1.0.0)
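For completeness, the same two steps (parse, then format) can be sketched in base R without lubridate. The variable name `parsed` is just for illustration, and `tz = "UTC"` is an assumption.

```r
dates_as_strings <- c("20200101 00:00", "20200101 01:00")

# Step 1: parse. The format describes how the input string is laid out.
parsed <- as.POSIXct(dates_as_strings, format = "%Y%m%d %H:%M", tz = "UTC")
class(parsed)
# [1] "POSIXct" "POSIXt"

# Step 2: display. format() returns character strings in the layout you ask for.
format(parsed, "%Y-%m-%d %H:%M")
# [1] "2020-01-01 00:00" "2020-01-01 01:00"
```

Keep the POSIXct vector for any calculations and only call format() when you need the text.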

How to change date format from YYYY/MM/DD to DD/MM/YYYY

I have a column of dates which were read in as character values (yes, they are supposed to be the same):
str(df$date)
$ date : chr "30/08/2017" "30/08/2017" "30/08/2017" "30/08/2017"
I then convert the values to Date format:
str(df$date)
$ date : Date, format: "2017-08-30" "2017-08-30" "2017-08-30"
The problem is that no matter which method I choose to use, the resulting dates are always in YYYY/MM/DD format, which is not what I want; they should be in DD/MM/YYYY format.
I try:
df$date <- as.Date(df$date, format = "%d/%m/%Y")
df$date <- strptime(df$date, format = "%d/%m/%Y")
df$date <- as.POSIXct(df$date, format = "%d/%m/%Y")
and they all produce the same format.
I have read numerous similar Stack Overflow posts as well as some guides and have tried things like getting and setting my system locale (United Kingdom) and all is correct in that respect.
Where am I going wrong?
R has two very similarly named functions: strptime, which parses character strings into date-time objects (class POSIXlt), and strftime, which formats date-time objects as character strings. To make matters worse, the documentation for these two functions is combined, so it can be very hard to keep their uses straight. A Date object has no display format of its own; it always prints as YYYY-MM-DD. You want strftime, in this case.
You can also use format:
date <- c("30/08/2017", "30/08/2017", "30/08/2017", "30/08/2017")
date <- as.Date(date, format = "%d/%m/%Y")
# > date
# [1] "2017-08-30" "2017-08-30" "2017-08-30" "2017-08-30"
date <- format(date, "%d/%m/%Y")
# > date
# [1] "30/08/2017" "30/08/2017" "30/08/2017" "30/08/2017"
Note that the result is of class character, not Date:
# > class(date)
# [1] "character"
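Since the formatted result is plain text, one way to have both is to keep the Date column for calculations and add a separate character column for display. This is only a sketch: the data frame and the column name `date_label` are made up for illustration.

```r
# Hypothetical data frame with dates read in as DD/MM/YYYY text
df <- data.frame(date = c("30/08/2017", "31/08/2017"), stringsAsFactors = FALSE)

# Keep a real Date column for arithmetic, sorting and filtering
df$date <- as.Date(df$date, format = "%d/%m/%Y")

# Add a character column purely for display in DD/MM/YYYY
df$date_label <- format(df$date, "%d/%m/%Y")

df$date_label
# [1] "30/08/2017" "31/08/2017"

# Date arithmetic still works on the Date column
as.numeric(diff(df$date))
# [1] 1
```

Sorting or subtracting the `date_label` strings would not behave like dates, which is why the original Date column is kept alongside it.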
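If you are working in PHP rather than R, this will help you:
$variable = '2018/09/18';
// Replace the slashes with dashes so strtotime() reads the value as Y-m-d
$date = str_replace('/', '-', $variable);
echo date('d/m/Y', strtotime($date));
Please check and let me know if you need any more help.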

bigr dataframe error 'do not know how to convert 'df$V3' to class “POSIXct”'

I'm trying to create a calculated column with bigr. First load the data:
> df <- bigr.frame (dataPath = "/data.csv",
dataSource="DEL", delimiter=",", header=F,
coltypes = c("integer", "character", "character"))
Then attempt to add a column:
> df$posixct = as.POSIXct(df$V3, tz="UTC", format="%Y-%m-%d %H:%M:%S")
However, I get the error message:
Error in as.POSIXct.default(df$V3, tz = "UTC", format = "%Y-%m-%d %H:%M:%S") :
do not know how to convert 'df$V3' to class “POSIXct”
I've taken a look at the class:
> class(df$V3)
[1] "bigr.vector"
attr(,"package")
[1] "bigr"
The values in the V3 column look like this:
2005-01-01 00:00:00
2005-01-01 00:10:00
...
I'm not sure how to proceed - any tips?
Update:
I've tried converting to a character:
> df$posixct = as.POSIXct(
as.character(df$V3), tz="UTC", format="%Y-%m-%d %H:%M:%S")
But I receive the following error:
Error in as.POSIXct.default(as.character(df$V3), tz = "UTC", format = "%Y-%m-%d %H:%M:%S") :
do not know how to convert 'as.character(df$V3)' to class “POSIXct”

Resources