I have a data frame Y with a column time.
The time column holds compact timestamps; for example, 20211201000010 means 2021-12-01 00:00:10.
time <- strptime(Y$time, format = "%Y%m%d%H%M%S")
start_time <- min(time)
In this code, start_time is 2021-12-01 00:00:02.
But I want to round start_time up to 2021-12-01 00:00:10, since the start time should fall on the 10-second intervals used in my data.
How can I round up 2021-12-01 00:00:02 as 2021-12-01 00:00:10 ?
The lubridate package is always your friend for datetime work.
library(lubridate)
xx1 <- '20211201010002'
# nested call, so no pipe (%>%) package is needed
ceiling_date(ymd_hms(xx1), unit = '10s')
[1] "2021-12-01 01:00:10 UTC"
Alternatively, you can calculate the remainder after dividing by 10 before you parse the data, e.g.:
20211201000002 %% 10  # 2
20211201000010 %% 10  # 0
Then take the first value whose remainder is 0.
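The same remainder idea also works after parsing, to round up to the next 10-second boundary instead of searching for one. A minimal base R sketch, assuming the timestamps are in UTC; the sample values below are illustrative, not the asker's data:

```r
# Illustrative compact timestamps in the question's YYYYmmddHHMMSS format
raw <- c("20211201000015", "20211201000002", "20211201000030")
time <- as.POSIXct(raw, format = "%Y%m%d%H%M%S", tz = "UTC")
start_time <- min(time)

# Seconds since 1970-01-01 UTC, rounded up to the next multiple of 10
secs <- ceiling(as.numeric(start_time) / 10) * 10
start_time_up <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

format(start_time_up, "%Y-%m-%d %H:%M:%S")  # "2021-12-01 00:00:10"
```

A time already on a boundary, such as 00:00:30, is left unchanged because ceiling() of a whole number is the number itself.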
I'm having trouble converting character values into date-times (with hours and minutes). I have the following code:
start <- c("2022-01-10 9:35PM","2022-01-10 10:35PM")
end <- c("2022-01-11 7:00AM","2022-01-11 8:00AM")
dat <- data.frame(start,end)
These are all character strings. I would like to convert all the datetimes into a date-time format using the 24-hour clock, e.g. "2022-01-10 9:35PM" into "2022-01-10 21:35" and "2022-01-11 7:00AM" into "2022-01-11 7:00", because I would like to calculate the difference between the dates in hours.
I would also like to add an ID column with a specific ID. The desired data would look like this:
ID <- c(101,101)
start <- c("2022-01-10 21:35","2022-01-10 22:35")
end <- c("2022-01-11 7:00","2022-01-11 8:00")
diff <- c(9,10) # I'm not sure what the calculated values would be
dat <- data.frame(ID,start,end,diff)
I would appreciate all the help there is! Thanks!!!
You can use lubridate::ymd_hm(). Drop the floor() call if you want the exact (fractional) difference.
library(dplyr)
library(lubridate)
dat %>%
  mutate(ID = 101,
         across(c(start, end), ymd_hm),
         # fix the unit to hours instead of letting R pick one
         diff = floor(difftime(end, start, units = "hours")))
start end ID diff
1 2022-01-10 21:35:00 2022-01-11 07:00:00 101 9 hours
2 2022-01-10 22:35:00 2022-01-11 08:00:00 101 9 hours
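The base R approach with strptime is below. Note that %p only takes effect together with %I (12-hour clock); combined with %H the AM/PM marker is ignored and 9:35PM would be read as 09:35.
strptime(dat$start, "%Y-%m-%d %I:%M%p")
[1] "2022-01-10 21:35:00 CET" "2022-01-10 22:35:00 CET"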
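A fuller base R sketch that also builds the ID, 24-hour strings and hour difference from the question. It assumes an English (or C) locale so that %p recognises AM/PM, and uses UTC so daylight-saving changes cannot affect the difference:

```r
dat <- data.frame(
  start = c("2022-01-10 9:35PM", "2022-01-10 10:35PM"),
  end   = c("2022-01-11 7:00AM", "2022-01-11 8:00AM")
)

# %I is the 12-hour clock and %p reads the AM/PM marker
fmt <- "%Y-%m-%d %I:%M%p"
start <- as.POSIXct(dat$start, format = fmt, tz = "UTC")
end   <- as.POSIXct(dat$end,   format = fmt, tz = "UTC")

out <- data.frame(
  ID    = 101,
  start = format(start, "%Y-%m-%d %H:%M"),
  end   = format(end,   "%Y-%m-%d %H:%M"),
  # units = "hours" keeps the unit fixed regardless of the gap size
  diff  = as.numeric(difftime(end, start, units = "hours"))
)
out
```

Both rows are 9 hours 25 minutes apart, so diff is about 9.42; wrap it in floor() if you only want whole hours.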
I'm looking for a simple and correct way to change the date/time (POSIXct) format into a time that starts at 00:00:00.
I couldn't find an answer to this in R language, but if I overlooked one, please tell me :)
So I have this:
date/time            v1
2022-02-16 15:07:15  38937
2022-02-16 15:07:17  39350
And I would like this:
time      v1
00:00:00  38937
00:00:02  39350
Can somebody help me with this?
Thanks :)
You can calculate the difference between each datetime and the first one in seconds, add it to an arbitrary date starting at "00:00:00", and then format the result to include only the time. See the time column in the reprex below:
library(dplyr)
library(lubridate)
df %>%
  mutate(
    date = lubridate::ymd_hms(date),
    # force seconds; date - first(date) would pick its units automatically
    seconds = as.numeric(difftime(date, first(date), units = "secs")),
    time = format(
      lubridate::ymd_hms("2022-01-01 00:00:00") + seconds,
      format = "%H:%M:%S"
    )
  )
#> # A tibble: 2 × 4
#> date v1 seconds time
#> <dttm> <dbl> <dbl> <chr>
#> 1 2022-02-16 15:07:15 38937 0 00:00:00
#> 2 2022-02-16 15:07:17 39350 2 00:00:02
Created on 2022-03-30 by the reprex package (v2.0.1)
Note that this will be misleading if you ever have over 24 hours between two datetimes. In these cases you should probably include the date.
Data
df <- tibble::tribble(
~date, ~v1,
"2022-02-16 15:07:15", 38937,
"2022-02-16 15:07:17", 39350
)
You can subtract the first date/time from every date/time and convert the result to a time type with the hms() function from the hms package.
library(dplyr)
library(hms)
df %>%
  mutate(`date/time` = hms::hms(as.numeric(
    as.POSIXct(`date/time`) - as.POSIXct(first(`date/time`)),
    units = "secs"  # without this, gaps over a day would be counted in days
  )))
date/time v1
1 00:00:00 38937
2 00:00:02 39350
Note that with this method, a time difference greater than one day is still reflected in the result, for example:
df <- data.frame(
  `date/time` = c("2022-02-16 15:07:15", "2022-02-18 15:07:17"),
  v1 = c(38937, 39350),
  check.names = FALSE
)
df %>%
  mutate(`date/time` = hms::hms(as.numeric(
    as.POSIXct(`date/time`) - as.POSIXct(first(`date/time`)),
    units = "secs"
  )))
date/time v1
1 00:00:00 38937
2 48:00:02 39350
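For comparison, a base R sketch with neither dplyr nor hms, which also keeps gaps over 24 hours (e.g. 48:00:02) instead of wrapping them. elapsed_hms() is a hypothetical helper name, and it assumes the timestamps are sorted so the first one is the earliest:

```r
# Hypothetical helper: elapsed time since the first timestamp, as "HH:MM:SS"
elapsed_hms <- function(x) {
  t <- as.POSIXct(x, tz = "UTC")
  secs <- round(as.numeric(difftime(t, t[1], units = "secs")))
  # hours are not reduced modulo 24, so multi-day gaps stay visible
  sprintf("%02d:%02d:%02d", secs %/% 3600, (secs %% 3600) %/% 60, secs %% 60)
}

elapsed_hms(c("2022-02-16 15:07:15", "2022-02-16 15:07:17"))
elapsed_hms(c("2022-02-16 15:07:15", "2022-02-18 15:07:17"))
```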
Consider this simple example:
library(lubridate)
bogus <- function(start_time, end_time){
  print(paste('hey this starts on', start_time, 'until', end_time))
}
start_time <- ymd('2018-01-01')
end_time <- ymd('2018-05-01')
> bogus(start_time, end_time)
[1] "hey this starts on 2018-01-01 until 2018-05-01"
Unfortunately, doing so with a long time range does not work with my real-life bogus function, so I need to break my original time range into monthly pieces.
In other words the first call would be bogus(ymd('2018-01-01'), ymd('2018-01-31')), the second one bogus(ymd('2018-02-01'), ymd('2018-02-28')), etc.
Is there a simple way to do this using purrr and lubridate?
Thanks
Are you looking for something like:
library(lubridate)
seq_dates <- seq(start_time, end_time - 1, by = "month")
lapply(seq_dates, function(x) print(paste('hey this starts on', x, 'until', ceiling_date(x, unit = "month") - 1)))
You could also do a short bogus function like:
bogus <- function(start_var, end_var) {
require(lubridate)
seq_dates <- seq(as.Date(start_var), as.Date(end_var) - 1, by = "month")
printed_statement <- lapply(seq_dates, function(x) paste('hey this starts on', x, 'until', ceiling_date(x, unit = "month") - 1))
for (i in printed_statement) { print(i) }
}
And call it like:
bogus("2018-01-01", "2018-05-01")
Output:
[1] "hey this starts on 2018-01-01 until 2018-01-31"
[1] "hey this starts on 2018-02-01 until 2018-02-28"
[1] "hey this starts on 2018-03-01 until 2018-03-31"
[1] "hey this starts on 2018-04-01 until 2018-04-30"
This way you can just give the minimum start date and the maximum end date and get every month in between.
With base R:
seqdate <- seq.Date(start_time, end_time, by = "1 month")
# each range runs from one month start to the day before the next
dateranges <- data.frame(start.dates = head(seqdate, -1),
                         end.dates = seqdate[-1] - 1)
start.dates end.dates
1 2018-01-01 2018-01-31
2 2018-02-01 2018-02-28
3 2018-03-01 2018-03-31
4 2018-04-01 2018-04-30
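To actually call bogus() once per monthly piece, the start and end vectors can be looped over in base R. A minimal sketch, reusing the question's bogus() and dates; indexing with starts[i] keeps the Date class, so paste() prints readable dates:

```r
bogus <- function(start_time, end_time) {
  print(paste('hey this starts on', start_time, 'until', end_time))
}
start_time <- as.Date("2018-01-01")
end_time <- as.Date("2018-05-01")

# First day of each month in [start_time, end_time)
starts <- seq(start_time, end_time - 1, by = "month")
# Last day of each month: the next month's first day minus one day
ends <- seq(start_time, by = "month", length.out = length(starts) + 1)[-1] - 1

# One call per month; print() returns its argument, so keep the messages
msgs <- character(length(starts))
for (i in seq_along(starts)) {
  msgs[i] <- bogus(starts[i], ends[i])
}
```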
I am currently working on a project involving data of delivery timings. The data can be both negative (indicating that the delivery was not late but actually ahead of the estimate) or positive (indicating that it was indeed late).
I would like to obtain the five-number summary and interquartile range using the fivenum() function on the data. However, because all of the values end up positive after conversion, my statistics are not accurate. The following is an example of the data I am working with:
Delivery.Late Reaction.Time Time.Until.Send.To.Vendor
1 00:01:29 00:00:00 00:05:08
2 00:12:19 00:00:00 00:04:52
3 00:02:55 00:00:00 00:05:42
4 00:06:14 00:00:00 00:14:34
5 -00:06:05 00:00:00 00:01:42
6 00:09:58 00:00:00 00:02:56
From this, I am interested in the Delivery.Late variable and would like to perform exploratory / diagnostic statistics on it.
I have used the chron package to convert the column data into chron times objects, but chron() always takes the absolute value of the time and turns it into a positive value. Here is a sample of my code:
library(chron)
feb_01_07 <- read.csv("~/filepath/data.csv")
#converting factor to time
feb_01_07[,19] <- chron(times=feb_01_07$Delivery.Late)
#Five number summary and interquartile range for $Delivery.Late column
fivenum(feb_01_07$Delivery.Late, na.rm=TRUE)
After running fivenum() I get the results:
[1] 00:01:29 00:02:55 00:06:09 00:09:58 00:12:19
This is inaccurate because the lowest number (the first term) should in fact be -00:06:05, not 00:01:29. -00:06:05 was converted to a positive chron object and was averaged into the median instead.
How can I convert them to time objects while keeping the negative values? Thanks so much for any insight!
You can do something like this:
library(chron)
delivery_late <- c("00:01:29", "00:12:19", "-00:06:05")
not_late_idx <- grep(pattern = "^-.*", x = delivery_late)
times <- chron(times=delivery_late)
times[not_late_idx] <- -1*times[not_late_idx]
1) chron times can represent negative times but will render them as negative numbers. We can present it as a negative times object like this:
library(chron)
# convert string in form [-]HH:MM:SS to times object
neg_times <- function(x) ifelse(grepl("-", x), - times(sub("-", "", x)), times(x))
DF <- read.table("data.dat")
test <- transform(DF, Delivery.Late = neg_times(Delivery.Late))
giving:
> test
Delivery.Late Reaction.Time Time.Until.Send.To.Vendor
1 0.001030093 00:00:00 00:05:08
2 0.008553241 00:00:00 00:04:52
3 0.002025463 00:00:00 00:05:42
4 0.004328704 00:00:00 00:14:34
5 -0.004224537 00:00:00 00:01:42
6 0.006921296 00:00:00 00:02:56
and we could also define a formatting routine:
# format a possibly negative times object
format_neg_times <- function(x) {
paste0(ifelse(x < 0, "-", ""), format(times(abs(x))))
}
format_neg_times(test[[1]])
## [1] "00:01:29" "00:12:19" "00:02:55" "00:06:14" "-00:06:05" "00:09:58"
2) All of the times in the question's example are less than 12 hours in magnitude. If the times are always between -12:00:00 and 12:00:00, then we could represent a negative time x as x + 1 (i.e. wrap it around midnight) like this:
library(chron)
wrap_neg_times <- function(x) times(neg_times(x) %% 1)
DF <- read.table("data.dat")
test2 <- transform(DF, Delivery.Late = wrap_neg_times(Delivery.Late))
giving:
> test2
Delivery.Late Reaction.Time Time.Until.Send.To.Vendor
1 00:01:29 00:00:00 00:05:08
2 00:12:19 00:00:00 00:04:52
3 00:02:55 00:00:00 00:05:42
4 00:06:14 00:00:00 00:14:34
5 23:53:55 00:00:00 00:01:42
6 00:09:58 00:00:00 00:02:56
format_wrap_neg_times <- function(x) {
format_neg_times(ifelse(x > 0.5, x - 1, x))
}
format_wrap_neg_times(test2[[1]])
## [1] "00:01:29" "00:12:19" "00:02:55" "00:06:14" "-00:06:05" "00:09:58"
Note
The input in reproducible form:
Lines <- "
Delivery.Late Reaction.Time Time.Until.Send.To.Vendor
1 00:01:29 00:00:00 00:05:08
2 00:12:19 00:00:00 00:04:52
3 00:02:55 00:00:00 00:05:42
4 00:06:14 00:00:00 00:14:34
5 -00:06:05 00:00:00 00:01:42
6 00:09:58 00:00:00 00:02:56"
cat(Lines, file = "data.dat")
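If chron itself is not required, a base R sketch can keep the sign by working in signed seconds, so fivenum() and IQR() operate on plain numbers. to_signed_secs() and fmt_signed_secs() are hypothetical helper names, and the input is assumed to be strings of the form [-]HH:MM:SS:

```r
delivery_late <- c("00:01:29", "00:12:19", "00:02:55",
                   "00:06:14", "-00:06:05", "00:09:58")

# Hypothetical helper: "[-]HH:MM:SS" -> signed number of seconds
to_signed_secs <- function(x) {
  sgn <- ifelse(startsWith(x, "-"), -1, 1)
  parts <- strsplit(sub("^-", "", x), ":", fixed = TRUE)
  secs <- vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
  sgn * secs
}

# Hypothetical helper: signed seconds -> "[-]HH:MM:SS"
fmt_signed_secs <- function(s) {
  a <- abs(round(s))
  sprintf("%s%02d:%02d:%02d", ifelse(s < 0, "-", ""),
          a %/% 3600, (a %% 3600) %/% 60, a %% 60)
}

s <- to_signed_secs(delivery_late)
fmt_signed_secs(fivenum(s))  # the minimum is now "-00:06:05"
IQR(s)                       # interquartile range in seconds
```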
I have a data frame with hour stamps and the corresponding measured temperatures. The measurements are taken continuously at irregular intervals. I would like to convert the hours to the respective date-times, keeping the measured temperatures. My data frame looks like this (the measurements started on 20/05/2016):
Time, Temp
09.25,28
10.35,28.2
18.25,29
23.50,30
01.10,31
12.00,36
02.00,25
I would like to create a data.frame with respective date-time and Temp like below:
Time, Temp
2016-05-20 09:25,28
2016-05-20 10:35,28.2
2016-05-20 18:25,29
2016-05-20 23:50,30
2016-05-21 01:10,31
2016-05-21 12:00,36
2016-05-22 02:00,25
I would be thankful for any comments or tips on R packages or functions I could look at to do this. Thanks for your time.
A possible solution in base R:
df$Time <- as.POSIXct(strptime(paste('2016-05-20', sprintf('%05.2f',df$Time)), format = '%Y-%m-%d %H.%M', tz = 'GMT'))
df$Time <- df$Time + cumsum(c(0,diff(df$Time)) < 0) * 86400 # 86400 = 60 * 60 * 24
which gives:
> df
Time Temp
1 2016-05-20 09:25:00 28.0
2 2016-05-20 10:35:00 28.2
3 2016-05-20 18:25:00 29.0
4 2016-05-20 23:50:00 30.0
5 2016-05-21 01:10:00 31.0
6 2016-05-21 12:00:00 36.0
7 2016-05-22 02:00:00 25.0
An alternative with data.table, using shift() to compare each time with the previous one (you could also use cumsum() with diff() as above):
library(data.table)
setDT(df)[, Time := as.POSIXct(strptime(paste('2016-05-20', sprintf('%05.2f', Time)), format = '%Y-%m-%d %H.%M', tz = 'GMT')) +
  cumsum(Time < shift(Time, fill = Time[1])) * 86400]
Or with dplyr:
library(dplyr)
df %>%
mutate(Time = as.POSIXct(strptime(paste('2016-05-20',
sprintf('%05.2f',Time)),
format = '%Y-%m-%d %H.%M', tz = 'GMT')) +
cumsum(c(0,diff(Time)) < 0)*86400)
Both give the same result as the base R version.
Used data:
df <- read.table(text='Time, Temp
09.25,28
10.35,28.2
18.25,29
23.50,30
01.10,31
12.00,36
02.00,25', header=TRUE, sep=',')
You can use a custom date format combined with some code that detects when a new day begins (assuming the first measurement takes place earlier in the day than the last measurement of the previous day).
# starting day
start_date = "2016-05-20"
values=read.csv('values.txt', colClasses=c("character",NA))
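# previous reading for each row ("0" for the first row)
last=c(0,head(values$Time, -1))
# zero-padded "HH.MM" strings compare correctly as text
day=cumsum(values$Time<last)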
Time = strptime(paste(start_date,values$Time), "%Y-%m-%d %H.%M")
Time = Time + day*86400
values$Time = Time
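A base R sketch that avoids the string round-trip by splitting the HH.MM numbers into hours and minutes arithmetically. Like the answers above, it assumes consecutive readings are less than 24 hours apart, and it uses UTC so daylight-saving changes cannot shift the clock times; hours_to_datetime() is a hypothetical helper name:

```r
df <- data.frame(Time = c(9.25, 10.35, 18.25, 23.50, 1.10, 12.00, 2.00),
                 Temp = c(28, 28.2, 29, 30, 31, 36, 25))

# Hypothetical helper: HH.MM numbers -> date-times, starting on start_date
hours_to_datetime <- function(hhmm, start_date, tz = "UTC") {
  h <- floor(hhmm)
  # round() guards against floating-point error, e.g. 10.35 - 10 = 0.3499...
  m <- round((hhmm - h) * 100)
  minutes_of_day <- h * 60 + m
  # a new day starts whenever the clock time goes backwards
  day <- cumsum(c(0, diff(minutes_of_day) < 0))
  as.POSIXct(start_date, tz = tz) + day * 86400 + minutes_of_day * 60
}

df$Time <- hours_to_datetime(df$Time, "2016-05-20")
format(df$Time, "%Y-%m-%d %H:%M")
```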