Is there a possibility to subset by seconds in xts?
2013-01-01 00:01:00 2.2560000
2013-01-01 00:02:00 2.3883333
2013-01-01 00:03:00 1.8450000
2013-01-01 00:04:00 1.6966667
2013-01-01 00:04:03 1.3100000
2013-01-01 00:05:00 0.8533333
I want to get the line that does not have :00 seconds at the end:
2013-01-01 00:04:03 1.3100000
Usually I would subset by time [T09:00/T09:30], but right now I want the lines whose timestamps do not have 00 for seconds. Thanks!
First, I have to get your example data into R (please use dput next time)
lines <- '2013-01-01 00:01:00 2.2560000
2013-01-01 00:02:00 2.3883333
2013-01-01 00:03:00 1.8450000
2013-01-01 00:04:00 1.6966667
2013-01-01 00:04:03 1.3100000
2013-01-01 00:05:00 0.8533333'
library(xts)
tmp <- read.table(text=lines)
# column 3 holds the values; columns 1 and 2 are the date and the time
x <- xts(tmp[, 3], as.POSIXct(paste(tmp[, 1], tmp[, 2])))
You can use the .indexsec function to extract rows where the second is (or isn't) 0.
x[.indexsec(x) == 0]
# [,1]
#2013-01-01 00:01:00 2.2560000
#2013-01-01 00:02:00 2.3883333
#2013-01-01 00:03:00 1.8450000
#2013-01-01 00:04:00 1.6966667
#2013-01-01 00:05:00 0.8533333
x[.indexsec(x) != 0]
# [,1]
#2013-01-01 00:04:03 1.31
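The same test can be sketched without xts on a plain POSIXct vector. This is only an illustration in base R: the vectors `times` and `values` are hypothetical names, a subset of the example rows is used, and `tz = "UTC"` is an assumption.

```r
# Timestamps and values copied from the example data; tz = "UTC" is an assumption.
times <- as.POSIXct(c("2013-01-01 00:01:00", "2013-01-01 00:04:00",
                      "2013-01-01 00:04:03", "2013-01-01 00:05:00"),
                    tz = "UTC")
values <- c(2.2560000, 1.6966667, 1.3100000, 0.8533333)

# as.POSIXlt() exposes the seconds component of each timestamp.
secs <- as.POSIXlt(times)$sec

# TRUE for rows whose timestamp does not fall on a whole minute.
off_minute <- secs != 0
values[off_minute]
```

Negating the condition (`secs == 0`) keeps only the whole-minute rows instead.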
Another idea would be to use the unexported xts:::startof function which is analogous to the endpoints function.
x[xts:::startof(x, "mins")]
# [,1]
#2013-01-01 00:01:00 2.2560000
#2013-01-01 00:02:00 2.3883333
#2013-01-01 00:03:00 1.8450000
#2013-01-01 00:04:00 1.6966667
#2013-01-01 00:05:00 0.8533333
Or, if you only want the row that does not end in 00, you can use negative subsetting:
x[-xts:::startof(x, "mins")]
# [,1]
#2013-01-01 00:04:03 1.31
Here's how to do it by merging with a zero width xts object that has the index that you want.
merge(xts(, seq(start(x), end(x), by="min")), x, all=FALSE)
# x
#2013-01-01 00:01:00 2.2560000
#2013-01-01 00:02:00 2.3883333
#2013-01-01 00:03:00 1.8450000
#2013-01-01 00:04:00 1.6966667
#2013-01-01 00:05:00 0.8533333
I am working on a problem where I need to calculate the time difference in minutes. I have the time values in hh:mm:ss format in a column (more than 28,000 values).
I have been using the following function to calculate the time difference.
tdiff <- dt[dt, Time_Diff := c(abs(diff(as.numeric(Time))),0.30), Student_ID]
where dt --> is the ordered data table and
0.30 --> 30 minutes assigned to the last activity of the student in a course.
This works, but it is not considering the midnight time.
Thanks to @niko for his help, this problem is solved. However, the 30 minutes that should be assigned to each student's last activity is still not assigned. Any help in this direction will be greatly appreciated. Thank you.
The expected output is described below
S_Id Date Time Time_Diff Time_Diff(minutes)
A 10/08/2018 23:49:00 00:01:00 1
A 10/08/2018 23:50:00 00:09:00 9
A 10/08/2018 23:59:00 00:02:00 2
A 10/09/2018 00:01:00 00:09:00 9
A 10/09/2018 00:10:00 08:02:00 482
A 10/09/2018 08:12:00 04:08:00 248
A 10/09/2018 12:20:00 10:01:00 601
A 10/09/2018 22:21:00 01:35:00 95
A 10/09/2018 23:56:00 00:09:00 9
A 10/10/2018 00:05:00 00:05:00 5
A 10/10/2018 00:10:00 00:02:00 2
A 10/10/2018 00:12:00 00:30:00 30
B 10/08/2018 23:49:00 00:01:00 1
B 10/08/2018 23:50:00 00:09:00 9
B 10/08/2018 23:59:00 00:02:00 2
B 10/09/2018 00:01:00 00:09:00 9
B 10/09/2018 00:10:00 08:02:00 482
B 10/09/2018 08:12:00 04:08:00 248
B 10/09/2018 12:20:00 10:01:00 601
B 10/09/2018 22:21:00 01:35:00 95
B 10/09/2018 23:56:00 00:09:00 9
B 10/10/2018 00:05:00 00:05:00 5
B 10/10/2018 00:10:00 00:02:00 2
B 10/10/2018 00:12:00 00:30:00 30
C 10/08/2018 23:49:00 00:01:00 1
C 10/08/2018 23:50:00 00:09:00 9
C 10/08/2018 23:59:00 00:02:00 2
C 10/09/2018 00:01:00 00:09:00 9
C 10/09/2018 00:10:00 08:02:00 482
C 10/09/2018 08:12:00 04:08:00 248
C 10/09/2018 12:20:00 10:01:00 601
C 10/09/2018 22:21:00 01:35:00 95
C 10/09/2018 23:56:00 00:09:00 9
C 10/10/2018 00:05:00 00:05:00 5
C 10/10/2018 00:10:00 00:02:00 2
C 10/10/2018 00:12:00 00:30:00 30
Try converting date and time to POSIXct
# dt is your data frame
diff(as.POSIXct(paste(dt$Date, dt$Time), format='%m/%d/%Y %H:%M:%S')) # or '%d/%m/%Y %H:%M:%S'
That should do the trick.
Data
dt <- structure(list(Date = c("10/08/2018", "10/08/2018", "10/08/2018", "10/09/2018", "10/09/2018",
"10/09/2018", "10/09/2018", "10/09/2018", "10/09/2018", "10/10/2018",
"10/10/2018", "10/10/2018"),
Time = c("23:49:00", "23:50:00", "23:59:00", "00:01:00", "00:10:00", "08:12:00",
"12:20:00", "22:21:00", "23:56:00", "00:05:00", "00:10:00", "00:12:00")),
class = "data.frame", row.names = c(NA, -12L))
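For the remaining part of the question (30 minutes for each student's last activity), one possible sketch in base R is shown here. It assumes the rows are already ordered by time within each student, uses a hypothetical two-student sample with a column named `S_Id` as in the expected output, and sets `tz = "UTC"` as an assumption to avoid daylight-saving effects.

```r
# Hypothetical sample; column names follow the expected output in the question.
dt <- data.frame(
  S_Id = c("A", "A", "A", "B", "B"),
  Date = c("10/08/2018", "10/08/2018", "10/09/2018", "10/08/2018", "10/09/2018"),
  Time = c("23:50:00", "23:59:00", "00:01:00", "23:59:00", "00:10:00"),
  stringsAsFactors = FALSE
)

# Combine date and time so that differences across midnight are handled.
ts <- as.POSIXct(paste(dt$Date, dt$Time),
                 format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Per student: minutes until the next activity, and 30 for the last activity.
# as.numeric(ts) is in seconds, so diff() / 60 gives minutes.
dt$Time_Diff_min <- ave(as.numeric(ts), dt$S_Id,
                        FUN = function(s) c(diff(s) / 60, 30))
dt
```

Because the differences are taken on full date-times rather than on the time of day alone, the step from 23:59 to 00:01 comes out as 2 minutes.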
I've got a dataset with the following shape
ID Start Time End Time
1 01/01/2017 00:15:00 01/01/2017 07:15:00
2 01/01/2017 04:45:00 01/01/2017 06:15:00
3 01/01/2017 10:20:00 01/01/2017 20:15:00
4 01/01/2017 02:15:00 01/01/2017 00:15:00
5 02/01/2017 15:15:00 03/01/2017 00:30:00
6 03/01/2017 07:00:00 04/01/2017 09:15:00
I would like to count, every 15 minutes for an entire year, how many items have started but not finished, i.e. the number of items with a start time less than or equal to the time I'm looking at and an end time greater than that time.
I'm looking for an approach using tidyverse/dplyr if possible.
Any help or guidance would be very much appreciated.
If I understand correctly, the OP wants to count the number of simultaneously active events.
One possibility to tackle this question is the coverage() function from Bioconductor's IRanges package. Another one is to aggregate in a non-equi join, which is available with the data.table package.
Non-equi join
# create sequence of datetimes (limited to 4 days for demonstration)
seq15 <- seq(lubridate::as_datetime("2017-01-01"),
lubridate::as_datetime("2017-01-05"), by = "15 mins")
# aggregate within a non-equi join
library(data.table)
result <- periods[.(time = seq15), on = .(Start.Time <= time, End.Time > time),
.(time, count = sum(!is.na(ID))), by = .EACHI][, .(time, count)]
result
time count
1: 2017-01-01 00:00:00 0
2: 2017-01-01 00:15:00 1
3: 2017-01-01 00:30:00 1
4: 2017-01-01 00:45:00 1
5: 2017-01-01 01:00:00 1
---
381: 2017-01-04 23:00:00 0
382: 2017-01-04 23:15:00 0
383: 2017-01-04 23:30:00 0
384: 2017-01-04 23:45:00 0
385: 2017-01-05 00:00:00 0
The result can be visualized graphically:
library(ggplot2)
ggplot(result) + aes(time, count) + geom_step()
Data
periods <- readr::read_table(
"ID Start.Time End.Time
1 01/01/2017 00:15:00 01/01/2017 07:15:00
2 01/01/2017 04:45:00 01/01/2017 06:15:00
3 01/01/2017 10:20:00 01/01/2017 20:15:00
4 01/01/2017 02:15:00 01/01/2017 00:15:00
5 02/01/2017 15:15:00 03/01/2017 00:30:00
6 03/01/2017 07:00:00 04/01/2017 09:15:00"
)
# convert date-time strings to class POSIXct
library(data.table)
cols <- names(periods)[names(periods) %like% "Time$"]
setDT(periods)[, (cols) := lapply(.SD, lubridate::dmy_hms), .SDcols = cols]
periods
ID Start.Time End.Time
1: 1 2017-01-01 00:15:00 2017-01-01 07:15:00
2: 2 2017-01-01 04:45:00 2017-01-01 06:15:00
3: 3 2017-01-01 10:20:00 2017-01-01 20:15:00
4: 4 2017-01-01 02:15:00 2017-01-01 00:15:00
5: 5 2017-01-02 15:15:00 2017-01-03 00:30:00
6: 6 2017-01-03 07:00:00 2017-01-04 09:15:00
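As a cross-check on the non-equi join, the same count can be sketched in plain base R by testing every grid point against every interval. This is a brute-force illustration rather than the answer's method. The data frame is rebuilt by hand from the sample rows, and `tz = "UTC"` is an assumption.

```r
# Sample intervals rebuilt by hand from the question (dates read as day/month/year).
periods <- data.frame(
  ID = 1:6,
  Start.Time = as.POSIXct(c("2017-01-01 00:15:00", "2017-01-01 04:45:00",
                            "2017-01-01 10:20:00", "2017-01-01 02:15:00",
                            "2017-01-02 15:15:00", "2017-01-03 07:00:00"),
                          tz = "UTC"),
  End.Time = as.POSIXct(c("2017-01-01 07:15:00", "2017-01-01 06:15:00",
                          "2017-01-01 20:15:00", "2017-01-01 00:15:00",
                          "2017-01-03 00:30:00", "2017-01-04 09:15:00"),
                        tz = "UTC")
)

# 15-minute grid, limited to 4 days as in the answer.
seq15 <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
             as.POSIXct("2017-01-05", tz = "UTC"), by = "15 min")

# For each grid point, count intervals that have started but not yet finished.
count <- vapply(seq_along(seq15), function(i) {
  sum(periods$Start.Time <= seq15[i] & periods$End.Time > seq15[i])
}, integer(1))

result <- data.frame(time = seq15, count = count)
head(result)
```

This scales as grid points times intervals, so for a full year with many items the non-equi join is likely the better choice.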
I have a dataframe where I split the datetime column into date and time (two columns). However, when I group by time it gives me duplicates in time. So, to analyze it, I used table() on the time column, and it gave me duplicates as well. This is a sample of it:
> table(df$time)
00:00:00 00:00:00 00:15:00 00:15:00 00:30:00 00:30:00
2211 1047 2211 1047 2211 1047
As you can see, after the split one of the "unique" values kept a " " inside. Is there an easy way to solve this?
PS: The datatype of the time column is character.
EDIT: Code added
df$datetime <- as.character.Date(df$datetime)
x <- colsplit(df$datetime, ' ', names = c('Date','Time'))
df <- cbind(df, x)
There are a number of approaches. One of them is to use the appropriate functions to extract dates and times from the datetime column:
df <- data.frame(datetime = seq(
from=as.POSIXct("2018-5-15 0:00", tz="UTC"),
to=as.POSIXct("2018-5-16 24:00", tz="UTC"),
by="30 min") )
head(df$datetime)
#[1] "2018-05-15 00:00:00 UTC" "2018-05-15 00:30:00 UTC" "2018-05-15 01:00:00 UTC" "2018-05-15 01:30:00 UTC"
#[5] "2018-05-15 02:00:00 UTC" "2018-05-15 02:30:00 UTC"
df$Date <- as.Date(df$datetime)
df$Time <- format(df$datetime,"%H:%M:%S")
head(df)
# datetime Date Time
# 1 2018-05-15 00:00:00 2018-05-15 00:00:00
# 2 2018-05-15 00:30:00 2018-05-15 00:30:00
# 3 2018-05-15 01:00:00 2018-05-15 01:00:00
# 4 2018-05-15 01:30:00 2018-05-15 01:30:00
# 5 2018-05-15 02:00:00 2018-05-15 02:00:00
# 6 2018-05-15 02:30:00 2018-05-15 02:30:00
table(df$Time)
#00:00:00 00:30:00 01:00:00 01:30:00 02:00:00 02:30:00 03:00:00 03:30:00 04:00:00 04:30:00 05:00:00 05:30:00
#3 2 2 2 2 2 2 2 2 2 2 2
#06:00:00 06:30:00 07:00:00 07:30:00 08:00:00 08:30:00 09:00:00 09:30:00 10:00:00 10:30:00 11:00:00 11:30:00
#2 2 2 2 2 2 2 2 2 2 2 2
#12:00:00 12:30:00 13:00:00 13:30:00 14:00:00 14:30:00 15:00:00 15:30:00 16:00:00 16:30:00 17:00:00 17:30:00
#2 2 2 2 2 2 2 2 2 2 2 2
#18:00:00 18:30:00 19:00:00 19:30:00 20:00:00 20:30:00 21:00:00 21:30:00 22:00:00 22:30:00 23:00:00 23:30:00
#2 2 2 2 2 2 2 2 2 2 2 2
#If the data were given as character strings and contain extra spaces the above approach will still work
df <- data.frame(datetime=c("2018-05-15 00:00:00","2018-05-15 00:30:00",
"2018-05-15 01:00:00", "2018-05-15 02:00:00",
"2018-05-15 00:00:00","2018-05-15 00:30:00"),
stringsAsFactors=FALSE)
df$Date <- as.Date(df$datetime)
df$Time <- format(as.POSIXct(df$datetime, tz="UTC"),"%H:%M:%S")
head(df)
# datetime Date Time
# 1 2018-05-15 00:00:00 2018-05-15 00:00:00
# 2 2018-05-15 00:30:00 2018-05-15 00:30:00
# 3 2018-05-15 01:00:00 2018-05-15 01:00:00
# 4 2018-05-15 02:00:00 2018-05-15 02:00:00
# 5 2018-05-15 00:00:00 2018-05-15 00:00:00
# 6 2018-05-15 00:30:00 2018-05-15 00:30:00
table(df$Time)
#00:00:00 00:30:00 01:00:00 02:00:00
# 2 2 1 1
reshape2::colsplit accepts regular expressions, so you could split on '\s+' which matches 1 or more whitespace characters.
You can find out more about regular expressions in R using ?base::regex. The syntax is generally constant between languages, so you can use pretty much any regex tutorial. Take a look at https://regex101.com/. This site evaluates your regular expressions in real time and shows you exactly what each part is matching. It is extremely helpful!
Keep in mind that in R, as compared to most other languages, you must double the number of backslashes \. So \s (to match 1 whitespace character) must be written as \\s in R.
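The effect of splitting on `\\s+` rather than on a single space can be sketched with base R's `strsplit()`, used here as a stand-in so the example has no package dependency. The sample strings are made up, and the second one deliberately contains two spaces.

```r
# Made-up sample; the second string has two spaces between date and time.
datetime <- c("2018-05-15 00:00:00", "2018-05-15  00:00:00", "2018-05-15 00:15:00")

# Splitting on one literal space leaves an empty field for the second string.
single <- strsplit(datetime, " ", fixed = TRUE)
lengths(single)

# "\\s+" matches one or more whitespace characters, so every string
# yields exactly two fields.
parts <- do.call(rbind, strsplit(datetime, "\\s+"))
colnames(parts) <- c("Date", "Time")
table(parts[, "Time"])
```

With the regular expression, both spellings of midnight are counted under the same `00:00:00` value.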
All,
I have a data file with two columns: one is a time series, the other holds values. Normally the time interval between two rows is exactly 5 minutes, but sometimes it is larger than 5 minutes.
A sample is as below:
dd <- data.table(date = c("2015-07-01 00:00:00", "2015-07-01 00:05:00", "2015-07-01 00:20:00","2015-07-01 00:25:00","2015-07-01 00:30:00"),
value = c(9,1,10,12,0))
What I want to do is check the time interval between two rows and, when it is larger than 5 minutes, insert new rows with value 0 to fill the gap, so the result would be:
date value
2015-07-01 00:00:00 9
2015-07-01 00:05:00 1
2015-07-01 00:10:00 0
2015-07-01 00:15:00 0
2015-07-01 00:20:00 10
2015-07-01 00:25:00 12
2015-07-01 00:30:00 0
Any suggestions and ideas are welcome :)
We can do a join after converting 'date' to the POSIXct class
dd[, date := as.POSIXct(date)][]
dd[dd[, .(date=seq(min(date), max(date), by = "5 min"))], on = 'date'
][is.na(value), value := 0][]
# date value
#1: 2015-07-01 00:00:00 9
#2: 2015-07-01 00:05:00 1
#3: 2015-07-01 00:10:00 0
#4: 2015-07-01 00:15:00 0
#5: 2015-07-01 00:20:00 10
#6: 2015-07-01 00:25:00 12
#7: 2015-07-01 00:30:00 0
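The same gap filling can be sketched in base R with a left merge against a complete 5-minute grid. This is an alternative illustration, not the data.table join itself. The names `grid` and `filled` are hypothetical, and `tz = "UTC"` is an assumption.

```r
# Sample data from the question as a plain data frame; tz = "UTC" is an assumption.
dd <- data.frame(
  date = as.POSIXct(c("2015-07-01 00:00:00", "2015-07-01 00:05:00",
                      "2015-07-01 00:20:00", "2015-07-01 00:25:00",
                      "2015-07-01 00:30:00"), tz = "UTC"),
  value = c(9, 1, 10, 12, 0)
)

# Complete 5-minute grid from the first to the last timestamp.
grid <- data.frame(date = seq(min(dd$date), max(dd$date), by = "5 min"))

# Keep every grid row; timestamps missing from dd get NA, then 0.
filled <- merge(grid, dd, by = "date", all.x = TRUE)
filled$value[is.na(filled$value)] <- 0
filled
```

Naming the key explicitly with `by = "date"` keeps the merge from matching on any other shared column.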
I am trying to fill in the gaps in one of my time series by merging a full day time series into my original time series. But for some reason I get duplicate entries and all the rest of my data is NA.
My data looks like this:
> head(data)
TIME Water_Temperature
1 2016-08-22 00:00:00 81.000
2 2016-08-22 00:01:00 80.625
3 2016-08-22 00:02:00 85.000
4 2016-08-22 00:03:00 80.437
5 2016-08-22 00:04:00 85.000
6 2016-08-22 00:05:00 80.375
> tail(data)
TIME Water_Temperature
1398 2016-08-22 23:54:00 19.5
1399 2016-08-22 23:55:00 19.5
1400 2016-08-22 23:56:00 19.5
1401 2016-08-22 23:57:00 19.5
1402 2016-08-22 23:58:00 19.5
1403 2016-08-22 23:59:00 19.5
In between are some minutes missing (1403 rows instead of 1440). I tried to fill them in using:
data.length <- length(data$TIME)
time.min <- data$TIME[1]
time.max <- data$TIME[data.length]
all.dates <- seq(time.min, time.max, by="min")
all.dates.frame <- data.frame(list(TIME=all.dates))
merged.data <- merge(all.dates.frame, data, all=T)
But that gives me a result of 1449 rows instead of 1440. The first nine minutes (00:00 to 00:08) are duplicated in the time stamp column, and all other values in Water_Temperature are NA. It looks like this:
> merged.data[1:25,]
TIME Water_Temperature
1 2016-08-22 00:00:00 NA
2 2016-08-22 00:00:00 81.000
3 2016-08-22 00:01:00 NA
4 2016-08-22 00:01:00 80.625
5 2016-08-22 00:02:00 NA
6 2016-08-22 00:02:00 85.000
7 2016-08-22 00:03:00 NA
8 2016-08-22 00:03:00 80.437
9 2016-08-22 00:04:00 NA
10 2016-08-22 00:04:00 85.000
11 2016-08-22 00:05:00 NA
12 2016-08-22 00:05:00 80.375
13 2016-08-22 00:06:00 NA
14 2016-08-22 00:06:00 80.812
15 2016-08-22 00:07:00 NA
16 2016-08-22 00:07:00 80.812
17 2016-08-22 00:08:00 NA
18 2016-08-22 00:08:00 80.937
19 2016-08-22 00:09:00 NA
20 2016-08-22 00:10:00 NA
21 2016-08-22 00:11:00 NA
22 2016-08-22 00:12:00 NA
23 2016-08-22 00:13:00 NA
24 2016-08-22 00:14:00 NA
25 2016-08-22 00:15:00 NA
> tail(merged.data)
TIME Water_Temperature
1444 2016-08-22 23:54:00 NA
1445 2016-08-22 23:55:00 NA
1446 2016-08-22 23:56:00 NA
1447 2016-08-22 23:57:00 NA
1448 2016-08-22 23:58:00 NA
1449 2016-08-22 23:59:00 NA
Does anyone have an idea what's going wrong?
EDIT:
I am now using the xts and zoo packages to do the job:
library(xts)
library(zoo)
df1.zoo<-zoo(data[,-1],data[,1])
df2 <- as.data.frame(as.zoo(merge(as.xts(df1.zoo), as.xts(zoo(,seq(start(df1.zoo),end(df1.zoo),by="min"))))))
Very easy and effective!
Instead of merge, use rbind, which gives you an irregular time series without NAs to start with. If you really want a regular time series with a frequency of, say, 1 minute, you can build a time-based sequence as an index, merge it with your data (after using rbind), and fill the resulting NAs with na.locf. Hope this helps.
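The merge-then-fill step described above might look roughly like this with the zoo package. It is a sketch under assumptions: the series is a made-up five-point excerpt with minutes 00:03 and 00:04 missing, `tz = "UTC"` is assumed, and the names `z`, `full_index`, `regular` and `filled` are hypothetical.

```r
library(zoo)

# Made-up excerpt: minutes 0, 1, 2, 5 and 6 are present, 3 and 4 are missing.
times <- as.POSIXct("2016-08-22 00:00:00", tz = "UTC") + 60 * c(0, 1, 2, 5, 6)
z <- zoo(c(81.000, 80.625, 85.000, 80.375, 80.812), order.by = times)

# Regular one-minute index spanning the series.
full_index <- seq(start(z), end(z), by = "min")

# Merging with a zero-width zoo object adds the missing timestamps as NA.
regular <- merge(z, zoo(, full_index))

# Carry the last observation forward into the gaps.
filled <- na.locf(regular)
filled
```

Whether carrying the last value forward is appropriate depends on the measurement; for a temperature series, interpolation with `na.approx()` from the same package is another option.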
You can try merging with full_join from the tidyverse.
This works for me with a list of two data frames (daily values) sharing a column named Date.
library(dplyr)  # full_join() and %>%
library(purrr)  # reduce()
big_data <- my_data %>%
reduce(full_join, by="Date")