I have two fields in a data frame that are of the class "times". Call them Time1 and Time2. I am trying to find the time difference between the two.
CombinedFrame2$Duration <- difftime(CombinedFrame2$Time1, CombinedFrame2$Time2)
Error in as.POSIXct.numeric(CombinedFrame2$Time1) :
'origin' must be supplied
How do I get the classes to cooperate to do the calculation?
Example:
Time1 Time2 Duration
5:30:00 6:24:00 0:54:00
$ Time1 : POSIXlt, format: "2019-07-10 16:07:00" "2019-07-10 22:05:00" "2019-07-10 22:20:00" "2019-07-10 22:43:00" ...
$ Time2 : POSIXlt, format: "2019-07-10 22:05:00" "2019-07-10 22:20:00" "2019-07-10 22:43:00" "2019-07-10 23:15:00" ...
> dput(head(CombinedFrame2[,c("Time1", "Time2")]))
structure(list(Time1 = structure(list(sec = c(0, 0, 0, 0,
0, 0), min = c(7L, 5L, 20L, 43L, 15L, 35L), hour = c(16L, 22L,
22L, 22L, 23L, 23L), mday = c(11L, 11L, 11L, 11L, 11L, 11L),
mon = c(6L, 6L, 6L, 6L, 6L, 6L), year = c(119L, 119L, 119L,
119L, 119L, 119L), wday = c(4L, 4L, 4L, 4L, 4L, 4L), yday = c(191L,
191L, 191L, 191L, 191L, 191L), isdst = c(1L, 1L, 1L, 1L,
1L, 1L), zone = c("EDT", "EDT", "EDT", "EDT", "EDT", "EDT"
), gmtoff = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_)), class = c("POSIXlt", "POSIXt"
)), Time2 = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(5L,
20L, 43L, 15L, 35L, 55L), hour = c(22L, 22L, 22L, 23L, 23L, 23L
), mday = c(11L, 11L, 11L, 11L, 11L, 11L), mon = c(6L, 6L, 6L,
6L, 6L, 6L), year = c(119L, 119L, 119L, 119L, 119L, 119L), wday = c(4L,
4L, 4L, 4L, 4L, 4L), yday = c(191L, 191L, 191L, 191L, 191L, 191L
), isdst = c(1L, 1L, 1L, 1L, 1L, 1L), zone = c("EDT", "EDT",
"EDT", "EDT", "EDT", "EDT"), gmtoff = c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_)), class = c("POSIXlt",
"POSIXt"))), row.names = c("1:1", "1:2", "1:3", "1:4", "1:5",
"1:6"), class = "data.frame")
You need to make sure that your times are parsed into a date-time class first. See the code below.
You can use strptime() to parse your time strings as hours, minutes, and seconds.
time1 <- "5:30:00"
time2 <- "6:24:00"
time1a <- strptime(time1,format="%H:%M:%S")
time2a <- strptime(time2,format="%H:%M:%S")
duration <- difftime(time2a,time1a)
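As a minimal sketch building on the answer above, difftime() can be given explicit units so the result does not depend on the automatically chosen unit, and the seconds can then be formatted back into the H:MM:SS layout shown in the question. The tz = "UTC" argument and the variable names `secs` and `hms` are assumptions added here for illustration.

```r
# Parse both clock times; tz = "UTC" is an assumption that avoids DST effects
time1a <- strptime("5:30:00", format = "%H:%M:%S", tz = "UTC")
time2a <- strptime("6:24:00", format = "%H:%M:%S", tz = "UTC")

# Ask for a fixed unit instead of letting difftime() pick one
duration <- difftime(time2a, time1a, units = "mins")
as.numeric(duration)
# 54

# Convert to seconds and format as H:MM:SS
secs <- as.numeric(duration, units = "secs")
hms <- sprintf("%d:%02d:%02d", secs %/% 3600, (secs %% 3600) %/% 60, secs %% 60)
hms
# "0:54:00"
```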
I am trying to use the aggregate function to get 100 Hz data into 1 minute averages. However, when I use this function the 1-min averages are incorrect. A sample of the data is below. I am using the following code to calculate the 1-min values. The code does not break but the calculations are incorrect.
aggregate(list(X = df$`Gyroscope X`,
Y = df$`Gyroscope Y`,
Z = df$`Gyroscope Z`),
list(minofday = cut(df$Timestamp, "1 min")),mean)
Timestamp Gyroscope X Gyroscope Y Gyroscope Z
2018-07-10T10:25:00.0000000 41.381838 -21.667482 -118.896492
2018-07-10T10:25:00.0100000 48.046268 -12.399903 -110.917976
2018-07-10T10:25:00.0200000 49.102786 -7.36084 -106.485602
2018-07-10T10:25:00.0300000 44.338382 -9.215699 -102.296759
2018-07-10T10:25:00.0400000 34.724123 -11.308594 -96.108404
2018-07-10T10:25:00.0500000 19.622804 -15.225221 -88.122564
2018-07-10T10:25:00.0600000 13.240968 -26.539308 -85.274663
2018-07-10T10:25:00.0700000 13.397218 -31.933596 -80.127568
2018-07-10T10:25:00.0800000 16.333009 -29.663088 -73.027348
2018-07-10T10:25:00.0900000 17.384645 -29.745485 -67.694096
2018-07-10T10:25:00.1000000 16.546632 -30.08423 -67.565922
Assuming OP's data varies by the minute (note the modified data), here is how to do it with base R and dplyr:
df$Timestamp <- as.POSIXct(df$Timestamp, format = "%Y-%m-%dT%H:%M:%S")
aggregate(list(X = df$Gyroscope_X,
Y = df$Gyroscope_Y,
Z = df$Gyroscope_Z),
list(minofday = cut(df$Timestamp, "1 min")), mean)
or a more concise way:
aggregate(. ~ minofday, mean, data = cbind(setNames(df[,-1], c("X", "Y", "Z")),
minofday = cut(df$Timestamp, "1 min")))
Result:
minofday X Y Z
1 2018-07-10 10:24:00 48.57453 -9.880371 -108.70179
2 2018-07-10 10:25:00 27.78422 -19.314983 -95.13774
3 2018-07-10 10:26:00 16.85883 -29.704286 -70.36072
4 2018-07-10 10:27:00 16.54663 -30.084230 -67.56592
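A small variation, offered only as a sketch: the timestamps in the question carry fractional seconds, which "%OS" parses on input, and the minute bucket can also be built with format() instead of cut(). The three rows below are a subset of the modified data, and the names `ts`, `val`, `minute` and `res` are placeholders introduced here.

```r
# Three rows taken from the modified sample data (Gyroscope_X only)
ts  <- c("2018-07-10T10:24:00.0100000",
         "2018-07-10T10:24:00.0200000",
         "2018-07-10T10:25:00.0300000")
val <- c(48.046268, 49.102786, 44.338382)

# "%OS" reads the fractional seconds; tz = "UTC" is an assumption
parsed <- as.POSIXct(ts, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

# Build a per-minute key by dropping the seconds, then average within each key
minute <- format(parsed, "%Y-%m-%d %H:%M")
res <- aggregate(val, list(minofday = minute), mean)
res
```

The first group's mean agrees with the X value for 10:24 in the result above.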
With lubridate and summarize_all from dplyr:
library(dplyr)
library(lubridate)
df %>%
mutate(Timestamp = ymd_hms(Timestamp)) %>%
group_by(minofday = cut(Timestamp, "1 min")) %>%
summarize_all(mean) %>%
select(-Timestamp)
Result:
# A tibble: 4 x 4
minofday Gyroscope_X Gyroscope_Y Gyroscope_Z
<fct> <dbl> <dbl> <dbl>
1 2018-07-10 10:24:00 48.6 -9.88 -109.
2 2018-07-10 10:25:00 27.8 -19.3 -95.1
3 2018-07-10 10:26:00 16.9 -29.7 -70.4
4 2018-07-10 10:27:00 16.5 -30.1 -67.6
Data:
df <- read.table(text = " Timestamp Gyroscope_X Gyroscope_Y Gyroscope_Z
2018-07-10T10:25:00.0000000 41.381838 -21.667482 -118.896492
2018-07-10T10:24:00.0100000 48.046268 -12.399903 -110.917976
2018-07-10T10:24:00.0200000 49.102786 -7.36084 -106.485602
2018-07-10T10:25:00.0300000 44.338382 -9.215699 -102.296759
2018-07-10T10:25:00.0400000 34.724123 -11.308594 -96.108404
2018-07-10T10:25:00.0500000 19.622804 -15.225221 -88.122564
2018-07-10T10:25:00.0600000 13.240968 -26.539308 -85.274663
2018-07-10T10:25:00.0700000 13.397218 -31.933596 -80.127568
2018-07-10T10:26:00.0800000 16.333009 -29.663088 -73.027348
2018-07-10T10:26:00.0900000 17.384645 -29.745485 -67.694096
2018-07-10T10:27:00.1000000 16.546632 -30.08423 -67.565922", header = TRUE)
Since you are dealing with timestamps, the xts package has a lot of functions that can help you. For rolling up timestamps, period.apply can help you out. The endpoints part can roll up the data from microseconds all the way up to years.
library(xts)
# leave the Timestamp column out of the data; it goes to order.by instead
df1_xts <- xts(df1[, -1], order.by = df1$Timestamp)
# roll up to minutes
period.apply(df1_xts, endpoints(df1_xts, on = "mins"), colMeans)
Gyroscope_X Gyroscope_Y Gyroscope_Z
2018-07-10 10:25:00 28.55624 -20.46759 -90.59249
If your timestamp column is not yet a date-time object, you can use this:
df1$Timestamp <- strptime(df1$Timestamp, format = "%Y-%m-%dT%H:%M:%OS")
data:
df1 <- structure(list(Timestamp = structure(list(sec = c(0, 0.01, 0.02,
0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1), min = c(25L,
25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L), hour = c(10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), mday = c(10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), mon = c(6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), year = c(118L, 118L,
118L, 118L, 118L, 118L, 118L, 118L, 118L, 118L, 118L), wday = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), yday = c(190L, 190L,
190L, 190L, 190L, 190L, 190L, 190L, 190L, 190L, 190L), isdst = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), zone = c("CEST", "CEST",
"CEST", "CEST", "CEST", "CEST", "CEST", "CEST", "CEST", "CEST",
"CEST"), gmtoff = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_)), class = c("POSIXlt", "POSIXt")),
Gyroscope_X = c(41.381838, 48.046268, 49.102786, 44.338382,
34.724123, 19.622804, 13.240968, 13.397218, 16.333009, 17.384645,
16.546632), Gyroscope_Y = c(-21.667482, -12.399903, -7.36084,
-9.215699, -11.308594, -15.225221, -26.539308, -31.933596,
-29.663088, -29.745485, -30.08423), Gyroscope_Z = c(-118.896492,
-110.917976, -106.485602, -102.296759, -96.108404, -88.122564,
-85.274663, -80.127568, -73.027348, -67.694096, -67.565922
)), row.names = c(NA, -11L), class = "data.frame")
I have two columns of POSIXlt times with no NA values, yet NA values show up when I check:
> sum(is.na(check$start))
[1] 19
> sum(is.na(check$end))
[1] 23
The data is present in the cells, so why does this happen? I have heard that this can happen with POSIXlt, but even when I convert the columns to POSIXct the behavior is very strange. How does one go about solving this?
> as.POSIXct(check$start, format = "%Y-%m-%d %H:%M:%S", tz = "CST6CDT")
[1] NA "2014-03-09 01:35:01 CST" NA "2014-03-09 01:53:30 CST" NA
[6] NA NA NA NA "2014-03-09 04:17:11 CDT"
[11] NA NA "2015-03-08 01:54:43 CST" NA NA
[16] NA NA NA NA NA
[21] NA NA NA
> dput(check)
structure(list(start = structure(list(sec = c(24, 1, 27, 30,
8, 21, 40, 9, 43, 11, 31, 43, 43, 55, 39, 54, 41, 19, 2, 35,
6, 54, 40), min = c(45L, 35L, 14L, 53L, 36L, 37L, 47L, 48L, 54L,
17L, 57L, 53L, 54L, 3L, 52L, 22L, 34L, 28L, 41L, 42L, 52L, 52L,
53L), hour = c(2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), mday = c(9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L), mon = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
year = c(114L, 114L, 114L, 114L, 114L, 114L, 114L, 114L,
114L, 114L, 114L, 115L, 115L, 115L, 115L, 115L, 115L, 115L,
115L, 115L, 115L, 115L, 115L), wday = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), yday = c(67L, 67L, 67L, 67L, 67L, 67L, 67L,
67L, 67L, 67L, 67L, 66L, 66L, 66L, 66L, 66L, 66L, 66L, 66L,
66L, 66L, 66L, 66L), isdst = c(-1L, 0L, -1L, 0L, -1L, -1L,
-1L, -1L, -1L, 1L, -1L, -1L, 0L, -1L, -1L, -1L, -1L, -1L,
-1L, -1L, -1L, -1L, -1L), zone = c("", "CST", "", "CST",
"", "", "", "", "", "CDT", "", "", "CST", "", "", "", "",
"", "", "", "", "", ""), gmtoff = c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_)), .Names = c("sec", "min", "hour", "mday", "mon",
"year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt",
"POSIXt"), tzone = c("CST6CDT", "CST", "CDT")), end = structure(list(
sec = c(7, 59, 38, 45, 29, 46, 39, 14, 52, 29, 37, 5, 23,
41, 10, 43, 46, 46, 53, 24, 57, 13, 51), min = c(55L, 47L,
30L, 2L, 43L, 51L, 53L, 56L, 54L, 54L, 57L, 56L, 6L, 3L,
13L, 29L, 37L, 32L, 48L, 47L, 55L, 55L, 55L), hour = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), mday = c(9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L), mon = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
year = c(114L, 114L, 114L, 114L, 114L, 114L, 114L, 114L,
114L, 114L, 114L, 115L, 115L, 115L, 115L, 115L, 115L, 115L,
115L, 115L, 115L, 115L, 115L), wday = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), yday = c(67L, 67L, 67L, 67L, 67L, 67L, 67L,
67L, 67L, 67L, 67L, 66L, 66L, 66L, 66L, 66L, 66L, 66L, 66L,
66L, 66L, 66L, 66L), isdst = c(-1L, -1L, -1L, -1L, -1L, -1L,
-1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L,
-1L, -1L, -1L, -1L, -1L), zone = c("", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", ""), gmtoff = c(NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
)), .Names = c("sec", "min", "hour", "mday", "mon", "year",
"wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt",
"POSIXt"), tzone = c("CST6CDT", "CST", "CDT"))), .Names = c("start",
"end"), row.names = c(1559963L, 1560092L, 1560157L, 1560220L,
1560240L, 1560247L, 1560252L, 1560253L, 1560255L, 1560258L, 1560260L,
2004432L, 2004583L, 2004591L, 2004594L, 2004596L, 2004598L, 2004599L,
2004600L, 2004603L, 2004609L, 2004610L, 2004611L), class = "data.frame")
How does is.na work in this context?
> is.na.POSIXlt
function (x)
is.na(as.POSIXct(x))
<bytecode: 0x0000000014232980>
How does as.POSIXct behave here?
> as.POSIXct(check$start)
[1] NA "2014-03-09 01:35:01 CST" NA "2014-03-09 01:53:30 CST"
[5] NA NA NA NA
[9] NA "2014-03-09 04:17:11 CDT" NA NA
[13] "2015-03-08 01:54:43 CST" NA NA NA
[17] NA NA NA NA
[21] NA NA NA
Ok, but WHY ????
Let's check the doc of as.POSIXct:
Any conversion that needs to go between the two date-time classes
requires a time zone: conversion from "POSIXlt" to "POSIXct" will
validate times in the selected time zone. One issue is what happens at
transitions to and from DST, for example in the UK
Let's see:
> check$start$zone
[1] "" "CST" "" "CST" "" "" "" "" "" "CDT" "" "" "CST" "" "" "" "" "" "" ""
[21] "" "" ""
And here be dragons: there is no time zone except for 4 entries, so as.POSIXct can't tell whether the dates are valid (within the DST change or not?), as you can see with:
> check$start$isdst
[1] -1 0 -1 0 -1 -1 -1 -1 -1 1 -1 -1 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
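So the conversion from POSIXlt (your data frame) to POSIXct cannot validate those times and returns NA. Note that every affected entry has hour 2 on 2014-03-09 or 2015-03-08, which is the hour skipped when CST6CDT switches to daylight saving time, so those local times do not exist in that zone.
One possible fix is to enforce a time zone on all records: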
> check$start <- as.POSIXlt(strftime(check$start,tz="CST"),tz="CST6CDT")
> is.na(check$start)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
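Another option, sketched here under a stated assumption rather than as a recommendation: if the wall-clock values only need to be compared with each other, parsing them in UTC avoids local-time validation altogether, because UTC has no DST transitions. The three strings are taken from the start column in the question; whether treating them as UTC is acceptable depends on how the data was recorded.

```r
# Wall-clock strings from the question's data (hours 1, 2 and 4)
stamps <- c("2014-03-09 01:35:01", "2014-03-09 02:45:24", "2014-03-09 04:17:11")

# Assumption: interpreting these as UTC is acceptable for the analysis
utc <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# No value is rejected, including the one in hour 2
is.na(utc)

# Differences are then plain elapsed wall-clock time
gap <- difftime(utc[3], utc[1], units = "mins")
gap
```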
There is plenty of material on Stack Overflow about calculating time differences between rows/entries/observations. However, I'm stumped as to why I'm getting NAs in unusual positions.
I have 3 columns: DATETIME, which is POSIXlt; GRP800, which is the group (factor); and TIME800, which is supposed to represent the time elapsed between each observation for each group. My particular code was derived from the question "Calculate differences between rows faster than a for loop?".
df$TIME800<-unlist(by(df$DATETIME,df$GRP800,function(x)c(NA,diff(x))))
It does appear to function properly for the first group, but then I am getting NAs in the middle of the 2nd group. I've tried several approaches using diff and they all produce identical output. I'm quite puzzled. Any advice would be greatly appreciated.
DATETIME GRP800 TIME800
1 2013-07-16 16:01:30 1 NA
2 2013-07-16 20:00:54 1 3.990000
3 2013-07-17 00:01:30 1 4.010000
4 2013-07-17 04:01:00 1 3.991667
5 2013-07-17 08:00:50 1 3.997222
6 2013-07-17 12:01:46 1 4.015556
7 2013-07-17 16:00:50 1 3.984444
8 2013-07-17 20:01:00 1 4.002778
9 2013-07-18 00:01:18 1 4.005000
10 2013-07-18 04:01:02 1 3.995556
11 2013-07-18 08:00:50 1 3.996667
12 2013-07-18 12:01:18 2 NA
13 2013-07-18 16:01:02 2 3.970833
14 2013-07-18 20:00:59 2 4.007500
15 2013-07-19 00:01:31 2 3.997222
16 2013-07-19 04:01:18 2 4.011111
17 2013-07-19 08:01:02 2 NA
18 2013-07-19 12:01:57 2 2.007500
19 2013-07-19 20:01:00 2 NA
20 2013-07-20 00:01:00 2 2.003333
> dput(df[1:20,])
structure(list(DATETIME = structure(list(sec = c(30, 54, 30,
0, 50, 46, 50, 0, 18, 2, 50, 18, 2, 59, 31, 18, 2, 57, 0, 0),
min = c(1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L,
0L, 1L, 1L, 1L, 1L, 1L, 1L), hour = c(16L, 20L, 0L, 4L, 8L,
12L, 16L, 20L, 0L, 4L, 8L, 12L, 16L, 20L, 0L, 4L, 8L, 12L,
20L, 0L), mday = c(16L, 16L, 17L, 17L, 17L, 17L, 17L, 17L,
18L, 18L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 20L
), mon = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), year = c(113L, 113L, 113L,
113L, 113L, 113L, 113L, 113L, 113L, 113L, 113L, 113L, 113L,
113L, 113L, 113L, 113L, 113L, 113L, 113L), wday = c(2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 6L), yday = c(196L, 196L, 197L, 197L, 197L, 197L,
197L, 197L, 198L, 198L, 198L, 198L, 198L, 198L, 199L, 199L,
199L, 199L, 199L, 200L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
zone = c("MDT", "MDT", "MDT", "MDT", "MDT", "MDT", "MDT",
"MDT", "MDT", "MDT", "MDT", "MDT", "MDT", "MDT", "MDT", "MDT",
"MDT", "MDT", "MDT", "MDT"), gmtoff = c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_)), .Names = c("sec",
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst",
"zone", "gmtoff"), class = c("POSIXlt", "POSIXt")), GRP800 = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), TIME800 = c(NA, 3.99, 4.01, 3.991666667, 3.997222222,
4.015555556, 3.984444444, 4.002777778, 4.005, 3.995555556, 3.996666667,
NA, 3.970833333, 4.0075, 3.997222222, 4.011111111, NA, 2.0075,
NA, 2.003333333)), .Names = c("DATETIME", "GRP800", "TIME800"
), row.names = c(NA, 20L), class = "data.frame")
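One way to sidestep both the list-based POSIXlt structure and the automatically chosen difftime units is to work on POSIXct seconds and divide by 3600 explicitly. This is only a sketch on four rows copied from the table above; tz = "UTC" and the use of ave() are choices made here, and it is not established that either of these points is the cause of the NAs in the question.

```r
# Four rows from the question's table, two per group
df <- data.frame(
  DATETIME = as.POSIXct(c("2013-07-16 16:01:30", "2013-07-16 20:00:54",
                          "2013-07-18 12:01:18", "2013-07-18 16:01:02"),
                        tz = "UTC"),
  GRP800 = c(1L, 1L, 2L, 2L)
)

# Seconds since the epoch, differenced within each group, expressed in hours
df$TIME800 <- ave(as.numeric(df$DATETIME), df$GRP800,
                  FUN = function(s) c(NA, diff(s) / 3600))
df
```

Because ave() returns values in the original row order, no unlist() or re-sorting step is needed.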
Here is a sample of the data I'm currently working on:
x <- structure(list(sec = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
min = c(59L, 32L, 23L, 26L, 20L, 14L, 1L, 5L, 32L, 2L),
hour = c(10L, 15L, 12L, 12L, 16L, 18L, 18L, 9L, 14L, 12L),
mday = c(9L, 15L, 2L, 15L, 20L, 26L, 11L, 22L, 9L, 16L),
mon = c(4L, 11L, 10L, 7L, 9L, 8L, 10L, 8L, 8L, 4L),
year = c(111L, 111L, 111L, 111L, 111L, 111L, 111L, 111L, 111L, 111L),
wday = c(1L, 4L, 3L, 1L, 4L, 1L, 5L, 4L, 5L, 1L),
yday = c(128L, 348L, 305L, 226L, 292L, 268L, 314L, 264L, 251L, 135L),
isdst = c(0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L)),
.Names = c("sec", "min", "hour", "mday", "mon", "year",
"wday", "yday", "isdst"),
class = c("POSIXlt", "POSIXt"))
So that
> x
[1] "2011-05-09 10:59:00" "2011-12-15 15:32:00" "2011-11-02 12:23:00"
[4] "2011-08-15 12:26:00" "2011-10-20 16:20:00" "2011-09-26 18:14:00"
[7] "2011-11-11 18:01:00" "2011-09-22 09:05:00" "2011-09-09 14:32:00"
[10] "2011-05-16 12:02:00"
Say I want to tabulate the distribution of x by month. This is how I accomplish it:
> table(strftime(x, '%m'))
05 08 09 10 11 12
2 1 3 1 2 1
Now I want to do a similar tabulation, but this time I want to group the data by bimester (and possibly by trimester or semester, later on). I've taken a look at the help page for strptime, but couldn't find an appropriate separator.
This is the best I have come up with so far:
> table(cut(x = as.numeric(strftime(x, '%m')),
breaks = c(1, 3, 5, 7, 9, 11, 13),
labels = c('1-2', '3-4', '5-6', '7-8', '9-10', '11-12'),
right = FALSE))
1-2 3-4 5-6 7-8 9-10 11-12
0 0 2 1 4 3
It is a convoluted way of reaching this, but it's OK for a simple example and a single case. However, this approach will give me headaches down the road, since I'll want those data to remain POSIX (not to mention it makes my code scarier than it should be). Is there an elegant solution for this?
If you're sticking with table and vectors (as opposed to having rectangular data/output, in which case I'd use data.table), you could do:
table(2*(x$mon %/% 2) + 1)
#
# 5 7 9 11
# 2 1 4 3
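The same integer-division idea extends to trimesters or semesters by changing the width. The helper `group_months()` below is hypothetical (it is not part of the answer), and `mon` simply copies the 0-based x$mon values from the question so the sketch runs on its own.

```r
# 0-based month numbers, copied from x$mon in the question
mon <- c(4L, 11L, 10L, 7L, 9L, 8L, 10L, 8L, 8L, 4L)

# Hypothetical helper: label each month by the first month of its period
group_months <- function(mon, width) {
  start <- width * (mon %/% width) + 1
  # Fixed levels keep empty periods in the table
  factor(start, levels = seq(1, 12, by = width))
}

bimester  <- table(group_months(mon, 2))
trimester <- table(group_months(mon, 3))
bimester
trimester
```

Using a factor with fixed levels means periods with no observations still appear with a count of 0, which matches the cut()-based output in the question.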
You could do away with using any type of formatting of the date values themselves and just create a lookup vector for your groupings. This would also allow total flexibility in specifying what months fit into what categories. E.g.:
src <- factor(rep(c('01-02','03-04','05-06','07-08','09-10','11-12'),each=2))
src[x$mon+1]
#[1] 05-06 11-12 11-12 07-08 09-10 09-10 11-12 09-10 09-10 05-06
#Levels: 01-02 03-04 05-06 07-08 09-10 11-12
table(src[x$mon+1])
#01-02 03-04 05-06 07-08 09-10 11-12
# 0 0 2 1 4 3