Function: calculating seconds between data points - r
I have the following column in my data frame:
DateTime
1 2011-10-03 08:00:04
2 2011-10-03 08:00:05
3 2011-10-03 08:00:06
4 2011-10-03 08:00:09
5 2011-10-03 08:00:15
6 2011-10-03 08:00:24
7 2011-10-03 08:00:30
8 2011-10-03 08:00:42
9 2011-10-03 08:01:01
10 2011-10-03 08:01:24
11 2011-10-03 08:01:58
12 2011-10-03 08:02:34
13 2011-10-03 08:03:25
14 2011-10-03 08:04:26
15 2011-10-03 08:06:00
With dput:
> dput(smallDF)
structure(list(DateTime = structure(c(1317621604, 1317621605,
1317621606, 1317621609, 1317621615, 1317621624, 1317621630, 1317621642,
1317621661, 1317621684, 1317621718, 1317621754, 1317621805, 1317621866,
1317621960, 1317622103, 1317622197, 1317622356, 1317622387, 1317622463,
1317622681, 1317622851, 1317623061, 1317623285, 1317623404, 1317623498,
1317623612, 1317623849, 1317623916, 1317623994, 1317624174, 1317624414,
1317624484, 1317624607, 1317624848, 1317625023, 1317625103, 1317625179,
1317625200, 1317625209, 1317625229, 1317625238, 1317625249, 1317625264,
1317625282, 1317625300, 1317625315, 1317625339, 1317625353, 1317625365,
1317625371, 1317625381, 1317625395, 1317625415, 1317625423, 1317625438,
1317625458, 1317625469, 1317625487, 1317625500, 1317625513, 1317625533,
1317625548, 1317625565, 1317625581, 1317625598, 1317625613, 1317625640,
1317625661, 1317625674, 1317625702, 1317625715, 1317625737, 1317625758,
1317625784, 1317625811, 1317625826, 1317625841, 1317625862, 1317625895,
1317625909, 1317625935, 1317625956, 1317625973, 1317626001, 1317626043,
1317626062, 1317626100, 1317626113, 1317626132, 1317626153, 1317626179,
1317626212, 1317626239, 1317626271, 1317626296, 1317626323, 1317626361,
1317626384, 1317626407), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = "DateTime", row.names = c(NA,
-100L), class = "data.frame")
My goal: I want to calculate the time difference, in seconds, between each measurement.
Edit:
I'm looking to get the following result, where the time difference (in seconds) between each data point is calculated, except for the first value of the day (line 3), where the time is calculated relative to 8 am:
DateTime Seconds
1 2011-09-30 21:59:02 6
2 2011-09-30 21:59:04 2
3 2011-10-03 08:00:04 4
4 2011-10-03 08:00:05 1
5 2011-10-03 08:00:06 1
6 2011-10-03 08:00:09 3
7 2011-10-03 08:00:15 6
8 2011-10-03 08:00:24 9
9 2011-10-03 08:00:30 6
10 2011-10-03 08:00:42 12
11 2011-10-03 08:01:01 19
12 2011-10-03 08:01:24 23
13 2011-10-03 08:01:58 34
14 2011-10-03 08:02:34 36
15 2011-10-03 08:03:25 51
16 2011-10-03 08:04:26 61
17 2011-10-03 08:06:00 94
However, the measurements start at 8:00 am, so if the value is the first of the day, the number of seconds relative to 8:00 am needs to be calculated. In the example above, the first measurement ends at 8:00:04, so using the $sec attribute of POSIX could work here, but on other days the first value may occur a few minutes after 8:00.
I've tried to achieve that goal with the following function:
SecondsInBar <- function(x, startTime){
# First data point or first of day
if (x == 1 || x > 1 && x$wkday != x[-1]$wkday){
seconds <- as.numeric(difftime(x,
as.POSIXlt(startTime, format = "%H:%M:%S"),
units = "secs"))
# else calculate time difference
} else {
seconds <- as.numeric(difftime(x, x[-1], units = "secs"))
}
return (seconds)
}
Which then could be called with SecondsInBar(smallDF$DateTime, "08:00:00").
There are at least two problems with this function, but I don't know how to solve these:
1. The code segment x$wkday != x[-1]$wkday returns a "$ operator is invalid for atomic vectors" error.
2. The as.POSIXlt(startTime, format = "%H:%M:%S") call uses the current date, which makes the difftime calculation erroneous.
My question:
Where am I going wrong with this function?
And: is this approach a viable way or should I approach it from a different angle?
How about something along these lines:
smallDF$DateTime - as.POSIXct(paste(strftime(smallDF$DateTime,"%Y-%m-%d"),"07:00:00"))
Time differences in secs
[1] 4 5 6 9 15 24 30 42 61 84 118 154 205 266 360
[16] 503 597 756 787 863 1081 1251 1461 1685 1804 1898 2012 2249 2316 2394
[31] 2574 2814 2884 3007 3248 3423 3503 3579 3600 3609 3629 3638 3649 3664 3682
[46] 3700 3715 3739 3753 3765 3771 3781 3795 3815 3823 3838 3858 3869 3887 3900
[61] 3913 3933 3948 3965 3981 3998 4013 4040 4061 4074 4102 4115 4137 4158 4184
[76] 4211 4226 4241 4262 4295 4309 4335 4356 4373 4401 4443 4462 4500 4513 4532
[91] 4553 4579 4612 4639 4671 4696 4723 4761 4784 4807
attr(,"tzone")
[1] ""
Note that I used 7am because, when I copied your data, the times were interpreted as BST.
As for your errors, you can't use $ to get elements of a date with POSIXct (which is how smallDF$DateTime is defined), only with POSIXlt. And for the second error, if you don't supply a date, it has to assume the current date, as there is no other information to draw upon.
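Both points can be checked on a small vector. This is only a minimal sketch: the "UTC" time zone is an assumption made here to keep the results reproducible, and 08:00:00 is the start time from the question. Note also that the weekday field of a POSIXlt object is spelled wday, not wkday.

```r
# Two timestamps from the question, in an assumed UTC time zone
x <- as.POSIXct(c("2011-10-03 08:00:04", "2011-10-03 08:00:05"), tz = "UTC")

# Underneath, POSIXct is a plain numeric vector of seconds since the epoch,
# so there are no named components for `$` to extract
is_plain_numeric <- is.atomic(unclass(x))

# Converting to POSIXlt exposes the components as list elements
lt <- as.POSIXlt(x)
lt$wday   # day of the week, 0 = Sunday; the field is `wday`
lt$sec    # seconds within the minute

# A time-only string is completed with the current date
today_8am <- as.POSIXlt("08:00:00", format = "%H:%M:%S", tz = "UTC")
format(today_8am, "%Y-%m-%d")

# Building the reference time from each timestamp's own date avoids that
ref  <- as.POSIXct(paste(format(x, "%Y-%m-%d"), "08:00:00"), tz = "UTC")
secs <- as.numeric(difftime(x, ref, units = "secs"))
secs
```

Because the reference is derived from the data, the result no longer depends on the day the code is run.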
Edit
Now that it's been clarified, I would propose a different approach: split your data frame by day, prepend the reference time to each day's times, and call diff on the result, using lapply to loop over the days:
#modify dataframe to add extra day to second half
smallDF[51:100,1] <- smallDF[51:100,1]+86400
smallDF2 <- split(smallDF,strftime(smallDF$DateTime,"%Y-%m-%d"))
lapply(smallDF2,function(x) diff(c(as.POSIXct(paste(strftime(x$DateTime[1],"%Y-%m-%d"),"07:00:00")),x$DateTime)))
$`2011-10-03`
Time differences in secs
[1] 4 1 1 3 6 9 6 12 19 23 34 36 51 61 94 143 94 159 31
[20] 76 218 170 210 224 119 94 114 237 67 78 180 240 70 123 241 175 80 76
[39] 21 9 20 9 11 15 18 18 15 24 14 12
$`2011-10-04`
Time differences in secs
[1] 3771 10 14 20 8 15 20 11 18 13 13 20 15 17 16
[16] 17 15 27 21 13 28 13 22 21 26 27 15 15 21 33
[31] 14 26 21 17 28 42 19 38 13 19 21 26 33 27 32
[46] 25 27 38 23 23
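If the result is wanted as a single vector (for example to store as a Seconds column) rather than as one list element per day, the same idea can be written without split. This is a minimal sketch under stated assumptions: the timestamps are sorted, the function name seconds_since_previous is hypothetical, the UTC time zone in the usage lines is chosen for reproducibility, and 08:00:00 is the start time from the question.

```r
# Hypothetical helper: seconds since the previous row, or since `start`
# (an "HH:MM:SS" string) for the first row of each calendar day.
# Assumes `times` is a POSIXct vector sorted in increasing order.
seconds_since_previous <- function(times, start = "08:00:00") {
  tz <- attr(times, "tzone")
  if (is.null(tz)) tz <- ""
  day <- format(times, "%Y-%m-%d")
  # Reference time on each row's own date
  ref <- as.POSIXct(paste(day, start), tz = tz[1])
  # Previous timestamp for every row (the first entry is overwritten below)
  prev <- c(times[1], times[-length(times)])
  # A row is the first of its day if its date differs from the row before
  first_of_day <- c(TRUE, day[-1] != day[-length(day)])
  prev[first_of_day] <- ref[first_of_day]
  as.numeric(difftime(times, prev, units = "secs"))
}

tm <- as.POSIXct(c("2011-10-03 08:00:04", "2011-10-03 08:00:05",
                   "2011-10-03 08:00:09", "2011-10-04 08:02:00"), tz = "UTC")
seconds_since_previous(tm)   # 4 1 4 120
```

With the data from the question this could be used as smallDF$Seconds <- seconds_since_previous(smallDF$DateTime).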
Related
R: How plot negative and positive anomaly (for this data) with ggplot? [duplicate]
I have this df:
x acc
1 1902-01-01 0.782887804
2 1903-01-01 -0.003144199
3 1904-01-01 0.100006276
4 1905-01-01 0.326173392
5 1906-01-01 1.285114692
6 1907-01-01 2.844399973
7 1920-01-01 -0.300232190
8 1921-01-01 1.464389342
9 1922-01-01 0.142638653
10 1923-01-01 -0.020162385
11 1924-01-01 0.361928571
12 1925-01-01 0.616325588
13 1926-01-01 -0.108206003
14 1927-01-01 -0.318441954
15 1928-01-01 -0.267884586
16 1929-01-01 -0.022473777
17 1930-01-01 -0.294452983
18 1931-01-01 -0.654927109
19 1932-01-01 -0.263508341
20 1933-01-01 0.622530992
21 1934-01-01 1.009666043
22 1935-01-01 0.675484421
23 1936-01-01 1.209162008
24 1937-01-01 1.655280986
25 1948-01-01 2.080021785
26 1949-01-01 0.854572563
27 1950-01-01 0.997540963
28 1951-01-01 1.000244163
29 1952-01-01 0.958322941
30 1953-01-01 0.816259474
31 1954-01-01 0.814488644
32 1955-01-01 1.233694537
33 1958-01-01 0.460120970
34 1959-01-01 0.344201474
35 1960-01-01 1.601430139
36 1961-01-01 0.387850967
37 1962-01-01 -0.385954401
38 1963-01-01 0.699355708
39 1964-01-01 0.084519926
40 1965-01-01 0.708964572
41 1966-01-01 1.456280443
42 1967-01-01 1.479412638
43 1968-01-01 1.199000726
44 1969-01-01 0.282942042
45 1970-01-01 -0.181724504
46 1971-01-01 0.012170186
47 1972-01-01 -0.095891043
48 1973-01-01 -0.075384446
49 1974-01-01 -0.156668145
50 1975-01-01 -0.303023258
51 1976-01-01 -0.516027310
52 1977-01-01 -0.826791524
53 1980-01-01 -0.947112221
54 1981-01-01 -1.634878300
55 1982-01-01 -1.955298323
56 1987-01-01 -1.854447550
57 1988-01-01 -1.458955443
58 1989-01-01 -1.256102245
59 1990-01-01 -0.864108585
60 1991-01-01 -1.293373024
61 1992-01-01 -1.049530431
62 1993-01-01 -1.002526230
63 1994-01-01 -0.868783614
64 1995-01-01 -1.081858981
65 1996-01-01 -1.302103374
66 1997-01-01 -1.288048194
67 1998-01-01 -1.455750340
68 1999-01-01 -1.015467069
69 2000-01-01 -0.682789640
70 2001-01-01 -0.811058004
71 2002-01-01 -0.972374057
72 2003-01-01 -0.536505225
73 2004-01-01 -0.518686263
74 2005-01-01 -0.976298621
75 2006-01-01 -0.946429713
I would like to plot this data with column x of df on the x axis and column acc on the y axis, coloured differently for negative and positive values. Is it possible to plot it with ggplot? I tried this code:
ggplot(df,aes(x=x,y=acc))+ geom_linerange(data =df , aes(colour = ifelse(acc <0, "blue", "red")),ymin=min(df),ymax=max(cdf))
but the result is not what I expected. How can I do it?
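Is this what you want? I'm not sure. The helper columns have to be defined before the plot is drawn (year() is assumed to be the one from the lubridate package):
library(ggplot2)
library(lubridate)
df$x=year(as.Date(df$x))
df$ystart=0
df$col=ifelse(df$acc>=0,"blue","red")
# note: col is mapped as a category here; adding scale_color_identity()
# would make ggplot use the literal colour names
ggplot(data = df,mapping = aes(x,acc))+geom_segment(data = df , mapping = aes(x=x,y=ystart,xend=x,yend=acc,color=col))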
How to calculate average time interval based on unique value?
I'm having trouble calculating the average time interval (in days) between appearances of the same value in another column. My data looks like this:
dt subject_id
2016-09-13 77
2016-11-07 1791
2016-09-18 1332
2016-08-31 84
2016-08-23 89
2016-08-23 41
2016-09-15 41
2016-10-12 93
2016-10-05 93
2016-11-09 94
2016-10-25 94
2016-11-03 94
2016-10-09 375
2016-10-14 11
2016-09-27 11
2016-09-13 11
2016-08-23 11
2016-08-27 11
And I want to get something like this:
subject_id mean_day
41 23
93 7
94 7.5
11 13
I tried to use:
aggregate(dt~subject_id, data, mean)
But it can't calculate the mean from Date values. Any ideas?
My first approach would be something like this:
df$dt <- as.Date(df$dt)
library(dplyr)
df %>% group_by(subject_id) %>% summarise((max(dt) - min(dt))/(n()-1))
# <int> <time>
#1 11 13.0 days
#2 41 23.0 days
#3 77 NaN days
#4 84 NaN days
#5 89 NaN days
#6 93 7.0 days
#7 94 7.5 days
#8 375 NaN days
#9 1332 NaN days
#10 1791 NaN days
I think it's a starting point for you ... you can modify it as you want.
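The same numbers can be obtained in base R by averaging the gaps between consecutive sorted dates; this equals (max - min) / (n - 1) because the gaps sum to the full range. A minimal sketch on a subset of the question's rows, where the helper name mean_gap is hypothetical and subjects seen only once are given NA rather than NaN (a choice made here, not part of the original answer):

```r
# A subset of the rows from the question
df <- data.frame(
  dt = as.Date(c("2016-08-23", "2016-09-15", "2016-10-12", "2016-10-05",
                 "2016-11-09", "2016-10-25", "2016-11-03", "2016-09-13")),
  subject_id = c(41, 41, 93, 93, 94, 94, 94, 77)
)

# Hypothetical helper: mean gap in days between consecutive dates;
# NA when there is only one date
mean_gap <- function(d) {
  if (length(d) < 2) return(NA_real_)
  mean(as.numeric(diff(sort(d))))
}

# One value per subject, named by subject_id
res <- sapply(split(df$dt, df$subject_id), mean_gap)
res
```

Subjects with a single row can then be dropped with res[!is.na(res)] to match the desired output.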
insert new rows to the time series data, with date added automatically
I have a time-series data frame that looks like this:
TS.1
2015-09-01 361656.7
2015-09-02 370086.4
2015-09-03 346571.2
2015-09-04 316616.9
2015-09-05 342271.8
2015-09-06 361548.2
2015-09-07 342609.2
2015-09-08 281868.8
2015-09-09 297011.1
2015-09-10 295160.5
2015-09-11 287926.9
2015-09-12 323365.8
Now, what I want to do is add some new data points (rows) to the existing data frame, say,
320123.5
323521.7
How can I add the corresponding date to each new row? The dates simply continue sequentially from the last row. Is there a package that can do this automatically, so that the only thing I have to do is insert the new data points?
Here's some play data:
df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"), x = seq(31))
new.x <- c(32, 33)
This adds the extra observations along with the proper sequence of dates:
new.df <- data.frame(date=seq(max(df$date) + 1, max(df$date) + length(new.x), "days"), x=new.x)
Then just rbind them to get your expanded data frame:
rbind(df, new.df)
date x
1 2015-01-01 1
2 2015-01-02 2
3 2015-01-03 3
4 2015-01-04 4
5 2015-01-05 5
6 2015-01-06 6
7 2015-01-07 7
8 2015-01-08 8
9 2015-01-09 9
10 2015-01-10 10
11 2015-01-11 11
12 2015-01-12 12
13 2015-01-13 13
14 2015-01-14 14
15 2015-01-15 15
16 2015-01-16 16
17 2015-01-17 17
18 2015-01-18 18
19 2015-01-19 19
20 2015-01-20 20
21 2015-01-21 21
22 2015-01-22 22
23 2015-01-23 23
24 2015-01-24 24
25 2015-01-25 25
26 2015-01-26 26
27 2015-01-27 27
28 2015-01-28 28
29 2015-01-29 29
30 2015-01-30 30
31 2015-01-31 31
32 2015-02-01 32
33 2015-02-02 33
How to identify the records that belong to a certain time interval when I know the start and end records of that interval? (R)
So, here is my problem. I have a dataset of locations of radiotagged hummingbirds I’ve been following as part of my thesis. As you might imagine, they fly fast, so there were intervals when I lost track of where they were until I eventually found them again. Now I am trying to identify the segments where the bird was followed continuously (i.e., the intervals between “Lost” periods).
ID Type TimeStart TimeEnd Limiter Starter Ender
1 Observed 6:45:00 6:45:00 NO Start End
2 Lost 6:45:00 5:31:00 YES NO NO
3 Observed 5:31:00 5:31:00 NO Start NO
4 Observed 9:48:00 9:48:00 NO NO NO
5 Observed 10:02:00 10:02:00 NO NO NO
6 Observed 10:18:00 10:18:00 NO NO NO
7 Observed 11:00:00 11:00:00 NO NO NO
8 Observed 13:15:00 13:15:00 NO NO NO
9 Observed 13:34:00 13:34:00 NO NO NO
10 Observed 13:43:00 13:43:00 NO NO NO
11 Observed 13:52:00 13:52:00 NO NO NO
12 Observed 14:25:00 14:25:00 NO NO NO
13 Observed 14:46:00 14:46:00 NO NO End
14 Lost 14:46:00 10:47:00 YES NO NO
15 Observed 10:47:00 10:47:00 NO Start NO
16 Observed 10:57:00 11:00:00 NO NO NO
17 Observed 11:10:00 11:10:00 NO NO NO
18 Observed 11:19:00 11:27:55 NO NO NO
19 Observed 11:28:05 11:32:00 NO NO NO
20 Observed 11:45:00 12:09:00 NO NO NO
21 Observed 11:51:00 11:51:00 NO NO NO
22 Observed 12:11:00 12:11:00 NO NO NO
23 Observed 13:15:00 13:15:00 NO NO End
24 Lost 13:15:00 7:53:00 YES NO NO
25 Observed 7:53:00 7:53:00 NO Start NO
26 Observed 8:48:00 8:48:00 NO NO NO
27 Observed 9:25:00 9:25:00 NO NO NO
28 Observed 9:26:00 9:26:00 NO NO NO
29 Observed 9:32:00 9:33:25 NO NO NO
30 Observed 9:33:35 9:33:35 NO NO NO
31 Observed 9:42:00 9:42:00 NO NO NO
32 Observed 9:44:00 9:44:00 NO NO NO
33 Observed 9:48:00 9:48:00 NO NO NO
34 Observed 9:48:30 9:48:30 NO NO NO
35 Observed 9:51:00 9:51:00 NO NO NO
36 Observed 9:54:00 9:54:00 NO NO NO
37 Observed 9:55:00 9:55:00 NO NO NO
38 Observed 9:57:00 10:01:00 NO NO NO
39 Observed 10:02:00 10:02:00 NO NO NO
40 Observed 10:04:00 10:04:00 NO NO NO
41 Observed 10:06:00 10:06:00 NO NO NO
42 Observed 10:20:00 10:33:00 NO NO NO
43 Observed 10:34:00 10:34:00 NO NO NO
44 Observed 10:39:00 10:39:00 NO NO End
Note: When there is a “Start” and an “End” in the same row, it’s because the non-lost period consists only of that record.
I was able to identify the records that start or end these “non-lost” periods (under the columns “Starter” and “Ender”), but now I want to identify those periods by giving them unique identifiers (period A, B, C or 1, 2, 3, etc.). Ideally, the name of the identifier would be the ID of the start point for that period (i.e., ID[Starter=="Start"]).
I'm looking for something like this:
ID Type TimeStart TimeEnd Limiter Starter Ender Period
1 Observed 6:45:00 6:45:00 NO Start End 1
2 Lost 6:45:00 5:31:00 YES NO NO Lost
3 Observed 5:31:00 5:31:00 NO Start NO 3
4 Observed 9:48:00 9:48:00 NO NO NO 3
5 Observed 10:02:00 10:02:00 NO NO NO 3
6 Observed 10:18:00 10:18:00 NO NO NO 3
7 Observed 11:00:00 11:00:00 NO NO NO 3
8 Observed 13:15:00 13:15:00 NO NO NO 3
9 Observed 13:34:00 13:34:00 NO NO NO 3
10 Observed 13:43:00 13:43:00 NO NO NO 3
11 Observed 13:52:00 13:52:00 NO NO NO 3
12 Observed 14:25:00 14:25:00 NO NO NO 3
13 Observed 14:46:00 14:46:00 NO NO End 3
14 Lost 14:46:00 10:47:00 YES NO NO Lost
15 Observed 10:47:00 10:47:00 NO Start NO 15
16 Observed 10:57:00 11:00:00 NO NO NO 15
17 Observed 11:10:00 11:10:00 NO NO NO 15
18 Observed 11:19:00 11:27:55 NO NO NO 15
19 Observed 11:28:05 11:32:00 NO NO NO 15
20 Observed 11:45:00 12:09:00 NO NO NO 15
21 Observed 11:51:00 11:51:00 NO NO NO 15
22 Observed 12:11:00 12:11:00 NO NO NO 15
23 Observed 13:15:00 13:15:00 NO NO End 15
24 Lost 13:15:00 7:53:00 YES NO NO Lost
Would this be too hard to do in R? Thanks!
> d <- data.frame(Limiter = rep("NO", 44), Starter = rep("NO", 44), Ender = rep("NO", 44), stringsAsFactors = FALSE)
> d$Starter[c(1, 3, 15, 25)] <- "Start"
> d$Ender[c(1, 13, 23, 44)] <- "End"
> d$Limiter[c(2, 14, 24)] <- "Yes"
> d$Period <- ifelse(d$Limiter == "Yes", "Lost", which(d$Starter == "Start")[cumsum(d$Starter == "Start")])
> d
Limiter Starter Ender Period
1 NO Start End 1
2 Yes NO NO Lost
3 NO Start NO 3
4 NO NO NO 3
5 NO NO NO 3
6 NO NO NO 3
7 NO NO NO 3
8 NO NO NO 3
9 NO NO NO 3
10 NO NO NO 3
11 NO NO NO 3
12 NO NO NO 3
13 NO NO End 3
14 Yes NO NO Lost
15 NO Start NO 15
16 NO NO NO 15
17 NO NO NO 15
18 NO NO NO 15
19 NO NO NO 15
20 NO NO NO 15
21 NO NO NO 15
22 NO NO NO 15
23 NO NO End 15
24 Yes NO NO Lost
25 NO Start NO 25
26 NO NO NO 25
27 NO NO NO 25
28 NO NO NO 25
29 NO NO NO 25
30 NO NO NO 25
31 NO NO NO 25
32 NO NO NO 25
33 NO NO NO 25
34 NO NO NO 25
35 NO NO NO 25
36 NO NO NO 25
37 NO NO NO 25
38 NO NO NO 25
39 NO NO NO 25
40 NO NO NO 25
41 NO NO NO 25
42 NO NO NO 25
43 NO NO NO 25
44 NO NO End 25
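The Period line works because cumsum(d$Starter == "Start") counts how many starts have been seen up to each row, and indexing the vector of start rows by that count returns the row number of the most recent start. A small sketch on a hypothetical six-element vector (not the hummingbird data) shows the intermediate steps:

```r
# Hypothetical Starter column with starts at rows 1, 3 and 6
starter  <- c("Start", "NO", "Start", "NO", "NO", "Start")
is_start <- starter == "Start"

start_rows <- which(is_start)    # row numbers of the starts: 1 3 6
run_id     <- cumsum(is_start)   # starts seen so far:        1 1 2 2 2 3
period     <- start_rows[run_id] # most recent start row:     1 1 3 3 3 6
period
```

This relies on the first row being a start; otherwise run_id begins at 0 and the leading rows would be dropped by the indexing.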
Generate entries in time series data
I want to generate a row (with ammount set to zero) for each missing month (up to the current month) in the following data frame. Can you please give me a hand with this? Thanks!
trans_date ammount
1 2004-12-01 2968.91
2 2005-04-01 500.62
3 2005-05-01 434.30
4 2005-06-01 549.15
5 2005-07-01 276.77
6 2005-09-01 548.64
7 2005-10-01 761.69
8 2005-11-01 636.77
9 2005-12-01 1517.58
10 2006-03-01 719.09
11 2006-04-01 1231.88
12 2006-05-01 580.46
13 2006-07-01 1468.43
14 2006-10-01 692.22
15 2006-11-01 505.81
16 2006-12-01 1589.70
17 2007-03-01 1559.82
18 2007-06-01 764.98
19 2007-07-01 964.77
20 2007-09-01 405.18
21 2007-11-01 112.42
22 2007-12-01 1134.08
23 2008-02-01 269.72
24 2008-03-01 208.96
25 2008-04-01 353.58
26 2008-05-01 756.00
27 2008-06-01 747.85
28 2008-07-01 781.62
29 2008-09-01 195.36
30 2008-10-01 424.24
31 2008-12-01 166.23
32 2009-02-01 237.11
33 2009-04-01 110.94
34 2009-07-01 191.29
35 2009-11-01 153.42
36 2009-12-01 222.87
37 2010-09-01 1259.97
38 2010-11-01 375.61
39 2010-12-01 496.48
40 2011-02-01 360.07
41 2011-03-01 324.95
42 2011-04-01 566.93
43 2011-06-01 281.19
44 2011-08-01 428.04
'data.frame': 44 obs. of 2 variables:
$ trans_date : Date, format: "2004-12-01" "2005-04-01" "2005-05-01" "2005-06-01" ...
$ ammount: num 2969 501 434 549 277 ...
You can use seq.Date and merge:
> str(df)
'data.frame': 44 obs. of 2 variables:
$ trans_date: Date, format: "2004-12-01" "2005-04-01" "2005-05-01" "2005-06-01" ...
$ ammount : num 2969 501 434 549 277 ...
> mns <- data.frame(trans_date = seq.Date(min(df$trans_date), max(df$trans_date), by = "month"))
> df2 <- merge(mns, df, all = TRUE)
> df2$ammount <- ifelse(is.na(df2$ammount), 0, df2$ammount)
> head(df2)
trans_date ammount
1 2004-12-01 2968.91
2 2005-01-01 0.00
3 2005-02-01 0.00
4 2005-03-01 0.00
5 2005-04-01 500.62
6 2005-05-01 434.30
And if you need the months up to the current one, use this:
mns <- data.frame(trans_date = seq.Date(min(df$trans_date), Sys.Date(), by = "month"))
Note that it is sufficient to call seq instead of seq.Date if the arguments are of class Date.
If you're using xts objects, you can use timeBasedSeq and merge.xts. Assuming your original data is in an object Data:
library(xts)
# create xts object:
# no comma on the first subset (Data['ammount']) keeps column name;
# as.Date needs a vector, so use comma (Data[,'trans_date'])
x <- xts(Data['ammount'],as.Date(Data[,'trans_date']))
# create a time-based vector from 2004-12-01 to 2011-08-01. The "m" denotes
# monthly time-steps. By default this returns a yearmon class. Use
# retclass="Date" to return a Date vector.
d <- timeBasedSeq(paste(start(x),end(x),"m",sep="/"), retclass="Date")
# merge x with an "empty" xts object, xts(,d), filling with zeros
y <- merge(x,xts(,d),fill=0)