Function: calculating seconds between data points - r

I have the following column in my data frame:
DateTime
1 2011-10-03 08:00:04
2 2011-10-03 08:00:05
3 2011-10-03 08:00:06
4 2011-10-03 08:00:09
5 2011-10-03 08:00:15
6 2011-10-03 08:00:24
7 2011-10-03 08:00:30
8 2011-10-03 08:00:42
9 2011-10-03 08:01:01
10 2011-10-03 08:01:24
11 2011-10-03 08:01:58
12 2011-10-03 08:02:34
13 2011-10-03 08:03:25
14 2011-10-03 08:04:26
15 2011-10-03 08:06:00
With dput:
> dput(smallDF)
structure(list(DateTime = structure(c(1317621604, 1317621605,
1317621606, 1317621609, 1317621615, 1317621624, 1317621630, 1317621642,
1317621661, 1317621684, 1317621718, 1317621754, 1317621805, 1317621866,
1317621960, 1317622103, 1317622197, 1317622356, 1317622387, 1317622463,
1317622681, 1317622851, 1317623061, 1317623285, 1317623404, 1317623498,
1317623612, 1317623849, 1317623916, 1317623994, 1317624174, 1317624414,
1317624484, 1317624607, 1317624848, 1317625023, 1317625103, 1317625179,
1317625200, 1317625209, 1317625229, 1317625238, 1317625249, 1317625264,
1317625282, 1317625300, 1317625315, 1317625339, 1317625353, 1317625365,
1317625371, 1317625381, 1317625395, 1317625415, 1317625423, 1317625438,
1317625458, 1317625469, 1317625487, 1317625500, 1317625513, 1317625533,
1317625548, 1317625565, 1317625581, 1317625598, 1317625613, 1317625640,
1317625661, 1317625674, 1317625702, 1317625715, 1317625737, 1317625758,
1317625784, 1317625811, 1317625826, 1317625841, 1317625862, 1317625895,
1317625909, 1317625935, 1317625956, 1317625973, 1317626001, 1317626043,
1317626062, 1317626100, 1317626113, 1317626132, 1317626153, 1317626179,
1317626212, 1317626239, 1317626271, 1317626296, 1317626323, 1317626361,
1317626384, 1317626407), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = "DateTime", row.names = c(NA,
-100L), class = "data.frame")
My goal: I want to calculate the time difference, in seconds, between each measurement.
Edit:
I'm looking to get the following result, where the time difference (in seconds) between each data point is calculated, except for the first value of the day (line 3), where the time is calculated relative to 8 am:
DateTime Seconds
1 2011-09-30 21:59:02 6
2 2011-09-30 21:59:04 2
3 2011-10-03 08:00:04 4
4 2011-10-03 08:00:05 1
5 2011-10-03 08:00:06 1
6 2011-10-03 08:00:09 3
7 2011-10-03 08:00:15 6
8 2011-10-03 08:00:24 9
9 2011-10-03 08:00:30 6
10 2011-10-03 08:00:42 12
11 2011-10-03 08:01:01 19
12 2011-10-03 08:01:24 23
13 2011-10-03 08:01:58 34
14 2011-10-03 08:02:34 36
15 2011-10-03 08:03:25 51
16 2011-10-03 08:04:26 61
17 2011-10-03 08:06:00 94
However, the measurements start at 8:00 am, so if the value is the first of the day, the number of seconds relative to 8:00 am needs to be calculated. In the example above, the first measurement ends at 8:00:04, so using the $sec attribute of POSIXlt could work here, but on other days the first value may occur a few minutes after 8:00.
I've tried to achieve that goal with the following function:
SecondsInBar <- function(x, startTime){
# First data point or first of day
if (x == 1 || x > 1 && x$wkday != x[-1]$wkday){
seconds <- as.numeric(difftime(x,
as.POSIXlt(startTime, format = "%H:%M:%S"),
units = "secs"))
# else calculate time difference
} else {
seconds <- as.numeric(difftime(x, x[-1], units = "secs"))
}
return (seconds)
}
Which then could be called with SecondsInBar(smallDF$DateTime, "08:00:00").
There are at least two problems with this function, but I don't know how to solve these:
1. The code segment x$wkday != x[-1]$wkday returns a "$ operator is invalid for atomic vectors" error.
2. The call as.POSIXlt(startTime, format = "%H:%M:%S") uses the current date, which makes the difftime calculation erroneous.
My question:
Where am I going wrong with this function?
And: is this approach a viable way or should I approach it from a different angle?

How about something along these lines:
smallDF$DateTime - as.POSIXct(paste(strftime(smallDF$DateTime,"%Y-%m-%d"),"07:00:00"))
Time differences in secs
[1] 4 5 6 9 15 24 30 42 61 84 118 154 205 266 360
[16] 503 597 756 787 863 1081 1251 1461 1685 1804 1898 2012 2249 2316 2394
[31] 2574 2814 2884 3007 3248 3423 3503 3579 3600 3609 3629 3638 3649 3664 3682
[46] 3700 3715 3739 3753 3765 3771 3781 3795 3815 3823 3838 3858 3869 3887 3900
[61] 3913 3933 3948 3965 3981 3998 4013 4040 4061 4074 4102 4115 4137 4158 4184
[76] 4211 4226 4241 4262 4295 4309 4335 4356 4373 4401 4443 4462 4500 4513 4532
[91] 4553 4579 4612 4639 4671 4696 4723 4761 4784 4807
attr(,"tzone")
[1] ""
Note that I used 7am because, when I copied your data, my R session interpreted the times as BST.
As for your errors, you can't use $ to get elements of a date with POSIXct (which is how smallDF$DateTime is defined), only with POSIXlt. And for the second error, if you don't supply a date, it has to assume the current date, as there is no other information to draw upon.
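To make the two points above concrete, here is a minimal sketch. The timestamp and the explicit tz = "UTC" are illustrative assumptions, not part of the original data. Note that the POSIXlt field for the day of the week is wday, not wkday.

```r
x <- as.POSIXct("2011-10-03 08:00:04", tz = "UTC")

# POSIXct is stored as a number of seconds since 1970-01-01 UTC, so it has
# no named components and `$` fails with the error from the question.
ct_fails <- inherits(try(x$wday, silent = TRUE), "try-error")
ct_fails

# POSIXlt stores named components; the day-of-week field is `wday`
# (0 = Sunday), and `sec` holds the seconds.
lt <- as.POSIXlt(x)
lt$wday
lt$sec

# A time-only string carries no date, so parsing fills in the current date.
t0 <- as.POSIXct("08:00:00", format = "%H:%M:%S", tz = "UTC")
format(t0, "%Y-%m-%d %H:%M:%S")
```

Converting with as.POSIXlt() before using `$`, and pasting the date onto the start time before parsing, avoids both problems.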
Edit
Now that it's been clarified, I would propose a different approach: split your data frame by day, prepend the reference time to each day's times, and call diff on the result, using lapply to loop over days:
#modify dataframe to add extra day to second half
smallDF[51:100,1] <- smallDF[51:100,1]+86400
smallDF2 <- split(smallDF,strftime(smallDF$DateTime,"%Y-%m-%d"))
lapply(smallDF2,function(x) diff(c(as.POSIXct(paste(strftime(x$DateTime[1],"%Y-%m-%d"),"07:00:00")),x$DateTime)))
$`2011-10-03`
Time differences in secs
[1] 4 1 1 3 6 9 6 12 19 23 34 36 51 61 94 143 94 159 31
[20] 76 218 170 210 224 119 94 114 237 67 78 180 240 70 123 241 175 80 76
[39] 21 9 20 9 11 15 18 18 15 24 14 12
$`2011-10-04`
Time differences in secs
[1] 3771 10 14 20 8 15 20 11 18 13 13 20 15 17 16
[16] 17 15 27 21 13 28 13 22 21 26 27 15 15 21 33
[31] 14 26 21 17 28 42 19 38 13 19 21 26 33 27 32
[46] 25 27 38 23 23
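If the result is wanted as a Seconds column in the original data frame rather than as a list per day, the same idea can be sketched without splitting. The sample timestamps, the fixed tz = "UTC" and the helper names day, start and first_of_day are assumptions made for illustration only.

```r
# Hypothetical sample covering two days; UTC avoids the BST issue noted above
df <- data.frame(DateTime = as.POSIXct(
  c("2011-10-03 08:00:04", "2011-10-03 08:00:05", "2011-10-03 08:00:09",
    "2011-10-04 08:02:00", "2011-10-04 08:02:30"), tz = "UTC"))

# Calendar day of each row and the 8:00 am reference time on that day
day   <- format(df$DateTime, "%Y-%m-%d", tz = "UTC")
start <- as.POSIXct(paste(day, "08:00:00"), tz = "UTC")

secs         <- as.numeric(df$DateTime)
first_of_day <- !duplicated(day)

# Difference to the previous row, then overwrite the first row of each day
# with the difference to 8:00 am of that day
df$Seconds <- c(NA, diff(secs))
df$Seconds[first_of_day] <- secs[first_of_day] - as.numeric(start[first_of_day])
df
```

This assumes the rows are already sorted by time; the start time "08:00:00" would need to become a parameter if it varies.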

Related

R: How plot negative and positive anomaly (for this data) with ggplot? [duplicate]

I have this df
x acc
1 1902-01-01 0.782887804
2 1903-01-01 -0.003144199
3 1904-01-01 0.100006276
4 1905-01-01 0.326173392
5 1906-01-01 1.285114692
6 1907-01-01 2.844399973
7 1920-01-01 -0.300232190
8 1921-01-01 1.464389342
9 1922-01-01 0.142638653
10 1923-01-01 -0.020162385
11 1924-01-01 0.361928571
12 1925-01-01 0.616325588
13 1926-01-01 -0.108206003
14 1927-01-01 -0.318441954
15 1928-01-01 -0.267884586
16 1929-01-01 -0.022473777
17 1930-01-01 -0.294452983
18 1931-01-01 -0.654927109
19 1932-01-01 -0.263508341
20 1933-01-01 0.622530992
21 1934-01-01 1.009666043
22 1935-01-01 0.675484421
23 1936-01-01 1.209162008
24 1937-01-01 1.655280986
25 1948-01-01 2.080021785
26 1949-01-01 0.854572563
27 1950-01-01 0.997540963
28 1951-01-01 1.000244163
29 1952-01-01 0.958322941
30 1953-01-01 0.816259474
31 1954-01-01 0.814488644
32 1955-01-01 1.233694537
33 1958-01-01 0.460120970
34 1959-01-01 0.344201474
35 1960-01-01 1.601430139
36 1961-01-01 0.387850967
37 1962-01-01 -0.385954401
38 1963-01-01 0.699355708
39 1964-01-01 0.084519926
40 1965-01-01 0.708964572
41 1966-01-01 1.456280443
42 1967-01-01 1.479412638
43 1968-01-01 1.199000726
44 1969-01-01 0.282942042
45 1970-01-01 -0.181724504
46 1971-01-01 0.012170186
47 1972-01-01 -0.095891043
48 1973-01-01 -0.075384446
49 1974-01-01 -0.156668145
50 1975-01-01 -0.303023258
51 1976-01-01 -0.516027310
52 1977-01-01 -0.826791524
53 1980-01-01 -0.947112221
54 1981-01-01 -1.634878300
55 1982-01-01 -1.955298323
56 1987-01-01 -1.854447550
57 1988-01-01 -1.458955443
58 1989-01-01 -1.256102245
59 1990-01-01 -0.864108585
60 1991-01-01 -1.293373024
61 1992-01-01 -1.049530431
62 1993-01-01 -1.002526230
63 1994-01-01 -0.868783614
64 1995-01-01 -1.081858981
65 1996-01-01 -1.302103374
66 1997-01-01 -1.288048194
67 1998-01-01 -1.455750340
68 1999-01-01 -1.015467069
69 2000-01-01 -0.682789640
70 2001-01-01 -0.811058004
71 2002-01-01 -0.972374057
72 2003-01-01 -0.536505225
73 2004-01-01 -0.518686263
74 2005-01-01 -0.976298621
75 2006-01-01 -0.946429713
I would like to plot the data like this:
with column x of df on the x axis and column acc on the y axis.
Is it possible to plot it with ggplot?
I tried with this code:
ggplot(df,aes(x=x,y=acc))+
geom_linerange(data =df , aes(colour = ifelse(acc <0, "blue", "red")),ymin=min(df),ymax=max(cdf))
but the result is this:
How can I do this?
Is this what you want? I'm not sure.
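library(ggplot2)
library(lubridate)  # for year()
# the helper columns must exist before ggplot() is called
df$x <- year(as.Date(df$x))
df$ystart <- 0
df$col <- ifelse(df$acc >= 0, "blue", "red")
ggplot(data = df, mapping = aes(x, acc)) +
  geom_segment(mapping = aes(x = x, y = ystart, xend = x, yend = acc, color = col))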
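One thing to watch with this kind of code: mapping a column of colour names to the colour aesthetic treats the names as category labels, so ggplot2 picks its own default palette. A sketch of one way to have the names used literally is scale_colour_identity(). The data frame d below is a made-up stand-in for the question's df, and the choice of blue for negative and red for positive follows the question's attempt.

```r
library(ggplot2)

# Hypothetical data standing in for the question's df
d <- data.frame(x = 2000:2005,
                acc = c(0.8, -0.3, 1.2, -1.1, 0.4, -0.6))

# Colour names stored as plain strings: blue below zero, red otherwise
d$col <- ifelse(d$acc < 0, "blue", "red")

# One vertical segment per year from 0 to acc; scale_colour_identity()
# tells ggplot2 to use the strings in `col` as the actual colours
p <- ggplot(d, aes(x = x, y = acc, colour = col)) +
  geom_segment(aes(xend = x, y = 0, yend = acc)) +
  scale_colour_identity()
p
```

scale_colour_manual() with a named values vector is an alternative if a legend is wanted.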

How to calculate average time interval based on unique value?

I'm having trouble when trying to calculate the average time interval (how many days) between appearances of the same value in another column.
My data looks like this:
dt subject_id
2016-09-13 77
2016-11-07 1791
2016-09-18 1332
2016-08-31 84
2016-08-23 89
2016-08-23 41
2016-09-15 41
2016-10-12 93
2016-10-05 93
2016-11-09 94
2016-10-25 94
2016-11-03 94
2016-10-09 375
2016-10-14 11
2016-09-27 11
2016-09-13 11
2016-08-23 11
2016-08-27 11
And I want to get something like this:
subject_id mean_day
41 23
93 7
94 7.5
11 13
I tried to use:
aggregate(dt~subject_id, data, mean)
But it can't calculate mean from Date values. Any ideas?
My first approach would be something like this:
df$dt <- as.Date(df$dt)
library(dplyr)
df %>%
group_by(subject_id) %>%
summarise((max(dt) - min(dt))/(n()-1))
# <int> <time>
#1 11 13.0 days
#2 41 23.0 days
#3 77 NaN days
#4 84 NaN days
#5 89 NaN days
#6 93 7.0 days
#7 94 7.5 days
#8 375 NaN days
#9 1332 NaN days
#10 1791 NaN days
I think it's a starting point for you ... you can modify as you want.
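The expression (max(dt) - min(dt))/(n() - 1) gives the same number as the mean of the gaps between consecutive sorted dates, because the gaps sum to the total span. A base R sketch of that reading follows, on a small subset of the question's data. The helper name mean_gap is hypothetical, and returning NA for subjects with a single date is a design choice rather than something the question specifies.

```r
df <- data.frame(
  dt = as.Date(c("2016-08-23", "2016-09-15", "2016-10-12", "2016-10-05",
                 "2016-11-09", "2016-10-25", "2016-11-03", "2016-09-13")),
  subject_id = c(41, 41, 93, 93, 94, 94, 94, 77)
)

# Mean number of days between consecutive appearances of one subject;
# NA when there is only one date and therefore no interval
mean_gap <- function(d) {
  if (length(d) < 2) return(NA_real_)
  mean(as.numeric(diff(sort(d))))
}

res <- sapply(split(df$dt, df$subject_id), mean_gap)
res
```

Dropping the NA entries with res[!is.na(res)] leaves only the subjects that appear more than once, as in the desired output.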

insert new rows to the time series data, with date added automatically

I have a time-series data frame that looks like this:
TS.1
2015-09-01 361656.7
2015-09-02 370086.4
2015-09-03 346571.2
2015-09-04 316616.9
2015-09-05 342271.8
2015-09-06 361548.2
2015-09-07 342609.2
2015-09-08 281868.8
2015-09-09 297011.1
2015-09-10 295160.5
2015-09-11 287926.9
2015-09-12 323365.8
Now, what I want to do is add some new data points (rows) to the existing data frame, say,
320123.5
323521.7
How can I add the corresponding date to each new row? The dates simply continue sequentially from the last row.
Is there a package that can do this automatically, so that the only thing I have to do is insert the new data points?
Here's some play data:
df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"), x = seq(31))
new.x <- c(32, 33)
This adds the extra observations along with the proper sequence of dates:
new.df <- data.frame(date=seq(max(df$date) + 1, max(df$date) + length(new.x), "days"), x=new.x)
Then just rbind them to get your expanded data frame:
rbind(df, new.df)
date x
1 2015-01-01 1
2 2015-01-02 2
3 2015-01-03 3
4 2015-01-04 4
5 2015-01-05 5
6 2015-01-06 6
7 2015-01-07 7
8 2015-01-08 8
9 2015-01-09 9
10 2015-01-10 10
11 2015-01-11 11
12 2015-01-12 12
13 2015-01-13 13
14 2015-01-14 14
15 2015-01-15 15
16 2015-01-16 16
17 2015-01-17 17
18 2015-01-18 18
19 2015-01-19 19
20 2015-01-20 20
21 2015-01-21 21
22 2015-01-22 22
23 2015-01-23 23
24 2015-01-24 24
25 2015-01-25 25
26 2015-01-26 26
27 2015-01-27 27
28 2015-01-28 28
29 2015-01-29 29
30 2015-01-30 30
31 2015-01-31 31
32 2015-02-01 32
33 2015-02-02 33
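If this is done repeatedly, the two steps can be wrapped in a small helper. This is only a sketch: the function name append_daily is made up, and it assumes the columns are called date and x as in the play data and that the series is daily.

```r
# Append new values to a daily series, continuing the dates after the last row
append_daily <- function(df, new_x) {
  new_dates <- seq(max(df$date) + 1, by = "day", length.out = length(new_x))
  rbind(df, data.frame(date = new_dates, x = new_x))
}

df <- data.frame(date = seq(as.Date("2015-01-29"), as.Date("2015-01-31"), "days"),
                 x = c(29, 30, 31))
out <- append_daily(df, c(32, 33))
out
```

Using length.out rather than an explicit end date keeps the helper correct for any number of new values.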

How to identify the records that belong to a certain time interval when I know the start and end records of that interval? (R)

So, here is my problem. I have a dataset of locations of radiotagged hummingbirds I’ve been following as part of my thesis. As you might imagine, they fly fast so there were intervals when I lost track of where they were until I eventually found them again.
Now I am trying to identify the segments where the bird was followed continuously (i.e., the intervals between “Lost” periods).
ID Type TimeStart TimeEnd Limiter Starter Ender
1 Observed 6:45:00 6:45:00 NO Start End
2 Lost 6:45:00 5:31:00 YES NO NO
3 Observed 5:31:00 5:31:00 NO Start NO
4 Observed 9:48:00 9:48:00 NO NO NO
5 Observed 10:02:00 10:02:00 NO NO NO
6 Observed 10:18:00 10:18:00 NO NO NO
7 Observed 11:00:00 11:00:00 NO NO NO
8 Observed 13:15:00 13:15:00 NO NO NO
9 Observed 13:34:00 13:34:00 NO NO NO
10 Observed 13:43:00 13:43:00 NO NO NO
11 Observed 13:52:00 13:52:00 NO NO NO
12 Observed 14:25:00 14:25:00 NO NO NO
13 Observed 14:46:00 14:46:00 NO NO End
14 Lost 14:46:00 10:47:00 YES NO NO
15 Observed 10:47:00 10:47:00 NO Start NO
16 Observed 10:57:00 11:00:00 NO NO NO
17 Observed 11:10:00 11:10:00 NO NO NO
18 Observed 11:19:00 11:27:55 NO NO NO
19 Observed 11:28:05 11:32:00 NO NO NO
20 Observed 11:45:00 12:09:00 NO NO NO
21 Observed 11:51:00 11:51:00 NO NO NO
22 Observed 12:11:00 12:11:00 NO NO NO
23 Observed 13:15:00 13:15:00 NO NO End
24 Lost 13:15:00 7:53:00 YES NO NO
25 Observed 7:53:00 7:53:00 NO Start NO
26 Observed 8:48:00 8:48:00 NO NO NO
27 Observed 9:25:00 9:25:00 NO NO NO
28 Observed 9:26:00 9:26:00 NO NO NO
29 Observed 9:32:00 9:33:25 NO NO NO
30 Observed 9:33:35 9:33:35 NO NO NO
31 Observed 9:42:00 9:42:00 NO NO NO
32 Observed 9:44:00 9:44:00 NO NO NO
33 Observed 9:48:00 9:48:00 NO NO NO
34 Observed 9:48:30 9:48:30 NO NO NO
35 Observed 9:51:00 9:51:00 NO NO NO
36 Observed 9:54:00 9:54:00 NO NO NO
37 Observed 9:55:00 9:55:00 NO NO NO
38 Observed 9:57:00 10:01:00 NO NO NO
39 Observed 10:02:00 10:02:00 NO NO NO
40 Observed 10:04:00 10:04:00 NO NO NO
41 Observed 10:06:00 10:06:00 NO NO NO
42 Observed 10:20:00 10:33:00 NO NO NO
43 Observed 10:34:00 10:34:00 NO NO NO
44 Observed 10:39:00 10:39:00 NO NO End
Note: When there is a “Start” and an “End” in the same row it’s because the non-lost period consists only of that record.
I was able to identify the records that start or end these “non-lost” periods (under the columns “Starter” and “Ender”), but now I want to be able to identify those periods by giving them unique identifiers (period A,B,C or 1,2,3, etc).
Ideally, the name of the identifier would be the name of the start point for that period (i.e., ID[ Starter==”Start”])
I'm looking for something like this:
ID Type TimeStart TimeEnd Limiter Starter Ender Period
1 Observed 6:45:00 6:45:00 NO Start End 1
2 Lost 6:45:00 5:31:00 YES NO NO Lost
3 Observed 5:31:00 5:31:00 NO Start NO 3
4 Observed 9:48:00 9:48:00 NO NO NO 3
5 Observed 10:02:00 10:02:00 NO NO NO 3
6 Observed 10:18:00 10:18:00 NO NO NO 3
7 Observed 11:00:00 11:00:00 NO NO NO 3
8 Observed 13:15:00 13:15:00 NO NO NO 3
9 Observed 13:34:00 13:34:00 NO NO NO 3
10 Observed 13:43:00 13:43:00 NO NO NO 3
11 Observed 13:52:00 13:52:00 NO NO NO 3
12 Observed 14:25:00 14:25:00 NO NO NO 3
13 Observed 14:46:00 14:46:00 NO NO End 3
14 Lost 14:46:00 10:47:00 YES NO NO Lost
15 Observed 10:47:00 10:47:00 NO Start NO 15
16 Observed 10:57:00 11:00:00 NO NO NO 15
17 Observed 11:10:00 11:10:00 NO NO NO 15
18 Observed 11:19:00 11:27:55 NO NO NO 15
19 Observed 11:28:05 11:32:00 NO NO NO 15
20 Observed 11:45:00 12:09:00 NO NO NO 15
21 Observed 11:51:00 11:51:00 NO NO NO 15
22 Observed 12:11:00 12:11:00 NO NO NO 15
23 Observed 13:15:00 13:15:00 NO NO End 15
24 Lost 13:15:00 7:53:00 YES NO NO Lost
Would this be too hard to do in R?
Thanks!
> d <- data.frame(Limiter = rep("NO", 44), Starter = rep("NO", 44), Ender = rep("NO", 44), stringsAsFactors = FALSE)
> d$Starter[c(1, 3, 15, 25)] <- "Start"
> d$Ender[c(1, 13, 23, 44)] <- "End"
> d$Limiter[c(2, 14, 24)] <- "Yes"
> d$Period <- ifelse(d$Limiter == "Yes", "Lost", which(d$Starter == "Start")[cumsum(d$Starter == "Start")])
> d
Limiter Starter Ender Period
1 NO Start End 1
2 Yes NO NO Lost
3 NO Start NO 3
4 NO NO NO 3
5 NO NO NO 3
6 NO NO NO 3
7 NO NO NO 3
8 NO NO NO 3
9 NO NO NO 3
10 NO NO NO 3
11 NO NO NO 3
12 NO NO NO 3
13 NO NO End 3
14 Yes NO NO Lost
15 NO Start NO 15
16 NO NO NO 15
17 NO NO NO 15
18 NO NO NO 15
19 NO NO NO 15
20 NO NO NO 15
21 NO NO NO 15
22 NO NO NO 15
23 NO NO End 15
24 Yes NO NO Lost
25 NO Start NO 25
26 NO NO NO 25
27 NO NO NO 25
28 NO NO NO 25
29 NO NO NO 25
30 NO NO NO 25
31 NO NO NO 25
32 NO NO NO 25
33 NO NO NO 25
34 NO NO NO 25
35 NO NO NO 25
36 NO NO NO 25
37 NO NO NO 25
38 NO NO NO 25
39 NO NO NO 25
40 NO NO NO 25
41 NO NO NO 25
42 NO NO NO 25
43 NO NO NO 25
44 NO NO End 25
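The one-liner that builds Period is compact, so here is the same idea unpacked on a five-row toy example. The vectors starter and limiter are made up for illustration, and the intermediate names is_start, run and start_ids do not appear in the code above.

```r
starter <- c("Start", "NO", "Start", "NO", "NO")
limiter <- c("NO", "Yes", "NO", "NO", "NO")

is_start  <- starter == "Start"
run       <- cumsum(is_start)   # running count of starts seen so far
start_ids <- which(is_start)    # row number of each start

# Each row looks up the row number of the most recent start;
# rows flagged as lost are labelled "Lost" instead
period <- ifelse(limiter == "Yes", "Lost", start_ids[run])
period
```

Because "Lost" is a string, ifelse() returns a character vector, so the period labels are "1" and "3" rather than numbers. The lookup also assumes the first row is a start; otherwise run would begin at 0 and the indexing would drop elements.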

Generate entries in time series data

I want to generate a row (with a zero amount) for each missing month (up to the current month) in the following data frame. Can you please give me a hand with this? Thanks!
trans_date ammount
1 2004-12-01 2968.91
2 2005-04-01 500.62
3 2005-05-01 434.30
4 2005-06-01 549.15
5 2005-07-01 276.77
6 2005-09-01 548.64
7 2005-10-01 761.69
8 2005-11-01 636.77
9 2005-12-01 1517.58
10 2006-03-01 719.09
11 2006-04-01 1231.88
12 2006-05-01 580.46
13 2006-07-01 1468.43
14 2006-10-01 692.22
15 2006-11-01 505.81
16 2006-12-01 1589.70
17 2007-03-01 1559.82
18 2007-06-01 764.98
19 2007-07-01 964.77
20 2007-09-01 405.18
21 2007-11-01 112.42
22 2007-12-01 1134.08
23 2008-02-01 269.72
24 2008-03-01 208.96
25 2008-04-01 353.58
26 2008-05-01 756.00
27 2008-06-01 747.85
28 2008-07-01 781.62
29 2008-09-01 195.36
30 2008-10-01 424.24
31 2008-12-01 166.23
32 2009-02-01 237.11
33 2009-04-01 110.94
34 2009-07-01 191.29
35 2009-11-01 153.42
36 2009-12-01 222.87
37 2010-09-01 1259.97
38 2010-11-01 375.61
39 2010-12-01 496.48
40 2011-02-01 360.07
41 2011-03-01 324.95
42 2011-04-01 566.93
43 2011-06-01 281.19
44 2011-08-01 428.04
'data.frame': 44 obs. of 2 variables:
$ trans_date : Date, format: "2004-12-01" "2005-04-01" "2005-05-01" "2005-06-01" ...
$ ammount: num 2969 501 434 549 277 ...
You can use seq.Date and merge:
> str(df)
'data.frame': 44 obs. of 2 variables:
$ trans_date: Date, format: "2004-12-01" "2005-04-01" "2005-05-01" "2005-06-01" ...
$ ammount : num 2969 501 434 549 277 ...
> mns <- data.frame(trans_date = seq.Date(min(df$trans_date), max(df$trans_date), by = "month"))
> df2 <- merge(mns, df, all = TRUE)
> df2$ammount <- ifelse(is.na(df2$ammount), 0, df2$ammount)
> head(df2)
trans_date ammount
1 2004-12-01 2968.91
2 2005-01-01 0.00
3 2005-02-01 0.00
4 2005-03-01 0.00
5 2005-04-01 500.62
6 2005-05-01 434.30
If you need the months up to the current date, use this instead:
mns <- data.frame(trans_date = seq.Date(min(df$trans_date), Sys.Date(), by = "month"))
Note that it is sufficient to call seq instead of seq.Date when the arguments are of class Date.
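For a version that can be run without the original data, here is a sketch on two rows. The fixed end_date is a hypothetical stand-in for Sys.Date() so that the result is reproducible, and all.x = TRUE is used in place of all = TRUE, which gives the same result here because every date in df is also in mns.

```r
df <- data.frame(
  trans_date = as.Date(c("2004-12-01", "2005-04-01")),
  ammount = c(2968.91, 500.62)
)

# Hypothetical end of the series; replace with Sys.Date() for "until now"
end_date <- as.Date("2005-06-01")

# One row per month, then a left join that leaves NA where no data exists
mns <- data.frame(trans_date = seq(min(df$trans_date), end_date, by = "month"))
df2 <- merge(mns, df, all.x = TRUE)

# Missing months get a zero amount
df2$ammount[is.na(df2$ammount)] <- 0
df2
```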
If you're using xts objects, you can use timeBasedSeq and merge.xts. Assuming your original data is in an object Data:
library(xts)
# create xts object:
# no comma on the first subset (Data['ammount']) keeps column name;
# as.Date needs a vector, so use comma (Data[,'trans_date'])
x <- xts(Data['ammount'],as.Date(Data[,'trans_date']))
# create a time-based vector from 2004-12-01 to 2011-08-01. The "m" denotes
# monthly time-steps. By default this returns a yearmon class. Use
# retclass="Date" to return a Date vector.
d <- timeBasedSeq(paste(start(x),end(x),"m",sep="/"), retclass="Date")
# merge x with an "empty" xts object, xts(,d), filling with zeros
y <- merge(x,xts(,d),fill=0)
