How to insert rows at variable positions in a dataframe - r

My original data frame shows changes in one variable (act, measured in seconds) over a period of approximately two weeks for several individuals (identified by Ring). My problem is that this variable can stretch across a change of date (i.e. past midnight), and I want to split it in two: from time[i] until midnight, and from midnight until time[i+1]. I have added a few more variables that I need for these two operations:
modify the ith row (only when date changes) so I can get the portion of act[i] before midnight
insert one extra row (only when date changes) and assign it the other portion of act[i].
For example:
ith row: 01-01-2000 23:55:00 act= 360 seconds
i+1th row: 02-01-2000 00:01:00 act= 30 seconds
i+2th row: 02-01-2000 00:01:30 act= 50 seconds
.
.
.
My goal is to get:
ith row: 01-01-2000 23:55:00 act= 300 seconds # modified row
i+1th row: 02-01-2000 00:00:00 act= 60 seconds # inserted row
i+2th row: 02-01-2000 00:01:00 act= 30 seconds # previously row i+1th
i+3th row: 02-01-2000 00:01:30 act= 30 seconds #previously row i+2th
.
.
.
The data associated with each individual (Ring) stretch over a different period of time, so date changes also occur between individuals, and these shouldn't be taken into account.
Below, a selection of my ~ 90000-row dataframe (xact) that shows date changes within and between individuals (Ring) and next my code:
Ring time act wd date clock timepos timemn actmn jul
156 6106933 09/06/11 21:37:45 267 dry 09/06/11 21:37:45 2011-06-09 21:37:45 2011-06-10 8535 15134
157 6106933 09/06/11 21:42:12 3417 wet 09/06/11 21:42:12 2011-06-09 21:42:12 2011-06-10 8268 15134
158 6106933 09/06/11 22:39:09 51 dry 09/06/11 22:39:09 2011-06-09 22:39:09 2011-06-10 4851 15134
159 6106933 09/06/11 22:40:00 7317 wet 09/06/11 22:40:00 2011-06-09 22:40:00 2011-06-10 4800 15134
160 6106933 10/06/11 00:41:57 24 dry 10/06/11 00:41:57 2011-06-10 00:41:57 2011-06-11 83883 15135
529 6106933 11/06/11 22:41:57 3177 wet 11/06/11 22:41:57 2011-06-11 22:41:57 2011-06-12 4683 15136
530 6106933 11/06/11 23:34:54 6 dry 11/06/11 23:34:54 2011-06-11 23:34:54 2011-06-12 1506 15136
531 6106933 11/06/11 23:35:00 1779 wet 11/06/11 23:35:00 2011-06-11 23:35:00 2011-06-12 1500 15136
532 6106933 12/06/11 00:04:39 594 dry 12/06/11 00:04:39 2011-06-12 00:04:39 2011-06-13 86121 15137
533 6106933 12/06/11 00:14:33 18840 wet 12/06/11 00:14:33 2011-06-12 00:14:33 2011-06-13 85527 15137
7024 6134701 24/07/11 15:24:14 6 dry 24/07/11 15:24:14 2011-07-24 15:24:14 2011-07-25 30946 15179
7025 6134701 24/07/11 15:24:20 6 wet 24/07/11 15:24:20 2011-07-24 15:24:20 2011-07-25 30940 15179
7026 6134701 24/07/11 15:24:26 810 dry 24/07/11 15:24:26 2011-07-24 15:24:26 2011-07-25 30934 15179
R = unique(xact$Ring)
for ( m in R ) {
for ( i in 1:nrow(xact) ) {
if( xact$jul[i] < xact$jul[i+1] ) {
# modify row i (jul= Julian date)
xact[i] <- c( xact$Ring[i], xact$time[i], xact$actmn[i], xact$wd[i], xact$date[i], xact$clock[i], xact$timepos[i], xact$timemn[i], xact$actmn[i], xact$jul[i] )
# add new row between row i and row i+1
r <- i
newrow <- c( xact$Ring[i], xact$timemn[i], as.numeric(xact$timepos[i+1] - xact$timemn[i]), xact$wd[i], xact$date[i+1], xact$clock[i+1], xact$timemn[i], xact$timemn[i], xact$actmn[i], xact$jul[i+1] )
insertRow <- function( xact, newrow, r ) {
xact[seq( r+1, nrow(xact) + 1), ] <- xact[seq( r, nrow(xact) ), ]
xact[r,] <- newrow
xact
}
}
}
}
I tried to adapt existing code from "Add new row to dataframe, at specific row-index, not appended?", but it produces this message:
I would appreciate any help.
Santi

Here is an example with made-up data:
#create data
DF <- data.frame(time=seq(from=strptime("2013-01-01 01:00","%Y-%m-%d %H:%M"),to=strptime("2013-01-03 01:00","%Y-%m-%d %H:%M"),by=3500))
DF$ring <- 1:2
DF <- DF[order(DF$ring),]
#apply per ring
library(plyr)
DF <- ddply(DF,.(ring),function(df){
#index of date changes
ind <- c(FALSE,diff(as.POSIXlt(df$time)$yday)==1)
add <- df[ind,]
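#trunc() always moves back to the preceding midnight; round() would jump to the next day for times after noon
add$time <- as.POSIXct(trunc(add$time,"days"))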
#you can simply rbind and order, no need for inserting
df <- rbind(df,add)
df <- df[order(df$time),]
#it's easier to calculate act here
df$act <- c(diff(as.numeric(df$time)),NA)
df})
time ring act
1 2013-01-01 01:00:00 1 7000
2 2013-01-01 02:56:40 1 7000
3 2013-01-01 04:53:20 1 7000
4 2013-01-01 06:50:00 1 7000
5 2013-01-01 08:46:40 1 7000
6 2013-01-01 10:43:20 1 7000
7 2013-01-01 12:40:00 1 7000
8 2013-01-01 14:36:40 1 7000
9 2013-01-01 16:33:20 1 7000
10 2013-01-01 18:30:00 1 7000
11 2013-01-01 20:26:40 1 7000
12 2013-01-01 22:23:20 1 5800
13 2013-01-02 00:00:00 1 1200
14 2013-01-02 00:20:00 1 7000
15 2013-01-02 02:16:40 1 7000
16 2013-01-02 04:13:20 1 7000
17 2013-01-02 06:10:00 1 7000
18 2013-01-02 08:06:40 1 7000
19 2013-01-02 10:03:20 1 7000
20 2013-01-02 12:00:00 1 7000
21 2013-01-02 13:56:40 1 7000
22 2013-01-02 15:53:20 1 7000
23 2013-01-02 17:50:00 1 7000
24 2013-01-02 19:46:40 1 7000
25 2013-01-02 21:43:20 1 7000
26 2013-01-02 23:40:00 1 NA
27 2013-01-01 01:58:20 2 7000
28 2013-01-01 03:55:00 2 7000
29 2013-01-01 05:51:40 2 7000
30 2013-01-01 07:48:20 2 7000
31 2013-01-01 09:45:00 2 7000
32 2013-01-01 11:41:40 2 7000
33 2013-01-01 13:38:20 2 7000
34 2013-01-01 15:35:00 2 7000
35 2013-01-01 17:31:40 2 7000
36 2013-01-01 19:28:20 2 7000
37 2013-01-01 21:25:00 2 7000
38 2013-01-01 23:21:40 2 2300
39 2013-01-02 00:00:00 2 4700
40 2013-01-02 01:18:20 2 7000
41 2013-01-02 03:15:00 2 7000
42 2013-01-02 05:11:40 2 7000
43 2013-01-02 07:08:20 2 7000
44 2013-01-02 09:05:00 2 7000
45 2013-01-02 11:01:40 2 7000
46 2013-01-02 12:58:20 2 7000
47 2013-01-02 14:55:00 2 7000
48 2013-01-02 16:51:40 2 7000
49 2013-01-02 18:48:20 2 7000
50 2013-01-02 20:45:00 2 7000
51 2013-01-02 22:41:40 2 4700
52 2013-01-03 00:00:00 2 2300
53 2013-01-03 00:38:20 2 NA
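For the layout in the question, where each row holds a start time and a duration act in seconds, the split can also be sketched in base R without a loop. The function name split_at_midnight and the columns ring, time and act are illustrative assumptions, not the asker's exact data, and the sketch assumes that no interval is longer than 24 hours:

```r
# Hypothetical helper: split every interval that spans midnight into two rows.
split_at_midnight <- function(df) {
  end <- df$time + df$act                       # end of each interval
  midnight <- as.POSIXct(trunc(end, "days"))    # midnight of the day the interval ends in
  crosses <- midnight > df$time & end > midnight
  # new rows: start at midnight and carry the part of act after midnight
  extra <- df[crosses, ]
  extra$time <- midnight[crosses]
  extra$act <- as.numeric(difftime(end[crosses], midnight[crosses], units = "secs"))
  # existing rows: keep only the part of act before midnight
  df$act[crosses] <- as.numeric(difftime(midnight[crosses], df$time[crosses], units = "secs"))
  out <- rbind(df, extra)
  out <- out[order(out$ring, out$time), ]
  rownames(out) <- NULL
  out
}

# The three rows from the example in the question, for a single ring
x <- data.frame(
  ring = 1,
  time = as.POSIXct(c("2000-01-01 23:55:00", "2000-01-02 00:01:00",
                      "2000-01-02 00:01:30"), tz = "UTC"),
  act = c(360, 30, 50)
)
res <- split_at_midnight(x)
res$act   # 300 60 30 50
```

Because the rows are ordered by ring before time, date changes between individuals never produce an extra row.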

Related

Aggregate on a daily basis in R

I'm borrowing the reproducible example given here:
Aggregate daily level data to weekly level in R
since it's pretty much close to what I want to do.
Interval value
1 2012-06-10 552
2 2012-06-11 4850
3 2012-06-12 4642
4 2012-06-13 4132
5 2012-06-14 4190
6 2012-06-15 4186
7 2012-06-16 1139
8 2012-06-17 490
9 2012-06-18 5156
10 2012-06-19 4430
11 2012-06-20 4447
12 2012-06-21 4256
13 2012-06-22 3856
14 2012-06-23 1163
15 2012-06-24 564
16 2012-06-25 4866
17 2012-06-26 4421
18 2012-06-27 4206
19 2012-06-28 4272
20 2012-06-29 3993
21 2012-06-30 1211
22 2012-07-01 698
23 2012-07-02 5770
24 2012-07-03 5103
25 2012-07-04 775
26 2012-07-05 5140
27 2012-07-06 4868
28 2012-07-07 1225
29 2012-07-08 671
30 2012-07-09 5726
31 2012-07-10 5176
In that question, the asker wants to aggregate on weekly intervals; what I'd like to do is aggregate on a "day of the week" basis.
So I'd like to have a table similar to that one, adding the values of all the same day of the week:
Day of the week value
1 "Sunday" 60000
2 "Monday" 50000
3 "Tuesday" 60000
4 "Wednesday" 50000
5 "Thursday" 60000
6 "Friday" 50000
7 "Saturday" 60000
You can try:
aggregate(d$value, list(weekdays(as.Date(d$Interval))), sum)
We can group them by day of the week using weekdays():
library(dplyr)
df %>%
group_by(Day_Of_The_Week = weekdays(as.Date(Interval))) %>%
summarise(value = sum(value))
# Day_Of_The_Week value
# <chr> <int>
#1 Friday 16903
#2 Monday 26368
#3 Saturday 4738
#4 Sunday 2975
#5 Thursday 17858
#6 Tuesday 23772
#7 Wednesday 13560
We can do this with data.table
library(data.table)
setDT(df1)[, .(value = sum(value)), .(Dayofweek = weekdays(as.Date(Interval)))]
# Dayofweek value
#1: Sunday 2975
#2: Monday 26368
#3: Tuesday 23772
#4: Wednesday 13560
#5: Thursday 17858
#6: Friday 16903
#7: Saturday 4738
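Using lubridate (https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html):
library(lubridate)
library(data.table)
#call lubridate::wday explicitly, because data.table also exports a function named wday
df1$Weekday <- lubridate::wday(as.Date(df1$Interval), label = TRUE)
df1 <- data.table(df1)
df1[, sum(value), Weekday]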
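The dplyr result above is sorted alphabetically by day name, and weekdays() returns names in the current locale. Here is a minimal base R sketch that keeps calendar order (Sunday first) by grouping on the numeric weekday from format(x, "%w"). The data frame d is rebuilt from the table in the question, and the helper column wnum is an assumption of this sketch:

```r
# Data from the question: 31 consecutive days starting on Sunday 2012-06-10
d <- data.frame(
  Interval = seq(as.Date("2012-06-10"), as.Date("2012-07-10"), by = "day"),
  value = c(552, 4850, 4642, 4132, 4190, 4186, 1139,
            490, 5156, 4430, 4447, 4256, 3856, 1163,
            564, 4866, 4421, 4206, 4272, 3993, 1211,
            698, 5770, 5103, 775, 5140, 4868, 1225,
            671, 5726, 5176)
)

# %w is the weekday as a number, 0 (Sunday) to 6 (Saturday), independent of locale
d$wnum <- as.integer(format(d$Interval, "%w"))
by_day <- aggregate(value ~ wnum, data = d, FUN = sum)

# Attach the day names afterwards; 2012-06-10 is a Sunday, so adding wnum gives the right name
by_day$Day <- weekdays(as.Date("2012-06-10") + by_day$wnum)
by_day <- by_day[order(by_day$wnum), c("Day", "value")]
by_day$value   # 2975 26368 23772 13560 17858 16903 4738
```

The sums agree with the dplyr and data.table outputs above; only the row order differs.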

Subtract successive rows in a dataframe grouped by id

I have the following data frame:
id day total_amount
1 2015-07-09 1000
1 2015-10-22 100
1 2015-11-12 200
1 2015-11-27 2392
1 2015-12-16 123
6 2015-07-09 200
7 2015-07-09 1000
7 2015-08-27 100018
7 2015-11-25 1000
8 2015-08-27 1000
8 2015-12-07 10000
8 2016-01-18 796
8 2016-03-31 10000
15 2015-09-10 1500
15 2015-09-30 1000
I need to subtract each pair of successive dates in the day column that share the same id. When the last row of an id is reached, the subtraction should start again with the dates of the next id. Output similar to the following lines is expected:
7 2015-07-09 1000 2015-08-27 - 2015-07-09
7 2015-08-27 100018 2015-07-09 - 2015-08-27
7 2015-07-09 1000 0
8 2015-08-27 1000 2015-12-07 - 2015-08-27
8 2015-12-07 10000 2016-01-18 - 2015-12-07
8 2016-01-18 796 2016-03-31 - 2016-01-18
8 2016-03-31 10000 0
15 2015-09-10 1000 2015-09-30 - 2015-09-10
15 2015-09-30 1000 2015-10-01 - 2015-09-30
15 2015-10-01 1000
To get the difference as number of days you could try:
library(dplyr)
group_by(df, id) %>% mutate(new = as.Date(lead(day)) - as.Date(day))
Source: local data frame [15 x 4]
Groups: id [5]
id day total_amount new
(int) (fctr) (int) (dfft)
1 1 2015-07-09 1000 105 days
2 1 2015-10-22 100 21 days
3 1 2015-11-12 200 15 days
4 1 2015-11-27 2392 19 days
5 1 2015-12-16 123 NA days
6 6 2015-07-09 200 NA days
7 7 2015-07-09 1000 49 days
8 7 2015-08-27 100018 90 days
9 7 2015-11-25 1000 NA days
10 8 2015-08-27 1000 102 days
11 8 2015-12-07 10000 42 days
12 8 2016-01-18 796 73 days
13 8 2016-03-31 10000 NA days
14 15 2015-09-10 1500 20 days
15 15 2015-09-30 1000 NA days
EDITED
To subtract the last date from the current date you can use:
# First save the above result as `df1`:
df1[is.na(df1["new"]), "new"] <- as.Date(unlist(df1[is.na(df1["new"]), "day"])) - Sys.Date()
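The same per-id difference can be sketched in base R with ave(), which applies a function within each group and returns the results in the original row order. The data frame below is rebuilt from the question; storing the result as a plain number of days in a column called new is a choice made for this sketch:

```r
# Data from the question
df <- data.frame(
  id = c(1, 1, 1, 1, 1, 6, 7, 7, 7, 8, 8, 8, 8, 15, 15),
  day = as.Date(c("2015-07-09", "2015-10-22", "2015-11-12", "2015-11-27",
                  "2015-12-16", "2015-07-09", "2015-07-09", "2015-08-27",
                  "2015-11-25", "2015-08-27", "2015-12-07", "2016-01-18",
                  "2016-03-31", "2015-09-10", "2015-09-30")),
  total_amount = c(1000, 100, 200, 2392, 123, 200, 1000, 100018, 1000,
                   1000, 10000, 796, 10000, 1500, 1000)
)

# Days until the next row of the same id; the last row of each id gets NA
df$new <- ave(as.numeric(df$day), df$id,
              FUN = function(x) c(diff(x), NA))
df$new   # 105 21 15 19 NA NA 49 90 NA 102 42 73 NA 20 NA
```

If the last row of each id should show 0 as in the question, the NA values can be replaced afterwards with df$new[is.na(df$new)] <- 0.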

Create a vector in R by summing rows based on multiple criteria

I have financial data which is currently in 15 minute intervals, but I want to convert the intervals from 15 minutes to 30 minutes before I conduct the rest of my analysis. As such, I would like to sum the traded volumes for two adjacent 15 minute intervals and take the closing price of the second 15 minute sub-interval (ie the end of the 30 minute period).
I have shown below an example of the data (df) and the desired output (df.30min) using an sapply function. This works fine for the example below, but given that I am analysing 10 years of daily data with 50 companies and 27 intervals per day the processing time is excessive, even for one year of data. I have similar issues if I try a for loop.
I am new to R so I am hoping that there is a fairly easy solution using one of the built in functions.
In my actual dataset there are 27 x 15 minute intervals (10:00-16:45). I would like my final "30 minute" dataset to have one 15 minute interval from 13:30-13:45. Also, there may be other anomalies where the stock exchange opened late / closed early or where a stock was put on a trading halt partway through a day. (I have managed to map the times in my data to the correct Interval using a lookup table with a match function.) Given the imperfect structure of my data I am after a solution that is not reliant on a complete set and perfectly even number of 15 minute intervals. In Excel I would use a sumifs function.
set.seed(1)
df <- data.frame(
Company = rep(c("Co A", "Co B", "Co C"), each = 8),
Date = as.Date(rep(c("2005-01-01", "2005-01-02"), times = 3, each = 4)),
Time = as.factor(c("10:00:00", "10:15:00", "10:30:00", "10:45:00")),
Interval = as.factor(c(1,1,2,2)),
Interval.End = as.factor(c(0,1)),
Close = abs(round(rnorm(24),1))*10+100,
Volume = abs(round(rnorm(24),1))*10)
> df
Company Date Time Interval Interval.End Close Volume
1 Co A 2005-01-01 10:00:00 1 0 106 6
2 Co A 2005-01-01 10:15:00 1 1 102 1
3 Co A 2005-01-01 10:30:00 2 0 108 2
4 Co A 2005-01-01 10:45:00 2 1 116 15
5 Co A 2005-01-02 10:00:00 1 0 103 5
6 Co A 2005-01-02 10:15:00 1 1 108 4
7 Co A 2005-01-02 10:30:00 2 0 105 14
8 Co A 2005-01-02 10:45:00 2 1 107 1
9 Co B 2005-01-01 10:00:00 1 0 106 4
10 Co B 2005-01-01 10:15:00 1 1 103 1
11 Co B 2005-01-01 10:30:00 2 0 115 14
12 Co B 2005-01-01 10:45:00 2 1 104 4
13 Co B 2005-01-02 10:00:00 1 0 106 4
14 Co B 2005-01-02 10:15:00 1 1 122 1
15 Co B 2005-01-02 10:30:00 2 0 111 11
16 Co B 2005-01-02 10:45:00 2 1 100 8
17 Co C 2005-01-01 10:00:00 1 0 100 2
18 Co C 2005-01-01 10:15:00 1 1 109 3
19 Co C 2005-01-01 10:30:00 2 0 108 7
20 Co C 2005-01-01 10:45:00 2 1 106 6
21 Co C 2005-01-02 10:00:00 1 0 109 7
22 Co C 2005-01-02 10:15:00 1 1 108 7
23 Co C 2005-01-02 10:30:00 2 0 101 4
24 Co C 2005-01-02 10:45:00 2 1 120 8
df.30min <- df[-which(df$Interval.End == 0),]
df.30min$Volume <-sapply(seq_len(nrow(df.30min)),
function(i) sum(df$Volume[df$Company == df.30min$Company[i] &
df$Date == df.30min$Date[i] &
df$Interval == df.30min$Interval[i]]))
> df.30min
Company Date Time Interval Interval.End Close Volume
2 Co A 2005-01-01 10:15:00 1 1 102 7
4 Co A 2005-01-01 10:45:00 2 1 116 17
6 Co A 2005-01-02 10:15:00 1 1 108 9
8 Co A 2005-01-02 10:45:00 2 1 107 15
10 Co B 2005-01-01 10:15:00 1 1 103 5
12 Co B 2005-01-01 10:45:00 2 1 104 18
14 Co B 2005-01-02 10:15:00 1 1 122 5
16 Co B 2005-01-02 10:45:00 2 1 100 19
18 Co C 2005-01-01 10:15:00 1 1 109 5
20 Co C 2005-01-01 10:45:00 2 1 106 13
22 Co C 2005-01-02 10:15:00 1 1 108 14
24 Co C 2005-01-02 10:45:00 2 1 120 12
Using library dplyr, you can try something like this:
library(dplyr)
df %>%
arrange(Company, Date, Time, Interval, Interval.End) %>%
group_by(Company, Date, Interval) %>%
summarise(Time = Time[2], Interval.End = Interval.End[2], Close = Close[2], Volume = sum(Volume))
Source: local data frame [12 x 7]
Groups: Company, Date [?]
Company Date Interval Time Interval.End Close Volume
(fctr) (date) (fctr) (fctr) (fctr) (dbl) (dbl)
1 Co A 2005-01-01 1 10:15:00 1 102 7
2 Co A 2005-01-01 2 10:45:00 1 116 17
3 Co A 2005-01-02 1 10:15:00 1 108 9
4 Co A 2005-01-02 2 10:45:00 1 107 15
5 Co B 2005-01-01 1 10:15:00 1 103 5
6 Co B 2005-01-01 2 10:45:00 1 104 18
7 Co B 2005-01-02 1 10:15:00 1 122 5
8 Co B 2005-01-02 2 10:45:00 1 100 19
9 Co C 2005-01-01 1 10:15:00 1 109 5
10 Co C 2005-01-01 2 10:45:00 1 106 13
11 Co C 2005-01-02 1 10:15:00 1 108 14
12 Co C 2005-01-02 2 10:45:00 1 120 12
If your data frame is already arranged properly, you can get rid of that arrange part above.
Note: I am assuming there are always two intervals (0, 1) and therefore using hardcoded value of 2. If this is not the case, you can use the proper subsetting.
We can do this using data.table
library(data.table)
setDT(df)[order(Company, Date, Time, Interval),
list(Time=Time[2L], Interval.End = Interval.End[2L],
Close = Close[2L], Volume = sum(Volume)),
by = .(Company, Date, Interval)]
# Company Date Interval Time Interval.End Close Volume
# 1: Co A 2005-01-01 1 10:15:00 1 102 7
# 2: Co A 2005-01-01 2 10:45:00 1 116 17
# 3: Co A 2005-01-02 1 10:15:00 1 108 9
# 4: Co A 2005-01-02 2 10:45:00 1 107 15
# 5: Co B 2005-01-01 1 10:15:00 1 103 5
# 6: Co B 2005-01-01 2 10:45:00 1 104 18
# 7: Co B 2005-01-02 1 10:15:00 1 122 5
# 8: Co B 2005-01-02 2 10:45:00 1 100 19
# 9: Co C 2005-01-01 1 10:15:00 1 109 5
#10: Co C 2005-01-01 2 10:45:00 1 106 13
#11: Co C 2005-01-02 1 10:15:00 1 108 14
#12: Co C 2005-01-02 2 10:45:00 1 120 12
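Both answers above pick the second row of each group with [2], which returns NA when a 30-minute interval contains only one 15-minute row (for example the 13:30-13:45 interval mentioned in the question). Here is a minimal base R sketch that takes the last row of each group instead. The first four rows are Co A on 2005-01-01 from the question; the fifth row is made up to represent a lone 15-minute interval:

```r
df <- data.frame(
  Company  = "Co A",
  Date     = as.Date("2005-01-01"),
  Time     = c("10:00:00", "10:15:00", "10:30:00", "10:45:00", "13:30:00"),
  Interval = c(1, 1, 2, 2, 3),
  Close    = c(106, 102, 108, 116, 110),   # last value is hypothetical
  Volume   = c(6, 1, 2, 15, 4)             # last value is hypothetical
)

# Sort so that the last row of each group is the latest sub-interval
df <- df[order(df$Company, df$Date, df$Time), ]

# Total volume per Company/Date/Interval, repeated on every row of the group
df$Volume <- ave(df$Volume, df$Company, df$Date, df$Interval, FUN = sum)

# Keep only the last row of each group: it carries the closing price of the period
key <- paste(df$Company, df$Date, df$Interval)
df.30min <- df[!duplicated(key, fromLast = TRUE), ]
df.30min$Volume   # 7 17 4
df.30min$Close    # 102 116 110
```

This does not depend on Interval.End and works for groups with one, two or more rows.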

insert new rows to the time series data, with date added automatically

I have a time-series data frame that looks like:
TS.1
2015-09-01 361656.7
2015-09-02 370086.4
2015-09-03 346571.2
2015-09-04 316616.9
2015-09-05 342271.8
2015-09-06 361548.2
2015-09-07 342609.2
2015-09-08 281868.8
2015-09-09 297011.1
2015-09-10 295160.5
2015-09-11 287926.9
2015-09-12 323365.8
Now, what I want to do is add some new data points (rows) to the existing data frame, say,
320123.5
323521.7
How can I add the corresponding date to each new row? The dates simply continue sequentially from the last row.
Is there any package that can do this automatically, so that the only thing I need to do is insert the new data points?
Here's some play data:
df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"), x = seq(31))
new.x <- c(32, 33)
This adds the extra observations along with the proper sequence of dates:
new.df <- data.frame(date=seq(max(df$date) + 1, max(df$date) + length(new.x), "days"), x=new.x)
Then just rbind them to get your expanded data frame:
rbind(df, new.df)
date x
1 2015-01-01 1
2 2015-01-02 2
3 2015-01-03 3
4 2015-01-04 4
5 2015-01-05 5
6 2015-01-06 6
7 2015-01-07 7
8 2015-01-08 8
9 2015-01-09 9
10 2015-01-10 10
11 2015-01-11 11
12 2015-01-12 12
13 2015-01-13 13
14 2015-01-14 14
15 2015-01-15 15
16 2015-01-16 16
17 2015-01-17 17
18 2015-01-18 18
19 2015-01-19 19
20 2015-01-20 20
21 2015-01-21 21
22 2015-01-22 22
23 2015-01-23 23
24 2015-01-24 24
25 2015-01-25 25
26 2015-01-26 26
27 2015-01-27 27
28 2015-01-28 28
29 2015-01-29 29
30 2015-01-30 30
31 2015-01-31 31
32 2015-02-01 32
33 2015-02-02 33
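If new points arrive repeatedly, the two steps can be wrapped in a small function so that only the new values have to be supplied. The name append_obs is made up for this sketch, and it assumes daily data with columns called date and x as in the play data:

```r
# Hypothetical helper: append new values, continuing the daily date sequence
append_obs <- function(df, new.x) {
  if (length(new.x) == 0) return(df)
  new.dates <- seq(max(df$date) + 1, by = "day", length.out = length(new.x))
  rbind(df, data.frame(date = new.dates, x = new.x))
}

df <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2015-01-31"), "days"),
                 x = seq(31))
df <- append_obs(df, c(32, 33))
tail(df, 2)   # rows for 2015-02-01 and 2015-02-02
```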

R: aggregating data between events

I have trade data as follows:
TradeNumber OpenTime CloseTime Profit TradeHour Equity
1 01/01/2014 13:10 01/01/2014 14:40 10 13 520
2 01/01/2014 13:25 01/01/2014 13:28 20 13 520
3 01/01/2014 13:29 01/01/2014 15:40 -50 13 520
4 01/01/2014 13:30 01/01/2014 14:05 -5 13 520
5 01/01/2014 14:12 01/01/2014 14:40 12 14 560
6 01/01/2014 14:21 01/01/2014 14:45 -16 14 560
7 01/01/2014 14:50 01/01/2014 14:59 -14 14 560
8 01/01/2014 14:58 01/01/2014 15:05 56 14 560
I am looking to find, for each trade, the sum of the profits of all other trades that closed in the same hour, but before that particular trade, and add it to equity at the time of the trade. So, in the given example, the result would be:
TradeNumber OpenTime CloseTime Profit TradeHour Equity
1 01/01/2014 13:10 01/01/2014 14:40 10 13 520
2 01/01/2014 13:25 01/01/2014 13:28 20 13 520
3 01/01/2014 13:29 01/01/2014 15:40 -50 13 520 + 20
4 01/01/2014 13:30 01/01/2014 14:05 -5 13 520 + 20
5 01/01/2014 14:12 01/01/2014 14:40 12 14 560
6 01/01/2014 14:21 01/01/2014 14:45 -16 14 560 - 5
7 01/01/2014 14:50 01/01/2014 14:59 -14 14 560+10-5+12-16
8 01/01/2014 14:58 01/01/2014 15:05 56 14 560+10-5+12-16
Trade number 8, for example, opened at 14:58 on 01/01/2014. Prior to it opening, there were 4 other trades that closed in that hour (trades 1, 4, 5 and 6). I would, therefore, like to add the profits from those 4 trades to the equity at the beginning of the hour and place that number in the equity column of the trades data.
for (i in 1:nrow(tradeData))
{
tradeData$EquityUSD1 [i] = tradeData$Equity [i] + sum(tradeData$Profit[tradeData$CloseTime <= tradeData$OpenTime[i] & tradeData$CloseTime >= tradeData$tradeHour[i,1]])
}
This works, but is quite slow and I would like to speed it up, as there are tens of thousands of trades.
Any ideas? Please let me know if I have omitted any important data/info
Thanks
I haven't tested the speed with a larger data set yet.
dt
## TradeNumber OpenTime CloseTime Profit TradeHour Equity
## 1 1 01/01/2014 13:10 01/01/2014 14:40 10 13 520
## 2 2 01/01/2014 13:25 01/01/2014 13:28 20 13 520
## 3 3 01/01/2014 13:29 01/01/2014 15:40 -50 13 520
## 4 4 01/01/2014 13:30 01/01/2014 14:05 -5 13 520
## 5 5 01/01/2014 14:12 01/01/2014 14:40 12 14 560
## 6 6 01/01/2014 14:21 01/01/2014 14:45 -16 14 560
## 7 7 01/01/2014 14:50 01/01/2014 14:59 -14 14 560
## 8 8 01/01/2014 14:58 01/01/2014 15:05 56 14 560
require(data.table)
setDT(dt)
dt[,OpenTime:=as.POSIXct(OpenTime,format="%m/%d/%Y %H:%M")]
dt[,CloseTime:=as.POSIXct(CloseTime,format="%m/%d/%Y %H:%M")]
dt[,Equity.new:=Equity+sum(dt$Profit[hour(OpenTime)==hour(dt$CloseTime) & OpenTime > dt$CloseTime]), by="TradeNumber"]
dt
## TradeNumber OpenTime CloseTime Profit TradeHour Equity Equity.new
## 1: 1 2014-01-01 13:10:00 2014-01-01 14:40:00 10 13 520 520
## 2: 2 2014-01-01 13:25:00 2014-01-01 13:28:00 20 13 520 520
## 3: 3 2014-01-01 13:29:00 2014-01-01 15:40:00 -50 13 520 540
## 4: 4 2014-01-01 13:30:00 2014-01-01 14:05:00 -5 13 520 540
## 5: 5 2014-01-01 14:12:00 2014-01-01 14:40:00 12 14 560 555
## 6: 6 2014-01-01 14:21:00 2014-01-01 14:45:00 -16 14 560 555
## 7: 7 2014-01-01 14:50:00 2014-01-01 14:59:00 -14 14 560 561
## 8: 8 2014-01-01 14:58:00 2014-01-01 15:05:00 56 14 560 561
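The following code adds to each trade's equity the profits of the earlier trades listed in the same TradeHour, assuming your data is in a data frame called tradedata and the plyr package is loaded. Note that it groups by the hour in which a trade opened and never looks at CloseTime, so it reproduces the requested output only if every earlier trade in that hour has already closed: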
ddply( tradedata , .(TradeHour) ,
mutate,
Equity=Equity+ cumsum(Profit) - Profit )
If you want to include the profit of the particular trade, remove the -Profit.
You could run this in parallel by giving ddply the .parallel=TRUE option. An answer with data.table may be quicker, however. It would be interesting to see which works best.
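For tens of thousands of trades, one way to avoid comparing every trade with every other trade is to sort the close times once, take a cumulative sum of the profits and look up positions with findInterval(). This is a sketch in base R, not a benchmarked solution. It assumes times in a time zone with a whole-hour offset (UTC here), and the names trades, S and Equity.new are chosen for the example:

```r
trades <- data.frame(
  TradeNumber = 1:8,
  OpenTime  = as.POSIXct(c("2014-01-01 13:10", "2014-01-01 13:25", "2014-01-01 13:29",
                           "2014-01-01 13:30", "2014-01-01 14:12", "2014-01-01 14:21",
                           "2014-01-01 14:50", "2014-01-01 14:58"), tz = "UTC"),
  CloseTime = as.POSIXct(c("2014-01-01 14:40", "2014-01-01 13:28", "2014-01-01 15:40",
                           "2014-01-01 14:05", "2014-01-01 14:40", "2014-01-01 14:45",
                           "2014-01-01 14:59", "2014-01-01 15:05"), tz = "UTC"),
  Profit = c(10, 20, -50, -5, 12, -16, -14, 56),
  Equity = c(520, 520, 520, 520, 560, 560, 560, 560)
)

open_num   <- as.numeric(trades$OpenTime)
close_num  <- as.numeric(trades$CloseTime)
hour_start <- floor(open_num / 3600) * 3600   # start of the hour in which each trade opened

# Cumulative profit of all trades, ordered by close time (with a leading 0)
ord <- order(close_num)
cum_profit <- c(0, cumsum(trades$Profit[ord]))

# S(t): total profit of all trades that closed strictly before time t
S <- function(t) cum_profit[findInterval(t, close_num[ord], left.open = TRUE) + 1]

# Profit closed since the start of the hour and before the trade opened
trades$Equity.new <- trades$Equity + S(open_num) - S(hour_start)
trades$Equity.new   # 520 520 540 540 555 555 561 561
```

The result matches the Equity.new column of the data.table answer above, and the work is dominated by one sort rather than a scan of all trades for every row.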
