Looping over two R dataframes to create a third dataframe

I am trying to backtest a trading strategy.
I am coding in R 3.6.
The data is in two dataframes. The first is five years of daily price activity (i.e. fiveyrdaily). The second is price activity for the same five years, but at the one-minute level (i.e. fiveyrminutes). This data includes each session's low, high, open, and close.
My strategy is to loop over the respective days of the minute dataframe, while referring to and populating data on the daily dataframe to determine the following:
Condition that warrants the opening of a position (i.e. long or short).
Whether the target or stop price has been achieved.
Log the "trade" on a third dataframe (ORBOorders).
The embedded loops are not working the way I thought they would and I can't figure out why.
I know I can make the code simpler, but it keeps getting drawn out so that I can figure out where I'm going wrong.
I understand that appending to a vector in a loop is not the preferred way to handle vectors but I don't know how large the vector will be after the program is complete. I'd love to hear any ideas.
This is a strategy that seeks to catch price as it moves out of the range established in the first five minutes of the day.
This is the code to identify the direction of price movement after the initial 5 minutes. The counter is not working. The goal is to know the index of a specific event within the subsetted fiveyrminute dataframe (i.e. todaysdate); I will use this in the next batch of code below. The counter starts at 6 for the sixth minute. The output is never higher than 7, so, it's not counting with each iteration of the loop. I can't figure out why.
for (i in 1:length(fiveyrdaily$Date)){
todaysdate <- subset(fiveyrminutes, fiveyrdaily$Date[i] == fiveyrminutes$Date) # subset so I'm only seeing the respective days in the minutes dataframe
counter3 <- 6 # start a counter so I can track which position I'm in within the minutes dataframe, start after ORBO
for (t in counter:length(todaysdate$Date)){ #loop over the minutes dataframe
if ((fiveyrdaily$FiveHighClose[i] == 0) & (todaysdate$Close[t] > fiveyrdaily$FiveMinuteHigh[i])) {
fiveyrdaily$FiveHighClose[i] <- todaysdate$Close[t]
fiveyrdaily$FiveHighAfterClose[i] <- todaysdate$High[t] #record the session's high
counter3 <- counter3 + 1
fiveyrdaily$counterhigh[i] <- counter3 # record the position of this event within the subset of the minute dataframe
}
else if ((fiveyrdaily$FiveLowClose[i] == 0) & (todaysdate$Close[t] < fiveyrdaily$FiveMinuteLow[i])) {
fiveyrdaily$FiveLowClose[i] <- todaysdate$Close[t]
fiveyrdaily$FiveLowAfterClose[i] <- todaysdate$Low[t] # record the session's low
counter3 <- counter3 + 1
fiveyrdaily$counterlow[i] <- counter3
}
counter3 <- counter + 1
}
}
This is the "for" loop that determines whether to buy and records the outcome.
#Open a long position
for (i in 1:length(fiveyrdaily$Date)){
if ((fiveyrdaily$counterhigh[i] < fiveyrdaily$counterlow[i]) & openposition < 2) {
#entryprice is the price predetermined ticks above the high
entryprice <- fiveyrdaily$FiveHighAfterClose[i] + (openticksaway * tickvalue)
#create stoploss
stoplossprice <- entryprice - stoplossvalue
#uncover target closest to five minute high
if (possibletargets > fiveyrdaily$FiveHighAfterClose[i]){ fiveyrdaily$ORBOtarget[i] <- min(possibletargets)}
poscounter <- fiveyrdaily$counterhigh[i]
beginforloop <- todaysdate[poscounter]
for (c in beginforloop:length(todaysdate$Date)){ #see where to open position starting from the occurence of the high
if ((entryprice > todaysdate$Low[c]) & (entryprice < todaysdate$High[c]) & (openposition < 2)){ #determine if conditions warrant entry
openposition <- openposition + 1 # open a position
if (openposition > 0){ #trade management
openpos <- 1
if ((stoplossprice < todaysdate$high[c]) & (stoplossprice > todaysdate$Low[c])){ # determine if stoploss has been hit
orderdate <- c(orderdate, todaysdate[c]) #enter data into orders dataframe
orderstrategy <- c(orderstrategy, "ORBO")
ordertype <- c(ordertype, "Long")
ordersymbol <- c(ordersymbol, "ES")
orderentry <- c(orderentry, entryprice)
orderclose <- c(orderclose, stoplossprice)
orderprofit <- c(orderprofit, abs((entryprice - stoplossprice) * pointvalue))
openpos <- 0
}
else if ((fiveyrdaily$ORBOtarget[i] < todaysdate$high[c]) & (fiveyrdaily$ORBOtarget[i] > todaysdate$Low[c])){ # determine if target has been hit
orderdate <- c(orderdate, todaysdate[c]) #enter data into orders dataframe
orderstrategy <- c(orderstrategy, "ORBO")
ordertype <- c(ordertype, "Long")
ordersymbol <- c(ordersymbol, "ES")
orderentry <- c(orderentry, entryprice)
orderclose <- c(orderclose, stoplossprice)
orderprofit <- c(orderprofit, abs((fiveyrdaily$ORBOtarget[i] - entryprice) * pointvalue))
openpos <- 0
}
}
}
}
}
}
Here is the data that is in the daily dataframe - fiveyrdaily:
> tail(fiveyrdaily)
Date Time Open High Low Close Vol OI UpperBand
1254 06/12/2019 16:15 2883.50 2889.75 2875.25 2881.00 1205406 2495060 2919.12
1255 06/13/2019 16:15 2894.75 2900.50 2886.75 2898.50 523312 448119 2925.39
1256 06/14/2019 16:15 2893.00 2899.75 2884.25 2894.75 1318938 951568 2927.99
1257 06/17/2019 16:15 2895.50 2902.75 2892.00 2896.25 1649621 1595842 2932.71
1258 06/18/2019 16:15 2914.00 2936.50 2910.25 2926.25 2257843 2093571 2944.19
1259 06/19/2019 16:15 2925.50 2936.75 2915.25 2933.50 1639495 2093571 2954.61
LowerBand MidLine PP RSI OverBot OverSld SlowK SlowD OverBot.1
1254 2751.13 2835.13 2892.58 56.24 70 30 86.82 87.54 80
1255 2749.21 2837.30 2882.00 59.06 70 30 87.60 88.11 80
1256 2748.24 2838.11 2895.25 58.19 70 30 89.01 87.81 80
1257 2746.94 2839.82 2892.92 58.46 70 30 91.79 89.47 80
1258 2743.68 2843.94 2897.00 63.41 70 30 92.63 91.14 80
1259 2740.02 2847.31 2924.33 64.51 70 30 95.20 93.21 80
OverSld.1 Volume Momentum ZeroLine InOrOut GapUpOrDown TypeOfDay PPTrend
1254 20 1205406 49.25 0 Disregard Disregard Bear Uptrend
1255 20 523312 93.50 0 Disregard Gap Up Bull Uptrend
1256 20 1318938 114.75 0 Disregard Disregard Bull Uptrend
1257 20 1649621 105.75 0 Disregard Disregard Bull Uptrend
1258 20 2257843 173.75 0 Disregard Gap Up Bull Uptrend
1259 20 1639495 184.00 0 Disregard Disregard Bull Uptrend
FiveMinuteLow FiveMinuteHigh FiveMinHtoL DollarFiveHtoL FiveHighClose
1254 2881.25 2889.75 8.50 425.0 0.00
1255 2892.00 2895.75 3.75 187.5 2896.00
1256 2886.25 2893.75 7.50 375.0 2894.75
1257 2892.00 2897.00 5.00 250.0 2897.25
1258 2910.25 2915.50 5.25 262.5 2920.75
1259 2923.25 2927.25 4.00 200.0 2930.00
FiveHighAfterClose FiveLowClose FiveLowAfterClose ORBOtarget counterhigh
1254 0.00 2881.00 2880.75 0 0
1255 2896.00 2891.75 2891.00 0 6
1256 2894.75 2886.00 2885.00 0 7
1257 2897.50 0.00 0.00 0 7
1258 2921.75 0.00 0.00 0 7
1259 2931.50 2922.50 2922.25 0 7
counterlow DollarOpentoHigh DollarOpentoClose DollarOpentoLow
1254 7 312.5 125.0 412.5
1255 7 287.5 187.5 400.0
1256 7 337.5 87.5 437.5
1257 0 362.5 37.5 175.0
1258 0 1125.0 612.5 187.5
1259 7 562.5 400.0 512.5
This is the data that is in the minutes dataframe - fiveyrminutes:
Date Time Open High Low Close Up Down UpperBand
509796 06/19/2019 16:10 2932.25 2932.50 2932.25 2932.50 717 430 2935.66
509797 06/19/2019 16:11 2932.25 2932.50 2932.25 2932.25 125 276 2935.46
509798 06/19/2019 16:12 2932.25 2932.75 2932.25 2932.75 612 604 2934.95
509799 06/19/2019 16:13 2932.50 2933.25 2932.50 2933.00 830 153 2934.66
509800 06/19/2019 16:14 2933.25 2933.25 2932.75 2933.00 676 376 2934.26
509801 06/19/2019 16:15 2932.75 2934.00 2932.75 2933.25 2929 2026 2933.90
LowerBand MidLine PP RSI OverBot OverSld SlowK SlowD OverBot.1
509796 2930.27 2932.96 2932.42 47.94 70 30 45.45 40.66 80
509797 2930.24 2932.85 2932.42 46.22 70 30 53.70 46.21 80
509798 2930.45 2932.70 2932.33 50.07 70 30 63.83 54.33 80
509799 2930.56 2932.61 2932.58 51.92 70 30 72.73 63.42 80
509800 2930.76 2932.51 2932.92 51.92 70 30 83.33 73.30 80
509801 2930.97 2932.44 2933.00 53.90 70 30 85.00 80.35 80
OverSld.1 Volume Momentum ZeroLine
509796 20 1147 -0.50 0
509797 20 401 -1.00 0
509798 20 1216 1.75 0
509799 20 983 1.75 0
509800 20 1052 2.00 0
509801 20 4955 1.75 0
This is the output for the orders dataframe, notice how it's empty - ORBOorders:
ORBOorders
orderdate orderstrategy ordertype ordersymbol orderentry orderclose
1 0 0
orderprofit
1 0
These are the problems:
-counter3 isn't working (to find after which point I should buy)
-The second batch of code gives this error:
Error in beginforloop:length(todaysdate$Date) : argument of length 0
In addition: Warning message: In if (possibletargets >
fiveyrdaily$FiveHighAfterClose[i]) { : the condition has length > 1
and only the first element will be used
-The ORBOorder dataframe has absolutely no data in it.
Thanks for any help in advance!
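One likely reason the counter never climbs is visible in the first loop itself: counter3 is overwritten from counter at the end of every pass, while the loop index t already holds the position within todaysdate. Below is a minimal sketch of finding the first breakout positions with which() instead of a counter. The function first_breakout, its arguments, and the sample prices are made up for illustration and are not part of the original code.

```r
# Hypothetical helper: position of the first close above/below the opening range,
# looking only at bars from `start` onward (6 = first bar after the 5-minute range).
first_breakout <- function(close, range_high, range_low, start = 6) {
  idx  <- seq_along(close)
  up   <- which(idx >= start & close > range_high)
  down <- which(idx >= start & close < range_low)
  list(
    high_idx = if (length(up))   up[1]   else NA_integer_,  # NA when no upside breakout
    low_idx  = if (length(down)) down[1] else NA_integer_   # NA when no downside breakout
  )
}

# Made-up one-day series of minute closes
close <- c(100, 101, 102, 101, 100, 101, 103, 104, 99, 98)
res <- first_breakout(close, range_high = 102, range_low = 100)
res$high_idx  # 7
res$low_idx   # 9
```

Inside the daily loop this could be called as first_breakout(todaysdate$Close, fiveyrdaily$FiveMinuteHigh[i], fiveyrdaily$FiveMinuteLow[i]), and the returned positions stored in counterhigh and counterlow. Using NA rather than 0 for "no breakout" would also make the later comparison of the two positions easier to reason about.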

Related

Create a vector from a specific sequence of intervals

I have 20 intervals:
10 intervals from 1 to 250 of size 25:
[1.25] [26.50] [51.75] [76.100] [101.125] [126.150] ... [226.250]
10 intervals from 251 to 1000 of size 75:
[251,325] [326,400] [401,475] [476,550] [551,625] ... [926,1000]
I would like to create a vector composed of the first 5 elements of each interval like:
(1,2,3,4,5, 26,27,28,29,30, 51,52,53,54,55, 76,77,78,79,80, ....,
251,252,253,254,255, 326,327,328,329,330, ...)
How can I create this vector using R?
Let's assume you have two intervals like:
interval1 <- seq(1.25, 226.250, 25)
interval2 <- seq(251, 1000, 75)
We can create a new interval combining the two and then use mapply to create the sequences
new_interval <- c(as.integer(interval1), interval2)
c(mapply(`:`, new_interval, new_interval + 4))
#[1] 1 2 3 4 5 26 27 28 29 30 51 52 53 54 .....
#[89] ..... 779 780 851 852 853 854 855 926 927 928 929 930
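As an alternative sketch, assuming the interval starts are the integers 1, 26, ..., 226 and 251, 326, ..., 926, the same vector can be built without mapply by repeating each start five times and adding the offsets 0 to 4. The names starts and first_five are made up here.

```r
# Start of each of the 20 intervals (assumed integer starts)
starts <- c(seq(1, 226, by = 25), seq(251, 926, by = 75))

# Repeat every start 5 times, then add the recycled offsets 0..4
first_five <- rep(starts, each = 5) + 0:4

head(first_five, 10)  # 1 2 3 4 5 26 27 28 29 30
length(first_five)    # 100
```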

Store values in a cell dataframe

I am trying to store data in multiple cells of a dataframe, but my code only stores it in the last cell (the last element of the dd array). Please see my output below.
Can somebody please correct me? I cannot figure out what I am doing wrong.
Thanks in advance,
MyData <- read.csv(file="Pat_AR_035.csv", header=TRUE, sep=",")
dd <- unique(MyData$POLICY_NUM)
for (j in length(dd)) {
myDF <- data.frame(i=1:length(dd), m=I(vector('list', length(dd))))
myDF$m[[j]] <- data.frame(j,MyData[which(MyData$POLICY_NUM==dd[j] & MyData$ACRES), ],ncol(MyData),nrow(MyData))
}
[[60]]
NULL
[[61]]
NULL
[[62]]
NULL
[[63]]
j OBJECTID DIVISION POLICY_SYM POLICY_NUM YIELD_ID LINE_ID RH_CLU_ID ACRES PLANT_DATE ACRE_TYPE CLU_DETERM STATE COUNTY FARM_SERIA TRACT
1646 63 1646 8 MP 754033 3 20 39565604 8.56 5/3/2014 PL A 3 35 109 852
1647 63 1647 8 MP 754033 1 10 39565605 30.07 4/19/2014 PL A 3 35 109 852
1648 63 1648 8 MP 754033 1 10 39565606 56.59 4/19/2014 PL A 3 35 109 852
CLU_NUMBER FIELD_ACRE RMA_CLU_ID UPDATE_DAT Percent_Ar RHCLUID Field1 OBJECTID_1 DIVISION_1 STATE_1 COUNTY_1
1646 3 8.56 F68E591A-ECC2-470B-A012-201C3BB20D7F 9/21/2014 63.4990 39565604 1646 1646 8 3 35
1647 1 30.07 eb04cfc0-e78b-415f-b447-9595c81ef09e 9/21/2014 100.0000 39565605 1647 1647 8 3 35
1648 2 56.59 5922d604-e31c-4b9d-b846-9f38e2d18abe 9/21/2014 92.1442 39565606 1648 1648 8 3 35
POLICY_N_1 YIELD_ID_1 RH_CLU_ID_ short_dist coords_x1 coords_x2 optional SHAPE_Leng SHAPE_Area ncol.MyData. nrow.MyData.
1646 754033 3 39565604 5.110837 516747.8 -221751.4 TRUE 831.3702 34634.73 35 1757
1647 754033 1 39565605 5.606284 515932.1 -221702.0 TRUE 1469.4800 121611.46 35 1757
1648 754033 1 39565606 5.325399 516380.1 -221640.9 TRUE 1982.8757 228832.22 35 1757
for (j in length(dd))
This doesn’t iterate over dd — it iterates over a single number: the length of dd. Not much of an iteration. You probably meant to write the following or something similar:
for (j in seq_along(dd))
However, there are more issues with your code. For instance, the myDF variable is continuously overwritten inside your loop, which probably isn’t what you intended at all. Instead, you should probably create objects in an lapply statement and forego the loop.
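A minimal sketch of the lapply approach follows. The toy MyData is made up (the real data comes from the CSV), and the filter ACRES > 0 is only an assumption about what the original `& MyData$ACRES` condition was meant to express.

```r
# Made-up stand-in for the CSV data
MyData <- data.frame(
  POLICY_NUM = c(101, 101, 202, 303, 303, 303),
  ACRES      = c(8.5, 30.1, 56.6, 0, 12.0, 4.2)
)

dd <- unique(MyData$POLICY_NUM)

# One data frame per policy number, built in a single pass;
# nothing is overwritten between iterations
per_policy <- lapply(dd, function(p) {
  MyData[MyData$POLICY_NUM == p & MyData$ACRES > 0, ]
})
names(per_policy) <- dd

length(per_policy)          # 3
nrow(per_policy[["303"]])   # 2 (the row with ACRES == 0 is dropped)
```

If no extra filtering is needed, split(MyData, MyData$POLICY_NUM) gives a similar list in one call.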

Creating data continuously using rnorm until an outlier occurs in R

Sorry for the confusing title, but I wasn't sure how to title what I am trying to do. My objective is to create a dataset of 1000 observations, each of which is the length of a run. I have created a phase1 dataset, from which a set of control limits is produced. What I am trying to do now is create a phase2 dataset, most likely using rnorm: a repeat loop that keeps generating values for the phase2 dataset until one of them falls outside the control limits produced from the phase1 dataset. For example, if I had 3.0 and -3.0 as control limits, the phase2 dataset would accumulate observations until, say, observation 398, where the value happens to be 3.45, which stops the creation of data. My objective is then to record the number 398. Furthermore, I want to loop the code back to the phase1 dataset/control limits portion, create a new set of control limits, and run another phase2, until I have 1000 run lengths recorded. The code I have for the phase1/control limits works fine and looks like this:
nphase1=50
nphase2=1000
varcount=1
meanshift= 0
sigmashift= 1
##### phase1 dataset/ control limits #####
phase1 <- matrix(rnorm(nphase1*varcount, 0, 1), nrow = nphase1, ncol=varcount)
mean_var <- apply(phase1, 2, mean)
std_var <- apply(phase1, 2, sd)
df_var <- data.frame(mean_var, std_var)
Upper_SPC_Limit_Method1 <- with(df_var, mean_var + 3 * std_var)
Lower_SPC_Limit_Method1 <- with(df_var, mean_var - 3 * std_var)
df_control_limits<- data.frame(Upper_SPC_Limit_Method1, Lower_SPC_Limit_Method1)
I have previously created this code in SAS, and it looks like this. It might be a better reference for what I am trying to achieve than my attempt to explain it.
%macro phase2_dataset (n=,varcount=, meanshift=, sigmashift=, nphase1=,simID=,);
%do z=1 %to &n;
%phase1_dataset (n=&nphase1, varcount=&varcount);
data phase2; set control_limits n=lastobs;
call streaminit(0);
do until (phase2_var1<Lower_SPC_limit_method1_var1 or
phase2_var1>Upper_SPC_limit_method1_var1);
phase2_var1 = rand("normal", &meanshift, &sigmashift);
output;
end;
run;
ods exclude all;
proc means data=phase2;
var phase2_var1;
ods output summary=x;
run;
ods select all;
data run_length; set x;
keep Phase2_var1_n;
run;
proc append base= QA.Phase2_dataset&simID data=Run_length force; run;
%end;
%mend;
I have also been researching using a while loop in place of the repeat loop.
I'm new to R, so any ideas you are able to throw my way are greatly appreciated. Thanks!
Using a while loop indeed seems to be the way to go. Here's what I think you're looking for:
set.seed(10) #Making results reproducible
replicate(100, { #100 is easier to display here
phase1 <- matrix(rnorm(nphase1*varcount, 0, 1), nrow = nphase1, ncol=varcount)
mean_var <- colMeans(phase1) #Slightly better than apply
std_var <- apply(phase1, 2, sd)
df_var <- data.frame(mean_var, std_var)
Upper_SPC_Limit_Method1 <- with(df_var, mean_var + 3 * std_var)
Lower_SPC_Limit_Method1 <- with(df_var, mean_var - 3 * std_var)
df_control_limits<- data.frame(Upper_SPC_Limit_Method1, Lower_SPC_Limit_Method1)
#Phase 2
x <- 0
count <- 0
while(x > Lower_SPC_Limit_Method1 && x < Upper_SPC_Limit_Method1) {
x <- rnorm(1)
count <- count + 1
}
count
})
The result is:
[1] 225 91 97 118 304 275 550 58 115 6 218 63 176 100 308 844 90 2758
[19] 161 311 1462 717 2446 74 175 91 331 210 118 1517 420 32 39 201 350 89
[37] 64 385 212 4 72 730 151 7 1159 65 36 333 97 306 531 1502 26 18
[55] 67 329 75 532 64 427 39 352 283 483 19 9 2 1018 137 160 223 98
[73] 15 182 98 41 25 1136 405 474 1025 1331 159 70 84 129 233 2 41 66
[91] 1 23 8 325 10 455 363 351 108 3
If performance becomes a problem, perhaps it would be interesting to explore some improvements, like creating more numbers with rnorm() at a time and then counting how many are necessary to exceed the limits and repeat if necessary.
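The batching idea could look roughly like the sketch below. The function run_length and its batch argument are made up for illustration: it draws a block of values at once, looks for the first one outside the limits, and draws another block only if none is found.

```r
# Hypothetical helper: number of draws up to and including the first value
# that falls outside [lower, upper], generating `batch` values at a time.
run_length <- function(lower, upper, mean = 0, sd = 1, batch = 1000) {
  count <- 0
  repeat {
    x   <- rnorm(batch, mean, sd)
    out <- which(x < lower | x > upper)
    if (length(out) > 0) {
      return(count + out[1])   # position of the first out-of-limit draw
    }
    count <- count + batch     # whole batch was inside the limits; keep going
  }
}

set.seed(10)
run_length(-3, 3)
```

Inside the replicate() call above, the while loop would then be replaced by run_length(Lower_SPC_Limit_Method1, Upper_SPC_Limit_Method1, meanshift, sigmashift). Note that the random draws are consumed in a different order, so results will not match the while-loop version for the same seed.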

Binning a dataframe with equal frequency of samples

I have binned my data using the cut function
breaks<-seq(0, 250, by=5)
data<-split(df2, cut(df2$val, breaks))
My split dataframe looks like
... ...
$`(15,20]`
val ks_Result c
15 60 237
18 70 247
... ...
$`(20,25]`
val ks_Result c
21 20 317
24 10 140
... ...
My bins looks like
> table(data)
data
(0,5] (5,10] (10,15] (15,20] (20,25] (25,30] (30,35]
0 0 0 7 128 2748 2307
(35,40] (40,45] (45,50] (50,55] (55,60] (60,65] (65,70]
1404 11472 1064 536 7389 1008 1714
(70,75] (75,80] (80,85] (85,90] (90,95] (95,100] (100,105]
2047 700 329 1107 399 376 323
(105,110] (110,115] (115,120] (120,125] (125,130] (130,135] (135,140]
314 79 1008 77 474 158 381
(140,145] (145,150] (150,155] (155,160] (160,165] (165,170] (170,175]
89 660 15 1090 109 824 247
(175,180] (180,185] (185,190] (190,195] (195,200] (200,205] (205,210]
1226 139 531 174 1041 107 257
(210,215] (215,220] (220,225] (225,230] (230,235] (235,240] (240,245]
72 671 98 212 70 95 25
(245,250]
494
When I take the mean of the bin counts, I get an average of ~900 samples per bin
> mean(table(data))
[1] 915.9
I want to tell R to make irregular bins in such a way that each bin contains about 900 samples (e.g. (0, 27] = 900, (27,28.5] = 900, and so on). I found something similar here, which deals with only one variable, not the whole dataframe.
I also tried the Hmisc package; unfortunately, the bins don't contain equal frequencies:
library(Hmisc)
data<-split(df2, cut2(df2$val, g=30, oneval=TRUE))
data<-split(df2, cut2(df2$val, m=1000, oneval=TRUE))
Assuming you want 50 equal-sized buckets (based on your seq statement), you can use something like:
df <- data.frame(var=runif(500, 0, 100)) # make data
cut.vec <- cut(
df$var,
breaks=quantile(df$var, 0:50/50), # breaks along 1/50 quantiles
include.lowest=T
)
df.split <- split(df, cut.vec)
Hmisc::cut2 has this option built in as well.
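To target a bin size rather than a bin count, the number of quantile breaks can be derived from the number of rows. The sketch below uses a made-up df2 with 9000 uniform values, and the names n_bins, bins, and binned are illustrative. With real data that contains many tied values, the bin counts will only be approximately equal, and duplicated quantiles would make cut() fail.

```r
set.seed(42)
df2 <- data.frame(val = runif(9000, 0, 250))   # made-up stand-in for the real data

target <- 900                                   # desired samples per bin
n_bins <- nrow(df2) %/% target                  # 10 bins here

# Irregular break points placed at equally spaced quantiles
breaks <- quantile(df2$val, probs = seq(0, 1, length.out = n_bins + 1))
bins   <- cut(df2$val, breaks = breaks, include.lowest = TRUE)

table(bins)                 # each bin should hold 900 rows for this tie-free data
binned <- split(df2, bins)  # splits the whole dataframe, not just val
```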
This can be done with the function provided here by Joris Meys:
EqualFreq2 <- function(x,n){
nx <- length(x)
nrepl <- floor(nx/n)
nplus <- sample(1:n,nx - nrepl*n)
nrep <- rep(nrepl,n)
nrep[nplus] <- nrepl+1
x[order(x)] <- rep(seq.int(n),nrep)
x
}
data<-split(df2, EqualFreq2(df2$val, 25))

Find the non zero values and frequency of those values in R

I have data with two parameters: date/time and flow. The flow is intermittent. Let's say at times there is zero flow, then the flow suddenly starts and there are non-zero values for some time, and then the flow goes back to zero. I want to understand when the non-zero values occur and how long each non-zero flow lasts. I have attached the sample dataset at this location https://www.dropbox.com/s/ef1411dq4gyg0cm/sampledataflow.csv
The data is 1 minute data.
I was able to import the data into R as follows:
flow <- read.csv("sampledataflow.csv")
summary(flow)
names(flow) <- c("Date","discharge")
flow$Date <- strptime(flow$Date, format="%m/%d/%Y %H:%M")
sapply(flow,class)
plot(flow$Date, flow$discharge,type="l")
I made a plot to see the distribution but couldn't get a clue where to start to get the frequency of each non-zero value. I would like to see an output table as follows:
Date Duration in Minutes
Please let me know if I am not clear here. Thanks.
Additional Info:
I think we need to check for a non-zero value first and then find how many non-zero values follow continuously before the flow reaches zero again. What I want to understand is the flow release durations. For example, in one day there might be multiple releases, and I want to note at what time each release started and how long it continued before returning to zero. I hope this explains the problem a little better.
First, note that you have a lot of NA values in your data, in case you want to look into it.
If I understand correctly, you require the count of continuous 0's followed by continuous non-zeros, zeros, non-zeros etc.. for each date.
This can be achieved with rle of course, as also mentioned by @mnel under comments. But there are quite a few catches.
First, I'll set up the data with non-NA entries:
flow <- read.csv("~/Downloads/sampledataflow.csv")
names(flow) <- c("Date","discharge")
flow <- flow[1:33119, ] # remove NA entries
# format Date to POSIXct to play nice with data.table
flow$Date <- as.POSIXct(flow$Date, format="%m/%d/%Y %H:%M")
Next, I'll create a Date column:
flow$g1 <- as.Date(flow$Date)
Finally, I prefer using data.table. So here's a solution using it.
# load package, get data as data.table and set key
require(data.table)
flow.dt <- data.table(flow)
# set key to both "Date" and "g1" (even though we'll use just g1)
# to make sure that the order of rows is not changed (during sort)
setkey(flow.dt, "Date", "g1")
# group by g1 and set data to TRUE/FALSE by equating to 0 and get rle lengths
out <- flow.dt[, list(duration = rle(discharge == 0)$lengths,
val = rle(discharge == 0)$values + 1), by=g1][val == 2, val := 0]
> out # just to show a few first and last entries
# g1 duration val
# 1: 2010-05-31 120 0
# 2: 2010-06-01 722 0
# 3: 2010-06-01 138 1
# 4: 2010-06-01 32 0
# 5: 2010-06-01 79 1
# ---
# 98: 2010-06-22 291 1
# 99: 2010-06-22 423 0
# 100: 2010-06-23 664 0
# 101: 2010-06-23 278 1
# 102: 2010-06-23 379 0
So, for example, for 2010-06-01, there are 722 0's followed by 138 non-zeros, followed by 32 0's followed by 79 non-zeros and so on...
I looked at a small sample of the first two days
> do.call( cbind, tapply(flow$discharge, as.Date(flow$Date), function(x) table(x > 0) ) )
2010-06-01 2010-06-02
FALSE 1223 911
TRUE 217 529 # these are the cumulative daily durations of positive flow.
You may want this transposed in which case the t() function should succeed. Or you could use rbind.
If you just wanted the number of flow-positive minutes, this would also work:
tapply(flow$discharge, as.Date(flow$Date), function(x) sum(x > 0, na.rm=TRUE) )
#--------
2010-06-01 2010-06-02 2010-06-03 2010-06-04 2010-06-05 2010-06-06 2010-06-07 2010-06-08
217 529 417 463 0 0 263 220
2010-06-09 2010-06-10 2010-06-11 2010-06-12 2010-06-13 2010-06-14 2010-06-15 2010-06-16
244 219 287 234 31 245 311 324
2010-06-17 2010-06-18 2010-06-19 2010-06-20 2010-06-21 2010-06-22 2010-06-23 2010-06-24
299 305 124 129 295 296 278 0
To get the lengths of intervals with discharge values greater than zero:
tapply(flow$discharge, as.Date(flow$Date), function(x) rle(x>0)$lengths[rle(x>0)$values] )
#--------
$`2010-06-01`
[1] 138 79
$`2010-06-02`
[1] 95 195 239
$`2010-06-03`
[1] 57 360
$`2010-06-04`
[1] 6 457
$`2010-06-05`
integer(0)
$`2010-06-06`
integer(0)
... Snipped output
If you want to look at the distribution of these durations you will need to unlist that result. (And remember that the durations which were split at midnight may have influenced the counts and durations.) If you just wanted durations without dates, then use this:
flowrle <- rle(flow$discharge>0)
flowrle$lengths[!is.na(flowrle$values) & flowrle$values]
#----------
[1] 138 79 95 195 296 360 6 457 263 17 203 79 80 85 30 189 17 270 127 107 31 1
[23] 2 1 241 311 229 13 82 299 305 3 121 129 295 3 2 291 278
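To get the table the question asks for (start time plus duration in minutes), the run lengths can be combined with their start positions. This is a minimal base R sketch on a made-up ten-minute series. The names runs, starts, and releases are illustrative, and it assumes one row per minute with no NA values.

```r
# Made-up 1-minute series: two releases, lasting 2 and 3 minutes
flow <- data.frame(
  Date      = as.POSIXct("2010-06-01 00:00", tz = "UTC") + 60 * (0:9),
  discharge = c(0, 0, 5, 7, 0, 0, 0, 3, 2, 1)
)

runs   <- rle(flow$discharge > 0)   # alternating zero / non-zero runs
ends   <- cumsum(runs$lengths)      # last row index of each run
starts <- ends - runs$lengths + 1   # first row index of each run

# Keep only the non-zero runs
releases <- data.frame(
  start        = flow$Date[starts[runs$values]],
  duration_min = runs$lengths[runs$values]
)
releases
# Two rows: a release starting at 00:02 lasting 2 minutes,
# and one starting at 00:07 lasting 3 minutes.
```

Because the runs are computed over the whole series rather than per day, a release that spans midnight is counted once instead of being split.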
