arrange dataframe based on one column eliminating the unwanted responses - r

I have this data (date, time, three numeric columns, and the signal in the last column):
1 2009-01-13 09:55:00 4645.00 4838.931 5358.883 Buy2
2 2009-01-14 09:55:00 4767.50 4718.254 5336.703 Buy1
3 2009-01-15 09:55:00 4485.00 4653.316 5274.384 Buy2
4 2009-01-16 09:55:00 4580.00 4537.693 5141.435 Buy1
5 2009-01-19 09:55:00 4532.00 4548.088 4891.041 Buy2
6 2009-01-27 09:55:00 4190.00 4183.503 4548.497 Buy1
7 2009-01-30 09:55:00 4436.00 4155.236 4377.907 Sell1
8 2009-02-02 09:55:00 4217.00 4152.626 4390.802 Sell2
9 2009-02-09 09:55:00 4469.00 4203.437 4376.277 Sell1
10 2009-02-12 09:55:00 4469.90 4220.845 4503.798 Sell2
11 2009-02-13 09:55:00 4553.00 4261.980 4529.777 Sell1
12 2009-02-16 09:55:00 4347.20 4319.656 4564.387 Sell2
13 2009-02-17 09:55:00 4161.05 4371.474 4548.912 Buy2
14 2009-02-27 09:55:00 3875.55 3862.085 4101.929 Buy1
15 2009-03-02 09:55:00 3636.00 3846.423 4036.020 Buy2
16 2009-03-12 09:55:00 3420.00 3372.665 3734.949 Buy1
17 2009-03-13 09:55:00 3656.00 3372.100 3605.357 Sell1
18 2009-03-17 09:55:00 3650.00 3360.421 3663.322 Sell2
19 2009-03-18 09:55:00 3721.00 3363.735 3682.293 Sell1
20 2009-03-20 09:55:00 3687.00 3440.651 3784.778 Sell2
and have to arrange it in this form
2 2009-01-14 09:55:00 4767.50 4718.254 5336.703 Buy1
7 2009-01-30 09:55:00 4436.00 4155.236 4377.907 Sell1
8 2009-02-02 09:55:00 4217.00 4152.626 4390.802 Sell2
13 2009-02-17 09:55:00 4161.05 4371.474 4548.912 Buy2
14 2009-02-27 09:55:00 3875.55 3862.085 4101.929 Buy1
17 2009-03-13 09:55:00 3656.00 3372.100 3605.357 Sell1
18 2009-03-17 09:55:00 3650.00 3360.421 3663.322 Sell2
So that the data is arranged in the order Buy1, Sell1, Sell2, Buy2, eliminating the observations in between.
I have tried several dplyr::filter commands, but none gives the desired output.

If I have understood your problem correctly, the following code should solve it. It is adapted from this discussion.
The idea is to define your sequence as a pattern:
pattern <- c("Buy1", "Sell1", "Sell2", "Buy2")
Then find the position of this pattern in your column:
library(zoo)
pos <- which(rollapply(data$signal, 4, identical, pattern, fill = FALSE, align = "left"))
and extract the rows following the position of your patterns:
rows <- unlist(lapply(pos, function(x, n) seq(x, x+n-1), 4))
data_filtered <- data[rows,]
Voilà.
EDIT
Since I had misunderstood your problem, here is a new solution.
You want to retrieve the sequence "Buy1", "Sell1", "Sell2", "Buy2" in your column, and eliminate the observations that do not fit in this sequence. I do not see a trivial vectorised solution, so here is a loop to solve that. Depending on the size of your data, you may want to implement a similar algorithm in Rcpp or vectorise it in some way.
sequence <- c("Buy1", "Sell1", "Sell2", "Buy2")
keep <- logical(length(data$signal))
s <- 0
for (i in seq_along(data$signal)) {
  if (sequence[s + 1] == data$signal[i]) {
    keep[i] <- TRUE
    s <- (s + 1) %% 4
  } else {
    keep[i] <- FALSE
  }
}
data_filtered <- data[keep,]
Tell me if this works better.
If anyone has a vectorised solution, I would be curious to see it.
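For what it's worth, Reduce() can thread the state through the column without an explicit loop, though it is still sequential under the hood. A sketch, assuming the same data and sequence as above:
# Thread the state s through signal; a row is kept exactly when the
# state advances, mirroring the loop above.
states <- Reduce(function(s, sig) if (sequence[s + 1] == sig) (s + 1) %% 4 else s,
                 as.character(data$signal), 0, accumulate = TRUE)
keep <- head(states, -1) != tail(states, -1)
data_filtered <- data[keep, ]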

You can coerce the column data$signal into a factor and define the levels.
data$signal <- factor(data$signal, levels = c("Buy1", "Sell1", "Buy2", "Sell2"))
Then you can sort it
sorted.data <- data[order(data$signal), ]
Here is a great answer that talks about what you want to do:
Sort data frame column by factor

Here is a Rcpp solution:
library(Rcpp)
cppFunction('LogicalVector FindHit(const CharacterVector x, const CharacterVector y) {
    LogicalVector res(x.size());
    int k = 0;
    for(int i = 0; i < x.size(); i++){
        if(x[i] == y[k]){
            res[i] = true;
            k = (k + 1) % y.size();
        }
    }
    return res;
}')
dtt[FindHit(dtt$V6, c('Buy1', 'Sell1', 'Sell2', 'Buy2')),]
# V1 V2 V3 V4 V5 V6
# 2 2009-01-14 09:55:00 4767.50 4718.254 5336.703 Buy1
# 7 2009-01-30 09:55:00 4436.00 4155.236 4377.907 Sell1
# 8 2009-02-02 09:55:00 4217.00 4152.626 4390.802 Sell2
# 13 2009-02-17 09:55:00 4161.05 4371.474 4548.912 Buy2
# 14 2009-02-27 09:55:00 3875.55 3862.085 4101.929 Buy1
# 17 2009-03-13 09:55:00 3656.00 3372.100 3605.357 Sell1
# 18 2009-03-17 09:55:00 3650.00 3360.421 3663.322 Sell2
Here is the dtt:
> dput(dtt)
structure(list(V1 = c("2009-01-13", "2009-01-14", "2009-01-15",
"2009-01-16", "2009-01-19", "2009-01-27", "2009-01-30", "2009-02-02",
"2009-02-09", "2009-02-12", "2009-02-13", "2009-02-16", "2009-02-17",
"2009-02-27", "2009-03-02", "2009-03-12", "2009-03-13", "2009-03-17",
"2009-03-18", "2009-03-20"), V2 = c("09:55:00", "09:55:00", "09:55:00",
"09:55:00", "09:55:00", "09:55:00", "09:55:00", "09:55:00", "09:55:00",
"09:55:00", "09:55:00", "09:55:00", "09:55:00", "09:55:00", "09:55:00",
"09:55:00", "09:55:00", "09:55:00", "09:55:00", "09:55:00"),
V3 = c(4645, 4767.5, 4485, 4580, 4532, 4190, 4436, 4217,
4469, 4469.9, 4553, 4347.2, 4161.05, 3875.55, 3636, 3420,
3656, 3650, 3721, 3687), V4 = c(4838.931, 4718.254, 4653.316,
4537.693, 4548.088, 4183.503, 4155.236, 4152.626, 4203.437,
4220.845, 4261.98, 4319.656, 4371.474, 3862.085, 3846.423,
3372.665, 3372.1, 3360.421, 3363.735, 3440.651), V5 = c(5358.883,
5336.703, 5274.384, 5141.435, 4891.041, 4548.497, 4377.907,
4390.802, 4376.277, 4503.798, 4529.777, 4564.387, 4548.912,
4101.929, 4036.02, 3734.949, 3605.357, 3663.322, 3682.293,
3784.778), V6 = c("Buy2", "Buy1", "Buy2", "Buy1", "Buy2",
"Buy1", "Sell1", "Sell2", "Sell1", "Sell2", "Sell1", "Sell2",
"Buy2", "Buy1", "Buy2", "Buy1", "Sell1", "Sell2", "Sell1",
"Sell2")), row.names = c(NA, -20L), class = "data.frame")

Related

How to select rows where two dates are close to N-days apart for various number of N's efficiently?

Let's say I have the following data.table:
DT = structure(list(date = structure(c(17774, 16545, 15398, 17765,
17736, 16342, 15896, 17928, 16692, 18022), class = "Date"), exdate = structure(c(17809,
16549, 15605, 17781, 17746, 16361, 16060, 17977, 16724, 18033
), class = "Date"), price_at_entry = c(301.66, 205.27, 33.81,
321.64, 297.43, 245.26, 122.27, 312.21, 253.19, 255.34), strike_price = c(195,
212.5, 37, 255, 430, 120, 46, 320, 440, 245)), row.names = c(NA,
-10L), class = c("data.table", "data.frame"))
DT[, `:=`(DTE = as.integer(difftime(exdate, date, unit = 'days')))]
date exdate price_at_entry strike_price DTE
1: 2018-08-31 2018-10-05 301.66 195.0 35
2: 2015-04-20 2015-04-24 205.27 212.5 4
3: 2012-02-28 2012-09-22 33.81 37.0 207
4: 2018-08-22 2018-09-07 321.64 255.0 16
5: 2018-07-24 2018-08-03 297.43 430.0 10
6: 2014-09-29 2014-10-18 245.26 120.0 19
7: 2013-07-10 2013-12-21 122.27 46.0 164
8: 2019-02-01 2019-03-22 312.21 320.0 49
9: 2015-09-14 2015-10-16 253.19 440.0 32
10: 2019-05-06 2019-05-17 255.34 245.0 11
I want to subset the data.table for rows whose DTE is within 10 units of various DTE_target values. My current solution is to use rbindlist and lapply to basically loop through the values of DTE_target, something like this:
rbindlist(
  lapply(c(7, 30, 60, 90), function(DTE_target){
    DT[DT[, .I[abs(DTE - DTE_target) == min(abs(DTE - DTE_target)) &
                 abs(DTE - DTE_target) < 10], by = date]$V1][, DTE_target := DTE_target]
  })
)
date exdate price_at_entry strike_price DTE DTE_target
1: 2015-04-20 2015-04-24 205.27 212.5 4 7
2: 2018-08-22 2018-09-07 321.64 255.0 16 7
3: 2018-07-24 2018-08-03 297.43 430.0 10 7
4: 2019-05-06 2019-05-17 255.34 245.0 11 7
5: 2018-08-31 2018-10-05 301.66 195.0 35 30
6: 2015-09-14 2015-10-16 253.19 440.0 32 30
Is there a more data.table-like, efficient solution? I basically need to use this process on potentially billions of rows. I am also open to a PostgreSQL solution, if possible. Also, after obtaining the above result, I repeat a similar process using price_at_entry and strike_price (which in its current form introduces even more looping).
Maybe it's possible to use rolling joins, joining the data on itself using date and exdate as the keys with roll = 10? But I cannot seem to get a solution that makes sense.
Any help would be appreciated. Thanks!
EDIT:
I can't believe I missed this... Here is a potential solution that I need to keep exploring, but it seems to be very efficient.
DTE_target = c(7,14,30,60,90,120,150, 180, 210, 240, 270, 300)
# create a map of target DTEs with the +/- range
# (for some reason I have to duplicate the column for the join to pull DTE_target)
DTE_table = data.table(DTE = DTE_target,
                       DTE_low = DTE_target - 10,
                       DTE_high = DTE_target + 10,
                       DTE_target = DTE_target)
# map on nearest
DTE_table[DT, on = .(DTE), roll = "nearest"]
# subset on low/high range
DTE_table[DT, on = .(DTE), roll = "nearest"][DTE >= DTE_low & DTE <= DTE_high]
EDIT 2:
Based on #Henrik's comment:
DT[DTE_table, on = .(DTE >= DTE_low, DTE <= DTE_high), DTE_target := i.DTE_target]
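For reference, after this non-equi update join the rows that matched some target carry DTE_target while the rest are NA, so the filtered result can be recovered with a plain subset (a sketch, assuming the DTE_table built earlier):
DT[!is.na(DTE_target)]  # rows whose DTE fell within +/- 10 of some target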
For each DTE_target, find the rows whose DTE is within a 10-unit range. This outputs a boolean array.
DT[, DTE := as.integer(difftime(exdate, date, unit = 'days')) ]
DTE_target <- c(7,30, 60, 90)
val = 10
bool_arr <- DT[, lapply(DTE_target, function(x) abs(DTE - x) <= val) ]
Then check each row of the array for any TRUE value and use the result to extract the matching rows from the original DT data.table.
selected_rows <- apply(bool_arr, 1, any)
DT[selected_rows, ]
Here is the full code and output:
library(data.table)
DTE_target <- c(7,30, 60, 90)
val = 10 # 10 units value
DT[apply(DT[, lapply(DTE_target, function(x) abs(DTE - x) <= val) ], 1, any), ]
# date exdate price_at_entry strike_price DTE
#1: 2018-08-31 2018-10-05 301.66 195.0 35
#2: 2015-04-20 2015-04-24 205.27 212.5 4
#3: 2018-08-22 2018-09-07 321.64 255.0 16
#4: 2018-07-24 2018-08-03 297.43 430.0 10
#5: 2015-09-14 2015-10-16 253.19 440.0 32
#6: 2019-05-06 2019-05-17 255.34 245.0 11
Now use the filtered dataset to apply the same check to the other columns, price_at_entry and strike_price.
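A minimal sketch of that step, reusing the boolean-array pattern on the filtered subset; the price_target values here are hypothetical, purely for illustration:
price_target <- c(250, 300)  # hypothetical targets, not from the original post
DT_f <- DT[selected_rows, ]
DT_f[apply(DT_f[, lapply(price_target,
                         function(x) abs(price_at_entry - x) <= val)], 1, any), ]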
Since you have a billion rows of data, you can split the data into chunks and apply the above function to each chunk to speed things up.
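A rough sketch of such chunking (the helper and chunk count are illustrative assumptions, not from the original post):
# Wrap the filter in a function and apply it chunk by chunk.
filter_chunk <- function(chunk, targets, val) {
  chunk[apply(chunk[, lapply(targets, function(x) abs(DTE - x) <= val)], 1, any), ]
}
n_chunks <- 10  # tune to available memory
idx <- split(seq_len(nrow(DT)), cut(seq_len(nrow(DT)), n_chunks))
res <- rbindlist(lapply(idx, function(i) filter_chunk(DT[i], DTE_target, val)))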
Solution 2: using overlapping (not mutually exclusive) target values, 30 and 31
DTE_target <- c(7,30, 31, 60, 90)
bool_arr <- DT[, lapply(DTE_target, function(x) abs(DTE - x) <= val) ]
target_vals <- apply(bool_arr, 1, any)
dt_vals <- apply(bool_arr, 1, function(x) DTE_target[x])
rm(bool_arr) # remove bool_arr from memory to free up space
DT[target_vals, ][, `:=`(DTE_t = dt_vals[target_vals])][]
rm(target_vals)
rm(dt_vals)
# date exdate price_at_entry strike_price DTE DTE_t
#1: 2018-08-31 2018-10-05 301.66 195.0 35 30,31
#2: 2015-04-20 2015-04-24 205.27 212.5 4 7
#3: 2018-08-22 2018-09-07 321.64 255.0 16 7
#4: 2018-07-24 2018-08-03 297.43 430.0 10 7
#5: 2015-09-14 2015-10-16 253.19 440.0 32 30,31
#6: 2019-05-06 2019-05-17 255.34 245.0 11 7
Solution 3
Data:
library(data.table)
setDT(DT)
DT = rbindlist(lapply(1:10^6, function(i) DT))
DTE_target <- c(7,30, 31, 60, 90)
val=10
Code
system.time({
  DT[, id := .I]
  DT[, DTE := as.integer(difftime(exdate, date, unit = 'days'))]
  DT[, DTE_t := paste(DTE_target[abs(DTE - DTE_target) <= val], collapse = ","), by = id]
  DT[, id := NULL]
})
#   user  system elapsed
#  91.90    0.46   92.48
Output:
head(DT, 10)
# date exdate price_at_entry strike_price DTE DTE_t
# 1: 2018-08-31 2018-10-05 301.66 195.0 35 30,31
# 2: 2015-04-20 2015-04-24 205.27 212.5 4 7
# 3: 2012-02-28 2012-09-22 33.81 37.0 207
# 4: 2018-08-22 2018-09-07 321.64 255.0 16 7
# 5: 2018-07-24 2018-08-03 297.43 430.0 10 7
# 6: 2014-09-29 2014-10-18 245.26 120.0 19
# 7: 2013-07-10 2013-12-21 122.27 46.0 164
# 8: 2019-02-01 2019-03-22 312.21 320.0 49
# 9: 2015-09-14 2015-10-16 253.19 440.0 32 30,31
# 10: 2019-05-06 2019-05-17 255.34 245.0 11 7

Turning a List of Transactions into Hourly/Daily Prices in R

I've downloaded a list of every Bitcoin transaction on a large exchange since 2013. What I have now looks like this:
Time Price Volume
1 2013-03-31 22:07:49 93.3 80.628518
2 2013-03-31 22:08:13 100.0 20.000000
3 2013-03-31 22:08:14 100.0 1.000000
4 2013-03-31 22:08:16 100.0 5.900000
5 2013-03-31 22:08:19 100.0 29.833879
6 2013-03-31 22:08:21 100.0 20.000000
7 2013-03-31 22:08:25 100.0 10.000000
8 2013-03-31 22:08:29 100.0 1.000000
9 2013-03-31 22:08:31 100.0 5.566121
10 2013-03-31 22:09:27 93.3 33.676862
I'm trying to work with the data in R, but my computer isn't powerful enough to handle processing it when I run getSymbols(BTC_XTS). I'm trying to convert it to a format like the following (price action over a day):
Date Open High Low Close Volume Adj.Close
1 2014-04-11 32.64 33.48 32.15 32.87 28040700 32.87
2 2014-04-10 34.88 34.98 33.09 33.40 33970700 33.40
3 2014-04-09 34.19 35.00 33.95 34.87 21597500 34.87
4 2014-04-08 33.10 34.43 33.02 33.83 35440300 33.83
5 2014-04-07 34.11 34.37 32.53 33.07 47770200 33.07
6 2014-04-04 36.01 36.05 33.83 34.26 41049900 34.26
7 2014-04-03 36.66 36.79 35.51 35.76 16792000 35.76
8 2014-04-02 36.68 36.86 36.56 36.64 14522800 36.64
9 2014-04-01 36.16 36.86 36.15 36.49 15734000 36.49
10 2014-03-31 36.46 36.58 35.73 35.90 15153200 35.90
I'm new to R, and any response would be greatly appreciated!
I don't know what you could mean when you say your "computer isn't powerful enough to handle processing it when [you] run getSymbols(BTC_XTS)". getSymbols retrieves data... why do you need to retrieve data you already have?
Also, you have no adjusted close data, so it's not possible to have an Adj.Close column in the output.
You can get what you want by coercing your input data to xts and calling to.daily on it. For example:
require(xts)
Data <- structure(list(Time = c("2013-03-31 22:07:49", "2013-03-31 22:08:13",
"2013-03-31 22:08:14", "2013-03-31 22:08:16", "2013-03-31 22:08:19",
"2013-03-31 22:08:21", "2013-03-31 22:08:25", "2013-03-31 22:08:29",
"2013-03-31 22:08:31", "2013-03-31 22:09:27"), Price = c(93.3,
100, 100, 100, 100, 100, 100, 100, 100, 93.3), Volume = c(80.628518,
20, 1, 5.9, 29.833879, 20, 10, 1, 5.566121, 33.676862)), .Names = c("Time",
"Price", "Volume"), class = "data.frame", row.names = c(NA, -10L))
x <- xts(Data[,-1], as.POSIXct(Data[,1]))
d <- to.daily(x, name="BTC")
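The question also asks about hourly prices; to.hourly() from xts aggregates the same object in the same way (with this ten-row sample, all trades fall into a single hourly bar):
h <- to.hourly(x, name = "BTC")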

rollapply : Is it possible to add end date for each sliding window?

A dummy zoo object is created as
z <- zoo(11:15, as.Date(31:45))
as.data.frame(z)
z
1970-02-01 11
1970-02-02 12
1970-02-03 13
1970-02-04 14
1970-02-05 15
1970-02-06 11
1970-02-07 12
1970-02-08 13
1970-02-09 14
1970-02-10 15
1970-02-11 11
1970-02-12 12
1970-02-13 13
1970-02-14 14
1970-02-15 15
The rollapply function can be used to calculate the mean as follows:
as.data.frame(rollapply(z, width=3, by=2, mean, align="left"))
1970-02-01 12.00000
1970-02-03 14.00000
1970-02-05 12.66667
1970-02-07 13.00000
1970-02-09 13.33333
1970-02-11 12.00000
1970-02-13 14.00000
The format I want:
Is it possible to add another column (second column / end of window) containing the end date, as shown below, using rollapply or some other method on the xts/zoo object used above?
start_window end_window mean
1970-02-01 1970-02-03 12.00000
1970-02-03 1970-02-05 14.00000
1970-02-05 1970-02-07 12.66667
1970-02-07 1970-02-09 13.00000
1970-02-09 1970-02-11 13.33333
1970-02-11 1970-02-13 12.00000
1970-02-13 1970-02-15 14.00000
Please suggest a way to do so. Thanks in advance
1) zoo has a fortify.zoo method which produces a data frame with an Index column, so suppose r is the output of the rollapply given in the question. Then, for a width of 3, the end dates are 2 days past the corresponding start dates:
library(ggplot2)
r <- rollapply(z, width=3, by=2, mean, align="left") # as in question
DF <- transform(fortify(r), end_date = Index + 2)
giving:
> DF
Index r end_date
1 1970-02-01 12.00000 1970-02-03
2 1970-02-03 14.00000 1970-02-05
3 1970-02-05 12.66667 1970-02-07
4 1970-02-07 13.00000 1970-02-09
5 1970-02-09 13.33333 1970-02-11
6 1970-02-11 12.00000 1970-02-13
7 1970-02-13 14.00000 1970-02-15
If the column order and column names must be as shown then:
DF <- setNames(DF[c(1, 3:2)], c("start_date", "end_date", "mean"))
2) Assuming r from above, this would also work:
data.frame(start_date = time(r), end_date = time(r) + 2, mean = coredata(r))
You can make a simple hack by just combining the results of two rollapply calls into a data frame.
#Your code
library(zoo)
z <- zoo(11:15, as.Date(31:45))
as.data.frame(z)
as.data.frame(rollapply(z, width=3, by=2, mean, align="left"))
Compute the start and end dates of each window:
frame1 <- as.data.frame(rollapply(z, width=3, by=2, mean, align="left"))
frame2 <- as.data.frame(rollapply(z, width=3, by=2, mean, align="right"))
Add them to a data frame
frame3 <- data.frame(Start = row.names(frame1), Finish = row.names(frame2), frame1[1])
row.names(frame3) <- seq_len(nrow(frame3))
names(frame3)[3] <- "Mean"
Result
frame3
Start Finish Mean
1 1970-02-01 1970-02-03 12.00000
2 1970-02-03 1970-02-05 14.00000
3 1970-02-05 1970-02-07 12.66667
4 1970-02-07 1970-02-09 13.00000
5 1970-02-09 1970-02-11 13.33333
6 1970-02-11 1970-02-13 12.00000
7 1970-02-13 1970-02-15 14.00000

how do you make a sequence using along.with for unique values in r

Let's suppose I have a vector of numeric values:
[1] 2844 4936 4936 4972 5078 6684 6689 7264 7264 7880 8133 9018 9968 9968 10247
[16] 11267 11508 11541 11607 11717 12349 12349 12364 12651 13025 13086 13257 13427 13427 13442
[31] 13442 13442 13442 14142 14341 14429 14429 14429 14538 14872 15002 15064 15163 15163 15324
[46] 15324 15361 15361 15400 15624 15648 15648 15648 15864 15864 15881 16332 16847 17075 17136
[61] 17136 17196 17843 17925 17925 18217 18455 18578 18578 18742 18773 18806 19130 19195 19254
[76] 19254 19421 19421 19429 19585 19686 19729 19729 19760 19760 19901 20530 20530 20530 20581
[91] 20629 20629 20686 20693 20768 20902 20980 21054 21079 21156
and I want to create a sequence along this vector, but for unique numbers. For example,
length(unique(vector))
is 74 and there are a total of 100 values in the vector. The sequence should contain numbers ranging from 1 to 74 only, but have length 100, as some numbers will be repeated.
Any idea on how this can be done?
Thanks.
Perhaps
res <- as.numeric(factor(v1))
head(res)
#[1] 1 2 2 3 4 5
Or
res1 <- match(v1, unique(v1))
Or
library(fastmatch)
res2 <- fmatch(v1, unique(v1))
Or
res3 <- findInterval(v1, unique(v1))
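One caveat: findInterval() requires its second argument to be sorted, which holds here only because v1 (and hence unique(v1)) is already sorted. For unsorted input, match() is the safe choice; a small hypothetical example:
v <- c(5, 2, 2, 9)
match(v, unique(v))  # 1 2 2 3 -- works for any input order
# findInterval(v, unique(v)) would fail: 'vec' must be sorted non-decreasingly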
data
v1 <- c(2844, 4936, 4936, 4972, 5078, 6684, 6689, 7264, 7264, 7880,
8133, 9018, 9968, 9968, 10247, 11267, 11508, 11541, 11607, 11717,
12349, 12349, 12364, 12651, 13025, 13086, 13257, 13427, 13427,
13442, 13442, 13442, 13442, 14142, 14341, 14429, 14429, 14429,
14538, 14872, 15002, 15064, 15163, 15163, 15324, 15324, 15361,
15361, 15400, 15624, 15648, 15648, 15648, 15864, 15864, 15881,
16332, 16847, 17075, 17136, 17136, 17196, 17843, 17925, 17925,
18217, 18455, 18578, 18578, 18742, 18773, 18806, 19130, 19195,
19254, 19254, 19421, 19421, 19429, 19585, 19686, 19729, 19729,
19760, 19760, 19901, 20530, 20530, 20530, 20581, 20629, 20629,
20686, 20693, 20768, 20902, 20980, 21054, 21079, 21156)
You could use .GRP from "data.table" for this:
library(data.table)
x <- v1  # the vector from above
y <- data.table(x)[, y := .GRP, by = x]
head(y)
# x y
# 1: 2844 1
# 2: 4936 2 ## Note the duplicated value
# 3: 4936 2 ## in these rows, corresponding to x
# 4: 4972 3
# 5: 5078 4
# 6: 6684 5
tail(y)
# x y
# 1: 20768 69
# 2: 20902 70
# 3: 20980 71
# 4: 21054 72
# 5: 21079 73
# 6: 21156 74 ## "y" values go to 74

Intraday high/low clustering

I am attempting to perform a study on the clustering of high/low points based on time. I created daily data with to.daily on the intraday data and merged the two series using:
intraday.merge <- merge(intraday,daily)
intraday.merge <- na.locf(intraday.merge)
intraday.merge <- intraday.merge["T08:30:00/T16:30:00"] # remove record at 00:00:00
Next, I tried to obtain the records where High == Daily.High or Low == Daily.Low using:
intradayhi <- intraday.merge[intraday.merge$High == intraday.merge$Daily.High]
intradaylo <- intraday.merge[intraday.merge$Low == intraday.merge$Daily.Low]
Resulting data resembles the following:
Open High Low Close Volume Daily.Open Daily.High Daily.Low Daily.Close Daily.Volume
2012-06-19 08:45:00 258.9 259.1 258.5 258.7 1424 258.9 259.1 257.7 258.7 31523
2012-06-20 13:30:00 260.8 260.9 260.6 260.6 1616 260.4 260.9 259.2 260.8 35358
2012-06-21 08:40:00 260.7 260.8 260.4 260.5 493 260.7 260.8 257.4 258.3 31360
2012-06-22 12:10:00 255.9 256.2 255.9 256.1 626 254.5 256.2 253.9 255.3 50515
2012-06-22 12:15:00 256.1 256.2 255.9 255.9 779 254.5 256.2 253.9 255.3 50515
2012-06-25 11:55:00 254.5 254.7 254.4 254.6 1589 253.8 254.7 251.5 253.9 65621
2012-06-26 08:45:00 253.4 254.2 253.2 253.7 5849 253.8 254.2 252.4 253.1 70635
2012-06-27 11:25:00 255.6 256.0 255.5 255.9 973 251.8 256.0 251.8 255.2 53335
2012-06-28 09:00:00 257.0 257.3 256.9 257.1 601 255.3 257.3 255.0 255.1 23978
2012-06-29 13:45:00 253.0 253.4 253.0 253.4 451 247.3 253.4 246.9 253.4 52539
There are duplicated results in the subset; how do I keep only the first record of each day? I would then be able to plot the count of records for periods in the day.
Also, are there alternate methods to get the results I want? Thanks in advance.
Edit:
Sample output should look like this; the count could be either the first result for the day or an aggregate (more than one occurrence in that day):
Time Count
08:40:00 60
08:45:00 54
08:50:00 60
...
14:00:00 20
14:05:00 12
14:10:00 30
You can get the first observation of each day via:
y <- apply.daily(x, first)
Then you can simply aggregate the count based on hours and minutes:
z <- aggregate(1:NROW(y), by = list(Time = format(index(y), "%H:%M")), length)  # count per time of day
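If every occurrence in a day should be counted rather than just the first, skip the apply.daily() step and aggregate over the full subset directly (a sketch, assuming x is the intradayhi subset from the question):
z.all <- aggregate(rep(1, NROW(x)), by = list(Time = format(index(x), "%H:%M")), sum)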
