read.csv in R reading dates differently

I have two very similar csv files. Stock prices for 2 different stocks downloaded from the same source in the same format. However, read.csv in R is reading them differently.
> tab1=read.csv(path1)
> tab2=read.csv(path2)
> head(tab1)
Date Open High Low Close Volume Adj.Close
1 2014-12-01 158.35 162.92 157.12 157.12 2719100 156.1488
2 2014-11-03 153.14 160.86 152.98 160.09 2243400 159.1004
3 2014-10-01 141.16 154.44 130.60 153.77 3825900 152.0036
4 2014-09-02 143.30 147.87 140.66 141.68 2592900 140.0525
5 2014-08-01 140.15 145.39 138.43 144.00 2027100 142.3459
6 2014-07-01 143.41 146.43 140.60 140.89 2131100 138.4461
> head(tab2)
Date Open High Low Close Volume Adj.Close
1 12/1/2014 73.39 75.20 71.75 72.29 1561400 71.92211
2 11/3/2014 69.28 74.92 67.88 73.74 1421600 72.97650
3 10/1/2014 66.18 74.95 63.42 69.21 1775400 68.49341
4 9/2/2014 68.34 68.57 65.49 66.32 1249200 65.63333
5 8/1/2014 67.45 68.99 65.88 68.26 1655400 67.20743
6 7/1/2014 64.07 69.50 63.09 67.46 1733600 66.41976
If I try to use colClasses in read.csv then the dates for the second table are read incorrectly.
> tab1=read.csv(path1,colClasses=c("Date",rep("numeric",6)))
> tab2=read.csv(path2,colClasses=c("Date",rep("numeric",6)))
> head(tab1)
Date Open High Low Close Volume Adj.Close
1 2014-12-01 158.35 162.92 157.12 157.12 2719100 156.1488
2 2014-11-03 153.14 160.86 152.98 160.09 2243400 159.1004
3 2014-10-01 141.16 154.44 130.60 153.77 3825900 152.0036
4 2014-09-02 143.30 147.87 140.66 141.68 2592900 140.0525
5 2014-08-01 140.15 145.39 138.43 144.00 2027100 142.3459
6 2014-07-01 143.41 146.43 140.60 140.89 2131100 138.4461
> head(tab2)
Date Open High Low Close Volume Adj.Close
1 0012-01-20 73.39 75.20 71.75 72.29 1561400 71.92211
2 0011-03-20 69.28 74.92 67.88 73.74 1421600 72.97650
3 0010-01-20 66.18 74.95 63.42 69.21 1775400 68.49341
4 0009-02-20 68.34 68.57 65.49 66.32 1249200 65.63333
5 0008-01-20 67.45 68.99 65.88 68.26 1655400 67.20743
6 0007-01-20 64.07 69.50 63.09 67.46 1733600 66.41976
Not sure how I can make this issue reproducible without attaching the .csv files. I'm attaching snapshots of the two files. Any help will be appreciated.
Thanks
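For what it's worth, the mangled dates in the second table come from as.Date()'s default parsing: with colClasses="Date", read.csv() converts the column using as.Date() without a format, which tries "%Y-%m-%d" and then "%Y/%m/%d". Against "12/1/2014", the second format reads "12" as the year, "1" as the month, and the leading "20" of "2014" as the day, silently ignoring the rest:

```r
# Default format "%Y/%m/%d" mis-parses m/d/Y dates: year 12, month 1,
# day 20 (taken from "2014"); the trailing "14" is ignored.
as.Date("12/1/2014")                       # 0012-01-20
# An explicit format parses correctly:
as.Date("12/1/2014", format = "%m/%d/%Y")  # 2014-12-01
```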

This can be solved by reading in the dates as a character vector and then calling strptime() inside transform():
transform(
    read.csv(path2,colClasses=c('character',rep('numeric',6))),
    Date=as.Date(strptime(Date,'%m/%d/%Y'))
);
## Date Open High Low Close Volume Adj.Close
## 1 2014-12-01 73.39 75.20 71.75 72.29 1561400 71.92211
## 2 2014-11-03 69.28 74.92 67.88 73.74 1421600 72.97650
## 3 2014-10-01 66.18 74.95 63.42 69.21 1775400 68.49341
## 4 2014-09-02 68.34 68.57 65.49 66.32 1249200 65.63333
## 5 2014-08-01 67.45 68.99 65.88 68.26 1655400 67.20743
## 6 2014-07-01 64.07 69.50 63.09 67.46 1733600 66.41976
Edit: You can try to "detect" the date format dynamically using your own assumptions, but this will only be as reliable as your assumptions:
readStockData <- function(path) {
    tab <- read.csv(path,colClasses=c('character',rep('numeric',6)));
    tab$Date <- as.Date(tab$Date,if (grepl('^\\d+/\\d+/\\d+$',tab$Date[1])) '%m/%d/%Y' else '%Y-%m-%d');
    tab;
};
readStockData(path1);
## Date Open High Low Close Volume Adj.Close
## 1 2014-12-01 158.35 162.92 157.12 157.12 2719100 156.1488
## 2 2014-11-03 153.14 160.86 152.98 160.09 2243400 159.1004
## 3 2014-10-01 141.16 154.44 130.60 153.77 3825900 152.0036
## 4 2014-09-02 143.30 147.87 140.66 141.68 2592900 140.0525
## 5 2014-08-01 140.15 145.39 138.43 144.00 2027100 142.3459
## 6 2014-07-01 143.41 146.43 140.60 140.89 2131100 138.4461
readStockData(path2);
## Date Open High Low Close Volume Adj.Close
## 1 2014-12-01 73.39 75.20 71.75 72.29 1561400 71.92211
## 2 2014-11-03 69.28 74.92 67.88 73.74 1421600 72.97650
## 3 2014-10-01 66.18 74.95 63.42 69.21 1775400 68.49341
## 4 2014-09-02 68.34 68.57 65.49 66.32 1249200 65.63333
## 5 2014-08-01 67.45 68.99 65.88 68.26 1655400 67.20743
## 6 2014-07-01 64.07 69.50 63.09 67.46 1733600 66.41976
In the above I've made the assumption that there is at least one record in the file and that all records use the same Date format, thus the first Date value (tab$Date[1]) can be used for the detection.

Related

How to plot lagged data against other data in R

I would like to lag one variable by, say, 10 time steps and plot it against the other variable, which remains the same. I would like to do this for various lags to see if there is a time period over which the first variable influences the other. The data I have are daily, and after lagging I keep only the Dec-Feb data. The problem I am having is that the plot and correlation between the lagged variable and the other data come out the same as the non-lagged plot and correlation every time. I am not sure how to fix this.
A sample of my data frame "data" can be seen below.
Date x y
14158 2017-10-05 1.913918e+00 -0.1538234614
14159 2017-10-06 1.479714e+00 -0.1937094170
14160 2017-10-07 8.783669e-01 -0.1703790211
14161 2017-10-08 5.706581e-01 -0.1294144428
14162 2017-10-09 4.979405e-01 -0.0666569815
14163 2017-10-10 3.233477e-01 0.0072006102
14164 2017-10-11 3.057630e-01 0.0863445067
14165 2017-10-12 5.877673e-01 0.1097707831
14166 2017-10-13 1.208526e+00 0.1301967193
14167 2017-10-14 1.671705e+00 0.1728109268
14168 2017-10-15 1.810979e+00 0.2264911145
14169 2017-10-16 1.426651e+00 0.2702958315
14170 2017-10-17 1.241140e+00 0.3242637704
14171 2017-10-18 8.997498e-01 0.3879727861
14172 2017-10-19 5.594161e-01 0.4172990825
14173 2017-10-20 3.980254e-01 0.3915170864
14174 2017-10-21 2.138538e-01 0.3249736995
14175 2017-10-22 3.926440e-01 0.2224834840
14176 2017-10-23 2.268644e-01 0.0529143372
14177 2017-10-24 5.664923e-01 -0.0081443464
14178 2017-10-25 6.167520e-01 0.0312073984
14179 2017-10-26 7.751882e-02 0.0043897693
14180 2017-10-27 -5.634851e-02 -0.0726825266
14181 2017-10-28 -2.122061e-01 -0.1711305549
14182 2017-10-29 -8.500991e-01 -0.2068581639
14183 2017-10-30 -1.039685e+00 -0.2909120824
14184 2017-10-31 -3.057745e-01 -0.3933633317
14185 2017-11-01 -1.288774e-01 -0.3726346136
14186 2017-11-02 -5.608007e-03 -0.2425754386
14187 2017-11-03 4.853990e-01 -0.0503543980
14188 2017-11-04 5.822672e-01 0.0896130098
14189 2017-11-05 8.491505e-01 0.1299151006
14190 2017-11-06 1.052999e+00 0.0749888307
14191 2017-11-07 1.170470e+00 0.0287317882
14192 2017-11-08 7.919862e-01 0.0788187381
14193 2017-11-09 4.574565e-01 0.1539981316
14194 2017-11-10 4.552032e-01 0.2034393145
14195 2017-11-11 -3.621350e-01 0.2077476707
14196 2017-11-12 -8.053965e-01 0.1759558604
14197 2017-11-13 -8.307459e-01 0.1802858410
14198 2017-11-14 -9.421325e-01 0.2175529008
14199 2017-11-15 -9.880204e-01 0.2392924580
14200 2017-11-16 -7.448127e-01 0.2519253751
14201 2017-11-17 -8.081435e-01 0.2614254732
14202 2017-11-18 -1.216806e+00 0.2629971336
14203 2017-11-19 -1.122674e+00 0.3469995055
14204 2017-11-20 -1.242597e+00 0.4553094014
14205 2017-11-21 -1.294885e+00 0.5049438231
14206 2017-11-22 -9.325514e-01 0.4684133163
14207 2017-11-23 -4.632281e-01 0.4071673624
14208 2017-11-24 -9.689322e-02 0.3710270269
14209 2017-11-25 4.704467e-01 0.4126721465
14210 2017-11-26 8.682453e-01 0.3745057653
14211 2017-11-27 5.105564e-01 0.2373454931
14212 2017-11-28 4.747265e-01 0.1650783370
14213 2017-11-29 5.905379e-01 0.2632154120
14214 2017-11-30 4.083787e-01 0.3888834762
14215 2017-12-01 3.451736e-01 0.5008047592
14216 2017-12-02 5.161312e-01 0.5388177242
14217 2017-12-03 7.109279e-01 0.5515360710
14218 2017-12-04 4.458635e-01 0.5127537202
14219 2017-12-05 -3.986610e-01 0.3896493238
14220 2017-12-06 -5.968253e-01 0.1095843268
14221 2017-12-07 -1.604398e-01 -0.2455506506
14222 2017-12-08 -4.384744e-01 -0.5801038215
14223 2017-12-09 -7.255016e-01 -0.8384627087
14224 2017-12-10 -9.691828e-01 -0.9223171538
14225 2017-12-11 -1.140588e+00 -0.8177806761
14226 2017-12-12 -1.956622e-01 -0.5250998474
14227 2017-12-13 -1.083792e-01 -0.3430768534
14228 2017-12-14 -8.016345e-02 -0.3163476104
14229 2017-12-15 8.899266e-01 -0.2813253830
14230 2017-12-16 1.322833e+00 -0.2545953062
14231 2017-12-17 1.547972e+00 -0.2275373110
14232 2017-12-18 2.164907e+00 -0.3217205817
14233 2017-12-19 2.276258e+00 -0.5773412429
14234 2017-12-20 1.862291e+00 -0.7728091393
14235 2017-12-21 1.125083e+00 -0.9099696881
14236 2017-12-22 7.737118e-01 -1.2441963604
14237 2017-12-23 7.863508e-01 -1.4802661587
14238 2017-12-24 4.313111e-01 -1.4111320559
14239 2017-12-25 -8.814799e-02 -1.0024805520
14240 2017-12-26 -3.615127e-01 -0.4943077147
14241 2017-12-27 -5.011363e-01 -0.0308588186
14242 2017-12-28 -8.474088e-01 0.3717555895
14243 2017-12-29 -7.283247e-01 0.8230450219
14244 2017-12-30 -4.566981e-01 1.2495961116
14245 2017-12-31 -4.577034e-01 1.4805369230
14246 2018-01-01 1.946166e-01 1.5310004017
14247 2018-01-02 5.203149e-01 1.5384595802
14248 2018-01-03 5.024570e-02 1.4036679018
14249 2018-01-04 -7.065297e-01 1.0749574137
14250 2018-01-05 -8.741815e-01 0.7608524752
14251 2018-01-06 1.589530e-01 0.7891084646
14252 2018-01-07 8.632378e-01 1.1230358751
I am using
lagged <- lag(ts(x), k=10)
This is so the tsp isn't ignored. However, when I do
cor(data$x, data$y)
and
cor(lagged, data$y)
the result is the same, where I would have thought it would be different. How do I get this lag to work before I go ahead and separate by date?
Many thanks!
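A likely explanation, sketched under the assumption that stats::lag is what is being called: lag() on a ts object only shifts the time (tsp) attribute; the underlying values are unchanged, and cor() ignores time attributes entirely, so the correlation cannot change. To correlate y with x lagged by k steps, align the two series by position instead (lag_cor is a hypothetical helper, not from the question):

```r
x <- sin(1:100)
y <- cos(1:100)
# lag() only alters the tsp attribute; the data vector is identical
lagged <- lag(ts(x), k = 10)
identical(as.numeric(lagged), x)  # TRUE, hence cor() is unchanged
# Hypothetical helper: correlate x at time t with y at time t + k
lag_cor <- function(x, y, k) {
  n <- length(x)
  cor(x[seq_len(n - k)], y[(k + 1):n])
}
lag_cor(x, y, 10)  # differs from cor(x, y) in general
```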

Having trouble with sorting data by date in decreasing order

I am using R to learn about the Capital Asset Pricing Model. I am inputting historical data from Yahoo Finance, which is by default in ascending order by date. I am using the order function, but it does not seem to be working. Here is my code:
#This is the pre-processing for historical stock prices against returns of another stock (usually NASDAQ)
frmkt.returns <- function(file1, file2){
  # Input downloaded files from Excel and convert to workable format
  file.1 <- read.table(file1, header = TRUE, sep = ",")[, c("Date", "Adj.Close")]
  file.2 <- read.table(file2, header = TRUE, sep = ",")[, c("Date", "Adj.Close")]
  file.1 <- merge(file.1, file.2, by = "Date")
  file.1[, c("Date")] <- as.Date(file.1[, c("Date")])
  file.1 <- file.1[rev(order(file.1$Date)), ]
  # Perform operations to get rates of return on stocks
  file.1[-nrow(file.1), -1] <- file.1[-nrow(file.1), -1] / file.1[-1, -1] - 1
  file.1 <- file.1[-nrow(file.1), ]
  # Input 5-year Treasury bond as reference for the risk-free rate of return
  # Convert to workable format and merge with the stock returns object
  tbonds <- read.table("/Users/bhumphreys/Downloads/fiveYR_treasurey.csv", header = TRUE, sep = ",")[, c("Date", "Adj.Close")]
  names(tbonds)[2] <- "tbonds.returns"
  tbonds[, c("Date")] <- as.Date(tbonds[, c("Date")])
  file.1 <- merge(file.1, tbonds, by = "Date")
  file.1$tbonds.returns <- file.1$tbonds.returns / 100
  names(file.1)[2:3] <- c("stock.returns", "nasdaq.returns")
  file.1[, c("stock.returns", "nasdaq.returns")] <- file.1[, c("stock.returns", "nasdaq.returns")] - file.1[, "tbonds.returns"]
  return(file.1)
}
This is the output I keep getting:
frmkt.returns(xomFile,snpFile)
Date stock.returns nasdaq.returns tbonds.returns
1 2012-02-01 3.136297e-02 2.957670e-02 0.00725
2 2012-03-01 -6.330922e-03 2.877178e-02 0.00899
3 2012-04-02 -1.470687e-02 2.250261e-02 0.01021
4 2012-05-01 -9.137423e-02 -1.766622e-02 0.00835
5 2012-06-01 8.206149e-02 -9.709351e-02 0.00620
6 2012-07-02 8.268583e-03 6.175071e-02 0.00669
7 2012-08-01 5.283751e-03 7.540821e-04 0.00643
8 2012-09-04 4.131724e-02 1.531680e-02 0.00622
9 2012-10-01 -9.261786e-03 2.195070e-02 0.00620
10 2012-11-01 -3.446171e-02 -1.897965e-02 0.00728
11 2012-12-03 -2.431944e-02 -1.897973e-02 0.00628
12 2013-01-02 3.188471e-02 2.994473e-02 0.00763
13 2013-02-01 -7.079322e-03 2.593275e-02 0.00877
14 2013-03-01 -1.226536e-03 -4.155914e-03 0.00748
15 2013-04-01 -2.000930e-02 2.138199e-02 0.00758
16 2013-05-01 1.711385e-02 6.591916e-03 0.00655
17 2013-06-03 -1.164647e-02 2.614938e-02 0.01032
18 2013-07-01 2.367151e-02 -2.948047e-02 0.01396
19 2013-08-01 -7.886746e-02 4.191165e-02 0.01500
20 2013-09-03 -2.956996e-02 -5.603171e-02 0.01672
21 2013-10-01 2.738862e-02 1.946154e-02 0.01422
22 2013-11-01 3.643796e-02 2.558564e-02 0.01373
23 2013-12-02 6.837443e-02 8.076057e-03 0.01421
24 2014-01-02 -1.064880e-01 9.801299e-05 0.01716
25 2014-02-03 3.762477e-02 -6.354628e-02 0.01437
26 2014-03-03 3.638218e-05 4.500339e-02 0.01461
27 2014-04-01 3.102342e-02 4.157888e-03 0.01740
28 2014-05-01 -2.822288e-02 -1.750584e-02 0.01653
29 2014-06-02 -1.446795e-02 5.959814e-03 0.01596
30 2014-07-01 -3.384238e-02 8.557262e-03 0.01656
31 2014-08-01 -4.471628e-03 -4.114060e-02 0.01673
32 2014-09-02 -7.125370e-02 2.320441e-02 0.01686
33 2014-10-01 1.145283e-02 -4.485805e-02 0.01683
34 2014-11-03 -7.337974e-02 2.047610e-02 0.01634
35 2014-12-01 5.885659e-03 2.447699e-03 0.01521
36 2015-01-02 -7.058784e-02 -1.386193e-02 0.01618
37 2015-02-02 8.585158e-03 -2.999691e-02 0.01185
38 2015-03-02 -5.576196e-02 3.199194e-02 0.01578
39 2015-04-01 1.462245e-02 -4.051051e-02 0.01326
40 2015-05-01 -3.184140e-02 8.525832e-03 0.01507
41 2015-06-01 -3.903412e-02 -1.392837e-02 0.01556
42 2015-07-01 -6.497674e-02 -3.326737e-02 0.01702
43 2015-08-03 -5.637230e-02 -5.214170e-03 0.01514
44 2015-09-01 -2.686886e-02 -1.028315e-01 0.01504
45 2015-10-01 9.914473e-02 -8.490621e-03 0.01370
46 2015-11-02 -2.008928e-02 7.804346e-02 0.01564
47 2015-12-01 -6.139235e-02 -1.663497e-02 0.01596
48 2016-01-04 -1.863287e-02 -6.013920e-02 0.01735
49 2016-02-01 2.505843e-02 -5.023954e-02 0.01383
>
I apologize for the lengthy code but as you can see the data did not change from its default. I have also tried
file.1 <- file.1[order(file.1$Date, decreasing = TRUE),]
Please help me reverse this data.
Currently, you are merging a second time with tbonds after you order file.1 by descending Date, and by default merge sorts its result by the by columns, undoing that ordering.
Consider specifying no sort on the second merge:
file.1 <- merge(file.1, tbonds, by= "Date", sort=FALSE)
Alternatively, move your descending order after all merges:
file.1 <- merge(file.1, tbonds, by= "Date")
file.1 <- file.1[order(file.1$Date, decreasing = TRUE),]
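A minimal illustration with made-up data frames of why the ordering done before the merge is lost:

```r
a <- data.frame(Date = as.Date(c("2014-02-01", "2014-01-01")), x = 1:2)
b <- data.frame(Date = as.Date(c("2014-01-01", "2014-02-01")), y = 3:4)
m <- merge(a, b, by = "Date")               # result is sorted ascending by Date again
m <- m[order(m$Date, decreasing = TRUE), ]  # descending order restored
```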
Use the lubridate package to convert the date format and then order:
#Get your data
df2 <- read.table(text =
' Date stock.returns nasdaq.returns tbonds.returns
1 2012-02-01 3.136297e-02 2.957670e-02 0.00725
2 2012-03-01 -6.330922e-03 2.877178e-02 0.00899
3 2012-04-02 -1.470687e-02 2.250261e-02 0.01021
4 2012-05-01 -9.137423e-02 -1.766622e-02 0.00835
5 2012-06-01 8.206149e-02 -9.709351e-02 0.00620
6 2012-07-02 8.268583e-03 6.175071e-02 0.00669
7 2012-08-01 5.283751e-03 7.540821e-04 0.00643
8 2012-09-04 4.131724e-02 1.531680e-02 0.00622
9 2012-10-01 -9.261786e-03 2.195070e-02 0.00620
10 2012-11-01 -3.446171e-02 -1.897965e-02 0.00728
11 2012-12-03 -2.431944e-02 -1.897973e-02 0.00628
12 2013-01-02 3.188471e-02 2.994473e-02 0.00763
13 2013-02-01 -7.079322e-03 2.593275e-02 0.00877
14 2013-03-01 -1.226536e-03 -4.155914e-03 0.00748
15 2013-04-01 -2.000930e-02 2.138199e-02 0.00758
16 2013-05-01 1.711385e-02 6.591916e-03 0.00655
17 2013-06-03 -1.164647e-02 2.614938e-02 0.01032
18 2013-07-01 2.367151e-02 -2.948047e-02 0.01396
19 2013-08-01 -7.886746e-02 4.191165e-02 0.01500
20 2013-09-03 -2.956996e-02 -5.603171e-02 0.01672
21 2013-10-01 2.738862e-02 1.946154e-02 0.01422
22 2013-11-01 3.643796e-02 2.558564e-02 0.01373
23 2013-12-02 6.837443e-02 8.076057e-03 0.01421
24 2014-01-02 -1.064880e-01 9.801299e-05 0.01716
25 2014-02-03 3.762477e-02 -6.354628e-02 0.01437
26 2014-03-03 3.638218e-05 4.500339e-02 0.01461
27 2014-04-01 3.102342e-02 4.157888e-03 0.01740
28 2014-05-01 -2.822288e-02 -1.750584e-02 0.01653
29 2014-06-02 -1.446795e-02 5.959814e-03 0.01596
30 2014-07-01 -3.384238e-02 8.557262e-03 0.01656
31 2014-08-01 -4.471628e-03 -4.114060e-02 0.01673
32 2014-09-02 -7.125370e-02 2.320441e-02 0.01686
33 2014-10-01 1.145283e-02 -4.485805e-02 0.01683
34 2014-11-03 -7.337974e-02 2.047610e-02 0.01634
35 2014-12-01 5.885659e-03 2.447699e-03 0.01521
36 2015-01-02 -7.058784e-02 -1.386193e-02 0.01618
37 2015-02-02 8.585158e-03 -2.999691e-02 0.01185
38 2015-03-02 -5.576196e-02 3.199194e-02 0.01578
39 2015-04-01 1.462245e-02 -4.051051e-02 0.01326
40 2015-05-01 -3.184140e-02 8.525832e-03 0.01507
41 2015-06-01 -3.903412e-02 -1.392837e-02 0.01556
42 2015-07-01 -6.497674e-02 -3.326737e-02 0.01702
43 2015-08-03 -5.637230e-02 -5.214170e-03 0.01514
44 2015-09-01 -2.686886e-02 -1.028315e-01 0.01504
45 2015-10-01 9.914473e-02 -8.490621e-03 0.01370
46 2015-11-02 -2.008928e-02 7.804346e-02 0.01564
47 2015-12-01 -6.139235e-02 -1.663497e-02 0.01596
48 2016-01-04 -1.863287e-02 -6.013920e-02 0.01735
49 2016-02-01 2.505843e-02 -5.023954e-02 0.01383', header = TRUE)
# Convert the Date column to Date class using lubridate (assuming it is in ymd format)
library(lubridate)
df2$Date <- ymd(df2$Date)
# Order the data by date in descending order
df2 <- df2[order(df2$Date, decreasing = TRUE), , drop = FALSE]

Turning a List of Transactions into Hourly/Daily Prices in R

I've downloaded a list of every Bitcoin transaction on a large exchange since 2013. What I have now looks like this:
Time Price Volume
1 2013-03-31 22:07:49 93.3 80.628518
2 2013-03-31 22:08:13 100.0 20.000000
3 2013-03-31 22:08:14 100.0 1.000000
4 2013-03-31 22:08:16 100.0 5.900000
5 2013-03-31 22:08:19 100.0 29.833879
6 2013-03-31 22:08:21 100.0 20.000000
7 2013-03-31 22:08:25 100.0 10.000000
8 2013-03-31 22:08:29 100.0 1.000000
9 2013-03-31 22:08:31 100.0 5.566121
10 2013-03-31 22:09:27 93.3 33.676862
I'm trying to work with the data in R, but my computer isn't powerful enough to handle processing it when I run getSymbols(BTC_XTS). I'm trying to convert it to a format like the following (price action over a day):
Date Open High Low Close Volume Adj.Close
1 2014-04-11 32.64 33.48 32.15 32.87 28040700 32.87
2 2014-04-10 34.88 34.98 33.09 33.40 33970700 33.40
3 2014-04-09 34.19 35.00 33.95 34.87 21597500 34.87
4 2014-04-08 33.10 34.43 33.02 33.83 35440300 33.83
5 2014-04-07 34.11 34.37 32.53 33.07 47770200 33.07
6 2014-04-04 36.01 36.05 33.83 34.26 41049900 34.26
7 2014-04-03 36.66 36.79 35.51 35.76 16792000 35.76
8 2014-04-02 36.68 36.86 36.56 36.64 14522800 36.64
9 2014-04-01 36.16 36.86 36.15 36.49 15734000 36.49
10 2014-03-31 36.46 36.58 35.73 35.90 15153200 35.90
I'm new to R, and any response would be greatly appreciated!
I don't know what you could mean when you say your "computer isn't powerful enough to handle processing it when [you] run getSymbols(BTC_XTS)". getSymbols retrieves data... why do you need to retrieve data you already have?
Also, you have no adjusted close data, so it's not possible to have an Adj.Close column in the output.
You can get what you want by coercing your input data to xts and calling to.daily on it. For example:
require(xts)
Data <- structure(list(Time = c("2013-03-31 22:07:49", "2013-03-31 22:08:13",
"2013-03-31 22:08:14", "2013-03-31 22:08:16", "2013-03-31 22:08:19",
"2013-03-31 22:08:21", "2013-03-31 22:08:25", "2013-03-31 22:08:29",
"2013-03-31 22:08:31", "2013-03-31 22:09:27"), Price = c(93.3,
100, 100, 100, 100, 100, 100, 100, 100, 93.3), Volume = c(80.628518,
20, 1, 5.9, 29.833879, 20, 10, 1, 5.566121, 33.676862)), .Names = c("Time",
"Price", "Volume"), class = "data.frame", row.names = c(NA, -10L))
x <- xts(Data[,-1], as.POSIXct(Data[,1]))
d <- to.daily(x, name="BTC")
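The question title also mentions hourly prices; under the same approach, xts provides to.hourly(), and to.period() for arbitrary bar sizes. A self-contained sketch with toy trade data of the same shape (the timestamps and values here are hypothetical):

```r
require(xts)
# Hypothetical price/volume ticks at irregular times
tms <- as.POSIXct("2013-03-31 22:07:49") + c(0, 24, 25, 27, 30, 98)
x <- xts(cbind(Price  = c(93.3, 100, 100, 100, 100, 93.3),
               Volume = c(80.6, 20, 1, 5.9, 29.8, 33.7)), tms)
h  <- to.hourly(x, name = "BTC")                             # hourly OHLCV bars
m5 <- to.period(x, period = "minutes", k = 5, name = "BTC")  # 5-minute bars
```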

Troubles in applying the zoo aggregate function to a time series

We have the following code to compute monthly returns from a daily series of prices:
PricesRet = diff(Prices)/lag(Prices,k=-1)
tail(PricesRet)
# Monthly simple returns
MonRet = aggregate(PricesRet+1, as.yearmon, prod)-1
tail(MonRet)
The problem is that it returns wrong values. Take, for example, the simple return for the month of Feb 2013: the function returns -0.003517301 while it should have been -0.01304773.
Why does that happen?
Here are the last prices observations:
> tail(Prices,30)
Prices
2013-01-22 165.5086
2013-01-23 165.2842
2013-01-24 168.4845
2013-01-25 170.6041
2013-01-28 169.7373
2013-01-29 169.8724
2013-01-30 170.6554
2013-01-31 170.7210
2013-02-01 173.8043
2013-02-04 172.2145
2013-02-05 172.8400
2013-02-06 172.8333
2013-02-07 171.3586
2013-02-08 170.5602
2013-02-11 171.2172
2013-02-12 171.4126
2013-02-13 171.8687
2013-02-14 170.7955
2013-02-15 171.2848
2013-02-19 170.9482
2013-02-20 171.6355
2013-02-21 170.0300
2013-02-22 169.9319
2013-02-25 170.9035
2013-02-26 168.6822
2013-02-27 168.5180
2013-02-28 168.4935
2013-03-01 169.6546
2013-03-04 169.3076
2013-03-05 169.0579
Here are price returns:
> tail(PricesRet,50)
PricesRet
2012-12-18 0.0055865274
2012-12-19 -0.0015461900
2012-12-20 -0.0076140194
2012-12-23 0.0032656346
2012-12-26 0.0147750923
2012-12-27 0.0013482760
2012-12-30 -0.0004768131
2013-01-01 0.0128908541
2013-01-02 -0.0047646818
2013-01-03 0.0103372029
2013-01-06 -0.0024547278
2013-01-07 -0.0076920352
2013-01-08 0.0064368720
2013-01-09 0.0119663301
2013-01-10 0.0153828814
2013-01-13 0.0050590540
2013-01-14 -0.0053324785
2013-01-15 -0.0027043105
2013-01-16 0.0118840383
2013-01-17 -0.0005876459
2013-01-21 -0.0145541598
2013-01-22 -0.0013555548
2013-01-23 0.0193624621
2013-01-24 0.0125802978
2013-01-27 -0.0050807744
2013-01-28 0.0007959058
2013-01-29 0.0046096266
2013-01-30 0.0003844082
2013-01-31 0.0180603867
2013-02-03 -0.0091473127
2013-02-04 0.0036322298
2013-02-05 -0.0000390941
2013-02-06 -0.0085320734
2013-02-07 -0.0046591956
2013-02-10 0.0038517581
2013-02-11 0.0011412046
2013-02-12 0.0026607502
2013-02-13 -0.0062440496
2013-02-14 0.0028645616
2013-02-18 -0.0019651341
2013-02-19 0.0040206637
2013-02-20 -0.0093543648
2013-02-21 -0.0005764665
2013-02-24 0.0057176118
2013-02-25 -0.0129979321
2013-02-26 -0.0009730782
2013-02-27 -0.0001453191
2013-02-28 0.0068911863
2013-03-03 -0.0020455332
2013-03-04 -0.0014747845
The result of the aggregation is instead:
> tail(data.frame(MonRet))
MonRet
ott 2012 -0.000848156
nov 2012 0.009833881
dic 2012 0.033406884
gen 2013 0.087822700
feb 2013 -0.023875638
mar 2013 -0.003517301
Your returns are wrong. The return for 2013-01-23 should be:
> 165.2842/165.5086-1
[1] -0.001355821
but you have 0.0193624621. I suspect this is because Prices is an xts object, not a zoo object. lag.xts breaks the convention of lag.ts and lag.zoo, where k=1 implies a "lag" of (t+1), in favour of the more common convention where k=1 implies a "lag" of (t-1), i.e. the previous observation.
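The two conventions can be checked side by side on a toy series (a sketch, not the asker's data):

```r
library(zoo)
library(xts)
z <- zoo(c(100, 110, 121), as.Date("2013-01-01") + 0:2)
x <- as.xts(z)
# zoo convention: k = -1 refers to the previous observation
as.numeric(diff(z) / lag(z, k = -1))  # 0.1 0.1
# xts convention: k = 1 refers to the previous observation
as.numeric(diff(x) / lag(x, k = 1))   # NA 0.1 0.1
```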

Intraday high/low clustering

I am attempting to study the clustering of intraday high/low points by time of day. I obtained the daily highs/lows by applying to.daily to intraday data, and merged the two using:
intraday.merge <- merge(intraday,daily)
intraday.merge <- na.locf(intraday.merge)
intraday.merge <- intraday.merge["T08:30:00/T16:30:00"] # remove record at 00:00:00
Next, I tried to obtain the records where the intraday high equals the daily high (and the intraday low the daily low) using:
intradayhi <- intraday.merge[intraday.merge$High == intraday.merge$Daily.High]
intradaylo <- intraday.merge[intraday.merge$Low == intraday.merge$Daily.Low]
Resulting data resembles the following:
Open High Low Close Volume Daily.Open Daily.High Daily.Low Daily.Close Daily.Volume
2012-06-19 08:45:00 258.9 259.1 258.5 258.7 1424 258.9 259.1 257.7 258.7 31523
2012-06-20 13:30:00 260.8 260.9 260.6 260.6 1616 260.4 260.9 259.2 260.8 35358
2012-06-21 08:40:00 260.7 260.8 260.4 260.5 493 260.7 260.8 257.4 258.3 31360
2012-06-22 12:10:00 255.9 256.2 255.9 256.1 626 254.5 256.2 253.9 255.3 50515
2012-06-22 12:15:00 256.1 256.2 255.9 255.9 779 254.5 256.2 253.9 255.3 50515
2012-06-25 11:55:00 254.5 254.7 254.4 254.6 1589 253.8 254.7 251.5 253.9 65621
2012-06-26 08:45:00 253.4 254.2 253.2 253.7 5849 253.8 254.2 252.4 253.1 70635
2012-06-27 11:25:00 255.6 256.0 255.5 255.9 973 251.8 256.0 251.8 255.2 53335
2012-06-28 09:00:00 257.0 257.3 256.9 257.1 601 255.3 257.3 255.0 255.1 23978
2012-06-29 13:45:00 253.0 253.4 253.0 253.4 451 247.3 253.4 246.9 253.4 52539
The subset produces duplicated results; how do I keep only the first record of each day? I would then be able to plot the count of records for periods in the day.
Also, are there alternate methods to get the results I want? Thanks in advance.
Edit:
Sample output should look like this; the count could be either the first result for the day or aggregated (more than one occurrence in that day):
Time Count
08:40:00 60
08:45:00 54
08:50:00 60
...
14:00:00 20
14:05:00 12
14:10:00 30
You can get the first observation of each day via:
y <- apply.daily(x, first)
Then you can simply aggregate the count based on hours and minutes:
z <- aggregate(1:NROW(y), by=list(Time=format(index(y),"%H:%M")), length)
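If aggregated counts per time bucket are acceptable (the "more than 1 occurrence" variant), table() on the formatted index is a simple alternative; the data below are a hypothetical stand-in for intradayhi:

```r
require(xts)
# Hypothetical timestamps at which the daily high was set
idx <- as.POSIXct(c("2012-06-19 08:45:00", "2012-06-20 13:30:00",
                    "2012-06-21 08:45:00"))
intradayhi <- xts(c(259.1, 260.9, 260.8), order.by = idx)
# Count occurrences per time-of-day bucket
counts <- as.data.frame(table(Time = format(index(intradayhi), "%H:%M")))
counts  # 08:45 occurs twice, 13:30 once
```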
