Linearly interpolate a 15 Hz time series to match a 25 Hz time series in R

Hi, I have the following data recorded at 15 Hz and I want to resample it to 25 Hz using linear interpolation. What is the best way to achieve this?
Here is the first second of my data set:
RecordFile YTSIMTMD RBDDLO_0 RBDDGS_0 IDLWMWC1 time timeNF
864 2C01MUC.txx 85535.10 -0.31 -0.348873 1 0.00000 0
865 2C01MUC.txx 85535.17 -0.31 -0.348873 1 0.06667 6667
866 2C01MUC.txx 85535.23 -0.31 -0.348873 0 0.13334 13334
867 2C01MUC.txx 85535.30 -0.31 -0.348832 0 0.20000 20000
868 2C01MUC.txx 85535.37 -0.31 -0.348832 0 0.26667 26667
869 2C01MUC.txx 85535.43 -0.31 -0.348832 0 0.33334 33334
870 2C01MUC.txx 85535.50 -0.31 -0.348832 1 0.40000 40000
871 2C01MUC.txx 85535.57 -0.31 -0.348796 1 0.46667 46667
872 2C01MUC.txx 85535.63 -0.31 -0.348796 1 0.53334 53334
873 2C01MUC.txx 85535.70 -0.31 -0.348796 1 0.60000 60000
874 2C01MUC.txx 85535.77 -0.31 -0.348796 0 0.66667 66667
875 2C01MUC.txx 85535.83 -0.31 -0.348767 0 0.73334 73334
876 2C01MUC.txx 85535.90 -0.31 -0.348767 0 0.80000 80000
877 2C01MUC.txx 85535.97 -0.31 -0.348767 0 0.86667 86667
878 2C01MUC.txx 85536.03 -0.31 -0.348767 1 0.93334 93334
879 2C01MUC.txx 85536.10 -0.31 -0.348735 1 1.00000 100000
After that I want to match it with this data set, which was recorded at 25 Hz:
vpName vpID origIndex areaNum areaName startMS endMS durationMS startF endF durationF accumIndex
1 2C01 1 1 2 ATT 0 560 560 0 14 14 1
2 2C01 1 1 2 ATT 0 560 560 0 14 14 1
3 2C01 1 1 2 ATT 0 560 560 0 14 14 1
4 2C01 1 1 2 ATT 0 560 560 0 14 14 1
5 2C01 1 1 2 ATT 0 560 560 0 14 14 1
6 2C01 1 1 2 ATT 0 560 560 0 14 14 1
I found that approx seems to be the function for linear interpolation in R, but I am not sure which parameters to use to upsample my data from 15 to 25 Hz.
There seem to be dedicated packages for handling time series in R, like zoo and xts, but I am not sure whether I need them.
Both data sets start at the same time, so after upsampling I could simply match by row number.
Thank you for your help!

I'll make some assumptions. First, that the data columns "YTSIMTMD", "RBDDLO_0" and "RBDDGS_0" contain continuous data, so linear interpolation can be used. Second, that column IDLWMWC1 contains binary data, so we will interpolate it using method="constant", which selects the data value at the last data time prior to the interpolation time. Given this, the following uses approx to do the interpolations and combines them into a data frame. The interpolation times are generated at a time interval of 1/freq. I put your data into a data frame called xx.
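# Target sample times at 25 Hz
t_seq <- seq(min(xx$time), max(xx$time), by = 1/25)
# Linearly interpolate the continuous columns at the new times
ap <- data.frame(t_seq, sapply(xx[, c("YTSIMTMD", "RBDDLO_0", "RBDDGS_0")],
function(y) approx(xx$time, y, xout = t_seq, method = "linear")$y))
# Carry the last observed value forward for the binary column
ap <- cbind(ap, IDLWMWC1 = approx(xx$time, xx$IDLWMWC1, xout = t_seq, method = "constant")$y)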
I don't quite understand how your second set of data relates to the first, but if it's just additional information at intervals of 1/25 s starting at the same time, you could combine the two data frames using cbind.

Here's an example, using approxfun to create a function with the linear fit to the input data:
xin <- seq(1, 26, by = 5)
yin <- 2.5 + 3 * xin
myfun <- approxfun(xin, yin)
plot(xin, yin)
# Evaluate the interpolating function at new x values and overlay them in red
newx <- seq(3, 18, by = 5)
newy <- myfun(newx)
points(newx, newy, col = 'red')
In your case, the inputs are time for the x-values and whatever you are working with for the y-values. Then just feed in a sequence of "new" x values at 25 Hz intervals (0.04 seconds) to get the fitted values you want.
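To make the 15 Hz to 25 Hz step concrete, here is a minimal sketch on synthetic data. The names t15, y15, flag15 and their values are made up for illustration and are not the asker's columns. It uses approxfun for a continuous signal and approx with method = "constant" for a 0/1 flag.

```r
# Synthetic one-second recording at 15 Hz (hypothetical data)
t15 <- (0:15) / 15                 # sample times, 0 to 1 s
y15 <- 2 * t15 + 1                 # a continuous signal
flag15 <- as.integer(t15 >= 0.5)   # a binary flag

# Target sample times at 25 Hz (0.04 s apart)
t25 <- (0:25) / 25

# Linear interpolation for the continuous signal
lin_fun <- approxfun(t15, y15)
y25 <- lin_fun(t25)

# Last-value-carried-forward for the binary flag
flag25 <- approx(t15, flag15, xout = t25, method = "constant")$y

resampled <- data.frame(time = t25, y = y25, flag = flag25)
```

Building the time vectors as (0:n)/freq rather than with seq(..., by = 1/freq) is a deliberate choice here: it keeps the last target time exactly inside the range of the input times, so no NA appears at the end.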

Related

Looping over two R dataframes to create a third dataframe

I am trying to backtest a trading strategy.
I am coding in R 3.6.
The data is in two dataframes. The first is five years of daily price activity (i.e. fiveyrdaily). The second dataframe is price activity for the same five years only at the one minute level (i.e. fiveyrminutes). This data includes sessions low, high, open, and close.
My strategy is to loop over the respective days of the minute dataframe, while referring to and populating data on the daily dataframe to determine the following:
Condition that warrants the opening of a position (i.e. long or short).
Whether the target or stop price has been achieved.
Log the "trade" on a third dataframe (ORBOorders).
The embedded loops are not working the way I thought they would and I can't figure out why.
I know I can make the code simpler, but I keep spelling it out in full so I can figure out where I'm going wrong.
I understand that appending to a vector in a loop is not the preferred way to handle vectors but I don't know how large the vector will be after the program is complete. I'd love to hear any ideas.
This is a strategy that seeks to catch price as it moves out of the range established in the first five minutes of the day.
This is the code to identify the direction of price movement after the initial 5 minutes. The counter is not working. The goal is to know the index of a specific event within the subsetted fiveyrminute dataframe (i.e. todaysdate); I will use this in the next batch of code below. The counter starts at 6 for the sixth minute. The output is never higher than 7, so, it's not counting with each iteration of the loop. I can't figure out why.
for (i in 1:length(fiveyrdaily$Date)){
todaysdate <- subset(fiveyrminutes, fiveyrdaily$Date[i] == fiveyrminutes$Date) # subset so I'm only seeing the respective days in the minutes dataframe
counter3 <- 6 # start a counter so I can track which position I'm in within the minutes dataframe, start after ORBO
for (t in counter:length(todaysdate$Date)){ #loop over the minutes dataframe
if ((fiveyrdaily$FiveHighClose[i] == 0) & (todaysdate$Close[t] > fiveyrdaily$FiveMinuteHigh[i])) {
fiveyrdaily$FiveHighClose[i] <- todaysdate$Close[t]
fiveyrdaily$FiveHighAfterClose[i] <- todaysdate$High[t] #record the session's high
counter3 <- counter3 + 1
fiveyrdaily$counterhigh[i] <- counter3 # record the position of this event within the subset of the minute dataframe
}
else if ((fiveyrdaily$FiveLowClose[i] == 0) & (todaysdate$Close[t] < fiveyrdaily$FiveMinuteLow[i])) {
fiveyrdaily$FiveLowClose[i] <- todaysdate$Close[t]
fiveyrdaily$FiveLowAfterClose[i] <- todaysdate$Low[t] # record the session's low
counter3 <- counter3 + 1
fiveyrdaily$counterlow[i] <- counter3
}
counter3 <- counter + 1
}
}
This is the "for" loop that determines whether to buy and records the outcome.
#Open a long position
for (i in 1:length(fiveyrdaily$Date)){
if ((fiveyrdaily$counterhigh[i] < fiveyrdaily$counterlow[i]) & openposition < 2) {
#entryprice is the price predetermined ticks above the high
entryprice <- fiveyrdaily$FiveHighAfterClose[i] + (openticksaway * tickvalue)
#create stoploss
stoplossprice <- entryprice - stoplossvalue
#uncover target closest to five minute high
if (possibletargets > fiveyrdaily$FiveHighAfterClose[i]){ fiveyrdaily$ORBOtarget[i] <- min(possibletargets)}
poscounter <- fiveyrdaily$counterhigh[i]
beginforloop <- todaysdate[poscounter]
for (c in beginforloop:length(todaysdate$Date)){ #see where to open position starting from the occurence of the high
if ((entryprice > todaysdate$Low[c]) & (entryprice < todaysdate$High[c]) & (openposition < 2)){ #determine if conditions warrant entry
openposition <- openposition + 1 # open a position
if (openposition > 0){ #trade management
openpos <- 1
if ((stoplossprice < todaysdate$high[c]) & (stoplossprice > todaysdate$Low[c])){ # determine if stoploss has been hit
orderdate <- c(orderdate, todaysdate[c]) #enter data into orders dataframe
orderstrategy <- c(orderstrategy, "ORBO")
ordertype <- c(ordertype, "Long")
ordersymbol <- c(ordersymbol, "ES")
orderentry <- c(orderentry, entryprice)
orderclose <- c(orderclose, stoplossprice)
orderprofit <- c(orderprofit, abs((entryprice - stoplossprice) * pointvalue))
openpos <- 0
}
else if ((fiveyrdaily$ORBOtarget[i] < todaysdate$high[c]) & (fiveyrdaily$ORBOtarget[i] > todaysdate$Low[c])){ # determine if target has been hit
orderdate <- c(orderdate, todaysdate[c]) #enter data into orders dataframe
orderstrategy <- c(orderstrategy, "ORBO")
ordertype <- c(ordertype, "Long")
ordersymbol <- c(ordersymbol, "ES")
orderentry <- c(orderentry, entryprice)
orderclose <- c(orderclose, stoplossprice)
orderprofit <- c(orderprofit, abs((fiveyrdaily$ORBOtarget[i] - entryprice) * pointvalue))
openpos <- 0
}
}
}
}
}
}
Here is the data that is in the daily dataframe - fiveyrdaily:
> tail(fiveyrdaily)
Date Time Open High Low Close Vol OI UpperBand
1254 06/12/2019 16:15 2883.50 2889.75 2875.25 2881.00 1205406 2495060 2919.12
1255 06/13/2019 16:15 2894.75 2900.50 2886.75 2898.50 523312 448119 2925.39
1256 06/14/2019 16:15 2893.00 2899.75 2884.25 2894.75 1318938 951568 2927.99
1257 06/17/2019 16:15 2895.50 2902.75 2892.00 2896.25 1649621 1595842 2932.71
1258 06/18/2019 16:15 2914.00 2936.50 2910.25 2926.25 2257843 2093571 2944.19
1259 06/19/2019 16:15 2925.50 2936.75 2915.25 2933.50 1639495 2093571 2954.61
LowerBand MidLine PP RSI OverBot OverSld SlowK SlowD OverBot.1
1254 2751.13 2835.13 2892.58 56.24 70 30 86.82 87.54 80
1255 2749.21 2837.30 2882.00 59.06 70 30 87.60 88.11 80
1256 2748.24 2838.11 2895.25 58.19 70 30 89.01 87.81 80
1257 2746.94 2839.82 2892.92 58.46 70 30 91.79 89.47 80
1258 2743.68 2843.94 2897.00 63.41 70 30 92.63 91.14 80
1259 2740.02 2847.31 2924.33 64.51 70 30 95.20 93.21 80
OverSld.1 Volume Momentum ZeroLine InOrOut GapUpOrDown TypeOfDay PPTrend
1254 20 1205406 49.25 0 Disregard Disregard Bear Uptrend
1255 20 523312 93.50 0 Disregard Gap Up Bull Uptrend
1256 20 1318938 114.75 0 Disregard Disregard Bull Uptrend
1257 20 1649621 105.75 0 Disregard Disregard Bull Uptrend
1258 20 2257843 173.75 0 Disregard Gap Up Bull Uptrend
1259 20 1639495 184.00 0 Disregard Disregard Bull Uptrend
FiveMinuteLow FiveMinuteHigh FiveMinHtoL DollarFiveHtoL FiveHighClose
1254 2881.25 2889.75 8.50 425.0 0.00
1255 2892.00 2895.75 3.75 187.5 2896.00
1256 2886.25 2893.75 7.50 375.0 2894.75
1257 2892.00 2897.00 5.00 250.0 2897.25
1258 2910.25 2915.50 5.25 262.5 2920.75
1259 2923.25 2927.25 4.00 200.0 2930.00
FiveHighAfterClose FiveLowClose FiveLowAfterClose ORBOtarget counterhigh
1254 0.00 2881.00 2880.75 0 0
1255 2896.00 2891.75 2891.00 0 6
1256 2894.75 2886.00 2885.00 0 7
1257 2897.50 0.00 0.00 0 7
1258 2921.75 0.00 0.00 0 7
1259 2931.50 2922.50 2922.25 0 7
counterlow DollarOpentoHigh DollarOpentoClose DollarOpentoLow
1254 7 312.5 125.0 412.5
1255 7 287.5 187.5 400.0
1256 7 337.5 87.5 437.5
1257 0 362.5 37.5 175.0
1258 0 1125.0 612.5 187.5
1259 7 562.5 400.0 512.5
This is the data that is in the minutes dataframe - fiveyrminutes:
Date Time Open High Low Close Up Down UpperBand
509796 06/19/2019 16:10 2932.25 2932.50 2932.25 2932.50 717 430 2935.66
509797 06/19/2019 16:11 2932.25 2932.50 2932.25 2932.25 125 276 2935.46
509798 06/19/2019 16:12 2932.25 2932.75 2932.25 2932.75 612 604 2934.95
509799 06/19/2019 16:13 2932.50 2933.25 2932.50 2933.00 830 153 2934.66
509800 06/19/2019 16:14 2933.25 2933.25 2932.75 2933.00 676 376 2934.26
509801 06/19/2019 16:15 2932.75 2934.00 2932.75 2933.25 2929 2026 2933.90
LowerBand MidLine PP RSI OverBot OverSld SlowK SlowD OverBot.1
509796 2930.27 2932.96 2932.42 47.94 70 30 45.45 40.66 80
509797 2930.24 2932.85 2932.42 46.22 70 30 53.70 46.21 80
509798 2930.45 2932.70 2932.33 50.07 70 30 63.83 54.33 80
509799 2930.56 2932.61 2932.58 51.92 70 30 72.73 63.42 80
509800 2930.76 2932.51 2932.92 51.92 70 30 83.33 73.30 80
509801 2930.97 2932.44 2933.00 53.90 70 30 85.00 80.35 80
OverSld.1 Volume Momentum ZeroLine
509796 20 1147 -0.50 0
509797 20 401 -1.00 0
509798 20 1216 1.75 0
509799 20 983 1.75 0
509800 20 1052 2.00 0
509801 20 4955 1.75 0
This is the output for the orders dataframe, notice how it's empty - ORBOorders:
ORBOorders
orderdate orderstrategy ordertype ordersymbol orderentry orderclose
1 0 0
orderprofit
1 0
These are the problems:
-counter3 isn't working (to find after which point I should buy)
-The second batch of code gives this error:
Error in beginforloop:length(todaysdate$Date) : argument of length 0
In addition: Warning message: In if (possibletargets >
fiveyrdaily$FiveHighAfterClose[i]) { : the condition has length > 1
and only the first element will be used
-The ORBOorder dataframe has absolutely no data in it.
Thanks for any help in advance!
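A possible angle on the counter problem: in the first loop, counter3 is reassigned from counter (not counter3) at the end of every pass, which may be why the stored value never exceeds 7. A hand-maintained counter can be avoided with which(), which returns the positions where a condition holds. The sketch below uses made-up numbers. The vectors close and five_min_high are hypothetical stand-ins for todaysdate$Close and fiveyrdaily$FiveMinuteHigh[i].

```r
# Hypothetical minute closes for one day and that day's five-minute high
close <- c(10, 11, 10.5, 11, 10.8, 11.2, 12.1, 11.9, 12.4)
five_min_high <- 12

# Only look at minutes after the opening range (minute 6 onwards)
after_orbo <- 6:length(close)

# Positions (within after_orbo) where the close breaks above the high
hits <- which(close[after_orbo] > five_min_high)

# Index of the first breakout within the day's rows, or NA if none occurred
first_break <- if (length(hits) > 0) after_orbo[hits[1]] else NA_integer_
```

Here first_break is 7, the row of the first close above 12. The same pattern with < and the five-minute low would give the position of the first downside break.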

Product between two data.frames columns

I have two data.frames:
The first one is the coefficients of my regressions for each day:
> parametrosBase
beta0 beta1 beta4
2015-12-15 0.1622824 -0.012956819 -0.04637442
2015-12-16 0.1641884 -0.007914548 -0.06170213
2015-12-17 0.1623660 -0.005618474 -0.05914809
2015-12-18 0.1643263 0.005380472 -0.08533237
2015-12-21 0.1667710 0.003824588 -0.09040071
The second one is: the independent (x) variables:
> head(ir_dfSTORED)
ind m h0x h1x h4x beta0_h0x beta1_h1x beta4_h4x
1 2015-12-15 21 1 0.5642792 0.2859359 0 0 0
2 2015-12-15 42 1 0.3606713 0.2831963 0 0 0
3 2015-12-15 63 1 0.2550200 0.2334554 0 0 0
4 2015-12-15 84 1 0.1943071 0.1883048 0 0 0
5 2015-12-15 105 1 0.1561231 0.1544524 0 0 0
6 2015-12-15 126 1 0.1302597 0.1297947 0 0 0
> tail(ir_dfSTORED)
ind m h0x h1x h4x beta0_h0x beta1_h1x beta4_h4x
835 2015-12-21 2415 1 0.006799321 0.006799321 0 0 0
836 2015-12-21 2436 1 0.006740707 0.006740707 0 0 0
837 2015-12-21 2457 1 0.006683094 0.006683094 0 0 0
838 2015-12-21 2478 1 0.006626457 0.006626457 0 0 0
839 2015-12-21 2499 1 0.006570773 0.006570773 0 0 0
840 2015-12-21 2520 1 0.006516016 0.006516016 0 0 0
What I want is to multiply the beta0 column of "parametrosBase" by the h0x column of "ir_dfSTORED" and store the result in the beta0_h0x column, and likewise for beta1 and beta4.
The problem I'm facing is with the dates in the "ind" column. The multiplication has to respect the dates.
So, once the day changes in "ir_dfSTORED", I have to switch to the same day in "parametrosBase".
For example:
The first row of the "parametrosBase" df,
2015-12-15 0.1622824 -0.012956819 -0.04637442
is fixed for the day 2015-12-15, and I do the product with it. Once I reach 2015-12-16, I have to use the second row of the "parametrosBase" df.
How can I do this?
Thanks a lot. :)
Maybe you should merge the two datasets first:
parametrosBase$ind <- rownames(parametrosBase)
df <- merge(ir_dfSTORED,parametrosBase)
df <- within(df,{
beta0_h0x <- beta0*h0x
beta1_h1x <- beta1*h1x
beta4_h4x <- beta4*h4x
})
Since I don't know the structure of the data, you may have to convert the dates from rownames to a date format in order for the merge to work. Using ind as the name of the date in parametrosBase is key to making merge work, otherwise you'll have to specify the variables to merge by.
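As a small self-contained illustration of the merge-then-multiply idea, here is a sketch on made-up numbers. The data frames params and x are hypothetical, cut-down stand-ins for parametrosBase and ir_dfSTORED, and the dates are kept as character strings so the keys match without conversion.

```r
# Hypothetical coefficients, one row per day, dates as row names
params <- data.frame(beta0 = c(0.16, 0.17),
                     beta1 = c(-0.01, 0.02),
                     row.names = c("2015-12-15", "2015-12-16"))

# Hypothetical regressors, several rows per day
x <- data.frame(ind = c("2015-12-15", "2015-12-15", "2015-12-16"),
                h0x = 1,
                h1x = c(0.5, 0.3, 0.2),
                stringsAsFactors = FALSE)

# Move the dates into a column with the same name as in x, then merge on it
params$ind <- rownames(params)
df <- merge(x, params, by = "ind")

# Each row now carries the coefficients of its own day
df$beta0_h0x <- df$beta0 * df$h0x
df$beta1_h1x <- df$beta1 * df$h1x
```

If the real "ind" column is of class Date, converting the row names with as.Date before merging would be the analogous step.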

Plotting different columns on the same file using boxes

I have a file that looks like
$cat myfile.dat
1 8 32 19230 1.186 3.985
1 8 64 9620 0.600 7.877
1 8 128 4810 0.312 15.136
1 8 256 2410 0.226 20.927
1 8 512 1210 0.172 27.708
1 8 1024 610 0.135 35.582
1 8 2048 310 0.121 40.172
1 8 4096 160 0.117 43.141
1 8 8192 80 0.112 44.770
.....
2 8 16384 300 0.692 6.816
2 8 32768 150 0.686 6.877
2 8 65536 80 0.853 5.904
2 10 320 7830 1.041 4.575
2 10 640 3920 0.919 5.189
2 10 1280 1960 0.828 5.757
2 10 2560 980 0.773 6.167
2 10 5120 490 0.746 6.391
2 10 10240 250 0.748 6.507
2 10 20480 130 0.770 6.567
....
3 18 8192 10 1.311 12.759
3 20 32 650 1.631 3.978
3 20 64 330 0.838 7.863
3 20 128 170 0.483 14.046
3 20 256 90 0.508 14.160
3 20 512 50 0.559 14.283
3 20 1024 30 0.665 14.405
3 20 2048 20 0.865 14.782
3 20 4096 10 0.856 14.932
3 20 8192 10 1.704 14.998
As you can see, there are many ways of plotting this information depending on the column we want as x axis. One of the ways I would like to plot the information is the 6th against the 1st column
p "myfile.dat" u 1:6
My main question is whether there is a way to plot those bars as solid boxes, since we are only interested in the peak value achieved and not in the frequency or density of the dots.
Gnuplot has the smooth option, which can be used e.g. as smooth frequency to sum all y-values for the same x-value. Unfortunately there is no smooth maximum, which is what you would need here, but one can emulate it with a bit of trickery in the using statement.
reset
xval = -1000
max(x, y) = (x > y ? x : y)
maxval = 0
colnum = 6
set boxwidth 0.2
plot 'myfile.dat' using (val = column(colnum), $1):\
(maxval_prev = (xval == $1 ? maxval : 0), \
maxval = (xval == $1 ? max(maxval, val) : val),\
xval = $1, \
(maxval > maxval_prev ? maxval-maxval_prev : 0)\
) \
smooth frequency lw 3 with boxes t 'maximum values'
Every using entry can consist of different assignments, which are separated by a comma.
If a new x value appears, the variables are initialized. This works, because the data is made monotonic in x by smooth frequency.
If the current value is bigger than the stored maximum value, the difference between the current value and the stored maximum value is added. Potentially, this could result in numerical errors due to repeated adding and subtracting, but judging from your sample data and given the resolution of the plot, this shouldn't be a problem.
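The arithmetic behind this trick can be checked outside gnuplot. The sketch below is in R rather than gnuplot, uses a made-up vector of positive values for a single x value, and the helper running_max_increments is hypothetical. It shows that summing the increments of the running maximum gives the maximum itself, which is what smooth frequency ends up plotting.

```r
# Increments of the running maximum, starting from 0 as in the gnuplot code
running_max_increments <- function(v) {
  m <- cummax(v)     # running maximum after each value
  diff(c(0, m))      # how much the maximum grew at each step (0 if it did not)
}

# Hypothetical positive y-values sharing one x value
vals <- c(3.985, 7.877, 15.136, 20.927, 6.816)
inc <- running_max_increments(vals)

# Summing the increments recovers the maximum (up to rounding)
total <- sum(inc)
```

This relies on the values being positive, as they are in the sample data.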
The result for your data is a plot with one box per x value, whose height is the maximum of column 6.
You can search for the maximum and plot only that, but this is probably easier, even if it draws lots of boxes one over another:
plot "myfile.dat" using 1:6:(.1) with boxes fillstyle solid

Using a column entry as a "selector" for datasets in R

My array looks like this:
Slide Index A B C DoseGroup
482 778 l 0 0 2 13Gy_p_75_42wk
483 778 r 0 0 2 13Gy_p_75_42wk
484 779 l 0 0 2 13Gy_p_75_42wk
485 779 r 0 0 2 13Gy_p_75_42wk
486 4700 l 2 2 2 14.25Gy_C_50pl_42wk
487 4700 r 0 0 1 14.25Gy_C_50pl_42wk
488 4701 l 0 0 1 14.25Gy_C_50pl_42wk
I would like to use the DoseGroup column's entries to be able to select the respective entries in the other columns. I would like to be able to tell R, e.g., "Do a wilcox.test between the 13Gy_p_75_42wk and the 14.25Gy_C_50pl_42wk datasets using column C."
How can I do this with R? Is there some kind of way to select all columns having the entry 14.25Gy_C_50pl_42wk?
I modified your data to add a third level in DoseGroup to make it more realistic.
txt <- "Slide Index A B C DoseGroup
778 l 0 0 2 13Gy_p_75_42wk
778 r 0 0 2 13Gy_p_75_42wk
779 l 0 0 2 13Gy_p_75_42wk
779 r 0 0 2 13Gy_p_75_42wk
4700 l 2 2 2 14.25Gy_C_50pl_42wk
4700 r 0 0 1 14.25Gy_C_50pl_42wk
4701 l 0 0 1 14.25Gy_C_50pl_42wk
4702 l 0 0 10 15Gy_C_50pl_42wk"
dat <- read.table(text = txt, header = TRUE)
wilcox.test(C ~ DoseGroup, data = dat,
subset = DoseGroup %in% c("13Gy_p_75_42wk", "14.25Gy_C_50pl_42wk"))
## Wilcoxon rank sum test with continuity correction
## data: C by DoseGroup
## W = 10, p-value = 0.1175
## alternative hypothesis: true location shift is not equal to 0
To select data, you can use either of these two commands.
dat[dat$DoseGroup == "14.25Gy_C_50pl_42wk", ]
subset(dat, DoseGroup == "14.25Gy_C_50pl_42wk")
These commands are basic R, and any introduction to R will show you how to do the same.
So I urge you to read one if you really want to enjoy R.
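If you prefer to pull out the two groups explicitly, the same selection idea works on a single column. This is a minimal sketch that rebuilds a small data frame from the rows shown in the question. The names x, y and res are just for illustration, and exact = FALSE is set because the data contain ties.

```r
# Column C and DoseGroup from the rows shown in the question
dat <- data.frame(
  C = c(2, 2, 2, 2, 2, 1, 1),
  DoseGroup = c(rep("13Gy_p_75_42wk", 4), rep("14.25Gy_C_50pl_42wk", 3)),
  stringsAsFactors = FALSE
)

# Select column C for each dose group by comparing against the group label
x <- dat$C[dat$DoseGroup == "13Gy_p_75_42wk"]
y <- dat$C[dat$DoseGroup == "14.25Gy_C_50pl_42wk"]

# Two-sample Wilcoxon test on the selected vectors
res <- wilcox.test(x, y, exact = FALSE)
```

The statistic res$statistic is 10 here, the same W as in the formula-based call above.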

creating vector from 'if' function using apply in R

I'm trying to create a new vector in R using an 'if' function to pull out only certain values for the new array. Basically, I want to segregate data by day of week for each of several cities. How do I use the apply function to get only, say, Tuesdays in a new array for each city? Thanks
It sounds as though you don't want if or apply at all. The solution is simpler:
Suppose that your data frame is data. Then subset(data, Weekday == 3) should work.
You don't want to use the R if. Instead, use the subsetting function [:
dat <- read.table(text=" Date Weekday Holiday Atlanta Chicago Houston Tulsa
1 1/1/2008 3 1 313 313 361 123
2 1/2/2008 4 0 735 979 986 310
3 1/3/2008 5 0 690 904 950 286
4 1/4/2008 6 0 610 734 822 281
5 1/5/2008 7 0 482 633 622 211
6 1/6/2008 1 0 349 421 402 109", header=TRUE)
dat[ dat$Weekday==3, ]
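To get one vector per city for a given weekday, the row selection can be combined with a column selection. A minimal sketch, assuming Weekday is coded as in the answer above (3 for the day of interest) and using a cut-down, hypothetical copy of the data:

```r
# Cut-down copy of the example data (two cities, four days)
dat <- data.frame(
  Weekday = c(3, 4, 5, 3),
  Atlanta = c(313, 735, 690, 400),
  Chicago = c(313, 979, 904, 500)
)

cities <- c("Atlanta", "Chicago")

# Rows for the chosen weekday, city columns only
tuesdays <- dat[dat$Weekday == 3, cities]

# One vector per city, as a named list
per_city <- as.list(tuesdays)
```

per_city$Atlanta then holds the Atlanta values for that weekday, here 313 and 400.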
