Vectorize simple loop-based trading system in R?

After studying the vectorization methods used in the two links below, I've attempted to create a simple trading strategy template (code shown below) that can be vectorized in R for better speed than a loop-based structure. I'm having difficulty vectorizing because variable state must be maintained and built upon:
1) The signals I'm using are not mutually exclusive for long and short (as they would be in a simplistic MA crossover system).
2) Once triggered, the signal can wander until it gets an opposing indication (such as an RSI system that goes short above 80 and long below 20).
3) Positions are held for multiple periods, so it isn't a case of entering on every signal or exiting once a signal is false (I wish to enter only once, as in a Stop and Reverse or SAR system).
I consider this to be a simple example system but it is a bit more complex than the examples listed here:
http://blog.fosstrading.com/2011/03/how-to-backtest-strategy-in-r.html
Cumulative Return in Trading Strategy Test
System logic summary: The system starts flat then goes long (short) at the ask (bid) price when zscore is below (above) -2 (2). The system keeps track of performance statistics such as 'trades', 'wins', closed P&L (others omitted for simplicity). The system also keeps a running 'equity' for plotting after a system run.
# assume vectors bid, ask, and zscore containing those price series respectively
# pos = current position where 1 == long, -1 == short, 0 == flat
# entryP = entry price, pnl = open pnl, cpnl = closed pnl
pos = 0; entryP = 0.0; pnl = 0; cpnl = 0; trades = 0; wins = 0
ub = length(bid)
equity = rep(0, ub)
for (i in 10000:ub) {
  pnl = 0
  if (pos > 0) pnl = bid[i] - entryP
  if (pos < 0) pnl = entryP - ask[i]
  if (zscore[i] > 2.0 && pos != -1) {         # go short
    if (pos > 0) {                            # exit long and record pnl
      cpnl = cpnl + pnl
      if (pnl > 0) wins = wins + 1
      trades = trades + 1
      pnl = 0
    }
    pos = -1
    entryP = bid[i]
  } else if (zscore[i] < -2.0 && pos != 1) {  # go long
    if (pos < 0) {                            # exit short and record pnl
      cpnl = cpnl + pnl
      if (pnl > 0) wins = wins + 1
      trades = trades + 1
      pnl = 0
    }
    pos = 1
    entryP = ask[i]
  }
  equity[i] = cpnl + pnl
}
# assume close-out of final position
cpnl = cpnl + pnl
if (pnl > 0) wins = wins + 1
if (pos != 0) trades = trades + 1
# plot equity chart and report performance stats
plot(equity, t = 'l', lwd = 3)
cpnl; trades; cpnl / trades; wins / trades
Is it possible to vectorize this simple loop-based mean reversion trading system in R?

" I'm having difficulty vectorizing because variable state must be maintained "
That sums it all. You can't avoid loops if your result in any iteration depends on previous iterations.
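That said, in this particular SAR-style system the position series itself depends only on the most recent signal, so it can be carried forward without an explicit R loop; the per-trade accounting (trades, wins, closed P&L) is where the true iteration-to-iteration dependence lies. A minimal sketch of the position part only, assuming the zoo package and the same bid/ask/zscore vectors as above:

library(zoo)

# raw signals: -1 where a short triggers, 1 where a long triggers, NA elsewhere
sig <- ifelse(zscore > 2, -1, ifelse(zscore < -2, 1, NA))
# carry the last signal forward (Stop and Reverse behaviour)
pos <- na.locf(sig, na.rm = FALSE)
pos[is.na(pos)] <- 0  # flat until the first signal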


Squid Game episode 7 with simulation

Last night I watched episode 7 of the Squid Game TV series. The episode features a bridge game that can be modeled with a binomial-type distribution.
Specifically, there are 16 players and a bridge with 18 pairs of glass panels (one plain glass and one safe glass). If a player happened to choose the plain glass, the glass couldn't withstand the player's weight and broke. The next player had the advantage of starting from the position the previous player had reached, and continued the search from there. In the end, 3 players happened to cross the bridge.
So I was wondering: it is as if I have 16 euros in my pocket and I play heads or tails with p = 1/2, always betting on heads. If the coin flip is heads I earn 0, and if it is tails I lose 1 euro. What is the probability of hitting heads 18 times (consecutive or not) and being left with 3 euros in my pocket?
I tried to simulate this problem in R:
squid_bridge = function(a,n,p) {
  players = a
  while (position > 0 & position < n) {
    jump = sample(c(0,1),1,prob=c(1-p,p))
    position = position + jump
  }
  if (position == 0)
    return(1)
  else
    return(0)
}
n = 18
trials = 100000
a = 16
p = 1/2
set.seed(1)
simlist = replicate(trials, squid_bridge(a, n, p))
It does not seem to work. Any help?
Here is a Monte Carlo experiment in R returning the distribution of the number of failures.
apply(apply(matrix(rgeom(16*1e6,.5)+1,nc=16),1,cumsum)>18,1,mean)
# with details:
#   rgeom(16*1e6,.5)+1  for 16 x 10⁶ Geometric simulations, where the
#                       outcome is the number of attempts till "success",
#                       "success" included
#   ,1,cumsum)          for the number of steps till the 16th "success"
#   >18                 for counting the cases when a player manages to
#                       cross the bridge
#   1,mean)             for finding the probability of each of the players
#                       crossing
This is not a Binomial but a truncated Negative Binomial experiment in that the number of new steps made by each player is a Geometric Geom(1/2) variate unless the 18 steps have been made. The average number of survivors is thus
sum(1-pnbinom(size=1:16,q=17:2,prob=.5))
# Explanation:
#   pnbinom  is the Negative Binomial cdf
#   size     the number of "successes"
#   q        the integer at which the cdf is computed
#   prob     the Negative Binomial probability parameter
# Because nbinom() is calibrated as the number of attempts
# before "success", rather than until "success", the value of
# q decreases by one for each player in the game
whose value is 7.000076, rather than 16-18/2=7!
Here is how I think you can model the game in R. The first version is similar to what you have: there's a 50% chance of guessing correctly and if the guess is correct, the players advance a tile. Otherwise they do not, and the number of players decrements by 1. If the number of players reaches 0, or they advance to the end, the game ends. This is shown in squid_bridge1().
squid_bridge1 <- function(players, n, prob) {
  if (players == 0 | n == 18) {
    # All players have died or we have reached the end
    return(players)
  }
  jump <- rbinom(1, 1, prob)
  if (jump == 0) {
    # Player died
    return(squid_bridge1(players - 1, n, prob))
  }
  if (jump == 1 & n < 18) {
    # Player lives and advances 1 space
    return(squid_bridge1(players, n + 1, prob))
  }
}
However, this does not accurately depict the game, since a wrong guess gives the remaining players additional information. If a player chooses wrong, the probability of the next guess being correct is not 50%, it's 100%. After that point, the probability of a correct guess drops back to 50%. This can be accounted for with another argument to keep track of the correctness of the previous guess.
squid_bridge2 <- function(players, n, prob, previous) {
  if (players == 0 | n == 18) {
    # The game ends if there are no players or they have reached the end
    return(players)
  }
  if (previous == 0) {
    # The previous guess was wrong, but now the players know where to go next
    return(squid_bridge2(players, n + 1, prob, previous = 1))
  }
  jump <- rbinom(1, 1, prob)
  if (jump == 0) {
    # Player died
    return(squid_bridge2(players - 1, n, prob, previous = 0))
  }
  if (jump == 1 & n < 18) {
    # Move is correct. Advance 1 space
    return(squid_bridge2(players, n + 1, prob, previous = 1))
  }
}
However, there's a catch. It wasn't quite that simple in the show: players fell for reasons other than an incorrect guess (being pushed, jumping on purpose, etc.). I don't know what a reasonable probability for something like this is, but it is likely low; let's say 10%.
not_stupid <- function() {
  x <- runif(1, 0, 1)
  if (x <= 0.1) {
    return(FALSE)
  } else {
    return(TRUE)
  }
}
Since emotions spike just before each move, we will test this prior to each move.
squid_bridge3 <- function(players, n, prob, previous) {
  if (players == 0 | n == 18) {
    # The game is over because there are no players left or they reached the end
    return(players)
  }
  if (previous == 0) {
    # The previous guess was wrong, but now the players know where to go next
    return(squid_bridge3(players, n + 1, prob, previous = 1))
  }
  if (!not_stupid()) {
    return(squid_bridge3(players - 1, n, prob, previous = 1))
  }
  jump <- rbinom(1, 1, prob)
  if (jump == 0) {
    # Player died because of either choosing wrong or a self-inflicted loss
    return(squid_bridge3(players - 1, n, prob, previous = 0))
  }
  if (jump == 1 & n < 18) {
    # Move is correct. Advance 1 space
    return(squid_bridge3(players, n + 1, prob, previous = 1))
  }
}
Then running some simulations:
library(dplyr)   # for tibble() and %>%
library(tidyr)   # for pivot_longer()
library(ggplot2)

set.seed(123)
trials <- 10000
players <- 16
squid1 <- replicate(trials, squid_bridge1(players, 0, 0.5))
squid2 <- replicate(trials, squid_bridge2(players, 0, 0.5, 1))
squid3 <- replicate(trials, squid_bridge3(players, 0, 0.5, 1))
df <- tibble(squid1 = squid1,
             squid2 = squid2,
             squid3 = squid3) %>%
  pivot_longer(cols = c(squid1, squid2, squid3))
ggplot(data = df,
       aes(x = value)) +
  geom_histogram(bins = 10,
                 binwidth = 1,
                 fill = "cornflowerblue",
                 color = "black") +
  facet_wrap(~name,
             nrow = 3) +
  xlab("# of players to make it to the end") +
  scale_x_continuous(breaks = seq(0, 16, by = 1),
                     labels = seq(0, 16, by = 1))
As you can see below, the first situation is heavily skewed to the left. Since the players are essentially "blindly guessing" at each tile, it is unlikely that any will make it to the end. However, after accounting for the information gained from a wrong guess, it averages somewhere around 7 players making it. By adding in a random chance of falling for another reason, the distribution skews to the left some.
Average for first situation: 1.45
Average for second situation: 7.01
Average for third situation: 4.99
To answer the question of the probability of exactly 3 players making it, I get ~10.8% for the last case.
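For reference, that figure is presumably just the share of runs with exactly 3 survivors, read off the simulation output from above:

mean(squid3 == 3)  # proportion of runs where exactly 3 players make it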
Edit: As requested, here is the code to generate the plots. I also fixed the various functions that had some naming issues (went through a few different names when I made them). It looks like it resulted in a slight bug for the 3rd function, but I have fixed it throughout.
○ △ □
##########
# Game ○ △ □
##########
squidd7 <- function(N_Fields, N_Players, p_new_field){
  Players <- data.frame(id = 1:N_Players, alive = rep(1, N_Players), Field = 0)
  for (i in 1:N_Players){
    while (Players[i, "alive"] == TRUE && max(Players$Field) < N_Fields) {
      Players[i, "Field"] = Players[i, "Field"] + 1   # jump onto the next field
      Players[i, "alive"] = rbinom(1, 1, p_new_field) # fall or survive
    }
    Players[i+1, "Field"] = Players[i, "Field"] # next player starts where the prior player died
  }
  Players <- Players[1:N_Players, ] # cosmetic, because of the i+1 in the prior line
  # Print some messages
  if (!any(Players$alive > 0)){
    cat("Players lose!")
  } else {
    cat(" \n After", max(Players$Field), "steps the goal was reached! ")
    cat("Players", Players[Players$alive == 1, "id"], "survive")
  }
  return(Players)
}
squidd7(18,16,0.5)
###########
# simulation ○ △ □
###########
results <- data.frame(matrix(0, nrow = 100, ncol = 20))
for (x in 1:ncol(results)){
  for (i in 1:nrow(results)) {
    Players <- squidd7(x + 7, 16, 0.5)
    results[i, x] <- sum(Players$alive)
  }
}
###########
## Results ○○□□○ △ □
sdt <- apply(results, 2, sd)   # standard deviation
mn  <- apply(results, 2, mean) # ○ △ □
boxplot(results, xlab = "n Steps", names = 8:27, ylab = "N Survivors of 16")
points(mn, type = "l")
points(sdt, type = "l")

colors <- colorRampPalette(c(rgb(0,1,0,0.4),
                             rgb(1,1,0,0.4),
                             rgb(1,0,0,0.4)), alpha = TRUE)(20)
plot(density(results$X1), type = "n", xlim = c(-1,17), ylim = c(0,0.30),
     main = "○ △ □",
     sub = "○ △ □ ○ △ □ ○ △ □",
     xlab = "number of survivors")
for (i in 1:20){ # one density per simulated bridge length (results has 20 columns)
  polygon(density(results[,i]), col = colors[i])
}
legend(15, 0.31, title = "Steps", legend = 8:27, fill = colors, border = NA,
       y.intersp = 0.5,
       cex = 0.8, text.font = 0.3)
Well, to simulate this game you will need a 50/50 chance 16 times. Meaning all you have to code is a 50% chance of not losing, run 16 times; then if you lose, it subtracts 1 from a variable of 18. That will create a perfect digital recreation of the Squid Game bridge.
If I'm understanding correctly, I believe the other answers are complicating the simulation.
We can simulate draws from a binomial distribution of size 18. Every 1 kills someone (assuming there is anyone left to kill). Thus we can calculate the number of survivors by subtracting the number of 1s drawn from the number of players, truncated at 0 (any negative results are counted as 0, via pmax()).
set.seed(47)
n_sim = 1e4
survivors = pmax(0, 16 - rbinom(n_sim, size = 18, prob = 0.5))
mean(survivors)
# [1] 7.009
The mean seems to approach Xi'an's answer of 7.000076 as n_sim increases. At 500M simulations (which still runs rather quickly with this method!) we get a mean of 7.000073.
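For what it's worth, the larger run is the same one-liner with more draws (a sketch; 5e8 draws needs a few GB of memory):

mean(pmax(0, 16 - rbinom(5e8, size = 18, prob = 0.5)))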
Plotting these results, they appear basically identical to cazman's squid2 scenario.
library(ggplot2)
ggplot(data.frame(survivors), aes(x = survivors)) + geom_bar() +
  scale_y_continuous(limits = c(0, 6000))

Error in for loop - attempt to select less than one element in integerOneIndex

I'm trying to translate a C routine from an old sound synthesis program into R, but have indexing issues which I'm struggling to understand (I'm a beginner when it comes to using loops).
The routine creates an exponential lookup table - the vector exptab:
# Define parameters
sinetabsize <- 8192
prop <- 0.8
BP <- 10
BD <- -5
BA <- -1
# Create output vector
exptab <- vector("double", sinetabsize)
# Loop
while (abs(BD) > 0.00001){
  BY = (exp(BP) - 1) / (exp(BP*prop) - 1)
  if (BY > 2){
    BS = -1
  }
  else{
    BS = 1
  }
  if (BA != BS){
    BD = BD * -0.5
    BA = BS
    BP = BP + BD
  }
  if (BP <= 0){
    BP = 0.001
  }
  BQ = 1 / (exp(BP) - 1)
  incr = 1 / sinetabsize
  x = 0
  stabsize = sinetabsize + 1
  for (i in (1:(stabsize-1))){
    x = x + incr
    exptab[[sinetabsize-i]] = 1 - (BQ * (exp(BP * x) - 1))
  }
}
Running the code gives the error:
Error in exptab[[sinetabsize - i]] <- 1 - (BQ * (exp(BP * x) - 1)) :
attempt to select less than one element in integerOneIndex
Which, I understand from looking at other posts, indicates an indexing problem. But, I'm finding it difficult to work out the exact issue.
I suspect the error may lie in my translation. The original C code for the last few lines is:
for (i=1; i < stabsize; i++){
    x += incr;
    exptab[sinetabsize-i] = 1.0 - (float) (BQ*(exp(BP*x) - 1.0));
}
I had thought the R code for (i in (1:(stabsize-1))) was equivalent to the C code for (i=1; i< stabsize;i++) (i.e. the initial value of i is i = 1, the test is whether i < stabsize, and the increment is +1). But now I'm not so sure.
Any suggestions as to where I'm going wrong would be greatly appreciated!
As you say, array indexing in R starts at 1. In C it starts at zero. I reckon that's your problem. Can sinetabsize-i ever get to zero?
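A minimal sketch of one possible fix: shift the index up by one, so the R loop fills exptab[sinetabsize] down to exptab[1], the same table entries the 0-based C loop fills:

for (i in (1:(stabsize-1))){
  x = x + incr
  exptab[sinetabsize - i + 1] = 1 - (BQ * (exp(BP * x) - 1))
}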

Implementing the Izhikevich neuron model

I'm trying to implement the spiking neuron of the Izhikevich model. The formula for this type of neuron is really simple:
v[n+1] = 0.04*v[n]^2 + 5*v[n] + 140 - u[n] + I
u[n+1] = a*(b*v[n] - u[n])
where v is the membrane potential and u is a recovery variable.
If v gets above 30, it is reset to c and u is reset to u + d.
Given such a simple equation I wouldn't expect any problems. But while the graph should show a regular spike train, what I'm getting looks completely different (both plots omitted).
I'm completely at a loss as to what I'm doing wrong, precisely because there's so little to do wrong. I've looked for other implementations, but the code I'm looking for is always hidden in a dll somewhere. However, I'm pretty sure I'm doing exactly what the MATLAB code of the author (2) is doing. Here is my full R code:
v = -70
u = 0
a = 0.02
b = 0.2
c = -65
d = 6
history <- c()
for (i in 1:100) {
  if (v >= 30) {
    v = c
    u = u + d
  }
  v = 0.04*v^2 + 5*v + 140 - u + 0
  u = a*(b*v - u)
  history <- c(history, v)
}
plot(history, type = "l")
To anyone who's ever implemented an Izhikevich model, what am I missing?
Useful links:
(1) http://www.opensourcebrain.org/projects/izhikevichmodel/wiki
(2) http://www.izhikevich.org/publications/spikes.pdf
Answer
So it turns out I read the formula wrong. Apparently v' means new v = v + 0.04*v^2 + 5*v + 140 - u + I. My teachers would have written this as v' = 0.04*v^2 + 6*v + 140 - u + I. I'm very grateful for your help in pointing this out to me.
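In other words, a minimal sketch of the corrected update inside the original loop (treating the printed equations as a discrete update with an implicit step of 1 ms; the answer below uses an explicit dt instead):

v = v + (0.04*v^2 + 5*v + 140 - u + 0)  # the trailing 0 is the injection current I
u = u + a*(b*v - u)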
Take a look at the code that implements the Izhikevich model in R below. It produces plots for a Regular Spiking cell and for a Chattering cell (plots omitted).
And the R code:
# Simulation parameters
dt = 0.01       # ms
simtime = 500   # ms
t = 0

# Injection current
I = 15
delay = 100     # ms

# Model parameters (RS)
a = 0.02
b = 0.2
c = -65
d = 8

# Params for chattering cell (CH)
# c = -50
# d = 2

# Initial conditions
v = -80 # mV
u = 0

# Input current equation
current = function()
{
  if (t >= delay)
  {
    return(I)
  }
  return(0)
}

# Model state equations
deltaV = function()
{
  return(0.04*v*v + 5*v + 140 - u + current())
}

deltaU = function()
{
  return(a*(b*v - u))
}

updateState = function()
{
  v <<- v + deltaV()*dt
  u <<- u + deltaU()*dt
  if (v >= 30)
  {
    v <<- c
    u <<- u + d
  }
}

# Simulation code
runsim = function()
{
  steps = simtime / dt
  resultT = rep(NA, steps)
  resultV = rep(NA, steps)
  for (i in seq(steps))
  {
    updateState()
    t <<- dt*(i-1)
    resultT[i] = t
    resultV[i] = v
  }
  plot(resultT, resultV,
       type = "l", xlab = "Time (ms)", ylab = "Membrane Potential (mV)")
}

runsim()
Some notes:
I've picked the parameters for the "Regular Spiking (RS)" cell from Izhikevich's site. You can pick other parameters from the two upper-right plots on that page. Uncomment the CH parameters to get a plot for the "Chattering" type cell.
As commenters have suggested, the first two equations in the question are incorrectly implemented differential equations. The correct way to implement the first one would be something like: "v[n+1] = v[n] + (0.04*v[n]^2 + 5*v[n] + 140 - u[n] + I) * dt". See the code above for example. dt refers to the user specified time step integration variable and usually dt << 1 ms.
In the for loop in the question, the state variables u and v should be updated first, then the condition checked after.
As noted by others, a current source is needed for both of these cell types. I've used 15 (I believe these are pico amps) from this page on the author's site (bottom value for I in the screenshot). I've also implemented a delay for the current onset (100 ms parameter).
The simulation code should implement some kind of time tracking so it's easier to know when the spikes are occurring in resulting plot. The above code implements this, and runs the simulation for 500 ms.

Flat top Pulse Amplitude Modulation

I want to plot flat-topped PAM of a sinusoidal wave using MATLAB.
The sinusoidal signal has frequency = 10^4/(2*pi) Hz, sampling frequency = 8 kHz, and pulse duration T = 50 microseconds.
I wrote code for natural sampling, so how do I do flat-top?
clear all;
close all;
Fs = 1e9;
t = 0:1/Fs:(0.2e-2);
fc = 8000; %sampling frequency
fm = 10^4/(2*pi); %message frequency
a = 1;
vm = a.*sin(2*pi*fm*t); %message
pulseperiods = [0:10]*1/fc;
pulsewidth = 50e-6;
vc = pulstran(t,pulseperiods,@rectpuls,pulsewidth);
y = vc.*vm;
figure
subplot(3,1,1);
plot(t,vm); % plot message
xlabel('Time');
ylabel('Amplitude');
title('Message');
subplot(3,1,2);
plot(t,vc); % plot pulse
xlabel('Time');
ylabel('Amplitude');
title('Switching waveform');
subplot(3,1,3);
plot(t,y); % plot natural PAM
xlabel('Time');
ylabel('Amplitude');
title('Natural PAM');
Flat-top PAM means instantaneous sampling: the message signal is sampled only once per pulse period, so the modulated signal does not change its value until returning to zero for the next sampling period. The sampling takes place at the rising edge of the carrier signal, so the solution is quite straightforward: add the following for loop to your code:
for i = 2:length(t)
    if vc(i) == 1 && vc(i-1) == 0      % if the rising edge is detected
        y1(i) = vc(i) * vm(i);         % sampling occurs
    elseif vc(i) == 1 && vc(i-1) == 1  % and while the carrier signal is 1
        y1(i) = y1(i-1);               % the value of y1 remains constant
    else
        y1(i) = 0;                     % otherwise, y1 is zero
    end
end
plot(t,y1); % flat-top PAM plot
xlabel('Time');
ylabel('Amplitude');
title('PAM flat-top');
you get the flat-top PAM waveform (plot omitted).

Translating a MATLAB script to R

I've just been working through converting some MATLAB scripts to work in R; however, having never used MATLAB in my life, and not exactly being an expert in R, I'm having some trouble.
Edit: It's a script I was given, designed to correct temperature measurements for lag generated by insulation mass effects. My understanding is that it looks at the rate of change of the temperature and attempts to adjust for errors generated by the response time of the sensor. Unfortunately there is no literature available to me to give an indication of the numbers I am expecting from the function, and the only way to find out will be to test it experimentally at a later date.
the original script:
function [Tc, dT] = CTD_TempTimelagCorrection(T0,Tau,t)
N1 = Tau/t;
Tc = T0;
N = 3;
for j = ceil(N/2):numel(T0)-ceil(N/2)
    A = nan(N,1);
    % Compute weights
    for k = 1:N
        A(k) = (1/N) + N1 * ((12*k - (6*(N+1))) / (N*(N^2 - 1)));
    end
    A = A./sum(A);
    % Verify unity
    if sum(A) ~= 1
        disp('Error: Sum of weights is not unity');
    end
    Comp = nan(N,1);
    % Compute components
    for k = 1:N
        Comp(k) = A(k)*T0(j - (ceil(N/2)) + k);
    end
    Tc(j) = sum(Comp);
    dT = Tc - T0;
end
where I've managed to get to:
CTD_TempTimelagCorrection <- function(temp,Tau,t){
  ## Define which equation to use based on duration of lag and frequency
  ## With ESM2 profiler sampling at 2 Hz: N1 > tau/t = TRUE
  N1 = Tau/t
  Tc = temp
  N = 3
  for(i in ceiling(N/2):length(temp)-ceiling(N/2)){
    A = matrix(nrow=N,ncol=1)
    # Compute weights
    for(k in 1:N){
      A[k] = (1/N) + N1 * ((12*k - (6*(N+1))) / (N*(N^2 - 1)))
    }
    A = A/sum(A)
    # Verify unity
    if(sum(A) != 1){
      print("Error: Sum of weights is not unity")
    }
    Comp = matrix(nrow=N,ncol=1)
    # Compute components
    for(k in 1:N){
      Comp[k] = A[k]*temp[i - (ceiling(N/2)) + k]
    }
    Tc[i] = sum(Comp)
    dT = Tc - temp
  }
  return(dT)
}
I think the problem is the Comp[k] line; could someone point out what I've done wrong? I'm not sure I can select the elements of the array in that way.
By the way, Tau = 1, t = 0.5, and temp (or T0) will be a vector.
Thanks
Edit: Apparently my description is too brief in explaining my code samples; I'm not really sure what more I could write that would be relevant and not just waste people's time. Is this enough, Mr Filter?
The error is as follows:
Error in Comp[k] = A[k] * temp[i - (ceiling(N/2)) + k] :
replacement has length zero
In addition: Warning message:
In Comp[k] = A[k] * temp[i - (ceiling(N/2)) + k] :
number of items to replace is not a multiple of replacement length
If you write print(i - (ceiling(N/2)) + k) before that line, you will see that you are using incorrect indices for temp[i - (ceiling(N/2)) + k], which means that nothing is returned to be inserted into Comp[k]. I assume this problem is due to Matlab allowing the use of 0 as an index and not R, and the way negative indices are handled (they don't work the same in both languages). You need to implement a fix to return the correct indices.
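One concrete thing to check (an observation about R's operator precedence, not about your data): the sequence operator : binds more tightly than subtraction, so the loop header ceiling(N/2):length(temp)-ceiling(N/2) evaluates as (ceiling(N/2):length(temp)) - ceiling(N/2), which starts the loop at i = 0. That is exactly what produces the zero-length and wrong-length replacements in Comp[k]. Parenthesizing the upper bound restores the intended MATLAB range:

for (i in ceiling(N/2):(length(temp) - ceiling(N/2))){
  # ... loop body unchanged ...
}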
