I am trying to port Chan's pairs trading strategy with a dynamic hedge ratio to R.
I already have everything working with a static hedge ratio, but I am stuck trying to replicate his "for" loop. Here is my code for that part:
lookback=20
hedge=rep(NaN,length(stockY))
for (i in lookback:length(hedge)){
reg=lm(stockY[i-lookback +1:i]~stockX[i-lookback +1:i])
hedge[i]=reg[1]$coeff[2]
}
I have made many attempts, but my limited experience with R is pretty evident here. I don't want to use "lapply", just a for loop. I hope someone can help. Thanks.
OK, it seems I got it working. The following is my code:
lookback=20
hedge=data.frame(hedge=rep(0,length(stockY)))
for (i in lookback:length(stockY)){
  # parentheses are required: ":" binds tighter than "+" and "-"
  idx=(i-lookback+1):i
  reg=summary(lm(stockY[idx]~stockX[idx]))
  hedge[i,1]=reg$coefficients[2,1]
}
Now I would like to know how to extract my residuals. I need to collect the residuals from every regression. Unfortunately, reg$residuals only returns the 20 residuals from the last iteration of the loop. I tried adding another "res" vector like "hedge", but I can't get the residuals out. Can someone please help?
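One way to keep the residuals from every window is to preallocate a matrix with one row per iteration and fill it inside the loop, the same way "hedge" is filled. The sketch below is only an illustration: the simulated stockX and stockY series and the names res and last_res are assumptions, not part of the original code. Note that the window is written as (i - lookback + 1):i, because ":" binds tighter than "+" and "-" in R.

```r
set.seed(1)
n <- 100
stockX <- cumsum(rnorm(n)) + 50      # simulated prices, for illustration only
stockY <- 2 * stockX + rnorm(n)

lookback <- 20
hedge <- rep(NA_real_, n)                          # one slope per window
res <- matrix(NA_real_, nrow = n, ncol = lookback) # one row of residuals per window

for (i in lookback:n) {
  idx <- (i - lookback + 1):i        # exactly `lookback` observations
  reg <- lm(stockY[idx] ~ stockX[idx])
  hedge[i] <- coef(reg)[2]           # slope = hedge ratio for this window
  res[i, ] <- residuals(reg)         # all residuals of this window
}

# If only the most recent residual of each window is needed:
last_res <- res[, lookback]
```

Rows 1 to lookback - 1 stay NA because there is not yet a full window for them.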
I'm trying to figure out how to set up a for loop in R that runs over two or more parameters at once. Below is sample code that runs and fills a matrix with two values. In the second line of the for loop I have
R<-ARMA.var(length(x_global_sample),ar=c(tt[i],-.7))
What I would like to do is replace the -.7 with another tt[i], as in the example below, so that the for loop runs through the values starting at (-1,-1), then (-1,-.99), (-1,-.98),...,(1,.98),(1,.99),(1,1), with the result matrix populated by the output of Q and sigma.
R<-ARMA.var(length(x_global_sample),ar=c(tt[i],tt[i]))
or something similar to
R<-ARMA.var(length(x_global_sample),ar=c(tt[i],ss[i]))
It may well be that this is better handled by two for loops, but I'm not sure how to set that up. The first parameter should stay fixed while the code runs through the sequence of the second parameter; once that finishes, the first parameter should advance to its next value and stay fixed there while the second parameter runs through its sequence again.
I've posted some sample code below; the ARMA.var function comes from the ts.extend package. Any insight into this would be great.
Thank you
tt<-seq(-1,1,0.01)
Result<-matrix(NA, nrow=length(tt)*length(tt), ncol=2)
for (i in seq_along(tt)){
  R<-ARMA.var(length(x_global_sample),ar=c(tt[i],-.7))
  Q<-t((y-X%*%beta_est_d))%*%solve(R)%*%(y-X%*%beta_est_d)+
    lam*t(beta_est_d)%*%D%*%beta_est_d
  RSS<-sum((y-X%*%solve(t(X)%*%solve(R)%*%X+lam*D)%*%t(X)%*%solve(R)%*%y)^2)
  Denom<-n-sum(diag(X%*%solve(t(X)%*%solve(R)%*%X+lam*D)%*%t(X)%*%solve(R)))
  sigma<-RSS/Denom
  Result[i,1]<-Q
  Result[i,2]<-sigma
  rm(Q)
  rm(R)
  rm(sigma)
}
Edit: I realize that what I have posted above is quite unclear so to simplify things consider the following code,
x<-seq(1,20,1)
y<-seq(1,20,2)
Result<-matrix(NA, nrow=length(x)*length(y), ncol=2)
for(i in seq_along(x)){
z1<-x[i]+y[i]
z2<-z1+y[i]
Result[i,1]<-z1
Result[i,2]<-z2
}
The results table should then contain the following rows:
Row1: 1+1=2, 2+1=3
Row2: 1+3=4, 4+3=7
Row3: 1+5=6, 6+5=11
Row4: 1+7=8, 8+7=15
This pattern would continue with x staying fixed until the last value of y is reached; then x would move to 2 and cycle through all values of y again, and so on, until the last row is
RowN: 20+19=39, 39+19=58.
So I just want to know whether there is a way to do this in one loop, or whether it is easier to run it as two loops.
I hope this is clearer as to what my question was asking, and I realize this is not the optimal way to do this, however for now it is just for testing purposes to see how long my initial process takes so that it can be streamlined down the road.
Thank you
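For the simplified example, two nested loops with a separate row counter give exactly the rows listed above. This is a minimal sketch: the counter name row and the alternative matrix Result2 are assumptions added for illustration. The second part shows that expand.grid can build all (x, y) combinations up front, which allows a single pass with no explicit loop.

```r
x <- seq(1, 20, 1)
y <- seq(1, 20, 2)
Result <- matrix(NA, nrow = length(x) * length(y), ncol = 2)

row <- 1                             # next free row of Result
for (i in seq_along(x)) {            # outer loop: x stays fixed ...
  for (j in seq_along(y)) {          # ... while y runs through all its values
    z1 <- x[i] + y[j]
    z2 <- z1 + y[j]
    Result[row, ] <- c(z1, z2)
    row <- row + 1
  }
}

# Alternative without explicit loops: expand.grid varies its first
# argument fastest, so y cycles while x stays fixed, as above.
grid <- expand.grid(y = y, x = x)
Result2 <- cbind(grid$x + grid$y, grid$x + 2 * grid$y)
```

The same pattern should carry over to the ARMA.var case by using tt[i] and tt[j] (or ss[j]) inside the inner loop and writing Q and sigma to Result[row, ].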
I am stuck with a 'for' loop and would greatly appreciate some help.
I have a data frame called 'df' that contains the number of people per household (household_size), ranging from 0 (I replaced the missing values with 0) to 8, as well as the number of cars (car).
My aim is to write a quick code that computes the average number of cars depending on the household size.
I tried the following:
avg <- function(df){
  i <- df$household_size
  for (i in 0 : 8){
    print(mean(df$car))
  }
}
I'm pretty sure I'm missing something really basic here, but I don't know what.
Thanks everyone for your input.
I wouldn't have used a function for this. However, this is an exercise as part of an introductory coding with R module that specifically requires a for-loop.
Here is a solution that prints the mean for each household size using a for loop. Let me know if it works.
for(i in unique(df$household_size)){
  # the column name must be quoted when indexing a data frame
  print(paste(i,' : ',mean(df[df$household_size%in%i,"car"])))
}
As mentioned in a comment, I dropped the function because I don't see the point of having it. But if a function is mandatory, you can use lapply, which in my view behaves a bit like a for loop:
lapply(unique(df$household_size), function(i){
  return(paste(i,' : ',mean(df[df$household_size%in%i,"car"])))
}
)
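Since the exercise asks for both a function and a for loop, the two can also be combined. The following is only a sketch: the function name avg_cars_by_size and the toy data frame are made up for illustration, while the column names household_size and car are taken from the question.

```r
avg_cars_by_size <- function(df) {
  sizes <- sort(unique(df$household_size))
  means <- numeric(length(sizes))
  for (k in seq_along(sizes)) {
    # keep only the rows of the current household size
    means[k] <- mean(df$car[df$household_size == sizes[k]])
  }
  names(means) <- sizes
  means
}

# Toy data, for illustration only
toy <- data.frame(household_size = c(1, 1, 2, 2, 2, 4),
                  car            = c(0, 1, 1, 2, 3, 2))
result <- avg_cars_by_size(toy)
print(result)
```

The key difference from the attempt in the question is that the loop variable is actually used to subset df$car; mean(df$car) on its own gives the overall mean on every iteration.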
I have this code:
(%i3)depends([y,x],t)$
eqsp: [y=2*x,
v=diff(y,t,1)+y];
eliminate(eqsp,[y]);
(eqsp) [y=2*x,v='diff(y,t,1)+y]
(%o3) [-'diff(y,t,1)-2*x+v]
I was expecting that Maxima would perform a substitution of "y" in the second equation, and then differentiate to get "[-2*diff(x,t,1)-2*x+v]".
This is of course not the real problem (which has many more equations), it's just that I think I'm missing some concept here for Maxima to do what I want.
Thanks in advance for your comments. I'm new to Stack Overflow and to Maxima, so sorry if I made a mistake.
I am totally new to R and hope you can help. I am trying to simulate a Hawkes process in R. The main idea is that I first simulate some events from a homogeneous Poisson process. Each of these events then creates its own children through an inhomogeneous Poisson process. The code is below:
SimulateHawkesprocess<-function(n,tmax,lambda,lambda2){
  times<-Simulatehomogeneousprocess(n,lambda)
  count<-1
  while(count<n){
    newevent<-times[count] + Simulateinhomogeneousprocess(lambda2,tmax,lambdamax=NA)
    times<-c(times,newevent)
    count<-count+1
    n<-length(times)
  }
  return(times)
}
But the R code produces an infinite loop (probably because of the last line of the loop, n<-length(times)). How can I overcome this problem? How can I add a stopping condition?
This is not an R-specific problem. You need to get your algorithm working correctly first. Compare the code you have written against what you want it to do. If you need help with the algorithm, tag the question accordingly. Moreover, the call to Simulateinhomogeneousprocess looks inconsistent. Some insight into that function would help: what does it return, a single number or a vector?
Within the loop you increase the value of n by at least 1 on every iteration, so you never reach the end.
newevent<-times[count] + Simulateinhomogeneousprocess(lambda2,tmax,lambdamax=NA)
This creates a non-empty variable.
times<-c(times,newevent)
This increases the length of the "times" vector by at least 1 (since newevent is non-empty).
count<-count+1
n<-length(times)
You increase count by 1, but you also increase n by at least 1, which creates a never-ending loop. One of these two things has to change for the loop to stop.
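One way to get a stopping condition is to discard every child that falls after tmax, so that the vector of event times can only grow while there is still room in the time window. The sketch below is a guess at the intended algorithm, not a fix of the original helpers (which are not shown): it assumes an exponential kernel alpha * exp(-beta * u), for which each event has a Poisson(alpha / beta) number of children at Exp(beta) distributed delays. The names simulate_hawkes, mu, alpha, beta and max_events are all assumptions.

```r
simulate_hawkes <- function(tmax, mu, alpha, beta, max_events = 1e5) {
  stopifnot(alpha < beta)            # fewer than one child per event on average

  # Immigrants: homogeneous Poisson process with rate mu on [0, tmax]
  times <- runif(rpois(1, mu * tmax), 0, tmax)

  count <- 1
  # Visit every event exactly once, including children appended later.
  # max_events is a safety cap so the loop can never run away.
  while (count <= length(times) && length(times) < max_events) {
    n_children <- rpois(1, alpha / beta)
    children <- times[count] + rexp(n_children, rate = beta)
    times <- c(times, children[children <= tmax])  # drop children past tmax
    count <- count + 1
  }
  sort(times)
}

set.seed(1)
events <- simulate_hawkes(tmax = 100, mu = 0.5, alpha = 0.8, beta = 1.2)
length(events)
```

Here an event may produce zero children, and children beyond tmax are thrown away, so count eventually catches up with length(times) and the loop ends.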
I'm using Perl and R to analyze a large dataset of samples. For each pair of samples, I calculate the t-test p-value. Currently, I'm using the Statistics::R module to export values from Perl to R and then calling the t.test function. However, this process is extremely slow. I was wondering if someone knows a Perl function that does the same thing more efficiently.
Thanks!
The volume of data, the number of dataset pairs, and perhaps even the code you have written would probably help us identify why your code is slow. For instance, sending many small datasets to R would be slow, but can probably be sped up simply by sending all the data at once.
For a pure Perl solution, you first need to compute the test statistic (that is easy, and already done in Statistics::TTest, for instance), and then convert it to a p-value (you need something like R's pt function, but I am not sure it is readily available in Perl -- you could send the T-values to R, in one block, at the end, to convert them to p-values).
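On the R side, converting a whole block of t statistics to two-sided p-values is a single vectorized call to pt, the cumulative distribution function of the t distribution. A minimal sketch, with made-up t values and degrees of freedom standing in for whatever Perl would send over:

```r
# Hypothetical inputs: one t statistic and one degrees-of-freedom value per pair
t_values <- c(-2.5, 0.3, 1.96)
dfs      <- c(30, 30, 30)

# Two-sided p-value: probability of a |T| at least as large as observed
p_values <- 2 * pt(-abs(t_values), df = dfs)
print(p_values)
```

Because pt is vectorized, all pairs are handled in one round trip rather than one R call per pair.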
You can also try PDL, in particular PDL::Stats.
The Statistics::TTest module gives you a p-value.
use feature 'say';   # needed for say()
use Statistics::TTest;
my @r1 = map { rand(10) } 1..32;
my @r2 = map { rand(10)-2 } 1..32;
my $ttest = Statistics::TTest->new;
$ttest->load_data(\@r1,\@r2);
say "p-value = prob > |T| = ", $ttest->{t_prob};
Playing around a bit, I find that the p-values that this gives you are slightly lower than what you get from R. R is apparently doing something that reduces the degrees of freedom, but my knowledge of statistics is insufficient to explain what it's doing or why. (In the above example, the difference is about 1%. If you use samples of 320 floats instead of 32, then the difference is 50% or even more, but it's a difference between 1e-12 and 1.5e-12.) If you need precise p-values, you will want to take care.
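One plausible explanation for the reduced degrees of freedom, which I have not checked against the Statistics::TTest source: R's t.test defaults to Welch's test (var.equal = FALSE), which uses the Welch-Satterthwaite approximation, while the classic pooled-variance test uses n1 + n2 - 2. The sketch below compares the two in R on data generated like the Perl example; the variable names are assumptions.

```r
set.seed(42)
r1 <- runif(32, 0, 10)
r2 <- runif(32, 0, 10) - 2

welch  <- t.test(r1, r2)                    # R's default: Welch's test
pooled <- t.test(r1, r2, var.equal = TRUE)  # classic pooled-variance test

# Degrees of freedom: Welch's is at most n1 + n2 - 2 = 62
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))

# p-values: with equal sample sizes the t statistic is the same,
# so the smaller Welch df gives a p-value that is no smaller
c(welch = welch$p.value, pooled = pooled$p.value)
```

If the Perl module reports the pooled result, passing var.equal = TRUE on the R side should bring the two sets of p-values into line.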