I am trying to use trapezoidal integration to integrate the survival function (1 - F(x)) of a mixture of two lognormal distributions, using the code below:
library(pracma)  # for trapzfun

for (i in 1:nrow(mydf)) {
  mydf$Ex[i] <- trapzfun(function(x) {
    (1 - pnorm((log(x) - mydf$mu1[i]) / mydf$sd1[i])) * mydf$pmix1[i] +
      (1 - pnorm((log(x) - mydf$mu2[i]) / mydf$sd2[i])) * mydf$pmix2[i]
  }, a = 0, b = 1)$value  # keep only the numeric value from trapzfun's result list
}
mu1 and sd1 (and likewise mu2 and sd2) are my mean and standard deviation on the log scale; pmix1 and pmix2 are the mixing proportions.
While this runs correctly, it takes about 3-5 hours. I suspect for loops are not a good approach here, and I am pretty new to R. I did try to use an apply function:
mixture <- function(x) {
  trapzfun(function(x) {
    (1 - plnorm(x, mydf$mu1, mydf$sd1)) * mydf$pmix1 +
      (1 - plnorm(x, mydf$mu2, mydf$sd2)) * mydf$pmix2
  }, a = 0, b = 1)
}
[Note that plnorm(x, mu, sd) is equivalent to pnorm((log(x) - mu) / sd), i.e., the transformation I applied above.]
mydf$Ex_1<- apply(mydf,1,mixture)
It is returning list(value = c(0.055257000747731, 0.055257000747731, 0.00.....
Can you please help me with this problem?
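For reference, a minimal sketch of one possible fix, under two assumptions: the goal is one integral per row, and trapzfun comes from the pracma package. The mixture function takes the row's parameters explicitly (so it no longer reads whole columns from mydf), and only the numeric value is extracted from trapzfun's result list, which avoids the list(value = ...) output:

library(pracma)  # assumed source of trapzfun

# One integral per row: the parameters are scalars, not whole columns.
mixture <- function(mu1, sd1, pmix1, mu2, sd2, pmix2) {
  trapzfun(function(x) {
    (1 - plnorm(x, mu1, sd1)) * pmix1 +
      (1 - plnorm(x, mu2, sd2)) * pmix2
  }, a = 0, b = 1)$value
}

# mapply() walks over the columns in parallel, one call per row.
mydf$Ex_1 <- mapply(mixture,
                    mydf$mu1, mydf$sd1, mydf$pmix1,
                    mydf$mu2, mydf$sd2, mydf$pmix2)

This will not by itself be much faster than the for loop (the integration dominates the cost), but it returns a plain numeric vector rather than a list.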
I am trying to convert Chan's code for a pairs trading strategy with a dynamic hedge ratio into R. I have already done all the work with a static hedge ratio, but when I try to replicate his for loop I run into trouble. Here is my code for that part:
lookback = 20
hedge = rep(NaN, length(stockY))
for (i in lookback:length(hedge)) {
  reg = lm(stockY[i-lookback +1:i] ~ stockX[i-lookback +1:i])
  hedge[i] = reg[1]$coeff[2]
}
I have made many attempts, but my low skill level in R is pretty evident here. I am not trying to use lapply, just a for loop. I hope someone can help me. Thanks.
OK, it seems I did it. The following is my code:
lookback <- 20
hedge <- data.frame(hedge = rep(0, length(stockY)))
for (i in lookback:length(stockY)) {
  # note the parentheses in (i - lookback + 1):i -- ':' binds tighter than '+',
  # so i-lookback +1:i would index past the end of the vectors
  reg <- summary(lm(stockY[(i - lookback + 1):i] ~ stockX[(i - lookback + 1):i]))
  hedge[i, 1] <- reg$coefficients[2, 1]
}
So now I would like to extract my residuals: I need to collect the residuals from every regression. Unfortunately, if I write reg$residuals, it returns only the 20 residuals from the last iteration of the loop. I tried to add another vector res alongside hedge, but I still can't extract the residuals. Please, can someone help me?
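A minimal sketch of one approach, assuming each window's residuals should be kept separately (res is a hypothetical container introduced here, not part of the original code): store them in a list indexed by iteration, so nothing is overwritten.

# res holds one entry per loop iteration, each with that window's residuals.
res <- vector("list", length(stockY))
for (i in lookback:length(stockY)) {
  fit <- lm(stockY[(i - lookback + 1):i] ~ stockX[(i - lookback + 1):i])
  hedge[i, 1] <- coef(fit)[2]   # slope, same value as before
  res[[i]] <- residuals(fit)    # this window's residuals
}
# e.g. res[[25]] holds the residuals from the window ending at i = 25

The reason reg$residuals shows only the last 20 values is that reg is reassigned on every pass, so only the final iteration's fit survives the loop.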
I have spectral data that I am trying to run PCA on. To learn how to do this, I created a matrix with two distinct groups and then pulled in a file that has typical wavelengths. I am pre-processing the data and have run into issues with baseline correction. I can do it with a for loop, but I want to know if I can do it with apply instead. I get different errors depending on what I try, and I don't know if it is even possible. My most recent error is:
Error in matrix(0, np[1], np[2]) : non-numeric matrix extent.
I get this immediately after:
playdata.baseline <- apply(playdata, 1, baseline, lambda = 1, hwi = 20, it = 30, int = 800, method = 'fillPeaks')
Can I use apply for the baseline function? If so, why is it not working, when the rest of the code works fine with a for loop for baseline?
Here is what works:
#baseline corrections
library(baseline)  # for baseline()

playdata.baseline <- matrix(0, ncol = nrow(playdata), nrow = ncol(playdata))
playdata.bc <- c()
playdata <- t(playdata)
for (n in 1:nrow(playdata)) {
  playdata.bc <- baseline(playdata[n, , drop = FALSE],
                          lambda = 1, hwi = 20, it = 30,
                          int = 800, method = 'fillPeaks')
  playdata.baseline[n, ] <- playdata.bc@corrected  # the 'corrected' slot of the S4 result
}
playdata <- playdata.baseline
playdata <- t(playdata)
Keeping everything the same until # baseline corrections, the apply attempt is as follows:
#baseline corrections
playdata <- t(playdata)
playdata.baseline <- apply(playdata, 1, baseline, lambda = 1, hwi = 20,
                           it = 30, int = 800, method = 'fillPeaks')
playdata <- t(playdata)
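For what it's worth, a hedged guess at why the apply version fails, with a sketch of a workaround: apply() hands each row to baseline() as a plain numeric vector, while baseline() expects a matrix, so its dim()-based setup sees NULL dimensions (consistent with the "non-numeric matrix extent" error). Wrapping the call so each row stays a one-row matrix, and returning only the corrected spectrum, avoids both problems:

# Wrap each row back into a one-row matrix before calling baseline(),
# and return only the corrected spectrum as a plain numeric vector.
playdata.baseline <- apply(playdata, 1, function(row) {
  bc <- baseline(matrix(row, nrow = 1),
                 lambda = 1, hwi = 20, it = 30,
                 int = 800, method = 'fillPeaks')
  as.vector(bc@corrected)
})
# apply() binds each result as a column, so playdata.baseline already
# has the original (pre-transpose) orientation; no extra t() is needed.
playdata <- playdata.baseline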
I am using the integrate function (in R) to compute integrals numerically. I have a univariate function with one extra argument, f(x, a), like this (just for example purposes):
test = function(x,a) 1/sqrt(2*pi)*exp(-(x-a)^2/2)
I want to define a new univariate function of a, obtained by integrating the function above over x:
testa = function(a) integrate(test,0,Inf,a=a)$value #this works
Now my question is: is it possible to use the integrate function on testa? For example:
integrate(testa,0,1) # not working
I tried, and it does not work (I get the error message "evaluation of function gave a result of wrong length"). I already know that one can apply a multivariate integration procedure directly to test (for example, the adaptIntegrate function from the cubature package), but that is not my purpose!
So does anyone know how to apply the integrate function successively, as in the example above? Or can you confirm that this is not permitted in R?
Thanks in advance.
integrate needs a vectorized function: it evaluates its integrand on a whole vector of points at once, whereas testa returns a single value regardless of the length of its input (hence the "wrong length" error). You can use Vectorize:
integrate(Vectorize(testa),0,1)
#0.6843731905 with absolute error < 0.00000000000022
Disclaimer: I haven't checked the result for correctness.
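Equivalently, you can vectorize by hand with sapply, which is essentially what Vectorize does for a one-argument function like this (a sketch, same result as above):

# Evaluate testa at each point of the input vector separately.
testa_vec <- function(a) sapply(a, testa)
integrate(testa_vec, 0, 1)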
I have multiple performance objects created using ROCR. Each of these contains AUC or FPR/TPR values for a class, and each in turn holds results for multiple test runs. So,
length(first.perf.obj@y.values)
gives something > 1.
I can plot average for a single class using
plot(first.perf.obj, avg="vertical")
as described in the ROCR manual. I want to combine these objects to calculate and plot their global average. Something like
global.perf.obj <- combine.perf.objects(first.perf.obj, second.perf.obj, third.perf.obj)
Is there an easy way to do this, or should I decompose each object and calculate values by hand?
I went back and recreated the prediction objects for the global case.
I'm calling the prediction function like
global.prediction <- prediction(c(cls1.likelihood,
cls2.likelihood,
cls3.likelihood,
cls4.likelihood,
cls5.likelihood),
c(duplicate.cols(cls1.labels, ncol(cls1.likelihood)),
duplicate.cols(cls2.labels, ncol(cls2.likelihood)),
duplicate.cols(cls3.labels, ncol(cls3.likelihood)),
duplicate.cols(cls4.labels, ncol(cls4.likelihood)),
duplicate.cols(cls5.labels, ncol(cls5.likelihood))),
label.ordering=c(FALSE, TRUE))
where duplicate.cols simply builds a data.frame of repeating labels.
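For concreteness, a hypothetical reconstruction of that helper, assuming it only needs to repeat the label vector once per likelihood column (the actual duplicate.cols may differ):

# Repeat the label vector n times, one copy per likelihood column.
duplicate.cols <- function(labels, n) {
  as.data.frame(replicate(n, labels))
}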
Then I can get any statistic for the global case with, e.g., performance(global.prediction, "auc").
It's a bit slow, but I think it's simpler than trying to combine values from multiple performance objects.
I'm using Perl and R to analyze a large dataset of samples. For each pair of samples, I calculate the t-test p-value. Currently, I'm using the Statistics::R module to export values from Perl to R and then call the t.test function. However, this process is extremely slow. Does anyone know of a Perl function that does the same thing more efficiently?
Thanks!
The volume of data, the number of dataset pairs, and perhaps even the code you have written would help us identify why your code is slow. For instance, sending many small datasets to R is slow, but this can probably be sped up simply by sending all the data at once.
For a pure Perl solution, you first need to compute the test statistic (that is easy, and already done in
Statistics::TTest,
for instance), and then to convert it to a p-value. For the conversion you need the Student t CDF, i.e., something like R's pt function (qt is its inverse, the quantile function), and I am not sure a robust implementation is readily available in Perl -- you could instead send the T-values to R in one block, at the end, to convert them to p-values.
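A sketch of that final conversion step on the R side, assuming you have collected vectors of t statistics and degrees of freedom for all pairs (t.stats and dfs are hypothetical names):

# Two-sided p-values for all pairs at once; pt is vectorized,
# so one call handles the whole batch.
p.values <- 2 * pt(-abs(t.stats), df = dfs)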
You can also try PDL, in particular PDL::Stats.
The Statistics::TTest module gives you a p-value.
use v5.10;  # for say
use Statistics::TTest;

my @r1 = map { rand(10) } 1..32;
my @r2 = map { rand(10) - 2 } 1..32;

my $ttest = Statistics::TTest->new;
$ttest->load_data(\@r1, \@r2);
say "p-value = prob > |T| = ", $ttest->{t_prob};
Playing around a bit, I find that the p-values this gives are slightly lower than what you get from R. R is apparently doing something that reduces the degrees of freedom (likely the Welch correction for unequal variances, which R's t.test applies by default), but my knowledge of statistics is insufficient to say more. (In the example above, the difference is about 1%. With samples of 320 floats instead of 32, the difference is 50% or more, but it is a difference between 1e-12 and 1.5e-12.) If you need precise p-values, you will want to take care.
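If you want to check which convention the Perl module follows, comparing against both R variants is straightforward (a quick check, assuming r1 and r2 hold the same data on the R side):

# Welch (R's default) vs. pooled-variance Student t-test
t.test(r1, r2)$p.value                    # unequal variances, adjusted df
t.test(r1, r2, var.equal = TRUE)$p.value  # classic pooled test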