How to speed up a for loop in R?

rm(list=ls())
library(gld)
library(GLDEX)
find_likelihood = function(x){
fit = fit.fkml(x,"ML",return.data=TRUE)
params = fit$optim.results$par
dens = dgl(x, c(params), param = "fmkl", inverse.eps = 1e-08, max.iterations = 500)
dens[dens < 1e-12] = 1e-12
value = -sum(log(dens))
return(value)
}
loglik_com = NULL
min_value = NULL
for(j in 1: 1000){
x = rgl(100,c(2, 1, 1, 1), param="fkml")
for (i in 2:(100 - 2)) {
a = x[1:i]
b = x[(i + 1):100]
loglik_com[i] = find_likelihood(a) + find_likelihood(b)
}
min_value[j] = min(na.omit(loglik_com))
}
min_value
The code above runs 1000 outer iterations and takes nearly 7 hours to finish. Can I use the apply() function to speed it up, or is there any other way to make this R code faster?
Thank you in advance.

You could try some sort of parallelisation: the 1000 outer iterations do not depend on each other, so they can be spread over several cores with the parallel package. I have not had any personal experience with it, however.
https://www.rdocumentation.org/packages/parallel/versions/3.6.2
https://dept.stat.lsa.umich.edu/~jerrick/courses/stat701/notes/parallel.html
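A rough sketch of what that could look like with parallel::parLapply. To keep it runnable without gld/GLDEX, the likelihood here is a stand-in based on a normal fit (my assumption, not the asker's model), and the replicate count and cluster size are kept small:

```r
library(parallel)

# One replicate of the outer loop: simulate, scan every split point,
# and return the smallest combined negative log-likelihood.
one_replicate <- function(j, n = 100) {
  # Stand-in for find_likelihood() (assumption: a normal fit, so the
  # sketch runs without gld/GLDEX).
  neg_loglik <- function(x) {
    -sum(dnorm(x, mean = mean(x), sd = max(sd(x), 1e-6), log = TRUE))
  }
  x <- rnorm(n)
  splits <- 2:(n - 2)
  vals <- vapply(
    splits,
    function(i) neg_loglik(x[1:i]) + neg_loglik(x[(i + 1):n]),
    numeric(1)
  )
  min(vals)
}

cl <- makeCluster(2)            # two worker processes; adjust to your machine
clusterSetRNGStream(cl, 123)    # independent, reproducible random streams
min_value <- unlist(parLapply(cl, 1:20, one_replicate))
stopCluster(cl)
```

With the real find_likelihood() you would also have to make it available on the workers, for example with clusterExport(cl, "find_likelihood") and clusterEvalQ(cl, { library(gld); library(GLDEX) }). The speed-up is at best proportional to the number of cores, so this shortens the run but does not make each fit cheaper.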

Related

R: C stack error when using recursive function - optimizing the code

I am trying to make a simple "model" of loss of an initial value based on a number of criteria, as seen in my code below:
mtDNAlen = 16299
copies = 10
mito = 50
fraction = copies*mtDNAlen*0.75
Rounds = 40
A = data.frame(
Length = c(rep(mtDNAlen,copies))
)
data = list()
for (i in 1:mito){
data[[i]]=list()
}
for (i in 1:length(data)){
for (j in 1:(Rounds+1)){
data[[i]][[j]]=data.frame(A)
}}
finaldata = data.frame(X = seq(1,copies,1))
random.sample = function(x) {
x = sample(sample.data, copies,
prob=Prob,
replace=FALSE)
if (sum(x) > fraction) return(x)
Recall(x)
}
for (i in 1:length(data)){
for(j in 1:Rounds){
data[[i]][[j]]$Deletion = sample(c("Yes","No"), nrow(data[[i]][[j]]), prob=c(0.05,0.95), replace=TRUE)
data[[i]][[j]]$DelLength = ifelse(data[[i]][[j]]$Deletion == "Yes", sample(seq(0,15000,1), replace = TRUE),0)
data[[i]][[j]]$Length = data[[i]][[j]]$Length - data[[i]][[j]]$DelLength
data[[i]][[j]] = data[[i]][[j]][data[[i]][[j]]$Length > 2000,]
data[[i]][[j]] = rbind(data[[i]][[j]],data[[i]][[j]])
Prob = c(rep(16300,nrow(data[[i]][[j]]))) - data[[i]][[j]]$Length
Prob = Prob / sum(Prob)
sample.data = c(data[[i]][[j]]$Length)
data[[i]][[j+1]]$Length = random.sample(sample.data)
finaldata[[i]] = data[[i]][[j+1]]$Length
}
}
However, when I try to run my code I get the following error:
Error: C stack usage 7969700 is too close to the limit
After searching, this seems to occur when using recursive functions. However, I am new to these types of functions, and I don't know how to optimise my code further in order to get rid of this error.
I should mention, that when I run my code without prob=Prob in the random.sample function, there is no error. So I guess the error comes from having to redo the random.sample so many times? Can I do this in a better way to avoid the C stack error? I haven't been able to find an alternative myself.
Finally, if I set Rounds and mito to very small values, I can complete the calculation, but it is not really usable for me...
Thanks!
EDIT
I tried to switch to using repeat instead, making my for loop look like this:
for (i in 1:length(data)){
for(j in 1:Rounds){
data[[i]][[j]]$Deletion = sample(c("Yes","No"), nrow(data[[i]][[j]]), prob=c(0.05,0.95), replace=TRUE)
data[[i]][[j]]$DelLength = ifelse(data[[i]][[j]]$Deletion == "Yes", sample(seq(0,15000,1), replace = TRUE),0)
data[[i]][[j]]$Length = data[[i]][[j]]$Length - data[[i]][[j]]$DelLength
data[[i]][[j]] = data[[i]][[j]][data[[i]][[j]]$Length > 2000,]
data[[i]][[j]] = rbind(data[[i]][[j]],data[[i]][[j]])
Prob = c(rep(16300,nrow(data[[i]][[j]]))) - data[[i]][[j]]$Length
Prob = Prob / sum(Prob)
sample.data = c(data[[i]][[j]]$Length)
repeat {
v2 <- sample(sample.data, copies,
prob=Prob,
replace=FALSE)
if( sum(v2) > fraction )
break
}
return(v2)
data[[i]][[j+1]]$Length = v2
finaldata[[i]] = data[[i]][[j+1]]$Length
}
}
However, now I can't get the sampled data to go to the next dataframe, ie. the line data[[i]][[j+1]]$Length = v2 seems to not be working. I can see v2 is getting generated and it looks to have the appropriate form and data stored...
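One likely cause: return() is only valid inside a function. Called at top level inside the for loop, it stops with "no function to return from, jumping to top level", so the assignment on the next line is never reached. A minimal sketch of the same rejection sampling wrapped in a function (sample_until, max_tries and the toy lengths are made up for illustration), which also avoids the deep Recall() recursion behind the C stack error:

```r
# Hypothetical helper: redraw until the sampled values sum above `threshold`.
# max_tries guards against looping forever when the condition is very unlikely.
sample_until <- function(values, size, prob, threshold, max_tries = 10000) {
  for (attempt in seq_len(max_tries)) {
    draw <- sample(values, size, prob = prob, replace = FALSE)
    if (sum(draw) > threshold) return(draw)  # return() is fine here: we are inside a function
  }
  stop("no sample exceeded the threshold within max_tries draws")
}

set.seed(1)
lengths <- rep(c(16299, 9000), each = 10)  # toy data, not the real simulation state
prob <- 16300 - lengths                    # same weighting idea as in the question
prob <- prob / sum(prob)
v2 <- sample_until(lengths, size = 10, prob = prob, threshold = 10 * 16299 * 0.5)
```

Inside the original loop this would be used as data[[i]][[j+1]]$Length = sample_until(sample.data, copies, Prob, fraction). If the weighting makes sum(x) > fraction very rare, the loop will still need many draws; the iteration cap at least turns that into a clear error instead of a crash.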

Turning code into a function

Just a quick question for anyone who might know.
I have the following R code for pricing an option:
X = 1.05
r = .85
n = 250
nsim = 2000
ctot = 0
for( i in 1:nsim){ # beginning of loop
u1=rnorm(n,0,1)
u2=rnorm(n,0,1)
x=u1
y=r*u1+sqrt(1-r*r)*u2
x=0.25/sqrt(250)*x + (0.03-0.5*0.25*0.25)/250; y= 0.25/sqrt(250)*y + (0.03-0.5*0.25*0.25)/250
ShareA = 100*cumprod(exp(x))
ShareB = 100*cumprod(exp(y))
c = max(ShareA[n]-X*ShareB[n],0)
ctot=ctot+c
} # end of loop
c=ctot/nsim
c=c*exp(-0.03)
c
My question is: how can I turn this into a function so that I can change the correlation, r, to anything I like?
I hope this makes sense. Essentially the issue is turning this code into a function.
Thanks, all suggestions appreciated.
To turn this call option into a function that accepts a correlation as an argument:
callOptionEval <- function(r = 0.85){
X = 1.05; n = 250; nsim = 2000; ctot = 0
for( i in 1:nsim){ # beginning of loop
u1 = rnorm(n,0,1)
u2 = rnorm(n,0,1)
x = u1
y = r*u1 + sqrt(1-r*r)*u2 # y is correlated with x by r
x = 0.25/sqrt(250)*x + (0.03-0.5*0.25*0.25)/250
y = 0.25/sqrt(250)*y + (0.03-0.5*0.25*0.25)/250
ShareA = 100*cumprod(exp(x))
ShareB = 100*cumprod(exp(y))
c = max(ShareA[n]-X*ShareB[n],0) # payoff of this simulated path
ctot = ctot + c
} # end of loop
c = ctot/nsim # average payoff over all paths (must be outside the loop comment)
c = c*exp(-0.03) # discount
return(c)
}
callOptionEval(0.85) # Monte Carlo estimate; varies from run to run
callOptionEval(0.5)
Note that this code doesn't do the checks a more robust function should, such as verifying that the input is between -1 and 1. It is more of a convenience function for the current user.
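For what it's worth, here is a sketch of a version that does check the input and drops the explicit loop. Only the terminal prices are used, so the cumulative product can be replaced by a column sum of the log returns. The name call_option_eval, the extra arguments and the named constants sigma and rate are my own; I am also assuming the hard-coded 250 in the original is the number of steps n.

```r
# Sketch only: vectorised Monte Carlo estimate with a basic input check.
call_option_eval <- function(r = 0.85, X = 1.05, n = 250, nsim = 2000) {
  stopifnot(is.numeric(r), length(r) == 1, r >= -1, r <= 1)
  sigma <- 0.25                            # volatility used in the original code
  rate  <- 0.03                            # interest rate used in the original code
  drift <- (rate - 0.5 * sigma^2) / n
  scale <- sigma / sqrt(n)
  # One column per simulated path, one row per time step.
  u1 <- matrix(rnorm(n * nsim), nrow = n)
  u2 <- matrix(rnorm(n * nsim), nrow = n)
  x <- scale * u1 + drift
  y <- scale * (r * u1 + sqrt(1 - r^2) * u2) + drift
  share_a <- 100 * exp(colSums(x))         # terminal price of share A on each path
  share_b <- 100 * exp(colSums(y))         # terminal price of share B on each path
  exp(-rate) * mean(pmax(share_a - X * share_b, 0))
}

set.seed(42)
price <- call_option_eval(0.5)
```

The estimate still varies from run to run unless a seed is set, and an out-of-range correlation such as call_option_eval(2) now stops with an error instead of silently producing NaN.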
mycorr <- function(X, r, n, nsim, ctot) {
# your remaining lines and loops here
}
Use it as
mycorr(X = 1.05, r = .85, n = 250, nsim = 2000, ctot = 0)

Implementation of SVM-RFE Algorithm in R

I'm using the R implementation of the SVM-RFE algorithm from this source http://www.uccor.edu.ar/paginas/seminarios/Software/SVM_RFE_R_implementation.pdf but I made a small modification so that the R code uses the gnum library. The code is the following:
svmrfeFeatureRanking = function(x,y){
n = ncol(x)
survivingFeaturesIndexes = seq(1:n)
featureRankedList = vector(length=n)
rankedFeatureIndex = n
while(length(survivingFeaturesIndexes)>0){
#train the support vector machine
svmModel = SVM(x[, survivingFeaturesIndexes], y, C = 10, cache_size=500,kernel="linear" )
#compute ranking criteria
rankingCriteria = svmModel$w * svmModel$w
#rank the features
ranking = sort(rankingCriteria, index.return = TRUE)$ix
#update feature ranked list
featureRankedList[rankedFeatureIndex] = survivingFeaturesIndexes[ranking[1]]
rankedFeatureIndex = rankedFeatureIndex - 1
#eliminate the feature with smallest ranking criterion
(survivingFeaturesIndexes = survivingFeaturesIndexes[-ranking[1]])
}
return (featureRankedList)
}
The function receives a matrix as input for x and a factor as input for y. When I run it on some data, I get the following error message in the last iterations:
Error in if (nrow(x) != length(y)) { : argument is of length zero
Debugging the code, I got this:
3 SVM.default(x[, survivingFeaturesIndexes], y, C = 10, cache_size = 500,
kernel = "linear")
2 SVM(x[, survivingFeaturesIndexes], y, C = 10, cache_size = 500,
kernel = "linear")
1 svmrfeFeatureRanking(sdatx, ym)
So, what is wrong with the function?
Looks like your matrix gets reduced to a plain vector when only one feature remains, so nrow(x) returns NULL and the comparison in the if statement has length zero. Try this:
svmModel = SVM(as.matrix(x[, survivingFeaturesIndexes]), y, C = 10, cache_size=500,kernel="linear" )
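To see the effect in isolation, here is a small illustration on a made-up 3 x 2 matrix. It also shows drop = FALSE, which is an alternative to as.matrix() that keeps the matrix shape at the subsetting step itself:

```r
x <- matrix(1:6, nrow = 3)

one_col <- x[, 1]                # dimensions are dropped: this is now a plain vector
is.null(nrow(one_col))           # TRUE, which is what breaks nrow(x) != length(y)

wrapped <- as.matrix(one_col)    # the fix from the answer: back to a 3 x 1 matrix
kept <- x[, 1, drop = FALSE]     # alternative: never drop the dimensions
dim(kept)                        # 3 1
```

In the ranking function that would read x[, survivingFeaturesIndexes, drop = FALSE].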

Returning a Vector in R

Beginning R programmer here. I'm trying to write a function whose argument is the (user-defined) number of samples and whose output is a vector of the means of those samples.
Here is what I have so far; however, I only get one mean value returned. How do I alter the function so that it returns a vector of means whose length is the number the user inputs?
Pop1 <- rnorm(500, mean = 0.5, sd = 0.2)
My_Func <- function(Samples) {
A <- sample(Pop1, size = 25, replace = TRUE)
for (i in 1:Samples) {
Means <- mean(A)
}
return(Means)
}
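With a for loop it can be done like this. As @MrFlick mentioned, avoid assigning to the same variable on every iteration: store each mean in its own element of a vector, and draw the sample inside the loop so that every iteration uses a new one.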
Pop1 <- rnorm(500, mean = 0.5, sd = 0.2)
My_Func <- function(Samples) {
Means = numeric(Samples)
for (i in 1:Samples) {
A <- sample(Pop1, size = 25, replace = TRUE)
Means[i] <- mean(A)
}
return(Means)
}
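If you do not need the explicit loop, replicate() is a common shorthand for the same thing. A small sketch (my_func and the size argument are my own naming):

```r
Pop1 <- rnorm(500, mean = 0.5, sd = 0.2)

# Draw `samples` samples of `size` values each and return the vector of their means.
my_func <- function(samples, size = 25) {
  replicate(samples, mean(sample(Pop1, size = size, replace = TRUE)))
}

means <- my_func(10)
length(means)
```

replicate() evaluates its expression afresh on every repetition, so each mean comes from a new sample, just as in the loop version.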

How to avoid for loop in this dataset?

For loops in R can be very slow, but I don't know an alternative way to achieve the following.
As shown in this screenshot:
What I want the output format to look like:
> gene_id tss_id x y
where, for the corresponding gene_id,
x = isosub$q1_FPKM / iso.agg$q1_FPKM
y = isosub$q2_FPKM / iso.agg$q2_FPKM
Here is my code with the for loop:
length = length(isosub$gene_id)
tmp = data.frame(isosub$gene_id, isosub$q1_FPKM, isosub$q2_FPKM)
j = 1
denominator_q1 = iso.agg$q1_FPKM[j]
denominator_q2 = iso.agg$q2_FPKM[j]
gene_id = isosub$gene_id
tmpq1 = tmp$isosub.q1_FPKM
tmpq2 = tmp$isosub.q2_FPKM
isoq1 = iso.agg$q1_FPKM
isoq2 = iso.agg$q2_FPKM
o2_q1 = rep(0, length)
o2_q2 = rep(0, length)
i = 0
for (i in 1:length){
if (gene_id[i+1] == gene_id[i]){
o2_q1[i] = tmpq1[i] / denominator_q1
o2_q2[i] = tmpq2[i] / denominator_q2
}else{
o2_q1[i] = tmpq1[i] / denominator_q1
o2_q2[i] = tmpq2[i] / denominator_q2
j = j + 1
denominator_q1 = isoq1[j]
denominator_q2 = isoq2[j]
}
}
when length = 1000, system.time shows that:
> user system elapsed
> 55.74 0.00 56.45
And my actual length is even larger: 13751.
Do you want to do a merge?
outdf <- merge(isosub[c("gene_id", "tss_id", "q1_FPKM", "q2_FPKM")],
iso.agg[c("gene_id", "q1_FPKM", "q2_FPKM")],
by="gene_id",
suffixes=c(".1", ".2"))
outdf$x <- outdf$q1_FPKM.1 / outdf$q1_FPKM.2
outdf$y <- outdf$q2_FPKM.1 / outdf$q2_FPKM.2
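If the row order of isosub has to be preserved, match() is another loop-free option: it looks up, for every row of isosub, the row of iso.agg with the same gene_id. A sketch on made-up toy data, assuming each gene_id appears exactly once in iso.agg:

```r
# Hypothetical toy data standing in for the real isosub / iso.agg tables.
isosub <- data.frame(
  gene_id = c("g1", "g1", "g2"),
  tss_id  = c("t1", "t2", "t3"),
  q1_FPKM = c(2, 6, 5),
  q2_FPKM = c(1, 3, 10)
)
iso.agg <- data.frame(
  gene_id = c("g1", "g2"),
  q1_FPKM = c(8, 5),
  q2_FPKM = c(4, 10)
)

# Row of iso.agg that belongs to each row of isosub.
idx <- match(isosub$gene_id, iso.agg$gene_id)

out <- data.frame(
  gene_id = isosub$gene_id,
  tss_id  = isosub$tss_id,
  x = isosub$q1_FPKM / iso.agg$q1_FPKM[idx],
  y = isosub$q2_FPKM / iso.agg$q2_FPKM[idx]
)
```

Unlike the original loop, neither this nor merge() relies on the rows being sorted by gene_id.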
If you ended up here looking for ways to avoid or speed up loops, check out this answer:
Speed up the loop operation in R
It helped me with a similar problem I was having, and shows ways to keep necessary loops but increase performance dramatically.
