Turning code into a function - r

Just a quick question for anyone who might know:
I have the following code in R pricing an option:
X = 1.05
r = 0.85
n = 250
nsim = 2000
ctot = 0
for (i in 1:nsim) { # beginning of loop
  u1 = rnorm(n, 0, 1)
  u2 = rnorm(n, 0, 1)
  x = u1
  y = r*u1 + sqrt(1 - r*r)*u2
  x = 0.25/sqrt(250)*x + (0.03 - 0.5*0.25*0.25)/250
  y = 0.25/sqrt(250)*y + (0.03 - 0.5*0.25*0.25)/250
  ShareA = 100*cumprod(exp(x))
  ShareB = 100*cumprod(exp(y))
  c = max(ShareA[n] - X*ShareB[n], 0)
  ctot = ctot + c
} # end of loop
c = ctot/nsim
c = c*exp(-0.03)
c
My question is: how can I turn this into a function where I can change the correlation, r, to anything I like?
Hope this makes sense... essentially the issue is turning this code into a function.
Thanks, all suggestions appreciated.

To turn this call option into a function that accepts a correlation as an argument:
callOptionEval <- function(r = 0.85) {
  X = 1.05; n = 250; nsim = 2000; ctot = 0
  for (i in 1:nsim) { # beginning of loop
    u1 = rnorm(n, 0, 1)
    u2 = rnorm(n, 0, 1)
    x = u1
    y = r*u1 + sqrt(1 - r*r)*u2
    x = 0.25/sqrt(250)*x + (0.03 - 0.5*0.25*0.25)/250
    y = 0.25/sqrt(250)*y + (0.03 - 0.5*0.25*0.25)/250
    ShareA = 100*cumprod(exp(x))
    ShareB = 100*cumprod(exp(y))
    c = max(ShareA[n] - X*ShareB[n], 0)
    ctot = ctot + c
  } # end of loop
  c = ctot/nsim
  c = c*exp(-0.03)
  return(c)
}
callOptionEval(0.85) # gives 0
callOptionEval(0.5)  # gives 12.45512
Note this code doesn't do the useful stuff that functions should, like checking that the input is in $(-1, 1)$, etc. It is more of a convenience function for the current user.
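If you did want that input check, a minimal sketch of a guarded wrapper (the stopifnot() guard is my addition, not part of the answer above):
callOptionEvalChecked <- function(r = 0.85) {
  stopifnot(is.numeric(r), length(r) == 1, r > -1, r < 1) # reject invalid correlations
  callOptionEval(r) # reuse the function defined above
}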

mycorr <- function(X, r, n, nsim, ctot) {
  # Your remaining lines and loops here
}
Use it as
mycorr(X = 1.05, r = .85, n = 250, nsim = 2000, ctot = 0)

Related

Use for-loop and if function to create a new vector?

I want to do the following operation with the code: I want to draw a sample of n = 30 from a given normal distribution and calculate the mean of each sample. (Up to this step my function works without any problem.) After that I want to create a new vector of "Yes" or "No", depending on whether the mean is in a certain range or not. Sadly, the code does not perform this step: I always get a vector with 13 elements, but there should be 500. What is the problem? Where is my mistake?
o = 13
u = 7
d = c()
for (i in 1:500) {
  i = rnorm(30, mean = 10, sd = 6.04)
  i = mean(i)
  if (i <= o & i >= u) {
    d[i] = "Yes"
  } else {
    d[i] = "No"
  }
}
You should avoid changing the value of your iterator (i) within your loop. In your case, i becomes a non-integer value, and when you index your d vector with it, R uses only the integer portion of i.
Consider what happens when I have a vector
x <- 1:4
and I take the pi index of it.
x[pi]
# [1] 3
Your code should look more like this:
o = 13
u = 7
d = c()
for (i in 1:500) {
  sample_i = rnorm(30, mean = 10, sd = 6.04)
  mean_i = mean(sample_i)
  if (mean_i <= o & mean_i >= u) {
    d[i] = "Yes"
  } else {
    d[i] = "No"
  }
}
If you would like to improve your code some, here are some suggestions:
First, avoid "growing" your results. This has performance implications. It is better to decide how long your result (d) should be and set it to that length to begin with.
Next, try not to hard code the number of iterations into your loop. Get familiar with seq_along and seq_len and use them to count iterations for you.
o = 13
u = 7
d = numeric(500) # I made a change here
for (i in seq_along(d)) { # And I made a change here
  sample_i = rnorm(30, mean = 10, sd = 6.04)
  mean_i = mean(sample_i)
  if (mean_i <= o & mean_i >= u) {
    d[i] = "Yes"
  } else {
    d[i] = "No"
  }
}
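For completeness: since each iteration is independent, the whole loop can also be vectorised away. A minimal base-R sketch (replicate() and ifelse() are my suggestion here, not part of the answer above):
means <- replicate(500, mean(rnorm(30, mean = 10, sd = 6.04)))
d <- ifelse(means <= o & means >= u, "Yes", "No")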
Re-assigning i looks like a bad idea to me.
Are you sure you want to do this in a for loop? If not, a vectorised solution with crossing (tidyverse; nice explanations at varianceexplained.org) should work pretty nicely, I think?
library(tidyverse) # for crossing(), mutate(), case_when(), etc.
o = 13
u = 7
crossing(trial = 1:500,
         rounds = 1:30) %>%
  mutate(num = rnorm(n(), mean = 10, sd = 6.04)) %>%
  group_by(trial) %>%
  summarise(mean = mean(num)) %>%
  mutate(d = case_when(mean <= o & mean >= u ~ "Yes",
                       TRUE ~ "No")) %>%
  count(d)

How to speed up for loop in R?

rm(list = ls())
library(gld)
library(GLDEX)
find_likelihood = function(x) {
  fit = fit.fkml(x, "ML", return.data = TRUE)
  params = fit$optim.results$par
  dens = dgl(x, c(params), param = "fmkl", inverse.eps = 1e-08, max.iterations = 500)
  dens[dens < 1e-12] = 1e-12
  value = -sum(log(dens))
  return(value)
}
loglik_com = NULL
min_value = NULL
for (j in 1:1000) {
  x = rgl(100, c(2, 1, 1, 1), param = "fkml")
  for (i in 2:(100 - 2)) {
    a = x[1:i]
    b = x[(i + 1):100]
    loglik_com[i] = find_likelihood(a) + find_likelihood(b)
  }
  min_value[j] = min(na.omit(loglik_com))
}
min_value
In my R code above, I have 1000 iterations, but it takes nearly 7 hours to get results. My question is: can I use the apply() function to speed this up? Or is there any other way to speed up my R function?
Thank you in advance.
You could try some sort of parallelisation; I see there is the parallel package? I have not had any personal experience with it, however.
https://www.rdocumentation.org/packages/parallel/versions/3.6.2
https://dept.stat.lsa.umich.edu/~jerrick/courses/stat701/notes/parallel.html
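A rough sketch of how that could look here (untested; it assumes the find_likelihood() function from the question and parallelises the outer loop over the 1000 replications):
library(parallel)
one_rep <- function(j) {
  # one replication: simulate, split at every cut point, return the minimum
  x <- rgl(100, c(2, 1, 1, 1), param = "fkml")
  loglik_com <- sapply(2:(100 - 2), function(i) {
    find_likelihood(x[1:i]) + find_likelihood(x[(i + 1):100])
  })
  min(loglik_com, na.rm = TRUE)
}
cl <- makeCluster(detectCores() - 1)
clusterEvalQ(cl, { library(gld); library(GLDEX) }) # load the packages on each worker
clusterExport(cl, "find_likelihood")               # ship the function to the workers
min_value <- parSapply(cl, 1:1000, one_rep)
stopCluster(cl)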

How to print which repeated dataset an R function is working on

Suppose I have repeatedly simulated data (100 times), and I would like to apply one function to each of these datasets. Since the simulation is repeated (sometimes 1000 times), I would like to know which dataset my code is working on at any given moment. That is, when my code starts on the first dataset, I would like it to tell me it is on the first one, then the same for the second, and so on. I know that I will get the number of my data in the console as a list; however, my actual function is much more complicated, and this is just a simple example to explain my problem.
This is my code:
N.a = 186; N.b = 38; N.ab = 13; N.o = 284
## 1) numerical optimization
llk = function(xpar) {
  tmp = exp(c(xpar, 0))
  pr = tmp/sum(tmp) ## A/B/O
  res1 = N.a*log(pr[1]^2 + 2*pr[1]*pr[3]) + N.b*log(pr[2]^2 + 2*pr[2]*pr[3])
  res2 = N.ab*log(2*pr[1]*pr[2]) + N.o*log(pr[3]^2)
  -res1 - res2
}
pr = rep(1/3, 3) ## A/B/O
it = 0; pdiff = 1
while ((it < 100) & (pdiff > 1e-5)) {
  tmp = c(pr[1]^2, 2*pr[1]*pr[3])
  tmp = tmp/sum(tmp)
  N.aa = N.a*tmp[1]
  N.ao = N.a*tmp[2]
  tmp = c(pr[2]^2, 2*pr[2]*pr[3])
  tmp = tmp/sum(tmp)
  N.bb = N.b*tmp[1]
  N.bo = N.b*tmp[2]
  pr1 = c(2*N.aa + N.ao + N.ab, 2*N.bb + N.bo + N.ab, N.ao + N.bo + 2*N.o)
  pr1 = pr1/sum(pr1)
  pdiff = mean(abs(pr1 - pr))
  it = it + 1
  pr = pr1
  cat(it, pr, "\n")
}
How can I use the cat function? For example, how do I use something like this in my code:
cat(paste0("data: ", i, "\n"))
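One minimal sketch of the idea (my illustration: mydata and run_em() are hypothetical placeholders for the 100 simulated datasets and the estimation loop above):
run_all <- function(mydata) {
  for (i in seq_along(mydata)) {
    cat(paste0("data: ", i, " of ", length(mydata), "\n")) # report which dataset is being processed
    run_em(mydata[[i]]) # hypothetical: run the EM loop above on dataset i
  }
}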

How to use apply function instead of nested for loop?

I currently have a large matrix (3000x20) and want to compute a value using the values in the first row and first column together with a vector. My dataset (in Excel) looks like this (I use VBA code to create the Excel file):
SumRow = 0
SumCol = 0
RowInterval = 0.001
ColInterval = 0.01
For i = 2 To 3001
    Cells(i, 1).Value = SumRow + RowInterval
    SumPD = Cells(i, 1).Value
Next i
For j = 2 To 21
    Cells(1, j).Value = SumCol + ColInterval
    SumRho = Cells(1, j).Value
Next j
I am currently using the following R code to do the calculation
InputVector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Testing <- read.csv("InputFile.csv", header = FALSE)
for (m in 2:3001) {
  for (n in 2:21) {
    Sum = 0
    Row = Testing[m, 1]
    Col = Testing[1, n]
    for (p in 1:length(InputVector)) {
      Sum = Sum + sqrt((1 - Col)/Col)*exp(Row)
    }
    Testing[m, n] = Sum
  }
}
write.csv(Testing, "TestingOutput.csv")
Basically it first puts a vector (x values) into a formula f(x), and I want to print the sum of f(x) to Excel with different parameters listed in the first row and first column.
I run the above code and it works, but it takes a very long time. I am new to the apply family of functions; how can I use them to speed up the calculation and produce the same output as above?
Here is a three-statement R solution to your problem, including generating the data. Note that because the summand never depends on p, the inner loop just adds the same term length(InputVector) times, so it collapses to a single multiplication:
library(reshape2)
# generate the combinations to iterate over
# (starting at the first interval to match the VBA output and avoid dividing by zero)
vInput = seq(1, 10)
dfSeq = expand.grid(rowSeq = seq(from = 0.001, by = 0.001, length.out = 3000),
                    colSeq = seq(from = 0.01, by = 0.01, length.out = 20))
# generate the values
dfSeq = cbind.data.frame(result = mapply(function(row, col) {
  length(vInput)*sqrt((1 - col)/col)*exp(row)
}, dfSeq$rowSeq, dfSeq$colSeq), dfSeq)
# cast them in the shape required
dfSeqWide = dcast(dfSeq, rowSeq ~ colSeq, value.var = "result")
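If you also want the CSV output from the original workflow, something like this should work (my addition, reusing the file name from the question):
write.csv(dfSeqWide, "TestingOutput.csv", row.names = FALSE)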

How to avoid for loop in this dataset?

For loops in R are extremely slow, but I know no alternative way to achieve the following.
What I want the output format to look like (shown as a screenshot in the original post):
gene_id tss_id x y
in which x = isosub$q1_FPKM / iso.agg$q1_FPKM and y = isosub$q2_FPKM / iso.agg$q2_FPKM, with numerator and denominator matched by the corresponding gene_id.
Here is my code with the for loop:
length = length(isosub$gene_id)
tmp = data.frame(isosub$gene_id, isosub$q1_FPKM, isosub$q2_FPKM)
j = 1
denominator_q1 = iso.agg$q1_FPKM[j]
denominator_q2 = iso.agg$q2_FPKM[j]
gene_id = isosub$gene_id
tmpq1 = tmp$isosub.q1_FPKM
tmpq2 = tmp$isosub.q2_FPKM
isoq1 = iso.agg$q1_FPKM
isoq2 = iso.agg$q2_FPKM
o2_q1 = rep(0, length)
o2_q2 = rep(0, length)
i = 0
for (i in 1:length) {
  if (gene_id[i+1] == gene_id[i]) {
    o2_q1[i] = tmpq1[i] / denominator_q1
    o2_q2[i] = tmpq2[i] / denominator_q2
  } else {
    o2_q1[i] = tmpq1[i] / denominator_q1
    o2_q2[i] = tmpq2[i] / denominator_q2
    j = j + 1
    denominator_q1 = isoq1[j]
    denominator_q2 = isoq2[j]
  }
}
When length = 1000, system.time shows:
   user  system elapsed
  55.74    0.00   56.45
And my actual length is even larger: 13751.
And my actual length is even larger: 13751.
Do you want to do a merge?
outdf <- merge(isosub[c("gene_id", "tss_id", "q1_FPKM", "q2_FPKM")],
               iso.agg[c("gene_id", "q1_FPKM", "q2_FPKM")],
               by = "gene_id",
               suffixes = c(".1", ".2"))
outdf$x <- outdf$q1_FPKM.1 / outdf$q1_FPKM.2
outdf$y <- outdf$q2_FPKM.1 / outdf$q2_FPKM.2
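An equivalent base-R sketch without the merge, assuming iso.agg has exactly one row per gene_id (my addition, not part of the answer above):
idx <- match(isosub$gene_id, iso.agg$gene_id) # row of iso.agg for each isoform's gene
isosub$x <- isosub$q1_FPKM / iso.agg$q1_FPKM[idx]
isosub$y <- isosub$q2_FPKM / iso.agg$q2_FPKM[idx]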
If you ended up here looking for ways to avoid or speed up loops, check out this answer:
Speed up the loop operation in R
It helped me with a similar problem I was having, and shows ways to keep necessary loops but increase performance dramatically.
