How to use an apply function instead of nested for loops in R?

I currently have a large matrix (3000x20) and want to compute a value using the values in the first row and first column and a vector. My dataset (in Excel) looks like this (I use the following VBA code to create the Excel file):
SumRow = 0
SumCol = 0
RowInterval = 0.001
ColInterval = 0.01
For i = 2 To 3001
    Cells(i, 1).Value = SumRow + RowInterval
    SumRow = Cells(i, 1).Value
Next i
For j = 2 To 21
    Cells(1, j).Value = SumCol + ColInterval
    SumCol = Cells(1, j).Value
Next j
I am currently using the following R code to do the calculation:
InputVector <- c(1,2,3,4,5,6,7,8,9,10)
Testing <- read.csv("InputFile.csv", header = FALSE)
for (m in 2:3001) {
  for (n in 2:21) {
    Sum = 0
    Row = Testing[m, 1]
    Col = Testing[1, n]
    for (p in 1:length(InputVector)) {
      Sum = Sum + sqrt((1 - Col) / Col) * exp(Row)
    }
    Testing[m, n] = Sum
  }
}
write.csv(Testing, "TestingOutput.csv")
Basically, it plugs a vector of x values into a formula f(x) and writes the sum of f(x) back to Excel, with the parameters varied along the first row and first column of the sheet.
The code above runs and gives the right output, but it takes a very long time. I am new to the apply family of functions; how can I use them to speed up the calculation and produce the same output as above?

Here is a three-line R solution to your problem, including generating the data:
library(reshape2)

# generate the combinations to iterate over
vInput = seq(1, 10)
dfSeq = expand.grid(rowSeq = seq(from = 0.001, by = 0.001, length.out = 3000),
                    colSeq = seq(from = 0.01, by = 0.01, length.out = 20))

# generate the values
dfSeq = cbind.data.frame(result = mapply(function(row, col) {
  length(vInput) * sqrt((1 - col) / col) * exp(row)
}, dfSeq$rowSeq, dfSeq$colSeq), dfSeq)

# cast them into the shape required
dfSeqWide = dcast(dfSeq, rowSeq ~ colSeq, value.var = "result")
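If you also need the result in the same CSV layout as the original script (row parameters down the first column, column parameters across the top), you can write the wide data frame out directly; a small usage sketch, reusing the output file name from the question:
write.csv(dfSeqWide, "TestingOutput.csv", row.names = FALSE)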

Related

Add argument in function with default value if omitted in R

I have a dataframe (data) in R and I have created a function that does the following:
If data contains zero values, then replace data with data + 2 and then return datanew <- data + data^2. If data does not contain zeros, then just do datanew <- data + data^2.
I managed to do that as follows:
set.seed(123)
data <- as.data.frame(matrix(rbinom(10 * 5, 1, 0.5), ncol = 5, nrow = 10))

Myfunction <- function(data) {
  if (any(data == 0, na.rm = TRUE)) {
    data <- data + 2   # 2 is the value 'a' that I want to add to all elements
  }
  datanew <- data + data^2
  print(datanew)
}

Myfunction(data = data)
However, I want to be able to pass the element a to the function (function(data, a)), so that if it is omitted the default value is 2, and otherwise the value given by the user is used. How can I do that in R?
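In R this is done by giving the argument a default value in the function definition; a minimal sketch based on the code above:
Myfunction <- function(data, a = 2) {
  if (any(data == 0, na.rm = TRUE)) {
    data <- data + a   # 'a' defaults to 2 when the caller omits it
  }
  datanew <- data + data^2
  datanew
}
Myfunction(data = data)         # uses the default a = 2
Myfunction(data = data, a = 5)  # uses the value supplied by the user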

Use for-loop and if function to create a new vector?

I want to do the following operation with the code: I want to draw a sample of n = 30 from a given normal distribution and calculate the mean of each sample. (Up to this step my function works without any problem.) After that I want to create a new vector of "Yes" or "No", depending on whether the mean lies in a certain range or not. Sadly, the code does not carry out this step: I always get a vector with 13 elements, but there should be 500. What is the problem? Where is my mistake?
o = 13
u = 7
d = c()
for (i in 1:500) {
  i = rnorm(30, mean = 10, sd = 6.04)
  i = mean(i)
  if (i <= o & i >= u) {
    d[i] = "Yes"
  } else {
    d[i] = "No"
  }
}
You should avoid changing the value of your iterator (i) within your loop. In your case, your i is becoming a non-integer value. When you try to index your d vector, it takes the integer portion of i.
Consider what happens when I have a vector
x <- 1:4
and I take the pi index of it.
x[pi]
# [1] 3
Your code should look more like this:
o = 13
u = 7
d = c()
for (i in 1:500) {
  sample_i = rnorm(30, mean = 10, sd = 6.04)
  mean_i = mean(sample_i)
  if (mean_i <= o & mean_i >= u) {
    d[i] = "Yes"
  } else {
    d[i] = "No"
  }
}
If you would like to improve your code some, here are some suggestions:
First, avoid "growing" your results. This has performance implications. It is better to decide how long your result (d) should be and set it to that length to begin with.
Next, try not to hard code the number of iterations into your loop. Get familiar with seq_along and seq_len and use them to count iterations for you.
o = 13
u = 7
d = character(500)        # I made a change here: pre-allocate the result
for (i in seq_along(d)) { # And I made a change here
  sample_i = rnorm(30, mean = 10, sd = 6.04)
  mean_i = mean(sample_i)
  if (mean_i <= o & mean_i >= u) {
    d[i] = "Yes"
  } else {
    d[i] = "No"
  }
}
Re-assigning i looks like a bad idea to me.
Are you sure you want to do this in a for loop? If not, a vectorised solution with crossing (from the tidyverse; nice explanations at varianceexplained.org) should work pretty nicely, I think.
library(tidyverse)

o = 13
u = 7
crossing(trial = 1:500,
         rounds = 1:30) %>%
  mutate(num = rnorm(n(), mean = 10, sd = 6.04)) %>%
  group_by(trial) %>%
  summarise(mean = mean(num)) %>%
  mutate(d = case_when(mean <= o & mean >= u ~ "Yes",
                       TRUE ~ "No")) %>%
  count(d)
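For comparison, the same simulation can be written in base R without an explicit loop; a sketch, assuming the thresholds o and u defined above:
means <- replicate(500, mean(rnorm(30, mean = 10, sd = 6.04)))
d <- ifelse(means <= o & means >= u, "Yes", "No")
table(d)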

Use of tail() in out-of-sample prediction

Below you see an out-of-sample rolling-window estimation I found here: (https://www.r-bloggers.com/2017/11/formal-ways-to-compare-forecasting-models-rolling-windows/)
Here is my question: I know the tail() function returns the last n rows of a dataset, but I don't understand its purpose when it is used in the random-walk line (f2 = tail(X_in$infl0, 1)) or when calculating the errors e1 and e2. Any help clarifying this would be much appreciated.
library(foreach)

# = Number of windows and window size = #
w_size = 300
n_windows = nrow(X) - 300

# = Rolling Window Loop = #
forecasts = foreach(i = 1:n_windows, .combine = rbind) %do% {
  # = Select data for the window (in and out-of-sample) = #
  X_in  = X[i:(w_size + i - 1), ]  # = change to X[1:(w_size + i - 1), ] for an expanding window
  X_out = X[w_size + i, ]
  # = Regression Model = #
  m1 = lm(infl0 ~ . - prodl0, data = X_in)
  f1 = predict(m1, X_out)
  # = Random Walk = #
  f2 = tail(X_in$infl0, 1)
  return(c(f1, f2))
}

# = Calculate and plot errors = #
e1 = tail(X[, "infl0"], nrow(forecasts)) - forecasts[, 1]
e2 = tail(X[, "infl0"], nrow(forecasts)) - forecasts[, 2]
Here the function tail is applied to a vector, because you select only the "infl0" column; in that case tail returns the last element(s) of the selected column.
df <- data.frame(A = c(1,2), B = c(3,4))
df[,"A"] # will return c(1,2)
tail(df[,"A"], 1) # will return 2
tail(df$B, 1) # will return 4
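In the random-walk line, tail(X_in$infl0, 1) simply takes the last in-sample value of infl0 in the current window, which is the naive (random-walk) forecast for the next period. In the error lines, the loop produces nrow(forecasts) one-step-ahead forecasts, so tail(X[, "infl0"], nrow(forecasts)) picks exactly the last nrow(forecasts) realised values, and each error is the realised value minus the forecast for that period. A toy sketch with made-up numbers:
infl0  <- c(1.0, 1.2, 0.9, 1.1, 1.3)                   # hypothetical series standing in for X[, "infl0"]
w_size <- 3                                            # window of 3 observations
f_rw   <- c(tail(infl0[1:3], 1), tail(infl0[2:4], 1))  # random-walk forecasts for periods 4 and 5
actual <- tail(infl0, length(f_rw))                    # the matching realised values (periods 4 and 5)
actual - f_rw                                          # forecast errors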

Turning code into a function

Just a quick question to anyone who might know:
I have the following code in R pricing an option:
X = 1.05
r = .85
n = 250
nsim = 2000
ctot = 0
for (i in 1:nsim) {  # beginning of loop
  u1 = rnorm(n, 0, 1)
  u2 = rnorm(n, 0, 1)
  x = u1
  y = r * u1 + sqrt(1 - r * r) * u2
  x = 0.25 / sqrt(250) * x + (0.03 - 0.5 * 0.25 * 0.25) / 250
  y = 0.25 / sqrt(250) * y + (0.03 - 0.5 * 0.25 * 0.25) / 250
  ShareA = 100 * cumprod(exp(x))
  ShareB = 100 * cumprod(exp(y))
  c = max(ShareA[n] - X * ShareB[n], 0)
  ctot = ctot + c
}  # end of loop
c = ctot / nsim
c = c * exp(-0.03)
c
My question is: how can I turn this into a function where I can change the correlation, r, to anything I like?
Hope this makes sense; essentially the issue is turning this code into a function.
Thanks, all suggestions appreciated.
To turn this call option into a function that accepts a correlation as an argument:
callOptionEval <- function(r = 0.85) {
  X = 1.05; n = 250; nsim = 2000; ctot = 0
  for (i in 1:nsim) {  # beginning of loop
    u1 = rnorm(n, 0, 1)
    u2 = rnorm(n, 0, 1)
    x = u1
    y = r * u1 + sqrt(1 - r * r) * u2
    x = 0.25 / sqrt(250) * x + (0.03 - 0.5 * 0.25 * 0.25) / 250
    y = 0.25 / sqrt(250) * y + (0.03 - 0.5 * 0.25 * 0.25) / 250
    ShareA = 100 * cumprod(exp(x))
    ShareB = 100 * cumprod(exp(y))
    c = max(ShareA[n] - X * ShareB[n], 0)
    ctot = ctot + c
  }  # end of loop
  c = ctot / nsim
  c = c * exp(-0.03)
  return(c)
}
callOptionEval(0.85)
callOptionEval(0.5)  # the exact values differ between runs because the payoffs are simulated
Note this code doesn't do the useful things that functions should, like checking that the input correlation lies in (-1, 1), etc. This is more of a convenience function for the current user.
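Once the pricing logic is wrapped in a function, it is easy to evaluate it over a grid of correlations, for example with sapply (a usage sketch; the prices change from run to run because the payoffs are simulated):
rho_grid <- c(0.2, 0.5, 0.85)
prices   <- sapply(rho_grid, callOptionEval)
data.frame(rho = rho_grid, price = prices)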
Alternatively, wrap the whole script in a function and pass every constant as an argument:
mycorr <- function(X, r, n, nsim, ctot) {
  # your remaining lines and loops here
}
Use it as
mycorr(X = 1.05, r = .85, n = 250, nsim = 2000, ctot = 0)

Implementation of SVM-RFE Algorithm in R

I'm using the R code for the implementation of the SVM-RFE algorithm from this source: http://www.uccor.edu.ar/paginas/seminarios/Software/SVM_RFE_R_implementation.pdf, but I made a small modification so that the R code uses the gnum library. The code is the following:
svmrfeFeatureRanking = function(x, y) {
  n = ncol(x)
  survivingFeaturesIndexes = seq(1:n)
  featureRankedList = vector(length = n)
  rankedFeatureIndex = n
  while (length(survivingFeaturesIndexes) > 0) {
    # train the support vector machine
    svmModel = SVM(x[, survivingFeaturesIndexes], y, C = 10, cache_size = 500, kernel = "linear")
    # compute the ranking criteria
    rankingCriteria = svmModel$w * svmModel$w
    # rank the features
    ranking = sort(rankingCriteria, index.return = TRUE)$ix
    # update the feature ranked list
    featureRankedList[rankedFeatureIndex] = survivingFeaturesIndexes[ranking[1]]
    rankedFeatureIndex = rankedFeatureIndex - 1
    # eliminate the feature with the smallest ranking criterion
    survivingFeaturesIndexes = survivingFeaturesIndexes[-ranking[1]]
  }
  return(featureRankedList)
}
That function receives a matrix as the input for x and a factor as the input for y. I use the function on some data, and I get the following error message in the last iterations:
Error in if (nrow(x) != length(y)) { : argument is of length zero
Debugging the code, I got this:
3 SVM.default(x[, survivingFeaturesIndexes], y, C = 10, cache_size = 500,
kernel = "linear")
2 SVM(x[, survivingFeaturesIndexes], y, C = 10, cache_size = 500,
kernel = "linear")
1 svmrfeFeatureRanking(sdatx, ym)
So, what is causing the error in the function?
Looks like your matrix gets dropped to a plain vector when only one feature remains, so SVM() no longer receives a matrix. Try this:
svmModel = SVM(as.matrix(x[, survivingFeaturesIndexes]), y, C = 10, cache_size = 500, kernel = "linear")
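An equivalent fix is to keep the matrix structure during subsetting with drop = FALSE, so the object never collapses to a vector in the first place (a sketch of the same call as above):
svmModel = SVM(x[, survivingFeaturesIndexes, drop = FALSE], y, C = 10, cache_size = 500, kernel = "linear")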
