I am trying to construct a for loop that solves for a list of variables in a data set.
However, once I run it, the code only seems to calculate the first variable.
for (t in 1:nTest){
m = DataTest$Item[t]
u = DataTest$User[t]
Sorted = sort(ItemSim[,m], decreasing = T,index.return=TRUE)
MostSim = Sorted$ix[1:N]
Wgts = ItemSim[MostSim,m]
DataTest$CosineItem = (t(Wgts) %*% UM.Item[MostSim,u])/sum(Wgts)
DataTest$CosineItem}
DataTest$CosineItem
Any help with this issue is greatly appreciated.
The assignment to DataTest$CosineItem replaces the entire column on every iteration, so each pass overwrites what the previous one stored. To update only the row for the current iteration, index the column with t:
for (t in 1:nTest){
m <- DataTest$Item[t]
u <- DataTest$User[t]
Sorted <- sort(ItemSim[,m], decreasing = T,index.return=TRUE)
MostSim <- Sorted$ix[1:N]
Wgts <- ItemSim[MostSim,m]
DataTest$CosineItem[t] <- (t(Wgts) %*% UM.Item[MostSim,u])/sum(Wgts)
}
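As a self-contained illustration of the difference (the data frame `df` and the squared values are made up for this sketch, not taken from the question), compare assigning to the whole column with assigning to row `t`:

```r
# Toy data frame; names and values are illustrative only
df <- data.frame(id = 1:3)

# Whole-column assignment: each iteration replaces every row
for (t in 1:3) {
  df$whole <- t^2
}

# Indexed assignment: pre-allocate, then fill one row per iteration
df$indexed <- NA_real_
for (t in 1:3) {
  df$indexed[t] <- t^2
}

print(df)
```

After the loops, `whole` holds the last computed value in every row, while `indexed` keeps one value per iteration.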
Hey everyone, I have a large matrix X with dimensions 645x7095. I want to replace sub-blocks of X with the values of another matrix that I have created. The R code is as follows:
install.packages("Matrix")
library(Matrix)
T = 215
n = 3
k = 33
X = matrix(0,T*n,T*k)
IN = diag(n)
K1 = Matrix(0, n*n, n*(n-1)/2, sparse = TRUE)
for(i in 1:(n-1)){
K1[(2+(i-1)*(n+1)):(i*n), (1+(i-1)*(n-i/2)):(i*(n-i)*(i+1)/2)] <- diag(n-i)
}
yin = matrix(rnorm(645), ncol = 3)
Xu = matrix(rnorm(2150), ncol = 10)
# Up to this point I have defined the variables and matrices used in the subsetting.
The code above runs fine; however, the code below throws an error:
#Loop for X subsetting
for(i in 1:T){
X[(((i-1)*n)+1):(i*n), (((i-1)*k)+1):(i*k)] <- cbind( (t(kronecker(yin[i,],IN))%*%K1) , (t(kronecker(Xu[i,],IN))))
}
# kronecker() computes the Kronecker tensor product of two matrices A and B. It is part of base R, so no extra package is needed.
When I run the code above, I get this error:
Error in X[(((i - 1) * n) + 1):(i * n), ] <- cbind((t(kronecker(yin[i, :
number of items to replace is not a multiple of replacement length
However, when I run the same command in MATLAB it works perfectly. MATLAB code:
X = zeros(T*n,T*k);
for i = 1:T
X((i-1)*n+1:i*n,(i-1)*k+1:i*k) = [kron(yin(i,:),IN)*K1, kron(Xu(i,:),IN)];
end
MATLAB fills in the values for the rows and columns defined in the loop. I have attached a snapshot of the desired output from MATLAB. R, however, throws an error for the same operation.
Can someone enlighten me as to where I am going wrong with the R code?
I appreciate the help, Many thanks.
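I think the problem is how the class 'dgeMatrix' (a Matrix-package object, not a base matrix) is handled in the assignment. Try converting the result to a base matrix first: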
for (i in 1:T) {
X[(((i-1)*n)+1):(i*n), (((i-1)*k)+1):(i*k)] <- as.matrix(cbind((t(kronecker(yin[i,],IN))%*%K1) , (t(kronecker(Xu[i,],IN)))))
}
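A minimal sketch of that conversion on a toy example (the names `S`, `B`, `prod_BS`, and `block` are made up here, and it assumes the Matrix package is installed). A product involving a sparse Matrix object is itself a Matrix object, and `as.matrix()` turns it into a base matrix that can be assigned into a block of `X`:

```r
library(Matrix)

X <- matrix(0, 4, 4)                              # base matrix to fill
S <- Matrix(c(1, 0, 3, 2), 2, 2, sparse = TRUE)   # sparse S4 Matrix object
B <- matrix(1:4, 2, 2)                            # ordinary base matrix

prod_BS <- B %*% S            # the product is a Matrix object, not a base matrix
block   <- as.matrix(prod_BS) # convert before sub-assignment
X[1:2, 1:2] <- block          # now a plain matrix-to-matrix assignment

print(X)
```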
I've run into a little problem simulating the throw of dice. Basically, I'm doing this to get familiar with loops and their output.
The intention is to simulate the throw of two dice as follows:
R = 100
d6 = c(1:6)
d = 60
DICE = NULL
for (i in 1:R)
{
i <- as.factor((sample(d6, size=d, replace = T)) + (sample(d6, size=d, replace = T)))
j <- summary(i)
DICE = rbind(DICE, j)
}
head(DICE)
HIS = colMeans(DICE)
boxplot(DICE)
title(main= "Result 2d6", ylab= "Throws", xlab="")
relHIS = (HIS / sum(HIS))*100
relHIS
Problems occur if the count in one category is 0 (that result did not occur in the sample). If this happens by chance in the first subsample, one or more of the categories (numbers 2-12) are missing. This causes problems ("number of columns of result is not a multiple of vector length (arg 2)") in the following subsamples.
I'm sure there is a really simple solution for this, by defining everything beforehand...
Thanks for your help!
Here are some fixes:
R = 100
d6 = c(1:6)
d = 60
DICE = matrix(nrow = R, ncol = 11) #pre-allocate
colnames(DICE) <- 2:12
for (i in 1:R)
{
sim <- ordered((sample(d6, size=d, replace = T)) + (sample(d6, size=d, replace = T)),
levels = 2:12) #define the factor levels
sumsim <- table(sim)
DICE[i,] <- sumsim #sub-assign
}
head(DICE)
HIS = colMeans(DICE)
boxplot(DICE)
title(main= "Result 2d6", ylab= "Throws", xlab="")
prop.table(HIS) * 100
Always pre-allocate your result data structure. Growing it in a loop is terribly slow and you know how big it needs to be. Also, don't use the same symbol for the iteration variable and something else.
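To see why fixing the factor levels matters, here is a small sketch with made-up rolls (the vector `rolls` is illustrative only). Without explicit levels, `table()` only reports values that occurred; with `levels = 2:12` it always returns 11 counts, including zeros, so every row of the pre-allocated matrix has the same length:

```r
rolls <- c(2, 2, 7, 12)   # toy sums of two dice; most outcomes are absent

free  <- table(rolls)                          # only observed values get an entry
fixed <- table(factor(rolls, levels = 2:12))   # all 11 outcomes, zeros included

print(free)
print(fixed)
```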
Omit as.factor() in the seventh line of your code.
I have code that successfully calculates VaR based on Extreme Value Theory using historical data. I'm trying to run this same code on multiple simulated price paths (i.e. calculating a VaR for each path) and then taking the median or average of those VaRs.
Every example I could find online had the simulation function return the price at the end of the period and then replicated the function X many times. That makes sense to me, except that I essentially need to calculate value-at-risk for each simulated path. Below is the code I have so far. I can say that the code works when using historical data (i.e. the "evt" function works fine and the data table is populated correctly when the lossOnly, u, and evtVar lines aren't in a function). However, I've been trying to implement simulation in the second function and trying various combinations, which have all failed.
library('RODBC')
library('nor1mix')
library('fExtremes')
library('QRM')
library('fGarch')
#function for computing the EVT VaR
evt <- function(data,u){
#fit excess returns to gpd to get estimates
gpdfit = tryCatch({
gpdfit <- gpdFit(data,u,type="mle")
}, warning = function(w) {
gpdfit <- gpdFit(data,u,type="mle",optfunc="nlminb")
return(gpdfit)
}, error = function(e) {
gpdfit <- gpdFit(data,u,type="pwm",optfunc="nlminb")
return(gpdfit)
}, finally = {})
#now calculate VaRs
xi <- gpdfit@fit$par.ests["xi"]
beta <- gpdfit@fit$par.ests["beta"]
Nu <- length(gpdfit@data$exceedances)
n <- length(data)
evtVar95 <- (u+((beta/xi)*(((n/Nu)*.05)^(-xi) - 1.)))*100
evtVar99 <- (u+((beta/xi)*(((n/Nu)*.01)^(-xi) - 1.)))*100
evtVar997 <- (u+((beta/xi)*(((n/Nu)*.003)^(-xi) - 1.)))*100
evtVar999 <- (u+((beta/xi)*(((n/Nu)*.001)^(-xi) - 1.)))*100
#return calculations
return(cbind(evtVar95,evtVar99,evtVar997,evtVar999,u,xi,beta,Nu,n))
}
#data <- read.table("pricedata.txt")
prices <- data$V1
returns <- diff(log(prices)) #or returns <- log(prices[-1]/prices[-n])
xi <- mean(returns)
std <- sd(returns)
N <- length(prices)
lstval <- prices[N]
options(scipen = 999)
p <- c(lstval, rep(NA, N-1))
gen.path <- function(){
N <- length(prices)
for(i in 2:N)
p[i] <- p[i-1] * exp(rnorm(1, xi, std))
# plot(p, type = "l", col = "brown", main = "Simulated Price")
#evt calculation
#first get only the losses and then make them absolute
lossOnly <- abs(p[p<0])
#get threshold
u <- quantile(lossOnly, probs = 0.9, names=FALSE)
evtVar <- evt(lossOnly,u)
return(evtVar)
}
runs <- 10
sim.evtVar <- replicate(runs, gen.path())
evtVar <- mean(sim.evtVar)
#add data to total table
VaR <- c(evtVar[1],evtVar[2],evtVar[3],evtVar[4],evtVar[5],evtVar[6],evtVar[7],evtVar[8],evtVar[9])
DF <- data.frame(VaR, row.names=c("evtVar95","evtVaR_99","evtVaR_997","evtVaR_999","u","xi","beta","Nu","n"))
In short, I'm trying to run the value-at-risk function (first function) within the Monte Carlo function (second function) and put the average simulated values into a data table. I know the first function works, but the second function is driving me crazy. These are the errors I'm getting:
> sim.evtVar <- replicate(runs, gen.path())
Error in if (xi > 0.5) { : missing value where TRUE/FALSE needed
Called from: .gpdpwmFit(x, u)
Browse[1]> evtVar <- mean(sim.evtVar)
Error during wrapup: object 'sim.evtVar' not found
Browse[1]>
> #add data to total table
> VaR <- c(evtVar[1],evtVar[2],evtVar[3],evtVar[4],evtVar[5],evtVar[6],evtVar[7],evtVar[8],evtVar[9])
Error: object 'evtVar' not found
> DF <- data.frame(VaR, row.names=c("evtVar95","evtVaR_99","evtVaR_997","evtVaR_999","u","xi","beta","Nu","n"))
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ""function"" to a data.frame
Any help you can provide is greatly appreciated! Thank you in advance!
I think the problem is this line:
lstval <- prices[N]
Because a stock price can never be negative, the simulated path p contains no negative values, so this line in your function produces an empty vector:
lossOnly <- abs(p[p<0])
You could try instead:
lstval <- min(returns)
if you want the most negative return in your dataset.
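The empty-vector point can be checked with a short simulation. This is only a sketch: the starting price, drift, volatility, and path length are invented, and taking losses from the path's log returns is one possible reading of the intent, not something the question confirms.

```r
set.seed(1)                 # arbitrary seed, for reproducibility
n_steps <- 250              # illustrative path length
p <- numeric(n_steps)
p[1] <- 100                 # made-up starting price
for (i in 2:n_steps) {
  # multiplying by exp(...) keeps every simulated price strictly positive
  p[i] <- p[i - 1] * exp(rnorm(1, mean = 0, sd = 0.01))
}

loss_from_prices <- abs(p[p < 0])          # empty: no price is negative
sim_returns <- diff(log(p))                # log returns of the simulated path
loss_from_returns <- abs(sim_returns[sim_returns < 0])

length(loss_from_prices)
length(loss_from_returns)
```

An empty `lossOnly` gives `quantile()` and `gpdFit()` nothing to work with, which is consistent with the missing-value error shown in the question.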
Thanks for any help in advance. I have a dataset with correlation values in a column called 'exit' and corresponding sample sizes (n) in a column called 'samplesize' in a data frame called 'dataset'.
My task is to create an R script to populate two full columns (CIleft and CIright) with the confidence interval outputs using the CIr function within the "psychometric" package for each row of data. This CIr function operates as follows, outputting the left and right confidence interval values:
CIr(r = .9, n = 100, level = .95)
[1] 0.8546667 0.9317133
Below is my unsuccessful script.
CI <- function(x)
{
require(psychometric)
library(psychometric)
r <- x["dataset$exit"];
n <- x["dataset$samplesize"];
results <- CIr(r, n, level = .95);
x["dataset$CIleft"] <- results[1];
x["dataset$CIright"] <- results[2];
}
One complication (which I believe may be relevant) is that test runs of "CI(x)" in the console produce the following errors:
// Error in CIz(z, n, level) : (list) object cannot be coerced to type 'double'
Then entering dataset2 <- as.matrix(dataset) and trying CI(x) again yields:
Error in dataset2$exit : $ operator is invalid for atomic vectors
And for
dataset3 <- lapply(dataset$exit, as.numeric)
dataset4 <- lapply(dataset$samplesize, as.numeric)
trying CI(x) again yields:
Error in 1 + x : non-numeric argument to binary operator //
Can anyone assist in helping me populate each row of my data frame with the appropriate output for CIleft and CIright, given that r = 'exit', and n = 'samplesize'?
I don't think you need a function.
library("psychometric")
# pre-allocate the two result columns
dataset$lwr <- NA_real_
dataset$upr <- NA_real_
for (row in seq_len(nrow(dataset))) {
  # CIr returns c(lower, upper), so call it once per row
  ci <- CIr(r = dataset$exit[row], n = dataset$samplesize[row], level = .95)
  dataset$lwr[row] <- ci[1]
  dataset$upr[row] <- ci[2]
}
I will note though that explicit for loops in R are often slower than vectorized code or the apply family, so they are commonly avoided on large data. Perhaps someone else can provide a solution using, e.g., apply. However, if you only have a small data frame, the speed cost of a for loop is unlikely to be noticeable.
Test Data:
set.seed(55); m = rnorm(26, 20, 40); dataset = data.frame( exit = seq(0, 1, 0.04), samplesize = abs(round(m)))
dataset$samplesize[dataset$samplesize == 0] = 5
dataset$exit[dataset$exit == 1] = 0.99
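For a loop-free variant, one option is to compute the interval directly in vectorized base R. The sketch below assumes the interval is the standard Fisher z one, which reproduces the CIr(r = .9, n = 100, level = .95) output quoted in the question. `fisher_ci` and the data frame `toy` are hypothetical names introduced here, not part of the psychometric package.

```r
# Hypothetical helper: Fisher z interval for a correlation, vectorized over r and n
fisher_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                      # Fisher z transform of r
  se   <- 1 / sqrt(n - 3)               # standard error of z (needs n > 3)
  crit <- qnorm(1 - (1 - level) / 2)    # two-sided normal critical value
  cbind(lwr = tanh(z - crit * se),      # back-transform both limits to r
        upr = tanh(z + crit * se))
}

# Toy data with the same column names as in the question
toy <- data.frame(exit = c(0.9, 0.5, 0.1), samplesize = c(100, 40, 25))
ci <- fisher_ci(toy$exit, toy$samplesize)
toy$lwr <- ci[, "lwr"]
toy$upr <- ci[, "upr"]

print(toy)
```

Because `atanh`, `tanh`, and `sqrt` all work element-wise, both columns are filled in one call without iterating over rows.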
Here is the formula which I am trying to calculate in R.
So far, this is my approach using a simplified example
t <- seq(1, 2, 0.1)
expk <- function(k){exp(-2*pi*1i*t*k)}
set.seed(123)
dat <- ts(rnorm(100), start = c(1994,3), frequency = 12)
arfit <- ar(dat, order = 4, aic = FALSE) # represent \phi in the formula
tmp1 <- numeric(4)
for (i in seq_along(arfit$ar)){
ek <- expk(i)
arphi <- arfit$ar[i]
tmp1[i] <- ek * arphi
}
tmp2 <- sum(tmp1)
denom = abs(1-tmp2)^2
s2 <- t/denom
Error : Warning message:
In tmp1[i] <- ek * arphi :
number of items to replace is not a multiple of replacement length
I was trying to avoid using a for loop and tried sapply, as in the solutions to this question.
denom2 <- abs(1- sapply(seq_along(arfit$ar), function(x)sum(arfit$ar[x]*expf(x))))^2
but it doesn't seem to be correct. The problem is how to sum the series (over index k) when it also takes values from another vector, in this case t, which is in the numerator.
Any solutions?
Any suggestions for a test dataset, maybe using 0 and 1, to check whether the calculation in this loop is done correctly?
Typing up the answer determined in chat. Here's a solution involving vapply.
First correct expk to:
expk <- function(k){sum(exp(-2*pi*1i*t*k))}
Then you can create this function and vapply it:
myFun <- function(i) return(expk(i) * arfit$ar[i])
tmp2 <- sum(vapply(seq_along(arfit$ar), myFun, complex(1)))
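As a quick sanity check of the vapply approach, the sketch below compares it against an explicit loop. The coefficients in `phi` are made up and stand in for `arfit$ar`, so nothing here depends on the fitted model; it only shows that the two ways of building the terms agree.

```r
t   <- seq(1, 2, 0.1)
phi <- c(0.5, -0.2, 0.1, 0.05)   # made-up coefficients in place of arfit$ar

# Same corrected expk as above: one complex number per k
expk <- function(k) sum(exp(-2 * pi * 1i * t * k))

# Explicit loop: pre-allocate a complex vector, one slot per coefficient
loop_terms <- complex(length(phi))
for (k in seq_along(phi)) {
  loop_terms[k] <- expk(k) * phi[k]
}

# vapply: the complex(1) template enforces one complex value per call
vapply_terms <- vapply(seq_along(phi), function(k) expk(k) * phi[k], complex(1))

all.equal(loop_terms, vapply_terms)
```

The `complex(1)` template is what guards against the original warning: if a call returned a vector of length `length(t)` instead of a single value, vapply would stop with an error rather than recycle silently.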