Calculating Value at Risk with the PerformanceAnalytics package - R

I am trying to calculate the Value at Risk for a list of stock returns. There are 1,000 observations, and I want to calculate the VaR for the following rolling windows of observations:
1 to 500
2 to 501
3 to 502
4 to 503
...
500 to 999

As you can see, the result would be 500 calculations.
To solve the problem I tried to use an if condition with a for loop, like this:

if (x < 501 & y < 1000) {
  for (i in KO.Returns) {
    VaR(KO.Returns[x:y], p = 0.95, method = "historical")
  }
}
If I use the code above, I get the following error:

VaR calculation produces unreliable result (inverse risk) for column 1

I think the problem is in your data. When you specify your window, the historical VaR calculation sorts the returns in that window and picks out the tail quantile (the 5th percentile of returns when p = 0.95). Sometimes a window will not have a negative value at that quantile, and then the historical VaR is meaningless: a loss cannot be a positive value. Hence the error.
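To see the failure mode concretely, here is a toy illustration (invented data, not from the question):

set.seed(1)
x <- abs(rnorm(500)) / 100 # a 500-observation window with no negative returns
quantile(x, probs = 0.05)  # the 5th percentile is positive, so the "historical VaR"
                           # would be a positive loss, which is impossible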
I have been trying to reproduce similar errors using the following code:

library(PerformanceAnalytics)
data("edhec")
data = edhec[, 5]
valat = rollapply(data = data, width = 20,
                  FUN = function(x) VaR(x, p = 0.95, method = "historical"),
                  by.column = TRUE)
valat
But when I change the confidence level to p = 0.99, the error goes away. So maybe you can try changing your confidence level and see.
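Applied to the setup in the question, the same rolling approach would look something like the sketch below (assuming KO.Returns is an xts/zoo series of 1,000 returns, as the question suggests):

library(PerformanceAnalytics) # also loads xts/zoo

# 500 windows: observations 1:500, 2:501, ..., 500:999
rolling.VaR <- rollapply(KO.Returns[1:999], width = 500,
                         FUN = function(x) VaR(x, p = 0.95, method = "historical"),
                         by.column = TRUE, align = "right")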

Related

In Rjags/runjags, what causes the "node inconsistent with parents" error when using dinterval?

I have racked my brain trying to come up with a solution to this problem and I'm at wit's end! First, the necessary context: aquatic plants in lakes are sampled with rakes. You throw a rake out into the lake, you pull it back into your boat, and you figure out what plants are on its tines. In our case, we measure both presence/absence and "abundance," but in an ordinal/interval-censored way: the score is 0 if species X isn't noticed on the rake at all, 1 if it covers < 25% of the rake's tines, 2 if it covers between 25% and 75%, and 3 if it covers > 75%. However, it's fairly easy to miss a species entirely when it's in low abundance, so the 0s are sketchy: they may not represent true absences, and that is really the issue our model is trying to explore.
So, there are really three layers here--a true, fully latent abundance that we don't observe directly at all, a partially latent "true presence/absence" in that we know where true presences are but not where true absences are, and then we have our observed presence/absence data. What's more interesting is that we think some environmental variables may affect both true abundance and true occurrence but differently, and then other variables may affect detectability, and it's those processes we're trying to tease apart.
So, anyhow, my actual model is much larger and more complicated than what I've pasted below, but here is a sort of functional (but probably academically meritless) training version of it that replicates the error I am getting.
#data setup
library(rjags)   # for jags.model() below
library(runjags) # for run.jags() below

N = 1500 #Number of cases
obs = sample(c(0, 1, 2, 3), N,
             replace = T, prob = c(0.7, 0.2, 0.075, 0.025)) #Our observed, interval-censored data.
X1 = rnorm(N) #Some covariate that probably affects both occurrence and abundance, but maybe in different ways.
abundances = rep(NA, times = N) #Abundance is a latent variable we don't directly observe.
#From elsewhere, I know the values here need to be NAs so the model will know to impute them.
occur = rep(1, times = N) #Occurrence is a degraded form of our abundance data.
#d will be the initials for the abundance data, since this is apparently needed to jumpstart the imputation.
d = vector()
for (o in 1:N) {
  if (obs[o] == 0) { d[o] = 0.025; occur[o] = 0 }
  if (obs[o] == 1) { d[o] = 0.15 }
  if (obs[o] == 2) { d[o] = 0.5 }
  if (obs[o] == 3) { d[o] = 0.875 }
}

#Data
test.data = list("N" = N,
                 "obs" = obs,
                 "X1" = X1,
                 "abund" = abundances,
                 "lim" = c(0.05, 0.25, 0.75, 0.9999),
                 "occur" = occur)

#Inits
inits = list(abund = d)
cat("model
{
for (i in 1:N) {
obs[i] ~ dinterval(abund[i], lim)
abund[i] ~ dbeta(theta[i], rho[i]) T(0.0001, 0.9999)
theta[i] <- mu[i] * epsilon
rho[i] <- epsilon * (1-mu[i])
logit(mu[i]) <- alpha1 + X.beta1 * X1[i]
occur[i] ~ dbern(phi[i])
logit(phi[i]) <- alpha2 + X.beta2 * X1[i]
}
#Priors
epsilon ~ dnorm(5, 0.1) T(0.01, 10)
alpha1 ~ dnorm(0, 0.01)
X.beta1 ~ dnorm(0, 0.01)
alpha2 ~ dnorm(0, 0.01)
X.beta2 ~ dnorm(0, 0.01)
}
", file = "training.txt")
test.run = jags.model(file = "training.txt", inits = inits, data=test.data, n.chains = 3)
params = c("epsilon",
"alpha1",
"alpha2",
"X.beta1",
"X.beta2")
run1 = run.jags("training.txt", data = test.data, n.chains=3, burnin = 1000, sample = 5000, adapt = 4000, thin = 2,
monitor = c(params), method="parallel", modules = 'glm')
At the end, I get this error, and I always get it any time I try to do something even remotely like this:

Graph information: Observed stochastic nodes: 3000 Unobserved stochastic nodes: 1505 Total graph size: 19519
Reading parameter file inits1.txt. Initializing model
Error in node obs1
Node inconsistent with parents
I've read every posting that covers this error that I can find. From my research and testing, I can surmise that the error is probably occurring for one of the following reasons:
1. My initials for the latent abundance variable are not adequate somehow. It sounds like this requires pretty useful initial values to work.
2. One or more of my priors is allowing values that are not permissible, OR they are too broad and that's causing problems somehow. This might be especially an issue because of the beta distribution I am using, which has strong requirements about not having values outside of 0 and 1.
3. I am using the dinterval() function incorrectly, which seems likely because it is always the line containing it that trips the error.
4. My model is somehow mis-specified.
But I can't see where I might be going wrong. I have tried a number of different options for reasons 1 and 2, and so far as I can tell from the documentation (see pages 55-56), I am using dinterval correctly. What am I missing?
In case it's relevant: from what I have gathered, the idea of dinterval() is that the variable on the left of the ~ is the interval-censored version of the variable given in the first argument (here, abundance). The second argument (here, lim) is a vector of "breakpoints" that dictates which interval each abundance value ends up in. So here, you end up with an observed abundance code of 0 if you are below the lowest lim value (0.05), 1 if you are between the first two values in lim, and so on. It's as though the abundance variable is pushed through a "binning sieve" created by the lim variable to produce a binned output: our observed abundance codes.
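You can reproduce this binning outside of JAGS; base R's findInterval() does the analogous thing (purely illustrative, not part of the model):

lim <- c(0.05, 0.25, 0.75, 0.9999)
findInterval(c(0.02, 0.15, 0.50, 0.875), lim)
# [1] 0 1 2 3 -- each latent abundance lands in the bin matching its observed code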
Any guidance would be most welcome!!
I have run your example with JAGS 4.3.0 and rjags 4-12. For me, the version with rjags runs correctly. The version with runjags does not work because you have not provided initial values. This is easily fixed by adding the argument

inits = list(inits, inits, inits)

to the call to run.jags().
You have correctly understood the purpose of dinterval. It is an "observable function" that imposes constraints on its parameters via a likelihood. When using dinterval you must always provide initial values that satisfy the constraints from the first iteration. As far as I can see, your initial values do satisfy the constraints, and this is verified by the fact that I can run your example (with initial values).
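For concreteness, a sketch of the fixed call (identical to the run.jags() call above, with the per-chain initial values added):

library(runjags)
run1 <- run.jags("training.txt", data = test.data, n.chains = 3,
                 inits = list(inits, inits, inits), # one inits list per chain
                 burnin = 1000, sample = 5000, adapt = 4000, thin = 2,
                 monitor = params, method = "parallel", modules = "glm")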

R function logLik() returning -Inf?

I am simulating an SIR model in R and have a data set I am trying to fit accurately with the model. I am currently using the particle filter function and would then like to use the corresponding logLik method on the result. When I do this, I get [1] -Inf. I can't find in the documentation why this happens or how I can avoid it. Are my parameters for the model not accurate enough? Is there something else wrong?
My code looks like this:

SIRsim %>%
  pfilter(Np = 5000) -> pf
logLik(pf)
This comes from an online course lesson entitled "Likelihood for POMPs" (https://kingaa.github.io/sbied/pfilter/); below is the R script for the lesson. However, the code works there, and I'm not sure how to reproduce my specific problem with it. Unfortunately I cannot share the dataset or code I am using, because it is for academic research.
library(tidyverse)
library(pomp)
options(stringsAsFactors = FALSE)
stopifnot(packageVersion("pomp") >= "3.0")
set.seed(1350254336)

sir_step <- Csnippet("
  double dN_SI = rbinom(S, 1 - exp(-Beta * I / N * dt));
  double dN_IR = rbinom(I, 1 - exp(-mu_IR * dt));
  S -= dN_SI;
  I += dN_SI - dN_IR;
  R += dN_IR;
  H += dN_IR;
")

sir_init <- Csnippet("
  S = nearbyint(eta * N);
  I = 1;
  R = nearbyint((1 - eta) * N);
  H = 0;
")

dmeas <- Csnippet("
  lik = dbinom(reports, H, rho, give_log);
")

rmeas <- Csnippet("
  reports = rbinom(H, rho);
")

read_csv("https://kingaa.github.io/sbied/pfilter/Measles_Consett_1948.csv") %>%
  select(week, reports = cases) %>%
  filter(week <= 42) %>%
  pomp(
    times = "week", t0 = 0,
    rprocess = euler(sir_step, delta.t = 1/7),
    rinit = sir_init,
    rmeasure = rmeas,
    dmeasure = dmeas,
    accumvars = "H",
    statenames = c("S", "I", "R", "H"),
    paramnames = c("Beta", "mu_IR", "eta", "rho", "N"),
    params = c(Beta = 15, mu_IR = 0.5, rho = 0.5, eta = 0.06, N = 38000)
  ) -> measSIR

measSIR %>%
  pfilter(Np = 5000) -> pf
logLik(pf)

library(doParallel)
library(doRNG)
registerDoParallel()
registerDoRNG(652643293)
foreach (i = 1:10, .combine = c) %dopar% {
  measSIR %>% pfilter(Np = 5000)
} -> pf
logLik(pf) -> ll
logmeanexp(ll, se = TRUE)
library(doParallel)
library(doRNG)
registerDoParallel()
registerDoRNG(652643293)
foreach (i=1:10, .combine=c) %dopar% {
measSIR %>% pfilter(Np=5000)
} -> pf
logLik(pf) -> ll
logmeanexp(ll,se=TRUE)
If I set Beta=100 in the code above I can get a negative-infinite log-likelihood.
Replacing the measurement-error snippet with this:
dmeas <- Csnippet("
double ll = dbinom(reports,H,rho,give_log);
lik = (!isfinite(ll) ? -1000 : ll );
")
appears to 'solve' the problem, although you should be a little bit careful; papering over numerical cracks like this is sometimes OK, but could conceivably come back to bite you in some way later on. If you just need to avoid non-finite values long enough to get into a reasonable parameter range this might be OK ...
Some guesses as to why this is happening:

- You are somehow getting an "impossible" situation, like a positive number of reported cases when the underlying true number of infections is zero.
- Sometimes non-finite log-likelihoods occur when a very small positive probability underflows to zero. The equivalent here is likely that the probability of infection 1-exp(-Beta*I/N*dt) goes to 1.0; then any observed outcome where less than 100% of the population is infected is impossible.
You can try to diagnose the situation by seeing what the filtered trajectory actually looks like and comparing it with the data, or by adding debugging statements to the code. If there's a way to run just the deterministic simulation with your parameter values that might tell you pretty quickly what's going wrong.
An easier/more direct way to debug would be to replace the Csnippet you're using for dmeas with an R function: this will be slower but easier to work with (especially if you're not familiar with C coding). If you uncomment the browser() statement below, the code will drop into debug mode when you encounter the bad situation ...
dmeas <- function(reports, H, rho, log, ...) {
  lik <- dbinom(reports, size = H, prob = rho, log = log)
  if (!is.finite(lik)) {
    lik <- -1000
    ## browser()
  }
  return(lik)
}
For example, here is what the browser session shows at the point of failure:

(t = 3, reports = 2, S = 2280, I = 0, R = 35721, H = 0, Beta = 100,
 mu_IR = 0.5, rho = 0.5, eta = 0.06, N = 38000, log = TRUE)
Browse[1]> debug at /tmp/SO65554258.R!ZlSILG#7: return(lik)
Browse[2]> reports
[1] 2
Browse[2]> H
[1] 0
Browse[2]> rho
[1] 0.5
This shows that the problem is indeed that you have a positive number of reported cases when there have been zero infections. R is trying to compute the binomial probability of observing reports cases when there are H infections that are potentially reportable, each reported with probability rho. When the number of trials N in a binomial distribution Binom(N, p) is zero, the only possible outcome is zero 'successes' (reported cases), which occurs with probability 1. All other outcomes have probability 0 (and log-probability -Inf).
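You can verify this directly in R:

dbinom(2, size = 0, prob = 0.5)             # 0: two successes in zero trials is impossible
dbinom(2, size = 0, prob = 0.5, log = TRUE) # -Inf, which then poisons the log-likelihood
dbinom(0, size = 0, prob = 0.5)             # 1: zero successes is the only possible outcome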

Fitting a truncated binomial distribution to data in R

I have discrete count data indicating the number of successes in 10 binomial trials for a pilot sample of 46 cases. (Larger samples will follow once I have the analysis set up.) The zero class (no successes in 10 trials) is missing, i.e. each datum is an integer value between 1 and 10 inclusive. I want to fit a truncated binomial distribution with no zero class, in order to estimate the underlying probability p. I can do this adequately on an Excel spreadsheet using least squares with Solver, but because I want to calculate bootstrap confidence intervals on p, I am trying to implement it in R.
Frankly, I am struggling to understand how to code this. This is what I have so far:
d <- detections.data$x

# load required packages
library(fitdistrplus)
library(truncdist)
library(mc2d)

ptruncated.binom <- function(q, p) {
  ptrunc(q, "binom", a = 1, b = Inf, p)
}
dtruncated.binom <- function(x, p) {
  dtrunc(x, "binom", a = 1, b = Inf, p)
}

fit.tbin <- fitdist(d, "truncated.binom", method = "mle", start = list(p = 0.1))
I have had lots of error messages, which I have solved by guesswork, but the latest one has me stumped and I suspect I am totally misunderstanding something:

Error in checkparamlist(arg_startfix$start.arg, arg_startfix$fix.arg, :
  'start' must specify names which are arguments to 'distr'.

I think this means I must specify starting values for x in dtrunc and q in ptrunc, but I am really unclear what they should be.
Any help would be very gratefully received.
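For reference, here is a minimal sketch of the computation I am trying to reproduce, written as a direct maximum-likelihood fit with optimise() (bypassing fitdistrplus entirely; size = 10 trials and the data vector d are as above):

# zero-truncated binomial: P(X = x | X > 0) = dbinom(x, size, p) / (1 - dbinom(0, size, p))
nll <- function(p, x, size = 10) {
  -sum(dbinom(x, size, p, log = TRUE)) + length(x) * log(1 - dbinom(0, size, p))
}
fit <- optimise(nll, interval = c(1e-6, 1 - 1e-6), x = d)
fit$minimum # maximum-likelihood estimate of p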

Genetic algorithm for permutations without repetition

I am writing a genetic algorithm (GA) to find a certain permutation in my social network data (an igraph object). I am using the R library GA, but the permutations it generates contain repetitions and vary in length, while I want permutations without repetitions and of a fixed length.
I understand that the mutation and crossover functions cause this phenomenon, but I cannot find a way around it. I have tried to implement a fitness function that gives a low score to the "bad" permutations, but this caused an error (see below).
cp_GA <- function(g, ratio = 0.2, maxiter = 1000, run = 40, pop = 200) {
  library("igraph")
  library("GA")
  # ratio  : ratio of the number of core/all vertices
  #          (this describes the desired size of the core group)
  # maxiter: max number of iterations for the GA
  # run    : max number of runs with the same fitness
  # pop    : population size for the GA

  # desired core size:
  coresize <- round(vcount(g) * ratio, 0)

  fitness_vertex_perm <- function(permutation) {
    # this is the fitness function for the GA
    # it calculates the density of the core with the current permutation
    # the if-else structure prevents permutations with repetitions
    if (sort(permutation) == c(1:vcount(g))) {
      dens <- edge_density(
        induced_subgraph(permute(g, as.numeric(permutation)), 1:coresize,
                         impl = "auto"))
    } else {
      dens <- 0
    }
    return(dens)
  }

  lowerlimit <- 1:vcount(g)
  upperlimit <- vcount(g):1
  hint <- order(degree(g), decreasing = TRUE)
  maxfitness <- 1

  GA <- ga(type = "permutation",
           fitness = fitness_vertex_perm,
           lower = lowerlimit,
           upper = upperlimit,
           maxiter = maxiter,
           run = run,
           maxFitness = maxfitness,
           suggestions = hint,
           popSize = pop)
  return(GA)
}
In the fitness function above, the if-else statement checks whether a permutation is OK, but this throws an error:

testresult <- cp_GA(g, ratio = 0.13, maxiter = 1000, run = 60, pop = 400)

Error in getComplete(control) :
  argument "control" is missing, with no default
In addition: Warning message:
In if (sort(permutation) == c(1:vcount(g))) { :
Without the if-else it runs, but it produces a permutation result that is not useful for me.
How can I set up GA to generate the right type of permutations?
You now have two things to implement:
1. A GA selection mechanism.
2. A without-replacement strategy.
The theory of GA selection is that when you do parent selection, you can do it randomly or by applying some technique; you seem to have done what is required there.
The theory of sampling without replacement is that you must shrink the remaining population by one relative to the previous draw:
Probability(new) = 1 / (Probability(old) - 1)
So if you adjust the upper limit of the population in the looping section to one less, you can achieve your result.
Hope this is the hint in the right direction that you need.
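As a concrete illustration of repetition-free candidates, here is a toy sketch. It rests on my reading of the GA package, which should be treated as an assumption: for type = "permutation", scalar lower/upper bounds let the built-in operators generate and preserve permutations of lower:upper, so the vector-valued bounds in the question may be part of the problem.

library(GA)

# toy fitness: prefer permutations that put small indices near the front
f <- function(perm) -sum(perm[1:5])

GA <- ga(type = "permutation",
         fitness = f,
         lower = 1, upper = 20, # scalar bounds: candidates are permutations of 1:20
         popSize = 50, maxiter = 100, run = 20)
GA@solution[1, ] # a permutation of 1:20, with no repetitions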

Performing an Interval Regression in R

I am trying to run an interval regression, where my dependent variable y is made up of 14 intervals representing incomes. I have 5,000 observations and six independent variables I am trying to use to predict y.
I am trying to follow the steps performed here:
http://www.karlin.mff.cuni.cz/~pesta/NMFM404/interval.html#References
So, I actually have y at its exact values, but I am trying to learn how to do an interval regression from this. First, I convert y into interval codes:

Income[Income < 10000] <- 1
Income[Income > 10000 & Income < 20001] <- 2
Income[Income > 20000 & Income < 30001] <- 3
...
Income[Income > 300000] <- 14
Okay, fine. From the above link, I should actually convert it to correspond to each lower bound of the interval, and each upper bound. I have to imagine that isn't the only way, but for now, I am following those directions.
lIncome <- rep(0, 5000) # lower income bound
uIncome <- rep(0, 5000) # upper income bound

for (i in 1:5000) {
  if (Income[i] == 1) {
    lIncome[i] = 0
    uIncome[i] = 10000
  }
  if (Income[i] == 2) {
    lIncome[i] = 10001
    uIncome[i] = 20000
  }
  ...
  if (Income[i] == 14) {
    lIncome[i] = 300001
    uIncome[i] = Inf
  }
}
So now I have columns lIncome and uIncome that correspond to the income levels. I am fine up to this part. Perhaps it is problematic that my last interval goes to infinity, but even if I just cap it at 500000 I still get errors.
The instructions next say to incorporate the Surv() function.
So, I perform:

TEST <- Surv(lIncome, uIncome, event = rep(3, 5000))

However, my errors start now. I get:

Warning message:
In Surv(lIncome, uIncome, event = rep(3, 5000)) :
  Invalid status value, converted to NA
If I try

TEST <- Surv(lIncome, uIncome, event = rep(2, 5000))

it works, but then

m <- survreg(TEST ~ Age + AgeSq + ..., dist = "gaussian")

gives:

Error in survreg(TEST ~ Age + AgeSq + NoDegree, dist = "gaussian") :
  Invalid survival type
First of all, I am not sure why changing the 3 to a 2 makes it work. Even if I change the Inf value to 500000 (or any appropriate number), setting event equal to 2 (or any other number) does not resolve the issue.
Second, even when I can get past that part, the fact that survreg is failing leaves me puzzled.
Right now my approach is to play around with my intervals to see if I can get it to work somehow, and then go from there. I am also looking more closely at the documentation for ?Surv and ?survreg.
Any help is very much appreciated, thank you.
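One possibility I am exploring (a sketch, not a verified fix): Surv() also has an interval-censoring mode, type = "interval2", which takes the lower and upper bounds directly and expects NA (rather than Inf) for an open upper bound. Something along these lines:

library(survival)

# interval2 coding: (lower, upper) bounds per observation; NA marks an open bound
uIncome[is.infinite(uIncome)] <- NA
TEST <- Surv(lIncome, uIncome, type = "interval2")

# covariate names (Age, AgeSq, NoDegree) are the ones used above
m <- survreg(TEST ~ Age + AgeSq + NoDegree, dist = "gaussian")
summary(m)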
