Student-t distribution for portfolio optimization in R

How can I use a Student-t distribution for portfolio optimization in R?
I would fit the data via the estimated parameters and then feed the resulting distribution into a portfolio optimization package.
From the beginning: I'm trying to do a portfolio optimization via the Entropy Pooling approach by Meucci. As a basis (reference model) I would like to use historical data fitted by a multivariate skew-t distribution.
Basics: The Entropy Pooling approach is built upon Black-Litterman. Put simply, you can incorporate views (absolute or relative) into your model/portfolio optimization. The difference compared to BL is that you can use a non-normal distribution (not even returns), non-linear views, and views on a variety of parameters (returns, correlations, standard deviations etc.). Therefore, you can put any arbitrary data into your model as a reference model. The following step is to blend this model with your individually selected views.
So now I have a distribution object, but how do I get the distribution into my optimizer (optimize.portfolio from the package 'PortfolioAnalytics')? The required input there is "an xts, vector, matrix, data frame, timeSeries or zoo object of asset returns". The gap in my knowledge is at the transition from the distribution to the new data set.
Thanks in advance!
My code follows:
# Fit a multivariate skew-t distribution to the historical returns
return_distribution = sn::mst.mple(y = returns[, -1])

# Extract the fitted direct parameters (dp): location, scale, slant, degrees of freedom
xi = c(return_distribution[['dp']]$beta)
omega = return_distribution[['dp']]$Omega
alpha = return_distribution[['dp']]$alpha
df = return_distribution[['dp']]$nu

# Wrap the fitted parameters in a BLCOP market distribution object
marketDistribution = BLCOP::mvdistribution('mst', xi = xi, Omega = omega,
                                           alpha = alpha, nu = df)

You should look for scenario optimisation, see e.g. https://quant.stackexchange.com/questions/31818/optimize-portfolio-of-non-normal-binary-return-assets/31847#31847 . For an implementation in R, see for instance https://quant.stackexchange.com/questions/42339/target-market-correlation-for-long-short-equity-portfolio/50622#50622 (though it does not use PortfolioAnalytics).
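One possible way to bridge the gap between the fitted distribution and optimize.portfolio, sketched under the assumption that the fitted parameters from the question (xi, omega, alpha, df) are available: simulate a large scenario matrix from the skew-t distribution with sn::rmst and hand that matrix to the optimizer in place of the historical returns. The constraints, the ES objective and optimize_method = "ROI" below are illustrative choices, not part of the original question.

library(sn)
library(PortfolioAnalytics)

set.seed(42)
# Simulate scenarios from the fitted multivariate skew-t parameters
scenarios <- sn::rmst(n = 10000, xi = xi, Omega = omega, alpha = alpha, nu = df)
colnames(scenarios) <- colnames(returns[, -1])

# Illustrative portfolio specification: fully invested, long only, minimise ES
port <- portfolio.spec(assets = colnames(scenarios))
port <- add.constraint(port, type = "full_investment")
port <- add.constraint(port, type = "long_only")
port <- add.objective(port, type = "risk", name = "ES")

# The scenario matrix satisfies the documented input requirement (a matrix of asset returns)
opt <- optimize.portfolio(R = scenarios, portfolio = port,
                          optimize_method = "ROI", trace = TRUE)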

Related

Fitting an inhomogeneous Cox LGCP to a replicated point process using mppm

My recent foray into spatial point patterns has brought me to examining LGCP Cox processes. In my case I actually have a series of point patterns that I want to fit a single model to. One of my previous inquiries brought me to using mppm to fit such models (thanks Adrian Baddeley!). My next question relates to using this type of Cox model in the context of mppm.
Is it possible to fit an inhomogeneous LGCP Cox process (or another type of Cox process) to a replicated point pattern using mppm? I see some info on fitting Gibbs processes, but not really for Cox processes.
It seems like the answer may be "possibly" through some creative use of the "random" argument.
For the sake of example, let's say I'm fitting a model using point pattern Y with a single covariate X (which is a single im object). The call to kppm would be:
myModel = kppm(Y ~ X,"LGCP")
If I were fitting a simple inhomogeneous Poisson process to a replicated point pattern and associated covariate in hyperframe G, I believe the call would look like the following:
myModel = mppm(Y ~ X, data=G)
After going through Chapter 16 of the spatstat book I think that fitting a replicated LGCP Cox model might be accomplished by using the simulated intensities from calls to rLGCP, maybe like this...
# Simulate LGCP realisations and keep the driving random intensity (Lambda)
myLGCP = rLGCP(model="exp", mu=0, saveLambda=TRUE, nsim=2, win=myWindow)
myIntensity = lapply(myLGCP, function(x) attributes(x)$Lambda)
# Attach the simulated intensities to the hyperframe and treat them as a random effect
G$Z = myIntensity
myModel = mppm(Y ~ X, data=G, random=~Z|id)
The above approach "runs" without errors... but I have no idea if I'm even remotely close to actually accomplishing what I wanted to do. It's also a little unclear how to use the fitted object to then simulate a realization of the model, since simulate.kppm requires a kppm object.
Thoughts and suggestions appreciated.
mppm does not currently support Cox processes.
You could do the following (a code sketch of these steps follows below):
1. Fit the trend part of the model to your replicated point pattern data using mppm, for example m <- mppm(Y ~ X, data=G).
2. Extract the fitted intensities for each point pattern using predict.mppm.
3. For each point pattern, using the corresponding intensity obtained from the model, compute the inhomogeneous K function using Kinhom (with argument ratio=TRUE).
4. Combine the K functions using pool.
5. Estimate the cluster parameters of the LGCP by applying lgcp.estK to the pooled K function.
Optionally, after step 4 you could convert the pooled K function to a pair correlation function using pcf.fv and then fit the cluster parameters using lgcp.estpcf.
This approach assumes that the same cluster parameters will apply to each point pattern. If your data consist of several distinct groups of patterns, and you want the model to assign different cluster parameter values to the different groups of patterns, then just apply steps 4 and 5 separately to each group.
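A minimal sketch of this workflow, assuming the hyperframe G from the question and that the fitted intensities can be extracted one image per pattern from predict.mppm (check ?predict.mppm for the exact structure of its return value):

library(spatstat)

# 1. Fit the trend part of the model
m <- mppm(Y ~ X, data = G)

# 2. Fitted intensity for each point pattern (adjust indexing to the actual
#    structure returned by predict.mppm)
fitted_intensity <- predict(m)

# 3. Inhomogeneous K function for each pattern, keeping ratio information
Klist <- lapply(seq_len(nrow(G)), function(i)
  Kinhom(G$Y[[i]], lambda = fitted_intensity[[i]], ratio = TRUE))

# 4. Pool the K functions
Kpool <- do.call(pool, Klist)

# 5. Fit the LGCP cluster parameters by minimum contrast on the pooled K
fit_K <- lgcp.estK(Kpool, covmodel = list(model = "exponential"))

# Optional alternative to step 5: work with the pair correlation function instead
g_pool <- pcf.fv(Kpool)
fit_pcf <- lgcp.estpcf(g_pool, covmodel = list(model = "exponential"))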

how to decompose a gamma distribution into two gamma distribution in R

Is there an algorithm available in R that can decompose a gamma distribution into two (or more) gamma distributions? If so, can you give me an example? Basically, I have a data set that looks like a gamma distribution if I plot it with respect to time (it's time series data). The data contains the movement of an animal, and the animal can be in two different states: hungry, not hungry. My immediate reaction was to use a Hidden Markov Model and see if I can predict the two states.
I was trying to use the depmix() function from the depmixS4 library in R to see if I can recover the two different states. However, I don't really know how to use this function with a gamma distribution. The following is the code that I wrote, but it says that I need an argument for gamma, which I don't understand. Can someone tell me what parameter I should use and how to determine the parameter? Thanks!
mod <- depmix(freq ~ 1, data = mod.data, nstates = 2, family = gamma())
fit.mod <- fit(mod)
Thank you!
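Not a verified answer, but one hedged guess at the immediate error: in R, gamma() is the mathematical gamma function (which needs a numeric argument), while GLM family constructors are capitalised, so the family object is created with Gamma(). A minimal sketch under that assumption, using the objects from the question:

library(depmixS4)

# Gamma() (capital G) builds the GLM family object; gamma() is the gamma function
mod <- depmix(freq ~ 1, data = mod.data, nstates = 2, family = Gamma())
set.seed(1)
fit.mod <- fit(mod)
summary(fit.mod)
# Most likely state sequence and state probabilities for the two hidden states
head(posterior(fit.mod))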

Simluating an ARMA Model Using R

My professor and I, who are new to time series analysis in R, are attempting to simulate an ARMA model. However, we are having trouble understanding where the parameters for the time series simulation come from. When simulating an ARMA model in R using the arima.sim() function, one required argument is model =, a list with components ar and ma giving the AR and MA coefficients respectively. The issue we are running into is that we do not know where these AR and MA coefficients come from. Would anyone happen to know where the coefficients arise from?
I have tried searching the internet for information regarding this issue. However, the only answer I have seen is that the coefficients come from running an ACF and PACF, with no further explanation as to what we are running the ACF and PACF over to generate these coefficients. Are we running the ACF and PACF over previously simulated data or something else?
AR(1) Model Example Code
# Note: arima.sim() reads only the ar, ma and order components of the model list;
# the innovation standard deviation is passed as the separate argument sd =.
Ar.sm <- list(order = c(1,0,0), ar = 0.1, sd = 0.1)
Ar.lg <- list(order = c(1,0,0), ar = 0.1, sd = 0.1)
AR1.sm <- arima.sim(model = Ar.sm, n = 50)
AR1.lg <- arima.sim(model = Ar.lg, n = 50)
Any help would be greatly appreciated. Additionally, if anyone has found any literature or videos explaining this more in depth, that would be fantastic. Thank you and have a nice day.
The ARMA model is actually a class of models where you get different models by using different parameters. If you are using an ARMA(p,q) model then this means you have p auto-regressive (AR) terms and q moving-average (MA) terms. The AR and MA coefficients in the model set the size of these terms. If you are merely simulating a model (as opposed to making inferences from data) then it is up to you to set the coefficients to whatever values you want to simulate with. You are correct that different coefficient values give different kinds of results that are closely connected to the ACF and PACF.
Since you are simulating, may I suggest that you just try to simulate some examples using coefficients of your choice, and vary the coefficients you put into your simulation to see the differences in what you get out. It would also be a useful exercise for you to construct the sample ACF and PACF of your simulated data, and see how these vary as you change the coefficient values going into your simulation. This will give you a better idea of the connection between the coefficients and the output of the model.
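A minimal sketch of that exercise: pick two different AR coefficients yourself, simulate from each, and compare the sample ACF and PACF of the simulated series (the coefficient values 0.1 and 0.9 are arbitrary choices):

set.seed(123)
# Simulate two AR(1) series that differ only in the AR coefficient
AR1.small <- arima.sim(model = list(ar = 0.1), n = 500, sd = 0.1)
AR1.large <- arima.sim(model = list(ar = 0.9), n = 500, sd = 0.1)

# Compare the sample ACF and PACF of the two simulated series
op <- par(mfrow = c(2, 2))
acf(AR1.small,  main = "ACF, ar = 0.1")
pacf(AR1.small, main = "PACF, ar = 0.1")
acf(AR1.large,  main = "ACF, ar = 0.9")
pacf(AR1.large, main = "PACF, ar = 0.9")
par(op)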

How to deal with spatially autocorrelated residuals in GLMM

I am conducting an analysis of where on the landscape a predator encounters potential prey. My response data is binary with an Encounter location = 1 and a Random location = 0 and my independent variables are continuous but have been rescaled.
I originally used a GLM structure
glm_global <- glm(Encounter ~ Dist_water_cs+coverMN_cs+I(coverMN_cs^2)+
Prey_bio_stand_cs+Prey_freq_stand_cs+Dist_centre_cs,
data=Data_scaled, family=binomial)
but realized that this failed to account for potential spatial autocorrelation in the data (a spline correlogram showed high residual correlation up to ~1000 m).
Correlog_glm_global <- spline.correlog (x = Data_scaled[, "Y"],
y = Data_scaled[, "X"],
z = residuals(glm_global,
type = "pearson"), xmax = 1000)
I attempted to account for this by implementing a GLMM (in lme4) with the predator group as the random effect.
glmm_global <- glmer(Encounter ~ Dist_water_cs+coverMN_cs+I(coverMN_cs^2)+
Prey_bio_stand_cs+Prey_freq_stand_cs+Dist_centre_cs+(1|Group),
data=Data_scaled, family=binomial)
When comparing the AIC of the global GLMM (1144.7) to the global GLM (1149.2) I get a delta AIC value >2, which suggests that the GLMM fits the data better. However, I am still getting essentially the same correlation in the residuals, as shown by the spline correlogram for the GLMM model.
Correlog_glmm_global <- spline.correlog (x = Data_scaled[, "Y"],
y = Data_scaled[, "X"],
z = residuals(glmm_global,
type = "pearson"), xmax = 10000)
I also tried explicitly including the Lat*Long of all the locations as an independent variable but results are the same.
After reading up on options, I tried running Generalized Estimating Equations (GEEs) in “geepack” thinking this would allow me more flexibility with regards to explicitly defining the correlation structure (as in GLS models for normally distributed response data) instead of being limited to compound symmetry (which is what we get with GLMM). However I realized that my data still demanded the use of compound symmetry (or “exchangeable” in geepack) since I didn’t have temporal sequence in the data. When I ran the global model
gee_global <- geeglm(Encounter ~ Dist_water_cs+coverMN_cs+I(coverMN_cs^2)+
Prey_bio_stand_cs+Prey_freq_stand_cs+Dist_centre_cs,
id=Pride, corstr="exchangeable", data=Data_scaled, family=binomial)
(using scaled or unscaled data made no difference so this is with scaled data for consistency)
suddenly none of my covariates were significant. However, being a novice with GEE modelling I don’t know a) if this is a valid approach for this data or b) whether this has even accounted for the residual autocorrelation that has been evident throughout.
I would be most appreciative for some constructive feedback as to 1) which direction to go once I realized that the GLMM model (with predator group as a random effect) still showed spatially autocorrelated Pearson residuals (up to ~1000m), 2) if indeed GEE models make sense at this point and 3) if I have missed something in my GEE modelling. Many thanks.
Taking the spatial autocorrelation into account in your model can be done in many ways. I will restrict my response to the main R packages that deal with random effects.
First, you could go with the package nlme and specify a correlation structure for your residuals (many are available: corGaus, corLin, corSpher, corExp ...). You should try several of them and keep the best model. In this case the spatial autocorrelation is treated as continuous and is approximated by a global correlation function.
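Since the response here is binomial, one common way to use nlme's spatial correlation structures is via MASS::glmmPQL, which fits the GLMM by penalised quasi-likelihood and passes a correlation argument through to lme. A rough sketch with the variable names from the question (the corExp choice is just one of the structures to try, and the coordinates may need a small jitter if any locations are duplicated):

library(MASS)   # glmmPQL
library(nlme)   # corExp, corGaus, corSpher, ...

glmmPQL_spatial <- glmmPQL(
  Encounter ~ Dist_water_cs + coverMN_cs + I(coverMN_cs^2) +
    Prey_bio_stand_cs + Prey_freq_stand_cs + Dist_centre_cs,
  random = ~ 1 | Group,
  correlation = corExp(form = ~ X + Y),   # swap in corGaus, corSpher, ... and compare
  family = binomial, data = Data_scaled)

summary(glmmPQL_spatial)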
Second, you could go with the package mgcv and add a bivariate spline over the spatial coordinates to your model. This way you can capture a spatial pattern and even map it. Strictly speaking, this method doesn't model the spatial autocorrelation itself, but it may solve the problem. If space is discrete in your case, you could go with a Markov random field smooth. This website is very helpful for finding examples: https://www.fromthebottomoftheheap.net
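A rough sketch of the mgcv route for this data set, adding a bivariate thin-plate smooth of the coordinates plus a random intercept for predator group (the basis dimension k = 50 is an arbitrary starting value, and Group is assumed to be a factor):

library(mgcv)

gam_spatial <- gam(
  Encounter ~ Dist_water_cs + coverMN_cs + I(coverMN_cs^2) +
    Prey_bio_stand_cs + Prey_freq_stand_cs + Dist_centre_cs +
    s(X, Y, k = 50) +        # bivariate smooth of the spatial coordinates
    s(Group, bs = "re"),     # random intercept for predator group
  family = binomial, data = Data_scaled, method = "REML")

summary(gam_spatial)
plot(gam_spatial, select = 1, scheme = 2)   # map the fitted spatial surface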
Third, you could go with the package brms. It allows you to specify very complex models with other residual correlation structures (CAR and SAR). The package uses a Bayesian approach.
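A rough sketch of the brms route. CAR and SAR structures apply to discrete (areal) units with an adjacency or weights matrix; for point-referenced locations like these, a Gaussian process term over the coordinates is another option. The formula below mirrors the question's model; chains and iterations are illustrative defaults:

library(brms)

brm_spatial <- brm(
  Encounter ~ Dist_water_cs + coverMN_cs + I(coverMN_cs^2) +
    Prey_bio_stand_cs + Prey_freq_stand_cs + Dist_centre_cs +
    gp(X, Y) + (1 | Group),      # GP over the coordinates; can be slow, see ?gp
  family = bernoulli(), data = Data_scaled,
  chains = 4, cores = 4, iter = 2000)

# For areal data, brms also offers CAR/SAR terms; see ?car and ?sar in brms.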
I hope this helps. Good luck!

Regression evaluation in R

Are there any utilities/packages for showing various performance metrics of a regression model on some labeled test data? Basic things like RMSE and R-squared I can easily write myself, but I'm hoping for some extra utilities for visualization, for reporting the distribution of prediction confidence/variance, or for other things I haven't thought of. This is usually reported in most training utilities (like caret's train), but only over the training data (AFAICT). Thanks in advance.
This question is really quite broad and should be focused a bit, but here's a small subset of functions written to work with linear models:
x <- rnorm(100)
y <- rnorm(100)
model <- lm(x ~ y)
#general summary
summary(model)
#Visualize some diagnostics
plot(model)
#Coefficient values
coef(model)
#Confidence intervals
confint(model)
#predict values
predict(model)
#predict new values
predict(model, newdata = data.frame(y = 1:10))
#Residuals
resid(model)
#Standardized residuals
rstandard(model)
#Studentized residuals
rstudent(model)
#AIC
AIC(model)
#BIC
BIC(model)
#Cook's distance
cooks.distance(model)
#DFFITS
dffits(model)
#lots of measures related to model fit
influence.measures(model)
Bootstrap confidence intervals for model parameters can be computed using the recommended package boot. It is a very general package: you write a simple wrapper function that fits the model to the supplied data and returns the statistic of interest (say, one of the model coefficients), and boot takes care of the rest, doing the resampling and computing the intervals.
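A minimal sketch of that pattern, using a hypothetical data frame built from the x and y of the example above: the wrapper refits the model on each bootstrap resample and returns the slope, and boot/boot.ci handle the resampling and the interval computation:

library(boot)

dat <- data.frame(x = rnorm(100), y = rnorm(100))

# Statistic function: refit the model on the resampled rows, return the slope
coef_fun <- function(data, indices) {
  fit <- lm(x ~ y, data = data[indices, ])
  coef(fit)[["y"]]
}

b <- boot(dat, statistic = coef_fun, R = 999)   # 999 bootstrap replicates
boot.ci(b, type = "perc")                       # percentile confidence interval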
Consider also the caret package, which is a wrapper around a large number of modelling functions and also provides facilities to compare model performance using a range of metrics, either on an independent test set or via resampling of the training data (k-fold CV, bootstrap). caret is well documented and quite easy to use, though to get the best out of it you do need to be familiar with the modelling function you want to employ.
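A minimal sketch of using caret with a held-out test set (the data frame and predictors are made up for illustration): train with cross-validation, then compute test-set metrics with postResample:

library(caret)

set.seed(1)
dat <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))

# Hold out 20% of the rows as a test set
idx <- createDataPartition(dat$y, p = 0.8, list = FALSE)
train_dat <- dat[idx, ]
test_dat  <- dat[-idx, ]

# 5-fold cross-validation on the training data
fit <- train(y ~ ., data = train_dat, method = "lm",
             trControl = trainControl(method = "cv", number = 5))

# Performance on the independent test set: RMSE, R-squared, MAE
preds <- predict(fit, newdata = test_dat)
postResample(pred = preds, obs = test_dat$y)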
