Is there a way to specify weights in relrisk.ppp function in spatstat (version 1.63-3)?
The relrisk.ppp function calls the density.ppp function, which does allow users to specify their own weights.
For example, let us build upon the provided spatstat.data::urkiola data where, instead of individual trees, the locations are tree stands and we have a second numeric mark for the frequency of trees at each point-location:
urkiola_new <- spatstat.data::urkiola
urkiola_new$marks <- data.frame("type" = urkiola_new$marks, "freq" = rpois(urkiola_new$n, 3))
f1 <- spatstat::relrisk(urkiola_new, weights = urkiola_new$marks$freq)
When urkiola_new is passed to relrisk, it is caught by stopifnot(is.multitype(X)) in relrisk.ppp, since the data-frame marks mean the pattern is no longer multitype. I next tried specifying the weights separately as a vector while using the original urkiola data,
f2 <- spatstat::relrisk(urkiola, weights = urkiola_new$marks$freq)
but was stopped by an error from the pixellate.ppp function inside the internal density.ppp call:
Error in pixellate.ppp(x, ..., padzero = TRUE) : length(weights) == npoints(x) || length(weights) == 1 is not TRUE
The same error occurs when I convert the weights into a list
urkiola_weights <- split(urkiola_new$marks$freq, urkiola_new$marks$type)
f3 <- spatstat::relrisk(urkiola, weights = urkiola_weights)
I suspect there is a way to specify the weights cleverly, but it still escapes me. Any suggestions or guidance would be helpful, thank you!
The function relrisk.ppp is not currently designed to handle weights. The help entry for relrisk.ppp does not mention weights.
The example above does not work because relrisk.ppp applies density.ppp separately to the sub-patterns of points of each type, and the extra argument weights is the wrong length for these sub-patterns.
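Until such support exists, one workaround is to mimic what relrisk.ppp does internally: smooth each sub-pattern with its own weights via density.ppp, then form the probability surface by hand. A minimal sketch, assuming the two urkiola mark levels (birch and oak) and a common bandwidth chosen by bw.relrisk:
library(spatstat)
ty <- marks(urkiola_new)$type
w <- marks(urkiola_new)$freq
Y <- split(unmark(urkiola_new) %mark% ty)          # one sub-pattern per type
sigma <- bw.relrisk(unmark(urkiola_new) %mark% ty) # shared bandwidth
d.birch <- density(Y$birch, sigma = sigma, weights = w[ty == "birch"])
d.oak <- density(Y$oak, sigma = sigma, weights = w[ty == "oak"])
p.birch <- eval.im(d.birch / (d.birch + d.oak))    # weighted probability of birch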
I will take this question as a feature request, to add this capability to relrisk.ppp. It should be done soon.
Update: this is now implemented in the development version, spatstat 1.64-0.018, available at the spatstat GitHub repository.
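With that version installed, the weights should be accepted directly; a sketch, assuming one numeric weight per point as in density.ppp:
f2 <- spatstat::relrisk(spatstat.data::urkiola, weights = urkiola_new$marks$freq)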
I am trying to analyze some reaction time data using a GLMM. To find a distribution that fits my data best, I used fitdist() for the gamma and lognormal distributions; the results showed that the lognormal fits my data better.
However, I recently read that the inverse Gaussian distribution might be a better fit for reaction time data.
I used nigFitStart to obtain the start values:
library(GeneralizedHyperbolic)
invstrt <- nigFitStart(RTtotal, startValues = "FN")
which gave me this:
$paramStart
mu delta alpha beta
775.953984862 314.662306398 0.007477984 -0.004930604
So I tried using these start parameters with fitdist:
require(fitdistrplus)
fitinvgauss <- fitdist(RTtotal, "invgauss", start = list(mu=776, delta=314, alpha=0.007, beta=-0.05))
but I get the following error:
Error in checkparamlist(arg_startfix$start.arg, arg_startfix$fix.arg, :
'start' must specify names which are arguments to 'distr'.
I also used ig_fit{goft} and got the following results:
Inverse Gaussian MLE
mu 775.954
lambda 5279.089
So this time I used these two parameters for the start argument in fitdist, and still got the exact same error:
> fitinvgauss <- fitdist(RTtotal, "invgauss", start = list(mu=776, lambda=5279))
Error in checkparamlist(arg_startfix$start.arg, arg_startfix$fix.arg, :
'start' must specify names which are arguments to 'distr'.
Someone mentioned that changing the parameter names from mu and lambda to mean and shape had solved their problem, but I tried that and still got the same error.
Any idea how I can fix this? Or could you suggest an alternative way to fit the inverse Gaussian to my data?
Thank you.
dput(RTtotal)
c(594.96, 659.5, 706.14, 620.92, 811.05, 420.63, 457.08, 585.53,
488.59, 484.87, 496.72, 769.01, 458.92, 521.76, 889.08, 514.11,
553.09, 564.68, 1057.19, 437.79, 660.33, 639.58, 643.45, 419.47,
469.16, 457.78, 530.58, 538.73, 557.17, 1140.09, 560.03, 543.18,
1093.29, 607.59, 430.2, 712.06, 716.6, 566.69, 989.71, 449.96,
653.22, 556.52, 654.8, 472.54, 600.26, 548.36, 597.51, 471.97,
596.72, 600.29, 706.77, 511.6, 475.89, 599.13, 570.12, 767.57,
402.68, 601.56, 610.02, 891.95, 483.22, 588.78, 505.95, 554.15,
445.54, 489.02, 678.13, 532.06, 652.61, 654.79, 535.08, 1215.66,
633.6, 645.92, 454.37, 535.81, 508.97, 690.78, 685.97, 703.04,
731.99, 592.75, 662.03, 1400.33, 599.73, 1021.34, 1232.35, 855.1,
780.32, 554.4, 1965.77, 841.89, 1262.76, 721.62, 788.95, 1104.24,
1237.4, 1193.04, 513.91, 474.74, 380.56, 570.63, 700.96, 380.89,
481.96, 723.63, 835.22, 781.1, 468.76, 555.1, 522.22, 944.29,
541.06, 559.18, 738.68, 880.58, 500.14, 1856.97, 1001.59, 703.7,
1022.35, 1813.35, 1128.73, 864.75, 1166.77, 1220.4, 776.56, 2073.72,
1223.88, 617, 1387.71, 595.57, 1506.13, 678.41, 1797.87, 2111.04,
1116.61, 1038.6, 894.25, 778.51, 908.51, 1346.69, 989.09, 1334.17,
877.31, 649.31, 978.22, 1276.84, 1001.58, 1049.66, 1131.83, 700.8,
1267.21, 693.52, 1182.3)
So I'm guessing that you failed to tell us that you also have the statmod package loaded (or perhaps some other package with an 'invgauss' family including a dinvgauss function). You should be able to tell which package dinvgauss comes from by reading the top line of its help page. So, after installing that package and reading the help page (which one should ALWAYS do) for ?dinvgauss:
fitinvgauss <- fitdist(RTtotal, "invgauss",
start = list(mean=776, dispersion=314, shape=1))
fitinvgauss
# --------------
Fitting of the distribution ' invgauss ' by maximum likelihood
Parameters:
estimate Std. Error
mean 779.2535 NA
dispersion -1007.5490 NA
shape 4972.5745 NA
All I did was read the error message and then read the help page and use the correct names for that function's parameters. (And then play around a bit to get the parameter starting values into the feasible range of values.)
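If you prefer to keep the mu/lambda (mean/shape) parameterization reported by ig_fit, another option is to wrap statmod's density and distribution functions under a new name, so that fitdist sees exactly the parameter names you pass in start. A sketch, assuming statmod is installed; dinvgauss2/pinvgauss2 are hypothetical wrapper names:
library(statmod)
library(fitdistrplus)
dinvgauss2 <- function(x, mean, shape) statmod::dinvgauss(x, mean = mean, shape = shape)
pinvgauss2 <- function(q, mean, shape) statmod::pinvgauss(q, mean = mean, shape = shape)
# start values taken from the goft::ig_fit output above
fitdist(RTtotal, "invgauss2", start = list(mean = 776, shape = 5279))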
I am new to neural networks and the mxnet package in R. I want to do a logistic regression on my predictors, since my observations are probabilities varying between 0 and 1. I'd like to weight my observations by a vector obsWeights I have, but I'm not sure where to implement the weights. There seems to be a weight= option in mx.symbol.FullyConnected, but if I try weight=obsWeights I get the following error message:
Error in mx.varg.symbol.FullyConnected(list(...)) :
Cannot find argument 'weight', Possible Arguments:
----------------
num_hidden : int, required
Number of hidden nodes of the output.
no_bias : boolean, optional, default=False
Whether to disable bias parameter.
How should I proceed to weight my observations? Here is my code at the moment.
# Prepare data
train.mm = model.matrix(obs ~ . , data = train_data)
train_label = train_data$obs
# Normalize
train.mm = apply(train.mm, 2, function(x) (x-min(x))/(max(x)-min(x)))
# Create MXDataIter compatible iterator
batch_size = 128
train.iter = mx.io.arrayiter(data=t(train.mm), label=train_label,
batch.size=batch_size, shuffle=T)
# Symbolic model definition
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data=data, num.hidden=128, name='fc1')
act1 = mx.symbol.Activation(data=fc1, act.type='relu', name='act1')
final = mx.symbol.FullyConnected(data=act1, num.hidden=1, name='final')
logistic = mx.symbol.LogisticRegressionOutput(data=final, name='logistic')
# Run model
mxnet_train = mx.model.FeedForward.create(
symbol = logistic,
X = train.iter,
initializer = mx.init.Xavier(rnd_type = 'gaussian', factor_type = 'avg', magnitude = 2),
num.round = 25)
Assigning the fully connected weight argument is not what you want to do at any rate. That weight is a reference to the parameters of the layer, i.e. the matrix you multiply the inputs by to get output values; these are the parameter values you're trying to learn.
If you want to make some samples matter more than others, you'll need to adjust the loss function. For example, multiply each sample's usual loss by its weight, so that low-weight samples contribute less to the overall average loss.
I do not believe the standard MXNet loss functions have a spot for assigning weights (that is, LogisticRegressionOutput won't cover this). However, you can make your own cost function that does. This would involve passing your final layer through a sigmoid activation function to first generate the usual logistic regression output value. Then pass that into the loss function you define. You could use squared error, but for logistic regression you'll probably want the cross entropy function:
-(l * log(y) + (1 - l) * log(1 - y)),
where l is the label and y is the predicted value.
Ideally, you'd write a symbol with an efficient definition of the gradient (MXNet has a cross entropy function, but it's for softmax input, not a binary output; you could translate your output to two outputs with softmax as an alternative, but that seems less easy to work with in this case), but the easiest path would be to let MXNet do its autodiff on it. Then you multiply that cross entropy loss by the weights.
I haven't tested this code, but you'd ultimately have something like this (this is what you'd do in python, should be similar in R):
label = mx.sym.Variable('label')
out = mx.sym.Activation(data=final, act_type='sigmoid')
# negative log-likelihood, i.e. the cross entropy loss to minimize
ce = -(label * mx.sym.log(out) + (1 - label) * mx.sym.log(1 - out))
weights = mx.sym.Variable('weights')
loss = mx.sym.MakeLoss(weights * ce, normalization='batch')
Then you want to input your weight vector into the weights Variable along with your normal input data and labels.
As an added tip, the output of an MXNet network with a custom loss via MakeLoss is the loss, not the prediction. You'll probably want both in practice, in which case it's useful to group the loss with a gradient-blocked version of the prediction so that you can get both. You'd do that like this:
pred_loss = mx.sym.Group([mx.sym.BlockGrad(out), loss])
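Since the question uses the R bindings, an untested translation of the same sketch into R might look like this (assuming the R symbol API mirrors the Python one, with dots in place of underscores):
label <- mx.symbol.Variable('label')
weights <- mx.symbol.Variable('weights')
out <- mx.symbol.Activation(data = final, act.type = 'sigmoid')
# negative log-likelihood, i.e. the cross entropy loss to minimize
ce <- -(label * mx.symbol.log(out) + (1 - label) * mx.symbol.log(1 - out))
loss <- mx.symbol.MakeLoss(weights * ce, normalization = 'batch')
# group a gradient-blocked prediction with the loss to recover both
pred_loss <- mx.symbol.Group(mx.symbol.BlockGrad(out), loss)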
I am trying to use the chart.EfficientFrontier function in the PortfolioAnalytics package in R to chart an efficient frontier object that I have created, but it keeps failing. Basically I am trying to find a frontier that will minimize annualized standard deviation. Eventually, once I get this working, I would also like to maximize annualized return.
Firstly I created an annualized standard deviation function using this code
pasd <- function(R, weights){
as.numeric(StdDev(R=R, weights=weights)*sqrt(12)) # hardcoded for monthly data
# as.numeric(StdDev(R=R, weights=weights)*sqrt(4)) # hardcoded for quarterly data
}
I imported a CSV file with monthly returns, and my portfolio object looks like this:
> prt
**************************************************
PortfolioAnalytics Portfolio Specification
**************************************************
Call:
portfolio.spec(assets = colnames(returns))
Number of assets: 3
Asset Names
[1] "Global REITs" "Au REITs" "Au Util and Infra"
Constraints
Enabled constraint types
- leverage
- long_only
Objectives:
Enabled objective names
- mean
- pasd
Now I successfully create an efficient frontier object using this line:
prt.ef <- create.EfficientFrontier(R = returns, portfolio = prt, type = "DEoptim", match.col = "pasd")
But when I try to plot it I am getting the following error messages.
> chart.EfficientFrontier(prt.ef, match.col="pasd")
Error in StdDev(R = R, weights = weights) :
argument "weights" is missing, with no default
In addition: There were 26 warnings (use warnings() to see them)
Error in StdDev(R = R, weights = weights) :
argument "weights" is missing, with no default
Error in StdDev(R = R, weights = weights) :
argument "weights" is missing, with no default
Error in xlim[2] * 1.15 : non-numeric argument to binary operator
Anyone know why this is the case? When I use summary(prt.ef) I can see the weights, but why is the chart.EfficientFrontier function failing?
As @WaltS suggested, you need to be consistent in implementing functions that annualize both the mean and the risk of returns.
To get annualized statistics you actually have two options, and you are currently using neither:
1) Run the optimization with monthly data, keeping the original risk and return functions in the specification. For plotting you can then annualize with:
Port.Anua.Returns=prt.ef$frontier[,1]*12
Port.Anua.StDev=prt.ef$frontier[,2]*12^.5
The weights will be the same for monthly or annualized portfolios.
prt.ef$frontier[,-(1:3)]
2) Transform your monthly returns into annualized returns by multiplying by 12. Then run the optimization with the usual procedure; all risk and return values in prt.ef$frontier will then already be annualized.
Regarding the jagged line in the EF: using your portfolio specification I was able to recreate the same behavior. For the following plot I used the edhec data and your specification with the original mean and StdDev objectives:
data(edhec)
returns <- edhec[,1:3]
That behavior must be influenced by the specification or the optimization algorithm you are using. I did the same optimization with solve.QP from package quadprog. This is the result.
Update
The code is here:
require(quadprog)
# min_x( -d^T x + 1/2 x^T D x ) s.t. A x >= b
MV_QP<-function(nx, tarRet, Sig=NULL,long_only=FALSE){
if (is.null(Sig)) Sig=cov(nx)
dvec=rep(0,ncol(Sig))
meq=2
Amat=rbind(rep(1,ncol(Sig)),
apply(nx,2,mean) )
bvec=c(1,tarRet )
if (long_only) {
meq=1
Amat=Amat[-1,]
Amat=rbind(Amat,
diag(1,ncol(Sig)),
rep(1,ncol(Sig)),
rep(-1,ncol(Sig)))
bvec=bvec[-1]
bvec=c(bvec,
rep(0,ncol(Sig)),.98,-1.02)
}
sol <- solve.QP(Dmat=Sig, dvec, t(Amat), bvec, meq=meq)$solution
}
steps=50
x=returns
µ.b <- apply(X = x, 2, FUN = mean)
long_only=TRUE
range.bl <- seq(from = min(µ.b), to = max(µ.b)*ifelse(long_only,1,1.6), length.out = steps)
risk.bl <- t(sapply(range.bl, function(targetReturn) {
w <- MV_QP(x, targetReturn,long_only=long_only)
c(sd(x %*% w),w) }))
weightsl=round(risk.bl[,-1],4)
colnames(weightsl)=colnames(x)
weightsl
risk.bl=risk.bl[,1]
rets.bl= weightsl%*%µ.b
fan=12
plot(x = risk.bl*fan^.5, y = rets.bl*fan,col=2,pch=21,
xlab = "Annualized Risk ",
ylab = "Annualized Return", main = "long only EF with solve.QP")
Adding to Robert's comments, the optimization calculation with monthly returns is a quadratic programming problem with linear constraints. When mean is the return objective and StdDev or var is the risk objective, optimize.portfolio and create.EfficientFrontier select the ROI method as the solver, which uses solve.QP, an efficient solver for these sorts of problems. When the risk objective is changed to pasd, these functions don't recognize this as a QP problem and so use DEoptim, a general nonlinear problem solver perhaps better suited to solving nonconvex problems rather than convex QP ones. See Differential Evolution with DEoptim. This seems to be the cause of the jagged efficient frontier.
In order to have create.EfficientFrontier use solve.QP, which is much more efficient and accurate for this type of problem, you can make a custom moment function to compute the mean and variance and then specify it with the argument momentFUN. However, create.EfficientFrontier at least in part uses means computed directly from the returns rather than using mu from momentFUN. To deal with that, multiply the returns and divide the variance by 12 as shown in the example below.
library(PortfolioAnalytics)
data(edhec)
returns <- edhec[,1:3]
# define moment function
annualized.moments <- function(R, scale=12, portfolio=NULL){
out <- list()
out$mu <- matrix(colMeans(R), ncol=1)
out$sigma <- cov(R)/scale
return(out)
}
# define portfolio
prt <- portfolio.spec(assets=colnames(returns))
prt <- add.constraint(portfolio=prt, type="long_only")
# leverage defaults to weight_sum = 1 so is equivalent to full_investment constraint
prt <- add.constraint(portfolio=prt, type="leverage")
prt <- add.objective(portfolio=prt, type="risk", name="StdDev")
# calculate and plot efficient frontier
prt_ef <- create.EfficientFrontier(R=12*returns, portfolio=prt, type="mean-StdDev",
match.col = "StdDev", momentFUN="annualized.moments", scale=12)
xlim <- range(prt_ef$frontier[,2])*c(1, 1.5)
ylim <- range(prt_ef$frontier[,1])*c(.80, 1.05)
chart.EfficientFrontier(prt_ef, match.col="StdDev", chart.assets = FALSE,
labels.assets = FALSE, xlim=xlim, ylim=ylim )
points(with(annualized.moments(12*returns, scale=12), cbind(sqrt(diag(sigma)), mu)), pch=19 )
text(with(annualized.moments(12*returns, scale=12), cbind(sqrt(diag(sigma)), mu)),
labels=colnames(returns), cex=.8, pos=4)
chart.EF.Weights(prt_ef, match.col="StdDev")
The means and standard deviations of the assets also need to be adjusted and so are plotted outside of chart.EfficientFrontier and shown on the chart below.
At the end of the day it would be simpler, as Robert suggests, to compute the weights for the efficient frontier using the monthly returns and then compute the portfolio returns and standard deviations using annualized asset means and standard deviations and the monthly weights which are the same in both cases. However, perhaps this example is useful to show the use of custom moment and objective functions.
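For completeness, a sketch of that simpler route, following the frontier column layout used earlier in this thread (untested):
ef.m <- create.EfficientFrontier(R = returns, portfolio = prt,
type = "mean-StdDev", match.col = "StdDev")
ann.ret <- ef.m$frontier[, 1] * 12       # annualize the monthly means
ann.sd <- ef.m$frontier[, 2] * sqrt(12)  # annualize the monthly std devs
plot(ann.sd, ann.ret, type = "b",
xlab = "Annualized Risk", ylab = "Annualized Return")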
I did not find the reason for the error, but setting the limits makes it partially work!
prt.ef$frontier #see the EF
xylims=apply(prt.ef$frontier[,c(2,1)],2,range)*c(.98,1.01)
chart.EfficientFrontier(prt.ef, match.col="pasd",
main="Portfolio Optimization",
xlim=xylims[,1], ylim=xylims[,2])
#or
plot(prt.ef$frontier[,c(2,1)],col=2)
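Another thing worth trying, since the traceback shows pasd being called with weights missing (presumably when chart.EfficientFrontier computes the per-asset risks for the axis limits): give weights a default. A sketch, untested:
pasd <- function(R, weights = NULL){
as.numeric(StdDev(R = R, weights = weights) * sqrt(12)) # monthly data
}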
OK, so I tried the pasd function that WaltS suggested, and chart.EfficientFrontier seemed to work, but it gave me a jagged line rather than a smooth one.
I have now created an annualized return function using this code:
pamean <- function(R, weights=NULL){Return.annualized(apply(as.xts(t(t(R) * weights)),1,sum))}
and added this as an objective to my portfolio prt.
> prt
**************************************************
PortfolioAnalytics Portfolio Specification
**************************************************
Call:
portfolio.spec(assets = colnames(returns))
Number of assets: 3
Asset Names
[1] "Global REITs" "Au REITs" "Au Util and Infra"
Constraints
Enabled constraint types
- long_only
- leverage
Objectives:
Enabled objective names
- pamean
- pasd
I then create the efficient frontier again using this line:
> prt.ef <- create.EfficientFrontier(R=returns, portfolio=prt, type="DEoptim", match.col="pasd")
but when I use the summary function I see that only one frontier point has been generated. What does the error message mean, and why was only one point generated?
> summary(prt.ef)
**************************************************
PortfolioAnalytics Efficient Frontier
**************************************************
Call:
create.EfficientFrontier(R = returns, portfolio = prt, type = "DEoptim",
match.col = "pasd")
Efficient Frontier Points: 1
Error in `colnames<-`(`*tmp*`, value = character(0)) :
attempt to set 'colnames' on an object with less than two dimensions
I'm relatively new to R and would appreciate it if you could take a look at the following code. I'm trying to estimate the shape parameter of the Frechet distribution (or inverse Weibull) using mmedist (I also tried fitdist, which calls mmedist), but I get the following error:
Error in mmedist(data, distname, start = start, fix.arg = fix.arg, ...) :
the empirical moment function must be defined.
The code I use is below:
require(actuar)
library(fitdistrplus)
library(MASS)
#values
n=100
scale = 1
shape=3
# simulate a sample
data_fre = rinvweibull(n, shape, scale)
memp=minvweibull(c(1,2), shape=3, rate=1, scale=1)
# estimating the parameters
para_lm = mmedist(data_fre,"invweibull",start=c(shape=3,scale=1),order=c(1,2),memp = "memp")
Please note that I tried changing the code many times to see whether my mistake was in the syntax, but I always get the same error.
I'm aware of the example in the documentation. I've tried that as well, but with no luck. Please note that for the method to work, the order of the moment must be smaller than the shape parameter.
The example is the following:
require(actuar)
#simulate a sample
x4 <- rpareto(1000, 6, 2)
#empirical raw moment
memp <- function(x, order)
ifelse(order == 1, mean(x), sum(x^order)/length(x))
#fit
mmedist(x4, "pareto", order=c(1, 2), memp="memp",
start=c(shape=10, scale=10), lower=1, upper=Inf)
Thank you in advance for any help.
You will need to make non-trivial changes to the source of mmedist -- I recommend that you copy out the code, and make your own function foo_mmedist.
The first change you need to make is on line 94 of mmedist:
if (!exists("memp", mode = "function"))
That line checks whether a function literally named "memp" exists, rather than whether the argument you actually passed is a function. Change it to:
if (!exists(as.character(expression(memp)), mode = "function"))
The second change relates to the fact that the optim routine actually calls funobj, which calls DIFF2, which in turn (see line 112) calls the user-supplied memp function -- minvweibull in your case -- with two arguments: obs (which resolves to data) and order. Since minvweibull does not take the data as its first argument, this fails.
This is expected, as the help page tells you:
memp A function implementing empirical moments, raw or centered but
has to be consistent with distr argument. This function must have
two arguments : as a first one the numeric vector of the data and as a
second the order of the moment returned by the function.
How can you fix this? Pass the function moment from the moments package. Here is complete code (assuming that you have made the change above, and created a new function called foo_mmedist):
# values
n = 100
scale = 1
shape = 3
# simulate a sample
data_fre = rinvweibull(n, shape, scale)
# estimating the parameters
para_lm = foo_mmedist(data_fre, "invweibull",
start= c(shape=5,scale=2), order=c(1, 2), memp = moment)
You can check that optimization has occurred as expected:
> para_lm$estimate
shape scale
2.490816 1.004128
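If you would rather not depend on the moments package, you can equally pass a small function of your own with the required (data, order) signature, just like the pareto example in the question; a sketch:
memp2 <- function(x, order) mean(x^order) # raw empirical moment
para_lm <- foo_mmedist(data_fre, "invweibull",
start = c(shape = 5, scale = 2), order = c(1, 2), memp = memp2)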
Note, however, that this actually reduces to a crude way of doing overdetermined method of moments, and I am not sure that it is theoretically appropriate.