I am using the npudens function from the np package in R.
I am trying to estimate the kernel density of a multivariate dataset, evaluated at each of its 632 points, in order to run a conditional efficiency analysis.
I have 4 continuous variables and one dummy variable, and my sample size is 632 observations.
I use the below function in R.
kerz <- npudens(bws=bw_cx[i,], cykertype="epanechnikov", cxkertype="epanechnikov",
oxkertype="liracine", tdat=tdata, edat=dat)
In earlier versions, this worked fine, as I was able to retrieve the necessary density estimates with kerz$dens.
In newer versions and in RStudio Cloud I get an error:
Error in if (any(a <= 0)) warning(paste("variable", which(a <= 0), " appears to be constant", :
  missing value where TRUE/FALSE needed
I suppose some if-statement doesn't evaluate to TRUE or FALSE somewhere inside npudens. I have tried to debug by changing the call to the following:
kerz2 <- npudens(bws=(bw_cx[i,]), ckertype="epanechnikov", okertype="liracine",
tdat=tdata, edat=dat)
Unfortunately, I get the same error.
Any help/advice on how to fix this would be greatly appreciated.
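A few quick sanity checks on the inputs may help narrow this down (a sketch, assuming tdata and dat are the data frames passed in the call above); an NA, a character column, or a (near-)constant column in the training data is a common cause of an NA turning up inside an if() like the one in the error:
sapply(tdata, function(v) sum(is.na(v)))      # any missing values per column?
sapply(tdata, function(v) length(unique(v)))  # any constant columns?
str(tdata)                                    # is the dummy coded as a factor rather than character?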
I'm working with the kohonen package (version 3.0.11) in R for applying the self-organising maps algorithm to a large data set.
In order to determine the optimal grid size, I tried to calculate both the quantisation error and the topographic error at various grid sizes, to see at which size their normalised sum is minimal.
Unfortunately, whenever I run the topo.error() function, I get an error and I'm wondering if the function is still usable after version 2.0.19 of the package (that's the latest version for which I found documentation about the topo.error function).
I know other packages such as aweSOM have similar functions, but the kohonen::topo.error() function only uses the data set and grid parameters as arguments, and not the trained SOM model, saving a substantial amount of computation time.
Here is a minimal reproducible example with the output error:
Code
library('kohonen')
data(yeast)
set.seed(261122)
## take only complete cases
X <- yeast[[3]][apply(yeast[[3]], 1, function(x) sum(is.na(x))) == 0,]
yeast.som <- som(X, somgrid(5, 8, "hexagonal"))
## quantization error
mean(yeast.som$distances)
## topographical error
topo.error(yeast.som, "bmu")
Output
Error in topo.error(yeast.som, "bmu") :
could not find function "topo.error"
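topo.error() appears to have been removed from kohonen 3.x, so one workaround is to compute the "bmu"-type topographic error by hand. Below is a minimal sketch, assuming yeast.som and X from the example above: it recomputes the best and second-best matching unit for every observation and reports the proportion of cases where those two units are not neighbours on the grid (topo_error_bmu is a hypothetical helper, not part of the package).
topo_error_bmu <- function(som_model, data) {
  codes <- som_model$codes[[1]]
  ## Euclidean distances from every observation to every codebook vector
  d <- as.matrix(dist(rbind(data, codes)))
  n <- nrow(data)
  d <- d[seq_len(n), n + seq_len(nrow(codes)), drop = FALSE]
  ## indices of the best and second-best matching units per observation
  bmus <- t(apply(d, 1, function(row) order(row)[1:2]))
  ## distance between those two units on the map grid; neighbours sit at distance 1
  grid_d <- as.matrix(dist(som_model$grid$pts))
  mean(grid_d[cbind(bmus[, 1], bmus[, 2])] > 1.05)
}
topo_error_bmu(yeast.som, X)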
This is my first question here, so I will try to make it as well written as possible. Please bear with me if I make a silly mistake.
Briefly, I am trying to do a maximum likelihood estimation where I need to estimate 5 parameters. The general form of the problem I want to solve is as follows: A weighted average of three copulas, each with one parameter to be estimated, where the weights are nonnegative and sum to 1 and also need to be estimated.
There are packages in R for doing MLE on single copulas or on a weighted average of copulas with fixed weights. However, to the best of my knowledge, no packages exist to directly solve the problem I outlined above. Therefore I am trying to code the problem myself. There is one particular type of error I am having trouble tracing to its source. Below I have tried to give a minimal reproducible example where only one parameter needs to be estimated.
library(copula)
set.seed(150)
x <- rCopula(100, claytonCopula(250))
# Copula density
clayton_density <- function(x, theta){
  dCopula(x, claytonCopula(theta))
}
# Negative log-likelihood function
nll.clayton <- function(theta){
  theta_trans <- -1 + exp(theta) # admissible theta values for Clayton copula
  nll <- -sum(log(clayton_density(x, theta_trans)))
  return(nll)
}
# Initial guess for optimization
guess <- function(x){
  init <- rep(NA, 1)
  tau.n <- cor(x[,1], x[,2], method = "kendall")
  # Guess using method of moments
  itau <- iTau(claytonCopula(), tau = tau.n)
  # In case itau is negative, we need a conditional statement
  # Use log because it is (almost) inverse of theta transformation above
  if (itau <= 0) {
    init[1] <- log(0.1) # Ensures positive initial guess
  } else {
    init[1] <- log(itau)
  }
  return(init)
}
estimate <- nlminb(guess(x), nll.clayton)
(parameter <- -1 + exp(estimate$par)) # Retrieve estimated parameter
fitCopula(claytonCopula(), x) # Compare with fitCopula function
This works great when simulating data with small values of the copula parameter, and gives almost exactly the same answer as fitCopula() every time.
For large values of the copula parameter, such as 250, the following error shows up when I run the line with nlminb():
Error in .local(u, copula, log, ...) : parameter is NA
Called from: .local(u, copula, log, ...)
Error during wrapup: unimplemented type (29) in 'eval'
When I run fitCopula(), the optimization finishes, but this message pops up:
Warning message:
In dlogcdtheta(copula, u) :
  dlogcdtheta() returned NaN in column(s) 1 for this explicit copula; falling back to numeric derivative for those columns
I have been able to find out using debug() that somewhere in the optimization process of nlminb, the parameter of interest is assigned the value NaN, which then yields this error when dCopula() is called. However, I do not know at which iteration it happens, and what nlminb() is doing when it happens. I suspect that perhaps at some iteration, the objective function is evaluated at Inf/-Inf, but I do not know what nlminb() does next. Also, something similar seems to happen with fitCopula(), but the optimization is still carried out to the end, only with the abovementioned warning.
I would really appreciate any help in understanding what is going on, how I might debug it myself and/or how I can deal with the problem. As might be evident from the question, I do not have a strong background in coding. Thank you so much in advance to anyone that takes the time to consider this problem.
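One way to see where things go wrong is to wrap the objective in a tracing function; a minimal sketch, reusing x, guess() and nll.clayton from above (nll.traced is just an illustrative name):
nll.traced <- function(theta) {
  val <- nll.clayton(theta)
  ## print every parameter value nlminb() tries, so the step at which the
  ## objective becomes NaN or Inf shows up in the console
  cat("theta =", theta, "-> theta_trans =", -1 + exp(theta), "-> nll =", val, "\n")
  val
}
estimate <- nlminb(guess(x), nll.traced)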
Update:
When I run dCopula(x, claytonCopula(-1+exp(guess(x)))) or, equivalently, clayton_density(x, -1+exp(guess(x))), it becomes apparent that the density evaluates to 0 at several data points. Unfortunately, creating pseudo-observations with x <- pobs(x) does not solve the problem, as can be seen by repeating dCopula(x, claytonCopula(-1+exp(guess(x)))). The result is that when applying the logarithm, we get several -Inf evaluations, which of course means the whole negative log-likelihood evaluates to Inf, as can be seen by running nll.clayton(guess(x)). Hence, in addition to the above queries, any tips on handling log(0) when doing MLE numerically are welcome and appreciated.
Second update
Editing the second line in nll.clayton as follows seems to work okay:
nll <- -sum(log(clayton_density(x, theta_trans) + 1e-8))
However, I do not know if this is a "good" way to circumvent the problem, in the sense that it does not introduce potential for large errors (though it would surprise me if it did).
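An alternative sketch that attacks the log(0) problem at its source: ask dCopula() for the log-density directly via log = TRUE and return a large finite penalty whenever the result is not finite, so nlminb() never sees NaN or Inf (nll.clayton.log is an illustrative name; the penalty value 1e10 is an arbitrary choice).
nll.clayton.log <- function(theta) {
  theta_trans <- -1 + exp(theta)  # admissible theta values for Clayton copula
  ld <- dCopula(x, claytonCopula(theta_trans), log = TRUE)
  ## keep the optimiser on finite ground instead of returning Inf/NaN
  if (any(!is.finite(ld))) return(1e10)
  -sum(ld)
}
estimate <- nlminb(guess(x), nll.clayton.log)
(parameter <- -1 + exp(estimate$par))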
I have an R package I am working on that returns output from a Metropolis-Hastings sampler. The output consists of, among other things, matrices where the columns are the variables and the rows are the samples from the posterior. I convert these into coda mcmc objects with this code:
colnames(results$beta) = x$data$Pops
results$beta = mcmc(results$beta, thin = thin)
where thin is 183 and beta is a 21 x 15 matrix (this is a toy example). The summary method for mcmc objects works fine, but plot.mcmc gives me:
Error in plot.new() : figure margins too large
I have done a bit of debugging. All the values are finite, there are no NAs, the limits of the axes seem to be set correctly, and there seem to be enough panels (2 plots, each with 4 rows and 2 columns). Is there something I am missing in the coercion into the mcmc object?
Package source and all associated files can be found on http://github.com/jmcurran/rbayesfst. A script which will produce the error quickly is in the unexported function mytest, so you'll need
rbayesfst:::mytest()
to get it to run.
It has been suggested that this has already been answered in this question, but I would like to point out that it is not me setting any of the par values; plot.mcmc does that. So my question is not about par or plot, but about what (if anything) I am doing wrong when turning a matrix into an mcmc object that plot.mcmc cannot plot. It can't be the size of the matrix, because I have had examples with many more dimensions coming directly from rjags that worked fine.
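For what it is worth, "figure margins too large" usually comes from plot.new() when the current graphics device is too small for the panel layout coda asks for, rather than from the mcmc object itself. A workaround sketch, assuming results$beta from above (the pdf file name is purely illustrative):
library(coda)
## send the plots to a device large enough for the trace + density layout
pdf("beta_traces.pdf", width = 8, height = 11)
plot(results$beta)
dev.off()
## or plot a few columns at a time; subsetting an mcmc object keeps its class
plot(results$beta[, 1:4])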
I'm going through the bartMachine vignette for R, and towards the end it has an example of using bartMachine for classification problems, based on the Pima.te data set in the MASS package. When trying to predict "type" with bartMachine (just following the vignette), it looks like my confusion matrix is labeled incorrectly compared with the vignette's. I'm getting extremely high error rates, and the numbers on the off-diagonal look an awful lot like the vignette's true positive and true negative numbers. Can anyone else confirm this?
options(java.parameters = "-Xmx5g")
library(bartMachine)
set_bart_machine_num_cores(4)
data("Pima.te",package = "MASS")
X <- data.frame(Pima.te[,-8])
y <- Pima.te[,8]
bart_machine_cv <- bartMachineCV(X,y)
bart_machine_cv
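One quick way to check the labelling yourself is to rebuild the confusion matrix from the model's own class predictions and compare it with the table bartMachine prints; a sketch, assuming bart_machine_cv, X and y from above and predict()'s type = "class" option for classification models:
## in-sample class predictions versus the observed labels
y_hat <- predict(bart_machine_cv, X, type = "class")
table(actual = y, predicted = y_hat)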
It looks like there is indeed a labeling error in the package, which will be fixed in the next version (1.2.3); see the CHANGELOG: https://github.com/kapelner/bartMachine/blob/255c206be6834d0ab13b9689a41d961de1e73d8a/bartMachine/CHANGELOG
I am having trouble reconciling how forecasts are calculated in the R packages forecast::croston and tsintermittent::crost. I understand the concept of Croston's method, such as in the example posted here (www.robjhyndman.com/papers/MASE.xls), but the two R packages produce very different results.
I used the values from the Excel example (by R. Hyndman) in the following code:
library(tsintermittent)
library(forecast)
x=c(0,1,0,11,0,0,0,0,2,0,6,3,0,0,0,0,0,7,0,0,0,0) # from Hyndman Excel example
x_crost = crost(x,h=5, w=0.1, init = c(1,1) ) # from the tsintermittent package
x_croston=croston(x,h=5, alpha = 0.1) # from the forecast package
x_croston$fitted
y=data.frame(x,x_crost$frc.in,x_croston$fitted)
y
plot(x_croston)
lines(x_croston$fitted, col="blue")
lines(x_crost$frc.in,col="red")
x_crost$initial
x_crost$frc.out # forecast
x_croston$mean # forecast
The forecast from the Excel example is 1.36, crost gives 1.58 and croston gives 1.15. Why are they not the same? Also note that the in-sample (fitted) values are very different.
For crost in the tsintermittent package you need a second flag to stop it optimising the initial values, init.opt=FALSE, so the command should be:
crost(x,w=0.1,init=c(2,2),init.opt=FALSE)
Setting init=c(2,2) on its own only provides starting values for the optimiser to work from.
Also note that the time series in Rob Hyndman's example has two additional values at the beginning (see column B), so x should be:
x=c(0,2,0,1,0,11,0,0,0,0,2,0,6,3,0,0,0,0,0,7,0,0,0,0)
Running these two commands (the corrected x and the crost() call above) produces the same values as in the Excel example.
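Putting the two corrections together (a sketch that simply combines the corrected series with the calls from the question):
library(tsintermittent)
library(forecast)
x <- c(0,2,0,1,0,11,0,0,0,0,2,0,6,3,0,0,0,0,0,7,0,0,0,0)  # includes the two leading values
x_crost <- crost(x, h = 5, w = 0.1, init = c(2, 2), init.opt = FALSE)  # tsintermittent
x_croston <- croston(x, h = 5, alpha = 0.1)                            # forecast
x_crost$frc.out   # out-of-sample forecast
x_croston$mean    # out-of-sample forecast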