Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
Which R packages, other than 'iZID', contain built-in functions to simulate zero-inflated versions of popular discrete distributions such as the Poisson, negative binomial, COM-Poisson, Poisson inverse Gaussian, and Poisson-Lindley?
Have a look at the CRAN Task View on Distributions. This is a curated look at R packages that help you work with distributions. You can search the page for "inflated" to quickly find the relevant parts.
If you have an existing function that generates random deviates from a non-zero-inflated distribution, you can write a wrapper (or decorator) that creates a zero-inflated-deviate simulator. The only assumption I've made here is that the first argument of the original function is called n and specifies the number of random deviates to pick.
For example, if we want to extend rbinom to return zero-inflated binomial deviates ...
ziversion <- function(rfun) {
  f <- function(n, ..., zi) {
    x <- rfun(n, ...)
    x <- ifelse(runif(n) < zi, 0, x)
    return(x)
  }
  return(f)
}
rzibinom <- ziversion(rbinom)
set.seed(101)
rzibinom(10, size = 10, prob = 0.2, zi = 0.5)
## [1] 1 0 3 2 0 1 2 0 0 0
zi is the zero-inflation probability. With a little bit of effort the code could be made more efficient ...
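As a sanity check on the wrapper idea, the share of zeros in a large sample should be close to zi + (1 - zi) * P(X = 0) for the base distribution. The sketch below is one possible variant, not the answer's exact code: it swaps ifelse for indexed assignment (a small simplification I am assuming is acceptable), and the sample size of 1e5 is arbitrary.

```r
# Variant of the wrapper: zero out a random subset instead of calling ifelse()
ziversion <- function(rfun) {
  function(n, ..., zi) {
    x <- rfun(n, ...)
    x[runif(n) < zi] <- 0   # each draw becomes a structural zero with probability zi
    x
  }
}

rzibinom <- ziversion(rbinom)

set.seed(101)
x <- rzibinom(1e5, size = 10, prob = 0.2, zi = 0.5)

# Zeros come from inflation (zi) or from the binomial itself ((1 - zi) * P(X = 0))
observed <- mean(x == 0)
expected <- 0.5 + (1 - 0.5) * dbinom(0, size = 10, prob = 0.2)
c(observed = observed, expected = expected)
```

The two numbers should agree to about two decimal places at this sample size.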
Closed 4 years ago.
I would like to estimate the parameters of a nonlinear regression model with LAD regression. In essence the LAD estimator is an M-estimator. As far as I know it is not possible to use the robustbase package to do this. How could I use R to do LAD regression? Could I use a standard package?
You could do this with the built-in optim() function.
Make up some data (make sure x is positive, so that a*x^b makes sense - raising negative numbers to fractional powers is problematic):
set.seed(101)
a <- 1; b <- 2
dd <- data.frame(x=rnorm(1000,mean=7))
dd$y <- a*dd$x^b + rnorm(1000,mean=0,sd=0.1)
Define objective function:
objfun <- function(p) {
  pred <- p[1]*dd$x^p[2]   ## a*x^b
  sum(abs(pred-dd$y))      ## least-absolute-deviation criterion
}
Test objective function:
objfun(c(0,0))
objfun(c(-1,-1))
objfun(c(1,2))
Optimize:
o1 <- optim(fn=objfun, par=c(0,0), hessian=TRUE)
You do need to specify starting values, and deal with any numerical/computational issues yourself ...
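One way to pick starting values for this particular model, sketched here as an assumption rather than as part of the original answer: since y is roughly a*x^b, log(y) is roughly log(a) + b*log(x), so an ordinary least-squares fit on the log scale gives a rough first guess. This only works when x and y are positive, as they are in the simulated data. The name o2 is just illustrative.

```r
set.seed(101)
dd <- data.frame(x = rnorm(1000, mean = 7))
dd$y <- 1 * dd$x^2 + rnorm(1000, mean = 0, sd = 0.1)

# Same least-absolute-deviation objective as above
objfun <- function(p) sum(abs(p[1] * dd$x^p[2] - dd$y))

# Rough starting values from a log-log least-squares fit
cf <- coef(lm(log(y) ~ log(x), data = dd))
start <- c(exp(cf[[1]]), cf[[2]])

o2 <- optim(par = start, fn = objfun)
o2$par          # estimates of a and b
o2$convergence  # 0 means optim reported successful convergence
```

Checking o2$convergence (and o2$message, if any) is a cheap first test for the numerical issues mentioned above.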
I'm not sure I know how to compute standard errors: you can use sqrt(diag(solve(o1$hessian))), but I don't know if the standard theory on which this is based still applies ...
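When the asymptotic theory is in doubt, a nonparametric bootstrap is a commonly used alternative for getting rough standard errors. The following is only a sketch of that different technique, not something the answer above proposes: the helper names (lad_obj, fit_lad), the smaller sample size, and B = 50 resamples are all assumptions chosen to keep the example quick.

```r
set.seed(101)
dd <- data.frame(x = rnorm(200, mean = 7))
dd$y <- 1 * dd$x^2 + rnorm(200, mean = 0, sd = 0.1)

# LAD objective that takes the data as an argument so it can be reused on resamples
lad_obj <- function(p, data) sum(abs(p[1] * data$x^p[2] - data$y))

# optim() forwards the extra 'data' argument to lad_obj
fit_lad <- function(data, start) optim(par = start, fn = lad_obj, data = data)$par

est <- fit_lad(dd, start = c(1, 1))

# Refit on B data sets resampled with replacement, starting each fit from est
B <- 50
boot_est <- t(replicate(B, fit_lad(dd[sample(nrow(dd), replace = TRUE), ], start = est)))

# Standard deviation of the bootstrap estimates as a rough standard error for a and b
boot_se <- apply(boot_est, 2, sd)
boot_se
```

In practice B would usually be much larger, and each refit should be checked for convergence.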
Closed 4 years ago.
It seems that the 'SwarmSVM' package used to have a kmeans.predict function, but no longer does.
I would like to split a data frame into training and testing subsets, train a model on the first, and then test it on the second. I am currently only able to use the 'kmeans' function to create clusters, but I can't figure out which functions/packages to use to train and test a model.
k-means is a clustering method, i.e. for unsupervised learning, not supervised, and as such isn't designed to predict on future data, as adding more data would change the centers. Supervised alternatives that can do classification include k-NN, LDA/QDA, and SVMs, but such an approach would require a training set with known classes.
All that said, you could write a predict method for stats::kmeans using dist, as you're presumably really looking for the closest center to the point. Hardly optimized, but functional:
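predict.kmeans <- function(object, newdata){
  centers <- object$centers
  n_centers <- nrow(centers)
  ## distances among all centers and new points, stacked into one matrix
  dist_mat <- as.matrix(dist(rbind(centers, newdata)))
  ## rows = new points, columns = centers; drop = FALSE keeps a matrix for a single new point
  dist_mat <- dist_mat[-seq(n_centers), seq(n_centers), drop = FALSE]
  ## index of the nearest center for each new point
  max.col(-dist_mat)
}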
set.seed(47)
in_train <- sample(nrow(iris), 100)
mod_kmeans <- kmeans(iris[in_train, -5], 3)
test_preds <- predict(mod_kmeans, iris[-in_train, -5])
table(test_preds, iris$Species[-in_train])
#>
#> test_preds setosa versicolor virginica
#>          1      0          0        10
#>          2      0         18         7
#>          3     15          0         0
Alternatively, for a supervised approach, install and load the class package and use its knn function:
install.packages("class")
library(class)
For further help, see:
?knn
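A minimal sketch of what a knn call could look like on the same iris split used earlier. The choice of k = 5 is arbitrary, and class is normally shipped with standard R installations, so the install step may not be needed.

```r
library(class)

set.seed(47)
in_train <- sample(nrow(iris), 100)

# knn() takes the training features, the test features, and the known training labels
pred <- knn(train = iris[in_train, -5],
            test  = iris[-in_train, -5],
            cl    = iris$Species[in_train],
            k     = 5)

# Cross-tabulate predicted against true species on the held-out rows
table(pred, iris$Species[-in_train])
```

Unlike k-means, this uses the known Species labels of the training rows, so the predictions are class names rather than arbitrary cluster numbers.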
Closed 6 years ago.
I was wondering if there is a way to calculate Nagelkerke R-square based upon the output produced. I know that I can calculate McFadden R-square directly. But Nagelkerke produces what we feel is a more accurate strength of the model.
I am not having luck with adding on packages to my setup, if that is the line of thought that you have.
Thanks.
This question is underdefined so I'll do my best assuming that "the output produced" is a glm object. This function should produce the appropriate pseudo-R square you want when applied to a glm object.
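Nagelkerke <- function(mod) {
  ## work on the log scale so the likelihoods cannot underflow to 0 for large N
  ll_full <- as.numeric(logLik(mod))
  ll_intercept <- as.numeric(logLik(update(mod, . ~ 1)))
  N <- length(mod$y)
  ## (1 - (L0/L1)^(2/N)) / (1 - L0^(2/N)), written with log-likelihoods
  r_2 <- (1 - exp((2/N) * (ll_intercept - ll_full))) / (1 - exp((2/N) * ll_intercept))
  return(r_2)
}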
Example:
model <- glm(formula = vs ~ mpg + disp, family = binomial("logit"), data = mtcars)
Nagelkerke(model)
#[1] 0.6574295
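Since the question mentions McFadden's R-square as well, here is a sketch that computes McFadden, Cox-Snell, and Nagelkerke side by side from the same two log-likelihoods, using base R only. The variable names are illustrative; Nagelkerke is the Cox-Snell value rescaled by its maximum attainable value.

```r
model <- glm(vs ~ mpg + disp, family = binomial("logit"), data = mtcars)

ll1 <- as.numeric(logLik(model))                    # fitted model
ll0 <- as.numeric(logLik(update(model, . ~ 1)))     # intercept-only model
N   <- nobs(model)

mcfadden   <- 1 - ll1 / ll0
cox_snell  <- 1 - exp((2 / N) * (ll0 - ll1))
nagelkerke <- cox_snell / (1 - exp((2 / N) * ll0))  # Cox-Snell divided by its upper bound

round(c(McFadden = mcfadden, CoxSnell = cox_snell, Nagelkerke = nagelkerke), 4)
```

No add-on packages are required, which may help given the installation problems mentioned in the question.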
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 years ago.
I have the distribution of a parameter (natural gas mixture composition) expressed in percent. How can I test such data to find the distribution parameters (it should be a gamma, normal, or lognormal distribution) and generate random compositions based on those parameters in R?
This might be a better question for CrossValidated, but:
it is not generally a good idea to choose from among a range of possible distributions according to goodness of fit. Instead, you should choose according to the qualitative characteristics of your data, something like this:
Frustratingly, this chart doesn't actually have the best choice for your data (composition, continuous, bounded between 0 and 1 [or 0 and 100]), which is a Beta distribution (although there are technical issues if you have values of exactly 0 or 100 in your sample).
In R:
## some arbitrary data
z <- c(2,8,40,45,56,58,70,89)
## fit (beta values must be in (0,1), not (0,100), so divide by 100)
(m <- MASS::fitdistr(z/100,"beta",start=list(shape1=1,shape2=1)))
## sample 1000 new values
z_new <- 100*rbeta(n=1000,shape1=m$estimate["shape1"],
                   shape2=m$estimate["shape2"])
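The starting values shape1 = 1, shape2 = 1 above are arbitrary. One common way to get less arbitrary ones, sketched here as an assumption rather than as part of the answer, is the method of moments: match the sample mean and variance to the Beta mean and variance. The names m_hat, v_hat, k, and mom_start are illustrative.

```r
# Same arbitrary data as above, rescaled from (0, 100) to (0, 1)
z <- c(2, 8, 40, 45, 56, 58, 70, 89) / 100

m_hat <- mean(z)
v_hat <- var(z)

# For a Beta(shape1, shape2): mean = shape1 / (shape1 + shape2) and
# variance = mean * (1 - mean) / (shape1 + shape2 + 1); solve for the shapes
k <- m_hat * (1 - m_hat) / v_hat - 1   # estimate of shape1 + shape2
mom_start <- list(shape1 = m_hat * k, shape2 = (1 - m_hat) * k)
mom_start
```

The resulting list has the same form as the start argument in the fitdistr call, so it can be passed there directly. The approach requires the sample variance to be smaller than m_hat * (1 - m_hat); otherwise k is not positive.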
Closed 7 years ago.
I need to do a simple numeric linear interpolation in Delphi. I was thinking of implementing a function myself, but then thought there should already be a library for this. I found nothing on Google.
My problem is simple: I have a dataset with X and Y data, and a new set of X values (Xb) for which I want to find the interpolated Y values (Yin).
R, for example, has a function approx that does this easily. It also allows Xb to be of any length.
Yin <- approx (X, Y, Xb, method = "linear")$y
Is there a statistical library that does this in Delphi? Or should I continue writing my own function (based on approx)?
Linear interpolation of 1D data is very simple:
Find the index i in the X array such that X[i] <= Xb < X[i+1]
(binary search for random access, linear search when Xb changes step by step)
Calculate
Yb = Y[i] + (Y[i+1] - Y[i]) * (Xb - X[i]) / (X[i+1] - X[i])
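The two steps above can be sketched in a few lines. The example is in R rather than Delphi, so that it can be checked against approx; the function name lin_interp is hypothetical, and it assumes X is sorted in strictly increasing order. findInterval does the index search (by binary search) in a vectorized way.

```r
lin_interp <- function(X, Y, Xb) {
  stopifnot(length(X) == length(Y), length(X) >= 2,
            !is.unsorted(X, strictly = TRUE))
  # Step 1: index i with X[i] <= Xb < X[i+1]; all.inside keeps i in 1..(n-1),
  # so Xb equal to the last X value falls in the last interval
  i <- findInterval(Xb, X, all.inside = TRUE)
  # Step 2: the interpolation formula
  Yb <- Y[i] + (Y[i + 1] - Y[i]) * (Xb - X[i]) / (X[i + 1] - X[i])
  # Like approx() with default settings, return NA outside the range of X
  Yb[Xb < X[1] | Xb > X[length(X)]] <- NA
  Yb
}

X  <- c(1, 2, 4, 7)
Y  <- c(10, 20, 15, 30)
Xb <- c(1, 1.5, 3, 7, 8)

lin_interp(X, Y, Xb)   # 10, 15, 17.5, 30, NA
approx(X, Y, Xb, method = "linear")$y
```

A Delphi version would follow the same structure, with an explicit loop or binary search in place of findInterval.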