Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
Improve this question
I would like to estimate the parameters of a nonlinear regression model with LAD regression. In essence the LAD estimator is an M-estimator. As far as I know it is not possible to use the robustbase package to do this. How could I use R to do LAD regression? Could I use a standard package?
You could do this with the built-in optim() function
Make up some data (make sure x is positive, so that a*x^b makes sense - raising negative numbers to fractional powers is problematic):
set.seed(101)
a <- 1; b <- 2
dd <- data.frame(x=rnorm(1000,mean=7))
dd$y <- a*dd$x^b + rnorm(1000,mean=0,sd=0.1)
Define objective function:
objfun <- function(p) {
pred <- p[1]*dd$x^p[2] ## a*x^b
sum(abs(pred-dd$y)) ## least-absolute-deviation criterion
}
Test objective function:
objfun(c(0,0))
objfun(c(-1,-1))
objfun(c(1,2))
Optimize:
o1 <- optim(fn=objfun, par=c(0,0), hessian=TRUE)
You do need to specify starting values, and deal with any numerical/computational issues yourself ...
I'm not sure I know how to compute standard errors: you can use sqrt(diag(solve(o1$hessian))), but I don't know if the standard theory on which this is based still applies ...
Related
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 1 year ago.
Improve this question
What are some different packages in R that contain in built function to simulate the Zero inflated distributions, related to the popular discrete models like the Poisson, Negative Binomial, COM-Poisson, Poisson Inverse Gaussian, Poisson-Lindley except the 'iZid' package.
Have a look at the CRAN Task View on Distributions. This is a curated look at R packages that help you work with distributions. You can search the page for "inflated" to quickly find the relevant parts.
If you have an existing function that generates random deviates from a non-zero-inflated distribution, you can write a wrapper (or decorator) that creates a zero-inflated-deviate simulator. The only assumption I've made here is that the first argument of the original function is called n and specifies the number of random deviates to pick.
For example, if we want to extend rbinom to return zero-inflated binomial deviates ...
ziversion <- function(rfun) {
f <- function(n, ..., zi) {
x <- rfun(n, ...)
x <- ifelse(runif(n) < zi, 0, x)
return(x)
}
return(f)
}
rzibinom <- ziversion(rbinom)
set.seed(101)
rzibinom(10, size = 10, prob = 0.2, zi = 0.5)
## [1] 1 0 3 2 0 1 2 0 0 0
zi is the zero-inflation probability. With a little bit of effort the code could be made more efficient ...
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
Improve this question
It seems that the 'SwarmSVM' package used to have a kmeans.predict function, but no longer does.
I would like to divide a dataframe to training+testing subsets to train a model and then test it. I am currently only able to use the 'kmeans' function to create clusters, but I can't figure out which functions/packages to use to train and test a model.
k-means is a clustering method, i.e. for unsupervised learning, not supervised, and as such isn't designed to predict on future data, as adding more data would change the centers. Supervised alternatives that can do classification include k-NN, LDA/QDA, and SVMs, but such an approach would require a training set with known classes.
All that said, you could write a predict method for stats::kmeans using dist, as you're presumably really looking for the closest center to the point. Hardly optimized, but functional:
predict.kmeans <- function(object, newdata){
centers <- object$centers
n_centers <- nrow(centers)
dist_mat <- as.matrix(dist(rbind(centers, newdata)))
dist_mat <- dist_mat[-seq(n_centers), seq(n_centers)]
max.col(-dist_mat)
}
set.seed(47)
in_train <- sample(nrow(iris), 100)
mod_kmeans <- kmeans(iris[in_train, -5], 3)
test_preds <- predict(mod_kmeans, iris[-in_train, -5])
table(test_preds, iris$Species[-in_train])
#>
#> test_preds setosa versicolor virginica
#> 1 0 0 10
#> 2 0 18 7
#> 3 15 0 0
install.packages("class")
library(class)
use the knn function
for further help see use
?knn
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Improve this question
I was wondering if there is a way to calculate Nagelkerke R-square based upon the output produced. I know that I can calculate McFadden R-square directly. But Nagelkerke produces what we feel is a more accurate strength of the model.
I am not having luck with adding on packages to my setup, if that is the line of thought that you have.
Thanks.
This question is underdefined so I'll do my best assuming that "the output produced" is a glm object. This function should produce the appropriate pseudo-R square you want when applied to a glm object.
Nagelkerke <- function(mod) {
l_full <- exp(logLik(mod))
l_intercept <- exp(logLik( update(mod, . ~ 1) ))
N <- length(mod$y)
r_2 <- (1 - (l_intercept / l_full)^(2/N)) / (1 - l_intercept^(2/N))
return( as.numeric(r_2) )
}
Example:
model <- glm(formula = vs ~ mpg + disp, family = binomial("logit"), data = mtcars);
Nagelkerke(model);
#[1] 0.6574295
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
I usually do decissions trees in SPSS to get targets from a DDBB, I did a bit of research and found that there are three packages: tree, party and rpart that are available for R, but which is better for that task?
Thanks!
I have used rpart before, which is handy. I have used for predictive modeling by splitting training and test set. Here is the code. Hope this will give you some idea...
library(rpart)
library(rattle)
library(rpart.plot)
### Build the training/validate/test...
data(iris)
nobs <- nrow(iris)
train <- sample(nrow(iris), 0.7*nobs)
test <- setdiff(seq_len(nrow(iris)), train)
colnames(iris)
### The following variable selections have been noted.
input <- c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")
numeric <- c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")
categoric <- NULL
target <-"Species"
risk <- NULL
ident <- NULL
ignore <- NULL
weights <- NULL
#set.seed(500)
# Build the Decision Tree model.
rpart <- rpart(Species~.,
data=iris[train, ],
method="class",
parms=list(split="information"),
control=rpart.control(minsplit=12,
usesurrogate=0,
maxsurrogate=0))
# Generate a textual view of the Decision Tree model.
print(rpart)
printcp(rpart)
# Decision Tree Plot...
prp(rpart)
dev.new()
fancyRpartPlot(rpart, main="Decision Tree Graph")
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Improve this question
I need to make a simple numeric linear interpolation in Delphi, was thinking of implementing a function, but then thought better and I think that should already be some library. I found nothing on google.
My problem is simple, I have a dataset with X and Y data, and other new dataset X data (Xb) that will be the basis for finding new Y data (Yin) interpolated.
In R, for example, have a function approx that accomplishes this easily. This function also allows Xb length is of any size.
Yin <- approx (X, Y, Xb, method = "linear")$y
There is some statistical library to do this in Delphi? Or continue to write my function (based on approx)?
Linear interpolation of 1D-data is very simple:
Find such index in X-array, that X[i] <= Xb < X[i+1]
(binary search for case of random access, linear search for case of step-by step Xb changing)
Calculate
Yb = Y[i] + (Y[i+1] - Y[i]) * (Xb - X[i]) / (X[i+1] - X[i])