How to do decision trees in R? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I usually build decision trees in SPSS to get targets from a database (DDBB). I did a bit of research and found three packages available for R: tree, party and rpart. Which one is better for this task?
Thanks!

I have used rpart before, and it is handy. I used it for predictive modeling, splitting the data into a training set and a test set. Here is the code; I hope it gives you some ideas.
library(rpart)
library(rattle)
library(rpart.plot)
### Build the training/test split (70% / 30%)
set.seed(500)  # make the random split reproducible
data(iris)
nobs <- nrow(iris)
train <- sample(nobs, 0.7 * nobs)
test <- setdiff(seq_len(nobs), train)
colnames(iris)
### Variable roles (kept for reference; the rpart() call below uses the formula Species ~ . instead)
input <- c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")
numeric <- c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")
categoric <- NULL
target <-"Species"
risk <- NULL
ident <- NULL
ignore <- NULL
weights <- NULL
# Build the decision tree model on the training rows.
# (Named fit rather than rpart so the rpart() function is not masked.)
fit <- rpart(Species ~ .,
             data = iris[train, ],
             method = "class",
             parms = list(split = "information"),
             control = rpart.control(minsplit = 12,
                                     usesurrogate = 0,
                                     maxsurrogate = 0))
# Generate a textual view of the decision tree model.
print(fit)
printcp(fit)
# Decision tree plots
prp(fit)
dev.new()
fancyRpartPlot(fit, main = "Decision Tree Graph")
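The script above builds the tree on the training rows but never uses `test`. Below is a minimal sketch of how the held-out rows could be scored, assuming the same 70/30 split (the seed 500 is an arbitrary choice) and using only rpart. It also prunes the tree at the complexity parameter with the lowest cross-validated error (`xerror`) from the table that `printcp()` prints; that is one common pruning rule, not the only one.

```r
library(rpart)

# Same 70/30 split as above; the seed is arbitrary
set.seed(500)
data(iris)
nobs <- nrow(iris)
train <- sample(nobs, 0.7 * nobs)
test <- setdiff(seq_len(nobs), train)

fit <- rpart(Species ~ ., data = iris[train, ], method = "class",
             parms = list(split = "information"),
             control = rpart.control(minsplit = 12, usesurrogate = 0,
                                     maxsurrogate = 0))

# Prune back to the cp value with the lowest cross-validated error (xerror)
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)

# Predicted classes for the held-out rows
pred <- predict(pruned, newdata = iris[test, ], type = "class")

# Confusion matrix (rows = predicted, columns = actual) and overall accuracy
conf_mat <- table(predicted = pred, actual = iris$Species[test])
print(conf_mat)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

The exact accuracy depends on the random split and on the cross-validation folds inside `rpart()`, so expect it to vary between seeds.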

Related

Species Distribution modelling incorporating climate data in R [closed]

Closed 2 years ago.
I am trying to use data from GBIF to get an idea of the distribution of Vachellia species across Africa and overlay it with annual rainfall in R.
Any package advice, online resources or tutorials would be greatly appreciated.
Thanks in advance
It's not easy to see what you have in mind, but you could try:
Choropleth: sf, rnaturalearth, ggplot2 (geom_sf), maps
OpenStreetMap: ggmap (get_stamenmap(bbox = bbox, zoom = 5, maptype = "toner-lite"))
Spatial smoothing: mgcv (gam_gp <- bam(target ~ te(lat, long, m = list(c(3,.5)), d=2, bs = 'gp'), data = data_dt, cluster=cl, method = "REML"))
Spatial regression
Other packages that may be useful, off the top of my head: leaflet, tmap, gganimate
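For the spatial-smoothing suggestion, here is a minimal mgcv sketch. It uses made-up point data on a 10 x 10 square (not GBIF records); the column names `long`, `lat` and `target` are placeholders. It fits a single Gaussian-process smooth with `s(..., bs = "gp")` and `m = c(3, 2)`, meaning a Matérn correlation function (type 3 in mgcv's numbering) with range 2 in coordinate units. That range is an illustrative choice; with real data you would pick it relative to your coordinate system.

```r
library(mgcv)

set.seed(1)
n <- 400
# Synthetic stand-in for georeferenced observations (NOT real GBIF data)
dat <- data.frame(long = runif(n, 0, 10), lat = runif(n, 0, 10))
dat$target <- sin(dat$long / 2) + cos(dat$lat / 3) + rnorm(n, sd = 0.2)

# Gaussian-process smooth over the two coordinates, smoothness chosen by REML
fit_gp <- gam(target ~ s(lat, long, bs = "gp", m = c(3, 2)),
              data = dat, method = "REML")
summary(fit_gp)

# Predict on a regular grid, e.g. to draw a surface under the point layer
grid <- expand.grid(long = seq(0, 10, length.out = 50),
                    lat = seq(0, 10, length.out = 50))
grid$fit <- predict(fit_gp, newdata = grid)
```

The gridded predictions can then be plotted with any of the mapping packages listed above.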
See:
using spatial data in R
Geocomputation with R
Species distribution modelling
R-spatial
Nice example of spatial smoothing with GAMs
Have fun

Nonlinear LAD regression in R [closed]

Closed 4 years ago.
I would like to estimate the parameters of a nonlinear regression model with LAD regression. In essence the LAD estimator is an M-estimator. As far as I know it is not possible to use the robustbase package to do this. How could I use R to do LAD regression? Could I use a standard package?
You could do this with the built-in optim() function.
Make up some data (make sure x is positive, so that a*x^b makes sense - raising negative numbers to fractional powers is problematic):
set.seed(101)
a <- 1; b <- 2
dd <- data.frame(x=rnorm(1000,mean=7))
dd$y <- a*dd$x^b + rnorm(1000,mean=0,sd=0.1)
Define objective function:
objfun <- function(p) {
  pred <- p[1]*dd$x^p[2] ## a*x^b
  sum(abs(pred-dd$y))    ## least-absolute-deviation criterion
}
Test objective function:
objfun(c(0,0))
objfun(c(-1,-1))
objfun(c(1,2))
Optimize:
o1 <- optim(fn=objfun, par=c(0,0), hessian=TRUE)
You do need to specify starting values, and deal with any numerical/computational issues yourself ...
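I'm not sure how to compute standard errors. You can use sqrt(diag(solve(o1$hessian))), but I don't know whether the standard theory behind it still applies here: the LAD objective has kinks wherever a residual is zero, so the numerically estimated Hessian may be unreliable.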
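Because it is unclear whether the Hessian-based standard errors are valid for LAD, one assumption-light alternative is a nonparametric bootstrap: resample rows, refit, and take the standard deviation of the refitted estimates. The sketch below does this on the same simulated data. Two choices are mine, not the answer's: starting values come from a least-squares fit of `log(y) ~ log(x)` (sensible here only because x and y are positive), and 200 bootstrap replicates is an arbitrary number.

```r
set.seed(101)
a <- 1; b <- 2
dd <- data.frame(x = rnorm(1000, mean = 7))
dd$y <- a * dd$x^b + rnorm(1000, mean = 0, sd = 0.1)

# LAD objective for the model y = p[1] * x^p[2], evaluated on a given data frame
lad_obj <- function(p, data) {
  pred <- p[1] * data$x^p[2]
  sum(abs(pred - data$y))
}

# Starting values from log(y) = log(a) + b * log(x), fitted by least squares
cf <- coef(lm(log(y) ~ log(x), data = dd))
start <- unname(c(exp(cf[1]), cf[2]))

# optim() passes the extra named argument 'data' on to lad_obj
fit <- optim(par = start, fn = lad_obj, data = dd)

# Nonparametric bootstrap: refit on resampled rows, starting from the full-data estimate
n_boot <- 200
boot_par <- t(replicate(n_boot, {
  idx <- sample(nrow(dd), replace = TRUE)
  optim(par = fit$par, fn = lad_obj, data = dd[idx, ])$par
}))

est <- fit$par
boot_se <- apply(boot_par, 2, sd)
cbind(estimate = est, boot_se = boot_se)
```

Percentile intervals, e.g. `apply(boot_par, 2, quantile, c(0.025, 0.975))`, can be read off the same bootstrap matrix.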

What package to use in R for Kmeans prediction? [closed]

Closed 4 years ago.
It seems that the 'SwarmSVM' package used to have a kmeans.predict function, but no longer does.
I would like to divide a data frame into training and testing subsets to train a model and then test it. Currently I can only use the 'kmeans' function to create clusters, but I can't figure out which functions/packages to use to train and test a model.
k-means is a clustering method, i.e. it is for unsupervised rather than supervised learning, so it isn't designed to predict on new data: adding more data would change the centers. Supervised alternatives that can do classification include k-NN, LDA/QDA, and SVMs, but they require a training set with known classes.
All that said, you could write a predict method for stats::kmeans using dist, as you're presumably really looking for the closest center to the point. Hardly optimized, but functional:
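predict.kmeans <- function(object, newdata){
  centers <- object$centers
  n_centers <- nrow(centers)
  # distances between all rows (centers first, then the new points)
  dist_mat <- as.matrix(dist(rbind(centers, newdata)))
  # keep new-point rows x center columns; drop = FALSE keeps a matrix for a single new row
  dist_mat <- dist_mat[-seq(n_centers), seq(n_centers), drop = FALSE]
  # index of the nearest center for each new point
  max.col(-dist_mat)
}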
set.seed(47)
in_train <- sample(nrow(iris), 100)
mod_kmeans <- kmeans(iris[in_train, -5], 3)
test_preds <- predict(mod_kmeans, iris[-in_train, -5])
table(test_preds, iris$Species[-in_train])
#>
#> test_preds setosa versicolor virginica
#> 1 0 0 10
#> 2 0 18 7
#> 3 15 0 0
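The `dist()` call above also computes distances between every pair of new points, which grows quadratically with `newdata`. A variant that only computes new-point-to-center distances is sketched below. It relies on the identity ||x - c||² = ||x||² - 2 x·c + ||c||² and drops ||x||² because it is the same for every center, so the nearest center does not change. This is my own illustrative rewrite, not a function from any package; the brute-force comparison is there only to show it agrees with a direct distance computation.

```r
# Nearest-center assignment for stats::kmeans objects (S3 method for predict())
predict.kmeans <- function(object, newdata, ...) {
  centers <- object$centers
  x <- as.matrix(newdata)
  # Squared distance up to a per-row constant: -2 * x.c + ||c||^2
  d2 <- -2 * x %*% t(centers) +
    matrix(rowSums(centers^2), nrow(x), nrow(centers), byrow = TRUE)
  max.col(-d2, ties.method = "first")
}

set.seed(47)
in_train <- sample(nrow(iris), 100)
mod_kmeans <- kmeans(iris[in_train, -5], 3)
test_preds <- predict(mod_kmeans, iris[-in_train, -5])

# Brute-force check: nearest center by full squared Euclidean distance
brute <- apply(as.matrix(iris[-in_train, -5]), 1, function(row) {
  which.min(colSums((t(mod_kmeans$centers) - row)^2))
})
table(test_preds, iris$Species[-in_train])
```

As in the answer, the cluster numbers are arbitrary labels, so the table has to be read by matching clusters to species rather than along the diagonal.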
install.packages("class")
library(class)
Use the knn function.
For further help, see
?knn
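A minimal sketch of `class::knn()` on the same iris split used in the k-means answer follows. `k = 5` is an arbitrary choice. Unlike k-means, this needs the known species labels for the training rows.

```r
library(class)  # recommended package, ships with standard R installations

set.seed(47)
in_train <- sample(nrow(iris), 100)
train_x <- iris[in_train, -5]
test_x <- iris[-in_train, -5]
train_y <- iris$Species[in_train]

# Classify each test row by majority vote among its k = 5 nearest training rows
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

table(pred, actual = iris$Species[-in_train])
mean(pred == iris$Species[-in_train])  # test-set accuracy
```

Because k-NN is distance-based, you may want to standardize the predictors (e.g. with `scale()`) when they are on very different scales.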

R: Relationship between McFadden & Nagelkerke [closed]

Closed 6 years ago.
I was wondering if there is a way to calculate the Nagelkerke R-squared from the model output. I know I can calculate McFadden's R-squared directly, but we feel Nagelkerke's gives a more accurate measure of the model's strength.
I haven't had any luck installing add-on packages in my setup, in case that was your line of thought.
Thanks.
This question is underspecified, so I'll assume that "the output produced" is a glm object. This function should compute the Nagelkerke pseudo-R-squared you want when applied to a glm object.
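Nagelkerke <- function(mod) {
  ll_full <- as.numeric(logLik(mod))                  # log-likelihood of the fitted model
  ll_null <- as.numeric(logLik(update(mod, . ~ 1)))   # log-likelihood of the intercept-only model
  N <- length(mod$y)
  # Cox & Snell R^2 = 1 - (L_null / L_full)^(2/N), on the log scale so exp() cannot underflow
  r2_cs <- 1 - exp((2 / N) * (ll_null - ll_full))
  # Nagelkerke divides by the maximum attainable value, 1 - L_null^(2/N)
  r2_max <- 1 - exp((2 / N) * ll_null)
  r2_cs / r2_max
}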
Example:
model <- glm(formula = vs ~ mpg + disp, family = binomial("logit"), data = mtcars)
Nagelkerke(model)
#[1] 0.6574295
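On the title's question of how the two measures relate, both are functions of the same two log-likelihoods: the fitted model's (ll_full) and the intercept-only model's (ll_null), with N observations. McFadden's is 1 - ll_full / ll_null. Cox & Snell's is 1 - exp(2 (ll_null - ll_full) / N). Nagelkerke's rescales Cox & Snell by its maximum attainable value, 1 - exp(2 ll_null / N), so that it can reach 1. The sketch below computes all three from one glm. `pseudo_r2` is a hypothetical helper name, and like the answer above it assumes a 0/1 response, so that N = length(mod$y).

```r
# McFadden, Cox & Snell and Nagelkerke pseudo-R^2 from the same two log-likelihoods
pseudo_r2 <- function(mod) {
  ll_full <- as.numeric(logLik(mod))                 # fitted model
  ll_null <- as.numeric(logLik(update(mod, . ~ 1)))  # intercept-only model
  N <- length(mod$y)
  mcfadden <- 1 - ll_full / ll_null
  cox_snell <- 1 - exp((2 / N) * (ll_null - ll_full))
  nagelkerke <- cox_snell / (1 - exp((2 / N) * ll_null))
  c(McFadden = mcfadden, CoxSnell = cox_snell, Nagelkerke = nagelkerke)
}

model <- glm(vs ~ mpg + disp, family = binomial("logit"), data = mtcars)
r2 <- pseudo_r2(model)
r2
```

Since the Nagelkerke denominator is below 1, Nagelkerke's value is always at least as large as Cox & Snell's. The three measures are on different scales, so they are not directly interchangeable.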

predicting the index number based on spiral data [closed]

Closed 6 years ago.
Each coordinate (x, y) corresponds to an index. For example, (0,0) has index 0 and (1,0) has index 1.
X <-c(0,1,1,0,-1,-1,-1,0,1,2,2,2,2,1,0,-1,-2)
Y <- c(0,0,1,1,1,0,-1,-1,-1,-1,0,1,2,2,2,2,2)
Z<- as.factor(0:16)
df <- data.frame(X,Y,Z)
library(e1071)
model <- svm(Z ~ X + Y, data = df, kernel = "radial")
predict(model, df[1:4, 1:2]) #1.647681 3.859354 5.908940 4.151374
# The model does not even predict its own training data well.
I need help building a predictive model that can predict the index.
What is the index corresponding to (12, -22)?
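The 17 points appear to follow a deterministic counterclockwise square spiral (right, up, left, down, widening by one ring each lap). If so, a formula may serve better than a learned model: with one example per class, a classifier like the SVM above has nothing to generalize from. A sketch, assuming the pattern continues indefinitely (`spiral_index` is a hypothetical name):

```r
# Index of (x, y) on a counterclockwise square spiral with (0,0) -> 0, (1,0) -> 1
spiral_index <- function(x, y) {
  n <- max(abs(x), abs(y))        # ring the point lies on
  if (n == 0) return(0)
  start <- (2 * n - 1)^2 - 1      # last index used by ring n - 1
  if (x == n && y > -n) {
    start + (y + n)               # right side, moving up
  } else if (y == n) {
    start + 2 * n + (n - x)       # top side, moving left
  } else if (x == -n) {
    start + 4 * n + (n - y)       # left side, moving down
  } else {
    start + 6 * n + (x + n)       # bottom side, moving right
  }
}

# Check against the 17 points from the question
X <- c(0, 1, 1, 0, -1, -1, -1, 0, 1, 2, 2, 2, 2, 1, 0, -1, -2)
Y <- c(0, 0, 1, 1, 1, 0, -1, -1, -1, -1, 0, 1, 2, 2, 2, 2, 2)
idx <- mapply(spiral_index, X, Y)
all(idx == 0:16)

spiral_index(12, -22)
```

Under that assumption, `spiral_index(12, -22)` returns 2014. The point lies on the bottom edge of ring 22, whose indices start after (2 * 22 - 1)^2 - 1 = 1848.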
