value of model unknown error on train function from AMORE package - r

I'm learning to use AMORE package for fitting data with an optimal neural network, so i'm following examples on his wiki page and im' trying to run the following code but I couldn't, train function triggered a message error:
modelLookup(method) : value of model unknown
require(AMORE)
## We create two artificial data sets. ''P'' is the input data set. ''target'' is the output.
P <- matrix(sample(seq(-1,1,length=500), 500, replace=FALSE), ncol=1)
target <- P^2 + rnorm(500, 0, 0.5)
## We create the neural network object
net.start <- newff(n.neurons=c(1,3,1),
learning.rate.global=1e-2,
momentum.global=0.5,
error.criterium="LMS",
Stao=NA, hidden.layer="tansig",
output.layer="purelin",
method="ADAPTgdwm")
## We train the network according to P and target.
result <- train(net.start, P, target, error.criterium="LMS", report=TRUE, show.step=100, n.shows=5 )
## Several graphs, mainly to remark that
## now the trained network is is an element of the resulting list.
y <- sim(result$net, P)
plot(P,y, col="blue", pch="+")
points(P,target, col="red", pch="x")
Any suggestion is appreciated!

Related

How to predict in kknn function? library(kknn)

I try to use kknn + loop to create a leave-out-one cross validation for a model, and compare that with train.kknn.
I have split the data into two parts: training (80% data), and test (20% data). In the training data, I exclude one point in the loop to manually create LOOCV.
I think something gets wrong in predict(knn.fit, data.test). I have tried to find how to predict in kknn through the kknn package instruction and online but all the examples are "summary(model)" and "table(validation...)" rather than the prediction on a separate test data. The code predict(model, dataset) works successfully in train.kknn function, so I thought I could use the similar arguments in kknn.
I am not sure if there is such a prediction function in kknn. If yes, what arguments should I give?
Look forward to your suggestion. Thank you.
library(kknn)
for (i in 1:nrow(data.train)) {
train.data <- data.train[-i,]
validation.data <- data.train[i,]
knn.fit <- kknn(as.factor(R1)~., train.data, validation.data, k = 40,
kernel = "rectangular", scale = TRUE)
# train.data + validation.data is the 80% data I split.
}
pred.knn <- predict(knn.fit, data.test) # data.test is 20% data.
Here is the error message:
Error in switch(type, raw = object$fit, prob = object$prob,
stop("invalid type for prediction")) : EXPR must be a length 1
vector
Actually I try to compare train.kknn and kknn+loop to compare the results of the leave-out-one CV. I have two more questions:
1) in kknn: is it possible to use another set of data as test data to see the knn.fit prediction?
2) in train.kknn: I split the data and use 80% of the whole data and intend to use the rest 20% for prediction. Is it an correct common practice?
2) Or should I just use the original data (the whole data set) for train.kknn, and create a loop: data[-i,] for training, data[i,] for validation in kknn? So they will be the counterparts?
I find that if I use the training data in the train.kknn function and use prediction on test data set, the best k and kernel are selected and directly used in generating the predicted value based on the test dataset.
In contrast, if I use kknn function and build a loop of different k values, the model generates the corresponding prediction results based on
the test data set each time the k value is changed. Finally, in kknn + loop, the best k is selected based on the best actual prediction accuracy rate of test data. In short, the best k train.kknn selected may not work best on test data.
Thank you.
For objects returned by kknn, predict gives the predicted value or the predicted probabilities of R1 for the single row contained in validation.data:
predict(knn.fit)
predict(knn.fit, type="prob")
The predict command also works on objects returned by train.knn.
For example:
train.kknn.fit <- train.kknn(as.factor(R1)~., data.train, ks = 10,
kernel = "rectangular", scale = TRUE)
class(train.kknn.fit)
# [1] "train.kknn" "kknn"
pred.train.kknn <- predict(train.kknn.fit, data.test)
table(pred.train.kknn, as.factor(data.test$R1))
The train.kknn command implements a leave-one-out method very close to the loop developed by #vcai01. See the following example:
set.seed(43210)
n <- 500
data.train <- data.frame(R1=rbinom(n,1,0.5), matrix(rnorm(n*10), ncol=10))
library(kknn)
pred.kknn <- array(0, nrow(data.train))
for (i in 1:nrow(data.train)) {
train.data <- data.train[-i,]
validation.data <- data.train[i,]
knn.fit <- kknn(as.factor(R1)~., train.data, validation.data, k = 40,
kernel = "rectangular", scale = TRUE)
pred.kknn[i] <- predict(knn.fit)
}
knn.fit <- train.kknn(as.factor(R1)~., data.train, ks = 40,
kernel = "rectangular", scale = TRUE)
pred.train.kknn <- predict(knn.fit, data.train)
table(pred.train.kknn, pred.kknn)
# pred.kknn
# pred.train.kknn 1 2
# 0 374 14
# 1 9 103

Parameter estimates using FME ODE model fitting in R

I have a system of ODE equations that I am trying to fit to generated data, synthetic or lab. The final product I am interested in is the parameter and it's estimated error. We use the R package FME with modCost and modFit. As an example, a system of ODEs may be defined as such:
eqs <- function (time, y, parms, ...) {
with(as.list(c(parms, y)), {
dP <- k2*PA - k1*A*P # concentration of nucleic acid
dA <- dP # concentration of free protein
dPA <- -dP
list(c(dA,dP,dPA))
}
}
with parameters k1 and k2 and variables A,P and PA. I import the data (not shown) and define the cost function used in modFit
cost <- function(p, data, ...) {
yy <- p[c("A","P","PA")]
pp <- p[c("k1", "k2")]
out <- ode(yy, time, eqs, pp)
modCost(out, data, ...)
}
I set some initial conditions with a parms vector and then do the fitting with
fit <- modFit(f = cost, p = parms, data = dat, weight = "std",
lower = rep(0, 8), upper = c(600,100,600,0.01,0.01), method = "Marq")
I then do a final ode to get the generated fits with best parameters, Bob's your uncle, and boom, estimated parameters. The input numbers don't matter, I hope my process outline is legible for those who use this package.
My issue and question centers around two things: I'm a scientist, a physicist, and the error of the estimated parameters is important to report. Can I generate the estimated error from MFE somehow or is there a separate package for that kind of return?
I don't get your point. You can just use:
summary(fit)
to see the Std. Error.

plot one of 500 trees in randomForest package

How can plot trees in output of randomForest function in same names packages in R? For example I use iris data and want to plot first tree in 500 output tress. my code is
model <-randomForest(Species~.,data=iris,ntree=500)
You can use the getTree() function in the randomForest package (official guide: https://cran.r-project.org/web/packages/randomForest/randomForest.pdf)
On the iris dataset:
require(randomForest)
data(iris)
## we have a look at the k-th tree in the forest
k <- 10
getTree(randomForest(iris[, -5], iris[, 5], ntree = 10), k, labelVar = TRUE)
You may use cforest to plot like below, I have hardcoded the value to 5, you may change as per your requirement.
ntree <- 5
library("party")
cf <- cforest(Species~., data=iris,controls=cforest_control(ntree=ntree))
for(i in 1:ntree){
pt <- prettytree(cf#ensemble[[i]], names(cf#data#get("input")))
nt <- new("Random Forest BinaryTree")
nt#tree <- pt
nt#data <- cf#data
nt#responses <- cf#responses
pdf(file=paste0("filex",i,".pdf"))
plot(nt, type="simple")
dev.off()
}
cforest is another implementation of random forest, It can't be said which is better but in general there are few differences that we can see. The difference is that cforest uses conditional inferences where we put more weight to the terminal nodes in comparison to randomForest package where the implementation provides equal weights to terminal nodes.
In general cofrest uses weighted mean and randomForest uses normal average. You may want to check this .

Using randomForest package in R, how to map Random forest prediction?

enter image description hereI am trying to use randomforest to generate a spatial prediction map.
I developed my model by using random forest regression, but I met a little difficulty in the last step to use the best predictors for building the predictive map. I want to create a map prediction map.
My code:
library(raster)
library(randomForest)
set.seed(12)
s <- stack("Density.tif", "Aqui.tif", "Rech.tif", "Rainfall.tif","Land Use.tif", "Cond.tif", "Nitrogen.tif", "Regions.tif","Soil.tif","Topo.tif", "Climatclass.tif", "Depth.tif")
points <- read.table("Coordonnées3.txt",header=TRUE, sep="\t", dec=",",strip.white=TRUE)
d <- extract(s, points)
rf <-randomForest(nitrate~ . , data=d, importance=TRUE, ntree=500, na.action = na.roughfix)
p <- predict(s, rf)
plot(p)
Sample Data:
> head(points)
LAT LONG
1 -13.057007 27.549580
2 -4.255000 15.233745
3 5.300000 -1.983610
4 7.245675 -4.233336
5 12.096330 15.036016
6 -4.255000 15.233745
The error when I run my short code is:
Error in eval(expr, envir, enclos) : object 'nitrate' not found.
I am guessing the error happens when you fit the model.
Why would there be a variable called nitrate. Given how you create your RasterStack, perhaps there is one called Nitrogen. Either way you can find out by looking at names(s) and colnames(d).
NOTE that your points are not good! They are in reverse order. The order should be (longitude, latitude).
Based on your comments (please edit your question instead), you should
add nitrate the points file (the third column) or something like that. Then do
xy <- points[, 2:1]
nitrate <- points[,3]
Extract points and combine with your observed data
d <- extract(s, xy)
d <- cbind(nitrate=nitrate, d)
Build model and predict
rf <-randomForest(nitrate~ . , data=d, importance=TRUE, ntree=500, na.action = na.roughfix)
p <- predict(s, rf)
It sounds like the error is coming when you are trying to build the forest. It may be most helpful to not use the formula interface. Also, if d is large, then using the formula interface is not advisable. From the help file on randomForest: "For large data sets, especially those with large number of variables, calling randomForest via the formula interface is not advised: There may be too much overhead in handling the formula."
Assuming d$nitrate exists then the solution is randomForest(y = d$nitrate, x = subset(d, select = -nitrate), importance=TRUE, ntree=500, na.action = na.roughfix)

Plot in SVM model (e1071 Package) using DocumentTermMatrix

i trying do create a plot for my model create using SVM in e1071 package.
my code to build the model, predict and build confusion matrix is
ptm <- proc.time()
svm.classifier = svm(x = train.set.list[[0.999]][["0_0.1"]],
y = train.factor.list[[0.999]][["0_0.1"]],
kernel ="linear")
pred = predict(svm.classifier, test.set.list[[0.999]][["0_0.1"]], decision.values = TRUE)
time[["svm"]] = proc.time() - ptm
confmatrix = confusionMatrix(pred,test.factor.list[[0.999]][["0_0.1"]])
confmatrix
train.set.list and test.set.list contains the test and train set for several conditions. train and set factor has the true label for each set. Train.set and test.set are both documenttermmatrix.
Then i tried to see a plot of my data, i tried with
plot(svm.classifier, train.set.list[[0.999]][["0_0.1"]])
but i got the message:
"Error in plot.svm(svm.classifier, train.set.list[[0.999]][["0_0.1"]]) :
missing formula."
what i'm doing wrong? confusion matrix seems good to me even not using formula parameter in svm function
Without given code to run, it's hard to say exactly what the problem is. My guess, given
?plot.svm
which says
formula formula selecting the visualized two dimensions. Only needed if more than two input variables are used.
is that your data has more than two predictors. You should specify in your plot function:
plot(svm.classifier, train.set.list[[0.999]][["0_0.1"]], predictor1 ~ predictor2)

Resources