Using the R gmm package for prediction

I would like to use the R gmm package for prediction.
Question #1: Is it possible to use gmm to predict? (The word "predict" does not appear in the manual.)
Question #2: If prediction with gmm is possible, how does one do it?
I am looking for the simplest possible example; with svm, for instance, it would be:
model <- svm(train, trainLabels)
testpred <- predict(model, test)
Question #3:
I cannot even reproduce the examples mentioned in the manual.
Page 24 shows the code:
## CAPM test with GMM
data(Finance)
r <- Finance[1:300, 1:10]
rm <- Finance[1:300, "rm"]
rf <- Finance[1:300, "rf"]
z <- as.matrix(r-rf)
t <- nrow(z)
zm <- rm-rf
h <- matrix(zm, t, 1)
res <- gmm(z ~ zm, x = h)
summary(res)
but even after installing the gmm package, R cannot find the Finance data set:
> data(Finance)
Warning message:
In data(Finance) : data set ‘Finance’ not found
What am I missing?

Installing the package is not enough; you need to load it before requesting its data. Put the following commands at the top:
library(gmm)
data(Finance)

Related

Why do R and PROCESS give different results for a mediation model (one is significant, the other is not)?

As a newcomer who has just started with R, I am confused by the result of my mediation analysis.
My model is simple: IV 'T1Incivi', mediator 'T1Envied', DV 'T2PSRB'. I ran the same model in SPSS using PROCESS, and the indirect effect was not significant there; in R, however, it is significant. Since I am not that familiar with R, could you please check whether there is anything wrong with my code, and tell me why the result is significant in R but not in SPSS? Thanks a bunch!
My code in R:
# X predict M
apath <- lm(T1Envied~T1Incivi, data=dat)
summary(apath)
# X and M predict Y
bpath <- lm(T2PSRB~T1Envied+T1Incivi, data=dat)
summary(bpath)
# Bootstrapping for indirect effect
getindirect <- function(dataset,random){
d=dataset[random,]
apath <- lm(T1Envied~T1Incivi, data=d)
bpath <- lm(T2PSRB~T1Envied+T1Incivi, data=dat)
indirect <- apath$coefficients["T1Incivi"]*bpath$coefficients["T1Envied"]
return(indirect)
}
library(boot)
set.seed(6452234)
Ind1 <- boot(data=dat,
statistic=getindirect,
R=5000)
boot.ci(Ind1,
conf = .95,
type = "norm")  # PSRB as outcome
In your function getindirect, all linear regressions should be based on the resampled data in d.
However, the line
bpath <- lm(T2PSRB~T1Envied+T1Incivi, data=dat)
refers to dat, the full original data set, which should not be used inside this function. As a result the b path is identical in every bootstrap replicate and only the a path varies. That alone can explain the inconsistent results.
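A corrected version of the function might look like the sketch below. The data frame is simulated purely so that the example runs on its own: the variable names are taken from the question, but the sample size and effect sizes are made up, and R is reduced to 1000 replicates to keep it quick.

```r
library(boot)

# Simulated stand-in for the real data (assumption: made-up effects, n = 200)
set.seed(6452234)
n <- 200
T1Incivi <- rnorm(n)
T1Envied <- 0.5 * T1Incivi + rnorm(n)
T2PSRB   <- 0.4 * T1Envied + 0.1 * T1Incivi + rnorm(n)
dat <- data.frame(T1Incivi, T1Envied, T2PSRB)

# Both regressions use the resampled rows d, never the full data set dat
getindirect <- function(dataset, random) {
  d <- dataset[random, ]
  apath <- lm(T1Envied ~ T1Incivi, data = d)
  bpath <- lm(T2PSRB ~ T1Envied + T1Incivi, data = d)
  unname(coef(apath)["T1Incivi"] * coef(bpath)["T1Envied"])
}

Ind1 <- boot(data = dat, statistic = getindirect, R = 1000)
boot.ci(Ind1, conf = 0.95, type = "norm")
```

Passing type = "perc" to boot.ci gives a percentile interval instead of the normal-approximation one, which can matter when comparing against other software.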

Cross Validation without caret in R

Since the model I am using (from the fastNaiveBayes package) is not among the models built into the caret package, I am trying to do k-fold cross-validation in R without caret. Does anyone have a solution for this?
Edit:
Here is my code so far, based on what I have learned about doing CV without caret. I am fairly sure something is wrong with it.
library(fastNaiveBayes)
k<- 10
outs <- NULL
proportion <- 0.8
for (i in 1:10)
{
split <- sample(1:nrow(data), round(proportion*nrow(data)))
traindata <- data[split,]
testdata <- data[-split,]
y <- traindata$Label
x <- traindata[,0 - 15:ncol(traindata)]
model <- fnb.train(x, y=y, priors = NULL, laplace=0,
distribution = fnb.detect_distribution(x, nrows = nrow(x)))
model
test1 <- testdata[,0 - 15:ncol(testdata)]
pred <- predict(model, newdata = test1)
cm<- table(testdata$Label, pred)
print(confusionMatrix(cm))
}
It gave me 10 different results, and I don't think that is how cross-validation is supposed to work. I'm an entry-level R learner and would greatly appreciate any guidance.
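For comparison: the loop above draws a fresh random 80/20 split on every iteration, which is repeated hold-out rather than k-fold. In k-fold cross-validation the rows are partitioned once into k folds and each fold serves as the test set exactly once; getting one result per fold is expected, and they are usually averaged. A minimal sketch in base R follows. The iris data and MASS::lda are stand-ins chosen only so the example runs without extra packages; they are assumptions, not part of the question, and the fnb.train call would take the place of lda.

```r
set.seed(1)
data_cv <- iris   # stand-in data; Species plays the role of Label
k <- 10

# Assign every row to exactly one of the k folds
folds <- sample(rep(1:k, length.out = nrow(data_cv)))

acc <- numeric(k)
for (i in 1:k) {
  test_idx  <- which(folds == i)
  traindata <- data_cv[-test_idx, ]   # the other k - 1 folds
  testdata  <- data_cv[test_idx, ]    # the held-out fold

  model <- MASS::lda(Species ~ ., data = traindata)   # stand-in classifier
  pred  <- predict(model, newdata = testdata)$class

  acc[i] <- mean(pred == testdata$Species)   # accuracy on this fold
}

acc         # one accuracy per fold
mean(acc)   # cross-validated estimate
```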

Using randomForest package in R, how to map Random forest prediction?

I am trying to use randomForest to generate a spatial prediction map.
I built my model with random forest regression, but I ran into difficulty in the last step: using the best predictors to build the predictive map.
My code:
library(raster)
library(randomForest)
set.seed(12)
s <- stack("Density.tif", "Aqui.tif", "Rech.tif", "Rainfall.tif","Land Use.tif", "Cond.tif", "Nitrogen.tif", "Regions.tif","Soil.tif","Topo.tif", "Climatclass.tif", "Depth.tif")
points <- read.table("Coordonnées3.txt",header=TRUE, sep="\t", dec=",",strip.white=TRUE)
d <- extract(s, points)
rf <-randomForest(nitrate~ . , data=d, importance=TRUE, ntree=500, na.action = na.roughfix)
p <- predict(s, rf)
plot(p)
Sample Data:
> head(points)
LAT LONG
1 -13.057007 27.549580
2 -4.255000 15.233745
3 5.300000 -1.983610
4 7.245675 -4.233336
5 12.096330 15.036016
6 -4.255000 15.233745
The error when I run my short code is:
Error in eval(expr, envir, enclos) : object 'nitrate' not found.
I am guessing the error happens when you fit the model.
Why would there be a variable called nitrate? Given how you create your RasterStack, perhaps there is one called Nitrogen. Either way, you can find out by looking at names(s) and colnames(d).
Note that your points are in the wrong order: the columns should be (longitude, latitude), not (latitude, longitude).
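As a small illustration of the column order, using the first rows of the points table shown in the question, the coordinates can be selected by name so that it does not matter where the columns sit in the file:

```r
# First rows of the points table as shown in the question
points <- data.frame(
  LAT  = c(-13.057007, -4.255000, 5.300000),
  LONG = c(27.549580, 15.233745, -1.983610)
)

# Select by name so that x (longitude) comes first and y (latitude) second
xy <- points[, c("LONG", "LAT")]
head(xy)
```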
Based on your comments (please edit your question instead), you should
add nitrate to the points file (as the third column) or something like that. Then do
xy <- points[, 2:1]
nitrate <- points[,3]
# Extract points and combine with your observed data
d <- extract(s, xy)
d <- cbind(nitrate=nitrate, d)
# Build model and predict
rf <-randomForest(nitrate~ . , data=d, importance=TRUE, ntree=500, na.action = na.roughfix)
p <- predict(s, rf)
It sounds like the error occurs when you try to build the forest. It may help to avoid the formula interface, which is also not advisable when d is large. From the help file on randomForest: "For large data sets, especially those with large number of variables, calling randomForest via the formula interface is not advised: There may be too much overhead in handling the formula."
Assuming d$nitrate exists, the solution is randomForest(y = d$nitrate, x = subset(d, select = -nitrate), importance=TRUE, ntree=500, na.action = na.roughfix)

R nls2 "invalid model formula" fitting gamma

I am working in R 3.1.3 and RStudio.
I want to fit gamma distributions that include a location parameter to data in order to 'shift' the x values to a new origin.
I am trying to use nls2 with the following code:
library(nls2)
theVals <- data.frame(c(26.76,24.3,34.63,38.05,25.56,21.98,20.62,34,26.75,27.79,28.4,33.31,29.26,18.65,22.77,25.72,25.86,25.32,24.08,27.68,26.2,26.16,25.34,26.91,22.6,23.94,23.3,22.34,41.25,24.83,21.66,30.47,26.53,27.74,29.41,25.65,36.05,18.29,27.2,22.99,25.8,21.9,25.27,30.29,22.72,26.49,18.75,33.57,20.87,21.82,20.73,28.59,19.64,33.21,28.94,27.98,22.2,25.95,30.64,26.56,32.11,26.05,20.66,28.64,22.4,22.4,31.91,21.82,26.82,20.77,24.12,28.83,23.07,26.5,21.14,27.29,19.61,25.28,28.6,27.16,22.46,18.19,22.35,23.79,26.32,26.5,27.39,23.29,25.79,26.35,26.38,24.98,20,37.15,25.61,21.39,21.63,24.12,24.4,27.72,42.74,25.33,17.79,21.33,38.65,25.22,28.39,21.61,23.38,25.25,24.88,23.34,26.26,21.96,22.18,24.78,21.15,24.65,21.23,31.9,28.66,27.66,18.08,22.99,22.46,21.69,28.21,29.8,25.72,27.09,20.02,21.26,21.34,27.18,25.48,20.51,20.96,20.07,20.89,27.56,24.43,21.35,24.3,28.1,26.53,29.03,30.08,19.19,21.27,26.18,23.79,36.52,24.81,26.36,24.44,20.99,19.84,23.32,18.21,26.6,21.48,23.21,29.93,23.4,30.9,23.58,21.58,18.38,25.13,23.03,22.73,24.42,22.89,43.44,23.47,27.09,29.96,23.94,28.51,25.74,28.54,30.41,22.7,29.19,25.66,23.89,21.9,36.26,22.61,19.68,27.85,28.83,28.6,22.68,19.07,20.22,24.35,19.09,37.66,22.55,24.25,22.61,26.09,24.42,26.11,32.15,25.78,21.94,23.93,30.19,23.53,26.49,30.48,25.02,28.14,23.43,20.22,17.57,21.68,36.07,24.92,32.48,32.04,25.86,26.69,22.41,26.4,22.72,28.32,22.82,32.73,28.08,29.16,36.18,21.61,23.9,28.8,23.24,24.89,22.17,27.7,34.75,26.74,29.62,17.46,20.06,22.23,22.09,24.05,22.37,24.98,33.26,30.95,26.24,22.16,30.97,27.22,23.81,42.16,28.2,28.37,26.1,26.28,27.44,20.52,35.02,21.43,23.14,18.37,28.86,25.18,28.15,19.97,24.2,25.91,28.92,23.95,19.48,28.57,21.77,23.46,37.51,22.13,37.18,21.83,23.8,18.93,27.43,26.51,25.64,22.15,22.27,29.21,24.45,18.81,22.62,25.16,24.62,30.53,28.77,27.11,22.07,28.95,26.54,39.23,31.9,33,29.93,24.37,26.4,21.33,25.37,25.9,21.25,19.06,25.69,26.44,26.09,23.24,27.04,20.09,28.73,37.06,32.45,22.93,22.7,24.82,31.23,23.25,22.94,20.47,25.7,23.92,34.71,26.5,20.28,21.78,26.54,30.34,21.97,27.38,27.64,34.08,22.05,27.21,20.11,25.79,33.22,31.24,29.93,21.81,30.68,32.46,30.45,22.62,28.83,33.95,27.12,45.51,25.23,29.61,29.09))
colnames(theVals) <- c("theGamma")
fo <- theGamma ~ dgamma(theX-location, shape=theShape, scale=theScale )
startList <- list(location=5, theShape=3, theScale=3)
theGamma=NULL
theX <- 0:50
mo1 <- nls2(fo, start=startList, data=theVals)
I get an error "invalid model formula in ExtractVars".
Curiously, dgamma itself works fine:
location<- 5
theShape <- 3
theScale <- 3
dgamma(theX-location, shape=theShape, scale=theScale )
I have searched Stack Overflow and other sites, but can't find an answer to this one.
Any ideas?
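One point that may be relevant: nls and nls2 fit a regression of a response on predictors, whereas here there is only a single sample of values whose distribution is to be estimated. A different technique, not taken from the post, is to fit the shifted gamma by maximum likelihood with base R's optim. The sketch below uses simulated data (an assumption, so that it runs on its own) and the start values from the question; negloglik is a hypothetical helper name. The three-parameter gamma likelihood can be badly behaved for some samples, so treat this as a starting point only.

```r
set.seed(42)
# Simulated sample (assumption): gamma with shape 3, scale 3, shifted by 15
x <- 15 + rgamma(400, shape = 3, scale = 3)

# Negative log-likelihood of the shifted gamma; par = (location, shape, scale)
negloglik <- function(par, x) {
  location <- par[1]
  shape    <- par[2]
  scale    <- par[3]
  # Outside the valid region return a large penalty instead of NaN
  if (location >= min(x) || shape <= 0 || scale <= 0) return(1e10)
  -sum(dgamma(x - location, shape = shape, scale = scale, log = TRUE))
}

start <- c(location = 5, shape = 3, scale = 3)   # start values as in the question
fit <- optim(start, negloglik, x = x, control = list(maxit = 5000))
fit$par
```

With the real data, x would be theVals$theGamma. The location estimate is constrained to lie below the smallest observation, since the gamma density is zero for negative arguments.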

Naive bayes in R

I am getting an error while running a naive Bayes classifier in R. I am using the following code:
mod1 <- naiveBayes(factor(X20) ~ factor(X1) + factor(X2) +factor(X3) +factor(X4)+factor(X5)+factor(X6)+factor(X7)
+factor(X8)+factor(X9)
+factor(X10)+factor(X11)+ factor(X12)+factor(X13)+factor(X14)
+factor(X15)
+factor(X16)+factor(X17)
+factor(X18)+factor(X19),data=intent.test)
res1 <- predict(mod1)$posterior
The first part of this code runs fine, but when it tries to predict the posterior probabilities it throws the following error:
**Error in as.data.frame(newdata) :
argument "newdata" is missing, with no default**
I tried running something like
res1 <- predict(mod1,new_data=intent.test)$posterior
but this also gives the same error.
You seem to be using e1071::naiveBayes, whose predict method requires a newdata argument, hence the errors raised by both versions of your code: the first omits the argument, and the second misspells it as new_data. (You can check the source code of the predict.naiveBayes function on CRAN; its second line expects newdata, as in newdata <- as.data.frame(newdata).) Also, as pointed out by @Vincent, you are better off converting your variables to factors before calling the NB algorithm, although this has nothing to do with the errors above.
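A minimal sketch of the e1071 call with newdata spelled out is shown here. It uses the built-in Titanic table rather than the asker's intent.test data, which is an assumption made only so that the example is self-contained. Note that e1071's predict returns class labels by default and a matrix of posterior probabilities with type = "raw"; there is no $posterior element to extract.

```r
library(e1071)

# Expand the built-in Titanic contingency table to one row per passenger
tit <- as.data.frame(Titanic)
tit <- tit[rep(seq_len(nrow(tit)), tit$Freq), c("Class", "Sex", "Age", "Survived")]

mod <- naiveBayes(Survived ~ ., data = tit)

# newdata must be supplied under exactly that name
cls  <- predict(mod, newdata = tit)                 # predicted class labels
post <- predict(mod, newdata = tit, type = "raw")   # posterior probabilities
head(post)
```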
With NaiveBayes from the klaR package, this problem does not occur. E.g.,
data(spam, package="ElemStatLearn")
library(klaR)
# set up a training sample
train.ind <- sample(1:nrow(spam), ceiling(nrow(spam)*2/3), replace=FALSE)
# apply NB classifier
nb.res <- NaiveBayes(spam ~ ., data=spam[train.ind,])
# predict on holdout units
nb.pred <- predict(nb.res, spam[-train.ind,])
# but this also works on the training sample, i.e. without using a `newdata`
head(predict(nb.res))
