getting error in mclust-package while working on univariate fit - r

While working on a univariate fit using Mclust I am getting following error:
Error in mstepE(data = as.matrix(data)[initialization$subset, ], z = z, :
row dimension of z should equal data length
I am using the code mentioned in:
https://cran.r-project.org/web/packages/mclust/vignettes/mclust.html#initialisation
This is the code section where I am getting error:
df1 <- dataSample
BIC <- NULL
for(j in 1:20){
rBIC <- mclustBIC(df1, verbose = T,
initialization = list(hcPairs = randomPairs(df1)))
BIC <- mclustBICupdate(BIC, rBIC)
}
summary(BIC)
Following link contains data to be passed to variable 'df1' (file name:dataSample.csv)
https://drive.google.com/open?id=0Bzau9RsRnQreYk9XOWVBSm91b2o4NTQ4RlA2UFdWbDBVOVpR

This is the solution I get from one of the Authors (Prof. Luca Scrucca) for 'mclust' library:
"there was a bug due to the use of automatic subset that clash when hcPairs are provided. I have fixed it in the current dev version of mclust.
Since submission to CRAN won't happen shortly, you may use the following code to avoid the error with the current release of mclust:
rBIC <- mclustBIC(df1, verbose = T,
initialization = list(hcPairs = randomPairs(df1),
subset = 1:NROW(df1)))
When the bug fix will be released, the subset argument could be omitted as it is redundant."
Now, the code is working fine.

Related

What does "invalid type (closure) for variable 'variable1'" mean and how do I fix it?

I am trying to write a function in R, which contains a function from another package. The code works perfectly outside a function.
I am guessing, it might have got to do something with the package I am using (survey).
A self-contained code example:
#activating the package
library(survey)
#getting the dataset into R
tm <- read.spss("tm.sav", to.data.frame = T, max.value.labels = 5)
# creating svydesign object (it basically contains the weights to adjust the variables (~persgew: also a column variable contained in the tm-dataset))
tm_w <- svydesign(ids=~0, weights = ~persgew, data = tm)
#getting overview of the welle-variable
#this variable is part of the tm-dataset. it is needed to execute the following steps
table(tm$welle)
# data manipulation as in: taking the v12d_gr-variable as well as the welle-variable and the svydesign-object to create a longitudinal variable which is transformed into a data frame that can be passed to ggplot
t <- svytable(~v12d_gr+welle, tm_w)
tt <- round(prop.table(t,2)*100, digits=0)
v12d <- tt[2,]
v12d <- as.data.frame(v12d)
this is the code outside the function, working perfectly. since I have to transform quite a few variables in the exact same way, I aim to create a function to save up some time.
The following function is supposed to take a variable that will be transformed as an argument (v12sd2_gr).
#making sure the survey-object is loaded
tm_w <- svydesign(ids=~0, weights = ~persgew, data = data)
#trying to write a function containing the code from above
ltd_zsw <- function(variable1){
t <- svytable(~variable1+welle, tm_w)
tt <- round(prop.table(t,2)*100, digits=0)
var_ltd_zsw <- tt[2,]
var_ltd_zsw <- as.data.frame(var_ltd_zsw)
return(var_ltd_zsw)
}
Calling the function:
#as v12d has been altered already, I am trying to transform another variable v12sd2_gr
v12sd2 <- ltd_zsw(v12sd2_gr)
Console output:
Error in model.frame.default(formula = weights ~ variable1 + welle, data = model.frame(design)) :
invalid type (closure) for variable 'variable1'
Called from: model.frame.default(formula = weights ~ variable1 + welle, data = model.frame(design))
How do I fix it? And what does it mean to dynamically build a formula and reformulating?
PS: I hope it is the appropriate way to answer to the feedback in the comments.
Update: I think I was able to trace the problem back to the argument I am passing (variable1) and I am guessing it has got something to do with the fact, that I try to call a formula within the function. But when I try to call the svytable with as.formula(svytable(~variable1+welle, tm_w))it still doesn't work.
What to do?
I have found a solution to the problem.
Here is the tested and working function:
ltd_test <- function (var, x, string1="con", string2="pro") {
print (table (var))
x$w12d_gr <- ifelse(as.numeric(var)>2,1,0)
x$w12d_gr <- factor(x$w12d_gr, levels = c(0,1), labels = c(string1,string2))
print (table (x$w12d_gr))
x_w <- svydesign(ids=~0, weights = ~persgew, data = x)
t <- svytable(~w12d_gr+welle, x_w)
tt <- round(prop.table(t,2)*100, digits=0)
w12d <- tt[2,]
w12d <- as.data.frame(w12d)
}
The problem appeared to be caused by the svydesgin()-fun. In its output it produces an object which is then used by the formula for svytable()-fun. Thats why it is imperative to first create the x_w-object with svydesgin() and then use the svytable()-fun to create the t-object.
Within the code snippet I posted originally in the question the tm_w-object has been created and stored globally.
Thanks for the help to everyone. I hope this is gonna be of use to someone one day!

Error in eval(parse()) - r unable to find argument input

I am very new to R, and this is my first time of encountering the eval() function. So I am trying to use the med and boot.med function from the following package: mma. I am using it to conduct mediation analysis. med and boot.med take in models such as linear models, and dataframes that specify mediators and predictors and then estimate the mediation effect of each mediator.
The author of the package gives the flexible option of specifying one's own custom.function. From the source code of med, it can be seen that the custom.function is passed to the eval(). So I tried insert the gbmt function as the custom function. However, R kept giving me error message: Error during wrapup: Number of trees to be used in prediction must be provided. I have been searching online for days and tried many ways of specifying the number of trees parameter n.trees, but nothing works (I believe others have raised similar issues: post 1, post 2).
The following codes are part of the source code of the med function:
cf1 = gsub("responseY", "y[,j]", custom.function[j])
cf1 = gsub("dataset123", "x2", cf1)
cf1 = gsub("weights123", "w", cf1)
full.model[[j]] <- eval(parse(text = cf1))
One custom function example the author gives in the package documentation is as follows:
temp1<-med(data=data.bin,n=2,custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
Here the glm is the custom function. This example code works and you can replicate it easily (if you have mma installed and loaded). However when I am trying to use the gbmt function on a survival object, I got errors and here is what my code looks like:
temp1 <- med(data = data.surv,n=2,type = "link",
custom.function = 'gbmt(responseY ~.,
data = dataset123,
distribution = dist,
train_params = start_stop,
cv_folds=10,
keep_gbm_data = TRUE,
)')
Anyone has any idea how the argument about number of trees n.trees can be added somewhere in the above code?
Many thanks in advance!
Update: in order to replicate the example code, please install mma and try the following:
library("mma")
data("weight_behavior") ##binary x #binary y
x=weight_behavior[,c(2,4:14)]
pred=weight_behavior[,3]
y=weight_behavior[,15]
data.bin<-data.org(x,y,pred=pred,contmed=c(7:9,11:12),binmed=c(6,10), binref=c(1,1),catmed=5,catref=1,predref="M",alpha=0.4,alpha2=0.4)
temp1<-med(data=data.bin,n=2) #or use self-defined final function
temp1<-med(data=data.bin,n=2, custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
I changed the custom.function to gbmt and used a survival object as responseY and the error occurs. When I use the gbmt function on my data outside the med function, there is no error.

Error in impute() in R

I'm learning Random Forest. For learning purpose I'm using following link random Forest. I'm trying to run the code given in this link using my R-3.4.1.
But while running the following code for missing value treatment
mp2 <- impute(data = test,target = "target",classes =
list(integer=imputeMedian(), factor=imputeMode()))
I'm getting error message Error in impute(data = test, target = "target", classes = list(integer = imputeMedian(), :
unused argument (data = test)
I modified the code & try running this
imp2 <- impute(test,target = "target",classes = list(integer=imputeMedian(), factor=imputeMode()))
Still I'm getting the error but the error message is different. Can you please help me to solve this issue?
The key mistake (among many mistakes) in that code was that there is no data parameter. The parameter name is obj. When I change that the example code runs.
You also need to set on= or setkey given that the object is a data.table, or simply change it to a data.frame for the imputation step:
imp1 <- impute(obj = as.data.frame(train),target = "target",classes = list(integer=imputeMedian(), factor=imputeMode()))

H2O: Deep learning object not found in function 'predict' for argument 'model'

I'm just testing out h2o, in particular its deep learning capabilities, since I've heard great things about it. So far I've been using the following code:
library(h2o)
library(caret)
data("iris")
# Initiate H2O --------------------
h2o.removeAll() # Clean up. Just in case H2O was already running
h2o.init(nthreads = -1, max_mem_size="22G") # Start an H2O cluster with all threads available
# Get training and tournament data -------------------
a <- createDataPartition(iris$Species, list=FALSE)
training <- iris[a,]
test <- iris[-a,]
# Convert target to factor -------------------
target <- as.factor(iris$Species)
feature_names <- names(train)[1:(ncol(train)-1)]
train_h2o <- as.h2o(train)
test_h2o <- as.h2o(test)
prob <- test[, "id", drop = FALSE]
model_dl <- h2o.deeplearning(x = feature_names, y = "target", training_frame = train_h2o, stopping_metric = "logloss")
h2o.logloss(model_dl)
pred_dl <- predict(model_dl, newdata = tourn_h2o)
prob <- cbind(prob, as.data.frame(pred_dl$p1, col.names = "dl"))
write.table(prob[, c("id", "dl")], paste0(model_dl#model_id, ".csv"), sep = ",", row.names = FALSE, col.names = c("id", "probability"))
The relevant part is really that last line, where I got the following error:
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, :
ERROR MESSAGE:
Object 'DeepLearning_model_R_1494350691427_70' not found in function: predict for argument: model
Has anyone come across this before? Are there any easy solutions to this that I might be missing? Thanks in advance.
EDIT: With the updated code I get the error:
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, :
ERROR MESSAGE:
Illegal argument(s) for DeepLearning model: DeepLearning_model_R_1494428751150_1. Details: ERRR on field: _train: Training data must have at least 2 features (incl. response).
ERRR on field: _stopping_metric: Stopping metric cannot be logloss for regression.
I assume this has to do with the the way the Iris dataset is being read in.
Answer To First Question: Your original error message sounds like one you can get when things get of sync. E.g. maybe you had two sessions running at once, and removed the model in one session; the other session wouldn't know its variables are now out of date. H2O allows multiple connections, but they have to be co-operative. (Flow - see next paragraph - counts as a second session.)
Unless you can make a reproducible example, shrug and put it down to gremlins, and start a new session. Or, go and look at the data/models in Flow (a web server always running on 127.0.0.1:54321 ), and see if something is no longer there.
For your EDIT question, your model is making a regression model, but you are trying to use logloss, so thought you were doing a classification. This is caused by not having set the target variable to be a factor. Your current as.factor() line is on the wrong data, in the wrong place. It should go after your as.h2o() lines:
train_h2o <- as.h2o(training) #Typo fix
test_h2o <- as.h2o(test)
feature_names <- names(training)[1:(ncol(training)-1)] #typo fix
y = "Species" #The column we want to predict
train_h2o[,y] <- as.factor(train_h2o[,y])
test_h2o[,y] <- as.factor(test_h2o[,y])
And then make the model with:
model_dl <- h2o.deeplearning(x = feature_names, y = y, training_frame = train_h2o, stopping_metric = "logloss")
Get predictions with:
pred_dl <- predict(model_dl, newdata = test_h2o) #Typo fix
And compare with correct answer with the prediction using:
cbind(test[, y], as.data.frame(pred_dl$predict))
(BTW, H2O always detects the Iris data set columns as numeric vs. factor perfectly, so the above as.factor() lines are not needed; your error message must've been on your original data.)
StackOverflow advice: test your reproducible example, in full, and copy and paste in that exact code, with the exact error message that code is giving you. Your code had numerous small typos. E.g. train in places, training in others. createDataPartition() was not given; I assumed a = sample(nrow(iris), 0.8*nrow(iris)). test has no "id" column.
Other H2O advice:
Run h2o.removeAll() after h2o.init(). It was giving you an error message if run before. (Personally I avoid that function - it is the kind of thing that gets left in a production script by mistake...)
Consider importing your data into h2o earlier, and using h2o.splitFrame() to split it. I.e. avoid doing things in R that H2O can easily handle.
Avoid having your data in R, at all, if you can. Prefer importFile() over as.h2o().
The thinking beyond both the last points is that H2O will scale beyond the memory of one machine, while R won't. It also is less confusing than trying to keep track of the same thing in two places.
I had the same issue but could resolve it quite easily.
My error occured because I read in an h2o-object before initialising the h2o-cluster. So I trained an h2o-model, saved it, shut down the cluster, loaded in the model and then initialized the cluster once again.
Before reading in the h2o-object, you should already initialize the cluster (h2o.init()).

predict in caret ConfusionMatrix is removing rows

I'm fairly new to using the caret library and it's causing me some problems. Any
help/advice would be appreciated. My situations are as follows:
I'm trying to run a general linear model on some data and, when I run it
through the confusionMatrix, I get 'the data and reference factors must have
the same number of levels'. I know what this error means (I've run into it before), but I've double and triple checked my data manipulation and it all looks correct (I'm using the right variables in the right places), so I'm not sure why the two values in the confusionMatrix are disagreeing. I've run almost the exact same code for a different variable and it works fine.
I went through every variable and everything was balanced until I got to the
confusionMatrix predict. I discovered this by doing the following:
a <- table(testing2$hold1yes0no)
a[1]+a[2]
1543
b <- table(predict(modelFit,trainTR2))
dim(b)
[1] 1538
Those two values shouldn't disagree. Where are the missing 5 rows?
My code is below:
set.seed(2382)
inTrain2 <- createDataPartition(y=HOLD$hold1yes0no, p = 0.6, list = FALSE)
training2 <- HOLD[inTrain2,]
testing2 <- HOLD[-inTrain2,]
preProc2 <- preProcess(training2[-c(1,2,3,4,5,6,7,8,9)], method="BoxCox")
trainPC2 <- predict(preProc2, training2[-c(1,2,3,4,5,6,7,8,9)])
trainTR2 <- predict(preProc2, testing2[-c(1,2,3,4,5,6,7,8,9)])
modelFit <- train(training2$hold1yes0no ~ ., method ="glm", data = trainPC2)
confusionMatrix(testing2$hold1yes0no, predict(modelFit,trainTR2))
I'm not sure as I don't know your data structure, but I wonder if this is due to the way you set up your modelFit, using the formula method. In this case, you are specifying y = training2$hold1yes0no and x = everything else. Perhaps you should try:
modelFit <- train(trainPC2, training2$hold1yes0no, method="glm")
Which specifies y = training2$hold1yes0no and x = trainPC2.

Resources