Error in mgcv R package, depending on the R version

The following code works perfectly with R 2.15.3 and the mgcv package:
library(mgcv)
foo <- c(0.08901294, 0.04221170, 0.01608613, 0.04389676, 0.04102295, 0.03552413, 0.06571099, 0.11004966, 0.08380553, 0.09181121, 0.07422538,
         0.11494897, 0.18523257, 0.13809043, 0.13569868, 0.13433534, 0.16056145, 0.15559133, 0.22381149, 0.13998797, 0.02831030)
infant.gamfit <- gam(foo ~ s(c(1:21)), family = gaussian(link = "logit"))
But with R 3.1.1 and 3.1.2, it produces the following error:
Error in reformulate(pav) : 'termlabels' must be a character vector of length at least one
I don't understand this error.
The values in foo are just one example; I have the same problem with other values. Fixing k in the spline doesn't change anything.
That wouldn't be a problem if I didn't need to run this at large scale on a supercomputer, where all the versions of R produce the same error.
(For reference, the R versions I tested on the supercomputer were:
R/2.15.3-foss-2014a-default;
R/2.15.3-foss-2014a-st;
R/2.15.3-intel-2014a-default;
R/3.0.2-foss-2014a-default)
So it's not a supercomputer problem, but rather a problem with using mgcv under different versions of R.
I couldn't find any answer online.
Thank you in advance for your help.
Guillaume

It looks like recent versions of mgcv::gam can be a bit fragile when your predictor is an expression, as opposed to a named variable. This works:
x <- 1:21
gam(foo~s(x), family=gaussian(link = "logit"))
As does this:
x <- 1:21
gam(foo~s(x + 0), ...)
But this doesn't:
x <- rep(0, 21)
gam(foo~s(x + 1:21), ...)
In general, I'd suggest precomputing your predictors when using gam.
PS: a Gaussian family with a logit link isn't very sensible, but that's another issue.
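A minimal sketch of the "precompute your predictors" advice, assuming mgcv is installed. The index is stored as a named column of a data frame (`dat` and `idx` are names chosen here for illustration, not from the question), so the smooth term refers to a variable rather than to the expression `c(1:21)`. The Gaussian/logit family is kept only to mirror the question.

```r
library(mgcv)

foo <- c(0.08901294, 0.04221170, 0.01608613, 0.04389676, 0.04102295, 0.03552413,
         0.06571099, 0.11004966, 0.08380553, 0.09181121, 0.07422538, 0.11494897,
         0.18523257, 0.13809043, 0.13569868, 0.13433534, 0.16056145, 0.15559133,
         0.22381149, 0.13998797, 0.02831030)

# Precompute the predictor and keep it next to the response in a data frame
# (hypothetical names: dat, idx).
dat <- data.frame(foo = foo, idx = seq_along(foo))

# The smooth now refers to a named variable instead of an expression.
infant.gamfit <- gam(foo ~ s(idx), family = gaussian(link = "logit"), data = dat)

# Predictions for new index values only need a column with the same name.
new_fit <- predict(infant.gamfit, newdata = data.frame(idx = c(5.5, 10.5)),
                   type = "response")
```

Keeping the predictor as a named column also makes `predict()` on new data straightforward, since `newdata` only has to supply a column called `idx`.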

Related

Error in eval(parse()) in R: unable to find argument input

I am very new to R, and this is the first time I have encountered the eval() function. I am trying to use the med and boot.med functions from the mma package to conduct a mediation analysis. med and boot.med take models (such as linear models) and data frames that specify the mediators and predictors, and then estimate the mediation effect of each mediator.
The package author provides the flexible option of specifying one's own custom.function. The source code of med shows that custom.function is passed to eval(). So I tried to insert the gbmt function as the custom function. However, R kept giving me the error message: Error during wrapup: Number of trees to be used in prediction must be provided. I have been searching online for days and have tried many ways of specifying the number-of-trees parameter n.trees, but nothing works (I believe others have raised similar issues: post 1, post 2).
The following lines are part of the source code of the med function:
cf1 = gsub("responseY", "y[,j]", custom.function[j])
cf1 = gsub("dataset123", "x2", cf1)
cf1 = gsub("weights123", "w", cf1)
full.model[[j]] <- eval(parse(text = cf1))
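To see what those four lines do, here is a base-R sketch that runs the same substitutions on a toy string. The objects `x2`, `y`, `w` and `j` are hypothetical stand-ins for the ones med() builds internally, and `lm()` replaces the package's `glm()` example; these are assumptions for illustration only. Only the three placeholders `responseY`, `dataset123` and `weights123` are rewritten, so everything else in the string (including any extra argument such as `n.trees`) is passed to the fitting call exactly as typed.

```r
# Hypothetical stand-ins for the objects med() creates before calling eval().
x2 <- data.frame(a = c(1, 2, 3, 4, 5, 6), b = c(2, 1, 4, 3, 6, 5))
y  <- matrix(c(1.1, 2.3, 2.9, 4.2, 5.1, 5.8), ncol = 1)
w  <- rep(1, 6)
j  <- 1
custom.function <- "lm(responseY ~ ., data = dataset123, weights = weights123)"

# The same three substitutions as in med(): placeholders become object names.
cf1 <- gsub("responseY", "y[,j]", custom.function[j])
cf1 <- gsub("dataset123", "x2", cf1)
cf1 <- gsub("weights123", "w", cf1)
print(cf1)

# The rewritten string is parsed and evaluated as ordinary R code.
fit <- eval(parse(text = cf1))
print(coef(fit))
```

Whether med() later calls other functions (for example predict()) with arguments of its own is not visible from the lines quoted above.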
One custom function example the author gives in the package documentation is as follows:
temp1<-med(data=data.bin,n=2,custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
Here glm is the custom function. This example code works and is easy to replicate (if you have mma installed and loaded). However, when I try to use the gbmt function on a survival object I get errors. Here is what my code looks like:
temp1 <- med(data = data.surv,n=2,type = "link",
custom.function = 'gbmt(responseY ~.,
data = dataset123,
distribution = dist,
train_params = start_stop,
cv_folds=10,
keep_gbm_data = TRUE,
)')
Does anyone have an idea how the number-of-trees argument n.trees can be added somewhere in the code above?
Many thanks in advance!
Update: in order to replicate the example code, please install mma and try the following:
library("mma")
data("weight_behavior") ##binary x #binary y
x=weight_behavior[,c(2,4:14)]
pred=weight_behavior[,3]
y=weight_behavior[,15]
data.bin<-data.org(x,y,pred=pred,contmed=c(7:9,11:12),binmed=c(6,10), binref=c(1,1),catmed=5,catref=1,predref="M",alpha=0.4,alpha2=0.4)
temp1<-med(data=data.bin,n=2) #or use self-defined final function
temp1<-med(data=data.bin,n=2, custom.function = 'glm(responseY~.,data=dataset123,family="quasibinomial",
weights=weights123)')
When I change custom.function to gbmt and use a survival object as responseY, the error occurs. When I use the gbmt function on my data outside the med function, there is no error.

How to get the prediction output from glmmPQL to work with performance using R?

Problem
I am using R 3.3.3 on Windows 10 (64-bit). I fit the following model with glmmPQL:
library(MASS)
library(nlme)
library(dplyr)
model<-glmmPQL(a ~ b + c + d, data = trainingDataSet, family = binomial, random = list( ~ 1 | e), correlation = corAR1())
The predicted values are computed as follows:
p <- predict(model, newdata = testingDataSet, type = "response", level = 0)  # (1.0)
The output it gives is as follows:
I then try to measure the performance of this output with the prediction() function from the ROCR package:
library(ROCR)
pr <- prediction(p, testingDataSet$a)  # (1.1)
This gives the following error:
Error in prediction(p, testingDataSet$a) : Format of predictions is invalid. (1.2)
I have successfully used prediction() with other models (glm, svm, nn), where the predictions look something like this:
model <- glm(a ~ b + c + e, family = binomial(link = 'logit'), data = trainingDataSet)
p <- predict(model, newdata = testingDataSet, type = "response")  # (1.3)
Attempts
I believe the fix is to get p into the same format as the output of 1.3. I have tried the following in R, without success.
I have tried converting p from 1.0 with as.numeric(), as.list() and other functions, to make it look like the p object from 1.3. In other words, I believe the format is the reason things are not working for me.
No matter what mutate() call or type conversion I try, I can't get it into the form of 1.3 shown in the image, especially with the index as the column labels.
I've come up empty-handed on Stack Overflow and in the R help files. When I run class(p) on both objects, both are reported as numeric.
Question
Given the above, can someone tell me how to get the output from glmmPQL into a format that the prediction function can use?
In other words, how can I make the output of 1.0 match the output of 1.3 in R? My attempts have failed, and I would be very grateful if someone more experienced with R could point out where I am going wrong.

If you use as.numeric(p) you'll get the values you want; the only remaining difference is that the GLM output has names. You can add these with something like:
p <- as.numeric(p)
names(p) <- 1:length(p)
If this doesn't work, you can use str(p) to examine the structure of the object in more depth.
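A base-R sketch of why two objects can both report class numeric yet behave differently. It assumes the glmmPQL prediction carries an extra attribute (the `"label"` attribute below is a hypothetical example) and that prediction() checks its input with something like is.vector(). is.vector() returns FALSE for a numeric vector carrying any attribute other than names, and as.numeric() drops all attributes, names included.

```r
# Hypothetical stand-in for predict(model, ..., level = 0): a numeric vector
# with an extra attribute attached (the attribute name is an assumption).
p <- c(0.21, 0.74, 0.93)
attr(p, "label") <- "Predicted values"

print(class(p))      # "numeric", the same class as the glm predictions
print(is.vector(p))  # FALSE: an attribute other than names is present

# as.numeric() strips every attribute, names included, so add names back.
p_clean <- as.numeric(p)
names(p_clean) <- seq_along(p_clean)

print(is.vector(p_clean))  # TRUE
str(p_clean)
```

Running str() on the original object is what reveals such an attribute, which is why it is a useful next step if as.numeric() alone does not help.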

Extracting path coefficients of piecewise SEM (structural equation model)

I'm constructing a piecewise structural equation model using the piecewiseSEM package in R (Lefcheck - https://cran.r-project.org/web/packages/piecewiseSEM/vignettes/piecewiseSEM.html)
I have already created the model set and evaluated the model fit, so the model itself works. The data also fit the model (p = 0.528).
However, I cannot extract the path coefficients.
This is the error I get: Error in cbind(Xlarge, Xsmall) : number of rows of matrices must match (see arg 2)
I have already tried the following, without success:
standardising my data, because of the warning: Some predictor variables are on very different scales: consider rescaling
adapting my data (removing some NA values)
This is my model list:
predatielijst = list(
lmer(plantgrootte ~ gapfraction + olsen_P + (1|plot_ID), data = d),
glmer(piek1 ~ gapfraction + olsen_P + plantgrootte + (1|plot_ID),
family = poisson, data = d),
glmer(predatie ~ piek1 + (1|plot_ID), family = binomial, data = d)
)
with predatie being a binary variable (yes or no) and all the other variables (gapfraction, plantgrootte, olsen_P and piek1) continuous.
Thanks in advance!

Try installing the development version:
library(devtools)
install_github("jslefche/piecewiseSEM#2.0")
Replace list with psem and run coefs() or summary(). That will likely get rid of your error; if not, open an issue on GitHub!
WARNING: this will overwrite your current version from CRAN. You will need to reinstall from CRAN to get version 1.4 back.

Try using lme (from the nlme package) instead of glmer. As far as I understand, the problem here seems to be that lmer does not provide p-values (while lme does).
Hope this works.

Is lme4:::profile.merMod() supposed to work with glmer models?

Is lme4:::profile.merMod() supposed to work with glmer models? What about Negative Binomial models?
I have a negative binomial model that throws this error when I run the profile function on it, profile(model12), to get standard errors for my random effects:
Error in names(opt) <- profnames(fm, signames) : 'names' attribute [2] must be the same length as the vector [1]
Am I missing something or is this a problem with lme4?
I should mention that I'm using glmer(..., family = negative.binomial(theta = lme4:::est_theta(poissonmodel))) rather than glmer.nb(), because I had issues using the update() function with glmer.nb().

I can reproduce your error with the CRAN version (1.1-8). There has been some improvement in glmer.nb in the most recent development version, so if you have compilation tools installed I would definitely do devtools::install_github("lme4/lme4") and try again. In addition, update() works better with NB models now, so you might not need your workaround.
This works fine with version 1.1-9:
library("lme4")
m1 <- glmer.nb(TICKS~cHEIGHT+(1|BROOD),data=grouseticks)
pp <- profile(m1)
lattice::xyplot(pp)
Note, by the way, that your est_theta approach only performs the first step or two of an iterative procedure in which theta and the other parameters are optimized alternately:
m0 <- glmer(TICKS~cHEIGHT+(1|BROOD),data=grouseticks,family=poisson)
m2 <- update(m0,
family = negative.binomial(theta = lme4:::est_theta(m0)))
cbind(glmer.nb=fixef(m1),pois=fixef(m0),fakenb=fixef(m2))
## glmer.nb pois fakenb
## (Intercept) 0.58573085 0.56835340 0.57759498
## cHEIGHT -0.02520326 -0.02521386 -0.02520702
profile() works OK on this model too, at least in the devel version ...

Evaluating weka classifier J48 with missing values in test set, R RWeka

I get an error when evaluating a simple test set with evaluate_Weka_classifier. I'm trying to learn how the interface from R to Weka works in RWeka, but I still don't get this.
library("RWeka")
iris_input <- iris[1:140,]
iris_test <- iris[-(1:140),]
iris_fit <- J48(Species ~ ., data = iris_input)
evaluate_Weka_classifier(iris_fit, newdata = iris_test, numFolds=5)
No problems here, as expected (of course it is a silly test, with no random holdout data etc.). But now I want to simulate (a lot of) missing data, so I set Petal.Width to missing:
iris_test$Petal.Width <- NA
evaluate_Weka_classifier(iris_fit, newdata = iris_test, numFolds=5)
Which gives the error:
Error in .jcall(evaluation, "S", "toSummaryString", complexity) :
java.lang.IllegalArgumentException: Can't have more folds than instances!
Edit: this error says I don't have enough instances, but I have 10.
Edit: if I export the data with write.arff, Weka can read it. After changing Petal.Width {} to Petal.Width numeric so that the two files are exactly the same, it works in Weka.
Is there a flaw in my thinking? From reading Practical Machine Learning Tools and Techniques, this seems legitimate. Maybe I just have to tell RWeka that I want to use fractions when a split uses a missing variable?
Thanks!

The issue is that you need to tell J48() what to do with missing values.
library(RWeka)
?J48
# pertinent output (Usage section):
J48(formula, data, subset, na.action,
control = Weka_control(), options = NULL)
na.action tells R what to do with missing values. Following up on na.action, you will find that "The ‘factory-fresh’ default is na.omit". Since Petal.Width is missing in every test row, na.omit drops all ten rows, so of course there are not enough instances!
Instead of leaving na.action at its default (na.omit), I changed it as follows:
iris_fit<-J48(Species~., data = iris_input, na.action=NULL)
and it works like a charm!
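A base-R sketch of the reasoning above, independent of Weka: with Petal.Width set to NA in all ten test rows, na.omit removes every row, while na.pass keeps them all. It assumes RWeka builds the model frame for newdata using the model's na.action, which is an inference from the error message rather than something checked in the RWeka source. The answer uses na.action = NULL; ?lm documents NULL as "no action", and na.pass is used here as the explicit equivalent.

```r
iris_test <- iris[-(1:140), ]
iris_test$Petal.Width <- NA

# Default handling: every row contains an NA, so nothing is left to evaluate.
n_omit <- nrow(model.frame(Species ~ ., data = iris_test, na.action = na.omit))

# No NA handling: all ten rows are kept, missing values included.
n_pass <- nrow(model.frame(Species ~ ., data = iris_test, na.action = na.pass))

print(c(na.omit = n_omit, na.pass = n_pass))
```

Zero remaining rows is fewer than the five folds requested, which matches the "Can't have more folds than instances!" message.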