Hosmer-Lemeshow error - r

This is probably a simple coding issue but I can't for the life of me get it.
Using this code for a logit:
glm(formula = cbind(Found, Missing) ~ Male + Age, family = binomial,
data = table.5.15)
I can't get the Hosmer-Lemeshow to work:
hosmerlem(miss.logit$cbind(Found,Missing), fitted(miss.logit))
Error in cbind(1 - y, y) : attempt to apply non-function
I realize this is a problem with having the cbind in my logit model.

Assuming you're using some implementation of the Hosmer Lemeshow test which roughly resembles Frank Harrell's,
it seems most likely that your mistake is a basic syntactical issue:
miss.logit$cbind(Found,Missing),
Your $ operator is not smart enough to reference both Found and Missing as objects which resolve in the scope of miss.logit. For instance:
> x <- data.frame('n'=1:26, 'l'=letters[1:26])
> x$cbind(n, l)
Error: attempt to apply non-function
The issue is that R thinks cbind is a function that lives in x which you're trying to evaluate on two globals n and l. Even if I made cbind an element of x, n and l would need to be referenced within x as well.
I can correct this code by using the with statement instead, or just basic array subsetting.
> x[, c('n', 'l')] ## works (best)
> with(x, cbind(n, l)) ## works
> cbind(x$n, x$l) ## works (worst)

FYI for anyone else.
I'm an R newbie and I had a similar problem today...
My code:
hosmerlem(y = MM_LOGIT$sick_or_not, yhat = fitted(MM_LOGIT))
Error in model.frame.default(formula = cbind(1 - y, y) ~ cutyhat) :
variable lengths differ (found for 'cutyhat')
I had to change the y= bit to point towards the name of my data, rather than the logistic/glm output...
hosmerlem(y=MYDATA$sick_or_not, yhat=fitted(MM_LOGIT))
$chisq
[1] 20.24864
$p.value
[1] 0.0003960473

Related

Allowing for aliased coefficients when running `grangertest()` in R

I'm currently trying to run a granger causality analysis in R/R Studio. I am receiving errors about aliased coefficients when using the function grangertest(). From my understanding, this occurs because there is perfect multicolinearity between the variables.
Due to having a very large number of pairwise comparisons (e.g. 200+), I would like to simply run the granger with the aliased coefficients as per normal rather than returning an error. According to one answer here, the solution is or was to add set singular.ok=TRUE, but either I am doing it incorrectly the answer s out of date. I've tried checking the documentation, but have come up empty. Any help would be appreciated.
library(lmtest)
x <- c(0,1,2,3)
y <- c(0,3,6,9)
grangertest(x,y,1) # I want this to run successfully even if there are aliased coefficients.
grangertest(x,y,1, singular.ok=TRUE) # this also doesn't work
"Error in waldtest.lm(fm, 2, ...) :
there are aliased coefficients in the model"
Additionally is there a way to flag x and y are actually aliased variables? There seem to be some answers like here but I'm having issues getting it working properly.
alias((x~ y))
Thanks in advance.
After some investigation and emailing the creator of the grangertest package, they sent me this solution. The solution should run on aliased variables when granger test does not. When the variables are not aliased, the solution should give the same values as the normal granger test.
library(lmtest)
library(dynlm)
# Some data that is multicolinear
x <- c(0,1,2,3,4)
y <- c(0,3,6,9,12)
# Some data that is not multicolinear
# x <- c(0,125,200,230,777)
# y <- c(0,3,6,9,200)
# Convert to time series (this is an important step)
x=ts(x)
y=ts(y)
# This will run even when the data is multicolinear (but also when it is not)
# and is functionally the same as running the granger test (which by default uses the waldtest
m1 = dynlm(x ~ L(x, 1:1) + L(y, 1:1))
m2 = dynlm(x ~ L(x, 1:1))
result <-anova(m1, m2, test="F")
# This will fail if the data is multicolinear or aliased but should give the same results as the anova otherwise (F value and P value etc)
#grangertest(y,x,1)

error message in R : if (nomZ %in% coded) { : argument is of length zero

I'm very new to R (and stackoverflow). I've been trying to conduct a simple slopes analysis for my continuous x dichotomous regression model using lmres, and simpleSlope from the pequod package.
My variables:
SLS - continuous DV
csibdiff - continuous predictor (I already manually centered the variable with another code)
culture - dichotomous moderator
newmod<-lmres(SLS ~ csibdiff*culture, data=sibdat2)
newmodss <-simpleSlope(newmod, pred="csibdiff", mod1="culture")
However, after running the simpleSlope function, I get this error message:
Error in if (nomZ %in% coded) { : argument is of length zero
I don't understand the nomZ part but I assume something was wrong with my variables. What does this mean? I don't have a nomZ named thing in my data at all. None of my variables are null class (I checked them with the is.null() function), and I didn't seem to have accidentally deleted the contents of the variable (I checked with the table() function).
If anyone else can suggest another function/package that I can do a simple slope analysis in, as well, I'd appreciate it. I've been stuck on this problem for a few days now.
EDIT: I subsetted the relevant variables into a csv file.
https://www.dropbox.com/s/6j82ky457ctepkz/sibdat2.csv?dl=0
tl;dr it looks like the authors of the package were thinking primarily about continuous moderators; if you specify mod1="cultureEuropean" (i.e. to match the name of the corresponding parameter in the output) the function returns an answer (I have no idea if it's sensible or not ...)
It would be a service to the community to let the maintainers of the pequod package (maintainer("pequod")) know about this issue ...
Read data and replicate error:
sibdat2 <- read.csv("sibdat2.csv")
library(pequod)
newmod <- lmres(SLS ~ csibdiff*culture, data=sibdat2)
newmodss <- simpleSlope(newmod, pred="csibdiff", mod1="culture")
Check the data:
summary(sibdat2)
We do have some NA values in csibdiff, so try removing these ...
sibdat2B <- na.omit(sibdat2)
But that doesn't actually help (same error as before).
Plot the data to check for other strangeness
library(ggplot2); theme_set(theme_bw())
ggplot(sibdat2B,aes(csibdiff,SLS,colour=culture))+
stat_sum(aes(size=factor(..n..))) +
geom_smooth(method="lm")
There's not much going on here, but nothing obviously wrong either ...
Use traceback() to see approximately where the problem is:
traceback()
3: simple.slope(object, pred, mod1, mod2, coded)
2: simpleSlope.default(newmod, pred = "csibdiff", mod1 = "culture")
1: simpleSlope(newmod, pred = "csibdiff", mod1 = "culture")
We could use options(error=recover) to jump right to the scene of the crime, but let's try step-by-step debugging instead ...
debug(pequod:::simple.slope)
As we go through we can see this:
nomZ <- names(regr$coef)[pos_mod]
nomZ ## character(0)
And looking a bit farther back we can see that pos_mod is also a zero-length integer. Farther back, we see that the code is looking through the parameter names (row names of the variance-covariance matrix) for the name of the modifier ... but it's not there.
debug: pos_pred_mod1 <- fI + grep(paste0("\\b", mod1, "\\b"), jj[(fI +
1):(fI + fII)])
Browse[2]> pos_mod
## integer(0)
Browse[2]> jj[1:fI]
## [[1]]
## [1] "(Intercept)"
##
## [[2]]
## [1] "csibdiff"
##
## [[3]]
## [1] "cultureEuropean"
Browse[2]> mod1
## [1] "culture"
The solution is to tell simpleSlope to look for a variable that is there ...
(newmodss <- simpleSlope(newmod, pred="csibdiff", mod1="cultureEuropean"))
## Simple Slope:
## simple slope standard error t-value p.value
## Low cultureEuropean (-1 SD) -0.2720128 0.2264635 -1.201133 0.2336911
## High cultureEuropean (+1 SD) 0.2149291 0.1668690 1.288011 0.2019241
We do get some warnings about NaNs produced -- you'll have to dig farther yourself to see if you need to worry about them.

bic.glm predict error: "newdata is missing variables"

I've spent a lot of time trying to solve this error and searching for solutions without any luck, and I thank you in advance for your help.
I'm trying to create predicted values from the coefficients created via BMA. Whenever I run my predict function, I am getting a "newdata is missing variables" error. All variables included in the original model are present in the new dataframe, so I'm not quite sure what the problem is. I'm working with a fairly large dataset with many independent variables. I'm fairly new to R, so I apologize if this is an obvious question!
y<-df$y
x<-df
x$y<-NULL
bic.glm<-bic.glm(x, y, strict=FALSE, OR=20, glm.family="binomial", factortype=TRUE)
predict(bic.glm.bwt, x)
I've also tried it this way:
bic.glm<-bic.glm(y~., data=df, strict=FALSE, OR=20, glm.family="binomial", factortype=TRUE)
predict(bic.glm, x)
And also with creating a new data frame...
bic.glm<-bic.glm(y~., data=df, strict=FALSE, OR=20, glm.family="binomial", factortype=TRUE)
newdata<-x
predict(bic.glm, newdata=x)
Each time I receive the same error message:
Error in predict.bic.glm(bic.glm, newdata=x) :
newdata is missing variables
Any help is very much appreciated!
First, it is bad practice to call your LHS the same name as the function call. You may be masking the function bic.glm from further use.
That minor comment aside... I just encountered the same error. After some digging, it seems that predict.bic.glm checks the names vs. the mle matrix in the bic.glm object. The problem is that somewhere in bic.glm, if factors are used, those names get a '.x' or just '.' appended at the end. Therefore, whenever you use factors you will get this error.
I communicated this to package maintainers. Meanwhile, you can work around the bug by renaming the column names of the mle object, like this (using your example):
fittedBMA<-bic.glm(y~., data=df)
colnames(fittedBMA$mle)=colnames(model.matrix(y~., data=df)) ### this is the workaround
predict(fittedBMA,newdata=x) ### should work now, if x has the same variables as df
Okay, so first look at the usage section in the cran documentation for BMA::bic.glm.
here
This example is instructive for a data.frame.
Example 2 (binomial)
library(MASS)
data(birthwt)
y <- birthwt$lo
x <- data.frame(birthwt[,-1])
x$race <- as.factor(x$race)
x$ht <- (x$ht>=1)+0
x <- x[,-9]
x$smoke <- as.factor(x$smoke)
x$ptl <- as.factor(x$ptl)
x$ht <- as.factor(x$ht)
x$ui <- as.factor(x$ui)
bic.glm.bwT <- bic.glm(x, y, strict = FALSE, OR = 20,
glm.family="binomial",
factor.type=TRUE)
predict( bic.glm.bwT, newdata = x)
bic.glm.bwF <- bic.glm(x, y, strict = FALSE, OR = 20,
glm.family="binomial",
factor.type=FALSE)
predict( bic.glm.bwF, newdata = x)

function works (boot.stepAIC ) but throws an error inside another function - environment issue?

I realized a strange behavior today with in my R code.
I tried a package {boot.StepAIC} which includes a bootstrap function for the results of the stepwise regression with the AIC. However I do not think the statistical background is here the problem (I hope so).
I can use the function at the top level of R. This is my example code.
require(MASS)
require(boot.StepAIC)
n<-100
x<-rnorm(n); y<-rnorm(n,sd=2); z<-rnorm(n,sd=3); res<-x+y+z+rnorm(n,sd=0.1)
dat.test<-as.data.frame(cbind(x,y,z,res))
form.1<-as.formula(res~x+y+z)
boot.stepAIC(lm(form.1, dat.test),dat.test) # should be OK - works at me
However, I wanted to wrap that in an own function. I pass the data and the formula to that function. But I get an error within boot.stepAIC() saying:
the model fit failed in 100 bootstrap samples Error in
strsplit(nam.vars, ":") : non-character argument
# custom function
fun.boot.lm.stepAIC<-function(dat,form) {
if(!inherits(form, "formula")) stop("No formula given")
fit.lm<-lm(formula=form,data=dat)
return(boot.stepAIC(object=fit.lm,data=dat))
}
fun.boot.lm.stepAIC(dat=dat.test,form=form.1)
# results in an error
So where is the mistake? I suppose it must have something to do with the local and global environment, doesn't it?
Using do.call as in anova test fails on lme fits created with pasted formula provides the answer.
boot.stepAIC doesn't have access to form when run within a function; that can be recreated in the global environment like this; we see that lm is using form.1 as the formula, and removing it makes boot.stepAIC fail.
> form.1<-as.formula(res~x+y+z)
> mm <- lm(form.1, dat.test)
> mm$call
lm(formula = form.1, data = dat.test)
> rm(form.1)
> boot.stepAIC(mm,dat.test)
# same error as OP
Using do.call does work. Here I use as.name as well; otherwise the mm object carries around the entire dataset instead of just the name of it.
> form.1<-as.formula(res~x+y+z)
> mm <- do.call("lm", list(form.1, data=as.name("dat.test")))
> mm$call
lm(formula = res ~ x + y + z, data = dat.test)
> rm(form.1)
> boot.stepAIC(mm,dat.test)
To apply this to the original problem, I'd do this:
fun.boot.lm.stepAIC<-function(dat,form) {
if(!inherits(form, "formula")) stop("No formula given")
mm <- do.call("lm", list(form, data=as.name(dat)))
do.call("boot.stepAIC", list(mm,data=as.name(dat)))
}
form.1<-as.formula(res~x+y+z)
fun.boot.lm.stepAIC(dat="dat.test",form=form1)
This works too but the entire data set gets included in the final output object, and the final output to console, as well.
fun.boot.lm.stepAIC<-function(dat,form) {
if(!inherits(form, "formula")) stop("No formula given")
mm <- do.call("lm", list(form, data=dat))
boot.stepAIC(mm,data=dat)
}
form.1<-as.formula(res~x+y+z)
fun.boot.lm.stepAIC(dat=dat.test,form=form.1)

Linear model function lm() error: NA/NaN/Inf in foreign function call (arg 1)

Say I have data.frame a
I use
m.fit <- lm(col2 ~ col3 * col4, na.action = na.exclude)
col2 has some NA values, col3 and col4 have values less than 1.
I keep getting
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in foreign function call (arg 1)
I've checked the mailing list and it appears that it is because of the NAs in col2 but I tried using na.action=na.exclude/omit/pass but none of them seem to work. I've tested lm again on first 10 entries, definitely not because of the NAs. Problem with this warning is every google results seem to be pointing at NA.
Did I misinterpret the error or am I using lm wrongly?
Data is at kaggle. I'm modelling MonthlyIncome data using linear regression (as I couldn't get a certain glm family to work). I've created my own variables to use but if you try to model MonthlyIncome with variables already present it fails.
I know this thread is really old, but the answers don't seem complete, and I just ran into the same problem.
The problem I was having was because the NA columns also had NaN and Inf. Remove those and try it again. Specifically:
col2[which(is.nan(col2))] = NA
col2[which(col2==Inf)] = NA
Hope that helps your 18 month old question!
You should have a read the book A Beginner’s Guide to R for a complete explanation on this. Specifically, it mentions the following error:
Error in lm.fit(x, y, offset = offset, singular.ok =
singular.ok,...): NA/NaN/Inf in foreign function call (arg 4)
The solution is to add a small constant value to the Intensity data, for example, 1. Note that there is an on-going discussion in the statistical community concerning adding a small value. Be that as it may, you cannot use the log of zero when doing calculations in R.
I just suffered another possibility, after all posible na.omit and na.exclude checks.
I was taking something like:
lm(log(x) ~ log(y), data = ...)
Without noticing that, for some values in my dataset, x or y could be zero:
log(0) = -Inf
So just another thing to watch out for!
I solved this type of problem by resetting my options.
options(na.action="na.exclude")
or
options(na.action="na.omit")
I checked my settings and had previously changed the option to
"na.pass" which didn't drop my y observations with NAs (where y~x).
Try changing the type of col2 (and all other variables)
col2 <- as.integer(col2)
I just encountered the same problem. get the finite elements using
finiteElements = which(is.finite(col3*col4))
finiteData = data[finiteElements,]
lm(col2~col3*col4,na.action=na.exclude,data=finiteData)
I encountered this error when my equivalent of col2 was an integer64 rather than an integer and when using natural and polynomial splines, splines::bs and splines:ns for example:
m.fit <- lm(col1 ~ ns(col2))
m.fit <- lm(col1 ~ bs(col2, degree = 3))
Converting to a standard integer worked for me:
m.fit <- lm(col1 ~ ns(as.integer(col2)))
m.fit <- lm(col1 ~ bs(as.integer(col2), degree = 3))
I got this error when I inverted the arguments when calling reformulate and use the formula in my lm call without checking, so I had the wrong predictor and response variable.
Another thing to watch out for is using functions like log() or sin() make your x's and y's inf. eg. log 0 = 0 or sin(pi) = 0.
This is what helped in my case. I parsed the data that already exclude NAs and INFs.
lm(y ~ x, data = data[(y != Inf & is.na(y) == FALSE)])
Make sure you don't have any 0 in your dependent variable.

Resources