Handling segmented error: NA breakpoint(s) at the boundary - r

I have a function that tries to apply a piece-wise regression model to my data.
In some cases, the data has a generous amount of missing values and I don't have a good estimator of where the knots will be. I decided to bypass the piece-wise and go for a simple linear regression:
try(piecewise) if error go to lm with just one slope
Here's the code that does it. Note that lin.reg is a helper function that outputs predict() for the lm object in the x range. It does not create any problem.
piece <- function(x,y){
# just in case this comes with column names or something
y <- as.numeric(y)
# create the lm object
lm.obj <- lm(y~x)
# Try to fit piecewise
seg <- try(segmented(lm.obj,seg.Z=~x))
# print(seg)
if("try-error" %in% class(seg)) {
# Print that you are using a linear regression and not the piece-wise
print("Using linear Regression")
# Call helper function
result <- lin.reg(x,y)
# Get out of the error/function
return(result)
}
# Use the piece-wise
result <- predict(segmented::segmented(lm.obj,seg.Z=~x),
newdata = data.frame(x,y))
print("Using piece-wise regression")
return(result)
}
Problem(s)
I get this error when piece-wise goes wrong
Error: at least one coef is NA: breakpoint(s) at the boundary? (possibly with many x-values replicated)
But it is unreliable/unpredictable: sometimes it gets ignored and sometimes it breaks the function. I am looping over the rows of a data frame with the y values, and the same call gets to different rows before breaking.
I believe it has to do with the if("try-error" %in% class(seg)) that might not be the best way to catch the error.
I added some printing to make sure. Here's when it works properly, note iteration 284 gave error and went to simple linear.
[1] "Using piece-wise regression"
[1] 283
[1] "segmented" "lm"
[1] "Using piece-wise regression"
[1] 284
Error : at least one coef is NA: breakpoint(s) at the boundary? (possibly with many x-values replicated)
[1] "try-error"
[1] "Using linear Regression"
And here's when it doesn't; it seems like the try() call is not returning an error as it should
[1] "Using piece-wise regression"
[1] 312
[1] "segmented" "lm"
[1] "Using piece-wise regression"
[1] 313
[1] "segmented" "lm"
Error: at least one coef is NA: breakpoint(s) at the boundary? (possibly with many x-values replicated)

Adding the argument silent=TRUE to the try() call worked out for me.
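A possible explanation for the intermittent failures, offered as a guess rather than a confirmed diagnosis: the function in the question fits segmented() twice, once inside try() and once more, unprotected, inside predict(). Because segmented restarts its search with bootstrap resampling by default, the second fit can fail even when the first succeeded, which would match the log where the class prints as "segmented" "lm" right before the error. A minimal sketch of fitting once and reusing the result follows; fit_or_fallback and failing_fit are hypothetical names, and failing_fit merely stands in for the segmented() call so the example runs without the package.

```r
# Hypothetical helper: run the primary fit exactly once, fall back on error.
fit_or_fallback <- function(primary, fallback) {
  tryCatch(
    list(fit = primary(), used = "primary"),
    error = function(e) {
      # Keep the error message so the caller can log why the fallback was used
      list(fit = fallback(), used = "fallback", reason = conditionMessage(e))
    }
  )
}

x <- 1:20
y <- c(1:10, rep(10, 10)) + 0.1 * sin(1:20)
lm.obj <- lm(y ~ x)

# Stand-in for function() segmented::segmented(lm.obj, seg.Z = ~x);
# it always fails here so the fallback path is exercised.
failing_fit <- function() {
  stop("at least one coef is NA: breakpoint(s) at the boundary?")
}

res <- fit_or_fallback(failing_fit, function() lm.obj)

# Predict from whichever model was kept; no second fit is attempted
pred <- predict(res$fit, newdata = data.frame(x = x))
print(res$used)
```

In the real function, primary would be function() segmented(lm.obj, seg.Z = ~x), and predict() would then be called on res$fit instead of refitting. inherits(seg, "try-error") is also a slightly more robust test than "try-error" %in% class(seg) if try() is kept.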

Related

How to hide packages intersection when method called in R

I have one problem with packages intersection.
I use arfima and forecast packages which both have identical name methods (like AIC, BIC, etc).
So, when I try to run code, I have a problem with method mismatch: R tries to use the method from the last loaded package. Yes, I know I can call exported methods with "::". The main problem is with the internal code of these methods: they call their own package's methods in the method body without "::". For example, when I try to run this code:
require(arfima);
require(forecast);
x <- rnorm(100);
fit <- arfima::arfima(x);
summary(fit);
It gives those errors:
> fit <- arfima::arfima(x);
Note: autoweed is ON. It is possible, but not likely,
that unique modes may be lost.
Beginning the fits with 2 starting values.
> summary(fit);
Error in AIC.logLik(logl) :unrealized type (29) in 'eval'
So, this code w/o loaded forecast package works well (I ran it in "clear" R session):
require(arfima);
#require(forecast);
x <- rnorm(100);
fit <- arfima::arfima(x);
summary(fit);
# Note: autoweed is ON. It is possible, but not likely,
# that unique modes may be lost.
# Beginning the fits with 2 starting values.
#
# summary(fit);
#
# Call:
#
# arfima::arfima(z = x)
#
#
# Mode 1 Coefficients:
# Estimate Std. Error Th. Std. Err. z-value Pr(>|z|)
# d.f -0.0208667 0.0770519 0.0779679 -0.27081 0.78653
# Fitted mean -0.0432115 0.0845518 NA -0.51107 0.60930
# sigma^2 estimated as 0.851957; Log-likelihood = 8.51214; AIC = -11.0243; BIC = 282.976
#
# Numerical Correlations of Coefficients:
# d.f Fitted mean
# d.f 1.00 -0.09
# Fitted mean -0.09 1.00
#
# Theoretical Correlations of Coefficients:
# d.f
# d.f 1.00
#
# Expected Fisher Information Matrix of Coefficients:
# d.f
# d.f 1.65
So, is it possible to hide a package for part of the executed code? Something like this:
require(arfima);
#require(forecast);
hide(forecast);
x <- rnorm(100);
fit <- arfima::arfima(x);
summary(fit);
unhide(forecast);
or like this:
require(arfima);
#require(forecast);
used(arfima);
x <- rnorm(100);
fit <- arfima::arfima(x);
summary(fit);
unused(arfima);
Perhaps you should update all your package installations and retry. I don't get any error with your code using a newly installed binary copy of arfima. AIC is generic and when you look at the loaded methods you do see one for objects of class "arfima", so no use of the "::" function should be needed:
> methods(AIC)
[1] AIC.arfima* AIC.default* AIC.logLik*
see '?methods' for accessing help and source code
There is a detach function but it's often not enough to completely set aside a package unless its "unload" parameter is set to TRUE, and in this case may be overkill since your diagnosis appears to be incorrect. This shows that strategy to be feasible but I still suspect unnecessary:
require(arfima);
require(forecast);
detach(package:forecast,unload=TRUE)
x <- rnorm(100);
fit <- arfima::arfima(x);
summary(fit); library(forecast)
This shows that there are class specific methods for summary so there should be no errors caused by package confusion there:
> methods(summary)
[1] summary,ANY-method summary,DBIObject-method
[3] summary,quantmod-method summary.aov
[5] summary.aovlist* summary.arfima*
[7] summary.Arima* summary.arma*
[9] summary.aspell* summary.check_packages_in_dir*
[11] summary.connection summary.data.frame
[13] summary.Date summary.default
[15] summary.ecdf* summary.ets*
[17] summary.factor summary.forecast*
snipped rest
Furthermore, you are incorrect in thinking there is a forecast::BIC. With forecast and arfima loaded we see only:
methods(BIC)
[1] BIC.arfima*
And there is no suggestion on the Index page of forecast that either AIC or BIC is defined within that package. So any object created by the forecast functions would be handled by AIC.default and would be expected to fail with BIC.
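To check which package actually supplies a generic or a method, rather than inferring it from load order, base R offers conflicts() and getS3method(). The following is a small sketch that uses only the stats and utils packages, so nothing in it depends on arfima or forecast being installed:

```r
# Names defined in more than one attached package, grouped by search() entry
masked <- conflicts(detail = TRUE)

# Namespace that owns the AIC generic
generic_home <- environmentName(environment(stats::AIC))

# Namespace that owns the logLik method named in the error message
aic_loglik <- getS3method("AIC", "logLik")
method_home <- environmentName(environment(aic_loglik))

print(generic_home)
print(method_home)
```

With both packages attached, inspecting masked shows whether any name the code relies on is defined in more than one place.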

error message in R : if (nomZ %in% coded) { : argument is of length zero

I'm very new to R (and stackoverflow). I've been trying to conduct a simple slopes analysis for my continuous x dichotomous regression model using lmres, and simpleSlope from the pequod package.
My variables:
SLS - continuous DV
csibdiff - continuous predictor (I already manually centered the variable with another code)
culture - dichotomous moderator
newmod<-lmres(SLS ~ csibdiff*culture, data=sibdat2)
newmodss <-simpleSlope(newmod, pred="csibdiff", mod1="culture")
However, after running the simpleSlope function, I get this error message:
Error in if (nomZ %in% coded) { : argument is of length zero
I don't understand the nomZ part but I assume something was wrong with my variables. What does this mean? I don't have a nomZ named thing in my data at all. None of my variables are null class (I checked them with the is.null() function), and I didn't seem to have accidentally deleted the contents of the variable (I checked with the table() function).
If anyone else can suggest another function/package that I can do a simple slope analysis in, as well, I'd appreciate it. I've been stuck on this problem for a few days now.
EDIT: I subsetted the relevant variables into a csv file.
https://www.dropbox.com/s/6j82ky457ctepkz/sibdat2.csv?dl=0
tl;dr it looks like the authors of the package were thinking primarily about continuous moderators; if you specify mod1="cultureEuropean" (i.e. to match the name of the corresponding parameter in the output) the function returns an answer (I have no idea if it's sensible or not ...)
It would be a service to the community to let the maintainers of the pequod package (maintainer("pequod")) know about this issue ...
Read data and replicate error:
sibdat2 <- read.csv("sibdat2.csv")
library(pequod)
newmod <- lmres(SLS ~ csibdiff*culture, data=sibdat2)
newmodss <- simpleSlope(newmod, pred="csibdiff", mod1="culture")
Check the data:
summary(sibdat2)
We do have some NA values in csibdiff, so try removing these ...
sibdat2B <- na.omit(sibdat2)
But that doesn't actually help (same error as before).
Plot the data to check for other strangeness
library(ggplot2); theme_set(theme_bw())
ggplot(sibdat2B,aes(csibdiff,SLS,colour=culture))+
stat_sum(aes(size=factor(..n..))) +
geom_smooth(method="lm")
There's not much going on here, but nothing obviously wrong either ...
Use traceback() to see approximately where the problem is:
traceback()
3: simple.slope(object, pred, mod1, mod2, coded)
2: simpleSlope.default(newmod, pred = "csibdiff", mod1 = "culture")
1: simpleSlope(newmod, pred = "csibdiff", mod1 = "culture")
We could use options(error=recover) to jump right to the scene of the crime, but let's try step-by-step debugging instead ...
debug(pequod:::simple.slope)
As we go through we can see this:
nomZ <- names(regr$coef)[pos_mod]
nomZ ## character(0)
And looking a bit farther back we can see that pos_mod is also a zero-length integer. Farther back, we see that the code is looking through the parameter names (row names of the variance-covariance matrix) for the name of the modifier ... but it's not there.
debug: pos_pred_mod1 <- fI + grep(paste0("\\b", mod1, "\\b"), jj[(fI +
1):(fI + fII)])
Browse[2]> pos_mod
## integer(0)
Browse[2]> jj[1:fI]
## [[1]]
## [1] "(Intercept)"
##
## [[2]]
## [1] "csibdiff"
##
## [[3]]
## [1] "cultureEuropean"
Browse[2]> mod1
## [1] "culture"
The solution is to tell simpleSlope to look for a variable that is there ...
(newmodss <- simpleSlope(newmod, pred="csibdiff", mod1="cultureEuropean"))
## Simple Slope:
## simple slope standard error t-value p.value
## Low cultureEuropean (-1 SD) -0.2720128 0.2264635 -1.201133 0.2336911
## High cultureEuropean (+1 SD) 0.2149291 0.1668690 1.288011 0.2019241
We do get some warnings about NaNs produced -- you'll have to dig farther yourself to see if you need to worry about them.
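The mismatch found in the debugger can be reproduced without pequod. The sketch below uses made-up data (the values and the "Asian" level are hypothetical; only "European" appears in the output above) to show that lm() names a factor's coefficient after the variable plus its level, so a whole-word search for the bare variable name finds nothing, and a zero-length value inside if() then produces the same error message:

```r
# Made-up data with a two-level factor moderator
d <- data.frame(
  SLS      = c(3.1, 2.8, 4.0, 3.6, 2.9, 3.3),
  csibdiff = c(-1.2, 0.4, 0.9, -0.3, 1.1, -0.9),
  culture  = factor(c("Asian", "European", "Asian",
                      "European", "Asian", "European"))
)
fit <- lm(SLS ~ csibdiff * culture, data = d)

# The factor shows up as "cultureEuropean", not "culture"
coef_names <- names(coef(fit))
print(coef_names)

# Whole-word match for "culture" among the main effects finds nothing
pos_mod <- grep("\\bculture\\b", coef_names[1:3])
nomZ <- coef_names[pos_mod]   # character(0)

# A zero-length condition in if() raises the error seen in the question
msg <- tryCatch(
  if (nomZ %in% "anything") "matched" else "not matched",
  error = function(e) conditionMessage(e)
)
print(msg)
```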

predict.lm with arbitrary coefficients r

I'm trying to predict an lm object using predict.lm. However, I would like to use manually inserted coefficients.
To do this I tried:
model$coefficients <- coeff
(where "coeff" is a vector of correct coefficients)
which would indeed modify the coefficients as I want. Nevertheless, when I execute
predict.lm(model, new.data)
I just get predictions calculated with the "old" parameters. Is there a way I could force predict.lm to use the new ones?
Post Scriptum: I need to do this to fit a bin-smooth (also called regressogram).
In addition, when I predict "by hand" (i.e. using matrix multiplication) the results are fine, hence I'm quite sure that the problem lies in the predict.lm not recognizing my new coefficients.
Thanks in advance for the help!
Hacking the $coefficients element does indeed seem to work. Can you show what doesn't work for you?
dd <- data.frame(x=1:5,y=1:5)
m1 <- lm(y~x,dd)
m1$coefficients <- c(-2,1)
m1
## Call:
## lm(formula = y ~ x, data = dd)
##
## Coefficients:
## [1] -2 1
predict(m1,newdata=data.frame(x=7)) ## 5 = -2+1*7
predict.lm(...) gives the same results.
I would be very careful with this approach, checking each time you do something different with the hacked model.
In general it would be nice if predict and simulate methods took a newparams argument, but they don't in general ...
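An alternative that avoids modifying the fitted object is to build the design matrix for the new data from the model's terms and multiply it by the coefficient vector directly, which is what the question describes as predicting "by hand". A minimal sketch, reusing the toy data above; new_coef is an illustrative name:

```r
dd <- data.frame(x = 1:5, y = 1:5)
m1 <- lm(y ~ x, dd)

# Arbitrary coefficients, in the same order as names(coef(m1))
new_coef <- c(-2, 1)

# Design matrix for the new data, built from the model's own terms
# (delete.response() drops y so newdata only needs the predictors)
X <- model.matrix(delete.response(terms(m1)), data.frame(x = 7))

pred <- drop(X %*% new_coef)
print(pred)   # -2 + 1 * 7 = 5
```

This leaves m1 untouched, so summary(), residuals() and similar functions keep describing the model that was actually fitted.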

Extracting predictions from a GAM model with splines and lagged predictors

I have some data and am trying to teach myself about utilizing lagged predictors within regression models. I'm currently trying to generate predictions from a generalized additive model that uses splines to smooth the data and contains lags.
Let's say I have the following data and have split the data into training and test samples.
head(mtcars)
Train <- sample(1:nrow(mtcars), ceiling(nrow(mtcars)*3/4), replace=FALSE)
Great, let's train the gam model on the training set.
f_gam <- gam(hp ~ s(qsec, bs="cr") + s(lag(disp, 1), bs="cr"), data=mtcars[Train,])
summary(f_gam)
When I go to predict on the holdout sample, I get an error message.
f_gam.pred <- predict(f_gam, mtcars[-Train,]); f_gam.pred
Error in ExtractData(object, data, NULL) :
'names' attribute [1] must be the same length as the vector [0]
Calls: predict ... predict.gam -> PredictMat -> Predict.matrix3 -> ExtractData
Can anyone help diagnose the issue and help with a solution? I get that lag(__,1) leaves a data point as NA and that is likely the reason for the lengths being different. However, I don't have a solution to the problem.
I'm going to assume you're using gam() from the mgcv library. It appears that gam() doesn't like functions that are not defined in "base" in the s() terms. You can get around this by adding a column which includes the transformed variable and then modeling using that variable. For example
tmtcars <- transform(mtcars, ldisp=lag(disp,1))
Train <- sample(1:nrow(mtcars), ceiling(nrow(mtcars)*3/4), replace=FALSE)
f_gam <- gam(hp ~ s(qsec, bs="cr") + s(ldisp, bs="cr"), data= tmtcars[Train,])
summary(f_gam)
predict(f_gam, tmtcars[-Train,])
works without error.
The problem appears to be coming from the mgcv:::get.var function. It tries to decode the terms with something like
eval(parse(text = txt), data, enclos = NULL)
and because they explicitly set the enclosure to NULL, variable and function names outside of base cannot be resolved. So because mean() is in the base package, this works
eval(parse(text="mean(x)"), data.frame(x=1:4), enclos=NULL)
# [1] 2.5
but because var() is defined in stats, this does not
eval(parse(text="var(x)"), data.frame(x=1:4), enclos=NULL)
# Error in eval(expr, envir, enclos) : could not find function "var"
and lag(), like var() is defined in the stats package.
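A side note, assuming lag() resolves to stats::lag as in this answer: applied to a plain numeric vector, stats::lag() only shifts the time-series attribute (tsp) and leaves the values in place, so the ldisp column above may not actually hold lagged values. If a row-shifted predictor is what is wanted, one option is to build it by hand. lag1 below is a hypothetical helper, not part of mgcv or stats:

```r
# Hypothetical helper: shift a vector down by one row, padding with NA
lag1 <- function(v) c(NA, head(v, -1))

tmtcars <- transform(mtcars, ldisp = lag1(disp))
head(tmtcars[, c("disp", "ldisp")], 3)

# stats::lag() on a plain vector returns the same values in the same order
same_values <- identical(as.numeric(stats::lag(mtcars$disp, 1)), mtcars$disp)
print(same_values)
```

The first row of ldisp is NA, so that row is dropped when fitting; whether a lag across rows of mtcars is meaningful at all is a separate question, since the rows are not ordered in time.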

How to retrieve value of a function in R

When calling a function in R, how can I retrieve the result values? For example, I used the 'roc' function and I need to extract the AUC value and CI (0.6693 and 0.6196-0.7191 respectively in the following example).
> roc(tmpData[,lenCnames], fitted(model), ci=TRUE)
Call:
roc.default(response = tmpData[, lenCnames], predictor = fitted(model), ci = TRUE)
Data: fitted(model) in 127 controls (tmpData[, lenCnames] 0) < 3248 cases (tmpData[, lenCnames] 1).
Area under the curve: 0.6693
95% CI: 0.6196-0.7191 (DeLong)
I can use the following to fetch these values with associated texts.
> z$auc
Area under the curve: 0.6693
> z$ci
95% CI: 0.6196-0.7191 (DeLong)
Is there a way to get only the values and not the text.
I do know how to get these using a 'regular expression' or the 'strsplit' function, but I suspect there should be some other way to directly access these values.
It's helpful to use reproducible examples when asking a question. Also best to refer to the library you're asking about ("pROC"), since it is not loaded with base R. pROC has functions that extract auc and ci.auc objects from the roc object.
> library("pROC")
> data(aSAH)
# Basic example
> z <- roc(aSAH$outcome, aSAH$s100b,
+ levels=c("Good", "Poor"))
# Examining the class of 'auc' output shows us that it is also of class 'numeric'
> class(auc(z))
[1] "auc" "numeric"
# calling 'as.numeric' will extract the value
> as.numeric(auc(z))
[1] 0.7313686
# calling 'as.numeric' on the 'ci.auc' object extracts three values.
> as.numeric(ci(z))
[1] 0.6301182 0.7313686 0.8326189
# The ones we want are 1 and 3
> as.numeric(ci(z))[c(1,3)]
[1] 0.6301182 0.8326189
Using the functions str, class, and attributes will often help you figure out how to get what you want out of an object.
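The same inspection steps work on any object whose printed form carries extra text. As a sketch that runs without pROC, the example below uses a base R t.test() result as a stand-in; it is not a roc object, but its confidence interval is likewise a numeric vector with an attached attribute:

```r
# Stand-in object from base R (the built-in sleep data set)
tt <- t.test(sleep$extra)

# What kind of object is this, and what does the interval look like inside?
cls <- class(tt)
str(tt$conf.int)
ci_attr <- attributes(tt$conf.int)

# as.numeric() drops the attributes and keeps only the two bounds
ci <- as.numeric(tt$conf.int)
print(ci)
```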
