Error with predict() with betareg in R

I used the betareg() function to model my dependent variable using a data set with 19083 rows.
M_Beta_F1 <- betareg(Y ~ X1+X2|1, data = data1)
summary(M_Beta_F1)
The fitted values are extracted via:
in_Y <- fitted(M_Beta_F1)
I then wanted to compute predicted values for a new data set (data2) with 28779 rows.
out_Y_response <- predict.glm(M_Beta_F1, data2, type=c("response"))
I got the following error message:
Error in seq_len(p) : argument must be coercible to non-negative integer
In addition: Warning messages:
1: In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
calling predict.lm(<fake-lm-object>) ...
2: 'newdata' had 28779 rows but variables found have 19083 rows
3: In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
variable 'mean' is absent, its contrast will be ignored
4: In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
variable 'precision' is absent, its contrast will be ignored
5: In seq_len(p) : first element used of 'length.out' argument
How can I fix this problem?
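The error appears to come from calling predict.glm() on an object that is not a glm (hence the <fake-lm-object> warning); betareg provides its own predict method. A minimal sketch of the intended call, reusing the objects from the question:
#let S3 dispatch pick predict.betareg rather than forcing predict.glm;
#it accepts newdata and type = "response" for predictions on the mean scale
out_Y_response <- predict(M_Beta_F1, newdata = data2, type = "response")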

Related

Error when trying to fit Hierarchical GAMs (Model GS or S) using mgcv

I have a large dataset (~100k observations) of presence/absence data to which I am trying to fit a hierarchical GAM with individual effects that have a shared penalty (model 'S' in Pedersen et al. 2019). The data consist of temp as a numeric variable and region (5 groups) as a factor.
Here is a simple version of the model that I am trying to fit.
modS1 <- gam(occurrence ~ s(temp, region), family = binomial,
data = df, method = "REML")
modS2 <- gam(occurrence ~ s(temp, region, k = c(10, 4)), family = binomial,
data = df, method = "REML")
In the first case I received the following error, which I assumed was because k was set too high for region, given there are only 5 different regions in the data set:
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In mean.default(xx) : argument is not numeric or logical: returning NA
2: In Ops.factor(xx, shift[i]) : ‘-’ not meaningful for factors
In the second case I attempted to lower k for region and received this error:
Error in if (k < M + 1) { : the condition has length > 1
In addition: Warning messages:
1: In mean.default(xx) : argument is not numeric or logical: returning NA
2: In Ops.factor(xx, shift[i]) : ‘-’ not meaningful for factors
I can fit models G, GI, and I from Pedersen et al. 2019 with no issues. It is models GS and S where I run into problems.
If anyone has any insights I would really appreciate it!
The bs = "fs" argument in the code you're using as a guide is important. If we start at the ?s help page and click on the link to the ?smooth.terms help page, we see:
Factor smooth interactions
bs="fs" Smooth factor interactions are often produced using by variables (see gam.models), but a special smoother class (see factor.smooth.interaction) is available for the case in which a smooth is required at each of a large number of factor levels (for example a smooth for each patient in a study), and each smooth should have the same smoothing parameter. The "fs" smoothers are set up to be efficient when used with gamm, and have penalties on each null space component (i.e. they are fully ‘random effects’).
You need to use a smoothing basis appropriate for factors.
Notably, if you take your source code and remove the bs = "fs" argument and attempt to run gam(log(uptake) ~ s(log(conc), Plant_uo, k=5, m=2), data=CO2, method="REML"), it will produce the same error that you got.
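Applied to the model in the question, a sketch of model S using the factor-smooth basis might look like this (df, occurrence, temp and region are taken from the question; k = 10 is only an illustrative choice):
library(mgcv)
#model S: one smooth of temp per level of region, all sharing a single
#smoothing parameter, via the factor-smooth interaction basis bs = "fs"
modS <- gam(occurrence ~ s(temp, region, bs = "fs", k = 10),
            family = binomial, data = df, method = "REML")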

How do I use the findCorrelation in the Caret Package in R when I have missing values?

I am trying to impute missing values but am running into a "system is computationally singular" error. Hence, I am trying to drop collinear variables.
I tried the following code:
indexesToDrop <- findCorrelation(cor(df_before, use = "pairwise.complete.obs"), cutoff = 0.85)
This produces the following error:
Error in findCorrelation_fast(x = x, cutoff = cutoff, verbose = verbose) :
The correlation matrix has some missing values.
In addition: Warning message:
In cor(as.matrix(df_before), use = "pairwise.complete.obs") :
the standard deviation is zero
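One way to work around this (a sketch, assuming df_before contains only numeric columns) is to drop zero-variance columns first, and then any columns whose pairwise correlations are still NA, before calling findCorrelation():
library(caret)
#columns with zero or undefined standard deviation cause the
#"standard deviation is zero" warning and produce NA correlations
sds <- sapply(df_before, function(x) sd(x, na.rm = TRUE))
df_clean <- df_before[, !is.na(sds) & sds > 0, drop = FALSE]

#pairwise correlations can still be NA when two columns share no complete
#cases, so drop those columns as well
cor_mat <- cor(df_clean, use = "pairwise.complete.obs")
still_na <- apply(cor_mat, 2, function(x) any(is.na(x)))
cor_mat <- cor_mat[!still_na, !still_na, drop = FALSE]

indexesToDrop <- findCorrelation(cor_mat, cutoff = 0.85)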

predict.lm throws error when dataframe is subset

I am trying to use the caret::train() function to create a linear model with leave-one-out cross-validation from a data frame with multiple response variables. Some of the response variables I want to log-transform, and some of the others contain NA values. I am getting the following error:
Error in seq_len(p) : argument must be coercible to non-negative integer
In addition: Warning messages:
1: In predict.lm(trainlm, newdata = df2, type = "response") :
calling predict.lm(<fake-lm-object>) ...
2: In seq_len(p) : first element used of 'length.out' argument
Looking through other posts, it seemed like this arose either because:
I subset the dataframe
I had NA values
I tried to remedy this by first creating a new dataframe with the appropriate columns and selecting rows with complete.cases(), but the problem persists. Below is my reproducible example:
library(caret) # for train() function
set.seed(52) # to make reproducible
##Creating Fake Dataset
X1<-runif(100, 2, 21)
X2<-runif(100, 21, 40)
X3<-runif(100, 12, 18)
errors1<-rnorm(100, 0, 1)
errors2<-rnorm(100, 0, 1)
#multiple response variables
Y1<-2.31+(0.52*X1)+(0.84*X2)+(2.2*X3)+(1.5*X1*X2)+(1.6*errors1)
Y2<-5.31+(2.1*X1)+(2.2*X3)+(1.5*X1*X3)+(0.4*errors2)
##Creating an NA Value
Y2[82]<-NA
##Dataframe with all predictors and both response variables
df<-data.frame(Y1, Y2, X1, X2, X3)
##Subsetting to get rid of NA and other
df2<-subset(df[complete.cases(df),], select=-1)
##Building the model
TrCtrl<-trainControl(method="LOOCV")
trainlm<-train(log(Y2+1)~X1+X2+X3+(X1+X2)+(X1*X3)+(X2*X3)+(X1*X2*X3), method="lmStepAIC", data=df2, trControl=TrCtrl)
##Getting Prediction##
Train.Predict<-predict.lm(trainlm, newdata = df2, type = "response")
trainlm isn't an lm class so predict.lm isn't the right function to call.
class(trainlm)
#> [1] "train" "train.formula"
Use predict and let S3 choose the appropriate method.
Train.Predict <- predict(trainlm, newdata = df2)
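One follow-up worth noting: the model was fit on log(Y2 + 1), so the predictions come back on that scale; if you want predictions of Y2 itself, back-transform them, for example:
#predictions are on the log(Y2 + 1) scale used in the train() formula;
#exp() - 1 maps them back to the original scale of Y2
Train.Predict.Y2 <- exp(Train.Predict) - 1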

Fail to predict woe in R

I used the following to compute the WoE with the woe package:
library("woe")
woe.object <- woe(data, Dependent="target", FALSE,
Independent="shop_id", C_Bin=20, Bad=0, Good=1)
Then I want to predict woe for the test data
test.woe <- predict(woe.object, newdata = test, replace = TRUE)
And it gives me an error
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "data.frame"
Any suggestions please?
For prediction, you cannot do it with the package woe. You need to use the package klaR. Take note of the masking of the function woe; see below:
#let's say we loaded woe and then klaR
library(klaR)
data = data.frame(target=sample(0:1,100,replace=TRUE),
shop_id = sample(1:3,100,replace=TRUE),
another_var = sample(letters[1:3],100,replace=TRUE))
#make sure both dependent and independent are factors
data$target=factor(data$target)
data$shop_id = factor(data$shop_id)
data$another_var = factor(data$another_var)
You need two or more independent variables:
woemodel <- klaR::woe(target~ shop_id+another_var,
data = data)
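With a klaR woe object like this one, the prediction you were after works through the generic predict(). A sketch follows; new_data is made up purely for illustration and must contain the same factor columns with the same levels as the training data:
#apply the fitted WoE transformation to new data; replace = TRUE swaps the
#factor columns for their WoE-transformed counterparts
new_data <- data.frame(
  shop_id = factor(sample(1:3, 20, replace = TRUE), levels = levels(data$shop_id)),
  another_var = factor(sample(letters[1:3], 20, replace = TRUE), levels = levels(data$another_var)))
new_data_woe <- predict(woemodel, newdata = new_data, replace = TRUE)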
If you only provide one, you have an error:
woemodel <- klaR::woe(target~ shop_id,
data = data)
Error in woe.default(x, grouping, weights = weights, ...) :
  All factors with unique levels. No woes calculated!
In addition: Warning message:
In woe.default(x, grouping, weights = weights, ...) :
  Only one single input variable. Variable name in resulting object$woe is only conserved in formula call.
If you want to predict the dependent variable with only one independent, something like logistic regression will work:
mdl = glm(target ~ shop_id,data=data,family="binomial")
prob = predict(mdl,data,type="response")
#glm(family = "binomial") models the probability of the second factor level
predicted_label = ifelse(prob > 0.5, levels(data$target)[2], levels(data$target)[1])

Error in panel spatial model in R using spml

I am trying to fit a panel spatial model in R using the spml() function from the splm package. I first define the NxN weighting matrix as follows:
neib <- dnearneigh(coordinates(coord), 0, 50, longlat = TRUE)
dlist <- nbdists(neib, coordinates(coord))
idlist <- lapply(dlist, function(x) 1/x)
w50 <- nb2listw(neib,zero.policy=TRUE, glist=idlist, style="W")
Thus I define two observations to be neighbours if they are at most 50 km apart. The weight attached to each pair of neighbouring observations is the inverse of their distance, so that closer neighbours receive higher weights. I also use the option zero.policy=TRUE so that observations which do not have neighbours are associated with a vector of zero weights.
Once I do this I try to fit the panel spatial model in the following way
mod <- spml(y ~ x , data = data_p, listw = w50, na.action = na.fail, lag = F, spatial.error = "b", model = "within", effect = "twoways" ,zero.policy=TRUE)
but I get the following error and warning messages
Error in lag.listw(listw, u) : Variable contains non-finite values
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA
...
50: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA
I believe this to be related to the non-neighbour observations. Can anyone please help me with this? Is there any way to deal with non-neighbour observations besides the zero.policy option?
Many many thanks for helping me.
You should check two things:
1) Make sure that the weight matrix is row-normalized.
2) Handle any NA values properly, both in the dataset and in the W matrix.
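A sketch of how you might run those two checks with the objects from the question (w50 and data_p; the column names y and x are taken from the model call):
#1) with style = "W" each row of weights should sum to 1; observations
#   without neighbours get zero-length weight vectors under zero.policy = TRUE
summary(sapply(w50$weights, sum))

#2) TRUE means the column is numeric with no NA/NaN/Inf values
sapply(data_p[c("y", "x")], function(v) is.numeric(v) && all(is.finite(v)))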
