Argument "weights" in bayesglm() function in R - r

I am building a default-risk prediction model using bayesglm with the binomial family, and I would like to fit the model with weights. I am trying to use the principal (the amount of money that a company has lent to a person) as the weights, but I get these warning messages:
1: In bayesglm.fit(x = X, y = Y, weights = weights, start = start, : non-finite coefficients at iteration 4
2: algorithm did not converge
3: fitted probabilities numerically 0 or 1 occurred
The principal has a high variance; could that be the reason? I also tried the log of the principal and got the same messages.
set.seed(123)
lm_D_O9<- bayesglm(sampleDefaultO_tr$Default ~ ., data = sampleDefaultO_tr[,-c(20,23,24,49:54,58,60:62)], family=binomial,control = list(maxit = 100),
weights=floor(log(sampleDefaultO_tr$mntTotal)*1000))
My repo here--> github.com/dclopezb9/Thesis
Thank you in advance!

Related

Estimating Dynamic Difference in Difference in R

I've been trying to estimate the above regression on a repeated cross-section dataset, and I tried the did package without success. I have a large dataset that I have already formatted so that it has an event-time dummy, but att_gt gives an error. Treatment occurs in 2018, the outcome is emp, and the base period should be 2017.
I tried:
df4<-df1[complete.cases(df1$treat),]
df4<-df4[complete.cases(df4$emp),]
df4<-df4[(df4$year>=2014),]
df4$g<-ifelse(df4$treat==1,2018,0)
att1 <- att_gt(yname = "emp",
tname = "period",
gname = "G",
xformla = ~treat+factor(month)+factor(year),
data = df4,
panel=FALSE
)
and it gives me:
Error in DRDID::drdid_rc(y = Y, post = post, D = G, covariates = covariates, :
  Outcome regression model coefficients have NA components.
  Multicollinearity (or lack of variation) of covariates is a likely reason.
In addition: Warning messages:
1: glm.fit: algorithm did not converge
2: In DRDID::drdid_rc(y = Y, post = post, D = G, covariates = covariates, :
  glm algorithm did not converge
I also ran a plain lm regression, but it gave insignificant results, which should not be the case for my assignment:
ols1 <- lm(emp ~ relevel(factor(year), ref = "2017") * treat + factor(month),
           data = df4)
summary(ols1)

Trouble in GAM model in R software

I am trying to run the following code in R:
m <- gam(Flp_pop ~ s(Flp_CO, bs = "cr", k = 30), data = data, family = poisson, method = "REML")
My dataset is like this:
[Image: screenshot of the dataset]
But when I try to execute, I get this error message:
"Error in if (abs(old.score - score) > score.scale * conv.tol) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)"
I am very new to R, maybe it is a very basic question. But does anyone know why this is happening?
Thanks!
The Poisson distribution has support on the non-negative integers, but you are passing a continuous variable as the response. Here's an example with simulated data:
library("mgcv")
library("gratia")
library("dplyr")
df <- data_sim("eg1", seed = 2) %>% # simulate Gaussian response
  mutate(yabs = abs(y))             # make y non-negative
mp <- gam(yabs ~ s(x2, bs = "cr"), data = df,
          family = poisson, method = "REML")
# fails
which reproduces the error you saw
Error in if (abs(old.score - score) > score.scale * conv.tol) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
The warnings are of the form:
$> warnings()[1]
Warning message:
In dpois(y, y, log = TRUE) : non-integer x = 7.384012
This indicates the problem: the model evaluates the Poisson probability mass at your response values, and at a non-integer value such as the one shown, dpois() returns zero mass plus the warning.
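The warning can be reproduced in isolation. The sketch below calls dpois() directly with the non-integer value from the warning (the handler and the variable names are only for illustration) and captures the warning text. The log mass comes back as -Inf, which is a likely route by which a missing value ends up in the convergence test:

```r
# Evaluate the Poisson log mass at a non-integer response value,
# capturing the warning instead of printing it.
captured <- NULL
log_mass <- withCallingHandlers(
  dpois(7.384012, lambda = 7.384012, log = TRUE),
  warning = function(w) {
    captured <<- conditionMessage(w)  # keep the warning text
    invokeRestart("muffleWarning")    # and stop it from being printed
  }
)
log_mass  # -Inf, i.e. zero probability mass on the log scale
captured
```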
If we'd passed the original Gaussian variable as the response, which includes negative values, the function would have errored out earlier:
mp <- gam(y ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
which raises this error:
r$> mp <- gam(y ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
Error in eval(family$initialize) :
negative values not allowed for the 'Poisson' family
An immediate, but not necessarily advisable, solution is just to use the quasipoisson family:
mq <- gam(yabs ~ s(x2, bs = "cr"), data = df,
family = quasipoisson, method = "REML")
which uses the same mean-variance relationship as the Poisson distribution but not the actual distribution, so we can get away with abusing it.
Better would be to ask yourself why you were trying to fit a model that is ostensibly for counts to a response that is a continuous (non-negative) variable?
If the answer is you had a count but then normalised it in some way (say by dividing by some measure of effort like area surveyed or length of observation time) then you should use an offset of the form + offset(log(effort_var)) added to the model formula, and use the original non-normalised integer variable as the response.
If you really have a continuous response and the Poisson was an oversight, try fitting with family = Gamma(link = "log") or family = tw().
If it's something else, you should edit your question to include that info and perhaps we here can help or the question could be migrated to CrossValidated if the issue is more statistical in nature.
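As a minimal sketch of the two alternatives above, the code below simulates data (the variable names effort, count and y_cont and the simulated rate are assumptions for illustration, not taken from the question) and fits a Poisson GAM with an offset, then a Gamma GAM with a log link for a continuous positive response:

```r
library(mgcv)

set.seed(1)
n <- 300
x <- runif(n)
effort <- runif(n, 1, 10)            # hypothetical effort measure
rate <- exp(0.5 + sin(2 * pi * x))   # simulated rate per unit of effort
sim <- data.frame(x = x, effort = effort,
                  count = rpois(n, lambda = effort * rate))

# Integer counts as the response, log(effort) as an offset:
# the smooth then describes the rate per unit of effort.
m_off <- gam(count ~ s(x, bs = "cr") + offset(log(effort)),
             data = sim, family = poisson, method = "REML")

# A genuinely continuous, positive response: Gamma with a log link.
sim$y_cont <- rgamma(n, shape = 2, rate = 2 / rate)
m_gam <- gam(y_cont ~ s(x, bs = "cr"),
             data = sim, family = Gamma(link = "log"), method = "REML")

summary(m_off)
```

Because the offset enters on the log scale with a fixed coefficient of 1, the fitted smooth is interpreted per unit of effort rather than per observation.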

Why does R produce non-finite values error when using lm or plm

read_dta("PSID.dta")
PSID <- read_dta("PSID.dta")
new_id <- PSID[PSID$id>=5000,]
I am trying to fit a linear regression model, using both OLS and a fixed-effects estimator, in RStudio on a panel data set where the variables are:
var_list = c("id","year","txhw","totaldonation")
and the regression model I want to estimate is:
log(totaldonation_it) = β0 + β1 log(txhw_it) + α_i + u_it
α represents the unobserved individual heterogeneity of individual households.
My code is below, but it gives me errors:
reg_ols <- lm(log(totaldonation) ~ log(txhw), data=new_id)
The reg_ols call above gives the error "Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'x'"
reg_fe <- plm(log(totaldonation) ~ log(txhw), data=new_id, method="within")
The reg_fe call above gives me the error "Error in model.matrix.pdata.frame(data, rhs = 1, model = model, effect = effect, :
model matrix or response contains non-finite values (NA/NaN/Inf/-Inf)"
There are no NA values in my data set. What could I do to resolve these problems?
I've tried using complete.cases as below, but I am not sure whether it is the right method.
new_id <- new_id[complete.cases(new_id),]
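One common cause of this message, assuming txhw (or totaldonation) contains zeros, is that log(0) is -Inf. complete.cases() does not flag such rows because the raw data contain no NA. The sketch below uses a made-up toy data frame (all values are hypothetical) to reproduce the lm error and then keeps only rows where both variables are strictly positive:

```r
# Toy panel with one zero in txhw (values are invented for illustration).
toy <- data.frame(
  id            = rep(1:3, each = 2),
  year          = rep(2001:2002, times = 3),
  txhw          = c(10, 0, 25, 40, 5, 8),
  totaldonation = c(100, 50, 80, 300, 20, 60)
)

all(complete.cases(toy))  # TRUE: a zero is not a missing value

# log(0) is -Inf, which lm() refuses; capture the error message.
err <- tryCatch(
  lm(log(totaldonation) ~ log(txhw), data = toy),
  error = function(e) conditionMessage(e)
)
err

# Keep rows where both logs are finite, then refit.
toy_pos <- toy[toy$txhw > 0 & toy$totaldonation > 0, ]
fit <- lm(log(totaldonation) ~ log(txhw), data = toy_pos)
coef(fit)
```

Dropping the zero rows changes the estimation sample, so whether that is acceptable depends on what a zero means in the data; this only shows where the non-finite values can come from.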

Why am I getting NA's for sigma in this gamlss call?

The following question was asked by Michael Barton on Cross Validated and rejected because it was deemed to be a computer question. Regardless, I personally think the question is interesting and am wondering if it can be answered here.
The original post is here.
I am fitting a gamlss model with the call:
gamlss(formula = metric ~ image_name + random(biological_source_name) - 1,
       sigma.formula = ~ biological_source_name - 1,
       family = "NBI",
       data = na.omit(data))
After three iterations I get an error:
GAMLSS-RS iteration 1: Global Deviance = 3814
GAMLSS-RS iteration 2: Global Deviance = 7760
GAMLSS-RS iteration 3: Global Deviance = 7756
In digamma(y + (1/sigma)) : NaNs produced
In digamma(1/sigma) : NaNs produced
In digamma(y + (1/sigma)) : NaNs produced
In digamma(1/sigma) : NaNs produced
Error in glim.fit(f = sigma.object, X = sigma.X, y = y, w = w, fv = sigma, :
NA's in the working vector or weights for parameter sigma
This suggests to me that the estimated sigma for some of the
categorical predictors is going to 0. Would this be correct?
Any suggestions on how to go about resolving this?
I contacted the authors about this. The issue is that a negative binomial can only model overdispersion, whereas my data contain both under- and over-dispersed responses across the different groups. For the underdispersed groups sigma goes to 0, which produces the error. Their reply was:
The problem could be that the data are underdispersed, so sigma goes to zero and the derivatives produce NAs.
Try to fit the double Poisson, DPO(), to this specific data set.
As recommended by one of the authors, a distribution such as the double Poisson can fit these data because it allows the variance to be either larger or smaller than the mean. Using this distribution solved the problem for me, and I was able to fit a model:
gamlss(formula = metric ~ image_name + random(biological_source_name) - 1,
sigma.formula = ~ biological_source_name - 1,
family = "DPO",
data = na.omit(data))
Note the use of DPO in the above example.
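A rough way to see whether some groups are underdispersed is to compare the variance with the mean within each group. The sketch below uses simulated data (the group labels and distributions are assumptions for illustration only); a ratio below 1 points to underdispersion, which the negative binomial cannot represent:

```r
set.seed(10)
toy <- data.frame(
  group = rep(c("under", "over"), each = 500),
  y = c(rbinom(500, size = 10, prob = 0.8),  # variance smaller than the mean
        rnbinom(500, size = 2, mu = 8))      # variance larger than the mean
)

# Variance-to-mean ratio per group: < 1 underdispersed, > 1 overdispersed.
disp <- tapply(toy$y, toy$group, function(v) var(v) / mean(v))
disp
```

This is only a raw check that ignores the other predictors in the model, so it indicates where to look rather than replacing a fitted dispersion model.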

Using non-integers vs integers: warnings with non-integers but model won't run with integers

I am having some trouble running negative binomial models. I have a dataset with counts of animals, but the sampling effort differs between observations, so I can calculate the rate of animals per day. The dataset is quite big (>100000 observations). I am surprised I couldn't find other topics covering my question; if you know of one, that would be helpful!
When trying to fit a model to my data, I run into some problems. Either I run a negative binomial model with the rates
> m1<-glm.nb(Rates ~ Par1+Par2+...+Par7+Par8,data=data)
and then I get the following warning messages:
>Warning messages:
1: In dpois(y, mu, log = TRUE) : non-integer x = 25.913718
2: In dpois(y, mu, log = TRUE) : non-integer x = 5.457385
3: In dpois(y, mu, log = TRUE) : non-integer x = 2.195133
4: In dpois(y, mu, log = TRUE) : non-integer x = 2.721088
5: In dpois(y, mu, log = TRUE) : non-integer x = 6.971678
6: In dpois(y, mu, log = TRUE) : non-integer x = 21.863799
7: In dpois(y, mu, log = TRUE) : non-integer x = 5.300733
8: In dpois(y, mu, log = TRUE) : non-integer x = 7.157865
9: In dpois(y, mu, log = TRUE) : non-integer x = 14.117588
10: In dpois(y, mu, log = TRUE) : non-integer x = 6.505993, etc.
Or I run the model with an offset
> m2<-glm.nb(Count ~ Par1+Par2+...+Par7+Par8+offset(Effort),data=data)
This however gives the following error:
> Error: no valid set of coefficients has been found: please supply starting values
In addition: Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted rates numerically 0 occurred
I have already tried supplying the coefficients of the first model as starting values for the second, but this doesn't work. Using the pscl package or increasing the number of iterations doesn't work either. This is a subset of my data (one species) with very few zeros.
Any suggestions? I feel that actually the second way of modelling this is the proper way of doing it, but I don't know how to get this model to run. Any ideas? Would be much appreciated.
You almost certainly want one of the following, assuming Rates = Count/Effort. Either fit the rate, and use effort as a weighting variable:
glm.nb(Rates ~ *, weights=Effort, data=data)
Or, fit the counts, and use log(effort) as an offset:
glm.nb(Count ~ * + offset(log(Effort)), data=data)
See also my answer on CrossValidated about offsets in poisson/negative binomial models.
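The offset version can be sketched on simulated data as follows. The data frame, the single predictor Par1 and the true coefficients are made up for illustration; only Count, Effort and the offset(log(Effort)) term mirror the question. Since the offset is added to the linear predictor on the log scale, offset(Effort) would multiply the mean by exp(Effort), which is a plausible reason the original call could not find valid coefficients:

```r
library(MASS)

set.seed(42)
n <- 500
dat <- data.frame(
  Par1   = rnorm(n),           # hypothetical predictor
  Effort = runif(n, 1, 30)     # hypothetical days of effort
)
# Expected count = effort * rate, with a log-linear rate.
mu <- dat$Effort * exp(0.5 + 0.3 * dat$Par1)
dat$Count <- rnbinom(n, size = 2, mu = mu)

# Integer counts as the response, log(Effort) as the offset.
m_off <- glm.nb(Count ~ Par1 + offset(log(Effort)), data = dat)
coef(m_off)
```

The coefficients are then on the log-rate scale, so exp(coef(m_off)) gives multiplicative effects on animals per unit of effort.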
