Estimating Dynamic Difference in Difference in R

I've been trying to estimate the above regression using a dataset of multiple cross sections, and I tried using the did library without success. The dataset is large, and I have already formatted it so that it has an event-time dummy, but the call below gives an error. Treatment occurs in 2018, the outcome is emp, and the base period should be 2017.
I tried:
df4<-df1[complete.cases(df1$treat),]
df4<-df4[complete.cases(df4$emp),]
df4<-df4[(df4$year>=2014),]
df4$g<-ifelse(df4$treat==1,2018,0)
att1 <- att_gt(yname = "emp",
tname = "period",
gname = "G",
xformla = ~treat+factor(month)+factor(year),
data = df4,
panel=FALSE
)
and it gives me
Error in DRDID::drdid_rc(y = Y, post = post, D = G, covariates = covariates, :
Outcome regression model coefficients have NA components.
Multicollinearity (or lack of variation) of covariates is a likely reason.
In addition: Warning messages:
1: glm.fit: algorithm did not converge
2: In DRDID::drdid_rc(y = Y, post = post, D = G, covariates = covariates, :
glm algorithm did not converge
I also ran a regression using lm only, but it gave insignificant results, which should not be the case for my assignment:
ols1 <- lm(emp ~ relevel(factor(year), ref = "2017") * treat + factor(month),
           data = df4)
summary(ols1)
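For comparison, here is a minimal event-study sketch on simulated repeated cross sections, assuming treatment starts in 2018 and 2017 is the omitted base year as described above. The data frame `sim`, its effect sizes, and the names `year_f`, `es`, and `es_coefs` are made up for illustration and are not the asker's data. It is the same kind of interaction regression as the lm call above, not a substitute for att_gt.

```r
# Simulated repeated cross sections (hypothetical data, not the asker's df4)
set.seed(42)
n <- 6000
sim <- data.frame(
  year  = sample(2014:2020, n, replace = TRUE),
  treat = rbinom(n, 1, 0.5)
)

# Assumed data-generating process: common trend, a level gap for the treated
# group, and a treatment effect of 2 from 2018 onwards
sim$emp <- 10 + 0.3 * (sim$year - 2014) + 1 * sim$treat +
  2 * sim$treat * (sim$year >= 2018) + rnorm(n)

# Year dummies with 2017 as the omitted base period
sim$year_f <- relevel(factor(sim$year), ref = "2017")

# Group effect + year effects + group-by-year interactions
es <- lm(emp ~ treat * year_f, data = sim)

# The interaction coefficients are the event-study estimates relative to 2017
es_coefs <- coef(es)[grep("^treat:year_f", names(coef(es)))]
round(es_coefs, 2)
```

In this setup the `treat:year_f...` coefficients for 2014 to 2016 serve as pre-trend checks and those for 2018 onwards as dynamic effects, each relative to 2017. It may also be worth checking that the att_gt call passes gname = "G" while the column created above is named g, since R is case-sensitive.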

Related

Trouble in GAM model in R software

I am trying to run the following code on R:
m <- gam(Flp_pop ~ s(Flp_CO, bs = "cr", k = 30), data = data, family = poisson, method = "REML")
My dataset is like this: (image not included)
But when I try to execute, I get this error message:
"Error in if (abs(old.score - score) > score.scale * conv.tol) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)"
I am very new to R, maybe it is a very basic question. But does anyone know why this is happening?
Thanks!
The Poisson distribution has support on the non-negative integers, but you are passing a continuous variable as the response. Here's an example with simulated data:
library("mgcv")
library("gratia")
library("dplyr")
df <- data_sim("eg1", seed = 2) %>% # simulate Gaussian response
  mutate(yabs = abs(y))             # make y non-negative
mp <- gam(yabs ~ s(x2, bs = "cr"), data = df,
          family = poisson, method = "REML")
# fails
which reproduces the error you saw
Error in if (abs(old.score - score) > score.scale * conv.tol) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
The warnings are of the form:
$> warnings()[1]
Warning message:
In dpois(y, y, log = TRUE) : non-integer x = 7.384012
This indicates the problem: the model evaluates the probability mass of your response data given the estimated model, and it does so at the indicated non-integer value, which returns zero mass plus the warning.
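The behaviour behind that warning can be reproduced in isolation with base R. This is only a small illustration using the value from the warning; the name `val` is arbitrary.

```r
# Evaluate the Poisson log-mass at the non-integer value from the warning.
# dpois() warns about the non-integer x; the warning is suppressed here.
val <- suppressWarnings(dpois(7.384012, lambda = 7.384012, log = TRUE))
val  # -Inf, i.e. zero probability mass on the log scale

# At an integer value the log-mass is finite
dpois(7, lambda = 7.384012, log = TRUE)
```

A log-likelihood that contains -Inf terms is what leaves the convergence check with a missing value to compare.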
If we'd passed the original Gaussian variable as the response, which includes negative values, the function would have errored out earlier:
mp <- gam(y ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
which raises this error:
r$> mp <- gam(y ~ s(x2, bs = "cr"), data = df,
family = poisson, method = "REML")
Error in eval(family$initialize) :
negative values not allowed for the 'Poisson' family
An immediate but not necessarily advisable solution is just to use the quasipoisson family
mq <- gam(yabs ~ s(x2, bs = "cr"), data = df,
family = quasipoisson, method = "REML")
which uses the same mean-variance relationship as the Poisson distribution but not the actual distribution, so we can get away with abusing it.
Better would be to ask yourself why you were trying to fit a model that is ostensibly for counts to a response that is a continuous (non-negative) variable?
If the answer is you had a count but then normalised it in some way (say by dividing by some measure of effort like area surveyed or length of observation time) then you should use an offset of the form + offset(log(effort_var)) added to the model formula, and use the original non-normalised integer variable as the response.
If you really have a continuous response and the Poisson was an oversight, try fitting with family = Gamma(link = "log") or family = tw().
If it's something else, you should edit your question to include that info and perhaps we here can help or the question could be migrated to CrossValidated if the issue is more statistical in nature.
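The offset and Gamma suggestions above can be sketched on simulated data as follows. Everything here is hypothetical: the data frame `sim`, the effort variable `effort`, and the responses `count` and `ypos` are invented for illustration and would need to be replaced by the real variables.

```r
library("mgcv")

set.seed(1)
n <- 300
x <- runif(n)
effort <- runif(n, 1, 10)               # hypothetical effort, e.g. area surveyed
mu_rate <- exp(0.5 + sin(2 * pi * x))   # assumed true rate per unit of effort

sim <- data.frame(
  x      = x,
  effort = effort,
  count  = rpois(n, lambda = mu_rate * effort)  # raw integer counts
)

# Case 1: counts that would otherwise be normalised by effort.
# Keep the integer counts as the response and add log(effort) as an offset.
m_off <- gam(count ~ s(x, bs = "cr") + offset(log(effort)),
             data = sim, family = poisson, method = "REML")

# Case 2: a genuinely continuous, strictly positive response.
sim$ypos <- rgamma(n, shape = 5, rate = 5 / mu_rate)  # mean equals mu_rate
m_gam <- gam(ypos ~ s(x, bs = "cr"), data = sim,
             family = Gamma(link = "log"), method = "REML")

# Predicted rate per unit of effort at x = 0.25 (effort set to 1)
rate_hat <- predict(m_off, newdata = data.frame(x = 0.25, effort = 1),
                    type = "response")
```

With the offset in the formula, the smooth describes the rate per unit of effort while the likelihood is still evaluated on integer counts, which is what the Poisson family expects.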

Argument "weights" in bayesglm() function in R

I am building a default risk prediction model using bayesglm with the binomial family, and I would like to fit the model with weights. I am trying to use the principal vector (the amount of money that a company has lent to a person) as weights, but I get these warning messages:
1: In bayesglm.fit(x = X, y = Y, weights = weights, start = start, : non-finite coefficients at iteration 4
2: algorithm did not converge
3: fitted probabilities numerically 0 or 1 occurred
The principal has a high variance; could that be the reason? I also tried the log of the principal and got the same messages.
set.seed(123)
lm_D_O9<- bayesglm(sampleDefaultO_tr$Default ~ ., data = sampleDefaultO_tr[,-c(20,23,24,49:54,58,60:62)], family=binomial,control = list(maxit = 100),
weights=floor(log(sampleDefaultO_tr$mntTotal)*1000))
My repo here--> github.com/dclopezb9/Thesis
Thank you in advance!

Missing values for variable importance for neural network in package iml in R

I am trying to get variable importance from a neural network with the iml package in R. The dependent variable is binary and the predictors are normalised. I get a missing value for every predictor. Here's the code I'm using:
library(mlr)
library(iml)
tsk = makeClassifTask(data = fullnorm, target = "churn")
rfa <- makeLearner("classif.nnet", predict.type = "prob")# cross validation with NN
mod = train(rfa, tsk)
X =fullnorm[which(names(fullnorm) != "churn")]
Y <- as.numeric(as.character(fullnorm$churn))
predictor = iml::Predictor$new(mod, data = X, y = Y)
imp = FeatureImp$new(predictor, loss = "f1")
plot(imp)
I get no message other than a warning that the rows with missing values (i.e. all predictors) were removed from the plot.
> plot(imp)
Warning messages:
1: Removed 15 rows containing missing values (geom_point).
2: Removed 15 rows containing missing values (geom_segment).

Error when trying to run fixed effects logistic regression

Not sure where I can get help, since this exact post was considered off-topic on StackExchange.
I want to run some regressions based on a balanced panel with electoral data from Brazil, focusing on 2 time periods. I want to understand whether, after a change in legislation that prohibited firm donations to candidates, those individuals who depended most on these resources had a lower probability of getting elected.
I have already run a regression like this in R:
model_continuous <- plm(percentage_of_votes ~ time +
treatment + time*treatment, data = dataset, model = 'fd')
In this model I used a continuous variable (% of votes) as my dependent variable. My treatment units are those that in time = 0 had no campaign contributions coming from corporations.
Now I want to change my dependent variable so that it is a binary variable indicating whether the candidate was elected in that year. All of my units were elected in time = 0. How can I estimate a logit or probit model using fixed effects? I have tried using the pglm package in R.
model_binary <- pglm(dummy_elected ~ time + treatment + time*treatment,
data = dataset,
effects = 'twoways',
model = 'within',
family = 'binomial',
start = NULL)
However, I got this error:
Error in maxRoutine(fn = logLik, grad = grad, hess = hess, start = start, :
argument "start" is missing, with no default
Why is that happening? What is wrong with my model? Is it conceptually correct?
I want the second regression to be as similar as possible to the first one.
I have read that the clogit function from the survival package could do the job, but I don't know how to do it.
Edit:
this is what a sample dataset could look like:
dataset <- data.frame(individual = c(1,1,2,2,3,3,4,4,5,5),
time = c(0,1,0,1,0,1,0,1,0,1),
treatment = c(0,0,1,1,0,0,1,1,0,0),
corporate = c(0,0,0.1,0,0,0,0.5,0,0,0))
Based on the comments, I believe the logistic regression reduces to treatment and dummy_elected. Accordingly I have fabricated the following dataset:
dataset <- data.frame("treatment" = c(rep(1,1000),rep(0,1000)),
"dummy_elected" = c(rep(1, 700), rep(0, 300), rep(1, 500), rep(0, 500)))
I then ran the GLM model:
library(MASS)
model_binary <- glm(dummy_elected ~ treatment, family = binomial(), data = dataset)
summary(model_binary)
Note that the treatment coefficient is significant. Using the fitted coefficients, the resulting probabilities are
P(dummy_elected = 1) = 1 / (1 + exp(-(1.37674342264577E-16 + 0.847297860386033 * treatment)))
P(dummy_elected = 0) = 1 - 1 / (1 + exp(-(1.37674342264577E-16 + 0.847297860386033 * treatment)))
Note that these probabilities are consistent with the frequencies I used to generate the data (0.7 for treatment = 1 and 0.5 for treatment = 0).
So for each row, the outcome with the larger of the two probabilities above is the predicted value for dummy_elected.
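The fitted probabilities can be checked directly in R rather than by hand. This is a minimal sketch that rebuilds the fabricated dataset from above; the names `p_hat` and `obs` are introduced here for illustration.

```r
# Rebuild the fabricated dataset from the answer
dataset <- data.frame(
  treatment     = c(rep(1, 1000), rep(0, 1000)),
  dummy_elected = c(rep(1, 700), rep(0, 300), rep(1, 500), rep(0, 500))
)

# glm() is in the stats package, so no extra library is needed
model_binary <- glm(dummy_elected ~ treatment, family = binomial(), data = dataset)

# Fitted P(dummy_elected = 1) for treatment = 0 and treatment = 1
p_hat <- predict(model_binary,
                 newdata = data.frame(treatment = c(0, 1)),
                 type = "response")

# Observed election frequencies by treatment group, for comparison
obs <- tapply(dataset$dummy_elected, dataset$treatment, mean)

rbind(fitted = unname(p_hat), observed = unname(obs))
```

Because the model has a single binary predictor, it is saturated, so the fitted probabilities reproduce the group frequencies.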

Prediction and Marginal Effects failure using mlogit() in R for a Nested Logit Model with updated data frame

I have run a nested logit model in R using the mlogit package. I am now trying to compute marginal effects/elasticities and keep running into an error. Here I have recreated the error by modifying the vignette by the package author:
data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
m <- mlogit(mode ~ price | income | catch, data = Fish,
nests=list(water=c("boat","charter"),
land=c("beach","pier")))
# compute a data.frame containing the mean value of the covariates in the sample
z <- with(Fish, data.frame(price = tapply(price, index(m)$alt, mean),
catch = tapply(catch, index(m)$alt, mean),
income = mean(income)))
# compute the marginal effects (the second one is an elasticity)
effects(m, covariate = "income", data = z)
I get the following error:
Error in `colnames<-`(`*tmp*`, value = c("beach", "boat", "charter", "pier" :
attempt to set 'colnames' on an object with less than two dimensions
In addition: Warning message:
In cbind(Gb, Gl) :
number of rows of result is not a multiple of vector length (arg 2)
This works fine when I do not have a nested model (like a regular multinomial logit), and that has been covered in some previous Stack Overflow questions, but something weird is happening specifically at the step of re-predicting on a changed data frame (in this case the means frame z).
I'll note that the solution here: marginal effects of mlogit in R did not help me.
