Why does the ivreg function generate a strange error? - r

I'm trying to use instrumental variables for the following three variables: pwtopen, inc, incsqr, with the following three instruments: elhsfs, incf, incfsqr. polity is an exogenous variable.
answer<- ivreg(sulfdm ~ polity + pwtopen + inc + incsqr|polity + elhsfs + incf + incfsqr,
mydata)
I am then getting the error message:
Error in ivreg(sulfdm ~ polity + pwtopen + inc + incsqr | polity + elhsfs + :
length(formula)[1] == 1L is not TRUE
Any thoughts? Thanks

You should name the arguments in this case. The signature of ivreg is:
ivreg(formula, instruments, data, subset, na.action, weights, offset,
contrasts = NULL, model = TRUE, y = TRUE, x = FALSE, …)
Because you didn't name the arguments, the function matched your data frame by position to the second argument, instruments.
So, this should solve your problem:
answer<- ivreg(sulfdm ~ polity + pwtopen + inc + incsqr|polity + elhsfs + incf + incfsqr,
data = mydata)

I faced the same issue. Rewriting the command as below solved it:
answer <- ivreg(sulfdm ~ polity + pwtopen + inc + incsqr, ~polity + elhsfs + incf + incfsqr, mydata)
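
To see why the unnamed data frame ends up in the wrong slot, here is a minimal sketch of R's positional argument matching. toy_ivreg is a hypothetical stand-in that only copies the order of the first three arguments shown in the signature above; it does not fit a model, and the small data frame is made up for illustration.

```r
# toy_ivreg is a hypothetical stand-in: it mimics only the order of the
# first three arguments of ivreg() and reports which ones were supplied.
toy_ivreg <- function(formula, instruments, data) {
  c(instruments = !missing(instruments), data = !missing(data))
}

# Made-up data, used only so that there is something to pass in.
mydata <- data.frame(y = 1:3, x = 1:3, z = 1:3)

# The unnamed second argument is matched by position to `instruments`.
positional <- toy_ivreg(y ~ x | z, mydata)

# Naming the argument sends the data frame to `data` instead.
named <- toy_ivreg(y ~ x | z, data = mydata)

print(positional)
print(named)
```

Both fixes in this thread work for the same reason: either the data frame is named, or it is moved to the third position where data is expected.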

Related

How can I solve "invalid 'times' value" error in tidymv package in R?

I am using a generalized additive model (gam) in R and trying to use the "tidymv" package for its visualisation. My gam model is the following:
gam_reg <-
  gam(
    asinhnondurable ~ as.factor(CountyName) + zero5 + m611 + f611 + m1217 + f1217 +
      m1864Illiterate + f1864Illiterate + m1864Primary +
      f1864Primary + m1864Secondary + f1864Secondary +
      m1864University + f1864University + AgeCtgover65 +
      car + HomeOwner + Land + Livestock + Year +
      Year * as.factor(CountyName) + Year2 * as.factor(CountyName) +
      s(tempshock) + s(rainfallshock),
    family = gaussian,
    data = data,
    weights = data$Weight
  )
And then I use the following code for visualization:
model_p <- predict_gam(gam_reg)
but, I face the following error:
Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
invalid 'times' value
Does anyone know what the problem is and how I could solve the problem?

String formula for linear models

I'm trying to create a string formula with the independent variables that are significant within my linear model, though I'm finding it difficult trying to include the + at the end of each variable.
I have tried:
as.formula(sprintf("encounter ~ %s",
names(tbest$model)[-1]))
However, this only gives the first variable:
encounter ~ open_shrubland
Warning message:
Using formula(x) is deprecated when x is a character vector of length > 1.
Consider formula(paste(x, collapse = " ")) instead.
How would I include all of them such that: encounter ~ X1 + X2 + X3 ..., also, can this be made functional, such that if I wanted to remove a variable, I would only have to do my.formula[-3] to remove it?
list of variable names:
c("open_shrubland", "Appalachian_Mountains", "Boreal_Hardwood_Transition",
"Central_Hardwoods", "Piedmont", "wetland", "Badlands_And_Prairies",
"Peninsular_Florida", "Central_Mixed_Grass_Prairie", "water",
"New_England_Mid_Atlantic_Coast", "grassland", "mixed_forest",
"cropland", "Oaks_And_Prairies", "Eastern_Tallgrass_Prairie",
"evergreen_needleleaf", "year", "pland_change", "evergreen_broadleaf",
"Southeastern_Coastal_Plain", "Prairie_Potholes", "Shortgrass_Prairie",
"urban", "Prairie_Hardwood_Transition", "Lower_Great_Lakes_St.Lawrence_Plain",
"mosaic", "Mississippi_Alluvial_Valley", "deciduous_broadleaf",
"deciduous_needleleaf", "barren")

Using reformulate will be helpful.
reformulate(names(tbest$model)[-1], 'encounter')
If the list of variable names is in x:
reformulate(x, 'encounter')
encounter ~ open_shrubland + Appalachian_Mountains + Boreal_Hardwood_Transition +
Central_Hardwoods + Piedmont + wetland + Badlands_And_Prairies +
Peninsular_Florida + Central_Mixed_Grass_Prairie + water +
New_England_Mid_Atlantic_Coast + grassland + mixed_forest +
cropland + Oaks_And_Prairies + Eastern_Tallgrass_Prairie +
evergreen_needleleaf + year + pland_change + evergreen_broadleaf +
Southeastern_Coastal_Plain + Prairie_Potholes + Shortgrass_Prairie +
urban + Prairie_Hardwood_Transition + Lower_Great_Lakes_St.Lawrence_Plain +
mosaic + Mississippi_Alluvial_Valley + deciduous_broadleaf +
deciduous_needleleaf + barren

We can also create a formula with paste:
as.formula(paste('encounter ~', paste(names(tbest$model)[-1], collapse = " + ")))
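
On the second part of the question (dropping a variable by index): a formula cannot be indexed that way, but the character vector of names can. Here is a minimal sketch, assuming a shortened, illustrative subset of the variable names listed in the question:

```r
# Illustrative subset of the variable names from the question.
vars <- c("open_shrubland", "wetland", "water", "grassland")

# Build the full formula from the character vector.
full <- reformulate(vars, response = "encounter")

# Drop the third variable by indexing the vector, then rebuild.
reduced <- reformulate(vars[-3], response = "encounter")

# Alternative: remove a term by name from an existing formula.
reduced_by_name <- update(full, . ~ . - water)

print(full)
print(reduced)
print(reduced_by_name)
```

Keeping the names in a vector and rebuilding the formula when needed is usually easier than editing the formula object itself.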

Add column in regression output

I would like to add a column with the X² (Chi-Square) as well as a column with the Exp(B) to my regression output. Any idea how to do this? Many thanks in advance. For now I have calculated this manually for every model and variable, which is quite time-consuming.
model_simple <- as.formula("completion_yesno ~ ac + ov + UCRate + FirstWeek + LastWeek + DayofWeekSu + DayofWeekMo + DayofWeekTu + DayofWeekWe + DayofWeekTh + DayofWeekFr + MonthofYearJan + MonthofYearFeb + MonthofYearMar + MonthofYearApr +MonthofYearMay+ MonthofYearJun + MonthofYearJul + MonthofYearAug + MonthofYearSep + MonthofYearOct + MonthofYearNov")
clog_simple1 = glm(model_simple,data=cllw,family = binomial(link = cloglog))
summary(clog_simple1)

Maybe you can elaborate on what you mean by chi-square and exp(B). In the meantime, you can do the following:
da <- MASS::Pima.tr
model <- glm(type ~ .,data=da,family = binomial(link = cloglog))
results <- data.frame(coefficients(summary(model)),check.names=FALSE)
# placeholder: random values standing in for the chi-square statistic
results$chisq = rchisq(nrow(results),1)
results$expB = exp(results$Estimate)
Or you can use tidy from broom:
library(broom)
results = tidy(model)
# tidy() names the coefficient column "estimate" (lowercase)
results$expB = exp(results$estimate)
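
If by X² you mean the Wald chi-square that some other packages print next to Exp(B), it is just the squared z value with one degree of freedom. That reading of the question is an assumption. Here is a minimal sketch on the built-in mtcars data, with a logit link chosen only to keep the example small; the computation does not depend on the link:

```r
# Toy model on built-in data; the data set and link are illustrative only.
model <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Coefficient table as a data frame, keeping names like "z value" intact.
results <- data.frame(coefficients(summary(model)), check.names = FALSE)

# Wald chi-square (assumed meaning of X²): squared z statistic, 1 df.
results$chisq <- results[["z value"]]^2
results$p_chisq <- pchisq(results$chisq, df = 1, lower.tail = FALSE)

# Exp(B): exponentiated coefficient.
results$expB <- exp(results$Estimate)

print(results)
```

The p-value from the chi-square column matches the Pr(>|z|) column that summary() already reports, so only the statistic itself is new information.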

upper scope has term ‘NA’ not included in model

I am working on a data set and would like to do stepwise logistic regression using some variables. To do so I am using the add1() function in R. A sample of the data set can be downloaded from the link here: https://drive.google.com/file/d/0B0N-Nc7kEi4bVjhDd1FDaEE5cEE/view?usp=sharing
I thereby fit a logistic regression using:
train <- read.csv('training.csv')
glm.model_step_1 <- glm(loan_status ~ acc_open_past_24mths + annual_inc + avg_cur_bal + bc_open_to_buy + delinq_2yrs + dti + inq_last_6mths + installment + int_rate + mo_sin_old_il_acct + mo_sin_old_rev_tl_op + mo_sin_rcnt_rev_tl_op + mo_sin_rcnt_tl + mort_acc + mths_since_last_delinq + mths_since_recent_bc + mths_since_recent_inq + num_accts_ever_120_pd + num_actv_bc_tl + num_actv_rev_tl + num_bc_tl + num_il_tl + num_op_rev_tl + num_tl_op_past_12m + pct_tl_nvr_dlq + percent_bc_gt_75 + pub_rec_bankruptcies + revol_bal + revol_util + term + total_acc + total_bc_limit + total_il_high_credit_limit + fico_mean + addr_state + emp_length + verification_status + Count_NA + Info_missing + Engineer + Teacher + Doctor + Professor + Manager + Director + Analyst + senior + lead + consultant + home_ownership_own + home_ownership_rent + purpose_debt_consolidation + purpose_medical + purpose_credit_card + purpose_other,
data = train,
family = binomial(link = 'logit'))
And use the add1() function to do a forward selection.
add1(glm.model_step_1, scope = train)
This code does not work. I get the below error:
Error in factor.scope(attr(terms1, "factors"), list(add = attr(terms2, :
upper scope has term ‘NA’ not included in model
Does anyone know how to solve this error?
A question asked previously on datascience.stackexchange (https://datascience.stackexchange.com/questions/11604/checking-regression-coefficients-stability) mentioned checking for NAs. There aren't any NAs in the data set, which can be confirmed by running sapply(train, function(x) sum(is.na(x)))

The train dataset of @Jash Sash contains some anomalous values, which force read.csv to read some numerical variables as factors with many categories.
Anyway, I consider here a model with only a few variables in order to show how to avoid the error message reported above.
Remember that the scope argument must be a "formula giving the terms to be considered for adding or dropping"; it cannot be a data.frame as in the code of @Jash Sash.
train <- read.csv('training.csv')
# flag the columns that were read as factors (apply() would coerce to a matrix first)
is_factor <- sapply(train, is.factor)
glm.model_step_1 <- glm(loan_status ~ acc_open_past_24mths + avg_cur_bal + bc_open_to_buy,
data = na.omit(train),
family = binomial(link = 'logit'))
add1(glm.model_step_1, scope=~.+delinq_2yrs+inq_last_6mths+int_rate)
The result is:
Model:
loan_status ~ acc_open_past_24mths + avg_cur_bal + bc_open_to_buy
               Df Deviance    AIC
<none>              1038.6 1046.6
delinq_2yrs     1   1037.9 1047.9
inq_last_6mths  1   1038.0 1048.0
int_rate        1   1038.0 1048.0
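
If the intent behind scope = train was "consider every other column", the scope formula can be built from the column names. Here is a minimal sketch on the built-in mtcars data with a gaussian glm; the data set and the model are assumptions made only so that the example runs without training.csv:

```r
# Small starting model on built-in data (illustrative, not the loan data).
base_model <- glm(mpg ~ wt, data = mtcars)

# Every column except the response becomes a candidate term.
candidates <- setdiff(names(mtcars), "mpg")

# scope must be a formula, so build one from the names.
scope <- reformulate(candidates)

# One forward step: each candidate not yet in the model is tried in turn.
step_table <- add1(base_model, scope = scope, test = "Chisq")
print(step_table)
```

With the real data the same pattern would use names(train) and "loan_status", after fixing the columns that were read in with the wrong type.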

Trying to run glmulti with quasipoisson

I have tried running the following equation:
modruns2 <- glmulti(SRI ~ TVIS + SNH3000 + BU1000 + TBE250 + TS3000 + TAB3000 +
PELHL1000 + SR250 + SW250, data = data1, family = quasipoisson,
glmulti-cvalue = 6.219132, level =1, maxsize = 4, crit = qaicc)
And it comes up with the error:
Error: unexpected '=' in "modruns2 <- glmulti(SRI ~ TVIS + SNH3000 + BU1000 + TBE250 +
TS3000 + TAB3000 + PELHL1000 + SR250 + SW250, data = data1, family = quasipoisson, glmulti-
cvalue ="
I have been trying to follow http://cran.r-project.org/web/packages/glmulti/glmulti.pdf & http://vcalcagnoresearch.wordpress.com/package-glmulti/
But it seems I may be specifying the c-hat incorrectly.
Any help?

Install the package named R.utils, which provides setOption().
What LyzandeR suggested was correct, but instead of
setOption('glmulti-cvalue'=6.219132)
use
setOption('glmulti-cvalue',6.219132)
I had the same issue, and I found the answer in an example on this page.
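
As for the original error: an unquoted argument name cannot contain a hyphen, because R reads glmulti-cvalue as "glmulti minus cvalue" and then stumbles over the "=". Here is a minimal sketch that only parses the text and never calls anything; f is a hypothetical function name used for illustration:

```r
# Try to parse a piece of code; return TRUE if it is syntactically valid.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# Unquoted name with a hyphen: the parser sees a subtraction, then "=".
unquoted_ok <- parses("f(glmulti-cvalue = 6.219132)")

# Quoted name: syntactically valid, although setOption() wants the
# name and the value as two separate arguments.
quoted_ok <- parses("f('glmulti-cvalue' = 6.219132)")

# Name and value as two arguments, as in the fix above.
two_args_ok <- parses("f('glmulti-cvalue', 6.219132)")

print(c(unquoted = unquoted_ok, quoted = quoted_ok, two_args = two_args_ok))
```

That is why the value has to be set through an option with a quoted name rather than passed as an argument inside the glmulti() call.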
