String formula for linear models - r

I'm trying to build a formula from a string that contains the independent variables that are significant in my linear model, but I'm having trouble joining the variable names with +.
I have tried:
as.formula(sprintf("encounter ~ %s",
names(tbest$model)[-1]))
However, this only gives the first variable:
encounter ~ open_shrubland
Warning message:
Using formula(x) is deprecated when x is a character vector of length > 1.
Consider formula(paste(x, collapse = " ")) instead.
How can I include all of them, so that the result is encounter ~ X1 + X2 + X3 ...? Also, can this be made flexible, so that removing a variable only requires something like my.formula[-3]?
List of variable names:
c("open_shrubland", "Appalachian_Mountains", "Boreal_Hardwood_Transition",
"Central_Hardwoods", "Piedmont", "wetland", "Badlands_And_Prairies",
"Peninsular_Florida", "Central_Mixed_Grass_Prairie", "water",
"New_England_Mid_Atlantic_Coast", "grassland", "mixed_forest",
"cropland", "Oaks_And_Prairies", "Eastern_Tallgrass_Prairie",
"evergreen_needleleaf", "year", "pland_change", "evergreen_broadleaf",
"Southeastern_Coastal_Plain", "Prairie_Potholes", "Shortgrass_Prairie",
"urban", "Prairie_Hardwood_Transition", "Lower_Great_Lakes_St.Lawrence_Plain",
"mosaic", "Mississippi_Alluvial_Valley", "deciduous_broadleaf",
"deciduous_needleleaf", "barren")

Using reformulate will be helpful.
reformulate(names(tbest$model)[-1], 'encounter')
If the variable names are stored in a character vector x:
reformulate(x, 'encounter')
encounter ~ open_shrubland + Appalachian_Mountains + Boreal_Hardwood_Transition +
Central_Hardwoods + Piedmont + wetland + Badlands_And_Prairies +
Peninsular_Florida + Central_Mixed_Grass_Prairie + water +
New_England_Mid_Atlantic_Coast + grassland + mixed_forest +
cropland + Oaks_And_Prairies + Eastern_Tallgrass_Prairie +
evergreen_needleleaf + year + pland_change + evergreen_broadleaf +
Southeastern_Coastal_Plain + Prairie_Potholes + Shortgrass_Prairie +
urban + Prairie_Hardwood_Transition + Lower_Great_Lakes_St.Lawrence_Plain +
mosaic + Mississippi_Alluvial_Valley + deciduous_broadleaf +
deciduous_needleleaf + barren

We can also create the formula with paste:
as.formula(paste('encounter ~', paste(names(tbest$model)[-1], collapse = " + ")))
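The sprintf attempt in the question returns one string per variable name, because sprintf is vectorised over its arguments; that is why as.formula warns and keeps only the first. Indexing a formula object with [-3] would not drop the third predictor either, since a two-sided formula has only three elements (the ~, the response, and the whole right-hand side). A minimal sketch of one alternative: keep the names in a character vector, drop entries by index, and rebuild the formula. The short vars vector below is a hypothetical stand-in for names(tbest$model)[-1].

```r
# Hypothetical predictor names standing in for names(tbest$model)[-1]
vars <- c("open_shrubland", "wetland", "water", "grassland")

# Full formula: encounter ~ open_shrubland + wetland + water + grassland
f_all <- reformulate(vars, response = "encounter")

# Drop the third variable by index on the character vector, then rebuild
f_drop <- reformulate(vars[-3], response = "encounter")

# Alternatively, remove a term by name from an existing formula
f_upd <- update(f_all, . ~ . - water)

print(f_drop)
print(f_upd)
```

Both f_drop and f_upd can be passed directly to lm() as the formula argument.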

Related

This is an event-study difference-in-differences regression. How can we translate this Stata code to R?

reg laccidentsvso2 weakban strongban lpop lunemp permale2 lrgastax laccidentmv2 st1-st50 t1-t48 time stt1-stt50 [aweight=pop],cluster(state)
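In R, lm() plays the role of Stata's regress. Stata variable ranges such as st1-st50 have to be expanded explicitly, because - in an R formula removes a term. The analytic weights go in weights = pop, and cluster(state) is handled afterwards by computing cluster-robust standard errors, for example with vcovCL from the sandwich package. The following should give results close to Stata's, assuming the dummy columns are literally named st1, ..., st50, t1, ..., t48 and stt1, ..., stt50:
library(lmtest)
library(sandwich)
# Expand the Stata variable ranges into explicit column names
rhs <- c("weakban", "strongban", "lpop", "lunemp", "permale2", "lrgastax", "laccidentmv2",
paste0("st", 1:50), paste0("t", 1:48), "time", paste0("stt", 1:50))
model1 <- lm(reformulate(rhs, response = "laccidentsvso2"), data = rstata, weights = pop)
# Standard errors clustered by state
coeftest(model1, vcov = vcovCL(model1, cluster = ~state))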
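One pitfall when translating Stata variable lists is that a range such as st1-st50 is not a range in an R formula. The small sketch below, using hypothetical variable names y, x and st1 to st5, shows how R reads the minus sign and how the range can be spelled out instead.

```r
# In an R formula "-" removes a term; it does not denote a range of variables
labels_minus <- attr(terms(y ~ x + st1 - st5), "term.labels")
print(labels_minus)  # only x and st1 remain

# Spell the range out explicitly (assumes columns are literally named st1..st5)
rhs <- c("x", paste0("st", 1:5))
f <- reformulate(rhs, response = "y")
print(f)
```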

r glmmLasso : Error in n %*% s (glmm.rmd#64): requires numeric/complex matrix/vector arguments

I am trying to fit a glmmLasso model using this code:
lasso <- glmmLasso(stars ~ ADJRIND + AUC + KAPPA + DICE + ICCORR + JACRD + MUTINF + RNDIND + SURFOVLP + SURFDICE + VOLSMTY + HDRFDST + AVGDIST + MAHLNBS + VARINFO + GCOERR + PROBDST + SNSVTY + SPCFTY + PRCISON + RECALL + FMEASR + ACURCY + FALLOUT + TP + FP + TN + FN + GTVOL + SEGVOL, rnd = ~1|participant, family = acat(), data = mixedModel_df, lambda=10, switch.NR=TRUE, control=list(print.iter=TRUE))
I get the following error message:
Error in n %*% s (glmm.rmd#64): requires numeric/complex matrix/vector arguments
All the IVs and my DV are doubles; only my random-effect variable is a factor.
Also, what should I specify for family?
As requested, here is the head of my data frame:
"","condition","algorithm","patient","participant","stars","ADJRIND","AUC","KAPPA","DICE","ICCORR","JACRD","MUTINF","RNDIND","SURFOVLP","SURFDICE","VOLSMTY","HDRFDST","AVGDIST","MAHLNBS","VARINFO","GCOERR","PROBDST","SNSVTY","SPCFTY","PRCISON","RECALL","FMEASR","ACURCY","FALLOUT","TP","FP","TN","FN","GTVOL","SEGVOL"
"1","rnd","BIIPL-rnd","brats_2013_pat0116_1","ablaze_gull",3,0.756964689377408,0.926221010756581,0.768802263900468,0.776064818441724,0.768315078860883,0.65050064332349,0.135475119060726,0.971161689934318,0.874015774267621,0.887817577382146,0.866059384942721,14.6242696243608,0.938245742350098,0.293565667284625,0.160520618779326,0.0249053133772558,0.334382929069828,0.861652251473578,0.990789770039584,0.744264326384239,0.861652251473578,0.776064818441724,0.985270803169769,0.00921022996041633,669.333333333333,198.222222222222,20804.6666666667,131.444444444444,800.777777777778,867.555555555556
"2","simple","simple","brats_2013_pat0116_1","ablaze_gull",6,0.864182045030138,0.954469433017555,0.870656560504094,0.874123730811541,0.870654729849903,0.779756022847205,0.164077165463804,0.986913573761583,0.974186271807427,0.979900170507277,0.957512050579079,6.97899055058698,0.175777792477891,0.114795641459174,0.0924422799237813,0.0125654163033296,0.147516600257771,0.913391157391641,0.99554770864347,0.839991365397643,0.913391157391641,0.874123730811541,0.99340680392773,0.00445229135652993,749.333333333333,92.4444444444444,20910.4444444444,51.4444444444444,800.777777777778,841.777777777778
"3","zyx","zyx","brats_2013_pat0116_1","ablaze_gull",5,0.870755923933255,0.952679073225429,0.876764151799132,0.879962617937312,0.876762586811393,0.78886242309958,0.165135328034368,0.98790705513673,0.981797036327553,0.980082675921329,0.964137575847861,7.45373777515636,0.172965178585364,0.102381275486303,0.0869596431464502,0.0116484532775206,0.139556099413398,0.909124934179681,0.996233212271177,0.855306841266058,0.909124934179681,0.879962617937312,0.993911405460269,0.00376678772882303,746,78.8888888888889,20924,54.7777777777778,800.777777777778,824.888888888889
"4","rnd","NJIT4321-rnd","brats_2013_pat0130_1","ablaze_gull",4.66666666666667,0.85306228612433,0.950463081424465,0.860901046931854,0.865114219533763,0.860857546901481,0.765289506462121,0.165902118593191,0.983989082060977,0.976890577030139,0.974069012629503,0.90775738368346,4.81983416902026,0.180602636858228,0.11416432081964,0.104953444577379,0.0149408163861948,0.15932446284558,0.905340924248747,0.995585238600183,0.849506729173741,0.905340924248747,0.865114219533763,0.991917129836001,0.00441476139981669,725.222222222222,89.2222222222222,20432.4444444444,81.1111111111111,806.333333333333,814.444444444444
"5","simple","simple","brats_2013_pat0130_1","ablaze_gull",5.33333333333333,0.85114266479814,0.948252440431362,0.858939084118719,0.863141736009654,0.858899132291815,0.762269269811805,0.165170605240332,0.984029969073529,0.978056597734505,0.975025528879729,0.910458708845115,4.24374989738083,0.179670438458937,0.107656330926193,0.104676641737853,0.0148917217218999,0.161945633247734,0.900627942177379,0.995876938685345,0.848419539019799,0.900627942177379,0.863141736009654,0.991938377123562,0.00412306131465471,719.888888888889,83.5555555555556,20438.1111111111,86.4444444444444,806.333333333333,803.444444444444
"6","zyx","zyx","brats_2013_pat0130_1","ablaze_gull",5,0.852937839717816,0.946405048878467,0.860405209999464,0.864421349815021,0.860343196696214,0.765679742754649,0.167425516242723,0.984621691861738,0.992079844226351,0.975692379970523,0.886529546039636,5.87456263780313,0.190822351731851,0.110378761504372,0.0990425203771905,0.0141114720044004,0.162026913526487,0.896740018774269,0.996070078982665,0.864244423545339,0.896740018774269,0.864421349815021,0.992240792064425,0.00392992101733526,722.333333333333,79.2222222222222,20442.4444444444,84,806.333333333333,801.555555555556
"7","rnd","UTintelligence-rnd","brats_2013_pat0134_1","ablaze_gull",1,0.439786545572739,0.734625155724745,0.465756360468916,0.488884490908871,0.46288397904837,0.347261286186172,0.104377530895259,0.907741141155794,0.765389141177261,0.654499477639931,0.696681027725238,21.7259079503732,2.41851089469065,0.46886831548909,0.401259375354614,0.0738450010093984,1.33323405520428,0.492371538620209,0.976878772829282,0.594188643079343,0.492371538620209,0.488884490908871,0.951254405933036,0.0231212271707184,736.444444444444,481.777777777778,19740.8888888889,574.555555555556,1311,1218.22222222222
"8","simple","simple","brats_2013_pat0134_1","ablaze_gull",3.33333333333333,0.785524625502044,0.901316639398279,0.801388538397059,0.810880384656072,0.801313556069467,0.696584302979966,0.207963331040685,0.965077426818364,0.958667243569809,0.928427262204787,0.916562197208537,12.6546214655814,0.472279577962782,0.107353421791921,0.197743216013271,0.0320484916833766,0.26517778145634,0.809476686023845,0.993156592772713,0.824707032687797,0.809476686023845,0.810880384656072,0.982142000911457,0.0068434072272868,1066.88888888889,141.222222222222,20081.4444444444,244.111111111111,1311,1208.11111111111
"9","zyx","zyx","brats_2013_pat0134_1","ablaze_gull",5.33333333333333,0.808816961290452,0.915389573957884,0.82255657191667,0.830680387649496,0.822524512693832,0.724253190335505,0.22263294830665,0.969975154484348,0.968502955091587,0.936460323238279,0.935112385680564,14.3628067945127,0.484728125133601,0.0985229007138266,0.177639450016539,0.0279718268170166,0.229407179518171,0.837724911061577,0.99305423685419,0.832290331137827,0.837724911061577,0.830680387649496,0.984701258781098,0.00694576314580967,1122.22222222222,144.555555555556,20078.1111111111,188.777777777778,1311,1266.77777777778
"10","rnd","Misfits-rnd","brats_2019_138_1","ablaze_gull",3,0.516106062592575,0.873715987224171,0.523038256992382,0.527656540346325,0.521661240623707,0.432121538305335,0.101222815104127,0.977103022412987,0.620103370493099,0.650702373280009,0.601390174958572,19.2537294363992,3.56344098531587,0.742348348696815,0.118109524669584,0.017378193265862,10.1585752147551,0.754093369661688,0.993338604786654,0.503577431529888,0.754093369661688,0.527656540346325,0.988377655513698,0.00666139521334549,447.111111111111,136.666666666667,20606,97.8888888888889,545,583.777777777778
"11","simple","simple","brats_2019_138_1","ablaze_gull",3.66666666666667,0.706452946686834,0.864953464263324,0.708649608736678,0.709924711751815,0.708580745993345,0.612548925090868,0.126104497349992,0.994785970651161,0.89537673459454,0.822033177974469,0.779958731884256,19.8501973757616,2.61846276424863,0.467761562013242,0.0385223535652684,0.00460614018952978,0.716819072659924,0.730891949753302,0.999014978773347,0.814645168472805,0.730891949753302,0.709924711751815,0.997383637222852,0.000985021226653005,510.777777777778,19.3333333333333,20723.3333333333,34.2222222222222,545,530.111111111111
"12","zyx","zyx","brats_2019_138_1","ablaze_gull",3.66666666666667,0.610753810401177,0.779712151049051,0.612727017856821,0.613869106450475,0.612524486488184,0.551363347584298,0,0.994766944943347,0,0.676414733802843,0.632006181002577,32.7153630434273,5.88313878476988,0.904099638796773,0,0.103402624869185,0.195759812968779,0.559908770106544,0.999515531991557,0.743154734248098,0.559908770106544,0.613869106450475,0.997373332439284,0.000484468008443052,500.444444444444,9.33333333333333,20733.3333333333,44.5555555555556,545,509.777777777778
"13","rnd","jaguars-rnd","brats_2019_141_1","ablaze_gull",4.33333333333333,0.776798511092125,0.9317995657587,0.788648121769615,0.795339119656737,0.788418290173286,0.679887517482747,0.196866342094433,0.974220482589075,0.872334990138839,0.900384822932791,0.847074704404622,16.5906961635536,0.919603987391872,0.123497536657694,0.155282955291875,0.023063324485323,0.298026651717319,0.870914581312203,0.992684550205198,0.777214433083584,0.870914581312203,0.795339119656737,0.986931119159787,0.00731544979480233,882.888888888889,136.555555555556,18413.5555555556,120.333333333333,1003.22222222222,1019.44444444444
"14","simple","simple","brats_2019_141_1","ablaze_gull",5,0.835890266663824,0.90833613741848,0.8426182511776,0.846277863042727,0.842552946270031,0.760738977047531,0.217793633127813,0.986032615479199,0.968525544930771,0.950089365515111,0.922774534238376,9.02494724371663,0.365712977970675,0.147826375535477,0.0979926364847341,0.0130304763502148,0.233237811437155,0.819784468248017,0.996887806588943,0.895772876952561,0.819784468248017,0.846277863042727,0.992962414786454,0.00311219341105724,923.111111111111,56.6666666666667,18493.4444444444,80.1111111111111,1003.22222222222,979.777777777778
"15","zyx","zyx","brats_2019_141_1","ablaze_gull",5,0.806421117677648,0.884725093453355,0.81387284154689,0.817936172710702,0.813668896379885,0.731083455431912,0.211748115057157,0.984046725668933,0.990145970199641,0.916421870155159,0.863140738937082,8.49706824661522,0.499082439574537,0.211907909066182,0.10400475182111,0.0143409339546113,0.336382757538914,0.772799817859347,0.996650369047362,0.917527401034257,0.772799817859347,0.817936172710702,0.991951857957291,0.00334963095263752,906,60.3333333333333,18489.7777777778,97.2222222222222,1003.22222222222,966.333333333333
"16","rnd","FightGliomas-rnd","brats_2019_90_1","ablaze_gull",6,0.968257954413161,0.987093362528072,0.970800193935607,0.972048533746708,0.970801355739778,0.94570310089342,0.204682906269684,0.995299728580093,0.99851769262895,0.99773209120061,0.990706184810191,2.35337109861099,0.0344489741039471,0.0285055152008294,0.0397278812479393,0.00464629073617492,0.0288059740714348,0.975754720290816,0.998432004765327,0.9685629127278,0.975754720290816,0.972048533746708,0.997641026050083,0.00156799523467276,770.222222222222,30.7777777777778,20064.1111111111,18.5555555555556,788.777777777778,801
"17","simple","simple","brats_2019_90_1","ablaze_gull",5.66666666666667,0.952329476723764,0.982446168742101,0.955913951405959,0.957702789853,0.955915379517709,0.918920095273011,0.198412758997542,0.993254294603284,0.9948496225819,0.995843213411434,0.981882531870561,3.4166433002747,0.0553223101655522,0.0321278128256843,0.0537364283831558,0.00663399526436313,0.0442181038384758,0.967137999982214,0.997754337501988,0.949079574914114,0.967137999982214,0.957702789853,0.996610565192861,0.00224566249801225,762.111111111111,44.4444444444444,20050.4444444444,26.6666666666667,788.777777777778,806.555555555556
"18","zyx","zyx","brats_2019_90_1","ablaze_gull",5.66666666666667,0.957669116253882,0.985320290447796,0.960897795289467,0.962499846465236,0.960898699250652,0.927832221824958,0.201587609349249,0.993957447335076,0.998264235195115,0.996885441695005,0.977337562590699,3.15931022051951,0.0457814118164736,0.025446184952602,0.0483538818641294,0.00592110152408344,0.0390370747623204,0.972853877376545,0.997786703519048,0.953321254089481,0.972853877376545,0.962499846465236,0.996965340044328,0.00221329648095196,769,43.7777777777778,20051.1111111111,19.7777777777778,788.777777777778,812.777777777778
"19","rnd","Tyagi-rnd","brats_MDA_945_1","ablaze_gull",4.33333333333333,0.745555569694423,0.885669742953136,0.759263828544658,0.767838981407406,0.759157345204569,0.633507916050858,0.146023656683058,0.967517603171903,0.881010728788507,0.88835140017712,0.910730480827375,12.3514245536757,0.815941012416867,0.280601413840585,0.188352804604093,0.0295388388785871,0.328332764553787,0.78016854749535,0.991170938410922,0.772205515354681,0.78016854749535,0.767838981407406,0.983443927479636,0.00882906158907816,717.111111111111,180.111111111111,20582.4444444444,175.666666666667,892.777777777778,897.222222222222
"20","simple","simple","brats_MDA_945_1","ablaze_gull",5.33333333333333,0.826356297027293,0.916198592408247,0.836785251821064,0.842660724647247,0.836755885662414,0.731791097441842,0.169311412198244,0.977814120503358,0.967413416696995,0.953399039161258,0.933051000360542,7.68751024370488,0.267137250532801,0.106993910988983,0.141276068501097,0.0208114845989645,0.191602981147392,0.838959573262704,0.993437611553791,0.857374950750217,0.838959573262704,0.842660724647247,0.98876460334073,0.00656238844620888,785.555555555556,131.888888888889,20630.6666666667,107.222222222222,892.777777777778,917.444444444444
Edit:
I got a calculation running by dropping some predictors, switching the family argument, and removing the random effects for the moment:
glm3 <- glmmLasso(stars ~ ADJRIND + AUC + KAPPA + DICE + ICCORR + JACRD + MUTINF + RNDIND + SURFOVLP + SURFDICE + VOLSMTY + HDRFDST + AVGDIST + MAHLNBS + VARINFO + GCOERR + PROBDST + SNSVTY + SPCFTY + PRCISON + RECALL + FMEASR + ACURCY + FALLOUT, rnd = NULL, family = cumulative(), data = mixedModel_df, lambda=10, switch.NR=TRUE, control=list(print.iter=TRUE))
However, the calculation has now been stuck at iteration 89 for 30 minutes.
I have trouble deciding which family is correct for my problem; that part of the question is probably better suited to Cross Validated. How do I correctly specify my random effects? My final model will have two.

upper scope has term ‘NA’ not included in model

I am working on a data set and would like to run a stepwise logistic regression on some of its variables using the add1() function in R. A sample of the data set can be downloaded from the link here: https://drive.google.com/file/d/0B0N-Nc7kEi4bVjhDd1FDaEE5cEE/view?usp=sharing
I fit a logistic regression using:
train <- read.csv('training.csv')
glm.model_step_1 <- glm(loan_status ~ acc_open_past_24mths + annual_inc + avg_cur_bal + bc_open_to_buy + delinq_2yrs + dti + inq_last_6mths + installment + int_rate + mo_sin_old_il_acct + mo_sin_old_rev_tl_op + mo_sin_rcnt_rev_tl_op + mo_sin_rcnt_tl + mort_acc + mths_since_last_delinq + mths_since_recent_bc + mths_since_recent_inq + num_accts_ever_120_pd + num_actv_bc_tl + num_actv_rev_tl + num_bc_tl + num_il_tl + num_op_rev_tl + num_tl_op_past_12m + pct_tl_nvr_dlq + percent_bc_gt_75 + pub_rec_bankruptcies + revol_bal + revol_util + term + total_acc + total_bc_limit + total_il_high_credit_limit + fico_mean + addr_state + emp_length + verification_status + Count_NA + Info_missing + Engineer + Teacher + Doctor + Professor + Manager + Director + Analyst + senior + lead + consultant + home_ownership_own + home_ownership_rent + purpose_debt_consolidation + purpose_medical + purpose_credit_card + purpose_other,
data = train,
family = binomial(link = 'logit'))
I then use the add1() function to do forward selection:
add1(glm.model_step_1, scope = train)
This code does not work. I get the below error:
Error in factor.scope(attr(terms1, "factors"), list(add = attr(terms2, :
upper scope has term ‘NA’ not included in model
Does anyone know how to solve this error?
A question asked previously on datascience.stackexchange (https://datascience.stackexchange.com/questions/11604/checking-regression-coefficients-stability) mentioned checking for NAs. There aren't any NAs in the data set, which can be confirmed by running sapply(train, function(x) sum(is.na(x)))
The train data set of @Jash Sash contains some anomalous values that force read.csv to read some numerical variables as factors with many categories.
In any case, I consider here a model with only a few variables in order to show how to avoid the error message reported above.
Remember that the scope argument must be a "formula giving the terms to be considered for adding or dropping"; it cannot be a data.frame as in the code of @Jash Sash.
train <- read.csv('training.csv')
is_factor <- sapply(train, is.factor)  # flags columns read in as factors; apply() would coerce to a matrix and always give FALSE
glm.model_step_1 <- glm(loan_status ~ acc_open_past_24mths + avg_cur_bal + bc_open_to_buy,
data = na.omit(train),
family = binomial(link = 'logit'))
add1(glm.model_step_1, scope=~.+delinq_2yrs+inq_last_6mths+int_rate)
The result is:
Model:
loan_status ~ acc_open_past_24mths + avg_cur_bal + bc_open_to_buy
               Df Deviance    AIC
<none>              1038.6 1046.6
delinq_2yrs     1   1037.9 1047.9
inq_last_6mths  1   1038.0 1048.0
int_rate        1   1038.0 1048.0
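The same pattern can be tried without the loan data. The sketch below uses the built-in mtcars data set purely as a stand-in (it is not the data from the question): the base model is fitted first, and the candidate terms are passed to add1() as a one-sided formula, where the dot stands for the terms already in the model.

```r
# Toy example on the built-in mtcars data, not the loan data from the question
base_fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# scope must be a formula; "." keeps the terms that are already in the model
res <- add1(base_fit, scope = ~ . + hp + qsec)
print(res)
```

Each row of the result reports the deviance and AIC of the model obtained by adding that single term, so the candidate with the lowest AIC is the natural next step in a forward selection.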

Error in neurons[[i]] %*% weights[[i]] : requires numeric/complex matrix/vector arguments

I encountered this error while trying to train a neural network with the R neuralnet package. I have 716 input variables that are either 1 or 0, and I am trying to predict a column that is also 1 or 0. From what I have found so far, this error appears to be caused by a non-numeric value, but all my values are numeric, so what could the problem be?
Here is the code
train <- read.csv("training.csv")
n <- names(train)
f <- as.formula(paste("conclusion ~", paste(n[!n %in% "conclusion"], collapse = " + ")))
nn <- neuralnet(f, data=train, hidden = 1, threshold=0.4)
Here is the output of paste
conclusion ~ not + work + virus + me + have + down + sick + outbreak +
tabs + pill + pills + dont + do + n.t + fall + falling +
fell + study + school + fever + again + symptom + why + bird +
bad + shot + ill + return + cough + throat + soar + itch...........
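One way to verify the claim that every column is numeric is to check the column types after read.csv, since a single stray string in a column is enough for the whole column to be read as character or factor. This is only a diagnostic sketch on a small hypothetical data frame named toy; whether a non-numeric column is actually the cause here is an assumption the question does not confirm.

```r
# Hypothetical stand-in for the data read from training.csv
toy <- data.frame(
  conclusion = c(1, 0, 1),
  work       = c(0, 1, 1),
  fever      = c("1", "0", "x"),  # one stray string makes the column character
  stringsAsFactors = FALSE
)

# Names of the columns that are not numeric
non_numeric <- names(toy)[!sapply(toy, is.numeric)]
print(non_numeric)
```

Running the same sapply call on train would list any columns that need cleaning or conversion before they are passed to neuralnet.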

Error in the class of random forest

I'm feeding a new set of data to the Random forest prediction model and encounter this error:
Error in checkData(oldData, RET) :
Classes of new data do not match original data
Here's the code:
fit1 <- cforest((b == 'three')~ affect+ certain+ negemo+ future+swear+sad
+negate+ppron+sexual+death + filler+leisure + conj+ funct + i
+future + past + bio + body+cause + cogmech + death +
discrep + future +incl + motion + quant + sad + tentat + excl+insight +percept +posemo
+ppron +quant + relativ + space + article + age + s_all + s_sad + gender
, data = trainset1,
controls=cforest_unbiased(ntree=500, mtry= 1))
testset2$pre_swl<-predict(fit1, newdata=testset2 , type='response')
Both the training set and the test set are data frames.
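The error message suggests comparing the column classes of the two data frames, because two data frames can hold the same variables with different classes (for example numeric versus integer, or factor versus character). The sketch below is a diagnostic under that assumption, using small hypothetical data frames train_toy and test_toy in place of trainset1 and testset2.

```r
# Hypothetical stand-ins for trainset1 and testset2
train_toy <- data.frame(age = c(21, 35, 48),
                        gender = factor(c("f", "m", "f")))
test_toy  <- data.frame(age = c(30L, 41L),
                        gender = c("m", "f"),
                        stringsAsFactors = FALSE)

# Compare the first class of every column the two data frames share
shared    <- intersect(names(train_toy), names(test_toy))
cls_train <- sapply(train_toy[shared], function(x) class(x)[1])
cls_test  <- sapply(test_toy[shared], function(x) class(x)[1])

# Columns whose class differs between training and test data
mismatch <- shared[cls_train != cls_test]
print(data.frame(column = mismatch,
                 train = cls_train[mismatch],
                 test = cls_test[mismatch]))
```

Any column listed would then be converted in the test set to the class it has in the training set; for factors, the levels may need to match as well.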

Resources