I am trying to run a fairly simple randomForest model, but I keep getting an error that does not make sense to me. See the code below.
test.data<-data.frame(read.csv("test.RF.data.csv",header=T))
attach(test.data)
head(test.data)
Depth<-Data1
STemp<-Data2
FPT<-Sr_hr_15
Stage<-stage_feet
Q<-discharge_m3s
V<-vel_ms
Turbidity<-turb_ntu
Day_Night<-day_night
FPT.rf <- randomForest(FPT ~ Depth + STemp + Q + V + Stage + Turbidity + Day_Night, data = test.data, mytry = 1, importance = TRUE, na.action = na.omit)
Error in randomForest.default(m, y, ...) : data (x) has 0 rows
In addition: Warning message:
In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
I then checked the dimensions to confirm that there is in fact data recognized in R:
dim(test.data)
[1] 77 15
This is a subset of the complete data set; I ran it just to test whether I could get the model to run at all, since I got the same error with the full data set.
Why is it telling me that data (x) has 0 rows when there clearly are rows?
Thanks
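Without being able to run this, the usual cause of that pair of messages (an assumption, not something confirmed in the thread) is that na.action = na.omit drops every row because each row has an NA somewhere, so randomForest really is handed 0 rows even though dim() reports 77. Two quick checks on the same objects:
sum(complete.cases(test.data))   # rows with no NA anywhere; 0 here would explain the error
colSums(is.na(test.data))        # which columns contribute the NAs
(Note also that mytry is presumably a typo for mtry; randomForest quietly ignores the misspelled argument, so it is unrelated to the 0-rows error.)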
Related
I'm working on a project for my Economics capstone with a very large data set. This is my first time ever programming, and I had to merge multiple data sets, 16 in total, with anywhere between 30,000 and 130,000 observations each. I did run into an issue merging the data sets, since certain data sets contained more columns than others, but I was able to address it using rbind.fill. Afterwards, I attempted to run a regression, but I encountered an error:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
Here is the original code for the regression:
ols_reg_mortcur1 <- lm(MORTCUR ~ EST_ST + WEEK + TBIRTH_YEAR + EGENDER + RHISPANIC +
RRACE + EEDUC + MS + THHLD_NUMPER + THHLD_NUMKID + THHLD_NUMADLT + WRKLOSS + ANYWORK +
KINDWORK + RSNNOWRK + UNEMPPAY + INCOME + TENURE + MORTCONF, data = set_up_weeks15st)
I googled the error for possible solutions and found suggestions like na.omit, na.exclude, etc. I tried these solutions to no avail, which leads me to think I didn't implement them correctly, or perhaps something went wrong with the merge itself. While I was cleaning the data I set unknown or missing values, listed as -88 or -99 in the data sets, to NA, since I had to create a summary stats table. I'll attach my R doc. I apologize for the length of the attached code below; I wasn't sure whether to attach just the sections leading up to the regression or to include other lines as well.
Based on the error message 0 (non-NA) cases, the likely reason is that you have at least one NA in each of your rows. (This is easy to check with na.omit(set_up_weeks15st), which should return zero rows.)
In this case, setting na.action to na.omit or na.exclude is not going to help.
Try to find the columns with the most NAs and remove them, or impute the missing values using an appropriate method.
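A quick sketch of that check (assuming the merged data frame is still named set_up_weeks15st):
# NA count per column, worst offenders first
sort(colSums(is.na(set_up_weeks15st)), decreasing = TRUE)
# rows that would survive na.omit; 0 confirms every row has at least one NA
sum(complete.cases(set_up_weeks15st))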
I'm running a simple lm model in R and I am trying to analyze the results using the DALEX package explain object.
My model is as follows: lm_model <- lm(DV ~ x + z, data = datax)
If it matters, x and z are factors and DV is numeric. The lm runs with no errors, and everything looks fine via summary(lm_model).
When I try to create the explain object in DALEX like so:
lm_exp <- DALEX::explain(lm_model, label = "lm", data = datax, y = datax$DV)
It gives me the following:
Preparation of a new explainer is initiated
-> model label : lm
-> data : 15375 rows 49 cols
-> data : tibbble converted into a data.frame
-> target variable : 15375 values
Error in if (is_y_in_data(data, y)) { :
missing value where TRUE/FALSE needed
Before the lm is run, datax is filtered to values between .2 and 1 using the subset command. Looking at summary(datax$DV) and sum(is.na(datax$DV)), everything looks fine. I also checked for blanks / errors using a filter in Excel. For those reasons, I do not believe there are any blanks in the DV column of datax, so I am unsure why I am receiving "Error in if (is_y_in_data(data, y)) { : missing value where TRUE/FALSE needed".
I have scoured the internet for this error when using DALEX explain, but I have not found any results. Thanks for any help that can be provided.
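One thing worth checking (an assumption on my part; nothing in the thread confirms the cause): the failing test is named is_y_in_data, which suggests it compares y against the columns of data, so an NA in any of the 49 columns, not only in DV, could make that if () condition come out NA. A sketch of the check:
# NAs anywhere in datax, not just in the response
na_per_col <- colSums(is.na(datax))
na_per_col[na_per_col > 0]   # columns with at least one NA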
I'm trying to use esttab to output regression results in R. However, every time I run it I get an error:
Error in FUN(X[[i]], ...) : variable names are limited to 10000 bytes
Any ideas how to solve it? My code is below:
reg <- lm(y ~ ln_gdp + diffxln_gdp + diff + year, data=df)
eststo(reg)
esttab(store=reg)
The input data comes from approximately 25,000 observations, all coded as numeric. I can share more information if that would help, but I don't know what would be relevant right now.
Thanks!
I'm trying to run a mixed effects model that includes three fixed effects with interaction and a random intercept and slope. The model I'm trying to specify in glmmadmb is:
> fit_zipoiss_ambig<-glmmadmb(AmbigCount~Posn.c*mood.c*Valence.c + offset(InputAmbig) + (1+Valence.c|mood.c/Chain), data = Data, zeroInflation = TRUE, family="poisson")
First I received this error message:
Error in Droplevels(eval(parse(text = x), data)) :
all grouping variables in random effects must be factors
So I used (as an example) fPosn.c = as.factor(Data$Posn.c) to convert all my predictors to factors. Then I ran this model:
> fit_zipoiss_ambig<-glmmadmb(AmbigCount~fPosn.c*fmood.c*fValence.c + offset(InputAmbig) + (1+fValence.c|fmood.c/Chain), data = Data, zeroInflation = TRUE, family="poisson")
Then I got this error:
Error in glmmadmb(AmbigCount ~ fPosn.c * fmood.c * fValence.c + offset(InputAmbig) + :
The function maximizer failed (couldn't find STD file) Troubleshooting steps include (1) run with 'save.dir' set and inspect output files; (2) change run parameters: see '?admbControl'
In addition: Warning message:
running command 'C:\Windows\system32\cmd.exe /c "C:/Program Files/R/R-3.2.2/library/glmmADMB/bin/windows64/glmmadmb.exe" -maxfn 500 -maxph 5 -noinit -shess' had status 1
I tried to follow the troubleshooting advice, so I added admb.opts = admbControl(shess = FALSE, noinit = FALSE) at the end of my model call. Now I am receiving this error:
Error in glmmadmb(AmbigCount ~ fPosn.c * fmood.c * fValence.c + offset(InputAmbig) + :
rank of X = 106 < ncol(X) = 107
I have no idea what this error means. I'm hoping someone can help me work out how to specify my model in glmmadmb or, failing that, in some other package that will let me fit a Poisson or negative binomial model.
Without being able to run it myself, what jumps out at me is:
As for your first error message: it is saying that the grouping variables in your nested random-effects formula need to be factors.
Then, in your code fPosn.c = as.factor(Data$Posn.c), you are not creating "fPosn.c" within your data frame. To do that you need to run:
Data$fPosn.c <- as.factor(Data$Posn.c)
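The same applies to every variable you converted; a minimal sketch, assuming the column names from the models above:
Data$fPosn.c    <- as.factor(Data$Posn.c)
Data$fmood.c    <- as.factor(Data$mood.c)
Data$fValence.c <- as.factor(Data$Valence.c)
Data$Chain      <- as.factor(Data$Chain)   # Chain is a grouping variable as well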
I am trying to perform a negative binomial regression using R. When I am executing the following command:
DV2.25112013.nb <- glm.nb(DV2.25112013~ Bcorp.Geographic.Proximity + Dirty.Industry +
Clean.Industry + Bcorp.Industry.Density + State + Dirty.Region +
Clean.Region + Bcorp.Geographic.Density + Founded.As.Bcorp + Centrality +
Bcorp.Industry.Density.Squared + Bcorp.Geographic.Density.Squared +
Regional.Institutionalization + Sales + Any.Best.In.Class +
Dirty.Region.Heterogeneity + Clean.Region.Heterogeneity +
Ind.Dirty.Heterogeneity+Ind.Clean.Heterogeneity + Industry,
data = analysis25112013DF6)
R gives the following error:
Error in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
NA/NaN/Inf in 'x'
In addition: Warning message:
step size truncated due to divergence
I do not understand this error, since my data matrix does not contain any NA/NaN/Inf values... how can I fix this?
Thank you.
I think the most likely cause of this error is negative values or zeros in the data, since the default link in glm.nb is 'log'. It would be easy enough to test by changing to link = "identity". I also think you need to try smaller models... maybe a quarter of those variables to start. That would also let you add related variables in bundles, since the names suggest potentially severe collinearity among your categorical variables.
We really need a data description. I wondered about Dirty.Industry + Clean.Industry: that is the sort of dichotomy that is better handled with a single factor variable that has those levels, which prevents the collinearity if Clean = not-Dirty. Perhaps similarly with your "Heterogeneity" variables. (I'm not convinced that @BenBolker's comment is correct; I think it very possible that you need statistical consultation before addressing coding issues.)
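A minimal sketch of that recoding, assuming Dirty.Industry and Clean.Industry are 0/1 indicators and every case is exactly one of the two (neither assumption is confirmed here):
# collapse the two indicator columns into a single factor
analysis25112013DF6$Industry.Type <- factor(
  ifelse(analysis25112013DF6$Dirty.Industry == 1, "Dirty", "Clean"),
  levels = c("Clean", "Dirty"))
Then use Industry.Type in the formula in place of the two indicators. As for the link test, here is how negative values or zeros trip glm.nb under link = "identity", using a built-in example: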
> require(MASS)
> data(quine)   # following the example on the ?glm.nb help page
> quine$Days[1] <- -2
> quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine, link = "identity")
Error in eval(expr, envir, enclos) :
negative values not allowed for the 'Poisson' family
> quine$Days[1] <- 0
> quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine, link = "identity")
Error: no valid set of coefficients has been found: please supply starting values
In addition: Warning message:
In log(y/mu) : NaNs produced
I resolved this issue by passing a control argument into the model call with maxit = 10 or lower (the default in glm.control is 25 iterations). Perhaps it works for you with a few more iterations; just try.
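A minimal sketch of that call, reusing the quine example from the answer above (maxit = 10 is just the value that happened to work; the control argument is documented in ?glm.nb and ?glm.control):
# cap the IWLS iterations via glm.control
quine.nb2 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine,
                    control = glm.control(maxit = 10))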