I have been trying to do stepwise selection on my variables with R. This is my code:
library(lattice)#to get the matrix plot, assuming this package is already installed
library(ftsa) #to get the out-of sample performance metrics, assuming this package is already installed
library(car)
mydata=read.csv("C:/Users/jgozal1/Desktop/Multivariate Project/Raw data/FINAL_alldata_norowsunder90_subgroups.csv")
names(mydata)
str(mydata)
mydata$country_name=NULL
mydata$country_code=NULL
mydata$year=NULL
mydata$Unemployment.female....of.female.labor.force...modeled.ILO.estimate.=NULL
mydata$Unemployment.male....of.male.labor.force...modeled.ILO.estimate.=NULL
mydata$Life.expectancy.at.birth.male..years.= NULL
mydata$Life.expectancy.at.birth.female..years. = NULL
str(mydata)
Full_model=lm(mydata$Fertility.rate.total..births.per.woman. + mydata$Immunization.DPT....of.children.ages.12.23.months. + mydata$Immunization.measles....of.children.ages.12.23.months. + mydata$Life.expectancy.at.birth.total..years. + mydata$Mortality.rate.under.5..per.1000.live.births. + mydata$Improved.sanitation.facilities....of.population.with.access. ~ mydata$Primary.completion.rate.female....of.relevant.age.group. + mydata$School.enrollment.primary....gross. + mydata$School.enrollment.secondary....gross. + mydata$School.enrollment.tertiary....gross. + mydata$Internet.users..per.100.people. + mydata$Primary.completion.rate.male....of.relevant.age.group. + mydata$Mobile.cellular.subscriptions..per.100.people. + mydata$Foreign.direct.investment.net.inflows..BoP.current.US.. + mydata$Unemployment.total....of.total.labor.force...modeled.ILO.estimate., data= mydata)
summary(Full_model) #this provides the summary of the model
Reduced_model=lm(mydata$Fertility.rate.total..births.per.woman. + mydata$Immunization.DPT....of.children.ages.12.23.months. + mydata$Immunization.measles....of.children.ages.12.23.months. + mydata$Life.expectancy.at.birth.total..years. + mydata$Mortality.rate.under.5..per.1000.live.births. + mydata$Improved.sanitation.facilities....of.population.with.access. ~1,data= mydata)
step(Reduced_model,scope=list(lower=Reduced_model, upper=Full_model), direction="forward", data=mydata)
step(Full_model, direction="backward", data=mydata)
step(Reduced_model,scope=list(lower=Reduced_model, upper=Full_model), direction="both", data=mydata)
This is the link to the dataset that I am using: http://speedy.sh/YNXxj/FINAL-alldata-norowsunder90-subgroups.csv
After setting the scope for my stepwise I get this error:
Error in step(Reduced_model, scope = list(lower = Reduced_model, upper = Full_model), :
number of rows in use has changed: remove missing values?
In addition: Warning messages:
1: In add1.lm(fit, scope$add, scale = scale, trace = trace, k = k, :
using the 548/734 rows from a combined fit
2: In add1.lm(fit, scope$add, scale = scale, trace = trace, k = k, :
using the 548/734 rows from a combined fit
I have looked at other posts with the same error, and the usual solution is to omit the NAs from the data, but that hasn't solved my problem: I still get exactly the same error.
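A minimal sketch of the usual remedy for this error, using the built-in airquality data as a stand-in for the CSV above (an assumption; the column names below are not from the post). It drops incomplete rows once, then fits both models with bare column names and data = dat, so every model that step() compares uses the same rows. One possible reason na.omit() appeared not to help is that terms written as mydata$... are looked up in the original mydata, whatever is passed to data =. That is a guess, since the attempt itself is not shown.

```r
# airquality ships with R and has missing values in Ozone and Solar.R
dat <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])

# Bare column names: both models are fitted on the same rows of dat
null_model <- lm(Ozone ~ 1, data = dat)

# Forward selection between the intercept-only model and the full set of terms
fwd <- step(null_model,
            scope = list(lower = ~ 1, upper = ~ Solar.R + Wind + Temp),
            direction = "forward",
            trace = 0)

formula(fwd)  # the selected model
nobs(fwd)     # same number of rows as dat
```

Fitting null_model on the unfiltered airquality instead would let the row count change as soon as Solar.R enters the model, which is the situation the error message describes.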
The data come from a CSV file exported from Excel.
I want to see whether a transformation is necessary, but I keep getting this message:
Error in model.frame.default(formula = comment$Number.of.Comments ~ comment$Character.Count + : 'data' must be a data.frame, environment, or list
The following is my code:
comment <- read.csv('AdAnalysis3.csv', header = TRUE, fileEncoding = "UTF-8-BOM")
commentfit <- lm(comment$Number.of.Comments ~ comment$Character.Count + comment$Number.of.Shares + comment$Number.of.Likes + comment$Type.of.Ad + comment$Dealing.with.Life + comment$Christlike.Attributes + comment$Spiritual.Learning, data = comment)
library(car)
boxCox(commentfit)
I get the error shown above immediately after calling boxCox(commentfit).
Any suggestions?
You haven't given us a reproducible example, but my guess is that you have confused car::boxCox() by including comment$ in your formula. In general it's better (for a number of reasons including clarity) to specify a linear model with just the variable names, i.e.:
commentfit <- lm(Number.of.Comments ~ Character.Count + Number.of.Shares +
Number.of.Likes + Type.of.Ad + Dealing.with.Life +
Christlike.Attributes + Spiritual.Learning,
data = comment)
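As a small illustration of one further reason to prefer bare variable names, here is a sketch using the built-in mtcars data (an assumption; these columns merely stand in for the variables in the question). It is not a diagnosis of the boxCox() error. It only shows that functions which re-evaluate the formula against a data frame, such as predict(), behave as expected with the bare-name form and not with the $ form.

```r
# mtcars ships with R; its columns stand in for the asker's variables
fit_bare   <- lm(mpg ~ wt + hp, data = mtcars)
fit_dollar <- lm(mtcars$mpg ~ mtcars$wt + mtcars$hp)

new_car <- data.frame(wt = 3, hp = 120)

# Bare names: the terms are looked up in newdata, giving one prediction
p_bare <- predict(fit_bare, newdata = new_car)

# $ form: the terms refer to mtcars itself, so newdata is effectively ignored
# and predict() returns one value per row of mtcars (with a warning)
p_dollar <- suppressWarnings(predict(fit_dollar, newdata = new_car))

length(p_bare)
length(p_dollar)
```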
I am trying to implement a new nonlinear function to use with the nlmer function in the lme4 package, but I'm not sure what the problem is. This is the first time I have used nlmer, and I'm following all the instructions I could find on the internet. The first error is about my data frame.
data <- read.csv(paste("C:/Users/oguz/Desktop/Runs4SiteModels/db/", "DB4NLSiteModel", Periods[i],".txt", sep=""), sep = "", header = TRUE)
psa_rock <- data$PSAr
nparams <- c("c")
nonl_fn <- deriv(~ log(( psa_rock + c)/c),
namevec = c("c"),
function.arg=c("c", psa_rock))
fm <- nlmer(log(data$PSAm) ~ nonl_fn(c, psa_rock) ~ 1 + data$M1 + data$M3 + data$M85 + data$Nflag + data$Rflag + data$FDepth +
data$Dist1 + data$Dist3 + data$VN + (exp(-1*exp(2*log(data$Vs)- 11)) * log((data$PSAr + c) / c) ) +
(1|data$EQID) + (1|data$STID), data=data, start=c(c=0.1))
When I run this code, I'm getting the following error:
Error in model.frame.default(data = data, drop.unused.levels = TRUE, formula = log(data$PSAm) ~ :
invalid type (list) for variable 'data'
I was not getting this error with the lmer function (without the nonlinear function, of course). That's why I think my problem is not about my data frame.
The other issue I keep thinking about is this part of the fixed effects:
(exp(-1*exp(2*log(data$Vs)- 11)) * log((data$PSAr + c) / c) )
As you can see, my nonlinear function also appears in the fixed-effects formula, and I'm not sure how to implement that. I hope my approach is correct, but because of the first problem I haven't been able to test it.
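For reference, here is a minimal base-R sketch of how deriv() builds a function with a gradient attribute for the expression in the question. It keeps the names psa_rock and c from the post, while the input values are made up for illustration. It only shows the shape deriv() expects, with function.arg given as a character vector of argument names and not the data itself, and is not a tested fix for the nlmer call.

```r
# function.arg lists the argument names as strings
nonl_fn <- deriv(~ log((psa_rock + c) / c),
                 namevec = "c",
                 function.arg = c("psa_rock", "c"))

# Made-up inputs, purely for illustration
val <- nonl_fn(psa_rock = c(0.5, 1, 2), c = 1)

val                    # function values, log((psa_rock + c) / c)
attr(val, "gradient")  # derivative with respect to c, one row per input value
```

The gradient column should equal 1 / (psa_rock + c) - 1 / c, which is a quick check that the generated function is differentiating the intended expression.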
I wonder how I can fix this error in R.
Here is my code:
Remit_data <- panel_data(dataremit, id = id, wave = t)
model<-asym(wel_loggdp_cap ~ logremit + remitsq + logcpi + corruption +
employilo + senrol_netprim + logfert + urbanization + tradegdp +
netoda_gini, data = dataremit)
I get this error:
Error: Only strings can be converted to symbols
Backtrace:
panelr::asym(...)
panelr:::diff_data(...)
rlang::sym(id)
In panelr you have to define the panel identifiers (e.g. id and time) with panel_data() outside of the wbm function. Compare this with plm, where they can be set inside the plm call.
library(panelr)
library(plm)
data(Produc)
# fixed effects with plm
FE_plm <- plm(gsp ~ pcap + pc + pcap:pc,
              data = Produc,
              index = c("state", "year"),
              model = "within")  # plm's argument is 'model', not 'method'
# fixed effects with panelr
Produc <- panel_data(Produc, id = state, wave = year)
FE_panelr <- wbm(gsp ~ pcap + pc + pcap:pc,
                 model = "within",
                 interaction.style = "double-demean",
                 data = Produc)
This should fix the issue. Always try to provide a minimal working example.
I'm attempting to bootstrap my data to get 2000 measurements based on the linear regression and Theil regression (mblm function w/ repeated=FALSE).
My bootstrap R code works perfectly for the normal regression (from what I can tell), given below:
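library(boot)

# Statistic for boot(): refit the model on the resampled rows and return its coefficients
fitfunc <- function(formula, data, index) {
  d <- data[index, ]
  f <- lm(formula, data = d)
  return(coef(f))
}
boot(dataframe, fitfunc, R = 2000, formula = `Index A` ~ `Measurement B`)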
But I get an error when attempting the Theil estimator bootstrap:
library(mblm)

# Same statistic, but using the Theil estimator from mblm
fitfuncTheil <- function(formula, data, index) {
  d <- data[index, ]
  f <- mblm(formula, data = d, repeated = FALSE)
  return(coef(f))
}
boot(dataframe, fitfuncTheil, R = 2000, formula = `Index A` ~ `Measurement B`)
Error in order(x) : argument 1 is not a vector
In addition: Warning message:
In is.na(x) :
The error message seems basic but I cannot figure out why this would work in one case but not the other.
Once I removed the spaces from the column names (the ones referenced in the formula), the issue was resolved.
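A minimal sketch of that renaming step, using made-up data in place of the asker's dataframe (an assumption) and base R only. make.names() replaces the spaces with dots, after which the formula can use plain names without backticks. The bootstrap here is done by hand with lm() so the sketch runs without extra packages; the same renamed columns could then be passed to boot() and mblm().

```r
set.seed(1)

# Made-up data with spaces in the column names, mimicking the question
df <- data.frame(`Index A` = rnorm(30),
                 `Measurement B` = rnorm(30),
                 check.names = FALSE)

# "Index A" becomes "Index.A", "Measurement B" becomes "Measurement.B"
names(df) <- make.names(names(df))

# Same shape as the asker's statistic: refit on resampled rows, return coefficients
fit_coef <- function(formula, data, index) {
  coef(lm(formula, data = data[index, ]))
}

# Hand-rolled bootstrap: 200 resamples, one column of coefficients per resample
boot_coefs <- replicate(
  200,
  fit_coef(Index.A ~ Measurement.B, df, sample(nrow(df), replace = TRUE))
)

dim(boot_coefs)
```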
I am trying to run sensitivity analysis for a mediation test I ran and would appreciate feedback from anyone who might be able to identify the source of the error generated.
Mediation:
model.m1 <- lm(cagr_2006_2008 ~ log_pdens_06 + closeness_06 + rainfall_diff_06 + historical_drought+pct_ag_land + ag_prox + cagr_1990_2000 + historical_drought*ag_prox, data=lnd_data_c1_mediate)
summary(model.m1)
model.y1 <- glm(protest_sum_1_logit ~ log_total_pop_10 + cagr_2006_2008 + historical_drought + rainfall_diff_06, family=binomial(link="probit"), data=lnd_data_c1_mediate)
summary(model.y1)
out.1_moderate <- mediate(model.m1, model.y1, sims=1000, boot=TRUE, treat="rainfall_diff_06", mediator="cagr_2006_2008", control.value = -57.1, treat.value = 0, dropobs = TRUE)
Sensitivity Analysis:
sens.1 <- medsens(out.1_moderate, rho.by=0.05, sims=1000)
summary(sens.1)
## Error in Mmodel.coef.sim * (rho12.sim/sigma.2.sim) %x% t(rep(1, y.k - :
## non-conformable arrays
My code base and data sets are available here. (If you download Regression and Mediation Analyses.R with lnd_data_c1_mediate.csv and lnd_data.csv, and select the csv file locations when prompted, the code should run smoothly. The sensitivity analyses conducted are all grouped together at the end of the file.)
Thank you very much for any information you can provide!