Hey everyone, I'm having some trouble executing this code. Here is what my boss told me to do, and I'm stuck on what is probably a very basic issue. Please advise.
require(data.table)
require(MASS)
dat = fread("~/OneDrive - SUNY Upstate Medical University/bin/projects/rdoc/brett_project_igt/WGCNA_moduleEigengenes_SamplePhenotype.txt")
#working on setting the right directory
#C:\Users\brett\OneDrive\Desktop\Lab\brett_project_igt
dat = fread("~/Users/brett/OneDrive/Desktop/Lab/WGCNA_moduleEigengenes_SamplePhenotype.txt")
hist(dat$IGT_total_net)
hist(dat$IGT_total_lat)
# linear model
fit = lm(IGT_total_net ~ Age + Gender + SV1 + race + RIN + ME36, data = dat)
summary(fit)
# negative binomial model
fit = glm.nb(IGT_total_lat ~ Age + Gender + SV1 + race + RIN + ME36, data = dat)
summary(fit)
Here is my most recent error message. Some of the previous issues were not having the right working directory and problems with the fread function. Thanks in advance.
Error in fread("C:/Users/brett/AppData/Local/Packages/Microsoft.MicrosoftEdge_8wekyb3d8bbwe/TempState/Downloads") :
File 'C:/Users/brett/AppData/Local/Packages/Microsoft.MicrosoftEdge_8wekyb3d8bbwe/TempState/Downloads' is a directory. Not yet implemented.
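The error above says that fread was handed a folder rather than a file. A minimal sketch of how one might check a path before reading it, using only base R; the temporary directory and the file name example_phenotype.txt are made up for illustration and stand in for the real data file:

```r
# Hypothetical stand-in for the real data file: write a small tab-separated
# file into a temporary directory so the example is self-contained.
data_dir  <- tempdir()
data_file <- file.path(data_dir, "example_phenotype.txt")
write.table(data.frame(Age = c(30, 41), IGT_total_net = c(12, -4)),
            file = data_file, sep = "\t", row.names = FALSE, quote = FALSE)

# A directory exists on disk but cannot be read as a table; this is the
# situation behind the "is a directory" error.
path_is_dir <- dir.exists(data_dir)

# file.exists() is TRUE for files and directories alike, so rule out a directory too.
path_is_file <- file.exists(data_file) && !dir.exists(data_file)
stopifnot(path_is_file)

# Base R reader used here; data.table::fread(data_file) takes the same path.
dat <- read.delim(data_file)
str(dat)
```

Building the path with file.path() and testing it first usually makes it obvious whether the problem is the working directory or the file name itself.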
I would like to use the gamlss package to fit a model, so that I can benefit from the larger set of distributions available in that package. However, I am struggling to specify my random effects correctly, or at least I think there is a mistake: when I compare the output of an lmer model with a Gaussian distribution to a gamlss model with a Gaussian distribution, the output differs. When I compare an lm model without random effects to a gamlss model with a Gaussian distribution and no random effects, the output is similar.
I unfortunately cannot share my data to reproduce it.
Here is my code:
df <- subset.data.frame(GFW_food_agg, GFW_food_agg$fourC_area_perc < 200, select = c("ISO3", "Year", "Forest_loss_annual_perc_boxcox", "fourC_area_perc", "Pop_Dens_km2", "Pop_Growth_perc", "GDP_Capita_current_USD", "GDP_Capita_growth_perc",
"GDP_AgrForFis_percGDP", "Gini_2008_2018", "Arable_land_perc", "Forest_loss_annual_perc_previous_year", "Forest_extent_2000_perc"))
fourC <- lmer(Forest_loss_annual_perc_boxcox ~ fourC_area_perc + Pop_Dens_km2 + Pop_Growth_perc + GDP_Capita_current_USD +
GDP_Capita_growth_perc + GDP_AgrForFis_percGDP + Gini_2008_2018 + Arable_land_perc + Forest_extent_2000_perc + (1|ISO3) + (1|Year),
data = df)
summary(fourC)
resid_panel(fourC)
df <- subset.data.frame(GFW_food_agg, GFW_food_agg$fourC_area_perc < 200, select = c("ISO3", "Year", "Forest_loss_annual_perc_boxcox", "fourC_area_perc", "Pop_Dens_km2", "Pop_Growth_perc", "GDP_Capita_current_USD", "GDP_Capita_growth_perc",
"GDP_AgrForFis_percGDP", "Gini_2008_2018", "Arable_land_perc", "Forest_loss_annual_perc_previous_year", "Forest_extent_2000_perc"))
df <- na.omit(df)
df$ISO3 <- as.factor(df$ISO3)
df$Year <- as.factor(df$Year)
fourC <- gamlss(Forest_loss_annual_perc_boxcox ~ fourC_area_perc + Pop_Dens_km2 + Pop_Growth_perc + GDP_Capita_current_USD +
GDP_Capita_growth_perc + GDP_AgrForFis_percGDP + Gini_2008_2018 + Arable_land_perc + Forest_extent_2000_perc + random(ISO3) + random(Year),
data = df, family = NO, control = gamlss.control(n.cyc = 200))
summary(fourC)
plot(fourC)
How do the random effects need to be specified in gamlss to be similar to the random effects in lmer?
If I specify the random effects instead using
re(random = ~1|ISO3) + re(random = ~1|Year)
I get the following error:
Error in model.frame.default(formula = Forest_loss_annual_perc_boxcox ~ :
variable lengths differ (found for 're(random = ~1 | ISO3)')
I found the +re(random=~1|x) specification to work fairly well with my GAMLSS. Have you double-checked that the NAs are being removed from your dataset? Sometimes na.omit does not work as expected.
Have a look at this thread, which has the same error as yours, but in a GAM. You can try that code to remove your NAs:
Error in model.frame.default: variable lengths differ
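One way to follow up on the NA suggestion is to count missing values per model variable and keep only complete rows before fitting. This is a minimal base R sketch on made-up data; the data frame toy and its columns y, x and group are hypothetical, and it does not claim that missing values are the actual cause of the error in the question:

```r
# Toy data standing in for the real data frame (all values are made up).
toy <- data.frame(
  y     = c(1.2, 0.7, NA, 2.1, 1.5, 0.9),
  x     = c(10, NA, 12, 15, 11, 14),
  group = c("A", "A", "B", "B", "C", NA),
  stringsAsFactors = FALSE
)

# Only the variables that appear in the model formula matter here.
model_vars <- c("y", "x", "group")

# Number of missing values in each model variable.
na_per_column <- colSums(is.na(toy[, model_vars]))
print(na_per_column)

# Keep rows that are complete for every model variable.
clean <- toy[complete.cases(toy[, model_vars]), ]

# Convert the grouping variable to a factor after filtering, so that
# no unused levels are carried along.
clean$group <- factor(clean$group)

# Confirm that nothing is missing any more.
stopifnot(!anyNA(clean))
```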
I am trying to run a regression with panel data from the Michigan Consumers Survey. It is the first time I am using panel data in R, so I am not very familiar with the "plm" package that is needed. I am setting up my panel data for fixed effects on individuals (CASEID) and time (YYYY):
Michigan_panel <- pdata.frame(Michigan_survey, index = c("CASEID", "YYYY"))
Then I am using the following regression:
mod_1 <- plm(data = Michigan_panel, ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq, model = "within")
However R is showing me the following error:
> mod_1 <- plm(data = Michigan_panel, ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq, model = "within")
Error in plm.fit(data, model, effect, random.method, random.models, random.dfcor, :
empty model
Does anyone know what I am doing wrong?
Could you give the link to this specific survey? I found various datasets with this name.
I suspect (only suspect) that your data isn't panel data; please check the CASEID variable.
Changing the order of the formula and data arguments in plm won't solve your problem.
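The CASEID check can be sketched in base R as follows. The within transformation subtracts each individual's mean, so a regressor that never changes within a CASEID drops out, and if every CASEID appears only once nothing is left to estimate. The data frame survey and its values are invented for illustration, and varies_within is a hypothetical helper, not part of plm:

```r
# Toy survey (made-up values): ids 1 and 2 are interviewed twice, id 3 once.
survey <- data.frame(
  CASEID = c(1, 1, 2, 2, 3),
  YYYY   = c(2001, 2002, 2001, 2002, 2001),
  AGE    = c(30, 31, 45, 46, 52),
  SEX    = c(1, 1, 2, 2, 1)
)

# How many rows does each individual contribute?
obs_per_id <- table(survey$CASEID)
n_repeated <- sum(obs_per_id > 1)
cat("Individuals observed more than once:", n_repeated, "\n")

# Hypothetical helper: TRUE if x takes more than one value within at least one id.
varies_within <- function(x, id) {
  any(tapply(x, id, function(v) length(unique(v)) > 1))
}

age_varies <- varies_within(survey$AGE, survey$CASEID)
sex_varies <- varies_within(survey$SEX, survey$CASEID)
cat("AGE varies within individuals:", age_varies, "\n")
cat("SEX varies within individuals:", sex_varies, "\n")
```

If n_repeated is zero in the real data, or none of the regressors vary within CASEID, a pooled model or a time-only effect may be the more realistic option.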
I think the error comes from how you wrote the model. Your code is this:
mod_1 <- plm(data = Michigan_panel, ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq, model = "within")
In my view, you have to specify the indexes in the call and follow the argument order of the plm package. I would write your model as follows:
mod_1 <- plm(ICS ~ ICE + PX1Q2 + RATEX + ZLB + INCOME + AGE + EDUC + MARRY + SEX + AGE_sq,
data = Michigan_panel,
index= c("CASEID", "YYYY"),
model = "within")
1. Different Approach
From my knowledge we can also code this formula in a more elegant format.
library(plm)
Michigan_panel <- pdata.frame(Michigan_survey, index = c("CASEID", "YYYY"))
attach(Michigan_panel)
y <- cbind(ICS)
X <- cbind(ICE,PX1Q2,RATEX,ZLB,INCOME,AGE,EDUC,MARRY,SEX,AGE_sq)
model1 <- plm(y~X+factor(CASEID)+factor(YYYY), data=Michigan_panel, model="within")
summary(model1)
detach(Michigan_panel)
Adding factor(CASEID) and factor(YYYY) will add dummy variables to your model.
I want to analyze when the claims of a protest are directed at the state, based on action and country level characteristics, using glmer. So, I would like to obtain p-values of both the fixed and random effects. My model looks like this:
targets <- glmer(state ~ ENV + HLH + HRI + LAB + SMO + Capital +
(1 + rile + parties + rep + rep2 + gdppc + election| Country),
data = df, family = binomial)
The output only gives me the Variance & Std.Dev. of the random effects, as well as the correlations among them, which makes sense for most multilevel analyses but not for my purposes. Is there any way I can get something like the estimates and the p-values for the random effects?
If this cannot be done with R, is there any other statistical software that would give such an output?
UPDATE: Following the suggestions here, I have moved this question to Cross Validated: https://stats.stackexchange.com/questions/381208/r-how-to-get-estimates-and-p-values-for-random-effects-in-glmer
library(lme4)
library(lattice)
xyplot(incidence/size ~ period|herd, cbpp, type=c('g','p','l'),
layout=c(3,5), index.cond = function(x,y)max(y))
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
summary(gm1)
I'm working with a model from the Prestige dataset in the car package in R.
library(car)
library(carData)
data = na.omit(Prestige)
prestige = data$prestige
income = data$income
education = data$education
type = data$type
I'm trying to fit the model lm(prestige ~ income + education + type + income:type + education:type). For class I'm starting with the full model and working down to a smaller model, just backward selection. One of the least useful covariates according to p-value is the education:typeprof. How do I just delete that covariate from the model without taking out all the education:type interactions? In general how do you exclude interactions with factors? I saw an answer with the update function specifying which interaction to exclude but it didn't work in my case. Maybe I implemented it incorrectly.
fit4 = lm(prestige ~ income + education + type + income:type + education:type)
newfit = update(fit4, . ~ . - education:typeprof)
Unfortunately this didn't work for me.
So there is a way to drop a single interaction term. Suppose you have the linear model
fullmodel = lm(prestige ~ income + education + type + income:type + education:type - 1)
You can call model.matrix on fullmodel which will give you the X matrix for your linear model. From there you can specify which column you'd like to drop and refit your model.
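X = model.matrix(fullmodel)
# index of the single interaction column to remove
drop = which(colnames(X) == 'education:typeprof')
# remove that column only (not the first column)
X1 = X[,-drop]
newfit = lm(prestige ~ X1 - 1)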
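The same idea can be tried on a built-in dataset, so it runs without the car package. This is only a sketch: iris and the column name Sepal.Width:Speciesversicolor stand in for Prestige and education:typeprof. It also suggests why the update() call has no effect: the formula contains the term Sepal.Width:Species (like education:type), while the per-level name only exists as a column of the design matrix.

```r
# Full model with a numeric-by-factor interaction, on the built-in iris data.
full <- lm(Sepal.Length ~ Sepal.Width + Species + Sepal.Width:Species,
           data = iris)

# The design matrix has one column per coefficient, including one column
# for each non-reference level of the interaction.
X <- model.matrix(full)
print(colnames(X))

# Locate and remove exactly one interaction column.
drop_col  <- which(colnames(X) == "Sepal.Width:Speciesversicolor")
X_reduced <- X[, -drop_col]

# X_reduced still contains the "(Intercept)" column, so the implicit
# intercept is suppressed with "- 1" to avoid adding it twice.
reduced <- lm(iris$Sepal.Length ~ X_reduced - 1)

# The reduced fit has one coefficient fewer than the full fit.
print(length(coef(full)))
print(length(coef(reduced)))
```

Because the refit uses a matrix on the right-hand side, the coefficient names are prefixed with the matrix name (here X_reduced).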
I am trying to build a predictive model from survey data. My DVs are questions on NPS and similar data points. My IVs are mainly demographic questions. I keep getting a "variable lengths differ" error with the following lines of code:
Model <- lm(Q6 ~ amount_spent + first_time + gender +
workshop_participation + adults + children +
household_adults + Below..25K. + X.25K.to..49K. +
X.50K.to..74K. + X.75K.to..99K. + X.100K.to..124K. +
X18.24. + X25.34. + X35.44. + X45.64.,
data = diy_festival2)
Here is the error:
Error in model.frame.default(formula = Q6 ~ amount_spent + first_time + :
variable lengths differ (found for 'Below..25K.')
What are some possible causes and what are some potential fixes I can try?
Your formula references one or more variables that are not in diy_festival2 and are instead being picked up from the global environment, where they have a different length. The error message suggests the variable is Below..25K.
x <- data.frame(x1=rnorm(100))
x2 <- rnorm(10)
model.matrix( ~ x1 + x2, data=x)
gives the error you have.
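To find out which variables are the culprits, one option is to compare the names used in the formula with the column names of the data frame. A minimal sketch reusing the toy objects from above; form, missing_vars and err are names introduced here for illustration:

```r
# x1 lives in the data frame, x2 only exists outside it with a different length.
x  <- data.frame(x1 = rnorm(100))
x2 <- rnorm(10)

form <- ~ x1 + x2

# Variables named in the formula that are not columns of the data frame.
missing_vars <- setdiff(all.vars(form), names(x))
print(missing_vars)

# Building the model matrix fails because x2 (length 10) is found outside
# the data frame and does not match x1 (length 100); capture the message.
err <- tryCatch(model.matrix(form, data = x),
                error = function(e) conditionMessage(e))
print(err)
```

Running the same setdiff() check with the lm formula and diy_festival2 should list every column name that is misspelled or missing from the data frame.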