I am trying to estimate people's willingness to pay for either nuclear or wind energy (far away or local) from a stated-choice experiment. I used the multinomial logit model; however, when estimating the discrete choice model for the different scenarios I keep getting an error:
Error in t.default(x) : argument is not a matrix
While gmnl gives this error, mixl seems to work fine.
Code:
install.packages("gmnl")
install.packages("mlogit")
library("gmnl") # Load gmnl package
library("mlogit") # Load mlogit package
library(readxl)
Example_data <- read_excel("Example data.xlsx")
View(Example_data)
data <- as.data.frame(Example_data)
df01 <- mlogit.data(data,
id.var = "id",
choice = "Choice",
varying = 3:17,
shape = "wide",
sep = "")
lc <- gmnl(Choice ~ MODE + DWELLING + SIZE + COST + DISTANCE | 0 | 0 | 0 | 1 ,
data = df01,
model = 'lc',
Q = 3,
panel = TRUE,
method = "bhhh")
It could be that there is something wrong with my data. However, compared with other people's previous work, my data is set up in a similar way, and I cannot run their calculations either.
From what I have seen in earlier posts, it could also be a package problem. If that is the case, what can I do to fix it or work around it?
The picture below shows an example of the data for 2 individuals, which consists of 15 scenarios with 3 options to choose from.
[Image: Example_data]
I am trying to do a weighted survival analysis in R with the NHIS dataset, but when I run the svycoxph function I get the error "all variables must be in design= argument".
I created a sample weight for my dataset, as recommended on the NHIS website, by dividing the sample weight by 18 for the 18 years I use.
Then I created a design object with the svydesign function:
data <-
data %>%
mutate(
sampleweight = SAMPWEIGHT/18
)
data_design <- svydesign(id = ~data$PSU, weight = ~data$sampleweight, strata = ~data$STRATA, nest = TRUE, data = data)
I then created a survival object with the Surv function, where censored is a dummy variable coded dead/not dead:
surv <- Surv(data_design$variables$AGE,
event = 1-(as.numeric(data_design$variables$censored)))
Then I plotted a Kaplan-Meier curve and also ran a log-rank test, where BMI_NEW is a BMI variable I created myself:
km <- svykm(surv~BMI_NEW, design = data_design)
svyjskm(diabetes_km1,...)
logrank_BMI <- svylogrank(surv~BMI_NEW, design = data_design)
Everything works well up to this point. I then tried to fit a Cox regression, but it doesn't work. Here is my code:
cox_fit <- svycoxph(surv~BMI_NEW, design = data_design)
I then get the error message: "all variables must be in design= argument"
I am not sure why this error occurs, because BMI_NEW is part of data_design and it works for svykm and svylogrank.
Is there anybody who has an idea what I am doing wrong?
Thank you!
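One thing that may be worth checking is whether every name in the model formula is a column of the design's data. In the code above, surv is a free-standing object rather than a column of data_design. The sketch below is only an illustration on made-up data: the toy data frame, its values and the toy_design name are all assumptions, not the NHIS data. It builds the Surv() term inside the formula from columns of the design instead.

```r
library(survey)
library(survival)

set.seed(1)
n <- 200
# Hypothetical stand-in for the NHIS extract; all values are simulated.
toy <- data.frame(
  PSU          = rep(1:2, length.out = n),   # two PSUs in every stratum
  STRATA       = rep(1:10, each = 20),
  sampleweight = runif(n, 1, 5),
  AGE          = round(runif(n, 30, 90)),
  censored     = rbinom(n, 1, 0.6),          # 1 = censored, 0 = died
  BMI_NEW      = factor(sample(c("normal", "overweight", "obese"),
                               n, replace = TRUE))
)

# Refer to columns by name in the design formulas (no data$ prefix).
toy_design <- svydesign(id = ~PSU, strata = ~STRATA,
                        weights = ~sampleweight, nest = TRUE, data = toy)

# Build the survival response inside the formula, so every variable
# (AGE, censored, BMI_NEW) is looked up in the design itself.
cox_fit <- svycoxph(Surv(AGE, 1 - censored) ~ BMI_NEW, design = toy_design)
print(cox_fit)
```

If censored is stored as a factor or character in the real data, it would need converting to 0/1 as a column of the data before the design is created.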
I am new to R and I am using it to analyse time series data (I am also new to this).
I have quarterly data for 15 years and I am interested in exploring the interplay between drinking and smoking rates in young people - treating smoking as the outcome variable. I was advised to use the gls command in the nlme package as this would allow me to include AR and MA terms. I know I could use more complex approaches like ARIMAX but as a first step, I would like to use simpler models.
After loading the data, I specify the time series:
data.ts = ts(data=data$smoke, frequency=4, start=c(data[1, "Year"], data[1, "Quarter"]))
data.ts.dec = decompose(data.ts)
After decomposing the data and some tests (KPSS and ADF test), it is clear that the data are not stationary so I differenced the data:
diff_dv <- diff(data$smoke, differences = 1)
plot.ts(diff_dv, main = "differenced")
data.diff.ts = ts(diff_dv, frequency = 4, start = c(data[1, "Year"], data[1, "Quarter"]))
The ACF and PACF plots suggest AR(2) should also be included so I set up the model as:
mod.gls = gls(diff_dv ~ drink+time , data = data,
correlation=corARMA(p=2), method="ML")
However, when I run this command I get the following:
"Error in model.frame.default: variable lengths differ".
I understand from previous posts that this is due to the differencing and the fact that the diff_dv is now shorter. I have attempted fixing this by modifying the code but neither approach works:
mod.gls = gls(diff_dv ~ drink+time , data = data[1:(length(data)-1), ],
correlation=corARMA(p=2), method="ML")
mod.gls = gls(I(c(diff(smoke), NA)) ~ drink+time+as.factor(quarterly) , data = data,
correlation=corARMA(p=2), method="ML")
Can anyone help with this? Is there a workaround which would allow me to run the -gls- command or is there an alternative approach which would be equivalent to the -gls- command?
As a side question, is it OK to include time as I do - a variable with values 1 to 60? A similar question is for the quarters which I included as dummies to adjust for possible seasonality - is this OK?
Your help is greatly appreciated!
Specify na.action = na.omit or na.action = na.exclude to omit the rows with NAs. Here is an example using the Ovary data set that ships with nlme. See ?na.fail for info on the differences between these two.
library(nlme)
# Pad the differenced series with a leading NA so it keeps the original length
Ovary2 <- transform(Ovary, dfoll = c(NA, diff(follicles)))
gls(dfoll ~ sin(2*pi*Time) + cos(2*pi*Time), Ovary2,
    correlation = corAR1(form = ~ 1 | Mare), na.action = na.exclude)
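The practical difference between the two na.action settings can be seen in a small sketch. This uses lm from base R on simulated data rather than gls, purely to keep the illustration self-contained; the data frame and its column names are assumptions.

```r
set.seed(42)
# Simulated series: differencing loses one observation, so pad with NA
# to keep the response the same length as the predictors.
smoke <- cumsum(rnorm(10))
toy <- data.frame(
  dsmoke = c(NA, diff(smoke)),
  time   = 1:10
)

fit_omit <- lm(dsmoke ~ time, data = toy, na.action = na.omit)
fit_excl <- lm(dsmoke ~ time, data = toy, na.action = na.exclude)

# na.omit drops the NA row everywhere; na.exclude also drops it for
# fitting but pads residuals and fitted values back to the original length.
length(residuals(fit_omit))
length(residuals(fit_excl))
residuals(fit_excl)[1]
```

Both fits use the same nine complete rows, so the coefficients are identical. Only the length of the residuals and fitted values differs, which matters when they are merged back into the original data.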
I have had an issue analysing survey data in R using the survey and tab packages.
I think I am setting up the survey design object correctly, but when I try to run the tabmeans.svy function to compare means across more than 2 categories, the function does not recognise the variable in the design.
Here's the example using my data:
svyd<-svydesign(id=~psu, #PSU variable
strata=~strata, #Strata variable
weights=~ca_betaindin_xw, #Weight variable
data=usds)
svyd_emp<-subset(svyd, usds$samp_employ==1) #subset the data to required analytic sample
t1<-tabmeans.svy(age~ethnicity,
design = svyd_emp) #Run tabmeans.svy comparing means of age by ethnicity
Which produces this error:
Error in svyglm.survey.design(Age ~ 1, design = design) :
all variables must be in design= argument
When I try the same function with a binary variable, the function works:
t2<-tabmeans.svy(age~sex,
design = svyd_emp) #Run tabmeans.svy comparing means of age by sex
#WORKS
Comparing means across multi-category variables using this function has worked for me previously, and I can't figure out why it is throwing an error now. The variables are listed in the survey.design object.
I cannot share my data but I have reproduced the same issue using the 'api' dataset in the survey package.
data(api)
sdesign<-svydesign(id=~dnum+snum,
strata=~stype,
weights=~pw,
data=apistrat,
nest = TRUE)
t3<-tabmeans.svy(api00~stype, # stype has 3 categories = DOESNT WORK
design=sdesign)
t4<-tabmeans.svy(api00~sch.wide,
design=sdesign) # sch.wide has 2 categories = WORKS
Appreciate any thoughts or suggestions on how to get around this issue.
Many thanks
Thanks for the reproducible example. When I run it, I get
> t3<-tabmeans.svy(api00~stype, # stype has 3 categories = DOESNT WORK
+ design=sdesign)
Error in svyglm.survey.design(Age ~ 1, design = design) :
all variables must be in design= argument
> traceback()
4: stop("all variables must be in design= argument")
3: svyglm.survey.design(Age ~ 1, design = design)
2: svyglm(Age ~ 1, design = design)
1: tabmeans.svy(api00 ~ stype, design = sdesign)
which is disconcerting, because why is it trying to find an Age variable? (This was masked a little in your example, because you have an age variable).
Looking at the code for tabmeans.svy I see
if (num.groups == 2) {
    fit <- svyttest(formula, design = design)
    diffmeans <- -fit$estimate
    diffmeans.ci <- -rev(as.numeric(fit$conf.int))
    p <- fit$p.value
}
else {
    fit1 <- svyglm(Age ~ 1, design = design)
    fit2 <- svyglm(Age ~ Sex, design = design)
    fit <- do.call(anova, c(list(object = fit1, object2 = fit2),
        anova.svyglm.list))
    p <- as.numeric(fit$p)
}
which explains the problem: if there are more than two groups it ignores your variables and instead tests for an effect of Sex on Age.
I suspect a cut-and-paste error by the maintainer. I have filed a GitHub issue. Unfortunately, I can't see a simple work-around.
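This does not fix tabmeans.svy itself, but the comparison that the else branch appears to be aiming for can be sketched directly with functions from the survey package. The following is a rough illustration on the api example from the question. Using svyby for the group means and regTermTest for the overall test is a choice made here, not what tab does internally.

```r
library(survey)

data(api)
sdesign <- svydesign(id = ~dnum + snum,
                     strata = ~stype,
                     weights = ~pw,
                     data = apistrat,
                     nest = TRUE)

# Design-based mean of api00 within each of the three school types
group_means <- svyby(~api00, ~stype, sdesign, svymean)
print(group_means)

# Overall test of whether mean api00 differs by school type:
# fit the model with stype, then test the stype term as a whole.
fit <- svyglm(api00 ~ stype, design = sdesign)
overall_test <- regTermTest(fit, ~stype)
print(overall_test)
```

The output is not formatted like a tab table, so the means and the p-value would have to be assembled by hand.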
I am trying to apply weights given with NIS data using the R package "survey", but I have been unsuccessful. I am fairly new to R and survey commands.
This is what I have tried:
# Create the unweighted dataset
d <- read.dta13(path)
sum(d$DISCWT) # This produces the correct weighted amount of cases I need.
library(survey)
# Create survey object
dsvy <- svydesign(id = ~ d$HOSP_NIS, strata = ~ d$NIS_STRATUM, weights = ~ d$DISCWT, nest = TRUE, data = d)
d$count <- 1
svytotal(~d$count, dsvy)
However I get the following error after running the survey total:
Error in onestrat(x[index, , drop = FALSE], clusters[index], nPSU[index][1], :
Stratum (1131) has only one PSU at stage 1
Any help would be greatly appreciated, thank you!
The error indicates that you have specified a design where one of the strata has just a single primary sampling unit. It's not possible to get an unbiased estimate of variance for a design like that: the contribution of stratum 1131 will end up as 0/0.
As you see, R's default response is to give an error, because a reasonably likely explanation is that the data or the svydesign statement is wrong. Sometimes, as here, that's not what you want, and the global option 'survey.lonely.psu' describes other ways to respond. You want to set
options(survey.lonely.psu = "adjust")
This and other options are documented at help(surveyoptions).
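A minimal sketch of the behaviour, on a made-up five-row data set: the column names mirror the question, but the values are invented so that stratum 30 has a single PSU. It also refers to variables by name in the formulas instead of with d$, and creates the count column before building the design, which is the more usual way to write it.

```r
library(survey)

# Hypothetical toy data: stratum 30 contains only one hospital (PSU).
toy <- data.frame(
  HOSP_NIS    = c(1, 2, 3, 4, 5),
  NIS_STRATUM = c(10, 10, 20, 20, 30),
  DISCWT      = c(5, 5, 4, 4, 6),
  count       = 1
)

dsvy <- svydesign(id = ~HOSP_NIS, strata = ~NIS_STRATUM,
                  weights = ~DISCWT, nest = TRUE, data = toy)

# With the default setting the variance calculation stops with an error.
old <- options(survey.lonely.psu = "fail")
failed <- inherits(try(svytotal(~count, dsvy), silent = TRUE), "try-error")

# With "adjust" the single-PSU stratum is handled and a total is returned.
options(survey.lonely.psu = "adjust")
total <- svytotal(~count, dsvy)
print(total)

options(old)  # restore the previous setting
```

The point estimate is just the sum of the weights. The option only changes how the standard error is computed for the stratum with a single PSU.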
I am trying to fit a multi-state model using R package R2BayesX. How can I do so correctly? There is no example in the manual. Here is my attempt.
activity is 1/0, i.e. the states
time is time
PatientId is the random effect I want
f <- activity ~ sx(time,bs="baseline")+sx(PatientId, bs="re")
b <- bayesx(f, family = "multistate", method = "MCMC", data=df)
Note: created new output directory
Warning message:
In run.bayesx(file.path(res$bayesx.prg$file.dir, prg.name = res$bayesx.prg$prg.name), :
an error occurred during runtime of BayesX, please check the BayesX
logfile!
I'm not sure exactly what kind of model you want to specify, but I put together an artificial, nonsensical data set to make the error above reproducible:
set.seed(1)
df <- data.frame(
activity = rbinom(1000, prob = 0.5, size = 1),
time = rep(1:50, 20),
id = rep(1:20, each = 50)
)
Possibly you could provide an improved example. With this data set I can run your code:
library("R2BayesX")
f <- activity ~ sx(time, bs = "baseline") + sx(id, bs = "re")
b <- bayesx(f, family = "multistate", method = "MCMC", data = df)
This leads to the warning above and you can inspect BayesX's logfile via:
bayesx_logfile(b)
which tells you (among other information):
ERROR: family multistate is not allowed for method regress
So here only REML estimation appears to be supported, but:
b <- bayesx(f, family = "multistate", method = "REML", data = df)
also results in an error, the logfile says:
ERROR: Variable state has to be specified as a global option!
So the state has to be provided in a different way. I guess that you tried to do so by the binary response but it seems that the response should be the time variable (as in survival models) and then an additional state indicator needs to be provided somehow. I couldn't find an example for this in the BayesX manuals, though. I recommend that you contact the BayesX mailing list and/or the R2BayesX package maintainer with a more specific question and a reproducible example.