mlogit() outputs NA's for nested logit model in R

mlogit() outputs NA's for nested logit model in R - r

I am trying to estimate a nested logit model of company siting choices, with nests = countries and alternatives = provinces, based on a number of alternative-specific characteristics as well as some company-specific characteristics. I formatted my data to a "long" structure using:
data <- mlogit.data(DB, choice="Occurrence", shape="long", chid.var="IDP", varying=6:ncol(DB), alt.var="Prov")
Here's a sample of the data:
IDP Occurrence From Prov ToC Dist Price Yield
5p1.APY 5p1 FALSE Sao Paulo APY PY 0.0000000 0.3698913 0.0000000
5p1.BOQ 5p1 FALSE Sao Paulo BOQ PY 0.6495493 0.3698913 0.0000000
5p1.CHA 5p1 FALSE Sao Paulo CHA AR 0.7870593 0.4622464 0.4461496
5p1.COR 5p1 FALSE Sao Paulo COR AR 0.3747480 0.4622464 0.5536546
5p1.FOR 5p1 FALSE Sao Paulo FOR AR 0.6822188 0.4622464 0.4402772
5p1.JUY 5p1 FALSE Sao Paulo JUY AR 1.0000000 0.4622464 0.3617038
Note that I've reduced the table to a few variables for clarity but would normally use more.
The code I use for the nested logit is the following:
nests <- list(Bolivia="SCZ",Paraguay=c("PHY","BOQ","APY"),Argentina=c("CHA","COR","FOR","JUY","SAL","SFE","SDE"))
nml <- mlogit(Occurrence ~ DistComp + PriceComp + YieldComp, data=data, nests=nests, unscaled=T)
summary(nml)
When running this model, I get the following output:
> summary(nml)
Call:
mlogit(formula = Occurrence ~ DistComp + PriceComp + YieldComp,
data = data, nests = nests, unscaled = T)
Frequencies of alternatives:
APY BOQ CHA COR FOR JUY PHY
SAL SCZ SDE SFE
0.1000000 0.0666667 0.1333333 0.0250000 0.0750000 0.0083333 0.0083333
0.1166667 0.2583333 0.1750000 0.0333333
bfgs method
1 iterations, 0h:0m:0s
g'(-H)^-1g = 1E+10
last step couldn't find higher value
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
BOQ:(intercept) -0.29923 NA NA NA
CHA:(intercept) -1.25406 NA NA NA
COR:(intercept) -1.76020 NA NA NA
FOR:(intercept) -1.97083 NA NA NA
JUY:(intercept) -4.14476 NA NA NA
PHY:(intercept) -2.63961 NA NA NA
SAL:(intercept) -1.72047 NA NA NA
SCZ:(intercept) -0.15714 NA NA NA
SDE:(intercept) -0.57449 NA NA NA
SFE:(intercept) -2.47345 NA NA NA
DistComp 2.44322 NA NA NA
PriceComp 2.45202 NA NA NA
YieldComp 3.15611 NA NA NA
iv.Bolivia 1.00000 NA NA NA
iv.Paraguay 1.00000 NA NA NA
iv.Argentina 1.00000 NA NA NA
Log-Likelihood: -221.84
McFadden R^2: 0.10453
Likelihood ratio test : chisq = 51.79 (p.value = 2.0552e-09)
I don't understand what causes the NAs in the output, considering that I prepared the data using mlogit.data(). Any help on this would be greatly appreciated.
Best,
Yann

Related

clogit output producing NAs but partially show up when I use cbind

(updated)
Hey Stack Overflowers,
I'm trying to run a series of MLM fixed-effect logistic regressions with R's clogit function. When I add additional covariates to my model, the summary output shows NAs. But, when I used the cbind function, some of the missing covariate coefficients show up.
Here's my model 1 equation and output:
> model1 <- clogit(chldwork~lag_aspgrade_binned+age+strata(childid), data=finaletdtlag, method = 'exact')
> summary(model1)
Call:
coxph(formula = Surv(rep(1, 2686L), chldwork) ~ lag_aspgrade_binned +
age + strata(childid), data = finaletdtlag, method = "exact")
n= 2686, number of events= 2287
coef exp(coef) se(coef) z Pr(>|z|)
lag_aspgrade_binnedhigh school 1.04156 2.83363 0.52572 1.981 0.04757 *
lag_aspgrade_binnedno primary 1.31891 3.73935 0.89010 1.482 0.13841
lag_aspgrade_binnedprimary some hs 0.85000 2.33964 0.56244 1.511 0.13072
lag_aspgrade_binnedsome college 1.28607 3.61855 0.41733 3.082 0.00206 **
age -0.39600 0.67301 0.03105 -12.753 < 2e-16 ***
Here's my model two equation:
model2
<- clogit(chldwork~lag_aspgrade_binned+age+sex+chldeth+typesite+selfwlth+enroll+strata(childid), data=finaletdtlag, method = 'exact')
summary(model2)
Here's the summary output:
> summary(model2)
Call:
coxph(formula = Surv(rep(1, 2686L), chldwork) ~ lag_aspgrade_binned +
age + sex + chldeth + typesite + selfwlth + enroll + strata(childid),
data = finaletdtlag, method = "efron")
n= 2675, number of events= 2277
(11 observations deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
lag_aspgrade_binnedhigh school 0.32943 1.39018 0.13933 2.364 0.0181 *
lag_aspgrade_binnedno primary 0.46553 1.59286 0.25154 1.851 0.0642 .
lag_aspgrade_binnedprimary some hs 0.33477 1.39762 0.15728 2.128 0.0333 *
lag_aspgrade_binnedsome college 0.36268 1.43718 0.11792 3.076 0.0021 **
age -0.07638 0.92647 0.01020 -7.486 7.11e-14 ***
sex1 NA NA 0.00000 NA NA
chldeth2 NA NA 0.00000 NA NA
chldeth3 NA NA 0.00000 NA NA
chldeth4 NA NA 0.00000 NA NA
chldeth6 NA NA 0.00000 NA NA
chldeth7 NA NA 0.00000 NA NA
chldeth8 NA NA 0.00000 NA NA
chldeth9 NA NA 0.00000 NA NA
chldeth99 NA NA 0.00000 NA NA
typesite1 NA NA 0.00000 NA NA
selfwlth1 0.04031 1.04113 0.29201 0.138 0.8902
selfwlth2 0.11971 1.12717 0.28736 0.417 0.6770
selfwlth3 0.07928 1.08251 0.29189 0.272 0.7859
selfwlth4 0.05717 1.05884 0.30231 0.189 0.8500
selfwlth5 0.39709 1.48750 0.43653 0.910 0.3630
selfwlth99 NA NA 0.00000 NA NA
enroll1 -0.20443 0.81511 0.08890 -2.300 0.0215 *
enroll88 NA NA 0.00000 NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
But here's what happens when I use the cbind function to show all of my models next to each other. Note, that coefficients sex through chldeth99 are not in model one.
cbind results:
> cbind(coef(model1), (coef(model2)), coef(model3)) #creating side by side list of all model coefficients
[,1] [,2] [,3]
lag_aspgrade_binnedhigh school 1.0415583 0.27198991 0.32827106
lag_aspgrade_binnedno primary 1.3189131 0.37986205 0.46103492
lag_aspgrade_binnedprimary some hs 0.8499958 0.27831739 0.33256493
lag_aspgrade_binnedsome college 1.2860726 0.30089261 0.36214068
age -0.3960015 -0.06233958 -0.07653464
sex1 1.0415583 NA NA
chldeth2 1.3189131 NA NA
chldeth3 0.8499958 NA NA
chldeth4 1.2860726 NA NA
chldeth6 -0.3960015 NA NA
chldeth7 1.0415583 NA NA
chldeth8 1.3189131 NA NA
chldeth9 0.8499958 NA NA
chldeth99 1.2860726 NA NA
typesite1 -0.3960015 NA NA
selfwlth1 1.0415583 0.03245507 0.04424493
selfwlth2 1.3189131 0.09775395 0.12743276
selfwlth3 0.8499958 0.06499650 0.08854499
selfwlth4 1.2860726 0.05038224 0.07092755
selfwlth5 -0.3960015 0.32162830 0.38079232
selfwlth99 1.0415583 NA NA
enroll1 1.3189131 -0.16966609 -0.30366842
enroll88 0.8499958 NA NA
sex1:enroll1 1.2860726 0.27198991 0.24088361
sex1:enroll88 -0.3960015 0.37986205 NA
Much gratitude for any insights. Wishing you all the best as another year wraps up--special shoutout to those currently in school still on the grind.

PSCL Returning NAs for zeroinfl negbin model

I am trying to run a Zero-Inflated Negative Binomial Count Model on some data containing the number of campaign visits by a politician by county. (Log Liklihood tests indicate Negative Binomial is correct, Vuong test suggests Zero-Inflated, though that could be thrown off by the fact that my Zero-Inflated model is clearly not converging.) I am using the pscl package in R. The problem is that when I run
Call:
zeroinfl(formula = Sanders_Adjacent_Clinton_Visit ~ Relative_Divisiveness + Obama_General_Percent_12 +
Percent_Over_65 + Percent_F + Percent_White + Percent_HS + Per_Capita_Income +
Poverty_Rate + MRP_Ideology_Mean + Swing_State, data = Unity_Data, dist = "negbin")
Pearson residuals:
Min 1Q Median 3Q Max
-0.96406 -0.24339 -0.11744 -0.03183 16.21356
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.216e+01 NA NA NA
Relative_Divisiveness -3.831e-01 NA NA NA
Obama_General_Percent_12 1.904e+00 NA NA NA
Percent_Over_65 -4.848e-02 NA NA NA
Percent_F 1.737e-01 NA NA NA
Percent_White 2.980e+00 NA NA NA
Percent_HS -3.563e-02 NA NA NA
Per_Capita_Income 7.413e-05 NA NA NA
Poverty_Rate -2.273e-02 NA NA NA
MRP_Ideology_Mean -8.316e-01 NA NA NA
Swing_State 1.580e+00 NA NA NA
Log(theta) 9.595e+00 NA NA NA
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.024e+02 NA NA NA
Relative_Divisiveness -3.265e+00 NA NA NA
Obama_General_Percent_12 -2.300e+01 NA NA NA
Percent_Over_65 -7.768e-02 NA NA NA
Percent_F 2.873e+00 NA NA NA
Percent_White 5.156e+00 NA NA NA
Percent_HS -5.097e-01 NA NA NA
Per_Capita_Income 2.831e-04 NA NA NA
Poverty_Rate 1.391e-02 NA NA NA
MRP_Ideology_Mean -2.569e+00 NA NA NA
Swing_State 5.075e-01 NA NA NA
Theta = 14696.9932
Number of iterations in BFGS optimization: 94
Log-likelihood: -596.5 on 23 Df
Obviously, all of those NA's are less then helpful to me. Any advice would be greatly appreciated! I'm pretty novice at R, StackOverflow, and Statistics, but trying to learn. I'm trying to provide everything needed for the minimal reproducible example, but I don't see anywhere to share my actual data... so if that's something you need in order to answer the question, let me know where I can put it!

Issue with a multiple regression model in R

First let me apologize but I'm a biologist starting in the world of bioinformatics and therefore in R programming and statistics.
I have to do an analysis of a multilinear regression model with the data (Penta) from Library(mvdalav).
I have to try different models including the PLS model that is the model that is normally used for this data set (https://rdrr.io/cran/mvdalab/f/README.md)
However, they ask us to play with the data more models and I'm very lost as the data seems to always give me errors:
1) Normal multiple regression model:
> mod2<-mod1<-lm(Penta1$log.RAI~.,Penta1)
> summary(mod2)
Call:
lm(formula = Penta1$log.RAI ~ ., data = Penta1)
Residuals:
ALL 30 residuals are 0: no residual degrees of freedom!
Coefficients: (15 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.000e-01 NA NA NA
Obs.NameAAWAA 8.500e-01 NA NA NA
Obs.NameAAYAA 5.600e-01 NA NA NA
Obs.NameEKWAP 1.400e+00 NA NA NA
Obs.NameFEAAK 4.000e-01 NA NA NA
Obs.NameFSPFR 7.400e-01 NA NA NA
Obs.NameGEAAK -4.200e-01 NA NA NA
Obs.NameLEAAK 5.000e-01 NA NA NA
Obs.NamePGFSP 1.000e+00 NA NA NA
Obs.NameRKWAP 2.080e+00 NA NA NA
Obs.NameRYLPT 5.000e-01 NA NA NA
Obs.NameVAAAK 1.114e-15 NA NA NA
Obs.NameVAAWK 3.300e-01 NA NA NA
Obs.NameVAWAA 1.530e+00 NA NA NA
Obs.NameVAWAK 1.550e+00 NA NA NA
Obs.NameVEAAK 6.100e-01 NA NA NA
Obs.NameVEAAP 2.800e-01 NA NA NA
Obs.NameVEASK 3.000e-01 NA NA NA
Obs.NameVEFAK 1.670e+00 NA NA NA
Obs.NameVEGGK -9.000e-01 NA NA NA
Obs.NameVEHAK 1.630e+00 NA NA NA
Obs.NameVELAK 6.900e-01 NA NA NA
Obs.NameVESAK 3.800e-01 NA NA NA
Obs.NameVESSK 1.000e-01 NA NA NA
Obs.NameVEWAK 2.830e+00 NA NA NA
Obs.NameVEWVK 1.810e+00 NA NA NA
Obs.NameVKAAK 2.100e-01 NA NA NA
Obs.NameVKWAA 1.810e+00 NA NA NA
Obs.NameVKWAP 2.450e+00 NA NA NA
Obs.NameVWAAK 1.400e-01 NA NA NA
S1 NA NA NA NA
L1 NA NA NA NA
P1 NA NA NA NA
S2 NA NA NA NA
L2 NA NA NA NA
P2 NA NA NA NA
S3 NA NA NA NA
L3 NA NA NA NA
P3 NA NA NA NA
S4 NA NA NA NA
L4 NA NA NA NA
P4 NA NA NA NA
S5 NA NA NA NA
L5 NA NA NA NA
P5 NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 29 and 0 DF, p-value: NA
2) Study the reduced model provided by the stepwise method. The aim is to compare the RMSE of the reduced model and the complete model for the training group and for the test group.
step(lm(log.RAI~.,data = penta),direction = "backward")
Error in step(lm(log.RAI ~ ., data = penta), direction = "backward") :
AIC is -infinity for this model, so 'step' cannot proceed
3)Find the best model by the criteria of the AIC and by the adjusted R2
4) PLS model --> what fits the data following:https://rdrr.io/cran/mvdalab/f/README.md
5)Also study it with the Ridge Regression method with the lm.ridge () function or similar
6) Finally we will study the LASSO method with the lars () function of Lasso project.
I'm super lost with why the data.frame gave those errors and also how to develop the analysis. Any help with any of the parts would be much appreciated
Kind regards

Ok after reading the vignette, Penta is some data obtained from drug discovery and the first column is the unique identifier. To do regression or downstream analysis you need to exclude this column. For the steps below, I simply do Penta[,-1] as input data
For the first part, this works:
library(mvdalab)
data(Penta)
summary(lm(log.RAI~.,data = Penta[,-1]))
Call:
lm(formula = log.RAI ~ ., data = Penta[, -1])
Residuals:
Min 1Q Median 3Q Max
-0.39269 -0.12958 -0.05101 0.07261 0.63414
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.80263 0.92495 -0.868 0.40016
S1 -0.09783 0.03895 -2.512 0.02489 *
L1 0.03236 0.04973 0.651 0.52576
P1 -0.10795 0.08521 -1.267 0.22587
S2 0.08670 0.04428 1.958 0.07043 .
Second part for AIC is ok as well:
step(lm(log.RAI~.,data = Penta[,-1]),direction="backward")
Start: AIC=-57.16
log.RAI ~ S1 + L1 + P1 + S2 + L2 + P2 + S3 + L3 + P3 + S4 + L4 +
P4 + S5 + L5 + P5
Df Sum of Sq RSS AIC
- P3 1 0.00150 1.5374 -59.132
- L4 1 0.00420 1.5401 -59.080
If you want to select model with AIC, the one above works. For adjusted R^2 i think most likely there are packages out there that does this
For lm.ridge, do the same:
library(MASS)
fit=lm.ridge(log.RAI~.,data = Penta[,-1])
For lars, lasso, you need to have the predictors etc in a matrix, so let's do
library(lars)
data = as.matrix(Penta[,-1])
fit = lars(x=data[,-ncol(data)],y=data[,"log.RAI"],type="lasso")

Trend analysis for ANOVA with both btw-Ss and within-Ss factors

I want to do a trend analysis for an ANOVA that has both btw-Ss and within-Ss factors.
The btw factors are "treatments"
The within factors are "trials".
test.data <- data.frame(sid = rep(c("s1", "s2", "s3", "s4", "s5"), each = 4),
treatments = rep(c("a1", "a2"), each = 20),
trials = rep(c("b1", "b2", "b3", "b4"), 10),
responses = c(3,5,9,6,7,11,12,11,9,13,14,12,4,8,11,7,1,3,5,4,5,6,11,7,10,12,18,15,10,15,15,14,6,9,13,9,3,5,9,7))}
The ANOVA matches the one in the textbook (Keppel, 1973) exactly:
aov.model.1 <- aov(responses ~ treatments*trials + Error(sid/trials), data=tab20.09)
What I am having trouble with is the trend analysis. I want to look at the linear, quadratic, and cubic trends for “trials”. Would also be nice to look at these same trends for “treatments x trials”.
I have set up the contrasts for the trend analyses as:
contrasts(tab20.09$trials) <- cbind(c(-3, -1, 1, 3), c(1, -1, -1, 1), c(-1, 3, -3, 1))
contrasts(tab20.09$trials)
[,1] [,2] [,3]
b1 -3 1 -1
b2 -1 -1 3
b3 1 -1 -3
b4 3 1 1
for the linear, quadratic, and cubic trends.
According to Keppel the results for the trends should be:
TRIALS:
SS df MS F
(Trials) (175.70) 3
Linear 87.12 1 87.12 193.60
Quadratic 72.90 1 72.90 125.69
Cubic 15.68 1 15.68 9.50
TREATMENTS X TRIALS
SS df MS F
(Trtmt x Trials)
(3.40) 3
Linear 0.98 1 0.98 2.18
Quadratic 0.00 1 0.00 <1
Cubic 2.42 1 2.42 1.47
ERROR TERMS
(21.40) (24)
Linear 3.60 8 0.45
Quadratic 4.60 8 0.58
Cubic 13.20 8 1.65
I have faith in his answers as once upon the time I had to derive them myself using a 6 function calculator supplemented by paper and pencil. However, when I do this:
contrasts(tab20.09$trials) <- cbind(c(-3, -1, 1, 3), c(1, -1, -1, 1), c(-1, 3, -3, 1))
aov.model.2 <- aov(responses ~ treatments*trials + Error(sid/trials), data=tab20.09)
summary(lm(aov.model.2))
what I get seems not to make sense.
summary(lm(aov.model.2))
Call:
lm(formula = aov.model.2)
Residuals:
ALL 40 residuals are 0: no residual degrees of freedom!
Coefficients: (4 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.750e+00 NA NA NA
treatmentsa2 3.500e+00 NA NA NA
trials1 6.500e-01 NA NA NA
trials2 -1.250e+00 NA NA NA
trials3 -4.500e-01 NA NA NA
sids10 -3.250e+00 NA NA NA
sids2 4.500e+00 NA NA NA
sids3 6.250e+00 NA NA NA
sids4 1.750e+00 NA NA NA
sids5 -2.500e+00 NA NA NA
sids6 -2.000e+00 NA NA NA
sids7 4.500e+00 NA NA NA
sids8 4.250e+00 NA NA NA
sids9 NA NA NA NA
treatmentsa2:trials1 2.120e-16 NA NA NA
treatmentsa2:trials2 -5.000e-01 NA NA NA
treatmentsa2:trials3 5.217e-16 NA NA NA
trials1:sids10 1.500e-01 NA NA NA
trials2:sids10 7.500e-01 NA NA NA
trials3:sids10 5.000e-02 NA NA NA
trials1:sids2 -1.041e-16 NA NA NA
trials2:sids2 -2.638e-16 NA NA NA
trials3:sids2 5.000e-01 NA NA NA
trials1:sids3 -1.500e-01 NA NA NA
trials2:sids3 -2.500e-01 NA NA NA
trials3:sids3 4.500e-01 NA NA NA
trials1:sids4 -5.000e-02 NA NA NA
trials2:sids4 -7.500e-01 NA NA NA
trials3:sids4 1.500e-01 NA NA NA
trials1:sids5 -1.000e-01 NA NA NA
trials2:sids5 5.000e-01 NA NA NA
trials3:sids5 3.000e-01 NA NA NA
trials1:sids6 -1.000e-01 NA NA NA
trials2:sids6 5.000e-01 NA NA NA
trials3:sids6 -2.000e-01 NA NA NA
trials1:sids7 4.000e-01 NA NA NA
trials2:sids7 5.000e-01 NA NA NA
trials3:sids7 -2.000e-01 NA NA NA
trials1:sids8 -5.000e-02 NA NA NA
trials2:sids8 2.500e-01 NA NA NA
trials3:sids8 6.500e-01 NA NA NA
trials1:sids9 NA NA NA NA
trials2:sids9 NA NA NA NA
trials3:sids9 NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 39 and 0 DF, p-value: NA
Any ideas what I am doing wrong? I suspect there is some problem with “lm” and the ANOVA but I don’t know what and I don’t know how to put in my trend analyses.
###### MORE DETAILS in response to ssdecontrol's response
Well, "trials" is a factor as it codes four levels of experience that are being manipulated. Likewise "sid" is the "subject identification number" that is definitely "nominal" not "ordinal" or "interval". Subjects are pretty much always treated as Factors in ANOVAS.
However, I did try both of these changes, but it greatly distorted the ANOVA (try it yourself and compare). Likewise, it didn't seem to help. PERHAPS MORE DIRECTLY RELEVANT, when I try to create and apply my contrasts I am told that it cannot be done as my numerics need to be factors:
contrasts(tab20.09$trials) <- cbind(c(-3, -1, 1, 3), c(1, -1, -1, 1), c(-1, 3, -3, 1))
Error in `contrasts<-`(`*tmp*`, value = c(-3, -1, 1, 3, 1, -1, -1, 1, :
contrasts apply only to factors
STARTING OVER
I seem to make more progress using contr.poly as in
contrasts(tab20.09$trials) <- contr.poly(levels(tab20.09$trials))
The ANOVA doesn't change at all. So that is good and when I do:
lm.model <- lm(responses ~ trials, data = tab20.09)
summary.lm(lm.model)
I get basically the same pattern as Keppel.
BUT, as I am interested in the linear trend of the interaction (treatments x trials), not just on trials, I tried this:
lm3 <- lm(responses ~ treatments*trials, data = tab20.09)
summary.lm(lm3)
and the ME of "trials" goes away . . .
In Keppel’s treatment, he calculated separate error terms for each contrast (i.e., Linear, Quadratic, and Cubic) and used that on both the main effect of “trial” as well as on the “treatment x trial” interaction.
I certainly could hand calculate all of these things again. Perhaps I could even write R functions for the general case; however, it seems difficult to believe that such a basic core contrast for experimental psychology has not yet found an R implementation!!??
Any help or suggestions would be greatly appreciated. Thanks. W

It looks like trials and sids are factors, but you are intending for them to be numeric/integer. Run sapply(tab20.09, class) to see if that's the case. That's what the output means; instead of fitting a continuous/count interaction, it's fitting a dummy variable for each level of each variable and computing all of the interactions between them.
To fix it, just reassign tab20.09$trials <- as.numeric(tab20.09$trials) and tab20.09$sids <- as.numeric(tab20.09$sids) in list syntax, or you can use matrix syntax like tab20.09[, c("trials", "sids")] <- apply(tab20.09[, c("trials", "sids")], 2, as.numeric). The first one is easier in this case, but you should be aware of the second one as well.

how to identify time series is stationarity or not by package fUnitRoots in R language

I have one time series, let's say
694 281 479 646 282 317 790 591 573 605 423 639 873 420 626 849 596 486 578 457 465 518 272 549 437 445 596 396 259 390
Now, I want to forecast the following values by ARIMA Model, but ARIMA requires the time series to be stationarity, so before this, I have to identify the time series above matches the requirement or not, then fUnitRoots comes up.
I think http://cran.r-project.org/web/packages/fUnitRoots/fUnitRoots.pdf can offer some help, but there is no simple tutorial
I just want one small demo to show how to identify one time series, is there any one?
thanks in advance.

I will give example using urca package in R.
library(urca)
data(npext) # This is the data used by Nelson and Plosser (1982)
sample.data<-npext
head(sample.data)
year cpi employmt gnpdefl nomgnp interest indprod gnpperca realgnp wages realwag sp500 unemploy velocity M
1 1860 3.295837 NA NA NA NA -0.1053605 NA NA NA NA NA NA NA NA
2 1861 3.295837 NA NA NA NA -0.1053605 NA NA NA NA NA NA NA NA
3 1862 3.401197 NA NA NA NA -0.1053605 NA NA NA NA NA NA NA NA
4 1863 3.610918 NA NA NA NA 0.0000000 NA NA NA NA NA NA NA NA
5 1864 3.871201 NA NA NA NA 0.0000000 NA NA NA NA NA NA NA NA
6 1865 3.850148 NA NA NA NA 0.0000000 NA NA NA NA NA NA NA NA
I will use ADF to perform the unit root test on industrial production index as an illustration. The lag is selected based on the SIC. I use trend as there is trend in the date .
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
Min 1Q Median 3Q Max
-0.31644 -0.04813 0.00965 0.05252 0.20504
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.052208 0.017273 3.022 0.003051 **
z.lag.1 -0.176575 0.049406 -3.574 0.000503 ***
tt 0.007185 0.002061 3.486 0.000680 ***
z.diff.lag 0.124320 0.089153 1.394 0.165695
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.09252 on 123 degrees of freedom
Multiple R-squared: 0.09796, Adjusted R-squared: 0.07596
F-statistic: 4.452 on 3 and 123 DF, p-value: 0.005255
Value of test-statistic is: -3.574 11.1715 6.5748
Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
#Interpretation: BIC selects the lag 1 as optimal lag. The test statistics -3.574 is less than the critical value tau3 at 5 percent (-3.430). So, the null that there is an unit root is is rejected only at 5 percent.
Also, check the free forecasting book available here

You can, of course, carry out formal tests such as the ADF test, but I would suggest carrying out "informal tests" of stationarity as a first step.
Inspecting the data visually using plot() will help you identify whether or not the data is stationary.
The next step would be to investigate the autocorrelation function and partial autocorrelation function of the data. You can do this by calling both the acf() and pacf() functions. This will not only help you decide whether or not the data is stationary, but it will also help you identify tentative ARIMA models that can later be estimated and used for forecasting if they get the all clear after carrying out the necessary diagnostic checks.
You should, indeed, pay caution to the fact that there are only 30 observations in the data that you provided. This falls below the practical minimum level of about 50 observations necessary for forecasting using ARIMA models.
If it helps, a moment after I plotted the data, I was almost certain the data was probably stationary. The estimated acf and pacf seem to confirm this view. Sometimes informal tests like that suffice.
This little-book-of-r-for-time-series may help you further.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

mlogit() outputs NA's for nested logit model in R - r

Related

clogit output producing NAs but partially show up when I use cbind

PSCL Returning NAs for zeroinfl negbin model

Issue with a multiple regression model in R

Trend analysis for ANOVA with both btw-Ss and within-Ss factors

how to identify time series is stationarity or not by package fUnitRoots in R language

Categories

Resources