Hey Stack Overflowers,
I'm trying to run a series of MLM fixed-effect logistic regressions with R's clogit function. When I add additional covariates to my model, the summary output shows NAs. But when I use the cbind function, some of the missing covariate coefficients show up.
Here's my model 1 equation and output:
> model1 <- clogit(chldwork~lag_aspgrade_binned+age+strata(childid), data=finaletdtlag, method = 'exact')
> summary(model1)
Call:
coxph(formula = Surv(rep(1, 2686L), chldwork) ~ lag_aspgrade_binned +
age + strata(childid), data = finaletdtlag, method = "exact")
n= 2686, number of events= 2287
coef exp(coef) se(coef) z Pr(>|z|)
lag_aspgrade_binnedhigh school 1.04156 2.83363 0.52572 1.981 0.04757 *
lag_aspgrade_binnedno primary 1.31891 3.73935 0.89010 1.482 0.13841
lag_aspgrade_binnedprimary some hs 0.85000 2.33964 0.56244 1.511 0.13072
lag_aspgrade_binnedsome college 1.28607 3.61855 0.41733 3.082 0.00206 **
age -0.39600 0.67301 0.03105 -12.753 < 2e-16 ***
Here's my model two equation:
model2 <- clogit(chldwork~lag_aspgrade_binned+age+sex+chldeth+typesite+selfwlth+enroll+strata(childid), data=finaletdtlag, method = 'exact')
summary(model2)
Here's the summary output:
> summary(model2)
Call:
coxph(formula = Surv(rep(1, 2686L), chldwork) ~ lag_aspgrade_binned +
age + sex + chldeth + typesite + selfwlth + enroll + strata(childid),
data = finaletdtlag, method = "efron")
n= 2675, number of events= 2277
(11 observations deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
lag_aspgrade_binnedhigh school 0.32943 1.39018 0.13933 2.364 0.0181 *
lag_aspgrade_binnedno primary 0.46553 1.59286 0.25154 1.851 0.0642 .
lag_aspgrade_binnedprimary some hs 0.33477 1.39762 0.15728 2.128 0.0333 *
lag_aspgrade_binnedsome college 0.36268 1.43718 0.11792 3.076 0.0021 **
age -0.07638 0.92647 0.01020 -7.486 7.11e-14 ***
sex1 NA NA 0.00000 NA NA
chldeth2 NA NA 0.00000 NA NA
chldeth3 NA NA 0.00000 NA NA
chldeth4 NA NA 0.00000 NA NA
chldeth6 NA NA 0.00000 NA NA
chldeth7 NA NA 0.00000 NA NA
chldeth8 NA NA 0.00000 NA NA
chldeth9 NA NA 0.00000 NA NA
chldeth99 NA NA 0.00000 NA NA
typesite1 NA NA 0.00000 NA NA
selfwlth1 0.04031 1.04113 0.29201 0.138 0.8902
selfwlth2 0.11971 1.12717 0.28736 0.417 0.6770
selfwlth3 0.07928 1.08251 0.29189 0.272 0.7859
selfwlth4 0.05717 1.05884 0.30231 0.189 0.8500
selfwlth5 0.39709 1.48750 0.43653 0.910 0.3630
selfwlth99 NA NA 0.00000 NA NA
enroll1 -0.20443 0.81511 0.08890 -2.300 0.0215 *
enroll88 NA NA 0.00000 NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
But here's what happens when I use the cbind function to show all of my models next to each other. Note that the coefficients sex1 through chldeth99 are not in model one.
cbind results:
> cbind(coef(model1), (coef(model2)), coef(model3)) #creating side by side list of all model coefficients
[,1] [,2] [,3]
lag_aspgrade_binnedhigh school 1.0415583 0.27198991 0.32827106
lag_aspgrade_binnedno primary 1.3189131 0.37986205 0.46103492
lag_aspgrade_binnedprimary some hs 0.8499958 0.27831739 0.33256493
lag_aspgrade_binnedsome college 1.2860726 0.30089261 0.36214068
age -0.3960015 -0.06233958 -0.07653464
sex1 1.0415583 NA NA
chldeth2 1.3189131 NA NA
chldeth3 0.8499958 NA NA
chldeth4 1.2860726 NA NA
chldeth6 -0.3960015 NA NA
chldeth7 1.0415583 NA NA
chldeth8 1.3189131 NA NA
chldeth9 0.8499958 NA NA
chldeth99 1.2860726 NA NA
typesite1 -0.3960015 NA NA
selfwlth1 1.0415583 0.03245507 0.04424493
selfwlth2 1.3189131 0.09775395 0.12743276
selfwlth3 0.8499958 0.06499650 0.08854499
selfwlth4 1.2860726 0.05038224 0.07092755
selfwlth5 -0.3960015 0.32162830 0.38079232
selfwlth99 1.0415583 NA NA
enroll1 1.3189131 -0.16966609 -0.30366842
enroll88 0.8499958 NA NA
sex1:enroll1 1.2860726 0.27198991 0.24088361
sex1:enroll88 -0.3960015 0.37986205 NA
Much gratitude for any insights. Wishing you all the best as another year wraps up--special shoutout to those currently in school still on the grind.
Related
I am trying to run a Zero-Inflated Negative Binomial count model on some data containing the number of campaign visits by a politician, by county. (Log-likelihood tests indicate Negative Binomial is correct, and a Vuong test suggests Zero-Inflated, though that could be thrown off by the fact that my Zero-Inflated model is clearly not converging.) I am using the pscl package in R. The problem is that when I run the model, I get the following output:
Call:
zeroinfl(formula = Sanders_Adjacent_Clinton_Visit ~ Relative_Divisiveness + Obama_General_Percent_12 +
Percent_Over_65 + Percent_F + Percent_White + Percent_HS + Per_Capita_Income +
Poverty_Rate + MRP_Ideology_Mean + Swing_State, data = Unity_Data, dist = "negbin")
Pearson residuals:
Min 1Q Median 3Q Max
-0.96406 -0.24339 -0.11744 -0.03183 16.21356
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.216e+01 NA NA NA
Relative_Divisiveness -3.831e-01 NA NA NA
Obama_General_Percent_12 1.904e+00 NA NA NA
Percent_Over_65 -4.848e-02 NA NA NA
Percent_F 1.737e-01 NA NA NA
Percent_White 2.980e+00 NA NA NA
Percent_HS -3.563e-02 NA NA NA
Per_Capita_Income 7.413e-05 NA NA NA
Poverty_Rate -2.273e-02 NA NA NA
MRP_Ideology_Mean -8.316e-01 NA NA NA
Swing_State 1.580e+00 NA NA NA
Log(theta) 9.595e+00 NA NA NA
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.024e+02 NA NA NA
Relative_Divisiveness -3.265e+00 NA NA NA
Obama_General_Percent_12 -2.300e+01 NA NA NA
Percent_Over_65 -7.768e-02 NA NA NA
Percent_F 2.873e+00 NA NA NA
Percent_White 5.156e+00 NA NA NA
Percent_HS -5.097e-01 NA NA NA
Per_Capita_Income 2.831e-04 NA NA NA
Poverty_Rate 1.391e-02 NA NA NA
MRP_Ideology_Mean -2.569e+00 NA NA NA
Swing_State 5.075e-01 NA NA NA
Theta = 14696.9932
Number of iterations in BFGS optimization: 94
Log-likelihood: -596.5 on 23 Df
Obviously, all of those NAs are less than helpful to me. Any advice would be greatly appreciated! I'm pretty novice at R, Stack Overflow, and statistics, but trying to learn. I'm trying to provide everything needed for a minimal reproducible example, but I don't see anywhere to share my actual data... so if that's something you need in order to answer the question, let me know where I can put it!
First, let me apologize: I'm a biologist just starting out in bioinformatics, and therefore in R programming and statistics.
I have to do an analysis of a multiple linear regression model with the data (Penta) from library(mvdalab).
I have to try different models, including the PLS model, which is the one normally used for this data set (https://rdrr.io/cran/mvdalab/f/README.md).
However, we are asked to fit more models to the data, and I'm very lost, as the data always seems to give me errors:
1) Normal multiple regression model:
> mod2<-mod1<-lm(Penta1$log.RAI~.,Penta1)
> summary(mod2)
Call:
lm(formula = Penta1$log.RAI ~ ., data = Penta1)
Residuals:
ALL 30 residuals are 0: no residual degrees of freedom!
Coefficients: (15 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.000e-01 NA NA NA
Obs.NameAAWAA 8.500e-01 NA NA NA
Obs.NameAAYAA 5.600e-01 NA NA NA
Obs.NameEKWAP 1.400e+00 NA NA NA
Obs.NameFEAAK 4.000e-01 NA NA NA
Obs.NameFSPFR 7.400e-01 NA NA NA
Obs.NameGEAAK -4.200e-01 NA NA NA
Obs.NameLEAAK 5.000e-01 NA NA NA
Obs.NamePGFSP 1.000e+00 NA NA NA
Obs.NameRKWAP 2.080e+00 NA NA NA
Obs.NameRYLPT 5.000e-01 NA NA NA
Obs.NameVAAAK 1.114e-15 NA NA NA
Obs.NameVAAWK 3.300e-01 NA NA NA
Obs.NameVAWAA 1.530e+00 NA NA NA
Obs.NameVAWAK 1.550e+00 NA NA NA
Obs.NameVEAAK 6.100e-01 NA NA NA
Obs.NameVEAAP 2.800e-01 NA NA NA
Obs.NameVEASK 3.000e-01 NA NA NA
Obs.NameVEFAK 1.670e+00 NA NA NA
Obs.NameVEGGK -9.000e-01 NA NA NA
Obs.NameVEHAK 1.630e+00 NA NA NA
Obs.NameVELAK 6.900e-01 NA NA NA
Obs.NameVESAK 3.800e-01 NA NA NA
Obs.NameVESSK 1.000e-01 NA NA NA
Obs.NameVEWAK 2.830e+00 NA NA NA
Obs.NameVEWVK 1.810e+00 NA NA NA
Obs.NameVKAAK 2.100e-01 NA NA NA
Obs.NameVKWAA 1.810e+00 NA NA NA
Obs.NameVKWAP 2.450e+00 NA NA NA
Obs.NameVWAAK 1.400e-01 NA NA NA
S1 NA NA NA NA
L1 NA NA NA NA
P1 NA NA NA NA
S2 NA NA NA NA
L2 NA NA NA NA
P2 NA NA NA NA
S3 NA NA NA NA
L3 NA NA NA NA
P3 NA NA NA NA
S4 NA NA NA NA
L4 NA NA NA NA
P4 NA NA NA NA
S5 NA NA NA NA
L5 NA NA NA NA
P5 NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 29 and 0 DF, p-value: NA
2) Study the reduced model provided by the stepwise method. The aim is to compare the RMSE of the reduced model and the complete model for the training group and for the test group.
step(lm(log.RAI~.,data = penta),direction = "backward")
Error in step(lm(log.RAI ~ ., data = penta), direction = "backward") :
AIC is -infinity for this model, so 'step' cannot proceed
3) Find the best model by the criteria of the AIC and by the adjusted R2.
4) PLS model --> the one that fits the data, following https://rdrr.io/cran/mvdalab/f/README.md
5) Also study it with the ridge regression method, with the lm.ridge() function or similar.
6) Finally, we will study the LASSO method with the lars() function (lars package).
I'm super lost as to why the data frame gives those errors and also how to develop the analysis. Any help with any of the parts would be much appreciated.
Kind regards
OK, after reading the vignette: Penta is data obtained from drug discovery, and the first column is a unique identifier. To do regression or any downstream analysis you need to exclude this column. For the steps below, I simply use Penta[,-1] as the input data.
For the first part, this works:
library(mvdalab)
data(Penta)
summary(lm(log.RAI~.,data = Penta[,-1]))
Call:
lm(formula = log.RAI ~ ., data = Penta[, -1])
Residuals:
Min 1Q Median 3Q Max
-0.39269 -0.12958 -0.05101 0.07261 0.63414
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.80263 0.92495 -0.868 0.40016
S1 -0.09783 0.03895 -2.512 0.02489 *
L1 0.03236 0.04973 0.651 0.52576
P1 -0.10795 0.08521 -1.267 0.22587
S2 0.08670 0.04428 1.958 0.07043 .
The second part, for AIC, is OK as well:
step(lm(log.RAI~.,data = Penta[,-1]),direction="backward")
Start: AIC=-57.16
log.RAI ~ S1 + L1 + P1 + S2 + L2 + P2 + S3 + L3 + P3 + S4 + L4 +
P4 + S5 + L5 + P5
Df Sum of Sq RSS AIC
- P3 1 0.00150 1.5374 -59.132
- L4 1 0.00420 1.5401 -59.080
If you want to select a model by AIC, the call above works. For adjusted R^2, I think there are packages out there that do this.
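One possibility for the adjusted R^2 part (my suggestion, assuming the leaps package is acceptable) is an exhaustive best-subset search, which reports the adjusted R^2 of the best model of each size:
library(leaps)
subs <- regsubsets(log.RAI ~ ., data = Penta[, -1], nvmax = 15)  # exhaustive search over the 15 predictors
subs.sum <- summary(subs)
subs.sum$adjr2                          # adjusted R^2 of the best model of each size
coef(subs, which.max(subs.sum$adjr2))   # coefficients of the model with the highest adjusted R^2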
For lm.ridge, do the same:
library(MASS)
fit=lm.ridge(log.RAI~.,data = Penta[,-1])
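A small follow-up on lm.ridge (my addition, not part of the original answer): its default lambda is 0, which is just ordinary least squares, so it is worth fitting over a grid of penalties and letting select() suggest a value:
fit <- lm.ridge(log.RAI ~ ., data = Penta[, -1], lambda = seq(0, 20, 0.1))  # grid of ridge penalties
select(fit)  # HKB, L-W and GCV suggestions for lambda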
For lars / LASSO, you need to have the predictors etc. in a matrix, so let's do:
library(lars)
data = as.matrix(Penta[,-1])
fit = lars(x=data[,-ncol(data)],y=data[,"log.RAI"],type="lasso")
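To go a step further with lars (again my own sketch; the s = 0.5 below is just an illustrative choice), you can plot the LASSO path and extract the coefficients at any point along it:
plot(fit)  # coefficient paths as the L1 constraint is relaxed
predict(fit, s = 0.5, type = "coefficients", mode = "fraction")$coefficients  # coefficients half-way along the path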
I'm new to R and not sure how to fix the error I'm getting.
Here is the summary of my data:
> summary(data)
Metro MrktRgn MedAge numHmSales
Abilene : 1 Austin-Waco-Hill Country : 6 20-25: 3 Min. : 302
Amarillo : 1 Far West Texas : 1 25-30: 6 1st Qu.: 1057
Arlington: 1 Gulf Coast - Brazos Bottom:10 30-35:28 Median : 2098
Austin : 1 Northeast Texas :14 35-40: 6 Mean : 7278
Bay Area : 1 Panhandle and South Plains: 5 45-50: 2 3rd Qu.: 5086
Beaumont : 1 South Texas : 7 50-55: 1 Max. :83174
(Other) :40 West Texas : 3
AvgSlPr totNumLs MedHHInc Pop
Min. :123833 Min. : 1257 Min. :37300 Min. : 2899
1st Qu.:149117 1st Qu.: 6028 1st Qu.:53100 1st Qu.: 56876
Median :171667 Median : 11106 Median :57000 Median : 126482
Mean :188637 Mean : 24302 Mean :60478 Mean : 296529
3rd Qu.:215175 3rd Qu.: 25472 3rd Qu.:66200 3rd Qu.: 299321
Max. :303475 Max. :224230 Max. :99205 Max. :2196000
NA's :1
Then I make a model with AvgSlPr as the y variable and the other variables as x variables:
> model1 = lm(AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales + totNumLs + MedHHInc + Pop)
But when I do a summary of the model, I get NA for the Std. Error, t value, and p-values.
> summary(model1)
Call:
lm(formula = AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales +
totNumLs + MedHHInc + Pop)
Residuals:
ALL 45 residuals are 0: no residual degrees of freedom!
Coefficients: (15 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 143175 NA NA NA
MetroAmarillo 24925 NA NA NA
MetroArlington 35258 NA NA NA
MetroAustin 160300 NA NA NA
MetroBay Area 68642 NA NA NA
MetroBeaumont 5942 NA NA NA
...
MrktRgnWest Texas NA NA NA NA
MedAge25-30 NA NA NA NA
MedAge30-35 NA NA NA NA
MedAge35-40 NA NA NA NA
MedAge45-50 NA NA NA NA
MedAge50-55 NA NA NA NA
numHmSales NA NA NA NA
totNumLs NA NA NA NA
MedHHInc NA NA NA NA
Pop NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 44 and 0 DF, p-value: NA
Does anyone know what's going wrong and how I can fix this? Also, I'm not supposed to be using dummy variables.
Your Metro variable has only a single row of data for each factor level. You need at least two points to fit a line. Let me demonstrate with an example:
dat = data.frame(AvgSlPr=runif(4), Metro = factor(LETTERS[1:4]), MrktRgn = runif(4))
model1 = lm(AvgSlPr ~ Metro + MrktRgn, data = dat)
summary(model1)
#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)
#Residuals:
#ALL 4 residuals are 0: no residual degrees of freedom!
#Coefficients: (1 not defined because of singularities)
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.33801 NA NA NA
#MetroB 0.47350 NA NA NA
#MetroC -0.04118 NA NA NA
#MetroD 0.20047 NA NA NA
#MrktRgn NA NA NA NA
#Residual standard error: NaN on 0 degrees of freedom
#Multiple R-squared: 1, Adjusted R-squared: NaN
#F-statistic: NaN on 3 and 0 DF, p-value: NA
But if we add more data so that at least some of the factor levels have more than one row of data, the linear model can be calculated:
dat = rbind(dat, data.frame(AvgSlPr=2:4, Metro=factor(LETTERS[2:4]), MrktRgn = 3:5))
model2 = lm(AvgSlPr ~ Metro + MrktRgn, data=dat)
summary(model2)
#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)
#Residuals:
# 1 2 3 4 5 6 7
# 9.021e-17 2.643e-01 7.304e-03 -1.498e-01 -2.643e-01 -7.304e-03 1.498e-01
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.24279 0.30406 0.798 0.50834
#MetroB -0.10207 0.38858 -0.263 0.81739
#MetroC -0.06696 0.39471 -0.170 0.88090
#MetroD 0.06804 0.41243 0.165 0.88413
#MrktRgn 0.70787 0.06747 10.491 0.00896 **
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 0.3039 on 2 degrees of freedom
#Multiple R-squared: 0.9857, Adjusted R-squared: 0.9571
#F-statistic: 34.45 on 4 and 2 DF, p-value: 0.02841
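Back on the original data (a quick check I'd add, using the questioner's data object), counting how many rows each factor level contributes shows why the fit is saturated: levels with only one row get a dedicated coefficient for that single observation, leaving no residual degrees of freedom.
table(data$Metro)    # rows per Metro level
table(data$MrktRgn)  # rows per MrktRgn level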
The data used to fit the model need to be re-thought. What is the goal of the analysis? What data are needed to achieve that goal?
I am trying to estimate a nested logit model of company siting choices, with nests = countries and alternatives = provinces, based on a number of alternative-specific characteristics as well as some company-specific characteristics. I formatted my data to a "long" structure using:
data <- mlogit.data(DB, choice="Occurrence", shape="long", chid.var="IDP", varying=6:ncol(DB), alt.var="Prov")
Here's a sample of the data:
IDP Occurrence From Prov ToC Dist Price Yield
5p1.APY 5p1 FALSE Sao Paulo APY PY 0.0000000 0.3698913 0.0000000
5p1.BOQ 5p1 FALSE Sao Paulo BOQ PY 0.6495493 0.3698913 0.0000000
5p1.CHA 5p1 FALSE Sao Paulo CHA AR 0.7870593 0.4622464 0.4461496
5p1.COR 5p1 FALSE Sao Paulo COR AR 0.3747480 0.4622464 0.5536546
5p1.FOR 5p1 FALSE Sao Paulo FOR AR 0.6822188 0.4622464 0.4402772
5p1.JUY 5p1 FALSE Sao Paulo JUY AR 1.0000000 0.4622464 0.3617038
Note that I've reduced the table to a few variables for clarity but would normally use more.
The code I use for the nested logit is the following:
nests <- list(Bolivia="SCZ",Paraguay=c("PHY","BOQ","APY"),Argentina=c("CHA","COR","FOR","JUY","SAL","SFE","SDE"))
nml <- mlogit(Occurrence ~ DistComp + PriceComp + YieldComp, data=data, nests=nests, unscaled=T)
summary(nml)
When running this model, I get the following output:
> summary(nml)
Call:
mlogit(formula = Occurrence ~ DistComp + PriceComp + YieldComp,
data = data, nests = nests, unscaled = T)
Frequencies of alternatives:
APY BOQ CHA COR FOR JUY PHY
SAL SCZ SDE SFE
0.1000000 0.0666667 0.1333333 0.0250000 0.0750000 0.0083333 0.0083333
0.1166667 0.2583333 0.1750000 0.0333333
bfgs method
1 iterations, 0h:0m:0s
g'(-H)^-1g = 1E+10
last step couldn't find higher value
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
BOQ:(intercept) -0.29923 NA NA NA
CHA:(intercept) -1.25406 NA NA NA
COR:(intercept) -1.76020 NA NA NA
FOR:(intercept) -1.97083 NA NA NA
JUY:(intercept) -4.14476 NA NA NA
PHY:(intercept) -2.63961 NA NA NA
SAL:(intercept) -1.72047 NA NA NA
SCZ:(intercept) -0.15714 NA NA NA
SDE:(intercept) -0.57449 NA NA NA
SFE:(intercept) -2.47345 NA NA NA
DistComp 2.44322 NA NA NA
PriceComp 2.45202 NA NA NA
YieldComp 3.15611 NA NA NA
iv.Bolivia 1.00000 NA NA NA
iv.Paraguay 1.00000 NA NA NA
iv.Argentina 1.00000 NA NA NA
Log-Likelihood: -221.84
McFadden R^2: 0.10453
Likelihood ratio test : chisq = 51.79 (p.value = 2.0552e-09)
I don't understand what causes the NAs in the output, considering that I prepared the data using mlogit.data(). Any help on this would be greatly appreciated.
Best,
Yann
I have one time series, let's say
694 281 479 646 282 317 790 591 573 605 423 639 873 420 626 849 596 486 578 457 465 518 272 549 437 445 596 396 259 390
Now I want to forecast the following values with an ARIMA model, but ARIMA requires the time series to be stationary, so before that I have to check whether the series above meets the requirement or not; that's where fUnitRoots comes in.
I think http://cran.r-project.org/web/packages/fUnitRoots/fUnitRoots.pdf can offer some help, but there is no simple tutorial.
I just want one small demo showing how to test one time series. Is there one?
Thanks in advance.
I will give an example using the urca package in R.
library(urca)
data(npext) # This is the data used by Nelson and Plosser (1982)
sample.data<-npext
head(sample.data)
year cpi employmt gnpdefl nomgnp interest indprod gnpperca realgnp wages realwag sp500 unemploy velocity M
1 1860 3.295837 NA NA NA NA -0.1053605 NA NA NA NA NA NA NA NA
2 1861 3.295837 NA NA NA NA -0.1053605 NA NA NA NA NA NA NA NA
3 1862 3.401197 NA NA NA NA -0.1053605 NA NA NA NA NA NA NA NA
4 1863 3.610918 NA NA NA NA 0.0000000 NA NA NA NA NA NA NA NA
5 1864 3.871201 NA NA NA NA 0.0000000 NA NA NA NA NA NA NA NA
6 1865 3.850148 NA NA NA NA 0.0000000 NA NA NA NA NA NA NA NA
I will use the ADF test to perform the unit root test on the industrial production index as an illustration. The lag is selected based on the SIC (BIC). I include a trend term, as there is a trend in the data.
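The call that produced the summary below is not shown; something along these lines should reproduce it (the maximum lag of 4 is my assumption, with the lag actually used then chosen by BIC):
adf.indprod <- ur.df(na.omit(sample.data$indprod), type = "trend", lags = 4, selectlags = "BIC")
summary(adf.indprod)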
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
Min 1Q Median 3Q Max
-0.31644 -0.04813 0.00965 0.05252 0.20504
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.052208 0.017273 3.022 0.003051 **
z.lag.1 -0.176575 0.049406 -3.574 0.000503 ***
tt 0.007185 0.002061 3.486 0.000680 ***
z.diff.lag 0.124320 0.089153 1.394 0.165695
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.09252 on 123 degrees of freedom
Multiple R-squared: 0.09796, Adjusted R-squared: 0.07596
F-statistic: 4.452 on 3 and 123 DF, p-value: 0.005255
Value of test-statistic is: -3.574 11.1715 6.5748
Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
# Interpretation: BIC selects lag 1 as the optimal lag. The test statistic -3.574 is less than the critical value tau3 at the 5 percent level (-3.43), so the null hypothesis that there is a unit root is rejected at the 5 percent level (but not at the 1 percent level, where the critical value is -3.99).
Also, check the free forecasting book available here
You can, of course, carry out formal tests such as the ADF test, but I would suggest carrying out "informal tests" of stationarity as a first step.
Inspecting the data visually using plot() will help you identify whether or not the data is stationary.
The next step would be to investigate the autocorrelation function and partial autocorrelation function of the data. You can do this by calling both the acf() and pacf() functions. This will not only help you decide whether or not the data is stationary, but it will also help you identify tentative ARIMA models that can later be estimated and used for forecasting if they get the all clear after carrying out the necessary diagnostic checks.
You should, indeed, be mindful of the fact that there are only 30 observations in the data that you provided. This falls below the practical minimum of about 50 observations usually recommended for forecasting with ARIMA models.
If it helps, a moment after I plotted the data, I was almost certain the data was probably stationary. The estimated acf and pacf seem to confirm this view. Sometimes informal tests like that suffice.
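For the series posted in the question, those informal checks take only a few lines (a minimal sketch; the values are copied from the question):
y <- c(694, 281, 479, 646, 282, 317, 790, 591, 573, 605, 423, 639, 873, 420, 626,
       849, 596, 486, 578, 457, 465, 518, 272, 549, 437, 445, 596, 396, 259, 390)
plot.ts(y)  # look for trend or changing variance
acf(y)      # sample autocorrelation function
pacf(y)     # sample partial autocorrelation function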
This little-book-of-r-for-time-series may help you further.