I am attempting to fit an analytical model to my observed data on human somatic growth. The generalized model is a 3-parameter logarithmic growth curve:
s = B0 + B1*t + B2*log(t),
where s is a growth parameter (whether bone length or stature) and t is age.
I have attempted to run through various iterations of code to both form a likelihood function and maximize it over the parameters. To be completely honest, I am totally unsure if I am writing this correctly, but the most recent iteration of my attempts looks as follows, using a dataframe titled "cedar":
cedar.f<-function(b){sum((cedar$FLEN~b[0]+b[1]*cedar$MINAGE+b[2]*log(cedar$MINAGE))^2)}
cedar.optimx<-optimx(c(0,0,0), fn = cedar.f, control = list(all.methods=TRUE, save.failures=TRUE, maxit=5000, maximize=TRUE))
cedar$MINAGE=c(2.5,0.5,6,0.01,0.01,0.01,1,0.01,12,0.01,0.01,1,12,1,4.5,1,4.5,0.01,7.8,11,4,7.5,9,0.25,0.01,0.01,0.01,0.1,1,1,0.01,0.01)
cedar$FLEN=c(167,150,300,54,60,78,152,72,452,84,81,163,450,137,271,151,261,73,330,378,226,319,378,89,78,89,65,81,144,122,50,55)
Above, I have provided my attempt at the function and the optimization. I have received several errors in my attempts, including:
argument "cedar" is missing, with no default
non-numeric argument to binary operator
Really, I am here to ask what recommendations anyone may have for writing this function so as to find the best fit of the data to the analytical human growth curve. If I am going about this all wrong / writing the function wrong, any help would be appreciated.
Thank you all.
cedar <- data.frame(MINAGE=c(2.5,0.5,6,0.01,0.01,0.01,1,0.01,12,0.01,0.01,1,12,1,4.5,1,4.5,0.01,7.8,11,4,7.5,9,0.25,0.01,0.01,0.01,0.1,1,1,0.01,0.01),
FLEN=c(167,150,300,54,60,78,152,72,452,84,81,163,450,137,271,151,261,73,330,378,226,319,378,89,78,89,65,81,144,122,50,55))
# Negative sum of squared errors
# (negated because optimx is called with maximize=TRUE below, so maximizing this minimizes the SSE)
cedar.f <- function(b) {
-sum( (cedar$FLEN - (b[1] + b[2]*cedar$MINAGE + b[3]*log(cedar$MINAGE)))^2 )
}
library(optimx)
cedar.optimx <- optimx( c(1,1,1), fn = cedar.f,
control = list(all.methods=TRUE, save.failures=TRUE, maxit=5000, maximize=TRUE))
# p1 p2 p3 value fevals gevals niter convcode kkt1 kkt2 xtimes
# BFGS 120.4565 24.41910 11.25419 -7.674935e+03 25 8 NA 0 TRUE TRUE 0.00
# CG 120.4565 24.41910 11.25419 -7.674935e+03 1072 298 NA 0 TRUE TRUE 0.15
# Nelder-Mead 120.4714 24.41647 11.25186 -7.674947e+03 258 NA NA 0 TRUE TRUE 0.02
# L-BFGS-B 120.4565 24.41910 11.25419 -7.674935e+03 17 17 NA 0 TRUE TRUE 0.01
# nlm 120.4564 24.41910 11.25417 -7.674935e+03 NA NA 12 0 TRUE TRUE 0.01
# nlminb 120.4565 24.41910 11.25419 -7.674935e+03 21 48 13 0 TRUE TRUE 0.02
# spg 120.4565 24.41910 11.25419 -7.674935e+03 99 NA 92 0 TRUE TRUE 0.06
# ucminf 120.4564 24.41910 11.25417 -7.674935e+03 10 10 NA 0 TRUE TRUE 0.00
# Rcgmin NA NA NA -8.988466e+307 NA NA NA 9999 NA NA 0.00
# Rvmmin NA NA NA -8.988466e+307 NA NA NA 9999 NA NA 0.00
# newuoa 120.4565 24.41910 11.25419 -7.674935e+03 118 NA NA 0 TRUE TRUE 0.01
# bobyqa 120.4565 24.41910 11.25419 -7.674935e+03 142 NA NA 0 TRUE TRUE 0.02
# nmkb 120.4566 24.41907 11.25421 -7.674935e+03 213 NA NA 0 TRUE TRUE 0.03
# hjkb 1.0000 1.00000 1.00000 -1.363103e+06 1 NA 0 9999 NA NA 0.00
Alternatively, since the model is linear in its parameters, the coefficients can be estimated with a simple linear model:
fitlm <- lm(FLEN~MINAGE+log(MINAGE), data=cedar)
coef(fitlm)
# (Intercept)      MINAGE log(MINAGE)
# 120.45654 24.41910 11.25419
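Since lm minimizes the same sum of squared errors, its residual sum of squares should agree (up to sign) with the objective value reported by optimx above; a quick sanity check:
sum(residuals(fitlm)^2)
# expected to be about 7674.935, i.e. the optimx value of -7.674935e+03 with the sign flipped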
The estimated function can be plotted as follows:
optpar <- as.matrix(cedar.optimx[1,1:3])
estim_fun <- function(x, b=optpar) {
b[1] + b[2]*x + b[3]*log(x)
}
curve(estim_fun, from=min(cedar$MINAGE), to=max(cedar$MINAGE))
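To see how well the curve tracks the observations, the fitted function can also be overlaid on the raw data; a minimal sketch using the objects defined above:
plot(cedar$MINAGE, cedar$FLEN, xlab = "MINAGE", ylab = "FLEN")                  # observed data
curve(estim_fun, from = min(cedar$MINAGE), to = max(cedar$MINAGE), add = TRUE)  # fitted curve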
I'm trying to use optimx for a constrained nonlinear problem, but I just can't find an example online that I can adapt (I'm not an R programmer). I found that I should be using something like the following to test a few algorithms:
optimx(par, fn, lower=low, upper=up, method=c("CG", "L-BFGS-B", "spg", "nlm"))
I understand par is just an example of a feasible solution. So, if I have two variables and (0,3) is feasible, I can just do par <- c(0,3). If I want to minimise
2x+3y
subject to
2x^2 + 3y^2 <= 100
x<=3
-x<=0
-y<=-3
I guess I can set fn like
fn <- function(x){return 2*x[0]+3*x[1]}
but how do I set lower and upper for my constraints?
Many thanks!
1) We can incorporate the constraints within the objective function by returning a large number if any constraint is violated.
For most methods (but not Nelder-Mead) the requirement is that the objective function be continuous and differentiable, and a starting value in the interior of the feasible region (not on the boundary) is required. These requirements are not satisfied for f below, but we will try it anyway.
library(optimx)
f <- function(z, x = z[1], y = z[2]) {
if (2*x^2 + 3*y^2 <= 100 && x<=3 && -x<=0 && -y<=-3) 2*x+3*y else 1e10
}
optimx(c(0, 3), f, method = c("Nelder", "CG", "L-BFGS-B", "spg", "nlm"))
## p1 p2 value fevals gevals niter convcode kkt1 kkt2 xtime
## Nelder-Mead 0 3 9 187 NA NA 0 FALSE FALSE 0.00
## CG 0 3 9 41 1 NA 0 FALSE FALSE 0.00
## L-BFGS-B 0 3 9 21 21 NA 52 FALSE FALSE 0.00
## spg 0 3 9 1077 NA 1 0 FALSE FALSE 0.05
## nlm 0 3 9 NA NA 1 0 FALSE FALSE 0.00
1a) This also works with optim, where Nelder-Mead is the default (or you could try constrOptim, which explicitly supports inequality constraints).
optim(c(0, 3), f)
## $par
## [1] 0 3
##
## $value
## [1] 9
##
## $counts
## function gradient
## 187 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
2) Above we notice that the 2x^2 + 3y^2 <= 100 constraint is not active, so we can drop it. Since the objective function is increasing in both x and y independently, it is clear that we want to set both of them to their lower bounds, so c(0, 3) is the answer.
If we want to use optimx anyway, we just use the upper= and lower= arguments for those methods that support them.
f2 <- function(z, x = z[1], y = z[2]) 2*x+3*y
optimx(c(0, 3), f2, lower = c(0, 3), upper = c(3, Inf),
method = c("L-BFGS-B", "spg", "nlm"))
## p1 p2 value fevals gevals niter convcode kkt1 kkt2 xtime
## L-BFGS-B 0 3 9 1 1 NA 0 FALSE NA 0.00
## spg 0 3 9 1 NA 0 0 FALSE NA 0.01
## nlminb 0 3 9 1 2 1 0 FALSE NA 0.00
## Warning message:
## In BB::spg(par = par, fn = ufn, gr = ugr, lower = lower, upper = upper, :
## convergence tolerance satisified at intial parameter values.
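For completeness, here is a minimal sketch of the constrOptim route mentioned in 1a), reusing f2 from 2) above and dropping the inactive quadratic constraint. constrOptim expects linear constraints in the form ui %*% par - ci >= 0 and a strictly feasible starting value, so the interior point c(1, 4) (chosen arbitrarily) is used rather than the boundary point c(0, 3):
# x <= 3  ->  -x >= -3 ;  -x <= 0  ->  x >= 0 ;  -y <= -3  ->  y >= 3
ui <- rbind(c(-1, 0), c(1, 0), c(0, 1))
ci <- c(-3, 0, 3)
constrOptim(c(1, 4), f2, grad = NULL, ui = ui, ci = ci)
## should converge to par near c(0, 3) with value near 9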
Working with R, I trained a KNN model with knn <- train(x = x_train, y = y_train, method = "knn") on this dataframe:
1 0.35955056 0.62068966 0.34177215 0.27 0.7260274 0.22 MIT
2 0.59550562 0.56321839 0.35443038 0.15 0.7260274 0.22 MIT
3 0.52808989 0.35632184 0.45569620 0.13 0.7397260 0.22 NUC
4 0.34831461 0.35632184 0.34177215 0.54 0.6575342 0.22 MIT
5 0.44943820 0.31034483 0.44303797 0.17 0.6712329 0.22 CYT
6 0.43820225 0.47126437 0.34177215 0.65 0.7260274 0.22 MIT
7 0.41573034 0.36781609 0.48101266 0.20 0.7945205 0.34 NUC
8 0.49438202 0.42528736 0.56962025 0.36 0.6712329 0.22 MIT
9 0.32584270 0.29885057 0.49367089 0.15 0.7945205 0.30 CYT
10 0.35955056 0.29885057 0.41772152 0.21 0.7260274 0.27 NU
...
Obtaining this result:
k-Nearest Neighbors
945 samples
6 predictor
8 classes: 'CYT', 'ERL', 'EXC', 'ME', 'MIT', 'NUC', 'POX', 'VAC'
No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 945, 945, 945, 945, 945, 945, ...
Resampling results across tuning parameters:
k Accuracy Kappa
5 0.5273630 0.3760233
7 0.5480598 0.4004283
9 0.5667651 0.4242597
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was k = 9.
After that, I wanted to build a confusion matrix with this code:
predictions <- predict(knn, x_test)
results <- data.frame(real = y_test, predicted = predictions)
attach(results)
confusionMatrix(real, predicted)
And I got these results:
Confusion Matrix and Statistics
Reference
Prediction CYT ERL EXC ME MIT NUC POX VAC
CYT 73 0 0 3 7 44 0 0
ERL 0 0 0 1 0 0 0 0
EXC 2 0 6 3 1 0 0 0
ME 5 0 1 68 2 11 0 0
MIT 19 0 0 8 44 14 0 0
NUC 57 0 0 6 8 74 0 0
POX 3 0 0 0 1 2 0 0
VAC 3 0 2 2 1 1 0 0
Overall Statistics
Accuracy : 0.5614
95% CI : (0.5153, 0.6068)
No Information Rate : 0.3432
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.417
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: CYT Class: ERL Class: EXC Class: ME Class: MIT Class: NUC Class: POX Class: VAC
Sensitivity 0.4506 NA 0.66667 0.7473 0.68750 0.5068 NA NA
Specificity 0.8258 0.997881 0.98704 0.9501 0.89951 0.7822 0.98729 0.98093
Pos Pred Value 0.5748 NA 0.50000 0.7816 0.51765 0.5103 NA NA
Neg Pred Value 0.7420 NA 0.99348 0.9403 0.94832 0.7798 NA NA
Prevalence 0.3432 0.000000 0.01907 0.1928 0.13559 0.3093 0.00000 0.00000
Detection Rate 0.1547 0.000000 0.01271 0.1441 0.09322 0.1568 0.00000 0.00000
Detection Prevalence 0.2691 0.002119 0.02542 0.1843 0.18008 0.3072 0.01271 0.01907
Balanced Accuracy 0.6382 NA 0.82685 0.8487 0.79350 0.6445 NA NA
I would like to know why I have got these NAs in my sensitivity for the class ERL, for example.
Did I do something wrong?
What is the reason for these NAs? I can provide the complete dataframe if necessary.
Based on the confusion matrix, your prediction set is lacking data with the classification of ERL, POX, and VAC, which is leading to the NAs in the Statistics by Class.
Take a look at the Sensitivity of class ERL, for example. Sensitivity, also called the True Positive Rate, is calculated as the number of correct positive predictions divided by the total number of actual positives.
Correct positive ERL predictions = 0
Actual ERL classifications = 0
Sensitivity of ERL = 0/0, which leads to the NA.
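A minimal sketch of checking this directly (assuming the results data frame built above, with y_test a factor carrying all eight class levels):
# classes that never occur in the test set have zero actual positives
table(results$real)
# sensitivity of ERL recomputed by hand from the confusion matrix
cm  <- table(predicted = results$predicted, real = results$real)
tp  <- cm["ERL", "ERL"]   # correct ERL predictions
pos <- sum(cm[, "ERL"])   # actual ERL cases in the test set
tp / pos                  # 0/0 evaluates to NaN, which caret reports as NA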
I am trying to do an nls fit for a somewhat complicated expression that includes two integrals, with two of the fit parameters in their upper limits.
I got the error
"Error in nlsModel(formula, mf, start, wts) : singular gradient
matrix at initial parameter estimates".
I have already searched the previous answers, but they didn't help. The parameter initialization seems to be OK; I have tried changing the parameters, but nothing worked. If my function has just one integral everything works very nicely, but when I add a second integral term I just get the error. I don't believe the function is over-parametrized, as I have performed other fits with many more parameters and they worked. Below I have included some data.
The minimal example is the following:
integrand <- function(X) {
return(X^4/(2*sinh(X/2))^2)
}
fitting = function(T1, T2, N, D, x){
int1 = integrate(integrand, lower=0, upper = T1)$value
int2 = integrate(integrand, lower=0, upper = T2)$value
return(N*(D/x)^2*(exp(D/x)/(1+exp(D/x))^2
)+(448.956*(x/T1)^3*int1)+(299.304*(x/T2)^3*int2))
}
fit = nls(y ~ fitting(T1, T2, N, D, x),
start=list(T1=400,T2=200,N=0.01,D=2))
For reference, the fit that worked is the following:
integrand <- function(X) {
return(X^4/(2*sinh(X/2))^2)
}
fitting = function(T1, N, D, x){
int = integrate(integrand, lower=0, upper = T1)$value
return(N*(D/x)^2*(exp(D/x)/(1+exp(D/x))^2 )+(748.26)*(x/T1)^3*int)
}
fit = nls(y ~ fitting(T1 , N, D, x), start=list(T1=400,N=0.01,D=2))
Data to illustrate the problem:
dat<- read.table(text="x y
0.38813 0.0198
0.79465 0.02206
1.40744 0.01676
1.81532 0.01538
2.23105 0.01513
2.64864 0.01547
3.05933 0.01706
3.47302 0.01852
3.88791 0.02074
4.26301 0.0256
4.67607 0.03028
5.08172 0.03507
5.48327 0.04283
5.88947 0.05017
6.2988 0.05953
6.7022 0.07185
7.10933 0.08598
7.51924 0.0998
7.92674 0.12022
8.3354 0.1423
8.7384 0.16382
9.14656 0.19114
9.55062 0.22218
9.95591 0.25542", header=TRUE)
I cannot figure out what is happening. I need to perform this fit with three integral components, but even with two I have this problem. I really appreciate your help. Thank you.
You could try some other optimizers by minimizing the residual sum of squares directly:
fitting1 <- function(par, x, y) {
sum((fitting(par[1], par[2], par[3], par[4], x) - y)^2)
}
library(optimx)
res <- optimx(c(400, 200, 0.01, 2),
fitting1,
x = DF$x, y = DF$y,
control = list(all.methods = TRUE))
print(res)
# p1 p2 p3 p4 value fevals gevals niter convcode kkt1 kkt2 xtimes
#BFGS 409.7992 288.6416 -0.7594461 39.00871 1.947484e-03 101 100 NA 1 NA NA 0.22
#CG 401.1281 210.9087 -0.9026459 20.80900 3.892929e-01 215 101 NA 1 NA NA 0.25
#Nelder-Mead 414.6402 446.5080 -1.1298606 -227.81280 2.064842e-03 89 NA NA 0 NA NA 0.02
#L-BFGS-B 412.4477 333.1338 -0.3650530 37.74779 1.581643e-03 34 34 NA 0 NA NA 0.06
#nlm 411.8639 333.4776 -0.3652356 37.74855 1.581644e-03 NA NA 45 0 NA NA 0.04
#nlminb 411.9678 333.4449 -0.3650271 37.74753 1.581643e-03 50 268 48 0 NA NA 0.07
#spg 422.0394 300.5336 -0.5776862 38.48655 1.693119e-03 1197 NA 619 0 NA NA 1.06
#ucminf 412.7390 332.9228 -0.3652029 37.74829 1.581644e-03 45 45 NA 0 NA NA 0.05
#Rcgmin NA NA NA NA 8.988466e+307 NA NA NA 9999 NA NA 0.00
#Rvmmin NA NA NA NA 8.988466e+307 NA NA NA 9999 NA NA 0.00
#newuoa 396.3071 345.1165 -0.3650286 37.74754 1.581643e-03 3877 NA NA 0 NA NA 1.02
#bobyqa 410.0392 334.7074 -0.3650289 37.74753 1.581643e-03 7866 NA NA 0 NA NA 2.07
#nmkb 569.0139 346.0856 282.6526588 -335.32320 2.064859e-03 75 NA NA 0 NA NA 0.01
#hjkb 400.0000 200.0000 0.0100000 2.00000 3.200269e+00 1 NA 0 9999 NA NA 0.01
Levenberg-Marquardt converges too, but nlsLM fails when it tries to create an nls model object from the result because the gradient matrix is singular:
library(minpack.lm)
fit <- nlsLM(y ~ fitting(T1, T2, N, D, x),
start=list(T1=412,T2=333,N=-0.36,D=38), data = DF, trace = TRUE)
#It. 0, RSS = 0.00165827, Par. = 412 333 -0.36 38
#It. 1, RSS = 0.00158186, Par. = 417.352 329.978 -0.3652 37.746
#It. 2, RSS = 0.00158164, Par. = 416.397 330.694 -0.365025 37.7475
#It. 3, RSS = 0.00158164, Par. = 416.618 330.568 -0.365027 37.7475
#It. 4, RSS = 0.00158164, Par. = 416.618 330.568 -0.365027 37.7475
#Error in nlsModel(formula, mf, start, wts) :
# singular gradient matrix at initial parameter estimates
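As a rough follow-up sketch (assuming the res object above, and that DF is the data frame read in as dat in the question), the best parameter set can be pulled out of the optimx table and compared against the data:
best <- which.min(res$value)      # row with the smallest residual sum of squares
pars <- unlist(res[best, 1:4])    # T1, T2, N, D
yhat <- fitting(pars[1], pars[2], pars[3], pars[4], DF$x)
plot(DF$x, DF$y)                  # observed data
lines(DF$x, yhat)                 # fitted curve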
Can anyone tell me if it is possible to incorporate:
a) an interaction term
b) a random effect
in a Tobit regression model in R?
For the interaction term I have been working on the following script, but that doesn't work.
fit <- vglm(GAG_p_DNA~factor(condition)+factor(time)+factor(condition):factor(time),
tobit(Lower = 0))
Error in if ((temp <- sum(wz[, 1:M, drop = FALSE] < wzepsilon))) warning(paste(temp, :
argument is not interpretable as logical
I have also tried this with dummy variables, created in the following way:
time.ch<- C(time, helmert,2)
print(attributes(time.ch))
condition.ch<-C(condition, helmert, 3)
print(attributes(condition.ch))
but I get the same error.
Part of the dataset (GAG_p_DNA values of zero are left censored). (Warning to anyone copying this: the OP used tabs as separators.)
Donor Time Condition GAG_p_DNA cens_GAG_p_DNA
1 1 6 0.97 1
1 1 10 0.93 1
1 7 2 16.65 1
1 7 6 0.94 1
1 7 10 1.86 1
1 28 2 21.66 1
1 28 6 0.07 1
1 28 10 3.48 1
2 1 1 1.16 1
2 1 2 2.25 1
2 1 6 2.41 1
2 1 10 1.88 1
2 7 2 13.19 1
2 7 10 2.54 1
2 28 2 23.93 1
2 28 6 0 0
2 28 10 15.17 1
I most likely need to use a Tobit regression model, as it seems that a Cox model with left censored data is not supported by R...
fit <- survfit(Surv(GAG_p_DNA, cens_GAG_p_DNA, type="left") ~ factor(condition) + factor(Time))
Error in coxph(Surv(GAG_p_DNA, cens_GAG_p_DNA, type = "left") ~ factor(condition) + : Cox model doesn't support "left" survival data
Try this:
survreg(Surv( GAG_p_DNA, cens_GAG_p_DNA, type='left') ~
factor(Time)*factor(Condition), data=sdat, dist='gaussian')
(Recommended by Therneau: http://markmail.org/search/?q=list%3Aorg.r-project.r-help+therneau+left+censor+tobit#query:list%3Aorg.r-project.r-help%20therneau%20left%20censor%20tobit+page:1+mid:fnczjvrnjlx5jsp5+state:results )
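Both the call above and the earlier effort below assume the posted table has already been read into sdat; a minimal sketch of that setup (the file name gag.txt is hypothetical, and the posted data are tab-separated):
sdat <- read.table("gag.txt", header = TRUE, sep = "\t")
str(sdat)  # check that Donor, Time, Condition, GAG_p_DNA and cens_GAG_p_DNA came in as expected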
--- earlier effort:
With that tiny dataset (where I have corrected the use of tabs as separators) you won't get much. I corrected two errors (the spelling of "Condition", and using 0 for left censoring where it should be 2), and it runs without error:
sdat$cens_GAG_p_DNA[sdat$cens_GAG_p_DNA==0] <- 2
fit <- survfit(Surv(GAG_p_DNA, cens_GAG_p_DNA, type="left") ~
factor(Condition) + factor(Time), data=sdat)
Warning messages:
1: In min(jtimes) : no non-missing arguments to min; returning Inf
2: In min(jtimes) : no non-missing arguments to min; returning Inf
3: In min(jtimes) : no non-missing arguments to min; returning Inf
4: In min(jtimes) : no non-missing arguments to min; returning Inf
5: In min(jtimes) : no non-missing arguments to min; returning Inf
6: In min(jtimes) : no non-missing arguments to min; returning Inf
7: In min(jtimes) : no non-missing arguments to min; returning Inf
8: In min(jtimes) : no non-missing arguments to min; returning Inf
9: In min(jtimes) : no non-missing arguments to min; returning Inf
> fit
Call: survfit(formula = Surv(GAG_p_DNA, cens_GAG_p_DNA, type = "left") ~
factor(Condition) + factor(Time), data = sdat)
records n.max n.start events median
factor(Condition)=1, factor(Time)=1 1 2 2 0 1.16
factor(Condition)=2, factor(Time)=1 1 2 2 0 2.25
factor(Condition)=2, factor(Time)=7 2 3 3 0 14.92
factor(Condition)=2, factor(Time)=28 2 3 3 0 22.80
factor(Condition)=6, factor(Time)=1 2 3 3 0 1.69
factor(Condition)=6, factor(Time)=7 1 2 2 0 0.94
factor(Condition)=6, factor(Time)=28 2 2 2 2 0.00
factor(Condition)=10, factor(Time)=1 2 3 3 0 1.41
factor(Condition)=10, factor(Time)=7 2 3 3 0 2.20
factor(Condition)=10, factor(Time)=28 2 3 3 0 9.32
0.95LCL 0.95UCL
factor(Condition)=1, factor(Time)=1 NA NA
factor(Condition)=2, factor(Time)=1 NA NA
factor(Condition)=2, factor(Time)=7 13.19 NA
factor(Condition)=2, factor(Time)=28 21.66 NA
factor(Condition)=6, factor(Time)=1 0.97 NA
factor(Condition)=6, factor(Time)=7 NA NA
factor(Condition)=6, factor(Time)=28 0.00 NA
factor(Condition)=10, factor(Time)=1 0.93 NA
factor(Condition)=10, factor(Time)=7 1.86 NA
factor(Condition)=10, factor(Time)=28 3.48 NA
The other aspect which I would also call an error is not using a data argument to regression functions. Trying to use "attached" dataframes with any regression function, but especially with the 'survival' package, will often cause strange errors.
I did find that putting in an interaction by way of the formula method generated this error:
Error in survfit.formula(Surv(GAG_p_DNA, cens_GAG_p_DNA, type = "left") ~ :
Interaction terms are not valid for this function
And I also found that coxme::coxme, which I had speculated might give you access to mixed effects, did not handle left censoring.
fit <- coxme(Surv(GAG_p_DNA, cens_GAG_p_DNA, type="left")~factor(Condition)*factor(Time), data=sdat)
Error in coxme(Surv(GAG_p_DNA, cens_GAG_p_DNA, type = "left") ~ factor(Condition) * :
Cox model doesn't support 'left' survival data
I have one time series, let's say
694 281 479 646 282 317 790 591 573 605 423 639 873 420 626 849 596 486 578 457 465 518 272 549 437 445 596 396 259 390
Now, I want to forecast the following values with an ARIMA model, but ARIMA requires the time series to be stationary, so before doing that I have to check whether the time series above meets this requirement; that is where fUnitRoots comes in.
I think http://cran.r-project.org/web/packages/fUnitRoots/fUnitRoots.pdf can offer some help, but there is no simple tutorial there.
I just want one small demo showing how to test one time series. Is there one?
Thanks in advance.
I will give an example using the urca package in R.
library(urca)
data(npext) # This is the data used by Nelson and Plosser (1982)
sample.data<-npext
head(sample.data)
year cpi employmt gnpdefl nomgnp interest indprod gnpperca realgnp wages realwag sp500 unemploy velocity M
1 1860 3.295837 NA NA NA NA -0.1053605 NA NA NA NA NA NA NA NA
2 1861 3.295837 NA NA NA NA -0.1053605 NA NA NA NA NA NA NA NA
3 1862 3.401197 NA NA NA NA -0.1053605 NA NA NA NA NA NA NA NA
4 1863 3.610918 NA NA NA NA 0.0000000 NA NA NA NA NA NA NA NA
5 1864 3.871201 NA NA NA NA 0.0000000 NA NA NA NA NA NA NA NA
6 1865 3.850148 NA NA NA NA 0.0000000 NA NA NA NA NA NA NA NA
I will use the ADF test to perform the unit root test on the industrial production index as an illustration. The lag is selected based on the BIC (SIC). I use a trend specification, as there is a trend in the data.
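A minimal sketch of the call that would produce a summary like the one printed below (assumptions: the series is npext$indprod with missing early values dropped, lags selected by BIC up to a hypothetical maximum of 4, and a trend term included):
indprod <- na.omit(sample.data$indprod)
adf.trend <- ur.df(indprod, type = "trend", lags = 4, selectlags = "BIC")
summary(adf.trend)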
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression trend
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
Residuals:
Min 1Q Median 3Q Max
-0.31644 -0.04813 0.00965 0.05252 0.20504
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.052208 0.017273 3.022 0.003051 **
z.lag.1 -0.176575 0.049406 -3.574 0.000503 ***
tt 0.007185 0.002061 3.486 0.000680 ***
z.diff.lag 0.124320 0.089153 1.394 0.165695
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.09252 on 123 degrees of freedom
Multiple R-squared: 0.09796, Adjusted R-squared: 0.07596
F-statistic: 4.452 on 3 and 123 DF, p-value: 0.005255
Value of test-statistic is: -3.574 11.1715 6.5748
Critical values for test statistics:
1pct 5pct 10pct
tau3 -3.99 -3.43 -3.13
phi2 6.22 4.75 4.07
phi3 8.43 6.49 5.47
#Interpretation: BIC selects lag 1 as the optimal lag. The test statistic -3.574 is less than the critical value tau3 at 5 percent (-3.43), so the null that there is a unit root is rejected at the 5 percent level (but not at 1 percent).
Also, check the free forecasting book available here
You can, of course, carry out formal tests such as the ADF test, but I would suggest carrying out "informal tests" of stationarity as a first step.
Inspecting the data visually using plot() will help you identify whether or not the data is stationary.
The next step would be to investigate the autocorrelation function and partial autocorrelation function of the data. You can do this by calling both the acf() and pacf() functions. This will not only help you decide whether or not the data is stationary, but it will also help you identify tentative ARIMA models that can later be estimated and used for forecasting if they get the all clear after carrying out the necessary diagnostic checks.
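For instance, a minimal sketch using the series posted in the question:
y <- c(694, 281, 479, 646, 282, 317, 790, 591, 573, 605, 423, 639, 873, 420,
       626, 849, 596, 486, 578, 457, 465, 518, 272, 549, 437, 445, 596, 396,
       259, 390)
plot.ts(y)   # visual check for trend or changing variance
acf(y)       # sample autocorrelation function
pacf(y)      # sample partial autocorrelation function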
You should, indeed, be mindful of the fact that there are only 30 observations in the data you provided. This falls below the practical minimum of about 50 observations usually recommended for forecasting with ARIMA models.
If it helps, a moment after I plotted the data, I was almost certain the data was probably stationary. The estimated acf and pacf seem to confirm this view. Sometimes informal tests like that suffice.
This little-book-of-r-for-time-series may help you further.