ARIMA, ARMA and AICs in R

data <- c(88, 84, 85, 85, 84, 85, 83, 85, 88, 89, 91, 99, 104, 112, 126, 138, 146, 151, 150, 148, 147, 149, 143, 132, 131, 139, 147, 150, 148, 145, 140, 134, 131, 131, 129, 126, 126, 132, 137, 140, 142, 150, 159, 167, 170, 171, 172, 172, 174, 175, 172, 172, 174, 174, 169, 165, 156, 142, 131, 121, 112, 104, 102, 99, 99, 95, 88, 84, 84, 87, 89, 88, 85, 86, 89, 91, 91, 94, 101, 110, 121, 135, 145, 149, 156, 165, 171, 175, 177, 182, 193, 204, 208, 210, 215, 222, 228, 226, 222, 220)
Why do the ARMA models acting on the first differences of the data differ from the corresponding ARIMA models?
# Grid search over (p, q): fit each model and print its AIC
for (p in 0:5) {
  for (q in 0:5) {
    # ARMA(p, q) on the first differences:
    # data.arma <- arima(diff(data), order = c(p, 0, q))
    # ARIMA(p, 1, q) on the original series:
    data.arma <- arima(data, order = c(p, 1, q))
    cat("p =", p, ", q =", q, "AIC =", data.arma$aic, "\n")
  }
}
The same happens with Arima(data, c(5,1,4)) and Arima(diff(data), c(5,0,4)) in the forecast package. I can get the desired consistency with
auto.arima(diff(data),max.p=5,max.q=5,d=0,approximation=FALSE, stepwise=FALSE, ic ="aic", trace=TRUE);
auto.arima(data,max.p=5,max.q=5,d=1,approximation=FALSE, stepwise=FALSE, ic ="aic", trace=TRUE);
but it seems the model holding the minimum AIC for these data has not been considered by the algorithm behind auto.arima; hence the suboptimal choice of ARMA(3,0) instead of ARMA(5,4) acting on the first differences. A related question, namely how much two AIC estimates should differ before one model is considered better than the other, has little to do with programming; still, the smallest-AIC model should at least be considered/reported, even though 9 coefficients may be a bit too much for a forecast from 100 observations.
My R questions are:
1) Is there a vectorised version of the double loop so that it runs faster?
2) Why does ARIMA(5,1,4) acting on the data differ from ARMA(5,4) acting on the first differences of the data? Which one should be reported?
3) How do I sort the AIC output so that the smallest values come first?
Thanks.

There are a lot of questions and issues raised here. I'll try to respond to each of them.
Arima() is just a wrapper for arima(), so it will give the same model.
arima() handles a model with differencing by using a diffuse prior. That is not the same as just differencing the data before fitting the model. Consequently, you will get slightly different results from arima(x,order=c(p,1,q)) and arima(diff(x),order=c(p,0,q)).
auto.arima() handles differencing directly and does not use a diffuse prior when fitting. So you will get the same results from auto.arima(x,d=1,...) and auto.arima(diff(x),d=0,...).
auto.arima() has an argument max.order which specifies the maximum of p+q. By default, max.order=5, so your arima(5,1,4) would not be considered. Increase max.order if you want to consider such large models (although I wouldn't recommend it).
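For example, a call along these lines (an illustration of the argument, not taken from the original post) lets the non-stepwise search reach models as large as ARIMA(5,1,4):
auto.arima(data, d = 1, max.p = 5, max.q = 5, max.order = 9,
           stepwise = FALSE, approximation = FALSE, ic = "aic")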
You can't vectorize a loop involving nonlinear optimization at each iteration.
If you want to sort your output, you'll need to save it to a data.frame and then sort on the relevant column. The code currently just spits out the results as it goes and nothing is saved except for the most recent model fitted.
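A minimal sketch of that approach, reusing the data vector above (the results name and the try() wrapper are my own additions, the latter to skip any (p, q) combination whose fit fails):
results <- expand.grid(p = 0:5, q = 0:5)    # all (p, q) combinations
results$aic <- apply(results, 1, function(ord) {
  fit <- try(arima(data, order = c(ord["p"], 1, ord["q"])), silent = TRUE)
  if (inherits(fit, "try-error")) NA else fit$aic   # AIC, or NA if the fit failed
})
results[order(results$aic), ]               # smallest AIC first, NAs last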

Related

Writing a function to run an F-test with two separate Shapiro-Wilk tests conducted inside the function

The function should perform as follows:
i. The function takes the arguments x1, x2, alt = "two-sided", lev = 0.95, where the equality indicates the default value.
• The arguments x1 and x2 are the X1 and X2 samples, respectively.
• The argument alt is the alternative hypothesis, whose two other possible values are "greater" and "less".
• The argument lev is the confidence level 1 − α.
ii. The function returns an R list containing the test statistic, p-value, confidence level, and confidence interval.
iii. Inside the function, two Shapiro-Wilk tests of normality are conducted separately for the two samples (note the normality assumption at the beginning of the problem). If one or both p-values are less than 0.05, a warning message is printed out explaining the situation.
Here is what I have come up with so far, but I am not sure how to create one function to run both:
library(stats)
x1 <- c(103, 94, 110, 87, 98, 102, 86, 98, 109, 92)
x2 <- c(97, 82, 123, 92, 175, 88, 118, 81, 165, 97, 134, 92, 87, 114)
var.test(x1, x2, alternative = "two.sided", conf.level = 0.95)
shapiro.test(x1)$p.value < 0.05|shapiro.test(x2)$p.value < 0.05
Some hints:
Your task is to write a function, so you should have something like this:
my_function <- function(x1, x2, alt = "two-sided", level = 0.95){
# fill in the body of the function here
}
You can do whatever you need to do in the body of the function.
Recall that in R, the value of the last evaluated expression in a function is automatically its return value. So, you might choose to have your last line be list(...) as described in the problem statement.
It will be useful to store results of tests, etc. as variables inside your function, e.g. test_output_1 <- ... so that you can reference those things later in the body of your function.
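Putting these hints together, a minimal sketch could look like the following; the translation of alt = "two-sided" into var.test()'s "two.sided", and the wording of the warning, are my own choices rather than part of the assignment:
my_function <- function(x1, x2, alt = "two-sided", lev = 0.95) {
  # Shapiro-Wilk normality check for each sample
  sw1 <- shapiro.test(x1)
  sw2 <- shapiro.test(x2)
  if (sw1$p.value < 0.05 || sw2$p.value < 0.05) {
    warning("At least one Shapiro-Wilk p-value is below 0.05; ",
            "the normality assumption behind the F-test may not hold.")
  }
  # var.test() spells the two-sided alternative "two.sided"
  alternative <- if (alt == "two-sided") "two.sided" else alt
  ft <- var.test(x1, x2, alternative = alternative, conf.level = lev)
  # return the quantities listed in the problem statement
  list(statistic = ft$statistic,
       p.value = ft$p.value,
       conf.level = lev,
       conf.int = ft$conf.int)
}
my_function(x1, x2)   # e.g. with the samples above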

Lookup value from another matrix

I have a stepwise structure of tariffs for treatments. Treatments between 0 and 33 hours receive the tariff 96, treatments between 34 and 96 hours receive 224, etc.
I would like to create a graph of treatment hours and tariffs, with the hours on the x-axis and the tariff on the y-axis. In order to do that, I need to create a variable that gives me the corresponding tariff for each treatment hour ('hr'). How do I do this in R?
min <- c(381, 201, 97, 34, 0)
max <- c(NA, 380, 200, 96, 33)
tariff2019 <- c(779, 536, 368, 224, 96)
dat <- data.frame(hr=seq(401))
dat$tariff <-
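One way to complete that last line (a sketch built only from the vectors above; cut() would work equally well, and the breaks/tariffs names are mine) is to map each hour onto its bracket with findInterval():
breaks  <- rev(min)         # lower bounds in ascending order: 0, 34, 97, 201, 381
tariffs <- rev(tariff2019)  # matching tariffs: 96, 224, 368, 536, 779
dat$tariff <- tariffs[findInterval(dat$hr, breaks)]
# step plot of tariff against treatment hours
plot(dat$hr, dat$tariff, type = "s", xlab = "treatment hours", ylab = "tariff")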

Predicting values with non-linear regression

My non-linear model is the following:
fhw <- data.frame(
time=c(10800, 10810, 10820, 10830, 10840, 10850, 10860, 10870, 10880, 10890),
water=c( 105, 103, 103, 104, 107, 109, 112, 113, 113, 112)
)
nl <- nls(formula = water ~ cbind(1,poly(time,4),sin(omega_1*time+phi_1),
sin(omega_2*time+phi_2),
sin(omega_3*time+phi_3)), data = fhw,
start = list(omega_1=(2*pi)/545, omega_2=(2*pi)/205,
omega_3=(2*pi)/85, phi_1=pi, phi_2=pi, phi_3=pi),
algorithm = "plinear", control = list(maxiter = 1000))
Time is between 10800 and 17220, but I want to predict ahead. Using the function predict like this:
predict(nl,data.frame(time=17220:17520))
gives wrong results, since the first value it returns is completely different from the last value returned by predict(nl). I think the problem has something to do with poly, but I'm not sure. Furthermore, predicting at a single time point gives the error: 'degree' must be less than number of unique points. Can anybody help?
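One likely explanation (my reading, not part of the original post): poly() builds an orthogonal basis from the x values it is given, so when predict() re-evaluates the formula on new time values the basis is rebuilt from those new values and no longer matches the one used in the fit; with a single time point there are not enough unique values to build a degree-4 basis at all, hence the error. A small illustration of the basis mismatch, using the fhw excerpt above rather than the full model:
b_fit <- poly(fhw$time, 4)                                     # basis built from the fitted times
b_new <- poly(seq(17220, 17520, by = 10), 4)                   # a different basis, built from new times
b_ok  <- predict(b_fit, newdata = seq(17220, 17520, by = 10))  # new times expressed in the original basis
# b_new and b_ok differ; only b_ok is consistent with the fit. Common workarounds are
# poly(time, 4, raw = TRUE) in the formula, or building the polynomial and sine columns
# outside the formula so the same design matrix is reused for prediction.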

exponential function in R

I have basic knowledge of R, and I would like to know how to write the code for an exponential function in R:
F(X) = B(1 - e^(-AX))
where A is the lambda parameter, B is a parameter representing the Y data, and X represents the X data below.
I need the exponential model to generate the curve that fits the data; for example:
X <- c(22, 44, 69, 94, 119, 145, 172, 199, 227, 255)
PS: this is the x-axis, in numbers (millions).
Y <- c(1, 7, 8, 12, 12, 14, 14, 18, 19, 22)
This is the y-axis.
Any idea how to write the code and fit this model to the data?
In R you can write an exponential function with exp(); for the model above, with parameters A and B, that is:
F <- B * (1 - exp(-A * X))
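To actually estimate A and B from the data, one option is nls(); the starting values below are rough eyeball guesses of mine, not part of the original answer:
X <- c(22, 44, 69, 94, 119, 145, 172, 199, 227, 255)
Y <- c(1, 7, 8, 12, 12, 14, 14, 18, 19, 22)
# fit Y = B * (1 - exp(-A * X)) by nonlinear least squares
fit <- nls(Y ~ B * (1 - exp(-A * X)), start = list(A = 0.005, B = 30))
summary(fit)
# plot the data and overlay the fitted curve
plot(X, Y)
curve(coef(fit)["B"] * (1 - exp(-coef(fit)["A"] * x)), add = TRUE)
The same functional form is also available as the self-starting model SSasympOrig(), which removes the need to guess starting values (there B corresponds to the asymptote and A to exp(lrc)).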

Polynomial regression in Maple

In Maple I have two lists
A:=[seq(i, i=1..10)];
B:=[10, 25, 43, 63, 83, 92, 99, 101, 101, 96];
Is it possible to do polynomial or power regression in Maple?
I want to fit a trend line as a 3rd order polynomial, where each point is (A[i], B[i]).
All you need is
Statistics:-LinearFit([1,x,x^2,x^3], A, B, x);
