ADF test in R and Gretl - Why are the results different?

I am working on a time-series study of the Czech Republic, using macroeconomic data from 1993 to 2021. I tested my series for stationarity with both R (function adfTest from the package fUnitRoots) and Gretl. The results differ substantially: for example, the first differences of GDP are strongly stationary according to Gretl but nonstationary according to R. Both the test statistics and the p-values differ. Do you have any idea why that is, and which result is correct?
The test statistic for the differenced series (I used the "constant" version and 3 lags, as recommended by R):
According to R: -1.8587
According to Gretl: -4.27469
The p-values:
According to R: 0.3727
According to Gretl: 0.0004865
I am also enclosing the data
Year;GDP_(CZKm)
1993;1 205 330
1994;1 375 851
1995;1 596 306
1996;1 829 255
1997;1 971 024
1998;2 156 624
1999;2 252 983
2000;2 386 289
2001;2 579 126
2002;2 690 982
2003;2 823 452
2004;3 079 207
2005;3 285 601
2006;3 530 881
2007;3 859 533
2008;4 042 860
2009;3 954 320
2010;3 992 870
2011;4 062 323
2012;4 088 912
2013;4 142 811
2014;4 345 766
2015;4 625 378
2016;4 796 873
2017;5 110 743
2018;5 410 761
2019;5 791 498
2020;5 709 131
2021;6 108 717
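The post does not say which lag order Gretl used, so the following is only a sketch under an assumption: that the two programs may have run the ADF regression with a different number of lagged differences. It writes the "constant" ADF regression out with lm() so the statistic can be compared across lag orders. adf_stat() is a hypothetical helper written for this illustration; it is not a function from fUnitRoots or Gretl.

```r
# GDP values from the post (CZK million), 1993-2021
gdp <- c(1205330, 1375851, 1596306, 1829255, 1971024, 2156624, 2252983,
         2386289, 2579126, 2690982, 2823452, 3079207, 3285601, 3530881,
         3859533, 4042860, 3954320, 3992870, 4062323, 4088912, 4142811,
         4345766, 4625378, 4796873, 5110743, 5410761, 5791498, 5709131,
         6108717)
d_gdp <- diff(gdp)  # the differenced series that was tested

# Hypothetical helper: t statistic of the lagged level in
#   dy_t = const + gamma * y_{t-1} + sum_i delta_i * dy_{t-i} + e_t
adf_stat <- function(y, lags) {
  dy <- diff(y)
  m <- embed(dy, lags + 1)                # col 1 = dy_t, other cols = lagged dy
  y_lag <- y[(lags + 1):(length(y) - 1)]  # y_{t-1}, aligned with dy_t
  X <- if (lags > 0) cbind(y_lag, m[, -1, drop = FALSE]) else cbind(y_lag)
  fit <- lm(m[, 1] ~ X)
  coef(summary(fit))[2, "t value"]        # row 2 = coefficient on y_lag
}

# Compare the statistic for every lag order from 0 to 3
stats_by_lag <- sapply(0:3, function(k) adf_stat(d_gdp, k))
names(stats_by_lag) <- paste0("lags=", 0:3)
print(stats_by_lag)
```

If one of these values is close to the R result and another is close to the Gretl result, the lag order is the likely source of the difference. With only 28 differenced observations, each extra lag also costs a noticeable share of the sample.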

Related

How to cancel a bias and analyse the data?

I have a data table like the one below, and I would like to know which type of substrate ("Litières", "Branchages", or "Racines") contributes the most to each score.
In R:
Substrate<-c('Litières','Litières','Racines','Branchages','Branchages','Litières','Branchages','Litières','Litières' )
One<-c(0,22,216,36,288,351,28,12,0)
Two<-c(574,755,1248,504,882,810,431,537,56)
Three<-c(1352,1248,706,1476,846,855,1334,1152,1628)
Four<-c(261,162,17,171,171,171,394,486,503)
x<-data.frame(Substrate,One,Two,Three,Four)
or as a table:
Substrate    One   Two  Three  Four
Litières       0   574   1352   261
Litières      22   755   1248   162
Racines      216  1248    706    17
Branchages    36   504   1476   171
Branchages   288   882    846   171
Litières     351   810    855   171
Branchages    28   431   1334   394
Litières      12   537   1152   486
Litières       0    56   1628   503
However, you will notice that the number of samples is not the same for each type of substrate. How can I cancel this bias?
Thanks!
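One possible way to account for the unequal group sizes, offered as a sketch and not as the definitive analysis, is to compare per-substrate means instead of totals, since a mean does not grow with the number of rows in a group. The data frame is the one from the question; the name means is introduced here for illustration.

```r
Substrate <- c('Litières','Litières','Racines','Branchages','Branchages',
               'Litières','Branchages','Litières','Litières')
One   <- c(0, 22, 216, 36, 288, 351, 28, 12, 0)
Two   <- c(574, 755, 1248, 504, 882, 810, 431, 537, 56)
Three <- c(1352, 1248, 706, 1476, 846, 855, 1334, 1152, 1628)
Four  <- c(261, 162, 17, 171, 171, 171, 394, 486, 503)
x <- data.frame(Substrate, One, Two, Three, Four)

# Mean of every score column within each substrate type
means <- aggregate(. ~ Substrate, data = x, FUN = mean)
print(means)

# Number of rows behind each mean
print(table(x$Substrate))
```

Note that "Racines" has a single row, so its mean is just that one observation and should be interpreted with caution.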

How to calculate Williams %R in RStudio?

I am trying to write a function to calculate Williams %R on data in R. Here is my code:
library(quantmod)  # provides getSymbols() and Lag(); also loads TTR for runMax()/runMin()
getSymbols('AMD', src = 'yahoo', from = '2018-01-01')
wr = function(high, low, close, n) {
highh = runMax((high),n)
lowl = runMin((low),n)
-100 * ((highh - close) / (highh - lowl))
}
williampr = wr(AMD$AMD.High, AMD$AMD.Low, AMD$AMD.Close, n = 10)
After implementing a buy/sell/hold signal, it returns integer(0):
## 1 = BUY, 0 = HOLD, -1 = SELL
## implement Lag to shift the time back to the previous day
tradingSignal = Lag(
## if wpr is greater than 0.8, BUY
ifelse(Lag(williampr) > 0.8 & williampr < 0.8,1,
## if wpr signal is less than 0.2, SELL, else, HOLD
ifelse(Lag(williampr) > 0.2 & williampr < 0.2,-1,0)))
## make all missing values equal to 0
tradingSignal[is.na(tradingSignal)] = 0
## see how many SELL signals we have
which(tradingSignal == "-1")
What am I doing wrong?
It would have helped to mention in your question that you are using the quantmod package.
Two things prevent this from working.
First, inspect the output: every value in williampr is negative, so it can never be greater than 0.8 or 0.2. Second, the values are multiplied by 100, so 80% is 80, not 0.8. I removed the -100 * factor, which puts the values on a 0 to 1 scale that matches your thresholds.
I have done the same thing so many times.
wr = function(high, low, close, n) {
highh = runMax((high),n)
lowl = runMin((low),n)
((highh - close) / (highh - lowl))
}
That's it. After recomputing williampr and tradingSignal with this version, it works.
which(tradingSignal == "-1")
# [1] 13 15 19 22 39 71 73 84 87 104 112 130 134 136 144 146 151 156 161 171 175
# [22] 179 217 230 255 268 288 305 307 316 346 358 380 386 404 449 458 463 468 488 492 494
# [43] 505 510 515 531 561 563 570 572 574 594 601 614 635 642 644 646 649 666 668 672 691
# [64] 696 698 719 729 733 739 746 784 807 819 828 856 861 872 877 896 900 922 940 954 968
# [85] 972 978 984 986 1004 1035 1048 1060
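To make the scale issue concrete, here is a minimal base-R sketch on made-up prices. The numbers and the roll() helper are hypothetical; roll() stands in for runMax()/runMin() so the example runs without quantmod or a download.

```r
# Hypothetical toy prices (not AMD data)
high  <- c(10, 11, 12, 13, 12, 14)
low   <- c( 8,  9, 10, 11, 10, 12)
close <- c( 9, 10, 11, 12, 11, 13)
n <- 3

# Rolling window helper: applies f to the last n values, NA until n values exist
roll <- function(x, n, f) {
  out <- rep(NA_real_, length(x))
  for (i in n:length(x)) out[i] <- f(x[(i - n + 1):i])
  out
}

hh <- roll(high, n, max)
ll <- roll(low, n, min)

wr_unit <- (hh - close) / (hh - ll)  # 0 to 1 scale, as in the fixed function
wr_std  <- -100 * wr_unit            # conventional Williams %R, -100 to 0

# On the -100..0 scale no value can ever exceed 0.8 or 0.2
print(any(wr_std > 0.2, na.rm = TRUE))
print(range(wr_unit, na.rm = TRUE))
```

If you prefer to keep the conventional -100 to 0 scale, the alternative is to leave the function unchanged and move the thresholds to -20 and -80.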

No seasonal plot using ETS

I have a time series of 'bicoal.tons' which contains measurements of annual coal production from 1920 to 1968. This data is saved under the name of time_series.
Time Series:
Start = 1920
End = 1968
Frequency = 1
[1] 569 416 422 565 484 520 573 518 501 505 468 382 310 334 359 372 439 446 349 395
[21] 461 511 583 590 620 578 534 631 600 438 516 534 467 457 392 467 500 493 410 412
[41] 416 403 422 459 467 512 534 552 545
For decomposition, I used the code plot(ets(time_series)) and got the following outcome.
The plot shows neither a seasonal nor a random component. Is there something I have done wrong?
Your data is annual, so seasonality does not apply: a seasonal pattern is one that repeats within a year, and with one observation per year (Frequency = 1) there is nothing to estimate.
Even with monthly, quarterly, or semi-annual data, ets() may select a model without seasonality. To force a seasonal component, you can do something like the following (A = additive, see ?ets):
plot(ets(dat, model = "ZZA"))
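The role of the frequency can be checked without the forecast package. The sketch below swaps in base R's decompose() for ets(), purely as an illustration: it refuses an annual series and returns a seasonal component for a monthly one. The monthly series is synthetic (an assumption), and only the first eight values of the coal series are used.

```r
# First eight values of the annual series from the question
annual <- ts(c(569, 416, 422, 565, 484, 520, 573, 518),
             start = 1920, frequency = 1)
print(frequency(annual))

# decompose() needs at least two full periods, so frequency 1 is an error
annual_result <- tryCatch(decompose(annual),
                          error = function(e) conditionMessage(e))
print(annual_result)

# Synthetic monthly series: trend plus a 12-month cycle (made up for the demo)
t <- 1:36
monthly <- ts(100 + t / 10 + 10 * sin(2 * pi * t / 12), frequency = 12)
dec <- decompose(monthly)
print(round(dec$seasonal[1:12], 2))
```

The same logic applies to ets(): a seasonal term is only available when the ts object carries a frequency greater than 1.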

How to call a variable in loops of R? (create arrays as dictionary)

I'd like to define a series of variables in a for loop (to create arrays that work like dictionaries, converting tops to d1 as shown below).
First, I assign values to them (d1 to d11);
then I try to set the names of these variables.
How should I refer to a specific variable inside names() so that it works like "names(d1) <- ..."?
for (i in 1:11)
{
assign(paste("d",i,sep=""),tops[,2*i])
names(eval(parse(text=paste("d",i,sep=""))))<-tops[,2*i-1]
}
> tops[,c(1,2)]
V1 V2 V3 V4 V5 V6
1 shift 2136 shift 2211 shift 2324
2 bed 1463 k 1551 plant 1664
3 run 1338 bed 1527 run 1466
4 plant 1309 run 1504 k 1456
5 k 1294 hr 1484 bed 1390
6 hr 1285 clean 1464 hr 1366
7 check 1255 plant 1386 clean 1359
8 clean 1203 check 1261 s 1254
9 s 1052 s 1205 check 1048
10 unload 1024 start 1115 end 1028
11 chang 1023 fine 1113 fine 1020
12 fine 960 chang 1104 start 1006
13 end 924 end 1050 chang 977
14 start 905 stop 974 stop 950
15 pellet 878 pellet 915 pellet 897
16 work 866 work 907 remov 874
17 due 856 screen 900 sinter 862
18 stop 853 bwr 888 side 841
19 complet 772 side 888 due 809
20 remov 750 due 861 conveyor 792
21 requir 726 complet 841 work 777
22 sinter 711 sinter 834 north 771
23 south 710 conveyor 775 south 760
24 side 688 north 768 west 738
25 issu 682 remov 764 belt 737
26 t 675 ok 759 carri 735
27 belt 672 t 753 screen 727
28 carri 668 requir 750 stock 725
29 strand 649 unload 749 unload 719
30 conveyor 646 chute 747 chute 688
> d1
shift bed run plant k hr check clean s
2136 1463 1338 1309 1294 1285 1255 1203 1052
unload chang fine end start pellet work due stop
1024 1023 960 924 905 878 866 856 853
complet remov requir sinter south side issu t belt
772 750 726 711 710 688 682 675 672
carri strand conveyor
668 649 646
> length(d1)
[1] 30
I hope this is clear. If not, please feel free to ask.
As David mentioned, don't assign 11 different variables; create a list with 11 elements instead. This simplifies your code considerably.
# values come from the even columns, names from the odd column before each one
d <- lapply(1:11, function(i) setNames(tops[, 2 * i], tops[, 2 * i - 1]))
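As a small runnable sketch of the list approach, the code below builds a stand-in for tops from the first three rows of columns V1 to V4 shown in the question, then pairs each count column with the term column before it using setNames(). The reduced data frame is an assumption made to keep the example short.

```r
# Reduced stand-in for `tops`: term/count column pairs (first 3 rows, V1-V4)
tops <- data.frame(V1 = c("shift", "bed", "run"), V2 = c(2136, 1463, 1338),
                   V3 = c("shift", "k", "bed"),   V4 = c(2211, 1551, 1527),
                   stringsAsFactors = FALSE)

# One named vector per column pair; ncol(tops) / 2 replaces the hard-coded 11
d <- lapply(seq_len(ncol(tops) / 2),
            function(i) setNames(tops[, 2 * i], tops[, 2 * i - 1]))

print(d[[1]])         # plays the role of d1
print(d[[2]][["k"]])  # dictionary-style lookup by term
```

Keeping the vectors in a list means you can loop over them with lapply() or index them with d[[i]], so assign() and eval(parse()) are not needed.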

Smoothing a plot in r

I have a time series. When I plot it, I get a noisy, roughly periodic curve.
my Data:
539 532 531 538 544 554 575 571 543 559 511 525 512 540
535 514 524 527 532 547 564 548 572 564 549 532 519 520
520 543 550 542 528 523 531 548 554 574 575 560 534 518
511 519 527 554 543 527 540 524 523 539 569 552 553 540
522 522 492 519 532 527 532 550 535 517 551 548 571 574
539 535 515 512 510 527 533 543 540 533 519 539 555 542
574 543 555 539 507 522 518 519 516 546 523 530 532 539
540 568 554 563 550 526 509 492 525 519 527 526 515 530
531 553 563 562 576 568 539 516 512 500 516 542 522 527
523 531
How can I smooth this graph so that the sine function is more clearly visible?
Here are some things to get you started.
# values is assumed to be a numeric vector holding the series from the question
df <- data.frame(index=1:length(values),values)
# loess smoothing; note the use of predict(fit)
fit.loess <- loess(values~index,df,span=.1)
plot(df, type="l", col="blue",main="loess")
lines(df$index,predict(fit.loess),col="red")
# non-linear regression using a single sine term
fit.nls <- nls(values~a*sin(b*index+c)+d,df,
start=c(a=1000,b=pi/10,c=0,d=mean(df$values)))
plot(df, type="l", col="blue",main="sin [1 term]")
lines(df$index,predict(fit.nls),col="red")
# non-linear regression using 2 sine terms
fit.nls <- nls(values~a1*sin(b1*index+c1)+a2*sin(b2*index+c2)+d,df,
start=c(a1=1000,b1=pi/10,c1=1,
a2=1000,b2=pi/2,c2=1,d=mean(df$values)))
plot(df, type="l", col="blue",main="sin [2 terms]")
lines(df$index,predict(fit.nls),col="red")
From the non-linear fits you can get an estimate of the angular frequency b using summary(fit.nls); the period is then 2*pi/b index units.
Read the documentation on loess, nls, and predict.
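To illustrate how the period follows from the fitted b, here is a sketch on a synthetic series. The period of 20, the amplitude, the noise level, and the starting values are all assumptions chosen for the demo. Note that nls() needs starting values reasonably close to the truth, especially for b.

```r
set.seed(42)
index <- 1:100
# Synthetic data (assumption): period 20, amplitude 25, level 540, small noise
values <- 540 + 25 * sin(2 * pi * index / 20 + 0.5) + rnorm(100, sd = 2)
df <- data.frame(index, values)

# Single sine term, same model form as in the answer above
fit <- nls(values ~ a * sin(b * index + c) + d, data = df,
           start = list(a = 20, b = 0.31, c = 0.3, d = mean(values)))

b_hat  <- coef(fit)[["b"]]    # estimated angular frequency
period <- 2 * pi / abs(b_hat) # period in index units
print(period)
```

For real data, a rough starting value for b can be read off the plot by counting the distance between peaks and using 2*pi divided by that distance.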
You can use a smoothing function from any R package you like. The simplest option is a moving average.
Here is a scenario that is easy to explore (I hope it helps):
#Read the data
cd4Data <- read.table("./RData/cd4.data", col.names=c("time", "cd4", "age", "packs", "drugs", "sex", "cesd", "id"))
cd4Data <- cd4Data[order(cd4Data$time),]
head(cd4Data)
#Plot the data
par(mfrow=c(1,1))
plot(cd4Data$time,cd4Data$cd4,pch=19,cex=0.1)
#A moving average (5-point window: i-2 to i+2)
plot(cd4Data$time,cd4Data$cd4,pch=19,cex=0.1)
n <- nrow(cd4Data)
aveTime <- aveCd4 <- rep(NA,n)
for(i in 3:(n-2)){
aveTime[i] <- mean(cd4Data$time[(i-2):(i+2)])
aveCd4[i] <- mean(cd4Data$cd4[(i-2):(i+2)])
}
lines(aveTime,aveCd4,col="blue",lwd=3)
#Average many more points (401-point window: i-200 to i+200)
plot(cd4Data$time,cd4Data$cd4,pch=19,cex=0.1)
aveTime <- aveCd4 <- rep(NA,n)
#Stop at n-200 so the window never runs past the last row
for(i in 201:(n-200)){
aveTime[i] <- mean(cd4Data$time[(i-200):(i+200)])
aveCd4[i] <- mean(cd4Data$cd4[(i-200):(i+200)])
}
lines(aveTime,aveCd4,col="blue",lwd=3)
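The same centered moving average can be written without a loop using stats::filter() from base R. The sketch below applies a 5-point window to the first 14 values of the series in the question; the window width is an arbitrary choice for illustration.

```r
# First 14 values of the series from the question
values <- c(539, 532, 531, 538, 544, 554, 575, 571, 543, 559, 511, 525, 512, 540)

# Centered 5-point moving average: equal weights, sides = 2 centers the window
ma5 <- stats::filter(values, rep(1 / 5, 5), sides = 2)

plot(values, type = "l", col = "blue", main = "5-point moving average")
lines(ma5, col = "red", lwd = 2)
```

The first and last two values of ma5 are NA because the window does not fit there. A wider window gives a smoother curve but loses more points at each end.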
