Smoothing a plot in r - r

I have a time series. If i draw this time series I have such a diagram
my Data:
539 532 531 538 544 554 575 571 543 559 511 525 512 540
535 514 524 527 532 547 564 548 572 564 549 532 519 520
520 543 550 542 528 523 531 548 554 574 575 560 534 518
511 519 527 554 543 527 540 524 523 539 569 552 553 540
522 522 492 519 532 527 532 550 535 517 551 548 571 574
539 535 515 512 510 527 533 543 540 533 519 539 555 542
574 543 555 539 507 522 518 519 516 546 523 530 532 539
540 568 554 563 550 526 509 492 525 519 527 526 515 530
531 553 563 562 576 568 539 516 512 500 516 542 522 527
523 531
How can I smooth this graph, to see the sin function more clearly

Here are some things to get you started.
df <- data.frame(index=1:length(values),values)
# loess smoothing; note the use of predict(fit)
fit.loess <- loess(values~index,df,span=.1)
plot(df, type="l", col="blue",main="loess")
lines(df$index,predict(fit.loess),col="red")
# non-linear regression usign a single sine term
fit.nls <- nls(values~a*sin(b*index+c)+d,df,
start=c(a=1000,b=pi/10,c=0,d=mean(df$values)))
plot(df, type="l", col="blue",main="sin [1 term]")
lines(df$index,predict(fit.nls),col="red")
# non-linear regression using 2 sine terms
fit.nls <- nls(values~a1*sin(b1*index+c1)+a2*sin(b2*index+c2)+d,df,
start=c(a1=1000,b1=pi/10,c1=1,
a2=1000,b2=pi/2,c2=1,d=mean(df$values)))
plot(df, type="l", col="blue",main="sin [2 terms]")
lines(df$index,predict(fit.nls),col="red")
From the non-linear fits you can get an estimate of the period (b) using summary(fit.nls).
Read the documentation on loess, nls, and predict

You can use a smoothing function from any R package you wish. Basically, you can perform a moving average function like ARIMA models.
Something that is very easy to explore is this scenario (I hope this helps you):
#Read the data
cd4Data <- read.table("./RData/cd4.data", col.names=c("time", "cd4", "age", "packs", "drugs", "sex", "cesd", "id"))
cd4Data <- cd4Data[order(cd4Data$time),]
head(cd4Data)
#Plot the data
par(mfrow=c(1,1))
plot(cd4Data$time,cd4Data$cd4,pch=19,cex=0.1)
#A moving average (With 3 points average)
plot(cd4Data$time,cd4Data$cd4,pch=19,cex=0.1)
aveTime <- aveCd4 <- rep(NA,length(3:(dim(cd4Data)[1]-2)))
for(i in 3:(dim(cd4Data)[1]-2)){
aveTime[i] <- mean(cd4Data$time[(i-2):(i+2)])
aveCd4[i] <- mean(cd4Data$cd4[(i-2):(i+2)])
}
lines(aveTime,aveCd4,col="blue",lwd=3)
#Average many more points
plot(cd4Data$time,cd4Data$cd4,pch=19,cex=0.1)
aveTime <- aveCd4 <- rep(NA,length(201:(dim(cd4Data)[1]-200)))
for(i in 201:(dim(cd4Data)[1]-2)){
aveTime[i] <- mean(cd4Data$time[(i-200):(i+200)])
aveCd4[i] <- mean(cd4Data$cd4[(i-200):(i+200)])
}
lines(aveTime,aveCd4,col="blue",lwd=3)

Related

ADF test in R and Gretl - Why are the results different?

I am working on a time series-based study on the Czech Republic. I have macroeconomic data from 1993 to 2021. I tested my time series for stationarity using both R (function adfTest from package fUnitRoots) and Gretl. The results are significantly different to the point that for example the differences of GDP are strongly stationary according to Gretl, but nonstationary according to R. Both the test statistics and p-values are different. Do you have any idea why is that and which result is correct?
The test statistic for differences (I used the "constant" version and 3 lags as recommended by R)
According to R: -1.8587
According to Gretl: -4.27469
The p-values:
According to R: 0.3727
According to Gretl: 0.0004865
I am also enclosing the data
Year;GDP_(CZKm)
1993;1 205 330
1994;1 375 851
1995;1 596 306
1996;1 829 255
1997;1 971 024
1998;2 156 624
1999;2 252 983
2000;2 386 289
2001;2 579 126
2002;2 690 982
2003;2 823 452
2004;3 079 207
2005;3 285 601
2006;3 530 881
2007;3 859 533
2008;4 042 860
2009;3 954 320
2010;3 992 870
2011;4 062 323
2012;4 088 912
2013;4 142 811
2014;4 345 766
2015;4 625 378
2016;4 796 873
2017;5 110 743
2018;5 410 761
2019;5 791 498
2020;5 709 131
2021;6 108 717

How to cancel a bias and analyse the data?

I have a data table like this one, I would like to know which type of substrate (called "Litières" / "Branchages" / "Racines") contributes the most to each score.
in r :
Substrate<-c('Litières','Litières','Racines','Branchages','Branchages','Litières','Branchages','Litières','Litières' )
One<-c(0,22,216,36,288,351,28,12,0)
Two<-c(574,755,1248,504,882,810,431,537,56)
Three<-c(1352,1248,706,1476,846,855,1334,1152,1628)
Four<-c(261,162,17,171,171,171,394,486,503)
x<-data.frame(Substrate,One,Two,Three,Four)
or in a table :
Substrate
One
Two
Three
Four
Litières
0
574
1352
261
Litières
22
755
1248
162
Racines
216
1248
706
17
Branchages
36
504
1476
171
Branchages
288
882
846
171
Litières
351
810
855
171
Branchages
28
431
1334
394
Litières
12
537
1152
486
Litières
0
56
1628
503
However you will notice that the number of substrate is not the same between each type of substrate. How to cancel this bias?
Thank !

How to calculate Williams %R in RStudio?

I am trying to write a function to calculate Williams %R on data in R. Here is my code:
getSymbols('AMD', src = 'yahoo', from = '2018-01-01')
wr = function(high, low, close, n) {
highh = runMax((high),n)
lowl = runMin((low),n)
-100 * ((highh - close) / (highh - lowl))
}
williampr = wr(AMD$AMD.High, AMD$AMD.Low, AMD$AMD.Close, n = 10)
After implementing a buy/sell/hold signal, it returns integer(0):
## 1 = BUY, 0 = HOLD, -1 = SELL
## implement Lag to shift the time back to the previous day
tradingSignal = Lag(
## if wpr is greater than 0.8, BUY
ifelse(Lag(williampr) > 0.8 & williampr < 0.8,1,
## if wpr signal is less than 0.2, SELL, else, HOLD
ifelse(Lag(williampr) > 0.2 & williampr < 0.2,-1,0)))
## make all missing values equal to 0
tradingSignal[is.na(tradingSignal)] = 0
## see how many SELL signals we have
which(tradingSignal == "-1")
What am I doing wrong?
It would have been a good idea to identify that you were using the package quantmod in your question.
There are two things preventing this from working.
You didn't inspect what you expected! Your results in williampr are all negative. Additionally, you multiplied the values by 100, so 80% is 80, not .8. I removed -100 *.
I have done the same thing so many times.
wr = function(high, low, close, n) {
highh = runMax((high),n)
lowl = runMin((low),n)
((highh - close) / (highh - lowl))
}
That's it. It works now.
which(tradingSignal == "-1")
# [1] 13 15 19 22 39 71 73 84 87 104 112 130 134 136 144 146 151 156 161 171 175
# [22] 179 217 230 255 268 288 305 307 316 346 358 380 386 404 449 458 463 468 488 492 494
# [43] 505 510 515 531 561 563 570 572 574 594 601 614 635 642 644 646 649 666 668 672 691
# [64] 696 698 719 729 733 739 746 784 807 819 828 856 861 872 877 896 900 922 940 954 968
# [85] 972 978 984 986 1004 1035 1048 1060

No seasonal plot using ETS

I have a time series of 'bicoal.tons' which contains measurements of annual coal production from 1920 to 1968. This data is saved under the name of time_series.
Time Series:
Start = 1920
End = 1968
Frequency = 1
[1] 569 416 422 565 484 520 573 518 501 505 468 382 310 334 359 372 439 446 349 395
[21] 461 511 583 590 620 578 534 631 600 438 516 534 467 457 392 467 500 493 410 412
[41] 416 403 422 459 467 512 534 552 545
For decomposition, I used the code plot(ets(time_series)) and got the following outcome.
As you can see, I cannot find the seasonal nor the random effect plot. Is there something I have done wrong?
Your data is annual so seasonality does not apply (i.e. season is within a year).
Assuming you have a monthly / quarterly / semi-annual data, then ets() may pick a model without seasonality. To force seasonality, you can do something like below (A = additive, see ?ets):
plot(ets(dat, model = "ZZA"))

Data point missing when conducing phylogenetic independent contrasts on R

I am conducting a phylogenetic independent contrast on sex ratios in 200 bird species using RStudio. I need to be able to identify which post-contrast value is from which species so I can find an outlier. However, the row names disappear when I conduct the pic and when I try to add them, I get the error message
Error in names(csr) <- row.names(data) : 'names' attribute [200] must be the same length as the vector [199]
I downloaded my pic values for 'csr' to a .csv and found that the length is indeed now 199. Why is a species missing and how can I attach the species name to my pic values?
Quick note: I thought maybe this was an issue of duplicate data in my original data frame but I checked and none of my species are duplicated.
New R user, thanks in advance!
Library(ape)
Library(maps)
Library(phytools)
Library(geiger)
#upload data
>data<-read.csv(file.choose(),row.names=1)
#upload set of trees
> et<-read.nexus(file.choose())
#form consensus tree
> cet<-consensus.edges(et,"least.squares")
[1] "RSS: 74.744666106374"
#make tree dichotomous
> rcet<-multi2di(cet)
#check tree and data match up
> name.check(rcet,data)
[1] "OK"
> sr<-data$SR
> names(sr)<-row.names(data)
> csr<-pic(sr,rcet)
#check to see if row names are attached
>head.matrix(csr)
201 202 203 204
5.090712e-04 -9.531727e-04 1.648872e-03 -4.288288e-03
205 206 207 208
1.460023e-03 1.940847e-03 1.430754e-03 8.495663e-04
209 210 211 212
-1.590387e-03 -4.440047e-03 5.930776e-03 3.885212e-03
213 214 215 216
-8.180639e-03 4.020204e-03 9.256576e-03 2.211563e-02
217 218 219 220
-1.236238e-02 2.187909e-02 8.064300e-03 2.221089e-02
221 222 223 224
-2.730282e-02 5.591690e-03 -1.043775e-02 -5.360213e-03
225 226 227 228
1.414753e-02 1.078473e-02 -2.452003e-03 2.211674e-03
229 230 231 232
3.004410e-03 -5.365461e-03 -5.391057e-03 5.968397e-03
233 234 235 236
8.282451e-03 -7.260091e-03 4.575852e-03 2.860073e-03
237 238 239 240
1.052456e-02 -1.903541e-03 1.125396e-02 6.927645e-03
241 242 243 244
-3.089605e-02 -1.153509e-02 3.953120e-02 5.213560e-02
245 246 247 248
5.349170e-03 1.309613e-03 8.532669e-03 3.641861e-02
249 250 251 252
-2.571262e-02 2.900506e-03 -3.481454e-02 -6.424101e-03
253 254 255 256
9.802964e-03 -3.150135e-03 -1.101131e-02 -2.131249e-02
257 258 259 260
6.274294e-02 2.587949e-02 -8.674770e-04 6.398537e-03
261 262 263 264
-2.207722e-02 6.961859e-03 -7.092074e-03 3.326304e-04
265 266 267 268
-8.826976e-04 2.446652e-02 3.202071e-03 -4.300357e-03
269 270 271 272
-8.697415e-03 1.632332e-02 1.139373e-02 -8.293938e-03
273 274 275 276
3.187131e-03 -2.838793e-03 -3.491220e-03 7.986199e-03
277 278 279 280
-5.931380e-03 8.005507e-04 -1.515201e-03 6.203605e-03
281 282 283 284
-1.763623e-03 2.263001e-02 2.058192e-03 -6.677623e-03
285 286 287 288
-6.068511e-04 1.232161e-02 1.137790e-02 1.129776e-02
289 290 291 292
1.467367e-02 -1.221627e-02 -1.236961e-02 2.468580e-03
293 294 295 296
-1.562174e-02 2.392474e-03 -2.466936e-04 9.032847e-03
297 298 299 300
7.028428e-03 -1.605058e-02 1.090764e-01 3.823460e-03
301 302 303 304
-3.617284e-04 -3.620753e-03 -1.493839e-03 1.757362e-03
305 306 307 308
4.024892e-03 1.011166e-03 3.607874e-04 2.564815e-04
309 310 311 312
1.339123e-03 7.928470e-04 -1.579597e-03 5.422977e-03
313 314 315 316
-2.079001e-03 9.967008e-03 9.050382e-03 -8.922487e-03
317 318 319 320
-1.695307e-04 1.028737e-02 1.216367e-02 3.031379e-03
321 322 323 324
-1.263116e-02 -1.537278e-02 2.242444e-04 2.426469e-03
325 326 327 328
-2.664895e-03 3.884286e-03 6.880529e-03 -5.927206e-04
329 330 331 332
9.830635e-03 1.280008e-03 1.424032e-02 -7.288540e-04
333 334 335 336
-9.240581e-04 3.195132e-04 -4.259236e-03 -2.214205e-03
337 338 339 340
-6.881941e-03 -6.423759e-03 -2.609067e-03 -3.503663e-05
341 342 343 344
2.788641e-03 1.372338e-03 6.089936e-04 9.587636e-04
345 346 347 348
3.785345e-03 -2.026423e-03 -1.177728e-03 6.512821e-04
349 350 351 352
-3.906498e-04 -8.785059e-03 -1.431750e-03 -2.324442e-04
353 354 355 356
-1.076415e-03 1.441769e-03 -1.714267e-03 -1.674929e-03
357 358 359 360
8.652113e-04 1.238680e-03 -7.712385e-04 2.297910e-02
361 362 363 364
2.757363e-03 -1.088451e-05 -7.907335e-03 -4.752825e-03
365 366 367 368
-1.202807e-03 5.597340e-03 -2.864217e-04 8.340569e-04
369 370 371 372
-3.930913e-03 -5.725912e-03 4.980890e-04 3.697257e-03
373 374 375 376
5.995110e-03 -1.339679e-03 -9.186386e-03 8.241024e-03
377 378 379 380
-6.799925e-03 -3.594279e-03 3.265258e-03 3.038261e-03
381 382 383 384
-9.738195e-04 1.535296e-03 -8.603250e-04 -4.378884e-03
385 386 387 388
-2.952824e-03 2.063849e-03 -4.624888e-03 3.525655e-03
389 390 391 392
-5.207749e-03 -9.276466e-04 1.684872e-03 2.511384e-03
393 394 395 396
-2.189145e-03 -1.098284e-02 -4.546533e-03 -1.349024e-03
397 398 399
-5.619031e-04 -5.592868e-03 8.620104e-03
#they’re not so go to add them
> names(csr)<-row.names(data)
Error in names(csr) <- row.names(data) :
'names' attribute [200] must be the same length as the vector [199]

Resources