Extracting seasonal effect without using stl or decompose - r

I have a dataset named 'bicoal', which contains annual bituminous coal production in the United States from 1920 to 1968:
Time Series:
Start = 1920
End = 1968
Frequency = 1
[1] 569 416 422 565 484 520 573 518 501 505 468 382 310 334 359 372 439 446 349 395
[21] 461 511 583 590 620 578 534 631 600 438 516 534 467 457 392 467 500 493 410 412
[41] 416 403 422 459 467 512 534 552 545
I made a time series, saved under the name time_series, and tried to extract the seasonal effect with plot(decompose(time_series)) and plot(stl(time_series)), but got these error messages:
Error in stl(time_series) :
series is not periodic or has less than two periods
Error in decompose(time_series) :
time series has no or less than 2 periods
If neither stl nor decompose works, is there another way to extract the seasonal effect?

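Without seeing how your time series is constructed, I think this might be your problem:
data <- rep(seq(1,5),5)
ts.1 <- ts(data)   # frequency defaults to 1, so the series has no period
stl(ts.1)          # Error: series is not periodic or has less than two periods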
To fix this, use the frequency argument of ts, which defines the number of observations per period:
ts.2 <- ts(data, frequency = 5)
stl(ts.2, s.window = "periodic")
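If you want the seasonal effect without calling stl or decompose, a classical additive decomposition can be done by hand: estimate the trend with a centred moving average over one period, subtract it, and average the detrended values at each position in the cycle. The sketch below applies this to the toy series ts.2 from above; it is an illustration under that assumption, not a drop-in for bicoal. Note that an annual series such as bicoal (Frequency = 1) has no within-year cycle, so a seasonal effect only makes sense there if you assume some multi-year period and set frequency accordingly.

```r
# Toy periodic series from the answer above: 5 full cycles of length 5
data <- rep(seq(1, 5), 5)
ts.2 <- ts(data, frequency = 5)

# 1. Trend: centred moving average spanning one full period
#    (the period is odd here, so equal weights of 1/5 are enough)
trend <- stats::filter(ts.2, rep(1 / 5, 5), sides = 2)

# 2. Remove the trend, then average the detrended values
#    at each position within the cycle (NAs at the ends are skipped)
detrended <- ts.2 - trend
seasonal <- tapply(as.numeric(detrended), as.vector(cycle(ts.2)),
                   mean, na.rm = TRUE)

# 3. Centre the seasonal effects so they sum to zero
seasonal <- seasonal - mean(seasonal)
seasonal
```

For an even period, the moving-average weights would need half weights at both ends (for example c(0.5, rep(1, f - 1), 0.5) / f) to keep the average centred.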

Related

Automate portfolio volatility computation in R

Thanks for reading my post. I have a series of portfolios built from combinations of several stocks. I need to compute the volatility of those portfolios using the historical daily returns of each stock. All the combinations are in one data frame (called final_output) and all the stock returns are in another data frame (called perf, where columns are stocks and rows are days), and I don't know the most efficient way to automate the process. Below is an extract:
> final_output
ISIN_1 ISIN_2 ISIN_3 ISIN_4
2 CH0595726594 CH1111679010 XS1994697115 CH0587331973
3 CH0595726594 CH1111679010 XS1994697115 XS2027888150
4 CH0595726594 CH1111679010 XS1994697115 XS2043119358
5 CH0595726594 CH1111679010 XS1994697115 XS2011503617
6 CH0595726594 CH1111679010 XS1994697115 CH1107638921
7 CH0595726594 CH1111679010 XS1994697115 XS2058783270
8 CH0595726594 CH1111679010 XS1994697115 JE00BGBBPB95
> perf
CH0595726594 CH1111679010 XS1994697115 CH0587331973
626 0.0055616769 -0.0023656130 1.363791e-03 1.215922e-03
627 0.0086094443 0.0060037334 0.000000e+00 2.519220e-03
628 0.0053802380 0.0009027081 0.000000e+00 7.508635e-04
629 -0.0025213543 -0.0022046297 4.864050e-05 1.800720e-04
630 0.0192416817 0.0093401627 -6.079767e-03 3.800836e-03
631 -0.0101224820 0.0051741294 6.116956e-03 -1.345184e-03
632 -0.0013293793 -0.0100475153 -4.494163e-03 -1.746106e-03
633 0.0036350604 0.0012999350 3.801130e-03 -5.997121e-05
634 0.0030097434 -0.0011484496 -1.187614e-03 -2.069131e-03
635 0.0002034381 0.0030493901 -1.851762e-03 -3.806280e-04
636 -0.0035594427 0.0167455769 -2.148123e-04 -4.709560e-04
637 0.0007654623 -0.0051958237 -3.711191e-04 1.604010e-04
638 0.0107592678 -0.0016260163 4.298764e-04 3.397951e-03
639 0.0050953486 -0.0007403020 2.011738e-03 8.790770e-04
640 0.0008532851 -0.0071121648 -9.746114e-04 5.389598e-04
641 -0.0068204614 0.0133810874 -9.755622e-05 -1.346674e-03
642 0.0091395678 0.0102591793 1.717157e-03 -1.977785e-03
643 0.0027520640 -0.0157912638 1.256440e-03 -1.301119e-04
644 -0.0048902196 0.0039494471 -1.624514e-03 -3.373340e-03
645 -0.0116838833 0.0062450826 6.625549e-04 1.205255e-03
646 0.0004566442 -0.0018570102 -3.456636e-03 4.474138e-03
647 0.0041586368 0.0085679315 4.435933e-03 1.957455e-03
648 0.0007575758 0.0002912621 0.000000e+00 2.053306e-03
649 0.0046429473 -0.0138309230 -4.435798e-03 1.541798e-03
650 0.0049731250 -0.0488164953 4.181975e-03 -9.733133e-04
651 0.0008497451 -0.0033110870 2.724477e-04 -7.555498e-04
652 0.0004494831 0.0049831300 -8.657588e-04 -1.790813e-04
653 -0.0058905751 0.0020143588 8.178287e-04 -1.213991e-03
654 0.0000000000 0.0167525773 4.864050e-05 9.365068e-04
655 0.0010043186 0.0048162231 0.000000e+00 -2.110146e-03
656 -0.0024079462 -0.0100403633 -2.431907e-03 -9.176600e-04
657 -0.0095544604 -0.0193670047 0.000000e+00 -8.935435e-03
658 0.0008123477 0.0114339172 2.437835e-03 5.530483e-03
659 0.0022828734 -0.0015415446 -3.239300e-03 2.765060e-03
660 0.0049096523 -0.0001029283 3.199079e-02 2.327835e-03
661 -0.0027702226 -0.0357198003 9.456712e-04 3.189602e-04
662 -0.0008081216 -0.0139311449 -2.891020e-02 -1.295363e-03
663 -0.0033867462 0.0068745264 -2.529552e-03 -1.496588e-04
664 -0.0015216068 -0.0558572120 -3.023653e-03 -7.992975e-03
665 0.0052829422 0.0181072771 4.304652e-03 -3.319519e-03
666 0.0084386054 0.0448545861 -8.182748e-04 4.279284e-03
667 -0.0076664829 -0.0059415480 -2.047362e-04 6.059936e-03
668 -0.0062108665 -0.0039847073 7.313506e-04 5.993467e-04
669 -0.0053350948 0.0068119154 -1.042631e-02 -2.056524e-03
670 -0.0263588067 0.0245395479 -2.188962e-02 -6.732491e-03
671 -0.0021511018 0.0220649895 1.412435e-02 1.702085e-03
672 0.0205058100 -0.0007179119 3.057527e-03 -1.002423e-02
673 0.0096862280 -0.0194488633 1.207407e-03 -1.553899e-03
674 0.0007143951 -0.0068557672 6.227450e-03 1.790274e-03
675 -0.0021926470 -0.0051114507 -6.267498e-03 -1.035691e-03
676 0.0076655765 -0.0139300847 6.583825e-03 3.059472e-03
677 -0.0032457653 0.0180480206 -4.635495e-03 1.064002e-03
678 0.0036633764 0.0060676410 -2.762676e-04 5.364970e-04
679 -0.0008111122 -0.0013635410 -1.065898e-03 1.214059e-03
680 0.0050228311 0.0055141267 3.003507e-03 1.121643e-03
681 -0.0007067495 0.0147281558 -2.699002e-03 -1.514035e-04
682 -0.0024248548 0.0002573473 -2.113685e-03 -1.423409e-03
683 -0.0002025624 0.0138417207 -4.374895e-03 1.415328e-04
684 -0.0141822418 -0.0169517332 -3.578920e-03 -1.799234e-03
685 -0.0005651749 -0.0259693324 -5.926428e-03 -3.635333e-03
686 0.0004112688 0.0133043570 -1.545642e-03 1.981828e-03
687 -0.0150565262 -0.0107757493 -1.717916e-02 -1.328749e-02
688 0.0039129754 -0.0441013167 -8.376631e-03 -5.653841e-04
689 0.0019748467 0.0115063340 -2.835394e-02 7.868428e-03
690 0.0072614108 0.0358764014 3.586897e-02 7.960077e-03
691 -0.0003604531 0.0106119001 1.024769e-04 -2.733651e-04
For each portfolio (each row of final_output is a four-stock portfolio), I need to look up its stocks in perf and compute the portfolio's volatility (standard deviation) from the stocks' daily returns over the last three months. (For simplicity, I have pasted only four stocks' returns here.) Once that is done for the first row, I need to do the same for all the other rows (portfolios).
Below is the formula I used for computing the volatility:
#All the portfolios are equally weighted
weights <- rep(0.25, 4)
#covariance_matrix is the covariance of the portfolio components' returns
covariance_matrix <- cov(portfolio_component_monthly_returns)
#formula for computing the volatility
sqrt(t(weights) %*% covariance_matrix %*% weights)
Since yesterday I have been trying to automate the process for all the rows, as I have more than 10,000 of them. I'm new to R and RStudio, and despite searching online I have no results and no idea how to automate it. Does anyone have a clue how to do it?
I hope this is as clear as possible; if not, don't hesitate to ask.
Many thanks
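One possible way to automate this, sketched under assumptions: compute the covariance matrix of all stocks once, then for each row of final_output pick the 4x4 sub-matrix for that row's ISINs and evaluate sqrt(w' S w) with equal weights. The toy perf and final_output below use synthetic random returns (not the question's data) with the same layout; portfolio_vol is a hypothetical helper, and treating 63 trading days as "the last three months" is an assumption.

```r
# Hypothetical toy inputs with the same layout as in the question:
# perf has one column of daily returns per ISIN (rows = days),
# final_output has one equally weighted four-stock portfolio per row.
set.seed(42)
isins <- c("CH0595726594", "CH1111679010", "XS1994697115",
           "CH0587331973", "XS2027888150")
perf <- as.data.frame(matrix(rnorm(70 * length(isins), sd = 0.01),
                             ncol = length(isins),
                             dimnames = list(NULL, isins)))
final_output <- data.frame(ISIN_1 = "CH0595726594", ISIN_2 = "CH1111679010",
                           ISIN_3 = "XS1994697115",
                           ISIN_4 = c("CH0587331973", "XS2027888150"))

# Assumption: about 63 trading days correspond to "the last three months"
recent <- tail(perf, 63)

# Covariance of all stocks, computed once instead of once per portfolio
cov_all <- cov(recent)

# Volatility of one equally weighted portfolio: sqrt(w' S w)
portfolio_vol <- function(isin_row, cov_all) {
  s <- cov_all[isin_row, isin_row]                   # sub-matrix for these ISINs
  w <- rep(1 / length(isin_row), length(isin_row))   # equal weights
  sqrt(drop(t(w) %*% s %*% w))
}

# Apply the helper to every row (portfolio) of final_output
isin_cols <- paste0("ISIN_", 1:4)
final_output$volatility <- apply(final_output[, isin_cols], 1,
                                 portfolio_vol, cov_all = cov_all)
final_output
```

Computing cov_all once keeps the per-row work to a small matrix product, which matters with more than 10,000 rows. If perf contains missing values, cov(recent, use = "pairwise.complete.obs") is one option, although the result then no longer matches a per-portfolio cov() on complete rows exactly.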

Equation for nonlinear data

I have a set of nonlinear data: the X and Y pixel coordinates of the same objects/points in every frame of a video. When I plot the values for one frame, I get a nonlinear graph, as shown in the picture.
I want to find an equation for this graph so that, given a known X coordinate in this frame, I can obtain the corresponding Y coordinate (a kind of prediction of the new position; I am not sure whether this idea is correct).
OR
If this idea is illogical, can you suggest something that would let me predict the location of a new object from these data?
Any help or new ideas are highly appreciated.
A sample of my data is given below:
X Y
----------
214 182
830 185
1451 173
219 554
1453 548
214 941
830 934
1455 942
213 190
829 193
1450 181
218 561
1452 555
214 945
830 938
1455 946
213 190
828 193
1451 182
219 560
1452 554
214 945
830 938
1455 946
213 190
829 193
1450 181
219 556
1453 550
215 936
830 929
1455 937
I have selected 9 objects in each frame, so the first 9 data points belong to one frame, and so on.
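When plotted, your XY data forms clusters located at the corners and mid-edges of a rectangle (there are no points in the centre). Connecting successive points with lines shows the same path repeating in every frame.
The points come in groups of 8, always in the same order: top row left to right (points 1-3), middle row left and right (points 4 and 5), bottom row left to right (points 6-8). You can therefore predict the location of a point from its index: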
# predict location (x, y) of a point based on its index i
predict_point <- function(i) {
  point <- (i - 1) %% 8 + 1   # number 1-8 of the point within its frame
  # x: left column (points 1, 4, 6), middle (2, 7), right (3, 5, 8)
  x <- c(215, 829, 1453, 215, 1453, 215, 829, 1453)[point]
  # y: top row (points 1-3), middle row (4, 5), bottom row (6-8)
  y <- c(186, 186, 186, 555, 555, 940, 940, 940)[point]
  c(x = x, y = y)
}
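Rather than hard-coding the coordinates, the typical position of each of the 8 points can be estimated from the data by averaging over frames. This sketch uses the 32 sample points from the question (4 frames of 8, in the order listed) and assumes that every frame lists its points in the same order.

```r
# Sample coordinates from the question: 4 frames x 8 points, in order
x <- c(214, 830, 1451, 219, 1453, 214, 830, 1455,
       213, 829, 1450, 218, 1452, 214, 830, 1455,
       213, 828, 1451, 219, 1452, 214, 830, 1455,
       213, 829, 1450, 219, 1453, 215, 830, 1455)
y <- c(182, 185, 173, 554, 548, 941, 934, 942,
       190, 193, 181, 561, 555, 945, 938, 946,
       190, 193, 182, 560, 554, 945, 938, 946,
       190, 193, 181, 556, 550, 936, 929, 937)

point <- (seq_along(x) - 1) %% 8 + 1   # position 1-8 within each frame

# Mean position of each point across frames
centroids <- data.frame(point = 1:8,
                        x = as.numeric(tapply(x, point, mean)),
                        y = as.numeric(tapply(y, point, mean)))
centroids
```

These averages are what the hard-coded lookup values approximate; recomputing them from more frames keeps the predictions in step with the data.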
Another option: cut the curve into many linear segments. For a given value of X you then lie on one segment, and it is easy to calculate that segment's equation from its two endpoints.
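A minimal sketch of the segment idea, assuming the curve is given as points sorted by X and that Y is a single-valued function of X (which does not hold for the clustered frame data above). The points are hypothetical; line_through and predict_y are illustrative helpers, and base R's approx() performs the same piecewise-linear interpolation.

```r
# Hypothetical points sampled along a curve, sorted by x
curve_x <- c(0, 100, 200, 300)
curve_y <- c(10, 50, 70, 75)

# Equation of the line through (x1, y1) and (x2, y2), evaluated at x
line_through <- function(x, x1, y1, x2, y2) {
  y1 + (y2 - y1) * (x - x1) / (x2 - x1)
}

# Find the segment containing x, then use that segment's line
# (assumes x lies within the range of curve_x)
predict_y <- function(x) {
  k <- findInterval(x, curve_x, rightmost.closed = TRUE)
  line_through(x, curve_x[k], curve_y[k], curve_x[k + 1], curve_y[k + 1])
}

predict_y(150)                          # 60
approx(curve_x, curve_y, xout = 150)$y  # built-in equivalent, also 60
```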

Report the mean number of characters in Corpus document

I have a corpus set up by reading a bunch of text files containing paragraphs.
library('tm')
my.text.location <- "C:/Users//.../*/"
apapers <- VCorpus(DirSource(my.text.location))
Now I need to find the mean number of characters per text. Running
mean(nchar(apapers), na.rm =T) gives a very weird output, larger than the number of characters.
Any other way to get the mean?
You didn't supply a reproducible example, but rowMeans(sapply(apapers, nchar)) will return the mean number of characters over all documents. The "content" element is the one you need.
A longer version is to run sapply over the corpus, counting the number of characters per document. Transpose the result and turn it into a data.frame. The data.frame will contain two columns, content and meta; content is the one you need. Taking the mean of the content column gives the average number of characters per document. The advantage of this approach is that you keep the table in case you need to report the numbers.
# your code
my_count <- data.frame(t(sapply(apapers, nchar)))
mean(my_count$content)
Reproducible example using the crude dataset:
library(tm)
data("crude")
crude <- as.VCorpus(crude)
# in one statement
rowMeans(sapply(crude, nchar))
content meta
1220.30 453.15
# longer version keeping intermediate results.
my_count <- data.frame(t(sapply(crude, nchar)))
mean(my_count$content)
[1] 1220.3
my_count
content meta
127 527 440
144 2634 458
191 330 444
194 394 441
211 552 441
236 2774 455
237 2747 477
242 930 453
246 2115 440
248 2066 466
273 2241 458
349 593 492
352 621 468
353 591 445
368 629 440
489 876 445
502 1166 446
543 463 447
704 1797 456
708 360 451

Convert time values to numeric while keeping time characteristics

I have a dataset containing the interval times between different events. I want to convert the data into a numeric vector so that it's easier to manipulate, summarise and graph, while keeping its time characteristics. Here is a snippet of my data:
data <- c( "03:31", "12:17", "16:29", "09:52", "04:01", "09:00", "06:29",
"04:17", "04:42")
class(data)
[1] "character"
The obvious approach is:
as.numeric(data)
But I get this error:
Warning message:
NAs introduced by coercion
I thought of removing the ':', but then the values lose their time characteristics. For example, if I add 347 and 543, I get 890 as opposed to 930 (3:47 + 5:43 = 9:30). Here is the code I would use to remove the colon, which works fine for that purpose:
Nocolon <- gsub("[:]", "", data, perl=TRUE)
[1] "0331" "1217" "1629" "0952" "0401" "0900" "0629" "0417" "0442"
So essentially, I want my time values in a form that is easy to manipulate and analyse. My idea is a numeric vector, but that comes from my limited understanding of R. My actual data has thousands of time values, and I want to create a plot that will let me determine whether the values follow a statistical distribution.
Thanks in advance!
Here are some approaches. All convert to minutes. For example, the first component is "03:31" which is 3 * 60 + 31 = 211 minutes. (1) to (5) do not use any packages.
1) %*% It works by reading data into a 2 column data frame with hours and minutes. That is converted to a matrix so that it can be matrix multiplied by c(60, 1). Finally, unravel it with c.
c(as.matrix(read.table(text = data, sep = ":")) %*% c(60, 1))
[1] 211 737 989 592 241 540 389 257 282
2) with This variation is even shorter. It creates the same data frame and then simply multiplies the first column (V1) by 60 and adds the second column (V2).
with(read.table(text = data, sep = ":"), 60*V1+V2)
[1] 211 737 989 592 241 540 389 257 282
3) complex This converts each component to a complex number and then performs the required arithmetic on the real and imaginary parts:
data_c <- as.complex(sub(":(\\d+)", "+\\1i", data))
60 * Re(data_c) + Im(data_c)
## [1] 211 737 989 592 241 540 389 257 282
3a) This variation of (3) also works and avoids regular expressions:
data_c <- as.complex(paste0(chartr(":", "+", data), "i"))
60 * Re(data_c) + Im(data_c)
## [1] 211 737 989 592 241 540 389 257 282
4) eval This converts each component into an arithmetic expression that evaluates to the number of minutes and then performs the evaluation. Using eval is not really recommended when you can avoid it, so this one is less desirable:
sapply(parse(text = sub("(\\d+):", "60*\\1+", data)), eval)
## [1] 211 737 989 592 241 540 389 257 282
5) POSIXlt We can convert to "POSIXlt" class and then use the hour and min components:
with(unclass(as.POSIXlt(data, format = "%H:%M")), 60 * hour + min)
## [1] 211 737 989 592 241 540 389 257 282
6) chron Using the chron package we can paste on the seconds, convert to "times" class and then convert to minutes:
library(chron)
24 * 60 * as.numeric(times(paste0(data, ":00")))
## [1] 211 737 989 592 241 540 389 257 282
7) lubridate Using the lubridate package we can convert it using hm and then to numeric giving seconds and finally dividing by 60 to give minutes:
library(lubridate)
as.numeric(hm(data)) / 60
## [1] 211 737 989 592 241 540 389 257 282
Use the as.difftime function designed for this:
as.difftime(data, format="%H:%M", units="mins")
#Time differences in mins
#[1] 211 737 989 592 241 540 389 257 282
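Once the values are in minutes, sums and differences behave like time arithmetic, which addresses the 347 + 543 concern in the question. A minimal sketch reusing approach 2; to_hhmm is a hypothetical helper (not part of any package) for turning minutes back into "HH:MM" for display.

```r
data <- c("03:31", "12:17", "16:29", "09:52", "04:01", "09:00", "06:29",
          "04:17", "04:42")

# Convert "HH:MM" strings to minutes (approach 2)
mins <- with(read.table(text = data, sep = ":"), 60 * V1 + V2)

# Hypothetical helper: format minutes back as "HH:MM"
to_hhmm <- function(m) sprintf("%02d:%02d", m %/% 60, m %% 60)

to_hhmm(sum(mins[1:2]))    # "15:48" (03:31 + 12:17)
to_hhmm(227 + 343)         # "09:30" (3:47 + 5:43 from the question)
summary(mins)
```

The numeric vector mins can then go straight into hist(mins) or qqnorm(mins) to check it against a distribution.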

How do I make sure numbers are numeric from a .txt?

I'm setting up a script to extract the thickness and voltages from a single-column text file and fit a Weibull distribution to them. When I try to use fitdistr() I get an error stating "'x' must be a non-empty numeric vector". R is supposed to interpret numbers in text files as numeric, but that doesn't seem to be happening. Any thoughts?
filename <- "SampleBreakdownSet.txt"
d <- read.table(filename, header = FALSE, sep = "")
#Extract thickness from the dataset; set to variable t
t = d[1,1]
#Extract the breakdown voltages and toss into dataset, BDV
BDV = tail(d,(nrow(d)-1))
#Calculates the breakdown field from the thickness and BDV
BDF = (BDV*10000) / t
#Calculates the Weibull parameters from the input breakdown voltages.
fitdistr(BDF, densfun ="weibull", lower = 0)
Error in fitdistr(BDF, densfun = "weibull", lower = 0) :
'x' must be a non-empty numeric vector
Sample data I'm using:
2
200
250
450
320
100
400
200
403
502
203
420
120
342
304
253
423
534
534
243
253
423
123
433
534
234
633
432
342
543
532
123
453
231
532
342
213
243
You are passing a data.frame to fitdistr, but you should be passing the vector itself.
Try this:
d <- read.table(text='200
250
450
320
100
400
200
403
502
203
420
120
342
304
253
423
534
534
243
253
423
123
433
534
234
633
432
342
543
532
123
453
231
532
342
213
243', header=FALSE)
t <- d[1,1]
#Extract the breakdown voltages and toss into dataset, BDV
BDV <- d[-1, 1]
BDF <- (BDV*10000) / t
library(MASS)
fitdistr(BDF, densfun ="weibull", lower = 0)
Alternatively, with the question's original code (where BDF is a data.frame), refer to the relevant column when calling fitdistr, e.g.:
fitdistr(BDF$V1, densfun ="weibull", lower = 0)
# shape scale
# 2.745485e+00 1.997509e+04
# (3.716797e-01) (1.283667e+03)
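Another way to guarantee a plain numeric vector is to read the file with scan(), which returns a numeric vector by default. The sketch below inlines the question's sample file (thickness 2 on the first line, then the breakdown voltages) via text=; with the real file, scan("SampleBreakdownSet.txt") would be used instead. No fitted values are shown here because they depend on the thickness value used.

```r
library(MASS)  # provides fitdistr()

# scan() reads whitespace-separated values straight into a numeric vector
vals <- scan(text = "2 200 250 450 320 100 400 200 403 502 203 420 120 342
                     304 253 423 534 534 243 253 423 123 433 534 234 633 432
                     342 543 532 123 453 231 532 342 213 243", quiet = TRUE)

thickness <- vals[1]    # first value in the file is the thickness
bdv <- vals[-1]         # remaining values are the breakdown voltages
bdf <- bdv * 10000 / thickness

stopifnot(is.numeric(bdf))   # a plain numeric vector, not a data.frame
fit <- fitdistr(bdf, densfun = "weibull", lower = 0)
fit
```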
