Automize portfolios volatilities computation in R - r

Thanks for reading my post. I have a series of portfolios created from the combination of several stocks. I should compute the volatility of those portfolios using the historical daily performances of each stock. Since I have all the combinations in one data frame (called final_output), and all stocks return in another data frame (called perf, where the columns are stocks and rows days) I don't know which will be the most efficient way to automize the process. Below you can find an extract:
> Final_output
ISIN_1 ISIN_2 ISIN_3 ISIN_4
2 CH0595726594 CH1111679010 XS1994697115 CH0587331973
3 CH0595726594 CH1111679010 XS1994697115 XS2027888150
4 CH0595726594 CH1111679010 XS1994697115 XS2043119358
5 CH0595726594 CH1111679010 XS1994697115 XS2011503617
6 CH0595726594 CH1111679010 XS1994697115 CH1107638921
7 CH0595726594 CH1111679010 XS1994697115 XS2058783270
8 CH0595726594 CH1111679010 XS1994697115 JE00BGBBPB95
> perf
CH0595726594 CH1111679010 XS1994697115 CH0587331973
626 0.0055616769 -0.0023656130 1.363791e-03 1.215922e-03
627 0.0086094443 0.0060037334 0.000000e+00 2.519220e-03
628 0.0053802380 0.0009027081 0.000000e+00 7.508635e-04
629 -0.0025213543 -0.0022046297 4.864050e-05 1.800720e-04
630 0.0192416817 0.0093401627 -6.079767e-03 3.800836e-03
631 -0.0101224820 0.0051741294 6.116956e-03 -1.345184e-03
632 -0.0013293793 -0.0100475153 -4.494163e-03 -1.746106e-03
633 0.0036350604 0.0012999350 3.801130e-03 -5.997121e-05
634 0.0030097434 -0.0011484496 -1.187614e-03 -2.069131e-03
635 0.0002034381 0.0030493901 -1.851762e-03 -3.806280e-04
636 -0.0035594427 0.0167455769 -2.148123e-04 -4.709560e-04
637 0.0007654623 -0.0051958237 -3.711191e-04 1.604010e-04
638 0.0107592678 -0.0016260163 4.298764e-04 3.397951e-03
639 0.0050953486 -0.0007403020 2.011738e-03 8.790770e-04
640 0.0008532851 -0.0071121648 -9.746114e-04 5.389598e-04
641 -0.0068204614 0.0133810874 -9.755622e-05 -1.346674e-03
642 0.0091395678 0.0102591793 1.717157e-03 -1.977785e-03
643 0.0027520640 -0.0157912638 1.256440e-03 -1.301119e-04
644 -0.0048902196 0.0039494471 -1.624514e-03 -3.373340e-03
645 -0.0116838833 0.0062450826 6.625549e-04 1.205255e-03
646 0.0004566442 -0.0018570102 -3.456636e-03 4.474138e-03
647 0.0041586368 0.0085679315 4.435933e-03 1.957455e-03
648 0.0007575758 0.0002912621 0.000000e+00 2.053306e-03
649 0.0046429473 -0.0138309230 -4.435798e-03 1.541798e-03
650 0.0049731250 -0.0488164953 4.181975e-03 -9.733133e-04
651 0.0008497451 -0.0033110870 2.724477e-04 -7.555498e-04
652 0.0004494831 0.0049831300 -8.657588e-04 -1.790813e-04
653 -0.0058905751 0.0020143588 8.178287e-04 -1.213991e-03
654 0.0000000000 0.0167525773 4.864050e-05 9.365068e-04
655 0.0010043186 0.0048162231 0.000000e+00 -2.110146e-03
656 -0.0024079462 -0.0100403633 -2.431907e-03 -9.176600e-04
657 -0.0095544604 -0.0193670047 0.000000e+00 -8.935435e-03
658 0.0008123477 0.0114339172 2.437835e-03 5.530483e-03
659 0.0022828734 -0.0015415446 -3.239300e-03 2.765060e-03
660 0.0049096523 -0.0001029283 3.199079e-02 2.327835e-03
661 -0.0027702226 -0.0357198003 9.456712e-04 3.189602e-04
662 -0.0008081216 -0.0139311449 -2.891020e-02 -1.295363e-03
663 -0.0033867462 0.0068745264 -2.529552e-03 -1.496588e-04
664 -0.0015216068 -0.0558572120 -3.023653e-03 -7.992975e-03
665 0.0052829422 0.0181072771 4.304652e-03 -3.319519e-03
666 0.0084386054 0.0448545861 -8.182748e-04 4.279284e-03
667 -0.0076664829 -0.0059415480 -2.047362e-04 6.059936e-03
668 -0.0062108665 -0.0039847073 7.313506e-04 5.993467e-04
669 -0.0053350948 0.0068119154 -1.042631e-02 -2.056524e-03
670 -0.0263588067 0.0245395479 -2.188962e-02 -6.732491e-03
671 -0.0021511018 0.0220649895 1.412435e-02 1.702085e-03
672 0.0205058100 -0.0007179119 3.057527e-03 -1.002423e-02
673 0.0096862280 -0.0194488633 1.207407e-03 -1.553899e-03
674 0.0007143951 -0.0068557672 6.227450e-03 1.790274e-03
675 -0.0021926470 -0.0051114507 -6.267498e-03 -1.035691e-03
676 0.0076655765 -0.0139300847 6.583825e-03 3.059472e-03
677 -0.0032457653 0.0180480206 -4.635495e-03 1.064002e-03
678 0.0036633764 0.0060676410 -2.762676e-04 5.364970e-04
679 -0.0008111122 -0.0013635410 -1.065898e-03 1.214059e-03
680 0.0050228311 0.0055141267 3.003507e-03 1.121643e-03
681 -0.0007067495 0.0147281558 -2.699002e-03 -1.514035e-04
682 -0.0024248548 0.0002573473 -2.113685e-03 -1.423409e-03
683 -0.0002025624 0.0138417207 -4.374895e-03 1.415328e-04
684 -0.0141822418 -0.0169517332 -3.578920e-03 -1.799234e-03
685 -0.0005651749 -0.0259693324 -5.926428e-03 -3.635333e-03
686 0.0004112688 0.0133043570 -1.545642e-03 1.981828e-03
687 -0.0150565262 -0.0107757493 -1.717916e-02 -1.328749e-02
688 0.0039129754 -0.0441013167 -8.376631e-03 -5.653841e-04
689 0.0019748467 0.0115063340 -2.835394e-02 7.868428e-03
690 0.0072614108 0.0358764014 3.586897e-02 7.960077e-03
691 -0.0003604531 0.0106119001 1.024769e-04 -2.733651e-04
What I should do is look for each portfolio (each row of final_output is a portfolio, i.e. 4 stocks portfolio) in perf and compute the volatility (standard deviation) of that portfolio using the stocks historical daily performances of the last three months. (Of course, here I have pasted only 4 stocks performances for simplicity.) Once done for the first, I should do the same for all the other rows (portfolios).
Below is the formula I used for computing the volatility:
#formula for computing the volatility
sqrt(t(weights) %*% covariance_matrix %*% weights)
#where covariance_matrix is
cov(portfolio_component_monthly_returns)
#All the portfolios are equiponderated
weights = [ 0.25 0.25 0.25 0.25 ]
What I'm trying to do since yesterday is to automize the process for all the rows, indeed I have more than 10'000 rows. I'm an RStudio naif, so even trying and surfing on the new I have no results and no ideas of how to automize it. Would someone have a clue how to do it?
Hope to have been clearer as possible, in case do not hesitate to ask me.
Many thanks

Related

Extracting seasonal effect without using stl or decompose

I have a data named 'bicoal' which consists of annual bituminous coal production in the United States from 1920 to 1968.
`Time Series:
Start = 1920
End = 1968
Frequency = 1
[1] 569 416 422 565 484 520 573 518 501 505 468 382 310 334 359 372 439 446 349 395
[21] 461 511 583 590 620 578 534 631 600 438 516 534 467 457 392 467 500 493 410 412
[41] 416 403 422 459 467 512 534 552 545`
I made a time series, saved under the name time_series, and wanted to extract the seasonal effect using the code plot(decompose(time_series)) and plot(stl(time_series)), but got an error message
Error in stl(time_series) :
series is not periodic or has less than two periods
Error in decompose(time_series) :
time series has no or less than 2 periods
If stl nor decompose doesn't work, is there a way to extract the seasonal effect?
Without seeing how your time series is constructed I think this might be your problem.
data <- rep(seq(1,5),5)
ts.1 <- ts(data)
stl(ts.1)
Now to fix this issue the ts function has a frequency argument that defines the period of the data.
ts.2 <- ts(data, frequency = 5)
stl(ts.2, s.window = "periodic")

Report the mean number of characters in Corpus document

So I have a corpus setup reading bunch of text file with paragraphs in them.
library('tm')
my.text.location <- "C:/Users//.../*/"
apapers <- VCorpus(DirSource(my.text.location))
Now I need to find the mean of the characters in each text. Running a
mean(nchar(apapers), na.rm =T) results in a very weird output, more than the number of characters.
Any other way to get the mean?
You didn't supply a reproducible example, but rowMeans(sapply(apapers, nchar)) will return the mean number of characters over all documents. "Content" is the column you need.
A longer version is running a sapply over the corpus counting the number of per document. Transpose this data and turn it into a data.frame. The data.frame will contain two columns, content and meta. Content is the one you need. Taking the mean of the content column will give you the average number of characters in a document. The advantage of this is that you have the table in case you need to report the numbers.
# your code
my_count <- data.frame(t(sapply(apapers, nchar)))
mean(my_count$content)
Reproducible example using the crude dataset:
library(tm)
data("crude")
crude <- as.VCorpus(crude)
# in one statement
rowMeans(sapply(crude, nchar))
content meta
1220.30 453.15
# longer version keeping intermediate results.
my_count <- data.frame(t(sapply(crude, nchar)))
mean(my_count$content)
[1] 1220.3
my_count
content meta
127 527 440
144 2634 458
191 330 444
194 394 441
211 552 441
236 2774 455
237 2747 477
242 930 453
246 2115 440
248 2066 466
273 2241 458
349 593 492
352 621 468
353 591 445
368 629 440
489 876 445
502 1166 446
543 463 447
704 1797 456
708 360 451

R Conditional summing

I've just started my adventure with programming in R. I need to create a program summing numbers divisible by 3 and 5 in the range of 1 to 1000, using the '%%' operator. I came up with an idea to create two matrices with the numbers from 1 to 1000 in one column and their remainders in the second one. However, I don't know how to sum the proper elements (kind of "sum if" function in Excel). I attach all I've done below. Thanks in advance for your help!
s1<-1:1000
in<-s1%%3
m1<-matrix(c(s1,in), 1000, 2, byrow=FALSE)
s2<-1:1000
in2<-s2%%5
m2<-matrix(c(s2,in2),1000,2,byrow=FALSE)
Mathematically, the best way is probably to find the least common multiple of the two numbers and check the remainder vs that:
# borrowed from Roland Rau
# http://r.789695.n4.nabble.com/Greatest-common-divisor-of-two-numbers-td823047.html
gcd <- function(a,b) if (b==0) a else gcd(b, a %% b)
lcm <- function(a,b) abs(a*b)/gcd(a,b)
s <- seq(1000)
s[ (s %% lcm(3,5)) == 0 ]
# [1] 15 30 45 60 75 90 105 120 135 150 165 180 195 210
# [15] 225 240 255 270 285 300 315 330 345 360 375 390 405 420
# [29] 435 450 465 480 495 510 525 540 555 570 585 600 615 630
# [43] 645 660 675 690 705 720 735 750 765 780 795 810 825 840
# [57] 855 870 885 900 915 930 945 960 975 990
Since your s is every number from 1 to 1000, you could instead do
seq(lcm(3,5), 1000, by=lcm(3,5))
Just use sum on either result if that's what you want to do.
Props to #HoneyDippedBadger for figuring out what the OP was after.
See if this helps
x =1:1000 ## Store no. 1 to 1000 in variable x
x ## print x
Div = x[x%%3==0 & x%%5==0] ## Extract Nos. divisible by 3 & 5 both b/w 1 to 1000
Div ## Nos. Stored in DIv which are divisible by 3 & 5 both
length(Div)
table(x%%3==0 & x%%5==0) ## To see how many are TRUE for given condition
sum(Div) ## Sums up no.s divisible by both 3 and 5 b/w 1 to 1000

R row summing to a new column [duplicate]

This question already has answers here:
Sum rows in data.frame or matrix
(7 answers)
Closed 7 years ago.
I'm new to R and this is probablly a very simple question.
I just haven't been able to make rsum/apply work
My task is to add all the different expense categories in my dataframe and create a new variable with this value like this:
(not the original)
Food Dress Car
235 564 532
452 632 719
... ... ...
and then
Food Dress Car Total
235 564 532 1331
452 632 719 1803
... ... ... ...
I have tried:
rowsum, apply and aggregate and can't get it right
You can use addmargins after converting to matrix
addmargins(as.matrix(df1),2)
# Food Dress Car Sum
#[1,] 235 564 532 1331
#[2,] 452 632 719 1803
Or use rowSums
df1$Total <- rowSums(df1)
Or with Reduce
df1$Total <- Reduce(`+`, df1)
With apply functions:
cbind(dat, Total = apply(dat, 1, sum))
Food Dress Car Total
1 235 564 532 1331
2 452 632 719 1803
or with just a:
cbind(dat, Total = rowSums(dat))
Food Dress Car Total
1 235 564 532 1331
2 452 632 719 1803

R How to remove duplicates from a list of lists

I have a list of lists that contain the following 2 variables:
> dist_sub[[1]]$zip
[1] 901 902 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928
[26] 929 930 931 933 934 935 936 937 938 939 940 955 961 962 963 965 966 968 969 970 975 981
> dist_sub[[1]]$hu
[1] 4990 NA 168 13224 NA 3805 NA 6096 3884 4065 NA 16538 NA 12348 10850 NA
[17] 9322 17728 NA 13969 24971 5413 47317 7893 NA NA NA NA NA 140 NA 4
[33] NA NA NA NA NA 13394 8939 NA 3848 7894 2228 17775 NA NA NA
> dist_sub[[2]]$zip
[1] 921 934 952 956 957 958 959 960 961 962 965 966 968 969 970 971
> dist_sub[[2]]$hu
[1] 17728 140 4169 32550 18275 NA 22445 0 13394 8939 3848 7894 2228 17775 NA 12895
Is there a way remove duplicates such that if a zipcode appears in one list is removed from other lists according to specific criteria?
Example: zipcode 00921 is present in the two lists above. I'd like to keep it only on the list with the lowest sum of hu (housing units). In this I would like to keep zipcode 00921 in the 2nd list only since the sum of hu is 162,280 in list 2 versus 256,803 in list 1.
Any help is very much appreciated.
Here is a simulate dataset for your problem so that others can use it too.
dist_sub <- list(list("zip"=1:10,
"hu"=rnorm(10)),
list("zip"=8:12,
"hu"=rnorm(5)),
list("zip"=c(1, 3, 11, 7),
"hu"=rnorm(4))
)
Here's a solution that I was able to come up with. I realized that loops were really the cleaner way to do this:
do.this <- function (x) {
for(k in 1:(length(x) - 1)) {
for (l in (k + 1):length(x)) {
to.remove <- which(x[[k]][["zip"]] %in% x[[l]][["zip"]])
x[[k]][["zip"]] <- x[[k]][["zip"]][-to.remove]
x[[k]][["hu"]] <- x[[k]][["hu"]][-to.remove]
}
}
return(x)
}
The idea is really simple: for each set of zips we keep removing the elements that are repeated in any set after it. We do this until the penultimate set because the last set will be left with no repeats in anything before it.
The solution to use the criterion you have, i.e. lowest sum of hu can be easily implemented using the function above. What you need to do is reorder the list dist_sub by sum of hu like so:
sum_hu <- sapply(dist_sub, function (k) sum(k[["hu"]], na.rm=TRUE))
dist_sub <- dist_sub[order(sum_hu, decreasing=TRUE)]
Now you have dist_sub sorted by sum_hu which means that for each set the sets that come before it have larger sum_hu. Therefore, if sets at values i and j (i < j) have values a in common, then a should be removed from ith element. That is what this solution does too. Do you think that makes sense?
PS: I've called the function do.this because I usually like writing generic solutions while this was a very specific question, albeit, an interesting one.

Resources