I am trying to calculate a rolling sum for a time series of returns r ranging over T dates. However, at each date t where I calculate the rolling sum, I want to apply a weight w to each term in the sum.
The formula for every date t would be:
sum from i = 1 to m of w(i) * r(t+i-1)
Let's look at a very simple example. I have a return series of T = 6 returns r. For each date t I want to calculate the rolling sum over a window of two dates (m = 2). I also want to weight the first observation twice as much as the second.
r <- c(100,110,100,110,100,110)
w <- c(1,0.5)
I know that I can easily do the rolling sum using the filter function:
filter(r, rep(1, 2))
However, I am not able to include the weights in the rolling sum. The following line gives the wrong result, c(155, 155, 155, 155, 155, NA):
filter(r*w, rep(1, 2))
where I would really like to have the result c(155, 160, 155, 160, 155, NA)
Any help is appreciated.
Here's one way to do it:
filter(r, rev(w))
# [1] 155 160 155 160 155 NA
An important piece of information about the filter argument, from the ?filter help page:
filter
a vector of filter coefficients in reverse time order (as for AR or MA coefficients).
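As a quick check, here is a minimal sketch (reusing the r and w from the question and base R's filter) showing that the reversed coefficients reproduce the hand-computed weighted sums:
r <- c(100, 110, 100, 110, 100, 110)
w <- c(1, 0.5)
# Coefficients must be supplied in reverse time order, hence rev(w)
filter(r, rev(w))
# [1] 155 160 155 160 155  NA
# Hand-computed value for the first window: 1*r[1] + 0.5*r[2]
1 * 100 + 0.5 * 110
# [1] 155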
rollapply in the zoo package can do that:
> rollapply(r, 2, crossprod, w, fill = NA)
[1] 155 160 155 160 155 NA
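For a self-contained version (assuming the zoo package is installed): crossprod(window, w) is simply sum(window * w), so an explicit anonymous function gives the same result:
library(zoo)
r <- c(100, 110, 100, 110, 100, 110)
w <- c(1, 0.5)
# Each length-2 window is combined with the weights as sum(window * w)
rollapply(r, 2, function(window) sum(window * w), fill = NA)
# [1] 155 160 155 160 155  NA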
I am trying to carry out the following operation in R.
I have different series of data:
series 1: 75, 56, 100, 23, 38, 40
series 2: 60, 18, 86, 100, 44
I would like to join (annex) these series. To do so, I have to multiply series 1 by 1.5 so that the last value of series 1 (40) matches the first value of series 2 (60), since 40*1.5 = 60.
In the same way I would like to match many other series, but for other series I will need to multiply by other numbers. For another pair, e.g. series 1: ...20 and series 2: 80..., I would have to multiply by 4.
How can I carry out such an operation on many series in many data frames?
Thanks in advance,
Given two vectors x and y, the function f(x,y) below will rescale x the way you desire.
# Rescale x so that its last element equals the first element of y
f <- function(x, y) x * (y[1] / x[length(x)])
Usage:
x = c(75,56,100,23,38,40)
y = c(60,18,86,100,44)
f(x,y)
Output:
[1] 112.5 84.0 150.0 34.5 57.0 60.0
However, how this approach gets applied to "many series in many data frames" depends on the actual structure you have, and what type of output you want.
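For instance, if the series are kept in a list of numeric vectors in time order (an assumption about your data layout), they can be chained with Reduce() and the f() defined above, keeping the junction value only once:
# Hypothetical layout: one numeric vector per series
series_list <- list(
  s1 = c(75, 56, 100, 23, 38, 40),
  s2 = c(60, 18, 86, 100, 44)
)
# Rescale everything accumulated so far to match the next series,
# then append the next series without its first (duplicate) value
chain_series <- function(acc, nxt) c(f(acc, nxt), nxt[-1])
Reduce(chain_series, series_list)
# [1] 112.5  84.0 150.0  34.5  57.0  60.0  18.0  86.0 100.0  44.0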
Say I have three individuals and I know the payment they require to enter different amounts of land into a scheme. I want to know how much land each participant would enter into the scheme for a given payment rate. I want them to enter the max amount they are willing for that payment rate. Previously I did this with a long ifelse statement, but that will not run inside a loop, so I'm looking for an alternative.
In this example, I've excluded a load of areas so it just presents as if participants can enter 50, 49 or 1 unit(s) of area.
paym_sh1a=200
paym_area_50 <- c(250, 150, 210)
paym_area_49 <- c(240, 130, 190)
paym_area_1 <- c(100, 20, 90)
area_enrolled <- ifelse(paym_area_50 < paym_sh1a, 50,
                   ifelse(paym_area_49 < paym_sh1a, 49,
                     ifelse(paym_area_1 < paym_sh1a, 1, 0)))
You could create a table of your values:
paym_area = rbind(paym_area_50, paym_area_49, paym_area_1)
And then use vectorised operations more effectively. In particular, since your thresholds are decreasing, you could compare the whole table to the sh1a value, and count how many rows are below it:
(sums = colSums(paym_area < paym_sh1a))
# [1] 1 3 2
Because a participant who would accept the rate for a larger area also accepts it for every smaller one, this count tells you the largest area enrolled, and can be used as an index into the areas (listed in increasing order, with 0 for a count of zero):
values = c(0, 1, 49, 50)
(result = values[sums + 1L])
# [1]  1 50 49
This matches the result of the nested ifelse above.
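Since the original goal was to run this inside a loop over payment rates, the same logic drops straight into sapply(); the vector of rates below is hypothetical:
# Hypothetical payment rates to evaluate
paym_rates <- c(100, 150, 200, 250)
# One row per participant, one column per rate
area_by_rate <- sapply(paym_rates, function(rate) values[colSums(paym_area < rate) + 1L])
colnames(area_by_rate) <- paym_rates
area_by_rate
#      100 150 200 250
# [1,]   0   1   1  49
# [2,]   1  49  50  50
# [3,]   1   1  49  50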
I want to write an R function that gives me the value at the 10th percentile of the observations. I want to use this function with sapply. For example, for the mean, the sapply call is
sapply(1 : n, function(i) mean(a1))
Say, for example, I have 100 values; the 10th percentile of 100 observations is position 10, and I want the value on that 10th line to be printed.
X
45
80
70
56
78
78
56
90
35
190
.......... up to 100 values
Desired output: the value on the 10th line, i.e. 190 in the column above.
I want the function first to calculate the position of the 10th percentile of my 100 observations and then just print the value at that position.
The desired function will look like:
give_quant_value <- function(vec, quantile) {
  # type = 1 is the inverse of the empirical CDF, so the result is an actual position
  return(vec[quantile(seq_along(vec), probs = quantile, type = 1)])
}
where
- vec is the vector of interest
- quantile is the quantile of interest (as a proportion, e.g. 0.1 for the 10th percentile)
Proof:
set.seed(42)
a1 <- rnorm(100)
give_quant_value(a1, 0.1)
-0.062714099052421
You can read more about the quantile function by typing ?quantile at the R console.
Here is a simple function I wrote; it takes the data as a data.frame object.
select_percentile <- function(df, n) {
  # Row position corresponding to the n-th percentile of the rows
  nth <- ceiling((n / 100) * nrow(df))
  return(df[nth, ])
}
Using data from another answer:
set.seed(42)
a1 <- rnorm(100)
df1<-as.data.frame(a1)
select_percentile(df1,10)
Result:
#[1] -0.0627141
Ridit scoring (https://en.wikipedia.org/wiki/Ridit_scoring) is often used to transform an ordinal categorical variable into relative frequencies (the proportion of cases below a given value, plus one half of the proportion at that value).
How would you do this in R?
FURTHER UPDATE
These and several other functions are now available in the CRAN package ridittools, maintained by yours truly.
UPDATE
Removed some rather silly code that constructed a conversion matrix; I'd forgotten about cumsum().
# Convert vector of counts to ridits
to.ridit <- function(v) {
  (cumsum(v) - .5 * v) / sum(v)
}

# Calculate mean ridit for vector of counts relative to reference group
mean.ridit <- function(v, ref) {
  sum(to.ridit(ref) * v) / sum(v)
}

# Calculate mean ridits for several groups
# x is a matrix of counts
# margin is 1 for groups in rows, 2 for groups in columns
# If ref is omitted, totals across groups are used as reference group
# If ref is a vector of counts, it's used as reference group
# Otherwise, ref is the number (or name if it exists) of the group to use as reference
ridits <- function(x, margin, ref = NULL) {
  if (length(ref) > 1) {
    refgroup <- ref
  } else if (length(ref) == 1) {
    if (margin == 1) {
      refgroup <- x[ref, ]
    } else {
      refgroup <- x[, ref]
    }
  } else {
    refgroup <- apply(x, 3 - margin, sum)
  }
  apply(x, margin, mean.ridit, refgroup)
}
Example (Fleiss, 1981: severity of car accidents):
to.ridit(c(17, 54, 60, 19, 9, 6, 14))
[1] 0.04748603 0.24581006 0.56424581 0.78491620 0.86312849 0.90502793 0.96089385
Note
Although my code is somewhat less flexible than the Ridit::ridit function mentioned in another answer, it seems to be quite a bit faster:
# Influenza subtypes by age as of week ending 2/24/18 (US CDC)
> flu.age
BY BV BU H3 H1
0-4 274 91 92 1808 500
5-24 1504 274 698 5090 951
25-64 1665 101 567 7538 1493
65+ 1476 35 330 9541 515
# Using CRAN package
> system.time(ridit(flu.age,2))
user system elapsed
3.746 0.007 3.756
# Using my code
> system.time(ridits(flu.age,2))
user system elapsed
0.001 0.000 0.000
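For reference, a sketch of how the flu.age matrix shown above can be reconstructed (the counts are simply copied from the printed table):
flu.age <- matrix(
  c( 274,  91,  92, 1808,  500,
    1504, 274, 698, 5090,  951,
    1665, 101, 567, 7538, 1493,
    1476,  35, 330, 9541,  515),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("0-4", "5-24", "25-64", "65+"),
                  c("BY", "BV", "BU", "H3", "H1")))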
The following package may solve your problem. In particular, the function Ridit::ridit is useful; it is described as follows:
An extension of the Kruskal-Wallis test that allow specify arbitrary reference group. Also provide
Mean Ridit for each group. Mean Ridit of a group is an estimate of probability a random observation
from that group will be greater than or equal to a random observation from reference group.
https://cran.r-project.org/web/packages/Ridit/Ridit.pdf
An alternative approach would be using a binary choice model like Probit, Logit or Exact Logit and extracting the predicted values of the dependent variable, i.e. 0 or 1.
I've been trying for a while now to produce code that gives me a new vector containing the sum of the 25 previous rows of an original vector.
So say I have a variable y with 500 rows and I would like a running sum, in a new vector, which contains the sum of rows [1:25], then [2:26], and so on for the length of y. This is what I tried:
y <- 1:500
runsum <- function(x) {
  cumsum(x) - cumsum(x[26:length(x)])
}
new <- runsum(y)
I've tried using some different functions here, and even the apply functions on top, but none seem to produce the right answers.
Would anyone be able to help? I realise it's probably very easy for many of the community here, but any help would be appreciated.
Thanks
This function calculates the sum of the 24 preceding values and the current value:
movsum <- function(x, n = 25) {
  filter(x, rep(1, n), sides = 1)
}
It is easy to adapt so that it sums only the preceding values, if that is what you really want; see the sketch below.
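For example, one such adaptation (a sketch: lag the series by one position so each value is the sum of the 25 values strictly before it; the name movsum_prev is made up here):
movsum_prev <- function(x, n = 25) {
  # Shift x right by one, then apply the same one-sided filter
  filter(c(NA, head(x, -1)), rep(1, n), sides = 1)
}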
In addition to Roland's answer you could use the zoo library
library(zoo)
y <- 1:500
rollapply(zoo(y), 25, sum)
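If you want the result padded to the original length and aligned so that each value covers the current row plus the 24 before it, rollapply's fill and align arguments can be used (a small variation on the call above):
# Right-aligned window of width 25, NA-padded to keep length 500
rollapply(zoo(y), 25, sum, fill = NA, align = "right")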
HTH
I like Roland's answer better as it relies on a time series function and will probably be pretty quick. Since you mentioned you started going down the path of using apply() and friends, here's one approach to do that:
y<-1:500
#How many to sum at a time?
n <- 25
#Create a matrix of the appropriate start and end points
mat <- cbind(start = head(y, -(n-1)), end = tail(y, -(n-1)))
#Check output
rbind(head(mat,3), tail(mat,3))
#-----
       start end
           1  25
           2  26
           3  27
[474,]   474 498
[475,]   475 499
[476,]   476 500
#add together: index y over each start:end window and sum
apply(mat, 1, function(x) sum(y[x[1]:x[2]]))
#Is it the same as Roland's answer after removing the leading NA values it returns?
all.equal(apply(mat, 1, function(x) sum(y[x[1]:x[2]])),
          movsum(y)[-(1:(n - 1))])
#-----
[1] TRUE