I have a data frame (return.monthly) of the form
Date Return
2001-09-1 0.0404775
2001-10-1 -0.01771575
2001-11-1 -0.03304925
etc.
i.e. monthly returns over a two-year period. I would like to calculate quarterly returns, i.e. take each block of 3 observations and calculate its sum.
I tried
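library(xts)
# convert to xts, indexed by the Date column
return.quarterly <- xts(return.monthly[, -1], order.by = as.Date(return.monthly[, 1]))
colnames(return.quarterly) <- "Return"
sum.fun <- function(x) sum(x)   # `function` is reserved in R and cannot be used as a variable name
width <- 3
return.quarterly$return_q <- rollapply(return.quarterly$Return, FUN = sum.fun,
                                       width = width, align = "left", fill = NA)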
This computes a rolling-window sum: it sums observations 1-3, then 2-4, and so on. What I want instead is non-overlapping blocks: 1-3, 4-6, 7-9, etc.
How could I do that?
Thanks in advance, Dani
You can use apply.quarterly from xts to compute the mean over a quarter:
apply.quarterly(return.quarterly,mean) #Jan-Feb-Mar first quarter etc.
BTW: for quarterly returns, shouldn't you use the sum rather than the mean?
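A base-R sketch of the sum-based version, in case you want to avoid the xts dependency (with xts loaded, apply.quarterly(return.quarterly, sum) should give the calendar-quarter sums directly). The data frame is hypothetical: the first three returns come from the question and the rest are invented. It shows two groupings: consecutive blocks of three observations, as the question asks, and calendar quarters, which is what apply.quarterly uses.

```r
# Hypothetical monthly returns: the first three values come from the question,
# the remaining ones are made up purely for illustration.
return.monthly <- data.frame(
  Date   = as.Date(c("2001-09-01", "2001-10-01", "2001-11-01",
                     "2001-12-01", "2002-01-01", "2002-02-01")),
  Return = c(0.0404775, -0.01771575, -0.03304925, 0.01, 0.02, -0.005)
)

# Option 1: non-overlapping blocks of 3 consecutive observations (1-3, 4-6, ...)
block <- (seq_len(nrow(return.monthly)) - 1) %/% 3
block.sums <- tapply(return.monthly$Return, block, sum)

# Option 2: calendar quarters, labelled e.g. "2001 Q3"
quarter <- paste(format(return.monthly$Date, "%Y"), quarters(return.monthly$Date))
quarter.sums <- tapply(return.monthly$Return, quarter, sum)

print(block.sums)
print(quarter.sums)
```

Because this series starts in September, the two groupings differ: the first block covers Sep-Nov, whereas the first calendar quarter contains only September. Summing is exact for log returns; simple returns compound, so prod(1 + r) - 1 over each group would give the quarterly simple return.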
I'm trying to calculate the cumulative quantile (10th percentile, 25th percentile, etc.) over a column in a large dataset (over 10 million rows).
I tried to use the function cumquant from the cumstats package, but it is slow (longer than an hour). A toy test, cumquant(1:100000, p = 0.1), takes more than 40 seconds for a vector of 100,000 values.
Is there a more efficient way to calculate it using data.table (or others)?
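One way to avoid recomputing each prefix from scratch is an order-statistics structure. The sketch below is a swapped-in technique, not what cumstats or data.table does: it ranks the values once, inserts them one at a time into a Fenwick (binary indexed) tree, and reads off the two order statistics needed for quantile()'s default type-7 interpolation, so each step costs O(log n) instead of a full sort. It assumes type 7 is the definition you want (cumquant may use a different one) and that x contains no NAs; the function name cum_quantile is hypothetical.

```r
# Cumulative type-7 quantile: out[i] equals quantile(x[1:i], p) with R's default type.
# Assumes x is a numeric vector without NAs.
cum_quantile <- function(x, p) {
  n <- length(x)
  ord <- order(x)                     # ord[r] = position of the r-th smallest value
  sorted <- x[ord]                    # values in ascending order
  rk <- integer(n)
  rk[ord] <- seq_len(n)               # rank of each element (ties broken by position)
  tree <- integer(n)                  # Fenwick tree: counts of inserted ranks
  top <- 2^floor(log2(n))             # largest power of two <= n

  # Smallest rank whose prefix count reaches k, i.e. the k-th smallest inserted rank
  kth <- function(k) {
    pos <- 0
    step <- top
    while (step >= 1) {
      nxt <- pos + step
      if (nxt <= n && tree[nxt] < k) {
        pos <- nxt
        k <- k - tree[nxt]
      }
      step <- step %/% 2
    }
    pos + 1
  }

  out <- numeric(n)
  for (i in seq_len(n)) {
    # insert the rank of x[i] into the Fenwick tree
    j <- rk[i]
    while (j <= n) {
      tree[j] <- tree[j] + 1L
      j <- j + bitwAnd(j, -j)
    }
    # type-7 position h among the first i values, interpolated between order statistics
    h <- (i - 1) * p + 1
    lo <- floor(h)
    v.lo <- sorted[kth(lo)]
    v.hi <- sorted[kth(ceiling(h))]
    out[i] <- v.lo + (h - lo) * (v.hi - v.lo)
  }
  out
}

cum_quantile(c(5, 1, 4, 1, 3, 9, 2), p = 0.5)  # medians of each prefix: 5, 3, 4, 2.5, 3, 3.5, 3
```

This is still an interpreted R loop, so for 10 million values the real gain would most likely come from porting the same logic to compiled code (for example via Rcpp); its speed relative to cumquant is untested.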
I have a data frame with dates in column 1 and stock prices in columns 2 to 180. I want to compute the returns for all stocks. I have tried for loops but cannot find the right syntax. Should I use an apply function instead? Help is appreciated.
My data looks like this:
date   comp x    comp n
01-11  price x1  price n1
02-11  price x2  price n2
The companies run from x through n and occupy columns 2 to 180. For each company I want the returns, i.e. (price 2 - price 1) / price 1. I tried to do this with a for loop and the Delt command, but I keep getting errors. Is there another way to do this, for example by creating a new data frame?
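Returns can be computed for all columns at once, without a for loop, by comparing the price matrix with itself shifted by one row. A minimal sketch, assuming the prices are numeric and the rows are sorted by date; the toy data frame and its column names are hypothetical.

```r
# Hypothetical toy data: a date column followed by price columns
# (the real data has prices in columns 2 to 180).
data <- data.frame(
  date   = c("01-11", "02-11", "03-11"),
  comp.x = c(100, 110, 99),
  comp.n = c(50, 55, 60)
)

prices <- as.matrix(data[, -1])   # drop the date column, keep only prices
n <- nrow(prices)

# simple return (price_t - price_{t-1}) / price_{t-1}, for every column at once
returns <- (prices[-1, , drop = FALSE] - prices[-n, , drop = FALSE]) /
  prices[-n, , drop = FALSE]

# new data frame: one row per date from the second date onwards
returns <- data.frame(date = data$date[-1], returns)
print(returns)
```

For the real data, data[, -1] selects columns 2 to 180, so the same code covers all 179 stocks.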
colSums
To sum each of the columns 2:180 (one total per stock), use colSums. The result has one value per column, so store it in a separate vector rather than in a column of the data.frame:
totalReturn <- colSums(data[, 2:180])
rowSums
If you want the sum across columns 2:180 for each row (one total per date), use rowSums. Its result has one value per row, so it can be stored as a new column:
data$totalReturn <- rowSums(data[, 2:180])
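To make the difference between the two functions concrete, here is a tiny hypothetical example (rows stand for dates, columns for stocks):

```r
# Hypothetical 2 x 3 block of values (rows = dates, columns = stocks)
m <- data.frame(a = c(1, 2), b = c(3, 4), c = c(5, 6))

per.column <- colSums(m)   # one total per column: a = 3, b = 7, c = 11
per.row    <- rowSums(m)   # one total per row: 9, 12
```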
dplyr
If there are several entries (rows) for one day, consider using the dplyr package with the functions group_by and summarise, or base R's aggregate function.
I have a data frame x,
and I need to calculate the total number of steps (the first column) per day or per 5-minute interval.
This code works fine for dates:
b<-summarise(group_by(x,date),h = sum(steps))
But when I group by interval instead of date,
b<-summarise(group_by(x,interval),h = sum(steps))
it returns only NA values.
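A likely cause, assuming steps contains missing values: sum() returns NA as soon as any element is NA. If whole days are missing, grouping by date turns only those days into NA, but every 5-minute interval includes values from the missing days, so every interval sum becomes NA. In the dplyr code the fix would be sum(steps, na.rm = TRUE). The sketch below demonstrates the effect with base R's tapply on a hypothetical data frame.

```r
# Hypothetical data: two days, two 5-minute intervals, and one whole day
# with missing step counts (an assumption about where the NAs come from).
x <- data.frame(
  steps    = c(NA, NA, 10, 20),
  date     = c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02"),
  interval = c(0, 5, 0, 5)
)

# Per-day sums: only the day that is entirely NA comes out as NA
by.date <- tapply(x$steps, x$date, sum)

# Per-interval sums: every interval contains an NA from the first day, so all are NA
by.interval <- tapply(x$steps, x$interval, sum)

# Dropping missing values inside sum() gives usable totals per interval
by.interval.clean <- tapply(x$steps, x$interval, sum, na.rm = TRUE)
```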
I am brand new to R, and I am trying to calculate the proportion of 'i' values at each time point and then average those proportions. I do not know the command for this, but I have a script that finds the total number of 'i' values across the time points:
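# cells in columns 2-8 equal to 'i' (NAs excluded)
C1imask <- C16.3[, 2:8] == 'i' & !is.na(C16.3[, 2:8])
C16.3[, 2:8][C1imask]
C1inactive <- C16.3[, 2:8][C1imask]
length(C1inactive)   # total number of 'i' values
# cells in column 8 equal to 'bc'
C1bcmask <- C16.3[, 8] == 'bc' & !is.na(C16.3[, 8])
C16.3[, 8][C1bcmask]
C1broodcare <- C16.3[, 8][C1bcmask]
length(C1broodcare)
# cells in column 12 equal to 'bc'
C1amask <- C16.3[, 12] == 'bc' & !is.na(C16.3[, 12])
C16.3[, 12][C1amask]
C1after <- C16.3[, 12][C1amask]
length(C1after)
# difference in 'bc' counts between column 12 and column 8
C1 <- length(C1after) - length(C1broodcare)
C1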
I'd try taking the mean of the logical vector created by the test, passing na.rm = TRUE to mean. This gives the proportion of non-NA values that meet the test, rather than a proportion with the number of rows as the denominator.
test <- sample( c(0,1,NA), 100, replace=TRUE)
mean( test==0, na.rm=TRUE)
#[1] 0.5072464
If you need the proportion out of the total number of rows instead, use sum(..., na.rm = TRUE) and divide by nrow(dframe_name). You can then use sapply or lapply to iterate across a group of columns.
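Putting the pieces together for the per-time-point question, here is a sketch on a hypothetical stand-in for C16.3 with only three time-point columns (the real data uses columns 2:8; the values are invented). It computes the proportion of 'i' in each column with sapply and then averages those proportions, plus the row-denominator variant.

```r
# Hypothetical stand-in for C16.3: an id column followed by three
# time-point columns (the real data uses columns 2:8).
C16.3 <- data.frame(
  id = 1:4,
  t1 = c("i", "i", "a", NA),
  t2 = c("a", "i", "a", "a"),
  t3 = c("i", "i", "i", "bc")
)
time.cols <- 2:4

# Proportion of 'i' among the non-NA values in each time-point column
prop.i <- sapply(C16.3[, time.cols], function(col) mean(col == "i", na.rm = TRUE))

# Alternative: proportion out of all rows, counting NAs in the denominator
prop.i.rows <- sapply(C16.3[, time.cols],
                      function(col) sum(col == "i", na.rm = TRUE) / nrow(C16.3))

# Average of the per-time-point proportions
avg.prop.i <- mean(prop.i)
```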
I am a beginner with R and have a question about simple functions such as mean or standard deviation on a large data set. My data contains monthly returns for hedge funds over the past 30 years, with 1550 columns, one per hedge fund. I saw that I can calculate the mean of a specific column with the mean function, referring to the column as the data set name followed by $ and the column name. However, how can I get the mean for every hedge fund (i.e. every column) without referring to each column individually? Thanks in advance for your help!
We can use colMeans
colMeans(df1, na.rm=TRUE)
where 'df1' is the dataset.
Another option is to loop through the columns with vapply and calculate the mean of each:
vapply(df1, mean, numeric(1), na.rm = TRUE)
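The same pattern covers the standard deviation mentioned in the question. One caveat: colMeans only accepts numeric data, so a date column has to be dropped first. A sketch on a hypothetical df1 with a date column and three funds (values invented):

```r
# Hypothetical stand-in for df1: a date column plus three funds' monthly returns
df1 <- data.frame(
  date   = as.Date(c("2020-01-31", "2020-02-29", "2020-03-31")),
  fund.a = c(0.01, 0.02, 0.03),
  fund.b = c(-0.01, NA, 0.03),
  fund.c = c(0.00, 0.00, 0.06)
)
returns <- df1[, -1]   # colMeans() needs numeric columns only, so drop the date

fund.means <- colMeans(returns, na.rm = TRUE)              # one mean per fund
fund.sds   <- vapply(returns, sd, numeric(1), na.rm = TRUE)  # one sd per fund
```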