Avoid for loop using data.table - r

I have a simulation over time (dev_quarter), stored as a data.table, that looks like this:
library(data.table)
simulation <- data.table(`Scenario ID` = 1, dev_quarter = 1:80, brand = 1, proportion = runif(80))
There are n_scenario scenarios and n_brand brands, each with a proportion for every dev_quarter.
I am trying to code the following: for each scenario and each brand, compute the difference in proportion between the beginning and the end of each year.
I wrote the following to find the dev_quarters corresponding to each year:
x <- 2002:2021
lookup_T <- as.integer(format(Sys.Date(), "%Y"))
lookup_period <- data.table(years = lookup_T-x+1, quarters_t = (lookup_T-x+1)*4, quarters_t1 = (lookup_T-x+2)*4)
For a small example:
n_scenario <- 1
n_brand <- 10
My (ugly) code, which uses for loops:
result <- data.table(`Scenario ID` = numeric(), years = numeric(), brand = numeric(), proportion = numeric())
for (i in 1:n_scenario) {
  for (j in 1:n_brand) {
    prop_per_year <- c()
    # for each year
    for (k in 1:length(x)) {
      year <- lookup_period[k, ]
      quarter_start_year <- year[["quarters_t"]]
      quarter_end_year <- year[["quarters_t1"]]
      end_year_prop <- simulation[`Scenario ID` == i & brand == j & dev_quarter == quarter_end_year]
      start_year_prop <- simulation[`Scenario ID` == i & brand == j & dev_quarter == quarter_start_year]
      prop_this_year <- max(end_year_prop[["proportion"]] - start_year_prop[["proportion"]], 0)
      prop_per_year <- append(prop_per_year, prop_this_year)
    }
    result_temp <- data.table(`Scenario ID` = i, years = x, brand = j, proportion = prop_per_year)
    result <- rbind(result, result_temp)
  }
}
I considered filtering my data.table to keep only the rows where dev_quarter is a multiple of 4, but the for loops would remain.
How can I avoid them using data.table?
Thanks.
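The filtering idea above does not in fact need loops. A minimal sketch, assuming (as lookup_period does) that a year runs from one multiple-of-4 dev_quarter to the next; the data are made up, and mapping the rows back to calendar years with lookup_period is left out:

```r
library(data.table)
set.seed(1)
# hypothetical data: 1 scenario, 3 brands, 80 dev_quarters each (CJ sorts the rows)
simulation <- CJ(`Scenario ID` = 1, brand = 1:3, dev_quarter = 1:80)
simulation[, proportion := runif(.N)]

# keep only the year boundaries: dev_quarters that are multiples of 4
boundaries <- simulation[dev_quarter %% 4 == 0]
setorder(boundaries, `Scenario ID`, brand, dev_quarter)

# change since the previous boundary within each scenario/brand, clamped at 0
# like max(..., 0) in the loop; the first boundary of each group has no
# predecessor, so its change is NA
boundaries[, change := pmax(proportion - shift(proportion), 0),
           by = .(`Scenario ID`, brand)]
```

shift() lags within each by group, so no brand ever borrows a value from another brand.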

The absolute change in proportion between the 4th and the 1st quarter of each year can be calculated much more easily, with a single grouped call:
simulation[, year := 2002 + (dev_quarter-1) %/% 4] # Easier way to calculate the year
simulation[, .(change = last(proportion) - first(proportion)), by = c("Scenario ID", "brand", "year")]
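A self-contained run of this answer on made-up data with 2 scenarios and 3 brands. The data, and the optional pmax() clamp that mimics max(..., 0) from the question's loop, are assumptions added for illustration:

```r
library(data.table)
set.seed(42)
# hypothetical data: 2 scenarios x 3 brands x 80 quarters (20 years)
simulation <- CJ(`Scenario ID` = 1:2, brand = 1:3, dev_quarter = 1:80)
simulation[, proportion := runif(.N)]

# quarters 1-4 -> 2002, 5-8 -> 2003, ...
simulation[, year := 2002 + (dev_quarter - 1) %/% 4]

# one grouped call replaces the three nested loops
res <- simulation[, .(change = last(proportion) - first(proportion)),
                  by = c("Scenario ID", "brand", "year")]

# optional: clamp negative changes at 0, as max(..., 0) did in the loop
res[, change := pmax(change, 0)]
```

first() and last() rely on the rows being ordered by dev_quarter within each group, which CJ() guarantees here; with other data, sort first (e.g. with setorder()).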

Related

How to create multiple lagged variables over a grouped time series?

I've found the solution to this question for either grouped lags or lagging one group multiple times, but not both. For this example, I'd like to use the following dataframe:
df <- data.frame(doy = rep((1:2), each = 6), years = rep(2017:2022), var = rnorm(6))
How can I create multiple lags (lvar1, lvar2, lvar3) within each of the two groups, so that the 2017 row of group 2 does not get the 2022 value of group 1 as its lagged variable?
library(dplyr)
df <- data.frame(doy = rep((1:2), each = 6), years = rep(2017:2022), var = rnorm(6))
# function factory: returns a function that lags its input by i;
# force() fixes the value of i at the moment the function is created
fixer <- function(i) {
  force(i)
  function(x) dplyr::lag(x, i)
}
lags <- 1:3
lag_fn <- lapply(lags, fixer)  # unnamed list, so {.fn} below becomes 1, 2, 3
# group by the bare column name: group_by('doy') would group by a constant string
df %>%
  group_by(doy) %>%
  mutate(across(var, lag_fn, .names = "lvar{.fn}"))
BR
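With data.table (as in the first question), shift() accepts a vector of lag orders and returns one column per order, so all lags can be created per group in one call. A sketch on made-up data in which the two doy groups hold different values:

```r
library(data.table)
set.seed(1)
# hypothetical data: two doy groups, six years each, distinct values per group
df <- data.frame(doy = rep(1:2, each = 6), years = rep(2017:2022, times = 2), var = rnorm(12))
dt <- as.data.table(df)

lag_cols <- paste0("lvar", 1:3)
# n = 1:3 gives a list of three lagged vectors; by = doy restarts the lags in each group
dt[, (lag_cols) := shift(var, n = 1:3, type = "lag"), by = doy]
dt
```

The first k rows of each group get NA in lvark, so 2017 in group 2 never picks up group 1's 2022 value.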

How to create a summation function with data frame in R?

Just for fun, I am trying to create a basic savings calculator. My current code is:
value <- function(years,apr,initial,investment) {
  df <- as.data.frame(matrix(nrow = years, ncol = 2))
  colnames(df) <- c("year","value")
  df$year <- c(1:years)
  for (i in 1:years) {
    current_value <-(last_value+investment)*apr
  }
  #repeating calculation for the data frame
  print(df)
}
What I am trying to do is have the calculator create a table that displays the value each year. I've adapted my code from an old homework assignment, so I am not concerned with how to make the data frame. However, I do not know how to make the formula for the summation.
I am trying to model
Current Value = (Cumulative Value + Investment)*(Annual Percentage Rate)
For example, let's say the initial value is 10, the investment is 10, and the APR is 1.05:
(10+10)*(1.05)=21
(21+10)*(1.05)=32.55
(32.55+10)*(1.05)=44.68
and so on.
Year is there to number the rows accordingly.
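Written as a recurrence, with r the factor the question calls APR (1.05 here, i.e. 1 plus the rate) and I the yearly investment, the model unrolls to a closed form; the last line checks it against the example:

```latex
\begin{aligned}
V_0 &= \text{initial}, \qquad V_t = (V_{t-1} + I)\, r \\
V_t &= r^t V_0 + I \sum_{k=1}^{t} r^k = r^t V_0 + I\, r\, \frac{r^t - 1}{r - 1} \quad (r \neq 1) \\
V_3 &= 1.05^3 \cdot 10 + 10 \cdot 1.05 \cdot \frac{1.05^3 - 1}{0.05} = 11.57625 + 33.10125 = 44.6775
\end{aligned}
```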
We can use Reduce with accumulate = TRUE:
calc_fun <- function(years, apr, initial, investment) {
  # fold (value + investment) * apr over `years` periods, keeping every intermediate value
  value <- Reduce(function(x, y) (x + investment) * y, rep(apr, years), initial,
                  accumulate = TRUE)
  data.frame(year = 0:years, value)
}
calc_fun(3, 1.05, 10, 10)
# year value
#1 0 10.0000
#2 1 21.0000
#3 2 32.5500
#4 3 44.6775
Using a for loop, we can do:
calc_fun1 <- function(years, apr, initial, investment) {
  value <- numeric(years + 1)
  value[1] <- initial
  for (i in 1:years) value[i + 1] <- (value[i] + investment) * apr
  data.frame(year = 0:years, value)
}
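Because the recurrence is linear, it also has a closed form that needs neither Reduce nor a loop: value_t = apr^t * initial + investment * apr * (apr^t - 1) / (apr - 1). A vectorized sketch (calc_closed is a hypothetical name), with apr == 1 handled separately because the formula divides by apr - 1:

```r
calc_closed <- function(years, apr, initial, investment) {
  t <- 0:years
  # sum of apr^1 + ... + apr^t; equals t when apr == 1
  growth <- if (apr == 1) t else apr * (apr^t - 1) / (apr - 1)
  data.frame(year = t, value = apr^t * initial + investment * growth)
}

calc_closed(3, 1.05, 10, 10)
```

It gives the same values as calc_fun(3, 1.05, 10, 10), up to floating-point rounding.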

Adding a column to a data frame by calculating each value to be added

Good evening,
I asked a question earlier and found it hard to implement the solution, so I am going to re-ask it more clearly.
I want to add a column to a data frame of daily stock returns. Let's say the returns are normally distributed, and I would like to add a column that contains the historical value at risk, computed by a function I wrote myself.
The restriction is that the function should be applied to each observation together with the 249 observations before it.
So when the next observation is calculated, it should again use only the 249 days before it; the input window should move forward as time goes on. In other words, I want values from 251 days ago to be excluded. Hopefully I explained myself well enough; if not, maybe the code speaks for me:
df<- data.frame(Date=seq(ISOdate(2000,1,1), by = "days", length.out = 500), Returns=rnorm(500))
#function
VaR.hist<- function(x, n=250, hd=20, q=0.05){
width<-nrow(x)
NA.x<-na.omit(x)
quantil<-quantile(NA.x[(width-249):width],probs=q)
VaR<- quantil*sqrt(hd)%>%
return()
}
# Run the function on the dataframe
df$VaR<- df$Returns%>%VaR.hist()
Error in (width - 249):width : argument of length 0
This is the error I get instead of my new variable.
Thanks !!
As wibom wrote in the comments, nrow(x) does not work for vectors: it returns NULL, so (width-249):width fails with "argument of length 0". What you need is length() instead. Also, you do not need return() in the last line, as R automatically returns the value of the last expression in a function if there is no earlier return().
library(dplyr)
df<- data.frame(Date=seq(ISOdate(2000,1,1), by = "days", length.out = 500), Returns=rnorm(500))
#function
VaR.hist <- function(x, n=250, hd=20, q=0.05){
width <- length(x) # here you need length as x is a vector, nrow only works for data.frames/matrices
NA.x <- na.omit(x)
quantil <- quantile(NA.x[(width-249):width], probs = q)
quantil*sqrt(hd)
}
# Run the function on the dataframe
df$VaR <- df$Returns %>% VaR.hist()
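The error can be reproduced in a few lines; the vector here is arbitrary:

```r
x <- rnorm(10)
nrow(x)    # NULL: a plain vector has no dim attribute
length(x)  # 10
# NULL - 249 is numeric(0), and ":" cannot use a zero-length argument
res <- tryCatch((nrow(x) - 249):nrow(x), error = conditionMessage)
res
```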
It's a bit hard to understand what you want to do exactly.
My understanding is that you wish to compute a new variable VaR, calculated from the current and the previous 249 observations of df$Returns, right?
Is this roughly what you wish to do?
library(tidyverse)
set.seed(42)
df <- tibble(
Date = seq(ISOdate(2000, 1, 1), by = "days", length.out = 500),
Returns=rnorm(500)
)
the_function <- function(i, mydata, hd = 20, q = .05) {
r <-
mydata %>%
filter(ridx <= i, ridx > i - 250) %>% # current row and the 249 before it
pull(Returns)
quantil <- quantile(r, probs = q)
VaR <- quantil*sqrt(hd)
}
df <-
df %>%
mutate(ridx = row_number()) %>%
mutate(VaR = map_dbl(ridx, the_function, mydata = .))
If you are looking for a base-R solution:
set.seed(42)
df <- data.frame(
Date = seq(ISOdate(2000, 1, 1), by = "days", length.out = 500),
Returns = rnorm(500)
)
a_function <- function(i, mydata, hd = 20, q = .05) {
r <- mydata$Returns[mydata$ridx <= i & mydata$ridx > (i - 250)] # current row and the 249 before it
quantil <- quantile(r, probs = q)
VaR <- quantil*sqrt(hd)
}
df$ridx <- 1:nrow(df) # add index
df$VaR <- sapply(df$ridx, a_function, mydata = df)
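If every VaR should use exactly 250 observations (the current one plus the 249 before it), a base-R sketch can build all windows at once with embed(), which returns one window per row. rolling_var is a hypothetical helper name; the first 249 days get NA because no full window exists, and NA handling is not included:

```r
set.seed(42)
df <- data.frame(
  Date = seq(ISOdate(2000, 1, 1), by = "days", length.out = 500),
  Returns = rnorm(500)
)

rolling_var <- function(x, width = 250, hd = 20, q = 0.05) {
  # each row of embed(x, width) holds `width` consecutive values, newest first
  windows <- embed(x, width)
  v <- apply(windows, 1, quantile, probs = q) * sqrt(hd)
  # no complete window exists for the first width - 1 days
  c(rep(NA_real_, width - 1), unname(v))
}

df$VaR <- rolling_var(df$Returns)
```

The order inside each window does not matter for quantile(), so the reversed order produced by embed() is harmless.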

How to split a data frame by time interval

I have two data frames: the first holds the daily returns of 3 securities, the second the weights of the securities:
daily.return <- data.frame(date = seq.Date(from = as.Date("2015-01-01"),
by = "days",
length.out = 100),
a = runif(100,-0.1,0.1),
b = runif(100,-0.1,0.1),
c = runif(100,-0.1,0.1))
weights <- data.frame(startDate = c(as.Date("2015-01-01"),
as.Date("2015-02-10"),
as.Date("2015-03-15")),
endDate = c(as.Date("2015-02-09"),
as.Date("2015-03-14"),
as.Date("2015-04-10")),
a = c(0.3,0.5,0.2),
b = c(0.4,0.2,0.1),
c = c(0.3,0.3,0.7)
)
I know how to split a data frame by weeks, etc., after converting it to xts, but how can I split daily.return according to startDate and endDate in weights?
Suppose a fund holds these three securities: how do I calculate the fund's NAV and daily return?
This should do the job.
daily.return <- data.frame(date = seq.Date(from = as.Date("2015-01-01"),
by = "days",
length.out = 100),
a = runif(100,-0.1,0.1),
b = runif(100,-0.1,0.1),
c = runif(100,-0.1,0.1))
weights <- data.frame(startDate = c(as.Date("2015-01-01"),
as.Date("2015-02-10"),
as.Date("2015-03-15")),
endDate = c(as.Date("2015-02-09"),
as.Date("2015-03-14"),
as.Date("2015-04-10")),
a = c(0.3,0.5,0.2),
b = c(0.4,0.2,0.1),
c = c(0.3,0.3,0.7)
)
library(quantmod)
daily.xts <- as.xts(daily.return[,-1],daily.return[,1])
# Assuming that the total period is the same in both the data frames
weights.xts <- xts(matrix(NA,nrow(daily.xts),3),order.by=index(daily.xts))
names(weights.xts) <- c("a","b","c")
for (i in 1:nrow(weights)){
temp.inputs <- weights[i,]
temp.period <- paste(temp.inputs[,1],temp.inputs[,2],sep="/")
len <- nrow(weights.xts[temp.period])
weights.xts[temp.period,1:3] <- matrix(rep(as.numeric(temp.inputs[,3:5]),len),len,byrow=TRUE)
}
weighted.returns <- daily.xts * weights.xts
weighted.returns <- xts(rowSums(weighted.returns), order.by = index(weighted.returns))
names(weighted.returns) <- "Weighted Returns"
weighted.returns$Cumulative <- cumsum(weighted.returns)
plot(weighted.returns$Cumulative)
You can split daily.return according to the start and end dates in weights using apply, which performs a row-wise operation:
apply(weights, 1, function(x) daily.return[daily.return$date >= x[1]
& daily.return$date <= x[2], ])
This will give a list of 3 data frames, split according to the date ranges in weights.
EDIT
If I have understood correctly, you want each value in the column a, b, c of the daily.return to multiply with respective columns in the weights.
apply(weights, 1, function(x) {
A <- daily.return[daily.return$date >= x[1] & daily.return$date <= x[2], ]
t(t(A[, 2:4]) * as.numeric(x[3:5]))
}
)
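For the second part of the question (fund return and NAV), a base-R sketch: findInterval() picks, for each date, the weight row whose startDate is the latest one not after that date. This assumes the weight periods are contiguous and cover every date, as in the example, and that the NAV compounds simple daily returns from a hypothetical starting value of 1 (the xts answer's cumsum() adds the returns instead of compounding them):

```r
set.seed(1)
daily.return <- data.frame(date = seq.Date(from = as.Date("2015-01-01"), by = "days", length.out = 100),
                           a = runif(100, -0.1, 0.1),
                           b = runif(100, -0.1, 0.1),
                           c = runif(100, -0.1, 0.1))
weights <- data.frame(startDate = as.Date(c("2015-01-01", "2015-02-10", "2015-03-15")),
                      endDate   = as.Date(c("2015-02-09", "2015-03-14", "2015-04-10")),
                      a = c(0.3, 0.5, 0.2),
                      b = c(0.4, 0.2, 0.1),
                      c = c(0.3, 0.3, 0.7))

# row of `weights` in force on each date (latest startDate <= date)
idx <- findInterval(as.numeric(daily.return$date), as.numeric(weights$startDate))
w <- as.matrix(weights[idx, c("a", "b", "c")])
r <- as.matrix(daily.return[, c("a", "b", "c")])

fund <- data.frame(date = daily.return$date,
                   fund_return = unname(rowSums(r * w)))  # weighted daily return
# NAV compounding the daily returns, starting from a hypothetical 1
fund$nav <- cumprod(1 + fund$fund_return)
```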

Combining frequencies and summary statistics in one table?

I just discovered the power of plyr (frequency table with several variables in R) and I am still struggling to understand how it works; I hope someone here can help me.
I would like to create a table (data frame) in which I can combine frequencies and summary stats but without hard-coding the values.
Here is an example dataset:
require(datasets)
d1 <- sleep
# I classify the variable extra to calculate the frequencies
extraClassified <- cut(d1$extra, breaks = 3, labels = c('low', 'medium', 'high') )
d1 <- data.frame(d1, extraClassified)
The result I am looking for should look like this:
require(plyr)
ddply(d1, "group", summarise,
All = length(ID),
nLow = sum(extraClassified == "low"),
nMedium = sum(extraClassified == "medium"),
nHigh = sum(extraClassified == "high"),
PctLow = round(sum(extraClassified == "low")/ length(ID), digits = 1),
PctMedium = round(sum(extraClassified == "medium")/ length(ID), digits = 1),
PctHigh = round(sum(extraClassified == "high")/ length(ID), digits = 1),
xmean = round(mean(extra), digits = 1),
xsd = round(sd(extra), digits = 1))
My question: how can I do this without hard-coding the values?
For the record, I tried this code, but it does not work:
ddply (d1, "group",
function(i) c(table(i$extraClassified),
prop.table(as.character(i$extraClassified))),
)
Thanks in advance
Here's an example to get you started:
foo <- function(x,colfac,colval){
tbl <- table(x[,colfac])
res <- cbind(n = nrow(x),t(tbl),t(prop.table(tbl)))
colnames(res)[5:7] <- paste(colnames(res)[5:7],"Pct",sep = "")
res <- as.data.frame(res)
res$mn <- mean(x[,colval])
res$sd <- sd(x[,colval])
res
}
ddply(d1,.(group),foo,colfac = "extraClassified",colval = "extra")
Don't take anything in that function foo as gospel. I just wrote that off the top of my head. Surely improvements/modifications are possible, but at least it's something to start with.
Thanks to Joran.
I slightly modified your function to make it more generic (without referring to the positions of the variables).
require(plyr)
foo <- function(x,colfac,colval)
{
# table with frequencies
tbl <- table(x[,colfac])
# table with percentages
tblpct <- t(prop.table(tbl))
colnames( tblpct) <- paste(colnames(t(tbl)), 'Pct', sep = '')
# put the first part together
res <- cbind(n = nrow(x), t(tbl), tblpct)
res <- as.data.frame(res)
# add summary statistics
res$mn <- mean(x[,colval])
res$sd <- sd(x[,colval])
res
}
ddply(d1,.(group),foo,colfac = "extraClassified",colval = "extra")
and it works !!!
P.S.: I still do not understand what .(group) stands for, but

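For reference, .() is plyr's quoting function: .(group) names the group column as the splitting variable, so ddply(d1, .(group), ...) and ddply(d1, "group", ...) (both forms appear above) split d1 the same way. The same split-apply-combine can be sketched in base R with split(), lapply() and rbind(); summarise_group is a hypothetical helper modelled on foo:

```r
d1 <- sleep
d1$extraClassified <- cut(d1$extra, breaks = 3, labels = c("low", "medium", "high"))

summarise_group <- function(x, colfac, colval) {
  tbl <- table(x[[colfac]])  # counts for every factor level, even empty ones
  pct <- prop.table(tbl)
  counts <- setNames(as.vector(tbl), paste0("n", names(tbl)))
  pcts <- setNames(as.vector(pct), paste0(names(tbl), "Pct"))
  # t() turns each named vector into a one-row matrix, i.e. one column per level
  data.frame(group = x$group[1], n = nrow(x), t(counts), t(pcts),
             mn = mean(x[[colval]]), sd = sd(x[[colval]]))
}

# split: one data frame per group; apply: summarise each; combine: rbind the rows
res <- do.call(rbind, lapply(split(d1, d1$group), summarise_group,
                             colfac = "extraClassified", colval = "extra"))
res
```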