For example, I’m currently working on a function that allows you to see how much money you might have if you invested in the stock market. It’s currently using a loop structure, which is really irritating me, because I know there probably is a better way to code this and leverage vectors in R. I’m also creating dummy vectors before running the function, which seems a bit strange too.
Still a beginner at R (just started!), so any helpful guidance is highly appreciated!
set.seed(123)
##Initial Assumptions
initialinvestment <- 50000 # e.g., your starting investment is $50,000
monthlycontribution <- 3000 # e.g., every month you invest $3000
months <- 200 # e.g., how much you get after 200 months
##Vectors
grossreturns <- 1 + rnorm(months, .05, .15) # approximation of gross stock market returns
contribution <- rep(monthlycontribution, months)
wealth <- rep(initialinvestment, months + 1)
##Function
projectedwealth <- function(wealth, grossreturns, contribution) {
  for (i in 2:length(wealth))
    wealth[i] <- wealth[i-1] * grossreturns[i-1] + contribution[i-1]
  wealth
}
##Plot
plot(projectedwealth(wealth, grossreturns, contribution))
I would probably write
Reduce(function(w, i) w * grossreturns[i] + contribution[i],
       1:months, initialinvestment, accumulate = TRUE)
but that's my preference for using functionals. There is nothing wrong with your use of a for loop here.
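If you want to convince yourself that the two forms are interchangeable, a minimal sketch like the following compares them on the question's inputs. The names wealth_loop, wealth_reduce and all_equal are mine, chosen only for this illustration.

```r
set.seed(123)
initialinvestment   <- 50000
monthlycontribution <- 3000
months              <- 200
grossreturns <- 1 + rnorm(months, 0.05, 0.15)
contribution <- rep(monthlycontribution, months)

# Loop version, same recurrence as in the question
wealth_loop <- numeric(months + 1)
wealth_loop[1] <- initialinvestment
for (i in seq_len(months)) {
  wealth_loop[i + 1] <- wealth_loop[i] * grossreturns[i] + contribution[i]
}

# Functional version: Reduce carries the running wealth forward,
# and accumulate = TRUE keeps every intermediate value
wealth_reduce <- Reduce(
  function(w, i) w * grossreturns[i] + contribution[i],
  seq_len(months), initialinvestment, accumulate = TRUE
)

all_equal <- isTRUE(all.equal(wealth_loop, wealth_reduce))
all_equal
```

Both versions do the same arithmetic in the same order, so the choice between them is a matter of taste rather than correctness.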
I am struggling with a portion of the data analysis for some research I have carried out. Other researchers have used an equation to estimate population growth rate that I would like to implement, but I am hitting a wall with trying to do so. Below is the equation:
$$r' = \frac{\ln\left[\frac{1}{N_0}\sum_x A_x f(w_x)\right]}{D + \frac{\sum_x x\,A_x f(w_x)}{\sum_x A_x f(w_x)}}$$
Where N0 is the initial number of females in a cohort, Ax is the number of females emerging on day x, wx is a measure of mean female size on day x per replicate, f(wx) is a function relating fecundity to female size, and D is the time (in days) for a female to reproduce.
N0 (n=15) and D (7) are fixed numbers that I can put in the equation. f(wx) is a function that I have (y = 91.85x - 181.40). Below is a small sample of my data:
df <- data.frame(replicate = c('1','1','2','2','3','3','4','4'),
size = c(5.1, 4.9, 4.7, 4.6, 5.1,2.4,4.3,4.4),
day_emerging = c('6','7','6','7','6','8','7','6'))
I am sorry if this is a bad question for this site. I am just lost for how to handle this. I need R to be able to do the equation for different days. I'm not sure if that is actually possible with my current data format, because R will have to figure out how many females emerged on day x and then perform the other calculations for that day. So maybe this is impossible.
Thank you very much for any advice you can offer.
Here is a base R solution. Hope this is what you are after
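dfs <- split(df, df$day_emerging)
p <- sum(sapply(dfs, function(v) nrow(v) * f(mean(v$size))))
# as.character() first, so the actual days (6, 7, 8) are used even if day_emerging is a factor
q <- sum(sapply(dfs, function(v) nrow(v) * as.numeric(as.character(unique(v$day_emerging))) * f(mean(v$size))))
res <- log(p/n)/(D + q/p)
such that res is approximately 0.3565. (If day_emerging is a factor, which is the data.frame default before R 4.0, a plain as.numeric() returns the level codes 1, 2, 3 instead of the days 6, 7, 8, and the result comes out as 0.5676656.)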
DATA
n <- 15
D <- 7
f <- function(x) 91.85*x-181.4
df <- data.frame(replicate = c('1','1','2','2','3','3','4','4'),
size = c(5.1, 4.9, 4.7, 4.6, 5.1,2.4,4.3,4.4),
day_emerging = c('6','7','6','7','6','8','7','6'))
The answer to this is not particularly R-specific, but rather a skill in and of itself. What you want to do is translate a formal mathematical language into one that works in R (or Python, Matlab, etc.).
This is a skill that's worth developing. In Python-like pseudocode:
numerator = math.log((1 / n_0) * sum(A * f(w)))
denominator = D + (sum(X * A * f(w)) / sum(A * f(w)))
r_prime = numerator / denominator
As you can see, there are a lot of unknown variables that you'll have to set beforehand. Also, things like f(w) will need to be defined as helper functions earlier in the script so they can be used. In general, you just want to be able to break down your equation into small parts that you can verify are correct.
It very much helps to do some unit testing with these things - package the equation as a function (or set of small functions that you'll use together) and feed it data that you've run through the equation and verified in another way - by hand, or by a more familiar package. This way, you only have to worry about expressing it in the correct syntax and will know when you've gotten everything correct.
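As a rough sketch of that workflow in R: wrap the equation in a function and check it against a case simple enough to work out by hand. The function name r_prime and its argument names are hypothetical; f() is the fecundity function given in the question.

```r
# Fecundity as a function of female size (from the question)
f <- function(w) 91.85 * w - 181.4

# r_prime() is a hypothetical helper name.
# A: number of females emerging on each day
# w: mean female size on each day
# x: the emergence days
r_prime <- function(A, w, x, N0, D) {
  weighted <- A * f(w)
  log(sum(weighted) / N0) / (D + sum(x * weighted) / sum(weighted))
}

# Hand-checkable case: with a single emergence day the sums collapse to
# r' = log(A * f(w) / N0) / (D + x)
expected <- log(10 * (91.85 * 5 - 181.4) / 15) / (7 + 6)
stopifnot(isTRUE(all.equal(r_prime(A = 10, w = 5, x = 6, N0 = 15, D = 7), expected)))
```

Once the simple case passes, the same function can be fed per-day counts and mean sizes from the real data.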
I am working on some statistical simulation projects in R, based on "Introduction to Scientific Programming and Simulation Using R". In the student projects section (chapter 24) I am doing the "The pipe spiders of Brunswick" problem, but I am stuck on one part of an evolutionary algorithm, where you need to perturb some data according to the sentence below:
"With probability 0.5 each element of the vector is perturbed, independently
of the others, by an amount normally distributed with mean 0 and standard
deviation 0.1"
What does being "perturbed" really mean here? I don't really know which operation I should apply to my vector to make this perturbation happen, and I'm not finding any answers to this problem.
Thanks in advance!
# using the most important features, we create a ML model:
m1 <- lm(PREDICTED_VALUE ~ PREDICTER_1 + PREDICTER_2 + PREDICTER_N )
#summary(m1)
#anova(m1)
# after creating the model, we perturb as follows:
#install.packages("perturb") #install the package
library(perturb)
set.seed(1234) # for same results each time you run the code
p1_new <- perturb(m1, pvars = c("PREDICTER_1", "PREDICTER_N"), prange = c(1, 1), niter = 200) # you can change the number of iterations to any value n; the total number of iterations then comes to n + 1
p1_new # check the values of p1_new
summary(p1_new)
Perturbing just means adding a small, noisy shift to a number. Your code might look something like this.
x = sample(10, 10)
ind = rbinom(length(x), 1, 0.5) == 1
x[ind] = x[ind] + rnorm(sum(ind), 0, 0.1)
rbinom gets the elements to be modified with probability 0.5 and rnorm adds the perturbation.
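If it helps, the same idea can be wrapped in a small function and sanity-checked on a long vector of zeros. This is only a sketch: perturb_vec and its argument names are made up for illustration and do not come from the book.

```r
# perturb_vec() is a hypothetical helper name
perturb_vec <- function(x, prob = 0.5, sd = 0.1) {
  hit <- runif(length(x)) < prob                         # each element is chosen independently
  x[hit] <- x[hit] + rnorm(sum(hit), mean = 0, sd = sd)  # add N(0, sd) noise to the chosen ones
  x
}

set.seed(1)
x <- rep(0, 10000)
y <- perturb_vec(x)
mean(y != 0)    # share of perturbed elements, should be close to 0.5
sd(y[y != 0])   # spread of the perturbations, should be close to 0.1
```

Starting from zeros makes the check easy, since any non-zero element must have been perturbed.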
I am looking for a function, or package, that will help me with this goal. I've looked through several packages but can't find what I am looking for:
Let's say I have an xts object with 10 columns and 250 rows.
What I want to do is run a simulation, such that I get a robust calculation of my performance metric over the period.
So, let's say that I have 250 data points. I want to run x simulations over random samples of the data, computing the Sharpe ratio with PerformanceAnalytics::SharpeRatio, varying the sample lengths from 30 to 240, and then find the average. Keep in mind I want to do this for every column, and I'd rather not have to use apply if possible. I'd also like to find something that processes the information rather quickly.
What package or functions would best serve this purpose?
Thank you!
Subsetting xts objects for the rows you want to randomly sample should be good enough performance-wise, if that is your main concern. If you want some other concrete examples, you may find it useful to look at the Monte Carlo simulation functions recently added to the R blotter package:
https://github.com/braverock/blotter/blob/master/R/mcsim.R
Your requirements are quite detailed and a little tricky to follow, but I think this example may be what you're after?
This solution does use apply functions though! Because it just makes life easier. If you don't use lapply, the code will expand quickly and distract from achieving the goal quickly (and you risk introducing bugs with longer, messier code; one reason to use apply family functions where you can).
library(quantmod)
library(PerformanceAnalytics)
# Set up the data:
syms <- c("GOOG", "FB", "TSLA", "SNAP", "MU")
getSymbols(syms)
z <- do.call(merge, lapply(syms, function(s) {
  x <- get(s)
  dailyReturn(Cl(x))
}))
# Here we have 250 rows, 5 columns:
z <- tail(z, 250)
colnames(z) <- paste0(syms, ".rets")
subSample <- function(x, n.sub = 40) {
  # Assuming subsampling by row, preserving all returns and cross symbol dependence structure at a given timestamp
  ii <- sample(1:NROW(x), size = n.sub, replace = FALSE)
  # sort in order to preserve time ordering?
  ii <- sort(ii)
  xs <- x[ii, ]
  xs
}
set.seed(5)
# test:
z2 <- subSample(z, n.sub = 40)
zShrp <- SharpeRatio(z2)[1, ]
# now run simulation:
nSteps <- seq(30, 240, by = 30)
sharpeSimulation <- function(x, n.sub) {
  x <- subSample(x, n.sub)
  SharpeRatio(x)[1, ]
}
res <- lapply(nSteps, FUN = sharpeSimulation, x = z)
res <- do.call(rbind, res)
resMean <- colMeans(res)
resMean
# GOOG.rets FB.rets TSLA.rets SNAP.rets MU.rets
# 0.085353854 0.059577882 0.009783841 0.026328660 0.080846592
Do you realise that SharpeRatio uses sapply? And it's likely other performance metrics you want to use will as well. Since you seem to have something against apply (possibly all apply functions in R), this might be worth noting.
I've been trying hard to recreate this model in R:
Model
(FARHANI 2012)
I've tried many things, such as a cumsum paste - however that would not work as I could not assign strings the correct variable as it kept thinking that L was a function.
I tried to do it manually, I'm only looking for p,q = 1,2,3,4,5 however after starting I realized how inefficient this is.
This is essentially what I am trying to do
model5 <- vector("list",20)
#p=1-5, q=0
model5[[1]] <- dynlm(DLUSGDP~L(DLUSGDP,1))
model5[[2]] <- dynlm(DLUSGDP~L(DLUSGDP,1)+L(DLUSGDP,2))
model5[[3]] <- dynlm(DLUSGDP~L(DLUSGDP,1)+L(DLUSGDP,2)+L(DLUSGDP,3))
model5[[4]] <- dynlm(DLUSGDP~L(DLUSGDP,1)+L(DLUSGDP,2)+L(DLUSGDP,3)+L(DLUSGDP,4))
model5[[5]] <- dynlm(DLUSGDP~L(DLUSGDP,1)+L(DLUSGDP,2)+L(DLUSGDP,3)+L(DLUSGDP,4)+L(DLUSGDP,5))
I'm also trying to do this for regressing DLUSGDP on DLWTI (my oil variable's name) for when p=0, q=1-5 and also p=1-5, q=1-5
cumsum would not work as it would sum the variables rather than treating them as independent regressors.
My goal is to run these models and then use IC to determine which should be analyzed further.
I hope you understand my problem and any help would be greatly appreciated.
I think this is what you are looking for:
reformulate(paste0("L(DLUSGDP,", 1:n,")"), "DLUSGDP")
where n is some order you want to try. For example,
n <- 3
reformulate(paste0("L(DLUSGDP,", 1:n,")"), "DLUSGDP")
# DLUSGDP ~ L(DLUSGDP, 1) + L(DLUSGDP, 2) + L(DLUSGDP, 3)
Then you can construct your model fitting by
library(dynlm)
model5 <- vector("list", 20)
for (i in 1:20) {
  # build DLUSGDP ~ L(DLUSGDP, 1) + ... + L(DLUSGDP, i)
  form <- reformulate(paste0("L(DLUSGDP,", 1:i, ")"), "DLUSGDP")
  model5[[i]] <- dynlm(form)
}
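The same trick extends to the p and q combinations mentioned in the question. The following is only a sketch of how the formulas could be generated: make_formula is a hypothetical helper, and I am assuming the oil lags start at lag 1 (adjust if the model also includes a contemporaneous DLWTI term). It only builds the formulas, so it runs without dynlm or the data.

```r
# make_formula() is a hypothetical helper; DLUSGDP and DLWTI are the
# variable names used in the question
make_formula <- function(p, q) {
  terms <- c(
    if (p > 0) paste0("L(DLUSGDP, ", seq_len(p), ")"),
    if (q > 0) paste0("L(DLWTI, ", seq_len(q), ")")
  )
  reformulate(terms, response = "DLUSGDP")
}

# All combinations of p = 0..5 and q = 0..5, minus the empty model p = q = 0
grid  <- expand.grid(p = 0:5, q = 0:5)
grid  <- grid[grid$p + grid$q > 0, ]
forms <- Map(make_formula, grid$p, grid$q)

length(forms)        # 35 candidate formulas
make_formula(2, 1)   # print one of them to inspect it
```

Each formula could then be passed to dynlm() in a loop or with lapply(), and the fits compared with AIC() or BIC(), assuming both series are available as time series objects in your workspace.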
I am looking for a way to quickly calculate realized volatility on a rolling FORWARD looking basis. So I want to calculate the standard deviation using today as the first observation for the next n days.
At the moment, I calculate realized volatility in the backward direction with the following code:
index.realized <- xts(apply(index.ret,2,runSD,n=125), index(index.ret))*sqrt(252)
index.realized <- na.locf(index.realized, fromLast=TRUE)
I tried setting n = -125 but unsurprisingly, that doesn't work.
Thank you.
EDIT
To clarify what I am trying to do, here is the for loop I am using to accomplish this:
for (i in 1:nrow(index.ret)) {
  bear.realized[i,] = sd(bear.ret[i:(i+124),]) * sqrt(252)
  index.realized[i,] = sd(index.ret[i:(i+124),]) * sqrt(252)
}
For the last 124 observations where I don't have enough data to compute the volatility, I want it to take the last "correct" calculation and use it for the rest of the series.
One way to do it is to "lag" your series with a negative k (note that k is interpreted differently in lag.xts than lag.ts and lag.zoo).
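library(quantmod)  # provides getSymbols() and Cl(), and loads TTR (ROC, runSD) and xts
getSymbols("SPY")
spy <- ROC(Cl(SPY))
# note that k is interpreted differently from lag.ts and lag.zoo
# k = -(n - 1) = -124 so that each 125-day window starts with today's return
spy$SPY.Lag <- lag(spy, -124)
# remove leading and trailing NA
spy <- na.omit(spy)
rv <- runSD(spy$SPY.Lag, n = 125) * sqrt(252)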
OK I solved it. It's actually quite simple, was just thinking about this the completely wrong way.
index.realized <- xts(apply(index.ret,2,runSD,n=125), index(index.ret))*sqrt(252)
index.realized <- lag(index.realized, -124)
index.realized <- na.locf(index.realized)
Just calculate the realized volatility as per normal, and then lag it by the appropriate number so that it is "forward looking".
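For anyone who wants to check the shift, here is a small base R sketch on simulated returns (no xts or TTR needed). The data and all variable names are made up for illustration. It confirms that shifting the backward-looking rolling SD by n - 1 periods gives the same values as computing the forward-looking SD directly.

```r
set.seed(42)
ret <- rnorm(300, 0, 0.01)   # simulated daily returns, illustration only
n   <- 125
N   <- length(ret)

# Backward-looking rolling SD: the value at t uses ret[(t - n + 1):t]
back <- rep(NA_real_, N)
for (t in n:N) back[t] <- sd(ret[(t - n + 1):t])

# Shift n - 1 steps earlier, so the value at t uses ret[t:(t + n - 1)]
fwd_shift <- c(back[n:N], rep(NA_real_, n - 1))

# Direct forward-looking computation for comparison
fwd_direct <- rep(NA_real_, N)
for (t in 1:(N - n + 1)) fwd_direct[t] <- sd(ret[t:(t + n - 1)])

all.equal(fwd_shift, fwd_direct)
```

The trailing n - 1 values are NA in both versions. In the xts code above those are the values that na.locf fills with the last available calculation.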