Function input not recognised - local & global environment issue - r

I am writing a function to group together actions I regularly take on time series data. I have included all libraries I am using in the script as I think my issue may be to do with plyr / dplyr being (rightly) super specific about the environment of each variable.
The first function works great, but when getting to the second one, R doesn't recognise the input as 'x', but spits out the error: 'Error in eval(predvars, data, env) : object 'x' not found.'
Why is this happening?
library(plyr)
library(dplyr)
library(caret)
library(xts)
library(forecast)
library(imputeTS)
library(lubridate)
x1 = arima.sim(list(order = c(0,1,0)), n = 119)
timetrend <- function(x, starts, ends, frequency) {
y <- list()
y[[1]] <- ts(x, start = starts, end = ends, frequency = frequency)
y[[2]] <- decompose(y[[1]])
y[[3]] <- y[[1]] - y[[2]]$seasonal - y[[2]]$random
return(y)
}
plottime <- function(x) { #takes a timetrend list as input
t <- tslm(x[[3]] ~ trend)
plot(x[[3]])
lines(t$fitted.values)
return(t)
}
# use functions from here
result <- timetrend(x = x1,
starts = c(2000, 01, 01), ends = c(2009, 12, 01), frequency = 12)
plottime(x = result)

I could make it work with the following code.
plottime <- function(x) { #takes a timetrend list as input
y=x[[3]]
t <- tslm(formula = y ~ trend)
plot(x[[3]])
lines(t$fitted.values)
return(t)
}
I'm not sure why this happens; maybe indexing with x[[3]] inside the formula argument is the problem?
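One way to sidestep formula-environment problems altogether is to build the trend regressor explicitly and pass a data frame to lm(), so every variable named in the formula is looked up in that data frame. The sketch below is a base-R stand-in and is not what tslm does internally; fit_trend and the simulated series are illustrative names, not part of the original code.

```r
# Minimal sketch (assumption: a plain linear trend is all that is needed).
set.seed(1)
y <- ts(cumsum(rnorm(120)), start = c(2000, 1), frequency = 12)

fit_trend <- function(series) {
  # Put the response and an explicit 1..n trend index into one data frame,
  # so the formula never refers to objects in the calling environment.
  d <- data.frame(value = as.numeric(series), trend = seq_along(series))
  lm(value ~ trend, data = d)
}

fit <- fit_trend(y)
plot(y)
# Rebuild a ts with the same time axis so the fitted line overlays correctly.
lines(ts(fitted(fit), start = start(y), frequency = frequency(y)))
```

Because the formula only mentions columns of d, the function behaves the same no matter what its argument is called.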

Related

Solution for filter() not working within a For Loop?

I'm trying to fit a von Bertalanffy growth function (VBGF) in R to my data grouped by a serial number.
This is a snippet of my data:
Serial_No<- c(315,315,315,315,315,315,315,316,316,316,316,317,317,317,317,317,317,317,317,317,318,318,318,318,319,319,319,319)
Year<-c(1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945)
tl<-c(19,33,46,55,63,66,70,22,39,55,65,20,40,51,56,60,62,63,64,65,26,43,54,60,28,43,53,61)
age<-c(1,2,3,4,5,6,7,1,2,3,4,1,2,3,4,5,6,7,8,9,1,2,3,4,1,2,3,4)
df<-data.frame(Serial_No, Year, tl, age)
I've been following this example: https://www.r-bloggers.com/2020/01/von-bertalanffy-growth-plots-ii/
and have changed my code to as follows:
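library(FSA)   # vbFuns(), vbStarts(), as in the linked example
library(car)   # Boot()
library(dplyr) # filter(), group_by(), summarize()
vb <- vbFuns()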
predict2 <- function(x) predict(x,data.frame(age=ages))
agesum <- group_by(df,Serial_No) %>%
summarize(minage=min(age),maxage=max(age))
Serial_Nos <- unique(df$Serial_No)
nSerial_Nos <- length(Serial_Nos)
cfs <- cis <- preds1 <- preds2 <- NULL
for (i in 1:nSerial_Nos) {
cat(Serial_Nos[i],"Loop\n")
tmp1 <- filter(df,Serial_No==Serial_Nos[i])
sv1 <- vbStarts(tl~age,data=tmp1)
fit1 <- nls(tl~vb(age,Linf,K,t0),data=tmp1,start=sv1)
cfs <- rbind(cfs,coef(fit1))
boot1 <- Boot(fit1)
tmp2 <- confint(boot1)
cis <- rbind(cis,c(tmp2["Linf",],tmp2["K",],tmp2["t0",]))
ages <- seq(-1,16,0.2)
boot2 <- Boot(fit1,f=predict2)
tmp2 <- data.frame(Serial_No=Serial_Nos[i],age=ages,
predict(fit1,data.frame(age=ages)),
confint(boot2))
preds1 <- rbind(preds1,tmp2)
tmp2 <- filter(tmp2,age>=agesum$minage[i],age<=agesum$maxage[i])
preds2 <- rbind(preds2,tmp2)
}
The code runs, but the VBGF results returned are the same for every serial number, which can't be right. I think the filter function is not working in the code above.
I've searched for solutions but can't get it to work.
If anyone can help or knows of a solution, I'd really appreciate it.
Thank you in advance
Model fit with package growthrates
The following post describes an alternative approach without for-loop and filter. Similar loop-free solutions can be implemented using the common nls function and lapply in "base" R or group_by in "tidyverse".
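As a rough illustration of the base-R route mentioned above, the sketch below splits the data by Serial_No and fits nls to each piece with lapply. The start values and the helper names vb_fun and fit_one are assumptions chosen for this example, and a failed fit is returned as NA rather than stopping the run.

```r
# Data of the OP
df <- data.frame(
  Serial_No = c(315,315,315,315,315,315,315,316,316,316,316,317,317,317,317,
                317,317,317,317,317,318,318,318,318,319,319,319,319),
  tl  = c(19,33,46,55,63,66,70,22,39,55,65,20,40,51,56,60,62,63,64,65,26,43,
          54,60,28,43,53,61),
  age = c(1,2,3,4,5,6,7,1,2,3,4,1,2,3,4,5,6,7,8,9,1,2,3,4,1,2,3,4)
)

# Typical von Bertalanffy parameterization
vb_fun <- function(age, Linf, K, t0) Linf * (1 - exp(-K * (age - t0)))

# Fit one group; return NA coefficients if nls does not converge
fit_one <- function(d) {
  tryCatch(
    coef(nls(tl ~ vb_fun(age, Linf, K, t0), data = d,
             start = list(Linf = 80, K = 0.3, t0 = 0))),
    error = function(e) c(Linf = NA_real_, K = NA_real_, t0 = NA_real_)
  )
}

# One row of coefficients per serial number
cfs <- do.call(rbind, lapply(split(df, df$Serial_No), fit_one))
cfs
```

Each group is subset by split() before fitting, so no filter() call or loop index is involved.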
Model definition
The growthrates package does not contain a von Bertalanffy function, so it has to be provided as a user-supplied model, as described in the package vignette. Here I borrowed the function from package FSA and adapted it accordingly:
library("growthrates")
grow_von_bert <- function(time, parms) {
with(as.list(parms), {
y <- Linf * (1 - exp(-K * (time - t0)))
as.matrix(data.frame(time = time, y = y))
})
}
Test of the model with a single example
p <- c(t0=5, Linf=10, K=.1)
time <- seq(5, 100)
plot(grow_von_bert(time, p), type="l")
Fit of a single data example
It is always a good idea to fit one or more single examples first, before doing this for all.
df1 <- subset(df, Serial_No == 315)
fit1 <- fit_growthmodel(df1$age, df1$tl,
FUN = grow_von_bert, p=c(t0=0, Linf=70, K=0.1))
summary(fit1)
Fit of all data sets
This can be done in a loop or with appropriate tidyverse functions, while package growthrates already has such a function built in, so all models can be fitted with a single function call. It is of course necessary to specify good start parameters, either the same for all curves or individual parameter sets, depending on the quality of the data. Here is the complete code including the data of the OP:
library("growthrates")
df <- data.frame(
Serial_No = factor(c(315,315,315,315,315,315,315,316,316,316,316,317,317,317,317,
317,317,317,317,317,318,318,318,318,319,319,319,319)),
year = c(1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,
1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945,1945),
tl = c(19,33,46,55,63,66,70,22,39,55,65,20,40,51,56,60,62,63,64,65,26,43,54,60,28,
43,53,61),
age = c(1,2,3,4,5,6,7,1,2,3,4,1,2,3,4,5,6,7,8,9,1,2,3,4,1,2,3,4)
)
grow_von_bert <- function(time, parms) {
with(as.list(parms), {
y <- Linf * (1 - exp(-K * (time - t0)))
as.matrix(data.frame(time = time, y = y))
})
}
fit <- all_growthmodels(tl ~ age | Serial_No,
data=df,
FUN = grow_von_bert,
p=c(t0=0, Linf=70, K=0.1))
results(fit)
par(mfrow=c(2,3))
plot(fit, las=1)

R loop for univariate rolling window on multiple variables in DF

I'm sure this is obvious, but I'm a beginner in R and I spent a good part of the morning trying to solve this...
So, I have a DF of 24 time series with 3410 observations each. I want to create a loop that runs a rolling-window regression. I want to regress each of the independent variables DF[,2:24] on the dependent variable DF[,1] in a rolling window and extract the evolution of the R2 over time.
I tried the following code but it isn't working:
resultsList <- list()
for (i in 1:ncol(var)) {
lmfit <- roll_regres(var[,1]~var[,i], var, 126, do_compute = c("sigmas", "r.squareds"))
resultsList[[i]] <- summary(lmfit)
}
I get the following error:
Error in roll_cpp(Y = y, X = x, window = width, do_compute_R_sqs = do_compute_R_sqs, :
'dchdd' failed with code 1
I also tried the following code using rollapply but it isn't working either:
resultsList <- list()
for (i in 1:ncol(var)) {
lmfit <- rollapply(var, width = 126, FUN = function(x) lm(var[, i] ~ var[,1]), by =1, align = "left")
resultsList[[i]] <- summary(lmfit)
}
I get the following error:
Error in zoo(rval, index(x)[i]) :
“x” : attempt to define invalid zoo object
If anyone could help me with this I would be very grateful.
Thank you in advance for any help you can provide.
I solved the problem; here is the code:
library(rollRegres) # roll_regres()
library(TSstudio)   # ts_plot()
#Loop RW 6 months
resultsList6M <- list()
for (i in 1:ncol(var)) {
lmfit <- roll_regres(var[,i]~cu, var, 126,do_compute = c("r.squareds"))
resultsList6M[[i]] <- lmfit["r.squareds"]
}
RS16m <- ts(as.numeric(as.character(unlist(resultsList6M[[1]]["r.squareds"]))),start = c(2007, 5), frequency = 252)
ts_plot(RS16m,
title = "126 Days RW, CADUSD ~ EURUSD ",
Xtitle = "Time",
Ytitle = "RSQUARED",
width = 1)
Now I can do a graphical analysis of the R2 over time. I plan to repeat the rolling window for 6 months, 1 year and 3 years.
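For readers without the rollRegres package, the same idea can be sketched in base R with a plain loop over window end points. This is a slow illustrative version on simulated data, not the implementation used above; rolling_r2 and the simulated series are hypothetical names.

```r
# Rolling-window R^2 of y ~ x; positions before the first full window are NA.
rolling_r2 <- function(y, x, width) {
  n <- length(y)
  out <- rep(NA_real_, n)
  for (end in width:n) {
    idx <- (end - width + 1):end          # rows in the current window
    out[end] <- summary(lm(y[idx] ~ x[idx]))$r.squared
  }
  out
}

set.seed(1)
x <- cumsum(rnorm(300))       # simulated regressor
y <- 0.5 * x + rnorm(300)     # simulated response
r2 <- rolling_r2(y, x, width = 126)
plot(r2, type = "l", xlab = "Time", ylab = "R squared")
```

Looping over several regressors then amounts to calling rolling_r2 once per column and storing each result in a list.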

Value-at-Risk (Extreme-Value Theory) using Monte Carlo Simulation in R

I have code that successfully calculates VaR based on Extreme Value Theory using historical data. I'm trying to run this same code on multiple simulated price paths (i.e. calculating a VaR for each path) and then taking the median or average of those VaRs.
Every example I could find online had the simulation function return the price at the end of the period and then replicated the function X many times. That makes sense to me, except that I essentially need to calculate value-at-risk for each simulated path. Below is the code I have so far. I can say that the code works when using historical data (i.e. the "evt" function works fine and the data table is populated correctly when the lossOnly, u, and evtVar lines aren't in a function). However, I've been trying to implement simulation in the second function and trying various combinations, which have all failed.
library('RODBC')
library('nor1mix')
library('fExtremes')
library('QRM')
library('fGarch')
#function for computing the EVT VaR
evt <- function(data,u){
#fit excess returns to gpd to get estimates
gpdfit = tryCatch({
gpdfit <- gpdFit(data,u,type="mle")
}, warning = function(w) {
gpdfit <- gpdFit(data,u,type="mle",optfunc="nlminb")
return(gpdfit)
}, error = function(e) {
gpdfit <- gpdFit(data,u,type="pwm",optfunc="nlminb")
return(gpdfit)
}, finally = {})
#now calculate VaRs
xi <- gpdfit@fit$par.ests["xi"]
beta <- gpdfit@fit$par.ests["beta"]
Nu <- length(gpdfit@data$exceedances)
n <- length(data)
evtVar95 <- (u+((beta/xi)*(((n/Nu)*.05)^(-xi) - 1.)))*100
evtVar99 <- (u+((beta/xi)*(((n/Nu)*.01)^(-xi) - 1.)))*100
evtVar997 <- (u+((beta/xi)*(((n/Nu)*.003)^(-xi) - 1.)))*100
evtVar999 <- (u+((beta/xi)*(((n/Nu)*.001)^(-xi) - 1.)))*100
#return calculations
return(cbind(evtVar95,evtVar99,evtVar997,evtVar999,u,xi,beta,Nu,n))
}
#data <- read.table("pricedata.txt")
prices <- data$V1
returns <- diff(log(prices)) #or returns <- log(prices[-1]/prices[-n])
xi <- mean(returns)
std <- sd(returns)
N <- length(prices)
lstval <- prices[N]
options(scipen = 999)
p <- c(lstval, rep(NA, N-1))
gen.path <- function(){
N <- length(prices)
for(i in 2:N)
p[i] <- p[i-1] * exp(rnorm(1, xi, std))
# plot(p, type = "l", col = "brown", main = "Simulated Price")
#evt calculation
#first get only the losses and then make them absolute
lossOnly <- abs(p[p<0])
#get threshold
u <- quantile(lossOnly, probs = 0.9, names=FALSE)
evtVar <- evt(lossOnly,u)
return(evtVar)
}
runs <- 10
sim.evtVar <- replicate(runs, gen.path())
evtVar <- mean(sim.evtVar)
#add data to total table
VaR <- c(evtVar[1],evtVar[2],evtVar[3],evtVar[4],evtVar[5],evtVar[6],evtVar[7],evtVar[8],evtVar[9])
DF <- data.frame(VaR, row.names=c("evtVar95","evtVaR_99","evtVaR_997","evtVaR_999","u","xi","beta","Nu","n"))
In short, I'm trying to run the value-at-risk function (first function) within the Monte Carlo function (second function) and put the average simulated values into a data table. I know the first function works, but it's the second function that's driving me crazy. These are the errors I'm getting:
> sim.evtVar <- replicate(runs, gen.path())
Error in if (xi > 0.5) { : missing value where TRUE/FALSE needed
Called from: .gpdpwmFit(x, u)
Browse[1]> evtVar <- mean(sim.evtVar)
Error during wrapup: object 'sim.evtVar' not found
Browse[1]>
> #add data to total table
> VaR <- c(evtVar[1],evtVar[2],evtVar[3],evtVar[4],evtVar[5],evtVar[6],evtVar[7],evtVar[8],evtVar[9])
Error: object 'evtVar' not found
> DF <- data.frame(VaR, row.names=c("evtVar95","evtVaR_99","evtVaR_997","evtVaR_999","u","xi","beta","Nu","n"))
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ""function"" to a data.frame
Any help you can provide is greatly appreciated! Thank you in advance!
I think the problem is this line:
lstval <- prices[N]
A stock price can never be negative, so the simulated path p never drops below zero and this line in your function produces an empty vector:
lossOnly <- abs(p[p<0])
You could try this instead:
lstval <- min(returns)
if you want the most negative return in your dataset.
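The empty-vector effect is easy to reproduce on a simulated path. The sketch below also shows one alternative reading of the intent, which is an assumption rather than the fix proposed above: take the losses from the log returns of the simulated path instead of from the prices. The drift, volatility and starting price are made-up values.

```r
set.seed(42)
mu <- 0.0005; sigma <- 0.01; n <- 1000   # illustrative parameters

# Simulated price path: always strictly positive
p <- 100 * exp(cumsum(c(0, rnorm(n - 1, mu, sigma))))
length(p[p < 0])                          # no negative prices to select

# Losses taken from the path's log returns instead of its prices
r <- diff(log(p))
lossOnly <- -r[r < 0]                     # positive loss magnitudes
u <- quantile(lossOnly, probs = 0.9, names = FALSE)
```

With a non-empty lossOnly, the threshold u is a finite number and can be passed on to the GPD fit.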

Vectorize window.zoo over start= and end=

I have input data that look like this (reduced to two time-series for the example).
library(zoo)
begin <- as.Date(c('2003-02-12', '2003-01-23'))
end <- as.Date(c('2003-10-02', '2003-08-01'))
x.Date <- as.Date("2003-01-01") + seq(1, 365, 8) - 1
data <- matrix(rnorm(length(x.Date)*2), ncol = 2, dimnames = list(r = NULL, col = c('a', 'b')))
I'm trying to write a function that, for each time-series (x[,i]), averages the values for a window defined by begin[i] and end[i].
fun <- function(data, begin, end, dates) {
x <- zoo(data, dates)
xSub <- window(x, start = begin, end = end)
colMeans(xSub, na.rm = TRUE)
}
The function above (or a slightly modified version) works if a single time-series is provided, but is not properly vectorized over begin and end. Any idea how I could make this work?
# Slightly modified version working for single time-series
fun2 <- function(data, begin, end, dates) {
x <- zoo(data, dates)
xSub <- window(x, start = begin, end = end)
mean(xSub, na.rm = TRUE)
}
fun2(data[,1], begin[1], end[1], x.Date) # OK
fun(data, begin, end, x.Date) # Same window is used for both time-series
The function should reproduce the behaviour of this loop.
out <- c()
for(i in 1:ncol(data)) {
x <- zoo(data[,i], x.Date)
xSub <- window(x, start = begin[i], end = end[i])
out <- c(out, mean(xSub))
}
Thanks,
Loïc
Create the zoo object to be used, convert it to a list of zoo objects and Map (or mapply) over it.
z <- zoo(data, x.Date)
Map(window, as.list(z), start = begin, end = end)
Note that the key is to use as.list, not list.
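To get from the list of windows to one mean per series, the Map result can be passed to sapply. The sketch below repeats the question's data with a fixed seed and checks the result against a plain loop; the names wins, means and loop_means are illustrative.

```r
library(zoo)

set.seed(123)
begin <- as.Date(c("2003-02-12", "2003-01-23"))
end <- as.Date(c("2003-10-02", "2003-08-01"))
x.Date <- as.Date("2003-01-01") + seq(1, 365, 8) - 1
data <- matrix(rnorm(length(x.Date) * 2), ncol = 2,
               dimnames = list(NULL, c("a", "b")))

z <- zoo(data, x.Date)
# One univariate series per list element, each cut to its own window
wins <- Map(window, as.list(z), start = begin, end = end)
means <- sapply(wins, function(w) mean(coredata(w), na.rm = TRUE))

# Reference: the same means computed with plain indexing
loop_means <- numeric(ncol(data))
for (i in seq_len(ncol(data))) {
  idx <- x.Date >= begin[i] & x.Date <= end[i]
  loop_means[i] <- mean(data[idx, i])
}
all.equal(unname(means), loop_means)
```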
mapply is probably the best way to do it.
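fun <- function(data, begin, end, dates) {
x <- zoo(data, dates)
# one window of the full series per (begin, end) pair; keep the result a list
step1 <- mapply(window, start=begin, end=end, MoreArgs=list(x=x), SIMPLIFY=FALSE)
# column i holds the column means within window i, so the diagonal
# pairs each series with its own window (assumes two or more series)
diag(sapply(step1, colMeans, na.rm=TRUE))
}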
An alternate answer that really shows how a vectorized solution can do anything a for loop does.
fun <- function(data, begin, end, dates) {
x <- zoo(data, dates)
paircount <- seq_along(begin)
sapply(paircount, function(i) mean(window(x[,i], start=begin[i], end=end[i]), na.rm=TRUE))
}

R: Error in xts - order.by

I am trying to (re)build a basic prediction model of the S&P 500 INDEX (data originates from Yahoo finance).
I ran into some difficulties with the "ordering" of my data set.
During the build of data.model the following error occurs
Error in xts(new.x, x.index) : NROW(x) must match length(order.by)
After some research I realize that the problem is with the ordering, and it seems to lack ordering as is required for the underlying zoo package.
Is there an elegant way to solve this issue?! Thanks in advance
library(xts)
library(tseries)
library(quantmod)
GSPC <- as.xts(get.hist.quote("^GSPC",start="1970-01-02",
quote=c("Open", "High", "Low", "Close","Volume","AdjClose")))
head(GSPC)
T.ind <- function(quotes, tgt.margin = 0.025, n.days = 10) {
v <- apply(HLC(quotes), 1, mean)
r <- matrix(NA, ncol = n.days, nrow = NROW(quotes))
for (x in 1:n.days) r[, x] <- Next(Delt(v, k = x), x)
x <- apply(r, 1, function(x) sum(x[x > tgt.margin | x <
-tgt.margin]))
if (is.xts(quotes))
xts(x, time(quotes))
else x
}
myATR <- function(x) ATR(HLC(x))[, "atr"]
mySMI <- function(x) SMI(HLC(x))[, "SMI"]
myADX <- function(x) ADX(HLC(x))[, "ADX"]
myAroon <- function(x) aroon(x[, c("High", "Low")])$oscillator
myBB <- function(x) BBands(HLC(x))[, "pctB"]
myChaikinVol <- function(x) Delt(chaikinVolatility(x[, c("High", "Low")]))[, 1]
myCLV <- function(x) EMA(CLV(HLC(x)))[, 1]
myEMV <- function(x) EMV(x[, c("High", "Low")], x[, "Volume"])[, 2]
myMACD <- function(x) MACD(Cl(x))[, 2]
myMFI <- function(x) MFI(x[, c("High", "Low", "Close")], x[, "Volume"])
mySAR <- function(x) SAR(x[, c("High", "Close")])[, 1]
myVolat <- function(x) volatility(OHLC(x), calc = "garman")[, 1]
library(randomForest)
data.model <- specifyModel(T.ind(GSPC) ~ Delt(Cl(GSPC),k=1:10) +
myATR(GSPC) + mySMI(GSPC) + myADX(GSPC) + myAroon(GSPC) +
myBB(GSPC) + myChaikinVol(GSPC) + myCLV(GSPC) +
CMO(Cl(GSPC)) + EMA(Delt(Cl(GSPC))) + myEMV(GSPC) +
myVolat(GSPC) + myMACD(GSPC) + myMFI(GSPC) + RSI(Cl(GSPC)) +
mySAR(GSPC) + runMean(Cl(GSPC)) + runSD(Cl(GSPC)))
traceback() reveals the error occurs in the Delt(Cl(GSPC),k=1:10) call:
> Delt(Cl(GSPC),k=1:10)
Error in xts(new.x, x.index) : NROW(x) must match length(order.by)
Delt expects a (m x 1) object but you're passing a (m x 2) object. This is because GSPC has two columns that are matched by Cl ("Close" and "AdjClose"). This will probably cause headaches in other areas too...
Cl expects objects like those returned by getSymbols, where the adjusted close column is named "Adjusted". If you need to use get.hist.quote for some reason, just rename the "AdjClose" column after you download the data.
colnames(GSPC) <- c("Open", "High", "Low", "Close","Volume","Adjusted")
Delt(Cl(GSPC),k=1:10) # works now
## Error in xts(x, order.by = order.by, frequency = frequency, ...
## NROW(x) must match length(order.by)
I wasted hours running into this error. Regardless of whether or not I had the exact same problem, I'll show how I solved for this error message in case it saves you the pain I had.
I imported an Excel or CSV file (tried both) through several importing functions, then tried to convert my data (as either a data.frame or .zoo object) into an xts object and kept getting errors, this one included.
I tried creating a vector of dates separately to pass in as the order.by parameter. I tried making sure the date vector and the rows of the data.frame had the same length. Sometimes it worked and sometimes it didn't, for reasons I can't explain. Even when it did work, R had "coerced" all my numeric data into character data. (Causing me endless problems, later. Watch for coercion, I learned.)
These errors kept happening until:
For xts conversion I used the date column from the imported Excel sheet as the order.by parameter with an as.Date() modifier, AND I *dropped the date column during the conversion to xts.*
Here's the working code:
library(readxl) # read_excel()
library(xts)
xl_sheet <- read_excel("../path/to/my_excel_file.xlsx")
sheet_xts <- xts(xl_sheet[-1], order.by = as.Date(xl_sheet$date))
Note my date column was the first column, so the xl_sheet[-1] removed the first column.
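The same pattern can be tried without an Excel file. The sketch below uses a small made-up data frame (the column names date, close and volume are assumptions) to show why the date column should be dropped: a matrix built from mixed character and numeric columns is character, while the numeric columns alone stay numeric.

```r
library(xts)

# Made-up stand-in for an imported sheet
sheet <- data.frame(
  date   = c("2020-01-01", "2020-01-02", "2020-01-03"),
  close  = c(101.5, 102.0, 100.8),
  volume = c(1000, 1200, 900),
  stringsAsFactors = FALSE
)

# Keeping the date column forces everything to character
is.character(as.matrix(sheet))

# Dropping it keeps the data numeric; the dates go into the index instead
sheet_xts <- xts(sheet[-1], order.by = as.Date(sheet$date))
is.numeric(coredata(sheet_xts))
```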
