How can I test for time differences in the parameters of my model if I have multiple panels?
For example, I use the stock dataset from RSiena:
d <- sienaDataCreate(sienaDependent(array(c(s501, s502, s503), dim=c(50, 50, 3))))
e <- getEffects(d)
e <- includeEffects(e, transTrip)
ans <- siena07(sienaAlgorithmCreate(nsub=4, n3=1000), data=d, effects=e, batch=TRUE)
I want to see whether the transTrip effect differs from period 1 to period 2.
All that is required is to pass the ans into the sienaTimeTest function:
tt <- sienaTimeTest(ans)
summary(tt)
This provides a nice test for whether there is any time heterogeneity in the parameters.
I have code that takes the yield spread (dependent variable) and forward rates (independent variable) as input and runs auto.arima to choose the orders. I then forecast the next 25 dates (forc.horizon). My training data are the first 600 rows (training). Next I move the time window forward by 25 dates: I use rows 26 to 625, re-estimate auto.arima, forecast rows 626 to 650, and so on. Each data set has 2298 rows (dates) and 30 columns (maturities).
I want to store all of the forecasts and then plot the forecasted and real values in the same plot.
This is the code I have, but it doesn't store the forecasts in a way to plot later.
forecast.func <- function(NS.spread, ind.v, maturity, training, forc.horizon){
NS.spread <- NS.spread/100
forc <- c()
j <- 0
for(i in 1:floor((nrow(NS.spread)-training)/forc.horizon)){
# test data
y <- NS.spread[(1+j):(training+j) , maturity]
f <- ind.v[(1+j):(training+j) , maturity]
# auto- arima
c <- auto.arima(y, xreg = f, test= "adf")
# forecast
e <- ind.v[(training+j+1):(training+j+forc.horizon) , maturity]
h <- forecast(c, xreg = lagmatrix(e, -1))
forc <- c(forc, list(h))
j <- j + forc.horizon
}
return(forc)
}
a <- forecast.func(spread.NS.JPM, Forward.rate.JPM, 10, 600, 25)
lapply(a, plot)
Here's a link to my two datasets:
https://drive.google.com/drive/folders/1goCxllYHQo3QJ0IdidKbdmfR-DZgrezN?usp=sharing
LOOK AT THE END for a full functional example on how to handle AUTO.ARIMA MODEL with DAILY DATA using XREG and FOURIER SERIES with ROLLING STARTING TIMES and cross validated training and test.
Without a reproducible example no one can help you, because they can't run your code. You need to provide data. :-(
Although statistics questions are off-topic on StackOverflow: why not run auto.arima with xreg instead of lm followed by auto.arima on the residuals? Given how you forecast at the end, that training method looks wrong. Consider using:
fit <- auto.arima(y, xreg = lagmatrix(f, -1))
h <- forecast(fit, xreg = lagmatrix(e, -1))
auto.arima will automatically calculate the best parameters by max likelihood.
On your coding question..
forc <- c() should be outside of the for loop, otherwise at every run you delete your previous results.
Same for j <- 0: at every run you're setting it back to 0. Put it outside if you need to change its value at every run.
The output of forecast is an object of class forecast, which is actually a type of list. Therefore, you can't use cbind effectively.
In my opinion, you should create forc in this way: forc <- list()
And create a list of your final results in this way:
forc <- c(forc, list(h)) # instead of forc <- cbind(forc, h)
This will create a list of objects of class forecast.
You can then plot them with a for loop that accesses each object in turn, or with lapply:
lapply(output_of_your_function, plot)
This is as far as I can go without a reproducible example.
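To make the "store everything, plot later" idea concrete, here is a minimal sketch that collects every window's forecasts and actual values in one data frame. It rests on assumptions that are not in the original post: a fixed ARIMA(1,0,0) from stats::arima stands in for auto.arima so the example runs with base R only, the function name rolling_forecast is hypothetical, and the data are simulated.

```r
# Rolling-window forecasts collected in one data frame (base R only).
# ASSUMPTION: a fixed ARIMA(1,0,0) from stats::arima replaces auto.arima.
rolling_forecast <- function(y, x, training, horizon) {
  n_windows <- floor((length(y) - training) / horizon)
  out <- vector("list", n_windows)            # one slot per window
  for (w in seq_len(n_windows)) {
    j   <- (w - 1) * horizon                  # offset of the current window
    trn <- (1 + j):(training + j)             # training rows
    tst <- (training + j + 1):(training + j + horizon)  # rows to forecast
    fit <- arima(y[trn], order = c(1, 0, 0), xreg = x[trn], method = "ML")
    pr  <- predict(fit, n.ahead = horizon, newxreg = x[tst])
    out[[w]] <- data.frame(index    = tst,
                           forecast = as.numeric(pr$pred),
                           actual   = y[tst])
  }
  do.call(rbind, out)                         # one long table of all windows
}

# simulated stand-in data (hypothetical, not the poster's series)
set.seed(1)
x <- rnorm(300)
y <- as.numeric(0.5 * x + arima.sim(list(ar = 0.5), n = 300))
res <- rolling_forecast(y, x, training = 200, horizon = 25)
head(res)
```

Because forecasts and actual values share the index column, plot(res$index, res$actual, type = "l") followed by lines(res$index, res$forecast, col = "red") draws both in the same plot.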
FINAL EDIT
FULL FUNCTIONAL EXAMPLE
Here I try to sum up the conclusions from the many comments we exchanged.
With the data you provided, I wrote code that handles everything you need:
the training/test split, the model, the forecast, and finally the plot, which has time on the x axis as requested in one of your comments.
I removed the for loop. lapply is much better for your case.
You can leave the fourier series if you want to. That's how Professor Hyndman suggests to handle daily time series.
Functions and libraries needed:
# libraries ---------------------------
library(forecast)
library(lubridate)
# run model -------------------------------------
.daily_arima_forecast <- function(init, training, horizon, tt, ..., K = 10){
# create training and test
tt_trn <- window(tt, start = time(tt)[init] , end = time(tt)[init + training - 1])
tt_tst <- window(tt, start = time(tt)[init + training], end = time(tt)[init + training + horizon - 1])
# add fourier series [if you want to. Otherwise, cancel this part]
fr <- fourier(tt_trn[,1], K = K)
frf <- fourier(tt_trn[,1], K = K, h = horizon)
tsp(fr) <- tsp(tt_trn)
tsp(frf) <- tsp(tt_tst)
tt_trn <- ts.intersect(tt_trn, fr)
tt_tst <- ts.intersect(tt_tst, frf)
colnames(tt_tst) <- colnames(tt_trn) <- c("y", "s", paste0("k", seq_len(ncol(fr))))
# run model and forecast
aa <- auto.arima(tt_trn[,1], xreg = tt_trn[,-1])
fcst <- forecast(aa, xreg = tt_tst[,-1])
# add actual values to plot them later!
fcst$test.values <- tt_tst[,1]
# NOTE: since I modified the structure of the class forecast I should create a new class,
# but I didnt want to complicate your code
fcst
}
daily_arima_forecast <- function(y, x, training, horizon, ...){
# set up x and y together
tt <- ts.intersect(y, x)
# set up all starting point of the training set [give it a name to recognize them later]
inits <- setNames(nm = seq(1, length(y) - training, by = horizon))
# remove last one because you wouldnt have enough data in front of it
inits <- inits[-length(inits)]
# run model and return a list of all your models
lapply(inits, .daily_arima_forecast, training = training, horizon = horizon, tt = tt, ...)
}
# plot ------------------------------------------
plot_daily_forecast <- function(x){
autoplot(x) + autolayer(x$test.values)
}
Reproducible Example on how to use the previous functions
# create a sample data
tsp(EuStockMarkets) <- c(1991, 1991 + (1860-1)/365.25, 365.25)
# model
models <- daily_arima_forecast(y = EuStockMarkets[,1],
x = EuStockMarkets[,2],
training = 600,
horizon = 25,
K = 5)
# plot
plots <- lapply(models, plot_daily_forecast)
plots[[1]]
Example for the author of the post
# your data
load("BVIS0157_Forward.rda")
load("BVIS0157_NS.spread.rda")
spread.NS.JPM <- spread.NS.JPM / 100
# pre-work [out of function!!!]
set_up_ts <- function(m){
start <- min(row.names(m))
end <- max(row.names(m))
# daily sequence
inds <- seq(as.Date(start), as.Date(end), by = "day")
ts(m, start = c(year(start), as.numeric(format(inds[1], "%j"))), frequency = 365.25)
}
mts_spread.NS.JPM <- set_up_ts(spread.NS.JPM)
mts_Forward.rate.JPM <- set_up_ts(Forward.rate.JPM)
# model
col <- 10
models <- daily_arima_forecast(y = mts_spread.NS.JPM[, col],
x = stats::lag(mts_Forward.rate.JPM[, col], -1),
training = 600,
horizon = 25,
K = 5) # notice that K falls between ... that goes directly to the inner function
# plot
plots <- lapply(models, plot_daily_forecast)
plots[[5]]
I'm using the mirt and mirtCAT libraries to obtain students' trait scores with a unidimensional dichotomous 3-parameter logistic model (in which the items are already calibrated).
How can I change the trait scale interval output by fscores? More explicitly, given the vector F1 containing the scores of the N test takers, how can I enforce that every entry of F1 lies in an interval whose endpoints are a < b?
I'm using the following code to get the abilities:
library(mirt)
library(mirtCAT)
test <- read.csv("test.csv")
par <- read.csv("par.csv")
a1 <- c(par[,1])
d <- c(par[,2])
g <- c(par[,3])
par <- data.frame(a1,d,g)
mod <- generate.mirt_object(parameters=par,itemtype='3PL')
scores <- fscores(mod,response.pattern=test)
This produces abilities ranging from minus infinity to infinity. The documentation does not mention rescaling, but it is quite common in IRT analysis for ease of interpretation.
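One way to get scores inside a chosen interval is to transform them after estimation. The sketch below shows two options on a plain numeric vector. It is not a mirt feature: the helper names rescale_linear and rescale_logistic are hypothetical, and theta is assumed to be the F1 column extracted from the fscores output.

```r
# Map unbounded trait scores into a chosen interval [a, b].
# ASSUMPTION: `theta` is a plain numeric vector (e.g. the F1 column from fscores()).

# (1) linear min-max rescaling: sample minimum -> a, sample maximum -> b
rescale_linear <- function(theta, a, b) {
  rng <- range(theta, na.rm = TRUE)
  a + (b - a) * (theta - rng[1]) / (rng[2] - rng[1])
}

# (2) logistic squashing: every finite theta lands strictly inside (a, b),
#     regardless of which other test takers are in the sample
rescale_logistic <- function(theta, a, b) {
  a + (b - a) * plogis(theta)
}

theta <- c(-2.1, -0.4, 0, 0.7, 3.2)   # made-up scores for illustration
lin <- rescale_linear(theta, 0, 100)
lgs <- rescale_logistic(theta, 0, 100)
```

The linear version preserves distances between test takers but depends on the sample's minimum and maximum. The logistic version gives the same result for a given score in any sample, but it compresses differences at the extremes.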
I am trying to calculate a regression variable based on a range of variables in my data set. I would like the regression variable (e.g., Threshold1) to be calculated from a different variable in each iteration of the regression.
The aim is to collect the SSR value for each threshold range, and thus identify the ideal threshold based on the data.
Data (df) variables: Yield, Prec, Price, 0C, 1C, 2C, 3C, 4C, 5C, 6C, 7C, 8C, 9C, 10C
Each loop calculates "thresholds" by selecting a different "b" each time.
a <- df$0C
b <- df$1C
Threshold1 <- (a-b)
Threshold2 <- (b)
Where "b" would be changing in each loop, ranging from 1C to 9C.
Each individual threshold set (1 and 2) should be used to run a regression, and save the SSR for comparison with the subsequent regression utilizing thresholds based on a new "b" value (ranging from 1C TO 9C)
Regression:
reg <- lm(log(Yield)~Threshold1+Threshold2+log(Price)+prec+I(prec^2),data=df)
For each loop of the regression, I vary the components used to calculate the thresholds.
My current approach is centered on the following code:
df <- read.csv("Data.csv",header=TRUE)
names(df)
0C-9Cvarlist <- names(df)[9:19]
ssr.vec <- matrix(,21,1)
for(i in 1:length(varlist)){
a <- df$0C
b <- df$[i]
Threshold1 <- (a-b)
Threshold2 <- (b)
reg <- lm(log(Yield)~Threshold1+Threshold2+log(Price)+prec+I(prec^2),data=df)
r2 <- summary(reg)$r.squared
ssr.vec[i,] <- c(varlist,r2)
}
colnames(ssr.vec) <- c("varlist","r2")
I am failing to achieve the desired result with the above approach.
Thank you.
I can spot quite a few mistakes...
You need to add the variables of interest (Threshold1 and Threshold2) to the data used in the regression. Also, I think you need to select varlist[i], not varlist, when filling ssr.vec. ssr.vec is a matrix and needs 2 columns. You also cannot use something like df$[i] to extract a column! Why does the matrix have 21 rows?! Change the column names to C0, C1, ... instead of 0C, 1C, ..., because an R name cannot start with a digit.
For future reference, fix the simple errors before asking a question... and include the error messages in your post!
This should do the job:
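df <- read.csv("Data.csv", header = TRUE)
# rename the temperature columns to valid R names (C0 ... C10);
# positions 9:19 are taken from the varlist line in the question
names(df)[9:19] <- paste0("C", 0:10)
# candidate columns for b: C1 to C9
varlist <- paste0("C", 1:9)
# one row per candidate, two columns (name and R squared)
ssr.vec <- matrix(NA, length(varlist), 2)
for (i in seq_along(varlist)) {
a <- df$C0
b <- df[[varlist[i]]]   # select the column by name
df$Threshold1 <- (a - b)
df$Threshold2 <- (b)
reg <- lm(log(Yield)~Threshold1+Threshold2+log(Price)+prec+I(prec^2),data=df)
r2 <- summary(reg)$r.squared
ssr.vec[i,] <- c(varlist[i], r2)
}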
colnames(ssr.vec) <- c("varlist","r2")
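Since the question asks for SSR rather than R squared, the following sketch collects the sum of squared residuals for each candidate column and picks the smallest. It is only an illustration under stated assumptions: the data frame is simulated, and the column names C0 to C9 follow the renaming suggested above.

```r
# Collect the sum of squared residuals (SSR) for each candidate column.
# ASSUMPTION: simulated data with hypothetical columns C0..C9 replaces Data.csv.
set.seed(42)
n  <- 200
df <- data.frame(Yield = exp(rnorm(n, 1, 0.2)),
                 Price = exp(rnorm(n)),
                 prec  = runif(n, 0, 10))
for (k in 0:9) df[[paste0("C", k)]] <- runif(n, 0, 100)

varlist <- paste0("C", 1:9)
ssr <- sapply(varlist, function(v) {
  d <- df
  d$Threshold1 <- d$C0 - d[[v]]    # a - b
  d$Threshold2 <- d[[v]]           # b
  reg <- lm(log(Yield) ~ Threshold1 + Threshold2 + log(Price) + prec + I(prec^2),
            data = d)
  sum(residuals(reg)^2)            # SSR of this fit
})
ssr
best <- names(which.min(ssr))      # candidate with the smallest SSR
```

All fits share the same dependent variable and the same number of regressors, so ranking by smallest SSR gives the same order as ranking by largest R squared.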
I'm brand new to R and trying to implement a simple model (which I will extend later) that deals with corporate bond defaults.
For starters, I'm using only two clients.
Parameters:
- two clients (which I name "A" and "B")
- a cash flow of $10,000 will be received from each client if they do not default within 10 years
- pulling together concepts using standard normal random variables, dependent uniform random variables and Gaussian copulas
- run some number of simulations
- store the sum of Client A cash flow plus Client B cash flow and store in a vector named "result"
- finally, take the average of the result vector
My code is:
# define variables
nSim <- 5 # of simulations
rho <- 0.3 # rho
lambda <- 0.01 # default intensity
T <- 10 # time to default
for (i in 1:nSim){
# Step 1: generate 2 independent standard normal random variables
z1 <- rnorm(1, mean=0, sd=1)
z2 <- rnorm(1, mean=0, sd=1)
# Step 2: map the normals into correlated normals
# by Cholesky composition of the correlation matrix
# w1 = z1
# w2 = rho(z1)+sqrt(1-(rho^2))*z2
w1 <- z1
w2 <- rho*z1 - sqrt(1-(rho^2))*z2
# Step 3: using the correlated normals, generate two dependent uniform variables
u <- runif(1, min=0, max=1)
v <- runif(1, min=0, max=1)
# Step 4: using the dependent uniforms, generate two dependent exponentials
tau.A <- (-1/lambda)*log(u)
tau.B <- (-1/lambda)*log(v)
payout.A <- if (tau.A > 10) {10000} else {0}
payout.B <- if (tau.B > 10) {10000} else {0}
result[i] = (payout.A[i] + payout.B[i])
}
# calculate expected value of portfolio
mean(result)
When I run this code, I'm getting an error of "NA" and can't figure out why (again, I'm brand new to R). I don't think each of the simulation values is being stored in the results vector, but don't know how to diagnose the problem.
Thanks in advance to anyone who can help!
--Sarah
Everything works until the result[i] = (payout.A[i] + payout.B[i]) line. The problem is that you never create result.
Before your for loop, add the line:
result <- vector('numeric', length = nSim)
This creates a vector of 0s of length nSim. In R it is best to preallocate the space instead of dynamically growing a vector with c().
No, the problem is the [i] indexing on the right-hand side of the result[i] = (payout.A[i] + payout.B[i]) line.
Indexing with [i] is fine for result, but not for the two payout variables, because each of them is a single value regenerated on every pass through the loop. Simply remove those indices:
result[i] <- (payout.A + payout.B)
This will solve your issue. If you wish to keep each payout in its own vector then you need to assign it as such, but it seems that you don't.
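Separately from the indexing problem, the loop in the question draws u and v with runif, so the correlated normals w1 and w2 are never used. The step comments there describe a Gaussian copula, so the sketch below assumes the uniforms should come from pnorm applied to the correlated normals. It uses the plus sign from the Cholesky comment and is vectorised instead of looped. The variable name horizon is a hypothetical replacement for T, which masks the shorthand for TRUE.

```r
# Vectorised sketch of the two-client simulation with a Gaussian copula.
# ASSUMPTION: dependent uniforms are obtained as pnorm() of the correlated normals.
set.seed(123)
nSim    <- 10000    # number of simulations
rho     <- 0.3      # correlation between the two normals
lambda  <- 0.01     # default intensity
horizon <- 10       # years a client must survive to pay out

# Step 1: independent standard normals
z1 <- rnorm(nSim)
z2 <- rnorm(nSim)
# Step 2: correlated normals via the Cholesky factor of the correlation matrix
w1 <- z1
w2 <- rho * z1 + sqrt(1 - rho^2) * z2
# Step 3: dependent uniforms
u <- pnorm(w1)
v <- pnorm(w2)
# Step 4: dependent exponential default times
tau.A <- -log(u) / lambda
tau.B <- -log(v) / lambda
# payout of 10000 per client that survives the horizon
result <- 10000 * (tau.A > horizon) + 10000 * (tau.B > horizon)
mean(result)
```

Each default time is exponential with rate lambda, so each client survives 10 years with probability exp(-0.1). The simulated mean should therefore settle near 2 * 10000 * exp(-0.1), about 18,097, whatever the value of rho. The correlation changes the spread of result, not its mean.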
I'm using the following code in R to get the P-value via ADF-Test between two Time Series : TS1 and TS2:
m <- lm(TS1 ~ TS2 + 0)
beta <- coef(m)[1]
sprd <- TS1 - beta*TS2
ht <- adf.test(sprd, alternative='stationary', k=0)
pval <- as.numeric(ht$p.value)
If I want to get P-value for ADF for one or two more Time Series (i.e: TS1,TS2 and TS3 or TS1,TS2,TS3 and TS4), what would be the proper syntax considering the above code?
Thanks!
Put your existing code into a function. You could use combn() to enumerate the pairs of series. Then loop over the pairs, using paste to build the regression formula that is passed to your function as a parameter, and store the p-value for each pair in a vector or data.frame. Good luck!
t(combn(c("TS1","TS2","TS3","TS4"),2))
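The suggestion above can be sketched as a small helper that runs the same spread test on every pair and returns the p-values in a data frame. Several things here are assumptions for illustration: the function name pair_pvalues is hypothetical, the series are simulated, and stats::PP.test (a unit-root test that ships with base R) stands in for adf.test so the example runs without the tseries package.

```r
# Apply the same spread test to every pair of series.
# ASSUMPTION: stats::PP.test stands in for tseries::adf.test; the data are simulated.
pair_pvalues <- function(series, test_fun) {
  pairs <- combn(names(series), 2, simplify = FALSE)   # all unordered pairs
  rows <- lapply(pairs, function(p) {
    y    <- series[[p[1]]]
    x    <- series[[p[2]]]
    beta <- unname(coef(lm(y ~ x + 0))[1])   # hedge ratio, no intercept
    sprd <- y - beta * x                     # spread to be tested
    data.frame(y = p[1], x = p[2], pval = test_fun(sprd))
  })
  do.call(rbind, rows)
}

set.seed(7)
common <- cumsum(rnorm(250))                 # shared random-walk component
series <- list(TS1 = common + rnorm(250),
               TS2 = 2 * common + rnorm(250),
               TS3 = cumsum(rnorm(250)),
               TS4 = cumsum(rnorm(250)))
pv <- pair_pvalues(series, function(s) PP.test(s)$p.value)
pv
```

With tseries installed, passing function(s) tseries::adf.test(s, alternative = "stationary", k = 0)$p.value as test_fun would reproduce the test used in the question.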
I think I have found the answer:
# no intercept (+ 0), so coef(m)[1] and coef(m)[2] are the slopes on pair2 and pair3
m <- lm(pair1 ~ pair2 + pair3 + 0)
beta1 <- coef(m)[1]
beta2 <- coef(m)[2]
sprd <- pair1 - beta1*pair2 - beta2*pair3
ht <- adf.test(sprd, alternative='stationary', k=0)