I want to quantify the structural effect of an event in a time series analysis. To do so, I'm using what the KFAS package has to offer.
The problem is that the code gets tricky if I want to model n different dates, i.e. n different events, simultaneously.
Here's an example that should clarify, I hope:
library(fpp2)
library(KFAS)
# required libraries
y <- hyndsight # just for the example
plot(hyndsight)
abline(v = c(19, 35, 47), col = "red", lwd = 2)
Let's say that I want to check whether each of the "red-line" events is a "permanent shock" (i.e. it changes the level of the hyndsight series). Here's the model that can do that with KFAS functions:
# Z and T component for event on pos 19
aZ1 <- array(0, c(1, 1, length(y)))
aZ1[1, 1, (19 + 1):length(y)] <- 1 # 1s from pos 19(+1) forward
aT1 <- array(1, c(1, 1, length(y)))
# Z and T component for event on pos 35
aZ2 <- array(0, c(1, 1, length(y)))
aZ2[1, 1, (35 + 1):length(y)] <- 1 # 1s from pos 35(+1) forward
aT2 <- array(1, c(1, 1, length(y)))
# Z and T component for event on pos 47
aZ3 <- array(0, c(1, 1, length(y)))
aZ3[1, 1, (47 + 1):length(y)] <- 1 # 1s from pos 47(+1) forward
aT3 <- array(1, c(1, 1, length(y)))
And here's the actual model:
mod <- SSModel(y ~ 0 + SSMtrend(2, list(NA, NA)) + SSMseasonal(12, NA) +
  SSMcustom(Z = aZ1, T = aT1,
            R = matrix(0, 1, 0), Q = matrix(0, 0, 0),
            a1 = 0, P1 = matrix(0), P1inf = matrix(1)) + # first event
  SSMcustom(Z = aZ2, T = aT2,
            R = matrix(0, 1, 0), Q = matrix(0, 0, 0),
            a1 = 0, P1 = matrix(0), P1inf = matrix(1)) + # second event
  SSMcustom(Z = aZ3, T = aT3,
            R = matrix(0, 1, 0), Q = matrix(0, 0, 0),
            a1 = 0, P1 = matrix(0), P1inf = matrix(1)), # third event
  H = NA)
initial_val <- c(0,0,0,0,0,0,0) # the first 4 are always there
fit <- fitSSM(mod, inits = initial_val)
Now, the question is, how can I "automate" this process depending on the number of events that I want to model?
As you can see, for each event I need to create an array aZ and an array aT, and pass them to the model via the SSMcustom function. What if I have a new time series and I need to evaluate just two events, or four, or more?
The problem is that I cannot keep adding SSMcustom terms to the model by hand. I want to pass a new time series with a vector of n events and automatically build the same model, except for the number of events to evaluate. Can I build a single SSMcustom for all the events?
Also, initial_val has to change, but that's less complicated: its length is always 4 + n, with n the number of events.
I know this question is very specific; maybe it belongs on CrossValidated, but I'm not so sure.
Introduce "permanent shock" variables equal to 0 before the shock and equal to 1 after the shock (one variable per shock). Then add these variables as explanatory variables (regressors) in your model and check whether they are significant.
Keep in mind that each of these variables reduces the degrees of freedom, so you may want to test a model with the same coefficient for all the shocks.
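As a rough sketch of the regressor approach, the step variables for any number of events can be built in one call and handed to the model as a single matrix. The names make_step_dummies and build_event_model are made up for this illustration. The wrapper assumes library(KFAS) is attached and that SSModel accepts a matrix regressor in its formula, which should be verified against your KFAS version.

```r
# Hypothetical helper: one 0/1 step column per event, equal to 1 from
# position event + 1 onward (same convention as the aZ arrays in the question).
make_step_dummies <- function(n_obs, events) {
  X <- outer(seq_len(n_obs), events, ">") + 0   # logical -> numeric
  colnames(X) <- paste0("event_", events)
  X
}

# Hypothetical wrapper; assumes library(KFAS) has been attached.
# It is only defined here, not called.
build_event_model <- function(y, events) {
  X <- make_step_dummies(length(y), events)
  SSModel(y ~ 0 + X + SSMtrend(2, Q = list(NA, NA)) + SSMseasonal(12, Q = NA),
          H = NA)
}

# Small check of the step columns for events at positions 3 and 7
X <- make_step_dummies(10, c(3, 7))
X[c(3, 4, 7, 8), ]
```

With this, the call would be mod <- build_event_model(hyndsight, c(19, 35, 47)), followed by fitSSM(mod, inits = rep(0, 4)). Because the step coefficients enter as fixed regression states, the number of variance parameters should stay at four (two for the trend, one for the seasonal, one for H) however many events there are. That is my reading of how KFAS treats regressors, so treat it as an assumption.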
I am having trouble converting a Matlab loop into R.
For every {t,tau}, calculate the o indexed with a value that refers to the value of [t,tau]
from within the loop. In the matlab code it is done via a cell structure that applies from outside the loop.
o{1,1} = [1 0]';
o{1,2} = [0 0]';
o{2,1} = [1 0]';
o{2,2} = [0 1]';
Within the loop something like this is calculated:
nat_log(A'*o{t,tau})
How can I set up the loop so that the values of t and tau set within the loop are used to address a particular "o" within a list?
First my attempt:
Timesteps = 2 # total timesteps/iterations
Tau = 2 # amounts of observations o within 1 timestep; at t=1 and tau-1 = o = [0 0], above denoted {1,2}
for(t in 1:length(Timesteps)){
qs[,t] = matrix(c(.5,.5)) # This is the posterior distribution to be updated,
for (t in 1:length(Timesteps)){ # but that's not that important
for (tau in 1:length(Timesteps)){
****some more math****
y = x * o[[t,tau]] # THIS is the important part!
etc. .... ..... .... ....
I tried something like
o<- vector("list", length = 2 * 2)
dim(o) <- c(2, 2)
o[[1,1]] <- matrix(c(1, 0), nrow = 2, ncol =1, byrow = TRUE)
o[[1,2]] <- matrix(c(0, 0), nrow = 2, ncol =1, byrow = TRUE)
o[[2,1]] <- matrix(c(1, 0), nrow = 2, ncol =1, byrow = TRUE)
o[[2,2]] <- matrix(c(0, 1), nrow = 2, ncol =1, byrow = TRUE)
The original Matlab code that I am trying to reproduce can be found here:
https://github.com/rssmith33/Active-Inference-Tutorial-Scripts/blob/main/Pencil_and_paper_exercise_solutions.m
Line: 46-103 (it is essentially Bayes inference with Markov property...)
If someone has an idea, I would be very thankful. I am just a med student interested in math and would love to see this available for R as well.
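A minimal sketch of one way to mirror the Matlab cell array in R: a list given a dim attribute can be indexed with o[[t, tau]]. Note that the loops should run over 1:Timesteps rather than 1:length(Timesteps), which is just 1. The matrix A and the small constant inside nat_log are placeholders chosen for illustration, not values taken from the tutorial script.

```r
Timesteps <- 2
Tau <- 2

# A 2 x 2 "cell array": a list with a dim attribute
o <- vector("list", Timesteps * Tau)
dim(o) <- c(Timesteps, Tau)
o[[1, 1]] <- c(1, 0)
o[[1, 2]] <- c(0, 0)
o[[2, 1]] <- c(1, 0)
o[[2, 2]] <- c(0, 1)

# Placeholder likelihood matrix (assumption, for illustration only)
A <- matrix(c(0.9, 0.1,
              0.1, 0.9), nrow = 2, byrow = TRUE)

# Log with a small offset so that log(0) does not give -Inf
# (the offset value is an assumption)
nat_log <- function(x) log(x + exp(-16))

# Store one result per (t, tau) pair in a second list-matrix
res <- vector("list", Timesteps * Tau)
dim(res) <- c(Timesteps, Tau)
for (t in seq_len(Timesteps)) {
  for (tau in seq_len(Tau)) {
    res[[t, tau]] <- nat_log(t(A) %*% o[[t, tau]])   # 2 x 1 matrix
  }
}
res[[2, 2]]
```

Calling t(A) still works inside the loop even though t is also the loop variable, because R skips non-function objects when looking up a function call.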
Based on the answer to this question and its script, how can I print the names of the collinear columns to the console?
Script:
library(corrplot)
library(caret)
x <- seq(0, 100, 1)
# collinear with x
y <- x + 2.3
# almost collinear with x / some small gaussian noise
z <- x + rnorm(mean = 0, sd = 5, n = 101)
# uncorrelated gaussian
w <- rnorm(mean = 0, sd = 1, n = 101)
a <- z + seq(101, 201, 1)/.33 + rnorm(mean = 0, sd = 5, n = 101)
b <- a -2.3
# this frame is made to exemplify the procedure
df <- data.frame(x = x, y = y, z = z, w = w, a=a, b=b)
corrplot(cor(df))
#drop perfectly multicollinear variables
constant<-rep(1,nrow(df))
tmp<-lm(constant ~ ., data=df)
to_keep<-tmp$coefficients[!is.na(tmp$coefficients)]
to_keep<-names(to_keep[-which(names(to_keep) == "(Intercept)")])
df_result<-df[to_keep]
corrplot(cor(df_result))
You want the variables not included in to_keep. Based off how to_keep is defined, you can write to_drop <- tmp$coefficients[is.na(tmp$coefficients)] to get the coefficients with NA values (meaning there are no estimates for the corresponding variables because they are collinear with others). Then, to print the names of those coefficients, you can simply do print(names(to_drop)).
However, keep in mind that: 1. this will only drop perfectly collinear variables, in a hacky way, and 2. the way this method decides which variables to drop out of a set of perfectly collinear variables is rather arbitrary (it depends on the order of the variables in your data).
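Put together, the steps described above might look like the following sketch. It uses a small deterministic data frame, df_demo (an assumption made so the result is reproducible), instead of the df from the question.

```r
x <- 0:100
df_demo <- data.frame(
  x = x,
  y = x + 2.3,   # exactly collinear with x and the intercept
  w = sin(x)     # not collinear with the other columns
)

constant <- rep(1, nrow(df_demo))
tmp <- lm(constant ~ ., data = df_demo)

# Coefficients that could not be estimated come back as NA
to_drop <- tmp$coefficients[is.na(tmp$coefficients)]
print(names(to_drop))
```

Reordering the columns of df_demo so that y comes before x would change which of the two is reported, which illustrates the second caveat.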
I want to solve a linear optimization problem with binary decision variables in R (currently I am using the Rglpk package). However, I am having trouble setting up the constraints.
Suppose a company wants to decide in which quarters to sell its product to maximize profit. But if it sells at all, it must sell in at least 3 quarters in a row. This is an example of what the profits might look like.
library(tibble)
library(Rglpk)
profits <- tibble(year = 1,
                  quarter = 1:4,
                  profit = c(23, -4, 6, -2))
I could then set up an Rglpk constraint matrix and solve as below.
cons.m <- matrix(c( 2, -1, -1, 0,
-2, 3, -2, -1,
-1, -2, 3, -1,
0, -1, -2, 2),
nrow = 4, byrow = T)
solution <- Rglpk_solve_LP(obj = profits$profit,
mat = cons.m,
dir = rep("<=", 4),
rhs = rep(0, 4),
types = rep("B", 4),
max = T)
solution$solution
[1] 1 1 1 0
Which says I should sell in the first 3 quarters and not sell in Q4. This is clearly the correct solution.
How could I extend this solution to work with 12 periods, where I must sell at least 5 quarters in a row?
profits.new <- tibble(year = rep(1:3, each = 4),
quarter = 1:12,
profit = runif(12, -20, 20))
I realize I can generate all combinations and then select the maximum that meets the requirements, but I want a solution that can generalize to much larger cases where there would be too many combinations.
This can be modeled as:
where n is the minimum length of a production run.
This will only require T=12 constraints.
The total number of possible production runs >= n (with n=5, T=12) is 42.
Of course, this difference will increase (rather dramatically) for longer planning horizons. E.g. for T=24,n=5 we have 24 constraints vs 4316 possible solutions.
An optimal solution can look like:
There is much more to say about constraints like this.
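One common way to write a minimum-run-length rule with exactly T constraints (not necessarily the formulation this answer used) is to require, for every period t, that sum(x_t, ..., x_{t+n-1}) >= n * (x_t - x_{t-1}), with x_0 = 0 and the window cut off at T. Whenever a run starts in period t, the next n periods must then all be selling periods. The helper name min_run_matrix below is made up for this sketch.

```r
# Hypothetical helper: row t encodes
#   n * x_t - n * x_{t-1} - sum(x_t, ..., x_{t+n-1}) <= 0
min_run_matrix <- function(n_periods, n) {
  A <- matrix(0, nrow = n_periods, ncol = n_periods)
  for (t in seq_len(n_periods)) {
    window <- t:min(t + n - 1, n_periods)
    A[t, window] <- -1
    A[t, t] <- A[t, t] + n
    if (t > 1) A[t, t - 1] <- -n
  }
  A
}

A <- min_run_matrix(12, 5)

# Brute-force check on the small case: count the 0/1 plans with A x <= 0
plans <- as.matrix(expand.grid(rep(list(0:1), 12)))
feasible <- apply(plans, 1, function(x) all(A %*% x <= 0))
sum(feasible)

# Solve with Rglpk if it is installed (profits are simulated here)
if (requireNamespace("Rglpk", quietly = TRUE)) {
  set.seed(1)
  profit <- runif(12, -20, 20)
  sol <- Rglpk::Rglpk_solve_LP(obj = profit, mat = A,
                               dir = rep("<=", 12), rhs = rep(0, 12),
                               types = rep("B", 12), max = TRUE)
  print(sol$solution)
}
```

On the 12-period case the brute-force count matches the 42 feasible plans mentioned above, counting the plan with no sales at all. The enumeration is only there as a sanity check; for long horizons only the constraint matrix is needed.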
I am running multiple chains of a MCMCglmm() model and I am trying to find the most efficient way to synthesize my output.
I am using mclapply() to run 4 chains and then combining each of the 4 chains into a list with lapply().
Here is my model and code to clean up and combine the chains. I am using this helpful tutorial for running the chains: https://github.com/tmalsburg/MCMCglmm-intro
Model:
library(parallel)
library(MCMCglmm)
chains <- mclapply(1:4, function(i) {
MCMCglmm(outcome ~ 1 + pretest + race + satisfaction*race, data = data,
random = ~ provider,
prior = prior.1,
verbose = TRUE,
family = "gaussian",
nitt = 10000,
burnin = 5000,
thin = 10)
}, mc.cores=4)
My cleanup is a little clunky. Is there a way to run a single lapply() call (or perhaps mapply() is what is needed) on both the fixed and random effects, combining them into the same list and then a data frame? In the end, I am hoping to have a data frame so I can add/subtract posterior distributions and run summary statistics on them.
fixed <- lapply(chains, function(m) m$Sol) # Sol = fixed effects
fixed <- do.call(mcmc.list, fixed)
summary(fixed)
random <- lapply(chains, function(m) m$VCV) # VCV = variance
random <- do.call(mcmc.list, random)
summary(random)
fixed_df <- do.call(rbind, Map(data.frame, fixed))
random_df <- do.call(rbind, Map(data.frame, random))
chains_df <- cbind(fixed_df, random_df)
Ultimately, I am hoping to run one lapply() or mapply() and have a single fixed.random list of lists. I believe I can use the Map(data.frame, fixed.random) on that to create my data frame. My knowledge of the apply function is limited, so I'm hoping to learn more and apply it (no pun intended) to my datasets.
Unfortunately, the models output MCMC objects, so I am unable to create the exact structure. This is the best I can come up with:
list1 <- list(a = rnorm(100, 0, 1), b = rnorm(100, 0, 1))
list2 <- list(a = rnorm(100, 0, 1), b = rnorm(100, 0, 1))
list3 <- list(a = rnorm(100, 0, 1), b = rnorm(100, 0, 1))
list4 <- list(a = rnorm(100, 0, 1), b = rnorm(100, 0, 1))
list5 <- list(d = rnorm(100, 0, 1), e = rnorm(100, 0, 1))
list6 <- list(d = rnorm(100, 0, 1), e = rnorm(100, 0, 1))
list7 <- list(d = rnorm(100, 0, 1), e = rnorm(100, 0, 1))
list8 <- list(d = rnorm(100, 0, 1), e = rnorm(100, 0, 1))
fixed <- list(list1, list2, list3, list4)
random <- list(list5, list6, list7, list8)
Would the following do?
Say your four_mcmc is a list of models of the class "MCMCglmm" (chain1, chain2, etc.) and extract is the list of elements you want to read from the chains (in your case the fixed ("Sol") and random terms ("VCV")).
## The list of mcmcs
four_mcmc <- list(chain1, chain2, chain3, chain4)
## Which elements to extract from the MCMCs
extract <- c("Sol", "VCV")
You can use a get.elements function to extract a list of single elements from a single chain:
## Extracting some specific elements from a chain
get.elements <- function(extract, mcmc) {
  ## Extracting each requested element by name
  sapply(extract, function(element) mcmc[which(names(mcmc) == element)])
}
## Extracting the VCV and Sol from one chain
str(get.elements(extract, chain1))
You can then simply apply this function to your list of chains:
## Applying get.elements to each chain for every element to extract
all_elements <- lapply(four_mcmc, function(mcmc, extract) get.elements(extract, mcmc), extract)
You can then easily summarise the result for each term as a data frame, with the terms as rows and the chains as columns:
## Fixed terms table
fixed_terms <- as.data.frame(lapply(all_elements, function(X) X[[1]]))
## Random terms table
random_terms <- as.data.frame(lapply(all_elements, function(X) X[[2]]))
This code is simplified from the read.mulTree function from https://github.com/TGuillerme/mulTree.
[edit]
@headpoint suggested simply using:
as.data.frame(lapply(chains, function(m) cbind(m$Sol, m$VCV)))
Which is more elegant but could be less portable.
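If the goal is one long data frame with every posterior draw as a row, a sketch along the following lines may help. The mock chains are plain lists with Sol and VCV matrices standing in for real MCMCglmm fits (an assumption made so the example runs without fitting a model). as.matrix() is applied so that the same code should also accept the mcmc objects stored in a real fit.

```r
set.seed(1)

# Stand-in for one fitted chain: posterior draws in rows, parameters in columns
mock_chain <- function(n_draws = 500) {
  list(
    Sol = matrix(rnorm(n_draws * 2), ncol = 2,
                 dimnames = list(NULL, c("(Intercept)", "pretest"))),
    VCV = matrix(rexp(n_draws * 2), ncol = 2,
                 dimnames = list(NULL, c("provider", "units")))
  )
}
chains <- lapply(1:4, function(i) mock_chain())

# One data frame per chain (fixed and random columns side by side), then stack
chains_df <- do.call(rbind, lapply(seq_along(chains), function(i) {
  m <- chains[[i]]
  data.frame(chain = i, as.matrix(m$Sol), as.matrix(m$VCV),
             check.names = FALSE)
}))

dim(chains_df)
colMeans(chains_df[, -1])
```

Keeping a chain column makes it possible to summarise per chain later (for example with aggregate()) while still allowing arithmetic on whole posterior columns.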
I've written a simple correlation function that takes in three variables. "A" and "B" are numerical vectors of equal length, and "n" is the length.
Corr.fxn <- function(A, B, n){
Correlation <- (sum((A - mean(A))*(B - mean(B))) / (n-1)) / (sd(A)*sd(B))
return(Correlation)
}
The function works well enough, but I have many vectors I want to process. What's the best way to modify this code to process all "N take 2" unique analyses for my set of vectors "N"?
EDIT:
Example data showing the structure of the vectors:
A <- c(-1, 0, 1, -1, 0, 1, -1, 0, 1)
B <- c(1, 1, -1, 0, 1, -1, 0, 0, 1)
...
n <- length(A)
So let's say I have vectors A through Z and I want to modify my code to output a new vector containing all {26 take 2} correlation values.
Here is one possible way you can do it assuming you have a bunch of numeric vectors in a list v as follows:
v <- list()
for (i in 1:10) {
v[[i]] <- sample(1:10, 10, replace = TRUE)
}
apply(combn(1:10, 2), 2, function(x) Corr.fxn(v[[x[1]]], v[[x[2]]], length(v[[x[1]]])))
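For what it is worth, Corr.fxn computes the ordinary Pearson correlation, so the same numbers can be read off the matrix returned by the built-in cor(). The sketch below checks this on random data (the list v is simulated for illustration).

```r
Corr.fxn <- function(A, B, n) {
  (sum((A - mean(A)) * (B - mean(B))) / (n - 1)) / (sd(A) * sd(B))
}

set.seed(42)
v <- lapply(1:10, function(i) rnorm(20))

# All "10 take 2" pairs with the hand-written function
by_hand <- apply(combn(length(v), 2), 2, function(p) {
  Corr.fxn(v[[p[1]]], v[[p[2]]], length(v[[p[1]]]))
})

# Same pairs from the built-in correlation matrix, in combn() order
C <- cor(do.call(cbind, v))
built_in <- t(C)[lower.tri(C)]

all.equal(by_hand, built_in)
```

The transpose is needed because combn() lists pairs row by row, (1,2), (1,3), ..., while upper.tri() would walk the matrix column by column.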
In this answer, I assume 2 things. First, you want to write a function yourself, since otherwise you can use Hmisc::rcorr. Second, you want the "N take 2" part to be inside the function, otherwise the ways suggested earlier are correct. In that case, you can do this:
Corr.fxn <- function(vectors, n){
  pairs <- combn(length(vectors), 2)
  npairs <- ncol(pairs)
  # square matrix: one row and one column per input vector
  cor.mat <- matrix(NA, nrow = length(vectors), ncol = length(vectors))
  for (i in 1:npairs){
    A <- vectors[[pairs[1, i]]]
    B <- vectors[[pairs[2, i]]]
    cor.mat[pairs[1, i], pairs[2, i]] <- (sum((A - mean(A))*(B - mean(B))) / (n-1)) / (sd(A)*sd(B))
  }
  # mirror the upper triangle into the lower one
  cor.mat[lower.tri(cor.mat)] <- t(cor.mat)[lower.tri(cor.mat)] ###
  diag(cor.mat) <- 1 ###
  cor.mat <- data.frame(cor.mat) ###
  row.names(cor.mat) <- colnames(cor.mat) <- names(vectors) ###
  return(cor.mat)
}
The lines that end in ### are there for decorative reasons. The main input is a list called "vectors". So it works as follows:
A<- runif(100, 1, 100)
B<- runif(100, 30, 50)
C<- runif(100, 120, 200)
> Corr.fxn(list(A=A, B=B, C=C), n=100)
A B C
A 1.0000000 -0.11800104 -0.13980458
B -0.1180010 1.00000000 0.04933581
C -0.1398046 0.04933581 1.00000000