I have written a function that performs ARIMA modelling, outputs a table of coefficients and p-values, ranks the p-values and returns an ARIMA model with no insignificant variables.
The function takes two inputs: a time series object and a data frame.
Here is the code:
# auto_arima, p_calc, run_outputs, rank_pval and remove_one are helpers not shown here
backward_stepwise <- function(x, y){
  repeat{
    arima_result <- auto_arima(x)
    arima_pvals <- p_calc(arima_result)
    arima_outputs <- run_outputs(arima_result, arima_pvals)
    arima_ranked <- rank_pval(arima_outputs)
    # temporary fix to .xreg being added to term names
    # (gsub is vectorised, so no loop over the terms is needed)
    arima_ranked$term <- gsub(arima_ranked$term, pattern = 'xreg.',
                              replacement = "")
    remove_num_one <- remove_one(arima_ranked)
    # removed the cond_select function so that y and x write over
    # themselves
    y <- subset(y, select = colnames(y) != remove_num_one)
    x <- as.ts(y)
    if(min(arima_ranked$rank, na.rm = TRUE) != 1){
      break
    }
  }
  return(arima_result)
}
I am going to apply this to a list of time series objects and a list of data frames.
Example of the time series list:
CAN_V98
ADE_U91
ADE_V95
Example of the data frame list:
CAN_V98
ADE_U91
ADE_V95
When I apply it via mapply or a for loop, do both methods take the values from the same index? I.e. will the backward stepwise function strip variables from CAN_V98 and keep using CAN_V98 from the data frame list, or, after performing its first loop, will it move on to the second data frame in the list?
# Application via for loop
ts_outputs <- list()
for(i in mkt_grd){
  x <- list_ts_actual[[i]]
  y <- list_df_actual[[i]]
  ts_outputs[[i]] <- backward_stepwise(x, y)
}
# Application via mapply (mcmapply is the parallel version from the parallel package)
library(parallel)
ts_outputs1 <- mcmapply(backward_stepwise, list_ts_actual,
                        list_df_actual, SIMPLIFY = FALSE)
Thank you for any assistance
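mapply doesn't work like a nested for loop, i.e. it will not run through every j for each i. It pairs elements by position: the first call gets the first element of each argument, the second call gets the second element of each, and so on. Your for loop indexes both lists with the same i, and mapply pairs them by position, so in both cases the i-th time series goes with the i-th data frame (for mapply this assumes both lists are in the same order), and backward_stepwise keeps working on that same pair until its repeat loop breaks.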
mapply(function(x, y){cat("x = ", x, " y = ", y, "\n")},
x = 1:5, y = 1:5)
x = 1 y = 1
x = 2 y = 2
x = 3 y = 3
x = 4 y = 4
x = 5 y = 5
As you can see, it iterates through the arguments in parallel. If y is shorter than x, y is recycled (R also warns when the longer length is not a multiple of the shorter one), i.e.
mapply(function(x, y){cat("x = ", x, " y = ", y, "\n")},
x = 1:5, y = 1:3)
x = 1 y = 1
x = 2 y = 2
x = 3 y = 3
x = 4 y = 1
x = 5 y = 2
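A minimal sketch of the by-position pairing, using small hypothetical stand-ins (list_ts, list_df) for the question's list_ts_actual and list_df_actual: if the two lists might be in different orders, reordering the data frame list by the names of the time series list before calling mapply makes sure each call receives a matching pair.

```r
# Hypothetical stand-ins: same names, but the data frame list is in a different order
list_ts <- list(CAN_V98 = ts(1:3), ADE_U91 = ts(4:6))
list_df <- list(ADE_U91 = data.frame(a = 4:6), CAN_V98 = data.frame(a = 1:3))

# Reorder the data frames so that position i matches the i-th time series by name
stopifnot(setequal(names(list_ts), names(list_df)))
list_df <- list_df[names(list_ts)]

# Each call now gets a matching pair, so this check is TRUE for every element
matched <- mapply(function(x, y) all(as.numeric(x) == y$a), list_ts, list_df)
matched
```

The same reordering applies to the question's mcmapply call, which pairs its arguments the same way mapply does.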
I am working on an assignment that asks me to generate a list of datasets using the code below.
## Use the make_data function to generate 25 different datasets, with mu_1 being a vector
library(caret)  # createDataPartition()
library(dplyr)  # %>% and slice()
x <- seq(0, 3, len = 25)
make_data <- function(a){
  n = 1000
  p = 0.5
  mu_0 = 0
  mu_1 = a
  sigma_0 = 1
  sigma_1 = 1
  y <- rbinom(n, 1, p)
  f_0 <- rnorm(n, mu_0, sigma_0)
  f_1 <- rnorm(n, mu_1, sigma_1)
  x <- ifelse(y == 1, f_1, f_0)
  test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
  list(train = data.frame(x = x, y = as.factor(y)) %>% slice(-test_index),
       test = data.frame(x = x, y = as.factor(y)) %>% slice(test_index))
}
dat <- sapply(x, make_data)
The code runs fine, and dat appears to be a table with 2 rows and 25 columns, each cell holding its own data frame.
Now, each data frame within a cell has 2 columns.
And this is where I get stuck.
While I can get to the data frame in row 1, column 1 just fine (i.e. with dat[1,1]), I can't reach the column of x values within it. I've experimented with:
dat[1,1]$x
dat[1,1][1]
but these only return NULL or throw an error.
Any idea how I can pull the column? Thanks.
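dat[1, 1] is a list, not a data frame: single-bracket indexing on this list matrix returns a list of length one wrapping the train data frame.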
class(dat[1, 1])
#[1] "list"
So to reach x you can do
dat[1, 1]$train$x
Or
dat[1, 1][[1]]$x
As a side note, instead of having this 2 x 25 matrix as output in dat, I would prefer a nested list, which lapply gives you:
dat <- lapply(x,make_data)
# Access the `x` column of the `train` data frame in the first list element.
dat[[1]]$train$x
However, this is quite subjective, and you can choose whichever format you like best.
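A third option is double-bracket matrix indexing, dat[[1, 1]], which returns the data frame itself rather than a list wrapping it. The sketch below uses a small hypothetical stand-in, make_small(), instead of make_data() so that it runs without caret or dplyr; the indexing behaves the same way on the sapply() result in the question.

```r
# Hypothetical stand-in for make_data(): a named list holding two small data frames
make_small <- function(a) {
  list(train = data.frame(x = c(a, a + 1), y = c(0, 1)),
       test  = data.frame(x = c(a + 2, a + 3), y = c(1, 0)))
}

# sapply() simplifies the 3 results into a 2 x 3 matrix of mode "list"
dat <- sapply(c(0, 10, 20), make_small)
dim(dat)               # 2 3
class(dat[1, 1])       # "list": single brackets return a list wrapping the data frame
train1 <- dat[[1, 1]]  # double brackets return the data frame itself
train1$x               # 0 1
```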
I'd like to convert a tbl_cube with a list-valued measure into a tibble, but am running into problems.
For example,
library(dplyr)  # provides tbl_cube() in dplyr 0.8.x
Y <- array(1:4, dim = c(2,2))
is a "cube" of integers (it only has two axes, so it's really a matrix), which can be turned into a tbl_cube as follows:
Y_cube <- tbl_cube(dimensions = list(A = c('foo','bar'),
B = c('x','y')),
measures = list(y = Y))
which in turn can be converted into a tibble with as_tibble(Y_cube).
However, if we start with a list-valued array, we can still create the tbl_cube:
X <- array(lapply(1:4, seq_len), dim = c(2,2))
X_cube <- tbl_cube(dimensions = list(A = c('foo','bar'),
B = c('x','y')),
measures = list(x = X))
but as_tibble(X_cube) generates an error:
Error: Can't subset elements that don't exist.
x The locations 3 and 4 don't exist.
i There are only 2 elements.
I'm using dplyr version 0.8.5.
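One possible workaround, sketched under the assumption that a plain data frame with a list column is acceptable (it does not make as_tibble() work on the tbl_cube itself), is to build the long table by hand: expand.grid() produces one row per cell in the same column-major order R uses to store the array, and the array's cells can then be attached as a list column.

```r
# Same list-valued array as in the question: each cell holds an integer vector
X <- array(lapply(1:4, seq_len), dim = c(2, 2))

# One row per cell; expand.grid varies A fastest, matching column-major array order
out <- expand.grid(A = c("foo", "bar"), B = c("x", "y"), stringsAsFactors = FALSE)

# Drop the dim attribute so X becomes a plain list, then store it as a list column
x_list <- X
dim(x_list) <- NULL
out$x <- I(x_list)

out$x[[4]]   # 1 2 3 4, the cell with A = "bar", B = "y"
```

If a tibble is required, tibble::tibble(A = out$A, B = out$B, x = x_list) should give the same table with x as an ordinary list column.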
I am trying to create a matrix from a list that consists of N matrices of unequal size.
The reason is to draw R individual bootstrap samples.
In the example below there are 2 companies, one with 10 observations and one with just 5.
Data:
set.seed(7)
Time <- c(10, 5)
xv <- matrix(c(rnorm(10, 5, 2), rnorm(5, 20, 1), rnorm(10, 5, 2), rnorm(5, 20, 1)), ncol = 2)
y <- matrix(c(rnorm(10, 5, 2), rnorm(5, 20, 1)))
z <- matrix(c(rnorm(10, 5, 2), rnorm(5, 20, 1), rnorm(10, 5, 2), rnorm(5, 20, 1)), ncol = 2)
# create data frame of input variables which helps
# to conduct the rowwise bootstrapping
data <- data.frame(y = y, xv = xv, z = z)
rows <- nrow(data)
cols <- ncol(data)
# create the index to sample from the different panels
cumTime <- c(0, cumsum(Time))
index <- findInterval(seq_len(rows), cumTime, left.open = TRUE)
# draw R individual bootstrap samples
R <- 5  # number of bootstrap samples (replicate() has no argument called R)
bootList <- replicate(n = R, list(), simplify = FALSE)
bootList <- lapply(bootList, function(x) by(data, INDICES = index,
  FUN = function(x) dplyr::sample_n(tbl = x, size = nrow(x), replace = TRUE)))
---------- UNLISTING ---------
Currently, I try to do it like this, which gives the wrong result:
Example for just 1 entry of the list:
matrix(unlist(bootList[[1]], recursive = T), ncol = cols)
The desired output is just
bootList[[1]]
as a matrix.
Do you have an idea how to do this, ideally reasonably efficiently?
The matrices are then processed in (unfortunately slow) MLE estimations.
I found a solution for you. From what I gather, you have a data frame containing all observations of all companies, which may have different panel lengths, and you would like a bootstrap sample for each company of the same size as its original panel length.
You merely have to add a company indicator:
data$company = c(rep(1, 10), rep(2, 5)) # this could even be a factor.
L1 = split(data, data$company)
L2 = lapply(L1, FUN = function(s) s[sample(x = 1:nrow(s), size = nrow(s), replace = TRUE),] )
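Stop here if you would like to have separate bootstrap samples, e.g. in case you want to estimate separately. Otherwise, bind them back into one data frame (wrap the result in as.matrix() if you need a matrix):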
bootdata = do.call(rbind, L2)
Best wishes,
Tim
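Building on this answer, here is a base-R sketch that repeats the within-company resampling R times and returns each draw as a numeric matrix, which is the format the question asks for. The helper name boot_panels() and its arguments are hypothetical, and the data are simulated stand-ins for the question's y, xv and z columns.

```r
# Hypothetical helper: R panel bootstrap draws, each returned as a numeric matrix.
# Rows are resampled with replacement within each company, so every company
# keeps its original number of observations and stays in its own block of rows.
boot_panels <- function(data, company, R) {
  pieces <- split(data, company)                 # one data frame per company
  replicate(R, {
    drawn <- lapply(pieces, function(s)
      s[sample.int(nrow(s), nrow(s), replace = TRUE), , drop = FALSE])
    as.matrix(do.call(rbind, drawn))             # stack companies, then convert
  }, simplify = FALSE)
}

set.seed(7)
Time <- c(10, 5)                                 # panel lengths, as in the question
data <- data.frame(y = rnorm(15), x1 = rnorm(15), x2 = rnorm(15))
company <- rep(seq_along(Time), times = Time)    # 1 repeated 10 times, then 2 five times
boots <- boot_panels(data, company, R = 5)
dim(boots[[1]])                                  # 15 3
```

Stacking the pieces with do.call(rbind, ...) before as.matrix() keeps each row intact, whereas unlist() flattens the data column by column, which is why matrix(unlist(...), ncol = cols) in the question mixes up the rows.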
I have a df as follows:
t r
1 0 100.00000
2 1 135.86780
3 2 149.97868
4 3 133.77316
5 4 97.08129
6 5 62.15988
7 6 50.19177
and so on...
I want to apply a rolling regression using lm(r~t).
However, I want to estimate one model per iteration, where each iteration covers a fixed time window from t to t+k. Essentially, if k = 5, the first model should be estimated with t = 0, t = 1, ..., t = 5, the second with t = 1, t = 2, ..., t = 6, and so on.
In other words, it iterates from a starting point over a window of pre-specified length k and applies lm to each such window in turn.
I have tried using lapply like this:
mdls = lapply(df, function(x) lm(r[x,]~t))
However, I got the following error:
Error in r[x, ] : incorrect number of dimensions
If I remove the [x,], each iteration gives me the same model, in other words using all the observations.
If I use rollapply:
coefs = rollapply(df, 3, FUN = function(x) coef(lm(r~t, data =
as.data.frame(x))), by.column = FALSE, align = "right")
res = rollapply(df, 3, FUN = function(z) residuals(lm(r~t, data =
as.data.frame(z))), by.column = FALSE, align = "right")
Where:
t = seq(0,15,1)
r = (100+50*sin(0.8*t))
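df = as.data.frame(t,r)  # note: the 2nd argument of as.data.frame() is row.names, so r does not become a column; data.frame(t, r) keeps both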
I get 15 models, but they are all estimated over the entire dataset, giving the same intercepts and coefficients. This is strange, as I managed to make rollapply work just before testing it in a new script. For some reason it no longer works, so I am unsure whether R is playing tricks on me or whether something is wrong with my code.
How can I adjust these methods to make sure they iterate according to my wishes?
Here is a possible solution. The idea is to pass the row indices 1:nrow(df) to rollapply, so that each window it hands to FUN is a set of row numbers selecting the rows to fit.
df = data.frame(t = 0:6, r = c(100.00000, 135.86780, 149.97868, 133.77316, 97.08129, 62.15988, 50.19177))
N = nrow(df)
require(zoo)
# Coefficients
coefs <- rollapply(data = 1:N, width = 3, FUN = function(x){
r = df$r[x]
t = df$t[x]
out <- coef(lm(r~t))
return(out)
})
# Residuals
res <- rollapply(data = 1:N, width = 3, FUN = function(x){
r = df$r[x]
t = df$t[x]
out <- residuals(lm(r~t))
return(out)
})
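For comparison, here is a base-R sketch of the same rolling regression without zoo, assuming df is built with data.frame(t = t, r = r) so that both variables really are columns. The helper rolling_coefs() is hypothetical; width counts rows per window, so the question's window from t to t + k corresponds to width = k + 1.

```r
# Hypothetical helper: fit lm(r ~ t) on every window of `width` consecutive rows
# and return one row of coefficients (intercept and slope) per window.
rolling_coefs <- function(df, width) {
  starts <- seq_len(nrow(df) - width + 1)          # first row of each window
  fits <- sapply(starts, function(s) {
    window <- df[s:(s + width - 1), ]              # rows s, s+1, ..., s+width-1
    coef(lm(r ~ t, data = window))
  })
  t(fits)                                          # transpose: one row per window
}

t_vals <- seq(0, 15, 1)
df <- data.frame(t = t_vals, r = 100 + 50 * sin(0.8 * t_vals))  # two real columns
cf <- rolling_coefs(df, width = 6)   # k = 5 in the question: t, ..., t + 5
dim(cf)                              # 11 2
```

Swapping coef() for residuals() in the inner function gives the rolling residuals instead, again one row per window.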
Let's say I have the code below:
data = data.frame(x=numeric(), y=numeric(), z=numeric(), ans=numeric())
x = rnorm(10,0,.01)
y = rnorm(10,0,.45)
z = rnorm(10,0,.8)
ans = x+y+z
data = rbind(data, data.frame(x=x, y=y, z=z, ans=ans))
example = function(ans) {
ans^2
}
data$result = example(data$ans)
I want to use a while loop to check the ans column of the data frame: if all of its values are below 0, apply the example function to the ans column. If they are not all below 0, I would like it to run again until they are. Any help is appreciated.
You can use any() to test whether some x values are still non-negative, and redraw only those until none are left:
data = data.frame(x=rnorm(10,0,.01),
y = rnorm(10,0,.45),
z = rnorm(10,0,.8))
while(any(data$x >= 0)){
data$x[data$x >= 0] <- rnorm(length(data$x[data$x >= 0]),0,.01)
}
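# use data's own columns: a bare x+y+z picks up the question's global x, y and z,
# which is why ans in the printed output below does not equal x + y + z
data$ans = with(data, x + y + z)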
print(data)
x y z ans
1 -0.0014348613 0.51931771 -0.4695617 1.2199625
2 -0.0037155145 -0.72322260 0.4650501 2.2660665
3 -0.0007619743 0.42842295 -0.3660313 0.2690932
4 -0.0068680912 0.36888855 1.4445536 0.6955025
5 -0.0134698425 -0.17174076 -1.2325956 0.7463931
6 -0.0029502825 -0.04208495 -1.4656484 -0.7020727
7 -0.0027566384 0.09476311 -0.1328970 -0.1437156
8 -0.0188576808 -0.25938843 -0.6648152 0.4843587
9 -0.0013769550 -0.00792926 0.4946057 -2.1885040
10 -0.0026376453 -0.15831996 -0.1263073 -0.2611772
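The answer above makes x negative, whereas the question asks for all of ans to be below 0 before example() is applied. Here is a sketch of that variant, under the assumption that failing rows may simply be redrawn; make_rows() is a hypothetical helper that regenerates x, y, z and ans for a given number of rows.

```r
example <- function(ans) ans^2

# Hypothetical helper: n fresh rows drawn as in the question, with ans = x + y + z
make_rows <- function(n) {
  x <- rnorm(n, 0, .01)
  y <- rnorm(n, 0, .45)
  z <- rnorm(n, 0, .8)
  data.frame(x = x, y = y, z = z, ans = x + y + z)
}

set.seed(1)
data <- make_rows(10)
# Keep redrawing the rows whose ans is not yet negative
while (any(data$ans >= 0)) {
  bad <- data$ans >= 0
  data[bad, ] <- make_rows(sum(bad))
}
data$result <- example(data$ans)   # only reached once every ans is below 0
```

Because the rows are drawn independently, redrawing only the rows whose ans is still non-negative gives the same distribution as regenerating the whole data frame until every ans is negative, but needs far fewer iterations: regenerating all 10 rows at once succeeds only about once in 1,024 tries.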