Using lapply and the lm function together in R

I have a df as follows:
  t         r
1 0 100.00000
2 1 135.86780
3 2 149.97868
4 3 133.77316
5 4 97.08129
6 5 62.15988
7 6 50.19177
and so on...
I want to apply a rolling regression using lm(r~t).
However, I want to estimate one model per iteration, where each iteration covers a fixed time window from t to t+k. For example, with k = 5 the first model should be estimated with t = 0, 1, ..., 5, the second with t = 1, 2, ..., 6, and so on.
In other words, the window starts at a given point and slides forward one step at a time, and lm is applied to each window of pre-specified length k.
I have tried using lapply like this:
mdls = lapply(df, function(x) lm(r[x,]~t))
However, I got the following error:
Error in r[x, ] : incorrect number of dimensions
If I remove the [x,], every iteration gives me the same model, i.e. one fitted to all the observations.
If I use rollapply:
coefs = rollapply(df, 3, FUN = function(x) coef(lm(r~t, data = as.data.frame(x))), by.column = FALSE, align = "right")
res = rollapply(df, 3, FUN = function(z) residuals(lm(r~t, data = as.data.frame(z))), by.column = FALSE, align = "right")
Where:
t = seq(0,15,1)
r = (100+50*sin(0.8*t))
df = as.data.frame(t,r)
I get 15 models, but they are all estimated over the entire dataset, giving identical intercepts and slope coefficients. This is strange, as I managed to make rollapply work just before testing it in a new script. For some reason it no longer works, so I am unsure whether R is playing tricks on me or whether there is something wrong with my code.
How can I adjust these methods so that they iterate over the windows the way I intend?
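A minimal base-R sketch of the lapply route, assuming windows of k + 1 consecutive rows as described above. lapply(df, ...) loops over the columns of a data frame (a data frame is a list of columns), so x is a whole column vector and r[x, ] fails; looping over window start rows avoids that. Note also that the second positional argument of as.data.frame() is row.names, so as.data.frame(t, r) does not build a two-column data frame, which may be part of why the rollapply calls misbehave; the sketch uses data.frame(t = t, r = r) instead. The names k and starts are illustrative.

```r
# Sketch: one lm() per sliding window, via lapply over window start rows
t <- seq(0, 15, 1)
r <- 100 + 50 * sin(0.8 * t)
df <- data.frame(t = t, r = r)      # two named columns (as.data.frame(t, r) treats r as row.names)

k <- 5                              # each window covers t, t+1, ..., t+k, i.e. k + 1 rows
starts <- seq_len(nrow(df) - k)     # first row of every complete window

# Loop over start rows, not over the columns of df
mdls <- lapply(starts, function(i) lm(r ~ t, data = df[i:(i + k), ]))

# Collect the coefficients: one row per window
coefs <- do.call(rbind, lapply(mdls, coef))
head(coefs)
```

With 16 observations and k = 5 this fits 11 models, the first on t = 0, ..., 5 and the second on t = 1, ..., 6.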

Here is a possible solution. The idea is to pass the vector 1:nrow(df) to rollapply, so that each window is a set of row indices indicating which rows we want to select.
df = data.frame(t = 0:6, r = c(100.00000, 135.86780, 149.97868, 133.77316, 97.08129, 62.15988, 50.19177))
N = nrow(df)
require(zoo)
# Coefficients
coefs <- rollapply(data = 1:N, width = 3, FUN = function(x){
r = df$r[x]
t = df$t[x]
out <- coef(lm(r~t))
return(out)
})
# Residuals
res <- rollapply(data = 1:N, width = 3, FUN = function(x){
r = df$r[x]
t = df$t[x]
out <- residuals(lm(r~t))
return(out)
})
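The same row-index idea can be sketched without zoo: base R's embed() builds the overlapping windows of row indices that rollapply(1:N, width = 3, ...) iterates over, one window per row (in reverse order, hence the column flip). A minimal sketch assuming the df above and a width of 3; w and windows are illustrative names, and the resulting coefs should correspond row by row to the rollapply version.

```r
df <- data.frame(t = 0:6, r = c(100.00000, 135.86780, 149.97868, 133.77316,
                                97.08129, 62.15988, 50.19177))
N <- nrow(df)
w <- 3                                        # window width, as in the answer

# embed() lists each window's indices in descending order; flip to ascending
windows <- embed(1:N, w)[, w:1, drop = FALSE]

# One lm() per row of index windows; transpose so each window becomes a row
coefs <- t(apply(windows, 1, function(idx) coef(lm(r ~ t, data = df[idx, ]))))
```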

Related

Create a matrix from a list consisting of unequal matrices for individual bootstraps

I am trying to create a matrix from a list which consists of N unequal matrices...
The reason for this is to make R individual bootstrap samples.
In the example below there are 2 companies, one with 10 and one with just 5 observations.
Data:
set.seed(7)
Time <- c(10,5)
xv <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2);
y <- matrix( c(rnorm(10,5,2), rnorm(5,20,1)));
z <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2)
# create a data frame of input variables, which helps
# to conduct the rowwise bootstrapping
data <- data.frame (y = y, xv = xv, z = z);
rows <- dim(data)[1];
cols <- dim(data)[2];
# create the index to sample from the different panels
cumTime <- c(0, cumsum (Time));
index <- findInterval (seq (1:rows), cumTime, left.open = TRUE);
# draw R individual bootstrap samples
bootList <- replicate(n = 5, list(), simplify = FALSE);
bootList <- lapply (bootList, function(x) by (data, INDICES = index, FUN = function(x) dplyr::sample_n (tbl = x, size = dim(x)[1], replace = T)));
---------- UNLISTING ---------
Currently I try to do it like this, which gives an incorrect result.
Example for just 1 entry of the list:
matrix(unlist(bootList[[1]], recursive = T), ncol = cols)
The desired output is just
bootList[[1]]
as a matrix.
Do you have an idea how to do this, ideally reasonably efficiently?
The matrices are then processed in MLE estimations, which are unfortunately slow...
I found a solution for you. From what I gather, you have a data frame containing all observations of all companies, which may have different panel lengths, and you would like a bootstrap sample for each company of the same size as its original panel length.
You merely have to add a company indicator:
data$company = c(rep(1, 10), rep(2, 5)) # this could even be a factor.
L1 = split(data, data$company)
L2 = lapply(L1, FUN = function(s) s[sample(x = 1:nrow(s), size = nrow(s), replace = TRUE),] )
Stop here if you would like to have separate bootstrap samples, e.g. in case you want to estimate separately.
bootdata = do.call(rbind, L2)
Best wishes,
Tim
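Combining the split-and-resample steps above, a sketch that returns R bootstrap samples as numeric matrices, which is the form the question asks for. It assumes simplified stand-in data (three numeric columns instead of y, xv and z) and base-R row sampling in place of dplyr::sample_n; R = 5 is illustrative. Since a by() result here is a list of data frames, the same do.call(rbind, ...) step should also stack one of the question's bootList entries.

```r
set.seed(7)
# Simplified stand-in data: 2 companies with 10 and 5 observations
data <- data.frame(y = rnorm(15), x1 = rnorm(15), x2 = rnorm(15))
data$company <- c(rep(1, 10), rep(2, 5))

R <- 5                              # number of bootstrap samples (illustrative)
L1 <- split(data, data$company)     # one data frame per company

boot_mats <- lapply(seq_len(R), function(b) {
  # resample rows within each company, keeping each panel's original length
  L2 <- lapply(L1, function(s) s[sample(nrow(s), replace = TRUE), , drop = FALSE])
  # stack the companies, drop the indicator, convert to a numeric matrix
  as.matrix(do.call(rbind, L2)[, c("y", "x1", "x2")])
})

# The same stacking applied to a single by() result, as in the question's approach
one_by <- by(data[, c("y", "x1", "x2")], INDICES = data$company,
             FUN = function(s) s[sample(nrow(s), replace = TRUE), , drop = FALSE])
by_mat <- as.matrix(do.call(rbind, one_by))
```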

Input data must have class mids

I'm working on a school project where I need to impute missing data and after the imputation with mice I'm trying to produce completed data sets with the complete-function.
When I run them one by one everything works fine, but I'd like to use a for-loop in case I want to have more than just m = 5 imputations. Now, when trying to run the for-loop, I always get the error
Error in complete(imputation[1]) : Input data must have class 'mids'.
However, when I look up the class it is mids. What's going wrong here?
This is my code:
imputation <- mice(data = data, m = 5, method = "norm", maxit = 1, seed = 500)
m <- 5
for(i in 1:m){
completeData[m] <- complete(imputation[m])
print(summary(completeData[m]))
}
Could someone maybe help me out here?
We get the error because the class of imputation[1] is not mids:
imputation[1]
# $call
# mice(data = walking, m = 5, maxit = 0, seed = 500)
class(imputation[1])
# [1] "list"
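This behaviour is not specific to mice: for a classed list without its own `[` method, single-bracket subsetting keeps only the names and returns a plain list, while [[ extracts the element itself. A base-R sketch with a hypothetical class `myclass` standing in for mids; it also shows filling an output list with [[i]], one way to store the completed data sets from the question's loop.

```r
# Hypothetical stand-in for a classed list such as a mids object
obj <- structure(list(data = data.frame(x = 1:3), m = 5), class = "myclass")
class(obj)       # the class attribute is there
class(obj[1])    # [ returns a plain sub-list: the class is dropped
class(obj[[1]])  # [[ returns the element itself (here a data frame)

# Storing one result per iteration: index the output list with [[i]]
results <- vector("list", 3)
for (i in seq_along(results)) {
  results[[i]] <- data.frame(imputation = i)
}
```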
From the manual for ?complete:
Usage
complete(x, action = 1, include = FALSE)
library(mice)
# dummy data imputation
data(walking)
imputation <- mice(walking, maxit = 0, m = 5, seed = 500)
# using for loop
m <- 5
for(i in 1:m){
completeData <- complete(imputation, i)
print(summary(completeData))
}
# I prefer to use lapply
lapply(seq(imputation$m), function(i) summary(complete(imputation, i)))

How to make custom function more robust and less error prone

Here is my data:
LoDFs <- list(first = mtcars[, c(1:3)], second = mtcars[, c(4:6)])
row.names(LoDFs[[1]]) <- NULL
row.names(LoDFs[[2]]) <- NULL
Here is my function:
RollapplyMultipleFuncsAndWins <- function(df.val, df.name, window.size, funs, ..., GroupByWindowSize = TRUE){
library(zoo) # REQUIRED FOR rollapply
by.rows <- 1
combinations <- expand.grid(window.size, funs)
combinations <- cbind(combinations, rep(names(funs), each = length(window.size)))
colnames(combinations) <- c("window.size", "func.call", "func.name")
combinations$window.size <- sprintf(paste0("%0", max(nchar(combinations$window.size)), "d"),
combinations$window.size)
LoMs <- apply(combinations, by.rows, function(x) {
rollapply(
df.val,
width = as.numeric(x[["window.size"]]),
by = as.numeric(x[["window.size"]]),
FUN = x[["func.call"]],
align = "left")})
# COLUMN NAMING CONVENTION: column_name.function_name
LoMs <- lapply(seq_along(LoMs), function(x) {
colnames(LoMs[[x]]) <- paste(colnames(LoMs[[x]]),
combinations$func.name[x],
sep=".");
LoMs[[x]] })
# MULTIPLE FUNCTIONS WITH SAME WINDOW SIZE IN ONE DATASETS
# LIST ELEMENTS NAMING CONVENTION: dataset_name.window_size
if (GroupByWindowSize){
df.win.grps <- lapply(unique(combinations$window.size), function(x) { grep(x, combinations$window.size) })
LoMs <- lapply(df.win.grps, function(x){ do.call(cbind, LoMs[x]) })
names(LoMs) <- paste(rep(df.name, each=length(df.win.grps)),
unique(combinations$window.size),
sep=".")
}
# MULTIPLE FUNCTIONS WITH SAME WINDOW SIZE IN MULTIPLE DATASETS
# LIST ELEMENTS NAMING CONVENTION: dataset_name.function_name.window_size
else {
names(LoMs) <- paste(rep(df.name, each=nrow(combinations)),
combinations$func.name,
combinations$window.size,
sep=".")
}
return(LoMs)
}
The purpose of this function is to apply multiple functions with multiple rolling/moving window sizes over one dataset. It takes the window sizes and the functions as inputs and creates all possible combinations of those values. For example, when you pass c(2, 3, 10) as window.size and c(median = median, mean = mean) as funs, it creates the following combinations (meaning that median and mean will each be called with rolling/moving windows of size 2, 3 and 10 on the specified dataset):
window.size func.call func.name
1 02 function (x, na.rm = FALSE) , UseMethod("median") median
2 03 function (x, na.rm = FALSE) , UseMethod("median") median
3 10 function (x, na.rm = FALSE) , UseMethod("median") median
4 02 function (x, ...) , UseMethod("mean") mean
5 03 function (x, ...) , UseMethod("mean") mean
6 10 function (x, ...) , UseMethod("mean") mean
The function then returns a list of matrices where each matrix holds the results obtained with a particular window size, including the results from all functions (if GroupByWindowSize is TRUE), or a list of matrices where each matrix holds the results obtained with a particular window size and a particular function (if GroupByWindowSize is FALSE). You can try, e.g., the following to better understand what I mean:
res_one_def <- RollapplyMultipleFuncsAndWins(LoDFs[[1]], names(LoDFs)[1], c(2, 3), c(median = median, mean = mean))
res_one_non_def <- RollapplyMultipleFuncsAndWins(LoDFs[[1]], names(LoDFs)[1], c(2, 3), c(median = median, mean = mean), GroupByWindowSize=FALSE)
The problem occurs when I want a single window size but multiple functions, e.g.:
res_one_def <- RollapplyMultipleFuncsAndWins(LoDFs[[1]], names(LoDFs)[1], c(1), c(median = median, mean = mean))
I've figured out that the problem is in the LoMs <- apply(combinations, by.rows, function(x) { .... line. Instead of a list of matrices (as it returned previously), it now returns one matrix, and I do not know why (combinations has the same type as before, it is just smaller):
window.size func.call func.name
1 1 function (x, na.rm = FALSE) , UseMethod("median") median
2 1 function (x, ...) , UseMethod("mean") mean
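A likely explanation is that apply() simplifies its result whenever every call returns a value of the same length. With window sizes 2 and 3, rollapply returns matrices of different sizes, so apply() has to return a list; with a single window size of 1, median and mean both return a matrix of the same dimensions, so apply() binds them into one matrix. A base-R sketch of that rule on a toy matrix, plus lapply() over row indices, which always returns a list (in R 4.1.0 and later, apply(..., simplify = FALSE) is another option).

```r
m <- matrix(1:6, nrow = 2)

# Every call returns a length-2 vector, so apply() simplifies to a 2 x 3 matrix
same_len <- apply(m, 2, function(col) col * 10)

# The calls return vectors of lengths 1, 3 and 5, so apply() must return a list
diff_len <- apply(m, 2, function(col) seq_len(col[1]))

# lapply() over row indices returns a list regardless of the result sizes
always_list <- lapply(seq_len(nrow(m)), function(i) m[i, ] * 10)
```

In the function above, the analogous change would be iterating with lapply(seq_len(nrow(combinations)), ...) and indexing combinations[i, ] inside.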
Questions:
Why do I get the behaviour described above?
If you check the code, you can see that I'm building combinations as expand.grid(window.size, funs). What if I also want to handle expand.grid(funs, window.size) (notice the reordered arguments)? Will the if (GroupByWindowSize){ branch also work correctly in that case (let's pretend that combinations will be passed as an argument to the function, so I want to be able to handle various layouts)?
Is it possible to define the naming convention for list elements at the beginning of the function and easily switch it from dataset_name.window_size to e.g. dataset_name.function_name.window_size in both if-else branches? As you can see, the names(LoMs) ... code is very different in the two branches; I'm curious whether it can be unified somehow.
How can I make this code more robust and more generic in general? Is my approach correct, or is there a better way? Any ideas are welcome.
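For the naming and argument-order questions, one option is to build the names from columns selected by name rather than by position, so a single helper serves both branches and does not depend on the order of the expand.grid() arguments. The helper make_names below is hypothetical, and the field values are assumed to be already formatted (e.g. zero-padded window sizes).

```r
# Hypothetical helper: paste a prefix and the chosen columns of combos with "."
make_names <- function(prefix, combos, fields) {
  do.call(paste, c(list(prefix), unname(as.list(combos[fields])), sep = "."))
}

combos <- expand.grid(window.size = c("02", "03"), func.name = c("median", "mean"),
                      stringsAsFactors = FALSE)

unique(make_names("first", combos, "window.size"))          # dataset_name.window_size
make_names("first", combos, c("func.name", "window.size"))  # dataset_name.function_name.window_size

# Selecting columns by name is unaffected by the order of the expand.grid() arguments
combos2 <- expand.grid(func.name = c("median", "mean"), window.size = c("02", "03"),
                       stringsAsFactors = FALSE)
make_names("first", combos2, c("func.name", "window.size"))
```

The convention then becomes a single argument (the vector of field names) that can be set once at the top of the function.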

Replacing a rolling average for loop with apply in R

I want to test the correlations between moving averages of varying lengths and a dependent variable. I've written a for loop that gets the job done, but for loops are obviously not the ideal solution. Could someone give me some pointers on how to replace this for loop with apply as a more elegant solution? I've provided code and test data.
library(zoo)
# a function that calculates the correlation between moving averages for
# different lengths of window
# the inputs are "independent": the variable over which to apply the moving function,
# "dependent": the output column, "startLength": the shortest window length,
# "endLength": the longest window length,
# "functionType": the function to apply (mean, sd, etc.)
MovingAverageCorrelation <- function(indepedent, depedent, startLength, endLength, functionType) {
# declare a matrix for the different rolling functions and a correlation vector
avgMat <- matrix(nrow = length(depedent), ncol = (endLength-startLength+1))
corVector <- rep(NA, ncol(avgMat))
# run the rollapply function over the data and calculate the corresponding correlations
for (i in startLength:endLength) {
j <- i - startLength + 1 # column for window length i, so startLength > 1 also works
avgMat[, j] <- rollapply(indepedent, width = i, FUN = functionType,
na.rm = T, fill = NA, align = "right")
corVector[j] <- cor(avgMat[, j], depedent, use = "complete.obs")
}
return(corVector)
}
# set test data
set.seed(100)
indVector <- runif(1000)
depVector <- runif(1000)
# run the function over the data
cor <- MovingAverageCorrelation(indVector, depVector, 1, 100, "mean")
Thanks!
Try sapply:
sapply(1:100, function(i) cor(rollapplyr(indVector, i, mean, na.rm = TRUE, fill = NA),
depVector, use = "complete.obs"))
If there are no NAs in your inputs this would work and is substantially faster:
sapply(1:100, function(i) cor(rollmeanr(indVector, i, fill = NA), depVector, use = "comp"))
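If zoo is not available, or as a cross-check, a base-R sketch of the same computation: stats::filter() with sides = 1 computes a trailing (right-aligned) moving average, which should agree with rollmeanr(x, k, fill = NA) when the input has no NAs. moving_mean is a hypothetical helper name.

```r
set.seed(100)
indVector <- runif(1000)
depVector <- runif(1000)

# Right-aligned moving average of width k; the first k - 1 values are NA.
# stats:: is explicit because dplyr::filter is an unrelated function.
moving_mean <- function(x, k) as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))

# One correlation per window length, skipping the leading NAs
cors <- sapply(1:100, function(k)
  cor(moving_mean(indVector, k), depVector, use = "complete.obs"))
```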

How to use mapply to calculate CCF for list of pairs of time series?

I am trying to apply the functions described here to a set of time series. mapply seems to be a good approach, but I guess there is a problem either in how I define the function or in how I use mapply.
Here is the example code. I found a discrepancy in the format of the data frame being returned, which might be the source of the error.
rm(list = ls()) # clear the workspace first, so the function defined below is not removed
# define the function to apply
ccffunction <- function(x, y, plot = FALSE){
ts1 = get(x)
ts2 = get(y)
d <- ccf(ts1, ts2,lag.max = 24, plot = plot)
cor = d$acf[,,1]
lag = d$lag[,,1]
dd <- data.frame(lag = lag, ccf = cor)
return(t(dd)) # if I don't take the transpose, I don't get a df but info on the contents.
# It seems that mapply is stacking the results from the two series vertically,
# and the main part may be to define the correct format of the object returned
}
# List of time series simulated for testing results
set.seed(123)
ts1 = arima.sim(model = list(ar=c(0.2, 0.4)), n = 10)
ts2 = arima.sim(model = list(ar=c(0.1, 0.2)), n = 10)
ts3 = arima.sim(model = list(ar=c(0.1, 0.8)), n = 10)
assign("series1", ts1)
assign("series2" , ts2)
assign("series3" , ts3)
tslist <- list(series1 = ts1, series2 = ts2, series3 = ts3)
# convert to mts object if it makes any difference
tsmts <- do.call(cbind, tslist)
class(tsmts)
# create pairs of time series using combn function
tspairs <- combn(names(tslist), 2)
tspairs
tspairs2 <- combn(colnames(tsmts), 2)
tspairs2
try1 <- mapply(ccffunction, tspairs[1, ], tspairs[2, ])
try2 <- mapply(function(x, y){ccf(x, y)}, tspairs2[1, ], tspairs2[2,])
I expected try2 to work directly, and also when the pairs of time series are created with combn(tslist, 2) and passed as arguments via plyr::mlply, but that approach does not work, or I am not using it correctly.
Is there a way to compute the CCF matrix for a set of time series using this approach, or any alternative?
Edit: tried to make the question clearer and more specific.
Thanks.
You can try this:
library(plyr) # provides aaply
ccff <- function(tsVec)
{
return (list(ccf(tsVec[[1]], tsVec[[2]], plot=FALSE)))
}
corList <- aaply(combn(tslist, 2), 2, ccff)
The results are stored in corList, which can then be accessed through corList[[1]].
Key points:
Note the tsVec[[1]] in the function definition. ccff essentially receives a list, hence the [[]].
Also note the return (list(...)) in the function definition. That is needed to be able to merge all the return values from the function into a single data structure from the caller.
Hope this helps.
Thank you,
GK
http://gk.palem.in/
ccf cannot look up a time-series object from its name; that lookup is what the get in try1 does.
So in try2 you are simply passing ccf two strings, because it cannot see the time-series objects.
> ccf("a_string","another_string")
Error in acf(X, lag.max = lag.max, plot = FALSE, type = type, na.action = na.action) :
'x' must be numeric
and
mapply(function(x, y){ccf(x, y)}, tspairs2[1, ], tspairs2[2,])
Error in acf(X, lag.max = lag.max, plot = FALSE, type = type, na.action = na.action) :
'x' must be numeric
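Putting both answers together, a base-R sketch that keeps the mapply approach from the question but looks each series up in tslist by name, so ccf receives numeric series rather than strings, and uses SIMPLIFY = FALSE so mapply returns one data frame per pair instead of stacking results. The long-format output (one row per pair and lag) is a design choice of this sketch, not the only option.

```r
set.seed(123)
tslist <- list(series1 = arima.sim(model = list(ar = c(0.2, 0.4)), n = 10),
               series2 = arima.sim(model = list(ar = c(0.1, 0.2)), n = 10),
               series3 = arima.sim(model = list(ar = c(0.1, 0.8)), n = 10))
tspairs <- combn(names(tslist), 2)   # 2 x 3 matrix of series names

# Look the series up in tslist, so ccf() receives numeric series, not strings
ccf_list <- mapply(function(a, b) {
  d <- ccf(tslist[[a]], tslist[[b]], lag.max = 24, plot = FALSE)
  data.frame(x = a, y = b, lag = d$lag[, , 1], ccf = d$acf[, , 1])
}, tspairs[1, ], tspairs[2, ], SIMPLIFY = FALSE, USE.NAMES = FALSE)

# One long data frame: one row per pair of series and lag
ccf_long <- do.call(rbind, ccf_list)
```

With only 10 observations per series, ccf caps the requested lag.max of 24 at the available number of lags.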
