I'd like to convert a tbl_cube with a list-valued measure into a tibble, but am running into problems.
For example
Y <- array(1:4, dim = c(2,2))
is a "cube" of integers (it only has two axes so it's really a matrix) which can be turned into a tbl_cube as follows
Y_cube <- tbl_cube(dimensions = list(A = c('foo','bar'),
B = c('x','y')),
measures = list(y = Y))
which in turn can be converted into a tibble with as_tibble(Y_cube).
However, if we start with a list-valued array then we can create the tbl_cube:
X <- array(lapply(1:4, seq_len), dim = c(2,2))
X_cube <- tbl_cube(dimensions = list(A = c('foo','bar'),
B = c('x','y')),
measures = list(x = X))
but as_tibble(X_cube) generates an error:
Error: Can't subset elements that don't exist.
x The locations 3 and 4 don't exist.
i There are only 2 elements.
I'm using dplyr version 0.8.5.
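One possible workaround, sketched here under the assumption that bypassing tbl_cube is acceptable, is to build the long table directly from the list-valued array in base R. expand.grid varies its first factor fastest, which matches R's column-major array order, so each cell lines up with its dimension labels. The names dims and out are illustrative and do not come from the question.

```r
# List-valued 2 x 2 array, as in the question
X <- array(lapply(1:4, seq_len), dim = c(2, 2))
dims <- list(A = c("foo", "bar"), B = c("x", "y"))

# One row per cell; the first dimension varies fastest, like the array itself
out <- expand.grid(dims, stringsAsFactors = FALSE)

# Copy the cells into a plain list and attach it as a list column
out$x <- lapply(seq_along(X), function(i) X[[i]])

str(out)
```

If the tibble package is available, tibble::as_tibble(out) should keep x as a list column.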
In my function below, from time to time (please run a few times to see), I get an error message from inside the data.frame that says x and y differ by 2 rows.
I was wondering how this occasional error could be fixed?
x = rnorm(1e2)
h = hist(x = x, plot = F)
DF = data.frame(
x = unlist(sapply(1:length(h$mids), function(i) rep(h$mids[i], each = h$counts[i]))),
y = unlist(sapply(h$counts, function(c) 1:c)))
plot(DF$x, DF$y)
Error in data.frame(x = unlist(sapply(1:length(h$mids), function(i) rep(h$mids[i], :
arguments imply differing number of rows: 100, 102
Some of the h$counts are 0, and for those bins unlist(sapply(h$counts, function(c) 1:c)) evaluates 1:0, which yields the two values 1 and 0 instead of an empty vector, so y ends up longer than x. You can build the data frame with rep() and sequence() instead, which handle zero counts correctly:
DF1 <- data.frame(x = rep(h$mids, h$counts),y = sequence(h$counts))
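To see the difference in isolation, here is a minimal sketch with a made-up counts vector (hypothetical values, not taken from the histogram above) that contains one empty bin:

```r
counts <- c(2L, 0L, 3L)   # hypothetical bin counts, one of them zero

# 1:0 counts down, so the empty bin contributes the two values 1 and 0
bad <- unlist(sapply(counts, function(c) 1:c))

# sequence() concatenates seq_len(n) for each n, and seq_len(0) is empty
good <- sequence(counts)

length(bad)    # two more than sum(counts)
length(good)   # equal to sum(counts)
```

seq_len(c) inside the original sapply call would fix it as well, but rep() together with sequence() avoids the loop entirely.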
I have written a function which undertakes arima modelling, outputs a table of coefficients and p-values, ranks the p-values and returns an arima model with no significant variables.
The function takes two inputs: a time series object and a data frame.
Here is the code:
backward_stepwise<-function(x, y){
repeat{
arima_result<-auto_arima(x)
arima_pvals<-p_calc(arima_result)
arima_outputs<-run_outputs(arima_result, arima_pvals)
arima_ranked<-rank_pval(arima_outputs)
# temporary fix to .xreg being added to term names
# gsub is vectorised, so no loop over the terms is needed
arima_ranked$term<-gsub(arima_ranked$term, pattern = 'xreg.',
replacement = "")
remove_num_one<-remove_one(arima_ranked)
# removed the cond_select function so that y and x write over themselves
y<-subset(y, select = colnames(y) != remove_num_one)
x<-as.ts(y)
if(min(arima_ranked$rank, na.rm = TRUE) != 1){
break
}
}
return(arima_result)
}
I am going to apply this to a list of time series objects and list of data frames
Example of time series list
CAN_V98
ADE_U91
ADE_V95
Example of data frames
CAN_V98
ADE_U91
ADE_V95
When I apply it via mapply or a for loop, do both methods take the values from the same index? I.e., will the backward stepwise function strip variables from CAN_V98 and keep using the CAN_V98 entry from the data frame list, or will it move on to the second data frame in the list after its first loop?
# Application via for loop
for(i in mkt_grd){
x<-list_ts_actual[[i]]
y<-list_df_actual[[i]]
ts_outputs[[i]]<-backward_stepwise(x, y)
}
# Application via mapply
ts_outputs1<-mcmapply(backward_stepwise, list_ts_actual,
list_df_actual,SIMPLIFY = FALSE)
Thank you for any assistance
mapply doesn't work like a nested for loop, i.e. it will not run through every j for each i, so to speak.
mapply(function(x, y){cat("x = ", x, " y = ", y, "\n")},
x = 1:5, y = 1:5)
x = 1 y = 1
x = 2 y = 2
x = 3 y = 3
x = 4 y = 4
x = 5 y = 5
As you can see, it iterates through the arguments in parallel. If y is shorter it will be recycled (with a warning when the longer length is not a multiple of the shorter), i.e.,
mapply(function(x, y){cat("x = ", x, " y = ", y, "\n")},
x = 1:5, y = 1:3)
x = 1 y = 1
x = 2 y = 2
x = 3 y = 3
x = 4 y = 1
x = 5 y = 2
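Applied to the question, this means both the for loop and mapply pair the i-th time series with the i-th data frame. Assignments to x and y inside the function are local to that call, so they do not carry over to the next list element. A small sketch with stand-in lists (the values and the pair_up function are made up for illustration; only the element names come from the question):

```r
# Stand-in lists with the same names as in the question
ts_list <- list(CAN_V98 = 1, ADE_U91 = 2, ADE_V95 = 3)
df_list <- list(CAN_V98 = "a", ADE_U91 = "b", ADE_V95 = "c")

# Hypothetical stand-in for backward_stepwise: just records what it was given
pair_up <- function(x, y) paste(x, y)

# mapply walks both lists in parallel
via_mapply <- mapply(pair_up, ts_list, df_list, SIMPLIFY = FALSE)

# The explicit loop indexes both lists with the same name
via_loop <- list()
for (i in names(ts_list)) {
  via_loop[[i]] <- pair_up(ts_list[[i]], df_list[[i]])
}

identical(via_mapply, via_loop)
```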
I tried to create a matrix from a list which consists of N unequal matrices...
The reason to do this is to make R individual bootstrap samples.
In the example below you can find e.g. 2 companies, where we have 1 with 10 & 1 with just 5 observations.
Data:
set.seed(7)
Time <- c(10,5)
xv <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2);
y <- matrix( c(rnorm(10,5,2), rnorm(5,20,1)));
z <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2)
# create data frame of input variables which helps
# to conduct the rowise bootstrapping
data <- data.frame (y = y, xv = xv, z = z);
rows <- dim(data)[1];
cols <- dim(data)[2];
# create the index to sample from the different panels
cumTime <- c(0, cumsum (Time));
index <- findInterval (seq (1:rows), cumTime, left.open = TRUE);
# draw R individual bootstrap samples
# replicate() takes the number of replications as n (it has no R argument)
bootList <- replicate(n = 5, list(), simplify = FALSE);
bootList <- lapply (bootList, function(x) by (data, INDICES = index, FUN = function(x) dplyr::sample_n (tbl = x, size = dim(x)[1], replace = T)));
---------- UNLISTING ---------
Currently, I try to do it (incorrectly) like this:
Example for just 1 entry of the list:
matrix(unlist(bootList[[1]], recursive = T), ncol = cols)
The desired output is just
bootList[[1]]
as a matrix.
Do you have an idea how to do this & if possible reasonably efficient?
The matrices are then processed in unfortunately slow MLE estimations...
I found a solution for you. From what I gather, you have a data frame containing all observations of all companies, which may have different panel lengths, and as a result you would like a bootstrap sample for each company of the same size as the original panel length.
You merely have to add a company indicator:
data$company = c(rep(1, 10), rep(2, 5)) # this could even be a factor.
L1 = split(data, data$company)
L2 = lapply(L1, FUN = function(s) s[sample(x = 1:nrow(s), size = nrow(s), replace = TRUE),] )
Stop here if you would like to keep separate bootstrap samples, e.g. in case you want to estimate separately; otherwise bind them back together:
bootdata = do.call(rbind, L2)
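To get each bootstrap sample as a matrix and repeat the draw several times, the same idea can be wrapped in a small function. This is only a sketch: the data frame below is a simplified stand-in for the one in the question, and resample_rows and boot_once are illustrative names.

```r
set.seed(7)

# Simplified stand-in: 15 rows, the first 10 from company 1, the last 5 from company 2
data <- data.frame(y = rnorm(15), x1 = rnorm(15), x2 = rnorm(15))
company <- rep(1:2, times = c(10, 5))

# Resample the rows of one panel with replacement, keeping its length
resample_rows <- function(s) {
  s[sample(nrow(s), nrow(s), replace = TRUE), , drop = FALSE]
}

# One bootstrap sample: resample within each company, stack, convert to matrix
boot_once <- function() {
  as.matrix(do.call(rbind, lapply(split(data, company), resample_rows)))
}

# Five bootstrap samples, each a 15 x 3 numeric matrix
bootList <- replicate(5, boot_once(), simplify = FALSE)
dim(bootList[[1]])
```

as.matrix returns a numeric matrix here because every column is numeric; with mixed column types it would fall back to a character matrix.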
Best wishes,
Tim
I am writing a function to build new data frames based on existing data frames. So I essentially have
f1 <- function(x,y) {
x_adj <- data.frame("DID*"= df.y$`DM`[x], "LDI"= df.y$`DirectorID*`[-(x)], "LDM"= df.y$`DM`[-(x)], "IID*"=y)
}
I have 4,000 data frames df., so I really need a function for this, but R returns an error saying that df.y is not found. y is meant to run through a list of the names of all 4,000 data frames. I am very new to R, so any help would be really appreciated.
In case more specifics are needed I essentially have something like
df.1 <- data.frame(x = 1:3, b = 5)
And I need the following as a result using a function
df.11 <- data.frame(x = 1, c = 2:3, b = 5)
df.12 <- data.frame(x = 2, c = c(1,3), b = 5)
df.13 <- data.frame(x = 3, c = 1:2, b = 5)
Thanks in advance!
The OP seems to want to access a data frame through a dynamically built name.
One option is to use get:
get(paste("df",y,sep = "."))
With y = 1, the above get will return df.1.
Hence, the function can be modified as:
f1 <- function(x,y) {
temp_df <- get(paste("df",y,sep = "."))
x_adj <- data.frame("DID*"= temp_df$`DM`[x], "LDI"= temp_df$`DirectorID*`[-(x)],
"LDM"= temp_df$`DM`[-(x)], "IID*"=y)
}
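An alternative that avoids looking objects up by name is to keep the data frames in a named list and lapply over it. The sketch below reproduces the df.11, df.12 and df.13 example from the question; the list dfs and the function leave_one_out are hypothetical names introduced for illustration.

```r
df.1 <- data.frame(x = 1:3, b = 5)

# Collect the data frames in a named list (only one here)
dfs <- list(df.1 = df.1)

# For each row i: keep x[i], and put the remaining x values into column c
leave_one_out <- function(df) {
  lapply(seq_len(nrow(df)), function(i) {
    data.frame(x = df$x[i], c = df$x[-i], b = df$b[i])
  })
}

result <- lapply(dfs, leave_one_out)
result$df.1[[2]]   # corresponds to df.12 in the question
```

If the 4,000 data frames already exist as separate objects, mget() with a vector of their names can gather them into such a list.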
I have a df as follows:
t r
1 0 100.00000
2 1 135.86780
3 2 149.97868
4 3 133.77316
5 4 97.08129
6 5 62.15988
7 6 50.19177
and so on...
I want to apply a rolling regression using lm(r~t).
However, I want to estimate one model for each iteration, where the iterations occur over a set time window t+k. Essentially, the first model should be estimated with t=0,t=1,...t=5, if k = 5, and the second model estimated with t=1, t=2,...,t=6, and so on.
In other words, it iterates from a starting point with a set window t+k where k is some pre-specified window length and applies the lm function over that particular window length iteratively.
I have tried using lapply like this:
mdls = lapply(df, function(x) lm(r[x,]~t))
However, I got the following error:
Error in r[x, ] : incorrect number of dimensions
If I remove the [x,], each iteration gives me the same model, in other words using all the observations.
If I use rollapply:
coefs = rollapply(df, 3, FUN = function(x) coef(lm(r~t, data =
as.data.frame(x))), by.column = FALSE, align = "right")
res = rollapply(df, 3, FUN = function(z) residuals(lm(r~t, data =
as.data.frame(z))), by.column = FALSE, align = "right")
Where:
t = seq(0,15,1)
r = (100+50*sin(0.8*t))
df = as.data.frame(t,r)
I get 15 models, but they are all estimated over the entire dataset, providing the same intercepts and coefficients. This is strange as I managed to make rollapply work just before testing it in a new script. For some reason it does not work again, so I am perplexed as to whether R is playing tricks on me, or whether there is something wrong with my code.
How can I adjust these methods to make sure they iterate according to my wishes?
I enclose a possible solution. The idea is to pass the vector 1:nrow(df) to rollapply, so that each window supplies the row indices we want to select.
df = data.frame(t = 0:6, r = c(100.00000, 135.86780, 149.97868, 133.77316, 97.08129, 62.15988, 50.19177))
N = nrow(df)
require(zoo)
# Coefficients
coefs <- rollapply(data = 1:N, width = 3, FUN = function(x){
r = df$r[x]
t = df$t[x]
out <- coef(lm(r~t))
return(out)
})
# Residuals
res <- rollapply(data = 1:N, width = 3, FUN = function(x){
r = df$r[x]
t = df$t[x]
out <- residuals(lm(r~t))
return(out)
})
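One thing worth checking in the question's setup: as.data.frame(t, r) passes r as the row.names argument, so the resulting data frame has only a t column; data.frame(t = t, r = r) gives the two columns that lm(r ~ t, data = ...) needs. The same index-window idea can also be written in base R without zoo. This is a sketch, with the window width k = 3 chosen to match the code above:

```r
t <- seq(0, 15, 1)
r <- 100 + 50 * sin(0.8 * t)
df <- data.frame(t = t, r = r)   # two columns, unlike as.data.frame(t, r)

k <- 3                                # number of rows per window
starts <- seq_len(nrow(df) - k + 1)   # first row of each window

# Fit one model per window and stack the coefficients row by row
coefs <- do.call(rbind, lapply(starts, function(s) {
  coef(lm(r ~ t, data = df[s:(s + k - 1), ]))
}))

head(coefs)
```

Each row of coefs holds the intercept and slope of one window, so the rows differ from one another instead of repeating the full-sample fit.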