Combinef in R HTS package - Original Level Names lost - r

When combining forecasts using combinef(), the resulting hts object has level names like A, AA, AB, ... It doesn't retain the level names from the supplied hts object.
In the example below, the bottom-level names are "A10A" and "A10B", while the resulting bottom-level names are "AA","BB".
Is there a way to retain the original level names in the combined forecast object?
library(forecast)
library(hts)
abc <- ts(5 + matrix(sort(rnorm(20)), ncol = 2, nrow = 10))
colnames(abc) <- c("A10A", "A10B")
y <- hts(abc, characters = c(3, 1))
h <- 12
ally <- aggts(y)
allf <- matrix(NA, nrow = h, ncol = ncol(ally))
for(i in 1:ncol(ally))
allf[,i] <- forecast(auto.arima(ally[,i]), h = h)$mean
allf <- ts(allf)
y.f <- combinef(allf, get_nodes(y), weights = NULL, keep = "gts", algorithms = "lu")

At the end of your code, add the following lines:
colnames(allf) <- colnames(ally)
colnames(y.f$bts) <- colnames(y$bts)
y.f$nodes <- y$nodes
y.f$labels <- y$labels

Related

How to create a single data frame with multiple vectors result of a loop operation?

I have a .wav file and want to get power spectra for successive non-overlapping time windows.
The power spectrum data is obtained with the following code, once the seewave and tuneR libraries are loaded:
n <- 0:1
sound1 <- readWave("D:\\sound.wav")
result <- do.call(cbind, lapply(n, function(x)
meanspec(sound1,from=x,to=x+1,wl=16,plot=FALSE)))
result1 <- data.frame(result)
The output will be
structure(list(x = c(0, 2.75625, 5.5125, 8.26875, 11.025, 13.78125,
16.5375, 19.29375), y = c(1, 0.551383594277632, 0.0742584974502194,
0.0399059818168578, 0.0218500553648978, 0.0176655910374274,
0.00904887363707214,
0.00333698474894753), x.1 = c(0, 2.75625, 5.5125, 8.26875, 11.025,
13.78125, 16.5375, 19.29375), y.1 = c(1, 0.558106398109396,
0.145460335046358,
0.0804097312947365, 0.0476025570412434, 0.0393549921764155,
0.0203584314573552,
0.00737927765210362)), class = "data.frame", row.names = c(NA,
But in the resulting data frame I only need y and y.1, not x and x.1. As you may notice, x and x.1 contain the same data, so that information is redundant. In short: I only need the y data.
Thank you for your suggestions!
There are several ways to do this. I don't know the length of the vector you are working with or how meanspec returns its data, so you will have to fill that in yourself:
vec_length <- length(amplitude_vector)
wav_df <- data.frame(matrix(nrow = 0, ncol = vec_length + 1))
for(i in 0:(end-1)){
#Add relevant code to get the amplitude vector from the function below
amp_vec <- meanspec(sound1, from = i, to = i+1, plot = FALSE)...
wav_df <- rbind(wav_df,c(i,amp_vec))
}
colnames(wav_df) <- c("start-time",...)#Add in the other column names that you want
wav_df should then have the information you want.
You may use lapply -
n <- 0:9 #to end at 9-10;change as per your preference
sound1 <- readWave("D:\\Sound.wav")
result <- do.call(rbind, lapply(n, function(x)
meanspec(sound1,from=x,to=x+1,plot=FALSE)))
result
#to get dataframe as output
#result <- data.frame(result)
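Since the question asks for only the y columns, here is a minimal base R sketch of dropping the redundant x columns after the fact. The data frame is a shortened, rounded stand-in for the output shown in the question (an assumption for illustration), and the name y_only is hypothetical.

```r
# Shortened stand-in for the data frame produced by do.call(cbind, ...)
# in the question; values are rounded and only three rows are kept.
result1 <- data.frame(
  x   = c(0, 2.75625, 5.5125),
  y   = c(1, 0.5514, 0.0743),
  x.1 = c(0, 2.75625, 5.5125),
  y.1 = c(1, 0.5581, 0.1455)
)

# Keep only the columns whose names start with "y" (y, y.1, ...).
y_only <- result1[, grepl("^y", names(result1)), drop = FALSE]
y_only
```

If meanspec returns a matrix with a column named "y", as the output in the question suggests, selecting that column inside the lapply call before binding should give the same result without the intermediate x columns.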

For loop for matrix operations

I am trying to use a for loop in R. I have a vector of length 44 with 4401 observations, read from the data file "data.csv".
I am converting it to a matrix so that I can work on each column as a time series.
I want to extract each column, forecast it, and then collect the results in a matrix.
What is the easiest way to do that?
library(forecast)
data<-read.table(file="data.csv",sep=",",row.names=NULL,header=FALSE)
x <- matrix(1:47, ncol = 1, byrow = FALSE)
for (i in 1:4401)
{
y <- data[i]
y_ts <- ts(y, start=c(2016,1), end=c(2019,8), frequency=12)
AutoArimaModel=auto.arima(y_ts)
forecast=predict(AutoArimaModel, 3)
output <- matrix(forecast$pred, ncol = 1, byrow = FALSE)
ym = data.matrix(y)
z = rbind(ym,output)
x = cbind(x,z)}
It runs only for i = 1 and then gives me the error below:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'
So, your code needed a partial re-write!
If I understand correctly, you want 3 forecasts for each time series of length 44. I used the .xlsx data that you provided.
library(forecast)
library(readxl)
data<-read_excel("data.xlsx",col_names = F)
z <- NULL
data <- t(data)
forecast_horizon <- 3
for (i in 1:ncol(data)){
y <- data[,i]
y_ts <- ts(y, start=c(2016,1), end=c(2019,8), frequency=12)
AutoArimaModel <- auto.arima(y_ts)
forecast <- tryCatch(predict(AutoArimaModel, forecast_horizon),
error = function(e) data.frame(pred = rep(NA,forecast_horizon)))
output <- matrix(forecast$pred, ncol = 1, byrow = FALSE)
z = cbind(z,output)
}
Pay attention to the usage of tryCatch which is used because there is one time series that produces errors when accessing the predictions (you can investigate further why this is the case.)
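The same column-by-column pattern can be sketched without external packages. This is only an illustration under stated assumptions: stats::arima with a fixed AR(1) order is swapped in for auto.arima so the example runs in base R, the data is simulated, and the names forecast_one, fc and combined are hypothetical.

```r
set.seed(1)
n_obs    <- 44   # observations per series (2016-01 to 2019-08)
n_series <- 5    # small stand-in for the real number of series
horizon  <- 3    # number of steps to forecast

# Simulated data: one series per column.
dat <- matrix(rnorm(n_obs * n_series), nrow = n_obs, ncol = n_series)

# Fit a model to one series and return h forecasts.
# A fixed AR(1) from stats::arima replaces auto.arima here (assumption).
# If fitting or predicting fails, return NAs so the loop keeps going.
forecast_one <- function(y, h) {
  y_ts <- ts(y, start = c(2016, 1), frequency = 12)
  tryCatch(
    as.numeric(predict(arima(y_ts, order = c(1, 0, 0)), n.ahead = h)$pred),
    error = function(e) rep(NA_real_, h)
  )
}

# One column of forecasts per series: a horizon x n_series matrix.
fc <- vapply(seq_len(ncol(dat)),
             function(i) forecast_one(dat[, i], horizon),
             numeric(horizon))

# Original observations with the forecasts appended below them.
combined <- rbind(dat, fc)
dim(combined)
```

Using vapply with a declared return length makes a malformed forecast fail loudly instead of silently producing a ragged result.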
Use the tibbletime package: https://www.business-science.io/code-tools/2017/09/07/tibbletime-0-0-1.html
Read the data with readr::read_csv such that it's a tibble. Turn it into a tibbletime with your date vector. Use tmap_* functions as described in the article to encapsulate your forecasting code and map them to the columns of the tibbletime.
The article should have all the info you need to implement this.
The problem seems to be your data source. This works:
n_col <- 5
n_rows <- 44
#generate data
data <- data.frame(replicate(n_col, rnorm(n_rows)))
x <- matrix(1:47, ncol = 1, byrow = FALSE)
for (i in seq_len(n_col)) {
y <- data[i]
y_ts <- ts(y, start=c(2016,1), end=c(2019,8), frequency=12)
AutoArimaModel=auto.arima(y_ts)
forecast=predict(AutoArimaModel, 3)
output <- matrix(forecast$pred, ncol = 1, byrow = FALSE)
ym = data.matrix(y)
z = rbind(ym,output)
x = cbind(x,z)}
x
As an aside, I think I would approach it like this, especially if you have 4,401 fields to perform an auto.arima on:
y_ts <- ts(data, start = c(2016, 1), end = c(2019, 8), frequency = 12)
library(future.apply)
plan(multisession) # plan(multiprocess) is deprecated in recent versions of future
do.call(
cbind,
future_lapply(y_ts,
function(y_t) {
AutoArimaModel = auto.arima(y_t)
forecast = predict(AutoArimaModel, 3)
output = matrix(forecast$pred, ncol = 1, byrow = F)
ym = data.matrix(y_t)
z = rbind(ym, output)
}
)
)

Create a matrix from a list consisting of unequal matrices for individual bootstraps

I tried to create a matrix from a list which consists of N unequal matrices...
The reason to do this is to make R individual bootstrap samples.
In the example below you can find e.g. 2 companies, where we have 1 with 10 & 1 with just 5 observations.
Data:
set.seed(7)
Time <- c(10,5)
xv <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2);
y <- matrix( c(rnorm(10,5,2), rnorm(5,20,1)));
z <- matrix(c(rnorm(10,5,2), rnorm(5,20,1), rnorm(10,5,2), rnorm(5,20,1)), ncol=2)
# create data frame of input variables which helps
# to conduct the rowise bootstrapping
data <- data.frame (y = y, xv = xv, z = z);
rows <- dim(data)[1];
cols <- dim(data)[2];
# create the index to sample from the different panels
cumTime <- c(0, cumsum (Time));
index <- findInterval (seq (1:rows), cumTime, left.open = TRUE);
# draw R individual bootstrap samples
bootList <- replicate(n = 5, list(), simplify = FALSE);
bootList <- lapply (bootList, function(x) by (data, INDICES = index, FUN = function(x) dplyr::sample_n (tbl = x, size = dim(x)[1], replace = T)));
---------- UNLISTING ---------
Currently, I am doing it incorrectly like this:
Example for just 1 entry of the list:
matrix(unlist(bootList[[1]], recursive = T), ncol = cols)
The desired output is just
bootList[[1]]
as a matrix.
Do you have an idea how to do this & if possible reasonably efficient?
The matrices are then processed in unfortunately slow MLE estimations...
I found a solution for you. From what I gather, you have a data frame containing all observations of all companies, which may have different panel lengths. As a result, you would like a bootstrap sample for each company of the same size as the original panel length.
You merely have to add a company indicator:
data$company = c(rep(1, 10), rep(2, 5)) # this could even be a factor.
L1 = split(data, data$company)
L2 = lapply(L1, FUN = function(s) s[sample(x = 1:nrow(s), size = nrow(s), replace = TRUE),] )
Stop here if you would like separate bootstrap samples, e.g. in case you want to estimate separately.
bootdata = do.call(rbind, L2)
Best wishes,
Tim
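To get the matrices the question asks for, the split-and-resample idea above can be wrapped in a function and repeated. This is a minimal base R sketch on simulated data; the names panel_lengths, boot_once, n_boot and boot_list are hypothetical, and the company column is kept in the output as an ordinary numeric column.

```r
set.seed(7)
panel_lengths <- c(10, 5)   # observations per company

# Simulated panel data with a company indicator.
dat <- data.frame(
  company = rep(seq_along(panel_lengths), times = panel_lengths),
  y = rnorm(sum(panel_lengths)),
  x = rnorm(sum(panel_lengths))
)

# Draw one bootstrap sample: resample rows with replacement within each
# company, keep each panel at its original length, return a numeric matrix.
boot_once <- function(d) {
  panels <- split(d, d$company)
  resampled <- lapply(panels, function(p) {
    p[sample.int(nrow(p), replace = TRUE), , drop = FALSE]
  })
  out <- as.matrix(do.call(rbind, resampled))
  rownames(out) <- NULL
  out
}

# A list of n_boot bootstrap matrices.
n_boot <- 5
boot_list <- replicate(n_boot, boot_once(dat), simplify = FALSE)
dim(boot_list[[1]])
```

Because every column is numeric, as.matrix returns a numeric matrix; with mixed column types it would fall back to character, so non-numeric columns would need to be dropped or encoded first.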

Dynamic Bayesian Network - multivariate - repetitive events - bnstruct R Package

I am looking for an approach to train a dynamic Bayesian network (DBN), using the package bnstruct, for a special case where data is collected from similar events. Therefore, 1) I would like to train my DBN by feeding it one event at a time.
Since in the real case the numbers of events, rows and columns are large, 2) it would be better if some parallel processing could be implemented to improve performance.
Dummy code is provided below, where all data must be fed at once, disregarding event boundaries.
library(bnstruct)
numEvents <- 40
numRows <- 5
numCols <- 3
mat <- matrix(data = rnorm(numEvents * numRows * numCols), ncol = numCols)
varNames <- paste0("var", 1:numCols)
colnames(mat) <- varNames
dataset <- BNDataset(data = mat, discreteness = rep(F, ncol(mat)), variables = varNames, node.sizes = rep(3, ncol(mat)))
dbn <- learn.dynamic.network(dataset, num.time.steps = numCols)
Thanks.
The data you are generating is treated in bnstruct as a DBN with 3 layers, each consisting of a single node. The right way of treating a dataset as a sequence of events is to consider variable X in event i as a different variable from the same variable X in event j, as learn.dynamic.network is just a proxy for learn.network with an implicit layering. That is, your dataset doesn't have to be constructed by adding rows, but by adding columns.
Section 4.1.2 of the vignette has an explanation of how to learn a DBN.
The right way of constructing and using a dataset in your example is
mat <- matrix(data = rnorm(numEvents * numRows * numCols), ncol = numCols * numEvents)
varNames <- rep(paste0("var", 1:numCols), numEvents)
colnames(mat) <- varNames
dataset <- BNDataset(data = mat, discreteness = rep(F, ncol(mat)), variables = varNames, node.sizes = rep(3, ncol(mat)))
dbn <- learn.dynamic.network(dataset, num.time.steps = numEvents)
dbn will have 120 effective nodes, divided into 40 layers.
Coming to the first question: one idea is to provide an initial network as starting point for the successive time steps. Assuming the dataset at time step t+1 is obtained by adding new columns to the dataset used at time step t, you have to manually adapt the BN object to represent the dataset.
From the package vignette:
It is also possible to provide an initial network as starting point for the
structure search. This can be done using the initial.network argument, which
accepts three kinds of inputs:
- a BN object (with a structure);
- a matrix containing the adjacency matrix representing the structure of a network;
- the string random.chain for starting from a randomly sampled chain-like network.
The simplest option is probably to keep and expand the DAG with 0s at every augmentation, so that you have a network with more nodes and no edges going to the new nodes, and to use that new DAG as the starting point. In your example:
library(bnstruct)
numEvents <- 40
numRows <- 5
numCols <- 3
mat <- matrix(data = rnorm(numRows * numCols), ncol = numCols)
varNames <- paste0("var", 1:numCols)
colnames(mat) <- varNames
dataset <- BNDataset(data = mat,
discreteness = rep(F, ncol(mat)),
variables = varNames,
node.sizes = rep(3, ncol(mat)))
dbn <- learn.network(dataset)
for (event in 2:numEvents) {
# collect new data
new.mat <- matrix(data = rnorm(numRows * numCols), ncol = numCols)
colnames(new.mat) <- paste0(varNames, "_", event)
mat <- cbind(mat, new.mat)
dataset <- BNDataset(data = mat,
discreteness = rep(F, ncol(mat)),
variables = colnames(mat),
node.sizes = rep(3, ncol(mat)))
# expand structure of the DBN, adding the nodes relative to the new event
dbn.dag <- dag(dbn)
n.nodes <- ncol(dbn.dag)
new.dag <- matrix(0, nrow=ncol(mat), ncol=ncol(mat))
new.dag[1:n.nodes, 1:n.nodes] <- dbn.dag
# learn
dbn <- learn.dynamic.network(dataset,
initial.network = new.dag,
num.time.steps = event)
}
This will, however, re-learn the whole DBN every time. If edges can go only to the immediately following layer, you can trim the search space by providing a layer.struct parameter, or by learning using two events at a time and manually building the larger DBN.
For the second question, bnstruct at the moment does not provide parallel processing.
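The DAG-expansion step used in the loop above can be looked at in isolation. The sketch below uses plain base R matrices and a made-up 3-node adjacency matrix (an assumption for illustration, not output from bnstruct); old_dag, n_new and new_dag are hypothetical names.

```r
# Made-up adjacency matrix for 3 existing nodes: 1 -> 2 -> 3.
old_dag <- matrix(c(0, 1, 0,
                    0, 0, 1,
                    0, 0, 0), nrow = 3, byrow = TRUE)

n_old <- ncol(old_dag)
n_new <- 3   # nodes added by the next event

# Larger all-zero matrix: the new nodes start with no edges in or out.
new_dag <- matrix(0, nrow = n_old + n_new, ncol = n_old + n_new)

# Copy the learned structure into the top-left block.
new_dag[seq_len(n_old), seq_len(n_old)] <- old_dag
new_dag
```

The existing edges are preserved exactly, and the rows and columns for the new nodes are all zero, which is what makes the padded matrix a neutral starting point for the next structure search.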

Creating empty rows on matrix of various size

I am trying to generate a matrix from the following data. Is there any way to add empty rows so that the columns all have the same length?
#Generating original data
n <- c(12,24)
mu <- c(6.573,6.5)
sigma <- sqrt(0.25)
Diseased.Data <- round(rnorm(n[1],mu[1],sigma),4)
Healthy.Data <- round(rnorm(n[2],mu[2],sigma),4)
g <- c(2,3,4)
cstar.pool <- (mu[1]+mu[2])/2
#generating pooled data
for(i in 1:3){
assign(paste("pool.dis.data",i,sep = ""),replicate(n[1]/g[i],mean(sample(Diseased.Data,g[i]))))
assign(paste("pool.hel.data",i,sep = ""),replicate(n[2]/g[i],mean(sample(Healthy.Data,g[i]))))
}
#generating the pooled diseased data matrix
dis.mat1<- matrix(data = pool.dis.data1,length(pool.dis.data1),1)
dis.mat2 <- matrix(data = pool.dis.data2,length(pool.dis.data2),1)
dis.mat3 <- matrix(data = pool.dis.data3,length(pool.dis.data3),1)
dis.mat2 <- rbind(dis.mat2,NA)
dis.mat2 <- rbind(dis.mat2,NA)
dis.mat3 <- rbind(dis.mat3,NA)
dis.mat3 <- rbind(dis.mat3,NA)
dis.mat3 <- rbind(dis.mat3,NA)
dis.matrix <- matrix(NA, max(length(pool.dis.data1),length(pool.dis.data2),length(pool.dis.data3)),3)
dis.matrix[,1] <- cbind(dis.mat1)
dis.matrix[,2] <- cbind(dis.mat2)
dis.matrix[,3] <- cbind(dis.mat3)
I'd say your best bet is to start out with an empty matrix of the size you need. You can tell matrix to specify the dimensions on creation like so:
new <- matrix( data = NA, nrow = 10, ncol = 20 )
So you just need to create a value for each dimension, based on your input data:
num.rows <- max( length(n), length(mu), ... )
num.columns <- [ I'd just enter a numeric value here ]
new <- matrix( data = NA, nrow = num.rows, ncol = num.columns )
Then you can fill the columns as needed, making sure to leave any excess empty. For example:
new[(1:length(n)),3] <- n
The "1:length(n)" part there will tell R to stop filling the column once the values you've given it have been entered. Otherwise R will continue filling, and you'll get repeated values, which I'm guessing you don't want.
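An alternative to filling a pre-allocated matrix is to pad each vector with NA up to the longest length and bind the results. This is a minimal base R sketch with made-up values standing in for the pooled data; pools, max_len and dis_matrix are hypothetical names.

```r
# Made-up pooled vectors of unequal length (6, 4 and 3 values).
pools <- list(
  pool1 = c(6.4, 6.7, 6.5, 6.6, 6.3, 6.8),
  pool2 = c(6.5, 6.6, 6.4, 6.7),
  pool3 = c(6.6, 6.5, 6.4)
)

# Length of the longest vector determines the number of rows.
max_len <- max(lengths(pools))

# Indexing past the end of a vector yields NA, so each column is padded
# with NA instead of being recycled.
dis_matrix <- sapply(pools, function(v) v[seq_len(max_len)])
dis_matrix
```

This avoids the repeated rbind(..., NA) calls and works for any number of vectors without hard-coding how many rows each one is missing.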
