For loop for matrix operations - r

I am trying to use a "for" loop in R. I have 4401 series of 44 observations each, read from the data file "data.csv".
I am converting it to a matrix for working on each column as a time series data.
I want to extract each column, do forecasting and then make a matrix for that.
What is the easiest way to do that?
library(forecast)
data<-read.table(file="data.csv",sep=",",row.names=NULL,header=FALSE)
x <- matrix(1:47, ncol = 1, byrow = FALSE)
for (i in 1:4401)
{
y <- data[i]
y_ts <- ts(y, start=c(2016,1), end=c(2019,8), frequency=12)
AutoArimaModel=auto.arima(y_ts)
forecast=predict(AutoArimaModel, 3)
output <- matrix(forecast$pred, ncol = 1, byrow = FALSE)
ym = data.matrix(y)
z = rbind(ym,output)
x = cbind(x,z)}
It runs only for i = 1 and then gives me the error below:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
'data' must be of a vector type, was 'NULL'

So, your code needed a partial re-write!
If I understand correctly, you want three forecasts for each time series of 44 observations. I used the .xlsx data that you provided.
library(forecast)
library(readxl)
data<-read_excel("data.xlsx",col_names = F)
z <- NULL
data <- t(data)
forecast_horizon <- 3
for (i in 1:ncol(data)){
y <- data[,i]
y_ts <- ts(y, start=c(2016,1), end=c(2019,8), frequency=12)
AutoArimaModel <- auto.arima(y_ts)
forecast <- tryCatch(predict(AutoArimaModel, forecast_horizon),
error = function(e) data.frame(pred = rep(NA,forecast_horizon)))
output <- matrix(forecast$pred, ncol = 1, byrow = FALSE)
z = cbind(z,output)
}
Note the use of tryCatch: one of the time series raises an error when its predictions are accessed (you can investigate why), so the handler substitutes NA forecasts for that series and the loop continues.
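As a rough illustration of the tryCatch pattern described above, here is a small sketch that runs without extra packages. It makes several assumptions that are not in the original answer: the data are simulated, a fixed AR(1) fitted with stats::arima stands in for forecast::auto.arima, and the helper forecast_one is hypothetical and fails on purpose for a series with missing values.

```r
set.seed(1)

# Hypothetical stand-in data: 44 monthly observations for 3 series
series <- matrix(rnorm(44 * 3), nrow = 44, ncol = 3)
series[5, 2] <- NA  # make the second series fail on purpose

h <- 3  # forecast horizon

# Hypothetical helper: fits a fixed AR(1) with stats::arima
# (an assumption; the answer above uses forecast::auto.arima)
forecast_one <- function(y, h) {
  if (anyNA(y)) stop("series contains missing values")
  y_ts <- ts(y, start = c(2016, 1), frequency = 12)
  fit <- arima(y_ts, order = c(1, 0, 0), method = "ML")
  as.numeric(predict(fit, n.ahead = h)$pred)
}

# One column of forecasts per series; a failed series becomes a column of NA
z <- sapply(seq_len(ncol(series)), function(i) {
  tryCatch(forecast_one(series[, i], h),
           error = function(e) rep(NA_real_, h))
})
dim(z)
```

Returning a vector of the same length from the error handler keeps every column the same height, so the results still combine into one matrix.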

Use the tibbletime package: https://www.business-science.io/code-tools/2017/09/07/tibbletime-0-0-1.html
Read the data with readr::read_csv such that it's a tibble. Turn it into a tibbletime with your date vector. Use tmap_* functions as described in the article to encapsulate your forecasting code and map them to the columns of the tibbletime.
The article should have all the info you need to implement this.

The problem seems to be your data source. This works:
n_col <- 5
n_rows <- 44
#generate data
data <- data.frame(replicate(n_col, rnorm(n_rows)))
x <- matrix(1:47, ncol = 1, byrow = FALSE)
for (i in seq_len(n_col)) {
y <- data[i]
y_ts <- ts(y, start=c(2016,1), end=c(2019,8), frequency=12)
AutoArimaModel=auto.arima(y_ts)
forecast=predict(AutoArimaModel, 3)
output <- matrix(forecast$pred, ncol = 1, byrow = FALSE)
ym = data.matrix(y)
z = rbind(ym,output)
x = cbind(x,z)}
x
As an aside, I think I would approach it like this, especially if you have 4,401 fields to perform an auto.arima on:
y_ts <- ts(data, start = c(2016, 1), end = c(2019, 8), frequency = 12)
library(future.apply)
plan(multisession) # "multiprocess" is deprecated in recent versions of the future package
do.call(
cbind,
future_lapply(y_ts,
function(y_t) {
AutoArimaModel = auto.arima(y_t)
forecast = predict(AutoArimaModel, 3)
output = matrix(forecast$pred, ncol = 1, byrow = F)
ym = data.matrix(y_t)
z = rbind(ym, output)
}
)
)

Related

How to create a single data frame with multiple vectors result of a loop operation?

I have a .wav file and want to get power spectra for successive non-overlapping time windows.
The power spectrum data is obtained with the following code, once the seewave and tuneR libraries are loaded:
n <- 0:1
sound1 <- readWave("D:\\sound.wav")
result <- do.call(cbind, lapply(n, function(x)
meanspec(sound1,from=x,to=x+1,wl=16,plot=FALSE)))
result1 <- data.frame(result)
The output will be
structure(list(x = c(0, 2.75625, 5.5125, 8.26875, 11.025, 13.78125,
16.5375, 19.29375), y = c(1, 0.551383594277632, 0.0742584974502194,
0.0399059818168578, 0.0218500553648978, 0.0176655910374274,
0.00904887363707214,
0.00333698474894753), x.1 = c(0, 2.75625, 5.5125, 8.26875, 11.025,
13.78125, 16.5375, 19.29375), y.1 = c(1, 0.558106398109396,
0.145460335046358,
0.0804097312947365, 0.0476025570412434, 0.0393549921764155,
0.0203584314573552,
0.00737927765210362)), class = "data.frame", row.names = c(NA,
But in the resulting df I only need y and y.1, not x and x.1. As you may notice, x and x.1 hold the same data, so that information is redundant. In short: I only need the y data.
Thank you for your suggestions!
There are more than a few ways to do what you are describing. I don't know the length of the vector involved or how meanspec returns its data, so you will have to fill that in yourself:
vec_length <- length(amplitude_vector)
wav_df <- data.frame(matrix(nrow = 0, ncol = vec_length + 1))
for(i in 0:(end-1)){
#Add relevant code to get the amplitude vector from the function below
amp_vec <- meanspec(sound1, from = i, to = i+1, plot = FALSE)...
wav_df <- rbind(wav_df,c(i,amp_vec))
}
colnames(wav_df) <- c("start-time",...)#Add in the other column names that you want
wav_df should then have the information you want.
You may use lapply -
n <- 0:9 #to end at 9-10;change as per your preference
sound1 <- readWave("D:\\Sound.wav")
result <- do.call(rbind, lapply(n, function(x)
meanspec(sound1,from=x,to=x+1,plot=FALSE)))
result
#to get dataframe as output
#result <- data.frame(result)
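To keep only the y values, one option is to subset each result before binding. The sketch below is only an illustration: fake_meanspec is a hypothetical stand-in for meanspec, and it assumes (as the dput output in the question suggests) that each call returns a two-column matrix with columns named x and y.

```r
# Hypothetical stand-in for seewave::meanspec(): returns a two-column
# matrix (x = frequency, y = relative amplitude) for one time window
fake_meanspec <- function(from, to) {
  freq <- c(0, 2.75625, 5.5125, 8.26875)
  cbind(x = freq, y = 1 / (1 + freq * (from + 1)))
}

n <- 0:1

# Keep only the y column of each window; sapply binds them side by side
y_only <- sapply(n, function(s) fake_meanspec(from = s, to = s + 1)[, "y"])
colnames(y_only) <- paste0("y", n)

# Store the shared frequency axis once instead of once per window
result1 <- data.frame(freq = fake_meanspec(0, 1)[, "x"], y_only)
names(result1)
```

With the real function, the same idea would be to index the returned matrix with [, "y"] inside the lapply or sapply call.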

Accessing a variable in a data frame by columns number in R?

I have a data frame as "df" and 41 variables var1 to var41. If I write this command
pcdtest(plm(var1~ 1 , data = df, model = "pooling"))[[1]]
I can see the test value. But I need to apply this test 41 times. I want to access variable by column number which is "df[1]" for "var1" and "df[41]" for "var41"
pcdtest(plm(df[1]~ 1 , data = dfp, model = "pooling"))[[1]]
But it fails. Could you please help me do this? I want to collect the results in a for loop and then calculate descriptive statistics over all of them, but running the test by hand for each variable is very tedious.
I think you can easily adapt the following code to your data. Since you didn't provide any of your data, I used data that comes with the plm package.
library(plm) # for pcdtest
# example data from plm package
data("Cigar" , package = "plm")
Cigar[ , "fact1"] <- c(0,1)
Cigar[ , "fact2"] <- c(1,0)
Cigar.p <- pdata.frame(Cigar)
# example for one column
p_model <- plm(formula = pop~1, data = Cigar.p, model = "pooling")
pcdtest(p_model)[[1]]
# run through multiple models
l_plm_models <- list() # store plm models in this list
l_tests <- list() # store testresults in this list
for(i in 3:ncol(Cigar.p)){ # start in the third column, since the first two are state and year
fmla <- as.formula(paste(names(Cigar.p)[i], "~ 1")) # build the formula "<column name> ~ 1"
l_plm_models[[i]] <- plm(formula = fmla,
data = Cigar.p,
model = "pooling")
l_tests[[i]] <- pcdtest(l_plm_models[[i]])[[1]]
}
testresult <- data.frame("z" = unlist(l_tests), row.names = (colnames(Cigar.p[3:11])))
> testresult
z
price 175.36476
pop 130.45774
pop16 155.29092
cpi 176.21010
ndi 175.51938
sales 99.02973
pimin 175.74600
fact1 176.21010
fact2 176.21010
# example for cipstest
matrix_results <- matrix(NA, nrow = 11, ncol = 2) # use 41 here for your df
l_ctest <- list()
for(i in 3:ncol(Cigar.p)){
l_ctest[[i]] <- cipstest(Cigar.p[, i], lags = 4, type = 'none', model = 'cmg', truncated = F)
matrix_results[i, 1] <- as.numeric(l_ctest[[i]][1])
matrix_results[i, 2] <- as.numeric(l_ctest[[i]][7])
}
res <- data.frame(matrix_results)
names(res) <- c('cips-statistic', 'p-value')
print(res)
Try using as.formula(), for example:
results <- list()
for (i in 1:41){
varName <- paste0('var',i)
frml <- paste0(varName, ' ~ 1')
results[[i]] <-
pcdtest(plm(as.formula(frml) , data = dfp, model = "pooling"))[[1]]
}
You can use reformulate to create the formula and run the code 41 times using lapply:
var <- paste0('var', 1:41)
result <- lapply(var, function(x) pcdtest(plm(reformulate('1', x),
data = df, model = "pooling"))[[1]])
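The reason df[1] ~ 1 fails is that a formula needs a variable name to look up in data, not a data frame column. The sketch below shows the name-based approach on made-up data; lm is swapped in for plm and pcdtest purely so the example runs without extra packages, and the data frame is hypothetical.

```r
# Hypothetical data with three variables
df <- data.frame(var1 = c(1, 2, 3, 4),
                 var2 = c(2, 4, 6, 8),
                 var3 = c(5, 3, 1, 2))

vars <- names(df)  # select columns by position here if needed, e.g. names(df)[1:3]

# Build "<name> ~ 1" from each column name and fit an intercept-only model
intercepts <- sapply(vars, function(v) {
  f <- reformulate("1", response = v)
  unname(coef(lm(f, data = df)))
})
intercepts
```

Because the result is a named vector, summary statistics over all variables can then be taken directly, for example with summary(intercepts).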

Combinef in R HTS package - Original Level Names lost

When combining forecasts using combinef() the resulting hts time series has level names which are like A, AA, AB... It doesn't retain the level names from the supplied hts time series.
In the example below, bottom level names are "A10A" & "A10B", while the resulting bottom names are "AA","BB".
Is there a way to retain the original level names in the combined forecast object?
library(forecast)
library(hts)
abc <- ts(5 + matrix(sort(rnorm(20)), ncol = 2, nrow = 10))
colnames(abc) <- c("A10A", "A10B")
y <- hts(abc, characters = c(3, 1))
h <- 12
ally <- aggts(y)
allf <- matrix(NA, nrow = h, ncol = ncol(ally))
for(i in 1:ncol(ally))
allf[,i] <- forecast(auto.arima(ally[,i]), h = h)$mean
allf <- ts(allf)
y.f <- combinef(allf, get_nodes(y), weights = NULL, keep = "gts", algorithms = "lu")
At the end of your code, add the following lines:
colnames(allf) <- colnames(ally)
colnames(y.f$bts) <- colnames(y$bts)
y.f$nodes <- y$nodes
y.f$labels <- y$labels

MHSMM package R input data format with multiple variables

My problem is similar to the question "the problem of R-input Format".
I tried the code from that link and revised parts of it to suit my data. My data looks like the following.
I want my data to be turned into a data frame with 4 variable vectors. The code I revised is:
formatMhsmm <- function(data){
nb.sequences = nrow(data)
nb.variables = ncol(data)
data_df <- data.frame(matrix(unlist(data), ncol = 4, byrow = TRUE))
# iterate over these in loops
rows <- 1: nb.sequences
# build vector with id value
id = numeric(length = nb.sequences)
for( i in rows)
{
id[i] = data_df[i,2]
}
# build vector with time value
time = numeric (length = nb.sequences)
for( i in rows)
{
time[i] = data_df[i,3]
}
# build vector with observation values
sequences = numeric(length = nb.sequences)
for(i in rows)
{
sequences[i] = data_df[i, 4]
}
data.df = data.frame(id,time,sequences)
# creation of hsmm data object need for training
N <- as.numeric(table(data.df$id))
train <- list(x = data.df$sequences, N = N)
class(train) <- "hsmm.data"
return(train)
}
library(mhsmm)
dataset <- read.csv("location.csv", header = TRUE)
train <- formatMhsmm(dataset)
print(train)
The output observations are not the data of the 4th column; they are a list like (4, 8, 12, ..., 396, 1, 1, ..., 56, 192, ..., 6550, 68, NA, NA, ...). It seems to have picked up a quarter of the data from each column. Why is that?
Thank you very much!!!!
Why don't you simply count your observations by id and create the hsmm.data object directly? Supposing your data frame is called "data", we have:
N <- as.numeric(table(data$id))
train <- list(x=data$location, N = N)
class(train) <- "hsmm.data"
Extracted from http://www.jstatsoft.org/v39/i04/paper
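As for why the 4th column came out scrambled: unlist() walks a data frame column by column, so refilling the values with byrow = TRUE spreads consecutive values of one column across a row. The sketch below reproduces this on a small hypothetical data frame (the column names and values are made up, not taken from location.csv).

```r
# Hypothetical 4-column data standing in for location.csv
data <- data.frame(
  obs      = 1:8,
  id       = c(1, 1, 1, 1, 2, 2, 2, 2),
  time     = c(1, 2, 3, 4, 1, 2, 3, 4),
  location = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# unlist() returns all of obs, then all of id, and so on; byrow = TRUE
# then lays those values out four per row, mixing the columns
scrambled <- matrix(unlist(data), ncol = 4, byrow = TRUE)
scrambled[, 4]  # every 4th value of each column, not the location column

# Filling column-wise (the default) keeps the original columns intact
intact <- matrix(unlist(data), ncol = 4)
intact[, 4]

# Building the object directly from the data frame avoids the reshaping
N <- as.numeric(table(data$id))
train <- list(x = data$location, N = N)
class(train) <- "hsmm.data"
```

Here scrambled[, 4] starts with 4, 8 and then continues with values from the id and time columns, which matches the pattern reported in the question.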

How to extract the p.value and estimate from cor.test() in a data.frame?

In this example, I have temperature values from 50 different sites, and I would like to correlate Site1 with all 50 sites. But I want to extract only the components "p.value" and "estimate" returned by cor.test() into two different columns of a data.frame.
I have made an attempt and it works, but I don't know how!
For that reason I would like to know how I can simplify my code, because the problem is that I have to run a "for" loop twice to get my results.
Here is my example:
# Temperature data
data <- matrix(rnorm(500, 10:30, sd=5), nrow = 100, ncol = 50, byrow = TRUE,
dimnames = list(c(paste("Year", 1:100)),
c(paste("Site", 1:50))) )
# Empty data.frame
df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value="")
# Extraction
for (i in 1:50) {
df1 <- cor.test(data[,1], data[,i] )
df[,2:3] <- df1[c("estimate", "p.value")]
}
for (i in 1:50) {
df1 <- cor.test(data[,1], data[,i] )
df[i,2:3] <- df1[c("estimate", "p.value")]
}
df
I would very much appreciate your help :)
I might offer up the following as well (masking the loops):
result <- do.call(rbind,lapply(2:50, function(x) {
cor.result<-cor.test(data[,1],data[,x])
pvalue <- cor.result$p.value
estimate <- cor.result$estimate
return(data.frame(pvalue = pvalue, estimate = estimate))
})
)
First of all, I'm guessing you had a typo in your code: you should have rnorm(5000, ...) if you want unique values. Otherwise you're going to cycle through those 500 numbers 10 times.
Anyway, a simple way of doing this would be:
data <- matrix(rnorm(5000, 10:30, sd=5), nrow = 100, ncol = 50, byrow = TRUE,
dimnames = list(c(paste("Year", 1:100)),
c(paste("Site", 1:50))) )
# Empty data.frame
df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value="")
estimates = numeric(50)
pvalues = numeric(50)
for (i in 1:50){
test <- cor.test(data[,1], data[,i])
estimates[i] = test$estimate
pvalues[i] = test$p.value
}
df$Estimate <- estimates
df$P.value <- pvalues
df
Edit: I believe your issue is in the line df <- data.frame(label=paste("Site", 1:50), Estimate="", P.value=""). If you run typeof(df$Estimate), you see the column is stored as an integer (with stringsAsFactors = TRUE, the default before R 4.0, the empty string becomes a factor), while typeof(test$estimate) shows a double, so R doesn't know what you're trying to do with those two values. You can redo your code like this:
df <- data.frame(label=paste("Site", 1:50), Estimate=numeric(50), P.value=numeric(50))
for (i in 1:50){
test <- cor.test(data[,1], data[,i])
df$Estimate[i] = test$estimate
df$P.value[i] = test$p.value
}
to make it a little more concise.
Similar to colemand77's answer.
Create a cor function:
cor_fun <- function(x, y, method){
tmp <- cor.test(x, y, method= method)
cbind(r=tmp$estimate, p=tmp$p.value) }
Apply it over the columns of the data. You can transpose the result to get p and r by row:
t(apply(data, 2, cor_fun, data[, 1], "spearman"))
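For completeness, here is a single-pass sketch that fills both columns at once and uses only base R. It assumes simulated data with 5 sites instead of 50 to keep it short; the column names follow the question.

```r
set.seed(42)

# Simulated temperature data: 100 years (rows) by 5 sites (columns)
data <- matrix(rnorm(100 * 5, mean = 20, sd = 5), nrow = 100, ncol = 5,
               dimnames = list(paste("Year", 1:100), paste("Site", 1:5)))

# One cor.test() call per site; each call returns a named pair of numbers,
# so sapply gives a 2-row matrix that is transposed to one row per site
res <- t(sapply(seq_len(ncol(data)), function(i) {
  test <- cor.test(data[, 1], data[, i])
  c(Estimate = unname(test$estimate), P.value = test$p.value)
}))

df <- data.frame(label = colnames(data), res)
df
```

Since each cor.test() result is used for both values, the second loop from the question is no longer needed.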

Resources