Joining two time-series in same function - r

I'm trying to combine the results of two function calls and return them in a single column.
My code:
myfun <-function(x){
fit <-Arima(x, order = c(1,1,1), seasonal = list(order = c(0,1,0), period = 52),include.mean=TRUE,
include.constant = FALSE, method = 'CSS')
fit_a <- forecast(fit$fitted)
fit_a <- data.frame(fit_a$fitted)
colnames(fit_a)[1] <- "load"
fit_a$load <- as.data.frame(fit_a$load)
fit_b <- data.frame(forecast(fit,h=400))
fit_b <- data.frame(fit_b$Point.Forecast)
colnames(fit_b)[1] <- "load"
fit_b$load <- as.data.frame(fit_b$load)
return(rbind(fit_a,fit_b))
}
I can return each set of values individually with return(fit_a) or return(fit_b), but rbind() fails because the two are separate time-series objects.
I tried c(fit_a, fit_b), which shows two different ts objects (this confirms that both outputs exist and that only the rbind() step fails).
Can someone help me extract both the fitted and the forecasted values in the same function?
Thanks in advance!

One way to return more than one object is to create a list of the objects, then return that. In your case you could use this at the end of your function:
fit <- list(fit_a, fit_b)
return(fit)
Then you can access the elements using fit[[1]] or fit[[2]].
You also have the option of naming the elements so you can access them using the $, like so:
fit <- list(fit_a = fit_a, fit_b = fit_b)
return(fit)
Then you can use fit$fit_a and fit$fit_b
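If the goal is a single column rather than two separate objects, one option is to strip the time-series attributes before combining. The following is only a minimal sketch: it assumes base R's arima() and predict() as stand-ins for Arima() and forecast() from the question, and the helper name fitted_and_forecast, the built-in USAccDeaths data and the model orders are purely illustrative.

```r
# Hypothetical helper (not from the question): returns fitted and
# forecasted values stacked in one "load" column.
fitted_and_forecast <- function(x, h = 12) {
  # Base-R stand-in for forecast::Arima(); orders chosen for illustration only
  fit <- arima(x, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1)))

  # In-sample fitted values are the observations minus the residuals
  fitted_vals <- as.numeric(x - residuals(fit))

  # Out-of-sample point forecasts, converted from ts to plain numeric
  future_vals <- as.numeric(predict(fit, n.ahead = h)$pred)

  # Plain numeric vectors combine without any ts attribute conflicts
  data.frame(
    load = c(fitted_vals, future_vals),
    type = rep(c("fitted", "forecast"),
               times = c(length(fitted_vals), length(future_vals)))
  )
}

out <- fitted_and_forecast(USAccDeaths, h = 12)
head(out)
tail(out)
```

The as.numeric() calls are the important part: once both pieces are ordinary vectors, c() or rbind() no longer has to reconcile two different time indices. The type column is an extra added here so the two parts can still be told apart.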

Related

What does "invalid type (closure) for variable 'variable1'" mean and how do I fix it?

I am trying to write a function in R that calls a function from another package. The code works perfectly outside a function.
I am guessing it might have something to do with the package I am using (survey).
A self-contained code example:
#activating the packages
library(survey)
library(foreign) # provides read.spss()
#getting the dataset into R
tm <- read.spss("tm.sav", to.data.frame = TRUE, max.value.labels = 5)
# creating svydesign object (it basically contains the weights to adjust the variables (~persgew: also a column variable contained in the tm-dataset))
tm_w <- svydesign(ids=~0, weights = ~persgew, data = tm)
#getting overview of the welle-variable
#this variable is part of the tm-dataset. it is needed to execute the following steps
table(tm$welle)
# data manipulation as in: taking the v12d_gr-variable as well as the welle-variable and the svydesign-object to create a longitudinal variable which is transformed into a data frame that can be passed to ggplot
t <- svytable(~v12d_gr+welle, tm_w)
tt <- round(prop.table(t,2)*100, digits=0)
v12d <- tt[2,]
v12d <- as.data.frame(v12d)
This is the code outside the function, and it works perfectly. Since I have to transform quite a few variables in exactly the same way, I want to write a function to save some time.
The following function is supposed to take the variable to be transformed as an argument (v12sd2_gr).
#making sure the survey-object is loaded
tm_w <- svydesign(ids=~0, weights = ~persgew, data = data)
#trying to write a function containing the code from above
ltd_zsw <- function(variable1){
t <- svytable(~variable1+welle, tm_w)
tt <- round(prop.table(t,2)*100, digits=0)
var_ltd_zsw <- tt[2,]
var_ltd_zsw <- as.data.frame(var_ltd_zsw)
return(var_ltd_zsw)
}
Calling the function:
#as v12d has been altered already, I am trying to transform another variable v12sd2_gr
v12sd2 <- ltd_zsw(v12sd2_gr)
Console output:
Error in model.frame.default(formula = weights ~ variable1 + welle, data = model.frame(design)) :
invalid type (closure) for variable 'variable1'
Called from: model.frame.default(formula = weights ~ variable1 + welle, data = model.frame(design))
How do I fix it? And what does it mean to dynamically build a formula and reformulating?
PS: I hope it is the appropriate way to answer to the feedback in the comments.
Update: I think I was able to trace the problem back to the argument I am passing (variable1), and I am guessing it has something to do with the fact that I use a formula inside the function. But when I try to call svytable with as.formula(svytable(~variable1+welle, tm_w)) it still doesn't work.
What to do?
I have found a solution to the problem.
Here is the tested and working function:
ltd_test <- function (var, x, string1="con", string2="pro") {
print (table (var))
x$w12d_gr <- ifelse(as.numeric(var)>2,1,0)
x$w12d_gr <- factor(x$w12d_gr, levels = c(0,1), labels = c(string1,string2))
print (table (x$w12d_gr))
x_w <- svydesign(ids=~0, weights = ~persgew, data = x)
t <- svytable(~w12d_gr+welle, x_w)
tt <- round(prop.table(t,2)*100, digits=0)
w12d <- tt[2,]
w12d <- as.data.frame(w12d)
}
The problem appeared to be caused by the svydesign() function. It produces an object that the formula passed to svytable() is then evaluated against. That's why it is imperative to first create the x_w object with svydesign() inside the function and only then call svytable() to create the t object.
In the code snippet I originally posted in the question, the tm_w object was created and stored globally.
Thanks for the help to everyone. I hope this is gonna be of use to someone one day!
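On the part of the question about "dynamically building a formula": it means assembling the formula from column names held in variables instead of typing it literally, and reformulate() in base R does this. The following is only a sketch under assumptions: xtabs() stands in for svytable() so that it runs without the survey package, and the helper tab_by_wave and the toy data are hypothetical.

```r
# Hypothetical helper: cross-tabulate any column (given by name) against `welle`
tab_by_wave <- function(varname, data) {
  # reformulate() builds the one-sided formula ~ <varname> + welle
  f <- reformulate(c(varname, "welle"))
  # Column percentages, as in the question; xtabs() is a stand-in for svytable()
  tt <- round(prop.table(xtabs(f, data = data), 2) * 100, digits = 0)
  as.data.frame(tt[2, ])
}

# Toy data, invented for illustration
toy <- data.frame(
  v12sd2_gr = factor(c("con", "pro", "pro", "con", "pro", "pro"),
                     levels = c("con", "pro")),
  welle = c(1, 1, 1, 2, 2, 2)
)

f <- reformulate(c("v12sd2_gr", "welle"))
print(f)

res <- tab_by_wave("v12sd2_gr", toy)
print(res)
```

Note that the variable is passed as a character string ("v12sd2_gr"), not as a bare name. With the survey package the same formula object could presumably be handed to svytable() in place of the literal ~variable1+welle, but that is not tested here.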

How to use a for loop for the svyttest function in the survey package?

I am trying to use the svyttest function in a for loop in the survey package. I want to test for differences in proportions of responses between subpopulations in likert-scale type data. For example, in a survey question (1=strongly disagree, 5 = strongly agree), are there statistically significant differences in the proportion of "strongly disagree" responses between Groups 1 and 2?
I understand that I can also use the svyglm function from the survey package, but I have been unable to successfully use that in a for loop.
I also understand that there is a wtd.t.test in the weights package and that the glm function in the stats package has a weights argument, but neither of these two options gets the correct results. I need to use either the svyttest or the svyglm function in the survey package.
For reference I have been looking here and here for some help but have been unable to adapt these examples to my problem.
Thank you for your time and effort.
# create example survey data
ids <- 1:1000
stratas <- rep(c("strata1", "strata2","strata3","strata4"), each=250)
weight <- rep(c(5,2,1,1), each=250)
group <- rep(c(1,2), times=500)
q1 <- sample(1:5, 1000, replace = TRUE)
survey_data <- data.frame(ids, stratas, weight, group, q1)
# create example svydesign
library(survey)
survey_design <- svydesign(ids = ~0,
probs = NULL,
strata = survey_data$stratas,
weights = survey_data$weight,
data = survey_data)
# look at the proportions of q1 responses by group
prop.table(svytable(~q1+group, design = survey_design), margin = 2)
# t-test for significant differences in the proportions of the first item in q1
svyttest(q1== 1 ~ group, design = survey_design)
# trying a for loop for all five items
for(i in c(1:5)){
print(svyttest(q1== i ~ group, design = survey_design))
}
# I receive the following error:
Error in svyglm.survey.design(formula, design, family = gaussian()) :
all variables must be in design= argument
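Inside a loop (or a function) the formula q1 == i ~ group is never evaluated, so i stays in it as a variable name; svyttest then looks for a variable called i in the design and fails. Building the formula from a string with as.formula() puts the current value of i into the formula instead. This should work: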
# trying a for loop for all five items
for(i in c(1:5)){
print(svyttest(as.formula(paste("q1==", i, "~group")),
design = survey_design))
}
Another trick: collect the formulas in a list as you go and index into it inside the loop:
x=c()
for(i in c(1:5)){
x=append(x,as.formula(paste("q1==",i,"~ group")))
print(svyttest(x[[i]], design = survey_design))
}
With regards
Aleksei
I would use bquote:
for(i in 1:5){
print(eval(
bquote(svyttest(q1== .(i) ~ group, design = survey_design))
))
}
In this example as.formula works just as well, but bquote is more general.
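To see why the literal formula fails while the other two approaches work, it can help to list the variables each formula refers to. This is a small base-R sketch that needs no survey design; the names f_literal, f_pasted and f_quoted are made up for the illustration.

```r
i <- 3

# Written literally: `i` is kept as a symbol inside the formula
f_literal <- q1 == i ~ group

# Built from a string: the current value of i is pasted into the text
f_pasted <- as.formula(paste("q1 ==", i, "~ group"))

# Built with bquote(): .(i) is replaced by the value of i before evaluation
f_quoted <- eval(bquote(q1 == .(i) ~ group))

# Variables each formula would ask the design for
all.vars(f_literal)  # includes "i"
all.vars(f_pasted)   # only "q1" and "group"
all.vars(f_quoted)   # only "q1" and "group"
```

Only the first formula mentions i as a variable, which matches the "all variables must be in design= argument" error from the question.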

"Input datasets must be dataframes" error in kamila package in R

I have a mixed type data set, one continuous variable, and eight categorical variables, so I wanted to try kamila clustering. It gives me an error when I use one continuous variable, but when I use two continuous variables it is working.
library(kamila)
data <- read.csv("mixed.csv",header=FALSE,sep=";")
conInd <- 9
conVars <- data[,conInd]
conVars <- data.frame(scale(conVars))
catVarsFac <- data[,c(1,2,3,4,5,6,7,8)]
catVarsFac[] <- lapply(catVarsFac, factor)
kamRes <- kamila(conVars, catVarsFac, numClust=5, numInit=10,calcNumClust = "ps",numPredStrCvRun = 10, predStrThresh = 0.5)
Error in kamila(conVar = conVar[testInd, ], catFactor =
catFactor[testInd, : Input datasets must be dataframes
I think the problem is that the function assumes that you have at least two of both data types (i.e. >= 2 continuous variables, and >= 2 categorical variables). It looks like you supplied a single column index (conInd = 9, just column 9), so you have only one continuous variable in your data. Try adding another continuous variable to your continuous data.
I had the same problem (with categoricals) and this approach fixed it for me.
I think the ultimate source of the error in the program is at around line 170 of the source code. Here's the relevant snippet...
numObs <- nrow(conVar)
numInTest <- floor(numObs/2)
for (cvRun in 1:numPredStrCvRun) {
for (ithNcInd in 1:length(numClust)) {
testInd <- sample(numObs, size = numInTest, replace = FALSE)
testClust <- kamila(conVar = conVar[testInd,],
catFactor = catFactor[testInd, ],
numClust = numClust[ithNcInd],
numInit = numInit, conWeights = conWeights,
catWeights = catWeights, maxIter = maxIter,
conInitMethod = conInitMethod, catBw = catBw,
verbose = FALSE)
When the code draws the test subset, it selects rows from a one-column data.frame, and that kind of subsetting returns a plain vector by default (the drop argument of [ defaults to TRUE). So the inner call to kamila receives something that is not a data.frame even though you did supply one. That's where the error comes from.
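The dropping behaviour is easy to reproduce in base R without kamila. A minimal sketch with made-up data (one_col and rows are illustrative names):

```r
# A data.frame with a single continuous column, as in the question
one_col <- data.frame(x = c(1.5, 2.5, 3.5, 4.5))
rows <- c(1, 3)

# Default row subsetting collapses a one-column data.frame to a vector
dropped <- one_col[rows, ]
is.data.frame(dropped)  # FALSE

# drop = FALSE keeps the data.frame structure
kept <- one_col[rows, , drop = FALSE]
is.data.frame(kept)     # TRUE
```

So another way to protect your own subsetting code against this is to add drop = FALSE wherever a data.frame that may have a single column is indexed by row.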
If you can't dig up another variable to add to your data, you could edit the code such that the calls to kamila in the cvRun for loop wrap the data.frame() function around any subsetted conVar or catFactor, e.g.
testClust <- kamila(conVar = data.frame(conVar[testInd,]),
catFactor = data.frame(catFactor[testInd,]), ... )
and just save that as your own version of the function called say, my_kamila, which you could use instead.
Hope this helps.

Calculated values on imputed data

I'd like to do something like the following: (myData is a data table)
library(data.table)
library(mice)
#create some data
myData = data.table(invisible.covariate=rnorm(50),
visible.covariate=rnorm(50),
category=factor(sample(1:3,50, replace=TRUE)),
treatment=sample(0:1,50, replace=TRUE))
myData[,outcome:=invisible.covariate+visible.covariate+treatment*as.integer(category)]
myData[,invisible.covariate:=NULL]
#process it
myData[treatment == 0,untreated.outcome:=outcome]
myData[treatment == 1,treated.outcome:=outcome]
myPredictors = matrix(0,ncol(myData),ncol(myData))
myPredictors[5,] = c(1,1,0,0,0,0)
myPredictors[6,] = c(1,1,0,0,0,0)
myImp = mice(myData,predictorMatrix=myPredictors)
fit1 = with(myImp, lm(treated.outcome ~ category)) #this works fine
for_each_imputed_dataset(myImp, #THIS IS NOT A REAL FUNCTION but I hope you get the idea
function(imputed_data_table) {
imputed_data_table[,treatment.effect:=treated.outcome-untreated.outcome]
})
fit2 = with(myImp, lm(treatment.effect ~ category))
#I want fit2 to be an object similar to fit1
...
I would like to add a calculated value to each imputed data set, then do statistics using that calculated value. Obviously the structure above is probably not how you'd do it. I'd be happy with any solution, whether it involves preparing the data table somehow before the mice, a step before the "fit =" as sketched above, or some complex function inside the "with" call.
The complete() function will generate the "complete" imputed data set for each of the requested iterations. But note that mice expects to work with data.frames, so it returns data.frames and not data.tables. (Of course you can convert if you like). But here is one way to fit all those models
imp = mice(myData,predictorMatrix=myPredictors)
fits<-lapply(seq.int(imp$m), function(i) {
lm(I(treated.outcome-untreated.outcome)~category, complete(imp, i))
})
fits
The results will be in a list and you can extract particular lm objects via fits[[1]], fits[[2]], etc
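The same pattern (add the calculated column to each completed data set, fit, then combine) can be sketched in base R. Everything here is an assumption made for illustration: the list completed stands in for what complete(imp, i) would return for each imputation, and make_completed is a made-up generator of toy data.

```r
set.seed(42)

# Made-up stand-in for one completed (imputed) data set
make_completed <- function(n = 30) {
  data.frame(
    category = factor(rep(1:3, length.out = n)),
    treated.outcome = rnorm(n, mean = 1),
    untreated.outcome = rnorm(n)
  )
}

# Pretend these are the m = 5 completed data sets
completed <- replicate(5, make_completed(), simplify = FALSE)

# Add the calculated value to each data set, then fit the model on it
fits <- lapply(completed, function(d) {
  d$treatment.effect <- d$treated.outcome - d$untreated.outcome
  lm(treatment.effect ~ category, data = d)
})

# One column of coefficients per imputed data set
coefs <- sapply(fits, coef)

# Pooled point estimates: the average across imputations
pooled <- rowMeans(coefs)
pooled
```

Averaging gives only the pooled point estimates; the pooled standard errors also need the between-imputation variance, which this sketch does not compute.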

plyr + forecasting multiple regressors

Taking the content in this thread a bit further: I've gone as far as I can, but finally hit a wall. I'm looking to use PLYR to create some ARIMA models with exogenous regressors at scale. A high-level overview of the process I've been using (code with example data follows)
1) I have a dataframe with businesses, regions, revenue and orders, all by date
2) For each combination of business + region, I want to create a forecast for revenue based on previous values of revenue + previous values of orders.
3) I want to use an ARIMA model (using auto.arima() ) to figure out optimal orders for both revenue and orders, then apply that information to a forecast function
4) The problem I run into seems to boil down to not being able to pass multiple lists to a PLYR argument to operate on, which most likely in turn boils down to my not fully understanding how llply works (so hopefully this is an easy task)
Here's some sample data I'm working off:
library(plyr)
library(xts)
library(forecast)
data <- data.frame(
biz = sample(c("telco","shipping","tech"), 100, replace = TRUE),
region = sample(c("mideast","americas","asia"), 100, replace = TRUE),
date = rep(seq(as.Date("2010-02-01"), length=10, by = "1 day"),10),
revenue = sample(1:100),
orders = sample(1:100)
)
Edit: First, reorganize data through ddply to get rid of duplicate entries:
dataframe <- ddply(data, c("biz","region","date"), function(df) {
c(revenue = sum(df[,4]),
orders = sum(df[,5]))
})
Step 1: Create a list that contains the time series info for each combination of business + region:
list1 <- dlply(dataframe, .(biz,region), identity)
Step 2: Turn that list into an XTS object so we can use it for time-series analysis:
xtsobject <- llply(list1, function(list) {
xts(x=list[,c("revenue","orders")], order.by=list[,"date"])
})
Here's where I run into trouble. I want to make a list of orders from the auto.arima() function to pass into a forecast.Arima() function. This would be straightforward if I were just doing one variable with no exogenous regressors:
arimamodel1 <- llply(xtsobject, function(list) {
fity <- auto.arima(list$revenue)
})
And then I would apply that list to the forecast.Arima() function:
forecast1 <- llply(arimamodel1, function(model) {
forecast.Arima(model, h=2)
})
That comes out fine. I've tried changing the argument to include some room for the extra regressors, but I'm not sure the forecasts are actually pulling in the x values:
arimamodel2 <- llply(xtsobject, function(list) {
fity <- auto.arima(list$revenue, xreg=list$orders)
fitx <- auto.arima(list$orders)
})
and the forecasts:
forecast2 <- llply(arimamodel2, function(model) {
forecast.Arima(model, h=2)
})
... But it seems like in the forecast function, I should be doing something to account for the x regressor model in the way I normally use forecast.Arima() with multiple regressors; something like:
forecast.Arima(model,h=2, xreg=forecast(model,h=2)$mean)
But this doesn't work. Does anybody have any insight into how to use PLYR to make forecasts based on auto.arima() for multiple regressors?
I'm pretty sure I figured this out, in case anybody stumbles on to this question. It's just a matter of making a function that passes through all these arguments, then passing that function through lapply or llply (the data in the question won't work for auto.arima because of the way it was created, but it works on the actual data I'm using):
arimafunc <- function(list) {
fity <- auto.arima(list$revenue, xreg=list$orders)
fitx <- auto.arima(list$orders)
forecast <- forecast.Arima(fity,h=2,xreg=forecast(fitx,h=2)$mean)
return(forecast)
}
then pass through the list apply:
forecasts <- lapply(xtsobject,FUN=arimafunc)
I'm sure there's a way to do this using built-in functionality of something like llply or from one of the base commands, mapply, but this works for now...
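For anyone who wants to try the idea without plyr, xts or forecast installed, here is a rough base-R sketch of the same structure: one function per group that fits a model for the regressor, forecasts it, and feeds those forecasts into the model for the target. It assumes fixed AR(1) models via arima() instead of auto.arima(), and the names make_group, forecast_group and the simulated data are invented for the example.

```r
set.seed(123)

# Simulated stand-in for one business + region group
make_group <- function(n = 60) {
  orders <- 50 + as.numeric(arima.sim(list(ar = 0.6), n = n))
  data.frame(orders = orders, revenue = 2 * orders + rnorm(n))
}

groups <- list(telco.asia = make_group(), tech.americas = make_group())

# Hypothetical per-group function, mirroring arimafunc above
forecast_group <- function(df, h = 2) {
  # 1) Model the regressor on its own and forecast it h steps ahead
  fit_x <- arima(df$orders, order = c(1, 0, 0))
  future_orders <- as.numeric(predict(fit_x, n.ahead = h)$pred)

  # 2) Model revenue with orders as an exogenous regressor
  fit_y <- arima(df$revenue, order = c(1, 0, 0), xreg = df$orders)

  # 3) Forecast revenue using the forecasted orders as future regressors
  as.numeric(predict(fit_y, n.ahead = h, newxreg = future_orders)$pred)
}

forecasts <- lapply(groups, forecast_group)
forecasts
```

Keeping both model fits inside one function means only a single list has to be passed to lapply (or llply), which sidesteps the multiple-lists problem described in the question.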
