Time series (ts) in R doesn't take additional input variables

I am using the ts function for time series prediction in R. I tried the same task in SSAS (SQL Server Analysis Services) and it gives a pretty good analysis for my data set, but when I try it in R it seems I can't pass additional input variables. This is the call I used:
m <- ts(myt$amount, start = c(2010, 1), end = c(2016, 12), frequency = 12)
Can anyone tell me where I can pass other input variables? For example, in my case I predict future amounts from a monthly data set, but I also have additional variables such as Sales_Category, Sales_Subcategory, etc. that could be used as inputs.
I tried passing them as extra trailing parameters but saw no change in my results.
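For what it's worth, ts() only constructs the series object; it has no notion of predictor variables, so extra trailing arguments just bind to its other parameters (deltat, ts.eps, and so on). Exogenous inputs are normally supplied to the modelling function instead, for example via the xreg argument of forecast::auto.arima. A minimal sketch, assuming myt contains the category columns named above and that future_xreg is a hypothetical matrix of future regressor values:
library(forecast)

m <- ts(myt$amount, start = c(2010, 1), end = c(2016, 12), frequency = 12)

# Dummy-code the categorical inputs as a numeric matrix (xreg must be numeric);
# drop the intercept column added by model.matrix
xreg <- model.matrix(~ Sales_Category + Sales_Subcategory, data = myt)[, -1]

# Fit an ARIMA model that uses the extra columns as regressors
fit <- auto.arima(m, xreg = xreg)

# Forecasting then needs future values of the same regressors
fc <- forecast(fit, xreg = future_xreg)  # future_xreg: hypothetical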

Related

Integration of Time series model of R in Tableau

I am trying to integrate an R time series model into Tableau, and I am new to this kind of integration. Please help me resolve the error below. Here is my Tableau calculation for the R integration; the calculation is valid, but I am getting an error.
SCRIPT_REAL(
"library(forecast);
cln_count_ts <- ts(.arg1,frequency = 7);
arima.fit <- auto.arima(log10(cln_count_ts));
forecast_ts <- forecast(arima.fit, h =10);",
SUM([Count]))
Error : Error in auto.arima(log10(cln_count_ts)) : No suitable ARIMA model found
When Tableau calls R, Python, or another tool, it does so as a "table calc". That means it sends the external system one or more vectors as arguments and expects a single vector in response.
Depending on your data and calculation, you may want to send all your data to R in a single call, passing a very large vector, or call it several times with different vectors - say forecasting each region separately. Or even call R multiple times with many vectors of size one (aka scalars).
So with table calcs, you have other decisions to make beyond just choosing the function to invoke. Chiefly, you have to decide how to partition your data for analysis. And in some cases, you also need to determine the order that the data appears in the vectors you send to R - say if the order implies a time series.
The Tableau terms for specifying how to divide and order data for table calculations are "partitioning and addressing". See the section on that topic in the online help. You can change those settings by using the "Edit Table Calc" menu item.
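That also bears on the auto.arima error above: with the wrong partitioning, Tableau may hand R a vector too short (or too degenerate, e.g. log10 of zero counts) to fit any ARIMA model. Note too that the script above ends in an assignment, so even a successful fit would not hand back the one-value-per-row numeric vector Tableau expects. A hedged sketch of a version that satisfies that contract, assuming the goal is to show the in-sample fit:
SCRIPT_REAL(
"library(forecast);
cln_count_ts <- ts(.arg1, frequency = 7);
arima.fit <- auto.arima(log10(cln_count_ts));
# the last expression is what Tableau receives: one value per input row
as.numeric(10 ^ fitted(arima.fit))",
SUM([Count]))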

Exclude specific tensors being updated by optimizer in TensorFlow

I have two graphs that I need to train independently, which means I have two different optimizers, but at the same time one of them uses tensor values from the other graph. As a result, I need to be able to stop specific variables from being updated while training one of the graphs. I have assigned two different name scopes to my variables and am using this code to control which variables each optimizer updates:
mentor_training_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "mentor")
train_op_mentor = mnist.training(loss_mentor, FLAGS.learning_rate, mentor_training_vars)
mentee_training_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "mentee")
train_op_mentee = mnist.training(loss_mentee, FLAGS.learning_rate, mentee_training_vars)
The *_training_vars lists are used as below, in the training method of the mnist module:
def training(loss, learning_rate, var_list):
    # Add a scalar summary for the snapshot loss.
    tf.summary.scalar('loss', loss)
    # Create the gradient descent optimizer with the given learning rate.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    # Create a variable to track the global step.
    global_step = tf.Variable(0, name='global_step', trainable=False)
    # Use the optimizer to apply the gradients that minimize the loss
    # (and also increment the global step counter) as a single training step.
    train_op = optimizer.minimize(loss, global_step=global_step, var_list=var_list)
    return train_op
I'm using the var_list argument of the optimizer's minimize method to control which variables the optimizer updates.
Right now I'm not sure whether I have done this correctly. Is there any way to check that an optimizer only updates part of a graph?
I would appreciate if anyone can help me with this issue.
Thanks!
I have had a similar problem and used the same approach as you, i.e. via the var_list argument of the optimizer. I then checked whether the variables not intended for training stayed the same using:
the_var_np = sess.run(tf.get_default_graph().get_tensor_by_name('the_var:0'))
assert np.equal(the_var_np, pretrained_weights['the_var']).all()
pretrained_weights is a dictionary returned by np.load('some_file.npz'), which I used to store the pre-trained weights on disk.
Just in case you need that as well, here is how you can overwrite a variable with a given value:
value = pretrained_weights['the_var']
variable = tf.get_default_graph().get_tensor_by_name('the_var:0')
sess.run(tf.assign(variable, value))

How can I export a Time Series model in R?

Is there a standard (or available) way to export a time series model in R? PMML would work, but when I try to use the pmml library, perhaps incorrectly, I get an error. For example, my code looks similar to this:
require(fpp)
library(forecast)
library(pmml)
data <- ts(livestock, start = 1970, end = 2000, frequency = 3)
model <- ses(data, h = 10)
export <- pmml(model)
And the error I get is:
Error in UseMethod("pmml") : no applicable method for 'pmml' applied to an object of class "forecast"
Here is what I can tell:
When you use ses(), you're not creating a model; you're using a model to find a prediction (in particular, making a forecast via exponential smoothing for a time series). Your result is not a predictive model, but rather a particular prediction of a model for a particular data set. While I'm not that familiar with PMML, from what I can tell, it's not meant for the job you are trying to use it for.
If you want to export the time series and the result, I would say your best bet would be to just export a .csv file with the data; just about anything can read .csv's. A ts object is nothing more than a glorified vector, so you can export the data and the times. Additionally, model is just a table with data. So try this:
write.csv(model, file="forecast.csv")
If you want to write the ts object, try one of the following:
write.csv(data, file="ts1.csv") # No dates for index
write.csv(cbind("time" = time(data), "val" = data), file = "ts2.csv") # Adds dates
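As an aside, if the goal is only to reuse the fitted object from a later R session (rather than exporting to a portable standard like PMML), base R's serialization may be all you need; a minimal sketch:
saveRDS(model, file = "forecast.rds")  # write the forecast object to disk
model2 <- readRDS("forecast.rds")      # restore it later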

Taylor diagram from existing Correlation and Standard Dev values

Is it possible to create a Taylor diagram from already calculated correlation and standard deviation values?
I am doing model evaluation, and I already have the correlation and standard deviation values. I understand that the plotrix package can build the diagram when given the observed and modelled values; however, for the type of work I am doing, it is easier to start directly from the correlation and standard deviation values.
Is there any way I can do this in R?
There's no reason it shouldn't be possible, but the authors didn't allow for it when they wrote the function. The function is a bit long and complex, but the part that does the calculation is at the top, so it is possible to swap that code out and replace it with a version that accepts summary statistics. Keep in mind that what I'm about to do is a hack, and I've only tested it with version 3.5-5 of plotrix; other versions may not work.
Here we will create a new function taylor.diagram2 that takes all the code from taylor.diagram but adds an extra if statement to check for a list of summarized data as the first argument:
taylor.diagram2 <- taylor.diagram
bl <- as.list(body(taylor.diagram))
cond <- list(
  as.name("if"),
  quote(is.list(ref) & missing(model)),                    # condition
  quote({R <- ref$R; sd.r <- ref$sd.r; sd.f <- ref$sd.f}), # if true
  as.call(c(as.symbol("{"), bl[3:8])))                     # else
bl <- c(bl[1:2], as.call(cond), bl[9:length(bl)])          # splice in new code
body(taylor.diagram2) <- as.call(bl)                       # update the function
Now we can test the function. First, we'll do things the standard way
# test data
aref <- rnorm(30, sd = 2)
amodel1 <- aref + rnorm(30) / 2
# standard behavior
taylor.diagram2(aref, amodel1, main = "Standard Behavior")
# summarized data
xx <- list(
  R = cor(aref, amodel1, use = "pairwise"),
  sd.r = sd(aref),
  sd.f = sd(amodel1)
)
# modified behavior
taylor.diagram2(xx, main = "Modified Behavior")
So the new taylor.diagram2 function can do both. If you pass it two vectors, it behaves as before; if you pass it a list with the names R, sd.r, and sd.f, it draws the same plot from the values you supplied. Note that the model parameter must be empty for the modified behavior to trigger, which means any additional parameters must be passed by name rather than by position.
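Since the splice leaves the rest of taylor.diagram untouched, its add argument should still work in principle, so you could overlay several summarized models on one diagram; a hypothetical sketch:
# overlay a second, made-up model on the existing diagram
xx2 <- list(R = 0.8, sd.r = sd(aref), sd.f = 1.9)
taylor.diagram2(xx2, add = TRUE, col = "blue")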

Proc expand in R

I am working on converting SAS code into R, and since I am relatively new to SAS I am having trouble understanding the following code snippet:
proc expand data=A out=B;
by number beg_date;
id date;
convert alpha1=calpha1/transformout=(+1 cuprod -1);
convert alpha2=calpha2/transformout=(+1 cuprod -1);
convert alpha3=calpha3/transformout=(+1 cuprod -1);
run;
I understand that expand is used for expanding time series data, for example from monthly to quarterly, or for contracting them. But what are the by and id statements for?
From referring to SAS Support, I believe the BY statement specifies variables so that the cumulative product is calculated within each group of those variables. As for the ID statement, I understand it is a key that identifies the observations. Can anyone tell me if my understanding is correct? And do I use a transform command in R for this purpose?
I don't have a SAS license, so I cannot try this out on sample data and inspect the output. Similarly, I don't have a raw data set to work with.
From your code snippet, this proc expand is going to create three variables: calpha1, calpha2, and calpha3. cuprod is one of the proc expand options for outputting a cumulative product, so with the +1 and -1 steps the procedure computes the running product of 1 + alpha, minus 1, for alpha1, alpha2, and alpha3 within every number/beg_date group named in the by statement. There should also have been a proc sort before the proc expand for the by statement to work.
Regarding the ID statement, it seems the original writer didn't want to use proc expand's default time settings; by specifying the date variable in the id statement, the calculations are based on the points in time given by date.
http://support.sas.com/documentation/cdl/en/etsug/63348/HTML/default/viewer.htm#etsug_expand_sect008.htm
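To answer the R side of the question: you don't need a special transform command; the (+1 cuprod -1) transform is just a grouped cumulative product of 1 + alpha, minus 1, which base R's ave() (or a dplyr group_by()/cumprod() pipeline) can reproduce. A minimal sketch, assuming a data frame A with the columns named in the SAS step, sorted by number, beg_date, and date:
# grouped cumulative compounding: calpha = cumprod(1 + alpha) - 1
B <- A[order(A$number, A$beg_date, A$date), ]
for (i in 1:3) {
  a <- B[[paste0("alpha", i)]]
  B[[paste0("calpha", i)]] <- ave(a, B$number, B$beg_date,
                                  FUN = function(x) cumprod(1 + x) - 1)
}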
