Choosing the Length of Time Steps in Recurrent Neural Network

I have a time-series regression problem with a single predictor and a real-valued output, and I would like to use an LSTM recurrent neural network to model the data. How should I choose the number of time steps in my model? Is there any upper limit on the length of the LSTM layer?

LSTMs can be challenging to use when you have very long input sequences and only one or a handful of outputs. A reasonable limit of 250-500 time steps is often used in practice with large LSTM models.

How should I choose the number of time steps in my model?
It depends entirely on the task at hand; in short, the sampling frequency of the time series determines it: if your data come at minute intervals, use 60; if hourly, use 24; if monthly, use 12; and so on.
Put simply: it comes down to the level at which you need your predictions.
Is there any upper limit for the length of the LSTM layer?
It depends on the amount of data.
A reasonable limit is 250-500 time steps. Longer input sequences may result in vanishing gradients and, in turn, an unlearnable model.
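As a minimal sketch of what this looks like in practice (Python/Keras is my choice here, not something the question specifies, and the window length and layer sizes are assumptions), a univariate series is framed into fixed-length windows of n_steps time steps:

    # Minimal sketch: frame a univariate series into fixed-length windows
    # for an LSTM; n_steps is the number of time steps discussed above.
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    def make_windows(series, n_steps):
        """Return (samples, n_steps, 1) inputs and next-value targets."""
        X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
        y = series[n_steps:]
        return X[..., np.newaxis], y

    series = np.sin(np.linspace(0, 50, 1000))  # stand-in for the real predictor
    n_steps = 24                               # e.g. 24 for hourly data

    X, y = make_windows(series, n_steps)
    model = Sequential([LSTM(32, input_shape=(n_steps, 1)), Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)

Keeping n_steps within the 250-500 limit mentioned above avoids the vanishing-gradient problem on very long windows.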

Related

How to use a multilayer neural network to predict next-day energy consumption in R

The objective of this question is to use a multilayer neural network (MLP-NN) to predict the next
step-ahead (i.e. next day) electricity consumption for the 11:00 hour case. The first 430 samples will be used as
the training data, while the remaining ones will be used as the testing set.
[Image: plot of the energy consumption data; there are 501 samples in total.]
OK, so I have no idea where to start. How do I determine the inputs? I have to use an autoregressive model. Help!
As you know, your data represents a time series.
An MLP cannot be used successfully for this type of task.
There are other types of networks for Sequence Learning, including the so-called Vanilla RNNs.
I recommend that you take a look at this link to better understand how they work.
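For concreteness, here is an illustrative sketch of such a vanilla RNN on the next-day task. It is in Python/Keras rather than the R setup the question asks about, and the lag order and layer size are assumptions:

    # Illustrative sketch: a vanilla RNN predicting next-day consumption
    # from the previous `lags` days (autoregressive framing).
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import SimpleRNN, Dense

    consumption = np.random.rand(501)  # placeholder for the 501 daily samples
    lags = 7                           # assumed autoregressive order

    X = np.array([consumption[i:i + lags] for i in range(len(consumption) - lags)])
    y = consumption[lags:]
    X = X[..., np.newaxis]             # shape: (samples, lags, 1)

    # Roughly the 430-sample training split described in the question.
    X_train, y_train, X_test, y_test = X[:430], y[:430], X[430:], y[430:]

    model = Sequential([SimpleRNN(16, input_shape=(lags, 1)), Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X_train, y_train, epochs=10, verbose=0)
    print(model.evaluate(X_test, y_test, verbose=0))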

How can Keras predict sequences of sales (individually) for 11106 distinct customers, each a series of varying length (anywhere from 1 to 15 periods)?

I am approaching a problem that Keras must offer an excellent solution for, but I am having trouble developing an approach (because I am such a neophyte concerning anything in deep learning). I have sales data. It contains 11106 distinct customers, each with its own time series of purchases of varying length (anywhere from 1 to 15 periods).
I want to develop a single model to predict each customer's purchase amount for the next period. I like the idea of an LSTM, but clearly, I cannot make one for each customer; even if I tried, there would not be enough data for an LSTM in any case---the longest individual time series only has 15 periods.
I have used types of Markov chains, clustering, and regression in the past to model this kind of data. I am asking the question here, though, about what type of model in Keras is suited to this type of prediction. A complication is that all customers can be clustered by their overall patterns. Some belong together based on similarity; others do not; e.g., some customers spend with patterns like $100-$100-$100, others like $100-$100-$1000-$10000, and so on.
Can anyone point me to a type of sequential model supported by Keras that might handle this well? Thank you.
I am trying to achieve this in R. I haven't been able to build a model that gives me more than about 0.3 accuracy.
I don't think the main difficulty is choosing which model to use so much as how to frame the problem.
As you mention, "WHO" is spending the money seems as relevant as their past transactions in knowing how much they will likely spend.
But you cannot train 10k+ models, one for each customer, either.
Instead, I would suggest clustering your customer base and fitting one model per cluster, using all the time series of the customers in that cluster combined to train the same model.
This would allow each model to learn the spending pattern of that particular group.
For that you can use an LSTM or another RNN model.
Hi, here's my suggestion (I will edit it later to provide more information):
Since it's a sequence problem, you should use RNN-based models such as LSTMs or GRUs.
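A minimal sketch of the padding-plus-shared-model idea from the answers above, in Python/Keras (the toy histories, targets, and layer sizes are invented for illustration): each customer's variable-length series is padded to 15 steps and a Masking layer makes the LSTM skip the padded positions.

    # Sketch: one shared LSTM over padded, variable-length customer series.
    import numpy as np
    from tensorflow.keras.preprocessing.sequence import pad_sequences
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Masking, LSTM, Dense

    max_len = 15
    # Toy stand-ins: one cluster's purchase histories and next-period targets.
    histories = [[100, 100, 100], [100, 100, 1000, 10000], [50, 60]]
    targets = [100.0, 5000.0, 70.0]

    X = pad_sequences(histories, maxlen=max_len, dtype="float32",
                      padding="pre", value=0.0)[..., np.newaxis]
    y = np.array(targets, dtype="float32")

    model = Sequential([
        Masking(mask_value=0.0, input_shape=(max_len, 1)),  # ignore padding
        LSTM(32),
        Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, verbose=0)

Training one model like this per cluster lets customers with only a handful of periods borrow strength from the rest of their group.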

Is there a numerical method for approaching the first derivative at t = 0 s in a real-time application?

I want to improve, step by step as unevenly sampled data arrive, the estimate of the first derivative at t = 0 s. For example, suppose you want to find the initial velocity of a projectile's motion, but you do not know its final position and velocity; instead, you are (slowly) receiving measurements of the projectile's current position and time.
Update - 26 Aug 2018
I would like to give you more details:
"Unevenly-sampled data" means the time intervals are not regular (irregular times between successive measurements). However, data have almost the same sampling frequency, i.e., it is about 15 min. Thus, there are some measurements without changes, because of the nature of the phenomenon (heat transfer). It gives an exponential tendency and I can fit data to a known model, but an important amount of information is required. For practical purposes, I only need to know the value of the very first slope for the whole process.
I tried a progressive Weighted Least Squares (WLS) fitting procedure, with a weight matrix such as
W = diag((0.5).^(1:kk)); % where kk is the last measurement id
But that was using preprocessed data (i.e., after jitter removal, smoothing, and fitting with the theoretical functional). It gave me the following result:
[Plot: a real example of the problem and its "current solution".]
It is good enough for me, but I would like to know whether there is a more principled way of doing this using the raw (or merely smoothed) data.
IMO, additional data are not relevant for improving the estimate at zero, because perturbations come into play and the correlation between the first and last samples decreases.
Also, the asymptotic behavior of the phenomenon is probably not known rigorously (is it truly a first-order linear model?), and this can introduce a bias into the estimate.
I would stick to the first points (say up to t=20) and fit a simple model, say quadratic.
If in fact what you are trying to do is to fit a first order linear model to the data, then least-squares fitting on the raw data is fine. If there are significant outliers, robust fitting is preferable.
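A minimal sketch of that suggestion in Python/NumPy (the data here are invented; only the fitting pattern matters): fit a quadratic to the early points and read the derivative at t = 0 off the linear coefficient.

    # Sketch: quadratic least-squares fit on the early points; the slope at
    # t = 0 of y = a*t^2 + b*t + c is simply b.
    import numpy as np

    t = np.array([0.0, 14.0, 31.0, 44.0, 61.0, 73.0, 92.0])       # minutes
    y = 5.0 * np.exp(-t / 60.0) + 0.05 * np.random.randn(t.size)  # toy decay

    early = t <= 60.0                 # "stick to the first points"
    a, b, c = np.polyfit(t[early], y[early], deg=2)
    print("slope at t = 0:", b)       # dy/dt = 2*a*t + b, so b at t = 0

np.polyfit also accepts a w= argument, so the geometric weighting from the question can be reproduced on the raw data if desired.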

Merging Tree Models from two random forest models into one random forest model with H2O in R

I am relatively new to the machine learning ocean, please excuse me if some of my questions are really basic.
Current situation: the overall goal is to improve some code using the h2o package in R, running on a supercomputer cluster. However, the data are so large that a single node with h2o takes more than a day, so we have decided to use multiple nodes to run the model. I came up with an idea:
(1) Have each node build (nTree/num_node) trees and save them into a model;
(2) Run each node on the cluster for its (nTree/num_node) trees of the forest;
(3) Merge the trees back together to re-form the original forest, and average the measurement results.
I later realized this could be risky, but I cannot find any statement supporting or arguing against it, since I am not a machine-learning-focused programmer.
Questions:
If this way of handling a random forest carries some risk, please point me to a reference so I can get a basic idea of why it is not right.
If this is actually an "OK" way to do it, what should I do to merge the trees? Is there a package or method I can borrow?
If this is actually a solved problem, please point me to the link; I may have searched with the wrong keywords. Thank you!
A concrete example with real numbers:
I have a random forest task with 80k rows and 2k columns, and I want 64 trees. What I have done is put 16 trees on each node, each running on the whole dataset, so each of the four nodes comes up with an RF model. I am now trying to merge the trees from each model into one big RF model and average the measurements from each of those four models.
There is no need to merge the models. Unlike with boosting methods, every tree in a Random Forest is grown independently (just don't set the same seed prior to kicking off RF on each node!).
You are basically doing what Random Forest does on its own, which is to grow X independent trees and then average across the votes. Many packages provide an option to specify the number of cores or threads, in order to take advantage of this feature of RF.
In your case, since you have the same number of trees per node, you'll get 4 "models" back, but those are really just collections of 16 trees. To use it, I'd just keep the 4 models separate and when you want a prediction, average the prediction from each of the 4 models. Assuming you're going to be doing that more than once, you could write a small wrapper function to predict with the 4 models and average the output.
80,000 rows by 2,000 columns is not overly large and should not take that long to train an RF model.
It sounds like something unexpected is happening.
While you can try to average models if you know what you are doing, I don't think it should be necessary in this case.
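To illustrate the point (in Python/scikit-learn rather than the H2O-in-R setup of the thread; the dataset is synthetic), four independently grown 16-tree forests combined by averaging their predictions behave like one 64-tree forest:

    # Sketch: grow 4 independent 16-tree forests (different seeds!) and
    # average their predictions instead of merging trees.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    X, y = make_regression(n_samples=2000, n_features=20, random_state=0)

    forests = [RandomForestRegressor(n_estimators=16, random_state=seed).fit(X, y)
               for seed in range(4)]

    def predict_averaged(models, X_new):
        """The small wrapper suggested above: average the models' outputs."""
        return np.mean([m.predict(X_new) for m in models], axis=0)

    print(predict_averaged(forests, X[:5]))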

State-space model: measurement-driven steps?

I have a time series that seems to be well described by a univariate local level model (a changing bias in human visual perception, sampled at regular intervals). I have a hunch, however, that the underlying random walk is at least partly driven by the measurements themselves. In order to test this hypothesis, I measure two types of time series. The first type is measured at regular steps, every 12 minutes:
u(0), u(12), u(24), u(36), u(48), u(60), u(72), ...
The second type is measured at partly irregular intervals, alternating between every 6 or 18 minutes:
v(0), v(6), v(24), v(30), v(48), v(54), v(72), ...
Of course I could compare the 6- and 18-minute steps in the v-series: if they're no different, then nothing really happens to the random walk between the measurements. The trouble is that the measurement noise is large compared to the random walk steps.
Is there some more principled way I could test this hypothesis? For instance, could I fit a modified local-level model in which the even and odd random-walk steps have different variances, and compare the two variances?
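A sketch of that modified local-level model in Python (NumPy/SciPy, with a toy series standing in for the real data; indexing "even" and "odd" steps by position is a simplification of the 6/18-minute alternation): fit the alternating-variance model and the restricted equal-variance model by maximum likelihood, then compare them with a likelihood-ratio test.

    # Sketch: Kalman-filter likelihood for a local-level model whose
    # random-walk variance alternates between even and odd steps.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import chi2

    def neg_loglik(params, y):
        q_even, q_odd, r = np.exp(params)    # log-parametrised for positivity
        x, P = y[0], 1e6                     # diffuse-ish initialisation
        ll = 0.0
        for t in range(1, len(y)):
            P += q_even if t % 2 == 0 else q_odd   # predict step
            S = P + r                              # innovation variance
            e = y[t] - x                           # innovation
            ll += -0.5 * (np.log(2 * np.pi * S) + e**2 / S)
            K = P / S                              # Kalman gain; update step
            x += K * e
            P *= (1 - K)
        return -ll

    y = np.cumsum(0.1 * np.random.randn(200)) + np.random.randn(200)  # toy data

    full = minimize(neg_loglik, np.log([0.01, 0.01, 1.0]), args=(y,))
    restricted = minimize(lambda p: neg_loglik(np.array([p[0], p[0], p[1]]), y),
                          np.log([0.01, 1.0]))

    lr = 2 * (restricted.fun - full.fun)     # 1 extra parameter in the full model
    print("LR p-value:", chi2.sf(lr, df=1))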
