After reading up on this, I realise that for my project I will need to use a NAR network, because I am using historical values to predict. I have monthly data for the previous months, and I need to predict the next 12 months.
T = tonndata(Target_Set1,false,false);
% Create a Nonlinear Autoregressive Network
feedbackDelays = 1:2;
hiddenLayerSize = 2;
net = narnet(feedbackDelays,hiddenLayerSize);
% Prepare the Data for Training and Simulation
% The function PREPARETS prepares timeseries data for a particular network,
% shifting time by the minimum amount to fill input states and layer states.
% Using PREPARETS allows you to keep your original time series data unchanged, while
% easily customizing it for networks with differing numbers of delays, with
% open loop or closed loop feedback modes.
[x,xi,ai,t] = preparets(net,{},{},T);
% Setup Division of Data for Training, Validation, Testing
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
% Train the Network
[net,tr] = train(net,x,t,xi,ai);
% Test the Network
y = net(x,xi,ai);
e = gsubtract(t,y);
performance = perform(net,t,y)
% View the Network
view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, plotresponse(t,y)
%figure, ploterrcorr(e)
%figure, plotinerrcorr(x,e)
% Closed Loop Network
% Use this network to do multi-step prediction.
% The function CLOSELOOP replaces the feedback input with a direct
% connection from the output layer.
netc = closeloop(net);
[xc,xic,aic,tc] = preparets(netc,{},{},T);
yc = netc(xc,xic,aic);
perfc = perform(netc,tc,yc)
% Step-Ahead Prediction Network
% For some applications it helps to get the prediction a timestep early.
% The original network returns predicted y(t+1) at the same time it is given y(t+1).
% For some applications such as decision making, it would help to have predicted
% y(t+1) once y(t) is available, but before the actual y(t+1) occurs.
% The network can be made to return its output a timestep early by removing one delay
% so that its minimal tap delay is now 0 instead of 1. The new network returns the
% same outputs as the original network, but outputs are shifted left one timestep.
nets = removedelay(net);
[xs,xis,ais,ts] = preparets(nets,{},{},T);
ys = nets(xs,xis,ais);
stepAheadPerformance = perform(nets,ts,ys)
How do I plot the predicted values after training and validation? From the examples I have seen for NARXNET, they normally use the code below.
plot([cell2mat(T), nan(1,N);
      nan(1,length(T)), cell2mat(yPred);
      nan(1,length(T)), cell2mat(targetSeriesVal)]')
legend('Original Targets','Network Predictions','Expected Outputs');
Thanks in advance.
You need to use the advanced script generation option, which produces code for forecasting future data.
The following code comes from the book Statistics and Data Analysis for Financial Engineering, which describes how to generate simulated data from an ARCH(1) model.
library(TSA)
library(tseries)
n = 10200                 # simulate 10,200 values; the first 10,000 will be discarded
set.seed("7484")
e = rnorm(n)              # i.i.d. N(0,1) innovations
a = e                     # ARCH(1) shocks a[t]
y = e                     # observed AR(1) series driven by the ARCH shocks
sig2 = e^2                # conditional variance sigma^2[t]
omega = 1
alpha = 0.55
phi = 0.8
mu = 0.1
omega/(1-alpha) ; sqrt(omega/(1-alpha))   # stationary variance and s.d. of the shocks
for (t in 2:n){
  a[t] = sqrt(sig2[t])*e[t]
  y[t] = mu + phi*(y[t-1]-mu) + a[t]
  sig2[t+1] = omega + alpha * a[t]^2
}
plot(e[10001:n], type="l", xlab="t", ylab=expression(epsilon), main="(a) white noise")
My question is: why do we need to discard the first 10,000 simulated values?
Bottom Line Up Front
Truncation is needed to deal with sampling bias introduced by the simulation model's initialization when the simulation output is a time series.
Details
Not all simulations require truncation of initial data. If a simulation produces independent observations, then no truncation is needed. The problem arises when the simulation output is a time series. Time series differ from independent data because their observations are serially correlated (also known as autocorrelated). For positive correlations, the result is similar to having inertia: observations which are near neighbors tend to be similar to each other. This characteristic interacts with the reality that computer simulations are programs, and all state variables need to be initialized to something. The initialization is usually to a convenient state, such as "empty and idle" for a queueing service model where nobody is in line and the server is available to immediately help the first customer. As a result, that first customer experiences zero wait time with probability 1, which is certainly not the case for the wait time of some customer k where k > 1.

Here's where serial correlation kicks us in the pants. If the first customer always has a zero wait time, that affects some unknown quantity of subsequent customers' experiences. On average they tend to be below the long-term average wait time, but gravitate more towards that long-term average as k, the customer number, increases. How long this "initialization bias" lingers depends on both how atypical the initialization is relative to the long-term behavior, and the magnitude and duration of the serial correlation structure of the time series.
The average of a set of values yields an unbiased estimate of the population mean only if they belong to the same population, i.e., if E[Xi] = μ, a constant, for all i. In the previous paragraph, we argued that this is not the case for time series with serial correlation that are generated starting from a convenient but atypical state. The solution is to remove some (unknown) quantity of observations from the beginning of the data so that the remaining data all have the same expected value. This issue was first identified by Richard Conway in a RAND Corporation memo in 1961, and published in a refereed journal in 1963 [R.W. Conway, "Some tactical problems in digital simulation", Manag. Sci. 10 (1963) 47–61]. How to determine an optimal truncation amount has been, and remains, an active area of research in the field of simulation. My personal preference is for a technique called MSER, developed by Prof. Pres White (University of Virginia). It treats the end of the data set as the most reliable in terms of unbiasedness, and works its way towards the front using a fairly simple measure to detect when adding observations closer to the front produces a significant deviation. You can find more details in this 2011 Winter Simulation Conference paper if you're interested. Note that the 10,000 you used may be overkill, or it may be insufficient, depending on the magnitude and duration of the serial correlation effects for your particular model.
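To make that concrete, here is a minimal R sketch of the MSER rule (my own illustration, not code from the WSC paper): for each candidate truncation point d it computes the squared deviations of the retained tail about the tail mean, scaled by (n - d)^2, and keeps the d that minimizes this.

mser_truncate <- function(x) {
  n <- length(x)
  d_max <- floor(n / 2)   # common practice: never truncate more than half the run
  mser <- sapply(0:d_max, function(d) {
    tail_x <- x[(d + 1):n]
    sum((tail_x - mean(tail_x))^2) / (n - d)^2
  })
  which.min(mser) - 1     # number of observations to discard from the front
}

# e.g., applied to the simulated series y above:
# d_star <- mser_truncate(y)
# y_trimmed <- y[(d_star + 1):length(y)]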
It turns out that serial correlation causes other problems in addition to initialization bias. It also has a significant effect on the standard error of estimates, as pointed out at the bottom of page 489 of the WSC2011 paper, so people who calculate the i.i.d. estimator s²/n can be off by orders of magnitude on the estimated width of confidence intervals for their simulation output.
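As a quick illustration (my own R sketch, not from the paper), for a strongly autocorrelated AR(1) series the naive i.i.d. standard error is several times smaller than a batch-means estimate that respects the correlation structure:

set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.9), n = 100000))   # positively autocorrelated series
se_naive <- sd(x) / sqrt(length(x))           # i.i.d. formula: far too optimistic here
b <- 100                                      # 100 nonoverlapping batches of 1000 observations
batch_means <- colMeans(matrix(x, ncol = b))
se_batch <- sd(batch_means) / sqrt(b)         # batch-means standard error
c(naive = se_naive, batch = se_batch)         # se_batch comes out roughly 4x se_naive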
Currently I am trying to do some experiments to determine the thermal conductivity of my fluid, which is ethanol.
To do so, I need to use the principle of the TPS (transient plane source) method, which corresponds to the kind of sensor I have.
I would like to plot, in Python, D(τ) as a function of τ, and also ∆T as a function of D(τ).
Basically, I have this formula, which corresponds to D(τ) (as implemented in my script below):

D(τ) = [n(n+1)]⁻² ∫₀^τ dσ σ⁻² Σ_{l=1}^{n} Σ_{k=1}^{n} l·k·exp(−(l²+k²)/(4n²σ²))·I₀(lk/(2n²σ²)),

together with ∆T̄(τ) = P₀/(π^(3/2)·a·K)·D(τ) and τ = √(κt)/a, where n is the number of concentric rings, a is the sensor radius, K is the thermal conductivity, κ is the thermal diffusivity, and I₀ is a modified Bessel function.
The paper that I am reading contains the following information that might help.
"From Eq X (D thau), we can see that the average temperature
increase in the hot disk sensor is proportional to a function
D(τ), which is a rather complicated function of a dimen-
sionless parameter τ = √κt/a, but, numerically, it can be
accurately evaluated to five or six significant figures.
When using the hot disk technique to determine thermal
transport properties, a constant electric current is supplied
to the sensor at time t = 0, then the temperature change of
the sensor is recorded as a function of time. The average
temperature increase across the hot disk sensor area can be
measured by monitoring the total resistance of the hot disk
sensor:
R = R0[1 + α ̄deltaT (t)], (28)
where R is the total electrical resistance at time t, R0 is the
initial resistance at t = 0, α is the temperature coefficient of
resistivity, which is well known for nickel. Eq. (28) allows us
to accurately determine ∆T as a function of time.
If one knows the relationship between t and τ, one can
plot ̄∆T as a function of D(τ), and a straight line should
be obtained. The slope of that line is P0/(π3/2aK), from
which thermal conductivity K can be calculated. However,
the proper value of τ is generally unknown, since τ = √κt/a
and the thermal diffusivity κ is unknown. To calculate the
thermal conductivity correctly, one normally makes a series
of computational plots of ∆T versus D(τ) for a range of κ
values. The correct value of κ will yield a straight line for the
∆T versus D(τ) plot. This optimization process can be done
by the software until an optimized value of κ is found. In
practice, we can measure the density and the specific heat of
the material separately, so that between K and κ, there is only
one independent parameter. Therefore, both thermal conduc-
tivity and thermal diffusivity of the sample can be obtained
from above procedure based on the transient measurement
using a hot disk sensor"
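As a side note, Eq. (28) can be inverted directly: ∆T̄(t) = (R/R₀ - 1)/α. A short sketch (in R, just for illustration; the α value for nickel and the resistance readings below are made-up assumptions):

alpha <- 6e-3                     # temperature coefficient of nickel [1/K], approximate
R0 <- 3.000                       # resistance at t = 0 [ohm] (made-up)
R  <- c(3.005, 3.012, 3.020)      # measured resistance at later times [ohm] (made-up)
deltaT <- (R / R0 - 1) / alpha    # average temperature increase [K]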
So, if I understood correctly, I need to plot ∆T versus D(τ), and the correct value of the characteristic time should give me a straight line. However, when I try to do so, I always obtain a straight line. The part that I'm not sure about is the value of the modified Bessel function. Please find my script below.
import numpy as np
from scipy.integrate import quad
from scipy.special import i0e   # exponentially scaled modified Bessel function I0
import matplotlib.pyplot as plt

# Sensor and measurement parameters
n = 7             # number of concentric rings of the sensor
a = 0.000958      # radius of the biggest ring [m]
P0 = 0.1          # heating power [W]
lam = 0.169       # thermal conductivity of ethanol [W/(m K)] (I'm not sure if this is ok)

tini, tfin = 0.015, 15.0
time = np.linspace(tini, tfin, num=1000)   # measurement times [s]

def integrand(sigma, n):
    # sigma^-2 times the double ring sum. The sum must sit INSIDE the
    # integral and be re-evaluated at every sigma. Rewriting
    # exp(-(l^2+k^2)/(4 n^2 s^2)) * I0(l k / (2 n^2 s^2)) as
    # exp(-(l-k)^2/(4 n^2 s^2)) * i0e(l k / (2 n^2 s^2)) avoids the
    # overflow/underflow of I0 at small sigma.
    som = 0.0
    for l in range(1, n + 1):
        for k in range(1, n + 1):
            som += (l * k
                    * np.exp(-(l - k) ** 2 / (4 * n**2 * sigma**2))
                    * i0e(l * k / (2 * n**2 * sigma**2)))
    return som / sigma**2

def D(tau, n):
    # The line-source model is singular at sigma = 0, so integrate from a
    # small positive cutoff (a pragmatic choice) instead of exactly 0.
    I, _ = quad(integrand, 1e-4, tau, args=(n,))
    return I / (n * (n + 1)) ** 2

# Thermal diffusivities kappa to try (this many values is very slow;
# consider a coarser grid)
kappas = np.linspace(0.00000001, 0.3, 1000)

plt.figure(1)
for kappa in kappas:
    theta = a**2 / kappa            # characteristic time [s]
    tau = np.sqrt(time / theta)     # dimensionless time
    Dlist = [D(x, n) for x in tau]
    Tlist = [P0 / (np.pi**1.5 * a * lam) * d for d in Dlist]
    plt.plot(Dlist, Tlist)
plt.xlabel('D(tau)')
plt.ylabel('Delta T [K]')
plt.show()
I am trying to run the calculation from time 0.015 seconds until 15 seconds, with 1000 points in total: 0.015, 0.030, 0.045, and so on.
And for my κ I am going from values of 0.00000001 until 0.3, with 1000 points in total.
The paper that I am looking at is called:
"Rapid thermal conductivity measurement with
a hot disk sensor. Part 1. Theoretical considerations"
I hope you could help with this one.
Thank you
I am struggling to find the correct API for releasing memory for an object created by the H2O grid. This code was pre-written by someone else and I am currently maintaining it.
#train grid search
gbm_grid1 <- h2o.grid(algorithm = "gbm" #specifies gbm algorithm is used
,grid_id = paste("gbm_grid1",current_date,sep="_") #defines a grid identification
,x = predictors #defines column variables to use as predictors
,y = y #specifies the response variable
,training_frame = train1 #specifies the training frame
#gbm parameters to remain fixed
,nfolds = 5 #specify 5 folds for cross-validation (this is acceptable here in order to reduce training time)
,distribution = "bernoulli" #specify that we are predicting a binary dependent variable
,ntrees = 1000 #specify the number of trees to build (1000 as essentially the maximum number of trees that can be built. Early stopping parameters defined later will make it unlikely our model will reach 1000 trees)
,learn_rate = 0.1 #specify the learn rate used for gradient descent optimization (goal is to use as small a learn rate as possible)
,learn_rate_annealing = 0.995 #specifies that the learn rate will perpetually decrease by a factor of 0.995 (this can help speed up training for our grid search)
,max_depth = tuned_max_depth
,min_rows = tuned_min_rows
,sample_rate = 0.8 #specify the fraction of rows sampled for each tree
,col_sample_rate = 0.8 #specify the fraction of columns sampled for each split
,stopping_metric = "logloss" #specify loss function
,stopping_tolerance = 0.001 #specify minimum change required in stopping metric for individual model to continue training
,stopping_rounds = 5 #stop training an individual model if the stopping metric does not improve by the tolerance for 5 consecutive scoring rounds
#specifies hyperparameters to fluctuate during model building in the grid search
,hyper_params = gbm_hp2
#specifies the search criteria, including early-stopping metrics to speed up model building
,search_criteria = search_criteria2
#sets a reproducible seed
,seed = 123456
)
h2o.rm(gbm_grid1)
The problem is, I believe this code was written a while ago and has been deprecated since. h2o.rm(gbm_grid1) fails, and RStudio tells me that I require a hex identifier. So I assigned my object an identifier and tried h2o.rm(gbm_grid1, "identifier.hex"), and it tells me I cannot release this type of object.
The issue is I run out of memory if I move onto the next steps of the script. What should I do?
This is what I get with h2o.ls():
Yes, you can remove objects with h2o.rm(). You can use the variable name or the key:
h2o.rm(your_object)
h2o.rm('your_key')
You can use h2o.ls() to check which objects are in memory. Also, you can add the argument cascade = TRUE to the h2o.rm() call to remove the grid's sub-models as well.
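For example (a sketch assuming a reasonably recent h2o release, where h2o.rm() accepts either an object or a key and has a cascade argument; gbm_grid1 is the grid object from your code):

h2o.ls()                                    # list every key currently held by the cluster
h2o.rm(gbm_grid1@grid_id, cascade = TRUE)   # remove the grid and its sub-models by key
# or, equivalently, pass the R object itself:
# h2o.rm(gbm_grid1, cascade = TRUE)
h2o.ls()                                    # confirm the keys are gone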
Some questions came to me as I read the paper 'Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift'.
In the paper, it says:
Since m examples from training data can estimate mean and variance of
all training data, we use mini-batch to train batch normalization
parameters.
My question is :
Are they choosing m examples and then fitting the batch norm parameters concurrently, or choosing a different set of m examples for each input dimension?
E.g. the training set is composed of x(i) = (x1, x2, ..., xn), which is n-dimensional.
For a fixed batch M = {x(1), x(2), ..., x(m)}, fit all of gamma1~gamman and beta1~betan,
vs.
for each gamma_i, beta_i, picking a different batch M_i = {x(1)_i, ..., x(m)_i}.
I haven't found this question on Cross Validated or Data Science, so I can only answer it here. Feel free to migrate if necessary.
The mean and variance are computed for all dimensions of each mini-batch at once, and running estimates are maintained via moving averages. Here's how it looks in TF code:
from tensorflow.python.training import moving_averages

# `incoming` is the mini-batch tensor and `axis` the batch axis;
# `moving_mean`, `moving_variance` and `decay` are created elsewhere in the layer.
mean, variance = tf.nn.moments(incoming, axis)
update_moving_mean = moving_averages.assign_moving_average(moving_mean, mean, decay)
update_moving_variance = moving_averages.assign_moving_average(moving_variance, variance, decay)
with tf.control_dependencies([update_moving_mean, update_moving_variance]):
    return tf.identity(mean), tf.identity(variance)
You shouldn't worry about the technical details; here's what's going on:
First, the mean and variance of the whole batch incoming are computed, along the batch axis. Both of them are vectors (more precisely, tensors), with one entry per dimension.
Then the current values of moving_mean and moving_variance are updated by an assign_moving_average call, which basically computes: variable * decay + value * (1 - decay).
Every time batchnorm gets executed, it therefore knows the current batch and some statistics of the previous batches.
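So, to answer the original question: it is the first option. One shared mini-batch provides the statistics for every dimension at once, and all the gamma_i and beta_i are trained concurrently from it. A minimal sketch (in R, purely for illustration; the eps, gamma, and beta values are placeholder assumptions) of what one batch-norm forward pass computes:

batch <- matrix(rnorm(4 * 3), nrow = 4)   # one mini-batch: m = 4 examples, n = 3 dimensions
mu     <- colMeans(batch)                 # per-dimension means, all from the SAME m examples
sigma2 <- apply(batch, 2, var)            # per-dimension variances (the paper uses the biased version)
eps    <- 1e-5                            # numerical-stability constant
gamma  <- rep(1, 3); beta <- rep(0, 3)    # one learned gamma_i, beta_i per dimension
x_hat <- sweep(sweep(batch, 2, mu), 2, sqrt(sigma2 + eps), "/")   # normalize per dimension
y     <- sweep(sweep(x_hat, 2, gamma, "*"), 2, beta, "+")         # scale and shift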
I'm pretty new to R, so I hope you can help me!
I'm trying to do a simulation for my Bachelor's thesis, where I want to simulate how a stock evolves.
I've done the simulation in Excel, but the problem is that I can't make that large of a simulation, as the program crashes! Therefore I'm trying in R.
The stock evolves as follows (everything except $\epsilon$ consists of constants which are known):
$$W_{t+\Delta t} = W_t e^{r \Delta t}\left(1+\pi\left(\exp\left((\sigma \lambda -0.5\sigma^2) \Delta t+\sigma \epsilon_{t+\Delta t} \sqrt{\Delta t}-1\right)\right)\right)$$
The only thing here which is stochastic is $\epsilon$, which is standard normally distributed, N(0,1) (the Brownian motion increments).
What I've done in Excel:
Made 100 samples with a size of 40. All these samples are standard normally distributed: N(0,1).
Then these outcomes are used to calculate how the stock is affected from these (the normal distribution represent the shocks from the economy).
My problem in R:
I've used the rnorm function:
x <- rnorm(1000, mean = 0, sd = 1)
So I have 1000 samples, which are normally distributed. Now I don't know how to put these results into the formula I have for the evolution of my stock. Can anyone help?
Using R for (discrete) simulation
There are two aspects to your question: conceptual and coding.
Let's deal with the conceptual first, starting with the meaning of your equation:
1. Conceptual issues
The first thing to note is that your evolution equation is continuous in time, so running your simulation as described above means accepting a discretisation of the problem. Whether or not that is appropriate depends on your model and how you have obtained the evolution equation.
If you do run a discrete simulation, then the key decision you have to make is what stepsize $\Delta t$ you will use. You can explore different step-sizes to observe the effect of step-size, or you can proceed analytically and attempt to derive an appropriate step-size.
Once you have your step-size, your simulation consists of pulling new shocks (samples of your standard normal distribution), and evolving the equation iteratively until the desired time has elapsed. The final state $W_t$ is then available for you to analyse however you wish. (If you retain all of the $W_t$, you have a distribution of the trajectory of the system as well, which you can analyse.)
So:
your $x$ are a sampled distribution of your shocks, i.e. they are $\epsilon_{t=0}$.
To simulate the evolution of the $W_t$, you will need some initial condition $W_0$. What this is depends on what you're modelling. If you're modelling the likely values of a single stock starting at an initial price $W_0$, then your initial state is a 1000 element vector with constant value.
Now evaluate your equation, plugging in all your constants, $W_0$, and your initial shocks $\epsilon_0 = x$ to get the distribution of prices $W_1$.
Repeat: sample $x$ again -- this is now $\epsilon_1$. Plugging this in, gives you $W_2$ etc.
2. Coding the simulation (simple example)
One of the useful features of R is that most operators work element-wise over vectors, so you can type in your equation more or less as it is.
I've made a few assumptions about the parameters in your equation, and I've ignored the $\pi$ function -- you can add that in later.
So you end up with code that looks something like this:
dt <- 0.5 # step-size
r <- 1 # parameters
lambda <- 1
sigma <- 1 # std deviation
w0 <- rep(1,1000) # presumed initial condition -- prices start at 1
# Show an example iteration -- incorporate into one line for production code...
x <- rnorm(1000,mean=0,sd=1) # random shock
w1 <- w0*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*x*sqrt(dt) -1)) # evolution
When you're ready to let the simulation run, then merge the last two lines, i.e. include the sampling statement in the evolution statement. You then get one line of code which you can run manually or embed into a loop, along with any other analysis you want to run.
# General simulation step
w <- w*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*rnorm(1000,mean=0,sd=1)*sqrt(dt) -1))
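And here is a minimal sketch of a complete run (my own illustration; the number of steps and the trajectory matrix are assumptions you would adapt to your model):

n_steps <- 40                          # e.g. 40 steps, matching the Excel sample size
W <- matrix(NA, nrow = n_steps + 1, ncol = 1000)
W[1, ] <- w0                           # initial prices from above
for (i in 1:n_steps) {
  W[i + 1, ] <- W[i, ] * exp(r * dt) *
    (1 + exp((sigma * lambda - 0.5 * sigma^2) * dt +
             sigma * rnorm(1000, mean = 0, sd = 1) * sqrt(dt) - 1))
}
w <- W[n_steps + 1, ]                                       # final distribution of prices
matplot(W[, 1:20], type = "l", xlab = "step", ylab = "W")   # a few sample trajectories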
You can also easily visualise the changes and obtain summary statistics:
hist(w)
summary(w)
Of course, you'll still need to work through the details of what you actually want to model and how you want to go about analysing it --- and you've got the $\pi$ function to deal with --- but this should get you started toward using R for discrete simulation.