There are many examples of how to build various RNN architectures in Python with TensorFlow and PyTorch, including the 1-to-many architecture. The question is how this can be done in Julia with FluxML. With Keras in TensorFlow, the return_sequences option of the recurrent layer returns the output at every time step, but judging from the FluxML documentation at https://fluxml.ai/Flux.jl/stable/models/recurrence/ , this does not seem to be implemented.
How should such an architecture be implemented in Flux ML?
When an RNN unit is used in a Chain such as Chain(rnn1,rnn2,rnn3), does Chain pass the output vectors (y) into the inputs of the following RNN unit, or the hidden state (or both)?
How can the RNN or the Flux.RNNCell be used in different contexts and within the context of a Chain or model?
Assume that the goal is for a single input at step 1 to produce 2 y_hat outputs, so that the 1-to-many network is a 1-to-2 recurrent network.
The model input x must then have the same dimension as the y_hat output, so that the output at each step can become the new x input at the following step (the hidden state is not altered directly). A possible model is (notice that the input and output dimensions are equal):
rnn_model = Chain( LSTM(feature_length=>12) , Dense( 12 => feature_length, sigmoid) , softmax )
In the training loop, the gradient and loss are found by summing the loss over the steps, where at each step the unit tries to predict the outcome for that step of the sequence. The key is that for 1-to-many, y_hat has to be passed as the x input of the following step, so that y_hat1 becomes x_2. Here is a small example where x_batch holds the first input from a set of independent samples and y_tensor holds the target data for the 2 steps, indexed along the 3rd dimension:
Flux.reset!( rnn_model )
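loss_tmp, grads = Flux.withgradient( rnn_model ) do model
    loss = 0
    # step 1: the initial input produces the first prediction
    # (use the closure argument `model`, not the global, so the gradient is taken w.r.t. it)
    y_hat1 = model( x_batch )
    loss += Flux.crossentropy( y_hat1 , y_tensor[:,:,1] )
    # step 2: the first prediction is fed back in as the next input
    y_hat2 = model( y_hat1 )
    loss += Flux.crossentropy( y_hat2 , y_tensor[:,:,2] )
    return loss
end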
I am struggling to find the correct API for releasing memory for an object created by the H2O grid. This code was pre-written by someone else and I am currently maintaining it.
#train grid search
gbm_grid1 <- h2o.grid(algorithm = "gbm" #specifies gbm algorithm is used
,grid_id = paste("gbm_grid1",current_date,sep="_") #defines a grid identification
,x = predictors #defines column variables to use as predictors
,y = y #specifies the response variable
,training_frame = train1 #specifies the training frame
#gbm parameters to remain fixed
,nfolds = 5 #use 5 folds for cross-validation (acceptable here in order to reduce training time)
,distribution = "bernoulli" #specify that we are predicting a binary dependent variable
,ntrees = 1000 #specify the number of trees to build (1000 as essentially the maximum number of trees that can be built. Early stopping parameters defined later will make it unlikely our model will reach 1000 trees)
,learn_rate = 0.1 #specify the learn rate used for gradient descent optimization (goal is to use as small a learn rate as possible)
,learn_rate_annealing = 0.995 #specifies that the learn rate is multiplied by a factor of 0.995 after each tree (this can help speed up training for our grid search)
,max_depth = tuned_max_depth
,min_rows = tuned_min_rows
,sample_rate = 0.8 #specify the amount of row observations used when making a split decision
,col_sample_rate = 0.8 #specify the amount of column observations used when making a split decision
,stopping_metric = "logloss" #specify loss function
,stopping_tolerance = 0.001 #specify minimum change required in stopping metric for individual model to continue training
,stopping_rounds = 5 #stop training a model once the stopping metric has failed to improve by the stopping tolerance for this many scoring rounds
#specifies hyperparameters to fluctuate during model building in the grid search
,hyper_params = gbm_hp2
#specifies the search criteria, which include early-stopping metrics to speed up model building
,search_criteria = search_criteria2
#sets a reproducible seed
,seed = 123456
)
h2o.rm(gbm_grid1)
The problem is that I believe this code was written a while ago and has been deprecated since. h2o.rm(gbm_grid1) fails, and RStudio tells me that I require a hex identifier. So I assigned my object an identifier and tried h2o.rm(gbm_grid1, "identifier.hex"), and it tells me I cannot release this type of object.
The issue is I run out of memory if I move onto the next steps of the script. What should I do?
This is what I get with H2O.ls()
Yes, you can remove objects with h2o.rm(). You can use the variable name or key.
h2o.rm(your_object)
h2o.rm("your_key")
You can use h2o.ls() to check what objects are in memory. Also, you can add the argument cascade = TRUE to the rm method to remove sub-models.
I am new to machine learning tools and have installed Keras in R. Having looked at simpler models so far, I now want to use a neural network for a more specialised purpose. Generally, the neural network should be a function Phi: R^d -> R, where the input is d-dimensional.
I have d-dimensional data for n time points, and the neural network should compute a 1-dimensional target value for each time point. I have M such samples, i.e. the input has shape (samples,times,input_dimension)=(M,n,d), and the neural network is applied to each sample separately. The output should have shape (samples,times)=(M,n), so that at each time point the predicted value of the neural network is compared with the desired target, for every sample. For reference, the sizes are roughly d=5, n=1000, M=100.
Based on this, one could simply train a "usual" neural network on M*n samples with d-dimensional input and 1-dimensional target. However, the problem is that the loss function depends on the previous evaluations of the neural network at each time step, i.e. the loss is of the form
l(y_pred,y_target) = sum_{i=1}^n (y^i_pred-y^i_target+f_i(...))^2
where y^i_pred and y^i_target are the predicted and target values of the ith time step, respectively, and f_i is an additional function (that depends on the second derivative of the neural network, but that is another story, and on the previous losses).
So far I have the following code in order to illustrate my problem:
input <- array(data1,dim=c(M,n,d))
target <- array(data2,dim=c(M,n))
myloss <- function(f,y_true,y_pred) {
K <- backend()
return(K$sum((y_pred-y_true+f)^2))
}
library(keras)
NN <- keras_model_sequential()
NN %>%
layer_dense(units=20,activation='relu',input_shape=c(n,d)) %>%
layer_dense(units=20,activation='relu') %>%
layer_dense(units=1,activation='relu')
summary(NN)
NN %>% compile(
loss = function(y_true,y_pred) myloss(f,y_true,y_pred),
optimizer = "adam",
metrics = "acc"
)
history <- NN %>% fit(
input, target,
epochs = 30, batch_size = 20,
validation_split = 0.1
)
I get various error messages (concerning the dimension of the target and the custom loss function), hence my question: is it even possible to fit my problem into a Keras model? Or should I use convolutional neural networks? I also looked at recurrent networks, but in my case only the loss depends on the "previous values". If somebody can give me some advice, I would appreciate the help.
To produce a random forest, the algorithm randomly samples the records and the attributes and builds a decision tree from each sample.
For example if I use the following code:
set.seed(71)
rf <-randomForest(income~.,data=mydata, ntree=200)
I'll have 200 trees.
I can use the parameters mtry = number of variables selected at each split and sampsize = Sample size to be drawn from the data for growing each decision tree.
I would like to have for each of the 200 trees the numbers of lines (records) of mydata dataset that were chosen and the names of variables (attributes) that were chosen. How can I find it?
Depending on your settings in mtry/sampsize you can use the following code:
rf = randomForest(Species~.,data=iris,ntree=200,mtry=2,sampsize=30,keep.forest=TRUE,replace=FALSE,keep.inbag=TRUE)
out_vars = varUsed(rf,by.tree=TRUE) # gives the variables used in each tree
apply(out_vars,2,function(x) which(x!=0))
out_case = rf$inbag # gives the cases used in each tree
apply(out_case,2,function(x) which(x!=0))
Make sure you set keep.inbag=TRUE and replace=FALSE; see ?randomForest for the documentation.
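To see what the apply() calls above return, here is a minimal sketch on a hand-made matrix. The toy inbag matrix is made up for illustration (it is not produced by randomForest); it only mimics the layout of rf$inbag, with one row per case and one column per tree.

```r
# Hypothetical in-bag matrix: 5 cases (rows) x 3 trees (columns).
# A non-zero entry means the case was drawn for that tree.
inbag <- matrix(c(1, 0, 1, 0, 0,    # tree 1 uses cases 1 and 3
                  0, 1, 1, 0, 0,    # tree 2 uses cases 2 and 3
                  0, 0, 0, 1, 1),   # tree 3 uses cases 4 and 5
                nrow = 5)

# Same pattern as in the answer: row indices of the non-zero entries, per column
rows_per_tree <- apply(inbag, 2, function(x) which(x != 0))
rows_per_tree
```

Because every tree draws the same number of cases when sampsize is fixed and replace=FALSE, apply() returns a matrix with one column per tree. If the counts differed between columns, it would return a list instead.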
I'm analyzing persistency using decision trees with 13 independent variables (7 of which are categorical), but I'm getting a tree that uses only one numeric variable.
My code is:
fmla=STATUS~.
tm=rpart(fmla, data=trainData,method = "class")
This happens because the information gain obtained from further splits of the tree is below the default threshold.
rpart provides several ways to control how a tree is built. The following are some controls that you can experiment with:
rpart stops splitting the tree if the gain from a split is less than 0.01 (the default). This threshold on the information gain / RMSE gain / Gini gain is set by the parameter "cp" (complexity parameter) in the rpart controls.
The parms option is used to indicate the splitting function.
rpart.control is used to set the tree parameters.
The most used options of rpart.control are maxdepth (tree depth), minbucket (the minimum number of observations in each leaf node) and minsplit (the minimum number of observations required in a node before it is split further).
Please note: by default, when method="class", the rpart classification tree uses the Gini index as the node split criterion.
Example of customised rpart tree:
rpart(data=trainData,
formula = STATUS ~ .,
method = "class",
parms=list(split="information"),
control=rpart.control(minbucket=50,minsplit=100, cp=0.0000001, maxdepth=30))
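As a rough illustration of the effect of cp, the sketch below fits two trees on the kyphosis data set that ships with rpart (used here only as a stand-in for your data) and counts their leaves. Only cp differs between the two fits; all other controls keep their defaults. The helper n_leaves is a name made up for this example.

```r
library(rpart)

# Default controls (cp = 0.01): splits that improve the fit too little are dropped
fit_default <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     method = "class")

# cp = 0: no complexity threshold, so splits with a small improvement are kept as well
fit_full <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class", control = rpart.control(cp = 0))

# Hypothetical helper: number of terminal nodes (leaves) in a fitted tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

n_leaves(fit_default)
n_leaves(fit_full)
```

If the tree with cp = 0 is no larger than the default one, cp is not what limits the tree, and minsplit, minbucket or maxdepth are the next controls to look at.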
I'm pretty new to R, so I hope you can help me!
I'm trying to do a simulation for my Bachelor's thesis, where I want to simulate how a stock evolves.
I've done the simulation in Excel, but the problem is that I can't make that large of a simulation, as the program crashes! Therefore I'm trying in R.
The stock evolves as follows (everything except $\epsilon$ consists of constants which are known):
$$W_{t+\Delta t} = W_t exp^{r \Delta t}(1+\pi(exp((\sigma \lambda -0.5\sigma^2) \Delta t+\sigma \epsilon_{t+\Delta t} \sqrt{\Delta t}-1))$$
The only thing here which is stochastic is $\epsilon$, which is represented by a Brownian motion with N(0,1).
What I've done in Excel:
Made 100 samples with a size of 40. All these samples are standard normal distributed: N(0,1).
Then these outcomes are used to calculate how the stock is affected by them (the normal distribution represents the shocks from the economy).
My problem in R:
I've used the sample function:
x <- sample(norm(0,1), 1000, T)
So I have 1000 samples, which are normally distributed. Now I don't know how to put these results into the formula I have for the evolution of my stock. Can anyone help?
Using R for (discrete) simulation
There are two aspects to your question: conceptual and coding.
Let's deal with the conceptual first, starting with the meaning of your equation:
1. Conceptual issues
The first thing to note is that your evolution equation is continuous in time, so running your simulation as described above means accepting a discretisation of the problem. Whether or not that is appropriate depends on your model and how you have obtained the evolution equation.
If you do run a discrete simulation, then the key decision you have to make is what step-size $\Delta t$ you will use. You can try different step-sizes to observe their effect, or you can proceed analytically and attempt to derive an appropriate step-size.
Once you have your step-size, your simulation consists of pulling new shocks (samples of your standard normal distribution), and evolving the equation iteratively until the desired time has elapsed. The final state $W_t$ is then available for you to analyse however you wish. (If you retain all of the $W_t$, you have a distribution of the trajectory of the system as well, which you can analyse.)
So:
Your $x$ are a sampled distribution of your shocks, i.e. they are $\epsilon_0$, the shocks at $t=0$.
To simulate the evolution of the $W_t$, you will need some initial condition $W_0$. What this is depends on what you're modelling. If you're modelling the likely values of a single stock starting at an initial price $W_0$, then your initial state is a 1000 element vector with constant value.
Now evaluate your equation, plugging in all your constants, $W_0$, and your initial shocks $\epsilon_0 = x$ to get the distribution of prices $W_1$.
Repeat: sample $x$ again -- this is now $\epsilon_1$. Plugging this in gives you $W_2$, and so on.
2. Coding the simulation (simple example)
One of the useful features of R is that most operators work element-wise over vectors.
So you can pretty much type in your equation more or less as it is.
I've made a few assumptions about the parameters in your equation, and I've ignored the $\pi$ function -- you can add that in later.
So you end up with code that looks something like this:
dt <- 0.5 # step-size
r <- 1 # parameters
lambda <- 1
sigma <- 1 # std deviation
w0 <- rep(1,1000) # presumed initial condition -- prices start at 1
# Show an example iteration -- incorporate into one line for production code...
x <- rnorm(1000,mean=0,sd=1) # random shock
w1 <- w0*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*x*sqrt(dt) -1)) # evolution
When you're ready to let the simulation run, then merge the last two lines, i.e. include the sampling statement in the evolution statement. You then get one line of code which you can run manually or embed into a loop, along with any other analysis you want to run.
# General simulation step
w <- w*exp(r*dt)*(1+exp((sigma*lambda-0.5*sigma^2)*dt +
sigma*rnorm(1000,mean=0,sd=1)*sqrt(dt) -1))
You can also easily visualise the changes and obtain summary statistics (5-number summary):
hist(w)
summary(w)
Of course, you'll still need to work through the details of what you actually want to model and how you want to go about analysing it --- and you've got the $\pi$ function to deal with --- but this should get you started toward using R for discrete simulation.
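Putting the pieces together, a minimal sketch of the full loop might look like this. It reuses the general simulation step above unchanged (so $\pi$ is still left out) and the same illustrative parameter values. The number of steps (40), the seed, and the names n_paths, n_steps, evolve and paths are assumptions made for this example.

```r
set.seed(1)        # fixed seed so the run is reproducible (arbitrary choice)
n_paths <- 1000    # number of simulated price paths
n_steps <- 40      # number of time steps (assumed for illustration)

dt <- 0.5          # step-size
r <- 1             # same illustrative parameter values as above
lambda <- 1
sigma <- 1

# One evolution step, copied from the general simulation step above;
# a fresh standard normal shock is drawn for every path
evolve <- function(w) {
  w * exp(r * dt) * (1 + exp((sigma * lambda - 0.5 * sigma^2) * dt +
                             sigma * rnorm(length(w), mean = 0, sd = 1) * sqrt(dt) - 1))
}

w <- rep(1, n_paths)    # presumed initial condition: prices start at 1

# Keep every intermediate state: one row per time point, one column per path
paths <- matrix(NA_real_, nrow = n_steps + 1, ncol = n_paths)
paths[1, ] <- w
for (i in seq_len(n_steps)) {
  w <- evolve(w)
  paths[i + 1, ] <- w
}

summary(paths[n_steps + 1, ])   # distribution of the final prices
```

Each column of paths is one simulated trajectory, so matplot(paths[, 1:10], type = "l", log = "y") would plot the first ten of them on a log scale.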