h2o autoencoders high error (h2o.mse) - r

I am trying to use h2o to create an autoencoder with its deeplearning function. I am feeding a data set of about 4000x50 to deeplearning (with a single hidden layer, hidden = c(200)), then using h2o.mse to check its reconstruction error, and I am getting about 0.4, a fairly high value.
Is there any way to reduce that error by changing something in the deeplearning function?

I assume everything is left at the defaults, apart from defining a single hidden layer with 200 nodes?
The first things to try are:
Use more epochs (or use less aggressive early stopping criteria)
Use a 2nd hidden layer
Use more nodes in your hidden layer(s)
Get more training data
Note that all of those will increase your training time.
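For example, here is a minimal sketch in R of those adjustments, assuming your 4000x50 data is already loaded as an H2OFrame called train (all parameter values are illustrative, not tuned):
library(h2o)
h2o.init()

# Wider/deeper autoencoder, more epochs, gentler early stopping
model <- h2o.deeplearning(
  x = names(train),
  training_frame = train,
  autoencoder = TRUE,
  activation = "Tanh",
  hidden = c(300, 300),      # a 2nd hidden layer, more nodes
  epochs = 400,              # more epochs than the default
  stopping_rounds = 5,       # less aggressive early stopping
  stopping_metric = "MSE",
  stopping_tolerance = 1e-4
)
h2o.mse(model)  # reconstruction error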

You can use H2OGridSearch to find the autoencoder model with the smallest MSE.
Below is an example in Python; the H2O documentation also has an equivalent example in R.
import h2o
from h2o.grid.grid_search import H2OGridSearch

def tuneAndTrain(hyperParameters, model, trainDataFrame):
    h2o.init()
    # Convert the pandas DataFrame to an H2OFrame
    trainDataHex = h2o.H2OFrame(trainDataFrame)
    # Train one model per combination in the hyperparameter grid
    modelGrid = H2OGridSearch(model, hyper_params=hyperParameters)
    modelGrid.train(x=list(range(len(trainDataFrame.columns))), training_frame=trainDataHex)
    # Sort ascending so the model with the smallest MSE comes first
    gridperf = modelGrid.get_grid(sort_by='mse', decreasing=False)
    bestModel = gridperf.models[0]
    return bestModel
And you can call the above function to find and train the best model:
from h2o.estimators.deeplearning import H2OAutoEncoderEstimator

hiddenOpt = [[50, 50], [100, 100], [5, 5, 5], [50, 50, 50]]
l2Opt = [1e-4, 1e-2]
hyperParameters = {"hidden": hiddenOpt, "l2": l2Opt}
bestModel = tuneAndTrain(
    hyperParameters,
    H2OAutoEncoderEstimator(activation="Tanh", ignore_const_cols=False, epochs=200),
    dataFrameTrainPreprocessed)
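Afterwards you can check how much the tuning helped, for instance (a small usage sketch):
# Reconstruction MSE of the winning model
print(bestModel.mse())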

Related

Modelica/Dymola Run Linearized Model with Initial Values

I am new to Dymola and I want to run a linearized model with initial conditions.
I know how to linearize it; I can get the StateSpace object in the command window or the dslin.mat file.
Now I want to run it with initial conditions. I found them in the dsin.txt file, but I can't bring them together.
Is there an implemented way or do I need to write it on my own?
Best regards,
Axel
You can use the block Modelica.Blocks.Continuous.StateSpace to build a model containing a state-space description. The respective code is:
model StateSpaceModel
  Modelica.Blocks.Continuous.StateSpace sys annotation (Placement(transformation(extent={{-10,-10},{10,10}})));
  Modelica.Blocks.Sources.Step step(startTime=0.5) annotation (Placement(transformation(extent={{-60,-10},{-40,10}})));
equation
  connect(step.y, sys.u[1]) annotation (Line(points={{-39,0},{-12,0}}, color={0,0,127}));
  annotation (uses(Modelica(version="4.0.0")));
end StateSpaceModel;
Additionally, you can use a script (or a Modelica function) that does some of the work for you. More precisely, it
linearizes any suitable model (here the state-space block from the MSL itself, so you can be sure the result is correct)
translates the above model so that its parameters can be set from the command line
sets the parameters of the state-space block called sys, including the initial conditions in x_start
simulates the model with the new parameters
// Get state-space description of a model
ss = Modelica_LinearSystems2.ModelAnalysis.Linearize("Modelica.Blocks.Continuous.StateSpace");
// Translate custom example, set parameters to result of the above linearization, add initial conditions for states and simulate
translateModel("StateSpaceModel")
sys.A = ss.A;
sys.B = ss.B;
sys.C = ss.C; // in case of an error here, check if 'OutputCPUtime == false;'
sys.D = ss.D;
sys.x_start = ones(size(sys.A,1));
simulateModel("StateSpaceModel", resultFile="StateSpaceModel");

How to use RWeka classifiers function attribute "options"?

In RWeka classifiers, there is an argument "options" in the classifier's function call, e.g. Bagging(formula, data, subset, na.action, control = Weka_control(), options = NULL). Could someone please give an example (sample R code) of how to define these options?
I would be interested in passing some options (such as the number of iterations and the size of each bag) to the Bagging meta-learner of RWeka. Thanks in advance!
You can get at the features that you mentioned, but not through options.
First, what does options do? According to the help page ?Bagging
Argument options allows further customization. Currently, options model and instances (or partial matches for these) are used: if set to TRUE, the model frame or the corresponding Weka instances, respectively, are included in the fitted model object, possibly speeding up subsequent computations on the object. By default, neither is included.
So options simply stores more information in the returned result. To get at the features that you want, you need to use control. You construct the value for control using the function Weka_control. Without some help, it is hard to know what to pass there, but luckily help is available through WOW, the Weka Option Wizard. Because there are many options, the output is long; I am going to truncate it to just the part about the features that you mentioned, the number of iterations and the size of each bag. But do look at what else is available.
WOW(Bagging)
-P Size of each bag, as a percentage of the training set size. (default 100)
-I <num>
Number of iterations. (current value 10)
Number of arguments: 1.
Repeating: I have truncated the output to show just these two options.
Example: Iris data
Suppose that I wanted to use bagging with the iris data with the bag size being 90% of the data (instead of the default 100%) and with 20 iterations (instead of the default 10). First, I would build the Weka_control, then include that in my call to Bagging.
WC = Weka_control(P=90, I=20)
BagOfIrises = Bagging(Species ~ ., data=iris, control=WC)
I hope that this helps.

R+Tableau connection: Using linear regression & Relaimpo package; Working in R but not in connection

I am applying a linear regression model to data, and using the relaimpo package to find the most significant factors.
When running the following code in R, it works fine:
library(readxl)
nba <- read_excel("XXXX")
View(nba)
library(relaimpo)
rec = lm(won ~ o_fgm + o_ftm + o_pts , data = nba)
x= calc.relimp(rec, type = c("lmg"), rela = TRUE, rank = TRUE)
x$lmg
I get output of:
o_fgm o_ftm o_pts
0.3374366 0.2628543 0.3997091
When connecting via Tableau I use the following code:
SCRIPT_REAL("
won=.arg1
o_fgm=.arg2
o_ftm=.arg3
o_pts=.arg4
library(relaimpo)
rec = lm(won ~ o_fgm + o_ftm + o_pts)
x= calc.relimp(rec, type = c('lmg'), rela = TRUE, rank = TRUE)
"
,MEDIAN([Won]),MEDIAN([O Fgm]),MEDIAN([O Ftm]),MEDIAN([O Pts]))
I am getting the following error:
An error occurred while communicating with the RServe service.
Error in calc.relimp.default.intern(object = structure(list(won = 39, : Too few complete observations for estimating this model
I have run it with just the regression and it runs fine, so it seems the issue is with the relaimpo package. There is limited documentation online for this package, so I cannot find a fix. Any help is really appreciated, thanks!
Data is from kaggle at https://www.kaggle.com/open-source-sports/mens-professional-basketball
(the "basketball_teams.csv" file)
When Tableau calls R or Python using the SCRIPT_REAL() function, or any of the SCRIPT_XXX() functions, it uses what Tableau calls a table calculation. This has the effect of passing R one or more vectors (and receiving back vector results) instead of calling the function once for each scalar cell.
However, you are responsible for specifying how to partition your aggregate results into vectors, and how to order the rows in the vectors you send to R or Python. You do that by setting the "partitioning" and "addressing" of each table calc via the Edit Table Calculation command (right-click on a calculated field).
So the most likely issue is that you are sending R less data than you expect, perhaps many short vectors instead of the one long one you intend. Read about table calculations, partitioning, and addressing in the online help. You specify partitioning in particular by choosing which dimensions are not set to "compute using" (a synonym for addressing dimensions). The table calc editor gives you some visible feedback as you try different settings; I recommend using specific dimensions in most cases.
For table calcs, the choice of partitioning and addressing is as important as the actual formula.
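As an illustration only, a version of the calculation that returns a result of the right shape might look like the sketch below. SCRIPT_REAL must return a single value or a vector the same length as its inputs, so the scalar lmg share of the first predictor is repeated across the partition (field and variable names are taken from the question, and the addressing must be set so one call sees all the rows):
SCRIPT_REAL("
won <- .arg1; o_fgm <- .arg2; o_ftm <- .arg3; o_pts <- .arg4
library(relaimpo)
rec <- lm(won ~ o_fgm + o_ftm + o_pts)
x <- calc.relimp(rec, type = 'lmg', rela = TRUE, rank = TRUE)
rep(x$lmg[1], length(won))
",
MEDIAN([Won]), MEDIAN([O Fgm]), MEDIAN([O Ftm]), MEDIAN([O Pts]))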

Exclude specific tensors being updated by optimizer in TensorFlow

I have two graphs that I am supposed to train independently, which means I have two different optimizers, but at the same time one of them uses the tensor values of the other graph. As a result, I need to be able to stop specific tensors from being updated while training one of the graphs. I have assigned two different name scopes to my tensors and I am using this code to control which tensors each optimizer updates:
mentor_training_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "mentor")
train_op_mentor = mnist.training(loss_mentor, FLAGS.learning_rate, mentor_training_vars)
mentee_training_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "mentee")
train_op_mentee = mnist.training(loss_mentee, FLAGS.learning_rate, mentee_training_vars)
The var_list value is used as follows in the training method of the mnist object:
def training(loss, learning_rate, var_list):
# Add a scalar summary for the snapshot loss.
tf.summary.scalar('loss', loss)
# Create the gradient descent optimizer with the given learning rate.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# Create a variable to track the global step.
global_step = tf.Variable(0, name='global_step', trainable=False)
# Use the optimizer to apply the gradients that minimize the loss
# (and also increment the global step counter) as a single training step.
train_op = optimizer.minimize(loss, global_step=global_step, var_list=var_list)
return train_op
I'm using the var_list argument of the optimizer's minimize method to control which variables the optimizer updates.
Right now I'm not sure whether I have done this correctly. Also, is there any way to check that an optimizer only updates part of a graph?
I would appreciate it if anyone could help me with this issue.
Thanks!
I have had a similar problem and used the same approach as you, i.e. via the var_list argument of the optimizer. I then checked whether the variables not intended for training stayed the same using:
the_var_np = sess.run(tf.get_default_graph().get_tensor_by_name('the_var:0'))
assert np.equal(the_var_np, pretrained_weights['the_var']).all()
pretrained_weights is a dictionary returned by np.load('some_file.npz') which I used to store the pre-trained weights to disk.
Just in case you need that as well, here is how you can override a tensor with a given value:
value = pretrained_weights['the_var']
variable = tf.get_default_graph().get_tensor_by_name('the_var:0')
sess.run(tf.assign(variable, value))
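If you don't have pre-trained weights saved to disk, a similar sanity check is to snapshot the supposedly frozen variables before and after a training step and compare. This is a sketch: train_op_mentee and the "mentor" scope come from the question, while feed_dict stands for whatever input feeds your training step needs.
import numpy as np

# Variables that should stay frozen while the mentee is trained
frozen_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "mentor")
before = sess.run(frozen_vars)
sess.run(train_op_mentee, feed_dict=feed_dict)  # one training step
after = sess.run(frozen_vars)
# Every frozen variable should be bit-for-bit unchanged
assert all(np.array_equal(b, a) for b, a in zip(before, after))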

how to initialize weights with the neuralnet package?

I am using the neuralnet package in R, but I have a problem when I want to set specific initial weights for my network. I have tried to do it based on the default randomly generated weights, but with no luck at all.
This is the part where I should put the initial weights:
weights <- c(-0.3, 0.2,
             0.2, 0.05,
             0.2, -0.1,
             -0.1, 0.2, 0.2)
net <- neuralnet(to ~ x1 + x2, tdata, hidden = 2, threshold = 0.01,
                 constant.weights = weights)
because I am considering that the weights follow this pattern:
Intercept.to.1layhid1 -5.0556934519949
x1.to.1layhid1 10.9208362719511
x2.to.1layhid1 12.9996270590530
Intercept.to.1layhid2 3.7047601228351
x1.to.1layhid2 -2.5636252939619
x2.to.1layhid2 -2.5759077405754
Intercept.to.to -1.6494794336705
1layhid.1.to.to 1.3502874764968
1layhid.2.to.to 1.6969811621181
but when I apply it I got the error:
Error in constant.weights != 0
Any help?
Thanks
You are looking for the startweights argument to initialize custom weights. This is in the documentation:
help(neuralnet)
startweights:
a vector containing starting values for the weights.
The weights will not be randomly initialized.
The constant.weights argument is used to specify the values of fixed (non-trained) weights, i.e. the ones you would have excluded from training with the exclude argument.
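A minimal sketch of the intended call, assuming the vector follows the weight ordering shown above (intercept/x1/x2 into each hidden unit, then the hidden-to-output weights; to, x1, x2 and tdata are taken from the question):
library(neuralnet)
# 9 starting values for a 2-input, 2-hidden-unit, 1-output network with intercepts
weights <- c(-0.3, 0.2, 0.2,
             0.05, 0.2, -0.1,
             -0.1, 0.2, 0.2)
net <- neuralnet(to ~ x1 + x2, data = tdata, hidden = 2, threshold = 0.01,
                 startweights = weights)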
