GridSearchCV for Multiple Models - pipeline

I would like to run different models using GridSearchCV.
models = {
    "RandomForestRegressor": RandomForestRegressor(),
    "AdaBoostRegressor": AdaBoostRegressor(),
}
params = {
    "RandomForestRegressor": {"n_estimators": [10, 50, 75], "max_depth": [10, 20, 50], "max_features": ["auto", "sqrt", "log2"]},
    "AdaBoostRegressor": {"n_estimators": [50, 100], "learning_rate": [0.01, 0.1, 0.5], "loss": ["linear", "square"]},
}

I hope this is helpful, but perhaps just add a parameter to your create_model function. For example, here is a very basic create_model function that takes the activation function as the argument that GridSearchCV is trying to help you tune.
def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=feats, activation=activation_fn,
                    kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mean_squared_error', 'mae'])
    return model
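To tune that argument, the Keras model has to be wrapped for scikit-learn first. A minimal sketch, assuming the older keras.wrappers.scikit_learn wrapper (newer setups use scikeras.wrappers.KerasRegressor instead) and training data X, y:

from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV

# Arguments of create_model become tunable parameters of the wrapper.
wrapped = KerasRegressor(build_fn=create_model, epochs=10, batch_size=32, verbose=0)
param_grid = {'activation_fn': ['relu', 'tanh', 'sigmoid']}
grid = GridSearchCV(wrapped, param_grid, cv=3)
# grid.fit(X, y)  # X, y assumed to be your training data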
Now what you can do is modify create_model to have a second argument called model_type (or whatever you want to call it).
def create_model(model_type = 'rfr'):
    if model_type == 'rfr':
        ......
    elif model_type == 'xgb':
        .......
    elif model_type == 'neural_network':
        .......
Then, in the params dictionary that you feed into GridSearchCV, give the model_type key the list of models that you want to tune (optimize over). Just make sure that the block of code under each if statement creates your desired model.
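Alternatively, with the models and params dictionaries from the question, you can simply loop and fit one GridSearchCV per model. A minimal sketch, assuming training data X and y are already defined:

from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

results = {}
for name, model in models.items():
    # One grid search per estimator, each with its own parameter grid.
    grid = GridSearchCV(model, params[name], cv=5, scoring="neg_mean_squared_error")
    grid.fit(X, y)
    results[name] = (grid.best_score_, grid.best_params_)

for name, (score, best_params) in results.items():
    print(name, score, best_params)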

How to calculate similarities between test and train documents

I'm trying to calculate similarities between test and train documents and label them. Here is the code, but it doesn't work. I'd also appreciate it if somebody could explain the idea.
def calculate_similarities(self, vecTestDoc, vectorsOfTrainDocs):
    list_of_similarities = []
    for vector in vectorsOfTrainDocs:
        label = vectorsOfTrainDocs.key()
        list_of_similarities += [(self.calculate_similarities(vector, vecTestDoc), label)]
    return list_of_similarities
Here's the error:
File "..\classification.py", line 98, in calculate_similarities
label = vectorsOfTrainDocs.key()
AttributeError: 'list' object has no attribute 'key'
Edit: I've defined two more functions and have been working on a different solution. Here they are:
def cosine_similarity(self, weightedA, weightedB):
    dotAB = dot(weightedA, weightedB)
    normA = math.sqrt(dot(weightedA, weightedA))
    normB = math.sqrt(dot(weightedB, weightedB))
    return dotAB / (normA * normB)

def fit(self, doc_collection):
    self.doc_collection = doc_collection
    self.vectorsOfDoc_collection = [(doc, self.doc_collection.tfidf(doc.token_counts))
                                    for doc in self.doc_collection.docid_to_doc.values()]
I believe something like this would work, but there are still error messages... What should I change?
return [self.doc_collection.cosine_similarity(vecTestDoc) in vectorsOfTrainDocs]
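For what it's worth, a minimal sketch of one possible fix, assuming vectorsOfTrainDocs is a list of pairs like the (doc, vector) tuples that fit builds above (so the first element carries the label), and that cosine_similarity is the intended comparison:

def calculate_similarities(self, vecTestDoc, vectorsOfTrainDocs):
    # Unpack each (label, vector) pair instead of calling .key() on the list,
    # and call cosine_similarity instead of recursing into this method.
    return [(self.cosine_similarity(vector, vecTestDoc), label)
            for label, vector in vectorsOfTrainDocs]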

Gradient flow stopped on a combined model

I have a problem where the gradient cannot backpropagate through a combined network. I checked lots of answers but could not find a relevant solution. I would appreciate it very much if we could solve this.
I wanted to calculate the gradient for the input data in this code:
for i, (input, target, impath) in tqdm(enumerate(data_loader)):
    # print('input.shape:', input.shape)
    input = Variable(input.cuda(), requires_grad=True)
    output = model(input)
    loss = criterion(output, target.cuda())
    loss = Variable(loss, requires_grad=True)
    loss.backward()
    print('input:', input.grad.data)
but I got this error:
    print('input:', input.grad.data)
AttributeError: 'NoneType' object has no attribute 'data'
My model is a combined model: I loaded the parameters from two pretrained models.
I checked requires_grad on the model weights and it is True; however, the gradients of the model weights are None.
Is it loading the state dicts that blocks the gradient?
How can I deal with this problem?
The model structure is attached below:
class resnet_model(nn.Module):
    def __init__(self, opt):
        super(resnet_model, self).__init__()

        resnet = models.resnet101()
        num_ftrs = resnet.fc.in_features
        resnet.fc = nn.Linear(num_ftrs, 1000)
        if opt.resnet_path != None:
            state_dict = torch.load(opt.resnet_path)
            resnet.load_state_dict(state_dict)
            print("resnet load state dict from {}".format(opt.resnet_path))

        self.model1 = torch.nn.Sequential()
        for chd in resnet.named_children():
            if chd[0] != 'fc':
                self.model1.add_module(chd[0], chd[1])

        self.model2 = torch.nn.Sequential()
        self.classifier = LINEAR_LOGSOFTMAX(input_dim=2048, nclass=200)
        if opt.pretrained != None:
            self.classifier_state_dict = torch.load('../checkpoint/{}_cls.pth'.format(opt.pretrained))
            print("classifier load state dict from ../checkpoint/{}_cls.pth".format(opt.pretrained))
            self.classifier.load_state_dict(self.classifier_state_dict)
        for chd in self.classifier.named_children():
            self.model2.add_module(chd[0], chd[1])

    def forward(self, x):
        x = self.model1(x)
        x = x.view(-1, 2048)
        x = self.model2(x)
        return x
The problem was solved by this comment:
Why do you have this line: loss = Variable(loss, requires_grad=True)? Wrapping the loss in a new Variable creates a fresh leaf tensor that is detached from the computation graph, so backward() has nothing upstream to propagate into.
Variable should not be used anymore.
So the line above should be deleted, and to mark a tensor for which you want gradients, you can use:
input = input.cuda().requires_grad_()
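Put together, the corrected loop might look like this (a sketch for current PyTorch, with the same model, criterion, and data_loader as above):

for i, (input, target, impath) in tqdm(enumerate(data_loader)):
    # Mark the input tensor as requiring gradients instead of wrapping it in Variable.
    input = input.cuda().requires_grad_()
    output = model(input)
    loss = criterion(output, target.cuda())
    # Do not re-wrap the loss: that would create a new leaf and detach the graph.
    loss.backward()
    print('input gradient:', input.grad)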

associative arrays in openscad?

Does OpenSCAD have any language primitive for string-keyed associative arrays (a.k.a. hash maps, a.k.a. dictionaries)? Or is there any convention for how to emulate associative arrays?
So far all I can think of is using vectors, and using variables to map indexes into the vector to human-readable names. That means there's no nice, readable way to define the vector; you just have to comment it.
Imagine I want to write something akin to the Python data structure:
bobbin_metrics = {
    'majacraft': {
        'shaft_inner_diameter': 9.0,
        'shaft_outer_diameter': 19.5,
        'close_wheel_diameter': 60.1,
        # ...
    },
    'majacraft_jumbo': {
        'shaft_inner_diameter': 9.0,
        'shaft_outer_diameter': 25.0,
        'close_wheel_diameter': 100.0,
    },
    # ...
}
such that I can reference it in model definitions in some recognisably hash-map-like way, like passing bobbin_metrics['majacraft'] to something as metrics and referencing metrics['close_wheel_diameter'].
So far my best effort looks like
// Vector indexes into bobbin-metrics arrays
BM_SHAFT_INNER_DIAMETER = 0;
BM_SHAFT_OUTER_DIAMETER = 1;
BM_CLOSE_WHEEL_DIAMETER = 2;

bobbin_metrics_majacraft = [
    9.0,   // shaft inner diameter
    19.5,  // shaft outer diameter
    60.1,  // close-side wheel diameter
    // ...
];

bobbin_metrics_majacraft_jumbo = [
    9.0,   // shaft inner diameter
    25.0,  // shaft outer diameter
    100.0, // close-side wheel diameter
    // ...
];

bobbin_metrics = [
    bobbin_metrics_majacraft,
    bobbin_metrics_majacraft_jumbo,
    // ...
];

// Usage when passed a bobbin metrics vector like
// bobbin_metrics_majacraft as 'metrics' to a function:
metrics[BM_SHAFT_INNER_DIAMETER]
I think that'll work. But it's U.G.L.Y. Not quite "I write applications in bash" ugly, but not far off.
Is there a better way?
I'm prepared to maintain the data set outside openscad and have a generator for an include file if I have to, but I'd rather not.
Also, in honour of April 1 I miss the blink tag and wonder if the scrolling marquee will work? Tried 'em :)
I played around with the OpenSCAD search() function, which is documented in the manual here:
https://en.wikibooks.org/wiki/OpenSCAD_User_Manual/Other_Language_Features#Search
The following pattern allows a form of associative list. It may not be optimal, but it does provide a way to set up a dictionary structure and retrieve a value against a string key:
// associative searching
// dp 2019
// - define the dictionary
dict = [
    ["shaft_inner_diameter", 9.0],
    ["shaft_outer_diameter", 19.5],
    ["close_wheel_diameter", 60.1]
];
// specify the search term
term = "close_wheel_diameter";
// execute the search
find = search(term, dict);
// process results
echo("1", find);
echo("2", dict[find[0]]);
echo("3", dict[find[0]][1]);
The above produces:
Compiling design (CSG Tree generation)...
WARNING: search term not found: "l"
...
WARNING: search term not found: "r"
ECHO: "1", [2, 0]
ECHO: "2", ["close_wheel_diameter", 60.1]
ECHO: "3", 60.1
(The warnings appear because search() treats a string search term as a list of individual characters, each of which is looked up separately.)
Personally, I would do this sort of thing in Python and then generate the OpenSCAD as an intermediate file, or maybe use the SolidPython library.
Here is an example of a function that uses search() and does not produce any warnings:
available_specs = [
    ["mgn7c", 1, 2, 3, 4],
    ["mgn7h", 2, 3, 4, 5],
];

function selector(item) = available_specs[search([item], available_specs)[0]];

chosen_spec = selector("mgn7c");
echo("Specification was returned from function", chosen_spec);
The above will produce the following output:
ECHO: "Specification was returned from function", ["mgn7c", 1, 2, 3, 4]
Another very similar approach is to use a list comprehension with a condition, just as you would in Python. It does the same thing and looks a bit simpler. Note that this version returns a vector of all matching specs, so index the result with [0] to get a single spec, as in the usage sketch below.
function selector(item) = [
    for (spec = available_specs)
        if (spec[0] == item)
            spec
];
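For example (a small usage sketch, reusing available_specs from above):

chosen_spec = selector("mgn7c")[0];  // take the first (and only) match
echo("Chosen spec", chosen_spec);    // ECHO: "Chosen spec", ["mgn7c", 1, 2, 3, 4]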

Error when implementing RBF kernel bandwidth differentiation in Pytorch

I'm implementing an RBF network using some beginner examples from the PyTorch website. I have a problem implementing the kernel bandwidth differentiation for the network. I would also like to know whether my attempt to implement the idea is fine. Here is a code sample to reproduce the issue. Thanks.
# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable

def kernel_product(x, y, mode="gaussian", s=1.):
    x_i = x.unsqueeze(1)
    y_j = y.unsqueeze(0)
    xmy = ((x_i - y_j)**2).sum(2)
    if mode == "gaussian":
        K = torch.exp(-xmy / s**2)
    elif mode == "laplace":
        K = torch.exp(-torch.sqrt(xmy + (s**2)))
    elif mode == "energy":
        K = torch.pow(xmy + (s**2), -.25)
    return torch.t(K)
class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """
    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input
dtype = torch.cuda.FloatTensor
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(H, D_in).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

# I've created this scalar variable (the kernel bandwidth)
s = Variable(torch.randn(1).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # To apply our Function, we use Function.apply method. We alias this as 'relu'.
    relu = MyReLU.apply

    # Forward pass: compute predicted y using operations on Variables; we compute
    # ReLU using our custom autograd operation.
    # y_pred = relu(x.mm(w1)).mm(w2)
    y_pred = relu(kernel_product(w1, x, s)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()
However, I get the following error, which disappears when I simply use a fixed scalar in the default input parameter of kernel_product():
RuntimeError: eq() received an invalid combination of arguments - got (str), but expected one of:
 * (float other)
      didn't match because some of the arguments have invalid types: (str)
 * (Variable other)
      didn't match because some of the arguments have invalid types: (str)
Well, you are calling kernel_product(w1, x, s), where w1, x and s are torch Variables, while the definition of the function is kernel_product(x, y, mode="gaussian", s=1.). The third positional argument s therefore lands in the mode parameter, and the comparison mode == "gaussian" ends up comparing a Variable with a string, which raises the eq() error.
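A minimal sketch of the fix: pass the bandwidth by keyword so it is not captured by the mode parameter.

# In the training loop, call kernel_product with s as a keyword argument:
y_pred = relu(kernel_product(w1, x, s=s)).mm(w2)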

How to use S4 object programming in R

What's wrong with my R script? I'm trying to use a vector of user-defined objects (here, a vector of "Page" objects) within another user-defined object (here, a "Book" object).
setClass("Page",
slots = c(PageNo = "numeric", #scalar
Contents = "character") #vector of strings
)
setClass("Book",
slots = c(Pages = "vector", # Something wrong here? vector of pages ? "Page" or vector" or "list"
Title = "character") #vector of strings
)
setGeneric(name="AddPage", def=function(aBook, pageNo){standardGeneric("AddPage")})
setMethod(f="AddPage", signature="Book",
definition=function(aBook, pageNo)
{
page1 = new("Page")
page1#PageNo = pageNo
aBook#Pages = c(aBook#Pages, page1) # Something wrong here?
}
)
book1 = new("Book")
book1#Title = "Sample Book"
book1
book1#Pages
AddPage(book1, 1)
AddPage(book1, 2)
book1#Pages
Remember that R does not use reference semantics, so AddPage(book1, 1) creates a copy of book1, and updates that. In the method you don't return the updated object, and book1 remains unchanged.
Update the method so that it returns the modified object
setMethod(f="AddPage", signature="Book",
definition=function(aBook, pageNo)
{
page1 = new("Page")
page1#PageNo = pageNo
aBook#Pages = c(aBook#Pages, page1) # Something wrong here?
aBook
}
)
and assign the return value to the old variable
book1 = AddPage(book1, 1)
But this is a very inefficient approach -- the line aBook@Pages = c(aBook@Pages, page1) makes a copy of all existing pages (on the right-hand side, to create a longer vector; this will scale with the square of the number of Pages added to the book) and then copies the entire Book (for the assignment). In addition, creating individual objects is expensive and does not exploit R's 'vectorization'. A first step is to think of the object 'Page' as instead 'Pages', where the object models the columns rather than the rows of a data frame. 'Book' then doesn't have a vector of Page objects, but a single Pages object. This also implies a different approach to creating your 'book'; a minimal sketch follows.
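To make the columnar idea concrete, here is a small sketch (an illustration of the suggestion, not the only possible design): Pages stores all page numbers and contents as parallel vectors, and adding pages appends to those vectors.

setClass("Pages",
    slots = c(PageNo = "numeric",      # one element per page
              Contents = "character")  # one element per page
)

setClass("Book",
    slots = c(Pages = "Pages",
              Title = "character")
)

setGeneric("AddPages", function(aBook, pageNos, contents) standardGeneric("AddPages"))

setMethod("AddPages", "Book", function(aBook, pageNos, contents) {
    pages <- aBook@Pages
    pages@PageNo <- c(pages@PageNo, pageNos)
    pages@Contents <- c(pages@Contents, contents)
    aBook@Pages <- pages
    aBook  # return the modified copy; remember to reassign it
})

book1 <- new("Book", Title = "Sample Book",
             Pages = new("Pages", PageNo = numeric(0), Contents = character(0)))
book1 <- AddPages(book1, c(1, 2), c("first page", "second page"))
book1@Pages@PageNo  # 1 2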
