'Pipeline' object has no attribute 'feature_importances_' - jupyter-notebook

I have a problem with my code: I want to see the feature importances for the vectors from my word2vec model, but I can't because the model is a pipeline. Could someone help me find a solution, please?
## Import the random forest model and the other pieces the code below needs.
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## This line instantiates the model.
rf = Pipeline([
    ("word2vec vectorizer", MeanEmbeddingVectorizer(w2v)),
    ("Random_forest", RandomForestClassifier(n_estimators=100, max_depth=6, random_state=0))])

## Fit the model on your training data.
rf.fit(X_train, y_train)

## And score it on your testing data.
rf.score(X_test, y_test)

X = model.wv.syn0
X = X.astype(int)

def plot_feat_imp(model, X):
    Feature_Imp = pd.DataFrame(
        [X, rand_w2v_tfidf.feature_importances_]).transpose().sort_values(1, ascending=False)
    plt.figure(figsize=(14, 7))
    sns.barplot(y=Feature_Imp.loc[:, 0], x=Feature_Imp.loc[:, 1], data=Feature_Imp, orient='h')
    plt.title("Variable importance (what best explains satisfaction)", fontsize=21)
    plt.show()
    return
My problem is here:
plot_feat_imp(gbc_w2v, X)
AttributeError: 'Pipeline' object has no attribute 'feature_importances_'

Maybe not the answer you were looking for, but if you want the feature_importances_ of your pipeline object, you first have to get to the final classifier inside it.
This is possible with:
rf_fit = rf.fit(X_train, y_train)
feature_importances = rf_fit._final_estimator.feature_importances_
(If the pipeline were wrapped in a search object such as GridSearchCV, you would go through best_estimator_ first; for a plain Pipeline like rf, go straight to the final step.)
Hope that helps.
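If it helps, here is a minimal sketch of the same idea using the pipeline's named_steps, assuming the pipeline defined in the question (the step name "Random_forest" is taken from that snippet):
# Pull the fitted RandomForestClassifier out of the pipeline by its step name,
# then read its feature_importances_ directly.
rf.fit(X_train, y_train)
forest = rf.named_steps["Random_forest"]   # the fitted final estimator
importances = forest.feature_importances_  # one value per input feature
# With a mean-embedding word2vec vectorizer, each "feature" is an embedding
# dimension rather than a word, so these values rank dimensions, not tokens.
import pandas as pd
imp = pd.Series(importances).sort_values(ascending=False)
print(imp.head(10))
In other words, the importances live on the forest itself; a Pipeline only forwards fit/predict/score and never exposes feature_importances_ of its own, which is exactly why the AttributeError above is raised.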

Related

BERT attribution scores for token probability prediction

I've been trying to find a library or an example for getting token importance when a BERT model predicts a masked span, e.g.:
from transformers import BertTokenizerFast, BertForMaskedLM
import torch
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()
text = 'Brad Pitt is an [MASK] actor.'
tokenized_text = tokenizer.tokenize(text)
masked_index = tokenized_text.index("[MASK]")
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])
# Predict all tokens
with torch.no_grad():
    outputs = model(tokens_tensor)
    predictions = outputs[0]
probs = torch.nn.functional.softmax(predictions[0, masked_index], dim=-1)
You could then pick the highest predicted value, or the top 5 values.
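For example, continuing the snippet above, the top candidates can be read off with torch.topk (this is an illustrative addition, not code from the original post):
# Top 5 predicted tokens for the masked position, with their probabilities.
top5 = torch.topk(probs, 5)
for prob, idx in zip(top5.values.tolist(), top5.indices.tolist()):
    print(tokenizer.convert_ids_to_tokens(idx), round(prob, 4))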
How would I go about calculating, say, vanilla gradients or any other saliency method, to see which tokens were important when predicting the masked token?
I read Ecco's documentation, but it doesn't support attribution for BERT yet; AllenNLP has a demo for the MLM task, but it is limited to that demo; and I couldn't find anything relevant using SHAP or Captum.
Any help pointing me in the right direction would be appreciated.
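For what it's worth, here is a rough sketch of vanilla gradients done by hand with the model above. It assumes that passing inputs_embeds to BertForMaskedLM is acceptable (so gradients can flow back to the token embeddings) and uses the per-token gradient norm as the saliency score; treat it as a starting point rather than a vetted attribution method:
# Build the input embeddings manually so we can ask for their gradients.
embedding_layer = model.bert.embeddings.word_embeddings
inputs_embeds = embedding_layer(tokens_tensor).detach()
inputs_embeds.requires_grad_(True)
outputs = model(inputs_embeds=inputs_embeds)
logits = outputs[0]
# Use the logit of the top predicted token at the masked position as the target.
pred_id = logits[0, masked_index].argmax()
score = logits[0, masked_index, pred_id]
score.backward()
# One saliency value per input token: the L2 norm of the embedding gradient.
saliency = inputs_embeds.grad[0].norm(dim=-1)
for token, value in zip(tokenized_text, saliency.tolist()):
    print(f"{token}\t{value:.4f}")
If you want something more principled than plain gradients, Captum's LayerIntegratedGradients applied to the same embedding layer is probably the closest off-the-shelf option.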

What's the difference between lgb.train() and lightgbm() in R?

I'm trying to build a regression model in R using LightGBM, and I'm getting a bit confused about some functions and when/how to use them.
The first one is what I've written in the title: what's the difference between lgb.train() and lightgbm()?
The description in the documentation (https://cran.r-project.org/web/packages/lightgbm/lightgbm.pdf) says that lgb.train is 'Logic to train with LightGBM' and lightgbm is 'Simple interface for training a LightGBM model', yet both return an lgb.Booster, a trained model.
One difference I've found is that lgb.train() does not work with valids = , while lightgbm() does.
The second one is about the function lgb.cv(), for cross-validation in LightGBM. How do you apply the output of lgb.cv() to a model?
As I understood from the documentation I've linked above, it seems like the output of both lgb.cv and lgb.train is a model.
Is it correct to use it like the example below?
lgbcv <- lgb.cv(params,
                lgbtrain,
                nrounds = 1000,
                nfold = 5,
                early_stopping_rounds = 100,
                learning_rate = 1.0)
lgbcv <- lightgbm(params,
                  lgbtrain,
                  nrounds = 1000,
                  early_stopping_rounds = 100,
                  learning_rate = 1.0)
Thank you in advance!
what's the difference between lgb.train() and lightgbm()?
These functions both train a LightGBM model, they're just slightly different interfaces. The biggest difference is in how training data are prepared. LightGBM training requires a special LightGBM-specific representation of the training data, called a Dataset. To use lgb.train(), you have to construct one of these beforehand with lgb.Dataset(). lightgbm(), on the other hand, can accept a data frame, data.table, or matrix and will create the Dataset object for you.
Choose whichever method you feel has a more friendly interface...both will produce a single trained LightGBM model (class "lgb.Booster").
that lgb.train() does not work with valids = , while lightgbm() does.
This is not correct. Both functions accept the keyword argument valids. Run ?lgb.train and ?lightgbm for documentation on those methods.
How do you apply the output of lgb.cv() to a model?
I'm not sure what you mean, but you can find an example of how to use lgb.cv() in the docs that show up when you run ?lgb.cv.
library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
params <- list(objective = "regression", metric = "l2")
model <- lgb.cv(
    params = params
    , data = dtrain
    , nrounds = 5L
    , nfold = 3L
    , min_data = 1L
    , learning_rate = 1.0
)
This returns an object of class "lgb.CVBooster". That object has multiple "lgb.Booster" objects in it (the trained models that lightgbm() or lgb.train() produce).
You can extract any one of these from model$boosters. However, in practice I don't recommend using the models from lgb.cv() directly. The goal of cross-validation is to get an estimate of the generalization error for a model. So you can use lgb.cv() to figure out the expected error for a given dataset + set of parameters (by looking at model$record_evals and model$best_score).

R xgb.importance shows error - "feature_names has less elements than there are features used in the model"

I'm exploring XGBoost in R.
After training the model, I wanted to see the feature-importance data.
xgb.importance(model = bst)
The above call shows the following error. What might be wrong?
Error in xgb.model.dt.tree(feature_names = feature_names, text = model_text_dump, : feature_names has less elements than there are features used in the model
Note - I checked the following section of the xgboost library code, but still couldn't figure out the actual issue.
# assign feature_names when available
if (!is.null(feature_names)) {
  if (length(feature_names) <= max(as.numeric(td$Feature), na.rm = TRUE))
    stop("feature_names has less elements than there are features used in the model")
  td[isLeaf == FALSE, Feature := feature_names[as.numeric(Feature) + 1]]
}
Ref - https://github.com/dmlc/xgboost/blob/master/R-package/R/xgb.model.dt.tree.R
I see that the nfeatures variable of the trained model is the same as the number of features passed to the model.
Does your model have feature_names?
Perhaps try xgb.importance(feature_names=colnames(bst$feature_names), model = bst). Works for me.

prediction on test set for Gaussian Process Regression in R

The mlegp package explains how to do Gaussian process fitting, but the R code in the mlegp package only demonstrates the use of the predict method to reconstruct the original functional output. Can someone help me understand how to predict on a test set using GPR?
The function predict.gp (which gets called when you use predict on an mlegp object) takes a newData argument, see ?predict.gp:
Usage:
## S3 method for class 'gp'
predict(object, newData = object$X, se.fit = FALSE, ...)
Arguments:
object: an object of class ‘gp’
newData: an optional data frame or matrix with rows corresponding to
inputs for which to predict. If omitted, the design matrix
‘X’ of ‘object’ is used.
...
Consider the simple model
library(mlegp)
x = -5:5
y = sin(x) + rnorm(length(x),sd = 0.1)
fit = mlegp(x, y)
Then
predict(fit)
and
predict(fit, newData = fit$X)
gives the same result. You can then change newData according to your test data.

Kaggle Digit Recognizer Using SVM (e1071): Error in predict.svm(ret, xhold, decision.values = TRUE) : Model is empty

I am trying to solve the Digit Recognizer competition on Kaggle and I ran into this error.
I loaded the training data and scaled its values by dividing them by the maximum pixel value, which is 255. After that, I am trying to build my model.
Here goes my code:
library(e1071)  # svm() and predict.svm() come from the e1071 package
Given_Training_data <- get(load("Given_Training_data.RData"))
Given_Testing_data <- get(load("Given_Testing_data.RData"))
Maximum_Pixel_value = max(Given_Training_data)
Tot_Col_Train_data = ncol(Given_Training_data)
training_data_adjusted <- Given_Training_data[, 2:ncol(Given_Training_data)]/Maximum_Pixel_value
testing_data_adjusted <- Given_Testing_data[, 2:ncol(Given_Testing_data)]/Maximum_Pixel_value
label_training_data <- Given_Training_data$label
final_training_data <- cbind(label_training_data, training_data_adjusted)
smp_size <- floor(0.75 * nrow(final_training_data))
set.seed(100)
training_ind <- sample(seq_len(nrow(final_training_data)), size = smp_size)
training_data1 <- final_training_data[training_ind, ]
train_no_label1 <- as.data.frame(training_data1[,-1])
train_label1 <-as.data.frame(training_data1[,1])
svm_model1 <- svm(train_label1,train_no_label1) #This line is throwing an error
Error in predict.svm(ret, xhold, decision.values = TRUE) : Model is empty!
Please kindly share your thoughts. I am not looking for a complete answer, but rather some idea that guides me in the right direction, as I am in a learning phase.
Thanks.
Update to the question :
trainlabel1 <- train_label1[sapply(train_label1, function(x) !is.factor(x) | length(unique(x))>1 )]
trainnolabel1 <- train_no_label1[sapply(train_no_label1, function(x) !is.factor(x) | length(unique(x))>1 )]
svm_model2 <- svm(trainlabel1,trainnolabel1,scale = F)
It didn't help either.
Read the manual (https://cran.r-project.org/web/packages/e1071/e1071.pdf):
svm(x, y = NULL, scale = TRUE, type = NULL, ...)
...
Arguments:
...
x: a data matrix, a vector, or a sparse matrix (an object of class
   Matrix provided by the Matrix package, of class matrix.csr provided
   by the SparseM package, or of class simple_triplet_matrix provided
   by the slam package).
y: a response vector with one label for each row/component of x.
   Can be either a factor (for classification tasks) or a numeric
   vector (for regression).
Therefore, the main problems are that your call to svm swaps the data matrix and the response vector, and that you are passing the response vector as integers, which results in a regression model. Furthermore, you are also passing the response vector as a single-column data frame, which is not how you are supposed to do it. Hence, if you change the call to:
svm_model1 <- svm(train_no_label1, as.factor(train_label1[, 1]))
it will work as expected. Note that training will take a few minutes to run.
You may also want to remove constant features (columns of the training data matrix whose values are all identical), since these will not influence the classification.
I don't think you need to scale the data manually, since svm will do it itself, unlike most neural network packages.
You can also use the formula interface of svm instead of the matrix and vector form, which is
svm(result ~ ., data = your_training_set)
In your case, I guess you want to make sure the result is used as a factor, because you want a label like 1, 2, 3, not 1.5467, which would be a regression.
I can debug it if you can share the data: Given_Training_data.RData
