Is there a way to load a pre-trained model?
I have tried the load("model.joblib") and save("model.joblib", model) functions, but the loaded model only has about 10% accuracy on the validation data and only about 10% of the adversarial examples it generates are successful.
Before saving, the model's accuracy was about 99.3%, and about 87% of the adversarial examples generated were successful.
If I train the loaded model for the same number of epochs as the original, I get the expected accuracy and adversarial example generation rate.
Is there a way to save the model so that it can be loaded without needing to be retrained?
After calling load(filename), you need to call model.get_logits(x) and CrossEntropy again to update the values used for calculating loss and accuracy.
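As a sanity check that joblib itself round-trips a fitted model, here is a minimal scikit-learn sketch (illustrative only, not the asker's graph-based setup, where the loss and logits tensors must be rebuilt after loading as described above):

```python
# Minimal sanity check that joblib round-trips a fitted model intact.
# scikit-learn is used for illustration only.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc_before = model.score(X_te, y_te)

joblib.dump(model, "model.joblib")        # save the fitted estimator
restored = joblib.load("model.joblib")    # load it back

acc_after = restored.score(X_te, y_te)
assert acc_before == acc_after            # identical accuracy after reload
```

If accuracy drops after a reload like the asker describes, the serialized parameters are usually fine and the problem lies in graph state that has to be reconstructed, which is what the answer above addresses.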
I will try to explain my problem as best as I can. I am trying to externally validate a prediction model made by one of my colleagues. For the external validation, I have collected data from a set of new patients.
I want to test the accuracy of the prediction model on this new dataset. Online, I found a way to do so using the coef.orig function to extract the coefficients of the original prediction model (see the picture I added). Here is the problem: it is impossible for me to repeat the steps my colleague took to obtain the original prediction model. He used multiple imputation and bootstrapping for the development and internal validation, making it very complex to repeat his steps. What I do have are the computed intercept and coefficients from the original model, so step 1 from the picture I added can therefore be skipped.
My question is, how can I add these coefficients into the regression model, without the use of the 'coef()' function?
Steps to externally validate the prediction model:
The coefficients I need to use:
I thought the offset function might be of use; however, it does not allow me to set the intercept and all the coefficients for the variables at the same time.
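The question is R-based, but the arithmetic of applying published coefficients is the same in any language. Below is a minimal Python sketch, assuming the colleague's model is a logistic regression; the intercept and coefficient values are hypothetical placeholders, not the asker's real numbers:

```python
# Apply a published logistic model's intercept and coefficients directly,
# without refitting. All numeric values here are hypothetical placeholders.
import numpy as np

intercept = -2.3                           # hypothetical published intercept
coefs = np.array([0.8, -0.5, 1.2])         # hypothetical published coefficients

# New patients' predictor values (rows = patients, cols = variables)
X_new = np.array([
    [1.0, 0.0, 2.0],
    [0.5, 1.5, 0.3],
])

lp = intercept + X_new @ coefs             # linear predictor
p = 1.0 / (1.0 + np.exp(-lp))              # inverse logit -> predicted risk
print(p)
```

In R the same computation can be written with base functions, e.g. `plogis(intercept + as.matrix(X_new) %*% coefs)`, which sidesteps `coef()` entirely: the model is fully determined by its intercept and coefficients, so no refitting is needed.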
I am working on a machine learning project and have set up an ML pipeline for the various stages of the project. The pipeline goes like this:
Data Extraction -> Data Validation -> Preprocessing -> Training -> Model Evaluation
Model evaluation takes place after training is completed, to determine whether a model is approved or rejected.
Now what I want is for model evaluation to take place during training itself, at an arbitrary point.
Say, when about 60% of the training is complete, training is paused and the model is evaluated; if the model is approved, training resumes.
How can the above scenario be implemented?
No, you should only evaluate at testing time. If you try to evaluate during training like this, you won't get an accurate measure of your model's performance. When only 60% of the training is done, the model has not yet been trained on the full dataset; it might give you high accuracy, but the model could be overfitted.
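If you do still want to gate training part-way through, here is a minimal Python sketch; `train_one_epoch` and `evaluate` are hypothetical stand-ins for the pipeline's training and model-evaluation stages:

```python
# Sketch of a training loop with a mid-training evaluation gate.
# `train_one_epoch` and `evaluate` are hypothetical placeholders.

def train_one_epoch(model, epoch):
    # placeholder: accuracy improves a little each epoch
    model["acc"] = min(1.0, model["acc"] + 0.05)

def evaluate(model):
    return model["acc"]                     # placeholder validation accuracy

def train_with_gate(total_epochs=10, gate_at=0.6, threshold=0.5):
    model = {"acc": 0.3}
    gate_epoch = int(total_epochs * gate_at)  # pause at 60% of training
    for epoch in range(total_epochs):
        train_one_epoch(model, epoch)
        if epoch + 1 == gate_epoch:
            score = evaluate(model)         # mid-training evaluation
            if score < threshold:
                return model, "rejected"    # stop training early
    return model, "approved"                # gate passed, training resumed

model, status = train_with_gate()
print(status)
```

In a real pipeline the gate would checkpoint the model, run the evaluation stage on it, and resume from the checkpoint only if the model is approved.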
I have trained a model with keras/tensorflow in R Studio and stored it.
I now reload the model with
my_model <- load_model_hdf5("my_model.h5")
with
summary(my_model)
I get a summary of the model, including number and size of layers.
Yet, I don't see what the activation functions are.
Is there a way to access the activation functions?
Secondly, is there also a way to access the hyperparameters, such as epoch number, batch size, etc., that were used in training this model?
You can load your .h5 file into Netron and inspect the model there; Netron is highly useful.
I built a deep learning model using Keras. The model accuracy is 99%:
$`loss`
[1] 0.03411416
$acc
[1] 0.9952607
When I predict classes on my new data file using the model, only 87% of the classes are correctly classified. My question is: why is there a difference between the model accuracy and the prediction score?
Your 99% is on the training set. It is an indicator of how your algorithm is performing while training, and you should never use it as your reference.
You should always look at your test set: that is the value that really matters.
Furthermore, your accuracy curves should generally look like this: the training set accuracy keeps growing, and the testing set accuracy follows the same trend while staying below the training curve.
You will almost never have two exactly identical sets (training and testing/validation), so it is normal to have a difference.
The objective of the training set is to generalize your data and learn from them.
The objective of the testing set is to see if you generalized well.
If your test accuracy is too far from your training accuracy, either there is a lot of difference between the two sets (mostly distribution, data types, etc.), or, if the sets are similar, your model overfits. Overfitting means your model fits your training data too closely, so even a small difference in the testing data leads to wrong predictions.
The reason a model overfits is often that it is too complicated, and you must simplify it (e.g. reduce the number of layers, reduce the number of neurons, etc.).
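To make the train/test gap concrete, here is a small scikit-learn sketch (illustrative only, not the asker's Keras model): an unconstrained decision tree memorizes noisy training data, and constraining its depth typically narrows the gap.

```python
# Illustration of a train/test accuracy gap caused by overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_informative=5, flip_y=0.2,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree memorizes the training set
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("deep train:", deep.score(X_tr, y_tr))    # perfect on training data
print("deep test:", deep.score(X_te, y_te))     # noticeably lower

# A simpler model reduces memorization
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("shallow train:", shallow.score(X_tr, y_tr))
print("shallow test:", shallow.score(X_te, y_te))
```

The `flip_y=0.2` label noise guarantees that a model scoring 100% on the training set has partly memorized noise, which is exactly the situation where the reported accuracy and the real-world prediction score diverge.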
I am very new to machine learning. I have a question about running predict on data that was used for the training set.
Here are the details: I took a portion of my initial dataset and split that portion into 80% (train) and 20% (test). I trained the model on the 80% training set:
model <- train(name ~ ., data = train.df, method = ...)
and then ran the model on the 20% test data:
predict(model, newdata = test.df, type = "prob")
Now I want to predict using my trained model on the initial dataset, which also includes the training portion. Do I need to exclude the portion that was used for training?
When you report accuracy to a third party about how well your machine learning model works, you always report the accuracy you get on the dataset that was not used in training (and validation).
You can report your accuracy numbers for the overall dataset, but always include the remark that this dataset also includes the partition that was used for training the machine learning algorithm.
This care is taken to make sure your algorithm has not overfitted on your training set: https://en.wikipedia.org/wiki/Overfitting
Julie, I saw your comment below your original post. I would suggest you edit the original post and include your data split to be more complete in your question. It would also help to know what method of regression/classification you're using.
I'm assuming you're trying to assess the accuracy of your model with the 90% of data you left out. Depending on the number of samples you used in your training set you may or may not have the accuracy you'd like. Accuracy will also depend on your approach to the method of regression/classification you used.
To answer your question directly: you don't need to exclude anything from your dataset - the model doesn't change when you call predict().
All you're doing when you call predict is filling in the x-variables in your model with whatever data you supply. Your model was fitted to your training set, so if you supply training set data again it will still create predictions. Note though, for proving accuracy your results will be skewed if you include the set of data that you fit the model to since that's what it learned from to create predictions in the first place - kind of like watching a game, and then watching the same game again and being asked to make predictions about it.
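The question uses R's caret, but the point is language-independent. A small scikit-learn sketch of the same idea: predict works on any data you supply, but accuracy measured on the full dataset is inflated by the rows the model was trained on.

```python
# Accuracy on the full dataset is skewed upward by the training rows.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, flip_y=0.3, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

model = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)

print("held-out 20%:", model.score(X_te, y_te))   # honest estimate
print("full dataset:", model.score(X, y))         # inflated: includes training rows
```

Nothing stops you from calling predict on the full dataset, and the model itself is unchanged by doing so; just don't quote the full-dataset accuracy as your model's performance.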