Performance drop when converting Mask R-CNN to UFF format - TensorRT

My goal is to deploy a Mask R-CNN model trained with the well-known Matterport repo using NVIDIA DeepStream.
To do so, I first have to convert the generated .h5 model into a .uff. This operation is described here.
After the conversion, I ran the generated .uff model with TensorRT and DeepStream, and its performance is very poor compared to the .h5 model (it almost never detects/masks the objects).
Before the conversion, I made the corresponding changes to handle NCHW models and configured the number of classes and the backbone (in this case ResNet-50).
I don't know how to continue. Any advice would really help me. Thanks!

To solve the problem, one must use the same configuration for the training and the conversion.
In particular, since most models start from transfer learning from the pretrained COCO model, one has to use its very same config.
In addition, the input image sizes have to be consistent with the training configuration.
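For illustration, here is a rough sketch of what "same configuration" means in practice, assuming Matterport's mrcnn package; the concrete values below are examples, not the settings of any particular model:

```python
# Sketch: keep the conversion config identical to the training config.
# Assumes Matterport's mrcnn package; the values below are illustrative only.
from mrcnn.config import Config

class TrainingConfig(Config):
    """Config used when the .h5 model was trained (transfer learning from COCO)."""
    NAME = "my_dataset"
    NUM_CLASSES = 1 + 3          # background + your classes
    BACKBONE = "resnet50"        # must match what the .h5 weights were trained with
    IMAGE_MIN_DIM = 800          # input resolution used during training
    IMAGE_MAX_DIM = 1024

# When editing the conversion script's config, reproduce the SAME values,
# otherwise the graph that TensorRT builds no longer matches the weights.
conversion_settings = {
    "num_classes": TrainingConfig.NUM_CLASSES,
    "backbone": TrainingConfig.BACKBONE,
    # NCHW input shape expected after the channel-order changes
    "image_shape": (3, TrainingConfig.IMAGE_MAX_DIM, TrainingConfig.IMAGE_MAX_DIM),
}
print(conversion_settings)
```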

Related

Dealing with class imbalance with mlr3

Lately I have been advised to switch machine learning frameworks to mlr3, but I am finding the transition somewhat more difficult than I expected. In my current project I am dealing with highly imbalanced data which I would like to balance before training my model. I have found this tutorial, which explains how to deal with imbalance via pipelines and a graph learner:
https://mlr3gallery.mlr-org.com/posts/2020-03-30-imbalanced-data/
I am afraid that this approach will also perform class balancing when predicting on new data. Why would I want to do that and reduce my testing sample?
So the two questions that arise are:
Am I correct not to balance classes in testing data?
If so, is there a way of doing this in mlr3?
Of course I could just subset the training data manually and deal with imbalance myself but that's just not fun anymore! :)
Anyway, thanks for any answers,
Cheers!
To answer your questions:
I am afraid that this approach will also perform class balancing when predicting on new data.
This is not correct; where did you get this impression?
Am I correct not to balance classes in testing data?
Class balancing usually works by adding or removing rows (or adjusting weights). None of those steps should be applied during prediction, as we want exactly one predicted value for each row in the data. Weights, on the other hand, usually have no effect during the prediction phase.
Your assumption is correct.
If so, is there a way of doing this in mlr3?
Just use the PipeOp as described in the blog post.
During training it will perform the specified over- or undersampling, while it does nothing during prediction.
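If it helps to see the idea outside mlr3, here is a rough Python analogy using imbalanced-learn (not mlr3's PipeOp, so purely illustrative): the resampling step runs only when the pipeline is fitted, never when it predicts.

```python
# Illustrative analogy with imbalanced-learn (not mlr3): the oversampling
# step only runs during fit(); predict() leaves the test rows untouched,
# so every test row gets exactly one prediction.
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import RandomOverSampler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("balance", RandomOverSampler(random_state=0)),  # applied only in fit()
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)        # training data is rebalanced here
preds = pipe.predict(X_test)      # test data is NOT rebalanced
assert len(preds) == len(y_test)  # one prediction per test row
```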
Cheers,

Extract sample of features used to build each tree in H2O

In a GBM model, the following parameters are used:
col_sample_rate
col_sample_rate_per_tree
col_sample_rate_change_per_level
I understand how the sampling works and how many variables get considered for splitting at each level of every tree. I am trying to understand how many times each feature gets considered for making a decision. Is there a way to easily extract the sample of features used for each splitting decision from the model object?
Referring to the explanation provided by H2O, http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/col_sample_rate.html, is there a way to know the 60 randomly chosen features for each split?
Thank you for your help!
If you want to see which features were used at a given split in a given tree, you can navigate the H2OTree object.
For R see documentation here and here
For Python see documentation here
You can also take a look at this blog post (if the link ever dies, just do a Google search for the H2OTree class).
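As a rough sketch (Python client, assuming a trained GBM estimator named gbm and no early stopping), you could count how often each feature appears as a split column across all trees:

```python
# Sketch: count how often each feature is used for a split across all trees.
# Assumes a trained H2O GBM called `gbm`; with early stopping the actual
# number of trees built may be lower than the ntrees parameter.
from collections import Counter
from h2o.tree import H2OTree

split_counts = Counter()
for tree_idx in range(gbm.ntrees):
    # for multinomial models you would also pass tree_class=<class label>
    tree = H2OTree(model=gbm, tree_number=tree_idx)
    # tree.features holds the split column for every node (None for leaves)
    split_counts.update(f for f in tree.features if f is not None)

print(split_counts.most_common(10))
```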
I don’t know if I would call this easy, but the MOJO tree visualizer spits out a graphviz dot data file which is turned into a visualization. This has the information you are interested in.
http://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/overview-summary.html#viewing-a-mojo

Data mining with unstructured data: how to implement it?

I have unstructured data (screenshots of an app) and semi-structured data (screen dump files), which I chose to store in HBase. My goal is to find defects or issues in the app (meaningful data). Now I'd like to apply data mining to these. Is that a kind of text mining, and how can I apply data mining techniques to this data?
To begin with, you can use a rule-based approach where you define a set of rules that detect the defect scenarios.
Then you can prepare a training data set which has many instances of defect and non-defect scenarios. In this step, for each screenshot or screen dump file you collect, you would manually tag it as defect or non-defect.
Then you can train a classifier using this training data. The classifier will try to generalize from the training samples to predict the output label for samples not seen in the past.
Since your input is non-standard, you might need some preprocessing to convert it to a standard form. For example, to process screenshots you might need image processing, OCR, or computer vision libraries.
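As a rough sketch of that preprocessing idea (assuming pytesseract and Pillow are available; the keyword rule below is just a made-up example, not a recommended rule set):

```python
# Sketch: turn a screenshot into text with OCR, then apply a simple keyword
# rule to flag it as a possible defect. The keyword list is hypothetical;
# real rules would come from your domain knowledge.
from PIL import Image
import pytesseract

DEFECT_KEYWORDS = {"error", "exception", "crash", "failed"}  # example rule only

def flag_possible_defect(screenshot_path: str) -> bool:
    text = pytesseract.image_to_string(Image.open(screenshot_path)).lower()
    return any(keyword in text for keyword in DEFECT_KEYWORDS)

# Labels produced this way (or by manual tagging) can then serve as the
# training targets for a classifier.
print(flag_possible_defect("app_screenshot.png"))
```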

Fastest way to reduce dimensionality for multi-classification in R

What I currently have:
I have a data frame with one column of factors called "Class" which contains 160 different classes. I have 1200 variables, each one being an integer and no individual cell exceeding the value of 1000 (if that helps). About 1/4 of the cells are the number zero. The total dataset contains 60,000 rows. I have already used the nearZeroVar function, and the findCorrelation function to get it down to this number of variables. In my particular dataset some individual variables may appear unimportant by themselves, but are likely to be predictive when combined with two other variables.
What I have tried:
First I tried just creating a random forest model, planning to use the varImp property to filter out the useless variables, but I gave up after letting it run for days. Then I tried using fscaret, but that ran overnight on an 8-core machine with 64 GB of RAM (same as the previous attempt) and didn't finish. Then I tried:
Feature Selection using Genetic Algorithms. That ran overnight and didn't finish either. I also tried to make principal component analysis work, but for some reason couldn't. I have never been able to successfully do PCA within caret, which could be both my problem and my solution here. I can follow all the "toy" demo examples on the web, but I still think I am missing something in my case.
What I need:
I need some way to quickly reduce the dimensionality of my dataset so I can make it usable for creating a model. Maybe a good place to start would be an example of using PCA with a dataset like mine using Caret. Of course, I'm happy to hear any other ideas that might get me out of the quicksand I am in right now.
I have done only some toy examples too.
Still, here are some ideas that do not fit into a comment.
All your attributes seem to be numeric. Maybe running the Naive Bayes algorithm on your dataset will give some reasonable classifications? It assumes all attributes are independent of each other, but experience shows (and many scholars say) that Naive Bayes results are often still useful despite that strong assumption.
If you absolutely MUST do attribute selection, e.g. as part of an assignment:
Did you try to process your dataset with the free GUI-based data-mining tool Weka? There is an "attribute selection" tab where you have several algorithms (or algorithm-combinations) for removing irrelevant attributes at your disposal. That is an art, and the results are not so easy to interpret, though.
Read this pdf as an introduction and see this video for a walk-through and an introduction to the theoretical approach.
The videos assume familiarity with Weka, but maybe it still helps.
There is an RWeka interface but it's a bit laborious to install, so working with the Weka GUI might be easier.
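This is not caret, but if it helps to see the general PCA recipe, here is a rough Python/scikit-learn sketch of the same idea (scale, project, then classify); the synthetic data below just stands in for your real data frame:

```python
# Sketch of the PCA recipe in scikit-learn (not caret): scale the numeric
# columns, keep enough components to explain ~95% of the variance, then
# train any classifier on the reduced data.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_classification

# Synthetic stand-in for a wide data frame like the one described above
X, y = make_classification(n_samples=5000, n_features=1200, n_informative=50,
                           n_classes=10, n_clusters_per_class=1, random_state=0)

model = make_pipeline(
    StandardScaler(),        # PCA is sensitive to scale, so standardize first
    PCA(n_components=0.95),  # keep components explaining 95% of the variance
    GaussianNB(),            # stand-in classifier; any model could follow
)
model.fit(X, y)
print("components kept:", model.named_steps["pca"].n_components_)
```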

OLS in Python with Dummy Variables - Best Solution?

I have a problem I am trying to solve in Python, and I have found multiple solutions (I think) but I am trying to figure out which one is the best. I am hoping to choose libraries that will be supported fully in the future so I do not have to re-write this service.
I want to do an ordinary multivariate least squares regression with both categorical and continuous independent variables. The code has to be written in Python, as it is being integrated into a web service. I have been following pandas quite a bit but have never used it, so this seems to be one approach:
SOLUTION 1. https://github.com/pydata/pandas/blob/master/examples/regressions.py
Obviously, numpy/scipy are ideal, but I can't find an example that uses dummy variables (does anyone have one???). I did find this, though:
SOLUTION 2. http://www.scipy.org/Cookbook/OLS
which I could modify to support dummy variables, but I do not want to do that if someone else has done it already + I want the numbers to be very similar to R, as I have done most of my analysis offline and I can use these results for unit tests.
And in example (2) above, I see that I could technically use rpy/rpy2, although that is not optimal because my web service would require yet another piece of technology (R). The good thing about using the interface is that the numbers would be identical to my results from R.
SOLUTION 3. http://www.scipy.org/Cookbook/OLS (but using Rpy/Rpy2)
Anyway, I am interested in what everyone's approach would be out of these three solutions, whether there are any I am missing, and whether pandas is mature enough to start using in a production web service. The key thing here is that I do not want to have to support/patch bug fixes or write anything from scratch if possible. I'm too busy and probably not smart enough :)
Thanks.
You can use statsmodels, which provides many different models and result statistics.
If you want to use an R-like formula interface, here are some examples, and you can look at the corresponding documentation:
http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/contrasts.html
http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/example_formulas.html
If you want a pure numpy version, then here is an old example that does everything from scratch:
http://statsmodels.sourceforge.net/devel/examples/generated/example_ols.html#ols-with-dummy-variables
The models are integrated with pandas and can use a pandas DataFrame as the data structure for the dependent and independent variables (endog and exog in statsmodels naming convention).
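For example, a minimal sketch of the formula interface handling a categorical (dummy-coded) regressor; the column names here are made up:

```python
# Minimal sketch: statsmodels' formula interface dummy-codes a categorical
# regressor automatically, mirroring R's behaviour. Column names are made up.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "y":     [3.1, 4.0, 5.2, 6.1, 7.3, 8.0],
    "x":     [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "group": ["a", "a", "b", "b", "c", "c"],   # categorical predictor
})

# C(group) tells the formula machinery to dummy-code the factor
model = smf.ols("y ~ x + C(group)", data=df).fit()
print(model.summary())
```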

Resources