Is it possible to use the AllenNLP Semantic Role Labeler with BERT-Large instead of BERT-base?

The BERT-based SRL model that Shi and Lin developed (which currently forms the backend of the AllenNLP SRL model) shows more consistent advantages over Ouchi et al.'s (2018) ensemble model when using BERT-large instead of BERT-base. For example, the Shi and Lin model only achieves better F1 than Ouchi et al. on CoNLL 2005 when using BERT-large.
So, is it possible to use the AllenNLP SRL model with BERT-large rather than BERT-base?

Related

How to use VGG19 in Flux.jl?

I have a specific computer vision problem that I want to try solving using some pre-trained models. The Flux.jl docs don't actually include any pre-trained models, unlike some of the other ML frameworks (PyTorch, for example). How would I access that sort of pre-trained model in Flux?
In the Flux ecosystem, the functionality for things like pre-trained computer vision models has been split out into a separate package called Metalhead.jl: https://github.com/FluxML/Metalhead.jl
Per the docs there, you can create a VGG19 model by doing:
julia> vgg19 = VGG19()
VGG19()
and then you can pass the model to something like the classify function along with an input image for a validation test.
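For example, with older Metalhead releases the end-to-end usage looked roughly like the following sketch (the image path is a placeholder, and newer versions of the package have reworked parts of this API, so check the current README):
julia> using Metalhead, Images
julia> vgg19 = VGG19()
VGG19()
julia> img = load("myimage.jpg")    # any local image file
julia> classify(vgg19, img)         # returns the predicted ImageNet label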

Is it possible to build a random forest with model-based trees, i.e., `mob()` in the partykit package?

I'm trying to build a random forest using model-based regression trees from the partykit package. I have built a model-based tree using the mob() function with a user-defined fit() function, which returns an object at each terminal node.
In partykit there is cforest(), which only uses ctree()-type trees. I want to know whether it is possible to modify cforest(), or to write a new function, so that it builds random forests from model-based trees and returns objects at the terminal nodes. I want to use the objects in the terminal nodes for predictions. Any help is much appreciated. Thank you in advance.
Edit: The tree I have built is similar to the one here -> https://stackoverflow.com/a/37059827/14168775
How do I build a random forest using a tree similar to the one in above answer?
At the moment, there is no canned solution for general model-based forests using mob(), although most of the building blocks are available. However, we are currently reimplementing the backend of mob() so that we can leverage the infrastructure underlying cforest() more easily. Also, mob() is quite a bit slower than ctree(), which is somewhat inconvenient when learning forests.
The best alternative, currently, is to use cforest() with a custom ytrafo. These can also accommodate model-based transformations, very much like the scores in mob(). In fact, in many situations ctree() and mob() yield very similar results when provided with the same score function as the transformation.
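As a rough, hypothetical sketch of that idea (the data and variable names are made up, and the exact signature that ytrafo expects has changed between partykit versions, so check ?ctree for your installation): the response is transformed into the score matrix of a base model, and cforest() then partitions on those scores.
library("partykit")
library("sandwich")

## hypothetical data: outcome y, treatment trt, partitioning variables z1-z3
## the trafo refits a base model on the in-bag observations and returns its
## estimating functions (scores), with zero rows for out-of-bag observations
score_trafo <- function(data, weights) {
  m <- glm(y ~ trt, data = data, subset = weights > 0)
  scores <- matrix(0, nrow = nrow(data), ncol = ncol(estfun(m)))
  scores[weights > 0, ] <- estfun(m)
  scores
}

rf <- cforest(y + trt ~ z1 + z2 + z3, data = mydata,
              ytrafo = score_trafo, ntree = 100)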
A worked example is available in this conference presentation:
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2017).
"Individual Treatment Effect Prediction Using Model-Based Random Forests."
Presented at Workshop "Psychoco 2017 - International Workshop on Psychometric Computing",
WU Wirtschaftsuniversität Wien, Austria.
URL https://eeecon.uibk.ac.at/~zeileis/papers/Psychoco-2017.pdf
The special case of model-based random forests for individual treatment effect prediction was also implemented in a dedicated package model4you that uses the approach from the presentation above and is available from CRAN. See also:
Heidi Seibold, Achim Zeileis, Torsten Hothorn (2019).
"model4you: An R Package for Personalised Treatment Effect Estimation."
Journal of Open Research Software, 7(17), 1-6.
doi:10.5334/jors.219
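As a minimal, hypothetical usage sketch of model4you (data and variable names are placeholders; see the package documentation and the paper above for the real workflow):
library("model4you")

## base model: overall treatment effect, ignoring patient characteristics
bmod <- glm(outcome ~ treatment, data = mydata, family = binomial)

## model-based forest: partitions the base model w.r.t. the remaining covariates
frst <- pmforest(bmod, ntree = 100)

## personalised coefficients per observation, i.e. individual treatment effects
pcoefs <- pmodel(frst)
head(pcoefs)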

Fable functions - theoretical questions

My master's thesis is on health forecasting, and I'm using R (fable, fabletools, fasster) to implement the methods.
For the theoretical part of the thesis, I need to know the heuristics and the theoretical basis of each function I use.
I have been using Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos, and I have already read the R documentation for these functions, but I still have some doubts.
I need information such as which theoretical method each function follows (ARIMA, moving averages, ANN, etc.), the mathematical expressions it uses, and how the best fit is decided (for automatic methods).
I use the following methods and have gathered some information about each one.
I'm new to this field and I need some help.
Is this correct? Can anyone add anything else about any of the functions?
ARIMA() - MSARIMA model (meaning an ARIMA model that is sensitive to seasonality and can take several external regressors into account);
SNAIVE() - linear regression with seasonality;
NNETAR() - ANN model;
fasster()
ETS()
Thank you in advance!
The book you cite contains information on how SNAIVE, NNETAR, ETS, and ARIMA forecasts are calculated. It explains that for model classes such as ETS and ARIMA, the AICc is used to select a particular model. It gives equations for all these methods. Please read it.
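For example, a short fable session on the quarterly aus_production data from tsibbledata (an illustrative sketch only; it assumes the fable, tsibble, tsibbledata, and dplyr packages are installed) lets ARIMA() and ETS() pick their own specifications via the AICc:
library(fable)
library(tsibble)
library(tsibbledata)   # provides the aus_production example data
library(dplyr)

fit <- aus_production %>%
  model(
    arima  = ARIMA(Beer),    # order selected automatically by minimising the AICc
    ets    = ETS(Beer),      # error/trend/season components selected via the AICc
    snaive = SNAIVE(Beer)    # seasonal naive benchmark
  )

glance(fit)                      # compare information criteria across the fitted models
report(fit %>% select(arima))    # print the automatically selected ARIMA specification
fit %>% forecast(h = "2 years")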
fasster() is a new method that is not fully documented yet. The readme file (https://github.com/tidyverts/fasster) provides some information, and there is a talk by the author (https://www.youtube.com/watch?v=6YlboftSalY) explaining the state space modelling framework behind it.

How to create an updatable Core ML model?

I tried to build a pre-trained Core ML model with the help of the Create ML framework, but the model created is not updatable. Is there a way to create a pre-trained Core ML model that can be updated on the device itself (a feature newly introduced in Core ML 3)?
Not directly with Create ML; you'll have to use coremltools to make the model updatable. See here for examples: https://github.com/apple/coremltools/tree/main/examples
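As a rough sketch of what those examples do (the layer names and file paths below are placeholders, and the exact calls can differ between coremltools releases), making an existing neural-network classifier updatable looks something like this:
import coremltools
from coremltools.models.neural_network import NeuralNetworkBuilder, SgdParams

# load the spec of an existing neural-network classifier (path is a placeholder)
spec = coremltools.utils.load_spec("MyClassifier.mlmodel")
builder = NeuralNetworkBuilder(spec=spec)

# mark the final dense layers as trainable on-device (layer names are model-specific)
builder.make_updatable(["dense_1", "dense_2"])

# attach a loss and an optimizer so Core ML 3 can run training passes on-device
builder.set_categorical_cross_entropy_loss(name="loss", input="classProbs")
builder.set_sgd_optimizer(SgdParams(lr=0.01, batch=8))
builder.set_epochs(10)

coremltools.utils.save_spec(builder.spec, "MyUpdatableClassifier.mlmodel")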
However... this will only work for neural networks and k-nearest neighbors models. Create ML does not actually produce these kinds of models (at the moment).
For example, an image classifier trained with Create ML is a GLM on top of a fixed neural network. You cannot make GLM models updatable at this point.
So in short, no, you can't make models trained with Create ML updatable.

Can DOE driver results feed Metamodel component?

I am interested in exploring surrogate-based optimization. I am not yet writing OpenMDAO code; I'm just trying to figure out to what extent OpenMDAO will support this work.
I see that it has a DOE driver to generate training data (http://openmdao.readthedocs.org/en/1.5.0/usr-guide/tutorials/doe-drivers.html), I see that it has several surrogate models that can be added to a meta model (http://openmdao.readthedocs.org/en/1.5.0/usr-guide/examples/krig_sin.html). Yet, I haven't found an example where the results of the DOE are passed as training data to the Meta-model.
In many of the examples/tutorials/forum-posts it seems that the training data is created directly on or within the meta model. So it is not clear how these things work together.
Could the developers explain how training data is passed from a DOE to a meta model? Thanks!
In OpenMDAO 1.x, this kind of process isn't directly supported (yet) via a DOE, but it is definitely possible. There are two paths you can take, which offer different benefits depending on your eventual goal.
I will separate the different scenarios based on a single high level classification:
1) You want to do gradient-based optimization around the whole DOE/meta-model combination. This would be the case if, for example, you wanted to use CFD to predict drag at a few key points, then use a meta-model to generate a drag polar for mission analysis. A great example of this kind of modeling can be found in this paper on simultaneous aircraft-mission design optimization.
2) You don't want to do gradient-based optimization around the whole model. You might want to do gradient-free optimization (like a genetic algorithm). You might want to do gradient-based optimization just around the surrogate itself, with fixed training data. Or you might not want to do optimization at all...
If your use case falls under scenario 1 (or will eventually fall under it), then you want to use a multi-point approach. You create one instance of your model for each training case, and then you can mux the results into an array that you pass into the meta-model. This is necessary so that derivatives can be propagated through the full model. The multi-point approach will work well, and is very parallelizable. Depending on the structure of the model you will use for generating the training data itself, you might also consider a slightly different multi-point approach with a distributed component or a series of distributed components chained together. If your model will support it, the distributed-component approach is the most efficient model structure to use in this case.
If your use case falls into scenario 2, you can still employ the multi-point approach if you like. It will work out of the box. However, you could also consider using a regular DOE to generate the training data. In order to do this, you'll need to use a nested-problem approach, where you put the DOE training-data generation in a sub-problem. This will also work, though it will take a bit of extra coding on your part to get the array of results out of the DOE, because that's not currently implemented.
If you wanted to use the DOE to generate the data and then pass it downstream to a surrogate that would get optimized on, you could use a pair of problem instances. This would not necessarily require that you make nested problems at all. Instead, you just build a run script with one problem instance that uses a DOE; when it's done, you collect the data into an array. Then you could manually assign that to the training inputs of a meta-model in a second problem instance. Something like the following pseudo-code:
prob1 = Problem()
prob1.driver = DOE()   # placeholder for any DOE driver
# set up the DOE variables and model ...
prob1.run()
# placeholder: pull the recorded DOE cases out as an array of training points
training_data = prob1.driver.results

prob2 = Problem()
prob2.driver = Optimizer()   # placeholder for any optimizer
# set up the meta-model and optimization problem ...
prob2['meta_model.train:x'] = training_data
prob2.run()
