hydra composition for ML models - fb-hydra

I have some configurations for ML components as follows:
ml/encoder.yaml
hidden_layers_sizes: [2000, 1000, 300]
z_dim: 50
ml/decoder.yaml
hidden_layers_sizes: [300, 1000, 2000]
z_dim: 50
Now I have another configuration file, models/vae.yaml, which I want to define as having these encoder and decoder configurations.
So the whole thing is structured as:
- conf
  - ml
    - encoder.yaml
    - decoder.yaml
  - models
    - vae.yaml
How should I define vae.yaml so that the configuration of the encoder and decoder can be passed down to the underlying object (and, if possible, be overridden from the command line)?
I tried something like:
# @package _global_
defaults:
  - override /ml/encoder: encoder
  - override /ml/decoder: decoder
However, this results in the error: Could not override 'ml/encoder'. No match in the defaults list.

I managed to get it working with:
defaults:
  - encoder: vae_encoder
  - decoder: vae_decoder
I also changed the config layout to:
- conf
  - models
    - encoder
      - encoder.yaml
    - decoder
      - decoder.yaml
    - vae.yaml

Related

How to use a config group multiple times, while overriding each instance

Here is my current config structure:
hydra/
  pipeline/
    common/
      feature.yaml
  stage/
    train.yaml
with the following files:
train.yaml
# @package _global_
defaults:
  - _self_
  - ../pipeline/common@train: feature
  - ../pipeline/common@val: feature
train:
  conf:
    split: train
val:
  conf:
    split: val
pipeline:
  - ${oc.dict.values: train.steps}
  - ${oc.dict.values: val.steps}
feature.yaml
conf:
  split: train
steps:
  tabular:
    name: "${conf.split}-tabular"
    class: FeatureGeneration
    dataset:
      datasources: [ "${conf.split}_split" ]
What I've accomplished:
I've been able to figure out how to use the config group multiple times utilizing the defaults in train.yaml.
What I'm stuck on:
I'm getting an error: InterpolationKeyError 'conf.split' not found
I do realize that imports are absolute. If I put # @package common.feature at the beginning of feature.yaml, I can import conf.split via common.feature.conf.split, but is there not a cleaner way? I tried relative imports but got the same error.
I can't seem to override conf.split from train.yaml. You can see where I set train.conf.split and val.conf.split but these do not get propagated. What I need to be able to do is have each instance of the config group utilize a different conf.split value. This is the biggest issue I'm facing.
What I've referenced so far:
The following resources have gotten me to where I am so far, but I am still having trouble with what's listed above.
Hydra : how to assign config files from same group to two different fields
https://hydra.cc/docs/advanced/overriding_packages/
https://hydra.cc/docs/patterns/extending_configs/
Interpolation is not an import; it is evaluated when you access the config node. At that point your config is already composed, so it should be straightforward to use either absolute interpolation (the default) or relative interpolation, based on the structure of your final config.
It is hard to be 100% sure, but I suspect the problem is that your defaults list has _self_ at the beginning. This means that the content of the config containing the defaults list is overridden by whatever comes after it in the defaults list.
Try to move _self_ to the end:
# @package _global_
defaults:
  - ../pipeline/common@train: feature
  - ../pipeline/common@val: feature
  - _self_
#...

Hydra: how to use variable interpolation in packaged configs

I have some config file, model/foo.yaml:
# @package _global_
# foo.yaml
MODEL:
  BACKBONE:
    OUT_FEATURES: [c4, c5]
  HEAD:
    IN_FEATURES: ${MODEL.BACKBONE.OUT_FEATURES}
There are no issues with variable interpolation when I point to this config in the defaults list of another config, e.g. buzz.yaml, except when I also override the package, like so:
# buzz.yaml
defaults:
  - model@foo_head: foo
Attempting to compose buzz.yaml, you will get an error like:
omegaconf.errors.InterpolationKeyError: Interpolation key 'MODEL.BACKBONE.OUT_FEATURES' not found
Can variable interpolation not be used in configs when packaging?
It can. OmegaConf supports relative interpolation:
MODEL:
  BACKBONE:
    OUT_FEATURES: [c4, c5]
  HEAD:
    IN_FEATURES: ${..BACKBONE.OUT_FEATURES}
I strongly recommend that you read the OmegaConf documentation.

Set list of config nodes as value entries in yaml contrasting with structured configs in Hydra

I would like to get a list of configs as a (default) value entry and use a structured schema to validate the input list.
E.g., in trainer.yaml:
defaults:
  - callbacks:
    - checkpointer
    - early_stopping
In callbacks/checkpointer.yaml and callbacks/early_stopping.yaml I have a link to appropriate structured configs as default values, e.g.:
# callbacks/checkpointer.yaml
defaults:
  - /trainer_lib/callbacks/base_checkpointer@_here_
The structured schema:
from dataclasses import dataclass
from typing import Any, List
from hydra.core.config_store import ConfigStore
from omegaconf import MISSING
@dataclass
class CheckpointerConfig:
    _target_: str = "some_library_class"
    data_dir: str = "folder"
@dataclass
class TrainerConfig:
    callbacks: List[Any] = MISSING
and config store:
cs = ConfigStore.instance()
cs.store(group="trainer_lib/callbacks", name="base_checkpointer", node=CheckpointerConfig)
I am not sure what the correct syntax is (what I tried fails). I get omegaconf.errors.ConfigTypeError: Cannot merge DictConfig with ListConfig.
Is there a way to accomplish this? Thanks.
Discussion on this topic in this Hydra issue.
Are you on Hydra 1.0? This is actually supported in Hydra 1.1. Here is the documentation: https://hydra.cc/docs/next/patterns/select_multiple_configs_from_config_group

How to do file over-rides in hydra?

I have a main config file, let's say config.yaml:
num_layers: 4
embedding_size: 512
learning_rate: 0.2
max_steps: 200000
I'd like to be able to override this, on the command-line, with another file, like say big_model.yaml, which I'd use conceptually like:
python my_script.py --override big_model.yaml
and big_model.yaml might look like:
num_layers: 8
embedding_size: 1024
I'd like to be able to override with an arbitrary number of such files, each one taking priority over the last. Let's say I also have fast_learn.yaml:
learning_rate: 2.0
And so I'd then want to conceptually do something like:
python my_script.py --override big_model.yaml --override fast_learn.yaml
What is the easiest/most standard way to do this in hydra? (or potentially in omegaconf perhaps?)
(Ideally these override files would just be standard YAML files that override the earlier YAML files; if I have to use the override DSL instead, I can do that, if it's the easiest/best/most standard way.)
It sounds like package override might be a good solution for you.
The documentation can be found here: https://hydra.cc/docs/next/advanced/overriding_packages
an example application can be found here:
https://github.com/facebookresearch/hydra/tree/master/examples/advanced/package_overrides
Using that example application, you can achieve the override with something like:
$ python simple.py db=postgresql db.pass=helloworld
db:
  driver: postgresql
  user: postgre_user
  pass: helloworld
  timeout: 10
Refer to the basic tutorial and read about config groups.
You can create arbitrary config groups and select one option from each (as of Hydra 1.0, config group options are mutually exclusive). You will need two config groups here:
one can be model, with normal, small and big models, and another can be trainer, with perhaps normal and fast options.
Config groups can also override things in other config groups.
You can also always append to the defaults list from the command line, so you can add config groups that are only used from the command line.
An example of that is an 'experiment' config group. You can use it as:
$ python train.py +experiment=exp1
In such config groups that are overriding things across the entire config you should use the global package (read more about packages in the docs).
# @package _global_
num_layers: 8
embedding_size: 1024
learning_rate: 2.0

How to delete an inherit property from yaml config?

I have a yaml file like this:
local: &local
  image: xxx
  # *tons of config*
ci:
  <<: *local
  image: # delete
  build: .
I want ci to inherit all values from local, except the image.
Is there a way to "delete" this value?
No, there isn't a way to mark a key for deletion in a YAML file. You can only overwrite existing values.
And the latter is what you do: you associate the empty scalar as the value of the key image, as if you had written:
  image: null # delete
There are two things you can do: post-process or make a base mapping in your YAML file.
If you want to post-process, you associate a special unique value to image, or a specially tagged object, and after loading recursively walk over the tree to remove key-value pairs with this special value. Whether you can already do this during parsing, using hooks or overwriting some of its methods, depends on the parser.
Using a base mapping requires less work, but is more intrusive to the YAML file:
localbase: &lb
  # *tons of config*
local: &local
  <<: *lb
  image: xxx
ci:
  <<: *lb
  build: .
If you do the former, note that if you use a parser that preserves the "merge hierarchy" on round-tripping (as my ruamel.yaml parser can), it is not enough to delete the key-value pair: in that case the original value from local would come back. Parsers that simply resolve merges at load time don't have this issue.
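The post-processing option can be sketched in plain Python. The sketch assumes the YAML has already been loaded by a parser that resolves merge keys at load time, so ci is an independent mapping, and it uses a hypothetical marker string __delete__ as the special unique value; env is a stand-in for the "tons of config".

```python
DELETE = "__delete__"  # hypothetical marker, written in the YAML as `image: __delete__`


def strip_marked(node, marker=DELETE):
    """Return a copy of node with every mapping entry whose value is marker removed."""
    if isinstance(node, dict):
        return {k: strip_marked(v, marker) for k, v in node.items() if v != marker}
    if isinstance(node, list):
        return [strip_marked(v, marker) for v in node]
    return node


# What a merge-resolving loader would return for the YAML in the question,
# with `image: __delete__` under ci instead of the empty value.
loaded = {
    "local": {"image": "xxx", "env": "dev"},
    "ci": {"image": DELETE, "env": "dev", "build": "."},
}

cleaned = strip_marked(loaded)
print(cleaned["ci"])  # {'env': 'dev', 'build': '.'}
```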
For properties that accept a list of values, you can set [] as the value.
For example, in docker-compose, if you don't want to inherit ports:
service_1: &service_1
  # some other properties.
  ports:
    - "49281:22"
    - "8876:8000"
  # some other properties
  image: some_image:latest
service_2:
  <<: *service_1
  ports: [] # it removes ports values.
  image: null # it removes image value.

Resources