Set a list of config nodes as value entries in YAML alongside structured configs in Hydra - fb-hydra

I would like to get a list of configs as a (default) value entry
and use a structured schema to validate the input list.
E.g., in trainer.yaml:
defaults:
  - callbacks:
    - checkpointer
    - early_stopping
In callbacks/checkpointer.yaml and callbacks/early_stopping.yaml, the defaults list links to the appropriate structured config, e.g.:
# callbacks/checkpointer.yaml
defaults:
  - /trainer_lib/callbacks/base_checkpointer@_here_
The structured schema:
from dataclasses import dataclass
from typing import Any, List
from hydra.core.config_store import ConfigStore
from omegaconf import MISSING
@dataclass
class CheckpointerConfig:
    _target_: str = "some_library_class"
    data_dir: str = "folder"
@dataclass
class TrainerConfig:
    callbacks: List[Any] = MISSING
and the config store:
cs = ConfigStore.instance()
cs.store(group="trainer_lib/callbacks", name="base_checkpointer", node=CheckpointerConfig)
I am not sure what the correct syntax is to accomplish this (what I tried fails). I get omegaconf.errors.ConfigTypeError: Cannot merge DictConfig with ListConfig.
Is there a way to accomplish this? Thanks.

There is a discussion of this topic in this Hydra issue.

Are you on Hydra 1.0? This is actually supported in Hydra 1.1. Here is the documentation: https://hydra.cc/docs/next/patterns/select_multiple_configs_from_config_group

Related

How to use a config group multiple times, while overriding each instance

Here is my current config structure:
hydra/
  pipeline/
    common/
      feature.yaml
  stage/
    train.yaml
with the following files:
train.yaml
# @package _global_
defaults:
  - _self_
  - ../pipeline/common@train: feature
  - ../pipeline/common@val: feature
train:
  conf:
    split: train
val:
  conf:
    split: val
pipeline:
  - ${oc.dict.values: train.steps}
  - ${oc.dict.values: val.steps}
feature.yaml
conf:
  split: train
steps:
  tabular:
    name: "${conf.split}-tabular"
    class: FeatureGeneration
    dataset:
      datasources: [ "${conf.split}_split" ]
What I've accomplished:
I've been able to figure out how to use the config group multiple times utilizing the defaults in train.yaml.
What I'm stuck on:
I'm getting an error: InterpolationKeyError 'conf.split' not found
I do realize that imports are absolute. If I put # @package common.feature at the beginning of feature.yaml, I can import conf.split via common.feature.conf.split, but is there not a cleaner way? I tried relative imports but got the same error.
I can't seem to override conf.split from train.yaml. You can see where I set train.conf.split and val.conf.split but these do not get propagated. What I need to be able to do is have each instance of the config group utilize a different conf.split value. This is the biggest issue I'm facing.
What I've referenced so far:
The following resources have gotten me to where I am so far, but I am still having trouble with what's listed above.
Hydra : how to assign config files from same group to two different fields
https://hydra.cc/docs/advanced/overriding_packages/
https://hydra.cc/docs/patterns/extending_configs/
Interpolation is not an import; it is evaluated when you access the config node. At that point your config is already composed, so it should be straightforward to use either absolute interpolation (the default) or relative interpolation, based on the structure of your final config.
It's hard to be 100% sure, but I suspect this problem is caused by your defaults list having _self_ at the beginning. This means that the content of the config containing the defaults list is overridden by whatever comes after it in the defaults list.
Try moving _self_ to the end:
# @package _global_
defaults:
  - ../pipeline/common@train: feature
  - ../pipeline/common@val: feature
  - _self_
# ...

Hydra: how to use variable interpolation in packaged configs

I have a config file, model/foo.yaml:
# @package _global_
# foo.yaml
MODEL:
  BACKBONE:
    OUT_FEATURES: [c4, c5]
  HEAD:
    IN_FEATURES: ${MODEL.BACKBONE.OUT_FEATURES}
There are no issues with variable interpolation when I point to this config in the defaults list of another config, e.g. buzz.yaml, except when I also override the package, like so:
# buzz.yaml
defaults:
  - model@foo_head: foo
Attempting to compose buzz.yaml, you will get an error like:
omegaconf.errors.InterpolationKeyError: Interpolation key 'MODEL.BACKBONE.OUT_FEATURES' not found
Can variable interpolation not be used in configs when packaging?
It can be used. OmegaConf supports relative interpolation:
MODEL:
  BACKBONE:
    OUT_FEATURES: [c4, c5]
  HEAD:
    IN_FEATURES: ${..BACKBONE.OUT_FEATURES}
I strongly recommend that you read the docs of OmegaConf.

How to do file over-rides in hydra?

I have a main config file, let's say config.yaml:
num_layers: 4
embedding_size: 512
learning_rate: 0.2
max_steps: 200000
I'd like to be able to override this, on the command-line, with another file, like say big_model.yaml, which I'd use conceptually like:
python my_script.py --override big_model.yaml
and big_model.yaml might look like:
num_layers: 8
embedding_size: 1024
I'd like to be able to override with an arbitrary number of such files, each one taking priority over the last. Let's say I also have fast_learn.yaml
learning_rate: 2.0
And so I'd then want to conceptually do something like:
python my_script.py --override big_model.yaml --override fast_learn.yaml
What is the easiest/most standard way to do this in hydra? (or potentially in omegaconf perhaps?)
(note that I'd like these override files to ideally just be standard yaml files, that override the earlier yaml files, ideally; though if I have to write using override DSL instead, I can do that, if that's the easiest/best/most standard way)
It sounds like package override might be a good solution for you.
The documentation can be found here: https://hydra.cc/docs/next/advanced/overriding_packages
an example application can be found here:
https://github.com/facebookresearch/hydra/tree/master/examples/advanced/package_overrides
Using the example application, you can achieve the override with something like:
$ python simple.py db=postgresql db.pass=helloworld
db:
  driver: postgresql
  user: postgre_user
  pass: helloworld
  timeout: 10
Refer to the basic tutorial and read about config groups.
You can create arbitrary config groups and select one option from each (as of Hydra 1.0, config group options are mutually exclusive). You will need two config groups here:
one can be model, with normal, small and big options, and another can be trainer, with perhaps normal and fast options.
Config groups can also override things in other config groups.
You can also always append to the defaults list from the command line - so you can also add additional config groups that are only used in the command line.
An example of that is an 'experiment' config group. You can use it as:
$ python train.py +experiment=exp1
In config groups like this, which override things across the entire config, you should use the global package (read more about packages in the docs).
# @package _global_
num_layers: 8
embedding_size: 1024
learning_rate: 2.0

How to remove duplicate lines in YAML format configuration files?

I have a bunch of manifest/yaml files that may or may not have these key value pair duplicates:
...
app: activity-worker
app: activity-worker
...
I need to search through each of those files and find those duplicates so that I can remove one of them.
Note: I know that to replace a certain string (say, switch service: to app:) in all files of a directory (say, dev) I can run grep -l 'service:' dev/* | xargs sed -i "" 's/\service:/app:/g'. I'm looking for a relation between lines.
What you call YAML is not YAML. The YAML specification very explicitly states that keys in a mapping must be unique, and your keys are not:
The content of a mapping node is an unordered set of key: value node pairs, with the restriction that each of the keys is unique. YAML places no further restrictions on the nodes. In particular, keys may be arbitrary nodes, the same node may be used as the value of several key: value pairs, and a mapping could even contain itself as a key or a value (directly or indirectly).
On the other hand, some libraries have implemented this incorrectly, choosing to overwrite any previous value associated with a key with a later value. In your case, since the values are the same, it doesn't really matter which value would be taken.
Also, your block-style representation is not the only way to represent the key-value pairs of a mapping in "YAML". These duplicates could also be represented in a flow-style mapping, as
{...., app: activity-worker, app: activity-worker, .... }
with the two occurrences not necessarily being next to each other, nor on the same line. The following is also semantically equivalent "YAML" to your input:
{...., app: activity-worker, app:
activity-worker, .... }
If you have such faulty "YAML" files, the best way to clean them up is to use the round-trip capabilities of ruamel.yaml (disclaimer: I am the author of that package) and its ability to switch between raising an exception and warning on faulty input containing duplicate keys. You can install it for your Python (virtual environment) using:
pip install ruamel.yaml
Assuming your file is called input.yaml and it contains:
a: 1 # some duplicate keys follow
app: activity-worker
app: activity-worker
b: "abc"
You can run the following one-liner:
python -c "import sys; from ruamel.yaml import YAML; yaml = YAML(); yaml.preserve_quotes=yaml.allow_duplicate_keys=True; yaml.dump(yaml.load(open('input.yaml')), sys.stdout)"
to get:
a: 1 # some duplicate keys follow
app: activity-worker
b: "abc"
and if your input were like:
{a: 1, app: activity-worker, app:
activity-worker, b: "abc"}
the output would be:
{a: 1, app: activity-worker, b: "abc"}

How to write a custom constraint in the heat template of openstack?

I find I can write anything, like this:
constraints:
  - custom_constraint: here anything
    description: Value must be one of m1.medium, m1.large or m1.xlarge
and in the CLI this validates OK: heat template-validate -f bad.yaml
And the documentation just says that it's a plugin. How should I write a validation plugin?
If you look at the setup.cfg file in the heat source code you will find there is a section that lists the constraints:
heat.constraints =
    nova.flavor = heat.engine.clients.os.nova:FlavorConstraint
    nova.network = heat.engine.clients.os.nova:NetworkConstraint
    ...
These entries reference classes, so if you look for FlavorConstraint, you will find it in the file heat/heat/engine/clients/os/nova.py.
An examination of the class should give you an idea of how to write your own.
That is, if I'm understanding your question (and the code!) correctly.
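A hypothetical sketch of such a plugin, shaped loosely after what those setup.cfg entries point to. None of the following is confirmed by the answer: that Heat instantiates the registered class and calls validate(value, context), treating False as a violation, and that error(value) supplies the message. Check the base classes in heat/engine/constraints.py for your Heat version before relying on these names. The class name, the package mycompany_heat and the entry point mycompany.flavor are all made up.

```python
# Hypothetical Heat custom constraint plugin. The method names validate() and
# error() are assumptions to check against heat/engine/constraints.py; the
# class, package and entry point names are made up for illustration.
#
# Registration would go in your own package's setup.cfg, mirroring Heat's:
#   [entry_points]
#   heat.constraints =
#       mycompany.flavor = mycompany_heat.constraints:AllowedFlavorConstraint
#
# and a template would then refer to it as:
#   constraints:
#     - custom_constraint: mycompany.flavor


class AllowedFlavorConstraint:
    """Accept only the flavors listed in the question's description."""

    allowed = ("m1.medium", "m1.large", "m1.xlarge")

    def validate(self, value, context):
        # The context is unused: this check needs no call to another service.
        return value in self.allowed

    def error(self, value):
        return "%s is not one of %s" % (value, ", ".join(self.allowed))
```

For a fixed list like this one, the built-in allowed_values constraint in HOT templates avoids writing a plugin at all; a custom constraint pays off when validation needs a lookup, as the nova constraints do.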
