How to use a config group multiple times, while overriding each instance - fb-hydra

Here is my current config structure:
hydra/
    pipeline/
        common/
            feature.yaml
    stage/
        train.yaml
with the following files:
train.yaml
# @package _global_
defaults:
  - _self_
  - ../pipeline/common@train: feature
  - ../pipeline/common@val: feature
train:
  conf:
    split: train
val:
  conf:
    split: val
pipeline:
  - ${oc.dict.values: train.steps}
  - ${oc.dict.values: val.steps}
feature.yaml
conf:
  split: train
steps:
  tabular:
    name: "${conf.split}-tabular"
    class: FeatureGeneration
    dataset:
      datasources: [ "${conf.split}_split" ]
What I've accomplished:
I've been able to figure out how to use the config group multiple times utilizing the defaults in train.yaml.
What I'm stuck on:
I'm getting an error: InterpolationKeyError 'conf.split' not found
I do realize that imports are absolute. If I put # @package common.feature at the beginning of feature.yaml I can import conf.split via common.feature.conf.split, but is there not a cleaner way? I tried relative imports but got the same error.
I can't seem to override conf.split from train.yaml. You can see where I set train.conf.split and val.conf.split but these do not get propagated. What I need to be able to do is have each instance of the config group utilize a different conf.split value. This is the biggest issue I'm facing.
What I've referenced so far:
The following resources have gotten me to where I am so far, but I'm still having trouble with what's listed above.
Hydra : how to assign config files from same group to two different fields
https://hydra.cc/docs/advanced/overriding_packages/
https://hydra.cc/docs/patterns/extending_configs/

Interpolation is not an import; it is evaluated when you access the config node. At that point your config is already composed, so it should be straightforward to use either absolute interpolation (the default) or relative interpolation, based on the structure of your final config.
Hard to be 100% sure, but I suspect this problem is because your defaults list has _self_ at the beginning. This means that the content of the config containing the defaults list is overridden by what comes after it in the defaults list.
Try to move _self_ to the end:
# @package _global_
defaults:
  - ../pipeline/common@train: feature
  - ../pipeline/common@val: feature
  - _self_
#...
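To make the ordering effect concrete, here is a minimal sketch in plain Python. It is not Hydra's implementation: `deep_merge` and `compose` are hypothetical helpers that only mimic the "later entries in the defaults list win" rule, and the dicts stand in for train.yaml and for feature.yaml placed under the train and val packages (steps omitted for brevity).

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` merged in; `override` wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def compose(entries: list) -> dict:
    """Merge config fragments in defaults-list order (simplified stand-in)."""
    result: dict = {}
    for entry in entries:
        result = deep_merge(result, entry)
    return result


# What `_self_` stands for: the body of train.yaml.
self_cfg = {"train": {"conf": {"split": "train"}}, "val": {"conf": {"split": "val"}}}
# feature.yaml (conf.split: train) after being placed under each package.
feature_as_train = {"train": {"conf": {"split": "train"}}}
feature_as_val = {"val": {"conf": {"split": "train"}}}

self_first = compose([self_cfg, feature_as_train, feature_as_val])
self_last = compose([feature_as_train, feature_as_val, self_cfg])

print(self_first["val"]["conf"]["split"])  # train (feature.yaml overrode train.yaml)
print(self_last["val"]["conf"]["split"])   # val (train.yaml has the last word)
```

With `_self_` first, the val.conf.split set in train.yaml is overwritten by the default in feature.yaml, which matches the "not propagated" symptom in the question.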

Related

Hydra: how to use variable interpolation in packaged configs

I have some config file, model/foo.yaml:
# @package _global_
# foo.yaml
MODEL:
  BACKBONE:
    OUT_FEATURES: [c4, c5]
  HEAD:
    IN_FEATURES: ${MODEL.BACKBONE.OUT_FEATURES}
There are no issues with variable interpolation when I point to this config in the defaults-list of another config, eg buzz.yaml, except when I also override the package like so:
# buzz.yaml
defaults:
  - model@foo_head: foo
Attempting to compose buzz.yaml, you will get an error like:
omegaconf.errors.InterpolationKeyError: Interpolation key 'MODEL.BACKBONE.OUT_FEATURES' not found
Can variable interpolation not be used in configs when packaging?
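It can. OmegaConf supports relative interpolation, which is resolved from the node's own position instead of from the config root, so it keeps working when the config is moved to another package:
MODEL:
  BACKBONE:
    OUT_FEATURES: [c4, c5]
  HEAD:
    IN_FEATURES: ${..BACKBONE.OUT_FEATURES}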
I strongly recommend that you read the docs of OmegaConf.

Set list of config nodes as value entries in yaml contrasting with structured configs in Hydra

I would like to get a list of configs as a (default) value entry
and use a structured schema to validate the input list.
E.g., in trainer.yaml:
defaults:
  - callbacks:
      - checkpointer
      - early_stopping
In callbacks/checkpointer.yaml and callbacks/early_stopping.yaml I have a link to appropriate structured configs as default values, e.g.:
# callbacks/checkpointer.yaml
defaults:
  - /trainer_lib/callbacks/base_checkpointer@_here_
The structured schema:
from dataclasses import dataclass
from typing import Any, List
from omegaconf import MISSING

@dataclass
class CheckpointerConfig:
    _target_: str = "some_library_class"
    data_dir: str = "folder"

@dataclass
class TrainerConfig:
    callbacks: List[Any] = MISSING
and config store:
from hydra.core.config_store import ConfigStore

cs = ConfigStore.instance()
cs.store(group="trainer_lib/callbacks", name="base_checkpointer", node=CheckpointerConfig)
I am not sure what is the correct syntax (what I tried fails) to accomplish this. I get an omegaconf.errors.ConfigTypeError: Cannot merge DictConfig with ListConfig.
Is there a way to accomplish this? Thanks.
Discussion on this topic in this Hydra issue.
Are you on Hydra 1.0? This is actually supported in Hydra 1.1. Here is the documentation: https://hydra.cc/docs/next/patterns/select_multiple_configs_from_config_group

How to do file over-rides in hydra?

I have a main config file, let's say config.yaml:
num_layers: 4
embedding_size: 512
learning_rate: 0.2
max_steps: 200000
I'd like to be able to override this, on the command-line, with another file, like say big_model.yaml, which I'd use conceptually like:
python my_script.py --override big_model.yaml
and big_model.yaml might look like:
num_layers: 8
embedding_size: 1024
I'd like to be able to override with an arbitrary number of such files, each one taking priority over the last. Let's say I also have fast_learn.yaml
learning_rate: 2.0
And so I'd then want to conceptually do something like:
python my_script.py --override big_model.yaml --override fast_learn.yaml
What is the easiest/most standard way to do this in hydra? (or potentially in omegaconf perhaps?)
(note that I'd like these override files to ideally just be standard yaml files, that override the earlier yaml files, ideally; though if I have to write using override DSL instead, I can do that, if that's the easiest/best/most standard way)
It sounds like a package override might be a good solution for you.
The documentation can be found here: https://hydra.cc/docs/next/advanced/overriding_packages
An example application can be found here:
https://github.com/facebookresearch/hydra/tree/master/examples/advanced/package_overrides
Using that example application, you can achieve the override by doing something like:
$ python simple.py db=postgresql db.pass=helloworld
db:
  driver: postgresql
  user: postgre_user
  pass: helloworld
  timeout: 10
Refer to the basic tutorial and read about config groups.
You can create arbitrary config groups and select one option from each (as of Hydra 1.0, config group options are mutually exclusive). You will need two config groups here:
one can be model, with normal, small and big options, and another can be trainer, with maybe normal and fast options.
Config groups can also override things in other config groups.
You can also always append to the defaults list from the command line, so you can add additional config groups that are only used on the command line.
An example of that could be an 'experiment' config group. You can use it as:
$ python train.py +experiment=exp1
In such config groups that are overriding things across the entire config you should use the global package (read more about packages in the docs).
# @package _global_
num_layers: 8
embedding_size: 1024
learning_rate: 2.0

What *is* a salt formula, really?

I am trying to work through the Salt Formulas documentation and seem to be having a fundamental misunderstanding of what a salt formula really is.
Understandably, this question may seem like a duplicate of these questions, but due to my failing to grasp the basic concepts I'm also struggling to make use of the answers to these questions.
I thought, that a salt formula is basically just a package that implements extra functions, a lot like
#include <string.h>
in C, or
import numpy as np
in Python. Thus, I thought, I could download the salt-formula-linux to /srv/formulas/salt-formula-linux/, add that to file_roots, restart the master (all as per the docs), and then write a file like swapoff.sls containing
disable_swap:
  linux:
    storage:
      swap:
        file:
          enabled: False
(the above is somewhat similar to the examples in the repo's root) in hope that the formula would then handle removing the swap entry from /etc/fstab and running swapoff -a for me. Needless to say, this didn't work, clearly because I'm not understanding what a salt formula is meant to be.
So, what is a salt formula and how do I use it? Can I make use of it as a library of functions too?
This answer might not be fully correct in all technicalities, but this is what solved my problem.
A salt formula is not a library of functions. It is, rather, a collection of state files. A state file can often be very simple, such as some of my own:
--> top.sls <--
base:
  '*':
    - docker
--> docker.sls <--
install_docker_1703:
  pkgrepo.managed:
    # stuff
  pkg.installed:
    - name: docker-ce
creating a state file like
--> swapoff.sls <--
disable_swap:
  linux.storage.swap: # and so on
is, perhaps, not the way to go. Well, at least, maybe not for a beginner with lacking knowledge.
Instead, add an item to top.sls:
    - linux.storage.swap
This is not enough, however. Most formulas (or the state files within them, if you will) are highly parametrizable, i.e. they're full of placeholders with variable names, such as {{ swap.device }}. If there's nothing to fill this gap, the state file will not be able to do anything. These gaps are filled from pillars.
All that remains, is to create a file like swap.sls in /srv/pillar/ that would contain something like (as per the examples of that formula)
linux:
  storage:
    enabled: true
    swap:
      file:
        enabled: true
        engine: file
        device: /swapfile
        size: 1024
and also /srv/pillar/top.sls with
base:
  '*':
    - swap
Perhaps /srv/pillar should also be included in pillar_roots in /etc/salt/master.
So now /srv/salt/top.sls runs /srv/formulas/salt-formula-linux/linux/storage/swap.sls, which, guided by /srv/pillar/top.sls, pulls its parameters from /srv/pillar/swap.sls and enables a swapfile.

How to delete an inherit property from yaml config?

I have a yaml file like this:
local: &local
  image: xxx
  # *tons of config*
ci:
  <<: *local
  image: # delete
  build: .
I want ci to inherit all values from local, except the image.
Is there a way to "delete" this value?
No, there isn't a way to mark a key for deletion in a YAML file. You can only overwrite existing values.
And the latter is what you do, you associate the empty scalar as value to the key image as if you would have written:
image: null # delete
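To illustrate why the key survives, here is a rough sketch with plain Python dicts standing in for what a loader that resolves merge keys at load time would produce. The `restart` entry is a hypothetical placeholder for the "tons of config".

```python
# Stand-in for the loaded `local` mapping; "restart" is a made-up example key.
local = {"image": "xxx", "restart": "always"}

# `<<: *local` followed by explicit keys behaves like a dict merge where the
# explicit keys win; the empty scalar after `image:` loads as None.
ci = {**local, "image": None, "build": "."}

print("image" in ci)  # True: the key is still there
print(ci["image"])    # None: only its value was overwritten
```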
There are two things you can do: post-process or make a base mapping in your YAML file.
If you want to post-process, you associate a special unique value to image, or a specially tagged object, and after loading recursively walk over the tree to remove key-value pairs with this special value. Whether you can already do this during parsing, using hooks or overwriting some of its methods, depends on the parser.
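A minimal sketch of the post-processing route, working on the plain dicts and lists a loader returns. The sentinel string `__delete__`, the `prune` helper and the `volumes` key are all assumptions made up for this illustration; any value that cannot occur in your real data would do as the marker.

```python
DELETE = "__delete__"  # hypothetical sentinel written in the YAML as `image: __delete__`


def prune(node):
    """Recursively drop every key-value pair whose value is the sentinel."""
    if isinstance(node, dict):
        return {key: prune(value) for key, value in node.items() if value != DELETE}
    if isinstance(node, list):
        return [prune(item) for item in node]
    return node


# Stand-in for the loaded document after merge keys have been resolved.
loaded = {
    "local": {"image": "xxx", "volumes": ["./src:/app"]},
    "ci": {"image": DELETE, "volumes": ["./src:/app"], "build": "."},
}

cleaned = prune(loaded)
print(cleaned["ci"])
```

The walk returns a new tree instead of mutating the loaded one, which avoids deleting from a dict while iterating over it.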
Using a base mapping requires less work, but is more intrusive to the YAML file:
localbase: &lb
  # *tons of config*
local: &local
  <<: *lb
  image: xxx
ci:
  <<: *lb
  build: .
If you do the former, note that if you use a parser that preserves the "merge-hierarchy" on round-tripping (like my ruamel.yaml parser can do), it is not enough to delete the key-value pair: in that case the original from local would come back. Other parsers that simply resolve this at load time don't have this issue.
For properties that accept a list of values, you can send [] as value.
For example in docker-compose you don't want to inherit ports:
service_1: &service_1
  # some other properties.
  ports:
    - "49281:22"
    - "8876:8000"
  # some other properties
  image: some_image:latest
service_2:
  <<: *service_1
  ports: [] # it removes ports values.
  image: null # it removes image value.

Resources