How to properly generate a multi-level composition? - fb-hydra

Currently, my hydra config is organized as follows:
configs/
├── config.yaml
├── data
│   ├── IMDB.yaml
│   └── REUT.yaml
└── model
    ├── BERT.yaml
    ├── GPT.yaml
    └── loss
        ├── CrossEntropyLoss.yaml
        └── TripletMarginLoss.yaml
config.yaml:

defaults:
  - model: BERT
  - data: IMDB

tasks: [ "fit", "eval" ]
The dataset (IMDB.yaml and REUT.yaml) settings are in the format:
name: IMDB
dir: resource/dataset/imdb_reviews/
folds: [0,1,2,3,4]
max_length: 256
num_classes: 10
The model (BERT.yaml and GPT.yaml) settings are in the format:
defaults:
  - loss: TripletMarginLoss

name: BERT
architecture: bert-base-uncased
lr: 5e-5
tokenizer:
  architecture: ${model.architecture}
And finally, the loss function settings (CrossEntropyLoss.yaml and TripletMarginLoss.yaml) adopt the following structure:
_target_: source.loss.TripletMarginLoss.TripletMarginLoss
params:
  name: TripletMarginLoss
  margin: 1.0
  eps: 1e-6
  reduction: mean
Running the following entry point:

import hydra
from omegaconf import OmegaConf

@hydra.main(config_path="configs/", config_name="config.yaml")
def my_app(params):
    OmegaConf.resolve(params)
    print(
        f"Params:\n"
        f"{OmegaConf.to_yaml(params)}\n")

if __name__ == '__main__':
    my_app()

via

$ python main.py
generates the correct config composition:
tasks:
- fit
- eval
model:
  loss:
    _target_: source.loss.TripletMarginLoss.TripletMarginLoss
    params:
      name: TripletMarginLoss
      margin: 1.0
      eps: 1.0e-06
      reduction: mean
  name: BERT
  architecture: bert-base-uncased
  lr: 5.0e-05
  tokenizer:
    architecture: bert-base-uncased
data:
  name: IMDB
  dir: resource/dataset/imdb_reviews/
  folds:
  - 0
  - 1
  - 2
  - 3
  - 4
  max_length: 256
  num_classes: 10
However, overriding the loss function generates the wrong config:
python main.py model.loss=CrossEntropyLoss
tasks:
- fit
- eval
model:
  loss: CrossEntropyLoss
  name: BERT
  architecture: bert-base-uncased
  lr: 5.0e-05
  tokenizer:
    architecture: bert-base-uncased
data:
  name: IMDB
  dir: resource/dataset/imdb_reviews/
  folds:
  - 0
  - 1
  - 2
  - 3
  - 4
  max_length: 256
  num_classes: 10
How, then, do I correctly generate a multi-level composition?

Overriding nested config groups is done with / as the separator, as documented in Hydra's config groups documentation.
Try:
$ python main.py model/loss=CrossEntropyLoss
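With the slash syntax, model/loss is treated as a nested config group, so the override swaps in the whole CrossEntropyLoss.yaml file and the composed config keeps its multi-level structure. A sketch of the expected model subtree, assuming CrossEntropyLoss.yaml mirrors the structure shown above:

model:
  loss:
    _target_: source.loss.CrossEntropyLoss.CrossEntropyLoss
    params:
      name: CrossEntropyLoss
      ...

By contrast, model.loss=CrossEntropyLoss is a value override: it replaces the loss node itself with the plain string "CrossEntropyLoss", which is exactly the wrong composition shown in the question.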

Related

Schema validation in Hydra not working when configuration path is parent folder

I have the following project setup:
configs/
├── default.yaml
└── trainings
    ├── data_config
    │   └── default.yaml
    ├── simple.yaml
    └── schema.yaml
The contents of the files are as follows:
app.py:
from dataclasses import dataclass
from enum import Enum
from pathlib import Path

from omegaconf import MISSING, DictConfig, OmegaConf
import hydra
from hydra.core.config_store import ConfigStore

CONFIGS_DIR_PATH = Path(__file__).parent / "configs"
TRAININGS_DIR_PATH = CONFIGS_DIR_PATH / "trainings"


class Sampling(Enum):
    UPSAMPLING = 1
    DOWNSAMPLING = 2


@dataclass
class DataConfig:
    sampling: Sampling = MISSING


@dataclass
class TrainerConfig:
    project_name: str = MISSING
    data_config: DataConfig = MISSING


# @hydra.main(version_base="1.2", config_path=CONFIGS_DIR_PATH, config_name="default")
@hydra.main(version_base="1.2", config_path=TRAININGS_DIR_PATH, config_name="simple")
def run(configuration: DictConfig):
    sampling = OmegaConf.to_container(cfg=configuration, resolve=True)["data_config"]["sampling"]
    print(f"{sampling} Type: {type(sampling)}")


def register_schemas():
    config_store = ConfigStore.instance()
    config_store.store(name="base_schema", node=TrainerConfig)


if __name__ == "__main__":
    register_schemas()
    run()
configs/default.yaml:
defaults:
  - /trainings@: simple
  - _self_

project_name: test
configs/trainings/simple.yaml:
defaults:
  - base_schema
  - data_config: default
  - _self_

project_name: test
configs/trainings/schema.yaml:
defaults:
  - data_config: default
  - _self_

project_name: test
configs/trainings/data_config/default.yaml:
defaults:
  - _self_

sampling: DOWNSAMPLING
Now, when I run app.py as shown above, I get the expected result (namely, "DOWNSAMPLING" is resolved to an enum value). However, when the application constructs the configuration from the default.yaml in the parent directory, i.e. when the code reads like so:

...
@hydra.main(version_base="1.2", config_path=CONFIGS_DIR_PATH, config_name="default")
# @hydra.main(version_base="1.2", config_path=TRAININGS_DIR_PATH, config_name="simple")
def run(configuration: DictConfig):
...

I get the error below:
In 'trainings/simple': Could not load 'trainings/base_schema'.

Config search path:
	provider=hydra, path=pkg://hydra.conf
	provider=main, path=file:///data/code/demos/hydra/configs
	provider=schema, path=structured://

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
I do not understand why specifying the schema to be used is causing this issue. Would someone have an idea why and what could be done to fix the problem?
If you are using defaults lists in more than one config file, I strongly suggest that you fully read and understand The Defaults List page.
Configs addressed in the defaults list are relative to the config group of the containing config.
The error is telling you that Hydra is looking for base_schema in trainings, because the defaults list that loads base_schema is in trainings.
Either put base_schema inside trainings when you register it:
config_store.store(group="trainings", name="base_schema", node=TrainerConfig)
Or use absolute addressing in the defaults list when addressing it (e.g. in configs/trainings/simple.yaml):
defaults:
  - /base_schema
  - data_config: default
  - _self_
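
Applied to the app above, the first option is a one-line change in register_schemas (a sketch; the group must stay in sync with where simple.yaml lives):

def register_schemas():
    config_store = ConfigStore.instance()
    # Register the schema inside the 'trainings' group so that the relative
    # entry '- base_schema' in configs/trainings/simple.yaml resolves to it.
    config_store.store(group="trainings", name="base_schema", node=TrainerConfig)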

pre-commit config top level exclude doesn't work?

I have the following .pre-commit-config.yaml file. On the first line I set a top-level exclude path to skip all checks in that folder, but for some reason the mypy hook still reports errors from files in that folder. Could someone help me understand what's going on? Thanks.
exclude: "src/project/proto/"
default_language_version:
python: python3.8
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-ast
- id: check-docstring-first
- id: check-executables-have-shebangs
- id: check-json
- id: check-merge-conflict
- id: check-yaml
- id: debug-statements
- id: end-of-file-fixer
- id: trailing-whitespace
- repo: https://github.com/asottile/pyupgrade
rev: v3.2.2
hooks:
- id: pyupgrade
- repo: https://github.com/asottile/reorder_python_imports
rev: v3.9.0
hooks:
- id: reorder-python-imports
args: [--py3-plus]
- repo: https://github.com/asottile/add-trailing-comma
rev: v2.3.0
hooks:
- id: add-trailing-comma
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v0.991
hooks:
- id: mypy
args: [--strict]
additional_dependencies:
[types-protobuf>=4.21.0.0, types-termcolor>=1.1.6, pytest>=7.2.0]
- repo: https://github.com/PyCQA/flake8
rev: 6.0.0
hooks:
- id: flake8
I also have this mypy.ini, just in case I missed anything there.
[mypy]
python_version = 3.8
mypy_path = ./src:./tests
[mypy-lark.*]
ignore_missing_imports = True
It is working, but not in the way you expect.
mypy performs whole-program static analysis and will follow imports. pre-commit is not passing those filenames to mypy, but they are still being checked due to import following.
You will need to also use mypy's settings to ignore errors there.
disclaimer: I created pre-commit
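
For example, a minimal sketch of such a mypy.ini section, assuming the generated code is imported as project.proto.* (the module pattern is a guess based on mypy_path = ./src above):

[mypy-project.proto.*]
# Silence errors from the generated protobuf modules, even when mypy
# reaches them by following imports from checked files.
ignore_errors = True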

How to specify the similarities checker options of pylint in pre-commit args?

I've been trying to configure my pre-commit config file to ignore similarities in imports and docstrings, as described in: https://pylint.pycqa.org/en/latest/technical_reference/features.html?highlight=similar#similarities-checker-options
But I don't use any .pylintrc file, just the .pre-commit-config.yaml.
Below is a snippet from .pre-commit-config.yaml:

- id: pylint
  name: pylint
  entry: pylint
  language: python
  require_serial: true
  types_or: [python, pyi]
  exclude: 'kedro-init'
  args: ['--disable=E0401,E1101,E1102,R0913,R0914,W0703,E0602,C0103,C0114,
        W0102,C0330,C0326,W0107,R1716,R0902,E0611,E1124', '--fail-under=7.5',
        '--ignore-imports=yes', '--ignore-docstrings=yes'] # this last line does not work
Is there any way to specify those options on the args key?
Thx!
python version: 3.8.x
pylint version: 2.12.3
If you retype your args in .pre-commit-config.yaml as below, it should work.

- repo: local
  hooks:
    - id: pylint
      name: pylint
      entry: pylint
      language: python
      require_serial: true
      types_or: [python, pyi]
      exclude: 'kedro-init'
      args: ["--disable=E0401,E1101,E1102,R0913,R0914,W0703,E0602,C0103,
            C0114,W0102,C0330,C0326,W0107,R1716,R0902,E0611,E1124",
            "--fail-under=7.5"]

Not able to execute lifecycle operation using script plugin

I'm trying to learn how to use the script plugin and am following the script plugin docs, but I am not able to make it work.
I've tried to use the plugin in two ways. First, with the cloudify.interfaces.lifecycle start operation mapped directly to a script:
tosca_definitions_version: cloudify_dsl_1_3
imports:
  - 'http://www.getcloudify.org/spec/cloudify/4.5.5/types.yaml'
node_templates:
  Import_Project:
    type: cloudify.nodes.WebServer
    capabilities:
      scalable:
        properties:
          default_instances: 1
    interfaces:
      cloudify.interfaces.lifecycle:
        start:
          implementation: scripts/create_project.sh
          inputs: {}
Second, with a direct mapping to the script plugin task:
tosca_definitions_version: cloudify_dsl_1_3
imports:
  - 'http://www.getcloudify.org/spec/cloudify/4.5.5/types.yaml'
node_templates:
  Import_Project:
    type: cloudify.nodes.WebServer
    capabilities:
      scalable:
        properties:
          default_instances: 1
    interfaces:
      cloudify.interfaces.lifecycle:
        start:
          implementation: script.script_runner.tasks.run
          inputs:
            script_path: scripts/create_project.sh
I've created a directory named scripts and placed the below create_project.sh script in this directory:
#! /bin/bash -e
ctx logger info "Hello to this world"
hostname
I'm getting errors while validating the blueprint.
Error when operation is mapped directly to a script:
[2019-04-13 13:29:40.594] [DEBUG] DslParserExecClient - got output from dsl parser Could not extract plugin from operation mapping 'scripts/create_project.sh', which is declared for operation 'start'. In interface 'cloudify.interfaces.lifecycle' in node 'Import_Project' of type 'cloudify.nodes.WebServer'
in: /opt/cloudify-composer/backend/dev/workspace/2/tmp-27O0e1t813N6as
in line: 3, column: 2
path: node_templates.Import_Project
value: {'interfaces': {'cloudify.interfaces.lifecycle': {'start': {'implementation': 'scripts/create_project.sh', 'inputs': {}}}}, 'type': 'cloudify.nodes.WebServer', 'capabilities': {'scalable': {'properties': {'default_instances': 1}}}}
Error when using a direct mapping:
[2019-04-13 13:25:21.015] [DEBUG] DslParserExecClient - got output from dsl parser node 'Import_Project' has no relationship which makes it contained within a host and it has a plugin 'script' with 'host_agent' as an executor. These types of plugins must be installed on a host
in: /opt/cloudify-composer/backend/dev/workspace/2/tmp-279QCz2CV3Y81L
in line: 2, column: 0
path: node_templates
value: {'Import_Project': {'interfaces': {'cloudify.interfaces.lifecycle': {'start': {'implementation': 'script.script_runner.tasks.run', 'inputs': {'script_path': 'scripts/create_project.sh'}}}}, 'type': 'cloudify.nodes.WebServer', 'capabilities': {'scalable': {'properties': {'default_instances': 1}}}}}
What is missing to make this work?
I also found that the Cloudify script plugin examples in the documentation do not work: https://docs.cloudify.co/4.6/working_with/official_plugins/configuration/script/
However, I found I can make the examples work by adding an executor line alongside the implementation line. As the second error above indicates, the script plugin defaults to the host_agent executor, which requires the node to be contained in a host; overriding the executor with central_deployment_agent runs the script on the manager instead:
tosca_definitions_version: cloudify_dsl_1_3
imports:
  - 'http://www.getcloudify.org/spec/cloudify/4.5.5/types.yaml'
node_templates:
  Import_Project:
    type: cloudify.nodes.WebServer
    capabilities:
      scalable:
        properties:
          default_instances: 1
    interfaces:
      cloudify.interfaces.lifecycle:
        start:
          implementation: scripts/create_project.sh
          executor: central_deployment_agent
          inputs: {}

How can I find out whether the Rscript has run successfully in Docker via CWL?

I have written CWL code which runs an Rscript command in a Docker container. I have two files, a CWL file and a YAML input file, and I run them with the command:
cwltool --debug a_code.cwl a_input.yaml
The output says that the process status is success, but there is no output file, and the result reports "output": null. I want to know whether there is a way to confirm that the Rscript file ran successfully in the Docker container; what I actually want to understand is why the output files are null.
The final part of the result is:
[job a_code.cwl] {
    "output": null,
    "errorFile": null
}
[job a_code.cwl] Removing input staging directory /tmp/tmpUbyb7k
[job a_code.cwl] Removing temporary directory /tmp/tmpkUIOnw
Removing intermediate output directory /tmp/tmpCG9Xs1
{
    "output": null,
    "errorFile": null
}
Final process status is success
R code:
library(cummeRbund)
cuff<-readCufflinks( dbFile = "cuffData.db", gtfFile = NULL, runInfoFile = "run.info", repTableFile = "read_groups.info", geneFPKM = "genes.fpkm_trac .... )
#setwd("/scripts")
sink("cuff.txt")
print(cuff)
sink()
My cwl file code is:
class: CommandLineTool
cwlVersion: v1.0
id: cummerbund
baseCommand:
  - Rscript
inputs:
  - id: Rfile
    type: File?
    inputBinding:
      position: 0
  - id: cuffdiffout
    type: 'File[]?'
    inputBinding:
      position: 1
  - id: errorout
    type: File?
    inputBinding:
      position: 99
      prefix: 2>
      valueFrom: |
        error.txt
outputs:
  - id: output
    type: File?
    outputBinding:
      glob: cuff.txt
  - id: errorFile
    type: File?
    outputBinding:
      glob: error.txt
label: cummerbund
requirements:
  - class: DockerRequirement
    dockerPull: cummerbund_0
my input file (yaml file) is:
inputs:
  Rfile:
    basename: run_cummeR.R
    class: File
    nameext: .R
    nameroot: run_cummeR
    path: run_cummeR.R
  cuffdiffout:
    - class: File
      path: cuffData.db
    - class: File
      path: genes.fpkm_tracking
    - class: File
      path: read_groups.info
    - class: File
      path: genes.count_tracking
    - class: File
      path: genes.read_group_tracking
    - class: File
      path: isoforms.fpkm_tracking
    - class: File
      path: isoforms.read_group_tracking
    - class: File
      path: isoforms.count_tracking
    - class: File
      path: isoform_exp.diff
    - class: File
      path: gene_exp.diff
  errorout:
    - class: File
      path: error.txt
Also, this is the Dockerfile used to create the image:

FROM r-base
COPY . /scripts
RUN apt-get update
RUN apt-get install -y \
    libcurl4-openssl-dev \
    libssl-dev \
    libmariadb-client-lgpl-dev \
    libmariadbclient-dev \
    libxml2-dev \
    r-cran-plyr \
    r-cran-reshape2
WORKDIR /scripts
RUN Rscript /scripts/build.R
ENTRYPOINT /bin/bash
I got the answer!
There were two problems in my program:
1. The Docker image was not pulled correctly, so the CWL tool could not produce any output.
2. The inputs and outputs were not declared as mandatory, so I got a success status even when I did not have proper inputs and outputs.
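
On the second point, a minimal sketch of a stricter tool description is below. It assumes cuff.txt is always produced on success; dropping the ? markers makes cwltool fail loudly (permanentFail) when an output is missing instead of reporting null, and CWL's stderr shortcut replaces the 2> argument, which was being passed to Rscript literally rather than interpreted as a shell redirect:

class: CommandLineTool
cwlVersion: v1.0
id: cummerbund
baseCommand:
  - Rscript
stderr: error.txt            # capture the command's stderr stream into a file
inputs:
  - id: Rfile
    type: File               # mandatory: a missing input is now an error
    inputBinding:
      position: 0
  - id: cuffdiffout
    type: 'File[]'
    inputBinding:
      position: 1
outputs:
  - id: output
    type: File               # mandatory: a missing cuff.txt now fails the step
    outputBinding:
      glob: cuff.txt
  - id: errorFile
    type: stderr             # shorthand output bound to the stderr file above
requirements:
  - class: DockerRequirement
    dockerPull: cummerbund_0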
