Schema validation in Hydra not working when configuration path is parent folder - fb-hydra

I have the following project setup:
configs/
├── default.yaml
└── trainings
    ├── data_config
    │   └── default.yaml
    ├── simple.yaml
    └── schema.yaml
The contents of the files are as follows:
app.py:
from dataclasses import dataclass
from enum import Enum
from pathlib import Path

from omegaconf import MISSING, DictConfig, OmegaConf

import hydra
from hydra.core.config_store import ConfigStore

CONFIGS_DIR_PATH = Path(__file__).parent / "configs"
TRAININGS_DIR_PATH = CONFIGS_DIR_PATH / "trainings"


class Sampling(Enum):
    UPSAMPLING = 1
    DOWNSAMPLING = 2


@dataclass
class DataConfig:
    sampling: Sampling = MISSING


@dataclass
class TrainerConfig:
    project_name: str = MISSING
    data_config: DataConfig = MISSING


# @hydra.main(version_base="1.2", config_path=CONFIGS_DIR_PATH, config_name="default")
@hydra.main(version_base="1.2", config_path=TRAININGS_DIR_PATH, config_name="simple")
def run(configuration: DictConfig):
    sampling = OmegaConf.to_container(cfg=configuration, resolve=True)["data_config"]["sampling"]
    print(f"{sampling} Type: {type(sampling)}")


def register_schemas():
    config_store = ConfigStore.instance()
    config_store.store(name="base_schema", node=TrainerConfig)


if __name__ == "__main__":
    register_schemas()
    run()
configs/default.yaml:
defaults:
  - /trainings@: simple
  - _self_

project_name: test
configs/trainings/simple.yaml:
defaults:
  - base_schema
  - data_config: default
  - _self_
project_name: test
configs/trainings/schema.yaml:
defaults:
  - data_config: default
  - _self_
project_name: test
configs/trainings/data_config/default.yaml:
defaults:
  - _self_
sampling: DOWNSAMPLING
Now, when I run app.py as shown above, I get the expected result (namely, "DOWNSAMPLING" gets resolved to an enum type). However, when I try to run the application so that it constructs the configuration from the default.yaml in the parent directory, I get an error.
That is, when the code is changed like so:
...
@hydra.main(version_base="1.2", config_path=CONFIGS_DIR_PATH, config_name="default")
# @hydra.main(version_base="1.2", config_path=TRAININGS_DIR_PATH, config_name="simple")
def run(configuration: DictConfig):
    ...
I get the error below:
In 'trainings/simple': Could not load 'trainings/base_schema'.
Config search path:
provider=hydra, path=pkg://hydra.conf
provider=main, path=file:///data/code/demos/hydra/configs
provider=schema, path=structured://
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
I do not understand why specifying the schema to be used is causing this issue. Would someone have an idea why and what could be done to fix the problem?

If you are using defaults lists in more than one config file, I strongly suggest that you fully read and understand The Defaults List page.
Configs addressed in the defaults list are relative to the config group of the containing config.
The error is telling you that Hydra is looking for base_schema in trainings, because the defaults list that loads base_schema is in trainings.
Either put base_schema inside trainings when you register it:
config_store.store(group="trainings", name="base_schema", node=TrainerConfig)
Or use absolute addressing when referring to it in the defaults list (e.g. in configs/trainings/simple.yaml):
defaults:
  - /base_schema
  - data_config: default
  - _self_
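
For example, with the first option, the register_schemas function from app.py would only change in the group argument (a minimal sketch):

def register_schemas():
    config_store = ConfigStore.instance()
    # Storing the schema inside the 'trainings' group lets
    # 'trainings/simple' find it via the relative entry '- base_schema'.
    config_store.store(group="trainings", name="base_schema", node=TrainerConfig)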

Related

How to use rules_webtesting?

I want to use https://github.com/bazelbuild/rules_webtesting. I am using Bazel 5.2.0.
The whole project can be found here.
My WORKSPACE.bazel file looks like this:
load("#bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
name = "io_bazel_rules_webtesting",
sha256 = "3ef3bb22852546693c94e9b0b02c2570e74abab6f800fd58e0cbe79492e49c1b",
urls = [
"https://github.com/bazelbuild/rules_webtesting/archive/581b1557e382f93419da6a03b91a45c2ac9a9ec8/rules_webtesting.tar.gz",
],
)
load("#io_bazel_rules_webtesting//web:repositories.bzl", "web_test_repositories")
web_test_repositories()
My BUILD.bazel file looks like this:
load("#io_bazel_rules_webtesting//web:py.bzl", "py_web_test_suite")
py_web_test_suite(
name = "browser_test",
srcs = ["browser_test.py"],
browsers = [
"#io_bazel_rules_webtesting//browsers:chromium-local",
],
local = True,
deps = ["#io_bazel_rules_webtesting//testing/web"],
)
browser_test.py looks like this:
import unittest

from testing.web import webtest


class BrowserTest(unittest.TestCase):
    def setUp(self):
        self.driver = webtest.new_webdriver_session()

    def tearDown(self):
        try:
            self.driver.quit()
        finally:
            self.driver = None

    # Your tests here


if __name__ == "__main__":
    unittest.main()
When I try to do a bazel build //... I get (under Ubuntu 20.04 and macOS):
INFO: Invocation ID: 74c03efd-9caa-4174-9fda-42f7ff37e38b
ERROR: error loading package '': Every .bzl file must have a corresponding package, but '@io_bazel_rules_webtesting//web:repositories.bzl' does not have one. Please create a BUILD file in the same or any parent directory. Note that this BUILD file does not need to do anything except exist.
INFO: Elapsed time: 0.038s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
The error message does not make sense to me, since there is a BUILD file in
https://github.com/bazelbuild/rules_webtesting/blob/581b1557e382f93419da6a03b91a45c2ac9a9ec8/BUILD.bazel
and https://github.com/bazelbuild/rules_webtesting/blob/581b1557e382f93419da6a03b91a45c2ac9a9ec8/web/BUILD.bazel.
I also tried a different version of Bazel - but with the same result.
Any ideas on how to get this working?
You need to add a strip_prefix = "rules_webtesting-581b1557e382f93419da6a03b91a45c2ac9a9ec8" in your http_archive call.
For debugging, you can look in the folder where Bazel extracts it: bazel-out/../../../external/io_bazel_rules_webtesting. @io_bazel_rules_webtesting//web translates to bazel-out/../../../external/io_bazel_rules_webtesting/web, so if that folder doesn't exist things won't work.
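Applied to the http_archive call from the question, that would look roughly like this (only the strip_prefix line is new):

http_archive(
    name = "io_bazel_rules_webtesting",
    sha256 = "3ef3bb22852546693c94e9b0b02c2570e74abab6f800fd58e0cbe79492e49c1b",
    # strip the top-level directory that GitHub adds to commit archives
    strip_prefix = "rules_webtesting-581b1557e382f93419da6a03b91a45c2ac9a9ec8",
    urls = [
        "https://github.com/bazelbuild/rules_webtesting/archive/581b1557e382f93419da6a03b91a45c2ac9a9ec8/rules_webtesting.tar.gz",
    ],
)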

How to set up hydra config to accept a custom enum?

How do I set up my hydra config to accept a custom enum? Specifically, I followed the Structured Config Schema tutorial.
I have a dataclass config:
@dataclass_validate
@dataclass
class CustomConfig:
    custom_enum: CustomEnum
With the custom enum:
class CustomEnum(str, Enum):
    ENUM1 = "enum1"
    ENUM2 = "enum2"
Error from running python my_app.py
Error merging 'data/config' with schema
Invalid value 'enum1', expected one of [ENUM1, ENUM2]
full_key: custom_enum
object_type=CustomConfig
Where my_app.py is just:
cs = ConfigStore.instance()
cs.store(name="base_config", node=Config)
cs.store(group="data", name="config", node=CustomConfig)

@hydra.main(config_path=".", config_name="config")
def setup_config(cfg: Config) -> None:
    print(OmegaConf.to_yaml(cfg))
And the config in data/config.yaml is just
custom_enum: enum1
Note the error message: Invalid value 'enum1', expected one of [ENUM1, ENUM2].
That is to say, in your data/config.yaml file you should be using ENUM1 instead of enum1.
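With that change, data/config.yaml would simply read:

custom_enum: ENUM1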

How to load Hydra parameters from previous jobs (without having to use argparse and the compose API)?

I'm using Hydra for training machine learning models. It's great for doing complex commands like python train.py data=MNIST batch_size=64 loss=l2. However, if I want to then run the trained model with the same parameters, I have to do something like python reconstruct.py --config_file path_to_previous_job/.hydra/config.yaml. I then use argparse to load in the previous yaml and use the compose API to initialize the Hydra environment. The path to the trained model is inferred from the path to Hydra's .yaml file. If I want to modify one of the parameters, I have to add additional argparse parameters and run something like python reconstruct.py --config_file path_to_previous_job/.hydra/config.yaml --batch_size 128. The code then manually overrides any Hydra parameters with those that were specified on the command line.
What's the right way of doing this?
My current code looks something like the following:
train.py:
import hydra

@hydra.main(config_name="config", config_path="conf")
def main(cfg):
    # [training code using cfg.data, cfg.batch_size, cfg.loss etc.]
    # [code outputs model checkpoint to job folder generated by Hydra]
    ...

main()
reconstruct.py:
import argparse
import os

from hydra.experimental import initialize, compose

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('hydra_config')
    parser.add_argument('--batch_size', type=int)
    # [other flags and parameters I may need to override]
    args = parser.parse_args()

    # Create the Hydra environment.
    initialize()
    cfg = compose(config_name=args.hydra_config)

    # Since checkpoints are stored next to the .hydra, we manually generate the path.
    checkpoint_dir = os.path.dirname(os.path.dirname(args.hydra_config))

    # Manually override any parameters which can be changed on the command line.
    batch_size = args.batch_size if args.batch_size else cfg.data.batch_size

    # [code which uses checkpoint_dir to load the model]
    # [code which uses both batch_size and params in cfg to set up the data etc.]
This is my first time posting, so let me know if I should clarify anything.
If you want to load the previous config as is and not change it, use OmegaConf.load(file_path).
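As a minimal sketch (reusing the path layout from the question), that would look like:

from omegaconf import OmegaConf

# Load the config exactly as the previous job saved it; nothing is re-composed.
cfg = OmegaConf.load("path_to_previous_job/.hydra/config.yaml")
print(OmegaConf.to_yaml(cfg))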
If you want to re-compose the config (and it sounds like you do, because you added that you want to override things), I recommend that you use the Compose API and pass in the parameters from the overrides file in the job output directory (next to the stored config.yaml), concatenated with the current run's parameters.
This script seems to be doing the job:
import os
from dataclasses import dataclass
from os.path import join
from typing import Optional

from omegaconf import OmegaConf

import hydra
from hydra import compose
from hydra.core.config_store import ConfigStore
from hydra.core.hydra_config import HydraConfig
from hydra.utils import to_absolute_path


# You can also use a yaml config file instead of this Structured Config
@dataclass
class Config:
    load_checkpoint: Optional[str] = None
    batch_size: int = 16
    loss: str = "l2"


cs = ConfigStore.instance()
cs.store(name="config", node=Config)


@hydra.main(config_path=".", config_name="config")
def my_app(cfg: Config) -> None:
    if cfg.load_checkpoint is not None:
        output_dir = to_absolute_path(cfg.load_checkpoint)
        original_overrides = OmegaConf.load(join(output_dir, ".hydra/overrides.yaml"))
        current_overrides = HydraConfig.get().overrides.task
        hydra_config = OmegaConf.load(join(output_dir, ".hydra/hydra.yaml"))
        # getting the config name from the previous job.
        config_name = hydra_config.hydra.job.config_name
        # concatenating the original overrides with the current overrides
        overrides = original_overrides + current_overrides
        # compose a new config from scratch
        cfg = compose(config_name, overrides=overrides)

    # train
    print("Running in ", os.getcwd())
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    my_app()
~/tmp$ python train.py
Running in /home/omry/tmp/outputs/2021-04-19/21-23-13
load_checkpoint: null
batch_size: 16
loss: l2
~/tmp$ python train.py load_checkpoint=/home/omry/tmp/outputs/2021-04-19/21-23-13
Running in /home/omry/tmp/outputs/2021-04-19/21-23-22
load_checkpoint: /home/omry/tmp/outputs/2021-04-19/21-23-13
batch_size: 16
loss: l2
~/tmp$ python train.py load_checkpoint=/home/omry/tmp/outputs/2021-04-19/21-23-13 batch_size=32
Running in /home/omry/tmp/outputs/2021-04-19/21-23-28
load_checkpoint: /home/omry/tmp/outputs/2021-04-19/21-23-13
batch_size: 32
loss: l2

QBS: Explicitly setting qbs.profiles inside Products causing build to fail

My use-case is this:
I have a static library which I want to be available for some profiles (e.g. "gcc", "arm-gcc", "mips-gcc").
I also have an application which links to this library, but this application should only be built using a specific profile (e.g. "arm-gcc").
For this I am modifying the app-and-lib QBS example.
The lib.qbs file:
import qbs 1.0

Product {
    qbs.profiles: ["gcc", "arm-gcc", "mips-gcc"] // I added only this line
    type: "staticlibrary"
    name: "mylib"
    files: [
        "lib.cpp",
        "lib.h",
    ]
    Depends { name: 'cpp' }

    cpp.defines: ['CRUCIAL_DEFINE']

    Export {
        Depends { name: "cpp" }
        cpp.includePaths: [product.sourceDirectory]
    }
}
The app.qbs file:
import qbs 1.0

Product {
    qbs.profiles: ["arm-gcc"] // I added only this line
    type: "application"
    consoleApplication: true
    files: [ "main.cpp" ]
    Depends { name: "cpp" }
    Depends { name: "mylib" }
}
The app build fails. Qbs wrongly tries to link to the "gcc" version of the library instead of the "arm-gcc" version, as you can see in the log:
Build graph does not yet exist for configuration 'default'. Starting from scratch.
Resolving project for configuration default
Setting up build graph for configuration default
Building for configuration default
compiling lib.cpp [mylib {"profile":"gcc"}]
compiling lib.cpp [mylib {"profile":"arm-gcc"}]
compiling lib.cpp [mylib {"profile":"mips-gcc"}]
compiling main.cpp [app]
creating libmylib.a [mylib {"profile":"gcc"}]
creating libmylib.a [mylib {"profile":"mips-gcc"}]
creating libmylib.a [mylib {"profile":"arm-gcc"}]
linking app [app]
ERROR: /usr/bin/arm-linux-gnueabihf-g++ -o /home/user/programs/qbs/usr/local/share/qbs/examples/app-and-lib/default/app.7d104347/app /home/user/programs/qbs/usr/local/share/qbs/examples/app-and-lib/default/app.7d104347/3a52ce780950d4d9/main.cpp.o /home/user/programs/qbs/usr/local/share/qbs/examples/app-and-lib/default/mylib.eyJwcm9maWxlIjoiZ2NjIn0-.792f47ec/libmylib.a
ERROR: /home/user/programs/qbs/usr/local/share/qbs/examples/app-and-lib/default/mylib.eyJwcm9maWxlIjoiZ2NjIn0-.792f47ec/libmylib.a: error adding symbols: File format not recognized
collect2: error: ld returned 1 exit status
ERROR: Process failed with exit code 1.
The following products could not be built for configuration default:
app
The build fails only when a single profile is selected in the app.qbs file and that profile is not the first one listed in the qbs.profiles line of the lib.qbs file.
When two or more profiles are selected, the build succeeds.
My analysis:
I think this problem is related to multiplexing:
The lib.qbs contains more than one profile. This turns on multiplexing when building the library, which, in turn, adds additional 'multiplexConfigurationId' to the build-directory name (moduleloader.cpp).
The app.lib contains only one profile, so multiplexing is not turned on and the build-directory name does not get the extra string.
The problem can be solved by changing the code (moduleloader.cpp) so that multiplexing is turned on even if there is only one profile, i.e. with the following patch:
--- moduleloader.cpp	2018-10-24 16:17:43.633527397 +0300
+++ moduleloader.cpp.new	2018-10-24 16:18:27.541370544 +0300
@@ -872,7 +872,7 @@
             = callWithTemporaryBaseModule<const MultiplexInfo>(dummyContext,
                                                                extractMultiplexInfoFromProduct);
-    if (multiplexInfo.table.size() > 1)
+    if (multiplexInfo.table.size() > 0)
         productItem->setProperty(StringConstants::multiplexedProperty(), VariantValue::trueValue());
     VariantValuePtr productNameValue = VariantValue::create(productName);
@@ -891,7 +891,7 @@
     const QString multiplexConfigurationId = multiplexInfo.toIdString(row);
     const VariantValuePtr multiplexConfigurationIdValue
         = VariantValue::create(multiplexConfigurationId);
-    if (multiplexInfo.table.size() > 1 || aggregator) {
+    if (multiplexInfo.table.size() > 0 || aggregator) {
         multiplexConfigurationIdValues.push_back(multiplexConfigurationIdValue);
         item->setProperty(StringConstants::multiplexConfigurationIdProperty(),
             multiplexConfigurationIdValue);
This worked for my use case. I don't know if it makes sense in a broader view.
Finally, the questions:
Does it all make sense?
Is this a normal behavior?
Is this use-case simply not supported?
Is there a better solution?
Thanks in advance.
Yes, the default behavior with multiplexing is that a non-multiplexed product depends on all variants of the dependency. In general, there is no way for a user to change that behavior, but there should be.
However, luckily for you, profiles are special:
Depends { name: "mylib"; profiles: "arm-gcc" }
This should fix your problem.
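Applied to the app.qbs from the question, that would look roughly like this (only the Depends on mylib changes):

import qbs 1.0

Product {
    qbs.profiles: ["arm-gcc"]
    type: "application"
    consoleApplication: true
    files: [ "main.cpp" ]
    Depends { name: "cpp" }
    // only depend on the arm-gcc variant of the multiplexed library
    Depends { name: "mylib"; profiles: "arm-gcc" }
}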

What is the proper way to organize a Julia source tree?

I am trying to figure the proper way to organize the source tree for a Julia application seqscan. For now I have the following tree:
$ tree seqscan/
seqscan/
├── LICENSE
├── README.md
├── benchmark
├── doc
├── examples
├── src
│   └── seq.jl
└── test
    └── test_seq.jl
5 directories, 4 files
The file seq.jl contains
module SeqScan

module Seq

export SeqEntry

type SeqEntry
    id
    seq
    scores
    seq_type
end

end
end
and test_seq.jl contains:
module TestSeq

using Base.Test

using SeqScan.Seq

@testset "Testing SeqEntry" begin
    @testset "test SeqEntry creation" begin
        seq_entry = SeqEntry("test", "atcg")
        @test seq_entry.id == "test"
        @test seq_entry.seq == "atcg"
    end
end

end
However, running the test code yields an error:
ERROR: LoadError: ArgumentError: Module SeqScan not found in current path.
even after setting the JULIA_LOAD_PATH environment variable to include seqscan or seqscan/src, so I must be doing something wrong?
The name of your package (the root of your local tree) needs to match the name of a file that exists under the src directory. Try this:
SeqScan/
|-- src/
|   |-- SeqScan.jl  (your seq.jl)
I don't know why you are enclosing the module Seq in SeqScan. If there is no important reason to do that, you could access the type more directly. You could remove "module Seq" and the paired "end". Then just "using SeqScan" would bring in the type SeqEntry.
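As a sketch, src/SeqScan.jl without the inner module would then contain just (same fields as in your seq.jl):

module SeqScan

export SeqEntry

type SeqEntry
    id
    seq
    scores
    seq_type
end

end # SeqScan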
The type, SeqEntry, as written knows what to do when given four field values, one for each of the defined fields. If you want to initialize that type with just the first two fields, you need to include a two-argument constructor. For example, assuming seq is a vector of some numeric type, scores is also a vector of that numeric type, and seq_type is a numeric type:
function SeqEntry(id, seq)
    seq_type = typeof(seq[1])
    scores = zeros(seq_type, length(seq))
    return SeqEntry(id, seq, scores, seq_type)
end
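Under those assumptions, constructing an entry from just an id and a numeric sequence would fill in the remaining fields, for example:

entry = SeqEntry("test", [1, 2, 3])
entry.scores    # => [0, 0, 0]
entry.seq_type  # => Int64 (on a 64-bit system)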
An example of a package with internal modules, for Julia v0.5.
The package is named MyPackage.jl; it incorporates two internal modules: TypeModule and FuncModule; each module has its own file: TypeModule.jl and FuncModule.jl.
TypeModule contains a new type, MyType. FuncModule contains a new function, MyFunc, which operates on variable[s] of MyType. There are two forms of that function, a 1-arg and a 2-arg version.
MyPackage uses both internal modules. It incorporates each for immediate use and initializes two variables of MyType. Then MyPackage applies MyFunc to them and prints the results.
I assume Julia's package directory is "/you/.julia/v0.5" (Windows: "c:\you\.julia\v0.5"), and refer to it as PkgDir. You can find the real package directory by typing Pkg.dir() at Julia's interactive prompt. The first thing to do is make sure Julia's internal information is current: > Pkg.update(). Then get a special package called PkgDev: > Pkg.add("PkgDev")
You might start your package on GitHub. If you are starting it locally, you should use PkgDev because it creates the essential package file (and others) using the right structure:
> using PkgDev then > PkgDev.generate("MyPackage","MIT")
This also creates a file, LICENSE.md, with Julia's go-to license. You can keep it, replace it or remove it.
In the directory PkgDir/MyPackage/src, create a subdirectory "internal". In the directory PkgDir/MyPackage/src/internal, create two files, "TypeModule.jl" and "FuncModule.jl", with the following contents:
TypeModule.jl:
module TypeModule

export MyType

type MyType
    value::Int
end

end # TypeModule
FuncModule.jl:
module FuncModule

export MyFunc

#=
  !important!
  TypeModule is included in MyPackage.jl before this module.
  This module gets MyType from MyPackage.jl, not directly.
  Getting it directly would create mismatch of module indirection.
=#
import ..MyPackage: MyType

function MyFunc(x::MyType)
    return x.value + 1
end

function MyFunc(x::MyType, y::MyType)
    return x.value + y.value + 1
end

end # FuncModule
And in the src directory, edit MyPackage.jl so it matches this:
MyPackage.jl:
module MyPackage

export MyType, MyFunc

#=
  !important! Do this before including FuncModule,
  because FuncModule.jl imports MyType from here.
  MyType must be in use before including FuncModule.
=#
include( joinpath("internal", "TypeModule.jl") )
using .TypeModule   # prefix the '.' to use an included module

include( joinpath("internal", "FuncModule.jl") )
using .FuncModule   # prefix the '.' to use an included module

three = MyType(3)
five = MyType(5)

four = MyFunc(three)
eight = MyFunc(three, five)

# show that everything works
println()
println( string("MyFunc(three) = ", four) )
println( string("MyFunc(three, five) = ", eight) )

end # MyPackage
Now, running Julia and entering > using MyPackage should show this:
julia> using MyPackage

MyFunc(three) = 4
MyFunc(three, five) = 9

julia>
