How to set up hydra config to accept a custom enum? - fb-hydra

How do I set up my Hydra config to accept a custom enum? Specifically, I followed the Structured Config Schema tutorial.
I have a dataclass config:
@dataclass_validate
@dataclass
class CustomConfig:
    custom_enum: CustomEnum
With the custom enum:
class CustomEnum(str, Enum):
    ENUM1 = "enum1"
    ENUM2 = "enum2"
Error from running python my_app.py
Error merging 'data/config' with schema
Invalid value 'enum1', expected one of [ENUM1, ENUM2]
full_key: custom_enum
object_type=CustomConfig
Where my_app.py is just:
cs = ConfigStore.instance()
cs.store(name="base_config", node=Config)
cs.store(group="data", name="config", node=CustomConfig)

@hydra.main(config_path=".", config_name="config")
def setup_config(cfg: Config) -> None:
    print(OmegaConf.to_yaml(cfg))
And the config in data/config.yaml is just
custom_enum: enum1

Note the error message: Invalid value 'enum1', expected one of [ENUM1, ENUM2].
OmegaConf converts strings to enum fields by member name, not by value. That is to say, in your data/config.yaml file you should be using ENUM1 instead of enum1.
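A minimal sketch of the name-versus-value behaviour, using plain OmegaConf without Hydra (the default value is added here so the snippet runs standalone):

from dataclasses import dataclass
from enum import Enum

from omegaconf import OmegaConf

class CustomEnum(str, Enum):
    ENUM1 = "enum1"
    ENUM2 = "enum2"

@dataclass
class CustomConfig:
    custom_enum: CustomEnum = CustomEnum.ENUM1

schema = OmegaConf.structured(CustomConfig)
print(OmegaConf.merge(schema, {"custom_enum": "ENUM1"}))  # OK: matches the member name
# OmegaConf.merge(schema, {"custom_enum": "enum1"})       # raises a ValidationError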

Related

Schema validation in Hydra not working when configuration path is parent folder

I have the following project setup:
configs/
├── default.yaml
└── trainings
    ├── data_config
    │   └── default.yaml
    ├── simple.yaml
    └── schema.yaml
The content of the files are as follows:
app.py:
from dataclasses import dataclass
from enum import Enum
from pathlib import Path

from omegaconf import MISSING, DictConfig, OmegaConf

import hydra
from hydra.core.config_store import ConfigStore

CONFIGS_DIR_PATH = Path(__file__).parent / "configs"
TRAININGS_DIR_PATH = CONFIGS_DIR_PATH / "trainings"

class Sampling(Enum):
    UPSAMPLING = 1
    DOWNSAMPLING = 2

@dataclass
class DataConfig:
    sampling: Sampling = MISSING

@dataclass
class TrainerConfig:
    project_name: str = MISSING
    data_config: DataConfig = MISSING

# @hydra.main(version_base="1.2", config_path=CONFIGS_DIR_PATH, config_name="default")
@hydra.main(version_base="1.2", config_path=TRAININGS_DIR_PATH, config_name="simple")
def run(configuration: DictConfig):
    sampling = OmegaConf.to_container(cfg=configuration, resolve=True)["data_config"]["sampling"]
    print(f"{sampling} Type: {type(sampling)}")

def register_schemas():
    config_store = ConfigStore.instance()
    config_store.store(name="base_schema", node=TrainerConfig)

if __name__ == "__main__":
    register_schemas()
    run()
configs/default.yaml:
defaults:
  - /trainings@_here_: simple
  - _self_

project_name: test
configs/trainings/simple.yaml:
defaults:
  - base_schema
  - data_config: default
  - _self_

project_name: test
configs/trainings/schema.yaml:
defaults:
  - data_config: default
  - _self_

project_name: test
configs/trainings/data_config/default.yaml:
defaults:
  - _self_

sampling: DOWNSAMPLING
Now, when I run app.py as shown above, I get the expected result (namely, "DOWNSAMPLING" gets resolved to an enum type). However, when I run the application so that it constructs the configuration from the default.yaml in the parent directory, I get an error.
That is, when the code looks like this:
...

@hydra.main(version_base="1.2", config_path=CONFIGS_DIR_PATH, config_name="default")
# @hydra.main(version_base="1.2", config_path=TRAININGS_DIR_PATH, config_name="simple")
def run(configuration: DictConfig):
    ...
I get the error below:
In 'trainings/simple': Could not load 'trainings/base_schema'.

Config search path:
    provider=hydra, path=pkg://hydra.conf
    provider=main, path=file:///data/code/demos/hydra/configs
    provider=schema, path=structured://

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
I do not understand why specifying the schema to be used is causing this issue. Would someone have an idea why and what could be done to fix the problem?
If you are using defaults lists in more than one config file, I strongly suggest that you fully read and understand The Defaults List page.
Configs addressed in the defaults list are relative to the config group of the containing config.
The error is telling you that Hydra is looking for base_schema in trainings, because the defaults list that loads base_schema is in trainings.
Either put base_schema inside trainings when you register it:
config_store.store(group="trainings", name="base_schema", node=TrainerConfig)
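With this first option, the register_schemas function from the question becomes (only the group argument changes):

def register_schemas():
    config_store = ConfigStore.instance()
    # Registering inside the "trainings" group lets the relative reference
    # "base_schema" in configs/trainings/simple.yaml resolve.
    config_store.store(group="trainings", name="base_schema", node=TrainerConfig)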
Or use absolute addressing in the defaults list when addressing it (e.g. in configs/trainings/simple.yaml):
defaults:
  - /base_schema
  - data_config: default
  - _self_

Deploying flask with sqlite to heroku

When I push my app to Heroku it doesn't use data_base.db, and if I try to add a user to the database I get an error that no database is found:
2021-08-15T13:02:39.226938+00:00 app[web.1]: cursor.execute(statement, parameters)
2021-08-15T13:02:39.226938+00:00 app[web.1]: sqlite3.OperationalError: no such table: users
Does Heroku forbid the use of SQLite, or am I doing something wrong?
This is config.py:
import random
import string

class Config:
    SECRET_KEY = ''.join(random.choice(string.ascii_letters + string.digits) for i in range(60))
    SQLALCHEMY_DATABASE_URI = 'sqlite:///data_base.db'
    SQLALCHEMY_TRACK_MODIFICATIONS = False

class DevelopmentConfig(Config):
    DEBUG = True

class ProductionConfig(Config):
    DEBUG = False

config = {
    'dev':  DevelopmentConfig,
    'prod': ProductionConfig,
}
Replace this
SQLALCHEMY_DATABASE_URI = 'sqlite:///data_base.db'
with
import os

uri = os.getenv("DATABASE_URL")  # set by Heroku when a Postgres add-on is attached
if uri and uri.startswith("postgres://"):
    # SQLAlchemy 1.4+ rejects the legacy "postgres://" scheme that Heroku supplies
    uri = uri.replace("postgres://", "postgresql://", 1)
SQLALCHEMY_DATABASE_URI = uri or 'sqlite:///data_base.db'
This simply tells the app to use PostgreSQL when deployed to Heroku (where DATABASE_URL is set) and to fall back to SQLite locally. SQLite itself is not a good fit for Heroku: a dyno's filesystem is ephemeral, so a data_base.db file written at runtime is discarded on every restart or deploy.
And don't forget to add psycopg2 to the requirements.txt file.
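Note that the no such table: users error will persist until the tables are actually created in the new database. A minimal sketch, assuming a typical Flask-SQLAlchemy layout in which db is your SQLAlchemy() instance and create_app() is your app factory (both names are illustrative, not taken from the question):

# create_tables.py -- run once, e.g. via `heroku run python create_tables.py`
from app import create_app, db  # hypothetical module, factory, and extension names

app = create_app('prod')
with app.app_context():
    db.create_all()  # creates the "users" table (and any others) in Postgres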

How to load Hydra parameters from previous jobs (without having to use argparse and the compose API)?

I'm using Hydra for training machine learning models. It's great for doing complex commands like python train.py data=MNIST batch_size=64 loss=l2. However, if I want to then run the trained model with the same parameters, I have to do something like python reconstruct.py --config_file path_to_previous_job/.hydra/config.yaml. I then use argparse to load in the previous yaml and use the compose API to initialize the Hydra environment. The path to the trained model is inferred from the path to Hydra's .yaml file. If I want to modify one of the parameters, I have to add additional argparse parameters and run something like python reconstruct.py --config_file path_to_previous_job/.hydra/config.yaml --batch_size 128. The code then manually overrides any Hydra parameters with those that were specified on the command line.
What's the right way of doing this?
My current code looks something like the following:
train.py:
import hydra

@hydra.main(config_name="config", config_path="conf")
def main(cfg):
    # [training code using cfg.data, cfg.batch_size, cfg.loss etc.]
    # [code outputs model checkpoint to job folder generated by Hydra]
    ...

main()
reconstruct.py:
import argparse
import os

from hydra.experimental import initialize, compose

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('hydra_config')
    parser.add_argument('--batch_size', type=int)
    # [other flags and parameters I may need to override]
    args = parser.parse_args()

    # Create the Hydra environment.
    initialize()
    cfg = compose(config_name=args.hydra_config)

    # Since checkpoints are stored next to the .hydra, we manually generate the path.
    checkpoint_dir = os.path.dirname(os.path.dirname(args.hydra_config))

    # Manually override any parameters which can be changed on the command line.
    batch_size = args.batch_size if args.batch_size else cfg.data.batch_size

    # [code which uses checkpoint_dir to load the model]
    # [code which uses both batch_size and params in cfg to set up the data etc.]
This is my first time posting, so let me know if I should clarify anything.
If you want to load the previous config as is and not change it, use OmegaConf.load(file_path).
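For that case it is just (using the path from your example):

from omegaconf import OmegaConf

cfg = OmegaConf.load("path_to_previous_job/.hydra/config.yaml")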
If you want to re-compose the config (and it sounds like you do, because you want to override things), I recommend that you use the Compose API and pass in the parameters from the overrides file in the job output directory (next to the stored config.yaml), concatenated with the current run's parameters.
This script seems to be doing the job:
import os
from dataclasses import dataclass
from os.path import join
from typing import Optional

from omegaconf import OmegaConf

import hydra
from hydra import compose
from hydra.core.config_store import ConfigStore
from hydra.core.hydra_config import HydraConfig
from hydra.utils import to_absolute_path

# You can also use a yaml config file instead of this Structured Config
@dataclass
class Config:
    load_checkpoint: Optional[str] = None
    batch_size: int = 16
    loss: str = "l2"

cs = ConfigStore.instance()
cs.store(name="config", node=Config)

@hydra.main(config_path=".", config_name="config")
def my_app(cfg: Config) -> None:
    if cfg.load_checkpoint is not None:
        output_dir = to_absolute_path(cfg.load_checkpoint)
        original_overrides = OmegaConf.load(join(output_dir, ".hydra/overrides.yaml"))
        current_overrides = HydraConfig.get().overrides.task
        hydra_config = OmegaConf.load(join(output_dir, ".hydra/hydra.yaml"))
        # getting the config name from the previous job.
        config_name = hydra_config.hydra.job.config_name
        # concatenating the original overrides with the current overrides
        overrides = original_overrides + current_overrides
        # compose a new config from scratch
        cfg = compose(config_name, overrides=overrides)

    # train
    print("Running in ", os.getcwd())
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    my_app()
~/tmp$ python train.py
Running in /home/omry/tmp/outputs/2021-04-19/21-23-13
load_checkpoint: null
batch_size: 16
loss: l2
~/tmp$ python train.py load_checkpoint=/home/omry/tmp/outputs/2021-04-19/21-23-13
Running in /home/omry/tmp/outputs/2021-04-19/21-23-22
load_checkpoint: /home/omry/tmp/outputs/2021-04-19/21-23-13
batch_size: 16
loss: l2
~/tmp$ python train.py load_checkpoint=/home/omry/tmp/outputs/2021-04-19/21-23-13 batch_size=32
Running in /home/omry/tmp/outputs/2021-04-19/21-23-28
load_checkpoint: /home/omry/tmp/outputs/2021-04-19/21-23-13
batch_size: 32
loss: l2

Key in Configuration: how to list Configurations and Keys?

The sbt in Action book introduces the concept of a Key in a Configuration.
It then lists the default configurations:
Compile
Test
Runtime
IntegrationTest
Q1) Is it possible to print out a list of all Configurations from a sbt session? If not, can I find information on Configurations in the sbt documentation?
Q2) For a particular Configuration, e.g. 'Compile', is it possible to print out a list of Keys for the Configuration from a sbt session? If not, can I find information on a Configuration's Keys in the sbt documentation?
List of all configurations
For this you can use a setting like so:
val allConfs = settingKey[List[String]]("Returns all configurations for the current project")

val root = (project in file("."))
  .settings(
    name := "scala-tests",
    allConfs := {
      configuration.all(ScopeFilter(inAnyProject, inAnyConfiguration)).value.toList
        .map(_.name)
    }
  )
This shows the name of all configurations. You can access more details about each configuration inside the map.
Output from the interactive sbt console:
> allConfs
[info] * provided
[info] * test
[info] * compile
[info] * runtime
[info] * optional
If all you want is to print them you can have a settingKey[Unit] and use println inside the setting definition.
List of all the keys in a configuration
For this we need a task (there might be other ways, but I haven't explored them; in sbt I'm satisfied once something works...) and a parser for the user input.
Joining it all with the above setting, the snippet becomes:
import sbt._
import sbt.Keys._
import complete.DefaultParsers._

val allConfs = settingKey[List[String]]("Returns all configurations for the current project")
val allKeys = inputKey[List[String]]("Prints all keys of a given configuration")

val root = (project in file("."))
  .settings(
    name := "scala-tests",
    allConfs := {
      configuration.all(ScopeFilter(inAnyProject, inAnyConfiguration)).value.toList
        .map(_.name)
    },
    allKeys := {
      val configHints = s"One of: ${
        configuration.all(ScopeFilter(inAnyProject, inAnyConfiguration)).value.toList.mkString(" ")
      }"
      val configs = spaceDelimited(configHints).parsed.map(_.toLowerCase).toSet
      val extracted: Extracted = Project.extract(state.value)
      val l = extracted.session.original.toList
        .filter(set => set.key.scope.config.toOption.map(_.name.toLowerCase)
          .exists(configs.contains))
        .map(_.key.key.label)
      l
    }
  )
Now you can use it like:
$ sbt "allKeys compile"
If you are in interactive mode you can press tab after allKeys to see the prompt:
> allKeys
One of: provided test compile runtime optional
Since allKeys is a task, its output won't appear on the sbt console if you just "return it", but you can print it.

Create a portal_user_catalog and have it used (Plone)

I'm creating a fork of my Plone site (which has not been forked for a long time). This site has a special catalog object for user profiles (a special Archetypes-based object type) which is called portal_user_catalog:
$ bin/instance debug
>>> portal = app.Plone
>>> print [d for d in portal.objectMap() if d['meta_type'] == 'Plone Catalog Tool']
[{'meta_type': 'Plone Catalog Tool', 'id': 'portal_catalog'},
{'meta_type': 'Plone Catalog Tool', 'id': 'portal_user_catalog'}]
This looks reasonable because the user profiles don't have most of the indexes of the "normal" objects, but have a small set of own indexes.
Since I found no way to create this object from scratch, I exported it from the old site (as portal_user_catalog.zexp) and imported it into the new site. This seemed to work, but I can't add objects to the imported catalog, not even by explicitly calling the catalog_object method. Instead, the user profiles are added to the standard portal_catalog.
Now I found a module in my product which seems to serve the purpose (Products/myproduct/exportimport/catalog.py):
"""Catalog tool setup handlers.
$Id: catalog.py 77004 2007-06-24 08:57:54Z yuppie $
"""
from Products.GenericSetup.utils import exportObjects
from Products.GenericSetup.utils import importObjects
from Products.CMFCore.utils import getToolByName
from zope.component import queryMultiAdapter
from Products.GenericSetup.interfaces import IBody
def importCatalogTool(context):
"""Import catalog tool.
"""
site = context.getSite()
obj = getToolByName(site, 'portal_user_catalog')
parent_path=''
if obj and not obj():
importer = queryMultiAdapter((obj, context), IBody)
path = '%s%s' % (parent_path, obj.getId().replace(' ', '_'))
__traceback_info__ = path
print [importer]
if importer:
print importer.name
if importer.name:
path = '%s%s' % (parent_path, 'usercatalog')
print path
filename = '%s%s' % (path, importer.suffix)
print filename
body = context.readDataFile(filename)
if body is not None:
importer.filename = filename # for error reporting
importer.body = body
if getattr(obj, 'objectValues', False):
for sub in obj.objectValues():
importObjects(sub, path+'/', context)
def exportCatalogTool(context):
"""Export catalog tool.
"""
site = context.getSite()
obj = getToolByName(site, 'portal_user_catalog', None)
if tool is None:
logger = context.getLogger('catalog')
logger.info('Nothing to export.')
return
parent_path=''
exporter = queryMultiAdapter((obj, context), IBody)
path = '%s%s' % (parent_path, obj.getId().replace(' ', '_'))
if exporter:
if exporter.name:
path = '%s%s' % (parent_path, 'usercatalog')
filename = '%s%s' % (path, exporter.suffix)
body = exporter.body
if body is not None:
context.writeDataFile(filename, body, exporter.mime_type)
if getattr(obj, 'objectValues', False):
for sub in obj.objectValues():
exportObjects(sub, path+'/', context)
I tried to use it, but I have no idea how it is supposed to be done;
I can't call it TTW (should I try to publish the methods?!).
I tried it in a debug session:
$ bin/instance debug
>>> portal = app.Plone
>>> from Products.myproduct.exportimport.catalog import exportCatalogTool
>>> exportCatalogTool(portal)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File ".../Products/myproduct/exportimport/catalog.py", line 58, in exportCatalogTool
    site = context.getSite()
AttributeError: getSite
So, if this is the way to go, it looks like I need a "real" context.
Update: To get this context, I tried an External Method:
# -*- coding: utf-8 -*-
from Products.myproduct.exportimport.catalog import exportCatalogTool
from pdb import set_trace


def p(dt, dd):
    print '%-16s%s' % (dt + ':', dd)


def main(self):
    """
    Export the portal_user_catalog
    """
    g = globals()
    print '#' * 79
    for a in ('__package__', '__module__'):
        if a in g:
            p(a, g[a])
    p('self', self)
    set_trace()
    exportCatalogTool(self)
However, when I called it, I got the same <PloneSite at /Plone> object as the argument to the main function, which doesn't have the getSite attribute. Perhaps my site doesn't call such External Methods correctly?
Or would I need to mention this module somehow in my configure.zcml, and if so, how? I searched my directory tree (especially below Products/myproduct/profiles) for exportimport, the module name, and several other strings, but I couldn't find anything; perhaps there was an integration once, but it broke ...
So how do I make this portal_user_catalog work?
Thank you!
Update: Another debug session suggests that the source of the problem is some transaction matter:

>>> portal = app.Plone
>>> puc = portal.portal_user_catalog
>>> puc._catalog()
[]
>>> profiles_folder = portal.some_folder_with_profiles
>>> for o in profiles_folder.objectValues():
...     puc.catalog_object(o)
...
>>> puc._catalog()
[<Products.ZCatalog.Catalog.mybrains object at 0x69ff8d8>, ...]
This population of the portal_user_catalog doesn't persist; after termination of the debug session and starting fg, the brains are gone.
It looks like the problem was indeed related to transactions.
I had

import transaction
...
class Browser(BrowserView):
    ...
    def processNewUser(self):
        ....
        transaction.commit()

before, but apparently this was not good enough (and/or perhaps it was not done correctly).
Now I start the transaction explicitly with transaction.begin(), save intermediate results with transaction.savepoint(), abort the transaction explicitly with transaction.abort() in case of errors (try / except), and have exactly one transaction.commit() at the end, in the case of success. Everything seems to work.
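A minimal sketch of that pattern (the body of processNewUser is illustrative; only the transaction calls are taken from the description above):

import transaction

def processNewUser(self):
    transaction.begin()
    try:
        # ... create and populate the user profile object ...
        transaction.savepoint()  # save intermediate results
        # ... catalog it via portal_user_catalog.catalog_object(...) ...
    except Exception:
        transaction.abort()      # roll everything back on errors
        raise
    else:
        transaction.commit()     # exactly one commit, in the case of success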
Of course, Plone still doesn't take this non-standard catalog into account; when I "clear and rebuild" it, it is empty afterwards. But for my application it works well enough.
