How to access all file names in hydra config - fb-hydra

I have a directory containing a bunch of txt files:
dir/train/[train1.txt, train2.txt, train3.txt]
I'm able to read a single file if I define the following in config.yaml:
file_name: ${paths.data_dir}/train/train1.txt
This gives me a str, which I pass to np.loadtxt(self.hparams.file_name).
To read all of the files, I tried
file_name: ${paths.data_dir}/train/*
expecting to get a List[str], and then looped over file_name:
dat = []
for file in self.hparams.file_name:
    dat.append(np.loadtxt(file))
but it didn't work.

You could define an OmegaConf custom resolver for this:
# my_app.py
import pathlib
from pathlib import Path
from typing import List
from omegaconf import OmegaConf
yaml_data = """
paths:
  data_dir: dir
file_names: ${pathlib_glob:${paths.data_dir}, 'train/*'}
"""
def pathlib_glob(data_dir: str, glob_pattern: str) -> List[str]:
    """Use Pathlib glob to get a list of filenames"""
    data_dir_path = pathlib.Path(data_dir)
    file_paths: List[Path] = [p for p in data_dir_path.glob(glob_pattern)]
    filenames: List[str] = [str(p) for p in file_paths]
    return filenames
OmegaConf.register_new_resolver("pathlib_glob", pathlib_glob)
cfg = OmegaConf.create(yaml_data)
# glob does not guarantee any particular order, so compare sorted lists
assert sorted(cfg.file_names) == ['dir/train/train1.txt', 'dir/train/train2.txt', 'dir/train/train3.txt']
Now, at the command line:
mkdir -p dir/train
touch dir/train/train1.txt
touch dir/train/train2.txt
touch dir/train/train3.txt
python my_app.py # the assertion passes
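As an alternative to a resolver, here is a minimal sketch of expanding the pattern in Python. YAML does not expand wildcards, so a value such as dir/train/* arrives in the code as a plain string, and looping over it yields single characters rather than file names. The helper name expand_pattern is hypothetical (it is not part of Hydra or OmegaConf), and the temporary directory only stands in for dir/train.

```python
import glob
import os
import tempfile
from typing import List


def expand_pattern(pattern: str) -> List[str]:
    """Expand a shell-style pattern into a sorted list of matching paths.

    Hypothetical helper for illustration; the pattern would come from the
    config, e.g. the string stored under file_name.
    """
    # glob.glob returns matches in arbitrary order, so sort for reproducibility
    return sorted(glob.glob(pattern))


# Demo in a temporary directory standing in for dir/train
with tempfile.TemporaryDirectory() as data_dir:
    for name in ("train2.txt", "train1.txt", "train3.txt"):
        open(os.path.join(data_dir, name), "w").close()
    pattern = os.path.join(data_dir, "*")
    # Looping over the raw string yields characters, not paths
    chars = [c for c in pattern]
    file_names = [os.path.basename(p) for p in expand_pattern(pattern)]

print(file_names)
```

Each path returned by expand_pattern could then be passed to np.loadtxt as in the question.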

Related

How to load Hydra parameters from previous jobs (without having to use argparse and the compose API)?

I'm using Hydra for training machine learning models. It's great for doing complex commands like python train.py data=MNIST batch_size=64 loss=l2. However, if I want to then run the trained model with the same parameters, I have to do something like python reconstruct.py --config_file path_to_previous_job/.hydra/config.yaml. I then use argparse to load in the previous yaml and use the compose API to initialize the Hydra environment. The path to the trained model is inferred from the path to Hydra's .yaml file. If I want to modify one of the parameters, I have to add additional argparse parameters and run something like python reconstruct.py --config_file path_to_previous_job/.hydra/config.yaml --batch_size 128. The code then manually overrides any Hydra parameters with those that were specified on the command line.
What's the right way of doing this?
My current code looks something like the following:
train.py:
import hydra
@hydra.main(config_name="config", config_path="conf")
def main(cfg):
    # [training code using cfg.data, cfg.batch_size, cfg.loss etc.]
    # [code outputs model checkpoint to job folder generated by Hydra]
main()
reconstruct.py:
import argparse
import os
from hydra.experimental import initialize, compose
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('hydra_config')
    parser.add_argument('--batch_size', type=int)
    # [other flags and parameters I may need to override]
    args = parser.parse_args()
    # Create the Hydra environment.
    initialize()
    cfg = compose(config_name=args.hydra_config)
    # Since checkpoints are stored next to the .hydra, we manually generate the path.
    checkpoint_dir = os.path.dirname(os.path.dirname(args.hydra_config))
    # Manually override any parameters which can be changed on the command line.
    batch_size = args.batch_size if args.batch_size else cfg.data.batch_size
    # [code which uses checkpoint_dir to load the model]
    # [code which uses both batch_size and params in cfg to set up the data etc.]
This is my first time posting, so let me know if I should clarify anything.
If you want to load the previous config as is and not change it, use OmegaConf.load(file_path).
If you want to re-compose the config (and it sounds like you do, since you mention wanting to override things), I recommend using the Compose API: pass in the parameters from the overrides file in the job output directory (next to the stored config.yaml), concatenated with the overrides from the current run.
This script seems to be doing the job:
import os
from dataclasses import dataclass
from os.path import join
from typing import Optional
from omegaconf import OmegaConf
import hydra
from hydra import compose
from hydra.core.config_store import ConfigStore
from hydra.core.hydra_config import HydraConfig
from hydra.utils import to_absolute_path
# You can also use a yaml config file instead of this Structured Config
@dataclass
class Config:
    load_checkpoint: Optional[str] = None
    batch_size: int = 16
    loss: str = "l2"
cs = ConfigStore.instance()
cs.store(name="config", node=Config)
@hydra.main(config_path=".", config_name="config")
def my_app(cfg: Config) -> None:
    if cfg.load_checkpoint is not None:
        output_dir = to_absolute_path(cfg.load_checkpoint)
        original_overrides = OmegaConf.load(join(output_dir, ".hydra/overrides.yaml"))
        current_overrides = HydraConfig.get().overrides.task
        hydra_config = OmegaConf.load(join(output_dir, ".hydra/hydra.yaml"))
        # getting the config name from the previous job.
        config_name = hydra_config.hydra.job.config_name
        # concatenating the original overrides with the current overrides
        overrides = original_overrides + current_overrides
        # compose a new config from scratch
        cfg = compose(config_name, overrides=overrides)
    # train
    print("Running in ", os.getcwd())
    print(OmegaConf.to_yaml(cfg))
if __name__ == "__main__":
    my_app()
~/tmp$ python train.py
Running in /home/omry/tmp/outputs/2021-04-19/21-23-13
load_checkpoint: null
batch_size: 16
loss: l2
~/tmp$ python train.py load_checkpoint=/home/omry/tmp/outputs/2021-04-19/21-23-13
Running in /home/omry/tmp/outputs/2021-04-19/21-23-22
load_checkpoint: /home/omry/tmp/outputs/2021-04-19/21-23-13
batch_size: 16
loss: l2
~/tmp$ python train.py load_checkpoint=/home/omry/tmp/outputs/2021-04-19/21-23-13 batch_size=32
Running in /home/omry/tmp/outputs/2021-04-19/21-23-28
load_checkpoint: /home/omry/tmp/outputs/2021-04-19/21-23-13
batch_size: 32
loss: l2
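The script above concatenates the two override lists, so the same key can appear twice (for example batch_size from the previous job and again from the current run). If that is a concern, one option is to collapse the list before calling compose. The sketch below is only an illustration: merge_overrides is a hypothetical helper, it assumes simple key=value overrides (optionally prefixed with + or ~), and it makes no claim about how Hydra itself treats repeated keys.

```python
from typing import Dict, List


def merge_overrides(original: List[str], current: List[str]) -> List[str]:
    """Keep only the last override seen for each key.

    Hypothetical helper; handles plain key=value strings only.
    """
    merged: Dict[str, str] = {}
    for override in list(original) + list(current):
        # The key is everything before the first "=", without +/~ prefixes
        key = override.split("=", 1)[0].lstrip("+~")
        # Remove and re-insert so the latest occurrence also moves to the end
        merged.pop(key, None)
        merged[key] = override
    return list(merged.values())


previous_run = ["data=MNIST", "batch_size=64", "loss=l2"]
this_run = ["batch_size=128"]
overrides = merge_overrides(previous_run, this_run)
print(overrides)
```

The merged list could then be passed as the overrides argument to compose in place of the plain concatenation.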

Persian URL not found in django/passenger

sample Url:
site.com/category/%D9%81%D8%AA%D9%88%DA%AF%D8%B1%D8%A7%D9%81%DB%8C_%D8%B1%DB%8C%D9%86%D9%88%D9%BE%D9%84%D8%A7%D8%B3%D8%AA%DB%8C/
url config:
url(r'^category/(?P<page_slug>.*)/$', views.category, name='category'),
passenger config:
import imp
import os
import sys
sys.path.insert(0, os.path.dirname(__file__))
wsgi = imp.load_source('wsgi', 'photography/wsgi.py')
application = wsgi.application
wsgi config:
"""
WSGI config for photography project.
It exposes the WSGI callable as a module-level variable named ``application``.
For more information on this file, see
https://docs.djangoproject.com/en/1.10/howto/deployment/wsgi/
"""
import os
from django.core.wsgi import get_wsgi_application
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "photography.settings")
#os.environ["DJANGO_SETTINGS_MODULE"] = "photography.settings"
application = get_wsgi_application()
but the response for this URL is 404 Not Found!
The problem occurs for every URL with a Persian slug.
When I change the wsgi config to unquote the path:
from urllib.parse import unquote
def application(environ, start_fn):
    environ['PATH_INFO'] = unquote(environ['PATH_INFO'])
    app = get_wsgi_application()
    print(environ)
    return app(environ, start_fn)
the URL changes into:
site.com/category/%C3%99%C2%81%C3%98%C2%AA%C3%99%C2%88%C3%9A%C2%AF%C3%98%C2%B1%C3%98%C2%A7%C3%99/tag/%D9%81%D8%AA%D9%88%DA%AF%D8%B1%D8%A7%D9%81%DB%8C-%D9%BE%D8%B2%D8%B4%DA%A9%DB%8C/
But the problem is still not solved!
I applied all the changes I found by searching, but the problem remains.
Output from one of the changes in the wsgi file:
App 3585081 output: set_script_prefix(get_script_name(environ))
App 3585081 output: File "/home/sepandteb/virtualenv/sepandteb/3.5/lib/python3.5/site-packages/django/core/handlers/wsgi.py", line 210, in get_script_name
App 3585081 output: return script_name.decode(UTF_8)
App 3585081 output: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 27: unexpected end of data
Use encode('utf-8') for Persian characters and put # -*- coding: utf-8 -*- at the top of the file.
For example, use it when you save data to the DB or read data from the DB,
i.e.:
slug.encode('utf-8')
You can use uri_to_iri in the view, for example:
from django.shortcuts import get_object_or_404
from django.utils.encoding import uri_to_iri
def blog_detail(request, slug):
    post = get_object_or_404(Post, slug=uri_to_iri(slug))
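The doubled URL in the question is consistent with UTF-8 bytes being read as Latin-1 and then percent-encoded a second time. The sketch below reproduces that effect with the standard library only, using the first three characters of the sample slug. It assumes the server hands over the path following the WSGI convention (PEP 3333) of a str holding Latin-1-decoded bytes, which may not match every Passenger setup.

```python
from urllib.parse import quote, unquote

# First three characters of the sample slug from the question
encoded = "%D9%81%D8%AA%D9%88"

# Percent-decoding assumes UTF-8 by default and gives the real Persian text
slug = unquote(encoded)

# WSGI-style path: the same UTF-8 bytes, but stored in a str as Latin-1
wsgi_path = slug.encode("utf-8").decode("latin-1")

# Reversing that step recovers the original text
recovered = wsgi_path.encode("latin-1").decode("utf-8")

# Percent-encoding the Latin-1 form instead yields the doubled sequence
double_encoded = quote(wsgi_path)

print(recovered == slug)
print(double_encoded)
```

The value of double_encoded starts with %C3%99%C2%81, the same prefix as the broken URL in the question, which suggests the path went through a decode/encode mismatch of this kind.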

Negative pattern in Pathlib.rglob() function

I need to find all Python files in a folder, excluding __init__.py.
My first attempt was
import re
search_path.rglob(re.compile("(?!__init__).*.py"))
This code fails, so I ended up with:
filter(
    lambda path: '__init__.py' != path.name and path.name.endswith('.py') and path.is_file(),
    search_path.rglob("*.py")
)
It looks like rglob does not support Python regexps.
Why?
Does rglob support negative patterns?
Can this code be more elegant?
I need something very much like this too. I came up with this:
import pathlib
search_path = pathlib.Path.cwd() / "test_folder"
for file in filter(lambda item: item.name != "__init__.py", search_path.rglob("./*.py")):
    print(f"{file}")
Alternatively, you can use fnmatch.
import pathlib
import fnmatch
search_path = pathlib.Path.cwd() / "test_folder"
for file in search_path.rglob("./*.py"):
    if not fnmatch.fnmatch(file.name, "__init__.py"):
        print(f"{file}")
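On the "why": rglob takes a shell-style glob pattern as a string, not a compiled regex, and glob syntax has no way to negate a whole file name, so the exclusion has to happen after matching. A compact sketch with a generator expression follows; python_files_except_init is a hypothetical helper name, and the temporary directory exists only to make the example runnable.

```python
import pathlib
import tempfile
from typing import List


def python_files_except_init(root: pathlib.Path) -> List[pathlib.Path]:
    """Return all *.py files under root, skipping __init__.py (hypothetical helper)."""
    return sorted(
        path
        for path in root.rglob("*.py")
        if path.is_file() and path.name != "__init__.py"
    )


# Build a small throwaway tree to search
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "pkg").mkdir()
    for rel in ("pkg/__init__.py", "pkg/module.py", "main.py", "notes.txt"):
        (root / rel).touch()
    found = [p.relative_to(root).as_posix() for p in python_files_except_init(root)]

print(found)
```

If regex filtering is really wanted, the same structure works with re.fullmatch applied to path.name inside the condition.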

OCI (Oracle Cloud) Child Compartment

Question: Here is Python code for OCI (Oracle Cloud). It can create a bucket, and upload and download files, in a bucket in the root compartment, but I am not able to do the same in a sub-compartment. The sub-compartment is "My_Sub_Compartment".
Please advise how to fix it.
import os
import oci
import io
from oci.config import from_file
data_dir = "D:\\DataScienceAndStats\\artificialintelligence\\CS223A"
files_to_process = [file for file in os.listdir(data_dir) if file.endswith('txt')]
bucket_name = "Sales_Data"
# this is to configure the oci configuration file
my_config = from_file(file_location="C:\\Users\\amits\\Desktop\\Oracle_Cloud\\config_file_oci.txt")
print(my_config)
# Test Configuration file of oci
# print(validate_config(my_config))
"""
Create object storage client and get its namespace
"""
object_storage_client = oci.object_storage.ObjectStorageClient(my_config)
namespace = object_storage_client.get_namespace().data
"""
Create a bucket if it does not exist
"""
try:
    create_bucket_response = object_storage_client.create_bucket(
        namespace,
        oci.object_storage.models.CreateBucketDetails(name=bucket_name, compartment_id=my_config['tenancy']))
except Exception as e:
    print("Please read below messages")
    print(e.message)
    print(e.status)
"""
Uploading the files
"""
print("uploading files to bucket")
for upload_file in files_to_process:
    print('Uploading file {}'.format(upload_file))
    object_storage_client.put_object(namespace, bucket_name, upload_file,
                                     io.open(os.path.join(data_dir, upload_file), 'rb'))
"""
Listing a files in the Bucket
"""
object_list = object_storage_client.list_objects(namespace, bucket_name)
for o in object_list.data.objects:
    print(o.name)
"""
Downloading files from Bucket
"""
object_name = "1.txt"
destination_dir = 'D:\\DataScienceAndStats\\artificialintelligence\\CS223A\\moved_files'.format(object_name)
get_obj = object_storage_client.get_object(namespace, bucket_name, object_name)
with open(os.path.join(destination_dir, object_name), 'wb') as f:
    for chunk in get_obj.data.raw.stream(1024 * 1024, decode_content=False):
        f.write(chunk)
In the create_bucket call you should provide the sub-compartment's OCID instead of my_config['tenancy']. Have you tried that? my_config['tenancy'] always refers to the root compartment.
If you add compartment=ocid1.compartment.oc1..[...] (whatever OCID the target compartment has)
to your config_file_oci.txt and replace my_config['tenancy'] with my_config['compartment'], it should work.
You cannot have two buckets with the same name in the same namespace/tenancy. Have you deleted the one you already created?
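The suggestion above to add a compartment entry to the config file could be combined with a fallback to the tenancy, roughly as sketched here. This uses only the standard library to stand in for the parsed config: resolve_compartment_id is a hypothetical helper, the OCIDs are made-up placeholders, and whether oci.config.from_file passes an extra compartment key through unchanged is taken from the answer rather than verified.

```python
import configparser
from typing import Mapping

# Placeholder config in the same INI style as config_file_oci.txt
SAMPLE_CONFIG = """
[DEFAULT]
tenancy=ocid1.tenancy.oc1..exampletenancy
compartment=ocid1.compartment.oc1..examplecompartment
"""


def resolve_compartment_id(config: Mapping[str, str]) -> str:
    """Prefer an explicit 'compartment' entry, fall back to the tenancy (root)."""
    return config.get("compartment") or config["tenancy"]


parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONFIG)
config = dict(parser["DEFAULT"])

# With a compartment entry the sub-compartment is used
compartment_id = resolve_compartment_id(config)
# Without one, the root compartment (tenancy) is used, as in the question
root_only = resolve_compartment_id({"tenancy": "ocid1.tenancy.oc1..exampletenancy"})

print(compartment_id)
print(root_only)
```

The resolved value would then be passed as compartment_id to CreateBucketDetails in place of my_config['tenancy'].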

Python Requests logging to file

How can I configure requests to log its get or post calls to a file?
my_config = {'verbose': sys.stderr}
requests.get('http://httpbin.org/headers', config=my_config)
What should I use in verbose?
Have you tried simply opening a file?
>>> import sys
>>> type(sys.stderr)
file
>>> f = open('test.log', 'w')
>>> type(f)
file
So the example above will look like this:
my_config = { 'verbose': open('/path/to/file', 'w') }
requests.get('http://httpbin.org/headers', config = my_config)
HTH
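The config argument shown above belongs to early requests releases and was dropped in later ones. For current versions, a common approach is to route the standard logging output to a file. The sketch below assumes that requests emits its connection-level messages through the urllib3 logger; log_urllib3_to_file is a hypothetical helper, and a manually logged record stands in for a real request so the example runs without network access.

```python
import logging
import os
import tempfile


def log_urllib3_to_file(path: str) -> logging.Handler:
    """Send DEBUG records from the 'urllib3' logger to a file (hypothetical helper)."""
    handler = logging.FileHandler(path, mode="w", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger = logging.getLogger("urllib3")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return handler


log_path = os.path.join(tempfile.mkdtemp(), "requests.log")
handler = log_urllib3_to_file(log_path)

# A call such as requests.get("http://httpbin.org/headers") would go here;
# this manual record on a child logger stands in for it
logging.getLogger("urllib3.connectionpool").debug("demo record")
handler.flush()

with open(log_path, encoding="utf-8") as f:
    log_text = f.read()

print(log_text)
```

This records connection-level events only; request and response bodies are not part of that output.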
