Why does unittest.mock.mock_add_spec remove existing mock methods?

Why do existing methods of a mock disappear after calling unittest.mock.mock_add_spec?
from unittest.mock import Mock

m = Mock(spec=str)
print(hasattr(m, 'lower'))  # True
m.mock_add_spec(['foo'])
print(hasattr(m, 'lower'))  # False
print(hasattr(m, 'foo'))    # True
From the docs:
Only attributes on the spec can be fetched as attributes from the
mock.
TBH I don't quite get this line, but surely a method with "add" in its name should not shrink the set of available methods :)
Python version:
Python 3.10.9 (main, Dec 7 2022, 01:12:00) [GCC 9.4.0] on linux
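For context: despite its name, mock_add_spec replaces the mock's existing spec rather than extending it, so only the names in the new spec remain accessible. A minimal sketch of a workaround, assuming you want both the str attributes and the extra name foo from the question:

from unittest.mock import Mock

m = Mock(spec=str)
# mock_add_spec *replaces* the spec, so pass every name you still need:
m.mock_add_spec(dir(str) + ['foo'])
print(hasattr(m, 'lower'))  # True -- 'lower' is in the new spec
print(hasattr(m, 'foo'))    # True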

Related

How to load Hydra parameters from previous jobs (without having to use argparse and the compose API)?

I'm using Hydra for training machine learning models. It's great for doing complex commands like python train.py data=MNIST batch_size=64 loss=l2. However, if I want to then run the trained model with the same parameters, I have to do something like python reconstruct.py --config_file path_to_previous_job/.hydra/config.yaml. I then use argparse to load in the previous yaml and use the compose API to initialize the Hydra environment. The path to the trained model is inferred from the path to Hydra's .yaml file.

If I want to modify one of the parameters, I have to add additional argparse parameters and run something like python reconstruct.py --config_file path_to_previous_job/.hydra/config.yaml --batch_size 128. The code then manually overrides any Hydra parameters with those that were specified on the command line.
What's the right way of doing this?
My current code looks something like the following:
train.py:
import hydra

@hydra.main(config_name="config", config_path="conf")
def main(cfg):
    # [training code using cfg.data, cfg.batch_size, cfg.loss etc.]
    # [code outputs model checkpoint to job folder generated by Hydra]
    ...

main()
reconstruct.py:
import argparse
import os

from hydra.experimental import initialize, compose

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('hydra_config')
    parser.add_argument('--batch_size', type=int)
    # [other flags and parameters I may need to override]
    args = parser.parse_args()

    # Create the Hydra environment.
    initialize()
    cfg = compose(config_name=args.hydra_config)

    # Since checkpoints are stored next to .hydra, we manually generate the path.
    checkpoint_dir = os.path.dirname(os.path.dirname(args.hydra_config))

    # Manually override any parameters which can be changed on the command line.
    batch_size = args.batch_size if args.batch_size else cfg.data.batch_size

    # [code which uses checkpoint_dir to load the model]
    # [code which uses both batch_size and params in cfg to set up the data etc.]
This is my first time posting, so let me know if I should clarify anything.
If you want to load the previous config as is and not change it, use OmegaConf.load(file_path).
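For example, a minimal sketch (the output path is hypothetical; adjust it to your job directory):

from omegaconf import OmegaConf

# Load the config exactly as the previous job saved it.
cfg = OmegaConf.load("outputs/2021-04-19/21-23-13/.hydra/config.yaml")
print(cfg.batch_size)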
If you want to re-compose the config (and it sounds like you do, because you said you want to override things), I recommend that you use the Compose API and pass in the parameters from the overrides file in the job output directory (next to the stored config.yaml), concatenated with the current run's parameters.
This script seems to be doing the job:
import os
from dataclasses import dataclass
from os.path import join
from typing import Optional

from omegaconf import OmegaConf

import hydra
from hydra import compose
from hydra.core.config_store import ConfigStore
from hydra.core.hydra_config import HydraConfig
from hydra.utils import to_absolute_path

# You can also use a yaml config file instead of this Structured Config
@dataclass
class Config:
    load_checkpoint: Optional[str] = None
    batch_size: int = 16
    loss: str = "l2"

cs = ConfigStore.instance()
cs.store(name="config", node=Config)

@hydra.main(config_path=".", config_name="config")
def my_app(cfg: Config) -> None:
    if cfg.load_checkpoint is not None:
        output_dir = to_absolute_path(cfg.load_checkpoint)
        original_overrides = OmegaConf.load(join(output_dir, ".hydra/overrides.yaml"))
        current_overrides = HydraConfig.get().overrides.task
        hydra_config = OmegaConf.load(join(output_dir, ".hydra/hydra.yaml"))
        # getting the config name from the previous job.
        config_name = hydra_config.hydra.job.config_name
        # concatenating the original overrides with the current overrides
        overrides = original_overrides + current_overrides
        # compose a new config from scratch
        cfg = compose(config_name, overrides=overrides)

    # train
    print("Running in ", os.getcwd())
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    my_app()
~/tmp$ python train.py
Running in /home/omry/tmp/outputs/2021-04-19/21-23-13
load_checkpoint: null
batch_size: 16
loss: l2
~/tmp$ python train.py load_checkpoint=/home/omry/tmp/outputs/2021-04-19/21-23-13
Running in /home/omry/tmp/outputs/2021-04-19/21-23-22
load_checkpoint: /home/omry/tmp/outputs/2021-04-19/21-23-13
batch_size: 16
loss: l2
~/tmp$ python train.py load_checkpoint=/home/omry/tmp/outputs/2021-04-19/21-23-13 batch_size=32
Running in /home/omry/tmp/outputs/2021-04-19/21-23-28
load_checkpoint: /home/omry/tmp/outputs/2021-04-19/21-23-13
batch_size: 32
loss: l2

Run the same method on a list of instances in pathos.multiprocessing

I am working on a traveling salesman problem. Given that all agents traverse the same graph to find their own paths separately, I am trying to parallelize the agents' path-finding. The task: in each iteration, all agents start from a start node to find their paths, and all paths are collected to find the best path of the current iteration.
I am using pathos.multiprocessing.
The agent class has a traversal method:
class Agent:
    def find_a_path(self, graph):
        # here is the logic to find a path by traversing the graph
        return found_path
I create a helper function to wrap the method:
def do_agent_find_a_path(agent, graph):
    return agent.find_a_path(graph)
Then I create a pool and call amap, passing the helper function, a list of agent instances, and the same graph:
from pathos.multiprocessing import ProcessPool

pool = ProcessPool(nodes=10)
res = pool.amap(do_agent_find_a_path, agents, [graph] * len(agents))
But the processes are created in sequence and it runs very slowly. I'd like some guidance on a correct/decent way to leverage pathos in this situation.
thank you!
UPDATE:
I am using pathos 0.2.3 on Ubuntu:
Name: pathos
Version: 0.2.3
Summary: parallel graph management and execution in heterogeneous computing
Home-page: https://pypi.org/project/pathos
Author: Mike McKerns
I get the following error with the ThreadPool sample code:
>>> import pathos
>>> pathos.pools.ThreadPool().iumap(lambda x: x*x, [1,2,3,4])
Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-f8f5e7774646>", line 1, in <module>
    pathos.pools.ThreadPool().iumap(lambda x: x*x, [1,2,3,4])
AttributeError: 'ThreadPool' object has no attribute 'iumap'
I'm the pathos author. I'm not sure how long your method takes to run, but from your comments, I'm going to assume not very long. I'd suggest that, if the method is "fast", you use a ThreadPool instead. Also, if you don't need to preserve the order of the results, the fastest map is typically uimap (unordered, iterative map).
>>> class Agent:
...     def basepath(self, dirname):
...         import os
...         return os.path.basename(dirname)
...     def slowpath(self, dirname):
...         import time
...         time.sleep(.2)
...         return self.basepath(dirname)
...
>>> a = Agent()
>>> import pathos.pools as pp
>>> dirs = ['/tmp/foo', '/var/path/bar', '/root/bin/bash', '/tmp/foo/bar']
>>> import time
>>> p = pp.ProcessPool()
>>> go = time.time(); tuple(p.uimap(a.basepath, dirs)); print(time.time()-go)
('foo', 'bar', 'bash', 'bar')
0.006751060485839844
>>> p.close(); p.join(); p.clear()
>>> t = pp.ThreadPool(4)
>>> go = time.time(); tuple(t.uimap(a.basepath, dirs)); print(time.time()-go)
('foo', 'bar', 'bash', 'bar')
0.0005156993865966797
>>> t.close(); t.join(); t.clear()
and, just to compare against something that takes a bit longer...
>>> t = pp.ThreadPool(4)
>>> go = time.time(); tuple(t.uimap(a.slowpath, dirs)); print(time.time()-go)
('bar', 'bash', 'bar', 'foo')
0.2055649757385254
>>> t.close(); t.join(); t.clear()
>>> p = pp.ProcessPool()
>>> go = time.time(); tuple(p.uimap(a.slowpath, dirs)); print(time.time()-go)
('foo', 'bar', 'bash', 'bar')
0.2084510326385498
>>> p.close(); p.join(); p.clear()
>>>
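Applied to the question's setup, a rough sketch (Agent, agents, graph, and do_agent_find_a_path are the names from the question; path_cost is a hypothetical cost function):

import pathos.pools as pp

# uimap is unordered and iterative: paths can be consumed as soon as
# each worker finishes, instead of waiting for the slowest agent.
pool = pp.ProcessPool(nodes=10)
paths = list(pool.uimap(do_agent_find_a_path, agents, [graph] * len(agents)))
pool.close(); pool.join(); pool.clear()
best_path = min(paths, key=path_cost)  # path_cost is hypothetical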

Retrieve Github repository name using GitPython

Is there a way to get the repository name using GitPython?
repo = git.Repo.clone_from(repoUrl, ".", branch=branch)
I can't seem to find any properties attached to the repo object which has this information. It might be that I misunderstand how github/GitPython works.
A simple, compact, and robust approach that works with remote .git repos:
from git import Repo
repo = Repo(repo_path)
# For remote repositories
repo_name = repo.remotes.origin.url.split('.git')[0].split('/')[-1]
# For local repositories
repo_name = repo.working_tree_dir.split("/")[-1]
May I suggest:
import os

remote_url = repo.remotes[0].config_reader.get("url")  # e.g. 'https://github.com/abc123/MyRepo.git'
os.path.splitext(os.path.basename(remote_url))[0]  # 'MyRepo'
I don't think there is a way to do it. However, I built this function to retrieve the repository name given a URL (you can see it in action here):
def get_repo_name_from_url(url: str) -> str:
    last_slash_index = url.rfind("/")
    last_suffix_index = url.rfind(".git")
    if last_suffix_index < 0:
        last_suffix_index = len(url)
    if last_slash_index < 0 or last_suffix_index <= last_slash_index:
        raise Exception("Badly formatted url {}".format(url))
    return url[last_slash_index + 1:last_suffix_index]
Then, you do:
get_repo_name_from_url("https://github.com/ishepard/pydriller.git") # returns pydriller
get_repo_name_from_url("https://github.com/ishepard/pydriller") # returns pydriller
get_repo_name_from_url("https://github.com/ishepard/pydriller.git/asd") # Exception
The working_dir property of the Repo object is the absolute path to the git repo. To parse the repo name, you can use the os.path.basename function.
>>> import git
>>> import os
>>>
>>> repo = git.Repo.clone_from(repoUrl, ".", branch=branch)
>>> repo.working_dir
'/home/user/repo_name'
>>> os.path.basename(repo.working_dir)
'repo_name'

pyHook or pythoncom bug?

I have Windows 7, 64 bit. I'm running the example.py file (code posted below) that comes with the pyHook package. Whenever my active window is Skype, either my computer crashes or I get 'TypeError: KeyboardSwitch() missing 8 required positional arguments: ..'. I assume that the code in the example is okay, and if I'm not using Skype it runs fine. Any thoughts?
from __future__ import print_function
import pyHook

def OnMouseEvent(event):
    print('MessageName:', event.MessageName)
    print('Message:', event.Message)
    print('Time:', event.Time)
    print('Window:', event.Window)
    print('WindowName:', event.WindowName)
    print('Position:', event.Position)
    print('Wheel:', event.Wheel)
    print('Injected:', event.Injected)
    print('---')
    # return True to pass the event to other handlers
    # return False to stop the event from propagating
    return True

def OnKeyboardEvent(event):
    print('MessageName:', event.MessageName)
    print('Message:', event.Message)
    print('Time:', event.Time)
    print('Window:', event.Window)
    print('WindowName:', event.WindowName)
    print('Ascii:', event.Ascii, chr(event.Ascii))
    print('Key:', event.Key)
    print('KeyID:', event.KeyID)
    print('ScanCode:', event.ScanCode)
    print('Extended:', event.Extended)
    print('Injected:', event.Injected)
    print('Alt:', event.Alt)
    print('Transition:', event.Transition)
    print('---')
    # return True to pass the event to other handlers
    # return False to stop the event from propagating
    return True

# create the hook manager
hm = pyHook.HookManager()
# register two callbacks
hm.MouseAllButtonsDown = OnMouseEvent
hm.KeyDown = OnKeyboardEvent
# hook into the mouse and keyboard events
hm.HookMouse()
hm.HookKeyboard()

if __name__ == '__main__':
    import pythoncom
    pythoncom.PumpMessages()
I had this and traced it to a UnicodeDecodeError when pyHook tries to interpret the window name as ASCII. It fails on Skype, which has Unicode characters in its window name. I've posted how I fixed it here. But I had to rebuild pyHook.
PS: kind of a duplicate answer but wanted to connect this question to what I found.
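A minimal illustration of the failure mode described above (the title bytes are hypothetical; the real decode happens inside pyHook's C extension):

# A window title with a non-ASCII character, as Skype produces.
title_bytes = u'Skype\u2122'.encode('utf-8')
try:
    title_bytes.decode('ascii')  # roughly what pyHook attempts
except UnicodeDecodeError as err:
    print(err)  # 'ascii' codec can't decode byte 0xe2 in position 5 ...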

from plone.app.content.batching import Batch fails with ImportError: No module named batching

I just upgraded to Plone 4.3 and I get this error:
ImportError: No module named batching
plone.app.content no longer provides a batching implementation.
Replace
from plone.app.content.batching import Batch
with
try:
    from plone.app.content.batching import Batch  # Plone < 4.3
    HAS_PLONE43 = False
except ImportError:
    from plone.batching import Batch  # Plone >= 4.3
    HAS_PLONE43 = True
[EDIT]
The two implementations have different APIs: the pagesize argument is named size in plone.batching; also, a start index is required instead of a page number.
If you have code that looks like this
b = Batch(items,
          pagesize=pagesize,
          pagenumber=pagenumber)
replace it with
if HAS_PLONE43:
    b = Batch(items,
              size=pagesize,
              start=pagenumber * pagesize)
else:
    b = Batch(items,
              pagesize=pagesize,
              pagenumber=pagenumber)
