Run the same method on a list of instances in pathos.multiprocessing, asynchronously

I am working on a traveling salesman problem. Since all agents traverse the same graph to find their own paths independently, I am trying to parallelize the agents' path-finding. In each iteration, every agent starts from a start node and finds a path, and all the paths are collected to determine the best path of that iteration.
I am using pathos.multiprocessing.
The agent class has a traverse method like this:
class Agent:
    def find_a_path(self, graph):
        # here is the logic to find a path by traversing the graph
        return found_path
I create a helper function to wrap the method:
def do_agent_find_a_path(agent, graph):
    return agent.find_a_path(graph)
Then I create a pool and call amap, passing the helper function, a list of agent instances, and the same graph:
from pathos.multiprocessing import ProcessPool

pool = ProcessPool(nodes=10)
res = pool.amap(do_agent_find_a_path, agents, [graph] * len(agents))
However, the processes appear to be created in sequence and it runs very slowly. I'd appreciate some guidance on a correct/decent way to leverage pathos in this situation.
Thank you!
UPDATE:
I am using pathos 0.2.3 on Ubuntu:
Name: pathos
Version: 0.2.3
Summary: parallel graph management and execution in heterogeneous computing
Home-page: https://pypi.org/project/pathos
Author: Mike McKerns
I get the following error with the ThreadPool sample code:
>>> import pathos
>>> pathos.pools.ThreadPool().iumap(lambda x: x*x, [1,2,3,4])
Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-f8f5e7774646>", line 1, in <module>
    pathos.pools.ThreadPool().iumap(lambda x:x*x, [1,2,3,4])
AttributeError: 'ThreadPool' object has no attribute 'iumap'

I'm the pathos author. I'm not sure how long your method takes to run, but from your comments, I'm going to assume not very long. I'd suggest that, if the method is "fast", you use a ThreadPool instead. Also, if you don't need to preserve the order of the results, the fastest map is typically uimap (unordered, iterative map).
>>> class Agent:
...     def basepath(self, dirname):
...         import os
...         return os.path.basename(dirname)
...     def slowpath(self, dirname):
...         import time
...         time.sleep(.2)
...         return self.basepath(dirname)
...
>>> a = Agent()
>>> import pathos.pools as pp
>>> dirs = ['/tmp/foo', '/var/path/bar', '/root/bin/bash', '/tmp/foo/bar']
>>> import time
>>> p = pp.ProcessPool()
>>> go = time.time(); tuple(p.uimap(a.basepath, dirs)); print(time.time()-go)
('foo', 'bar', 'bash', 'bar')
0.006751060485839844
>>> p.close(); p.join(); p.clear()
>>> t = pp.ThreadPool(4)
>>> go = time.time(); tuple(t.uimap(a.basepath, dirs)); print(time.time()-go)
('foo', 'bar', 'bash', 'bar')
0.0005156993865966797
>>> t.close(); t.join(); t.clear()
and, just to compare against something that takes a bit longer...
>>> t = pp.ThreadPool(4)
>>> go = time.time(); tuple(t.uimap(a.slowpath, dirs)); print(time.time()-go)
('bar', 'bash', 'bar', 'foo')
0.2055649757385254
>>> t.close(); t.join(); t.clear()
>>> p = pp.ProcessPool()
>>> go = time.time(); tuple(p.uimap(a.slowpath, dirs)); print(time.time()-go)
('foo', 'bar', 'bash', 'bar')
0.2084510326385498
>>> p.close(); p.join(); p.clear()
>>>
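Applied back to the original question, a minimal sketch of this suggestion could look like the following (assumptions on my part: find_a_path only reads the shared graph, the Agent objects are picklable, and a shorter path counts as a better path; the stub bodies are placeholders, not the real traversal logic):

import pathos.pools as pp

class Agent:
    def find_a_path(self, graph):
        # placeholder for the real path-finding logic
        return list(graph)

def do_agent_find_a_path(agent, graph):
    return agent.find_a_path(graph)

graph = ["A", "B", "C"]                  # stand-in for the real graph
agents = [Agent() for _ in range(10)]

# ThreadPool if find_a_path is fast or I/O-bound; swap in pp.ProcessPool for heavy CPU-bound work.
pool = pp.ThreadPool(10)
try:
    # uimap is unordered and iterative: results arrive as soon as each agent finishes.
    paths = list(pool.uimap(do_agent_find_a_path, agents, [graph] * len(agents)))
    best_path = min(paths, key=len)      # assumption: shorter path == better path
finally:
    pool.close(); pool.join(); pool.clear()

Note also that amap from the question returns an asynchronous result object; calling its .get() blocks until all results are ready, so collecting results that way is ordered but not incremental.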

Related

dagster solid Parallel Run Test example

I tried to run the solids in parallel, but it didn't work as I expected; the progress bar doesn't behave the way I thought it would.
I think both operations should execute at the same time, but instead they run one after the other: find_highest_calorie_cereal and find_highest_protein_cereal execute sequentially rather than in parallel.
import csv
import time

import requests
from dagster import pipeline, solid


# start_complex_pipeline_marker_0
@solid
def download_cereals():
    response = requests.get("https://docs.dagster.io/assets/cereal.csv")
    lines = response.text.split("\n")
    return [row for row in csv.DictReader(lines)]


@solid
def find_highest_calorie_cereal(cereals):
    time.sleep(5)
    sorted_cereals = list(
        sorted(cereals, key=lambda cereal: cereal["calories"])
    )
    return sorted_cereals[-1]["name"]


@solid
def find_highest_protein_cereal(context, cereals):
    time.sleep(10)
    sorted_cereals = list(
        sorted(cereals, key=lambda cereal: cereal["protein"])
    )
    # for i in range(1, 11):
    #     context.log.info(str(i) + '~~~~~~~~')
    #     time.sleep(1)
    return sorted_cereals[-1]["name"]


@solid
def display_results(context, most_calories, most_protein):
    context.log.info(f"Most caloric cereal (test): {most_calories}")
    context.log.info(f"Most protein-rich cereal: {most_protein}")


@pipeline
def complex_pipeline():
    cereals = download_cereals()
    display_results(
        most_protein=find_highest_protein_cereal(cereals),
        most_calories=find_highest_calorie_cereal(cereals),
    )
I am not sure, but I think you need to set up an executor that allows parallelism. You could use the multiprocess_executor.
"Executors are responsible for executing steps within a pipeline run.
Once a run has launched and the process for the run, or run worker,
has been allocated and started, the executor assumes responsibility
for execution."
Modes provide the possible set of executors one can use; set the executor_defs property on ModeDefinition.
MODE_DEV = ModeDefinition(name="dev", executor_defs=[multiprocess_executor])
@pipeline(mode_defs=[MODE_DEV], preset_defs=[Preset_test])
The execution section of the run config determines the actual executor. In the YAML file or run_config, set:
execution:
  multiprocess:
    config:
      max_concurrent: 4
Retrieved from: https://docs.dagster.io/deployment/executors
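For reference, the same execution config can also be passed programmatically. This is only a sketch under assumptions on my side: it targets the legacy pre-1.0 dagster API, assumes complex_pipeline and MODE_DEV from above are defined at module level in an importable file, and may additionally need persisted intermediates (e.g. filesystem storage or an fs_io_manager) depending on the dagster version:

from dagster import DagsterInstance, execute_pipeline, reconstructable

# Same as the YAML above: allow the multiprocess executor to run up to 4 solids at once.
run_config = {
    "execution": {"multiprocess": {"config": {"max_concurrent": 4}}},
}

if __name__ == "__main__":
    execute_pipeline(
        reconstructable(complex_pipeline),  # child processes must be able to re-import the pipeline
        run_config=run_config,
        mode="dev",                         # the ModeDefinition that carries multiprocess_executor
        instance=DagsterInstance.get(),     # multiprocess execution needs a persistent instance
    )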

How to load Hydra parameters from previous jobs (without having to use argparse and the compose API)?

I'm using Hydra for training machine learning models. It's great for doing complex commands like python train.py data=MNIST batch_size=64 loss=l2. However, if I want to then run the trained model with the same parameters, I have to do something like python reconstruct.py --config_file path_to_previous_job/.hydra/config.yaml. I then use argparse to load in the previous yaml and use the compose API to initialize the Hydra environment. The path to the trained model is inferred from the path to Hydra's .yaml file. If I want to modify one of the parameters, I have to add additional argparse parameters and run something like python reconstruct.py --config_file path_to_previous_job/.hydra/config.yaml --batch_size 128. The code then manually overrides any Hydra parameters with those that were specified on the command line.
What's the right way of doing this?
My current code looks something like the following:
train.py:
import hydra

@hydra.main(config_name="config", config_path="conf")
def main(cfg):
    # [training code using cfg.data, cfg.batch_size, cfg.loss etc.]
    # [code outputs model checkpoint to job folder generated by Hydra]
    ...

main()
reconstruct.py:
import argparse
import os

from hydra.experimental import initialize, compose

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('hydra_config')
    parser.add_argument('--batch_size', type=int)
    # [other flags and parameters I may need to override]
    args = parser.parse_args()

    # Create the Hydra environment.
    initialize()
    cfg = compose(config_name=args.hydra_config)

    # Since checkpoints are stored next to the .hydra, we manually generate the path.
    checkpoint_dir = os.path.dirname(os.path.dirname(args.hydra_config))

    # Manually override any parameters which can be changed on the command line.
    batch_size = args.batch_size if args.batch_size else cfg.data.batch_size

    # [code which uses checkpoint_dir to load the model]
    # [code which uses both batch_size and params in cfg to set up the data etc.]
This is my first time posting, so let me know if I should clarify anything.
If you want to load the previous config as is and not change it, use OmegaConf.load(file_path).
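For that first case, a minimal sketch (the path below is only a placeholder for the previous job's .hydra directory):

from omegaconf import OmegaConf

# Load the config Hydra saved next to the previous run's outputs, exactly as it was.
cfg = OmegaConf.load("path_to_previous_job/.hydra/config.yaml")
print(OmegaConf.to_yaml(cfg))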
If you want to re-compose the config (and it sounds like you do, because you said you want to override things), I recommend that you use the Compose API and pass in parameters from the overrides file in the job output directory (next to the stored config.yaml), concatenated with the current run's parameters.
This script seems to be doing the job:
import os
from dataclasses import dataclass
from os.path import join
from typing import Optional

from omegaconf import OmegaConf

import hydra
from hydra import compose
from hydra.core.config_store import ConfigStore
from hydra.core.hydra_config import HydraConfig
from hydra.utils import to_absolute_path


# You can also use a yaml config file instead of this Structured Config
@dataclass
class Config:
    load_checkpoint: Optional[str] = None
    batch_size: int = 16
    loss: str = "l2"


cs = ConfigStore.instance()
cs.store(name="config", node=Config)


@hydra.main(config_path=".", config_name="config")
def my_app(cfg: Config) -> None:
    if cfg.load_checkpoint is not None:
        output_dir = to_absolute_path(cfg.load_checkpoint)
        original_overrides = OmegaConf.load(join(output_dir, ".hydra/overrides.yaml"))
        current_overrides = HydraConfig.get().overrides.task
        hydra_config = OmegaConf.load(join(output_dir, ".hydra/hydra.yaml"))
        # getting the config name from the previous job.
        config_name = hydra_config.hydra.job.config_name
        # concatenating the original overrides with the current overrides
        overrides = original_overrides + current_overrides
        # compose a new config from scratch
        cfg = compose(config_name, overrides=overrides)

    # train
    print("Running in ", os.getcwd())
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    my_app()
~/tmp$ python train.py
Running in /home/omry/tmp/outputs/2021-04-19/21-23-13
load_checkpoint: null
batch_size: 16
loss: l2
~/tmp$ python train.py load_checkpoint=/home/omry/tmp/outputs/2021-04-19/21-23-13
Running in /home/omry/tmp/outputs/2021-04-19/21-23-22
load_checkpoint: /home/omry/tmp/outputs/2021-04-19/21-23-13
batch_size: 16
loss: l2
~/tmp$ python train.py load_checkpoint=/home/omry/tmp/outputs/2021-04-19/21-23-13 batch_size=32
Running in /home/omry/tmp/outputs/2021-04-19/21-23-28
load_checkpoint: /home/omry/tmp/outputs/2021-04-19/21-23-13
batch_size: 32
loss: l2

Emitting dronekit.io vehicle's attribute changes using flask-socket.io

I'm trying to send data from my dronekit.io vehicle using flask-socket.io. Unfortunately, I got this log:
Starting copter simulator (SITL)
SITL already Downloaded and Extracted.
Ready to boot.
Connecting to vehicle on: tcp:127.0.0.1:5760
>>> APM:Copter V3.3 (d6053245)
>>> Frame: QUAD
>>> Calibrating barometer
>>> Initialising APM...
>>> barometer calibration complete
>>> GROUND START
* Restarting with stat
latitude -35.363261
>>> Exception in attribute handler for location.global_relative_frame
>>> Working outside of request context.
This typically means that you attempted to use functionality that needed
an active HTTP request. Consult the documentation on testing for
information about how to avoid this problem.
longitude 149.1652299
>>> Exception in attribute handler for location.global_relative_frame
>>> Working outside of request context.
This typically means that you attempted to use functionality that needed
an active HTTP request. Consult the documentation on testing for
information about how to avoid this problem.
Here is my code:
sample.py
from dronekit import connect, VehicleMode
from flask import Flask
from flask_socketio import SocketIO, emit
import dronekit_sitl
import time

sitl = dronekit_sitl.start_default()
connection_string = sitl.connection_string()

print("Connecting to vehicle on: %s" % (connection_string,))
vehicle = connect(connection_string, wait_ready=True)

def arm_and_takeoff(aTargetAltitude):
    print "Basic pre-arm checks"
    while not vehicle.is_armable:
        print " Waiting for vehicle to initialise..."
        time.sleep(1)
    print "Arming motors"
    vehicle.mode = VehicleMode("GUIDED")
    vehicle.armed = True
    while not vehicle.armed:
        print " Waiting for arming..."
        time.sleep(1)
    print "Taking off!"
    vehicle.simple_takeoff(aTargetAltitude)
    while True:
        if vehicle.location.global_relative_frame.alt >= aTargetAltitude * 0.95:
            print "Reached target altitude"
            break
        time.sleep(1)

last_latitude = 0.0
last_longitude = 0.0
last_altitude = 0.0

@vehicle.on_attribute('location.global_relative_frame')
def location_callback(self, attr_name, value):
    global last_latitude
    global last_longitude
    global last_altitude
    if round(value.lat, 6) != round(last_latitude, 6):
        last_latitude = value.lat
        print "latitude ", value.lat, "\n"
        emit("latitude", value.lat)
    if round(value.lon, 6) != round(last_longitude, 6):
        last_longitude = value.lon
        print "longitude ", value.lon, "\n"
        emit("longitude", value.lon)
    if round(value.alt) != round(last_altitude):
        last_altitude = value.alt
        print "altitude ", value.alt, "\n"
        emit("altitude", value.alt)

app = Flask(__name__)
socketio = SocketIO(app)

if __name__ == '__main__':
    socketio.run(app, host='0.0.0.0', port=5000, debug=True)
    arm_and_takeoff(20)
From the logs I understand that I should not make any HTTP request inside the vehicle.on_attribute callback, and that I should look for information on how to avoid this problem, but I couldn't find anything about this error.
I hope you can help me.
Thank you very much,
Raniel
The emit() function by default returns an event back to the active client. If you call this function outside of a request context there is no concept of active client, so you get this error.
You have a couple of options:
1. Indicate the recipient of the event and the namespace that you are using, so that there is no need to look them up in the context. You can do this by adding room and namespace arguments. Use '/' for the namespace if you are using the default namespace.
2. Emit to all clients by adding broadcast=True as an argument, plus the namespace as indicated in #1.
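As a concrete illustration of the broadcast option, here is a small sketch of my own (not the answerer's code): the attribute callback from the question is rewritten to emit through the SocketIO server object, which does not depend on a request context and broadcasts to all connected clients on the default namespace. The dronekit wiring is omitted and the callback is only shown in isolation:

from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

last_latitude = 0.0

def location_callback(self, attr_name, value):
    """Attribute handler rewritten to broadcast instead of using the request context."""
    global last_latitude
    if round(value.lat, 6) != round(last_latitude, 6):
        last_latitude = value.lat
        # Broadcast to every connected client on the default namespace;
        # no active HTTP request is needed here.
        socketio.emit("latitude", value.lat, namespace='/')

if __name__ == '__main__':
    socketio.run(app, host='0.0.0.0', port=5000)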

How to get sys.exc_traceback from IPython shell.run_code?

My app interfaces with the IPython Qt shell with code something like this:
from IPython.core.interactiveshell import ExecutionResult

shell = self.kernelApp.shell  # ZMQInteractiveShell
code = compile(script, file_name, 'exec')
result = ExecutionResult()
shell.run_code(code, result=result)
if result:
    self.show_result(result)
The problem is: how can show_result show the traceback resulting from exceptions in code?
Neither the error_before_exec nor the error_in_exec ivar of ExecutionResult seems to give a reference to the traceback. Similarly, neither sys nor shell.user_ns.namespace.get('sys') has an exc_traceback attribute.
Any ideas? Thanks!
Edward
IPython/core/interactiveshell.py contains InteractiveShell._showtraceback:
def _showtraceback(self, etype, evalue, stb):
    """Actually show a traceback. Subclasses may override..."""
    print(self.InteractiveTB.stb2text(stb), file=io.stdout)
The solution is to monkey-patch IS._showtraceback so that it writes to sys.stderr (which reaches the Qt console):
from __future__ import print_function
...
shell = self.kernelApp.shell  # ZMQInteractiveShell
code = compile(script, file_name, 'exec')

def show_traceback(etype, evalue, stb, shell=shell):
    print(shell.InteractiveTB.stb2text(stb), file=sys.stderr)
    sys.stderr.flush()  # <==== Oh, so important

old_show = getattr(shell, '_showtraceback', None)
shell._showtraceback = show_traceback
shell.run_code(code)
if old_show:
    shell._showtraceback = old_show
Note: there is no need to pass an ExecutionResult object to shell.run_code().
EKR
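A slightly more defensive variant of the same monkey-patch (my adaptation, not part of the original answer) restores the previous handler in a finally block, so the patch is undone even if run_code itself raises:

import sys

def run_with_traceback(shell, code):
    """Run code in the IPython shell, routing any traceback text to sys.stderr."""
    def show_traceback(etype, evalue, stb, shell=shell):
        print(shell.InteractiveTB.stb2text(stb), file=sys.stderr)
        sys.stderr.flush()

    old_show = getattr(shell, '_showtraceback', None)
    shell._showtraceback = show_traceback
    try:
        shell.run_code(code)
    finally:
        # Put the original handler back even if run_code blew up.
        if old_show is not None:
            shell._showtraceback = old_show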

Create a portal_user_catalog and have it used (Plone)

I'm creating a fork of my Plone site (which has not been forked for a long time). This site has a special catalog object for user profiles (a special Archetypes-based object type) which is called portal_user_catalog:
$ bin/instance debug
>>> portal = app.Plone
>>> print [d for d in portal.objectMap() if d['meta_type'] == 'Plone Catalog Tool']
[{'meta_type': 'Plone Catalog Tool', 'id': 'portal_catalog'},
{'meta_type': 'Plone Catalog Tool', 'id': 'portal_user_catalog'}]
This looks reasonable because the user profiles don't have most of the indexes of the "normal" objects, but have a small set of own indexes.
Since I found no way to create this object from scratch, I exported it from the old site (as portal_user_catalog.zexp) and imported it into the new site. This seemed to work, but I can't add objects to the imported catalog, not even by explicitly calling the catalog_object method. Instead, the user profiles are added to the standard portal_catalog.
Now I found a module in my product which seems to serve the purpose (Products/myproduct/exportimport/catalog.py):
"""Catalog tool setup handlers.
$Id: catalog.py 77004 2007-06-24 08:57:54Z yuppie $
"""
from Products.GenericSetup.utils import exportObjects
from Products.GenericSetup.utils import importObjects
from Products.CMFCore.utils import getToolByName
from zope.component import queryMultiAdapter
from Products.GenericSetup.interfaces import IBody
def importCatalogTool(context):
"""Import catalog tool.
"""
site = context.getSite()
obj = getToolByName(site, 'portal_user_catalog')
parent_path=''
if obj and not obj():
importer = queryMultiAdapter((obj, context), IBody)
path = '%s%s' % (parent_path, obj.getId().replace(' ', '_'))
__traceback_info__ = path
print [importer]
if importer:
print importer.name
if importer.name:
path = '%s%s' % (parent_path, 'usercatalog')
print path
filename = '%s%s' % (path, importer.suffix)
print filename
body = context.readDataFile(filename)
if body is not None:
importer.filename = filename # for error reporting
importer.body = body
if getattr(obj, 'objectValues', False):
for sub in obj.objectValues():
importObjects(sub, path+'/', context)
def exportCatalogTool(context):
"""Export catalog tool.
"""
site = context.getSite()
obj = getToolByName(site, 'portal_user_catalog', None)
if tool is None:
logger = context.getLogger('catalog')
logger.info('Nothing to export.')
return
parent_path=''
exporter = queryMultiAdapter((obj, context), IBody)
path = '%s%s' % (parent_path, obj.getId().replace(' ', '_'))
if exporter:
if exporter.name:
path = '%s%s' % (parent_path, 'usercatalog')
filename = '%s%s' % (path, exporter.suffix)
body = exporter.body
if body is not None:
context.writeDataFile(filename, body, exporter.mime_type)
if getattr(obj, 'objectValues', False):
for sub in obj.objectValues():
exportObjects(sub, path+'/', context)
I tried to use it, but I have no idea how it is supposed to be done;
I can't call it TTW (should I try to publish the methods?!).
I tried it in a debug session:
$ bin/instance debug
>>> portal = app.Plone
>>> from Products.myproduct.exportimport.catalog import exportCatalogTool
>>> exportCatalogTool(portal)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File ".../Products/myproduct/exportimport/catalog.py", line 58, in exportCatalogTool
    site = context.getSite()
AttributeError: getSite
So, if this is the way to go, it looks like I need a "real" context.
Update: To get this context, I tried an External Method:
# -*- coding: utf-8 -*-
from Products.myproduct.exportimport.catalog import exportCatalogTool
from pdb import set_trace


def p(dt, dd):
    print '%-16s%s' % (dt + ':', dd)


def main(self):
    """
    Export the portal_user_catalog
    """
    g = globals()
    print '#' * 79
    for a in ('__package__', '__module__'):
        if a in g:
            p(a, g[a])
    p('self', self)
    set_trace()
    exportCatalogTool(self)
However, when I called it, I got the same <PloneSite at /Plone> object as the argument to the main function, which doesn't have the getSite attribute. Perhaps my site doesn't call such External Methods correctly?
Or would I need to mention this module somehow in my configure.zcml, and if so, how? I searched my directory tree (especially below Products/myproduct/profiles) for exportimport, the module name, and several other strings, but I couldn't find anything; perhaps there was an integration once, but it is broken now ...
So how do I make this portal_user_catalog work?
Thank you!
Update: Another debug session suggests the source of the problem to be some transaction matter:
>>> portal = app.Plone
>>> puc = portal.portal_user_catalog
>>> puc._catalog()
[]
>>> profiles_folder = portal.some_folder_with_profiles
>>> for o in profiles_folder.objectValues():
...     puc.catalog_object(o)
...
>>> puc._catalog()
[<Products.ZCatalog.Catalog.mybrains object at 0x69ff8d8>, ...]
This population of the portal_user_catalog doesn't persist; after termination of the debug session and starting fg, the brains are gone.
It looks like the problem was indeed related to transactions.
I had
import transaction
...
class Browser(BrowserView):
...
def processNewUser(self):
....
transaction.commit()
before, but apparently this was not good enough (and/or perhaps not done correctly).
Now I start the transaction explicitly with transaction.begin(), save intermediate results with transaction.savepoint(), abort the transaction explicitly with transaction.abort() in case of errors (try / except), and have exactly one transaction.commit() at the end, in the case of success. Everything seems to work.
Of course, Plone still doesn't take this non-standard catalog into account; when I "clear and rebuild" it, it is empty afterwards. But for my application it works well enough.
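For illustration, here is a minimal sketch of the explicit transaction handling described above; the function name and the catalogued object are placeholders of mine, not code from the original site:

import transaction


def process_new_user(puc, profile_obj):
    """Catalog a user profile with explicit transaction boundaries."""
    transaction.begin()
    try:
        puc.catalog_object(profile_obj)
        # Secure intermediate results with a savepoint.
        transaction.savepoint()
    except Exception:
        # Roll back everything done since transaction.begin().
        transaction.abort()
        raise
    else:
        # Exactly one commit, only in the success case.
        transaction.commit()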