getitem/slice on chainer.Variable doesn't support multi-GPU chainer.training.ParallelUpdater? - chainer

I have a 2-dimensional array. When I use the following code to compute the loss:
import chainer.functions as F

_roi_score = roi_score[row_index, col_index]
gt_roi_label_lst = gt_roi_label_lst[row_index, col_index]
loss = F.sigmoid_cross_entropy(_roi_score, gt_roi_label_lst)  # multi-label
during backpropagation, the code reports this error:
File "AU_rcnn/train.py", line 249, in main
trainer.run()
File "/usr/local/anaconda3/lib/python3.6/site-packages/chainer-3.0.0b1-py3.6.egg/chainer/training/trainer.py", line 324, in run
six.reraise(*sys.exc_info())
File "/usr/local/anaconda3/lib/python3.6/site-packages/six.py", line 686, in reraise
raise value
File "/usr/local/anaconda3/lib/python3.6/site-packages/chainer-3.0.0b1-py3.6.egg/chainer/training/trainer.py", line 310, in run
update()
File "/usr/local/anaconda3/lib/python3.6/site-packages/chainer-3.0.0b1-py3.6.egg/chainer/training/updater.py", line 223, in update
self.update_core()
File "/usr/local/anaconda3/lib/python3.6/site-packages/chainer-3.0.0b1-py3.6.egg/chainer/training/updater.py", line 367, in update_core
loss.backward()
File "/usr/local/anaconda3/lib/python3.6/site-packages/chainer-3.0.0b1-py3.6.egg/chainer/variable.py", line 916, in backward
target_input_indexes, out_grad, in_grad)
File "/usr/local/anaconda3/lib/python3.6/site-packages/chainer-3.0.0b1-py3.6.egg/chainer/function_node.py", line 486, in backward_accumulate
gxs = self.backward(target_input_indexes, grad_outputs)
File "/usr/local/anaconda3/lib/python3.6/site-packages/chainer-3.0.0b1-py3.6.egg/chainer/function.py", line 124, in backward
gxs = self._function.backward(in_data, grad_out_data)
File "/usr/local/anaconda3/lib/python3.6/site-packages/chainer-3.0.0b1-py3.6.egg/chainer/functions/connection/linear.py", line 56, in backward
gb = gy.sum(0)
File "cupy/core/core.pyx", line 967, in cupy.core.core.ndarray.sum
File "cupy/core/core.pyx", line 975, in cupy.core.core.ndarray.sum
File "cupy/core/reduction.pxi", line 216, in cupy.core.core.simple_reduction_function.__call__
File "cupy/core/elementwise.pxi", line 102, in cupy.core.core._preprocess_args
ValueError: Array device must be same as the current device: array device = 1 while current = 0
Although I only use one GPU, this error still appears. What is causing it? I have been stuck on this for a long time.
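The error says an array on GPU 1 was used while GPU 0 was the current device, so two of the arrays involved in the fancy-index slice live on different GPUs. A minimal sketch of one way to pin everything to the device holding roi_score (an assumption about the setup; get_device_from_array and to_gpu are standard chainer.cuda helpers):

from chainer import cuda

# Copy the index arrays onto the GPU that holds roi_score, so every
# array touched by the slice (and its backward pass) is on one device.
device = cuda.get_device_from_array(roi_score.data)
row_index = cuda.to_gpu(row_index, device.id)
col_index = cuda.to_gpu(col_index, device.id)
_roi_score = roi_score[row_index, col_index]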

Related

python integration with azure gremlin not working

I am trying to mimic the example mentioned in the Git repo.
I have commented out almost everything and am just trying to run, simply:
g.V().count()
My connection details are correct and match the documentation, but I am getting the following error.
Traceback (most recent call last):
File "c:\Users\PrasaRak\OneDrive\gremlin_azure_function\connect.py", line 193, in <module>
count_vertices(client)
File "c:\Users\PrasaRak\OneDrive\gremlin_azure_function\connect.py", line 116, in count_vertices
callback = client.submit(_gremlin_count_vertices)
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\gremlin_python\driver\client.py", line 127, in submit
return self.submitAsync(message, bindings=bindings, request_options=request_options).result()
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\gremlin_python\driver\client.py", line 148, in submitAsync
return conn.write(message)
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\gremlin_python\driver\connection.py", line 55, in write
self.connect()
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\gremlin_python\driver\connection.py", line 45, in connect
self._transport.connect(self._url, self._headers)
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\gremlin_python\driver\tornado\transport.py", line 40, in connect
self._ws = self._loop.run_sync(
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\tornado\ioloop.py", line 576, in run_sync
return future_cell[0].result()
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\tornado\simple_httpclient.py", line 269, in run
stream = yield self.tcp_client.connect(
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\tornado\gen.py", line 1133, in run
value = future.result()
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\tornado\gen.py", line 1147, in run
yielded = self.gen.send(value)
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\tornado\tcpclient.py", line 232, in connect
af, addr, stream = yield connector.start(connect_timeout=timeout)
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\tornado\tcpclient.py", line 87, in start
self.try_connect(iter(self.primary_addrs))
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\tornado\tcpclient.py", line 104, in try_connect
stream, future = self.connect(af, addr)
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\tornado\tcpclient.py", line 276, in _create_stream
return stream, stream.connect(addr)
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\tornado\iostream.py", line 1325, in connect
self._add_io_state(self.io_loop.WRITE)
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\tornado\iostream.py", line 1157, in _add_io_state
self.io_loop.add_handler(
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\site-packages\tornado\platform\asyncio.py", line 83, in add_handler
self.asyncio_loop.add_writer(
File "C:\Users\PrasaRak\Miniconda3\envs\learn-gremlin\lib\asyncio\events.py", line 507, in add_writer
raise NotImplementedError
NotImplementedError
I think I got the answer.
The issue was Python 3.8 and Tornado compatibility when it comes to asyncio.
More info is at this link.
The fix was to add the following line in tornado/platform/asyncio.py:
asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())  # python-3.8.0a4
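An alternative that avoids patching the installed Tornado source is to set the policy at the top of your own script, before any Tornado I/O starts. A minimal sketch (the platform/version guard is an assumption, since the selector policy only exists and is only needed on Windows with Python 3.8+):

import asyncio
import sys

# On Windows, Python 3.8 made Proactor the default event loop, which
# lacks add_reader()/add_writer(); Tornado needs the selector loop.
if sys.platform == "win32" and sys.version_info >= (3, 8):
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())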

Issue with Jupyter Object_Detection_Tutorial - DLL load failed while importing win32api

I have been trying to get a Jupyter object detection tutorial to run, but for some reason the prompt on the left-hand side never switches from In [ ] to In [1]; it just stays at In [ ]. I assume this means the code isn't actually running. The top of the page also says there is a Kernel Error, and when I click it, the following traceback is displayed:
Traceback (most recent call last):
File "c:\anaconda\envs\tensorflow1\lib\site-packages\tornado\web.py", line 1703, in _execute
result = await result
File "c:\anaconda\envs\tensorflow1\lib\site-packages\tornado\gen.py", line 742, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "c:\anaconda\envs\tensorflow1\lib\site-packages\notebook\services\sessions\handlers.py", line 69, in post
model = yield maybe_future(
File "c:\anaconda\envs\tensorflow1\lib\site-packages\tornado\gen.py", line 735, in run
value = future.result()
File "c:\anaconda\envs\tensorflow1\lib\site-packages\tornado\gen.py", line 742, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "c:\anaconda\envs\tensorflow1\lib\site-packages\notebook\services\sessions\sessionmanager.py", line 88, in create_session
kernel_id = yield self.start_kernel_for_session(session_id, path, name, type, kernel_name)
File "c:\anaconda\envs\tensorflow1\lib\site-packages\tornado\gen.py", line 735, in run
value = future.result()
File "c:\anaconda\envs\tensorflow1\lib\site-packages\tornado\gen.py", line 742, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "c:\anaconda\envs\tensorflow1\lib\site-packages\notebook\services\sessions\sessionmanager.py", line 100, in start_kernel_for_session
kernel_id = yield maybe_future(
File "c:\anaconda\envs\tensorflow1\lib\site-packages\tornado\gen.py", line 735, in run
value = future.result()
File "c:\anaconda\envs\tensorflow1\lib\site-packages\tornado\gen.py", line 209, in wrapper
yielded = next(result)
File "c:\anaconda\envs\tensorflow1\lib\site-packages\notebook\services\kernels\kernelmanager.py", line 168, in start_kernel
super(MappingKernelManager, self).start_kernel(**kwargs)
File "c:\anaconda\envs\tensorflow1\lib\site-packages\jupyter_client\multikernelmanager.py", line 158, in start_kernel
km.start_kernel(**kwargs)
File "c:\anaconda\envs\tensorflow1\lib\site-packages\jupyter_client\manager.py", line 301, in start_kernel
kernel_cmd, kw = self.pre_start_kernel(**kw)
File "c:\anaconda\envs\tensorflow1\lib\site-packages\jupyter_client\manager.py", line 248, in pre_start_kernel
self.write_connection_file()
File "c:\anaconda\envs\tensorflow1\lib\site-packages\jupyter_client\connect.py", line 468, in write_connection_file
self.connection_file, cfg = write_connection_file(self.connection_file,
File "c:\anaconda\envs\tensorflow1\lib\site-packages\jupyter_client\connect.py", line 138, in write_connection_file
with secure_write(fname) as f:
File "c:\anaconda\envs\tensorflow1\lib\contextlib.py", line 113, in enter
return next(self.gen)
File "c:\anaconda\envs\tensorflow1\lib\site-packages\jupyter_core\paths.py", line 435, in secure_write
win32_restrict_file_to_user(fname)
File "c:\anaconda\envs\tensorflow1\lib\site-packages\jupyter_core\paths.py", line 361, in win32_restrict_file_to_user
import win32api
ImportError: DLL load failed while importing win32api: The specified module could not be found.
I am not sure why there is an import error. For reference, I have installed CUDA 10.1, cuDNN 7.6.5, Anaconda for Python 3.6, and TensorFlow 2.0.0. Please let me know how I can fix this so the Jupyter code runs.
I had the same problem. Then I rebooted and it worked fine.
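If a reboot doesn't help, a commonly suggested fix (an assumption on my part, not something confirmed in this thread) is to reinstall pywin32 in the affected environment and run its post-install script so the win32api DLLs are registered:

conda install pywin32
# or, with pip (the Scripts path is the env from the traceback; adjust to yours):
pip install --upgrade pywin32
python c:\anaconda\envs\tensorflow1\Scripts\pywin32_postinstall.py -install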

gremlin python add vertex KeyError

I'm using gremlinpython. Inserting a vertex with a property value that doesn't fit in 32 bits results in a KeyError:
g.addV('test').property('size', 2147483648).iterate()
File "/home/ec2-user/src/common/test.py", line 74, in insert_vertices
self.g.addV('test').property('size', 2147483648).iterate()
File "/home/ec2-user/venv/lib64/python3.6/dist-packages/gremlin_python/process/traversal.py", line 65, in iterate
try: self.nextTraverser()
File "/home/ec2-user/venv/lib64/python3.6/dist-packages/gremlin_python/process/traversal.py", line 70, in nextTraverser
self.traversal_strategies.apply_strategies(self)
File "/home/ec2-user/venv/lib64/python3.6/dist-packages/gremlin_python/process/traversal.py", line 506, in apply_strategies
traversal_strategy.apply(traversal)
File "/home/ec2-user/venv/lib64/python3.6/dist-packages/gremlin_python/driver/remote_connection.py", line 148, in apply
remote_traversal = self.remote_connection.submit(traversal.bytecode)
File "/home/ec2-user/venv/lib64/python3.6/dist-packages/gremlin_python/driver/driver_remote_connection.py", line 54, in submit
results = result_set.all().result()
File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/ec2-user/venv/lib64/python3.6/dist-packages/gremlin_python/driver/resultset.py", line 90, in cb
f.result()
File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/ec2-user/venv/lib64/python3.6/dist-packages/gremlin_python/driver/connection.py", line 80, in _receive
status_code = self._protocol.data_received(data, self._results)
File "/home/ec2-user/venv/lib64/python3.6/dist-packages/gremlin_python/driver/protocol.py", line 83, in data_received
result_set = results_dict[request_id]
KeyError: None
A value that fits in 32 bits works fine:
g.addV('test').property('size', 2147483647).iterate()
Casting the value to float also works fine:
g.addV('test').property('size', float(2147483648)).iterate()
The behavior is the same with a locally running Gremlin server and a remote Neptune DB, and it works fine from the Gremlin console, so I don't think this is a server issue.
Python version - 3.6 and 3.7
gremlinpython version - 3.4.1
You need to explicitly define that number as a long(), like:
from gremlin_python.statics import *
g.addV('test').property('size', long(2147483648)).iterate()
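For reference, long here is not the Python 2 built-in but a wrapper type defined in gremlin_python.statics; it tells the serializer to send the value as a 64-bit Java long instead of a 32-bit int. A narrower form of the same fix, importing only the name that is needed:

from gremlin_python.statics import long  # gremlin_python's 64-bit wrapper

g.addV('test').property('size', long(2147483648)).iterate()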

Getting an error while starting airflow worker

I have installed Airflow and am trying to start the worker on my Mac, but I am getting the following error. I am unable to identify what is causing this issue.
[2018-05-02 15:37:11,458: CRITICAL/MainProcess] Unrecoverable error: TypeError("Invalid argument(s) 'visibility_timeout' sent to create_engine(), using configuration MySQLDialect_mysqldb/QueuePool/Engine. Please check that the keyword arguments are appropriate for this combination of components.",)
Traceback (most recent call last):
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/celery/bootsteps.py", line 370, in start
return self.obj.start()
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/celery/worker/consumer/consumer.py", line 320, in start
blueprint.start(self)
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/celery/worker/consumer/tasks.py", line 37, in start
c.connection, on_decode_error=c.on_decode_error,
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/celery/app/amqp.py", line 302, in TaskConsumer
**kw
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/kombu/messaging.py", line 386, in __init__
self.revive(self.channel)
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/kombu/messaging.py", line 408, in revive
self.declare()
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/kombu/messaging.py", line 421, in declare
queue.declare()
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/kombu/entity.py", line 605, in declare
self._create_queue(nowait=nowait, channel=channel)
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/kombu/entity.py", line 614, in _create_queue
self.queue_declare(nowait=nowait, passive=False, channel=channel)
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/kombu/entity.py", line 649, in queue_declare
nowait=nowait,
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/kombu/transport/virtual/base.py", line 531, in queue_declare
self._new_queue(queue, **kwargs)
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 82, in _new_queue
self._get_or_create(queue)
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 70, in _get_or_create
obj = self.session.query(self.queue_cls) \
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 65, in session
_, Session = self._open()
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 56, in _open
engine = self._engine_from_config()
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/kombu/transport/sqlalchemy/__init__.py", line 51, in _engine_from_config
return create_engine(conninfo.hostname, **transport_options)
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/sqlalchemy/engine/__init__.py", line 424, in create_engine
return strategy.create(*args, **kwargs)
File "/Users/manishz/anaconda2/envs/airflow/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 162, in create
engineclass.__name__))
TypeError: Invalid argument(s) 'visibility_timeout' sent to create_engine(), using configuration MySQLDialect_mysqldb/QueuePool/Engine. Please check that the keyword arguments are appropriate for this combination of components.
I'd appreciate any help with it.
Thanks in advance,
Manish
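The traceback itself points in a direction: Celery is forwarding its visibility_timeout transport option to SQLAlchemy's create_engine(), which only happens when the Celery broker URL is a SQL database (sqla+mysql://...) rather than a real message broker. A hedged sketch of the relevant airflow.cfg section with the broker moved to Redis (the URLs are placeholders, and older Airflow versions name the second key celery_result_backend):

[celery]
# visibility_timeout is a Redis/SQS option; a sqla+mysql:// broker_url
# forwards it to create_engine(), producing the TypeError above.
broker_url = redis://localhost:6379/0
result_backend = db+mysql://airflow:airflow@localhost:3306/airflow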

Running MPI on ParallelGroup I get a reshape error in default_vector

I have a workflow with a ParallelGroup. I instantiate the same Component many times inside it and pass each instance a different input.
I am using the option prob.setup(vector_class=PETScVector, check=False, mode='fwd'), as in the example.
I get the following error:
Traceback (most recent call last):
File "workflow.py", line 73, in <module>
prob.run_model()
File "/usr/local/lib/python2.7/dist-packages/openmdao/core/problem.py", line 282, in run_model
self.final_setup()
File "/usr/local/lib/python2.7/dist-packages/openmdao/core/problem.py", line 423, in final_setup
model._final_setup(comm, vector_class, 'full', force_alloc_complex=force_alloc_complex)
File "/usr/local/lib/python2.7/dist-packages/openmdao/core/system.py", line 787, in _final_setup
force_alloc_complex=force_alloc_complex)
File "/usr/local/lib/python2.7/dist-packages/openmdao/core/system.py", line 586, in _get_root_vectors
ncol=ncol, relevant=rel)
File "/usr/local/lib/python2.7/dist-packages/openmdao/vectors/vector.py", line 160, in __init__
self._initialize_views()
File "/usr/local/lib/python2.7/dist-packages/openmdao/vectors/default_vector.py", line 320, in _initialize_views
v.shape = shape
ValueError: cannot reshape array of size 0 into shape (4,3)
The variable with shape (4, 3) is a "global" variable given to the ParallelGroup (by promotion from each of its Components) by an external IndepVarComp.
EDIT: This only happens when the number of MPI processes (nodes) allocated is less than the number of Components in the ParallelGroup; see the sketch below.
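A minimal illustration of the workaround implied by that edit (the script name comes from the traceback; the process count 8 is a stand-in for however many Components the ParallelGroup contains):

# launch at least as many MPI processes as there are Components
mpirun -n 8 python workflow.py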
