parallel list comprehension using Pool map - python-3.4

I have a list comprehension:
thingie=[f(a,x,c) for x in some_list]
which I am parallelising as follows:
from multiprocessing import Pool
pool=Pool(processes=4)
thingie=pool.map(lambda x: f(a,x,c), some_list)
but I get the following error:
_pickle.PicklingError: Can't pickle <function <lambda> at 0x7f60b3b0e9d8>:
attribute lookup <lambda> on __main__ failed
I have tried to install the pathos package which apparently addresses this issue, but when I try to import it I get the error:
ImportError: No module named 'pathos'

OK, so this answer is just for the record; I figured it out with the author of the question during a conversation in the comments.
multiprocessing needs to transport every object between processes, so it uses pickle to serialize it in one process and deserialize it in another. That all works well, but pickle cannot serialize a lambda: it pickles a function by reference to its qualified name, and a lambda has no name that can be looked up in its module, which is exactly what the "attribute lookup <lambda> on __main__ failed" part of the error is complaining about.
It won't be a problem if you use map() with a one-argument function - you can pass that function directly instead of a lambda. If you have more arguments, as in your example, you need to define a wrapper with the def keyword:
from multiprocessing import Pool

def f(x, y, z):
    print(x, y, z)

def f_wrapper(y):
    return f(1, y, "a")

pool = Pool(processes=4)
result = pool.map(f_wrapper, [7, 9, 11])
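As a side note: if the fixed arguments are known up front, Python 3's Pool.starmap can replace the wrapper entirely. A minimal sketch, reusing the same f as above (the argument values are just examples):
from multiprocessing import Pool

def f(x, y, z):
    print(x, y, z)

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # starmap unpacks each tuple into f's positional arguments
        pool.starmap(f, [(1, y, "a") for y in [7, 9, 11]])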

Just before I close this, I found another way to do it in Python 3, using functools.
Say I have a function f with three arguments, f(a, x, c), one of which I want to map over, say x. I can use the following code to do basically what @FilipMalczak suggests:
import functools
from multiprocessing import Pool

f1 = functools.partial(f, 10)     # bind a positionally
f2 = functools.partial(f1, c=10)  # bind c by keyword, leaving only x free
pool = Pool(processes=4)
final_answer = pool.map(f2, some_list)
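For completeness, a self-contained sketch of the same idea (the body of f here is just a placeholder):
import functools
from multiprocessing import Pool

def f(a, x, c):
    return (a, x, c)

if __name__ == "__main__":
    g = functools.partial(f, 10, c=10)  # only x remains free
    with Pool(processes=4) as pool:
        print(pool.map(g, [1, 2, 3]))   # [(10, 1, 10), (10, 2, 10), (10, 3, 10)]
Note that functools.partial objects pickle fine as long as the wrapped function is a named, module-level function, which is why this sidesteps the lambda problem.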

Related

Automatic change of defaultdict (python 3)

I encountered a problem while translating the following line from Python 2 to Python 3:
fmap = defaultdict(count(1).next)
I changed count(1).next to next(count(1))
but get this error:
fmap = defaultdict(next(count(1)))
TypeError: first argument must be callable or None
I guess this line is intended to assign a new default value each time. Do you have any suggestions?
Thanks
The error is clear - the first argument to a defaultdict must be a callable (a function or a class name, for example), or None. This callable is invoked whenever a key does not exist, to construct the default value. On the other hand:
next(count(3))
will return an integer, which is not callable, and so makes no sense here. If you want the defaultdict to default to an increasing number whenever a missing key is used, then something close to what you have is:
>>> x=defaultdict(lambda x=count(30): next(x))
>>> x[1]
30
>>> x[2]
31
>>> x[3]
32
>>> x[4]
33
The .next() method on iterators has been renamed in Python 3. Use .__next__() instead.
Code
from collections import defaultdict
from itertools import count

fmap = defaultdict(count(1).__next__)
Demo
fmap["a"]
# 1
fmap["b"]
# 2
Note that defaultdict needs a callable argument - something that will act as a function - hence the parentheses are omitted: you pass count(1).__next__ itself rather than calling it.
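To make the distinction concrete, a tiny sketch of why one form works and the other raises the TypeError above:
from collections import defaultdict
from itertools import count

counter = count(1)
callable(counter.__next__)   # True  -> usable as a default factory
callable(next(counter))      # False -> just an int, hence the TypeError

fmap = defaultdict(counter.__next__)
fmap["a"], fmap["b"]         # (2, 3) here, since next() above already consumed 1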

How to use assocs with HashMap in Haskell

Currently I'm trying to use Map's assocs function, but I'm unable to figure out how to get it to work for a HashMap. For a regular Map the following works just fine.
import qualified Data.Map as M
test = M.fromList [("a", 1), ("b", 2)]
M.assocs test
However, when I try the same thing with a HashMap it doesn't work. I tried several variations on the import, all of which fail with different errors. Oddly, however, most other map functions work just fine with the import below; for example, I have no trouble using M.lookup.
import qualified Data.HashMap.Lazy as M
test = M.fromList [("a", 1), ("b", 2)]
M.assocs test
In case it is useful the above code gives the following error:
<interactive>:1:1: error:
Not in scope: ‘M.assocs’
No module named ‘M’ is imported.
Data.HashMap.Lazy, from unordered-containers, does not export an assocs function.
You might be thinking of Data.HashMap from the hashmap package.
I figured out the answer. In Data.HashMap.Lazy, the function toList serves the same purpose as assocs. As such, the following code works.
import qualified Data.HashMap.Lazy as M
test = M.fromList [("a", 1), ("b", 2)]
M.toList test

Why does setting an initialization value prevent placing a variable on a GPU in TensorFlow?

I get an exception when I try to run the following very simple TensorFlow code, although I virtually copied it from the documentation:
import tensorflow as tf
with tf.device("/gpu:0"):
    x = tf.Variable(0, name="x")
sess = tf.Session()
sess.run(x.initializer)  # Bombs!
The exception is:
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to
node 'x': Could not satisfy explicit device specification '/device:GPU:0' because
no supported kernel for GPU devices is available.
If I change the variable's initial value to tf.zeros([1]) instead, everything works fine:
import tensorflow as tf
with tf.device("/gpu:0"):
    x = tf.Variable(tf.zeros([1]), name="x")
sess = tf.Session()
sess.run(x.initializer)  # Works fine
Any idea what's going on?
This error arises because tf.Variable(0, ...) defines a variable of element type tf.int32, and there is no kernel that implements int32 variables on GPU in the standard TensorFlow distribution. When you use tf.Variable(tf.zeros([1])), you're defining a variable of element type tf.float32, which is supported on GPU.
The story of tf.int32 on GPUs in TensorFlow is a long one. While it's technically easy to support integer operations running on a GPU, our experience has been that most integer operations actually take place on the metadata of tensors, and this metadata lives on the CPU, so it's more efficient to operate on it there. As a short-term workaround, several kernel registrations for int32 on GPUs were removed. However, if these would be useful for your models, it would be possible to add them as custom ops.
Source: In TensorFlow 0.10, the Variable-related kernels are registered using the TF_CALL_GPU_NUMBER_TYPES() macro. The current "GPU number types" are tf.float16, tf.float32, and tf.float64.
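If an int32 variable is genuinely needed, a common workaround in the graph-based API of that era is to pin it to the CPU explicitly, or to let TensorFlow fall back automatically with allow_soft_placement. A rough sketch under those assumptions, not a definitive recipe for any particular release:
import tensorflow as tf

with tf.device("/cpu:0"):
    step = tf.Variable(0, name="step")        # int32, kept on the CPU

with tf.device("/gpu:0"):
    x = tf.Variable(tf.zeros([1]), name="x")  # float32, has a GPU kernel

# allow_soft_placement lets ops without a GPU kernel fall back to the CPU
# instead of raising the InvalidArgumentError shown above.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
sess.run(step.initializer)
sess.run(x.initializer)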

Use string in subprocess

I've written Python code to compute an IP address programmatically, which I then want to use in an external connection program.
I don't know how to pass it to the subprocess:
import subprocess
from subprocess import call
some_ip = "192.0.2.0" # Actually the result of some computation,
# so I can't just paste it into the call below.
subprocess.call("given.exe -connect host (some_ip)::5631 -Password")
I've read what I could and found similar questions, but I truly cannot understand this step - how to use the value of some_ip in the subprocess call. If someone could explain this to me it would be greatly appreciated.
If you don't use it with shell=True (and I don't recommend shell=True unless you really know what you're doing, as shell mode can have security implications), subprocess.call takes the command as a sequence (e.g. a list) of its components: first the executable name, then the arguments you want to pass to it. All of those should be strings, but whether they are string literals, variables holding a string, or function calls returning a string doesn't matter.
Thus, the following should work:
import subprocess
some_ip = "192.0.2.0" # Actually the result of some computation.
subprocess.call(
    ["given.exe", "-connect", "host", "{}::5631".format(some_ip), "-Password"])
I'm using str's format method to replace the {} placeholder in "{}::5631" with the string in some_ip.
If you invoke it as subprocess.call(...), then
import subprocess
is sufficient and
from subprocess import call
is unnecessary. The latter would be needed if you want to invoke the function as just call(...). In that case the former import would be unneeded.
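For completeness, a small sketch of the same call with its exit status checked (the executable name and flags are copied from the question; the check itself is just illustrative):
import subprocess

some_ip = "192.0.2.0"  # result of the earlier computation
cmd = ["given.exe", "-connect", "host", "{}::5631".format(some_ip), "-Password"]

# call() returns the child's exit status; a non-zero value usually means failure.
ret = subprocess.call(cmd)
if ret != 0:
    print("given.exe exited with status", ret)
On Python 3.6+ the placeholder could equally be written as an f-string, f"{some_ip}::5631".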

Python function object on Map function going weird. (Spark)

I have a dictionary that maps a key to a function object. Then, using Spark 1.4.1 (Spark may not even be relevant for this question), I try to map each object in the RDD using a function object retrieved from the dictionary (acts as look-up table). e.g. a small snippet of my code:
fnCall = groupFnList[0].fn
pagesRDD = pagesRDD.map(lambda x: [x, fnCall(x[0])]).map(shapeToTuple)
Now it has fetched the function object from a namedtuple, which I temporarily 'store' (i.e. keep a reference to) in fnCall. Then, using the map operations, I want the x[0] element of each tuple to be processed using that function.
All is fine and good in that there indeed IS a fn object, but it behaves in a weird way.
Each time I call an action method on the RDD, even without having used a fn obj in between, the RDD values have changed! To visualize this I have created dummy functions for the fn objects that just output a random integer. After calling the fn obj on the RDD, I can inspect it with .take() or .first() and get the following:
pagesRDD.first()
>>> [(u'myPDF1.pdf', u'34', u'930', u'30')]
pagesRDD.first()
>>> [(u'myPDF1.pdf', u'23', u'472', u'11')]
pagesRDD.first()
>>> [(u'myPDF1.pdf', u'4', u'69', u'25')]
So it seems to me that the RDD's elements have the functions bound to them in some way, and each time I do an action operation (like .first(), very simple) it 'updates' the RDD's contents.
I don't want this to happen! I just want the function to process the RDD ONLY when I call it with a map operation. How can I 'unbind' this function after the map operation?
Any ideas?
Thanks!
####### UPDATE:
So apparently rewriting my code to call it like pagesRDD.map(fnCall) should do the trick, but why should this even matter? If I call
rdd = rdd.map(lambda x: (x,1))
rdd.first()
>>> # some output
rdd.first()
>>> # same output as before!
So in this case, using a lambda function, it does not get bound to the rdd and is not called each time I do a .take()-like action. So why is that the case when I use a fn object INSIDE the lambda? Logically it just does not make sense to me. Any explanation of this?
If you redefine your functions so that their parameter is the RDD element itself, your code should look like this:
pagesRDD = pagesRDD.map(fnCall).map(shapeToTuple)
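For reference, a minimal, self-contained sketch of the pattern, with hypothetical stand-ins for pagesRDD and the fn objects. It also illustrates the likely source of the changing values: RDD transformations are lazy and the lineage is re-executed on every action, so a non-deterministic function produces different output each time unless the result is cached.
import random
from pyspark import SparkContext

sc = SparkContext(appName="fn-object-demo")  # hypothetical local setup

def dummy_fn(name):
    # stand-in for a fn object from groupFnList: deliberately non-deterministic
    return random.randint(0, 100)

pagesRDD = sc.parallelize([(u'myPDF1.pdf',), (u'myPDF2.pdf',)])

# Lazy transformation: nothing runs until an action is called, and every
# action re-runs it, so dummy_fn is re-evaluated on each .first()/.take().
mapped = pagesRDD.map(lambda x: [x, dummy_fn(x[0])]).cache()

mapped.count()         # materialize (and cache) the whole RDD once
print(mapped.first())
print(mapped.first())  # same output: values are now read from the cache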
