Porting to Python3: PyPDF2 mergePage() gives TypeError - python-3.4

I'm using Python 3.4.2 and PyPDF2 1.24 (also using reportlab 3.1.44 in case that helps) on windows 7.
I recently upgraded from Python 2.7 to 3.4, and am in the process of porting my code. This code is used to create a blank pdf page with links embedded in it (using reportlab) and merge it (using PyPDF2) with an existing pdf page. I had an issue with reportlab in that saving the canvas used StringIO which needed to be changed to BytesIO, but after doing that I ran into this error:
Traceback (most recent call last):
File "C:\cms_software\pdf_replica\builder.py", line 401, in merge_pdf_files
input_page.mergePage(link_page)
File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 2013, in mergePage
self.mergePage(page2)
File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 2059, in mergePage
page2Content = PageObject._pushPopGS(page2Content, self.pdf)
File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 1973, in _pushPopGS
stream = ContentStream(contents, pdf)
File "C:\Python34\lib\site-packages\PyPDF2\pdf.py", line 2446, in __init
stream = BytesIO(b_(stream.getData()))
File "C:\Python34\lib\site-packages\PyPDF2\generic.py", line 826, in getData
decoded._data = filters.decodeStreamData(self)
File "C:\Python34\lib\site-packages\PyPDF2\filters.py", line 326, in decodeStreamData
data = ASCII85Decode.decode(data)
File "C:\Python34\lib\site-packages\PyPDF2\filters.py", line 264, in decode
data = [y for y in data if not (y in ' \n\r\t')]
File "C:\Python34\lib\site-packages\PyPDF2\filters.py", line 264, in
data = [y for y in data if not (y in ' \n\r\t')]
TypeError: 'in <string>' requires string as left operand, not int
Here is the line and the line above where the traceback mentions:
link_page = self.make_pdf_link_page(pdf, size, margin, scale_factor, debug_article_links)
if link_page != None:
input_page.mergePage(link_page)
Here are the relevant parts of that make_pdf_link_page function:
packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=(size['width'], size['height']))
....# left out code here is just reportlab specifics for size and url stuff
can.linkURL(url, r1, thickness=1, color=colors.green)
can.rect(x1, y1, width, height, stroke=1, fill=0)
# create a new PDF with Reportlab that has the url link embedded
can.save()
packet.seek(0)
try:
new_pdf = PdfFileReader(packet)
except Exception as e:
logger.exception('e')
return None
return new_pdf.getPage(0)
I'm assuming it's a problem with using BytesIO, but I can't create the page with reportlab with StringIO. This is a critical feature that used to work perfectly with Python 2.7, so I'd appreciate any kind of feedback on this. Thanks!
UPDATE:
I've also tried changing from using BytesIO to just writing to a temp file, then merging. Unfortunately I got the same error.
Here is tempfile version:
import tempfile
temp_dir = tempfile.gettempdir()
temp_path = os.path.join(temp_dir, "tmp.pdf")
can = canvas.Canvas(temp_path, pagesize=(size['width'], size['height']))
....
can.showPage()
can.save()
try:
new_pdf = PdfFileReader(temp_path)
except Exception as e:
logger.exception('e')
return None
return new_pdf.getPage(0)
UPDATE:
I found an interesting bit of information on this. It seems if I comment out the can.rect and can.linkURL calls it will merge. So drawing anything on a page, then trying to merge it with my existing pdf is causing the error.

After digging in to PyPDF2 library code, I was able to find my own answer. For python 3 users, old libraries can be tricky. Even if they say they support python 3, they don't necessarily test everything. In this case, the problem was with the class ASCII85Decode in filters.py in PyPDF2. For python 3, this class needs to return bytes. I borrowed the code for this same type of function from pdfminer3k, which is a port for python 3 of pdfminer. If you exchange the ASCII85Decode() class for this code, it will work:
import struct
class ASCII85Decode(object):
def decode(data, decodeParms=None):
if isinstance(data, str):
data = data.encode('ascii')
n = b = 0
out = bytearray()
for c in data:
if ord('!') <= c and c <= ord('u'):
n += 1
b = b*85+(c-33)
if n == 5:
out += struct.pack(b'>L',b)
n = b = 0
elif c == ord('z'):
assert n == 0
out += b'\0\0\0\0'
elif c == ord('~'):
if n:
for _ in range(5-n):
b = b*85+84
out += struct.pack(b'>L',b)[:n-1]
break
return bytes(out)

Related

detectron2 diffusioninst: oom-kill during training

I tried to run code for DiffusionInst based on Detectron2 (source code: https://github.com/chenhaoxing/DiffusionInst). During my training, my python process has always been killed (at 10000-20000 iteration epochs, which is insufficient for diffisioninst training).
I only rewrite the code for dataloader, in order to adapt to my own dataset.
My new code for dataloader:
class DiffusionInstDatasetMapper:
"""
A callable which takes a dataset dict in Detectron2 Dataset format,
and map it into a format used by DiffusionInst.
The callable currently does the following:
1. Read the image from "file_name"
2. Applies geometric transforms to the image and annotation
3. Find and applies suitable cropping to the image and annotation
4. Prepare image and annotation to Tensors
"""
def __init__(self, cfg, is_train=True):
if cfg.INPUT.CROP.ENABLED and is_train:
self.crop_gen = [
# T.ResizeShortestEdge([400, 500, 600], sample_style="choice"),
T.RandomCrop(cfg.INPUT.CROP.TYPE, cfg.INPUT.CROP.SIZE),
]
else:
self.crop_gen = None
self.tfm_gens = build_transform_gen(cfg, is_train)
logging.getLogger(__name__).info(
"Full TransformGens used in training: {}, crop: {}".format(str(self.tfm_gens), str(self.crop_gen))
)
self.img_format = cfg.INPUT.FORMAT
self.is_train = is_train
def __call__(self, dataset_dict):
"""
Args:
dataset_dict (dict): Metadata of one image, in Detectron2 Dataset format.
Returns:
dict: a format that builtin models in detectron2 accept
"""
dataset_dict = copy.deepcopy(dataset_dict) # it will be modified by code below
# image = utils.read_image(dataset_dict["file_name"], format=self.img_format)
## crop roi
'''lst = dataset_dict['file_name'].split('-')
image = sitk.ReadImage('-'.join(lst[:-2]))
image = sitk.GetArrayFromImage(image)
above, below = int(lst[-2]), int(lst[-1])
image = image[:, above:below, :]'''
## no crop roi
image = sitk.ReadImage(dataset_dict["file_name"],sitk.sitkFloat32)
image = sitk.GetArrayFromImage(image)
# print('**********************',image.shape,'************************')
image = (image - image.min()) / (image.max() - image.min()) * 255
#print(image.dtype)
image = image.transpose(1, 2, 0).astype(np.uint8)
image = np.repeat(image, 3, axis=2)
#print(image.dtype)
utils.check_image_size(dataset_dict, image)
#origshape = image.shape
if self.crop_gen is None:
image, transforms = T.apply_transform_gens(self.tfm_gens, image)
else:
image, transforms = T.apply_transform_gens(
self.tfm_gens + self.crop_gen, image
)
#print('orig', origshape, '\t\tresized', image.shape)
image_shape = image.shape[:2] # h, w
# Pytorch's dataloader is efficient on torch.Tensor due to shared-memory,
# but not efficient on large generic data structures due to the use of pickle & mp.Queue.
# Therefore it's important to use torch.Tensor.
dataset_dict["image"] = torch.as_tensor(np.ascontiguousarray(image.transpose(2, 0, 1)))
del image
gc.collect()
if not self.is_train:
# USER: Modify this if you want to keep them for some reason.
dataset_dict.pop("annotations", None)
return dataset_dict
if "annotations" in dataset_dict:
# USER: Modify this if you want to keep them for some reason.
# import pdb;pdb.set_trace()
for anno in dataset_dict["annotations"]:
# anno.pop("segmentation", None)
anno.pop("keypoints", None)
# USER: Implement additional transformations if you have other types of data
annos = [
utils.transform_instance_annotations(obj, transforms, image_shape)
for obj in dataset_dict.pop("annotations")
if obj.get("iscrowd", 0) == 0
]
instances = utils.annotations_to_instances(annos, image_shape, mask_format="bitmask")
dataset_dict["instances"] = utils.filter_empty_instances(instances)
del instances
gc.collect()
return dataset_dict
And the information about the oom-killer:
[2599547.303018] python invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=995
[2599547.303084] [<ffffffff8119bfae>] oom_kill_process+0x1fe/0x3c0
[2599547.303133] Task in /kubepods/burstable/podd09a5032-8b07-11ed-bb60-ac1f6b9ec91e/8b4a8d5c2c1a082f93b1610173beb70bbc19fb1a1c2e28150d2d912ed9b95b10 killed as a result of limit of /kubepods/burstable/podd09a5032-8b07-11ed-bb60-ac1f6b9ec91e
[2599547.305957] Memory cgroup out of memory: Kill process 1041771 (python) score 1198 or sacrifice child
[2599547.307810] Killed process 1041771 (python) total-vm:36436532kB, anon-rss:10288264kB, file-rss:104888kB
[2599718.702250] python invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=995
[2599718.702299] [<ffffffff8119bfae>] oom_kill_process+0x1fe/0x3c0
[2599718.702333] Task in /kubepods/burstable/podd09a5032-8b07-11ed-bb60-ac1f6b9ec91e/8b4a8d5c2c1a082f93b1610173beb70bbc19fb1a1c2e28150d2d912ed9b95b10 killed as a result of limit of /kubepods/burstable/podd09a5032-8b07-11ed-bb60-ac1f6b9ec91e
I set IMS_PER_BATCH to 1, and used a dataset which contains only 1 image, but the oom problem still occurred.
I wonder know what should i do to prevent oom problem?

Transposition Cipher returning wrong results

This is the source code I got from https://inventwithpython.com/hacking
import math, pyperclip
def main():
myMessage = 'Cenoonommstmme oo snnio. s s c'
myKey = 8
plaintext = decryptMessage(myKey, myMessage)
print(plaintext + '|')
pyperclip.copy(plaintext)
def decryptMessage(key, message):
numOfColumns = math.ceil(len(message) / key)
numOfRows = key
numOfShadedBoxes = (numOfColumns * numOfRows) - len(message)
plaintext = [''] * int(numOfColumns)
col = 0
row = 0
for symbol in message:
plaintext[col] += symbol
col += 1
if (col == numOfColumns) or (col == numOfColumns - 1 and row >= numOfRows - numOfShadedBoxes):
col = 0
row += 1
return ''.join(plaintext)
if __name__ == '__main__':
main()
What this Should be returning is
Common sence is not so common.|
What im getting back is
Coosmosi seomteonos nnmm n. c|
I cant figure out where the code is failing to send back the phrase
The code is OK. The problem is that you're using the wrong version of Python. As the 'Installation' chapter of that website says:
Important Note! Be sure to install Python 3, and not Python 2. The
programs in this book use Python 3, and you’ll get errors if you try
to run them with Python 2. It is so important, I am adding a cartoon
penguin telling you to install Python 3 so that you do not miss this
message:
You are using Python 2 to run the program.
The result is incorrect because the program depends on a feature that behaves differently in Python 2 than in Python 3. Specifically, dividing two integers in Python 3 produces a floating-point result but in Python 2 it produces a rounded-down integer result. So this expression:
(len(message) / key)
produces 3.75 in Python 3 but produces 3 in Python 2, and therefore this expression:
math.ceil(len(message) / key)
produces 4 (3.75 rounded up is 4) in Python 3 but produces 3 (3 rounded up is 3) in Python 2. This means that your numOfColumns is incorrect and therefore the decryption procedure produces an incorrect result.
You can fix this specific issue by changing (len(message) / key) to (float(len(message)) / key) to force Python 2 to treat that calculation as a floating-point division that will give the desired 3.75 result. But the real solution is to switch to using Python 3, because these differences in behaviour between Python 3 and Python 2 are just going to keep causing trouble as you proceed through the rest of the book.

Error in running a Python code from R with the package rPithon

I would like to run this Python code from R:
>>> import nlmpy
>>> nlm = nlmpy.mpd(nRow=50, nCol=50, h=0.75)
>>> nlmpy.exportASCIIGrid("raster.asc", nlm)
Nlmpy is a Python package to build neutral landscape models. The example comes from the website
To run this Python code from R, I 'm trying to use the package rPithon. However, I obtain this error message:
if (pithon.available())
{
nRow <- 50
nCol <- 50
h <- 0.75
# this file contains the definition of function concat
pithon.load("C:/Users/Anaconda2/Lib/site-packages/nlmpy/nlmpy.py")
pithon.call( "mpd", nRow, nCol, h)
} else {
print("Unable to execute python")
}
Error in pithon.get("_r_call_return", instance.name = instname) :
Couldn't retrieve variable: Traceback (most recent call last):
File "C:/Users/Documents/R/win-library/3.3/rPithon/pythonwrapperscript.py", line 110, in <module>
reallyReallyLongAndUnnecessaryPrefix.data = json.dumps([eval(reallyReallyLongAndUnnecessaryPrefix.argData)])
File "C:\Users\ANACON~1\lib\json\__init__.py", line 244, in dumps
return _default_encoder.encode(obj)
File "C:\Users\ANACON~1\lib\json\encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Users\ANACON~1\lib\json\encoder.py", line 270, in iterencode
return _iterencode(o, 0)
File "C:\Users\ANACON~1\lib\json\encoder.py", line 184, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([[ 0.36534654, 0.31962481, 0.44229946, ..., 0.11513079,
0.07156331, 0.00286971], [ 0.41534291, 0.41333479, 0.48118995, ..., 0.19203674,
0.04192771, 0.03679473], [ 0.5188
Is this error caused by a syntax issue in my code ? I work with the Anaconda 4.2.0 platform for Windows which uses the Python 2.7 version.
I haven't used the nlmpy package hence, I am not sure what would be your expected output. However, this code successfully communicates between R and Python.
There are two files,
nlmpyInR.R
command ="python"
path2script="path_to_your_pythoncode/nlmpyInPython.py"
nRow <-50
nCol <-50
h <- 0.75
# Build up args in a vector
args = c(nRow, nCol, h)
# Add path to script as first arg
allArgs = c(path2script, args)
Routput = system2(command, args=allArgs, stdout=TRUE)
#The command would be python nlmpyInPython.py 50 50 0.75
print(paste("The Output is:\n", Routput))
nlmpyInPython.py
import sys
import nlmpy
#Getting the arguments from the command line call
nRow = sys.argv[1]
nCol = sys.argv[2]
h = sys.argv[3]
nlm = nlmpy.mpd(nRow, nCol, h)
pyhtonOutput = nlmpy.exportASCIIGrid("raster.asc", nlm)
#Whatever you print will get stored in the R's output variable.
print pyhtonOutput
The cause of the error that you're getting is hinted at by the
"is not JSON serializable" line. Your R code calls the mpd
function with certain arguments, and that function itself will
execute correctly. The rPithon library will then try to send the
return value of the function back to R, and to do this it will try
to create a JSON object
that describes the return value.
This works well for integers, floating point values, arrays, etc,
but not every kind of Python object can be converted to such a
JSON representation. And because rPithon can't convert the return value
of mpd this way, an error is generated.
You can still use rPithon to call the mpd function though. The following
code creates a new Python function that performs two steps: first
it calls the mpd function with the specified parameters, and then it
exports the result to a file, of which the filename is also an argument.
Using rPithon, the new function is then called from R. Because myFunction doesn't return anything, representing the return value in JSON format will not be a problem.
library("rPithon")
pythonCode = paste("import nlmpy.nlmpy as nlmpy",
"",
"def myFunction(nRow, nCol, h, fileName):",
" nlm = nlmpy.mpd(nRow, nCol, h)",
" nlmpy.exportASCIIGrid(fileName, nlm)",
sep = "\n")
pithon.exec(pythonCode)
nRow <- 50
nCol <- 50
h <- 0.75
pithon.call("myFunction", nRow, nCol, h, "outputraster.asc")
Here, the Python code defined as an R string, and executed using
pithon.exec. You could also put that Python code in a separate file
and use pithon.load to process the code so that the myFunction
function is known.

Tensorflow : how to insert custom input to existing graph?

I have downloaded a tensorflow GraphDef that implements a VGG16 ConvNet, which I use doing this :
Pl['images'] = tf.placeholder(tf.float32,
[None, 448, 448, 3],
name="images") #batch x width x height x channels
with open("tensorflow-vgg16/vgg16.tfmodel", mode='rb') as f:
fileContent = f.read()
graph_def = tf.GraphDef()
graph_def.ParseFromString(fileContent)
tf.import_graph_def(graph_def, input_map={"images": Pl['images']})
Besides, I have image features that are homogeneous to the output of the "import/pool5/".
How can I tell my graph that don't want to use his input "images", but the tensor "import/pool5/" as input ?
Thank's !
EDIT
OK I realize I haven't been very clear. Here is the situation:
I am trying to use this implementation of ROI pooling, using a pre-trained VGG16, which I have in the GraphDef format. So here is what I do:
First of all, I load the model:
tf.reset_default_graph()
with open("tensorflow-vgg16/vgg16.tfmodel",
mode='rb') as f:
fileContent = f.read()
graph_def = tf.GraphDef()
graph_def.ParseFromString(fileContent)
graph = tf.get_default_graph()
Then, I create my placeholders
images = tf.placeholder(tf.float32,
[None, 448, 448, 3],
name="images") #batch x width x height x channels
boxes = tf.placeholder(tf.float32,
[None,5], # 5 = [batch_id,x1,y1,x2,y2]
name = "boxes")
And I define the output of the first part of the graph to be conv5_3/Relu
tf.import_graph_def(graph_def,
input_map={'images':images})
out_tensor = graph.get_tensor_by_name("import/conv5_3/Relu:0")
So, out_tensor is of shape [None,14,14,512]
Then, I do the ROI pooling:
[out_pool,argmax] = module.roi_pool(out_tensor,
boxes,
7,7,1.0/1)
With out_pool.shape = N_Boxes_in_batch x 7 x 7 x 512, which is homogeneous to pool5. I would then like to feed out_pool as an input to the op that comes just after pool5, so it would look like
tf.import_graph_def(graph.as_graph_def(),
input_map={'import/pool5':out_pool})
But it doesn't work, I have this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-89-527398d7344b> in <module>()
5
6 tf.import_graph_def(graph.as_graph_def(),
----> 7 input_map={'import/pool5':out_pool})
8
9 final_out = graph.get_tensor_by_name("import/Relu_1:0")
/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/importer.py in import_graph_def(graph_def, input_map, return_elements, name, op_dict)
333 # NOTE(mrry): If the graph contains a cycle, the full shape information
334 # may not be available for this op's inputs.
--> 335 ops.set_shapes_for_outputs(op)
336
337 # Apply device functions for this op.
/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py in set_shapes_for_outputs(op)
1610 raise RuntimeError("No shape function registered for standard op: %s"
1611 % op.type)
-> 1612 shapes = shape_func(op)
1613 if len(op.outputs) != len(shapes):
1614 raise RuntimeError(
/home/hbenyounes/vqa/roi_pooling_op_grad.py in _roi_pool_shape(op)
13 channels = dims_data[3]
14 print(op.inputs[1].name, op.inputs[1].get_shape())
---> 15 dims_rois = op.inputs[1].get_shape().as_list()
16 num_rois = dims_rois[0]
17
/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py in as_list(self)
745 A list of integers or None for each dimension.
746 """
--> 747 return [dim.value for dim in self._dims]
748
749 def as_proto(self):
TypeError: 'NoneType' object is not iterable
Any clue ?
It is usually very convenient to use tf.train.export_meta_graph to store the whole MetaGraph. Then, upon restoring you can use tf.train.import_meta_graph, because it turns out that it passes all additional arguments to the underlying import_scoped_meta_graph which has the input_map argument and utilizes it when it gets to it's own invocation of import_graph_def.
It is not documented, and took me waaaay toooo much time to find it, but it works!
What I would do is something along those lines:
-First retrieve the names of the tensors representing the weights and biases of the 3 fully connected layers coming after pool5 in VGG16.
To do that I would inspect [n.name for n in graph.as_graph_def().node].
(They probably look something like import/locali/weight:0, import/locali/bias:0, etc.)
-Put them in a python list:
weights_names=["import/local1/weight:0" ,"import/local2/weight:0" ,"import/local3/weight:0"]
biases_names=["import/local1/bias:0" ,"import/local2/bias:0" ,"import/local3/bias:0"]
-Define a function that look something like:
def pool5_tofcX(input_tensor, layer_number=3):
flatten=tf.reshape(input_tensor,(-1,7*7*512))
tmp=flatten
for i in xrange(layer_number):
tmp=tf.matmul(tmp, graph.get_tensor_by_name(weights_name[i]))
tmp=tf.nn.bias_add(tmp, graph.get_tensor_by_name(biases_name[i]))
tmp=tf.nn.relu(tmp)
return tmp
Then define the tensor using the function:
wanted_output=pool5_tofcX(out_pool)
Then you are done !
Jonan Georgiev provided an excellent answer here. The same approach was also described with little fanfare at the end of this git issue: https://github.com/tensorflow/tensorflow/issues/3389
Below is a copy/paste runnable example of using this approach to switch out a placeholder for a tf.data.Dataset get_next tensor.
import tensorflow as tf
my_placeholder = tf.placeholder(dtype=tf.float32, shape=1, name='my_placeholder')
my_op = tf.square(my_placeholder, name='my_op')
# Save the graph to memory
graph_def = tf.get_default_graph().as_graph_def()
print('----- my_op before any remapping -----')
print([n for n in graph_def.node if n.name == 'my_op'])
tf.reset_default_graph()
ds = tf.data.Dataset.from_tensors(1.0)
next_tensor = tf.data.make_one_shot_iterator(ds).get_next(name='my_next_tensor')
# Restore the graph with a custom input mapping
tf.graph_util.import_graph_def(graph_def, input_map={'my_placeholder': next_tensor}, name='')
print('----- my_op after remapping -----')
print([n for n in tf.get_default_graph().as_graph_def().node if n.name == 'my_op'])
Output, where we can clearly see that the input to the square operation has changed.
----- my_op before any remapping -----
[name: "my_op"
op: "Square"
input: "my_placeholder"
attr {
key: "T"
value {
type: DT_FLOAT
}
}
]
----- my_op after remapping -----
[name: "my_op"
op: "Square"
input: "my_next_tensor"
attr {
key: "T"
value {
type: DT_FLOAT
}
}
]

Pyqt bind function to dinamically generated push buttons

Using the PyQT library I am currently having issues with binding a function to the dinamically generated pushbuttons resulting from a loop.
So far I have managed to generate the buttons and bind a function to them with the lambda command. The problem is that since I need every single button to open a different file I find myself in an odd situation as all the button do open the same one. The last value assigned to the variable.
Any idea on how to fix the situation? As a last note, sorry in case of stupid mistakes. I am new to OOP and PyQT.
def searchTheStuff(self):
found = 0
data = MainWindow.intermediateCall()
f1 = open('Path.txt', 'r')
path = f1.read()
f1.close()
Yinc = 25
Y = 40
X = 20
results = 0
for path, dirs, files in os.walk(path, topdown=True):
for name in files:
if name.endswith('.txt'):
fullpath = os.path.join(path, name)
mail = open_close(fullpath)
if mail.find(data) != -1 and results<3:
found = 1
self.buttons.append(QtGui.QPushButton(self))
print fullpath
command = lambda : webbrowser.open()
self.buttons[-1].clicked.connect(command)
self.buttons[-1].setText(_translate("self", name, None))
self.buttons[-1].setGeometry(QtCore.QRect(X, Y, 220, 20))
results = results+1
if results == 33:
X = 260
Y = 15
Y = Y + Yinc
if found == 0:
self.label = QtGui.QLabel(self)
self.label.setGeometry(QtCore.QRect(20, 40, 321, 21))
self.label.setText(_translate("self", "No templates have been found:", None))

Resources