For example, is it possible to do something like this (this fails):
def map = [ property: 1,
propertyPlusOne: map.property + 1]
Of course, it's possible to do it like this:
def map = [:]
map.property = 1
map.propertyPlusOne = map.property + 1
But can it all be done in the declaration?
You could use the with method:
def map = [ : ].with {
property = 1
propertyPlusOne = property + 1
it
}
assert map.propertyPlusOne == 2
Though something like Ruby's tap (or #timyates' extension) is slightly cleaner:
def map = [ : ].tap {
property = 1
propertyPlusOne = property + 1
}
assert map.propertyPlusOne == 2
Generally, no.
You have to define and initialize your map variable first to be able to set values that refer back to it:
def map = [ property: 1 ]
map += [ propertyPlusOne: map.property + 1]
I'm not sure what you are up to, but it might be worth checking the withDefault() method.
class BinarySearchTree:
def __init__(self, value=None, left=None, right=None):
self.value = value
self.left = left
self.right = right
def insert(self, value) -> bool:
current_node = self
while current_node is not None:
if current_node.value < value:
current_node = current_node.left
elif current_node.value == value:
return False
elif current_node.value > value:
current_node = current_node.right
# id(current_node) <-- I need to create a object on this
current_node = BinarySearchTree(value) # <--
# id(current_node) <-- This is a new assigned id, but I need the same as the previous id
return True
binary_search_tree = BinarySearchTree(2)
binary_search_tree.insert(5)
print(binary_search_tree.__dict__)
current_node refers to current_node.left or current_node.right.
I need to create a new object and assign it to the pointer that current_node is referring to,
but I only create a new object and assign it to a new pointer.
a new assigned id, but I need the same as the previous id
After the loop, as Scott Hunter noted, current_node is None, so the previous id is the “identity” of None, and you surely understand that a new object's id cannot be the same as the id of None. What you could do is modify the loop in a way which allows you to refer to the attribute left or right afterwards, as needed:
while current_node is not None:
object = current_node # remember the node to be changed
if current_node.value < value:
current_node = current_node.right
name = 'right' # remember the attribute to be changed
elif current_node.value == value:
return False
elif current_node.value > value:
current_node = current_node.left
name = 'left' # remember the attribute to be changed
setattr(object, name, BinarySearchTree(value)) # make the change
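Assembled into the whole method, the same idea looks like this (a sketch; I renamed object to parent to avoid shadowing the built-in):
def insert(self, value) -> bool:
    current_node = self
    while current_node is not None:
        parent = current_node  # remember the node to be changed
        if current_node.value < value:
            current_node = current_node.right
            name = 'right'  # remember the attribute to be changed
        elif current_node.value == value:
            return False
        else:
            current_node = current_node.left
            name = 'left'  # remember the attribute to be changed
    setattr(parent, name, BinarySearchTree(value))  # make the change after the loop
    return True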
Alternatively, you could make things a bit simpler by replacing, in __init__,
self.left = left
self.right = right
with
self.subtree = [left, right]
and the whole body of insert with
if value == self.value: return False
t = 0 if value < self.value else 1
if self.subtree[t]: return self.subtree[t].insert(value)
self.subtree[t] = BinarySearchTree(value)
return True
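For reference, here is the simplified class assembled as a whole; the quick check at the end is my own illustration, not part of the original:
class BinarySearchTree:
    def __init__(self, value=None, left=None, right=None):
        self.value = value
        self.subtree = [left, right]  # subtree[0] is the left child, subtree[1] the right child

    def insert(self, value) -> bool:
        if value == self.value:
            return False  # duplicates are rejected
        t = 0 if value < self.value else 1  # choose the left or right slot
        if self.subtree[t]:
            return self.subtree[t].insert(value)  # descend recursively
        self.subtree[t] = BinarySearchTree(value)  # empty slot: attach the new node here
        return True

tree = BinarySearchTree(2)
assert tree.insert(5)
assert not tree.insert(5)  # second insert of 5 is a duplicate
assert tree.subtree[1].value == 5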
For debugging purposes before writing out tests, I am looking to get the number of key:value pairs within the one object in the array.
Right now, I have this:
"items": [
{
"id": "6b0051ad-721d-blah-blah-4dab9cf39ff4",
"external_id": "blahvekmce",
"filename": "foo-text_field-XYGLVU",
"created_date": "2019-02-11T04:10:31Z",
"last_update_date": "2019-02-11T04:10:31Z",
"file_upload_date": "2019-02-11T04:10:31Z",
"deleted_date": null,
"released_and_not_expired": true,
"asset_properties": null,
"file_properties": null,
"thumbnails": null,
"embeds": null
}
]
When I write out:
* print response.items.length // returns 1
When I write out:
* print response.items[0].length
it doesn't return anything.
Any thoughts on how I can approach this?
There are multiple ways, but this should work, plus you see how to get the keys as well:
* def keys = []
* eval karate.forEach(response.items[0], function(x){ keys.add(x) })
* def count = keys.length
* match count == 12
Refer to the docs: https://github.com/intuit/karate#json-transforms
Karate now provides the karate.sizeOf() API to get the size of an object:
* def object = { a: 1, b: 'hello' }
* def count = karate.sizeOf(object)
* match count == 2
Ref: https://github.com/karatelabs/karate#the-karate-object
count = 0
for (var v in response.items[0]) {
count = count + 1;
}
print(count)
I am using the Google Vision API, primarily to extract text. It works fine, but for specific cases I need the API to scan an entire line and spit out its text before moving to the next line. However, it appears that the API uses some kind of logic that makes it scan top to bottom on the left side, then move to the right side and do another top-to-bottom scan. I would have liked the API to read left to right, move down, and so on.
For example, consider the image:
The API returns the text like this:
“ Name DOB Gender: Lives In John Doe 01-Jan-1970 LA ”
Whereas, I would have expected something like this:
“ Name: John Doe DOB: 01-Jan-1970 Gender: M Lives In: LA ”
I suppose there is a way to define a block size or margin setting so that it reads/scans the image line by line?
Thanks for your help.
Alex
This might be a late answer, but I'm adding it for future reference.
You can add feature hints to your JSON request to get the desired results.
{
"requests": [
{
"image": {
"source": {
"imageUri": "https://i.stack.imgur.com/TRTXo.png"
}
},
"features": [
{
"type": "DOCUMENT_TEXT_DETECTION"
}
]
}
]
}
For text that is very far apart, DOCUMENT_TEXT_DETECTION also does not provide proper line segmentation.
The following code does simple line segmentation based on the character polygon coordinates.
https://github.com/sshniro/line-segmentation-algorithm-to-gcp-vision
Here is some simple code to read line by line: the y-axis is used to group lines and the x-axis to order each word within a line.
items = []
lines = {}
for text in response.text_annotations[1:]:
top_x_axis = text.bounding_poly.vertices[0].x
top_y_axis = text.bounding_poly.vertices[0].y
bottom_y_axis = text.bounding_poly.vertices[3].y
if top_y_axis not in lines:
lines[top_y_axis] = [(top_y_axis, bottom_y_axis), []]
for s_top_y_axis, s_item in lines.items():
if top_y_axis < s_item[0][1]:
lines[s_top_y_axis][1].append((top_x_axis, text.description))
break
for _, item in lines.items():
if item[1]:
words = sorted(item[1], key=lambda t: t[0])
items.append((item[0], ' '.join([word for _, word in words]), words))
print(items)
You can also extract the text based on the bounds of each line: use boundingPoly and concatenate the text that falls on the same line.
"boundingPoly": {
"vertices": [
{
"x": 87,
"y": 148
},
{
"x": 411,
"y": 148
},
{
"x": 411,
"y": 206
},
{
"x": 87,
"y": 206
}
]
}
For example, these two words are on the same "line":
"description": "you",
"boundingPoly": {
"vertices": [
{
"x": 362,
"y": 1406
},
{
"x": 433,
"y": 1406
},
{
"x": 433,
"y": 1448
},
{
"x": 362,
"y": 1448
}
]
}
},
{
"description": "start",
"boundingPoly": {
"vertices": [
{
"x": 446,
"y": 1406
},
{
"x": 540,
"y": 1406
},
{
"x": 540,
"y": 1448
},
{
"x": 446,
"y": 1448
}
]
}
}
I get the max and min y and iterate over y to get all potential lines; here is the full code:
import io
import sys
from os import listdir
from google.cloud import vision
def read_image(image_file):
client = vision.ImageAnnotatorClient()
with io.open(image_file, "rb") as image_file:
content = image_file.read()
image = vision.Image(content=content)
return client.document_text_detection(
image=image,
image_context={"language_hints": ["bg"]}
)
def extract_paragraphs(image_file):
response = read_image(image_file)
min_y = sys.maxsize
max_y = -1
for t in response.text_annotations:
poly_range = get_poly_y_range(t.bounding_poly)
t_min = min(poly_range)
t_max = max(poly_range)
if t_min < min_y:
min_y = t_min
if t_max > max_y:
max_y = t_max
max_size = max_y - min_y
text_boxes = []
for t in response.text_annotations:
poly_range = get_poly_y_range(t.bounding_poly)
t_x = get_poly_x(t.bounding_poly)
t_min = min(poly_range)
t_max = max(poly_range)
poly_size = t_max - t_min
text_boxes.append({
'min_y': t_min,
'max_y': t_max,
'x': t_x,
'size': poly_size,
'description': t.description
})
paragraphs = []
for i in range(min_y, max_y):
para_line = []
for text_box in text_boxes:
t_min = text_box['min_y']
t_max = text_box['max_y']
x = text_box['x']
size = text_box['size']
# size < max_size excludes the biggest rect
if size < max_size * 0.9 and t_min <= i <= t_max:
para_line.append(
{
'text': text_box['description'],
'x': x
}
)
        # here I have to sort them by x so they don't get randomly shuffled
para_line = sorted(para_line, key=lambda x: x['x'])
line = " ".join(map(lambda x: x['text'], para_line))
paragraphs.append(line)
# if line not in paragraphs:
# paragraphs.append(line)
return "\n".join(paragraphs)
def get_poly_y_range(poly):
y_list = []
for v in poly.vertices:
if v.y not in y_list:
y_list.append(v.y)
return y_list
def get_poly_x(poly):
return poly.vertices[0].x
def extract_paragraphs_from_image(picName):
print(picName)
pic_path = rootPics + "/" + picName
text = extract_paragraphs(pic_path)
text_path = outputRoot + "/" + picName + ".txt"
write(text_path, text)
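The helpers rootPics, outputRoot and write() are not defined in the snippet above; something along these lines is presumably intended (an assumption on my part, not the author's code):
rootPics = "input_images"  # assumed folder containing the source images
outputRoot = "output_text"  # assumed folder for the extracted text

def write(path, text):
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)

for pic_name in listdir(rootPics):
    extract_paragraphs_from_image(pic_name)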
This code is a work in progress.
In the end, I get the same line multiple times and do post-processing to determine the exact values (the paragraphs variable). Let me know if I need to clarify anything.
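As a sketch of the kind of post-processing meant here (not part of the original answer), one simple option is to drop every line that is already contained in a longer line and collapse exact repeats:
def deduplicate_lines(paragraphs):
    lines = [p.strip() for p in paragraphs if p.strip()]
    unique = []
    for line in lines:
        # skip lines that are fragments of some other, longer line
        if any(line != other and line in other for other in lines):
            continue
        if line not in unique:  # collapse exact repeats
            unique.append(line)
    return unique

# e.g. print(deduplicate_lines(extract_paragraphs(pic_path).split("\n")))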
Inspired by Borislav's answer, I just wrote something in Python that also works for handwriting. It's messy and I am new to Python, but I think you can get an idea of how to do this.
A class to hold some extended data for each word, for example, the average y position of a word, which I used to calculate the differences between words:
import re
from operator import attrgetter
import numpy as np
class ExtendedAnnotation:
def __init__(self, annotation):
self.vertex = annotation.bounding_poly.vertices
self.text = annotation.description
self.avg_y = (self.vertex[0].y + self.vertex[1].y + self.vertex[2].y + self.vertex[3].y) / 4
self.height = ((self.vertex[3].y - self.vertex[1].y) + (self.vertex[2].y - self.vertex[0].y)) / 2
self.start_x = (self.vertex[0].x + self.vertex[3].x) / 2
def __repr__(self):
return '{' + self.text + ', ' + str(self.avg_y) + ', ' + str(self.height) + ', ' + str(self.start_x) + '}'
Create objects with that data:
def get_extended_annotations(response):
extended_annotations = []
for annotation in response.text_annotations:
extended_annotations.append(ExtendedAnnotation(annotation))
    # delete the first item, as it is the whole text I guess.
del extended_annotations[0]
return extended_annotations
Calculate the threshold.
First, all words are sorted by their y position, defined as the average of the y values of all 4 corners of a word. The x position is not relevant at this point.
Then, the difference between every word and the word that follows it is calculated. For a perfectly straight line of words, you would expect the difference in y position between every two words to be 0. Even for handwriting, it should be around 1 ~ 10.
However, whenever there is a line break, the difference between the last word of the former row and the first word of the new row is much greater than that, for example, 50 or 60.
So to decide whether there should be a line break between two words, the standard deviation of the differences is used.
def get_threshold_for_y_difference(annotations):
annotations.sort(key=attrgetter('avg_y'))
differences = []
for i in range(0, len(annotations)):
if i == 0:
continue
differences.append(abs(annotations[i].avg_y - annotations[i - 1].avg_y))
return np.std(differences)
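As a quick worked example of why the standard deviation makes a usable threshold (toy numbers of my own, not from the answer):
import numpy as np

# three words on one row, two on the next: consecutive avg_y differences
diffs = [1, 1, 48, 1]
print(np.std(diffs))  # about 20.3, so only the jump of 48 exceeds the threshold and starts a new row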
Having calculated the threshold, the list of all words gets grouped into rows accordingly.
def group_annotations(annotations, threshold):
annotations.sort(key=attrgetter('avg_y'))
line_index = 0
text = [[]]
for i in range(0, len(annotations)):
if i == 0:
text[line_index].append(annotations[i])
continue
y_difference = abs(annotations[i].avg_y - annotations[i - 1].avg_y)
if y_difference > threshold:
line_index = line_index + 1
text.append([])
text[line_index].append(annotations[i])
return text
Finally, each row is sorted by x position to get the words into the correct order from left to right.
Then a little regex is used to remove whitespace in front of punctuation.
def sort_and_combine_grouped_annotations(annotation_lists):
grouped_list = []
for annotation_group in annotation_lists:
annotation_group.sort(key=attrgetter('start_x'))
texts = (o.text for o in annotation_group)
texts = ' '.join(texts)
texts = re.sub(r'\s([-;:?.!](?:\s|$))', r'\1', texts)
grouped_list.append(texts)
return grouped_list
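Putting the pieces together, a minimal usage sketch might look like this (the helper name image_to_lines and the file name are my own; the client setup mirrors the earlier answers):
from google.cloud import vision

def image_to_lines(path):
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.document_text_detection(image=image)
    annotations = get_extended_annotations(response)
    threshold = get_threshold_for_y_difference(annotations)
    rows = group_annotations(annotations, threshold)
    return sort_and_combine_grouped_annotations(rows)

# for line in image_to_lines("handwriting.png"):
#     print(line)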
Based on Borislav Stoilov's latest answer, I wrote the code in C# for anybody that might need it in the future. Find the code below:
public static List<TextParagraph> ExtractParagraphs(IReadOnlyList<EntityAnnotation> textAnnotations)
{
var min_y = int.MaxValue;
var max_y = -1;
foreach (var item in textAnnotations)
{
var poly_range = Get_poly_y_range(item.BoundingPoly);
var t_min = poly_range.Min();
var t_max = poly_range.Max();
if (t_min < min_y) min_y = t_min;
if (t_max > max_y) max_y = t_max;
}
var max_size = max_y - min_y;
var text_boxes = new List<TextBox>();
foreach (var item in textAnnotations)
{
var poly_range = Get_poly_y_range(item.BoundingPoly);
var t_x = Get_poly_x(item.BoundingPoly);
var t_min = poly_range.Min();
var t_max = poly_range.Max();
var poly_size = t_max - t_min;
text_boxes.Add(new TextBox
{
Min_y = t_min,
Max_y = t_max,
X = t_x,
Size = poly_size,
Description = item.Description
});
}
var paragraphs = new List<TextParagraph>();
for (int i = min_y; i < max_y; i++)
{
var para_line = new List<TextLine>();
foreach (var text_box in text_boxes)
{
int t_min = text_box.Min_y;
int t_max = text_box.Max_y;
int x = text_box.X;
int size = text_box.Size;
//# size < max_size excludes the biggest rect
if (size < (max_size * 0.9) && t_min <= i && i <= t_max)
para_line.Add(
new TextLine
{
Text = text_box.Description,
X = x
}
);
}
        // here I have to sort them by x so they don't get randomly shuffled
para_line = para_line.OrderBy(x => x.X).ToList();
var line = string.Join(" ", para_line.Select(x => x.Text));
var paragraph = new TextParagraph
{
Order = i,
Text = line,
WordCount = para_line.Count,
TextBoxes = para_line
};
paragraphs.Add(paragraph);
}
return paragraphs;
//return string.Join("\n", paragraphs);
}
private static List<int> Get_poly_y_range(BoundingPoly poly)
{
var y_list = new List<int>();
foreach (var v in poly.Vertices)
{
if (!y_list.Contains(v.Y))
{
y_list.Add(v.Y);
}
}
return y_list;
}
private static int Get_poly_x(BoundingPoly poly)
{
return poly.Vertices[0].X;
}
Calling the ExtractParagraphs() method will return a list of strings which contains duplicates from the file. I also wrote some custom code to handle that problem. If you need any help processing the duplicates, let me know, and I can provide the rest of the code.
Example:
Text in picture: "I want to make this thing work 24/7!"
Code will return:
"I"
"I want"
"I want to "
"I want to make"
"I want to make this"
"I want to make this thing"
"I want to make this thing work"
"I want to make this thing work 24/7!"
"to make this thing work 24/7!"
"this thing work 24/7!"
"thing work 24/7!"
"work 24/7!"
"24/7!"
I also have an implementation that parses PDFs to PNGs, because the Google Cloud Vision API won't accept PDFs that are not stored in a Cloud Storage bucket. If needed, I can provide it.
Happy coding!
What is an efficient way to convert an array of pixel values, [UInt8], into a two-dimensional array of pixel-value rows, [[UInt8]]?
You can write something like this:
var pixels: [UInt8] = [0,1,2,3, 4,5,6,7, 8,9,10,11, 12,13,14,15]
let bytesPerRow = 4
assert(pixels.count % bytesPerRow == 0)
let pixels2d: [[UInt8]] = stride(from: 0, to: pixels.count, by: bytesPerRow).map {
Array(pixels[$0..<$0+bytesPerRow])
}
But because of the value semantics of Swift Arrays, any attempt to create a new nested Array requires copying the content, so this may not be "efficient" enough for your purpose.
Reconsider whether you really need such a nested Array.
This should work
private func convert1Dto2DArray(oneDArray:[String], stringsPerRow:Int)->[[String]]?{
var target = oneDArray
var outOfIndexArray:[String] = [String]()
    let remainder = oneDArray.count % stringsPerRow
    if remainder > 0 && remainder <= stringsPerRow {
        let suffix = oneDArray.suffix(remainder)
        let list = oneDArray.prefix(oneDArray.count - remainder)
target = Array(list)
outOfIndexArray = Array(suffix)
}
var array2D: [[String]] = stride(from: 0, to: target.count, by: stringsPerRow).map {
Array(target[$0..<$0+stringsPerRow])}
if !outOfIndexArray.isEmpty{
array2D.append(outOfIndexArray)
}
return array2D
}
I have an existing map in Groovy.
I want to create a new map that has the same keys but different values in it.
Eg.:
def scores = ["vanilla":10, "chocolate":9, "papaya": 0]
//transformed into
def preference = ["vanilla":"love", "chocolate":"love", "papaya": "hate"]
Any way of doing it through some sort of closure like:
def preference = scores.collect {//something}
You can use collectEntries
scores.collectEntries { k, v ->
[ k, 'new value' ]
}
An alternative to using a map for the ranges would be to use a switch
def grade = { score ->
switch( score ) {
case 10..9: return 'love'
case 8..6: return 'like'
case 5..2: return 'meh'
case 1..0: return 'hate'
default : return 'ERR'
}
}
scores.collectEntries { k, v -> [ k, grade( v ) ] }
A nice, functional-style solution (including your ranges, and easy to modify):
def scores = [vanilla:10, chocolate:9, papaya: 0]
// Store somewhere
def map = [(10..9):"love", (8..6):"like", (5..2):"meh", (1..0):"hate"]
def preference = scores.collectEntries { key, score -> [key, map.find { score in it.key }.value] }
// Output: [vanilla:love, chocolate:love, papaya:hate]
def scores = ["vanilla":10, "chocolate":9, "papaya": 0]
def preference = scores.collectEntries {key, value -> ["$key":(value > 5 ? "like" : "hate")]}
Then the result would be
[vanilla:like, chocolate:like, papaya:hate]
EDIT: If you want a map, then you should use collectEntries like tim_yates said.