I'm looking to use hosting where you get an AppPool with limited RAM (128 MB, 256 MB, or 1 GB) for all the sites and subdomains in the account.
I'm interested to know how much RAM my sites use; this is what I use now:
public IActionResult RamUsage()
{
    // Size of the managed (GC) heap only
    var mem = GC.GetTotalMemory(false);
    using (var me = Process.GetCurrentProcess())
    {
        // Private bytes and working set of the whole process
        var mem2 = (me.PrivateMemorySize64 / 1048576) + " MB";
        var mem3 = (me.WorkingSet64 / 1048576) + " MB";
        var res = new
        {
            Memory = (mem / 1048576) + " MB " + mem2 + " " + mem3,
            Cpu = me.TotalProcessorTime.TotalSeconds
        };
        return Json(res);
    }
}
I'm getting three different values: the first one is the smallest and the last one the biggest.
I am using the Google Vision API, primarily to extract text. It works fine, but for specific cases I need the API to scan an entire line and output its text before moving to the next line. However, it appears that the API uses some kind of logic that makes it scan top to bottom on the left side, then move to the right side and do another top-to-bottom scan. I would have liked the API to read left to right, move down, and so on.
For example, consider the image:
The API returns the text like this:
“ Name DOB Gender: Lives In John Doe 01-Jan-1970 LA ”
Whereas, I would have expected something like this:
“ Name: John Doe DOB: 01-Jan-1970 Gender: M Lives In: LA ”
I suppose there is a way to define a block size or margin setting (?) so that the image is read/scanned line by line?
Thanks for your help.
Alex
This might be a late answer but adding it for future reference.
You can add feature hints to your JSON request to get the desired results.
{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://i.stack.imgur.com/TRTXo.png"
        }
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ]
    }
  ]
}
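If you call the API through the Python client library instead of raw JSON, a rough equivalent might look like this (a sketch; it assumes the google-cloud-vision package is installed and credentials are configured):

from google.cloud import vision

client = vision.ImageAnnotatorClient()

image = vision.Image()
image.source.image_uri = "https://i.stack.imgur.com/TRTXo.png"

# document_text_detection corresponds to the DOCUMENT_TEXT_DETECTION feature hint above
response = client.document_text_detection(image=image)
print(response.full_text_annotation.text)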
For text segments that are very far apart, DOCUMENT_TEXT_DETECTION also does not provide proper line segmentation.
The following code does simple line segmentation based on the character polygon coordinates.
https://github.com/sshniro/line-segmentation-algorithm-to-gcp-vision
Here is a simple piece of code to read line by line: the y axis is used for lines and the x axis for each word in the line.
items = []
lines = {}

for text in response.text_annotations[1:]:
    top_x_axis = text.bounding_poly.vertices[0].x
    top_y_axis = text.bounding_poly.vertices[0].y
    bottom_y_axis = text.bounding_poly.vertices[3].y

    if top_y_axis not in lines:
        lines[top_y_axis] = [(top_y_axis, bottom_y_axis), []]

    for s_top_y_axis, s_item in lines.items():
        if top_y_axis < s_item[0][1]:
            lines[s_top_y_axis][1].append((top_x_axis, text.description))
            break

for _, item in lines.items():
    if item[1]:
        words = sorted(item[1], key=lambda t: t[0])
        items.append((item[0], ' '.join([word for _, word in words]), words))

print(items)
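For context, `response` in the snippet above is assumed to come from a call along these lines (the original snippet does not show it):

from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("image.png", "rb") as f:
    image = vision.Image(content=f.read())

# text_annotations[0] is the full text block; [1:] are the individual words
response = client.text_detection(image=image)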
You can also extract the text based on the bounds per line: use boundingPoly and concatenate the text that falls on the same line.
"boundingPoly": {
"vertices": [
{
"x": 87,
"y": 148
},
{
"x": 411,
"y": 148
},
{
"x": 411,
"y": 206
},
{
"x": 87,
"y": 206
}
]
For example, these two words are in the same "line":
"description": "you",
"boundingPoly": {
"vertices": [
{
"x": 362,
"y": 1406
},
{
"x": 433,
"y": 1406
},
{
"x": 433,
"y": 1448
},
{
"x": 362,
"y": 1448
}
]
}
},
{
"description": "start",
"boundingPoly": {
"vertices": [
{
"x": 446,
"y": 1406
},
{
"x": 540,
"y": 1406
},
{
"x": 540,
"y": 1448
},
{
"x": 446,
"y": 1448
}
]
}
}
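As a minimal illustration of that idea (my own sketch, not part of the original answer; it assumes `response` comes from the Python client and that words on the same line share the exact same top y coordinate, as in the two examples above):

from collections import defaultdict

lines = defaultdict(list)
for word in response.text_annotations[1:]:  # [0] is the full-text block
    top_left = word.bounding_poly.vertices[0]
    lines[top_left.y].append((top_left.x, word.description))

for y in sorted(lines):
    print(' '.join(text for _, text in sorted(lines[y])))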
I get the max and min y and iterate over y to get all potential lines; here is the full code:
import io
import sys
from os import listdir

from google.cloud import vision


def read_image(image_file):
    client = vision.ImageAnnotatorClient()

    with io.open(image_file, "rb") as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    return client.document_text_detection(
        image=image,
        image_context={"language_hints": ["bg"]}
    )


def extract_paragraphs(image_file):
    response = read_image(image_file)

    min_y = sys.maxsize
    max_y = -1
    for t in response.text_annotations:
        poly_range = get_poly_y_range(t.bounding_poly)
        t_min = min(poly_range)
        t_max = max(poly_range)
        if t_min < min_y:
            min_y = t_min
        if t_max > max_y:
            max_y = t_max

    max_size = max_y - min_y

    text_boxes = []
    for t in response.text_annotations:
        poly_range = get_poly_y_range(t.bounding_poly)
        t_x = get_poly_x(t.bounding_poly)
        t_min = min(poly_range)
        t_max = max(poly_range)
        poly_size = t_max - t_min

        text_boxes.append({
            'min_y': t_min,
            'max_y': t_max,
            'x': t_x,
            'size': poly_size,
            'description': t.description
        })

    paragraphs = []
    for i in range(min_y, max_y):
        para_line = []
        for text_box in text_boxes:
            t_min = text_box['min_y']
            t_max = text_box['max_y']
            x = text_box['x']
            size = text_box['size']

            # size < max_size excludes the biggest rect (the full-text annotation)
            if size < max_size * 0.9 and t_min <= i <= t_max:
                para_line.append(
                    {
                        'text': text_box['description'],
                        'x': x
                    }
                )

        # sort by x so the words don't get randomly shuffled
        para_line = sorted(para_line, key=lambda x: x['x'])
        line = " ".join(map(lambda x: x['text'], para_line))
        paragraphs.append(line)
        # if line not in paragraphs:
        #     paragraphs.append(line)

    return "\n".join(paragraphs)


def get_poly_y_range(poly):
    y_list = []
    for v in poly.vertices:
        if v.y not in y_list:
            y_list.append(v.y)
    return y_list


def get_poly_x(poly):
    return poly.vertices[0].x


def extract_paragraphs_from_image(picName):
    # rootPics, outputRoot and write() are presumably defined elsewhere in the original script
    print(picName)

    pic_path = rootPics + "/" + picName
    text = extract_paragraphs(pic_path)

    text_path = outputRoot + "/" + picName + ".txt"
    write(text_path, text)
This code is WIP.
In the end, I get the same line multiple times and use post-processing to determine the exact values (the paragraphs variable). Let me know if I need to clarify anything.
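As one rough sketch of that post-processing (my own addition, not part of the original code), you could collapse the duplicates in the paragraphs list before joining it, keeping only lines that are not contained in a longer line:

def deduplicate_lines(paragraphs):
    # The y sweep above emits partial lines as words enter the band,
    # so drop empty lines, lines contained in a longer line, and repeats.
    unique = []
    for line in paragraphs:
        if not line:
            continue
        if any(line != other and line in other for other in paragraphs):
            continue
        if line not in unique:
            unique.append(line)
    return unique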
Inspired by Borislav's answer, I just wrote something for Python that also works for handwriting. It's messy and I am new to Python, but I think you can get an idea of how to do this.
A class to hold some extended data for each word, for example, the average y position of a word, which I used to calculate the differences between words:
import re
from operator import attrgetter

import numpy as np


class ExtendedAnnotation:
    def __init__(self, annotation):
        self.vertex = annotation.bounding_poly.vertices
        self.text = annotation.description
        self.avg_y = (self.vertex[0].y + self.vertex[1].y + self.vertex[2].y + self.vertex[3].y) / 4
        self.height = ((self.vertex[3].y - self.vertex[1].y) + (self.vertex[2].y - self.vertex[0].y)) / 2
        self.start_x = (self.vertex[0].x + self.vertex[3].x) / 2

    def __repr__(self):
        return '{' + self.text + ', ' + str(self.avg_y) + ', ' + str(self.height) + ', ' + str(self.start_x) + '}'
Create objects with that data:
def get_extended_annotations(response):
    extended_annotations = []
    for annotation in response.text_annotations:
        extended_annotations.append(ExtendedAnnotation(annotation))

    # delete the first item, as it is the whole text
    del extended_annotations[0]
    return extended_annotations
Calculate the threshold.
First, all words are sorted by their y position, defined as the average of all 4 corners of a word. The x position is not relevant at this moment.
Then, the differences between every word and their following word are calculated. For a perfectly straight line of words, you would expect the differences of the y position between every two words to be 0. Even for handwriting, it should be around 1 ~ 10.
However, whenever there is a line break, the difference between the last word of the former row and the first word of the new row is much greater than that, for example, 50 or 60.
So to decide whether there should be a line break between two words, the standard deviation of the differences is used.
def get_threshold_for_y_difference(annotations):
    annotations.sort(key=attrgetter('avg_y'))
    differences = []
    for i in range(0, len(annotations)):
        if i == 0:
            continue
        differences.append(abs(annotations[i].avg_y - annotations[i - 1].avg_y))
    return np.std(differences)
Having calculated the threshold, the list of all words gets grouped into rows accordingly.
def group_annotations(annotations, threshold):
    annotations.sort(key=attrgetter('avg_y'))
    line_index = 0
    text = [[]]
    for i in range(0, len(annotations)):
        if i == 0:
            text[line_index].append(annotations[i])
            continue
        y_difference = abs(annotations[i].avg_y - annotations[i - 1].avg_y)
        if y_difference > threshold:
            line_index = line_index + 1
            text.append([])
        text[line_index].append(annotations[i])
    return text
Finally, each row is sorted by its x position to get the words into the correct order from left to right.
Then a little regex is used to remove whitespace in front of punctuation.
def sort_and_combine_grouped_annotations(annotation_lists):
    grouped_list = []
    for annotation_group in annotation_lists:
        annotation_group.sort(key=attrgetter('start_x'))
        texts = (o.text for o in annotation_group)
        texts = ' '.join(texts)
        texts = re.sub(r'\s([-;:?.!](?:\s|$))', r'\1', texts)
        grouped_list.append(texts)
    return grouped_list
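Putting the pieces together might look roughly like this (a usage sketch of my own; it assumes `response` comes from the Vision client, e.g. client.document_text_detection(image=image)):

annotations = get_extended_annotations(response)
threshold = get_threshold_for_y_difference(annotations)
rows = group_annotations(annotations, threshold)
for line in sort_and_combine_grouped_annotations(rows):
    print(line)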
Based on Borislav Stoilov's latest answer, I wrote the code in C# for anybody who might need it in the future. Find the code below:
public static List<TextParagraph> ExtractParagraphs(IReadOnlyList<EntityAnnotation> textAnnotations)
{
    var min_y = int.MaxValue;
    var max_y = -1;
    foreach (var item in textAnnotations)
    {
        var poly_range = Get_poly_y_range(item.BoundingPoly);
        var t_min = poly_range.Min();
        var t_max = poly_range.Max();

        if (t_min < min_y) min_y = t_min;
        if (t_max > max_y) max_y = t_max;
    }

    var max_size = max_y - min_y;

    var text_boxes = new List<TextBox>();
    foreach (var item in textAnnotations)
    {
        var poly_range = Get_poly_y_range(item.BoundingPoly);
        var t_x = Get_poly_x(item.BoundingPoly);
        var t_min = poly_range.Min();
        var t_max = poly_range.Max();
        var poly_size = t_max - t_min;

        text_boxes.Add(new TextBox
        {
            Min_y = t_min,
            Max_y = t_max,
            X = t_x,
            Size = poly_size,
            Description = item.Description
        });
    }

    var paragraphs = new List<TextParagraph>();
    for (int i = min_y; i < max_y; i++)
    {
        var para_line = new List<TextLine>();
        foreach (var text_box in text_boxes)
        {
            int t_min = text_box.Min_y;
            int t_max = text_box.Max_y;
            int x = text_box.X;
            int size = text_box.Size;

            // size < max_size excludes the biggest rect
            if (size < (max_size * 0.9) && t_min <= i && i <= t_max)
                para_line.Add(
                    new TextLine
                    {
                        Text = text_box.Description,
                        X = x
                    }
                );
        }

        // sort by x so the words don't get randomly shuffled
        para_line = para_line.OrderBy(x => x.X).ToList();
        var line = string.Join(" ", para_line.Select(x => x.Text));

        var paragraph = new TextParagraph
        {
            Order = i,
            Text = line,
            WordCount = para_line.Count,
            TextBoxes = para_line
        };

        paragraphs.Add(paragraph);
    }

    return paragraphs;
    //return string.Join("\n", paragraphs);
}

private static List<int> Get_poly_y_range(BoundingPoly poly)
{
    var y_list = new List<int>();
    foreach (var v in poly.Vertices)
    {
        if (!y_list.Contains(v.Y))
        {
            y_list.Add(v.Y);
        }
    }
    return y_list;
}

private static int Get_poly_x(BoundingPoly poly)
{
    return poly.Vertices[0].X;
}
Calling the ExtractParagraphs() method will return a list of strings which contains duplicates from the file. I also wrote some custom code to deal with that problem. If you need any help processing the duplicates, let me know and I can provide the rest of the code.
Example:
Text in picture: "I want to make this thing work 24/7!"
Code will return:
"I"
"I want"
"I want to "
"I want to make"
"I want to make this"
"I want to make this thing"
"I want to make this thing work"
"I want to make this thing work 24/7!"
"to make this thing work 24/7!"
"this thing work 24/7!"
"thing work 24/7!"
"work 24/7!"
"24/7!"
I also have an implementation that converts PDFs to PNGs, because the Google Cloud Vision API won't accept PDFs that are not stored in a Cloud Storage bucket. If needed, I can provide it.
Happy coding!
I am trying to pull a list of ratings from a collection of Reviews and then average them to come up with an aggregated average rating for a Plate. When I look at the output of the ratings variable, I get nothing but "undefined undefined undefined".
averageRating: function() {
    var reviews = Reviews.findOne({plateId: this._id});
    var ratings = _.pluck(reviews, 'rating');
    var sum = ratings.reduce(function(pv, cv) { return pv + cv; }, 0);
    var avg = sum / ratings.length;

    //Testing output
    var test = "";
    var x;
    for (x in reviews) {
        test += reviews[x] + ',';
    }
    return test;
}
Sorry if this is a super newbie question, but I've been at this for hours and cannot figure it out.
I figured out the issue. As listed above, var reviews gets set to a cursor, which apparently .pluck does not work on. By first converting the cursor to an array of objects, I was then able to use .pluck. So the updated code looks like this:
averageRating: function() {
    var reviewsCursor = Reviews.find({plateId: this._id});
    //Converts cursor to an array of objects
    var reviews = reviewsCursor.fetch();
    var ratings = _.pluck(reviews, 'rating');
    var sum = ratings.reduce(function(pv, cv) { return pv + cv; }, 0);
    var avg = (sum / ratings.length).toPrecision(2);
    return avg;
}
Based on previous questions here I managed to create the dataset and print all the recipes listed, and now I am trying to pick one of the recipes from that list and show its Title, Instructions and Ingredients. The instructions are mapped to the Recipes via the pkID column and the ingredients are mapped to the Recipes through a recipeID column. When I open the database in SQLite Database Browser I can access this information inside the Tables dropdown list, so I suppose the proper name for them is tables within the database.
I am not able to "filter" by pkID and by recipeID, so that after picking one recipe, only the appropriate content is shown.
This is the code in Python of what I am trying to do in Genie:
def PrintSingleRecipe(self, which):
    sql = 'SELECT * FROM Recipes WHERE pkID = %s' % str(which)
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    for x in cursor.execute(sql):
        recipeid = x[0]
        print "Title: " + x[1]
        print "Serves: " + x[2]
        print "Source: " + x[3]
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    sql = 'SELECT * FROM Ingredients WHERE RecipeID = %s' % recipeid
    print 'Ingredient List:'
    for x in cursor.execute(sql):
        print x[1]
    print ''
    print 'Instructions:'
    sql = 'SELECT * FROM Instructions WHERE RecipeID = %s' % recipeid
    for x in cursor.execute(sql):
        print x[1]
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    resp = raw_input('Press A Key -> ')
I have not been able to improve my code much; it seems that the approach I used before, iterating with a step statement, cannot be used here. This is how far I got in Genie:
def PrintSingleRecipe(db:Database)
    stmt:Statement = PreparedStatements.select_all( db )
    res:int = UserInterface.raw_input("Select a recipe -> ").to_int()
    cols:int = stmt.column_count ()
    var row = new dict of string, string
    item:int = 1
    print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
    while res == ROW
        for i:int = 0 to (cols - 1)
            row[ stmt.column_name( i ) ] = stmt.column_text( i )
        stdout.printf( "%-5s", item.to_string( "%03i" ))
        stdout.printf( "%-30s", row[ "Title" ])
        stdout.printf( "%-20s", row[ "Serves" ])
        stdout.printf( "%-30s\n", row[ "Source" ])
    print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
    print "Ingredient list"
    print " "
    stdout.printf("%-5s", item.to_string( "%03i" ))
I have found a solution to the problem; maybe it can be optimized, but for now it is enough.
Answers from another question helped immensely. The solution I used was to use the exec function and point its callback to PrintSingleRecipe().
Some adjustments had to be made for it to work as a callback, but I got what I needed.
Here is the code where the function gets called:
while true
    response:string = UserInterface.get_input_from_menu()
    if response == "1" // Show All Recipes
        PrintAllRecipes(db)
    else if response is "2" // Search for a recipe
        pass
    else if response is "3" // Show a Recipe
        res:string = UserInterface.raw_input("Select a recipe -> ")
        sql:string = "SELECT * FROM Recipes WHERE pkID = " + res
        db.exec(sql, PrintSingleRecipe, null)
    else if response is "4" // Delete a recipe
        pass
    else if response is "5" // Add a recipe
        pass
    else if response is "6" // Print a recipe
        pass
    else if response is "0" // Exit
        print "Goodbye"
        break
    else
        print "Unrecognized command. Try again."
Here is what PrintSingleRecipe looks like:
def PrintSingleRecipe(n_columns:int, values:array of string, column_names:array of string):int
    print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
    // Genie's "to" is inclusive, so stop at n_columns - 1
    for i:int = 0 to (n_columns - 1)
        stdout.printf ("%s = %s\n", column_names[i], values[i])
    print "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
    print "Ingredient list"
    print " "
    return 0
I am new to F# and I have frankensteined the code below from various examples I found online, in an attempt to get a better understanding of how I can use it. Currently the code reads a list of machines from a file and pings each of them. I had to divide the initial array from the file into smaller arrays of 25 machines to control the number of concurrent actions; otherwise it takes far too long to map out the list of machines. I would like to be able to use a thread pool to manage the threads, but I have not found a way to make it work. Any guidance would be great. I am not able to make this work:
let creatework = FileLines |> Seq.map (fun elem -> ThreadPool.QueueUserWorkItem(new WaitCallback(dowork), elem))
Here is the complete code:
open System.Threading
open System
open System.IO

let filePath = "c:\qa\machines.txt"
let FileLines = File.ReadAllLines(filePath)
let count = FileLines.Length / 25

type ProcessResult = { exitCode : int; stdout : string; stderr : string }

let executeProcess (exe, cmdline) =
    let psi = new System.Diagnostics.ProcessStartInfo(exe, cmdline)
    psi.UseShellExecute <- false
    psi.RedirectStandardOutput <- true
    psi.RedirectStandardError <- true
    psi.CreateNoWindow <- true
    let p = System.Diagnostics.Process.Start(psi, EnableRaisingEvents = true)
    let output = new System.Text.StringBuilder()
    let error = new System.Text.StringBuilder()
    p.OutputDataReceived.Add(fun args -> output.AppendLine(args.Data) |> ignore)
    p.ErrorDataReceived.Add(fun args -> error.AppendLine(args.Data) |> ignore)
    p.BeginErrorReadLine()
    p.BeginOutputReadLine()
    p.WaitForExit()
    { exitCode = p.ExitCode; stdout = output.ToString(); stderr = error.ToString() }

let dowork machinename =
    async {
        let exeout = executeProcess(@"c:\windows\system32\ping.exe", "-n 1 " + machinename)
        let exelines =
            if exeout.stdout.Contains("Reply from") then Console.WriteLine(machinename + " " + "REPLY")
            elif exeout.stdout.Contains("Request timed out.") then Console.WriteLine(machinename + " " + "RTO")
            elif exeout.stdout.Contains("Ping request could not find host") then Console.WriteLine(machinename + " " + "Unknown Host")
            else Console.WriteLine(machinename + " " + "ERROR")
        exelines
    }

printfn "%A" (System.DateTime.Now.ToString())

for i in 0..count do
    let x = i * 25
    let y = if i = count then FileLines.Length - 1 else (i + 1) * 25
    printfn "%s %d" "X equals: " x
    printfn "%s %d" "Y equals: " y
    let filesection = FileLines.[x..y]
    let creatework = filesection |> Seq.map dowork |> Async.Parallel |> Async.RunSynchronously |> ignore
    creatework

printfn "%A" (System.DateTime.Now.ToString())
printfn "finished"
UPDATE:
The code below works and provides a framework for what I want to do. The link referenced by Tomas Petricek had the bits of code that made this work; I just had to figure out which example was the right one. It is within 3 seconds of a duplicate framework written in Java, so I think I am headed in the right direction. I hope the example below will be useful to anyone else trying to thread out various executables in F#:
open System
open System.IO
open System.Diagnostics

let filePath = "c:\qa\machines.txt"
let FileLines = File.ReadAllLines(filePath)

type Process with
    static member AsyncStart psi =
        let proc = new Process(StartInfo = psi, EnableRaisingEvents = true)
        let asyncExit = Async.AwaitEvent proc.Exited
        async {
            proc.Start() |> ignore
            let! args = asyncExit
            return proc
        }

let shellExecute(program : string, args : string) =
    let startInfo =
        new ProcessStartInfo(FileName = program, Arguments = args,
                             UseShellExecute = false,
                             CreateNoWindow = true,
                             RedirectStandardError = true,
                             RedirectStandardOutput = true)
    Process.AsyncStart(startInfo)

let dowork (machinename : string) =
    async {
        let nonbtstat = "NONE"
        use! pingout = shellExecute(@"c:\windows\system32\ping.exe", "-n 1 " + machinename)
        let pingRdToEnd = pingout.StandardOutput.ReadToEnd()
        let pingresults =
            if pingRdToEnd.ToString().Contains("Reply from") then (machinename + " " + "REPLY")
            elif pingRdToEnd.ToString().Contains("Request timed out.") then (machinename + " " + "RTO")
            elif pingRdToEnd.ToString().Contains("Ping request could not find host") then (machinename + " " + "Unknown Host")
            else (machinename + " " + "PING_ERROR")
        if pingresults.ToString().Contains("REPLY") then
            use! nbtstatout = shellExecute(@"c:\windows\system32\nbtstat.exe", "-a " + machinename)
            let nbtstatRdToEnd = nbtstatout.StandardOutput.ReadToEnd().Split('\n')
            let nbtstatline = Array.tryFind(fun elem -> elem.ToString().Contains("<00> UNIQUE Registered")) nbtstatRdToEnd
            return Console.WriteLine(pingresults + nbtstatline.Value.ToString())
        else return Console.WriteLine(pingresults + " " + nonbtstat)
    }

printfn "%A" (System.DateTime.Now.ToString())

let creatework = FileLines |> Seq.map dowork |> Async.Parallel |> Async.RunSynchronously |> ignore
creatework

printfn "%A" (System.DateTime.Now.ToString())
printfn "finished"
The main problem with your code is that executeProcess is a synchronous function that takes a long time to run (it runs the ping.exe process and waits for its result). The general rule is that tasks in a thread pool should not block for a long time (because then they block thread pool threads, which means that the thread pool cannot efficiently schedule other work).
I think you can solve this quite easily by making executeProcess asynchronous. Instead of calling WaitForExit (which blocks), you can wait for the Exited event using Async.AwaitEvent:
let executeProcess (exe, cmdline) = async {
    let psi = new System.Diagnostics.ProcessStartInfo(exe, cmdline)
    psi.UseShellExecute <- false
    // [Lots of stuff omitted]
    p.BeginOutputReadLine()
    let! _ = Async.AwaitEvent p.Exited
    return { exitCode = p.ExitCode
             stdout = output.ToString(); stderr = error.ToString() } }
This should unblock threads in the thread pool and so you'll be able to use Async.Parallel on all the URLs from the input array without any manual scheduling.
EDIT: As #desco pointed out in a comment, the above is not quite right if the process exits before the AwaitEvent line is reached (it may miss the event). To fix that, you need to use the Event.guard function, which was discussed in this SO question:
Need help regarding Async and fsi