PDF OCR with Google Cloud Vision: keep spacing

I don't know if it's possible, but I receive forms in PDF format. I need to extract the text from each PDF, keeping it in exactly the same position as in the file, and put it in a text document that I can query like:
Line 4 startCharacter 50 endcharacter 60
This would give me whatever text is in that position.
Is this possible?
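Assuming the PDF text has already been laid out as plain text (one string per line, characters in fixed columns), the lookup itself is simple. A minimal sketch in Python; the 1-based, inclusive numbering and the helper names are assumptions for illustration, not part of any API:

```python
import re

# Hypothetical query format taken from the question, e.g.
# "Line 4 startCharacter 50 endcharacter 60" (matched case-insensitively).
QUERY = re.compile(
    r"line\s+(\d+)\s+startcharacter\s+(\d+)\s+endcharacter\s+(\d+)",
    re.IGNORECASE,
)


def parse_query(query: str) -> tuple[int, int, int]:
    """Return (line, start, end); raise ValueError if the query is malformed."""
    match = QUERY.fullmatch(query.strip())
    if match is None:
        raise ValueError(f"unrecognised query: {query!r}")
    line, start, end = (int(g) for g in match.groups())
    if line < 1 or start < 1 or end < start:
        raise ValueError(f"invalid range in query: {query!r}")
    return line, start, end


def extract_field(lines: list[str], query: str) -> str:
    """Return the text at the queried position.

    Line and character numbers are treated as 1-based and the end character
    as inclusive (an assumption). Padding spaces around the field are stripped;
    a line number past the end of the document yields an empty string.
    """
    line_no, start, end = parse_query(query)
    if line_no > len(lines):
        return ""
    return lines[line_no - 1][start - 1:end].strip()
```

For example, with `form = ["ACME Corp", "", "Invoice", "Invoice no:  ABC-12345  "]`, the query `"Line 4 startCharacter 14 endcharacter 22"` returns `"ABC-12345"`. The hard part is producing `lines` from the PDF in the first place.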

At the moment, this is not possible. I found a Feature Request for the Cloud Vision API to take a PDF file and export it as a searchable PDF, which might resolve this issue. I recommend that you subscribe to the Feature Request (click the star next to its title) so it gets more visibility.
In the meantime, you can check the documentation on detecting text in PDF files, try it out, and see whether you can get the desired behavior.
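For reference, PDF input goes through the asynchronous `files:asyncBatchAnnotate` endpoint, which reads the PDF from Cloud Storage and writes JSON results back to Cloud Storage. A sketch of the request body as a Python dict; the bucket paths are placeholders and the helper name is hypothetical:

```python
# REST endpoint for PDF/TIFF text detection (asynchronous).
ENDPOINT = "https://vision.googleapis.com/v1/files:asyncBatchAnnotate"


def pdf_ocr_request(gcs_pdf_uri: str, gcs_output_prefix: str, pages_per_file: int = 1) -> dict:
    """Build a files:asyncBatchAnnotate request body for one PDF stored in GCS."""
    return {
        "requests": [
            {
                "inputConfig": {
                    "gcsSource": {"uri": gcs_pdf_uri},
                    "mimeType": "application/pdf",
                },
                # DOCUMENT_TEXT_DETECTION returns the page/block/paragraph/word hierarchy.
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                "outputConfig": {
                    "gcsDestination": {"uri": gcs_output_prefix},
                    # Number of PDF pages written into each output JSON file.
                    "batchSize": pages_per_file,
                },
            }
        ]
    }
```

The body is POSTed as JSON with an authorized request; the call returns a long-running operation, and the results appear as JSON files under the output prefix once it completes.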
If the forms you are working with have a fixed layout, you might be able to solve the issue by going through the TextAnnotation in the API response. Besides the text itself, it describes the structure of the document as pages, blocks, paragraphs, words and symbols, and gives bounding boxes for everything below the page level.
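As a rough illustration of that approach, the sketch below walks a `fullTextAnnotation` (parsed from the JSON response) down to word level, groups words into rows by the vertical centre of their bounding boxes, and places each word at a column estimated from its x coordinate divided by the median character width. The row tolerance and the column estimate are heuristics, not something the API provides, and they assume a roughly uniform font size and unrotated pages:

```python
from statistics import median


def _box(bounding_box: dict) -> tuple[float, float, float, float]:
    """Return (x0, y0, x1, y1) for a Vision boundingBox.

    The JSON omits zero-valued coordinates, so missing keys default to 0.
    Image results use "vertices" (pixels); PDF results use "normalizedVertices"
    (0..1). Either works here because only relative positions matter.
    """
    verts = bounding_box.get("vertices") or bounding_box.get("normalizedVertices") or []
    xs = [v.get("x", 0) for v in verts] or [0]
    ys = [v.get("y", 0) for v in verts] or [0]
    return min(xs), min(ys), max(xs), max(ys)


def extract_words(full_text_annotation: dict, page_index: int = 0) -> list:
    """Flatten one page of a fullTextAnnotation into (text, box) pairs."""
    words = []
    page = full_text_annotation["pages"][page_index]
    for block in page.get("blocks", []):
        for paragraph in block.get("paragraphs", []):
            for word in paragraph.get("words", []):
                text = "".join(s.get("text", "") for s in word.get("symbols", []))
                if text:
                    words.append((text, _box(word.get("boundingBox", {}))))
    return words


def layout_lines(words: list) -> list[str]:
    """Rebuild fixed-column text lines from (text, box) pairs."""
    if not words:
        return []
    # Words whose vertical centres lie within half a median word height share a row.
    tolerance = median([b[3] - b[1] for _, b in words]) / 2
    # Median width of one character, used to turn x coordinates into columns.
    char_width = median([(b[2] - b[0]) / len(t) for t, b in words]) or 1

    rows = []  # list of (centre of first word, [(text, box), ...])
    for text, box in sorted(words, key=lambda w: (w[1][1] + w[1][3]) / 2):
        centre = (box[1] + box[3]) / 2
        if rows and abs(centre - rows[-1][0]) <= tolerance:
            rows[-1][1].append((text, box))
        else:
            rows.append((centre, [(text, box)]))

    lines = []
    for _, row_words in rows:
        line = ""
        for text, box in sorted(row_words, key=lambda w: w[1][0]):
            column = round(box[0] / char_width)
            if line:  # always leave at least one space between words
                column = max(column, len(line) + 1)
            line = line.ljust(column) + text
        lines.append(line)
    return lines
```

Writing `"\n".join(layout_lines(extract_words(annotation)))` to a file gives a text document that can be addressed by line and character position. Expect to tune the tolerance for real scans, where skew and mixed font sizes break the uniform-grid assumption.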

Related

Google OCR incorrect text detection

I'm using Google's Cloud Vision API to get text from an image with DOCUMENT_TEXT_DETECTION,
but the detected text is incorrect.
Do you have any suggestions?
[Update]
The expected result should be:

I would recommend that you either:
Submit a public report on the Issue Tracker, since OCR issues are better supported there (recommended), or
Update your question with how you send the request (JSON, client library, or API), as in Detect text in images.
Note that supported images are subject to the documented limits on file formats, image sizing, and file size.
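For the second option, the request you include might look like the following sketch of an `images:annotate` body, built in Python. The optional `languageHints` field is part of the documented `imageContext`; whether it improves this particular image is an assumption you would need to test, and the helper name is hypothetical:

```python
import base64

# REST endpoint for synchronous image annotation.
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def document_text_request(image_bytes: bytes, language_hints=None) -> dict:
    """Build an images:annotate body that runs DOCUMENT_TEXT_DETECTION on inline bytes."""
    request = {
        # Inline images are sent base64-encoded in image.content.
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
    }
    if language_hints:
        # Optional language codes such as "en"; by default the language is auto-detected.
        request["imageContext"] = {"languageHints": list(language_hints)}
    return {"requests": [request]}
```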

Reading a PDF back from an iFrame?

I have a PDF document that is generated on the fly and rendered into an iFrame within a RadWindow. The document is already largely prepopulated, but the user still has to enter a chunk of information. I've found plenty of information about sending a PDF TO an iFrame, but not much about going the other way. I have a button within the RadWindow that can access the iFrame object, but I'm somewhat lost as to where to go from there.
EDIT: The PDF is an editable form. I'm trying to pull back the entire PDF document as is, after the user has filled in the form on the client side.

I think you'll need to send the file to the user so they can edit it locally and instruct them to upload it.
The Content-Disposition header with the value attachment handles the first task, and you can use RadAsyncUpload for the upload: http://demos.telerik.com/aspnet-ajax/asyncupload/examples/overview/defaultcs.aspx.
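In ASP.NET you would set these headers on the response (for example with `Response.AddHeader`). To keep the examples in one language, here is a sketch in Python of the headers that make the browser download the PDF instead of displaying it inline; the function name and filename are illustrative:

```python
def pdf_download_headers(filename: str, size: int) -> dict[str, str]:
    """Headers that tell the browser to save the PDF rather than render it inline."""
    # Drop characters that would break the quoted filename.
    safe_name = filename.replace("\\", "").replace('"', "")
    return {
        "Content-Type": "application/pdf",
        # "attachment" triggers a download; "inline" would show it in the viewer.
        "Content-Disposition": f'attachment; filename="{safe_name}"',
        "Content-Length": str(size),
    }
```

Each pair would be sent with whatever header API the server framework offers before writing the PDF bytes.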
I am not aware of any way to tap into the PDF viewer plugin that browsers use to display PDFs. Adobe or another third-party plugin may offer an API, but that would make you depend on them, which is out of your control.
Perhaps Firefox's JavaScript PDF viewer (pdf.js) offers something: https://mozillalabs.com/en-US/pdfjs/, but I don't know how stable or usable it is.

As described in the comments, I ended up using postbacks from the PDFs themselves, along with 1-pixel fields that store the data needed to identify the documents. It's a little hacky, but functional. I'm posting this as an answer because it is as close to a real solution to my original problem as I got. It has been running this way for close to 4 years and so far hasn't caused any issues.

Generate PDF from Gridview in asp.net

I have 4 GridViews on my page. I need to generate a PDF file that contains these 4 grids and that the user can get by clicking a button.
Please help.

We use wkhtmltopdf for our PDFs and it has worked very well. You give it a URL, it converts that page to a PDF, and you can then send the PDF to the user's browser. So you could give it the URL of your page with the GridViews and it would return a PDF. There are third-party providers that can do the conversion for you, such as PDFCrowd, but this comes at a cost. If I recall correctly, they provide a C# wrapper, so it is pretty simple to use. There is also iText PDF, but I have never used it, so I cannot comment on it.
Hope this gives you a good start.
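A sketch of calling the wkhtmltopdf command-line tool from Python, assuming it is installed and on PATH (from ASP.NET you would start the same process from C#). wkhtmltopdf fetches the URL as its own HTTP client, so the GridView page must be reachable, and show the same data, without the user's browser session. The helper names are hypothetical:

```python
import shutil
import subprocess


def wkhtmltopdf_argv(url: str, output_path: str, page_size: str = "A4") -> list[str]:
    """Command line that renders one URL to one PDF file."""
    return ["wkhtmltopdf", "--page-size", page_size, url, output_path]


def render_url_to_pdf(url: str, output_path: str) -> None:
    """Run wkhtmltopdf; raise if the binary is missing or the conversion fails."""
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf is not on PATH")
    subprocess.run(wkhtmltopdf_argv(url, output_path), check=True)
```

The button handler would call `render_url_to_pdf` with a temporary output path and then stream that file back to the browser.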

You could try Amyuni WebkitPDF. It is a free tool for HTML-to-PDF or XAML/XPS conversion. You can pass the URL of the page to the library and get back a PDF file that you can send to the browser.
Usual disclaimer applies.

How to preview a word document or PDF using asp.net?

I have a list box containing the documents that a user has uploaded. I would like to add a preview window to the page that lets the user see a document without having to download it and open it in Word or Reader. Can this be done?

One-word answer: yep.
Longer answer: for PDFs, you'll basically need a PDF library (ABCpdf, etc.) that can render an image from a PDF on the fly, which you then send down to the client. For Word documents, there may be similar functionality in VSTO.
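ABCpdf is a commercial .NET library. As a swapped-in, free alternative for illustration, Poppler's `pdftoppm` command can render the first page of a PDF to a PNG that the page then shows as a preview image. A sketch in Python, assuming `pdftoppm` is installed; the helper names and the resolution default are arbitrary choices:

```python
import subprocess
from pathlib import Path


def preview_argv(pdf_path: str, out_prefix: str, dpi: int = 72) -> list[str]:
    """pdftoppm arguments that render only page 1 to <out_prefix>.png."""
    return [
        "pdftoppm",
        "-png",          # PNG output instead of the default PPM
        "-f", "1",       # first page to render
        "-l", "1",       # last page to render
        "-r", str(dpi),  # resolution in DPI
        "-singlefile",   # no page-number suffix in the output file name
        pdf_path,
        out_prefix,
    ]


def render_preview(pdf_path: str, out_prefix: str, dpi: int = 72) -> Path:
    """Render the preview and return the path of the generated PNG."""
    subprocess.run(preview_argv(pdf_path, out_prefix, dpi), check=True)
    return Path(out_prefix + ".png")
```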

Another solution is to use the Scribd API to handle all the dirty work for you. That way your users can view their documents in Scribd's nice document viewer.

Capture screenshot of website on the client (Javascript or flash)

Is there a way to take a screen capture (browser window content only) from the browser with JavaScript, an embedded Flash object, or similar, so that a full-quality image of the page content can be saved or printed? Or is there an alternative approach?
I have a web app (ASP.NET 3.5) with Google Maps and other client-side Ajax operations, such as a custom tile server. I have been trying to implement a way for users to print good-quality captures of the web page.
I have used the basic window.print(), but in both IE and Firefox there are many artifacts within the Google Maps content, and some items, such as the popped-up info bubble, don't print. I have also experimented with saving to PDF through CutePDF (just to post an example here), and the quality through window.print() is low there too.
For example, a screenshot taken with the FireShot add-on is perfect and is exactly what I want the client to have. However, FireShot is Firefox-only, and I can't ask clients to install add-ons or ActiveX controls in their browsers.
Have a look at this example zip file (4 MB), which contains:
1. An example screenshot taken with FireShot (an example of what I want to achieve through an HTML/JS button within the page)
2. The Firefox window.print() result (CutePDF used to save as PDF)
3. The IE window.print() result (CutePDF used to save as PDF)
Note that in 2 and 3 the little bubble is not printed even when it is open.
For now, I have added a function to my site that goes fullscreen and guides the user to take a screenshot or to call window.print().
I am still looking for a way to print or capture my page.
Are there any Flash/ActiveX controls that I can include in my page to provide a quality print mechanism?
Thanks again for all the help but I still need more. :)
Thank you in advance.
http://rapidshare.com/files/311849636/Print_examples.zip.html

You may go to all that work only to find that a simple app like Snagit does the job (see Building a SnagIt Screen Capture Plugin).

The only way to reliably provide a high-quality print version of what's on screen in a rich web application is to have the client side (say, JavaScript) send the server precise information about the current state (where the bubbles are, etc.) and use that to generate an image that mimics the layout. Convert that image to a PDF or what-have-you, then send it to the client for download.

Google also has a Static Maps API that might give you good results. I looked into it myself once, and the only reason I didn't use it was that (at the time) there were limits on how many points a polyline could contain.
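Combining this with the state-capture idea from the previous answer, the client could post the map state (centre, zoom, marker and polyline coordinates) to the server, which turns it into a Static Maps request. A Python sketch; the shape of the `state` dictionary is hypothetical and the API key is a placeholder. Static maps do not render Google Maps JavaScript info windows, so an open bubble would still have to be drawn separately, and URL length limits still apply to long polylines:

```python
from urllib.parse import urlencode

STATIC_MAPS_URL = "https://maps.googleapis.com/maps/api/staticmap"


def static_map_url(state: dict, api_key: str, scale: int = 2) -> str:
    """Build a Static Maps URL from a (hypothetical) client-side map state."""
    # Standard plans cap size at 640x640; scale=2 doubles pixel density for printing.
    width = min(int(state["width"]), 640)
    height = min(int(state["height"]), 640)
    params = [
        ("center", f"{state['lat']},{state['lng']}"),
        ("zoom", str(state["zoom"])),
        ("size", f"{width}x{height}"),
        ("scale", str(scale)),
    ]
    for marker in state.get("markers", []):
        params.append(("markers", f"{marker['lat']},{marker['lng']}"))
    path = state.get("path", [])
    if path:
        points = "|".join(f"{p['lat']},{p['lng']}" for p in path)
        params.append(("path", f"color:0x0000ff|weight:4|{points}"))
    params.append(("key", api_key))
    # urlencode percent-encodes ',', '|' and ':', which the API accepts.
    return f"{STATIC_MAPS_URL}?{urlencode(params)}"
```

The server can fetch the resulting image and embed it in the PDF or image it sends back for printing.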

I don't think this is possible. It would be quite a security risk to be able to capture the user's screen through scripting (imagine bad sites capturing screen information).

No there is not, though it would come in quite handy at times for bug reporting etcetera.
You will probably get the best result by generating a separate PDF version of the page. It's by no means a quick fix, but you'll get superb print quality and total control over everything. The map part will probably be a bit tricky, though, as you need to get the map as a bitmap on the server somehow, and if it's not in Flash on the client, I don't know how you'd do that.
