How to deal with a multipage PDF file with tess4j

I am using tess4j to run OCR on an image file:
Pix pix = Leptonica1.pixRead(image.getPath());
TessAPI1.TessBaseAPIInit3(tessBaseAPI, tessDataPath, "eng");
TessAPI1.TessBaseAPISetImage2(tessBaseAPI, pix);
// TessAPI1.TessBaseAPIProcessPages(tessBaseAPI,image.getPath(),"",0,null);
PointerByReference pixa = null;
PointerByReference blockids = null;
Boxa boxa = TessAPI1.TessBaseAPIGetComponentImages(tessBaseAPI, ITessAPI.TessPageIteratorLevel.RIL_TEXTLINE, 1, pixa, blockids);
For multipage TIFF files, TessBaseAPIGetComponentImages() returns the Boxa information for the first page only.
If I instead call TessAPI1.TessBaseAPIProcessPages(tessBaseAPI,image.getPath(),"",0,null);
only the last page's information is returned.
So how can I process the recognition results page by page for a multipage file?
Thanks.
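One possible way to get at each page separately, offered as a sketch rather than a confirmed answer: split the multipage TIFF into single-page images first, then run the recognition on each image in turn. The example below uses only the JDK's ImageIO and assumes its built-in TIFF reader is available, which means Java 9 or later. TiffPageSplitter and readPages are hypothetical names, and the OCR call itself is left out.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: not part of tess4j.
final class TiffPageSplitter {

    // Reads every page of a (possibly multipage) image stream into memory.
    static List<BufferedImage> readPages(InputStream in) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader found for this input");
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis);
                // true = allow the reader to scan the stream to count pages
                int pageCount = reader.getNumImages(true);
                List<BufferedImage> pages = new ArrayList<>(pageCount);
                for (int i = 0; i < pageCount; i++) {
                    pages.add(reader.read(i)); // one BufferedImage per page
                }
                return pages;
            } finally {
                reader.dispose();
            }
        }
    }
}
```

Each BufferedImage in the returned list corresponds to one page, so the recognition results can be collected per page index instead of being overwritten by the next page.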

Related

NetSuite Advanced PDF Dynamically Created in Script - Cannot Set <img> tag

I am dynamically creating an Advanced PDF in a script. I build an XML string and pass it to NetSuite's XML-to-PDF API, nlapiXMLToPDF(xmlString).
I've added saved searches, tables, and styling, and the XML string parses correctly.
However, I cannot add a logo in an <img> tag, because I'm not sure how to drill into the file cabinet and get the image's 'src'.
Has anyone had experience dynamically creating Advanced PDFs in NetSuite and pulling in the logo from a script?

Are you trying to include an image from the file cabinet? If you have the internal ID of the file in the variable fileID, then you can use the following code:
var imageURL = nlapiLoadFile(fileID).getURL();
imageURL = nlapiEscapeXML(imageURL);
var xmlString = ... + '<div><img height="XXpx" width="XXpx" src="'+imageURL+'" /></div>' + ...;
var myPDF = nlapiXMLToPDF(xmlString);
If you want to use the Form Logo set on the Company Information page, then you can populate fileID using the following code:
var companyInfo = nlapiLoadConfiguration('companyinformation');
var fileID = companyInfo.getFieldValue('formlogo');
Then use the first code block to include the logo in xmlString.

Target Object Tag with PDF stream in HTML page

I am using Crystal Reports in a .NET 2.0 ASP.NET website to create a PDF from the report. I then want to stream the report to the browser, which I already know how to do. What I don't know how to do is target the object tag that will hold the PDF. Does someone know how to do this within HTML, with JavaScript or any other way?
Thanks in advance for any help that can be given.

I wanted to come back and answer this after finding out what I had to do. I had to create a separate aspx page, which I called PDFView.aspx. I then added this code to its Page_Load event handler:
if (!IsPostBack)
{
    // Retrieve the report that the calling page stored in session state
    ReportDocument myReport = (ReportDocument)Session["CrystalReport"];
    System.IO.Stream myStream;
    CrystalDecisions.Shared.ExportOptions myExportOptions;
    myExportOptions = myReport.ExportOptions;
    myExportOptions.ExportFormatType = CrystalDecisions.Shared.ExportFormatType.PortableDocFormat;
    myExportOptions.FormatOptions = new CrystalDecisions.Shared.PdfRtfWordFormatOptions();
    CrystalDecisions.Shared.ExportRequestContext myExportRequestContext = new CrystalDecisions.Shared.ExportRequestContext();
    myExportRequestContext.ExportInfo = myExportOptions;
    //SetReportParameter("pPrinterFriendly", true, (ReportClass)myReport);
    System.Web.HttpContext.Current.Response.ClearContent();
    System.Web.HttpContext.Current.Response.ClearHeaders();
    System.Web.HttpContext.Current.Response.ContentType = "application/pdf";
    // Export the report to a PDF stream and write it to the response
    myStream = myReport.FormatEngine.ExportToStream(myExportRequestContext);
    Byte[] myBuffer = new Byte[myStream.Length];
    myStream.Read(myBuffer, 0, (int)myStream.Length);
    System.Web.HttpContext.Current.Response.BinaryWrite(myBuffer);
    System.Web.HttpContext.Current.Response.Flush();
}
I created the report object, setting all parameters and the data source, in the calling aspx page and then wrote the report to a session variable for retrieval when the PDFView.aspx page is loaded. The code above then retrieves the report, exports it, and streams the resulting binary PDF to the browser's response stream.
The PDFView.aspx page is referenced in the calling page with an object tag like this:
<object id="pdfObj" type="application/pdf" style="width:60%;height:95%;position:relative;top:2%;left:0%;right:10%;bottom:10%;margin:0px;padding:0px;border:0px;" data="PDFView.aspx"></object>

how to read word document before uploading it by webpage

I have an Excel file, abc.xls, which I renamed to abc.doc from the command prompt.
My requirement is: I want users to upload only genuine doc files, but I can only check the MIME type of the file being uploaded, and that is not sufficient. Before the upload, I want to confirm that the file really is a doc and reject abc.doc, because it is not a doc file; it's an Excel file.

Because the OP wrote it in the comments:
You are on the wrong track here. Validation should always happen on the server side; you can add additional validation on the client side, but it's not required. You have to do this for a simple reason:
Clients can always circumvent client-side validation because the client is fully under their control. So even if you implement a validation method that checks whether the file is a doc or an Excel document, a malicious user can always send you a POST request with the validation disabled, and you end up with an Excel document, a virus, etc.
This is a core web programming principle: never trust input data, and never validate on the client only!
Secondly, the validation is much easier to do on the server. So accept the upload (after checking file extension and size) and then validate the content on the server!
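For the server-side check itself, here is a minimal sketch in Java (the language is an assumption; the question does not say what the server runs). Legacy .doc and .xls files share the same OLE2 container signature, so the signature alone cannot tell them apart. This heuristic therefore also looks for the "WordDocument" stream name that legacy Word files carry in their directory, stored as UTF-16LE. WordDocSniffer and looksLikeLegacyWordDoc are hypothetical names, and .docx files (which are ZIP archives) are not covered.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a heuristic content check, not a full OLE2 parser.
final class WordDocSniffer {

    // Signature that opens every OLE2 compound file (legacy .doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Directory entry names are UTF-16LE; legacy Word files contain this stream.
    private static final byte[] WORD_STREAM =
        "WordDocument".getBytes(StandardCharsets.UTF_16LE);

    static boolean looksLikeLegacyWordDoc(byte[] data) {
        return startsWith(data, OLE2_MAGIC) && indexOf(data, WORD_STREAM) >= 0;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Naive byte search; fine for a sketch, slow for very large uploads.
    private static int indexOf(byte[] data, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= data.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

A renamed abc.xls would pass the signature test but normally fail the stream-name test, so it would be rejected. For anything stricter than a heuristic, parsing the file with a real document library on the server is the safer route.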

You probably need an ActiveX Object to access the file content on the client system before uploading. Checking the byte array with javascript to find whether it's a real doc might prove interesting though :-)
EDIT :
function CheckWordDoc(filepath){
    var fso, f, ts, s;
    var ForReading = 1, ForWriting = 2, ForAppending = 8;
    var TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0;
    fso = new ActiveXObject("Scripting.FileSystemObject");
    f = fso.getFile(filepath);
    ts = f.OpenAsTextStream(ForReading, TristateUseDefault);
    while (!ts.AtEndOfStream) {
        s = ts.ReadLine();
        if (s.indexOf("Word.Document.8") != -1) {
            ts.Close( );
            return true;
        }
    }
    ts.Close( );
    return false;
}
http://www.piclist.com/techref/language/asp/vbs/vbscript/jsmthopenastextstream.htm
http://msdn.microsoft.com/en-us/library/hwfw5c59%28v=vs.85%29.aspx

PDFizer: how to insert picture in generated pdf document?

I'm using the PDFizer library for .NET and I need help:
how can I convert a whole HTML document (including the pictures stored in it) to PDF with this library? Right now I can only generate PDFs without images.

After some testing, this is what you need to do:
Create a folder that holds all of your images.
If you already have an instance of Pdfizer.HtmlToPdfConverter, set its ImagePath property to point to the folder where your images reside.
Include the <img> tags in your HTML code.
Make sure the images are in the folder.
Note: I tried adding PNG files and got a conversion error. Here is an example I took from the site you provided, plus my modifications:
System.Text.StringBuilder sbHtml = new System.Text.StringBuilder();
sbHtml.Append("<html>");
sbHtml.Append("<body>");
sbHtml.Append("<font size='14'>My Document Title Line</font>");
sbHtml.Append("<img src='trollface.jpg' />");
sbHtml.Append("<br />");
sbHtml.Append("This is my document text");
sbHtml.Append("</body>");
sbHtml.Append("</html>");
// create file stream to PDF file to write to
using (System.IO.Stream stream = new System.IO.FileStream
    (sPathToWritePdfTo, System.IO.FileMode.OpenOrCreate))
{
    // create new instance of Pdfizer
    Pdfizer.HtmlToPdfConverter htmlToPdf = new Pdfizer.HtmlToPdfConverter();
    // open stream to write the PDF to
    htmlToPdf.Open(stream);
    // point the converter at the folder that holds the images
    htmlToPdf.ImagePath = Server.MapPath(ResolveUrl("~/Images"));
    // write the HTML to the component
    htmlToPdf.Run(sbHtml.ToString());
    // close the write operation and complete the PDF file
    htmlToPdf.Close();
}
Good luck!

An image from byte to optimized web page presentation

I get the data of an image stored in the database as a byte[] array,
then I convert it to a System.Drawing.Image with the code shown below:
public System.Drawing.Image CreateImage(byte[] bytes)
{
System.IO.MemoryStream memoryStream = new System.IO.MemoryStream(bytes);
System.Drawing.Image image = System.Drawing.Image.FromStream(memoryStream);
return image;
}
(*) On the other hand, I am planning to show a list of images on ASP.NET pages as the client scrolls down the page: the further the user scrolls, the more photos he/she sees. That means fast page loads and a rich user experience. (You can see what I mean on www.mashable.com; just watch how new photos load as you scroll down.)
Moreover, how can I show the Image object returned by the method above dynamically in a loop, under the (*) conditions above?
Regards
bk

Well, I think the main bottleneck is actually hitting the database each time you need an image. (Especially considering many users accessing the site.)
I would go with the following solution:
Database will store images with the original quality;
An .ashx handler will cache images on the file system in the various resolutions needed (like 32x32 pixels for icons, 48x48 for thumbnails, etc.), returning them on request and accessing the database only once (a linked example shows how to return an image via an .ashx handler);
The actual pages will point to the .ashx handler to get an image (like <img src="GetImage.ashx?ID=324453&Size=48" />).
UPDATE:
So the actual workflow in the handler will be like:
public void ProcessRequest (HttpContext context)
{
    // Create path of cached file based on the context passed
    int size = Int32.Parse(context.Request["Size"]);
    // For ID Guids are possibly better
    // but it can be anything, even parameter you need to pass
    // to the web service in order to get those bytes
    int id = Int32.Parse(context.Request["Id"]);
    string imagePath = String.Format(@"images/cache/{0}/{1}.png", size, id);
    // Check whether cache image exists and created less than an hour ago
    // (create it if necessary)
    if (!File.Exists(imagePath)
        || File.GetLastWriteTime(imagePath) < DateTime.Now.AddHours(-1))
    {
        // Get the file from the web service here
        byte[] imageBytes = ...
        // Save as a PNG file (saving to a stream needs an explicit format)
        using (var memoryStream = new MemoryStream(imageBytes))
        using (var outputStream = File.OpenWrite(imagePath))
            Image.FromStream(memoryStream).Save(outputStream, System.Drawing.Imaging.ImageFormat.Png);
    }
    context.Response.ContentType = "image/png";
    context.Response.WriteFile(imagePath);
}
