How do I parse specific data from a website within Codename One? - web-scraping

I have run into a roadblock developing my Codename One app. One of the classes in my project parses three specific HTML "td" elements from a website and saves their text to strings, which I then put into a Codename One MultiButton. I originally used jSoup for this operation, but soon realized that Codename One doesn't support third-party JAR files, so I used the method shown below.
public void showOilPrice() {
if (current != null) {
current.show();
return;
}
WebBrowser b = new WebBrowser() {
@Override
public void onLoad(String url) {
BrowserComponent c = (BrowserComponent) this.getInternal();
JavascriptContext ctx = new JavascriptContext(c);
String wtiLast = (String) ctx.get("document.getElementById('pair_8849').childNodes[4].innerText");
String wtiPrev = (String) ctx.get("document.getElementById('pair_8849').childNodes[5].innerText");
String wtiChange = (String) ctx.get("document.getElementById('pair_8849').childNodes[8].innerText");
Form op = new Form("Oil Prices", new BoxLayout(BoxLayout.Y_AXIS));
MultiButton wti = new MultiButton("West Texas Intermediate");
Image icon = null;
Image emblem = null;
wti.setEmblem(emblem);
wti.setTextLine2("Current Price: " + wtiLast);
wti.setTextLine3("Previous: " + wtiPrev);
wti.setTextLine4("Change: " + wtiChange);
op.add(wti);
op.show();
}
};
b.setURL("https://sslcomrates.forexprostools.com/index.php?force_lang=1&pairs_ids=8833;8849;954867;8988;8861;8862;&header-text-color=%23FFFFFF&curr-name-color=%230059b0&inner-text-color=%23000000&green-text-color=%232A8215&green-background=%23B7F4C2&red-text-color=%23DC0001&red-background=%23FFE2E2&inner-border-color=%23CBCBCB&border-color=%23cbcbcb&bg1=%23F6F6F6&bg2=%23ffffff&open=show&last_update=show");
}
This method works in the simulator (and gives a "deprecated API" warning), but it does not work when I submit my build online after signing. I have imported the parse4cn1 and cn1JSON libraries and worked through a series of obstacles, but I still receive a build error when I submit. I want to start fresh and use an alternative method if one exists. Is there a way to rewrite this segment of code without using these libraries? Maybe by using the XMLParser class?

The deprecation is for the WebBrowser class. You can use BrowserComponent directly so WebBrowser is redundant in this case.
I used XMLParser for this use case in the past. It should work with HTML as it was originally designed to show HTML.
It might also be possible to port JSoup to Codename One although I'm not sure about the scope of effort involved.
It's very possible that onLoad isn't invoked for a site you don't actually see rendered so the question is what specifically failed on the device?

Related

AWS Textract - GetDocumentAnalysisRequest only returns correct results for first page of document

I have written code to extract tables and name-value pairs from PDFs using Amazon Textract. I followed this example:
https://docs.aws.amazon.com/textract/latest/dg/async-analyzing-with-sqs.html
which was written for the SDK for Java version 1.1.
I have refactored it for version 2.
This is an async process that only applies to multi-page documents. When I get back the results, they are pretty accurate for the first page, but the subsequent pages are mostly empty rows. The documents I parse are scanned, so the quality is not great. However, if I take a JPG of each individual page and use the single-page operation, i.e. AnalyzeDocumentRequest, each page comes out well. The Amazon Textract "try it" service also renders the pages correctly.
So the error must be in my code, but I can't see where.
As you can see, it all happens here:
GetDocumentAnalysisRequest documentAnalysisRequest = GetDocumentAnalysisRequest.builder().jobId(jobId)
.maxResults(maxResults).nextToken(paginationToken).build();
response = textractClient.getDocumentAnalysis(documentAnalysisRequest);
and I can't really intervene at that point.
The most likely place I could make a mistake would be in the util file that gathers the page and table blocks i.e. here:
PageModel pageModel = tableUtil.getTableResults(blocks);
But that works perfectly for the first page, and I can also see in the response object above that the number of blocks returned is much smaller.
Here is the full code:
private DocumentModel getDocumentAnalysisResults(String jobId) throws Exception {
int maxResults = 1000;
String paginationToken = null;
GetDocumentAnalysisResponse response = null;
Boolean finished = false;
int pageCount = 0;
DocumentModel documentModel = new DocumentModel();
// loops until pagination token is null
while (finished == false) {
GetDocumentAnalysisRequest documentAnalysisRequest = GetDocumentAnalysisRequest.builder().jobId(jobId)
.maxResults(maxResults).nextToken(paginationToken).build();
response = textractClient.getDocumentAnalysis(documentAnalysisRequest);
// Show blocks, confidence and detection times
List<Block> blocks = response.blocks();
PageModel pageModel = tableUtil.getTableResults(blocks);
pageModel.setPageNumber(pageCount++);
Map<String,String> keyValues = formUtil.getFormResults(blocks);
pageModel.setKeyValues(keyValues);
documentModel.getPages().add(pageModel);
paginationToken = response.nextToken();
if (paginationToken == null)
finished = true;
}
return documentModel;
}
Has anyone else encountered this issue?
Many thanks
If the response has a NextToken, then you need to call Textract again and pass in the NextToken to get the next batch of Blocks.
I am not sure how to do this in Java, but here is the Python example from the AWS repo:
https://github.com/aws-samples/amazon-textract-serverless-large-scale-document-processing/blob/master/src/jobresultsproc.py
For my solution, I did a simple check: if response['NextToken'] is present, call the method again and concatenate response['Blocks'] to my current list.
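A minimal Java sketch of the same idea, for illustration only. It does not call AWS: Block, Batch and fakeFetch are hypothetical stand-ins, and in real code fetch would wrap textractClient.getDocumentAnalysis(...) with the token passed to nextToken(...). The point it tries to show is that a batch of results is not necessarily one document page, so the sketch first collects every block and only then groups by the page number carried on each block (the SDK's Block exposes this as page()).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class PaginationSketch {

    // Hypothetical stand-in for the SDK's Block: only the fields the sketch needs.
    static final class Block {
        final int page;
        final String text;
        Block(int page, String text) { this.page = page; this.text = text; }
    }

    // Hypothetical stand-in for one paginated response.
    static final class Batch {
        final List<Block> blocks;
        final String nextToken; // null when there is nothing more to fetch
        Batch(List<Block> blocks, String nextToken) { this.blocks = blocks; this.nextToken = nextToken; }
    }

    // Calls fetch repeatedly, passing the previous token, until no token is
    // returned. Every batch is appended to one list.
    static List<Block> collectAllBlocks(Function<String, Batch> fetch) {
        List<Block> all = new ArrayList<>();
        String token = null;
        do {
            Batch batch = fetch.apply(token);
            all.addAll(batch.blocks);
            token = batch.nextToken;
        } while (token != null);
        return all;
    }

    // Groups blocks by their own page number, independent of batch boundaries.
    static Map<Integer, List<Block>> groupByPage(List<Block> blocks) {
        Map<Integer, List<Block>> pages = new TreeMap<>();
        for (Block b : blocks) {
            pages.computeIfAbsent(b.page, k -> new ArrayList<>()).add(b);
        }
        return pages;
    }

    // Fake service: two batches, with page 2 split across the batch boundary.
    static Batch fakeFetch(String token) {
        if (token == null) {
            return new Batch(Arrays.asList(new Block(1, "a"), new Block(1, "b"), new Block(2, "c")), "t1");
        }
        return new Batch(Arrays.asList(new Block(2, "d"), new Block(3, "e")), null);
    }

    public static void main(String[] args) {
        Map<Integer, List<Block>> pages = groupByPage(collectAllBlocks(PaginationSketch::fakeFetch));
        pages.forEach((page, blocks) -> System.out.println("page " + page + ": " + blocks.size() + " blocks"));
    }
}
```

If this assumption about batches holds for your job, table and form extraction would run once per entry of the grouped map rather than once per response.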

Zebra Print issue using asp.net with c#

I am using PrintDocument to print directly to a network printer from ASP.NET with C#. The application is hosted in IIS with Windows authentication. I am not getting any error, and the PrintStatus is Printing. But the document never comes out of the printer, and the printer itself reports no errors.
System.Drawing.Printing.PrintDocument printdoc = new System.Drawing.Printing.PrintDocument();
printdoc.DefaultPageSettings.PaperSize = new PaperSize("Custom", 4, 3);
printdoc.OriginAtMargins = true;
// Set the printer name
PrinterSettings printer = new PrinterSettings();
printer.PrinterName = SqlDatabaseUtility.GetZebraPrinterName();
string fullName = CheckPrinterConfiguration(printer.PrinterName);
if (!String.IsNullOrEmpty(fullName))
{
printdoc.PrinterSettings.PrinterName = fullName;
// Handle printing
if (printdoc.PrinterSettings.IsValid)
{
printdoc.PrintPage += new System.Drawing.Printing.PrintPageEventHandler(printdoc_PrintPage);
printdoc.PrinterSettings.Copies = 2;
printdoc.Print();
}
}
Just a theory, but the PrintDocument class is a descendant of Component and so implements IDisposable.
In the same way that you don't leave SqlConnection instances undisposed, you should call Dispose() on your printdoc instance so that any unmanaged resources held by the PrintDocument instance - such as a handle to the printer device, perhaps - get released.
Put a using clause around your printing block as below. It might help with your problem, but even if it does not it is proper practice.
using (System.Drawing.Printing.PrintDocument printdoc = new System.Drawing.Printing.PrintDocument())
{
...
}
The next line of enquiry would be to follow up on the "user permissions" suggestion in the comment. Assuming that your code works if you run it in a test console application, a quick test for this would be to change the user account of your web app's Application Pool to be your own account. If your web app starts printing, then you know that permissions are the problem.

Inline image rendered twice by OSX mail app

My .NET 4.5 web application uses class SmtpClient to create and send e-mail messages to various recipients.
Each e-mail message consists of:
an HTML message body
an embedded inline image (JPEG, PNG or GIF)
an attachment (PDF)
Sample code is below. It works fine, but there is one gripe from OSX users. Apple's standard mail app renders the image twice; once inlined in the message body, and again following the message body, next to the preview of the PDF attachment.
I tinkered with the following properties; none of which would help.
SmtpClient's DeliveryFormat
MailMessage's IsBodyHtml and BodyTransferEncoding
Attachment's MimeType, Inline, DispositionType, ContentId, FileName, Size, CreationDate, ModificationDate
If I compose a similar e-mail message in MS Outlook and send it off to the Apple user, the image is rendered once, inlined in the message body; exactly as I would like it to be. So apparently it is possible.
After reading this, I inspected the raw MIME data, and noticed Outlook uses multipart/related to group together the message body and the images.
My question:
How do I mimic Outlook's behavior with the classes found in System.Net.Mail?
Things I would rather not do:
Employ external images instead of embedded ones (many e-mail clients initially block these to protect recipient's privacy).
Use third party libraries (to avoid legal hassle). The SmtpDirect class I found here seems to solve the problem (though I got a server exception in return), but it is hard for me to accept a complete rewrite of MS's SmtpClient implementation is necessary for such a subtle change.
Send the e-mail message to a pickup folder, manipulate the resulting .eml file, push the file to our Exchange server.
Minimal code to reproduce the problem:
using System.IO;
using System.Net.Mail;
using System.Net.Mime;
namespace SendMail
{
class Program
{
const string body = "Body text <img src=\"cid:ampersand.gif\" /> image.";
static Attachment CreateGif()
{
var att = new Attachment(new MemoryStream(Resource1.ampersand), "ampersand.gif")
{
ContentId = "ampersand.gif",
ContentType = new ContentType(MediaTypeNames.Image.Gif)
};
att.ContentDisposition.Inline = true;
return att;
}
static Attachment CreatePdf()
{
var att = new Attachment(new MemoryStream(Resource1.Hello), "Hello.pdf")
{
ContentId = "Hello.pdf",
ContentType = new ContentType(MediaTypeNames.Application.Pdf)
};
att.ContentDisposition.Inline = false;
return att;
}
static MailMessage CreateMessage()
{
var msg = new MailMessage(Resource1.from, Resource1.to, "The subject", body)
{
IsBodyHtml = true
};
msg.Attachments.Add(CreateGif());
msg.Attachments.Add(CreatePdf());
return msg;
}
static void Main(string[] args)
{
new SmtpClient(Resource1.host).Send(CreateMessage());
}
}
}
To actually build and run it, you will need an additional resource file Resource1.resx with the two attachments (ampersand and Hello) and three strings host (the SMTP server), from and to (both of which are e-mail addresses).
(I found this solution myself before I got to posting the question, but decided to publish anyway; it may help out others. I am still open for alternative solutions!)
I managed to get the desired effect by using class AlternateView.
static MailMessage CreateMessage()
{
var client = new SmtpClient(Resource1.host);
var msg = new MailMessage(Resource1.from, Resource1.to, "The subject", "Alternative message body in plain text.");
var view = AlternateView.CreateAlternateViewFromString(body, System.Text.Encoding.UTF8, MediaTypeNames.Text.Html);
var res = new LinkedResource(new MemoryStream(Resource1.ampersand), new ContentType(MediaTypeNames.Image.Gif))
{
ContentId = "ampersand.gif"
};
view.LinkedResources.Add(res);
msg.AlternateViews.Add(view);
msg.Attachments.Add(CreatePdf());
return msg;
}
As a side effect, the message now also contains a plain text version of the body (for paranoid web clients that reject HTML). Though it is a bit of a burden ("Alternative message body in plain text" needs improvement), it does give you more control as to how the message is rendered under different security settings.

CreateEnvelopeFromTemplates - Guid should contain 32 digits with 4 dashes

I am attempting to create a DocuSign envelope from a template document using the CreateEnvelopeFromTemplates method, available within their v3 SOAP API web service. This is being called from an ASP.NET v4.0 web site.
When I call the method with the required parameter objects passed in, I receive an exception from the web service, basically telling me that the Template ID is not a valid GUID.
669393: Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).
public DocuSignDSAPI.EnvelopeStatus CreateEnvelopeFromTemplates(DocuSignDSAPI.TemplateReference[] TemplateReferences, DocuSignDSAPI.Recipient[] Recipients, DocuSignDSAPI.EnvelopeInformation EnvelopeInformation, bool ActivateEnvelope) {
return base.Channel.CreateEnvelopeFromTemplates(TemplateReferences, Recipients, EnvelopeInformation, ActivateEnvelope);
}
The template reference, a GUID, must be specified as the "Template" string property of the TemplateReference object. This is then added to an array of TemplateReferences, which is one of the input parameters of the CreateEnvelopeFromTemplates method.
Actual template GUID: f37b4d64-54e3-4723-a6f1-a4120f0e9695
I am building my template reference object using the following function, which I wrote to make the functionality reusable:
Private Function GetTemplateReference(ByVal TemplateID As String) As TemplateReference
Dim templateReference As New TemplateReference
Dim guidTemplateID As Guid
With TemplateReference
.TemplateLocation = TemplateLocationCode.Server
If Guid.TryParse(TemplateID, guidTemplateID) Then
.Template = guidTemplateID.ToString
End If
End With
Return TemplateReference
End Function
The TemplateID is passed in from an appSettings configuration value when the TemplateReferences array is instantiated, like so:
templateReferences = New TemplateReference() {GetTemplateReference(ConfigurationManager.AppSettings("DocuSignTemplate_Reference"))}
recipients = New Recipient() {AddRecipient("myself@work.email", "My Name")}
envelopeInformation = CreateEnvelopeInformation()
envelopeStatus = client.CreateEnvelopeFromTemplates(templateReferences, recipients, envelopeInformation, True)
As you can see from my GetTemplateReference function, I also parse the GUID before converting it back to a string, so I know it's valid. The template is managed and stored at the DocuSign end, hence the TemplateLocationCode.Server setting.
I am referring to their own documentation:
CreateEnvelopeFromTemplates
Why oh why is the method not liking my Template ID? I can successfully use their REST API to call the same method, using their own code samples. As a worst case I can fall back on that, but I would rather interact with the SOAP web service, since with REST I would need to construct all the relevant requests in either XML or JSON.
I would really appreciate if someone could perhaps shed some light on this problem.
Thanks for taking the time to read my question!
Andrew might be spot on with the AccountId mention: are you setting the AccountId in the envelope information object? Also, have you seen the DocuSign SOAP SDK up on GitHub? It has 5 sample SOAP projects, including one MS.NET project. The .NET project is in C#, not Visual Basic, but I still think it will be helpful to you. Check out the SOAP SDK here:
https://github.com/docusign/DocuSign-eSignature-SDK
For instance, here is the test function for the CreateEnvelopeFromTemplates() function:
public void CreateEnvelopeFromTemplatesTest()
{
// Construct all the recipient information
DocuSignWeb.Recipient[] recipients = HeartbeatTests.CreateOneSigner();
DocuSignWeb.TemplateReferenceRoleAssignment[] finalRoleAssignments = new DocuSignWeb.TemplateReferenceRoleAssignment[1];
finalRoleAssignments[0] = new DocuSignWeb.TemplateReferenceRoleAssignment();
finalRoleAssignments[0].RoleName = recipients[0].RoleName;
finalRoleAssignments[0].RecipientID = recipients[0].ID;
// Use a server-side template -- you could make more than one of these
DocuSignWeb.TemplateReference templateReference = new DocuSignWeb.TemplateReference();
templateReference.TemplateLocation = DocuSignWeb.TemplateLocationCode.Server;
// TODO: replace with template ID from your account
templateReference.Template = "server template ID";
templateReference.RoleAssignments = finalRoleAssignments;
// Construct the envelope information
DocuSignWeb.EnvelopeInformation envelopeInfo = new DocuSignWeb.EnvelopeInformation();
envelopeInfo.AccountId = _accountId;
envelopeInfo.Subject = "create envelope from templates test";
envelopeInfo.EmailBlurb = "testing docusign creation services";
// Create draft with all the template information
DocuSignWeb.EnvelopeStatus status = _apiClient.CreateEnvelopeFromTemplates(new DocuSignWeb.TemplateReference[] { templateReference },
recipients, envelopeInfo, false);
// Confirm that the envelope has been assigned an ID
Assert.IsNotNullOrEmpty(status.EnvelopeID);
Console.WriteLine("Status for envelope {0} is {1}", status.EnvelopeID, status.Status);
}
This code calls other sample functions in the SDK which I have not included, but hopefully this helps shed some light on what you're doing wrong...
This problem arises when you don't set the AccountId field. You can retrieve its value from your account: in DocuSign's console, go to Preferences / API.
Use the API Account ID (which is in GUID format) and you should be OK.

Javascript permission denied error when using Atalasoft DotImage

Have a real puzzler here. I'm using Atalasoft DotImage to allow the user to add some annotations to an image. When I add two annotations of the same type that contain text and have the same name, I get a JavaScript "permission denied" error in Atalasoft's compressed JS. The error occurs when accessing the style member of a rule:
In the debugger (Visual Studio 2010 .Net 4.0) I can access
h._rule
but not
h._rule.style
What in JavaScript would cause "permission denied" when accessing a member of an object?
Just wondering if anyone else has encountered this. I see several people using Atalasoft on SO and I even saw a response from someone with Atalasoft. And yes, I'm talking to them, but it never hurts to throw it out to the crowd. This only happens in IE8, not Firefox.
Thanks, Brian
Updates: Yes, using latest version: 9.0.2.43666
By same name (see comment below) I mean, I created default annotations and they are named so they can be added with javascript later.
// create a default annotation
TextData text = new TextData();
text.Name = "DefaultTextAnnotation";
text.Text = "Default Text Annotation:\n double-click to edit";
//text.Font = new AnnotationFont("Arial", 12f);
text.Font = new AnnotationFont(_strAnnotationFontName, _fltAnnotationFontSize);
text.Font.Bold = true;
text.FontBrush = new AnnotationBrush(Color.Black);
text.Fill = new AnnotationBrush(Color.Ivory);
text.Outline = new AnnotationPen(new AnnotationBrush(Color.White), 2);
WebAnnotationViewer1.Annotations.DefaultAnnotations.Add(text);
In javascript:
CreateAnnotation('TextData', 'DefaultTextAnnotation');
function CreateAnnotation(type, name) {
SetAnnotationModified(true);
WebAnnotationViewer1.DeselectAll();
var ann = WebAnnotationViewer1.CreateAnnotation(type, name);
WebThumbnailViewer1.Update();
}
There was a bug in an earlier version that allowed annotations to be saved with the same unique IDs. This generally doesn't cause problems for any annotations except TextAnnotations, since they use the unique ID to create a CSS class for the text editor. CSS doesn't like having two or more classes defined with the same name, and this is what causes the "Permission denied" error.
You can remove the unique IDs from the annotations without causing problems. I have provided a few code snippets below that demonstrate how this can be done. Calling ResetUniques() after you load the annotation data (on the server side) should make everything run smoothly.
-Dave C. from Atalasoft
protected void ResetUniques()
{
foreach (LayerAnnotation layerAnn in WebAnnotationViewer1.Annotations.Layers)
{
ResetLayer(layerAnn.Data as LayerData);
}
}
protected void ResetLayer(LayerData layer)
{
ResetUniqueID(layer);
foreach (AnnotationData data in layer.Items)
{
LayerData group = data as LayerData;
if (group != null)
{
ResetLayer(group);
}
else
{
ResetUniqueID(data);
}
}
}
protected void ResetUniqueID(AnnotationData data)
{
data.SetExtraProperty("_atalaUniqueIndex", null);
}
