Why is the type of mshtml.htmldocument.body.all just object? - microsoft.mshtml

Say dw1 is a variable whose type is mshtml.HTMLDocument.
The type of dw1.all is mshtml.IHTMLElementCollection.
However, the type of dw1.body.all is object.
Why is that?
To put it more bluntly: why does the type of dw1.all differ from the type of dw1.body.all?

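The difference comes from how the underlying COM interfaces are declared: IHTMLDocument2.all is declared as returning an IHTMLElementCollection, while IHTMLElement.all is declared as returning a plain IDispatch, which the interop assembly exposes as object. At run time both are element collections, so dw1.body.all can be cast to IHTMLElementCollection.
Once the DOM has loaded and resolved, body.all is a collection of all elements underneath the current element. The Windows Forms wrapper documents the equivalent property at https://msdn.microsoft.com/en-us/library/system.windows.forms.htmlelement.all(v=vs.110).aspx
This table helps navigate the structure:
https://msdn.microsoft.com/en-us/library/system.windows.forms.htmldocument(v=vs.110).aspx
All - Gets an instance of HtmlElementCollection, which stores all HtmlElement objects for the document.
Body - Gets the HtmlElement for the BODY tag.
Here's how to load the DOM: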
using System;
using mshtml;

// Construct the DOM
HTMLDocument doc = new HTMLDocument();
// Obtain the document interface
IHTMLDocument2 htmlDocument = (IHTMLDocument2)doc;
string htmlContent = "<!DOCTYPE html><html><body><h2>An unordered HTML list</h2><ul> <li>Coffee</li> <li>Tea</li> <li>Milk</li></ul></body></html>";
// Load the DOM
htmlDocument.write(htmlContent);
// Extract all elements under body; body.all is typed as object, so cast it
IHTMLElementCollection allBody = (IHTMLElementCollection)htmlDocument.body.all;
// All page elements, including body, head, style, etc.
IHTMLElementCollection all = htmlDocument.all;
// Iterate over the elements and display their tag names
foreach (IHTMLElement element in allBody)
{
    Console.WriteLine(element.tagName);
}

Related

getElementsByClassName - Undefined Return

I am having issues with a getElementsByClassName function I am using in Google Tag Manager.
I need to capture an input field value in my client's form. I am isolating the class name and using it in my custom JS, but I only get undefined back.
The JS I am using is below. I've also created a gtm.formsubmit event, but I reckon the event is firing before it has had time to pick up the user input. Is that even possible?
function() {
    var inputField = document.getElementsByClassName("wpcf7-form");
    return inputField.value || "";
}
Thanks!
Even if there is just a single element with the class wpcf7-form, a call to getElementsByClassName returns an array-like collection of elements (in that case containing a single element). Since the collection itself has no "value" property, you get undefined.
If you are reasonably sure there is only one element with the class, you can do
...
var inputField = document.getElementsByClassName("wpcf7-form");
return inputField[0].value || "";
...
since a single element will always be at index 0. That said, it would be easier to use a DOM type variable in Google Tag Manager and set the selection method to "CSS selector". This returns the first element with your class (or undefined if not present).
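The indexing fix above still fails if no element carries the class, because reading .value of undefined throws. A minimal sketch of a guarded version follows; the helper name firstValueByClass and its doc parameter are hypothetical, introduced only so the logic can be shown independently of a browser page.

```javascript
// Hypothetical helper: read the value of the first element with a given class.
// `doc` is anything exposing getElementsByClassName (normally `document`).
function firstValueByClass(doc, className) {
  // getElementsByClassName returns an array-like collection, never a single element
  var matches = doc.getElementsByClassName(className);
  if (matches.length === 0) {
    return ""; // no match: avoid reading .value of undefined
  }
  // Fall back to an empty string when the element has no value
  return matches[0].value || "";
}
```

In a Google Tag Manager custom JavaScript variable, the same logic would sit inside the anonymous function and be called with document as the first argument.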

Leading number in the rendered template result is dropped

In a Brite.js view, rendering a Handlebars.js template produces 1 <span>products</span>, but when the view is displayed the number is not shown; only the span tag and its text appear.
The template is:
{{num}} <span>products</span>
The view only has a create function, which returns the result of rendering the template.
A brite.js view needs to return (or resolve) to an HTML element, an HTML string (starting with a tag), or a jQuery object pointing to an HTML Element.
So, something like this should work:
var counter = 0;
brite.registerView("MyView", {
    create: function () {
        counter++;
        // The returned string must start with an HTML tag
        return "<div>" + counter + "</div>";
    }
});
Then, every time you call brite.display("MyView","body"), it adds a new div to <body> containing the incremented counter.
The create method can of course use any templating engine, such as Handlebars, but the resulting string must start with an HTML tag.
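Applied to the template from the question, the rule suggests wrapping the rendered output in a container element so that the string no longer starts with the number. The sketch below assumes this is what the answer intends. renderTemplate is a hypothetical stand-in for a compiled Handlebars template, and createViewHtml is a made-up name for what the view's create function would return.

```javascript
// Hypothetical stand-in for a compiled Handlebars template of
// "{{num}} <span>products</span>"
function renderTemplate(data) {
  return data.num + " <span>products</span>";
}

// Wrap the rendered output so the string starts with a tag, not with text
function createViewHtml(data) {
  return "<div>" + renderTemplate(data) + "</div>";
}
```

With this wrapper the leading text node (the number) sits inside the div instead of in front of the first tag.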

Using ParseXHtml with individual stylesheet

In iTextSharp, I use
var paragraph = new Paragraph();
var reader = new StringReader(text);
var handler = new HtmlHandler();
XMLWorkerHelper.GetInstance().ParseXHtml(handler, reader);
foreach (var element in handler.elements)
{
    paragraph.Add(element);
}
to get the IElements of a given HTML string "text", since
iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(reader, stylesheet)
is deprecated and I had some problems with unordered lists (the list items were not indented and had no bullets).
Is it possible to include a CSS file (as in the old version of the parser)?
Thanks in advance!
In the current version of the API, ParseXHtml has an overload for passing in a css file.

Removing all elements from HTML that have given class using Agility Pack

I'm trying to select all elements that have a given class and remove them from an HTML string.
This is what I have so far. It doesn't seem to remove anything, although the source clearly shows 4 elements with that class name.
// Filter page HTML to display required content
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
// pageHTML is a string containing the HTML
htmlDoc.LoadHtml(pageHTML);
// ParseErrors contains any errors from the LoadHtml call
if (!htmlDoc.ParseErrors.Any())
{
    // Remove all elements marked with pdf-ignore class
    HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//body[@class='pdf-ignore']");
    // Remove the collection from above
    foreach (var node in nodes)
    {
        node.Remove();
    }
}
EDIT: Just to clarify the document is parsing and the SelectNodes line is being hit, just not returning anything.
Here is a snippet of the html:
<input type=\"submit\" name=\"ctl00$MainContent$PrintBtn\" value=\"Print Shotlist\" onclick=\"window.print();\" id=\"MainContent_PrintBtn\" class=\"pdf-ignore\">
EDIT: In your updated question you posted part of the HTML string, an <input> element declaration, but you're trying to match a <body> element with the class pdf-ignore (according to your expression //body[@class='pdf-ignore']).
If you want to match all the elements in the document with this class, you should use:
var nodes = htmlDoc.DocumentNode.SelectNodes("//*[contains(@class,'pdf-ignore')]");
to get your nodes. This matches every element whose class attribute contains the specified class name.
Your code seems to be correct except for one detail: the condition htmlDoc.ParseErrors == null. You select and remove nodes ONLY if the ParseErrors property (which is of type IEnumerable<HtmlParseError>) is null, but if no errors are found this property actually returns an empty list. So changing your code to:
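One caveat with contains(@class, ...) is that it is a substring test, so it would also match a class such as pdf-ignore-all. The usual XPath 1.0 workaround is //*[contains(concat(' ', normalize-space(@class), ' '), ' pdf-ignore ')]. The sketch below mimics both tests in plain JavaScript to show the difference; the function names are hypothetical, and it models only the string logic, not an XPath engine.

```javascript
// Mimics contains(@class, name): plain substring test
function containsClassSubstring(classAttr, name) {
  return classAttr.indexOf(name) !== -1;
}

// Mimics contains(concat(' ', normalize-space(@class), ' '), ' name '):
// collapse whitespace, pad with spaces, then look for the whole token
function hasClassToken(classAttr, name) {
  var normalized = " " + classAttr.trim().replace(/\s+/g, " ") + " ";
  return normalized.indexOf(" " + name + " ") !== -1;
}
```

For a class attribute of "btn pdf-ignore-all", the substring test matches pdf-ignore while the token test does not.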
if (!htmlDoc.ParseErrors.Any())
{
    // some logic here
}
should solve the issue.
Your XPath is probably not matching: have you tried "//div[class='pdf-ignore']" (no "@")?

How to create an empty XHTML compliant P node using HTML Agility Pack 1.4.0?

However I try to create an HtmlNode for the P tag and inject it into the HtmlDocument DOM, it always appears as an unclosed tag. For example:
// different ways I've tried creating the node:
var p = HtmlNode.CreateNode("<p />");
var p = HtmlNode.CreateNode("<p></p>");
var p = HtmlNode.CreateNode("<p>");
var p = HtmlTextNode.CreateNode("<p></p>");
// some other properties I've played with:
p.Name = "p";
p.InnerHtml = "";
They all end up as just <p> in the output after using the .Save() method.
I want it properly closed for XHTML like <p /> or <p></p>. Either is fine.
My workaround: What I can do is issue CreateNode("<p> </p>") (with a space in between) and it retains the entire source, but I think there has to be a better way.
Other options tried or considered:
When I turn on the option .OutputAsXml, it escapes the existing entities (for example, it turns &nbsp; into &amp;nbsp;), which is not ideal, and it doesn't close my injected P tag.
When I enable the option .OptionWriteEmptyNodes, it still doesn't close my injected P tag.
I see the Agility Pack contains the enum HtmlElementFlag with values Closed, Empty, CData, CanOverlap (Closed might be useful) but cannot see where I would apply it when creating a new element/node.
I found the answer: the P tag has to be created off the HtmlDocument instance using the CreateElement(..) factory method like so:
var hdoc = new HtmlDocument(); // HTML doc instance
// ... stuff
HtmlNode p = hdoc.CreateElement("p"); // << will close itself for XHTML.
Then P will close itself like <p />.
If you instead create an HtmlNode instance using the HtmlNode.CreateNode(..) factory method like I was trying in the question, it behaves differently in the DOM as far as closure.
HtmlNode.ElementsFlags["p"] = HtmlElementFlag.Closed;
The default value for "p" is "Empty | Closed". You should set it to "Closed" to get:
<p></p>
