How do I turn HTML that is not well formed into well-formed markup using classic ASP and VBScript? - asp-classic

I am trying to parse some HTML to swap out the values of various element attributes. I decided that the most reliable way to parse the HTML was to use an XML parser (MSXML).
The problem is that the HTML I'm trying to parse contains attributes like:
<param name="flashvars" value="autoplay=false&brand=embed&cid=97%2Ftest&locale=en_US"/>
This causes the XML parser to blow up. I figured out that I need to Server.HTMLEncode() the value attribute for the XML parser to load it properly.
I feel like the problem is a vicious circle: I can't use regexes because HTML is not regular enough, and now I can't use XML parsers because the HTML isn't "well formed".
How do I approach this issue? I want to be able to change attribute values with VBScript.

Is your HTML well formed? If so, you could simply use an XML DOMDocument and use XPath to find the attributes you want to replace.
You can also use JScript server-side in ASP, which might give you access to HTML DOM libraries you could use.
You should probably have a look at one of the libraries for cleaning up HTML, such as HTML Tidy: http://www.w3.org/People/Raggett/tidy/
Your main problem is the ampersands: you need to do a replace on them, because they must be written as &amp; in well-formed XML/XHTML.
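To illustrate the ampersand replacement suggested above, here is a minimal sketch. It is written in Python rather than VBScript, and the pattern and the helper name escape_bare_ampersands are my own assumptions, not something from the original answers. It escapes only ampersands that do not already start an entity or character reference, then loads the repaired tag in an XML parser and changes the attribute. Named HTML entities such as &nbsp; would still need separate handling, because a plain XML parser does not know them.

```python
import re
import xml.etree.ElementTree as ET

# Matches an ampersand that does NOT start an entity or character reference
# such as &amp;, &#38; or &#x26;.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(markup: str) -> str:
    """Replace every bare '&' with '&amp;' and leave existing references alone."""
    return BARE_AMPERSAND.sub("&amp;", markup)


snippet = (
    '<param name="flashvars" '
    'value="autoplay=false&brand=embed&cid=97%2Ftest&locale=en_US"/>'
)
fixed = escape_bare_ampersands(snippet)

# The repaired markup now loads in an XML parser, so the attribute can be changed.
param = ET.fromstring(fixed)
param.set("value", param.get("value").replace("autoplay=false", "autoplay=true"))
result = ET.tostring(param, encoding="unicode")
print(result)
```

The same idea should carry over to a regular-expression replace in VBScript, but I have not tested the pattern there.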

Related

Inserting HTML into Word Using OpenXML

I have some HTML stored in a database that I want to insert into a Word document using DocumentFormat.OpenXml.
Inspired by the article here, I tried the following code.
mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Xhtml,
String.Format("<html><body>{0}</body></html>", html));
But this gives me the following error.
'(My HTML Here)' ID is not a valid XSD ID
I really don't understand this error. Does anyone know what I'm doing wrong?
Also, my biggest concern about this approach is that the HTML may not be perfectly formed and I suspect this code is not as forgiving as browsers are. Any recommendations for other possible approaches? I'm considering parsing the HTML and rendering it myself, but that will be a lot of work.
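The second parameter is a part ID, not the part contents.
To set the part contents, you need to put well-formed XHTML into the returned AlternativeFormatImportPart (for example, by passing a stream of the XHTML to its FeedData method) and then reference the part's ID from an AltChunk element in the document body.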

XML and XSLT to generate CSS?

I want to give users the ability to change the CSS.
My first thought is that storing the CSS as XML will make it easier to read and understand.
My second is that using XSLT I will be able to generate the CSS (am I right? Will that be useful?).
Lastly, when a user changes the CSS, the XML file can be updated and then used.
This is at a very rough level. I am using ASP.NET; can someone please tell me whether my understanding is correct, how I should approach this, and what the pros and cons are?
Will something like the following work? Is it possible?
<link src="someserverfiletoprocessxmlusingxslt.aspx?user=id" type=text/css/>
That is possible; your ASPX page would need to return CSS with a MIME type of text/css.
However, it would be better to use an ASHX (Generic Handler) rather than an ASPX (Web Form).
Using an ASP.NET generic HTTP handler (ASHX) would be better. This is just a class that gives you access to the output stream (better for non-HTML output).
From there you can process the XML, transform it using XSLT and write/dump it on the output stream.
Might be a good idea to implement some kind of caching to enhance performance...
More info on generic handlers: http://www.brainbell.com/tutorials/ASP/Generic_Handlers_(ASHX_Files).html
Setting the method attribute of the xsl:output element to text will strip the resulting output of all XML tags and return it unencoded.
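To make the "XML in, plain CSS text out" idea concrete, here is a rough sketch. It is not XSLT and not ASP.NET: it is a plain Python stand-in for the transformation step, and the XML layout (stylesheet, rule, declaration elements) is purely hypothetical, since the question never defines one. Whatever produces the text, the handler would still need to return it with a MIME type of text/css.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for a user's stylesheet; the question does not define one.
USER_STYLE_XML = """
<stylesheet>
  <rule selector="body">
    <declaration property="color" value="#333333"/>
    <declaration property="font-family" value="Verdana, sans-serif"/>
  </rule>
  <rule selector="a:hover">
    <declaration property="color" value="red"/>
  </rule>
</stylesheet>
"""


def xml_to_css(xml_text: str) -> str:
    """Render each <rule> element as one CSS rule in plain text."""
    root = ET.fromstring(xml_text)
    rules = []
    for rule in root.findall("rule"):
        declarations = "".join(
            f"  {d.get('property')}: {d.get('value')};\n"
            for d in rule.findall("declaration")
        )
        rules.append(f"{rule.get('selector')} {{\n{declarations}}}\n")
    return "\n".join(rules)


css = xml_to_css(USER_STYLE_XML)
print(css)
```

An XSLT stylesheet with its output method set to text would play the same role as xml_to_css here.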

How to get a newline in JSF (plain text)?

I am using JSF to generate text and need newlines to make the text easier to read. I have an HTML version which works great; I hacked it together using <br/> (I'm not proud of that, but it works).
I would like to do the same for the plain-text version, for example by inserting \n.
I am doing something like this:
<customTagLibrary:customTag>
<h:outputText value="Exception"/><br/><br/>
...
</customTagLibrary:customTag>
Instead of the <br/>, I want \n. What is the best way to do that?
Please keep in mind that I'm NOT using this to generate content that will be sent to the browser. This will be used to create email messages or (plain-text) attachments in emails.
Thanks,
Walter
If you use Facelets to render HTML, this did the trick for me:
<h:outputText value="&#10;" />
Why not simply wrap it in an HTML <pre> tag?
The h: prefix means html. So if you don't want html, don't use h: tags. Create your own tags or at least renderers for h: tags and let them output \n.
But my personal opinion is that it's better to use another templating technology for emails.
I'm assuming that your template XML strips whitespace. Unfortunately, EL doesn't let you express newlines in string literals, but you could bind to a string that did (<h:outputText value="#{applicationScope.foo.newline}" />). However, since you want to serve multiple markups, this would be a less than ideal approach.
To share JSF templates between different content types, you could 1) remove all markup specific tags from the template and 2) provide RenderKits which would provide a Renderer appropriate for the current markup. This would be the way to serve content using JSF's model-view-presenter design.
You may have to make some decisions about how you handle markup-specific attributes. The default render kit is geared towards rendering the HTML concrete components. Exactly what you do depends on your goals.
I am going to simply write a newline tag. It will detect whether it should output a <br/> or a \n. In my tag library, it would look like this:
<content:newline/>
Walter

Programmatically remove all HTML and inline formatting

I have taken over a code base and have to read in HTML files that I think were generated by Microsoft Word, so they have all kinds of wacky inline formatting.
Is there any way to strip out all of the bad inline formatting and just get the text from this stream? I basically want a programmatic purifier so I can then apply some sensible CSS.
You should use HTML Tidy; it's ubiquitous when it comes to cleansing HTML. There's an article on DevX that describes how to do it from .NET.
In the end I just wrote a small class that did a bunch of find-and-replaces. Not pretty, but it worked.
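For comparison with the find-and-replace approach, here is a small sketch of pulling just the text out of Word-style HTML with a forgiving HTML parser. It uses Python's standard html.parser rather than .NET or HTML Tidy, and the class name TextExtractor, the helper html_to_text, and the sample markup are made up for illustration. It drops every tag and attribute, skips the bodies of style and script elements, and collapses the leftover whitespace.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes and ignore tags, attributes, and style/script bodies."""

    SKIPPED = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace left behind by the removed markup.
        return " ".join("".join(self._chunks).split())


def html_to_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


sample = (
    '<html><head><style>p.MsoNormal {margin: 0in;}</style></head>'
    '<body><p class="MsoNormal" style="font-size:11.0pt">Hello, '
    '<span style="color:red">Word</span> &amp; friends</p></body></html>'
)
plain = html_to_text(sample)
print(plain)
```

This throws away paragraph boundaries as well; if you want to keep some structure for the new CSS, you would have to emit a marker for the tags you care about instead of ignoring them all.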

Are there any tools out there to compare the structure of 2 web pages?

I receive HTML pages from our creative team, and then use those to build aspx pages. One challenge I frequently face is getting the HTML I spit out to match theirs exactly. I almost always end up screwing up the nesting of <div>s between my page and the master pages.
Does anyone know of a tool that will help in this situation -- something that will compare two pages and output the structural differences? I can't use a standard diff tool, because IDs change from what I receive from creative, text replaces lorem ipsum, etc.
You can use HTMLTidy to convert the HTML to well-formed XML so you can use XML Diff, as Gulzar suggested.
tidy -asxml index.html
If you output XML-compliant HTML, or at least translate your HTML into XML-compliant form, you could then use XSL on your output to remove the content and id attributes. Apply the same transformation to their HTML, and then compare.
I was thinking along the lines of XML Diff, since HTML can be represented as an XML document.
The challenge with HTML is that it might not always be well formed. Found one more here showing how to use the XMLDiff class.
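The "strip the content and ids, then compare" idea can be sketched without XSL. The following is only an illustration in Python using the standard html.parser and difflib modules; the names Skeleton, skeleton, and structural_diff and the two sample pages are assumptions of mine. It reduces each page to an indented outline of tag names, ignoring text and all attributes, and diffs the two outlines.

```python
import difflib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class Skeleton(HTMLParser):
    """Record only the nesting of tags, one indented line per start tag."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self._depth + tag)
        if tag not in VOID_TAGS:
            self._depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br/> add a line but no nesting level.
        self.lines.append("  " * self._depth + tag)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._depth > 0:
            self._depth -= 1


def skeleton(markup: str) -> list:
    parser = Skeleton()
    parser.feed(markup)
    parser.close()
    return parser.lines


def structural_diff(expected: str, actual: str) -> list:
    return list(difflib.unified_diff(skeleton(expected), skeleton(actual),
                                     "creative", "generated", lineterm=""))


creative = ('<div id="main"><div id="nav"><p>Lorem ipsum</p></div>'
            '<div id="body"></div></div>')
generated = ('<div id="ctl00_main"><div id="ctl00_nav"><p>Real text</p>'
             '<div id="ctl00_body"></div></div></div>')
diff = structural_diff(creative, generated)
print("\n".join(diff))
```

Because ids and text never reach the outline, two pages that differ only in those come back with an empty diff, while a misplaced closing </div> shows up as a change in indentation.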
A copy of my own answer from here.
What about DaisyDiff (Java and PHP versions available)?
The following features are really nice:
Works with badly formed HTML that can be found "in the wild".
The diffing is more specialized in HTML than XML tree differs. Changing part of a text node will not cause the entire node to be changed.
In addition to the default visual diff, HTML source can be diffed coherently.
Provides easy to understand descriptions of the changes.
The default GUI allows easy browsing of the modifications through keyboard shortcuts and links.
WinMerge is a good visual diff program.

Resources