Commenting div closing automation - xhtml

I'm looking for a solution to a rat's nest of code I was handed. It's massive in volume, so I'm looking for suggestions for a programmatic approach to commenting which div closes where.
Example:
BEFORE
<div id="wrapper-item">
<div id="outer-item">
<div class="inner-item">
<h1>Just Some Placekeeper Copy</h1>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing.</p>
</div>
</div>
</div>
AFTER
<div id="wrapper-item">
<div id="outer-item">
<div class="inner-item">
<h1>Just Some Placekeeper Copy</h1>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing.</p>
</div><!-- .inner-item -->
</div><!-- #outer-item -->
</div><!-- #wrapper-item -->
I tried a few regex attempts with no joy; I'd be curious what the best approach is.

If it is valid XHTML, then you could just load it into an XML document, find all of the div elements, and add after each one a sibling comment node containing the label you want.

A solitary regular expression might work, but it might be better to write a program that is enhanced by regular expressions. Get the text between <div id=" and ">, then add it to a list or stack, linked in order with an int index. Continue scanning, and upon finding a '</div>' tag, pop the newest text off the stack and format it to become a comment.
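The stack idea above can be sketched roughly as follows. This is only a minimal illustration, not a robust HTML parser: the function names (label_for, comment_div_closings) are made up for this example, it assumes double-quoted attributes, and it will be confused by div tags inside comments, scripts, or CDATA sections. Running it twice on the same file would also add the comments twice.

```python
import re

# Matches an opening <div ...> (capturing its attribute text) or a closing </div>.
DIV_TAG = re.compile(r'<div\b([^>]*)>|</div\s*>', re.IGNORECASE)
# The lookbehind keeps attributes such as data-id from being mistaken for id.
ID_ATTR = re.compile(r'(?<![\w-])id\s*=\s*"([^"]*)"', re.IGNORECASE)
CLASS_ATTR = re.compile(r'(?<![\w-])class\s*=\s*"([^"]*)"', re.IGNORECASE)


def label_for(attrs):
    """Return '#id', or '.first-class', or '' for an anonymous div."""
    match = ID_ATTR.search(attrs)
    if match and match.group(1):
        return '#' + match.group(1)
    match = CLASS_ATTR.search(attrs)
    if match and match.group(1).split():
        return '.' + match.group(1).split()[0]
    return ''


def comment_div_closings(html):
    """Append <!-- label --> after each </div>, pairing tags with a stack."""
    stack = []

    def handle(match):
        tag = match.group(0)
        if tag.startswith('</'):
            # Closing tag: pop the label of the most recently opened div.
            label = stack.pop() if stack else ''
            return f'{tag}<!-- {label} -->' if label else tag
        attrs = match.group(1)
        if attrs.rstrip().endswith('/'):
            # Self-closing <div /> has no separate closing tag to label.
            return tag
        stack.append(label_for(attrs))
        return tag

    return DIV_TAG.sub(handle, html)
```

Called on the BEFORE snippet from the question, this should produce the AFTER snippet; divs with neither an id nor a class are left uncommented.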

I just use a script to properly indent the HTML so I can check at a glance whether all tags are closed properly -- if they are, the indentation comes back to touch the left margin. To check which opening tag matches which closing tag, I use a folding text editor like SciTE or Komodo Edit so I can browse the properly indented code by opening and collapsing sections.
If anybody is interested in the indenting script (written in Tcl) I can upload it somewhere. Alternatively, you can try using something like HTMLTidy to do the formatting.

You can use an XML/DOM API for a particular language and play with how it handles Comment objects.
For example, Python -- http://docs.python.org/library/xml.dom.html#comment-objects
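As a rough sketch of that DOM route using Python's standard xml.dom.minidom: the function name comment_div_closings_dom and the '#id' / '.class' label format are assumptions made for this example, and the input must be well-formed XML (real-world XHTML with undeclared entities such as &nbsp; will fail to parse).

```python
from xml.dom import minidom


def comment_div_closings_dom(xhtml):
    """Insert a comment node right after every labelled <div> element."""
    doc = minidom.parseString(xhtml)
    # minidom builds this list up front, so inserting nodes while looping is safe.
    for div in doc.getElementsByTagName('div'):
        class_names = div.getAttribute('class').split()
        if div.getAttribute('id'):
            label = '#' + div.getAttribute('id')
        elif class_names:
            label = '.' + class_names[0]
        else:
            continue  # nothing useful to say about an anonymous div
        comment = doc.createComment(' ' + label + ' ')
        # Passing None as the reference node appends at the end of the parent.
        div.parentNode.insertBefore(comment, div.nextSibling)
    return doc.toxml()
```

Note that serializing through a DOM may normalize the markup (for example, it adds an XML declaration), so the output is not guaranteed to be byte-for-byte identical to the input apart from the comments.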

<div>whatever</div>
<!-- this comment is already deprecated -->
Just use a decent editor that highlights the start/end tag. Even Notepad2 does this. Most editors can also select a tag and its contents, or just the contents. Many editors will also reformat the HTML for you (if you dare).
Commenting the end of the tag is going to get out of sync and be even harder to follow for you or your coworkers.
If you do end up successfully adding redundant comments to every div, span, p, ul, etc tag, I think you'll find the code is even more bloated and unreadable.

Related

Message 'pagebreakavoidchecked="true";' showed on new page

Until now, we have been generating PDFs correctly with mPDF, combining PHP + CSS #page, including "page-break-before" and "page-break-after". We write each block using the "writeHTML" method. However, after some changes, we noticed that mPDF displays the message pagebreakavoidchecked="true"; at the top of the page that follows the page break after the summary. This started just when we began adding headers and footers.
We have no idea why that message appears on that page, and only on that page.
Please, any idea? Do you need any other info?
Thank you
Update: I discovered that mPDF has a problem with "page-break-inside: avoid;". I'm using it this way:
HTML:
<article class="bloque_anuncio">
<header class="cabecera_anuncio">
<p class="nivel1">Level 1</p>
<p class="nivel2">Level 2</p>
<p class="nivel3">Level 3</p>
<p class="nivel4">Level 4</p>
</header>
<div class="contenido_anuncio">
a lot of text (at least a complete page, but usually several pages)
</div>
</article>
There are several articles, but I want to keep each header together with its content, so I use this in my stylesheet:
.cabecera_anuncio {page-break-inside: avoid; }
And it works as it should. However, mPDF inserts the mentioned message at the beginning of the first page (and only there).
If I remove the style, the message disappears, but I need to avoid page breaks at that point!
I noticed that the problem is related only to the <p> tag. Try substituting it with a <div> tag.

How to avoid broken thematic sections (eg. div) in HTML?

I am trying to transfer a text from a printed book into HTML5, while keeping its thematic and page/paragraph/line layout structure exactly as it is. For example, every page of the printed book is wrapped in a <div> section, e.g. <div class=page id=55>, so that it represents exactly the page unit of the printed book and also facilitates referencing. I don't care much how the text will be rendered in the browser; that is something I can think about later. I just want the HTML and the browser to "know" the original pagination and layout of the printed book.
The problem is that in the printed book, some paragraphs or even boxes, tables etc span over to the next page. If I translate it to HTML, I do it like this:
<div class=page id=1>
<p>Once upon a time...</p>
...
<p>...and so the bold knight
</div>
<div class=page id=2>
slew the evil dragon.</p>
<p>Text...</p>
...
This is illegal in HTML, as we have a <p> tag being interrupted by a </div> tag, and then a new div element beginning with plain text, which is closed by a </p> tag.
HTML would expect me to close the first part of the broken paragraph with a </p>, and continue with a new <p> tag after the div, but I am not doing this because it doesn't correspond to the pagination of the original book, and would result in half-paragraphs being understood as two proper paragraphs.
So, how can I use legal HTML while maintaining the theoretical page/paragraph/broken paragraph/page break structure and information, or at least making the browser "know" the original pagination? Is there a more appropriate tag or method to emulate the page break while keeping the page number id?
Perhaps something like
<p>...and so the brave knight<some tag(s) that show page 2 begins here>killed the dragon</p>
How about, instead of encapsulating each page within a div, including a tag at the start of each page that designates the page number? An aside tag seems appropriate for this.
<aside class="page-number" data-page="1">Page 1</aside>
<p>Once upon a time...</p>
<p>...and so the bold knight</p>
<aside class="page-number" data-page="2">Page 2</aside>
<p class="continued">slew the evil dragon.</p>
<p>Text...</p>
If you need to continue a paragraph then you'll have to break it into multiple elements, but perhaps you can specify when a paragraph is a continuation of a previous one, for instance by using the continued class as shown above.
If you really don't want to break the p tag then you could put a span within it that is only used for semantic reasons. Something like this:
<p>...and so the bold knight
<span class="page-marker" aria-hidden="true" data-page="1"></span>
slew the evil dragon.</p>
But this kind of makes less semantic sense than the previous solution.
Try adding display: inline; to either the CSS style of the class page or the style attribute of each page div.

Text-align:justify only works on indented (complicated) HTML code, not on HTML code without whitespaces

I'm maintaining an online newspaper editor, and I've stumbled on a weird issue where text refuses to be justified with text-align:justify. After a few hours of debugging, I noticed it might have something to do with the indentation of the output HTML (which sounds really weird to me).
Obviously the raw HTML output of my editor page isn't indented, but a text field has a basic structure like this:
<div>
<p>
<span>
<span>
<span>
Hello
</span>
</span>
</span>
<span>
<span>
<span>
World.
</span>
</span>
</span>
</p>
</div>
Every word is wrapped in 3 spans (rendered by JS/jQuery, for styles, fonts & uniformity between browsers), and I put the text-align:justify; on the <p> element.
Here's some sample code:
https://jsfiddle.net/tdje0a9L/
As you can see, the text isn't justified.
But now, when I indent the exact same HTML code, it becomes justified: https://jsfiddle.net/3v7vk24d/
I can't really do much about the multiple span wrapping; that's just how the editor works.
Now my question is:
is there any way to render the output HTML indented?
(to get my text justified)
or
is there another way to get my text justified?
It is because you have &nbsp; (non-breaking space) entities in your source code - that means that your code really doesn't have spaces between words.
So for text-align: justify it seems to be one word.
Your example will treat Donec&nbsp;quam&nbsp;felis as one word.
You can look at this question for more information on how to remove the unwanted entities: How to remove &nbsp; from the end of spans with a given class?
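If the generated HTML can be post-processed before it is displayed, swapping the non-breaking spaces for ordinary ones gives the justification algorithm real word boundaries to work with. A minimal sketch, assuming that a post-processing step is available; the function name replace_nbsp is made up for this example, and the same substitution could just as well be done in the editor's own JavaScript.

```python
import re

# Named entity, decimal and hex character references, and the raw U+00A0 character.
NBSP = re.compile(r'&nbsp;|&#160;|&#[xX][aA]0;|\u00a0')


def replace_nbsp(html):
    """Replace every non-breaking space with a regular space."""
    return NBSP.sub(' ', html)
```

This replaces every non-breaking space in the string, including ones that were inserted deliberately, so it may need narrowing to the spots between the word spans.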

How to display different words in a sentence in different colours: what's the 'correct' way to mark it up (for HTML email)?

If I put in a new tag eg:
<p>lorem ipsem blah blah blah <p class="special-colour">special phrase</p><p> lorem ipsem blah blah blah</p>
I get linefeeds which is not what I want.
Also do I need to explicitly return to the style that I was using or will it be assumed unless overridden by the class="foo-bar" attribute?
I realise that question probably goes to specificity which I'm yet to get on top of, since I don't really know what hierarchies naturally exist in HTML/CSS documents, I'm just wading into it all ATM.
Use a <span>:
<p>first part <span class="special-colour">special phrase</span> next part</p>
It's an "inline" element so it will not cause a line feed.
And yes, the next part text will automatically "revert" to the style applied to the p element.
Your css for the span would be something like:
span.special-colour {
color: #ff7766;
}
Also, I can tell you are a UK-English speaker - be very careful with your use of color vs. colour !
Instead of
<p class="special-colour">special phrase</p>
use
<span class="special-colour">special phrase</span>
The span tag is rendered in-line with no line breaks.
Since you tagged this question as html-email: the only way to get it working would be to create a table with, say, 3 columns, put the respective content in the three columns, and then style each td of the table with inline CSS. There is no other way to achieve this.
External CSS never works for HTML emails.
You can use <span> (possibly with a class, or you can define a general rule for all spans contained in a paragraph). You could also use <strong> or <em> if you want to give the word or phrase more weight or added importance, for example if it is the main subject of the page. There are also <i> and <b>, which can be used as well.
http://html5doctor.com/i-b-em-strong-element/ has more information about when to use each

HTML formatting in Visual Studio 2010

Whenever I reformat html source code in Visual Studio with Ctrl-K, Ctrl-D it formats my source code like this:
<p>
text</p>
<p>
more text</p>
How can I make it use the following format instead?
<p>
text
</p>
<p>
more text
</p>
I know that there are settings at Options -> Text Editor -> HTML -> Formatting, but I could not find anything suitable there.
Thanks,
Adrian
Edit: I've checked the tag-specific settings, and the line break setting for p tags is "Before opening, within, and after closing". Also, the little preview shows exactly the format I want to have. But Visual Studio still does it wrong. Could this have anything to do with ReSharper being installed on my system?
The problem has nothing to do with ReSharper. This is a feature by design of the Visual Studio Source Formatter where it will attempt not to change the semantics of an element due to formatting options that you specify.
So, you specified that you want the p tags to have breaks within the content, but a break before the closing p tag would add whitespace and so change the semantics of the content within the tag; thus the formatter ends up putting the closing p tag right after the content. To have the closing tag on a separate line you will need to explicitly add a space between the end of the content and the closing tag.
Thus:
<p>content</p>
will produce:
<p>
content</p>
While (note the explicit inclusion of a space between the content and the closing p tag):
<p>content </p>
will produce:
<p>
content
</p>
This is discussed in a blog post by Scott Guthrie in the 3rd paragraph from the bottom. Start counting from the paragraph right above the additional links section.
Click Tools, Options, Text Editor, HTML, Formatting, Tag Specific Options.
Add a new Client-side tag for p (if it's not there already) and select Separate Closing tag and Before, within, and after closing.
Check this How Change Auto Formatting HTML In Visual Studio 2008 2010
