How to Print Text Directly In XHTML 5

Is there any tag that tells the browser to simply print what is inside the tag, without caring about the syntax of what is inside the tag? I'm trying to print a few unicode characters, but the browser keeps giving errors, even if I paste the character directly inside of a pre tag, without using ampersands.
I'm trying to print © inside a div tag, but putting that character inside a div results in an "improperly formatted" error (the page doesn't show up at all in Mozilla Firefox, and the sentence with the copyright symbol isn't printed in Microsoft Edge).
The page is being served as application/xhtml+xml.
Here is the code:
<footer>©</footer>
and here is the error:
XML Parsing Error: not well-formed Location: http://programcode.net/ Line Number 19, Column 13:
<footer></footer>
------------^
If I do this:
<footer><pre>© </pre></footer>
then the same error occurs:
XML Parsing Error: not well-formed Location: http://programcode.net/ Line Number 19, Column 18:
<footer><pre> </pre></footer>
-----------------^
I tried declaring utf-8 and utf-32 (in both the meta tag in the xhtml file, and .htaccess), but the error still occurred.

XHTML is awesome because it is parsed by the browser's XML parser, which is extremely strict: when you have an error, you know you have an error and that you need to fix it. I've seen a person spend three days trying to figure out why Safari wouldn't work when all the other browsers worked fine (he was missing a quote around an element's attribute value).
What you need to do is encode the character as a character reference. There are a few websites that show you the full Unicode ranges and their characters; I recommend https://unicode-table.com/en/ because it's less intimidating.
Once you're there, search for the copyright symbol.
Next, click the obvious symbol and you'll end up on the copyright symbol's page.
You're looking for the HTML-code (the proper term, when speaking with other professionals, is "numeric character reference"). Never use the named entity (&copy;): XML itself predefines only &amp;, &lt;, &gt;, &quot; and &apos;, and an XHTML5 page (<!DOCTYPE html>) has no DTD that declares &copy;, so the named entity is a well-formedness error of its own. Always use the numeric reference (&#169;) instead.
So your code should look like the following:
<footer>&#169;</footer>
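If a page has more than a couple of such characters, looking each one up gets old. Here is a minimal Python sketch (Python is only used for illustration; `to_numeric_refs` is a hypothetical helper name) that rewrites every character above 127 as a decimal numeric character reference using the standard `xmlcharrefreplace` codec error handler, so the saved file is pure ASCII and reads the same in any ASCII-compatible encoding (UTF-8, ISO-8859-1, Windows-1252):

```python
def to_numeric_refs(text: str) -> str:
    """Hypothetical helper: rewrite every non-ASCII character as &#NNN;."""
    # The ASCII codec cannot encode anything above code point 127; the
    # "xmlcharrefreplace" error handler writes each such character as a
    # decimal numeric character reference instead of failing.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_numeric_refs("<footer>© 2017</footer>"))  # prints <footer>&#169; 2017</footer>
```

Tags, attributes and existing references are already ASCII, so they pass through unchanged; only characters like © are rewritten.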
XHTML, CSS and JavaScript each spell the same code point (the copyright sign is U+00A9) with a slightly different escape syntax.
In JavaScript, replace the uppercase 'U' with a lowercase 'u', remove the '+', and put a backslash in front, so U+00A9 becomes \u00A9. Here is an example that you can run from any browser's web developer console:
alert('Look at my \u00A9 date!');
Note that you must keep the two leading zeroes for the copyright symbol: a \u escape takes exactly four hex digits, so \uA9 is a syntax error.
CSS escapes are a little simpler, since they accept one to six hex digits after the backslash:
h1::after {content: '\00A9'; display: block; float: left;}
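To see the syntaxes side by side, here is a small Python sketch (`escapes` is a hypothetical helper) that spells one character's code point the XHTML, JavaScript and CSS way. It assumes a Basic Multilingual Plane character, because the four-digit \u form cannot express anything above U+FFFF:

```python
def escapes(ch: str) -> dict[str, str]:
    """Hypothetical helper: one code point in each escape syntax."""
    cp = ord(ch)
    if cp > 0xFFFF:
        # \uXXXX only reaches U+FFFF; beyond that JavaScript needs a surrogate
        # pair or ES2015's \u{...}, which this sketch does not handle.
        raise ValueError("only Basic Multilingual Plane characters are handled")
    return {
        "unicode": f"U+{cp:04X}",      # the U+ notation used by Unicode charts
        "xhtml": f"&#{cp};",           # decimal numeric character reference
        "xhtml_hex": f"&#x{cp:X};",    # the same reference in hexadecimal
        "javascript": f"\\u{cp:04X}",  # exactly four hex digits after \u
        "css": f"\\{cp:04X}",          # CSS takes one to six hex digits
    }


for syntax, spelling in escapes("©").items():
    print(f"{syntax:>10}: {spelling}")
```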
Why is this so complex?
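There are eight bits to a byte (so one megabit a second is really only 125,000 bytes, or 125 kilobytes, a second), and one byte can only hold 256 different values, so many characters cannot be represented by a single byte. Unicode gives every character a number (its code point), and encodings such as UTF-8, UTF-16 and UTF-32 define how those numbers are stored as bytes; most websites are moving to UTF-8. Some languages (such as Chinese, to the best of my understanding) use a symbol for an entire word, so their "alphabet" is much longer. All these characters have to somehow be represented by bytes that you do not see.
That is most likely where your error comes from: if your editor saved the file as Windows-1252 (or ISO-8859-1), © is stored as the single byte 0xA9, which is not valid UTF-8 on its own (UTF-8 stores U+00A9 as the two bytes 0xC2 0xA9), so the XML parser stops with "not well-formed". Declaring utf-8 or utf-32 only changes how the bytes are read, not the bytes themselves, so the declaration has to match how the file was actually saved (and an XML parser ignores the meta tag anyway; it goes by the HTTP Content-Type header or the encoding in the <?xml ...?> declaration).
There is a big move to support UTF-8 natively everywhere (especially the web), and a literal © works in a file that really is saved as UTF-8. If you can't be sure of that, encode pretty much anything above character code 127 when using XHTML. Hopefully this will give you enough insight to get a moving and a grooving though. 😊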
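A minimal way to reproduce the "not well-formed" error outside the browser, assuming (the question doesn't say) that the editor saved the file as Windows-1252. This uses Python's ElementTree, which sits on the expat XML parser rather than Firefox's, but it rejects the same bytes for the same reason; `parses` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Hypothetical helper: True if data is well-formed XML, else print why not."""
    try:
        ET.fromstring(data)  # no <?xml ... encoding?> declaration, so UTF-8 is assumed
        return True
    except ET.ParseError as err:
        print("XML Parsing Error:", err)
        return False


markup = "<footer>\u00a9</footer>"
saved_as_cp1252 = markup.encode("cp1252")  # copyright sign stored as one byte, 0xA9
saved_as_utf8 = markup.encode("utf-8")     # copyright sign stored as 0xC2 0xA9
saved_as_ncr = b"<footer>&#169;</footer>"  # pure ASCII, same in any ASCII-compatible encoding

for label, data in [("cp1252", saved_as_cp1252), ("utf-8", saved_as_utf8), ("&#169;", saved_as_ncr)]:
    print(label, "->", "ok" if parses(data) else "rejected")
```

Only the Windows-1252 bytes are rejected; the UTF-8 file and the numeric reference both parse to the same © character.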

Related

Output XHTML entity references with scala-xml

Using scala.xml.parsing.XhtmlParser I can parse an XHTML document without either losing or having to resolve the entity references against the DTD. However, XhtmlParser appears to do this by internally resolving the entities, such that, for instance, &mdash; becomes a literal —, &ldquo; becomes a literal “, and so on.
This is clearly the right thing to do if you want to extract Unicode text from an XHTML document. However, once I've imported the XHTML and munged it in various ways, I need to output it again, and I don't trust the downstream system to handle encodings correctly. I'd like to output my results in an ASCII-safe manner, thus turning the —s back into &mdash;es and so on.
I've tried using scala.xml.Xhtml.toXhtml() on my Elem objects, but it just produces (sensibly enough) a Unicode String, with the only things encoded being &, < and > as required by XML.
I suppose I could take scala.xml.parsing.XhtmlEntities.entList, go through my output string character by character, and make the substitution myself, but this seems like a chore. (Plus I wouldn't be able to use the raw list, as I'd have to skip the legit <s, >s, and &s in the XML output.)
Is there anything in the Scala XML libraries that will do this for me, or is the manual scan/replace my best option?
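One observation may make the manual scan less of a chore: markup and the serializer's own escapes for &, < and > are plain ASCII, so a pass that only touches characters above 127 never has to skip them. Below is a sketch of that pass in Python rather than Scala (`ascii_safe` is a hypothetical name, and Python's `html.entities.codepoint2name` stands in for `XhtmlEntities.entList`), falling back to a numeric reference when no named entity exists. Named entities other than XML's five predefined ones only work if the downstream consumer knows the XHTML DTD, so numeric references are the safer choice if in doubt:

```python
from html.entities import codepoint2name


def ascii_safe(xml_text: str) -> str:
    """Hypothetical helper: replace every non-ASCII character with an entity reference."""
    out = []
    for ch in xml_text:
        cp = ord(ch)
        if cp < 128:
            # Markup and the serializer's &amp;, &lt;, &gt; are ASCII,
            # so they pass through without any special-casing.
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named HTML 4 entity, e.g. mdash
        else:
            out.append(f"&#{cp};")  # no named entity: decimal numeric reference
    return "".join(out)


print(ascii_safe("<p>Wait\u2014\u201cquoted\u201d &amp; done</p>"))
```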

Visual FoxPro 9.0 report show unicode

I am using Visual FoxPro 9 and want to print Unicode characters in a report (.frx).
There are some ways to extend the ReportListener to show Unicode. I need the code to extend the ReportListener so that it shows Unicode.
I've never had to work with Unicode within VFP either, or spent any time working with Reports, but the Help for the Render method of the ReportListener does mention Unicode:
cContentsToBeRendered
Indicates the text to be rendered for Expression (Field) and Label layout elements.
For Picture layout elements sourced from a file, cContentsToBeRendered contains the filename.
When specifying a filename for an image, ReportListener provides cContentsToBeRendered
as a DBCS string, which is the standard format for strings in Visual FoxPro.
However, when indicating text to be rendered, ReportListener provides
cContentsToBeRendered as a Unicode string, appropriately translated to the correct
locale using any regional script information associated with this layout control in
its report definition file (frx) record.
If your derived class sends the text value through some additional processing, such as
storage in a table, you can use the STRCONV() function, and its optional regional
script parameter, to convert the string to DBCS first. For more information, see
STRCONV( ) Function.
I could be incorrect, but I believe VFP does NOT support Unicode natively; its standard strings are code-page (DBCS) strings, as the Help text above says. But then again, I've never needed to use Unicode either, and I have used FoxPro since the beginning of its lifetime.
I would imagine Rick Strahl's article Using Unicode in Visual FoxPro Web and Desktop Applications would be fairly definitive on the topic.

XSS: Break out of not-complete encoding

I'm pentesting an ASP.NET application running on a Microsoft-IIS/7.5 web server, and I'm sending it the following GET request parameters:
&search=aaa%20%*+,-/;<=>^|"'bbb
One of the parameters is search, where I've entered the value shown above. The value is reflected in the response twice, as follows:
The first occurrence:
<input name="nn" type="text" value="aaa %* ,-/;&lt;=>^|&quot;&#39;bbb" class="cc" />
The characters escaped in the first occurrence are:
" ==> &quot;
' ==> &#39;
< ==> &lt;
I guess there's no way to break out of there, since the value is escaped and we can't inject a literal " character. Nevertheless, not all characters are escaped, even though it's not possible to break out.
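For comparison, a Python sketch of what full escaping of the reflected value would look like. `observed_encode` is a hypothetical reproduction of only the three substitutions listed above (assumed here to be &quot;, &#39; and &lt;), while the standard `html.escape` escapes &, <, >, " and ' by default:

```python
import html

# The search value as reflected by the server (URL decoding already applied).
reflected = "aaa %* ,-/;<=>^|\"'bbb"


def observed_encode(s: str) -> str:
    """Hypothetical reproduction of only the three substitutions listed above."""
    return s.replace('"', "&quot;").replace("'", "&#39;").replace("<", "&lt;")


print(observed_encode(reflected))  # > is left unescaped
print(html.escape(reflected))      # quote=True by default: & < > " ' all escaped
```

The only gap is the > character, which is harmless inside a double-quoted attribute value as long as " itself can't be injected, which agrees with the conclusion that the value attribute can't be escaped.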
The second occurrence:
<strong>aaa %* ,-/;<=>^|"'bbb</strong>
We can see that all of the characters are reflected as they are, but there's a catch: after the < character there can't be any [a-zA-Z0-9] characters (and maybe some others as well), because we're probably getting blocked by the ASP.NET filters.
If we input the following:
&searchQuery=aaa<#script>alert('Hi');<#/script>bbb
We get the following output:
<strong>aaa<#script>alert('Hi');<#/script>bbb</strong>
I'm asking if you see any way to break out of the restrictions and execute arbitrary JavaScript code nevertheless?
Thank you
HTML requires the tag name to immediately follow the start tag open delimiter <:
Start tags must have the following format:
The first character of a start tag must be a U+003C LESS-THAN SIGN character (<).
The next few characters of a start tag must be the element's tag name.
[…]
Anything beyond that is up to a browser’s interpretation quirks.
But there are also other kinds of tags besides element tags, like markup declaration tags (<!…>), processing instruction tags (<?…>) and alternative comment tags (<%…%>), that are recognized by certain browsers and allow certain hacks.
Have a look at the common XSS cheat sheets like OWASP’s XSS Filter Evasion Cheat Sheet and the HTML5 Security Cheatsheet, or some HTML fuzzers like Shazzer.
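The start-tag rule quoted above can be checked with a parser that implements it. The sketch below uses Python's `html.parser`, which is not a browser and says nothing about browser-specific quirks (`TagCollector` and `tags_in` are hypothetical names), to list which tags are actually recognized in the reflected output:

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records the start and end tags the parser recognizes."""

    def __init__(self) -> None:
        super().__init__()
        self.starts: list[str] = []
        self.ends: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.starts.append(tag)

    def handle_endtag(self, tag):
        self.ends.append(tag)


def tags_in(markup: str) -> tuple[list[str], list[str]]:
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.starts, parser.ends


# "<#" is not "<" followed by a letter, so it is emitted as text, not as a tag.
print(tags_in("<strong>aaa<#script>alert('Hi');<#/script>bbb</strong>"))
print(tags_in("<strong>aaa<script>alert('Hi');</script>bbb</strong>"))
```

With the # in place only strong is recognized, which is why <#script> shows up literally in the page instead of running.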

What's the correct format for TCDL linkAttributes?

I can see the technology-independent Tridion Content Delivery Language (TCDL) link has the following parameters, which are pretty well described on SDL Live Content.
type
origin
destination
templateURI
linkAttributes
textOnFail
addAnchor
variantId
How do we add multiple attribute-value pairs for the linkAttributes? Specifically, what do we use to escape the double quotes as well as separate pairs (e.g. if we need class="someclass" and onclick="someevent").
The pairs are just space-delimited, like a normal series of attributes. Try XML-encoding the value of linkAttributes, however: " becomes &quot;, and so on.
If the value contains JavaScript, you may need to take care of the JavaScript quotes too, as in \".
Edit: after I figured out your real question, the answer is a lot simpler:
You should wrap the values inside your linkAttributes in single quotes. Spaces inside linkAttributes are typically handled fine, but if not, escape them with %20.
If you need something more, or want something that isn't handled by the standard tcdl:ComponentLink, remember that you can always create your own TCDL tag and use a TagHandler or TagRenderer (look them up in the docs for examples, or search for Jaime's article on TagRenderer) to do precisely what you want.
My original answer was to a question you didn't ask: what is the format for TCDL tags (in general). But the explanation might still be useful to some, so remains below.
I'd suggest looking at the format the default building blocks output (e.g. the Link Resolver TBB in the Default Finish Actions) and using that as a guideline.
This is what I could quickly get from the transport package of a published page:
<tcdl:Link type="Page" origin="tcm:5-199-64" destination="tcm:5-206-64"
templateURI="tcm:0-0-0" linkAttributes="" textOnFail="true"
addAnchor="" variantId="">Home</tcdl:Link>
<tcdl:ComponentPresentation type="Embedded" componentURI="tcm:5-69"
templateURI="tcm:5-133-32">
<span>
...
One of the things that I know from experience: your entire TCDL tag will have to be on a single line (I wrapped the lines above for readability only). Or at least that is the case if it is used to invoke a REL TagRenderer. Clearly the tcdl:ComponentPresentation tag above will span multiple lines, so that "single line rule" doesn't apply everywhere.
And that is probably the best advice: given the fact that TCDL tags are processed at multiple points in Tridion Publishing, Deployment and Delivery pipeline, I'd stick to the format that the default TBBs output. And from my sample that seems to be: put everything on a single line and wrap the values in (double) quotes.
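Putting both answers together, here is a Python sketch of generating such a tag. The helpers `link_attributes` and `tcdl_link` are hypothetical, and the fixed type, templateURI and textOnFail values are simply copied from the published sample above. It wraps each value in single quotes, space-delimits the pairs, XML-encodes the result for the double-quoted linkAttributes attribute, and keeps the whole tag on one line:

```python
from xml.sax.saxutils import escape


def link_attributes(attrs: dict[str, str]) -> str:
    """Hypothetical helper: build a linkAttributes value from name/value pairs."""
    parts = []
    for name, value in attrs.items():
        if "'" in value:
            # A single-quoted value can't contain a single quote; this sketch
            # doesn't attempt to escape one.
            raise ValueError(f"single quote in value of {name!r}")
        parts.append(f"{name}='{value}'")  # each value wrapped in single quotes
    # Pairs are space-delimited; XML-encode the result so it can sit inside
    # the double-quoted linkAttributes="..." attribute.
    return escape(" ".join(parts), {'"': "&quot;"})


def tcdl_link(origin: str, destination: str, text: str, attrs: dict[str, str]) -> str:
    """Hypothetical helper: emit a tcdl:Link on a single line, like the sample."""
    return (
        f'<tcdl:Link type="Page" origin="{origin}" destination="{destination}" '
        f'templateURI="tcm:0-0-0" linkAttributes="{link_attributes(attrs)}" '
        f'textOnFail="true" addAnchor="" variantId="">{escape(text)}</tcdl:Link>'
    )


print(tcdl_link("tcm:5-199-64", "tcm:5-206-64", "Home",
                {"class": "someclass", "onclick": "someevent()"}))
```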

MathML ApplyFunction Entity - where does it come from?

I'm using System.Xml.Linq to parse MathML 2.0 via its associated DTD. Everything is fine except that Maple produces the &ApplyFunction; element, which does not appear to be defined in the DTD. Where is this element defined? I tried googling, but to no avail.
&ApplyFunction; is an entity that's treated as an operator (i.e., an mo element) in MathML. It's a valid Unicode character, code point U+2061 (FUNCTION APPLICATION): http://www.fileformat.info/info/unicode/char/2061/index.htm
ApplyFunction is normally used in order to prevent ambiguity, rather than as a required operator. For example, this code block
<mi>sin</mi><mo>(</mo><mi>x</mi><mo>)</mo>
is just as valid as this code block
<mi>sin</mi><mo>&ApplyFunction;</mo><mo>(</mo><mi>x</mi><mo>)</mo>
and really there's no ambiguity in either case, but for some functions there may be.
&ApplyFunction; should appear as an entity declaration in any MathML DTD.
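To see what a DTD-aware parser does with the entity, here is a Python sketch using ElementTree instead of System.Xml.Linq. Rather than loading the real MathML DTD, it declares only ApplyFunction in an internal DTD subset (an assumption made to keep the example self-contained); `parse` is a hypothetical helper:

```python
import unicodedata
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
BODY = (f'<math xmlns="{MATHML_NS}"><mi>sin</mi><mo>&ApplyFunction;</mo>'
        "<mo>(</mo><mi>x</mi><mo>)</mo></math>")
# Assumed stand-in for the MathML DTD: an internal subset declaring one entity.
SUBSET = '<!DOCTYPE math [<!ENTITY ApplyFunction "&#x2061;">]>'


def parse(with_declaration: bool) -> ET.Element:
    """Hypothetical helper: parse BODY, optionally prefixed by the internal subset."""
    return ET.fromstring((SUBSET if with_declaration else "") + BODY)


root = parse(True)
op = root[1].text  # text of the first <mo>
print(hex(ord(op)), unicodedata.name(op))

try:
    parse(False)
except ET.ParseError as err:
    print("without the declaration:", err)  # the entity is undefined
```

With the declaration, the mo element simply contains the invisible U+2061 character; without it, the parser reports an undefined entity, which is the symptom described in the question.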

Resources