I'm pentesting the ASP.NET application running on Microsoft-IIS/7.5 web server and I'm sending it the following GET request parameters:
&search=aaa%20%*+,-/;<=>^|"'bbb
One of the parameters is search, where I've entered the value that can be seen above. The value is printed in the returned response two times, as follows:
The first parameter:
<input name="nn" type="text" value="aaa %* ,-/;&lt;=>^|&quot;&#39;bbb" class="cc" />
Quoted parameters in the first entry are as follows:
" ==> "
' ==> '
< ==> <
I guess there's no way to break out of there, since the value is escaped and we can't inject the " character, right? Nevertheless, not all characters are properly escaped, even though it's not possible to break out.
The second parameter:
<strong>aaa %* ,-/;<=>^|"'bbb</strong>
We can see that all of the characters are presented as they are, but there's a catch: after the < character there can't be any [a-zA-Z0-9] characters (and maybe some others as well), because we're probably getting blocked by the ASP.NET filters.
If we input the following:
&searchQuery=aaa<#script>alert('Hi');<#/script>bbb
We get the following output:
<strong>aaa<#script>alert('Hi');<#/script>bbb</strong>
Do you see any way to break out of these restrictions and execute arbitrary JavaScript code nevertheless?
Thank you
HTML requires the tag name to immediately follow the start tag open delimiter <:
Start tags must have the following format:
The first character of a start tag must be a U+003C LESS-THAN SIGN character (<).
The next few characters of a start tag must be the element's tag name.
[…]
Anything beyond that is up to a browser’s interpretation quirks.
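To see that rule in action, here is a quick sketch you can paste into a browser console (using DOMParser purely to illustrate the parsing; the markup mirrors your second injection point):

// "<" followed by a non-letter is not a tag open delimiter, so the parser
// keeps it as plain character data and never creates a script element.
var html = '<strong>aaa<#script>alert(1)<#/script>bbb</strong>';
var doc = new DOMParser().parseFromString(html, 'text/html');
console.log(doc.querySelector('strong').textContent); // "aaa<#script>alert(1)<#/script>bbb"
console.log(doc.scripts.length);                      // 0, no script element was created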
But there are also tags other than element tags, such as markup declaration tags (<!…>), processing instruction tags (<?…>) and alternative comment tags (<%…%>), which are recognized by certain browsers and allow certain hacks.
Have a look at the common XSS cheat sheets like OWASP’s XSS Filter Evasion Cheat Sheet and the HTML5 Security Cheatsheet, or some HTML fuzzers like Shazzer.
According to the CSS Syntax Level 3 specification, for parsing the start of an identifier, you:
Check if three code points would start an identifier
Look at the first code point:
If the first code point is -, then we have a valid identifier start if any of the following holds:
The second code point is an identifier-start code point ([a-zA-Z_] or non-ASCII).
The second code point is -.
The second and third code points form a valid escape.
Otherwise, we do not have a valid identifier start. After determining that we have a valid identifier start, the only requirement for a valid <ident-token> is that it is followed by 0 or more of any combination of the following:
Escape tokens
ASCII letters
Digits
_ or -
Non-ASCII characters
Since we do not require any characters following an identifier start token, this would suggest that -- is a valid identifier, even if never supported by any browser or framework. However, even official CSS validation services (maintained by those that design the CSS specifications) do not consider this a valid identifier. Is this merely a bug in the validation service?
Yes, it's valid and it works. It's the shortest custom property (aka CSS variable) that you can define:
body {
  --: red;
  background: var(--);
}
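If you want to check it yourself, here is a quick console sketch (assuming the rule above is applied to the page, in a browser that accepts the bare -- name):

var style = getComputedStyle(document.body);
console.log(style.getPropertyValue('--')); // "red" (possibly with a leading space, depending on the browser)
console.log(style.backgroundColor);        // "rgb(255, 0, 0)"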
Related: Can a css variable name start with a number?
The -- custom property identifier is reserved for future use, but current browsers incorrectly treat it as a valid custom property.
See also
w3c/csswg-drafts#6313
Is there any tag that tells the browser to simply print what is inside the tag, without caring about the syntax of what is inside it? I'm trying to print a few Unicode characters, but the browser keeps giving errors, even if I paste the character directly inside a pre tag, without using ampersands.
I'm trying to print © inside a div tag, but putting that character there results in an "improperly formatted" error (the page doesn't even show up in Mozilla Firefox, and the sentence with the copyright symbol isn't printed in Microsoft Edge).
The page is being served as application/xhtml+xml.
Here is the code:
<footer>©</footer>
and here is the error:
XML Parsing Error: not well-formed Location: http://programcode.net/ Line Number 19, Column 13:
<footer></footer>
------------^
If I do this:
<footer><pre>© </pre></footer>
then the same error occurs:
XML Parsing Error: not well-formed Location: http://programcode.net/ Line Number 19, Column 18:
<footer><pre> </pre></footer>
-----------------^
I tried declaring utf-8 and utf-32 (in both the meta tag in the xhtml file, and .htaccess), but the error still occurred.
XHTML is awesome because it uses the XML parser, which is extremely strict. When you have an error you know you have an error and that you need to fix it. I've seen a person spend three days trying to figure out why Safari wouldn't work while all the other browsers worked fine (he was missing a quote around an element's attribute).
What you need to do is encode the character as an HTML entity. There are a few websites that show you the full Unicode ranges and their characters. I recommend using https://unicode-table.com/en/ because it's less intimidating.
Now once you're there, you'll want to search for the copyright symbol.
Next you'll click the obvious symbol and you'll end up on the copyright page.
You're looking for the HTML-code (the proper terminology when speaking with other professionals is "numeric HTML entity"). Never use the loose named entity (&copy;); you want to always use the numeric HTML entity (&#169;).
So your code should look like the following:
&#169;
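If you'd rather derive the numeric entity than look it up, here is a tiny console sketch (the toNumericEntity helper is my own, not a built-in):

// Build "&#NNN;" from a character's Unicode code point.
function toNumericEntity(ch) {
  return '&#' + ch.codePointAt(0) + ';';
}
console.log(toNumericEntity('©')); // "&#169;"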
XHTML, CSS and JavaScript each handle these character escapes a bit differently.
For JavaScript you'll need to take the Unicode code point (U+00A9), replace the uppercase 'U' with a lowercase 'u', and remove the '+'. Here is an example that you can run from any browser's web developer console:
alert('Look at my \u00A9 date!');
Note that you must have the double zeroes for the copyright symbol (removing them will break the code).
For CSS Entities it's a little simpler:
h1::after {content: '\00A9'; display: block; float: left;}
Why is this so complex?
There are eight bits to a byte (one megabit per second is really only 125,000 bytes, or 125 kilobytes, per second). Some characters cannot be represented by a single byte. There are multiple Unicode encodings (Unicode is a universal character set), but most websites are moving to UTF-8. Some languages (such as Chinese, to the best of my understanding) use a symbol for an entire word, so their "alphabet" is much longer. All of these characters have to somehow be represented by code that you do not see. There is a big move to support UTF-8 natively everywhere (especially on the web). Pretty much anything above character code 127 should be encoded when using XHTML. It may or may not work natively, and that is a more advanced topic for a different question. Hopefully this will give you enough insight to get moving and grooving though. 😊
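To make the byte-count point concrete, you can check it in any modern browser console (TextEncoder returns the UTF-8 bytes of a string):

var utf8Length = function (s) { return new TextEncoder().encode(s).length; };
console.log(utf8Length('A'));  // 1 byte  (code point below 128)
console.log(utf8Length('©'));  // 2 bytes (U+00A9)
console.log(utf8Length('中')); // 3 bytes (a typical CJK character)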
I can see the technology-independent Tridion Content Delivery Language (TCDL) link has the following parameters, which are pretty well described on SDL Live Content.
type
origin
destination
templateURI
linkAttributes
textOnFail
addAnchor
VariantId
How do we add multiple attribute-value pairs for the linkAttributes? Specifically, what do we use to escape the double quotes as well as separate pairs (e.g. if we need class="someclass" and onclick="someevent").
The separate pairs are just space delimited, like a normal series of attributes. Try XML encoding the value of linkAttributes, however. So, " becomes &quot;, etc...
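Just to illustrate what that encoding step produces for the attributes from your question (a plain JavaScript sketch, not Tridion API code; the xmlEncode helper is my own):

function xmlEncode(s) {
  return s.replace(/&/g, '&amp;')
          .replace(/</g, '&lt;')
          .replace(/>/g, '&gt;')
          .replace(/"/g, '&quot;');
}
console.log(xmlEncode('class="someclass" onclick="someevent"'));
// class=&quot;someclass&quot; onclick=&quot;someevent&quot;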
If you are using some Javascript, you might take care of the Javascript quotes too, as in \".
Edit: after I figured out your real question, the answer is a lot simpler:
You should wrap the values inside your linkAttributes in single quotes. Spaces inside linkAttributes are typically handled fine; but if not, escape them with %20.
If you need something more or want something that isn't handled by the standard tcdl:ComponentLink, remember that you can always create your own TCDL tag and use a TagHandler or TagRenderer (look them up in the docs for examples or search for Jaime's article on TagRenderer) to do precisely what you want.
My original answer was to a question you didn't ask: what is the format for TCDL tags (in general). But the explanation might still be useful to some, so it remains below.
I'd suggest having a look at what format the default building blocks (e.g. the Link Resolver TBB in the Default Finish Actions) output and using that as a guideline.
This is what I could quickly get from the transport package of a published page:
<tcdl:Link type="Page" origin="tcm:5-199-64" destination="tcm:5-206-64"
templateURI="tcm:0-0-0" linkAttributes="" textOnFail="true"
addAnchor="" variantId="">Home</tcdl:Link>
<tcdl:ComponentPresentation type="Embedded" componentURI="tcm:5-69"
templateURI="tcm:5-133-32">
<span>
...
One of the things that I know from experience: your entire TCDL tag will have to be on a single line (I wrapped the lines above for readability only). Or at least that is the case if it is used to invoke a REL TagRenderer. Clearly the tcdl:ComponentPresentation tag above will span multiple lines, so that "single line rule" doesn't apply everywhere.
And that is probably the best advice: given the fact that TCDL tags are processed at multiple points in Tridion Publishing, Deployment and Delivery pipeline, I'd stick to the format that the default TBBs output. And from my sample that seems to be: put everything on a single line and wrap the values in (double) quotes.
I need to allow the user to submit queries as follows:
/search/"my search string"
but it's failing because of request validation, as outlined in the following 2 questions:
How to include quote characters as a route parameter? Getting "Illegal characters in path" message
How to modify request validation?
I'm currently trying to figure out how to disable request validation for the quote character, but I'd like to know the risks before I put the site live with it disabled. I will not disable request validation unless I can disable it for the quote character only; I still intend to disallow every other character that's currently not allowed.
According to the URI generic syntax specification (RFC 2396), the double-quote character is explicitly excluded and must be escaped (i.e. %22). See section 2.4.3. The reason given in the spec:
The angle-bracket "<" and ">" and double-quote (") characters are excluded because they are often used as the delimiters around URI in text documents and protocol fields.
You can see easily why this is the case -- imagine trying to create a link in HTML to your URL:
<a href="http://somesite/search/"my search string""/>
That would fail HTML parsing (and also breaks SO's syntax highlighting). You also would have trouble doing basic things with the URL like emailing it to someone (the email client wouldn't parse the URL correctly), posting it on a message board, sending it in an instant message, etc.
For what it's worth, spaces are also explicitly excluded (same section of the RFC explains why).
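If you do end up keeping request validation on, one alternative is to percent-encode the quotes on the client before building the URL. A minimal client-side sketch:

var query = '"my search string"';
console.log('/search/' + encodeURIComponent(query));
// /search/%22my%20search%20string%22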
In some javascript, I have:
var url = "find.aspx?" + "location=" + encodeURIComponent( address );
alert( url );
location.href = url;
where the value of address is the string "Seattle, WA".
In the alert I see
find.aspx?Seattle%2C%20WA
as I expect.
But on the server side, when I look at Request.Url, the relevant substring I see is
find.aspx?Seattle, WA
And in the Firefox url window I see
find.aspx?location=Seattle%2C WA
So I'm getting three different representations whereas I would expect that in all three places I should see what I see in the alert. My expectation is that the url I assign to location.href should show up as-is in the browser url window, and should be passed as-is to the server in Request.Url (and I would need to decode the values on the server before using them). What's happening?
Firefox converts certain encoded characters into their literal forms as a way to be friendly to users. It will also convert spaces typed into the address bar into %20 for the server.
Update: The reason Firefox doesn't display the comma unencoded is that commas are allowed in URLs but spaces are not, so it knows that a space is going to be unambiguously interpreted, whereas the pre-encoded comma is different from a non-encoded comma to some servers. See: Can I use commas in a URL?
ASP is probably trying to help you out by auto-un-encoding the string for you.
Update: It looks like ASP.NET unencodes Request.Url for you by default, as mentioned here: QueryString malformed after URLDecode. They also mention that you can use HttpRequest.Url.Query to access the un-decoded version.
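A rough client-side illustration of those two views (plain JavaScript equivalents for comparison, not the ASP.NET properties named above):

var raw = 'location=Seattle%2C%20WA';                // the raw, still-encoded query string
console.log(decodeURIComponent(raw.split('=')[1]));  // "Seattle, WA", the already-decoded value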
The alert is the only thing not doing any "magic" for you.
For the alert, you are doing the encoding yourself. It would probably look the same as on the server side if you removed encodeURIComponent.
On the server side, ASP.NET will always show you the unencoded form. This is to make it easier to directly map to files that also have text that needed to be (un)encoded.
Note that you can replace every letter with its UTF-8 representation in URL encoding and it will still be the same URL. I.e., type the following in the browser window and it will still work: %66%69%6E%64.aspx?location=Seattle%2C%20WA. To only encode the necessary chars, use UrlEncode on the server side if you create a link yourself.
URL encoding can become fairly tricky. You ask to have it explained. To know the correct escape of a certain character, you need to know how that character looks in UTF-8. The hexadecimal values of the UTF-8 bytes then become the %XX%YY value of your letter. Sometimes it's a single %XX, but it can be several bytes (a typical Chinese character, for instance, takes three).
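A small sketch of that rule: take the UTF-8 bytes of the character and write each one as %XX (TextEncoder does the UTF-8 part):

function percentEncode(ch) {
  var bytes = new TextEncoder().encode(ch);
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    out += '%' + bytes[i].toString(16).toUpperCase().padStart(2, '0');
  }
  return out;
}
console.log(percentEncode(','));  // "%2C" (one byte)
console.log(percentEncode('é'));  // "%C3%A9" (two bytes)
console.log(percentEncode('中')); // "%E4%B8%AD" (three bytes)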
URL encoding works one way only. Never double-encode or double-unencode. This is prohibited by the specification. Also, because you can encode any character, it is not always possible (as you found out) to do round-trip encoding/unencoding. If you unencode and re-encode again, it is quite possible that the resulting string is different but syntactically the same.
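Why double-encoding bites, in one console check:

var once  = encodeURIComponent('Seattle, WA');             // "Seattle%2C%20WA"
var twice = encodeURIComponent(once);                      // "Seattle%252C%2520WA" (the % signs got re-encoded)
console.log(decodeURIComponent(twice) === 'Seattle, WA');  // false, a single decode is no longer enough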
In HTML, URL encoding is sometimes interspersed with HTML encoding. I.e., the ampersand is valid in a URL, but not in HTML. find.aspx?city=A&name=B becomes find.aspx?city=A&amp;name=B in an HTML document. However, browsers are lenient and will accept wrongly HTML-encoded strings.
Finally, a note on the browser: if you type a space in a link, even inside an <a> tag, it will escape the space (or other character) for you. Likewise, it will nowadays show the odd characters (é, ï etc.) in the address bar, but when it sends the request over HTTP, the browser will correctly do the encoding for you.
Update: about answering your question of needing a "definitive" reference or proof.
While I couldn't find any on the internet, I decided to look for it myself using Reflector. Going through the methods that set, for instance, the HttpRequest.QueryString, you quickly encounter the private method HttpRequest.FillInQueryStringCollection which then calls HttpValueCollection.FillfromEncodedBytes. Somewhat near the end of that method, HttpUtility.UrlDecode is called for the values. Conclusion: do not call it yourself, to prevent double decoding.
You can see this for yourself when you download Reflector and disassemble the .NET libs of System.Web.
For your example you can change this line
var url = "find.aspx?" + "location=" + encodeURIComponent( address );
to
var url = "find.aspx?" + "location=" + address;
and see the address as it is. But if the address variable contains any '&' character, your variable will be corrupted. So you use encodeURIComponent to encode these characters in the URL.
On the server side, all these encoded strings are decoded back. It means encodeURIComponent is just for sending the address variable (whether it contains an '&' character or not) to the server side correctly.
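A quick illustration of that point (the sample address is made up):

var address = 'Johnson & Sons, Seattle';
console.log('find.aspx?location=' + address);
// find.aspx?location=Johnson & Sons, Seattle   -> "location" gets cut off at the "&"
console.log('find.aspx?location=' + encodeURIComponent(address));
// find.aspx?location=Johnson%20%26%20Sons%2C%20Seattle -> arrives intact and is decoded back for you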