Is it necessary to html encode right angle brackets? - asp.net

I'm adding some meta description data to my header like so:
HtmlMeta meta = new HtmlMeta();
meta.Name = "description";
meta.Content = description; // this is unencoded
page.Header.Controls.Add(meta);
And .NET helpfully encodes things like & and <, but not >. Now, I can't imagine that this would be an oversight, so I conclude that it's unnecessary to escape them. But before I go back to the client with that answer, it would be nice to get confirmation by Some Strangers From The Intarwebs first :)

According to the XML specification, > is indeed valid in attribute values. Only <, & and whichever quote character (" or ') delimits the value need escaping.
[10] AttValue ::= '"' ([^<&"] | Reference)* '"'
| "'" ([^<&'] | Reference)* "'"
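As a quick check of the rule above, here is a minimal Python sketch (not .NET code; the function names escape_attr and roundtrip are made up for illustration). It escapes only &, < and the delimiting double quote, leaves > alone, and confirms that a standard XML parser still reads the attribute back unchanged.

```python
import xml.etree.ElementTree as ET


def escape_attr(value: str) -> str:
    """Escape only what AttValue forbids inside a double-quoted value: & < and "."""
    # & must be replaced first so the entities added below are not re-escaped.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace('"', "&quot;"))


def roundtrip(value: str) -> str:
    """Embed value in a double-quoted attribute and parse it back."""
    doc = '<meta name="description" content="' + escape_attr(value) + '"/>'
    return ET.fromstring(doc).get("content")


if __name__ == "__main__":
    text = 'Tom & Jerry: 3 > 2, 1 < 2, "quoted"'
    print(escape_attr(text))        # the > is left untouched
    print(roundtrip(text) == text)  # the parser accepts the unescaped >
```

If the parser rejected a raw > in an attribute value, ET.fromstring would raise a ParseError here.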

Related

I need to parse a string using javacc containing single quotes as part of the string

I have defined grammar rules like
TOKEN : { < SINGLE_QUOTE : " ' " > }
TOKEN : { < STRING_LITERAL : " ' " (~["\n","\r"])* " ' "> }
But I am not able to parse sequences like 're'd'. I need the parser to parse re'd as a string literal, but with these rules it parses 're' and 'd' separately.
If you need to lex re'd as a STRING_LITERAL token, then use the following rules:
TOKEN : { < SINGLE_QUOTE : "'" > }
TOKEN : { < STRING_LITERAL : "'"? (~["\n","\r"])* "'"?> }
I didn't see a rule for matching "re" separately.
In JavaCC, your lexical specification for STRING_LITERAL requires the token to start with a single quote ("'"), but your input does not start with one.
The "?" added to STRING_LITERAL makes each single quote optional (at most one if present), so the rule matches your input and lexes it as STRING_LITERAL.
JavaCC's decision-making rules:
1.) JavaCC looks for the longest match.
In this case, even if the input starts with "'", the possible matches are SINGLE_QUOTE and STRING_LITERAL. The second input character determines which token is chosen: STRING_LITERAL.
2.) Among matches of equal length, JavaCC takes the rule declared first in the grammar.
If the input is only "'", it is lexed as SINGLE_QUOTE, even though both SINGLE_QUOTE and STRING_LITERAL match.
Hope this helps.
The following should work:
TOKEN : { < SINGLE_QUOTE : "'" > }
TOKEN : { < STRING_LITERAL : "'" (~["\n","\r"])* "'"> }
This is pretty much what you had, except that I removed some spaces.
Now if there are two or more apostrophes on a line (i.e. without an intervening newline or return), then the first and the last of those apostrophes, together with all characters between them, should be lexed as one STRING_LITERAL token. That includes all intervening apostrophes. This is assuming there are no other rules involving apostrophes. For example, if your file is 're'd' that should lex as one token; likewise 'abc' + 'def' should lex as one token.
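The longest-match and first-declared-rule behaviour described in these answers can be imitated with a small Python sketch. This is not JavaCC. It is a hypothetical tokenize function built on Python regular expressions, which happen to find the longest match for these two particular patterns. The rule names mirror the grammar above.

```python
import re

# Rules in declaration order, mirroring the two TOKEN definitions above.
RULES = [
    ("SINGLE_QUOTE", re.compile(r"'")),
    ("STRING_LITERAL", re.compile(r"'[^\n\r]*'")),
]


def tokenize(text):
    """Return (token_name, lexeme) pairs using longest match, first rule on ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the earlier-declared rule is kept.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError("no rule matches at position %d" % pos)
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


if __name__ == "__main__":
    print(tokenize("'re'd'"))         # one STRING_LITERAL
    print(tokenize("'abc' + 'def'"))  # also one STRING_LITERAL
    print(tokenize("'"))              # a lone SINGLE_QUOTE
```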

How to handle ampersands in URL parameters?

I am having the following issue:
I am using an application that allows users to concatenate text to build a URL that passes parameters to an ASP page via GET method, i.e. something like:
http://myhostname/process.asp?param1=value1&param2=value2
Problem is value1 and value2 can contain the ampersand symbol, which is not interpreted as a text character.
The most popular solution to this issue is to encode the URL, which is not an option for me because I cannot modify the program that builds the URL. I can modify the process.asp page, but not the program that concatenates the text fields and builds the URL.
Things I've tried to search for are:
How to encode a URL using javascript directly in the browser
How to change IIS default behaviour when reading an &
Alternative ways to pass parameters, i.e. something like passing them as a single string of characters separated with pipes
Hope you can give me some guidance.
You can read the entire query string and parse it yourself, like this:
q = Request.QueryString
a = Split(q, "=")
i = 1
For Each s In a
    If i mod 2 = 0 Then
        If InStr(s, "&") <> InStrRev(s, "&") Then
            Response.Write "Value: " & Left(s, InStrRev(s, "&") - 1) & "<br/>"
            hidingParam = Right(s, Len(s) - InStrRev(s, "&"))
            Response.Write "PAramName: " & hidingParam & "<br/>"
            i = i + 1
        Else
            Response.Write "Value: " & s & "<br/>"
        End If
    Else
        Response.Write "PAramName: " & s & "<br/>"
    End If
    i = i + 1
Next
Result:
URL: ...?Q=abc&def&P=123 produces
PAramName: Q Value: abc&def PAramName: P Value: 123
Note that this is less than robust; I am only illustrating my idea. I didn't test it with input that contains no &.
It also doesn't handle multiple "=" characters (if that's a possibility as well).
If there are 2 (or more) ampersands in-between the equals, then only the last one is a parameter separator. So, using your URL above, and assuming that value1 = "abc&def", and value2 = "123", then the URL will look like:
http://myhostname/process.asp?param1=abc&def&param2=123
Notice there's 2 ampersands in-between the 2 equals. The last one will be your parameter separator, the rest are part of the value. And any ampersands after the last equals are also part of the value.
You'll have to dissect the incoming URL and apply the appropriate logic.
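The rule just described (between two equals signs only the last ampersand separates parameters, and anything after the last equals sign belongs to the last value) can be sketched in Python as follows. This is only an illustration of the logic, not ASP code. The function name parse_raw_query is made up, and it assumes values never contain "=".

```python
from typing import List, Tuple


def parse_raw_query(query: str) -> List[Tuple[str, str]]:
    """Split a raw query string whose values may contain unencoded '&'.

    Assumption: values never contain '=', so every '=' marks a name/value boundary.
    """
    pieces = query.split("=")
    if len(pieces) < 2:
        return []  # no '=' at all, so there are no name/value pairs

    pairs = []
    name = pieces[0]
    # Every piece between two '=' signs holds "<value>&<next name>".
    for piece in pieces[1:-1]:
        value, sep, next_name = piece.rpartition("&")
        if not sep:
            raise ValueError("cannot find a parameter separator in %r" % piece)
        pairs.append((name, value))
        name = next_name
    # Everything after the last '=' is the final value, ampersands included.
    pairs.append((name, pieces[-1]))
    return pairs


if __name__ == "__main__":
    print(parse_raw_query("param1=abc&def&param2=123"))
```

In process.asp the same steps would be applied to the raw Request.QueryString text instead of the already-split parameter collection.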

How to create robust access logs using Apache Tomcat Valve Component?

We are working with Apache Tomcat 7 and trying to set up the Valve Component to store our access logs, ready for processing in SnowPlow.
The problem we have is how to make these logs robust. To give an example - we can separate fields with tabs and extract the user agent string like so:
pattern="%{yyyy-MM-dd}t %{hh:mm:ss}t %{User-Agent}i "
The problem is that the Valve Component does not (as far as I can see) escape %{User-Agent}i, so a stray tab in a useragent will corrupt the data (row will look like it contains four fields, not three).
As for solutions, unless there's a way of escaping the useragent which I've missed, I can see a couple of options:
Use a really obscure field delimiter (or combination of field delimiters) which is very unlikely to crop up in a useragent string. We tried Ctrl-A (HTML ?) but that didn't seem to work
Write a custom AccessLogValve which either supports escaping or sanitizes tabs - perhaps similar to this post Sanitizing Tomcat access log entries
A bit puzzled that I can't find anything else about this online - does nobody parse their Tomcat access logs?
What do you recommend? We're a little stuck...
RFC2616 defines user agent string as
User-Agent = "User-Agent" ":" 1*( product | comment )
Then product is defined as
product = token ["/" product-version]
product-version = token
Following this, tokens are defined as
token = 1*<any CHAR except CTLs or separators>
and separators/CTLs as
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>
We must not forget comment, which is defined as
comment = "(" *( ctext | quoted-pair | comment ) ")"
ctext = <any TEXT excluding "(" and ")">
quoted-pair = "\" CHAR
CHAR = <any US-ASCII character (octets 0 - 127)>
So if I understand correctly, you should be able to use any separator or CTL as long as you can distinguish comment, which is wrapped in ( and ). If ( appears inside the comment, it should be escaped with \.
In the end, I wrote a custom Tomcat AccessLogValve which:
Introduced a new pattern, 'I', to escape an incoming header
Introduced a new pattern, 'C', to fetch a cookie stored on the response
Re-implemented the pattern 'i' to ensure that "" (empty string) is replaced with "-"
Re-implemented the pattern 'q' to remove the "?" and ensure "" (empty string) is replaced with "-"
Overwrote the 'v' pattern, to write the version of this AccessLogValve, rather than the local server name
It seems to be pretty robust - I haven't had any further issues with unescaped values.
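The escaping idea behind such a valve can be sketched independently of Tomcat. The Python below is not the valve described above and uses no Tomcat API. The names escape_field and format_row are hypothetical, and the choice of backslash escapes is an assumption. It only shows how escaping tabs and replacing empty values with "-" keeps a tab-separated row at a fixed number of fields.

```python
def escape_field(value: str) -> str:
    """Make a header value safe for a tab-separated log line."""
    if not value:
        return "-"  # empty string becomes "-", as in the list above
    # Escape the backslash first so the escapes added below stay unambiguous.
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def format_row(date: str, time: str, user_agent: str) -> str:
    """Join three fields with tabs after escaping each of them."""
    return "\t".join(escape_field(f) for f in (date, time, user_agent))


if __name__ == "__main__":
    row = format_row("2013-01-01", "10:00:00", "Bad\tAgent/1.0")
    print(row)
    print(len(row.split("\t")))  # still three fields despite the stray tab
```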

How to encode the plus (+) symbol in a URL

The URL link below will open a new Google mail window. The problem I have is that Google replaces all the plus (+) signs in the email body with blank space. It looks like it only happens with the + sign. How can I remedy this? (I am working on an ASP.NET web page.)
https://mail.google.com/mail?view=cm&tf=0&to=someemail@somedomain.com&su=some subject&body=Hi there+Hello there
(In the body email, "Hi there+Hello there" will show up as "Hi there Hello there")
The + character has a special meaning in the query segment of a URL: it represents a space. If you want a literal + sign there, you need to URL-encode it as %2b:
body=Hi+there%2bHello+there
Here's an example of how you could properly generate URLs in .NET:
using System;
using System.Web; // HttpUtility

var uriBuilder = new UriBuilder("https://mail.google.com/mail");
// ParseQueryString returns a collection whose ToString() URL-encodes each value
var values = HttpUtility.ParseQueryString(string.Empty);
values["view"] = "cm";
values["tf"] = "0";
values["to"] = "someemail@somedomain.com";
values["su"] = "some subject";
values["body"] = "Hi there+Hello there";
uriBuilder.Query = values.ToString();
Console.WriteLine(uriBuilder.ToString());
The result:
https://mail.google.com:443/mail?view=cm&tf=0&to=someemail%40somedomain.com&su=some+subject&body=Hi+there%2bHello+there
If you want a plus (+) symbol in the body, you have to encode it as %2B.
In order to encode a + value using JavaScript, you can use the encodeURIComponent function.
Example:
var url = "+11";
var encoded_url = encodeURIComponent(url);
console.log(encoded_url); // %2B11
It's safer to always percent-encode all characters except those defined as "unreserved" in RFC-3986.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
So, percent-encode the plus character and other special characters.
The problem that you are having with pluses arises because, according to RFC-1866 (HTML 2.0 specification), paragraph 8.2.1, subparagraph 1, "The form field names and values are escaped: space characters are replaced by `+', and then reserved characters are escaped". This way of encoding form data is also used in later HTML specifications; look for the relevant paragraphs about application/x-www-form-urlencoded.
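The difference between plain percent-encoding and the form encoding quoted above can be seen with Python's standard urllib.parse module. This is a Python illustration rather than .NET code. It uses the example body text from the question.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

body = "Hi there+Hello there"

# Percent-encode everything except the RFC 3986 unreserved characters.
percent_encoded = quote(body, safe="")
print(percent_encoded)            # Hi%20there%2BHello%20there

# Form encoding (application/x-www-form-urlencoded): space becomes '+',
# so a literal '+' has to become %2B.
form_encoded = quote_plus(body)
print(form_encoded)               # Hi+there%2BHello+there

# A form-style decoder turns an unencoded '+' into a space,
# which is what happens to the body in the question.
print(unquote_plus(body))         # Hi there Hello there

# A plain percent-decoder leaves '+' alone.
print(unquote("Hi+there"))        # Hi+there

# Building a whole query string applies form encoding to each value.
print(urlencode({"body": body}))  # body=Hi+there%2BHello+there
```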
Just to add this to the list:
Uri.EscapeUriString("Hi there+Hello there") // Hi%20there+Hello%20there
Uri.EscapeDataString("Hi there+Hello there") // Hi%20there%2BHello%20there
See https://stackoverflow.com/a/34189188/98491
Usually you want to use EscapeDataString which does it right.
Generally, if you use the .NET APIs, new Uri("someproto:with+plus").LocalPath or AbsolutePath will keep the plus character in the URL (the + in "someproto:with+plus" is left as is),
but Uri.EscapeDataString("with+plus") will escape the plus character and produce "with%2Bplus".
To be consistent, I would recommend always escaping the plus character to "%2B" and using that everywhere. Then there is no need to guess how each component interprets your plus character.
I'm not sure why decoding a '+' character would produce a space ' ', but apparently that is how some components behave.

escaping string for json result in asp.net server side operation

I have a server side operation manually generating some json response. Within the json is a property that contains a string value.
What is the easiest way to escape the string value contained within this json result?
So this
string result = "{ \"propName\" : '" + (" *** \\\"Hello World!\\\" ***") + "' }";
would turn into
string result = "{ \"propName\" : '" + SomeJsonConverter.EscapeString(" *** \\\"Hello World!\\\" ***") + "' }";
and result in the following json
{ \"propName\" : '*** \"Hello World!\" ***' }
First of all, I don't think implementing serialization manually is a good idea. You should mostly do this only for study purposes, or if you have some other very important reason why you cannot use the standard .NET classes (for example, you have to use .NET 1.0-3.0 and not higher).
Now back to your code. The results you currently produce are not valid JSON. You should place both the property name and the property value in double quotes:
{ "propName" : "*** \"Hello World!\" ***" }
As you can read on http://www.json.org/, the double quote is not the only character that must be escaped. The backslash character must also be escaped. You can verify your JSON results on http://www.jsonlint.com/.
If you also implement deserialization manually, you should know that there are more characters that can be escaped in addition to \" and \\: \/, \b, \f, \n, \r, \t and \u followed by 4 hexadecimal digits.
As I wrote at the beginning of my answer, it is better to use standard .NET classes like DataContractJsonSerializer or JavaScriptSerializer. If you have to use .NET 2.0 and not higher, you can use Json.NET.
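To show what a serializer does with the string from the question, here is a short Python sketch using the standard json module. It stands in for the .NET serializers named above and is not their API. It only illustrates that the library, not hand-written concatenation, takes care of quotes, backslashes and control characters.

```python
import json

value = ' *** "Hello World!" ***'

# The serializer adds the surrounding double quotes and escapes the inner ones.
result = json.dumps({"propName": value})
print(result)  # {"propName": " *** \"Hello World!\" ***"}

# Backslashes and newlines are escaped as well.
print(json.dumps("a\\b\n"))  # "a\\b\n"

# Parsing the output gives back the original string unchanged.
assert json.loads(result)["propName"] == value
```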
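You may try something like this (JavaScript, using regular expressions with the g flag so that every occurrence is replaced, not just the first):
string.replace(/(\\|")/g, "\\$1").replace(/\n/g, "\\n").replace(/\r/g, "\\r");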
