Encoding apostrophe - asp.net

I am building up a string on the server that is put into a JavaScript variable on the client.
What is the best way of encoding it to avoid any issues?
Right now on the server I am doing something like this:
html = html.Replace("'", "&#39;");
But I assume there is a more elegant, foolproof way of doing this.

You're really better off using the Microsoft Anti-Cross Site Scripting Library to do this. They provide a JavaScriptEncode method that does what you want:
Microsoft.Security.Application.AntiXss.JavaScriptEncode("My 'Quotes' and ""more"".", False)

html = html.Replace("'", "%27");

I'm not sure in which context you're using this string, but \' might be what you're looking for. The backslash is an escape character and allows you to use certain characters that can't otherwise be present in a string literal. This is what the output JavaScript should look like:
alert('It\'s amazing');
Of course, you could use alert("It's amazing"); in this particular case.
Anyway, if you're building JavaScript code:
html = html.Replace("'", "\\'");
On the other hand, there are other characters besides apostrophes that need some processing. Using the Microsoft Anti-Cross Site Scripting Library would get all of them at once.

I found that the AntiXSS library was not able to accomplish what I was looking for, which was to encode server side and decode in javascript.
Instead I used Microsoft.JScript.dll which allows you to:
GlobalObject.escape(string);
and on the client side in javascript:
unescape(string);
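For what it's worth, escape and unescape are deprecated in JavaScript. A minimal client-side sketch of the same round trip with encodeURIComponent and decodeURIComponent is below. The helper name encodeForTransport is made up for illustration, and the sketch assumes the server emits compatible UTF-8 percent-encoding. encodeURIComponent leaves the apostrophe alone, so it is encoded by hand here.

```javascript
// Hypothetical helper: percent-encode a string so that it contains
// no quotes, backslashes or other characters that need escaping.
function encodeForTransport(text) {
  // encodeURIComponent does not touch "'", so handle it explicitly.
  return encodeURIComponent(text).replace(/'/g, "%27");
}

const original = "It's 100% \"amazing\"";
const encoded = encodeForTransport(original);

// decodeURIComponent reverses the encoding, including %27.
const decoded = decodeURIComponent(encoded);
console.log(encoded);
console.log(decoded === original);
```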

The characters that you need to escape in a string value are the backslash and the character used as string delimiter.
If apostrophes (') are used as string delimiter:
html = html.Replace(@"\", @"\\").Replace("'", @"\'");
If quotation marks (") are used as string delimiter:
html = html.Replace(@"\", @"\\").Replace(@"""", @"\""");
If you don't know which delimiter is used, or if it may change in the future, you can just escape both:
html = html.Replace(@"\", @"\\").Replace("'", @"\'").Replace(@"""", @"\""");
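The order matters here: backslashes are doubled first, so the backslashes added in front of the quotes are not doubled again. The same sequence is sketched below in JavaScript, purely so it can be tried in a browser console. escapeForJsString is a hypothetical helper name, not a library function, and the sketch only covers backslashes and quotes (not newlines or other control characters).

```javascript
// Hypothetical helper mirroring the server-side Replace chain.
function escapeForJsString(text) {
  return text
    .replace(/\\/g, "\\\\") // 1. double every backslash first
    .replace(/'/g, "\\'")   // 2. then escape apostrophes
    .replace(/"/g, '\\"');  // 3. and quotation marks
}

const raw = "It's a \"test\" of c:\\temp";
const escaped = escapeForJsString(raw);

// Wrapping the escaped text in either delimiter gives a valid literal
// that evaluates back to the original text.
const roundTrip = new Function("return '" + escaped + "';")();
console.log(escaped);
console.log(roundTrip === raw);
```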

Related

Parameter separator in URLs, the case of misused question mark

What I don't really understand is the benefit of using '?' instead of '&' in urls:
It makes nobody's life easier if we use a different character as the first separator character.
Can you come up with a reasonable explanation?
EDIT: after more research I found that "&" can be a part of file name (terms&conditions.html) so "?" is a good separator. But still I think using "?" for separators makes lives easier (from url generators and parsers point of view):
Is there any advantage in using "&" which is not clear at the first glance?
From the URI spec's (RFC 3986) point of view, the only separator here is "?". The format of the query is opaque; the ampersands are just something that HTML happens to use for form submissions.
The answer's pretty much in this article - http://www.skorks.com/2010/05/what-every-developer-should-know-about-urls/ . To highlight it, here goes :
Query is the preferred way to send some parameters to a resource on
the server. These are key=value pairs and are separated from the rest
of the URL by a ? (question mark) character and are normally separated
from each other by & (ampersand) characters. What you may not know is
the fact that it is legal to separate them from each other by the ;
(semi-colon) character as well. The following URLs are equivalent:
http://www.blah.com/some/crazy/path.html?param1=foo&param2=bar
http://www.blah.com/some/crazy/path.html?param1=foo;param2
RFC 3986 (https://www.ietf.org/rfc/rfc3986.txt) defines general delimiters and sub-delimiters: '?' is a general delimiter, while '&' and ';' are sub-delimiters. The spec is pretty clear about that.
In this case the later '?' chars would be treated as part of the query. If the query parser follows the spec strictly, it would pass the whole query on to the destination app. Whether the app then processes the query string in a way that treats '?' as a delimiter between name-value pairs is up to the app's designers.
My guess is that this often 'just works' because code that splits query strings and the original uri uses all delimiters for matching: 1) first query is split on '?' then 2) query string is parsed using char match list that includes '?' (convenience only).... This could be occurring in ubiquitous parsing libraries already.
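To make the two-step split concrete, here is a rough JavaScript sketch: everything after the first '?' is the query, and only then is the query split into pairs on '&' or ';'. parseQuery is an invented name, and real frameworks may well behave differently (for instance, by not treating ';' as a separator).

```javascript
// Hypothetical parser: the first "?" separates path from query;
// "&" and ";" separate the name=value pairs inside the query.
function parseQuery(url) {
  const withoutFragment = url.split("#")[0];
  const start = withoutFragment.indexOf("?");
  if (start === -1) return {};

  const query = withoutFragment.slice(start + 1);
  const params = {};
  for (const pair of query.split(/[&;]/)) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const name = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    params[decodeURIComponent(name)] = decodeURIComponent(value);
  }
  return params;
}

// A second "?" is not a separator here; it stays inside the value.
console.log(parseQuery("http://www.blah.com/path?param1=foo?param2=bar"));
```

With this reading, a URL that uses '?' between parameters yields a single parameter whose value contains the rest of the string, which is why a lenient parser would be needed for it to 'just work'.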

ASP.NET special character problem

I'm building an automated RSS feed in ASP.NET and occurrences of apostrophes and hyphens are rendering very strangely:
"Here's a test" is rendering as "Here’s a test"
I have managed to circumvent a similar problem with the pound sign (£) by escaping the ampersand and building the HTML escape for £ manually, as shown in the extract below:
sArticleSummary = sArticleSummary.Replace("£", "£")
But the following attempt fails to resolve the apostrophe issue; we still get ’ on the screen.
sArticleSummary = sArticleSummary.Replace("’", "’"")
The string in the database (SQL2005) for all intents and purposes appears to be plain text - can anyone advise why what seem to be plain text strings keep coming out in this manner, and if anyone has any ideas as to how to resolve the apostrophe issue that'd be appreciated.
Thanks for your help.
[EDIT]
Further to Vladimir's help, it now looks as though the problem is that somewhere between the database and it being loaded into the string var the data is converting from an apostrophe to ’ - has anyone seen this happen before or have any pointers?
Thanks
I would guess that the column in your SQL 2005 database is defined as varchar(N), char(N) or text. If so, the conversion is due to the database driver using a different code page setting from the one set in the database.
I would recommend changing this column (and any others that may contain non-ASCII data) to nvarchar(N), nchar(N) or nvarchar(max) respectively, which can then contain any Unicode code point, not just those defined by the code page.
All of my databases now use nvarchar/nchar exclusively to avoid these types of encoding issues. The Unicode fields use twice as much storage space, but there'll be very little performance difference if you use this technique (the SQL engine uses Unicode internally).
It transpires that the data (whilst appearing plain in SQL Server) is actually carrying some MS Word special characters.
Assuming you get Unicode characters from the database, the easiest way is to let System.Xml.dll take care of the conversion for you by building the RSS feed with an XmlDocument object. (I'm not sure about the elements found in an RSS feed.)
XmlDocument rss = new XmlDocument();
rss.LoadXml("<?xml version='1.0'?><rss />");
XmlElement element = rss.DocumentElement.AppendChild(rss.CreateElement("item")) as XmlElement;
element.InnerText = sArticleSummary;
or with LINQ to XML (System.Xml.Linq):
XDocument rss = new XDocument(
    new XElement("rss",
        new XElement("item", sArticleSummary)
    )
);
I would just put "Here's a test" into a CDATA tag. Easy and it works.
<![CDATA[Here's a test]]>
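One caveat with CDATA: the text itself must not contain ]]>, which would end the section early. A rough JavaScript sketch of both options follows (escaping the text, or wrapping it in CDATA and splitting any ]]> across two sections). The function names are invented for this example, and an XML library would normally do this for you.

```javascript
// Escape the characters that are special in XML text content.
// "&" must be handled first so the other entities are not re-escaped.
function escapeXmlText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap text in a CDATA section; any "]]>" inside the text is split
// across two adjacent CDATA sections so it cannot close the first one.
function toCdata(text) {
  return "<![CDATA[" + text.replace(/]]>/g, "]]]]><![CDATA[>") + "]]>";
}

console.log(escapeXmlText("Fish & <chips>"));
console.log(toCdata("Here's a test"));
```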

Regex to limit string length for strings with new line characters

Looks like a simple task - get a regex that tests a string for particular length:
^.{1,500}$
But if a string contains "\r\n" then the above match always fails!
What should the correct regex look like to accept newline characters as part of the string?
I have a <asp:TextBox TextMode="Multiline"> and use a RegularExpressionValidator to check the length of what user types in.
Thank you,
Andrey
You could use the RegexOptions.Singleline option when validating input. This treats the input as a single line statement, and parses it as such.
Otherwise you could give the following expression a try:
^(.|\s){1,500}$
This should work in multiline inputs.
Can you strip the line breaks before checking the length of the string? That'd be easy to do when validating server-side. (In .net you could use a custom validator for that)
From a UX perspective, though, I'd implement a client-side 'character counter' as well. There's plenty to be found. jQuery has a few options. Then you can implement the custom validator to only run server-side, and then use the character counter as your client-side validation. Much nicer for the user to see how many characters they have left WHILE they are typing.
The inability to set the RegexOptions is screwing you up here. Since this is in a RegularExpressionValidator, you could try setting the options in the regular expression itself.
I think this should work:
(?s)^.{1,500}$
The (?s) part turns on the Singleline option which will allow the dot to match every character including line feeds. For what it's worth, the article here also lists the other RegexOptions and the notation needed to set them as an inline statement.
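If client-side validation is enabled, the same pattern is presumably also evaluated by the browser's JavaScript engine, which may not accept the inline (?s) option. A small sketch of a spelling that avoids the dot altogether is below: [\s\S] matches any character, line breaks included. The variable names are only for illustration.

```javascript
// The dot does not match \r or \n, so multi-line input fails.
const dotOnly = /^.{1,500}$/;

// [\s\S] (whitespace or non-whitespace) matches every character,
// so line breaks count towards the 500-character limit.
const anyChar = /^[\s\S]{1,500}$/;

const text = "first line\r\nsecond line";
console.log(dotOnly.test(text)); // false
console.log(anyChar.test(text)); // true
console.log(anyChar.test("x".repeat(501))); // false
```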

Help with a regular expression to validate a series of n email addresses separated by semicolons

I'm using an asp.net Web Forms RegularExpressionValidator Control to validate a text field to ensure it contains a series of email addresses separated by semicolons.
What is the proper regex for this task?
I think this one will work:
^([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}(;|$))+
Breakdown:
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4} : valid email (from http://www.regular-expressions.info/)
(;|$) : either semicolon or end of string
(...)+ : repeat all one or more times
Make sure you are using case-insensitive matching. Also, this pattern does not allow whitespace between emails or at the start or end of the string.
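A quick way to check the behaviour is to run the pattern in JavaScript. In this sketch a trailing $ is added, on the assumption that the validator only accepts a match covering the whole value, whereas a plain JavaScript test() would also accept a matching prefix. The function name isEmailList is made up.

```javascript
// Same pattern as above, anchored at both ends.
const emailList =
  /^([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}(;|$))+$/;

// Hypothetical helper: true when the value is one or more
// addresses separated by semicolons, with no whitespace.
function isEmailList(value) {
  return emailList.test(value);
}

console.log(isEmailList("a@example.com;b@example.org")); // true
console.log(isEmailList("a@example.com; b@example.org")); // false
console.log(isEmailList("a@example.com;not-an-email")); // false
```

Two things to be aware of: a trailing semicolon is accepted, and the {2,4} limit rejects top-level domains longer than four letters.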
The 'proper' (aka RFC2822) regex is too complicated. Try something like (\S+@[a-zA-Z0-9-.]+(\s*;\s*|\s*\Z))+
Not perfect but should be there 90% (haven't tried it, so it might need some alteration)
Note: Not too sure about \Z it might be a Perl only thing. Try $ as well if it doesn't work.

How does one pass a string containing '\' from ASP.NET server side to a JavaScript function?

How does one pass a string containing '\' from ASP.NET
server side to a JavaScript function?
When I check the parameters on the client side, every '\' has been replaced with ''. Even
replacing '\' with '%5C' on the server side doesn't work.
Any idea?
\ is a special character: it "escapes" the character after it. Try passing \\ instead. BTW, if you're using C# you can put the @ character before a string literal to avoid having to double each backslash, e.g.
string path = @"c:\documents\mydocuments";
I found a solution for that.
parameter.Replace("\\", "\\\\") solved it.
Are you using ASP.NET to write a JavaScript string literal? ie. something like:
Page.RegisterStartupScript("foo",
"<script type='text/javascript'>"+
" var bar= '"+myBarValue+"';"+
"</script>"
);
If so, then you are embedding text inside a delimited JavaScript string literal and you must use an escaping scheme that follows the syntax for string literals. In particular any \ character inside the text must be escaped with \\, and any ' character must be replaced by \', since that's the delimiter being used in this case (JavaScript can use either type of quote to delimit strings).
What's more if you're using an inline <script> block like in the above example, you're actually embedding text in a string literal in an HTML element, so you have to do some HTML escapes too. In particular you have to break up any </ sequences in the text, because that would end the script block. Also, in XHTML, there are no CDATA elements, so you'd also have to ampersand-escape any < or & characters in the text, except that would make it incompatible with legacy-HTML parsers. So to solve all these problems it is better to use JavaScript string literal escapes for that too, replacing < with \x3C and & with \x26.
Ideally what you would do would be to pass the simple string to a JSON encoder library, which would take care of escaping it appropriately for JavaScript. However I don't know of one for .NET that will escape the HTML for you as above, so you'd still need some replaces.
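As an illustration of that combination (a JSON-style encoder plus a few extra replaces), here is a sketch in JavaScript. The logic would have to be ported to the server-side language, and toInlineScriptLiteral is a hypothetical name. JSON.stringify takes care of quotes, backslashes and control characters; the replaces then remove the characters that matter to the HTML parser.

```javascript
// Hypothetical helper: turn arbitrary text into a JavaScript string
// literal that is also safe inside an inline <script> block.
function toInlineScriptLiteral(text) {
  return JSON.stringify(text)          // quotes, backslashes, newlines
    .replace(/</g, "\\x3C")            // no "</script>" can appear
    .replace(/&/g, "\\x26")            // no entity issues in XHTML
    .replace(/\u2028/g, "\\u2028")     // line separators that older
    .replace(/\u2029/g, "\\u2029");    // JS engines reject in literals
}

const value = "</script> & 'single' \"double\" back\\slash";
const literal = toInlineScriptLiteral(value);

// The literal can be dropped into generated code as-is.
console.log("var bar = " + literal + ";");
```

The result already includes its own double-quote delimiters, so the surrounding script should not add quotes around it.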
