ASP.NET special character problem - asp.net

I'm building an automated RSS feed in ASP.NET and occurrences of apostrophes and hyphens are rendering very strangely:
"Here's a test" is rendering as "Here’s a test"
I have managed to circumvent a similar problem with the pound sign (£) by escaping the ampersand and building the HTML escape for £ manually, as shown in the extract below:
sArticleSummary = sArticleSummary.Replace("£", "&amp;pound;")
But the following attempt is failing to resolve the apostrophe issue; we still get ’ on the screen.
sArticleSummary = sArticleSummary.Replace("’", "&amp;rsquo;")
The string in the database (SQL2005) for all intents and purposes appears to be plain text - can anyone advise why what seem to be plain text strings keep coming out in this manner, and if anyone has any ideas as to how to resolve the apostrophe issue that'd be appreciated.
Thanks for your help.
[EDIT]
Further to Vladimir's help, it now looks as though the problem is that somewhere between the database and it being loaded into the string var the data is converting from an apostrophe to ’ - has anyone seen this happen before or have any pointers?
Thanks

I would guess that the column in your SQL 2005 database is defined as varchar(N), char(N) or text. If so, the conversion is due to the database driver using a different code page setting from the one set in the database.
I would recommend changing this column (and any others that may contain non-ASCII data) to nvarchar(N), nchar(N) or nvarchar(max) respectively, which can then contain any Unicode code point, not just those defined by the code page.
All of my databases now use nvarchar/nchar exclusively to avoid these types of encoding issues. The Unicode fields use twice as much storage space, but there'll be very little performance difference if you use this technique (the SQL engine uses Unicode internally).

Transpires that the data (whilst showing in SQLServer plain) is actually carrying some MS Word special characters.
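If the stray characters really are Word's typographic punctuation, one option is to map them to plain ASCII before the text goes into the feed. A minimal sketch, assuming that losing the curly quotes and dashes is acceptable; the WordPunctuation class and its ToAscii method are made-up names for illustration, and the list of characters is only a common subset:

```csharp
using System;
using System.Text;

static class WordPunctuation
{
    // Maps common Word "smart" punctuation to plain ASCII equivalents.
    // Hypothetical helper: extend the switch for other characters you encounter.
    public static string ToAscii(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            switch (c)
            {
                case '\u2018': // left single quotation mark
                case '\u2019': // right single quotation mark (Word's apostrophe)
                    sb.Append('\'');
                    break;
                case '\u201C': // left double quotation mark
                case '\u201D': // right double quotation mark
                    sb.Append('"');
                    break;
                case '\u2013': // en dash
                case '\u2014': // em dash
                    sb.Append('-');
                    break;
                case '\u2026': // horizontal ellipsis
                    sb.Append("...");
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        // prints: Here's a test - done
        Console.WriteLine(ToAscii("Here\u2019s a test \u2013 done"));
    }
}
```

This only hides the symptom; if the feed is served as UTF-8 end to end, the original characters can be kept as they are.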

Assuming you get Unicode characters from the database, the easiest way is to let System.Xml.dll take care of the conversion for you by building the RSS feed with an XmlDocument object. (I'm not sure about the elements found in an RSS feed.)
XmlDocument rss = new XmlDocument();
rss.LoadXml("<?xml version='1.0'?><rss />");
XmlElement element = rss.DocumentElement.AppendChild(rss.CreateElement("item")) as XmlElement;
element.InnerText = sArticleSummary;
or with LINQ to XML:
XDocument rss = new XDocument(
    new XElement("rss",
        new XElement("item", sArticleSummary)
    )
);

I would just put "Here's a test" into a CDATA tag. Easy and it works.
<![CDATA[Here's a test]]>

Related

xmlreader.Create return none

I am trying to read from an XML string, but
XmlReader reader = XmlReader.Create(new StringReader(stringXml));
reader is always none. Why is the reader object none?
You have to call the read function.
reader.Read();
Here is the answer for your question. There seems to be no problem and the XmlReader is ready to be utilized.
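To illustrate the point about calling Read(): a freshly created XmlReader is positioned before the first node, so its NodeType is XmlNodeType.None until Read() is called. A minimal sketch; the XmlReaderDemo class, the FirstElementName helper and the sample XML are made up for illustration:

```csharp
using System;
using System.IO;
using System.Xml;

static class XmlReaderDemo
{
    // Returns the name of the first element in the XML string, or null if there is none.
    public static string FirstElementName(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // At this point reader.NodeType is XmlNodeType.None: nothing has been read yet.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    return reader.Name;
                }
            }
        }
        return null;
    }

    static void Main()
    {
        // prints: root
        Console.WriteLine(FirstElementName("<root><child>text</child></root>"));
    }
}
```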
Actually, if you are open to using .NET Framework 3.5 or higher, you could benefit from LINQ to XML:
XElement x = XElement.Load(new StringReader(s));
This happened to me in code that had been working for years. There were two paths for populating the XML string used in creating the xReader. The first pulled the XML from a text parameter; if the text parameter was empty, the code would fetch the string from SQL Server instead. If the text parameter was null, however, I was getting "none" from the xReader, despite the fact that SQL Server returned perfectly formed XML. If the text parameter was an empty zero-length string, everything worked fine: the fetch from SQL Server ran, returned the XML, and loaded the reader. It was as if the .NET runtime was running both paths simultaneously and giving me the worst possible outcome instead of the desired one.

Getting input as hindi character from textbox and storing it to database

I am using ASP.NET and C# in my application, with MySQL as the database. I want to take input from the user in Hindi, store it in the database, and retrieve it.
When I store the Hindi characters directly in the MySQL database it works fine, but when I use a textbox to input Hindi characters it shows me ?????????.
I guess the problem is that the aspx page is not set up to support Hindi characters. Please tell me how to achieve this.
I guess using UTF-8 encoding on your Http request and responses would solve it. What is your requestEncoding and responseEncoding in your Web.config file set to currently?
See more on the <globalization> tag here:
http://msdn.microsoft.com/en-us/library/hy4kkhe0(v=VS.100).aspx
try this:-
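// mytable has 2 fields: id (auto increment), title (nvarchar(max))
// Requires: using System.Data; using System.Data.SqlClient;
string title = "बिलाल";
// A parameter typed as NVarChar sends the text as Unicode (same effect as an N'...' literal)
// and avoids building the SQL statement by string concatenation.
SqlCommand cmd = new SqlCommand("insert into mytable (title) values (@title)", con);
cmd.Parameters.Add("@title", SqlDbType.NVarChar).Value = title;
con.Open();
cmd.ExecuteNonQuery();
con.Close();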
Haha.. Oh the memories (and I only had to deal with spanish which fits into the default latin1).
So I don't know the MS side of the stack, but I assume it's the same types of solutions as Java. Namely you should always assume UTF-8, and thus make your Content-Type HTML responses always show UTF-8 so that browsers know to encode POST data in UTF-8. You should always inspect the encoding type of HTML POST's just in case you have a browser that ignored the encoding of the HTML form (someone might be using curl/wget/custom-browser). You need to learn how in MS-land to convert from one encoding type into UTF-8 (in java, for reference, we just say String s = new String(bytes, encoding_name))
Assuming that MS's stack uses UTF-16 or UTF-32 or whatever so that UTF-8 is easy to extract, next comes the mysql layer.
This includes 2 things..
1) column encoding MUST be set to UTF8.. It's not obvious at all how to do it, and even the spelling is annoying.. Just google it.. "create database foo default character set UTF8" (approximate syntax), or if you're worried for some reason, do it at the table level "create table foo (..) character set UTF8" (approximate syntax).. Or if the table is already there, take EVERY column that can take arbitrary web-form text (possibly including login-name, but not columns like enumerated varchars - as it would waste index space - even though you'd think it wouldn't) and change it with "alter table foo modify name varchar(255) character set UTF8" (approximate syntax).
2) You MUST make the ODBC connection (jdbc in java, don't know in MS) encode all in/out characters as UTF-8. There are two parameters I set (useUnicode=true and characterEncoding=UTF-8 in JDBC).
Google it all, but this should point you in the right direction.
Test the existing DB by connecting to mysql both with character-set=UTF8 and latin1.. You'll see totally different output in your text-data when connected as each encoding. If you're lucky, you already got the data in correctly.. Otherwise you'll have to regenerate ALL the data, or perform some very clever character conversion hacks like I had to do once upon a time (painful stuff).

How should I store comments in database so that I can efficiently display them on page as html text?

I have a form where the user enters multiple lines of text in a text area.
Some of the lines can have HTML markup as well. Say one line is bold.
How should I save the text in my database?
Should I store it like this?
This is a greap post
<br/>
I love this type of findings.
<br/>
<br/>
Thanks for sharing
OR like this?
This is a greap post
&lt;br/&gt;
I love this type of findings.
&lt;br/&gt;
&lt;br/&gt;
Thanks for sharing
During editing:
I must show the text as it was entered, so each line break markup will be replaced by a new line.
That way the user sees there is a line break. A textarea won't understand br markup.
During displaying:
I must render the text so that it appears like this on the page:
This is a greap post
I love this type of findings.
Thanks for sharing
I want to know the cleanest way to store text that can have markup in them.
Thanks for help
Since you want to output HTML, you will have to store the input in its raw format in the database. There is one catch, though: you should never trust input, since all input is evil. That applies especially here, because outputting HTML exactly as it was entered opens the possibility of a cross-site scripting (XSS) attack.
You basically have two options:
Use an HTML sanitizer that lets you remove all tags that are not known to be safe. A good sanitizer is the one that comes with the Microsoft AntiXss toolkit.
Encode the input and decode the parts of the result that are known to be safe, for instance:
// Requires: using Microsoft.Security.Application; (Encoder from the AntiXss toolkit)
// Static so that the static method below can use it.
private static readonly string[] safeList = { "<br/>", "<b>", "</b>", "<i>", "</i>" };

public static string EncodeInputWithSafeList(string unsafeInput)
{
    // First: encode the complete input.
    string safeInput = Encoder.HtmlEncode(unsafeInput);
    // Next: decode each tag that is known to be safe.
    foreach (string safeTag in safeList)
    {
        string encodedTag = Encoder.HtmlEncode(safeTag, false);
        safeInput = safeInput.Replace(encodedTag, safeTag);
    }
    return safeInput;
}
Note: The example uses the Encoder class from the Microsoft AntiXss toolkit.
Now the question becomes, at what point should we clean it up. Normally you should encode the output just before you send it to the client and not store it encoded in the database, since it depends on the output type (HTML, PDF, JSON) how data should be encoded. This is amplified by the fact that in case there is a bug in the encoder, there is no way to fix it, since the data is already encoded.
In this case it is a bit more tricky though, since the input is HTML and not just text. I would say that sanitizing is something you would still want to do beforehand, because this way you prevent bad input from entering your database. The EncodeInputWithSafeList method is a bit tricky, because it is both a sanitizer and an encoder. When we run it before the data goes into the database, it prevents the output from changing when we change the safe list. This can be both a good thing and a bad thing, but I would say that when you add new tags to the safe list, you wouldn't want old data to suddenly change. So in this case I would go with input encoding instead of output encoding.
When you go with input encoding, name the database column in such way that it is clear that we're dealing with sanitized, encoded data.
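For comparison, here is a rough sketch of the plain output-encoding route mentioned above, for the simpler case where comments are treated as plain text with line breaks only (no allowed tags). It uses System.Net.WebUtility.HtmlEncode from the framework rather than the AntiXss Encoder; the CommentFormatter class and ToDisplayHtml method are made-up names. The text is stored exactly as typed, shown unchanged in the textarea when editing, and converted only when rendering:

```csharp
using System;
using System.Net;

static class CommentFormatter
{
    // Converts text as typed in a textarea into HTML for display:
    // everything is HTML-encoded, then line breaks become <br/>.
    public static string ToDisplayHtml(string rawText)
    {
        if (string.IsNullOrEmpty(rawText)) return string.Empty;

        // Normalize Windows and old Mac line endings to "\n" first.
        string normalized = rawText.Replace("\r\n", "\n").Replace("\r", "\n");
        string encoded = WebUtility.HtmlEncode(normalized);
        return encoded.Replace("\n", "<br/>");
    }

    static void Main()
    {
        // prints: This is a &lt;b&gt;great&lt;/b&gt; post<br/>Thanks
        Console.WriteLine(ToDisplayHtml("This is a <b>great</b> post\r\nThanks"));
    }
}
```

If some tags do need to survive, the safe-list approach above (or a sanitizer) is still required; this sketch deliberately lets no markup through.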
Try htmlentities($str, ENT_QUOTES); before you save the data, and html_entity_decode($str) after you fetch it from your db, before you render it to the browser.
saving it to your database like this:
<p>This is a greap post
<br/>
I love this type of findings.
<br/>
<br/>
Thanks for sharing</p>
would work..

Are there any anti-XSS libraries for ASP.Net?

I was reading some questions trying to find a good solution for preventing XSS in user-provided URLs (which get turned into a link). I've found one for PHP but I can't seem to find anything for .NET.
To be clear, all I want is a library which will make user-provided text safe (including Unicode gotchas?) and make user-provided URLs safe (used in a or img tags).
I noticed that StackOverflow has very good XSS protection, but sadly that part of their Markdown implementation seems to be missing from MarkdownSharp. (and I use MarkdownSharp for a lot of my content)
Microsoft has the Anti-Cross Site Scripting Library; you could start by taking a look at it and determining if it fits your needs. They also have some guidance on how to avoid XSS attacks that you could follow if you determine the tool they offer is not really what you need.
There's a few things to consider here. Firstly, you've got ASP.NET Request Validation which will catch many of the common XSS patterns. Don't rely exclusively on this, but it's a nice little value add.
Next up you want to validate the input against a white-list and in this case, your white-list is all about conforming to the expected structure of a URL. Try using Uri.IsWellFormedUriString for compliance against RFC 2396 and RFC 2732:
var sourceUri = UriTextBox.Text;
if (!Uri.IsWellFormedUriString(sourceUri, UriKind.Absolute))
{
// Not a valid URI - bail out here
}
AntiXSS has Encoder.UrlEncode which is great for encoding strings to be appended to a URL, i.e. in a query string. Problem is that you want to take the original string and not escape characters such as the forward slashes, otherwise http://troyhunt.com ends up as http%3a%2f%2ftroyhunt.com and you've got a problem.
As the context you're encoding for is an HTML attribute (it's the "href" attribute you're setting), you want to use Encoder.HtmlAttributeEncode:
MyHyperlink.NavigateUrl = Encoder.HtmlAttributeEncode(sourceUri);
What this means is that a string like http://troyhunt.com/<script> will get escaped to http://troyhunt.com/&lt;script&gt; - but of course Request Validation would catch that one first anyway.
Also take a look at the OWASP Top 10 Unvalidated Redirects and Forwards.
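One extra check that may be worth adding to the white-list step: an absolute URI can be syntactically valid and still use a scheme such as javascript:, so it can help to accept only http and https. A minimal sketch using Uri.TryCreate; the LinkValidator class and IsSafeHttpUrl method are made-up names, and restricting to these two schemes is an assumption about what the links should allow:

```csharp
using System;

static class LinkValidator
{
    // Accepts only absolute URLs whose scheme is http or https.
    public static bool IsSafeHttpUrl(string candidate)
    {
        Uri uri;
        return Uri.TryCreate(candidate, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    static void Main()
    {
        Console.WriteLine(IsSafeHttpUrl("http://troyhunt.com"));     // prints: True
        Console.WriteLine(IsSafeHttpUrl("javascript:alert('xss')")); // prints: False
        Console.WriteLine(IsSafeHttpUrl("not a url"));               // prints: False
    }
}
```

The value still needs to be encoded for the HTML attribute context when it is written out, as described above.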
I think you can do it yourself by creating an array of the characters and another array with the corresponding codes.
When you find a character from the first array, replace it with its code. This will help you, but it is definitely not 100% protection.
character array
<
>
...
Code Array
&lt;
&gt;
...
I rely on HtmlSanitizer. It is a .NET library for cleaning HTML fragments and documents from constructs that can lead to XSS attacks.
It uses AngleSharp to parse, manipulate, and render HTML and CSS.
Because HtmlSanitizer is based on a robust HTML parser it can also shield you from deliberate or accidental
"tag poisoning" where invalid HTML in one fragment can corrupt the whole document leading to broken layout or style.
Usage:
var sanitizer = new HtmlSanitizer();
var html = @"<script>alert('xss')</script><div onload=""alert('xss')"""
    + @"style=""background-color: test"">Test<img src=""test.gif"""
    + @"style=""background-image: url(javascript:alert('xss')); margin: 10px""></div>";
var sanitized = sanitizer.Sanitize(html, "http://www.example.com");
Assert.That(sanitized, Is.EqualTo(@"<div style=""background-color: test"">"
    + @"Test<img style=""margin: 10px"" src=""http://www.example.com/test.gif""></div>"));
There's an online demo, plus there's also a .NET Fiddle you can play with.
(copy/paste from their readme)

(ASP.NET) How do I remove special characters when doing a DateTime.Now.ToString()

So I have a flashobject which I need to pass a formatted DateTime string to.
My code:
string date = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss");
which outputs as: 2009-09-16 22:26:45
However when it is actually output to HTML and swfobject it renders it as:
so.addVariable("inNowDate","2009-09-16+22%3a25%3a13");
I think this is messing up a calculation that the flash object does based off the current time. Do I need to encode or decode this?
Any help would be greatly appreciated! Thanks!
It's not that you have gained special characters, but rather certain special characters you already had are now URL encoded.
There's not enough information present for me to see exactly where this URL encoding is happening. Can you post a bit more context?
When you output to html, try using UrlDecode.
http://msdn.microsoft.com/en-us/library/6196h3wt.aspx
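To make the encoding visible: in the rendered value the space has become + and each colon has become %3a, which is standard URL encoding. A small sketch showing that decoding restores the original string; it uses System.Net.WebUtility.UrlDecode (HttpUtility.UrlDecode from System.Web behaves the same way for this input), and the UrlDecodeDemo class is a made-up name:

```csharp
using System;
using System.Net;

static class UrlDecodeDemo
{
    // Reverses URL encoding: '+' becomes a space and %3a becomes ':'.
    public static string Decode(string encoded)
    {
        return WebUtility.UrlDecode(encoded);
    }

    static void Main()
    {
        // prints: 2009-09-16 22:25:13
        Console.WriteLine(Decode("2009-09-16+22%3a25%3a13"));
    }
}
```

Whether to decode on the server or stop the value from being encoded in the first place depends on where the encoding happens, which the question does not show.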
