Saving Unicode characters in ASP.NET

I have a CMS written in ASP.NET using VB.NET and I am having problems saving Unicode characters to the database. Here's the situation:
The web page seems to send the characters fine via an AJAX request (using jQuery). At least according to Firebug, the POST is sent correctly, and I can see the characters in it as they should be (i.e., not garbled). When I look in the database, however, instead of the non-English character I see a question mark inside a little black diamond (the Unicode replacement character, �). I know it's not the database, because a) the field is set to NText and b) I can insert that same value directly into the database via SQL Manager in a manual query. The database is MS SQL Server 2005.
So the problem must be somewhere in between, correct? I am explicitly declaring the parameter on the insert query as NText:
Cmd.Parameters.Add("@FieldContent", SqlDbType.NText).Value = FieldContent
and in web.config I have the encoding set as:
<globalization requestEncoding="utf-8" responseEncoding="utf-8" />
I've googled high and low and cannot find any solutions other than the ones I've already tried. Any help is greatly appreciated.

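Try declaring the parameter as NVarChar instead (ntext is a legacy type; Microsoft recommends nvarchar(max) instead):
cmd.Parameters.Add("@FieldContent", SqlDbType.NVarChar, 1024).Value = FieldContent; // a size of -1 maps to nvarchar(max)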
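One common source of such question marks is a trip through a non-Unicode code page somewhere between the request and the NText column. The sketch below is in Python rather than VB.NET, assumes Windows-1252 as the code page, and swaps in an in-memory SQLite table (the name `content` is hypothetical) for SQL Server. It only contrasts the two paths: a lossy code-page conversion versus a value bound as a Unicode parameter.

```python
import sqlite3

text = "Grüße, 日本語"

# Lossy path: pushing the text through a single-byte code page (Windows-1252
# is assumed here) turns every character it cannot represent into '?'.
lossy = text.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)  # Grüße, ???

# Lossless path: bind the value as a parameter so the driver passes the
# Unicode string through unchanged (SQLite stands in for SQL Server here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (body TEXT)")
conn.execute("INSERT INTO content (body) VALUES (?)", (text,))
stored = conn.execute("SELECT body FROM content").fetchone()[0]
print(stored == text)  # True
```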

Related

SQLite Search in Chinese is not working in Windows RT

I am writing a Windows RT app (Windows 8.1) in which I use SQLite for the database. When I change the machine's language setting to Chinese and try to search for an entity with a Chinese name, the query returns null even though the entity exists. When the same query is run in SQLite Manager, it returns the expected entity.
Code used:
var q = string.Format("SELECT Entity.* from Entity where upper(Name) like '%{0}%' or upper(Keywords) like '%{1}%' ", queryString, queryString);
return db.Query<Entity>(q);
The search term could be, for example, "啊".
The search works fine in English. So, do we have to enable something during installation to allow multilingual operation, or does some extra parameter need to be sent with the query to indicate that the language is different?
You probably need to format the string as UTF-8/Unicode. Suggested reading regarding internationalization.
I don't know what language you're using (C#, I'm guessing), but using parameterized queries might help get around encoding issues, and it looks pretty simple to switch to. Bonus points for avoiding SQL injection (see "How do parameterized queries help against SQL injection?"). It's a local app, so it's not too serious... but still, it's bad practice.
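A minimal sketch of the parameterized version, using Python's built-in `sqlite3` module instead of the Windows RT SQLite wrapper. The `Entity`, `Name` and `Keywords` names come from the question; the sample rows are made up. The search text is bound as a parameter rather than formatted into the SQL, and `upper()` is dropped, because SQLite's built-in `upper()` only folds ASCII letters and its default `LIKE` is already case-insensitive for ASCII.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Entity (Name TEXT, Keywords TEXT)")
# Hypothetical sample rows; the real data lives in the app's database.
conn.executemany(
    "INSERT INTO Entity (Name, Keywords) VALUES (?, ?)",
    [("啊哈", "示例"), ("Hello", "greeting")],
)

def search(conn, query_string):
    """Return names whose Name or Keywords contain query_string."""
    # Build the LIKE pattern in code and bind it as a parameter instead of
    # splicing the user's text into the SQL string.
    pattern = f"%{query_string}%"
    sql = "SELECT Name FROM Entity WHERE Name LIKE ? OR Keywords LIKE ?"
    return [row[0] for row in conn.execute(sql, (pattern, pattern))]

print(search(conn, "啊"))     # ['啊哈']
print(search(conn, "hello"))  # ['Hello']
```

If the user's text can contain `%` or `_`, those still act as wildcards; escaping them with an `ESCAPE` clause is left out of this sketch.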

Is IIS performing an illegal character substitution? If so, how to stop it?

Context: ASP.NET MVC running in IIS, with a UTF-8 %-encoded URL.
Using the standard project template, and a test action in HomeController like:
public ActionResult Test(string id)
{
    return Content(id, "text/plain");
}
This works fine for most %-encoded UTF-8 routes, such as:
http://mydevserver/Home/Test/%e4%ba%ac%e9%83%bd%e5%bc%81
with the expected result 京都弁
However using the route:
http://mydevserver/Home/Test/%ee%93%bb
the url is not received correctly.
Aside: %ee%93%bb is the %-encoded form of code point 0xE4FB, which lies in the private-use area of the Basic Multilingual Plane but is ultimately a valid Unicode code point; you can verify this manually, or via:
string value = ((char) 0xE4FB).ToString();
string encoded = HttpUtility.UrlEncode(value); // %ee%93%bb
Now, what happens next depends on the web server: on the Visual Studio Development Server (aka Cassini), the correct id is received, a string of length one containing code point 0xE4FB.
If, however, I do this in IIS or IIS Express, I get a different id, specifically "î“»", code-points: 0xEE, 0x201C, 0xBB. You will immediately recognise the first and last as the start and end of our percent-encoded string... so what happened in the middle?
Well:
code-point 0x93 is “
code-point 0x201c is “
It looks to me very much like IIS has performed some kind of quote translation when processing my URL. This might be useful in a few scenarios (I don't know), but it is certainly a bad thing when it happens in the middle of a %-encoded UTF-8 block.
Note that HttpContext.Current.Request.Raw also shows this translation has occurred, so this does not look like an MVC bug; note also Darin's comment, highlighting that it works differently in the path vs query portion of the url.
So (two-parter):
is my analysis missing some important subtlety of unicode / url processing?
how do I fix it? (i.e. make it so that I receive the expected character)
id = Encoding.UTF8.GetString(Encoding.Default.GetBytes(id));
This will give you your original id.
IIS uses the default (ANSI) encoding for path characters. Your URL-encoded string is decoded using that encoding, which is why you get garbled text back.
To get the original id, convert it back to bytes with that same encoding and then decode the bytes as UTF-8.
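The same round trip can be reproduced outside IIS. This Python sketch assumes the server's ANSI code page is Windows-1252 (typical for an English-locale Windows install) and shows both the misdecoding reported in the question and the byte-level repair described above.

```python
from urllib.parse import unquote_to_bytes

# The UTF-8 bytes behind the percent-encoded path segment from the question.
raw = unquote_to_bytes("%ee%93%bb")
print(raw)  # b'\xee\x93\xbb'

# What IIS appears to do: decode those bytes with the ANSI code page.
misdecoded = raw.decode("cp1252")
print([hex(ord(c)) for c in misdecoded])  # ['0xee', '0x201c', '0xbb']

# The repair: re-encode with the same code page, then decode as UTF-8.
repaired = misdecoded.encode("cp1252").decode("utf-8")
print(repaired == "\ue4fb")  # True
```

The repair only works when every byte has a mapping in the code page. Python's `cp1252` codec, for example, rejects the five unassigned bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D, so a UTF-8 sequence containing one of them cannot be recovered this way.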
See "Unicode and ISAPI Filters":
"ISAPI Filter is an ANSI API - all values you can get/set using the API must be ANSI. Yes, I know this is shocking; after all, it is 2006 and everything nowadays are in Unicode... but remember that this API originated more than a decade ago when barely anything was 32bit, much less Unicode. Also, remember that the HTTP protocol which ISAPI directly manipulates is in ANSI and not Unicode."
EDIT: Since you mentioned that it works with most other characters, I'm assuming that IIS has some sort of encoding-detection mechanism which is failing in this case. As a workaround, you can prefix your id with this character and then easily detect whether the problem occurred (the character will be missing). Not an ideal solution, but it will work. You can then write a custom model binder and a wrapper class in ASP.NET MVC to keep your consuming code clean.
Once upon a time, URLs themselves were not in UTF-8; they were in the ANSI code page. This reflected the fact that they were often used to select, well, pathnames in the server's file system. Back then, IE even had an option controlling whether URLs were sent as UTF-8.
Perhaps buried in the bowels of the IIS config there is a place to specify the URL encoding, and perhaps not.
Ultimately, to get around this, I had to use request.ServerVariables["HTTP_URL"] and some manual parsing, with a bunch of error-handling fallbacks (additionally compensating for some related glitches in Uri). Not great, but only affects a tiny minority of awkward requests.
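The raw-URL approach can be sketched as follows. This is Python rather than the poster's C#, the function name `id_from_raw_url` is hypothetical, and it assumes the id is the last path segment (as in the /Home/Test/{id} routes above). The poster's actual error-handling fallbacks are not described, so the lossy fallback here is only a placeholder.

```python
from urllib.parse import unquote, urlsplit

def id_from_raw_url(raw_url: str) -> str:
    """Extract and UTF-8-decode the last path segment of a raw request URL."""
    path = urlsplit(raw_url).path                  # ignore any query string
    segment = path.rstrip("/").rsplit("/", 1)[-1]
    try:
        # Decode the percent-escapes as UTF-8, refusing malformed input.
        return unquote(segment, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        # Placeholder fallback: substitute U+FFFD for malformed sequences.
        return unquote(segment, encoding="utf-8", errors="replace")

print(id_from_raw_url("/Home/Test/%ee%93%bb") == "\ue4fb")  # True
print(id_from_raw_url("/Home/Test/%e4%ba%ac%e9%83%bd%e5%bc%81?x=1"))  # 京都弁
```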

Getting Hindi characters from a textbox and storing them in a database

I am using ASP.NET and C# in my application, with MySQL as the database. I want to take input from the user in Hindi, store it in the database, and retrieve it.
When I store Hindi characters directly in the MySQL database it works fine, but when I use a textbox to input Hindi characters it shows ?????????.
I guess the problem is that the .aspx page is not set up to support Hindi characters. Please tell me how to achieve this.
I'd guess that using UTF-8 encoding for your HTTP requests and responses would solve it. What are your requestEncoding and responseEncoding in your Web.config file currently set to?
See more on the <globalization> tag here:
http://msdn.microsoft.com/en-us/library/hy4kkhe0(v=VS.100).aspx
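Try this:
using System.Data;
using System.Data.SqlClient;
// mytable has 2 fields: id (auto increment), title (nvarchar(max))
string title = "बिलाल";
// Same effect as the Unicode literal insert into mytable values (N'...'), but parameterized
SqlCommand cmd = new SqlCommand("insert into mytable (title) values (@title)", con);
cmd.Parameters.Add("@title", SqlDbType.NVarChar, -1).Value = title; // -1 = nvarchar(max)
con.Open();
cmd.ExecuteNonQuery();
con.Close();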
Haha... Oh, the memories (and I only had to deal with Spanish, which fits into the default latin1).
I don't know the Microsoft side of the stack, but I assume the solutions are the same kind as in Java. Namely, you should always assume UTF-8, and therefore always declare UTF-8 in the Content-Type of your HTML responses, so that browsers know to encode POST data as UTF-8. You should also always inspect the encoding of incoming HTML POSTs, just in case a client ignored the encoding of the HTML form (someone might be using curl/wget/a custom browser). You need to learn how, in MS-land, to convert from one encoding into UTF-8 (in Java, for reference, we just write String s = new String(bytes, encoding_name)).
Assuming the Microsoft stack uses UTF-16 or UTF-32 or whatever internally, so that UTF-8 is easy to extract, next comes the MySQL layer.
This involves two things:
1) The column encoding MUST be set to UTF8. It's not at all obvious how to do it, and even the spelling is annoying. Just google it: "create database foo default character set UTF8" (approximate syntax), or, if you're worried for some reason, do it at the table level: "create table foo (..) character set UTF8" (approximate syntax). Or, if the table already exists, convert EVERY column that can take arbitrary web-form text (possibly including the login name, but not columns like enumerated varchars, since that would waste index space, even though you'd think it wouldn't): "alter table foo modify name varchar(255) character set UTF8".
2) You MUST make the database connection (ODBC; JDBC in Java; I don't know the MS equivalent) encode all incoming and outgoing characters as UTF-8. There are two parameters I set, use-unicode and character-set=UTF-8 (approximate parameter names; in MySQL Connector/J they are useUnicode=true and characterEncoding=UTF-8).
Google it all, but this should point you in the right direction.
Test the existing DB by connecting to MySQL with both character-set=UTF8 and latin1. You'll see totally different output in your text data when connected with each encoding. If you're lucky, the data already went in correctly. Otherwise you'll have to regenerate ALL the data, or perform some very clever character-conversion hacks like I had to do once upon a time (painful stuff).
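The effect of the wrong connection character set can be imitated without a database. In this Python sketch, the `latin-1` codec stands in for MySQL's latin1 (which is actually closer to Windows-1252, but the difference does not matter for these bytes). It shows that converting Hindi into latin1 yields exactly the kind of ????? seen in the question, that the same UTF-8 bytes read as latin1 look like unrelated text, and one common form of the conversion hack mentioned above.

```python
hindi = "बिलाल"

# Converting into a single-byte charset that has no Devanagari letters
# replaces each code point with '?'.
print(hindi.encode("latin-1", errors="replace"))  # b'?????'

# The same UTF-8 bytes, read back as latin-1, look like unrelated text.
utf8_bytes = hindi.encode("utf-8")
as_latin1 = utf8_bytes.decode("latin-1")
print(as_latin1 == hindi)  # False

# If the bytes were stored intact, reversing the wrong decode recovers them.
recovered = as_latin1.encode("latin-1").decode("utf-8")
print(recovered == hindi)  # True
```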

ASP.NET problem with Regional & Language Options (Windows Server 2003)

I have a TextBox in an ASP.NET form, and a simple JavaScript function which separates every three digits in the TextBox. It works fine when you enter data in the TextBox. I used a comma (,) to separate digit groups and a dot (.) as the decimal separator.
As I said, everything works fine while I am entering data in the TextBox, but when a postback occurs and the saved data returns to the client, every dot has been removed (for example, 2.3 is saved as 23), and the digits in the TextBox are separated by . instead of ,.
This problem occurs only on one specific server (Windows Server 2003 SP1); everything works fine on other Windows Server 2003 SP1 machines. This is the first time I have experienced this problem!
I think the problem is caused by the server's specific Regional and Language Options. This server is joined to a domain. When I change the regional and language options to these settings:
Decimal Symbol -> .
Digit Grouping Symbol -> ,
nothing changes.
When I also check the following option after customizing the settings:
Apply All Settings to the current user account and to the default user profile -> checked
and then restart the server, it drops out of the domain and has to be re-joined to the domain controller! And of course, still nothing changes!
Has anyone had this problem? Any solution, please!
I cannot post code here because the code is too complex, and I am sure the problem is not in the code, because it works everywhere except on this particular server.
EDIT
Setting the regional and language options for the Network Service user might also help solve the problem. Does anybody know how I can do this?
Have you tried using the globalization tag in your web.config? It keeps you out of trouble when multiple servers are configured differently (i.e., different language packs).
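The symptom (2.3 coming back as 23, with . used to group digits) is what a number parser would produce if it used a culture whose group separator is . and whose decimal separator is ,. The Python function below is a deliberately simplified, hypothetical model of culture-aware parsing, not how .NET implements it; it only shows how the same text yields different values depending on those two settings.

```python
def parse_localized(text: str, group_sep: str, decimal_sep: str) -> float:
    """Simplified, hypothetical culture-aware parse: drop group separators,
    then treat decimal_sep as the decimal point."""
    normalized = text.replace(group_sep, "").replace(decimal_sep, ".")
    return float(normalized)

# What the client-side script intends: ',' groups digits, '.' is the decimal point.
print(parse_localized("1,234.5", group_sep=",", decimal_sep="."))  # 1234.5

# A server culture (assumed) where '.' groups digits and ',' is the decimal point.
print(parse_localized("2.3", group_sep=".", decimal_sep=","))  # 23.0
```

Pinning the culture explicitly, in web.config or on the thread, keeps the browser script and the server agreeing on these separators.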
<configuration>
<system.web>
<globalization
culture="en-US"
uiCulture="en-US" />
</system.web>
</configuration>
After goofing around with a similar problem for WAY too long, I did the following with the help of a number of clues (also found on Stack Overflow, which rocks, by the way...).
The first thing I did was dump out what the server was actually thinking (Page_Load):
var dtInfo = System.Globalization.DateTimeFormatInfo.CurrentInfo;
DisplayDebugInfo(String.Format(
"Culture({0}/{1}), DateFormat(SD:{2},DS:{3})",
System.Globalization.CultureInfo.CurrentCulture.Name,
System.Globalization.CultureInfo.CurrentUICulture.Name,
dtInfo.ShortDatePattern, dtInfo.DateSeparator));
I'm also on Windows 2003; I tried fixing the regional settings via the regular Control Panel, but with no success.
I also tried setting the globalization settings in the web.config as mentioned in the other solution but with little effect.
It seems that once you start messing with the regional settings, you can quickly get to the point where things are messed up. I decided to avoid editing the registry and go for a code solution, because then I would not have to worry when my code was released to production.
I added the following code to the base class for my page so that it would fix it everywhere. You could also place it in the Page_Load.
using System.Globalization;
using System.Threading;
// Fix the cultural settings...
CultureInfo culture = (CultureInfo)CultureInfo.CurrentCulture.Clone();
culture.DateTimeFormat.ShortDatePattern = "MM/dd/yyyy";
culture.DateTimeFormat.DateSeparator = "/";
Thread.CurrentThread.CurrentCulture = culture;
Problem solved. For me anyway.

ASP.NET special character problem

I'm building an automated RSS feed in ASP.NET and occurrences of apostrophes and hyphens are rendering very strangely:
"Here's a test" is rendering as "Here’s a test"
I have managed to circumvent a similar problem with the pound sign (£) by building the HTML entity for £ manually (escaping the ampersand), as shown in the extract below:
sArticleSummary = sArticleSummary.Replace("£", "&pound;")
However, the following attempt fails to resolve the apostrophe issue; we still get ’ on the screen.
sArticleSummary = sArticleSummary.Replace("’", "&#146;")
The string in the database (SQL Server 2005) appears, for all intents and purposes, to be plain text. Can anyone advise why what seem to be plain-text strings keep coming out like this? Any ideas on how to resolve the apostrophe issue would be appreciated.
Thanks for your help.
[EDIT]
Following Vladimir's help, it now looks as though the problem is that somewhere between the database and the string variable, the data is being converted from an apostrophe to ’. Has anyone seen this happen before, or does anyone have any pointers?
Thanks
I would guess that the column in your SQL 2005 database is defined as varchar(N), char(N) or text. If so, the conversion is due to the database driver using a different code page from the one set in the database.
I would recommend changing this column (and any others that may contain non-ASCII data) to nvarchar(N), nchar(N) or nvarchar(max) respectively, which can hold any Unicode code point, not just those defined by the code page.
All of my databases now use nvarchar/nchar exclusively to avoid these kinds of encoding issues. The Unicode fields use twice as much storage space, but there'll be very little performance difference if you use this technique (the SQL engine uses Unicode internally).
It transpires that the data (while showing as plain text in SQL Server) actually contains some MS Word special characters.
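One plausible mechanism, though the thread does not confirm it, is classic mojibake: MS Word's curly apostrophe (U+2019) encoded as UTF-8 in one place and decoded as Windows-1252 in another. This Python sketch reproduces that, and also shows that the numeric character reference for U+2019 is &#8217;, while 146 is only its byte value in Windows-1252.

```python
apostrophe = "\u2019"  # MS Word's curly apostrophe (RIGHT SINGLE QUOTATION MARK)
smart = f"Here{apostrophe}s a test"

# Its UTF-8 bytes (E2 80 99), misread as Windows-1252, become three characters.
garbled = smart.encode("utf-8").decode("cp1252")
print(garbled)  # Hereâ€™s a test

# A numeric character reference uses the Unicode code point, not the cp1252 byte.
print(apostrophe.encode("cp1252"))  # b'\x92'
print(f"&#{ord(apostrophe)};")      # &#8217;
```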
Assuming you get Unicode characters from the database, the easiest way is to let System.Xml.dll take care of the conversion for you by building the RSS feed with an XmlDocument object. (I'm not sure about the exact elements in an RSS feed.)
XmlDocument rss = new XmlDocument();
rss.LoadXml("<?xml version='1.0'?><rss />");
XmlElement element = rss.DocumentElement.AppendChild(rss.CreateElement("item")) as XmlElement;
element.InnerText = sArticleSummary;
or with LINQ to XML (System.Xml.Linq):
XDocument rss = new XDocument(
new XElement("rss",
new XElement("item", sArticleSummary)
)
);
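For comparison, the same idea with Python's standard `xml.etree.ElementTree`: assign the text to an element and let the serializer handle escaping and encoding. The rss/item structure mirrors the C# snippets above and is not a complete RSS 2.0 document; the summary string is made up.

```python
import xml.etree.ElementTree as ET

summary = "Here\u2019s a test & £5 <b>off</b>"

rss = ET.Element("rss")
item = ET.SubElement(rss, "item")
item.text = summary  # plain text; the serializer escapes &, < and >

xml_bytes = ET.tostring(rss, encoding="utf-8", xml_declaration=True)
print(xml_bytes.decode("utf-8"))

# Parsing the bytes back yields the original string unchanged.
print(ET.fromstring(xml_bytes).find("item").text == summary)  # True
```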
I would just put "Here's a test" into a CDATA section. Easy, and it works.
<![CDATA[Here's a test]]>
