I have a link with the URL /laws/document?ref=S-AL1_2_36458.
When I click on it, weird characters are appended to the URL. The characters are:
%E2%80%8B%E2%80%8B%E2%80%8B%E2%80%8B%E2%80%8B
So the URL ends up being:
/laws/document?ref=S-AL1_2_36458%E2%80%8B%E2%80%8B%E2%80%8B%E2%80%8B%E2%80%8B
When I encode the original link in JS (using encodeURI()), it returns the same weird link. However, when I encode just the "ref" value,
which is S-AL1_2_36458, no weird characters are added.
%E2%80%8B is the URL-encoded sequence for the Unicode character ZERO WIDTH SPACE (U+200B). It appears five times in your URL.
So it seems that the link isn't actually just /laws/document?ref=S-AL1_2_36458 but /laws/document?ref=S-AL1_2_36458<ZWSP><ZWSP><ZWSP><ZWSP><ZWSP>, which the browser then percent-encodes in order to make the request. This behavior is expected and correct if that is the actual URL being navigated to.
If this is not the URL you want, I suggest checking the HTML of the link. If it is static HTML, check whether there are any characters at the end of the URL and remove them. Since they are zero-width spaces, they are effectively invisible, but they are still there (e.g. in a text editor you will notice them when moving the cursor, because it takes extra key presses to get past them). If the link is generated, check where the value comes from and strip these characters at the source.
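As a rough illustration of the explanation above, here is a minimal Python sketch (Python is used only for demonstration; the question itself is about a browser and JS). The `href` value is a hypothetical reconstruction of what the page source probably contains, and `strip_zero_width` is an illustrative helper name, not part of any library.

```python
from urllib.parse import quote

ZWSP = "\u200b"  # ZERO WIDTH SPACE (U+200B)

# Hypothetical href as it might appear in the page source: the visible
# text followed by five invisible zero-width spaces (an assumption).
href = "/laws/document?ref=S-AL1_2_36458" + ZWSP * 5

# Percent-encoding the string (UTF-8) reproduces the sequence from the
# question: each U+200B becomes %E2%80%8B.
encoded = quote(href, safe="/?=")
print(encoded)


def strip_zero_width(value: str) -> str:
    """Remove every zero-width space from the value (illustrative helper)."""
    return value.replace(ZWSP, "")


clean = strip_zero_width(href)
print(quote(clean, safe="/?="))

# U+200B is not classified as whitespace in Python, so a plain strip()
# leaves it in place; it has to be removed explicitly.
print(href.strip() == href)
```

The last line is the reason an ordinary trim may not be enough: the character has to be targeted by its code point.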
Hello, I'm not trying to use spaces or anything like that. All my URLs are standard, with dash-separated words, but the characters are Persian, so instead of
/%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C
I want to see
/آرایشی
The links are already saved in the second form, but when they are shown on the web page they are automatically encoded.
My CMS can handle requests like the second one and redirects automatically. I've tried changing the config file and some globalization settings, but no luck yet.
I built a static site initially and am now in the process of converting it to a WordPress site. You can find it here. The last image in the right column, when clicked, should open a fancybox and play a video. It worked very well in the static site, but for some reason in WordPress the box appears at the bottom of the page instead of in the center. I'm pretty sure the page is seeing the CSS, because I can click on the stylesheet link and find it.
This is the result of validating your page:
http://validator.w3.org/check?uri=http://training.mercury.stellarbluewebdesign.com/LittlestTumorFoundation/
Notice the comment:
Byte-Order Mark found in UTF-8 File.
The Unicode Byte-Order Mark (BOM) in UTF-8 encoded files
is known to cause problems for some text editors and older
browsers.
Also notice:
Line 1, Column 1: Non-space characters found without seeing a
doctype first. Expected <!DOCTYPE html>
Passing your code through an editor in ANSI mode (and showing all symbols), this is what I get:
Those hidden characters preceding the DOCTYPE in your document make your browser run in quirks mode, hence the unexpected behavior of fancybox (which needs the document to be in standards mode to run properly).
What you have to do is save your WP (PHP) files in an editor using the "UTF-8 without BOM" encoding and upload them again (optionally forcing your FTP software to upload in binary mode).
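If an editor with a "UTF-8 without BOM" option is not at hand, the same check and fix can be sketched in a few lines of Python. This is only an illustration under assumptions: `has_utf8_bom` and `strip_utf8_bom` are hypothetical helper names, and `sample` stands in for the bytes of one of the affected files.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Stand-in for the content of a file saved as "UTF-8 with BOM".
sample = codecs.BOM_UTF8 + b"<!DOCTYPE html>\n<html></html>\n"

print(has_utf8_bom(sample))              # the BOM precedes the DOCTYPE
print(strip_utf8_bom(sample)[:15])       # the DOCTYPE is now the first thing

# Decoding with "utf-8-sig" also drops the BOM when reading text.
print(sample.decode("utf-8-sig")[:15])
```

For real files, the bytes would be read with `open(path, "rb")` and written back the same way; make a backup first, since the sketch does no error handling.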
When I put this word "Bibliothèque" in a .aspx page, I see it correctly "Bibliothèque".
If I put the same word in a .html file, it is displayed garbled (the "è" comes out as the wrong characters).
How can this be possible? It must be an IIS issue, but I can't find the setting.
How can an .aspx file show the right word but not an .html file?
Open the file named web.config in the ASP.NET project. The value of the requestEncoding attribute in the globalization element is "utf-8". This means request text is treated as UTF-8.
Check which character encoding your browser is using for the page; you can change it from the browser's character encoding menu. Your HTML file is rendered according to the encoding the browser picks.
To ensure it will always work, for this specific example, you can replace the non-ASCII characters with HTML entities, like this: Biblioth&egrave;que. But this is not always practical in general.
Otherwise, there are various other ways to make it work:
save the file as UTF-8 with a byte order mark (sometimes called 'signature', or BOM, by editors)
add a META character encoding declaration to your HTML file.
define what HTTP headers will be sent to the client using the globalization element in the application web.config (responseEncoding, etc.)
define what HTTP headers will be sent to the client using the ASP.NET @Page directive
The best is to make sure all this is consistent in your application. UTF-8 support is now widespread, so it's a good choice as the encoding.
An interesting article on the encoding subject: The Definitive Guide to Web Character Encoding
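To make the mismatch concrete, here is a small Python sketch of what happens when UTF-8 bytes are read with a different single-byte encoding. It assumes the fallback encoding is Windows-1252 (a common browser default, but an assumption here, since the question does not say which encoding was picked). It also shows the entity workaround, using a numeric character reference rather than the named &egrave; form.

```python
import html

word = "Biblioth\u00e8que"  # "Bibliothèque"

# The file on disk, saved as UTF-8: the "è" takes two bytes (C3 A8).
utf8_bytes = word.encode("utf-8")

# Assumed failure mode: the client decodes those bytes as Windows-1252,
# so each of the two bytes is shown as its own character.
misread = utf8_bytes.decode("cp1252")
print(misread)

# Entity workaround: the output is pure ASCII, so it survives any
# ASCII-compatible encoding guess.
entity_form = word.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(entity_form)

# Both the numeric and the named entity decode back to the same word.
print(html.unescape(entity_form) == word)
print(html.unescape("Biblioth&egrave;que") == word)
```

Declaring the encoding consistently (BOM, META, or HTTP header) removes the guess, which is why the options listed above all work.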
On our site, I use the category (in Russian) in the querystring.
E.g.: http://www.odinklik.ru/kategoriya.aspx?cat=люди
If you paste this link in IE8, it is translated to cat=???? and it does not work.
If I paste it in Firefox, it works.
It gets even weirder: the same URL is reachable from the homepage, and if I click the same URL in IE8 from the homepage, it works fine (unless I choose "open in a new tab", in which case it is back to ????).
I am using ASP.NET 3.5 (C#).
Did you try encoding on the first page, converting from Cyrillic to Unicode and back?
It's a bit of a headache, but it will surely work.
You should URL-encode the category name before adding it to the querystring, rather than relying on the browser to do that for you.
The method HttpServerUtility.UrlEncode should be able to handle this encoding for you.
HttpServerUtility.UrlEncode should give you the link http://www.odinklik.ru/kategoriya.aspx?cat=%D0%BB%D1%8E%D0%B4%D0%B8, which should give you the correct result.
(Note that %D0%BB corresponds to л, %D1%8E to ю, %D0%B4 to д and %D0%B8 to и. Cyrillic characters lie in the range U+0400 to U+04FF, which UTF-8 encodes as two bytes each, so every character needs two percent-encoded bytes.)
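The byte values quoted above can be checked with a short Python sketch. Note that this uses Python's `urllib.parse` as a stand-in, not HttpServerUtility.UrlEncode itself; both percent-encode the UTF-8 bytes for this input, but their handling of other characters (such as spaces) may differ.

```python
from urllib.parse import quote, urlencode

category = "\u043b\u044e\u0434\u0438"  # "люди"

# Each Cyrillic letter becomes two percent-encoded UTF-8 bytes.
per_char = [quote(ch) for ch in category]
print(per_char)

# Build the query string before emitting the link, instead of leaving
# the encoding to the browser.
query = urlencode({"cat": category})
url = "http://www.odinklik.ru/kategoriya.aspx?" + query
print(url)
```

Because the resulting link is pure ASCII, it no longer depends on which encoding a particular browser chooses for a pasted URL.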
According to an old AntiXss article on MSDN, AntiXss.UrlEncode is used to encode a link's href (Untrusted-input in the following example):
Click Here!
My understanding was that UrlEncode should be used only when putting something into a URL, such as when setting document.location with JS. So why shouldn't I use HtmlAttributeEncode in the previous example to encode [Untrusted-input]? On the other hand, is there a security flaw if I use UrlEncode to encode HTML attributes, as in the sample above?
URL encoding encodes URL parameters for use in anchor tags.
HTML attribute encoding encodes values for use in general HTML attributes.
The two encodings differ: in HTML attribute encoding, unsafe characters are turned into the &xxx; form, while in URL encoding they are turned into %xx. Whilst getting it wrong is probably unlikely to cause a security problem, your data wouldn't be properly rendered in the browser, or understood in a request.
(Indeed Url encoding is probably going to change because of an incompatibility with older browsers, and HTML Encoding will change in the next CTP drop to allow for safe listing of particular Unicode ranges).
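The difference between the two output forms can be seen in a short Python sketch. This is not the AntiXss library: `html.escape` and `urllib.parse.quote` from the Python standard library are used as stand-ins, and their exact sets of escaped characters differ from AntiXss, so treat the output only as an illustration of the &xxx; versus %xx distinction. The `untrusted` value is made up for the example.

```python
import html
from urllib.parse import quote

# Made-up input containing characters that matter in both contexts.
untrusted = 'a b&c="x"<y>'

# HTML (attribute) style encoding: unsafe characters become entities.
as_html = html.escape(untrusted, quote=True)
print(as_html)

# URL style encoding: unsafe characters become %XX byte escapes.
as_url = quote(untrusted, safe="")
print(as_url)
```

A value placed in a query string needs the second form so the server can parse it; a value placed in an ordinary attribute needs the first so the browser displays it as written.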