invalid pixel in Firefox because of content charset setting in Netty server

I am developing an HTTP server with Netty. On some occasions, the server must answer with a 1x1 transparent pixel. So I hard-coded a GIF transparent pixel in base64 and returned it with the following code:
String pixel_string = new String(Base64.decodeBase64("R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="));
HttpResponse response = new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK);
response.setContent(ChannelBuffers.copiedBuffer(pixel_string, CharsetUtil.UTF_8));
EDIT: I also set the content type:
response.setHeader(HttpHeaders.Names.CONTENT_TYPE, "image/gif");
In Chrome, everything is fine. However, Firefox tells me that it cannot display the pixel (which is pretty bad for my app) because the pixel data is invalid.
After much investigation, I finally figured out a fix: changing the charset to ISO-8859-1.
response.setContent(ChannelBuffers.copiedBuffer(
    pixel_string, CharsetUtil.ISO_8859_1));
I don't understand why it works, which makes me think that I may run into trouble in some cases. I tried changing the Firefox preferences (to have UTF-8 as default), but it doesn't change much.
Why does Firefox accept the ISO-8859-1 encoding and not UTF-8? Can I change that? Does anyone have a clue about the origin of the issue, and how can I be sure that it will work whatever the user's settings?
Thanks

It's not Firefox that's accepting the encoding or not. It's your server.
When you do your base64 decode you produce a string that contains some characters... but what you really produced was bytes that you're then thinking of as characters somehow. Since a Java String is a container that holds a UTF-16 string, in practice what you're doing is taking each byte, treating it as a 16-bit integer and constructing the UTF-16 "string" made up of those code units.
But when you want to put all this on the network, you have to convert your string to bytes, and the argument to copiedBuffer says how to do that. If converting to UTF-8, any character that came from a byte that had the high bit set will end up encoded as a two-byte UTF-8 sequence. On the other hand, if converting to ISO-8859-1, the conversion just drops the high byte of each UTF-16 code unit (which in your case is always zero anyway).
So the conversion to ISO-8859-1 produces the actual byte array you got out of base64-decoding, while the conversion to UTF-8 produces... something else, which may or may not actually make any sense depending on the exact byte values.
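To make the difference concrete, here is a minimal, self-contained Java sketch (assuming Commons Codec for the base64 step, as in the question, and using ISO-8859-1 explicitly to mirror the byte-per-code-unit behaviour described above):
import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.binary.Base64;

public class CharsetRoundTrip {
    public static void main(String[] args) {
        byte[] gif = Base64.decodeBase64("R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==");
        // Interpret each byte as one UTF-16 code unit, as described above.
        String asString = new String(gif, StandardCharsets.ISO_8859_1);
        byte[] iso = asString.getBytes(StandardCharsets.ISO_8859_1); // identical to gif
        byte[] utf8 = asString.getBytes(StandardCharsets.UTF_8);     // high-bit bytes become 2-byte sequences
        // iso.length equals gif.length; utf8.length is larger, and the result is no longer a valid GIF.
        System.out.println(gif.length + " / " + iso.length + " / " + utf8.length);
    }
}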

The copiedBuffer method you are calling is not appropriate for the type of data (binary) you are using. According to the JavaDoc of the Netty API, the overload you are calling:
Creates a new big-endian buffer whose content is the specified string
encoded in the specified charset.
Which means that your binary data is being "converted" to UTF-8 (which is meaningless). If you try to save the generated file and look at it with a hex editor, you'll probably see that it is corrupted.
Try with something like this (untested code):
static byte[] pixel_data = Base64.decodeBase64("R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==");
HttpResponse response = ...
response.setHeader(HttpHeaders.Names.CONTENT_TYPE, "image/gif");
response.setContent(ChannelBuffers.copiedBuffer(pixel_data));
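If you build on this, it's also worth setting the Content-Length header (as Netty 3's own HTTP examples do), so keep-alive clients know where the body ends:
response.setHeader(HttpHeaders.Names.CONTENT_LENGTH, response.getContent().readableBytes());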

Related

Are there other encoding methods besides base64 that end in "="?

I've inherited a project where the previous developer used an ASP object called ActiveCrypt.Crypt to encrypt the user's password before sending it to the database.
The call uses the encryptvariant() function with a mode of 7; the only documentation I can find indicates that the encryption is 3DES (the company is now defunct). The problem is that the value produced by the function appears to be a base64-encoded string (the trailing "=" and "==" are a dead give-away).
Are there any other encodings that frequently end in "=" or "=="? Is anyone familiar with this ActiveCrypt object? I've tried 3DES-encrypting the password with the key and then converting to base64, but with no luck. I've also tried swapping the key and the password in case the developer reversed the arguments. Any help would be appreciated.
Some examples using the key "key" (without quotes)
abcdefg: xiupz3RT148=
123456: iDLXPSPPjd4=
test: AWulSF10FR0=
1234567890: 8I48MAg9YWvE3y52VfMYew==
The encodings you show look like 8 and 16 bytes encoded with normal base64. Base64 encodes 3 bytes using 4 characters. DES and 3DES operate with a block size of 8 bytes, so the sizes of the base64 text seem to reflect the block size. Furthermore, the output of the base64 decoding looks fully random.
So after base64 decoding you will have 8 or 16 bytes, which you then will have to decrypt. The key is of course unknown to us, as is the block mode of operation and the padding mode. So you will have to find out those yourself. If the key is not given, it could be hard coded within the application.
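As a quick sanity check on the length argument, a small Java sketch (using java.util.Base64; the sample values are the ones from the question) confirms that the decoded sizes are whole multiples of the 8-byte DES/3DES block:
import java.util.Base64;

public class BlockSizeCheck {
    public static void main(String[] args) {
        String[] samples = { "xiupz3RT148=", "iDLXPSPPjd4=", "AWulSF10FR0=", "8I48MAg9YWvE3y52VfMYew==" };
        for (String s : samples) {
            // Prints 8, 8, 8 and 16 bytes respectively.
            System.out.println(s + " -> " + Base64.getDecoder().decode(s).length + " bytes");
        }
    }
}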
Happy hunting.

Is IIS performing an illegal character substitution? If so, how to stop it?

Context: ASP.NET MVC running in IIS, with a UTF-8 %-encoded URL.
Using the standard project template, and a test-action in HomeController like:
public ActionResult Test(string id)
{
return Content(id, "text/plain");
}
This works fine for most %-encoded UTF-8 routes, such as:
http://mydevserver/Home/Test/%e4%ba%ac%e9%83%bd%e5%bc%81
with the expected result 京都弁
However using the route:
http://mydevserver/Home/Test/%ee%93%bb
the url is not received correctly.
Aside: %ee%93%bb is the %-encoding of code-point 0xE4FB: basic multilingual plane, private-use area, but ultimately a valid Unicode code-point; you can verify this manually, or via:
string value = ((char) 0xE4FB).ToString();
string encoded = HttpUtility.UrlEncode(value); // %ee%93%bb
Now, what happens next depends on the web-server; on the Visual Studio Development Server (aka cassini), the correct id is received - a string of length one, containing code-point 0xE4FB.
If, however, I do this in IIS or IIS Express, I get a different id, specifically "î“»", code-points: 0xEE, 0x201C, 0xBB. You will immediately recognise the first and last as the start and end of our percent-encoded string... so what happened in the middle?
Well:
in the Windows-1252 (ANSI) code page, byte 0x93 is “
Unicode code-point 0x201C is that same “
It looks to me very much like IIS has performed some kind of quote-translation when processing my url. Now maybe this might have uses in a few scenarios (I don't know), but it is certainly a bad thing when it happens in the middle of a %-encoded UTF-8 block.
Note that HttpContext.Current.Request.RawUrl also shows this translation has occurred, so this does not look like an MVC bug; note also Darin's comment, highlighting that it behaves differently in the path vs the query portion of the URL.
So (two-parter):
is my analysis missing some important subtlety of unicode / url processing?
how do I fix it? (i.e. make it so that I receive the expected character)
id = Encoding.UTF8.GetString(Encoding.Default.GetBytes(id)); // re-encode with the ANSI default, then decode as UTF-8
This will give you your original id.
IIS uses the default (ANSI) encoding for path characters. Your URL-encoded string is decoded using that encoding, which is why you're getting a weird string back.
To get the original id you can convert it back to bytes and get the string using utf8 encoding.
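The same recovery trick, sketched in Java for illustration (Windows-1252 stands in for the server's ANSI code page; this only works because that code page happens to round-trip these particular bytes):
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReDecode {
    public static void main(String[] args) {
        // Simulate the server decoding the UTF-8 bytes EE 93 BB with the ANSI code page.
        Charset ansi = Charset.forName("windows-1252");
        String mangled = new String(new byte[] { (byte) 0xEE, (byte) 0x93, (byte) 0xBB }, ansi); // "î“»"
        // Undo the wrong decode, then decode the recovered bytes as UTF-8.
        String fixed = new String(mangled.getBytes(ansi), StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(fixed.charAt(0))); // e4fb
    }
}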
See Unicode and ISAPI Filters
ISAPI Filter is an ANSI API - all values you can get/set using the API
must be ANSI. Yes, I know this is shocking; after all, it is 2006 and
everything nowadays are in Unicode... but remember that this API
originated more than a decade ago when barely anything was 32bit, much
less Unicode. Also, remember that the HTTP protocol which ISAPI
directly manipulates is in ANSI and not Unicode.
EDIT: Since you mentioned that it works with most other characters, I'm assuming that IIS has some sort of encoding-detection mechanism which is failing in this case. As a workaround, you can prefix your id with this char and then easily detect whether the problem occurred (if this char is missing). Not an ideal solution, but it will work. You can then write a custom model binder and a wrapper class in ASP.NET MVC to keep your consuming code cleaner.
Once upon a time, URLs themselves were not in UTF-8. They were in the ANSI code page. This reflects the fact that they are often used to select, well, pathnames in the server's file system. In ancient times, IE had an option to tell whether you wanted to send UTF-8 URLs or not.
Perhaps buried in the bowels of the IIS config there is a place to specify the URL encoding, and perhaps not.
Ultimately, to get around this, I had to use request.ServerVariables["HTTP_URL"] and some manual parsing, with a bunch of error-handling fallbacks (additionally compensating for some related glitches in Uri). Not great, but only affects a tiny minority of awkward requests.
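For reference, the essential step in such manual parsing is to turn the %XX sequences back into raw bytes and only then decode those bytes as UTF-8. A rough Java sketch of that step (names are illustrative, and error handling is omitted):
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class PercentDecode {
    // Collect the raw bytes first, then decode the whole buffer as UTF-8.
    static String decodePath(String raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '%' && i + 2 < raw.length()) {
                out.write(Integer.parseInt(raw.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write((byte) c);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String id = decodePath("/Home/Test/%ee%93%bb");
        System.out.println(Integer.toHexString(id.charAt(id.length() - 1))); // e4fb
    }
}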

dealing with an encrypted HttpUtility.UrlEncode parameter

I have a problem dealing with encrypted URL parameters when applying HttpUtility.UrlEncode or UrlDecode.
For a given URL string: ?fid=7kqguwhYMNw=&uid=YCRSGG71+58=
the plus sign, which is part of the encrypted data of uid, is stripped out and replaced with a space, so my attempts to decrypt it fail.
OK, so I know that the + is reserved shorthand for a space in the query string (RFC 1630), but since I don't have much control over the value that is returned from encryption, how can I get around this?
EDIT:
OK, a good point was brought up. Ignore the UrlEncode/UrlDecode part of the question. Request.QueryString["uid"] will still have the plus sign stripped out of it when I pass it to my decryption method.
I would suggest adding code to remove the = characters, replace + with -, and replace / with .
s = s.Replace("=", "").Replace("+", "-").Replace("/", ".")
If you need to process the resulting string, you can do the reverse:
s = s.Replace(".", "/").Replace("-", "+")
(there is no reason to put back the = characters... they are merely padding).
That way you don't need to worry about URL encoding and decoding and it avoids unnecessary expansion of your string. It also looks more professional to users if they end up seeing the URL... percent signs in URL are ugly and almost always unnecessary... it screams "amateur" whenever I see them.
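For what it's worth, this substitution idea is standardized as the "URL-safe" base64 alphabet (which maps + to - and / to _, rather than + to - and / to . as suggested above). In Java, for instance, it's built in:
import java.util.Base64;

public class UrlSafeBase64 {
    public static void main(String[] args) {
        byte[] data = { (byte) 0x60, 0x24, 0x52, 0x18, (byte) 0xFB, (byte) 0xEF };
        // withoutPadding() also drops the trailing '=' characters.
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(data);
        System.out.println(token); // never contains '+', '/' or '=', so it is safe in a URL as-is
        byte[] back = Base64.getUrlDecoder().decode(token); // the decoder accepts unpadded input
    }
}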
The Base-64 encoded value needs to be URL-encoded before it is put in the URL. If I do HttpUtility.UrlEncode("YCRSGG71+58=") then I get YCRSGG71%2b58%3d - which has no plus signs, and can be correctly decoded.
In other words, the code that is putting a base-64 value on the URL without encoding it first is wrong. If you control that code, you should change it. If you don't control that code, then don't try to decode something that wasn't url-encoded in the first place.
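The equivalent encode-before-use step sketched in Java (reusing the question's value for illustration; the Charset overloads require Java 10+):
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeParam {
    public static void main(String[] args) {
        String encoded = URLEncoder.encode("YCRSGG71+58=", StandardCharsets.UTF_8);
        System.out.println(encoded); // YCRSGG71%2B58%3D - the '+' now survives transport
        System.out.println(URLDecoder.decode(encoded, StandardCharsets.UTF_8)); // YCRSGG71+58=
    }
}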
As a side remark, you should use HttpUtility.UrlEncode and HttpUtility.UrlDecode for this kind of work. However, even these won't help you, since the URL is malformed anyway.
So, don't use anything at all! Since it's not encoded, why decode it?

Placing image bytes into a String is not working?

I tried this on Flex 3 and am facing an issue with uploading JPG/PNG images: a trace of readUTFBytes returns the correct byte length, but tmpFileContent is truncated; it appears to upload just 3 characters of data to the server through the PHP script, which makes the image unusable. I have no issue with non-image formats. What is wrong here?
var tmpFileContent:String = fileRef.data.readUTFBytes(fileRef.data.length);
Is String capable of handling bytes?
I'm not sure what you're looking to do with the image, but you might want to read this:
http://livedocs.adobe.com/flex/3/html/help.html?content=Filesystem_15.html
You may also need a image encoder such as the JPEGEncoder: http://help.adobe.com/en_US/FlashPlatform/beta/reference/actionscript/3/mx/graphics/codec/JPEGEncoder.html
You could always encode using base64:
var enc:Base64Encoder = new Base64Encoder();
enc.encodeBytes(fileRef.data);
var base64data:String = enc.drain();
The method used in the tutorial is not going to work safely for anything but text files. An arbitrary binary format is likely to contain zeros. A zero (a byte whose value is 0) is generally considered a string terminator in many languages/platforms. This is also the case in ActionScript, as this code shows:
var str:String = "abc\x00def";
trace(str);
The string will be truncated to "abc", since 0x00 is considered to mark the end of a string.
I think your best bet is to encode the content to base64 as maclema suggested. From the PHP side, decode it back before writing the file with something like:
file_put_contents($myFilePath, base64_decode($fileData["filedata"]));
Also, I can't remember if file_put_contents is binary-safe (I think it's not). If that's the case, you should use fopen($your_path, "wb"), fwrite() and fclose() to write the file. Notice the "b" in "wb", which stands for binary. If you don't pass that flag you'll probably have problems with some characters (newline and carriage return, for example).
Added:
Perhaps, following davr's suggestion, you could try sending the data as a ByteArray to see if AMFPHP handles it correctly.
PHP does allow embedded NULs in strings, as this code shows:
$str = "a\x00b";
var_dump(ord($str[0])); // 97
var_dump(ord($str[1])); // 0
var_dump(ord($str[2])); // 98
So, if AMFPHP converts the bytearray to a string and does not mangle it in the process, this could actually work.
// method saves files on the server
function uploadFiles($fileData) {
// new file path and name
// to not overwrite the files we add the microtime before the file name
$myFilePath = '../../_uploads/'.
preg_replace("/[^0-9]+/","_",microtime()).'_'.$fileData["filename"];
// writing on the disk
$fp = fopen($myFilePath,"wb");
if($fp) {
fwrite($fp,$fileData["filedata"]);
fclose($fp);
}
// returning response - is not used anywhere
return true;
}
Otherwise, try var_dump($fileData['filedata']) to see what type AMFPHP is actually converting the data to (perhaps an array, I'm not sure; but given how strings work in PHP, much like a buffer of single-byte characters, I think it could be just using strings).

when assigning location.href, please explain url encoding (in asp.net and firefox)

In some javascript, I have:
var url = "find.aspx?" + "location=" + encodeURIComponent( address );
alert( url );
location.href = url;
where the value of address is the string "Seattle, WA".
In the alert I see
find.aspx?location=Seattle%2C%20WA
as I expect.
But on the server side, when I look at Request.Url, the relevant substring I see is
find.aspx?location=Seattle, WA
And in the Firefox url window I see
find.aspx?location=Seattle%2C WA
So I'm getting three different representations whereas I would expect that in all three places I should see what I see in the alert. My expectation is that the url I assign to location.href should show up as-is in the browser url window, and should be passed as-is to the server in Request.Url (and I would need to decode the values on the server before using them). What's happening?
Firefox converts certain encoded characters into their literal forms as a way to be friendly to users. It will also convert spaces typed into the address bar into %20 for the server.
Update: The reason Firefox doesn't display the comma unencoded is because commas are allowed in URLs, but spaces are not, so it knows that a space is going to be unambiguously interpreted, whereas the pre-encoded comma is different from a non-encoded comma to some servers. see: Can I use commas in a URL?
ASP is probably trying to help you out by auto-un-encoding the string for you.
Update: It looks like ASP.NET unencodes Request.Url for you by default, as mentioned here: QueryString malformed after URLDecode They also mention that you can use HttpRequest.Url.Query to access the un-decoded version.
The alert is the only thing not doing any "magic" for you.
For the alert, you are doing the encoding yourself. Perhaps it looks the same as on the server-side if you removed encodeURIComponent.
On the server side, ASP.NET will always show you the unencoded form. This makes it easier to map directly to files whose names also contain text that needed to be (un)encoded.
Note that you can replace every letter with its UTF-8 representation in URL encoding and it will still be the same URL. I.e., type the following in the browser window and it will still work: %66%69%6E%64.aspx?location=Seattle%2C%20WA. To encode only the necessary chars, use UrlEncode on the server side if you create a link yourself.
URL encoding can become fairly tricky. You ask to have it explained. To know the correct escape of a certain character, you need to know how that character looks in UTF-8. The hexadecimal values of the UTF-8 bytes then become the %XX%YY values of your letter. Sometimes it's a single %XX, but a character can take up to four bytes (most Chinese characters, for instance, take three).
URL encoding works one way only. Never double-encode or double-unencode; this is prohibited by the specification. Also, because you can encode any character, it is not always possible (as you found out) to round-trip encoding/unencoding. If you unencode and re-encode, the resulting string may be different yet syntactically equivalent.
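A short Java illustration of why double-encoding breaks the round trip (reusing the question's address value; the Charset overload requires Java 10+):
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DoubleEncode {
    public static void main(String[] args) {
        String once = URLEncoder.encode("Seattle, WA", StandardCharsets.UTF_8);
        String twice = URLEncoder.encode(once, StandardCharsets.UTF_8);
        System.out.println(once);  // Seattle%2C+WA
        System.out.println(twice); // Seattle%252C%2BWA - one decode now yields "Seattle%2C+WA", not "Seattle, WA"
    }
}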
In HTML, URL encoding is sometimes interspersed with HTML encoding. I.e., the ampersand is valid in a URL, but not in HTML: find.aspx?city=A&name=B must be written as find.aspx?city=A&amp;name=B inside an HTML document. However, browsers are lenient and will accept wrongly HTML-encoded strings.
Finally, a note on the browser: if you type a space in a link, even inside an <a> tag, it will escape the space (or other characters) for you. Likewise, it will nowadays show the odd characters (é, ï, etc.) in the address bar, but when it sends the request over HTTP, the browser will correctly do the encoding for you.
Update: on your request for a "definitive" reference or proof.
While I couldn't find any on the internet, I decided to look for it myself using Reflector. Going through the methods that set, for instance, HttpRequest.QueryString, you quickly encounter the private method HttpRequest.FillInQueryStringCollection, which then calls HttpValueCollection.FillFromEncodedBytes. Somewhat near the end of that method, HttpUtility.UrlDecode is called on the values. Conclusion: do not call it yourself, to prevent double decoding.
You can see this for yourself when you download Reflector and disassemble the .NET libs of System.Web.
For your example you can change this line
var url = "find.aspx?" + "location=" + encodeURIComponent( address );
to
var url = "find.aspx?" + "location=" + address;
and see the address as it is. But if the address variable contains an '&' character, your value will be corrupted. That is why you use encodeURIComponent: to encode such characters for the URL.
On the server side, all these encoded strings are decoded back. In other words, encodeURIComponent simply ensures that the address variable (whether it contains an '&' character or not) reaches the server side correctly.
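To see concretely what goes wrong without it, here is a hypothetical Java sketch of a server naively splitting the query string on '&' (the address value is illustrative):
public class AmpersandDemo {
    public static void main(String[] args) {
        String address = "Pike & Union, Seattle";
        String naive = "find.aspx?location=" + address;
        // A naive server sees two parameters: "location=Pike " and " Union, Seattle".
        for (String pair : naive.substring(naive.indexOf('?') + 1).split("&")) {
            System.out.println(pair);
        }
        // Percent-encoding the value first ("Pike%20%26%20Union%2C%20Seattle") keeps it one parameter.
    }
}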
