I have a JavaScript request going to an ASP.NET 2.0 HTTP handler, which passes the request on to a Java web service. In this system, special characters, such as those with an accent, do not get passed on correctly.
For example:
Human input: Düsseldorf
becomes an asynchronous JavaScript request to http://site/serviceproxy.ashx?q=D%FCsseldorf, which as far as I can tell is valid in ISO-8859-1 as well as in UTF-8 (unless ü should be %c3%bc in UTF-8).
HttpContext.Current.Request.QueryString.Get("q") returns D�sseldorf, which is where the trouble begins.
HttpUtility.UrlEncode(HttpContext.Current.Request.QueryString.Get("q"), Encoding.GetEncoding("ISO-8859-1")) returns D%3fsseldorf (a '?'),
and HttpUtility.UrlEncode(HttpContext.Current.Request.QueryString.Get("q"), Encoding.UTF8) returns D%ef%bf%bdsseldorf.
So the value is neither decoded nor re-encoded correctly before being passed on to the Java service.
Notice that HttpContext.Current.Request.Url.Query is ?q=D%FCsseldorf&output=json&from=1&to=10
while HttpContext.Current.Request.QueryString.ToString() is q=D%ufffdsseldorf&output=json&from=1&to=10
Why is this, and how can I tell HttpContext to honor the request headers, which include
Content-Type=application/x-www-form-urlencoded;+charset=UTF-8
and decode the URL's query string using the UTF-8 charset?
Addendum: As the answer notes, the trouble lies not so much in the decoding as in the encoding: escape() in JavaScript does not encode according to UTF-8, while encodeURIComponent() does.
I don't know what the default character encoding used by your server (IIS?) is, or if it can be changed, but I can tell you a few things that might help.
0xFC is the ISO-8859-1 encoding for ü. Its Unicode code point is U+00FC, but encoded with UTF-8 it requires two bytes and becomes 0xC3 0xBC.
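These byte values are easy to confirm by encoding the character directly. Python is used here purely for illustration; it is not part of either stack in the question:

```python
# ü is code point U+00FC in Unicode.
u_umlaut = "\u00fc"

latin1_bytes = u_umlaut.encode("iso-8859-1")  # one byte in ISO-8859-1
utf8_bytes = u_umlaut.encode("utf-8")         # two bytes in UTF-8

print(hex(ord(u_umlaut)))  # 0xfc
print(latin1_bytes)        # b'\xfc'
print(utf8_bytes)          # b'\xc3\xbc'
```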
If a UTF-8 decoder were to see the illegal byte sequence 0xFC, it would decode it as a Unicode "replacement character", U+FFFD, and pick up where it saw the beginning of another valid byte sequence, in this case 's'.
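The reason you get %3f is that ISO-8859-1 cannot represent U+FFFD, so the encoder substitutes '?', the "replacement character" for the Latin character set, similar to � in the Unicode character set. Likewise, re-encoding with UTF-8 gives %ef%bf%bd, which is simply U+FFFD's three UTF-8 bytes.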
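Both failure modes can be reproduced outside ASP.NET. In this Python sketch, bytes.decode and urllib.parse.quote stand in for Request.QueryString and HttpUtility.UrlEncode; it is not what ASP.NET runs internally, and Python writes hex digits in upper case where .NET uses lower case:

```python
from urllib.parse import quote

# Bytes that arrive for ?q=D%FCsseldorf (0xFC is ü in ISO-8859-1).
raw = b"D\xfcsseldorf"

# A UTF-8 decoder rejects 0xFC, emits U+FFFD, and resumes at 's'.
decoded = raw.decode("utf-8", errors="replace")
print(hex(ord(decoded[1])))  # 0xfffd

# ISO-8859-1 cannot represent U+FFFD, so errors="replace" substitutes '?'.
as_latin1 = quote(decoded, encoding="iso-8859-1", errors="replace")
# UTF-8 can represent U+FFFD: it is the three bytes EF BF BD.
as_utf8 = quote(decoded, encoding="utf-8")

print(as_latin1)  # D%3Fsseldorf
print(as_utf8)    # D%EF%BF%BDsseldorf
```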
I believe what you're seeing is the client encoding with ISO-8859-1 while the server decodes with UTF-8, so your data is corrupted as soon as it hits the server. I recommend modifying the client to use UTF-8 encoding; it should be requesting http://site/serviceproxy.ashx?q=D%C3%BCsseldorf
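The point is that client and server must agree on the charset. A small Python sketch makes the mismatch visible; server_sees is a hypothetical helper, and parse_qs's encoding parameter plays the role of the server's configured request encoding:

```python
from urllib.parse import parse_qs

def server_sees(query: str, server_charset: str) -> str:
    """Decode the q parameter the way a server using server_charset would."""
    return parse_qs(query, encoding=server_charset, errors="replace")["q"][0]

latin1_query = "q=D%FCsseldorf"    # what escape() produced
utf8_query = "q=D%C3%BCsseldorf"   # what a UTF-8 client should send

print(server_sees(latin1_query, "utf-8") == "D\ufffdsseldorf")  # True: mismatch, ü is lost
print(server_sees(utf8_query, "utf-8") == "Düsseldorf")         # True: charsets agree
print(server_sees(latin1_query, "iso-8859-1") == "Düsseldorf")  # True: charsets agree
```

Only the mismatched pair corrupts the data, which is why fixing the client (rather than re-encoding on the server) is the cleaner option.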
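It sounds like you are constructing these URLs in JavaScript, so you should use the encodeURI and encodeURIComponent functions, not escape: escape() writes characters below U+0100 as a single Latin-1 byte (ü becomes %FC), while encodeURIComponent() percent-encodes the UTF-8 bytes (ü becomes %C3%BC).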
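To see the difference without a browser, here are rough Python emulations of the two JavaScript functions, based on the ECMAScript definitions of escape() (Annex B) and encodeURIComponent(). The function names are hypothetical, and this is a sketch for comparison, not a replacement for the real functions:

```python
from urllib.parse import quote

# Characters that JavaScript's legacy escape() leaves unescaped.
_ESCAPE_SAFE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./"
)

def js_escape(s: str) -> str:
    """Emulate escape(): works on UTF-16 code units, not UTF-8 bytes."""
    utf16 = s.encode("utf-16-be")
    out = []
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        if chr(unit) in _ESCAPE_SAFE:
            out.append(chr(unit))
        elif unit < 0x100:
            out.append("%%%02X" % unit)   # Latin-1 range: a single %XX
        else:
            out.append("%%u%04X" % unit)  # everything else: %uXXXX
    return "".join(out)

def js_encode_uri_component(s: str) -> str:
    """Emulate encodeURIComponent(): percent-encode the UTF-8 bytes."""
    return quote(s, safe="!'()*", encoding="utf-8")

print(js_escape("Düsseldorf"))                # D%FCsseldorf
print(js_encode_uri_component("Düsseldorf"))  # D%C3%BCsseldorf
```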
I am getting the same problem with an ASP.NET generic handler when the URL is typed directly into IE8. Characters come through as char 65533 (U+FFFD, the replacement character), even though I do have IE8 set to
[x] Send UTF-8 URLs.
In my scenario, I'm debugging an HTTP handler in Visual Studio and typing the address of the handler directly into the browser:
http://localhost/myHandler.ashx?term=xxxxxx
and then stepping through the code. The client will be passing UTF-8 encoded URLs, but is there a way to debug the code when IE8 running on the development machine is the client?
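One workaround while debugging, assuming the browser forwards %XX escapes typed into the address bar unchanged (worth confirming in Fiddler or in the debugger), is to type a URL that is already UTF-8 percent-encoded instead of the raw characters. A hypothetical Python helper for generating such URLs:

```python
from urllib.parse import quote

def handler_url(term: str, base: str = "http://localhost/myHandler.ashx") -> str:
    # Percent-encode the UTF-8 bytes ourselves; safe="" also escapes '/', '&' and '='.
    return f"{base}?term={quote(term, safe='', encoding='utf-8')}"

print(handler_url("Düsseldorf"))  # http://localhost/myHandler.ashx?term=D%C3%BCsseldorf
```

Because the resulting URL is pure ASCII, the browser's own encoding settings should no longer matter for the query string.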
Related
Hi guys, I'm currently capturing some traffic from an Android application, but the requests it sends to the server seem to be encrypted. Do you know how to decrypt such requests, or is that impossible?
platform=android&version=1.0.31&lang=en&requestId=44&time=1485552535566&batch=%5b%7b%22id%22%3a177205%2c%22time%22%3a1485552512601%2c%22name%22%3a%22collectResource%22%2c%22params%22%3a%5b155%5d%2c%22hash%22%3a1948904473%7d%5d&sessionId=674937_bc59a16eae9e1559b2e60ae068baf4e7
That's not encrypted; it's URL-encoded. Search for "online url decode". For your example you will get:
platform=android&version=1.0.31&lang=en&requestId=44&time=1485552535566&batch=[{"id":177205,"time":1485552512601,"name":"collectResource","params":[155],"hash":1948904473}]&sessionId=674937_bc59a16eae9e1559b2e60ae068baf4e7
The %xx sequences are URL-encoded bytes written in hex; for example, %22 is byte 0x22, the double-quote character. If you use JavaScript (decodeURIComponent) or another tool to decode the URL encoding, or manually replace each %xx with the equivalent character, you will see that the message is really just URL-encoded plain text.
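The same decoding can be done in a few lines with Python's standard library; parse_qs splits the query and percent-decodes each value, after which the batch parameter turns out to be ordinary JSON:

```python
import json
from urllib.parse import parse_qs

query = (
    "platform=android&version=1.0.31&lang=en&requestId=44&time=1485552535566"
    "&batch=%5b%7b%22id%22%3a177205%2c%22time%22%3a1485552512601%2c%22name%22"
    "%3a%22collectResource%22%2c%22params%22%3a%5b155%5d%2c%22hash%22%3a1948904473%7d%5d"
    "&sessionId=674937_bc59a16eae9e1559b2e60ae068baf4e7"
)

# parse_qs splits on '&' and '=', then percent-decodes each value.
params = {key: values[0] for key, values in parse_qs(query).items()}

# Once decoded, the batch parameter is plain JSON.
batch = json.loads(params["batch"])
print(params["batch"])
print(batch[0]["name"], batch[0]["params"])  # collectResource [155]
```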
I am using Ruby to send a SOAP request to a very enterprisey service, so unfortunately I cannot attach any samples. There's nobody to send me server-side logs, nobody knows what's wrong on the provider side or what the actual HTTP requests need to look like (apart from a single XML example I got, with no HTTP headers), and the docs are very Microsoft-centric, with C# examples and whatnot ("instantiate AbstractFactoryFactory..."). Long live enterprise software.
The bottom line is that I eventually took one of their own XML requests from their logs, sent it to their host at the endpoint from the WSDL using the Savon gem's raw XML option, and got an HTTP 500 error from their host with a bunch of non-ASCII binary data inside; literally, there are no ASCII characters in the body.
Guessing that Savon might be doing some bad magic, or that the XML option doesn't work as expected, I tried sending the same request via Faraday, but got the same thing.
The HTTP response headers say it's UTF-8-encoded XML from an ASP.NET host:
"content-type"=>"text/xml; charset=utf-8",
"server"=>"Microsoft-IIS/7.5",
"x-aspnet-version"=>"2.0.50727",
"x-powered-by"=>"ASP.NET",
but again, the response contains 440 bytes' worth of binary data:
method=:post,
body=
"\x1F\x8B\b\x00\x00...
etc.
Am I missing some weird aspect of the SOAP specification, so that I need to do something to decode this data, or has their server gone bonkers because of my XML, my HTTP headers or something else, so that I need to ping the provider?
Update 1
I noticed that their original XML declared UTF-16 encoding, so I tried encoding the raw string as UTF-16. Savon then spewed errors at me about bad data, so I updated the encoding in the Savon client config. I still get an HTTP 500 error with a binary response, and if I try to log anything, Savon reports a bug:
Encoding::CompatibilityError: incompatible encoding regexp match (US-ASCII regexp with UTF-16 string)
from /home/bbozo/.rvm/gems/ruby-2.2.4/gems/savon-2.11.1/lib/savon/log_message.rb:13:in `to_s'
Faraday showed basically the same behavior: a binary blob.
Update 2
I tried converting the body from every known encoding and got nothing; even though the HTTP headers say the encoding is UTF-8, it obviously isn't:
Encoding.name_list.map{ |e_in| [ e_in, ( response.body.dup.force_encoding(e_in).encode('utf-8') rescue 'incompatible' ) ] }
Nothing in the WSDL files indicates the encoding, and the API spec doesn't mention encoding at all, except that request XML must be UTF-8 encoded. I tried encoding the body and changing the XML encoding declaration and the HTTP headers, but I still get the same binary blob with the same leading bytes (\x1F\x8B\b\x00\x00), so it's not some weird encryption either.
Compression maybe?
I also tried HTTPS for good measure, with the same result.
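The response body was gzip-compressed! The leading bytes \x1F\x8B are the gzip magic number, and the \b that follows is 0x08, the deflate compression method. In the end I just gunzipped it and there it was (see "How to decompress Gzip string in ruby?").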
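Checking for the gzip signature before treating a body as text would have caught this early. A Python sketch of that check follows (in the Ruby code from the question, Zlib::GzipReader plays the same role); a more robust client would also look at the Content-Encoding response header rather than relying on the magic bytes alone:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)

def maybe_gunzip(body: bytes) -> bytes:
    """Return the decompressed body if it looks like gzip, otherwise the body unchanged."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body

compressed = gzip.compress(b"<soap:Envelope/>")
print(compressed[:3])            # b'\x1f\x8b\x08'
print(maybe_gunzip(compressed))  # b'<soap:Envelope/>'
```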
I recently ran into some character-encoding problems.
When I sent an HTTP GET request containing non-ASCII characters in the query string, I found that the server could not decode the parameters correctly.
My current solution is to configure Tomcat's server.xml, adding the attribute URIEncoding="utf-8" to the <Connector> element.
That solves the problem. But my question is: what if the URL is not encoded with UTF-8, but with some ANSI encoding instead? You can do that, right?
Is there a way for the server to figure out what encoding the URL is using other than just setting a fixed value?
PS: I know some basics of character encoding and the differences between UTF-8 and Unicode.
The server dictates the charset(s) it will accept for (percent-encoded) URLs to its resources. If the client sends a URL in the wrong charset, it will not work correctly. There is no protocol to allow the server to advertise its desired charset(s), though. So it is kind of a catch-22. If the URL originates from an HTML page, use the charset of the HTML. Otherwise you just have to guess, and you will probably guess wrong, if the server does not accept UTF-8.
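If a server really must accept both, one common heuristic (still a guess, as the answer says, not a protocol) is to try strict UTF-8 first, since legacy-encoded non-ASCII bytes rarely form valid UTF-8 by accident, and fall back to a single configured legacy charset. The windows-1252 fallback and the decode_query_value name below are assumptions for illustration:

```python
from urllib.parse import unquote_to_bytes

def decode_query_value(raw: str, fallback: str = "windows-1252") -> str:
    """Best-effort decode of one percent-encoded query value.

    Tries strict UTF-8 first, then falls back to one legacy charset
    chosen by the server (an assumption: pick what your clients use).
    """
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace")

print(decode_query_value("D%C3%BCsseldorf"))  # Düsseldorf (valid UTF-8)
print(decode_query_value("D%FCsseldorf"))     # Düsseldorf (via the fallback)
```

The heuristic can still guess wrong: the Latin-1 text "Ã¼" is the same byte pair (C3 BC) as UTF-8 "ü", and would be read as the latter.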
I am looking for a way to pass binary data to a server over HTTP via the URL field in the browser. Is there a way to bypass the browser's automatic URL encoding so that I can encode the data myself?
For example, for the byte with value 48 I want to type %30 into the URL without the browser re-encoding it, so that I don't end up with %2530.
Solved, for anyone who runs into a similar problem in the future: you can do this with the wget option
--restrict-file-names=ascii
which basically ensures that '%' won't be escaped.
Use Base64 encoding; that's what it's designed for.
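Standard Base64 output can contain +, / and =, which are themselves special in URLs, so the URL-safe alphabet from RFC 4648 (section 5) is the better fit for this. A Python sketch; the helper names are hypothetical:

```python
import base64

def to_url_token(data: bytes) -> str:
    # base64url uses '-' and '_' instead of '+' and '/'.
    # Strip the '=' padding, which would otherwise need escaping in a query string.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def from_url_token(token: str) -> bytes:
    # Restore the padding to a multiple of 4 characters before decoding.
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded)

payload = bytes([0, 48, 255, 250, 62, 63])
token = to_url_token(payload)
print(token)  # ADD_-j4_
```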
I managed to do it by writing my own TCP client that connects to the HTTP server and sends a request I type in manually.
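Writing the request by hand works because nothing between you and the socket rewrites the request line. A minimal Python sketch of that idea, sending the path exactly as given; the function names and the /upload path used below are hypothetical:

```python
import socket

def build_raw_get(host: str, raw_path: str) -> bytes:
    # raw_path goes out verbatim, e.g. "/upload?d=%30%FF"; nothing re-escapes '%'.
    request = (
        f"GET {raw_path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return request.encode("ascii")

def send_raw_get(host: str, raw_path: str, port: int = 80) -> bytes:
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_raw_get(host, raw_path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection (Connection: close)
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling send_raw_get("example.com", "/upload?d=%30%FF") would return the raw response, headers included.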
Use Base62 encoding.
The encoded string contains only letters and digits, so none of its characters will be URL-encoded.
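Base62 has no single standard alphabet, so the digits-then-uppercase-then-lowercase alphabet below is an assumption, as is the Base58-style handling of leading zero bytes (which a plain big-integer conversion would otherwise drop). A Python sketch:

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def b62encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    # One '0' per leading zero byte, since the integer conversion loses them.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "0" * pad + "".join(reversed(digits))

def b62decode(s: str) -> bytes:
    pad = len(s) - len(s.lstrip("0"))
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    body = n.to_bytes((n.bit_length() + 7) // 8, "big") if n else b""
    return b"\x00" * pad + body

print(b62encode(b"\x01\x00"))  # 48  (256 = 4 * 62 + 8)
```

Base62 output is roughly 1.34 bytes per input byte, slightly denser than percent-encoding but slower to compute than Base64 for large payloads.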
I get some string data from a web service in UTF-8. How do I convert it into a readable format in an ASPX page written in VB? The website is German.
UTF-8 is readable, and ASP.NET should be able to read it just fine. If it's transmitted with a Content-Type whose charset parameter is set to something other than UTF-8, you might need to instruct ASP.NET to force the decoding to UTF-8. Use Fiddler to see what the HTTP request looks like, and pay special attention to the charset parameter of the Content-Type header.
If your output encoding is something other than UTF-8, you should still be able to output the characters correctly as long as you decode them with the correct encoding. What is your output encoding? What encoding does the web service you're communicating with use? Answer these questions (using Fiddler) and the solution should be obvious.
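The charset lookup described above can be sketched in Python. email.message.Message is used only because it already parses MIME-style Content-Type parameters, and the force_charset argument is a hypothetical stand-in for forcing the decoding to UTF-8 when a service mislabels its output:

```python
from email.message import Message
from typing import Optional

def decode_body(body: bytes, content_type: str,
                force_charset: Optional[str] = None,
                default: str = "utf-8") -> str:
    """Decode an HTTP body using the charset parameter of its Content-Type."""
    if force_charset:
        # Ignore the header entirely, e.g. when the service lies about its charset.
        return body.decode(force_charset)
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset() or default  # lower-cased, or None
    return body.decode(charset)

german = "Grüße aus Düsseldorf"
print(decode_body(german.encode("utf-8"), "text/xml; charset=utf-8"))
print(decode_body(german.encode("iso-8859-1"), "text/plain; charset=ISO-8859-1"))
```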