Why do cookie values with whitespace arrive at the client side with quotes? - servlets

I'm a .NET developer starting to dabble in Java.
In .NET, I can set the value of a cookie to a string with white space in it:
new HttpCookie("myCookieName", "my value") - and when I read that value on the client side (JavaScript), I get the value I expected (my value).
If I do the same thing in a Java servlet - new Cookie("myCookieName", "my value"), I get the value including the double quotes ("my value").
Why the difference? Am I missing something? How do people handle this in the Java world? Do you encode the value and then you decode on the client side?

When you set a cookie value containing one of the characters listed in the Cookie#setValue() documentation,
With Version 0 cookies, values should not contain white space, brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, at signs, colons, and semicolons. Empty values may not behave the same way on all browsers.
then the average container will implicitly set the cookie to version 1 (RFC 2109 spec) instead of the default version 0 (Netscape spec). The behaviour is not specified by the Servlet API; the container is free to implement it as it sees fit (it may, for example, throw an IllegalArgumentException). As far as I know, Tomcat, JBoss AS and Glassfish all behave the same with regard to implicitly changing the cookie version. For at least Tomcat and JBoss AS this is the consequence of fixes for this security issue.
A version 1 cookie looks like this:
name="value with spaces";Max-Age=3600;Path=/;Version=1
while a version 0 compatible cookie looks like this:
name=value%20with%20spaces;Expires=Mon, 29-Aug-2011 14:30:00 GMT;Path=/
(note that a URL-encoded value is valid for version 0)
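As a rough pre-check, the character list quoted from Cookie#setValue() can be turned into a small helper that tells you whether a value is likely to trigger the implicit switch to version 1. This is only a sketch: the class and method names are hypothetical, and reading "brackets" as [ ] and "slashes" as / is an assumption. Real containers may use a longer list.

```java
final class Version0Check {
    // Characters named in the Cookie#setValue() javadoc: white space, brackets,
    // parentheses, equals signs, commas, double quotes, slashes, question marks,
    // at signs, colons and semicolons. The exact set below is an assumption of
    // this sketch; a given container may reject more characters than these.
    private static final String UNSAFE = " \t[]()=,\"/?@:;";

    /** Returns true if the value contains none of the characters listed above. */
    static boolean isVersion0Safe(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (UNSAFE.indexOf(value.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isVersion0Safe("my value"));   // false: contains a space
        System.out.println(isVersion0Safe("my%20value")); // true: percent-encoded
    }
}
```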
An important note is that Microsoft Internet Explorer doesn't support version 1 cookies, not even the current IE 11 release. It'll interpret the quotes as being part of the cookie value and will treat and return them accordingly. It also doesn't support the Max-Age attribute and will ignore it altogether, so the cookie's lifetime defaults to the browser session. You were apparently using IE to test the cookie handling of your webapp.
To support MSIE as well, you really need to URL-encode and URL-decode the cookie value yourself if it may contain characters which are invalid for version 0.
Cookie cookie = new Cookie(name, URLEncoder.encode(value, "UTF-8"));
// ...
and
String value = URLDecoder.decode(cookie.getValue(), "UTF-8");
// ...
In order to support version 1 cookies for a worldwide audience, you'll really have to wait for Microsoft to fix the lack of MSIE support and for the browser with the fix to become mainstream. In other words, it'll take ages (update: as of now, 5+ years later, it doesn't seem like it's ever going to happen). In the meantime you'd best stick to version 0 compatible cookies.
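One detail worth checking when the value is read from JavaScript, as in the question: URLEncoder produces form encoding, so a space becomes + rather than %20, and JavaScript's decodeURIComponent leaves + untouched. A minimal sketch of a round-trip helper that emits %20 instead is below. The class name CookieValueCodec is hypothetical, and the Charset overloads of URLEncoder.encode and URLDecoder.decode assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class CookieValueCodec {
    /** Percent-encodes a value so that it only contains version 0 compatible characters. */
    static String encode(String raw) {
        // URLEncoder emits '+' for a space. A literal '+' in the input has already
        // become "%2B", so every remaining '+' stands for a space and can safely
        // be rewritten as "%20".
        return URLEncoder.encode(raw, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Reverses encode() on the server side. */
    static String decode(String cookieValue) {
        return URLDecoder.decode(cookieValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("my value");
        System.out.println(encoded);         // my%20value
        System.out.println(decode(encoded)); // my value
    }
}
```

On the client, decodeURIComponent(value) should then return the original string.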

As far as I know, spaces must be encoded in cookies. Different browsers react differently to un-encoded cookies. You should URL-encode your cookie before setting it.
String cookieval = "my value";
String cookieenc = URLEncoder.encode(cookieval, "UTF-8");
res.addCookie(new Cookie("myCookieName", cookieenc));
ASP.NET does the encoding automatically, in Java you have to do it yourself. I suspect the quotes you see are added by the user agent.

It probably has to do with the way Java encodes the cookie. I suggest you try calling setVersion(1) on the new cookie and see if that works for you.

Try using setVersion(0).
HttpCookie cookie = new HttpCookie("name", "multi word value");
System.out.println(cookie.toString());
prints:
name="multi word value"
But after setting
cookie.setVersion(0);
System.out.println(cookie.toString());
prints:
name=multi word value
Encoding is a good idea too, but the quotes around the value look to be an independent issue.

Related

Cookie without a name?

What is going on with the following cookie:
"=value"
In Chrome and Firefox this is identical to:
"value"
i.e. the value for empty cookie name becomes a cookie name.
Is there any official reason for this behavior?
It looks like a bug, since rfc says:
If the name string is empty, ignore the set-cookie-string entirely.
The cookie RFC standards are a bit vague and contradictory in places, and have also changed behaviour over various revisions. Consequently, browsers also vary in what they require of cookies. So in short, for some browsers an empty cookie name is fine, for others not. If this is an app you're building (that you want to work across the various browsers) then you'd probably be safest setting a cookie name.
https://www.rfc-editor.org/rfc/rfc6265#section-5.2
5. If the name string is empty, ignore the set-cookie-string
entirely.
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-rfc6265bis-05#section-5.3
2. If the name-value-pair string lacks a %x3D ("=") character, then
the name string is empty, and the value string is the value of
name-value-pair.
Otherwise, the name string consists of the characters up to, but
not including, the first %x3D ("=") character, and the (possibly
empty) value string consists of the characters after the first
%x3D ("=") character.
I stumbled upon the same question today.
To clarify the answer of @buffoonism ...
https://stackoverflow.com/a/72250741/2323764
The set-cookie header must be ignored.
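The name-value-pair rule quoted above from the rfc6265bis draft can be sketched in a few lines. Under that rule, "=value" and "value" both yield an empty name and the value "value", which may be why Chrome and Firefox treat them identically. The class below is a hypothetical illustration, not a full Set-Cookie parser: whitespace trimming and attribute handling are left out.

```java
final class CookiePairParser {
    /**
     * Splits a name-value-pair string as described in the quoted draft text.
     * Returns a two-element array: {name, value}.
     */
    static String[] parse(String nameValuePair) {
        int eq = nameValuePair.indexOf('=');
        if (eq < 0) {
            // No '=' at all: the name is empty and the whole string is the value.
            return new String[] {"", nameValuePair};
        }
        // Otherwise split at the FIRST '='; later '=' characters stay in the value.
        return new String[] {
            nameValuePair.substring(0, eq),
            nameValuePair.substring(eq + 1)
        };
    }

    public static void main(String[] args) {
        String[] a = parse("=value");
        String[] b = parse("value");
        // Both inputs produce an empty name and the same value.
        System.out.println(a[0].isEmpty() && b[0].isEmpty() && a[1].equals(b[1])); // true
    }
}
```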

Double quotes and trailing equal sign missing from cookies values in ServletExec

I have a servlet app which stores cookies as base64-encoded strings. On a server where the app is running on ServletExec, the cookies' values are not wrapped in quotes. Additionally, if the value ends with a '=' character, that character is removed. The missing quotes and trailing '=' prevent the cookies' values from being parsed properly. In 2 other servers where this app is running on ServletExec and Tomcat where this app is working, the cookies are wrapped in double quotes and the trailing '=' sign is not removed.
As seen in a browser's developer tool:
Bad - cookiename:dGVzdHN0cmluZzE
Expected - cookiename:"dGVzdHN0cmluZzE="
Any idea what's stripping out the quotes and the trailing '=' sign? TIA!
By default, the servlet Cookie class follows the Version 0 cookie spec. Here's a cite from the javadoc:
This class supports both the Version 0 (by Netscape) and Version 1 (by RFC 2109) cookie specifications. By default, cookies are created using Version 0 to ensure the best interoperability.
Version 0 cookie values are restrictive in allowed characters. It only allows URL-safe characters. This covers among others the alphanumeric characters (a-z, A-Z and 0-9) and only a few lexical characters, including -, _, ., ~ and %. All other characters are invalid in version 0 cookies, including " and =. If the server doesn't already do it, the browser will swallow the invalid characters.
Your best bet is to URL-encode those characters. This way every character which is not allowed in URLs will be percent-encoded in this form %xx which is valid as cookie value.
So, when creating the cookie do:
Cookie cookie = new Cookie(name, URLEncoder.encode(value, "UTF-8"));
// ...
And when reading the cookie, do:
String value = URLDecoder.decode(cookie.getValue(), "UTF-8");
// ...
An alternative is to switch to Version 1 cookies via Cookie#setVersion(), but this isn't supported in IE<=11.
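A different technique from the URL-encoding suggested above, offered only as a sketch: since the values are Base64 anyway, the JDK's URL-safe encoder without padding produces output limited to A-Z, a-z, 0-9, - and _, so there is no trailing = to lose and nothing that needs quoting. The class name is hypothetical, and this assumes you control both the code that writes the cookie and the code that reads it, because values written with the standard alphabet (+ and /) would not decode with the URL-safe decoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class UnpaddedBase64Cookie {
    /** Encodes text as URL-safe Base64 without the trailing '=' padding. */
    static String encode(String raw) {
        return Base64.getUrlEncoder()
                .withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes a value produced by encode(); the JDK decoder does not require padding. */
    static String decode(String cookieValue) {
        return new String(Base64.getUrlDecoder().decode(cookieValue), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("teststring1");
        System.out.println(encoded);         // dGVzdHN0cmluZzE
        System.out.println(decode(encoded)); // teststring1
    }
}
```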

How to create cookie without quotes around value?

I need to create a cookie with an e-mail address as its value, but when I try to, I get this result:
"someone@example.com"
but I would like to have:
someone@example.com
The cookie should be created without double quotes, because another application uses it in that format. How can I force Java not to add the double quotes? Java adds them because of the special "at" character.
I create the cookie that way:
HttpServletResponse response = (HttpServletResponse) FacesContext.getCurrentInstance().getExternalContext().getResponse();
Cookie cookie = new Cookie("login", "someone@example.com");
cookie.setMaxAge(2592000);
cookie.setDomain("domain.com");
cookie.setVersion(1);
response.addCookie(cookie);
Thanks for any help.
It's indeed caused by the @ sign. This is not allowed in version 0 cookies. The container will implicitly force it to become a version 1 cookie (which breaks in MSIE browsers). You'd want to URL-encode the cookie value on the cookie's creation
Cookie cookie = new Cookie("login", URLEncoder.encode("someone@example.com", "UTF-8"));
cookie.setMaxAge(2592000);
cookie.setDomain("domain.com");
response.addCookie(cookie);
and URL-decode it on cookie reading
String value = URLDecoder.decode(cookie.getValue(), "UTF-8");
Note that you should definitely not set the cookie version to 1 explicitly.
See also:
Why do cookie values with whitespace arrive at the client side with quotes?
Unrelated to the concrete problem, cookies are visible and manipulatable by the enduser or man-in-the-middle. Carrying the email address around in a cookie is a bad smell. What if the enduser changes it to a different address? Whatever functional requirement (remembering the login?) you thought to solve with carrying the email address around in a cookie should most likely be solved differently.
See also:
How do I keep a user logged into my site for months?

Is IIS performing an illegal character substitution? If so, how to stop it?

Context: ASP.NET MVC running in IIS, with a UTF-8 %-encoded URL.
Using the standard project template, and a test-action in HomeController like:
public ActionResult Test(string id)
{
return Content(id, "text/plain");
}
This works fine for most %-encoded UTF-8 routes, such as:
http://mydevserver/Home/Test/%e4%ba%ac%e9%83%bd%e5%bc%81
with the expected result 京都弁
However using the route:
http://mydevserver/Home/Test/%ee%93%bb
the url is not received correctly.
Aside: %ee%93%bb is %-encoded code-point 0xE4FB; basic-multilingual-plane, private-use area; but ultimately - a valid unicode code-point; you can verify this manually, or via:
string value = ((char) 0xE4FB).ToString();
string encoded = HttpUtility.UrlEncode(value); // %ee%93%bb
Now, what happens next depends on the web-server; on the Visual Studio Development Server (aka cassini), the correct id is received - a string of length one, containing code-point 0xE4FB.
If, however, I do this in IIS or IIS Express, I get a different id, specifically "î“»", code-points: 0xEE, 0x201C, 0xBB. You will immediately recognise the first and last as the start and end of our percent-encoded string... so what happened in the middle?
Well:
code-point 0x93 is “ (source)
code-point 0x201c is “ (source)
It looks to me very much like IIS has performed some kind of quote-translation when processing my url. Now maybe this might have uses in a few scenarios (I don't know), but it is certainly a bad thing when it happens in the middle of a %-encoded UTF-8 block.
Note that HttpContext.Current.Request.RawUrl also shows this translation has occurred, so this does not look like an MVC bug; note also Darin's comment, highlighting that it works differently in the path vs query portion of the url.
So (two-parter):
is my analysis missing some important subtlety of unicode / url processing?
how do I fix it? (i.e. make it so that I receive the expected character)
id = Encoding.UTF8.GetString(Encoding.Default.GetBytes(id));
This will give you your original id.
IIS uses Default (ANSI) encoding for path characters. Your url encoded string is decoded using that and that is why you're getting a weird thing back.
To get the original id you can convert it back to bytes and get the string using utf8 encoding.
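To see the mechanism in isolation, here is a minimal Java sketch of the same round trip as the C# line above. It assumes the server's ANSI code page is windows-1252, which is an assumption about the machine, not something IIS guarantees. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PathReDecoder {
    // Assumption: the server's default (ANSI) code page is windows-1252.
    private static final Charset ANSI = Charset.forName("windows-1252");

    /** What the server effectively does: reads UTF-8 bytes as ANSI characters. */
    static String misdecode(byte[] utf8Bytes) {
        return new String(utf8Bytes, ANSI);
    }

    /** The workaround: turn the characters back into bytes, then decode as UTF-8. */
    static String recover(String garbled) {
        return new String(garbled.getBytes(ANSI), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xEE, (byte) 0x93, (byte) 0xBB}; // %ee%93%bb
        String garbled = misdecode(raw);
        // Byte 0x93 maps to U+201C in windows-1252, hence the curly quote.
        System.out.println(garbled.equals("\u00EE\u201C\u00BB")); // true
        System.out.println(recover(garbled).equals("\uE4FB"));    // true
    }
}
```

Bytes that have no windows-1252 mapping would not survive this round trip in the Java version, so treat it as an illustration of the idea and not as a general repair routine.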
See Unicode and ISAPI Filters
ISAPI Filter is an ANSI API - all values you can get/set using the API
must be ANSI. Yes, I know this is shocking; after all, it is 2006 and
everything nowadays are in Unicode... but remember that this API
originated more than a decade ago when barely anything was 32bit, much
less Unicode. Also, remember that the HTTP protocol which ISAPI
directly manipulates is in ANSI and not Unicode.
EDIT: Since you mentioned that it works with most other characters, I'm assuming that IIS has some sort of encoding detection mechanism which is failing in this case. As a workaround, though, you can prefix your id with this char and then you can easily detect if the problem occurred (if this char is missing). Not an ideal solution, but it will work. You can then write your custom model binder and a wrapper class in ASP.NET MVC to make your consumption code cleaner.
Once Upon A Time, URLs themselves were not in UTF-8. They were in the ANSI code page. This facilitates the fact that they often are used to select, well, pathnames in the server's file system. In ancient times, IE had an option to tell whether you wanted to send UTF-8 URLs or not.
Perhaps buried in the bowels of the IIS config there is a place to specify the URL encoding, and perhaps not.
Ultimately, to get around this, I had to use request.ServerVariables["HTTP_URL"] and some manual parsing, with a bunch of error-handling fallbacks (additionally compensating for some related glitches in Uri). Not great, but only affects a tiny minority of awkward requests.

asp.net and cookies special characters

Have found very interesting issue in asp.net with cookies:
when adding cookie with value like test&
using
HttpCookie cookie = new HttpCookie("test", "test&");
Response.Cookies.Add(cookie);
and then trying to retrieve the value via Request.Cookies["test"], the trailing ampersand is lost. If the ampersand is not trailing, it is not lost. In Firebug or JavaScript the data is correct, so I think it is ASP.NET-specific.
Of course, most would say just use UrlEncode. But is it really necessary? Is there a list of disallowed characters for cookies (because I think it is smaller than for URLs)?
I have found similar topic but there is no & symbol in restricted list:
Allowed characters in cookies
The ampersand is not an allowed character in a cookie. It's necessary to encode the cookie data with the UrlEncode method.
System.Web.HttpUtility.UrlEncode(cookie);
See also these SO questions/answers:
Broken string in cookie after ampersand (javascript)
How do you use an Ampersand in an HTTPCookie in VB.NET?
