I see some sites, like Stack Overflow, use packed cookies, where multiple cookies are packed into one. Here's an example:
Set-Cookie: acct=t=&s=; domain=.stackapps.com; expires=Mon, 30-May-2016 20:16:22 GMT; path=/; HttpOnly
Is this just to save sending multiple set-cookie headers, and to avoid sending comma separated cookies on the one set-cookie header? That's allowed--but is it not recommended?
Should the packed cookie just be treated as a single cookie, or does it need to be unpacked and sent back as individual cookies?
I do not know where the idea of "packed" came from. Those are just cookies with the = sign in the value, or at least they should be according to the specs. Let us go through the RFCs and see that:
Set-Cookie: acct=t=&s=; domain=.stackapps.com; expires=...
is exactly the same as
Set-Cookie: acct="t=&s="; domain=.stackapps.com; expires=...
Therefore, it is a single cookie and shall be treated as such.
The answer is rather long, sorry for that. I tried to aim it at people who find the grammar rules in the RFCs difficult to understand. If you believe that some piece of the grammar is still difficult to understand, please point it out to me in a comment.
Through the RFCs
The current RFC for the Set-Cookie header is RFC 6265. Section 4.1 gives the formal syntax for Set-Cookie:
set-cookie-header = "Set-Cookie:" SP set-cookie-string
set-cookie-string = cookie-pair *( ";" SP cookie-av )
cookie-pair = cookie-name "=" cookie-value
cookie-name = token
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
; US-ASCII characters excluding CTLs,
; whitespace DQUOTE, comma, semicolon,
; and backslash
token = <token, defined in [RFC2616], Section 2.2>
cookie-av = expires-av / max-age-av / domain-av /
path-av / secure-av / httponly-av /
extension-av
expires-av = "Expires=" sane-cookie-date
sane-cookie-date = <rfc1123-date, defined in [RFC2616], Section 3.3.1>
max-age-av = "Max-Age=" non-zero-digit *DIGIT
; In practice, both expires-av and max-age-av
; are limited to dates representable by the
; user agent.
non-zero-digit = %x31-39
; digits 1 through 9
domain-av = "Domain=" domain-value
domain-value = <subdomain>
; defined in [RFC1034], Section 3.5, as
; enhanced by [RFC1123], Section 2.1
path-av = "Path=" path-value
path-value = <any CHAR except CTLs or ";">
secure-av = "Secure"
httponly-av = "HttpOnly"
extension-av = <any CHAR except CTLs or ";">
That is a little terse, but we do not need to go through it all. To start, we have the Set-Cookie: header and a space (SP), then the set-cookie-string, which is defined further.
set-cookie-header = "Set-Cookie:" SP set-cookie-string
set-cookie-string is composed of a cookie-pair (defined further), which is the grammar part that interests us, and optionally a set of any number of cookie-av prefixed with ; and a space. The *() construct allows for any number of occurrences (including zero) of the grammar part.
set-cookie-string = cookie-pair *( ";" SP cookie-av )
cookie-av defines the metadata that can be attached to the cookie, but it is not needed for our proof, so we will skip it.
The cookie-pair, on the other hand, is a very simple construct: one cookie-name, one mandatory = sign, and one cookie-value.
cookie-pair = cookie-name "=" cookie-value
The cookie-name is defined as a token, which leads us to another RFC, RFC 2616. In section 2.2 of that RFC we find the basic rules that define the token.
cookie-name = token
token = <token, defined in [RFC2616], Section 2.2>
token definition:
CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>
...
token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
The 1*<> syntax means any number of occurrences, but at least one. To find the CTLs, use man ascii and check the Dec column. SP is the space (as we already saw) and HT is the horizontal tab (9 in the ASCII table).
The interesting part for us is the fact that a token cannot contain an = character.
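The token rule can be checked mechanically. The following is a minimal Python sketch, assuming the separator list from RFC 2616, section 2.2; the name is_token is made up for this illustration and does not come from any library.

```python
# Separators and control characters as listed in RFC 2616, section 2.2.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')
CTLS = {chr(code) for code in range(32)} | {chr(127)}


def is_token(text: str) -> bool:
    """Return True if text matches token = 1*<any CHAR except CTLs or separators>."""
    if not text:
        return False  # "1*" requires at least one character
    return all(
        ord(ch) < 128 and ch not in SEPARATORS and ch not in CTLS
        for ch in text
    )


print(is_token("acct"))    # True
print(is_token("acct=t"))  # False, because "=" is a separator
```

Since "=" can never be part of a token, a cookie-name always ends before the first "=".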
Back to RFC6265:
cookie-pair = cookie-name "=" cookie-value
cookie-name stops at the first = character, because a token cannot contain one; that first = is always the explicit "=" in the grammar. Now, let's finally define the cookie-value:
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
; US-ASCII characters excluding CTLs,
; whitespace DQUOTE, comma, semicolon,
; and backslash
We already saw that the * there means any number of occurrences, including zero (note that empty cookie values are allowed by the RFC!). The interesting part is that the entire cookie-value can be enclosed in double quotes (DQUOTE is the double quote character, as you might have guessed).
But the most interesting part is that the = sign (x3D in the ASCII table) is allowed as a cookie-octet
/ %x3C-5B / <- right there!
Yet the space (x20) and semicolon (x3B) are disallowed.
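To double-check which characters fall inside the cookie-octet ranges, here is a small Python sketch that encodes the ranges exactly as written in the grammar above. The helper name is_cookie_octet is hypothetical.

```python
# cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
COOKIE_OCTET_RANGES = [
    (0x21, 0x21),
    (0x23, 0x2B),
    (0x2D, 0x3A),
    (0x3C, 0x5B),
    (0x5D, 0x7E),
]


def is_cookie_octet(ch: str) -> bool:
    """Return True if the single character ch lies in one of the cookie-octet ranges."""
    code = ord(ch)
    return any(low <= code <= high for low, high in COOKIE_OCTET_RANGES)


print(is_cookie_octet("="))  # True  (x3D is inside x3C-5B)
print(is_cookie_octet("&"))  # True  (x26 is inside x23-2B)
print(is_cookie_octet(" "))  # False (x20)
print(is_cookie_octet(";"))  # False (x3B)
```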
Conclusion
Therefore this Set-Cookie header
Set-Cookie: acct=t=&s=; domain=.stackapps.com; expires=...
shall be interpreted as
set-cookie-header = "Set-Cookie:" SP set-cookie-string
set-cookie-string = cookie-pair *( ";" SP cookie-av )
cookie-pair = cookie-name "=" cookie-value
cookie-name = "acct"
cookie-value = "t=&s="
And the header sending it back to the server shall be
Cookie: acct=t=&s=
Sending it as follows violates the RFC:
Cookie: acct=t&; s=
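As a rough illustration of that interpretation, the sketch below splits a Set-Cookie value at the first ";" and then at the first "=". It is a simplified reading of the grammar, not a full RFC 6265 parser, and the function name split_set_cookie is invented for this example.

```python
def split_set_cookie(header_value: str):
    """Split a Set-Cookie value into (cookie-name, cookie-value, attributes)."""
    # Everything before the first ";" is the cookie-pair; the rest are cookie-av items.
    pair, *attributes = header_value.split(";")
    # A token cannot contain "=", so the FIRST "=" separates name from value.
    name, _, value = pair.partition("=")
    return name.strip(), value.strip(), [a.strip() for a in attributes if a.strip()]


name, value, attributes = split_set_cookie(
    "acct=t=&s=; domain=.stackapps.com; path=/; HttpOnly"
)
print(name)                       # acct
print(value)                      # t=&s=
print(f"Cookie: {name}={value}")  # Cookie: acct=t=&s=
```

The value is sent back unchanged; the extra "=" and "&" characters are just part of a single cookie-value.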
Related
According to what I have researched there are no illegal characters in the PWD= field of a SQL Server Connection String.
However, using SQL Server Express 2008 I changed the SA password to a GUID, specifically:
{85C86BD7-B15F-4C51-ADDA-3B6A50D89386}
So when connecting via ODBC I use this connection string:
"Driver={SQL Server};Server=.\\MyInstance;Database=Master;UID=SA;PWD={85C86BD7-B15F-4C51-ADDA-3B6A50D89386};"
But it comes back as Login failed for SA.
However, if I change the SA password to something just as long but without {}- it succeeds! Are there certain characters in PWD= that need to be escaped? I tried all different combinations with no luck.
As Microsoft's documentation states (emphasis added) --
Connection strings used by ODBC have the following syntax:
connection-string ::= empty-string[;] | attribute[;] | attribute; connection-string
empty-string ::=
attribute ::= attribute-keyword=[{]attribute-value[}]
attribute-value ::= character-string
attribute-keyword ::= identifier
Attribute values can optionally be enclosed in braces, and it is good practice to do so. This avoids problems when attribute values contain non-alphanumeric characters. The first closing brace in the value is assumed to terminate the value, so values cannot contain closing brace characters.
I would suggest you simply remove the braces when you set the password, and then the connect string you provided above should work fine.
ADDITION
I dug a bit further on Microsoft's site, and found some ABNF rules which may be relevant --
SC = %x3B ; Semicolon
LCB = %x7B ; Left curly brackets
RCB = %x7D ; Right curly brackets
EQ = %x3D ; Equal sign
ESCAPEDRCB = 2RCB ; Double right curly brackets
SpaceStr = *(SP) ; Any number (including 0) spaces
ODBCConnectionString = *(KeyValuePair SC) KeyValuePair [SC]
KeyValuePair = (Key EQ Value / SpaceStr)
Key = SpaceStr KeyName
KeyName = (nonSP-SC-EQ *nonEQ)
Value = (SpaceStr ValueFormat1 SpaceStr) / (ValueContent2)
ValueFormat1 = LCB ValueContent1 RCB
ValueContent1 = *(nonRCB / ESCAPEDRCB)
ValueContent2 = SpaceStr / SpaceStr (nonSP-LCB-SC) *nonSC
nonRCB = %x01-7C / %x7E-FFFF ; not "}"
nonSP-LCB-SC = %x01-1F / %x21-3A / %x3C-7A / %x7C-FFFF ; not space, "{" or ";"
nonSP-SC-EQ = %x01-1F / %x21-3A / %x3C / %x3E-FFFF ; not space, ";" or "="
nonEQ = %x01-3C / %x3E-FFFF ; not "="
nonSC = %x01-003A / %x3C-FFFF ; not ";"
...
ValueFormat1 is recommended to use when there is a need for Value to contain LCB, RCB, or EQ. ValueFormat1 MUST be used when the Value contains SC or starts with LCB.
ValueContent1 MUST be enclosed by LCB and RCB. Spaces before the enclosing LCB and after the enclosing RCB MUST be ignored.
ValueContent1 MUST be contained in ValueFormat1. If there is an RCB in the ValueContent1, it MUST use the two-character sequence ESCAPEDRCB to represent the one-character value RCB.
All of which comes down to... I believe the following connect string should work for you (note that there are 2 left/open braces and 3 right/close braces on the PWD value) --
"Driver={SQL Server};Server=.\\MyInstance;Database=Master;UID=SA;PWD={{85C86BD7-B15F-4C51-ADDA-3B6A50D89386}}};"
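The escaping rule from the ABNF (wrap the value in braces and double every closing brace) can be sketched in Python as follows. The helper name odbc_brace_quote is hypothetical, and whether a particular driver honours the doubled-brace escape is an assumption that has not been verified here.

```python
def odbc_brace_quote(value: str) -> str:
    """Wrap value in { } and double each "}" (ValueFormat1 with ESCAPEDRCB)."""
    return "{" + value.replace("}", "}}") + "}"


password = "{85C86BD7-B15F-4C51-ADDA-3B6A50D89386}"
connection_string = (
    "Driver={SQL Server};Server=.\\MyInstance;Database=Master;UID=SA;PWD="
    + odbc_brace_quote(password)
    + ";"
)
print(odbc_brace_quote(password))  # {{85C86BD7-B15F-4C51-ADDA-3B6A50D89386}}}
print(connection_string)
```

Only the closing brace is doubled; the opening brace inside the value needs no escape once the value is wrapped.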
According to this page, the only legal "special character" in a name (I think they're talking about the DSN) is the UNDERSCORE:
The ODBC specification (and the SQL specification) states that names
must be in the format of " letter[digit | letter | _]...". The only
special character allowed is an underscore.
There was no reference to "the ODBC Specification". This page says it's the ODBC 4.0 Spec.
I read RFC 7230, section 3.2. After removing the obsolete rules, the grammar for a header field is:
header-field = field-name ":" OWS field-value OWS
field-name = token
field-value = *field-content
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR
VCHAR = %x21-7E; visible (printing) characters
I am confused by the definition of field-content. It seems that it matches 1 or 2 VCHARs, with any amount of space in between, but it will not match another space after a field-content match.
For example, for name:a<sp>b<sp>c, field-name will match name, but field-content will match a<sp>b and then the next <sp> cannot be matched by another field-content, thus this header is invalid.
However, name:a<sp>bc<sp>d is valid because there are two matches for field-content, a<sp>b and c<sp>d.
I think this is inconsistent. Is this intended, or did I misunderstand something?
I know this is an old question, but:
The updated RFC 9110, Section 5.5, still has this ambiguity.
Therefore, I would suggest sticking to the explanation described here.
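The literal reading of the RFC 7230 rules quoted in the question can be reproduced with a regular expression. This is only a sketch of that literal reading (the names FIELD_VALUE and matches_literal_grammar are made up here); it is not how real HTTP parsers treat header values.

```python
import re

# field-vchar = VCHAR = %x21-7E
VCHAR = r"[\x21-\x7E]"
# field-value   = *field-content
# field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
FIELD_VALUE = re.compile(rf"(?:{VCHAR}(?:[ \t]+{VCHAR})?)*")


def matches_literal_grammar(value: str) -> bool:
    """Return True if value matches field-value when the ABNF is read literally."""
    return FIELD_VALUE.fullmatch(value) is not None


print(matches_literal_grammar("a b c"))   # False: nothing can consume the second space
print(matches_literal_grammar("a bc d"))  # True: "a b" followed by "c d"
```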
Quote from https://www.rfc-editor.org/rfc/rfc5987#section-3.2.1:
In order to include character set and language information, this
specification modifies the RFC 2616 grammar to be:
parameter = reg-parameter / ext-parameter
reg-parameter = parmname LWSP "=" LWSP value
ext-parameter = parmname "*" LWSP "=" LWSP ext-value
parmname = 1*attr-char
ext-value = charset "'" [ language ] "'" value-chars
; like RFC 2231's <extended-initial-value>
; (see [RFC2231], Section 7)
charset = "UTF-8" / "ISO-8859-1" / mime-charset
mime-charset = 1*mime-charsetc
What does * mean in parmname = 1*attr-char? The same question applies to mime-charset = 1*mime-charsetc.
What I do know is that "*" means a literal * in ext-parameter = parmname "*" LWSP "=" LWSP ext-value, because the RFC later shows an example of ext-parameter:
title*=iso-8859-1'en'%A3%20rates
It's a quantifier that describes the valid number of repetitions.
"1*element" requires at least one element.
See RFC 2616 section 2.1 - Augmented BNF:
*rule
The character "*" preceding an element indicates repetition. The
full form is "<n>*<m>element" indicating at least <n> and at most
<m> occurrences of element. Default values are 0 and infinity so
that "*(element)" allows any number, including zero; "1*element"
requires at least one; and "1*2element" allows one or two.
The spec you quoted says:
This specification uses the ABNF (Augmented Backus-Naur Form)
notation defined in [RFC5234]. The following core rules are included
by reference, as defined in [RFC5234], Appendix B.1: ALPHA (letters),
DIGIT (decimal 0-9), HEXDIG (hexadecimal 0-9/A-F/a-f), and LWSP
(linear whitespace).
Go to RFC 5234 and you'll find https://www.rfc-editor.org/rfc/rfc5234#section-3.6
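To make the repetition notation concrete, here is a small Python sketch that turns a repetition prefix into a (minimum, maximum) pair. The function name parse_repetition is hypothetical; it only covers the prefix itself, not a full ABNF parser.

```python
import re

# Optional minimum, optional "*", optional maximum: "1*2", "1*", "*", "3".
REPEAT_PREFIX = re.compile(r"(\d*)(\*?)(\d*)")


def parse_repetition(prefix: str):
    """Return (minimum, maximum) for a repetition prefix; maximum None means unbounded."""
    match = REPEAT_PREFIX.fullmatch(prefix)
    if match is None:
        raise ValueError(f"not a repetition prefix: {prefix!r}")
    low, star, high = match.groups()
    if not star:
        # No "*": a bare number means exactly that many; no prefix means exactly one.
        count = int(low) if low else 1
        return count, count
    return (int(low) if low else 0), (int(high) if high else None)


print(parse_repetition("1*"))   # (1, None)  as in 1*attr-char
print(parse_repetition("*"))    # (0, None)
print(parse_repetition("1*2"))  # (1, 2)
```

A quoted "*" in a rule, as in parmname "*", is a literal character and never reaches this kind of parsing.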
We are working with Apache Tomcat 7 and trying to setup the Valve Component to store our access logs, ready for processing in SnowPlow.
The problem we have is how to make these logs robust. To give an example - we can separate fields with tabs and extract the user agent string like so:
pattern="%{yyyy-MM-dd}t %{hh:mm:ss}t %{User-Agent}i "
The problem is that the Valve Component does not (as far as I can see) escape %{User-Agent}i, so a stray tab in a useragent will corrupt the data (row will look like it contains four fields, not three).
As far as solutions, unless there's a way of escaping the useragent which I've missed, I can see a couple of solutions:
Use a really obscure field delimiter (or combination of field delimiters) which is very unlikely to crop up in a useragent string. We tried Ctrl-A (HTML ?) but that didn't seem to work
Write a custom AccessLogValve which either supports escaping or sanitizes tabs - perhaps similar to this post Sanitizing Tomcat access log entries
A bit puzzled that I can't find anything else about this online - does nobody parse their Tomcat access logs?
What do you recommend? We're a little stuck...
RFC2616 defines user agent string as
User-Agent = "User-Agent" ":" 1*( product | comment )
Then product is defined as
product = token ["/" product-version]
product-version = token
Following this, tokens are defined as
token = 1*<any CHAR except CTLs or separators>
and separators/CTLs as
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>
We must not forget comment, which is defined as
comment = "(" *( ctext | quoted-pair | comment ) ")"
ctext = <any TEXT excluding "(" and ")">
quoted-pair = "\" CHAR
CHAR = <any US-ASCII character (octets 0 - 127)>
So if I understand correctly, you should be able to use any separator or CTL as long as you can distinguish comment, which is wrapped in ( and ). If ( appears inside the comment, it should be escaped with \.
In the end, I wrote a custom Tomcat AccessLogValve which:
Introduced a new pattern, 'I', to escape an incoming header
Introduced a new pattern, 'C', to fetch a cookie stored on the response
Re-implemented the pattern 'i' to ensure that "" (empty string) is replaced with "-"
Re-implemented the pattern 'q' to remove the "?" and ensure "" (empty string) is replaced with "-"
Overwrote the 'v' pattern, to write the version of this AccessLogValve, rather than the local server name
It seems to be pretty robust - I haven't had any further issues with unescaped values.
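The valve itself is not shown, so the following is only a language-neutral illustration of the escaping idea, written in Python rather than as a Tomcat valve. The backslash-based escaping scheme and the names escape_field and format_row are assumptions made for this sketch.

```python
def escape_field(value: str) -> str:
    """Escape characters that would break a tab-separated log row."""
    if value == "":
        return "-"  # empty values are logged as "-"
    return (
        value.replace("\\", "\\\\")  # escape the escape character first
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def format_row(fields) -> str:
    """Join already-escaped fields with real tab characters."""
    return "\t".join(escape_field(field) for field in fields)


row = format_row(["2013-01-01", "12:00:00", "Mozilla/5.0\t(stray tab)"])
print(len(row.split("\t")))  # 3
```

Escaping the backslash first keeps the scheme reversible, so a downstream parser can recover the original user agent.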
I am using the ASP.NET memberlogin_control and getting the exception "The parameter 'username' must not contain commas. Parameter name: username". I am using the email ID as the username here.
How can I get rid of this error? I would also like to know which characters should be disallowed when validating an email address.
Try using a regular expression to validate the username field before passing it to whatever method you're passing it. Here's an example:
http://www.codetoad.com/asp_email_reg_exp.asp
Read RFC 2822 for the complete email address syntax. If you work through the BNF, you'll see that "@,@"@foo.bar is perfectly fine, if unusual. The rule you're asking for, though, can be found in section 3.2.4:
atext = ALPHA / DIGIT / ; Any character except controls,
"!" / "#" / ; SP, and specials.
"$" / "%" / ; Used for atoms
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
atom = [CFWS] 1*atext [CFWS]
dot-atom = [CFWS] dot-atom-text [CFWS]
dot-atom-text = 1*atext *("." 1*atext)
and section 3.4.1:
addr-spec = local-part "@" domain
local-part = dot-atom / quoted-string / obs-local-part
domain = dot-atom / domain-literal / obs-domain
If you ignore everything but dot-atom in the local-part and domain rules, you'll match the common-or-garden addresses. It's possible that your asp.net control doesn't accept all valid RFC2822 addresses, so you should really check that documentation.
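As a sketch of that simplification, the regular expression below keeps only the dot-atom form on both sides of the "@" that RFC 2822 uses in addr-spec, and ignores CFWS, quoted strings, domain literals and the obsolete forms. The name is_simple_address is made up for this example.

```python
import re

# atext from RFC 2822, section 3.2.4
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# addr-spec restricted to dot-atom on both sides of "@"
ADDR_SPEC = re.compile(rf"{DOT_ATOM_TEXT}@{DOT_ATOM_TEXT}")


def is_simple_address(address: str) -> bool:
    """Return True for addresses whose local part and domain are both dot-atoms."""
    return ADDR_SPEC.fullmatch(address) is not None


print(is_simple_address("john.doe@example.com"))  # True
print(is_simple_address("a,b@example.com"))       # False: "," is not atext
print(is_simple_address("a..b@example.com"))      # False: empty atom between dots
```

This rejects some valid addresses (for example quoted local parts), which is the trade-off of ignoring everything but dot-atom.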
You might do something similar to the "%<hexcode>" trick to convert a valid (or not) email address into a username argument that your control can accept.
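One way to read the "%<hexcode>" suggestion is plain percent-encoding. The sketch below uses Python's urllib.parse for illustration; the function names are hypothetical, and whether the ASP.NET control accepts "%" in a username is an assumption you would need to check.

```python
from urllib.parse import quote, unquote


def email_to_username(address: str) -> str:
    """Percent-encode everything except letters, digits, "_.-~" and "@"."""
    return quote(address, safe="@.")


def username_to_email(username: str) -> str:
    """Reverse email_to_username."""
    return unquote(username)


username = email_to_username('"a,b"@example.com')
print(username)                     # %22a%2Cb%22@example.com
print(username_to_email(username))  # "a,b"@example.com
```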
(Mastering Regular Expressions from O'Reilly used to have a humongous one-page regex (mostly correct) for mail addresses, but it's gone in 3rd ed.)