grammar from RFC 2812 - bnf

I've got a grammar for the IRC-protocol from RFC 2812:
message = [ ":" prefix SPACE ] command [ params ] crlf
prefix = servername / ( nickname [ [ "!" user ] "@" host ] )
command = 1*letter / 3digit
params = *14( SPACE middle ) [ SPACE ":" trailing ]
=/ 14( SPACE middle ) [ SPACE [ ":" ] trailing ]
nospcrlfcl = %x01-09 / %x0B-0C / %x0E-1F / %x21-39 / %x3B-FF
; any octet except NUL, CR, LF, " " and ":"
middle = nospcrlfcl *( ":" / nospcrlfcl )
trailing = *( ":" / " " / nospcrlfcl )
SPACE = %x20 ; space character
crlf = %x0D %x0A ; "carriage return" "linefeed"
What does the "1*letter" mean? I guess one to infinite occurrences.
And what does "*14( SPACE middle )" mean?
And what does "14( SPACE middle )" mean?
Thanks in advance.

RFC 2812's References section lists RFC 2234 as the specification of Augmented BNF for Syntax Specifications.
There, in section 3.6, we see:
The operator "*" preceding an element indicates repetition. The full
form is:
<a>*<b>element
where <a> and <b> are optional decimal values, indicating at least
<a> and at most <b> occurrences of element.
Default values are 0 and infinity so that *<element> allows any
number, including zero; 1*<element> requires at least one;
3*3<element> allows exactly 3 and 1*2<element> allows one or two.
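As a rough illustration of those rules, the sketch below turns a repeat prefix into a (minimum, maximum) pair. The helper name repeat_bounds is made up for this example. The handling of a bare number such as 14 follows section 3.7 of the same RFC (specific repetition), where <n>element is shorthand for <n>*<n>element.

```python
import re

# Hypothetical helper, not part of any RFC or library.
# Accepts prefixes such as "1*", "*14", "3*3", "*" or "14".
_REPEAT = re.compile(r"^(\d*)(\*?)(\d*)$")


def repeat_bounds(prefix):
    """Return (minimum, maximum) for an ABNF repeat prefix.

    A maximum of None stands for "infinity" (no upper bound).
    """
    match = _REPEAT.match(prefix)
    if not prefix or not match:
        raise ValueError(f"not a repeat prefix: {prefix!r}")
    low, star, high = match.groups()
    if not star:
        # Specific repetition: "14element" means exactly 14 occurrences.
        count = int(low)
        return count, count
    # Variable repetition: missing bounds default to 0 and infinity.
    return (int(low) if low else 0, int(high) if high else None)


print(repeat_bounds("1*"))   # (1, None)
print(repeat_bounds("*14"))  # (0, 14)
print(repeat_bounds("14"))   # (14, 14)
```

So 1*letter is one or more letters, *14( SPACE middle ) is zero to fourteen repetitions, and 14( SPACE middle ) is exactly fourteen.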

Related

What is the format of HTTP 1.1 header values?

I read the rfc7230 section 3.2. After removing obsolete rules, the spec about header field is:
header-field = field-name ":" OWS field-value OWS
field-name = token
field-value = *field-content
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR
VCHAR = %x21-7E; visible (printing) characters
I am confused by the definition of field-content. It seems that it matches 1 or 2 VCHARs, with any amount of space in between, but it will not match another space after a field-content match.
For example, for name:a<sp>b<sp>c, field-name will match name, but field-content will match a<sp>b and then the next <sp> cannot be matched by another field-content, thus this header is invalid.
However, name:a<sp>bc<sp>d is valid because there are two matches for field-content, a<sp>b and c<sp>d.
I think this is inconsistent. Is this intended, or did I misunderstand something?
I know this is an old question, but:
The updated RFC 9110, Section 5.5, resolves this ambiguity: its field-content rule also allows field-vchar inside the repeated group.
Therefore, I would suggest sticking to the explanation given there.
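The reading in the question can be checked mechanically against the RFC 7230 rules exactly as quoted there. This is a minimal sketch that transcribes those three rules into a regular expression; the names FIELD_VALUE and is_field_value are invented for the example, and HTAB is written as \t.

```python
import re

# Direct transcription of the rules quoted in the question (RFC 7230).
FIELD_VCHAR = r"[\x21-\x7E]"                                 # field-vchar = VCHAR
FIELD_CONTENT = rf"{FIELD_VCHAR}(?:[ \t]+{FIELD_VCHAR})?"    # vchar [ 1*( SP / HTAB ) vchar ]
FIELD_VALUE = re.compile(rf"(?:{FIELD_CONTENT})*")           # field-value = *field-content


def is_field_value(text):
    """True if the whole text matches field-value as quoted above."""
    return FIELD_VALUE.fullmatch(text) is not None


print(is_field_value("a b c"))   # False: the second space has nothing to attach to
print(is_field_value("a bc d"))  # True: "a b" followed by "c d"
```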

What does "*" (asterisk) mean in RFC?

Quote from https://www.rfc-editor.org/rfc/rfc5987#section-3.2.1:
In order to include character set and language information, this
specification modifies the RFC 2616 grammar to be:
parameter = reg-parameter / ext-parameter
reg-parameter = parmname LWSP "=" LWSP value
ext-parameter = parmname "*" LWSP "=" LWSP ext-value
parmname = 1*attr-char
ext-value = charset "'" [ language ] "'" value-chars
; like RFC 2231's <extended-initial-value>
; (see [RFC2231], Section 7)
charset = "UTF-8" / "ISO-8859-1" / mime-charset
mime-charset = 1*mime-charsetc
What does * mean in parmname = 1*attr-char? And also the same question at mime-charset = 1*mime-charsetc.
What I do know is that "*" means a literal * in ext-parameter = parmname "*" LWSP "=" LWSP ext-value, because the RFC later gives an example of an ext-parameter:
title*=iso-8859-1'en'%A3%20rates
It's a quantifier that describes the valid number of repetitions.
"1*element" requires at least one element.
See RFC 2616 section 2.1 - Augmented BNF:
*rule
The character "*" preceding an element indicates repetition. The
full form is "<n>*<m>element" indicating at least <n> and at most
<m> occurrences of element. Default values are 0 and infinity so
that "*(element)" allows any number, including zero; "1*element"
requires at least one; and "1*2element" allows one or two.
The spec you quoted says:
This specification uses the ABNF (Augmented Backus-Naur Form)
notation defined in [RFC5234]. The following core rules are included
by reference, as defined in [RFC5234], Appendix B.1: ALPHA (letters),
DIGIT (decimal 0-9), HEXDIG (hexadecimal 0-9/A-F/a-f), and LWSP
(linear whitespace).
Go to RFC 5234 and you'll find https://www.rfc-editor.org/rfc/rfc5234#section-3.6
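To make the two roles of the asterisk concrete: the unquoted * in 1*attr-char is a repetition operator, while the quoted "*" in ext-parameter is a literal character that ends the parameter name on the wire. The sketch below handles only that literal case, using the title example from the question. The function names are hypothetical, and the parsing is deliberately simplified (no LWSP handling beyond strip, no validation of attr-char).

```python
from urllib.parse import unquote


def decode_ext_value(ext_value):
    """Split charset'language'value-chars and percent-decode the value."""
    charset, language, value_chars = ext_value.split("'", 2)
    return charset, language, unquote(value_chars, encoding=charset, errors="strict")


def parse_parameter(parameter):
    """Return (name, value); a trailing literal "*" marks an ext-parameter."""
    name, _, raw = parameter.partition("=")
    name, raw = name.strip(), raw.strip()
    if name.endswith("*"):
        _charset, _language, text = decode_ext_value(raw)
        return name[:-1], text
    return name, raw


print(parse_parameter("title*=iso-8859-1'en'%A3%20rates"))  # ('title', '£ rates')
```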

Should packed cookies be treated as a single cookie?

I see some sites, like stackoverflow, use packed cookies where multiple cookies are packed into one. Here's an example:
Set-Cookie: acct=t=&s=; domain=.stackapps.com; expires=Mon, 30-May-2016 20:16:22 GMT; path=/; HttpOnly
Is this just to save sending multiple set-cookie headers, and to avoid sending comma separated cookies on the one set-cookie header? That's allowed--but is it not recommended?
Should the packed cookie just be treated as a single cookie, or does it need to be unpacked and sent back as individual cookies?
I do not know where the idea of "packed" cookies came from. Those are just cookies with the = sign in the value, or at least should be according to the specs. Let us go through the RFCs and see that:
Set-Cookie: acct=t=&s=; domain=.stackapps.com; expires=...
is exactly the same as
Set-Cookie: acct="t=&s="; domain=.stackapps.com; expires=...
Therefore, it is a single cookie and shall be treated as such.
The answer is rather long, sorry for that. I tried to aim it at people who find the grammar rules in the RFCs difficult to understand. If you believe that some piece of the grammar is still difficult to understand, please point it out to me in a comment.
Through the RFCs
The current RFC for the Set-Cookie header is RFC6265, in section 4.1 it has the formal syntax for Set-Cookie:
set-cookie-header = "Set-Cookie:" SP set-cookie-string
set-cookie-string = cookie-pair *( ";" SP cookie-av )
cookie-pair = cookie-name "=" cookie-value
cookie-name = token
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
; US-ASCII characters excluding CTLs,
; whitespace DQUOTE, comma, semicolon,
; and backslash
token = <token, defined in [RFC2616], Section 2.2>
cookie-av = expires-av / max-age-av / domain-av /
path-av / secure-av / httponly-av /
extension-av
expires-av = "Expires=" sane-cookie-date
sane-cookie-date = <rfc1123-date, defined in [RFC2616], Section 3.3.1>
max-age-av = "Max-Age=" non-zero-digit *DIGIT
; In practice, both expires-av and max-age-av
; are limited to dates representable by the
; user agent.
non-zero-digit = %x31-39
; digits 1 through 9
domain-av = "Domain=" domain-value
domain-value = <subdomain>
; defined in [RFC1034], Section 3.5, as
; enhanced by [RFC1123], Section 2.1
path-av = "Path=" path-value
path-value = <any CHAR except CTLs or ";">
secure-av = "Secure"
httponly-av = "HttpOnly"
extension-av = <any CHAR except CTLs or ";">
That is a little terse, but we do not need to go through it all. For a start we have the Set-Cookie: header and a space (SP), then the set-cookie-string, which is defined further.
set-cookie-header = "Set-Cookie:" SP set-cookie-string
set-cookie-string is composed of a cookie-pair (defined further), which is the grammar part that interests us, and optionally a set of any number of cookie-av prefixed with ; and a space. The *() construct allows for any number of occurrences (including zero) of the grammar part.
set-cookie-string = cookie-pair *( ";" SP cookie-av )
cookie-av defines the metadata that can be used in the cookie but it is not needed for our proof, therefore we will abandon its discussion.
The cookie-pair, on the other hand, is a very simple construct: one cookie-name, one mandatory = sign, and one cookie-value.
cookie-pair = cookie-name "=" cookie-value
The cookie-name is defined as a token, which leads us to another RFC, RFC 2616. In section 2.2 of that RFC we find the basic rules that define the token.
cookie-name = token
token = <token, defined in [RFC2616], Section 2.2>
token definition:
CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>
...
token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
The 1*<> syntax means any number of occurrences but at least one occurrence. To find the CTLs use man ascii and check the Dec column, SP is space (as we already saw) and HT is the horizontal tab (9 in the ascii table).
The interesting part for us is the fact that a token cannot contain an = character.
Back to RFC6265:
cookie-pair = cookie-name "=" cookie-value
cookie-name stops at the first = character, and that first = character is always the explicit "=" in the grammar. Now, let's finally define the cookie-value:
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
; US-ASCII characters excluding CTLs,
; whitespace DQUOTE, comma, semicolon,
; and backslash
We already saw that the * there means any number of occurrences, including zero (note that empty cookies are allowed by the RFC!). The interesting part is that the entire cookie-value can be enclosed in double quotes (DQUOTE is the double quote character, as you might have guessed).
But the most interesting part is that the = sign (x3D in the ASCII table) is allowed as a cookie-octet
/ %x3C-5B / <- right there!
Yet the space (x20) and semicolon (x3B) are disallowed.
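The cookie-octet ranges can be transcribed into a character class to confirm this. The following is only a sketch of the cookie-value rule as quoted above; the names COOKIE_VALUE and is_cookie_value are assumptions made for the example.

```python
import re

# cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
COOKIE_OCTET = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]"
# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
COOKIE_VALUE = re.compile(rf'(?:{COOKIE_OCTET}*|"{COOKIE_OCTET}*")')


def is_cookie_value(value):
    """True if the whole string matches the cookie-value rule."""
    return COOKIE_VALUE.fullmatch(value) is not None


print(is_cookie_value("t=&s="))    # True: "=" (x3D) lies in %x3C-5B
print(is_cookie_value('"t=&s="'))  # True: quoted form
print(is_cookie_value("a b"))      # False: space (x20) is excluded
print(is_cookie_value("a;b"))      # False: semicolon (x3B) is excluded
```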
Conclusion
Therefore this Set-Cookie header
Set-Cookie: acct=t=&s=; domain=.stackapps.com; expires=...
shall be interpreted as
set-cookie-header = "Set-Cookie:" SP set-cookie-string
set-cookie-string = cookie-pair *( ";" SP cookie-av )
cookie-pair = cookie-name "=" cookie-value
cookie-name = "acct"
cookie-value = "t=&s="
And the header sending it back to the server shall be
Cookie: acct=t=&s=
Sending it as follows violates the RFC:
Cookie: acct=t&; s=
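In code, the conclusion amounts to splitting the cookie-pair at the first = only. The sketch below is a simplified illustration with invented function names; it does no validation of the attributes and ignores everything a real user agent must do (domain and path matching, expiry).

```python
def split_set_cookie(header_value):
    """Split a Set-Cookie value into (name, value, attributes).

    The cookie-pair is everything before the first ";". The name ends at
    the first "=", so any later "=" stays inside the value.
    """
    cookie_pair, *attributes = header_value.split(";")
    name, separator, value = cookie_pair.strip().partition("=")
    if not separator:
        raise ValueError("cookie-pair requires an '=' sign")
    return name, value, [attribute.strip() for attribute in attributes]


def cookie_header(name, value):
    """Build the request header that sends the cookie back unchanged."""
    return f"Cookie: {name}={value}"


name, value, attributes = split_set_cookie(
    "acct=t=&s=; domain=.stackapps.com; path=/; HttpOnly"
)
print(name, value)                 # acct t=&s=
print(cookie_header(name, value))  # Cookie: acct=t=&s=
```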

Can a URL have an asterisk?

I notice Wikipedia allows them in their URLs, is it legit or does anyone know where it will give me problems?
It's legit and intended to be a delimiter; see Uniform Resource Identifier (URI): Generic Syntax (RFC 3986).
Yes, you can, as per http://www.ietf.org/rfc/rfc1738.txt:
...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL.
refer: http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
Yes. All of the sub-delims characters can be used as is in the path. Sub-delimiters include the asterisk (*) character:
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
There are several types of URIs, but in general, a path is defined as a set of segments separated by a slash:
path-absolute = "/" [ segment-nz *( "/" segment ) ]
The segments are composed of characters (segment-nz cannot be empty):
segment = *pchar
segment-nz = 1*pchar
And pchar includes sub-delims:
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
Reference: https://www.rfc-editor.org/rfc/rfc3986#appendix-A
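These rules are small enough to transcribe into a regular expression and test directly. This is a sketch covering only path-absolute, with pchar written as in RFC 3986 Appendix A (unreserved, pct-encoded, sub-delims, ":" and "@"); the constant and function names are made up for the example.

```python
import re

UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"
# path-absolute = "/" [ segment-nz *( "/" segment ) ]
PATH_ABSOLUTE = re.compile(rf"/(?:{PCHAR}+(?:/{PCHAR}*)*)?")


def is_path_absolute(path):
    """True if the whole string matches the path-absolute rule."""
    return PATH_ABSOLUTE.fullmatch(path) is not None


print(is_path_absolute("/wiki/*"))    # True: "*" is a sub-delim
print(is_path_absolute("/wiki/a b"))  # False: a raw space is not a pchar
```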

Pass delimited List as QueryString value

I haven't had the need to do this before but I want to send a list of IDs in for a query string value in ASP.NET:
?ListOfIDs=1234;3224;&SecondParam=somevalue&ThirdParam=....
I don't think you can add ; or commas right? I couldn't really find a good reference talking about what you can or can't pass in a url.
There is no better authority than the spec! It is always correct by definition.
From the spec (RFC 3986, https://www.rfc-editor.org/rfc/rfc3986#section-3.4) the query string is defined as:
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
Once you piece that all together, you have the answer to your question:
Yes, it is perfectly fine to have commas or semi-colons (they are sub-delims).
The query string itself ends at the end of the URI or at the # character (if a URI fragment is present):
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
As far as what your web framework understands when it parses that query string, that is another matter! Perhaps someone else has the answer for how .NET passes an array in the query string?
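Whatever the framework does, the list can always be recovered by hand once the raw parameter value is available. The sketch below is framework-neutral and written in Python rather than .NET; the function names are hypothetical, the query is split on "&" only, and percent-decoding is left out for brevity.

```python
def split_query(query):
    """Split a raw query string into a dict, treating only "&" as a separator."""
    params = {}
    for field in query.split("&"):
        if not field:
            continue
        name, _, value = field.partition("=")
        params[name] = value
    return params


def id_list(value, delimiter=";"):
    """Turn "1234;3224;" into [1234, 3224], ignoring empty pieces."""
    return [int(part) for part in value.split(delimiter) if part]


params = split_query("ListOfIDs=1234;3224;&SecondParam=somevalue")
print(id_list(params["ListOfIDs"]))  # [1234, 3224]
print(params["SecondParam"])         # somevalue
```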

Resources