Valid characters in cookie string? - http

Is this cookie string valid? Specifically this bit: I0=; []scayt_verLang=6; I can't find a simple breakdown of the spec or an online validator.
Cookie JavascriptEnabled=true; Cms_User_Id=removed6CYjfBVknUjmvf9Pp/uSVYoemoQOXCcB0SOg3kZWX9/KZfo9v5C8O7MmLg1Xz0qXf94Wf86p4rLi2lxxminXfnP/16p6pzmwIU5qz7Of4plcQkK6JM6XiU/zbyZb3gksDOz2s8xjhfzWg0ekjgTZUx76/kFuW10/Rf7O8n05aIZzhUX0Gd9UNjk40zLA1DkJ02uNGtMbnil9P9iqVARhE0CNjCZFxc9qoLpyyRXtqG8nv0V/3k175KXzzg6iW6j9jH/DuGH8ko5YZoo6TxiIcW3ViRnFVfoiMK49iatauD2nF6xOtRV6LLH57RV3DhkhTTb/MQurw8bHYbsZWJRIuSnFwKeFUEOoxvRG4friI6d4Qug11F1oM3ECSdbDeKKPXuq5+IUImt8XXZUtBFUeakqWT4oXgnsToeNoI0=; []scayt_verLang=6; ASP.NET_SessionId=removed0l4mhioft0uavblzdeq; last_msg_check=1425606361000
Thanks,
Joe

Cookie and Set-Cookie HTTP headers are defined in RFC 6265 Section 4 with RFC 2616 Section 2.2 providing the basic types.
cookie-header = "Cookie:" OWS cookie-string OWS
cookie-string = cookie-pair *( ";" SP cookie-pair )
cookie-pair = cookie-name "=" cookie-value
cookie-name = token
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
; US-ASCII characters excluding CTLs,
; whitespace DQUOTE, comma, semicolon,
; and backslash
token = <token, defined in [RFC2616], Section 2.2>
Token as defined in RFC 2616...
token = 1*<any CHAR except CTLs or separators>
CHAR = <any US-ASCII character (octets 0 - 127)>
CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
Let's look at your cookie (I've stripped out most of the junk).
JavascriptEnabled=true; Cms_User_Id=removedlotsoftextI0=; []scayt_verLang=6; ASP.NET_SessionId=removed0l4mhioft0uavblzdeq; last_msg_check=1425606361000
You have a bunch of cookie-pairs...
JavascriptEnabled=true
Cms_User_Id=removedlotsoftextI0=
[]scayt_verLang=6
ASP.NET_SessionId=removed0l4mhioft0uavblzdeq
last_msg_check=1425606361000
The cookie-name []scayt_verLang is invalid because it contains separators which are not allowed in a token.
I0= is not its own pair, but the tail end of the very long value of Cms_User_Id. = is allowed in a cookie-value so it's valid.
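The same check can be done mechanically. Below is a minimal Python sketch, assuming the pairs are separated by exactly "; " as the cookie-string grammar requires. The names check_cookie_string, TOKEN, OCTET and VALUE are made up for this illustration, and the token pattern is written as a positive list of allowed characters, which is equivalent to "any CHAR except CTLs or separators".

```python
import re

# cookie-name is an HTTP token: one or more visible ASCII characters that are
# not separators (this positive list is equivalent to the RFC 2616 definition).
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

# cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
OCTET = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]"
# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
VALUE = re.compile('(?:' + OCTET + '*|"' + OCTET + '*")')


def check_cookie_string(cookie_string):
    """Return a list of (name, value, name_ok, value_ok), one per cookie-pair."""
    results = []
    for pair in cookie_string.split("; "):
        # Split at the first "=" only; later "=" characters belong to the value.
        name, eq, value = pair.partition("=")
        name_ok = bool(eq) and TOKEN.fullmatch(name) is not None
        value_ok = VALUE.fullmatch(value) is not None
        results.append((name, value, name_ok, value_ok))
    return results


if __name__ == "__main__":
    cookie = ("JavascriptEnabled=true; Cms_User_Id=removedlotsoftextI0=; "
              "[]scayt_verLang=6; ASP.NET_SessionId=removed0l4mhioft0uavblzdeq; "
              "last_msg_check=1425606361000")
    for name, value, name_ok, value_ok in check_cookie_string(cookie):
        print(name, "name ok" if name_ok else "INVALID NAME",
              "value ok" if value_ok else "INVALID VALUE")
```

Run against the shortened cookie above, only []scayt_verLang should be reported with an invalid name.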

Related

HTTP empty header

Is it acceptable to have an empty header in HTTP?
By empty I mean ":" with no header name and no header value.
The same question is also relevant to HTTP/2 (I suppose the answer is the same, but I want to be sure).
Thanks.
HTTP defines a header field as:
header-field = field-name ":" OWS field-value OWS
field-name = token
field-value = *( field-content / obs-fold )
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR / obs-text
obs-fold = CRLF 1*( SP / HTAB )
; obsolete line folding
; see Section 3.2.4
The token part is later on defined as:
token = 1*tchar
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / ALPHA
; any VCHAR, except delimiters
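The implication is that the header name must be at least 1 character long, while the value can be 0 or more characters. A bare ":" with no name is therefore not a valid header field, although a named header with an empty value is.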
HTTP/2 uses the same underlying data-model.
https://www.rfc-editor.org/rfc/rfc7230#section-3.2.4
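As a rough illustration of that rule, here is a small Python sketch that splits a header line and rejects it when the name is not a token. The function name split_header_line is hypothetical, and the sketch only checks the field-name; it does not validate the characters of the field-value.

```python
import re

# field-name = token = 1*tchar, so at least one character is required.
FIELD_NAME = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def split_header_line(line):
    """Return (name, value) for 'name: value', or None if the name is not a token."""
    name, colon, value = line.partition(":")
    if not colon or FIELD_NAME.fullmatch(name) is None:
        return None
    # OWS (spaces and tabs) around the value is not part of the value itself.
    return name, value.strip(" \t")


if __name__ == "__main__":
    for line in (":", "X-Empty:", "Host: example.com"):
        print(repr(line), "->", split_header_line(line))
```

With this sketch ":" is rejected, while "X-Empty:" is accepted with an empty value.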

What is the meaning of SP and HT in the separators definition

In the HTTP headers RFC I need to understand the definition of token:
token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
I don't understand what SP and HT mean at the end of the separators list. How would I write this in a regex?
Both are defined in the very same RFC:
SP = <US-ASCII SP, space (32)>
HT = <US-ASCII HT, horizontal-tab (9)>
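For the regex part of the question, one possible way to write it in Python is a negated character class: exclude the control characters, everything outside US-ASCII, and each separator, with SP written as \x20 and HT as \t. This is only a sketch; the names SEPARATORS, TOKEN, TCHAR and is_token are made up here, and the separator list is the one from RFC 2616 Section 2.2.

```python
import re

# The separators, listed literally; SP (32) is \x20 and HT (9) is \t.
SEPARATORS = r'()<>@,;:\\"/\[\]?={}\x20\t'

# token = 1*<any CHAR except CTLs or separators>
#   CTLs are octets 0-31 and 127; CHAR stops at 127, so everything from
#   \x7F upward is excluded as well.
TOKEN = re.compile(r"[^\x00-\x1F\x7F-\U0010FFFF" + SEPARATORS + r"]+")

# Equivalent positive form: the characters that remain once the above are removed.
TCHAR = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def is_token(text):
    """True if text is a non-empty string made only of token characters."""
    return TOKEN.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("Content-Type", "a b", "a\tb", "[]scayt_verLang", ""):
        print(repr(sample), is_token(sample))
```

The positive form is often easier to read and maintain than the negated one, since it lists exactly what is allowed.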

Backslash escaped characters in JavaCC token

I'm writing a JavaCC parser for a character stream like this
Abc \(Def\) Gh (Ij; Kl); Mno (Pqr)
and should get it tokenized like this
Abc \(Def\) Gh
LPAREN
Ij
SEMICOLON
Kl
RPAREN
SEMICOLON
Mno
LPAREN
Pqr
RPAREN
The current token definition is
TOKEN:
{
< WORDCHAR : (~[";", "(", ")"])+ >
| <LPAREN: "(">
| <RPAREN: ")">
| <SEMICOLON: ";">
}
How should I change the WORDCHAR token to include backslash escaped parentheses but not parentheses without leading backslash?
TOKEN:
{
< WORDCHAR : (~[";", "(", ")"] | "\\(" | "\\)")+ >
| <LPAREN: "(">
| <RPAREN: ")">
| <SEMICOLON: ";">
}
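To see why adding the escaped alternatives works, the same idea can be tried outside JavaCC. The following Python sketch is an approximation using the re module, not JavaCC itself: the names TOKEN_RE and tokenize are hypothetical, and whitespace around WORDCHAR tokens is stripped here only to reproduce the expected output from the question.

```python
import re

# One alternative per token kind. Inside WORDCHAR the escaped forms \( and \)
# are tried before the single-character class, so a backslash always takes the
# parenthesis that follows it.
TOKEN_RE = re.compile(r"""
    (?P<WORDCHAR>(?:\\[()]|[^;()])+)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<SEMICOLON>;)
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, text) pairs; WORDCHAR text is stripped of outer whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        value = match.group().strip() if kind == "WORDCHAR" else match.group()
        if value:  # drop WORDCHAR tokens that were whitespace only
            tokens.append((kind, value))
    return tokens


if __name__ == "__main__":
    for kind, value in tokenize(r"Abc \(Def\) Gh (Ij; Kl); Mno (Pqr)"):
        print(value if kind == "WORDCHAR" else kind)
```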

Representing simple mathematics using BNF

I have written the following BNF "code", which attempts to describe simple mathematics using BNF. The issue I am having is that I have no idea how to add parentheses (brackets).
Digit ::= "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9";
Digits ::= <Digit>|<Digit><Digit>;
Number ::= <Digits>|<Digits>.<Digits>;
Addition ::= <Value> + <Value>;
Subtraction ::= <Value> - <Value>;
Multiplication ::= <Value> * <Value>;
Division ::= <Value> / <Value>;
Value ::= <Number>|<Addition>|<Subtraction>|<Multiplication>|<Division>;
The other issue is that I'm not sure that the BNF is 100% correct, as the Value "description" doesn't look right to me.
Digit ::= "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9";
Digits ::= <Digit>|<Digit><Digits>;
Number ::= <Digits>|<Digits>.<Digits>;
Operator ::= "+" | "-" | "*" | "/";
Bracket_Left ::= "(";
Bracket_Right ::= ")";
Value ::= <Number>|<Bracket_Left><Value><Bracket_Right>|<Value><Operator><Value>;
Maybe not the most elegant solution, but it should work. Always keep the power of recursion in mind.
If you also want operator precedence, use the well-known technique of layering the rules recursively (right recursion in my example):
AddSub ::= <MulDiv> ("+" | "-") <AddSub> | <MulDiv>;
MulDiv ::= <Brackets> ("*" | "/") <MulDiv> | <Brackets>;
Brackets ::= "(" <AddSub> ")" | <Decimal>;
Decimal ::= <Integer> "." <Integer> | <Integer>;
Integer ::= <Digit> <Integer> | <Digit>;
Digit ::= "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9";
and operator precedence is then followed automatically by the parser, without further intervention. I didn't invent this method; it has been around for decades, but I have to admit it's rather ingenious.
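A layered grammar like this maps almost one-to-one onto a recursive-descent parser, with one function per rule. The Python sketch below is an illustration under a few assumptions: the class and function names (Parser, tokenize, evaluate) are made up, results are computed as floats, and the AddSub and MulDiv rules are written as loops instead of right recursion. That last change is deliberate: taken literally, the right-recursive rules would group 8-3-2 as 8-(3-2), whereas the loop gives the usual left-to-right result.

```python
import re

# A number (Integer or Integer "." Integer) or a single operator/bracket.
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def add_sub(self):
        # AddSub: a MulDiv followed by any number of ("+" | "-") MulDiv
        value = self.mul_div()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.mul_div()
            else:
                value -= self.mul_div()
        return value

    def mul_div(self):
        # MulDiv: a Brackets followed by any number of ("*" | "/") Brackets
        value = self.brackets()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.brackets()
            else:
                value /= self.brackets()
        return value

    def brackets(self):
        # Brackets: "(" AddSub ")" | Decimal
        token = self.take()
        if token == "(":
            value = self.add_sub()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        if token is None or token in "+-*/)":
            raise ValueError(f"expected a number, got {token!r}")
        return float(token)


def evaluate(text):
    parser = Parser(text)
    value = parser.add_sub()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return value


if __name__ == "__main__":
    for expr in ("2+3*4", "(2+3)*4", "8-3-2"):
        print(expr, "=", evaluate(expr))
```

Because mul_div is called from inside add_sub, multiplication and division bind tighter than addition and subtraction without any explicit precedence table.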

List of valid characters for the fragment identifier in an URL?

I'm using the fragment identifier to create a permalink for AJAX events in my web app similar to this guy. Something like:
http://www.myapp.com/calendar#filter:year/2010/month/5
I've done quite a bit of searching but can't find a list of valid characters for the fragment identifier. The W3C spec doesn't offer anything.
Do I need to encode the characters the same way as in the rest of the URL?
There doesn't seem to be any good information on this anywhere.
See RFC 3986.
fragment = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
So you can use !, $, &, ', (, ), *, +, ,, ;, =, something matching %[0-9a-fA-F]{2}, something matching [a-zA-Z0-9], -, ., _, ~, :, @, /, and ?
https://www.rfc-editor.org/rfc/rfc3986#section-3.5:
fragment = *( pchar / "/" / "?" )
and
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
pct-encoded = "%" HEXDIG HEXDIG
So, combined, the fragment cannot contain #, a raw %, ^, [, ], {, }, |, \, `, ", <, > or a space according to the RFC.
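The fragment rule is small enough to turn into a single regular expression. Here is a minimal Python sketch, with FRAGMENT and is_valid_fragment as made-up names: it accepts any sequence of unreserved characters, sub-delims, ":", "@", "/", "?" and well-formed percent-escapes, and nothing else.

```python
import re

# fragment = *( pchar / "/" / "?" ), with pchar expanded per RFC 3986:
# unreserved, sub-delims, ":" and "@" as single characters, or "%" HEXDIG HEXDIG.
FRAGMENT = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def is_valid_fragment(fragment):
    """True if the text after '#' matches the RFC 3986 fragment rule."""
    return FRAGMENT.fullmatch(fragment) is not None


if __name__ == "__main__":
    for sample in ("filter:year/2010/month/5", "a#b", "100%", "caf%C3%A9"):
        print(repr(sample), is_valid_fragment(sample))
```

Anything that fails this check would need to be percent-encoded before being placed in the fragment.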
Another RFC also covers this: RFC 1738
URL schemeparts for ip based protocols:
HTTP
httpurl = "http://" hostport [ "/" hpath [ "?" search ]]
hpath = hsegment *[ "/" hsegment ]
hsegment = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
search = *[ uchar | ";" | ":" | "@" | "&" | "=" ]

Resources