Is it acceptable to have empty header in HTTP?
By empty i mean ":" no header name and no header value.
The same question is also relvant to HTTP2 (suppose it is the same answer but to be sure).
Thanks.
HTTP defines a header field as:
header-field = field-name ":" OWS field-value OWS
field-name = token
field-value = *( field-content / obs-fold )
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR / obs-text
obs-fold = CRLF 1*( SP / HTAB )
; obsolete line folding
; see Section 3.2.4
The token part is later on defined as:
token = 1*tchar
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / ALPHA
; any VCHAR, except delimiters
The implication is that the header name must be at least 1 byte, and the value can be 0 or more characters.
HTTP/2 uses the same underlying data-model.
https://www.rfc-editor.org/rfc/rfc7230#section-3.2.4
Related
I am trying to create a URI library from the RFC 3986 definitions. While validating host there are 3 allowed formats:
IP-literal
IPv4address
Reg-name
Here IPv4address's format is little ambiguous (or I am making a mistake in understanding that). The definition says it can be of the format: dec-octet "." dec-octet "." dec-octet "." dec-octet and
dec-octet = DIGIT ; 0-9
/ %x31-39 DIGIT ; 10-99
/ "1" 2DIGIT ; 100-199
/ "2" %x30-34 DIGIT ; 200-249
/ "25" %x30-35 ; 250-255
When the definitions says "2" %x30-34 DIGIT does it mean if any of the three digits (0-255 range number) can be represented in percent-encoded format or only some particular digits can be represented in the percent-encoded format.
I have a very long string like this sample bellow and I'm struggling to find a regex to split it in parts according to the patern, for example: '1. OAS / AC' and '2. OAS / AD'.
This slice of text has:
1) a varying number in the beginning
2) two capital letters varying from A to Z
I tried this:
x <- stringr::str_split(have, "([1-9])( OAS / )([A-Z]{2})")
but not works
Thanks in advance, for any help!
Example
require(stringr)
have <- "1. OAS / AC 12345/this is a test string to regex, 2. OAS / AD 79856/this is another test string to regex, 3. OAS / AE 87987/this is a new test string to regex. 4. OAS / AZ 78798456/this is one mode test string to regex."
want <- stringr::str_split(have, "([1-9])( OAS / )([A-Z]{2})")
want <- list(
"1. OAS / AC " = "12345/this is a test string to regex,",
"2. OAS / AD " = "79856/this is another test string to regex,",
"3. OAS / AE " = "87987/this is a new test string to regex.",
"4. OAS / AZ " = "78798456/this is one mode test string to regex."
)
We could do this with a positive lookahead, looking for the pattern of a number, followed by a peroid:
str_split(have, "(?=\\d+\\.)")
[1] "" "1. OAS / AC 12345/this is a test string to regex, "
[3] "2. OAS / AD 79856/this is another test string to regex, " "3. OAS / AE 87987/this is a new test string to regex. "
[5] "4. OAS / AZ 78798456/this is one mode test string to regex."
And we can further clean it up:
str_split(have, "(?=\\d{1,2}\\.)") %>% unlist() %>% .[-1]
[1] "1. OAS / AC 12345/this is a test string to regex, " "2. OAS / AD 79856/this is another test string to regex, "
[3] "3. OAS / AE 87987/this is a new test string to regex. " "4. OAS / AZ 78798456/this is one mode test string to regex."
You may use
library(stringr)
have <- "1. OAS / AC 12345/this is a test string to regex, 2. OAS / AD 79856/this is another test string to regex, 3. OAS / AE 87987/this is a new test string to regex. 4. OAS / AZ 78798456/this is one mode test string to regex."
r <- stringr::str_match_all(have, "(\\d+\\. OAS / [A-Z]{2})\\s*(.*?)(?=\\s*\\d+\\. OAS / [A-Z]{2}|\\z)")
res <- r[[1]][,3]
names(res) <- r[[1]][,2]
Result:
dput(res)
# => structure(c("12345/this is a test string to regex,", "79856/this is another test string to regex,",
# "87987/this is a new test string to regex.", "78798456/this is one mode test string to regex."
# ), .Names = c("1. OAS / AC", "2. OAS / AD", "3. OAS / AE", "4. OAS / AZ"
# ))
See the regex demo.
Pattern details
(\d+\. OAS / [A-Z]{2}) - Capturing group 1:
\d+ - 1+ digits
\. - a .
OAS / - a literal OAS / substring
[A-Z]{2} - two uppercase letters
\s* - 0+ whitespaces
(.*?) - Capturing group 2: any 0+ chars other than line break chars, as few as possible
(?=\s*\d+\. OAS / [A-Z]{2}|\z) - a positive lookahead: immediately to the right of the current location, there must
\s*\d+\. OAS / [A-Z]{2} - 0+ whitespaces, 1+ digits, ., space, /, space, two uppercase letters
| - or
\z - end of string.
They way you described the issue is kinda unclear, but if you want to simply extract till "OAS / AC",
library(qdap)
beg2char(have, " ", 4)#looks for the fourth occurrence of \\s and extracts everything before it.
For the above function to work, the sentences should be individual strings in a character vector
If your aim is to actually insert an "=" sign between the two letter sub-string and the number occurring after "OAS",
gsub("([A-Z])\\s*([0-9])","\\1 = \\2",have,perl=T)
Is this cookie string valid? Specifically this bit I0=; []scayt_verLang=6; I cant find a simple breakdown on the spec or an online validator.
Cookie JavascriptEnabled=true; Cms_User_Id=removed6CYjfBVknUjmvf9Pp/uSVYoemoQOXCcB0SOg3kZWX9/KZfo9v5C8O7MmLg1Xz0qXf94Wf86p4rLi2lxxminXfnP/16p6pzmwIU5qz7Of4plcQkK6JM6XiU/zbyZb3gksDOz2s8xjhfzWg0ekjgTZUx76/kFuW10/Rf7O8n05aIZzhUX0Gd9UNjk40zLA1DkJ02uNGtMbnil9P9iqVARhE0CNjCZFxc9qoLpyyRXtqG8nv0V/3k175KXzzg6iW6j9jH/DuGH8ko5YZoo6TxiIcW3ViRnFVfoiMK49iatauD2nF6xOtRV6LLH57RV3DhkhTTb/MQurw8bHYbsZWJRIuSnFwKeFUEOoxvRG4friI6d4Qug11F1oM3ECSdbDeKKPXuq5+IUImt8XXZUtBFUeakqWT4oXgnsToeNoI0=; []scayt_verLang=6; ASP.NET_SessionId=removed0l4mhioft0uavblzdeq; last_msg_check=1425606361000
Thanks,
Joe
Cookie and Set-Cookie HTTP headers are defined in RFC 6265 Section 4 with RFC 2616 Section 2.2 providing the basic types.
cookie-header = "Cookie:" OWS cookie-string OWS
cookie-string = cookie-pair *( ";" SP cookie-pair )
cookie-pair = cookie-name "=" cookie-value
cookie-name = token
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
; US-ASCII characters excluding CTLs,
; whitespace DQUOTE, comma, semicolon,
; and backslash
token = <token, defined in [RFC2616], Section 2.2>
Token as defined in RFC 2616...
token = 1*<any CHAR except CTLs or separators>
CHAR = <any US-ASCII character (octets 0 - 127)>
CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>
separators = "(" | ")" | "<" | ">" | "#"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
Let's look at your cookie (I've stripped out most of the junk).
JavascriptEnabled=true; Cms_User_Id=removedlotsoftextI0=; []scayt_verLang=6; ASP.NET_SessionId=removed0l4mhioft0uavblzdeq; last_msg_check=1425606361000
You have a bunch of cookie-pairs...
JavascriptEnabled=true
Cms_User_Id=removedlotsoftextI0=
[]scayt_verLang=6
ASP.NET_SessionId=removed0l4mhioft0uavblzdeq
last_msg_check=1425606361000
The cookie-name []scayt_verLang is invalid because it contains separators which are not allowed in a token.
I0= is not its own pair, but the tail end of the very long value of Cms_User_Id. = is allowed in a cookie-value so it's valid.
I'm trying to analyze HTML code and extract all CSS classes and ID's from the source. So I need to extract whatever is between two quotation marks, which can be preceded by either class or id:
id="<extract this>"
class="<extract this>"
/(?:id|class)="([^"]*)"/gi
replacement expression: $1
this regex in english: match either "id" or "class" then an equals sign and quote, then capture everything that is not a quote before matching another quote. do this globally and case insensitively.
Since you prefer using regular expression, here is one way I suppose.
\b(?:id|class)\s*=\s*"([^"]*)"
Regular expression:
\b # the boundary between a word char (\w) and not a word char
(?: # group, but do not capture:
id # 'id'
| # OR
class # 'class'
) # end of grouping
\s* # whitespace (\n, \r, \t, \f, and " ") (0 or more times)
= # '='
\s* # whitespace (\n, \r, \t, \f, and " ") (0 or more times)
" # '"'
( # group and capture to \1:
[^"]* # any character except: '"' (0 or more times)
) # end of \1
" # '"'
You may want to try this:
<?php
$css = <<< EOF
id="<extract this>"
class="<extract this>"id="<extract this2>"
class="<extract this3>"id="<extract this4>"
class="<extract this5>"id="<extract this6>"
class="<extract this7>"id="<extract this8>"
class="<extract this9>"
EOF;
preg_match_all('/(?:id|class)="(.*?)"/sim', $css , $classes, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($classes[1]); $i++) {
echo $classes[1][$i]."\n";
}
/*
<extract this>
<extract this>
<extract this2>
<extract this3>
<extract this4>
<extract this5>
<extract this6>
<extract this7>
<extract this8>
<extract this9>
*/
?>
DEMO:
http://ideone.com/Nr9FPt
I'm using the fragment identifier to create a permalink for AJAX events in my web app similar to this guy. Something like:
http://www.myapp.com/calendar#filter:year/2010/month/5
I've done quite a bit of searching but can't find a list of valid characters for the fragment idenitifer. The W3C spec doesn't offer anything.
Do I need to encode the characters the same as the URL in has in general?
There doesn't seem to be any good information on this anywhere.
See the RFC 3986.
fragment = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "#"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
So you can use !, $, &, ', (, ), *, +, ,, ;, =, something matching %[0-9a-fA-F]{2}, something matching [a-zA-Z0-9], -, ., _, ~, :, #, /, and ?
https://www.rfc-editor.org/rfc/rfc3986#section-3.5:
fragment = *( pchar / "/" / "?" )
and
pchar = unreserved / pct-encoded / sub-delims / ":" / "#"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
pct-encoded = "%" HEXDIG HEXDIG
So, combined, the fragment cannot contain #, a raw %, ^, [, ], {, }, \, ", < and > according to the RFC.
One other RFC speak of that: RFC-1738
URL schemeparts for ip based protocols:
HTTP
httpurl = "http://" hostport [ "/" hpath [ "?" search ]]
hpath = hsegment *[ "/" hsegment ]
hsegment = *[ uchar | ";" | ":" | "#" | "&" | "=" ]
search = *[ uchar | ";" | ":" | "#" | "&" | "=" ]