I'm trying to understand http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2
HTTP/1.1 header field values can be folded onto multiple lines if the
continuation line begins with a space or horizontal tab. All linear
white space, including folding, has the same semantics as SP. A
recipient MAY replace any linear white space with a single SP before
interpreting the field value or forwarding the message downstream.
LWS = [CRLF] 1*( SP | HT )
Can I put any number of <CR><LF><SP> sequences in a row, without putting any header value on the line?
I.e., is this valid: Header:<CR><LF><SP><CR><LF><SP>Value
Yes, but see http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-22.html#rfc.section.3.2.4.p.3 - it's deprecated in the upcoming revision of the HTTP spec.
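As a rough illustration of the "replace any linear white space with a single SP" rule quoted above, here is a minimal Python sketch. The function name unfold is made up for this example, and stripping the leading and trailing space afterwards is my own assumption about how a recipient would tidy the value; it is not a reference implementation.

```python
import re

# LWS per RFC 2616: an optional CRLF followed by one or more SP / HT.
# Consecutive LWS runs are collapsed together.
_LWS = re.compile(r"(?:(?:\r\n)?[ \t]+)+")

def unfold(field_value: str) -> str:
    """Replace each run of linear white space (folds included) with one SP."""
    # Stripping the edges is an assumption of this sketch, not quoted from the RFC text above.
    return _LWS.sub(" ", field_value).strip(" ")

raw = "\r\n \r\n Value"        # what follows "Header:" in the question
print(repr(unfold(raw)))        # 'Value'
```

With this reading, the empty continuation lines in the question simply collapse away and the field value is "Value".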
RFC7233 is nice and clear, except for line endings.
I am specifically interested in the HTTP response body of a multipart/byteranges response. I assume each line is terminated by a CRLF, as HTTP headers are, but the document isn't explicit about it. What totally befuddles me is the last line: --THIS_STRING_SEPARATES--. Is it followed by a CRLF?
Full block:
HTTP/1.1 206 Partial Content
Date: Wed, 15 Nov 1995 06:25:24 GMT
Last-Modified: Wed, 15 Nov 1995 04:58:08 GMT
Content-Length: 1741
Content-Type: multipart/byteranges; boundary=THIS_STRING_SEPARATES
--THIS_STRING_SEPARATES
Content-Type: application/pdf
Content-Range: bytes 500-999/8000
...the first range...
--THIS_STRING_SEPARATES
Content-Type: application/pdf
Content-Range: bytes 7000-7999/8000
...the second range
--THIS_STRING_SEPARATES--
Sorry I really can't find it, so help would be greatly appreciated.
NOTE: please no gut feelings, only RFC references.
If you read RFC 7233 more carefully, Appendix A refers to RFC 2046 Section 5.1 for the actual format of the MIME data within the HTTP body:
When a 206 (Partial Content) response message includes the content of
multiple ranges, they are transmitted as body parts in a multipart
message body ([RFC2046], Section 5.1) with the media type of
"multipart/byteranges".
RFC 2046 Section 5.1 gives the formal definition of the "multipart" media type and specifies how its boundaries are formatted and parsed.
To answer your question, here is the formal syntax from RFC 2046:
The boundary delimiter MUST occur at the beginning of a line, i.e.,
following a CRLF, and the initial CRLF is considered to be attached
to the boundary delimiter line rather than part of the preceding
part. The boundary may be followed by zero or more characters of
linear whitespace. It is then terminated by either another CRLF and
the header fields for the next part, or by two CRLFs, in which case
there are no header fields for the next part. If no Content-Type
field is present it is assumed to be "message/rfc822" in a
"multipart/digest" and "text/plain" otherwise.
NOTE: The CRLF preceding the boundary delimiter line is conceptually
attached to the boundary so that it is possible to have a part that
does not end with a CRLF (line break). Body parts that must be
considered to end with line breaks, therefore, must have two CRLFs
preceding the boundary delimiter line, the first of which is part of
the preceding body part, and the second of which is part of the
encapsulation boundary.
...
The boundary delimiter line following the last body part is a
distinguished delimiter that indicates that no further body parts
will follow. Such a delimiter line is identical to the previous
delimiter lines, with the addition of two more hyphens after the
boundary parameter value.
--gc0pJq0M:08jU534c0p--
NOTE TO IMPLEMENTORS: Boundary string comparisons must compare the
boundary value with the beginning of each candidate line. An exact
match of the entire candidate line is not required; it is sufficient
that the boundary appear in its entirety following the CRLF.
...
The only mandatory global parameter for the "multipart" media type is
the boundary parameter, which consists of 1 to 70 characters from a
set of characters known to be very robust through mail gateways, and
NOT ending with white space. (If a boundary delimiter line appears to
end with white space, the white space must be presumed to have been
added by a gateway, and must be deleted.) It is formally specified
by the following BNF:
boundary := 0*69<bchars> bcharsnospace
bchars := bcharsnospace / " "
bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
"+" / "_" / "," / "-" / "." /
"/" / ":" / "=" / "?"
Overall, the body of a "multipart" entity may be specified as
follows:
dash-boundary := "--" boundary
; boundary taken from the value of
; boundary parameter of the
; Content-Type field.
multipart-body := [preamble CRLF]
dash-boundary transport-padding CRLF
body-part *encapsulation
close-delimiter transport-padding
[CRLF epilogue]
transport-padding := *LWSP-char
; Composers MUST NOT generate
; non-zero length transport
; padding, but receivers MUST
; be able to handle padding
; added by message transports.
encapsulation := delimiter transport-padding
CRLF body-part
delimiter := CRLF dash-boundary
close-delimiter := delimiter "--"
preamble := discard-text
epilogue := discard-text
discard-text := *(*text CRLF) *text
; May be ignored or discarded.
body-part := MIME-part-headers [CRLF *OCTET]
; Lines in a body-part must not start
; with the specified dash-boundary and
; the delimiter must not appear anywhere
; in the body part. Note that the
; semantics of a body-part differ from
; the semantics of a message, as
; described in the text.
OCTET := <any 0-255 octet value>
Each delimiter at the beginning of a new part is terminated by a CRLF, and any CRLF that immediately precedes a delimiter is parsed as part of the boundary and not the data of the preceding part. However, there is no CRLF on the end of the final closing boundary, unless there is an epilogue present (which is very rarely used in email, and I have never seen it used in HTTP as there is no way to determine when the epilogue ends unless there is a valid Content-Length header present, which is not supposed to be used with self-terminating content types like MIME).
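To make the line endings concrete, here is a minimal Python sketch that serializes a multipart/byteranges body following the RFC 2046 grammar quoted above. The function name build_byteranges and the tuple layout are hypothetical, chosen only for this example; it assumes at least one part, writes no preamble or epilogue, and emits no transport padding.

```python
CRLF = b"\r\n"

def build_byteranges(boundary: str, parts: list[tuple[str, str, bytes]]) -> bytes:
    """Serialize (content_type, content_range, payload) tuples as a multipart body."""
    dash_boundary = b"--" + boundary.encode("ascii")
    out = bytearray()
    for i, (ctype, crange, payload) in enumerate(parts):
        if i:
            out += CRLF                      # delimiter := CRLF dash-boundary
        out += dash_boundary + CRLF          # transport-padding left empty
        out += b"Content-Type: " + ctype.encode("ascii") + CRLF
        out += b"Content-Range: " + crange.encode("ascii") + CRLF
        out += CRLF                          # blank line ends the part headers
        out += payload                       # *OCTET, no CRLF appended to the data
    out += CRLF + dash_boundary + b"--"      # close-delimiter := delimiter "--"
    return bytes(out)

body = build_byteranges("THIS_STRING_SEPARATES", [
    ("application/pdf", "bytes 500-999/8000", b"...the first range..."),
    ("application/pdf", "bytes 7000-7999/8000", b"...the second range"),
])
print(body.endswith(b"\r\n--THIS_STRING_SEPARATES--"))   # True
```

The body ends right after the closing "--"; a trailing CRLF would only appear as the start of an optional epilogue.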
That spec references:
https://www.rfc-editor.org/rfc/rfc2046#section-5.1.1
Which explicitly states:
--gc0pJq0M:08jU534c0p
The boundary delimiter MUST occur at the beginning of a line, i.e.,
following a CRLF, and the initial CRLF is considered to be attached
to the boundary delimiter line rather than part of the preceding
part. The boundary may be followed by zero or more characters of
linear whitespace. It is then terminated by either another CRLF and
the header fields for the next part, or by two CRLFs, in which case
there are no header fields for the next part. If no Content-Type
field is present it is assumed to be "message/rfc822" in a
"multipart/digest" and "text/plain" otherwise.
I have refactored a man page's paragraph so that each sentence is on its own line. When rendering with man ./somefile.3, the output is slightly different.
Let me show an example:
This is line 1. This is line 2.
vs.
This is line 1.
This is line 2.
Are rendering like so:
First:
This is line 1. This is line 2.
Second:
This is line 1.  This is line 2.
There is an extra space between the sentences. Note that I have made sure that there is no extra white space in the source. I have more experience with LaTeX, asciidoc, and markdown, where I can control this. Is it possible with troff/groff? I'd like to avoid the extra space if possible; I don't think it should be there.
The troff input convention is to put a newline at the end of each sentence and to let the typesetter do its job with filling. (Although I doubt it was the intent, this also plays nicer with source control.) Therefore, troff considers a sentence to end at the end of an input line that ends with a period (or ? or !, optionally followed by ', ", *, ], ), or †). It also believes that sentences should have two spaces between them. This almost certainly derives from the typography standards at Bell Labs at the time; it's rather curious that this behavior is not settable through any fill modes.
groff does provide a way to set the "inter-sentence" spacing, with the extended .ss request:
.ss word_space_size [sentence_space_size]
Change the size of a space between words. It takes its units as one
twelfth of the space width parameter for the current font. Initially
both the word_space_size and sentence_space_size are 12. In fill mode,
the values specify the minimum distance.
If two arguments are given to the ss request, the second argument sets
the sentence space size. If the second argument is not given, sentence
space size is set to word_space_size. The sentence space size is used
in two circumstances: If the end of a sentence occurs at the end of a
line in fill mode, then both an inter-word space and a sentence space
are added; if two spaces follow the end of a sentence in the middle of
a line, then the second space is a sentence space. If a second
argument is never given to the ss request, the behaviour of UNIX troff
is the same as that exhibited by GNU troff. In GNU troff, as in UNIX
troff, a sentence should always be followed by either a newline or two
spaces.
So you can specify that the "sentence space" should be zero-width by making the request
.ss 12 0
As far as I know, this is a groff extension; heirloom troff supports it, but older dwb derived versions may not.
Example:
This is line 1. This is line 2.
This is line 1.  This is line 2.
This is line 1.
This is line 2.
SET SENTENCE SPACING
.ss 12 0
This is line 1. This is line 2.
This is line 1.  This is line 2.
This is line 1.
This is line 2.
Results:
$ groff -T ascii spaces.tr |sed -n -e/./p
This is line 1. This is line 2.
This is line 1.  This is line 2.
This is line 1.  This is line 2.
SET SENTENCE SPACING
This is line 1. This is line 2.
This is line 1. This is line 2.
This is line 1. This is line 2.
So the following will work, but I hope there is a better option.
This is line 1. \
This is line 2.
renders as
This is line 1. This is line 2.
Below is the HTTP-message definition from the latest HTTP RFC, RFC 7230:
HTTP-message = start-line
*( header-field CRLF )
CRLF
[ message-body ]
Below is the definition of header-field:
header-field = field-name ":" OWS field-value OWS
field-name = token
field-value = *( field-content / obs-fold )
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR / obs-text
obs-fold = CRLF 1*( SP / HTAB )
..and:
obs-text = %x80-FF
..and ABNF's:
VCHAR = %x21-7E
; visible (printing) characters
As we can see, a field-value can contain multiple obs-folds, and each obs-fold contains a CRLF. This seems strange to me, because I thought a CRLF marks the end of a header line. Is there an example in which multiple CRLFs are encoded into one header-field? Or do I misunderstand the definition?
Your understanding of the standard is correct. In the past, multi-line header values were supported under RFC 2616. This feature was known as "Line Folding":
HTTP/1.1 header field values can be folded onto multiple lines if the continuation line begins with a space or horizontal tab. All linear white space, including folding, has the same semantics as SP. A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream.
So the following two forms were equivalent:
Header: value1, value2
and
Header: value1,
value2
The newer RFC 7230 explicitly deprecates this. In fact the "obs" in "obs-fold" stands for "obsolete".
Historically, HTTP header field values could be extended over multiple
lines by preceding each extra line with at least one space or
horizontal tab (obs-fold). This specification deprecates such line
folding except within the message/http media type (Section 8.3.1). A
sender MUST NOT generate a message that includes line folding (i.e.,
that has any field-value that contains a match to the obs-fold rule)
unless the message is intended for packaging within the message/http
media type.
So although I've never seen this feature in practice (or at least haven't noticed it), it exists. Moreover, line folding wasn't completely removed: it is still allowed for messages packaged within the message/http media type.
Multi-line headers are still supported by standard HTTP header parsers in languages such as PHP, Java, and Go.
The only concrete example I managed to find of such a header was in this technet blog post which has this image:
Note the yellow 0d 0a (carriage return, line feed) WITHIN the Content-Type header.
Can the 'class' attribute of HTML5 elements contain line breaks? Is it allowable in the specs and do browsers support it?
I ask because I have some code that dynamically inserts various classes into the element and this has created one very long line that is hard to manage. Normally I would build the class value using a variable but the CMS I'm using requires the template conditional tags to be positioned inline with the HTML. I can't use variables or PHP.
What I found in my research is that some HTML tag attributes need to be a single line, but I haven't been able to discover if the class attribute is one of those.
Does anyone know something about this?
Per the HTML 4 spec, the class attribute is CDATA:
User agents should interpret attribute values as follows:
o Replace character entities with characters
o Ignore line feeds
o Replace each carriage return or tab with a single space.
so you're in good shape there.
The HTML5 spec describes a class as a set of space separated tokens, where a 'space' includes newlines.
So you should be good there, too.
Can the [class] attribute of HTML5 elements contain line breaks?
Yes. The HTML5 spec says:
The attribute, if specified, must have a value that is a set of space-separated tokens representing the various classes that the element belongs to.
The link proceeds to say:
A set of space-separated tokens is a string containing zero or more words (known as tokens) separated by one or more space characters, where words consist of any string of one or more characters, none of which are space characters.
And space characters include:
space (' ')
tab (\t)
line feed (\n)
form feed (\f)
carriage return (\r)
The space characters, for the purposes of this specification, are U+0020 SPACE, "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), and "CR" (U+000D).
Newlines as you would add to UTF-8 documents are:
line feeds (\n)
carriage returns (\r)
a carriage return followed immediately by a line feed (\r\n)
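As a small illustration of why a line break inside a class value is harmless, here is a Python sketch that splits a value on exactly the five space characters listed above. The function name class_tokens is made up for this example, and this is only an approximation of what a browser does, not its actual algorithm.

```python
import re

# The five HTML5 "space characters": SPACE, TAB, LF, FF, CR.
_HTML_SPACE = re.compile(r"[ \t\n\f\r]+")

def class_tokens(value: str) -> list[str]:
    """Split a class attribute value into its space-separated tokens."""
    return [token for token in _HTML_SPACE.split(value) if token]

print(class_tokens("btn\n    btn-primary\r\n\tis-active"))
# ['btn', 'btn-primary', 'is-active']
```

A value spread over several lines yields the same tokens as the single-line form "btn btn-primary is-active".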
I have defined a custom file type with these lines:
syn region SubSubtitle start=+=+ end=+=+
highlight SubSubtitle ctermbg=black ctermfg=DarkGrey
syn region Subtitle start=+==+ end=+==+
highlight Subtitle ctermbg=black ctermfg=DarkMagenta
syn region Title start=+===+ end=+===+
highlight Title ctermbg=black ctermfg=yellow
syn region MasterTitle start=+====+ end=+====+
highlight MasterTitle cterm=bold term=bold ctermbg=black ctermfg=LightBlue
I enclose all of my headings in this kind of document like this:
==== Biggest Heading ==== // this will be bold and light blue
===Sub heading === // this will be yellow
bla bla bla // this will be normally formatted
However, right now whenever I use an equals sign in my code, it is treated as a title. Is there any way that I can force a match to be only on one line?
UPDATE: My previous answer was wrong, you can do this with a region, just do
syn region SubSubtitle start=+=+ end=+=+ oneline
See :help syn-oneline and :help syn-arguments. Guess it shows that I can't actually run vim right now, hunh?
Previous answer
According to my reading of the :help syntax, there's no way to do this with a region. However, you could do this with a syn-match:
syn match SubSubtitle /=\@<!=[^=]*==\@!/
The /=\@<!/ says there's no = immediately before your match, and the /=\@!/ says there's no = immediately after, so this matches exactly one =, a bunch of non-= (not including newlines - to include newlines it would have to be \_[^=]), then exactly one =.
The rest are similar
syn match Subtitle /=\@<!=\{2}[^=]*=\{2}=\@!/
syn match Title /=\@<!=\{3}[^=]*=\{3}=\@!/
syn match MasterTitle /=\@<!=\{4}[^=]*=\{4}=\@!/
You can still do matches within syn-matches, so if you have any nesting going on, it will still work.
For example
syn match Todo /\<TODO\>/ containedin=SubSubtitle,Subtitle,Title,MasterTitle contained
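If it helps to see the lookbehind/lookahead logic outside of Vim, here is a rough Python sketch of the same idea using Python's re module, which is not Vim's regex engine. The function name heading_level is hypothetical. Unlike the Vim patterns above, this sketch requires at least one non-= character between the markers (+ instead of *), so that a bare == in code is not read as an empty heading; that change is my own assumption.

```python
import re

def heading_level(line: str) -> int:
    """Return 4..1 for '====' .. '=' headings on a single line, else 0."""
    for n in (4, 3, 2, 1):
        # (?<!=) : no '=' right before the opening run
        # (?!=)  : no '=' right after the closing run
        # [^=\n] : the heading text may not contain '=' or a newline
        pattern = rf"(?<!=)={{{n}}}[^=\n]+={{{n}}}(?!=)"
        if re.search(pattern, line):
            return n
    return 0

print(heading_level("==== Biggest Heading ===="))  # 4
print(heading_level("===Sub heading ==="))         # 3
print(heading_level("if a == b:"))                 # 0
```

A line with two separate single equals signs, such as "a = b = c", still counts as a level 1 heading here, just as it would with the match-based Vim rules.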