What are the rules around whitespace in attribute selectors? - css

I'm reading the spec on attribute selectors, but I can't find anything that says if whitespace is allowed. I'm guessing it's allowed at the beginning, before and after the operator, and at the end. Is this correct?

The rules on whitespace in attribute selectors are stated in the grammar. Here's the Selectors 3 production for attribute selectors (some tokens substituted with their string equivalents for illustration; S* represents 0 or more whitespace characters):
attrib
: '[' S* [ namespace_prefix ]? IDENT S*
[ [ '^=' |
'$=' |
'*=' |
'=' |
'~=' |
'|=' ] S* [ IDENT | STRING ] S*
]? ']'
;
Of course, the grammar isn't terribly useful to someone looking to understand how to write attribute selectors, as it's intended for someone who's implementing a selector engine.
Here's a plain-English explanation:
Whitespace before the attribute selector
This isn't covered in the above production, but the first obvious rule is that if you're attaching an attribute selector to another simple selector or a pseudo-element, don't use a space:
a[href]::after
If you do, the space is treated as a descendant combinator instead, with the universal selector implied on the attribute selector and anything that may follow it. In other words, these selectors are equivalent to each other, but different from the above:
a [href] ::after
a *[href] *::after
Whitespace inside the attribute selector
Whether you have any whitespace within the brackets and around the comparison operator doesn't matter; I find that browsers seem to treat them as if they weren't there (but I haven't tested extensively). These are all valid according to the grammar and, as far as I've seen, work in all modern browsers:
a[href]
a[ href ]
a[ href="http://stackoverflow.com" ]
a[href ^= "http://"]
a[ href ^= "http://" ]
Whitespace is not allowed between the ^ (or other symbol) and = as these are treated as a single token, and tokens cannot be broken apart.
If IE7 and IE8 implement the grammar correctly, they should be able to handle them all as well.
If a namespace prefix is used, whitespace is not allowed between the prefix and the attribute name.
These are incorrect:
unit[sh| quantity]
unit[ sh| quantity="200" ]
unit[sh| quantity = "200"]
These are correct:
unit[sh|quantity]
unit[ sh|quantity="200" ]
unit[sh|quantity = "200"]
Whitespace within the attribute value
But notice the quotes around the attribute values above; if you leave them out, and you try to select something whose attribute has spaces in its value you have a syntax error.
This is incorrect:
div[class=one two]
This is correct:
div[class="one two"]
This is because an unquoted attribute value is treated as an identifier, which doesn't include whitespace (for obvious reasons), whereas a quoted value is treated as a string. See this spec for more details.
To prevent such errors, I strongly recommend always quoting attribute values, whether in HTML, XHTML (required), XML (required), CSS or jQuery (once required).
Whitespace after the attribute value
As of Selectors 4 (following the original publication of this answer), attribute selectors can accept flags in the form of an identifier appearing after the attribute value. Two flags have been defined pertaining to character case, one for case-insensitive matching:
div[data-foo="bar" i]
And one for case-sensitive matching (whose addition I had a part in, albeit by proxy of the WHATWG):
ol[type="A" s]
ol[type="a" s]
The grammar has been updated thus:
attrib
: '[' S* attrib_name ']'
| '[' S* attrib_name attrib_match [ IDENT | STRING ] S* attrib_flags? ']'
;
attrib_name
: wqname_prefix? IDENT S*
attrib_match
: [ '=' |
PREFIX-MATCH |
SUFFIX-MATCH |
SUBSTRING-MATCH |
INCLUDE-MATCH |
DASH-MATCH
] S*
attrib_flags
: IDENT S*
In plain English: if the attribute value is not quoted (i.e. it is an identifier), whitespace between it and attrib_flags is required; otherwise, if the attribute value is quoted then whitespace is optional, but strongly recommended for the sake of readability. Whitespace between attrib_flags and the closing bracket is optional as always.

Related

How to read CSS syntax on MDN

selectors-list ::=
selector[:pseudo-class] [::pseudo-element]
[, selectors-list]
properties-list ::=
[property : value] [; properties-list]
I'm trying to get familiar with CSS and I'd be happy to understand the rules of reading these (from https://developer.mozilla.org/en-US/docs/Web/CSS/Reference).
This MDN page seems to use a pseudo-BNF notation to describe CSS syntax. The notation used is very confusing because [] :: : ; and , all mean something in CSS, yet they used [] to represent optional parts and ::= for grammar rule definition.
I can give you a rough English translation of what they meant:
style-rule ::=
selectors-list {
properties-list
}
A style-rule is made of a selectors-list followed by { followed by a properties-list followed by }.
selectors-list ::=
selector[:pseudo-class] [::pseudo-element]
[, selectors-list]
A selectors-list is made of a selector, optionally followed by : and a pseudo-class, optionally followed by :: and a pseudo-element, optionally followed by , and another selectors-list.
This definition is not only incorrect (you may use multiple pseudo-class in a row), but has a confusing name. If a pseudo-class and a pseudo-element are something separate from the selector, why would you call a list of all three a selectors-list?
Leaving that aside...
properties-list ::=
[property : value] [; properties-list]
A properties-list can be completely empty, or may contain a property followed by : followed by a value, optionally followed by ; and another properties-list.
And then, they don't even use their pseudo-BNF to define what is a selector, a pseudo-class, a pseudo-element, a property or a value. This whole notation is way more confusing than helpful. This MDN page should probably get rewritten.

Avoiding left recursion in parsing LiveScript object definitions

I'm working on a parser for LiveScript language, and am having trouble with parsing both object property definition forms — key: value and (+|-)key — together. For example:
prop: "val"
+boolProp
-boolProp
prop2: val2
I have the key: value form working with this:
Expression ::= TestExpression
| ParenExpression
| OpExpression
| ObjDefExpression
| PropDefExpression
| LiteralExpression
| ReferenceExpression
PropDefExpression ::= Expression COLON Expression
ObjDefExpression ::= PropDefExpression (NEWLINE PropDefExpression)*
// ... other expressions
But however I try to add ("+"|"-") IDENTIFIER to PropDefExpression or ObjDefExpression, I get errors about using left recursion. What's the (right) way to do this?
The grammar fragment you posted is already left-recursive, i.e. without even adding (+|-)boolprop, the non-terminal 'Expression' derives a form in which 'Expression' reappears as the leftmost symbol:
Expression -> PropDefExpression -> Expression COLON Expression
And it's not just left-recursive, it's ambiguous. E.g.
Expression COLON Expression COLON Expression
can be derived in two different ways (roughly, left-associative vs right-associative).
You can eliminate both these problems by using something more restricted on the left of the colon, e.g.:
PropDefExpression ::= Identifier COLON Expression
Also, another ambiguity: Expression derives PropDefExpression in two different ways, directly and via ObjDefExpression. My guess is, you can drop the direct derivation.
Once you've taken care of those things, it seems to me you should be able to add (+|-)boolprop without errors (unless it conflicts with one of the other kinds of expression that you didn't show).
Mind you, looking at the examples at http://livescript.net, I'm doubtful how much of that you'll be able to capture in a conventional grammar. But if you're just going for a subset, you might be okay.
I don't know how much help this will be, because I know nothing about GrammarKit and not much more about the language you're trying to parse.
However, it seems to me that
PropDefExpression ::= Expression COLON Expression
is not quite accurate, and it is creating an ambiguity when you add the boolean property production because an Expression might start with a unary - operator. In the actual grammar, though, a property cannot start with an arbitrary Expression. There are two types of key-property definitions:
name : expression
parenthesized_expression : expression
(Which is to say, expressions need to start with a ().
That means that a boolean property definition, starting with + or - is recognizable from the first token, which is precisely the condition needed for successful recursive descent parsing. There are several other property definition syntaxes, including names and parenthesized_expressions not followed by a :
That's easy to parse with an LR(1) parser, like the one Jison produces, but to parse it with a recursive-descent parser you need to left-factor. (It's possible that GrammarKit can do this for you, by the way.) Basically, you'd need something like (this is not complete):
PropertyDefinition ::= PropertyPrefix PropertySuffix? | BooleanProperty
PropertyPrefix ::= NAME | ParenthesizedExpression
PropertySuffix ::= COLON Expression | DOT NAME

Converting EBNF to BNF basics

I'm not quite sure how to answer a question for my computer languages class. I am to convert the following statement from EBNF form to BNF form:
EBNF: expr --> [-] term {+ term}
I understand that expressions included within curly braces are to be repeated zero or more times, and that things included within right angle braces represents zero or one options. If my understanding is correct, would this be a correct conversion?
My BNF:
expr --> expr - term
| expr + term
| term
Bonus Reading
Converting EBNF to BNF (general rules)
I don't think that's correct. In fact, I don't think the EBNF is actually valid EBNF. The answer to the question How to convert BNF to EBNF shows how valid EBNF is constructed, quoting from ISO/IEC 14977:1996, the Extended Backus-Naur Form standard.
I think the expression:
expr --> [-] term {+ term}
should be written:
expr = [ '-' ] term { '+', term };
This means that an expression consists of an optional minus sign, followed by a term, followed by a sequence of zero of more occurrences of a plus sign and a term.
Next question: which dialect of BNF are you targeting? Things get tricky here; there are many dialects. However, here's one possible translation:
<expr> ::= [ MINUS ] <term> <opt_add_term_list>
<opt_add_term_list> ::= /* Nothing */
| <opt_add_term_list> <opt_add_term>
<add_term> ::= PLUS term
Where MINUS and PLUS are terminals (for '-' and '+'). This is a very austere but minimal BNF. Another possible translation would be:
<expr> ::= [ MINUS ] <term> { PLUS <term> }*
Where the { ... }* part means zero or more of the contained pattern ... (so PLUS <term> in this example). Or you could use quoted characters:
<expr> ::= [ '-' ] <term> { '+' <term> }*
And so the list of possible alternatives goes on. You'll have to look at the definition of BNF you were given to work to, and you should complain about the very sloppy EBNF you were given, if it was meant to be ISO standard EBNF. If it was just some random BNF-style language called EBNF, I guess it is just the name that is confusing. Private dialects are fine as long as they're defined, but it isn't possible for people not privy to the dialect to know what the correct answer is.

Whitespace in Treetop grammar

How explicit do I need to be when specifying were whitespace is or is not allowed? For instance would these rules:
rule lambda
'lambda' ( '(' params ')' )? block
end
rule params
# ...
end
rule block
'{' # ... '}'
end
be sufficient to match
lambda {
}
Basically do I need to specify everywhere optional whitespace may appear?
Yes, you do. In these rules you need to skip whitespace, but, for instance, when you parse strings, which may contain whitespace, you would like to retain them; that's why you have to specify.
However, before applying treetop to your string, you may try to run a "quick and dirty" regexp-based algorithm that discards whitespace from the places where they're optional. Still, this may be much harder that specifying whitespaces in your grammar.

Do values in CSS attribute selector values need to be quoted? [duplicate]

This question already has answers here:
CSS attribute selectors: The rules on quotes (", ' or none?)
(2 answers)
Closed last year.
e.g.:
a[href="val"]
Does "val" need to have quotes around it? Single or double are acceptable? What about for integers?
TLDR: Quotes are required unless the value meets the identifier specification for CSS2.1
The CSS spec might say they are optional, but the real world presents a different story. When making a comparison against the href attribute you will need to use quotes (single or double work in my very limited testing - latest versions of FF, IE, Chrome.)
Interestingly enough the css spec link referenced by #Pekka happens to use quotes around their href-specific examples.
And it's not just due to non-alpha characters like the period or slashes that give this unique situation a quote requirement - using a partial match selector ~= doesn't work if you just use the "domain" in "domain.com"
Ok, every answer here is wrong (including my own previous answer.) The CSS2 spec didn't clarify whether quotes are required in the selector section itself, but the CSS3 spec does and quotes the rule as a CSS21 implementation:
http://www.w3.org/TR/css3-selectors/
Attribute values must be CSS identifiers or strings. [CSS21] The case-sensitivity of attribute names and values in selectors depends on the document language.
And here is the identifier info:
http://www.w3.org/TR/CSS21/syndata.html#value-def-identifier
In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, two hyphens, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F".
My answer seemed correct but that's because the '~=' is a white-space selector comparator so it will never match a partial string inside an href value. A '*=' comparator does work however. And a partial string like 'domain' does work for matching href='www.domain.com'. But checking for a full domain name would not work because it violates the identifier rule.
According to the examples in the CSS 2.1 specs, quotes are optional.
In the following example, the selector matches all SPAN elements whose "class" attribute has exactly the value "example":
span[class=example] { color: blue; }
Here, the selector matches all SPAN elements whose "hello" attribute has exactly the value "Cleveland" and whose "goodbye" attribute has exactly the value "Columbus":
span[hello="Cleveland"][goodbye="Columbus"] { color: blue; }
Numbers are treated like strings, i.e. they can be quoted, but they don't have to.
No, they don't have to have quotes, tough in order to avoid ambiguities many people do use quotes, which are needed if the value contains whitespace.
Either single or double quotes are fine, and integers will be treated the same way (css does not have a distinction between strings and integers).
See the examples in the spec.
They do not need to be quoted.
There is also no distinction between strings/doubles/integers. CSS isn't Turing-complete, let alone typed.

Resources