How to test a CSS parser? - css

I'm writing a parser to parse CSS.
I started by modifying the CSS reference grammar, to use whichever grammar and lexer syntax are supported by the 3rd-party parser generator tool which I'm using.
I think that I've finished coding the grammar: the parser-generator is now able to generate state transition tables for/from my grammar.
The result (the output from the parser-generator) is approximately 116 "rules", which correspond to 116 cases in a switch statement. Examples of these rules/switch cases are:
Stylesheet begins with specifying a charset
Stylesheet begins without specifying a charset
Stylesheet is empty
Stylesheet begins with whitespace
...etc...
The parser-generator has done all it can for me, and now I'm beginning to write (by hand) the various cases of the switch statement, which will build what I think people call an 'abstract syntax tree'.
My question is about how to test this. I think that what I want is a set of CSS files which exercise the various combinations and possibilities: e.g. one CSS file which specifies a charset, another file which doesn't specify a charset, etc.
Is there a general way to auto-generate this set of input data for an arbitrary grammar or set of rules?
Alternatively, is there a set of CSS-specific files whose purpose is to cover the combinations and possibilities allowed by the standard CSS grammar?
Feel free to comment too if I'm going about this all wrong.
At the moment I don't need:
Files to test handling of illegal input (i.e. of files which don't conform to the grammar)
Testing of how various browsers render based on their parsing of CSS

Microsoft made a set of many thousands of CSS tests for IE8 compliance with the CSS spec.
http://samples.msdn.microsoft.com/ietestcenter/css.htm
While they are focused on testing browser compliance, possibly you could adapt them.
There are also the older W3C test suites, which are not as complete, but might serve your purpose:
http://www.w3.org/Style/CSS/Test/

A context-free grammar implicitly proposes an infinite set of (parse) trees. Each proposed tree has a set of leaves which make up a concrete sentence in the language accepted by that grammar. By exploring the set of proposed trees (e.g., by expanding each nonterminal according to its possible alternatives), you can generate any arbitrary instance of the language. You can generate a set of tests by walking the tree proposals and making random choices. A more focused approach would be to use iterative deepening search to generate sentences ordered by size. With any interesting grammar, you're likely to get a huge number of instances, but hey, that's what automated testing is for.
What I wouldn't do is generate such sentences from your production grammar, because the sentences you generate will be, by definition, the ones it accepts :-{ What you should do is construct your sentence generator using the reference grammar, to exploit the fact that what it accepts and what you've implemented might be different.
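Both generation strategies, plus the differential check against your own parser, can be sketched in a few lines of Python. Everything here is an illustrative assumption: `TOY_CSS` is a hypothetical, heavily simplified grammar (not the CSS reference grammar, which you would transcribe in its place), and `random_sentence`, `sentences_by_depth` and `find_rejections` are made-up helper names. `sentences_by_depth` is the iterative-deepening variant; it orders sentences by derivation-tree depth, which is only a rough proxy for size.

```python
import random
from typing import Callable, Dict, Iterable, Iterator, List

# A grammar maps each nonterminal to its alternatives; each alternative is a
# sequence of symbols. Any symbol that is not a key of the grammar is a terminal.
Grammar = Dict[str, List[List[str]]]

# Hypothetical toy grammar loosely shaped like CSS (NOT the reference grammar).
TOY_CSS: Grammar = {
    "stylesheet": [[], ["charset", "rules"], ["rules"]],
    "charset": [['@charset "UTF-8";']],
    "rules": [["rule"], ["rule", "rules"]],
    "rule": [["selector", "{", "decls", "}"]],
    "selector": [["p"], ["h1"], [".note"]],
    "decls": [[], ["decl"], ["decl", ";", "decls"]],
    "decl": [["color", ":", "red"], ["margin", ":", "0"]],
}


def random_sentence(grammar: Grammar, symbol: str, rng: random.Random,
                    budget: int = 8) -> List[str]:
    """Expand `symbol` by making a random choice at every nonterminal."""
    if symbol not in grammar:
        return [symbol]
    alternatives = grammar[symbol]
    if budget <= 0:
        # Out of budget: take the shortest alternative to steer towards
        # termination (enough for this toy grammar, not for every grammar).
        alternatives = [min(alternatives, key=len)]
    tokens: List[str] = []
    for sym in rng.choice(alternatives):
        tokens.extend(random_sentence(grammar, sym, rng, budget - 1))
    return tokens


def _expand(grammar: Grammar, symbol: str, depth: int) -> Iterator[List[str]]:
    """Yield every sentence derivable from `symbol` with tree depth <= depth."""
    if symbol not in grammar:
        yield [symbol]
    elif depth > 0:
        for alternative in grammar[symbol]:
            yield from _expand_seq(grammar, alternative, depth - 1)


def _expand_seq(grammar: Grammar, symbols: List[str],
                depth: int) -> Iterator[List[str]]:
    """Yield every concatenation of expansions of `symbols`, left to right."""
    if not symbols:
        yield []
        return
    for head in _expand(grammar, symbols[0], depth):
        for tail in _expand_seq(grammar, symbols[1:], depth):
            yield head + tail


def sentences_by_depth(grammar: Grammar, start: str,
                       max_depth: int) -> Iterator[str]:
    """Iterative deepening: shallow derivation trees first, each sentence once."""
    seen = set()
    for depth in range(1, max_depth + 1):
        for tokens in _expand(grammar, start, depth):
            text = " ".join(tokens)
            if text not in seen:
                seen.add(text)
                yield text


def find_rejections(sentences: Iterable[str],
                    accepts: Callable[[str], bool]) -> List[str]:
    """Sentences produced from the reference grammar that the parser rejects."""
    return [s for s in sentences if not accepts(s)]
```

In use, you would call something like `find_rejections(sentences_by_depth(TOY_CSS, "stylesheet", 5), my_parser_accepts)`, where `my_parser_accepts` is a hypothetical wrapper that returns `True` when your parser handles the input without error. The number of sentences grows very quickly with depth, which is why the random generator is worth keeping alongside the exhaustive one.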

4 years late for OP but SimonSapin/css-parsing-tests seems like a decent test suite for parsers.
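If you go that route, a small driver is enough to run your parser against those files. The sketch below rests on an assumption about the suite's layout: to my knowledge each test file is a JSON array whose items alternate between an input and the expected result, with results in the suite's own JSON representation of CSS components (check the repository's README before relying on this). `pair_up`, `load_cases`, `run_cases` and the `parse_to_json` adapter are hypothetical names, not part of the suite.

```python
import json
from typing import Any, Callable, List, Tuple

Case = Tuple[Any, Any]


def pair_up(items: List[Any]) -> List[Case]:
    """Split a flat [input, expected, input, expected, ...] list into pairs."""
    if len(items) % 2 != 0:
        raise ValueError("expected alternating input/expected items")
    return list(zip(items[0::2], items[1::2]))


def load_cases(path: str) -> List[Case]:
    """Read one test file, assumed to be a JSON array of alternating items."""
    with open(path, encoding="utf-8") as f:
        return pair_up(json.load(f))


def run_cases(cases: List[Case],
              parse_to_json: Callable[[Any], Any]) -> List[Tuple[Any, Any, Any]]:
    """Return (input, expected, actual) for every case the parser gets wrong.

    `parse_to_json` is your adapter: it runs your parser and converts the
    result into the same JSON-like structure the test file expects.
    """
    failures = []
    for given, expected in cases:
        actual = parse_to_json(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures
```

Writing `parse_to_json` is where most of the work lies: it has to serialize your AST into exactly the structure the expected results use, so that a plain `!=` comparison is meaningful.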

Related

Why are underscores not recommended for variable names in Julia?

I read in the Julia doc page https://docs.julialang.org/en/v1/manual/variables/#:~:text=Variable%20names%20must%20begin%20with,Sm%20math%20symbols)%20are%20allowed. :
Word separation can be indicated by underscores ('_'), but use of underscores is discouraged unless the name would be hard to read otherwise
My question is: are there any reasons to discourage the use of underscores? Thanks.
I don't think underscores are really discouraged in user code and for internal variables. It is mostly for being consistent with the style in Base Julia, which follows this, mostly. And consistency is good, right?
But if you create a package or module, then the interface normally consists of types and functions. Type names have a strong convention that they should be CapitalCase. User-facing functions are normally lowercase without _, because they are supposed to be simple, brief and should express a single well-defined concept. A bit like the Unix philosophy: every function should do one thing, and do it well.
A convention discouraging composite and long identifier names encourages you to create simple functions. If your function needs a name with underscores, it's possibly a sign that you should break it into multiple functions.
But in your own code, use whatever convention suits you.
I’m no expert on Julia, but the line you quote is located under the header “Stylistic Conventions” and I would presume that’s basically it.
There is an additional section about naming conventions in the docs under Style Guide
There is a line in there that says:
“Underscores are also used to indicate a combination of concepts”.
So if you decided to use a lot of underscores in your function names, the next programmer to work on your code might think you are “combining concepts”.

CSS BEM, which chars for the modifier..?

I'm not sure which is the official website; I've found getbem.com and en.bem.info.
One suggests using -- for the modifier (Naming):
CSS class is formed as block’s or element’s name plus two dashes.
The other _ for the modifier (Modifier name):
A modifier name is delimited by a single underscore (_).
I know I can use either, and really it's just important to be consistent, but I like to try and use official specs whenever possible.
Which is the official website..?
Should I really be using -- or _ for modifiers..?
Which is the official website?
BEM started as an informal set of guidelines by Yandex, which they later formalized on en.bem.info, so in that regard en.bem.info is the "canonical" version of BEM.
With that said, there are many flavors of BEM, and I myself use a variant influenced largely by Harry Roberts and Nicolas Gallagher.
This brings me to your next question:
Should I really be using -- or _ for modifiers?
To that my answer is: you should be consistent in your usage, but you may use whatever character(s) you'd like for the variant of BEM that you're using. Just be sure that everyone on your team understands which variant you're standardizing on.
This is similar to using tabs vs two spaces vs four spaces (vs hotdogs). It doesn't actually make a difference beyond being something that people tend to have an irrationally strong personal preference for.
To help get everyone on the same page, I use an example syntax of a block, element, and modifier that shows which variant we are standardizing on:
I used to primarily use:
the-block__the-element--the-modifier
But I now prefer:
TheBlock_theElement-theModifier
for its brevity.
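One way to make that consistency mechanical rather than a matter of discipline is to build class names in one place, with the separators as parameters. A minimal Python sketch; `bem_class` and its `elem_sep`/`mod_sep` parameters are hypothetical, not part of any BEM tool:

```python
from functools import partial
from typing import Optional


def bem_class(block: str, element: Optional[str] = None,
              modifier: Optional[str] = None,
              elem_sep: str = "__", mod_sep: str = "--") -> str:
    """Join block, optional element and optional modifier with fixed separators."""
    name = block
    if element:
        name += elem_sep + element
    if modifier:
        name += mod_sep + modifier
    return name


# Each team variant is pinned down once and reused everywhere.
classic = bem_class                                       # block__element--modifier
single_underscore = partial(bem_class, mod_sep="_")       # block__element_modifier
compact = partial(bem_class, elem_sep="_", mod_sep="-")   # Block_element-modifier

print(classic("the-block", "the-element", "the-modifier"))  # the-block__the-element--the-modifier
print(compact("TheBlock", "theElement", "theModifier"))     # TheBlock_theElement-theModifier
```

Whichever variant your team picks, fixing the separators once keeps them from drifting between files and developers.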

Avoiding name collisions in QML

While learning QML, I want to learn a good style from the beginning.
However, I have already encountered some problems when it comes to naming, and I can't find anything like "world-acclaimed" coding conventions that cover not only the order of your statements (as found here: http://doc.qt.io/qt-4.8/qml-best-practices-coding.html) but also good naming conventions.
The problems I find are as follows:
properties, IDs, (model) roles: they can all clash pretty easily, especially as the IDs and roles are present throughout multiple layers of items.
So are there any good guidelines on how to name your:
properties
roles
IDs
functions
function variables
components
that have proved worthy in the field?
QML already enforces some naming conventions - types must begin with capital letter, properties must begin with lower case letter, etc.
Unfortunately, QML elements come with quite a lot of stuff, and naming conflicts are common. In that case the "innermost declaration" seems to take precedence when resolving names; that is, your own declarations will shadow stock properties, and there is no way to address those anymore, unlike in C++ where you can write BaseType::stuff. I have outlined a possible approach here in case you need to override and still access "inherited" members.
If all you need is to avoid clashes, prepending something works very well. The most basic way is to use an underscore, as in _something: QML's own names never begin with an underscore, so there is no danger of clashing. For types, I prepend a character as well, which is also useful for sorting/grouping components in the project tree view: U_Something, with U for UI, C for core, P for prototype and so on. It's the same old approach that has been used throughout the history of programming, especially in languages that don't have classes, namespaces and such, where the only way to avoid conflicts is to use names such as VK_ERROR_FORMAT_NOT_SUPPORTED, which are quite common in C APIs such as Vulkan.

How do you document your Less? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I use JSDoc to document all the JS I write, but I am curious about how people document their Less. I guess I could use JSDoc, but it doesn't seem right since Less is not JS. I also want to avoid documenting my Less with JSDoc if there is a standard way that might allow different IDEs to provide tooling support.
Does anyone know of a standard way of documenting Less?
Documenting source code involves different issues, and may have different objectives, than documenting CSS, be it LESS or any other variant.
Source code involves classes and methods with contracts, such as the types and meaning of parameters and return values. It also may have complex logic that requires explanation, or handle multiple conditions, or deal with various edge cases. It may be implementing an API which third parties will consume, which should have its own stand-alone documentation that can be "read". Systems such as JSDoc are designed with all this in mind. People reading the code can easily understand the purpose and signature and logic of the various routines, and the comments can be processed into API documents.
In a similar vein, source code is typically organized logically into a hierarchy of modules and classes. When reading the documentation, it's common to want to jump from a description of a subclass to the description of its superclass, or up to the module level. Tools like JSDoc also make this easy, most often by generating sets of interlinked HTML pages.
On the other hand, consider a library such as Underscore, to which only some parts of the above apply. There are no modules, or classes, or class hierarchies. Instead, it is a bag of tools. Therefore, there is really no need for a lot of JSDoc-like machinery. Instead, what I want to do is to be able to READ the code and easily see what's happening, or get a narrative about the functions provided, probably with some code examples. That's why they use Docco, as recommended by a commenter. It's perfect for that. And as the commenter also mentioned, it can be used with almost any programming language, including CSS.
Compared to "languages" like JavaScript, CSS is (typically) flatter, and does not have the notion of "contracts" of parameters and return values, nor complex computations, although in systems like LESS of course you have mix-ins and calculations. With CSS, you also have the situation that in many cases the effect of the CSS is something visual, like say a button colored a certain way with text of a certain size. We have two potential consumers of comments in CSS: the programmer who is actually looking at the CSS code, and the UI designer or implementor who wants to know what styles are defined and check how they work.
Personally, I would adopt two approaches here, mapped to the two types of consumers. In the CSS code itself, I'd simply comment narratively, describing the purpose and structure of the rule. Parallel to that, I'd build a separate "styleguide" site, which contains visual examples of all the styles. There have been various attempts to automate the creation of such styleguides, with varying degrees of success. I have not used them, so cannot say how useful they might be. Personally, I'd go with a hand-rolled style guide.
It's also worth pointing out that the only thing worse than no documentation is wrong documentation. Whatever documentation approach you take, you have to make sure it's really sustainable and maintainable. In that sense, simpler is better.
Finally, let me note that the need for extensive documentation is inversely proportional to how well designed a set of styles and classes is. There is not much point in papering over baroque designs with poorly factored classes, weird dependencies, and poor naming, with lots of documentation. Instead, you might want to focus on refactoring your CSS so it's at least a bit more self-documenting.

Goto is considered harmful, but did anyone attempt to make code using goto re-usable and maintainable?

Everyone is aware of Dijkstra's "Letters to the editor: go to statement considered harmful" (also here as an .html transcript and here as a .pdf). I was wondering if anyone has attempted to find a way to make code using gotos reusable, maintainable and not harmful, by adding other language extensions or by developing a language which allows for gotos.
The reason I ask is that it occurs to me that code written in assembly language often used gotos and global variables to make the program work well within a limited space. Consider the Atari 2600, which had 128 bytes of RAM and loaded its program from a ROM cartridge. In this case, it was better to use unstructured programming and take full advantage of the freedom it allows, to make the most of a very limited space for the program.
When you compare this with a game programmed today without the use of gotos, the game takes up much more space.
It then occurs to me that perhaps it's possible to program with gotos if some rules or other language changes are made to support them, so that the negative effects of gotos are reduced or eliminated. Has anyone tried to make gotos NOT considered harmful, either by creating a language or by devising rules to follow that allow gotos to be used harmlessly?
If no one looked for a way to use gotos in a non-harmful way, then perhaps we adopted structured programming unnecessarily, based solely on this paper? Perhaps there is another solution which allows for the use of gotos without the downside.
Comparing gotos to structured programming means comparing a situation where the programmer has to remember what every label in the code actually means and does, and where the labels are, to a situation where the conditional branches are explicitly described.
As for the advantage of the goto statement regarding the space a program takes up, I think that games today are big because of the graphics and sound resources they use, e.g. to show 1,000,000 polygons. The cost of a goto compared to that is negligible.
Moreover, structured control-flow statements are ultimately compiled into goto ("jmp") instructions by the compiler when it outputs assembly.
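Python offers a quick way to see this for yourself, with one caveat: CPython compiles to bytecode for its virtual machine rather than to native assembly, so this is an analogy. Still, the standard `dis` module shows that a structured `while` loop and `if` become conditional and unconditional jump instructions. `sum_even_down`, `straight_line` and `jump_opnames` are made-up names for illustration, and the exact opcode names (e.g. `POP_JUMP_IF_FALSE`, `JUMP_BACKWARD`) vary between Python versions.

```python
import dis
from typing import Callable, List


def sum_even_down(n: int) -> int:
    """Structured code: a while loop containing an if."""
    total = 0
    while n > 0:
        if n % 2 == 0:
            total += n
        n -= 1
    return total


def straight_line(x: int) -> int:
    """No branches at all, so no jump instructions are expected."""
    return x + 1


def jump_opnames(func: Callable) -> List[str]:
    """Names of the jump instructions in func's compiled bytecode."""
    return [ins.opname for ins in dis.get_instructions(func) if "JUMP" in ins.opname]


print(jump_opnames(sum_even_down))  # exact names depend on the Python version
print(jump_opnames(straight_line))
```

The loop and the conditional each turn into jumps; the structure exists only in the source, where it constrains how those jumps can be arranged.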
To answer the question, it might be possible to make goto less harmful by creating naming and syntax conventions. Enforcing these conventions as rules, however, is pretty much what structured programming does.
Linus Torvalds once argued that goto can make source code clearer, but goto is useful in such special cases that I would not dare use it as a programmer.
This question is somewhat related to yours, since I think it covers one of the most common situations where a goto is needed.
