Where does $4 come from here? - bnf

This is the first rule of Perl's grammar:
grammar : GRAMPROG
            {
              PL_parser->expect = XSTATE;
            }
          remember stmtseq
            {
              newPROG(block_end($3, $4));
              $$ = 0;
            }
        ;
How can $4 work when there are only 3 elements on the right side?

An embedded action (the code { PL_parser->expect = XSTATE; } which occurs in the middle of the rule) counts as an element. So there are 4 elements. $1 is the terminal GRAMPROG, $2 is the embedded action, $3 is the nonterminal remember, and $4 is the nonterminal stmtseq. (The value of $2 is whatever value is assigned to $$ inside the embedded action. Currently it would be garbage.)

Under the covers, yacc only really supports actions at the end of a production. So when you interleave an action such as { PL_parser->expect = XSTATE; } in the middle of a production, yacc (or whatever descendant you're using) pulls the action out and attaches it to a new empty rule, like this:
grammar : GRAMPROG $$1 remember stmtseq
            { newPROG(block_end($3, $4)); $$ = 0; }
        ;

$$1     : /* empty */
            { PL_parser->expect = XSTATE; }
        ;
(If your yacc variant supports dumping the verbose grammar and you do that, you'll see a lot of $$1, $$2, etc. rules for actions.)
In this case the interleaved action doesn't actually assign anything to $$, but if it had, the grammar rule could have accessed the value as $2.
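To make the renumbering concrete, here is a small Python model of the rewrite described above. It is not yacc's code: desugar and positions are hypothetical helpers, actions are represented as ("action", code) tuples by assumption, and the helper nonterminals are named $$1, $$2, ... following this answer's convention (other yacc variants may pick different names).

```python
def desugar(lhs, rhs):
    """Split a production into (main rule, helper rules).

    rhs items are grammar symbol names (str) or actions, written here as
    ("action", code) tuples. A trailing action stays on the rule; every
    mid-rule action is moved into a fresh empty nonterminal.
    """
    new_rhs = []
    helpers = []            # (name, empty rhs, action code)
    final_action = None
    for i, item in enumerate(rhs):
        is_action = isinstance(item, tuple) and item[0] == "action"
        if is_action and i == len(rhs) - 1:
            final_action = item[1]              # end-of-rule action stays put
        elif is_action:
            name = f"$${len(helpers) + 1}"      # fresh nonterminal: $$1, $$2, ...
            helpers.append((name, [], item[1]))
            new_rhs.append(name)                # it occupies a position, so it gets a $n
        else:
            new_rhs.append(item)
    return (lhs, new_rhs, final_action), helpers


def positions(rhs):
    """Map $1, $2, ... to the symbols of a desugared right-hand side."""
    return {f"${n}": sym for n, sym in enumerate(rhs, start=1)}


if __name__ == "__main__":
    rule, helpers = desugar("grammar", [
        "GRAMPROG",
        ("action", "PL_parser->expect = XSTATE;"),
        "remember",
        "stmtseq",
        ("action", "newPROG(block_end($3,$4)); $$ = 0;"),
    ])
    print(positions(rule[1]))
    print(helpers)
```

With the Perl rule as input, positions(rule[1]) maps $2 to the helper $$1 and $4 to stmtseq, which is why $4 is valid in the final action.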


Need an explanation of an awk command

I want to know how the command below works.
awk '/Conditional jump or move depends on uninitialised value/ {block=1} block {str=str sep $0; sep=RS} /^==.*== $/ {block=0; if (str!~/oracle/ && str!~/OCI/ && str!~/tuxedo1222/ && str!~/vprintf/ && str!~/vfprintf/ && str!~/vtrace/) { if (str!~/^$/){print str}} str=sep=""}' file_name.txt >> CondJump_val.txt
I'd also like to know how to check the texts Oracle, OCI, and so on from the second line only. 
The first step is to rewrite it so it's easier to read:
awk '
/Conditional jump or move depends on uninitialised value/ {block=1}
block {
    str=str sep $0
    sep=RS
}
/^==.*== $/ {
    block=0
    if (str!~/oracle/ && str!~/OCI/ && str!~/tuxedo1222/ && str!~/vprintf/ && str!~/vfprintf/ && str!~/vtrace/) {
        if (str!~/^$/) {
            print str
        }
    }
    str=sep=""
}
' file_name.txt >> CondJump_val.txt
It accumulates the lines from the one containing "Conditional jump ..." through the one matching "==...== " into the variable str.
If the accumulated string does not match several patterns, the string is printed.
I'd also like to know how to check the texts Oracle, OCI, and so on from the second line only.
What does that mean? I assume you don't want to see the "Conditional jump..." line in the output. If that's the case, use the next statement to skip to the next line of input:
/Conditional jump or move depends on uninitialised value/ {
    block=1
    next
}
Perhaps consolidate those regexes into a single alternation?
if (str !~ "oracle|OCI|tuxedo1222|v[f]?printf|vtrace") {
    print str
}
There are two idiomatic awkisms to understand.
The first can be simplified to this:
$ seq 100 | awk '/^22$/{flag=1} /^31$/{flag=0} flag'
Why does this work? In awk, flag can be tested even if it has not been defined yet, which is what the standalone flag pattern does: the input line is printed only when flag is true, and flag=1 is executed only once the regex /^22$/ has matched. In this simple example, flag becomes false again when the regex /^31$/ matches.
This is an awk idiom for executing code between two regex matches on different lines.
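Here is the same flag logic written out in Python, purely as an illustration (flag_range is a hypothetical name). It assumes the end pattern is /^31$/{flag=0} and that it comes before the bare flag pattern, so the line 22 is printed and the line 31 is not.

```python
import re


def flag_range(lines, start, end):
    """Mimic awk '/start/{flag=1} /end/{flag=0} flag' on a list of lines."""
    flag = False                      # an unset awk variable tests as false
    printed = []
    for line in lines:
        if re.search(start, line):    # /^22$/{flag=1}
            flag = True
        if re.search(end, line):      # /^31$/{flag=0}
            flag = False
        if flag:                      # bare pattern: print the line when true
            printed.append(line)
    return printed


print(flag_range([str(n) for n in range(1, 101)], r"^22$", r"^31$"))
```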
In your case, the two regexes are:
/Conditional jump or move depends on uninitialised value/ # start
# in-between, block is true and collect the input into str separated by RS
/^==.*== $/ # end
The other 'awkism' is this:
block {str=str sep $0; sep=RS}
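When block is true, append $0 to str. The first time through, sep is still empty, so nothing is inserted before the first line; after that, sep is RS, so every later line is preceded by RS (a newline by default). The result is:
str = "first line" RS "second line" RS "third line" ...
Both idioms depend on awk letting you use an undefined variable without error: an unset variable acts as 0 in a numeric context and as the empty string in a string context.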
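For comparison, here is a rough Python equivalent of the whole awk program, offered as a sketch: collect_blocks and the sample lines are made up for illustration, rs plays the role of awk's RS (a newline by default), and the exclusions use the consolidated alternation suggested above.

```python
import re

# Patterns copied from the awk program (the exclusions use one alternation).
START = re.compile(r"Conditional jump or move depends on uninitialised value")
END = re.compile(r"^==.*== $")
EXCLUDE = re.compile(r"oracle|OCI|tuxedo1222|vf?printf|vtrace")


def collect_blocks(lines, rs="\n"):
    """Return the blocks the awk command would print, in order."""
    printed = []
    block = False          # awk: an unset variable is false
    text, sep = "", ""     # awk: str and sep start out empty
    for line in lines:
        if START.search(line):
            block = True
        if block:
            text = text + sep + line   # str = str sep $0
            sep = rs                   # later lines get a separator
        if END.search(line):
            block = False
            if not EXCLUDE.search(text) and text != "":
                printed.append(text)
            text, sep = "", ""         # str = sep = ""
    return printed


sample = [
    "==123== Conditional jump or move depends on uninitialised value(s)",
    "==123==    at 0x4005F4: main (demo.c:5)",
    "==123== ",
    "==123== Conditional jump or move depends on uninitialised value(s)",
    "==123==    at 0x4006A0: OCIStmtExecute (libclntsh.so)",
    "==123== ",
]
for block in collect_blocks(sample):
    print(block)
```

The second sample block mentions OCI, so only the first one is kept.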

Regular expression for a tcp/udp port recognition (16-bits)

I have a lex file port_regex.l that contains the following code.
DECIMAL_16bits [ \t]*[:digit:]{1,4}[ \t]*
SPACE [ \t]
%x S_rule S_dst_port
%%
%{
    BEGIN S_rule;
%}
<S_rule>(dst-port){SPACE} { BEGIN(S_dst_port); }
<S_dst_port>\{{DECIMAL_16bits}\} {
    printf("\n\nMATCH [%s]\n\n", yytext);
    BEGIN S_rule;
}
. { ECHO; }
%%
int main(void)
{
    while (yylex() != 0)
        ;
    return 0;
}
int yywrap(void)
{
    return 1;
}
I create an executable from it as follows.
flex port_regex.l
gcc lex.yy.c -o port_regex
which creates an executable called port_regex.
I have a file that contains test data called port.file which is given below.
dst-port {234}
dst-port {236}
dst-port {233}
dst-port {2656}
How do I test port.file using the port_regex executable?
Can I do something like
./port_regex < port.file
I tried the above, and it doesn't seem to work.
So long as your application doesn't become a lot more complex, I think using start conditions is a good way to go, instead of introducing a yacc-generated parser.
A couple of thoughts:
The examples I see sometimes use parentheses with BEGIN (BEGIN(comment)) and sometimes not (BEGIN comment). I doubt that it makes any difference, but you should be consistent.
The book says that the default rule to echo unmatched characters is still in effect, even under exclusive start conditions, so you shouldn't need
. { ECHO; }
and since your start conditions are exclusive, it wouldn't fire anyway. Just to make sure, you might rewrite it as
<*>.|\n ECHO;
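Two details in DECIMAL_16bits may also be worth checking. In flex, as in POSIX regular expressions, a character class expression such as [:digit:] is only recognized inside a bracket expression, so it is normally written [[:digit:]]; on its own, [:digit:] is an ordinary character class matching one of ':', 'd', 'i', 'g' or 't'. Also, {1,4} stops at 9999, while a 16-bit port can be as large as 65535. The Python sketch below is not flex; it only illustrates matching the port.file lines and checking the range, and PORT_RE and find_ports are hypothetical names.

```python
import re

# Assumed line format, taken from port.file: "dst-port {N}", with optional
# blanks inside the braces. \d{1,5} admits five-digit ports; the numeric
# range check (0..65535) is done in code, which is simpler than in a regex.
PORT_RE = re.compile(r"dst-port[ \t]\{[ \t]*(\d{1,5})[ \t]*\}")


def find_ports(text):
    """Return every dst-port value in text that fits in 16 bits."""
    ports = []
    for match in PORT_RE.finditer(text):
        value = int(match.group(1))
        if value <= 65535:
            ports.append(value)
    return ports


print(find_ports("dst-port {234}\ndst-port {236}\ndst-port {233}\ndst-port {2656}\n"))
```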

How to delete partial duplicate lines with AWK?

I have files with these kinds of duplicate lines, where only the last field is different:
I need to remove the first occurrence of the line and leave the second one.
I've tried:
awk '!x[$0]++ {getline; print $0}' file.csv
but it's not working as intended, as it's also removing non-duplicate lines.
#!/bin/awk -f
{
    s = substr($0, 0, match($0, /,[^,]+$/))
    if (!seen[s]) {
        print $0
        seen[s] = 1
    }
}
If your near-duplicates are always adjacent, you can just compare to the previous entry and avoid creating a potentially huge associative array.
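#!/bin/awk -f
{
    s = substr($0, 0, match($0, /,[^,]*$/))
    # print the previous line only when a new group starts
    if (NR > 1 && s != prev) {
        print prev0
    }
    prev = s
    prev0 = $0
}
END {
    # the last line of the last group
    if (NR > 0) print prev0
}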
Edit: Changed the script so it prints the last one in a group of near-duplicates (no tac needed).
As a general strategy (I'm not much of an AWK pro despite taking classes with Aho) you might try:
Concatenate all the fields except the last.
Use this string as a key to a hash.
Store the entire line as the value in the hash.
When you have processed all lines, loop through the hash printing out the values.
This isn't AWK specific and I can't easily provide any sample code, but this is what I would first try.
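The strategy above can be sketched in Python as follows; keep_last_near_duplicates is a hypothetical name, the fields are assumed to be comma-separated, and the sample lines are made up because the question's data is not shown. A later line with the same key overwrites the stored one, so the last occurrence wins, which is what the question asks for.

```python
def keep_last_near_duplicates(lines, sep=","):
    """Keep one line per key, where the key is every field except the last.

    Updating an existing dict key keeps its original position, so the
    output follows the order in which each key first appeared.
    """
    latest = {}
    for line in lines:
        key = line.rsplit(sep, 1)[0]   # all fields except the last
        latest[key] = line             # later lines overwrite earlier ones
    return list(latest.values())


print(keep_last_near_duplicates(["a,b,1", "a,b,2", "c,d,3"]))
```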

Using Vim, how can I make CSS rules into one liners?

I would like to come up with a Vim substitution command to turn multi-line CSS rules, like this one:
#main {
    padding: 0;
    margin: 10px auto;
}
into compacted single-line rules, like so:
#main {padding:0;margin:10px auto;}
I have a ton of CSS rules that are taking up too many lines, and I cannot figure out the :%s/ commands to use.
Here's a one-liner:
:%s/{\_.\{-}}/\=substitute(submatch(0), '\n', '', 'g')/
\_. matches any character, including a newline, and \{-} is the non-greedy version of *, so {\_.\{-}} matches everything between a matching pair of curly braces, inclusive.
The \= allows you to substitute the result of a vim expression, which we here use to strip out all the newlines '\n' from the matched text (in submatch(0)) using the substitute() function.
The inverse (converting the one-line version to multi-line) can also be done as a one liner:
:%s/{\_.\{-}}/\=substitute(submatch(0), '[{;]', '\0\r', 'g')/
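For comparison, here is a rough Python analogue of both substitutions (compact_rules and expand_rules are hypothetical names). re.DOTALL lets . cross newlines like Vim's \_., and .*? is the non-greedy counterpart of \{-}. Like the Vim commands, it leaves indentation and the spaces after colons alone, and it assumes rules contain no nested braces.

```python
import re

# "{" then the shortest run of any characters (newlines included) up to "}".
RULE = re.compile(r"\{.*?\}", re.DOTALL)


def compact_rules(css):
    """Remove the newlines inside each {...} block."""
    return RULE.sub(lambda m: m.group(0).replace("\n", ""), css)


def expand_rules(css):
    """Add a newline after each '{' and ';' inside each {...} block."""
    return RULE.sub(
        lambda m: re.sub(r"[{;]", lambda c: c.group(0) + "\n", m.group(0)),
        css,
    )


print(compact_rules("#main {\npadding: 0;\nmargin: 10px auto;\n}"))
```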
If you are at the beginning or end of the rule, V%J will join it into a single line:
Go to the opening (or closing) brace
Hit V to enter visual mode
Hit % to match the other brace, selecting the whole rule
Hit J to join the lines
Try something like this:
This removes the newlines after opening braces and semicolons ('{' and ';') and then removes the extra whitespace between the concatenated lines.
If you want to change the file, go for rampion's solution.
If you don't want to (or can't) change the file, you can use custom folding instead, since it lets you choose what is folded and how the folded text is displayed. For instance:
" {rtp}/fold/css-fold.vim
" [-- local settings --] {{{1
setlocal foldexpr=CssFold(v:lnum)
setlocal foldtext=CssFoldText()
setlocal foldmethod=expr
let b:width1 = 20
let b:width2 = 15
nnoremap <buffer> + :let b:width2+=1<cr><c-l>
nnoremap <buffer> - :let b:width2-=1<cr><c-l>
" [-- global definitions --] {{{1
if exists('*CssFold')
  " finish
endif
function! CssFold(lnum)
  let cline = getline(a:lnum)
  if cline =~ '{\s*$'
    return 'a1'
  elseif cline =~ '}\s*$'
    return 's1'
  else
    return '='
  endif
endfunction
function! s:Complete(txt, width)
  let length = strlen(a:txt)
  if length > a:width
    return a:txt
  endif
  return a:txt . repeat(' ', a:width - length)
endfunction
function! CssFoldText()
  let lnum = v:foldstart
  let txt = s:Complete(getline(lnum), b:width1)
  let lnum += 1
  while lnum < v:foldend
    let add = s:Complete(substitute(getline(lnum), '^\s*\(\S\+\)\s*:\s*\(.\{-}\)\s*;\s*$', '\1: \2;', ''), b:width2)
    if add !~ '^\s*$'
      let txt .= ' ' . add
    endif
    let lnum += 1
  endwhile
  return txt. '}'
endfunction
I leave the sorting of the fields as an exercise. Hint: get all the lines between v:foldstart+1 and v:foldend in a List, sort the list, build the string, and that's all.
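One way to approach that exercise, sketched in Python rather than Vim script so the logic is easy to test: fold_text is a hypothetical name, lines stands for the folded lines from v:foldstart to v:foldend, and the widths mirror b:width1 and b:width2. The declarations are normalized with the same pattern as the substitute() call, then sorted before being padded and joined.

```python
import re

# Same pattern as the substitute() call in CssFoldText():
# "  name : value ;" is normalized to "name: value;".
DECL = re.compile(r"^\s*(\S+)\s*:\s*(.*?)\s*;\s*$")


def fold_text(lines, width1=20, width2=15):
    """Build a one-line summary of a folded CSS rule with sorted fields."""
    header = lines[0].ljust(width1)              # like s:Complete(..., b:width1)
    fields = []
    for line in lines[1:-1]:                     # v:foldstart+1 .. v:foldend-1
        field = DECL.sub(r"\1: \2;", line)
        if field.strip():                        # skip blank lines
            fields.append(field)
    fields.sort()                                # the "exercise" part
    padded = [f.ljust(width2) for f in fields]   # like s:Complete(..., b:width2)
    return " ".join([header] + padded) + "}"


print(fold_text(["#main {", "  padding: 0;", "  margin: 10px auto;", "}"]))
```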
I won’t answer the question directly; instead, I suggest you reconsider your needs. I think that your “bad” example is in fact the better one: it is more readable, and easier to modify and reason about. Good indentation is very important not only in programming languages, but also in CSS and HTML.
You mention that CSS rules are “taking up too many lines”. If you are worried about file size, you should consider using CSS and JS minifiers like YUI Compressor instead of making the code less readable.
A convenient way of doing this transformation is to run the following short command:
Go to the first line of the file, and use the command gqG to run the whole file through the formatter. This assumes that every run of nonempty lines in the file should be collapsed.

How can I search CSS with Perl?

First question from a long-time user.
I'm writing a Perl script that will go through a number of HTML files, search them line-by-line for instances of "color:" or "background-color:" (the CSS tags) and print the entire line when it comes across one of these instances. This is fairly straightforward.
Now I'll admit I'm still a beginning programmer, so this next part may be extremely obvious, but that's why I came here :).
What I want it to do is when it finds an instance of "color:" or "background-color:" I want it to trace back and find the name of the element, and print that as well. For example:
If my document contained the following CSS:
.css_class {
    font-size: 18px;
    font-weight: bold;
    color: #FFEFA1;
    font-family: Arial, Helvetica, sans-serif;
}
I would want the script to output something like:
Ideally it would output this as a text file.
I would greatly appreciate any advice that could be given to me regarding this!
Here is my script in full thus far:
$color = "color:";
open (FILE, "index.html");
@document = <FILE>;
close (FILE);
foreach $line (@document){
    if($line =~ /$color/){
        print $line;
    }
}
Since you asked for advice (and this isn't a coding service) I'll offer just that.
Always use strictures and warnings:
use strict;
use warnings;
Always check the return value of open calls:
open(FILE, 'filename') or die "Can't read file 'filename' [$!]\n";
Use the three-arg form of open and lexical filehandles instead of globs:
open(my $fh, '<', 'filename') or die "Can't read file 'filename' [$!]\n";
Don't slurp when line-by-line processing will do:
while (my $line = <$fh>) {
    # do something with $line
}
Use backreferences to retrieve data from regex matches:
if ($line =~ /color *: *(#[0-9a-fA-F]{6})/) {
    # color value is in $1
}
Save the class name in a temporary variable so that you have it when you match a color:
if ($line =~ /^\.(\w+) *\{/) {
    $class = $1;
}
Well, this is not as simple as it seems.
CSS classes can be defined in many ways. For example,
.classy {
color: black;
Good luck using a line-by-line approach for parsing that.
Actually, my first approach would be searching CPAN. This looks promising:
CSS - Object oriented access to Cascading Style Sheets (CSS)
I installed HTML::TreeBuilder and CSS modules from CPAN and concocted the following aberration:
use strict;
use HTML::TreeBuilder;
use CSS;
foreach my $file_name (@ARGV) {
    my $tree = HTML::TreeBuilder->new; # empty tree
    $tree->parse_file($file_name);
    my $styles = $tree->find('style');
    if ($styles) {
        foreach my $style ($styles) {
            # This is an insane hack, not guaranteed
            # to work in the future.
            my $css = CSS->new;
            $css->read_string(join "\n", @{$style->{_content}});
            print $css->output;
        }
    }
    $tree = $tree->delete;
}
This only prints all the CSS selectors from a list of HTML files, but nicely formatted, so you should be able to continue from here.
For yet another way to do it, you can ask Perl to read the file in sections other than lines, for example by using "}" as the record separator ($/).
my $color = "color:";
open (my $fh, '<', "index.html") || die "Can't open file: $!";
local $/ = "}";
while( my $section = <$fh>) {
    if($section =~ /$color(.*)/) {
        my ($selector) = $section =~ /(.*)\{/;
        print "$selector, $section\n";
    }
}
Untested! Also, this of course assumes that your CSS neatly ends its sections with a } on a line of its own.
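Here is the same record-separator idea as a Python sketch, for illustration only: splitting on "}" stands in for $/ = "}", colors_by_section is a hypothetical name, and, like the Perl version, it takes the selector from the line that holds "{". Because "color:" is a substring of "background-color:", either property will match; the first one found in a block is reported.

```python
import re


def colors_by_section(css_text, prop="color:"):
    """Return (selector, value) for each "}"-terminated chunk mentioning prop."""
    results = []
    for section in css_text.split("}"):           # like Perl's $/ = "}"
        if prop not in section:
            continue
        selector = re.search(r"(.*)\{", section)   # "." stops at newlines
        value = re.search(re.escape(prop) + r"\s*([^;]*)", section)
        if selector and value:
            results.append((selector.group(1).strip(), value.group(1).strip()))
    return results


css = """.css_class {
font-size: 18px;
font-weight: bold;
color: #FFEFA1;
font-family: Arial, Helvetica, sans-serif;
}
h1 {
margin: 0;
}
"""
print(colors_by_section(css))
```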
I'm not having problems with the regexes but rather with capturing the data. Since CSS elements are typically multi-line, I need to figure out how to create an array between the { and } with each line break as a delimiter for list items.
No, you don't.
For the problem as stated, the only lines of interest will be those containing either a class name or a color definition, and possibly also lines containing } to mark the end of a class. All other lines can be ignored, so there's no need to put them into an array.
Since class specifications cannot be nested[1], the last seen set of class names will always be the active set of classes. Therefore, you need only record the last seen set of class names and, when a color specification is encountered, print those class names.
There are still some potential difficulties handling cases in which a specification block is shared by multiple classes (.foo, .bar, .baz { ... }), which may or may not be spread across multiple lines, or if multiple attributes are defined on the same line, but dealing with those should follow fairly easily from what I've already laid out. Depending on your input data, you may also need to include a basic state engine to keep track of whether you're in comments or not.
[1] That is, although you can have semantically nested classes, such as .foo and .foo .bar, they have to be specified in the CSS file as
.foo {
}
.foo .bar {
}
and cannot be
.foo {
  .bar {
  }
}
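A minimal Python sketch of that approach, under some assumptions: each block's selector line contains the "{", each declaration ends with ";" on the same line, and comments are ignored. colored_selectors and its patterns are hypothetical names. It remembers the last selector seen and reports it whenever a color or background-color declaration appears; the lookbehind keeps properties such as border-color from matching.

```python
import re

SELECTOR = re.compile(r"^\s*([^{}]+?)\s*\{")   # text before "{" on a line
# "color" or "background-color", not preceded by a letter, digit, "_" or "-"
COLOR = re.compile(r"(?<![\w-])((?:background-)?color)\s*:\s*([^;]+);")


def colored_selectors(lines):
    """Return a list of (selector, property, value) for each color declaration."""
    current = None                       # last seen selector(s)
    found = []
    for line in lines:
        opened = SELECTOR.search(line)
        if opened:
            current = opened.group(1)
        for prop, value in COLOR.findall(line):
            found.append((current, prop, value.strip()))
        if "}" in line:
            current = None               # the block has ended
    return found


css_lines = [
    ".css_class {", "font-size: 18px;", "font-weight: bold;",
    "color: #FFEFA1;", "font-family: Arial, Helvetica, sans-serif;", "}",
    "a.link, a.visited {", "  background-color: #fff;", "  border-color: red;", "}",
]
print(colored_selectors(css_lines))
```

A shared block such as .foo, .bar { ... } is reported with the whole selector list, which matches the "last seen set of class names" idea above.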
I have not tested the code below, but something like this should work:
if ($line =~ m/\.(.*?) \{(.*?)color:(.*?);(.*)/) {
    print "$1,$3\n";
}
You should invest some time learning regular expressions for Perl.
