How can I search CSS with Perl? - css

First question from a long time user.
I'm writing a Perl script that will go through a number of HTML files, search them line-by-line for instances of "color:" or "background-color:" (the CSS tags) and print the entire line when it comes across one of these instances. This is fairly straightforward.
Now I'll admit I'm still a beginning programmer, so this next part may be extremely obvious, but that's why I came here :).
What I want it to do is when it finds an instance of "color:" or "background-color:" I want it to trace back and find the name of the element, and print that as well. For example:
If my document contained the following CSS:
.css_class {
font-size: 18px;
font-weight: bold;
color: #FFEFA1;
font-family: Arial, Helvetica, sans-serif;
}
I would want the script to output something like:
css_class,#FFEFA1
Ideally it would output this as a text file.
I would greatly appreciate any advice that could be given to me regarding this!
Here is my script in full thus far:
$color = "color:";
open (FILE, "index.html");
@document = <FILE>;
close (FILE);
foreach $line (@document){
if($line =~ /$color/){
print $line;
}
}

Since you asked for advice (and this isn't a coding service) I'll offer just that.
Always use strictures and warnings:
use strict;
use warnings;
Always check the return value of open calls:
open(FILE, 'filename') or die "Can't read file 'filename' [$!]\n";
Use the three-arg form of open and lexical filehandles instead of globs:
open(my $fh, '<', 'filename') or die "Can't read file 'filename' [$!]\n";
Don't slurp when line-by-line processing will do:
while (my $line = <$fh>) {
# do something with $line
}
Use backreferences to retrieve data from regex matches:
if ($line =~ /color *: *(#[0-9a-fA-F]{6})/) {
# color value is in $1
}
Save the class name in a variable declared outside the loop so that you still have it when you match a color:
if ($line =~ /^\.(\w+) *\{/) {
$class = $1;
}
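To show how the last two pieces fit together, here is a minimal sketch of the "remember the last class, print it when a color appears" idea. It is written in Python purely for illustration (the Perl version follows the same shape), and the names `find_colors`, `CLASS_RE` and `COLOR_RE` are made up for this example. It assumes one selector per line of the form `.name {` and six-digit hex colors, exactly as in the question's sample.

```python
import re

# Hypothetical helper names; the patterns mirror the regexes in the advice above.
CLASS_RE = re.compile(r"^\.(\w+) *\{")
COLOR_RE = re.compile(r"color *: *(#[0-9a-fA-F]{6})")  # also matches background-color

def find_colors(lines):
    """Yield (class_name, color) pairs, remembering the most recent class seen."""
    current = None
    for line in lines:
        m = CLASS_RE.match(line)
        if m:
            current = m.group(1)
        m = COLOR_RE.search(line)
        if m and current is not None:
            yield current, m.group(1)

sample = """.css_class {
font-size: 18px;
font-weight: bold;
color: #FFEFA1;
font-family: Arial, Helvetica, sans-serif;
}""".splitlines()

for name, color in find_colors(sample):
    print(f"{name},{color}")
```

Redirecting the printed lines to a file gives the text output asked for in the question.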

Well, this is not as simple as it seems.
CSS classes can be defined in many ways. For example,
.classy {
color: black;
}
Good luck using a line-by-line approach for parsing that.
Actually, my first approach would be searching CPAN. This looks promising:
CSS - Object oriented access to Cascading Style Sheets (CSS)
Edit:
I installed HTML::TreeBuilder and CSS modules from CPAN and concocted the following aberration:
use strict;
use HTML::TreeBuilder;
use CSS;
foreach my $file_name (@ARGV) {
my $tree = HTML::TreeBuilder->new; # empty tree
$tree->parse_file($file_name);
my $styles = $tree->find('style');
if ($styles) {
foreach my $style ($styles) {
# This is an insane hack, not guaranteed
# to work in the future.
my $css = CSS->new;
$css->read_string(join "\n", @{$style->{_content}});
print $css->output;
}
}
$tree = $tree->delete;
}
This only prints all the CSS selectors from a list of HTML files, but they are nicely formatted, so you should be able to continue from here.

For yet another way to do it, you can ask perl to read from the file in sections other than lines, for example by using the "}" as a record separator.
my $color = "color:";
open(my $fh, '<', "index.html") or die "Can't open file: $!";
{
local $/ = "}";
while (my $section = <$fh>) {
if ($section =~ /$color\s*([^;}\n]*)/) {
my $value = $1;
my ($selector) = $section =~ /(.*?)\s*\{/;
print "$selector, $value\n" if defined $selector;
}
}
}
Untested! Also, this of course assumes that your CSS neatly ends its sections with a } on a line of its own.
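The same block-at-a-time idea can be sketched in Python for comparison. This is only an illustration under simplifying assumptions: the input is plain CSS text (not a whole HTML page), there are no comments or nested braces, and only the first color-like property in each block is reported. The function name `colors_by_block` is made up for this example.

```python
import re

def colors_by_block(css_text):
    """Split on '}' and return (selector, value) for blocks that set a color."""
    results = []
    for section in css_text.split("}"):
        selector, brace, body = section.partition("{")
        if not brace:
            continue  # no '{' in this chunk, so it is not a rule block
        # Matches both "color:" and "background-color:"; first hit only.
        m = re.search(r"color\s*:\s*([^;]+)", body)
        if m:
            results.append((selector.strip(), m.group(1).strip()))
    return results

css = """.a {
  font-size: 1px;
  color: red;
}
.b, .c { background-color: #fff }
.d { margin: 0 }"""

for selector, value in colors_by_block(css):
    print(f"{selector}, {value}")
```

Because each chunk holds a complete rule, grouped selectors such as `.b, .c` come out together without any extra state tracking.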

I'm not having problems with the regex's but rather with the capture of data. Since CSS elements are typically multi-line, I need to figure out how to create an array between the { and } with each linebreak as a delimiter for list items.
No, you don't.
For the problem as stated, the only lines of interest will be those containing either a class name or a color definition, and possibly also lines containing } to mark the end of a class. All other lines can be ignored, so there's no need to put them into an array.
Since class specifications cannot be nested[1], the last seen set of class names will always be the active set of classes. Therefore, you need only record the last seen set of class names and, when a color specification is encountered, print those class names.
There are still some potential difficulties handling cases in which a specification block is shared by multiple classes (.foo, .bar, .baz { ... }), which may or may not be spread across multiple lines, or if multiple attributes are defined on the same line, but dealing with those should follow fairly easily from what I've already laid out. Depending on your input data, you may also need to include a basic state engine to keep track of whether you're in comments or not.
[1] i.e., Although you can have semantically-nested classes, such as .foo and .foo .bar, they have to be specified in the CSS file as
.foo {
...
}
.foo .bar {
...
}
and cannot be
.foo {
...
.bar {
...
}
}

I have not tested the code below, but something like this should work when the whole rule is on a single line:
if ($line =~ m/\.(.*?) \{(.*?)color:(.*?);(.*)/) {
print "$1,$3\n";
}
You should invest some time learning regular expressions for Perl.

Related

How do I create an extract for a delta from the command line

I have a delta with some methods in it.
I want to create a schema extract that contains only the methods in the delta in an automated way so that I don't have to create one by hand or using the hateful selection tree in the Jade IDE.
The jadeworld documentation suggests I might be able to do it:
https://www.jadeworld.com/docs/jade-70/content/resources/userguide/chapter_10_-_extracting_and_loading_schemas/extracting_schemas_as_a_non-gui_client_application.htm
When I try, no extract files are created.
This is the command I am running:
jadclient path=E:\Jade63\System\ schema=JadeSchema ini=C:\Jade63\bin\jade.ini app=JadeBatchExtract endJade File d:\temp\delta.scm d:\temp\delta.ddb d:\temp\param.unl delta=TFS3274
Any help would be appreciated.
For 'File' extracts, you need to specify which schema to extract. This is the fourth parameter, after the UNL file, before adding the delta argument. I've added this to the example below, assuming 'Delta' is the schema name.
jadclient path=E:\Jade63\System\ schema=JadeSchema ini=C:\Jade63\bin\jade.ini app=JadeBatchExtract endJade File d:\temp\delta.scm d:\temp\delta.ddb d:\temp\param.unl Delta delta=TFS3274
Unfortunately, I'm not sure if this will extract just the methods that are in the specified delta. Rather, I believe everything specified by the UNL file will be extracted, but where any methods are checked out to a delta, the version in the specified delta will be extracted.
You'll need to experiment to confirm, but in my experience, patches are more suitable for performing extracts without needing to specify what's changed.
Kevin has answered the question I asked; I'm just adding this bit here for anyone else who happens this way. I was trying to automate creating a UNL file from a delta. The following Perl script will generate a UNL file from a schema extract file. So you can create a schema extract from a delta in the IDE, then run this script on it to create a UNL, which you can then use for creating subsequent extracts.
#!/usr/bin/perl
$state="init";
$class="";
$method="";
@result=();
while(<>)
{
if($state eq "init")
{
if(m/typeDefinitions/)
{
$state="inTypes";
}
}
elsif($state eq "inTypes")
{
if(m/[^(]+\(\r/)
{
$state="inClass";
($class=$_) =~ s/\s*(\S+).*\(/$1/;
$class =~ s/[\r\n]//g;
}
elsif(m/inverseDefinitions/)
{
$state="done";
}
}
elsif($state eq "inClass")
{
if(m/jadeMethodDefinitions/)
{
$state="inMethod";
}
elsif(m/^\s*\)\r/)
{
$state="inTypes";
}
}
elsif($state eq "inMethod")
{
if(m/[^(]+[(]/)
{
($method=$_) =~ s/\s*(\S+)\(.*/$1/;
$method =~ s/[\r\n]//g;
$state="inClass";
push @result, "Method $class $method\n";
}
}
}
@result = sort @result;
print @result;
print "\n";

Building of a string, which depends on variable number of parameters

My question is: how do I build a string in Less that depends on a variable number of parameters? For instance, I would like to make a mixin that helps me write @font-face CSS rules. So I need to build the src: ... property for an arbitrary number of formats (.eot, .ttf, .otf, .woff, .woff2, .svg) of my font. Here is my Less loop to process all font formats in the list:
// @some-types - my list of font formats, something like 'otf', 'eot', ...
// @old-src-value - string with the src for my font from the previous iteration
// @counter - my loop counter
.make-font-src(@some-types; @old-src-value; @counter) when (@counter <= length(@some-types)) {
// Here we get the next font format from @some-types
@font-type: extract(@some-types, @counter);
// Used for building the 'format("opentype")'-like part of the src value string
.get-font-format(@font-type);
// Building the part of the src value string for this iteration
@src-value: e('@{old-src-value}, url("@{font-path}@{font-filename}.@{font-type}") format("@{font-format}")');
// Recursive call of this mixin for looping
.make-font-src(@some-types; @src-value; (@counter + 1));
}
So I'm stuck on how to fetch the complete src value string once all font formats have been processed in the loop. Also, please refer to this codepen demo.
As mentioned in my comment, this would not cause a recursive definition error because you have assigned the value to a different variable and then used it. However, it seems like Less is processing the property-value setting line as soon as the first iteration of the loop is completed. You can verify this by changing the counter value for the first iteration itself to 2 or more.
One solution (a better approach to the problem in my opinion) would be to use the property merging with comma feature and set the property-value pair directly like in the below snippet:
.make-font-src(@some-types; @counter) when (@counter <= length(@some-types)) {
@font-path: 'some/test/path/';
@font-filename: 'Arial';
@font-type: extract(@some-types, @counter);
src+: e('url("@{font-path}@{font-filename}.@{font-type}") format("@{font-type}")');
.make-font-src(@some-types; (@counter + 1));
}
div.test {
.make-font-src('eot', 'woff', 'svg'; 1);
}
This when compiled would produce the following output:
div.test {
src: url("some/test/path/Arial.eot") format("eot"),
url("some/test/path/Arial.woff") format("woff"),
url("some/test/path/Arial.svg") format("svg");
}
Finally, I found my own solution: if we add a special 'getter' mixin with a guard that triggers on the last iteration of the loop, we can get the full src value from our loop mixin.
.getter(@cond; @list) when (@cond = length(@list)) {
@font-src-full: @src-value;
}
}
Here is a fiddle with demo

Regular expression for a tcp/udp port recognition (16-bits)

I have a lex file port_regex.l that contains the following code.
DECIMAL_16bits [ \t]*[:digit:]{1,4}[ \t]*
SPACE [ \t]
%x S_rule S_dst_port
%%
%{
BEGIN S_rule;
%}
<S_rule>(dst-port){SPACE} {
BEGIN(S_dst_port);
}
<S_dst_port>\{{DECIMAL_16bits}\} {
printf("\n\nMATCH [%s]\n\n", yytext);
BEGIN S_rule;
}
. { ECHO; }
%%
int main(void)
{
while (yylex() != 0)
;
return(0);
}
int yywrap(void)
{
return 1;
}
I create an executable from it as follows.
flex port_regex.l
gcc lex.yy.c -o port_regex
which creates an executable called port_regex.
I have a file that contains test data called port.file which is given below.
dst-port {234}
dst-port {236}
dst-port {233}
dst-port {2656}
How do I test port.file using the port_regex executable?
Can I do something like
./port_regex < port.file
I tried the above and it doesn't seem to work.
So long as your application doesn't become a lot more complex, I think using start conditions is a good way to go, instead of introducing a yacc-generated parser.
A couple of thoughts:
The examples I see sometimes use parentheses with BEGIN (BEGIN(comment)) and sometimes not (BEGIN comment). I doubt that it makes any difference, but you should be consistent.
The book says that the default rule to echo unmatched characters is still in effect, even under exclusive start conditions, so you shouldn't need
. { ECHO; }
and since your start conditions are exclusive, it wouldn't fire anyway. Just to make sure, you might rewrite it as
<*>.|\n ECHO;
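Two details of the DECIMAL_16bits pattern are worth checking separately from the start conditions. In flex, a POSIX class only has its special meaning inside a bracket expression, so the digit class is written `[[:digit:]]`; a bare `[:digit:]` is an ordinary character set containing `:`, `d`, `i`, `g` and `t`. Also, a 16-bit port can have up to five digits (0 to 65535), and digit count alone cannot enforce the upper bound. The sketch below, in Python for illustration only, matches the same `dst-port {N}` shape and then range-checks the number; `parse_dst_port` and `RULE` are hypothetical names.

```python
import re

# Mirrors the lex rule: "dst-port", one blank, then {N} with optional blanks around N.
RULE = re.compile(r"dst-port[ \t]\{[ \t]*([0-9]{1,5})[ \t]*\}")

def parse_dst_port(line):
    """Return the port as an int if the line holds a valid 16-bit port, else None."""
    m = RULE.search(line)
    if m is None:
        return None
    port = int(m.group(1))
    # Up to five digits can still exceed 65535, so check the range explicitly.
    return port if port <= 0xFFFF else None

for line in ["dst-port {234}", "dst-port {2656}", "dst-port {65535}", "dst-port {70000}"]:
    print(line, "->", parse_dst_port(line))
```

In a lex-based scanner the same range check would typically live in the action code, after converting yytext to a number.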

Where does $4 come from here?

This is in the first rule of Perl:
grammar : GRAMPROG
{
PL_parser->expect = XSTATE;
}
remember stmtseq
{
newPROG(block_end($3,$4));
$$ = 0;
}
How can $4 work when there're only 3 elements on the right side?
An embedded action (the code { PL_parser->expect = XSTATE; } which occurs in the middle of the rule) counts as an element. So there are 4 elements. $1 is the terminal GRAMPROG, $2 is the embedded action, $3 is the nonterminal remember, and $4 is the nonterminal stmtseq. (The value of $2 is whatever value is assigned to $$ inside the embedded action. Currently it would be garbage.)
Under the covers, yacc only really supports actions at the end of a production. So when you interleave an action { PL_parser->expect = XSTATE; } in the middle of a production, yacc (or whatever descendant you're using) pulls out the action and sticks it at the end of an empty rule, like so:
grammar: GRAMPROG $$1 remember stmtseq
{
newPROG(block_end($3, $4));
$$ = 0;
}
$$1:
{
PL_parser->expect = XSTATE;
}
(If your yacc variant supports dumping the verbose grammar and you do that, you'll see a lot of $$1, $$2, etc. rules for actions.)
In this case the interleaved action doesn't actually assign anything to $$, but if it had, the grammar rule could have accessed the value as $2.

Is there a standard way to diff du outputs to detect where disk space usage has grown the most

I work with a small team of developers where we share a unix file system to store somewhat large datasets. This file system has a somewhat prohibitive quota on it so about once a month we have to figure out where our free space has gone and see what we can recover.
Obviously we use du a fair amount, but this is still a tedious process. I had the thought that we may be able to keep last month's du output around and compare it to this month's to see where we've had the most growth. My guess is that this plan isn't very original.
With this in mind I am asking if there are any scripts out there that already do this.
Thanks.
I wrote a program to do this called diff-du. I can't believe nobody had already done this! Anyhow, I find it useful and I hope you will too.
I really don't know if there is a standard way, but I needed this some time ago and wrote a small Perl script to handle it. Here is the relevant part of my code:
#!/usr/bin/perl
$FileName = "du-previous";
$Location = ">";
$Sizes;
# Current +++++++++++++++++++++++++++++
$Current = `du "$Location"`;
open my $CurrentFile, '<', \$Current;
while (<$CurrentFile>) {
chomp;
if (/^([0-9]+)[ \t]+(.*)$/) {
$Sizes{$2} = $1;
}
}
close($CurrentFile);
# Previous ++++++++++++++++++++++++++++
open(FILE, $FileName);
while (<FILE>) {
chomp;
if (/^([0-9]+)[ \t]+(.*)$/) {
my $Size = $Sizes{$2};
$Sizes{$2} = $Size - $1;
}
}
close(FILE);
# Show result +++++++++++++++++++++++++
SHOW: while (($key, $value) = each(%Sizes)) {
if ($value == 0) {
next SHOW;
}
printf("%-10d %s\n", $value, $key);
}
close(FILE);
#Save Current +++++++++++++++++++++++++
open my $CurrentFile, '<', \$Current;
open(FILE, ">$FileName");
while (<$CurrentFile>) {
chomp;
print FILE $_."\n";
}
close($CurrentFile);
close(FILE);
The code is not very error-tolerant, so you may want to adjust it.
Basically, the code gets the current disk usage information, compares each size with the one from the last time it ran (saved in 'du-previous'), prints the differences, and saves the current usage information.
If you like it, take it.
Hope this helps.
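For comparison, the comparison step described above can be sketched as a pure function over two saved du outputs. This is a Python illustration, not a port of the script: it assumes each du line is a size followed by blanks and a path, treats paths missing from one side as size 0, and sorts by largest growth first. The names `parse_du` and `du_growth` are made up for this example.

```python
import re

DU_LINE = re.compile(r"^([0-9]+)[ \t]+(.*)$")

def parse_du(text):
    """Map each path to its size, reading lines of the form '<size> <path>'."""
    sizes = {}
    for line in text.splitlines():
        m = DU_LINE.match(line)
        if m:
            sizes[m.group(2)] = int(m.group(1))
    return sizes

def du_growth(previous, current):
    """Return (delta, path) pairs, largest growth first, skipping unchanged paths."""
    old, new = parse_du(previous), parse_du(current)
    deltas = [(new.get(p, 0) - old.get(p, 0), p) for p in set(old) | set(new)]
    return sorted((d for d in deltas if d[0] != 0), key=lambda d: (-d[0], d[1]))

previous = "100\t./a\n50\t./b\n150\t.\n"
current = "100\t./a\n80\t./b\n40\t./c\n220\t.\n"

for delta, path in du_growth(previous, current):
    print(f"{delta:<10d} {path}")
```

Keeping the parsing separate from the comparison makes it easy to feed in last month's saved file and a fresh du run without touching the comparison logic.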
What you really really want is the awesome kdirstat.
For completeness, I've also found du-diff and don't see it mentioned in any other answer. Andrew's diff-du (mentioned in another answer) seems to be more advanced than this one.

Resources