How to use regex to match css elements - css

I'm calling indexOf on the head section in jQuery to see whether the CSS contains a font-weight:bold declaration. The problem is that the declaration can be written in four ways:
font-weight:bold
font-weight: bold
font-weight :bold
font-weight : bold
How can I match this using regex?

Try this
\b(?:font-weight\s*:[^;\{\}]*?\bbold)\b
Or, better:
\b(?:font-weight[\s*\\]*:[\s*\\]*?\bbold\b)
Explanation
\b # Assert position at a word boundary
(?: # Match the regular expression below
font-weight # Match the characters “font-weight” literally
[\s*\\] # Match a single character present in the list below
# A whitespace character (spaces, tabs, and line breaks)
# The character “*”
# A \ character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
: # Match the character “:” literally
[\s*\\] # Match a single character present in the list below
# A whitespace character (spaces, tabs, and line breaks)
# The character “*”
# A \ character
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\b # Assert position at a word boundary
bold # Match the characters “bold” literally
\b # Assert position at a word boundary
)
Hope this helps.
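A quick way to check the second pattern against the four spellings from the question is sketched below in Python (the question itself is about JavaScript; the sample strings and the helper name has_bold are made up for illustration).

```python
import re

# Pattern copied from the answer above; its character class also tolerates
# stray "*" and "\" characters around the colon.
FONT_WEIGHT_BOLD = re.compile(r"\b(?:font-weight[\s*\\]*:[\s*\\]*?\bbold\b)")

def has_bold(css_text):
    """Return True if css_text contains text that looks like font-weight: bold."""
    return FONT_WEIGHT_BOLD.search(css_text) is not None

variants = ["font-weight:bold", "font-weight: bold",
            "font-weight :bold", "font-weight : bold"]
results = [has_bold(v) for v in variants]
print(results)                          # all four spellings are found
print(has_bold("font-weight: bolder"))  # the trailing \b rejects "bolder"
```

This is a text search only, so it also reports matches inside comments and string values.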

Remember that font-weight:bold; could also be set as an integer 700 or through a shorthand font notation which expands the range of strings to match quite a bit.
\b(?:font.*?:[^;]*?\b(bold|700))\b
Fiddled
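As a rough check of what this broader pattern accepts, here is a small Python sketch; the sample declarations are invented for illustration and the expectations only cover these few strings.

```python
import re

# Pattern copied from the answer above: "font", anything up to a colon,
# then "bold" or "700" somewhere before the next semicolon.
BOLD_ANY = re.compile(r"\b(?:font.*?:[^;]*?\b(bold|700))\b")

# Hypothetical sample declarations mapped to the expected outcome.
samples = {
    "font-weight: bold;": True,
    "font-weight: 700;": True,
    "font: italic bold 12px/30px Georgia, serif;": True,
    "font-weight: normal;": False,
    "font-size: 12px;": False,
}
matches = {css: BOLD_ANY.search(css) is not None for css in samples}
print(matches == samples)
```

Because .*? may run past several declarations on one line, longer inputs can still produce false positives.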

Try this:
/font-weight\: bold|font-weight\:bold|font-weight \:bold|font-weight \: bold/

Try this:
RegularExpressionValidator.ValidationExpression = "font-weight\s?:\s?bold"

You cannot use regular expressions for that. CSS may contain strings like so
.foo{ content: "font-weight:bold;" }
You need a proper parser. Luckily the browser has one and offers APIs to access the individual CSS rules.
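The objection is easy to reproduce. The Python sketch below uses a simplified pattern of my own (not one of the patterns posted above) and shows it reporting a match even though the stylesheet makes nothing bold.

```python
import re

# Simplified pattern, assumed here for illustration only.
FONT_WEIGHT_BOLD = re.compile(r"font-weight\s*:\s*bold\b")

# The text appears only inside a string value; there is no font-weight declaration.
css = '.foo{ content: "font-weight:bold;" }'
false_positive = FONT_WEIGHT_BOLD.search(css) is not None
print(false_positive)
```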

Related

Regular expression to allow only one space between words

I've a textbox in an ASP.NET application, for which I need to use a regular expression to validate the user input string. Requirements for regex are -
It should allow only one space between words. That is, total number of spaces between words or characters should only be one.
It should ignore leading and trailing spaces.
Matches:
Test
Test abc
Non Matches:
Test abc def
Test abc --> I wanted to include multiple spaces between the 2 words. However the editor ignores these extra spaces while posting a question.
Assuming there must be either one or two 'words' (i.e. sequences of non-space characters)
"\s*\S+(\s\S+)?\s*"
Change \S to [A-Za-z] if you want to allow only letters.
Pretty straightforward:
/^ *(\w+ ?)+ *$/
Fiddle: http://refiddle.com/gls
Maybe this one will do?
\s*\S+?\s?\S*\s*
Edit: It's a server-encoded regex, meaning that you might need to remove one of those escaping slashes.
How about:
^\s*(\w+\s)*\w+\s*$
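A minimal Python sketch of how the last pattern behaves; the helper name is_single_spaced and the test strings are assumptions for illustration. Note that \s also accepts tabs and newlines, not only spaces.

```python
import re

# Last pattern from the answers above: words separated by exactly one
# whitespace character, with optional leading and trailing whitespace.
SINGLE_SPACED = re.compile(r"^\s*(\w+\s)*\w+\s*$")

def is_single_spaced(text):
    """Return True if every gap between words is a single whitespace character."""
    return SINGLE_SPACED.match(text) is not None

for text in ["Test", "Test abc", "  Test abc  ", "Test  abc", ""]:
    print(repr(text), is_single_spaced(text))
```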

RegEx Expression to validate TextBox data

The validator should allow only alphabetic characters (a-z and A-Z), dots (.), commas (,), slashes (/) and hyphens (-). Please help me find one, or tell me how to create one customized to my specifications.
I have tried [a-zA-Z,-/.] and it works but my requirements only allow for a maximum of 1 of each of the non-letter characters I specified (.,/-).
Try: ^[A-Za-z]*[-a-zA-Z,/.]{1}[A-Za-z]*$
Explanation
^ Anchor to start of string
[A-Za-z]* may be surrounded by multiple letters
[-a-zA-Z,/.]{1} Only one of the enclosed characters
[A-Za-z]* may be surrounded by multiple letters
$ Anchor to end of string
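To see what the suggested pattern actually accepts, here is a small Python check; the sample inputs are made up. As far as I can tell it allows at most one special character in total, which is stricter than "one of each" as the question puts it.

```python
import re

# Pattern copied from the answer above.
ONE_SPECIAL = re.compile(r"^[A-Za-z]*[-a-zA-Z,/.]{1}[A-Za-z]*$")

candidates = ["abcd", "ab-cd", "a/b", "a,b.c", "a--b", ""]
checks = {s: ONE_SPECIAL.match(s) is not None for s in candidates}
for text, ok in checks.items():
    print(repr(text), ok)
```

An input such as a,b.c uses each special character only once, yet it is rejected because two special characters appear overall.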

regex to allow space if followed by character

I have an asp.net regularexpressionvalidator that I need to match on a textbox. If there is any text, logically the rules are as follows:
The text must be at least three characters, after any trimming to remove spaces.
Characters allowed are a-zA-Z0-9-' /\&.
I'm having major pain trying to construct an expression that will allow a space as the third character only if there is a fourth non-space character.
Can anyone suggest an expression? My last attempt was:
^[a-zA-Z0-9-'/\\&\.](([a-zA-Z0-9-'/\\&\.][a-zA-Z0-9-' /\\&\.])|([a-zA-Z0-9-' /\\&\.][a-zA-Z0-9-'/\\&\.]))[a-zA-Z0-9-' /\\&\.]{0,}$
but that does not match on 'a a'.
Thanks.
OK, now this is all in one regex:
^\s*(?=[a-zA-Z0-9'/\\&.-])([a-zA-Z0-9'/\\&.\s-]{3,})(?<=\S)\s*$
Explanation:
^ # Start of string
\s* # Optional leading whitespace, don't capture that.
(?= # Assert that...
[a-zA-Z0-9'/\\&.-] # the next character is allowed and non-space
)
( # Match and capture...
[a-zA-Z0-9'/\\&.\s-]{3,} # three or more allowed characters, including space
)
(?<=\S) # Assert that the previous character is not a space
\s* # Optional trailing whitespace, don't capture that.
$ # End of string
This matches
abc
aZ- &//
a ab abc x
aaa
a a
and doesn't match
aa
abc!
a&
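The lists above can be checked with a short Python sketch (Python's re module supports the fixed-width lookbehind used here). The helper name trimmed_value is an assumption; it returns the captured group, which is the input without surrounding whitespace.

```python
import re

# Pattern copied from the answer above.
MIN_THREE = re.compile(
    r"^\s*(?=[a-zA-Z0-9'/\\&.-])([a-zA-Z0-9'/\\&.\s-]{3,})(?<=\S)\s*$"
)

def trimmed_value(text):
    """Return the trimmed text if it is valid, otherwise None."""
    match = MIN_THREE.match(text)
    return match.group(1) if match else None

for text in ["abc", "aZ- &//", "  a a  ", "aa", "abc!", "a&"]:
    print(repr(text), repr(trimmed_value(text)))
```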
Simplifying your allowed characters to be a-z and space for clarity, doesn't this do it?
^ *[a-z][a-z ]+[a-z] *$
Ignore spaces. Now a letter. Then some letters or spaces. Then a letter. Ignore more spaces.
The full thing becomes:
^ *[a-zA-Z0-9-'/\\&\.][a-zA-Z0-9-'/\\&\. ]+[a-zA-Z0-9-'/\\&\.] *$

Regular expression to match maximum of five words

I have a regular expression
^[a-zA-Z+#-.0-9]{1,5}$
which validates that a word contains only alphanumeric characters and a few special characters, and that it is no more than 5 characters long.
How do I make this regular expression accept a maximum of five words, each matching the above regular expression?
^[a-zA-Z+#\-.0-9]{1,5}(\s[a-zA-Z+#\-.0-9]{1,5}){0,4}$
Also, you could use for example [ ] instead of \s if you just want to accept space, not tab and newline. And you could write [ ]+ (or \s+) for any number of spaces (or whitespaces), not just one.
Edit: Removed the invalid solution and fixed the bug mentioned by unicornaddict.
I believe this may be what you're looking for. It forces at least one word of your desired pattern, then zero to four of the same, each preceded by one or more white-space characters:
^XX(\s+XX){0,4}$
where XX is your actual one-word regex.
It's separated into two distinct sections so that you're not required to have white-space at the end of the string. If you want to allow for such white-space, simply add \s* at that point. For example, allowing white-space both at start and end would be:
^\s*XX(\s+XX){0,4}\s*$
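One way to avoid repeating the one-word pattern is to build the full expression from it. The Python sketch below assumes the hyphen-escaped word pattern and invents the names WORD and UP_TO_FIVE.

```python
import re

# One-word pattern (the "XX" above), with the hyphen escaped.
WORD = r"[a-zA-Z+#\-.0-9]{1,5}"

# One word, then up to four more, each preceded by whitespace;
# leading and trailing whitespace is allowed.
UP_TO_FIVE = re.compile(rf"^\s*{WORD}(\s+{WORD}){{0,4}}\s*$")

for text in ["C++ C# .NET", "a b c d e", "a b c d e f", "toolong"]:
    print(repr(text), UP_TO_FIVE.match(text) is not None)
```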
Your regex has a small bug. Besides letters, digits, +, # and period, it also matches every character between # and period in ASCII, such as $, % and *. The hyphen itself is matched only because it happens to fall inside that range. This is because a hyphen in a character class acts as a range metacharacter when it has a character on both sides. To avoid this you'll have to escape the hyphen:
^[a-zA-Z+#\-.0-9]{1,5}$
Or put it at the beginning or end of the character class, so that it's treated literally:
^[-a-zA-Z+#.0-9]{1,5}$
^[a-zA-Z+#.0-9-]{1,5}$
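The effect of the unescaped hyphen can be observed directly. This Python sketch compares the original class with the escaped one on a few invented inputs; in ASCII the range from # to . covers # $ % & ' ( ) * + , - and the period.

```python
import re

unescaped = re.compile(r"^[a-zA-Z+#-.0-9]{1,5}$")  # "#-." is read as a range
escaped = re.compile(r"^[a-zA-Z+#\-.0-9]{1,5}$")   # "\-" is a literal hyphen

comparison = {
    text: (unescaped.match(text) is not None, escaped.match(text) is not None)
    for text in ["a-b", "a$b", "a*b"]
}
for text, (loose, strict) in comparison.items():
    print(text, loose, strict)
```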
Now to match a max of 5 such words you can use:
^(?:[a-zA-Z+#\-.0-9]{1,5}\s+){1,5}$
EDIT: This solution has a severe limitation: it matches only input that ends in white space. To overcome this limitation, see the answer by Jakob.

Trying to remove hex codes from regular expression results

My first question here at SO!
To the point;
I'm pretty newbish when it comes to regular expressions.
To learn it a bit better and create something I can actually use, I'm trying to create a regexp that will find all the CSS tags in a CSS file.
So far, I'm using:
[#.]([a-zA-Z0-9_\-])*
Which is working pretty fine and finds the #TB_window as well as the #TB_window img#TB_Image and the .TB_Image#TB_window.
The problem is it also finds the hex code tags in the CSS file. ie #FFF or #eaeaea.
The .png or .jpg or and 0.75 are found as well..
Actually it's pretty logical that they are found, but aren't there smart workarounds for that?
Like excluding anything between the brackets {..}?
(I'm pretty sure that's possible, but my regexp experience is not much yet).
Thanks in advance!
Cheers!
Mike
CSS is a very simple, regular language, which means it can be completely parsed by regex. All there is to it are groups of selectors, each followed by a group of rules separated by semicolons.
Note that all regexes in this post should have the verbose and dotall flags set (/s and /x in some languages, re.DOTALL and re.VERBOSE in Python).
To get pairs of (selectors, rules):
\s* # Match any initial space
([^{}]+?) # Ungreedily match a string of characters that are not curly braces.
\s* # Arbitrary spacing again.
\{ # Opening brace.
\s* # Arbitrary spacing again.
(.*?) # Ungreedily match anything any number of times.
\s* # Arbitrary spacing again.
\} # Closing brace.
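For reference, the simple pattern above can be compiled in Python with the verbose and dotall flags roughly as follows; the name RE_BLOCK_SIMPLE and the sample CSS are assumptions for illustration.

```python
import re

RE_BLOCK_SIMPLE = re.compile(r"""
    \s*         # Match any initial space
    ([^{}]+?)   # Lazily match characters that are not curly braces
    \s*         # Arbitrary spacing
    \{          # Opening brace
    \s*         # Arbitrary spacing
    (.*?)       # Lazily match anything
    \s*         # Arbitrary spacing
    \}          # Closing brace
""", re.VERBOSE | re.DOTALL)

sample = "body, p { color: red; }\nspan{display:block}"
blocks = RE_BLOCK_SIMPLE.findall(sample)
print(blocks)
```

Each tuple holds the selector text and the body of one block.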
This will not work in the rare case of having a quoted curly bracket in an attribute selector (e.g. img[src~='{abc}']) or in a rule (e.g. background: url('images/ab{c}.jpg')). This can be fixed by complicating the regex some more:
\s* # Match any initial space
((?: # Start the selectors capture group.
[^{}\"\'] # Any character other than braces or quotes.
| # OR
\" # An opening double quote.
(?:[^\"\\]|\\.)* # Either a neither-quote-not-backslash, or an escaped character.
\" # And a closing double quote.
| # OR
\'(?:[^\'\\]|\\.)*\' # Same as above, but for single quotes.
)+?) # Ungreedily match all that once or more.
\s* # Arbitrary spacing again.
\{ # Opening brace.
\s* # Arbitrary spacing again.
((?:[^{}\"\']|\"(?:[^\"\\]|\\.)*\"|\'(?:[^\'\\]|\\.)*\')*?)
# The above line is the same as the one in the selector capture group.
\s* # Arbitrary spacing again.
\} # Closing brace.
# This will even correctly identify escaped quotes.
Woah, that's a handful. But if you approach it in a modular fashion, you'll notice it's not as complex as it seems at first glance.
Now, to split selectors and rules, we have to match strings of characters that are either non-delimiters (where the delimiter is a comma for selectors and a semicolon for rules) or quoted strings with anything inside. We'll use the same pattern we used above.
For selectors:
\s* # Match any initial space
((?: # Start the selectors capture group.
[^,\"\'] # Any character other than commas or quotes.
| # OR
\" # An opening double quote.
(?:[^\"\\]|\\.)* # Either a neither-quote-not-backslash, or an escaped character.
\" # And a closing double quote.
| # OR
\'(?:[^\'\\]|\\.)*\' # Same as above, but for single quotes.
)+?) # Ungreedily match all that.
\s* # Arbitrary spacing.
(?:,|$) # Followed by a comma or the end of a string.
For rules:
\s* # Match any initial space
((?: # Start the rules capture group.
[^;\"\'] # Any character other than semicolons or quotes.
| # OR
\" # An opening double quote.
(?:[^\"\\]|\\.)* # Either a neither-quote-not-backslash, or an escaped character.
\" # And a closing double quote.
| # OR
\'(?:[^\'\\]|\\.)*\' # Same as above, but for single quotes.
)+?) # Ungreedily match all that.
\s* # Arbitrary spacing.
(?:;|$) # Followed by a semicolon or the end of a string.
Finally, for each rule, we can split (once!) on a colon to get a property-value pair.
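Splitting only once matters because values may contain colons themselves. A small Python sketch, using a made-up rule with a URL in it:

```python
import re

# Hypothetical rule whose value contains a colon of its own.
rule = 'background: url("http://example.com/bg.png") no-repeat'

# Split on the first colon only, trimming the whitespace around it.
prop, value = re.split(r"\s*:\s*", rule, maxsplit=1)
print(prop)
print(value)

# Without the limit, the colon inside the URL splits the value as well.
all_parts = re.split(r"\s*:\s*", rule)
print(len(all_parts))
```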
Putting that all together into a Python program (the regexes are the same as above, but non-verbose to save space):
import re
CSS_FILENAME = 'C:/Users/Max/frame.css'
RE_BLOCK = re.compile(r'\s*((?:[^{}"\'\\]|\"(?:[^"\\]|\\.)*"|\'(?:[^\'\\]|\\.)*\')+?)\s*\{\s*((?:[^{}"\'\\]|"(?:[^"\\]|\\.)*"|\'(?:[^\'\\]|\\.)*\')*?)\s*\}', re.DOTALL)
RE_SELECTOR = re.compile(r'\s*((?:[^,"\'\\]|\"(?:[^"\\]|\\.)*\"|\'(?:[^\'\\]|\\.)*\')+?)\s*(?:,|$)', re.DOTALL)
RE_RULE = re.compile(r'\s*((?:[^;"\'\\]|\"(?:[^"\\]|\\.)*\"|\'(?:[^\'\\]|\\.)*\')+?)\s*(?:;|$)', re.DOTALL)
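with open(CSS_FILENAME) as css_file:
    css = css_file.read()
# For each block: the list of selectors and the list of [property, value] pairs.
print([(RE_SELECTOR.findall(i),
        [re.split(r'\s*:\s*', k, maxsplit=1)
         for k in RE_RULE.findall(j)])
       for i, j in RE_BLOCK.findall(css)])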
For this sample CSS:
body, p#abc, #cde, a img .fgh, * {
font-size: normal; background-color: white !important;
-webkit-box-shadow: none
}
#test[src~='{a\'bc}'], .tester {
-webkit-transition: opacity 0.35s linear;
background: white !important url("abc\"cd'{e}.jpg");
border-radius: 20px;
opacity: 0;
-webkit-box-shadow: rgba(0, 0, 0, 0.6) 0px 0px 18px;
}
span {display: block;} .nothing{}
... we get (spaced for clarity):
[(['body',
'p#abc',
'#cde',
'a img .fgh',
'*'],
[['font-size', 'normal'],
['background-color', 'white !important'],
['-webkit-box-shadow', 'none']]),
(["#test[src~='{a\\'bc}']",
'.tester'],
[['-webkit-transition', 'opacity 0.35s linear'],
['background', 'white !important url("abc\\"cd\'{e}.jpg")'],
['border-radius', '20px'],
['opacity', '0'],
['-webkit-box-shadow', 'rgba(0, 0, 0, 0.6) 0px 0px 18px']]),
(['span'],
[['display', 'block']]),
(['.nothing'],
[])]
Simple exercise for the reader: write a regex to remove CSS comments (/* ... */).
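One possible answer to the exercise, as a minimal Python sketch: the names RE_COMMENT and strip_comments are invented here, and the pattern does not protect comment-like text that sits inside a quoted string.

```python
import re

# Lazily match from "/*" to the nearest "*/", across line breaks.
RE_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def strip_comments(css):
    """Remove /* ... */ comments from a CSS string."""
    return RE_COMMENT.sub("", css)

print(strip_comments("a{color:red}/* note\n more */b{}"))
```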
What about this:
([#.]\S+\s*,?)+(?=\{)
First off, I don't see how the RE you posted would find .TB_Image#TB_window. You could do something like:
/^[#\.]([a-zA-Z0-9_\-]*)\s*{?\s*$/
This would find any occurrences of # or . at the beginning of the line, followed by the tag, optionally followed by a { and then a newline.
Note that this would NOT work for lines like .TB_Image { something: 0; } (all on one line) or div.mydivclass since the . is not at the beginning of the line.
Edit: I don't think nested braces are allowed in CSS, so if you read in all the data and get rid of newlines, you could do something like:
/([a-zA-Z0-9_\-]*([#\.][a-zA-Z0-9_\-]+)+\s*,?\s*)+{.*}/
There's a way to tell a regex to ignore newlines as well, but I never seem to get that right.
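The idea raised in the question, excluding everything between the braces, can be sketched in Python as a two-step process: blank out the declaration blocks first, then search what is left. The names below and the sample CSS are assumptions, and the sketch breaks on nested blocks or on braces inside quoted strings.

```python
import re

RE_DECLARATIONS = re.compile(r"\{[^{}]*\}")         # one block without nested braces
RE_ID_OR_CLASS = re.compile(r"[#.][a-zA-Z0-9_\-]+")  # "#" or "." plus at least one name character

def ids_and_classes(css):
    """Return the #id and .class tokens found outside declaration blocks."""
    selectors_only = RE_DECLARATIONS.sub(" ", css)
    return RE_ID_OR_CLASS.findall(selectors_only)

css = """#TB_window img#TB_Image { color: #FFF; background: url(close.png); opacity: 0.75 }
.TB_Image#TB_window { border: 1px solid #eaeaea }"""
found = ids_and_classes(css)
print(found)
```

Hex colours, file extensions and decimals no longer show up because they only occur inside the blocks that were removed.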
It's actually not an easy task to solve with regular expressions since there are a lot of possibilities, consider:
descendant selectors like #someid ul img -- those are all valid tags and are separated by spaces
tags that don't start with . or # (i.e. HTML tag names) -- you have to provide a list of those in order to match them since they have no other difference from attributes
comments
more that I can't think of right now
I think you should instead consider some CSS parsing library suitable for your preferred language.
