Regex for replacing class names throughout codebase in Atom Editor - css

I've been struggling with this for the last few days. I apologize if this is a duplicate; I wasn't able to find what I needed when searching for this particular question.
I have class names like the following:
class="block underline primary"
className="text-center block primary-dark"
class="grey bg-black inline-block block"
I'd like to search an entire codebase using Atom's regex search feature and replace every instance with a new class name. I would need the following rules:
Make sure the string is contained in class="" or className=""
Make sure it's only matching the exact word, so in the above it would only match block and not inline-block if that's what I was searching for.
I currently have the following, which almost does what I need, but it doesn't check for className or class, so it also matches paragraphs and other text outside a class attribute, which I don't want:
(\s(block)\s)|(="(block)\s)|(\s(block)")
Is there any way to do a regex find and replace in one fell swoop? I understand I might not get everything because classes can be programmatically added, but I'd like to get as much as possible with find and replace and not screw other things up. Any help or direction is greatly appreciated.
edit
I also need to account for class names like the following:
class="block block-title blockDisabled"
So in the end I only want to target block and nothing else.

You could use the following expression:
(className|class)="(?:block|block\s([^"]*)|([^"]*)\sblock|([^"]*)\sblock(?=\s)([^"]*))"
From there, if you want to remove the block class, you would use $1="$2$3$4$5" for the replacement.
However, if you want to replace the block class, as your title implies, then you would use $1="$2$3$4$5 replacement-class" for the replacement (where the string "replacement-class" is the class you're replacing the block class with).
Explanation:
(className|class) - Capture the attribute name
=" - Match the opening of the attribute value
(?: - Start of a non-capturing group
block - Match the string block
| - or...
block\s([^"]*) - Capture the classes after the string block
| - or...
([^"]*)\sblock - Capture the classes before the string block
| - or...
([^"]*)\sblock(?=\s)([^"]*) - Capture the classes around the string block
) - End of the non-capturing group
" - Match the closing of the attribute value

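For comparison, the same two rules (only inside class="" or className="", only whole class names) can be applied outside the editor with a short script. This is a minimal Python sketch and not the expression from the answer above: it matches each attribute and swaps whole tokens in its value. The names replace_class and CLASS_ATTR are hypothetical. It assumes double-quoted, literal attribute values, and it collapses runs of whitespace between class names.

```python
import re

# Matches class="..." or className="..." with a double-quoted literal value.
# The lookbehind keeps attributes such as data-class from matching.
CLASS_ATTR = re.compile(r'(?<![\w-])(class(?:Name)?)="([^"]*)"')


def replace_class(source, old, new):
    """Replace the exact class token `old` with `new` inside class attributes."""

    def swap(match):
        # Split the attribute value on whitespace and swap only exact matches,
        # so "inline-block" and "blockDisabled" are left alone.
        tokens = [new if token == old else token for token in match.group(2).split()]
        return '{}="{}"'.format(match.group(1), " ".join(tokens))

    return CLASS_ATTR.sub(swap, source)


if __name__ == "__main__":
    sample = '<p class="grey bg-black inline-block block">block</p>'
    print(replace_class(sample, "block", "box"))
```

Because the callback works on whole tokens, the paragraph text "block" and class names such as block-title are not touched.
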
Related

How to verify a text is present on a webpage for 'n' times

I want to verify that a text exists on a webpage 2 times, or ‘n’ times. I have used the “Page Should Contain” keyword, but it passes as soon as it finds a single occurrence. I don’t want to verify using a locator.
Ex: I want to verify that the text "Success" appears 3 times on the current webpage using Robot Framework.
Any inputs/suggestions would be helpful.
Too bad you don't want to use a locator, as Robot Framework has a keyword just for that:
Xpath Should Match X Times    //*[contains(., "Success")]    2
The caveat is that the locator should not be prefixed with xpath= - pass just the bare expression.
The library keyword Page Should Contain does pretty much exactly that, by the way.
And if you want to find how many times the string is present in the page - easy:
${count}=    Get Matching Xpath Count    //*[contains(., "Success")]
And then do any kind of checks on the result, e.g.
Should Be Equal    ${count}    2
I thought the problem of not using a locator sounded fun (though the rationale behind the requirement is still unclear), so here is another solution - look in the source yourself:
${source}=    Get Source    # you have the whole html of the page here
${matches}=    Get Regexp Matches    ${source}    >.*\b(Success)\b.*<
${count}=    Get Length    ${matches}
The first one gets the source, the second gets all non-overlapping (separate) occurrences of the target string, when it is (hopefully) inside a tag. The third line returns the count.
Disclaimer - please don't actually do that, unless you're 100% sure of the source and the structure. Use a locator.
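The idea of counting occurrences in the page source can also be sketched in plain Python, for example inside a custom keyword library. This is only an illustration under assumptions: it uses the standard library's html.parser to collect text nodes instead of a regex over the raw markup, the names TextCollector and count_visible_word are hypothetical, and text inside script or style elements would be counted as well.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text nodes of a document, ignoring tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def count_visible_word(html, word):
    """Count whole-word occurrences of `word` in the text nodes of `html`."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.chunks)
    # \b on both sides keeps "Successful" from counting as "Success".
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text))


if __name__ == "__main__":
    page = '<div class="Success"><p>Success</p><span>Success again, Successful</span></div>'
    print(count_visible_word(page, "Success"))
```

Unlike the regex over the raw source, attribute values such as class="Success" are not counted here, because only text nodes are collected.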

Why does BEM often use two underscores instead of one for modifiers?

In BEM, I understand that with modifiers, two dashes makes sense so that you can distinguish the modifier in my-block-my-modifier with my-block--my-modifier.
But why use block__element instead of block_element?
A double underscore is used to define a sub-element of a block.
For example:
<nav class="main-nav">
<a class="main-nav__item" href="#">Text</a>
</nav>
Here main-nav is a block and main-nav__item is a sub-element.
This is done because some people name their blocks with underscores, like main_nav, and a single underscore separator would then be ambiguous: main_nav_item
A double underscore makes the boundary clear: main_nav__item.
I'm going to second @Imran Bughio's answer, but I'd like to clarify the issue further.
In standard BEM style, single underscores are reserved for modifiers. They also usually represent key/value pairs, e.g.
.block_align_vertical
.block_align_horizontal
.block__element_size_big
.block__element_size_small
This is in contrast to the modified BEM syntax championed by inuit.css, for example, where modifiers are boolean.
.block--vertical
.block--horizontal
.block__element--big
.block__element--small
In my experience, you quickly run into expressive limitations when using the modified syntax. E.g. if you write
.block--align-vertical
.block--align-horizontal
.block__element--size-big
.block__element--size-small
The key/value relation would no longer be unique if you tried to describe a key such as background-attachment, which would result in
.block__element--background-attachment-fixed
Another benefit is that you can use libraries based on the standard naming convention for added productivity in your workflow:
http://bem.info/tools/bem/bem-tools/
https://github.com/hoho/jquery-bem
It's also worth mentioning that the BEM syntax is not forced upon us; if you find another syntax more readable, then by all means go with that. The important thing is consistency, so that other developers work to the same syntax.
An example of an alternative syntax is SUIT CSS by Nicolas Gallagher, which uses the following syntax:
ComponentName
ComponentName--modifierName
ComponentName-elementName
ComponentName.is-stateOfComponent
You can read more in the SUIT CSS naming conventions.
Because block names can be hyphen or underscore delimited, for example:
.site-search {} /* Block */
.site-search__field {} /* Element */
.site-search--full {} /* Modifier */
or
.site_search {} /* Block */
.site_search__field {} /* Element */
.site_search--full {} /* Modifier */
According to the BEM naming convention, the single underscore has two other uses:
The modifier name is separated from the block or element name by a single underscore (_).
The modifier value is separated from the modifier name by a single underscore (_).
So the element name has to be separated from the block name by something else, a double underscore:
The element name is separated from the block name by a double underscore (__).
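The separator rules above can be made concrete with a small parser. This is a minimal Python sketch, assuming the standard convention (block__element_modifier_value) and assuming block, element and modifier names use hyphens rather than underscores internally. The names BemName and parse_bem are hypothetical.

```python
from typing import NamedTuple, Optional


class BemName(NamedTuple):
    block: str
    element: Optional[str]
    modifier: Optional[str]
    value: Optional[str]


def parse_bem(class_name: str) -> BemName:
    """Split a standard-BEM class name into block, element, modifier and value."""
    # The double underscore separates the element from the block.
    block_part, sep, element_part = class_name.partition("__")
    # Modifiers hang off the element if there is one, otherwise off the block.
    target = element_part if sep else block_part
    name, *mod = target.split("_")
    modifier = mod[0] if len(mod) > 0 else None
    value = mod[1] if len(mod) > 1 else None
    if sep:
        return BemName(block_part, name, modifier, value)
    return BemName(name, None, modifier, value)


if __name__ == "__main__":
    for cls in ("main-nav__item", "block_align_vertical", "block__element_size_big"):
        print(cls, "->", parse_bem(cls))
```

With a single underscore as the element separator, block_align could mean either an element named align or a modifier named align, which is the ambiguity the double underscore avoids.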

What's the correct format for TCDL linkAttributes?

I can see the technology-independent Tridion Content Delivery Language (TCDL) link has the following parameters, which are pretty well described on SDL Live Content.
type
origin
destination
templateURI
linkAttributes
textOnFail
addAnchor
VariantId
How do we add multiple attribute-value pairs for the linkAttributes? Specifically, what do we use to escape the double quotes as well as separate pairs (e.g. if we need class="someclass" and onclick="someevent").
The separate pairs are just space delimited, like a normal series of attributes. However, try XML-encoding the value of linkAttributes, so that " becomes &quot;, and so on.
If the attributes contain JavaScript, you may need to escape the JavaScript quotes too, as in \".
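As an illustration of the XML-encoding suggestion, here is a minimal Python sketch that builds a space-delimited attribute string and encodes it with the standard library. The helper name encode_link_attributes is hypothetical, and whether a given TCDL processor accepts the encoded form is an assumption that is not confirmed here.

```python
from xml.sax.saxutils import escape


def encode_link_attributes(pairs):
    """Join name/value pairs with spaces, then XML-encode the result."""
    # Assumes the values themselves contain no double quotes.
    raw = " ".join('{}="{}"'.format(name, value) for name, value in pairs)
    # escape() handles &, < and >; the extra mapping also encodes double quotes.
    return escape(raw, {'"': "&quot;"})


if __name__ == "__main__":
    print(encode_link_attributes([("class", "someclass"), ("onclick", "someevent")]))
```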
Edit: after I figured out your real question, the answer is a lot simpler:
You should wrap the values inside your linkAttributes in single quotes. Spaces inside linkAttributes are typically handled fine; but if not, escape them with %20.
If you need something more or want something that isn't handled by the standard tcdl:ComponentLink, remember that you can always create your own TCDL tag and use a TagHandler or TagRenderer (look them up in the docs for examples or search for Jaime's article on TagRenderer) to do precisely what you want.
My original answer was to a question you didn't ask: what is the format for TCDL tags (in general). But the explanation might still be useful to some, so remains below.
I'd suggest having a look at what format the default building blocks (e.g. the Link Resolver TBB in the Default Finish Actions) output and use that as a guide line.
This is what I could quickly get from the transport package of a published page:
<tcdl:Link type="Page" origin="tcm:5-199-64" destination="tcm:5-206-64"
templateURI="tcm:0-0-0" linkAttributes="" textOnFail="true"
addAnchor="" variantId="">Home</tcdl:Link>
<tcdl:ComponentPresentation type="Embedded" componentURI="tcm:5-69"
templateURI="tcm:5-133-32">
<span>
...
One of the things that I know from experience: your entire TCDL tag will have to be on a single line (I wrapped the lines above for readability only). Or at least that is the case if it is used to invoke a REL TagRenderer. Clearly the tcdl:ComponentPresentation tag above will span multiple lines, so that "single line rule" doesn't apply everywhere.
And that is probably the best advice: given the fact that TCDL tags are processed at multiple points in Tridion Publishing, Deployment and Delivery pipeline, I'd stick to the format that the default TBBs output. And from my sample that seems to be: put everything on a single line and wrap the values in (double) quotes.

How to write regex to extract FlickR Image ID From URL?

I'm looking to do two things, and I am looking to do them in a beautiful way. I am working on a project that allows users to upload flickr photos by simply entering their flickr image URL. Ex: http://www.flickr.com/photos/xdjio/226228060/
I need to:
make sure it is a URL that matches the following format: http://www.flickr.com/photos/[0]/[1]/
extract the following part: http://www.flickr.com/photos/xdjio/[0]/
Now I could very easily write some string methods to do the above, but I think it would be messy, and I love learning how to do these things in regex. Not being a regex ninja, though, I am currently unable to do the above.
Given an input string with a URL like the one you provided, this will extract the image ID for any arbitrary user:
using System.Text.RegularExpressions;

string input = "http://www.flickr.com/photos/xdjio/226228060/";
// "img" is a named group holding the numeric image ID.
Match match = Regex.Match(input, "photos/[^/]+/(?<img>[0-9]+)", RegexOptions.IgnoreCase | RegexOptions.Singleline);
if (match.Success)
{
    string imageID = match.Groups["img"].Value;
}
Breaking it down, we are searching for "photos/" followed by one or more characters that are not a '/', followed by a '/', followed by one or more digits. We also put the digits into a named group called "img".
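The question also asks to validate the URL format, which the pattern above does not do on its own. A minimal Python sketch of an anchored variant follows. The pattern, the optional www. prefix and https scheme, and the names FLICKR_PHOTO and extract_image_id are assumptions for illustration, not part of the answer.

```python
import re

# Anchored pattern: the whole string must look like
# http://www.flickr.com/photos/<user>/<digits>/ (trailing slash optional).
FLICKR_PHOTO = re.compile(
    r"^https?://(?:www\.)?flickr\.com/photos/(?P<user>[^/]+)/(?P<img>[0-9]+)/?$",
    re.IGNORECASE,
)


def extract_image_id(url):
    """Return the image ID if `url` has the expected format, else None."""
    match = FLICKR_PHOTO.match(url)
    return match.group("img") if match else None


if __name__ == "__main__":
    print(extract_image_id("http://www.flickr.com/photos/xdjio/226228060/"))
    print(extract_image_id("http://www.flickr.com/photos/xdjio/"))
```

A None result signals that the input failed validation, so one call covers both requirements.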
I thought I would add that the client-side (JavaScript) ASP.NET validator doesn't support named groups.
The regex to use in this situation would be:
photos/[^/]+/([0-9]+)
I thought someone might find this useful.

How to extract element id attribute values from HTML

I am trying to work out the overhead of the ASP.NET auto-naming of server controls. I have a page which contains 7,000 lines of HTML rendered from hundreds of nested ASP.NET controls, many of which have id / name attributes that are hundreds of characters in length.
What I would ideally like is something that would extract every HTML attribute value that begins with "ctl00" into a list. The regex Find function in Notepad++ would be perfect, if only I knew what the regex should be?
As an example, if the HTML is:
<input name="ctl00$Header$Search$Keywords" type="text" maxlength="50" class="search" />
I would like the output to be something like:
name="ctl00$Header$Search$Keywords"
A more advanced search might include the element name as well (e.g. control type):
input|name="ctl00$Header$Search$Keywords"
In order to cope with both Id and Name attributes I will simply rerun the search looking for Id instead of Name (i.e. I don't need something that will search for both at the same time).
The final output will be an excel report that lists the number of server controls on the page, and the length of the name of each, possibly sorted by control type.
Quick and dirty:
Search for
\w+\s*=\s*"ctl00[^"]*"
This will match any text that looks like an attribute, e.g. name="ctl00test" or attr = "ctl00longer text". It will not check whether this really occurs within an HTML tag - that's a little more difficult to do and perhaps unnecessary? It will also not handle escaped quotes within the attribute value. As usual with regexes, the complexity required depends on what exactly you want to match and what your input looks like...
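Since the goal is a report of control names and their lengths, the same expression can be run from a script instead of Notepad++. This is a minimal Python sketch: the capture groups and the names CTL_ATTR and list_ctl00_attributes are additions for illustration, and it shares the limitations described above (no check that the match sits inside a tag).

```python
import re

# Same idea as the search expression above, with groups added for the
# attribute name and the attribute value.
CTL_ATTR = re.compile(r'(\w+)\s*=\s*"(ctl00[^"]*)"')


def list_ctl00_attributes(html):
    """Return (attribute name, value, value length) for each ctl00 attribute."""
    return [(name, value, len(value)) for name, value in CTL_ATTR.findall(html)]


if __name__ == "__main__":
    sample = '<input name="ctl00$Header$Search$Keywords" type="text" maxlength="50" class="search" />'
    for name, value, length in list_ctl00_attributes(sample):
        print(name, value, length)
```

The resulting rows could be written out with the csv module and opened in Excel for sorting.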
"7000"? "Hundreds"? Dear god.
Since you're just looking at source in a text editor, try this... /(id|name)="ct[^"]*"/
Answering my own question, the easiest way to do this is to use BeautifulSoup, the 'dirty HTML' Python parser whose tagline is:
"You didn't write that awful page. You're just trying to get some data out of it. Right now, you don't really care what HTML is supposed to look like. Neither does this parser."
It works, and it's available from here - http://crummy.com/software/BeautifulSoup
I suggest xpath, as in this question
