Is there a way to escape non-alphanumeric characters in Nokogiri css?

I have an anchor tag:
file.html#stuff-morestuff-CHP-1-SECT-2.1
Trying to pull the referenced content in Nokogiri:
documentFragment.at_css('#stuff-morestuff-CHP-1-SECT-2.1')
fails with the error:
unexpected '.1' after '[#<Nokogiri::CSS::Node:0x007fd1a7df9b40 @type=:CONDITIONAL_SELECTOR, @value=[#<Nokogiri::CSS::Node:0x007fd1a7df9b90 @type=:ELEMENT_NAME, @value=["*"]>, #<Nokogiri::CSS::Node:0x007fd1a7df9cd0 @type=:ID, @value=["#unixnut4-CHP-1-SECT-2"]>]>]' (Nokogiri::CSS::SyntaxError)
Just trying to talk through this: I think Nokogiri is complaining about the .1 in the selector ID, because . is not valid in an HTML id.
I don't own the content, so I really don't want to go through and fix all the bad IDs if it is avoidable. Is there a way to escape non-alphanumeric characters in a selector passed to a Nokogiri .css() call?

Assuming your HTML looks something like this:
<div id='stuff-morestuff-CHP-1-SECT-2.1'>foo</div>
The string in question, stuff-morestuff-CHP-1-SECT-2.1, is a valid HTML ID, but it cannot be used unescaped in a CSS ID selector: an unescaped . starts a class selector, and .1 is not a valid class name.
You should be able to escape the . with a backslash character, i.e. this is a valid CSS selector:
#stuff-morestuff-CHP-1-SECT-2\.1
Unfortunately this doesn't seem to work in Nokogiri; there may be a bug in the CSS-to-XPath translation it performs. (It does work in the browser.)
You can get around this by just checking the id attribute directly:
documentFragment.at_css('*[id="stuff-morestuff-CHP-1-SECT-2.1"]')
Checking the id attribute directly is also the simplest option when the value starts with a digit. That is valid in HTML5, but an ID selector cannot begin with an unescaped digit, so the first character would need a code point escape (for example #\31 23 for the id 123).
You could also use XPath, which has an id function that you can use here:
documentFragment.xpath("id('stuff-morestuff-CHP-1-SECT-2.1')")
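If many IDs have to be looked up this way, the attribute selector can be built by a small helper. The following is only a sketch: id_attribute_selector is a hypothetical name, not part of Nokogiri, and it assumes the id contains no newlines. Whether Nokogiri's CSS parser accepts backslash-escaped quotes inside the attribute value has not been verified here.

```ruby
# Hypothetical helper (not part of Nokogiri): build an attribute selector
# that matches an element by its exact id value.
def id_attribute_selector(id)
  # Inside a double-quoted CSS string only the quote and the backslash
  # need escaping, so dots, colons, and leading digits pass through as-is.
  escaped = id.gsub(/["\\]/) { |ch| "\\#{ch}" }
  %(*[id="#{escaped}"])
end

puts id_attribute_selector('stuff-morestuff-CHP-1-SECT-2.1')
# prints *[id="stuff-morestuff-CHP-1-SECT-2.1"]
```

The resulting string could then be passed to at_css in place of the hand-written selector.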

Related

Scss classnames with special symbols like "[]"

This is a strange question. I request a link type from the server, which may return "[TCP]" or "[UDP]", and I want to use that string directly as a class name to give each type a different background color. What I want is:
<div className={`${styles.span} ${styles["[TCP]"]}`}/>
But the CSS selector ".[TCP]" is not allowed; it gives the error below:
SassError: Invalid CSS after "&": expected selector, was ".[TCP]"
For now I am using .replace(/\[|\]/g,"") to strip the brackets, turning "[TCP]" into "TCP". I hope someone can suggest another way, or tell me that it is impossible.
"You can use [TCP] as classname."
You can use any character in a class name except NULL. All you have to do in CSS is write a \ before each special character. In your case, the selector would look like this: .\[TCP\].
But I believe it's much easier to just remove the special characters.

Why we need to escape CSS?

Given the following examples which I picked up from here:
CSS.escape(".foo#bar") // "\.foo\#bar"
CSS.escape("()[]{}") // "\(\)\[\]\{\}"
Since .foo#bar is a valid CSS selector expression, why do we need to prepend \ to some characters? Suppose I want to write my own program that does the same task of escaping all the values/expressions in a CSS file; how should I proceed?
PS: I am always confused about escaping. How should I think about escaping some input?
You escape strings only when those strings contain special symbols that you want to be treated literally. If you are expecting a valid CSS selector as user input, you shouldn't be escaping anything.
.foo#bar is a valid CSS selector, but it means something completely different from \.foo\#bar. The former matches an element with that respective class and ID, e.g. <div class=foo id=bar> in HTML. The latter matches an element with the element name ".foo#bar", which in a hypothetical markup language could be represented as <.foo#bar> (obviously this is not legal HTML or XML syntax, but you get the picture).
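As for writing such an escaping routine yourself, here is a rough sketch in Ruby. It loosely follows the identifier serialization rules that CSS.escape is specified against; css_escape is a hypothetical name, and the code should be read as an illustration rather than a verified port of any browser's implementation.

```ruby
# Hypothetical helper: escape a string so that it can be used literally
# as a single CSS identifier (for example after "#" or ".").
def css_escape(ident)
  chars = ident.chars
  chars.each_with_index.map do |ch, i|
    cp = ch.ord
    if cp.zero?
      "\uFFFD" # NULL is replaced by the replacement character
    elsif (0x01..0x1F).cover?(cp) || cp == 0x7F ||
          (i.zero? && ch.match?(/[0-9]/)) ||
          (i == 1 && ch.match?(/[0-9]/) && chars[0] == '-')
      # Control characters and a leading digit become a code point escape,
      # terminated by a space.
      "\\#{cp.to_s(16)} "
    elsif i.zero? && ch == '-' && chars.length == 1
      "\\-" # a lone hyphen is not a valid identifier
    elsif cp >= 0x80 || ch.match?(/[A-Za-z0-9_-]/)
      ch # safe in an identifier as-is
    else
      "\\#{ch}" # everything else gets a backslash in front
    end
  end.join
end

puts css_escape('.foo#bar') # prints \.foo\#bar
puts css_escape('()[]{}')   # prints \(\)\[\]\{\}
puts css_escape('1abc')     # prints \31 abc
```

The point is that the input is treated as one opaque name: every character that would otherwise have syntactic meaning is neutralised, which is exactly why a string that is already a selector should not be passed through it.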

How to convert complex xpath to css

I have a complex html structure. New to CSS. Want to change my xpath to css as there could be some performance impact in IE
XPath by Firebug: .//*[@id='T_I:3']/span/a
I fine-tuned it to: //div[@id='Overview']/descendant::*[@id='T_I:3']/span/a
Now I need corresponding CSS for the same. Is it possible or not?
First of all, I don't think your "finetuning" did the best possible job. An element id should be unique in the document and is therefore usually cached by modern browsers (which means that id lookup is instant). You can help the XPath engine by using the id() function.
Therefore, the XPath expression would be: id('T_I:3')/span/a (yes, that's a valid XPath 1.0 expression).
Anyway, to convert this to CSS, you'd use: #T_I:3 > span > a
Your "finetuned" expression converted would be: div#Overview #T_I:3 > span > a, but seriously, you only need one id selection.
The hash sign # is an ID selector.
The space ( ) is a descendant combinator.
The > sign is a child combinator.
EDIT based on a good comment by Fréderic Hamidi:
I don't think #T_I:3 is valid (the colon would be confused with the
start of a pseudo-class). You would have to find a way to escape it.
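The colon has to be escaped, using the techniques mentioned in this SO question: Handling a colon in an element ID in a CSS selector. The underscore is valid in identifiers in current CSS, so escaping it (as \5F) is only needed for very old parsers. Note that a code point escape followed directly by a hex digit must be terminated with a space; otherwise \3A3 is read as the single code point U+03A3.
The final CSS selector would be:
#T\5FI\3A 3 > span > a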
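A code point escape consumes up to six hex digits plus one optional trailing whitespace character, so it matters what follows it. The sketch below decodes escapes in an identifier to show the effect; css_unescape is a hypothetical helper written for this illustration, and it skips edge cases such as out-of-range code points.

```ruby
# Hypothetical helper: decode CSS backslash escapes in an identifier.
def css_unescape(ident)
  # Either a backslash plus 1-6 hex digits and one optional whitespace,
  # or a backslash plus any single character.
  ident.gsub(/\\(?:(\h{1,6})\s?|(.))/) do
    hex = Regexp.last_match(1)
    hex ? [hex.to_i(16)].pack('U') : Regexp.last_match(2)
  end
end

puts css_unescape('T\5FI\3A 3') # prints T_I:3
puts css_unescape('T\5FI\3A3')  # prints T_IΣ, because 3A3 is read as one code point
puts css_unescape('T_I\:3')     # prints T_I:3
```

The last line shows the simpler alternative: a plain backslash before the colon avoids the question of where the hex digits end.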

Capybara/Poltergeist: CSS ID with a colon raises Capybara::Poltergeist::InvalidSelector

I have a CSS selector with a colon in the name, which apparently is a problem.
Example:
selector = 'input#billing:street1'
find(selector)
I get the following error message:
The browser raised a syntax error while trying to evaluate the selector "input#billing:region_id" (Capybara::Poltergeist::InvalidSelector)
Is there any way to use the selector the way it is? I know that I could do something like that:
selector = 'billing:street1'
find(:xpath, ".//input[@id='#{selector}']")
but I'd prefer not to do it for various reasons.
I use Cucumber, Capybara, Poltergeist/PhantomJS
This is more of an educated guess based on my experience with CSS and Javascript, but you could try something like this:
selector = 'input#billing\:street1'
find(selector)
Notice the backslash in front of the colon; this escapes the character in CSS. For JavaScript, however, it is slightly different: you need two backslashes to escape the character, like so:
selector = 'input#billing\\:street1'
find(selector)
I'm not sure which one would do the trick (if either would) since I have zero experience with Cucumber, Capybara, and Poltergeist/PhantomJS, but based on your code it looks as if you would want to try the double backslash \\ option first.
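Since the test code here is Ruby, it is worth checking what each literal actually contains before it reaches the browser. The following sketch only exercises Ruby's string quoting rules and does not involve Capybara at all; the variable names are made up for the illustration.

```ruby
single_one = 'input#billing\:street1'  # \: is not an escape in single quotes, backslash kept
single_two = 'input#billing\\:street1' # \\ collapses to one backslash
double_one = "input#billing\:street1"  # unknown escape in double quotes, backslash dropped
double_two = "input#billing\\:street1" # \\ collapses to one backslash

puts single_one == single_two # true
puts single_one == double_two # true
puts double_one               # prints input#billing:street1
```

In other words, with single quotes both spellings send the same selector, input#billing\:street1, and only the double-quoted form with a single backslash loses the escape.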

Create a regex for a string with 1 item that changes

I am trying to build a regex for a line of inline CSS in which only one item changes.
This is the line of code in question
<div="Box1" style="background-color:Transparent;border-color:Transparent;border-style:None;height:436px;"></div>
I need to be able to pick this out, but the height is different on every page;
everything else is exactly the same.
If you have that line, you can use the following regex to get the height:
'<div="Box1" style="background-color:Transparent;border-color:Transparent;border-style:None;height:436px;"></div>'
.match(/height:([\sa-z0-9]+);/)
This will return:
["height:436px;", "436px"]
This example is in JavaScript, since I don't know which language you want to use the regex in. It cannot be done in CSS itself.
[0-9]+ matches an arbitrary number.
However, for the HTML part you should not use a regex at all but a HTML parser - and then only use a regex on the style attribute.
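The second half of that advice, a regex applied only to the style attribute value, might look like the sketch below in Ruby. It assumes the attribute value has already been obtained from an HTML parser; height_px is a hypothetical helper name, and it only handles heights given in px.

```ruby
# Hypothetical helper: pull the pixel height out of an inline style string.
def height_px(style)
  # Anchor on the start of the string or a preceding semicolon so that
  # properties such as min-height or line-height are not matched.
  match = style.match(/(?:\A|;)\s*height\s*:\s*(\d+)px/i)
  match && match[1].to_i
end

style = 'background-color:Transparent;border-color:Transparent;border-style:None;height:436px;'
puts height_px(style) # prints 436
```

The helper returns nil when no px height is present, so callers can tell a missing height from a height of zero.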
