Is it possible to parse a stylesheet with Nokogiri? - css

I've spent my requisite two hours Googling this and cannot find any good answers, so let's see if humans can beat Google's computers.
I want to parse a stylesheet in Ruby so that I can apply those styles to elements in my document (to make the styles inlined). So, I want to take something like
<style>
.mystyle {
color:white;
}
</style>
And be able to extract it into a Nokogiri object of some sort.
The Nokogiri class "CSS::Parser" (http://nokogiri.rubyforge.org/nokogiri/Nokogiri/CSS/Parser.html) certainly has a promising name, but I can't find any documentation on what it is or how it works, so I have no idea if it can do what I'm after here.
My end goal is to be able to write code something like:
a_web_page = Nokogiri::HTML(html_page_as_string)
parsed_styles = Nokogiri::CSS.parse(html_page_as_string)
parsed_styles.each do |style|
existing_inlined_style = a_web_page.css(style.declaration) || ''
a_web_page.css(style.declaration)['css'] = existing_inlined_style + style.definition
end
Which would extract styles from a stylesheet and add them all as inlined styles to my document.

Nokogiri can't parse CSS stylesheets.
The CSS::Parser that you came across parses CSS selector expressions, not stylesheets. It is used whenever you traverse an HTML tree by CSS selectors rather than XPath (this is a cool feature of Nokogiri).
There is a Ruby CSS parser, though. You can use it together with Nokogiri to achieve what you want.
require "nokogiri"
require "css_parser"
html = Nokogiri::HTML(html_string)
css = CssParser::Parser.new
css.add_block!(css_string)
css.each_selector do |selector, declarations, specificity|
html.css(selector).each do |element|
style = element.attributes["style"]&.value || ""
element.set_attribute('style', [style, declarations].compact.join(" "))
end
end

@molf definitely had a great start there, but it still required debugging a handful of problems to get it working in production. Here is the current, tested version of this:
html = Nokogiri::HTML(html_string)
css = CssParser::Parser.new
css.add_block!(html_string) # Warning: This line modifies the string passed into it. In potentially bad ways. Make sure the string has been duped and stored elsewhere before passing this.
css.each_selector do |selector, declarations, specificity|
next unless selector =~ /^[\d\w\s\#\.\-]*$/ # Some of the selectors given by css_parser aren't actually selectors.
begin
elements = html.css(selector)
elements.each do |match|
match["style"] = [match["style"], declarations].compact.join(" ")
end
rescue
logger.info("Couldn't parse selector '#{selector}'")
end
end
html_with_inline_styles = html.to_s
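
One thing the plain join above leaves open is what happens when the same property shows up both in the element's existing style attribute and in the stylesheet declarations. A minimal sketch of one way to merge the two strings follows. The helper names (parse_declarations, merge_inline_style) are made up for this illustration, and it assumes the existing inline style should win, as it would in a browser. It ignores !important and will mis-split values that contain a semicolon, such as data URIs.

```ruby
# Parse "color: white; margin: 0" into a Hash of property => value.
# Hypothetical helper: naive split on ";" and ":", good enough for simple declarations.
def parse_declarations(style)
  style.to_s.split(";").each_with_object({}) do |declaration, props|
    property, value = declaration.split(":", 2)
    next if value.nil? # skip empty or malformed fragments

    props[property.strip.downcase] = value.strip
  end
end

# Merge stylesheet declarations into an existing inline style.
# Assumption: a property already set inline overrides the stylesheet value.
def merge_inline_style(existing, incoming)
  merged = parse_declarations(incoming).merge(parse_declarations(existing))
  merged.map { |property, value| "#{property}: #{value};" }.join(" ")
end

puts merge_inline_style("color: red", "color: white; margin: 0;")
# prints: color: red; margin: 0;
```

Inside the loop this would replace the join, for example match["style"] = merge_inline_style(match["style"], declarations).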

Related

Rails 4: how to identify and format links, hashtags and mentions in model attributes?

In my Rails 4 app, I have a Post model, with :copy and :short_copy as custom attributes (strings).
These attributes contain copies for social medias (Facebook, Twitter, Instagram, Pinterest, etc.).
I display the content of these attributes in my Posts#Show view.
Currently, URLs, #hashtags and @mentions are formatted like the rest of the text.
What I would like to do is to format them in a different fashion, for instance in another color or in bold.
I found the twitter-text gem, which seems to offer such features, but my problem is that I do NOT need, and do NOT want, these URLs, #hashtags and @mentions turned into real links.
Indeed, it looks like the twitter-text gem converts URLs, #hashtags and @mentions into links by default with Twitter::Autolink, as explained in this Stack Overflow question.
That is not what I am looking for: I just want to update the style of my URLs, #hashtags and @mentions.
How can I do this in Ruby / Rails?
—————
UPDATE:
Following Wes Foster's answer, I implemented the following method in post.rb:
def highlight(string)
string.gsub!(/\S*#(\[[^\]]+\]|\S+)/, '<span class="highlight">\1</span>')
end
Then, I defined the following CSS class:
.highlight {
color: #337ab7;
}
Last, I implemented <%= highlight(post.copy) %> in the desired view.
I now get the following error:
ArgumentError
wrong number of arguments (1 for 2..3)
<td><%= highlight(post.copy) %></td>
What am I doing wrong?
—————
I'm sure each of the following regex patterns could be improved to match even more options, but the following code works for me:
def highlight_url(str)
str.gsub!(/(https?:\/\/[\S]+)/, '[\1]')
end
def highlight_hashtag(str)
str.gsub!(/\S*#(\[[^\]]+\]|\S+)/, '[#\1]')
end
def highlight_mention(str)
str.gsub!(/\B(\@[a-z0-9_-]+)/i, '[\1]')
end
# Initial string
str = "Myself and @doggirl bought a new car: http://carpictures.com #nomoremoney"
# Pass through each
highlight_mention(str)
highlight_hashtag(str)
highlight_url(str)
puts str # > Myself and [@doggirl] bought a new car: [http://carpictures.com] [#nomoremoney]
In this example, I've wrapped the matches with brackets []. You should use a span tag and style it. Also, you can wrap all three gsub! into a single method for simplicity.
Updated for the asker's add-on error question
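It looks like the call is resolving to another method named highlight. Rails already ships a highlight view helper (in ActionView::Helpers::TextHelper) that expects the text plus the phrases to highlight, which matches the "1 for 2..3" argument count in the error. Try changing the name of the method from highlight to new_highlight to see if that fixes the new problem. Also note that a method defined in post.rb is an instance method of the model, so the view has to call it on a post or it should be moved into a helper.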
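
A minimal sketch of the "single method with a span" idea follows. The method name highlight_tokens and the TOKEN pattern are made up for this illustration (the name is chosen so it does not collide with Rails' own highlight helper), and the pattern is deliberately simple. It returns a new string instead of using gsub!, which returns nil when nothing matches, and it HTML-escapes the text so the spans are the only markup in the result.

```ruby
require "cgi"

# One pattern for all three token types: URLs first, then @mentions and #hashtags.
# The lookbehind keeps things like "me@example.com" from counting as a mention.
TOKEN = %r{(https?://\S+|(?<![\w@#])[@#][\w-]+)}

# Hypothetical helper: wrap every URL, #hashtag and @mention in a styled span.
def highlight_tokens(text)
  # split with a capture group keeps the matches: even indexes are plain text,
  # odd indexes are the tokens themselves.
  text.split(TOKEN).each_with_index.map do |piece, index|
    escaped = CGI.escapeHTML(piece)
    index.odd? ? %(<span class="highlight">#{escaped}</span>) : escaped
  end.join
end

puts highlight_tokens("Myself and @doggirl bought a new car: http://carpictures.com #nomoremoney")
```

In a Rails view the returned string would still need to be marked as safe (for example with raw or html_safe) for the spans to render, which is reasonable here only because the text has been escaped first.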

How to use `&` and a tag on the same selector

I am trying to write a nested selector that selects a certain tag that has a certain attribute, for example
<li foo="bar">
To select this, li[foo="bar"] would work, but I want to nest it under [foo="bar"] using the scss & notation because I have other things with the [foo="bar"] attribute (e.g., <div foo="bar" class="baz">), and I want to group them together. When I try:
[foo = "bar"]{
&li{
...
}
&.baz{
...
}
}
It returns an error that says li may only be used at the beginning of a compound selector, and if I try:
[foo = "bar"]{
li&{
...
}
&.baz{
...
}
}
then it says & may only be used at the beginning of a compound selector. How can I do this correctly?
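The right syntax nowadays would be @at-root li#{&}. The @at-root is needed because Sass cannot tell that the interpolated & stands for the parent selector, so a bare li#{&} is still nested under the parent and compiles to [foo="bar"] li[foo="bar"].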
Last I heard, this is actually an upcoming feature in SASS 3.3 or 3.4.
See this thread for a question similar to yours and this thread for the proposed solution to be included (which, at the time of writing, seems to be &__element).
The issue here isn't the use of [] and & together - it's the use of a plain element in the selector. Your example with .baz should work as expected.

Two nots in CSS selector with Nokogiri

Right now I have a selector working with jQuery as follows:
.original-tweet:not([data-is-reply-to="true"],.retweeted)
However this doesn't seem to work using the Nokogiri gem in ruby:
doc.css('.original-tweet:not([data-is-reply-to="true"],.retweeted)')
The above causes a crash, but each of the parts of the not works independently:
doc.css('.original-tweet:not([data-is-reply-to="true"])')
and
doc.css('.original-tweet:not(.retweeted)')
What's the best way to actually get the selector I want? Is this something that just isn't supported in Nokogiri?
Okay, I solved it with XPATH
The following worked (note: the xpath I created was entirely computer generated)
doc.xpath("//*[contains(concat(' ', @class, ' '), ' original-tweet ') and not(@data-is-reply-to = \"true\") and not(@data-retweet-id)]")
Edit: further inspection shows that this is still selecting items with the retweeted class (turns out this was a false assumption on my part, I should have been looking for the data-retweet-id attribute instead of the retweet class)
github.com/sparklemotion/nokogiri/issues/451 - this issue relates to why I needed to use xpath here.
While the selector may work with jQuery, it's not a valid CSS selector:
> $$('.original-tweet:not([data-is-reply-to="true"], .retweeted)')
Error: SyntaxError: DOM Exception 12
.original-tweet:not([data-is-reply-to="true"]):not(.retweeted) should work.
For now a possible workaround might be:
doc.css('.original-tweet:not([data-is-reply-to="true"])') - doc.css('.retweeted')

How Can WYSIHTML5 output inline CSS?

I am running WYSIHTML5 so that I can enter email text and format it for sending as HTML. However, when I view the HTML of the formatted text, the colors are applied through classes on the elements. This is expected behavior, but because the output goes into an email and I cannot attach CSS files to it, I would like those colors to be inline CSS instead. For example:
<span class="wysiwyg-color-green">Testing</span>
That is what I get if I select the green color for the text "Testing". Is there any way to make that green part of the HTML itself, like this:
<span style="color:green">Testing</span>
I have tried to search for this but could not find, so I am not asking without first looking for it. If anybody could please just point somewhere. Even a link to any guide to this, will do. I do not wish that you spend time writing code for me.
You could do it with php :
str_replace ( 'class="wysiwyg-color-green"', 'style="color:green"' ,$html)
You can do the same with JavaScript, although it's always safer to do everything server-side.
http://www.w3schools.com/jsref/jsref_replace.asp
Here's the JavaScript code I used, but it may be a good idea to heed Jean-Georges' warning above:
var replaceColorStylesWithInlineCss = function (htmlContents) {
// A string pattern only replaces the first match, so use regexes with the g flag to catch every occurrence.
var result = htmlContents.replace(/class="wysiwyg-color-black"/g, 'class="wysiwyg-color-black" style="color:black"');
result = result.replace(/class="wysiwyg-color-silver"/g, 'class="wysiwyg-color-silver" style="color:silver"');
result = result.replace(/class="wysiwyg-color-gray"/g, 'class="wysiwyg-color-gray" style="color:gray"');
result = result.replace(/class="wysiwyg-color-maroon"/g, 'class="wysiwyg-color-maroon" style="color:maroon"');
result = result.replace(/class="wysiwyg-color-red"/g, 'class="wysiwyg-color-red" style="color:red"');
result = result.replace(/class="wysiwyg-color-purple"/g, 'class="wysiwyg-color-purple" style="color:purple"');
result = result.replace(/class="wysiwyg-color-green"/g, 'class="wysiwyg-color-green" style="color:green"');
result = result.replace(/class="wysiwyg-color-olive"/g, 'class="wysiwyg-color-olive" style="color:olive"');
result = result.replace(/class="wysiwyg-color-navy"/g, 'class="wysiwyg-color-navy" style="color:navy"');
result = result.replace(/class="wysiwyg-color-blue"/g, 'class="wysiwyg-color-blue" style="color:blue"');
result = result.replace(/class="wysiwyg-color-orange"/g, 'class="wysiwyg-color-orange" style="color:orange"');
return result;
};
Note: I kept the wysiwyg styles in there because I'm saving to the db and want it to display properly in the wysihtml5 section when I load it again. DRY it up if you're clever.
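
For anyone doing this step server-side in Ruby, a DRY version of the same idea might look like the sketch below. The names WYSIWYG_COLORS, COLOR_CLASS and inline_wysiwyg_colors are made up for this illustration; the color list is the one from the code above. It assumes the class attribute holds only the color class and that the element has no style attribute yet.

```ruby
# Color names taken from the wysiwyg-color-* classes listed above.
WYSIWYG_COLORS = %w[black silver gray maroon red purple green olive navy blue orange].freeze

# Matches class="wysiwyg-color-<name>" and captures the color name.
COLOR_CLASS = /class="wysiwyg-color-(#{WYSIWYG_COLORS.join('|')})"/

# Hypothetical helper: keep the class and add a matching inline color.
# gsub replaces every occurrence, not just the first one.
def inline_wysiwyg_colors(html)
  html.gsub(COLOR_CLASS) do |class_attribute|
    %(#{class_attribute} style="color:#{Regexp.last_match(1)}")
  end
end

puts inline_wysiwyg_colors('<span class="wysiwyg-color-green">Testing</span>')
# prints: <span class="wysiwyg-color-green" style="color:green">Testing</span>
```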

How can I locate a tag via CSS selectors, referencing the content of a sibling tag?

I'm working on a Ruby script that will parse and manipulate some XML files. I'm using Nokogiri for the XML handling.
The problem I have is that there are several constructs like this one:
<USER_ELEMENT>
<NAME>ATTRIBUTE01</NAME>
<VALUE>XXX</VALUE>
</USER_ELEMENT>
I need to set the <VALUE> tag that sits in the same <USER_ELEMENT> as a particular <NAME>ATTRIBUTEnn</NAME>. My current approach is using
xml.css('USER_ELEMENT').find { |node| node.at_css('NAME').text == 'ATTRIBUTEnn'}.at_css('VALUE').content = 'NEW_VALUE'
but it looks rather ugly.
I'm wondering which would be a cleaner way of dealing with the situation?
Using XPath:
attnn = "ATTRIBUTE01"
xml.at_xpath("//USER_ELEMENT[NAME='#{attnn}']/VALUE").content = "Yay"
puts xml
#=> <USER_ELEMENT>
#=> <NAME>ATTRIBUTE01</NAME>
#=> <VALUE>Yay</VALUE>
#=> </USER_ELEMENT>
In English, that XPath says:
//USER_ELEMENT - find elements with this name anywhere in the document
[…] - but only if…
NAME="ATTRIBUTE01" - …you can find a child NAME element with this text
/VALUE - and now find the child VALUE elements of these
The css selector for siblings is ~:
xml.at('USER_ELEMENT > NAME[text()="ATTRIBUTE01"] ~ VALUE').content = 'NEW_VALUE'
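I don't know how much of CSS3 Nokogiri supports, but it does accept the non-standard :contains pseudo-class, so this should work (at_css returns a single node, which is what content= needs):
xml.at_css('USER_ELEMENT NAME:contains("ATTRIBUTEnn") + VALUE').content = "NEW_VALUE"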

Resources