scrapy with ::before Selector - web-scraping

<div class="priceContainerDescuentoPG">
<p class="priceDescuentoPG">
::before
"$29.610"
</p>
</div>
I tried to extract this price with Scrapy using this XPath, but the result is empty.
//*[@class="priceDescuentoPG"]/text()
I'm sure the problem is the ::before selector. How can I work around it and skip this selector? Thanks for the help.

This happens because the value of priceDescuentoPG is calculated by a function in general.js, line 117:
var descuentoTargetaPGVitrinas = function(){
  $('.vitrineHome ul .midLevel').each(function(){
    var precioVigente = $(this).find('.priceProd').text().replace("$","").replace(/,/g,"").replace(/\./g,"");
    var descuentoPG = (precioVigente*0.90);
    $(this).find('.priceDescuentoPG').text('$' + formatearMoneda(descuentoPG/100,0,'.',','));
  });
};
You can reproduce this calculation inside your spider:
prices = response.xpath("//*[@class='priceProd']/text()").extract()
for price in prices:
    # Strip the currency symbol and separators, as the JavaScript does
    price_prod = price.strip().replace('$', '').replace(',', '').replace('.', '')
    # Apply the same 10% discount
    descuento_pg = float(price_prod) * 0.90

Related

How to make character bold/colored in <h:outputText> [duplicate]

I'm assuming it's not possible, but just in case it is or someone has a nice trick up their sleeves, is there a way to target certain characters using CSS?
For example make all letters z in paragraphs red, or in my particular case set vertical-align:sup on all 7 in elements marked with the class chord.
Hi, I know you asked for CSS, but as everybody told you, it can't be done there. Here is a JavaScript solution, just my 2 cents.
CSS:
span.highlight {
  background: #F60;
  padding: 5px;
  display: inline-block;
  color: #FFF;
}
p {
  font-family: Verdana;
}
HTML:
<p>
  Let's go Zapata let's do it for the revolution, Zapatistas!!!
</p>
JavaScript:
jQuery.fn.highlight = function (str, className) {
  var regex = new RegExp(str, "gi");
  return this.each(function () {
    this.innerHTML = this.innerHTML.replace(regex, function (matched) {
      return "<span class=\"" + className + "\">" + matched + "</span>";
    });
  });
};
$("p").highlight("Z", "highlight");
That's not possible in CSS. Your only option would be first-letter and that's not going to cut it. Either use JavaScript or (as you stated in your comments), your server-side language.
The only way to do it in CSS is to wrap them in a span and give the span a class. Then target all spans with that class.
As far as I understand, it only works with regular characters/letters. For example, what if we want to highlight every asterisk (\u002A) on the page? I tried
$("p").highlight("u\(u002A)","highlight"); in JS and inserted * in the HTML, but it did not work.
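One likely reason the asterisk attempt fails: the plugin passes str straight to new RegExp, and * is a regex quantifier, so it is not matched as a literal character. A minimal sketch, assuming a literal match is what you want, is to escape the regex metacharacters first. escapeRegExp and wrapMatches are hypothetical helper names (not part of jQuery), and this version works on a plain string rather than on DOM nodes:

```javascript
// Hypothetical helper: escape characters that have a special meaning in a RegExp.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap every literal occurrence of `str` in a span.
// Operates on plain text; it does not HTML-escape the input.
function wrapMatches(text, str, className) {
  const regex = new RegExp(escapeRegExp(str), "gi");
  return text.replace(regex, (matched) =>
    '<span class="' + className + '">' + matched + "</span>");
}

console.log(wrapMatches("a * b", "*", "highlight"));
```

Inside the plugin, the same idea would amount to building the regex from escapeRegExp(str) instead of str.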
In reply to @ncubica but too long for a comment, here's a version that doesn't use regular expressions and doesn't alter any DOM nodes other than Text nodes. Pardon my CoffeeScript.
# DOM Element.nodeType:
NodeType =
  ELEMENT: 1
  ATTRIBUTE: 2
  TEXT: 3
  COMMENT: 8
# Tags all instances of text `target` with <span class=$class>$target</span>
#
jQuery.fn.tag = (target, css_class)->
  @contents().each (index)->
    jthis = $ @
    switch @.nodeType
      when NodeType.ELEMENT
        jthis.tag target, css_class
      when NodeType.TEXT
        text = jthis.text()
        altered = text.replaceAll target, "<span class=\"#{css_class}\">$&</span>"
        if altered isnt text
          jthis.replaceWith altered
($ document).ready ->
  ($ 'div#page').tag '⚀', 'die'

target text after br tag using cheerio

I'm practicing creating an API by scraping using cheerio. I'm scraping from this fairly convoluted site:
http://www.vegasinsider.com/nfl/odds/las-vegas/
I'm trying to target the text after these <br> tags within the anchor tag in this <td> element:
<td class="viCellBg1 cellTextNorm cellBorderL1 center_text nowrap" width="56">
  <a class="cellTextNorm" href="/nfl/odds/las-vegas/line-movement/packers-@-bears.cfm/date/9-05-19/time/2020#BT" target="_blank">
    <br>46u-10<br>-3½ -10
  </a>
</td>
The code below is what I'm using to target the data I want. The problem is that I don't know how to get the text after the <br> tags. I've tried .find('br') and couldn't get it to work. Here is the code:
app.get("/nfl", function(req, res) {
  var results = [];
  axios.get("http://www.vegasinsider.com/nfl/odds/las-vegas/").then(function(response) {
    var $ = cheerio.load(response.data);
    $('span.cellTextHot').each(function(i, element) {
      // console.log($(element).text());
      var newObj = {
        time: $(element).text()
      };
      $(element).parent().children().each(function(i, thing) {
        if (i === 2) {
          newObj.awayTeam = $(thing).text();
        } else if (i === 4) {
          newObj.homeTeam = $(thing).text();
        }
      });
      newObj.odds = $(element).parent().next().next().text().trim();
      $('.frodds-data-tbl').find('td').next().next().children().each(function(o, oddsThing) {
        if (o === 0) {
          newObj.oddsThing = $(oddsThing).html();
        }
      });
      results.push(newObj); // collect each row
    });
    res.json(results); // respond once, after the loop
  });
});
You can see I am able to output all the text in this box to the newObj.odds value. I was trying to use something like the next line where I'm targeting that td element and loop through and break out each row into its own newObj property, newObj.oddsLine1 and newObj.oddsLine2 for example.
Hope that makes sense. Any help is greatly appreciated.
You can't select text nodes with cheerio, you need to use js dom properties / functions:
$('td a br')[0].nextSibling.nodeValue
Note $(css)[0] will give you the first element as a js object (rather than a cheerio object)
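If you would rather not walk nextSibling by hand, another option is to take the anchor's inner HTML and split it on the <br> tags. This is only a sketch: splitOnBr is a hypothetical helper name, and it assumes the anchor contains nothing but text and <br> tags, as in the markup from the question. Depending on how cheerio is configured, characters such as ½ may come back as HTML entities.

```javascript
// Hypothetical helper: split an HTML fragment on <br> tags and return the
// non-empty text pieces, trimmed of surrounding whitespace.
function splitOnBr(html) {
  return html
    .split(/<br\s*\/?>/i)
    .map((piece) => piece.trim())
    .filter((piece) => piece.length > 0);
}

// With cheerio, the input could come from $(anchor).html().
const oddsLines = splitOnBr("\n<br>46u-10<br>-3½ -10\n");
console.log(oddsLines);
```

The two entries could then be assigned to newObj.oddsLine1 and newObj.oddsLine2.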

Cypress testing pseudo CSS class :before

Is there a way in which I can test the content of the pseudo CSS class for :before on my element with Cypress?
I have seen links documenting:
Accessing nth-child pseudo element
Accessing the actual content pseudo class of a normal CSS class
But I have not found anything for CSS classes using the ::before pseudo class.
Imagine the code:
.myClass:before {
content: "foo-";
}
<div>
<span class="myClass">Bar</span>
</div>
How could one test that 'foo-' is present?
There's a way to assert on the CSS properties of pseudo-elements, although it's not as simple as just using a Cypress command.
Use cy.get() to get a reference to the element.
Read the Window object off of the element, and then invoke Window.getComputedStyle(), which can read the computed CSS of pseudo selectors.
Use getPropertyValue on the returned CSS declaration to read the value of the content property.
Assert on it.
Here's an example that works on the code posted in the OP:
cy.get('.myClass')
  .then($els => {
    // get Window reference from element
    const win = $els[0].ownerDocument.defaultView
    // use getComputedStyle to read the pseudo-element
    const before = win.getComputedStyle($els[0], '::before')
    // read the value of the `content` CSS property
    const contentValue = before.getPropertyValue('content')
    // the returned value will have double quotes around it, but this is correct
    expect(contentValue).to.eq('"foo-"')
  })
Based on Zach's answer, I created a command that returns a property of the pseudo-element (without the surrounding double quotes).
function unquote(str) {
  return str.replace(/(^")|("$)/g, '');
}
Cypress.Commands.add(
  'before',
  {
    prevSubject: 'element',
  },
  (el, property) => {
    const win = el[0].ownerDocument.defaultView;
    const before = win.getComputedStyle(el[0], '::before');
    return unquote(before.getPropertyValue(property));
  },
);
Use it like this:
it('color is black', () => {
  cy.get('button')
    .before('color')
    .should('eq', 'rgb(0, 0, 0)'); // Or .then()
});
Try asserting on the text of the parent:
cy.get('.myClass').parent().should('have.text', 'foo-bar')
If that doesn't work, you may have to use the textContent property:
cy.get('.myClass').parent().should($el => expect($el).to.contain('foo-bar'))
This was my solution to read the pseudo-element's background-color, convert the returned rgb value to hexadecimal, and compare it with the expected hex color.
const RGBToHex = (rgbColor) => {
  // converts e.g. rgb(255, 13, 200) to #ff0dc8
  const [red, green, blue] = rgbColor.replace(/[a-z]|\(|\)|\s/g, '').split(',');
  let r = parseInt(red, 10).toString(16);
  let g = parseInt(green, 10).toString(16);
  let b = parseInt(blue, 10).toString(16);
  // pad single-digit channels with a leading zero
  if (r.length === 1) r = `0${r}`;
  if (g.length === 1) g = `0${g}`;
  if (b.length === 1) b = `0${b}`;
  return `#${r}${g}${b}`;
};
cy.get('.element').then(($el) => {
  const win = $el[0].ownerDocument.defaultView;
  const before = win.getComputedStyle($el[0], '::before');
  const bgColor = before.getPropertyValue('background-color');
  expect(RGBToHex(bgColor)).to.eq('#HEXA'); // replace '#HEXA' with the expected hex color
});

How to get :before css element value

I have this minor, silly point that I have to cover with an automated test, and it's driving me crazy: I am not able to get the value of the :before CSS pseudo-element. The code is really simple, as is the test, but I still need some help with it. Here is the code I have.
.text-currency-positive::before {
  content: "+ ";
}
<div class="amount">
  <span class="text-currency text-currency-positive text-monospace text-nowrap">
    ::before
    100,00 €
  </span>
</div>
OK, here we go:
<h1 class="element">The value of my pseudo element is: </h1> <!-- note the class -->
Then we add the pseudo-element:
.element:before {
  content: '+NEW';
}
Now the JS:
var content = window.getComputedStyle(
  document.querySelector('.element'), ':before' // target the element's pseudo-element
).getPropertyValue('content'); // read the computed value of `content`
This is not enough, as the returned string still includes its quotes: "+NEW". So we do this:
var makeVar = content.replace(/"/g, ''); // strip the quotes
var firstLetter = makeVar.charAt(0); // first character once the quotes are gone
Display it:
var h1 = document.querySelector(".element");
h1.innerHTML += " content is: " + firstLetter;
If you do not strip the quotes with the regex, the value keeps its surrounding quotes, but maybe that is what you want.
Cheers, link:
https://codepen.io/damianocel/pen/ooJqem
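A small caveat on the quote stripping: replace(/"/g, '') removes every double quote, including any that are part of the content itself. Below is a sketch of a stricter alternative that only removes one matching pair around the value. unquoteContent is a hypothetical helper name, and the sketch assumes the value is a single quoted string, as with content: "+ " in the question.

```javascript
// Hypothetical helper: remove one pair of matching quotes around a computed
// `content` value and leave everything inside untouched.
function unquoteContent(value) {
  const match = /^(["'])(.*)\1$/s.exec(value);
  return match ? match[2] : value; // unquoted values such as `none` pass through
}

console.log(unquoteContent('"+ "'));
```

In a browser, the input would be the string returned by getPropertyValue('content') on the :before computed style.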

How to apply CSS to second word in a string?

If I have the following string: John Smith, how could I use CSS to set font-weight: bold on the second word in order to achieve: John Smith.
Can this be done in pure CSS?
Update: I am retrieving user's name from the server, so in my template it is #{user.profile.name}.
Since a JS solution was suggested and pure CSS isn't presently possible, here is one.
Sample markup:
<p class="bold-second-word">John Smith</p>
<p class="bold-second-word">This guy and stuff.</p>
JavaScript:
var toBold = document.getElementsByClassName('bold-second-word');
for (var i=0; i<toBold.length; ++i) {
  boldSecondWord(toBold[i]);
}
function boldSecondWord(elem) {
  elem.innerHTML = elem.textContent.replace(/\w+ (\w+)/, function(s, c) {
    return s.replace(c, '<b>'+c+'</b>');
  });
}
It cannot be done in pure CSS, sorry. But if you are willing to accept a JavaScript fix, then you might want to look into something like this:
Find the start and end index of the second word in the element's textContent.
Add contenteditable attribute to element.
Use the Selection API to select that range.
Use execCommand with the bold command.
Remove contenteditable attribute.
EDIT: (just saw your edit) I agree this is a bit too hack-y for most uses. Perhaps you'd be better off saving what the last name is as meta-data?
It seems to be impossible by using only pure CSS. However, with a bit of JS you could get there pretty easily:
const phrases = document.querySelectorAll('.bold-second-word');
for (const phrase of phrases) {
  const words = phrase.innerHTML.split(' ');
  words[1] = `<b>${words[1]}</b>`; // wrap the second word in <b>
  phrase.innerHTML = words.join(' ');
}
<p class="bold-second-word">John Smith</p>
<p class="bold-second-word">Aaron Kelly Jones</p>
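Since the name comes from the server, it may be worth escaping it before writing it back as HTML, and handling names that have only one word. A sketch of the string-handling part alone follows; escapeHtml and boldSecondWordHtml are hypothetical helper names, and "word" here simply means a run of non-whitespace characters.

```javascript
// Hypothetical helper: escape text before it is placed inside HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: return HTML with the second whitespace-separated
// word wrapped in <b>; text with fewer than two words is only escaped.
function boldSecondWordHtml(text) {
  const words = text.trim().split(/\s+/);
  if (words.length < 2) return escapeHtml(text);
  return words
    .map((word, i) => (i === 1 ? "<b>" + escapeHtml(word) + "</b>" : escapeHtml(word)))
    .join(" ");
}

console.log(boldSecondWordHtml("John Smith"));
```

In the page, this could be applied as elem.innerHTML = boldSecondWordHtml(elem.textContent) for each .bold-second-word element.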
