Scrapy: if node contains child node

I have the following web page:
https://www.bbc.co.uk/search?q=Juice&sa_f=search-product&filter=news&suggid=
For each article, there is an HTML section like the following, and I want to scrape its text:
<dd>
<span class="signpost-site" data-site="news">News
</span>
<span class="signpost-section">Europe
</span>
</dd>
In this case, I want "Europe"
Sometimes the <span class="signpost-section"> is missing, and instead there is:
<dd>
<span class="signpost-site" data-site="news">News
</span>
</dd>
In this case, I want an empty string ("").
The intention is to create a CSV and ensure each article has the right tag at the right index.
Currently my code is:
response.xpath('//footer//dd/span[@class="signpost-section"]/text()').extract()
which only gets the tags that exist. I am unsure how to check whether the
<span class="signpost-section">
exists within
response.xpath('//footer//dd/span[@class="signpost-site"]')
Ideally I want something along the lines of:
if <span class="signpost-section"> exists in response.xpath('//footer//dd/span[@class="signpost-site"]')
then
response.xpath('//footer//dd/span[@class="signpost-section"]/text()').extract()
else ""

I would just use .extract_first() with a specified default value (returned when there is no match):
response.xpath('//footer//dd/span[@class="signpost-section"]/text()').extract_first(default='')
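Note that .extract_first() returns only the first match of the selector it is called on, so calling it once on the whole response yields a single value for the page. To get one value per article, the query has to run relative to each article. Below is a minimal sketch of that idea. To keep it runnable without Scrapy it uses the standard library's xml.etree.ElementTree on a small, well-formed sample, where findtext(path, default="") plays the role of extract_first(default=''). The <article> wrapper and the sample markup are assumptions, not the BBC page's real structure.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the search results: two articles,
# the second of which has no signpost-section span.
SAMPLE = """
<results>
  <article><footer><dl><dd>
    <span class="signpost-site" data-site="news">News</span>
    <span class="signpost-section">Europe</span>
  </dd></dl></footer></article>
  <article><footer><dl><dd>
    <span class="signpost-site" data-site="news">News</span>
  </dd></dl></footer></article>
</results>
"""


def section_tags(xml_text):
    """Return one section string per article, using "" when the span is missing."""
    root = ET.fromstring(xml_text.strip())
    tags = []
    for article in root.iter("article"):
        # Path is relative to this article; findtext() returns the default
        # when no matching element exists inside it.
        text = article.findtext(".//dd/span[@class='signpost-section']", default="")
        tags.append(text.strip())
    return tags


def to_csv(tags):
    """Write one CSV row per article so each tag stays at its article's index."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["index", "section"])
    for index, tag in enumerate(tags):
        writer.writerow([index, tag])
    return buf.getvalue()


print(to_csv(section_tags(SAMPLE)))
```

Inside a Scrapy callback the same pattern would be a loop such as for article in response.xpath('//article'): followed by article.xpath('.//footer//dd/span[@class="signpost-section"]/text()').extract_first(default=''). The leading . keeps the path relative to the current article, and //article is again a hypothetical container selector. Writing one row per article, even when the value is empty, is what keeps each tag at the right index in the CSV.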

Related

How to add ALT or Title attribute in WordPress for mobile menu icon

I am using a WordPress theme that I am trying to make 100% accessible. I can't get any help from the developer, and I am not very good with the WordPress back end. I need help with this:
[ 2.4 Navigable: Provide ways to help users navigate, find content, and determine where they are. ]
An a (anchor) element must contain text. The text may occur in the anchor text, in the title attribute of the anchor, or in the alt text of an image used within the anchor.
This is the code in the theme:
<div class="qodef-mobile-menu-opener">
<a href="javascript:void(0)">
<span class="qodef-mobile-opener-icon-holder">
<i class="qodef-icon-font-awesome fa fa-bars " ></i> </span>
</a>
</div>
Where in WordPress is this code located, so that I can insert a title attribute or alt text?
Thank You!

Why won't JAWS read this link on arrow navigation

JAWS can read this markup with arrow navigation in IE11; in Microsoft Edge, however, it can only be read fully via Tab navigation. Does anyone have any idea why Edge refuses to read this list item? We have other, similar list items that are read just fine, but this one in particular is not available to JAWS arrow navigation.
<li id="some-id">
<a href="/destination">
<span class="submenu"></span>
<span class="label"> Destination </span>
<span class="count">
<span class="value" data-info=0></span>
</span>
</a>
</li>
I've tried setting data-info=0 with and without quotes. I'm stumped.

Using CSS to prevent text from wrapping away from the previous text?

Is there a CSS way to prevent a text element from wrapping from a previous text element?
For example,
My Table Header <i class="fa fa-search"></i> shows as:
My
Table
Header
MySearchIcon
But I want it to display like this:
My
Table
Header MySearchIcon
The actual code is a combination of Razor and HTML:
<th>
@Html.DisplayNameFor(m => m.Projects.FirstOrDefault().Name) <i class="fa fa-search smaller fa-fade"></i>
</th>
This is in a table header, and depending on the header length I do want wrapping, but I want the space and the icon to stay attached to the last word of the header rather than wrap.
Because of the Razor part of the code, I can't place the non-breaking space directly up against the output.
I believe you can use white-space: nowrap; in your CSS.
You can also examine this question:
HTML+CSS: How to force div contents to stay in one line?
Sure. You can use an HTML entity for a non-breaking space. Your text node and icon would then look like this:
My Table Header&nbsp;<i class="fa fa-search"></i>
The solution ended up being to add a <text> element after the Razor code.
For example:
<th>
@Html.DisplayNameFor(m => m.Projects.FirstOrDefault().Name)<text>&nbsp;<i class="fa fa-search smaller fa-fade"></i></text>
</th>

ARIA: Treat HTML element as a whole

Is there any way to tell an assistive tool to treat an element (e.g: <div>) as a whole, and not split it in child elements?
First example
Using iOS VoiceOver, an <a> element containing a <strong> element gets split into two different elements:
Second example
These elements are split into several parts, whereas ideally they would be read as "122 points" and "First position":
<div class="row">
<div class="stat lg col-xs-6">
<span>122</span>
<i class="icon icon-prize" aria-hidden="true"></i>
<h5>Points</h5>
</div>
<div class="stat lg col-xs-6">
<span>1º</span>
<i class="icon icon-prize" aria-hidden="true"></i>
<h5>Position</h5>
</div>
</div>
VoiceOver on iOS does indeed sometimes split a sentence, although your example code actually works fine. I used your code as the first line in the screen shots below and then copied the text without the <a> tag as the second line. The second line gets broken up by VoiceOver but the <a> tag does not.
<span class="label info">
<a href="/round/next">
Next round starts <strong>in 3 days</strong>
</a>
</span>
<br>
Next round starts <strong>in 3 days</strong>
(Note: I have the enhanced outline turned on for VoiceOver so the black outline is probably thicker than what you're used to seeing.)
I found that with role="button", the element is treated as a group and its inner text is read, but it is announced as a button.

Trying to build the correct XPath

I have a problem with building an xpath, maybe you have an idea.
I am working on some automation and need the full path to this "delete reminder" button, but the path must also check the name, which is in the span7 div (Dh3M5EdV6l in this case). I have tried everything and nothing works. The name has to be included somewhere, but I have no idea how to combine values from two divs into one path. Is it possible?
A CSS selector might work too.
<li class="row-fluid" data-target="upcoming_reminder_row">
<div class="span7">Dh3M5EdV6l</div>
<div class="span3">Oct 10, 2025 12:00 PM EDT</div>
<div class="span2">
<div class="pull-right">
<i class="icon-Edit" data-target="edit_reminder" data-value="67">
</i>
<i class="icon-Close" data-target="delete_reminder" data-value="67">
</i>
</div>
</div>
</li>
Basically, we need to locate the li element whose "span7" div has the appropriate text, and then locate the desired button inside it. Both steps fit in a single XPath expression:
//li[@data-target = "upcoming_reminder_row" and div[@class = "span7"] = "Dh3M5EdV6l"]//i[@data-target = "delete_reminder"]
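For comparison, here is a sketch of the same two conditions in Python using the standard library's xml.etree.ElementTree on the markup from the question (wrapped in a <ul> so it parses as XML). ElementTree supports only a subset of XPath and has no "and" inside predicates, so the row's name check is done in a loop instead of in one expression. The function name find_delete_button is hypothetical. Tools with full XPath 1.0 support can use the single expression above as-is.

```python
import xml.etree.ElementTree as ET

# The reminder row from the question, wrapped in a <ul> root so it parses as XML.
ROW = """
<ul>
<li class="row-fluid" data-target="upcoming_reminder_row">
  <div class="span7">Dh3M5EdV6l</div>
  <div class="span3">Oct 10, 2025 12:00 PM EDT</div>
  <div class="span2">
    <div class="pull-right">
      <i class="icon-Edit" data-target="edit_reminder" data-value="67"></i>
      <i class="icon-Close" data-target="delete_reminder" data-value="67"></i>
    </div>
  </div>
</li>
</ul>
"""


def find_delete_button(root, name):
    """Return the delete_reminder <i> in the row whose span7 text equals name, or None."""
    for row in root.iterfind(".//li[@data-target='upcoming_reminder_row']"):
        # Condition 1: this row's span7 div holds the reminder name.
        if (row.findtext("div[@class='span7']") or "").strip() == name:
            # Condition 2: the delete button somewhere inside the same row.
            return row.find(".//i[@data-target='delete_reminder']")
    return None


root = ET.fromstring(ROW.strip())
button = find_delete_button(root, "Dh3M5EdV6l")
print(button.get("data-value"))
```

The loop mirrors the XPath above: the outer step picks the row by its span7 text, and the inner .//i[@data-target='delete_reminder'] step searches only within that row, so a delete button from a different reminder can never be returned.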
