xpath result list with empty elements

I have this HTML structure that I want to scrape with XPath, and I'm getting some empty elements in the resulting list:
<div class="row">
<div class="content title">File 1</div>
<div class="content version"><span>Version: </span>1.1</div>
<div class="content date"><span>Date: </span>01-01-2022</div>
<div class="content size"><span>Size: </span>20Mb</div>
</div>
The ideal expected result is: ['File 1', '1.1', '01-01-2022', '20Mb']
other acceptable results would be:
['File 1', 'Version: 1.1', 'Date: 01-01-2022', 'Size: 20Mb']
or
['File 1', 'Version:', '1.1', 'Date:', '01-01-2022', 'Size:', '20Mb'] > this is the one I was trying in my example below
Instead I'm getting: ['File 1', '', 'Version:', '1.1', '', 'Date:', '01-01-2022', '', 'Size:', '20Mb', ''] using the XPath expression:
//div[@class="row"]/descendant::*/text()
(I tried different XPath expressions but can't get rid of those empty elements in between.)
Note: unlike the other sections, the title section doesn't have a span tag.

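I think I've got it! Taking only the last text node of each child div skips the span labels and returns the ideal result:
//div[@class="row"]/div/text()[last()]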
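To see why picking the last text node of each cell works, here is a minimal sketch of the same idea in Python. It swaps in the standard library's ElementTree, which does not support text() in its path syntax, so the "last text node" is found by hand: it is the tail of the last child element, or the cell's own text when it has no children. The function name last_text_per_cell is hypothetical, and the sketch assumes the markup is well-formed enough to parse as XML.

```python
import xml.etree.ElementTree as ET

HTML = """
<div class="row">
<div class="content title">File 1</div>
<div class="content version"><span>Version: </span>1.1</div>
<div class="content date"><span>Date: </span>01-01-2022</div>
<div class="content size"><span>Size: </span>20Mb</div>
</div>
"""


def last_text_per_cell(markup):
    """Return the last text node of each child div of every div.row."""
    root = ET.fromstring(markup.strip())
    values = []
    # iter() also yields the root itself, which is the div.row in this sample.
    for row in root.iter("div"):
        if row.get("class") != "row":
            continue
        for cell in row.findall("div"):
            children = list(cell)
            # Text after the last child element lives in that child's tail;
            # a cell with no child elements keeps its text in .text.
            last_text = children[-1].tail if children else cell.text
            values.append((last_text or "").strip())
    return values


result = last_text_per_cell(HTML)
print(result)
```

With an XPath 1.0 engine such as the one behind Scrapy selectors, the single expression from the answer does the same job without the manual tail handling.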

Related

Is it possible to get an empty string in a list when there is no element, using CSS selector?

I want to scrape some items, which are on the same page, using Scrapy.
HTML looks like this:
<div class="container" id="1">
<span class="title">
product-title1
</span>
<div class="description">
product-desc
</div>
<div class="price">
1.0
</div>
</div>
I need to extract name, description and price.
Unfortunately, sometimes a product doesn't have a description and the HTML looks like this:
<div class="container" id="2">
<span class="title">
product-title2
</span>
<div class="price">
2.0
</div>
</div>
Currently I am using CSS selectors, which return a list of all matching elements on the page:
title = response.css('span[class="title"]').extract()
['product-title1', 'product-title2', 'product-title3']
description = response.css('div[class="description"]').extract()
['desc1','desc3']
price = response.css('div[class="price"]').extract()
['1.0','2.0','3.0']
Is it possible to get for example an empty string in place of missing 'desc2' when description object isn't there, using CSS selector?

I recommend rewriting your code to loop over each container, so that a missing element yields a default value instead of shortening the list:
for section in response.xpath('//div[@class="container"]'):
    title = section.xpath('./span[@class="title"]/text()').get(default='not-found')  # any default value works here, including an empty string
    description = section.xpath('./div[@class="description"]/text()').get(default='')
    price = section.xpath('./div[@class="price"]/text()').get()
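The per-container approach can be tried outside a spider. The following is only a sketch under stated assumptions: it uses the standard library's ElementTree instead of Scrapy selectors, the body wrapper and the helper name extract_products are hypothetical, and the sample page is assumed to be well-formed. Its findtext(..., default="") call plays the role of .get(default='') above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: the second product has no description.
PAGE = """
<body>
<div class="container" id="1">
<span class="title">product-title1</span>
<div class="description">product-desc</div>
<div class="price">1.0</div>
</div>
<div class="container" id="2">
<span class="title">product-title2</span>
<div class="price">2.0</div>
</div>
</body>
"""


def extract_products(markup):
    """Build one record per container, with '' for any missing field."""
    root = ET.fromstring(markup.strip())
    products = []
    for section in root.iter("div"):
        # Skip nested divs such as description and price.
        if section.get("class") != "container":
            continue
        # Paths are relative to this container, so fields never mix
        # between products; default="" fills in absent elements.
        products.append({
            "title": section.findtext("span[@class='title']", default="").strip(),
            "description": section.findtext("div[@class='description']", default="").strip(),
            "price": section.findtext("div[@class='price']", default="").strip(),
        })
    return products


products = extract_products(PAGE)
descriptions = [p["description"] for p in products]
print(descriptions)
```

Because each record is built from a single container, the lists of titles, descriptions, and prices stay aligned even when one product lacks a field.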

Check this out:
for section in response.xpath('//div[@class="container"]'):
    title = section.xpath('./span[@class="title"]/text()').get()
    # Use a relative path (./) so that only this container is searched
    description_tag = section.xpath("./div[contains(@class,'description')]")
    if description_tag:
        description = section.xpath('./div[@class="description"]').get()
    else:
        description = "String"
    price = section.xpath('./div[@class="price"]/text()').get()

How to target specific line of text within text-widget in WordPress?

I have this html:
<aside id="secondary" class="widget-area col-md-4"
role="complementary">
<div id="text-3" class="widget widget_text">
<div class="textwidget"><p><strong>Sign In</strong>.
</p>
<p><strong>Create Account</strong></p>
<p><strong>Find your next employee on Adsler
</strong></p>
</div>
</div></aside><!-- #secondary -->
I want to target Find your next employee on Adsler
Tried.
.textwidget:nth-of-type (3) {font-size: 30px}
Nothing. Should I be doing something else?

The easiest way to achieve this is with JavaScript or jQuery:
<script>
$( ".textwidget p:contains('Find your next employee on Adsler')" ).css( {fontSize: "50px"} );
</script>
This script finds every paragraph inside the specified class that contains the string 'Find your next employee on Adsler' and changes its font size.
You can add the script directly to your JS file, or add this jQuery code through a plugin named "simple custom css & js".
Hope this works.

Place a text widget in the sidebar on the page and place the following html in it.
<span style="font-size: large;">Foobar</span>

How to create and reference custom heading ids with reStructuredText?

Currently, if I have:
My header
=========
`My header`_
rst2html Docutils 0.14 produces:
<div class="document" id="my-header">
<h1 class="title">My header</h1>
<p><a class="reference internal" href="#my-header">My header</a></p>
Is it possible to obtain the following output instead:
<h1 class="title" id="my-custom-header">My header</h1>
<p><a class="reference internal" href="#my-custom-header">My header</a></p>
So note how I want two changes:
the id to be inside the heading, not on a separate div
control over the actual id
The closest I could get was:
<div class="document" id="my-header">
<span id="my-custom-header"></span>
<h1 class="title">My header</h1>
<p><a class="reference external" href="my-custom-header">My header</a></p>
but this is still not ideal, as I now have multiple ids floating around, and not inside the h1.
Asciidoc for example has that covered with:
[[my-custom-header]]
== My header
<<my-custom-header>>

Showing Number of Comments without the Link

When I use <?php comments_number('0', '1', '%'); ?>, the output comes wrapped in span HTML code. I need to show the number of comments in a tag's title attribute. How can I do that?
<div class="comment">
</div>
This is how it currently looks.

You can get the numeric value using get_comments_number():
<div class="comment">
</div>

xquery- how to assign a number(in sequence) when obtaining a number of records via single query using FLWOR expression

While trying to process and obtain data from an XML document, I want to obtain a number of records using a single FLWOR expression-- I am doing this by using the 'let' clause to obtain the data-- so I have a number of rows of data.
Eg.
<div id="main">
<p> Text #1</p>
<p> Text #2</p>
<p> Text #3</p>
<p> Text #4</p>
</div>
Now, I understand how to get the 4 'p' elements -- however I would like to also assign a sequence number to each line -- viz. I want to obtain the text this way--
<data>Text #1</data><sequence> Sequence #1</sequence>
<data>Text #2</data><sequence> Sequence #2</sequence>
<data>Text #3</data><sequence> Sequence #3</sequence>
<data>Text #4</data><sequence> Sequence #4</sequence>

There sure is a more elegant way than doing it like this.... but it works:
let $t := <div id="main">
<p>Text A</p>
<p>Text B</p>
<p>Text C</p>
<p>Text D</p>
</div>
for $pos in $t/p/position()
let $p := $t/p[$pos]
return (
<data>{ $p/text() }</data>,
<sequence>Sequence { $pos }</sequence>
)

While I'm not completely sure what string value you would like to have for the <sequence/> element, one solution could look as follows:
for $p in <div id="main">
<p>Text #1</p>
<p>Text #2</p>
<p>Text #3</p>
<p>Text #4</p>
</div>/p
return (
<data>{ $p/text() }</data>,
<sequence>Sequence { replace($p, 'Text ', '') }</sequence>
)

You can use the at keyword in the FLWOR expression's for clause:
declare variable $div :=
<div id="main">
<p> Text #1</p>
<p> Text #2</p>
<p> Text #3</p>
<p> Text #4</p>
</div>;
for $p at $pos in $div/p
return (
<data>{$p/text()}</data>,
<sequence>Sequence #{$pos}</sequence>
)
