Is it possible to get an empty string in a list when there is no element, using CSS selector? - css

I want to scrape some items, which are on the same page, using Scrapy.
HTML looks like this:
<div class="container" id="1">
<span class="title">
product-title1
</span>
<div class="description">
product-desc
</div>
<div class="price">
1.0
</div>
</div>
I need to extract name, description and price.
Unfortunately, a product sometimes has no description, and the HTML looks like this:
<div class="container" id="2">
<span class="title">
product-title2
</span>
<div class="price">
2.0
</div>
</div>
Currently I am using CSS selectors, which return a list of all matching elements on the page:
title = response.css('span[class="title"]').extract()
['product-title1', 'product-title2', 'product-title3']
description = response.css('div[class="description"]').extract()
['desc1','desc3']
price = response.css('div[class="price"]').extract()
['1.0','2.0','3.0']
Is it possible to get, for example, an empty string in place of the missing 'desc2' when the description element isn't there, using a CSS selector?

I recommend rewriting your code to iterate over each container and query inside it:
for section in response.xpath('//div[@class="container"]'):
    # get() returns the default when nothing matches, so every product yields a value
    title = section.xpath('./span[@class="title"]/text()').get(default='not-found')  # any default works, including an empty string
    description = section.xpath('./div[@class="description"]/text()').get(default='')
    price = section.xpath('./div[@class="price"]/text()').get()
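The per-container idea can be tried without Scrapy installed. The following is only a minimal sketch that swaps in the standard library's ElementTree for Scrapy selectors; the PAGE string and the extract_products helper are hypothetical names made up for illustration, and the sample markup is assumed to be well-formed enough to parse as XML. The point is the shape: query each container separately and fall back to an empty string when the description is missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page: the second product has no description.
PAGE = """
<html><body>
<div class="container" id="1">
<span class="title">product-title1</span>
<div class="description">product-desc</div>
<div class="price">1.0</div>
</div>
<div class="container" id="2">
<span class="title">product-title2</span>
<div class="price">2.0</div>
</div>
</body></html>
"""


def extract_products(page):
    """Return one dict per container; missing fields become empty strings."""
    root = ET.fromstring(page)
    products = []
    for section in root.iterfind(".//div[@class='container']"):
        # findtext() returns the default when the child element is absent,
        # which keeps titles, descriptions and prices aligned per product.
        products.append({
            "title": section.findtext("./span[@class='title']", default="").strip(),
            "description": section.findtext("./div[@class='description']", default="").strip(),
            "price": section.findtext("./div[@class='price']", default="").strip(),
        })
    return products


products = extract_products(PAGE)
descriptions = [p["description"] for p in products]
print(descriptions)
```

Because each product is handled on its own, the description list keeps one entry per container instead of silently skipping the products that lack one.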

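Alternatively, check explicitly whether the description exists inside each container:
for section in response.xpath('//div[@class="container"]'):
    title = section.xpath('./span[@class="title"]/text()').get()
    # Use a relative path ('./'); a leading '//' would search the whole page
    description_tag = section.xpath("./div[contains(@class,'description')]")
    if description_tag:
        description = section.xpath('./div[@class="description"]').get()
    else:
        description = "String"  # placeholder used when the description is missing
    price = section.xpath('./div[@class="price"]/text()').get()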

Related

How to create and reference custom heading ids with reStructuredText?

Currently, if I have:
My header
=========
`My header`_
rst2html Docutils 0.14 produces:
<div class="document" id="my-header">
<h1 class="title">My header</h1>
<p><a class="reference internal" href="#my-header">My header</a></p>
Is it possible to obtain the following output instead:
<h1 class="title" id="my-custom-header">My header</h1>
<p><a class="reference internal" href="#my-custom-header">My header</a></p>
So note how I want two changes:
the id to be inside the heading, not on a separate div
control over the actual id
The closest I could get was:
<div class="document" id="my-header">
<span id="my-custom-header"></span>
<h1 class="title">My header</h1>
<p><a class="reference external" href="my-custom-header">My header</a></p>
but this is still not ideal, as I now have multiple ids floating around, and not inside the h1.
Asciidoc for example has that covered with:
[[my-custom-header]]
== My header
<<my-custom-header>>

Rename multiple css with identical name

I'm working inside a templated system where I can add code, but I can't modify the core files. My layers are stacked like this:
<div class="layer1">
<div class="layer2">
<div class=" layer3">
<div class="layer4">
</div>
</div>
</div>
</div>
<div class="layer1">
<div class="layer2">
<div class=" layer3">
<div class="layer4">
</div>
</div>
</div>
</div>
<div class="layer1">
<div class="layer2">
<div class=" layer3">
<div class="layer4">
</div>
</div>
</div>
</div>
As you can see, my classes all have the same names (layer1, layer2, etc.). I want to know whether there is a way, using JavaScript, jQuery, or any other client-side library, to modify the CSS class names so that, for example, the first layer1 becomes level1 and the following layer1 becomes level2?
Thanks for your answers!
As other people already said, jQuery actually does what you want.
Since you don't know how many “layers” you have, it is better to find all elements by a class-name substring (*= rather than ^=, because some of your class attributes start with a space):
$('*[class*="layer"]')
Then you can read each element's class list and change the old names to new ones.
There are several ways to do this:
Solution 1:
Use addClass() and removeClass()
$(".layer1").removeClass('old_class').addClass('new_class');
Replace old_class with the existing class name and new_class with the new one.
Solution 2:
If you are able to get the element by ID,
you can set the class using .attr():
$("#id").attr('class', 'new_class');
An all-round solution working with className:
var elems = document.querySelectorAll('[class*="layer"]');
// Index loop: for...in on a NodeList also visits non-element keys such as "length"
for (var i = 0; i < elems.length; i++) {
  elems[i].className = elems[i].className.replace("layer", "level");
}

Showing Number of Comments without the Link

When I use <?php comments_number('0', '1', '%'); ?>, the output comes wrapped in span HTML code. I need to show the number of comments in a tag's title attribute. How can I do that?
<div class="comment">
</div>
This is how it currently looks.
You can get the numeric value using get_comments_number():
<div class="comment">
</div>

Loading text onto a webpage using ASP.NET

What is the best way to load text into a webpage using ASP.NET?
I have 3 records in a MS SQL database table. Each record would correspond to a paragraph in a webpage.
Table:
The HTML code for the About us page would look like this (note I put the corresponding paragraphs) :
<div id="content-1b">
<div id="content-1-1b">
</div>
<div id="content-1-2b"><p class="none">paragraph 1</p>
<p class="nonetop">paragraph 2</p>
</div>
</div>
<div id="content-3b"><p class="nonetop">paragraph 3</p>
</div>
Regards,
Tea
Declare these divs as server controls by giving each a unique ID and runat="server" so that you can access the objects in the code-behind.
These divs are equivalent to HtmlGenericControl, so you can just write divid.InnerHtml = content.
Or you can use a Repeater and bind it to your div.
Setting your div or paragraph tags as runat="server" with an ID will allow you to access their InnerHtml/InnerText.
HTML:
<div id="content-1b">
<div id="content-1-1b">
</div>
<div id="content-1-2b"><p class="none" id="p1" runat="server">paragraph 1</p>
<p id="p2" runat="server" class="nonetop">paragraph 2</p>
</div>
</div>
<div id="content-3b"><p id="p3" runat="server" class="nonetop">paragraph 3</p>
</div>
Running code like this (assuming your content is in a DataTable named dt and that you want the first column (0) from each row):
For Each row As DataRow In dt.Rows
    p1.InnerHtml &= row.Item(0).ToString()
Next
will loop through your DataTable, get the first column from each row, and append the text to the paragraph tag.
Update:
For Single Content
If your SQL returns 3 rows, one for each area you want to fill, simply do the following,
assuming dt is the DataTable you are filling and that the first column of each row is your content:
p1.InnerHtml = dt.Rows(0).Item(0).ToString()
p2.InnerHtml = dt.Rows(1).Item(0).ToString()
p3.InnerHtml = dt.Rows(2).Item(0).ToString()

Does IE7 have a problem applying CSS to dynamically created DOM Elements?

I'm building an HTML page that uses endless scrolling functionality to render new list elements as you scroll down, like on Facebook. I'm using the jquery.pageless plugin.
The thing is, now testing it on IE7, when I load the dynamic content, none of it is styled like it should be. The first set (20 rows), which the server generated in the html page, look fine. Then the next 20-per-page that are rendered with javascript, don't have any of the styles applied.
How do I fix this? I'm having a hard time testing it in IE7 from a Mac. Is this a problem with IE7? Or could the elements perhaps not be appended to the correct parent div in IE (I'm using jQuery, so I doubt this)? Or is there a common hack for reloading the stylesheets after every dynamically created HTML element is added?
The doctype is HTML5: <!DOCTYPE html>
Thanks for the advice!
Update:
Looking in the IE7 developer panel, the HTML is being spit out all malformed. The first time around it looks like this:
<article class='community-page page none vevent' data-status='available' data-type='community_page' itemscope='itemscope' itemtype='http://www.data-vocabulary.org/Event'>
<figure class='snapshot'><time class='availability dtstart' datetime='2010-12-16T00:00:00-08:00' itemprop='startDate' title='2010-12-16T00:00:00-08:00'><span class='value-title' title='2010-12-16T00:00:00-08:00'></span></time><span></span><img alt="Logo for Heavenly Cleaning" class="photo" src="/images/41/heavenly-cleaning-logo-small.JPG?1297971958" title="Logo for Heavenly Cleaning" />
Like
</figure>
<section class='details' itemprop='seller' itemtype='http://data-vocabulary.org/Organization'>
<header class='header'>
<hgroup>
<h3 class='user fn org' itemprop='name'>
Name<span class='hyphen'>-</span><span class='distance'>Wheaton, IL</span>
</h3>
<h2 class='title'><span class='quotation-mark'>"</span>Quote<span class='quotation-mark'>"</span><time class='expiration-date dtend'><span class='value-title' title='2011-12-16T00:00:00-08:00'></span></time></h2>
</hgroup>
</header>
<p class='highlights'></p>
<p class='description' itemprop='description'></p>
<footer class='footer'>
<address class='location adr' itemprop='address' itemscope='itemscope' itemtype='http://data-vocabulary.org/Address'>
<span class='locality' itemprop='locality'></span>
<span class='geo' itemprop='geo' itemtype='http://data-vocabulary.org/Geo'>
<span class='latitude' itemprop='latitude'>
<span class='value-title' title='41.850249'></span>
</span>
<span class='longitude' itemprop='longitude'>
<span class='value-title' title='-88.0855459'></span>
</span>
</span>
<span class='tel' itemprop='tel'></span>
</address>
Category: Home
</footer>
</section>
</article>
The second time around it looks more like this:
<article class='community-page page none vevent' data-status='available' data-type='community_page' itemscope='itemscope' itemtype='http://www.data-vocabulary.org/Event'/>
<figure class='snapshot'/>
<time class='availability dtstart' datetime='2010-12-16T00:00:00-08:00' itemprop='startDate' title='2010-12-16T00:00:00-08:00'/>
<span class='value-title' title='2010-12-16T00:00:00-08:00'/>
</time/>
<a href="/users/25?page_id=25" class="fancy-ajax logo">
<span/>
<img alt="Logo for Heavenly Cleaning" class="photo" src="/images/41/name-logo-small.JPG?1297971958" title="Logo for Heavenly Cleaning" />
</a>
Like
</figure/>
...
I am returning it as a json string and appending it like this:
$(window).load(function() {
  var params = paginator;
  params.dataType = "string";
  $('#content').pageless({
    url: window.location.pathname,
    params: params,
    distance: 500,
    totalPages: 10,
    loaderImage: "/images/loaders/load.gif",
    scrape: function(data) {
      var data = $.parseJSON(data);
      var paginator = data.paginator;
      var search = data.search;
      var html = data.pages; // html string
      if (data.more == false) {
        $.pageless.settings.totalPages = $.pageless.settings.currentPage;
        if ($.pageless.settings.totalPages <= $.pageless.settings.currentPage) {
          $.pageless.stopListener();
        }
      }
      $.pageless.settings.params = {dataType: "string", paginator: paginator, q: search.q, c: search.c, l: search.l, a: search.a};
      return html;
    }
  });
});
Since you're using HTML5 elements, I assume you're using HTML5Shiv or Modernizr to hack IE to support those elements?
If not then yes, you will definitely have issues, since IE6/7/8 will simply not recognise those tags as valid HTML.
