How to limit the number of elements found by BeautifulSoup? - web-scraping

While scraping a webpage with BeautifulSoup, is there a way to limit the number of elements returned by the find family of methods?
For example, if I want only the first 5 tags, can I do that with BeautifulSoup?

.find_all() and .select() return a list-like result (a ResultSet, which subclasses the standard Python list), so you can slice it, for example with [:5], to get only the first 5 results:
from bs4 import BeautifulSoup
txt = '''
<div>Tag 1</div>
<div>Tag 2</div>
<div>Tag 3</div>
<div>Tag 4</div>
<div>Tag 5</div>
<div>Tag 6</div>
<div>Tag 7</div>
'''
soup = BeautifulSoup(txt, 'html.parser')
for div in soup.find_all('div')[:5]:
    print(div.text)
Prints:
Tag 1
Tag 2
Tag 3
Tag 4
Tag 5
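As a side note, find_all() also accepts a limit argument, which stops the search once that many matches have been found instead of collecting everything and slicing afterwards. A minimal sketch, assuming the same seven-div document as above (the variable names are only illustrative):

```python
from bs4 import BeautifulSoup

# Same seven <div> tags as in the example above, built in a loop
html = "".join(f"<div>Tag {i}</div>" for i in range(1, 8))
soup = BeautifulSoup(html, "html.parser")

# limit=5 stops the search after five matching tags
first_five = soup.find_all("div", limit=5)
texts = [div.get_text() for div in first_five]
print(texts)
```

The result should be the same as with slicing; the difference is that the traversal stops early, which may matter on large documents.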
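EDIT: You can also use a CSS selector to select the first 5 elements. Note that :nth-of-type counts siblings within each parent, so this returns the first 5 div elements under every parent rather than the first 5 in the whole document: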
from bs4 import BeautifulSoup
txt = '''
<div>Tag 1</div>
<div>Tag 2</div>
<div>Tag 3</div>
<div>Tag 4</div>
<div>Tag 5</div>
<div>Tag 6</div>
<div>Tag 7</div>
'''
soup = BeautifulSoup(txt, 'html.parser')
for div in soup.select('div:nth-of-type(-n+5)'):
    print(div.text)
Prints:
Tag 1
Tag 2
Tag 3
Tag 4
Tag 5
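One thing to watch with the CSS approach: :nth-of-type(-n+5) is evaluated per parent, so on nested markup it can return more elements than expected. The sketch below uses hypothetical markup with two parents to show the difference, and assumes a bs4 version (4.7 or later) in which select() takes a limit argument:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two parents with three divs each
html = """
<section><div>A1</div><div>A2</div><div>A3</div></section>
<section><div>B1</div><div>B2</div><div>B3</div></section>
"""
soup = BeautifulSoup(html, "html.parser")

# :nth-of-type counts within each parent, so both sections contribute
per_parent = [d.get_text() for d in soup.select("div:nth-of-type(-n+2)")]

# limit caps the total number of matches across the whole document
capped = [d.get_text() for d in soup.select("div", limit=2)]

print(per_parent)
print(capped)
```

Here per_parent holds four items (two from each section), while capped holds only the first two divs in document order.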

Related

Select divs after same level element

How can I select an element that comes after a specific element? For example:
<div class="general">
<div class="inner">Foo 1</div>
<div class="inner">Foo 2</div>
<div class="inner">Foo 3</div>
<div class="header">Bar</div>
<div class="inner">Foo 4</div>
<div class="inner">Foo 5</div>
<div class="inner">Foo 6</div>
<div class="inner">Foo 7</div>
<div class="inner">Foo ...</div>
<div class="inner">Foo n</div>
</div>
How can I select the 4 div.inner after .header without selecting first three div.inner?
FYI, I am unable to modify any HTML, I am able to modify only CSS.
EDIT
I did not phrase the question correctly. I just need all the .inner elements after .header, skipping the first n elements ('n' might change in the future).
EDIT
Goal is to style "Foo 7" to "Foo n", inclusive.
Use the general sibling selector ~, as it only matches siblings that come after the reference element.
.header ~ .inner { }
Okay, now that the question has been edited once more, again making my previous answer look wrong, I am changing my answer once more to fit the question:
You can use several + selectors to "count up" to the first sibling that should be affected and then add a ~ selector after that to select all following siblings with the same class:
.header + .inner + .inner + .inner ~ .inner {
  color: red;
}
<div class="general">
<div class="inner">Foo 1</div>
<div class="inner">Foo 2</div>
<div class="inner">Foo 3</div>
<div class="header">Bar</div>
<div class="inner">Foo 4</div>
<div class="inner">Foo 5</div>
<div class="inner">Foo 6</div>
<div class="inner">Foo 7</div>
<div class="inner">Foo ...</div>
<div class="inner">Foo ...</div>
<div class="inner">Foo ...</div>
<div class="inner">Foo n</div>
</div>
That's the best way to get what you're looking for
I hope I understood the question correctly:
.header ~ .inner
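To check what the two selectors from the answers actually match, one option is to run them through BeautifulSoup's select(), which understands the + and ~ combinators. This is only a quick sketch: the markup is rebuilt in a loop with nine .inner items (the count is an assumption standing in for "Foo n"), and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Rebuild the structure from the question: Foo 1-3, the header, then Foo 4-9
before = "".join(f'<div class="inner">Foo {i}</div>' for i in range(1, 4))
after = "".join(f'<div class="inner">Foo {i}</div>' for i in range(4, 10))
html = f'<div class="general">{before}<div class="header">Bar</div>{after}</div>'
soup = BeautifulSoup(html, "html.parser")

# Every .inner sibling that follows .header
all_after = [d.get_text() for d in soup.select(".header ~ .inner")]

# Three "+" steps count past Foo 4, 5 and 6; "~" then takes every later .inner
skipped = [d.get_text() for d in soup.select(".header + .inner + .inner + .inner ~ .inner")]

print(all_after)
print(skipped)
```

The first selector returns Foo 4 onwards, the second Foo 7 onwards. Changing how many items are skipped means adding or removing "+ .inner" steps by hand, since plain CSS has no counter for this.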

How to select element with CSS when some items are nested before

In this scenario is it possible to use CSS selector to only select item 4?
<div>
<div class="a">item 1</div>
<div class="a">item 2</div>
<div class="a">item 3</div>
</div>
<div class="a">item 4</div>
<div class="a">item 5</div>
In this exact example, yes - though if you get much more complex it's going to get ugly and you'll probably need some (relatively simple) JavaScript.
In this example, just use the adjacent sibling combinator (+) in combination with the :not() pseudo-class. This targets any element with class="a" that immediately follows a div that does not have class="a".
div:not(.a) + .a {
  color: red;
}
<div>
<div class="a">item 1</div>
<div class="a">item 2</div>
<div class="a">item 3</div>
</div>
<div class="a">item 4</div>
<div class="a">item 5</div>
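The selector can be sanity-checked outside the browser as well. The following is a small sketch that feeds the markup from the question to BeautifulSoup's select(); it assumes bs4 is installed and that its CSS engine handles :not() and + the way a browser does for this simple case.

```python
from bs4 import BeautifulSoup

html = """
<div>
  <div class="a">item 1</div>
  <div class="a">item 2</div>
  <div class="a">item 3</div>
</div>
<div class="a">item 4</div>
<div class="a">item 5</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A .a element whose immediately preceding sibling is a div without class "a"
matches = [d.get_text() for d in soup.select("div:not(.a) + .a")]
print(matches)
```

Only item 4 qualifies: items 1 to 3 have no non-.a div directly before them, and item 5 follows item 4, which does have class "a".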

Mosaic layout with angular material

I'm wondering if it's possible to make a mosaic layout with Angular Material. What I want to do is display sections differently depending on the section count. For example, if I have 3 sections to display, it would be something like
Section1 Section2 Section3
for 4 sections it would be like
Section1 Section2
Section3 Section4
I couldn't find any examples for this. Thanks
You can do this by defining the flex directive on each of the elements as defined in the layout documentation for Angular Material.
To achieve the layout defined in the question, you could use the directives in this manner:
<div layout="row" layout-sm="column">
<div flex>Section 1</div>
<div flex>Section 2</div>
</div>
<div layout="row" layout-sm="column">
<div flex>Section 3</div>
<div flex>Section 4</div>
</div>

Quick nth-child issue

I've a quick :nth-child issue that I'm struggling to solve. I'm aiming to target every 3rd and 4th item in a grouping of 4 items that form a list.
For example:
<div class="normal">Item 1</div>
<div class="normal">Item 2</div>
<div class="different">Item 3</div>
<div class="different">Item 4</div>
<div class="normal">Item 5</div>
<div class="normal">Item 6</div>
<div class="different">Item 7</div>
<div class="different">Item 8</div>
In this example I'd like to target all instances of <div class="different">. I've used a lot of nth-child generators to come up with an answer, but nothing gets me to what I need.
Any help would be much appreciated!
Use div:nth-child(4n-1), div:nth-child(4n). The logic is simple: the items come in groups of four, so 4n matches the last item of each group (positions 4, 8, ...). You want the penultimate and the last item of each group, which 4n-1 and 4n select respectively.
The following diagram illustrates the point:
#1
#2
#3 <- 4th item - 1
#4 <- 4th item
#5
#6
#7 <- 4th item - 1
#8 <- 4th item
div:nth-child(4n-1), div:nth-child(4n) {
  background-color: #eee;
}
<div class="normal">Item 1</div>
<div class="normal">Item 2</div>
<div class="different">Item 3</div>
<div class="different">Item 4</div>
<div class="normal">Item 5</div>
<div class="normal">Item 6</div>
<div class="different">Item 7</div>
<div class="different">Item 8</div>
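The An+B arithmetic behind :nth-child can be worked through in a few lines of plain Python. The helper below is hypothetical (it is not part of any CSS library); it lists the 1-based positions p for which p = a*n + b holds for some integer n >= 0, which is how the formula is defined.

```python
def nth_child_positions(a, b, count):
    """Return the 1-based positions p <= count with p == a*n + b for some integer n >= 0."""
    positions = []
    for p in range(1, count + 1):
        if a == 0:
            # With a == 0 the formula matches exactly one position, b
            if p == b:
                positions.append(p)
        else:
            n, remainder = divmod(p - b, a)
            if remainder == 0 and n >= 0:
                positions.append(p)
    return positions


# 4n-1 and 4n over a list of eight items
third_of_group = nth_child_positions(4, -1, 8)
fourth_of_group = nth_child_positions(4, 0, 8)
combined = sorted(third_of_group + fourth_of_group)
print(third_of_group, fourth_of_group, combined)
```

For eight items this gives positions 3 and 7 for 4n-1, 4 and 8 for 4n, and 3, 4, 7, 8 combined, which lines up with the diagram above.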

Custom table with neutral tag between rows

I created the following custom table:
http://jsfiddle.net/pg92nzh2/ (live example)
<div class="table-custom">
<div class="table-heading">
<div class="table-cell">title 1</div>
<div class="table-cell">title 2</div>
<div class="table-cell">title 3</div>
<div class="table-cell">title 4</div>
</div>
<div class="table-row">
<div class="table-cell">test 1</div>
<div class="table-cell">test 2</div>
<div class="table-cell">test 3</div>
<div class="table-cell">test 4</div>
</div>
<div class="table-row">
<div class="table-cell">test 1</div>
<div class="table-cell">test 2</div>
<div class="table-cell">test 3</div>
<div class="table-cell">test 4</div>
</div>
</div>
Everything works great, but I would like to be able to put a tag between rows (div, span or whatever) that won't affect the behavior of my table (the final goal is to use ng-repeat on this tag so I can have modular tables). Here is an example of what I would like to do: http://jsfiddle.net/gzdrz9mr/
I would like to put this tag preferably wherever I want without affecting anything...

Resources