BeautifulSoup find specific string - web-scraping

I just started a little 'project' to learn Beautiful Soup, and even though the BS website is massive I couldn't find an answer to my question.
I'm analysing Billboard100 and managed to get a list of all songs by their divs as an array, so I can pull out a separate element for each song. The problem starts when I need a few details from each one: name of the song, name of the artist, etc. I tried text.strip(), then split() and indexing, but different songs have their details in different positions. That means I should probably find them by div class, since the same classes are used for every song, and that's where I get stuck.
<div class="chart-list-item__title">
<span class="chart-list-item__title-text">
Mona Lisa
</span>
</div>
<div class="chart-list-item__artist">
Lil Wayne Featuring Kendrick Lamar
</div>
That's just a bit of the code - let's say I'm trying to get 'Mona Lisa' and 'Lil Wayne Featuring Kendrick Lamar'. Is there a way of using BeautifulSoup on HTML that I already extracted from the original HTML?

You should be able to find the div with the desired class name. This code assumes that you have just the card (the list item for your desired song) as your soup, not the whole page:
title = card.find("div", {"class": "chart-list-item__title"}).get_text(strip=True)
artist = card.find("div", {"class": "chart-list-item__artist"}).get_text(strip=True)
get_text(strip=True) returns the text of the div, including the nested span, with the surrounding whitespace removed.
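As a self-contained sketch of the same idea, the snippet below parses the markup quoted in the question and pulls out both fields. The helper name extract_song is made up for illustration, and the real Billboard cards contain more markup than this fragment, so treat it as a starting point rather than a finished scraper.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; a real card on the page has more elements.
CARD_HTML = """
<div class="chart-list-item__title">
<span class="chart-list-item__title-text">
Mona Lisa
</span>
</div>
<div class="chart-list-item__artist">
Lil Wayne Featuring Kendrick Lamar
</div>
"""


def extract_song(card):
    """Return (title, artist) for one song card; a missing field becomes None."""
    title_div = card.find("div", class_="chart-list-item__title")
    artist_div = card.find("div", class_="chart-list-item__artist")
    # get_text(strip=True) also collects text from nested tags such as the span
    title = title_div.get_text(strip=True) if title_div else None
    artist = artist_div.get_text(strip=True) if artist_div else None
    return title, artist


# Any Tag or BeautifulSoup object works as "card", including one
# that was already extracted from a larger page.
card = BeautifulSoup(CARD_HTML, "html.parser")
title, artist = extract_song(card)
print(title, "-", artist)
```

Because find() works on any Tag, the same function can be called on each element of the list of song divs you already have.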


Two Collections in one Each loop

Collection "notes" (markdown): meteor **amazing** and reactjs *learning next*
Collection "links": google - www.google.com and stackoverflow - www.stackoverflow.com
I want to display the notes and links in ONE LIST sorted by the creation date:
1. google - www.google.com
2. meteor - amazing
3. reactjs - learning next
4. stackoverflow - www.stackoverflow.com
and NOT like that:
1. meteor - amazing
2. reactjs - learning next
1. google - www.google.com
2. stackoverflow - www.stackoverflow.com
The "notes" and "links" collections have a completely different structure:
notes = new Mongo.Collection('notes');
{{#each note}}
{{filename}}
{{#markdown}} {{note}} {{/markdown}}
{{/each}}
links = new Mongo.Collection('links');
{{#each link}}
{{filename}}
<a href={{link}}> {{link}} </a>
{{/each}}
QUESTION:
Should I have one collection for both of them? Is there a Package for this? Or how can I solve this?
If one Item is changed, only this Item should be rendered again.
1. .fetch() the documents and keys you want from each collection.
2. .map() the keys from each array into a set of common keys.
3. You'll now have two arrays with common keys.
4. Append one array to the other.
5. Sort the whole array by the key you want to sort by.
6. Return that array from your helper.
7. Display it in a single template.
Alternatively, you can omit step 2 and keep each document in its original structure, but then your template has to recognize what kind of document it's rendering and display it accordingly.
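The merge itself is not Meteor-specific, so here is a language-neutral sketch of it in Python rather than a Meteor helper. The lists stand in for the results of .fetch(); the createdAt field, its dates, and the common keys (kind, name, body) are assumptions made for illustration, not the asker's real schema.

```python
from datetime import datetime

# Hypothetical stand-ins for the fetched documents of each collection.
notes = [
    {"filename": "meteor", "note": "**amazing**", "createdAt": datetime(2015, 1, 2)},
    {"filename": "reactjs", "note": "*learning next*", "createdAt": datetime(2015, 1, 3)},
]
links = [
    {"filename": "google", "link": "www.google.com", "createdAt": datetime(2015, 1, 1)},
    {"filename": "stackoverflow", "link": "www.stackoverflow.com", "createdAt": datetime(2015, 1, 4)},
]


def merged_items(notes, links):
    """Map both document shapes onto common keys, append, and sort by date."""
    common = [
        {"kind": "note", "name": n["filename"], "body": n["note"], "createdAt": n["createdAt"]}
        for n in notes
    ]
    common += [
        {"kind": "link", "name": l["filename"], "body": l["link"], "createdAt": l["createdAt"]}
        for l in links
    ]
    # One list, ordered by creation date regardless of the source collection
    return sorted(common, key=lambda item: item["createdAt"])


for position, item in enumerate(merged_items(notes, links), start=1):
    print(f"{position}. {item['name']} - {item['body']} ({item['kind']})")
```

The kind key is what lets a single template decide whether to render an item as markdown or as a link.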

Creating view with node reference

I am using Drupal 6. Please give me an idea of how I should create a view with the requirements below:
There will be a list of awards (gold, silver, etc.)
There will be a list of companies that won one of the above awards (comp1->gold, comp2->gold, comp3->silver, etc.)
I need to display the list of awards first, and when the user clicks on an award they will be redirected to a page listing the companies that won that award.
I created two content types, Awards and Companies, and award_id is used as a node reference in the Company content type. Please guide me further. Thanks.
The first task (a simple list of awards) is easy: just create a view that displays nodes of the "awards" type, ordered and limited as you want.
Also add the node ID field, if it isn't available from the start, since you're going to need it.
For the second task, create a new view that lists companies and, under arguments, add that award ID. Then pass the award ID to the page as an extra parameter:
/awards/3
where 3 is the award ID.
Sorry, Drupal 6 is pretty old and I haven't used it lately, but that's the basic idea.

DBpedia: Get list of Chinese universities and their addresses to populate a Google map?

I'm trying to get a list of Chinese universities and their addresses, the minimum being the city/town name. I will use these addresses to populate a Google map, fiddle here.
I saw interesting code such as:
SELECT ?resource ?value
WHERE {
?resource a <http://dbpedia.org/class/yago/CitiesAndTownsInDenmark> .
?resource <http://dbpedia.org/property/populationTotal> ?value .
FILTER (?value > 100000)
}
ORDER BY ?resource ?value
Since CitiesAndTownsInChina doesn't work:
1. Where can I find the exact name of the class I'm targeting? and
2. Where can I find DBpedia's operator manual?
Note: I'm a very active user on Wikipedia and well aware of all the data available there, but the DBpedia ontology/syntax/keywords are quite hard to get.
Personal note: queries on http://dbpedia.org/snorql/ , http://dbpedia.org/sparql/ , http://querybuilder.dbpedia.org/
(Expanding on my reply to How to find cities with more than X population in a certain country)
CitiesAndTownsInDenmark exists because people use the category http://en.wikipedia.org/wiki/Category:Cities_and_towns_in_Denmark in wikipedia. Wikipedia categories are pretty loose and as a result there's a lot of variation in style, so even if a useful category exists the name may not be guessable.
In addition categories are maintained manually, and may not be consistently applied.
A good place to start is looking at the data. Visiting http://dbpedia.org/page/Beijing I see yago:MetropolitanAreasOfChina which seems promising, but if you follow that link you'll see it's not well populated.
As a consequence, I'd avoid relying on the existence of such categories and instead query directly for populated places in a country. This information comes from Wikipedia infoboxes, which are much more consistent than categories. Taking Beijing as an exemplar again, I found:
select ?s {
?s a <http://dbpedia.org/ontology/PopulatedPlace> ;
<http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/China>
}
(The relevant properties and values for my query were found by copying link location in the Beijing page)
with the result:
"http://dbpedia.org/resource/Hulunbuir"
"http://dbpedia.org/resource/Guangzhou"
"http://dbpedia.org/resource/Chongqing"
"http://dbpedia.org/resource/Kuqa_County"
"http://dbpedia.org/resource/Changzhou"
... nearly 3000 results ...
You'll notice that position is encoded multiple times (geo:lat and long, georss:point, various dbpprop:latd longd things), and there seem to be two values excitingly. You can either simply deal with the multiple values in whichever format you prefer, or try picking just one using GROUP BY and SAMPLE.
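If you take the first route and handle the duplicates on the client side, one possible sketch in Python is below. The rows are hypothetical (resource, latitude, longitude) tuples and the coordinate values are placeholders, not real DBpedia data; keeping the first value seen per resource roughly mirrors what SAMPLE would do on the server.

```python
def one_position_per_resource(rows):
    """Keep the first (lat, long) pair seen for each resource URI."""
    positions = {}
    for resource, lat, long_ in rows:
        # setdefault leaves an existing entry untouched, so later duplicates are ignored
        positions.setdefault(resource, (lat, long_))
    return positions


# Placeholder rows: the same resource appears twice with different values.
rows = [
    ("http://dbpedia.org/resource/Guangzhou", 1.0, 2.0),
    ("http://dbpedia.org/resource/Guangzhou", 1.5, 2.5),
    ("http://dbpedia.org/resource/Changzhou", 3.0, 4.0),
]

positions = one_position_per_resource(rows)
for resource, (lat, long_) in positions.items():
    print(resource, lat, long_)
```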
As for a manual, almost everything I know of is academic papers, and not very useful. However, the data is reasonably self-documenting.
For your first question:
You can see the possible classes by querying one member of your intended set of entities (e.g. Shanghai).
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?type WHERE {
<http://dbpedia.org/resource/Shanghai> rdf:type ?type.
FILTER regex(str(?type), ".*China", "i").
} LIMIT 100
which gives this result:
dbpedia:class/yago/MetropolitanAreasOfChina [http]
dbpedia:class/yago/PortCitiesAndTownsInChina [http]
dbpedia:class/yago/MunicipalitiesOfThePeople'sRepuBlicOfChina [http]
dbpedia:class/yago/PopulatedCoastalPlacesInChina [http]
They are CamelCase versions of the categories that you will find at the bottom of Wikipedia pages. I was fooled for a while by the erroneous capitalization of RepuBlic, and finally saw that the class contains only 4 cities, so it is of limited use to you.
So I would propose going with @user205512's answer and getting the cities by linking 2 properties.
For your second question:
I would advise you to search/ask on http://answers.semanticweb.com

How can I transform this haml into a table?

I have the following haml code:
- @theLinks.each_index do |x|
  %br
  %form{:action=>'/Download', :method=>"post", :enctype=>"multipart/form-data"}
    %input{:type=>"submit", :name=>"#{@theLinks[x].url}", :value=>"Name: #{@theLinks[x].Name} Study Time: #{@theLinks[x].studyTime} Comments: #{@theLinks[x].comments}"}
Basically, for each person, list the time they participated in a study and the comments on the study. Right now, this renders as a set of buttons. I'd like to render it as a table, with each row clickable in the same way (ie, using the 'post' method, so that only the haml file has to be edited without touching the rest of the files).
Ideally, I'd also like to be able to sort the table by name, time, or comments, but that might be getting ahead of myself.
So how can I change this list of buttons into a table with clickable rows?
Okay, how about this code? It makes a table with three columns: one for the name (a clickable button, like what you had), one for the time spent, and one for comments. Time and comments are just plain text, so only the name is clickable. In the future, if you want to add sorting, convert the table headers to links that call ajax functions for sorting. I think jQuery has a function/plugin for sorting tables, so you can look into their docs (if you use jQuery).
%table
  %tr
    %th Name
    %th Time spent
    %th Comments
  - @theLinks.each do |link|
    %tr
      %td
        %form{:action=>'/Download', :method=>"post", :enctype=>"multipart/form-data"}
          %input{:type=>"submit", :name=>"#{link.url}", :value=>"Name: #{link.Name}"}
      %td= "Study Time: #{link.studyTime}"
      %td= "Comments: #{link.comments}"

Sorting grouped nodes by taxonomy term

Ok, here's the problem:
I have a list of contacts, which I created in Views, that are grouped by taxonomy terms like so:
(term:) Staff:
(node:) John Doe
john@doe.com
(node:) Jane Doe
jane@doe.com
(term:) Management:
Fred Doe
fred@doe.com
and so on...
As it is now, I have no idea what decides the order of the taxonomy terms (i.e. why the 'Staff' nodes come before the 'Management' nodes).
So what I need is to be able to sort the order of the terms, and also the order of the nodes in each 'category' (or whatever you would call it).
I have tried to sort the terms by weight, but the only thing that happens is that I get duplicated nodes in the output, and nothing happens to the order of the actual terms.
As for the order of the nodes, I was thinking of maybe a hidden CCK field with some sort of weight, but I don't know. The biggest problem is still the order of the categories.
If anyone has an answer to this it would be very helpful.
Thank you.
EDIT:
Strange, I tried that before I asked the question, but now it seems to work. However, I still get duplicated nodes when I sort by taxonomy weight, for some reason. I really need to get rid of those. Here's how my view setup looks, if it's any help:
Fields: taxonomy=all terms (limited to one vocabulary)
image attach content
Sort criteria:
Taxonomy weight:descending
Filters: Taxonomy term id(with depth) // to filter out what page it belongs
Node type : contact
node published : yes
I don't know if that information helps at all.
/Anders
The solution is simple: in Views you can sort your results by taxonomy term. You have 3 options by default.
From the Views interface:
Term: Taxonomy terms. Note that using this can cause duplicate nodes to appear in views; you must add filters to reduce the result set.
Term ID: The taxonomy term ID
Taxonomy Weight: The term weight field
Sorting in Views is located in the top right corner and gives a wealth of options for how you want to sort your results.
Edit:
Duplicates are a known problem with taxonomy terms. If a node has two terms that match, it will be included once for each term. You can reduce duplicates with the taxonomy term filter, which should fix your problem:
http://grab.by/16vw
I seldom have sort problems with Views, but I have to admit it's not something I ever really focused on. Here's a short list of things you might wish to check. If that doesn't solve it, it would be great if you could provide some more detail on your settings and on what appears to be the default sorting in your current configuration.
How did you set the sort criteria in the Views UI? You have basic settings available there (top right of the UI panel). See below for some screenshots that should help you find your way around the configuration.
How did you set your taxonomy term order (accessible from somewhere similar to http://example.com/admin/content/taxonomy/3, where the number is the taxonomy ID)?
Here is some more information on sorting capabilities of views.
Screenshots on how to configure sorting
NOTE: In this example I show how to sort nodes according to whether they are published or not, but the procedure applies equally for taxonomy terms.
In this view I already set up some sorting; add yours by clicking on the + button.
Screenshot: http://img15.yfrog.com/img15/7118/screenshot005vy.png
Select what kind of content you want to sort.
Screenshot: http://img3.yfrog.com/img3/2341/screenshot006jkz.png
Select the information you want your content to be sorted by.
Screenshot: http://img3.yfrog.com/img3/4816/screenshot007nt.png
And finally select the direction of sorting!
Screenshot: http://img37.yfrog.com/img37/9806/screenshot008ah.png
Now you should be good to go! :)
Hope this helps!
