So I am working with the string method s.upper(), but when it executes it does not convert the whole sentence to uppercase. I am wondering if my understanding is wrong or I am doing something wrong.
I tried using the s.lower method and it gave me the same display with nothing changed.
Here is my code:
foods = ['soup', 'waffle', 'pizza']
for food in foods:
food[0].upper()
print(foods[0], foods[1], foods[2])
The code displays :
soup waffle pizza
when I expected
SOUP WAFFLE PIZZA
Here's a simple example that should elucidate the problem:
food = 'Soup'
food.upper()
print(food)
>>> 'Soup'
When you called food.upper(), it returned SOUP, but you didn't do anything with the result. Instead, you need to assign the result to something.
food = 'Soup'
food = food.upper()
print(food)
>>> 'SOUP'
This time, since you assigned the result of the call to food.upper() back to the variable food, it now has the value SOUP which you see when you print it.
Related
Good morning :)
I am trying to scrape the content of this website: https://public.era.nih.gov/pubroster/preRosIndex.era?AGENDA=438653&CID=102313
The text I am trying to get seems to be located inside some <p> and separated by <br>.
For some reason, whenever I try to access a <p>, I get the following mistake: "ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?", and this even if I do find instead of find_all().
My code is below (it is a very simple thing with no loop yet, I just would like to identify where the mistake comes from):
from selenium import webdriver
import time
from bs4 import BeautifulSoup
options = webdriver.ChromeOptions()
options.add_argument("headless")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(executable_path='MYPATH/chromedriver',options=options)
url= "https://public.era.nih.gov/pubroster/preRosIndex.era?AGENDA=446096&CID=102313"
driver.maximize_window()
driver.implicitly_wait(5) # wait up to 3 seconds before calls to find elements time out
driver.get(url)
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content,"html.parser")
column = soup.find_all("div", class_="col-sm-12")
people_in_column = column.find_all("p").find_all("br")
Is there anything obvious I am not understanding here?
Thanks a lot in advance for your help!
You are trying to select a lsit of items aka ResultSet multiples times which is incorrect meaning using find_all method two times but not iterating.The correct way is as follows. Hope, it should work.
columns = soup.find_all("div", class_="col-sm-12")
for column in columns:
people_in_column = column.find("p").get_text(strip=True)
print(people_in_column)
Full working code as an example:
from selenium import webdriver
import time
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
options = Options()
options.add_argument("headless")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
webdriver_service = Service("./chromedriver") #Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url= "https://public.era.nih.gov/pubroster/preRosIndex.era?AGENDA=446096&CID=102313"
driver.maximize_window()
driver.implicitly_wait(5) # wait up to 3 seconds before calls to find elements time out
driver.get(url)
content = driver.page_source#.encode('utf-8').strip()
soup = BeautifulSoup(content,"html.parser")
columns = soup.find_all("div", class_="col-sm-12")
for column in columns:
people_in_column = column.find("p").get_text(strip=True)
print(people_in_column)
Output:
Notice of NIH Policy to All Applicants:Meeting rosters are provided for information purposes only. Applicant investigators and institutional officials must not communicate directly with study section members about an application before or after the review. Failure to observe this policy will create a serious breach of integrity in the peer review process, and may lead to actions outlined inNOT-OD-22-044, including removal of the application from immediate
review.
I am trying to get some data from kick starter. How can use beautiful soup library?
Kick Starter link
https://www.kickstarter.com/discover/advanced?woe_id=2347575&sort=magic&seed=2600008&page=7
These are the following information I need
Crowdfunding goal
Total crowdfunding
Total backers
Length of the campaign (# of days)
This is my current code
import requests
r = requests.get('https://www.kickstarter.com/discover/advanced?woe_id=2347575&sort=magic&seed=2600008&page=1')
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('div', attrs={'js-react-proj-card grid-col-12 grid-col-6-sm grid-col-4-lg'})
len(results)
i'll give you some of hint that i know, and hope you can do by yourself.
crawling has legal problem when you abuse Term of Service.
find_all should use with 'for' statment. it works like find all on web page(Ctrl + f).
e.g.
for a in soup.find_all('div', attrs={'js-react-proj-card grid-col-12 grid-col-6-sm grid-col-4-lg'}):
print (a)
3.links should be open 'for' statement. - https://www.kickstarte...seed=2600008&page=1
bold number repeated in for statement, so you can crawling all data In orderly
4.you sholud linked twice. - above link, there is list of pj. you should get link of these pj.
so code's algorithm likes this.
for i in range(0,10000):
url = www.kick.....page=i
for pj_link in find_all(each pj's link):
r2 = requests.get(pj_link)
soup2 = BeautifulSoup(r2.text, 'html.parser')
......
There seems to be 3 ways to display output in Jupyter:
By using print
By using display
By just writing the variable name
What is the exact difference, especially between number 2 and 3?
I haven't used display, but it looks like it provides a lot of controls. print, of course, is the standard Python function, with its own possible parameters.
But lets look at a simple numpy array in Ipython console session:
Simply giving the name - the default out:
In [164]: arr
Out[164]: array(['a', 'bcd', 'ef'], dtype='<U3')
This is the same as the repr output for this object:
In [165]: repr(arr)
Out[165]: "array(['a', 'bcd', 'ef'], dtype='<U3')"
In [166]: print(repr(arr))
array(['a', 'bcd', 'ef'], dtype='<U3')
Looks like the default display is the same:
In [167]: display(arr)
array(['a', 'bcd', 'ef'], dtype='<U3')
print on the other hand shows, as a default, the str of the object:
In [168]: str(arr)
Out[168]: "['a' 'bcd' 'ef']"
In [169]: print(arr)
['a' 'bcd' 'ef']
So at least for a simple case like this the key difference is between the repr and str of the object. Another difference is which actions produce an Out, and which don't. Out[164] is an array. Out[165] (and 168) are strings. print and display display, but don't put anything on the Out list (in other words they return None).
display can return a 'display' object, but I won't get into that here. You can read the docs as well as I can.
I am learning from the book Learn Python The Hard Way 3.6, by Zed Shaw
There are a series of 6 target.write commands towards the bottom of the script and he wants me to simplify them into a single target.write command using strings formats and escapes. However, I am stuck.
Here is the original code:
from sys import argv
script, filename = argv
print(f"We're going to erase {filename}")
print("If you don't want that, hit CTRL-C (^C).")
print("If you do want that, hit RETURN.")
input("?")
print("Opening the file...")
target = open(filename,'w')
print("Truncating the file. Goodbye!")
target.truncate()
print("Now I'm going to ask you for three lines")
line1 = input("line 1:")
line2 = input("line 2:")
line3 = input("line 3:")
print("Im going to write these to the file.")
target.write(line1)
target.write("\n")
target.write(line2)
target.write("\n")
target.write(line3)
target.write("\n")
print("And finnaly, we close it")
target.close()
So far I have tried
target.write(line1),(line2),(line3)
but this gives a logical error of only writing to one line not all three.
target.write(line1) + (line2) + (line3)
with this one I get error
'unsupported operand types for +: 'int' + 'str'
target.write(line1),\n,(line2)\n(line3),\n
with this one I get error:
unexpected character after line continuation character
(<string>,line 22)
I have been googling and searching here for answers but have not found anything. One person posted a very similar question except for Zed's 2.7 book. However I am reading Zed's 3.6 book so the answers were no help to me unfortunately.
I'm not sure what you have and haven't covered so far in the book as I'm not familiar with it but one way to do what you want is to format the string first and then pass it to the write method like this:
target.write("{0}\n{1}\n{2}\n".format(line1, line2, line3))
The following works and retrurns a list of semmingly random tracks which GraceNote thinks are similar to Bowe's work:
radioPlayList = pygn.createRadio(GRACENOTE_CLIENT_ID, GRACENOTE_USER_ID, artist='Bowie', count='3');
However, I would strongly prefer to pass a genre, rather than an atrist - I just can't figure our how.
This radioPlayList = pygn.createRadio(GRACENOTE_CLIENT_ID, GRACENOTE_USER_ID, genre='38', count='3'); returns <RESPONSES>\n <RESPONSE STATUS="NO_MATCH">\n </RESPONSE>\n</RESPONSES> which lead me to beleive that Genre should not just be a simple number.
And trying to give the genre as a text, radioPlayList = pygn.createRadio(GRACENOTE_CLIENT_ID, GRACENOTE_USER_ID, genre='Oldies', count='3'); gives <RESPONSES>\n <MESSAGE>GCSP: RADIOCREATE error: [8] radio: Invalid attribute seed.</MESSAGE>\n <RESPONSE STATUS="ERROR">\n </RESPONSE>\n</RESPONSES>\n so that is obviously not the way to do it.
QUESTION: how can I pass a Genre (only) and get a radio playlist in return?
The only Pygn docuemntation which I can find does not help. I am hoping that #cweichen will se thsi question & help me. Does anyone else know how?
[Update] Looking in the code of Pygn's test.py, I see
# Example how to create a radio playlist by genre classical music
result = pygn.createRadio(clientID, userID, genre='36061', popularity ='1000', similarity = '1000')
print(json.dumps(result, sort_keys=True, indent=4))
Question: where do I get a list of those genre values? The file readme.md says genre: a genre ID from the genres below, but here is no list below.
To get the list of genres (or moods, or eras) you need to make a call to the "fieldvalues" API - this isn't in pygn yet, but you can see how to do it here:
https://developer.gracenote.com/rhythm-api#attribute-station
This call will give you the list of supported genres:
https://cXXXXXXX.web.cddbp.net/webapi/json/1.0/radio/fieldvalues?fieldname=RADIOGENRE&client=CLIENT_ID&user=USER_ID
You can then use the returned ID's with pygn.createRadio()