I am trying to scrape Yahoo Finance's recommendation rating using BeautifulSoup, but it keeps returning 'None'.
E.g. the recommendation rating for AAPL is '2':
https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL
Please advise. Thank you!
Below is the code:
from requests import get
from bs4 import BeautifulSoup

tickers = ['AAPL']

for ticker in tickers:
    url = 'https://sg.finance.yahoo.com/quote/%s/profile?p=%s' % (ticker, ticker)
    print(url)
    response = get(url)
    html_soup = BeautifulSoup(response.text, 'html.parser')

    # yf_rec refers to the Yahoo Finance recommendation
    yf_rec = None
    try:
        yf_rec = html_soup.find('div', attrs={'class':'B(8px) Pos(a) C(white) Py(2px) Px(0) Ta(c) Bdrs(3px) Trstf(eio) Trsde(0.5) Arrow South Bdtc(i)::a Fw(b) Bgc($buy) Bdtc($buy)'}).text.strip()
    except AttributeError:
        pass  # .find() returned None: the element is not in the downloaded HTML
    print(yf_rec)
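For what it's worth, that rating badge is drawn by JavaScript, so it never appears in the static HTML that requests downloads, which is why .find() returns None. A minimal sketch of one common workaround, assuming Yahoo's unofficial quoteSummary JSON endpoint and its field names still behave as they historically have (both are assumptions worth verifying, since Yahoo changes and gates this API):

from requests import get

ticker = 'AAPL'
api_url = ('https://query1.finance.yahoo.com/v10/finance/quoteSummary/'
           '%s?modules=financialData' % ticker)
headers = {'User-Agent': 'Mozilla/5.0'}  # Yahoo tends to reject the default requests UA
data = get(api_url, headers=headers).json()
# recommendationMean is the 1-5 analyst rating shown on the analysis page
print(data['quoteSummary']['result'][0]['financialData']['recommendationMean']['raw'])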
I am just learning and new to scraping. Yesterday I was able to scrape Craigslist with BeautifulSoup; today I am unable to. Here is my code to scrape the first page of rental housing search results on CL:
from requests import get
from bs4 import BeautifulSoup

# get the first page of the San Diego housing prices;
# the link excludes posts with no pictures
url = 'https://sandiego.craigslist.org/search/apa?hasPic=1&availabilityMode=0&sale_date=all+dates'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')

# get the macro-container for the housing posts
posts = html_soup.find_all('li', class_="result-row")
print(type(posts))  # to double-check that I got a ResultSet
print(len(posts))   # to double-check I got 120 (elements/page)
The html_soup is not the same as what the actual URL shows in a browser. It actually has the following in there:
<script>
window.cl.specialCurtainMessages = {
    unsupportedBrowser: [
        "We've detected you are using a browser that is missing critical features.",
        "Please visit craigslist from a modern browser."
    ],
    unrecoverableError: [
        "There was an error loading the page."
    ]
};
</script>
Any help would be much appreciated.
I am not sure if I've been 'blocked' from scraping somehow. I read an article about proxies and rotating IP addresses, but I do not want to break the rules if I have been blocked, and I also do not want to spend money on this. Is scraping Craigslist not allowed? I have seen many educational tutorials on it, so I thought it was okay.
Craigslist now appears to load search results client-side from a JSON API, so the listings are no longer in the HTML you download. You can query that API directly:

import requests
from pprint import pp

def main(url):
    with requests.Session() as req:
        # the same query parameters the site's own frontend sends
        params = {
            "availabilityMode": "0",
            "batch": "8-0-360-0-0",
            "cc": "US",
            "hasPic": "1",
            "lang": "en",
            "sale_date": "all dates",
            "searchPath": "apa"
        }
        r = req.get(url, params=params)
        # print just the first listing to inspect the structure
        for i in r.json()['data']['items']:
            pp(i)
            break

main('https://sapi.craigslist.org/web/v7/postings/search/full')
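Drop the break to iterate over every listing. This is an unofficial, undocumented endpoint, so inspect one item before writing parsing code; the shape of data['items'] is an assumption here and can change without notice.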
I'm new to web scraping and I was trying to scrape the FUTBIN (FUT 22) player database at https://www.futbin.com/players. My code is below, and I don't know why I can't get any sort of results from the FUTBIN page, though I was successful with other webpages like IMDb.
Code:
import requests
from bs4 import BeautifulSoup

request = requests.get("https://www.futbin.com/players")
src = request.content
soup = BeautifulSoup(src, features="html.parser")
results = soup.find("a", class_="player_name_players_table get-tp")
print(results)
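One likely cause, though it is an assumption you can confirm by printing request.status_code, is that Futbin rejects the default python-requests User-Agent, so the HTML you get back never contains the player table. A minimal sketch with a browser-like header:

import requests
from bs4 import BeautifulSoup

# a browser-like User-Agent; sites often serve an error page to the default one
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0'}
request = requests.get("https://www.futbin.com/players", headers=headers)
print(request.status_code)  # check for 403/503 before trusting the HTML
soup = BeautifulSoup(request.content, features="html.parser")
results = soup.find("a", class_="player_name_players_table get-tp")
print(results)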
I used the code below to scrape a specific page:
from bs4 import BeautifulSoup
import requests

url = "https://www.mychoize.com/self-drive-car-rentals-pune/cars"
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all('div', class_="product-box")
for item in lists:
    title = item.find('h3', class_="margin-o ng-binding")
    #print(title)
But it keeps scraping the homepage (https://www.mychoize.com).
To stop it from redirecting to the homepage, I tried the following code to explore the response history:
from bs4 import BeautifulSoup
import requests

url = "https://www.mychoize.com/self-drive-car-rentals-pune/cars"
page = requests.get(url, allow_redirects=True)

print(page.history)
for resp in page.history:
    print(resp.status_code, resp.url)

soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all('div', class_="product-box")
for item in lists:
    title = item.find('h3', class_="margin-o ng-binding")
    #print(title)
I obtained the following output:
[<Response [302]>, <Response [301]>]
302 https://www.mychoize.com/self-drive-car-rentals-pune/cars
301 http://www.mychoize.com/
How do I prevent it from redirecting?
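A note on the mechanics: allow_redirects=True is already the default for requests.get, so passing it explicitly changes nothing. You can stop requests from following the chain with allow_redirects=False and inspect the server's first answer directly. A minimal sketch, with the caveat that the 302 itself suggests the server wants cookies or browser-like headers before serving the Pune page, so this alone may not get you the listings:

import requests

url = "https://www.mychoize.com/self-drive-car-rentals-pune/cars"
# do not follow the redirect chain; look at the first response as-is
page = requests.get(url, allow_redirects=False)
print(page.status_code)              # expected: 302
print(page.headers.get('Location'))  # where the server tried to send us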
I'm attempting to scrape the data from a table on the following website: https://droughtmonitor.unl.edu/DmData/DataTables.aspx
import requests
from bs4 import BeautifulSoup
url = 'https://droughtmonitor.unl.edu/DmData/DataTables.aspx'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
drought_table = soup.find('table', {'id':'datatabl'}).find('tbody').find_all('tr')
For some reason I am getting no output. I've tried to use pandas for the same job:
import pandas as pd
url = 'https://droughtmonitor.unl.edu/DmData/DataTables.aspx'
table = pd.read_html(url)
df = table[0]
But I also ended up getting an empty dataframe.
What could be causing this?
Checking the browser's network tool makes it obvious that the site uses a Fetch/XHR request to load the table separately.
Image: network monitor
You can use this code to get the table data:

import requests
import json

headers = {
    'Content-Type': 'application/json; charset=utf-8',
}
params = (
    ('area', '\'conus\''),
    ('statstype', '\'1\''),
)
response = requests.get(
    'https://droughtmonitor.unl.edu/DmData/DataTables.aspx/ReturnTabularDMAreaPercent_national',
    headers=headers, params=params
)
table = json.loads(response.content)
# Code generated by https://curlconverter.com/
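Continuing the snippet above, the rows can then go straight into pandas. Assuming the usual ASP.NET page-method response shape, the payload sits under a 'd' key (an assumption; print table.keys() to confirm):

import pandas as pd

# 'd' is the conventional wrapper key for ASP.NET page-method JSON (assumption)
rows = table.get('d', table)
df = pd.DataFrame(rows)
print(df.head())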
I am following along with a web-scraping example in Automate the Boring Stuff with Python, but my CSS selector is returning no results.
import bs4
import requests
import sys
import webbrowser

print("Googling ...")
res = requests.get('https://www.google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, "html.parser")
linkelems = soup.find_all(".r a")
numopen = min(5, len(linkelems))
for i in range(numopen):
    webbrowser.open('https://google.com' + linkelems[i].get('href'))
Has Google since modified how they store search links? From inspecting the search page elements, I see no reason this selector would not work.
There are two problems:
1.) Instead of soup.find_all(".r a"), use soup.select(".r a"). Only the .select() method accepts CSS selectors.
2.) Google needs you to specify a User-Agent header to return the correct page.
import bs4
import sys
import requests

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}

print("Googling ...")
res = requests.get('https://www.google.com/search?q=' + ' '.join(sys.argv[1:]), headers=headers)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, "html.parser")
linkelems = soup.select(".r a")
for a in linkelems:
    print(a.text)
Prints (for example):
Googling ...
Tree - Wikipediaen.wikipedia.org › wiki › Tree
... and so on.
A complementary answer to Andrej Kesely's answer.
If you don't want to deal with figuring out which selectors to use or how to bypass blocks from Google, you can try the Google Search Engine Results API from SerpApi. It's a paid API with a free plan.
The difference in your case is that bypassing blocks, data extraction, and more are already done for the end user. All that needs to be done is to iterate over the structured JSON and pick out the data you want.
Example code to integrate:
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",               # search engine
    "q": "fus ro dah",                # query
    "api_key": os.getenv("API_KEY"),  # environment variable with your API key
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results['organic_results']:
    link = result['link']
    print(link)
------------
'''
https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)
https://knowyourmeme.com/memes/fus-ro-dah
https://en.uesp.net/wiki/Skyrim:Unrelenting_Force
https://www.urbandictionary.com/define.php?term=Fus%20ro%20dah
https://www.nexusmods.com/skyrimspecialedition/mods/4889/
https://www.nexusmods.com/skyrimspecialedition/mods/14094/
https://tenor.com/search/fus-ro-dah-gifs
'''
Disclaimer: I work for SerpApi.