I'm Bart, I am new to Python, and this is my first post here.
As a fan of whisky, I wanted to scrape some shops for recent whisky deals; however, I am stuck on Asda's page. I browsed here for ages without any luck, hence my post.
Thank you.
The browser opens and closes as expected.
Below is my creation:
# Import libraries
# import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
# import pandas as pd
# import requests
from selenium.webdriver.firefox.options import Options as FirefoxOptions
# specify url
#url = "https://groceries.asda.com/product/whisky/glenmorangie-the-original-single-malt-scotch-whisky/68303869"
url = "https://groceries.asda.com/search/whisky/1/relevance-desc/so-false/Type%3A3612046177%3AMalt%20Whisky"
# run webdriver with headless option
options = FirefoxOptions()
driver = webdriver.Firefox(options=options)
options.add_argument('--headless')
# get page
driver.get(url)
# execute script to scroll down the page
driver.execute_script('window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;')
# sleep for 30s
time.sleep(30)
# close driver
driver.close()
# find element by xpath
results = driver.find_elements_by_xpath("//*[@id='componentsContainer']//*[@id='listingsContainer']//*[@class='product active']//*[@class='title productTitle']")
"""soup = BeautifulSoup(browser.page_source, 'html.parser')"""
print('Number of results', len(results))
Here is the output.
Traceback (most recent call last):
File "D:/PycharmProjects/Giraffe/asda.py", line 29, in <module>
results = driver.find_elements_by_xpath("//*[@id='componentsContainer']//*[@id='listingsContainer']//*[@class='product active']//*[@class='title productTitle']")
File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 410, in find_elements_by_xpath
return self.find_elements(by=By.XPATH, value=xpath)
File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1007, in find_elements
'value': value})['value'] or []
File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSessionIdException: Message: Tried to run command without establishing a connection
Process finished with exit code 1
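I tried to stay close to what you have already written. The InvalidSessionIdException is raised because driver.close() runs before find_elements_by_xpath, so there is no browser session left to query. Likewise, '--headless' is added to the options after the driver has been created, so it has no effect. Avoid hardcoded delays, as they are inconsistent; prefer an explicit wait. This is how you can get the result: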
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = "https://groceries.asda.com/search/whisky"
driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
driver.get(url)
item = wait.until(EC.presence_of_element_located((By.XPATH, "//h3[@class='co-product-list__title']")))
driver.execute_script("arguments[0].scrollIntoView();",item)
results = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//li[contains(@class,'co-item')]//*[@class='co-product__title']/a")))
print('Number of results:', len(results))
driver.quit()
Output:
Number of results: 61
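To illustrate why an explicit wait is more reliable than a fixed time.sleep(30), here is a minimal sketch of the polling idea, written without Selenium so that it runs anywhere. The names wait_until and fake_results are hypothetical and only for illustration; they are not Selenium API. WebDriverWait(...).until(...) works along similar lines: it returns as soon as the condition is satisfied and raises a timeout exception otherwise.

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    (Hypothetical helper that mimics the idea of an explicit wait.)
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)

# Stand-in for a page whose results only appear on the third poll.
calls = {"n": 0}

def fake_results():
    calls["n"] += 1
    return ["item"] * 61 if calls["n"] >= 3 else []

results = wait_until(fake_results, timeout=2.0, poll=0.01)
print("Number of results:", len(results))
```

The wait ends as soon as the data is there, so a fast page is not held up for 30 seconds and a slow page is not cut off too early.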
I need to run a websocket server on ESP32 and the official example raises the following exception when I connect from any client:
MPY: soft reboot
Network config: ('192.168.0.200', '255.255.255.0', '192.168.0.1', '8.8.8.8')
b'Sec-WebSocket-Version: 13\r\n'
b'Sec-WebSocket-Key: k5Lr79cZgBQg7irI247FMw==\r\n'
b'Connection: Upgrade\r\n'
b'Upgrade: websocket\r\n'
b'Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits\r\n'
b'Host: 192.168.0.200\r\n'
b'\r\n'
Finished webrepl handshake
Task exception wasn't retrieved
future: <Task> coro= <generator object 'echo' at 3ffe79b0>
Traceback (most recent call last):
File "uasyncio/core.py", line 1, in run_until_complete
File "main.py", line 22, in echo
File "uasyncio/websocket/server.py", line 60, in WSReader
AttributeError: 'Stream' object has no attribute 'ios'
My micropython firmware and libraries:
Micropython firmware: https://micropython.org/resources/firmware/esp32-idf3-20200902-v1.13.bin
Pip libraries installed: micropython-ulogging, uasyncio.websocket.server
My main.py:
import network
import machine
sta_if = network.WLAN(network.STA_IF)
sta_if.active(True)
sta_if.ifconfig(('192.168.0.200', '255.255.255.0', '192.168.0.1', '8.8.8.8'))
if not sta_if.isconnected():
    print('connecting to network...')
    sta_if.connect('my-ssid', 'my-password')
    while not sta_if.isconnected():
        machine.idle() # save power while waiting
print('Network config:', sta_if.ifconfig())
# from https://github.com/micropython/micropython-lib/blob/master/uasyncio.websocket.server/example_websock.py
import uasyncio
from uasyncio.websocket.server import WSReader, WSWriter
def echo(reader, writer):
    # Consume GET line
    yield from reader.readline()
    reader = yield from WSReader(reader, writer)
    writer = WSWriter(reader, writer)
    while 1:
        l = yield from reader.read(256)
        print(l)
        if l == b"\r":
            await writer.awrite(b"\r\n")
        else:
            await writer.awrite(l)
import ulogging as logging
#logging.basicConfig(level=logging.INFO)
logging.basicConfig(level=logging.DEBUG)
loop = uasyncio.get_event_loop()
loop.create_task(uasyncio.start_server(echo, "0.0.0.0", 80))
loop.run_forever()
loop.close()
MicroPython 1.13 implements asyncio v3, which has a number of breaking changes compared with the three-year-old sample referenced.
I suggest you refer to Peter Hinch's excellent documentation on asyncio
and the asyncio V3 tutorial.
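As a rough illustration of the v3 style (async def / await instead of generator-based coroutines), here is a minimal sketch of a plain TCP echo handler. It is not a WebSocket server, and it uses CPython's asyncio so that it can be tried on a PC; on MicroPython the corresponding import would be `import uasyncio as asyncio`. The demo function and the loopback address are assumptions added only to exercise the handler.

```python
import asyncio

async def echo(reader, writer):
    # v3 style: a coroutine declared with async def, using await throughout.
    while True:
        data = await reader.read(256)
        if not data:          # empty bytes means the client disconnected
            break
        writer.write(data)    # buffer the reply ...
        await writer.drain()  # ... and wait until it has been sent
    writer.close()
    await writer.wait_closed()

async def demo(message=b"hello"):
    # Test harness (assumption, desktop Python only): start the server on a
    # free loopback port, send one message and return the echoed reply.
    server = await asyncio.start_server(echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message)
    await writer.drain()
    reply = await reader.read(256)
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply

if __name__ == "__main__":
    print(asyncio.run(demo()))
```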
I encountered the same problem. I looked at the old implementation of the Stream class [1] and the new one [2].
It seems to me that you need to edit server.py from uasyncio/websocket/.
You can download the files from [3] to your PC. Then, at the bottom of the file, replace the two instances of "reader.ios" with "reader.s".
Save the file to your ESP32 and it should work. Of course, you then need to use "from server import WSReader, WSWriter" instead of "from uasyncio.websocket.server import WSReader, WSWriter".
[1] https://github.com/pfalcon/pycopy-lib/blob/master/uasyncio/uasyncio/__init__.py
[2] https://github.com/micropython/micropython/blob/master/extmod/uasyncio/stream.py
[3] https://pypi.org/project/micropython-uasyncio.websocket.server/#files
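If you prefer to apply the edit described above with a script on your PC rather than by hand, a minimal sketch could look like this. The function name patch_stream_attr and the sample text are hypothetical; the sample merely stands in for the real contents of server.py.

```python
def patch_stream_attr(source, old="reader.ios", new="reader.s"):
    """Return (patched_source, number_of_replacements)."""
    count = source.count(old)
    return source.replace(old, new), count

# Hypothetical excerpt standing in for the real server.py contents.
sample = "sock = reader.ios\nws = wrap(reader.ios)\n"
patched, count = patch_stream_attr(sample)
print(count, "occurrence(s) replaced")
```

For the real file you would read server.py into a string, pass it through the function, check that exactly two occurrences were replaced, and write the result back before copying it to the board.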
https://github.com/pfalcon/pycopy-lib/tree/master/uasyncio has a recent (May 2021) sample that should also work on standard MicroPython.
Alternatively, check out https://awesome-micropython.com under "web servers".
I am working on a design pattern to structure my Python unittest suite as a POM (Page Object Model). So far I have written my page classes in the modules HomePageObject.py and FilterPageObject.py, my base class for common code (TestBase) in BaseTest.py, my test case modules TestCase1.py and TestCase2.py, and one runner module, runner.py.
In the runner I use loader.getTestCaseNames to get all the tests from a test case class of a module. In both test case modules the test class has the same name, 'Test', and the method has the same name, 'testName'.
Since the names conflict when they are imported in the runner, only one test gets executed. I want Python to scan all the modules that I specify for tests and run them even if the class names are the same.
I learned that nose might be helpful here, but I am not sure how to use it. Any advice?
BaseTest.py
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ChromeOptions
import unittest
class TestBase(unittest.TestCase):
    driver = None
    def __init__(self,testName,browser):
        self.browser = browser
        super(TestBase,self).__init__(testName)
    def setUp(self):
        if self.browser == "firefox":
            TestBase.driver = webdriver.Firefox()
        elif self.browser == "chrome":
            options = ChromeOptions()
            options.add_argument("--start-maximized")
            TestBase.driver = webdriver.Chrome(chrome_options=options)
        self.url = "https://www.airbnb.co.in/"
        self.driver = TestBase.getdriver()
        TestBase.driver.implicitly_wait(10)
    def tearDown(self):
        self.driver.quit()
    @staticmethod
    def getdriver():
        return TestBase.driver
    @staticmethod
    def waitForElementVisibility(locator, expression, message):
        try:
            WebDriverWait(TestBase.driver, 20).\
                until(EC.presence_of_element_located((locator, expression)),
                      message)
            return True
        except:
            return False
TestCase1.py and TestCase2.py (same)
from airbnb.HomePageObject import HomePage
from airbnb.BaseTest import TestBase
class Test(TestBase):
    def __init__(self,testName,browser):
        super(Test,self).__init__(testName,browser)
    def testName(self):
        try:
            self.driver.get(self.url)
            h_page = HomePage()
            f_page = h_page.seachPlace("Sicily,Italy")
            f_page.selectExperience()
        finally:
            self.driver.quit()
runner.py
import unittest
from airbnb.TestCase1 import Test
from airbnb.TestCase2 import Test
loader = unittest.TestLoader()
test_names = loader.getTestCaseNames(Test)
suite = unittest.TestSuite()
for test in test_names:
    suite.addTest(Test(test,"chrome"))
runner = unittest.TextTestRunner()
result = runner.run(suite)
Also, even though that one test case passes, an error message appears:
Ran 1 test in 9.734s
OK
Traceback (most recent call last):
File "F:\eclipse-jee-neon-3-win32\eclipse\plugins\org.python.pydev.core_6.3.3.201805051638\pysrc\runfiles.py", line 275, in <module>
main()
File "F:\eclipse-jee-neon-3-win32\eclipse\plugins\org.python.pydev.core_6.3.3.201805051638\pysrc\runfiles.py", line 97, in main
return pydev_runfiles.main(configuration) # Note: still doesn't return a proper value.
File "F:\eclipse-jee-neon-3-win32\eclipse\plugins\org.python.pydev.core_6.3.3.201805051638\pysrc\_pydev_runfiles\pydev_runfiles.py", line 874, in main
PydevTestRunner(configuration).run_tests()
File "F:\eclipse-jee-neon-3-win32\eclipse\plugins\org.python.pydev.core_6.3.3.201805051638\pysrc\_pydev_runfiles\pydev_runfiles.py", line 773, in run_tests
all_tests = self.find_tests_from_modules(file_and_modules_and_module_name)
File "F:\eclipse-jee-neon-3-win32\eclipse\plugins\org.python.pydev.core_6.3.3.201805051638\pysrc\_pydev_runfiles\pydev_runfiles.py", line 629, in find_tests_from_modules
suite = loader.loadTestsFromModule(m)
File "C:\Python27\lib\unittest\loader.py", line 65, in loadTestsFromModule
tests.append(self.loadTestsFromTestCase(obj))
File "C:\Python27\lib\unittest\loader.py", line 56, in loadTestsFromTestCase
loaded_suite = self.suiteClass(map(testCaseClass, testCaseNames))
TypeError: __init__() takes exactly 3 arguments (2 given)
I solved this by searching for all test modules matching a file name pattern, importing each one with __import__(modulename), and instantiating its Test class with the desired parameters.
Here is my runner.py:
import unittest
import glob
loader = unittest.TestLoader()
suite = unittest.TestSuite()
test_file_strings = glob.glob('Test*.py')
# strip the ".py" suffix to get importable module names
module_strings = [name[:-3] for name in test_file_strings]
for module in module_strings:
    mod = __import__(module)
    test_names = loader.getTestCaseNames(mod.Test)
    for test in test_names:
        suite.addTest(mod.Test(test,"chrome"))
runner = unittest.TextTestRunner()
result = runner.run(suite)
This worked, but I am still looking for a more organized solution.
(I am not sure why it shows "Ran 0 tests in 0.000s" the second time.)
Finding files... done.
Importing test modules ... ..done.
----------------------------------------------------------------------
Ran 2 tests in 37.491s
OK
----------------------------------------------------------------------
Ran 0 tests in 0.000s
OK
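One possible way to organize this further, sketched under assumptions: keep the browser as a class attribute instead of an extra __init__ argument, so the default TestCase(methodName) signature that unittest loaders rely on keeps working (the TypeError in the question's traceback comes from a loader calling the class with only the method name). Each Test class is reached through its module, so identical class names do not collide. The stand-in modules built by make_module are hypothetical; in the real project they would be the imported TestCase1 and TestCase2 modules, for example obtained with importlib.import_module.

```python
import types
import unittest

class TestBase(unittest.TestCase):
    # Class attribute instead of an extra __init__ argument, so the default
    # TestCase(methodName) signature used by unittest loaders keeps working.
    browser = "chrome"

def make_module(name):
    """Build a stand-in module exposing a class named Test (hypothetical)."""
    class Test(TestBase):
        def testName(self):
            self.assertIn(self.browser, ("chrome", "firefox"))
    module = types.ModuleType(name)
    module.Test = Test
    return module

def build_suite(modules, browser):
    """Collect the tests of every module's Test class into one suite."""
    TestBase.browser = browser
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for module in modules:
        # Access Test through its module, so equal class names never collide.
        suite.addTests(loader.loadTestsFromTestCase(module.Test))
    return suite

modules = [make_module("TestCase1"), make_module("TestCase2")]
suite = build_suite(modules, "chrome")
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With the standard signature restored, other runners that discover the test classes on their own should also be able to instantiate them.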
I am pretty new to Python and Beautiful Soup, and I don't know much about HTML or JavaScript. I tried to use bs4 to download all the xls files on this page, but it seems that bs4 cannot find the links under the "attachment" section. Could someone help me out?
My current code is:
"""
Scraping of all county-level raw data from
http://www.countyhealthrankings.org for all years. Data stored in RawData
folder.
Code modified from
https://null-byte.wonderhowto.com/how-to/download-all-pdfs-webpage-with-python-script-0163031/
"""
from bs4 import BeautifulSoup
import urlparse
import urllib2
import os
import sys
"""
Get all links
"""
def getAllLinks(url):
    page=urllib2.urlopen(url)
    soup = BeautifulSoup(page.read(),"html.parser")
    links = soup.find_all('a', href=True)
    return links
def download(links):
    for link in links:
        #raw_input("Press Enter to continue...")
        #print link
        #print "------------------------------------"
        #print os.path.splitext(os.path.basename(link['href']))
        #print "------------------------------------"
        #print os.path.splitext(os.path.basename(link['href']))[1]
        suffix = os.path.splitext(os.path.basename(link['href']))[1]
        if suffix == '.xls':
            print link #cannot find anything
            # open the URL from the href attribute, not the Tag object itself
            currentLink = urllib2.urlopen(link['href'])
links = getAllLinks("http://www.countyhealthrankings.org/app/iowa/2017/downloads")
download(links)
(By the way, my desired link looks like this.)
Thanks!
This seems to be one of the tasks for which BeautifulSoup (in itself, at least) is inadequate. You can, however, do it with selenium.
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get('http://www.countyhealthrankings.org/app/iowa/2017/downloads')
>>> links = driver.find_elements_by_xpath('.//span[@class="file"]/a')
>>> len(links)
30
>>> for link in links:
...     link.get_attribute('href')
...
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/CHR2017_IA.pdf'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2017%20County%20Health%20Rankings%20Iowa%20Data%20-%20v1.xls'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2017%20Health%20Outcomes%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2017%20Health%20Factors%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/CHR2016_IA.pdf'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2016%20County%20Health%20Rankings%20Iowa%20Data%20-%20v3.xls'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2016%20Health%20Outcomes%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2016%20Health%20Factors%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/CHR2015_IA.pdf'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2015%20County%20Health%20Rankings%20Iowa%20Data%20-%20v3.xls'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2015%20Health%20Outcomes%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2015%20Health%20Factors%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/CHR2014_IA_v2.pdf'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2014%20County%20Health%20Rankings%20Iowa%20Data%20-%20v6.xls'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2014%20Health%20Outcomes%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2014%20Health%20Factors%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/states/CHR2013_IA.pdf'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2013%20County%20Health%20Ranking%20Iowa%20Data%20-%20v1_0.xls'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2013%20Health%20Outcomes%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2013%20Health%20Factors%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/states/CHR2012_IA.pdf'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2012%20County%20Health%20Ranking%20Iowa%20Data%20-%20v2.xls'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2012%20Health%20Outcomes%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2012%20Health%20Factors%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/states/CHR2011_IA.pdf'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2011%20County%20Health%20Ranking%20Iowa%20Data%20-%20v2.xls'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2011%20Health%20Outcomes%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2011%20Health%20Factors%20-%20Iowa.png'
'http://www.countyhealthrankings.org/sites/default/files/states/CHR2010_IA_0.pdf'
'http://www.countyhealthrankings.org/sites/default/files/state/downloads/2010%20County%20Health%20Ranking%20Iowa%20Data%20-%20v2.xls'
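Once the href values are available, picking out the .xls files and deriving readable file names needs only the standard library. The following is a minimal sketch in Python 3; the helper name xls_filenames is hypothetical, and the two URLs are taken from the listing above.

```python
import posixpath
from urllib.parse import unquote, urlsplit

def xls_filenames(hrefs):
    """Map each .xls URL to a decoded local file name."""
    wanted = {}
    for href in hrefs:
        path = urlsplit(href).path
        if path.lower().endswith(".xls"):
            # "%20" and similar escapes become plain characters again
            wanted[href] = unquote(posixpath.basename(path))
    return wanted

hrefs = [
    "http://www.countyhealthrankings.org/sites/default/files/state/downloads/CHR2017_IA.pdf",
    "http://www.countyhealthrankings.org/sites/default/files/state/downloads/2017%20County%20Health%20Rankings%20Iowa%20Data%20-%20v1.xls",
]
for url, name in xls_filenames(hrefs).items():
    print(name)
```

Each selected URL could then be saved with, for example, urllib.request.urlretrieve(url, name).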
Code:
from bs4 import BeautifulSoup
import urllib.request # i use this one instead of urllib
import re
import sys
import bs4 as bs
f = open('recipe.txt', 'r')
for line in f.readlines():
    wholeline = line.strip()
    sauce = urllib.request.urlopen(wholeline).read()
    soup = bs.BeautifulSoup(sauce, 'html.parser')
    for div in soup.find_all('div', class_='nutritional_info'):
        outfile = open('C:/Diabetes Scraper/ingredients.txt', 'a')
        outfile.write(div.txt + '\n')
When I run this code, it gives me an error message saying:
File "C:/Diabetes Scraper/stackoverflow_scraper.py", line 14, in <module>
outfile.write(str(div.txt + '\n'))
builtins.TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
When I run:
outfile.write(str(div.txt))
it gives me a bunch of Chinese symbols in the file.
When I run :
outfile.write(div.txt)
It gives me the error:
File "C:/Diabetes Scraper/stackoverflow_scraper.py", line 14, in <module>
outfile.write(div.txt)
builtins.TypeError: must be str, not None
So how do I fix this problem? Should I write it into another type of file or scrape the data some other way or...?
An example link for scraping the data:
http://www.diabetes.org/mfa-recipes/recipes/2017-03-grilled-vegetable-sandwich.html
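A note on the error itself: BeautifulSoup treats an unknown attribute such as div.txt as a search for a child tag named txt, which does not exist here, so the result is None. The text accessor is div.text (or div.get_text()). To show the extraction step without depending on bs4, here is a minimal sketch using only the standard library's html.parser as a stand-in; the class name NutritionText, the helper extract_nutrition and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class NutritionText(HTMLParser):
    """Collect the text found inside <div class="nutritional_info"> blocks."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # > 0 while inside a matching div (counts nesting)
        self.blocks = []   # one string per matching div
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or "nutritional_info" in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.blocks.append(" ".join(self._parts))
                self._parts = []

    def handle_data(self, data):
        if self.depth and data.strip():
            self._parts.append(data.strip())

def extract_nutrition(html):
    parser = NutritionText()
    parser.feed(html)
    return parser.blocks

# Hypothetical sample markup, not taken from the real page.
sample = ('<div class="nutritional_info"><p>Calories</p> <span>190</span></div>'
          '<div>other</div>')
print(extract_nutrition(sample))
```

When writing the text out, opening the file once with `with open(path, 'a', encoding='utf-8') as outfile:` keeps the encoding explicit and closes the file reliably; whether that also resolves the odd symbols depends on how the file is viewed.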
I created a Qt resource file containing all my ui files, compiled it into a Python file with the pyrcc4 command-line tool, and then loaded the ui files using loadUi. Here is an example:
#!/usr/bin/env python
#-*- coding:utf-8 -*-
import os
import sys
from PyQt4.QtCore import Qt, QFile
from PyQt4.uic import loadUi
from PyQt4.QtGui import QDialog
from xarphus.gui import ui_rc
# I import the compiled qt resource file named ui_rc
BASE_PATH = os.path.dirname(os.path.abspath(__file__))
#UI_PATH = os.path.join(BASE_PATH, 'gui', 'create_user.ui')
UI_PATH = QFile(":/ui_file/create_user.ui")
# I want to load those compiled ui files,
# so I just create QFile.
class CreateUser_Window(QDialog):
    def __init__(self, parent):
        QDialog.__init__(self, parent)
        # I open the created QFile
        UI_PATH.open(QFile.ReadOnly)
        # I read the QFile and load the ui file
        self.ui_create_user = loadUi(UI_PATH, self)
        # After that I close it
        UI_PATH.close()
Well, it works fine, but I have a problem. When I open the GUI window once, everything works. After closing the window, when I try to open the same GUI window again, I get a long traceback.
Traceback (most recent call last):
  File "D:\Dan\Python\xarphus\xarphus\frm_mdi.py", line 359, in create_update_form
    self.update_form = Update_Window(self)
  File "D:\Dan\Python\xarphus\xarphus\frm_update.py", line 135, in __init__
    self.ui_update = loadUi(UI_PATH, self)
  File "C:\Python27\lib\site-packages\PyQt4\uic\__init__.py", line 238, in loadUi
    return DynamicUILoader(package).loadUi(uifile, baseinstance, resource_suffix)
  File "C:\Python27\lib\site-packages\PyQt4\uic\Loader\loader.py", line 71, in loadUi
    return self.parse(filename, resource_suffix, basedir)
  File "C:\Python27\lib\site-packages\PyQt4\uic\uiparser.py", line 984, in parse
    document = parse(filename)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 657, in parse
    self._root = parser.close()
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1654, in close
    self._raiseerror(v)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
Can anyone help me?
Maybe I have a solution, but I don't know whether it is perfectly Pythonic.
As we know, a Python package has an __init__.py file, which is needed to initialize the package. So I thought: why not use this file? In __init__.py I define a function like so:
#!/usr/bin/env python
#-*- coding:utf-8 -*-
from PyQt4.uic import loadUi
from PyQt4.QtCore import Qt, QFile
def ui_load_about(self):
    uiFile = QFile(":/ui_file/about.ui")
    uiFile.open(QFile.ReadOnly)
    self.ui_about = loadUi(uiFile)
    uiFile.close()
    return self.ui_about
Now in my "About_Window"-class I do this:
#!/usr/bin/env python
#-*- coding:utf-8 -*-
import os
import sys
from PyQt4.QtCore import Qt, QFile
from PyQt4.uic import loadUi
from PyQt4.QtGui import QDialog
import __init__ as ui_file
class About_Window(QDialog):
    def __init__(self, parent):
        QDialog.__init__(self, parent)
        self.ui_about = ui_file.ui_load_about(self)
As you can see, I import the package file (__init__.py) as ui_file, then call the function and store its return value in self.ui_about.
In my case I open the About_Window from a MainWindow (QMainWindow), and it looks like this:
def create_about_form(self):
    self.ui_about = About_Window(self)
    # When I call show() on the window, I get two windows.
    # The reason: the ui file is opened and loaded from the compiled
    # Qt resource file by the function defined in the __init__ module.
    # That function opens the resource file, reads the ui file,
    # closes it and returns the loaded ui.
    # That's why I have commented out this call.
    #self.ui_about.show()
As you can see, I commented out the show() call. It works without it; I only instantiate the About_Window class. I know this may not be the best solution, but it works: I can open the window again and again without a traceback.
If you have a better solution or idea, let me know :-)