I want to download data from a national public dataset on vaccinations in Italy.
https://app.powerbi.com/view?r=eyJrIjoiMzg4YmI5NDQtZDM5ZC00ZTIyLTgxN2MtOTBkMWM4MTUyYTg0IiwidCI6ImFmZDBhNzVjLTg2NzEtNGNjZS05MDYxLTJjYTBkOTJlNDIyZiIsImMiOjh9
In particular, I am interested in downloading the last table.
I tried scraping the HTML, but the data does not seem to be stored directly in the page source.
So I tried the code below in Python 3.9:
import pytest
import time
import json
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import pyautogui
import pyperclip
import win32api
import win32gui
from win32con import *
driver = webdriver.Chrome()
LINK = ".tableEx .bodyCells div:nth-child(1) > .pivotTableCellWrap:nth-child("
TO_ADD ="1)"
data = []
action = ActionChains(driver)
driver.get("https://app.powerbi.com/view?r=eyJrIjoiMzg4YmI5NDQtZDM5ZC00ZTIyLTgxN2MtOTBkMWM4MTUyYTg0IiwidCI6ImFmZDBhNzVjLTg2NzEtNGNjZS05MDYxLTJjYTBkOTJlNDIyZiIsImMiOjh9")
driver.set_window_size(784, 835)
driver.execute_script("window.scrollTo(0,0)")
driver.execute_script("window.scrollTo(0,0)")
for i in range(1,293):
    print(i)
    if i%10==0:
        win32api.SetCursorPos((1625,724))
        win32api.mouse_event(MOUSEEVENTF_WHEEL, x, y, -3, 0)
        time.sleep(6)
    else:
        pass
    action = ActionChains(driver)
    TO_ADD = str(i)+")"
    action.move_to_element(driver.find_element(By.CSS_SELECTOR, LINK+TO_ADD)).perform()
    action.context_click().send_keys(Keys.ARROW_DOWN).send_keys(Keys.ARROW_DOWN).perform()
    time.sleep(0.5)
    x,y = pyautogui.locateCenterOnScreen(r'C:\Users\migli\Documents\Learning\copy.png')
    pyautogui.moveTo(x,y,0.1)
    pyautogui.click()
    time.sleep(0.5)
    x,y = pyautogui.locateCenterOnScreen(r'C:\Users\migli\Documents\Learning\copiaselez.png')
    pyautogui.moveTo(x,y,0.1)
    pyautogui.click()
    data.append(pyperclip.paste())
Images used for clicking: copiaselez.png and copy.png.
This seems to achieve what I am trying to do, but it gets stuck around the 14th iteration and I don't know why. Maybe I need to scroll down the page somehow. I tried scrolling manually while the code was running, by inserting a sleep around the 10th iteration, but that raises an error as well.
I also looked for an API, but there does not seem to be one.
Any ideas are welcome.
I want to pass, as an argument, the name of an object from the .ui file whose name contains a specific string.
I created pickers in Qt Designer and imported them into Maya 2022.
I assigned a command to each button, but then realized I would need a huge number of commands.
This is the current code.
from PySide2 import QtWidgets
from PySide2 import QtGui
from PySide2 import QtCore
from PySide2.QtUiTools import QUiLoader
from maya.app.general.mayaMixin import MayaQWidgetBaseMixin
from maya import cmds
import shiboken2 as shiboken
UIFILEPATH = 'D:/MAYA/pyside_pick/ui/PicsTest5.ui'
class MainWindow(MayaQWidgetBaseMixin,QtWidgets.QMainWindow):
    def __init__(self,parent=None):
        super(MainWindow,self).__init__(parent)
        self.UI = QUiLoader().load(UIFILEPATH)
        self.setWindowTitle(self.UI.windowTitle())
        self.setCentralWidget(self.UI)
        #PushButton
        self.UI.pushButton_sphere.clicked.connect(self.PushedCmd)
    #Command
    def PushedCmd(self):
        bTEXT = str(self.UI.pushButton_sphere.text())
        cmds.select('pSphere1')
        print(bTEXT)
def main():
    window = MainWindow()
    window.show()
if __name__ == '__main__':
    main()
If I give it an object name explicitly, as above, it works.
But some commands need to apply only to objects whose names contain "pushButton_".
I tried this:
button1 = self.findChild(QtWidgets.QPushButton, 'pushButton_*')
self.button1.clicked.connect(self.testPrint)
def testPrint(self):
    print(self.button1)
I meant to define button1 as a QPushButton whose name contains 'pushButton_' and print its name when clicked.
Unfortunately, I learned that an asterisk cannot be used as a wildcard in the search.
Then I tried this:
button1 = self.findChild(QtWidgets.QPushButton, 'pushButton_sphere')
self.button1.clicked.connect(self.testPrint)
def testPrint(self):
    print(self.button1)
The output was "(PySide2.QtWidgets.QPushButton) already deleted".
This is probably a basic question, but as a Japanese speaker I could not find a workable solution.
Please tell me how to print the object name when I press the button.
Also, please tell me if my notation is wrong.
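One possible approach, sketched below under stated assumptions: list all child buttons with findChildren, keep those whose objectName() starts with the prefix, and bind each button's name to its own callback with functools.partial. The helper names connect_prefixed_buttons and _forward_name are hypothetical, and the Fake* classes are stand-ins written only so the sketch runs outside Maya; they are not part of PySide2.

```python
from functools import partial


def connect_prefixed_buttons(root, button_type, handler, prefix="pushButton_"):
    """Connect handler(name) to every child button whose objectName starts with prefix."""
    connected = []
    for button in root.findChildren(button_type):
        name = button.objectName()
        if name.startswith(prefix):
            # partial freezes this button's name for its own callback.
            button.clicked.connect(partial(_forward_name, handler, name))
            connected.append(name)
    return connected


def _forward_name(handler, name, *_signal_args):
    # Qt's clicked signal may pass a 'checked' flag; it is ignored here.
    handler(name)


# --- Stand-ins so the sketch runs outside Maya (not part of PySide2) ---
class FakeSignal:
    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        for slot in self._slots:
            slot(*args)


class FakeButton:
    def __init__(self, name):
        self._name = name
        self.clicked = FakeSignal()

    def objectName(self):
        return self._name


class FakeRoot:
    def __init__(self, children):
        self._children = children

    def findChildren(self, child_type):
        return [c for c in self._children if isinstance(c, child_type)]


buttons = [
    FakeButton("pushButton_sphere"),
    FakeButton("pushButton_cube"),
    FakeButton("closeButton"),
]
clicked_names = []
connected = connect_prefixed_buttons(FakeRoot(buttons), FakeButton, clicked_names.append)
buttons[1].clicked.emit(False)  # simulate a click on the second button
print(connected, clicked_names)
```

In the real window, the call would presumably be something like connect_prefixed_buttons(self.UI, QtWidgets.QPushButton, self.testPrint) inside __init__, with testPrint taking the button name as its argument. This has not been tested inside Maya.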
//a[contains(@class,'inprogress')] - selects active matches
//span[contains(@itemprop,'name')] - selects all matches
How do I select only the matches that aren't active? (The active ones are colored red.)
https://www.fudbal91.com/previews/2022-03-30
You can use not(), like this:
//a[not(contains(@class,'inprogress'))]
And if you want to use both, combine them:
//a[not(contains(@class,'inprogress'))]//span[contains(@itemprop,'name')]
from selenium import webdriver
from selenium.webdriver.common.by import By
#from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.firefox import GeckoDriverManager
import time
url = 'https://www.fudbal91.com/previews/2022-03-30'
#driver = webdriver.Chrome(executable_path=ChromeDriverManager().install())
driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())
driver.get(url)
time.sleep(2)
all_items = driver.find_elements(By.XPATH, '//a[not(contains(@class,"inprogress"))]//span[contains(@itemprop,"name")]')
print('len(all_items):', len(all_items))
for item in all_items:
    print(item.text)
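As a browser-free illustration of what the combined expression selects, here is a minimal sketch using only the standard library. The markup is an invented stand-in for the page (its real structure is an assumption). Because xml.etree.ElementTree does not support contains() or not(), the two predicates are emulated in plain Python rather than evaluated as XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page's structure may differ.
SAMPLE = """
<div>
  <a class="match inprogress"><span itemprop="name">Team A - Team B</span></a>
  <a class="match"><span itemprop="name">Team C - Team D</span></a>
  <a class="match"><span itemprop="name">Team E - Team F</span></a>
</div>
"""

root = ET.fromstring(SAMPLE.strip())

inactive_names = []
for anchor in root.iter("a"):
    # Emulates not(contains(@class, 'inprogress'))
    if "inprogress" in anchor.get("class", ""):
        continue
    # Emulates //span[contains(@itemprop, 'name')] below this anchor
    for span in anchor.iter("span"):
        if "name" in span.get("itemprop", ""):
            inactive_names.append(span.text)

print(inactive_names)  # ['Team C - Team D', 'Team E - Team F']
```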
I am trying to scrape the price data from this website: https://fuelkaki.sg/home
However, the data does not appear to be in the HTML code of the page. Upon inspecting, the data seems to be nested in the tag, for instance under Caltex for the retailer name, and similarly under multiple nested tags for the price data, which I am unable to scrape with the following code (there are no results to be found).
Any help would be much appreciated.
import requests
from bs4 import BeautifulSoup
URL = 'https://fuelkaki.sg/home'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find('div', class_='fuel-name')
The table is rendered by JavaScript, so BeautifulSoup won't see it in the HTML that requests downloads.
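To see why a plain HTML parse comes back empty, consider the small sketch below. The HTML string is a made-up example of what a JavaScript-rendered page often looks like before any script has run; it is not the actual response from fuelkaki.sg. ClassCounter is a hypothetical helper built on the standard library's html.parser.

```python
from html.parser import HTMLParser

# Invented example of a server response for a JavaScript-rendered page:
# an empty mount point plus a script that would fill it in later.
STATIC_HTML = """
<html>
  <body>
    <app-root></app-root>
    <script src="main.js"></script>
  </body>
</html>
"""


class ClassCounter(HTMLParser):
    """Counts start tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


counter = ClassCounter("fuel-name")
counter.feed(STATIC_HTML)
print(counter.count)  # 0: the element only exists after the script runs
```

A browser (or a browser driver such as Selenium) runs the script and builds the table, which is why the rendered page shows data that the raw response lacks.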
Here's how I'd do it:
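import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
options = Options()
options.headless = False
driver = webdriver.Chrome(options=options)
url = "https://fuelkaki.sg/home"
driver.get(url)
time.sleep(3)  # give the JavaScript time to render the table
# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
element = driver.find_element(By.XPATH, '//*[@class="table"]')
print(element.text)
driver.close()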
Output:
Diesel
92
95
98
Others
(V-Power, etc)
Caltex
30 September 2020, 02:05pm
S$ 1.73
S$ 2.02
S$ 2.06
N.A.
S$ 2.61
and so on...
EDIT:
If you want the table in a DataFrame, try this:
import time
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.headless = False
driver = webdriver.Chrome(options=options)
url = "https://fuelkaki.sg/home"
driver.get(url)
time.sleep(3)
html = driver.page_source
soup = BeautifulSoup(html, "html.parser").select_one(".table")
df = pd.read_html(str(soup))
df = pd.concat(df).rename(columns={"Unnamed: 0": ""})
df.to_csv("fuel_data.csv", index=False)
driver.close()
This writes a .csv file with the table's data.
I am trying to extract some information from the HTML of a web page.
But neither the regex approach nor the list-comprehension approach works.
At http://bitly.kr/RWz5x, there is a key called encparam enclosed in getjason inside a JavaScript tag, which is the 49th of all the script elements on the page.
Thank you for your help in advance.
sam = requests.get('http://bitly.kr/RWz5x')
#html = sam.text
html=sam.content
soup = BeautifulSoup(html, 'html.parser')
scripts = soup.find_all('script')
#your_script = [script for script in scripts if 'encparam' in str(script)][0]
#print(your_script)
#print(scripts)
pattern = re.compile("(\w+): '(.*?)'")
fields = dict(re.findall(pattern, scripts.text))
Send your request to the following URL, which you can find in the browser's Sources tab:
import requests
from bs4 import BeautifulSoup as bs
import re
res = requests.get("https://navercomp.wisereport.co.kr/v2/company/c1010001.aspx?cmp_cd=005930")
soup = bs(res.content, 'lxml')
r = re.compile(r"encparam: '(.*)'")
data = soup.find('script', text=r).text
encparam = r.findall(data)[0]
print(encparam)
It is likely you can avoid bs4 altogether:
import requests
import re
r = requests.get("https://navercomp.wisereport.co.kr/v2/company/c1010001.aspx?cmp_cd=005930")
p = re.compile(r"encparam: '(.*)'")
encparam = p.findall(r.text)[0]
print(encparam)
If you actually want the encparam part in the string:
import requests
import re
r = requests.get("https://navercomp.wisereport.co.kr/v2/company/c1010001.aspx?cmp_cd=005930")
p = re.compile(r"(encparam: '\w+')")
encparam = p.findall(r.text)[0]
print(encparam)
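The difference between the two patterns can be checked offline on a sample string. This is a minimal sketch: SCRIPT_TEXT is an invented snippet standing in for the page's script, and the value 'abc123XYZ' is a placeholder, not the real encparam. The last two lines show one caveat of the greedy (.*): if another quoted value follows on the same line, it captures too much, and a non-greedy (.*?) avoids that.

```python
import re

# Invented snippet standing in for the page's script; the real value differs.
SCRIPT_TEXT = """
$.ajax({
    url: 'example.aspx',
    data: {
        cmp_cd: '005930',
        encparam: 'abc123XYZ'
    }
});
"""

value_only = re.compile(r"encparam: '(.*)'")
with_key = re.compile(r"(encparam: '\w+')")

value = value_only.findall(SCRIPT_TEXT)[0]
key_and_value = with_key.findall(SCRIPT_TEXT)[0]
print(value)          # abc123XYZ
print(key_and_value)  # encparam: 'abc123XYZ'

# Caveat: greedy (.*) runs to the LAST quote on the line.
ONE_LINE = "encparam: 'abc', id: 'x'"
greedy = re.findall(r"encparam: '(.*)'", ONE_LINE)[0]
non_greedy = re.findall(r"encparam: '(.*?)'", ONE_LINE)[0]
print(greedy)      # abc', id: 'x
print(non_greedy)  # abc
```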
I am trying to get the movie names from the https://www.sunnxt.com/movie/inside/ website. When I inspect the page, it shows elements with the movie names, but when I run a CSS selector or XPath expression, it does not return the movie names.
When I click on view source, I see that the code is different there, and all the movie data is placed between <script></script> tags.
Please help me get all the movie names.
To retrieve the movie names, you have to use WebDriverWait in conjunction with the expected condition visibility_of_all_elements_located, as follows:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver=webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get("https://www.sunnxt.com/movie/inside/")
movieList = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//h2[@class='title']//following::div[@class='home_movie_list_wrap']//a//h2")))
for item in movieList:
    print(item.text)
driver.quit()
Console Output :
Gulaebaghavali
KALAKALAPPU 2
Motta Shiva Ketta Shiva
Annadurai
Aramm
Kaththi Sandai
Meesaya Murukku
Spyder
Sathriyan
Bogan
Brindavanam
Vivegam
Bairavaa
Karuppan
Muthina Kathirika
Dharmadurai
Thozha
Pichaikkaran
Devi
Aranmanai 2
Jackson Durai
Hello Naan Pei Pesuren
Kathakali
Kodi
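The idea behind an explicit wait can be sketched without a browser: poll a condition until it returns something truthy or a timeout expires. The code below is a simplified illustration, not Selenium's actual implementation (WebDriverWait also ignores certain exceptions such as NoSuchElementException by default). The names wait_until and fake_find_titles are hypothetical, and the fake finder simply pretends the elements show up on the third poll.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Stand-in for a page whose elements appear only after a few polls.
attempts = {"count": 0}


def fake_find_titles():
    attempts["count"] += 1
    return ["Movie A", "Movie B"] if attempts["count"] >= 3 else []


titles = wait_until(fake_find_titles, timeout=2.0, poll_interval=0.01)
print(titles, attempts["count"])  # ['Movie A', 'Movie B'] 3
```

Compared with a fixed time.sleep, this returns as soon as the content is there and fails loudly if it never appears.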