Is it possible to create an automated Python script/macro for a series of mouse clicks? The goal is to open a webpage, click a button to open an upload-data window, and finally hit the save button to create a process. I am thinking of something equivalent to automated VBA macros, which are recorded as operations are performed on sheets.
In the past I have used the pyautogui package for this, but it requires hard-coding the coordinates for each mouse click and is therefore tedious to code.
Maybe try using Selenium with Python.
Check the docs and examples.
An easy example would be:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element(By.NAME, "q")  # the search box
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()
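For the upload-and-save flow described in the question, the same approach applies. Here is a minimal sketch; the URL and the element IDs (upload-button, file-input, save-button) are placeholders you would replace with the real locators of your page:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("http://example.com/your-page")  # placeholder URL

# Open the upload-data window (element IDs below are assumptions)
driver.find_element(By.ID, "upload-button").click()

# File inputs accept a path via send_keys instead of a click-through dialog
driver.find_element(By.ID, "file-input").send_keys("/path/to/data.csv")

# Finally hit the save button to create the process
driver.find_element(By.ID, "save-button").click()

driver.close()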
To download a file with Firefox, try:
from selenium import webdriver
from selenium.webdriver.common.by import By

# Configure Firefox to download without showing the download dialog
options = webdriver.FirefoxOptions()
options.set_preference('browser.download.folderList', 2)  # custom location
options.set_preference('browser.download.manager.showWhenStarting', False)
options.set_preference('browser.download.dir', '/tmp')
options.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')

browser = webdriver.Firefox(options=options)
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.find_element(By.ID, 'exportpt').click()
browser.find_element(By.ID, 'exporthlgt').click()
Another option would be to use Python's built-in webbrowser module.
Automate the Boring Stuff gives some good examples.
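As a very small illustration of the webbrowser route (note that webbrowser can only open a page, it cannot click anything on it, so Selenium is still needed for the upload and save steps):
import webbrowser

# Opens the page in the system's default browser; no further interaction is possible
webbrowser.open("http://www.python.org")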
I hope you can help me with this, as I'm kinda new to Scrapy and web scraping in general and not really sure how to proceed with this problem.
Essentially, I have 2 spiders:
audio_file_downloader_spider, which is going to:
Check whether a particular webpage passed to it contains an audio file URL (.mp3, .wav, etc.)
If an audio file URL is found on the webpage, download the audio file to the local machine.
If there is NO audio file URL found on the webpage, tell audio_recorder_spider to scrape the webpage instead.
audio_recorder_spider, which is going to use Selenium with chromedriver and will:
Press the audio player play button on the webpage
Record the playback stream to an mp3 file (this is definitely doable)
The problem I'm currently facing is: how do we do something like this with Scrapy?
Currently, I have both spiders ready and I can run the audio_file_downloader_spider with this command from the terminal (I'm using macOS):
scrapy crawl audio_file_downloader_spider \
-a url="<some_webpage_url>"
Now, I need to somehow tell Scrapy to execute audio_recorder_spider on the same webpage URL IF and ONLY IF there is no audio file URL detected on the webpage, so that audio_recorder_spider can record the audio playback on the webpage.
Now, I'm not too familiar with Scrapy just yet, but I did read their item pipeline documentation. One of the examples in the documentation shows code that automatically takes a screenshot of a URL using Splash and saves it as a PNG file with a custom name. The code can be seen below:
import hashlib
from urllib.parse import quote

import scrapy
from itemadapter import ItemAdapter
from scrapy.utils.defer import maybe_deferred_to_future


class ScreenshotPipeline:
    """Pipeline that uses Splash to render screenshot of
    every Scrapy item."""

    SPLASH_URL = "http://localhost:8050/render.png?url={}"

    async def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        encoded_item_url = quote(adapter["url"])
        screenshot_url = self.SPLASH_URL.format(encoded_item_url)
        request = scrapy.Request(screenshot_url)
        response = await maybe_deferred_to_future(
            spider.crawler.engine.download(request, spider)
        )

        if response.status != 200:
            # Error happened, return item.
            return item

        # Save screenshot to file, filename will be hash of url.
        url = adapter["url"]
        url_hash = hashlib.md5(url.encode("utf8")).hexdigest()
        filename = f"{url_hash}.png"
        with open(filename, "wb") as f:
            f.write(response.body)

        # Store filename in item.
        adapter["screenshot_filename"] = filename
        return item
So this got me thinking: would it be possible to do the same thing, but instead of using Splash to take a screenshot of the webpage, use Selenium to record the audio playback from the URL?
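To make the idea concrete, here is a rough sketch of what I imagine such a pipeline could look like; the play-button selector is an assumption, and the actual audio-recording step is left as a placeholder:
from itemadapter import ItemAdapter
from selenium import webdriver
from selenium.webdriver.common.by import By


class AudioRecorderPipeline:
    """Sketch: open the item's URL in Selenium and start playback."""

    def process_item(self, item, spider):
        adapter = ItemAdapter(item)
        driver = webdriver.Chrome()
        try:
            driver.get(adapter["url"])
            # Assumed selector for the play button; depends on the site
            driver.find_element(By.CSS_SELECTOR, "button.play").click()
            # TODO: record the playback stream to an mp3 file here
        finally:
            driver.quit()
        return item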
Any help would be greatly appreciated, thanks in advance!
I am trying to scrape this website: https://www.casablanca-bourse.com/bourseweb/Societe-Cote.aspx?codeValeur=12200 The problem is that I only want to extract data from the "indicateurs clès" tab, and I can't find a way to access it in the page source without clicking on it.
Indeed, I can't figure out the URL of this specific tab. I checked the page source and found that there's generated code that changes whenever I click on that tab.
Any suggestions?
Thanks in advance
The problem is that this website uses AJAX to get the table in the "Indicateurs Clès" tab, so it is requested from the server only when you click on the tab. To scrape the data, you should send the same request to the server. In other words, try to mimic the browser's behavior.
You can do it this way (for Chromium; for other browsers with DevTools it's pretty much similar):
Press F12 to open the DevTools.
Switch to the "Network" tab.
Select Fetch/XHR filter.
Click on the "Indicateurs Clès" tab on the page.
Inspect the new request(s) you see in the DevTools.
Once you find the request that returns the information you need ("Preview" and "Response"), right-click the request and select "Copy as cURL".
Go to https://curl.trillworks.com/
Select the programming language you're using for scraping
Paste the cURL to the left (into the "curl command" textarea).
Copy the code that appeared on the right and work with it. In some cases, you might need to inspect the request further and modify it.
In this particular case, the request data contains `__VIEWSTATE` and other info, which is used by the server to send only the data necessary to update the already existing table.
At the same time, you can omit everything but the __EVENTTARGET (the tab ID) and codeValeur. In such a case the server will return page XHTML, which includes the whole table. After that, you can parse that table and get all you need.
I don't know what tech stack you were initially going to use for scraping the website, but here is how you can get the tables with Python requests and BeautifulSoup4:
import requests
from bs4 import BeautifulSoup

params = (
    ('codeValeur', '12200'),
)

data = {
    '__EVENTTARGET': 'SocieteCotee1$LBFicheTech',
}

response = requests.post(
    'https://www.casablanca-bourse.com/bourseweb/Societe-Cote.aspx',
    params=params,
    data=data,
)

soup = BeautifulSoup(response.content, 'html.parser')

# Parse the XHTML to extract exactly the data you need
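As a rough sketch of that parsing step, continuing from the soup object above (the table markup is an assumption, so inspect the actual response to pick the right selector):
# Walk every table row and print the cell texts
for row in soup.find_all('tr'):
    cells = [cell.get_text(strip=True) for cell in row.find_all(['td', 'th'])]
    if cells:
        print(cells)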
I am trying to create a one-click button to import multiple tables from Oracle. The following is the code behind the On Click event of the button (with one table for now):
Private Sub Command0_Click()
    ' Delete the existing table first, if it is present
    If Not IsNull(DLookup("Name", "MSysObjects", "Name='FCR_LABOR_COST_SUMMARY1'")) Then
        DoCmd.DeleteObject acTable, "FCR_LABOR_COST_SUMMARY1"
    End If
    ' Re-import the table using the saved import specification
    DoCmd.RunSavedImportExport "Import-FCR_LABOR_COST_SUMMARY1"
End Sub
When running DoCmd.RunSavedImportExport, I encountered the error "Run-time error '31602': The specification with the specified index does not exist. Specify a different index. 'Import-FCR_LABOR_COST_SUMMARY1'."
The source table does not have any index on it, and there is no need for an index on the target table. It looks like Access is trying to enforce an index on the target table. Is there any way to turn this off? I'm new to Access and VBA, so please provide advice and directions on how to resolve this. Thanks.
To save a specification, follow this document:
Create an import or export specification
1. Start the import or export operation from Access. The import and export wizards are available on the External Data tab: the import wizards are in the Import & Link group, and the export wizards are in the Export group.
2. Follow the instructions in the wizard. After you click OK or Finish, and if Access successfully completes the operation, the Save Import Steps or Save Export Steps page appears in the wizard.
3. On the wizard page, click Save import steps or Save export steps to save the details of the operation as a specification. Access displays an additional set of controls (the Save Import Steps dialog box).
4. In the Save as box, type a name for the specification.
5. In the Description box, type a description to help you or other users identify the operation at a later time.
6. To create an Outlook task that reminds you when it is time to repeat this operation, click Create Outlook Task.
7. Click Save Import or Save Export to save the specification. Access creates and stores the specification in the current database.
8. If you clicked Create Outlook Task on either the Save Import Steps or Save Export Steps page of the wizard, an Outlook Task window appears. Fill in the details of the task and then click Save & Close.
If the saved import or export specification you choose for the Saved Import Export Name argument is deleted after the macro is created, Access displays the following error message when the macro is run:
The specification with the specified index does not exist. Specify a different index. 'specification name'.
From: https://support.office.com/en-us/article/runsavedimportexport-macro-action-41c366d8-524e-4c7e-847d-c2cf7abb2049
I've been at this one for a few days, and no matter what I try, I cannot get Scrapy to extract text that is in one element.
To spare you all the code, here are the important pieces. The setup grabs everything else off the page, just not this text.
from scrapy.selector import Selector
start_url = "https://www.tripadvisor.com/VacationRentalReview-g34416-d12428323-On_the_Beach_Wide_flat_beach_Sunsets_Gulf_view_Sharks_teeth_Shells_Fish-Manasota_Key_F.html"
#BASIC ITEM AND SPIDER YADA, SPARE YOU THE DETAILS
hxs = Selector(response)
response_css = response.css("body")
desc_data = hxs.xpath('//*[@id="DETAILS_TRUNC_TEXT"]//text()').extract()
desc_data2 = response_css.css('#DETAILS_TRUNC_TEXT::text').extract()
Both return empty lists. Yes, I found the XPath and CSS selector via Chrome, but the rest of my selectors work just fine, as I'm able to find other data on the site. Please help me figure out why this one isn't working.
To get that data you need to use a browser simulator like Selenium so that it can catch the dynamically generated content. You also need to add some delay so the webpage can load its content fully. This is how you can do it:
from selenium import webdriver
from scrapy import Selector
import time
driver = webdriver.Chrome()
URL = "https://www.tripadvisor.com/VacationRentalReview-g34416-d12428323-On_the_Beach_Wide_flat_beach_Sunsets_Gulf_view_Sharks_teeth_Shells_Fish-Manasota_Key_F.html"
driver.get(URL)
time.sleep(5) # Without this delay you won't get anything, because the page content takes some time to load.
sel = Selector(text=driver.page_source)
item = sel.css('#DETAILS_TRUNC_TEXT::text').extract() #It is working
item_ano = sel.xpath('//*[@id="DETAILS_TRUNC_TEXT"]//text()').extract() #It is also working
print(item, item_ano)
driver.quit()
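If the fixed sleep feels fragile, an explicit wait is a common alternative. A minimal sketch of the same idea, using the same URL and selector as above:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from scrapy import Selector

driver = webdriver.Chrome()
driver.get("https://www.tripadvisor.com/VacationRentalReview-g34416-d12428323-On_the_Beach_Wide_flat_beach_Sunsets_Gulf_view_Sharks_teeth_Shells_Fish-Manasota_Key_F.html")

# Wait up to 15 seconds for the description element to appear,
# instead of sleeping for a fixed amount of time.
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.ID, "DETAILS_TRUNC_TEXT"))
)

sel = Selector(text=driver.page_source)
print(sel.css('#DETAILS_TRUNC_TEXT::text').extract())
driver.quit()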
I tried your XPath and CSS in the Scrapy shell and also got nothing.
Then I used the view(response) command and found out the site is dynamic: the details under Overview don't show up in the raw response, which is why no matter what you try, you still get nothing.
Solutions: Try Selenium (check the solution that SIM provided in the last answer) or Splash.
Good Luck. :)
I wish to simulate a right click on a file. This is done by opening a Windows Explorer window and then right clicking on it.
The main issue is finding the location of the file in Windows Explorer. I am currently using Autoit v3.3.8.1.
My code's first line:
RunWait (EXPLORER.EXE /n,/e,/select,<filepath>)
The next step is the problem: finding the coordinates of the file.
After that, right-clicking at those coordinates (it seems to me at this time) is not a problem.
Some background:
OS: Windows 7 64-bit
Software Languages: C#, Autoit (for scripting)
The Autoit script is called by a code similar to that below:
Process p = new Process();
p.StartInfo.FileName = "AutoItScript.exe";
p.StartInfo.UseShellExecute = false;
p.Start();
The code is compiled into a console application which is run at startup. The AutoIt script runs as the Explorer window opens up.
It seems as though you are taking the wrong approach to the problem, so I'll answer what you are asking and what you should be asking.
First up though, that line of code is not valid, and is not what you want either. You want to automate the Explorer window, but RunWait waits for the program to finish before returning. Furthermore, the arguments need to be strings, so that code would never work.
Finding the item in explorer
The explorer window is just a listview, and so you can use normal listview messages to find the coordinates of an item. This is done most simply by AutoIt's GUIListView library:
#include<GUIListView.au3>
Local $filepath = "D:\test.txt"
Local $iPid = Run("explorer.exe /n,/e,/select," & $filepath)
ProcessWait($iPid)
Sleep(1000)
Local $hList = ControlGetHandle("[CLASS:CabinetWClass]", "", "[CLASS:SysListView32; INSTANCE:1]")
Local $aClient = WinGetPos($hList)
Local $aPos = _GUICtrlListView_GetItemPosition($hList, _GUICtrlListView_GetSelectedIndices($hList))
MouseClick("Right", $aClient[0] + $aPos[0] + 4, $aClient[1] + $aPos[1] + 4)
As has already been mentioned, sending the menu key is definitely a better way than having to move the mouse.
Executing a subitem directly
This is how it should be done. Ideally you should never need an explorer window open at all, and everything can be automated in the background. This should always be what you aim to achieve, as AutoIt is more than capable in most cases. It all depends on what item you want to click. If it is one of the first few items for opening the file in various programs, then it is as simple as either:
Using ShellExecute, setting the verb parameter to whatever it is you want to do.
Checking the registry to find the exact command line used by the program. For this you will need to look under HKCR\.ext, where ext is the file extension; the default value will be the name of another key in HKCR which has the actions and icon associated with the filetype. This is pretty well documented online, so google it (a small sketch of this lookup follows this list).
If the action is not one of the program actions (so it is built into Explorer), then it is a little more complex. Usually the best way is to look at Task Manager when you start the program and see what it runs. Other things can be found online, for example (un)zipping. Actions like copy, delete, rename, create shortcut, and send to can all be done directly from AutoIt with the various File* functions.
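For illustration only, here is a minimal Python sketch of that registry lookup; the .txt extension and the shell\open\command path are example values, and on a real machine you should verify the keys exist before relying on them:
import winreg

ext = ".txt"  # example extension

# The default value of HKCR\.ext names the ProgID key (e.g. "txtfile")
with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, ext) as key:
    progid, _ = winreg.QueryValueEx(key, "")

# That key's shell\open\command default value is the command line used to open the file
with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, progid + r"\shell\open\command") as key:
    command, _ = winreg.QueryValueEx(key, "")

print(command)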
With more information, it would be possible to give you more specific help.
First, you might want to look at the Microsoft Active Accessibility SDK. In particular look at this interface...
http://msdn.microsoft.com/en-us/library/accessibility.iaccessible.aspx
You can use this to walk the items in the control and find the one with the file name you are looking for and its screen location.
From there, maybe try something like this for simulating the right click.
How can I use automation to right-click with a mouse in Windows 7?
Once you have done the right click, use accessibility again to find the right option on the context menu.
Maybe there's an easier way, but you should be able to cobble something together like this if you don't find one. Good luck!
Suppose I have a file named test.txt on the D drive, and I need to right-click it to open its context menu. To do this, the following code should work:
Local $filepath = "D:\test.txt"
Local $iPid = Run("explorer.exe /n,/e,/select," & $filepath)
ProcessWait($iPid)
Sleep(1000)
Send('+{F10}') ; Shift+F10 opens the context menu for the selected file