How to web-scrape and find a form element

Trying to scrape the following site: https://israeldrugs.health.gov.il/#!/byDrug
You need to enter a search term in the form and press the blue button on the left.
However, my attempt with bs4 failed because it cannot find the form element.
Thanks for your help.
The previous answer to the same question does not work anymore.

If you go to view-source:https://israeldrugs.health.gov.il/#!/byDrug in your browser you can see the initial HTML which you receive when you make a request to the page. This is most likely the HTML you are working on with bs4.
It seems like the form element is only inserted by JavaScript after the page loads.
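A quick way to confirm this without a browser is to count the `<form>` tags in the HTML you actually fetched. The sketch below uses only Python's standard `html.parser`. The two HTML strings are hypothetical stand-ins (not the real site's markup) for a JavaScript app shell as first served and for the same page after its scripts have run.

```python
from html.parser import HTMLParser


class FormCounter(HTMLParser):
    """Counts <form> start tags in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.forms = 0

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.forms += 1


def count_forms(html: str) -> int:
    parser = FormCounter()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical markup: an app shell as served (form not there yet) ...
initial_html = """<html><body><div id="app"></div>
<script src="app.js"></script></body></html>"""
# ... and the DOM after JavaScript has inserted the search form.
rendered_html = """<html><body><div id="app">
<form><input type="text"><button>Search</button></form>
</div></body></html>"""

print(count_forms(initial_html))   # 0
print(count_forms(rendered_html))  # 1
```

If the count is 0 for what `requests` returns but the form is visible in the browser's inspector, you are in the situation described here.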
What you can do is use a tool like Selenium. It is designed to interact with a webpage like a user would, so you literally tell it to open a page, find the form element, input something and then press that button.
Once you have installed Selenium, your code will look something like:
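import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
# Selenium 4.6+ locates a matching chromedriver itself; executable_path and find_elements_by_* were removed in Selenium 4
driver = webdriver.Chrome()
driver.get("https://israeldrugs.health.gov.il/#!/byDrug")
# wait (up to 10 s) for JavaScript to insert the form instead of sleeping a fixed time
form = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "form")))
# assumption: the first <input> inside the form is the search box (a <form> itself cannot receive keystrokes)
search_box = form.find_element(By.CSS_SELECTOR, "input")
search_box.send_keys("Your search term")
search_box.send_keys(Keys.ENTER)  # it seems you don't have to press the button if you send ENTER
time.sleep(1)  # time to load the results
# get the new HTML of the page after it was updated with the search results
# from here you can work with bs4 again (though you can do all the things you do in bs4 also with Selenium)
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
driver.quit()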
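The fixed `time.sleep` calls are guesses about how long loading takes. Selenium's `WebDriverWait` instead polls a condition until it holds or a timeout expires. The stdlib-only sketch below shows that polling idea; the helper name `wait_until` is hypothetical and is not part of Selenium's API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0,
               interval: float = 0.25) -> T:
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Demo with a fake condition that only becomes true on the third call.
calls = {"n": 0}


def results_loaded():
    calls["n"] += 1
    return "results" if calls["n"] >= 3 else None


print(wait_until(results_loaded, timeout=2, interval=0.01))  # results
```

With a real driver you could call, for example, `wait_until(lambda: driver.find_elements(By.CSS_SELECTOR, "form"))`, since an empty result list counts as false. In practice `WebDriverWait(driver, 10).until(...)` does the same job and is the usual choice.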

Related

#document iFrame access using Playwright

How can I access the contents of the virtual #document using Playwright? I tried using iFrame and Page locators, but I am unable to reach the document.
Is there an option in Playwright to approach this?
This is the page URL - https://sites.google.com/view/pinnednote/home
Yes, you have frame_locator in Python or frameLocator in JavaScript.
Here is an example for your page (using Python):
# Import needed libs
from playwright.sync_api import sync_playwright
# We start Playwright and open a page
p = sync_playwright().start()
browser = p.chromium.launch(headless=False)
context = browser.new_context()
page = context.new_page()
# Navigate
page.goto("https://sites.google.com/view/pinnednote/home")
# We get the first iframe (XPath attribute tests use @, not #)
iframe1 = page.frame_locator("//iframe[@jsname='WMhH6e']")
# We get the iframe inside the first iframe
iframe2 = iframe1.frame_locator("#innerFrame")
# We get the iframe inside the second iframe
iframe3 = iframe2.frame_locator("#userHtmlFrame")
# We print the title of this third iframe
print(iframe3.locator("//title").inner_text())
# Clean up
browser.close()
p.stop()
To be honest, the page has a lot of nested iframes.
For more about managing frames with Playwright, see: Playwright frameLocator
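Because the iframes are nested several levels deep, the chain of frame_locator calls can be wrapped in a small loop. This is a sketch, not a Playwright API: the helper name `nested_frame` is hypothetical, and it relies only on the fact that both Playwright's `Page` and `FrameLocator` expose `frame_locator(selector)`.

```python
from typing import Any, Iterable


def nested_frame(root: Any, selectors: Iterable[str]) -> Any:
    """Drill into nested iframes by chaining frame_locator() calls.

    `root` is expected to be a Playwright Page or FrameLocator; both expose
    frame_locator(selector), so each selector goes one iframe level deeper.
    """
    locator = root
    for selector in selectors:
        locator = locator.frame_locator(selector)
    return locator


# Usage with the page from the example above (requires Playwright):
# inner = nested_frame(page, ["//iframe[@jsname='WMhH6e']",
#                             "#innerFrame", "#userHtmlFrame"])
# print(inner.locator("//title").inner_text())
```

Keeping the selectors in a list makes it easy to adjust the path when Google Sites changes its frame structure.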

Print Friendly Page

So I would like to be able to have a print button for entries in our database so users can print an entry via a print friendly "form".
My thought was to create a separate page, add labels and have those labels pull the relevant information.
I know I can pass the currently open entry to the print page via this code:
app.datasources.ModelName.selectKey(widget.datasource.item._key);
app.showPage(app.pages.TestPrint);
But I'm running into a few problems:
I can't get the page to open in a new window. Is this possible?
window.open(app.pages.TestPrint);
Just gives me a blank page. Does the browser lose the widget source once the new window opens?
I can't get the print option (either onClick or onDataLoad) to print JUST the image (or widget). I run
window.print();
And it includes headers + scroll bars. Do I need to be running a client side script instead?
Any help would be appreciated. Thank you!
To get exactly what you want, you'd have to do a lot of work.
Here is my suggested, simpler answer:
Don't open up a new tab. If you use showPage like you mention, and provide a "back" button on the page to go back to where you were, you'll get pretty much everything you need. If you don't want the back button to show up when you print, you can setVisibility(false) on the button before you print, then print, then setVisibility(true).
I'll give a quick summary of how you could do this with a new tab, but it's pretty involved, so I can't go into details without trying it myself. The basic idea is that you want to open the page with a full URL, just as if a user were navigating to it.
You can use #TestPrint to indicate which page you want to load. You also need the URL of your application, which as far as I can remember is only available in a server-side script using the Apps Script method: ScriptApp.getService().getUrl(). On top of this, you'll probably need to pass in the key so that your page knows what data to load.
So given this, you need to assemble a url by calling a server script, then appending the key property to it. In the end you want a url something like:
https://www.script.google.com/yourappaddress#TestPage?key=keyOfYourModel.
Then on TestPage you need to read the key, and load data for that key. (You can read the key using google.script.url).
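The URL shape described above can be illustrated with a short sketch. Python is used here only to show the string handling; in App Maker you would build the URL in Apps Script and read it back with google.script.url. The function names are hypothetical, the base URL is the placeholder from above, and percent-encoding the key is an added precaution in case keys contain reserved characters such as `/` or `#`.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, quote


def build_print_url(app_url: str, page_name: str, key: str) -> str:
    """Assemble <app_url>#<page_name>?key=<key>, percent-encoding the key."""
    return f"{app_url}#{page_name}?key={quote(key, safe='')}"


def parse_print_fragment(url: str) -> Tuple[str, Optional[str]]:
    """Split '#<page_name>?key=<key>' back into (page_name, key)."""
    _, _, fragment = url.partition("#")
    page_name, _, query = fragment.partition("?")
    values = parse_qs(query).get("key")
    return page_name, (values[0] if values else None)


url = build_print_url("https://www.script.google.com/yourappaddress",
                      "TestPage", "keyOfYourModel")
print(url)  # https://www.script.google.com/yourappaddress#TestPage?key=keyOfYourModel
print(parse_print_fragment(url))  # ('TestPage', 'keyOfYourModel')
```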
Alternatively, I think there are some tricks you can play by opening a blank window and then writing directly to its DOM, but I've never tried that, and since Apps Script runs inside an iframe I'm not sure if it's possible. If I get a chance I'll play with it and update this answer, but for your own reference you could look here: create html page and print to new tab in javascript
I'm imagining something like that, except that your page would write its own HTML content. Something like:
var winPrint = window.open('', '_blank', 'left=0,top=0,width=800,height=600,toolbar=0,scrollbars=0,status=0');
winPrint.document.write(app.pages.TestPage.getElement().innerHTML);
winPrint.document.close();
winPrint.focus();
winPrint.print();
winPrint.close();
Hope one of those three options helps :)
So here is what I ended up doing. It isn't elegant, but it works.
I added a Print Button to a Page Fragment that pops up when a user edits a database entry.
Database Edit Button code:
app.datasources.ModelName.selectKey(widget.datasource.item._key);
app.showDialog(app.pageFragments.FragmentName);
That Print Button goes to a different (full) Page and closes the Fragment.
Print Button Code:
app.datasources.ModelName.selectKey(widget.datasource.item._key);
app.showPage(app.pages.ModelName_Print);
app.closeDialog();
I made sure the new Print Page was small enough (728x975) for Chrome to fit it properly onto an 8.5 x 11" page.
I then created a Panel that fills the page and populated it with Labels bound to
@datasource.item.FieldName
I then put the following into the onDataLoad for the Panel
window.print();
So now when the user presses the Print Button in the Fragment they are taken to this new page and after the data loads they automatically get a print dialog.
The only downside is that after printing the user has to use a back button I added to return to the database page.
1. As far as I know, you cannot combine window.open with app.pages.*, because window.open requires at least a URL parameter, while app.pages.* is essentially an internal routing mechanism provided by App Maker: it returns a page object, suitable for switching between pages or opening dialogs.
2. You would probably need to style your page first, so that it includes only the things you would like to have printed. To do so, use @media print.
For example, if we have a button on the page and would like to hide it from the printed page:
@media print {
.app-NewPage-Button1 {
display: none;
}
}
Hope it helps.
1. Here is how it is done in a pop-up window, without messing up the current page (client script):
// named printWidget rather than print so it cannot shadow the browser's global print()
function printWidget(widget, title){
var content=widget.getElement().innerHTML;
var win = window.open('', 'printWindow', 'height=600,width=800');
win.document.write('<head><title>'+title+'</title></head>');
win.document.write('<body>'+content+'</body>');
win.document.close();
win.focus();
win.print();
win.close();
}
and the onclick handler for the button is:
printWidget(widget.root.descendants.PageFragment1, 'test');
In this example, PageFragment1 is a page fragment on the current page, hidden by adding a style named hidden with the definition .hidden{display:none;} (this is different from setting visible to false, which in App Maker seems to remove the item from the DOM). Works perfectly...
2. You cannot open pages from the app in another tab. In principle something like this would do it:
var w=window.parent.parent;
w.open(w.location.protocol+'//'+w.location.host+w.location.pathname+'#PrintPage', '_blank');
But since the app is running in a frame nested two levels deep inside the launching page, and with a different origin, you will not be able to access the URL that you need (the above code results in a cross-origin frame access error). So you would have to hard-code the URL, which changes at deployment, so it gets ugly very fast. Not that you would want to anyway: the load time of an app should discourage you from doing that.

selenium dynamic value - for textbox

I am in the middle of building a test case, where I came across this problem. My web page has a search text box. I have recorded the web page using Selenium IDE.
type | id=search_input_char_name_136 | myproduct // textbox for search
click | css=button.oe_button | - // search icon click
I got the above code by recording. In the "type" action an id value is shown for the text box, but when I use the same value while testing, an "id not found" error occurs. So I recorded the action again and found that the id is dynamic: it changes every time.
I have googled it and found that XPath could be the solution for this. I am very new to Selenium and XPath and couldn't figure out the solution, so please help me solve this: provide me the XPath syntax to use in place of the id of the search text box.
In Selenium IDE itself, after the recording is finished, try clicking the drop-down named Target in the IDE window for this text box. In that drop-down you can get various locators, like xpath, name, css, dom etc. Observe which one is static across all the recordings and use that locator. Hope it helps. Let me know if you are still stuck with this issue.
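If only the numeric suffix of the id changes, a locator that matches the stable prefix may work, for example `xpath=//input[starts-with(@id, 'search_input_char_name_')]` or `css=input[id^='search_input_char_name_']` as the IDE target. That the prefix stays constant is an assumption, since the question shows only one recorded id. Python's `xml.etree.ElementTree` supports only a subset of XPath without `starts-with()`, so the sketch below applies the same prefix test in plain Python on hypothetical markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the numeric suffix of the id changes on every load.
page = """<div>
  <input id="search_input_char_name_136" type="text"/>
  <input id="other_field_7" type="text"/>
  <button class="oe_button">Search</button>
</div>"""

PREFIX = "search_input_char_name_"  # assumed to stay constant


def find_by_id_prefix(root: ET.Element, tag: str, prefix: str):
    """Python equivalent of //tag[starts-with(@id, prefix)]."""
    return [el for el in root.iter(tag) if el.get("id", "").startswith(prefix)]


root = ET.fromstring(page)
matches = find_by_id_prefix(root, "input", PREFIX)
print([el.get("id") for el in matches])  # ['search_input_char_name_136']
```

If several inputs share the prefix, combine it with another attribute (such as `@type` or `@name`) so the locator stays unique.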

Selenium and iframe

I have an iframe that gets loaded when I click on a tab on a page. When I use Firebug to look at the iframe on IE8, all I see is:
iframe id=tabContextFrame class=contextFrame contentEditable=inherit src=/xyz.dt?forward=show&layouttype=NoHeader&runid=1234 name=tabContextFrame url=/xyz.dt?forward=show&layouttype=NoHeader&runid=1234 scrolling=auto
and that's it. The hierarchy below the iframe can't be seen. I want to click on a link within the iframe. To find the elements within the iframe, I did a selenium.click("on the tab that loads the iframe") and then selenium.getHtmlSource(). From this source, I can at least locate my link of interest. I did a selenium.click("//span[text()='Link']"), but it doesn't seem to do anything. Any ideas, please?
Here is the code:
selenium.click("//span[text()='tab that loads iframe']");
Thread.sleep(5000);
selenium.selectFrame("tabContextFrame");
selenium.mouseOver("//span[text()='Link']");
selenium.mouseDown("//span[text()='Link']");
selenium.mouseUp("//span[text()='Link']");
Thread.sleep(5000);
selenium.selectFrame("null");
I'm guessing you are using Selenium 1.0. Have you looked at Selenium 2.0 and WebDriver? I found the following, and it worked for me:
Q: How do I type into a contentEditable iframe? A: Assuming that the iframe is named "foo":
driver.switchTo().frame("foo");
WebElement editable = driver.switchTo().activeElement();
editable.sendKeys("Your text here");
Sometimes this doesn't work, and this is because the iframe doesn't have any content. On Firefox you can execute the following before "sendKeys":
((JavascriptExecutor) driver).executeScript("document.body.innerHTML = '<br>'");
This is needed because the iframe has no content by default: there's nothing to send keyboard input to. This method call inserts an empty tag, which sets everything up nicely.
Remember to switch out of the frame once you're done (as all further interactions will be with this specific frame):
driver.switchTo().defaultContent();
I found this on http://code.google.com/p/selenium/wiki/FrequentlyAskedQuestions
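The "remember to switch out of the frame" advice can be made automatic. Below is a sketch using Selenium's Python bindings, which expose the same operations as `driver.switch_to.frame(...)` and `driver.switch_to.default_content()`; the context-manager name `in_frame` is hypothetical. The `finally` clause switches back even if a locator inside the block fails.

```python
from contextlib import contextmanager


@contextmanager
def in_frame(driver, frame_ref):
    """Switch into an iframe for the duration of a with-block.

    `frame_ref` can be anything switch_to.frame accepts (name/id, index or
    element). Always returns to the top-level document afterwards, even if
    the code inside the block raises.
    """
    driver.switch_to.frame(frame_ref)
    try:
        yield driver
    finally:
        driver.switch_to.default_content()


# Usage with a real driver (requires Selenium):
# from selenium.webdriver.common.by import By
# with in_frame(driver, "tabContextFrame"):
#     driver.find_element(By.XPATH, "//span[text()='Link']").click()
```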
Use driver.switchTo().defaultContent(); first, then do your operation.

WebDriver HtmlUnitDriver NoSuchElementException

I'm using Webdriver to test my web application. When I work with FireFoxDriver or ChromeDriver everything seems to be ok. When I work with HtmlUnitDriver though things start to go wrong.
Here is a sample code:
WebDriver driver = new HtmlUnitDriver();
driver.get("http://localhost:8099/");
WebElement loginButton = driver.findElement(By.xpath("//button[@type='button']"));
loginButton.click();
I've looked at the driver.getPageSource() result, and the source code presented there is very partial.
It doesn't show me all the elements; it is the same as clicking View Source on the page.
What I need from the driver is the entire source, like Firebug or Chrome Inspector gives me.
Any ideas on how I can retrieve it?
My app was written with GWT.
Thanks a million!
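Have you tried enabling JavaScript for HtmlUnitDriver? It is disabled by default; you can turn it on by constructing the driver with new HtmlUnitDriver(true). Without JavaScript, a GWT app's scripts never run, which would explain why the page source only shows the initial HTML.
I believe that HtmlUnitDriver emulates IE by default, and there are other questions related to clicking buttons with IE. Have you tried this?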
// Press enter on the button
loginButton.sendKeys("\n");
Also, have you tried adding an ID to the element and using that to find the button?
