I am trying to locate an element by XPath and click it, but the XPath keeps changing (it is dynamic), as shown below:
//*[@id="slipstream_action_bar_widget599"]/dl/dd[2]/span/span/svg
//*[@id="slipstream_action_bar_widget414"]/dl/dd[2]/span/span/svg
I need your help to build the XPath on the fly and pass it to the click.
Try the XPath below, which matches on the part of the id that does not change:
//*[contains(@id,"slipstream_action_bar_widget")]/dl/dd[2]/span/span/svg
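To illustrate why matching on the stable prefix works, here is a minimal Python sketch that uses only the standard library. The markup, the `header` id, and the helper name `find_by_id_prefix` are invented for illustration. ElementTree does not implement `contains()` or `starts-with()`, so the prefix test is done in Python rather than inside the XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the numeric suffix of the widget id changes per page load.
SAMPLE = """
<body>
  <div id="header"><dl><dd><span>ignore</span></dd></dl></div>
  <div id="slipstream_action_bar_widget599">
    <dl>
      <dd><span>first</span></dd>
      <dd><span>target</span></dd>
    </dl>
  </div>
</body>
"""


def find_by_id_prefix(root, prefix):
    """Return elements whose id starts with prefix.

    This mirrors what starts-with(@id, prefix) would select in full XPath.
    """
    return [el for el in root.iter() if el.get("id", "").startswith(prefix)]


root = ET.fromstring(SAMPLE)
widgets = find_by_id_prefix(root, "slipstream_action_bar_widget")

# ElementTree does support positional predicates, so dd[2] works here.
target = widgets[0].find("./dl/dd[2]/span")
print(widgets[0].get("id"), target.text)
```

In a real browser the trailing `svg` step may still fail to match, because SVG elements live in their own namespace. Assuming Selenium is the driver in use, `*[name()='svg']` is the usual workaround for that last step.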
I hope you are all doing well.
I am facing a problem while web scraping in R with the SelectorGadget tool. When I select the data with the tool on the Coursera website, the number of values it shows is correct (10), but when I copy that CSS selector into R and run it, I get 18 names in the list. Can anyone help me with this? Here is a screenshot of the SelectorGadget output:
And here is what gets returned in R when I scrape that css selector:
The rendered content seen in a browser is not exactly the same as what a plain HTTP request (which is what rvest makes) returns. This is because a browser can run JavaScript to update the content.
Inspect the page source by pressing Ctrl+U in the browser on that webpage.
You can rewrite your CSS selector to match the HTML that is actually returned. One example is as follows; it also removes the reliance on dynamic classes, which change more frequently and would break your program sooner.
library(rvest)
read_html("https://in.coursera.org/degrees/bachelors") |>
html_elements('[data-e2e="degree-list"] div[class] > p:first-child') |>
html_text2()
Learn about CSS selectors and operators here: https://developer.mozilla.org/en-US/docs/Learn/CSS/Building_blocks/Selectors
I'm new to Scrapy.
I am trying to get the link to the next page from this site: https://book24.ru/knigi-bestsellery/?section_id=1592
This is what the HTML looks like:
In scrapy shell I wrote this command:
response.css('li.pagination__button-item._next a::attr(href)')
It returns an empty list.
I have also tried
response.css('a.pagination__item._link._button._next.smartLink')
but it also returns an empty list.
I will be grateful for the help!
The page is generated with JavaScript; see how it looks with 'view(response)'. The next-page URL is still present in the <head> of the response:
# with css:
In [1]: response.css('head > link:nth-child(28) ::attr(href)').get()
Out[1]: 'https://book24.ru/knigi-bestsellery/page-2/'
# with xpath:
In [2]: response.xpath('//link[@rel="next"]/@href').get()
Out[2]: 'https://book24.ru/knigi-bestsellery/page-2/'
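Selecting by `rel="next"` is less brittle than `link:nth-child(28)`, which breaks as soon as the site adds or removes a tag in the head. As a rough sketch of the same idea outside Scrapy, the snippet below pulls the `rel="next"` link with the standard-library HTML parser. The HTML string, the `example.com` URL, and the class name `NextLinkParser` are made up for illustration.

```python
from html.parser import HTMLParser


class NextLinkParser(HTMLParser):
    """Remember the href of the first <link rel="next"> tag seen."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            tag == "link"
            and attributes.get("rel") == "next"
            and self.next_href is None
        ):
            self.next_href = attributes.get("href")


# Hypothetical response body; only the <head> matters here.
HTML = """
<html>
  <head>
    <link rel="stylesheet" href="/static/site.css">
    <link rel="canonical" href="https://example.com/catalog/">
    <link rel="next" href="https://example.com/catalog/page-2/">
  </head>
  <body></body>
</html>
"""

parser = NextLinkParser()
parser.feed(HTML)
print(parser.next_href)
```

The position of the tag inside the head does not matter here, which is the property that makes the attribute-based lookup more robust.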
I would like to add to @SuperUser's answer. Since the site loads its HTML via JavaScript, please read the documentation on how to handle JavaScript-rendered websites. scrapy-playwright is a recent library that I have found to be quite fast and easy to use when scraping JS-rendered sites.
I have the following HTML:
<a href="link.html">Text</a>
How can I get the "Text" value? I am trying the following, but I get an empty value:
=INDEX(importxml("http://www.remoteurl.com";"//a[@href='link.html']");1)
I tried using your syntax and it worked for me. I shortened it a little for testing purposes.
=importxml("https://www.remoteurl.com","//a[@href='link.html']")
Be sure that the href value you are passing in the xpath query is exactly what is present on the web page, e.g. if the web page uses a relative path then you must also use the same relative path.
I was doing it properly; the problem was that the markup was inside an iframe, so the query could not reach it.
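When the target markup sits inside an iframe, one common workaround is to find the iframe's `src` and query that URL directly. The sketch below shows only the discovery step, using the Python standard library. The page path `index.html`, the frame path `frames/content.html`, and the class name `IframeSrcParser` are assumptions made for the example, not details from the original page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Hypothetical outer page: the link we want is not in this document at all.
PAGE_URL = "http://www.remoteurl.com/index.html"
OUTER_HTML = """
<html>
  <body>
    <h1>Outer page</h1>
    <iframe src="frames/content.html"></iframe>
  </body>
</html>
"""

parser = IframeSrcParser()
parser.feed(OUTER_HTML)

# Resolve relative frame URLs against the page that embeds them.
frame_urls = [urljoin(PAGE_URL, src) for src in parser.sources]
print(frame_urls)
```

The resolved frame URL is then the address to query for the link, instead of the outer page.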
Hi All,
I have the following table with links that I need to select. In this specific example I need to select the DIY Payroll but sometimes this can change its position within the table. The current xpath is:
.//*[@id='catalog-category-div-1']/table/tbody/tr/td[1]/ul/li[4]/a
So I do a:
By.xpath(".//*[@id='catalog-category-div-1']/table/tbody/tr/td[1]/ul/li[4]/a").click()
The problem is that the link can change position: it can be in td[2] or td[3], and at any li[n] position.
Can I have Selenium go through the table and click the link based on its text? Will By.linkText() work here?
You can use the following code, which handles the dynamic changes.
You can use linkText() method as follows:
driver.findElement(By.linkText("DIY Payroll")).click();
If you want to use XPath, you can use the following code:
driver.findElement(By.xpath(".//a[contains(text(),'DIY Payroll')]")).click();
If you need any more clarification you are welcome :)
I would suggest that you try By.linkText() or By.partialLinkText(). It will locate an A tag that contains the desired text.
driver.findElement(By.linkText("DIY Payroll")).click();
A couple issues you might run into:
The link text may exist more than once on the page. In this case, find an element that's easy to find (e.g. by id) that is a parent of only the link you want and then search from that element.
driver.findElement(By.id("someId")).findElement(By.linkText("DIY Payroll")).click();
The A tag may contain extra spaces or other characters, be capitalized differently, etc. In these cases, you'll just have to try .partialLinkText() or use trial and error.
In some cases I've seen a link that isn't an A tag or contains additional tags inside. In this case, you're going to have to find another method to locate the text like XPath.
You should use a CSS selector for this case:
Can you try:
By.cssSelector("a.browse-catalog-categories-link")
You can use XPath to do this. //a will select all 'a' tags. The part inside of the square brackets will select everything with text "DIY Payroll". Combined together you get the desired solution.
//a[contains(text(),'DIY Payroll')]
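To show why a text-based match is independent of the td and li position, here is a small Python sketch using the standard library. The table markup, the other link labels, and the href values are invented for the example. ElementTree has no `contains()`, so the substring test is written in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: the wanted link sits in the second cell, second item,
# and its text has surrounding whitespace.
TABLE = """
<div id="catalog-category-div-1">
  <table><tbody><tr>
    <td><ul>
      <li><a href="/full">Full Service Payroll</a></li>
    </ul></td>
    <td><ul>
      <li><a href="/tax">Tax Forms</a></li>
      <li><a href="/diy"> DIY Payroll </a></li>
    </ul></td>
  </tr></tbody></table>
</div>
"""

root = ET.fromstring(TABLE)

# Same idea as //a[contains(text(), 'DIY Payroll')]: scan every <a>,
# wherever it is nested, and keep those whose text contains the label.
matches = [a for a in root.iter("a") if "DIY Payroll" in (a.text or "")]

print(len(matches), matches[0].get("href"))
```

Because the match is a substring test, it tolerates the extra whitespace around the label, much like `partialLinkText()` does in Selenium.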
I am trying to find the attribute value of the title attribute.
Now, I have a list of similar links on the same page and I want to select the first link and get its title attribute.
I have used the following selenium command:
self.se.get_attribute("css=a[href*='radio?rid=']:nth-of-type(1)@title")
But it is giving me an error.
Could someone please help me figure out the problem?
Thanks
You should use XPath syntax instead of CSS selectors. You didn't post any HTML to match, so here is a made-up example: to get the title of the first link found in a div with id myDiv, use the following:
self.se.get_attribute("xpath=//div[@id='myDiv']//a[1]@title")
Where:
//div[@id='myDiv'] matches any div with id "myDiv";
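//a[1] selects the first link in the previously selected div (use 2 for the second, and so on). Strictly, it matches every a that is the first a child of its own parent, so use (//div[@id='myDiv']//a)[1] if the links are spread over several parent elements;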
@title specifies the attribute you want to retrieve.
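The behaviour of the positional predicate is easy to check with a short Python sketch. ElementTree in the standard library applies `[1]` per parent element, as XPath does. The markup and the title values below are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: three links spread over two parent paragraphs.
SAMPLE = """
<div id="myDiv">
  <p>
    <a href="radio?rid=1" title="one">A</a>
    <a href="radio?rid=2" title="two">B</a>
  </p>
  <p>
    <a href="radio?rid=3" title="three">C</a>
  </p>
</div>
"""

root = ET.fromstring(SAMPLE)

# .//a[1] keeps every <a> that is the first <a> child of its parent.
first_per_parent = [a.get("title") for a in root.findall(".//a[1]")]

# Taking index 0 of all matches gives the first link in document order,
# which is what (//a)[1] expresses in full XPath.
first_overall = root.findall(".//a")[0].get("title")

print(first_per_parent)
print(first_overall)
```

When all the links share one parent the two forms agree, which is why the simpler `//a[1]` often appears to work.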