What to use: Xpath or CSS selector while scraping Linkedin?

I want to scrape LinkedIn activity posts: comments, number of views and so on.
Which Selenium locator strategy should I choose: XPath or CSS?
I am trying to do this with XPath, but I have the feeling that the XPath changes depending on the profile, language and Chrome version. How can I make this work in the general case?
Can anybody advise?

An XPath can change when JavaScript runs and can differ between profiles. If XPath is the only option, it is fine to use it, but if the element has an id or a distinctive class, prefer those.
In Selenium, there are several ways to select an element by id.
driver.find_element_by_id('ember87')
driver.find_element_by_xpath("//*[@id='ember87']")
You can of course use any other CSS selector, and this is generally the most convenient way.
driver.find_element_by_css_selector("#ember87")
driver.find_element_by_css_selector("div#ember87")
You can also include the parent element to make the selection more specific.
driver.find_element_by_css_selector("#ember72>#ember87")
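The `find_element_by_*` helpers used above were removed in Selenium 4.3; current releases expose `find_element(by, value)` and `find_elements(by, value)` instead. Below is a minimal sketch of the same lookups in that style. The names `locators_for_id`, `find_first` and `FakeDriver` are hypothetical, and the strategy names are written as plain strings (the values of `By.ID`, `By.XPATH` and `By.CSS_SELECTOR`) so the sketch runs without a browser.

```python
# Strategy names as plain strings; in real code import them from
# selenium.webdriver.common.by.By (By.ID, By.XPATH, By.CSS_SELECTOR).
BY_ID = "id"
BY_XPATH = "xpath"
BY_CSS = "css selector"


def locators_for_id(element_id):
    """Return equivalent (strategy, value) pairs for one element id."""
    return [
        (BY_ID, element_id),
        (BY_XPATH, f"//*[@id='{element_id}']"),
        (BY_CSS, f"#{element_id}"),
    ]


def find_first(driver, locators):
    """Try each locator in order and return the first element found."""
    for by, value in locators:
        found = driver.find_elements(by, value)  # empty list when nothing matches
        if found:
            return found[0]
    return None


class FakeDriver:
    """Hypothetical stand-in for a WebDriver, used only for this demo."""

    def __init__(self, known):
        self.known = known  # maps (by, value) -> list of elements

    def find_elements(self, by, value):
        return self.known.get((by, value), [])


driver = FakeDriver({("css selector", "#ember87"): ["<post element>"]})
element = find_first(driver, locators_for_id("ember87"))
print(element)
```

With a real driver, `find_first(driver, locators_for_id("ember87"))` would be called the same way. Ids such as `ember87` look auto-generated and may differ between page loads, so a locator built on a stable class or attribute is likely to last longer.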

Related

Automate process of grabbing elements from a webpage

I'm looking to automate test cases for webpage development using Robot Framework. I have about 5000 test case strings that describe pathways to different page elements. Now I need to go through and grab the specific "id" or "css selector" of each element within the webpage for automation. My default option is to manually inspect each button, link, table etc. and enter it into a huge spreadsheet, but I feel like there must be a less arduous method of extracting the elements.
I've looked into different options, and the closest thing I can find to a solution is Python web scraping, but from what I understand, web scraping requires that the elements are already defined, and its goal is to extract information rather than the elements themselves.
Does anyone have a solution that might be a bit less tedious than inspecting 5000 webpage elements? ;)

If you can put your page in an IFRAME, then you could probably use JS (in the parent) to wait until the page is loaded and then get all (or specific) elements in the IFRAME.
That way you should be able to get all the elements of the fully rendered page.
(I have never done this, but it should work.)
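As an alternative to the IFRAME idea, the list of candidate locators could be harvested from the page's HTML with Python's standard library. This is only a sketch: `IdCollector` and `collect_id_selectors` are hypothetical names, the page fragment is invented, and the input is assumed to be already rendered HTML (for example, Selenium's `driver.page_source`).

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect a CSS selector for every element that carries an id."""

    def __init__(self):
        super().__init__()
        self.selectors = []

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id:
            self.selectors.append(f"{tag}#{element_id}")


def collect_id_selectors(html):
    """Return tag#id selectors in document order."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return collector.selectors


# Hypothetical page fragment, not taken from the question.
page = '<form id="login"><input id="user"><button id="submit">Go</button></form>'
selectors = collect_id_selectors(page)
print(selectors)
```

The resulting list could be written to the spreadsheet as a starting point and then matched by hand against the test case strings.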

Class over ID in CSS

I'm coding a landing page. When should I use an ID for an element instead of a class? I know IDs are only referred to once on a page, while classes are referred to multiple times. I also know using a class is faster than using an ID. When should we use a class over an ID for an element?

Simply put, IDs are for JavaScript and classes are for styling (CSS). You can still use IDs for styling where needed, but in general you should work towards using CSS classes and reusable code.
Some people also follow a convention of using IDs for chrome elements on their site. Myself, I use classes for everything to allow for future code reuse. I can't tell you how many times this has made life easier six months down the track. If I need to target something with JavaScript, then I also add an ID.
Performance
JavaScript
IDs are faster than classes when referenced in JavaScript.
CSS
Performance depends on the individual browser rendering engine.
Best Practice
Check out this handy guide on MDN that teaches you to write efficient CSS.

use xpath or css selector to parse all the element in a hybrid mobile app

I am currently working on mobile UI automation testing. Our application is a hybrid mobile app based on Cordova, so I am planning to use Appium to run some automated tests.
One thing I need to figure out is how to find all the elements on a page.
I was planning to use XPath to find all elements, since XPath expressions are shown in the Appium inspector. However, my colleague disagrees: he wants to use CSS selectors as the key for finding all elements in the mobile app. But the Appium inspector does not show CSS selectors.
So I am curious which approach is better.
Thanks

Below is my breakdown of Locators and how/why I like to use each for iOS Automation. All of this is experiential based on my work with Native iOS applications.
Giant disclaimer:
I don't know anything about Cordova. I hear there are issues that exist with UIAutomation if there are class names that aren't native. If that is the case, I suggest sticking to accessibility id and class name locator strategy.
Locators for iOS Automation
CSS Selectors
CSS Selectors do not exist in Appium.
Class Name
The closest you'd get to CSS Selectors is the Class Name selector. I don't really use them because UIAutomation gives me what I need and allows me to check for the name/text of the element in the locator strategy.
XPath
You don't want to use XPath because it's slow and flaky on iOS. It can sometimes return an entirely incorrect element, and it can sometimes cause Instruments to fail for no reason. I highly suggest staying away from XPath.
Accessibility ID
You should use this when UIAutomation fails to find the element. It's quicker than XPath and is useful when UIAutomation doesn't let you get at an element. (.actionSheets() is broken, I think. I use this when the action sheet is up and I need to .click() a button.)
UIAutomation
You should use Apple's UIAutomation framework, as it is the quickest, native solution for iOS.
The UIAutomation framework allows you to use classes and hierarchy to specify which element you want. When you use Appium, use the find_elements_by_ios_uiautomation function on your webdriver.
Example Usage here
But the example usage doesn't tell you how powerful UIAutomation really is. A common problem I ran into is trying to find all cells of a tableview when there was more than one tableview on the screen.
Find all UIACells for the UIATableview "Cart"
**Sample View Hierarchy**
<application>
<window>
<table name="Items">
<cell name="Foo, not in cart"></cell>
</table>
<table name="Cart">
<cell name="Bar. IM IN YOUR CART"></cell>
</table>
</window>
</application>
OK now to find those cells in an array:
value = '.tableviews()["Cart"].cells()'
cells = driver.find_elements_by_ios_uiautomation(value)
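To see what that scoping selects without a device, the sample hierarchy can be parsed offline. This is not Appium code, only a sketch with Python's standard `xml.etree.ElementTree`, and its path syntax is not the UIAutomation syntax; it shows that restricting the search to the table named "Cart" leaves out the cell in "Items".

```python
import xml.etree.ElementTree as ET

# The sample view hierarchy from above, indented for readability.
hierarchy = """
<application>
  <window>
    <table name="Items">
      <cell name="Foo, not in cart"></cell>
    </table>
    <table name="Cart">
      <cell name="Bar. IM IN YOUR CART"></cell>
    </table>
  </window>
</application>
"""

root = ET.fromstring(hierarchy)

# Only cells that are direct children of the table named "Cart".
cart_cells = root.findall(".//table[@name='Cart']/cell")
cell_names = [cell.get("name") for cell in cart_cells]

# For comparison: every cell on the screen, regardless of table.
all_cells = root.findall(".//cell")

print(cell_names)
print(len(all_cells))
```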
Extra Reading: Guide that goes over predicates and why they're awesome
Limitations for UIAutomation
UIAutomation cannot find your element if it doesn't have a visible name. (Developers like to put invisible buttons behind "Hint Overlays" and the like.)
Suggested Solution: add an accessibility id, use the accessibility id locator.
Testing locator strategies
There's a nice place in the Inspector (provided in the Appium.app GUI) which lets you try whatever locator strategy and value you want. You should use it. It helps so, so much.

Selenium IDE - plugin to find alternate ID selectors?

I'm using extjs, so often I need the parent's ID, or the parent's parent's ID for a selector. Is there any plugin which lets me traverse up a DOM element looking for different IDs, or a plugin which can filter out certain ID patterns? (such as ext-gen\d+).

You can use Firebug and the FirePath plugin for Firefox to find the IDs.

It is possible to use XPath, which can even search for an element that contains a given value. See one of the examples here: http://zvon.org/xxl/XPathTutorial/Output/example6.html
I believe you can set the IDE to search by XPath and not only by ID.

There is no solution. Use CSS selectors manually and make sure you're running on >FF3.x
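The ancestor walk the question asks about can be sketched outside Selenium IDE. The snippet below is an assumption-laden illustration, not a plugin: `nearest_stable_id` is a hypothetical helper, the markup is invented, and it is assumed to be well-formed XML so that Python's `xml.etree.ElementTree` can parse it. It climbs from an element towards the root and skips ids matching `ext-gen\d+`.

```python
import re
import xml.etree.ElementTree as ET

# Pattern for auto-generated ids, taken from the question (ext-gen\d+).
GENERATED_ID = re.compile(r"^ext-gen\d+$")


def nearest_stable_id(root, target):
    """Walk from target up to root; return the first id that is not generated."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        node_id = node.get("id")
        if node_id and not GENERATED_ID.match(node_id):
            return node_id
        node = parent_of.get(node)
    return None


# Hypothetical markup; well-formed XML is assumed.
markup = (
    '<div id="order-toolbar">'
    '<div id="ext-gen12"><button id="ext-gen13">Save</button></div>'
    "</div>"
)
root = ET.fromstring(markup)
button = root.find(".//button")
stable_id = nearest_stable_id(root, button)
print(stable_id)
```

The returned id can then anchor a hand-written CSS selector such as `#order-toolbar button`.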

An XPath uniquely identifies an element. XPath works better with Firefox, Safari and Chrome, but it sometimes breaks in IE, so in that case you can use CSS selectors. You can use Firebug; on top of Firebug you need to install FirePath.
You don't have to study the DOM before using the XPath generated by FirePath. It is very useful if you have elements with dynamic IDs.
Both Firebug and FirePath are Mozilla add-ons.
First install Firebug, then FirePath.
In Selenium IDE, when you are recording, there is a drop-down menu in Target; select XPath:Position there. It works for dynamic elements and does not change.
Cheers,
Amit Shakya

How do I optimize my stylesheet by removing unmatched and/or unnecessary CSS selectors

I have inherited a massive stylesheet with many thousand selectors and I'm certain that a good number of them are unnecessary and never actually match elements on the site. In the interests of optimizing, I'd like to remove those orphaned selectors/rules.
Are there any tools that would allow me to compare the CSS against the entirety of the site to identify which selectors are required and which are not?
The site has AJAX components, so writing a curl/wget script to traverse the site and then loop through each selector and grep for a match isn't particularly feasible either (even though that would be kinda fun...)
All suggestions welcomed.
Thanks,
JD

There is a Firefox plugin called "Dust-Me Selectors".
https://addons.mozilla.org/en-US/firefox/addon/5392/
"It extracts all the selectors from all the stylesheets on the page you're viewing, then analyzes that page to see which of those selectors are not used. The data is then stored so that when testing subsequent pages, selectors can be crossed off the list as they're encountered."
It's a fairly manual process but could be what you're looking for.
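The bookkeeping described in that quote can be sketched with Python's standard library. This is a rough illustration rather than how the add-on is implemented: `UsageCollector` and `unused_selectors` are hypothetical names, the sample pages are invented, and only simple `.class` and `#id` selectors are handled (no combinators, no pseudo-classes, and no markup added later by AJAX).

```python
from html.parser import HTMLParser


class UsageCollector(HTMLParser):
    """Record every class name and id seen in one page."""

    def __init__(self):
        super().__init__()
        self.used = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Attribute values can be None (e.g. <div class>), hence the "or".
        for name in (attributes.get("class") or "").split():
            self.used.add("." + name)
        if attributes.get("id"):
            self.used.add("#" + attributes["id"])


def unused_selectors(selectors, pages):
    """Return the simple selectors that match nothing on any of the pages."""
    used = set()
    for html in pages:
        collector = UsageCollector()
        collector.feed(html)
        collector.close()
        used |= collector.used  # cross selectors off as pages are checked
    return sorted(set(selectors) - used)


# Hypothetical stylesheet selectors and page sources.
selectors = [".btn", ".legacy-banner", "#cart", "#old-promo"]
pages = [
    '<div id="cart"><a class="btn primary">Buy</a></div>',
    '<p class="btn">Continue</p>',
]
orphans = unused_selectors(selectors, pages)
print(orphans)
```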

You can try one of the many online optimizers, like this one:
http://www.cleancss.com/
The Robson Compressor apparently does the best job of combining and removing redundant selectors.
http://iceyboard.no-ip.org/projects/css_compressor
Several of the online optimizers have the ability to remove unused selectors.

Check out CSS Coverage (an extension for Firebug): http://perishablepress.com/press/2010/06/21/how-to-micro-optimize-your-css/
In my opinion it is better than Dust-Me Selectors.
