web scraping without using selector gadgets - web-scraping

I am a beginner in web scraping and have watched quite a few videos about it.
If I want to scrape data from a website, is it possible to do so without manually picking the elements with a selector gadget?
I want to code this in R.
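One way to avoid a selector gadget is to parse the HTML in code and walk its structure directly. The sketch below is only an illustration and is written in Python rather than R, using nothing but the standard library. The names TableExtractor, extract_rows and SAMPLE_HTML are made up for this example, and the sample table stands in for a real page. The same idea should carry over to an R HTML-parsing package such as rvest.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return all table rows found in the given HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample page; a real scraper would download the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alice</td><td>10</td></tr>
  <tr><td>Bob</td><td>7</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

Because the code keys on tag names instead of hand-picked selectors, it needs no manual clicking, but it also assumes the data sits in plain table markup.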

Related

Can't scrape a website which uses Java Server Faces (JSF)

For my work, I am trying to scrape data from a website that uses JSF (the URLs also end in .jsf, like https://xxxx/xxx/x.jsf).
I have tried a couple of scraping tools, such as ParseHub and Octoparse, but I noticed that they reload the page before extracting the data to a .csv file. After the reload, all the results are gone and I have to filter the data I need on the website again.
Is there a scraping tool that can help me with that? I know that I may get it done using Java or Python, but my programming skills are not enough for such a thing.

Crawling a list of URLS for specific links and Javascript

I have little experience with crawling and need help.
I have a list of URLs and I want to find out whether a certain tool is used on those websites.
The tool works via an iframe that is loaded when a link with a specific URL is clicked.
So I am searching the websites for this link. The problem is that the link is sometimes in an anchor element and sometimes in a JavaScript function (an onclick handler on a button).
I can find the anchor element (I experimented with different scraping frameworks, such as Scrapy), but how do I find the link when it is inside a function?
Is there an easier approach than looking for the <a> elements, e.g., downloading all the HTML and all the JavaScript and searching those files for the link? Unlike classic crawling, I do not want to extract structured data; I only want to know whether a specific link appears somewhere on the pages.
Thanks so much for any help or ideas!
Best
Martin
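The "search everything for the link" idea can be sketched roughly as follows, in Python with only the standard library. It checks href attributes, event-handler attributes such as onclick, and inline script text for a given URL fragment. The names LinkFinder, find_link and SAMPLE, and the tool.example URLs, are placeholders invented for this example. Downloading the pages and any external .js files is left out; for those files a plain substring test on the downloaded text would do the same job.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Record every place where a URL fragment occurs in a page."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.hits = []            # (kind, tag) tuples in document order
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        for name, value in attrs:
            if value is None or self.needle not in value:
                continue
            if name == "href":
                self.hits.append(("href", tag))
            elif name.startswith("on"):
                # onclick, onmouseover and other inline handlers
                self.hits.append(("event-handler", tag))
            else:
                self.hits.append(("other-attribute", tag))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script and self.needle in data:
            self.hits.append(("inline-script", "script"))


def find_link(html, needle):
    """Return a list of (kind, tag) pairs where needle was found."""
    finder = LinkFinder(needle)
    finder.feed(html)
    finder.close()
    return finder.hits


# Hypothetical page containing the link in three different places.
SAMPLE = """
<a href="https://tool.example/widget?id=1">Open</a>
<button onclick="openFrame('https://tool.example/widget?id=2')">Go</button>
<script>var u = "https://tool.example/widget?id=3";</script>
"""

if __name__ == "__main__":
    for kind, tag in find_link(SAMPLE, "tool.example/widget"):
        print(kind, tag)
```

If only a yes/no answer per site is needed, checking whether the returned list is empty is enough. Links that are assembled at runtime from string pieces would not be caught by this kind of text search.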

Where can I find the list of web scraping projects to practice?

I am looking for web scraping projects to practice on. Can anyone tell me where I can find such a list, or give me some examples of web scraping projects?
Scraping is a broad term; you need to be more specific to get a detailed answer.
But take a look here:
http://google-rank-checker.squabbel.com/
It contains a lot of information and open source scraping code.
It is focused on Google and Bing, but the information is quite universal and the code is useful in any case.

How To Extract Data From a Login Site

I'm trying to figure out how to take live data from one site and display it on my site. I would like the data on my site to update whenever it updates on the original site. My theme is sports gaming and my site is structured like ESPN. I would like to grab all the team standings and player stats.
Sorry if I am unclear.
So basically you want to scrape a website and display its data on yours, possibly in a better way.
I would recommend Kimono. It is a web scraping service that provides you with an API to get the data in a proper model.
Check it out; it should get your job done.
If not, you can create your own scraper in PHP (with PHP Simple HTML DOM Parser) or in JavaScript, which also has libraries for this.
Hope it helps!
Happy Coding !!!
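If you write your own scraper, one way to keep your copy reasonably fresh without hitting the original site on every page view is to re-scrape only when the stored result is older than some limit. The following is a minimal sketch of that idea in Python (the same pattern works in PHP or JavaScript). CachedScrape, fake_fetch and the 60-second limit are assumptions made up for this example, and the fetch function stands in for whatever scraper you end up with.

```python
import time


class CachedScrape:
    """Re-run a scrape function only when the cached result is too old."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch              # callable that does the real scraping
        self._max_age = max_age_seconds  # how long a result stays fresh
        self._clock = clock              # injectable so it can be tested
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = (
            self._fetched_at is None
            or now - self._fetched_at >= self._max_age
        )
        if stale:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


def fake_fetch():
    # Placeholder for a real scraper; returns made-up standings.
    return {"Team A": 1, "Team B": 2}


if __name__ == "__main__":
    standings = CachedScrape(fake_fetch, max_age_seconds=60)
    print(standings.get())  # first call runs the scrape
    print(standings.get())  # served from the cache for the next 60 seconds
```

The maximum age is a trade-off: a shorter value keeps the standings closer to live, while a longer one puts less load on the site being scraped.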

how to hide myself while web scraping by html-agility-pack

I am trying to scrape content from some pages of a website. I tried html-agility-pack with C#, which does a good job of scraping the HTML. I need to go through a number of pages while scraping. My question is: how can I hide the fact that I am a web scraper? I do not want the other side to know that I am scraping their content. Please let me know if there is any way to do this. Looking forward to your responses.
Thanks
Use a Tor proxy:
Tor Project
You can reset the proxy after every page or after every site. Keep in mind that some sites look for certain patterns and can tell that you're scraping them. With html agility pack the web is one big data repository; just make sure you're not using someone else's data in a way that would get you in trouble.
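As a rough illustration of routing requests through Tor, here is a sketch in Python rather than C#. It assumes a Tor client is already running locally with its SOCKS listener on the default port 9050, and that the third-party requests package is installed with SOCKS support (pip install "requests[socks]"). The function names tor_proxies and fetch_via_tor are invented for this example, and resetting the proxy between pages is not shown.

```python
def tor_proxies(host="127.0.0.1", port=9050):
    """Build a proxy mapping that points at a local Tor SOCKS listener.

    The socks5h scheme asks the proxy to resolve host names too,
    so DNS lookups also go through Tor.
    """
    address = f"socks5h://{host}:{port}"
    return {"http": address, "https": address}


def fetch_via_tor(url, timeout=30):
    """Download a page through the local Tor proxy (assumed to be running)."""
    # Imported here so the helper above works without the package installed.
    import requests

    response = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    print(tor_proxies())
```

In C#, the equivalent would be to download the page through a proxied HTTP client and then hand the HTML string to html-agility-pack for parsing.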

Resources