Scraping data from <li> tags without any links in them - web-scraping

I am scraping data from a static website using Scrapy. The issue I'm facing with scraping the links for the seeds is that each type of seed appears under a specific heading in a list; these lists do not contain any links, and the main URL does not change when you click on them. What would be the best way to get around this hindrance?
Please open this link: "awangarden.com.pk/seed-store". I'm trying to scrape the seed packets under all the options given, but I don't understand how to open the 2nd or 3rd option, e.g. first "Seeds", then "Grass Seeds". I can't make the transition between the two because there is no link in them that I can open.
I haven't written any code for this yet, because I cannot change the URL to reach the rest of the seeds; I only have code that extracts the data for a single type.
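Since the site is static, every category's list is most likely already present in the page HTML; clicking a heading only toggles visibility, so a single request can yield all categories. Here is a minimal Scrapy sketch of that idea; the h4/ul selectors are assumptions to verify against the real markup:

```python
import scrapy


class SeedSpider(scrapy.Spider):
    """Walks every category heading on one static page."""

    name = "seeds"
    start_urls = ["https://awangarden.com.pk/seed-store"]

    def parse(self, response):
        # On a static page, every category's list is already in the HTML,
        # even though clicking a heading doesn't change the URL.
        for heading in response.css("h4"):
            category = heading.css("::text").get(default="").strip()
            # Grab the <li> items in the list that follows this heading.
            for item in heading.xpath("following-sibling::ul[1]/li"):
                yield {
                    "category": category,
                    "seed": item.xpath("normalize-space(.)").get(),
                }
```

Run it with `scrapy runspider seeds.py -o seeds.json` and check whether all categories come back in one pass; if they do, no browser automation is needed at all.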

Related

Web scraping from a Google search page using an HTML tag

I'm trying to do a Google search and get the first 5 results (title/URL) into an Excel document.
I tried using 'Data Scraping', but depending on the search term, Google will display a different page. Sometimes it will have videos, images, or related search terms, so most of the time I was not able to get all the results from the page, as UiPath would not recognize them, probably because of the different divs. My thought was to get them by HTML tag, since every title uses an H3, but I can't find a way to do that.
I also tried Find Children > Get Attributes, but with no success. I feel that might be the best way, though; I'm just not experienced enough with it to make it work. I've tried for hours.
Has anyone had a similar problem and found a solution?
When I did this before, I had to do multiple scrapes to get the data. The first scrape gets the initial page results, and then you can do a second to get the data from page 2 onward. I have had instances where I had to do multiple scrapes on the first page to get all the information, but after page 1 the data is consistent and easy to scrape. Hope this helps.
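For the "every title uses H3" route outside UiPath, here is a minimal Python sketch of the same pattern with requests and BeautifulSoup. Google's markup changes often and it may block automated requests, so treat this as an illustration only:

```python
import requests
from bs4 import BeautifulSoup

resp = requests.get(
    "https://www.google.com/search",
    params={"q": "web scraping"},
    headers={"User-Agent": "Mozilla/5.0"},  # without this Google serves a stripped page
    timeout=10,
)
soup = BeautifulSoup(resp.text, "html.parser")

results = []
for h3 in soup.find_all("h3"):
    link = h3.find_parent("a")  # result titles usually sit inside an <a>
    if link and link.get("href"):
        results.append((h3.get_text(strip=True), link["href"]))

# First five title/URL pairs, ready to write out to a spreadsheet.
for title, url in results[:5]:
    print(title, url)
```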

Scraping a website without a specific address

I am trying to scrape a website that does not generate a specific web address for the different pages I want to scrape. Each page is generated by selecting different options in some combo boxes, which then produces the desired table.
Is it possible to scrape these tables using R and rvest?
EDIT:
Here is the link with a specific example:
http://www.odepa.gob.cl/precios/precios-al-consumidor-en-linea
You can use Selenium WebDriver to control the clicks and the dynamic data in the HTML pages.
Try this: https://github.com/ropensci/RSelenium
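For reference, the same select-an-option-then-read-the-table pattern looks like this in Python's Selenium bindings (RSelenium exposes equivalent calls for R). The element ID below is hypothetical; inspect the combo boxes on the ODEPA page and substitute the real ones:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

driver = webdriver.Chrome()
driver.get("http://www.odepa.gob.cl/precios/precios-al-consumidor-en-linea")

# Pick an option in a combo box; the page then rebuilds the table
# without changing the URL. "producto" is a placeholder ID.
Select(driver.find_element(By.ID, "producto")).select_by_visible_text("Papa")

# Read the generated table straight out of the live DOM.
for row in driver.find_elements(By.CSS_SELECTOR, "table tr"):
    print(row.text)

driver.quit()
```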

Automatically Parse a Website

I have an idea and want to see whether it is possible to implement. I want to parse a website (copart.com) that shows a different, large list of cars every day, with a description for each car. Daily, I am tasked with going over each list (each containing hundreds of cars) and selecting each car that meets certain requirements (brand, year, etc.). I want to know whether it is possible to create a tool that would parse these lists automatically and, in doing so, select the cars that meet my criteria.
I was thinking of something like a website scraper such as ParseHub, but I am not trying to extract data. I simply want a tool that goes over the website and automatically clicks the "select" button on each car that meets my criteria. This would save me enormous amounts of time daily. Thanks.
I think you can use Selenium for this task. It automatically opens the web browser, and you can locate the element with XPath and click on the select button. I've done that before for a home utility website.
Scrapy is a good tool designed for this. Depending on how the web pages are rendered, you may or may not need an additional tool like Selenium. Submit or "select" buttons are often just links that can be followed with plain HTTP requests, without a browser emulation tool. If you could post some of the sample HTML, we could give you more specifics.
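A rough Selenium sketch of the click-select-on-matching-cars idea. Copart requires a login and its markup changes, so the URL and every class name below (.lot-row, .lot-year, .lot-make, .select-btn) are placeholders to adapt:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.copart.com/todaysAuction")  # hypothetical listing URL

for car in driver.find_elements(By.CSS_SELECTOR, ".lot-row"):
    year = int(car.find_element(By.CSS_SELECTOR, ".lot-year").text)
    brand = car.find_element(By.CSS_SELECTOR, ".lot-make").text
    # Apply your criteria, then click the row's select button.
    if brand == "Toyota" and year >= 2015:
        car.find_element(By.CSS_SELECTOR, ".select-btn").click()

driver.quit()
```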

Scraping data from a related URL column rendered from a Web Scrape/RSS

I am scraping data from a site, and each item has a related document URL. I want to scrape data from that document, which is available in HTML format after clicking the link. Right now I've been using Google Sheets' ImportFeed to get the basic columns filled.
Is there a next step I could take to go into each respective URL, grab elements from the document, and populate the Google Sheet with them? The reason I'm using the RSS feed (instead of Python and BeautifulSoup) is that they actually offer an RSS feed.
I've looked, and haven't found a question that matches mine specifically.
I haven't personally tried this yet, but I've come across web scraping samples using Apps Script with UrlFetchApp.fetch. You can also check the XmlService sample, which is also related to scraping.
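If the Apps Script route doesn't pan out and you do fall back to Python, the follow-each-URL step is short. A sketch, assuming an RSS feed whose items carry plain <link> elements; the feed URL and the CSS selector are placeholders for your actual feed and target element:

```python
import xml.etree.ElementTree as ET

import requests
from bs4 import BeautifulSoup

# Pull every item URL out of the feed.
feed = requests.get("https://example.com/feed.xml", timeout=10)
links = [el.text for el in ET.fromstring(feed.content).iter("link") if el.text]

# Visit each document and grab the element you want for the sheet.
for url in links:
    page = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    element = page.select_one(".document-body")  # placeholder selector
    if element:
        print(url, element.get_text(strip=True)[:80])
```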

Web scraping of an eCommerce website using Google Chrome extension

I am trying to do web scraping of an eCommerce website and have looked at all the major kinds of possible solutions. The best I found is a web scraping extension for Google Chrome. I want to pull out all the data available on the website.
For example, I am trying to scrape data from the eCommerce site www.bigbasket.com. While trying to create a sitemap, I am stuck at the part where I have to choose an element from a page. The same page for, say, category A shows more and more products as you scroll down, and one category page is further split into page 1, page 2, and for a few categories page 3 and so on.
Selecting multiple elements on the same page, say page 1, works fine, but when I try to select an element from page 2 or page 3, the scraper prompts that selecting a different type of element is disabled and asks me to enable it by ticking a checkbox; after that I am able to select different elements. But when I run the sitemap and start scraping, the scraper returns null values and no data is pulled out. I don't know how to overcome this problem so that I can draw a generalized sitemap and pull the data in one go.
To deter web scraping, many websites now render their content with JavaScript. The website you're using (bigbasket.com) also uses JS to render info into various elements. To scrape websites like these, you will need to use Selenium instead of traditional methods (like Beautiful Soup in Python).
You will also have to check the various legal aspects of this, and whether the website permits you to crawl this data.
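A minimal Selenium sketch for a JS-rendered, scroll-to-load catalogue like this; the category URL and the .product class name below are assumptions to check against the real markup in your browser's dev tools:

```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.bigbasket.com/cl/fruits-vegetables/")  # example category

# Scroll a few times so the lazy-loaded products (page 2, page 3, ...) render.
for _ in range(5):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # crude wait; WebDriverWait on a locator would be more robust

# Everything rendered so far is now in the live DOM.
for product in driver.find_elements(By.CSS_SELECTOR, ".product"):
    print(product.text)

driver.quit()
```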
