How to scrape data from a page with a jQuery button click using HtmlAgilityPack - web-scraping

I am trying to scrape data from a page of similar items (a shopping website) using HtmlAgilityPack.
There is a button that loads more items. Clicking it loads the additional items into the same page.
If it were a link (an <a> tag), I could take the URL for the next items from its href attribute and load that new page, so there would be no problem.
But here there is no new URL, and the items are loaded into the same page.
Is there any way to implement this? How can I trigger the "load more" button to get more items?

HtmlAgilityPack is only an HTML parser: it can parse a static HTML document, but it cannot run JavaScript or click buttons. What you want can be done with Selenium WebDriver.
Another possibility, if the number of "load more" clicks is small enough to do by hand: load everything manually in a browser, save the resulting HTML locally, and then use HtmlAgilityPack to parse that saved static HTML (instead of parsing the HTTP response).
Share a link to the site you are talking about so I can add some code snippets as examples.
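As a rough illustration of the Selenium route, the sketch below keeps clicking a "load more" button until it disappears and then returns the rendered HTML, which a static parser can take over from there. It uses the selenium-webdriver package for Node.js; the same flow should carry over to Selenium's .NET bindings, where the page source could be handed to HtmlAgilityPack. The function names, the selector passed in, the click limit, and the fixed pause are all assumptions, since the real site is not known.

```javascript
// Click a "load more" button until it is no longer in the DOM.
// `driver` is a Selenium WebDriver instance; `locator` is e.g. By.css("#load-more").
// Assumption: the page removes the button once every item has been loaded.
async function clickUntilGone(driver, locator, maxClicks = 20, pauseMs = 1000) {
  let clicks = 0;
  while (clicks < maxClicks) {
    const buttons = await driver.findElements(locator);
    if (buttons.length === 0) break; // nothing left to load
    await buttons[0].click();
    clicks += 1;
    await driver.sleep(pauseMs); // crude wait for the new items to arrive
  }
  return clicks;
}

// Hypothetical end-to-end usage; needs `npm install selenium-webdriver`
// and a local Chrome. It is defined here but not called.
async function scrapeRenderedHtml(url, buttonSelector) {
  const { Builder, By } = require("selenium-webdriver");
  const driver = await new Builder().forBrowser("chrome").build();
  try {
    await driver.get(url);
    await clickUntilGone(driver, By.css(buttonSelector));
    return await driver.getPageSource(); // static HTML, ready for a parser
  } finally {
    await driver.quit();
  }
}
```

The maxClicks cap is there so the loop still ends if the site hides the button instead of removing it, or never runs out of items.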

Related

Dynamically updating the view with Handlebars

I'm working on an app using Express.js and Handlebars. I want to have a page where you can enter text into a search bar, and the view will update to display the results of your search. The backend part of this is already working - the data is being retrieved just fine - but I'm not sure how to best go about updating the view.
I'm using express-handlebars for this app, so the views are written as .handlebars files and are sent to the client via the res.render method. I have seen examples of, for instance, using a button to fill out a Handlebars template, but these examples use a static HTML file with a template in a script tag; I don't know how to translate that to what I'm working with.

Scraping video links from lazy-loaded videos

I am trying to scrape a video from a page using a package called icrawler, but the video is not rendered immediately when the page loads. When I fetch the page's HTML, the video tag doesn't exist, but it does if I open the page in a browser and inspect it.
How do I wait for the page to load the video before crawling it?
The page most likely loads the video using JavaScript, so you would need a library capable of rendering the HTML and executing JavaScript.
I took a quick look at icrawler, and according to its docs it uses Cheerio, which, quoting from the Cheerio docs, "does not produce a visual rendering, apply CSS, load external resources, or execute JavaScript".
The same docs mention that you could use something like PhantomJS (which seems to be abandoned) or jsdom. Another alternative is Selenium.
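Whichever rendering tool is chosen, the core idea is to keep checking the rendered page until the video element shows up, rather than reading the HTML once. Below is a minimal polling sketch of that idea. waitForResult and the probe callback are hypothetical names, not part of icrawler, Cheerio, or Selenium, and the timeout and interval values are arbitrary.

```javascript
// Call `probe` repeatedly until it returns a truthy value, then return that value.
// Throws if nothing truthy shows up within `timeoutMs`.
async function waitForResult(probe, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await probe();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error("timed out after " + timeoutMs + " ms");
    }
    // Pause before the next check so the page is not hammered.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With Selenium's Node.js bindings, the probe could be something like async () => (await driver.findElements(By.css("video")))[0], after which the element's src attribute can be read with getAttribute("src").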

Create a template page for html that can be used on every page

I would like to build the equivalent of a _layouts.cshtml page that I can use with every page in my web app. I don't want to use cshtml or Razor, though; I want to use plain HTML. Is there a simple way to convert the _layouts page to standard HTML and then include it in the other pages of the app?
I have inherited an app that uses cshtml, with a _layouts page for menus and some content styling used on each page. I can duplicate the content on every page, but would prefer a generic solution. The client does not want to use cshtml pages. Any JavaScript library that provides tools for this, such as AngularJS, is OK for me to use. I'm not a front-end guy, and this is a relatively low-budget operation where I'm working for free. TIA.
I'm not familiar with .cshtml, but on an HTML page you could use an iframe and update the contents of the iframe with different pages as needed.
You could also fetch external HTML and put it on the current page using jQuery's $.get().
// Get the contents of 'ajax/test.html' and put them into the elements
// with class 'result' on the current page.
$.get("ajax/test.html", function (data) {
    $(".result").html(data);
    alert("Load was performed.");
});

ASP.NET Build Images Links Dynamically

I am developing a website that has product images on an external server. I have code that tests to see if the image exists like (pseudo code):
DynamicString = FunctionThatCreatesDynamicString()
' DynamicString = "http://external_server/path/to/file1.jpg"
If ImageExists(DynamicString) = StatusCode.200 Then
    ' Embed link in ASP.NET page
Else
    ' Embed not found image in ASP.NET page
End If
My code builds fine and appears to execute. When I open the external link directly in a browser, the image appears properly (I have to authenticate first, but that's OK considering I'm on an internal network and this app will be used internally).
However, when I view the source of my generated HTML page, the image link points to the "Not Found" image, even though I know the image is there.
I compared all the characters in my dynamically assembled string to the external link, and they all match.
I'm wondering if the authentication has anything to do with why the image is not rendering properly in my generated HTML.
Any thoughts?
It turns out the problem was the authentication to the images. I tried a completely different approach and was able to make it work. But thanks for the tip, tangurena.
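The authentication explanation fits the symptoms: the existence check runs on the server, which does not carry the credentials the browser used, so the image server may answer with something other than 200 (for example 401) and the code falls through to the "not found" branch. The sketch below shows a check that sends credentials explicitly. It is written in JavaScript for illustration rather than ASP.NET. resolveImageUrl, the header contents, and the fallback URL are hypothetical, and it assumes the standard fetch API (built into Node 18+ and browsers).

```javascript
// Return `url` if the image server confirms the image exists, else `fallbackUrl`.
// `headers` carries whatever credentials the image server expects (assumption:
// header-based authentication; the original setup is not described).
// `fetchFn` defaults to the global fetch and can be swapped out for testing.
async function resolveImageUrl(url, fallbackUrl, headers = {}, fetchFn = fetch) {
  try {
    // HEAD asks for the status and headers only, not the image bytes.
    const response = await fetchFn(url, { method: "HEAD", headers });
    return response.status === 200 ? url : fallbackUrl;
  } catch (err) {
    return fallbackUrl; // network failure: treat as "not found"
  }
}
```

Without the right headers, the same call would see the unauthenticated status and return the fallback, which matches what the generated page showed.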

How to include HTML contents from another site? I have access to both sites

I have a site that uses DNN (DotNetNuke) as a content management system. I am using another site for my event registrations. I have sent them my template, which displays the basics, including a hover menu with many different items in it.
The issue is that as I update the menu on my site using DNN, I need the change to be reflected on the site using my template, without having to send them a new file. Does anyone have suggestions on how to approach this?
I don't want to send the events provider all of the DNN DLLs as well as my database login information in order to render the menu.
I created a page on my site that is something like 'menu.aspx' - this produces the menu in HTML format, however it has tags like in it that I'd like to remove before serving it to them.
What is the best approach for this? Do I need to write a custom server control using XMLHttp? Can I accomplish this in Javascript?
Any advice much appreciated.
Thank you!
If both pages are served from the same origin (same scheme, host, and port), you can use JavaScript and XMLHttpRequest to pull content from one into the other. Otherwise, the Same Origin Policy blocks the AJAX request unless the other server explicitly allows it (for example with CORS headers). Note that two subdomains such as site1.domain.com and site2.domain.com count as different origins.
If they're not on the same domain but you can edit the page on their website, you can simply include a JS script from your site there:
<script type="text/javascript" src="http://yoursite.com/code.js"></script>
In the JS, simply document.write() what you want on the page. This way, you can easily change the content of the page on their site without having to send them a new file.
Finally, you can also use an iframe on their site, pointing to a page on yours.
EDIT: As Vincent E. pointed out, this will only work if they're on the same domain - my bad.
If you are unwilling or unable to use frames, then I would set up an ashx handler on your DNN server that renders the menu (if you've got it in a user control, all the better, as you can just instantiate it and render it directly to the output stream), then make an Ajax call to that handler from your events page and insert the result directly into the DOM.
Here's a quick and hacky jQuery-based example of the events page end of things:
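<script type="text/javascript">
    // Insert the returned menu HTML into the placeholder element.
    function RenderMenu(data) {
        $('#Menu').html(data);
    }
    // Once the DOM is ready, request the menu from the handler.
    $(document).ready(function () {
        $.ajax({
            type: 'GET',
            url: 'http://localhost/AjaxHandlers/Menu.ashx',
            data: '',
            success: RenderMenu  // no trailing comma: older IE rejects it
        });
    });
</script>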
You'll want an empty div with the ID 'Menu' on the page where you want your menu to sit, but apart from that you're good to go.
If for whatever reason you can't get the menu HTML in an isolated way, then you'll need to do some text processing in RenderMenu, but it's still do-able.
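The text processing mentioned above could look something like the following sketch, which drops unwanted wrapper tags from the response but keeps the markup inside them. stripWrapperTags is a hypothetical helper, and the tag list passed to it (html, body, form) is an assumption, since the question does not say which tags need removing. It is regex-based, so it is only meant for simple markup that you control.

```javascript
// Remove the opening and closing tags named in `tagNames` but keep everything
// between them. Regex-based: it will misfire on attribute values containing ">".
function stripWrapperTags(html, tagNames) {
  if (tagNames.length === 0) return html.trim();
  // \b stops "head" from also matching "<header>".
  const pattern = new RegExp("</?(?:" + tagNames.join("|") + ")\\b[^>]*>", "gi");
  return html.replace(pattern, "").trim();
}

// Variant of RenderMenu that cleans the response before inserting it.
// Assumes jQuery is loaded and the page has an element with the ID 'Menu'.
function RenderMenu(data) {
  $("#Menu").html(stripWrapperTags(data, ["html", "body", "form"]));
}
```

If the menu markup gets more complicated, parsing the response into a detached element and selecting just the menu node would be a sturdier alternative to a regex.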
I am not a web expert, so don't shoot me.
Can't you just put their registration form into an iframe in DNN?
