Is it possible crawl ASP.NET pages? - asp.net

Is there a way to crawl some ASP.NET pages that uses doPostBack as events calling?
Example:
Page1.aspx:
Contains 1 LinkButton that redirects to Page2.aspx
Code-behind for LinkButton Click event:
Response.Redirect("Page2.aspx")
In client side this code is generated on click event:
doPostBack(...
Is it possible crawl pages using only HttpWebRequest?
I know that use Response.Redirect is not a good idea in this case, but I don't have choice.

Yes, it's possible if the code follows a well predictable pattern. You would have to gather the form data from the page and simulate what the doPostBack function does (putting some values in some hidden fields), and send a POST request to the server. What you get back would be a redirection page, so you would have to parse that to get the url of the target page.
If you mean if search engines like Google will crawl the pages, then that is very unlikely. They might attempt to follow some common patterns of posting and script linking, but generally you need to use proper links between the pages to be sure that they are crawlable.

Related

How a browser can reach the server without doing postback?

i've been asked this question and id not know the answer.
Thanks for any help!
Postback is a term used often in ASP.NET when a WebForm POSTs the single form back to the server and invokes some event in the code behind (like a click on a button for example). You could still use normal GET requests though to redirect to a given web page. For example you could use an anchor:
Go to page 2
When the user clicks on the anchor there is no postback occuring but a GET request to the target web page.
Another possibility is the user typing directly the address of the web page in his browser address bar.
Yet another possibility is to use javascript to perform an AJAX request which allows to invoke a web page without redirecting away from the current page. You could use any HTTP verb with AJAX.
We can use javascript code to do some function without postback. This will save the time for the request and response to the server. But this client side. you can't reach the server without posting back.But my mean you can do functionality by javascript which does not postback the page.
Hope it may help.
If you like to categories the call to the server you can say that there are two types.
The GET and the POST
The POST is the post back and are the parameters that you send using a form
and the GET that are the parameters that you can send from the url.
More about:
http://www.cs.tut.fi/~jkorpela/forms/methods.html
http://thinkvitamin.com/code/the-definitive-guide-to-get-vs-post/
http://catcode.com/formguide/getpost.html
but I think the interview question was about the Ajax call, and this is probably what they try to see if you know, how to use Ajax to reach the server with javascript and not make postback. But you need to know that Ajax can make POST back, but this is done with out leave the page, with out make a full page post back.

Login to ASPX page with Perl script

I am trying to scrap an ASPX page with Perl's WWW::Mechanize . The problem is that the page I want to scrape can be accessed only after logging in. I tried using the HTML::TreeBuilderX::ASP_NET module but can't seem to get it to work.
I tried setting the __VIEWSTATE, __EVENTTARGET and __EVENTARGUMENT parameters.
Does anybody have any experience in logging into ASPX page using a Perl script?
Generally, you have to post the form (you'll only find one, typically, on any given aspx page; the form I'm talking about is the one found on login page) with all input values, including those hidden fields (especially those hidden fields, actually). the only values that you should change are uid/pwd textboxes. So, get the list of all named input tags, post them all; that should return a redirect with a asp.net auth cookie, which you have to include in subsequent requests.

How does Asp.net Knows which Page to Generate?

When a client is on page A.aspx , and he press some button there is a postback.
The server knows which page to rebuild according to the request.
but how does the client knows which page to re-ask ? by the current url of his browser ?
where this information is saved in the client side ?
Its defined in the action property of <form>. The client does not need to re-ask, the server sends a response of his request.
ASP.NET is just a part of the .NET framework, but what every client sees on a web browser in plain old HTML.
ASP.NET gives you several controls that makes it easy to use them programatically, so we can set all sort of things in our code (that is run before the page is showing) to do the exactly what we want.
every link, button, image, grid, it's just HTML tags, like <a> for links, <input type="button"> for buttons etc...
Keep in mind that now, there are 2 variantes of the ASP.NET, the WebForms and the MVC (you can also read about choosing one in prole of the other)
in every ASP.NET WebForms there is always a <form> on the start of the <body> and wrapps all your code, so, any submit will do a PostBack into the same file name, in your example A.apsx will always post into A.aspx, then if you want, for example, send that request to B.aspx you need to have a Click Event that would use the Server.Transfer("B.aspx") and that would redirect the entire post to B.aspx just like it was a post from B.aspx
in the newest pattern, the ASP.NET MVC, it drives with Routes witch let's you set up any, every, one, multiple, ways to reach the same page. In MVC the URL does not point to a specific page, but to a specific Controller and it's up to the Controller to send, after processing the data, to a specific View, that is why in MVC there are no pages in the url (though you can add it to the route if you want, and you can accomplish the same with WebForms using a Routing plugin).
Now, in MVC it's there is no <form> wrapping up your entire code, you need to, if you want to submit something, create your own <form> and point to the correct route
but, just like in Webforms or any HTML page, posts are made through form submittion, and it's "path" it's always whats in the form attribute action that let's you know what's the next step.
I hope this helps you realizing that there is no big monster in ASP.NET, that is only a way to reuse controls and access them programmatically and that, in the end, it's all HTML :)
A general answer: on the client side it's either a submit from within a form or a link.
The form points to either a relative URL (that means the current URL plays a key role) or an absolute URL (the current URL plays little to no role).
For links it's generally the same: either they are relative or absolute. One big difference: links are use HTTP GET while forms can use HTTP POST (thus transferring more data without encoding them to the URL as parameters).
For a button it's the form that gets submitted.

aspx ashx mash-up

I'm retro-fitting a .aspx page with AJAX functionality (using VB, not C#). The codebehind populates the page with data pulled from a web-service. The page has two panels that are populted (with different data, of course) in this way. On a full page refresh, one or both panels might need to be populated. But populating Panel 2 can take a long time, and I need to be able to update panel 1 without refreshing Panel 2. Hence the need for AJAX (right?)
The solution I've come up with still has the old .aspx page with .aspx.vb codebehind, but introduces a Generic Handler (.ashx) page into the mix. Those first two components do the work on the user's first visit or on a full page refresh, but when AJAX is invoked, the request is handled by the .ashx page.
First question: Is this sound architecture? I haven't found a situation online quite like mine. Originally, I wanted to make the .aspx page into the AJAX handler by having the codebehind implement IHttpRequest, and then providing "ProcessRequest" and "IsReusable" methods, but I found I couldn't separate a regular visit to the page from an AJAX request, so my AJAX handlers took over even on the first visit to the page. Second question: Am I right to think that this approach (making the .aspx page do double-duty as the AJAX handler) will never work? Is it impossible to tell whether we're getting a full-page request or a partial-page (AJAX) request?
If the architecture is good, then I need to dynamically generate a lot of HTML in the .ashx file, right? If that is right, should I send HTML back to the client, or should I encode it in some way? I've heard of JSON encryption, but haven't figured out how to use it yet. So, Third question: Is "context.Response.Write" the only pipeline for sending data back to the client? And, if so, should I send back HTML or some kind of JSON-encoded objects?
Thanks in advance.
It sounds as if the page requires some AJAX functionality added to the UI.
Suggest using an UpdatePanel for each web form element that needs to have AJAXy refresh
functionality. That'll save you from having to refactor a bunch of code, and introduce a whole lot of HTML creation on your .ashx.
It'll be more maintainable over the long run, and require a shorter development cycle.
As pointed out by others, UpdatePanel would be a easier way - but you need to use multiple update panels with UpdateMode property set as conditional. Then you can trigger the update-panel refresh using any button on the page (see AsyncPostBackTrigger) or even using java-script (see this & this). On the server side, you may decide what has triggered the partial post-back and act accordingly by bypassing certain code if not needed.
You can also go with your approach - trick here is to capture the page output using HttpServerUtility.Execute in your ashx and write it back into the response (see this article where this trick has been used to capture user control output). Only limitation with this approach is that you can only simulate GET requests to your page and so you may have to change your page to accept parameters via query string. Personally, I will suggest that you create a user control that accept parameters via method/properties and will generate necessary output and then use the control on your page and in ashx (by dynmaically loading it in a temperory page - see this article).
EDIT: I am using jquery to illustrate how to do it from grid-row-view.
$(document).ready(function() {
$("tr.ajax-grid-row").click(function() {
$("#hidden-field-id").val($(this).find(".row-id").val()); // fill hidden filed
$("#hidden-button-id").click(); // simulate button click
});
});
You can place above script in the head element in markup - it is assuming that you have decorated each grid-row-view with css class "ajax-grid-row" and each row will have hidden field decorated with css class "row-id" to store row identifier or the value that you want to pass to server for that row. You can also use cell (but then you need to use innerHTML to get the value per row). "hidden-field-id" and "hidden-button-id" are client ids for hidden field and submit button - you should use Control.ClientID to get actual control ids if those are server controls.
JSON is not for that purpose, it is to pass objects serialized with a nice light weight notation, is you need to stream dinamically generated html using ashx, response.Write is what you have. You may want to take a look at MVC
Or you could use jquery if it's just html, the simpliest would be the load function, or you can look into Ajax with jquery. Since the ashx can be served as any resource it can be used in the load function.
I agree with #p.campbell and #R0MANARMY here. UpdatePanel could be the easiest approach here.
But then like me, if you don't want to go the UpdatePanel route, I don't see anything wrong with your approach. However, generating the html dynamically (entirely) at the back end is not a route I'll personally prefer (for the maintainence reasons). I'd rather prefer implementing a solution that will keep the design separate from the data.

What is the alternative to Response.Redirect() asp.net?

Hi One of the tips in "website performance tips" in various blogs says "Avoid Redirects". In my case, I am using Response.Redirect for the same page. I am passing a querystring and displaying appropriate information to the user.
Response.Redirect("FinalPage.aspx?NextID=" + ID);
So in our business logic, i am reloading the same page with different information.
So how do i avoid redirect? Is there any other alternative? BTW, my aim is to gain some performance there.
Redirect is the R in the PRG pattern which is an accepted pattern for processing posted requests. So it is definitely not evil.
However, there used to be a common interview question: "What is the difference between Server.Redirect() and Server.Transfer() and which one must be used?". People used to say Transfer because it did not involve a round-trip but web has changed so much since then. In those days you could not re-use the the common logic in the views unless you use Transfer or Redirect, but nowadays especially with ASP NET MVC there are tons of a ways to do that.
In your case, I am all for PRG and I believe redirect is semantically more correct. Also it prevents the form being re-submited if user clicks F5 or refresh.
The recommendation is for unnecessary redirects.
Your case is different - you are passing in information to the page, this is not strictly the same thing as a regular redirect (i.e. a page that moved).
You can also do a Server.Transfer, which does not require a new request to come in, thus lessening the load on the server. More information comparing the two is here.
In your case, you do want to do a Redirect because you are modifying the query string and changing something on the page, as opposed to shifting processing of the initial request to another page.
The main "evil" if it could be called such is that redirects require an extra round trip; the client requests one page (usually the same page, specifying that a particular button was clicked), and the server responds saying "request this page instead", and the browser then complies, resulting in the server actually serving up the next page.
It's sometimes necessary to do this, however there are now much better ways to control navigation in a website. For instance, instead of a "form" button that causes a postback and redirect, you could use a LinkButton that will behave like a hyperlink, allowing the browser to request the new page directly. You could also use a MultiView that shows different ASCXs, and control navigation by view-flipping (however, understand that this can have its own performance implications, especially when using them in a nested fashion).
I think if you want to redirect to same page then instead of doing Response.Redirect("FinalPage.aspx?NextID=" + ID); you could use NextID in ViewState also or Hidden Field so that you would not required to redirect SAME page and then check that hidden field or viewstate instead of checking QueryString
:D

Resources