I am trying to scrape an ASPX page with Perl's WWW::Mechanize. The problem is that the page I want to scrape can only be accessed after logging in. I tried using the HTML::TreeBuilderX::ASP_NET module but can't seem to get it to work.
I tried setting the __VIEWSTATE, __EVENTTARGET and __EVENTARGUMENT parameters.
Does anybody have experience logging into an ASPX page from a Perl script?
Generally, you have to post the form (you'll typically find only one on any given .aspx page; the form I'm talking about is the one on the login page) with all of its input values, including the hidden fields (especially the hidden fields, actually). The only values you should change are the username/password textboxes. So, get the list of all named input tags and post them all; that should return a redirect with an ASP.NET auth cookie, which you have to include in subsequent requests.
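For what it's worth, WWW::Mechanize's submit_form carries the hidden fields along automatically, so you only need to supply the credential fields. To make the mechanics explicit, here is a rough sketch of the same flow spelled out in C#, since it is identical from any HTTP client (the URLs and the txtUser/txtPass field names are placeholders, not anything from the question):

using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class AspxLogin
{
    static async Task Main()
    {
        // Cookies must persist across requests so the auth cookie is kept.
        var cookies = new CookieContainer();
        using var client = new HttpClient(new HttpClientHandler { CookieContainer = cookies });

        // 1. GET the login page and collect every named <input>, hidden ones included.
        string html = await client.GetStringAsync("https://example.com/Login.aspx");
        var fields = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(html,
            "<input[^>]*name=\"([^\"]+)\"[^>]*value=\"([^\"]*)\"", RegexOptions.IgnoreCase))
            fields[m.Groups[1].Value] = m.Groups[2].Value;

        // 2. Overwrite only the credential fields, then POST everything back.
        fields["txtUser"] = "me";
        fields["txtPass"] = "secret";
        var response = await client.PostAsync("https://example.com/Login.aspx",
            new FormUrlEncodedContent(fields));
        response.EnsureSuccessStatusCode();

        // 3. The auth cookie now sits in the CookieContainer and is sent
        //    automatically with subsequent requests to the same site.
        Console.WriteLine(await client.GetStringAsync("https://example.com/Members.aspx"));
    }
}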
Related
I want to write a DotNetNuke module that can take an HTML form and parse or transform it into an ASP.NET form that then does an HTTP POST to the page specified in the HTML form's action attribute.
We regularly run into the need to use pre-existing forms (from existing sites and Service Providers like Paypal and Constant Contact). Currently, we either use an IFrame, manually convert the form into an ASP.NET user control, or use a forms module to recreate the form. It seems like it should be fairly easy to create an automated way to handle these with ASP.NET.
My quick plan:
1.) Admin users will paste the form into a settings page and then click a convert button
2.) The code will parse the HTML, generate an .ascx user control from it, and store the posting address. We may just have to add runat="server" to each of the form controls
3.) The admin user will be able to specify a variety of response codes and corresponding messages (e.g. 1 -> "Thank you for your donation", 2 -> "We were not able to process your request at this time due to ...")
Users would then fill out the form and hit submit. The system would gather the names and values of all form controls, silently post them to the stored posting address, read the response code, and display the corresponding message.
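Roughly, I picture the silent-post step looking something like this (postUrl and the numeric response-code convention here are placeholders, not a real provider's API):

using System.Collections.Specialized;
using System.Net;
using System.Text;

public static class SilentPoster
{
    // Posts the captured name/value pairs server-side and returns the body
    // of the provider's response (e.g. "1" mapping to a thank-you message).
    public static string Post(string postUrl, NameValueCollection formValues)
    {
        using (var client = new WebClient())
        {
            // UploadValues sends an application/x-www-form-urlencoded POST.
            byte[] raw = client.UploadValues(postUrl, "POST", formValues);
            return Encoding.UTF8.GetString(raw);
        }
    }
}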
Any thoughts or suggestions of the best way to do this? Are there any tools that already do this or would be helpful?
Thanks,
David O'Leary
An outside vendor did some HTML work for us, and I'm filling in the actual functionality. I have an issue that I need help with.
He created a simple html page that is opened as a modal pop-up. It contains a form with a few input fields and a submit button. On submitting, an email should be sent using info from the input fields.
I turned his simple HTML page into a simple .aspx page, added runat="server" to the form, and added the C# code inside script tags to create and send the email.
It technically works but has a big issue. After the information is submitted and the email is sent, the page (which is supposed to just be a modal pop-up type thing) gets reloaded, but it is now no longer a pop-up. It's reloaded as a standalone page.
So I'm trying to find out if there is a way to get the form to execute those few lines of C# code on submission without reloading the page. I'm somewhat aware of CGI scripts, but from what I've read they can be buggy under IIS. Plus, I'd like to think I can get these few lines of code to run without creating a separate executable.
Any help is greatly appreciated.
If you don't want the page to reload after submission, you will need to use AJAX, which is a lot more complicated. You would still need the submit .aspx page on the server, since you cannot send email from JavaScript alone.
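As a rough sketch of the server side (the page name, recipient address, and parameter names are made up, and SMTP settings are assumed to come from <system.net>/<mailSettings> in web.config), a static page method like this can be invoked from JavaScript via an ASP.NET AJAX page-method call or a plain XMLHttpRequest POST, so the modal never reloads:

using System.Net.Mail;
using System.Web.Services;

public partial class Submit : System.Web.UI.Page
{
    [WebMethod] // reachable as Submit.aspx/Send when page methods are enabled
    public static string Send(string from, string subject, string body)
    {
        using (var msg = new MailMessage(from, "owner@example.com", subject, body))
        {
            new SmtpClient().Send(msg); // picks up server/port from web.config
        }
        return "sent";
    }
}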
Code a redirect after the form submission, so instead of getting the same form back in the main document/page, the user gets something like a blank page saying "Thanks for your submission!".
It might be simpler to redirect the user to a result page using Response.Redirect that displays some sort of "Your email has been sent" message, or even just redirect back to the base page.
Is there a way to crawl ASP.NET pages that use __doPostBack to raise their events?
Example:
Page1.aspx:
Contains a LinkButton that redirects to Page2.aspx
Code-behind for LinkButton Click event:
Response.Redirect("Page2.aspx")
On the client side, this code is generated for the click event:
__doPostBack(...
Is it possible to crawl such pages using only HttpWebRequest?
I know that using Response.Redirect is not a good idea in this case, but I don't have a choice.
Yes, it's possible if the code follows a predictable pattern. You would have to gather the form data from the page and simulate what the __doPostBack function does (putting some values in some hidden fields), then send a POST request to the server. What you get back would be a redirect, so you would have to follow it (or parse out its target) to reach the destination page.
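A rough sketch of that simulation with HttpWebRequest (the "lnkNext" event target is hypothetical; a real page uses the control's UniqueID, and __EVENTVALIDATION only exists on ASP.NET 2.0+ pages):

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Web; // for HttpUtility; reference System.Web.dll

class PostBackCrawler
{
    static string HiddenField(string html, string name)
    {
        // ASP.NET renders its hidden state fields as id="..." value="...".
        var m = Regex.Match(html, "id=\"" + name + "\" value=\"([^\"]*)\"");
        return m.Success ? m.Groups[1].Value : "";
    }

    static void Main()
    {
        string url = "http://example.com/Page1.aspx";

        // 1. GET the page to obtain the hidden state fields.
        string html;
        using (var reader = new StreamReader(
            WebRequest.Create(url).GetResponse().GetResponseStream()))
            html = reader.ReadToEnd();

        // 2. Rebuild the POST body that __doPostBack would have submitted.
        string body =
            "__EVENTTARGET=" + HttpUtility.UrlEncode("lnkNext") +
            "&__EVENTARGUMENT=" +
            "&__VIEWSTATE=" + HttpUtility.UrlEncode(HiddenField(html, "__VIEWSTATE")) +
            "&__EVENTVALIDATION=" + HttpUtility.UrlEncode(HiddenField(html, "__EVENTVALIDATION"));

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.AllowAutoRedirect = true; // follow the Response.Redirect for us
        byte[] bytes = Encoding.UTF8.GetBytes(body);
        using (var stream = request.GetRequestStream())
            stream.Write(bytes, 0, bytes.Length);

        // 3. After the redirect is followed, ResponseUri points at Page2.aspx.
        using (var response = (HttpWebResponse)request.GetResponse())
            Console.WriteLine(response.ResponseUri);
    }
}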
If you mean whether search engines like Google will crawl the pages, that is very unlikely. They might attempt to follow some common patterns of posting and script linking, but generally you need proper links between pages to be sure they are crawlable.
I have 3 asp.net pages: Search.aspx, Results.aspx and Login.aspx.
Now I want anonymous users to be able to search, so everyone can use search.aspx. This page calls
Server.Transfer("Results.aspx")
and therefore shows the results. Now when the user is not logged in, a link to the login page is displayed on the Results page. The problem is:
After the login, I want the user to be redirected automatically to the Results page. However, I have not succeeded, as the PreviousPage property of Login.aspx is always null. And when I use
Request.UrlReferrer.LocalPath
it gives the link to Search.aspx, not Results.aspx.
Also, when the user is on the Results page, how do I let him go back by clicking a link, with all his search criteria (e.g. textbox values) on Search.aspx still in place, so he can refine the search after seeing the results? Whenever I send the user back, all the input is lost.
I still haven't figured out whether I should use a normal hyperlink or a LinkButton, or how to retrieve the previous page URL and preserve the input data.
I can use AJAX if that is any help, but a "pure" asp.net solution would be preferred.
When you do a Server.Transfer, the transfer happens entirely on the server, meaning the client still sees the original URL. You need to post the search criteria to the results page, store them, and show the login link. Once the user has logged in, you can redirect to the results page, rehydrate the search criteria, and show the results.
Try using Response.Redirect("url") instead of Server.Transfer. With Server.Transfer, the URL on the client remains the Search page; the browser never actually navigates to Results.aspx.
You can use User.Identity.IsAuthenticated to check if the user is logged in and show/hide the login button based on that.
To keep values for the page, you could store them in Session, then have the Search page look for them and, if they exist, place them in the controls.
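A minimal sketch of that (the control and key names are invented):

// Search.aspx.cs: stash the criteria before handing off to the results page
Session["SearchTerm"] = txtSearch.Text;
Server.Transfer("Results.aspx");

// Search.aspx.cs, Page_Load: rehydrate when the user comes back
if (!IsPostBack && Session["SearchTerm"] != null)
    txtSearch.Text = (string)Session["SearchTerm"];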
You can also embed the URL you want to return to after login in the querystring.
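Something along these lines (names assumed; validate the return URL in real code so you don't create an open redirect):

// Results.aspx.cs: build the login link with a return address
lnkLogin.NavigateUrl = "Login.aspx?ReturnUrl=" + Server.UrlEncode(Request.RawUrl);

// Login.aspx.cs: after a successful login, go back where the user came from
string returnUrl = Request.QueryString["ReturnUrl"] ?? "~/Default.aspx";
Response.Redirect(returnUrl);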
We are using standard ASP.NET forms authentication. Certain pages require the user to be logged in, and at least some of these pages are delivered over HTTPS. There is a search control at the top of each page; when it is used, we don't care whether the user's session has expired, even if the current page requires a login.
However, currently, when a search is performed, the built-in forms authentication sees that the page being posted to requires authentication and redirects the user to the login page, with the previous page, rather than the search results page, as the referrer.
What is the best way of bypassing the security here? I have considered posting to a different page using the PostBackUrl property, but if that page is not served over HTTPS you get the "you are posting data to an unsecure connection" message, which users don't like.
Thanks for any help.
Edit: thanks Nick for your suggestion of using a GET on the search page. We are doing this already, but the query string is constructed by the search input control, which then redirects. How can we build up the query string without a postback? (Obviously JavaScript is an option, but I was hoping for an alternative mechanism.)
For the search page, you want to make sure the search happens via a GET request (i.e. like Google, with the "q" in the query string). Chances are you are doing a POST.
So change your
<form method="post" ...>
to
<form method="get" ...>
The biggest mistake most developers make with search pages is to use a postback. HTTP was designed to do queries and searches through the query string (thus the name), and to make a form submit its values in the query string instead of the request body you need the GET method. This way any search device can use your search page, even the browser's search box.
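On the receiving side, the results page just reads the query string, something like this (the "q" name and BindResults are only illustrative):

protected void Page_Load(object sender, EventArgs e)
{
    string query = Request.QueryString["q"];
    if (!String.IsNullOrEmpty(query))
        BindResults(query); // hypothetical method that runs the actual search
}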
Second, you want to create a special location config for your search page. You add this to your web.config:
<location path="my-search-page.aspx">
  <system.web>
    <authorization>
      <allow users="*" />
    </authorization>
  </system.web>
</location>
This creates an override for that one page; everything inside the location tag uses the exact same structure as the rest of web.config.
You will want to repeat this for each page that all users should be allowed to access.
If the search is performed via a postback, the Page_Load event fires before your search button's click handler. So if the page the user is on requires a login, that login check runs before the search button's click event, sending them back to the login screen.
There are a few ways around this. One is to make the search a normal HTML form and have it perform a GET rather than a POST, as mentioned by Nick.
Or, if the whole page sits inside a single .NET postback form, you will need to hook the search button's handling into an earlier stage of the page load so it fires first.
This site has a good article on the page life cycle and its overrides:
http://www.15seconds.com/issue/020102.htm
As suggested in other answers, the most correct way to do this would be to put the search input control in a separate form with a method of GET and an action of searchresults.aspx. However, this is difficult with .aspx pages, as you can only have one server-side form on a page.
In the end the solution I came to, which works very well, was an HttpModule that spots whether the "search" button was clicked (by checking whether a parameter with its id exists), builds up a query string from the criteria parameters, and redirects to the search results page. This means all the authentication/authorisation modules are bypassed, because we have already issued a redirect to the (unsecured) search results page before they are triggered.
It's slightly brittle but for us it works very well.
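For anyone wanting to try the same trick, a rough sketch of such a module (the control id suffixes and results URL are invented, and registration in <httpModules> is omitted):

using System;
using System.Web;

public class SearchInterceptModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        // BeginRequest fires before the authentication/authorization modules.
        app.BeginRequest += OnBeginRequest;
    }

    private static void OnBeginRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;

        // WebForms prefixes posted control names with their naming containers,
        // so match on the id suffix rather than an exact name.
        bool searchClicked = false;
        string criteria = "";
        foreach (string key in context.Request.Form.AllKeys)
        {
            if (key == null) continue;
            if (key.EndsWith("btnSearch")) searchClicked = true;
            if (key.EndsWith("txtSearchTerm")) criteria = context.Request.Form[key];
        }
        if (!searchClicked) return;

        // Ending the request here means the auth modules never see the POST.
        context.Response.Redirect(
            "/search-results.aspx?q=" + HttpUtility.UrlEncode(criteria), true);
    }

    public void Dispose() { }
}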