How to manipulate a .NET ASPX form programmatically? - asp.net

I'm trying to manipulate a .net ASP form on a site that's using AJAX Control Toolkit. The site is only accessible to valid logins, and I do have a valid account. It consists of a search page with a form. Each time a submit button is clicked on the form, the server is updated using the values of some text fields on the form, and then the VIEWSTATE and EVENTVALIDATION tokens will be updated based on the response from the server, ready for the next request.
I'm using HttpClient in Java to do this. I suspect there's something I'm not doing correctly with regard to interacting with ASPX forms in general.
When I hit the main search page for the first time (cookies are validating my login with the server), I get the HTML for the search page back. I extract the VIEWSTATE and EVENTVALIDATION tokens for the next request. I've examined the exact form fields and their values that need to be sent to the server in a POST by looking at the Chrome debugger utility after making a request on the site manually. I've replicated them exactly as they should be, inserting the VIEWSTATE and EVENTVALIDATION appropriately.
But the response I get back from the server is not what it should be. What I get back is just the same HTML for the main search page that I get the first time I hit the webpage. The form data I'm using looks like this:
ctl00$ScriptManager1:ctl00$ContentPlaceHolder1$UpdatePanel1|ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$acceptButton
ctl00_ContentPlaceHolder1_TabContainer1_ClientState:{"ActiveTabIndex":0,"TabState":[true,true]}
__EVENTTARGET:
__EVENTARGUMENT:
__LASTFOCUS:
__VIEWSTATE:<token extracted from first page hit>
__VIEWSTATEENCRYPTED:
__EVENTVALIDATION:<token extracted from first page hit>
ctl00$ContentPlaceHolder1$LabelFee:0
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$RadioButtonList1:Person
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$snameText:aSurname
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$HiddenField1:
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$fnameText:aFirstname
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$dayFromTextBox:01
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$monthFromTextBox:January
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$yearFromTextBox:2001
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$dayToTextBox:01
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$monthToTextBox:January
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$yearToTextBox:2008
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$DropDownList1:aCity
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$PropText:
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel2$RefText:
__ASYNCPOST:true
ctl00$ContentPlaceHolder1$TabContainer1$TabPanel1$acceptButton:Accept
I've also tried replicating the headers that the Chrome debugger shows, so my request is including the same Content-Type, Host, Origin, Referer, User-Agent (for my browser) and every other header, including this header X-MicrosoftAjax: Delta=true.
I know there are a lot of moving parts here, but I intentionally haven't mentioned how I'm actually making the POST request with the HttpClient lib because I don't want to complicate the question any further or alienate anyone who doesn't know Java but knows ASP. I'd like to know if there's an ASP issue I'm not addressing, but I can post the Java code if necessary.
Edit:
I've checked the debugging info that HttpClient is outputting just before sending the request, and the form data is being added properly as multi-part form data. The headers are all there too.

This answer is a long shot, but I've seen weirder things.
You mention this header:
X-MicrosoftAjax: Delta=true
I did some deep googling and found that this is often shown as all lower case in dumps of Ajax and UpdatePanel POST requests:
x-microsoftajax: Delta=true
See here and here.
Could it be as simple as not casing the header correctly?

I eventually got this working. The problem was not specific to ASP; it was a problem with how Java (specifically HttpClient) was sending the request. I was using HttpClient to build the request as multipart form data, but after using Fiddler to analyse and compare the requests sent from both my application and the actual webpage (see the edited part of this question for more details on that), I found that my app's request was structured very differently.
The real website request had the form options embedded in the request body in what looked like a URL encoded query string. My request was a series of entries in the request body where each option was wrapped in the Content-Type and Content-Disposition headers. The requests succeeded after changing the POST to add the parameters like:
request.setEntity(new UrlEncodedFormEntity(paramList));
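To make the difference concrete, here is a minimal sketch of what a URL-encoded form body looks like, written with only the Java standard library rather than Apache HttpClient. The class name `FormBody` and its `encode` method are made up for illustration; `UrlEncodedFormEntity` does the equivalent work inside HttpClient.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of HttpClient: shows the body format only.
final class FormBody {
    /** Encodes fields as application/x-www-form-urlencoded: name=value pairs joined by '&'. */
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Names need encoding too: ASP.NET control names contain '$'.
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

The whole form ends up as a single string in the request body, sent with a Content-Type of application/x-www-form-urlencoded, instead of one multipart section per field.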

Related

ASP.NET form scraping not working

I'm trying to scrape some pages on a website that uses ASPX forms. The forms involve adding details of people by updating the server (one person at a time) and then proceeding to a results page that shows information regarding the specified people. There are 5 steps to the process:
1) Hit the login page (the site is HTTPS) by sending a POST request with my credentials. The response will contain cookies that will be used to validate all subsequent requests.
2) Hit the search criteria page by sending a GET request (no parameters). The only purpose of this is to discover the __VIEWSTATE and __EVENTVALIDATION tokens in the HTML response to be used in the next step.
3) Update the server with a person. This involves hitting the same webpage as in step 2 but using a POST request with form parameters that correspond to the form controls on the page for adding person details and their values. The form parameters will include the __VIEWSTATE and __EVENTVALIDATION tokens gained from the previous step. The server response will include a new __VIEWSTATE and __EVENTVALIDATION. This step can be repeated using the new __VIEWSTATE and __EVENTVALIDATION, or can proceed to the next step.
4) Signal to the server that all people have been added. This involves hitting the same page as the previous 2 steps by sending a POST request with form parameters that correspond to the form controls on the page for signalling that all people have been added. The server response will simply be 25|pageRedirect||/path/to/results.aspx|.
5) Hit the search results page specified in the redirect response from the previous step by sending a GET request (no parameters - cookies are enough). The server response will be the HTML that I need to scrape.
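For reference, the responses to the async POSTs in steps 3 and 4 use the ASP.NET AJAX delta format, a sequence of length|type|id|content| segments in which the length prefix is the character count of the content. Below is a rough sketch of a parser for that format. The class name `DeltaResponse` is made up, and the sample strings in the usage note are invented with lengths that match their content; real responses should be checked against a captured trace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for "length|type|id|content|" segments of a delta response.
final class DeltaResponse {
    static final class Segment {
        final String type;
        final String id;
        final String content;

        Segment(String type, String id, String content) {
            this.type = type;
            this.id = id;
            this.content = content;
        }
    }

    static List<Segment> parse(String body) {
        List<Segment> segments = new ArrayList<>();
        int pos = 0;
        while (pos < body.length()) {
            int lenEnd = body.indexOf('|', pos);
            int typeEnd = body.indexOf('|', lenEnd + 1);
            int idEnd = body.indexOf('|', typeEnd + 1);
            if (lenEnd < 0 || typeEnd < 0 || idEnd < 0) {
                throw new IllegalArgumentException("Truncated segment at offset " + pos);
            }
            // The content may itself contain '|', so read it by length, not by splitting.
            int length = Integer.parseInt(body.substring(pos, lenEnd));
            int contentEnd = idEnd + 1 + length;
            if (contentEnd >= body.length() || body.charAt(contentEnd) != '|') {
                throw new IllegalArgumentException("Length prefix does not match content at offset " + pos);
            }
            segments.add(new Segment(
                    body.substring(lenEnd + 1, typeEnd),
                    body.substring(typeEnd + 1, idEnd),
                    body.substring(idEnd + 1, contentEnd)));
            pos = contentEnd + 1;
        }
        return segments;
    }
}
```

Parsing the invented sample 21|pageRedirect||/path/to/results.aspx| yields one segment of type pageRedirect whose content is the redirect path. In step 3, the new tokens would be expected in segments of type hiddenField with ids __VIEWSTATE and __EVENTVALIDATION.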
If I follow the process manually with any browser, filling in the form controls and clicking the buttons etc. (testing with just one person), I get to the results page and the results are fine. If I do this programmatically from an application running on my machine, then ultimately the search results HTML is wrong (the page returns valid HTML, but there are no results compared with the browser version, and there are some null values where there should not be).
I've run this using a Java application with Apache HttpClient handling the requests. I've also tried it using a Ruby script with Mechanize handling the requests. I've set up a proxy server using Charles to intercept and examine all 5 HTTPS requests. Using Charles, I've scrutinized the raw requests (headers and body) and made comparisons between requests made using a browser and requests made using the application(s). They are all identical (except for the VIEWSTATE / EVENTVALIDATION values and session cookie values, which I would expect to differ).
A few additional points about the programmatic attempts:
The login step returns successful data, and the cookies are valid (otherwise the subsequent requests would all fail)
Updating the server with a person (step 3) returns successful responses, in that they are the same as would be returned from interaction using a browser. I can only assume this must mean the server is updating successfully with the person added.
A custom header, X-MicrosoftAjax: Delta=true, is being added to requests in step 3 (just as the browser requests do)
I don't own or have access to the server I'm scraping
Given that my application requests are identical to the browser requests that succeed, it baffles me that the server is treating them differently somehow. I can't help but feel that this is an ASP.net issue with forms that I'm overlooking. I'd appreciate any help.
Update:
I went over the raw requests again a bit more methodically, and it turns out I was missing something in the form parameters of the requests. Unfortunately, I don't think it will be of much use to anyone else, because it would seem to be specific to this particular ASP server's logic.
The POST request that notifies the server that all people have been added (step 4) requires two form parameters specifying the county and address of the last person that was added to the search. I was including these form parameters in my request, but the values were empty strings. I figured the browser request was just snagging these values because when the user hits the Continue button on the form, those controls would have the values of the last person added. I figured they wouldn't matter and forgot about them, but I was wrong.
It's a peculiar issue that I should have caught the first time. I can't complain though, I am scraping a site after all.
Review the Charles logs again. It is possible that the search results and other content are coming over via Ajax, and that your Java/Ruby apps are not actually making all of the requests that the browser makes. Look for any POST or GET requests in between the requests you are already duplicating. If the search results are populated via JavaScript, your client app may not be able to handle this.

How does Backbone send a PUT and PATCH request to server

This question and many other documents state that sending a PUT request directly from a form in the browser is impossible for security reasons.
However, what I am seeing in Backbone is that it can still send a PUT request directly from the browser without a workaround such as adding a hidden form field.
This is confusing to me. Is there anything that I'm missing here?
A form can only send a GET or a POST request, as set in the method attribute.
However, Backbone delegates its requests to jQuery.ajax by default (or whatever you want via Backbone.ajax) which itself wraps XMLHttpRequest, an object that can send PUT/DELETE/PATCH requests.
From https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest
XMLHttpRequest is a JavaScript object that was designed by Microsoft
and adopted by Mozilla, Apple, and Google. It's now being standardized
in the W3C. It provides an easy way to retrieve data from a URL
without having to do a full page refresh. A Web page can update just a
part of the page without disrupting what the user is doing.
XMLHttpRequest is used heavily in AJAX programming.
many documents have stated that sending a PUT request directly via browser is impossible due to security reason
Citation please.
Backbone sends a PUT just like it sends any other request, with jQuery,
Backbone.ajax({
type: 'PUT'
...
});
It is just some server-side languages, like PHP, that have problems with receiving a PUT request.
The hidden form field is used when posting from a <form>. Backbone uses JavaScript.

Javascript to handle a form return

I have a standard html registration page that targets an external .asp page on submit.
Currently the .asp page (which I don't have access to) returns an entire HTML page.
Instead I would like to somehow parse the returned html and populate the existing form with either
a) validation errors if incorrect
or
b) some sort of success message if all validated
Can anyone tell me if this is possible and or help with some pseudo code?
This is doable in JavaScript using Ajax, but it requires that the ASP page (presumably on a different domain) sends appropriate CORS HTTP headers. Even if you don't have access to the actual ASP page, you may be able to get someone to set up the headers in IIS on their server.
Otherwise, you're stuck moving everything server-side, i.e. simulating the POST on your own webserver, and scraping the HTML to get the status back. That looks something like:
Postback the page to your own page (or use Ajax)
On your server, initiate a web request post of the data to the ASP page
Parse the results in your server code
Return an appropriate response to the browser client
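The server-side steps above could be sketched roughly as follows, assuming a Java backend and the standard java.net.http client (Java 11+). The class name `RegistrationRelay`, the `ERROR_MARKER` string and the two result labels are all hypothetical; the real marker has to be found by inspecting the HTML the ASP page actually returns.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical relay helper: builds the outgoing POST and classifies the returned HTML.
final class RegistrationRelay {
    // Assumed marker for a validation failure; replace after inspecting the real page.
    static final String ERROR_MARKER = "class=\"error\"";

    /** Builds the POST that forwards the already URL-encoded form data to the ASP page. */
    static HttpRequest buildPost(URI aspPage, String urlEncodedBody) {
        return HttpRequest.newBuilder(aspPage)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(urlEncodedBody))
                .build();
    }

    /** Reduces the scraped HTML to a short status the browser-side code can act on. */
    static String summarize(String returnedHtml) {
        return returnedHtml.contains(ERROR_MARKER) ? "validation-error" : "success";
    }
}
```

The request would then be sent with something like HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), and the result of summarize returned to the browser. A plain substring check is about as fragile as scraping gets, so it will break if the ASP page's markup changes.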
The best you can do, assuming I'm interpreting your question correctly, is "scrape" the HTML returned from the asp page and make proper assumptions about the location and meaning of the text within the markup. I, personally, would strongly advise against developing anything of any kind of robustness based on what amounts to screen scraping, especially considering you don't have access to the .asp file itself. If I've misunderstood your problem, my apologies.

How to trigger HTTP PUT request without Ajax?

For a static link on a web page, the browser issues a GET or POST request to the web site, depending on whether a form with parameters is attached.
However, I want the browser to issue a PUT request for that link. How can I do that? I know that Ajax could do it, but I don't want to use Ajax.
I want the browser to issue a PUT request for that link
It seems that PUT and DELETE are currently unsupported in HTML forms, according to this submission to the W3C.
I know that Ajax could do it
Not always true. Because PUT and DELETE are at times unsupported by some browsers, ajax cannot consume them without making a dummy param to trigger a real PUT or DELETE server side, which gives the illusion of full HTTP support by ajax.
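The "dummy param" workaround mentioned above is usually handled on the server by treating a POST that carries an override field as the intended method. A minimal sketch of that server-side decision, assuming the field is called `_method` (a common convention in some frameworks, but an assumption here) and using a made-up class name `MethodOverride`:

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical server-side helper: resolves the method a POSTed form really intends.
final class MethodOverride {
    private static final Set<String> ALLOWED = Set.of("PUT", "PATCH", "DELETE");

    static String effectiveMethod(String httpMethod, Map<String, String> formParams) {
        // "_method" is an assumed field name; only POST requests may be overridden.
        String override = formParams.get("_method");
        if ("POST".equals(httpMethod) && override != null) {
            String candidate = override.toUpperCase(Locale.ROOT);
            if (ALLOWED.contains(candidate)) {
                return candidate;
            }
        }
        return httpMethod;
    }
}
```

The browser still sends an ordinary POST; only the server's routing treats it as a PUT or DELETE.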

HTTPClient to simulate form submission on ASPX - Invalid viewstate

I am trying to simulate a form submission on an ASP.NET site.
The flow of the website when accessed in a browser is as follows:
1) In a browser the user visits http://mysite.com/ which is configured with Basic Authentication
2) Upon correct credentials, the user is shown a form with one input text box and a button (URL stays http://mysite.com/ but the form being served is Default.aspx)
3) User enters some text and presses submit...
4) The page reloads... URL is still http://mysite.com/... but there is a timer which triggers after 10 secs and downloads a file from http://mysite.com/Downloader
I am trying to simulate this flow in my program using HTTPClient.
1) Do a GET on http://mysite.com
2) Extract hidden form fields __EVENTVALIDATION and __VIEWSTATE
3) Create a POST request with the above two and the other form fields, and POST it to http://mysite.com. This results in an Invalid Viewstate exception.
How do I achieve this in HTTPClient?
The usual way to do this is as follows: First, record the HTTP traffic using WireShark or Fiddler while you are using the website from the browser. Second, analyze the packet trace in detail, and collect every HTTP header and every HTTP payload from every GET and POST message sent by the browser. Third, try to send the same messages from your code. After sending an HTTP request, you will have to analyze the response of the server, and extract all pieces of data you need to insert into the next request. Don't forget to set the referer field, for example. Add each request to your code one by one, and record the traffic when you run the code. If you assemble your HTTP requests correctly, then your request packets should look like the requests of the browser.
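As one example of the "extract the data you need for the next request" step, here is a rough sketch that pulls the ASP.NET hidden fields out of the HTML returned by the GET. It assumes the inputs are rendered with the name attribute before the value attribute and with double-quoted values, which should be verified against the actual page; the class name `HiddenFields` is made up. A real HTML parser would be more robust than a regex.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects __VIEWSTATE, __EVENTVALIDATION and similar hidden fields.
final class HiddenFields {
    // Assumes name="..." appears before value="..." inside the same <input> tag.
    private static final Pattern HIDDEN_INPUT = Pattern.compile(
            "<input[^>]*\\bname=\"(__[A-Z]+)\"[^>]*\\bvalue=\"([^\"]*)\"");

    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = HIDDEN_INPUT.matcher(html);
        while (matcher.find()) {
            fields.put(matcher.group(1), matcher.group(2));
        }
        return fields;
    }
}
```

Every field returned here should be posted back unchanged, together with the visible form fields, using the same cookies and credentials as the GET.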
I'm in the same scenario, I have to create a POST request to an external ASPX page.
I have captured the traffic using Fiddler and tried to simulate the call using an online POST request tool like https://www.codepunker.com
I have not been able to recreate the request...
In my opinion (and this requires time) we have to:
Create a basic webrequest to the source form
Collect all the form elements with value
Create a POST request submitting all the elements including VIEWSTATE
NOTE: you may need to use a web client that accepts cookies; see "Accept Cookies in WebClient?"
Good luck
