Submit ASP.NET forms in parallel - asp.net

I need to submit one form multiple times in parallel. The server accepts the parameter _ASYNCPOST.
I can explain in an abstract way how the page works:
Login
Submit form search (POST)
POST same form with new data (all these need to be done in parallel)
In the last step, I yield all the requests with every parameter I could find (including __VIEWSTATE, __EVENTTARGET, etc.).
The problem is that the first post works, but the rest return an error saying "The server data does not match the browser data, hit refresh"
Is what I'm trying to achieve possible?
I followed this doc https://blog.scrapinghub.com/2016/04/20/scrapy-tips-from-the-pros-april-2016-edition/
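Here is roughly what that last step looks like in my spider (a simplified sketch; apart from the ASP.NET hidden fields, the form field names are illustrative, not the real ones):

import scrapy
from scrapy.http import FormRequest

class SearchSpider(scrapy.Spider):
    name = 'search'
    start_urls = ['https://example.com/Search.aspx']  # placeholder for the real page

    def parse(self, response):
        # One FormRequest per search term; these are the posts I want to run in parallel.
        # from_response() copies the hidden fields (__VIEWSTATE, etc.) from the page.
        for term in ['foo', 'bar', 'baz']:
            yield FormRequest.from_response(
                response,
                formdata={
                    '_ASYNCPOST': 'true',        # the parameter the server accepts
                    'ctl00$txtSearch': term,     # illustrative search field name
                    'ctl00$btnSearch': 'Search', # illustrative submit button name
                },
                callback=self.parse_results,
                dont_filter=True,
            )

    def parse_results(self, response):
        pass  # parse each result page here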

Related

How to eliminate false success messages when implementing post-redirect-get pattern?

When implementing the post-redirect-get pattern in a web application, it is common for the final step in your server code to look something like this (pseudocode):
if (postSuccessful)
{
redirect("/some-page?success=true")
}
That is, the redirect URL has some kind of success parameter in the query string so that you know when to display a nice looking "Your form has been submitted!" message on your page. The problem with this is that the success=true persists in the query string when it's only needed to initialize the page. If the user refreshes the page or bookmarks it, they will receive a false success message even though no additional POST has taken place.
Is there an elegant solution to this that doesn't involve using JavaScript to eliminate success=true from both the query string and the browser history? This solution works, but definitely adds complexity to a page's load process.
You can use server-side technology to implement this feature, without any JavaScript. The steps are listed below:
When the post is successful, redirect to /some-page with the current timestamp:
if (postSuccessful)
{
redirect("/some-page?success=true&timestamp=1559859090747")
}
When the server receives the GET /some-page?success=true&timestamp=1559859090747 request, compare the timestamp parameter with the current timestamp and check whether it is within the last 3 seconds (you can adjust this number according to the network environment).
If the timestamp parameter is within the last 3 seconds, then this GET /some-page?success=true request is the result of a server redirect. If not, then it is more likely the result of the user refreshing or bookmarking the page.
In the server code that handles GET /some-page, render different HTML according to the result of the previous step: display the success message only when the current access is the result of a server redirect.
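For illustration, here is a minimal sketch of the idea in Python, assuming a Flask-style application and a placeholder render_page helper; only the timestamp check matters:

import time
from flask import Flask, request, redirect

app = Flask(__name__)
WINDOW_MS = 3000  # treat redirects up to 3 seconds old as fresh

def render_page(show_success):
    # Placeholder for whatever template rendering you actually use.
    return 'Your form has been submitted!' if show_success else 'Some page'

@app.route('/some-page', methods=['POST'])
def handle_post():
    # ... act on the posted data ...
    return redirect('/some-page?success=true&timestamp=%d' % int(time.time() * 1000))

@app.route('/some-page')
def show_page():
    try:
        ts = int(request.args.get('timestamp', '0'))
    except ValueError:
        ts = 0
    fresh = abs(int(time.time() * 1000) - ts) <= WINDOW_MS
    show_success = request.args.get('success') == 'true' and fresh
    return render_page(show_success)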

Extract part of an URL behind a login page with Paw

I'm a newbie, but I think Paw can do what I need:
I need to extract a session id behind a login page.
I go to https://admin.booking.com, fill in the form (login and password), and the landing page behind it includes a session id:
https://admin.booking.com/pc/index.html?ses=xxxxyyyyyzzzzz11112222233333
I'd like to:
1) push credentials with Paw as part of my request,
2) get the above item (ses) from the response so I can use the PHP script extension provided by Paw and then call this script "on demand".
Is this possible? If so, what should I do?
Thanks for your help
UPDATE: we've added a documentation article to describe the process a little more: Login via a web form in Paw. We've detailed the process to deal with CSRF tokens too.
Paw isn't quite ready yet for handling web/HTML forms. However, there is a way to do it right: if you inspect the form with the Chrome dev tools, you'll find the names of the inputs in the DOM/HTML.
In your case, you have the inputs: loginname, password, lang.
Also, find the <form…> tag to see what its action attribute is. If there's no action attribute (as in your example), the target URL for your form is the current page's URL (https://admin.booking.com/ in your case). Make sure method="POST" is also present in the <form…> tag, otherwise this approach won't work.
Then jump into Paw and set:
URL (in your case https://admin.booking.com/)
method to POST
go to the Body tab, choose "Form URL-Encoded", and fill in the fields from your form
If all works, you'll see Paw show a redirection request, and if you go to the right-hand side panel under "Response" > "Headers", you should see a Location header with a value similar to the URL you initially mentioned (https://admin.booking.com/pc/index.html?ses=xxxxyyyyyzzzzz11112222233333). Hurray! You got your value into Paw!
Now that you have that, you can create a new request (click on the + button at the bottom of the left-hand side list). Wherever you want to use this session token/ID, you can insert a dynamic value to retrieve that URL value. There's more info in our docs, but I'll describe the steps here:
On whichever field you want to insert the token, right-click and pick Responses > Response Header.
Make sure you pick the first request in the "Request" dropdown menu, and enter Location in the "Header" field:
You should see the value of the Location header of the previous response appear here.
Now what you want to do is to extract only the part you want (i.e. the value of the ses param in your case). For that you'll need that extension for Paw, so please install it now: https://luckymarmot.com/paw/extensions/RegExMatch
Copy the dynamic value you have just inserted (the blue token), and right-click on that field to insert a new dynamic value, and pick Extensions > RegExp match:
In the Input field, paste the dynamic value you just copied, and use the RegExp field to write a regular expression that extracts the part of the URL you want (in your case, ses=(.*) should work).
Now you're set up. You should be able to use this new little blue token wherever you like to automagically extract the value from the previous response. And whenever you send the initial request again and get a new token, everything else will update too! :)
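If you ever need to do the same extraction outside of Paw, here is a quick sketch of the equivalent match in Python, using the example Location value from above:

import re

location = 'https://admin.booking.com/pc/index.html?ses=xxxxyyyyyzzzzz11112222233333'
match = re.search(r'ses=(.*)', location)
if match:
    print(match.group(1))  # -> xxxxyyyyyzzzzz11112222233333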
It was a bit of a long guide, but I hope it will help you, and hopefully others too.

How to expose a validation API in a RESTful way?

I'm generally a fan of RESTful API design, but I'm unsure of how to apply REST principles for a validation API.
Suppose we have an API for querying and updating a user's profile info (name, email, username, password). We've deemed that a useful piece of functionality to expose would be validation, e.g. query whether a given username is valid and available.
What are the resource(s) in this case? What HTTP status codes and/or headers should be used?
As a start, I have GET /profile/validate which takes query string params and returns 204 or 400 if valid or invalid. But validate is clearly a verb and not a noun.
The type of thing you've described is certainly more RPC-style in its semantics, but that doesn't mean you can't reach your goals in a RESTful manner.
There's no VALIDATE HTTP verb, so how much value can you get from structuring an entire API around that? Your story centers around providing users with the ability to determine whether a given user name is available - that sounds to me like a simple resource retrieval check - GET: /profile/username/... - if the result is a 404, the name is available.
What this highlights is that client-side validation is just that - client side. It's a UI concern to ensure that data is validated on the client before being sent to the server. A RESTful service doesn't give a whit whether or not a client has performed validation; it will simply accept or reject a request based on its own validation logic.
REST isn't an all-encompassing paradigm, it only describes a way of structuring client-server communications.
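To make that concrete, here is a small sketch of the availability check from a client's point of view; the host name and the requests library are just for illustration:

import requests  # third-party HTTP client, used here only as an example

def username_available(name):
    # 404 means no such profile resource exists, i.e. the name is available.
    resp = requests.get('https://api.example.com/profile/username/%s' % name)
    return resp.status_code == 404

print(username_available('alice'))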
We have also encountered the same problem. Our reasoning for having the client defer to the server for validation was to prevent having mismatched rules. The server is required to validate everything prior to acting on the resources. It didn't make sense to code these rules twice and have this potential for them to get out of sync. Therefore, we have come up with a strategy that seems to keep with the idea of REST and at the same time allows us to ask the server to perform the validation.
Our first step was to implement a metadata object that can be requested from a metadata service (GET /metadata/user). This metadata object is then used to tell the client how to do basic client side validations (requiredness, type, length, etc). We generate most of these from our database.
The second part consist of adding a new resource called an analysis. So for instance, if we have a service:
GET /users/100
We will create a new resource called:
POST /users/100/analysis
The analysis resource contains not only any validation errors that occurred, but also statistical information that might be relevant if needed. One of the issues we have debated was which verb to use for the analysis resource. We have concluded that it should be a POST as the analysis is being created at the time of the request. However, there have been strong arguments for GET as well.
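To make this concrete, here is a sketch of what a call to the analysis resource might look like from a client; the payload and the response fields are illustrative, not our exact schema:

import requests

# POST the candidate representation to the analysis resource; the server
# reports validation errors (and statistics) without modifying /users/100.
candidate = {'username': 'jdoe', 'email': 'not-an-email'}
resp = requests.post('https://api.example.com/users/100/analysis', json=candidate)
analysis = resp.json()
for error in analysis.get('errors', []):  # e.g. [{"field": "email", "message": "..."}]
    print(error['field'], '-', error['message'])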
I hope this is helpful to others trying to solve this same issue. Any feedback on this design is appreciated.
You are confusing REST with resource orientation; there's nothing in REST that says you cannot use verbs in URLs. When it comes to URL design, I usually choose whatever is most self-descriptive, whether that is a noun or a verb.
For your service, what I would do is use the same resource you use for updates, but with a test query-string parameter: when test=1 the operation is not performed, but you can use it to return validation errors.
PATCH /profile?test=1
Content-Type: application/x-www-form-urlencoded
dob=foo
... and the response:
HTTP/1.1 400 Bad Request
Content-Type: text/html
<ul class="errors">
<li data-name="dob">foo is not a valid date.</li>
</ul>
A very common scenario is a user or profile signup form with a username and email that should be unique. An error message is usually displayed on blur of the textbox to let the user know that the username already exists or that the email they entered is already associated with another account. There are a lot of options mentioned in other answers, but I don't like the idea of treating a 404 as "the username doesn't exist, therefore it's valid", or waiting for submit to validate the entire object, and returning metadata for validation doesn't help with checking uniqueness.
Imo, there should be a GET route that returns true or false for each field that needs to be validated.
/users/validation/username/{username}
and
/users/validation/email/{email}
You can add any other routes with this pattern for any other fields that need server side validation. Of course, you would still want to validate the whole object in your POST.
This pattern also allows for validation when updating a user. If the user focuses on the email textbox and then clicks out so the blur validation fires, slightly different validation is necessary: it's ok if the email already exists, as long as it's associated with the current user. You can use these GET routes, which also return true or false.
/users/{userId:guid}/validation/username/{username}
and
/users/{userId:guid}/validation/email/{email}
Again, the entire object would need to be validated in your PUT.
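For illustration, a minimal sketch of those GET routes using a Flask-style server; the two lookup helpers are hypothetical stand-ins for real data access:

from flask import Flask, jsonify

app = Flask(__name__)

def username_exists(username):
    # Hypothetical helper; replace with a real lookup against your user store.
    return False

def email_owner(email):
    # Hypothetical helper; return the owning user's id, or None if the email is unused.
    return None

@app.route('/users/validation/username/<username>')
def validate_username(username):
    # true when the username is not taken
    return jsonify(not username_exists(username))

@app.route('/users/<user_id>/validation/email/<email>')
def validate_email_for_user(user_id, email):
    # true when the email is unused or already belongs to this user
    owner = email_owner(email)
    return jsonify(owner is None or owner == user_id)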
It is great to have the validation in the REST API. You need validation anyway, so why not use it from the client side as well? In my case I just have a convention in the API: a special error_id represents validation errors, and error_details contains an array of error messages for each field that has errors in the PUT or POST call. For example:
{
  "error": true,
  "error_id": 20301,
  "error_message": "Validation failed!",
  "error_details": {
    "number": [
      "Number must not be empty"
    ],
    "ean": [
      "Ean must not be empty",
      "Ean is not a valid EAN"
    ]
  }
}
If you use the same REST API for a web and a mobile application, you will like the ability to change validation in both just by updating the API. Especially since mobile updates can take more than 24 hours to get published on the app stores.
And this is how it looks in the mobile application: the response of the PUT or POST is used to display the error messages for each field. It is the same call from a web application using React. This way all the REST API response codes like 200 and 404 keep the meaning they should have. A PUT call responds with 200 even if validation fails. If the call passes validation, the response would look like this:
{
  "error": false,
  "item": {
    "id": 1,
    "created_at": "2016-08-03 13:58:11",
    "updated_at": "2016-11-30 08:55:58",
    "deleted_at": null,
    "name": "Artikel 1",
    "number": "1273673813",
    "ean": "12345678912222"
  }
}
There are possible modifications you could make. Maybe use it without an error_id. If there are error_details, just loop over them, and if you find a key that has the same name as a field, put its value as the error text for that field.

Validate Origin of FORM POST to ensure it came from same server/app

I want to find a platform/language-agnostic solution for ensuring that the origin of a FORM POST is an expected source, i.e. Page1.aspx posting to Page2.php within the same web site.
Specifically what I am attempting to do here is to prevent request forgery.
Use a hidden field in your form, which contains a token your app generated. Store the token in the user session. When the form is submitted, your app will check that the value of the hidden field is identical to the value stored in the user session.
If it is identical, then you know the submitted form comes from where it is expected to come.
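A minimal sketch of that token round-trip, using a Flask-style session purely as an example of "user session" storage:

import secrets
from flask import Flask, session, request, abort

app = Flask(__name__)
app.secret_key = 'change-me'  # placeholder; use a real secret in production

@app.route('/form')
def show_form():
    token = secrets.token_hex(16)
    session['csrf_token'] = token  # store the token in the user session
    return ('<form method="POST" action="/submit">'
            '<input type="hidden" name="csrf_token" value="%s">'
            '...</form>' % token)

@app.route('/submit', methods=['POST'])
def submit():
    # Reject the post if the hidden field does not match the session value.
    if request.form.get('csrf_token') != session.get('csrf_token'):
        abort(403)
    return 'OK'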
Old Thread, but might still be useful.
If you do not have session info set (best option), you can include a hidden field with an encrypted timestamp, then compare it (after decrypting) to the current time on the processing end to make sure it is relatively close, and thus as recent as you deem necessary.
You could include into the form a hidden field which would be the SHA1Hash("some-secret" + Remote_IP + PerSessionSecret).
The PerSessionSecret is something you autogenerate at the beginning of the session. "some-secret" is a global secret value, which helps a little in case the randomly generated PerSessionSecret turns out not to be random enough.
Then do the same calculation upon form submission, and you know it was most probably submitted from the same client it was sent to. (Of course, if you have multiple clients behind a single address, like a proxy or a NAT, you cannot distinguish between them reliably.)
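A short sketch of that hidden-field calculation in Python; the secret values are placeholders:

import hashlib

GLOBAL_SECRET = 'some-secret'  # placeholder global secret

def form_token(remote_ip, per_session_secret):
    # The same calculation is done when rendering the form and when it is submitted.
    data = GLOBAL_SECRET + remote_ip + per_session_secret
    return hashlib.sha1(data.encode('utf-8')).hexdigest()

# When rendering: embed form_token(client_ip, session_secret) in a hidden field.
# On submission: recompute it and compare with the submitted hidden-field value.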

how to submit query to .aspx page in python

I need to scrape query results from an .aspx web page.
http://legistar.council.nyc.gov/Legislation.aspx
The url is static, so how do I submit a query to this page and get the results? Assume we need to select "all years" and "all types" from the respective dropdown menus.
Somebody out there must know how to do this.
As an overview, you will need to perform four main tasks:
to submit request(s) to the web site,
to retrieve the response(s) from the site
to parse these responses
to have some logic to iterate in the tasks above, with parameters associated with the navigation (to "next" pages in the results list)
The HTTP request and response handling is done with methods and classes from the Python standard library's urllib and urllib2 modules. The parsing of the HTML pages can be done with the standard library's HTMLParser or with other modules such as Beautiful Soup.
The following snippet demonstrates the requesting and receiving of a search at the site indicated in the question. The site is ASP-driven, and as a result we need to ensure that we send several form fields, some of them with 'horrible' values, as these are used by the ASP logic to maintain state and to authenticate the request to some extent. The requests have to be sent with the HTTP POST method, as this is what this ASP application expects. The main difficulty is identifying the form fields and associated values that ASP expects (getting pages with Python is the easy part).
This code is functional, or more precisely, was functional, until I removed most of the VSTATE value, and possibly introduced a typo or two by adding comments.
import urllib
import urllib2
uri = 'http://legistar.council.nyc.gov/Legislation.aspx'
#the http headers are useful to simulate a particular browser (some sites deny
#access to non-browsers (bots, etc.)
#also needed to pass the content type.
headers = {
'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml; q=0.9,*/*; q=0.8',
'Content-Type': 'application/x-www-form-urlencoded'
}
# we group the form fields and their values in a list (any
# iterable, actually) of name-value tuples. This helps
# with clarity and also makes it easy to later encoding of them.
formFields = (
# the viewstate is actually 800+ characters in length! I truncated it
# for this sample code. It can be lifted from the first page
# obtained from the site. It may be ok to hardcode this value, or
# it may have to be refreshed each time / each day, by essentially
# running an extra page request and parse, for this specific value.
(r'__VSTATE', r'7TzretNIlrZiKb7EOB3AQE ... ...2qd6g5xD8CGXm5EftXtNPt+H8B'),
# following are more of these ASP form fields
(r'__VIEWSTATE', r''),
(r'__EVENTVALIDATION', r'/wEWDwL+raDpAgKnpt8nAs3q+pQOAs3q/pQOAs3qgpUOAs3qhpUOAoPE36ANAve684YCAoOs79EIAoOs89EIAoOs99EIAoOs39EIAoOs49EIAoOs09EIAoSs99EI6IQ74SEV9n4XbtWm1rEbB6Ic3/M='),
(r'ctl00_RadScriptManager1_HiddenField', ''),
(r'ctl00_tabTop_ClientState', ''),
(r'ctl00_ContentPlaceHolder1_menuMain_ClientState', ''),
(r'ctl00_ContentPlaceHolder1_gridMain_ClientState', ''),
#but then we come to fields of interest: the search
#criteria the collections to search from etc.
# Check boxes
(r'ctl00$ContentPlaceHolder1$chkOptions$0', 'on'), # file number
(r'ctl00$ContentPlaceHolder1$chkOptions$1', 'on'), # Legislative text
(r'ctl00$ContentPlaceHolder1$chkOptions$2', 'on'), # attachment
# etc. (not all listed)
(r'ctl00$ContentPlaceHolder1$txtSearch', 'york'), # Search text
(r'ctl00$ContentPlaceHolder1$lstYears', 'All Years'), # Years to include
(r'ctl00$ContentPlaceHolder1$lstTypeBasic', 'All Types'), #types to include
(r'ctl00$ContentPlaceHolder1$btnSearch', 'Search Legislation') # Search button itself
)
# these have to be encoded
encodedFields = urllib.urlencode(formFields)
req = urllib2.Request(uri, encodedFields, headers)
f= urllib2.urlopen(req) #that's the actual call to the http site.
# *** here would normally be the in-memory parsing of f
# contents, but instead I store this to file
# this is useful during design, allowing to have a
# sample of what is to be parsed in a text editor, for analysis.
try:
    fout = open('tmp.htm', 'w')
    fout.writelines(f.readlines())
    fout.close()
except IOError:
    print('Could not open output file\n')
That's about it for getting the initial page. As said above, one would then need to parse the page, i.e. find the parts of interest and gather them as appropriate, and store them to file/database/wherever. This job can be done in very many ways: using HTML parsers, XSLT-type technologies (after parsing the HTML to XML), or even, for crude jobs, simple regular expressions. Also, one of the items one typically extracts is the "next" info, i.e. a link of sorts that can be used in a new request to the server to get subsequent pages.
This should give you a rough flavor of what "long hand" HTML scraping is about. There are many other approaches to this, such as dedicated utilities, scripts for Mozilla's (Firefox) GreaseMonkey plug-in, XSLT...
Most ASP.NET sites (the one you referenced included) will actually post their queries back to themselves using the HTTP POST verb, not the GET verb. That is why the URL is not changing as you noted.
What you will need to do is look at the generated HTML and capture all of its form values. Be sure to capture all the form values, as some of them are used for page validation, and without them your POST request will be denied.
Other than the validation, an ASPX page in regards to scraping and posting is no different than other web technologies.
Selenium is a great tool to use for this kind of task. You can specify the form values that you want to enter and retrieve the html of the response page as a string in a couple of lines of python code.
Using Selenium you might not have to do the manual work of simulating a valid post request and all of its hidden variables, as I found out after much trial and error.
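For example, a short sketch with Selenium; the element names are the ones listed in the earlier answer, so treat them as approximate and verify them against the live page:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

driver = webdriver.Firefox()
driver.get('http://legistar.council.nyc.gov/Legislation.aspx')

# Fill in the search text and the two dropdowns, then submit the form.
driver.find_element(By.NAME, 'ctl00$ContentPlaceHolder1$txtSearch').send_keys('york')
Select(driver.find_element(By.NAME, 'ctl00$ContentPlaceHolder1$lstYears')).select_by_visible_text('All Years')
Select(driver.find_element(By.NAME, 'ctl00$ContentPlaceHolder1$lstTypeBasic')).select_by_visible_text('All Types')
driver.find_element(By.NAME, 'ctl00$ContentPlaceHolder1$btnSearch').click()

html = driver.page_source  # the response page as a string, ready for parsing
driver.quit()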
The code in the other answers was useful; I never would have been able to write my crawler without it.
One problem I did come across was cookies. The site I was crawling was using cookies to log session id/security stuff, so I had to add code to get my crawler to work:
Add this import:
import cookielib
Init the cookie stuff:
COOKIEFILE = 'cookies.lwp' # the path and filename that you want to use to save your cookies in
cj = cookielib.LWPCookieJar() # This is a subclass of FileCookieJar that has useful load and save methods
Install CookieJar so that it is used as the default CookieProcessor in the default opener handler:
cj.load(COOKIEFILE)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
To see what cookies the site is using:
print 'These are the cookies we have received so far :'
for index, cookie in enumerate(cj):
print index, ' : ', cookie
This saves the cookies:
cj.save(COOKIEFILE) # save the cookies
"Assume we need to select "all years" and "all types" from the respective dropdown menus."
What do these options do to the URL that is ultimately submitted?
After all, it amounts to an HTTP request sent via urllib2.
To know how to do '"all years" and "all types" from the respective dropdown menus', do the following.
Select '"all years" and "all types" from the respective dropdown menus'
Note the URL which is actually submitted.
Use this URL in urllib2.
