Testing htmlencode issue in asp.net application - asp.net

In our application developed in html5 and javascript whenever a user submits(with text contain < and > and #) the form (containing a comments text field )we get the following error:
{"Message":"A potentially dangerous Request.Form value was detected from the client ......
Now the dev team has fixed this issue saying that they handled this at the server side.
Now i want to test different scenarios just to ensure that this issue wont repeat next time and for any other special characters .
Can anyone suggest me the different scenarios i can test here apart from entering the special characters in the comments text box and submitting the form?

Related

Pass value WinForm =>(page.html?query123)=> HTML =>(page.aspx?query123)=> asp.net //esle?

The problem here is the middle of the line (HTML).
The chain:
I have WinForm program that uses awesomium (alternative to native webBrowser) to view Html page that has a part of asp.net page in it's iframe.
The problem:
The problem is that I need to pass value to asp.net page, it is easily achieved without middle of the chain (Html iframe) by sending hashed and crypted querystring.
How it works:
WinForm do some thing, then use few-step-crypt to code all the needed values into 1 string.
Then it should send this string to asp.net page through the iframe (and that's the problem, it is easy to receive query string in asp.net page, but firstly I need to receive it in Html and send to asp.net).
Acceptable answers:
1) Probably the most easily one - using JavaScript. I have heard it is possible to be done in that way.
How I imagine this - I send query string from WinForm to Html page as http:\\HtmlPage.html?AspNet.aspx?CryptedString
Then Html receive it with JavaScript and put querystring "AspNet.aspx?CryptedString" into iframe's "src=http:\\" resulting in "src=http:\\AspNet.aspx?CryptedString"
And then I easily get it in asp.net page.
2) Somehow create >>>VIRTUAL<<<(NOTE: Virtual, I don't want querystring to be saved on the HDD, even don't suggest) asp.net or html page with iframe source taken directly from WinForm string.
Probably that is possible with awesomium, but I'm new to it and don't know how to (if it is possible ofc).
3) Some web service with which I can communicate between asp.net and WinForm through the existing HTML iframe.
4) Another way that replace one of 3 previous, that doesn't save "values" in querystring/else on HDD nor is visible for the user, doesn't use asp.net page's server to create iframe-page on it. On HTML page's server HTML is only allowed, PhP isn't.
5) If you don't know any of 4 above - suggest free PhP hosting without ads (if such exists, what I highly doubt).
Priority:
The best one would be #3, then #2, then #1, then #5 (#4 is excluded as it is unknown).
And in the end:
Thanks in advance for your help.
P.S.Currently at work, so I'll check/try all answers later on and will report tomorrow if any suits my needs. Thanks again.
Answering my own question. I have found 2 ways that can do what I did want.
The first one:
Creating a RAM file System.IO.MemoryStream or another method (google c# create a file in ram).
The second one:
Creating a hidden+encrypted+system+custom-readable-only-by-program-crypt file somewhere in the far away folder via File.SetAttributes Method and System.IO.StreamWriter/Reader or System.IO.FileStream or System.IO.TextWriter, etc. depending on what it should be.
Once this file was used for needs delete it + delete on exit + delete on start using
if (File.Exists(path)
{
File.Delete(path);
}
(Need more reputation to post few links -_-, and I don't want to post only part of them, either all or no at all, so use google if you'll need anything from here).
If you'll need to store "Small temp file" and not for a long time use first one, if "Heavy" use second one, unless you badly need to use RAM for it.

Request.Form treats Newlines differently in TextArea in ASP .NET

I have two pages which have text area controls on them. When the user submits one page, newlines are treated as char(13) + char(10). But on the other page, newlines are treated as char(10). I've confirmed this by looking at the Request.Form dictionary.
The two pages are hosted in the same ASP .NET 4.0 Web Forms application, and the pages look exactly the same from a markup perspective. I'm logged in as the same user in the same browser.
When I use JavaScript to check for the existence of char 10 and char 13 in the control in the browser, both pages only include a char(10).
It seems as if IIS/ASP.NET is configured to handle form requests differently on the two pages, but I can't figure out where the difference would be. What causes this behavior?
Different operating system use different combinations of characters to represent a new line.
On Windows it is CR + LF on Linux its LF and on Macs its CR.
CR = Carriage Return
LF = Line Feed
You can see the line ending characters if you copy / paste the text in Notepad++ and select View > Show all characters.

ASP.NET - Strange case of inexplicable 404 error

We are having a very strange problem on one particular web server (we do not have direct access to the web server, only FTP access).
Our ASP.NET application displays a dataset into a standard GridView. One of the columns in the GridView is a basic template column, with a link redirecting to another page - passing few parameters.
One of the parameters is EmployeeName - and the following page uses that parameter to set a label.
ON this particular web server (WEBSERVER1 in this example)... the resulting link generates an error 404 (page not found)
https://WWW.WEBSERVER1.COM/Customer_011B.aspx?WeekEnding=1/21/2012&GUID=n.a.&EmployeeName=Knutson-Haushalter, Kathleen&ReportToName=Mary Jo Eayrs&Assignment_Id=123772
On another web server (WEBSERVER2 in this example)... the resulting link properly opens the page.
http://WWW.WEBSERVER2.COM/Customer_011B.aspx?WeekEnding=1/21/2012&GUID=n.a.&EmployeeName=Knutson-Haushalter, Kathleen&ReportToName=Mary Jo Eayrs&Assignment_Id=123772
(unfortunately the links above are not rendered correctly
Yes, I am aware that WEBSERVER1 is running under SSL - but am not sure why this would make a difference.
Now, we have verified that the page Customer_011B.aspx is indeed present on WEBSERVER1.
Here comes the puzzle:
If we only remove the EmployeeName parameter, the page displays correctly. All database operations are performed correctly, etc. The only "problem" is that the EmployeeName is not reported in the target label.
In other words:
This DOES NOT work and all we get is error 404
https://WWW.WEBSERVER1.COM/Customer_011B.aspx?WeekEnding=1/21/2012&GUID=n.a.&EmployeeName=Knutson-Haushalter, Kathleen&ReportToName=Mary Jo Eayrs&Assignment_Id=123772
This DOES work and we get to the page and we retrieve all the needed data.
https://WWW.WEBSERVER1.COM/Customer_011B.aspx?WeekEnding=1/21/2012&GUID=n.a.&ReportToName=Mary Jo Eayrs&Assignment_Id=123772
Just in case you are wondering, the only parameter needed by our data access layer is that Assignment_Id number.
Also, note that I enclosed the links in double quotes... so that they would render properly...
Use the UrlEncode and UrlDecode to place the parametres on your url. I see that you use spaces and slash and commas. Parametres with slash/space/comma and other invalid url characters maybe cut or change by enabled url filter on one of the iis server.

how to submit query to .aspx page in python

I need to scrape query results from an .aspx web page.
http://legistar.council.nyc.gov/Legislation.aspx
The url is static, so how do I submit a query to this page and get the results? Assume we need to select "all years" and "all types" from the respective dropdown menus.
Somebody out there must know how to do this.
As an overview, you will need to perform four main tasks:
to submit request(s) to the web site,
to retrieve the response(s) from the site
to parse these responses
to have some logic to iterate in the tasks above, with parameters associated with the navigation (to "next" pages in the results list)
The http request and response handling is done with methods and classes from Python's standard library's urllib and urllib2. The parsing of the html pages can be done with Python's standard library's HTMLParser or with other modules such as Beautiful Soup
The following snippet demonstrates the requesting and receiving of a search at the site indicated in the question. This site is ASP-driven and as a result we need to ensure that we send several form fields, some of them with 'horrible' values as these are used by the ASP logic to maintain state and to authenticate the request to some extent. Indeed submitting. The requests have to be sent with the http POST method as this is what is expected from this ASP application. The main difficulty is with identifying the form field and associated values which ASP expects (getting pages with Python is the easy part).
This code is functional, or more precisely, was functional, until I removed most of the VSTATE value, and possibly introduced a typo or two by adding comments.
import urllib
import urllib2
uri = 'http://legistar.council.nyc.gov/Legislation.aspx'
#the http headers are useful to simulate a particular browser (some sites deny
#access to non-browsers (bots, etc.)
#also needed to pass the content type.
headers = {
'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml; q=0.9,*/*; q=0.8',
'Content-Type': 'application/x-www-form-urlencoded'
}
# we group the form fields and their values in a list (any
# iterable, actually) of name-value tuples. This helps
# with clarity and also makes it easy to later encoding of them.
formFields = (
# the viewstate is actualy 800+ characters in length! I truncated it
# for this sample code. It can be lifted from the first page
# obtained from the site. It may be ok to hardcode this value, or
# it may have to be refreshed each time / each day, by essentially
# running an extra page request and parse, for this specific value.
(r'__VSTATE', r'7TzretNIlrZiKb7EOB3AQE ... ...2qd6g5xD8CGXm5EftXtNPt+H8B'),
# following are more of these ASP form fields
(r'__VIEWSTATE', r''),
(r'__EVENTVALIDATION', r'/wEWDwL+raDpAgKnpt8nAs3q+pQOAs3q/pQOAs3qgpUOAs3qhpUOAoPE36ANAve684YCAoOs79EIAoOs89EIAoOs99EIAoOs39EIAoOs49EIAoOs09EIAoSs99EI6IQ74SEV9n4XbtWm1rEbB6Ic3/M='),
(r'ctl00_RadScriptManager1_HiddenField', ''),
(r'ctl00_tabTop_ClientState', ''),
(r'ctl00_ContentPlaceHolder1_menuMain_ClientState', ''),
(r'ctl00_ContentPlaceHolder1_gridMain_ClientState', ''),
#but then we come to fields of interest: the search
#criteria the collections to search from etc.
# Check boxes
(r'ctl00$ContentPlaceHolder1$chkOptions$0', 'on'), # file number
(r'ctl00$ContentPlaceHolder1$chkOptions$1', 'on'), # Legislative text
(r'ctl00$ContentPlaceHolder1$chkOptions$2', 'on'), # attachement
# etc. (not all listed)
(r'ctl00$ContentPlaceHolder1$txtSearch', 'york'), # Search text
(r'ctl00$ContentPlaceHolder1$lstYears', 'All Years'), # Years to include
(r'ctl00$ContentPlaceHolder1$lstTypeBasic', 'All Types'), #types to include
(r'ctl00$ContentPlaceHolder1$btnSearch', 'Search Legislation') # Search button itself
)
# these have to be encoded
encodedFields = urllib.urlencode(formFields)
req = urllib2.Request(uri, encodedFields, headers)
f= urllib2.urlopen(req) #that's the actual call to the http site.
# *** here would normally be the in-memory parsing of f
# contents, but instead I store this to file
# this is useful during design, allowing to have a
# sample of what is to be parsed in a text editor, for analysis.
try:
fout = open('tmp.htm', 'w')
except:
print('Could not open output file\n')
fout.writelines(f.readlines())
fout.close()
That's about it for the getting of the initial page. As said above, then one would need to parse the page, i.e. find the parts of interest and gather them as appropriate, and store them to file/database/whereever. This job can be done in very many ways: using html parsers, or XSLT type of technogies (indeed after parsing the html to xml), or even for crude jobs, simple regular-expression. Also, one of the items one typically extracts is the "next info", i.e. a link of sorts, that can be used in a new request to the server to get subsequent pages.
This should give you a rough flavor of what "long hand" html scraping is about. There are many other approaches to this, such as dedicated utilties, scripts in Mozilla's (FireFox) GreaseMonkey plug-in, XSLT...
Most ASP.NET sites (the one you referenced included) will actually post their queries back to themselves using the HTTP POST verb, not the GET verb. That is why the URL is not changing as you noted.
What you will need to do is look at the generated HTML and capture all their form values. Be sure to capture all the form values, as some of them are used to page validation and without them your POST request will be denied.
Other than the validation, an ASPX page in regards to scraping and posting is no different than other web technologies.
Selenium is a great tool to use for this kind of task. You can specify the form values that you want to enter and retrieve the html of the response page as a string in a couple of lines of python code.
Using Selenium you might not have to do the manual work of simulating a valid post request and all of its hidden variables, as I found out after much trial and error.
The code in the other answers was useful; I never would have been able to write my crawler without it.
One problem I did come across was cookies. The site I was crawling was using cookies to log session id/security stuff, so I had to add code to get my crawler to work:
Add this import:
import cookielib
Init the cookie stuff:
COOKIEFILE = 'cookies.lwp' # the path and filename that you want to use to save your cookies in
cj = cookielib.LWPCookieJar() # This is a subclass of FileCookieJar that has useful load and save methods
Install CookieJar so that it is used as the default CookieProcessor in the default opener handler:
cj.load(COOKIEFILE)
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
To see what cookies the site is using:
print 'These are the cookies we have received so far :'
for index, cookie in enumerate(cj):
print index, ' : ', cookie
This saves the cookies:
cj.save(COOKIEFILE) # save the cookies
"Assume we need to select "all years" and "all types" from the respective dropdown menus."
What do these options do to the URL that is ultimately submitted.
After all, it amounts to an HTTP request sent via urllib2.
Do know how to do '"all years" and "all types" from the respective dropdown menus' you do the following.
Select '"all years" and "all types" from the respective dropdown menus'
Note the URL which is actually submitted.
Use this URL in urllib2.

Input type "hidden" vs text area

I'm having a weird issue with an input type hidden and was wondering if anyone has ever seen something like this before. I'm saving about 2MB of data to a hidden field, in a comma separated format, then I'm posting that data to a jsp that simply sets some headers (so the output is recognized as an excel file) and then echoes the data.
I'm seeing that the variable that holds this data gets empty to the jsp side, even though I see that it's getting posted to the server (I'm seeing it with an HTTP sniffer) and all data seems to be contained correctly in the hidden field (I'm seeing that with firebug). However, if I change the object type to be a text area, the data is received correctly on the server's side.
Another weird thing I'm observing is that if I use URL encoding on the data, even using a text area, nothing gets to the server. If I don't use URL encoding but I have the hidden field, nothing gets saved to the field (it's empty when I check it with firebug). I don't understand that either...
I'm wondering if there is any special security setting that prevents the hidden fields to post big amounts of data to a Tomcat web server. Does anybody know anything about that?
If it makes any difference, I'm using the default enctype on the form (application/x-www-form-urlencoded)
I'm currently using a text are and setting the style to visibility "hidden" but it bothers me not to understand what's going on *sigh... Any suggestion is appreciated
I think having 2MB of data in a hidden field is a mistake regardless. You should store that kind of thing on the server as part of the session state, not send it back and forth between the server and the user, as you are doing. Instead, use a hidden field or cookie for the session variable*, which will be used to look up the 2MB of data.
*Don't do this by hand. JSP already has support for session state, among other things.
The server can't tell the difference between a textarea and textbox. All form elements are simply posted as name/value pairs.
Most likely, you have a double-quote somewhere in your data that's terminating the value attribute of the hidden input element. For example:
<input type="hidden" value="Double " quote" />
You need to escape the double-quotes by replacing them with "
<input type="hidden" value="Double " quote" />

Resources