HTML scraping POST information using ASP.NET and HtmlAgilityPack

I'm working on a web-scraping project. I know how to get certain data from the target page through HtmlAgilityPack, but I don't know what to send in the POST data for that page. The page does not send any information through the query string. There are three textboxes on the page, two checkboxes and a search button.
When I inspect the button with Firebug, it gives the following information:
<img border="0" align="top" onclick="javascript:PVO_PId_Search(
document.ProtocolForm.searchplt,
document.ProtocolForm.towcheck.checked,
document.ProtocolForm.collateralcheck.checked,
document.ProtocolForm.selState[document.ProtocolForm.selState.selectedIndex].value,
document.ProtocolForm.selPltType[document.ProtocolForm.selPltType.selectedIndex].value)" onmouseout="this.className='flyOut'" onmouseover="this.className='flyOver'" alt="Search" value="PSearch" name="PSearch" onload="javascript:updateButtonWithOneTxtbox(document.ProtocolForm.searchplt,this,'v_images/Search_button.gif','v_images/Search_button_grey.gif');" src="v_images/Search_button.gif" class="flyOut">
Now my question is: is there any tool or Firefox add-in that I can use to monitor (or debug) what POST information the page is sending?

You can use the built-in web developer tools in Chrome, Safari, etc. to inspect all HTTP requests and responses between your client (the browser) and the server (the web site). You'll see them in the Network inspector.
However, unless it's your site, or some worthy educational experiment, whether or not you can actually spoof (yes, that's what it really amounts to) a POST (or GET) to the site depends on whether it has built-in protections or validations against such attempts.
Update:
Just fire up Chrome and press CTRL+SHIFT+I on Windows (in Safari, it's CTRL+ALT+I) and you should see the Network inspector.
Update 2:
And just for reference, if you want network inspection that isn't dependent on a browser, Fiddler is always part of my personal tool kit.
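Once the Network inspector or Fiddler shows the request, the form body is usually a single URL-encoded string. A minimal Python sketch for splitting such a string into readable fields follows. The captured body is hypothetical: the field names are borrowed from the form elements in the question, and the real names and values are whatever PVO_PId_Search actually sends.

```python
from urllib.parse import parse_qsl


def decode_form_body(raw_body):
    """Split an application/x-www-form-urlencoded body into (name, value) pairs."""
    # keep_blank_values=True keeps fields that were submitted empty
    return parse_qsl(raw_body, keep_blank_values=True)


# Hypothetical body copied from a network inspector; not taken from the real site
captured = "searchplt=ABC123&towcheck=true&collateralcheck=false&selState=NY&selPltType=PAS"

for name, value in decode_form_body(captured):
    print(f"{name} = {value!r}")
```

The same name/value pairs are what a scraper would then have to reproduce in its own POST.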

Related

debugging IE9 "only display secure content"

I am getting the IE9 "only display secure content" warning on my page. The page loads a large number of frames etc. that I did not write, so I am looking for a way to get IE9 to show me which insecure (http) page it is being asked to load. Is there a way to do this? I also have access to IE7 and IE8. I don't generally have any other tools on the machines with IE because they are just short-term VMs running Microsoft's testing builds of IE7, 8 and 9.

Open Fiddler, then click Allow in the warning.
Fiddler will show you all of the non-SSL requests.
You can also just look in Firebug's Net tab.
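If no proxy or browser add-on can be installed on the VM, a saved copy of the page can also be checked offline. The following is a rough Python sketch, not a replacement for Fiddler: it only looks at static markup, so anything requested by scripts at runtime will not show up. The class name, the function name and the set of attributes checked are choices made for this example.

```python
from html.parser import HTMLParser


class InsecureResourceFinder(HTMLParser):
    """Collect tags whose URL attribute points at a plain http:// resource."""

    URL_ATTRS = {"src", "href", "action", "data"}

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            return  # ordinary links are navigation, not embedded content
        for name, value in attrs:
            if name in self.URL_ATTRS and value and value.lower().startswith("http://"):
                self.found.append((tag, name, value))


def find_insecure(html):
    """Return (tag, attribute, url) tuples for http:// resources in the markup."""
    finder = InsecureResourceFinder()
    finder.feed(html)
    return finder.found
```

Running find_insecure over the HTML of each frame gives a list of candidates to compare against what Fiddler reports.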

Emulating user browsing session for unit test

I'm searching for a framework that could allow me to emulate user browsing session.
A typical session looks like:
Browse to home page, get session
Be redirected to current page
Click on some link
Get connected
Submit a form
and so on...
I would like to be able to define this session using API calls.
What frameworks would you recommend to be able to run this setup? It should be run headless (not inside the browser), to be able to execute via Hudson.
Language does not matter; Python or Java would be great.
Thank you,
Maxim.

There are multiple frameworks which can do this. Check out:
https://github.com/axefrog/XBrowser
http://htmlunit.sourceforge.net/
and the answer to this question:
Alternative to HtmlUnit

Have a look at HtmlUnit.
It even has decent JavaScript support, and it's Java based.
Support for the HTTP and HTTPS protocols
Support for cookies
Ability to specify whether failing responses from the server should throw exceptions or should be returned as pages of the appropriate type (based on content type)
Support for submit methods POST and GET (as well as HEAD, DELETE, ...)
Ability to customize the request headers being sent to the server
Support for HTML responses
Wrapper for HTML pages that provides easy access to all information contained inside them
Support for submitting forms
Support for clicking links
Support for walking the DOM model of the HTML document
Proxy server support
Support for basic and NTLM authentication
Excellent JavaScript support

Take a look at Selenium WebDriver with Xvfb.
This post shows an example in Python:
'Python - Headless Selenium WebDriver Tests using PyVirtualDisplay'
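For sessions that do not depend on JavaScript, the same idea can be sketched with nothing but the Python standard library. This is not HtmlUnit or Selenium, only a cookie-aware HTTP client, and it executes no scripts. The paths "/some-link" and "/login" and the "user" field are placeholders invented for the example.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def new_session():
    """Return an opener that keeps cookies between requests, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def form_post(url, fields):
    """Build (but do not send) a form-encoded POST request."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def run_session(base_url):
    """Walk a hypothetical session: home page, a link, then a form submit."""
    opener, jar = new_session()
    # Home page: the server sets its session cookie; redirects are followed
    opener.open(base_url + "/", timeout=10).close()
    # "Click" a link by requesting its URL (placeholder path)
    opener.open(base_url + "/some-link", timeout=10).close()
    # Submit a form (placeholder path and field)
    with opener.open(form_post(base_url + "/login", {"user": "maxim"}), timeout=10) as resp:
        return resp.status, [cookie.name for cookie in jar]
```

Because it runs without a browser or display, a script like this can be called from a Hudson job directly. Pages that build their content with JavaScript still need one of the frameworks above.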

How can I debug this Internet Explorer issue?

I have a web application (ASP.NET C# for .NET 3.5) that uses the Session object to store, among other little things, the debug information, so when things go wrong this is the first place to look.
The process is actually simple:
no matter which browser I use (except IE), when I navigate to a page the Debug Log contains data, like the one shown in this screenshot:
http://www.balexandre.com/temp/2010-04-14_1048.png
The problem is that in Internet Explorer the Debug Log is always blank (blank as in no information, not as in no HTML code):
http://www.balexandre.com/temp/2010-04-14_1051.png
What can I do?
I tried several Security settings of IE8:
add the site (machine name) to Trusted Sites
disable Protected Mode
set Local intranet security level to LOW
set Accept All Cookies under Privacy
checked the Allow Active Content under Advanced tab
I really don't know what more I can do :-(
Any help is greatly appreciated!

You could try using Fiddler, a web debugging proxy, to check the traffic between IE and your site. Also, if you can, try other versions of IE on different machines or networks to see whether it's a global problem or just related to one browser. And don't forget you can hit F12 to open the developer console in IE.

Flex application bookmarking problem/"#" at end of url

I work in an area where the business users depend heavily on bookmarks to access their work-related web applications. Our standard browser is Internet Explorer 6. We have a new Flex application. When you add the site to Internet Explorer Favorites and later try to access the site through the Favorites link, you get the following error message: "internet explorer cannot open the internet site http://our url. Operation aborted". If we then bring up the properties for the link and remove the trailing "#" from the URL, the link works.
What is this trailing "#", and can it be removed? Is there a way to have Internet Explorer bookmarking to work for this site (other than manually editing the bookmark)? The problem doesn't occur in Firefox (but not everyone has access to that browser).

The trailing # is used to provide information to your client-side framework. It was originally meant to provide the ability to link to anchor points in an HTML document. It has been "hijacked" by JavaScript frameworks to provide state information to Flash and Flex applications.
The primary benefit of using # to navigate is that the browser doesn't navigate off the current page - meaning you only need to load your framework once. Traditional URLs would force an entire page reload.
Most likely you can't remove it. You should be able to provide a means for a secondary URL scheme that encodes what you need in a query string (?foobar=1).
You will need to configure server-side processing to either redirect the user to the hash URL or load the necessary information via a JavaScript hook to your Flex framework.
You might also look into the new Google Chrome plugin for IE.
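The difference between the fragment and the query string mentioned above can be illustrated with Python's URL parser. This is only a sketch of the URL handling; the example.com URLs and the "view=orders" state are made up. The part after "#" stays in the browser and is not sent to the server, which is why state has to move into the query string before server-side code can see it.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_fragment(url):
    """Return the URL without its '#...' part, including a bare trailing '#'."""
    return urlunsplit(urlsplit(url)._replace(fragment=""))


def fragment_to_query(url):
    """Move state from the fragment into the query string so a server can read it."""
    parts = urlsplit(url)
    if not parts.fragment:
        return strip_fragment(url)
    # Append to an existing query string if there is one
    query = parts.query + "&" + parts.fragment if parts.query else parts.fragment
    return urlunsplit(parts._replace(query=query, fragment=""))


print(strip_fragment("http://example.com/app/#"))
print(fragment_to_query("http://example.com/app/#view=orders"))
```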

You can turn this off in the compiler parameters in Flex Builder. Go into the project settings, then in "Flex Compiler" uncheck the box that says "Enable integration with browser navigation".

Freeware Plugin to View HTML source generated by ASP.NET?

Are there any freeware plugins that would help me view the HTML Source generated by ASP.NET?

Microsoft's Fiddler2 for IE,
or Firebug for Firefox.
With these you see the real source generated by ASP.NET, not the mangled source shown by a browser's 'view source' menu option.

The Internet Explorer Developer Toolbar has many features. The Web Developer add-on for Firefox looks slick. Here is a walkthrough of using another add-on for Firefox.
If what you're looking for is just to view the source, all browsers I am familiar with have that feature built in. Internet Explorer

You can use Internet Explorer's View Source button, under the Edit menu. Firefox has something similar under the View menu.
Edit: If you're looking for the source code for the application, you won't be able to see that no matter what you do. The server sends the client only what it wants the client to see. For ASP.NET, this means you'll see ASP.NET generated control IDs and the like. If you want to do this on your own without a web browser, try Wget.
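As an alternative to Wget, the HTML that the server sends can be saved with a few lines of Python. This is a minimal sketch using only the standard library; the function name is made up for the example, and like Wget it returns the markup as delivered, before any client-side JavaScript has changed it.

```python
import urllib.request


def fetch_source(url, timeout=10):
    """Return the response body for a URL as text, exactly as the server sent it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Use the charset from the Content-Type header, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling fetch_source with the address of an ASP.NET page returns the generated HTML, including the ASP.NET control IDs mentioned above.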

You shouldn't have to look when that function is already built in to just about every single web browser out there. View Source is a standard feature.

Installing the Web Developer extension for Firefox will let you view the 'generated' source (i.e. it includes changes made to the HTML by client-side JavaScript etc.). Otherwise the standard 'view source' option available in any browser should suffice.
