Webscraping a tricky asp.net page - asp.net

The overall goal is to perform a search on the following webpage http://www.cma-cgm.com/eBusiness/Tracking/Default.aspx with a container value of CMAU1173561. I have tried two approaches, the php extension cURL and python's mechanized. The php approached involves a performing a POST submit using the input fields found on the page (NOTE: These are really ugly on the asp.net page). The returned page does not contain any of the search results. The second approaches involves using python's mechanize module. In this approach I load the page, select the form, then change the text field ctl00$ContentPlaceBody$TextSearch to the container value. When I load the response again no search results.
I am at a really dead end. Any help would be appreciate because as it stands my next step is to become a asp.net expertm which i perfer not to.

The source of that page is pretty scary (giant viewstate, tables all over the place, inline CSS, styles that look like they were copied from Word).
Regardless...an ASP.Net form still passes the same raw data to the server as any other form (though it is abstracted to the developer).
It's very possible that you are missing the cookies which go along with the request. If the search page (or any piece of the site) uses session state, the ASP.Net session cookie must be included in the request. You will be able to tell it from its name (contains "asp.net" and "session").
I assume that you have used a tool like Firebug or Chrome to view the complete outgoing request when the page is submitted. From my quick test, it looks like the request may be performed with a GET, not a POST. I submitted a form, looked at the request, and pasted the URL into a new browser window.
Example: http://www.cma-cgm.com/eBusiness/Tracking/Default.aspx?ContNum=CMAU1173561&T=57201202648
This may be all you need to do.

Related

How to make form like example

there is this site https://www.delinski.at/ and it has a nice form where you can pick some values from dropdowns like Date, Number of Persons etc., and then submit the form. It redirects and I see the values on the redirected page link as parameters (if I have changed the defaults).
I searched for and tried several Form Plugins which all do not seem to work - most recent one (Form Maker) lets me design the form as I want but at the end I realized when I click on Submit, the values are not transfered to the target page (confirmed by Form Maker Support as work as intended). It's confusing because actually that should be a basic funciontality of a HTML form, right?
So I want to know if there are plugins where I can get a similar look&feel like the example given above.
That site is a Static Site Generated framework not WordPress. That site would also be very expensive to build cause that is all coded, and very well:)
You are not actually seeing a form there at all that is just how PHP natively uses the URL to navigate via a button.
Almost all the form plug ins for WP use the database write now and do not pass the parameters of the entered form as a php _ POST with a redirect.
I kind of think what you really are looking for is a faceted search feature
One of the best that comes to my mind is https://facetwp.com/demo/cars/?_vehicle_type=truck
Notice the car icons those are actually search buttons:) Of course you will have to build a template to do that neat stuff on the SSG site you linked but...
here is a really informative write upon how it works to get started.

Abusing HTTP POST

Currently reading Bloch's Effective Java (2nd Edition) and he makes a point to state, in bold, that overusing POSTs in web applications is inherently bad. Unfortunately, he doesn't specify why.
This startled me, because when I do any web development, all I ever use are POSTs! I have always steered clear of GETs for security reasons and because it felt more professional (long, unsightly URLs always bother me for some reason).
Are there performance differentials between GET and POST? Can anyone elaborate on why overusing POSTs is bad, and why? My understanding - and preliminary searches - seem to all indicate that these two are handles very similarly by the web server. Thanks in advance!
You should use HTTP as it's supposed to be used.
GET should be used for idempotent, read queries (i.e. view an item, search for a product, etc.).
POST should be used for create, delete or update requests (i.e. delete an item, update a profile, etc.)
GET allows refreshing the page, bookmark it, send the URL to someone. POST doesn't allow that. A useful pattern is post/redirect/get (AKA redirect after post).
Note that, except for long search forms, GET URLs should be short. They should usually look like http://www.foo.com/app/product/view?productId=1245, or even http://www.foo.com/app/product/view/1245
You should almost always use GET when requesting content. Only use POST when you are either:
Transmitting sensitive information which should not appear in the URL bar, or
Changing the state on the server (adding/changing/deleting stuff, altough recently some web applications use POST to change, PUT to add and DELETE to delete.)
Here's the difference: If you want to give the link to the page to a friend, or save it somewhere, or even only add it to your bookmarks, you need the full URL of the page. Just like your address bar should say http://stackoverflow.com/questions/7810876/abusing-http-post at the moment. You can Ctrl-C that. You can save that. Enter that link again, you're back at this page.
Now when you use any action other than GET, there is simply no URL to copy. It's like your browser would say you are at http://stackoverflow.com/question. You can't copy that. You can't bookmark that. Besides, if you would try to reload this page, your browser would ask you whether you want to send the data again, which is rather confusing for the non-tech-savy users of your page. And annoying for the entire rest.
However, you should use POST/PUT when transferring data. URL's can only be so long. You can't transmit an entire blog post in an URL. Also, if you reload such a page, You'll almost certainly double-post, because the above described message does not appear.
GET and POST are very different. Choose the right one for the job.
If you are using POST for security reasons, I might drop a mention of other security factors here. You need to ensure that you send the data from a form submit in encrypted form even if you are using POST.
As for the difference between GET and POST, it is as simple as GET is used to send a GET request. So, you would want to get data from a page and act upon it and that is the end of everything.
POST on the other hand, is used to POST data to the application. I am talking about transactions here (complete create, update or delete operations).
If you have a sensitive application that takes, say and ID to delete a user. You would not want to use GET for it because in that case, a witty user may raise mayhem simply changing the ID at the end of the URL and deleting all random uses.
POST allows more data and can be hacked to send streams of files as well. GET has a limited size though.
There is hardly any tradeoff in using GET or POST.

ASP.NET fragment cache -- control is null second time round

I have an ascx control which works just fine. It is contained in a larger aspx page. I want to put it in the fragment cache, so I added the appropriate CacheOutput directive at the top. However, now the control on the underlying aspx.cs file has the control variable set to null the second time the page has loaded. I found a few places on the web where it said this would happen, but I also didn't find a solution to accessing the control.
What am I missing?
Also, can I control where it is cached? Can I make it cache in the browser cache rather than at the server?
Question #1: Output caching only stores the HTML result on the server. If you want to interact or run any code in the user control at all, you may not use full output caching. You may want to look into a lower-level caching, perhaps database or object caching, or embed another user control within this one that uses full output caching itself but the outer user control no longer does.
Question #2: "Can I control where it is cached?" If you use output caching, then no. That always means cache on the server. However, there are lots of different levels of caching. You can only cache a full HTTP response at the browser: a single HTML page, a CSS file, etc. If you want to cache only part of a page at the browser, but have the rest of the page dynamic, you would have to do it with some kind of JavaScript. Either HTML5 local storage, or via AJAX that has appropriate caching headers or responds with a 304 Not Modified response.
Side note: The term "fragment cache" is more often referred to "partial caching" in the ASP.Net world.
SO Tips: These are two questions, and should really be asked as two individual questions in the future.
Also, there are many ways to solve your problems here; if you provided more context to what you are doing and the performance problem you are trying to solve, we could offer more specific answers.

Setting content expiration on all pages to avoid back button in ASP Classic

Is there a way to set pages to expire in ASP Classic so that the user can't hit back and re-do anything?
Is this a good practice?
If you force the page to 'expire', it would have the opposite effect you want: It would actually force the browser to make the request again (because it's been told the data it has expired)
I suspect you might be barking up the wrong tree here, though. Are the pages that "do stuff" using the Query String values as the parameters to take those actions? In other words, is the page that links to the 'action' page doing so via a regular anchor tag with query string parameters in the URL, or via a form using the GET method?
If so, you should change the form submitting that action to a POST form. Doing that will not only result in a prompt in the browser if the person uses the Back or Refresh buttons to try to reload that page, but also helps protect you against Cross-Site Request Forgery attacks. (more info on XSRF here)
What is the problem that you are trying to solve? If the back button is forcing something to be updated on the server, then you are better off making sure that you don't allow pages to be in the browser history that can cause problems.
After a POST, I often do a Response.Redirect so that the POST is not in the browser history. This helps avoid these types of issues.

iframed ASP actions trouble

This is actually a follow up on my previous question (link)
I've created the HttpHandler and it works fine for now, I'll add flexibility by using the querystring and session to point the post I'm making in the right direction.
The next question is as follows.
Now that I have the old page iframed as it should be, there's still the trouble of handling the postbacks (or actions) these pages trigger.
Every button action (asp form post) refers to a page that is not there (it's on the other server from which I am importing functionality).
I've tried using a url mapping to the other server but I get an error that tells me the external link is not a valid virtual directory. Hence I discarded this option.
I there anyway to keep functionality going inside the iframe?
please do ask clarification if you need it.
I got a solution from a colleague.
before passing the response string to the Iframe from the handler I use a string.replace to adjust the urls in the old site. This way they point to the old site and everything works again :)

Resources