Append conditional HTML file output with XML response - ASP.NET

I have a RESTful XML service where the client passes up the version of the HTML it is currently viewing. If the version on the server is the same as the client's, I just respond with the current server version in XML, for example: <Response ServerHTMLVersion="1" />
However, if the server's HTML version is greater than the client's, I still return the same kind of response, e.g. <Response ServerHTMLVersion="2" />. The problem is that my client application then has to make a separate HTTP request to download the HTML file whenever the version in the response is greater than its own.
For performance reasons I want to cut out that second HTTP request, and I'd like to know the best way to do it. Should I simply encode the HTML to make it XML-safe and append it to the XML response? The problem with that is the HTML is already large, and encoding makes it even larger.
Or is there a better way of managing this? Note that I am already gzipping both responses, the XML as well as the HTML.
I want to handle this with performance in mind. The RESTful XML service is implemented with ASP.NET 3.5 on IIS 7.

Have you thought about using HTTP headers? Since the primary data here is really the HTML, and ServerHTMLVersion is a sort of metadata about that HTML, it should work.
Personally, I'd make the response to the request 1) blank when the versions match and 2) the HTML itself when they don't; then use the Pragma HTTP header to send something like Pragma: "ServerHTMLVersion=2". By doing this, you can easily check whether the client and server versions differ, and just grab the full response body if they're different.
Some people would debate the idea of returning HTML from a REST service, but I personally consider it perfectly valid, and a nice, clean way of separating your metadata from the actual user data.
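To make that concrete, here is a minimal sketch of what the server side could look like as an ASP.NET 3.5 IHttpHandler. GetServerHtmlVersion() and GetServerHtml() are hypothetical helpers standing in for however you actually store the HTML and its version number; the client reads the Pragma header and, if the body is non-empty, swaps in the new HTML.

    // Minimal sketch of the header-based approach as an IHttpHandler.
    // GetServerHtmlVersion() and GetServerHtml() are hypothetical helpers.
    using System;
    using System.Web;

    public class HtmlVersionHandler : IHttpHandler
    {
        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            int clientVersion;
            int.TryParse(context.Request.QueryString["clientVersion"], out clientVersion);

            int serverVersion = GetServerHtmlVersion();
            context.Response.AddHeader("Pragma", "ServerHTMLVersion=" + serverVersion);

            if (serverVersion > clientVersion)
            {
                // Versions differ: the body carries the HTML itself.
                context.Response.ContentType = "text/html";
                context.Response.Write(GetServerHtml());
            }
            // Versions match: empty body; the header alone tells the client it is current.
        }

        private int GetServerHtmlVersion() { return 2; }                 // placeholder
        private string GetServerHtml() { return "<html><body>latest markup</body></html>"; } // placeholder
    }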
-Jerod

Related

Is there a standard way in HTTP to specify no content should be returned?

For a PUT or POST (for example), I would like to specify to the server that I don't want any content returned in the response, even if it normally would. Essentially I'm looking for a way to perform blind inserts/updates, and was trying to avoid unnecessary response payloads if I have no intention of using them.
I thought maybe Accept: none as a request header (or something similar) might be an option, but couldn't find anything to support that.
Is there a standard way to specify this in an HTTP request, or do I have to just live with a little extra content in the response?
I think a minimal response is necessary so the client knows whether the request was handled correctly by the web server or whether there were errors, even if that response contains no data other than the status code and HTTP headers.
That said, you can use the HTTP HEAD method to make a GET-like request whose response has no message body (you get back only the headers). But this, AFAIK, doesn't work for POST or PUT requests.
Regards.
You might be interested in the Prefer header proposal outlined in https://datatracker.ietf.org/doc/html/draft-snell-http-prefer-18.
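For what it's worth, here's a rough client-side sketch of that idea using HttpWebRequest. The URL and JSON body are placeholders, the exact preference token has varied between draft revisions (return-no-content in earlier drafts, return=minimal in later ones), and whether it does anything at all depends entirely on the server.

    // Sketch: ask the server not to return a representation for a PUT.
    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    class BlindPutExample
    {
        static void Main()
        {
            byte[] body = Encoding.UTF8.GetBytes("{\"name\":\"example\"}"); // placeholder payload

            var request = (HttpWebRequest)WebRequest.Create("http://example.com/items/42"); // placeholder URL
            request.Method = "PUT";
            request.ContentType = "application/json";
            request.Headers.Add("Prefer", "return=minimal"); // "return-no-content" in older drafts

            using (Stream s = request.GetRequestStream())
            {
                s.Write(body, 0, body.Length);
            }

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                // A cooperating server replies 204 No Content (or 200 with an empty body).
                Console.WriteLine((int)response.StatusCode);
            }
        }
    }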

What encoding should I use for an HTTP PUT?

I am writing a webserver. I implemented GET and POST (application/x-www-form-urlencoded, multipart/form-data) and that works fine.
I am thinking of adding a RESTful module to the server, so I had a look at what's out there and gathered opinions on when to use PUT, POST, and GET.
My question is: which encodings (application/x-www-form-urlencoded, multipart/form-data) does PUT support per the HTTP specification - just one of them, or both?
I am trying to make the webserver as standards-compliant as I can without shooting myself in the foot.
The limitation to application/x-www-form-urlencoded and multipart/form-data is not in the HTTP standard but in HTML: they are simply the only formats an HTML form can produce. From HTTP's point of view you can use any format, as long as you declare it to the server (via the Content-Type header) and, obviously, the server can understand it. If it can't, it replies with a 415 Unsupported Media Type status code (there's a rough sketch of that below the links).
See:
http://www.w3.org/TR/1999/REC-html401-19991224/interact/forms.html#h-17.13.4
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.16
http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7
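To illustrate the 415 behaviour mentioned above, here is a rough server-side sketch. It uses HttpListener purely for brevity (your own webserver obviously has its own request loop), and the list of accepted media types is just an example.

    // Sketch: accept PUTs only for media types the server knows how to parse,
    // reply 415 Unsupported Media Type otherwise.
    using System;
    using System.Net;

    class PutContentTypeSketch
    {
        static void Main()
        {
            var listener = new HttpListener();
            listener.Prefixes.Add("http://localhost:8080/"); // placeholder prefix
            listener.Start();

            while (true)
            {
                HttpListenerContext ctx = listener.GetContext();
                if (ctx.Request.HttpMethod == "PUT")
                {
                    string contentType = ctx.Request.ContentType ?? "";
                    if (contentType.StartsWith("application/json") ||
                        contentType.StartsWith("application/xml") ||
                        contentType.StartsWith("text/plain"))
                    {
                        // Read ctx.Request.InputStream and store the entity here.
                        ctx.Response.StatusCode = 204; // stored, nothing to return
                    }
                    else
                    {
                        ctx.Response.StatusCode = 415; // Unsupported Media Type
                    }
                }
                ctx.Response.Close();
            }
        }
    }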
An HTTP PUT can have whatever Content-Type the user wishes (the same as for all other HTTP methods).

Will sending an HTTP Header with Accept: text/html only download text from the page?

I'm writing a simple crawler, and ideally to save bandwidth, I'd only like to download the text and links on the page. Can I do that using HTTP Headers? I'm confused about how they work.
You're on the right track to solving the problem.
I'm not sure how much you already know about HTTP headers, but basically an HTTP header is just a formatted string sent to a web server - it follows a protocol - and is pretty straightforward in that respect. You write a request and receive a response. The requests look like what you see in the Firefox plugin LiveHTTPHeaders at https://addons.mozilla.org/en-US/firefox/addon/3829/.
I wrote a small post at my site http://blog.gnucom.cc/2010/write-http-request-to-web-server-with-php/ that shows you how you can write a request to a web server and then later read the response. If you only accept text/html you'll only accept a subset of what is available on the web (so yes, it will "optimize" your script to an extent). Note this example is really low level, and if you're going to write a spider you may want to use an existing library like cURL or whatever other tools your implementation language offers.
Yes, by using Accept: text/html you should only get HTML back as a valid response. That's at least how it ought to be.
But in practice there is a huge difference between the standards and the actual implementations, and proper content negotiation (that's what Accept is for) is one of the things that is barely supported.
An HTML page contains just the text plus some tag markup.
Images, scripts and stylesheets are (usually) external files that are referenced from the HTML markup. This means that if you request a page, you will already receive just the text (without the images and other stuff).
Since you are writing the crawler, you should make sure it doesn't follow URLs from images, scripts or stylesheets.
I'm not 100% sure, but I believe that GET /foobar.png will return the image even if you send Accept: text/html. For this reason I believe you should just filter what kind of URLs you crawl.
In addition, you can try reading the response headers in the crawler and closing the connection before you read the body if the Content-Type is not text/html. That might be worthwhile for avoiding large, unwanted files.
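A small sketch of that "check the headers before reading the body" idea, in C# purely for illustration and with a placeholder URL; other languages have equivalent ways to bail out before consuming the body.

    // Sketch: send Accept: text/html, then skip the body if the response
    // turns out not to be HTML (e.g. an image).
    using System;
    using System.IO;
    using System.Net;

    class CrawlerFetchSketch
    {
        static string FetchHtmlOrNull(string url)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Accept = "text/html";

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                string contentType = response.ContentType ?? "";
                if (!contentType.StartsWith("text/html", StringComparison.OrdinalIgnoreCase))
                {
                    return null; // not HTML: dispose without reading the body
                }
                using (var reader = new StreamReader(response.GetResponseStream()))
                {
                    return reader.ReadToEnd();
                }
            }
        }

        static void Main()
        {
            string html = FetchHtmlOrNull("http://example.com/"); // placeholder URL
            Console.WriteLine(html == null ? "skipped (not HTML)" : "got " + html.Length + " chars");
        }
    }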

HTTP GET content type

I have a program that is supposed to interact with a web server and retrieve a file containing structured data using http and cgi. I have a couple questions:
The CGI script on the server needs to specify a body, right? What should the Content-Type be?
Should I be using POST or GET?
Could anyone tell me a good resource for reading about HTTP?
If you just want to retrieve the resource, I'd use GET. With GET you don't need a Content-Type, since a GET request has no body. As for HTTP itself, I'd suggest you read the HTTP 1.1 specification.
The content-type specified by the server will depend on what type of data you plan to return. As Jim said if it's JSON you can use 'application/json'. The obvious payload for the request would be whatever data you're sending to the client.
From the server's perspective it shouldn't matter that much. In general, if you're not expecting a lot of information from the client, I'd set up the server to respond to GET requests rather than POST requests. An advantage I like is simply being able to specify what I want in the URL (which you can't do if the server expects a POST request).
I would point you to the RFC for HTTP - probably the best source of information. It may not be the most user-friendly way to get your answers, but it should have all the answers you need.
For (1), the Content-Type depends on the structured data. If it's XML you can use application/xml; if it's JSON, application/json; and so on. The Content-Type is set by the server, and your client asks for that type of content using the Accept header. (Try to use existing data-format standards and content types if you can; there's a small CGI sketch at the end of this answer.)
For (2) GET is best (you aren't sending up any data to the server).
I found RESTful Web Services by Richardson and Ruby a very interesting introduction to HTTP. It takes a very strict, but very helpful, view of HTTP.
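As a rough illustration of the CGI side (question 1): a CGI program just writes its headers, a blank line, and then the body to standard output, and the web server turns that into the HTTP response. C# and the JSON payload here are only for the sake of example.

    // Minimal CGI-style response: Content-Type header, blank line, body.
    using System;

    class CgiResponseSketch
    {
        static void Main()
        {
            string body = "{\"items\":[1,2,3]}"; // example payload

            Console.Write("Content-Type: application/json\r\n");
            Console.Write("\r\n");   // blank line ends the headers
            Console.Write(body);
        }
    }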

Is the HTTP 'HEAD' verb useful in web development?

I've read the w3.org spec on the 'HEAD' verb, and I guess I'm missing something. I can't see how it would be useful.
Is the HTTP 'HEAD' verb useful in web development?
If so, how?
From RFC 2616:
This method (HEAD) can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
The reason HEAD is preferred to GET is the absence of the message body in the response, which makes it useful in scenarios where you want to determine whether the content has changed at all - a change in the last-modified time or content length usually signifies this.
Also, a HEAD request will provide some information about the server setup (whether it is IIS, Apache, etc.), unless the server is masked; of course, this is available in all responses, but HEAD is preferred especially when you don't know the size of the response. HEAD is also the easiest way to determine whether a site is up or down; again, the irrelevance of the message body makes HEAD the ideal candidate.
I'm not sure about this, but RSS/ATOM feed readers would use HEAD over GET to ascertain if the contents of the feed have changed.
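As a sketch of that kind of change check, you can issue a HEAD request and compare headers such as Last-Modified and Content-Length against values you cached earlier, only doing the full GET when something differs. The URL and cached values below are placeholders.

    // Sketch: HEAD first, full GET only if the resource looks changed.
    using System;
    using System.Net;

    class HeadCheckSketch
    {
        static bool LooksChanged(string url, DateTime lastSeenModified, long lastSeenLength)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "HEAD"; // headers only, no message body

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.LastModified > lastSeenModified
                    || response.ContentLength != lastSeenLength;
            }
        }

        static void Main()
        {
            bool changed = LooksChanged("http://example.com/feed.xml",      // placeholder URL
                                        new DateTime(2010, 1, 1), 12345);   // placeholder cached values
            Console.WriteLine(changed ? "re-download" : "still fresh");
        }
    }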
An HTTP HEAD request can also be used to pre-authenticate against the web server before you do an HTTP PUT/POST of some large payload. Without the initial HEAD request you would end up sending the large payload twice, because the first attempt would come back as a 401 Unauthorized response with a WWW-Authenticate header.
It's mainly for browsers and proxies to determine whether they can use a cached copy of the web document without having to download the whole thing (which would rather defeat the purpose of a cache).
