Is query string approach reliable? - css

I am looking for effective ways to bypass the cache whenever necessary. While searching, I found a post on the subject.
From that post I learned that the query string approach may not work when proxies such as Squid are used. I have not tested this.
I see that Stack Overflow itself uses the query string approach (observed before logging in to the site).
I would like to know whether the query string approach is a reliable way to push CSS and JS updates whenever a new software build is released.

It's reliable browser side: since the URL is different (because the query parameter differs), the browser will fetch a new copy.
Server side, it depends on your server. Some caching proxies may ignore query parameters when determining URL equality. AWS CloudFront, for example, does so by default. That's always a configurable setting. Since, presumably, you are in charge of the server, you can configure it as needed.
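As a rough sketch of the browser-side part, here is one way a build step could derive the query parameter from the file contents, so the URL changes only when the file does. The function name versioned_url, the parameter name v, and the sample path are assumptions for illustration, not something from the question.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_url(url: str, content: bytes, param: str = "v") -> str:
    """Return url with a short hash of content appended as a query parameter."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    parts = urlsplit(url)
    # Keep other query parameters, but drop any older version value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, digest))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical asset path and contents: a changed file yields a changed URL.
old = versioned_url("/static/site.css", b"body { color: black; }")
new = versioned_url("/static/site.css", b"body { color: navy; }")
print(old)
print(new)
```

Whether an intermediate proxy or CDN treats the two URLs as different objects still depends on its configuration, as noted above.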


AWS Lambda, Caching {proxy+}

Simple ASP.Net AWS Lambda is uploaded and functioning with several gets like:
{proxy+}
api/foo/bar?filter=value
api/foo/barlist?limit=value
with paths tested in Postman as:
//#####.execute-api.us-west-2.amazonaws.com/Prod/{proxy+}
Now I want to enable API caching, but when I do, only the first API call gets cached and all other calls incorrectly return the first cached value.
i.e. //#####.execute-api.us-west-2.amazonaws.com/Prod/api/foo/bar?filter=value == //#####.execute-api.us-west-2.amazonaws.com/Prod/api/foo/barlist?limit=value; as far as the cache is concerned these return the same response, but they shouldn't.
How do you set up caching in API Gateway so that it correctly sees these as different requests, by both path and query?
I believe you can't use {proxy+} because that is a resource/integration itself and that is where the caching is getting applied. Or you can (because you can cache any integration), but you get the result you're getting.
Note: I'll use the word "resource" a lot because I think of each item in API Gateway as the item in question, but I believe technically AWS documentation will say "integration" because it's not just the resource but the actual integration on said resource...And said resource has an integration and parameters or what I'll go on to call query string parameters. Apologies to the terminology police.
Put another way, if you had two resources, GET foo/bar and GET foo/barlist, then you'd be able to set caching on either or both. Caching exists at this resource level (don't think of it so much as the final URL path, but as the actual resource configured in API Gateway). Unfortunately, it doesn't know to break {proxy+} out into an unlimited number of paths. Strictly, it's method plus resource, so I believe you could have different cached results for GET /path and POST /path.
However. You can also choose the integration parameters as cache keys. This would mean that ?filter=value and ?limit=value would be two different cache keys with two different cached responses.
Should foo/bar and foo/barlist have the same query string parameters (and you're still using {proxy+}) then you'll run into that duplicate issue again.
So you may wish to do foo?action=bar&filter=value and foo?action=barlist&filter=value in that case.
You'll need to configure this of course, for each query string parameter. So that may also start to diminish the ease of {proxy+} catch all. Terraform.io is your friend.
This is something I wish was a bit more automatic/smarter as well. I use {proxy+} a lot and it really creates challenges for using their caching.
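To make the collision concrete, here is a toy model of the behaviour described in this answer. It is not AWS's implementation, and cache_key and its arguments are made-up names: the key is built from the method, the configured resource, and only those query string parameters that were selected as cache keys.

```python
from urllib.parse import parse_qsl, urlsplit


def cache_key(method: str, resource: str, url: str, key_params: tuple = ()) -> tuple:
    """Toy cache key: method + configured resource + selected query parameters."""
    query = dict(parse_qsl(urlsplit(url).query))
    # Only parameters explicitly chosen as cache keys take part in the key.
    selected = tuple((name, query.get(name)) for name in sorted(key_params))
    return (method, resource, selected)


bar = "/Prod/api/foo/bar?filter=value"
barlist = "/Prod/api/foo/barlist?limit=value"

# No parameters selected: both requests hit the same {proxy+} entry.
print(cache_key("GET", "{proxy+}", bar) == cache_key("GET", "{proxy+}", barlist))

# filter and limit selected as cache keys: the two requests now differ.
params = ("filter", "limit")
print(cache_key("GET", "{proxy+}", bar, params) == cache_key("GET", "{proxy+}", barlist, params))
```

In this model, two paths that share the same parameter names and values still collide, which is why the action=bar / action=barlist workaround adds a distinguishing parameter.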

Is it dangerous if a web resource POSTs to itself?

While reading some articles about writing web servers using Twisted, I came across this page that includes the following statement:
While it's convenient for this example, it's often not a good idea to
make a resource that POSTs to itself; this isn't about Twisted Web,
but the nature of HTTP in general; if you do this, make sure you
understand the possible negative consequences.
In the example discussed in the article, the resource is a web resource retrieved using a GET request.
My question is, what are the possible negative consequences that can arise from having a resource POST to itself? I am only concerned about the aspects related to the HTTP protocol, so please ignore the fact that I mentioned Twisted.
The POST verb is used for making a new resource in a collection.
This means that POSTing to a resource has no direct meaning (POST endpoints should always be collections, not resources).
If you want to update your resource, you should PUT to it.
Sometimes, you do not know if you want to update or create the resource (maybe you've created it locally and want to create-or-update it). I think that in that case, the PUT verb is more appropriate because POST really means "I want to create something new".
There's nothing inherently wrong about a page POSTing back to itself. In fact, many of the widely used frameworks (ASP.NET, etc.) use that method to handle various events that happen on the client: some data is posted back to the same page, where the server processes it and sends a new response.

Is PUT/DELETE idempotent with REST automatic?

I am learning about REST and PUT/DELETE. I have read that both of those (along with GET) are idempotent, meaning that repeating the same request leaves the server in the same state as making it once.
Does a duplicate PUT/DELETE request ever leave the web browser (when using XMLHttpRequest)? In other words, will the server be updating the same database record for each PUT request, or will duplicate requests be ignored automatically?
If yes, how is using PUT or DELETE different from just using POST?
I read an article which suggested that RESTful web services were the way forward. Is there any particular reason why HTML5 forms do not support PUT/DELETE methods?
REST is just a design structure for data access and manipulation. There are no set-in-stone rules for how a server must react to data requests.
That being said, typically a REST request of PUT or DELETE would be as follows:
DELETE /item/10293
or
PUT /item/23848
foo=bar
fizz=buzz
herp=derp
The requests given are associated with a specific ID. Because of this, telling the server to delete the same ID 15 times will end up with pretty much the same result as calling it once, unless there's some sort of re-numbering going on.
With the PUT request, telling the server to update a specific item to specific values will also lead to the same result.
A case where a command would be non-idempotent would typically involve some sort of relative value:
DELETE /item/last
Calling that 15 times would likely remove 15 items, rather than the same last item. An alternative using HTTP properly might look like:
POST /item/last?action=delete
Again, REST isn't an official spec, it's just a structure with some common qualities. There are many ways to implement a RESTful structure.
As for HTML5 forms supporting PUT & DELETE, it's really up to the browsers to start supporting different methods rather than the spec itself. If all the browsers started implementing different methods for form submission, I'm sure they'd be added to the spec.
With the web going the way it is, a good RESTful implementation is liable to also incorporate some form of AJAX anyway, so to me it seems largely unnecessary.
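The difference between the ID-based requests and the relative "last" request in this answer can be sketched with a small in-memory store. ItemStore and its methods are hypothetical stand-ins for server-side handlers, not any framework's API.

```python
class ItemStore:
    """In-memory stand-in for the server-side state behind /item/<id>."""

    def __init__(self, items):
        self.items = dict(items)

    def put(self, item_id, fields):
        # PUT /item/<id>: replace the item with the given representation.
        self.items[item_id] = dict(fields)

    def delete(self, item_id):
        # DELETE /item/<id>: deleting an absent item changes nothing.
        self.items.pop(item_id, None)

    def delete_last(self):
        # "Delete the last item": depends on current state, so not idempotent.
        if self.items:
            self.items.pop(max(self.items))


store = ItemStore({1: {}, 2: {}, 3: {}})
store.delete(2)
store.delete(2)            # repeat: state is unchanged
print(sorted(store.items))

store.put(1, {"foo": "bar"})
store.put(1, {"foo": "bar"})   # repeat: same final state
print(store.items[1])

store.delete_last()
store.delete_last()        # repeat: a second, different item is removed
print(sorted(store.items))
```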
Does a duplicate PUT/DELETE request ever leave the web browser (when using XMLHttpRequest)?
Yeah, sure. Idempotence is only a convention and it's not enforced. If you make a request, duplicate or not, it will run through.
In other words, will the server be updating the same database record for each PUT request, or will duplicate requests be ignored automatically?
If it conforms to REST it should update the same database record twice, for example by running UPDATE user SET name = 'John' twice. There is no guarantee what it will or will not do, though; it depends on how it's implemented.
If yes, how is using PUT or DELETE different from just using POST?
It's just a convention. PUT and DELETE requests may or may not be handled differently from POST in the site's code.
I read an article which suggested that RESTful web services were the way forward. Is there any particular reason why HTML5 forms do not support PUT/DELETE methods?
I'm not really sure, to be honest. You can work around this by using a hidden <input> field called _method or similar and set it to DELETE or PUT, and then handle that server side.
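A minimal sketch of the server-side half of that workaround, assuming a URL-encoded form body and a hidden field named _method. The helper effective_method and the set of allowed overrides are assumptions; several frameworks ship their own version of this.

```python
from urllib.parse import parse_qs

# Assumption: only these verbs may be tunnelled through a POST form.
ALLOWED_OVERRIDES = {"PUT", "DELETE", "PATCH"}


def effective_method(request_method: str, form_body: str) -> str:
    """Return the verb the handler should act on for this request."""
    method = request_method.upper()
    if method != "POST":
        # Never let a GET be upgraded into a state-changing request.
        return method
    fields = parse_qs(form_body)
    override = fields.get("_method", [""])[0].upper()
    return override if override in ALLOWED_OVERRIDES else "POST"


print(effective_method("POST", "_method=DELETE&id=7"))
print(effective_method("POST", "name=John"))
print(effective_method("GET", "_method=DELETE"))
```

Honouring the override only on POST keeps a plain link or a prefetching proxy from triggering a delete.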
PUT operations are idempotent but not safe. If a successful PUT is repeated, it will not insert duplicate records. Repeat a PUT after a network failure, using conditional headers such as If-Unmodified-Since and/or If-Match to guard the retry. Don't repeat it after 4XX or 5XX error codes.
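One way to read that retry rule as code, as a sketch only: the transport is passed in as a callable named send (hypothetical, returning an HTTP status code and raising ConnectionError on network failure), so no particular HTTP library is assumed.

```python
def put_with_retry(send, url, body, etag, attempts=3):
    """Retry a conditional PUT on network failure only; return the status code."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # If-Match makes the server apply the PUT only to the expected version.
    headers = {"If-Match": etag}
    last_error = None
    for _ in range(attempts):
        try:
            # Any HTTP status, including 4XX/5XX, is returned without retrying.
            return send("PUT", url, body, headers)
        except ConnectionError as exc:
            last_error = exc
    raise last_error


# Hypothetical transport that drops the connection twice, then succeeds.
calls = []

def flaky_send(method, url, body, headers):
    calls.append(headers["If-Match"])
    if len(calls) < 3:
        raise ConnectionError("connection reset")
    return 200


status = put_with_retry(flaky_send, "/item/23848", {"foo": "bar"}, etag='"abc123"')
print(status, len(calls))
```

If an earlier attempt actually reached the server before the connection dropped, a server that honours If-Match would typically answer the retry with 412 Precondition Failed, so the caller needs to decide how to treat that case.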
REST aims to establish a convention for which HTTP method to use. Each back end is free to implement whatever it wants; developers can break the convention, but doing so causes unnecessary confusion for anyone using the service who was not involved in its development.
For DELETE, if you delete an item by ID, the server should respond that it has been deleted. If you delete it again, it is no longer there, so the server responds "already removed", which is also fine because your purpose is fulfilled. The same goes for PUT: you provide the new state of your resource, the state it should end up in. If it's already updated, the mission is complete and the outcome is the same for the client.

Is it a good idea to check the HTTP request query string and indicate an error once there's an unexpected parameter?

My ASP.NET application will have to handle HTTP GET requests that will have the following URL format:
http://mySite/getStuff?id="actualIdHere"
Currently the requirement is to validate that there are no parameters in the query string except id, and to report an error like "unknown parameter P passed".
Is such a requirement a good idea? Will it interfere with some obviously valid uses of the application that I haven't thought of?
It would be better to just validate the presence of id.
Validating unknown parameters doesn't serve much of a purpose, they will just be ignored.
There are also tracking solutions out there that will add to your query string.
One that comes to mind is web analytics.
If your application is going to be a public web site, you will want to implement some tracking of your traffic (e.g. Google Analytics).
If you want to implement a marketing campaign to draw traffic to your site, you will likely need to add a few parameters (specific to the tracking system you're using) to your querystring to check the effectiveness of your campaign.
It depends on your target audience.
It is not a good practice for public websites where you care about SEO. For example, if you use Google Analytics, a user who arrives from Google may have a tracking parameter such as gclid appended to the URL.
However in more protected websites it is fine.
It might affect forward compatibility. For example, if you have separate client applications/websites that call these URLs, and future versions of those clients might pass additional parameters to getStuff (like a sort ordering, backlink, etc.), then making hard requirements on the parameters might make it harder to roll out new versions smoothly (i.e. you cannot roll out new clients until the server is updated).
This is in addition to the traffic forwarding parameters public websites might get as additional input, as the other answers mention.
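The trade-off discussed in these answers can be sketched as a check that always requires id and, only in a strict mode, rejects unknown parameters while tolerating a list of tracking parameters. The function check_query, the strict flag, and the contents of TRACKING_PARAMS are assumptions for illustration. The question is about ASP.NET, but the logic is the same in any language.

```python
from urllib.parse import parse_qs, urlsplit

KNOWN_PARAMS = {"id"}
# Assumption: parameters commonly appended by analytics or ad tooling.
TRACKING_PARAMS = {"gclid", "utm_source", "utm_medium", "utm_campaign"}


def check_query(url: str, strict: bool = False) -> list:
    """Return a list of error messages; an empty list means the query is accepted."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    errors = []
    if "id" not in params:
        errors.append("missing required parameter id")
    if strict:
        # Reject anything that is neither expected nor known tracking noise.
        for name in sorted(set(params) - KNOWN_PARAMS - TRACKING_PARAMS):
            errors.append(f"unknown parameter {name} passed")
    return errors


print(check_query("http://mySite/getStuff?id=42&sort=asc"))
print(check_query("http://mySite/getStuff?id=42&sort=asc", strict=True))
print(check_query("http://mySite/getStuff?id=42&gclid=abc", strict=True))
```

The lenient default keeps older servers working with newer clients. Strict mode needs its allowlist maintained as tracking tools change.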

HTTP Verbs and Content Negotiation or GET Strings for REST service?

I am designing a REST service and am trying to weigh the pros and cons of using the full array of HTTP verbs and content negotiation vs GET string variables. Does my choice affect cacheability? Neither solution may be right for every area.
Which is best for the crud and queries (e.g. ?action=PUT)?
Which is best for api version picking (e.g. ?version=1.0)?
Which is best for return data type(e.g. ?type=json)?
CRUD/Queries are best represented with HTTP verbs. A create and update is usually a PUT or POST. A retrieve would be a GET. Deletes would be a DELETE. That's the general mapping. The main point is that a GET doesn't cause side effects, and that the verbs do what you'd expect them to do.
Putting the action in the URI is OK if that's the -only- way to pass it (e.g., the HTTP client library doesn't allow you to send non-GET/POST requests). Most libraries do, though, so it's strongly advised not to pass the verb via the URL.
The "best" way to version the API would be using HTTP headers on a per-request basis; this lets clients upgrade/downgrade specific requests instead of every single one. Of course, that granularity of versioning needs to be baked in at the start and could severely complicate the server-side code. Most people just use the URL used to access the servers. A longer explanation is in a blog post by Peter Williams, "Versioning Rest Web Services".
There is no best return data type; it depends on your app. JSON might be easier for Ajax websites, whereas XML might be easier for complicated structures you want to query with XPath. Protocol Buffers are a third option. It's also debated whether the return format is best specified in the URL or in the HTTP headers.
For the most part, headers will have the largest effect on caching, since proxies are supposed to respect them when told, as are user agents (obviously UAs behave differently, though). Caching based on URL alone is very dependent on the layers. Some user agents don't cache anything with a query string (Safari, iirc), and proxies are free to cache or not cache as they feel appropriate.
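For the header-based option, here is a simplified sketch of choosing the return type from the Accept header. pick_media_type and the SUPPORTED list are assumptions, and the parser is deliberately incomplete: it handles q-values and */* but ignores partial wildcards such as text/*.

```python
SUPPORTED = ["application/json", "application/xml"]


def pick_media_type(accept_header: str, supported=SUPPORTED):
    """Return the supported type the client prefers most, or None if none match."""
    ranked = []
    for position, item in enumerate(accept_header.split(",")):
        pieces = [p.strip() for p in item.split(";")]
        media_type = pieces[0].lower()
        q = 1.0
        for piece in pieces[1:]:
            if piece.startswith("q="):
                try:
                    q = float(piece[2:])
                except ValueError:
                    q = 0.0  # treat an unparsable weight as "not acceptable"
        if media_type:
            # Sort by descending q, then by order of appearance.
            ranked.append((-q, position, media_type))
    for neg_q, _, media_type in sorted(ranked):
        if neg_q == 0:
            continue  # q=0 means the client refuses this type
        if media_type == "*/*":
            return supported[0]
        if media_type in supported:
            return media_type
    return None


print(pick_media_type("application/xml;q=0.9, application/json"))
print(pick_media_type("text/html, application/xml;q=0.8"))
print(pick_media_type("text/html"))
```

When the representation depends on a request header like this, the response would normally also send Vary: Accept so that caches keep the variants apart.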
