Is it a good idea to cache a sitemap.xml? - symfony

I have a sitemap.xml page which is dynamically generated by my Symfony application. I also use Varnish. I would like to know if it's a good idea to cache the sitemap.xml page or if it's useless?
Thanks.

How often will it change? If it isn't changing on a minute-by-minute basis, but is being read often (and hence being generated often), then it's probably worth caching it. You may want some code to generate it offline and then invalidate the Varnish cache if a particular timeliness for a newly updated file is required.
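For example, a minimal Varnish sketch (VCL 4 syntax; the one-hour TTL is an assumption to tune for your site) that caches the sitemap regardless of backend headers:

    # Fragment for your VCL file: cache /sitemap.xml for an hour.
    sub vcl_backend_response {
        if (bereq.url == "/sitemap.xml") {
            set beresp.ttl = 1h;
        }
    }

After regenerating the file you could invalidate the cached copy from the command line, e.g. varnishadm ban 'req.url == "/sitemap.xml"'.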

Related

ASP.NET MVC: some users get old scripts, despite the fact that we use bundles

We have an ASP.NET MVC 5 application. We use bundles with optimization enabled by default, but we have heard several times from users that they get errors which we think are caused by old versions of our scripts. Their browsers somehow take scripts from the cache, despite the fact that we have edited those script files and the bundles should be updated. The worst part of the problem is that we can't reproduce it; we don't know how. We have already tried making test changes to scripts, like adding some "console.log('test')" lines, in order to see whether the browser takes the cached version, but everything was fine: the hash at the end of <script src="....?v='hash'"> changed and the browser took the newest version the first time. I should mention that our site is a single-page application; maybe that's somehow related to the problem.
Have you faced this kind of problem?
There's not enough information here to give a definitive answer. The bundler detects changes in files and will regenerate the bundle along with the link to that bundle, which will include an updated query string param. Since the query string is part of the URI, it's considered a totally different resource at this point, and the browser should fetch it again, because there is technically no cache available. The only logical reason this would not occur is if the HTML with the link to the bundle is not being updated. This can happen if you're using OutputCache or otherwise caching the HTML document. It can also happen if the client's browser is aggressively caching the HTML document. Unfortunately, there's not much you can do about that, as the client browser ultimately has control over what is or is not cached and for how long.
That said, given that this is a single page app, it's very possible that it's also including a cache manifest. This manifest will very often include the HTML file itself, and the browser will not refetch any file in the manifest unless the manifest itself is updated.
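To illustrate the first cause: if the page that renders the bundle link is itself output-cached, the stale HTML keeps pointing at the old bundle hash. A minimal sketch (the controller, duration, and bundle name are hypothetical):

    using System.Web.Mvc;

    public class HomeController : Controller
    {
        // The rendered HTML, including the versioned bundle link, is cached
        // for an hour, so a rebuilt bundle isn't referenced until it expires.
        [OutputCache(Duration = 3600, VaryByParam = "none")]
        public ActionResult Index()
        {
            return View(); // the view calls @Scripts.Render("~/bundles/app")
        }
    }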

How to invalidate browser cache using just configuration in the webserver?

For a long time I've been updating ASP.NET pages on the server and have never found the correct way to make changes visible in files like CSS and images.
I know that if I append something to the URL, the browser will think the file is another one:
<img src="/images/myLogo.png?v=1"/>
or perhaps changing its name:
<img src="/images/myLogo.v1.png"/>
Unfortunately that does not look like the correct way. In my case I'm using App_Themes, and the files in that folder are automatically injected into the page in a way that doesn't let me easily change the URL.
So my question is:
When I'm publishing the ASP.NET application on the server, what is the correct way to signal to IIS (which would then notify the browser) that a file has changed? Is it not automatic? Should I change some configuration in IIS, or perhaps add some "decoration" in the code?
I've already tried many questions here on SO, like "ASP.NET - Invalidate browser cache", "How to refresh the browser cache of an image?", "Handle cached images? How to get the browser to show the new version?", and even "What is an elegant way to force browsers to reload cached CSS/JS files?", but none of them takes any approach other than handling it manually in code rather than through IIS or ASP.NET configuration.
The closest I could find is "Asking browsers to cache our images (ASP.NET/IIS)", where they set an expiration, but not based on whether the files were updated. Instead they used days or hours to cache those files, so the files would be re-downloaded even when no changes had been made.
I want to know whether IIS or ASP.NET offers something related to this: automatically telling the browser that a file has changed. Is it possible/built in?
The options you have for updating a browser-side cached item are:
1. Change the file name.
2. Add a URL parameter.
3. Cache it for a limited time (e.g. a couple of hours).
4. Compare the date-time of creation (Last-Modified).
5. Signal with an ETag.
With the first two you avoid a server call for each item, while the third loads the item again after some time. With the last two you still have to make one call to the server per item to see whether it needs to be loaded again.
So you cannot have it all here; there is no single correct way, and you need to choose what is best for you and what you can do. The fastest from the client's perspective are options (1) and (2).
The direct answer to your question is to use an ETag, or a date-time comparison of the file's creation, but that way you still spend a call to the server; you only save the size of what travels back (the server replies 304 Not Modified instead of resending the file).
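For what it's worth, IIS already sends Last-Modified and ETag headers for static files and answers conditional requests with 304 on its own; what you can add through configuration is the client cache lifetime. A web.config sketch (the one-hour max-age is an assumption):

    <!-- IIS serves static files with Last-Modified/ETag automatically; this
         just adds Cache-Control: max-age so browsers revalidate hourly. -->
    <configuration>
      <system.webServer>
        <staticContent>
          <clientCache cacheControlMode="UseMaxAge"
                       cacheControlMaxAge="01:00:00" />
        </staticContent>
      </system.webServer>
    </configuration>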
Some more links:
HTTP ETag
How do I support ETags in ASP.NET MVC?
Configuring ETags with Http module in asp.net
How to control web page caching, across all browsers?
jQuery getScript caching
and you can find even more.

Priming the ASP.NET output cache

Is there a way to programmatically prime the ASP.NET output cache? I've investigated the caching API and can't seem to find an obvious way to do this. Has anyone tried something like this? If so, what method did you use?
I gave some thought to this last year and ended up concluding that it was not that important in my case, but if it's important for your website, all you have to do is simply request the pages from somewhere like the Application_Start event (after all initialization code has run). But you shouldn't stop there!
The cache will eventually expire, and to avoid that you should set up some way to cache the pages again before any client requests them.
Make the output cache dependent on some other object in the cache and set an expiration callback on that object. Then, when the cache object expires, so do your pages, and you can make HTTP requests to the pages you want to re-cache, and so on.
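A rough sketch of that idea, assuming a 20-minute output-cache policy (the key prefix, timings, and callback body are all hypothetical):

    // Hypothetical sentinel object: when it expires, re-request the page so
    // its output cache is rebuilt, then re-arm the sentinel.
    using System;
    using System.Net;
    using System.Web;
    using System.Web.Caching;

    public static class PagePrimer
    {
        public static void Arm(string url)
        {
            HttpRuntime.Cache.Insert(
                "primer:" + url,
                url,
                null,
                DateTime.UtcNow.AddMinutes(19), // just before the page's cache dies
                Cache.NoSlidingExpiration,
                CacheItemPriority.NotRemovable,
                (key, value, reason) =>
                {
                    using (var client = new WebClient())
                    {
                        client.DownloadString((string)value); // re-caches the page
                    }
                    Arm((string)value); // re-arm for the next cycle
                });
        }
    }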
I'm answering the question, but the amount of effort involved and the question marks I still have in my mind lead me to advise against going through with this...
UPDATE
The only kind of dependency you can set on the output cache is a SQL dependency. Use it if you want, but if you need to make your output cache depend on some other business object, this might get very difficult. You could create a database object, make the output cache depend on it via SQL dependency, and expire it yourself using some kind of timer.
Man, the longer I write, the more solutions and difficulties I find! I can't write a book for something that isn't worth your precious time. Believe me, the usefulness of this will be nearly zero.
Priming the cache is, as others have suggested, as easy as requesting the pages you want cached. Of course, if you do this programmatically it will only request the HTML and not all the linked resources (CSS, JavaScript, images...), which is a good thing, as it avoids wasted bandwidth.
For many websites, the cached items that carry the biggest performance penalties are common to many or all pages. For example, the navigation system on a large CMS or storefront may query the database and do a bunch of rendering work which can then be cached for all pages. Also, a big part of the initial load in ASP.NET happens when the website is first accessed and loaded into memory. Both of these issues can be addressed by calling even a single page on your site, but there is nothing stopping you from making a list of URLs and calling each one periodically.
If your cache policy is set for a 20-minute timeout, maybe request each page once every 17-18 minutes.
Here are some resources with source code to help you get started:
Good Simple Primer on requesting web URL in C#
Website Monitoring Windows Service
Asynchronous Website Monitor
As I mentioned before, you can easily extend these to "foreach" over an array or list of URLs to be requested.
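Along those lines, a self-contained sketch of the "foreach over a list of URLs" idea (the URLs and the 17-minute interval are placeholder assumptions):

    // Hypothetical cache-priming loop: request each page periodically so the
    // output cache stays warm before real users arrive.
    using System;
    using System.Collections.Generic;
    using System.Net;
    using System.Threading;

    class CachePrimer
    {
        static readonly List<string> Urls = new List<string>
        {
            "http://example.com/",         // placeholder URLs: substitute
            "http://example.com/products", // the pages you want pre-cached
        };

        static void Main()
        {
            while (true)
            {
                foreach (string url in Urls)
                {
                    try
                    {
                        using (var client = new WebClient())
                        {
                            // Fetching the HTML is enough to populate the
                            // output cache; linked assets are skipped.
                            client.DownloadString(url);
                        }
                    }
                    catch (WebException)
                    {
                        // Ignore transient failures; the next pass retries.
                    }
                }
                Thread.Sleep(TimeSpan.FromMinutes(17)); // just under the 20-minute policy
            }
        }
    }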

ASP.NET: Legitimate architecture/HttpModule concern?

An architect at my work recently read Yahoo!'s Exceptional Performance Best Practices guide where it says to use a far-future Expires header for resources used by a page such as JavaScript, CSS, and images. The idea is you set an Expires header for these resources years into the future so they're always cached by the browser, and whenever you change a file and therefore need the browser to request it again instead of using its cache, you change the filename by adding a version number.
Instead of incorporating this into our build process though, he has another idea. Instead of changing file names in source and on the server disk for each build (granted, that would be tedious), we're going to fake it. His plan is to set far-future expires on said resources, then implement two HttpModules.
One module will intercept all the Response streams of our ASPX and HTML pages before they go out, look for resource links and tack on a version parameter that is the file's last modified date. The other HttpModule will handle all requests for resources and simply ignore the version portion of the address. That way, the browser always requests a new resource file each time it has changed on disk, without ever actually having to change the name of the file on disk.
Make sense?
My concern relates to the module that rewrites the ASPX/HTML page Response stream. He's simply going to apply a bunch of Regex.Replace() calls on the "src" attributes of <script> and <img> tags, and the "href" attribute of <link> tags. This is going to happen for every single request on the server whose content type is "text/html." Potentially hundreds or thousands of requests a minute.
I understand that HttpModules are hooked into the IIS pipeline, but this has got to add a prohibitive delay in the time it takes IIS to send out HTTP responses. No? What do you think?
A few things to be aware of:
If the idea is to add a query string to the static file names to indicate their version, unfortunately that will also prevent caching by the kernel-mode HTTP driver (http.sys)
Scanning each entire response based on a bunch of regular expressions will be slow, slow, slow. It's also likely to be unreliable, with hard-to-predict corner cases.
A few alternatives:
Use control adapters to explicitly replace certain URLs or paths with the current version. That allows you to focus specifically on images, CSS, etc.
Change folder names instead of file names when you version static files
Consider using ASP.NET skins to help centralize file names. That will help simplify maintenance.
In case it's helpful, I cover this subject in my book (Ultra-Fast ASP.NET), including code examples.
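In the spirit of those alternatives, here is a minimal helper (all names are hypothetical) that stamps a URL with the file's last write time once at render time, instead of regex-rewriting every response. Note that, per the first caveat, a query-string version still defeats http.sys kernel caching; a folder-based scheme avoids that at the cost of extra routing work.

    // Hypothetical helper: derive a version token from the file itself, so
    // links change exactly when the file changes.
    using System;
    using System.IO;
    using System.Web;

    public static class StaticUrl
    {
        public static string Versioned(string virtualPath)
        {
            string physicalPath = HttpContext.Current.Server.MapPath(virtualPath);
            long stamp = File.GetLastWriteTimeUtc(physicalPath).Ticks;
            return VirtualPathUtility.ToAbsolute(virtualPath) + "?v=" + stamp;
        }
    }

Usage would look like <img src="<%= StaticUrl.Versioned("~/images/myLogo.png") %>" />; in practice you would cache the stamp per path to avoid touching the disk on every render.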
He's worried about resources not being cached on the client. Obviously this depends somewhat on how the user population has their browsers configured; if it's the default configuration, I doubt you need to worry. Trying to second-guess client caching is too hard, the results aren't guaranteed, and it won't help new users anyway.
As far as the HTTP modules go, in principle I would say they are fine, but you'll want them to be blindingly fast and efficient if you take that track; it's probably worth trying out. I can't speak to the appropriateness of using regex to do what you want inside them, though.
If you're looking for high performance, I suggest you (or your architect) do some reading (and I don't mean that in a nasty way). I learnt something recently which I think will help; let me explain (and maybe you guys know this already).
Browsers only hold a limited number of simultaneous connections open to a specific hostname at any one time; IE6 and IE7, for example, will only open two connections to, say, www.foo.net.
If you serve your images from, say, images.foo.net, you get that many additional connections straight away.
The idea is to separate different content types onto different hostnames (css.foo.net, scripts.foo.net, ajaxcalls.foo.net); that way you make sure the browser is really working on your behalf.
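Concretely, the markup ends up looking something like this (hostnames and file names are the illustrative ones from above):

    <!-- Each distinct hostname gets its own pool of browser connections. -->
    <link rel="stylesheet" href="http://css.foo.net/site.css" />
    <script src="http://scripts.foo.net/app.js"></script>
    <img src="http://images.foo.net/logo.png" />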
http://code.google.com/p/talifun-web
StaticFileHandler - Serve Static Files in a cachable, resumable way.
CrusherModule - Serve compressed versioned JS and CSS in a cachable way.
You don't quite get kernel-mode caching speed, but serving from HttpRuntime.Cache has its advantages. The kernel-mode cache can't cache partial responses, and you don't have fine-grained control over the cache. The most important thing to implement is a consistent ETag header and Expires header; this will improve your site's performance more than anything else.
Reducing the number of files served is probably one of the best ways to improve the speed of your website. The CrusherModule combines all the CSS on your site into one file and all the JS into another.
Memory is cheap, hard drives are slow, so use it!

Incremental or on-demand sitemap.xml

After reading Jeff's article about the importance of sitemaps, I decided to generate one for my dynamic website.
I saw some articles about how to implement it with ASP.NET but every solution I saw showed how to generate it on the fly with an HTTP Handler.
But that solution means that every time someone asks for the file, my code has to iterate through all my entries to regenerate it?
Wouldn't it be less resource-consuming to generate it incrementally? For example, on Stack Overflow, appending the new URL node every time a user adds a question?
You might want to cache the resulting XML and invalidate the cache whenever your site structure changes. This might lead to a publish/subscribe mechanism for components of your website, but in a properly structured application this won't be a problem.
You mean cache the result? Yes, there's no reason you couldn't do that. Depending on the amount of traffic your site is getting it might be unnecessary, but if you're doing it simply to improve your technique, there are a number of ways to approach it.
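One way to approach it, sketched with a plain HTTP handler and the ASP.NET cache (the cache key, the six-hour expiry, and BuildSitemap are placeholders for your own code):

    // Hypothetical sketch: cache the generated sitemap XML and invalidate it
    // when the site structure changes, instead of rebuilding on every request.
    using System;
    using System.Web;
    using System.Web.Caching;

    public class SitemapHandler : IHttpHandler
    {
        const string CacheKey = "sitemap.xml";

        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            string xml = (string)HttpRuntime.Cache[CacheKey];
            if (xml == null)
            {
                xml = BuildSitemap(); // your existing generation code
                HttpRuntime.Cache.Insert(CacheKey, xml, null,
                    DateTime.UtcNow.AddHours(6), Cache.NoSlidingExpiration);
            }
            context.Response.ContentType = "text/xml";
            context.Response.Write(xml);
        }

        // Call this from wherever content is added or removed.
        public static void Invalidate()
        {
            HttpRuntime.Cache.Remove(CacheKey);
        }

        static string BuildSitemap()
        {
            // Placeholder: iterate your entries and emit the <urlset> XML here.
            return "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\" />";
        }
    }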
