Incremental or on-demand sitemap.xml - asp.net

After reading Jeff's article about the importance of sitemaps, I decided to generate one for my dynamic website.
I saw some articles about how to implement it with ASP.NET, but every solution I found generates it on the fly with an HTTP handler.
But doesn't that mean that every time someone asks for the file, my code has to iterate through all my entries to regenerate it?
Wouldn't it be less resource-consuming to generate it incrementally? For example, on Stack Overflow, the new URL node could simply be appended every time a user adds a question.

You might want to cache the resulting XML and invalidate the cache whenever your site structure changes. This might lead to having a publish/subscribe mechanism for components of your web site, but for a properly structured application this won't be a problem.

You mean cache the result? Yes, there's no reason you couldn't do that. Depending on the amount of traffic your site is getting it might be unnecessary, but if you're doing it simply to improve your technique, there are a number of ways to approach it.
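For example, here is a minimal sketch of caching the generated XML in an HTTP handler. The BuildSitemapXml() method, the "sitemap-xml" key, and the 12-hour lifetime are just placeholders for your own code and policy:

```csharp
using System;
using System.Web;
using System.Web.Caching;

// A minimal sketch: serve the sitemap from the application cache and only
// regenerate it when the cached copy is missing or has been invalidated.
public class SitemapHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Reuse a previously generated sitemap if one is still cached.
        string xml = context.Cache["sitemap-xml"] as string;

        if (xml == null)
        {
            // Regenerate only when the cache is empty or has been invalidated.
            xml = BuildSitemapXml();
            context.Cache.Insert(
                "sitemap-xml",
                xml,
                null,
                DateTime.UtcNow.AddHours(12),
                Cache.NoSlidingExpiration);
        }

        context.Response.ContentType = "text/xml";
        context.Response.Write(xml);
    }

    private string BuildSitemapXml()
    {
        // Iterate your entries here and emit the <urlset> document.
        throw new NotImplementedException();
    }
}
```

Whenever a new entry is published, call HttpRuntime.Cache.Remove("sitemap-xml") so the next request rebuilds the file. That gives you most of the benefit of incremental generation without having to maintain the file yourself.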

Related

First Byte Time scores F

I recently purchased a new theme and installed WordPress on my GoDaddy hosting account for my portfolio. I am still working on it, but as of right now I sometimes get page load speeds of 10-20 seconds, and other times 2 seconds (usually after the page has been cached). I have done all that I believe I can (without breaking the site) to optimize my performance speed (reducing image sizes, using a free CDN, using W3 Total Cache, etc.).
It seems that my main issue is this 'TTFB' wait time I get whenever I go to a new page that hasn't been cached yet. How can I fix this? Is it the theme's fault? Do I NEED to switch hosting providers? I really don't want to go through the hassle of doing that and paying so much more just to have less than optimal results. I am new to this.
My testing site:
http://test.ninamariephotography.com/
See my Web Page Results here:
http://www.webpagetest.org/result/161111_9W_WF0/
Thank you in advance to anyone for your help:)
Time To First Byte can depend on geography, but I don't think that's your problem. I reran your test and got a B.
I think the issue is your hosting is a tiny shared instance, and you're serving static files. Here are some ideas to speed things up.
Serve images using an image-serving service. Check out imgix, which is $3/month. Serving images off an external domain could also help in unexpected ways, depending on the HTTP protocol version, the browser version, and how connections are shared.
Try lossy compression. You lose some image detail, but you also lose some file size. Check out compressor.io for an easy tool.
Concatenate and minify scripts. You have a number of little javascript files that load individually. Consider joining them together and minifying. I don't know the tool chain for Wordpress, perhaps there's a setting?
If none of that helps, you should experiment with a different hosting choice.

Is it a good idea to cache a sitemap.xml?

I have a sitemap.xml page which is dynamically generated by my Symfony application. I also use Varnish. I would like to know if it's a good idea to cache the sitemap.xml page or if it's useless?
Thanks.
How often will it change? If it isn't changing on a minute-by-minute basis but is being read often (and hence being generated often), then it's probably worth caching it. You may want some code to generate it offline and then invalidate the Varnish cache if a particular timeliness for a newly updated file is required.

Priming the asp.net output cache

Is there a way to programmatically prime the asp.net output cache? I've investigated the caching API and can't seem to find an obvious way to do this. Has anyone tried something like this? If so, what method did you use?
I gave some thought to this last year and ended up concluding that it was not that important in my case. But if it's important for your website, all you have to do is call the web pages from somewhere like the Application_Start event (after all your initialization code has run). You shouldn't stop there, though!
The cache will eventually expire, and to avoid that you should set up some way to cache the pages again before any client requests them.
Make the output cache dependent on some other object in the cache and set an expiration callback. Then, when that cache object expires, so do your pages, and the callback can make HTTP requests to the pages you want to recache, and so on.
I'm answering this question, but the amount of effort involved and the question marks I still have in my mind lead me to advise against going through with this...
UPDATE
The only kind of dependency you can set in the OutputCache directive itself is a SQL dependency. Use it if you want, but if you need your output cache to depend on some other business object, this can get very difficult. You could create a database object, make your dependency point at it, and expire it yourself using some kind of timer.
Man, the longer I write, the more solutions and difficulties I find! I can't write a book for something that isn't worth your precious time. Believe me, the usefulness of this will be nearly zero.
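For completeness, here is a minimal sketch of the dependency-plus-callback idea described above (my own sketch, not production code). The token key "pages-token", the 20-minute lifetime, and the example URL are placeholders you would replace with your own values:

```csharp
using System;
using System.Net;
using System.Web;
using System.Web.Caching;

// Each participating page keeps its normal OutputCache directive and, in its
// code-behind (e.g. Page_Load), ties its output cache entry to the token:
//     Response.AddCacheItemDependency("pages-token");

public static class OutputCachePrimer
{
    public static void InsertToken()
    {
        HttpRuntime.Cache.Insert(
            "pages-token",
            DateTime.UtcNow,                 // the stored value itself is irrelevant
            null,
            DateTime.UtcNow.AddMinutes(20),  // align with the pages' cache duration
            Cache.NoSlidingExpiration,
            CacheItemPriority.NotRemovable,
            OnTokenRemoved);
    }

    private static void OnTokenRemoved(string key, object value,
        CacheItemRemovedReason reason)
    {
        // The token expired, so every dependent page just fell out of the
        // output cache. Re-insert the token, then request the pages again so
        // they are cached before any real visitor asks for them.
        InsertToken();

        using (var client = new WebClient())
        {
            client.DownloadString("http://www.example.com/Default.aspx");
            // ...repeat for each page you want recached.
        }
    }
}
```

You would call OutputCachePrimer.InsertToken() once from Application_Start to get the cycle going.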
Priming the cache is, as others have suggested, as easy as requesting the pages you want cached. Of course, if you do this programmatically it will only request the HTML and not all the linked resources (CSS, JavaScript, images...), which is a good thing because it avoids wasted bandwidth.
For many websites, the cached items that carry the biggest performance penalties are common to many or all pages. For example, a navigation system on a large CMS or storefront may query the database and do a bunch of rendering work which can then be cached for all pages. Also, a big part of the initial load in ASP.NET happens when the website is first accessed and loaded into memory. Both of these issues can be addressed by calling even a single page on your site, but there is nothing stopping you from making a list of URLs and calling each one periodically.
If your cache policy is set for a 20-minute timeout, maybe request each page once every 17-18 minutes.
Here are some resources with source code to help you get started:
Good Simple Primer on requesting web URL in C#
Website Monitoring Windows Service
Asyncronous Website Monitor
As I mentioned before, you can easily extend these to "foreach" over an array or list of URLs to be requested.
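As a rough illustration of that "foreach over a list of URLs" idea, here is a minimal sketch that re-requests each page a little before a 20-minute output cache would expire. The URLs and the 17-minute interval are placeholder values, not anything prescribed by the resources above:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;

// A minimal cache-priming sketch: a timer periodically downloads each page's
// HTML so the output cache is rebuilt before real visitors hit a cold page.
public class CachePrimer
{
    private static readonly List<string> Urls = new List<string>
    {
        "http://www.example.com/",
        "http://www.example.com/Products.aspx"
    };

    private static Timer _timer;

    public static void Start()
    {
        // Run immediately, then every 17 minutes (for a 20-minute cache policy).
        _timer = new Timer(_ => PrimeAll(), null,
            TimeSpan.Zero, TimeSpan.FromMinutes(17));
    }

    private static void PrimeAll()
    {
        using (var client = new WebClient())
        {
            foreach (string url in Urls)
            {
                try
                {
                    // Only the HTML is downloaded; linked CSS/JS/images are not.
                    client.DownloadString(url);
                }
                catch (WebException)
                {
                    // Log and continue with the next URL.
                }
            }
        }
    }
}
```

Calling CachePrimer.Start() from Application_Start is one simple way to wire it up.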

ASP.NET: Legitimate architecture/HttpModule concern?

An architect at my work recently read Yahoo!'s Exceptional Performance Best Practices guide, where it says to use a far-future Expires header for resources used by a page, such as JavaScript, CSS, and images. The idea is you set an Expires header for these resources years into the future so they're always cached by the browser, and whenever we change a file and therefore need the browser to request the resource again instead of using its cache, we change the filename by adding a version number.
Instead of incorporating this into our build process though, he has another idea. Instead of changing file names in source and on the server disk for each build (granted, that would be tedious), we're going to fake it. His plan is to set far-future expires on said resources, then implement two HttpModules.
One module will intercept all the Response streams of our ASPX and HTML pages before they go out, look for resource links and tack on a version parameter that is the file's last modified date. The other HttpModule will handle all requests for resources and simply ignore the version portion of the address. That way, the browser always requests a new resource file each time it has changed on disk, without ever actually having to change the name of the file on disk.
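To make the plan concrete, here's roughly the versioning piece his two modules would share (my own sketch of the idea, not his actual code; the helper name and the "v" parameter are illustrative):

```csharp
using System;
using System.IO;
using System.Web;

// Turn a static file's virtual path into a URL that carries the file's
// last-modified stamp as a query-string "version".
public static class StaticFileVersion
{
    public static string AppendVersion(string virtualPath)
    {
        string physicalPath = HttpContext.Current.Server.MapPath(virtualPath);
        DateTime lastWrite = File.GetLastWriteTimeUtc(physicalPath);
        return string.Format("{0}?v={1:yyyyMMddHHmmss}", virtualPath, lastWrite);
    }
}
```

The rewriting module would run its Regex.Replace over the buffered HTML and swap, say, src="/scripts/site.js" for the result of AppendVersion("/scripts/site.js"), while the resource module would simply ignore the v parameter before serving the file.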
Make sense?
My concern relates to the module that rewrites the ASPX/HTML page Response stream. He's simply going to apply a bunch of Regex.Replace() calls on the "src" attributes of <script> and <img> tags and the "href" attribute of <link> tags. This is going to happen for every single request on the server whose content type is "text/html". Potentially hundreds or thousands a minute.
I understand that HttpModules are hooked into the IIS pipeline, but this has got to add a prohibitive delay in the time it takes IIS to send out HTTP responses. No? What do you think?
A few things to be aware of:
If the idea is to add a query string to the static file names to indicate their version, unfortunately that will also prevent caching by the kernel-mode HTTP driver (http.sys)
Scanning each entire response based on a bunch of regular expressions will be slow, slow, slow. It's also likely to be unreliable, with hard-to-predict corner cases.
A few alternatives:
Use control adapters to explicitly replace certain URLs or paths with the current version. That allows you to focus specifically on images, CSS, etc.
Change folder names instead of file names when you version static files
Consider using ASP.NET skins to help centralize file names. That will help simplify maintenance.
In case it's helpful, I cover this subject in my book (Ultra-Fast ASP.NET), including code examples.
He's worried about stuff not being cached on the client. Obviously this depends somewhat on how the user population has their browsers configured; if it's the default config, then I doubt you need to worry about trying to second-guess client caching. It's too hard, the results aren't guaranteed, and it's not going to help new users anyway.
As far as the HTTP modules go, in principle I would say they are fine, but you'll want them to be blindingly fast and efficient if you take that track; it's probably worth trying out. I can't speak to the appropriateness of using RegEx to do what you want done inside, though.
If you're looking for high performance, I suggest you (or your architect) do some reading (and I don't mean that in a nasty way). I learnt something recently which I think will help; let me explain (and maybe you guys know this already).
Browsers only hold a limited number of simultaneous connections open to a specific hostname at any one time; e.g., IE6 will only open 6 connections to, say, www.foo.net.
If you serve your images from, say, images.foo.net, you get 6 new connections straight away.
The idea is to separate out different content types into different hostnames (css.foo.net, scripts.foo.net, ajaxcalls.foo.net); that way you'll be making sure the browser is really working on your behalf.
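If it helps, a minimal sketch of what that split might look like in code (my own illustration; the subdomains mirror the examples above and are placeholders):

```csharp
using System.IO;

// Route each resource type through its own hostname so the browser opens
// extra parallel connections per content type.
public static class ResourceHost
{
    public static string For(string path)
    {
        string extension = Path.GetExtension(path).ToLowerInvariant();
        string host;
        switch (extension)
        {
            case ".css": host = "http://css.foo.net"; break;
            case ".js":  host = "http://scripts.foo.net"; break;
            case ".png":
            case ".gif":
            case ".jpg": host = "http://images.foo.net"; break;
            default:     host = "http://www.foo.net"; break;
        }
        return host + path;
    }
}
```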
http://code.google.com/p/talifun-web
StaticFileHandler - Serve static files in a cacheable, resumable way.
CrusherModule - Serve compressed, versioned JS and CSS in a cacheable way.
You don't quite get kernel-mode caching speed, but serving from HttpRuntime.Cache has its advantages. The kernel-mode cache can't cache partial responses, and you don't have fine-grained control of the cache. The most important thing to implement is a consistent ETag header and Expires header. This will improve your site's performance more than anything else.
Reducing the number of files served is probably one of the best ways to improve the speed of your website. The CrusherModule combines all the CSS on your site into one file and all the JS into another file.
Memory is cheap, hard drives are slow, so use it!
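As a rough illustration of the consistent ETag and Expires idea (my own sketch, not code from the CrusherModule; the one-year lifetime is just an example):

```csharp
using System;
using System.IO;
using System.Web;

public static class CachingHeaders
{
    // Set a far-future Expires header and an ETag derived from the file's
    // last-write time, so every server in a farm emits the same value for
    // the same file version.
    public static void Apply(HttpResponse response, string physicalPath)
    {
        DateTime lastWrite = File.GetLastWriteTimeUtc(physicalPath);

        response.Cache.SetCacheability(HttpCacheability.Public);
        response.Cache.SetExpires(DateTime.UtcNow.AddYears(1));
        response.Cache.SetLastModified(lastWrite);

        string etag = "\"" + lastWrite.Ticks.ToString("x") + "\"";
        response.Cache.SetETag(etag);
    }
}
```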

IIS Log files for ASP.NET website

I am testing an ASP.NET website, and for that I have turned logging on in IIS 6.0.
Following are the observations during testing:
Each link, PNG image, MS Chart, and CSS file is requested separately, one after another.
A request for, say, the login page takes around 30-45 seconds to complete; that page has only 6 images, yet the log file shows a separate request for each image, one after another.
Can anybody help me improve site performance? Also, is it possible for all requests to be sent to the server in parallel?
Yes, it is possible to improve the app's speed by parallelizing the downloads!
I recommend going through Google's Page Speed and Yahoo's YSlow and reading the practices they propose. I found them informative.
http://code.google.com/speed/page-speed/
http://developer.yahoo.com/yslow/help/index.html
Thanks
First of all, have you checked the web site's Performance tab? Limits could have been set there. Also check that keep-alives are enabled (Web Site tab).
Then you should profile your server using System Monitor.
If everything mentioned is OK, you should check the client side and what's between the client and the server.
What's happening is that the browser makes HTTP requests to the server for each object it finds on the page. You can eliminate those requests, or reduce how often they happen, by enabling client-side caching. For static files, you can configure that in IIS.
You can parallelize requests for images (not JS files) by assigning them to different domains; if they are all in a single domain, the browser will request only two at a time.
However, your question opens the door to a big subject. In an attempt to provide a detailed answer, I ended up writing a book on the subject, called Ultra-Fast ASP.NET. I cover the answer to the question from the OP in great detail in Chapter 2.

Resources