Cloudflare optimization techniques (free plan) - WordPress

OK, so I'm trying to benefit from CF's free plan and squeeze as much as I can out of it. The main goal is to get the site served from the CF cache so it loads faster in the browser, if only for first-time visits and search engines. It is a WordPress site, so it can be a little slower than other sites.
So, to have CF cache properly, I have set the following page rules. You probably know that three is the maximum under the free plan:
https://example.com/wp-content/*
Browser Cache TTL: a year, Cache Level: Cache Everything, Edge Cache TTL: a month
https://example.com/wp-admin/*
Security Level: High, Cache Level: Bypass, Disable Apps, Disable Performance
https://example.com/*
Auto Minify: HTML, CSS & JS, Browser Cache TTL: 30 minutes, Cache Level: No Query String, Edge Cache TTL: 2 hours, Email Obfuscation: On, Automatic HTTPS Rewrites: On
Exactly in this order. These should allow CF to cache the files stored in wp-content (uploads etc.) for the maximum amount of time, then ignore and bypass wp-admin, and finally serve everything else (products in my case, blog articles, pages and so on) from its cache, although with a shorter TTL. I've also set the caching level in the Cloudflare dashboard to 'No query string'.
So far CF caches all of the above, and first-time visitors or search engines should get a super-fast page.
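To sanity-check that the rules behave as intended, you can inspect the cf-cache-status response header Cloudflare adds (HIT, MISS, EXPIRED, BYPASS or DYNAMIC). A minimal Python sketch, where example.com and the asset path are placeholders for your own URLs:

# Report Cloudflare's cache status for a few URLs using only the standard library.
import urllib.request

URLS = [
    "https://example.com/",                               # page rule 3: Cache Everything
    "https://example.com/wp-content/uploads/sample.jpg",  # page rule 1 (hypothetical asset)
]

for url in URLS:
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        print(url)
        print("  cf-cache-status:", response.headers.get("cf-cache-status", "n/a"))
        print("  cache-control:  ", response.headers.get("cache-control", "n/a"))

Fetching the same URL twice is a quick test: the first request is usually a MISS and the second a HIT once the edge has stored it.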
Next, I've added the following in the site's footer:
<script>
jQuery(document).ready(function () {
  // Append the current timestamp to every link so clicks request an uncached URL.
  var cacheBuster = "?" + (new Date).getTime();
  jQuery("a").each(function () {
    jQuery(this).attr("href", jQuery(this).attr("href") + cacheBuster);
  });
});
</script>
This script appends the current timestamp to every link on the page. By doing this I want the visitor to get the latest version of the page (i.e. from my server), not the one stored by CF, because CF should not cache URLs such as https://example.com/samplepage?234523445345, as it was instructed previously in both the cache settings and the page rules.
Now, what I'm worried about is CF caching pages belonging to logged-in members, such as account details. While the JavaScript does work, and members would click a link such as /account?23456456 so the page should not get cached, I still have to wonder 'what if?'.
So, is there any better way to achieve what I'm trying to do (fast loading without caching members' pages and sensitive details, such as the shopping cart)? Or is this the maximum I can get out of the free plan?

In your case it is entirely a WordPress site? Then it is actually simpler to optimise than other platforms. Cloudflare has a newer service called Automatic Platform Optimization (APO). Enable it in your Cloudflare dashboard, install the official Cloudflare plugin in WordPress, and connect WordPress to Cloudflare through APO. Then let it cache everything from your origin server. This will reduce the TTFB and RTT, and those two improvements will definitely help your site's performance and speed.

Related

Optimizing Google Search Appliance on a remote server

I'm planning to deploy a Google Search Appliance to remotely index an intranet site (transcontinentally). So I will be using the company's network and potentially consuming too much bandwidth.
Regarding the configurations that I can use to mitigate the effect of the initial crawl (which is the only one that is perceived as dangerous for the network) we have:
Crawl and Index > Host Load Schedule
Web Server Host Load: basically the number of concurrent connections to the crawled servers within 1 minute, so minimizing this setting should reduce the load on the network.
Exceptions to Web Server Host Load: this is a schedule used for either increasing or decreasing the number of concurrent connections to the crawled server.
Crawl and Index > Crawl Schedule
Instead of a continuous crawl I should choose a scheduled crawl.
Am I on the right track and can other settings be configured in order not to generate excessive network traffic between the GSA and the Web servers?
The best way to minimize the crawling of a remote site is to not crawl it. Failing that, there are a couple of settings that will help, as noted above:
1) Host Load Schedule
This sets the number of concurrent threads the crawler uses for the host. Note that this can be a fractional number, including values below 1 (also noted by BigMikeW).
2) Freshness Tuning
Crawl infrequently actually means "crawl never again". This works well in conjunction with a meta-URL feed, which tells the GSA to recrawl the page, or a recrawl request from the administrative console. Crawl frequently actually means "crawl once per day". This setting doesn't mean much now that the crawler has been retuned and the hardware is faster: the GSA will submit requests several times a day to the pages it finds.
3) Crawl schedule
I find it's better not to turn off the crawler, but rather to keep it in continuous mode and set the threshold to zero. This allows the natural GSA algorithms to play out. Anything you wish to achieve by scheduling can be achieved by tuning the load to zero for the periods you want the crawler quiet.
My recommendation for minimizing WAN traffic:
1) Review DNS and add an override if necessary to ensure you are routing to nearest content source
2) Set the content sources pattern to crawl infrequently
3) Create a meta url feed to push content updates.
The last one would take a bit of coding. There is an example sitemap feeder here:
https://code.google.com/p/gsafeedmanager/
With this configuration, the GSA will never recrawl the content and will rely on the feed to inform it of updates.
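The feed itself is just an XML document POSTed to the appliance's feed port. A rough sketch of a metadata-and-url push, assuming the requests library is available and that the appliance hostname, datasource name and record URL are placeholders for your own (the feeder host also has to be allowed on the GSA's Feeds page):

# Push a metadata-and-url feed so the GSA recrawls only the listed pages.
import requests

GSA_FEED_URL = "http://gsa.example.com:19900/xmlfeed"   # hypothetical appliance host

feed_xml = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "gsafeed.dtd">
<gsafeed>
  <header>
    <datasource>intranet</datasource>
    <feedtype>metadata-and-url</feedtype>
  </header>
  <group>
    <record url="http://intranet.example.com/docs/updated-page.html"
            mimetype="text/html"
            last-modified="Tue, 01 Jul 2014 09:00:00 GMT"/>
  </group>
</gsafeed>
"""

# The /xmlfeed endpoint expects a multipart POST with the feed type,
# the datasource name, and the feed XML itself in the "data" field.
response = requests.post(GSA_FEED_URL, files={
    "feedtype": (None, "metadata-and-url"),
    "datasource": (None, "intranet"),
    "data": ("feed.xml", feed_xml, "text/xml"),
})
print(response.status_code, response.text)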
Alternate:
1) Ensure the content source responds to HEAD requests with Last-Modified dates, and do not configure crawl infrequently. The GSA will detect deltas and slow the crawl down over time.
Yes, I would also look at Freshness Tuning and Duplicate Hosts. The relevant settings under Crawl and Index are:
Host Load Schedule: Web Server Host Load, Exceptions to Web Server Host Load
Crawl Schedule: Crawl Mode
Freshness Tuning: Crawl Frequently, Crawl Infrequently
As Tan Hong Tat says, look at Freshness Tuning and Duplicate Hosts.
I would set it to crawl infrequently at least until the initial crawl has completed.
Also do some content analysis. Using the Crawl patterns you can direct the GSA to ignore certain content types (based on file extension) or areas of the intranet that don't contain content of value to the search experience.
When you're setting the host load, remember that you can use decimal values between 0 and 1, e.g. 0.1.
If they have a decent WAN optimizer in place you may find this is less of an issue than you think.

NGINX and memcached - full page caching and TTL

I'm using nginx, memcached and APC for all my sites. What I host is a WordPress site, a vBulletin forum and some other sites.
I've set up nginx and memcached so that nginx first checks the memcached server to see if it has an entry for the full page; if it doesn't, it passes the request along to PHP, caches the full page, and then displays it to the user. See this link for the configuration: http://pastebin.com/ZFSrA9e5
Currently the vBulletin forum is using the "$config['Datastore']['class'] = 'vB_Datastore_Memcached';" and the WP blog is using the Memcached Object Cache (http://wordpress.org/extend/plugins/memcached/)
I am only caching WP as the full page in memcached (as explained above) at the moment to see if I run into any issues - so far so good.
What I want to achieve is good loading times and low load. The issues/questions I've run into are these:
What happens when, for example, a user logs in for the first time and memcached caches the page generated for that first user? Then the next user comes along and memcached serves them the first user's cached page. Does anything take this into account/prevent it?
How/when will memcached/nginx flush the full-site cache in order to update the cache?
Is it recommended to run both APC and memcached? As far as I'm aware, memcached caches small values and APC caches the compiled PHP code, correct?
Would be awesome if someone could enlighten me on these questions.
1) Your cache response depends solely on this:
set $memcached_key "wordpress:$request_uri";
So each cached entry depends only on the URI, and user auth information is not taken into account. A second request will get the same page as the first one because it produces the same memcached key. If you want to store separate cache keys for each logged-in user, you'll need to set a more distinct key, something like this:
set $memcached_key "wordpress:$request_uri$scheme$host$cookie_PHPSESSID";
2) This depends on the WP plugin. Nginx never flushes the cache; to force a flush you'll need to restart memcached (or send it the flush_all command - see the sketch below).
3) Yes - they do different things. APC caches compiled PHP code, so it doesn't have to be compiled on every request (it only recompiles after a server restart or when a PHP file is changed). Memcached stores portions of a page, or the whole page (your scenario), in memory, and when the key provided by nginx is found in memcached, PHP is not even involved - the whole page is served directly from memcached's memory.
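For what it's worth, a full restart isn't strictly required to empty memcached: its text protocol has a flush_all command that expires every key. A minimal sketch, assuming memcached is listening on 127.0.0.1:11211:

# Expire every key in memcached via its text protocol, without restarting it.
import socket

with socket.create_connection(("127.0.0.1", 11211), timeout=5) as conn:
    conn.sendall(b"flush_all\r\n")
    print(conn.recv(64).decode().strip())   # expected reply: OK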
hope this helps)

is WordPress suitable for a site which has 317k pageviews per week

I had a meeting with a local newspaper company's owner. They are planning to have a newly designed website. Their current website is static and doesn't have any kind of database, but their weekly pageview figure is around 317k, and this figure will surely increase in the future.
The question is: if I create a WordPress system for them, will the website run smoothly with new functionality (news, maybe galleries)? It is not necessary to use lots of plugins. Can their current server support a WordPress install without any upgrade?
Or should I consider building the website in plain PHP instead?
Yes - so long as the machinery for it is adequate, and you configure it properly.
If the company uses a CDN (like Akamai), ask them if this site can piggyback on their account, then make them do it anyway when they throw up a political barrier. Then stop sweating it, turn keepalive on and ignore everything below this line. Otherwise:
If this is on a VPS, make sure it has guaranteed memory and I/O resources - otherwise host it on a hardware machine. If you're paranoid, something with a 10k RPM drive and 2-3 gigs of RAM will do (memory so Apache and MySQL have breathing room, and the hard drive for unexpected swap file compensation).
Make sure the 317k/w figure is accurate:
If it comes from GA/Omniture/another vendor tracking suite, increase the figures by about 33-50% to account for robots that they can't track.
If the number comes from house stats/httpd logs, assume it's 10-20% less (since robots don't typically hit you up for stylesheets and images.)
If it comes from combined reports by an analyst whose job it is to report on their own traffic performance, scratch your head and flip a coin.
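To put the weekly figure in perspective, it helps to translate it into requests per second. A rough back-of-the-envelope sketch; the assets-per-pageview and burst-multiplier values are assumptions you should tune to the real traffic:

# Rough capacity estimate from weekly pageviews.
weekly_pageviews = 317_000
avg_pageviews_per_sec = weekly_pageviews / (7 * 24 * 3600)    # ~0.52 pageviews/s

assets_per_pageview = 20   # assumed images/CSS/JS fetched per page view
burst_multiplier = 10      # assumed lunchtime/afternoon peak vs. the weekly average

avg_requests_per_sec = avg_pageviews_per_sec * (1 + assets_per_pageview)
peak_requests_per_sec = avg_requests_per_sec * burst_multiplier

print(f"{avg_pageviews_per_sec:.2f} pageviews/s average, "
      f"~{avg_requests_per_sec:.0f} req/s average, ~{peak_requests_per_sec:.0f} req/s at peak")

Even with those pessimistic assumptions the peak lands around a hundred requests per second, which a single well-configured machine with opcode and page caching can absorb.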
Apache: News sites in America have lunchtime and workday wind-down traffic bursts at around 11 am and 4 pm, so you may want to turn KeepAlive off (having it on will improve things during slow traffic periods, but during burst times the machine will spin into an unrecoverable state).
PHP: Make sure some kind of opcode caching is enabled on the hosting machine (either APC or eAccelerator). With opcode caching, the memory footprint drops significantly and the machine doesn't have to borrow as much from the swap file on the hard drive.
WP: Make sure you use WP3.4, as ticket http://core.trac.wordpress.org/ticket/10964 was closed in favor of this ticket's fix: http://core.trac.wordpress.org/ticket/18536. Both longstanding issues address query performances on large volume sites, but the overall improvements/fixes help everywhere else too.
Secondly, make sure to use something like the WP Super Cache caching plugin and configure it appropriately. If the volume of content on this site is going to stay small, you shouldn't have to take any special precautions - otherwise you may want to alter the plugin/rules so as to permanently archive older content into static files. There is no reason why 2-year-old content should be constantly re-spidered at full resource cost.
Robots.txt: prepare and properly register a dynamic sitemap with Google/Bing/etc. If you expect posts to be unnecessarily peppered with a bunch of tags and categories by people who don't understand what they actually do, you may want to Disallow /page/*, /category/* and /tag/*. Otherwise, when spider robots swarm the site, every post will be hit once more for each tag/category it has. And then some.
For several years The Baltimore Sun hosted their reader reward, sports and editorial database projects directly off a single colocated machine. Combined traffic volume was several times larger than what you mention, but it was adequately handled.
Here's a video of httpd status w/keepalive on during a slow hour, at about 30 req./sec: http://www.youtube.com/watch?v=NAHz4GRY0WM#t=09
I would not exclude WordPress for this project based only on a weekly pageview count of under a million. I have hosted WordPress sites that receive much, much more traffic and were still very functional. Whether or not WordPress is the best solution for this type of project based on your other criteria is completely up to you.
Best of luck and happy coding!
WP is capable of handling huge traffic. See this list of people who are using WP VIP services:
Time, Dow Jones, NBC Sports, CNN and many more.
Visit WordPress VIP site: http://vip.wordpress.com/clients/

How to find concurrent requests per second made to the server using Google Analytics?

Our website has been running for the past 6 months.
I would like to know how to detect the number of concurrent users we are serving, or the requests per second we are getting, so that we can do some performance tweaking.
We use Apache, PHP(Typo3 CMS), Google Analytics and AWStats.
Thank you.
The new Google Analytics Interface has an option to view the users in real time.
This will only show you the views of your HTML pages though (or any call to GA, like file downloads if configured).
It will not show you people accessing assets such as images, CSS or JavaScript files.
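Since you already run Apache and AWStats, the raw access log is the most direct source for a requests-per-second figure, and unlike GA it includes assets and bots. A minimal sketch, assuming the default Apache log format (timestamp between square brackets) and a hypothetical log path:

# Count requests per second from an Apache access log and show the busiest seconds.
import re
from collections import Counter

LOG_PATH = "/var/log/apache2/access.log"   # adjust to your vhost's log file
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})")

per_second = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = TIMESTAMP.search(line)
        if match:
            per_second[match.group(1)] += 1

if per_second:
    average = sum(per_second.values()) / len(per_second)
    print(f"average over active seconds: {average:.1f} req/s")
    for second, hits in per_second.most_common(5):
        print(f"{second}  {hits} req/s")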
To increase the performance of TYPO3 there are a couple of things to consider:
USER instead of USER_INT plugins
if a user is logged in, switch the caching of the extension with a condition (see code at the end, SO doesn't like code in bullet lists)
use a PHP cache such as APC, see this discussion: apc vs eaccelerator vs xcache
use a reverse proxy, such as varnish in combination with the TYPO3 extension moc_varnish
plugin.tx_myplugin = USER
[loginUser = *]
plugin.tx_myplugin = USER_INT
[global]

Akamai: refresh cache before deployment and do cutover at a specified time

My objective is to achieve zero downtime during deployment. My site uses Akamai as a CDN. Let's say I have a primary and a secondary cluster of IIS servers. During deployment, the updates are made to the secondary cluster. Before switching over from primary to secondary, can I request Akamai to cache the content and do a cutover at a specified time?
The problem you are going to have is guaranteeing that your content is cached on ALL Akamai servers. Is the issue that you want to force content to be refreshed as soon as you cut over?
There are a few options here.
1 - Use a version in the requests, e.g. "?v=1". This version would always come from origin and would be appended to every request. As soon as you update your site, update the version on origin so that the next request appends "?v=2", thus "busting" the cache and forcing an origin hit for all requests (a small sketch of this idea follows after this list).
2 - Change your Akamai config to "honor web server TTLs". You can then set very low or almost 0 TTLs right before you cut over and then increase them gradually after the cutover.
3 - Configure Akamai to use If-Modified-Since. This will force Akamai to "validate" whether any requests have changed.
4 - Use ECCU, which can purge a whole directory. This can take up to 40 minutes, but should be manageable during a maintenance window.
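Option 1 is easy to wire into whatever renders your URLs. The helper below is only an illustration of the idea; the version constant is a placeholder that the origin would bump at each deployment:

# Append a deployment version to URLs so each release "busts" the CDN cache.
RELEASE_VERSION = 2   # bumped on origin with every deployment

def versioned(url: str) -> str:
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}v={RELEASE_VERSION}"

print(versioned("/static/css/site.css"))   # /static/css/site.css?v=2
print(versioned("/search?q=news"))         # /search?q=news&v=2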
I don't think this would be possible, based on my experience with Akamai (but things change faster than I can keep up with). You can flush content manually (at a cost), so you could flush /* - we used to do this for particular files during deployments (never /*, because we had over 1.2M URLs) - but I can't see how Akamai could cache a non-visible version of your site for an instant cutover without having some secondary domain and origin.
However I have also found that Akamai are pretty good to deal with and it would definitely be worth contacting them in relation to a solution.
