Why do we need CDNs when HTTP proxies already cache content?

CDNs seem to be a popular way of improving an app's performance.
But why are they needed when HTTP proxies on the web can already cache the content?

CDNs are a kind of web cache, just one operated under your auspices rather than the web user's. You get full control over the freshness of your content, whereas you have no control over the proxy servers "out there".

The user's proximity to your web server has an impact on response times. Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user's perspective. But where should you start?
Read full article at https://developer.yahoo.com/performance/rules.html

Related

Do CDNs prefetch a user's static data, or is it fetched on request?

As per Akamai:
A content delivery network (CDN) is a group of geographically distributed servers that speed up the delivery of web content by bringing it closer to where users are. Data centers across the globe use caching, a process that temporarily stores copies of files, so that you can access internet content from a web-enabled device or browser more quickly through a server near you. CDNs cache content like web pages, images, and video in proxy servers near to your physical location. This allows you to do things like watch a movie, download software, check your bank balance, post on social media, or make purchases, without having to wait for content to load.
So I am mainly interested in "post on social media"
Will the CDN prefetch a user's social media content (static?), or will it be fetched only at the user's request, since prefetching can be costly, complex and wasteful?
Prefetching generic static content makes sense: a website's landing page image, a viral video, product images (Amazon must use a CDN to deliver its product images, otherwise they would take a long time to load and make for a very bad user experience).
Whether it is social media content or a static website, content is not fetched by Akamai except when a request comes to the platform for the content. At that point, depending on how the platform is configured for your particular site, the content is cached for later users for a period of time.
A few notes:
Akamai can be configured to automatically request the additional content linked on the page without waiting for the end user's browser to issue the request. Details: https://techdocs.akamai.com/property-mgr/docs/prefetching
You can use prefetching for both cacheable and non-cacheable objects. Details: https://techdocs.akamai.com/property-mgr/docs/prefetchable-objects
You can configure Akamai to prefresh that content as it nears the expiration of the caching time limit, checking to see if it has been modified. Details: https://techdocs.akamai.com/property-mgr/docs/cache-prefresh-refresh
In the case of user profiles, for example, I would not use caching. Synchronizing all the data to keep it up to date is complicated.
CDN systems usually let you set whether to cache the file and how often to update it.
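As a rough illustration of that last point, here is a minimal Node.js/TypeScript sketch (the routes, max-age value and payloads are made up) that marks generic static assets as cacheable by CDNs and browsers while keeping per-user content out of shared caches:

```typescript
import http from "node:http";

// Minimal sketch: paths, max-age and payloads below are illustrative only.
const server = http.createServer((req, res) => {
  if (req.url?.startsWith("/static/")) {
    // Generic assets (images, scripts): let the CDN and browsers cache them for a day.
    res.setHeader("Cache-Control", "public, max-age=86400");
    res.end("...static asset bytes...");
  } else if (req.url?.startsWith("/profile/")) {
    // Per-user data: tell shared caches (CDN, proxies) not to store it at all.
    res.setHeader("Cache-Control", "private, no-store");
    res.end(JSON.stringify({ name: "example user" }));
  } else {
    res.statusCode = 404;
    res.end("not found");
  }
});

server.listen(8080);
```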

Why do we still use HTTP instead of WebSockets for building Web Applications?

Recently I dived into the topic of WebSockets and built a small application that utilizes them.
Now I wonder why HTTP-based APIs are still being used, or rather, why they are still being proposed.
As far as I can see, there is nothing possible via HTTP that I can't do with WS, while the other way round I gain a lot of improvements.
What would be a real-world example of an application that benefits more from an HTTP-powered backend than from a WS one?
Julian Reschke made good points. The web is document-based; if you want your application to play in the WWW, it has to comply with the rules of the game.
Still, you can create WS-based SPA applications that comply with those rules.
Using the HTML5 history API, you can change the URL shown by the browser without causing navigation. That allows you to have a different URL in your address bar depending on the state of your app, enabling bookmarking and page history. The "ui-router" plugin for AngularJS plays very well here, changing the URL if you change the state programmatically, and vice versa.
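A minimal sketch of that idea with the plain HTML5 history API, without AngularJS or ui-router (the state names and the renderState helper are hypothetical):

```typescript
// Change the URL shown in the address bar without triggering a navigation.
function goToState(stateName: string): void {
  history.pushState({ state: stateName }, "", `/app/${stateName}`);
}

// Restore application state when the user presses Back/Forward.
window.addEventListener("popstate", (event: PopStateEvent) => {
  const state = (event.state as { state?: string } | null)?.state ?? "home";
  renderState(state);
});

// Hypothetical stand-in for the SPA's own rendering/routing logic.
function renderState(stateName: string): void {
  console.log(`rendering view for state: ${stateName}`);
}
```

This gives you bookmarkable URLs and working page history even though the app never performs a full page load.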
You can make your SPA crawlable.
But you still want to use HTTP for some other things, like getting resources or views and caching them using HTTP cache mechanisms. For example, if you have a big application, you want some big views to be downloaded on demand rather than packing everything into one big main view.
It would be a pain to implement your own caching mechanism for HTML views, fetching them and caching them in local storage yourself, for example. Also, by using traditional HTTP requests, those views can be cached by CDNs and other proxy caches.
WebSockets are great for maintaining "connected" semantics, sending data with little latency and getting data pushed from the server at any time. But traditional HTTP requests are still better for operations that can benefit from distribution mechanisms like caching, CDNs and load balancing.
About REST API vs WebSocket API (I think your question was actually about this), it is more a matter of convenience than preference. If your API has a high call rate per connection, a WebSocket probably makes more sense. If your API has a low call rate, there is no point in using WebSockets. Remember that although a WebSocket connection is lightweight, it means that something is being held on the server (i.e. connection state), and that may be a waste of resources if the request rate does not justify it.
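To make that last point concrete, here is a minimal server-side sketch (assuming the Node "ws" package; the port and the bookkeeping are illustrative) showing that every open WebSocket keeps some state alive on the server until the client disconnects:

```typescript
import { WebSocketServer, WebSocket } from "ws";

// Each open connection holds state on the server for as long as the client stays connected.
const connections = new Set<WebSocket>();

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  connections.add(socket); // memory held for the lifetime of the connection
  console.log(`open connections: ${connections.size}`);

  socket.on("close", () => {
    connections.delete(socket); // released only when the client goes away
  });
});
```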
Bookmarking? Page history? Caching? Visibility to search engines?
HTTP and WebSockets are two Web tools originated to accomplish different tasks.
With HTTP you typically implement the request/response paradigm.
With WebSockets you typically implement an asynchronous real-time messaging paradigm.
There are several applications where you need both the paradigms.
You can also try to use WebSockets for request/response and HTTP for the asynchronous real-time messaging paradigm. While the former makes little sense, the latter is a widespread technique, necessary in all the cases where WebSockets do not work (due to network intermediaries, lack of client support, etc.). If you are interested in this topic, check out this other answer of mine, which tries to clarify the terminology related to these techniques: Is Comet obsolete now with Server-Sent Events and WebSocket?
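To make the two paradigms concrete, here is a small client-side sketch (the URLs and payload shapes are invented for illustration): a one-off request/response over HTTP next to a long-lived WebSocket that receives pushed messages:

```typescript
// Request/response over HTTP: one question, one answer, then the exchange is over.
async function loadOrders(): Promise<unknown> {
  const response = await fetch("https://example.com/api/orders"); // hypothetical endpoint
  return response.json();
}

// Asynchronous messaging over WebSocket: the server can push data at any time.
function subscribeToUpdates(onUpdate: (data: unknown) => void): WebSocket {
  const socket = new WebSocket("wss://example.com/updates"); // hypothetical endpoint
  socket.addEventListener("message", (event) => {
    onUpdate(JSON.parse(event.data));
  });
  return socket;
}
```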

Is it possible to add logic to CDN

Is it possible to serve two different pages based on the user agent?
I want to serve pagename-iphone.html if the user agent matches iPhone and pagename-other.html for all other user agents. I want all pages on the site to follow this pattern. Is it possible to do this at the CDN level (cloud front, akamai etc).
thanks for your help!
I think what you are after is User-Agent-based caching, i.e. Vary: User-Agent.
In theory, a caching server can definitely do so; however, as far as I can tell, CloudFront and most other major CDN providers don't support it.
The basic reason is straightforward: there are too many distinct User-Agent values today, the header is almost unique to every single browser, not to mention the different versions of the same browser. If you key purely on the whole User-Agent, you will lose the benefit of the CDN cache most of the time.
Some of the more advanced servers allow you to add conditions based on headers; in Varnish, for example, you can even add if/else logic to return different values. But this is not available for the majority of CDNs.
On the other hand, you should not rely on the CDN to return different HTML pages. A CDN is more commonly used to accelerate artifacts (JS/CSS/images) than whole pages.
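If you do move that logic to your own origin (or an edge function in front of it), a minimal sketch looks like this; the two file names follow the question's naming, everything else is illustrative, and Vary: User-Agent tells any cache in front that the response depends on that header:

```typescript
import http from "node:http";
import { readFile } from "node:fs/promises";

// Sketch only: pagename-iphone.html / pagename-other.html follow the question's naming.
const server = http.createServer(async (req, res) => {
  const userAgent = req.headers["user-agent"] ?? "";
  const page = /iPhone/i.test(userAgent) ? "pagename-iphone.html" : "pagename-other.html";

  // Tell downstream caches (including a CDN) that the response varies by User-Agent.
  res.setHeader("Vary", "User-Agent");
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.end(await readFile(page, "utf-8"));
});

server.listen(8080);
```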
EDIT: Actually, I just received an email from AWS mentioning that CloudFront now supports this:
Mobile Device Detection: You can now cache and deliver customized content to your viewers on different devices (e.g. mobile vs. desktop) based on the value of the User Agent header.
Please refer to http://aws.amazon.com/about-aws/whats-new/2014/06/26/amazon-cloudfront-device-detection-geo-targeting-host-header-cors/ for more details.

Explanation of what CDNs do with "cached" resources?

I use shared hosting in conjunction with the CDN Cloudflare, and my site is definitely faster than before using the CDN. However, I'd like to have a better understanding of how the CDN and my hosting service interact. For example, suppose I have an image on a webpage, as well as an external JavaScript file. I know that the CDN "caches" these resources. But does that mean that instead of transferring the image/JavaScript from my shared hosting (which would "cost" me bytes transferred each month), the CDN does it, essentially giving me "free" transfer of these resources?
Yes, you are right if your CDN service provider doesn't charge any data transfer fees (such as free level CloudFlare).
Basically, what happens is that a user makes a request for an asset on your website; if that item is already cached on your CDN provider's edge node, it will be served from there.
The only time your server will get a hit is when the asset is not available in the cache or has expired.
But this does come at a cost; there is no free meal, and imagine how CloudFlare could survive if it simply allowed everyone to have a free meal.
For most free tiers there is no performance guarantee, so the cache hit ratio may not be high all the time. The CDN provider may allocate only limited memory to store cached objects for free customers, so there is a relatively higher chance that your assets are not available in the cache.
What's worse, if the content is not available in the cache, your CDN provider will need to fetch the file from your origin server. A direct request from the end user to your page then becomes two requests, which can actually increase the loading time.
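One way to observe this hit/miss behaviour is to look at the cache status header the CDN attaches to responses. A small sketch (Cloudflare typically reports this in cf-cache-status, other CDNs often use x-cache or similar; the URL is a placeholder):

```typescript
// Request a resource twice and print the CDN's cache status header each time;
// the second request is more likely to be a HIT once the edge has cached the file.
async function checkCacheStatus(url: string): Promise<void> {
  for (let i = 1; i <= 2; i++) {
    const response = await fetch(url);
    const status =
      response.headers.get("cf-cache-status") ?? response.headers.get("x-cache");
    console.log(`request ${i}: cache status = ${status ?? "not reported"}`);
  }
}

checkCacheStatus("https://example.com/static/logo.png").catch(console.error); // placeholder URL
```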

implications of having a site in full https

I am currently developing an MVC4 web application for eCommerce. The site will contain a login and users can visit the site, input their details and submit orders etc. This is a traditional eCommerce site.
To boost the security of the site, I am looking to set up the entire site in https. As the user will be supplying their login credentials and storing personal information in cookies, I would like the site to be fully secured.
I have concerns though, these being if I set up the site in https, will it detriment performance? Will it impact negatively on search engine optimization? Are there any other implications of having an entire site in https?
I use output caching to cache the content of my views - with https will these still get cached?
I have been reviewing security guidelines and documentation, such as this from OWASP and they recommend this. Also, I see that sites such as twitter are fully https.
Generally speaking, no - whole-site encryption is not a problem for performance.
(Just make sure you disable the obsolete SSL 2.0 and SSL 3.0 protocols on your server; the BEAST attack targets CBC cipher suites in TLS 1.0 and earlier, so prefer TLS 1.2 or later, which practically every modern browser supports.)
The performance issues were a problem years ago, but not anymore. Modern servers have the capacity to deal with the encryption of hundreds of requests and responses every second.
You haven't mentioned deploying a load-balancer or failover system, which implies your site won't be subject to thousands of pageviews every second. That's when you need to start using SSL offloaders - but you're okay for now.
Output caching is not affected by encryption; just make sure you're not serving one person's output to another (e.g. keep a shopping cart or banking details in Session, or include the session ID in the cache key).
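A framework-agnostic sketch of that last point (the in-memory cache and key format here are made up for illustration): include the session ID in the cache key so one user's cached output is never handed to another user:

```typescript
// Simple in-memory output cache keyed by (sessionId, path).
const outputCache = new Map<string, { body: string; expiresAt: number }>();

function cacheKey(sessionId: string, path: string): string {
  // Including the session ID keeps one user's cached page out of another user's response.
  return `${sessionId}:${path}`;
}

function getCached(sessionId: string, path: string): string | undefined {
  const entry = outputCache.get(cacheKey(sessionId, path));
  if (!entry || entry.expiresAt < Date.now()) return undefined;
  return entry.body;
}

function setCached(sessionId: string, path: string, body: string, ttlMs: number): void {
  outputCache.set(cacheKey(sessionId, path), { body, expiresAt: Date.now() + ttlMs });
}
```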

Resources