I'm implementing a high-traffic client web application that uses a lot of REST APIs for its data access layer, backed by a cloud database. I say client because it consumes REST rather than providing it.
REST APIs are consumed on the server side as well as the client side, and I need to figure out a good solution for caching. The application runs on a web farm, so I'm leaning toward distributed caching such as memcached. This caching solution will need to act as a proxy layer between my application and the REST APIs and support both the client side and the server side.
For example, if I make a call to update a record, I would update it through REST and keep the updated record in the cache, so that subsequent calls for that record won't need an extra call to the outside REST services.
I want to minimize REST calls as much as possible and keep the data as accurate as I can, but it doesn't need to be 100% accurate.
What is the best solution for this caching proxy? Is it a standalone application that runs on one of the servers with a local cache, or should it be built into the current solution using distributed caching? What are your ideas, suggestions, or concerns?
You hit the nail on the head. You need a caching layer that acts as a proxy to your data.
I suggest that you create a layer that abstracts the concept of the cloud away a bit. Your client shouldn't care where the data comes from. I would create a repository layer that communicates with the cloud and all other data sources. Then you can put a service layer on top of that, which your client would actually call into. Inside this service layer is where you would implement things like your caching layer.
I used to always suggest using MemCached or MemCached Win32 depending on your environment. MemCached Win32 works really well if you are in a Windows world! Look at the Enyim client for MemCached; it is the least problematic of the ports.
If you are open to it, though, and you are in a .NET world, then you might try Velocity. MS finally got the clue that there was a hole in their caching framework in that they needed to support the farm concept. Last time I checked, Velocity was not out of beta yet, but it is still worth a look.
I generally suggest using the repository and service layer concepts from day one, even if you don't need them yet. The flexibility they provide is worth having, as you never know which direction your application will need to be pulled in. Needing to scale is usually the best reason to want this flexibility. But when you need to scale, you usually need to scale now, and refactoring in a repository layer and a service layer down the road, while not impossible, is usually fairly complex.
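To make the layering concrete, here is a minimal sketch in Python (chosen only for brevity; the same shape applies in .NET). All names here (`InMemoryCache`, `RestRecordRepository`, `RecordService`) are hypothetical, the dictionary-backed cache merely stands in for a distributed cache such as memcached, and the repository simulates REST round trips rather than making real ones.

```python
from typing import Dict, Optional


class InMemoryCache:
    """Stand-in for a distributed cache such as memcached (assumption)."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._items.get(key)

    def set(self, key: str, value: dict) -> None:
        self._items[key] = value


class RestRecordRepository:
    """Hypothetical repository; a real one would issue REST calls."""

    def __init__(self) -> None:
        self.remote: Dict[str, dict] = {}  # simulates the cloud data
        self.calls = 0                     # counts simulated round trips

    def fetch(self, record_id: str) -> dict:
        self.calls += 1
        return dict(self.remote[record_id])

    def update(self, record_id: str, fields: dict) -> dict:
        self.calls += 1
        merged = {**self.remote.get(record_id, {}), **fields}
        self.remote[record_id] = merged
        return dict(merged)


class RecordService:
    """Service layer: the only thing the client calls; caching lives here."""

    def __init__(self, repo: RestRecordRepository, cache: InMemoryCache) -> None:
        self._repo = repo
        self._cache = cache

    def get(self, record_id: str) -> dict:
        cached = self._cache.get(record_id)
        if cached is not None:
            return cached                      # cache hit: no REST call
        record = self._repo.fetch(record_id)   # cache miss: one REST call
        self._cache.set(record_id, record)
        return record

    def update(self, record_id: str, fields: dict) -> dict:
        record = self._repo.update(record_id, fields)
        self._cache.set(record_id, record)     # keep the updated record cached
        return record
```

With this shape, calling `update` and then `get` for the same record costs one simulated REST call, not two. Expiry and invalidation across the farm are left out of the sketch and would need a policy of their own.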


Do we need Application Tier for Single Page Application

We are designing a new SPA. Initially, we planned to have a three-tier application:
Application Tier (which will serve web pages and will behave as a proxy for all other data requests).
Business Tier (This will host WebAPIs for all business functionality).
Database Tier (To store the data).
Since this application will be an HTML5 application, can we bypass the Application Tier for data requests and call the Business Tier directly from the browser?
One downside we can see is that it will result in CORS requests, and for some requests a preflight will also come into the picture. That may slow it down a bit.
And if data is formed by combining data from calls to external services, this logic will have to be moved to the browser.
Could you please suggest something on these lines?
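To make the preflight concern concrete, here is a rough sketch of the rule browsers apply when deciding whether a cross-origin request needs a preflight. It is deliberately simplified (it ignores header value restrictions and a few other conditions), and the function name `needs_preflight` is hypothetical.

```python
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method: str, headers: dict) -> bool:
    """Simplified check: does a cross-origin request trigger an OPTIONS preflight?"""
    if method.upper() not in SIMPLE_METHODS:
        return True  # e.g. PUT, DELETE, PATCH
    for name, value in headers.items():
        lowered = name.lower()
        if lowered not in SAFELISTED_HEADERS:
            return True  # e.g. Authorization or any custom X- header
        if lowered == "content-type":
            media_type = value.split(";", 1)[0].strip().lower()
            if media_type not in SIMPLE_CONTENT_TYPES:
                return True  # e.g. application/json
    return False
```

So a plain `GET` goes straight through, while a `POST` with `Content-Type: application/json` pays for an extra round trip. The server can reduce that cost by sending `Access-Control-Max-Age`, which lets the browser cache the preflight result for a while.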
Tiers depend on what your architecture goals are. If the goal is performance, then fewer tiers and less physical separation will result in better performance. However, there are other techniques you can use to mitigate performance costs, such as caching.
In a typical scenario like this, it's better to separate the layers into separate physical layers such as a UI layer, an API layer and a data layer. Also, if you think about it carefully, an API is a sort of UI too: instead of delivering pages, you are delivering data (JSON).
Keep in mind your business logic should be testable without involving any other layers.
I think the correct answer will always depend on your situation and on architectural goals such as performance, scalability, reliability, ease of change, etc.
But you can take this as a general guidance.

NevaTech Sentinet

I was reading through Nevatech Sentinet last week and I'm currently asking myself the following question: "Given that Nevatech Sentinet exists, with all the features it lists, why should anyone use BizTalk with the ESB Toolkit and extend it with Sentinet?"
Am I missing something, or is Sentinet able to handle everything that BizTalk with the ESB Toolkit can do, and more?
Essentially you are talking about 2 very different products here.
Sentinet is a very good tool if you are thinking about centralized API management. It is very good at what it does in that niche.
BizTalk, on the other hand, is an ESB using a publish/subscribe architecture. BizTalk also has various ways of connecting to non-trivial systems like SAP, DB2, Siebel, MSMQ, etc. It can also do EDI/AS2/X12, flat file parsing and so on.
You can set it up as an ESB and/or message broker/hub/etc.
In your specific case (I'm guessing web services related?) it might seem that BizTalk and Sentinet are similar, yet the two are very different and actually complement each other rather nicely.
As you see, these are 2 completely different products, and together they might actually be a perfect match in your case.
Some more clarification after your comment:
Using BizTalk does not necessarily mean you have to use ESB Toolkit. BizTalk can perfectly act as an ESB without the ESB Toolkit.
The main benefit of using Sentinet is definitely API management. What you are focusing on when "doing" API management with Sentinet is forming a single layer of APIs within Sentinet. Often you would point all your clients (both internal and external) to Sentinet, where you would host all of your services virtually. You gain a lot of control that way and can add security, versioning, load balancing, SLA reporting, etc. to your existing services without any hassle.
Another thing Sentinet is quite good at is low-latency services. This is something BizTalk is not particularly good at, since it will persist everything to its database to prevent losing messages. (I once used Sentinet in a POC, set up a virtual service calling an existing, external service with additional enrichment, and easily covered 200+ transactions per second.)
BizTalk on the other hand is middleware. That's a whole other playing field.
It's very good at connecting different systems to each other using different protocols, mapping messages to other formats (XML, flat file or EDI), adding business logic to your flows, integration patterns, long-running flows, loose coupling, etc. You wouldn't want to use it as a virtual service, since it will persist everything to its message box!
Hopefully you can now see that they are quite different tool sets.
They do play along nicely next to each other though: hosting your BizTalk web services in a virtual web service on Sentinet has a lot of advantages, especially in a fast-paced environment.
One addition to @Peter's great answer:
I almost always install the ESB ToolKit, if nothing else for the centralized exception handling (EsbExceptionDb). You may or may not want to use the itinerary and other services in the toolkit, but the exception DB is very handy when trying to debug and potentially resubmit messages.

Deprecating ASP.NET Web Methods

I have some internal-facing ASP.NET web services that have had numerous API additions over the years. Some of the original web methods, while still available for consumption, have recommended replacements available. I would like to steer consuming clients toward using these new methods so I can retire and eventually remove their elders.
If this were a client API rather than a web service API, I'd just mark the offending methods with the obsolete attribute. But .NET attributes do not get serialized and are not visible to consuming developers when they add or refresh web references.
What techniques are recommended for obsoleting ASP.NET web methods? Is there anything built into the tooling (VS2005-2010)? I don't want to break any of the existing clients, so I can't simply remove the web methods outright or change their internal behavior to report their usage as erroneous.
Tim, the short answer to this is unfortunately that you have to contact those clients and communicate the change with them and agree on timelines etc. There might be something that you can do to smooth the process over for them, particularly if they are not IT savvy clients and had to get their applications built by external contractors.
You can butter this up any way you like for them, really, from "the system is going to be replaced" to "we are doing it bigger, better and faster".
Additionally, you can build in code to slow the old methods down (NOT RECOMMENDED), and then when clients inquire you can tell them, "We don't support that system any longer; it has been replaced by system 'X'."
If the new methods you are talking about are still just web methods, you can simply point the old ones to the new ones and let the clients keep using the old ones.
Another option is to identify the clients stuck on the old methods, get their IP addresses, and lock the old methods down so that only they can use them. This way you ensure new clients will not attempt to connect to the old methods.
Other than that, I cannot think of anything that will not be a pain or difficult for both you and the client.
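As a rough, language-neutral sketch of "point the old ones to the new ones" (written in Python rather than as ASP.NET web methods, with hypothetical names throughout), the legacy method can forward to its replacement while recording who still calls it. That record also tells you which clients to contact.

```python
import logging
from collections import Counter

logger = logging.getLogger("deprecated_api")

# (legacy method name, client id) -> number of calls seen
legacy_calls: Counter = Counter()


def get_customer_v2(customer_id: int) -> dict:
    """Hypothetical replacement method."""
    return {"id": customer_id, "name": f"customer-{customer_id}"}


def get_customer(customer_id: int, client_id: str = "unknown") -> dict:
    """Legacy method kept for existing clients; forwards to the replacement.

    The response is unchanged, so existing clients do not break.
    """
    legacy_calls[("get_customer", client_id)] += 1
    logger.warning("get_customer is deprecated; called by %s", client_id)
    return get_customer_v2(customer_id)
```

How the caller is identified (`client_id` here) is an assumption; in practice it might be an IP address or a credential taken from the request.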

Will a REST interface slow down my search engine?

To get a search website built quickly I plan to split the work between two teams: One to build the search engine and one to build web UIs (mobile/desktop). My plan is to build the search engine as a set of REST services based on .NET 3.5. UIs may be built using some other technology.
Questions: is the REST interface likely to be a performance bottleneck? How best to avoid this?
REST is unlikely to be a bottleneck in this scenario. It wasn't clear from your post whether you were making REST calls directly from your HTML UI on the client, or whether you were making server-to-server REST calls on the back end. So I'll cover both cases below.
If your REST calls are being made between your client UI and your servers, then using REST or another HTTP remoting approach matters relatively little-- the time it takes to execute the search on the back end and then send the results back down to the client should dwarf the impact of the REST call itself. If you want to improve perf, focus on client-side networking tricks (e.g. HTTP Compression, proper caching headers, etc) and optimizing your search engine itself.
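As a small illustration of those two networking tricks, here is a sketch that builds a search response with a caching header and optional gzip compression. The function name, the payload shape and the 60-second `max-age` are assumptions for the example, and the `Accept-Encoding` check is deliberately naive (it ignores quality values).

```python
import gzip
import json


def build_search_response(results: list, accept_encoding: str = "", max_age: int = 60):
    """Return (headers, body) for a JSON search response."""
    body = json.dumps({"results": results}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Lets browsers and intermediaries reuse the response for max_age seconds.
        "Cache-Control": f"public, max-age={max_age}",
        # Caches must keep compressed and uncompressed variants apart.
        "Vary": "Accept-Encoding",
    }
    if "gzip" in accept_encoding.lower():
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body
```

Search results tend to be repetitive JSON, so they usually compress well; whether a 60-second cache lifetime is acceptable depends on how fresh your results need to be.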
If your architecture is one tier of servers (hosting your web UI) calling another tier (your search engine), then calling between those tiers over REST also shouldn't add too much to your overall latency. This is because (same as above) running the search and sending results back down to the client will usually take a few hundred milliseconds at least, and the overhead of the back-end REST call (if done properly) will usually be 50ms or less.
That said, it's easy to mess up the client end of server-to-server HTTP calls. For example, many HTTP client libraries (including .NET's) will by default limit the number of concurrent client connections, which makes sense if you're building an actual client app but will kill your scalability if used from a "client" that's actually a server serving hundreds of users concurrently. Other potential problems include authentication issues, proxy problems, DNS, etc. So build and configure your REST client code carefully, and be sure to load-test with a few hundred concurrent users!
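The effect of a low connection limit can be simulated without any real HTTP. In the sketch below a semaphore stands in for the client library's connection limit and a short sleep stands in for the back-end round trip; the names and the numbers are assumptions chosen for the demonstration.

```python
import asyncio


async def run_calls(total_calls: int, connection_limit: int, latency: float = 0.02) -> int:
    """Run simulated back-end calls and return the peak number in flight."""
    gate = asyncio.Semaphore(connection_limit)  # the connection limit
    in_flight = 0
    peak = 0

    async def call_backend() -> None:
        nonlocal in_flight, peak
        async with gate:                 # wait for a free connection
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(latency)  # stands in for the HTTP round trip
            in_flight -= 1

    await asyncio.gather(*(call_backend() for _ in range(total_calls)))
    return peak


def peak_concurrency(total_calls: int, connection_limit: int) -> int:
    return asyncio.run(run_calls(total_calls, connection_limit))
```

With 20 simultaneous user requests and a limit of 2, only 2 calls are ever in flight, so the remaining 18 queue behind them and the last user waits roughly ten round trips. Raising the limit above the number of concurrent callers removes that queue.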
No. REST is not (and generally cannot be) a bottleneck. REST is HTTP without the fancy HTML page. It's cheaper and faster than a regular web page.
I think it should not affect your performance. For building REST services properly, .NET has ASP.NET MVC, which supports REST fully.
Do remember to read through this link

Asynchronously Decoupled Three-Tier Architecture

Maybe I just expected "three-tier architecture" to deliver a little more than just a clean separation of responsibilities in the source code (see here)...
My expectations of a beast that can safely call itself "three-tier architecture" are a lot higher... so, here they are:
If you were to build something like a "three-tier architecture" system, but this time with these additional requirements and constraints:
Up and running at all times from a user's point of view
Except when the UI gets replaced
When other parts of the system are down, the UI has to handle that
Never get into an undefined state or one from which the system cannot recover automatically
The system has to be "pausable"
The middle-tier has to contain all the business logic
Obviously using an underlying Database, itself in the data-tier (if you like)
The business logic can use a big array of core services (here in the data-tier, not directly accessible by the UI, only through business logic tier facade)
Can be unavailable at times
Can be available as many parallel running, identical processes
The UIs may not contain any state other than the session in the case of web UIs, and possibly transient view backing models
Presentation-tier, logic-tier and data/core-services-tier have to be scalable independently
The only thing you can take for granted is the network
Note: The mentioned "core services" are heavy-weight components that access various external systems within the enterprise. An example would be the connection to an Active Directory or to a "stock market ticker"...
1. How would you do it?
If you don't have an answer right now, maybe read on and let me know what you think about this:
Sync considered harmful. Ties your system together in a bad way (Think: "weakest link"). Thread blocked while waiting for timeout. Not easy to recover from.
Use asynchronous messaging for all inter-process communication (between all tiers). This allows the system to be suspended at any time. When part of the system is down, no timeout happens.
Have a central routing component through which all requests get routed and with which core services can register themselves.
Add a heartbeat component that can e.g. inform the UI that a component is not currently available.
State is a necessary evil: Allow no state other than in the business logic tier. This way the beast becomes manageable. While the core services might well need to access data themselves, all that data should be fed in by the calling middle tier. This way the core services can be implemented in a fire and forget fashion.
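The routing and availability ideas above can be sketched in a few lines. This is an in-process toy with hypothetical names (`Router`, `register`, `send`); a real system would put a message broker behind the same interface. Messages for a service that is down simply wait in a queue, so the sender never blocks on a timeout.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class Router:
    """Central routing component: services register, senders fire and forget."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], None]] = {}
        self._pending: Dict[str, Deque[dict]] = defaultdict(deque)

    def register(self, service: str, handler: Callable[[dict], None]) -> None:
        """Called by a core service when it comes up; drains queued messages."""
        self._handlers[service] = handler
        self._drain(service)

    def unregister(self, service: str) -> None:
        """Called when a service goes down or the system is paused."""
        self._handlers.pop(service, None)

    def send(self, service: str, message: dict) -> None:
        """Enqueue the message; deliver now only if the service is up."""
        self._pending[service].append(message)
        self._drain(service)

    def is_available(self, service: str) -> bool:
        """What a heartbeat component could report to the UI."""
        return service in self._handlers

    def pending(self, service: str) -> int:
        return len(self._pending[service])

    def _drain(self, service: str) -> None:
        handler = self._handlers.get(service)
        queue = self._pending[service]
        while handler is not None and queue:
            handler(queue.popleft())
```

Pausing the system then amounts to unregistering handlers: requests keep accumulating and are delivered once the services register again. Persistence of the queues, which a real deployment would need in order to survive a router restart, is not shown.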
2. What do you think about this "solution"?
I think that, in the real world, high-availability systems are implemented using fail-over: for example, it isn't that the UI can continue to work without the business layer, instead it's that if the business layer becomes unavailable then the UI fails over to using a backup instance of the business layer.
Apart from that, they might operate using store-and-forward: e.g. a mail system might store a piece of mail, and retransmit it periodically, if it can't deliver it immediately.
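Both techniques are easy to sketch. The names below (`call_with_failover`, `Outbox`) are hypothetical, and `ConnectionError` is used as a stand-in for whatever failure your transport actually reports.

```python
from typing import Callable, List, Sequence


class AllInstancesDown(Exception):
    """Raised when neither the primary nor any backup instance answered."""


def call_with_failover(instances: Sequence[Callable[[str], str]], request: str) -> str:
    """Fail-over: try the primary first, then each backup in order."""
    for instance in instances:
        try:
            return instance(request)
        except ConnectionError:
            continue  # this instance is unavailable; try the next one
    raise AllInstancesDown(request)


class Outbox:
    """Store-and-forward: keep messages until delivery succeeds."""

    def __init__(self, deliver: Callable[[str], None]) -> None:
        self._deliver = deliver
        self._stored: List[str] = []

    def submit(self, message: str) -> None:
        self._stored.append(message)  # store first, so nothing is lost
        self.flush()

    def flush(self) -> int:
        """Retry every stored message; return how many are still waiting."""
        remaining: List[str] = []
        for message in self._stored:
            try:
                self._deliver(message)
            except ConnectionError:
                remaining.append(message)
        self._stored = remaining
        return len(remaining)
```

In a mail-style system `flush` would be called periodically by a timer. The sketch keeps the outbox in memory; a real one would persist it.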
Yep, it's the way most large websites do it. Look at NoSQL databases, Google's Bigtable architecture, etc.
1. This is the general approach I'd take.
I'd use a mixture of memcached, a NoSQL cloud (CouchDB or MongoDB) and enterprise-grade RDBMS systems (core data storage) for the data layer. I'd then write the service layer on top of the data layer. NoSQL database APIs are massively parallel (look at CouchDB with its nginx service layer parallelizer). I'd then provide "old-school, each request is a web page" generating web servers and also direct access to the service layer for new-style AJAX applications; both of these would depend on the service layer.
P.S. The RDBMS is an important component here: it holds the authoritative copy of all the data in the memcached/NoSQL cloud. I would use an enterprise-grade RDBMS to do data-centre to data-centre replication. I don't know how the big boys do their cloud-based site replication; it would scare me if they did data-cloud to data-cloud replication :P
Some points:
You do not need a heartbeat. With NoSQL, the approach taken is that if content becomes unavailable, you regenerate it onto another server using the authoritative copy of the data.
The burden of stateless web design is carried by the NoSQL and memcached layer, which scales out horizontally. So you do not need to worry about this. Just have a good network.
In terms of sync, when you are talking to the RDBMS you can expect acceptable synchronous response times. You should treat your cloud as an asynchronous resource; you will get help from the APIs that interface with your cloud, so you don't even have to think about this.
The advice I can give about networking and redundancy is this: do not go for fancy Ethernet bonding, as it's not worth it -- things always go wrong. Just set up redundant switches and Ethernet cards, and have multiple routes to all your machines. You can use OpenBSD and CARP for your routers, as they work great. Routers are your worst point of failure, and OpenBSD solves this problem.
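A toy sketch of that data layer, with plain dictionaries standing in for memcached, the NoSQL store and the RDBMS (all names hypothetical): reads fall through the tiers, and anything missing from the upper tiers is regenerated from the authoritative copy.

```python
from typing import Dict, Optional


class TieredStore:
    """Read path: cache -> document store -> authoritative RDBMS copy."""

    def __init__(self, authoritative: Dict[str, str]) -> None:
        self.cache: Dict[str, str] = {}      # stand-in for memcached
        self.documents: Dict[str, str] = {}  # stand-in for the NoSQL cloud
        self.authoritative = authoritative   # stand-in for the RDBMS

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        if key in self.documents:
            value = self.documents[key]
        else:
            value = self.authoritative.get(key)
            if value is None:
                return None                  # unknown even to the RDBMS
            self.documents[key] = value      # regenerate the lost content
        self.cache[key] = value
        return value

    def lose_node(self) -> None:
        """Simulate losing the cache and document tiers entirely."""
        self.cache.clear()
        self.documents.clear()
```

After `lose_node()`, the next `read` quietly rebuilds both upper tiers from the authoritative copy, which is why no explicit heartbeat is needed in this arrangement.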
2. You've described the general components of a web 2.0 farm, so no comment :D
