Physically Separating Secure and Non Secure Web Requests - http

We have been doing some research into physically isolating the secure and non-secure sections of our web application into two applications. All "http" requests would be served by one server (or cluster) and all "https" requests would be served by another server (or cluster).
The reason that we are looking into this is partially for the survivability of the application. Since the secure section of the application is revenue generating we could, for example, have a larger and/or more powerful cluster to serve the requests. Conversely, when we upgrade the hardware in the secure application, it could be re-purposed to serve the non-secure site - basically extending the life of the servers.
Has anyone worked with this approach? We had an RFP out to a (well known) vendor last year for an architectural assessment and this was one of the possible paths that was recommended. While I see the potential upside, I worry about things such as maintenance, deployment, version control, etc.

Depending how your app is architected, it seems to me that if you used virtualisation / load balancing you could have the same benefits of guaranteed resources and isolation for the paid area, while also being able to dynamically burst resources to deal with spikes in load in either area. Your current proposal allows you to guarantee and prioritise resources, but it may result in some of them being idle.
Plus it would be easier to manage load through configuration, as it would then be a pure deployment issue and an entirely separate concern. You'd also be more independent of your hardware upgrade path as you'd just be adding/assigning virtual machines to the new hardware.

Related

Spring Redis cache expiration in memory

Using Spring Redis cache and wonder if is possible to set some data cache duration in memory. Cache of cache. If i know that data in Redis will not change for 5 minutes i dont need that Spring Redis cache touch the Redis everytime when some #Cacheable method is called.
Is Redisson the answer?
AFAICT, Redisson is simply a client-side facade or enhanced Redis (Java) client used to interface with a Redis node (or cluster) in a more powerful and convenient way, not unlike Spring Data Redis. For example, and as you already know, using Redis as a caching provider in Spring's Cache Abstraction.
Redis does seem to support client-side caching (a local cache in addiion to the remote (server) cache?), when using a Redis client/server topology. This would be transparent to you application (e.g. #Cacheable) and configured in the Redis client driver, AFAIK.
However, given my lack of experience with Redis, or even Redisson for matter, I cannot speak to this feature in detail. Redis client-side caching may need to be supported by the Redis client drivers (e.g. Jedis, Lettuce, even Redisson, etc).
NOW THE LONG-WINDED ANSWER FOR THE INTERESTED READER:
What you are describing when you state a "cache of cache" hearsay, is really having a "locally available cache" in addition to the "remote, or server-side cache". This assumes, of course, you are running Redis in a client/server (not embedded), and possibly distributed/clustered (maybe HA), capacity in the first place.
Ideally, you would choose a caching provider that supported this sort of arrangement out-of-the-box, natively. And, despite popular belief (for example), much of what Redis "reinvented" (horizontal scale-out or cluster, HA, even persistence) already existed in other, more mature solutions, built from the ground up with these concerns in mind.
SIDENOTE: Granted, the referenced article above is dated, but also a bit naive.
A "cache of (a) cache" is technically referred to as the Near Caching pattern.
It is where the "local" (application/client-side) cache mirrors the "remote" (server-side and primary) cache to avoid [a] network hop(s), i.e. latency, by only accessing the remote cache when necessary (e.g. cache miss), preferably in a "single-hop", "fault-tolerant" fashion, when the server-side is distributed and clustered.
However, a fundamental difference between the local cache and server-side, remote cache is that the local cache only stores a subset of the data from the remote cache based on "interests".
NOTE: In Redis's documentation, they referred to this as "tracking". There are different ways, across different providers, to express "interests" or track what the client has accessed. Be mindful of the different approaches here since they consume different system resources.
You might have a distributed (Web / Microservice) application architecture where several client application instances serve different demographics or populations of end-users. Clearly, those client application instances might use shared, but different subsets of the primary dataset stored in the servers. This is where the local cache and "registering interest" only in the data that matters to, or is used by, the client application comes into play.
"Registering interest" is important since the server-side, remote cache can notify clients ("push", rather than a client "pulling") hosting a local cache when data on the server changes that a client is interested in since more than 1 client might have interest in and use the same data (e.g. "record", and the intersection of data).
So, how do we properly address this concern without unnecessarily introducing extra (layers of) complexity into our system/application architecture?
Well, for one, it starts by choosing the right caching provider for the problem at hand.
DISCLAIMER: my experience stems from Apache Geode, which is the OSS variate of VMware Tanzu GemFire and a I am responsible for all things Spring for Apache Geode at VMware.
While I am a bit biased here it is not uncommon for other caching providers (and complete IMDG solutions) to support the same arrangement. For example, 1 of my personal favorites is Hazelcast.
Hazelcast calls this particular caching arrangement, or topology, an "embedded" cache and even refers to this as "near cache" in the documentation.
The nice thing about a local, embedded "Near Cache" is that it avoids latency through unnecessary networks hops, however, interest registration is key to keep data consistent, as far as possible.
I have documented, talked about and even demonstrated different caching patterns when using Spring for Apache Geode in the Spring Boot for Apache Geode documentation here and Near Caching in particular, along with the Near Caching Sample in the Samples with the other caching patterns).
I am sure you can find similar resources with other caching providers, even Redis.
At any rate, this documentation should help you understand different concerns to be aware of (e.g. memory consumption) when choosing any topology and configuration.
Good luck!

caching and load-balancing approaches

Lets say I have one WCF service that is running on one server.
Now we introduced some caching on the wcf service for performance reason.
Now if I want to do some load balancing .... is there some existing solution that will allow my cache to also be synched when it is living on different servers ???
How do we deal with these sort of issues?
Maybe a solution is to create a seperate CachingService that will be hosted on another server ... but then again if I want to load balance that service ... I'd need to somehow sync those cache that resides on different machines ...
Or maybe it doesn't event makes sense to load-balance the caching servers but only your wcf services???
Yes, there are many solutions/product that (usually) sync cache between servers (memcached, ehcache and more). I personally found that in case of closely located servers referenced cache performs better. That is cache where (possibly multiple) cache servers share the load and store objects on first "requested from" basis and then share references to the objects between all cache servers.

ASP.Net load balancing

I am working on asp.net (newbie) and I am trying to understand what it means to do "load balancing" for the web site. The website will be used by multiple users and resources (database, web service,..).
If anyone could help me understanding the concept of the load balance for asp.net web site, I would really appreciate it.
Thanks.
One load-balancing-related issue you may want to be aware of at development time: where you store your session state. This MSDN article gives a good overview of your options.
If you implement your asp.net system using "out-of-process" or "sql-server-mode" session state management, that will give you some additional flexibliity later, if you decide to introduce a load-balancer to your deployed system:
Your load balancer needn't handle session affinity. As one poster mentioned above, all modern load-balancers handle it anyway, so this is a minor consideration in any case.
Web-gardens (a sort of IIS/server-implemented load-balancer) REQUIRES use of "out-of-process" or "sql-server-mode" session state management. So if your system is already configured that way, you'll be one step closer to being able to use web-gardens.
What is it?
Load balancing simply refers to distributing a workload between two or more computers. As a concept, it's not unique to asp.net. Although having separate machines for your database and web server could be called "load balancing" it more commonly refers to using multiple machines to serve a single role, such as having multiple web servers.
Should you worry about it? Probably not. Do you already have a performance problem? Are your database and web server on their own machines? If you do find that your server resources are strained, it would probably be easier to scale up (a more powerful single machine) than out (load balancing). These days, a dedicated box can handle a LOT of traffic if your code is decent.
Load Balancing, in the programming sense, does not apply to ASP.NET; it applies to a technique to try to distribute server load across two or more machines, rather than it all being used on one machine. Unless you will have many thousands (millions?) of users, you probably do not need to worry about it.
Check the Wikipedia article for more information.
Load balancing is not specific for any on technology stack be it asp.net, jsp etc. To load balance is to spread the incoming requests to a web site over more than one server. This is typically done with a software or hardware load balancer. The load balancer sits in front of two or more web servers and delegates the incoming traffic. Although this technique is not limited to web servers. Load Balancing
Enjoy!
I've never used it, but an option is IIS Application Request Routing.
IIS Application Request Routing (ARR)
2.0 enables Web server administrators, hosting providers, and Content
Delivery Networks (CDNs) to increase
Web application scalability and
reliability through rule-based
routing, client and host name
affinity, load balancing of HTTP
server requests, and distributed disk
caching
In a typical web server/database scenario, the db is almost always guaranteed to load up the machine first. This is because dealing with storing data requires more resources. Before you even start looking at load balancing your web server, you need to think about how to load balance the database.
Spreading one database across multiple servers is a lot harder than load balancing a web server. One of the techniques that can be used is sharding (or horizontal partitioning). This is where some records are stored on one server, and other records - on another server. For example records with ID 1-900000 are on server 1 and records 900001- are on server 2.
In comparison to DB load balancing, spreading the load across multiple ASP.NET servers is not overly complicated. Most of the session issues can be easily mitigated by using out of process session and/or never talking to Application.Cache directly. Data load balancing on the other hand is hard and requires a lot of planning and trial and error. In most cases, talking to a load balanced DB requires using an ORM which supports it (e.g. NHibernate) or your own Data Access Layer. The reason being is that you need to take out establishing a connection from the code that uses the database, so that the decision which DB to talk to is handled in one place.
the exact solution is to save session into the SQL Server with Stored Procedure. To read session call 'SessionCheck' stored Procedure.
I'd add that it really isn't something to worry about. By the time you need a load balancer, you can probably afford one of the neato newfangled ones with sticky sessions so you don't even have to deal with the session boogeyman.

Harvesting Dynamic HTTP Content to produce Replicating HTTP Static Content

I have a slowly evolving dynamic website served from J2EE. The response time and load capacity of the server are inadequate for client needs. Moreover, ad hoc requests can unexpectedly affect other services running on the same application server/database. I know the reasons and can't address them in the short term. I understand HTTP caching hints (expiry, etags....) and for the purpose of this question, please assume that I have maxed out the opportunities to reduce load.
I am thinking of doing a brute force traversal of all URLs in the system to prime a cache and then copying the cache contents to geodispersed cache servers near the clients. I'm thinking of Squid or Apache HTTPD mod_disk_cache. I want to prime one copy and (manually) replicate the cache contents. I don't need a federation or intelligence amongst the slaves. When the data changes, invalidating the cache, I will refresh my master cache and update the slave versions, probably once a night.
Has anyone done this? Is it a good idea? Are there other technologies that I should investigate? I can program this, but I would prefer a configuration of open source technologies solution
Thanks
I've used Squid before to reduce load on dynamically-created RSS feeds, and it worked quite well. It just takes some careful configuration and tuning to get it working the way you want.
Using a primed cache server is an excellent idea (I've done the same thing using wget and Squid). However, it is probably unnecessary in this scenario.
It sounds like your data is fairly static and the problem is server load, not network bandwidth. Generally, the problem exists in one of two areas:
Database query load on your DB server.
Business logic load on your web/application server.
Here is a JSP-specific overview of caching options.
I have seen huge performance increases by simply caching query results. Even adding a cache with a duration of 60 seconds can dramatically reduce load on a database server. JSP has several options for in-memory cache.
Another area available to you is output caching. This means that the content of a page is created once, but the output is used multiple times. This reduces the CPU load of a web server dramatically.
My experience is with ASP, but the exact same mechanisms are available on JSP pages. In my experience, with even a small amount of caching you can expect a 5-10x increase in max requests per sec.
I would use tiered caching here; deploy Squid as a reverse proxy server in front of your app server as you suggest, but then deploy a Squid at each client site that points to your origin cache.
If geographic latency isn't a big deal, then you can probably get away with just priming the origin cache like you were planning to do and then letting the remote caches prime themselves off that one based on client requests. In other words, just deploying caches out at the clients might be all you need to do beyond priming the origin cache.

Addressing scalability ,performance in a .net web application

I'm working on a .net portal which would be having lots of concurrent users.
so scalability,performance need to be addressed in the design and architecture.
We plan to use load balancing in the application.
Keeping this in mind,what would be the best way of communicating between IIS web server(hosting aspx,aspx.cs files) and application server (hosting .net assemblies like business logic and data access layer)?
Should it be .net remoting or soap web service?or is there any other approach?
Thanks.
Is there another approach? Yes - don't distribute your objects.
The most scalable approach is to NOT to distribute your objects away from each other. Ask yourself, why do you want to deploy one flavor of code to an "app server" while another flavor of code goes to a "web server"? The communication that goes on between those two layers, if they are distributed, will be much much much much (etc etc) more expensive than a local call.
With today's 64-bit servers, with all of that memory, and the hot CPUs, and with ASP.NET's superior memory management, why not put your business logic and DAL on the same physical machine as the ASPX files? Why not?
If you need to scale, add more servers. Simple.
There are good reasons, of course, to distribute. The most common good reasons have to do with domains of ownership - along several axes: security management, or even budget and control. In other words, to take the latter case, if team is responsible for running the business logic and a separate team is responsible for building and running the web layer -then it may make sense to distribute those two things to allow independence of management. Most of the good reasons for distributing computer code, have their origins in the structures of the human organizations using or developing the code.
There is no good technical reason why a web page should not run on the same CPU, sharing the same CLR VM and memory heap, as the database access layer.
Regardless what you do with distribution, it would be unwise to architect your system with less-than-formal interfaces defining the connections between the layers. If you keep formal interfaces, then it should be no problem for you to measure the perf and efficiency of a distributed approach versus a co-located approach.
Do you really need an app server? Just how big are you talking exactly? For example Stackoverflow.com has ~50k uniques a day and doesn't have an app server so I assume you are talking much bigger than that? Most performance bottle necks come down to database issues so I would concentrate on that.
I suggest you take a look at the Patterns and Practices groups guidelines for performance, more specifically Chapter 6 - Improving ASP.NET Performance of the guideline. I agree with Cheeso that you should seriously consider NOT physically splitting your application layer and UI layer if you can. The P&P guideline has the following notes:
Avoid Unnecessary Process Hops
Although process hops are not as expensive as machine hops, you should avoid process hops where possible. Process hops cause added overhead because they require interprocess communication (IPC) and marshaling. For example, if your solution uses Enterprise Services, use library applications where possible, unless you need to put your Enterprise Services application on a remote middle tier.
Understand the Performance Implications of a Remote Middle Tier
If possible, avoid the overhead of interprocess and intercomputer communication. Unless your business requirements dictate the use of a remote middle tier, keep your presentation, business, and data access logic on the Web server. Deploy your business and data access assemblies to the Bin directory of your application. However, you might require a remote middle tier for any of the following reasons:
You want to share your business logic between your Internet-facing Web applications and other internal enterprise applications.
Your scale-out and fault tolerance requirements dictate the use of a middle tier cluster or of load-balanced servers.
Your corporate security policy mandates that you cannot put business logic on your Web servers.
If you absolutely have to split the application logic up anyways, you could use WCF as a transport mechanism. I'm not sure how it stacks up against remoting when it comes to performance. However, I seem to remember that this is the guideline Microsoft is pushing.
Clemens Vasters (Technical Lead for the Microsoft .NET Service Bus) talks about WCF vs. Remoting in this answer on MSDN forums.
Learn to write asynchronously.
Explore the CCR runtime for example.
Each thread that is blocked waiting for IO responses is one less available to your system.
Turn off 'idealised logging' leave the ability to switch it back on via admin console. But logging is often a hidden bottle neck.
CACHE CACHE CACHE!
If it was expensive to get the data the first time, don't pay for it the second!
Avoid ASP.net's Session State - This can seriously bloat and lead to a large slow down in page responsiveness.
Modify the http headers to specify short browser caching (5sec - 20sec) (Depends on the nature of the content)
Utilise GZIP while you are at it!
AND USE LOTS OF RAM
Here are my tips
1)Move all your static files - images , css, js to a load balancer like nginx. This will greatly reduce the load on IIS server and it will have enough free resources to serve the main request.
2)Think about caching and avoiding database access altogether.
3)Try to implement REST principles are far as possible.
4)Keep session state to a bare minimum - if possible avoid it altogether.
There are some good performance and scalability points in these articles from Omar Al Zabir.
10 ASP.NET Performance and Scalability Secrets
and
99.99% available ASP.NET and SQL Server SaaS Production Architecture
(also check out his book Building a Web 2.0 Portal with ASP.NET 3.5)

Resources