Using a remote, external web service instead of a database - asp.net

I am building an ASP.NET web application that will be deployed to a 4-node web farm.
My web application's farm is located in California.
Instead of a database for back-end data, I plan to use a set of web services served from a data center in New York.
I have a page /show-web-service-result.aspx that works like this:
1) User requests page /show-web-service-result.aspx?s=foo
2) Page's codebehind queries a web service that is hosted by the third party in New York.
3) When web service returns, the returned data is formatted and displayed to user in page response.
Does this architecture have potential scalability problems? Suppose I am getting hundreds of unique hits per second, e.g.
/show-web-service-result.aspx?s=foo1
/show-web-service-result.aspx?s=foo2
/show-web-service-result.aspx?s=foo3
etc...
Is it typical for web servers in a farm to be using web services for data instead of database? Any personal experience?
What change should I make to the architecture to improve scalability?

You have most definitely a scalability problem: the third-party web service. Unless you have a service-level agreement with that service (agreeing on the number of requests that you can submit per second), chances are real that you overload that service with your anticipated load. That you have four nodes yourself doesn't help you then.
So you should a) come up with an agreement with the third party, and b) test what the actual load is that they can take.
In addition, you need to make sure that your framework can use parallel connections for accessing the remote service. Suppose you have a round-trip time of 20ms from California to New York (which would be fairly good), you can not make more than 50 requests over a single TCP connection. Likewise, starting new TCP connections for every request will also kill performance, so you want pooling on these parallel connections.

I don't see a problem with this approach, we use it quite a bit where I work. However, here are some things to consider:
Is your page rendering going to be blocked while waiting for the web service to respond?
What if the response never comes, i.e. the service is down?
For the first problem I would look into using AJAX to update the page after you get a response back from the web service. You'll also want to consider how to handle the no response or timeout condition.
Finally, you should really think about how you could cache the web service data locally. For example if you are calling a stock quoting service then unless you have a real-time feed, there is no reason to call the web service with every request you get. Store the data locally for a period of time and return that until it becomes stale.

You may have scalability problems but most of these can be carefully engineered around.
I recommend you use ASP.NET's asynchronous tasks so that the web service is queued up, the thread is released while the request waits for the web service to respond, and then another thread picks up when the web service is done to finish off the request.
MSDN Magazine - Wicked Code - Asynchronous Pages in ASP.NET 2.0
Local caching is an absolute must. The fewer times you have to go from California to New York, the better. You might want to look into Microsoft's Velocity (although that's still in CTP) or NCache, or another distributed cache, so that each of your 4 web servers don't all have to make and cache the same data from the web service - once one server gets it, it should be available to all.
Microsoft Project Code Named "Velocity"
NCache
Other things that can go wrong that you should engineer around:
The web service is down (obviously) and data falls out of cache, and you can't get it back. Try to make it so that the data is not actually dropped from cache until you're sure you have an update available. Then the only risk is if the service is down and your application pool is reset, so don't reset it as a first-line troubleshooting maneuver!
There are two different timeouts on web requests, a connect and an overall timeout. Make sure both are set extremely low and you handle both of them timing out. If the service's DNS goes down, this can look like quite a different failure.
Watch perfmon for ASP.NET Queued Requests. This number will rise rapidly if the service goes down and you're not covering it properly.
Research and adjust ASP.NET performance registry settings so you have a highly optimized ASP.NET thread pool. I don't remember the specifics, but I seem to remember that there's a limit on IO Completion Ports and something else of that nature that are absurdly low for the powerful hardware I'm assuming you have on hand.

the trendy answer is REST. Any GET request can be HTTP Response cached (with lots of options on how that is configured) and it will be cached by the internet itself (your ISP, essentially).

Your project has an architecture that reflects they direction that Microsoft and many others in the SOA world want to take us. That said, many people try to avoid this type of real-time risk introduced by the web service.
Your system will have a huge dependency on the web service working in an efficient manner. If it doesn't work, or is slow, people will just see that your page isn't working properly.
At the very least, I would get a web stress tool and performance test your web service to at least the traffic levels you expect to get at peaks, and likely beyond this. When does it break (if ever?), when does it start to slow down? These are good metrics to know.
Other options to look at: perhaps you can get daily batches of data from the web service to a local database and hit the database for your web site. Then, if for some reason the web service is down or slow, you could use the most recently obtained data (if this is feasible for your data).
Overall, it should be doable, but you want to understand and measure the risks, and explore any potential options to minimize those risks.

It's fine. There are some scalability issues. Primarily, with the number of calls you are allowed to make to the external web service per second. Some web services (Yahoo shopping for example) limit how often you can call their service and will lock out your account if you call too often. If you have a large farm and lots of traffic, you might have to throttle your requests.
Also, it's typical in these situations to use an interstitial page that forks off a worker thread to go and do the web service call and redirects to the results page when the call returns. (Think a travel site when you do search, you get an interstitial page while they call out to an external source for the flight data and then you get redirected to a results page when the call completes). This may be unnecessary if your web service call returns quickly.

I recommend you be certain to use WCF, and not the legacy ASMX web services technology as the client. Use "Add Service Reference" instead of "Add Web Reference".

One other issue you need to consider, depending on the type of application and/or data you're pulling down: security.
Specifically, I'm referring to authentication and authorization, both of your end users, and the web application itself. Where are these things handled? All in the web app? by the WS? Or maybe the front-end app is authenticating the users, and flowing the user's identity to the back end WS, allowing that to verify that the user is allowed? How do you verify this? Since many other responders here mention a local data cache on the front end app (an EXCELLENT idea, BTW), this gets even MORE complicated: do you cache data that is allowed to userA, but not for userB? if so, how do you verify that userB cannot access data from the cache? What if the authorization is checked by the WS, how do you cache the permissions then?
On the other hand, how are you verifying that only your web app is allowed to access the WS (and an attacker doesn't directly access your WS data over the Internet, for instance)? For that matter, how do you ensure that your web app contacts the CORRECT WS server, and not a bogus one? And of course I assume that all the connection to the WS is only over TLS/SSL... (but of course also programmatically verify the cert applies to the accessed server...)
In short, its complicated, and many elements to consider here.... but it is NOT insurmountable.
(as far as input validation goes, that's actually NOT an issue, since this should be done by BOTH the front end app AND the back end WS...)
Another aspect here, as mentioned by #Martin, is the need for an SLA on whatever provider/hosting service you have for the NY WS, not just for performance, but also to cover availability. I.e. what happens if the server is inaccessible how quickly they commit to getting it back up, what happens if its down for extended periods of time, etc. That's the only way to legitimately transfer the risk of your availability being controlled by an externality.

Related

Background task polling external resource at certain intervals

Requirement: I need to create a background worker/task that will get data from an external source ( message queue) at certain intervals ( i.e. 10s) and update a database. Need to run non stop 24hrs. An ASP.NET application is placing the data to the message queue.
Possible solutions:
Windows service with timer
Pros: Takes load away from web server
Cons: Separate deployment overhead, Not load balanced
Use one of the methods described here : background task
Pros: No separation deployment required, Can be load balanced - if one server goes down another can pick it up
Cons: Overhead on web server (however, in my case with max 100 concurrent users and seeing the web server resources are under-utilized, I do not think it will be an issue)
Question: What would be a recommended solution and why?
I am looking for a .net based solution.
You shouldn't go with the second option unless there's a really good reason for it. Decoupling your background jobs from your web application brings a number of advantages:
Scalability - It's up to you where to deploy the service. It can share the same server with the web application or you can easily move it to a different server if you see the load going up.
Robustness - If there's a critical bug in either the web application or the service this won't bring the other component down.
Maintanance - Yes, there's a slight overhead as you will have to adjust your deployment process but it's as simple as copying all binaries from the output folder and you will have to do it once only. On the other hand, you won't have to redeploy the application thus brining it down for some time if you just need to fix a small bug in the service.
etc.
Though I recommend you to go with the first option I don't like the idea with timer. There's a much simpler and robust solution. I would implement a WCF service with MSMQ binding as it provides you with a lot of nice features out of the box:
You won't have to implement polling logic. On start up the service will connect to the queue and will sit waiting for new messages.
You can easily use transaction to process queue messages. For example, if there's something wrong with the database and you can't write to it the message which is being processed at the moment won't get lost. This will get back to the queue to be processed later.
You can deploy as many services listening to the same queue as you wish to ensure scalability and availability. WCF will make sure that the same queue message is not processed by more than one service that is if a message is being processed by service A, service B will skip it and get the next available message.
Many other features you can learn about here.
I suggest reading this article for a WCF + MSMQ service sample and see how simple it is to implement one and use the features I mentioned above. As soon as you are done with the WCF service you can easily host it in a windows service.
Hope it helps!

Types of services for long running asp.net processes

Our website has a long running calculation process which keeps the client waiting for a few minutes until it's finished. We've decided we need a design change, and to farm out the processing to a windows or a WCF service, while the client is presented with another page, while we're doing all the calculations.
What's the best way of implementing the service though?
We've looked at background worker processes, but it looks like these are problematic because if IIS can periodically shut down threads
It seems the best thing to use is either a Windows service or a WCF service. Does anyone have a view on which is better for this purpose?
If we host the service on another machine, would it have to be a WCF service?
It looks like it's difficult to have the service (whatever type it is) to communicate back to the website - maybe instead the service can update its results to a database, and the website polls that for the required results later on.
It's an open ended question I know, but does anyone have any ideas?
thanks
I don't think that the true gain in terms of performance will come from the design change.
If I were to chose between windows service and WCF I would go with the Windows service because I would be able to fix an affinity and prioritize as I want. However I will have to implement the logic for serving multiple clients in the same time (which in a WCF service approach will be handled by IIS).
So in terms of performance if you use .NET framework for both the WCF service and Windows service the performance difference will not be major. Windows service would be more "controllable", WCF would be more straight-forward and with no big performance penalties.
For these types of tasks I would focus on highly optimizing the single thread calculation. If you have a complex calculation, can it be written in native code (C or C++)? You could make a .DLL file that is highly optimized and is used by either the Windows service or the WCF service. Using this approach will allow you to select best compiler option and make best use of your machine resources. Also nothing stops you from creating multiple threads in the .DLL function.
The link between the website and the service can be ensured in both cases: through sockets for Windows service (extra code for creating the protocol) or directly through SOAP for the WCF. If you push the results in a database the difficulty would be letting the website (and knowing to wich particular user session) know that the data is there.
So that's what I would do.
Hope it helps.
Cheers!
One way to do this is:
The Client submits the calculation request using a Call to a WCF Service (can be hosted in IIS)
The calculation request is stored in a database With a unique ID
The ID is returned to the Client
A Windows Service (or serveral on several different machines) poll the database for New requests
The Windows service performs the calculation and stores the result to a result table With the ID
The Client polls the result table (using a WCF service) With the ID
When the calculation is finished the result is returned to the client

asp.net infinite loop - can this be done?

This question is about limits imposed to me by ASP.NET (like script timeout etc').
I have a service running under ASP.NET and I want to create a counterpart service for monitoring.
The main service's data is located at a database.
I was thinking about having the monitor service query the database in intervals of 1 second, within a loop, issued by an http request done by the remote client.
Now the actual serving of this monitoring will be done by a client http request, which will make the script loop (written in C#) and when new data is detected it'll aggregate that data into that one looping request output buffer, send it, and exit the loop, thus finishing the request.
The client will have to issue a new request in order to keep getting updates.
This is actually exactly like TCP (precisely like Windows IOCP); You request the service for data and wait for it. When it arrives you fire another request.
My actual question is: Have you done it before? How did it go? Am I limited by some (configurable) limits imposed by the IIS/ASP.NET framework? What are my limits in such situation, or, what are better options without complicating things too much?
Note that I do not expect many such monitoring requests at a time, maybe a few dozens.
This means however that 10 such concurrent monitoring requests will keep 10 threads busy, and the question is; Can it hurt IIS/performance? How will IIS handle 10 busy threads? Will it issue more? What are the limits? This is just one example of a limit I can think of.
I think you main concern in this situation would be timeouts, which are pretty much configurable. But I think that it is a wrong solution - you'd be better of with some background service, running constantly/periodically, and writing the monitoring data to some data store and then your monitoring page would just return it upon request.
if you want your page to display something only if the monitorign data is available- implement it with ajax - on page load query monitoring service, then if some monitoring events are available- render them, if not- sleep and query again.
IMO this would be a much better solution than a reallu long running requests.
I think it won't be a very good idea to monitor a service using ASP.NET due to the following reasons...
What happens when your application pool crashes?
What if you decide to do IISReset? Which application will come up first... the main app, or the monitoring app?
What if the monitoring application hangs due to load?
What if the load is already high on the Main Service. Wouldn't monitoring it every 1 sec, increase the load on the Primary Service, as well as IIS?
You get the idea...

How to invoke code within a web app that isn't externally open?

Say, for example, you are caching data within your ASP.NET web app that isn't often updated. You have another process running outside of the app which ocassionally updates this data, when you do this you would like the cached data to be cleared immediately so that the next request picks up the new data straight away.
The caching service is running in the context of your web app and not externally - what is a good method of calling into the web app to get it to update the cache?
You could of course, just hack a page or web service together called ClearTheCache that does it. This can then be called by your other process. Of course you don't want this process to be externally useable or visible on your web app, so perhaps you could then check that incoming requests to this page are calling localhost, if not throw a 404. Is this acceptable? Could this be spoofed at all (for instance if you used HttpApplication.Request.Url.Host)?
I can think of many different ways to go about this, mainly revolving around creating a page or web service and limiting requests to it somehow, but I'm not sure any are particularly elegant. Neither do I like the idea of the web app routinely polling out to another service to check if it needs to execute something, I'd really like a PUSH solution.
Note: The caching scenario is just an example, I could use out-of-process caching here if needed. The question is really concentrating on invoking code, for any given reason, within a web app externally but in a controlled context.
Don't worry about the limiting to localhost, you may want to push from a different server in future. Instead share a key (asymmetrical or symmetrical doesn't really matter) between the two, have the PUSH service encrypt a block of data (control data for example) and have the receiver decrypt. If the block decrypts correctly and the data is readable you can safely assume that only the service that was supposed to call you has and you can perform the required actions! Not the neatest solution, but allows you to scale beyond a single server.
EDIT
Having said that an asymmetrical key would be better, have the PUSH service hold the private part and the website the public part.
EDIT 2
Have the PUSH service put the date/time it generated the cipher text into the data block, then the client can be sure that a replay attack hasn't taken place by ensuring the date/time is within an acceptable time period (say a minute).
Consider an external caching mechanism like EL's caching block, which would be available to both the web and the service, or a file to cache data to.
HTH.

How to create an ASP.NET web farm?

I am looking for information on how to create an ASP.NET web farm - that is, how to make an ASP.NET application (initially designed to work on a single web server) work on 2, 3, 10, etc. servers?
We created a web application which works fine when, say, there are 500 users at the same time. But now we need to make it work for 10 000 users (working with the web app at the same time).
So we need to set up 20 web servers and make something so that 10 000 users could work with the web app by typing "www.MyWebApp.ru" in their web browsers, though their requests would be handled by 20 web-servers, without their knowing that.
1) Is there special standard software to create an ASP.NET web farm?
2) Or should we create a web farm ourselves, by transferring requests between different web servers manually (using ASP.NET / C#)?
I found very little information on ASP.NET web farms and scalability on the web: in most cases, articles on scalability tell how to optimize and ASP.NET app and make it run faster. But I found no example of a "Hello world"-like ASP.NET web app running on 2 web servers.
Would be great if someone could post a link to an article or, better, tell about one's own experience in ASP.NET "web farming" and addressing scalability issues.
Thank you,
Mikhail.
1) Is there special standard software
to create an ASP.NET web farm?
No.
2) Or should we create a web farm
ourselves, by transferring requests
between different web servers manually
(using ASP.NET / C#)?
No.
To build a web farm, you will need some form of load balancing. For up to 8 servers or so, you can use Network Load Balancing (NLB), which is built in to Windows. For more than 8 servers, you should use a hardware load balancer.
However, load balancing is really just the tip of the iceberg. There are many other issues that you should address, including things like:
State management (cookies, ViewState, session state, etc)
Caching and cache invalidation
Database loading (managing round-trips, partitioning, disk subsystem, etc)
Application pool management (WSRM, pool resets, partitioning)
Deployment
Monitoring
In case it might be helpful, I cover many of these issues in my book: Ultra-Fast ASP.NET: Build Ultra-Fast and Ultra-Scalable web sites using ASP.NET and SQL Server.
I'd say you should configure an NLB cluster (Network Load Balancing), which basically splits all requests between cluster nodes (And as an added benefit detects if things are down and stops sending them requests). There's features built into windows for this, but they don't compare to a hardware device for performance or scalability. If you're using Windows 2008 it really is simple to set one up. If you do this make sure you have a shared machine key or you'll start getting exceptions for viewstate being invalid (When 1 server submits the form and it posts to the other and they're using different keys to encode the data).
You can also use DNS round-robin but at 20 servers presumably in 1 datacenter I wouldn't see a point to going to such crazy lengths. If you've got multiple data centers though this is definitely worth considering (As NLB won't really work well between data centers).
You'll also want to be sure if a user swaps servers they don't loose their session. The simplest way would be to use a Session State database (Configurable in the web.config, or you can do it server-wide in IIS's configs). If you don't use sessions though just turn them off in the Pages directive of the web.config and call it a day. You could also use a session state server, but I don't have any experience with this.
It may also be worth considering spending some time optimizing the code or adding caching directives to static content - it can be very cost-effective even if you only trim the need for a few of those servers.
Hope that helps.
If you keep your server stateless, it is easy with a good router that implements some round-Robbin protocol (that send each call to the single published server ip to a different web server).
if it is not stateless (like - if a login is required, or ssl) than you need to keep each session to the same server.
Here is some info about MS Application Request Routing - you will get everything there:
IIS Load balancing
I would not recommend #2. You will do much better off with a load balancer.
Pay attention to session state management. Unless you configure the load balancer to keep each user on the same web server, you will have to use the session state server or database.
Also, check your code's usage of Application and Cache variables. These will be different on every web server. If those values are static, you may not have a problem. But if they can change, you can end up with different values on each web server.
There used to be a problem with ViewState in 1.x, as explained here. I'm not sure if this problem still exists.
Then, there are some changes that you need to make to the Machine Key in web.config, as explained here.

Resources