To get a search website built quickly I plan to split the work between two teams: One to build the search engine and one to build web UIs (mobile/desktop). My plan is to build the search engine as a set of REST services based on .NET 3.5. UIs may be built using some other technology.
Questions: is the REST interface likely to be a performance bottleneck? How best to avoid this?
REST is unlikley to be a bottleneck in this scenario. It wasn't clear from your post whether you were making REST calls directly from your HTML UI on the client, or whether you were making server-to-server REST calls on the back end. So I'll cover both cases below.
If your REST calls are being made between your client UI and your servers, then using REST or another HTTP remoting approach matters relatively little-- the time it takes to execute the search on the back end and then send the results back down to the client should dwarf the impact of the REST call itself. If you want to improve perf, focus on client-side networking tricks (e.g. HTTP Compression, proper caching headers, etc) and optimizing your search engine itself.
If your architecture is one tier of servers (hosting your web UI) calling another tier (your serach engine), then calling between those tiers over REST also shouldn't add too much to your overall latency. This is because (same as above) running the search and sending results back down to the client will usually take a few hundred milliseconds at least, and the overhead of the back-end REST call (if done properly) will usually be 50ms or less.
That said, it's easy to mess up the client end of server-to-server HTTP calls. For example, many HTTP Client libraries (including .NET's) will by default limit the number of concurrent client connections, which makes sense if you're building an actual client app but will kill your scalability if used from a "client" that's actually a server serving hundreds of users concurrently. Other potential problems include authentication issues, proxy problems, DNS, etc. So be careful to build and configure your REST client code carefully, and be sure to load-test with a few hundred concurrent users!
No. REST is not (and generally cannot be) a bottleneck. REST is HTTP without the fancy HTML page. It's cheaper and faster than a regular web page.
I think it should not effect your performance, but to have a proper use of REST service .Net has ASP.Net MVC which supports REST fully.
Do remember to read through this link http://www.ytechie.com/2008/10/aspnet-mvc-what-about-seo.html
Related
I have multiple AJAX requests going out of my browser.
My UI is comprised of multiple views and the AJAX requests are trying to populate those views simultaneously. In some cases I require more than 10 simultaneous requests to be sent from client and processed concurrently at the server.
But due to browser limitations on max concurrent requests to a single domain and because of HTTP's "A server MUST send its responses to requests in the same order that the requests were received" constraint, I am not deriving as much concurrency in request processing as I would want.
From my application's standpoint, I dont need responses to come in the order in which I sent the request. I am ok if view8 gets populated before view1, for example.
Async processing using Servlet 3.0 constructs seems to address only one-side of the problem (the Server-side) and hence cannot be fully exploited for maximizing application concurrency.
My question is:
Am I missing out on some proper constructs ? ('proper' in contrast to workarounds like "host your images from a different sub domain") that can yield me more concurrency ?
This seems like something many web UIs would need ! If not, then I am designing my UI the wrong way. In either case, I would appreciate your inputs.
Edit1: To my advantage, I dont have to support a huge number of concurrent clients. The maximum number of concurrent clients accessing the app would be < 100. Given that fact, basically am trying to enhance the experience of these clients when I have the processing power available aplenty on my server-side.
Edit2: Our application/API is not for 'public' consumption. For ex: It is like my company's webmail app. It is hosted on the internet but it is not meant for everyone's consumption. Only meant for consumption by the relevant few.
The reason why am giving that info, is to differentiate my app from SO/Twitter, which seem to differentiate their (REST) API users from their normal website users. In our case, we think we should not differentiate that way and want to provide single-set of REST endpoints for both.
The reason behind the limitation in the spec (RFC2616) seems to be : "These guidelines are intended to improve HTTP response and avoid congestion.". However, intranet web apps have more luxuries and should not have to be so constrained !?
The server is exposing REST API and hence the UI makes specific GETs
for various resource catogories (ex: blogs, videos, news, articles).
Since each resource catogory has its exclusive view it all fits in
nicely. It feels wrong to collate requests to get blogs and videos
together in one request. Isnt it ?
Well, IMHO being pragmatic is more important. Sure, it makes sense for a service to expose RESTful API but it's not always necessary to expose the entire API to the browser. Your API can be separate from your server side web app. You can always make those multiple API requests on the server side, collate the results and send them back to the client. For e.g. look at the SO home page. The StackOverflow API does expose a RESTful API but when loading the home page the browser doesn't send across multiple requests just to populate the tags, thread listing etc.
Thanks Sanjay for the suggestion. But we wanted to have a single-API
for both REST clients and Browser clients. Interestingly, the root URI
"stackoverflow.com" is not mentioned in SO's REST API, but the browser
client uses it. I suppose if they had exposed the root URI, their
response would be difficult to process (as it would be a mixture of
data). Their REST API is granular (as is in my application), but their
javascript code uses some other doors(APIs) to decrease no. of
round-trips to the server! Somehow that doesnt feel right (Am a novice
in this field though). Feel free to correct me
SO doesn't use any "other doors". It's just that they simply don't send across 10 concurrent requests for populating something on the page. They make XHR request when you vote, mark thread as favorite, comment etc. For loading the page itself, there are no multiple requests. If you want to directly hit your RESTful API from the browser, you'll have to honor the limitations. Either that or go the desktop way which allows you virtually unlimited connections to your server but I guess you don't want to go that route...
I have a web application running on asp.net 4.0 and oracle 11G.
I am using ado.net to connect database server.
My application is using all ready around 15-20 generic-http-handlers.
I am calling those generic-http-handler from jquery.
I want to use more of these but I am not sure about the effect of this on my appliation.
Kindly suggest is it good idea to go for more generic-http-handers?
Edit 1
I was going through the web to find out how many concurrent http request I can have in the same tab of a browser form the same domain.
I came across a niche question on this topic
How many concurrent AJAX (XmlHttpRequest) requests are allowed in popular browsers?
It suggest that though you ave async=true in your ajax call from jquery but it will have to wait till other http request has finished.
It has also suggested that you can create sub-domain to overcome this issue.
Now can some one can suggest me weather I should go for more or not?
I'm not sure if you are asking about whether it is too many to have 20 handlers defined, or have 20 handlers invoked from jQuery, so I will address both.
In terms of defining many handlers, a generic HTTP handler (ASHX) is similar to the ASP.NET page handler (ASPX), but more lightweight in that it does not have the full lifecycle of a page, and is not intended for returning UI. Many large-scale applications have hundreds of ASPX pages defined, which is consistent with the design intention of ASP.NET Web Forms, where every UI page is a distinct ASPX. So, to have hundreds of ASHX, would be even less heavy than hundreds of ASPX, and no problem at all.
In terms of invoking 20 handlers, here we get into the conversation about "chunky" versus "chatty" interfaces. When interfacing over a WAN (i.e., between browser and server), a "chunky" interface (one which makes a smaller number of heavier calls) is better: when you try to scale your application, a "chatty" interface (one which makes a higher number of lighter calls) will hold open many more connections on the server, will often cause more load on the database in terms of a higher number of transactions and a higher number of open simultaneous connections, and therefore will generally not scale as well on the server side.
On the browser side, the news is even worse. Per HTTP specification, browsers limit you to two simultaneous requests, so if you mean to fire off your 20 requests all at once, it will not happen, which means you may get some performance problem from having so many jquery get/post calls queueing up at one time.
The tradeoff, of course, is that often the programming is often cleaner with a "chatty" interface. So here you must make the judgement about your future scaling needs, versus the importance of cleaner code.
I would say if you're building an application that, for its expected life and evolution, can comfortably run all of its traffic on a single web and single database server; AND, your browser code is set up in such a way that the 2 simultaneous requests will not cause you any performance issue, then it is reasonable to go for the "chatty" interface if it gives you much cleaner code.
But if you expect there to be a need for scaling beyond a single server; OR, there are common use cases where many of these jquery get/posts will be invoked simultaneously and hamper performance, then by all means I would refactor to a more "chunky" interface, which would mean not calling more than 20 handlers from a single page via jQuery.
If you've read this and still can't decide which is right, then I would recommend refactoring the interface to make it more "chunky".
Hope this helps, and best of luck to you!
Some background
I am planning to writing a REST service which helps facilitate collaboration between multiple client systems. Similar to how git or hg handle things I want the client to perform all merging locally and for the server to reject new changes unless they have been merged with existing changes.
How I want to handle it
I don't want clients to have to upload all of their change sets before being told they need to merge first. I would like to do this by performing a POST with the Expect 100 Continue header. The server can then verify that it can accept the change sets based on the header information (not hard for me in this case) and either reject the request or send the 100 Continue status through to the client who will then upload the changes.
My problem
As far as I have been able to figure out so far ASP.NET doesn't support this scenario, by the time you see the request in your controller actions the POST body has normally already been completely uploaded. I've had a brief look at WCF REST but I haven't been able to see a way to do it there either, their conditional PUT example has the full request body before rejecting the request.
I'm happy to use any alternative framework that runs on .net or can easily be made to run on Windows Azure.
I can't recommend WcfRestContrib enough. It's free, and it has a lot of abilities.
But I think you need to use OpenRasta instead of WCF in order to do what you're wanting. There's a lot of stuff out there on it, like wiki, blog post 1, blog post 2. It might be a lot to take in, but it's a .NET framework thats truly focused on being RESTful, and not RPC like WCF. And it has the ability work with headers, like you asked about. It even has PipelineContributors, which have access to the whole context of a call and can halt execution, handle redirections, or even render something different than what was expected.
EDIT:
As far as I can tell, this isn't possible in OpenRasta after all, because "100 continue is usually handled by the hosting environment, not by OR, so there’s no support for it as such, because we don’t get a chance to respond in the asp.net pipeline"
I am building an ASP.NET web application that will be deployed to a 4-node web farm.
My web application's farm is located in California.
Instead of a database for back-end data, I plan to use a set of web services served from a data center in New York.
I have a page /show-web-service-result.aspx that works like this:
1) User requests page /show-web-service-result.aspx?s=foo
2) Page's codebehind queries a web service that is hosted by the third party in New York.
3) When web service returns, the returned data is formatted and displayed to user in page response.
Does this architecture have potential scalability problems? Suppose I am getting hundreds of unique hits per second, e.g.
/show-web-service-result.aspx?s=foo1
/show-web-service-result.aspx?s=foo2
/show-web-service-result.aspx?s=foo3
etc...
Is it typical for web servers in a farm to be using web services for data instead of database? Any personal experience?
What change should I make to the architecture to improve scalability?
You have most definitely a scalability problem: the third-party web service. Unless you have a service-level agreement with that service (agreeing on the number of requests that you can submit per second), chances are real that you overload that service with your anticipated load. That you have four nodes yourself doesn't help you then.
So you should a) come up with an agreement with the third party, and b) test what the actual load is that they can take.
In addition, you need to make sure that your framework can use parallel connections for accessing the remote service. Suppose you have a round-trip time of 20ms from California to New York (which would be fairly good), you can not make more than 50 requests over a single TCP connection. Likewise, starting new TCP connections for every request will also kill performance, so you want pooling on these parallel connections.
I don't see a problem with this approach, we use it quite a bit where I work. However, here are some things to consider:
Is your page rendering going to be blocked while waiting for the web service to respond?
What if the response never comes, i.e. the service is down?
For the first problem I would look into using AJAX to update the page after you get a response back from the web service. You'll also want to consider how to handle the no response or timeout condition.
Finally, you should really think about how you could cache the web service data locally. For example if you are calling a stock quoting service then unless you have a real-time feed, there is no reason to call the web service with every request you get. Store the data locally for a period of time and return that until it becomes stale.
You may have scalability problems but most of these can be carefully engineered around.
I recommend you use ASP.NET's asynchronous tasks so that the web service is queued up, the thread is released while the request waits for the web service to respond, and then another thread picks up when the web service is done to finish off the request.
MSDN Magazine - Wicked Code - Asynchronous Pages in ASP.NET 2.0
Local caching is an absolute must. The fewer times you have to go from California to New York, the better. You might want to look into Microsoft's Velocity (although that's still in CTP) or NCache, or another distributed cache, so that each of your 4 web servers don't all have to make and cache the same data from the web service - once one server gets it, it should be available to all.
Microsoft Project Code Named "Velocity"
NCache
Other things that can go wrong that you should engineer around:
The web service is down (obviously) and data falls out of cache, and you can't get it back. Try to make it so that the data is not actually dropped from cache until you're sure you have an update available. Then the only risk is if the service is down and your application pool is reset, so don't reset it as a first-line troubleshooting maneuver!
There are two different timeouts on web requests, a connect and an overall timeout. Make sure both are set extremely low and you handle both of them timing out. If the service's DNS goes down, this can look like quite a different failure.
Watch perfmon for ASP.NET Queued Requests. This number will rise rapidly if the service goes down and you're not covering it properly.
Research and adjust ASP.NET performance registry settings so you have a highly optimized ASP.NET thread pool. I don't remember the specifics, but I seem to remember that there's a limit on IO Completion Ports and something else of that nature that are absurdly low for the powerful hardware I'm assuming you have on hand.
the trendy answer is REST. Any GET request can be HTTP Response cached (with lots of options on how that is configured) and it will be cached by the internet itself (your ISP, essentially).
Your project has an architecture that reflects they direction that Microsoft and many others in the SOA world want to take us. That said, many people try to avoid this type of real-time risk introduced by the web service.
Your system will have a huge dependency on the web service working in an efficient manner. If it doesn't work, or is slow, people will just see that your page isn't working properly.
At the very least, I would get a web stress tool and performance test your web service to at least the traffic levels you expect to get at peaks, and likely beyond this. When does it break (if ever?), when does it start to slow down? These are good metrics to know.
Other options to look at: perhaps you can get daily batches of data from the web service to a local database and hit the database for your web site. Then, if for some reason the web service is down or slow, you could use the most recently obtained data (if this is feasible for your data).
Overall, it should be doable, but you want to understand and measure the risks, and explore any potential options to minimize those risks.
It's fine. There are some scalability issues. Primarily, with the number of calls you are allowed to make to the external web service per second. Some web services (Yahoo shopping for example) limit how often you can call their service and will lock out your account if you call too often. If you have a large farm and lots of traffic, you might have to throttle your requests.
Also, it's typical in these situations to use an interstitial page that forks off a worker thread to go and do the web service call and redirects to the results page when the call returns. (Think a travel site when you do search, you get an interstitial page while they call out to an external source for the flight data and then you get redirected to a results page when the call completes). This may be unnecessary if your web service call returns quickly.
I recommend you be certain to use WCF, and not the legacy ASMX web services technology as the client. Use "Add Service Reference" instead of "Add Web Reference".
One other issue you need to consider, depending on the type of application and/or data you're pulling down: security.
Specifically, I'm referring to authentication and authorization, both of your end users, and the web application itself. Where are these things handled? All in the web app? by the WS? Or maybe the front-end app is authenticating the users, and flowing the user's identity to the back end WS, allowing that to verify that the user is allowed? How do you verify this? Since many other responders here mention a local data cache on the front end app (an EXCELLENT idea, BTW), this gets even MORE complicated: do you cache data that is allowed to userA, but not for userB? if so, how do you verify that userB cannot access data from the cache? What if the authorization is checked by the WS, how do you cache the permissions then?
On the other hand, how are you verifying that only your web app is allowed to access the WS (and an attacker doesn't directly access your WS data over the Internet, for instance)? For that matter, how do you ensure that your web app contacts the CORRECT WS server, and not a bogus one? And of course I assume that all the connection to the WS is only over TLS/SSL... (but of course also programmatically verify the cert applies to the accessed server...)
In short, its complicated, and many elements to consider here.... but it is NOT insurmountable.
(as far as input validation goes, that's actually NOT an issue, since this should be done by BOTH the front end app AND the back end WS...)
Another aspect here, as mentioned by #Martin, is the need for an SLA on whatever provider/hosting service you have for the NY WS, not just for performance, but also to cover availability. I.e. what happens if the server is inaccessible how quickly they commit to getting it back up, what happens if its down for extended periods of time, etc. That's the only way to legitimately transfer the risk of your availability being controlled by an externality.
I'm implementing a high traffic client web application that uses a lot of REST API's for its data access layer from the cloud database. I said client because it implements REST and not provides it.
REST APIs are implemented server side as well as client side and I need to figure out a good solution for caching. The application is running on a web farm so it I'm leaning toward a distributed caching like memcached. This caching solution will need to be like a proxy layer between my application and REST APIs and support both client side as well as server side.
For example if I make a call to update a record I would update through REST and I'd like to keep updated record in the cache so next calls to that record won't need extra call to the outside REST services.
I want to minimize REST calls as much as possible and would need to keep the data accurate as much as I can, but it doesn't need to be 100% accurate.
What is the best solution for this caching proxy? Is it a standalone application that runs on one of the servers with local cache, or built into current solution using distributed caching? what are you ideas, suggestion or concerns
Thank you,
You hit the nail on the head. You need a caching layer that acts as a proxy to your data.
I suggest that you create a layer that abstracts the concept of the cloud a way a bit. Your client shouldn't care where the data comes from. I would create a repository layer that communicates with the cloud and all other data. Then you can put a service layer on top of that that your client would actually call into. Inside this service layer is where you would implement things like your caching layer.
I used to always suggest using MemCached or MemCached Win32 depending on your environment. MemCached win32 works really well if you are in a windows world! Look to the Enyim client for MemCached win32...it is the least problematic of all the other ports.
If you are open to it though and you are in a .net world then you might try Velocity. MS finally got the clue that there was a hole in their caching framework in that they needed to support the farm concept. Velocity last time I checked is not out of beta yet...but still worth a look.
I generally suggest using the repository and service layer concepts from day one...even though you don't need it. The flexibility it provides for your application is worth having as you never know which direction your application will need to be pulled in. Needing to scale is usually the best reason to need this flexibility. But usually when you need to scale you need to scale now and refactoring in a repository layer and services layer while not impossible is usually semi-complex to do down the road.