Output cache versus application cache? - asp.net

I have an application that uses the application cache to store the responses generated by custom HTTP handlers. The same response is always returned to requests for the same URL, and the entire response is inserted whole into the cache.
If an application is caching per-URL, is there any advantage to using the application cache? Or should I just be using the output cache?
Note that because I'm using a custom HTTP handler, all of this is being done in C#, not in page directives.

Assuming you use no authorization and have no dynamic content, the lower the level you cache at, the better the results. The lowest level is kernel mode caching. http://learn.iis.net/page.aspx/154/walkthrough-iis-70-output-caching/
Think of it in terms of an office.
Technically the request chain is: the boss, a secretary, an answering machine, and the phone line provider.
Imagine an office without a secretary. The boss has to answer every call. This is the scenario with no cache at all.
The application cache is a secretary. It handles the calls so that the boss (the application) doesn't have to answer just to say the same thing over and over.
A secretary is someone who sits between the boss and the external world. She can handle most simple scenarios. The boss gets bothered when there's no secretary at work (low memory).
But the secretary is human, so she goes home at some point in the evening (the ASP.NET application recycles at some point, and the application cache is lost with it; in ASP.NET terms, the secretary shares the same app domain as the boss).
This is where an answering machine comes into play. Not only can it shield the secretary from answering the same stupid questions over and over, it also covers for the boss when no secretary is available. It's just a machine, and the client listens to a nice prerecorded voice or some music (the cached item) when neither the secretary nor the boss can answer.
IIS kernel-mode caching is the answering machine for your ASP.NET "office". An answering machine is much cheaper than a secretary. It's just a microcontroller with a tape; it doesn't even consume coffee, it just plays back a tape or something of that kind.
It runs on the same box, but it performs much better, because it does only the simple task of handing out content at maximum speed, with its own low-level management of system resources.
That said, in terms of performance, kernel mode is the preferred way to cache if you have semi-dynamic content.

First I'll state the usual caveat that it depends on the specific case: factors such as available web server memory, load, page size, data size, etc.
That said, if there is not a huge number of URLs and they don't have to be very fresh, then I believe the output cache would have the edge, especially if you are going to cache publicly, that is, encourage caching at the ISP and browser level. That saves load on your server and shortens the trip for a returning user, or for a user behind the same ISP or proxy.

I would think it comes down to whether or not you need to adjust the cache settings programmatically at runtime from within your code. If you don't, then setting the output cache declaratively would be fine.

Related

how to prevent vulnerability scanning

I have a web site that reports every unexpected server-side error to my email.
Quite often (once every 1-2 weeks) somebody launches automated tools that bombard the web site with a ton of different URLs:
sometimes they (hackers?) think my site hosts phpMyAdmin and they try to access (I believe) vulnerable PHP pages...
sometimes they try to access pages that don't exist on my site but belong to popular CMSs
last time they tried to inject a bad ViewState...
These are clearly not search engine spiders, as 100% of the requests that generated errors were requests to invalid pages.
So far they haven't done much harm; the only cost is that I need to delete a ton of server error emails (200-300)... But at some point they could probably find something.
I'm really tired of this and am looking for a solution that will block such 'spiders'.
Is there anything ready to use? Any tools, DLLs, etc.? Or should I implement something myself?
In the second case: could you please recommend an approach to implement? Should I limit the number of requests per IP (let's say no more than 5 requests per second and no more than 20 per minute)?
P.S. Right now my web site is written using ASP.NET 4.0.
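For illustration only, the per-IP limit described above could look roughly like the following. This is a minimal sketch in Python rather than ASP.NET, just to show the logic; the class name SlidingWindowLimiter and its allow method are made up for this example, and in a real site the same check would presumably live in something like an HttpModule. The default rules mirror the 5-per-second and 20-per-minute figures from the question.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical helper: allow a request only while the client is under every rule."""

    def __init__(self, rules=((5, 1.0), (20, 60.0))):
        # Each rule is (max_requests, window_seconds). The defaults are the
        # figures floated in the question, not recommended values.
        self.rules = rules
        self.longest = max(window for _, window in rules)
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now):
        stamps = self.hits[ip]
        # Forget requests older than the longest window we track.
        while stamps and now - stamps[0] >= self.longest:
            stamps.popleft()
        for limit, window in self.rules:
            recent = sum(1 for t in stamps if now - t < window)
            if recent >= limit:
                return False  # over a limit: reject and do not record the hit
        stamps.append(now)
        return True
```

Passing the timestamp in explicitly (instead of reading the clock inside allow) is a choice made here so the behaviour is easy to check by hand. Note that a limit like this slows scanners down but does not identify them, and several legitimate users behind one proxy share an IP.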
Such bots are not likely to find any vulnerabilities in your system, if you just keep the server and software updated. They are generally just looking for low hanging fruit, i.e. systems that are not updated to fix known vulnerabilities.
You could make a bot trap to minimise such traffic. As soon as someone tries to access one of those non-existent pages that you know of, you could stop all requests from that IP address with the same browser string, for a while.
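A rough sketch of such a bot trap, written in Python purely to show the bookkeeping (the BotTrap class, the should_block method, and the one-hour default ban are assumptions made for this example, not anything from ASP.NET): once a client requests a known trap path, every request from the same IP address and browser string is blocked until the ban expires.

```python
class BotTrap:
    """Hypothetical helper: ban an (ip, user_agent) pair after it hits a trap path."""

    def __init__(self, trap_paths, ban_seconds=3600.0):
        # Paths are compared case-insensitively; the caller is assumed to
        # pass the path without its query string.
        self.trap_paths = {p.lower() for p in trap_paths}
        self.ban_seconds = ban_seconds
        self.banned_until = {}  # (ip, user_agent) -> time the ban expires

    def should_block(self, ip, user_agent, path, now):
        key = (ip, user_agent)
        if self.banned_until.get(key, 0.0) > now:
            return True  # still inside an earlier ban
        if path.lower() in self.trap_paths:
            self.banned_until[key] = now + self.ban_seconds
            return True  # first hit on a trap path starts the ban
        return False
```

The list of trap paths would come from the error emails you already receive (for example the phpMyAdmin URLs that do not exist on your site). Keying on both the IP and the browser string keeps other users behind the same address from being banned along with the scanner.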
There are a couple of things you can consider...
You can use one of the available Web Application Firewalls. A WAF usually has a set of rules and an analysis engine that detect suspicious activity and react accordingly. In your case, for example, it can automatically block attempts to scan your site because it recognizes them as an attack pattern.
A simpler approach (but not a 100% solution) is to check the referer URL (referer url description in wiki) and reject the request if it did not originate from one of your own pages (you should probably create an HttpModule for that purpose).
And of course you want to be sure that your site addresses all known security issues from the OWASP Top 10 list (OWASP TOP 10). You can find a very comprehensive description of how to do that for ASP.NET here (owasp top 10 for .net book in pdf). I also recommend reading the blog of the author of that book: http://www.troyhunt.com/
There's nothing you can do (reliably) to prevent vulnerability scanning. The only real thing to do is to stay on top of any vulnerabilities and prevent vulnerability exploitation.
If your site is only used by a select few, from fixed locations, you could maybe use an IP restriction.

When to use load balancing?

I am just getting into the more intricate parts of web development. This may not be the best place to ask. However, when is it best to get load balancing for a web project? I understand that good or bad design determines how many users can visit a site before it REALLY affects performance. However, I am planning to code a new project that could potentially have a lot of users, and I wondered if I should be thinking about load balancing right off the bat. Opinions welcome; thanks in advance!
I should also note that the project will most likely be ASP.NET (WebForms or MVC, not yet decided) with a backend of MongoDB or PostgreSQL (again, still deciding).
Load balancing can also be a form of high availability. What if your web server goes down? It can take a long time to replace it.
Generally, by the time you need to think about throughput you are already rich, because you have an enormous number of users.
Stack Overflow serves 10M unique users a month with a few servers (6 or so). Think about how many requests per day you would have if you were constantly generating 10 HTTP responses per second for 8 busy hours: 10*3600*8=288000 page impressions per day. You won't have that many users soon.
And if you do, you optimize your app to 20 requests per second per CPU core, which means you get 80 requests per second on a commodity four-core server. That is a lot.
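The back-of-the-envelope numbers above can be checked with a few lines of Python. The function names are made up for this example, and the four-core figure is an assumption implied by 20 requests per second per core giving 80 per server.

```python
def daily_page_views(requests_per_second, busy_hours):
    """Page views per day if the given rate is sustained for the busy hours only."""
    return requests_per_second * 3600 * busy_hours


def server_capacity(requests_per_second_per_core, cores):
    """Requests per second one server can handle, assuming it scales linearly with cores."""
    return requests_per_second_per_core * cores


print(daily_page_views(10, 8))                      # 288000
print(server_capacity(20, 4))                       # 80
print(daily_page_views(server_capacity(20, 4), 8))  # 2304000
```

So a single server tuned to 80 requests per second would cover roughly 2.3 million page views over the same 8 busy hours, which is the sense in which "that is a lot".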
Adding a load balancer later is usually easy. LBs can tag each user with a cookie so they get pinned to one particular target. Your app will not notice the difference. Usually.
Is this for an e-commerce site? If so, then the real question to ask is "for every hour that the site is down, how much money are you losing?" If that number is substantial, then I would make load balancing a priority.
One of the more important architecture decisions that I have seen affect this is the use of session variables. You need to be able to provide a seamless experience if your user ends up on different servers during their visit. Session variables won't transfer from server to server, so I would avoid using them.
I support a solution like this at work. We run four (used to be eight) .NET e-commerce websites on three Windows 2k8 servers (backed by two primary/secondary SQL Server 2008 databases), taking somewhere around 1300 (combined) orders per day. Each site is load-balanced, and kept "in the farm" by a keep-alive. The nice thing about this, is that we can take one server down for maintenance without the users really noticing anything. When we bring it back, we re-enable our replication service and our changes get pushed out to the other two servers fairly quickly.
So yes, I would recommend giving a solution like that some thought.
The parameters here that can affect one another and slow down performance are:
Bandwidth
Processing
Synchronization
Bandwidth has to do with how many users you have, together with the media you want to serve.
So if you have a lot of video or files to deliver, you need many servers to deliver them. Let's say you do not; the next things to check are the users and the processing.
In my experience, what slows down processing is the locking of the session. So one big step toward speeding up processing is to write a totally custom session handler, so that your pages no longer lock one another and you can handle a large number of users without issue.
As a next step, let's say you have a database that keeps all the data. To gain from load balancing across many computers, the trick is to keep a local cache of what you are going to show.
So the first idea is to avoid the excessive locking that makes users wait on one another, and the second is to have a local cache on each computer, built dynamically from the data in the main database.
ref:
Web app blocked while processing another web app on sharing same session
Replacing ASP.Net's session entirely
call aspx page to return an image randomly slow
Always online
One more parameter: you can build a solution in the "one for all, and all for one" :) style, where you use the extra servers as backups. So if one server goes down for any reason (e.g. for an update and restart), the rest can still work and serve.
As you said, whether and when load balancing should be introduced depends on performance and on how many users you want to serve. LB also improves the reliability of your app: it will not stop when one system comes crashing down. If you can see your project growing really big and serving lots of users, I would suggest designing your application so that it can be upgraded to LB, so do not do anything non-standard. Try to steer away from home-made solutions and always follow good practice. If you really need LB later on, it should not require changing your app.
UPDATE
You may need to think ahead but not at a cost of complicating your application too much. Do not go paranoid and prepare everything to work lightning fast 'just in case'. For example, do not worry about sessions - session management can be easily moved to SQL Server at any time and this is the way to go with LB. Caching will also help if you hit some bottlenecks in the future but you do not need to implement it straight away - good design (stable interfaces), separation and decoupling will allow for the cache to be added later on. So again - stick to good practices, do not close doors but also do not open all of them straight away.
You may find this article interesting.

Monitoring ASP.NET and SQL Server for Security

What is the best (or any good) way to monitor an ASP.NET application to ensure that it is secure and to quickly detect intrusion? How do we know for sure that, as of right now, our application is entirely uncompromised?
We are about to launch an ASP.NET 4 web application, with the data stored on SQL Server. The web server runs in IIS on a Windows Server 2008 instance, and the database server runs on SQL Server 2008 on a separate Win 2008 instance.
We have reviewed Microsoft's security recommendations, and I think our application is very secure. We have implemented "defense in depth" and considered a range of attack vectors.
So we "feel" confident, but have no real visibility yet into the security of our system. How can we know immediately if someone has penetrated? How can we know if a package of some kind has been deposited on one of our servers? How can we know if a data leak is in progress?
What are some concepts, tools, best practices, etc.?
Thanks in advance,
Brian
Additional Thoughts 4/22/11
Chris, thanks for the very helpful personal observations and tips below.
What is a good, comprehensive approach to monitoring current application activity for security? Beyond constant vigilance in applying best practices, patches, etc., I want to know exactly what is going on inside my system right now. I want to be able to observe and analyze its activity in a way that clearly shows me which traffic is suspect and which is not. Finally, I want this information to be totally accurate and easy to digest.
How do we efficiently get close to that? Wouldn't a good solution include monitoring logins, database activity, ASP.NET activity, etc. in addition to packets on the wire? What are some examples of how to assume a strong security posture?
Brian
The term you are looking for is Intrusion Detection System (IDS). There is a related term called Intrusion Prevention System (IPS).
IDS's monitor traffic coming into your servers at the IP level and will send alerts based on sophisticated analysis of the traffic.
IPS's are the next generation of IDS which actually attempt to block certain activities.
There are many commercial and open source systems available including Snort, SourceFire, Endace, and others.
In short, you should look at adding one of these systems to your mix for real time monitoring and potentially blocking of hazardous activities.
I wanted to add a bit more information here as the comments area is just a bit small.
The main thing you need to understand is the types of attacks you will see. These are going to range from relatively unsophisticated automated scripts on up to highly sophisticated targeted attacks. They will also hit everything they can see, from the web site itself to IIS, .Net, the mail server, SQL (if accessible), right down to your firewall and other exposed machines/services. A holistic approach is the only way to really monitor what's going on.
Generally speaking, a new site/company is going to be hit with the automated scripts within a few minutes (I'd say 30 at most) of going live. This is the number one reason new installations of MS Windows keep the network severely locked down during installation. Heck, I've seen machines nailed within 30 seconds of being turned on for the first time.
The approach hackers/worms take is to constantly scan wide ranges of IP addresses; this is followed up with machine fingerprinting for those that respond. Based on the profile they will send certain types of attacks your way. In some cases the profiling step is skipped and they attack certain ports regardless of response. Port 1433 (SQL Server) is a common one.
Although the most common form of attack, the automated ones are by far the easiest to deal with. Shutting down unused ports, turning off ICMP (ping response), and having a decent firewall in place will keep most of the scanners away.
For the scripted attacks, make sure you aren't exposing commonly installed packages like PhpMyAdmin, IIS's web admin tools, or even Remote Desktop outside of your firewall. Also, get rid of any accounts named "admin", "administrator", "guest", "sa", "dbo", etc. Finally, make sure your passwords AREN'T allowed to be someone's name and are definitely NOT the default one that shipped with a product.
Along these lines make sure your database server is NOT directly accessible outside the firewall. If for some reason you have to have direct access then at the very least change the port # it responds to and enforce encryption.
Once all of this is properly done and secured the only services that are exposed should be the web ones (port 80 / 443). The items that can still be exploited are bugs in IIS, .Net, or your web application.
For IIS and .net you MUST install the windows updates from MS pretty much as soon as they are released. MS has been extremely good about pushing quality updates for windows, IIS, and .Net. Further a large majority of the updates are for vulnerabilities already being exploited in the wild. Our servers have been set to auto install updates as soon as they are available and we have never been burned on this (going back to at least when server 2003 was released).
Also you need to stay on top of the updates to your firewall. It wasn't that long ago that one of Cisco's firewalls had a bug where it could be overwhelmed. Unfortunately it let all traffic pass through when this happened. Although fixed pretty quickly, people were still being hammered over a year later because admins failed to keep up with the IOS patches. Same issue with windows updates. A lot of people have been hacked simply because they failed to apply updates that would have prevented it.
The more targeted attacks are a little harder to deal with. A fair number of hackers are going after custom web applications, through things like posting to contact-us and login forms. The posts might include JavaScript that, once viewed by an administrator, could cause credentials to be transferred out or might lead to installing key loggers or Trojans on the recipient's computer.
The problem here is that you could be compromised without even knowing it. Defenses include making sure HTML and JavaScript can't be submitted through your site; having rock solid (and constantly updated) spam and virus checks at the mail server, etc. Basically, you need to look at every possible way an external entity could send something to you and do something about it. A lot of Fortune 500 companies keep getting hit with things like this... Google included.
Hope the above helps someone. If so and it leads to a more secure environment then I'll be a happy guy. Unfortunately most companies don't monitor traffic so they have no idea just how much time is spent by their machines fending off this garbage.
I can say a few things, but I will be glad to hear more ideas.
How can we know immediately if someone has penetrated?
This is not so easy. In my opinion, one idea is to set some traps inside your back office, together with monitoring for double logins from different IPs.
A trap can be anything you can think of, for example a fake page in the back office that says "create new administrator" or "change administrator password". Anyone who gets in there and tries to create a new administrator is certainly an intruder. Of course the trap must be known only to you, or else it is meaningless.
For more security, any change to administrators should require a second password. Anyone who tries to make a real change to an administrator account, or to add a new administrator, and fails this second password should be considered an intruder.
way to monitor an ASP.NET application
I think any tool that monitors pages for text changes can help with that. For example, this Network Monitor can watch for specific text on your page and alert you, or take some action, if the text is not found, which means someone changed the page.
So you can add some special hidden text, and if it is not found, you know for sure that someone changed the core of your page and has probably changed files.
How can we know if a package of some kind has been deposited on one of our servers
This can be any aspx page uploaded to your server that acts like a file browser. To keep this from happening, I suggest adding web.config files to the directories used for uploading data, and in that web.config do not allow anything to run.
<configuration>
  <system.web>
    <authorization>
      <deny users="*" />
    </authorization>
  </system.web>
</configuration>
I have not tried it yet, but Lenny Zeltser directed me to OSSEC, which is a host-based intrusion detection system that continuously monitors an entire server to detect any suspicious activity. This looks like exactly what I want!
I will add more information once I have a chance to fully test it.
OSSEC can be found at http://www.ossec.net/

Using a remote, external web service instead of a database

I am building an ASP.NET web application that will be deployed to a 4-node web farm.
My web application's farm is located in California.
Instead of a database for back-end data, I plan to use a set of web services served from a data center in New York.
I have a page /show-web-service-result.aspx that works like this:
1) User requests page /show-web-service-result.aspx?s=foo
2) Page's codebehind queries a web service that is hosted by the third party in New York.
3) When web service returns, the returned data is formatted and displayed to user in page response.
Does this architecture have potential scalability problems? Suppose I am getting hundreds of unique hits per second, e.g.
/show-web-service-result.aspx?s=foo1
/show-web-service-result.aspx?s=foo2
/show-web-service-result.aspx?s=foo3
etc...
Is it typical for web servers in a farm to be using web services for data instead of database? Any personal experience?
What change should I make to the architecture to improve scalability?
You have most definitely a scalability problem: the third-party web service. Unless you have a service-level agreement with that service (agreeing on the number of requests that you can submit per second), chances are real that you overload that service with your anticipated load. That you have four nodes yourself doesn't help you then.
So you should a) come up with an agreement with the third party, and b) test what the actual load is that they can take.
In addition, you need to make sure that your framework can use parallel connections for accessing the remote service. Suppose you have a round-trip time of 20ms from California to New York (which would be fairly good): you cannot make more than 50 sequential requests per second over a single TCP connection. Likewise, starting a new TCP connection for every request will also kill performance, so you want pooling on these parallel connections.
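To make the arithmetic concrete, here is a small Python sketch. It assumes each connection carries one request at a time and that the round trip dominates (service processing time and any pipelining are ignored); the function names and the figure of 300 requests per second are illustrative, standing in for the "hundreds of unique hits per second" in the question.

```python
import math


def max_requests_per_connection(round_trip_ms):
    """Sequential request/response pairs one connection can complete per second."""
    return 1000 / round_trip_ms


def connections_needed(target_requests_per_second, round_trip_ms):
    """Smallest number of parallel connections that sustains the target rate."""
    return math.ceil(target_requests_per_second * round_trip_ms / 1000)


print(max_requests_per_connection(20))  # 50.0
print(connections_needed(300, 20))      # 6
print(connections_needed(300, 70))      # 21
```

With a less optimistic round trip (70ms is used here only as an example), the required pool grows proportionally, which is why both pooling and an agreed request rate with the provider matter.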
I don't see a problem with this approach, we use it quite a bit where I work. However, here are some things to consider:
Is your page rendering going to be blocked while waiting for the web service to respond?
What if the response never comes, i.e. the service is down?
For the first problem I would look into using AJAX to update the page after you get a response back from the web service. You'll also want to consider how to handle the no response or timeout condition.
Finally, you should really think about how you could cache the web service data locally. For example if you are calling a stock quoting service then unless you have a real-time feed, there is no reason to call the web service with every request you get. Store the data locally for a period of time and return that until it becomes stale.
You may have scalability problems but most of these can be carefully engineered around.
I recommend you use ASP.NET's asynchronous tasks so that the web service is queued up, the thread is released while the request waits for the web service to respond, and then another thread picks up when the web service is done to finish off the request.
MSDN Magazine - Wicked Code - Asynchronous Pages in ASP.NET 2.0
Local caching is an absolute must. The fewer times you have to go from California to New York, the better. You might want to look into Microsoft's Velocity (although that's still in CTP) or NCache, or another distributed cache, so that your 4 web servers don't each have to fetch and cache the same data from the web service: once one server gets it, it should be available to all.
Microsoft Project Code Named "Velocity"
NCache
Other things that can go wrong that you should engineer around:
The web service is down (obviously) and data falls out of cache, and you can't get it back. Try to make it so that the data is not actually dropped from cache until you're sure you have an update available. Then the only risk is if the service is down and your application pool is reset, so don't reset it as a first-line troubleshooting maneuver!
There are two different timeouts on web requests, a connect and an overall timeout. Make sure both are set extremely low and you handle both of them timing out. If the service's DNS goes down, this can look like quite a different failure.
Watch perfmon for ASP.NET Queued Requests. This number will rise rapidly if the service goes down and you're not covering it properly.
Research and adjust ASP.NET performance registry settings so you have a highly optimized ASP.NET thread pool. I don't remember the specifics, but I seem to remember that there's a limit on IO Completion Ports and something else of that nature that are absurdly low for the powerful hardware I'm assuming you have on hand.
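The first point in the list above (do not drop data from the cache until you are sure you have an update) can be sketched as follows. This is an illustrative Python outline, not ASP.NET code; the class name StaleOnErrorCache, the fetch callable, and the clock parameter are all assumptions made for the example.

```python
import time


class StaleOnErrorCache:
    """Hypothetical cache that keeps serving the old value while the service is down."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self.fetch = fetch      # callable that performs the remote call for a key
        self.ttl = ttl_seconds  # how long a value counts as fresh
        self.clock = clock      # injectable so the behaviour can be tested
        self.entries = {}       # key -> (value, time it was fetched)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # fresh enough: no remote call
        try:
            value = self.fetch(key)
        except Exception:
            if entry is not None:
                return entry[0]  # refresh failed: fall back to the stale value
            raise                # nothing cached yet, so the error surfaces
        # Only a successful fetch replaces what is in the cache.
        self.entries[key] = (value, now)
        return value
```

In this sketch an expired entry is replaced only after a successful fetch, so a service outage degrades to stale data rather than errors. The cache still lives in process memory, so an application pool reset during an outage loses it, which is the risk the list calls out.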
The trendy answer is REST. Any GET request can be HTTP Response cached (with lots of options on how that is configured) and it will be cached by the internet itself (your ISP, essentially).
Your project has an architecture that reflects the direction that Microsoft and many others in the SOA world want to take us. That said, many people try to avoid this type of real-time risk introduced by the web service.
Your system will have a huge dependency on the web service working in an efficient manner. If it doesn't work, or is slow, people will just see that your page isn't working properly.
At the very least, I would get a web stress tool and performance test your web service to at least the traffic levels you expect to get at peaks, and likely beyond this. When does it break (if ever?), when does it start to slow down? These are good metrics to know.
Other options to look at: perhaps you can get daily batches of data from the web service to a local database and hit the database for your web site. Then, if for some reason the web service is down or slow, you could use the most recently obtained data (if this is feasible for your data).
Overall, it should be doable, but you want to understand and measure the risks, and explore any potential options to minimize those risks.
It's fine. There are some scalability issues. Primarily, with the number of calls you are allowed to make to the external web service per second. Some web services (Yahoo shopping for example) limit how often you can call their service and will lock out your account if you call too often. If you have a large farm and lots of traffic, you might have to throttle your requests.
Also, it's typical in these situations to use an interstitial page that forks off a worker thread to go and do the web service call and redirects to the results page when the call returns. (Think a travel site when you do search, you get an interstitial page while they call out to an external source for the flight data and then you get redirected to a results page when the call completes). This may be unnecessary if your web service call returns quickly.
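The interstitial pattern described above boils down to "start the call in the background, hand back a ticket, and let the waiting page poll until the result is ready". A minimal Python sketch of that bookkeeping follows; the SearchJobs class and its start and poll methods are invented for this example, and a real ASP.NET implementation would look different.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class SearchJobs:
    """Hypothetical job table: run slow service calls on worker threads and poll them."""

    def __init__(self, call_service, workers=4):
        self.call_service = call_service  # the slow remote call, taking a query
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.jobs = {}  # job id -> Future (a real server would guard this with a lock)

    def start(self, query):
        # Called by the first request: returns at once with an id that the
        # interstitial page can use when it polls.
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = self.pool.submit(self.call_service, query)
        return job_id

    def poll(self, job_id):
        # Called by the interstitial page: either "pending" or the final data.
        future = self.jobs[job_id]
        if not future.done():
            return ("pending", None)
        del self.jobs[job_id]
        # result() re-raises any exception the service call threw.
        return ("done", future.result())
```

The interstitial page would call poll every second or so and redirect to the results page once it sees "done". As the answer says, none of this is needed if the web service returns quickly.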
I recommend you be certain to use WCF, and not the legacy ASMX web services technology as the client. Use "Add Service Reference" instead of "Add Web Reference".
One other issue you need to consider, depending on the type of application and/or data you're pulling down: security.
Specifically, I'm referring to authentication and authorization, both of your end users, and the web application itself. Where are these things handled? All in the web app? by the WS? Or maybe the front-end app is authenticating the users, and flowing the user's identity to the back end WS, allowing that to verify that the user is allowed? How do you verify this? Since many other responders here mention a local data cache on the front end app (an EXCELLENT idea, BTW), this gets even MORE complicated: do you cache data that is allowed to userA, but not for userB? if so, how do you verify that userB cannot access data from the cache? What if the authorization is checked by the WS, how do you cache the permissions then?
On the other hand, how are you verifying that only your web app is allowed to access the WS (and an attacker doesn't directly access your WS data over the Internet, for instance)? For that matter, how do you ensure that your web app contacts the CORRECT WS server, and not a bogus one? And of course I assume that all the connection to the WS is only over TLS/SSL... (but of course also programmatically verify the cert applies to the accessed server...)
In short, it's complicated, and there are many elements to consider here.... but it is NOT insurmountable.
(as far as input validation goes, that's actually NOT an issue, since this should be done by BOTH the front end app AND the back end WS...)
Another aspect here, as mentioned by @Martin, is the need for an SLA with whatever provider/hosting service you have for the NY WS, not just for performance, but also to cover availability. I.e. what happens if the server is inaccessible, how quickly they commit to getting it back up, what happens if it's down for extended periods of time, etc. That's the only way to legitimately transfer the risk of your availability being controlled by an externality.

Allowing Session in a Web Farm? Is StateServer Good Enough?

First of all to give you a bit of background on the current environment. We have a number of ASP.NET applications, all of which use session for certain aspects. We are "Load Balanced" over multiple servers due to traffic levels, however, our load balancing is set to use "Sticky Sessions" as currently all web applications are set to use "InProc" for session state.
We are looking at being able to remove the "Sticky Sessions" configuration on our load balancer, as due to our traffic loads servers can and do get overloaded. We want to go with a more balanced approach, but must be able to use session.
I know that SqlServer for session state will work, but for reasons beyond our control, we cannot use SqlServer to store our state. In researching it seems that StateServer is our best bet. We have an additional server, with loads of memory sitting around. This server could be our StateServer for the entire Web Cluster. We just want to know the following things.
1.) Besides any potential serialization issues with the switch from InProc to StateServer, are there any major known issues with losing session objects or generating errors with the above listed environment?
2.) Aside from the single point of failure and slightly slower performance, are there any other gotchas that we need to be aware of when using StateServer?
3.) Are there any metrics that show the performance differences between the three types of state storage?
Here is a decent FAQ on asp.net state: http://www.eggheadcafe.com/articles/20021016.asp
From that Article, here is some information on StateServer:
In a web farm, make sure you have the same MachineKey in all your web servers. See KB 313091 on how to do it.
Also, make sure your objects are serializable. See KB 312112 for details.
For session state to be maintained across different web servers in the web farm, the Application Path of the website (For example \LM\W3SVC\2) in the IIS Metabase should be identical in all the web servers in the web farm. See KB 325056 for details
I have only used SQL Server and in-proc, but these 3 points that apply when using SQL Server apply here as well:
Avoid storing too much information in the session, as it affects both serialization and the data transmitted over the network.
Make sure you don't have anything that depends on Session_onEnd. This is just not available for out-of-process sessions.
Turn off session on pages that don't use it. This doesn't make a difference for in-process session, but for out-of-process it will save you a lot.
Make sure your server etag ids are synchronized across the web farm otherwise caching at client browsers will be upset.
Have you reviewed your code in detail to make sure everything can be serialized out of process and across a LAN efficiently?
Are you solving the main performance problem within your system? I ask because the database is the typical source of contention.
My main motivation for moving away from sticky sessions was operational flexibility i.e. cycle down a problematic server or to deploy a software upgrade. So having implemented a central session state service make sure you take full advantage from an operational stand point.
In my experience, the native state server, and even SQL Server, are very scary options for sessions, as both have issues (mainly performance). By the way, we are also using sticky sessions.
I think you can explore other products to achieve the absolute best. A free option would be Velocity, but it is still not released.
Another comprehensive, proven (but actually very expensive) product is NCache. It will even help reduce your serialization cost, and if you use its APIs the results will be even better.
Take a look and see which looks best for you.
About SQL Server: your server will die very soon if you have enough hits coming in (I believe you already have some traffic that led you to a web farm, unless you are doing it just for the sake of redundancy).
Bottom line: we are evaluating Velocity because NCache is really expensive. However, the advantages are huge.
We are using StateServer for a very small web farm with only two nodes for a few hundred users.
I'm not responsible for its operation but I remember only two issues in two years where the service had to be restarted because it crashed.
I would like to add one more point to the accepted answer:
Make sure the version of framework dlls is the same.
In my case the System.Web dll versions were different as a few windows updates were skipped on one of the servers of the farm.
