Guidelines for providing large downloads in IIS + ASP.NET (MVC) - asp.net

We want to allow users to download large files from our ASP.NET MVC2 system.
We are providing the files through the Controller.File method, which streams from FileStream to Response.OutputStream.
The reason we use Controller.File instead of providing a direct link is that we need to verify security rules on the logged in (Forms authentication) user.
What would be the largest areas of concern when doing this?
Security: we'll probably need to increase executionTimeout. Does this expose security issues?
Memory: I assume that, since Controller.File is streaming the contents directly from disk, there are little memory implications.
CPU: I read on various blogs that providing large downloads is heavy on the cpu, but these were unconfirmed statements, so I did not find any recommendations from MS.
Network: how many concurrent downloads are possible? Can we throttle, so that other traffic is not hindered by this?
Other?
What would be your recommendations?
What would be other options than going through the ASP.NET pipeline, but still provides us with the data we need to validate the logged in user. ISAPI is said to reduce CPU and Memory, maybe some other advantages here?
Are there any (official) guidelines or best practices available concerning this?

I would look to do it asynchronously. Make sure buffering is switched off, that way your data is sent to the client rather than asp waiting for you to finish. if you're already streaming, then that's a good thing. I'm assuming you mean you read x bytes from the filestream and write those bytes to the output stream, repeat until EOF.
I'm not aware of any guidelines I can point you at. The above come from my own experience.

Related

Adding Encryption to Solr/lucene indexes

I am currently using Solr to perform search services over some sensitive records.
As Solr/lucene provides fast searching by storing inverted indexes of the sensitive information in plain text on a disk there is a requirement to encrypt these index files so that unauthorized people can't have access to them by bypassing the system's security.
I found there are similar patches open on Apache JIRA AES encrypted directory and Codec for index-level encryption.
AES encrypted directory looks promising but this patch has been implemented for lucene 3.1 as I am using the newer version, I am not sure if this patch can be used with lucene version 5 or higher.
I was wondering if there is a way to implement a security measure that encrypts the indexes or if it is possible to write some custom plugin which can encrypt/decrypt the indexes on I/O level(i.e FsDirectory)?
The discussion in the comment section of LUCENE-6966 you have shared is really interesting. I would reason with this quote of Robert Muir that there is nothing baked into Solr and probably will never be.
More importantly, with file-level encryption, data would reside in an unencrypted form in memory which is not acceptable to our security team and, therefore, a non-starter for us.
This speaks volumes. You should fire your security team! You are wasting your time worrying about this: if you are using lucene, your data will be in memory, in plaintext, in ways you cannot control, and there is nothing you can do about that!
Trying to guarantee anything better than "at rest" is serious business, sounds like your team is over their head.
So you should consider to encrypt the storage Solr is using on OS level. This should be transparent for Solr. But if someone comes into your system, he should not be able to copy the Solr data.
This is also the conclusion the article Encrypting Solr/Lucene indexes from Erick Erickson of Lucidwors draws in the end
The short form is that this is one of those ideas that doesn't stand up to scrutiny. If you're concerned about security at this level, it's probably best to consider other options, from securing your communications channels to using an encrypting file system to physically divorcing your system from public networks. Of course, you should never, ever, let your working Solr installation be accessible directly from the outside world, just consider the following: http://server:port/solr/update?stream.body=<delete><query>*:*</query></delete>!

Building ASP.Net web page to test user's connection (bandwidth)

We need to add a feature to our website that allows the user to test his connection speed (upload/download). After some research I found that the way this being done is downloading/uploading a file and divide the file size by the time required for that task, like here http://www.codeproject.com/KB/IP/speedtest.aspx
However, this is not possible in my case, this should be a web application so I can't download/upload files to/from user's machine for obvious security reasons. From what I have read this download/upload tasks should be on the server, but how?
This website has exactly what I have in mind but in php http://www.brandonchecketts.com/open-source-speedtest. Unfortunately, I don't have any background about php and this task is really urgent.
I really don't need to use stuff like speedtest
thanks
The formula for getting the downloadspeed in bit/s:
DataSize in byte * 8 / (time in seconds to download - latency to the server).
Download:
Host a blob of data (html document, image or whatever) of which you know the downloadsize on your server, make a AJAX GET request to fetch it and measure the amount of time between the start of the download and when it's finished.
This can be done with only Javascript code, but I would probably recommend you to include a timestamp when the payload was generated in order to calculate the latency.
Upload:
Make a AJAX POST request towards the server which you fill with junk data of a specific size, calculate the time it took to perform this POST.
List of things to keep in mind:
You will not be able to measure higher bandwith than your servers
Concurrent users will drain the bandwith, build a "limit" if this will be a problem
It is hard to build a service to measure bandwidth with precision, the one you linked to is presenting 5Mbit/s to me when my actual speed is around 30Mbit/s according to Bredbandskollen.
Other links on the subject
What's a good way to profile the connection speed of Web users?
Simple bandwidth / latency test to estimate a users experience
Note: I have not built any service like this, but it should do the trick.

Adding more hardware v/s refactoring code under a time crunch

Background:
Enterprise application - very will written for its time in 2004.
Stack:
.NET, Heavy use of Remoting, ASMX style web services, SQL Server
Problem:
The application allows user to go through various wizards for lack of a better term, all of their actions are stored in what we call "wiz state", which is essentially XML that is persisted to a SQL server database very frequently because we allow users to pause/resume their application. Often in these wizards, the XML that comprises the wizard state grows very large, I'm talking 5-8 MB of data, and we noticed that when we had a sudden influx of simultaneous users, we started receiving occasional timeouts against the database, because a lot of what the wizard state is comprised of, is keeping track of collections of "things". Sometimes these custom collections grow very large.
Question:
We were in a meeting today and we're expecting a flurry of activity in October that will test the system like never before, and possibly result in huge wizard states that go back and forth from the web server to the database. The crux of the situation is that there is only one database and one web server.
For arguments sake, because of the complexity of the application, lets say adding any kind of clustering/mirroring to increase database throughput is out of the question. I spoke up in the meeting and said the quickest way to address this in the shortest time period would be to add more servers to the front end web application so the load could be distributed amongst web servers. The development lead said I was completely wrong and it would have no effect because we only have one database, so adding more web power would do nothing. He is having one of the other developers reduce the xml bloat that we persist frequently to the database. Probably in the long run, reducing the size of the xml that we pass back and forth is the right idea, but will adding additional web servers truly have no effect, I just think in terms of simultaneous users, it should help.
Any responses thoughts are appreciated, proof that more web servers would help would be pure win.
Thanks.
EDIT: We use binary serialization to store the XML in the database in an image field.
I haven't heard anything about locating the "bottlenecks". Isn't that the first thing to do? Here's the method I use.
Otherwise you're just investing in guesses. That won't work.
I've been in meetings like that, where everybody gets excited throwing ideas around, and "management" wants to make "decisions", but it's the blind leading the blind. Knuckle down and find out what's going on. You can't do that in meetings.
Some time ago I looked at a performance problem with some similarity to yours. The biggest "bottleneck" was in writing and parsing XML, with attendant memory allocation, setup, and destruction. Then there were others as well. You might find the same thing, or something different.
P.S. I keep quoting "bottleneck" because all the performance problems I've found have been nothing at all like the necks of bottles. Rather they are like way over-bushy call trees that need radical pruning, such as making and reading mountains of XML for no good reason.
If the rate at which the data is written by SQL is the bottleneck, feeding data to SQL more quickly should have no effect.
I am not sure exactly what the data structure is, but perhaps compressing the XML data on the web server(s) before writing may have a positive effect.
If the bottleneck is the database, then more web services will not help you a lot.
The problem may be that the problem is not only the size of the data, but the number of concurrent request to the same table. The number of writes will be the big problem. If your XML write is in a transaction with other queries you may try to break out the XML write from that transaction to reduce locking time of the XML table.
As stated by vdeych you may try compression to reduce the data size. (That would increase the load on the web servers.)
You may also try caching the data. Only read from the SQL server if the data is not already in the cache. Make sure you don't update the SQL server if your data has not changed.
No one seems to have suggest this, what about replacing your XML serialization of your wizard with JsonSerialization.
Not only should this give you a minor boost in performance in the serialization itself since both the DataContractSerializer (faster) and Newtonsoft Json.NET (fastest) out perform the XML serializers in .NET. This should easily reduce the size of your object graph by upwards of 50% or more (depending on number of properties vs large strings in the XML).
This should dramatically lower the IO that is inflicted upon Sql server. This should also limit the amount of scope required to alter your application significantly (assuming it's well designed and works through common calls for serialization/deserialization).
If you choose to go this route also invest time comparing BSON vs JSON as I think it would be likely that the binary encoded one will offer even more space savings (and further IO reduction) due to the size of your object graphs.
I'm not a .NET expert but maybe using a binary serialization would increase throughput. Making sure that the XML isn't stored as text (fairly obvious but thought I'd mention it). Also relational databases are best for storing relational data, so perhaps substituting an ORM layer in place of the serialization (sounds feasible) could speed things up.
Mike is spot on, without understanding the resource constaint leading to the performance issues, no amount of discussion will resolve the problem. I'll add that socket timeouts that affect running statements are a symptom, and are never imposed by SQL Server, they're an artifact of your driver configuration or a firewall or similar device between app and db imposing them (unless you're talking about timeouts for new connections, then you have a host in serious distress under load).
Given your symptom is database timeouts, you need to start there. If they're indicative of long running statements that result in a socket timeout, use SQL Server profiler to capture the workload while simultaneously monitoring system resources. Given it's a mature application and the type of workload you mention, it's unlikely to be statement tuning related, it probably boils down to resource limitations CPU, memory or disk IO capacity
This Technet guide is a very good place to start:
http://technet.microsoft.com/en-us/library/cc966540.aspx
If it's resource contention, then it's a simple discussion about how the resource contention can be tuned, configured for or addressed by adding more of whatever is needed.
Edit: I should add that given a database performance issue, more applications servers is likely to worsen the problem as you increase the amount of concurrency, that might otherwise be kept in check by connection pool, request processing or other limits.

Multiple requests to server question

I have a DB with user accounts information.
I've scheduled a CRON job which updates the DB with every new user data it fetches from their accounts.
I was thinking that this may cause a problem since all requests are coming from the same IP address and the server may block requests from that IP address.
Is this the case?
If so, how do I avoid being banned? should I be using a proxy?
Thanks
You get banned for suspicious (or malicious) activity.
If you are running a normal business application inside a normal company intranet you are unlikely to get banned.
Since you have access to user accounts information, you already have a lot of access to the system. The best thing to do is to ask your systems administrator, since he/she defines what constitutes suspicious/malicious activity. The systems administrator might also want to help you ensure that your database is at least as secure as the original information.
should I be using a proxy?
A proxy might disguise what you are doing - but you are still doing it. So this isn't the most ethical way of solving the problem.
Is the cron job that fetches data from this "database" on the same server? Are you fetching data for a user from a remote server using screen scraping or something?
If this is the case, you may want to set up a few different cron jobs and do it in batches. That way you reduce the amount of load on the remote server and lower the chance of wherever you are getting this data from, blocking your access.
Edit
Okay, so if you have not got permission to do scraping, obviously you are going to want to do it responsibly (no matter the site). Try gather as much data as you can from as little requests as possible, and spread them out over the course of the whole day, or even during times that a likely to be low load. I wouldn't try and use a proxy, that wouldn't really help the remote server, but it would be a pain in the ass to you.
I'm no iPhone programmer, and this might not be possible, but you could try have the individual iPhones grab the data so all the source traffic isn't from the same IP. Just an idea, otherwise just try to be a bit discrete.
Here are some tips from Jeff regarding the scraping of Stack Overflow, but I'd imagine that the rules are similar for any site.
Use GZIP requests. This is important! For example, one scraper used 120 megabytes of bandwidth in only 3,310 hits which is substantial. With basic gzip support (baked into HTTP since the 90s, and universally supported) it would have been 20 megabytes or less.
Identify yourself. Add something useful to the user-agent (ideally, a link to an URL, or something informational) so we can see your bot as something other than "generic unknown anonymous scraper."
Use the right formats. Don't scrape HTML when there is a JSON or RSS feed you could use instead. Heck, why scrape at all when you can download our cc-wiki data dump??
Be considerate. Pulling data more than every 15 minutes is questionable. If you need something more timely than that ... why not ask permission first, and make your case as to why this is a benefit to the SO community and should be allowed? Our email is linked at the bottom of every single page on every SO family site. We don't bite... hard.
Yes, you want an API. We get it. Don't rage against the machine by doing naughty things until we build it. It's in the queue.

How to determine the size in bytes of the ASP.NET Cache?

I'm in active development of an ASP.NET web application that is using server side caching and I'm trying to understand how I can monitor the size this cache during some scale testing. The cache stores XML documents of various sizes, some of which are multi-megabyte.
On the System.Web.Caching.Cache object of System.Web.Caching namespace I see various properties, including Count, which gets "the number of items stored in the cache" and EffectivePrivateBytesLimit, which gets "the number of bytes available for the cache." Nothing tells me the size in bytes of the cache.
In the Understanding Caching Technologies section of the "Caching Architecture Guide for .NET Framework Applications" guide, there is a "Managing the Cache Object" section with a table (Table 2.2: Application performance counters for monitoring a cache) listing a set of application performance counters, but I don't see any that give me the current size of the cache.
What is a good way to find the size this cache? Do I have to set a byte limit on the cache and look at one of the turnover rates? Am I thinking of this problem in the wrong way? Is the answer to How to determine total size of ASP.Net cache really the best way to go?
I was about to give a less detailed account of the answer you refer to in your question until I read that. I would refer you to this, seems spot on to me. No better way than seeing the physical size on the server, anything else might not be accurate.
You might want to set up some monitoring, for which a Powershell script might be handy to record and send on to yourself in a report. This way you could run various tests overnight say and summarise it.
On a side note, they sound like very large documents to be putting in a memory cache. Have you considered a disk based cache for these larger items and leaving the memory for smaller items which is more ideal for it. If your disks are reasonably fast this should be fairly performant.

Resources