I am calling a RESTful web service in the back-end of some ASP.NET pages.
I am using ASP.NET asynchronous pages, so under the hood I am using the methods:
HttpWebRequest.BeginGetResponse()
and
HttpWebRequest.EndGetResponse()
The response string in my case is always a JSON string. I use the following code to read the entire string:
using (StreamReader sr = new StreamReader(myHttpWebResponse.GetResponseStream()))
{
myObject.JSONData = sr.ReadToEnd();
}
Is this method OK in terms of scalability? I have seen other code samples that instead retrieve the response data in blocks using Read(). My primary goal is scalability, so this back-end call can be made across many concurrent page hits.
Thanks,
Frank
It depends on what you mean by "scalable". If you're talking about being able to handle bigger and bigger responses, I'd say it's not terribly scalable. Since you're using a single ReadToEnd, a huge stream would require the entire stream to be read into memory and then acted upon. As the application's streams grow in number, complexity and size, you're going to find that this begins to hamper the server's ability to handle requests. You may also find that your application pool starts to recycle itself DURING your request (if you end up consuming that much virtual memory).
If the stream is always going to be smallish and you're only concerned with the number of streams created, I don't see why this wouldn't scale, as long as your streams aren't left holding on to open files, database connections, etc.
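If the per-response size is the worry, here is a minimal sketch of the block-based Read() approach mentioned in the question. The 8 KB buffer size is arbitrary, and accumulating into a StringBuilder is only for illustration; the real win only comes if each chunk is handed to an incremental parser and discarded instead of being accumulated.

using System.IO;
using System.Net;
using System.Text;

static class ResponseReader
{
    // Reads the response in fixed-size blocks instead of a single ReadToEnd().
    public static string ReadResponseInBlocks(HttpWebResponse response)
    {
        var sb = new StringBuilder();
        char[] buffer = new char[8 * 1024];   // 8 KB blocks (arbitrary size)

        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                sb.Append(buffer, 0, read);   // or feed each chunk to an incremental parser
            }
        }
        return sb.ToString();
    }
}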
Related
Is it possible to split a response from a Web API into chunks, as follows?
I have a WinForms app that can handle 100 KB of data at a time.
When this client makes a request to my ASP.NET Web API for some data, let's assume that the Web API response is 2 MB ... can I somehow cache this response, split it into 100 KB chunks, and return the first chunk to my app? That response would contain a link/token to the next chunk, and so forth. Does this sound crazy? Is this feasible?
One more question: when we are talking about request/response content length (size), what does this mean: that the content itself cannot be bigger than 100 KB, or the content together with headers and so on? I want to know whether or not headers are included in the content length.
Example: if my response body is 99 KB and the headers are 10 KB (109 KB in total), will this pass if the limit is 100 KB?
Pagination is a pretty common solution to large data sets in webservices. Facebook, for example, will paginate API results when they exceed a certain number of rows. Your idea for locally caching the full result is a good optimization, though you can probably get away with not caching it as a first implementation if you are unsure of whether or not you will keep this as your final solution.
Without caching, you can just pass the page number and total number of pages back to your client, and it can then re-make the call with a specific page number in mind for the next set. This makes for an easy loop on the client, and your data access layer is marginally more complicated since it will only re-serialize certain row numbers depending on the page parameter, but that should still be pretty simple.
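As a sketch of that stateless (no caching) variant, assuming a Web API controller whose data comes back as an IQueryable; the Item class, the page size, and the PagedResult shape are all invented for illustration:

using System.Linq;
using System.Web.Http;

public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical response shape: one page of data plus enough info to request the next page.
public class PagedResult<T>
{
    public T[] Items { get; set; }
    public int Page { get; set; }
    public int TotalPages { get; set; }
}

public class ItemsController : ApiController
{
    private const int PageSize = 100;   // sized so each page stays under the client's limit

    public PagedResult<Item> Get(int page = 1)
    {
        IQueryable<Item> query = GetItemsQuery();
        int total = query.Count();

        return new PagedResult<Item>
        {
            Items = query.OrderBy(i => i.Id)
                         .Skip((page - 1) * PageSize)
                         .Take(PageSize)
                         .ToArray(),
            Page = page,
            TotalPages = (total + PageSize - 1) / PageSize
        };
    }

    private IQueryable<Item> GetItemsQuery()
    {
        // Placeholder: in the real service this would be a database query.
        return Enumerable.Range(1, 1000)
                         .Select(i => new Item { Id = i, Name = "Item " + i })
                         .AsQueryable();
    }
}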
Web API has access to the same HttpRuntime.Cache object as the rest of the ASP.NET project types do, so it should be easy to write a wrapper around your data access call and stick the result of a larger query into that cache. Use a token as the key to that value in the cache and pass the key, likely an instance of the GUID class, back to the client with the current page number. On subsequent calls, skip accessing your normal persistence method (DB, file, etc) and instead access the GUID key in the HttpRuntime.Cache and find the appropriate row. One wrinkle with this is if you have multiple webservers hosting your service since the HttpRuntime.Cache will exist on only the machine that took the first call, so unless your load balancer has IP affinity or you have a distributed caching layer, this will be more difficult to implement.
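And a rough sketch of the token-plus-cache idea; the 100 KB chunk size, the ChunkResponse shape, the ten-minute cache lifetime, and the LoadFullResult method are all assumptions:

using System;
using System.Web;
using System.Web.Caching;
using System.Web.Http;

// Hypothetical shape: one chunk of the cached result plus a token for the next call.
public class ChunkResponse
{
    public byte[] Data { get; set; }
    public Guid Token { get; set; }
    public int ChunkIndex { get; set; }
    public int TotalChunks { get; set; }
}

public class ChunkedDataController : ApiController
{
    private const int ChunkSize = 100 * 1024;   // 100 KB, matching the client's limit

    // First call: run the expensive query, cache the full payload, return chunk 0.
    // Later calls: pass the token and the next chunk index (assumes 0 <= chunk < TotalChunks).
    public ChunkResponse Get(Guid? token = null, int chunk = 0)
    {
        Guid key = token ?? Guid.NewGuid();

        byte[] full = HttpRuntime.Cache[key.ToString()] as byte[];
        if (full == null)
        {
            full = LoadFullResult();   // placeholder for the real data access call
            HttpRuntime.Cache.Insert(key.ToString(), full, null,
                DateTime.UtcNow.AddMinutes(10),     // arbitrary lifetime
                Cache.NoSlidingExpiration);
        }

        int totalChunks = (full.Length + ChunkSize - 1) / ChunkSize;
        int offset = chunk * ChunkSize;
        int length = Math.Min(ChunkSize, full.Length - offset);
        byte[] slice = new byte[length];
        Array.Copy(full, offset, slice, 0, length);

        return new ChunkResponse
        {
            Data = slice,
            Token = key,
            ChunkIndex = chunk,
            TotalChunks = totalChunks
        };
    }

    private static byte[] LoadFullResult()
    {
        // Placeholder for the real 2 MB query result.
        return new byte[2 * 1024 * 1024];
    }
}

As noted above, this only behaves correctly on a single server unless you add IP affinity or move to a distributed cache.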
I need to put a customized logging system of sorts in place for an ASP.NET application. Among other things, it has to log some data per request. I've thought of two approaches:
Approach #1: Commit each entry per request. For example: A log entry is created and committed to the database on every request (using a transient DbContext). I'm concerned that this commit puts an overhead on the serving of the request that would not scale well.
Approach #2: Buffer entries, commit periodically. For example: A log entry is created and added to a concurrent buffer on every request (using a shared lock). When a limit in that buffer is exceeded, an exclusive lock is acquired, the buffered entries are committed to the database in one go (using another, also transient DbContext, created and destroyed only for each commit) and the buffer is emptied. I'm aware that this would make the "committing" request slow, but it's acceptable. I'm also aware that closing/restarting the application could result in loss of uncommitted log entries because the AppDomain will change in that case, but this is also acceptable.
I've implemented both approaches within my requirements, I've tested them and I've strained them as much as I could in a local environment. I haven't deployed yet and thus I cannot test them in real conditions. Both seem to work equally well, but I can't draw any conclusions like this.
Which of these two approaches is the best? I'm concerned about performance during peaks of a couple thousand users. Are there any pitfalls I'm not aware of?
To solve your concern with option 1 about slowing down each request, why not use the TPL to offload the logging to a different thread? Something like this:
public class Logger
{
public static void Log(string message)
{
Task.Factory.StartNew(() => { SaveMessageToDB(message); });
}
private static void SaveMessageToDB(string message)
{
// etc.
}
}
The HTTP request thread wouldn't have to wait while the entry is written. You could also adapt option 2 to do the same sort of thing to write the accumulated set of messages in a different thread.
I implemented a solution that is similar to option 2, but in addition to the count limit there was also a time limit: if no log entries had been added within a certain number of seconds, the queue would be dumped to the DB anyway.
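For what it's worth, a sketch of that combination (count limit plus flush timer); the thresholds and the SaveMessagesToDB call are placeholders:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class BufferedLogger
{
    private static readonly ConcurrentQueue<string> _buffer = new ConcurrentQueue<string>();
    private const int MaxBuffered = 100;   // arbitrary count limit

    // Flush every 10 seconds even if the count limit was never reached (arbitrary interval).
    private static readonly Timer _flushTimer =
        new Timer(_ => Flush(), null, TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(10));

    public static void Log(string message)
    {
        _buffer.Enqueue(message);
        if (_buffer.Count >= MaxBuffered)
            Flush();
    }

    private static void Flush()
    {
        var batch = new List<string>();
        string msg;
        while (_buffer.TryDequeue(out msg))
            batch.Add(msg);

        if (batch.Count > 0)
            SaveMessagesToDB(batch);   // assumed: one transient DbContext per batch, inserted in one go
    }

    private static void SaveMessagesToDB(IEnumerable<string> messages)
    {
        // Placeholder for the actual EF/ADO.NET batch insert.
    }
}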
Use log4net, and set its buffer size appropriately. Then you can go home and have a beer the rest of the day... I believe it's Apache licensed, which means you're free to modify/recompile it for your own needs (fitting whatever definition of "integrated in the application, not third party" you have in mind).
Seriously though - it seems way premature to optimize out a single DB insert per request at the cost of a lot of complexity. If you're doing 10+ log calls per request, it would probably make sense to buffer per-request - but that's vastly simpler and less error prone than writing high-performance multithreaded code.
Of course, as always, the real proof is in profiling - so fire up some tests, and get some numbers. At minimum, do a batch of straight inserts vs your buffered logger and determine what the difference is likely to be per-request so you can make a reasonable decision.
Intuitively, I don't think it'd be worth the complexity - but I have been wrong on performance before.
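To make that comparison concrete, something as crude as the following can produce a first per-request number; LogDirect and LogBuffered are placeholders for the single-insert and buffered implementations being compared:

using System;
using System.Diagnostics;

class LoggingBenchmark
{
    static void Main()
    {
        const int requests = 10000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < requests; i++)
            LogDirect("message " + i);
        sw.Stop();
        Console.WriteLine("Direct:   {0:F4} ms/request", sw.Elapsed.TotalMilliseconds / requests);

        sw.Restart();
        for (int i = 0; i < requests; i++)
            LogBuffered("message " + i);
        sw.Stop();
        Console.WriteLine("Buffered: {0:F4} ms/request", sw.Elapsed.TotalMilliseconds / requests);
    }

    static void LogDirect(string message)
    {
        // Placeholder: a single-row insert per call (transient DbContext or SqlCommand).
    }

    static void LogBuffered(string message)
    {
        // Placeholder: enqueue into whatever buffered logger is under test.
    }
}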
I need to send large data using a web service. The size of the data would be between 300 MB and 700 MB. The web service generates the data from a SQL database and sends it to the client. It is in the form of a DataSet with around 20 to 25 tables.
I tried the solution from the article "How to: Enable a Web Service to Send and Receive Large Amounts of Data" and the sample for Microsoft WSE 3.0, but it mostly gives me "System.OutOfMemoryException".
I think the problem is that the web service buffers the data in memory on the server and it exceeds the limit.
I thought of two alternatives:
(1) Send the DataTables one by one, but sometimes a single DataTable can hold around 100 MB to 150 MB of data.
(2) Write a file on the server and transfer it using HttpWebRequest (FTP would be possible, but an FTP server is not accessible at the moment).
Can anyone suggest a workaround for this problem using a web service?
Thanks,
A DataSet will load all of the data in memory. It is not suited to transferring such huge amounts of data; DataSets carry a lot of extra information when they are serialized.
If you know the structure of the tables you need to transfer, creating a set of serializable objects and sending an array of those would reduce your data payload significantly.
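For example (the OrderDto name and its fields are invented; the real types would mirror your tables):

using System;

// Hypothetical row type: a plain serializable class carries far less overhead
// than a DataSet/DataTable when it is serialized. The web method would then
// return OrderDto[] (or one array per table) instead of the whole DataSet.
[Serializable]
public class OrderDto
{
    public int OrderId { get; set; }
    public DateTime OrderDate { get; set; }
    public decimal Total { get; set; }
}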
If you must use a DataSet, take a look into enabling binary remoting:
BinaryFormatter bf = new BinaryFormatter();
myDataSet.RemotingFormat = SerializationFormat.Binary;   // binary instead of the default XML
bf.Serialize(outputStream, myDataSet);                   // outputStream is any writable Stream
After reducing your data payload by such means, it would be best to write the files to a publicly accessible location on your HTTP server. Hosting the file over HTTP allows clients to download it far more easily than FTP. You can control access to these HTTP folders with proper permissions for the users.
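A rough sketch of that file-based route, assuming .NET 4's Stream.CopyTo is available; the paths, the URL, and the GZip compression are illustrative choices:

using System.Data;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Runtime.Serialization.Formatters.Binary;

static class LargeDataTransfer
{
    // Server side: write the binary-serialized (and gzipped) DataSet to a folder
    // that the web server exposes; the target path is a placeholder.
    public static void WriteDataSetToFile(DataSet ds, string path)
    {
        ds.RemotingFormat = SerializationFormat.Binary;
        var bf = new BinaryFormatter();

        using (var file = File.Create(path))
        using (var gzip = new GZipStream(file, CompressionMode.Compress))
        {
            bf.Serialize(gzip, ds);
        }
    }

    // Client side: download the file and stream it straight to disk
    // instead of holding the whole payload in memory.
    public static void DownloadFile(string url, string localPath)
    {
        var request = WebRequest.Create(url);
        using (var response = request.GetResponse())
        using (var source = response.GetResponseStream())
        using (var target = File.Create(localPath))
        {
            source.CopyTo(target);   // .NET 4+; copies in small buffers
        }
    }
}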
I have an ASP.NET web page that connects to a number of databases and uses a number of files. I am not clear what happens if the end user closes the web page before it was finished loading i.e. does the ASP.NET life cycle end or will the server still try to generate the page and return it to the client? I have reasonable knowledge of the life cycle but I cannot find any documentation on this.
I am trying to locate a potential memory leak. I am trying to establish whether all of the code will run i.e. whether the connection will be disposed etc.
The code would still run. There is a property IsClientConnected on the HttpResponse object that can indicate whether the client is still connected, which is useful if you are doing operations like streaming output in a loop.
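For example, a hypothetical page that streams its output in a loop could bail out early like this (GetNextBlock is a placeholder for the real data source):

using System;
using System.Web.UI;

public partial class ExportPage : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Stream blocks to the client, stopping early if the browser has disconnected.
        while (Response.IsClientConnected)
        {
            byte[] block = GetNextBlock();   // placeholder data source
            if (block == null)
                break;

            Response.OutputStream.Write(block, 0, block.Length);
            Response.Flush();
        }
    }

    private byte[] GetNextBlock()
    {
        // Placeholder: return the next chunk of output, or null when finished.
        return null;
    }
}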
Once the request for the page has been received, it will go all the way through to the Unload stage of the life cycle. The server has no idea the client isn't there until it tries to send the response.
A unique aspect of this is the Dynamic Compilation portion. You can read up on it here: http://msdn.microsoft.com/en-us/library/ms366723
For more information on the ASP.NET page life cycle, look here:
http://msdn.microsoft.com/en-us/library/ms178472.aspx#general_page_lifecycle_stages
So basically, a page is requested, ASP.NET uses dynamic compilation to build the page, and then it attempts to send the page to the client. All the code you have written will run, whether or not the client is there to receive the response.
This is a very simplified answer, but that is the basics. Your code is compiled, the request generates the response, then the response is sent. It isn't sent in pieces unless you explicitly tell it to.
Edit: Thanks to Chris Lively for the recommendation on changing the wording.
You mention tracking down a potential memory leak and the word "connection". I'm going to guess you mean a database connection.
You should ALWAYS wrap your connections and commands in using blocks. This guarantees the connection/command is properly disposed of regardless of whether an error occurs, the client disconnects, etc.
There are plenty of examples here, but it boils down to something like:
using (SqlConnection conn = new SqlConnection(connStr))
using (SqlCommand cmd = new SqlCommand("SELECT ...", conn))   // command text, then connection
{
    conn.Open();
    // do something here.
}
If, for some reason, your code doesn't allow you to do it this way then I'd suggest the next thing you do is restructure it as you've done it wrong. A common problem is that some people will create a connection object at the top of the page execution then re-use that for the life of the page. This is guaranteed to lead to problems, including but not limited to: errors with the connection pool, loss of memory, random query issues, complete hosing of the app...
Don't worry about the performance of establishing (and discarding) connections at the point you need them in code. ADO.NET uses a connection pool that is lightning fast and will keep physical connections open for as long as needed, even after your app signals that it's done with them.
Also note: you should use this pattern EVERY TIME you are using a class that wraps unmanaged resources. Such classes implement IDisposable.
In order to improve the speed of a chat application, I am remembering the last message id in a static variable (actually, a Dictionary).
However, it seems that every thread has its own copy, because users do not get updates in production (single-server environment).
private static Dictionary<long, MemoryChatRoom> _chatRooms = new Dictionary<long, MemoryChatRoom>();
No [ThreadStatic] attribute is used...
What is a fast way to share a few ints across all application processes?
Update
I know that the web must be stateless. However, for every rule there is an exception. Currently all data is stored in MS SQL, and in this particular case a small piece of shared memory would increase performance dramatically and avoid pointless SQL requests.
I have not used statics for years, so I even missed the point at which a single application started running in multiple worker processes.
So, the question is: what is the simplest way to share in-memory objects between processes? For now, my workaround is remoting, but it involves a lot of extra code and I am not 100% sure about the stability of this approach.
I'm assuming you're new to web programming. One of the key differences between a web application and a regular console or Windows Forms application is that it is stateless. This means that every page request is basically initialised from scratch. You're using the database to maintain state, but as you're discovering this is fairly slow. Fortunately you have other options.
If you want to remember something frequently accessed on a per-user basis (say, their username) then you could use session. I recommend reading up on session state here. Be careful, however, not to abuse the session object -- since each user has his or her own copy of session, it can easily use a lot of RAM and cause you more performance problems than your database ever was.
If you want to cache information that's relevant across all users of your app, ASP.NET provides a framework for data caching. The simplest way to use it is like a dictionary, e.g.:
Cache["item"] = "Some cached data";
I recommend reading in detail about the various options for caching in ASP.NET here.
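As a sketch tied to the question's scenario, the same cache can hold the per-room last-message-id map with an explicit expiration. The key name and the 30-second sliding window are arbitrary choices, and cached entries can be evicted under memory pressure, so the database remains the source of truth:

using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static class ChatCache
{
    // Stores the per-room "last message id" map in the ASP.NET cache with a sliding expiration.
    public static void SaveLastMessageIds(Dictionary<long, long> lastMessageIds)
    {
        HttpRuntime.Cache.Insert(
            "lastMessageIds",
            lastMessageIds,
            null,                          // no cache dependency
            Cache.NoAbsoluteExpiration,
            TimeSpan.FromSeconds(30));     // sliding expiration
    }

    // May return null if the entry expired or was evicted; fall back to the database then.
    public static Dictionary<long, long> GetLastMessageIds()
    {
        return HttpRuntime.Cache["lastMessageIds"] as Dictionary<long, long>;
    }
}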
Overall, though, I recommend you do NOT bother with caching until you are more comfortable with web programming. As with any type of globally shared data, it can cause unpredictable issues that are difficult to diagnose if misused.
So far, there is no easy way to communicate between processes. (And maybe this is good, for the sake of isolation and scaling.) For example, this is mentioned explicitly here: ASP.Net static objects
When you really need a web application/service to remember some state in memory, and NOT IN THE DATABASE, you have the following options:
1. Set the maximum worker process count to 1. This requires moving this piece of code to a separate web application. If you make it a separate subdomain, you will run into cross-domain (same-origin) issues when accessing it from JS.
2. Remoting/WCF - you can host the critical data in a remoting application and access it from the web application.
3. Store the data in every process and synchronize changes via memcached. Memcached doesn't hold the actual data, because it would take too long to transfer it; it only holds the last-changed date for each collection (sketched below).
With #3 I am able to serve more than 100 pages per second from a single server.
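A sketch of option #3, using the Enyim memcached client purely as an example (its servers are assumed to be configured in web.config); MemoryChatRoom is the type from the question and LoadFromSql is a placeholder:

using System;
using System.Collections.Generic;
using Enyim.Caching;
using Enyim.Caching.Memcached;

// Each worker process keeps its own in-memory copy and uses memcached only
// to learn whether that copy is stale; the actual data never goes through memcached.
public static class ChatRoomStore
{
    private static readonly MemcachedClient _memcached = new MemcachedClient();
    private static Dictionary<long, MemoryChatRoom> _local = new Dictionary<long, MemoryChatRoom>();
    private static DateTime _localVersion = DateTime.MinValue;

    public static Dictionary<long, MemoryChatRoom> Get()
    {
        // Shared "last changed" stamp for the collection.
        DateTime sharedVersion = _memcached.Get<DateTime>("chatrooms-version");
        if (sharedVersion > _localVersion)
        {
            _local = LoadFromSql();   // placeholder for the real SQL load
            _localVersion = sharedVersion;
        }
        return _local;
    }

    public static void MarkChanged()
    {
        // Any process that modifies the data bumps the shared stamp.
        _memcached.Store(StoreMode.Set, "chatrooms-version", DateTime.UtcNow);
    }

    private static Dictionary<long, MemoryChatRoom> LoadFromSql()
    {
        return new Dictionary<long, MemoryChatRoom>();
    }
}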