I need to build a proxy (maybe a bad description) that receives an XML file from a 3rd party, saves it, sends it on to another 3rd party, gets the response back and passes that back to the original 3rd party. Let's call that entire process a "unit".
Should I use a webservice? A Generic Handler? Something else?
I might have to do 20 "units" per second, but I know that each "unit" may span 30 seconds to a minute, so really, I mean that I need to be able to have 1200 of these "units" running at the same time, in all varying stages of the process that I described above.
As far as the file saving goes, I eventually want to put this into a database, but I would imagine that writing the file is quicker than actually saving the data into a database, so I'll just have another, less time-critical process grab the files and insert them into the DB at its own convenience.
The "app" will only consist of 1 page and it will be running under SSL. This will likely be the only thing on this server at any given time to ensure that this little process is not a bottleneck.
What in .Net would be a good (fast and scalable) way to go about this? I don't have any effective limit on what I would need as far as hardware goes -- so I can get a screaming machine if it would guarantee no bottlenecks.
Since webservices are based on XML, you need to consider the fact that you could end up with "XML inside XML". But apart from that I'd say using webservices is a good way to go, mostly because it is compatible, easy to use and easy to understand (for future maintainers).
There are, however, alternatives that use less CPU/memory/bandwidth. WCF provides several models to solve this, both in terms of hosting (under IIS or as a stand-alone process) and transfer type.
Personally I'm a fan of plain old binary transfer through TCP. REST could be one way to go as it is compatible (frontend proxy/caching for instance) and essentially gives you a binary transfer with little overhead.
I also like to leave the dirty work to IIS, so I avoid stand-alone WCF apps. I assume IIS is faster and more stable than what I can do easily.
Maybe my question on high concurrent load can be of help.
I would write a WCF service, use REST to simplify its URLs, and set the WCF service to run as a singleton so that your memory doesn't get out of control.
Good article on WCF: http://www.c-sharpcorner.com/UploadFile/sridhar_subra/116/
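A minimal sketch of what that could look like, hosted in IIS with webHttpBinding (the service name, URI template, and the omitted forwarding logic are illustrative, not from the question):

using System;
using System.IO;
using System.ServiceModel;
using System.ServiceModel.Web;

[ServiceContract]
public interface IUnitProxy
{
    [OperationContract]
    [WebInvoke(Method = "POST", UriTemplate = "unit")]
    Stream ProcessUnit(Stream incomingXml);
}

// One service instance handles all calls; WCF lets them run concurrently.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
                 ConcurrencyMode = ConcurrencyMode.Multiple)]
public class UnitProxy : IUnitProxy
{
    public Stream ProcessUnit(Stream incomingXml)
    {
        // 1. save the raw XML to disk (a later, non-critical job moves it into the DB)
        // 2. forward it to the downstream third party and wait for the reply
        // 3. stream that reply back to the original caller
        throw new NotImplementedException("forwarding logic omitted");
    }
}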
I am working on a project and implementing the search functionality.
I have a text box and auto-suggestion will be implemented on it.
I have two ways to go.
Make a single call to the DB and filter the auto-suggest list on the client, or
Make multiple calls to the DB and update the auto-suggest list using AJAX.
What is the best solution performance-wise, and why?
It depends on how "heavy" both approaches are from the database perspective and how fast the auto-suggest response should be. A well-behaved application built on the connection-pool pattern should not take too many resources for the second approach; however, this way network traffic and latency come into play. On the other hand, the first approach might take more resources.
So I would recommend testing it out in real conditions using a load-testing tool like Apache JMeter, producing the same load against the two implementations and measuring which one works faster and consumes fewer resources. See The Real Secret to Building a Database Test Plan With JMeter to get familiarized with the database load-testing concept.
I am working on an application at the moment whose caching strategy is reading and writing data to text files in a read/write directory within the application.
My gut reaction is that this is sooooo wrong.
I am of the opinion that these values should be stored in the ASP.NET Cache or another dedicated in-memory cache such as Redis or something similar.
Can you provide any data to back up my belief that writing to and reading from text files as a form of cache on the webserver is the wrong thing to do? Or provide any data to prove me wrong and show that this is the correct thing to do?
What other options would you provide to implement this caching?
EDIT:
In one example, a complex search is performed based on a keyword. The result from this search is a list of Guids. This is then turned into a concatenated, comma-delimited string, usually less than 100,000 characters. This is then written to a file using that keyword as its name, so that other requests using this keyword will not need to perform the complex search. There is an expiry - I think three days or something, but I don't think it needs to (or should) be that long.
I would normally use the ASP.NET Server Cache to store this data.
I can think of four reasons:
Web servers are likely to have many concurrent requests. While you can write logic that manages file locking (mutexes, volatile objects), implementing that is a pain and requires abstraction (an interface) if you plan to be able to refactor it in the future--which you will want to do, because eventually the demand on the filesystem resource will be heavier than what can be addressed in a multithreaded context.
Speaking of which, unless you implement paging, you will be reading and writing the entire file every time you access it. That's slow. Even paging is slow compared to an in-memory operation. Compare what you think you can get out of the disks you're using with the Redis benchmarks from New Relic. Feel free to perform your own calculation based on the estimated size of the file and the number of threads waiting to write to it. You will never match an in-memory cache.
Moreover, as previously mentioned, asynchronous filesystem operations have to be managed while waiting for synchronous I/O operations to complete. Meanwhile, you will not have data consistent with the operations the web application executes unless you make the application wait. The only way I know of to fix that problem is to write to and read from a managed system that's fast enough to keep up with the requests coming in, so that the state of your cache will almost always reflect the latest changes.
Finally, since you are talking about a text file, and not a database, you will either be determining your own object notation for key-value pairs, or using some prefabricated format such as JSON or XML. Either way, it only takes one failed operation or one improperly formatted addition to render the entire text file unreadable. Then you either have the option of restoring from backup (assuming you implement version control...) and losing a ton of data, or throwing away the data and starting over. If the data isn't important to you anyway, then there's no reason to use the disk. If the point of keeping things on disk is to keep them around for posterity, you should be using a database. If having a relational database is less important than speed, you can use a NoSQL context such as MongoDB.
In short, by using the filesystem and text, you have to reinvent the wheel more times than anyone who isn't a complete masochist would enjoy.
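For comparison, here is roughly what the in-memory version of that keyword cache could look like with the ASP.NET cache. RunComplexSearch is a hypothetical stand-in for the expensive search from the question, and the 3-day expiry simply mirrors what the files use today:

using System;
using System.Web;
using System.Web.Caching;

static string GetSearchResultCsv(string keyword)
{
    string cacheKey = "search:" + keyword;
    var csv = HttpRuntime.Cache[cacheKey] as string;
    if (csv == null)
    {
        // RunComplexSearch is hypothetical: the expensive keyword search from the question
        csv = string.Join(",", RunComplexSearch(keyword));
        HttpRuntime.Cache.Insert(cacheKey, csv, null,
            DateTime.UtcNow.AddDays(3),        // same 3-day window the files use today
            Cache.NoSlidingExpiration);
    }
    return csv;
}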
I have a couple of ActionMethods that return content from the database that does not change very often (e.g. a polygon list of available ZIP areas, returned as JSON; it changes twice per year).
I know there is the [OutputCache(...)] attribute, but it has some disadvantages (long client-side caching is not good; if the server/IIS/process gets restarted, the server-side cache is lost too).
What I want is for MVC to store the result in the file system, calculate the hash, and if the hash hasn't changed, return HTTP status code 304 --> like it is done with images by default.
Does anybody know a solution for that?
I think it's a bad idea to try to cache data on the file system because:
It is not going to be much faster to read your data from the file system than to get it from the database, even if you already have it in JSON format.
You are going to add a lot of logic to calculate and compare the hash, and more to read data from a file. That means new bugs and more complexity.
If I were you I would keep it as simple as possible. Store your data in the Application container. Yes, you will have to reload it every time the application starts, but that should not be a problem at all, as the application is not supposed to restart often. Also consider using a distributed cache like AppFabric if you have a web farm, so you don't end up with different data in the Application containers on different servers.
And one more important note: caching means really fast access, and you can't achieve that with file system or database storage; it is in-memory storage you should consider.
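A rough sketch of that Application-container approach for the ZIP-area action (the controller, key, and helper names are invented for illustration):

using System.Web.Mvc;

public class ZipAreaController : Controller
{
    public ActionResult Polygons()
    {
        // loaded once per application lifetime; a restart simply triggers a reload
        var json = HttpContext.Application["ZipAreaPolygons"] as string;
        if (json == null)
        {
            json = LoadZipPolygonsAsJson();   // hypothetical DB call that builds the JSON
            HttpContext.Application["ZipAreaPolygons"] = json;
        }
        return Content(json, "application/json");
    }
}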
In order to improve the speed of a chat application, I am remembering the last message id in a static variable (actually, a Dictionary).
However, it seems that every thread has its own copy, because users do not get updates in production (single-server environment).
private static Dictionary<long, MemoryChatRoom> _chatRooms = new Dictionary<long, MemoryChatRoom>();
No [ThreadStatic] attribute is used...
What is a fast way to share a few ints across all application processes?
Update:
I know that the web must be stateless. However, for every rule there is an exception. Currently all data is stored in MS SQL, and in this particular case a piece of shared memory will increase performance dramatically and avoid SQL requests for nothing.
I have not used statics for years, so I even missed the moment when there started to be multiple instances in the same application.
So, the question is: what is the simplest way to share in-memory objects between processes? For now, my workaround is Remoting, but there is a lot of extra code and I am not 100% sure of the stability of this approach.
I'm assuming you're new to web programming. One of the key differences between a web application and a regular console or Windows Forms application is that it is stateless. This means that every page request is basically initialised from scratch. You're using the database to maintain state, but as you're discovering this is fairly slow. Fortunately you have other options.
If you want to remember something frequently accessed on a per-user basis (say, their username) then you could use session. I recommend reading up on session state here. Be careful, however, not to abuse the session object -- since each user has his or her own copy of session, it can easily use a lot of RAM and cause you more performance problems than your database ever did.
If you want to cache information that's relevant across all users of your apps, ASP.NET provides a framework for data caching. The simplest way to use this is like a dictionary, eg:
Cache["item"] = "Some cached data";
I recommend reading in detail about the various options for caching in ASP.NET here.
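For the chat-room dictionary from the question, a minimal sketch of that cache usage with a sliding expiration might look like this (the key name and timeout are arbitrary, and MemoryChatRoom is the question's own type); note that, like a static field, the ASP.NET cache is still per worker process:

using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

static Dictionary<long, MemoryChatRoom> GetChatRooms()
{
    var rooms = HttpRuntime.Cache["chatRooms"] as Dictionary<long, MemoryChatRoom>;
    if (rooms == null)
    {
        rooms = new Dictionary<long, MemoryChatRoom>();
        HttpRuntime.Cache.Insert("chatRooms", rooms, null,
            Cache.NoAbsoluteExpiration,
            TimeSpan.FromMinutes(20));   // drops out after 20 minutes without access
    }
    return rooms;
}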
Overall, though, I recommend you do NOT bother with caching until you are more comfortable with web programming. As with any type of globally shared data, it can cause unpredictable issues which are difficult to diagnose if misused.
So far, there is no easy way to communicate between processes. (And maybe this is good, for the sake of isolation and scaling.) For example, this is mentioned explicitly here: ASP.Net static objects
When you really need a web application/service to remember some state in memory, and NOT IN A DATABASE, you have the following options:
1. Set the Maximum Worker Processes to 1 on the application pool. This requires moving this piece of code into a separate web application, and if you make that a separate subdomain you will run into cross-domain scripting issues when accessing it from JS.
2. Remoting/WCF - you can host the critical data in a Remoting application and access it from the web application.
3. Store the data in every process and synchronize changes via memcached. Memcached doesn't hold the actual data, because it would take too long to transfer it; it holds only the last-changed date for each collection. See the sketch after this list.
With #3 I am able to achieve more than 100 pages per second from a single server.
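A sketch of how option #3 could be structured; ICacheClient here is a hypothetical thin wrapper around whatever memcached client library is used, and LoadRoomsFromDatabase stands in for the MS SQL reload:

using System;
using System.Collections.Generic;

// Hypothetical wrapper over the memcached client: only a timestamp goes over the wire.
public interface ICacheClient
{
    DateTime? GetLastChanged(string collectionKey);
    void SetLastChanged(string collectionKey, DateTime stamp);
}

public class SyncedRoomStore
{
    private static Dictionary<long, MemoryChatRoom> _local = new Dictionary<long, MemoryChatRoom>();
    private static DateTime _localStamp = DateTime.MinValue;
    private readonly ICacheClient _cache;

    public SyncedRoomStore(ICacheClient cache) { _cache = cache; }

    public Dictionary<long, MemoryChatRoom> GetRooms()
    {
        // reload the local copy only when another process has marked the collection changed
        var remoteStamp = _cache.GetLastChanged("chatRooms") ?? DateTime.MinValue;
        if (remoteStamp > _localStamp)
        {
            _local = LoadRoomsFromDatabase();   // hypothetical reload from MS SQL
            _localStamp = remoteStamp;
        }
        return _local;
    }

    public void MarkChanged()
    {
        _localStamp = DateTime.UtcNow;
        _cache.SetLastChanged("chatRooms", _localStamp);
    }
}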
Background:
Enterprise application - very well written for its time, in 2004.
Stack:
.NET, Heavy use of Remoting, ASMX style web services, SQL Server
Problem:
The application allows users to go through various "wizards", for lack of a better term. All of their actions are stored in what we call "wiz state", which is essentially XML that is persisted to a SQL Server database very frequently, because we allow users to pause/resume their application. Often in these wizards the XML that comprises the wizard state grows very large - I'm talking 5-8 MB of data - and we noticed that when we had a sudden influx of simultaneous users, we started receiving occasional timeouts against the database, because a lot of what the wizard state consists of is keeping track of collections of "things". Sometimes these custom collections grow very large.
Question:
We were in a meeting today and we're expecting a flurry of activity in October that will test the system like never before, and possibly result in huge wizard states that go back and forth from the web server to the database. The crux of the situation is that there is only one database and one web server.
For argument's sake, because of the complexity of the application, let's say adding any kind of clustering/mirroring to increase database throughput is out of the question. I spoke up in the meeting and said the quickest way to address this in the shortest time period would be to add more servers to the front-end web application so the load could be distributed amongst web servers. The development lead said I was completely wrong and it would have no effect because we only have one database, so adding more web power would do nothing. He is having one of the other developers reduce the XML bloat that we persist frequently to the database. In the long run, reducing the size of the XML that we pass back and forth is probably the right idea, but will adding additional web servers truly have no effect? In terms of simultaneous users, I would think it should help.
Any responses or thoughts are appreciated; proof that more web servers would help would be a pure win.
Thanks.
EDIT: We use binary serialization to store the XML in the database in an image field.
I haven't heard anything about locating the "bottlenecks". Isn't that the first thing to do? Here's the method I use.
Otherwise you're just investing in guesses. That won't work.
I've been in meetings like that, where everybody gets excited throwing ideas around, and "management" wants to make "decisions", but it's the blind leading the blind. Knuckle down and find out what's going on. You can't do that in meetings.
Some time ago I looked at a performance problem with some similarity to yours. The biggest "bottleneck" was in writing and parsing XML, with attendant memory allocation, setup, and destruction. Then there were others as well. You might find the same thing, or something different.
P.S. I keep quoting "bottleneck" because all the performance problems I've found have been nothing at all like the necks of bottles. Rather they are like way over-bushy call trees that need radical pruning, such as making and reading mountains of XML for no good reason.
If the rate at which the data is written by SQL is the bottleneck, feeding data to SQL more quickly should have no effect.
I am not sure exactly what the data structure is, but perhaps compressing the XML data on the web server(s) before writing may have a positive effect.
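If the structure allows it, compressing before the write is only a few lines with GZipStream; a rough sketch, with the database and connection handling left out:

using System.IO;
using System.IO.Compression;
using System.Text;

static byte[] CompressWizardState(string xml)
{
    var raw = Encoding.UTF8.GetBytes(xml);
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            gzip.Write(raw, 0, raw.Length);
        }
        // the GZipStream is closed before ToArray, so the compressed data is fully flushed
        return output.ToArray();
    }
}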
If the bottleneck is the database, then more web services will not help you a lot.
The problem may be not only the size of the data, but the number of concurrent requests to the same table. The number of writes will be the big problem. If your XML write is in a transaction with other queries, you may try to break the XML write out of that transaction to reduce locking time on the XML table.
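A hypothetical restructuring along those lines (the table, column, and parameter names are invented):

using System.Data.SqlClient;
using System.Transactions;

static void SaveWizardState(string connectionString, long wizardId, byte[] stateBlob)
{
    // the other wizard queries keep their own transaction
    using (var scope = new TransactionScope())
    {
        // ... existing wizard queries ...
        scope.Complete();
    }

    // the big blob is written on its own, so locks on the state table are held only briefly
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "UPDATE WizardState SET StateBlob = @blob WHERE WizardId = @id", conn))
    {
        cmd.Parameters.AddWithValue("@blob", stateBlob);
        cmd.Parameters.AddWithValue("@id", wizardId);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}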
As stated by vdeych you may try compression to reduce the data size. (That would increase the load on the web servers.)
You may also try caching the data. Only read from the SQL server if the data is not already in the cache. Make sure you don't update the SQL server if your data has not changed.
No one seems to have suggested this: what about replacing the XML serialization of your wizard state with JSON serialization?
Not only should this give you a minor boost in performance of the serialization itself, since both the DataContractSerializer (faster) and Newtonsoft Json.NET (fastest) outperform the XML serializers in .NET; it should also easily reduce the size of your object graph by upwards of 50% or more (depending on the ratio of properties to large strings in the XML).
This should dramatically lower the IO inflicted upon SQL Server. It should also limit how much of the application you have to change (assuming it's well designed and funnels serialization/deserialization through common calls).
If you choose to go this route, also invest time comparing BSON vs JSON, as I think it is likely that the binary encoding will offer even more space savings (and further IO reduction) given the size of your object graphs.
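One way to check whether the switch is worth it is to serialize a representative wizard state both ways and compare sizes; a sketch assuming Json.NET is referenced (BsonWriter ships with older Json.NET versions; newer packages call it BsonDataWriter):

using System;
using System.IO;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;

static void CompareSerializedSizes(object wizardState)
{
    // JSON
    var jsonBytes = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(wizardState));

    // BSON
    byte[] bsonBytes;
    using (var ms = new MemoryStream())
    using (var writer = new BsonWriter(ms))
    {
        new JsonSerializer().Serialize(writer, wizardState);
        writer.Flush();
        bsonBytes = ms.ToArray();
    }

    Console.WriteLine("JSON: {0} bytes, BSON: {1} bytes", jsonBytes.Length, bsonBytes.Length);
}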
I'm not a .NET expert, but maybe using binary serialization would increase throughput. Make sure the XML isn't stored as text (fairly obvious, but I thought I'd mention it). Also, relational databases are best for storing relational data, so perhaps substituting an ORM layer in place of the serialization (it sounds feasible) could speed things up.
Mike is spot on: without understanding the resource constraint leading to the performance issues, no amount of discussion will resolve the problem. I'll add that socket timeouts that affect running statements are a symptom and are never imposed by SQL Server; they're an artifact of your driver configuration, or of a firewall or similar device between app and DB imposing them (unless you're talking about timeouts for new connections, in which case you have a host in serious distress under load).
Given your symptom is database timeouts, you need to start there. If they're indicative of long-running statements that result in a socket timeout, use SQL Server Profiler to capture the workload while simultaneously monitoring system resources. Given it's a mature application and the type of workload you mention, it's unlikely to be statement-tuning related; it probably boils down to resource limitations: CPU, memory, or disk IO capacity.
This Technet guide is a very good place to start:
http://technet.microsoft.com/en-us/library/cc966540.aspx
If it's resource contention, then it's a simple discussion about how the resource contention can be tuned, configured for or addressed by adding more of whatever is needed.
Edit: I should add that, given a database performance issue, more application servers are likely to worsen the problem, as you increase the amount of concurrency that might otherwise be kept in check by connection pool, request processing, or other limits.