I am working on an application at the moment whose caching strategy is to read and write data to text files in a read/write directory within the application.
My gut reaction is that this is sooooo wrong.
I am of the opinion that these values should be stored in the ASP.NET Cache or another dedicated in-memory cache such as Redis or something similar.
Can you provide any data to back up my belief that writing to and reading from text files as a form of cache on the webserver is the wrong thing to do? Or provide any data to prove me wrong and show that this is the correct thing to do?
What other options would you provide to implement this caching?
EDIT:
In one example, a complex search is performed based on a keyword. The result from this search is a list of Guids. This is then turned into a concatenated, comma-delimited string, usually less than 100,000 characters. This is then written to a file using that keyword as its name, so that other requests using this keyword will not need to perform the complex search. There is an expiry - I think three days or something - but I don't think it needs to (or should) be that long.
I would normally use the ASP.NET Server Cache to store this data.
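For illustration, here is a rough sketch of what I mean (performComplexSearch is a stand-in for the real search, and the three-day expiry just mirrors the current file expiry):

```csharp
using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static class KeywordSearchCache
{
    // Returns the cached Guid list for a keyword, running the expensive
    // search only on a cache miss.
    public static IList<Guid> GetResults(
        string keyword, Func<string, IList<Guid>> performComplexSearch)
    {
        string cacheKey = "search:" + keyword;
        var cached = HttpRuntime.Cache[cacheKey] as IList<Guid>;
        if (cached != null)
            return cached;

        IList<Guid> results = performComplexSearch(keyword);

        HttpRuntime.Cache.Insert(
            cacheKey,
            results,
            null,                        // no dependency
            DateTime.UtcNow.AddDays(3),  // mirrors the current file expiry
            Cache.NoSlidingExpiration);

        return results;
    }
}
```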
I can think of four reasons:
Web servers are likely to have many concurrent requests. While you can write logic that manages file locking (mutexes, volatile objects), implementing that is a pain and requires abstraction (an interface) if you plan to be able to refactor it in the future--which you will want to do, because eventually the demand on the filesystem resource will be heavier than what can be addressed in a multithreaded context.
Speaking of which, unless you implement paging, you will be reading and writing the entire file every time you access it. That's slow. Even paging is slow compared to an in-memory operation. Compare what you think you can get out of the disks you're using with the Redis benchmarks from New Relic. Feel free to perform your own calculation based on the estimated size of the file and the number of threads waiting to write to it. You will never match an in-memory cache.
Moreover, as previously mentioned, asynchronous filesystem operations have to be managed while waiting for synchronous I/O operations to complete. Meanwhile, you will not have data consistent with the operations the web application executes unless you make the application wait. The only way I know of to fix that problem is to write to and read from a managed system that's fast enough to keep up with the requests coming in, so that the state of your cache will almost always reflect the latest changes.
Finally, since you are talking about a text file, and not a database, you will either be determining your own object notation for key-value pairs, or using some prefabricated format such as JSON or XML. Either way, it only takes one failed operation or one improperly formatted addition to render the entire text file unreadable. Then you either have the option of restoring from backup (assuming you implement version control...) and losing a ton of data, or throwing away the data and starting over. If the data isn't important to you anyway, then there's no reason to use the disk. If the point of keeping things on disk is to keep them around for posterity, you should be using a database. If having a relational database is less important than speed, you can use a NoSQL context such as MongoDB.
In short, by using the filesystem and text, you have to reinvent the wheel more times than anyone who isn't a complete masochist would enjoy.
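To make the comparison concrete, here is a minimal sketch of the in-memory route using the StackExchange.Redis client (my choice of client and connection string, not something given in the question):

```csharp
using System;
using StackExchange.Redis;

public static class SearchCache
{
    private static readonly ConnectionMultiplexer Redis =
        ConnectionMultiplexer.Connect("localhost");

    // Get-or-add with a TTL: no file locking, no hand-rolled formats,
    // and a single corrupt write can't take the whole cache down.
    public static string GetOrAdd(string keyword, Func<string> runSearch)
    {
        IDatabase db = Redis.GetDatabase();
        RedisValue cached = db.StringGet(keyword);
        if (cached.HasValue)
            return cached;

        string result = runSearch(); // the expensive search
        db.StringSet(keyword, result, TimeSpan.FromDays(3));
        return result;
    }
}
```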
Related
I have a couple of ActionMethods that return content from the database that does not change very often (e.g. a polygon list of available ZIP areas, returned as JSON; it changes twice per year).
I know there is the [OutputCache(...)] attribute, but it has some disadvantages (long client-side caching is not good, and if the server/IIS/process gets restarted, the server-side cache is gone as well).
What I want is for MVC to store the result in the file system, calculate its hash, and, if the hash hasn't changed, return HTTP status code 304, just like it is done with images by default.
Does anybody know a solution for that?
I think it's a bad idea to try to cache data on the file system because:
It is not going to be much faster to read your data from the file system than to get it from the database, even if you already have it in JSON format.
You are going to add a lot of logic to calculate and compare the hash, plus more to read the data from a file. That means new bugs and more complexity.
If I were you, I would keep it as simple as possible. Store your data in the Application container. Yes, you will have to reload it every time the application starts, but that should not be a problem at all, as the application is not supposed to restart often. Also consider using a distributed cache like AppFabric if you have a web farm, so you don't end up with different data in the Application containers on different servers.
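A bare-bones sketch of that setup (LoadZipAreasJson is a hypothetical stand-in for your real data access):

```csharp
using System.Web;
using System.Web.Mvc;

public class MvcApplication : HttpApplication
{
    protected void Application_Start()
    {
        // Load once at startup, serve from memory afterwards.
        Application["ZipAreasJson"] = LoadZipAreasJson();
    }

    private static string LoadZipAreasJson()
    {
        return "[]"; // placeholder: query the DB and serialize once here
    }
}

public class ZipAreasController : Controller
{
    public ContentResult Index()
    {
        var json = (string)HttpContext.Application["ZipAreasJson"];
        return Content(json, "application/json");
    }
}
```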
And one more important note: caching means really fast access, and you can't achieve that with file system or database storage; it is in-memory storage you should be considering.
I am facing a situation where I have to handle a very heavy traffic load while keeping performance high at the same time. Here is my scenario; please read it and advise me with your valuable opinion.
I am going to have three-way communication between my server, client, and visitor. When a visitor visits my client's website, he will be detected and sent to an intermediate Rule Engine that performs some tasks and outputs a filtered list of the different visitors to my server. On the other side, I have a client who will access those lists. My initial idea was to have a Web Service on my server that acts as the Rule Engine and outputs the resulting lists on an ASPX page. But this seems inefficient, because there will be huge traffic coming in and the clients will be continuously requesting data from those lists, so it will be a performance overhead. Kindly suggest what approach I should take so that no deadlock happens and things work smoothly. I also considered writing to and fetching from an XML file, but that is also not a very good approach in my case.
NOTE: Please remember that no DB will be involved initially; all work will stay outside the DB.
Wow, storing data efficiently without a database will be tricky. What you can possibly consider is the following:
Store the visitor data in an object list of some sort and keep it in the application cache on the server.
Periodically flush this list (say after 100 items) to a file on the server - possibly storing it in XML for ease of access (you can associate a schema with it as well to make sure you always get the same structure you need). You can perform this file-writing asynchronously so as to avoid keeping the thread locked while writing the file.
The Web Service sounds like a good idea - make it feed off the XML file. Possibly consider breaking the XML file up into several files as well. You can even cache the contents of this file separately so the service feeds off the cached data for added performance benefits...
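A rough sketch of the first two points, with Visitor and the file layout as assumptions on my part:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using System.Xml.Serialization;

public class Visitor
{
    public string Id { get; set; }
    public DateTime SeenAt { get; set; }
}

public class VisitorBuffer
{
    private readonly object _sync = new object();
    private List<Visitor> _buffer = new List<Visitor>();

    // Called per request: cheap in-memory add; a flush happens at 100 items.
    public void Add(Visitor v)
    {
        List<Visitor> toFlush = null;
        lock (_sync)
        {
            _buffer.Add(v);
            if (_buffer.Count >= 100)
            {
                toFlush = _buffer;
                _buffer = new List<Visitor>();
            }
        }
        if (toFlush != null)
            Task.Run(() => WriteToXml(toFlush)); // async: request thread is not blocked
    }

    private static void WriteToXml(List<Visitor> visitors)
    {
        // One file per batch avoids rewriting a single ever-growing file.
        string path = Path.Combine("App_Data", "visitors-" + Guid.NewGuid() + ".xml");
        var serializer = new XmlSerializer(typeof(List<Visitor>));
        using (var stream = File.Create(path))
            serializer.Serialize(stream, visitors);
    }
}
```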
Background:
Enterprise application - very well written for its time in 2004.
Stack:
.NET, Heavy use of Remoting, ASMX style web services, SQL Server
Problem:
The application allows users to go through various wizards, for lack of a better term; all of their actions are stored in what we call "wiz state", which is essentially XML that is persisted to a SQL Server database very frequently, because we allow users to pause/resume their application. Often in these wizards, the XML that comprises the wizard state grows very large - I'm talking 5-8 MB of data - and we noticed that when we had a sudden influx of simultaneous users, we started receiving occasional timeouts against the database, because a lot of what the wizard state comprises is collections of "things". Sometimes these custom collections grow very large.
Question:
We were in a meeting today and we're expecting a flurry of activity in October that will test the system like never before, and possibly result in huge wizard states that go back and forth from the web server to the database. The crux of the situation is that there is only one database and one web server.
For argument's sake, because of the complexity of the application, let's say adding any kind of clustering/mirroring to increase database throughput is out of the question. I spoke up in the meeting and said the quickest way to address this in the shortest time period would be to add more servers to the front-end web application so the load could be distributed amongst web servers. The development lead said I was completely wrong and it would have no effect because we only have one database, so adding more web power would do nothing. He is having one of the other developers reduce the XML bloat that we persist frequently to the database. In the long run, reducing the size of the XML that we pass back and forth is probably the right idea, but will adding additional web servers truly have no effect? In terms of simultaneous users, I just think it should help.
Any responses/thoughts are appreciated; proof that more web servers would help would be pure win.
Thanks.
EDIT: We use binary serialization to store the XML in the database in an image field.
I haven't heard anything about locating the "bottlenecks". Isn't that the first thing to do? Here's the method I use.
Otherwise you're just investing in guesses. That won't work.
I've been in meetings like that, where everybody gets excited throwing ideas around, and "management" wants to make "decisions", but it's the blind leading the blind. Knuckle down and find out what's going on. You can't do that in meetings.
Some time ago I looked at a performance problem with some similarity to yours. The biggest "bottleneck" was in writing and parsing XML, with attendant memory allocation, setup, and destruction. Then there were others as well. You might find the same thing, or something different.
P.S. I keep quoting "bottleneck" because all the performance problems I've found have been nothing at all like the necks of bottles. Rather they are like way over-bushy call trees that need radical pruning, such as making and reading mountains of XML for no good reason.
If the rate at which the data is written by SQL is the bottleneck, feeding data to SQL more quickly should have no effect.
I am not sure exactly what the data structure is, but perhaps compressing the XML data on the web server(s) before writing may have a positive effect.
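For example, a small sketch of compressing the serialized XML before it goes into the image column (GZipStream ships with the framework):

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class XmlCompressor
{
    // Compress the wizard-state XML before it is written to the image column.
    public static byte[] CompressXml(string xml)
    {
        byte[] raw = Encoding.UTF8.GetBytes(xml);
        using (var output = new MemoryStream())
        {
            // GZipStream must be closed to flush its final block.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
                gzip.Write(raw, 0, raw.Length);
            return output.ToArray(); // typically a fraction of the original for verbose XML
        }
    }
}
```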
If the bottleneck is the database, then more web services will not help you a lot.
The problem may be not only the size of the data, but the number of concurrent requests to the same table. The number of writes will be the big problem. If your XML write is in a transaction with other queries, you may try to break the XML write out of that transaction to reduce locking time on the XML table.
As stated by vdeych, you may try compression to reduce the data size. (That would increase the load on the web servers.)
You may also try caching the data. Only read from the SQL server if the data is not already in the cache. Make sure you don't update the SQL server if your data has not changed.
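A sketch of that last point - skip the round trip when a hash of the state hasn't changed (the persistence delegate is a placeholder):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public class WizardStateWriter
{
    private string _lastHash;

    // Persist only when the serialized state actually differs from what
    // was last written; persistToSql stands in for the real database call.
    public void SaveIfChanged(string stateXml, Action<string> persistToSql)
    {
        string hash;
        using (var sha = SHA256.Create())
            hash = Convert.ToBase64String(
                sha.ComputeHash(Encoding.UTF8.GetBytes(stateXml)));

        if (hash == _lastHash)
            return; // unchanged: skip the database round trip

        persistToSql(stateXml);
        _lastHash = hash;
    }
}
```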
No one seems to have suggested this: what about replacing the XML serialization of your wizard with JSON serialization?
Not only should this give you a minor performance boost in the serialization itself, since both the DataContractSerializer (faster) and Newtonsoft Json.NET (fastest) outperform the XML serializers in .NET, it should also easily reduce the size of your object graph by upwards of 50% (depending on the ratio of properties to large strings in the XML).
This should dramatically lower the IO inflicted on SQL Server. It should also limit how much of your application you need to alter (assuming it's well designed and works through common calls for serialization/deserialization).
If you choose to go this route also invest time comparing BSON vs JSON as I think it would be likely that the binary encoded one will offer even more space savings (and further IO reduction) due to the size of your object graphs.
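If you want to verify the size claim on your own data before committing, a quick sketch (Newtonsoft.Json is the Json.NET package mentioned above; wizardState stands in for your real object graph):

```csharp
using System;
using System.IO;
using System.Xml.Serialization;
using Newtonsoft.Json;

public static class SerializationSizeCheck
{
    // Serialize the same graph both ways and compare the payload sizes.
    public static void CompareSerializedSizes<T>(T wizardState)
    {
        string json = JsonConvert.SerializeObject(wizardState);

        string xml;
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, wizardState);
            xml = writer.ToString();
        }

        Console.WriteLine("XML:  {0:N0} chars", xml.Length);
        Console.WriteLine("JSON: {0:N0} chars", json.Length);
    }
}
```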
I'm not a .NET expert, but maybe using binary serialization would increase throughput. Make sure the XML isn't stored as text (fairly obvious, but I thought I'd mention it). Also, relational databases are best for storing relational data, so perhaps substituting an ORM layer in place of the serialization (sounds feasible) could speed things up.
Mike is spot on: without understanding the resource constraint behind the performance issues, no amount of discussion will resolve the problem. I'll add that socket timeouts that affect running statements are a symptom and are never imposed by SQL Server; they're an artifact of your driver configuration, or of a firewall or similar device between app and DB imposing them (unless you're talking about timeouts for new connections, in which case you have a host in serious distress under load).
Given your symptom is database timeouts, you need to start there. If they're indicative of long-running statements that result in a socket timeout, use SQL Server Profiler to capture the workload while simultaneously monitoring system resources. Given it's a mature application and the type of workload you mention, it's unlikely to be statement-tuning related; it probably boils down to resource limitations: CPU, memory, or disk IO capacity.
This Technet guide is a very good place to start:
http://technet.microsoft.com/en-us/library/cc966540.aspx
If it's resource contention, then it's a simple discussion about how the resource contention can be tuned, configured for or addressed by adding more of whatever is needed.
Edit: I should add that given a database performance issue, more applications servers is likely to worsen the problem as you increase the amount of concurrency, that might otherwise be kept in check by connection pool, request processing or other limits.
We are developing an ASP.NET HR Application that will make thousands of calls per user session to relatively static database tables (e.g. tax rates). The user cannot change this information, and changes made at the corporate office will happen ~once per day at most (and do not need to be immediately refreshed in the application).
About 2/3 of all database calls are to these static tables, so I am considering just moving them into a set of static objects that are loaded during application initialization and then refreshed every 24 hours (if the app has not restarted during that time). Total in-memory size would be about 5MB.
Am I making a mistake? What are the pitfalls to this approach?
From the info you present, it looks like you definitely should cache this data -- rarely changing and so often accessed. "Static" objects may be inappropriate, though: why not just access the DB whenever the cached data is, say, more than N hours old?
You can vary N at will, even if you don't need special freshness -- even hitting the DB 4 times or so per day will be much better than "thousands [of times] per user session"!
It may be best to keep, along with the DB info, a timestamp or datetime recording when it was last updated. That way, the "is my cache still fresh" check is typically very lightweight: just fetch that "latest update" value and compare it with the latest update on which you rebuilt the local cache. It's rather like an HTTP "if modified since" caching strategy, except you'd be implementing most of it DB-client-side ;-).
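A sketch of that check, with the two delegates standing in for the real "last updated" query and the full reload (no locking here, which you would want in a real web app):

```csharp
using System;

public class TaxRateCache
{
    private object _data;
    private DateTime _loadedAsOf;

    // Cheap staleness check first; full reload only when the DB says
    // something actually changed. Not thread-safe as written.
    public object Get(Func<DateTime> getLastUpdated, Func<object> loadFromDb)
    {
        DateTime lastUpdated = getLastUpdated(); // single-row timestamp query
        if (_data == null || lastUpdated > _loadedAsOf)
        {
            _data = loadFromDb();                // the expensive part
            _loadedAsOf = lastUpdated;
        }
        return _data;
    }
}
```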
If you decide to cache the data (vs. making a database call each time), use the ASP.NET Cache instead of statics. The ASP.NET Cache provides expiry functionality, handles multiple concurrent requests, and can even invalidate the cache automatically using the query-notification features of SQL Server 2005+.
If you use statics, you'll probably end up implementing those things anyway.
There are no drawbacks to using the ASP.NET Cache for this. In fact, it's designed for caching data too (see the SqlCacheDependency class http://msdn.microsoft.com/en-us/library/system.web.caching.sqlcachedependency.aspx).
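For example, a sketch using the polling-based SqlCacheDependency (the database entry name and table here are assumptions; they must match a sqlCacheDependency section in web.config and a table enabled via aspnet_regsql):

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class TaxRates
{
    // Cached read-through: the dependency evicts the entry when the
    // TaxRates table changes; the daily absolute expiry is belt-and-braces.
    public static object Get(Func<object> loadFromDb)
    {
        var cached = HttpRuntime.Cache["TaxRates"];
        if (cached != null)
            return cached;

        object rates = loadFromDb();
        HttpRuntime.Cache.Insert(
            "TaxRates",
            rates,
            new SqlCacheDependency("HrAppDb", "TaxRates"), // hypothetical entry/table names
            DateTime.UtcNow.AddHours(24),
            Cache.NoSlidingExpiration);
        return rates;
    }
}
```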
With caching, a dbms is plenty efficient with static data anyway, especially only 5M of it.
True, but the point here is to avoid the database roundtrip at all.
ASP.NET Cache is the right tool for this job.
You didn't state how you will be able to find the matching data for a user. If it is as simple as finding a foreign key in the cached set, then you don't have to worry.
If you implement some kind of filtering/sorting/paging - or, worse, searching - then you might at some point miss the querying capabilities of SQL.
ORMs often have their own querying, and LINQ makes things easy too, but it is still not SQL.
(Try to group by 2 columns.)
Sometimes a good approach is to have the DB return only the keys of a result set and use the Cache to fill in the complete objects.
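A sketch of that keys-from-SQL, objects-from-cache pattern (queryIds and loadById are hypothetical helpers, as is the Product type):

```csharp
using System;
using System.Collections.Generic;
using System.Web;

public class Product { public int Id; public string Name; }

public static class ProductSearch
{
    // SQL keeps its full querying power but returns only keys;
    // the heavyweight rows are hydrated from the cache.
    public static IList<Product> Search(string term,
        Func<string, IList<int>> queryIds,
        Func<int, Product> loadById)
    {
        var results = new List<Product>();
        foreach (int id in queryIds(term))
        {
            var p = HttpRuntime.Cache["product:" + id] as Product;
            if (p == null)
            {
                p = loadById(id); // cache miss: one-row fetch
                HttpRuntime.Cache["product:" + id] = p;
            }
            results.Add(p);
        }
        return results;
    }
}
```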
Think: Premature Optimization. You'll still need to deal with the data as tables eventually anyway, and you'd be leaving behind an "unusual design pattern".
Even with default caching, a DBMS is plenty efficient with static data anyway, especially only 5 MB of it. And the DBMS partitioning you're describing is often described as an antipattern. One example: multiple identical databases for multiple clients. There are other questions here on SO about this pattern. I understand there are security issues, but doing it this way creates other security issues. I've recently seen this same concept in a medical billing database (even more highly sensitive) that ultimately had to be refactored into a single database.
If you do this, then I suggest you at least wait until you know it's solving a real problem, and then test to measure how much difference it makes. There are lots of opportunities here for Unintended Consequences.
I have to port a small Windows Forms application (product configurator) to an ASP.NET app which will be used on a large company's website. Demand should be moderate, because it's for a specialized product line.
I don't have access to a database and using XML is a requirement from their web developers.
There are roughly 30 different products with roughly 300 different possible configurations stored in the xml files, and linked questions / answers that lead to a product recommendation. Also some production options. The app is available in 6 languages.
How would you solve the 'data access' layer, if you could call it that? I thought of reading/deserializing the XML files into their objects and storing them in ASP.NET's cache if they're not there already, then reading from the cache on subsequent requests. But that would mean all objects live in memory all day and night.
Is that even necessary, or smart, performance-wise? As I said before, the app is not that big and the XML files not that large. Could I just create some Repository class that reads the XML files whenever an object is requested (i.e. 'Product Details' or 'Next question') and returns it that way, to drive memory consumption down?
The whole approach seems to assume a single server. First consider whether this is appropriate: you mentioned a "large company's website", and that raises a red flag for me. If you need the site to scale, you will end up having more than a single server, which rules out a simple local file.
If you are constrained to using one, analyze what data is more appropriate to keep in cache (it does not change often, it's long-lived, the same info is requested many times). Try to keep the cached stuff separated from the non-cached, which will reduce the amount of info in the more dynamic files. If you expect big amounts of information, consider splitting the files in a way appropriate to your domain.
I use the Cache whenever I can. I cache objects upon their first request. If memory is of any concern, I set an expiration policy. And whether it is or not, when short on memory the framework will unload the cache anyway.
Since it is per application and not per user, it makes sense to have it, especially if the relative footprint is small.
If you have to expand to multiple servers later, you can access the same file over the network or modify the DA layer to retrieve data by any other means (services, DB, etc.). The caching code will stay the same and performance will be virtually unaffected.
If you set a dependency, the objects will always stay current.
I'm for it.
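For illustration, a minimal sketch of that dependency idea with one of the XML files (the path and Product type are assumptions):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Web;
using System.Web.Caching;
using System.Xml.Serialization;

public class Product { public int Id; public string Name; }

public static class ProductStore
{
    // Deserialize once, cache with a file dependency: editing the XML
    // file evicts the entry, so the next request reloads fresh data.
    public static List<Product> GetProducts(string xmlPath)
    {
        var cached = HttpRuntime.Cache["products"] as List<Product>;
        if (cached != null)
            return cached;

        List<Product> products;
        var serializer = new XmlSerializer(typeof(List<Product>));
        using (var stream = File.OpenRead(xmlPath))
            products = (List<Product>)serializer.Deserialize(stream);

        HttpRuntime.Cache.Insert(
            "products",
            products,
            new CacheDependency(xmlPath)); // file change invalidates the entry
        return products;
    }
}
```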
Using the cache, and setting an appropriate expiration policy as advised by others is a sound approach. I'd suggest you look at using LINQ to XML as the basis for your data access code as it is so much easier to use than traditional methods of querying XML. You can find a decent introduction here.
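As a taste of it, a minimal LINQ to XML sketch (the file shape in the comment is an assumption about your XML):

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ProductXmlQueries
{
    // Assumed file shape:
    // <products><product id="1"><name lang="en">Widget</name></product></products>
    public static string GetProductName(string xmlPath, string productId, string lang)
    {
        XDocument doc = XDocument.Load(xmlPath);
        return (from p in doc.Descendants("product")
                where (string)p.Attribute("id") == productId
                from n in p.Elements("name")
                where (string)n.Attribute("lang") == lang
                select n.Value).FirstOrDefault();
    }
}
```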