My system was mentioned in this question:
https://stackoverflow.com/questions/5633634/best-index-strategies-for-read-only-table
Because the data is read-only, and at specific times a part of it (50-200k rows, about 200 bytes each) is queried intensively, I think letting clients connect to the database and query it row by row is far too expensive. It would be a better choice to cache the intensively queried part of the data in RAM, which is much faster than SQL Server.
The problem is that the system I'm currently working on is a web service, so I'm not sure it allows large static data to be cached. Is my idea a good choice? And how can my data "survive" when IIS recycles the application pool?
Thank you very much.
Load the data into RAM in the Application_Start event. Then you don't need to worry about IIS restarts.
Here is an MS guide about Caching Data at Application Startup.
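As a minimal sketch of that approach (the LoadHotRows helper and row type are placeholders for your real data access), Global.asax can populate the ASP.NET cache once at startup, and a recycle simply runs it again:

    // Global.asax.cs
    using System;
    using System.Collections.Generic;
    using System.Web;
    using System.Web.Caching;

    public class Global : HttpApplication
    {
        protected void Application_Start(object sender, EventArgs e)
        {
            // Placeholder: pull the intensively queried rows from SQL Server.
            List<string> hotRows = LoadHotRows();

            // NotRemovable + no expiration keeps the data in memory for the
            // lifetime of the worker process; after a recycle,
            // Application_Start fires again and reloads it.
            HttpRuntime.Cache.Insert(
                "HotRows", hotRows, null,
                Cache.NoAbsoluteExpiration, Cache.NoSlidingExpiration,
                CacheItemPriority.NotRemovable, null);
        }

        private static List<string> LoadHotRows()
        {
            // Placeholder: replace with your real query.
            return new List<string>();
        }
    }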
My question is about aggregated data that needs fast access across several servers on Amazon EC2. In an ASP.NET application, I would probably store that data in an Application["somevar"] variable so it can be accessed quickly (in memory) by all users.
The problem starts when I want that aggregated data to be gathered and kept equal on all servers. If I deploy two servers, a user might be sending data to a different server on every request (the servers sit behind a load balancer or Elastic Beanstalk), so if, for example, I count the number of times a user asked for the page, each server's Application variable will hold a different value.
For example:
Server 1:
Application["counter1"] = 120
Server 2:
Application["counter1"] = 130
What I want is a variable that is the same on all servers. The reason I want the data in an Application-like variable is to keep it in memory for fast access; afterwards I might write that data to the database.
What I want to know is how I can achieve this. I thought about using Amazon ElastiCache: even with 10 servers under the load balancer, I can access the cache via its API, and no matter which server gets or sets the memcached variable, it is the same variable, so I can keep a cross-server global value.
I would like to know whether this is good practice and whether there is a better way to implement such a feature.
I am developing my application in ASP.NET C# with MySQL. Also take into consideration that some of the aggregated data should be written to the database; to avoid many simultaneous writes, I plan to buffer the data and flush it to the database only after it reaches, say, 20 writes.
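A rough sketch of that batching idea (the threshold and flush query are illustrative; note the crash and concurrency caveats raised in the answer below):

    using System.Threading;

    public static class BatchedCounter
    {
        private const int FlushThreshold = 20;  // illustrative threshold
        private static int _pending;

        public static void Increment()
        {
            // Count in memory; touch the database only every 20 increments.
            if (Interlocked.Increment(ref _pending) >= FlushThreshold)
            {
                int toFlush = Interlocked.Exchange(ref _pending, 0);
                FlushToDatabase(toFlush);
            }
        }

        private static void FlushToDatabase(int count)
        {
            // Placeholder: e.g. UPDATE counters SET value = value + @count
        }
    }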
Just to clear up a few things. First, let's make sure we understand how to use ElastiCache. The ElastiCache API doesn't give us any CRUD operations on the cache cluster; Amazon's API is strictly for managing the servers and their configuration. You will need a memcached library for .NET to connect to the cluster. Using a cache like memcached is a good solution for your first problem: it will easily and quickly store simple application variables in a distributed environment. Using a cache is generally good practice, even with smaller applications.
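As a sketch, assuming the Enyim Memcached client for .NET (the cluster endpoints come from web.config, and the key name is illustrative), a cross-server counter looks roughly like this:

    using Enyim.Caching;
    using Enyim.Caching.Memcached;

    public static class SharedCounter
    {
        // Reads its server list from the enyim.com/memcached config section;
        // point that at your ElastiCache node endpoints.
        private static readonly MemcachedClient Client = new MemcachedClient();

        public static ulong Increment()
        {
            // Increment is atomic on the memcached server, so every web
            // server behind the load balancer sees the same value.
            return Client.Increment("counter1", 1, 1);
        }

        public static void Set(string key, object value)
        {
            Client.Store(StoreMode.Set, key, value);
        }
    }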
I'm not sure how many users you have or expect to have, but one thing I've learned in my years of programming is that premature optimization is usually a bad idea: optimizing your code before it's really necessary. Take your proposed optimization, for example. We know that making 1 write to the database is quicker than making 20 writes, generally speaking. However, unless the database is the bottleneck in your application, implementing such a feature introduces a significant amount of complexity for no immediate benefit. If a memcached cluster server crashes, which it will, the data waiting to be written to the database is lost. And if you really do have a lot of users, you also have to start thinking about concurrency and locks on the memcached items.
Without knowing more about your application I can't make any real recommendations, except to say: make sure your optimizations are required before you spend time increasing the complexity of your application for nothing.
I have been reading that lots of people use Redis or another key-value store/NoSQL solution as a distributed cache for their website.
Maybe I'm not understanding completely, but it seems a solution like this only works for shared data. For example, if I have a website that requires a user to log in, and the queries they generate return data specific to only that user (in my case, banking/asset information) that can't be cached for all users, this type of solution doesn't work.
Unfortunately, the database is shared across all our applications, and when it gets bogged down, the website gets bogged down as well. Since each user has gigabytes of information, I obviously can't cache all of it, and each web page queries completely different information.
Is there some caching strategy that I can employ for this type of scenario?
A distributed cache like Velocity doesn't require that the data it stores be limited to "shared" data. But you do have to read the data from your DB and store it in the cache, which takes time.
A few alternatives:
Partition your data, so it's spread out among several DB servers
Add as much RAM as you can to each DB server, to allow SQL Server to cache what it can
There are many variations to the partitioning theme....
Is your web app load balanced? There are caching options at the web tier as well -- the ASP.NET object cache is a good place to start.
It's possible that your web clients are requesting the same data more than once (for a given user). So caching could give a benefit in that case.
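Per-user data can still go in the ASP.NET object cache; you just scope the key to the user and give it a short expiration. A minimal sketch (the key prefix, timeout, and LoadAssetSummary helper are placeholders):

    using System;
    using System.Web;
    using System.Web.Caching;

    public static class AssetCache
    {
        public static object GetAssetSummary(string userId)
        {
            string key = "AssetSummary:" + userId;   // user-scoped key
            object summary = HttpRuntime.Cache[key];
            if (summary == null)
            {
                summary = LoadAssetSummary(userId);  // placeholder DB call
                // A short absolute expiration keeps per-user entries from
                // piling up in memory indefinitely.
                HttpRuntime.Cache.Insert(
                    key, summary, null,
                    DateTime.UtcNow.AddMinutes(5), Cache.NoSlidingExpiration);
            }
            return summary;
        }

        private static object LoadAssetSummary(string userId)
        {
            // Placeholder: replace with the real query for this user.
            return new object();
        }
    }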
But before you go implementing a huge caching solution, you really need to look at the queries that are particularly slow or executed a huge number of times and see if you can optimize them in any way.
Then look at upgrading your DB machine.
I read a nice article about the performance issues MySpace had during their period of huge growth.
You can find the article here.
One quote from the article that stands out:
The addition of the cache servers is "something we should have done from the beginning, but we were growing too fast and didn't have time to sit down and do it," Benedetto adds.
If the problem is in your database server, think about partitioning your data and using a database farm to spread the load. Also think about SSDs! They can really speed up your database access.
Depending on how dynamic your data is, you could also consider fragment caching. This caches the rendered HTML of the page rather than the data, so if the volume of data is prohibitive to cache, it might work for you.
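In ASP.NET Web Forms, fragment caching means putting an OutputCache directive on a user control rather than on the page. A minimal sketch (the control name and duration are illustrative; VaryByCustom="user" assumes you override GetVaryByCustomString in Global.asax to return the current user's identity, so one user's cached HTML is never served to another):

    <%-- BillSummary.ascx: only this fragment's rendered HTML is cached. --%>
    <%@ Control Language="C#" %>
    <%@ OutputCache Duration="60" VaryByParam="none" VaryByCustom="user" %>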
I have inherited an ASP.NET 3.5 application that relies heavily on sessions and on storing DataTables in them (I know - bad, bad, bad). The application pool on the remote shared hosting service indicates that memory is at full capacity, and as a result customers are losing their shopping carts to dropped sessions.
Ultimately the goal is to rewrite this code, but for the time being I would like to stabilize the site as best I can. The host has recommended that I use SQL Server session state instead of in-proc. I have no experience with this, so I'm hoping it's as simple as running the .sql script against the database to configure SQL Server and then updating the web.config.
Any ideas? Thanks.
The docs say only that the session data has to be serializable. DataTables are in fact serializable (they have supported it since .NET 2.0), so the switch should work, but serializing large DataTables on every request adds real overhead: you may be trading a memory problem for a performance one.
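For reference, the setup the host describes is usually two steps; a sketch, assuming the standard ASPState database (server name and credentials are placeholders). First create the session database with the aspnet_regsql tool:

    aspnet_regsql.exe -S YourSqlServer -E -ssadd -sstype p

Then point web.config at it:

    <sessionState mode="SQLServer"
                  sqlConnectionString="Data Source=YourSqlServer;Integrated Security=SSPI;"
                  timeout="20" />

Remember that everything going into session must now survive serialization on every request.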
In order to improve the speed of a chat application, I am remembering the last message id in a static variable (actually, a Dictionary).
However, it seems that every thread has its own copy, because users do not see updates in production (a single-server environment).
private static Dictionary<long, MemoryChatRoom> _chatRooms = new Dictionary<long, MemoryChatRoom>();
No ThreadStaticAttribute is used...
What is a fast way to share a few ints across all application processes?
Update:
I know that the web must be stateless. However, for every rule there is an exception. Currently all data is stored in MS SQL, and in this particular case a small piece of shared memory would increase performance dramatically and avoid needless SQL requests.
I had not used statics for years, so I even missed the moment when a single application started running as multiple instances.
So, the question is: what is the simplest way to share in-memory objects between processes? For now my workaround is Remoting, but it needs a lot of extra code and I am not 100% sure about the stability of this approach.
I'm assuming you're new to web programming. One of the key differences between a web application and a regular console or Windows Forms application is that it is stateless: every page request is basically initialised from scratch. You're using the database to maintain state, but as you're discovering, this is fairly slow. Fortunately you have other options.
If you want to remember something frequently accessed on a per-user basis (say, their username), then you can use session. I recommend reading up on session state here. Be careful, however, not to abuse the session object: since each user has his or her own copy of session, it can easily use a lot of RAM and cause you more performance problems than your database ever did.
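For example, stashing and reading back a per-user value is just (key and value are illustrative):

    // Store once, e.g. after login:
    Session["UserName"] = "alice";

    // Read it back on any later request from the same user:
    string name = (string)Session["UserName"];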
If you want to cache information that's relevant across all users of your app, ASP.NET provides a framework for data caching. The simplest way to use it is like a dictionary, e.g.:
Cache["item"] = "Some cached data";
I recommend reading in detail about the various options for caching in ASP.NET here.
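Beyond the indexer, Cache.Insert gives you control over expiration. A small sketch (key and timeout are illustrative):

    using System;
    using System.Web;
    using System.Web.Caching;

    // Keep the item for at most 10 minutes; after that, readers see null
    // and can reload it from the database.
    HttpRuntime.Cache.Insert(
        "item", "Some cached data", null,
        DateTime.UtcNow.AddMinutes(10), Cache.NoSlidingExpiration);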
Overall, though, I recommend that you NOT bother with caching until you are more comfortable with web programming. As with any kind of globally shared data, it can cause unpredictable issues that are difficult to diagnose if misused.
So far there is no easy way to communicate between processes. (And maybe this is good, for isolation and scaling reasons.) For example, this is mentioned explicitly here: ASP.Net static objects
When you really need a web application/service to remember some state in memory, and NOT in the database, you have the following options:
1. Set the worker process count to 1 and move this piece of code to a separate web application. If you make it a separate subdomain, you will have cross-site scripting issues when accessing it from JS.
2. Remoting/WCF: host the critical data in a remoting application and access it from the web application.
3. Store the data in every process and synchronize changes via memcached. Memcached doesn't hold the actual data (transferring it would take too long), only the last-changed date for each collection.
With #3 I am able to serve more than 100 pages per second from a single server.
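A minimal sketch of option 3, assuming an Enyim-style memcached client (the key name, reload helper, and types are placeholders): each process keeps its own dictionary and only consults memcached for a cheap version stamp.

    using System;
    using System.Collections.Generic;
    using Enyim.Caching;
    using Enyim.Caching.Memcached;

    public static class ChatRoomCache
    {
        private static readonly MemcachedClient Client = new MemcachedClient();
        private static Dictionary<long, string> _chatRooms = new Dictionary<long, string>();
        private static string _localVersion = "";

        public static Dictionary<long, string> GetRooms()
        {
            // memcached stores only a version stamp, not the data itself.
            // (Locking omitted for brevity.)
            string sharedVersion = Client.Get<string>("chatrooms-version");
            if (sharedVersion != null && sharedVersion != _localVersion)
            {
                _chatRooms = ReloadFromDatabase();   // placeholder reload
                _localVersion = sharedVersion;
            }
            return _chatRooms;
        }

        public static void MarkChanged()
        {
            // Any process that mutates the data bumps the shared stamp.
            Client.Store(StoreMode.Set, "chatrooms-version",
                         DateTime.UtcNow.Ticks.ToString());
        }

        private static Dictionary<long, string> ReloadFromDatabase()
        {
            return new Dictionary<long, string>();   // placeholder
        }
    }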
I keep hearing that it's bad practice to store large object collections (or anything at all) in the session. Often during conversation it's quickly followed by: "Just turn sessions off."
So what is the general problem with sessions? I use them a fair bit, and since the "real" session is stored behind a strongly typed container, I don't really see the issue.
There is nothing wrong with session - you just need to be mindful of its limitations. To say "just turn off session" is throwing the baby out with the bathwater.
There is a huge difference between storing big objects and small objects in a session.
The session stays alive on the server until it expires, which means those big objects pollute your available memory. If you do that on a server under load, or a server that runs many application pools, it can cause trouble.
You don't need cookies to have a session, since ASP.NET can also encode the session id in the URLs. You can also configure the session store to run out of process, or even to store the information in a SQL Server (reducing the memory load on the web server and enabling sessions across a farm).
So basically: small objects are OK; big objects are not.
Here's my take -- sessions are not bad, but they are sometimes overused. It can also be harder to follow a web application's flow when it relies on a lot of session state, so of course you should be careful not to get carried away.
That said, you should feel free to use them whenever you need to store temporary data that must be accessible across multiple pages, and in no other situation. That is exactly the scenario sessions were designed for.
Now, if you're worried about memory consumption on the server, that's not necessarily a reason to avoid sessions, but it may be a reason to avoid the InProc session provider. In fact, I'm not a fan of InProc sessions, as they tend to expire prematurely after a certain number of recompiles of your application.
What I actually prefer and nearly always use are SQL Server sessions. They'll be slightly slower, but the benefits are numerous: they persist even if the server is rebooted, which makes them a very reliable choice, and since they're stored in SQL Server instead of in memory, they won't take such a big bite out of the web server's memory.
This article on MSDN talks about the various session providers and also explains how to configure SQL to handle your sessions. If you don't have SQL, just know that even the free SQL Server Express 2008 can be configured as your session provider.
I had thought that it largely depends on the traffic to your web site. If you are running something like amazon.com, trying to store the user's shopping cart in a session would take huge amounts of IIS allocated memory, bringing down your web server. For smaller web sites, session variables are fine to use in moderation.
Storing large objects in Session is bad, yes, but "large" is relative.
Basically, storing an object in session will keep it in memory until the session expires, so if you have a site with a high user count all storing mega-objects in their session, you'll kill your server pretty quickly.
With that being said, an argument could be made that if you have objects taking 5k+ of memory each and enough users to actually max out a server, then you can probably afford more hardware anyway.
There are also topics like server clustering and session integrity between boxes in the cluster. Some frameworks handle this, I don't know if .NET does or not.
There are a few things to be careful of:
Memory consumption: if you store large data objects in session and you have many users, you may well run out of memory, or at the very least trigger frequent early recycles of your application pool.
Web farms: with multiple web servers, the session has to be stored externally (out of process) in SQL Server or a Windows service so that it is accessible from the different machines. This can be quite slow at times.
Cookies: by default, session requires the user to have cookies turned on.
If you're working in a web farm, you'll run into trouble.
I guess these reasons don't have anything to do with storing large objects in session, just with using sessions at all.
2 major issues come to mind...
1) Persistence of sessions across servers when you start scaling your website
2) Memory usage explosion from storing UI objects in session state
The more serious issue is the tendency to store UI objects in session. When you store something as innocuous as a Label from your page, you get lots of unwanted object references along with it. You probably just wanted the text of that label stored in your session, but with it comes a reference to the page itself, and all of a sudden you are using a massive amount of server memory to hold the page, its view state, and lots of unwanted attributes.
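A small sketch of the difference (the control name is illustrative):

    // Bad: the Label holds a reference back to its Page, so the whole page
    // object graph (view state and all) stays alive in session.
    Session["greeting"] = greetingLabel;

    // Good: store just the string you actually need.
    Session["greeting"] = greetingLabel.Text;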
Check out this link about storing UI elements in session
You may want to check out this question as well.
This is an old thread, but I have had an experience with a session problem that I would like to share.
The flow was simple:
One .aspx page validates the client and reads a bill (HTML, about 2 MB) from a file for that client, then saves the HTML in a session variable.
That .aspx then auto-redirects to the next .aspx, which retrieves the HTML from session and shows it to the client.
It worked fine in most cases, but some clients hit a problem: the bill they saw was not their own, but someone else's.
We used a sniffer to capture the network packets and saw a strange situation: our IIS had definitely sent a session id (e.g. 1111111) to the client, but when the client was redirected to the next page and tried to access session, the session id it sent back (e.g. 11112222) was different. We think that client's browser did not accept the session id correctly.
In the end we abandoned the use of session here, which solved the problem.