SQLite shared cache - sqlite

I have a huge (>10GB) SQLite database that is shared among many (up to CPU core count) processes (same executable). This is a specialized application, so RAM is not an issue, and I want to cache as much of the database in memory as possible. I found out about PRAGMA cache_size and I am using it successfully, but it blows the RAM usage out of proportion because each of the many processes has its own private cache.
Now, I have found SQLite Shared-Cache Mode, but I can't tell whether it applies across different processes or only to threads within one process. I have run some tests which suggest the latter, but I am not sure whether I am doing something wrong or whether something else needs to be done to make this work.

That page explains that "the same cache can be shared across an entire process".
In theory, you could try to configure your OS so that the entire database is held in the file cache.
If the amount of data in individual queries is small, it might be worthwhile to use a client/server database so that the caching needs to be done only in the server process.
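To illustrate what process-scoped sharing buys you, here is a minimal sketch assuming a .NET client (Microsoft.Data.Sqlite and the file name are my assumptions, not necessarily the asker's stack): two connections opened inside the same process with Cache=Shared reuse one page cache, while a second process opening the same file still builds its own.

```csharp
// Minimal sketch, assuming the Microsoft.Data.Sqlite package (the asker's
// actual language/driver is unknown). Both connections below live in ONE
// process, so with Cache=Shared they share a single page cache. A separate
// process opening the same file still gets its own cache -- shared-cache
// mode does not span process boundaries.
using Microsoft.Data.Sqlite;

class SharedCacheDemo
{
    static void Main()
    {
        const string connString = "Data Source=huge.db;Cache=Shared"; // hypothetical file

        using var conn1 = new SqliteConnection(connString);
        using var conn2 = new SqliteConnection(connString);
        conn1.Open();
        conn2.Open();

        // Request a larger page cache; a negative cache_size is a size in KiB.
        using var cmd = conn1.CreateCommand();
        cmd.CommandText = "PRAGMA cache_size = -1048576;"; // ~1 GiB
        cmd.ExecuteNonQuery();

        // Pages read through conn1 may now be served to conn2 from memory.
    }
}
```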

Related

Meteor server-side memory usage for thousands of concurrent users

Based on this answer, it looks like the meteor server keeps an in-memory copy of the cache for each connected client. My understanding is that it gets used in order to avoid sending multiple copies of data when dealing with overlapping subscriptions on a client.
The relevant part of the linked answer (emphasis is mine):
The merge box: The job of the merge box is to combine the results (added, changed and removed calls) of all of a client's active publish functions into a single data stream. There is one merge box for each connected client. It holds a complete copy of the client's minimongo cache.
Assuming that answer is still accurate in the current version of meteor, couldn't that create a huge waste of memory on the server as the number of users increases?
As an off-the-cuff calculation, if an app had about a 100kB cache per client, then 10,000 concurrent users would use up 1GB of memory on the server, and 100,000 users a whopping 10GB! This would be true even if each client was looking at almost identical data. It seems plausible for an app to use much more data than that per client, which would further exacerbate the problem.
Does this problem exist in the current version of Meteor? If so, what techniques can be used to limit the amount of memory the server needs to use to manage all the client subscriptions?
Take a look at this post by Arunoda at his meteorhacks.com blog:
http://meteorhacks.com/making-meteor-500-faster-with-smart-collections.html
which talks about his Smart Collections page:
http://meteorhacks.com/introducing-smart-collections.html
He created an alternative Collection stack which has succeeded in its goals of speed, efficiency (memory & CPU) and scalability (you can see a graphed comparison in the post). Admittedly, in his tests RAM usage was negligible with both Collection types, although given the way he's implemented things, there should be a very obvious difference with the type of use case you mentioned.
Also, you can see in this post on meteor-core:
https://groups.google.com/d/msg/meteor-core/jG1KLObX1bM/39aP4kxqWZUJ
that the Meteor developers are aware of his work and are cooperating in implementing some of the improvements into Meteor itself (but until then his smart package works great).
Important note! Smart Collections relies on access to the MongoDB oplog. This is easy if you're running on your own machine or hosted infrastructure. If you're using a cloud-based database, this option might not be available, or if it is, it will cost a lot more than the smaller packages.

Is Caching in C# the right approach for me?

I've tried to read up on Caching in ASP.NET and still have a few questions.
When using a SQL cache dependency, I know that you can specify which tables will be monitored, but if a change happens to any one of those tables, does it reset the entire cache? I understand that I don't want to cache tables that will have frequent changes, but we could end up with a good handful of cached tables, and even if each table only gets a few updates a day, that could turn into 50-ish resets of the cache daily (8-hour window).
I would be creating and maintaining this cache via a GAC DLL. A large number of different applications would be accessing that GAC at any one time. Does each application maintain its own copy of the cache or is it just stored in one global location (or possibly per app pool)?
Is there a physical location on the server where I can see how much space the Cache is currently consuming? This would be extremely pertinent if each application maintains its own Cache as that could end up taking large amounts of disk space.
Is there some way to physically force the cache to rebuild itself? I could see my boss assuming that the cache was at fault for a particular issue and I'd need to be able to rule that out at the rootest level. No "changing a record and saying that SHOULD rebuild the cache" but rather "doing [Action X] and KNOWING that whatever was in the cache is now gone"
Thanks in advance for your answers and time.
SqlCacheDependency only monitors tables in the old-style SQL 2000 approach, which relies on triggers and polling. The SQL 2005+ method monitors changes at the row level, and uses Service Broker. At the level of the Cache object, changes will invalidate just the Cache entries associated with the given SqlCacheDependency (not the entire cache).
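As a concrete illustration of the two styles, here is a hedged C# sketch (the "MyDb" config entry, the Products table, and the query are hypothetical; the polling style also needs the aspnet_regsql / web.config setup, and the command style needs Service Broker plus SqlDependency.Start at application startup):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Web;
using System.Web.Caching;

public static class ProductCache
{
    // SQL 2000 style: table-level dependency, trigger + polling based.
    public static void CacheWithTableDependency(object products)
    {
        var dep = new SqlCacheDependency("MyDb", "Products"); // names are hypothetical
        HttpRuntime.Cache.Insert("AllProducts", products, dep);
    }

    // SQL 2005+ style: row-level query notification via Service Broker.
    public static void CacheWithQueryDependency(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT ProductId, Name FROM dbo.Products", conn))
        {
            var dep = new SqlCacheDependency(cmd); // create before executing the command

            conn.Open();
            var names = new List<string>();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    names.Add(reader.GetString(1));
            }
            HttpRuntime.Cache.Insert("AllProducts", names, dep);
        }
    }
}
```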
Each application has a separate copy of the Cache. If you have many apps sharing the same data, you might consider creating a separate "caching server," and have your apps get their data from there, using WCF -- basically add another tier to your app.
You can look at a couple of cache-related performance counters, but if your concern is disk space, then there's nothing to worry about, since the ASP.NET cache is stored entirely in RAM. In addition, if RAM gets too full, one feature of the cache is that it will let go of old/infrequently referenced objects to make room for new objects.
The easiest way to force the cache to be dropped is to simply recycle your application or AppPool (which happens once a day or so by default anyway). If you want something more targeted, you would need to write some code to forcibly remove certain items from the cache, either using Cache.Remove() or using linked dependencies.
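And for the "do [Action X] and KNOW the cache is gone" requirement, a small hedged sketch (the class and method names are made up; enumerating HttpRuntime.Cache and removing each key is the usual trick, typically wired to an admin-only page or handler):

```csharp
using System.Collections;
using System.Web;

public static class CacheUtil
{
    // Remove every entry from the ASP.NET cache, so you can prove that
    // whatever was cached is now gone.
    public static int FlushAll()
    {
        int removed = 0;
        foreach (DictionaryEntry entry in HttpRuntime.Cache)
        {
            HttpRuntime.Cache.Remove((string)entry.Key);
            removed++;
        }
        return removed;
    }
}
```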
Off the top of my head:
- Only that table's content will be invalidated.
- Each web application has its own cache.
- The cache is stored in memory; regarding cache size, see this question: How to determine total size of ASP.Net cache?
- This may help: http://bit.ly/vsqNDl

Optimizing Put Performance in Berkeley DB

I just started playing with Berkeley DB a few days ago so I'm trying to see if there's something I've been missing when it comes to storing data as fast as possible.
Here's some info about the data:
- it comes in 512 byte chunks
- chunks come in order
- chunks will be deleted in FIFO order
- if I lose some data off the end because of a power failure, that's OK, as long as the whole DB isn't broken
After reading a bunch of the documentation, it seemed like a Queue DB was exactly what I wanted.
However, after trying some test code, my fastest results were about 1 MByte per second, just looping through a DB->put with DB_APPEND set. I also tried using transactions and bulk puts, but both of these slowed things down considerably, so I didn't pursue them further. I was inserting into a fresh DB created on a NAND flash chip on my Freescale i.MX35 dev board.
Since we're looking to get at least 2MBytes per second write speeds, I was wondering if there's something I missed that can improve my speeds since I know that my hardware can write faster than this.
Try putting this into your DB_CONFIG:
set_flags DB_TXN_WRITE_NOSYNC
set_flags DB_TXN_NOSYNC
From my experience, these increase write performance a lot.
DB_TXN_NOSYNC
If set, Berkeley DB will not write or synchronously flush the log on transaction commit or prepare. This means that transactions exhibit the ACI (atomicity, consistency, and isolation) properties, but not D (durability); that is, database integrity will be maintained, but if the application or system fails, it is possible some number of the most recently committed transactions may be undone during recovery. The number of transactions at risk is governed by how many log updates can fit into the log buffer, how often the operating system flushes dirty buffers to disk, and how often the log is checkpointed
Calling DB_ENV->set_flags with the DB_TXN_NOSYNC flag only affects the specified DB_ENV handle (and any other Berkeley DB handles opened within the scope of that handle). For consistent behavior across the environment, all DB_ENV handles opened in the environment must either set the DB_TXN_NOSYNC flag or the flag should be specified in the DB_CONFIG configuration file.
The DB_TXN_NOSYNC flag may be used to configure Berkeley DB at any time during the life of the application.
DB_TXN_WRITE_NOSYNC
If set, Berkeley DB will write, but will not synchronously flush, the log on transaction commit or prepare. This means that transactions exhibit the ACI (atomicity, consistency, and isolation) properties, but not D (durability); that is, database integrity will be maintained, but if the system fails, it is possible some number of the most recently committed transactions may be undone during recovery. The number of transactions at risk is governed by how often the system flushes dirty buffers to disk and how often the log is checkpointed.
Calling DB_ENV->set_flags with the DB_TXN_WRITE_NOSYNC flag only affects the specified DB_ENV handle (and any other Berkeley DB handles opened within the scope of that handle). For consistent behavior across the environment, all DB_ENV handles opened in the environment must either set the DB_TXN_WRITE_NOSYNC flag or the flag should be specified in the DB_CONFIG configuration file.
The DB_TXN_WRITE_NOSYNC flag may be used to configure Berkeley DB at any time during the life of the application.
See http://www.mathematik.uni-ulm.de/help/BerkeleyDB/api_c/env_set_flags.html for more details.
I suggest you use transactions / the TDS (Transactional Data Store) if, as you mention, you cannot recreate the database if it gets corrupted (i.e. it isn't just a local cache). If you don't care about losing a few items in the event of a crash or power outage, then DB_TXN_WRITE_NOSYNC will improve TDS performance; your database will still be intact and recoverable.
If you store using BTREE with a numeric index (if you have no natural key), and watch out for endian issues so you get good key locality and high page utilization, then you should be able to get way more than 2,000 inserts a second, especially to SSD, and especially if you use DbMultipleKeyDataBuilder to do bulk inserts.

What happens if your asp.net app is using too much memory?

Let's say that you are using a shared hosting plan and your application stores lots of objects in the application state.
If they start taking up too much memory, does this mean that the server will just remove them?
If not, what will happen? What happens when the server has no memory left? Can you still store objects in the application or session state?
I am asking this because I am planning to develop a big site that will rely on the application state, and it will be crucial that the objects stored there don't get destroyed.
What I am afraid of is that at a certain point I might have too many objects in the application state and they might get removed to free up memory.
There are three different thresholds:
The total size of your app exceeds the maximum process size on your machine (really only applicable with an x86 OS). In that case, you'll start getting out of memory errors at first, generally followed very quickly by a process crash.
Your process, along with everything else running on the machine, no longer fits in physical memory. In that case, the machine will start to page, generally resulting in extremely poor performance.
Your process exceeds the memory limit imposed by IIS on itself, via IIS Manager. In that case, the process will be killed and restarted, as with a regular AppPool recycle.
With the Application object, entries are not automatically removed if you approach any of the above thresholds. With the Cache object, they can be removed, depending on the priority you assign.
As others have said, over-using the Application object isn't generally a good idea, because it's not scalable. If you were ever to add a second load-balanced server, keeping the info in sync from one server to another becomes very challenging, among other things.
What happens when any application takes up too much memory on a computer?
It causes the server to run everything really slowly. Even the other sites that share the computer.
It's not a good idea to store that much in application state. Use your config file and/or the database.
It sounds like you have a memory leak: the process keeps leaking memory until it crashes with an out-of-memory condition and is then automatically restarted by the server.
1.5GB is about the maximum amount of memory a 32 bit process can allocate before running out of address space.
Some things to look for:
- Do you do your own caching? When are items removed from the cache?
- Is there somewhere data is added to a collection every once in a while but never removed?
- Do you call Dispose on every object that implements IDisposable?
- Do you access any non-managed code at all (COM objects or using DllImport), or allocate non-managed memory (using the Marshal class, for example)? Anything allocated there is never freed by the garbage collector; you have to free it yourself (see the sketch after this list).
- Do you use 3rd-party libraries or any code from 3rd parties? They can have any of the problems in this list too.
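For the IDisposable and unmanaged-memory points above, a minimal illustrative sketch (the file name and buffer size are made up): dispose deterministically with using, and pair every unmanaged allocation with an explicit free.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

class LeakChecklistExamples
{
    static void Main()
    {
        // IDisposable: "using" guarantees Dispose even if an exception is thrown.
        using (var stream = new FileStream("data.bin", FileMode.OpenOrCreate))
        {
            stream.WriteByte(0x42);
        } // stream.Dispose() runs here

        // Unmanaged memory: the GC never frees this -- you must free it yourself.
        IntPtr buffer = Marshal.AllocHGlobal(1024); // hypothetical 1 KB buffer
        try
        {
            Marshal.WriteByte(buffer, 0, 0x42);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer); // forgetting this is a classic leak
        }
    }
}
```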
If you use the Cache object instead of the Application object, you can minimize problems of running out of memory. If the memory utilization of the ASP.Net worker process approaches the point at which the process will be bounced automatically (the recycle limit), the memory in Cache will be scavenged. Items that haven't been used for a while are removed first, potentially preventing the process from recycling. If the data is stored in Application, ASP.Net can do nothing to prevent the process from recycling, and all app state will be lost.
However, you do need to have a way of repopulating the Cache object. You could do that by persisting the cached data in a database, as others have proposed.
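To make that concrete, here is a hedged sketch of a get-or-repopulate helper (the key, expiration, and loader delegate are placeholders): items go into Cache with a priority, and if ASP.NET scavenges them under memory pressure, the next request simply reloads them from the database.

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class CacheHelper
{
    // Hypothetical helper: "loadFromDb" stands in for whatever query
    // repopulates the data if the entry was scavenged or never cached.
    public static T GetOrLoad<T>(string key, Func<T> loadFromDb) where T : class
    {
        var cached = HttpRuntime.Cache[key] as T;
        if (cached != null)
            return cached;

        T value = loadFromDb();
        HttpRuntime.Cache.Insert(
            key,
            value,
            null,                            // no dependency
            DateTime.UtcNow.AddHours(1),     // absolute expiration (placeholder)
            Cache.NoSlidingExpiration,
            CacheItemPriority.Default,       // lower priority = scavenged sooner
            null);                           // no removal callback
        return value;
    }
}
```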
Here's a short article with a good code example for handling Cache.
And here's a video of how to use Cache.
Anything stored in application state should be refreshable, and needs to be saved in current status in files or database. If nothing else happens, IIS restarts worker processes at least once a day, so nothing in application state will be there forever.
If you do run out of memory, you'll probably get an out of memory exception. You can also monitor memory usage, but in a shared host environment, that may not be enough information to avoid problems. And you may get the worker process recycled as an "involuntary" fix.
When you say that it's crucial that objects stored in application state don't get destroyed, it sounds like you're setting yourself up for trouble.
I think you should use session state instead of application state, and store the session in a SQL Server database. That way, once a user's session ends, its memory is released.
If you want a more specific answer, please provide more information about your application.

Available RAM on shared hosting provider

I'm building a business app that will hold somewhere between 50,000 and 150,000 companies. Each company (DB row) is represented by 4-5 properties/columns (title, location, ...). The ORM is LINQ to SQL.
I have to do some calculations, and for that I run a lot of queries for a specific company. Right now I go to the DB every time I need something, which produces 50-200 queries, depending on calculation complexity. I tried putting all companies in the cache, and 10,000 rows (companies) from the DB take around 5.5MB of cache. In this scenario, I have only one query.
This application will be on a shared hosting server, so my resources are limited. I'm interested in what will happen if I try to load, let's say, 100,000 companies (rows, objects), or put them in the cache.
Is there any RAM limit that the average hosting company gives to an ASP.NET application? Does it depend on a dedicated Application Pool (I can put the app in a dedicated pool)?
Options are:
- load the whole table into C# objects. I did some memory profiling: 10,000 objects need 5MB of RAM
- query the DB to get referenced objects when needed.
Task is: for given company A, build tree of connected companies.
Table and columns:
Company : IdCompany, Title, Address, Contact
CompanyConnection: IdParentCompany, IdChildCompany
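For reference, a hedged sketch of the tree-building step itself, done entirely against pre-loaded in-memory collections (the class shapes mirror the columns above; loading the collections and handling cycles are left out for brevity, and the connection graph is assumed to be acyclic):

```csharp
using System.Collections.Generic;
using System.Linq;

class Company
{
    public int IdCompany;
    public string Title;
    public string Address;
    public string Contact;
    public List<Company> Children = new List<Company>();
}

class CompanyConnection
{
    public int IdParentCompany;
    public int IdChildCompany;
}

static class CompanyTree
{
    // Build the tree of companies connected to rootId from in-memory data
    // (one query per table up front, no per-company round trips).
    public static Company Build(int rootId,
                                Dictionary<int, Company> companiesById,
                                ILookup<int, CompanyConnection> connectionsByParent)
    {
        var root = companiesById[rootId];
        root.Children = connectionsByParent[rootId]
            .Select(c => Build(c.IdChildCompany, companiesById, connectionsByParent))
            .ToList();
        return root;
    }
}
```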
Your shared host will likely be IIS 7 on Windows Server running as a virtual machine. This machine will behave as any ordinary machine would - it is not 'aware' of being shared or virtualised.
You should expect Windows to begin paging to disk when it is out of physical RAM, and then out-of-memory errors get thrown only when the page file has filled the disk. Of course, you don't ever want to page any part of the warm cache to disk.
Windows itself can begin nagging you about being out of memory, but this does not carry the same 'urgency': applications will continue to be able to request RAM and it will continue to be given (albeit serviced from the page file).
If your application could crash and leave corrupt state or a partial transaction, then you should code defensively and check that memory is available before embarking upon an action.
Create the expected number of objects in a loop with pretend data and watch the memory consumption on the box - the Working Set of the worker process is the one to watch. You can do this in Task Manager.
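A quick, hedged sketch of that experiment (the Company shape mirrors the columns listed in the question; the counts and string contents are pretend data):

```csharp
using System;
using System.Collections.Generic;

class Company
{
    public int IdCompany;
    public string Title;
    public string Address;
    public string Contact;
}

class MemoryEstimate
{
    static void Main()
    {
        const int count = 100_000; // the upper bound mentioned in the question
        var companies = new List<Company>(count);

        long before = GC.GetTotalMemory(forceFullCollection: true);
        for (int i = 0; i < count; i++)
        {
            companies.Add(new Company
            {
                IdCompany = i,
                Title = "Company #" + i,
                Address = "Some street " + i,
                Contact = "contact" + i + "@example.com"
            });
        }
        long after = GC.GetTotalMemory(forceFullCollection: true);

        Console.WriteLine($"~{(after - before) / (1024.0 * 1024.0):F1} MB for {count} companies");
        // Also watch the worker process's Working Set in Task Manager while this runs.
        GC.KeepAlive(companies);
    }
}
```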
Watch for Page Faults. These are events where a memory operation had to be directed to disk.
Also, very large sets of objects can cause long garbage collection cycles (> 1 second). This can be a big issue in time-sensitive applications like trading and market data.
Hope that helps.
Update: I do a similar caching thang for a mega data-mining application.
Each ORM type has a GetObject method which uses a giant cache or goes to disk and then updates the cache: Person.GetPerson( check people cache, go to db, add to people cache )
Now my queries return just the unique keys of the results. Then each key is fetched using the above method. This is slow initially until the cache builds up but...
The point being that each query result points to the same instance in memory! This means the RAM footprint is much smaller due to sharing.
The query results are then cached, too. Of course.
Where objects are not immutable, each object write updates its own instance in the giant cache, but also causes all query caches that concern that type of object to invalidate themselves!
Of course, in this application, writes are rare as its mainly reference data.
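A hedged sketch of that pattern, using Person as in the answer above (the database call is a placeholder): every query result refers to the single cached instance, so overlapping results don't multiply the RAM footprint, and query results themselves are cached as lists of keys.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

class Person
{
    public int Id;
    public string Name;
}

static class PersonCache
{
    // The "giant cache": one instance per key, shared by every query result.
    private static readonly ConcurrentDictionary<int, Person> People =
        new ConcurrentDictionary<int, Person>();

    // Query results are cached as key lists, not as object copies.
    private static readonly ConcurrentDictionary<string, List<int>> QueryResults =
        new ConcurrentDictionary<string, List<int>>();

    // Placeholder for the real database round trip.
    private static Person LoadFromDb(int id) => new Person { Id = id, Name = "Person " + id };

    public static Person GetPerson(int id) =>
        People.GetOrAdd(id, LoadFromDb); // check cache, go to DB, add to cache

    public static List<Person> RunQuery(string queryKey, Func<List<int>> queryForKeys)
    {
        var keys = QueryResults.GetOrAdd(queryKey, _ => queryForKeys());
        return keys.Select(GetPerson).ToList(); // same instances as every other query
    }

    // On a write, update the single instance and void the affected query caches.
    public static void UpdatePerson(Person p)
    {
        People[p.Id] = p;
        QueryResults.Clear(); // coarse-grained: drop all Person query caches
    }
}
```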
