I'm currently working on a fairly large project (several hundred thousand active members) and was leaning strongly toward a Plone solution.
I've asked some questions related to it like here and here.
Got some replies from very experienced Plonistas (and active stackoverflowers as well). I really appreciate it. People keep saying Plone does not scale well to that size, and most of the reasons given come down to the ZODB.
Then I thought of an in-memory backend for ZODB. RAM is really cheap now! You can get 128GB for just ~$3k, ten times the price of a normal $300 128GB SSD, and achieve ~30GB/s of I/O bandwidth compared to ~300MB/s for the SSD.
An in-memory backend + blobs for binary data + 10s disk journalling for backup + all undos except the last 10s would be an instant kill! It should smoke the RDBMSs and offer full ACID + transactions + object mapping, compared to the likes of couch*/redis etc.
Is it technically possible? Is there any implementation? Is it worth implementing (in your opinion)?
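To make that comparison concrete, here is a quick back-of-the-envelope sketch in Python. It only uses the rough prices and bandwidths quoted above (they are estimates, not measurements), and the variable names are made up for illustration.

```python
# Rough figures quoted in the post; estimates, not benchmarks.
ram_cost_usd = 3000           # ~128GB of RAM
ssd_cost_usd = 300            # ~128GB SSD
ram_bandwidth_mb_s = 30_000   # ~30 GB/s
ssd_bandwidth_mb_s = 300      # ~300 MB/s

# How much more RAM costs, and how much more bandwidth it offers.
cost_ratio = ram_cost_usd / ssd_cost_usd
bandwidth_ratio = ram_bandwidth_mb_s / ssd_bandwidth_mb_s

# Bandwidth gained per extra dollar spent, relative to the SSD.
bandwidth_per_dollar_gain = bandwidth_ratio / cost_ratio

print(f"cost: {cost_ratio:.0f}x, bandwidth: {bandwidth_ratio:.0f}x, "
      f"bandwidth per dollar: {bandwidth_per_dollar_gain:.0f}x")
```

On these numbers RAM costs about 10x more for about 100x the bandwidth. Latency and durability are not part of this arithmetic.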
There is a memcache option for RelStorage which helps when you need to use a slow database, but really you should probably just leave that sort of caching to your operating system and make sure your database server has plenty of RAM. (If your RAM is large enough then your filesystem cache should already store most of the data.)
An SSD will significantly reduce the worst case read latencies for random access to data not already in the filesystem cache. It seems silly not to use them now, especially as the Intel 330 SSD is so cheap and has a capacitor equivalent to a battery backed raid controller (making writes superfast too.)
An all in RAM solution can never be considered ACID, as it won't be Durable.
As mentioned in my comment on your other post, it is not the ZODB that is the problem here but Plone's synchronous use of a single contended portal_catalog.
Instead of keeping the entire ZODB in memory, you could mount the portal_catalog in a separate mount point and keep that in memory. I've already seen this kind of configuration, and it works smoothly for about 8k users on standard hardware (2 servers + 1 ZEO server). It may be sufficient for your needs, perhaps with more powerful hardware.
Google Cloud disks are network disks that behave like local disks. SQLite expects a local disk so that locking and transactions work correctly.
A. Is it safe to use Google Cloud disks for SQLite?
B. Do they support the right locking mechanisms? How is this done over the network?
C. How do disk IOPS and throughput relate to SQLite performance? If I have a 1GB SQLite file with queries that take 40ms to complete locally, how many IOPS would this use? Which disk type should I choose (standard, balanced, SSD)?
Thanks.
Related
https://cloud.google.com/compute/docs/disks#pdspecs
Persistent disks are durable network storage devices that your instances can access like physical disks
https://www.sqlite.org/draft/useovernet.html
the SQLite library is not tested in across-a-network scenarios, nor is that reasonably possible. Hence, use of a remote database is done at the user's risk.
Yeah, the article you referenced essentially says that because reads and writes are "simplified" at the OS level, they can behave unpredictably, resulting in "lost in translation" issues when going from local to network to remote.
They also point out that it may well work fine in testing, and perhaps in production for a time, but there are known side effects which are hard to detect and mitigate, so it's a slight gamble.
Again, the setup they describe is not Google Cloud Disk specifically, but a generic remote networked arrangement.
My point is more that Google Cloud Disk may be more "virtual" than purely network-attached storage. To my mind, that is where to look, and to evaluate it from there.
Check out this thread for some additional insight into the issues: https://serverfault.com/questions/823532/sqlite-on-google-cloud-persistent-disk
Additionally, while looking around I found this thread, where one poster suggests using SQLite as a read-only asset and deploying updates through a far more controlled process.
https://news.ycombinator.com/item?id=26441125
The persistent disk acts like a normal disk in your VM, and is only accessible to one VM at a time.
So it's safe to use; you won't lose any data.
As for performance, you just have to test it for your specific workload. If you have plenty of spare RAM and your database is read-heavy with few writes, the whole database will be cached by the OS (Linux) disk cache, so it will be crazy fast, even on HDD storage.
But if you are low on spare RAM, then the database won't be in the OS cache. Writes are always synced to disk, and that causes lots of I/O operations.
In that case, use the highest-performing disk you can afford (or are willing to pay for).
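For the "just test it" part, a minimal sketch along these lines might help. It uses only Python's standard sqlite3 module; the helper names time_query and summarize, and whatever database path and SQL you pass in, are placeholders for your own workload, not anything specific to Google Cloud.

```python
import sqlite3
import statistics
import time


def time_query(db_path, sql, params=(), runs=20):
    """Run one query several times and return per-run latencies in milliseconds."""
    conn = sqlite3.connect(db_path)
    try:
        latencies = []
        for _ in range(runs):
            start = time.perf_counter()
            conn.execute(sql, params).fetchall()  # fetch all rows so the read really happens
            latencies.append((time.perf_counter() - start) * 1000.0)
        return latencies
    finally:
        conn.close()


def summarize(latencies):
    """Return the median and worst-case latency in milliseconds."""
    return {"median_ms": statistics.median(latencies), "max_ms": max(latencies)}
```

Run it against a copy of your real database on each disk type you are considering. Expect the first run to be the slowest, since later runs are likely served from the OS cache; the gap between the worst case and the median gives a rough idea of how much the disk itself matters for your workload.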
I am not able to find MariaDB's recommended RAM, disk, and CPU core capacity. We are doing an initial setup with a very small data volume, so I just need the recommended capacity for MariaDB.
Appreciate your help!!!
Seeing that over the last few years Micro-Service architecture is rapidly increasing, and each Micro-Service usually needs its own database, I think this type of question is actually becoming more appropriate.
I was looking for this answer because we were exploring the possibility of creating small databases on many servers, and I was wondering, out of interest, what the minimum requirements for a MariaDB/MySQL database would be...
Anyway I got this helpful answer from here that I thought I could also share here if someone else was looking into it...
When starting up, it (the database) allocates all the RAM it needs. By default, it
will use around 400MB of RAM, which isn’t noticeable with a database
server with 64GB of RAM, but it is quite significant for a small
virtual machine. If you add in the default InnoDB buffer pool setting
of 128MB, you’re well over your 512MB RAM allotment and that doesn’t
include anything from the operating system.
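The arithmetic in that quote can be sketched as a tiny budget check. The 400MB baseline and 128MB buffer pool are the figures from the quote; the function name fits_in_ram and the os_reserve_mb parameter are made up here for illustration.

```python
def fits_in_ram(total_ram_mb, baseline_mb=400, buffer_pool_mb=128, os_reserve_mb=0):
    """Return (memory needed in MB, whether it fits in total_ram_mb)."""
    needed_mb = baseline_mb + buffer_pool_mb + os_reserve_mb
    return needed_mb, needed_mb <= total_ram_mb


# Defaults from the quote on a 512MB machine, before counting the OS at all.
needed_mb, ok = fits_in_ram(512)
print(needed_mb, ok)  # 528 False
```

So with default settings the database alone already overshoots a 512MB machine by 16MB, which is why the settings have to be shrunk on small VMs.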
1 CPU core is more than enough for most MySQL/MariaDB installations.
512MB of RAM is tight, but probably adequate if only MariaDB is running. But you would need to aggressively shrink various settings in my.cnf. Even 1GB is tiny.
1GB of disk is more than enough for the code and minimal data (I think).
Please experiment and report back.
There are minor differences in requirements between operating systems, and between versions of MariaDB.
Turn off most of the Performance_schema. If all the flags are turned on, lots of RAM is consumed.
20 years ago I had MySQL running on my personal 256MB (RAM) Windows box. I suspect today's MariaDB might be too big to work on such a tiny machine. Today, the OS is the biggest occupant of any basic machine's disk. If you have only a few MB of data, then disk is not an issue.
Look at it this way -- What is the smallest smartphone you can get? A few GB of RAM and a few GB of "storage". If you cut either of those numbers in half, the phone probably cannot work, even before you add apps.
MariaDB and MySQL both actually use very little memory. About 50 MB to 150 MB is the range I found on some of my servers. These servers are running a few databases, each with a handful of tables and limited user load. The MySQL documentation claims it needs 2 GB, which is very confusing to me. I understand why MariaDB does not specify any minimum requirements. If they say 50 MB, a lot of folks will want to disagree. If they say 1 GB, they are unnecessarily inflating the minimum requirements. Come to think of it, more memory means better caching and performance. However, a well-designed database can do disk reads every time without any performance issues. My Apache installs (on the same server) consistently use more memory (about double) than the database.
I am studying various ASP.Net deployment approaches, and I have a basic question. Is there any rule of thumb for defining the environment? What could be called a 'good' setup if I have to support 1000 concurrent users (requests)?
I understand that there are many factors, like how the application is designed. But assuming that everything else is great, what configuration should I look for: which processor, how much RAM, etc.?
Also, how many concurrent users should the configuration below be able to support?
CPU: Dual 3.40 GHz Intel Xeon (Hyper-Threaded)
Memory : 3GB
OS: Windows Server 2003 SP2
Thanks for the help
Having been on both sides of the equation (web developer and hardware engineer), my current opinion is that the answer involves both of those sides as well.
Your hardware needs to be not only sufficient for general usage, but it also has to cope with reasonable unexpected peaks and failures - which means that it needs to be redundant, and in excess of your capacity planning.
Your software needs to be designed so it's easily made redundant - there's no point in speccing a tiered hardware architecture (now or for future planning) if the software is going to require a significant amount of changes to handle it.
Your software also needs to be designed so sudden unexpected peaks in resource usage don't happen as a regular occurrence for no external reason (eg marketing campaign).
I know that you say you understand the non-hardware factors, but the real answer to your question is that there is no real way to answer it without knowing the other factors - each situation and circumstance is unique, and requires a unique solution.
However, in an effort to add generalised recommendations, try these:
CPU - choose something with a lot of cache, and individual cache per core as well. This will do wonders to speed up the system. I typically go for dual core, dual processor at a minimum (for a total of 4 cores on two separate physical CPUs). Processor speed ratings don't really matter as much as you think these days.
Memory - fast memory, minimum of 8GB of it. Use the smallest dimms possible for the server.
Harddisk - SAS 15K RPM at a minimum, RAID 6 for the data partition on one controller, RAID 1 or 6 for the system partition on another controller. Choose a good quality controller backed by a good support or warranty package - your controller is no good if it dies in 3 years time and you can't get a replacement.
But above all, don't just install the OS and app and let it be. Profile the setup as much as possible, and don't be afraid of making changes to optimise for the individual setup (within reason). Move your ASP.Net temporary files to a fast disk (or a RAM disk - if they are going to be rebuilt anyway, there is no point worrying about losing them). Move the database to a second server, with a crossover 1Gbit link between the two. Turn off disk maintenance in the OS, and turn off services you do not need.
Good luck!
Can anyone recommend a way in which I can throttle an application based on the current disk usage or even CPU usage.
The application I am writing scans files on the hard disk and will be pretty hard disk intensive in itself.
Can anyone recommend a way in which I can either throttle down my application (or even pause it, for that matter) when disk usage is high (i.e. the user is running a very HDD- or CPU-intensive app)? Basically, my application shouldn't hamper the user's productivity. I know this is a pretty big research topic in itself, but I at least need some cues on how to approach this.
Help in any form is highly appreciated. :)
Thanks.
Samrat.
Vista has added I/O Prioritization to Windows so if you're using that platform you can just let the O/S take care of it.
For other operating systems, maybe you could measure the I/O latency and, if it is over some predefined threshold, put your disk scanner to sleep for a bit?
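A minimal sketch of that latency-based idea in Python, assuming the scanner reads files in chunks. The function name scan_with_backoff, the 50ms threshold and the 0.5s pause are arbitrary placeholders you would need to tune, and timing your own reads is only a rough proxy for how busy the disk is (reads served from cache will look fast).

```python
import time


def scan_with_backoff(paths, latency_threshold_s=0.05, pause_s=0.5,
                      chunk_size=1 << 20, sleep=time.sleep):
    """Read each file in chunks; pause whenever a single read exceeds the threshold.

    Returns (total bytes read, number of pauses taken).
    """
    bytes_read = 0
    pauses = 0
    for path in paths:
        with open(path, "rb") as handle:
            while True:
                start = time.perf_counter()
                chunk = handle.read(chunk_size)
                elapsed = time.perf_counter() - start
                if not chunk:
                    break  # end of file
                bytes_read += len(chunk)
                if elapsed > latency_threshold_s:
                    # A slow read suggests the disk is busy, so back off for a while.
                    pauses += 1
                    sleep(pause_s)
    return bytes_read, pauses
```

The sleep argument is only there so the pause can be swapped out (for a test, or for something smarter such as an increasing back-off).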
Take a look at this ("How can I programmatically limit my program’s CPU usage to below 70%?") and this ("Win32 Thread scheduling#The Larry Osterman answer")
How much traffic can one web server handle? What's the best way to see if we're beyond that?
I have an ASP.Net application that has a couple hundred users. Aspects of it are fairly processor intensive, but thus far we have done fine with only one server to run both SqlServer and the site. It's running Windows Server 2003, 3.4 GHz with 3.5 GB of RAM.
But lately I've started to notice slowdowns at various times, and I was wondering what's the best way to determine whether the server is overloaded by the usage of the application or whether I need to do something to fix the application (I don't really want to spend a lot of time hunting down little optimizations if I'm just expecting too much from the box).
What you need is some info on Capacity Planning..
Capacity planning is the process of planning for growth and forecasting peak usage periods in order to meet system and application capacity requirements. It involves extensive performance testing to establish the application's resource utilization and transaction throughput under load. First, you measure the number of visitors the site currently receives and how much demand each user places on the server, and then you calculate the computing resources (CPU, RAM, disk space, and network bandwidth) that are necessary to support current and future usage levels.
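As a rough illustration of that last calculation, here is a small Python sketch. Every number in it is a hypothetical input you would replace with your own measurements, and the function name required_servers and the 50% headroom default are assumptions made for this example.

```python
import math


def required_servers(peak_users, requests_per_user_per_min,
                     capacity_rps_per_server, headroom=0.5):
    """Estimate peak requests/second and how many servers that needs.

    headroom is the fraction of each server's measured capacity kept in reserve.
    """
    peak_rps = peak_users * requests_per_user_per_min / 60.0
    usable_rps = capacity_rps_per_server * (1.0 - headroom)
    servers = max(1, math.ceil(peak_rps / usable_rps))
    return peak_rps, servers


# Hypothetical: 300 users making 2 requests a minute, server measured at 200 req/s.
peak_rps, servers = required_servers(300, 2, 200)
print(peak_rps, servers)  # 10.0 1
```

The per-server capacity figure has to come from load testing your own application; the formula only turns those measurements into a server count.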
If you have access to some profiling tools (such as those in the Team Suite edition of Visual Studio), you can try to set up a testing server, run some synthetic requests against it, and see if there's any specific part of the code taking unreasonably long to run.
You should probably check some graphs of CPU and memory usage over time before doing this, to see if that can even be the cause. (A number similar to the UNIX "load average" could be a useful metric; I don't know if Windows has anything like it. Basically, it is the average number of threads that want CPU time in each time-slice.)
Also check the obvious, that you aren't running out of bandwidth.
Measure, measure, measure. Rico Mariani always says this, and he's right.
Measure req/sec, RAM, CPU, Sessions, etc.
You may come up with a caching strategy (Output caching, data caching, caching dependencies, and so on.)
See also how your SQL Server is doing... indexes are a good place to start but not the only thing to look at..
On that hardware, a .NET application should be able to serve about 200-400 requests per second. If you have only a few hundred users, I doubt you are seeing even 2 requests per second, so I think you have a lot of capacity on that box, even with SQL server running.
Without knowing all of the details, I would say no, you will not see any performance improvement by adding servers.
By the way, if you're not using the Output Cache, I would start there.