I want to control the RAM usage of my bokeh dashboards (in Python) while serving. Is there an option to do this from the command line?
Python is not Java, so there are no options built into Python itself to control memory usage, nor does Bokeh attempt to hand-roll any ad hoc solution in this area. The only option I can suggest offhand is to run the Bokeh server in some sort of lightweight VM or container that affords a max-memory option, or possibly to use Lifecycle Hooks to monitor memory usage yourself and destroy sessions if a limit is exceeded.
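If you go the Lifecycle Hooks route, a minimal sketch for a directory-style app might look like the following (psutil and the 1 GB ceiling are my assumptions, not anything built into Bokeh):

    # server_lifecycle.py -- Bokeh lifecycle hooks for a directory-style app.
    # Sketch only: psutil and the memory budget below are assumptions, not Bokeh defaults.
    import logging
    import psutil

    MAX_RSS_BYTES = 1 * 1024**3  # hypothetical per-process memory budget

    def on_session_created(session_context):
        # Check the resident memory of the serving process whenever a new session starts.
        rss = psutil.Process().memory_info().rss
        if rss > MAX_RSS_BYTES:
            logging.warning("Bokeh server RSS %.0f MB exceeds the budget; "
                            "consider expiring idle sessions or restarting", rss / 1e6)

    def on_session_destroyed(session_context):
        # A convenient place to observe how memory falls as sessions are cleaned up.
        logging.info("Session destroyed; RSS now %.0f MB",
                     psutil.Process().memory_info().rss / 1e6)

For a directory-style app started with bokeh serve myapp/, the lifecycle-hooks file is picked up automatically.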
Given a Java application running in the flexible environment, how is it possible to get a heap dump to see the heavy objects?
I would ideally import this into a tool like Eclipse MAT and analyze the heap dump.
Another great option would be Stackdriver Profiler showing this, but I only see CPU profiling there, not memory profiling.
You can connect to a Flex instance with SSH in a similar fashion to how you connect to GCE VM instances. Read more about it here. Once connected, you can take the heap dump the way you usually would.
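For example (a sketch, not an official procedure), assuming you have SSH-ed into the Flex VM, exec-ed into the app's container if your app runs in one, and have JDK tools such as jmap available:

    # Sketch only: find the Java process and write a binary heap dump for Eclipse MAT.
    # The pgrep pattern and output path are assumptions about your setup.
    import subprocess

    pid = subprocess.run(["pgrep", "-f", "java"],
                         capture_output=True, text=True).stdout.split()[0]
    subprocess.run(["jmap", "-dump:live,format=b,file=/tmp/heap.hprof", pid], check=True)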
I can't seem to find an application to monitor SQLite DB performance. Currently I have a test server that uses SQLite. I'm primarily concerned with obtaining a benchmark of storage requirements and performance for scaling this server to production.
I know that for MySQL there is the standard Nagios setup for monitoring (changing to MySQL is not an option at this point). Is there anything analogous for SQLite?
SQLite has functions like sqlite3_status() and sqlite3_db_status(), but those do not really give you the information you want, and might not even be available in all languages.
Anyway, SQLite is an embedded library, so you'd have to monitor your actual application. Tools like Nagios allow you to monitor a server's CPU load and disk usage, but you can also use any other tool your OS provides.
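Since the "database server" here is really your own process, one pragmatic approach is to collect the numbers yourself and feed them to whatever monitoring you already run. A minimal sketch (the function names and choice of pragmas are just one way to do it):

    # Sketch: application-level SQLite metrics you could push to Nagios, munin, etc.
    import sqlite3
    import time

    def sqlite_stats(path):
        con = sqlite3.connect(path)
        try:
            page_count = con.execute("PRAGMA page_count").fetchone()[0]
            page_size = con.execute("PRAGMA page_size").fetchone()[0]
            freelist = con.execute("PRAGMA freelist_count").fetchone()[0]
            return {"db_bytes": page_count * page_size,   # on-disk footprint
                    "free_bytes": freelist * page_size}   # reclaimable space
        finally:
            con.close()

    def timed_query(con, sql, params=()):
        # Wrap hot queries to get latency numbers for benchmarking.
        start = time.perf_counter()
        rows = con.execute(sql, params).fetchall()
        print(f"{sql[:40]!r}: {len(rows)} rows in "
              f"{(time.perf_counter() - start) * 1000:.1f} ms")
        return rows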
The screenshot below shows the change in the cache-related state of my Zope instance over time (3 months so far).
We've increased the cache size several times over that period, from 3000 all the way up to 6000000. With the exception of one recent blip, we have hit a ceiling of 30 million (not sure what parameter that is; see the 'by year' graph). This happened at a cache size of about 1000000, after which changes to the cache size seemed to have no effect on the cached objects or the memory usage of Zope.
The zope/plone process moved from using about 500 MB of memory to using 3GB (we have 8GB on this server).
What I expected was that sliding the cache size upwards would allow zope to take advantage of more of the available server memory, but it is stuck at 3GB (out of a potential 8GB on the server).
Is there another setting that might be "capping" things at 3GB?
At a guess, your OS is limiting per-process memory size.
In a bash shell, check ulimit -v to see if a virtual memory limit is set. See man bash for the full list of options for that command.
See Limit memory usage for a single Linux process for more information on how to use ulimit.
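Since Zope is a Python process, you can also check from the inside which limits the running process actually sees. A small sketch using the standard resource module:

    # Sketch: print the limits the current process sees (run it in a Python shell
    # started the same way as Zope) to confirm whether a ulimit is the cap.
    import resource

    def show_limit(name, which):
        soft, hard = resource.getrlimit(which)
        fmt = lambda v: "unlimited" if v == resource.RLIM_INFINITY else f"{v / 1024**3:.1f} GiB"
        print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")

    show_limit("RLIMIT_AS (virtual memory, ulimit -v)", resource.RLIMIT_AS)
    show_limit("RLIMIT_DATA (data segment, ulimit -d)", resource.RLIMIT_DATA)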
I don't know what's going on with the memory of your server, but you are approaching this the wrong way: you simply can't have 6 million objects in memory; that's impossible. On a typical Plone 4.x installation you would need somewhere between 50 GB and 150 GB of memory for that.
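A back-of-the-envelope check of that claim (the per-object sizes are assumed for illustration only):

    # Rough arithmetic behind the 50-150 GB figure above; 8-25 KB per cached object
    # is an assumed range, not a measured Plone value.
    cache_size = 6_000_000                 # the cache size from the question
    for avg_kb in (8, 25):
        total_gb = cache_size * avg_kb / 1024**2
        print(f"{cache_size:,} objects x {avg_kb} KB ≈ {total_gb:.0f} GB per thread")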
First, to solve this we need more information: which Plone version are you using? How many objects do you have in your database? How many threads? Is your server 32-bit or 64-bit?
Second, be sure to install the latest version of the munin.zope plugin to collect reliable information about your server (hat tip to Leo Rochael).
Third, read this thread on the core developers list to understand how to calculate a more realistic number for your cache size (hat tip to Hanno Schlichting).
Fourth, move the number up slowly and take time to monitor the results; check the total number of objects in memory and avoid swapping at all costs. You can stop increasing the cache size if the number of cached objects stays below your target value. Remember: you're never going to have all the objects in memory; that's quite difficult, because people tend to visit only a subset of your content.
Fifth, if you are on Plone 4.x, test DateTime 3.0.3 (on a staging server before putting it into production); this could further decrease your memory consumption by up to 40% (somebody told me it now works on Plone 3.x as well, but I haven't checked it myself).
Sixth, share your results on the Plone setup list!
Good night, and good luck!
A 32-bit platform (I don't know whether this is specific to Intel) is limited to about 3GB per process. That is because it can only address 4GB of virtual memory per process, and the top 1GB of that address space is reserved for the kernel. PAE allows the machine as a whole to address up to 64GB of RAM, but the per-process limitation remains, and that is what you are running into here. You really cannot run a high-traffic Plone site on a 32-bit platform anymore. Quite often the simplest solution is to upgrade your OS to the 64-bit version, because unless you have seriously ancient hardware, it should already be capable of running x86-64.
I would like to schedule and distribute the execution of R scripts (using Rserve, for instance) across several machines, Windows or Ubuntu, where each task runs on a single machine.
I don't want to reinvent the wheel and would like to use a system that already exists to distribute these tasks in an optimal manner and ideally have a GUI to control the proper execution of the scripts.
1/ Is there an R package or library that can be used for that?
2/ One library that seems to be quite widely used is MapReduce with Apache Hadoop.
I have no experience with this framework. What installation/plugin/setup would you advise for my purpose?
Edit: Here are more details about my setup:
I do indeed have an office full of machines (small servers or workstations) that are sometimes also used for other purposes. I want to use the computing power of all these machines and distribute my R scripts across them.
I also need a scheduler, e.g. a tool to run the scripts at a fixed time or at regular intervals.
I am using both Windows and Ubuntu, but a good solution on one of the systems would be sufficient for now.
Finally, I don't need the server to get back the results of the scripts. The scripts do things like accessing a database and saving files, but do not return anything. I would just like to get back the errors/warnings if there are any.
If what you want to do is distribute jobs for parallel execution on machines you have physical access to, I HIGHLY recommend the doRedis backend for foreach. You can read the vignette PDF for more details. The gist is as follows:
Why write a doRedis package? After all, the foreach package already has available many parallel back end packages, including doMC, doSNOW and doMPI. The doRedis package allows for dynamic pools of workers. New workers may be added at any time, even in the middle of running computations. This feature is relevant, for example, to modern cloud computing environments. Users can make an economic decision to "turn on" more computing resources at any time in order to accelerate running computations. Similarly, modern cluster resource allocation systems can dynamically schedule R workers as cluster resources become available.
Hadoop works best if the machines running Hadoop are dedicated to the cluster, and not borrowed. There's also considerable overhead to setting up Hadoop, which can be worth the effort if you need the map/reduce algorithm and distributed storage that Hadoop provides.
So what, exactly, is your configuration? Do you have an office full of machines on which you want to distribute R jobs? Do you have a dedicated cluster? Is this going to be EC2 or another "cloud" based setup?
The devil is in the details, so you can get better answers if the details are explicit.
If you want the workers to do jobs and have the results of those jobs collected back on one master node, you'll be much better off using a dedicated R solution rather than a system like TakTuk or dsh, which are more general parallelization tools.
Look into TakTuk and dsh as starting points. You could perhaps roll your own mechanism with pssh or clusterssh, though these may be more effort.
I'm interested in using a statistical programming language within a web site I'm building to do high-performance stats processing that will then be displayed on the web.
I'm wondering if an R compiler can be embedded within a web server and threaded to work well with the LAMP stack, so that it integrates smoothly with the front end and back end of the site and improves its performance.
If R is not the right choice for such an application, then perhaps there is another tool that is?
The general rule is that the web server itself should do NO calculations; whatever you do, it will always end in a bad user experience. Instead, the server should respond to a calculation request by scheduling the job on some worker process, show the user a nice "working" status, and then push the results obtained from the worker when they are ready (most likely with AJAX polling or a more recent COMET-style approach).
Of course this requires some RPC protocol to R and some queuing agent; this can be done either with background processes (easy yet slow), R HTTP servers (more difficult yet faster), or real RPC like Rserve or triggr (hard, yet fast to ultra-fast).
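To make the "real RPC" option concrete, here is a minimal sketch of a backend worker (not the web server itself) talking to a running Rserve instance through the pyRserve client; the host, port, variable name and R expression are placeholders:

    # Sketch: a worker process sends data and an R expression to Rserve and returns
    # the result so the web tier can pick it up when it polls. Details are assumptions.
    import pyRserve

    def run_stats_job(values):
        conn = pyRserve.connect(host="localhost", port=6311)  # Rserve's default port
        try:
            conn.r.x = values                 # assign the payload to variable x in R
            return conn.eval("summary(x)")    # run the computation, pull back the result
        finally:
            conn.close()

    # e.g. run_stats_job([1.0, 2.5, 3.7]) from a queue consumer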
You are confusing two issues.
Yes, R can be used via a web platform. In fact, the R FAQ has an entire section on this. In the fifteen-plus years that both R and 'the Web' have risen to prominence, many such frameworks have been proposed. And since R 2.13.0, R even has its own embedded web server (to drive documentation display).
Yes, R scripts can run faster via the bytecode compiler, but that does not give you orders of magnitude.