I have a package with global variables for an open file (*os.File) and the logger associated with it.
Separately, I'm going to build several commands that use that package, and I don't want to open the file and set it as the logger every time I run a command.
So, the first program to run will set the global variables, and here
is my question:
Can the next programs that use the package access those global variables without a problem? I could create a command with a flag that initializes those values before they are used by other programs, and another flag that finishes it (unsets the global variables in the package).
If that is not possible, what is the best option to avoid that I/O overhead? A server listening on a Unix socket?
Assuming by 'program' you actually mean 'process', the answer is no.
If you want to share (perhaps customized) logging functionality between processes, then I would consider a daemon-like process/server (AFAIK Go doesn't yet support writing true daemons) and whatever kind of IPC you find handy.
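The pattern itself is language-agnostic. Purely to illustrate the shape (shown here in Python, with hypothetical paths), the daemon owns the open log file and other processes send it lines over a Unix socket:

import os
import socket

SOCKET_PATH = "/tmp/mylogger.sock"   # hypothetical socket path
LOG_PATH = "/tmp/my.log"             # hypothetical log file

def serve():
    # The daemon is the only process that ever opens the log file.
    if os.path.exists(SOCKET_PATH):
        os.unlink(SOCKET_PATH)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv, \
         open(LOG_PATH, "a") as log:
        srv.bind(SOCKET_PATH)
        srv.listen()
        while True:
            conn, _ = srv.accept()
            with conn:
                data = conn.recv(4096)   # one log line per connection, for simplicity
                if data:
                    log.write(data.decode())
                    log.flush()

if __name__ == "__main__":
    serve()

Each command then connects to SOCKET_PATH and sends its log line instead of opening the file itself.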
So I'm using Hydra 1.1 and hydra-ax-sweeper==1.1.5 to manage my configuration and run some hyper-parameter optimization on the minerl environment. For this purpose, I load a lot of data into memory with multiprocessing (via PyTorch): the peak is around 50 GB while loading, dropping to 30 GB once it is fully loaded.
On a normal run this is not a problem (my machine has 90+ GB of RAM); one training run finishes without any issue.
However, when I run the same code with the -m option (and hydra/sweeper: ax in the config), the code stops after about 2-3 sweeper runs, getting stuck at the data loading phase because all of the system's memory (plus swap) is occupied.
First I thought this was some issue with the minerl environment code, which starts Java code in a sub-process. So I tried to run my code without the environment (only the 30 GB of data), and I still have the same issue. So I suspect there is some memory leak in between the Hydra sweeper runs.
So my question is: how does the Hydra sweeper (or ax-sweeper) work in between sweeps? I always had the impression that it runs the main(cfg: DictConfig) function decorated with @hydra.main(...), takes the scalar return value (the score), and runs the Bayesian optimizer with that score, with main() called like an ordinary function (everything inside being properly deallocated/garbage collected between sweep runs).
Is this not the case? Should I then load the data somewhere outside the main() and keep it between sweeps?
Thank you very much in advance!
The hydra-ax-sweeper may run trials in parallel, depending on the result of calling the get_max_parallelism function defined in ax.service.ax_client.
I suspect that your machine is running out of memory because of this parallelism.
Hydra's Ax plugin does not currently have a config group for configuring this max_parallelism setting, so it is automatically set by ax.
Loading the data outside of main (as you suggested) may be a good workaround for this issue.
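For illustration, a minimal sketch of that workaround, assuming the launcher runs the sweep jobs in the same process (which the built-in basic launcher does); load_huge_dataset and train are hypothetical stand-ins for your own loading and training code:

import hydra
from omegaconf import DictConfig

_DATA = None  # module-level cache that survives between sweep runs in one process

def get_data():
    global _DATA
    if _DATA is None:
        _DATA = load_huge_dataset()   # hypothetical: your expensive ~30 GB load
    return _DATA

@hydra.main(config_path="conf", config_name="config")
def main(cfg: DictConfig) -> float:
    data = get_data()                 # loaded once, reused by later sweep runs
    return train(cfg, data)           # hypothetical: returns the scalar score for Ax

if __name__ == "__main__":
    main()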
Hydra sweepers in general do not have a facility to control concurrency. That is the responsibility of the launcher you are using.
The built-in basic launcher runs the jobs serially so it should not trigger memory issues.
If you are using other launchers, you may need to control their parallelism via launcher-specific parameters.
For a Shiny app that I am making, I have to define some variables in the global environment, as they need to be available to many functions here and there. Some of these variables don't exist to start with and are created as the user interacts with the app. The app is supposed to check for the existence of the variables and, if they don't exist, do something. However, after one session of use, the variables come into existence and stay in the global environment. When the user starts the app again, the app sees the variables in the global environment, so it behaves in a way it is not supposed to. Is there a way I can remove the variables I create just before the user terminates the app? Any help is highly appreciated.
The right way to solve that would be to use the onStop function, as in:
onStop(function() cat("Session stopped\n"))
The linked documentation suggests using it within the server function.
Create a function to clean up when exiting, using on.exit. on.exit records the expression given as its argument as needing to be executed when the current function exits (either naturally or as the result of an error). This is useful for resetting graphical parameters or performing other cleanup actions.
on.exit(rm(list = myListOfThings))  # myListOfThings: a character vector of variable names
When running GNU make rules with -jN, make creates a jobserver for managing the job count across sub-makes. Additionally, you can "pass the jobserver environment" to a make recipe by prefixing it with +, e.g.:
target :
+./some/complex/call/to/another/make target
Now, instead of a sub-make, I have a (Python) script which runs some complex packaging actions (too complex for make). One of the actions it runs can actually spawn a make command.
package.stamp : $(DEPS)
+./packaging.py $(ARGS)
touch $@
Now when that make command is invoked inside packaging.py, I get:
make[1]: warning: jobserver unavailable: using -j1. Add `+' to parent make rule.
This makes some sense, because whatever environment is set up by make may not be honoured or passed through by Python.
Is it possible to pass the jobserver references through the Python program to the sub-make, and if so, how?
There are two aspects to the jobserver that must be preserved: the first is an actual environment variable, which make uses to send options to sub-makes. That value is being preserved properly, or else make would not know that it should even look for the jobserver and you would not see that warning message.
The second aspect is the two open file descriptors which are passed to the children of make. Your script MUST preserve these two descriptors and leave them open when it invokes the sub-make.
You don't show us what Python code is being used to invoke the sub-make. In Python 2, the subprocess module did not close file descriptors by default (you could pass close_fds=True to make it do so); in Python 3, close_fds defaults to True, so you need to keep the jobserver descriptors open explicitly (for example via pass_fds, or close_fds=False) if you want parallel make invocations to work properly with the jobserver.
If you're not using subprocess, then you'll have to show us what you are doing.
You should probably mark this with a python tag as it's mainly a Python question.
To summarise and clarify the answer - for the jobserver to work in your sub-processes you need to preserve:
Environment variables
The jobserver fds
One of the environment variables passed looks (for me) as follows:
MAKEFLAGS= --jobserver-fds=3,4 -j -- NAME=VALUE
jobserver-fds communicates which fds make has opened to communicate with the jobserver. For the sub-make to be able to use the jobserver, you should thus preserve, or arrange to be available, those specific fds (or else rewrite the environment variable appropriately to point to whichever fds they end up on).
NAME=VALUE represents arguments I passed to the original make invocation.
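For illustration, a minimal sketch (assuming Python 3, where close_fds defaults to True) of how packaging.py could preserve both pieces when it spawns the sub-make; the fd numbers are read from MAKEFLAGS rather than hard-coded, and the sub-make directory/target are hypothetical:

import os
import re
import subprocess

def jobserver_fds():
    # Return the jobserver fds named in MAKEFLAGS (e.g. --jobserver-fds=3,4),
    # or an empty tuple if make did not pass a jobserver.
    flags = os.environ.get("MAKEFLAGS", "")
    m = re.search(r"--jobserver-(?:fds|auth)=(\d+),(\d+)", flags)
    return (int(m.group(1)), int(m.group(2))) if m else ()

# MAKEFLAGS itself is inherited through the (unmodified) environment;
# pass_fds keeps the jobserver pipe open in the child despite close_fds=True.
subprocess.run(["make", "-C", "some/subdir", "target"],   # hypothetical sub-make
               pass_fds=jobserver_fds(),
               check=True)

The recipe that runs packaging.py still needs the + prefix, as in the question, so that make exports the jobserver information in the first place.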
I created a client-server application and now I would like to deploy it.
During development I started the server in a terminal, and when I wanted to stop it I just typed Ctrl-C.
Now I want to be able to start it in the background and stop it whenever I want by just typing:
/etc/init.d/my_service {start|stop}
I know how to write an init script, but the problem is how to actually stop the process.
I first thought to retrieve the PID with something like:
ps aux | grep "my_service"
Then I found a better idea, still using the PID: storing it in a file in order to retrieve it when trying to stop the service.
That still felt too dirty and unsafe, so I eventually thought about using sockets to let the "stop" process tell the running process to shut down.
I would like to know how this is usually done, or rather, what the best way to do it is.
I checked some of the files in init.d, and some of them use PID files, but with a particular command, start-stop-daemon. I am a bit suspicious of this method, which seems unsafe to me.
If you have a utility like start-stop-daemon available, use it.
start-stop-daemon is flexible and can use 4 different methods to find the process ID of the running service. It uses this information (1) to avoid starting a second copy of the same service when starting, and (2) to determine which process ID to kill when stopping the service.
--pidfile: Check whether a process has created the file pid-file.
--exec: Check for processes that are instances of this executable
--name: Check for processes with the name process-name
--user: Check for processes owned by the user specified by username or uid.
The best one to use in general is probably --pidfile. The others are mainly intended to be used in case the service does not create a PID file. --exec has the disadvantage that you cannot distinguish between two different services implemented by the same program (i.e. two copies of the same service). This disadvantage would typically apply to --name also, and, additionally, --name has a chance of matching an unrelated process that happens to share the same name. --user might be useful if your service runs under a dedicated user ID which is used by nothing else. So use --pidfile if you can.
For extra safety, the options can be combined. For example, you can use --pidfile and --exec together. This way, you can identify the process using the PID file, but don't trust it if the PID found in the PID file belongs to a process that is using the wrong executable (it's a stale/invalid PID file).
I have used the option names provided by start-stop-daemon to discuss the different possibilities, but you need not use start-stop-daemon: the discussion applies just as well if you use another utility or do the matching manually.
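For illustration only, here is a minimal Python sketch of doing that matching manually, combining the PID-file check with the executable check; the paths are hypothetical:

import os
import signal

PID_FILE = "/var/run/my_service.pid"        # hypothetical PID file written by the service at startup
EXECUTABLE = "/usr/local/bin/my_service"    # hypothetical service binary

def stop_service():
    with open(PID_FILE) as f:
        pid = int(f.read().strip())
    try:
        exe = os.readlink(f"/proc/{pid}/exe")   # Linux-specific; needs suitable privileges
    except OSError:
        print("stale PID file: no such process")
        return
    if exe != EXECUTABLE:
        print("stale PID file: the PID was reused by another program")
        return
    os.kill(pid, signal.SIGTERM)

if __name__ == "__main__":
    stop_service()

This is essentially what --pidfile combined with --exec does for you, which is why using start-stop-daemon (when available) is the simpler option.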
I have a few workflows where I would like R to halt the Linux machine it's running on after a script completes. I can think of two similar ways to do this:
run R as root and then call system("halt")
run R from a root shell script (could run the R script as any user) then have the shell script run halt after the R bit completes.
Are there other easy ways of doing this?
The use case here is for scripts running on AWS, where I would like the instance to stop after the script completes so that I don't get charged for machine time after the job runs. The instance I use for data analysis is EBS-backed, so I don't want to terminate it, simply stop/suspend it. Issuing a halt command from inside the instance has the same effect as a stop/suspend from the AWS console.
I'm impressed that works. (For anyone else surprised that an instance can stop itself, see notes 1 & 2.)
You can also try "sudo halt", as you wouldn't need to run as a root user, as long as the user account running R is capable of running sudo. This is pretty common on a lot of AMIs on EC2.
Be careful about what constitutes an assumption of R quitting - believe it or not, one can crash R. It may be better to have a separate script that watches the R pid and, once that PID is no longer active, terminates the instance. Doing this command inside of R means that if R crashes, it never reaches the call to halt. If you call it from within another script, that can be dangerous, too. If you know Linux well, what you're looking for is the PID from starting R, which you can pass to another script that checks ps, say every 1 second, and then terminates the instance once the PID is no longer running.
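As an illustration of that watcher idea, a minimal Python sketch (hypothetical; it assumes the watcher runs as the same user as R and that sudo halt works without a password, as discussed elsewhere in this thread):

import os
import subprocess
import sys
import time

pid = int(sys.argv[1])            # PID of the R process, passed in by whoever started R

while True:
    try:
        os.kill(pid, 0)           # signal 0: existence check only, sends nothing
    except ProcessLookupError:
        break                     # R exited, normally or by crashing
    time.sleep(1)

subprocess.run(["sudo", "halt"])  # or an EC2 API/CLI call to stop the instance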
I think a better solution is to use the EC2 API tools (see: http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/ for documentation) to terminate OR stop instances. There's a difference between the two of these, and it matters if your instance is EBS backed or S3 backed. You needn't run as root in order to terminate the instance - the fact that you have the private key and certificate shows Amazon that you're the BOSS, way above the hoi polloi who merely have root access on your instance.
Because these credentials can be used for mischief, be careful about running the API tools from a given server: you'll need your certificate and private key on that server, which is a bad idea in the event that you have a security problem. It would be better to send a message to a master server and have it shut down the instance. If you have messaging set up in any way between instances, this can do all the work for you.
Note 1: Eric Hammond reports that halt will only suspend an EBS instance, so you still incur storage fees. If you happen to start a lot of such instances, this can clutter things up. Your original question seems unclear about whether you mean to terminate or stop an instance. He has other good advice on this page.
Note 2: A short thread on the EC2 developers forum gives advice for Linux & Windows users.
Note 3: EBS instances are billed for partial hours, even when restarted. (See this thread from the developer forum.) Having an auto-suspend close to the hour mark can be useful, assuming the R process isn't working, in case one might re-task that instance (i.e. to save on not restarting). Other useful tools to consider: setTimeLimit and setSessionTimeLimit, and various checkpointing tools (I have a Q that mentions a couple). Using an auto-kill is useful if one has potentially badly behaved code.
Note 4: I recently learned of the shutdown command in package fun. This is multi-platform. See this blog post for commentary, and code is here. Dangerous stuff, but it could be useful if you want to adapt to Windows. I haven't tried it, though.
Update 1. Three more ideas:
You could use .Last() and runLast = TRUE for q() and quit(), which could shut down the instance.
If using littler, or a wrapper script that invokes the R script via Rscript, the same command-line functions could be used.
My favorite package of today, tcltk2, has a neat timer mechanism, called tclTaskSchedule(), that can be used to schedule the execution of an expression. You could then go crazy with executing things just before an hourly interval has elapsed.
system("echo 'rootpassword' | sudo halt")
However, the downside is having your root password in plain text in the script.
AFAIK those ways you mentioned are the only ones. In any case the script will have to run as root to be able to shut down the machine (if you find a way to do it without root that's possibly an exploit). You ask for an easier way but system("halt") is just an additional line at the end of your script.
sudo is an option -- it allows you to run certain commands without prompting for any password. Just put something like this in /etc/sudoers
<username> ALL=(ALL) PASSWD: ALL, NOPASSWD: /sbin/halt
(of course replacing <username> with the name of the user running R) and system('sudo halt') should just work.