I'm running Serverspec integration smoke tests with Test Kitchen on a system built with Vagrant+Chef Solo. When i run kitchen test then the tests are started right after successful converge, and some of my tests fail because it takes time for the system to fully start up for the first time.
So i'm wondering what would be a good way to insert delay between converge and verify, otherwise preserving the default behaviour of kitchen test? Currently i have the following ideas:
write a shell script that does kitchen converge+check if converge was unsuccessful, then abort+sleep xx+kitchen verify+if successful then kitchen destroy. But this would not allow to run multiple suites on parallel (i'm testing multiple versions of the system).
create a recipe that just executes sleep xx and append it to the end of chef run list. This seems to work, but looks a bit too "hacky" for me.
Does anyone know a better way?
taavi
For now i went on with idea 2. Also created a feature request: https://github.com/test-kitchen/test-kitchen/issues/598
Related
So I'm using Hydra 1.1 and hydra-ax-sweeper==1.1.5 to manage my configuration, and run some hyper-parameter optimization on minerl environment. For this purpose, I load a lot of data in to memory (peak around 50Gb while loading with multiprocessing, drops to 30Gb after fully loaded) with multiprocessing (by pytorch).
On a normal run this is not a problem (My machine have 90+Gb RAM), one training finish without any issue.
However, when I run the same code with -m option (and hydra/sweeper: ax in config), the code stops after about 2-3 sweeper runs, getting stuck at the data loading phase, because all memories of the system (+swap memory) is occupied.
First I thought this was some issue with minerl environment code, which starts java-code in sub-process. So I tried to run my code without the environment (only the 30Gb data), and I still have the same issue. So I suspect I have some memory-leak inbetween the Hydra sweeper.
So my question is, How does Hydra sweeper(or ax-sweeper) work in-between sweeps? I always had the impression that it runs the main(cfg: DictConfig) decorated with #hydra.main(...), takes a scalar return(score) and run the Bayesian optimizer with this score, with main() called similar to a function (everything inside being properly deallocated/garbage collected between each sweep-run).
Is this not the case? Should I then load the data somewhere outside the main() and keep it between sweeps?
Thank you very much in advance!
The hydra-ax-sweeper may run trials in parallel, depending on the result of calling the get_max_parallelism function defined in ax.service.ax_client.
I suspect that your machine is running out of memory because of this parallelism.
Hydra's Ax plugin does not currently have a config group for configuring this max_parallelism setting, so it is automatically set by ax.
Loading the data outside of main (as you suggested) may be a good workaround for this issue.
Hydra sweepers in general does not have a facility to control concurrency. This is the responsibility of the launcher you are using.
The built-in basic launcher runs the jobs serially so it should not trigger memory issues.
If you are using other launchers, you may need to control their parallelism via Launcher specific parameters.
I am using karate 9.0.0 and running feature files in parallel and generating
cucumber report using karate parallel run code. Problem is that in the report in feature overview its showing the total time execution as
feature 1 execution time + feature 2 execution time +feature 3 execution time = total execution time
but actual execution time is less if i am running features in parallel in more than 1 thread. How can i show and calculate the test suite run time.
It is reported on the console. I don't understand why you need to worry about it as long as your tests are running reasonably fast.
Anyway, if you really want to capture this, just use the methods of the Results class that you get from the Runner.parallel(). For example you have a getElapsedTime() method.
I got robot framework tests in below schema:
Suite Setup
Test Case 1
Test Case 2
Test Case 3
...
Suite Teardown
In tear down step I have got a loop that goes through all tests cases and do some additional checks for all test cases (i can do this when test cases are execute because it need to wait some time for some operations in external system). If any of this checks will fail, the teardown step is fail and it also fail every test case. I can set tear down keyword to don't fail tear down step but than I will have all pass in test suite.
Is there any option/feature (or walkaround) that will give me possibility to set status and error message of selected test case in tear down step (something like tc[23].status=fail, tc[23].message='something'.
This is not possible, at least not out-of-the-box. In any event I also think this is not a desirable test approach. Every test should be self contained and all the logic to assess PASS or FAIL should be in that test. Revisiting the result is in my view an anti-pattern.
It is understandable that when there is a large pause of inactivity that you would like to progress with your tests. However, I think that parallelising your tests is a better and more stable approach. For Robot Framework there is Pabot to help you with that, but creating your own test runner is possible.
We are using Robot framework and RIDE tool for test case execution. we have 100+ testcases and test execution takes more than 6 hours to complete.
RF result and log html is great for viewing results. But these 2 files are viewable only after completion of test case execution.
Is there any plugin / tool or mechanism to view the testcase result status during execution. in RIDE tool -"Run" tab - only shows pass:<> fail:<> and not very user useful.
Need real time testcase status report instead of waiting for completion
You can use the listener interface. With it, you can have robot framework call a python function each time a keyword, testcase or suite starts and finishes. For the case where they finish, the data that is passed in will include the pass or fail status.
Using the listener interface (as Bryan Oakley suggested) is surely the most flexible way to intercept test progession status. If you are looking for tools, Jenkins (with Robot Framework plugin) gives you the opportunity to follow a test run in real time at test case granularity. Just start a job and switch to (Jenkins) console to see the output dropping in.
I have a few work flows where I would like R to halt the Linux machine it's running on after completion of a script. I can think of two similar ways to do this:
run R as root and then call system("halt")
run R from a root shell script (could run the R script as any user) then have the shell script run halt after the R bit completes.
Are there other easy ways of doing this?
The use case here is for scripts running on AWS where I would like the instance to stop after script completion so that I don't get charged for machine time post job run. My instance I use for data analysis is an EBS backed instance so I don't want to terminate it, simply suspend. Issuing a halt command from inside the instance is the same effect as a stop/suspend from AWS console.
I'm impressed that works. (For anyone else surprised that an instance can stop itself, see notes 1 & 2.)
You can also try "sudo halt", as you wouldn't need to run as a root user, as long as the user account running R is capable of running sudo. This is pretty common on a lot of AMIs on EC2.
Be careful about what constitutes an assumption of R quitting - believe it or not, one can crash R. It may be better to have a separate script that watches the R pid and, once that PID is no longer active, terminates the instance. Doing this command inside of R means that if R crashes, it never reaches the call to halt. If you call it from within another script, that can be dangerous, too. If you know Linux well, what you're looking for is the PID from starting R, which you can pass to another script that checks ps, say every 1 second, and then terminates the instance once the PID is no longer running.
I think a better solution is to use the EC2 API tools (see: http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/ for documentation) to terminate OR stop instances. There's a difference between the two of these, and it matters if your instance is EBS backed or S3 backed. You needn't run as root in order to terminate the instance - the fact that you have the private key and certificate shows Amazon that you're the BOSS, way above the hoi polloi who merely have root access on your instance.
Because these credentials can be used for mischief, be careful about running API tools from a given server, you'll need your certificate and private key on the server. That's a bad idea in the event that you have a security problem. It would be better to message to a master server and have it shut down the instance. If you have messaging set up in any way between instances, this can do all the work for you.
Note 1: Eric Hammond reports that the halt will only suspend an EBS instance, so you still have storage fees. If you happen to start a lot of such instances, this can clutter things up. Your original question seems unclear about whether you mean to terminate or stop an instance. He has other good advice on this page
Note 2: A short thread on the EC2 developers forum gives advice for Linux & Windows users.
Note 3: EBS instances are billed for partial hours, even when restarted. (See this thread from the developer forum.) Having an auto-suspend close to the hour mark can be useful, assuming the R process isn't working, in case one might re-task that instance (i.e. to save on not restarting). Other useful tools to consider: setTimeLimit and setSessionTimeLimit, and various checkpointing tools (I have a Q that mentions a couple). Using an auto-kill is useful if one has potentially badly behaved code.
Note 4: I recently learned of the shutdown command in package fun. This is multi-platform. See this blog post for commentary, and code is here. Dangerous stuff, but it could be useful if you want to adapt to Windows. I haven't tried it, though.
Update 1. Three more ideas:
You could use .Last() and runLast = TRUE for q() and quit(), which could shut down the instance.
If using littler or a script that invokes the script via Rscript, the same command line functions could be used.
My favorite package of today, tcltk2 has a neat timer mechanism, called tclTaskSchedule() that can be used to schedule the execution of an expression. You could then go crazy with the execution of stuff just before a hourly interval has elapsed.
system("echo 'rootpassword' | sudo halt")
However, the downside is having your root password in plain text in the script.
AFAIK those ways you mentioned are the only ones. In any case the script will have to run as root to be able to shut down the machine (if you find a way to do it without root that's possibly an exploit). You ask for an easier way but system("halt") is just an additional line at the end of your script.
sudo is an option -- it allows you to run certain commands without prompting for any password. Just put something like this in /etc/sudoers
<username> ALL=(ALL) PASSWD: ALL, NOPASSWD: /sbin/halt
(of course replacing with the name of user running R) and system('sudo halt') should just work.