I have an Ant target like the one below:
<target name="validate_master_resources" description="Ensure the target is valid for local test run">
    <exec executable="bash" outputproperty="swap">
        <arg value="-c"/>
        <arg value="free -m | grep '^Swap:' | awk '{print $4}'"/>
    </exec>
    <if>
        <islessthan arg1="${swap}" arg2="5000"/>
        <then>
            <echo>the value of the swap memory is: ${swap}</echo>
            <sleep minutes="10"/>
            <runtarget target="validate_master_resources"/>
        </then>
    </if>
</target>
As you can see, it's a kind of recursive target: validate_master_resources calls itself if certain conditions are met.
I decided to use runtarget (instead of antcall) to avoid the creation of a new Ant process.
When tested ad hoc for a few minutes, the job works as expected, but in the first two "real world" runs it got stuck after 20 minutes...
I'm running the job in Jenkins and, as said, the job just hangs after 20 minutes.
As you can see below, the target has been called many times, then got stuck for hours, and I needed to kill it...
validate_master_resources:
[echo] The number of current Java processes is: 33 summed to the 45 is 78
validate_master_resources:
[echo] The number of current Java processes is: 33 summed to the 45 is 78
validate_master_resources:
[echo] The number of current Java processes is: 33 summed to the 45 is 78
validate_master_resources:
[echo] The number of current Java processes is: 33 summed to the 45 is 78
validate_master_resources:
[echo] The number of current Java processes is: 33 summed to the 45 is 78
validate_master_resources:
[echo] The number of current Java processes is: 33 summed to the 45 is 78
Now, my question is: can this problem (the stuck build) be related to the fact that this Ant target is recursive?
Are there any tasks or tags that help me manage recursions like this?
Am I using the task in an unsupported way, and is the wrong behaviour caused by this particular loop that I created?
Thanks
There was no issue here, stupid question. The only problem was that I was not unsetting the variables in the loop, so I was always using the same values; this caused issues in the rest of the script, which made the process hang...
Resolved! Always be careful using variables in Ant; the scope can sometimes be a pain to manage!
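For reference, the fix amounts to something like the sketch below, assuming the ant-contrib library (which already provides <if> above) is available: plain Ant properties are immutable once set, so without clearing ${swap} the first value sticks for every iteration. ant-contrib's <var> task can unset the property before each re-read.
<!-- sketch assuming ant-contrib: clear the stale property first -->
<var name="swap" unset="true"/>
<exec executable="bash" outputproperty="swap">
    <arg value="-c"/>
    <arg value="free -m | grep '^Swap:' | awk '{print $4}'"/>
</exec>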
I am trying to run my julia code on multiple nodes of a cluster, which uses Moab and Torque for the scheduler and resource manager.
In an interactive session where I requested 3 nodes, I load julia and openmpi modules and run:
mpirun -np 72 --hostfile $PBS_NODEFILE -display-allocation julia --project=. "./estimation/test.jl"
mpirun does successfully recognize my 3 nodes, since it displays:
====================== ALLOCATED NODES ======================
comp-bc-0383: slots=24 max_slots=0 slots_inuse=0 state=UP
comp-bc-0378: slots=24 max_slots=0 slots_inuse=0 state=UNKNOWN
comp-bc-0372: slots=24 max_slots=0 slots_inuse=0 state=UNKNOWN
=================================================================
However, after that it returns an error message:
--------------------------------------------------------------------------
mpirun was unable to find the specified executable file, and therefore
did not launch the job. This error was first reported for process
rank 48; it may have occurred for other processes as well.
NOTE: A common cause for this error is misspelling a mpirun command
line parameter option (remember that mpirun interprets the first
unrecognized command line token as the executable).
Node: comp-bc-0372
Executable: /opt/aci/sw/julia/1.5.3_gcc-4.8.5-ips/bin/julia
--------------------------------------------------------------------------
What could be the possible cause of this? Is it because it has trouble accessing julia from the other nodes? (I think this is the case, because the code runs as long as -np X with X <= 24, which is the number of slots on one node; as soon as X >= 25, it fails to run.)
Here is a good manual on how to work with modules and mpirun: UsingMPIstacksWithModules
To sum up what is written in the manual:
It should be highlighted that modules are nothing more than a structured way to manage your environment variables; so whatever hurdles there are with modules apply equally well to environment variables.
What you need to do is export the environment variables in your mpirun command with -x PATH -x LD_LIBRARY_PATH. To see if this worked, you can then run
mpirun -np 72 --hostfile $PBS_NODEFILE -display-allocation -x PATH -x LD_LIBRARY_PATH which julia
Also, you should consider giving the whole path of the file you want to run, so /path/to/estimation/test.jl instead of ./estimation/test.jl, since your working directory is not necessarily the same on every node. (In general it is always safer to use whole paths.)
By using whole paths, you should also be able to use /path/to/julia (that is, the output of which julia) instead of only julia; this way you should not need to export the environment variables.
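Putting both suggestions together, the full command would look something like the sketch below; the julia path is taken from the error message above, while the project and script paths are placeholders for wherever the code actually lives on a filesystem shared by all nodes.
# full paths everywhere, so every node resolves the same binary and script;
# -x exports the listed environment variables to the remote ranks
mpirun -np 72 --hostfile $PBS_NODEFILE -x PATH -x LD_LIBRARY_PATH \
  /opt/aci/sw/julia/1.5.3_gcc-4.8.5-ips/bin/julia \
  --project=/path/to/project /path/to/estimation/test.jl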
I have an LVM hard drive. It holds all my media for use by Kodi. Occasionally (about once a week) it cannot access the media, and attempting to remount the device with sudo mount -a results in an Input/Output error.
The diagnosis from various sources was that the drive contains bad blocks, so I ran fsck -cc /dev/icybox/media to do a non-destructive read-write badblocks check.
It took 5 days, but it finally finished. Good news: no read or write errors, but a couple of hundred corrupted blocks.
Here is some of the output:
# fsck -cc -V /dev/icybox/media
fsck from util-linux 2.34
[/usr/sbin/fsck.ext4 (1) -- /mnt/icybox] fsck.ext4 -cc /dev/mapper/icybox-media
e2fsck 1.45.5 (07-Jan-2020)
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done
/dev/mapper/icybox-media: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 55640069: 849596509
Multiply-claimed block(s) in inode 55640261: 448514694
Multiply-claimed block(s) in inode 55641058: 465144485
Multiply-claimed block(s) in inode 55641147: 470406248
...and lots more Multiply-claimed block(s)
Then this:
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 190 inodes containing multiply-claimed blocks.)
File /TV Shows/Arrested Development/Arrested Development - Season 1/Arrested Development - 119 - Best Man for the Gob.mkv (inode #55640069, mod time Sat May 5 11:19:03 2018)
has 1 multiply-claimed block(s), shared with 1 file(s):
<The bad blocks inode> (inode #1, mod time Thu May 20 22:36:40 2021)
Clone multiply-claimed blocks<y>? yes
There are a bunch more files saying they have 1 multiply-claimed block shared with 1 file on inode #1. Should I say yes to the clone question?
All the files shown are shared with the bad blocks inode #1; according to https://unix.stackexchange.com/questions/198673/why-does-have-the-inode-2, inode #1 stores the bad blocks.
So I have a bunch of questions:
How can these files be sharing blocks with the bad blocks inode?
Is the bad blocks list incorrect/corrupted?
Is there a way to clear the bad blocks list and do another scan to start over and fill it correctly? (See the sketch below.)
I am not too bothered about losing the data of individual media files so long as I can take a list to re-download them.
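For question 3, what I have in mind is something like the sketch below, assuming e2fsck's -L option (which, per the man page, replaces the existing bad blocks list with the list read from the given file, so an empty file should clear it):
# clear the (possibly corrupted) bad blocks list with an empty list,
# then redo the non-destructive scan from scratch
e2fsck -L /dev/null /dev/mapper/icybox-media
fsck -cc /dev/icybox/media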
P.S. Not sure if it is relevant: I had run the same fsck command before this and it was interrupted by a power outage, so I don't know whether that could have corrupted the bad blocks inode #1.
I ran it another time and it got to about 70%, then something went wrong and every block started coming back as a read error (I think it became an Input/Output error again), so I am worried all those blocks were added to the bad blocks list. I cancelled the process at about 70% when I noticed, so it didn't finish.
Thanks for any help and answers
As outlined in http://wiki.bitplan.com/index.php/Apache_Jena#Script_to_start_Fuseki_server
I have been avoiding the complexity of Fuseki configuration files and started the server from a script for my use cases, in which I only need one dataset/endpoint. For multiple datasets/endpoints I simply used multiple servers.
Descriptions like:
https://jena.apache.org/documentation/fuseki2/fuseki-config-endpoint.html
and questions like:
fuseki Multiple services found exception
have been intimidating me, since there seem to be so many options and no straightforward way to simply say: please use these datasets from the following directories, as the command line version can do for one dataset.
Just look at:
https://users.jena.apache.narkive.com/MNZHLT25/multiple-datasets-on-fuseki
where the user expectation:
java -jar fuseki-0.1.0-server.jar --update --loc=data /dataset
--loc=data2 /dataset2
can be seen to be unfortunately not fulfilled. Instead:
http://jena.apache.org/documentation/serving_data/index.html#fuseki-configuration-file
was the answer at the time, and is now an outdated link.
So obviously there are people out there getting Fuseki to work with multiple datasets. But how do they do it?
I know how to load a TDB store from a triple file via the command line. I know that I could use the web GUI to set up datasets and load data, but that won't work for my multi-million (and partly multi-billion) triple files.
What is a (hopefully simple) example of loading multiple triple files, making the results available as different datasets on the same Fuseki server, and having the SPARQL endpoints running (partly read-only)?
https://jena.apache.org/documentation/fuseki2/fuseki-layout.html gives a hint on the layout of files.
Using the script to start Fuseki, I inspected the run directory, which in my case was to be found at:
apache-jena-fuseki-3.16.0/run
There are two subdirectories which are initially empty and stay so if you run things from the command line:
configuration
databases
By adding a dataset via the web GUI at http://localhost:3030,
a directory with the name of the dataset, in this case:
databases/cr
and a configuration file
configuration/cr.ttl
are created.
For smaller datasets, data can now be added via the web GUI. For bigger datasets, a copy or symlink of the originally loaded TDB data is necessary in the databases directory (a bulk-load sketch follows at the end of this answer).
example symlinks:
zeus:databases wf$ ls -l
total 48
drwxr-xr-x 4 wf admin 136 Sep 14 07:43 cr
lrwxr-xr-x 1 wf admin 27 Sep 15 11:53 dblp -> /Volumes/Torterra/dblp/data
lrwxr-xr-x 1 wf admin 26 Sep 14 08:10 gnd -> /Volumes/Torterra/gnd/data
lrwxr-xr-x 1 wf admin 42 Sep 14 07:55 wikidata -> /Volumes/Torterra/wikidata2020-08-15/data/
By restarting the server without a --loc
nohup java -jar fuseki-server.jar&
the configurations are automatically picked up.
The good news is that you do not have to bother with the details of the config files this way as long as you do not have any special needs.
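For completeness, here is a sketch of the bulk-load step for one of the bigger datasets; the tdbloader invocation and the file name are illustrative, while the paths match the symlink layout above.
# bulk-load the triples into a TDB directory outside the run directory
tdbloader --loc=/Volumes/Torterra/dblp/data dblp.nt
# symlink the loaded store into run/databases so that, together with its
# configuration/dblp.ttl, it is picked up on the next server restart
ln -s /Volumes/Torterra/dblp/data apache-jena-fuseki-3.16.0/run/databases/dblp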
I have 4 test projects and want to run all of them on TeamCity in parallel.
Can I do that? If so, how?
Parallel execution by fixtures within each test project is fine, but I am hoping I can also run the vstest.console commands themselves in parallel?
The vstest.console command here runs, but not in parallel.
My answer might not be applicable to your case, depending on the resources you have.
My idea requires you to have 4 agents running, so you can use them in parallel.
To do so, what you want is to create 4 build configs (one for each of your parallel runs), named: testRun0, testRun1, testRun2, testRun3.
Then you can add another build config (which could be named "testReport") that has the 4 others as a "snapshot dependency".
In that case, every time a trigger occurs in the 5th build config, it will trigger the 4 others first.
There is a good example in the JetBrains doc: https://blog.jetbrains.com/teamcity/2019/10/build-chains-teamcitys-blend-of-pipelines-part-2-running-builds-in-parallel/
Look at the "Composite build configuration" part.
All we need is:
1 console runner step
VS build tools (vstest.console.exe)
To run 4 DLLs in parallel, all we need is to call vstest.console.exe in cmd with the 4 DLL files separated by spaces,
like this: https://learn.microsoft.com/en-us/visualstudio/test/vstest-console-options?view=vs-2019#code-try-1
We can log them using the parameters /logger:logger://teamcity /logger:console;verbosity=normal
So the final command looks like:
<path_to_vstest.console>\vstest.console.exe MSTest.dll UnitTest1.dll UnitTest2.dll UnitTest3.dll /logger:logger://teamcity /logger:console;verbosity=normal /Parallel
For some reason a phpagi script randomly stops in the middle. This only happens once every 20-50 calls. I was able to notice several such 'fails' in real time in the Asterisk CLI. No errors were displayed; the script sent several verbose messages until it just stopped.
The script is used for billing; it has several while loops and SQL queries. Max execution time is set to 30 seconds.
I didn't find any errors in the /var/log/messages or /var/log/asterisk/messages
Asterisk 1.6.2.24
PHP 5.1.6 (cli) (built: Jun 27 2012 12:21:13)
[context-x]
exten => 1,1,Dial(SIP/XXXXXX)
exten => h,1,AGI(script.php)
Any ideas why it could just stop for random calls?
Thanks.
UPDATE:
I've just noticed that when the problem occurs AGI returns 4:
-- <SIP/xxxxxx-00000185>AGI Script script.php completed, returning 4
What's wrong?
From what I'm hearing of your symptoms, it doesn't sound like it's specifically an Asterisk issue; it could be an issue with your script. Here are a number of tips on how I would go about debugging an application without any knowledge of what's happening behind the scenes of the AGI script itself...
Firstly, AGI Script script.php completed, returning 4 means that your script is exiting without a clean exit status code. returning 0 is what you'd like to see. You can see the last exit status code at your bash prompt by running a script and then checking it with the $? variable, like so:
[user@host ~]$ ./script.php
[user@host ~]$ echo $?
0
That's Asterisk telling you what happened to your script. Anything non-zero means "something's wrong here". Generally, you can customize these to your liking, so as for specifically 4, I'm not sure.
One thing you'll want to do is turn on AGI debugging, like so:
host*CLI> agi set debug on
Then run your script, and see if you can find that your php script is spitting out any errors.
Another recommendation I would make is to ensure your PHP logs to syslog, so that you can find errors in /var/log/messages. You can do this by setting this line in your /etc/php.ini:
error_log = syslog
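A quick way to check that the syslog wiring works (assuming the CLI php.ini is the one you edited) is to emit a test entry and then look for it:
php -r 'error_log("agi syslog test");'
tail -n 5 /var/log/messages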
Lastly, to try to replicate the error, I would suggest using a development box, originating a bunch of calls to yourself, and building a script to create a bunch of "call files".
Here's a call file to get you started:
Channel: LOCAL/100@mycontext
MaxRetries: 2
RetryTime: 60
WaitTime: 30
Application: Wait
Data: 30
When you've created a file, move it to /var/spool/asterisk/outgoing (moving is important: you want the rename, because Asterisk may pick the file up too soon if you write to it in that directory first). For example:
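A small sketch of that flow (the temporary location is arbitrary; ideally it sits on the same filesystem as the spool so the move is a pure rename):
# write the call file elsewhere first, then move it into the spool
cat > /tmp/testcall.call <<'EOF'
Channel: LOCAL/100@mycontext
MaxRetries: 2
RetryTime: 60
WaitTime: 30
Application: Wait
Data: 30
EOF
mv /tmp/testcall.call /var/spool/asterisk/outgoing/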
You can also originate a call on an extension from the CLI:
host*CLI> channel originate LOCAL/100@mycontext application Wait 5
You may also want to use other options in the call file, such as CallerID: John Doe <8005551212> in order to feed interesting data to your AGI application while you create tests to replicate the issue.