I have the following makefile.
I would like step0 to run first, then all of the b*.R scripts to run at the same time in step1. When step1 is complete, I would like final to run.
When I run make or make -j 8, it seems like all of the b*.R files still run sequentially. Is this makefile set up correctly to run all of the b*.R files at the same time? If not, what do I need to change?
final : step1
    Rscript c.R

step1 : step0
    Rscript b1.R
    Rscript b2.R
    Rscript b3.R
    Rscript b4.R
    Rscript b5.R
    Rscript b6.R

step0 :
    Rscript a.R
When I run make or make -j 8 it seems like all of the b*.R files still run sequentially.
-jN allows parallel execution of different recipes, not of the individual commands that make up a single recipe.
So the makefile should be restructured like this:
.PHONY: final b1 b2 b3 b4 b5 b6 step0
final: b1 b2 b3 b4 b5 b6 ;Rscript c.R
b1 b2 b3 b4 b5 b6: step0
b1: ;Rscript b1.R
b2: ;Rscript b2.R
b3: ;Rscript b3.R
b4: ;Rscript b4.R
b5: ;Rscript b5.R
b6: ;Rscript b6.R
step0: ;Rscript a.R
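With that version in place, a run such as the following (the -j value is just an example) should execute a.R first, then the six b scripts concurrently, and finally c.R:
make -j8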
If you want make to handle the parallelism for you, you need to restructure the makefile to have different targets. For example:
step1: b1 b2 b3 b4 b5 b6

b1: step0
    Rscript b1.R

b2: step0
    Rscript b2.R

...

step0 :
    Rscript a.R
Or, you could let the shell do the parallelism and write:
step1: step0
    Rscript b1.R & Rscript b2.R & \
    Rscript b3.R & ... & wait
I would recommend the former.
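One caveat with both restructured versions: since the b targets are .PHONY, every invocation re-runs every script. A minimal sketch of a file-based variant for GNU make, assuming each script's output can be redirected to a sentinel file (the .Rout names here are hypothetical):

final: b1.Rout b2.Rout b3.Rout b4.Rout b5.Rout b6.Rout
    Rscript c.R

# pattern rule: b1.Rout is produced by b1.R, and so on
b%.Rout: step0.Rout
    Rscript b$*.R > $@

step0.Rout:
    Rscript a.R > $@

With real file targets, make -j8 still runs the b scripts in parallel but can also skip any whose outputs are already up to date.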
Related
I am running into a problem while designing a solution using autosys. Looking for some inputs on this scenario:
I have three job boxes, viz. BoxA, BoxB, and BoxC.
BoxA has two jobs inside it, A1 and A2, and I have configured the last job in this box with two success exit codes, 0 and 10.
Now, depending upon the exit code of that job, I want to trigger either BoxB (if the exit code is 0) or BoxC (if the exit code is 10).
Additional information for BoxB and BoxC:
BoxB has 5 jobs in it, named B1, B2, ..., B5, and this box will kick off when the exit code of A2 is 0.
BoxC has 7 jobs in it, and this box will kick off if either A2 exits with code 10 OR B5 goes into success.
ISSUE description:
If A2 exits with code 10, the solution works as expected and BoxC gets kicked off.
However, if A2 exits with code 0, both BoxB and BoxC get kicked off.
This is the starting condition of BoxC:
(e(A2)=10) or s(B5)
Please advise.
Harsh,
As stated, jobs are A1 A2 B1 B2 B3 B4 B5 C1 C2 C3 C4 C5 C6 C7.
All jobs are under one box alone.
CASE I: Assuming the box starts at 01:00 hours and job A2's exit code is 0
insert_job: BOX_NAME
job_type: box
owner: ##
max_run_alarm: 0
alarm_if_fail: n
date_conditions: y
start_times: "01:00"
run_calendar: ##
send_notification: n
box_success: s(C7)
A1 Success
A2 Success with E=0
Job B1 defined as
insert_job: JOB_B1
condition: e(A2)=0
B1 Success
B2 .. B5 Success
Job C1 defined as
insert_job: JOB_C1
condition: e(A2)=10 | s(B5)
C1 .. C7 Success
box completed !!
CASE II: job A2’s exit code is 10
A1 Success
A2 Success with E=10
condition: e(A2)=0   # jobs remain activated
B1 .. B5 Activated
condition: e(A2)=10 | s(B5)   # OR condition is fulfilled, so C1 starts
C1 .. C7 Success
B1 .. B5 jobs remain activated but box completes upon success of C7.
box completed !!
Hope this helps.
Let me know if I am clear enough.
My answer considers only the happy cases and does not take the extreme cases into account.
I am trying to use Unix to transform a tab-delimited file from a short/wide format to long format, similar to the reshape function in R. I hope to create three rows for each row in the starting file. Column 4 currently contains 3 values separated by commas. I hope to keep columns 1, 2, and 3 the same for each starting row, but have column 4 be one of the values from the initial column 4. This example probably makes it clearer than I can describe verbally:
current file:
A1 A2 A3 A4,A5,A6
B1 B2 B3 B4,B5,B6
C1 C2 C3 C4,C5,C6
goal:
A1 A2 A3 A4
A1 A2 A3 A5
A1 A2 A3 A6
B1 B2 B3 B4
B1 B2 B3 B5
B1 B2 B3 B6
C1 C2 C3 C4
C1 C2 C3 C5
C1 C2 C3 C6
As someone just becoming familiar with this language, my initial thought was to use sed to find the commas and replace them with a newline:
sed 's/,/&\n/' data.frame
I am really not sure how to include the values for columns 1-3. I had low hopes of this working, but the only thing I could think of was to try inserting the column values with {print $1, $2, $3}.
sed 's/,/&\n{print $1, $2, $3}/' data.frame
Not to my surprise, the output looked like this:
A1 A2 A3 A4
{print $1, $2, $3} A5
{print $1, $2, $3} A6
B1 B2 B3 B4
{print $1, $2, $3} B5
{print $1, $2, $3} B6
C1 C2 C3 C4
{print $1, $2, $3} C5
{print $1, $2, $3} C6
It seems like an approach might be to store the values of columns 1-3 and then insert them. I am not really sure how to store the values; I think it may involve an adaptation of the following script, but I am having a hard time understanding all of its components.
NR==FNR{a[$1, $2, $3]=1}
Thanks in advance for your thoughts on this.
You can write a simple read loop for this and use parameter expansion to split the comma-delimited field:
#!/bin/bash

while read -r f1 f2 f3 c1; do
    # split the comma delimited field 'c1' into its constituents
    # (the unquoted expansion relies on word splitting)
    for c in ${c1//,/ }; do
        printf '%s %s %s %s\n' "$f1" "$f2" "$f3" "$c"
    done
done < input.txt
Output:
A1 A2 A3 A4
A1 A2 A3 A5
A1 A2 A3 A6
B1 B2 B3 B4
B1 B2 B3 B5
B1 B2 B3 B6
C1 C2 C3 C4
C1 C2 C3 C5
C1 C2 C3 C6
As a solution without calling an external program:

#!/bin/bash

data_file="d"
while IFS=" " read -r f1 f2 f3 r
do
    IFS="," read -r f4 f5 f6 <<<"$r"
    # printf reuses the format string for each group of four arguments
    printf '%s %s %s %s\n' "$f1" "$f2" "$f3" "$f4" "$f1" "$f2" "$f3" "$f5" "$f1" "$f2" "$f3" "$f6"
done <"$data_file"
The great Miller has a nest verb that does exactly this.
With
mlr --nidx --ifs "\t" nest --explode --values --across-records -f 4 --nested-fs "," input.tsv
you will have
A1 A2 A3 A4
A1 A2 A3 A5
A1 A2 A3 A6
B1 B2 B3 B4
B1 B2 B3 B5
B1 B2 B3 B6
C1 C2 C3 C4
C1 C2 C3 C5
C1 C2 C3 C6
If you don't need the output to be in any particular order within a group of the fourth column, the following awk one-liner might do:
awk '{split($4,a,","); for(i in a) print $1,$2,$3,a[i]}' input.txt
This works by splitting your 4th column into an array, then for each element of the array, printing the "new" four columns.
If order is important -- that is, A4 must come before A5, etc, then you can use a classic for loop:
awk '{n=split($4,a,","); for(i=1;i<=n;i++) print $1,$2,$3,a[i]}' input.txt
But that's awk. And you're asking about bash.
The following might work:
#!/usr/bin/env bash

mapfile -t arr < input.txt

for s in "${arr[@]}"; do
    t=($s)
    mapfile -t -d, u <<<"${t[3]}"
    for v in "${u[@]}"; do
        printf '%s %s %s %s\n' "${t[@]:0:3}" "${v%$'\n'}"
    done
done
This copies your entire input file into the elements of an array, and then steps through that array, mapping each 4th-column into a second array. It then steps through that second array, printing the first three columns from the first array, along with the current field from the second array.
It's obviously similar in structure to the awk alternative, but much more cumbersome to read and code.
Note the ${v%$'\n'} on the printf line. This strips off the last field's trailing newline, which doesn't get stripped by mapfile because we're using an alternate delimiter.
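A quick way to see that leftover newline for yourself (a minimal sketch):

mapfile -t -d, u <<<"A4,A5,A6"
declare -p u    # the last element prints as $'A6\n'; -t stripped only the commas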
Note also that there's no reason you have to copy all your input into an array; I just did it that way to demonstrate a little more of mapfile. You could of course use the old standard,
while read -r s; do
...
done < input.txt
if you prefer.
How can I merge two lines if they meet specific criteria in a Unix terminal?
I have data like:
A1
B1
A2
B2
A3
A4
A5
B5
And I want to merge them like this:
A1, B1
A2, B2
A3,
A4,
A5, B5
Real data looks like this:
"224222"
<Frequency freq="0.136" allele="T" sampleSize="5008"/>
"224223"
<Frequency freq="0.3864" allele="T" sampleSize="5008"/>
"224224"
"224225"
<Frequency freq="0.3894" allele="G" sampleSize="5008"/>
"1801179"
"1861759"
I actually tried adding dummy delimiter text before the "A" data to separate them, but I couldn't achieve it.
Using sed
sed 's/$/, /;N;/\n<Freq/{s/\n//};P;D' <file>
Explanation:
s/$/, / - Append a comma and a space to the current line
N - Read the next line into the pattern space
/\n<Freq/{s/\n//} - If the second line contains <Freq, delete the newline, joining the two lines
P - Print the first portion of the pattern space (up to the first newline)
D - Delete the first portion of the pattern space and restart the cycle
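On the sample data, with GNU sed, this should produce something like:

"224222", <Frequency freq="0.136" allele="T" sampleSize="5008"/>
"224223", <Frequency freq="0.3864" allele="T" sampleSize="5008"/>
"224224",
"224225", <Frequency freq="0.3894" allele="G" sampleSize="5008"/>
"1801179",
"1861759",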
It can be done using awk's getline (here condition is a placeholder for whatever identifies the first line of a pair):
awk '{ if (condition) { if ((getline var) > 0) print $0", "var; else print $0 } else print $0 }' <file>
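As a concrete sketch for the sample data above, assuming the key lines start with a double quote and the detail lines start with <Frequency (the variable names are just illustrative):

awk '
{
  id = $0
  # scan forward until we find the <Frequency line belonging to this id
  while ((getline nxt) > 0) {
    if (nxt ~ /^<Frequency/) { print id ", " nxt; id = ""; break }
    print id ","        # the current id has no <Frequency line
    id = nxt            # nxt is itself the next id; keep scanning
  }
}
END { if (id != "") print id "," }' <file>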
I have a file where I want to print every entry for a column i>N followed by the contents of the next column. Each line has the same number of columns. An example input:
a b c d
a1 b1 c1 d1
a2 b2 c2 d2
a3 b3 c3 d3
Say in this case I want to skip the first column, so the desired output would be:
b
b1
b2
b3
c
c1
c2
c3
d
d1
d2
d3
I got close to what I wanted using
awk '{for(i=2; i<=NF; i++) print $i}'
but this prints each row's entries consecutively instead of all entries from each column consecutively.
Thanks in advance
If every line has the same number of fields, then you can do:
awk '
{
for(i=2;i<=NF;i++)
rec[i]=(rec[i]?rec[i]RS$i:$i)
}
END {
for(i=2;i<=NF;i++) print rec[i]
}' file
If the number of fields is uneven, then you need to remember the maximum number of fields seen:
awk '
{
for(i=2;i<=NF;i++) {
rec[i]=(rec[i]?rec[i]RS$i:$i)
}
num=(num>NF?num:NF)
}
END {
for(i=2;i<=num;i++) print rec[i]
}' file
Output:
b
b1
b2
b3
c
c1
c2
c3
d
d1
d2
d3
Using cut would be easier here:
# figure out how many fields there are
read -a fields < <(sed 1q file)
nf=${#fields[@]}

# start dumping the columns, beginning with column n
# (n=2 skips the first column, as in the example)
n=2
for ((i = n; i <= nf; i++)); do
    cut -d " " -f "$i" file
done
I have a C shell script that does something like this:
#!/bin/csh
gcc example.c -o ex
gcc combine.c -o combine
ex file1 r1 <-- 1
ex file2 r2 <-- 2
ex file3 r3 <-- 3
#... many more like the above
combine r1 r2 r3 final
\rm r1 r2 r3
Is there some way I can make lines 1, 2 and 3 run in parallel instead of one after the another?
Convert this into a Makefile with proper dependencies. Then you can use make -j to have Make run everything possible in parallel.
Note that all the indents in a Makefile must be TABs. TAB shows Make where the commands to run are.
Also note that this Makefile is now using GNU Make extensions (the wildcard and subst functions).
It might look like this:
export PATH := .:${PATH}

FILES=$(wildcard file*)
RFILES=$(subst file,r,${FILES})

final: combine ${RFILES}
    combine ${RFILES} final
    rm ${RFILES}

ex: example.c
    gcc $< -o $@

combine: combine.c
    gcc $< -o $@

r%: file% ex
    ex $< $@
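With the dependencies expressed this way, a single invocation such as the following (the job count is arbitrary) builds ex and combine as needed, runs all the ex commands in parallel, and then combines:

make -j8 final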
In bash I would do:
ex file1 r1 &
ex file2 r2 &
ex file3 r3 &
wait
... continue with script...
and spawn them out to run in parallel. You can check out this SO thread for another example.
#!/bin/bash
gcc example.c -o ex
gcc combine.c -o combine
# Call 'ex' 3 times in "parallel"
for i in {1..3}; do
ex file${i} r${i} &
done
#Wait for all background processes to finish
wait
# Combine & remove
combine r1 r2 r3 final
rm r1 r2 r3
I slightly altered the code to use brace expansion, {1..3}, rather than hard-coding the numbers, since you said there are many more files than just 3. Brace expansion makes scaling to larger numbers trivial: just replace the 3 inside the braces with whatever number you need.
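For example, scaling the same loop to 100 files would just be (a sketch, assuming file1 through file100 exist; note there is no throttling, so all 100 jobs start at once):

for i in {1..100}; do
    ex file${i} r${i} &
done
wait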
You can use
cmd &
and wait afterwards:
#!/bin/csh
echo start
sleep 1 &
sleep 1 &
sleep 1 &
wait
echo ok
test:
$ time ./csh.sh
start
[1] 11535
[2] 11536
[3] 11537
[3] Done sleep 1
[2] - Done sleep 1
[1] + Done sleep 1
ok
real 0m1.008s
user 0m0.004s
sys 0m0.008s
GNU Parallel would make it pretty like:
seq 1 3 | parallel ex file{} r{}
Depending on how 'ex' and 'combine' work you can even do:
seq 1 3 | parallel ex file{} | combine
Learn more about GNU Parallel by watching http://www.youtube.com/watch?v=LlXDtd_pRaY
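If you need to cap how many jobs run at once, GNU Parallel also accepts a job-slot limit via -j (sketch):

seq 1 3 | parallel -j 2 ex file{} r{}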
You could use nohup ex:
nohup ex file1 r1 &
nohup ex file2 r2 &
nohup ex file3 r3 &
xargs can do it:
seq 1 3 | xargs -n 1 -P 0 -I % ex file% r%
-n 1 is for "one argument per command", -P 0 is for "run as many in parallel as possible" (with -I %, each input line replaces the %).