I am using gnuplot combined with AWK to produce a 2D bar plot from the following input data:
#Acceptor DonorH Donor Frames Frac AvgDist AvgAng
lig_608#O3 HIE_163#HE2 HIE_163#NE2 498 0.5304 2.8317 153.0580
lig_608#O GLU_166#H GLU_166#N 476 0.5069 2.8858 161.7174
lig_608#O1 HIE_41#HE2 HIE_41#NE2 450 0.4792 2.8484 158.5193
THR_26#O lig_608#H9 lig_608#N1 399 0.4249 2.8312 149.9578
lig_608#O2 THR_26#H THR_26#N 312 0.3323 2.9029 164.8033
lig_608#O1 ASN_142#HD21 ASN_142#ND2 14 0.0149 2.8445 158.4224
lig_608#O1 GLN_189#HE22 GLN_189#NE2 2 0.0021 2.8562 149.7421
lig_608#O1 GLN_189#HE21 GLN_189#NE2 1 0.0011 2.7285 158.4377
lig_608#O3 GLY_143#H GLY_143#N 1 0.0011 2.7421 147.8213
My script takes the data from the 3rd and 5th columns, considering only the lines where the value in the 5th column is > 0.05, and produces a bar graph:
cat <<EOS | gnuplot > graph.png
set term pngcairo size 800,600
set xtics noenhanced
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "<awk 'NR == 1 || \$5 > 0.05' $file" using 0:5:xtic(3) with boxes
EOS
EDIT:
Within my bash workflow, the script looks like this:
for file in "${output}"/${target}*.log ; do
file_name3=$(basename "$file")
file_name2="${file_name3/.log/}"
file_name="${file_name2/${target}_/}"
echo "visualisation with Gnuplot!"
cat <<EOS | gnuplot > "${output}/${file_name2}.png"
set title "$file_name" font "Century,22" textcolor "#b8860b"
set tics font "Helvetica,12"
#set term pngcairo size 1280,960
set term pngcairo size 800,600
set yrange [0:1]
set xtics noenhanced
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "<awk 'NR == 1 || \$5 > 0.05' $file" using 0:5:xtic(3) with boxes
EOS
done
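A side note on the file-name handling in this loop: `${file_name3/.log/}` removes the first `.log` found anywhere in the name, not necessarily the extension. A minimal bash sketch of the anchored alternatives `%` (strip a suffix) and `#` (strip a prefix); the values of `target` and `file` are made up for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical values; in the workflow they come from $target and the loop.
target="complexA"
file="results/complexA_run1.log"

file_name3=$(basename "$file")          # complexA_run1.log
file_name2="${file_name3%.log}"         # drop the trailing ".log" only -> complexA_run1
file_name="${file_name2#"${target}"_}"  # drop the leading "complexA_" -> run1
echo "$file_name3 $file_name2 $file_name"

# Why anchoring matters: ${var/.log/} removes the FIRST ".log" anywhere.
demo="data.log_complexA_run1.log"
echo "${demo/.log/}"    # data_complexA_run1.log
echo "${demo%.log}"     # data.log_complexA_run1
```

`basename "$file" .log` would also strip the extension in one step.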
This is the image produced from the following filtered data:
HIE_163#NE2 0.5304
GLU_166#N 0.5069
HIE_41#NE2 0.4792
lig_608#N1 0.4249
THR_26#N 0.3323
I need to modify the awk selection expression embedded in the gnuplot command, which picks the two columns from the whole data set. Instead of always taking the index (the x-axis label) from the third column (Donor), I need to take it from either the first (#Acceptor) or the third (Donor) column, depending on the lig_* pattern. E.g. if the value in the #Acceptor column starts with lig, I need to take the value from the third column (Donor) of the same line, and vice versa (the lig* pattern appears either in the 1st column or in the 3rd, but never in both).
For my example, the filtered data with the updated selection should become:
HIE_163#NE2 0.5304 # the first index from the third column
GLU_166#N 0.5069 # the first index from the third column
HIE_41#NE2 0.4792 # the first index from the third column
THR_26#O 0.4249 # !!!! the first index from the first column !!
THR_26#N 0.3323 # the first index from the third column
There is no need for awk; you can do it all in gnuplot (which also makes it platform-independent).
My first attempt filtered the data by plotting the unwanted rows at an x-value of NaN. However, this gives some warnings (warning: add_tic_user: list sort error), which you can ignore,
but which can probably be avoided with some changes.
Edit: the original script would have failed if the first line had a value <0.05 in column 5. Here are two versions which don't have this problem and which produce no warnings. Maybe they can be simplified further.
To create an output file, simply add this to your script (check help output):
set term pngcairo size 800,600
set output "myOutputFile.png"
...<your script>...
set output
Data: SO73961783.dat
#Acceptor DonorH Donor Frames Frac AvgDist AvgAng
lig_608#O3 HIE_163#HE2 HIE_163#NE2 498 0.5304 2.8317 153.0580
lig_608#O GLU_166#H GLU_166#N 476 0.5069 2.8858 161.7174
lig_608#O1 HIE_41#HE2 HIE_41#NE2 450 0.4792 2.8484 158.5193
THR_26#O lig_608#H9 lig_608#N1 399 0.4249 2.8312 149.9578
lig_608#O2 THR_26#H THR_26#N 312 0.3323 2.9029 164.8033
lig_608#O1 ASN_142#HD21 ASN_142#ND2 14 0.0149 2.8445 158.4224
lig_608#O1 GLN_189#HE22 GLN_189#NE2 2 0.0021 2.8562 149.7421
lig_608#O1 GLN_189#HE21 GLN_189#NE2 1 0.0011 2.7285 158.4377
lig_608#O3 GLY_143#H GLY_143#N 1 0.0011 2.7421 147.8213
Script 1:
Filter your data and write it into a new table; if the condition >0.05 is not met, write an empty line. This is probably the easiest to understand and gives the shortest final plot command.
### conditional xtic labels
reset session
set termoption noenhanced
FILE = "SO73961783.dat"
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set key off
set style fill solid 0.5
set boxwidth 0.9
set grid y
set xrange[-1:5]
set table $Filtered
myTic(col1,col2) = strcol(col1)[1:3] eq 'lig' ? strcol(col2) : strcol(col1)
plot FILE u ((y0=column(5))>0.05 ? sprintf("%g %s",y0,myTic(1,3)) : '') w table
unset table
plot $Filtered u 0:1:xtic(2) w boxes
### end of script
Script 2:
Without an extra table, but with a more complex plot command: increase the x-position x0 whenever a value >0.05 is found (except the first time), and keep the previous position and label (i.e. overwrite it) whenever a value <=0.05 is found.
### conditional xtic labels
reset session
set termoption noenhanced
FILE = "SO73961783.dat"
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
set key off
set style fill solid 0.5
set boxwidth 0.9
set grid y
set xrange[-1:5]
myTic(col1,col2) = strcol(col1)[1:3] eq 'lig' ? strcol(col2) : strcol(col1)
plot x0=c=(t0='',0) FILE u ((y0=column(5))>0.05 ? (c==0 ? (c=1,t0=myTic(1,3)) : (x0=x0+1,t0=myTic(1,3))) : (y0=NaN),x0):(y0):xtic(t0) w boxes
### end of script
Result:
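To make Script 2's bookkeeping easier to follow, here is the same counter logic replayed with awk inside a bash function. This is for illustration only (the gnuplot script itself needs no awk), and `script2_positions` is a hypothetical name. Each output line is the x position, the xtic label and the y value that Script 2 would use for one input row.

```shell
#!/usr/bin/env bash
# Illustration only: mirrors the x0/c/t0 logic of Script 2's plot command.
script2_positions() {
    awk '
        /^#/ { next }                                  # gnuplot skips comment lines too
        {
            lbl = (substr($1, 1, 3) == "lig") ? $3 : $1   # same choice as myTic(1,3)
            if ($5 > 0.05) {
                if (c == 0) c = 1; else x0++               # first hit stays at x0 = 0
                t0 = lbl
                printf "%d %s %s\n", x0, t0, $5
            } else {
                printf "%d %s NaN\n", x0, t0               # same x and label, y = NaN
            }
        }' "$1"
}

# Usage: script2_positions SO73961783.dat
```

Rows below the threshold land on the last used x position with y = NaN, so they draw nothing and leave the tic label at that position unchanged; that is why the bars stay contiguous.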
Since you may want to do more complicated processing with awk, I would suggest an alternative way of mixing awk and gnuplot.
Gnuplot supports inline data (data blocks) in its script files, so you could have awk generate the inline data while bash supplies the plot configuration, all done in a sub-shell. For example:
(
printf '$data << EOD\n'
awk 'NR>1 && $5>0.05 { print $1 ~ /^lig/ ? $3 : $1, $5 }' infile
cat << EOS
EOD
set term pngcairo size 1280,960 font ",20"
set output "output.png"
set xtics noenhanced
set ytics 0.02
set grid y
set key off
set style fill solid 0.5
set boxwidth 0.9
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
plot "\$data" using 0:2:xtic(1) with boxes, "" using 0:2:2 with labels offset 0,1
EOS
)
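To run this per file inside the loop from the question, the sub-shell can be turned into a function. A sketch, assuming a hypothetical helper name `make_script` and the question's 800x600 output size; the generated script is only piped to gnuplot in the commented usage lines.

```shell
#!/usr/bin/env bash
# Emit a complete gnuplot script (inline data block + settings) for one file.
make_script() {
    local infile=$1 outpng=$2
    printf '$data << EOD\n'
    # Skip the header, keep rows with Frac > 0.05, label with the non-ligand side
    awk 'NR>1 && $5>0.05 { label = ($1 ~ /^lig/) ? $3 : $1; print label, $5 }' "$infile"
    cat << EOS
EOD
set term pngcairo size 800,600
set output "$outpng"
set xtics noenhanced
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "\$data" using 0:2:xtic(1) with boxes
EOS
}

# Usage inside the question's loop (needs gnuplot installed):
# for file in "${output}"/${target}*.log; do
#     name=$(basename "$file" .log)
#     make_script "$file" "${output}/${name}.png" | gnuplot
# done
```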
This would produce the following gnuplot script:
$data << EOD
HIE_163#NE2 0.5304
GLU_166#N 0.5069
HIE_41#NE2 0.4792
THR_26#O 0.4249
THR_26#N 0.3323
EOD
set term pngcairo size 1280,960 font ",20"
set output "output.png"
set xtics noenhanced
set ytics 0.02
set grid y
set key off
set style fill solid 0.5
set boxwidth 0.9
set xlabel "Fraction, %"
set ylabel "H-bond donor, residue"
plot "$data" using 0:2:xtic(1) with boxes, "" using 0:2:2 with labels offset 0,1
Pipe it to gnuplot, i.e. (...) | gnuplot, and you get this in output.png:
I have to extend the output and the solution of my project (exam scheduling):
- Extend the structure to five days (so far I have only worked with one day).
I thought about multiplying the number of time slots by the number of days (5*10) and then tuning the output. Is there a better way?
Here is the whole code:
include "globals.mzn";
include "alldifferent.mzn";
%------------------------------Scalar_data----------------------
int: Students; % number of students
int: Exams; % number of exams
int: Rooms; % number of rooms
int: Slotstime; % number of slots
int: Days; % a period i.e. five days
int: Exam_max_duration; % the maximum length of any exam (in slots)
%------------------------------Vectors--------------------------
array[1..Rooms] of int : Rooms_capacity;
array[1..Exams] of int : Exams_duration; % the duration of written test
array[1..Slotstime, 1..Rooms] of 0..1: Unavailability;
array[1..Students,1..Exams] of 0..1: Enrollments;
% Enrollments keeps track of the registrations of every student;
% from it I obtain the number of students taking each exam,
% in order to choose the right room according to its capacity.
%---------------------------Decision_variables------------------
array[1..Slotstime,1..Rooms] of var 0..Exams: Timetable_exams;
array[1..Exams] of var 1..Rooms: ExamsRoom;
array[1..Exams] of var 1..Slotstime: ExamsStart;
%---------------------------Constraints--------------------------
% Calculate the number of subscribers and assign classroom
% according to time and capacity
constraint forall (e in 1..Exams,r in 1..Rooms,s in 1..Slotstime)
(if Rooms_capacity[r] <= sum([bool2int(Enrollments[st,e]>0)| st in 1..Students])
then Timetable_exams[s,r] != e
else true
endif
);
% Unavailability OK
constraint forall(c in 1..Slotstime, p in 1..Rooms)
(if Unavailability[c,p] == 1
then Timetable_exams[c,p] = 0
else true
endif
);
% Assignment exams according with rooms and slotstimes (Thanks Hakan)
constraint forall(e in 1..Exams) % for each exam
(exists(r in 1..Rooms) % find a room
( ExamsRoom[e] = r /\ % assign the room to the exam
forall(t in 0..Exams_duration[e]-1)
% assign the exam to the slotstimes and room in the timetable
(Timetable_exams[t+ExamsStart[e],r] = e)
)
)
/\ % ensure that we have the correct number of exam slots
sum(Exams_duration) = sum([bool2int(Timetable_exams[t,r]>0) | t in 1..Slotstime,
r in 1..Rooms]);
%---------------------------Solver--------------------------
solve satisfy;
% solve::int_search([Timetable_exams[s, a] | s in 1..Slotstime, a in
% 1..Rooms],first_fail,indomain_min,complete) satisfy;
And now the output section, which is extremely verbose and full of strings:
%---------------------------Output--------------------------
output ["\n" ++ "MiniZinc paper: Exams schedule " ++ "\n" ]
++["\nDay I \n"]++
[
if r=1 then "\n" else " " endif ++
show(Timetable_exams[t,r])
| t in 1..Slotstime div Days, r in 1..Rooms
]
++["\n\nDay II \n"]++
[
if r=1 then "\n" else " " endif ++
show(Timetable_exams[t,r])
| t in 11..((Slotstime div Days)*2), r in 1..Rooms
]
++["\n\nDay III \n"]++
[
if r=1 then "\n" else " " endif ++
show(Timetable_exams[t,r])
| t in 21..((Slotstime div Days)*3), r in 1..Rooms
]
++["\n\nDay IV \n"]++
[
if r=1 then "\n" else " " endif ++
show(Timetable_exams[t,r])
| t in 31..((Slotstime div Days)*4), r in 1..Rooms
]
++["\n\nDay V \n"]++
[
if r=1 then "\n" else " " endif ++
show(Timetable_exams[t,r])
| t in 41..Slotstime, r in 1..Rooms
]
++[ "\n"]++
[
"\nExams_Room: ", show(ExamsRoom), "\n",
"Exams_Start: ", show(ExamsStart), "\n",
]
++["Participants: "]++
[
if e=Exams then " " else " " endif ++
show (sum([bool2int(Enrollments[st,e]>0)| st in 1..Students]))
|e in 1..Exams
];
Finally, the data:
%Data
Slotstime=10*Days;
Students=50;
Days=5;
% Exams
Exams = 5;
Exam_max_duration=4;
Exams_duration = [4,1,2,3,2];
% Rooms
Rooms = 4;
Rooms_capacity = [20,30,40,50];
Unavailability = [|0,0,0,0 % rows: time slots, columns: rooms
|0,0,0,0
|0,0,0,0
|0,0,0,0
|1,1,1,1
|1,1,1,1
|0,0,0,0
|0,0,0,0
|0,0,0,0
|0,0,0,0
% End first day
|0,0,0,0
|0,0,0,0
|0,0,0,0
|0,0,0,0
|1,1,1,1
|1,1,1,1
|0,0,0,0
|0,0,0,0
|0,0,0,0
|0,0,0,0
% End second day
|0,0,0,0
|0,0,0,0
|0,0,0,0
|0,0,0,0
|1,1,1,1
|1,1,1,1
|0,0,0,0
|0,0,0,0
|0,0,0,0
|0,0,0,0
% End third day
|0,0,0,0
|0,0,0,0
|0,0,0,0
|0,0,0,0
|1,1,1,1
|1,1,1,1
|0,0,0,0
|0,0,0,0
|0,0,0,0
|0,0,0,0
% End fourth day
|0,0,0,0
|0,0,0,0
|0,0,0,0
|0,0,0,0
|1,1,1,1
|1,1,1,1
|0,0,0,0
|0,0,0,0
|0,0,0,0
|0,0,0,0
%End fifth day
|];
Enrollments= [|1,0,1,0,1 % rows: students, columns: exams
|1,0,1,0,1
|0,1,0,0,0
|1,0,0,1,0
|0,1,0,0,0
|0,0,1,1,0
|1,0,0,1,0
|0,0,0,0,1
|1,0,0,0,1
|0,0,0,0,1
|0,1,0,0,0
|0,0,0,0,0
|0,1,0,0,1
|0,0,1,0,1
|1,0,1,0,1
|1,0,1,0,1
|0,1,0,0,0
|1,0,0,1,0
|0,1,0,0,0
|0,0,1,1,0
|1,0,0,1,0
|0,0,0,0,1
|1,0,0,0,1
|0,0,0,0,1
|0,1,0,0,0
|0,0,0,0,0
|0,1,0,0,1
|0,0,1,0,1
|1,0,1,0,1
|1,0,1,0,1
|0,1,0,0,0
|1,0,0,1,0
|0,1,0,0,0
|0,0,1,1,0
|1,0,0,1,0
|0,0,0,0,1
|1,0,0,0,1
|0,0,0,0,1
|0,1,0,0,0
|0,0,0,0,0
|0,1,0,0,1
|0,0,1,0,1
|1,0,1,0,1
|1,0,1,0,1
|0,1,0,0,0
|1,0,0,1,0
|0,1,0,0,0
|0,0,1,1,0
|1,0,0,1,0
|0,0,0,0,1
|];
Thanks in advance
For the output section, the following code should work. I only changed the day schedule; the rest is unchanged.
output ["\n" ++ "MiniZinc paper: Exams schedule " ++ "\n" ]
++
[
if t mod 10 = 1 /\ r = 1 then
"\n\nDay " ++ show(d) ++ " \n"
else "" endif ++
if r=1 then "\n" else " " endif ++
show(Timetable_exams[t,r])
| d in 1..Days, t in 1+(d-1)*10..(Slotstime div Days)*d, r in 1..Rooms
]
++[ "\n"]++
[
"\nExams_Room: ", show(ExamsRoom), "\n",
"Exams_Start: ", show(ExamsStart), "\n",
]
++["Participants: "]++
[
if e=Exams then " " else " " endif ++
show (sum([bool2int(Enrollments[st,e]>0)| st in 1..Students]))
|e in 1..Exams
];
If the days must be numbered "I", "II", etc., you can define a string array with the day names, e.g.
array[1..Days] of string: DaysStr = ["I","II","III","IV","V"];
and then use it in the output loop:
% ....
if t mod 10 = 1 /\ r = 1 then
"\n\nDay " ++ DaysStr[d] ++ " \n" % <---
else "" endif ++
% ....
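The single comprehension works because, with 10 slots per day, day d covers slots 1+(d-1)*10 through 10*d, and the first slot of every day is the one with t mod 10 = 1. The arithmetic can be replayed in bash as a quick sanity check (10 slots and 5 days are taken from the data; `day_range` is a hypothetical helper).

```shell
#!/usr/bin/env bash
# Assumes 10 slots per day (Slotstime div Days) and 5 days, as in the data.
SLOTS_PER_DAY=10
DAYS=5
DAY_NAMES=(I II III IV V)   # same order as the DaysStr array

# Print the first and last slot of day $1 (1-based), matching
# t in 1+(d-1)*10 .. (Slotstime div Days)*d in the MiniZinc comprehension.
day_range() {
    local d=$1
    echo "$(( 1 + (d - 1) * SLOTS_PER_DAY )) $(( SLOTS_PER_DAY * d ))"
}

for ((d = 1; d <= DAYS; d++)); do
    read -r first last <<< "$(day_range "$d")"
    # The "Day ..." header fires at the slot where t mod 10 = 1, i.e. at `first`.
    echo "Day ${DAY_NAMES[d-1]}: slots ${first}..${last}, header at t=${first} (t mod 10 = $(( first % SLOTS_PER_DAY )))"
done
```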
Later update:
Another way to make the model a little more general (and smaller) is to replace the huge Unavailability matrix (and the constraint using it) with this:
set of int: UnavailabilitySlots = {5,6};
% ....
constraint
forall(c in 1..Slotstime, p in 1..Rooms) (
if c mod 10 in UnavailabilitySlots then
Timetable_exams[c,p] = 0
else
true
endif
);
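With 10 slots per day, `c mod 10 in {5,6}` selects exactly the rows that are set to 1 in the original Unavailability matrix (slots 5 and 6 of each day). A bash sketch listing those slots, assuming Slotstime = 50 (`unavailable_slots` is a hypothetical helper). Note that the last slot of each day (10, 20, ...) has c mod 10 = 0, so it would have to be written as 0, not 10, in UnavailabilitySlots.

```shell
#!/usr/bin/env bash
# Lists the slots blocked by `c mod 10 in UnavailabilitySlots`,
# assuming 10 slots per day and 5 days (Slotstime = 50).
unavailable_slots() {
    local slots_per_day=10 total=50 c
    local -a out=()
    for ((c = 1; c <= total; c++)); do
        case $(( c % slots_per_day )) in
            5|6) out+=("$c") ;;   # UnavailabilitySlots = {5,6}
        esac
    done
    echo "${out[*]}"
}

unavailable_slots   # prints: 5 6 15 16 25 26 35 36 45 46
```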
Yet another comment:
The original model has a flaw: it allows exams that span two days, e.g. the last 2 hours of day I and the first 2 hours of day II. I think the following extra (and not so pretty) constraint fixes that. Again, the magic "10" (slots per day) is used.
constraint
% do not pass over a day limit
forall(e in 1..Exams) (
not(exists(t in 1..Exams_duration[e]-1) (
% compare consecutive slots in 0-based form, so that slot 10 -> 11 counts as a day change
(ExamsStart[e]+t-2) mod 10 > (ExamsStart[e]+t-1) mod 10
))
)
;
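Because slot 10 mod 10 is 0, a comparison of consecutive slots is easy to get off by one. A brute-force check in bash, under the same 10-slots-per-day assumption: `fits_in_day` compares the day of the first and last slot of an exam, and `mod_rule_ok` applies the consecutive-slot test on 0-based slot numbers (both helper names are hypothetical). For every start and duration the two agree.

```shell
#!/usr/bin/env bash
# Assumes 10 slots per day; slots are 1-based as in the MiniZinc model.
SLOTS_PER_DAY=10

# Day (1-based) of slot $1: slots 1..10 -> 1, 11..20 -> 2, ...
day_of() { echo $(( ($1 - 1) / SLOTS_PER_DAY + 1 )); }

# Succeeds if an exam starting at slot $1 with duration $2 stays on one day.
fits_in_day() {
    local start=$1 dur=$2
    [ "$(day_of "$start")" -eq "$(day_of $(( start + dur - 1 )))" ]
}

# Succeeds if (s+t-2) mod 10 > (s+t-1) mod 10 holds for no t in 1..dur-1,
# i.e. no pair of consecutive slots (0-based) wraps into a new day.
mod_rule_ok() {
    local s=$1 dur=$2 t
    for ((t = 1; t <= dur - 1; t++)); do
        if (( (s + t - 2) % SLOTS_PER_DAY > (s + t - 1) % SLOTS_PER_DAY )); then
            return 1
        fi
    done
    return 0
}

# Usage: starts for which a 2-slot exam would cross a day limit
for ((s = 1; s <= 49; s++)); do
    fits_in_day "$s" 2 || printf '%s ' "$s"
done
echo   # the line above prints: 10 20 30 40
```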