gnuplot plot difference of 2 columns

I have two files A and B. Both files contain 2 columns, x and y.
Now, I want to plot a graph of x vs (yA - yB). Does gnuplot provide a command for this?
One more thing: let's say xA and xB are not the same. How should I plot a graph where the x-axis contains all elements that are in both xA and xB, and the y-axis is the difference of the corresponding y-components?

First, preprocess the files with join in bash:
join <(sort -k1,1 file1) <(sort -k1,1 file2) > file3
Sorting the files is essential: join expects both inputs to be sorted on the join field, otherwise it will not find all matching lines.
Then you can use the result to draw the graph:
plot '< sort -n file3' using 1:($2-$3) with lines
Again, numeric sorting is needed here: join works on lexically sorted input (so 10 comes before 9), which would make the line jump back and forth and cross itself.
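To make the join-then-plot idea concrete, here is a small Python sketch (an illustration, not part of the original answer) of what the pipeline computes: keep only x values present in both files, then take yA - yB in numeric x order. It assumes whitespace-separated two-column input, and the helper names are hypothetical. Unlike join, it matches x values numerically (so 1 and 1.0 count as the same key) and keeps only the last y for a repeated x.

```python
# Sketch only: an inner join on column 1 followed by yA - yB, i.e. the rows that
# `join` keeps, then the numeric ordering that `sort -n` restores for plotting.

def read_xy(lines):
    """Parse 'x y' lines into a dict {x: y}; lines with fewer than 2 fields are skipped."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            table[float(fields[0])] = float(fields[1])
    return table


def joined_difference(lines_a, lines_b):
    """Return [(x, yA - yB)] for every x present in both inputs, in numeric order."""
    a = read_xy(lines_a)
    b = read_xy(lines_b)
    common = sorted(set(a) & set(b))  # numeric sort, unlike join's lexical order
    return [(x, a[x] - b[x]) for x in common]


if __name__ == "__main__":
    A = ["1 10", "2 20", "9 90", "10 100"]
    B = ["2 5", "9 9", "10 1", "11 7"]
    for x, d in joined_difference(A, B):
        print(x, d)
```

Here x = 1 and x = 11 are dropped because each appears in only one file, which is the same intersection that join produces.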

I think this might be a good job for paste.
plot "<paste A B" u 1:($2-$4) w points  # x = xA, y = yA-yB; use whatever line style you want
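paste pairs lines by position, not by x value. A minimal Python sketch of that pairing (an illustration with hypothetical helper names) shows that $2-$4 is yA - yB for the same row number, which is only a difference "at x" when both files list the same x values in the same order:

```python
# Sketch only: emulate what `paste A B` hands to gnuplot. Row i of A is glued to
# row i of B, so the columns are xA yA xB yB and $2-$4 is yA - yB for the same
# row number, whatever the x values are.

def paste_rows(lines_a, lines_b):
    """Join corresponding lines; unlike real paste, stop at the shorter input."""
    return [a.split() + b.split() for a, b in zip(lines_a, lines_b)]


def row_differences(lines_a, lines_b):
    """Return (xA, xB, yA - yB) per row, i.e. using 1:($2-$4) and 3:($2-$4)."""
    out = []
    for row in paste_rows(lines_a, lines_b):
        xa, ya, xb, yb = (float(v) for v in row[:4])
        out.append((xa, xb, ya - yb))
    return out


if __name__ == "__main__":
    for xa, xb, d in row_differences(["1 10", "2 20"], ["1 4", "3 5"]):
        print(xa, xb, d)
```

In the second row xA = 2 and xB = 3, yet the difference 15 is still computed, which is why paste alone does not give the intersection.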
For the files where xA != xB, I'm a little unclear whether you want to plot only the points which are common to both (the intersection of the two sets) or all the points (the union of the sets). The union is easy:
plot "<paste A B" u 1:($2-$4) w points ls 1,\
"<paste A B" u 3:($2-$4) w points ls 1
The intersection is hard using only Unix command-line tools (especially if you want to preserve the order of your input).
Using Python, though, it's not too bad:
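#joinfiles.py
import sys

# Read the two-column files given on the command line (skipping blank lines)
with open(sys.argv[1]) as f1:
    xA, yA = zip(*[map(float, line.split()) for line in f1 if line.strip()])
with open(sys.argv[2]) as f2:
    xB, yB = zip(*[map(float, line.split()) for line in f2 if line.strip()])

# Look up B's y by x value (not by position), keeping the order of file A
yB_of = dict(zip(xB, yB))
for x, y in zip(xA, yA):
    if x in yB_of:
        sys.stdout.write('%f %f %f\n' % (x, y, yB_of[x]))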
and then from gnuplot:
plot "<python joinfiles.py A B" u 1:($2-$3) #...

Related

How to plot data from a file starting at the line with a specific string

I am trying to execute a command similar to
plot "data.asc" every ::Q::Q+1500 using 2 with lines
But I have a problem with that "Q" number. It's not a known value but the number of the line containing some specific string. Let's say I have a line with the string "SET_10:", and the data I want to plot comes after this specific line. Is there some way to identify the number of the line with that specific string?
An easy way is to pass the data through GNU sed to print just the wanted lines:
plot "< sed -n <data.asc '/^SET_10:/,+1500{/^SET_10:/d;p}'" using 1:2 with lines
The -n suppresses automatic output; the address range a,b (here /^SET_10:/,+1500, a GNU extension meaning the matching line plus the next 1500 lines) selects which lines the {...} commands apply to; those commands delete the trigger line and print (p) the others.
To make sure you have a compatible GNU sed, try the command on its own for a small number of lines, e.g. 5:
sed -n <data.asc '/^SET_10:/,+5{/^SET_10:/d;p}'
If this does not output the first 5 lines of your data, an alternative is to use awk, as it is too difficult in sed to count lines without this GNU-specific syntax. Test the (standard POSIX, not GNU-specific) awk equivalent:
awk <data.asc 'end!=0 && NR<=end{print} /^start/{end=NR+5}'
and if that is ok, use it in gnuplot as
plot "< awk <data.asc 'end!=0 && NR<=end{print} /^start/{end=NR+1500}'" using 1:2 with lines
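The awk condition is easy to misread because the print test runs before the marker test on each line. A small Python sketch of the same logic (an illustration; the function name is hypothetical, and a literal prefix match stands in for the /^.../ regex) makes the order explicit:

```python
# Sketch only: the awk filter 'end!=0 && NR<=end{print} /^start/{end=NR+n}'.
# The print test runs before the marker test, so the marker line itself is
# never printed, and the n lines after it are.

def lines_after_marker(lines, marker, n):
    end = 0  # 1-based number of the last line to print; 0 means "marker not seen"
    selected = []
    for nr, line in enumerate(lines, start=1):  # nr plays the role of awk's NR
        if end != 0 and nr <= end:
            selected.append(line)
        if line.startswith(marker):  # /^marker/ for a literal marker
            end = nr + n
    return selected


if __name__ == "__main__":
    data = ["1 1", "2 0", "SET_10:", "3 1", "4 0", "5 1"]
    print(lines_after_marker(data, "SET_10:", 2))
```

With n = 2 only the two lines following SET_10: are selected; in the real use case n would be 1500.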
Here's a version entirely within gnuplot, with no external commands needed. I tested this on gnuplot 5.0 patchlevel 3, using the following bash commands to create a simple dataset of 20 data lines plus a line with "start" in column 1; only the 5 lines after it are to be plotted. You don't need to do this; it only creates test data.
for i in $(seq 1 20)
do let j=i%2
echo "$i $j"
done >data.asc
sed -i data.asc -e '5a\
start'
The actual gnuplot script uses a variable endlno, initially set to NaN (not-a-number), and a function f which takes 3 parameters: a boolean start saying whether column 1 holds the matching string, lno the current line number, and val the current column 1 value. If the line number is less than or equal to the ending line number (which can only be true once endlno is no longer NaN), f returns val; otherwise, if start is true, the wanted ending line number is stored in endlno and NaN is returned. Before the start line has been seen, NaN is returned.
gnuplot -persist <<\!
endlno=NaN
f(start,lno,val) = ((lno<=endlno)?val:(start? (endlno=lno+5,NaN) : NaN))
plot "data.asc" using (f(stringcolumn(1)eq "start", $0, $1)):2 with lines
!
Since gnuplot does not plot points with NaN values, the lines up to and including the start line, and the lines after the wanted number of lines, are ignored.
In your case you need to change 5 to 1500 and "start" to "SET_10:".
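The trick relies on comparisons with NaN always being false until endlno is set. The same state machine can be sketched in Python (an illustrative analogue, not gnuplot code; helper names are hypothetical), where NaN comparisons behave the same way:

```python
import math

# Sketch only: a Python analogue of the gnuplot function
#   f(start,lno,val) = ((lno<=endlno)?val:(start? (endlno=lno+n,NaN) : NaN))
# Comparisons with NaN are False in Python too, so nothing is kept until the
# marker sets endlno.

def to_float(text):
    """Convert a field like gnuplot's $1; non-numeric text becomes NaN."""
    try:
        return float(text)
    except ValueError:
        return math.nan


def mask_after_start(rows, marker, n):
    endlno = math.nan
    xs = []
    for lno, fields in enumerate(rows):  # lno plays the role of $0 (0-based)
        start = fields[0] == marker      # stringcolumn(1) eq "start"
        if lno <= endlno:
            xs.append(to_float(fields[0]))
        elif start:
            endlno = lno + n
            xs.append(math.nan)
        else:
            xs.append(math.nan)
    return xs


if __name__ == "__main__":
    rows = [["1", "1"], ["2", "0"], ["start"], ["3", "1"], ["4", "0"], ["5", "1"]]
    print(mask_after_start(rows, "start", 2))
```

Every row still produces a value, as in the gnuplot version; the NaN entries are simply the points that gnuplot would not draw.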

Plotting multiple sets of information from file with Gnuplot

I have a file that looks like this:
0 0.000000
1 0.357625
2 0.424783
3 0.413295
4 0.417723
5 0.343336
6 0.354370
7 0.349152
8 0.619159
9 0.871003
0.415044
The last line is the mean of the N entries listed right above it. What I want to do is to plot a chart that has each point listed and a line with the mean value. I know it involves replot in some way but I can't read the last value separately.
You can make two passes using the stats command to get the necessary data
stats datafile u 1 nooutput
stats datafile u ($0==(STATS_records-1)?$1:1/0) nooutput
The first pass of stats will summarize the data file. What we are actually interested in is the number of records in the file, which will be saved in the variable STATS_records.
The second pass will compute a column to analyze. If the line number (the value of $0) is equal to one less than the number of records (lines are numbered from 0, so this is the last line), then we get this value; otherwise we get an invalid value. This causes the stats command to only look at this last line. Now the value of the last line is stored in STATS_max (as well as in STATS_min and several other variables).
Now we can create the plot using
plot datafile u 1:2, STATS_max
where we explicitly state columns 1 and 2 so that the first plot specification ignores that last line (actually, plot datafile on its own should default to this column selection and skip that line automatically, but this makes certain).
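The two-pass idea can be illustrated outside gnuplot. This Python sketch (an analogue with a hypothetical function name, not the stats implementation) counts the records, masks every row except the last one with NaN, and takes the maximum of what survives:

```python
import math

# Sketch only: the two-pass stats idea. Pass 1 counts records (STATS_records);
# pass 2 keeps column 1 only on the last record and maps every other row to NaN
# (gnuplot's 1/0), so the maximum of what remains is the last value (STATS_max).

def last_record_value(rows):
    records = len(rows)
    masked = [float(r[0]) if i == records - 1 else math.nan
              for i, r in enumerate(rows)]            # $0 is 0-based
    valid = [v for v in masked if not math.isnan(v)]  # invalid values are skipped
    return max(valid)


if __name__ == "__main__":
    text = "0 0.000000\n1 0.357625\n2 0.424783\n0.415044"
    rows = [line.split() for line in text.splitlines()]
    print(last_record_value(rows))
```

Because only one valid value remains, its max, min and mean all coincide, which is why several STATS_ variables hold the same number.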
An alternative way is to use external programs to filter the data. For example, if the Linux command tail is available, we could do this (see note 1 below):
ave = system("tail -1 datafile")
plot datafile u 1:2, ave+0
Here, ave will contain the last row of the file as a string. In the plot command we add 0 to it to force it to change to a number (otherwise gnuplot will think it is a filename).
Other external programs can be used to read that last line as well. For example, the following call to python3 (using Windows style shell quotes) does the same:
ave = system('python -c "print(open(\"datafile\",\"r\").readlines()[-1])"')
or the following using AWK (again with Windows style shell quotes) has the same result:
ave = system('awk "END{print}" datafile')
or even using Perl (again with Windows shell quotes):
ave = system('perl -lne "END{print $last} $last=$_" datafile')
1 This use of tail relies on a now obsolete (according to the GNU manuals) command-line option; tail -n 1 datafile is the recommended form. However, the shorter form is less to type, and if forward compatibility is not needed (i.e. you are using this script once), there is no reason not to use it.
Gnuplot ignores those lines with missing data (for example, the last line of your datafile has no column 2). Then, you can simply do the following:
stats datafile using 2 nooutput
plot datafile using 1:2, STATS_mean
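The reason this works is that rows lacking column 2 are skipped. A short Python sketch of the same filtering (an illustration with a hypothetical function name, using the question's data) computes the mean of column 2 only over rows that actually have it:

```python
# Sketch only: `stats datafile using 2` skips rows that have no second column,
# so the trailing summary line does not enter STATS_mean.

DATA = """\
0 0.000000
1 0.357625
2 0.424783
3 0.413295
4 0.417723
5 0.343336
6 0.354370
7 0.349152
8 0.619159
9 0.871003
0.415044"""


def mean_of_column(text, col):
    values = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= col:  # missing column -> the line is ignored
            values.append(float(fields[col - 1]))
    return sum(values) / len(values)


if __name__ == "__main__":
    print(mean_of_column(DATA, 2))
```

On the question's data the result is about 0.4150446, consistent with the file's last line (0.415044), which appears to be the same mean written with six decimals.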
There is no need for using external tools or using stats (unless the value hasn't been calculated already, but in your example it has).
During plotting of the data points you can assign the value of the first column, e.g. to the variable mean.
Since the last row doesn't contain a second column, no data point will be plotted for it, but its value will be held in the variable mean.
If you replace reset session with reset and read the data from a file instead of a datablock, this will work with gnuplot 4.6.0 or even earlier versions.
Minimal solution:
plot FILE u (mean=$1):2, mean
Script: (nicer plot and including data for copy & paste & run)
### plot values as points and last value from column 1 as line
reset session
$Data <<EOD
0 0.000000
1 0.357625
2 0.424783
3 0.413295
4 0.417723
5 0.343336
6 0.354370
7 0.349152
8 0.619159
9 0.871003
0.415044
EOD
set key top center
plot $Data u (mean=$1):2 w p pt 7 lc rgb "blue" ti "Data", \
mean w l lw 2 lc rgb "red"
### end of script
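The effect of the assignment inside the using expression can be mimicked in Python. This sketch (an illustration with a hypothetical function name, following the answer's description of the behaviour) updates mean on every row, so it ends with column 1 of the last row, while only rows with two columns become points:

```python
# Sketch only: the effect of `plot FILE u (mean=$1):2, mean`. The assignment in
# the using expression runs row by row, so `mean` is left holding column 1 of
# the last row; a row without column 2 yields no data point.

def points_and_last_value(text):
    points = []
    mean = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        mean = float(fields[0])          # (mean=$1), evaluated for every row
        if len(fields) >= 2:             # no column 2 -> no point plotted
            points.append((mean, float(fields[1])))
    return points, mean


if __name__ == "__main__":
    pts, mean = points_and_last_value("0 0.0\n1 0.5\n2 0.4\n0.3")
    print(pts, mean)
```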

Plotting the difference of two columns in the same file

I have a file with 12 columns. I'd like to plot the data with x axis being my 1st column and y axis being the difference between the 2nd and the 8th columns.
I tried plot "test.dat" using 1:(8-2), but naturally the expression (8-2) is just evaluated as the number 6. How can I do this?
You are missing the $ signs. Add them, and they will reference the column contents:
plot "test.dat" using 1:($8-$2) w linespoints
$1 is a shortcut for column(1). If the column numbers are stored in variables i and j, you must use the column function to select the respective columns:
i = 8
j = 2
plot "test.dat" using 1:(column(i)-column(j)) w lp
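The only subtlety in column(i) is that gnuplot numbers columns from 1. A small Python sketch (hypothetical helper names) makes the 1-based mapping explicit:

```python
# Sketch only: using 1:(column(i)-column(j)) with 1-based column numbers,
# as in gnuplot, mapped onto Python's 0-based lists.

def column(fields, k):
    """1-based column access, analogous to gnuplot's column(k) or $k."""
    return float(fields[k - 1])


def x_and_difference(lines, i, j):
    rows = (line.split() for line in lines if line.strip())
    return [(column(f, 1), column(f, i) - column(f, j)) for f in rows]


if __name__ == "__main__":
    line = "1 2 3 4 5 6 7 10 9 10 11 12"  # 12 columns: $2 = 2, $8 = 10
    print(x_and_difference([line], 8, 2))
```

Writing 8-2 without column(...) would only compute the constant 6, which is exactly the mistake in the question.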
Alternatively, let awk compute the difference, passing column 1 through as x:
plot "< awk '{print $1, $8 - $2}' test.dat" using 1:2

Exclude data in gnuplot with a condition

I have a data file with 3 columns and I want to plot 2 of them. But I want to use the third one as a condition to exclude the line from the plot or not (for example, if $3 < 10 the data line isn't valid). I know there is set datafile missing, but this case is somewhat peculiar and I don't know how to do that. Any help is appreciated...
You can use conditional logic in the using expression in the plot command:
plot 'data.dat' u 1:($3 < 10 ? 1/0 : $2)
This command plots 1/0 (it skips that data point) if the value in the third column is < 10, and otherwise plots the value in the second column.
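The same conditional logic, sketched in Python (an illustration with a hypothetical function name; NaN stands in for gnuplot's undefined 1/0), shows which rows survive the filter:

```python
import math

# Sketch only: u 1:($3 < 10 ? 1/0 : $2). Rows whose third column is below the
# threshold get an undefined y (NaN here, 1/0 in gnuplot) and are dropped.

def filtered_points(lines, threshold=10.0):
    points = []
    for line in lines:
        x, y, z = (float(v) for v in line.split()[:3])
        y_plot = math.nan if z < threshold else y
        if not math.isnan(y_plot):  # gnuplot skips undefined points
            points.append((x, y_plot))
    return points


if __name__ == "__main__":
    print(filtered_points(["1 5 12", "2 6 3", "3 7 10"]))
```

A row with $3 exactly equal to 10 is kept, since the test is strictly less than 10.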

Combine multiple data files in a single plot

I have several data files produced from a Fortran code. All the data files follow the same naming style, that is: data###.out, where ### starts from 001 and ends at 500. I know that in order to read and plot several data files in gnuplot I must use
plot for [i=1:500] sprintf('data00%i.out', i) u 1:2 w d lc rgb 'black'
However, this only works up to data009.out. From 010 to 099 there should be only one leading zero, and from 100 to 500 none. How can I obtain this?
To pad with zeroes an integer printed with three digits, the correct format is %03i. Try:
plot for [i=1:500] sprintf('data%03i.out', i) u 1:2 w d lc rgb 'black'
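Python's printf-style formatting uses the same %03i conversion, so a short sketch (hypothetical function name) can be used to check which file names the format produces before running the plot:

```python
# Sketch only: Python's printf-style "%03i" behaves like the gnuplot format:
# width 3, padded with leading zeros.

def data_filenames(first=1, last=500):
    return ["data%03i.out" % i for i in range(first, last + 1)]


if __name__ == "__main__":
    names = data_filenames()
    print(names[0], names[9], names[99], names[-1])
    # prints: data001.out data010.out data100.out data500.out
    print("data00%i.out" % 10)  # the question's format; prints: data0010.out
```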
