Take the last data column with AWK to create a gnuplot linespoints graph

I am working with gnuplot linespoints to create a cumulative and a normal distribution graph. I have created one file that provides the data for both graphs.
I ran into a problem when trying to plot the last column.
Here is my script to create the second graph.
plot.plt
set term pos eps
set style data linespoints
set style line 1 lc 8 lt -1
set size 1,1
set yr [0:20]
set key below
set grid
set output 'output.eps'
plot "<awk '{i=i+$3; print $1,i}' data.dat" smooth cumulative t 'twitter' ls 1
data.dat
5.0 1 0.10
9.0 5 0.20
13.0 7 0.30
14.0 1 0.20
15.0 9 0.20
I want the x axis to use the first column and the y axis to use the last column, so the y-axis range must be between 0 and 1. Which part should I change? Thanks.

Using smooth cumulative is enough, no need for awk. You are doing the same operation twice, once with gnuplot and once with awk. Simply do
plot 'data.dat' using 1:3 smooth cumulative
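To see why the awk step is redundant, here is a small Python sketch (not part of the original answer; the variable names are illustrative) of what a running sum does to the third column of data.dat. It assumes the rows are already sorted by x, as they are in the question. Accumulating once ends at 1.0, while accumulating twice, which is what awk followed by smooth cumulative amounts to, overshoots.

```python
from itertools import accumulate

# Third column of data.dat from the question
weights = [0.10, 0.20, 0.30, 0.20, 0.20]

# One running sum: the y values that "smooth cumulative" would plot
once = list(accumulate(weights))

# Running sum of a running sum: awk accumulates, then gnuplot accumulates again
twice = list(accumulate(once))

print([round(v, 2) for v in once])   # last value is 1.0
print([round(v, 2) for v in twice])  # last value is well above 1
```

With a single accumulation the curve stays within 0 to 1, so the y range in plot.plt can be narrowed to match.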

Related

xtics label with conditions using gnuplot

I am interested in labeling the xtics with strings from a column in a file.
The file is written in the following manner:
Index Name Status Value
1 Ver1 with 0.3
2 Ver1 without 0.25
3 Ver2 with 0.35
4 Ver3 with 0.27
The data should be plotted with a conditional plot:
plot file u (strcol(3) eq "with"?$1:1/0):($4) w p pt 7 notitle
The xtics should be labeled with the data contained in column 2. If all values are used, this can be done with xticlabel(2). But I only want to use the filtered data, to get a plot like:
|
| x x x
|
----------------------------
ver1 ver2 ver3
The question is: How can I label the xtics using only the filtered values?
Thanks in advance!
You could impose the same condition in xticlabel as well, i.e., to use column 2 if required or pass an undefined value instead:
plot 'file.dat' u \
(strcol(3) eq "with"?$1:1/0):4:xticlabel((strcol(3) eq "with")?strcol(2):1/0) \
w p pt 7 notitle

Two Boxplots for one X position using gnuplot

I have 2 sets of data, A and B, each with y values for x=100, 200, 300. I want to create one graph which shows the difference between these two data sets. This means that for each x there will be two boxplots (one for data A and one for data B).
For example, this is how the columns are organized in my data.
DataSet A
# x=100 200 300
1 2 3
1.1 2.1 3.1
1.2 2.2 3.2
1 2 3
1.01 2.01 3.01
DataSet B
# x=100 200 300
6 7 9
6.1 7.1 9.1
6.2 7.2 9.2
6 7 9
6.01 7.01 9.01
I was able to get two graphs out of this data using:
set style fill solid 0.25 border -1
set style boxplot outliers pointtype 7
set style data boxplot
set xtics ('100' 1, '200' 2, '300' 3)
plot for [i=1:3] "A.txt" using (i):i notitle
plot for [i=1:3] "B.txt" using (i):i notitle
However, I am facing issues when combining it into one.
Please help.
If you want to have them stacked above each other (in case they don't overlap), then you can just combine the two plots into one with
plot for [i=1:3] "A.txt" using (i):i notitle,\
for [i=1:3] "B.txt" using (i):i notitle
If they can overlap, you may want to put them side-by-side with
set boxwidth 0.3
plot for [i=1:3] "A.txt" using (i-0.15):i notitle,\
for [i=1:3] "B.txt" using (i+0.15):i notitle
These are just two examples of how you could combine those plots.

Plot data points with connecting lines but which leave gaps

I like following linespoints plotting style:
http://www.gnuplotting.org/join-data-points-with-non-continuous-lines/
However, I have encountered an issue when I plot several lines with this style:
As you can see the second series of points blank-out also the first series (lines and points), what I don't want to happen.
Feature of gnuplot which makes this possible is pointinterval and pointintervalbox.
Documentation of gnuplot:
A negative value of pointinterval, e.g. -N, means that point symbols
are drawn only for every Nth point, and that a box (actually circle)
behind each point symbol is blanked out by filling with the background
color. The command set pointintervalbox controls the radius of this
blanked-out region. It is a multiplier for the default radius, which
is equal to the point size.
http://www.bersch.net/gnuplot-doc/set-show.html#set-pointintervalbox
Since the documentation says the region is filled with the background color, I was hoping that using a transparent background would resolve the issue, but it seems that white is used instead.
Gnuplot version
gnuplot> show version long
G N U P L O T
Version 5.0 patchlevel 0 last modified 2015-01-01
Copyright (C) 1986-1993, 1998, 2004, 2007-2015
Thomas Williams, Colin Kelley and many others
gnuplot home: http://www.gnuplot.info
faq, bugs, etc: type "help FAQ"
immediate help: type "help" (plot window: hit 'h')
Compile options:
-READLINE +LIBREADLINE +HISTORY
-BACKWARDS_COMPATIBILITY +BINARY_DATA
+GD_PNG +GD_JPEG +GD_TTF +GD_GIF +ANIMATION
-USE_CWDRC +HIDDEN3D_QUADTREE
+DATASTRINGS +HISTOGRAMS +OBJECTS +STRINGVARS +MACROS +THIN_SPLINES +IMAGE +USER_LINETYPES +STATS +EXTERNAL_FUNCTIONS
Minimal Working Example (MWE):
gnuplot-space-line-mark-style.gp
reset
set terminal pngcairo transparent size 350,262 enhanced font 'Verdana,10'
show version
set output 'non-continuous_lines.png'
set border linewidth 1.5
set style line 1 lc rgb '#0060ad' lt 1 lw 2 pt 7 pi -1 ps 1.5
set style line 2 lc rgb '#0020ad' lt 1 lw 2 pt 7 pi -1 ps 1.5
set pointintervalbox 3
unset key
set ytics 1
set tics scale 0.75
set xrange [0:5]
set yrange [0:4]
plot 'plotting_data1.dat' with linespoints ls 1,\
'plotting_data2.dat' with linespoints ls 2
plotting_data1.dat
# X Y
1 2
2 3
3 2
4 1
plotting_data2.dat
# X Y
1.2 2.4
2 3.5
3 2.5
4 1.2
UPDATE
A working pgfplots solution is given on tex.stackoverflow.com
You can do a lot with gnuplot. It's just a matter of how complicated you allow it to get.
You can realize the gap with two-step plotting. First: plot only the points. Second: plot vectors, which are the lines between the points, shortened by a bit of geometry.
The parameter L1 determines the gap and needs to be adjusted to the data and graph scale. Tested with gnuplot 5.0 and 5.2.
Revised version:
Here is a version which creates gaps independent of the terminal size and the graph scale. It just requires a bit more scaling. However, it needs the sizes of the terminal and the graph, which are stored in GPVAL_... variables that are only available after plotting, so the procedure unfortunately requires replotting.
I'm not sure whether this works for all terminals. I just tested on a wxt terminal.
Empirical findings (for wxt-terminal on Win7):
pointsize 100 (ps) corresponds to 600 pixels (px), hence: Rpxps=6 (ratio pixel to pointsize )
term size 400,400 (px) corresponds to 8000,8000 terminal units (tu), hence: Rtupx=20 (ratio terminal units to pixels)
Edit: the factor Rtupx apparently is different for different terminals: wxt: 20, qt: 10, pngcairo: 1, you could use the variable GPVAL_TERM for checking the terminal.
Rtupx = 1. # for pngcairo terminal 1 tu/px
if (GPVAL_TERM eq "wxt") { Rtupx = 20. } # 20 tu/px, 20 terminal units per pixel
if (GPVAL_TERM eq "qt") { Rtupx = 10. } # 10 tu/px, 10 terminal units per pixel
The ratios of axis units (au) to terminal units (tu) are different for x and y and are:
Rxautu = (GPVAL_X_MAX-GPVAL_X_MIN)/(GPVAL_TERM_XMAX-GPVAL_TERM_XMIN)
Ryautu = (GPVAL_Y_MAX-GPVAL_Y_MIN)/(GPVAL_TERM_YMAX-GPVAL_TERM_YMIN)
The variable GapSize is given in pointsize units. Actually, the real gap size depends on the pointsize (and also linewidth of the line). For simplicity, here gap size means the distance from the center of the point to where the line starts. So, GapSize=1.5 when having pointsize 1.5 will result in a gap of 0.75 on each side. L3(n) from the earlier version is now replaced by L3px(n) in pixel dimensions and L1 from the earlier version is not needed anymore.
Code:
### "linespoints" with gaps between lines and points
reset session
$Data1 <<EOD
# X Y
0 3
1 2
1.5 1
3 2
4 1
EOD
$Data2 <<EOD
0 0
1 1
2 1
2 2
3 1
3.98 0.98
EOD
GapSize = 1.5
Rtupx = 20. # 20 tu/px, 20 terminal units per pixel
Rpxps = 6. # 6 px/ps, 6 pixels per pointsize
# Ratio: axis units per terminal units
Rxautu(n) = (GPVAL_X_MAX-GPVAL_X_MIN)/(GPVAL_TERM_XMAX-GPVAL_TERM_XMIN)
Ryautu(n) = (GPVAL_Y_MAX-GPVAL_Y_MIN)/(GPVAL_TERM_YMAX-GPVAL_TERM_YMIN)
dXpx(n) = (x3-x0)/Rxautu(n)/Rtupx
dYpx(n) = (y3-y0)/Ryautu(n)/Rtupx
L3px(n) = sqrt(dXpx(n)**2 + dYpx(n)**2)
x1px(n) = dXpx(n)*GapSize*Rpxps/L3px(n)
y1px(n) = dYpx(n)*GapSize*Rpxps/L3px(n)
x2px(n) = dXpx(n)*(L3px(n)-GapSize*Rpxps)/L3px(n)
y2px(n) = dYpx(n)*(L3px(n)-GapSize*Rpxps)/L3px(n)
x1(n) = x1px(n)*Rtupx*Rxautu(n) + x0
y1(n) = y1px(n)*Rtupx*Ryautu(n) + y0
x2(n) = x2px(n)*Rtupx*Rxautu(n) + x0
y2(n) = y2px(n)*Rtupx*Ryautu(n) + y0
set style line 1 pt 7 ps 1.5 lc rgb "black"
set style line 2 lw 2 lc rgb "black"
set style line 3 pt 7 ps 1.5 lc rgb "red"
set style line 4 lw 2 lc rgb "red"
plot \
$Data1 u (x3=NaN, y3=NaN,$1):2 w p ls 1 notitle, \
$Data1 u (y0=y3,y3=$2,x0=x3,x3=$1,x1(0)):(y1(0)): \
(x2(0)-x1(0)):(y2(0)-y1(0)) w vectors ls 2 nohead notitle, \
$Data2 u (x3=NaN, y3=NaN,$1):2 w p ls 3 notitle, \
$Data2 u (y0=y3,y3=$2,x0=x3,x3=$1,x1(0)):(y1(0)): \
(x2(0)-x1(0)):(y2(0)-y1(0)) w vectors ls 4 nohead notitle
replot
### end of code
Result: (two different terminal sizes)
Explanations:
Question: Why is there the argument (n) for L3(n), x1(n), y1(n), x2(n), y2(n)?
n is always 0 when L3(n),... are computed and is not used on the right hand side.
Answer:
To make them non constant-expressions. Alternatively, one could
add x0,x3,y0,y3 as variables, e.g. L3(x0, y0, x3, y3); however, the
compactness would be lost.
Question: What does the using part in plot $Data1 using (x3=NaN,y3=NaN,$1):2 mean?
Answer:
(,) is called a serial evaluation which is documented under the
section Expressions > Operator > Binary in the gnuplot documentation
(only v4.4 or newer).
Serial evaluation occurs only in parentheses and is guaranteed to
proceed in left to right order. The value of the rightmost subexpression
is returned.
This is done here for the initialization of (x3,y3) for the
subsequent plot of the line segments as vectors. It is irrelevant for
the plotting of points.
Question: How does this draw N-1 segments/vectors for N points?
Answer:
Setting x3=NaN, y3=NaN when plotting points ensures that for the
first data point the initial data point (x0,y0) is set to (NaN,NaN)
which has the consequence that the evaluation of x1(0) and y1(0) also returns NaN.
Gnuplot in general skips points with NaN, i.e. for the first
data point no vector is drawn. The code draws the line between the
first and second point when the iteration reaches the second point.
Question: How does the second plot '' u ... iterate over all points?
Answer:
gnuplot> h special-filenames explains this:
There are a few filenames that have a special meaning: '', '-', '+' and '++'.
The empty filename '' tells gnuplot to re-use the previous input file in the
same plot command. So to plot two columns from the same input file:
plot 'filename' using 1:2, '' using 1:3
Question: Do we need the parentheses around (y1(0))?
Answer: gnuplot> h using explains this:
Each may be a simple column number that selects the value from one
field of the input file, a string that matches a column label in the first
line of a data set, an expression enclosed in parentheses, or a special
function not enclosed in parentheses such as xticlabels(2).
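The gap geometry behind x1(n), y1(n), x2(n) and y2(n) can be checked outside gnuplot. The following Python sketch is not part of the original answer, and the function name shorten_segment is hypothetical. It pulls both ends of a segment in by a fixed distance along the segment direction and assumes the coordinates are already in one uniform unit such as pixels. Unlike the gnuplot code, it returns None when the segment is too short to leave any line.

```python
import math

def shorten_segment(p0, p3, gap):
    """Pull both ends of the segment p0 -> p3 in by `gap` along its direction.

    Returns (start, end), or None if the segment is not longer than 2*gap.
    """
    dx, dy = p3[0] - p0[0], p3[1] - p0[1]
    length = math.hypot(dx, dy)           # corresponds to L3px in the answer
    if length <= 2 * gap:
        return None                       # nothing left to draw
    ux, uy = dx / length, dy / length     # unit vector from p0 to p3
    start = (p0[0] + ux * gap, p0[1] + uy * gap)
    end = (p3[0] - ux * gap, p3[1] - uy * gap)
    return start, end

print(shorten_segment((0, 0), (10, 0), 2))  # ((2.0, 0.0), (8.0, 0.0))
```

In the gnuplot version the same step is wrapped in conversions between axis units and pixels, because a gap that looks uniform on screen is not uniform in axis units when x and y are scaled differently.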

Gnuplot: plotting only specific values of the dataset using lines or linespoints

Let's suppose my dataset is like this:
0 0.3
1 0.12
2 0.4
3 0.6
4 0.9
...
10 0.23
11 0.6
...
20 0.34
21 0.4
...
and I'd like to plot values of both columns only if $1 % 10 == 0, i.e., (0,0.3), (10,0.23), (20,0.34) and so on... Now, I've written the following conditional script:
plot "data.csv" using 1:(int($1)%10==0?$2:0/0) title 'r=1' with linespoints linewidth 3 linecolor rgb 'blue'
The problem is that lines are not shown, but only points.
This is because, for all rows where the condition is not satisfied, the corresponding value is undefined. Anyway, what I need is quite different; I want those specific values to be just ignored, not to set to undefined. Is there a way to do that just using gnuplot (not awk and so on)?
In case you have all intermediate steps, i.e., a row for every number (which I suspect, based on your axis being labelled iterations), you are best off using the every option
plot "data.csv" every 10 with linespoints
Otherwise, I would use awk inside your script for simplicity
plot "<awk '$1%10==0' data.csv" with linespoints
The problem with your original script is that the filtered-out points are still passed to gnuplot, but with an undefined value. It is a feature that lines are not drawn across undefined points.
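The two suggestions select rows differently, which matters when a row is missing. This Python sketch (an illustration only, with made-up sample data in which the row for x=7 is absent) mimics both: every 10 keeps rows by position, while the awk filter keeps rows by the value in the first column. In both cases the unwanted rows are removed entirely rather than set to undefined, so the remaining points stay connected.

```python
# Hypothetical sample: x runs from 0 to 25, but the row for x=7 is missing
rows = [(x, x * 0.1) for x in range(26) if x != 7]

# Like "every 10": keep the 0th, 10th, 20th, ... row by position
by_position = rows[::10]

# Like awk '$1%10==0': keep rows whose first column is a multiple of 10
by_value = [r for r in rows if r[0] % 10 == 0]

print([x for x, _ in by_position])  # [0, 11, 21]
print([x for x, _ in by_value])     # [0, 10, 20]
```

So every 10 is only equivalent to the value-based filter when no rows are missing and the first row has x=0.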

Gnuplot---clustered rowstacked bars

How can I make clustered rowstacked bars in gnuplot? I know how to get clustered bars, but
not a cluster of rowstacked bars. Thanks!
Edit: in a cluster, stacked bars should use different colors/patterns as well.
I'm not completely sure how to go about doing this, but, one idea is to make it so that the boxes are touching each other
`set boxwidth 1`
That doesn't quite get you a "clustered" look yet -- To get a clustered look, I think you'd need to insert a row (maybe column) of zeros...(I haven't sorted through that one in my head yet) into your datafile where you want a cluster break.
Of course, you wouldn't need to set the boxwidth either I suppose...clustered just depends on the breaking every once in a while...
If I understand the original post right, it should be easy to accomplish with gnuplot if you can preprocess your data to offset x coordinates of specific data series.
To illustrate the approach I will use the following data in 3 data series:
# impulse.dat
0.9 1
1.9 4
2.9 3
3.9 5


1.0 1
2.0 2
3.0 4
4.0 2


1.1 3
2.1 3
3.1 5
4.1 4
Here each series has x-coordinates shifted by .1. To plot it I choose impulses of width 10.
plot [0:5] [0:6] 'impulse.dat' ind 0 w imp lw 10, \
'impulse.dat' ind 1 w imp lw 10, \
'impulse.dat' ind 2 w imp lw 10
Edit: combining this with Matt's suggestion to use boxes would definitely be better:
set boxwidth 0.1
set style fill solid
plot [0:5] [0:6] 'impulse.dat' ind 0 w boxes,\
'impulse.dat' ind 1 w boxes, \
'impulse.dat' ind 2 w boxes
Following is the picture with impulses.
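The preprocessing step, offsetting the x coordinates of each series, could be scripted. Here is a minimal Python sketch under stated assumptions: the function name clustered_blocks and its offset parameter are hypothetical, x positions are taken to be 1, 2, 3, ..., and blocks are separated by two blank lines so that gnuplot's index option can address them.

```python
def clustered_blocks(series, offset=0.1):
    """Format several y-series as gnuplot index blocks with shifted x values."""
    n = len(series)
    blocks = []
    for k, ys in enumerate(series):
        # Centre the cluster on the integer x position
        shift = (k - (n - 1) / 2) * offset
        lines = [f"{x + shift:.1f} {y}" for x, y in enumerate(ys, start=1)]
        blocks.append("\n".join(lines))
    # Two blank lines between blocks, as required by "index"
    return "\n\n\n".join(blocks) + "\n"

# The three series from impulse.dat above
series = [[1, 4, 3, 5], [1, 2, 4, 2], [3, 3, 5, 4]]
print(clustered_blocks(series))
```

Writing the returned text to impulse.dat should reproduce the data used above, which can then be plotted with ind 0, ind 1 and ind 2.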

Resources