How to turn the following tabular dataset into a simple 2D density plot to show a loc-number distribution?
I am new to gnuplot. I worked through a tutorial and made a simple x,y plot with multiple columns of data, which worked fine. Then I tried this answer, but I ran into the errors below even though the x values are defined. I suspect my dataset is fundamentally missing something. What am I doing wrong, and how can I get a simple 2D contour plot from the data below?
Updated based on the suggestions received; the original aim of the question is unchanged.
The following sample input data is used. The file is single-space delimited. The columns give x and y; the z value is either z1 = locid (1 to n) or z2 = loctype (scuba, shower, swimming, restrooms, sushi, cafe, restaurant, etc.).
input data :
ametype amename X(1000) Y(1000) km-to-carpark
Scuba SCUB1 10.72 49.01
Scuba SCUB2 13.88 47.32
Scuba SCUB3 14.58 46.46
Scuba SCUB4 14.52 48.23
Scuba SCUB5 13.05 47.23
Scuba SCUB6 12.21 47.95
Scuba SCUB7 12.66 46.19
Cafe CAFE1 13.97 47.45
Cafe CAFE4 31.63 30.3
Playground PARK2 31.57 30.2
Playground PARK1 27.51 31.87
Cafe CAFE5 67.71 109.09
Scuba SCUB8 68.58 109.54
Scuba SCUB9 67.14 109.99
Cafe CAFE2 13.83 46.24
SUSHI SUSH1 79.59 41.22
SUSHI SUSHI2 73.81 54.14
SUSHI SUSHI3 72.87 55.47
SUSHI SUSHI4 75.05 56.51
RESTROOM RESTR1 74.1 56.05
RESTROOM RESTR2 74.96 57.9
RESTROOM RESTR3 75.06 55.59
RESTAURANT RESTAU1 76.57 56.33
RESTAURANT RESTAU1 76.95 55.1
RESTAURANT RESTAU2 77.75 54.69
RESTAURANT RESTAU2 76.15 54.34
Code tried (for a different dataset where x,y weren't coordinates):
set view map
set contour
set isosample 250, 250
set cntrparam level incremental 1, 0.1
set palette rgbformulae 33,13,10
splot 'data.dat' with lines nosurface
#splot for [col=1:10] 'data.dat' u ($1):(column(col) > 2 ? 1/0 : column(col)):3
errors:
1) All points x value undefined
2) Tabular output of this 3D plot style not implemented
updated:
a) increased data points
c) a rough hand sketch to give a simple impression of the goal.
Expecting a distribution density map like this.
This is an interesting plotting challenge.
The input data format is straightforward, but it needs some processing before the desired contour lines can be plotted with gnuplot.
Comments:
The data is all in one file. Data entries for the types can be random, no order necessary.
The example below creates some random test data with the types "Cafe, Scuba, Sushi" and 20 entries of each. Skip this part if you want to use your own file.
The remaining lines of the script know nothing about the content of the data file (i.e. how many types, type names, coordinates, etc.); everything is determined automatically.
Create a unique list of types. The list will be in the order of first occurrence.
Define a grid (here dx=0.2, dy=0.2, i.e. reasonable values within the data range) and, for each grid point, count the occurrences of each type within a certain radius (here: 0.5). Calculate the density by dividing the count by the area of that circle.
For each type, create the contour lines by plotting to a file indexed by a two-digit number. So far, I don't know how one would easily write this into indexed datablocks to avoid files on disk.
Finally, plot the contour line files and the original data points, using a filter to get the right color.
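The grid-counting step described above can be sketched outside gnuplot as follows. This is only an illustrative Python sketch, not part of the gnuplot script: the function name density_grid and its parameters are hypothetical, and, unlike the script, it takes the grid bounds from the points of a single type rather than from the whole file.

```python
import math

def density_grid(points, dx=0.2, dy=0.2, radius=0.5):
    """Count the points within `radius` of each grid node and divide by the circle area.

    Assumption: the grid spans the bounding box of `points` only.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    nx = math.ceil((max(xs) - x_min) / dx)
    ny = math.ceil((max(ys) - y_min) / dy)
    area = math.pi * radius ** 2
    grid = []
    for j in range(ny + 1):
        row = []
        for i in range(nx + 1):
            gx, gy = x_min + i * dx, y_min + j * dy
            count = sum(1 for (px, py) in points
                        if math.hypot(px - gx, py - gy) <= radius)
            row.append((gx, gy, count / area))  # density per unit area
        grid.append(row)
    return grid

# The first seven Scuba locations from the question's sample data
scuba = [(10.72, 49.01), (13.88, 47.32), (14.58, 46.46), (14.52, 48.23),
         (13.05, 47.23), (12.21, 47.95), (12.66, 46.19)]
grid = density_grid(scuba, dx=0.5, dy=0.5, radius=1.0)
print(len(grid), "rows x", len(grid[0]), "columns")
```

Each row of the result corresponds to one block of lines in the density datablock: x, y and the density at that grid node. A coarser grid and a larger radius are used here only because the sample coordinates are more spread out than the random test data.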
One thing which I haven't figured out yet is set cntrparam level 2: I would like to have exactly 2 contour lines per type, but it seems gnuplot still uses the option set cntrparam level auto 2 and adjusts the number of levels itself.
As you can imagine this graph will probably look pretty confusing with 10 or more types.
For sure, there is room for improvement and no guarantee that there are no bugs in this script. Look at it as a starting point for further optimization. Suggestions for improvements are welcome!
Script:
### plot density contours from simple x,y location file
reset session
FILE = "SO73244095.dat"
# create some random test data
myTypes = "Cafe Scuba Sushi"
set print FILE
do for [p=1:words(myTypes)] {
    a = word(myTypes,p)
    x0 = rand(0)*5
    y0 = rand(0)*5
    do for [i=1:20] {
        print sprintf("%s %s%d %.3g %.3g",a,a,i,invnorm(rand(0))+x0,invnorm(rand(0))+y0)
    }
}
set print
# create a unique list of types
# and extract min, max data
addToList(list,col) = list.(_s='"'.strcol(col).'"', strstrt(list,_s)>0 ? '' : _s)
myTypes = ''
myType(i) = word(myTypes,i)
stats FILE u (myTypes=addToList(myTypes,1),$3):4 name "DATA" nooutput
Nt = words(myTypes)
print sprintf("%d types found: %s",Nt,myTypes)
# get densities for each type
dx = 0.2 # adjust the grid as you like...
dy = 0.2 # ... time for graph creation will increase with finer grid
Radius = 0.5 # adjust radius to a reasonable value
Nx = ceil((DATA_max_x-DATA_min_x)/dx)
Ny = ceil((DATA_max_y-DATA_min_y)/dy)
Dist(x0,y0,x1,y1) = sqrt((x1-x0)**2 + (y1-y0)**2)
print "Please wait..."
set print $Densities
do for [nt=1:Nt] {
    do for [ny=0:Ny] {
        do for [nx=0:Nx] {
            c = 0
            x = DATA_min_x+nx*dx
            y = DATA_min_y+ny*dy
            stats FILE u (Dist(x,y,$3,$4)<=Radius && (strcol(1) eq word(myTypes,nt)) ? c=c+1 : 0) nooutput
            d = c / (pi * Radius**2) # density per unit area
            print sprintf("%g %g %g",x,y,d)
        }
        print "" # empty line
    }
    print ""; print "" # two empty lines
}
set print
# get contour lines via splot into files
myContFile(n) = sprintf("%s.cont%02d",FILE,n)
unset surface
set contour
set cntrparam cubicspline levels 2 # cubicspline for "nice" round curves
do for [nt=1:Nt] {
    set table myContFile(nt)
    splot $Densities u 1:2:3 index nt-1
    unset table
}
# set size ratio -1 # uncomment if equal x,y scale is important
set grid x,y
set key out noautotitle
set xrange[:] noextend
set yrange[:] noextend
set colorsequence classic
myFilter(colD,colF,valF) = strcol(colF) eq valF ? column(colD) : NaN
plot for [i=1:Nt] myContFile(i) u 1:2 w l lc i, \
     for [i=1:Nt] FILE u 3:(myFilter(4,1,myType(i))) w p pt 7 lc i ti myType(i)
### end of script
Result: (a few random examples)
I plotted my data on a linear scale in xmgrace using these numbers:
0.001 0
0.00589391 0.10
0.155206 0.20
0.294695 0.30
0.43222 0.40
0.436149 0.50
0.489194 0.60
0.611002 0.70
0.860511 0.80
0.939096 0.90
0.964637 1
1 1
I used xmgrace on Ubuntu to plot my data and calculate the area under the curve (AUC; Data -> Transformation -> Integration -> SumOnly).
After switching from the linear curve to the logarithmic one, I have a problem calculating the area under the logarithmic curve.
Has anybody else encountered a similar issue?
When you set the axis scale to "logarithmic" you are not actually changing your data, just the way you display it. Therefore, since data transformations such as integration act on the actual data you have, the result is bound to be the same.
In other words, you are integrating f(x) regardless of the scale of the axes. If you want to integrate log(f(x)) you have to first convert f(x) to log(f(x)) by using the Data -> Transformation -> Expression, writing something like y = ln(y) and pressing "apply". Be careful though: the first point (which has y = 0) will get an "inf". You'll need to get rid of it manually (double click on a set, select the first row and use edit -> delete) or don't use exactly 0 in your dataset. If you want to convert also the x axis then open the same "Expression" window and write x = ln(x). Integrate the new dataset and you should get the right number (I got -7.9 I think).
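To make the difference concrete, here is a small Python sketch that integrates the question's data once as-is and once after taking the logarithm of both columns. It is only an illustration under assumptions: the trapezoidal rule is used here, which is not necessarily the exact rule xmgrace applies, and the helper name trapezoid is hypothetical.

```python
import math

def trapezoid(xs, ys):
    """Trapezoidal-rule integral of y over x (assumed integration rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

xs = [0.001, 0.00589391, 0.155206, 0.294695, 0.43222, 0.436149,
      0.489194, 0.611002, 0.860511, 0.939096, 0.964637, 1.0]
ys = [0.0, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.0, 1.0]

# Integrating the data as stored: changing the axis scale does not affect this.
area_linear = trapezoid(xs, ys)

# Transform the data itself, dropping points where ln() is undefined (y = 0).
pairs = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
log_xs = [p[0] for p in pairs]
log_ys = [p[1] for p in pairs]
area_loglog = trapezoid(log_xs, log_ys)

print("linear data:      ", area_linear)
print("ln(x), ln(y) data:", area_loglog)
```

The two results differ because the second call integrates a different dataset, not the same dataset shown on different axes.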
I have a data file, the data for y axis are in the third column. I would like to have the scale given by the first column on the x1 and by the second column on the x2. The standard way would be to:
plot data u 1:3 axes x1y1, data u 2:3 axes x2y1
But that creates two plots, which is something I want to avoid. Of course one could make the above work with colours or with some other dirty tricks, but that makes the whole plot code very cumbersome. Another nice way is to use multiplot as suggested here. But this is not really my goal, as I want to have the real x2 axis.
Another way that came to my mind was to set x2range, but that means going to the source file and figuring out the min and max, or using some statistics in gnuplot (which feels like a waste of time for such a simple thing).
Is there a simpler and more elegant way than the ones above? I am especially concerned that the solution be short to write: the plot can consist of several (>5) datasets and I want to avoid plotting each dataset twice.
This can be done by telling gnuplot to read the file a second time, using the 2nd column as x2 values but with only invalid y values, so that nothing is drawn for this second plot:
set xtics nomirror
set xrange [:] noextend
set x2tics
set x2range [:] noextend
plot '/tmp/f.gdat' u 1:3 w l, '' u 2:(1/0) ax x2y1
As an example, you can plot this data with Celsius on x and Fahrenheit on x2:
0 32 0
30 86 1
60 140 2
90 194 3
Note that this will only be sensible if column 2 is affinely linked with column 1. If you know the affine relation, using set link is much better.
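Whether two columns are affinely related can be checked quickly before plotting. The Python sketch below does this for the Celsius/Fahrenheit example; the helper name affine_from_two_points is hypothetical and the check is only an illustration.

```python
def affine_from_two_points(x_a, y_a, x_b, y_b):
    """Return (slope, intercept) of the line y = slope*x + intercept through two points."""
    slope = (y_b - y_a) / (x_b - x_a)
    intercept = y_a - slope * x_a
    return slope, intercept

# Rows of the example file: column 1 (Celsius), column 2 (Fahrenheit), column 3 (y)
rows = [(0, 32, 0), (30, 86, 1), (60, 140, 2), (90, 194, 3)]

slope, intercept = affine_from_two_points(rows[0][0], rows[0][1],
                                          rows[-1][0], rows[-1][1])

# The x2 axis only makes sense if every row satisfies the same relation.
is_affine = all(abs(slope * c1 + intercept - c2) < 1e-9 for c1, c2, _ in rows)
print(slope, intercept, is_affine)
```

With the slope and intercept known, the relation could be handed to gnuplot directly, for instance as set link x2 via 1.8*x+32 inverse (x-32)/1.8 for this example.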
I am wondering what the best way to add error bars is, given the format of my data.
The standard way of adding errors bars is discussed here, for instance.
My original data is in ranges
Model Decreasing Constant Increasing
2025 73-78 80-85 87-92
2035 63-68 80-85 97-107
2050 42-57 75-90 104.5-119.5
where each value is a range.
I cannot plot this directly in gnuplot, so I have to split it into averages and error values in two files:
Averages:
Model Decreasing Constant Increasing
2025 75.5 82.5 89.5
2035 65.5 82.5 102
2050 49.5 82.5 112
and the error values for the y error bars:
Model Decreasing Constant Increasing
2025 2.5 2.5 2.5
2035 2.5 2.5 5
2050 7.5 7.5 7.5
I normally plot data like this from a single file
plot for [i=2:4] 'data.dat' using 1:i w linespoints
but now I would have to read two files at the same time while plotting.
The normal syntax for plotting error bars is
plot 'data' using 1:2:0:($1+$3):4:5 with yerrorlines
and the manual is here.
How can you plot from two files with error bars in gnuplot?
Feel free to suggest a better way of adding these error bars in gnuplot if you know one.
Output of Cristoph's answer,
where the error bars are missing at the first and third points.
Gnuplot 5 lets you specify several characters as data file separators.
So, if you are sure you'll never get negative values (which I hope, given the format of your data), then you can use your original data file and set both white space and hyphen as datafile separators:
set datafile separator " -"
plot for [i=2:6:2] "data" using 1:(0.5*(column(i)+column(i+1))):(0.5*(column(i+1)-column(i))) with yerrorlines
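The arithmetic in the using specification (midpoint and half-width of each range) can be checked against the tables in the question. This is a small Python sketch for illustration only; the function name parse_range is hypothetical and, like the separator trick, it assumes no negative values.

```python
def parse_range(cell):
    """Split a 'low-high' cell and return (midpoint, half_width)."""
    low, high = (float(v) for v in cell.split("-"))
    return (low + high) / 2.0, (high - low) / 2.0

# Ranges copied from the question's original data
rows = {
    2025: ["73-78", "80-85", "87-92"],
    2035: ["63-68", "80-85", "97-107"],
    2050: ["42-57", "75-90", "104.5-119.5"],
}

for year, cells in rows.items():
    print(year, [parse_range(cell) for cell in cells])
```

The midpoints and half-widths reproduce the "Averages" and error tables from the question, so the single original file is enough.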
First of all, I wonder about the columns you use for plotting with yerrorlines. If your first data point for 2025 is 75.5+/-2.5, you would usually plot it with
plot "datafile" using <xcolumn>:<ycolumn>:<yerrorcolumn>
Your six columns are for the case of xy error bars and specify the point itself and the lower and upper absolute values in x and y. But maybe you are just doing it the way you need it...
Now back to your question:
Gnuplot cannot handle data from two files simultaneously, i.e. it cannot take xy values from one file and y errors from another.
If you're running Linux, the command line tool join can help.
With your averages stored in file A and the errors in file B, join A B will concatenate lines that have the same value in the first column, like this:
2025 75.5 82.5 89.5 2.5 2.5 2.5
So,
plot "<join A B" using 1:2:5 with yerrorlines
should do the job. ("<join A B" will call the join command in the background and read its output like a data file)
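For readers without the join tool, the merge it performs on these two files can be mimicked in a few lines. The Python sketch below is only an approximation of the basic behaviour: the function name join_on_first_column is hypothetical, and it uses a dictionary lookup, whereas the real join expects both inputs to be sorted on the join field.

```python
def join_on_first_column(lines_a, lines_b):
    """Append the fields of B to the line of A that has the same first field."""
    b_by_key = {}
    for line in lines_b:
        fields = line.split()
        if fields:
            b_by_key[fields[0]] = fields[1:]
    joined = []
    for line in lines_a:
        fields = line.split()
        if fields and fields[0] in b_by_key:
            joined.append(" ".join(fields + b_by_key[fields[0]]))
    return joined

averages = ["Model Decreasing Constant Increasing",
            "2025 75.5 82.5 89.5",
            "2035 65.5 82.5 102",
            "2050 49.5 82.5 112"]
errors = ["Model Decreasing Constant Increasing",
          "2025 2.5 2.5 2.5",
          "2035 2.5 2.5 5",
          "2050 7.5 7.5 7.5"]

for line in join_on_first_column(averages, errors):
    print(line)
```

In the merged rows, columns 2 to 4 hold the averages and columns 5 to 7 the matching errors, which is why using 1:2:5 pairs the first average with its error.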
I am working with gnuplot linespoints to create a cumulative and a normal distribution graph. I have created a file that provides the data for both graphs.
I ran into a problem when trying to plot the last column of the data.
Here is my script to create the second graph.
plot.plt
set term pos eps
set style data linespoints
set style line 1 lc 8 lt -1
set size 1,1
set yr [0:20]
set key below
set grid
set output 'output.eps'
plot "<awk '{i=i+$3; print $1,i}' data.dat" smooth cumulative t 'twitter' ls 1
data.dat
5.0 1 0.10
9.0 5 0.20
13.0 7 0.30
14.0 1 0.20
15.0 9 0.20
I want the x axis to use the first column and the y axis to use the last column, so the y axis range must be between 0 and 1. Which part should I change? Thanks.
Using smooth cumulative is enough, no need for awk. You are doing the same operation twice, once with gnuplot and once with awk. Simply do
plot 'data.dat' using 1:3 smooth cumulative
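The effect of accumulating twice can be seen with the question's own numbers. This Python sketch is only an illustration of the arithmetic, not of gnuplot's internals: it computes one running sum of the third column, and then a running sum of that running sum, which is what the awk command combined with smooth cumulative amounts to.

```python
from itertools import accumulate

# Rows of data.dat from the question: x, count, fraction
rows = [(5.0, 1, 0.10), (9.0, 5, 0.20), (13.0, 7, 0.30),
        (14.0, 1, 0.20), (15.0, 9, 0.20)]

xs = [r[0] for r in rows]
col3 = [r[2] for r in rows]

once = list(accumulate(col3))    # a single cumulative sum
twice = list(accumulate(once))   # cumulative sum applied a second time

for x, a, b in zip(xs, once, twice):
    print(x, round(a, 2), round(b, 2))
```

The single running sum ends at 1, so the curve stays within the desired 0 to 1 range, while the doubled one climbs well above it.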