How to turn the following tabular dataset into a simple 2D density plot to show a loc-number distribution?
I am new to gnuplot. I worked through a tutorial, and a simple x,y plot with multiple columns of data works fine. Then I tried this answer, but ran into the errors listed below, even though the x values are defined. I suspect my data set is fundamentally lacking something. What am I doing wrong, and how can I get a simple 2D contour plot from the data below?
Updated based on the suggestions received; the aim of the original question is unchanged.
The following sample input data is used. The file is single-space delimited: x = x, y = y, z1 = locid (1 to n) or z2 = loctype (scuba, shower, swimming, restrooms, sushi, cafe, restaurant, etc.).
input data :
ametype amename X(1000) Y(1000) km-to-carpark
Scuba SCUB1 10.72 49.01
Scuba SCUB2 13.88 47.32
Scuba SCUB3 14.58 46.46
Scuba SCUB4 14.52 48.23
Scuba SCUB5 13.05 47.23
Scuba SCUB6 12.21 47.95
Scuba SCUB7 12.66 46.19
Cafe CAFE1 13.97 47.45
Cafe CAFE4 31.63 30.3
Playground PARK2 31.57 30.2
Playground PARK1 27.51 31.87
Cafe CAFE5 67.71 109.09
Scuba SCUB8 68.58 109.54
Scuba SCUB9 67.14 109.99
Cafe CAFE2 13.83 46.24
SUSHI SUSH1 79.59 41.22
SUSHI SUSHI2 73.81 54.14
SUSHI SUSHI3 72.87 55.47
SUSHI SUSHI4 75.05 56.51
RESTROOM RESTR1 74.1 56.05
RESTROOM RESTR2 74.96 57.9
RESTROOM RESTR3 75.06 55.59
RESTAURANT RESTAU1 76.57 56.33
RESTAURANT RESTAU1 76.95 55.1
RESTAURANT RESTAU2 77.75 54.69
RESTAURANT RESTAU2 76.15 54.34
Code tried (for a different dataset where x,y weren't coordinates):
set view map
set contour
set isosample 250, 250
set cntrparam level incremental 1, 0.1
set palette rgbformulae 33,13,10
splot 'data.dat' with lines nosurface
#splot for [col=1:10] 'data.dat' u ($1):(column(col) > 2 ? 1/0 : column(col)):3
errors:
1) All points x value undefined
2) Tabular output of this 3D plot style not implemented
updated:
a) increased data points
c) a possible chicken scratch to give simple impression.
Expecting a distribution density map like this.
This is an interesting plotting challenge.
The input data format is straightforward, but it needs some processing before the desired contour lines can be plotted with gnuplot.
Comments:
The data is all in one file. Data entries for the types can be random, no order necessary.
The example below creates some random test data with the types "Cafe, Scuba, Sushi" and 20 entries of each. Skip this part if you want to use your own file.
The remaining lines of the script know nothing about the content of the test data file (i.e. how many types, type names, coordinates, etc.); everything is determined automatically.
create a unique list of types. The list will be in the order of first occurrence.
define a grid (here dx=0.2, dy=0.2, i.e. reasonable values within the data range) and count for each grid point the occurrences for each type within a certain radius (here: 0.5). Calculate the density by dividing the count by the unit area (area of the circle).
for each type create the contour lines via plotting to a file indexed by a two digit number. So far, I don't know how one would easily write this into indexed datablocks to avoid files on disk.
finally, plot the contour line files and the original data points by using a filter to get the right color.
One thing which I haven't figured out yet is set cntrparam level 2: I would like to have exactly 2 contour lines per type, but it seems gnuplot still uses the option set cntrparam level auto 2 and adjusts the number of levels itself.
As you can imagine this graph will probably look pretty confusing with 10 or more types.
For sure, there is room for improvement and no guarantee that there are no bugs in this script. Look at it as a starting point for further optimization. Suggestions for improvements are welcome!
Script:
### plot density contours from simple x,y location file
reset session
FILE = "SO73244095.dat"
# create some random test data
myTypes = "Cafe Scuba Sushi"
set print FILE
do for [p=1:words(myTypes)] {
    a = word(myTypes,p)
    x0 = rand(0)*5
    y0 = rand(0)*5
    do for [i=1:20] {
        print sprintf("%s %s%d %.3g %.3g",a,a,i,invnorm(rand(0))+x0,invnorm(rand(0))+y0)
    }
}
set print
# create a unique list of types
# and extract min, max data
addToList(list,col) = list.(_s='"'.strcol(col).'"', strstrt(list,_s)>0 ? '' : _s)
myTypes = ''
myType(i) = word(myTypes,i)
stats FILE u (myTypes=addToList(myTypes,1),$3):4 name "DATA" nooutput
Nt = words(myTypes)
print sprintf("%d types found: %s",Nt,myTypes)
# get densities for each type
dx = 0.2 # adjust the grid as you like...
dy = 0.2 # ... time for graph creation will increase with finer grid
Radius = 0.5 # adjust radius to a reasonable value
Nx = ceil((DATA_max_x-DATA_min_x)/dx)
Ny = ceil((DATA_max_y-DATA_min_y)/dy)
Dist(x0,y0,x1,y1) = sqrt((x1-x0)**2 + (y1-y0)**2)
print "Please wait..."
set print $Densities
do for [nt=1:Nt] {
    do for [ny=0:Ny] {
        do for [nx=0:Nx] {
            c = 0
            x = DATA_min_x+nx*dx
            y = DATA_min_y+ny*dy
            stats FILE u (Dist(x,y,$3,$4)<=Radius && (strcol(1) eq word(myTypes,nt)) ? c=c+1 : 0) nooutput
            d = c / (pi * Radius**2) # density per unit area
            print sprintf("%g %g %g",x,y,d)
        }
        print "" # empty line
    }
    print ""; print "" # two empty lines
}
set print
# get contour lines via splot into files
myContFile(n) = sprintf("%s.cont%02d",FILE,n)
unset surface
set contour
set cntrparam cubicspline levels 2 # cubicspline for "nice" round curves
do for [nt=1:Nt] {
set table myContFile(nt)
splot $Densities u 1:2:3 index nt-1
unset table
}
# set size ratio -1 # uncomment if equal x,y scale is important
set grid x,y
set key out noautotitle
set xrange[:] noextend
set yrange[:] noextend
set colorsequence classic
myFilter(colD,colF,valF) = strcol(colF) eq valF ? column(colD) : NaN
plot for [i=1:Nt] myContFile(i) u 1:2 w l lc i, \
for [i=1:Nt] FILE u 3:(myFilter(4,1,myType(i))) w p pt 7 lc i ti myType(i)
### end of script
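On the open point about getting exactly two contour lines per type: with set cntrparam levels <n>, gnuplot treats the number as a nominal value (it is the same as levels auto <n>), whereas levels discrete takes explicit z values. A minimal self-contained sketch of that option follows. It assumes a Gaussian stand-in surface, the placeholder levels 0.2 and 0.6, and a hypothetical output file name, none of which come from the script above; for real data the levels would have to be chosen per type from the density range.

```gnuplot
### sketch: exact contour levels via "levels discrete"
reset session
set xrange [-3:3]
set yrange [-3:3]
set isosamples 50,50
unset surface
set contour base
set cntrparam cubicspline
set cntrparam levels discrete 0.2, 0.6   # placeholder z values: exactly two levels

# write the contour lines to a file, as in the script above
set table "levels_demo.cont"             # hypothetical file name
splot exp(-(x**2 + y**2))                # stand-in surface, not real density data
unset table

set size ratio -1
plot "levels_demo.cont" u 1:2 w l lc rgb "red" notitle
### end of sketch
```

The trade-off is that fixed z values are no longer adapted to each type automatically, so a type whose maximum density stays below the lowest level produces no contour line at all.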
Result: (a few random examples)
Related
I have an unsorted data set of two columns with most of the points aligning diagonally along y=x, however some points misalign.
I would like to show that most of the points actually do align along the function; however, a plain point plot just draws the over-represented points on top of each other as one. The viewer would then get the impression that the data points are scattered randomly, because the occurrence count carries no weight.
Is there a way to give weight to points that occur more than once, maybe through point size? I couldn't find anything on this topic.
Thanks a lot!
You don't show data, so I assumed something based on your description. As @Christoph already mentioned, you could use jitter or transparency to indicate that there are many more data points at more or less the same location. However, transparency is limited to 256 values (actually 255, because you won't see fully transparent points). So, in the extreme case of more than 255 points on top of each other, you won't see a difference to exactly 255 points on top of each other.
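Both options mentioned above can be tried quickly. The following is only a sketch: it assumes gnuplot 5.2 or later (for set jitter), a terminal that supports transparency (for example qt, wxt or pngcairo), and invented test data with many exactly coincident points. The jitter parameters and the alpha value are arbitrary choices for illustration.

```gnuplot
### sketch: jitter and transparency for overlapping points
reset session

# invented test data: integer-valued points, so many of them coincide exactly
set print $Data
do for [i=1:500] {
    v = floor(invnorm(rand(0))*2 + 0.5)
    print sprintf("%d %d", v, v)
}
set print

set key top left
set multiplot layout 1,2
    # left: displace coincident points so that stacks become visible
    set jitter overlap 1 spread 0.5
    plot $Data u 1:2 w p pt 7 ps 0.5 lc rgb "red" ti "jitter"
    unset jitter
    # right: semi-transparent points, color given as "#AARRGGBB"
    plot $Data u 1:2 w p pt 7 ps 2 lc rgb "#e0ff0000" ti "transparency"
unset multiplot
### end of sketch
```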
Basically, you're asking how to display the density of points. This reminds me of this question: How to plot (x,y,z) points showing their density
In the example below a "pseudo" 2D histogram is created. I'm not aware of 2D histograms in gnuplot, so you have to do it as a 1D histogram mapped onto 2D. You divide the plot into fields and count the occurrences of points in each field. You then use this number either to set the point color via the palette or to set a variable point size.
The code example will generate 5 ways to plot the data:
solid points
empty points
transparent points
colored points
sized points (your question)
I leave it up to you to judge which way is suitable. Certainly it will depend pretty much on the data and your special case.
Code:
### different ways to show density of datapoints
reset session
# create some random test data
set print $Data
do for [i=1:1000] {
x=invnorm(rand(0))
y=x+invnorm(rand(0))*0.05
print sprintf("%g %g",x,y)
}
do for [i=1:1000] {
x=rand(0)*8-4
y=rand(0)*8-4
print sprintf("%g %g",x,y)
}
set print
Xmin=-4.0; Xmax=4.0
Ymin=-4.0; Ymax=4.0
BinXSize = 0.1
BinYSize = 0.1
BinXCount = int((Xmax-Xmin)/BinXSize)+1
BinYCount = int((Ymax-Ymin)/BinYSize)+1
BinXNo(x) = floor((x-Xmin)/BinXSize)
BinYNo(y) = floor((y-Ymin)/BinYSize)
myBinNo(x,y) = (_tmp =BinYNo(y)*BinXCount + BinXNo(x), \
_tmp < 0 || _tmp > BinXCount*BinYCount-1 ? NaN : int(_tmp+1))
# get data into 1D histogram
set table $Bins
plot [*:*][*:*] $Data u (myBinNo($1,$2)):(1) smooth freq
unset table
# initialize array all values to 0
array BinArr[BinXCount*BinYCount]
do for [i=1:BinXCount*BinYCount] { BinArr[i] = 0 }
# get histogram values into array
set table $Dummy
plot myMax=NaN $Bins u ($2<myMax?0:myMax=$2, BinArr[int($1)] = int($2)) w table
unset table
myBinValue(x,y) = (_tmp2 = myBinNo(x,y), _tmp2>0 && _tmp2<=BinXCount*BinYCount ? BinArr[_tmp2] : NaN)
# point size settings
myPtSizeMin = 0.0
myPtSizeMax = 2.0
myPtSize(x,y) = myBinValue(x,y)/myMax*(myPtSizeMax-myPtSizeMin) + myPtSizeMin
set size ratio -1
set xrange [Xmin:Xmax]
set yrange [Ymin:Ymax]
set key top center out opaque box
set multiplot layout 2,3
plot $Data u 1:2 w p pt 7 lc "red" ti "solid points"
plot $Data u 1:2 w p pt 6 lc "red" ti "empty points"
plot $Data u 1:2 w p pt 7 lc "0xeeff0000" ti "transparent points"
set multiplot next
plot $Data u 1:2:(myBinValue($1,$2)) w p pt 7 ps 0.5 palette z ti "colored points"
plot $Data u 1:2:(myPtSize($1,$2)) w p pt 7 ps var lc "web-blue" ti "sized points"
unset multiplot
### end of code
Result:
I want to draw streamline-like lines with arrows in gnuplot. I already have the data points I need, so I think my problem is not the same as in this post, and it differs from this other post because I have already obtained the data needed for the streamlines.
What I have done so far looks like this:
The red lines are vectors showing the flow field, and the green lines are streamlines that guide the reader along the direction of the flux. The large blue arrows are what I want to plot in gnuplot. I know how to plot arrows in the middle of a line, as this post shows, but what code do I need to plot more arrows along the lines?
In more detail, how can I plot something like this:
My data files are supplied here:
velocity.txt holds the vector flow field data as "index,X,Y,vx,vy,particle-numbers"
line.txt holds the streamline data as "X,Y"
My gnuplot file is below:
set terminal postscript eps size 108,16 enhanced font "Arial-Bold,100"
set output 'vector.eps'
unset key
set tics
set colorbox
set border 0
set xtics 2
#set xlabel 'x'
#set ylabel 'y'
set xrange [0:108]
set yrange [0:16]
#set cbrange [0:40]
set nolabel
set style line 4 lt 2 lc rgb "green" lw 2
plot 'velcoity.txt' u 2:3:(250*$4):(250*$5) with vectors lc 1,'line.txt' u 1:2 ls 4
Thank you!
To plot arrows along a line you can again use the vectors plotting style like you do already for the stream field.
But to get a proper plot you must consider several points:
Usually gnuplot limits the size of the arrow heads to a fraction of the arrow length. So, if you want to plot a continuous line with arrow heads, the arrows themselves should be very short. To avoid downscaling of the arrow heads, use the size ... fixed option, which is only available since version 5.0.
You have only the trajectory, x and y values, of the line. To extract the arrow direction, the simplest approach would be to use the difference between two neighbouring points (or at a distance of two or three points).
You can extract these differences in the using statement. As pseudo code, one could do the following:
if rownumber modulo 10 == 0:
    save x and y values
else if rownumber modulo 10 == 1:
    draw arrow from previous point to current point, only with a head
else:
    ignore the point
Putting this pseudo-code in the using statement gives the following:
ev = 10
avg = 1
sc = 0.1
plot 'line.txt' u (prev_x = (int($0)%ev == 0 ? $1 : prev_x), prev_y = (int($0)%ev == 0 ? $2 : prev_y), int($0)%ev == avg ? $1 : 1/0):2:(sc*(prev_x-$1)):(sc*(prev_y-$2)) w vectors backhead size 2,20,90 fixed ls 4
To make things more flexible, I introduced some variables: ev is the number of points between two arrow heads, avg is the distance (in points) between the two points used to calculate the arrow direction, and sc scales the length of the arrow shaft.
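To try the technique without the original data files, here is a self-contained sketch. The sine curve is an invented stand-in for line.txt, the values of ev, avg and sc are chosen for this test curve only, and the head size is given in screen units (an assumption made here so that it does not depend on the axis ranges).

```gnuplot
### sketch: arrow heads along a curve, using generated test data
reset session

# invented stand-in for line.txt: a sine curve sampled at 201 points
set print $Line
do for [i=0:200] {
    print sprintf("%g %g", i*0.05, sin(i*0.05))
}
set print

ev  = 25      # one arrow head every 25 points
avg = 2       # direction taken from points 0 and 2 of each group
sc  = 0.1     # scaling of the (short) arrow shaft
prev_x = prev_y = NaN

unset key
plot $Line u 1:2 w l lc rgb "green" lw 2, \
     $Line u (prev_x = (int($0)%ev == 0 ? $1 : prev_x), prev_y = (int($0)%ev == 0 ? $2 : prev_y), int($0)%ev == avg ? $1 : 1/0):2:(sc*(prev_x-$1)):(sc*(prev_y-$2)) \
         w vectors backhead size screen 0.02,20,90 fixed lc rgb "green" lw 2
### end of sketch
```

Each vector starts at the current point and runs a short way back towards the saved point, so backhead puts the head at the current point, pointing along the direction of travel.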
As a further improvement, you can use the length of the stream field arrows to colour the stream field vectors. This gives the following script:
reset
unset key
set tics
set colorbox
set border 0
set xtics 2
set autoscale xfix
set autoscale yfix
set autoscale cbfix
set style line 4 lt 2 lc rgb "green" lw 2
ev=30
avg=3
sc=0.1
field_scale=500
plot 'velcoity.txt' u 2:3:(field_scale*$4):(field_scale*$5):(sqrt($4**2+$5**2)) with vectors size 1,15,45 noborder lc palette,\
'line.txt' u 1:2 ls 4 w l,\
'' u (prev_x = (int($0)%ev == 0 ? $1 : prev_x), prev_y = (int($0)%ev == 0 ? $2 : prev_y), int($0)%ev == avg ? $1 : 1/0):2:(sc*(prev_x-$1)):(sc*(prev_y-$2)) w vectors backhead size 2,20,90 fixed ls 4
With the result (qt terminal):
I have a data file with x, y, and z values: basically x,y locations, with z representing the attenuation at that location.
The answer to a question like "Line plot in GnuPlot where line color is a third column in my data file?", which uses palette defined (with palette z), is very close, except that each line segment is set to a single color along its length.
Is there a way to have the z value interpolated (linear is fine) along each segment, so that the attenuation values form a smooth gradient rather than jumping at each segment boundary?
You can use set dgrid3d to interpolate a given data set. Consider the data file test.dat with the content
1 2 1
2 3 2
1 1 2
Plot this with
set dgrid3d 30,30 splines
set ticslevel 0
set hidden3d
splot 'test.dat' matrix w l lc palette lw 3
to get
Whether this also works in your case depends on several other factors, such as the number of data points, or whether you want to retain the original grid and only smooth the colors instead of creating a new grid. In the latter case you must write an external script to prepare your data appropriately.
I have a data file with blocks of x/y values. Each block contains 16 lines with x/y pairs and each block represents those positions in a different time. http://pastebin.com/0teRrfRU
I want to plot the trajectory of a specific particle. To do that, I've written plot 'pos.dat' u 2:3 every ::n:0:n:i, where n is the n-th particle and i is the time up to which I want the trajectory plotted (I can then loop over the i to generate an animation).
This runs fine, but when I add w lines nothing gets plotted, and I don't understand why. Is there a way to plot this with lines? The only alternative I see is writing a script to parse the data file and generate a new one with only the values I want (effectively acting as every), but I don't want to do that if I can do it in Gnuplot.
After a closer look at your data, your case has some peculiarities.
As in "Plotting same line number of several blocks data with gnuplot", you can plot the file into a table via with table, which removes the empty lines so that the points can be connected by lines.
However, some of your particles disappear on one side and re-appear on the opposite side. If you plot this with lines, you get a line across the whole graph, which is certainly undesired. You can work around this by introducing a function Break() which returns NaN if the difference between two successive x- or y-values is larger than 90% (to be on the safe side) of the x- or y-range, respectively. The effect of NaN is that the line will be interrupted.
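The effect of such a Break() function can be seen in isolation with a few points. This is a simplified sketch that checks only the x direction; the track data and the box width of 1.0 are invented for illustration.

```gnuplot
### sketch: interrupt a line at a wrap-around jump
reset session

# invented track: the particle leaves at x near 1 and re-enters at x near 0
$Track << EOD
0.70 0.40
0.80 0.45
0.90 0.50
0.98 0.55
0.05 0.60
0.15 0.65
0.25 0.70
EOD

xrange = 1.0      # assumed width of the periodic box
x1 = NaN          # previous x value, undefined before the first point
# return the y column, or NaN if x jumped by more than 90% of the box width
Break(colx,coly) = (x0 = x1, x1 = column(colx), \
                    abs(x1-x0) < 0.9*xrange ? column(coly) : NaN)

set xrange [0:1]
set yrange [0:1]
plot $Track u 1:(Break(1,2)) w l lw 2 ti "with Break()", \
     $Track u 1:2 w p pt 7 ti "data points"
### end of sketch
```

Note that the very first point and the point right after the jump evaluate to NaN, so the line runs from the second point up to the jump and resumes one point after it.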
Code: (works with gnuplot>=5.0.0 version at the time of OP's question)
### plotting trajectories
reset session
set term gif animate delay 3 size 400,400
set output "SO30744875.gif"
set size square
FILE = 'SO30744875.dat'
set key noautotitle
stats FILE u (N=column(-1),M=column(1),$2):3 nooutput
xrange = STATS_max_x-STATS_min_x
yrange = STATS_max_y-STATS_min_y
set table $Data
plot FILE u 1:2:3 w table
unset table
Break(col1,col2) = (x0=x1,x1=column(col1), y0=y1,y1=column(col2), \
abs(x1-x0)<0.9*xrange && abs(y1-y0)<0.9*yrange ? column(col2) : NaN)
do for [i=0:N] {
plot for [j=1:16] x1=y1=NaN $Data u 2:(Break(2,3)) every M::j-1::(i+1)*M w l, \
FILE u 2:3 every :::i::i w p pt 7, \
FILE u 2:3:1 every :::i::i w labels offset 0.7,0.7
}
set output
### end of code
Result:
I have data in a file which I would like to plot using gnuplot. The file contains 3 data sets separated by two blank lines, so that gnuplot can distinguish the data sets by 'index'. I can plot the three data sets separately via the 'index' option of the 'plot' command.
However, I am not sure how I can plot the sum of the 2nd column of all three data sets.
Note: all three data sets have same x data, i.e. 1st column
To do this the simplest thing would be to change your file format. Gnuplot manipulates columns pretty well. Since you are sharing the x data, you can change the file format to have four columns (assuming you are just plotting (x,y) data):
<x data> <y1 data> <y2 data> <y3 data>
and use a command like
plot 'data.dat' using 1:2 title 'data 1', \
'' u 1:3 t 'data 2', \
'' u 1:4 t 'data 3', \
'' u 1:($2+$3+$4) t 'sum of datas'
The dollar signs inside the parens in the using column specification allow you to add/subtract/perform other functions on columnar data.
This way your data file will also be smaller since you won't repeat the x data.
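A self-contained sketch of this column layout, assuming gnuplot 5.0 or later for the inline datablock. The numbers are sample values made up to have one shared x column and three y columns.

```gnuplot
### sketch: sum of columns in the using specification
reset session

# the three y data sets side by side, sharing the x column
$Data << EOD
1 11 21 31
2 12 22 32
3 13 23 33
4 14 24 34
EOD

set key top left
plot $Data u 1:2 w lp pt 7 ti "data 1", \
     $Data u 1:3 w lp pt 7 ti "data 2", \
     $Data u 1:4 w lp pt 7 ti "data 3", \
     $Data u 1:($2+$3+$4) w lp pt 7 ti "sum"
### end of sketch
```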
@Youjun Hu, never say that there is "no way" to do something with gnuplot. In most cases there is a way with gnuplot only, although it is sometimes not obvious or a bit cumbersome.
Data: SO16861334.dat
1 11
2 12
3 13
4 14
1 21
2 22
3 23
4 24
1 31
2 32
3 33
4 34
Code 1: (works with gnuplot 4.6.0, needs some adaptations for >=4.6.5)
In gnuplot 4.6.0 (version at the time of OP's question) there were no datablocks and no plot ... with table. The example below only works for 3 subdatasets, but could be adapted for other numbers. However, an arbitrarily large number of subdatasets will be difficult with this approach.
### calculate sum from 3 different (sub)datasets, gnuplot 4.6.0
reset
FILE = "SO16861334.dat"
stats FILE u 0 nooutput
N = int(STATS_records/STATS_blocks) # get number of lines per subblock
set table FILE."2"
plot FILE u 1:2
set table FILE."3"
x1=x2=y1=y2=NaN
myValueX(col) = (x0=x1,x1=x2,x2=column(col), r=int($0-2)/N, r<1 ? x0 : r<2 ? x1 : x2)
myValueY(col) = (y0=y1,y1=y2,y2=column(col), r<1 ? y0 : r<2 ? y1 : y2)
plot FILE."2" u (myValueX(1)):(myValueY(2))
unset table
set key top left
set offset graph 0.1, graph 0.1, graph 0.2, graph 0.1
plot for [i=0:2] FILE u 1:2 index i w lp pt 7 lc i+1 ti sprintf("index %d",i), \
FILE."3" u 1:2 every ::2 smooth freq w lp pt 7 lc rgb "magenta" ti "sum"
### end of code
Code 2: (works with gnuplot>=5.0.0)
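This code works with an arbitrary number of subdatasets. Plotting the file with table removes the empty lines between the subdatasets, and smooth freq then sums all y values that share the same x value.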
### calculate sum from 3 different (sub)datasets, gnuplot>=5.0.0
reset
FILE = "SO16861334.dat"
set table $Data2
plot FILE u 1:2 w table
unset table
set key top left
set offset graph 0.1, graph 0.1, graph 0.2, graph 0.1
set colorsequence classic
plot for [i=0:2] FILE u 1:2 index i w lp pt 7 lc i+1 ti sprintf("index %d",i), \
$Data2 u 1:2 smooth freq w lp pt 7 lc rgb "magenta" ti "sum"
### end of code
Result: (same result for Code1 with gnuplot 4.6.0 and Code2 for gnuplot 5.0.0)