Is it possible to suppress plotting zero values from datafile column? - plot

I wrote some data collecting procedure, and over time I added more columns to the data output.
To build a consistent format, the procedure outputs 0 where no measurements were available.
When plotting the data file, is it possible not to plot zero values (as if no data were present)?
Some of the new columns are plotted by themselves (using 2:7) and others are used in an expression (using 2:($7+$8)).

Here is another option: set datafile missing "0". Note that a value of 0.0 will still be plotted, because only the exact string "0" is treated as missing.
If you use with lines or with linespoints, the remaining points are still connected across the skipped values.
Code:
### do not plot values "0"
reset session
$Data <<EOD
1 1.1
2 0
3 5.1
4 2.1
5 0
6 0.0
7 5.1
EOD
set datafile missing "0"
plot $Data u 1:2 w lp pt 7, \
'' u 1:($1+$2) w lp pt 7
### end of code
Also check help set datafile, help set datafile missing or help missing.

gnuplot will not plot values that are not-a-number, i.e. NaN. You can either use this string in the data instead of 0, or write a function that converts 0 to NaN and use that, e.g.:
chk(x) = (x==0?NaN:x)
plot "file" using 2:(chk($7)+chk($8)) with lines
Adding a value to NaN results in NaN.
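The first option above (writing NaN into the data instead of 0) could also be done as a preprocessing step outside gnuplot. The following is only a minimal sketch in Python; the function name zeros_to_nan and the sample rows are assumptions made for illustration, and the output normalizes whitespace to single spaces.

```python
import math

def zeros_to_nan(lines, columns):
    """Replace numeric zeros in the given 1-based columns with the string NaN.

    Hypothetical helper for illustration; unlike `set datafile missing "0"`,
    it compares numerically, so "0" and "0.0" are both replaced.
    """
    out = []
    for line in lines:
        fields = line.split()
        for col in columns:
            idx = col - 1
            if idx < len(fields):
                try:
                    if float(fields[idx]) == 0.0:
                        fields[idx] = "NaN"
                except ValueError:
                    pass  # leave non-numeric fields (e.g. headers) untouched
        out.append(" ".join(fields))
    return out

sample = ["1 1.1", "2 0", "3 5.1", "6 0.0"]
cleaned = zeros_to_nan(sample, columns=[2])
print("\n".join(cleaned))

# NaN propagates through addition, as in the gnuplot expression chk($7)+chk($8)
print(math.isnan(float("nan") + 1.0))
```

Because the comparison is numeric, this treats 0.0 as "no data" as well, which may or may not be what you want.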

Related

Gnuplot - a way to convert and plot text information?

I am trying to use gnuplot to display the information contained in a file as in the example below:
1 2 3 … 10 11
1 1.0000000e-06 1.0000000e-06 … 0
2 2.5000000e-06 1.5000000e-06 … 0 #dt_grow
3 4.7500000e-06 2.2500000e-06 … 0 #dt_grow
4 8.1250000e-06 3.3750000e-06 … 0 #dt_cfl
5 1.2450703e-05 4.3257029e-06 … 1 #dt_mach, max_iteration_turbulence
6 1.6811013e-05 0.3603104e-06 … 0 #dt_grow
My goal is to be able to represent, somehow, the information listed in column 11 which, as you can see, contains non-numeric characters.
It might be pointless but, before moving ahead, it might be helpful to stress that:
row1 has no value at column 11
each column 11 value starts with # and is not quoted
column 11 contains many other different possible entries (e.g. "#dt_piso","#dt_piso, 2*max_piso reached", "#dt_mach, temperature extrapolation error")
when values of column 11 present an additional information (e.g ", max_iteration_turbulence") values of column 10 are non-zero
the number of rows is typically of the order 10^6
My idea was to associate a numeric value with each element of column 11 using functions (e.g. if #dt_grow then 1, if #dt_cfl then 2, etc.) so that I can somehow represent this information.
What I have tried so far produces nothing but errors (for brevity, each error is listed below the plot command that caused it):
p "file" u 1:11 w l
--> x range is invalid
p "file" u 1:(''.$11 eq "#dt_cfl" ? 1 : 0) w l
--> warning: Skipping data file with no valid points. x range is invalid
p "file" u 1:(column(11) eq "#dt_cfl" ? 1 : 0) w l
--> internal error : STRING operator applied to non-STRING type
p "file" u 1:(strcol(11) eq "#dt_cfl" ? 1 : 0) w l
--> internal error : STRING operator applied to non-STRING type
splot "time.out" u 1:(11 eq "#dt_cfl" ? 1 : 0) w l
--> Need 1 or 3 columns for cartesian data
#Usage of functions does not resolve the issue:
e.g. f(x)= ''.x eq "#dt_cfl" ? 1 : 0
As you can probably tell from the diversity of my trials, I am somewhat confused about how best to proceed in such cases. I have never had to plot string data and I am not quite sure what is causing the issue. I have searched the documentation, but nothing really helped me with this. I would very much appreciate any input on how to handle string data and associate it with numeric values.
To wrap it up: I want to display the evolution of the information on column 11.
Ideally, I would also like to use the additional information, where present (as explained in point 4 above), based on the value of column 10.
Given my request, I believe a Python script could better fit my needs, but I am wondering whether gnuplot offers such possibilities, and I am eager to learn more.
Thanks in advance :)!
P.S.: I am adding a sketch of the results I am trying to obtain, hoping that this helps clarify my goals.
I am open to other solutions anyway, as this is just my plan for overcoming the problem of plotting text data.
The sketch refers to the few rows of data provided above and assumes the following associations:
#dt_grow is 1
#dt_cfl is 2
#dt_mach is 3
and so on for other possible values (this could be hardcoded, as I would have no more than 10 possible values in column 11)
Plot_ sketch
Maybe something like this?
You can use the 11th column (here: 5th column) as x2ticlabels (check help xticlabels). Before, link the x2 axis to the x1 axis (check help link).
You could rotate the x2tic labels if there are too many of them and they overlap: set x2tics rotate by 90.
In principle, you could get rid of the leading # of each label, but I guess it will get a bit tricky because of your missing value in row 1.
Look at the example below as a starting point.
Script:
### adding text info from columns to some labels
reset session
$Data <<EOD
1 2 3 4 5
1 1.0000000e-06 1.0000000e-06 0
2 2.5000000e-06 1.5000000e-06 0 #dt_grow
3 4.7500000e-06 2.2500000e-06 0 #dt_grow
4 8.1250000e-06 3.3750000e-06 0 #dt_cfl
5 1.2450703e-05 4.3257029e-06 1 #dt_mach, max_iteration_turbulence
6 1.6811013e-05 0.3603104e-06 0 #dt_grow
EOD
set termoption noenhanced
set key top left
set link x2 via x inverse x
set x2tics
plot $Data u 1:2:x2tic(5) skip 1 axes x2y1 w lp pt 7 lc "red" title "column 2", \
'' u 1:3 skip 1 w lp pt 7 lc "web-green" title "column 3"
### end of script
Result:
Addition:
I guess I understand what you want to do but the background is still a bit unclear.
What you are asking for is a conversion or mapping of strings to numbers.
I assume you have a fixed and known set of keywords.
Apparently, for your desired plot the other columns besides 1 and 11 do not play a role.
Your missing value in column 11 in row 1 (excl. header) will create problems, hence add the option skip 2.
In the minimized example below, your column 11 is actually column 2.
The example below will create some random test data for better illustration.
create a string list of your keywords
you can address them via word(), check help word
you can (mis)use sum for a lookup to get the index, check help sum
furthermore, check help strcol, help xticlabels, help skip, help ternary.
Script:
### map strings to numbers
reset session
myKeys = '#dt_grow #dt_cfl #dt_piso #dt_foo #dt_bar #dt_xyz #dt_abc'
myKey(i) = word(myKeys,i)
# create some random test data
set table $Data
set samples 50
plot '+' u ("1 2") every ::0::0 w table
plot '+' u ("1") every ::0::0 w table
plot '+' u ($0+1):(word(myKeys,int(rand(0)*words(myKeys)+1))) w table
unset table
getIdx(s) = (n=0, sum[i=1:words(myKeys)] (s eq myKey(i) ? n=i : 0), n)
set ytics 1
set grid x,y
plot $Data u 1:(y0=getIdx(strcol(2))):ytic(myKey(y0)) skip 2 w lp pt 7 lc "red" notitle
### end of script
Result:
I will not attempt a full answer right now, but here are a few pieces that may be useful by themselves or in conjunction with the answer from @theozh.
Column 11 not always present: The presence or absence of column 11 on any given line can be tested using the "pseudo-column" $#, which evaluates to the total number of columns found on that line. See "help pseudo". This feature was introduced in gnuplot version 5.4.2 (June 2021). For example, to plot the values of column 10 but only if column 11 is also present:
plot FOO using 0:(($# > 10) ? column(10) : NaN)
Separate lines on the graph for each column 11 category: This could be done more cleanly using arrays in the development version of gnuplot, but sticking with features present in version 5.4 I suggest placing all the categories you want to track in one big string and then looping over the string.
Category = "#dt_grow #dt_cfl #dt_mach"
xcoord(x) = ... some function of the value in column 1? ...
ycoord(y) = ... some function of the value in column 10? ...
set datafile missing NaN #ignore any lines that evaluate to NaN
plot for [cat in Category] FOO using (xcoord($1)) : (strcol($#) eq cat ? ycoord($10) : NaN) with steps
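Since the question mentions Python as a possible alternative, the keyword-to-number mapping discussed in this thread could also be prepared before plotting. This is a rough sketch only: the names KEYWORDS and category_index are hypothetical, the numbering follows the associations proposed in the question, and the sample uses the shortened five-column data from the answer above, so the text column is 5 rather than 11.

```python
# Assumed keyword order: #dt_grow -> 1, #dt_cfl -> 2, #dt_mach -> 3
KEYWORDS = ["#dt_grow", "#dt_cfl", "#dt_mach"]

def category_index(line, text_column, keywords=KEYWORDS):
    """Return the 1-based keyword index found in text_column, or 0 if none.

    Only the first word of the text is used; a trailing comma (as in
    "#dt_mach, max_iteration_turbulence") is stripped before the lookup.
    """
    fields = line.split()
    if len(fields) < text_column:
        return 0  # e.g. row 1, which has no text column at all
    token = fields[text_column - 1].rstrip(",")
    if token in keywords:
        return keywords.index(token) + 1
    return 0

sample = [
    "1 1.0000000e-06 1.0000000e-06 0",
    "2 2.5000000e-06 1.5000000e-06 0 #dt_grow",
    "3 4.7500000e-06 2.2500000e-06 0 #dt_grow",
    "4 8.1250000e-06 3.3750000e-06 0 #dt_cfl",
    "5 1.2450703e-05 4.3257029e-06 1 #dt_mach, max_iteration_turbulence",
    "6 1.6811013e-05 0.3603104e-06 0 #dt_grow",
]

# Emit "row index" pairs that a plotting tool can read as two numeric columns
for line in sample:
    print(line.split()[0], category_index(line, text_column=5))
```

Rows without a recognized keyword get 0 here; writing NaN instead would make gnuplot skip them.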

GNUPLOT with point-size variables stored in a different file

I have a data file with the following format :
y1 y2 y3 y4 ...
1.3 1.1 0.5 0.5 ...
0.2 0.4 0.6 0.1 ...
I know how to use Gnuplot to plot the data in this file. Suppose I have 50 columns, then I use:
plot for [col=1:50] filename using 0:col with lines ...
Now, I want to make a scatter instead of a line plot with points having variable size. I have a different file storing the pointsize variables. I know I need to also use a for loop and:
w p ps variable
However, since the point-size variables are stored in a different file, I do not know how to write the using specification. Normally one uses
using 0:1:2
where the point size variables are stored in the second column, etc. But what if these variables are stored in a different file?
I think I can solve this problem by combining both the data and the pointsize variables file into a single file, but I wonder if one can do this using gnuplot.
Thanks
If there is a one-to-one matchup of lines in the two files, then yes. Assuming file.dat is formatted like the one you show above, and ps.dat contains one header record and then in column 1 the point size for all points in that same line of the data file:
# read point sizes into a data block in gnuplot
set datafile columnheaders
set table $pointsize
plot "ps.dat" using 1 with table
unset table
# Now plot the data, using the value of $pointsize[j+1] for row j of points
# There are two tricky bits here
# 1) the line numbers are counted starting with 0
# but array and datablock entries are counted starting from 1.
# 2) $pointsize is an array of strings. We need to convert this to a
# real number in order to use it as a point size
plot for [i=1:*] "file.dat" using 0:i:(real($pointsize[$0+1])) with points ps variable
file.dat
y1 y2 y3 y4
1 2 4 3
2 3 5 4
3 4 6 5
4 5 8 6
ps.dat
ps
1
5
2
3
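The alternative mentioned in the question, combining the data file and the point-size file into a single file, could be sketched in Python as follows. The function name merge_point_sizes is an assumption for illustration, and the sketch relies on the same one-to-one matchup of lines (including the header record) that the answer above assumes.

```python
def merge_point_sizes(data_lines, size_lines):
    """Append the point size from size_lines to the matching line of data_lines.

    Hypothetical helper: both inputs must have the same number of lines,
    header included, so that line j of one file belongs to line j of the other.
    """
    if len(data_lines) != len(size_lines):
        raise ValueError("files must have the same number of lines")
    return [d.rstrip("\n") + " " + s.strip()
            for d, s in zip(data_lines, size_lines)]

# Sample contents of file.dat and ps.dat as shown above
data = ["y1 y2 y3 y4", "1 2 4 3", "2 3 5 4", "3 4 6 5", "4 5 8 6"]
sizes = ["ps", "1", "5", "2", "3"]

merged = merge_point_sizes(data, sizes)
print("\n".join(merged))
```

For the sample files the point size ends up in column 5 of the merged output, so it can be referenced like any other column in the using specification.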

Printing custom label every n elements using Gnuplot

I want to create scatter plot of a file that looks like:
counter N x y
1 200 50 50
2 200 46 46
3 200 56 56
4 200 36 36
5 200 56 56
There are 240 lines in this file. The N is incremented by 200 every 30 lines.
So, when I plot the numbers I want to create a scatter plot of x, y values vs. counter. Here is my code:
plot "file" using 1:3 title "hb" with points pt 2 ps 1 lc rgb "red", \
"file" using 1:4 title "ls" with points pt 3 ps 1 lc rgb "blue"
As a result my x-axis has the range [1,240].
The question is that I want the label of my x-axis to contain the values from the second column, and I want them to be printed after every 30 points.
So, I want my x-axis label to be customized as: [200,400,600,800,1000,1200,1400,1600] where they each have 30 points in between.
I actually searched for this question before, found the solution and solved it. So, I know there is an answer somewhere. But apparently I lost my code. I have been searching for the old post for an hour now but could not find it.
Can anyone help me with using customized labels here?
I'm not sure how to generate xtics from the data in gnuplot, so I'd use bash to generate them for me:
#! /bin/bash
xtics='('$(cut -d' ' -f1,2 file | sort -nuk2 | sed 's/\(.*\) \(.*\)/\2 \1/;s/^/"/;s/ /" /;s/$/,\\/')$'\n)'
gnuplot <<EOF
set term png
set output '1.png'
set xtics $xtics
plot "file" using 1:3 title "hb" with points pt 2 ps 1 lc rgb "red", \
"" using 1:4 title "ls" with points pt 3 ps 1 lc rgb "blue"
EOF
On a randomly generated input, it gives this output:
You can evaluate any expression in xticlabel to give a string or an invalid value. In order to set labels only at certain values of column 1, you can use
plot "file" using 1:3:xtic(int($1)%30 == 0 ? strcol(2) : 1/0) title "hb" pt 2 lc rgb "red", \
"" using 1:4 title "ls" pt 3 lc rgb "blue"
The expression xtic(int($1)%30 == 0 ? strcol(2) : 1/0) places the string value of column 2 when the value in column 1 is a multiple of 30. All other values are skipped, because 1/0 is an invalid value.
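The tick list that the bash answer builds with cut, sort and sed could also be generated in Python. This is a sketch under assumptions: build_xtics is a hypothetical name, and the rows are generated here to match the description in the question (240 lines, N incremented by 200 every 30 lines) rather than read from the real file. Like the bash version, it labels the first counter at which each N appears.

```python
def build_xtics(rows):
    """Build a gnuplot tick list of the form ("label" pos, "label" pos, ...).

    rows is an iterable of (counter, N) pairs; each distinct N is labeled
    at the first counter value where it occurs.
    """
    seen = set()
    parts = []
    for counter, n in rows:
        if n not in seen:
            seen.add(n)
            parts.append('"%d" %d' % (n, counter))
    return "(" + ", ".join(parts) + ")"

# Generated stand-in for columns 1 and 2 of the data file
rows = [(i, 200 * ((i - 1) // 30 + 1)) for i in range(1, 241)]

xtics = build_xtics(rows)
print("set xtics " + xtics)
```

The printed line can be pasted into a gnuplot script or written into it by whatever wrapper generates the script.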

Simple line plot using R ggplot2

I have data as follows in .csv format. As I am new to ggplot2 graphs, I am not able to plot it.
T L
141.5453333 1
148.7116667 1
154.7373333 1
228.2396667 1
148.4423333 1
131.3893333 1
139.2673333 1
140.5556667 2
143.719 2
214.3326667 2
134.4513333 3
169.309 8
161.1313333 4
I tried to plot a line graph using the following code:
data<-read.csv("sample.csv",head=TRUE,sep=",")
ggplot(data,aes(T,L))+geom_line()
but the resulting image is not what I want.
Instead, I want an image like the following:
Can anybody help me?
You want to use a variable for the x-axis that has lots of duplicated values and expect the software to guess that the order you want those points plotted is given by the order they appear in the data set. This also means the values of the variable for the x-axis no longer correspond to the actual coordinates in the coordinate system you're plotting in, i.e., you want to map a value of "L=1" to different locations on the x-axis depending on where it appears in your data.
This type of fairly nonsensical thing does not work in ggplot2 out of the box. You have to define a separate variable that has a proper mapping to values on the x-axis ("id" in the code below) and then overwrite the labels with the values for "L".
The code below shows you how to do this, but it seems like a different graphical display would probably be better suited for this kind of data.
library(ggplot2)
data <- as.data.frame(matrix(scan(text="
141.5453333 1
148.7116667 1
154.7373333 1
228.2396667 1
148.4423333 1
131.3893333 1
139.2673333 1
140.5556667 2
143.719 2
214.3326667 2
134.4513333 3
169.309 8
161.1313333 4
"), ncol=2, byrow=TRUE))
names(data) <- c("T", "L")
data$id <- 1:nrow(data)
ggplot(data,aes(x=id, y=T))+geom_line() + xlab("L") +
scale_x_continuous(breaks=data$id, labels=data$L)
You have an error in your code, try this:
ggplot(data,aes(x=L, y=T))+geom_line()
Default arguments for aes are:
aes(x, y, ...)

In Gnuplot, how can I plot the sum of two columns when I'm plotting by header name

I have the following data file:
denst densu densd denss
3 1 1 1
4 1 1.5 1.5
5 1 2.5 1.5
I can plot, say, densu(denst) as:
plot 'file.txt' u 'denst':'densu'
Which is very convenient syntax. But if I want to plot, say, the sum of densu and densd, with respect to denst the only way I can do it is:
set key autotitle columnhead to tell gnuplot the first line is headers and not data
plot 'file.txt' u 1:($2+$3) to plot
The question is how can I do operations with column values like that, but using the name notation? The actual file is a csv with ~40 columns, and it's very tedious to manually count which column is which number so I can use the $n syntax to do math with column data.
I would want to do something like plot 'file.txt' u 1:($'densu'+$'densd'), using header name syntax analogously to how I can do it with column number syntax. Is there any way to do this?
I've discovered a way to do it. These two commands are equivalent:
plot 'file.txt' u 1:($2+$3)
plot 'file.txt' u 1:(column(2)+column(3))
You can't do
plot 'file.txt' u 'denst':($'densu'+$'densd')
but you can do
plot 'file.txt' u 'denst':(column('densu')+column('densd'))
to the same effect.
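The column('name') form above is the direct route inside gnuplot. If a numeric using specification is still wanted, for example for a tool that only accepts column numbers, counting the ~40 columns by hand can be avoided with a small helper. This is only a sketch; the name using_sum and its parameters are assumptions for illustration.

```python
def using_sum(header_line, x_name, y_names, sep=None):
    """Translate header names into a numeric gnuplot using spec, e.g. 1:($2+$3).

    sep=None splits on whitespace; pass sep="," for a csv header line.
    """
    names = header_line.strip().split(sep)
    # gnuplot column numbers are 1-based
    col = {name.strip(): i + 1 for i, name in enumerate(names)}
    y_expr = "+".join("$%d" % col[n] for n in y_names)
    return "%d:(%s)" % (col[x_name], y_expr)

header = "denst densu densd denss"
spec = using_sum(header, "denst", ["densu", "densd"])
print("plot 'file.txt' u " + spec)
```

A misspelled header name raises a KeyError, which is usually easier to notice than a plot of the wrong column.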
