Gnuplot - a way to convert and plot text information?

I am trying to use gnuplot to display the information contained in a file as in the example below:
1 2 3 … 10 11
1 1.0000000e-06 1.0000000e-06 … 0
2 2.5000000e-06 1.5000000e-06 … 0 #dt_grow
3 4.7500000e-06 2.2500000e-06 … 0 #dt_grow
4 8.1250000e-06 3.3750000e-06 … 0 #dt_cfl
5 1.2450703e-05 4.3257029e-06 … 1 #dt_mach, max_iteration_turbulence
6 1.6811013e-05 0.3603104e-06 … 0 #dt_grow
My goal is to be able to represent, somehow, the information listed in column 11 which, as you can see, contains non-numeric characters.
It might be pointless but, before moving ahead, it may help to stress that:
1. row 1 has no value in column 11
2. each column 11 value starts with # and is not quoted
3. column 11 contains many other possible entries (e.g. "#dt_piso", "#dt_piso, 2*max_piso reached", "#dt_mach, temperature extrapolation error")
4. when a column 11 value carries additional information (e.g. ", max_iteration_turbulence"), the value in column 10 is non-zero
5. the number of rows is typically of the order of 10^6
My idea was to associate a numeric value with each entry of column 11 using functions (e.g. if #dt_grow then 1, if #dt_cfl then 2, etc.) so that I can represent this information.
What I have tried so far produces nothing but errors (for brevity, I list the error below each plot command):
p "file" u 1:11 w l
--> x range is invalid
p "file" u 1:(''.$11 eq "#dt_cfl" ? 1 : 0) w l
--> warning: Skipping data file with no valid points. x range is invalid
p "file" u 1:(column(11) eq "#dt_cfl" ? 1 : 0) w l
--> internal error : STRING operator applied to non-STRING type
p "file" u 1:(strcol(11) eq "#dt_cfl" ? 1 : 0) w l
--> internal error : STRING operator applied to non-STRING type
splot "time.out" u 1:(11 eq "#dt_cfl" ? 1 : 0) w l
--> Need 1 or 3 columns for cartesian data
#Usage of functions does not resolve the issue:
e.g. f(x)= ''.x eq "#dt_cfl" ? 1 : 0
As you can probably tell from the diversity of my attempts, I am somewhat confused about how best to proceed in such cases. I have never had to plot string data and I am not sure what is causing the issue. I have looked through the documentation, but nothing there really helped. I would very much appreciate any input on how to handle string data and associate it with numeric values.
To wrap it up: I want to display the evolution of the information on column 11.
Ideally, I would also like to use any additional information (as explained in point 4 above) based on the value of column 10.
Given what I am asking for, I believe a Python script could fit my needs better, but I am wondering whether gnuplot offers such possibilities and I am eager to learn more.
Thanks in advance :)!
P.S.: I am adding a sketch of the result I am trying to obtain, hoping that it helps clarify my goals.
I am open to other solutions anyway, as this is just how I planned to get around the problem of plotting text data.
With respect to the few rows of data provided above, and assuming the following associations:
#dt_grow is 1
#dt_cfl is 2
#dt_mach is 3
and so on for other possible values (this could be hardcoded, as I would have no more than 10 possible values in column 11)
[Image: plot sketch]

Maybe something like this?
You can use the 11th column (here: 5th column) as x2ticlabels (check help xticlabels). Before, link the x2 axis to the x1 axis (check help link).
You could rotate the x2tic labels if there are too many of them and they overlap: set x2tics rotate by 90.
In principle, you could get rid of the leading # of each label, but I guess it will get a bit tricky because of your missing value in row 1.
Look at the example below as a starting point.
Script:
### adding text info from columns to some labels
reset session
$Data <<EOD
1 2 3 4 5
1 1.0000000e-06 1.0000000e-06 0
2 2.5000000e-06 1.5000000e-06 0 #dt_grow
3 4.7500000e-06 2.2500000e-06 0 #dt_grow
4 8.1250000e-06 3.3750000e-06 0 #dt_cfl
5 1.2450703e-05 4.3257029e-06 1 #dt_mach, max_iteration_turbulence
6 1.6811013e-05 0.3603104e-06 0 #dt_grow
EOD
set termoption noenhanced
set key top left
set link x2 via x inverse x
set x2tics
plot $Data u 1:2:x2tic(5) skip 1 axes x2y1 w lp pt 7 lc "red" title "column 2", \
'' u 1:3 skip 1 w lp pt 7 lc "web-green" title "column 3"
### end of script
Result:
Addition:
I guess I understand what you want to do but the background is still a bit unclear.
What you are asking for is a conversion or mapping of strings to numbers.
I assume you have a fixed and known set of keywords.
Apparently, for your desired plot the other columns besides 1 and 11 do not play a role.
Your missing value in column 11 in row 1 (excl. header) will create problems, hence add the option skip 2.
In the minimized example below, your column 11 is actually column 2.
The example below will create some random test data for better illustration.
create a string list of your keywords
you can address them via word(), check help word
you can (mis)use sum for a lookup to get the index, check help sum
furthermore, check help strcol, help xticlabels, help skip, help ternary.
Script:
### map strings to numbers
reset session
myKeys = '#dt_grow #dt_cfl #dt_piso #dt_foo #dt_bar #dt_xyz #dt_abc'
myKey(i) = word(myKeys,i)
# create some random test data
set table $Data
set samples 50
plot '+' u ("1 2") every ::0::0 w table
plot '+' u ("1") every ::0::0 w table
plot '+' u ($0+1):(word(myKeys,int(rand(0)*words(myKeys)+1))) w table
unset table
getIdx(s) = (n=0, sum[i=1:words(myKeys)] (s eq myKey(i) ? n=i : 0), n)
set ytics 1
set grid x,y
plot $Data u 1:(y0=getIdx(strcol(2))):ytic(myKey(y0)) skip 2 w lp pt 7 lc "red" notitle
### end of script
Result:
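Since the question mentions Python as an alternative, the same keyword-to-index lookup can be sketched outside gnuplot. This is only a minimal sketch and not part of the answer above: KEYS, get_idx, label_indices and the 5-column sample are illustrative assumptions that mirror myKeys and getIdx. Unlike the gnuplot lookup, it keeps only the first word of the label and strips the comma, so "#dt_mach, max_iteration_turbulence" maps like "#dt_mach".

```python
# Hypothetical keyword list, mirroring myKeys in the gnuplot script.
KEYS = ["#dt_grow", "#dt_cfl", "#dt_mach"]

# Sample rows taken from the question, reduced to 5 columns (no header line).
SAMPLE = """\
1 1.0000000e-06 1.0000000e-06 0
2 2.5000000e-06 1.5000000e-06 0 #dt_grow
3 4.7500000e-06 2.2500000e-06 0 #dt_grow
4 8.1250000e-06 3.3750000e-06 0 #dt_cfl
5 1.2450703e-05 4.3257029e-06 1 #dt_mach, max_iteration_turbulence
6 1.6811013e-05 0.3603104e-06 0 #dt_grow
"""


def get_idx(label):
    """Return the 1-based index of the label's first word in KEYS, or 0 if unknown."""
    words = label.replace(",", " ").split()
    key = words[0] if words else ""
    return KEYS.index(key) + 1 if key in KEYS else 0


def label_indices(text, label_col=5):
    """Map each row to (row number, numeric code of the label column)."""
    result = []
    for line in text.splitlines():
        # Split at most label_col - 1 times so the label keeps its inner spaces.
        fields = line.split(None, label_col - 1)
        if not fields:
            continue  # skip blank lines
        # Rows without a label column (row 1 in the sample) get an empty label.
        label = fields[label_col - 1] if len(fields) >= label_col else ""
        result.append((int(fields[0]), get_idx(label)))
    return result


if __name__ == "__main__":
    for row, code in label_indices(SAMPLE):
        print(row, code)
```

A row with no label and a row with an unknown label both map to 0, which is the same fallback the gnuplot getIdx function uses.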

I will not attempt a full answer right now, but here are a few pieces that may be useful by themselves or in conjunction with the answer from @theozh.
Column 11 not always present: The presence or absence of column 11 on any given line can be tested using the "pseudo-column" $#, which evaluates to the total number of columns found on that line. See "help pseudo". This feature was introduced in gnuplot version 5.4.2 (June 2021). For example, to plot the values of column 10 but only if column 11 is also present:
plot FOO using 0:(($# > 10) ? column(10) : NaN)
Separate lines on the graph for each column 11 category: This could be done more cleanly using arrays in the development version of gnuplot, but sticking with features present in version 5.4 I suggest placing all the categories you want to track in one big string and then looping over the string.
Category = "#dt_grow #dt_cfl #dt_mach"
xcoord(x) = ... some function of the value in column 1? ...
ycoord(y) = ... some function of the value in column 10? ...
set datafile missing NaN #ignore any lines that evaluate to NaN
plot for [cat in Category] FOO using (xcoord($1)) : (strcol($#) eq cat ? ycoord($10) : NaN) with steps

Related

GNUPLOT with point-size variables stored in a different file

I have a data file with the following format :
y1 y2 y3 y4 ...
1.3 1.1 0.5 0.5 ...
0.2 0.4 0.6 0.1 ...
I know how to use Gnuplot to plot the data in this file. Suppose I have 50 columns, then I use:
plot for [col=0:150] filename using 0:col with lines ...
Now, I want to make a scatter instead of a line plot with points having variable size. I have a different file storing the pointsize variables. I know I need to also use a for loop and:
w p ps variable
However, since the point-size variables are stored in a different file, I do not know how to write the using specification. Normally one uses
using 0:1:2
where the point size variables are stored in the second column etc. But what if these variables are stored in a different file ?
I think I can solve this problem by combining both the data and the pointsize variables file into a single file, but I wonder if one can do this using gnuplot.
Thanks
If there is a one-to-one matchup of lines in the two files, then yes. Assuming file.dat is formatted like the one you show above, and ps.dat contains one header record and then in column 1 the point size for all points in that same line of the data file:
# read point sizes into a data block in gnuplot
set datafile columnheaders
set table $pointsize
plot "ps.dat" using 1 with table
unset table
# Now plot the data, using the value of $pointsize[j+1] for row j of points
# There are two tricky bits here
# 1) the line numbers are counted starting with 0
# but array and datablock entries are counted starting from 1.
# 2) $pointsize is an array of strings. We need to convert this to a
# real number in order to use it as a point size
plot for [i=1:*] "file.dat" using 0:i:(real($pointsize[$0+1])) with points ps variable
file.dat
y1 y2 y3 y4
1 2 4 3
2 3 5 4
3 4 6 5
4 5 8 6
ps.dat
ps
1
5
2
3
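The question also raises the option of combining both files into one before plotting. Assuming the same one-to-one matchup of lines, that pairing can be sketched in Python as below. This is an illustrative sketch only: attach_point_sizes is a hypothetical helper, and the two strings stand in for file.dat and ps.dat.

```python
# In-memory stand-ins for file.dat and ps.dat as listed above.
DATA = """y1 y2 y3 y4
1 2 4 3
2 3 5 4
3 4 6 5
4 5 8 6
"""

PS = """ps
1
5
2
3
"""


def attach_point_sizes(data_text, ps_text):
    """Return (row, column, value, point size) tuples, one per data point.

    Assumes one header line in each input and a one-to-one matchup of the
    remaining lines; rows are counted from 0, columns from 1.
    """
    data_rows = data_text.splitlines()[1:]
    sizes = ps_text.splitlines()[1:]
    if len(data_rows) != len(sizes):
        raise ValueError("data and point-size inputs differ in length")
    points = []
    for row_no, (row, size) in enumerate(zip(data_rows, sizes)):
        for col_no, value in enumerate(row.split(), start=1):
            # Every point in a row shares that row's point size.
            points.append((row_no, col_no, float(value), float(size)))
    return points


if __name__ == "__main__":
    for point in attach_point_sizes(DATA, PS):
        print(*point)
```

The printed lines could be written to a single file and plotted with a using specification whose last column feeds ps variable.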

Is it possible to suppress plotting zero values from datafile column?

I wrote a data-collecting procedure, and over time I added more columns to the data output.
To build a consistent format, the procedure outputs 0 where no measurements were available.
I wonder when plotting the data file whether it is possible not to plot zero values (like if no data were present).
Some of the new columns are plotted by themselves (using 2:7) and others are used in an expression (using 2:($7+$8)).
Here is another option: set datafile missing "0". Note, that a value of 0.0 will be plotted.
This will also plot the lines connected in case you use with lines or with linespoints
Code:
### do not plot values "0"
reset session
$Data <<EOD
1 1.1
2 0
3 5.1
4 2.1
5 0
6 0.0
7 5.1
EOD
set datafile missing "0"
plot $Data u 1:2 w lp pt 7, \
'' u 1:($1+$2) w lp pt 7
### end of code
Result:
Also check help set datafile, help set datafile missing or help missing.
gnuplot will not plot values if they are not-a-number, i.e. NaN. You can either use this string in the data instead of 0, or write a function that converts 0 to NaN and use that, e.g.:
chk(x) = (x==0?NaN:x)
plot "file" using 2:(chk($7)+chk($8)) with lines
Adding a value to NaN results in NaN.
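The same propagation rule holds for floating-point NaN in general, which makes the effect easy to check outside gnuplot. The following Python sketch is an illustration under assumed sample values; chk mirrors the gnuplot function above, and plottable_sums is a hypothetical helper.

```python
import math


def chk(x):
    """Mirror of the gnuplot helper: turn an exact 0 into NaN, keep anything else."""
    return math.nan if x == 0 else x


def plottable_sums(rows):
    """Return chk(a) + chk(b) for each (a, b) pair; NaN marks points to be skipped."""
    return [chk(a) + chk(b) for a, b in rows]


if __name__ == "__main__":
    # Hypothetical values standing in for columns 7 and 8.
    rows = [(2.0, 1.5), (0, 5.1), (4.0, 0.0)]
    for (a, b), total in zip(rows, plottable_sums(rows)):
        status = "skipped" if math.isnan(total) else "plotted"
        print(a, b, total, status)
```

Note that this comparison is numeric, so 0 and 0.0 are both converted, whereas set datafile missing "0" matches the literal string "0" only.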

x must be numeric while trying to create histogram in R

I am a newbie in R. I need to generate some graphs. I imported an excel file and need to create a histogram on one column. My importing code is-
file=read.xlsx('femalecommentcount.xlsx',1,header=FALSE)
col=file[2]
col looks like this (part) -
36961 1
36962 1
36963 7
36964 1
36965 2
36966 1
36967 1
36968 4
36969 1
36970 6
36971 3
36972 1
36973 6
36974 6
36975 2
36976 2
36977 8
36978 2
36979 1
36980 1
36981 1
The first column is the row number; I'm not sure how to remove it. The second column is the data I want a histogram of. The hist() function requires a vector, and I'm not sure how exactly to convert the column.
If I simply call -
hist(col)
it gives-
Error in hist.default(col) : 'x' must be numeric
I have tried few commands randomly from the internet, but they didn't work.
My eventual goal is just to generate a good histogram (and maybe other charts) on that column, to get a good understanding of the spread of my data.
It should be col=file[[2]] or col=file[, 2] (solution given in a comment). file[2] returns a one-column data frame rather than a numeric vector, which is why hist() reports that 'x' must be numeric.
Importing the data correctly avoids the numeric issue.

Gnuplot: How do I skip columns in matrix input to plot?

I have data file of the form:
unimportant1 unimportant2 unimportant3 matrixdata[i]
1e4 2e5 3e2 1 2 3 4 5
2e3 1e1 7e3 5 4 3 2 1
... ... ... ...
2e3 1e4 4e2 4 4 4 4 4
So it has column headers (here "unimportant1" to "unimportant3") as the first row. I want gnuplot to ignore the first three unimportant columns, i.e. the data entries in exponential notation, and to plot the matrixdata as a matrix, as if I had done it like this:
#!/usr/bin/gnuplot -p
plot '-' matrix with image
1 2 3 4 5
5 4 3 2 1
...
4 4 4 4 4
e
How do I get gnuplot to ignore the first three columns and the header row and plot the rest as a matrix image? For compatibility, I would prefer a gnuplot built-in for this, but I could also write a shell script and preprocess the data file using the plot '< ...' syntax.
Edit: neuhaus' answer almost solved it. The only thing missing is how to ignore the first row (line) containing the text header. every seems to expect numeric data, so the whole plot fails because the input is not a matrix. I don't want to comment out the first line, as I'm using the unimportant data sets for other 2D plots that, in turn, use the header data.
So how do I skip a row in a matrix plot that already uses every to skip columns?
When using matrix, gnuplot must first parse the data file before it can skip rows and columns. Here, the first row evaluates to four invalid numbers while the second row has 8 numbers, so I get the error Matrix does not represent a grid.
If you don't want to comment out the first line or skip it with an external tool like < tail -n +2 matrix.dat, then you could change it to contain some dummy strings like
unimportant1 unimportant2 unimportant3 matrixdata[i] B C D E
1e4 2e5 3e2 1 2 3 4 5
2e3 1e1 7e3 5 4 3 2 1
... ... ... ...
2e3 1e4 4e2 4 4 4 4 4
Now your first row has as many entries as the other rows, and you can plot this file with
plot 'test.txt' matrix every ::3:1 with image
This still gives you a warning: matrix contains missing or undefined values, but you don't need to care.
I'm not familiar with matrix plots, but I got some sample data and
plot 'matrix.dat' matrix every ::3 with image
seems to do the trick.
You could probably use shell commands, for instance, the following skips the first six lines of a file:
plot '<tail -n +7 terrain0.dem' matrix with image
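If external preprocessing is acceptable, the header row and the three leading columns can also be stripped before the data reaches gnuplot. The Python sketch below is one possible way to do this, not something taken from the answers: extract_matrix is a hypothetical helper, and the sample text repeats the rows shown in the question without the elided "..." line.

```python
# Sample input from the question, without the elided "..." rows.
TEXT = """unimportant1 unimportant2 unimportant3 matrixdata[i]
1e4 2e5 3e2 1 2 3 4 5
2e3 1e1 7e3 5 4 3 2 1
2e3 1e4 4e2 4 4 4 4 4
"""


def extract_matrix(text, skip_rows=1, skip_cols=3):
    """Drop the leading header rows and leading columns, returning a numeric grid."""
    matrix = []
    for line in text.splitlines()[skip_rows:]:
        fields = line.split()
        if fields:  # ignore blank lines
            matrix.append([float(value) for value in fields[skip_cols:]])
    # Every row must have the same length for the result to be a grid.
    if len({len(row) for row in matrix}) > 1:
        raise ValueError("rows differ in length, so this is not a grid")
    return matrix


if __name__ == "__main__":
    for row in extract_matrix(TEXT):
        print(" ".join(f"{value:g}" for value in row))
```

The printed rows contain only the matrix block, so they can be piped to gnuplot through the plot '< ...' syntax and plotted with matrix with image.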

Simple line plot using R ggplot2

I have data in .csv format as shown below. As I am new to ggplot2 graphs, I am not able to plot it.
T L
141.5453333 1
148.7116667 1
154.7373333 1
228.2396667 1
148.4423333 1
131.3893333 1
139.2673333 1
140.5556667 2
143.719 2
214.3326667 2
134.4513333 3
169.309 8
161.1313333 4
I tried to plot a line graph using the following code
data<-read.csv("sample.csv",head=TRUE,sep=",")
ggplot(data,aes(T,L))+geom_line()
but I got the following image, which is not what I want
I want an image like the following
Can anybody help me?
You want to use a variable for the x-axis that has lots of duplicated values and expect the software to guess that the order you want those points plotted is given by the order they appear in the data set. This also means the values of the variable for the x-axis no longer correspond to the actual coordinates in the coordinate system you're plotting in, i.e., you want to map a value of "L=1" to different locations on the x-axis depending on where it appears in your data.
This type of fairly non-sensical thing does not work in ggplot2 out of the box. You have to define a separate variable that has a proper mapping to values on the x-axis ("id" in the code below) and then overwrite the labels with the values for "L".
The code below shows you how to do this, but it seems like a different graphical display would probably be better suited for this kind of data.
library(ggplot2)
data <- as.data.frame(matrix(scan(text="
141.5453333 1
148.7116667 1
154.7373333 1
228.2396667 1
148.4423333 1
131.3893333 1
139.2673333 1
140.5556667 2
143.719 2
214.3326667 2
134.4513333 3
169.309 8
161.1313333 4
"), ncol=2, byrow=TRUE))
names(data) <- c("T", "L")
data$id <- 1:nrow(data)
ggplot(data,aes(x=id, y=T))+geom_line() + xlab("L") +
scale_x_continuous(breaks=data$id, labels=data$L)
You have an error in your code, try this:
ggplot(data,aes(x=L, y=T))+geom_line()
Default arguments for aes are:
aes(x, y, ...)
