GNU plot - count the number of peaks - plot

I have a very large text file with 11 columns. As I can't post the whole data set here, I have uploaded the text file to a public host; it can be found at this link: http://s000.tinyupload.com/?file_id=59483318155908771897
Is there any way to COUNT the number of peaks using gnuplot on Linux? From the above text file, I am plotting the 1st and 7th columns as x and y, where the peaks are variations in the 7th column, and those are what I am interested in. For example, in the following image the number of frequency peaks should be counted as 10.
Here is the simple plotting script I am using.
set key right top
set xrange [:10]
#show timestamp
set xlabel "time in sec"
set ylabel "Freq"
set title "Testing"
plot "data/freq.csv" using 1:7 title "Freq"
Thanks for any help.

Gnuplot is for plotting and minor arithmetic; finding peaks in a signal is a signal processing task, and you need something like GNU Octave to do a reasonable job. If you load the freq.csv file and run findpeaks() on it with a plausible value for MinPeakDistance you get:
The code I used to generate the above plot:
y = dlmread('freq.csv', ' ');
[peak_y, peak_x] = findpeaks(y(:,7), "MinPeakDistance", 40);
plot(y(:,1), y(:,7), y(peak_x,1), peak_y, '.r');
Depending on what you want, findpeaks() might be enough; see help findpeaks and demo findpeaks for other options you can tweak.
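For readers without Octave, the idea behind a minimum peak distance can be sketched in plain Python. This is only an illustration, not Octave's actual implementation: the function name find_peaks_min_distance and the greedy "tallest peak wins" rule are assumptions made for this sketch.

```python
def find_peaks_min_distance(y, min_distance):
    """Return indices of local maxima that are at least min_distance samples apart.

    Hypothetical helper: candidates are strict local maxima; the tallest ones
    are kept first and any candidate too close to a kept peak is discarded.
    """
    candidates = [i for i in range(1, len(y) - 1)
                  if y[i] > y[i - 1] and y[i] > y[i + 1]]
    kept = []
    # Visit candidates from tallest to shortest.
    for i in sorted(candidates, key=lambda idx: y[idx], reverse=True):
        if all(abs(i - k) >= min_distance for k in kept):
            kept.append(i)
    return sorted(kept)


signal = [0, 5, 0, 4, 0, 0, 0, 3, 0]
print(find_peaks_min_distance(signal, 3))
print(len(find_peaks_min_distance(signal, 1)))
```

A larger min_distance suppresses small ripples that sit next to a taller peak, which is why a plausible value matters for noisy data.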

It takes a bit of tweaking, but this example should help:
y2=y1=y0=NaN
stat "data/freq.csv" using (y2=y1,y1=y0,y0=$7,(y1>y2&&y1>y0?y1:NaN)) prefix "data"
Now the variable data_records should hold the COUNT of local maxima in column 7.
You can print it via
print data_records
To understand this better, here is an example using the sine function:
set table 'test.dat'
plot sin(x)
unset table
x2=x1=x0=NaN
y2=y1=y0=NaN
plot 'test.dat' using (x2=x1,x1=x0,x0=$1,x1):(y2=y1,y1=y0,y0=$2,(y1>y2&&y1>y0?y1:NaN)) w p, 'test.dat' u 1:2 w l
This should plot a sine curve together with its maximum points.
In case several points have the same value:
x2=x1=x0=NaN
y2=y1=y0=NaN
plot 'freq.csv' u 0:7 w l, '' using (x2=x1,x1=x0,x0=$0,x1):(y2=y1,y1=y0,y0=$7,(y1>=y2&&y1>y0?y1:NaN)) w p
or
plot 'freq.csv' u 0:7 w l, '' using (x2=x1,x1=x0,x0=$0,x1):(y2=y1,y1=y0,y0=$7,(y1>y2&&y1>=y0?y1:NaN)) w p
depending on which side of the plateau you want the peak to be counted.
The stat command becomes:
stat 'freq.csv' using (y2=y1,y1=y0,y0=$7,(y1>=y2&&y1>y0?y1:NaN)) prefix "data"
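The three-point comparison used in the gnuplot expressions above can be written out in Python to make the logic easier to follow. This is a minimal sketch; the function name count_peaks and the mode names are assumptions introduced here, not part of gnuplot.

```python
def count_peaks(values, mode="strict"):
    """Count local maxima by comparing each sample with its two neighbours.

    mode="strict":        middle > previous  and middle > next
    mode="plateau_end":   middle >= previous and middle > next   (y1>=y2 && y1>y0)
    mode="plateau_start": middle > previous  and middle >= next  (y1>y2 && y1>=y0)
    """
    count = 0
    for prev, mid, nxt in zip(values, values[1:], values[2:]):
        if mode == "strict":
            is_peak = mid > prev and mid > nxt
        elif mode == "plateau_end":
            is_peak = mid >= prev and mid > nxt
        elif mode == "plateau_start":
            is_peak = mid > prev and mid >= nxt
        else:
            raise ValueError("unknown mode: " + mode)
        if is_peak:
            count += 1
    return count


print(count_peaks([0, 1, 0, 2, 0, 3, 0]))
print(count_peaks([1, 3, 3, 1], "plateau_end"))
```

As with the gnuplot version, the plateau modes can also flag a flat step on a rising or falling slope, so noisy data may need smoothing first.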

Related

Reducing number of datapoints when plotting in loglog scale in Gnuplot

I have a large dataset which I need to plot in loglog scale in Gnuplot, like this:
set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512)
LogLogPlot of my datapoints
Text file with the datapoints
Datapoints on the x axis are equally spaced, but because of the logscale they get very dense on the right part of the graph, and as a result the output file (I finally export it in .tex) gets very large.
In linear scale, I would simply use the option every to reduce the number of points which get plotted. Is there a similar option for loglogscale, such that the plotted points appear equally spaced?
I am aware of a similar question which was raised a few years ago, but in my opinion the solution is unsatisfactory: plotted points are not equally spaced along the x-axis. I think this is a really unsophisticated problem which deserves a clearer solution.
As I understand it, you don't want to plot the actual data points; you just want to plot a line through them. But you want to keep the appearance of points rather than a line. Is that right?
set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512) with lines dashtype '.' lw 2
Amended answer
If it is important to present outliers/errors in the data set then you must not use every or any other technique that simply discards or skips most of the data points. In that case I would prefer the plot with points that you show in the original question, perhaps modified to represent each point as a dot rather than a cross. I will simulate this by modifying a single point in your 500000 point data set (first figure below). But I would also suggest that the presence of outliers is even more apparent if you plot with lines (second figure below).
Showing error bounds is another alternative for noisy data, but the options depend on what you have to work with in your data set. If you want to pursue that, please ask a separate question.
If you really want to reduce the number of data to be plotted, you might consider the following script.
s = 0.1          ### sampling interval in log scale (try 0.05 for more detail)
c = log10(0.01)  ### parameter used in sampler(x); initialize it with a value
                 ### smaller than log10(x) for any x in the data
sampler(x) = (x>0 && log10(x)>=c) ? (c=ceil(log10(x)/s+0.5)*s, x) : NaN
set log xy
set grid xtics
plot 'A_1D_l0.25_L1024_r0.dat' using (sampler($1)):($2-512) with points pt 7 lt 1 notitle , \
'A_1D_l0.25_L1024_r0.dat' using 1:($2-512) with lines lt 1 notitle
This script samples the data in increments of roughly 0.1 on x-axis in log scale. It makes use of the property that points whose x value is evaluated as NaN in using are not drawn.
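The thinning rule in sampler(x) may be easier to follow outside gnuplot. The sketch below mirrors it in Python under the same assumptions (x values arrive in increasing order); the function name log_thin and its parameter names are hypothetical.

```python
import math


def log_thin(xs, step=0.1, start=0.01):
    """Keep roughly one x value per `step` on a log10 axis.

    Mirrors the gnuplot sampler: a point is kept when log10(x) has reached the
    current threshold, and the threshold then moves to the next grid line.
    """
    threshold = math.log10(start)
    kept = []
    for x in xs:
        if x > 0 and math.log10(x) >= threshold:
            kept.append(x)
            threshold = math.ceil(math.log10(x) / step + 0.5) * step
    return kept


points = log_thin(range(1, 100001))
print(len(points), points[:5])
```

In gnuplot the rejected points become NaN instead of being dropped from a list, but the selection is the same.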

Force 1st point of pointinterval to be plotted

I tried to plot a graph using the pointinterval property, and I would like the 1st point of my data to be plotted, which is not the case for the hot side of my first plot. Indeed, we see the purple dashed line but no point at the bottom left corner (around y+=0.35).
My code involves for loop and is displayed below:
plot for [i=1:words(FILES)] myDataFile(i) u (column(1)):(column(6)/word(UTAUS_ch,i)) w lp pointinterval 2 pt myPointtype(i) ps myPointsize(i) dt myDashtype(i) lt myLinetype(i) lw myLinewidth(i) lc rgb myLinecolor(i) title myTitle(i)
If I plot with pointinterval 1 we see that those points exist (see picture below).
How can I force the first point to be plotted with pointinterval?
Is it possible to plot half of my points every 2 points and the other half every 2 points but with an offset of 1 point?
I do not think you will be able to do what you want using the pointinterval property. It is designed so that the offset of the initial point increases by one for each plot drawn, with the intention of reducing the chance that point symbols from successive plots will overlap. This is exactly opposite to what you are trying to do.
Therefore I suggest not plotting each dataset with linespoints pi N. Instead plot each dataset twice, once with lines and once with points using a filter in the using specifier like this:
plot FOO using 1:2 with lines, '' using ((int($0)%N) ? NaN : $1) : 2 with points
The filter (int($0)%N ? NaN : $1) suppresses all points whose line number is not evenly divisible by N. This is essentially what the pointinterval property does, except that pointinterval skips out-of-range points and otherwise unplottable points rather than strictly using the line number as an index.
Edit: If individual offset values are required because x-coordinates are not consistent:
array offset[N] = [1,1,2,-1, and so on]
plot for [i=1:N] \
MyDataFile(i) using 1:2 with lines, \
'' using (((int($0)+offset[i]) % N) ? NaN : $1) : 2 with points
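The filter logic (keep a point only when its line number plus an offset is divisible by N) can be checked with a small Python sketch. The function name thin_points is hypothetical; NaN stands in for the values gnuplot would skip.

```python
import math


def thin_points(xs, n, offset=0):
    """Replace every x whose (index + offset) is not divisible by n with NaN."""
    return [x if (i + offset) % n == 0 else math.nan
            for i, x in enumerate(xs)]


print(thin_points([10, 20, 30, 40, 50], 2))
print(thin_points([10, 20, 30, 40, 50], 2, offset=1))
```

With offset 0 the first point is always kept, which is the behaviour asked for in the question; a per-dataset offset shifts which points survive.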

Gaussian peaks not overlapping in Gnuplot

I’m trying to plot multiple Gaussian functions on the same graph with Gnuplot, which is quite a simple thing. The problem is that the peaks do not overlap and I get the following result that looks like they have different peaks, which they don’t. How can I fix this?
First, it helps to understand how gnuplot generates plots of functions (or really how any computer program must do it). It must convert a continuous function into some kind of discrete representation. The mathematical function to be plotted is evaluated at various points along the independent (x) axis. This creates a set of (x,y) points. A line is then drawn between these points (think "connect the dots"). As you might imagine, the number of discrete samples used affects how accurately the curve is represented, and how smooth it looks.
The problem you have noticed is that the default sample size in gnuplot is a bit too low. The default (I believe) is 100 samples across the visible x-axis. You can adjust the number of samples (to 1000, for example) with
set samples 1000
I have made some example plots of gaussians to illustrate this point. (I made a rough estimate of your gaussian parameters.) Each plot has a different number of samples:
Notice how the lines get too jagged if the sample size is too low. Even the default value of 100 is too low. Setting to 1000 makes it plenty smooth. This is probably more than it needs to be, but it works. If you're using a terminal that generates a bitmap image (e.g. PNG), then you shouldn't need more samples than you have width in pixels used for the x-axis plot area. If you're generating vector based output, then just pick something that "looks right" for whatever you are using it in.
See the question Gnuplot x-axis resolution for more.
By the way, the code to generate the above examples is:
set terminal pngcairo size 640,480 enhanced
# Line styles
set style line 1 lw 2 lc rgb "blue"
set style line 2 lw 2 lc rgb "red"
set style line 3 lw 2 lc rgb "yellow"
# Gaussian function stuff
set yrange [0:1.1]
set xrange [-20:20]
gauss(x,a) = exp(-(x/a)**2)
eqn(a) = sprintf("y = e^{-(x/%d)^2}", a)
# First example (default)
set output "example1.png"
set title "100 samples (default)"
plot gauss(x,8) ls 1 title eqn(8), \
gauss(x,2) ls 2 title eqn(2), \
gauss(x,1) ls 3 title eqn(1)
# Second example (too low)
set output "example2.png"
set title "20 samples (too low)"
set samples 20
replot
# Third example (plenty high)
set output "example3.png"
set title "1000 samples (plenty high)"
set samples 1000
replot
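The effect of the sample count can also be checked numerically. The sketch below evaluates the same gauss(x,a) on an evenly spaced grid over [-20:20] and reports the highest sampled value. It assumes the grid includes both endpoints of the x range, and the function name sampled_peak is hypothetical.

```python
import math


def sampled_peak(a, samples, xmin=-20.0, xmax=20.0):
    """Largest value of exp(-(x/a)**2) seen on an evenly spaced grid."""
    step = (xmax - xmin) / (samples - 1)
    return max(math.exp(-((xmin + i * step) / a) ** 2)
               for i in range(samples))


for n in (20, 100, 1000):
    print(n, [round(sampled_peak(a, n), 4) for a in (8, 2, 1)])
```

With an even number of samples no grid point lands exactly on x = 0, so the narrowest Gaussian loses the most height. That is why the plotted peaks appear to differ even though every curve has a true maximum of 1.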

How to plot data from different blocks with lines in Gnuplot?

I have a data file with blocks of x/y values. Each block contains 16 lines with x/y pairs and each block represents those positions in a different time. http://pastebin.com/0teRrfRU
I want to plot the trajectory of a specific particle. To do that, I've written plot 'pos.dat' u 2:3 every ::n:0:n:i, where n is the n-th particle and i is the time up to which I want the trajectory plotted (I can then loop over the i to generate an animation).
This runs fine, but when I add w lines nothing gets plotted, and I don't understand why. Is there a way to plot this with lines? The only alternative I see is writing a script to parse the data file and generate a new one with only the values I want (effectively acting as every), but I don't want to do that if I can do it in Gnuplot.
After a closer look at your data, your case turns out to have a special feature.
Like in Plotting same line number of several blocks data with gnuplot you can plot the file into a table via with table which will remove the empty lines and hence lines will be connected.
However, some of your particles disappear on one side and re-appear on the opposite side. If you plot this with lines you will get a line through the whole graph, which is certainly undesired. You can work around this by introducing a function Break() which returns NaN if the difference between two successive x- or y-values is larger than 90% (to be on the safe side) of the x- or y-range, respectively. The effect of NaN is that the line will be interrupted.
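The idea behind Break() can be sketched in Python: walk along one trajectory and blank out the y value whenever the step from the previous point spans most of the plot range. The function name break_jumps and its parameters are hypothetical. Unlike the gnuplot version, this sketch keeps the first point instead of turning it into NaN.

```python
import math


def break_jumps(points, x_range, y_range, frac=0.9):
    """Set y to NaN for any point that jumps >= frac of the range from its predecessor."""
    result = []
    prev = None
    for x, y in points:
        jumped = prev is not None and (
            abs(x - prev[0]) >= frac * x_range
            or abs(y - prev[1]) >= frac * y_range
        )
        result.append((x, math.nan if jumped else y))
        prev = (x, y)
    return result


# A particle leaving on the right (x = 9.8) and re-entering on the left (x = 0.3).
track = [(1.0, 1.0), (5.0, 1.0), (9.8, 1.0), (0.3, 1.0), (2.0, 1.0)]
print(break_jumps(track, x_range=10.0, y_range=10.0))
```

A plotting tool that treats NaN as a gap will then draw two separate line pieces instead of one line across the whole graph.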
Code: (works with gnuplot>=5.0.0 version at the time of OP's question)
### plotting trajectories
reset session
set term gif animate delay 3 size 400,400
set output "SO30744875.gif"
set size square
FILE = 'SO30744875.dat'
set key noautotitle
stats FILE u (N=column(-1),M=column(1),$2):3 nooutput
xrange = STATS_max_x-STATS_min_x
yrange = STATS_max_y-STATS_min_y
set table $Data
plot FILE u 1:2:3 w table
unset table
Break(col1,col2) = (x0=x1,x1=column(col1), y0=y1,y1=column(col2), \
abs(x1-x0)<0.9*xrange && abs(y1-y0)<0.9*yrange ? column(col2) : NaN)
do for [i=0:N] {
plot for [j=1:16] x1=y1=NaN $Data u 2:(Break(2,3)) every M::j-1::(i+1)*M w l, \
FILE u 2:3 every :::i::i w p pt 7, \
FILE u 2:3:1 every :::i::i w labels offset 0.7,0.7
}
set output
### end of code
Result:

How to count line segment occurrences by pixel in R?

I am trying to convey the concentration of lines in 2D space by showing the number of crossings through each pixel in a grid. I am picturing something similar to a density plot, but with more intuitive units. I was drawn to the spatstat package and its line segment class (psp) as it allows you to define line segments by their end points and incorporate the entire line in calculations. However, I'm struggling to find the right combination of functions to tally these counts and would appreciate any suggestions.
As shown in the example below with 50 lines, the density function produces values in (0,140), the pixellate function tallies the total length through each pixel and takes values in (0, 0.04), and as.mask produces a binary indicator of whether a line went through each pixel. I'm hoping to see something where the scale takes integer values, say 0..10.
require(spatstat)
set.seed(1234)
numLines = 50
# define line segments
L = psp(runif(numLines),runif(numLines),runif(numLines),runif(numLines), window=owin())
# image with 2-dimensional kernel density estimate
D = density.psp(L, sigma=0.03)
# image with total length of lines through each pixel
P = pixellate.psp(L)
# binary mask giving whether a line went through a pixel
B = as.mask.psp(L)
par(mfrow=c(2,2), mar=c(2,2,2,2))
plot(L, main="L")
plot(D, main="density.psp(L)")
plot(P, main="pixellate.psp(L)")
plot(B, main="as.mask.psp(L)")
The pixellate.psp function allows you to optionally specify weights to use in the calculation. I considered trying to manipulate this to normalize the pixels to take a count of one for each crossing, but the weight is applied uniquely to each line (and not specific to the line/pixel pair). I also considered calculating a binary mask for each line and adding the results, but it seems like there should be an easier way. I know that you can sample points along a line, and then do a count of the points by pixel. However, I am concerned about getting the sampling right so that there is one and only one point per line crossing of a pixel.
Is there a straightforward way to do this in R? Otherwise, would this be an appropriate suggestion for a future package enhancement? Is this more easily accomplished in another language such as Python or MATLAB?
The example above and my testing has been with spatstat 1.40-0, R 3.1.2, on x86_64-w64-mingw32.
You are absolutely right that this is something to put in as a future enhancement. It will be done in one of the next versions of spatstat. It will probably be an option in pixellate.psp to count the number of crossing lines rather than measure the total length.
For now you have to do something a bit convoluted, for example:
require(spatstat)
set.seed(1234)
numLines = 50
# define line segments
L <- psp(runif(numLines),runif(numLines),runif(numLines),runif(numLines), window=owin())
# split into individual lines and use as.mask.psp on each
masklist <- lapply(1:nsegments(L), function(i) as.mask.psp(L[i]))
# convert to 0-1 image for easy addition
imlist <- lapply(masklist, as.im.owin, na.replace = 0)
rslt <- Reduce("+", imlist)
# plot
plot(rslt, main = "")
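The same "one mask per segment, then add" idea can be sketched without spatstat. The Python below is only an approximation: it marks pixels by sampling points densely along each segment in the unit square, so a pixel that a line merely clips at a corner may be missed. The function name crossing_counts and its parameters are hypothetical.

```python
from collections import Counter


def crossing_counts(segments, nx, ny, samples_per_segment=1000):
    """Count, per pixel of an nx-by-ny grid on the unit square, how many segments touch it."""
    counts = Counter()
    for x0, y0, x1, y1 in segments:
        touched = set()  # a set, so each segment counts at most once per pixel
        for k in range(samples_per_segment + 1):
            t = k / samples_per_segment
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            col = min(int(x * nx), nx - 1)
            row = min(int(y * ny), ny - 1)
            touched.add((row, col))
        counts.update(touched)
    return counts


segs = [(0.05, 0.6, 0.95, 0.6),   # horizontal segment
        (0.75, 0.1, 0.75, 0.9)]   # vertical segment
print(sorted(crossing_counts(segs, 2, 2).items()))
```

Collecting the touched pixels in a set per segment gives the "one and only one count per crossing" property raised in the question, at the cost of the sampling approximation.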

Resources