Feeding data into octave plot from a command-line pipe

I have a question concerning real-time plotting in Octave. The idea is very simple, but unfortunately I could not find a solution on the internet. In my project, I am using netcat to sample data and awk to filter it, for example:
nc 172.16.254.1 23 | awk ' /point/ '
In this way, I get a new data point (approximately) every 4-10 ms, together with a timestamp.
Now I would like to pipe this data to Octave and plot it in real time. Does anyone have any ideas?
Update
It seems to me that
nc 172.16.254.1 23 | awk ' /point/ ' | octave --silent --persist --eval "sample(stdin)"
pipes the data to my Octave function sample, which does the plotting. But now there is still one problem: the replotting is far too slow, and it slows down further while sampling the data (I get thousands of data points). I have
function sample(stream)
  t = NaN; r = NaN; k = 1;
  figure(1)
  plot(t, r, 'o')
  hold on
  while (~feof(stream))
    s = fgets(stream);
    t(k) = str2double(s(1:6));    % timestamp field
    r(k) = str2double(s(8:11));   % data value field
    plot(t(k), r(k), 'o')         % adds a new plot object every iteration
    drawnow()
    k = k + 1;
  end
endfunction
What should I add/change?
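One common fix, sketched here rather than taken from the thread: create the plot object once and push new data into it with set, redrawing in batches instead of adding a fresh plot object per point (the batch size of 50 below is an arbitrary choice):
function sample(stream)
  t = []; r = []; k = 1;
  figure(1)
  h = plot(NaN, NaN, 'o');          % one plot object, reused throughout
  while (~feof(stream))
    s = fgets(stream);
    t(k) = str2double(s(1:6));
    r(k) = str2double(s(8:11));
    if (mod(k, 50) == 0)            % redraw in batches, not per point
      set(h, 'xdata', t, 'ydata', r);  % update the existing object's data
      drawnow();
    endif
    k = k + 1;
  endwhile
endfunction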

After some research, feedgnuplot seems to satisfy my need for real-time plotting:
nc 172.16.254.1 23 |
awk ' /point/ ' |
feedgnuplot --domain --points --stream 0.01
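Here --domain treats the first column as the x value, --points draws points rather than lines, and --stream 0.01 redraws at most every 0.01 seconds; check feedgnuplot --help for the exact semantics of these options on your version.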

Related

How to plot data from a file from specific lines, starting at the line with some special string

I am trying to execute command similar to
plot "data.asc" every ::Q::Q+1500 using 2 with lines
But I have a problem with that "Q" number. It's not a known value but the number of the line containing some specific string. Let's say I have a line with the string "SET_10:" and then I have my data to plot after this specific line. Is there some way to identify the number of that line with the specific string?
An easy way is to pass the data through GNU sed to print just the wanted lines:
plot "< sed -n <data.asc '/^SET_10:/,+1500{/^SET_10:/d;p}'" using 1:2 with lines
The -n suppresses the default output, the /^SET_10:/,+1500 address range says between which lines to run the {...} commands, and those commands delete the trigger line itself and p print the others.
To make sure you have a compatible GNU sed, try the command on its own first with a small count, e.g. 5:
sed -n <data.asc '/^SET_10:/,+5{/^SET_10:/d;p}'
If this does not output the 5 data lines following the trigger in your file, an alternative is to use awk, as it is too difficult in sed to count lines without this GNU-specific syntax. Test the (standard POSIX, not GNU-specific) awk equivalent:
awk <data.asc 'end!=0 && NR<=end{print} /^start/{end=NR+5}'
and if that is ok, use it in gnuplot as
plot "< awk <data.asc 'end!=0 && NR<=end{print} /^start/{end=NR+1500}'" using 1:2 with lines
Here's a version entirely within gnuplot, with no external commands needed. I tested this on gnuplot 5.0 patchlevel 3, using the following bash commands to create a simple dataset of 20 lines, of which only 5 lines are to be printed from the line with "start" in column 1. You don't need to run this yourself; it only reproduces my test data.
for i in $(seq 1 20)
do let j=i%2
echo "$i $j"
done >data.asc
sed -i data.asc -e '5a\
start'
The actual gnuplot script uses a variable endlno, initially set to NaN (not-a-number), and a function f taking three parameters: a boolean start saying whether column 1 matches the trigger string, the current line number lno, and the current column-1 value val. If the line number is less than or equal to the ending line number (and therefore endlno is no longer NaN), f returns val. Otherwise, if the start condition is true, the wanted ending line number is stored in endlno and NaN is returned. If we have not yet seen the start, NaN is returned.
gnuplot -persist <<\!
endlno=NaN
f(start,lno,val) = ((lno<=endlno)?val:(start? (endlno=lno+5,NaN) : NaN))
plot "data.asc" using (f(stringcolumn(1)eq "start", $0, $1)):2 with lines
!
Since gnuplot does not plot points with NaN values, lines up to the start and after the wanted count are ignored. With the test data above, "start" lands on file line 6, which is data line 5 in gnuplot's zero-based $0 numbering: f sets endlno to 10 and returns NaN for that line, so the next five data lines ($0 from 6 to 10) are plotted.
In your case you need to change 5 to 1500 and "start" to "SET_10:".

Using sed or awk (or similar) incrementally or with a loop to do deletions in data file based on lines and position numbers given in another text file

I am looking to do deletions in a data file at specific positions in specific lines, based on a list in a separate text file, and have been struggling to get my head around it.
I'm working in cygwin, and have a (generally large) data file (data_file) to do the deletions in, and a tab-delimited text file (coords_file) listing the relevant line numbers in column 2 and the matching position numbers for each of those lines in column 3.
Effectively, I think I'm trying to do something similar to the following incomplete sed command, where coords_file$2 represents the line number taken from the 2nd column of coords_file and coords_file$3 represents the position in that line to delete from.
sed -r 's coords_file$2/(.{coords_file$3}).*/\1/' datafile
I'm wondering if there's a way to include a loop or iteration so that sed runs first using the values in the first row of coords_file to fill in the relevant line and position coordinates, and then runs again using the values from the second row, etc. for all the rows in coords_file? Or if there's another approach, e.g. using awk to achieve the same result?
e.g. for awk, I identified these coordinates based on string matches using this really handy awk command from Ed Morton's response to this question: line and string position of grep match.
awk 'NR==FNR{strings[$0]; next} {for (string in strings) if ( (idx = index($0,string)) > 0 ) print string, FNR, idx }' strings.txt data_file > coords_file.txt
Was thinking potentially something similar could work doing an in-place deletion rather than just finding the lines, such as incorporating a simple find and replace like {if($0=="somehow_reference_coords_file_values_here"){$0=""}. But it's a bit beyond me (am a coding novice, so I barely understand how that original command is actually working, let alone how to mod it).
File examples
data_file
#vandelay.1
blablablablablablablablablablablabla
+
mehmehmehmehmehmehmehmehmehmehmehmeh
#vandelay.2
blablablablablablablablablablablabla
+
mehmehmehmehmehmehmehmehmehmehmehmeh
#vandelay.3
blablablablablablablablablablablabla
+
mehmehmehmehmehmehmehmehmehmehmehmeh
coords_file (tab-delimited)
(column 1 is just the string that was matched, column 2 is the line number it matched in, and column 3 is the position number of the match).
stringID 2 20
stringID 4 20
stringID 10 27
stringID 12 27
Desired result:
#vandelay.1
blablablablablablab
+
mehmehmehmehmehmehm
#vandelay.2
blablablablablablablablablablablabla
+
mehmehmehmehmehmehmehmehmehmehmehmeh
#vandelay.3
blablablablablablablablabl
+
mehmehmehmehmehmehmehmehme
Any guidance would be much appreciated thanks! (And as I mentioned, I'm very new to this coding scene, so apologies if some of that doesn't make sense or my question format's shonky (or if the question itself is rudimentary)).
Cheers.
(Incidentally, this has all been a massive workaround to delete strings identified in the blablabla lines of data_file, as well as at the same positions 2 lines below (i.e. the mehmehmeh lines), since the mehmehmeh characters are quality scores that match the blablabla characters for each sample (each #vandelay.xx). i.e. essentially this: sed -i 's/string.*//' datafile, but also running the same deletion 2 lines below every time it identifies the string. So if there's actually an easier script to do just that instead of all the stuff in the question above, please let me know!)
You can use an awk one-liner to do that:
$ awk 'NR==FNR{a[$2]=$3;next} (FNR in a){$0=substr($0,1,a[FNR]-1)}1' coords_file data_file
#vandelay.1
blablablablablablab
+
mehmehmehmehmehmehm
#vandelay.2
blablablablablablablablablablablabla
+
mehmehmehmehmehmehmehmehmehmehmehmeh
#vandelay.3
blablablablablablablablabl
+
mehmehmehmehmehmehmehmehme
Brief explanation,
NR==FNR{a[$2]=$3;next}: build a map from line number to matching position in array a. This part of the expression only processes coords_file, because NR==FNR holds only for the first file.
(FNR in a): once awk starts processing data_file, this expression selects any line whose number FNR is a key of array a.
$0=substr($0,1,a[FNR]-1): reassign $0 to the truncated line, keeping only the characters before the match position (awk's substr is 1-based).
1: print all lines.
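As for the postscript (truncate the matching line and the line two below it in one pass), here is a hedged sketch combining the index() matching from Ed Morton's command with the truncation above; it assumes the quality lines never themselves contain a search string:
awk '
NR==FNR { strings[$0]; next }                  # 1st file: strings to look for
FNR in cut { $0 = substr($0, 1, cut[FNR]-1) }  # truncate the line 2 below a match
{ for (s in strings)
    if ((i = index($0, s)) > 0) {
        cut[FNR+2] = i                         # same cut applies 2 lines later
        $0 = substr($0, 1, i-1)                # truncate from the match position
    } }
1                                              # print every (possibly cut) line
' strings.txt data_file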

Count occurrence of unique value in 2nd field using awk

I'm using this syntax to count occurrences of unique values in the 2nd field of the file. Can somebody explain how this works? How is Unix calculating this count? Is it reading each line, or the whole file as one? How is it assigning the count and incrementing it?
Command:
awk -F: '{a[$2]++} END {for ( i in a) { print i,a[i]}}' inputfile
It's not Unix calculating but awk; awk is not Unix or a shell, it's a language. The presented awk program calculates how many times each unique value in the second field ($2, with fields separated by :) occurs and outputs the values and their counts.
awk -F: '          # set the field separator to ":"
{
    # awk reads records (lines) in a loop
    a[$2]++        # each value in field 2 is used as a key in array a and its count incremented
}
END {              # after all records have been processed
    for (i in a) {     # array a is looped through in no particular order
        print i, a[i]  # and value-count pairs are output
    }
}' inputfile
If you want to learn more about awk, please read the following quote (* see below) by @EdMorton: The best source of all awk information is the book Effective Awk Programming, 4th Edition, by Arnold Robbins. If you have any other book, throw it away, and if you're trying to learn from a web site - don't, as most of them are full of complete nonsense. Just get the book.
*) Now go read the book.
Edit How a[$2]++ works:
Sample data and a[$2]'s value:
1 val1 # a[$2]++ causes: a["val1"] = 1
2 val2 # a[$2]++ causes: a["val2"] = 1
3 val1 # a[$2]++ causes: a["val1"] = 2
4 val1 # a[$2]++ causes: a["val1"] = 3
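To see it end to end, here is a quick demo with made-up ':'-separated input (the for (i in a) loop visits keys in no particular order, so the two output lines may come out swapped):
$ printf '1:val1\n2:val2\n3:val1\n4:val1\n' | awk -F: '{a[$2]++} END {for (i in a) print i, a[i]}'
val1 3
val2 1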

Measuring function execution time and time of completion by default in R

Measuring function execution time in R is simple but pollutes the code.
t0 <- Sys.time()
my_function()
t1 <- Sys.time()
t1-t0
Is there some package or setting in R that makes it record the execution time (duration) and the time of completion and print that to the screen after the function output?
In Stata this can be done with the setting:
set rmsg on
After that, if you run a block of code with the following commands:
clear
set obs 3
gen x=1
The output window would display:
. clear
r; t=0.00 9:10:28
. set obs 3
number of observations (_N) was 0, now 3
r; t=0.00 9:10:28
. gen x=1
r; t=0.00 9:10:28
.
end of do-file
r; t=0.00 9:10:28
Above we have execution and completion time for:
each command. This follows the command's own output (bear in mind clear and gen have no screen output).
the whole command block. This is indicated by adding end of do-file and the time information after that.
I find this very useful when working on large datasets.
Is there a way to do this in R?
If not, would it be too complicated to create a package to implement this feature?
Have a look at the microbenchmark package. E.g.
microbenchmark::microbenchmark(my_fun(), times = 100L, unit = "ms")
summary(microbenchmark::microbenchmark(my_fun(), times = 100L, unit = "ms"))$uq
With the latter option you can access entries for further tests.
Running example:
microbenchmark::microbenchmark(factorial(100), times = 100L, unit = "us")
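microbenchmark times repeated runs of an expression; if what you want is a Stata-style one-shot report after each call, here is a minimal sketch using base R's system.time (the wrapper name timed is made up for illustration):
timed <- function(expr) {
  # evaluate the expression in the caller's frame, capturing elapsed time
  t <- system.time(result <- eval.parent(substitute(expr)))
  # print a Stata-like "r; t=... HH:MM:SS" line after the call completes
  cat(sprintf("r; t=%.2f %s\n", t[["elapsed"]], format(Sys.time(), "%H:%M:%S")))
  invisible(result)
}
timed(Sys.sleep(0.5))   # prints something like: r; t=0.50 09:10:28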

Can't concatenate netCDF files with ncrcat

I am looping over a model that outputs daily netcdf files. I have a 7-year time series of daily files that, ideally, I would like to append into a single file at the end of each loop, but it seems that, using NCO tools, the best way to merge the data into one file is to concatenate. Each daily file is called test.t.nc and is renamed to the date of the daily file, e.g. 20070102.nc, except the first one, which I create with
ncks -O --mk_rec_dmn time test.t.nc 2007-01-01.nc
to make time the record dimension for concatenation. If I try to concatenate the first two files such as
ncrcat -O -h 2007-01-01.nc 2007-01-02.nc out.nc
I get the error message
ncrcat: symbol lookup error: /usr/local/lib/libudunits2.so.0: undefined symbol: XML_ParserCreate
I don't understand what this means and, looking at all the help online, ncrcat should be a straightforward process. Does anyone understand what's happening?
Just in case this helps, the ncdump -h for 20070101.nc is
netcdf \20070101 {
dimensions:
time = UNLIMITED ; // (8 currently)
y = 1 ;
x = 1 ;
tile = 9 ;
soil = 4 ;
nt = 2 ;
and 20070102.nc
netcdf \20070102 {
dimensions:
x = 1 ;
y = 1 ;
tile = 9 ;
soil = 4 ;
time = UNLIMITED ; // (8 currently)
nt = 2 ;
This is part of a bigger shell script and I don't have much flexibility over the naming of files - just in case this matters!
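For what it's worth, the error itself is a dynamic-linking problem rather than an ncrcat usage problem: XML_ParserCreate is an expat symbol, which libudunits2 uses to parse its XML unit database. A hedged way to inspect this on a typical Linux system, assuming the library path from the error message:
ldd /usr/local/lib/libudunits2.so.0 | grep -i expat             # is libexpat listed and found?
nm -D /usr/local/lib/libudunits2.so.0 | grep XML_ParserCreate   # "U" means the symbol must come from elsewhere
If libexpat shows up as missing or unresolved, reinstalling or rebuilding udunits2 (and then NCO) against a present libexpat is the usual remedy.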
