Plotting a dataset with condition over a column in gnuplot

I would like to plot the following dataset.
+---------------+-----------+-------------+-----+----+----+--------------------+-------------------+----+----+----+----+--------------------+--------------------+------------------+--------------------+-----------------+------+
| time_stamp_0|sender_ip_1|receiver_ip_2|count|rank| xi| pi| r| ip5| ip4| ip3| ip2| variance| entropy| pre_chi_square| chi_square| total_chi_square|attack|
+---------------+-----------+-------------+-----+----+----+--------------------+-------------------+----+----+----+----+--------------------+--------------------+------------------+--------------------+-----------------+------+
|10:37:37.000985| 10.0.0.3| 10.0.0.1| 9345| 1|1796|1.070090957731407...|0.19218833600856072|1211|1157|4812|1796|6.982982177427692E-5|9.783410080138751E-4|3.3954177346722574|0.001890544395697248|13.58167093868903| 1|
|10:37:37.000995| 10.0.0.3| 10.0.0.1| 9345| 2|1796|2.140181915462814...|0.19218833600856072|1211|1157|4812|1796|3.497253089848578...|0.001808335909968907| 17.00510593066335|0.009468321787674473|13.58167093868903| 1|
|10:37:37.001002| 10.0.0.2| 10.0.0.1| 9345| 3|1796|3.210272873194221...|0.19218833600856072|1211|1157|4812|1796|8.436389877417202E-4| 0.00258233850119472|41.021252923981834|0.022840341271704808|13.58167093868903| 1|
I need to have a plot that shows me the "rank" over the "time_stamp_0" only for sender_ip_1="10.0.0.3".
I have the following code:
set timefmt '%H:%M:%S'
set xdata time
set format x '%H:%M:%S'
# I have a problem with the line below
plot "test.txt" using 1:($2=="10.0.0.3"?$5:1/0)
However, the plotted graph is not correct.
In fact, it seems that no filtering is applied to the data: the graph is the same as the one without filtering.
I should mention that the data is in a file (test.txt) and it doesn't have any header.
Can you please help me?

Use eq for string equality checking and strcol to get the string value of a column:
plot "test.txt" using 1:(strcol(2) eq "10.0.0.3" ? $5 : 1/0)

You are running into two problems:
The string-equality operator in Gnuplot is eq, not ==.
Data extracted for plotting is not of the string type (I assume it’s a float), so you cannot apply string operations to it.
I don’t see a way to solve the second problem from within Gnuplot. You can however pipe everything through something like AWK before plotting to handle the condition for you:
plot "<awk '{print $1, ($2==\"10.0.0.3\" ? $5 : \"nan\")}' test.txt" u 1:2
(Note that you still have to take care of your ASCII table formatting, e.g., by removing all | characters via SED.)
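The table clean-up mentioned above can also be done before gnuplot ever sees the data. Here is a minimal Python sketch, assuming the file keeps the pipe-delimited layout shown in the question. The function name filter_rank and the use of Python are illustrative assumptions, not part of either answer. It skips the +---+ border lines, strips the | characters, and keeps the time stamp and rank of rows whose sender matches.

```python
def filter_rank(lines, sender="10.0.0.3"):
    """Return (time_stamp, rank) pairs for rows sent by `sender`.

    Assumes each data row looks like '|f1| f2| ...|' with the time stamp
    in field 1, the sender IP in field 2 and the rank in field 5.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line.startswith("|"):
            continue  # border lines such as +-----+ carry no data
        fields = [f.strip() for f in line.strip("|").split("|")]
        if len(fields) < 5 or fields[1] != sender:
            continue  # wrong sender (this also drops a header row, if any)
        rows.append((fields[0], fields[4]))
    return rows
```

Writing each pair as one whitespace-separated line to a new file would give gnuplot a plain two-column input that can be plotted with using 1:2.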

Related

R: Create dataframe from paste0 content

I am manually creating a "free text" table using cat and paste0 like so:
tab <- cat(paste0("Stage","\t","Number","\n",
"A","\t",nrow(df[df$stage == "A",]),"\n",
"B","\t",nrow(df[df$stage == "B",]),"\n"
))
i.e.
Stage Number
A 54
B 85
where I want to be able to create a publication-ready table (i.e. one that looks good and is probably generated by R Markdown).
The xtable() function can do this, but only accepts a dataframe. So my question is: how do I get some free text, delimited by column using "\t" and by row using "\n", into a dataframe?
I have tried:
data.frame(do.call(rbind,strsplit(as.character(tab),'\t')))
But I get "dataframe with zero columns and zero rows". I think this has to do with the fact that I am not declaring "\n" to be a new line.
By the way, if this way seems long-winded and there is an easier way, I am happy to take suggestions.

Replacing a specific part

I have a list like this:
DEL075MD1BWP30P140LVT
AN2D4BWP30P140LVT
INVD0P7BWP40P140
IND2D6BWP30P140LVT
I want to replace everything between D and BWP with a *.
How can I do that in Unix and Tcl?

Do you have the whole list available at the same time, or are you getting one item at a time from somewhere?
Should all D-BWP groups be processed, or just one per item?
If just one per item, should it be the first or last (those are the easiest alternatives)?
Tcl REs don't have any lookbehind, which would have been nice here. But you can do without both lookbehinds and lookaheads if you capture the goalposts and paste them into the replacement as back references. The regular expression for the text between the goalposts should be [^DB]+, i.e. one or more of any character other than D or B (to make sure the match doesn't escape the goalposts and stick to other Ds or Bs in the text). So: {(D)[^DB]+(BWP)} (putting braces around the RE is usually a good idea).
If you have the whole list and want to process all groups, try this:
set result [regsub -all {(D)[^DB]+(BWP)} $lines {\1*\2}]
(If you can only work with one line at a time, it's basically the same, you just use a variable for a single line instead of a variable for the whole list. In the following examples, I use lmap to generate individual lines, which means I need to have the whole list anyway; this is just an example.)
Process just the first group in each line:
set result [lmap line $lines {
regsub {(D)[^DB]+(BWP)} $line {\1*\2}
}]
Process just the last group in each line:
set result [lmap line $lines {
regsub {(D)[^DB]+(BWP[^D]*)$} $line {\1*\2}
}]
The {(D)[^DB]+(BWP[^D]*)$} RE extends the right goalpost to ensure that there is no D (and hence possibly a new group) anywhere between the goalpost and the end of the string.
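For comparison, the same capture-and-paste-back technique can be sketched outside Tcl. The following is a rough Python equivalent, assuming the same patterns as above. The function names mask_all, mask_first and mask_last are made up for illustration.

```python
import re

# Same goalpost patterns as the Tcl examples above.
GROUP = re.compile(r"(D)[^DB]+(BWP)")
LAST_GROUP = re.compile(r"(D)[^DB]+(BWP[^D]*)$")


def mask_all(item):
    """Replace the text of every D...BWP group with '*'."""
    return GROUP.sub(r"\1*\2", item)


def mask_first(item):
    """Replace only the first D...BWP group."""
    return GROUP.sub(r"\1*\2", item, count=1)


def mask_last(item):
    """Replace only the last D...BWP group (no D may follow it)."""
    return LAST_GROUP.sub(r"\1*\2", item, count=1)
```

With the list from the question, mask_all("DEL075MD1BWP30P140LVT") gives DEL075MD*BWP30P140LVT: the leading D is skipped because another D appears before any BWP.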
Documentation:
lmap (for Tcl 8.5),
lmap,
regsub,
set,
Syntax of Tcl regular expressions

.ksh paste user input value into dataset

Good morning.
First things first: I know next to nothing about shell scripting in Unix, so please pardon my naivety.
Here's what I'd like to do, and I think it's relatively simple: I would like to create a .ksh file to do two things: 1) take a user-provided numerical value (argument) and paste it into a new column at the end of a dataset (a separate .txt file), and 2) execute a different .ksh script.
I envision calling this script at the Unix prompt, with the input value added thereafter. Something like, "paste_and_run.ksh 58", where 58 would populate a new, final (un-headered) column in an existing dataset (specifically, it'd populate the 77th column).
To be perfectly honest, I'm not even sure where to start with this, so any input would be very appreciated. Apologies for the lack of code within the question. Please let me know if I can offer any more detail, and thank you for taking a look.

I have found the answer: the "nawk" command.
TheNumber=$3
PE_Infile=$1
Where the above variables correspond to the third and first arguments from the command line, respectively. "PE_Infile" represents the file (with full path) to be manipulated, and "TheNumber" represents the number to populate the final column. Then:
nawk -F"|" -v TheNewNumber=$TheNumber '{print $0 "|" TheNewNumber/10000}' $PE_Infile > $BinFolder/Temp_Input.txt
Here, -F"|" sets the field delimiter, and -v TheNewNumber=$TheNumber copies the shell variable into an awk variable named TheNewNumber. That declaration is needed because the awk program is enclosed in single quotes, so the shell does not expand $TheNumber inside it; the value has to be handed to awk explicitly before it can be used in the arithmetic within the print statement. print $0 prints the whole line, and the "|" symbol and the command-line value divided by 10000 are tacked onto the end. Finally, we have the input file and an output file (Temp_Input.txt, within a path represented by the $BinFolder variable).
Running the desired script afterward was as simple as typing out the script name (with path), and adding corresponding arguments ($2 $3) afterward as needed, each separated by a space.
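To make the column-appending step concrete, here is a small Python sketch of the same transformation, assuming pipe-delimited input lines as in the nawk command above. The function name append_scaled_column is hypothetical, and the %.6g formatting is chosen to resemble awk's default number output.

```python
def append_scaled_column(lines, number, sep="|", divisor=10000):
    """Append number/divisor as a new last column to every line.

    Mirrors: nawk -F"|" -v n=NUMBER '{print $0 "|" n/10000}'
    """
    value = format(number / divisor, ".6g")  # similar to awk's default "%.6g"
    return [line.rstrip("\n") + sep + value for line in lines]
```

For example, append_scaled_column(["a|b"], 58) returns ["a|b|0.0058"].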

not able to understand NAWK use

I found a command which takes the input data from a binary file and writes it into an output file.
nawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=1 a=19 s="<Comment>Ericsson_OCS_V1_0.0.0.7" /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg
It's working, but I am not able to figure out how. Could anyone please explain how the above command works and what it is doing? This nawk is too tough to understand :(
Thanks in advance......

nawk is not tough to understand and is much like other languages. I guess you are not able to understand it because it is not properly formatted; once you format it, you will see how it works.
To answer your question: this command searches the given input file for lines containing an input text, and prints a few lines before each matched line and a few lines after it. How many lines are printed is controlled by the variables "b" (number of lines before) and "a" (number of lines after), and the string/text to search for is passed in the variable "s".
This command is helpful in debugging/troubleshooting when you want to extract lines from large log files (which are difficult to open in vi or another editor on UNIX/Linux) by searching for some error text and printing some lines above it and some lines after it.
So in your command
b=1 ## means print only 1 line before the matching line
a=19 ## means print 19 lines after the matching line
s="<Comment>Ericsson_OCS_V1_0.0.0.7" ## means search for this string
/var/opt/fds/config/ServiceConfig/ServiceConfig.cfg ## search in this file
/opt/temp/"$circle"_"$sdpid"_RG.cfg ## store the output in this file
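Your formatted command is below. The very first condition, which looked like c-->0 before formatting, is easier to read as c-- > 0: it tests whether c is greater than 0 and then decrements c. The NR variable in AWK holds the number of the input line currently being processed. Note that in AWK the opening brace of an action must stay on the same line as its pattern; otherwise the pattern and the block are treated as two separate rules.
nawk '
c-- > 0;                          # print while c lines remain after a match, then decrement c
$0 ~ s {                          # the current line matches the search string
    if (b)
        for (c = b + 1; c > 1; c--)
            print r[(NR - c + 1) % b];   # print the b buffered lines before the match
    print;                        # print the matching line itself
    c = a                         # print the next a lines as well
}
b { r[NR % b] = $0 }              # remember the last b lines in a ring buffer
' b=1 a=19 s="<Comment>Ericsson_OCS_V1_0.0.0.7" /var/opt/fds/config/ServiceConfig/ServiceConfig.cfg > /opt/temp/"$circle"_"$sdpid"_RG.cfg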
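The before/after logic may be easier to follow in a general-purpose language. The sketch below is a Python rendering of the same idea, not a line-for-line port: it assumes a plain substring test instead of awk's regular-expression match, and it never prints a line twice. The function name grep_context is made up for this example.

```python
from collections import deque


def grep_context(lines, needle, before=1, after=19):
    """Return matching lines plus `before` lines above and `after` below."""
    recent = deque(maxlen=before)  # ring buffer of the last `before` lines
    remaining = 0                  # lines still to emit after the last match
    out = []
    for line in lines:
        if needle in line:
            out.extend(recent)     # emit the buffered lines before the match
            recent.clear()
            out.append(line)       # emit the match itself
            remaining = after      # restart the "after" countdown
        elif remaining > 0:
            out.append(line)
            remaining -= 1
        else:
            recent.append(line)    # not printed yet, keep it as context
    return out
```

Here deque(maxlen=before) plays the role of the r[NR % b] array, and remaining plays the role of the counter c.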

Plotting data from two files after performing mathematical operation between them

I have
FileSFC1: contains certain data
FileSFC2: contains some other data
Now what I need to do is divide the second column of FileSFC1 with the second column of FileSFC2 and then plot this result. So something of the form:
plot ( FileSFC1 using 1:1 / FileSFC2 using 1:1 ) * 100
So basically the plot would be a percentage of the columns in the two files. Please help.

Gnuplot can only manipulate columns of data that are from the same 'file' or data stream. What you can do is use the plot '< shell command' construction. When the argument to plot starts with <, the rest of the argument is interpreted as a shell command, and the output of that command is what is plotted. So:
plot '< paste FileSFC1 FileSFC2' u (100*$2/$4)
This assumes that both files have two columns, and you want to plot the percentage of the 2nd column in each file. To perform manipulations on data columns, the syntax is to enclose the argument to using in parentheses and prefix column numbers with a dollar sign.
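To spell out what the paste-and-divide step computes, here is a short Python sketch, assuming both files hold whitespace-separated rows with at least two columns and the same number of rows. The function name percent_of is an assumption made for illustration.

```python
def percent_of(lines_a, lines_b):
    """Pair rows from two inputs and return (x, 100 * a2 / b2) per row.

    x is the first column of the first input; a2 and b2 are the second
    columns of the first and second input respectively.
    """
    result = []
    for row_a, row_b in zip(lines_a, lines_b):
        a = row_a.split()
        b = row_b.split()
        result.append((a[0], 100 * float(a[1]) / float(b[1])))
    return result
```

Like paste, zip pairs the rows by position, so the two inputs are assumed to list the same x values in the same order.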

Andyras' answer is the one you are looking for.
I just wanted to add that with this construct you can't use bash's process substitution, <( ... ), which is a shame because in most of those cases you have to do additional stream processing (grepping or removing lines, etc.).
For this, you need the answer to the question "Gnuplot and Bash process-substitution", which I'll briefly exemplify here:
# Pastes FileSFC1 side by side with the first 200 lines of FileSFC2
plot '< exec bash -c "paste <(cat FileSFC1) <(head -n 200 FileSFC2)"' u (100*$2/$4)