Plotting data from two files after performing a mathematical operation between them

I have
FileSFC1: contains certain data
FileSFC2: contains some other data
Now what I need to do is divide the second column of FileSFC1 by the second column of FileSFC2 and then plot the result. So something of the form:
plot ( FileSFC1 using 1:1 / FileSFC2 using 1:1 ) * 100
So basically the plot would be a percentage of the columns in the two files. Please help.

Gnuplot can only manipulate columns of data that come from the same file or data stream. What you can do is use the plot '< command' construction: when the argument to plot starts with <, the rest of the string is executed as a shell command, and the output of that command is what gets plotted. So:
plot '< paste FileSFC1 FileSFC2' u 1:(100*$2/$4)
This assumes that both files have two columns, and you want to plot the percentage of the 2nd column in each file. To perform manipulations on data columns, the syntax is to enclose the argument to using in parentheses and prefix column numbers with a dollar sign.
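To make the mechanics concrete, here is a small sketch of what paste hands to gnuplot (the file contents are invented for illustration):

```shell
# Hypothetical two-column data files, just to show the merged layout.
printf '1 10\n2 20\n' > FileSFC1
printf '1 40\n2 50\n' > FileSFC2
# paste joins the files line by line, tab-separated, so FileSFC2's
# columns become $3 and $4 inside gnuplot's using expression.
paste FileSFC1 FileSFC2
```

Each merged line then carries four fields, so 100*$2/$4 is one file's y-column as a percentage of the other's.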

Andyras' answer is the one you are looking for.
I just wanted to add that with this construct you can't use bash's process substitution <(...), which is a shame because in many of those cases you have to do additional stream processing (grepping or removing lines, etc.).
For this, you need the answer to the question Gnuplot and Bash process-substitution, which I'll briefly exemplify here:
# Pastes FileSFC1 side by side with the first 200 lines of FileSFC2
plot '< exec bash -c "paste <(cat FileSFC1) <(head -n 200 FileSFC2)"' u 1:(100*$2/$4)

Related

I want to get the unique data from the file removing duplicates

I am trying to get unique data from a file containing duplicates.
Here is the file sample.txt with the data below.
01|128
01|132
02|124
02|258
03|858
03|788
04|418
04|129
05|328
05|398
I want to get only unique data based on the first column, i.e. only one entry each for 01|, 02|, 03|, 04|, 05|.
I tried grep -m1, but the -m option is not supported on my platform:
grep: illegal option -- m
I/P: sample.txt (same data as above)
Expected O/P
01|128
02|124
03|858
04|418
05|328
That can be done with:
sort -nu
-n says to use numeric comparison (so 01|... will be treated as 1, 02|... as 2, etc). -u only outputs the first line of a run of equal elements.
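Note that sort -u keeps one line per key but not necessarily the first one that appears in the file. If the first occurrence matters, as in the expected output above, an awk one-liner that remembers seen keys is a common alternative (a sketch; sample.txt is the file from the question):

```shell
# Print a line only the first time its first |-separated field is seen.
awk -F'|' '!seen[$1]++' sample.txt
```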

Plotting a dataset with condition over a column in gnuplot

I would like to plot the following dataset.
+---------------+-----------+-------------+-----+----+----+--------------------+-------------------+----+----+----+----+--------------------+--------------------+------------------+--------------------+-----------------+------+
| time_stamp_0|sender_ip_1|receiver_ip_2|count|rank| xi| pi| r| ip5| ip4| ip3| ip2| variance| entropy| pre_chi_square| chi_square| total_chi_square|attack|
+---------------+-----------+-------------+-----+----+----+--------------------+-------------------+----+----+----+----+--------------------+--------------------+------------------+--------------------+-----------------+------+
|10:37:37.000985| 10.0.0.3| 10.0.0.1| 9345| 1|1796|1.070090957731407...|0.19218833600856072|1211|1157|4812|1796|6.982982177427692E-5|9.783410080138751E-4|3.3954177346722574|0.001890544395697248|13.58167093868903| 1|
|10:37:37.000995| 10.0.0.3| 10.0.0.1| 9345| 2|1796|2.140181915462814...|0.19218833600856072|1211|1157|4812|1796|3.497253089848578...|0.001808335909968907| 17.00510593066335|0.009468321787674473|13.58167093868903| 1|
|10:37:37.001002| 10.0.0.2| 10.0.0.1| 9345| 3|1796|3.210272873194221...|0.19218833600856072|1211|1157|4812|1796|8.436389877417202E-4| 0.00258233850119472|41.021252923981834|0.022840341271704808|13.58167093868903| 1|
I need to have a plot that shows me the "rank" over the "time_stamp_0" only for sender_ip_1="10.0.0.3".
I have the following code:
set timefmt '%H:%M:%S'
set xdata time
set format x '%H:%M:%S'
# I have a problem with the line below
plot "test.txt" using 1:($2=="10.0.0.3"?$5:1/0)
However the plotted graph is not correct.
In fact, it seems that, no filtering applies on the data and the graph is as same as the graph without filtering!
I should mention that the dataframe is inside a file (test.txt) and it doesn't have any header.
Can you please help me?
Use eq for string equality checking and strcol to get the string value of a column:
plot "test.txt" using 1:(strcol(2) eq "10.0.0.3" ? $5 : 1/0)
You are running into two problems:
The string-equality operator in Gnuplot is eq, not ==.
Data extracted for plotting is not of the string type (I assume it’s a float), so you cannot apply string operations to it.
I don’t see a way to solve the second problem from within Gnuplot. You can however pipe everything through something like AWK before plotting to handle the condition for you:
plot "<awk '{print $1, ($2==\"10.0.0.3\" ? $5 : \"nan\")}' test.dat" u 1:2
(Note that you still have to take care of your ASCII table formatting, e.g., by removing all | characters via SED.)
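As a sketch of that cleanup step, the border rows, the header line, and the | separators could be stripped before plotting (assuming the raw table lives in test.txt):

```shell
# Drop the +---+ border rows and the header line, and turn the
# | separators into spaces so gnuplot sees whitespace-separated columns.
sed -e '/^+/d' -e '/time_stamp_0/d' -e 's/|/ /g' test.txt
```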

Replacing a specific part

I have a list like this:
DEL075MD1BWP30P140LVT
AN2D4BWP30P140LVT
INVD0P7BWP40P140
IND2D6BWP30P140LVT
I want to replace everything between D and BWP with a *.
How can I do that in Unix and Tcl?
Do you have the whole list available at the same time, or are you getting one item at a time from somewhere?
Should all D-BWP groups be processed, or just one per item?
If just one per item, should it be the first or last (those are the easiest alternatives)?
Tcl REs don't have any lookbehind, which would have been nice here. But you can do without both lookbehinds and lookaheads if you capture the goalposts and paste them into the replacement as back references. The regular expression for the text between the goalposts should be [^DB]+, i.e. one or more characters that are neither D nor B (to make sure the match doesn't escape the goalposts and stick to other Ds or Bs in the text). So: {(D)[^DB]+(BWP)} (braces around the RE are usually a good idea).
If you have the whole list and want to process all groups, try this:
set result [regsub -all {(D)[^DB]+(BWP)} $lines {\1*\2}]
(If you can only work with one line at a time, it's basically the same, you just use a variable for a single line instead of a variable for the whole list. In the following examples, I use lmap to generate individual lines, which means I need to have the whole list anyway; this is just an example.)
Process just the first group in each line:
set result [lmap line $lines {
    regsub {(D)[^DB]+(BWP)} $line {\1*\2}
}]
Process just the last group in each line:
set result [lmap line $lines {
    regsub {(D)[^DB]+(BWP[^D]*)$} $line {\1*\2}
}]
The {(D)[^DB]+(BWP[^D]*)$} RE extends the right goalpost to ensure that there is no D (and hence possibly a new group) anywhere between the goalpost and the end of the string.
Documentation:
lmap (for Tcl 8.5),
lmap,
regsub,
set,
Syntax of Tcl regular expressions
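For the Unix side of the question, the same goalpost regex works in sed (a sketch: -E for extended regexes is supported by GNU and BSD sed, and cells.txt is an assumed file name):

```shell
# Replace everything between D and BWP with a single *
# (first matching group on each line, like the plain regsub above).
sed -E 's/D[^DB]+BWP/D*BWP/' cells.txt
```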

.ksh paste user input value into dataset

Good morning.
First things first: I know next to nothing about shell scripting in Unix, so please pardon my naivety.
Here's what I'd like to do, and I think it's relatively simple: I would like to create a .ksh file to do two things: 1) take a user-provided numerical value (argument) and paste it into a new column at the end of a dataset (a separate .txt file), and 2) execute a different .ksh script.
I envision calling this script at the Unix prompt, with the input value added thereafter. Something like, "paste_and_run.ksh 58", where 58 would populate a new, final (un-headered) column in an existing dataset (specifically, it'd populate the 77th column).
To be perfectly honest, I'm not even sure where to start with this, so any input would be very appreciated. Apologies for the lack of code within the question. Please let me know if I can offer any more detail, and thank you for taking a look.
I have found the answer: the "nawk" command.
TheNumber=$3
PE_Infile=$1
Where the above variables correspond to the third and first arguments from the command line, respectively. "PE_Infile" represents the file (with full path) to be manipulated, and "TheNumber" represents the number to populate the final column. Then:
nawk -F"|" -v TheNewNumber=$TheNumber '{print $0 "|" TheNewNumber/10000}' $PE_Infile > $BinFolder/Temp_Input.txt
Here, the -F"|" dictates the delimiter, and the -v passes in the value to be added. For reasons unknown to myself, declaring a new variable (TheNewNumber) was necessary to perform the arithmetic manipulation within the print statement. print $0 means that the whole line is printed, with the "|" symbol and the command-line input divided by 10000 tacked onto the end. Finally, we have the input file and an output file (Temp_Input.txt, within a path represented by the $BinFolder variable).
Running the desired script afterward was as simple as typing out the script name (with path), and adding corresponding arguments ($2 $3) afterward as needed, each separated by a space.
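Put together, a minimal paste_and_run.ksh along those lines might look like this (a sketch: the $BinFolder value, the argument order, and the follow-up script name are assumptions, and plain awk stands in for nawk on non-Solaris systems):

```shell
#!/bin/ksh
# Usage: paste_and_run.ksh INFILE NUMBER
PE_Infile=$1
TheNumber=$2
BinFolder=/tmp   # assumed output location

# Append NUMBER/10000 as a new |-separated column at the end of each line.
awk -F'|' -v TheNewNumber="$TheNumber" \
    '{ print $0 "|" TheNewNumber/10000 }' "$PE_Infile" > "$BinFolder/Temp_Input.txt"

# Hypothetical follow-up script, called on the freshly written file.
# /path/to/other_script.ksh "$BinFolder/Temp_Input.txt"
```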

How can I insert a column in numeric comma separated input?

Hi, I have a text file as below.
input
326783,326784,402
326783,0326784,402
503534,503535,403
503534,0503535,403
429759,429758,404
429759,0429758,404
409626,409627,405
409626,0409627,405
369917,369916,402
369917,0369916,403
I want to convert it like below.
Conditions:
1) Columns 1 and 3 of the input file should be the same across a pair of lines such as 326784 and 0326784, and so on; such pairs are merged into one line.
2) If they differ, as in the last pair of the input above, both lines should be printed at the end.
output should be
326783,326784,0326784,402
503534,503535,0503535,403
429759,429758,0429758,404
409626,409627,0409627,405
369917,369916,402
369917,0369916,403
I am using the Solaris platform.
Please help me.
I don't understand the logic of your computation, but some general advice: the Unix tool awk can do such computations. It understands comma-separated files and you can get it to output other comma-separated files, manipulated by your logic (which you'll have to express in awk syntax).
This is, as I understand it, the Unix way to do it.
The way I'd do it (being a non-expert on awk and just mentioning it for completeness ;) would be to write a little python script.
you want to
open an input and an output file
get each line from the input file
parse the integers
perform your logic
write integers to your output file
unchecked python-like code:
f_in = open("input", "r")
f_out = open("output", "w")
for line in f_in.readlines():
    ints = [int(x) for x in line.split(",")]
    f_out.write("%d, %d, %d\n" % (ints[0], ints[1], ints[0] + ints[1]))
f_in.close()
f_out.close()
Here, the logic is in the f_out.write(...) line (this example would output the first, the second and the sum of both input integers)
You can check whether you have a Python interpreter at hand by simply typing python and seeing what happens. If you have one, save your code into something.py and start it with "python something.py".
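The awk route the first answer alludes to could look like this (a sketch, under the assumption that matching lines always arrive in consecutive pairs, which holds for the sample input):

```shell
# Read lines in pairs: if a pair agrees on columns 1 and 3, merge the
# two second columns into one line; otherwise save both lines for the end.
awk -F, '
NR % 2 { prev = $0; p1 = $1; p2 = $2; p3 = $3; next }
$1 == p1 && $3 == p3 { print p1 "," p2 "," $2 "," p3; next }
{ tail = tail prev ORS $0 ORS }
END { printf "%s", tail }
' input
```

On the sample input this produces the merged lines first and the mismatched 369917 pair at the end, as in the expected output.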
