Extract values from a file and average - r

I have a file from which I'd like to extract two values (Time, C_F[6]), highlighted below. It's in a CentOS 7 environment, so I can use bash, gnuplot or R. I'm not even sure how to google this (e.g. "extract values from file bash" doesn't really come up with solutions). Is it possible?
I'd like to be able to:
plot Time vs C_F[6]
Average C_F[6]
EDIT 1:
I think this might be along the right lines, but it reproduces the whole file:
sed 's/^.*C_F[6]=//' C_F.pressure > outputfile
EDIT 2:
Extract of the file:
/*---------------------------------------------------------------------------*\
| ========= | |
| \\ / F ield | OpenFOAM: The Open Source CFD Toolbox |
| \\ / O peration | Version: 3.0.0 |
| \\ / A nd | Web: www.OpenFOAM.org |
| \\/ M anipulation | |
\*---------------------------------------------------------------------------*/
Build : 3.0.0-6abec57f5449
Exec : patchAverage p C_F -parallel
Date : Apr 15 2017
Time : 15:01:20
Host : "login2.jjj.uk"
PID : 59764
Case : /nobackup/jjjj/Silsoe/Solid/solid_0_LES/motorBikeLES
nProcs : 8
Slaves :
7
(
"login2.jjjj.59765"
"login2.jjjj.59766"
"login2.jjjj.59767"
"login2.jjjj.59768"
"login2.jjjj.59769"
"login2.jjjj.59770"
"login2.jjjj.59771"
)
Pstream initialized with:
floatTransfer : 0
nProcsSimpleSum : 0
commsType : nonBlocking
polling iterations : 0
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Allowing user-supplied system call operations
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time
Create mesh for time = 0.18
Time = 0.18
Reading volScalarField p
Average of volScalarField over patch C_F[6] = -18.3176
Time = 0.19
Reading volScalarField p
Average of volScalarField over patch C_F[6] = -18.299
Time = 0.2
Reading volScalarField p
Average of volScalarField over patch C_F[6] = -18.2704
Time = 0.21
Reading volScalarField p
Average of volScalarField over patch C_F[6] = -18.2349

Here's a crude way to do things:
# extract text from file line by line; will be indexed by line
sample <- readLines("D:\\tempFiles/example.txt")
# index the lines containing "Time = "
timeI <- grep(x = sample, pattern = "Time = ")
# index the lines containing "C_F[6]"; note that \\ is the escape for [ and ]
C_FI <- grep(x = sample, pattern = "C_F\\[6\\]")
# extract lines and clean them
# note that these lines only contain "Time = values"; so just remove the "Time = "
timeval <-as.numeric(gsub(x = sample[timeI], pattern = "Time = ", replacement = ""))
# extract lines and clean them
# note that gsub removes all characters from the start (^) until "= "
C_FIval <- as.numeric(gsub(x = sample[C_FI], pattern = "^.*= ", ""))
# plot time vs C_F[6]
plot(y = timeval, x = C_FIval )
# get the mean
mean(C_FIval)
There are more elegant ways for the regex, but I'm still finding my way through that. This should be a basic way.
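The same line-by-line extraction can also be sketched in Python with regular expressions. This is only a minimal sketch: the two patterns are assumptions based on the log excerpt in the question, and the names extract_series, TIME_RE and VALUE_RE are made up for illustration.

```python
import re

# Assumed patterns, derived from the log excerpt in the question.
# "^Time = " does not match the header line "Time : 15:01:20".
TIME_RE = re.compile(r"^Time = (\S+)")
VALUE_RE = re.compile(r"over patch C_F\[6\] = (\S+)")


def extract_series(lines):
    """Return (times, values) parsed from patchAverage log lines."""
    times, values = [], []
    current_time = None
    for line in lines:
        line = line.strip()
        match = TIME_RE.match(line)
        if match:
            # remember the most recent time stamp
            current_time = float(match.group(1))
            continue
        match = VALUE_RE.search(line)
        if match and current_time is not None:
            times.append(current_time)
            values.append(float(match.group(1)))
    return times, values


sample = """\
Time : 15:01:20
Create mesh for time = 0.18
Time = 0.18
Reading volScalarField p
Average of volScalarField over patch C_F[6] = -18.3176
Time = 0.19
Reading volScalarField p
Average of volScalarField over patch C_F[6] = -18.299
Time = 0.2
Reading volScalarField p
Average of volScalarField over patch C_F[6] = -18.2704
Time = 0.21
Reading volScalarField p
Average of volScalarField over patch C_F[6] = -18.2349
""".splitlines()

times, values = extract_series(sample)
average = sum(values) / len(values)
print(times)
print(values)
print(average)
```

For the real log, the lines could be read with open("C_F.pressure") instead of the inline sample, and the two lists passed to any plotting tool.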

Since the OP tagged also gnuplot, here is a platform independent gnuplot-only solution.
How it's done:
standard datafile separator is whitespace
the function getTime() will check if the first string column is equal to "Time" and at the same time the second stringcolumn must be '=' (because you have Time in the first column in the header as well). If this is true then column 3 is the time and you memorize this value in the variable t0.
the function getValue() will check if the first string column is equal to "Average" and at the same time the 6th stringcolumn must be "C_F[6]". If this is true then column 8 is the value and memorize this value in y0, sum it up in ySum and increase the counter c by 1. If it is false, the function's return value will be NaN and nothing will be plotted. Note, the function has to check the first column because in case there is no 6th column the check will fail.
calculate the average by yAvg=ySum/c and plot and print it into the graph.
You might notice that the plotted datapoints in the first plot are not connected although the plotting style with linespoints was used. The reason is that there are empty lines in the input file and gnuplot will interrupt curves at empty lines.
Hence, if you want connected lines, you have to remove these empty lines. You can do this by plotting the file line by line, each line as a whole (set datafile separator "\n"), into a datablock (with table). This requires gnuplot>=5.2.0. Furthermore, with set datafile missing NaN, gnuplot will not interrupt lines at NaN values.
This extraction can easily be adapted to any other input data format.
Data: Save the OP's data example as SO43427046.dat
Script: (the first solution works with gnuplot>4.4.4, Nov. 2011 and the second solution with gnuplot>=5.2.0, Sept. 2017)
### extract specific data from a file
reset
FILE = "SO43427046.dat"
getTime(col1,col2,col3) = strcol(col1) eq "Time" && strcol(col2) eq "=" ? t0=column(col3) : t0
getValue(col1,col2,col3) = strcol(col1) eq "Average" && strcol(col2) eq "C_F[6]" ? \
(y0=column(col3),ySum=ySum+y0,c=c+1,y0) : NaN
set key top left
set ytics 0.02
set multiplot layout 2,1
ySum = c = 0
t0 = y0 = NaN
plot FILE u (getTime(1,2,3)):(getValue(1,6,8)) \
w lp pt 7 lc rgb "red" ti "unconnected points", \
(yAvg=ySum/c) w l lc rgb "blue" ti sprintf("Average: %g",yAvg)
set table $Data
set datafile separator "\n"
plot FILE u (strcol(1)) w table
set datafile separator whitespace
unset table
set datafile missing NaN
ySum = c = 0
t0 = y0 = NaN
plot $Data u (getTime(1,2,3)):(getValue(1,6,8)) w lp pt 7 lc rgb "red" ti "linespoints", \
(yAvg=ySum/c) w l lc rgb "blue" ti sprintf("Average: %g",yAvg)
unset multiplot
### end of script
Result:

Related

Computing numerical derivative with gnuplot

I've been trying to compute numerically the derivative using gnuplot, using the scripts in this other discussion, even with the same data file. However I keep getting this error:
gnuplot> d(y) = ($0 == 0) ? (y1 = y, 1/0) : (y2 = y1, y1 = y, y1-y2)
^
"prova.g", line 7: ')' expected
I don't know what to do here. Any help?
Here is an example for numerical derivatives from my collection. It requires gnuplot>=5.0; using files instead of datablocks (and probably some tweaking), it should also work with gnuplot>=4.4.0.
Script: (works with gnuplot>=5.0.0, Jan. 2015)
### numerical derivatives
reset session
# create some data
MyFunction = "sin(x)/x"
set table $Data
set samples 150
plot [-10:10] '+' u 1:(@MyFunction) w table
unset table
DerivX(colX) = (x0=x1,x1=column(colX),(x0+x1)/2.)
DerivY(colY) = (y0=y1,y1=column(colY),(y1-y0)/(x1-x0))
set table $Deriv1
plot x1=y1=NaN $Data u (DerivX(1)):(DerivY(2)) w table
unset table
set table $Deriv2
plot x1=y1=NaN $Deriv1 u (DerivX(1)):(DerivY(2)) w table
unset table
set table $Deriv3
plot x1=y1=NaN $Deriv2 u (DerivX(1)):(DerivY(2)) w table
unset table
plot $Data u 1:2 w l lc rgb "red" ti MyFunction, \
$Deriv1 u 1:2 w l lc rgb "web-green" ti "1st Derivative", \
$Deriv2 u 1:2 w l lc rgb "blue" ti "2nd Derivative", \
$Deriv3 u 1:2 w l lc rgb "magenta" ti "3rd Derivative"
### end of script
Result:
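For comparison, the midpoint difference that DerivX() and DerivY() compute can be sketched in Python. This is a minimal sketch, not a translation of the gnuplot script: the name midpoint_derivative is made up for illustration, and unlike the gnuplot version (whose first row is NaN) it simply returns one point fewer than the input.

```python
import math


def midpoint_derivative(xs, ys):
    """Return (x_mid, slope) for each pair of successive data points."""
    x_mid, slope = [], []
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        x_mid.append((x0 + x1) / 2.0)        # same as DerivX
        slope.append((y1 - y0) / (x1 - x0))  # same as DerivY
    return x_mid, slope


# 150 samples of sin(x)/x on [-10, 10]; with 150 samples x is never exactly 0
n = 150
xs = [-10 + 20 * i / (n - 1) for i in range(n)]
ys = [math.sin(x) / x for x in xs]

x1, d1 = midpoint_derivative(xs, ys)  # 1st derivative
x2, d2 = midpoint_derivative(x1, d1)  # 2nd derivative
print(len(d1), len(d2))
```

Feeding the output back into the same function gives the higher derivatives, just as $Deriv1 is fed into the next plot command above.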
Addition: (version for gnuplot 4.2.6, Sept. 2009)
gnuplot 4.2.6 doesn't have datablocks and serial evaluation, but here is a cumbersome workaround without these features.
for illustration, the script creates a data file SO68198576.dat (you already have your input file)
plot the data file into another file TEMP1 skipping the first data line
merge the files line by line into another file TEMP2 using the system command paste (on Linux it is already on the system; on Windows you have to install it, e.g. CoreUtils from GnuWin).
now you can calculate dx and dy between two successive datapoints from column 1 and 4 and column 2 and 5, respectively.
since the files have different length, the last line(s) should be skipped. This can be done by the system command head -n -2.
This is what TEMP2 looks like:
#Curve 0 of 1, 150 points #Curve 0 of 1, 150 points
#x y type #x y type
-10 -0.0544021 i -9.86577 -0.0432646 i
-9.86577 -0.0432646 i -9.73154 -0.0310307 i
-9.73154 -0.0310307 i -9.59732 -0.0178886 i
...
Script: (works with gnuplot 4.2.6, requires system commands paste and head)
### numerical derivative for gnuplot 4.2.6
reset
FILE = "SO68198576.dat"
set table FILE
set samples 150
plot sin(x)/x
unset table
TEMP1 = FILE.'1'
TEMP2 = FILE.'2'
set table TEMP1
plot FILE u 1:2 every ::1
unset table
system(sprintf('paste %s %s > %s', FILE, TEMP1, TEMP2))
system(sprintf('head -n -2 %s > %s',TEMP2, TEMP1))
x0(col) = (column(col)+column(col+3))/2.
dx(col) = (column(col+3)-column(col))/2.
dy(col) = (column(col+3)-column(col))/2.
plot FILE u 1:2 w lp pt 7 title "Data", \
TEMP1 u (x0(1)):(dy(2)/dx(1)) w lp pt 7 title "1st Derivative"
### end of script
Result: (screenshot gnuplot 4.2.6)

Giving Value to Python3 Count String

I am running a loop that either divides or subtracts. Every time it divides, I want to record a 2, and every time it subtracts, I want to record a 1. I then want that string of digits to be a value that I can manipulate with some basic math. Basically it'll look like this:
20/2 = 10 (2)
10/2 = 5 (2)
5/2 = 2.5 (2)
2.5-.5 = 2 (1)
2/2 = 1 (2)
22212 <=== this is what I want to turn into a new value, but the way I have it coded, it's not working. I think it may have something to do with the end='' in the code.
I've tried giving the value of the string = to a int value and tried joining the string but no luck so far.
num = 20
while num >= 1.5:
    num /= 2
    v = 1
    print(v, end='')
    if int(num) != num:
        num -= .5
        v = 2
        print(v, end='') #trying to make the output here a value
nv = ''.join(str(int(v)))
nv = int(v) #trying to give the joined strs of nv a value
print(nv) #trying to get this to print the combined valued of v to something that math can be applied to.
print('')
The code doesn't give any errors; I just can't figure out how to make the output an actual number that I can manipulate.
You are printing v = 1 after your division, but in your post you said you want a 2 for division. I am assuming that what you wrote in the post is the result you want.
a = ""
num = 20
while num >= 1.5:
    num /= 2
    a += "2"
    if int(num) != num:
        num -= .5
        a += "1"
print(a)
Now a is a string with your desired result. You can always convert that string to an int to do some math with it.
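As a minimal sketch of that last step, the loop can be wrapped in a function and the resulting string passed to int(). The function name op_code is made up for illustration.

```python
def op_code(num):
    """Record a "2" for every division and a "1" for every subtraction."""
    digits = ""
    while num >= 1.5:
        num /= 2
        digits += "2"
        if int(num) != num:
            num -= 0.5
            digits += "1"
    return digits


code = op_code(20)   # string of digits
value = int(code)    # now a number that math can be applied to
print(code)
print(value * 2)
```

Building the string first and converting once at the end avoids having to reassemble the digits from what was printed.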

Plotting a chain of spheres with gnuplot

From a function in C++ I get in a file the coordinates of the centers of a chain of spheres (of constant radius r). I would like to plot this chain with gnuplot. How can I represent the spheres with the true radius? This solution actually does not work, since the unit of pointsize is not the same as that of the axis (and is also changing with the axis limits).
This is a slightly dirty solution which uses parametric mode (and some commands from Unix). For each line of the following data, we will plot a sphere with radius r, centered at (x,y,z):
# points.dat :
# x y z radius
0 0 0 0.5
1 2 2 1.0
3 4 5 0.7
2 5 7 1.0
1 3 4 0.75
2 0 1 1.5
In other words, we will run commands with the form:
splot x1+r1*cos(v)*cos(u), y1+r1*cos(v)*sin(u), z1+r1*sin(v) title "line 1",\
x2+r2*cos(v)*cos(u), y2+r2*cos(v)*sin(u), z2+r2*sin(v) title "line 2", ...
The following code will do the trick (comments through the script):
set view equal xyz # to scale the axes of the plot
set hidden3d front # draw opaque spheres
set parametric # enable parametric mode with angles (u,v)
set urange [0:2*pi]
set vrange [-pi/2.0:pi/2.0]
filename = 'points.dat'
# get number of data-lines in filename
nlines = system(sprintf('grep -v ^# %s | wc -l', filename))
# this will save the plot commands
commands = 'splot '
do for [i=1:nlines] {
# get the i-th line
line = system( sprintf('grep -v ^# %s | awk "NR == %i {print; exit}" ', filename, i) )
# extract the data
x = word(line,1)
y = word(line,2)
z = word(line,3)
r = word(line,4)
# and save the instructions to plot the corresponding sphere
commands = commands . sprintf('%s + %s*cos(v)*cos(u), %s + %s*cos(v)*sin(u), %s + %s*sin(v) t "line %i"', x, r, y, r, z, r, i)
# if not EOF, add a comma to commands
if(i<nlines) { commands = commands . ', ' }
}
# commands is a string. We can run it into the command line through macros
set macros
@commands
This is the output I obtain:

Calculate if trend is up, down or stable

I'm writing a VBScript that sends out a weekly email with client activity. Here is some sample data:
a b c d e f g
2,780 2,667 2,785 1,031 646 2,340 2,410
Since this is email, I don't want a chart with a trend line. I just need a simple function that returns "up", "down" or "stable" (though I doubt it will ever be perfectly stable).
I'm terrible with math so I don't even know where to begin. I've looked at a few other questions for Python or Excel but there's just not enough similarity, or I don't have the knowledge, to apply it to VBS.
My goal would be something as simple as this:
a b c d e f g trend
2,780 2,667 2,785 1,031 646 2,340 2,410 ↘
If there is some delta or percentage or other measurement I could display that would be helpful. I would also probably want to ignore outliers. For instance, the 646 above. Some of our clients are not open on the weekend.
First of all, your data is listed as
a b c d e f g
2,780 2,667 2,785 1,031 646 2,340 2,410
To get a trend line you need to assign numerical values to the variables a, b, c, ...
To assign numerical values to them, you need a little more information about how the data were taken. Suppose you took data a on 1st January; you can assign it any value, like 0 or 1. If you took data b ten days later, you can assign it the value 10 or 11. If you took data c thirty days later, you can assign it the value 30 or 31. The numerical values of a, b, c, ... must be proportional to the time intervals at which the data were taken to get an accurate trend line.
If they are taken at regular intervals (which is most likely your case), let's say every 7 days, then you can assign values at regular intervals: a, b, c, ... ~ 1, 2, 3, ... The starting point is entirely your choice, so choose something that makes it easy. It does not affect the final calculation.
Then you need to calculate the slope b of the linear regression line (the formula is on the linked page), using the following table.
On first column from row 2 to row 8, I have my values of a,b,c,... which I put 1,2,3, ...
On second column, I have my data.
On third column, I multiplied each cell in first column to corresponding cell in second column.
On fourth column, I squared the value of cell of first column.
On row 10, I added up the values of the above columns.
Finally use the values of row 10.
total_number_of_data*C[10] - A[10]*B[10]
b = -------------------------------------------
total_number_of_data*D[10]-square_of(A[10])
The sign of b determines what you are looking for. If it's positive, the trend is up; if it's negative, the trend is down; and if it's zero, it is stable.
This was a huge help! Here it is as a function in python
def trend_value(nums: list):
    summed_nums = sum(nums)
    multiplied_data = 0
    summed_index = 0
    squared_index = 0
    for index, num in enumerate(nums):
        index += 1
        multiplied_data += index * num
        summed_index += index
        squared_index += index**2
    numerator = (len(nums) * multiplied_data) - (summed_nums * summed_index)
    denominator = (len(nums) * squared_index) - summed_index**2
    if denominator != 0:
        return numerator/denominator
    else:
        return 0
val = trend_value([2781, 2667, 2785, 1031, 646, 2340, 2410])
print(val) # -139.5
in python:
def get_trend(numbers):
    rows = []
    total_numbers = len(numbers)
    currentValueNumber = 1
    n = 0
    while n < len(numbers):
        rows.append({'row': currentValueNumber, 'number': numbers[n]})
        currentValueNumber += 1
        n += 1
    sumLines = 0
    sumNumbers = 0
    sumMix = 0
    squareOfs = 0
    for k in rows:
        sumLines += k['row']
        sumNumbers += k['number']
        sumMix += k['row']*k['number']
        squareOfs += k['row'] ** 2
    a = (total_numbers * sumMix) - (sumLines * sumNumbers)
    b = (total_numbers * squareOfs) - (sumLines ** 2)
    c = a/b
    return c
trendValue = get_trend([2781,2667,2785,1031,646,2340,2410])
print(trendValue) # output: -139.5
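The question asked for "up", "down" or "stable" rather than the raw slope. A minimal sketch of that last step, following the sign rule from the first answer: the names slope and trend_label are made up for illustration, and the tolerance parameter is an assumption of mine (the answers above only distinguish positive, negative and zero).

```python
def slope(nums):
    """Least-squares slope of nums against the positions 1, 2, 3, ..."""
    n = len(nums)
    xs = range(1, n + 1)
    sum_x = sum(xs)
    sum_y = sum(nums)
    sum_xy = sum(x * y for x, y in zip(xs, nums))
    sum_xx = sum(x * x for x in xs)
    denominator = n * sum_xx - sum_x ** 2
    return (n * sum_xy - sum_x * sum_y) / denominator if denominator else 0


def trend_label(b, tolerance=0.0):
    """Map a slope to a label; slopes within +/- tolerance count as stable."""
    if b > tolerance:
        return "up"
    if b < -tolerance:
        return "down"
    return "stable"


b = slope([2781, 2667, 2785, 1031, 646, 2340, 2410])
print(b, trend_label(b))
```

With tolerance left at 0 only an exactly flat series is labelled stable; a small positive tolerance treats nearly flat series as stable too.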

gnuplot change color of the connecting lines

I am using gnuplot for the following. I have n equations which I want to plot based on the xaxis value. Here is a sample
set xrange[0:25]
f1(x) = x
f2(x) = 3*x
f3(x) = 10*x
plot (x>0)&&(x<10)?f1(x):(x<20)?f2(x):f3(x)
I know that we can set the color of the line easily by using the below. But it changes the whole color
set style line 1 lt 1 lw 3 pt 3 lc rgb "blue"
But what I want is to make the connecting lines a different color, i.e. if you plot the above graph you will see 5 lines: 3 original lines (from the functions) and 2 lines (the almost vertical ones) connecting them. I want to change the color of the connecting lines.
Note 1: These functions are automatically generated by a program, and the number of functions could be large. Even the exact plot command is automatically generated
Note 2: I want a way to differentiate my original lines with the interpolated lines which joins my original lines.
Any help is appreciated
What you actually have is one line defined piecewise, and there isn't an easy way to define colors for line segments within a piecewise line in gnuplot.
Easy way (plot a data file)
I would recommend making a data file looking like this:
# x y color
0 0 0
10 10 0
10 10 1
10 30 1
10 30 0
20 60 0
20 60 1
20 200 1
20 200 0
25 250 0
Notice the double points at x=10 and x=20. This is so the line segments meet at the transitions.
Now plot it with linecolor variable:
#!/usr/bin/env gnuplot
reset
set terminal pdfcairo enhanced color dashed rounded lw 5 size 3,2 font 'Arial,14'
set output 'output2.pdf'
set style data lines
set key top left
set tics scale 0.5 out nomirror
plot 'data.dat' u 1:2:3 lc variable
It looks like this:
You can change the palette (set palette) to determine the colors, and you can have more than 2 color values in the data file if you want.
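Since the functions in the question are generated by a program, the data file can be generated the same way. Here is a minimal sketch in Python; the names piecewise_rows, funcs and breaks are made up for illustration, and it assumes each segment is a straight line, so two points per segment are enough (curved segments would need more samples).

```python
def piecewise_rows(funcs, breaks):
    """Build (x, y, color) rows: color 0 for segments, 1 for connectors.

    funcs  -- list of n callables, one per segment
    breaks -- n+1 ascending x values delimiting the segments
    """
    rows = []
    for i, f in enumerate(funcs):
        left, right = breaks[i], breaks[i + 1]
        if i > 0:
            # connector: from the end of the previous segment
            # to the start of this one, both at x = left
            rows.append((left, funcs[i - 1](left), 1))
            rows.append((left, f(left), 1))
        # the segment itself, start and end point
        rows.append((left, f(left), 0))
        rows.append((right, f(right), 0))
    return rows


funcs = [lambda x: x, lambda x: 3 * x, lambda x: 10 * x]
rows = piecewise_rows(funcs, [0, 10, 20, 25])

print("# x y color")
for x, y, color in rows:
    print(x, y, color)
```

Redirecting the printed output to data.dat gives a file in the format shown above.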
Harder way (only OK for few segments)
You could define 2n-1 separate lines and connect them:
#!/usr/bin/env gnuplot
reset
set terminal pdfcairo enhanced color dashed rounded lw 5 size 3,2 font 'Arial,14'
set output 'output.pdf'
set style data lines
set key top left
set tics scale 0.5 out nomirror
# points every 0.001 units in the range 0:25
set samples 25001
# main lines
f1(x) = (x <= 9.999) ? x : 1/0
f3(x) = (x >= 10.001) && (x <= 19.999) ? 3*x : 1/0
f5(x) = (x >= 20.001) ? 10*x : 1/0
# define slopes and y-offsets of connecting lines
m2 = (f3(10.001)-f1(9.999))/0.002
b2 = (30.0-10.0)/2.0 + 10.0
m4 = (f5(20.001)-f3(19.999))/0.002
b4 = (200.0-60.0)/2.0 + 60.0
# connecting functions
f2(x) = (x >= 9.999) && (x <= 10.001) ? m2*(x-10) + b2 : 1/0
f4(x) = (x >= 19.999) && (x <= 20.001) ? m4*(x-20) + b4 : 1/0
plot [0:25] f1(x), f2(x), f3(x), f4(x), f5(x)
Which looks like this:
You can define a secondary function to define the breakpoints of your function, which is automatically coloring the right linepiece. The below code is easy to extend to different functions and breakpoints (i.e., you can just change x1 or x2). Adding multiple points is also straightforward.
xmin=0.
xmax=25.
x0=0.
x1=10.
x2=20.
nsample=200.
dx=(xmax-xmin)/nsample
print dx
set xrange[xmin:xmax]
set sample nsample
f1(x) = x
f2(x) = 3*x
f3(x) = 10*x
f4(x) = (x>x0)&&(x<x1)?f1(x):(x<x2)?f2(x):f3(x)
f5(x) = x
f5(x) = ( (x>x1&&x<=x1+dx) || (x>x2&&x<=x2+dx) )?1:0
set cbrange [0:1]
unset key
plot '+' using 1:(f4($1)):(f5($1)) lc variable with lines
Note that I have used the special filename '+', which just constructs a data file with equally spaced datapoints (following nsample).
If it is ok to skip the connecting lines, then you can use a simplified version of @andyras' second variant. Just define all functions to be 1/0 when outside a specified range:
set style data lines
unset key
f1(x) = (x > 0) && (x < 10) ? x : 1/0
f2(x) = (x > 10) && (x < 20) ? 3*x : 1/0
f3(x) = (x > 20) ? 10*x : 1/0
plot [0:25] f1(x), f2(x), f3(x)
Here is yet another possibility. It assumes that you can select a sampling rate high enough that the "jumps" which connect the functions are always greater than the steps between neighbouring samples inside a function:
set style data lines
unset key
set xrange[0:25]
f1(x) = x
f2(x) = 3*x
f3(x) = 10*x
f(x) = ( (x>0)&&(x<10)?f1(x):(x<20)?f2(x):f3(x) )
set samples 1000
curr = 0
prev = 0
lim = 1
plot '+' using (prev = curr, curr=f($1), $1):(f($1)):(abs(curr-prev) < lim ? 0 : 1) lc var

Resources