I have plotted my data on a linear scale in xmgrace using these numbers:
0.001 0
0.00589391 0.10
0.155206 0.20
0.294695 0.30
0.43222 0.40
0.436149 0.50
0.489194 0.60
0.611002 0.70
0.860511 0.80
0.939096 0.90
0.964637 1
1 1
I used xmgrace in Ubuntu to plot my data and to calculate the area under the curve (AUC; Data -> Transformation -> Integration -> Sum only).
After converting the linear curve to a logarithmic one, I am having trouble calculating the area under the logarithmic curve.
Has anybody else encountered a similar issue?
When you set the axis scale to "logarithmic" you are not actually changing your data, only the way it is displayed. Since data transformations such as integration act on the actual data you have, the result is bound to be the same.
In other words, you are integrating f(x) regardless of the scale of the axes. If you want to integrate log(f(x)), you first have to convert f(x) to log(f(x)) using Data -> Transformation -> Expression, writing something like y = ln(y) and pressing "Apply". Be careful, though: the first point (which has y = 0) will turn into an infinity. You will need to remove it manually (double-click the set, select the first row, and use Edit -> Delete) or avoid using exactly 0 in your dataset. If you also want to convert the x axis, open the same Expression window and write x = ln(x). Integrate the new dataset and you should get the right number (I got about -7.9, I think).
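As a cross-check outside xmgrace, here is a sketch in Python of the same transformation: drop the first point, take ln of both columns, and integrate with the trapezoidal rule (an assumption on my part; xmgrace's integration routine may differ slightly):

```python
import math

# Data from the question, with the first point (0.001, 0) dropped
# because ln(0) is undefined
xs = [0.00589391, 0.155206, 0.294695, 0.43222, 0.436149, 0.489194,
      0.611002, 0.860511, 0.939096, 0.964637, 1.0]
ys = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.0, 1.0]

# The same transformation as xmgrace's Expression window: x = ln(x), y = ln(y)
lx = [math.log(v) for v in xs]
ly = [math.log(v) for v in ys]

# Trapezoidal rule over the transformed points
area = sum((lx[i + 1] - lx[i]) * (ly[i + 1] + ly[i]) / 2.0
           for i in range(len(lx) - 1))
print(area)  # close to the -7.9 quoted above
```

The slight difference from -7.9 comes from the integration rule; the point is that the transformed dataset, not the axis setting, is what gets integrated.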
I have a large dataset which I need to plot in loglog scale in Gnuplot, like this:
set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512)
LogLogPlot of my datapoints
Text file with the datapoints
The data points on the x axis are equally spaced, but because of the log scale they get very dense on the right part of the graph, and as a result the output file (which I finally export to .tex) gets very large.
On a linear scale I would simply use the every option to reduce the number of points that get plotted. Is there a similar option for log-log scale, so that the plotted points appear equally spaced?
I am aware of a similar question raised a few years ago, but in my opinion the solution is unsatisfactory: the plotted points are not equally spaced along the x axis. I think this is a fairly basic problem that deserves a cleaner solution.
As I understand it, you don't want to plot the actual data points; you just want to plot a line through them. But you want to keep the appearance of points rather than a line. Is that right?
set log xy
plot 'A_1D_l0.25_L1024_r0.dat' u 1:($2-512) with lines dashtype '.' lw 2
Amended answer
If it is important to present outliers/errors in the data set then you must not use every or any other technique that simply discards or skips most of the data points. In that case I would prefer the plot with points that you show in the original question, perhaps modified to represent each point as a dot rather than a cross. I will simulate this by modifying a single point in your 500000 point data set (first figure below). But I would also suggest that the presence of outliers is even more apparent if you plot with lines (second figure below).
Showing error bounds is another alternative for noisy data, but the options depend on what you have to work with in your data set. If you want to pursue that, please ask a separate question.
If you really want to reduce the number of data to be plotted, you might consider the following script.
s = 0.1 ### sampling interval in log scale
### (try 0.05 for more detail)
c = log10(0.01) ### a parameter used in sampler(x);
                ### it should be initialized to a value
                ### smaller than any x in log scale
sampler(x) = (x>0 && log10(x)>=c) ? (c=ceil(log10(x)/s+0.5)*s, x) : NaN
set log xy
set grid xtics
plot 'A_1D_l0.25_L1024_r0.dat' using (sampler($1)):($2-512) with points pt 7 lt 1 notitle , \
'A_1D_l0.25_L1024_r0.dat' using 1:($2-512) with lines lt 1 notitle
This script samples the data in increments of roughly 0.1 along the x axis in log scale. It relies on the property that points whose x value evaluates to NaN in using are not drawn.
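For readers outside gnuplot, the same idea can be sketched in Python: keep a point only when log10(x) has advanced past the next multiple of s, mirroring sampler(x) (the data here are synthetic equally spaced points, an assumption standing in for the question's file):

```python
import math

def log_sample(xs, s=0.1):
    """Keep only points whose log10(x) has reached the current threshold;
    after keeping one, move the threshold to the next multiple of s,
    mirroring the gnuplot sampler(x) above."""
    kept = []
    threshold = -math.inf  # smaller than any x in log scale
    for x in xs:
        if x > 0 and math.log10(x) >= threshold:
            kept.append(x)
            threshold = math.ceil(math.log10(x) / s + 0.5) * s
    return kept

# 100000 equally spaced points, dense on the right in log scale
xs = [i * 0.001 for i in range(1, 100001)]
sampled = log_sample(xs)
print(len(xs), "->", len(sampled))  # far fewer points survive
```

The kept points are roughly equally spaced in log10(x), which is exactly what the every option cannot give you on a log axis.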
I'm plotting two sets of data, (x,y) and (a,b). The x values are at an interval of 0.05 for (x,y) and at an interval of 0.02 for (a,b). I'm trying to interpolate (x,y) so that it fills in data every 0.02 units. I've played with approxfun() and splinefun() but can't figure out how to use the n or xout parameters properly.
require(graphics)
x<-c(1.00,1.05,1.10,1.15,1.20)
y<-c(4.1,6.4,8.4,5.2,0.5)
a<-c(1.00,1.02,1.04,1.06,1.08)
b<-c(5.0,8.3,7.3,4.0,6.0)
par(mfrow = c(2,1))
plot(x,y)
points(approx(x,y,method="linear"),col=2,pch="*")
plot(a,b)
Ultimately I want all of my vectors to have an x interval of 0.02 like (a,b), so that they all have the same number of elements, and to save the new vector to a variable. I would also like to be able to switch back from 0.02 to 0.05, which I think would involve the same commands with the intervals swapped. I think the term for what I want to do is resampling my data to a new frequency.
I've looked in various threads for an answer to this, but I don't know enough about R to figure out how to ask this/search for it. Thanks for any help.
You may want to use seq(), e.g. a <- seq(1, 10, by = 0.5), which increments by whatever step you wish (here 0.5). Passing such a grid as the xout argument, e.g. approx(x, y, xout = seq(1.00, 1.20, by = 0.02)), evaluates the interpolation at exactly those x values; changing by back to 0.05 resamples the other way.
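For comparison, the same resampling can be sketched in plain Python; the linear interpolation mirrors what approx(x, y, xout = ...) does in R, and the x/y vectors are taken from the question:

```python
def lin_interp(x0, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x0; xs must be ascending.
    Values outside the range are clamped to the endpoints, like approx()."""
    if x0 <= xs[0]:
        return ys[0]
    for i in range(len(xs) - 1):
        if x0 <= xs[i + 1]:
            t = (x0 - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    return ys[-1]

x = [1.00, 1.05, 1.10, 1.15, 1.20]
y = [4.1, 6.4, 8.4, 5.2, 0.5]

# Resample onto a 0.02 grid (11 points from 1.00 to 1.20)
x_new = [round(1.00 + 0.02 * k, 2) for k in range(11)]
y_new = [lin_interp(v, x, y) for v in x_new]

# Switching back to 0.05 is the same call with a coarser grid
x_back = [round(1.00 + 0.05 * k, 2) for k in range(5)]
y_back = [lin_interp(v, x, y) for v in x_back]
print(len(y_new), len(y_back))
```

Resampling back onto the original 0.05 grid recovers the original y values, since the grid points coincide with the data points.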
I have a curve between dff(x axis) and dc(y axis) and I calculated the area under the curve using IN_TABULATED function.
X=[-0.00205553,-0.00186668,-0.00167783,-0.00148899,-0.00130014,-0.00111129,-0.000922443,-0.000733597,-0.000450326,-0.000261480,0.000116216,0.000399487, 0.000588333,0.000777179,0.000966027,0.00115488,0.00134372,0.00153257,0.00172141,0.00181584,0.00200468]
F=[0.00000,21.0000,26.0000,57.0000,94.0000,148.000,248.000,270.000,388.000,418.000,379.000,404.000,358.000,257.000,183.000,132.000,81.0000,47.0000,23.0000,17.0000,431.000]
A=INT_TABULATED(X,F)
print, A
Now I need a loop that starts from the right-hand end (index n-1 down to 0), accumulates area until it reaches A1, which is 0.01 of A, and stops there; then I want to print the dff value that bounds A1's area. How can I do this? Any suggestion would be helpful.
I'm not sure I fully understand the question, so let me begin by stating my interpretation. You have a curve which integrates to A. Starting from the right, you want the X-value (let's call it X1) which encloses 0.01 of A (the total area under the curve). In other words, 0.99 of the total area under the curve F is to the left of X1, and 0.01 of the area is to the right.
Assuming this interpretation is correct, here's a solution:
First, loop through the data and calculate the integral from 0 to each point.
npoints = n_elements(x)
; Initialize a vector to hold integration results
area_cumulative = []
; Loop through each data point, calculating integrals from 0 to that point
for index = 0, npoints-1 do begin
  ; Assume area under first point is zero, otherwise, calculate integral
  if index eq 0 then area_up_to_point = 0d0 $
  else area_up_to_point = int_tabulated(x[0:index], f[0:index])
  ; Store integral value in the cumulative vector
  area_cumulative = [area_cumulative, area_up_to_point]
endfor
Then, you can interpolate to find X1:
;;; Find where cumulative distribution reaches 0.99 of A
a1 = 0.99 * a
x1 = interpol(x, area_cumulative, a1)
Here's an illustration. The upper plot is your data, and the lower plot is the cumulative area (integral from x[0] to x). The red dashed lines show X1 = 0.001952. The gray shaded region contains 0.01 of the total area.
Hope this helps!
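The same procedure can be sketched outside IDL; here is a plain-Python version, with the trapezoidal rule standing in for INT_TABULATED and manual linear interpolation for INTERPOL, so the numbers differ slightly from the answer above:

```python
# Data from the question
X = [-0.00205553, -0.00186668, -0.00167783, -0.00148899, -0.00130014,
     -0.00111129, -0.000922443, -0.000733597, -0.000450326, -0.000261480,
     0.000116216, 0.000399487, 0.000588333, 0.000777179, 0.000966027,
     0.00115488, 0.00134372, 0.00153257, 0.00172141, 0.00181584, 0.00200468]
F = [0.0, 21.0, 26.0, 57.0, 94.0, 148.0, 248.0, 270.0, 388.0, 418.0, 379.0,
     404.0, 358.0, 257.0, 183.0, 132.0, 81.0, 47.0, 23.0, 17.0, 431.0]

# Cumulative trapezoidal integral from X[0] to each point
cumulative = [0.0]
for i in range(1, len(X)):
    seg = (X[i] - X[i - 1]) * (F[i] + F[i - 1]) / 2.0
    cumulative.append(cumulative[-1] + seg)

A = cumulative[-1]   # total area under the curve
a1 = 0.99 * A        # 0.01 of the area remains to the right of x1

# Invert the cumulative curve by linear interpolation (INTERPOL's role above)
for i in range(1, len(X)):
    if cumulative[i] >= a1:
        t = (a1 - cumulative[i - 1]) / (cumulative[i] - cumulative[i - 1])
        x1 = X[i - 1] + t * (X[i] - X[i - 1])
        break
print(x1)  # close to the 0.001952 found above with INT_TABULATED
```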
I have a data file in which the data for the y axis are in the third column. I would like to have the scale given by the first column on x1 and by the second column on x2. The standard way would be to:
plot data u 1:3 axes x1y1, data u 2:3 axes x2y1
But that creates two plots, which is something I want to avoid. Of course one could make the above work with colours or some other dirty tricks, but it makes the whole plot code very cumbersome. Another nice way is to use multiplot, as suggested here. But this is not really my goal, as I want to have a real x2 axis.
Another way that came to my mind was to set x2range but that means going to the source file and figuring out the min and max or using some statistics in gnuplot (which feels like a waste of time for such a simple thing).
Is there a simpler and more elegant way than the above? (I am especially concerned that the solution be short to write; the plot can consist of several (>5) datasets, and I want to avoid plotting each dataset twice.)
This can be done by telling gnuplot to re-scan the file, using the 2nd column as x2 values but only invalid y values for this second plot:
set xtics nomirror
set xrange [:] noextend
set x2tics
set x2range [:] noextend
plot '/tmp/f.gdat' u 1:3 w l, '' u 2:(1/0) ax x2y1
As an example, you can plot this data with Celsius on x and Fahrenheit on x2:
0 32 0
30 86 1
60 140 2
90 194 3
Note that this will only be sensible if column 2 is affinely related to column 1. If you know the affine relation, using set link is much better.
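As a sketch of that set link alternative (assuming gnuplot >= 5.0, where set link is available), the Celsius/Fahrenheit data above could be plotted as:

```gnuplot
set xtics nomirror
set x2tics
# F = 1.8*C + 32; 'inverse' gives gnuplot the reverse mapping
set link x2 via 1.8*x + 32 inverse (x - 32)/1.8
plot '/tmp/f.gdat' u 1:3 w l
```

With the axes linked, gnuplot derives the x2 tics itself and the dummy second plot is no longer needed.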
My math is a bit elementary so I apologize for any assumptions in advance.
I want to fetch values that exist on a simulated bell curve. I don't want to actually create a bell curve or plot one, I'd just like to use a function that given an input value can tell me the corresponding Y axis value on a hypothetical bell curve.
Here's the full problem statement:
I am generating floating point values between 0.0 and 1.0.
0.50 maps to 2.0 on the bell curve, which is the maximum. Values below and above 0.50 drop off along the bell curve, so for example 0.40 and 0.60 are the same and could be something like 1.8. The 1.8 is arbitrarily chosen for this example, and I'd like to know how I can tweak this 'gradient'.
Right now I'm doing a very crude implementation: for example, for any value > 0.40 and < 0.60 the function returns 2.0. I'd like to smooth this and gain more control over the descent/gradient.
Any ideas how I can achieve this in Go?
The Gaussian function, described here: https://en.wikipedia.org/wiki/Gaussian_function,
has a bell-curve shape. Example implementation:
package main

import (
    "fmt"
    "math"
)

const (
    a = 2.0 // height of the curve's peak
    b = 0.5 // position of the peak
    c = 0.1 // standard deviation controlling the width of the curve
    // (a smaller c makes the bell narrower and the drop-off steeper)
)

func curveFunc(x float64) float64 {
    return a * math.Exp(-math.Pow(x-b, 2)/(2.0*math.Pow(c, 2)))
}

func main() {
    fmt.Println(curveFunc(0.5)) // 2, the peak
    fmt.Println(curveFunc(0.4)) // about 1.213; curveFunc(0.6) is the same
}
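If you want to control the drop-off exactly, you can solve the Gaussian for c instead of guessing it: given the peak height a and a desired value y0 at some x0, invert the formula. A sketch in Python (the target of 1.8 at x = 0.40 is the arbitrary example from the question):

```python
import math

a, b = 2.0, 0.5  # peak height and position, as in the Go code

def width_for(x0, y0):
    """Return the c that makes the bell curve pass through (x0, y0).
    Derived from y0 = a * exp(-(x0-b)^2 / (2 c^2))."""
    return abs(x0 - b) / math.sqrt(2.0 * math.log(a / y0))

def curve(x, c):
    """Gaussian with peak a at b and standard deviation c."""
    return a * math.exp(-((x - b) ** 2) / (2.0 * c * c))

c = width_for(0.40, 1.8)  # make curve(0.40) = curve(0.60) = 1.8
print(c)                  # about 0.218
print(curve(0.40, c))     # 1.8 by construction
```

Plugging the resulting c into the Go constant gives the requested 1.8 at 0.40 and 0.60; picking a different (x0, y0) pair tweaks the gradient.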