Bisection method error in R. Specifying midpoint and length - r

Hi I'm trying to write the bisection method using R. I know there has been questions like this before, but my specific questions is that, how could I write the function if specifying the midpoint and length, but not a and b? Here is my code, but it gives me lots of error.
The reason why I want to specify the midpoint and length is that, if the initial interval does not contain the root, it could double the search length... like from (pt-h/2, pt+h/2) to (pt-3h/2,pt+3h/2).
Here is my codes.
f1<-function(x){exp(x)-1}
find<-function(pt,h,tol = 0.0001){
if f1(pt-h/2)*f1(pt+h/2)<0{
time = 1
while (h>tol) {
if f1(pt)*f1(pt-h/2)<0 pt+h/2<-pt
if f1(pt)*f1(pt+h/2)<0 pt-h/2<-pt
time<-time+1
}
cat("Iteration time is ",time,"\n")
cat("The root is ",md,"\n")
cat("The real y of root is ",fmd,"\n")
}
}

Related

Convert P to Z in log space

Given a -log10(P) value, I'd like to calculate the Z score in log space, how would I do that?
So, given the following code, how to recode the last line so that it calculates Z from log10P in the log space?
Z=10
log10P = -1*(pnorm(-abs(Z),log.p = T)*1/log(10) + log10(2))
Z== -1*(qnorm(10^-log10P/2)) # <- this needs to be in log space.
qnorm also has a log.p argument analogous to pnorm's, so you can reverse the operations that you used to get log10P in the first place (it took me a couple of tries to get this right ...)
I rearranged your log10P calculation slightly.
log10P_from_Z <- function(Z) {
abs((pnorm(-abs(Z),log.p=TRUE)+log(2))/log(10))
}
Z_from_log10P <- function(log10P) {
-1*qnorm(-(log10P*log(10))-log(2), log.p=TRUE)
}
We can check the round-trip accuracy (i.e. convert from -log10(p) to Z and back, see how close we got to the original value.) This works perfectly for values around 20, but does incur a little bit of round-off error for large values (would have to look more carefully to see if there's anything that can be remedied here).
zvec <- seq(20,400)
err <- sapply(zvec, function(z) {
abs(Z_from_log10P(log10P_from_Z(z))-z)
})

How to fix a straight line plot of a logistic map in R?

The logistic map (a map is a function that takes its value at any time step to its value at the next time step) is a model that has its roots in the prediction of animal population sizes. It has become famous, in part, due to special cases of its parameterization that exhibit surprising chaotic behavior. The logistic map equation is
xi+1 = rxi(1 - xi)
where xi ∈ [0,1] is the value ratio of current population size to maximum possible size at time i, xi+1 is the ratio at the next generation and r is the driving rate, representing animal reproduction and death. For r < 3.5 the population eventually reaches a stable size or will oscillate between a set of fixed values. However, if r > 3.5 then the system destabilizes and exhibits chaotic behavior!
That is background or context for the following problem statement:
Generate a set of points S = {r, x} where, for each r ∈ [1.0, 4.1] by increments of 0.001025 there will be a sequence of xi values for i = 0,...,16. So, for each r value there will be 17 xi values. Use x0 = 0.01. Depending on your implementation, you may find the rbind function useful. It may take a few seconds for the code to run since it will generate a lot of points in S. No more than 10 lines of R code.
Admittedly, this is a lab assignment; however, I am not a student in the class. I am learning R, and I am trying to work through the online assignments and come up with a solution myself. I have tried to create the set of points to plot, and based on manual verification of a few points, the set looks accurate.
for(j in c(0:3024)) {
rm(x)
x <- 1:17
x[1] <- 0.01
r <- 1 + (j * 0.001025)
for(i in c(1:(17-1))) {
x[i+1] <- r *x[i] * (1 - x[i])
}
if (j==0) {
binded <- cbind(r,x)
} else {
binded <- rbind(binded, cbind(r,x))
}
}
When I invoke plot(binded, pch='.') RStudio displays the result as a straight line. So I am unsure if I am using plot correctly, or even if I am generating all the points correctly. If I decrease the maximum value of j to something less than 2000, you will see a plot; it is just when the j value iterates up to 3024 that you only plot a straight line.
I believe your code is correct, what happens is when time exceeds 4, the of iterations are widely unstable and are going to -infinity. This large variation in the y value is compressing the scale and making the plot look like a flat line.
Cutting off the tail end of the matrix makes a very interesting plot:
plot(binded[-which(binded[,2]<0),], pch=".")
If you do want to plot the entire matrix, consider manually setting your y-axis limits to [0,1]. This way, the plot won't be stretched down to -1e24.
As an added bonus, here's a version in a different plotting library that has points colored by i.

Error in `[<-`(`*tmp*`, i, succ, value = 1) : subscript out of bounds

I have to simulate an image with a white crack on a black background. So I defined a function that adds to a matrix with all elements equal to zero some consecutive points equal to one.
The function is the following:
crepa<-function(matrice) {
start<-sample(1:ncol(matrice),1)
matrice[1,start]<-1
for (i in 2:nrow(matrice)) {
alpha<-sample(c(-1,0,1),1)
succ<-start+alpha
if (succ==(ncol(matrice)+1)) succ==ncol(matrice)
if (succ==0) succ==1
matrice[i,succ]<-1
start<-succ
}
matrice<-as.matrix(matrice)
}
To control whether the function works well, I applied it over and over again to the following matrix:
m<-matrix(0,64,64)
imma<-crepa(m)
par(mar=rep(0,4))
image(t(imma), axes = FALSE, col = grey(seq(0, 1, length = 256)))
In most cases the result is correct. However, in few cases I run into this Error:
Error in [<-(*tmp*, i, succ, value = 1) : subscript out of bounds
These two lines:
if (succ==(ncol(matrice)+1)) succ==ncol(matrice)
if (succ==0) succ==1
Should be:
if (succ==(ncol(matrice)+1)) succ=ncol(matrice)
if (succ==0) succ=1
In case you still can't see it, you've used the equality test == when you should use assignment = or <-.
The error message told me it had to be the element going off the matrix, so I started printing out the values of succ and then noticed it wasn't being reset within the right range, and only then did I spot the mistake. I probably looked at the code ten times without noticing. I also figured that kind of error was more likely with a small matrix, and so tested with a 6x6 matrix which meant I could be more likely to see it than with a 64x64!

How do I calculate the "difference" between two sequences of points?

I have two sequences of length n and m. Each is a sequence of points of the form (x,y) and represent curves in an image. I need to find how different (or similar) these sequences are given that fact that
one sequence is likely longer than the other (i.e., one can be half or a quarter as long as the other, but if they trace approximately the same curve, they are the same)
these sequences could be in opposite directions (i.e., sequence 1 goes from left to right, while sequence 2 goes from right to left)
I looked into some difference estimates like Levenshtein as well as edit-distances in structural similarity matching for protein folding, but none of them seem to do the trick. I could write my own brute-force method but I want to know if there is a better way.
Thanks.
Do you mean that you are trying to match curves that have been translated in x,y coordinates? One technique from image processing is to use chain codes [I'm looking for a decent reference, but all I can find right now is this] to encode each sequence and then compare those chain codes. You could take the sum of the differences (modulo 8) and if the result is 0, the curves are identical. Since the sequences are of different lengths and don't necessarily start at the same relative location, you would have to shift one sequence and do this again and again, but you only have to create the chain codes once. The only way to detect if one of the sequences is reversed is to try both the forward and reverse of one of the sequences. If the curves aren't exactly alike, the sum will be greater than zero but it is not straightforward to tell how different the curves are simply from the sum.
This method will not be rotationally invariant. If you need a method that is rotationally invariant, you should look at Boundary-Centered Polar Encoding. I can't find a free reference for that, but if you need me to describe it, let me know.
A method along these lines might work:
For both sequences:
Fit a curve through the sequence. Make sure that you have a continuous one-to-one function from [0,1] to points on this curve. That is, for each (real) number between 0 and 1, this function returns a point on the curve belonging to it. By tracing the function for all numbers from 0 to 1, you get the entire curve.
One way to fit a curve would be to draw a straight line between each pair of consecutive points (it is not a nice curve, because it has sharp bends, but it might be fine for your purpose). In that case, the function can be obtained by calculating the total length of all the line segments (Pythagoras). The point on the curve corresponding to a number Y (between 0 and 1) corresponds to the point on the curve that has a distance Y * (total length of all line segments) from the first point on the sequence, measured by traveling over the line segments (!!).
Now, after we have obtained such a function F(double) for the first sequence, and G(double) for the second sequence, we can calculate the similarity as follows:
double epsilon = 0.01;
double curveDistanceSquared = 0.0;
for(double d=0.0;d<1.0;d=d+epsilon)
{
Point pointOnCurve1 = F(d);
Point pointOnCurve2 = G(d);
//alternatively, use G(1.0-d) to check whether the second sequence is reversed
double distanceOfPoints = pointOnCurve1.EuclideanDistance(pointOnCurve2);
curveDistanceSquared = curveDistanceSquared + distanceOfPoints * distanceOfPoints;
}
similarity = 1.0/ curveDistanceSquared;
Possible improvements:
-Find an improved way to fit the curves. Note that you still need the function that traces the curve for the above method to work.
-When calculating the distance, consider reparametrizing the function G in such a way that the distance is minimized. (This means you have an increasing function R, such that R(0) = 0 and R(1)=1,
but which is otherwise general. When calculating the distance you use
Point pointOnCurve1 = F(d);
Point pointOnCurve2 = G(R(d));
Subsequently, you try to choose R in such a way that the distance is minimized. (to see what happens, note that G(R(d)) also traces the curve)).
Why not do some sort of curve fitting procedure (least-squares whether it be ordinary or non-linear) and see if the coefficients on the shape parameters are the same. If you run it as a panel-data sort of model, there are explicit statistical tests whether sets of parameters are significantly different from one another. That would solve the problem of the the same curve but sampled at different resolutions.
Step 1: Canonicalize the orientation. For example, let's say that all curved start at the endpoint with lowest lexicographic order.
def inCanonicalOrientation(path):
return path if path[0]<path[-1] else reversed(path)
Step 2: You can either be roughly accurate, or very accurate. If you wish to be very accurate, calculate a spline, or fit both curves to a polynomial of appropriate degree, and compare coefficients. If you'd like just a rough estimate, do as follows:
def resample(path, numPoints)
pathLength = pathLength(path) #write this function
segments = generateSegments(path)
currentSegment = next(segments)
segmentsSoFar = [currentSegment]
for i in range(numPoints):
samplePosition = i/(numPoints-1)*pathLength
while samplePosition > pathLength(segmentsSoFar)+currentSegment.length:
currentSegment = next(segments)
segmentsSoFar.insert(currentSegment)
difference = samplePosition - pathLength(segmentsSoFar)
howFar = difference/currentSegment.length
yield Point((1-howFar)*currentSegment.start + (howFar)*currentSegment.end)
This can be modified from a linear resampling to something better.
def error(pathA, pathB):
pathA = inCanonicalOrientation(pathA)
pathB = inCanonicalOrientation(pathB)
higherResolution = max([len(pathA), len(pathB)])
resampledA = resample(pathA, higherResolution)
resampledB = resample(pathA, higherResolution)
error = sum(
abs(pointInA-pointInB)
for pointInA,pointInB in zip(pathA,pathB)
)
averageError = error / len(pathAorB)
normalizedError = error / Z(AorB)
return normalizedError
Where Z is something like the "diameter" of your path, perhaps the maximum Euclidean distance between any two points in a path.
I would use a curve-fitting procedure, but also throw in a constant term, i.e. 0 =B0 + B1*X + B2*Y + B3*X*Y + B4*X^2 etc. This would catch the translational variance and then you can do a statistical comparison of the estimated coefficients of the curves formed by the two sets of points as a way of classifying them. I'm assuming you'll have to do bi-variate interpolation if the data form arbitrary curves in the x-y plane.

Remove redundant points for line plot

I am trying to plot large amounts of points using some library. The points are ordered by time and their values can be considered unpredictable.
My problem at the moment is that the sheer number of points makes the library take too long to render. Many of the points are redundant (that is - they are "on" the same line as defined by a function y = ax + b). Is there a way to detect and remove redundant points in order to speed rendering ?
Thank you for your time.
The following is a variation on the Ramer-Douglas-Peucker algorithm for 1.5d graphs:
Compute the line equation between first and last point
Check all other points to find what is the most distant from the line
If the worst point is below the tolerance you want then output a single segment
Otherwise call recursively considering two sub-arrays, using the worst point as splitter
In python this could be
def simplify(pts, eps):
if len(pts) < 3:
return pts
x0, y0 = pts[0]
x1, y1 = pts[-1]
m = float(y1 - y0) / float(x1 - x0)
q = y0 - m*x0
worst_err = -1
worst_index = -1
for i in xrange(1, len(pts) - 1):
x, y = pts[i]
err = abs(m*x + q - y)
if err > worst_err:
worst_err = err
worst_index = i
if worst_err < eps:
return [(x0, y0), (x1, y1)]
else:
first = simplify(pts[:worst_index+1], eps)
second = simplify(pts[worst_index:], eps)
return first + second[1:]
print simplify([(0,0), (10,10), (20,20), (30,30), (50,0)], 0.1)
The output is [(0, 0), (30, 30), (50, 0)].
About python syntax for arrays that may be non obvious:
x[a:b] is the part of array from index a up to index b (excluded)
x[n:] is the array made using elements of x from index n to the end
x[:n] is the array made using first n elements of x
a+b when a and b are arrays means concatenation
x[-1] is the last element of an array
An example of the results of running this implementation on a graph with 100,000 points with increasing values of eps can be seen here.
I came across this question after I had this very idea. Skip redundant points on plots. I believe I came up with a far better and simpler solution and I'm happy to share as my first proposed solution on SO. I've coded it and it works well for me. It also takes into account the screen scale. There may be 100 points in value between those plot points, but if the user has a chart sized small, they won't see them.
So, iterating through your data/plot loop, before you draw/add your next data point, look at the next value ahead and calculate the change in screen scale (or value, but I think screen scale for the above-mentioned reason is better). Now do the same for the next value ahead (getting these values is just a matter of peeking ahead in your array/collection/list/etc adding the for next step increment (probably 1/2) to the current for value whilst in the loop). If the 2 values are the same (or perhaps very minor change, per your own preference), you can skip this one point in your chart by simply adding 'continue' in the loop, skipping adding the data point as the point lies exactly on the slope between the point before and after it.
Using this method, I reduce a chart from 963 points to 427 for example, with absolutely zero visual change.
I think you might need to perhaps read this a couple of times to understand, but it's far simpler than the other best solution mentioned here, much lighter weight, and has zero visual effect on your plot.
I would probably apply a "least squares" algorithm to obtain a line of best fit. You can then go through your points and downfilter consecutive points that lie close to the line. You only need to plot the outliers, and the points that take the curve back to the line of best fit.
Edit: You may not need to employ "least squares"; if your input is expected to hover around "y=ax+b" as you say, then that's already your line of best fit and you can just use that. :)

Resources