Integration Method, hit and miss in R - r

I want to calculate the following integrate by using the hit and miss method.
I=∫x^3dx with lower= 0 and upper =1
I know how to solve it but I cannot find the right code in R to calculate it and generate -for example 100000 random- and then plot them like this:
Thank you.

1. Generate 2 vectors from uniform distribution of the desired length
l = 10000
x = runif(l)
y = runif(l)
2. The approximation of the integral is the number of cases where the (x,y) points are below the function you want to integrate:
sum(y<x^3)/l
3. For the plot, you just have to plot the points, changing their color depending whether they are above or below the curve, and add the function with curve():
plot(x,y,col=1+(y<x^3))
curve(x^3,add=T,col=3)

Related

How to create a random walk in R that goes in different directions than -1 or +1?

Consider this two‐dimensional random walk:
where, Zt, Wt, t = 1,2,3, … are independent and identically distributed standard normal
random variables.
I am having problems in finding a way to simulate and plot the sample path of (X,Y) for t = 0,1, … ,100. I was given a sample:
The following code is an example of the way I am used to plot random walks in R:
set.seed(13579)
r<-sample(c(-1,1),size=100,replace=T,prob=c(0.5,0.5))
r<-c(10,r))
(w<-cumsum(r))
w<-as.ts(w)
plot(w,main="random walk")
I am not very sure of how to achieve this.
The problem I am having is that this kind of codes has a more "simple" result, with a line that goes either up or down, -1 or +1:
while the plot I need to create also goes from left to right and viceversa.
Would you help me in correcting the code I know so that it fits my task/suggesting a smarterst way to go about it? It would be greatly appreciated.
Cheers!
Instead of using sample, you need to use rnorm(100) to draw 100 samples from a standard normal distribution. Since the walk starts at [0, 0], we need to append a 0 at the start and do a cumsum on the result, i.e. cumsum(c(0, rnorm(100))).
We want to do this for both the x and y variables, then plot. The whole thing can be done in a single line of code in base R:
plot(x = cumsum(c(0, rnorm(100))), y = cumsum(c(0, rnorm(100))), type = 'l')

drawing the graph of a function f(x) = x^3 - 6x^2 + 9x - 4 in d3.js

I am back at college learning maths and I want to try and use some this knowledge to create some svg with d3.js.
If I have a function f(x) = x^3 - 3x^2 + 3x - 1
I would take the following steps:
Find the x intercepts for when y = 0
Find the y intercept when x = 0
Find the stationary points when dy\dx = 0
I would then have 2 x values from point 3 to plug into the original equation.
I would then draw a nature table do judge the flow of the graph or curve.
Plot the known points from the above and sketch the graph.
Translating what I would do on pen and paper into code instructions is what I really could do with any sort of advice on the following:
How can I programmatically factorise point 1 of the above to find the x-intercepts for when y = 0. I honestly do not know where to even start.
How would I programmatically find dy/dx and the values for the stationary points.
If I actually get this far then what should I use in d3 to join the points on the graph.
Your other "steps" have nothing to do with d3 or plotting.
Find the x intercepts for when y = 0
This is root finding. Look for algorithms to help with this.
Find the y intercept when x = 0
Easy: substitute to get y = 1.
Find the stationary points when dy\dx = 0
Take the first derivative to get 3x^2 - 12x + 9 and repeat the root finding step. Easy to get using quadratic equation.
I would then have 2 x values from point 3 to plug into the original
equation. I would then draw a nature table do judge the flow of the
graph or curve. Plot the known points from the above and sketch the
graph.
I would just draw the curve. Pick a range for x and go.
It's great to learn d3. You'll end up with something like this:
https://maurizzzio.github.io/function-plot/
For a cubic polynomial, there are closed formulas available to find all the particular points that you want (https://en.wikipedia.org/wiki/Cubic_function), and it is a sound approach to determine them.
Anyway, you will have to plot the smooth curve, which means that you will need to compute close enough points and draw a polyline that joins them.
Doing this, you are actually performing the first steps of numerical root isolation, with such an accuracy that the approximate and exact roots will be practically undistinguishable.
So an easy combined solution is to draw the curve as a polyline and find the intersections with the X axis as well as extrema using this polyline representation, rather than by means of more sophisticated methods.
This approach works for any continuous curve and is very easy to implement. So you actually draw the curve to find particular points rather than conversely as is done by analytical methods.
For best results on complicated curves, you can adapt the point density based on the local curvature, but this is another story.

dimensions of kde object from ks package, R

I am using the ks package from R to estimate 2d space utilization using distance and depth information. What I would like to do is to use the 95% contour output to get the maximum vertical and horizontal distance. So essentially, I want to be able to get the dimensions or measurements of the resulting 95% contour.
Here is a piece of code with as an example,
require(ks)
dist<-c(1650,1300,3713,3718)
depth<-c(22,19.5,20.5,8.60)
dd<-data.frame(cbind(dist,depth))
## auto bandwidth selection
H.pi2<-Hpi(dd,binned=TRUE)*1
ddhat<-kde(dd,H=H.pi2)
plot(ddhat,cont=c(95),lwd=1.5,display="filled.contour2",col=c(NA,"palegreen"),
xlab="",ylab="",las=1,ann=F,bty="l",xaxs="i",yaxs="i",
xlim=c(0,max(dd[,1]+dd[,1]*0.4)),ylim=c(60,-3))
Any information about how to do this will be very helpful. Thanks in advance,
To create a 95% contour polygon from your 'kde' object:
library(raster)
im.kde <- image2Grid (list(x = ddhat$eval.points[[1]], y = ddhat$eval.points[[2]], z = ddhat$estimate))
kr <- raster(im.kde)
It is likely that one will want to resample this raster to a higher resolution before constructing polygons, and include the following two lines, before creation of the polygon object:
new.rast <- raster(extent(im.kde),res = c(50,50))
kr <- resample(kr, new.rast)
bin.kr <- kr
bin.kr[bin.kr < contourLevels(k, prob = 0.05)]<-NA
bin.kr[bin.kr > 0]<-1
k.poly<-rasterToPolygons(bin.kr,dissolve=T)
Note that the results are similar, but not identical, to Hawthorne Beier's GME function 'kde'. He does use the kde function from ks, but must do something slightly different for the output polygon.
At the moment I'm going for the "any information" prize rather than attempting a final answer. The ks:::plot.kde function dispatches to ks:::plotkde.2d in this case. It works its magic through side effects and I cannot get these functions to return values that can be inspected in code. You would need to hack the plotkde.2d function to return the values used to plot the contour lines. You can visualize what is in ddhat$estimate with:
persp(ddhat$estimate)
It appears that contourLevels examines the estimate-matrix and finds the value at which greater than the specified % of the total density will reside.
> contourLevels(ddhat, 0.95)
95%
1.891981e-05
And then draws the contout based on which values exceed that level. (I just haven't found the code that does that yet.)

Probability distribution values plot

I have
probability values: 0.06,0.06,0.1,0.08,0.12,0.16,0.14,0.14,0.08,0.02,0.04 ,summing up to 1
the corresponding intervals where a stochastic variable may take its value with the corresponding probability from the above list:
126,162,233,304,375,446,517,588,659,730,801,839
How can I plot the probability distribution?
On the x axis, the interval values, between the intervals histogram with the probability value?
Thanks.
How about
x <- c(126,162,233,304,375,446,517,588,659,730,801,839)
p <- c(0.06,0.06,0.1,0.08,0.12,0.16,0.14,0.14,0.08,0.02,0.04)
plot(x,c(p,0),type="s")
lines(x,c(0,p),type="S")
rect(x[-1],0,x[-length(x)],p,col="lightblue")
for a quick answer? (With the rect included you might not need the lines call and might be able to change it to plot(x,p,type="n"). As usual I would recommend par(bty="l",lty=1) for my preferred graphical defaults ...)
(Explanation: "s" and "S" are two different stair-step types (see Details in ?plot): I used them both to get both the left and right boundaries of the distribution.)
edit: In your comments you say "(it) doesn't look like a histogram". It's not quite clear what you want. I added rectangles in the example above -- maybe that does it? Or you could do
b <- barplot(p,width=diff(x),space=0)
but getting the x-axis labels right is a pain.

How do I calculate the "difference" between two sequences of points?

I have two sequences of length n and m. Each is a sequence of points of the form (x,y) and represent curves in an image. I need to find how different (or similar) these sequences are given that fact that
one sequence is likely longer than the other (i.e., one can be half or a quarter as long as the other, but if they trace approximately the same curve, they are the same)
these sequences could be in opposite directions (i.e., sequence 1 goes from left to right, while sequence 2 goes from right to left)
I looked into some difference estimates like Levenshtein as well as edit-distances in structural similarity matching for protein folding, but none of them seem to do the trick. I could write my own brute-force method but I want to know if there is a better way.
Thanks.
Do you mean that you are trying to match curves that have been translated in x,y coordinates? One technique from image processing is to use chain codes [I'm looking for a decent reference, but all I can find right now is this] to encode each sequence and then compare those chain codes. You could take the sum of the differences (modulo 8) and if the result is 0, the curves are identical. Since the sequences are of different lengths and don't necessarily start at the same relative location, you would have to shift one sequence and do this again and again, but you only have to create the chain codes once. The only way to detect if one of the sequences is reversed is to try both the forward and reverse of one of the sequences. If the curves aren't exactly alike, the sum will be greater than zero but it is not straightforward to tell how different the curves are simply from the sum.
This method will not be rotationally invariant. If you need a method that is rotationally invariant, you should look at Boundary-Centered Polar Encoding. I can't find a free reference for that, but if you need me to describe it, let me know.
A method along these lines might work:
For both sequences:
Fit a curve through the sequence. Make sure that you have a continuous one-to-one function from [0,1] to points on this curve. That is, for each (real) number between 0 and 1, this function returns a point on the curve belonging to it. By tracing the function for all numbers from 0 to 1, you get the entire curve.
One way to fit a curve would be to draw a straight line between each pair of consecutive points (it is not a nice curve, because it has sharp bends, but it might be fine for your purpose). In that case, the function can be obtained by calculating the total length of all the line segments (Pythagoras). The point on the curve corresponding to a number Y (between 0 and 1) corresponds to the point on the curve that has a distance Y * (total length of all line segments) from the first point on the sequence, measured by traveling over the line segments (!!).
Now, after we have obtained such a function F(double) for the first sequence, and G(double) for the second sequence, we can calculate the similarity as follows:
double epsilon = 0.01;
double curveDistanceSquared = 0.0;
for(double d=0.0;d<1.0;d=d+epsilon)
{
Point pointOnCurve1 = F(d);
Point pointOnCurve2 = G(d);
//alternatively, use G(1.0-d) to check whether the second sequence is reversed
double distanceOfPoints = pointOnCurve1.EuclideanDistance(pointOnCurve2);
curveDistanceSquared = curveDistanceSquared + distanceOfPoints * distanceOfPoints;
}
similarity = 1.0/ curveDistanceSquared;
Possible improvements:
-Find an improved way to fit the curves. Note that you still need the function that traces the curve for the above method to work.
-When calculating the distance, consider reparametrizing the function G in such a way that the distance is minimized. (This means you have an increasing function R, such that R(0) = 0 and R(1)=1,
but which is otherwise general. When calculating the distance you use
Point pointOnCurve1 = F(d);
Point pointOnCurve2 = G(R(d));
Subsequently, you try to choose R in such a way that the distance is minimized. (to see what happens, note that G(R(d)) also traces the curve)).
Why not do some sort of curve fitting procedure (least-squares whether it be ordinary or non-linear) and see if the coefficients on the shape parameters are the same. If you run it as a panel-data sort of model, there are explicit statistical tests whether sets of parameters are significantly different from one another. That would solve the problem of the the same curve but sampled at different resolutions.
Step 1: Canonicalize the orientation. For example, let's say that all curved start at the endpoint with lowest lexicographic order.
def inCanonicalOrientation(path):
return path if path[0]<path[-1] else reversed(path)
Step 2: You can either be roughly accurate, or very accurate. If you wish to be very accurate, calculate a spline, or fit both curves to a polynomial of appropriate degree, and compare coefficients. If you'd like just a rough estimate, do as follows:
def resample(path, numPoints)
pathLength = pathLength(path) #write this function
segments = generateSegments(path)
currentSegment = next(segments)
segmentsSoFar = [currentSegment]
for i in range(numPoints):
samplePosition = i/(numPoints-1)*pathLength
while samplePosition > pathLength(segmentsSoFar)+currentSegment.length:
currentSegment = next(segments)
segmentsSoFar.insert(currentSegment)
difference = samplePosition - pathLength(segmentsSoFar)
howFar = difference/currentSegment.length
yield Point((1-howFar)*currentSegment.start + (howFar)*currentSegment.end)
This can be modified from a linear resampling to something better.
def error(pathA, pathB):
pathA = inCanonicalOrientation(pathA)
pathB = inCanonicalOrientation(pathB)
higherResolution = max([len(pathA), len(pathB)])
resampledA = resample(pathA, higherResolution)
resampledB = resample(pathA, higherResolution)
error = sum(
abs(pointInA-pointInB)
for pointInA,pointInB in zip(pathA,pathB)
)
averageError = error / len(pathAorB)
normalizedError = error / Z(AorB)
return normalizedError
Where Z is something like the "diameter" of your path, perhaps the maximum Euclidean distance between any two points in a path.
I would use a curve-fitting procedure, but also throw in a constant term, i.e. 0 =B0 + B1*X + B2*Y + B3*X*Y + B4*X^2 etc. This would catch the translational variance and then you can do a statistical comparison of the estimated coefficients of the curves formed by the two sets of points as a way of classifying them. I'm assuming you'll have to do bi-variate interpolation if the data form arbitrary curves in the x-y plane.

Resources