Simulate Inhomogeneous Poisson Process by Gaussian Scattering - r

I am currently trying to redo plots which can be found on p. 120 of the textbook "Statistical Analysis and Modelling of Spatial Point Patterns". The following information should be sufficient to help me without a look into the mentioned textbook. Using the fantastic spatstat package, I try to simulate point patterns in the unit square resulting from inhomogeneous Poisson point processes (IPP) with the intensity functions (a) $\lambda(x,y)=a(x+y)$ (linear trend) and (b) $\lambda(r)=c\exp(-dr^2)$, with $r$ being the distance from the origin.
For (a) I did the following:
library(spatstat)
linear <- function(x,y,a) {a*(x+y)}
plot(rpoispp(lambda = linear, a=150))
The resulting plot is not too bad to my understanding. However, I cannot figure out how to implement (b) and would appreciate any help.
Hopefully, understanding how the implementation of (b) works will help me fit a model to an observed point pattern (one with only a few clusters, probably just one) that is likely to stem from an IPP, using ppm(pattern, function describing the simple model) or kppm.
Note: the reason I am asking this question is self-interest. I could easily retrieve the plots from the source, but that would not help me understand how to implement intensities, or how to create and fit simple models to observed point patterns.
If my question is answered elsewhere, I would appreciate links. Thank you!

If you want to code an intensity function as a function in the R language, then it should be a function of the spatial location (x,y).
In (b) the intensity function is $\lambda(x,y) = c\,\exp(-d(x^2 + y^2))$, where we use the fact that the distance from the origin (0,0) to the point (x, y) is $r = \sqrt{x^2 + y^2}$. The code is
lam <- function(x, y, c, d) { c * exp(-d * (x^2 + y^2)) }
In this example the value of lambda(x,y) depends only on the distance r, so we say loosely that "the intensity is a function of r", which may be the source of your confusion.
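To simulate from this intensity you can pass lam to rpoispp() exactly as in (a); extra named arguments are forwarded to the intensity function. A minimal sketch (the values c = 150 and d = 3 are made up for illustration), which also shows the connection to ppm(): since log(lambda) = log(c) - d*(x^2 + y^2) is linear in the unknowns, model (b) can be fitted directly with ppm(), assuming a recent spatstat with the formula interface:
library(spatstat)
# intensity (b): a function of distance from the origin, written in (x, y)
lam <- function(x, y, c, d) { c * exp(-d * (x^2 + y^2)) }
X <- rpoispp(lambda = lam, c = 150, d = 3)  # c and d are passed through to lam
plot(X)
# model (b) is log-linear in (x^2 + y^2), so it can be fitted with ppm():
fit <- ppm(X ~ I(x^2 + y^2))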

Related

Fitting different parts of data to different models in R

I've had a pet project trying to dig through my Dad's old thesis from 1972 and to reproduce a computational solution that he derived. His project looked at the kinetics of a transition state for alumina ceramics. After collecting the data, he derived the following model for the kinetic curve of the transition (see attached image from his thesis).
In case the picture doesn't come through: the data form an S-shaped curve. To the left of the inflection point t*, the data fit the equation
y = A * exp(K*t)
To the right of the inflection point, the data fit the equation
y = 1 - B * exp(-J * t^n)
He wrote up a program in Fortran 68 that does dynamic modeling and least-squares fitting for this. I am trying to "update" his code to see if I can do it more efficiently in R. So two questions:
What is the best way to plot his model, i.e. how do I plot two equations in this manner? I feel like I could do it brute force with base R, but I'm not sure the transition between the two equations will be smooth.
In his model, the coefficients A, K, B, J and n, as well as the inflection point t*, are unknown and are optimized by least squares. He does his modeling in Fortran by brute force. Is there a glm or similar solution in R for solving this elegantly? (See the sketch after the sample data below.)
Here is a sample of the data that he generated:
y <- c(20,30,40,50,55,60,65,70,80,90,100,110,120,150)
t <- c(0.05,0.11,0.185,0.31,0.375,0.445,0.52,0.63,0.8,0.92,0.97,0.98,0.99,0.999)
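A sketch of both steps, assuming nothing from the thesis: the piecewise model can be plotted with curve(), and all six parameters (including t*) can be fitted by least squares with optim(). Every parameter value below is made up, the demo data are synthetic, and the two branches do not join smoothly unless you constrain them:
kinetic <- function(t, p) {   # p = c(A, K, B, J, n, tstar)
  ifelse(t <= p[6],
         p[1] * exp(p[2] * t),            # left branch:  y = A * exp(K*t)
         1 - p[3] * exp(-p[4] * t^p[5]))  # right branch: y = 1 - B * exp(-J * t^n)
}
curve(kinetic(x, c(0.05, 3, 0.9, 5, 2, 0.55)), from = 0, to = 1,
      xlab = "t", ylab = "y")
# least-squares objective over all six parameters, including t*:
sse <- function(p, t, y) sum((y - kinetic(t, p))^2)
# demo fit on synthetic data generated from the model itself:
t.obs <- seq(0.02, 1, by = 0.02)
y.obs <- kinetic(t.obs, c(0.05, 3, 0.9, 5, 2, 0.55)) + rnorm(length(t.obs), 0, 0.01)
fit <- optim(c(0.1, 2, 1, 4, 1.5, 0.5), sse, t = t.obs, y = y.obs)
fit$par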

Discrepancy in Cubic Spline Interpolation, R & matlab

I am trying to replicate MATLAB's spline() function using the spline() function from R's stats package, without having full access to MATLAB (I don't have a licence for it). I am able to input into R all of the data that would be present in MATLAB, but my spline output differs from MATLAB's by an average of .0036 (max difference .0342, min difference -.0056, standard deviation .0094). My main question is: how does MATLAB's formula compare to R's, and is that where my calculation discrepancy might come from?
The first part of my code feeds the Excel spreadsheet into R, then calculates the variables needed to get tau and quick delta. After this, I run the spline calculation and then rotate the output for export back into Excel. Below is the essential script, plus some data to try out, to see if there is something flawed in my calculation. I use spline with method "natural", as it returns the values closest to MATLAB's.
#establishing what tau is for the quick Delta calculation
today <- Sys.Date()
month <- as.Date("2016-05-01")   # expiry date; was as.Date(5/1/2016), which does not parse as a date
difday <- difftime(month, today, units = c("days"))
Tau <- as.numeric((month - today) / 365)   # use a future expiry so Tau > 0
Pu <- as.numeric(1.94)
Vol <- as.numeric(.4261)
#Pf is the representation of my fixed strike prices, the points used for interpolation
Pf <- c(Pu - .3, Pu - .25, Pu - .2, Pu - .1, Pu, Pu + .1, Pu + .2, Pu + .25, Pu + .3)
qDtable <- data.frame(matrix(ncol = length(Pf), nrow = length(month)))
colnames(qDtable) <- c(Pf)
rownames(qDtable) <- format.Date(month)
#my quick Delta calculation & table as a result
qD <- data.frame(pnorm(log(Pf / Pu) / (Vol * sqrt(Tau))))
Qd <- t(qD[1:length(Pf), 1])   # was 1:24; this sample has only length(Pf) strikes
qDtable[1, ] <- c(Qd)
#setting up for spline interpolation
qDpoint <- as.numeric(qDtable[1, 1:length(Pf)])
ncsibyPf <- data.frame(matrix(ncol = length(Pf), nrow = length(month)))
colnames(ncsibyPf) <- Pf
rownames(ncsibyPf) <- format.Date(month)
qDvol <- data.frame(matrix(ncol = 14, nrow = 2))   # a closing parenthesis was missing here
colnames(qDvol) <- c("", 0, .05, .1, .2, .3, .4, .5, .6, .7, .8, .9, .95, 1)
rownames(qDvol) <- c("quickDelta", format.Date(month))   # two rows need two names; "quickDelta" is an arbitrary label
qDvol[1, 2:14] <- c(0, .05, .1, .2, .3, .4, .5, .6, .7, .8, .9, .95, 1)   # quick-delta grid; row 1 was never filled in the original
qDvol[2, 2:14] <- c(.59612, .51112, .46112, .45612, .44612, .42612, .42612, .42612, .42612, .42612, .42612, .42612, .42612)
#x is the quick Vol point
x <- as.numeric(qDvol[1, 2:14])
#y is the vol at the quick Vol point
y <- as.numeric(qDvol[2, 2:14])
ncsivol <- data.frame(spline(x, y, xout = qDpoint, method = "natural"))
nroutput <- t(ncsivol[1:length(Pf), 2])   # was 1:24
ncsibyPf[1, ] <- c(nroutput)
The essential data points for this spline run are all included (I think), and everything should line up correctly. Thank you for your help ahead of time!
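One likely source of the discrepancy: MATLAB's spline() uses not-a-knot end conditions, while R's method "natural" forces zero second derivatives at the ends and method "fmm" (Forsythe, Malcolm and Moler) uses yet another rule, so the interpolants will generally disagree most near the boundary. A small sketch, using the quick-delta grid and vols from the question, showing how much the end condition alone changes the output:
x <- c(0, .05, .1, .2, .3, .4, .5, .6, .7, .8, .9, .95, 1)
y <- c(.59612, .51112, .46112, .45612, .44612, .42612, .42612,
       .42612, .42612, .42612, .42612, .42612, .42612)
xout <- seq(0, 1, by = 0.01)
s.nat <- spline(x, y, xout = xout, method = "natural")  # natural end conditions
s.fmm <- spline(x, y, xout = xout, method = "fmm")      # Forsythe-Malcolm-Moler ends
max(abs(s.nat$y - s.fmm$y))  # the end condition alone shifts the interpolated values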

Is there an implementation of loess in R with more than 3 parametric predictors or a trick to a similar effect?

Calling all experts on local regression and/or R!
I have run into a limitation of the standard loess function in R and hope you have some advice. The current implementation supports only 1-4 predictors. Let me set out our application scenario to show why this can easily become a problem as soon as we want to employ globally fit parametric covariables.
Essentially, we have a spatial distortion s(x,y) overlaid over a number of measurements z:
z_i = s(x_i,y_i) + v_{g_i}
These measurements z can be grouped by the same underlying undistorted measurement value v for each group g. The group membership g_i is known for each measurement, but the underlying undistorted measurement values v_g for the groups are not known and should be determined by (global, not local) regression.
We need to estimate the two-dimensional spatial trend s(x,y), which we then want to remove. In our application, say there are 20 groups of at least 35 measurements each, in the most simple scenario. The measurements are randomly placed. Taking the first group as reference, there are thus 19 unknown offsets.
The below code for toy data (with a spatial trend in one dimension x) works for two or three offset groups.
Unfortunately, the loess call fails for four or more offset groups with the error message
Error in simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, :
  only 1-4 predictors are allowed
I tried overriding the restriction and got
k>d2MAX in ehg136. Need to recompile with increased dimensions.
How easy would that be to do? I cannot find a definition of d2MAX anywhere, and it seems this might be hardcoded -- the error is apparently triggered by line #1359 in loessf.f
if(k .gt. 15) call ehg182(105)
Alternatively, does anyone know of an implementation of local regression with global (parametric) offset groups that could be applied here?
Or is there a better way of dealing with this? I tried lme with correlation structures but that seems to be much, much slower.
Any comments would be greatly appreciated!
Many thanks,
David
###
#
# loess with parametric offsets - toy data demo
#
x<-seq(0,9,.1);
x.N<-length(x);
o<-c(0.4,-0.8,1.2#,-0.2 # works for three but not four
); # these are the (unknown) offsets
o.N<-length(o);
f<-sapply(seq(o.N),
function(n){
ifelse((seq(x.N)<= n *x.N/(o.N+1) &
seq(x.N)> (n-1)*x.N/(o.N+1)),
1,0);
});
f<-f[sample(NROW(f)),];
y<-sin(x)+rnorm(length(x),0,.1)+f%*%o;
s.fs<-sapply(seq(NCOL(f)),function(i){paste('f',i,sep='')});
s<-paste(c('y~x',s.fs),collapse='+');
d<-data.frame(x,y,f)
names(d)<-c('x','y',s.fs);
l<-loess(formula(s),parametric=s.fs,drop.square=s.fs,normalize=F,data=d,
span=0.4);
yp<-predict(l,newdata=d);
plot(x,y,pch='+',ylim=c(-3,3),col='red'); # input data
points(x,yp,pch='o',col='blue'); # fit of that
d0<-d; d0$f1<-d0$f2<-d0$f3<-0;
yp0<-predict(l,newdata=d0);
points(x,y-f%*%o); # spatial distortion
lines(x,yp0,pch='+'); # estimate of that
op<-sapply(seq(NCOL(f)),function(i){(yp-yp0)[!!f[,i]][1]});
cat("Demo offsets:",o,"\n");
cat("Estimated offsets:",format(op,digits=1),"\n");
Why don't you use an additive model for this? Package mgcv will handle this sort of model, if I understand your question, just fine. I might have this wrong, but the code you show relates y ~ x, whereas your question mentions z ~ s(x, y) + g. What I show below for gam() is for a response z modelled by a spatial smooth in x and y, with g estimated parametrically and stored as a factor in the data frame:
require(mgcv)
m <- gam(z ~ s(x,y) + g, data = foo)
Or have I misunderstood what you wanted? If you want to post a small snippet of data I can give a proper example using mgcv...?
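To make that concrete, here is a self-contained sketch on simulated data (all names and values below are made up): 20 groups with unknown offsets on top of a smooth spatial trend, with gam() recovering the parametric group coefficients:
library(mgcv)
set.seed(1)
n <- 700
foo <- data.frame(x = runif(n), y = runif(n),
                  g = factor(sample(letters[1:20], n, replace = TRUE)))
v <- rnorm(20)                               # the unknown group values v_g
foo$z <- sin(2 * foo$x) * cos(2 * foo$y) +   # the spatial trend s(x, y)
  v[foo$g] + rnorm(n, 0, 0.1)                # group offset plus measurement noise
m <- gam(z ~ s(x, y) + g, data = foo)
coef(m)[grep("^g", names(coef(m)))]          # estimated offsets relative to the first group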

Drawing random values from a Fisher Distribution

In my research, I am generating discrete planes that are intended to represent fractures in rock. The orientation of a fracture plane is specified by its dip and dip direction. Knowing this, I also know the components of the normal vector for each plane.
So far, I have been drawing dip and dip direction independently from normal distributions. This is OK, but I would like to add the ability to draw from the Fisher distribution.
The Fisher distribution is described HERE.
Basically, I want to be able to specify an average dip and dip direction (or a mean vector) and a "fisher constant" or dispersion factor, k, and draw values randomly from that orientation distribution.
Additional info: it seems that the "von Mises-Fisher distribution" is either the same as what I've been calling the "Fisher distribution" or is somehow related.
As you can see, I've done some looking into this, but I admit that I don't fully understand the mathematics. I feel like I'm close, but am not quite getting it... Any help is much appreciated!
If it helps, my programming is in FORTRAN.
The algorithm is on page 59 of "Statistical Analysis of Spherical Data" by N. I. Fisher, T. Lewis and B. J. J. Embleton. I highly recommend that book; it will help you understand the mathematics.
The following will produce random Fisher-distributed locations centered on the North Pole. If you want them randomly centered, generate additional uniform random locations on the sphere and rotate the pole-centered locations onto them. If you are not sure of those steps, consult the aforementioned book. This Fortran code fragment uses a random number generator that produces uniform deviates between 0 and 1.
lambda = exp (-2.0 * kappa)                                 ! kappa is the concentration (dispersion) parameter
term1 = get_uniform_random () * (1.0 - lambda) + lambda     ! uniform deviate rescaled to [lambda, 1]
CoLat = 2.0 * asin ( sqrt ( -log (term1) / (2.0 * kappa) ) )  ! colatitude, 0 at the pole
Long = 2.0 * PI * get_uniform_random ()                     ! longitude, uniform on [0, 2*pi)
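For comparison, a sketch of the same algorithm in R (rfisher is my own name for it; it assumes kappa > 0 and returns colatitude/longitude about the North Pole):
# Fisher-distributed directions about the North Pole, via the algorithm above
rfisher <- function(n, kappa) {
  lambda <- exp(-2 * kappa)
  colat <- 2 * asin(sqrt(-log(runif(n) * (1 - lambda) + lambda) / (2 * kappa)))
  long <- 2 * pi * runif(n)
  cbind(colatitude = colat, longitude = long)
}
rfisher(5, kappa = 50)   # large kappa: points cluster tightly near the pole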
I think that you can do the math by hand.
Integrate the density function of the Fisher distribution to get the cumulative distribution function:
$F(\theta) = \exp(\kappa \cos\theta) / (\exp(\kappa) - \exp(-\kappa))$
The next step is to find the inverse cumulative distribution function $F^{-1}(y)$. This function fulfills
$F(\theta) = y \iff F^{-1}(y) = \theta$
I think that you get the following:
$F^{-1}(y) = \arccos\big(\log\big((\exp(\kappa) - \exp(-\kappa))\,y\big) / \kappa\big)$
Draw $y_1, y_2, y_3, y_4, \ldots$ from a uniform distribution on the interval $[0, 1]$.
Now the numbers $F^{-1}(y_1), F^{-1}(y_2), F^{-1}(y_3), F^{-1}(y_4)$ will be distributed according to the Fisher distribution.

approximation methods

I attached an image (source: piccy.info). It shows the diagram of a function defined at given points, for example at x = 1..N. A second diagram, drawn as a semitransparent curve, is what I want to get from the original one: I want to approximate the original function so that it becomes smooth.
Are there any methods for doing that?
I have heard of the least-squares method, which can be used to approximate a function by a straight line or by a parabola. But I do not need a parabolic approximation; I probably need to approximate by a trigonometric function.
So are there any methods for doing that?
And one idea: is it possible to use the least-squares method for this problem, if we can derive it for trigonometric functions?
One more question: if I use the discrete Fourier transform and think of the function as a sum of waves, maybe the noise has special features by which we can identify it; then we could set the corresponding frequencies to zero and perform the inverse Fourier transform.
If you think that is possible, what would you suggest for identifying the noise frequencies?
Unfortunately, many solutions presented here don't solve the problem and/or are plain wrong.
There are many approaches, each built for specific conditions and requirements you must be aware of:
a) Approximation theory: if you have a sharply defined function without errors (given either by definition or by data) and you want to trace it as exactly as possible, you use polynomial or rational approximation with Chebyshev or Legendre polynomials, meaning that you approach the function by a polynomial or, if it is periodic, by a Fourier series.
b) Interpolation: if you have a function where some points (but not the whole curve!) are given and you need a function that passes through these points, you can use several methods: Newton-Gregory, Newton with divided differences, Lagrange, Hermite, splines.
c) Curve fitting: you have a function with given points and you want to draw a curve with a given (!) functional form which approximates the curve as closely as possible. There are linear and nonlinear algorithms for this case.
Your drawing implies:
It is not remotely like a mathematical function.
It is not sharply defined by data or a function.
You need to fit the curve, not some points.
What you want and need is
d) Smoothing: given a curve or data points with noise or rapidly changing elements, you only want to see the slow changes over time.
You can do that with LOESS as Jacob suggested (but I find that overkill, especially because choosing a reasonable span takes some experience). For your problem, I simply recommend the running average as suggested by Jim C.
http://en.wikipedia.org/wiki/Running_average
Sorry, cdonner and Orendorff, your proposals are well-intentioned, but completely wrong because you are using the right tools for the wrong problem.
These guys used a sixth-degree polynomial to fit climate data and embarrassed themselves completely.
http://scienceblogs.com/deltoid/2009/01/the_australians_war_on_science_32.php
http://network.nationalpost.com/np/blogs/fullcomment/archive/2008/10/20/lorne-gunter-thirty-years-of-warmer-temperatures-go-poof.aspx
Use loess in R (free).
E.g., here the loess function approximates a noisy sine curve.
(figure: loess smoothing of a noisy sine curve; source: stowers-institute.org)
As you can see, you can tweak the smoothness of your curve with the span parameter.
Here's some sample R code from here:
Step-by-step procedure: take a sine curve, add some "noise" to it, and then see how the loess "span" parameter affects the look of the smoothed curve.
Create a sine curve and add some noise:
period <- 120
x <- 1:120
y <- sin(2*pi*x/period) + runif(length(x), -1, 1)
Plot the points on this noisy sine curve:
plot(x, y, main = "Sine Curve + 'Uniform' Noise")
mtext("showing loess smoothing (local regression smoothing)")
Apply loess smoothing using the default span value of 0.75:
y.loess <- loess(y ~ x, span = 0.75, data.frame(x = x, y = y))
Compute loess smoothed values for all points along the curve:
y.predict <- predict(y.loess, data.frame(x = x))
Plot the loess smoothed curve along with the points that were already plotted:
lines(x, y.predict)
You could use a digital filter like an FIR filter. The simplest FIR filter is just a running average. For more sophisticated treatment, look at something like an FFT.
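A minimal sketch of the running-average idea in R, on made-up data (the 7-point window is arbitrary):
set.seed(1)
x <- 1:200
y <- sin(x / 15) + rnorm(200, 0, 0.3)          # noisy signal
y.smooth <- stats::filter(y, rep(1/7, 7))      # centered 7-point moving average
plot(x, y); lines(x, y.smooth, col = "blue")   # smoothed curve over the raw points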
This is called curve fitting. The best way to do this is to find a numeric library that can do it for you. Here is a page showing how to do this using scipy. The picture on that page shows what the code does:
(figure: scipy least-squares curve-fit example; source: scipy.org)
Now it's only 4 lines of code, but the author doesn't explain it at all. I'll try to explain briefly here.
First you have to decide what form you want the answer to be. In this example the author wants a curve of the form
f(x) = p0 cos (2π/p1 x + p2) + p3 x
You might instead want the sum of several curves. That's OK; the formula is an input to the solver.
The goal of the example, then, is to find the constants p0 through p3 to complete the formula. scipy can find this array of four constants. All you need is an error function that scipy can use to see how close its guesses are to the actual sampled data points.
from numpy import cos, pi   # imports needed for the snippet to run
from scipy import optimize
fitfunc = lambda p, x: p[0]*cos(2*pi/p[1]*x+p[2]) + p[3]*x  # Target function
errfunc = lambda p: fitfunc(p, Tx) - tX  # Distance to the target function; Tx, tX are the sampled data from the linked page
errfunc takes just one parameter: an array of length 4. It plugs those constants into the formula and calculates an array of values on the candidate curve, then subtracts the array of sampled data points tX. The result is an array of error values; presumably scipy will take the sum of the squares of these values.
Then just put some initial guesses in and scipy.optimize.leastsq crunches the numbers, trying to find a set of parameters p where the error is minimized.
p0 = [-15., 0.8, 0., -1.] # Initial guess for the parameters
p1, success = optimize.leastsq(errfunc, p0[:])
The result p1 is an array containing the four constants. success is 1, 2, 3, or 4 if the solver actually found a solution. (If errfunc is sufficiently crazy, the solver can fail.)
This looks like polynomial approximation. You can play with polynomials in Excel ("Add Trendline" on a chart, select Polynomial, then increase the order to the level of approximation that you need). It shouldn't be too hard to find an algorithm/code for that.
Excel can also show the equation it came up with for the approximation.
