generating random x and y coordinates with a minimum distance - r

Is there a way in R to generate random coordinates with a minimum distance between them?
E.g. what I'd like to avoid
x <- c(0,3.9,4.1,8)
y <- c(1,4.1,3.9,7)
plot(x~y)

This is a classical problem from stochastic geometry. Completely random points in space where the number of points falling in disjoint regions are independent of each other corresponds to a homogeneous Poisson point process (in this case in R^2, but could be in almost any space).
An important feature is that the total number of points has to be random before you can have independence of the counts of points in disjoint regions.
For the Poisson process points can be arbitrarily close together. If you define a process by sampling the Poisson process until you don't have any points that are too close together you have the so-called Gibbs Hardcore process. This has been studied a lot in the literature and there are different ways to simulate it. The R package spatstat has functions to do this. rHardcore is a perfect sampler, but if you want a high intensity of points and a big hard core distance it may not terminate in finite time... The distribution can be obtained as the limit of a Markov chain and rmh.default lets you run a Markov chain with a given Gibbs model as its invariant distribution. This finishes in finite time but only gives a realisation of an approximate distribution.
In rmh.default you can also simulate conditional on a fixed number of points. Note that when you sample in a finite box there is of course an upper limit to how many points you can fit with a given hard core radius, and the closer you are to this limit the more problematic it becomes to sample correctly from the distribution.
Example:
library(spatstat)
beta <- 100; R = 0.1
win <- square(1) # Unit square for simulation
X1 <- rHardcore(beta, R, W = win) # Exact sampling -- beware it may run forever for some par.!
plot(X1, main = paste("Exact sim. of hardcore model; beta =", beta, "and R =", R))
minnndist(X1) # Observed min. nearest neighbour dist.
#> [1] 0.102402
Approximate simulation
model <- rmhmodel(cif="hardcore", par = list(beta=beta, hc=R), w = win)
X2 <- rmh(model)
#> Checking arguments..determining simulation windows...Starting simulation.
#> Initial state...Ready to simulate. Generating proposal points...Running Metropolis-Hastings.
plot(X2, main = paste("Approx. sim. of hardcore model; beta =", beta, "and R =", R))
minnndist(X2) # Observed min. nearest neighbour dist.
#> [1] 0.1005433
Approximate simulation conditional on number of points
X3 <- rmh(model, control = rmhcontrol(p=1), start = list(n.start = 42))
#> Checking arguments..determining simulation windows...Starting simulation.
#> Initial state...Ready to simulate. Generating proposal points...Running Metropolis-Hastings.
plot(X3, main = paste("Approx. sim. given n =", 42))
minnndist(X3) # Observed min. nearest neighbour dist.
#> [1] 0.1018068

OK, how about this? You just generate random number pairs without restriction and then remove the onces which are too close. This could be a great start for that:
minimumDistancePairs <- function(x, y, minDistance){
i <- 1
repeat{
distance <- sqrt((x-x[i])^2 + (y-y[i])^2) < minDistance # pythagorean theorem
distance[i] <- FALSE # distance to oneself is always zero
if(any(distance)) { # if too close to any other point
x <- x[-i] # remove element from x
y <- y[-i] # and remove element from y
} else { # otherwise...
i = i + 1 # repeat the procedure with the next element
}
if (i > length(x)) break
}
data.frame(x,y)
}
minimumDistancePairs(
c(0,3.9,4.1,8)
, c(1,4.1,3.9,7)
, 1
)
will lead to
x y
1 0.0 1.0
2 4.1 3.9
3 8.0 7.0
Be aware, though, of the fact that these are not random numbers anymore (however you solve problem).

You can use rejection sapling https://en.wikipedia.org/wiki/Rejection_sampling
The principle is simple: you resample until you data verify the condition.
> set.seed(1)
>
> x <- rnorm(2)
> y <- rnorm(2)
> (x[1]-x[2])^2+(y[1]-y[2])^2
[1] 6.565578
> while((x[1]-x[2])^2+(y[1]-y[2])^2 > 1) {
+ x <- rnorm(2)
+ y <- rnorm(2)
+ }
> (x[1]-x[2])^2+(y[1]-y[2])^2
[1] 0.9733252
>

The following is a naive hit-and-miss approach which for some choices of parameters (which were left unspecified in the question) works well. If performance becomes an issue, you could experiment with the package gpuR which has a GPU-accelerated distance matrix calculation.
rand.separated <- function(n,x0,x1,y0,y1,d,trials = 1000){
for(i in 1:trials){
nums <- cbind(runif(n,x0,x1),runif(n,y0,y1))
if(min(dist(nums)) >= d) return(nums)
}
return(NA) #no luck
}
This repeatedly draws samples of size n in [x0,x1]x[y0,y1] and then throws the sample away if it doesn't satisfy. As a safety, trials guards against an infinite loop. If solutions are hard to find or n is large you might need to increase or decrease trials.
For example:
> set.seed(2018)
> nums <- rand.separated(25,0,10,0,10,0.2)
> plot(nums)
runs almost instantly and produces:

Im not sure what you are asking.
if you want random coordinates here.
c(
runif(1,max=y[1],min=x[1]),
runif(1,max=y[2],min=x[2]),
runif(1,min=y[3],max=x[3]),
runif(1,min=y[4],max=x[4])
)

Related

How to find the probability of extinction = 1 using Galton-Watson process in R?

I am simulating a basic Galton-Watson process (GWP) using a geometric distribution. I'm using this to find the probability of extinction for each generation. My question is, how do I find the generation at which the probability of extinction is equal to 1?
For example, I can create a function for the GWP like so:
# Galton-Watson Process for geometric distribution
GWP <- function(n, p) {
Sn <- c(1, rep(0, n))
for (i in 2:(n + 1)) {
Sn[i] <- sum(rgeom(Sn[i - 1], p))
}
return(Sn)
}
where, n is the number of generations.
Then, if I set the geometric distribution parameter p = 0.25... then to calculate the probability of extinction for, say, generation 10, I just do this:
N <- 10 # Number of elements in the initial population.
GWn <- replicate(N, GWP(10, 0.25)[10])
probExtinction <- sum(GWn==0)/N
probExtinction
This will give me the probability of extinction for generation 10... to find the probability of extinction for each generation I have to change the index value (to the corresponding generation number) when creating GWn... But what I'm trying to do is find at which generation will the probability of extinction = 1.
Any suggestions as to how I might go about solving this problem?
I can tell you how you would do this problem in principle, but I'm going to suggest that you may run into some difficulties (if you already know everything I'm about to say, just take it as advice to the next reader ...)
theoretically, the Galton-Watson process extinction probability never goes exactly to 1 (unless prob==1, or in the infinite-time limit)
of course, for any given replicate and random-number seed you can compute the first time point (if any) at which all of your lineages have gone extinct. This will be highly variable across runs, depending on the random-number seed ...
the distribution of extinction times is extremely skewed; lineages that don't go extinct immediately will last a loooong time ...
I modified your GWP function in two ways to make it more efficient: (1) stop the simulation when the lineage goes extinct; (2) replace the sum of geometric deviates with a single negative binomial deviate (see here)
GWP <- function(n, p) {
Sn <- c(1, rep(0, n))
for (i in 2:(n + 1)) {
Sn[i] <- rnbinom(1, size=Sn[i - 1], prob=p)
if (Sn[i]==0) break ## extinct, bail out
}
return(Sn)
}
The basic strategy now is: (1) run the simulations for a while, keep the entire trajectory; (2) compute extinction probability in every generation; (3) find the first generation such that p==1.
set.seed(101)
N <- 10 # Number of elements in the initial population.
maxgen <- 100
GWn <- replicate(N, GWP(maxgen, 0.5), simplify="array")
probExtinction <- rowSums(GWn==0)/N
which(probExtinction==1)[1]
(Subtract 1 from the last result if you want to start indexing from generation 0.) In this case the answer is NA, because there's 1/10 lineages that manages to stay alive (and indeed gets very large, so it will probably persist almost forever)
plot(0:maxgen, probExtinction, type="s") ## plot extinction probability
matplot(1+GWn,type="l",lty=1,col=1,log="y") ## plot lineage sizes (log(1+x) scale)
## demonstration that (sum(rgeom(n,...)) is equiv to rnbinom(1,size=n,...)
nmax <- 70
plot(prop.table(table(replicate(10000, sum(rgeom(10, prob=0.3))))),
xlim=c(0,nmax))
points(0:nmax,dnbinom(0:nmax, size=10, prob=0.3), col=2,pch=16)

Maximum pseudo-likelihood estimator for soft-core point process

I am trying to fit a soft-core point process model on a set of point pattern using maximum pseudo-likelihood. I followed the instructions given in this paper by Baddeley and Turner
And here is the R-code I came up with
`library(deldir)
library(tidyverse)
library(fields)
#MPLE
# irregular parameter k
k <- 0.4
## Generate dummy points 50X50. "RA" and "DE" are x and y coordinates
dum.x <- seq(ramin, ramax, length = 50)
dum.y <- seq(demin, demax, length = 50)
dum <- expand.grid(dum.x, dum.y)
colnames(dum) <- c("RA", "DE")
## Combine with data and specify which is data point and which is dummy, X is the point pattern to be fitted
bind.x <- bind_rows(X, dum) %>%
mutate(Ind = c(rep(1, nrow(X)), rep(0, nrow(dum))))
## Calculate Quadrature weights using Voronoi cell area
w <- deldir(bind.x$RA, bind.x$DE)$summary$dir.area
## Response
y <- bind.x$Ind/w
# the sum of distances between all pairs of points (the sufficient statistics)
tmp <- cbind(bind.x$RA, bind.x$DE)
t1 <- rdist(tmp)^(-2/k)
t1[t1 == Inf] <- 0
t1 <- rowSums(t1)
t <- -t1
# fit the model using quasipoisson regression
fit <- glm(y ~ t, family = quasipoisson, weights = w)
`
However, the fitted parameter for t is negative which is obviously not a correct value for a softcore point process. Also, my point pattern is actually simulated from a softcore process so it does not make sense that the fitted parameter is negative. I tried my best to find any bugs in the code but I can't seem to find it. The only potential issue I see is that my sufficient statistics is extremely large (on the order of 10^14) which I fear may cause numerical issues. But the statistics are large because my observation window spans a very small unit and the average distance between a pair of points is around 0.006. So sufficient statistics based on this will certainly be very large and my intuition tells me that it should not cause a numerical problem and make the fitted parameter to be negative.
Can anybody help and check if my code is correct? Thanks very much!

Simulate Compound poisson process in r

I'm trying to simulate a compound Poisson process in r. The process is defined by $ \sum_{j=1}^{N_t} Y_j $ where $Y_n$ is i.i.d sequence independent $N(0,1)$ values and $N_t$ is a Poisson process with parameter $1$. I'm trying to simulate this in r without luck. I have an algorithm to compute this as follows:
Simutale the cPp from 0 to T:
Initiate: $ k = 0 $
Repeat while $\sum_{i=1}^k T_i < T$
Set $k = k+1$
Simulate $T_k \sim exp(\lambda)$ (in my case $\lambda = 1$)
Simulate $Y_k \sim N(0,1)$ (This is just a special case, I would like to be able to change this to any distribution)
The trajectory is given by $X_t = \sum_{j=1}^{N_t} Y_j $ where $N(t) = sup(k : \sum_{i=1}^k T_i \leq t )$
Can someone help me to simulate this in r so that I can plot the process? I have tried, but can't get it done.
Use cumsum for the cumulative sums that determine the times N_t as well as the X_t. This illustrative code specifies the number of times to simulate, n, simulates the times in n.t and the values in x, and (to display what it has done) plots the trajectory.
n <- 1e2
n.t <- cumsum(rexp(n))
x <- c(0,cumsum(rnorm(n)))
plot(stepfun(n.t, x), xlab="t", ylab="X")
This algorithm, since it relies on low-level optimized functions, is fast: the six-year-old system I tested it on will generate over three million (time, value) pairs per second.
That's usually good enough for simulation, but it doesn't quite satisfy the problem, which asks to generate a simulation out to time T. We can leverage the preceding code, but the solution is a little trickier. It computes a reasonable upper limit on how many times will occur in the Poisson process before time T. It generates the inter-arrival times. This is wrapped in a loop that will repeat the procedure in the (rare) event the time T is not actually reached.
The additional complexity doesn't change the asymptotic calculation time.
T <- 1e2 # Specify the end time
T.max <- 0 # Last time encountered
n.t <- numeric(0) # Inter-arrival times
while (T.max < T) {
#
# Estimate how many random values to generate before exceeding T.
#
T.remaining <- T - T.max
n <- ceiling(T.remaining + 3*sqrt(T.remaining))
#
# Continue the Poisson process.
#
n.new <- rexp(n)
n.t <- c(n.t, n.new)
T.max <- T.max + sum(n.new)
}
#
# Sum the inter-arrival times and cut them off after time T.
#
n.t <- cumsum(n.t)
n.t <- n.t[n.t <= T]
#
# Generate the iid random values and accumulate their sums.
#
x <- c(0,cumsum(rnorm(length(n.t))))
#
# Display the result.
#
plot(stepfun(n.t, x), xlab="t", ylab="X", sub=paste("n =", length(n.t)))

Draw random numbers from distribution within a certain range

I want to draw a number of random variables from a series of distributions. However, the values returned have to be no higher than a certain threshold.
Let’s say I want to use the gamma distribution and the threshold is 10 and I need n=100 random numbers. I now want 100 random number between 0 and 10. (Say scale and shape are 1.)
Getting 100 random variables is obviously easy...
rgamma(100, shape = 1, rate = 1)
But how can I accomplish that these values range from 0 to 100?
EDIT
To make my question clearer. The 100 values drawn should be scaled beween 0 and 10. So that the highest drawn value is 10 and the lowest 0. Sorry if this was not clear...
EDIT No2
To add some context to the random numbers I need: I want to draw "system repair times" that follow certain distributions. However, within the system simulation there is a binomial probability of repairs beeing "simple" (i.e. short repair time) and "complicated" (i.e. long repair time). I now need a function that provides "short repair times" and one that provides "long repair times". The threshold would be the differentiation between short and long repair times. Again, I hope this makes my question a little clearer.
This is not possible with a gamma distribution.
The support of a distribution determine the range of sample data drawn from it.
As the support of the gamma distribution is (0,inf) this is not possible.(see https://en.wikipedia.org/wiki/Gamma_distribution).
If you really want to have a gamma distribution take a rejection sampling approach as Alex Reynolds suggests.
Otherwise look for a distribution with a bounded/finite support (see https://en.wikipedia.org/wiki/List_of_probability_distributions)
e.g. uniform or binomial
Well, fill vector with rejection, untested code
v <- rep(-1.0, 100)
k <- 1
while (TRUE) {
q <- rgamma(1, shape=1, rate=1)
if (q > 0.0 && q < 100) {
v[k] <- q
k<-k+1
if (k>100)
break
}
}
I'm not sure you can keep the properties of the original distribution, imposing additional conditions... But something like this will do the job:
Filter(function(x) x < 10, rgamma(1000,1,1))[1:100]
For the scaling - beware, the outcome will not follow the original distribution (but there's no way to do it, as the other answers pointed out):
# rescale numeric vector into (0, 1) interval
# clip everything outside the range
rescale <- function(vec, lims=range(vec), clip=c(0, 1)) {
# find the coeficients of transforming linear equation
# that maps the lims range to (0, 1)
slope <- (1 - 0) / (lims[2] - lims[1])
intercept <- - slope * lims[1]
xformed <- slope * vec + intercept
# do the clipping
xformed[xformed < 0] <- clip[1]
xformed[xformed > 1] <- clip[2]
xformed
}
# this is the requested data
10 * rescale(rgamma(100,1,1))
Use truncdist package. It truncates any distribution between upper and lower bounds.
Hope that helped.

Generating Random Variables with given correlations between pairs of them:

I want to generate 2 continuous random variables Q1, Q2 (quantitative traits, each are normal) and 2 binary random variables Z1, Z2 (binary traits) with given pairwise correlations between all possible pairs of them.
Say
(Q1,Q2):0.23
(Q1,Z1):0.55
(Q1,Z2):0.45
(Q2,Z1):0.4
(Q2,Z2):0.5
(Z1,Z2):0.47
Please help me generate such data in R.
This is crude but might get you started in the right direction.
library(copula)
options(digits=3)
probs <- c(0.5,0.5)
corrs <- c(0.23,0.55,0.45,0.4,0.5,0.47) ## lower triangle
Simulate correlated values (first two quantitative, last two transformed to binary)
sim <- function(n,probs,corrs) {
tmp <- normalCopula( corrs, dim=4 , "un")
getSigma(tmp) ## test
x <- rCopula(1000, tmp)
x2 <- x
x2[,3:4] <- qbinom(x[,3:4],size=1,prob=rep(probs,each=nrow(x)))
x2
}
Test SSQ distance between observed and target correlations:
objfun <- function(corrs,targetcorrs,probs,n=1000) {
cc <- try(cor(sim(n,probs,corrs)),silent=TRUE)
if (is(cc,"try-error")) return(NA)
sum((cc[lower.tri(cc)]-targetcorrs)^2)
}
See how bad things are when input corrs=target:
cc0 <- cor(sim(1000,probs=probs,corrs=corrs))
cc0[lower.tri(cc0)]
corrs
objfun(corrs,corrs,probs=probs) ## 0.112
Now try to optimize.
opt1 <- optim(fn=objfun,
par=corrs,
targetcorrs=corrs,probs=c(0.5,0.5))
opt1$value ## 0.0208
Stops after 501 iterations with "max iterations exceeded". This will never work really well because we're trying to use a deterministic hill-climbing algorithm on a stochastic objective function ...
cc1 <- cor(sim(1000,probs=c(0.5,0.5),corrs=opt1$par))
cc1[lower.tri(cc1)]
corrs
Maybe try simulated annealing?
opt2 <- optim(fn=objfun,
par=corrs,
targetcorrs=corrs,probs=c(0.5,0.5),
method="SANN")
It doesn't seem to do much better than the previous value. Two possible problems (left as an exercise for the reader are) (1) we have specified a set of correlations that are not feasible with the marginal distributions we have chosen, or (2) the error in the objective function surface is getting in the way -- to do better we would have to average over more replicates (i.e. increase n).

Resources