Levy Walk simulation in R

I am trying to generate a series of numbers to simulate a Levy Walk in R. Currently I am using the following code:
alpha=2
n=1000
x=rep(0,n)
y=rep(0,n)
for (i in 2:n){
theta=runif(1)*2*pi
f=runif(1)^(-1/alpha)
x[i]=x[i-1]+f*cos(theta)
y[i]=y[i-1]+f*sin(theta)
}
The code works as expected and generates the numbers I need. The figure below shows one such Levy Walk:
The following histogram confirms that the generated step lengths (i.e. f) follow a power law:
My question is as follows:
The generated step lengths (i.e. f) are quite large. How can I modify the code so that the step lengths fall only within some bound [fmin, fmax]?
P.S. I have intentionally not vectorized the code.

Try using this:
f=runif(1, fmax^(-alpha), fmin^(-alpha))^(-1/alpha)
Note that you need 0 < fmin < fmax.
BTW, you can vectorize your code like this:
theta <- runif(n-1)*2*pi
f <- runif(n-1, fmax^(-alpha), fmin^(-alpha))^(-1/alpha)
x <- c(0, cumsum(f*cos(theta)))
y <- c(0, cumsum(f*sin(theta)))
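As a quick check of the bounded sampler above, the following sketch wraps it in a helper and confirms that every step falls inside [fmin, fmax]. The helper name rbounded_step and the values fmin = 1 and fmax = 50 are illustrative assumptions, not part of the original code.

```r
# Hypothetical helper: draw n step lengths from a power law with tail
# exponent alpha, truncated to [fmin, fmax], by inverse-transform sampling.
rbounded_step <- function(n, alpha, fmin, fmax) {
  stopifnot(alpha > 0, fmin > 0, fmin < fmax)
  # A uniform draw on [fmax^-alpha, fmin^-alpha] maps back to [fmin, fmax]
  u <- runif(n, fmax^(-alpha), fmin^(-alpha))
  u^(-1 / alpha)
}

set.seed(42)
f <- rbounded_step(1000, alpha = 2, fmin = 1, fmax = 50)
range(f)  # both ends should lie within [1, 50]
```

The input check makes the 0 < fmin < fmax requirement explicit, so a swapped pair of bounds fails early instead of silently producing NaN values.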

Just for precision, what you're simulating here is a Lévy flight. For it to be a Lévy walk, you should let the particle "walk" from the beginning to the end of each flight (with a for loop, for example). If you plot your resulting simulation with plot(x, y, type = "o"), you will see that your code produces no positions within flights (no walking).
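One way to add the intermediate positions without an explicit loop is to interpolate along each flight. The sketch below assumes that "walking" means moving at constant speed and recording one position per unit of distance travelled. That choice, the bounds fmin = 1 and fmax = 50, and the names s, s_walk, walk_x and walk_y are assumptions made for illustration.

```r
set.seed(1)
alpha <- 2; fmin <- 1; fmax <- 50; n <- 200

# Turning points of the flight, generated as in the vectorized code
theta <- runif(n - 1) * 2 * pi
f <- runif(n - 1, fmax^(-alpha), fmin^(-alpha))^(-1 / alpha)
x <- c(0, cumsum(f * cos(theta)))
y <- c(0, cumsum(f * sin(theta)))

# Cumulative distance travelled when each turning point is reached
s <- c(0, cumsum(f))

# Record a position after every unit of distance (constant speed)
s_walk <- seq(0, max(s), by = 1)
walk_x <- approx(s, x, xout = s_walk)$y
walk_y <- approx(s, y, xout = s_walk)$y

plot(walk_x, walk_y, type = "o", cex = 0.3)
```

Long flights now show up as runs of evenly spaced points rather than a single jump.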

library(ggplot2)
alpha= 5
n= 1000
fmin= 1
fmax= n
# The draws below are already vectorized, so no for loop is needed
theta= runif(n-1)*2*pi
f= runif(n-1, fmax^(-alpha), fmin^(-alpha))^(-1/alpha)
# Positions are the cumulative sums of the steps, starting at the origin
x= c(0, cumsum(f*cos(theta)))
y= c(0, cumsum(f*sin(theta)))
ggplot(data.frame(x=x, y=y), aes(x, y))+geom_point()+geom_path()

Related

How can I fit a smooth hysteresis in R?

I have a measurement which should follow a hysteresis loop. For visualisation purposes I would like to plot a line approximating the hysteresis to help explain this pattern.
I created an example in the following image using the code below.
I would like to have an output similar to the green curve - however I don't have this data directly available, and I don't care whether it is pointy.
However, most smoothing functions, such as smooth.spline (plotted in blue), do not allow loops. The closest I could find is the bezier library (plotted in red). It is not clearly visible here, but it does produce a loop; however, it fits poorly, gives some warnings, and takes quite some time.
Can you suggest a method?
set.seed(12345)
up <- seq(0,1,length.out=100)^3
down <- sqrt(seq(1,0,length.out=100))
x <- c(seq(0,1,length.out=length(up)),
seq(1,0, length.out=length(down)))
data <- data.frame(x=x, y=c(up,down),
measuredx=x + rnorm(length(x))*0.01,
measuredy=c(up,down) + rnorm(length(up)+length(down))*0.03)
with(data,plot(measuredx,measuredy, type = "p"))
with(data,lines(x,y, col='green'))
sp <- with(data,smooth.spline(measuredx, measuredy))
with(sp, lines(x,y, col="blue"))
library(bezier)
bf <- bezierCurveFit(as.matrix(data[,c(1,3)]))
lines(bezier(t=seq(0, 1, length=500), p=bf$p), col="red", cex=0.25)
UPDATE
As it turns out, my actual problem is slightly different, so I asked another question that reflects my actual issue: How to fit a smooth hysteresis in a poorly distributed data set?
set.seed(12345)
up <- seq(0,1,length.out=100)^3
down <- sqrt(seq(1,0,length.out=100))
x <- c(seq(0,1,length.out=length(up)),
seq(1,0, length.out=length(down)))
data <- data.frame(x=x, y=c(up,down),
measuredx=x + rnorm(length(x))*0.01,
measuredy=c(up,down) + rnorm(length(up)+length(down))*0.03)
Instead of smoothing data$measuredy directly against data$measuredx, do two separate smooths, each against a time stamp variable, then combine the fitted values from the two. This is a general way of smoothing a closed curve or a loop. (See also Q & A: Smoothing Continuous 2D Points)
t <- seq_len(nrow(data) + 1)
xs <- smooth.spline(t, c(data$measuredx, data$measuredx[1]))$y
ys <- smooth.spline(t, c(data$measuredy, data$measuredy[1]))$y
with(data, plot(measuredx, measuredy))
lines(xs, ys)
c(data$measuredx, data$measuredx[1]), for example, just ensures that the last value in the vector agrees with the first, so that it completes a cycle.
The curve is not really closed at the bottom left corner, because smooth.spline does smoothing, not interpolation. Even if we have ensured that the data vector completes a cycle, the fitted curve may not be closed. A practical workaround is to use weighted regression, imposing a heavy weight on this spot to close the curve.
t <- seq_len(nrow(data) + 1)
w <- rep(1, length(t)) ## initially identical weight everywhere
w[c(1, length(w))] <- 100000 ## give heavy weight
xs <- smooth.spline(t, c(data$measuredx, data$measuredx[1]), w)$y
ys <- smooth.spline(t, c(data$measuredy, data$measuredy[1]), w)$y
with(data, plot(measuredx, measuredy, col = 8))
lines(xs, ys, lwd = 2)
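To see what the heavy end weights buy, the following sketch measures the distance between the first and last fitted points with and without them. The helper closure_gap and the variables cx, cy, gap_plain and gap_weighted are hypothetical names introduced for this check. The data are regenerated exactly as in the example above.

```r
set.seed(12345)
up <- seq(0, 1, length.out = 100)^3
down <- sqrt(seq(1, 0, length.out = 100))
x <- c(seq(0, 1, length.out = 100), seq(1, 0, length.out = 100))
measuredx <- x + rnorm(200) * 0.01
measuredy <- c(up, down) + rnorm(200) * 0.03

# Append the first point so the data vector completes the cycle
cx <- c(measuredx, measuredx[1])
cy <- c(measuredy, measuredy[1])
t <- seq_along(cx)

# Distance between the first and last fitted points of the loop
closure_gap <- function(w = NULL) {
  xs <- smooth.spline(t, cx, w = w)$y
  ys <- smooth.spline(t, cy, w = w)$y
  n <- length(t)
  sqrt((xs[1] - xs[n])^2 + (ys[1] - ys[n])^2)
}

w <- rep(1, length(t))
w[c(1, length(w))] <- 100000  # heavy weight on both ends of the cycle

gap_plain <- closure_gap()
gap_weighted <- closure_gap(w)
c(unweighted = gap_plain, weighted = gap_weighted)
```

The weighted gap should be close to zero, which is what makes the plotted loop look closed.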

How to Plot an exponential distribution of spike times over a histogram of them, in R?

My question follows on from my last one. I have been working on plotting the spike times of a spike train as a raster plot. I took a firing rate of 100 and generated spike trains for 20 trials. The code for that is:
fr = 100
dt = 1/1000 #time step in seconds (1 ms)
duration = 2 #no of duration in s
nBins = 2000 #SpikeTrain
nTrials = 20 #NumberOfSimulations
MyPoissonSpikeTrain = function(p, fr= 100) {
p = runif(nBins)
q = ifelse(p < fr*dt, 1, 0)
return(q)
}
set.seed(1)
SpikeMat <- t(replicate(nTrials, MyPoissonSpikeTrain()))
plot(x=-1,y=-1, xlab="time (s)", ylab="Trial",
main="Spike trains",
ylim=c(0.5, nTrials+1), xlim=c(0, duration))
for (i in 1: nTrials)
{
clip(x1 = 0, x2= duration, y1= (i-0.2), y2= (i+0.4))
abline(h=i, lwd= 1/4)
abline(v= dt*which( SpikeMat[i,]== 1))
}
This gives the result:
After all this was done, my next task was to get a vector of Inter-Spike intervals and get a histogram of them. Because the distribution of ISIs follows the exponential distribution, if I plot the exponential distribution of ISIs with the same data, it will match the curve made by the height of the histograms.
So to get the interspike timings first, I used:
spike_times <- c(dt*which( SpikeMat[i, ]==1))
Then to get a vector for interspike intervals and their histogram, I used the following command line,
ISI <- diff(spike_times)
hist(ISI, density= 10, col= 'blue', xlab='ISI (s)', ylab='number of occurrences')
and it gave me this plot:
What I want now is to overlay an exponential density on the histogram, to show that the inter-spike intervals are exponentially distributed. I am confused about which parameters and which rate to use. If somebody has worked with inter-spike interval plots, please help. I am sorry if my data seems incomplete; please let me know if I am missing something.
A fellow researcher just gave me these few lines of code:
x <- seq(0, 0.05, length=1000)
y <- dexp(x, rate=100)
lines(x,y)
which gave me, this:
If somebody has any way of making this process more efficient, please help me.
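One possible tidy-up, offered as a sketch rather than a definitive answer: pool the ISIs from all trials, estimate the rate from the data as 1/mean(ISI), and draw the histogram on the density scale (freq = FALSE) so that its bar heights are comparable with dexp(). The spike matrix here is generated in a single vectorized call instead of with MyPoissonSpikeTrain. That shortcut and the name rate_hat are assumptions made for illustration.

```r
set.seed(1)
fr <- 100        # firing rate in Hz
dt <- 1 / 1000   # time step in seconds (1 ms)
nBins <- 2000
nTrials <- 20

# One row per trial; TRUE marks a spike in that 1 ms bin
SpikeMat <- matrix(runif(nTrials * nBins) < fr * dt, nrow = nTrials)

# Inter-spike intervals (in seconds) within each trial, pooled over trials
ISI <- unlist(lapply(seq_len(nTrials), function(i) {
  diff(dt * which(SpikeMat[i, ]))
}))

# Maximum-likelihood rate estimate for an exponential distribution
rate_hat <- 1 / mean(ISI)

# freq = FALSE puts the histogram on the density scale, like dexp()
hist(ISI, breaks = 40, freq = FALSE, density = 10, col = "blue",
     xlab = "ISI (s)", main = "Inter-spike intervals")
curve(dexp(x, rate = rate_hat), from = 0, to = max(ISI),
      add = TRUE, col = "red", lwd = 2)
```

Because the spikes are drawn in discrete 1 ms bins, the ISIs are strictly geometric rather than exponential, so the curve is a close approximation, not an exact match.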

Graphing a polynomial output of poly.calc

I apologize first for bringing what I imagine to be a ridiculously simple problem here, but I have been unable to glean from the help file for package 'polynom' how to solve this problem. For one out of several years, I have two vectors of x (d for day of year) and y (e for an index of egg production) data:
d=c(169,176,183,190,197,204,211,218,225,232,239,246)
e=c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
I want to, for each year, use the poly.calc function to create a polynomial function that I can use to interpolate the timing of maximum egg production. I want then to superimpose the function on a plot of the data. To begin, I have no problem with the poly.calc function:
egg1996<-poly.calc(d,e)
egg1996
3216904000 - 173356400*x + 4239900*x^2 - 62124.17*x^3 + 605.9178*x^4 - 4.13053*x^5 +
0.02008226*x^6 - 6.963636e-05*x^7 + 1.687736e-07*x^8
I can then simply
plot(d,e)
But when I try to use the lines function to superimpose the function on the plot, I get confused. The help file states that the output of poly.calc is an object of class polynomial, and so I assume that "egg1996" will be the "x" in:
lines(x, len = 100, xlim = NULL, ylim = NULL, ...)
But I cannot seem to, based on the example listed:
lines (poly.calc( 2:4), lty = 2)
Or based on the arguments:
x an object of class "polynomial".
len size of vector at which evaluations are to be made.
xlim, ylim the range of x and y values with sensible defaults
Come up with a command that successfully graphs the polynomial "egg1996" onto the raw data.
I understand that this question is beneath you folks, but I would be very grateful for a little help. Many thanks.
I don't work with the polynom package, but the resultant data set is on a completely different scale (both X & Y axes) than the first plot() call. If you don't mind having it in two separate panels, this provides both plots for comparison:
library(polynom)
d <- c(169,176,183,190,197,204,211,218,225,232,239,246)
e <- c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,
0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
egg1996 <- poly.calc(d,e)
par(mfrow=c(1,2))
plot(d, e)
plot(egg1996)
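If a single panel is preferred, one base-R alternative is to evaluate the interpolating polynomial directly in Lagrange form on a grid of days and draw it with lines(). This is a sketch that does not use the polynom package at all. The helper lagrange_eval and the names grid and fit are hypothetical and introduced only for illustration.

```r
d <- c(169, 176, 183, 190, 197, 204, 211, 218, 225, 232, 239, 246)
e <- c(0, 0, 0.006839425, 0.027323127, 0.024666883, 0.005603878,
       0.016599262, 0.002810977, 0.005603878, 0, 0.002810977, 0.002810977)

# Hypothetical helper: evaluate the polynomial that passes through
# (x_nodes, y_nodes) at the points x_new, using the Lagrange form.
lagrange_eval <- function(x_new, x_nodes, y_nodes) {
  sapply(x_new, function(x0) {
    basis <- sapply(seq_along(x_nodes), function(j) {
      prod((x0 - x_nodes[-j]) / (x_nodes[j] - x_nodes[-j]))
    })
    sum(basis * y_nodes)
  })
}

grid <- seq(min(d), max(d), length.out = 500)
fit <- lagrange_eval(grid, d, e)

# Widen the y-axis so the whole curve is visible alongside the data
plot(d, e, ylim = range(e, fit), xlab = "Day of year", ylab = "Egg index")
lines(grid, fit, col = "red")

# Day at which the interpolated curve is highest
grid[which.max(fit)]
```

A polynomial of this degree through equally spaced points can oscillate strongly between the data, especially near the ends of the range, so the location of its maximum should be treated with caution.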

Generate two categorical variables with a chosen degree of association in R

I'd like to use R to generate two categorical variables (such as eye color and hair color, for instance) where I can specify the degree to which these two variables are associated. It doesn't really matter to me which levels of eye color would be associated with which levels of hair color, but just being able to specify an overall association, such as by specifying the odds ratio, is a requirement. Also, I know there are ways to do this for two normally distributed continuous variables using, for example, the mvtnorm package, so I could take that route and then choose cut points to make the variables categorical after the fact, but I don't want to do it that way if I can avoid it. Any help would be greatly appreciated!
Edit: apologies for not being clearer from the start, but what I'm really asking I suppose is whether or not there's a function anybody knows of in some R package that will do this in one or two lines.
If you can specify the odds ratios (and you also need to specify the baseline odds), you just convert them to probabilities and use runif().
Edit (I misunderstood the question): Take a look at the bindata package.
If you like, here is a function I wrote that you can use to generate such data without the package. It is rather clunky; it's intended to be self-explanatory rather than elegant or fast.
odds.to.probs <- function(odds){
probs <- odds / (odds+1)
return(probs)
}
get.correlated.binary.data <- function(N, odds.x.eq.0, odds.y.eq.0.x.eq.0,
odds.ratio){
odds.y.eq.0.x.eq.1 <- odds.y.eq.0.x.eq.0*odds.ratio
prob.x.eq.0 <- odds.to.probs(odds.x.eq.0)
prob.y.eq.0.x.eq.0 <- odds.to.probs(odds.y.eq.0.x.eq.0)
prob.y.eq.0.x.eq.1 <- odds.to.probs(odds.y.eq.0.x.eq.1)
x <- ifelse(runif(N)<=prob.x.eq.0, 0, 1)
y <- rep(NA, N)
y <- ifelse(x==0, ifelse(runif(sum(x))<=prob.y.eq.0.x.eq.0, 0, 1), y)
y <- ifelse(x==1, ifelse(runif( (N-sum(x)) )<=prob.y.eq.0.x.eq.1, 0, 1), y)
dat <- data.frame(x=x, y=y)
return(dat)
}
> set.seed(9)
> dat <- get.correlated.binary.data(30, 3, 1.5, -.03)
> table(dat)
   y
x    0  1
  0 10 13
  1  0  7
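A more compact variant of the same idea is sketched below. It uses rbinom() and then checks that the empirical odds ratio lands near the requested one. The function name r_binary_pair and its argument names are hypothetical. It also assumes the odds ratio is strictly positive, since a negative value does not correspond to valid probabilities.

```r
# Hypothetical helper: simulate two binary variables whose association is
# set by the odds ratio of y = 1 for x = 1 versus x = 0.
r_binary_pair <- function(n, p_x1, p_y1_given_x0, odds_ratio) {
  stopifnot(odds_ratio > 0, p_x1 > 0, p_x1 < 1,
            p_y1_given_x0 > 0, p_y1_given_x0 < 1)
  odds0 <- p_y1_given_x0 / (1 - p_y1_given_x0)  # baseline odds of y = 1
  odds1 <- odds0 * odds_ratio                   # odds of y = 1 when x = 1
  p_y1_given_x1 <- odds1 / (1 + odds1)
  x <- rbinom(n, 1, p_x1)
  # One independent draw per row, with the probability chosen by x
  y <- rbinom(n, 1, ifelse(x == 1, p_y1_given_x1, p_y1_given_x0))
  data.frame(x = x, y = y)
}

set.seed(9)
dat <- r_binary_pair(1e5, p_x1 = 0.4, p_y1_given_x0 = 0.3, odds_ratio = 3)
tab <- table(dat)

# Empirical odds ratio from the 2 x 2 table
or_hat <- (tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])
or_hat
```

Variables with more than two levels could be produced by cutting or relabelling such binary draws, but that step is not shown here.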

Long vector-plot/Coverage plot in R

I really need your R skills here. I have been working on this plot for several days now. I'm an R newbie, so that might explain it.
I have sequence coverage data for chromosomes (basically a value for each position along the length of every chromosome, making the vectors many millions of elements long). I want to make a nice coverage plot of my reads. This is what I have so far:
It looks alright, but I'm missing y-labels that tell me which chromosome is which, and I've also had trouble modifying the x-axis so that it ends where the coverage ends. Additionally, my own data is much, much bigger, so this plot in particular takes an extremely long time. That is why I tried plotLongVector from HilbertVis. It works, but I can't figure out how to modify the x-axis or the labels, or how to make the y-axis logged, and the vectors all get the same length on the plot even though they are not equally long.
source("http://bioconductor.org/biocLite.R")
biocLite("HilbertVis")
library(HilbertVis)
chr1 <- abs(makeRandomTestData(len=1.3e+07))
chr2 <- abs(makeRandomTestData(len=1e+07))
par(mfcol=c(8, 1), mar=c(1, 1, 1, 1), ylog=T)
# 1st way of trying with some code I found on stackoverflow
# Chr1
plotCoverage <- function(chr1, start, end) { # Defines coverage plotting function.
plot.new()
plot.window(c(start, length(chr1)), c(0, 10))
axis(1, labels=F)
axis(4)
lines(start:end, log(chr1[start:end]), type="l")
}
plotCoverage(chr1, start=1, end=length(chr1)) # Plots coverage result.
# Chr2
plotCoverage <- function(chr2, start, end) { # Defines coverage plotting function.
plot.new()
plot.window(c(start, length(chr1)), c(0, 10))
axis(1, labels=F)
axis(4)
lines(start:end, log(chr2[start:end]), type="l")
}
plotCoverage(chr2, start=1, end=length(chr2)) # Plots coverage result.
# 2nd way of trying with plotLongVector
plotLongVector(chr1, bty="n", ylab="Chr1") # ylab doesn't work
plotLongVector(chr2, bty="n")
Then I have another vector called genes that are of special interest. They are about the same length as the chromosome-vectors but in my data they contain more zeroes than values.
genes_chr1 <- abs(makeRandomTestData(len=1.3e+07))
genes_chr2 <- abs(makeRandomTestData(len=1e+07))
I would like these gene vectors plotted as red dots under the chromosomes. Basically, wherever the vector has a value (>0), it should be shown as a dot (or line) under the long vector plot. I have no idea how to add this, although it seems fairly straightforward.
Please help me! Thank you so much.
DISCLAIMER: Please do not simply copy and paste this code and run it over every position of your chromosome. Please sample positions (for example, as @Gx1sptDTDa shows) and plot those. Otherwise you'd probably get a huge black filled rectangle after many, many hours, if your computer survives the drain.
Using ggplot2, this is really easily achieved with geom_area. Here, I've generated some random data for three chromosomes with 100 positions each (300 in total), just to show an example. You can build on this, I hope.
# construct test data with 3 chromosomes and 100 positions each
# and random coverage between 0 and 500
set.seed(45)
chr <- rep(paste0("chr", 1:3), each=100)
pos <- rep(1:100, 3)
cov <- sample(0:500, 300)
df <- data.frame(chr, pos, cov)
require(ggplot2)
p <- ggplot(data = df, aes(x=pos, y=cov)) + geom_area(aes(fill=chr))
p + facet_wrap(~ chr, ncol=1)
You could use the ggplot2 package.
I'm not sure what exactly you want, but here's what I did:
This has 7000 random data points (about double the amount of genes on Chromosome 1 in reality). I used alpha to show dense areas (not many here, as it's random data).
library(ggplot2)
Chr1_cov <- sample(1.3e+07,7000)
Chr1 <- data.frame(Cov=Chr1_cov,fil=1)
pl <- qplot(Cov,fil,data=Chr1,geom="pointrange",ymin=0,ymax=1.1,xlab="Chromosome 1",ylab="-",alpha=I(1/50))
print(pl)
And that's it. This ran in less than a second. ggplot2 has a humongous amount of settings, so just try some out. Use facets to create multiple graphs.
The code below computes a sort of moving average and then plots the result. It is not a real moving average: a real one would have (almost) as many data points as the original and would only make the data smoother. This code instead takes one average for every n points. It will of course run quite a bit faster, but you will lose a lot of detail.
VeryLongVector <- sample(500,1e+07,replace=TRUE)
movAv <- function(vector,n){
  # Number of complete, non-overlapping windows of n points
  chops <- as.integer(length(vector)/n)
  pos <- numeric(chops)
  Cov <- numeric(chops)
  for(k in seq_len(chops)){
    # Indices of the k-th window: (k-1)*n + 1 up to k*n
    idx <- ((k-1)*n + 1):(k*n)
    pos[k] <- median(idx)
    Cov[k] <- mean(vector[idx])
  }
  result <- data.frame(pos=pos,cov=Cov)
  return(result)
}
Chr1 <- movAv(VeryLongVector,10000)
qplot(pos,cov,data=Chr1,geom="line")
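The same binning can be done without a loop, and the result can be drawn in base graphics with a logged y-axis, a chromosome label, and gene positions marked underneath. The sketch below uses small simulated vectors as stand-ins for real coverage and gene data. The helper bin_means, the Poisson and Bernoulli test data, and the use of rug() for the red gene marks are all assumptions made for illustration.

```r
# Hypothetical helper: mean coverage in consecutive, non-overlapping
# bins of n positions (any incomplete final bin is dropped).
bin_means <- function(v, n) {
  chops <- length(v) %/% n
  m <- matrix(v[seq_len(chops * n)], nrow = n)  # one column per bin
  data.frame(pos = (seq_len(chops) - 0.5) * n + 0.5,  # bin midpoints
             cov = colMeans(m))
}

set.seed(45)
len <- 1e5
chr1 <- rpois(len, lambda = 50)      # stand-in coverage values
genes_chr1 <- rbinom(len, 1, 0.001)  # stand-in gene vector, mostly zeroes

binned <- bin_means(chr1, 500)

# Logged y-axis, x-axis ending where the coverage ends, chromosome label
plot(binned$pos, binned$cov, type = "l", log = "y",
     xlim = c(1, len), xaxs = "i", xlab = "Position", ylab = "chr1")

# Red ticks along the bottom wherever the gene vector is non-zero
rug(which(genes_chr1 > 0), col = "red")
```

Calling the plot once per chromosome with xlim set to that chromosome's own length keeps each x-axis in proportion to the data.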
