R: Writing a recursive function for a Random Walk (initial values) - r

I'm a new user to R, and I am trying to create a function that will simulate a random walk. The issue for me is trying to integrate some initial values smoothly. Say I have this basic function.
y(t) = y(t-2) + eps(t)
Epsilon (or eps(t)) will be the randomness factor. I want to define y(-1)=0, and y(0)=0.
Here is my code:
ran.walk=function(n){ # 'n' steps will be the input
  eps=rnorm(n) # creates a vector taking random values from N(0,1)
  y= c(eps[1], eps[2]) # this will set up my initial vector
  for (i in 3:n){
    ytemp = y[i-2] + eps[i] ## !!! problem is here. Details below !!!
    y= c(y, ytemp)
  }
  return(y)
}
I'm trying to get this to start adding y3, y4, y5, etc., but I think there is a flaw in this design... I'm not sure if I should just set up two separate lines, with an if statement testing whether i is even or odd, perhaps with:
if (i %% 2 == 1) # using modulus
Since,
y1= eps1,
y2= eps2,
y3= y1 + eps3,
y4= y2 + eps4,
y5= y3 + eps5 and so on...
Currently, I see the error in my code.
I have y1, and y2 concatenated, but I don't think it knows how to incorporate y[1]
Can I define beforehand somehow y[-1]=0, and y[0]=0 ? I tried this also and got an error.
Thank you kindly in advance for any assistance. This is my first time attempting a for loop with recursion.
-N (sorry for any formatting issues, I had a lot of problems getting this question to go through)

I found that your odd and even series are independent of each other. Assuming that is the case, I just split the problem into two rows and use cumsum on each to get the random walk. The final data frame includes both the random numbers and the random walk, so you can check that it is working properly.
Hoping it helps
ran.walk=function(n) {
  eps=rnorm(ceiling(n / 2)*2)
  dim(eps) <- c(2,ceiling(n/2))
  # since each series is independent, we can tally each one in its own
  eps2 <- apply(eps, 1, cumsum)
  # and just reorganize it
  eps2 <- as.numeric(t(eps2))
  rndwlk <- data.frame(rnd=as.numeric(eps), walk=eps2)
  # remove the extra value if needed
  rndwlk <- rndwlk[1:n,]
  return(rndwlk)
}
ran.walk(13)

After taking a break with my piano, it came to me. It's funny how simple the answer becomes once you discover it... almost trivial.
Setting the initial value to be a vector, that is:
[y(1) = y(-1) + eps(1), y(2)= y(0) + eps(2)]
everything works out. It is still true that the evens and odds don't interact, but there is no reason to specify any of that.
The method to split the iterations with modulus, then concatenating it back into the main vector would also work, but is unnecessary and more complicated. Shorter is better for users and computers. As Einstein said, make it as simple as possible, but no simpler.
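The indexing trick can be sketched as follows. This is an illustration in Python rather than R, and the names ran_walk and seed are made up for the example. The idea is to store the two initial values y(-1) = 0 and y(0) = 0 at the front of the vector and shift every index by two. R needs the same shift because its vectors start at index 1: y[0] returns an empty vector and y[-1] drops the first element, so neither can hold an initial value.

```python
import random

def ran_walk(n, seed=None):
    """Simulate y(t) = y(t-2) + eps(t) for t = 1..n with y(-1) = y(0) = 0."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Padded list: index 0 holds y(-1) and index 1 holds y(0).
    y = [0.0, 0.0]
    for t in range(n):
        # The value two steps back sits at index t of the padded list.
        y.append(y[t] + eps[t])
    # Drop the two initial values so the result starts at y(1).
    return y[2:], eps

walk, eps = ran_walk(7, seed=1)
print(walk)
```

In R the equivalent would be to start from y <- c(0, 0), fill y[i + 2] <- y[i] + eps[i] in the loop, and return y[-(1:2)].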

Related

How to create a random walk in R that goes in different directions than -1 or +1?

Consider this two‐dimensional random walk:
where, Zt, Wt, t = 1,2,3, … are independent and identically distributed standard normal
random variables.
I am having problems in finding a way to simulate and plot the sample path of (X,Y) for t = 0,1, … ,100. I was given a sample:
The following code is an example of the way I am used to plot random walks in R:
set.seed(13579)
r<-sample(c(-1,1),size=100,replace=T,prob=c(0.5,0.5))
r<-c(10,r)
(w<-cumsum(r))
w<-as.ts(w)
plot(w,main="random walk")
I am not very sure of how to achieve this.
The problem I am having is that this kind of code gives a "simpler" result, with a line that only goes up or down, by -1 or +1:
while the plot I need to create also goes from left to right and vice versa.
Would you help me correct the code I know so that it fits my task, or suggest a smarter way to go about it? It would be greatly appreciated.
Cheers!
Instead of using sample, you need to use rnorm(100) to draw 100 samples from a standard normal distribution. Since the walk starts at [0, 0], we need to append a 0 at the start and do a cumsum on the result, i.e. cumsum(c(0, rnorm(100))).
We want to do this for both the x and y variables, then plot. The whole thing can be done in a single line of code in base R:
plot(x = cumsum(c(0, rnorm(100))), y = cumsum(c(0, rnorm(100))), type = 'l')

In R, find non-linear lines from two sets of points and then find the intersection of those points

Using R, I want to estimate two curves using points from two vectors, and then find the x and y coordinates where those estimated curves intersect.
In a strategic setting with players "t" and "p", I am simulating each player's best response to what the other would pick (game theory). The problem is that I don't have functions or lines; I have two sets of points originating from simulation, with each set of points corresponding to one player's best responses to given actions by the other player. The actual math was too difficult for me (or MATLAB) to solve, which is why I'm using this simulated visual approach. I want to estimate best response functions (i.e. create non-linear curves) using the points, and then take the two estimated curves and find where they intersect in order to identify the Nash equilibrium (where the best response curves intersect).
As an example, here are two such vectors I am working with:
t=c(10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.1,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0)
p=c(12.3,12.3,12.3,12.3,12.3,12.3,12.4,12.4,12.4,12.5,12.5,12.5,12.6,12.6,12.7,12.7,12.8,12.8,12.9,12.9,13.0,13.1,13.1,13.2,13.3,13.4,13.5,13.4,13.5,13.6,13.6,13.7,13.8,13.8,13.9,13.9,13.9,14.0,14.0,14.0,14.0)
For the first line, the sample is made up of (t,a), and for the second line, the sample is made up of (a,p) where a is a third vector given by
a = seq(10, 14, by = 0.1)
For example, the first point for the sample corresponding to the first vector would be (10.0,10.0) and the second point would be (10.0,10.1). The first point for the sample corresponding to the second vector would be (10.0,12.3) and the second point would be (10.1,12.3).
What I originally tried to do is estimate the lines using polynomials produced by lm models, but those don't seem to always work:
plot(a,t, xlim=c(10,14), ylim=c(10,14), col="purple")
points(p,a, col="red")
fit4p <- lm(a~poly(p,3,raw=TRUE))
fit4t <- lm(t~poly(a,3,raw=TRUE))
lines(a, predict(fit4t, data.frame(x=a)), col="purple", xlim=c(10,14), ylim=c(10,14),type="l",xlab="p",ylab="t")
lines(p, predict(fit4p, data.frame(x=a)), col="green")
fit4pCurve <- function(x) coef(fit4p)[1] +x*coef(fit4p)[2]+x^2*coef(fit4p)[3]+x^3*coef(fit4p)[4]
fit4tCurve <- function(x) coef(fit4t)[1] +x*coef(fit4t)[2]+x^2*coef(fit4t)[3]+x^3*coef(fit4t)[4]
a_opt1 = optimise(f=function(x) abs(fit4pCurve(x)-fit4tCurve(x)), c(10,14))$minimum
b_opt1 = as.numeric(fit4pCurve(a_opt1))
EDIT:
After fixing the typo, I get the correct answer, but it doesn't always work if the samples don't come back as cleanly.
So my question can be broken down a few ways. First, is there a better way to accomplish what I'm trying to do. I know what I'm doing isn't perfectly accurate by any means, but it seems like a decent approximation for my purposes. Second, if there isn't a better way, is there a way I could improve on the methodology I have listed above.
Restart your R session, make sure all variables are cleared and copy/paste this code. I found a few mistakes in referenced variables. Also note that R is case sensitive. My suspicion is that you've been overwriting variables.
plot(a,t, xlim=c(10,14), ylim=c(10,14), col="purple")
points(p,a, col="red")
fit4p <- lm(a~poly(p,3,raw=TRUE))
fit4t <- lm(t~poly(a,3,raw=TRUE))
lines(a, predict(fit4t, data.frame(x=a)), col="purple", xlim=c(10,14), ylim=c(10,14),type="l",xlab="p",ylab="t")
lines(p, predict(fit4p, data.frame(x=a)), col="green")
fit4pCurve <- function(x) coef(fit4p)[1] +x*coef(fit4p)[2]+x^2*coef(fit4p)[3]+x^3*coef(fit4p)[4]
fit4tCurve <- function(x) coef(fit4t)[1] +x*coef(fit4t)[2]+x^2*coef(fit4t)[3]+x^3*coef(fit4t)[4]
a_opt = optimise(f=function(x) abs(fit4pCurve(x)-fit4tCurve(x)), c(10,14))$minimum
b_opt = as.numeric(fit4pCurve(a_opt))
As you will see:
> a_opt
[1] 12.24213
> b_opt
[1] 10.03581

how to intersect an interpolated surface z=f(x,y) with z=z0 in R

I found some posts and discussions about the above, but I'm not sure... could someone please check if I am doing anything wrong?
I have a set of N points of the form (x,y,z). The x and y coordinates are independent variables that I choose, and z is the output of a rather complicated (and of course non-analytical) function that uses x and y as input.
My aim is to find a set of values of (x,y) where z=z0.
I looked up this kind of problem in R-related forums, and it appears that I need to interpolate the points first, perhaps using a package like akima or fields.
However, it is less clear to me: 1) if that is necessary, or the basic R functions that do the same are sufficiently good; 2) how I should use the interpolated surface to generate a correct matrix of the desired (x,y,z=z0) points.
E.g. this post seems somewhat related to the problem I am describing, but it looks extremely complicated to me, so I am wondering whether my simpler approach is correct.
Please see below some example code (not the original one, as I said the generating function for z is very complicated).
I would appreciate if you could please comment / let me know if this approach is correct / suggest a better one if applicable.
df <- merge(data.frame(x=seq(0,50,by=5)),data.frame(y=seq(0,12,by=1)),all=TRUE)
df["z"] <- (df$y)*(df$x)^2
ta <- xtabs(z~x+y,df)
contour(ta,nlevels=20)
contour(ta,levels=c(1000))
#why are the x and y axes [0,1] instead of showing the original values?
#and how accurate is the algorithm that draws the contour?
li2 <- as.data.frame(contourLines(ta,levels=c(1000)))
#this extracts the contour data, but all (x,y) values are wrong
require(akima)
s <- interp(df$x,df$y,df$z)
contour(s,levels=c(1000))
li <- as.data.frame(contourLines(s,levels=c(1000)))
#at least now the axis values are in the right range; but are they correct?
require(fields)
image.plot(s)
#fancier, but same problem - are the values correct? better than the akima ones?

Remove redundant points for line plot

I am trying to plot large amounts of points using some library. The points are ordered by time and their values can be considered unpredictable.
My problem at the moment is that the sheer number of points makes the library take too long to render. Many of the points are redundant (that is - they are "on" the same line as defined by a function y = ax + b). Is there a way to detect and remove redundant points in order to speed rendering ?
Thank you for your time.
The following is a variation on the Ramer-Douglas-Peucker algorithm for 1.5d graphs:
Compute the line equation between first and last point
Check all other points to find which one is the most distant from the line
If the worst point is below the tolerance you want then output a single segment
Otherwise call recursively considering two sub-arrays, using the worst point as splitter
In python this could be
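def simplify(pts, eps):
    # Fewer than three points cannot be simplified any further
    if len(pts) < 3:
        return pts
    x0, y0 = pts[0]
    x1, y1 = pts[-1]
    # Line y = m*x + q through the first and last point (assumes x1 != x0)
    m = float(y1 - y0) / float(x1 - x0)
    q = y0 - m*x0
    worst_err = -1
    worst_index = -1
    # Find the interior point with the largest vertical distance from the line
    for i in range(1, len(pts) - 1):
        x, y = pts[i]
        err = abs(m*x + q - y)
        if err > worst_err:
            worst_err = err
            worst_index = i
    if worst_err < eps:
        return [(x0, y0), (x1, y1)]
    else:
        # Split at the worst point and simplify each half recursively
        first = simplify(pts[:worst_index+1], eps)
        second = simplify(pts[worst_index:], eps)
        return first + second[1:]

print(simplify([(0,0), (10,10), (20,20), (30,30), (50,0)], 0.1))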
The output is [(0, 0), (30, 30), (50, 0)].
About python syntax for arrays that may be non obvious:
x[a:b] is the part of array from index a up to index b (excluded)
x[n:] is the array made using elements of x from index n to the end
x[:n] is the array made using first n elements of x
a+b when a and b are arrays means concatenation
x[-1] is the last element of an array
An example of the results of running this implementation on a graph with 100,000 points with increasing values of eps can be seen here.
I came across this question after I had this very idea. Skip redundant points on plots. I believe I came up with a far better and simpler solution and I'm happy to share as my first proposed solution on SO. I've coded it and it works well for me. It also takes into account the screen scale. There may be 100 points in value between those plot points, but if the user has a chart sized small, they won't see them.
So, iterating through your data/plot loop, before you draw/add your next data point, look at the next value ahead and calculate the change in screen scale (or value, but I think screen scale for the above-mentioned reason is better). Now do the same for the next value ahead (getting these values is just a matter of peeking ahead in your array/collection/list/etc adding the for next step increment (probably 1/2) to the current for value whilst in the loop). If the 2 values are the same (or perhaps very minor change, per your own preference), you can skip this one point in your chart by simply adding 'continue' in the loop, skipping adding the data point as the point lies exactly on the slope between the point before and after it.
Using this method, I reduce a chart from 963 points to 427 for example, with absolutely zero visual change.
I think you might need to perhaps read this a couple of times to understand, but it's far simpler than the other best solution mentioned here, much lighter weight, and has zero visual effect on your plot.
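A minimal Python sketch of this look-ahead idea, assuming the points are (x, y) tuples already expressed in whatever scale matters (data values or screen pixels). The name drop_collinear and the tol parameter are made up for illustration; this is not the poster's actual code. Each interior point is compared with its two immediate neighbours and skipped when the three are collinear:

```python
def drop_collinear(points, tol=0.0):
    """Drop interior points that lie on the line through their two neighbours."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        (x0, y0), (x1, y1), (x2, y2) = prev, cur, nxt
        # Cross product of (cur - prev) and (nxt - cur); zero means collinear.
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if abs(cross) > tol:
            kept.append(cur)
    kept.append(points[-1])
    return kept

pts = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 0), (5, 0), (6, 0)]
print(drop_collinear(pts))  # [(0, 0), (3, 3), (4, 0), (6, 0)]
```

With tol = 0 the drawn polyline is unchanged. A positive tol removes more points, but small deviations can then accumulate along a slowly curving stretch, because each point is only compared with its immediate neighbours.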
I would probably apply a "least squares" algorithm to obtain a line of best fit. You can then go through your points and downfilter consecutive points that lie close to the line. You only need to plot the outliers, and the points that take the curve back to the line of best fit.
Edit: You may not need to employ "least squares"; if your input is expected to hover around "y=ax+b" as you say, then that's already your line of best fit and you can just use that. :)
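One way to read this suggestion, sketched in Python under stated assumptions: the helper names fit_line and keep_outliers and the tol threshold are hypothetical, and keeping the immediate neighbours of each outlier is only one interpretation of "the points that take the curve back to the line". The line is fitted with the closed-form least-squares formulas, which need at least two distinct x values.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def keep_outliers(points, tol):
    """Keep the end points, points farther than tol from the fitted line,
    and the direct neighbours of those points."""
    a, b = fit_line(points)
    far = [abs(a * x + b - y) > tol for x, y in points]
    last = len(points) - 1
    kept = []
    for i, p in enumerate(points):
        next_to_outlier = (i > 0 and far[i - 1]) or (i < last and far[i + 1])
        if i == 0 or i == last or far[i] or next_to_outlier:
            kept.append(p)
    return kept

pts = [(x, 2 * x + 1) for x in range(10)]
pts[5] = (5, 20)  # a single spike away from the line
print(keep_outliers(pts, tol=2.0))
```

If the line y = ax + b is known in advance, as the edit above suggests, fit_line can be skipped and a and b passed in directly.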

This is more a matlab/math brain teaser than a question

Here is the setup. No assumptions for the values I am using.
n=2; % dimension of vectors x and (square) matrix P
r=2; % number of x vectors and P matrices
x1 = [3;5]
x2 = [9;6]
x = cat(2,x1,x2)
P1 = [6,11;15,-1]
P2 = [2,21;-2,3]
P(:,1)=P1(:)
P(:,2)=P2(:)
modePr = [-.4;16]
TransPr=[5.9,0.1;20.2,-4.8]
pred_modePr = TransPr'*modePr
MixPr = TransPr.*(modePr*(pred_modePr.^(-1))')
x0 = x*MixPr
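Then it was time to apply the following formula to get myP:
$$P_{0j} = \sum_{i=1}^{r} \mu_{ij}\left[P_i + (x_i - x_{0j})(x_i - x_{0j})^T\right]$$
where μij is MixPr, x_i is column i of x, and x_{0j} is column j of x0. I used this code to get it: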
myP=zeros(n*n,r);
Ptables(:,:,1)=P1;
Ptables(:,:,2)=P2;
for j=1:r
    for i = 1:r;
        temp = MixPr(i,j)*(Ptables(:,:,i) + ...
            (x(:,i)-x0(:,j))*(x(:,i)-x0(:,j))');
        myP(:,j)= myP(:,j) + temp(:);
    end
end
Some brilliant guy proposed this formula as another way to produce myP
for j=1:r
    xk1=x(:,j); PP=xk1*xk1'; PP0(:,j)=PP(:);
    xk1=x0(:,j); PP=xk1*xk1'; PP1(:,j)=PP(:);
end
myP = (P+PP0)*MixPr-PP1
I tried to formulate the equality between the two methods, and it seems to be this one. To make things easier, I skipped the summation of matrix P in both methods.
$$\sum_{i=1}^{r} \mu_{ij}\,(x_i - x_{0j})(x_i - x_{0j})^T = \sum_{i=1}^{r} \mu_{ij}\, x_i x_i^T - x_{0j}\, x_{0j}^T$$
where the first part denotes the formula that I used, and the second comes from his code snippet. Do you think this is an obvious equality? If yes, ignore all the above and just try to explain why. I could only start from the LHS, and after some algebra I think I proved it equals the RHS. However, I can't see how he (or she) thought of it in the first place.
Using E for expectation, the one dimensional version of your formula is the familiar:
Variance(X) = E((X-E(X))^2) = E(X^2) - E(X)^2
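The matrix version follows from the same expansion. As a sketch, write μ_ij for MixPr(i,j), x_i for column i of x and x_{0j} for column j of x0. Two properties are assumed, both implied by the setup code in the question: each column of MixPr sums to 1 (because pred_modePr(j) is the sum over i of TransPr(i,j)*modePr(i)), and x_{0j} is the weighted mean of the x_i (because x0 = x*MixPr).

```latex
\begin{aligned}
\sum_{i} \mu_{ij}\,(x_i - x_{0j})(x_i - x_{0j})^T
  &= \sum_{i} \mu_{ij}\, x_i x_i^T
     - \Big(\sum_{i} \mu_{ij}\, x_i\Big) x_{0j}^T
     - x_{0j} \Big(\sum_{i} \mu_{ij}\, x_i\Big)^T
     + \Big(\sum_{i} \mu_{ij}\Big) x_{0j} x_{0j}^T \\
  &= \sum_{i} \mu_{ij}\, x_i x_i^T
     - x_{0j} x_{0j}^T - x_{0j} x_{0j}^T + x_{0j} x_{0j}^T \\
  &= \sum_{i} \mu_{ij}\, x_i x_i^T - x_{0j} x_{0j}^T .
\end{aligned}
```

Adding the term \sum_i \mu_{ij} P_i to both sides gives the two ways of computing column j of myP.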
While the second form might be easier programming, I'd worry about ending up with a negative (or, in the multidimensional case, non positive definite) answer by using it, due to rounding error.
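The rounding concern can be illustrated in one dimension with a small Python sketch. The function names and the data are chosen for the example only: four values with a true (population) variance of 22.5, shifted by a large offset so that their squares exceed what a double can represent exactly.

```python
def variance_two_pass(xs):
    """E((X - E(X))^2): subtract the mean first, then square."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def variance_moments(xs):
    """E(X^2) - E(X)^2: subtract two large, nearly equal numbers."""
    mean = sum(xs) / len(xs)
    mean_sq = sum(x * x for x in xs) / len(xs)
    return mean_sq - mean ** 2

data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]
print(variance_two_pass(data))  # 22.5
print(variance_moments(data))   # far from 22.5 because of cancellation
```

The second form loses the answer because each square is of order 1e18, where neighbouring doubles are 128 apart, so the difference of the two moments can only be a coarse multiple of that spacing and may well come out negative.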

Resources