I am working on a script that should estimate the probability that at least 2 out of n people have birthdays within k days of each other. To estimate this I have the following function:
birthdayRangeCheck.prob = function(nPeople, seperation, nSimulations) {
count = 0
for (i in 1:nSimulations) {
count = count + birthdayRangeCheck(nPeople, seperation)
}
return(count / nSimulations)
}
Now just entering simple values for nPeople, seperation, nSimulations gives me a normal number.
e.g.
birthdayRangeCheck.prob(10,4,100)
-> 0.75
However when I want to plot the probability as a function of nPeople, and seperation I stumble upon the following problem:
x = 1:999
y = 0:998
z = outer(X = x, Y = y, FUN = birthdayRangeCheck.prob, nSimulations = 100)
numerical expression has 576 elements: only the first used... (a lot of times)
So it seems like outer is not passing single elements of x and y, but rather the vectors themselves, which is the opposite of what outer should do, right?
Am I overlooking something? Because I can't figure out what is causing this error. (replacing FUN with e.g. sin(x+y) works like a charm so I did pin it down to the function itself. But since the function works just fine with numeric arguments I don't see why R doesn't understand to just enter elements of x and y as arguments.)
Any help would be greatly appreciated. Thanks ;)
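One likely explanation, sketched here under stated assumptions: outer() does not call FUN once per pair. It expands x and y into two long vectors of equal length and calls FUN a single time, expecting a vectorized function (which is why sin(x+y) works). A function that uses its arguments where a scalar is expected, for example in 1:n, then produces the "only the first used" warning. Wrapping the function with Vectorize() is a common workaround. birthdayRangeCheck is not shown in the question, so the version below is a hypothetical stand-in that ignores year wrap-around, and prob_vec is a name introduced only for illustration.

```r
# Hypothetical stand-in for birthdayRangeCheck (the original is not shown):
# returns 1 if any two of nPeople random birthdays lie within `seperation`
# days of each other, else 0. Year wrap-around is ignored for simplicity.
birthdayRangeCheck <- function(nPeople, seperation) {
  days <- sort(sample(365, nPeople, replace = TRUE))
  as.integer(any(diff(days) <= seperation))
}

# Same estimator as in the question
birthdayRangeCheck.prob <- function(nPeople, seperation, nSimulations) {
  count <- 0
  for (i in 1:nSimulations) {
    count <- count + birthdayRangeCheck(nPeople, seperation)
  }
  count / nSimulations
}

# Vectorize() wraps the function in mapply(), so it is called once per
# (nPeople, seperation) pair; nSimulations is passed through unchanged.
prob_vec <- Vectorize(birthdayRangeCheck.prob,
                      vectorize.args = c("nPeople", "seperation"))

# Smaller grid than in the question, to keep the run time short
x <- 1:20
y <- 0:5
z <- outer(X = x, Y = y, FUN = prob_vec, nSimulations = 50)
dim(z)  # 20 rows (nPeople) by 6 columns (seperation)
```

Note that Vectorize() only hides the loop; the full 999 x 999 grid from the question would still run the simulation about a million times.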
Given a -log10(P) value, I'd like to calculate the Z score in log space, how would I do that?
So, given the following code, how to recode the last line so that it calculates Z from log10P in the log space?
Z=10
log10P = -1*(pnorm(-abs(Z),log.p = T)*1/log(10) + log10(2))
Z== -1*(qnorm(10^-log10P/2)) # <- this needs to be in log space.
qnorm also has a log.p argument analogous to pnorm's, so you can reverse the operations that you used to get log10P in the first place (it took me a couple of tries to get this right ...)
I rearranged your log10P calculation slightly.
log10P_from_Z <- function(Z) {
abs((pnorm(-abs(Z),log.p=TRUE)+log(2))/log(10))
}
Z_from_log10P <- function(log10P) {
-1*qnorm(-(log10P*log(10))-log(2), log.p=TRUE)
}
We can check the round-trip accuracy (i.e. convert from Z to -log10(p) and back, then see how close we get to the original value). This works perfectly for values around 20, but does incur a little round-off error for large values (I would have to look more carefully to see whether anything can be remedied here).
zvec <- seq(20,400)
err <- sapply(zvec, function(z) {
abs(Z_from_log10P(log10P_from_Z(z))-z)
})
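To see where the round-off becomes visible, the errors can be summarised relative to z. This is a sketch that repeats the two functions above so it runs on its own; rel_err is a name introduced here for illustration, and no particular error magnitude is claimed.

```r
# Conversion functions, repeated from above so the block is self-contained
log10P_from_Z <- function(Z) {
  abs((pnorm(-abs(Z), log.p = TRUE) + log(2)) / log(10))
}
Z_from_log10P <- function(log10P) {
  -1 * qnorm(-(log10P * log(10)) - log(2), log.p = TRUE)
}

# Round-trip error Z -> -log10(p) -> Z over a range of Z values
zvec <- seq(20, 400)
err <- sapply(zvec, function(z) {
  abs(Z_from_log10P(log10P_from_Z(z)) - z)
})

# Error relative to the size of Z, and the Z value where it is largest
rel_err <- err / zvec
summary(rel_err)
zvec[which.max(rel_err)]
```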
I'm trying to create a loop that will evaluate this equation.
$y = \sum_{j=0}^{10} x^j$
when x = 5. I am trying to use this code
y=0 # initialize y to 0
x = 5
for(i in 1:5){y[i] = (exp(x[0:10]))}
print(y)
but I can't seem to even get the exponents right, let alone the summation. Anyone know how to use a for loop to evaluate this sum?
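The code mixes a for loop with a vector sequence, which is unlikely to produce the result you want: x[0:10] indexes x rather than generating exponents, and exp() computes e^x rather than raising x to a power. The warning "number of items to replace is not a multiple of replacement length" appears because a multi-element result is being assigned to the single element y[i]. A loop that accumulates the sum directly looks like this: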
x <- 5
y <- 0
for (i in 0:10) {
y <- y + x ^ i
}
Comparing the result with the more succinct vectorized form, sum(x^(0:10)), shows that the two are the same.
> setequal(y, sum(x^(0:10)))
[1] TRUE
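As an extra cross-check, the sum is a finite geometric series, so it also has a closed form, (x^(n+1) - 1)/(x - 1) for x != 1. A small sketch comparing all three routes; the names y_loop, y_vec and y_closed are introduced here for illustration:

```r
x <- 5
n <- 10

# Loop version: accumulate x^i for i = 0..n
y_loop <- 0
for (i in 0:n) {
  y_loop <- y_loop + x^i
}

# Vectorized version
y_vec <- sum(x^(0:n))

# Closed form of the geometric series (valid only for x != 1)
y_closed <- (x^(n + 1) - 1) / (x - 1)

c(y_loop, y_vec, y_closed)  # each equals 12207031
```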
I have a scalar function which is obtained by iterative calculations. I wish to differentiate it (find the directional derivative) with respect to a matrix, elementwise. How should I employ the finite difference approximation in this case? Do diff or gradient help here? Note that I only want numerical derivatives.
The typical code that I would work on is:
n=4;
for i=1:n
for x(i)=-2:0.04:4;
for y(i)=-2:0.04:4;
A(:,:,i)=[sin(x(i)), cos(y(i));2sin(x(i)),sin(x(i)+y(i)).^2];
B(:,:,i)=[sin(x(i)), cos(x(i));3sin(y(i)),cos(x(i))];
R(:,:,i)=horzcat(A(:,:,i),B(:,:,i));
L(i)=det(B(:,:,i)'*A(:,:,i)B)(:,:,i));
%how to find gradient of L with respect to x(i), y(i)
grad_L=tr((diff(L)/diff(R)')*(gradient(R))
endfor;
endfor;
endfor;
I know that the last line, for grad_L, is a syntax error and that the dimensions don't match. How do I proceed? Note that the gradient (directional derivative) of a scalar function f of a matrix variable X is given by $\nabla f = \operatorname{tr}\left( \frac{\partial f}{\partial x_{ij}} \dot{X} \right)$, where x_{ij} denotes the elements of the matrix X and $\dot{X}$ denotes the gradient of the matrix X.
Both your code and explanation are very confusing. You're using an iteration of n = 4, but you don't do anything with your inputs or outputs, and you overwrite everything. So I will ignore the n aspect for now since you don't seem to be making any use of it. Furthermore you have many syntactical mistakes which look more like maths or pseudocode, rather than any attempt to write valid Matlab / Octave.
But, essentially, you seem to be asking: "I have a function which, for each (x,y) coordinate on a 2D grid, calculates a scalar output L(x,y)", where the calculation leading to L involves multiplying matrices and then taking the determinant. Here's how to produce such an array L:
X = -2 : 0.04 : 4;
Y = -2 : 0.04 : 4;
X_indices = 1 : length(X);
Y_indices = 1 : length(Y);
% Preallocate: rows of L are indexed by x, columns by y
L = zeros(length(X), length(Y));
for Ind_x = X_indices
  for Ind_y = Y_indices
    x = X(Ind_x); y = Y(Ind_y);
    A = [sin(x), cos(y); 2 * sin(x), sin(x+y)^2];
    B = [sin(x), cos(x); 3 * sin(y), cos(x) ];
    % Scalar output at this grid point
    L(Ind_x, Ind_y) = det (B.' * A * B);
  end
end
You then want the gradient of L, which is a vector quantity. Setting aside the maths you mentioned for a moment, if the goal is simply to use the gradient function correctly, apply it directly to L, pass the grid vectors so that it knows the spacing between the elements of L, and collect both outputs so that you capture both vector components. One caveat: gradient treats the second dimension (columns) as its first direction. Because L is indexed as L(Ind_x, Ind_y), x runs along rows and y along columns, so Y is passed first and the first output is the y component:
[gLy, gLx] = gradient(L, Y, X);
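For reference, the finite-difference approximation that gradient applies at interior grid points is the central difference (one-sided differences are used at the edges of the grid). On the uniform grid above, with spacing h = 0.04, this reads:

```latex
\frac{\partial L}{\partial x}(x_i, y_j) \approx \frac{L(x_{i+1}, y_j) - L(x_{i-1}, y_j)}{2h},
\qquad
\frac{\partial L}{\partial y}(x_i, y_j) \approx \frac{L(x_i, y_{j+1}) - L(x_i, y_{j-1})}{2h}
```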
I'd first like to describe my problem:
What I want to do is count the number of price spikes in a 24-hour window, given half-hourly data.
I have looked at the related Stack Overflow posts, e.g. this one:
Rollapply for time series
(If there are more relevant ones, please let me know ;) )
As I cannot and probably also should not upload my data, here's a minimal example:
I simulate a random variable, convert it to an xts object, and use a user defined function to detect "spikes" (of course pretty ridiculous in this case, but illustrates the error).
library(xts)
##########Simulate y as a random variable
y <- rnorm(n=100)
##########Add a date variable so i can convert it to a xts object later on
yDate <- as.Date(1:100, origin = "1970-01-01")  # explicit origin, required by older R versions
##########bind both variables together and convert to a xts object
z <- cbind(yDate,y)
z <- xts(x=z, order.by=yDate)
##########use the rollapply function on the xts object:
x <- rollapply(z, width=10, FUN=mean)
The function works as it is supposed to: it takes the 10 preceding values and calculates the mean.
Then I defined my own function to find peaks: a peak is a local maximum (higher than the m points around it) AND is greater than the mean of the time series plus h.
This leads to:
find_peaks <- function (x, m,h){
shape <- diff(sign(diff(x, na.pad = FALSE)))
pks <- sapply(which(shape < 0), FUN = function(i){
z <- i - m + 1
z <- ifelse(z > 0, z, 1)
w <- i + m + 1
w <- ifelse(w < length(x), w, length(x))
if(all(x[c(z : i, (i + 2) : w)] <= x[i + 1])&x[i+1]>mean(x)+h) return(i + 1) else return(numeric(0))
})
pks <- unlist(pks)
pks
}
It works fine. Back to the example:
plot(yDate,y)
#Is supposed to find the points which are higher than 3 points around them
#and higher than the average:
#Does so, so works.
points(yDate[find_peaks(y,3,0)],y[find_peaks(y,3,0)],col="red")
However, using the rollapply() function leads to:
x <- rollapply(z,width = 10,FUN=function(x) find_peaks(x,3,0))
#Error in `[.xts`(x, c(z:i, (i + 2):w)) : subscript out of bounds
I first thought that the error might occur because the m parameter leads to a negative index for the first points. Sadly, setting m to zero does not change the error.
I have tried to trace this error too, but do not find the source.
Can anyone help me out here?
Edit: a picture of spikes (image: spikes on the Australian electricity market). find_peaks(20,50) determines the red points to be spikes; find_peaks(0,50) additionally finds the blue ones to be spikes (therefore the second parameter h is important, because the blue points are clearly not what we want to analyse when we talk about spikes).
I'm still not entirely sure what it is that you are after. On the assumption that given a window of data you want to identify whether its center is greater than the rest of the window at the same time as being greater than the mean of the window + h then you could do the following:
peakfinder = function(x,h = 0){
xdat = as.numeric(x)
meandat = mean(xdat)
center = xdat[ceiling(length(xdat)/2)]
ifelse(all(center >= xdat) & center >= (meandat + h),center,NA)
}
y <- rnorm(n=100)
z = xts(y, order.by = as.Date(1:100, origin = "1970-01-01"))
plot(z)
points(rollapply(z,width = 7, FUN = peakfinder, align = "center"), col = "red", pch = 19)
Note, though, that if the center point is at least as large as its neighbours, it is necessarily at least as large as the local mean too, so the mean comparison only matters when h > 0; for h <= 0 that part of the function is redundant. If you want to use the global mean of the time series instead, just replace the calculation of meandat with a pre-calculated global mean passed as an argument to peakfinder.
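The global-mean variant could look roughly like the sketch below. It uses base R only, so that it runs without xts; peakfinder_global, its global_mean argument and the roll_center helper are hypothetical names introduced here, and the data vector is a toy example rather than the asker's prices.

```r
# Variant of peakfinder: compares the window centre against a global mean
# supplied by the caller instead of the window's own mean.
peakfinder_global <- function(x, global_mean, h = 0) {
  xdat <- as.numeric(x)
  center <- xdat[ceiling(length(xdat) / 2)]
  if (all(center >= xdat) && center >= global_mean + h) center else NA_real_
}

# Minimal base-R rolling application over centred windows of odd width;
# positions without a full window stay NA.
roll_center <- function(v, width, FUN, ...) {
  half <- (width - 1) / 2
  out <- rep(NA_real_, length(v))
  for (i in (half + 1):(length(v) - half)) {
    out[i] <- FUN(v[(i - half):(i + half)], ...)
  }
  out
}

# Toy series: small bumps at positions 2 and 8, one large spike at position 5
v <- c(0, 1, 0, 0, 5, 0, 0, 1, 0)
peaks <- roll_center(v, width = 3, FUN = peakfinder_global,
                     global_mean = mean(v), h = 1)
which(!is.na(peaks))  # only position 5 exceeds the global mean + h
```

With h = 1 the two small local maxima are rejected because they do not exceed the global mean by h, which is the behaviour the second parameter is meant to provide.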
Here is the setup. No assumptions for the values I am using.
n=2; % dimension of vectors x and (square) matrix P
r=2; % number of x vectors and P matrices
x1 = [3;5]
x2 = [9;6]
x = cat(2,x1,x2)
P1 = [6,11;15,-1]
P2 = [2,21;-2,3]
P(:,1)=P1(:)
P(:,2)=P2(:)
modePr = [-.4;16]
TransPr=[5.9,0.1;20.2,-4.8]
pred_modePr = TransPr'*modePr
MixPr = TransPr.*(modePr*(pred_modePr.^(-1))')
x0 = x*MixPr
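Then it was time to apply the following formula to get myP:
$$P_{0j} = \sum_{i=1}^{r} \mu_{ij} \left[ P_i + (x_i - x_{0j})(x_i - x_{0j})^T \right]$$
where μij is MixPr, x_i is column i of x, x_{0j} is column j of x0, and P_{0j} is column j of myP reshaped to n×n. I used this code to get it: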
myP=zeros(n*n,r);
Ptables(:,:,1)=P1;
Ptables(:,:,2)=P2;
for j=1:r
for i = 1:r;
temp = MixPr(i,j)*(Ptables(:,:,i) + ...
(x(:,i)-x0(:,j))*(x(:,i)-x0(:,j))');
myP(:,j)= myP(:,j) + temp(:);
end
end
Some brilliant guy proposed this formula as another way to produce myP
for j=1:r
xk1=x(:,j); PP=xk1*xk1'; PP0(:,j)=PP(:);
xk1=x0(:,j); PP=xk1*xk1'; PP1(:,j)=PP(:);
end
myP = (P+PP0)*MixPr-PP1
I tried to formulate the equality between the two methods, and it seems to be this one (to make things easier, I skipped the summation of the matrix P in both methods), with x_i the columns of x, x_{0j} the columns of x0 and μij the entries of MixPr:
$$\sum_{i=1}^{r} \mu_{ij} (x_i - x_{0j})(x_i - x_{0j})^T = \sum_{i=1}^{r} \mu_{ij}\, x_i x_i^T - x_{0j} x_{0j}^T$$
where the left-hand side is the formula that I used, and the right-hand side comes from his code snippet. Do you think this is an obvious equality? If yes, ignore all the above and just try to explain why. I could only start from the LHS, and after some algebra I think I proved that it equals the RHS. However, I can't see how he (or she) thought of it in the first place.
Using E for expectation, the one dimensional version of your formula is the familiar:
Variance(X) = E((X-E(X))^2) = E(X^2) - E(X)^2
While the second form might be easier to program, I'd worry about ending up with a negative (or, in the multidimensional case, non-positive-definite) answer when using it, due to rounding error.
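The matrix version of this identity can be checked by expanding the outer product. A short derivation, assuming (as the setup code implies) that each column of MixPr sums to one, i.e. $\sum_i \mu_{ij} = 1$, and that x0 = x*MixPr, i.e. $x_{0j} = \sum_i \mu_{ij} x_i$:

```latex
\begin{aligned}
\sum_{i} \mu_{ij} (x_i - x_{0j})(x_i - x_{0j})^T
  &= \sum_{i} \mu_{ij}\, x_i x_i^T
     - \Big(\sum_{i} \mu_{ij} x_i\Big) x_{0j}^T
     - x_{0j} \Big(\sum_{i} \mu_{ij} x_i\Big)^T
     + \Big(\sum_{i} \mu_{ij}\Big) x_{0j} x_{0j}^T \\
  &= \sum_{i} \mu_{ij}\, x_i x_i^T
     - x_{0j} x_{0j}^T - x_{0j} x_{0j}^T + x_{0j} x_{0j}^T \\
  &= \sum_{i} \mu_{ij}\, x_i x_i^T - x_{0j} x_{0j}^T
\end{aligned}
```

This is the weighted, multidimensional counterpart of E((X-E(X))^2) = E(X^2) - E(X)^2, with the μij acting as the probabilities.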