I am trying to reconstruct (or recover) a probability distribution function when only the moments of the distribution are known. I have written R code for it, and although the logic seems right to me, I am not getting the output I expect.
The equation I am using as the approximated (or reconstructed, or recovered) CDF is shown in the image below. My code evaluates the right-hand side of the equation and stores the result in a vector that I call F.
The paper that contains the original equation can be found here:
http://www.sciencedirect.com/science/article/pii/S0167715208000187
It is marked as equation (2) in the paper.
Here is the code I wrote:
#R Codes:
alpha <- 50
T <- 1
x <- seq(0, T, by = 0.1)
# Original CDF equation
Ft <- (1-log(x^3))*(x^3)
plot(x, Ft, type = "l", ylab = "", xlab = "")
# Approximated CDF equation using Moment type reconstruction
k<- floor(alpha*y/T)
for(i in 1:length(k))
{
for(j in k[i]:alpha)
{
F[x+1] <- (factorial(alpha)/(factorial(alpha-j)*factorial(j-k)*factorial(k)))*(((-1)^(j-k))/(T^j))*((9/(j+3))^2)
}
}
plot(x[1:7], F, type = "l", ylab = "", xlab = "")
Any help will be appreciated, because the approximation and the graph produced by my code are grossly different from the original curve.
It seems clear that your problem is in here.
F[x+1] <- (factorial(alpha)/(factorial(alpha-j)*factorial(j-k)*factorial(k)))*(((-1)^(j-k))/(T^j))*((9/(j+3))^2)
You are trying to get something varying in x, yes? So how can you get that, if the right hand side of this equation has nothing varying in x, while the left hand side has an assignment using non-integer indices?
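To make the indexing point concrete, here is a minimal sketch (the vector v and its values are made up for illustration). R silently truncates non-integer subscripts toward zero, so an assignment like F[x + 1] with x between 0 and 1 can only ever touch the first two elements:

```r
# Hypothetical vector, only for illustration
v <- c(10, 20, 30)

# Non-integer subscripts are truncated toward zero, without a warning
v[1.9]   # same element as v[1]
v[2.5]   # same element as v[2]

# The grid from the question, shifted by 1 as in F[x + 1]
x <- seq(0, 1, by = 0.1)
idx <- as.integer(x + 1)   # truncation, as R does for subscripts
unique(idx)                # only the first two positions are ever addressed
```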
alpha <- 30 # The example you are trying to reproduce uses an alpha of 30, if I understood correctly (I'm a paleontologist, not a mathematician, so this paper is way beyond my area of expertise :) )
tau <- 1 # tau is your T (I renamed it to avoid confusion with TRUE)
x <- seq(0, tau, by = 0.001)
f<-rep(0,length(x)) # This is your F (renamed for the same reason as above).
# It has to be created as a vector of zeros before the loop, because the loop accumulates the sum term by term.
# You want one value of f for each element of x, so here is your first loop:
for(i in 1:length(x)){
#Then you want the sum for all k going from 1 to alpha*x[i]/tau:
for(k in 1:floor(alpha*x[i]/tau)){
#And inside that sum, the sum for all j going from k to alpha:
for(j in k:alpha){
# The sum is accumulated term by term (hence f[i] on both sides)
f[i]<-f[i]+(factorial(alpha)/(factorial(alpha-j)*factorial(j-k)*factorial(k)))*(((-1)^(j-k))/(tau^j))*(9/(j+3)^2)
}
}
}
plot(x, f, type = "l", ylab = "", xlab = "")
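One thing worth checking in the loop above: in R, 1:n counts downward when n is 0, so for grid points where floor(alpha*x[i]/tau) is 0 the k loop runs over 1 and 0 instead of being skipped. Whether the outer sum should start at 0 or at 1 is something to confirm against equation (2) in the paper; the sketch below only illustrates the R behaviour, and the helper name count_iterations is made up for this example.

```r
# Hypothetical helper: count how often a loop body runs for a given index vector
count_iterations <- function(index) {
  n <- 0
  for (k in index) n <- n + 1
  n
}

upper <- 0                        # e.g. floor(alpha * x[i] / tau) at x[i] = 0

count_iterations(1:upper)         # 1:0 is c(1, 0), so the body runs twice
count_iterations(seq_len(upper))  # seq_len(0) is empty, so the body is skipped
count_iterations(0:upper)         # a sum starting at k = 0 runs exactly once
```

Using seq_len(n) (for a sum starting at 1) or 0:n (for a sum starting at 0) avoids the accidental downward count.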
I am having some trouble with a statistics homework assignment.
I am required to plot the density and the distribution function in two inline plots, for a set of parameters of my choice (at least four), for the Student, Fisher and chi-squared distributions.
Let's take the Student distribution as an example.
From what I have found on the internet, I have come up with this:
First, I need to generate some random values.
x <- rnorm( 20, 0, 1 )
Question 1: Do I need to generate four of these?
Then I have to plot these values with:
plot(dt( x, df = 1))
plot(pt( x, df = 1))
But how do I do this for four sets of parameters? They should all be shown in the same plot.
Is my approach so far a good one?
Please tell me if I'm wrong.
To plot several densities of a given distribution, you first need a support vector, in this case x below.
Then compute the values of the densities for the parameters of your choice.
Then plot them.
In the code that follows, I plot 4 Student-t pdfs, with degrees of freedom 1 to 4.
x <- seq(-5, 5, by = 0.01) # The support vector
y <- sapply(1:4, function(d) dt(x, df = d))
# Open an empty plot first
plot(1, type = "n", xlim = c(-5, 5), ylim = c(0, 0.5))
for(i in 1:4){
lines(x, y[, i], col = i)
}
Then you can make the graph prettier, by adding a main title, changing the axis titles, etc.
If you want other distributions, such as the F or Chi-squared, you will use x strictly positive, for instance x <- seq(0.0001, 10, by = 0.01).
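Since the question also asks for the distribution function, the same idea carries over with pt in place of dt. The following is a minimal sketch, assuming the same four degrees of freedom as above; the two-panel layout via par(mfrow = ...), the colours and the legend are my own choices, not part of the assignment.

```r
x  <- seq(-5, 5, by = 0.01)   # the support vector
df <- 1:4                     # assumed parameter set: degrees of freedom 1 to 4

dens <- sapply(df, function(d) dt(x, df = d))   # one column per parameter
cdf  <- sapply(df, function(d) pt(x, df = d))

old_par <- par(mfrow = c(1, 2))   # two plots side by side

# matplot draws every column of a matrix against x in one call
matplot(x, dens, type = "l", lty = 1, col = df,
        xlab = "x", ylab = "density", main = "Student t: density")
legend("topright", legend = paste("df =", df), col = df, lty = 1, bty = "n")

matplot(x, cdf, type = "l", lty = 1, col = df,
        xlab = "x", ylab = "probability",
        main = "Student t: distribution function")

par(old_par)   # restore the previous layout
```

For the F or chi-squared distributions, swapping in df/pf or dchisq/pchisq together with a strictly positive x should work the same way.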
Update 2.0: Now with data such that the errors should be reproducible:
Data for the different functions:
z <- seq(0,2,length=1000)
t <- grid <- c(0.1,0.55,0.9)
parA <- c(0.21,-0.93)
parB <- c(0.21,1.008)
p <- c(1,2,1,2)
## for plotting ##
f_func <- function(x) exp(-x^3+x)
envARS1 <- function(x){ exp(parA[1]*x+parB[1])}
envARS2 <- function(x){ exp(parA[2]*x+parB[2])}
plot(x=z,y=envARS1(z), type = "l", col = "blue", ylim = c(0,2), xlim = c(0,2))
lines(x=z,y=envARS2(z), type = "l", col = "red")
lines(x = z,(f_func(z)), type = "l", col = "black")
I'm trying to implement an adaptive rejection sampler using a derivative-free approach. As part of this implementation, I have to write a dynamic envelope function that adjusts to the values and number of some Zt's.
I have managed to write a dynamic envelope function that seems to work fine, but when I try to integrate the envelope, with the final aim of drawing from it, I get errors.
DynamicEnv <- function(x){
exp(parA[p[max(which(x>=grid))]]*x+
parB[p[max(which(x>=grid))]])
}
The envelope function is an exponential of a linear function, and the parameters a and b depend on where its input x is located relative to the Zt's.
The variable 'grid' contains the Zt's and is therefore a vector; p is a dynamic position variable, which essentially tells the function which parameters to use.
The first problem I had was that, when I gave my dynamic envelope a vector as input, I ran into trouble with the 'which' function, which can only handle numeric values, as far as I understand.
Updated with the error I receive from 'which'
I get the below error with which:
Error in which(x > grid) :
dims [product 3] do not match the length of object [1000]
I believe this occurs because 'which' tries to compare the two vectors with each other, rather than the n'th element of x with the entire grid vector.
I then tried to add a loop over all the values in the x vector that returns a vector of output values, but then I got the error message 'non-finite function values' when I tried to integrate my dynamic envelope.
The dynamic envelope with a loop inside is:
DynamicEnv1 <- function(x){
Draws <- matrix(0,length(x),1)
for (i in 1:length(x)) Draws[i,1] <-
exp(parA[p[max(which(x[i]>=grid))]]*x[i] + parB[p[max(which(x[i]>=grid))]])
return(Draws)
}
I have written this 'static' envelope function, which works fine for making draws from it (and therefore for integrating it).
envARSup <- function(x){ (ifelse((x <= t[1] | t[2] < x & x <= t[3]),
exp(parA[1]*x+parB[1]),exp(parA[2]*x+parB[2])))*1*(x>0)}
Here the t's are the Zt's mentioned above. The idea of the dynamic envelope should be clear from this function, since ideally both should return the same values for the same grid (Zt's/t's).
The above function checks which interval the value of x belongs to and, based on the interval, uses a specific exponential linear line.
I would really appreciate it if someone could suggest an alternative to the 'which' function for locating a position in a vector, or help me understand why I get the error message with the loop-based dynamic envelope.
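One possible alternative to 'which' here is base R's findInterval, which is vectorised and tells you, for every element of x, which grid interval it falls into. The sketch below assumes the data from the update above and assumes that the dynamic envelope should reproduce the intervals of the static envARSup (open on the left, closed on the right); the name DynamicEnv2 is made up for this example. Note also that for x below the first grid point, which(x >= grid) is empty and max() of an empty vector is -Inf, which is a likely source of the 'non-finite function values' message if the integration range starts below grid[1].

```r
# Data from the question
grid <- c(0.1, 0.55, 0.9)
parA <- c(0.21, -0.93)
parB <- c(0.21, 1.008)
p    <- c(1, 2, 1, 2)

# Hypothetical vectorised envelope (the name DynamicEnv2 is made up)
DynamicEnv2 <- function(x) {
  # pos is 0 for x <= grid[1], 1 for grid[1] < x <= grid[2], and so on
  pos <- findInterval(x, grid, left.open = TRUE)
  idx <- p[pos + 1]   # shift by 1 because R vectors are 1-based
  exp(parA[idx] * x + parB[idx]) * (x > 0)
}

z   <- seq(0, 2, length.out = 1000)
env <- DynamicEnv2(z)   # accepts a whole vector

# Integrate piece by piece so that each call sees a smooth function
breaks <- c(0, grid, 2)
pieces <- mapply(function(a, b) integrate(DynamicEnv2, a, b)$value,
                 head(breaks, -1), tail(breaks, -1))
area <- sum(pieces)
```

Splitting the integral at the grid points is a design choice rather than a requirement: it keeps the jumps of the envelope at the interval boundaries, where the quadrature rule does not have to resolve them.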
Essentially, I want to plot a compound Poisson process. Everything works fine, except that I don't know how to set the plot parameters correctly.
I want the continuity points drawn as filled dots and the discontinuity points as empty dots. Right now I can only manage the filled dots.
Minimal working example (plots a compound Poisson path with 10 jumps):
n <- 10
n.t <- cumsum(rexp(n))
x <- c(0,cumsum(rnorm(n)))
plot(stepfun(n.t, x), xlab="t", ylab="X",do.points = TRUE,pch = 16,col.points = "blue",verticals = FALSE)
So how can I add the discontinuity points on the right? Any ideas?
You can use points to add the points after the original plot.
set.seed(2017) ## For reproducibility
## Your code
n <- 10
n.t <- cumsum(rexp(n))
x <- c(0,cumsum(rnorm(n)))
plot(stepfun(n.t, x), xlab="t", ylab="X",
do.points = TRUE,pch = 16,col.points = "blue",verticals = FALSE)
## Add the endpoints
points(n.t, x[-length(x)], pch = 1)
I need help customizing the smoothing of my time series data. The code below smooths the data using the sm.regression and approx functions, but the degree of smoothing is not user controlled. I want to be able to control, by changing function parameters, whether the smoothed curve follows the underlying data more closely or less closely.
find.extrema <- function(x)
{
if(is.xts(x)) {
y = as.vector( Cl(x) )
} else {
y = x
}
n = len(y)
t = 1:n
h = h.select(t, y, method = 'cv')
temp = sm.regression(t, y, h=h, display = 'none')
mhat = approx(temp$eval.points, temp$estimate, t, method='linear')$y
#mhat = y #to exactly match underlying data
return (mhat)
}
Any help would be appreciated.
Thanks.
There are not many questions regarding the sm package. Perhaps it is not widely used nowadays, but I still remember playing with it a lot when doing my MRes degree.
You can't control the smoothness because you are using cross-validation to select the smoothing parameter automatically. Just get rid of the h.select line and pass h as an argument of your function.
find.extrema <- function(x, h = NULL)
{
if(is.xts(x)) {
y = as.vector( Cl(x) )
} else {
y = x
}
n = length(y)  ## length() is base R; len() is not
t = 1:n
## if not given, do auto-selection from data
if (is.null(h)) h = h.select(t, y, method = 'cv')
temp = sm.regression(t, y, h=h, display = 'none')
mhat = approx(temp$eval.points, temp$estimate, t, method='linear')$y
#mhat = y #to exactly match underlying data
return (mhat)
}
The whole point of the sm package for kernel smoothing and/or kernel density estimation is the cross-validation part. If you don't use it, you can just use ksmooth from base R for the Nadaraya-Watson kernel estimator. You can read more about it in Scatter plot kernel smoothing: ksmooth() does not smooth my data at all. I made a comparison with sm.regression there.
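As a rough illustration of user-controlled smoothing with ksmooth, here is a minimal sketch on simulated data (the series y, the seed, the helper name smooth_series and the two bandwidth values are all made up for this example). A larger bandwidth gives a smoother curve; a smaller one follows the data more closely.

```r
set.seed(1)                               # for reproducibility
t <- 1:200
y <- sin(t / 20) + rnorm(200, sd = 0.3)   # simulated noisy series (assumption)

# Nadaraya-Watson estimate evaluated at the original time points
smooth_series <- function(t, y, bandwidth) {
  ksmooth(t, y, kernel = "normal", bandwidth = bandwidth, x.points = t)$y
}

close_fit  <- smooth_series(t, y, bandwidth = 5)    # follows the data closely
smooth_fit <- smooth_series(t, y, bandwidth = 40)   # much smoother

plot(t, y, col = "grey", xlab = "t", ylab = "y")
lines(t, close_fit, col = "blue")
lines(t, smooth_fit, col = "red")
```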
I have a matrix of complex values.
If I issue the command:
plot(myMatrix)
then the graphics device shows a kind of scatterplot, with the X-axis labeled Re(myMatrix) and the Y-axis labeled Im(myMatrix). This shows the information I'm looking for, as I can see distinct clusters that I cannot see with only one column.
My questions are:
I assume there is one point per matrix row. Is that right?
How is Re(myMatrix) calculated for each row vector?
It is not Re(myMatrix[1,row]), but seems to be a mix of all the values of the row vector. I would like to be able to get these values, so I need to know how to compute them in R.
No, there is one point for each matrix element.
set.seed(42)
mat <- matrix(complex(real = rnorm(16), imaginary = rlnorm(16)), 4)
plot(mat)
points(Re(mat[1,1]), Im(mat[1,1]), col = "red", pch = ".", cex = 5)
Look for the red dot:
You'd get the same plot, if you plotted a vector instead of a matrix, i.e., plot(c(mat)).
This happens because plot.default calls xy.coords and that function contains the following code:
else if (is.complex(x)) {
y <- Im(x)
x <- Re(x)
xlab <- paste0("Re(", ylab, ")")
ylab <- paste0("Im(", ylab, ")")
}
else if (is.matrix(x) || is.data.frame(x)) {
This means that the input being complex takes priority over it being a matrix.
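To get at the plotted values directly: the coordinates are simply the real and imaginary parts of every element. A minimal sketch, reusing mat from above; the data frame coords and its column names are my own choice for illustration.

```r
set.seed(42)
mat <- matrix(complex(real = rnorm(16), imaginary = rlnorm(16)), 4)

# One row per matrix element, in column-major order (the order of c(mat))
coords <- data.frame(
  row = c(row(mat)),   # row index of each element
  col = c(col(mat)),   # column index of each element
  re  = Re(c(mat)),    # x coordinate in plot(mat)
  im  = Im(c(mat))     # y coordinate in plot(mat)
)

# Same points as plot(mat), but coloured by matrix row
plot(coords$re, coords$im, col = coords$row, pch = 16,
     xlab = "Re(mat)", ylab = "Im(mat)")
```

Keeping the row and column indices next to the coordinates makes it possible to see which matrix row each point in a cluster comes from.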