R nls fit returning only the starting parameter values

I am automating my trading in R. I am trying to use nls to optimise my formula, but I only get back the initial starting parameters that I enter. Instead of using trial and error, I am trying to find a function that returns the optimal parameter values for my strategy.
I have tried entering various values for the parameters "a" and "b", but I only get back the starting values I enter and no optimisation takes place. I am not sure whether I am using the wrong function or whether there is a more appropriate one I should be using. The code below shows what I have tried; the variable values (the ones given by the market, not the ones I am trying to optimise) have been generated randomly because I do not know how to upload the market data into this post.
# VARIABLES
x <- 1:1000 # number instead of date
y <- round((runif(1000, min=0, max=50)), digits=2) # highest price of the day minus the opening price of the day
z <- round((runif(1000, min=0.001, max=0.040)), digits=6) # implied volatility for the day
w <- sample(2000:2800, 1000, replace=TRUE) # opening price for the day
# FORMULA
# OPEN PRICE OF THE DAY - MULTIPLIED - BY IMPLIED VOLATILITY FOR THE DAY = (APPROXIMATELY) HIGHEST PRICE OF THE DAY - MINUS - OPEN PRICE FOR THE DAY
( w * (1 + z)) - w = y
# OPTIMISED FORMULA FORMAT
(( w * ((1 + z) * a)) * b) - w = y # ATTEMPTING TO OPTIMISE MY FORMULA TO IMPROVE THE ACCURACY OF THE RESULT FOR THE EXPECTED HIGH (y)
# TRYING WITH STARTING VALUES
a <- 0.000001
b <- 0.000001
# USING nls function and fit
m<-nls( y~ (( w * ((1 + z) * a)) - w)) + b
# OR
m<-nls( y~(( w * ((1 + z) * a)) * b)) - w
I am trying to get the values of the parameters "a" and "b" which best suit either version of my formula, so that the expected high approximates the realised high more closely. Thanks in advance for any help you might be able to offer.

It is difficult to understand the function you want to optimise. Try something like this:
m <- nls(y ~ w * (1 + z) * a - b * w, start = list(a = a, b = b))
m
> m
Nonlinear regression model
model: y ~ w * (1 + z) * a - b * w
data: parent.frame()
a b
0.0089771 -0.0008416
residual sum-of-squares: 221244
Number of iterations to convergence: 1
Achieved convergence tolerance: 1.944e-07
> coef(m)
a b
0.0089771178 -0.0008416359
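Not part of the original answer, but a quick, hedged sketch of how one might check whether the optimised a and b actually improve the approximation of the realised high; it assumes the fitted model m from above.
pred <- predict(m)                       # fitted expected (high - open) for each day
head(cbind(observed = y, fitted = pred)) # side-by-side comparison
sqrt(mean((y - pred)^2))                 # root mean squared error of the fit
coef(m)                                  # the optimised values of a and b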

Related

How can I find LGCP random field Lambda values over the overall area?

There is an rLGCP model example that uses the RandomFields package:
library(spatstat)  # provides rLGCP, as.im, owin
if (require(RandomFields)) {
  # homogeneous LGCP with exponential covariance function
  X <- rLGCP("exp", 3, var = 0.2, scale = 0.1)
  # inhomogeneous LGCP with Gaussian covariance function
  m <- as.im(function(x, y) { 5 - 1.5 * (x - 0.5)^2 + 2 * (y - 0.5)^2 }, W = owin())
  X <- rLGCP("gauss", m, var = 0.15, scale = 0.5)
  plot(attr(X, "Lambda"))
  points(X)
}
I think the Lambda attribute of X does not show the values over the whole two-dimensional area.
How can I find the Lambda values over the overall area?
I'm not entirely sure if this is what you are looking for, but the matrix of Lambda values for each grid point of the plot is stored in the Lambda attribute of the object returned by spatstat::rLGCP.
You can access them like this:
m <- as.im(function(x, y){5 - 1.5 * (x - 0.5)^2 + 2 * (y - 0.5)^2}, W=owin())
X <- rLGCP("gauss", m, var=0.15, scale = 0.5)
lambda_matrix <- attr(X, "Lambda")$v
Now lambda_matrix is a 128 x 128 matrix containing the value of Lambda at each point on the grid.
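To summarise Lambda over the whole observation window rather than inspect the raw matrix, note that attr(X, "Lambda") is a spatstat pixel image (an im object), so the usual image summaries apply. A minimal sketch, assuming the X simulated above (on recent spatstat versions rLGCP may live in the spatstat.random sub-package):
library(spatstat)              # im-class utilities: summary, integral.im, mean
Lambda <- attr(X, "Lambda")    # pixel image of the driving random intensity
summary(Lambda)                # range and mean of Lambda over the window
integral.im(Lambda)            # integral of Lambda over the whole window,
                               # i.e. the expected number of points given this Lambda
mean(Lambda)                   # average intensity over the window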

Importance sampling in R

I'm a beginner in statistics and am currently learning importance sampling. I have searched through similar problems here but still can't get mine solved.
If I need to evaluate E(x) of a target distribution
f(x)=2 * x * exp(-x^2), x>0
By using Importance Sampling, I take a proposal distribution
g(x)=exp(-x)
Then
E(x)=integral(x* (f(x)/g(x)) * g(x) dx)
=integral(exp(-x) * 4 * x^2 dx)
My R code was like this
x=rexp(1000)
w=4*x^2
y=exp(-w)
mean(y)
Am I doing it right?
Thanks a lot for your help!
I think you might want to do something like this:
x <- rexp(n = 1000, rate = 1)    # draws from the proposal g(x) = exp(-x)
fx <- function(x){
  return(2 * x * exp(-x^2))      # target density f(x)
}
gx <- function(x){
  return(exp(-x))                # proposal density g(x)
}
Ex <- mean(x * fx(x) / gx(x))    # importance-sampling estimate of E(x)
It is simply the weighted sample mean.
The non-weighted sample mean mean(x) gives you the expectation of the proposal density, while the weighted sample mean mean(w * x) gives the expectation of the target density. But you are using the wrong weight; the correct one is w <- 2 * x * exp(-x^2 + x).
If I were you, I would not compute weights myself. I would do
set.seed(0)
x <- rexp(1000) ## samples from proposal density
f <- function(x) 2 * x *exp(-x^2) ## target density
w <- f(x) / dexp(x) ## importance weights
mean(x) ## non-weighted sample mean
# [1] 1.029677
mean(w * x) ## weighted sample mean
# [1] 0.9380861
In theory, the expectation of the weights should be 1, but in practice you only get close to 1:
mean(w)
# [1] 1.036482
So you might want the self-normalized version:
mean(w * x) / mean(w)
# [1] 0.9050671
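For reference (my addition, not part of either answer): the target mean has a closed form here, which makes a handy sanity check for the weighted estimates above.
## E(X) under f(x) = 2 * x * exp(-x^2), x > 0, is sqrt(pi) / 2
sqrt(pi) / 2                                                # about 0.886
integrate(function(x) x * 2 * x * exp(-x^2), 0, Inf)$value  # numerical confirmation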

Probability choose (N, K) R

I have calculated a likelihood function for a sampling-without-replacement problem.
How, theoretically, can we convert this likelihood function into a choose(N, K) form?
Additionally, if I plot this function with N on the x axis and the probability given by the function on the y axis, what is the variance of the plotted distribution?
Thanks,
Your question is a follow-up to How to plot a factorial function in R. I will not repeat the information, background, and code given in my answer there.
Regarding your request for a derivation, it is simply this: don't ask any more; do a little math yourself. This is a programming site, not the place for that kind of question.
Now, regarding the computation of the variance, we use the standard result var(X) = E(X^2) - E(X)^2.
## P below has already been rescaled so that it sums to 1
## mean
MEAN <- sum(N * P)
# [1] 726.978
## variance
VAR <- sum(N * (N * P)) - MEAN ^ 2
# [1] 55342.9
## standard deviation
SD <- sqrt(VAR)
# [1] 235.2507
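For completeness, here is a self-contained sketch of the same mean/variance computation. The support N and likelihood P below are hypothetical stand-ins (a capture-recapture style sampling-without-replacement likelihood), since the real ones come from the linked answer.
N <- 500:1000                       # hypothetical support for the population size
P <- dhyper(5, 50, N - 50, 10)      # hypothetical sampling-without-replacement likelihood
P <- P / sum(P)                     # rescale so that P sums to 1
MEAN <- sum(N * P)                  # E(N)
VAR <- sum(N^2 * P) - MEAN^2        # var(N) = E(N^2) - E(N)^2
SD <- sqrt(VAR)
c(mean = MEAN, sd = SD)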

Calculate the volume under a plot of kernel bivariate density estimation

I need to calculate a measure called mutual information. First of all, I need to calculate another measure, entropy; for example, the joint entropy of x and y:
-∬ p(x,y) · log p(x,y) dx dy
To obtain p(x,y) I used the kernel density estimator (the function kde2d), which returned the Z values (the estimated density of having x and y in that window).
So by now I have a matrix of Z values [1x100] x [1x100], which is my p(x,y). But I have to integrate it, that is, find the volume under the surface (a double integral), and I have not found a way to do that. The function quad2d, which computes a double quadrature, did not work, because I only have a numerical matrix for p(x,y), not a function, and it gives me a constant...
Does anyone know how to find that volume / calculate the double integral?
The persp3d plot of the estimated density is omitted here.
Thanks everybody!
Once you have the results from kde2d, it is very straightforward to compute the numerical integral. The example session below sketches how to do it.
As you know, a numerical double integral is just a 2D summation. By default, kde2d takes range(x) and range(y) as the 2D domain. Since you got a 100 * 100 matrix, I assume you set n = 100 when calling kde2d. Now den$x and den$y define a 100 * 100 grid, with den$z giving the density on each grid cell. It is easy to compute the size of each grid cell (they are all equal); then we do three steps:
1. find the normalizing constant: although in theory the density integrates to 1, after discretization it only approximately does, so we compute this constant for later rescaling;
2. form the integrand for entropy, z * log(z); since z is a 100 * 100 matrix, this is also a matrix. Sum it up and multiply by the cell size cell_size to get a non-normalized entropy;
3. rescale the non-normalized entropy into the normalized one.
## sample data: bivariate normal, with covariance/correlation 0
set.seed(123); x <- rnorm(1000, 0, 2) ## marginal variance: 4
set.seed(456); y <- rnorm(1000, 0, 2) ## marginal variance: 4
## load MASS
library(MASS)
## domain:
xlim <- range(x)
ylim <- range(y)
## 2D Kernel Density Estimation
den <- kde2d(x, y, n = 100, lims = c(xlim, ylim))
##persp(den$x,den$y,den$z)
z <- den$z ## extract density
## den$x, den$y expands a 2D grid, with den$z being density on each grid cell
## numerical integration is straightforward, by aggregating over all cells
## the size of each grid cell (a rectangular cell) is:
cell_size <- (diff(xlim) / 100) * (diff(ylim) / 100)
## normalizing constant; ideally should be 1, but actually only close to 1 due to discretization
norm <- sum(z) * cell_size
## your integrand: z * log(z) * (-1):
integrand <- z * log(z) * (-1)
## get numerical integral by summation:
entropy <- sum(integrand) * cell_size
## self-normalization:
entropy <- entropy / norm
Verification
The above code gives entropy of 4.230938. Now, Wikipedia - Multivariate normal distribution gives entropy formula:
(k / 2) * (1 + log(2 * pi)) + (1 / 2) * log(det(Sigma))
For the above bivariate normal distribution, we have k = 2. We have Sigma (covariance matrix):
4 0
0 4
whose determinant is 16. Hence, the theoretical value is:
(1 + log(2 * pi)) + (1 / 2) * log(16) = 4.224171
Good match!
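The original goal was mutual information, so here is a follow-up sketch (my addition, not in the original answer) that reuses the same grid-summation idea for the marginal entropies, giving MI = H(X) + H(Y) - H(X, Y). The helper name marginal_entropy is mine, and the normalisation mirrors the joint-entropy code above.
marginal_entropy <- function(v, n = 512) {
  d <- density(v, n = n)          # 1D kernel density estimate on a regular grid
  dx <- diff(d$x[1:2])            # grid spacing
  p <- d$y
  norm <- sum(p) * dx             # normalizing constant, close to 1
  -sum(p * log(p)) * dx / norm    # normalized marginal entropy
}
Hx <- marginal_entropy(x)
Hy <- marginal_entropy(y)
MI <- Hx + Hy - entropy           # `entropy` is the joint entropy computed above
MI                                # close to 0 here, since x and y were drawn independently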

normal approximation in R wilcox.test()

I have a question about normal approximations in the wilcox.test() function.
I would intuitively expect the results of these calculations to be identical:
vec1 <- c(10,11,12)
wilcox.test(vec1,rep(0,10),exact=FALSE,correct = FALSE)
wilcox.test(vec1,c(runif(8),0,0),exact=FALSE,correct=FALSE)
but this is far from the case (0.0006056 vs 0.01112).
From the wilcox.test documentation:
"an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used."
It is unclear to me how the normal approximation is calculated based on the documentation.
Searching the net (eg. wiki, Mann-Whitney U-test), it seems that it can be calculated by:
U = sum of ranks of vec1 (-1 in R)
mU = length(vec1)*length(vec2)/2
sdU = sqrt(length(vec1)*length(vec2)*(length(vec1)+length(vec2)+1)/12)
z = (U-mU)/sdU
pval = 2*pnorm(-abs(z))
But since U and the vector lengths in this case are identical, this obviously is not the way R calculates the normal approximation.
So my question is how the normal approximation is calculated by wilcox.test() in R.
The inconsistency with the formulas above is due to ties, which are taken into account in the variance calculation. Below is the relevant wilcox.test code taken from the R source:
NTIES <- table(r)
z <- STATISTIC - n.x * n.y / 2
SIGMA <- sqrt((n.x * n.y / 12) *
((n.x + n.y + 1) - sum(NTIES^3 - NTIES)
/ ((n.x + n.y) * (n.x + n.y - 1))))
where n.x, n.y are lengths of first and second sample, r is rank vector of combined samples.
By the way, name that quantity sdU rather than varU, since you took the square root and it is a standard deviation.
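To see the approximation end to end, here is a sketch (my addition) that reproduces the p-value by hand for the second call in the question, using exactly the quantities from the source shown above (exact = FALSE, correct = FALSE, i.e. no continuity correction).
x <- c(10, 11, 12)
y <- c(runif(8), 0, 0)
n.x <- length(x); n.y <- length(y)
r <- rank(c(x, y))                                        # ranks of the combined sample
STATISTIC <- sum(r[seq_along(x)]) - n.x * (n.x + 1) / 2   # Mann-Whitney U (R's W)
NTIES <- table(r)                                         # tie counts
z <- STATISTIC - n.x * n.y / 2
SIGMA <- sqrt((n.x * n.y / 12) *
              ((n.x + n.y + 1) - sum(NTIES^3 - NTIES) /
                 ((n.x + n.y) * (n.x + n.y - 1))))
z <- z / SIGMA
2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))           # hand-computed p-value
wilcox.test(x, y, exact = FALSE, correct = FALSE)$p.value # should agree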
