How does the density function in R incorporate weights if they are specified (assume weights sum to 1, which is what the function wants)? I mean mathematically, how does it work? I know how to look at the underlying R code for a function but not when it just returns a generic method like this:
> density
function (x, ...)
UseMethod("density")
<bytecode: 0x00000000079ee728>
<environment: namespace:stats>
The reason I'm asking is that I have made some nice side-by-side empirical density plots using unweighted and weighted samples, which show the benefits of weighting one's sample to make the distribution of covariates more balanced between groups. These were all continuous covariates. Now I want to do the same thing with dichotomous variables, but the density function isn't great for this. I want to see if I can apply the same weighting method to generate side-by-side box plots for the dichotomous covariates that I have.
This is an exercise in source code hunting, but here goes:
In density.default, the relevant part (besides checking the weights are valid) is only the line:
y <- .Call(C_BinDist, x, weights, lo, up, n) * totMass
In the relevant source file, massdist.c we find (comments my own):
for(R_xlen_t i = 0; i < XLENGTH(sx); i++) {
    if(R_FINITE(x[i])) {
        double xpos = (x[i] - xlo) / xdelta;
        int ix = (int) floor(xpos);
        double fx = xpos - ix;
        double wi = w[i]; // w: weights vector
        if(ixmin <= ix && ix <= ixmax) {
            y[ix] += (1 - fx) * wi;
            y[ix + 1] += fx * wi;
        }
        else if(ix == -1) y[0] += fx * wi;
        else if(ix == ixmax + 1) y[ix] += (1 - fx) * wi;
    }
}
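So each weight w[i] is spread linearly over the two grid points that bracket x[i] ("linear binning"), and the binned weights are then convolved with the kernel to give the density estimate. Mathematically, the estimate density() targets when weights are supplied is the weighted KDE f(t) = sum_i w[i] * K((t - x[i]) / bw) / bw. Here is a minimal R sketch that evaluates that formula directly (not the internal binning/FFT code) so you can check it against density():

set.seed(1)
x <- rnorm(50)
w <- runif(50); w <- w / sum(w)   # weights summing to 1, as density() expects
d <- density(x, weights = w)      # default gaussian kernel

## evaluate the weighted kernel density estimate directly on density()'s grid
f_direct <- sapply(d$x, function(t) sum(w * dnorm((t - x) / d$bw)) / d$bw)
max(abs(d$y - f_direct))          # small: only the binning approximation differs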
I'm trying to write a square root function in R. The function is supposed to behave like sqrt() but not use that function of course. I'm supposed to use Newton's method for computing the square root, which is:
y(a+1) = [y(a) + x / y(a)]/2
Here x is the number I'm trying to calculate the square root of and y(0) would be the initial guess of the square root of x.
The function is supposed to take four arguments: x (the number whose square root I'm trying to compute), eps (the difference between successive iterations below which they are considered equal), iter (the max number of iterations), and verbose (whether to output intermediate results).
My issue is that I am not very well versed in writing functions in R. I have experience in C++, but functions are written slightly differently in R.
I believe I'm supposed to write something like this:
Ask the user to input a number as the initial guess for the square root. Make a for loop from 1 to iter with two if statements: 1) stop the function and output the y value if the max number of iterations has been reached; 2) stop the function and output the y value if the difference between successive iterations is less than eps.
Here is the code I have so far:
MySqrt <- function(x, eps = 1e-6, iter = 100, verbose = TRUE) {
  for (i in 0:itmax) {
    y[0] <- readline(prompt = "Please enter your initial square root guess: ")
    y[i + 1] = (y[i] + x / y[i]) / 2
    if (i == 100) {
      stop(return(y[i + 1]))
    }
    if (abs(y[i + 1] - y[i]) < eps) {
      stop(return(y[i + 1]))
    }
  }
  return(y[i + 1])
}
Here is the error I receive after entering the initial square root guess: Error in y[0] <- readline(prompt = "Please enter your initial square root guess: ") :
object 'y' not found
Honestly, I didn't expect the code to work, because I'm sure there is more than one error.
You should use iter instead of itmax.
I initialized y within the function, and the input for y should be converted to a number rather than left as a character string. You could also simplify the if statement by using || (or).
I also added a cat() call so you can see what i is before the function returns the square root value.
MySqrt <- function(x, eps = 1e-6, iter = 100, verbose = TRUE) {
  y <- 0
  y[1] <- as.numeric(readline(prompt = "Please enter your initial square root guess: "))
  for (i in 1:iter) {
    y[i + 1] <- as.numeric((y[i] + (x / y[i])) / 2)
    if (i == iter || abs(y[i + 1] - y[i]) < eps) {
      cat("This is", i, "th try: \n")
      return(y[i + 1])
    }
  }
}
Or simply try this:
newton.raphson <- function(x, start, epsilon = 0.0001, maxiter = 100) {
  y <- c(start)  # initial guess
  a <- 1         # number of iterations
  while (TRUE) {
    y <- c(y, (y[a] + x / y[a]) / 2)
    if (abs(y[a + 1] - y[a]) < epsilon | a > maxiter) {  # converged or exceeded maxiter
      return(y[a + 1])
    }
    a <- a + 1
  }
}
newton.raphson(2, 0.5, 0.01)
# [1] 1.414234
newton.raphson(3, 0.5, 0.01)
# [1] 1.732051
Since sqrt(y) <= y/2 for y >= 4, y/2 is a reasonable starting guess; then iterate until x*x is within a small tolerance of y (1e-10 in the code below):
sqrnt <- function(y) {
  x <- y / 2
  while (abs(x * x - y) > 1e-10) {
    x <- (x + y / x) / 2
  }
  x
}
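A quick usage check against the built-in:

sqrnt(2)   # 1.414214
sqrt(2)    # 1.414214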
In Newton's method, if you want to know the square root of a, you start with an estimate x (for example a/2) and compute a better estimate with the following formula:
y = (x + a / x) / 2
If y != x, you set x = y and repeat until y == x. Then you have the square root of a. Please see the code below:
square_root <- function(a) {
  x <- a / 2
  while (TRUE) {
    y <- (x + a / x) / 2
    if (y == x) break
    x <- y
  }
  return(y)
}
Suppose that I have a function
double f(vector<double> &x) {
    // do something with x
    return answer;
}
Mathematically, f is a continuous function with respect to each component of x. Now I want to evaluate the numerical gradient of f with respect to x. There are two methods, as follows:
Method 1.
double DELTA = 1e-5;
double y1 = f(x);
vector<double> gradX(x.size());
for (int i = 0; i < x.size(); ++i) {
    x[i] += DELTA;
    double y2 = f(x);
    gradX[i] = (y2 - y1) / DELTA;
    x[i] -= DELTA;
}
Method 2.
double DELTA = 1e-5;
vector<double> gradX(x.size());
for (int i = 0; i < x.size(); ++i) {
    x[i] += DELTA;
    double y2 = f(x);
    x[i] -= 2.0 * DELTA;
    double y1 = f(x);
    gradX[i] = (y2 - y1) / (2.0 * DELTA);
    x[i] += DELTA;
}
I am observing that Method 1 gives very unreasonable numbers (ones with 6 digits), while Method 2 gives more reasonable ones.
Is there any reason that Method 2 is a better one? Should it always be preferred?
Thanks.
Edit: For a little more context. These implementations are done in C, with some uses of CUDA kernels.
This is an expected result: Method 1 is first-order accurate (forward difference), while Method 2 is second-order accurate (central difference). This can easily be shown using Taylor series; for further information you may read any book on finite difference methods.
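For reference, the standard Taylor expansion behind both statements (nothing specific to your code) is

f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + O(h^3),

so the forward difference satisfies

\frac{f(x+h) - f(x)}{h} = f'(x) + \frac{h}{2} f''(x) + O(h^2) = f'(x) + O(h),

while subtracting the expansion of f(x-h) cancels the even-order terms:

\frac{f(x+h) - f(x-h)}{2h} = f'(x) + \frac{h^2}{6} f'''(x) + O(h^4) = f'(x) + O(h^2).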
To get similar accuracy from the first-order method, you have to use a smaller DELTA than for the second-order method.
As is clear from your implementation, Method 2 is more costly (it evaluates f twice per component of x), so it can be beneficial to use Method 1 with a smaller DELTA. However, if accuracy is of more concern, then you may use Method 2 with a smaller DELTA.
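A quick numerical check of those error orders (sketched in R purely for convenience, even though the question is about C++; sin is just an arbitrary test function with a known derivative):

f  <- function(x) sin(x)
x0 <- 1
h  <- 10^-(1:6)
fwd <- (f(x0 + h) - f(x0)) / h              # forward difference, O(h) error
ctr <- (f(x0 + h) - f(x0 - h)) / (2 * h)    # central difference, O(h^2) error
cbind(h, fwd_err = abs(fwd - cos(x0)), ctr_err = abs(ctr - cos(x0)))

The forward-difference error shrinks by roughly a factor of 10 per decade of h, the central-difference error by roughly 100, until rounding error takes over for very small h.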
What is behind approx and approxfun? I know that these two functions perform linear interpolation; however, I couldn't find any reference on how they do it. I guess they use a least-squares regression model, but I am not sure.
Finally, if it's true that they use a least-squares regression model, what is the difference between them and lm + predict?
As commented, you should read the source code. The interpolation problem is: find y(v), given (x,y)[i], i = 0,..,n-1.
For linear interpolation, approxfun uses this simple algorithm:
find the correct interval (i,j) by bisection, then
use i and j for linear interpolation.
Here is some R code that paraphrases the C function approx1:
approx1 <- function(v, x, y)
{
  ## Approximate y(v), given (x,y)[i], i = 0,..,n-1
  i <- 1
  j <- length(x)
  ij <- 0
  ## find the correct interval by bisection
  while (i < (j - 1)) {
    ij <- floor((i + j) / 2)
    if (v < x[ij])
      j <- ij
    else
      i <- ij
  }
  ## linear interpolation
  if (v == x[j]) return(y[j])
  if (v == x[i]) return(y[i])
  return(y[i] + (y[j] - y[i]) * ((v - x[i]) / (x[j] - x[i])))
}
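A quick sanity check of the paraphrased function against stats::approx (note the paraphrase assumes x is sorted and v lies inside its range; the real C code also handles the extrapolation rules):

x <- c(1, 2, 4, 7)
y <- c(10, 20, 40, 70)
approx1(3, x, y)           # 30
approx(x, y, xout = 3)$y   # 30, the same linear interpolation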
I tried to implement the Bessel function using that formula; this is the code:
function result = Bessel(num)
if num == 0
    result = bessel(0, 1);
elseif num == 1
    result = bessel(1, 1);
else
    result = 2*(num-1)*Bessel(num-1) - Bessel(num-2);
end
But if I compare it with MATLAB's bessel function, I get wildly different values.
For example, if I type Bessel(20) it gives me 3.1689e+005 as the result, whereas if I type bessel(20,1) it gives me 3.8735e-025, a totally different result.
Such recurrence relations are nice in mathematics but numerically unstable when implemented with limited-precision floating-point representations.
Consider the following comparison:
x = 0:20;
y1 = arrayfun(@(n) besselj(n,1), x);   %# builtin function
y2 = arrayfun(@Bessel, x);             %# your function
semilogy(x,y1, x,y2), grid on
legend('besselj','Bessel')
title('J_\nu(z)'), xlabel('\nu'), ylabel('log scale')
So you can see how the computed values start to differ significantly after order 9.
According to MATLAB:
BESSELJ uses a MEX interface to a Fortran library by D. E. Amos.
and gives the following as references for their implementation:
D. E. Amos, "A subroutine package for Bessel functions of a complex
argument and nonnegative order", Sandia National Laboratory Report,
SAND85-1018, May, 1985.
D. E. Amos, "A portable package for Bessel functions of a complex
argument and nonnegative order", Trans. Math. Software, 1986.
The forward recurrence relation you are using is not stable. To see why, consider that the values of BesselJ(n,x) become smaller and smaller, by roughly a factor of 1/(2n) at each step. You can see this by looking at the first term of the Taylor series for J.
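Concretely, the leading term of the series expansion gives

J_n(x) \approx \frac{1}{n!}\left(\frac{x}{2}\right)^n, \qquad \frac{J_{n+1}(x)}{J_n(x)} \approx \frac{x}{2(n+1)},

so at x = 1 each successive order is roughly 1/(2n) times the previous one.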
So, what you're doing is subtracting a large number from a multiple of a somewhat smaller number to get an even smaller number. Numerically, that's not going to work well.
Look at it this way. We know the result is of the order of 10^-25. You start out with numbers that are of the order of 1. So in order to get even one accurate digit out of this, we have to know the first two numbers with at least 25 digits precision. We clearly don't, and the recurrence actually diverges.
Using the same recurrence relation to go backwards, from high orders to low orders, is stable. When you start with correct values for J(20,1) and J(19,1), you can calculate all orders down to 0 with full accuracy as well. Why does this work? Because now the numbers are getting larger in each step. You're subtracting a very small number from an exact multiple of a larger number to get an even larger number.
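Here is a sketch of that downward recursion, written in R only because it has a built-in besselJ to supply the two accurate starting values (the same idea works in MATLAB with besselj):

x <- 1
J <- numeric(21)                 # J[k] holds J_{k-1}(x)
J[21] <- besselJ(x, 20)          # accurate starting values at the top
J[20] <- besselJ(x, 19)
for (n in 19:1) {
  ## J_{n-1}(x) = (2n/x) * J_n(x) - J_{n+1}(x)
  J[n] <- (2 * n / x) * J[n + 1] - J[n + 2]
}
max(abs(J - besselJ(x, 0:20)))   # essentially zero: the errors shrink on the way down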
You can just modify the code below, which is for the spherical Bessel function. It is well tested and works for all arguments and order ranges. I am sorry it is in C#.
public static Complex bessel(int n, Complex z)
{
    if (n == 0) return sin(z) / z;
    if (n == 1) return sin(z) / (z * z) - cos(z) / z;
    if (n <= System.Math.Abs(z.real))
    {
        Complex h0 = bessel(0, z);
        Complex h1 = bessel(1, z);
        Complex ret = 0;
        for (int i = 2; i <= n; i++)
        {
            ret = (2 * i - 1) / z * h1 - h0;
            h0 = h1;
            h1 = ret;
            if (double.IsInfinity(ret.real) || double.IsInfinity(ret.imag)) return double.PositiveInfinity;
        }
        return ret;
    }
    else
    {
        double u = 2.0 * abs(z.real) / (2 * n + 1);
        double a = 0.1;
        double b = 0.175;
        int v = n - (int)System.Math.Ceiling((System.Math.Log(0.5e-16 * (a + b * u * (2 - System.Math.Pow(u, 2)) / (1 - System.Math.Pow(u, 2))), 2)));
        Complex ret = 0;
        while (v > n - 1)
        {
            ret = z / (2 * v + 1.0 - z * ret);
            v = v - 1;
        }
        Complex jnM1 = ret;
        while (v > 0)
        {
            ret = z / (2 * v + 1.0 - z * ret);
            jnM1 = jnM1 * ret;
            v = v - 1;
        }
        return jnM1 * sin(z) / z;
    }
}
I have 2 tables of values and want to scale the first one so that it matches the 2nd one as well as possible. Both have the same length. If both are drawn as graphs in a diagram, they should be as close to each other as possible. But I do not want quadratic weights, just simple linear (absolute) ones.
My problem is, that I have no idea how to actually compute the best scaling factor because of the Abs function.
Some pseudocode:
//given:
float[] table1 = ...;
float[] table2 = ...;
//wanted:
float factor = ???; // I have no idea how to compute this
float remainingDifference = 0;
for (int i = 0; i < length; i++)
{
    float scaledValue = table1[i] * factor;
    // Sum up the differences. I use the Abs function because negative differences are differences too.
    remainingDifference += Abs(scaledValue - table2[i]);
}
I want to compute the scaling factor so that the remainingDifference is minimal.
Simple linear weights are hard, like you said.
a_n = target sequence (table2 in your notation)
b_n = sequence being scaled (table1)
c = scaling factor
Your residual function is (sums are from i=1 to N, the number of points):
SUM( |a_i - c*b_i| )
Taking the derivative with respect to c yields:
d/dc SUM( |a_i - c*b_i| )
= SUM( -b_i * (a_i - c*b_i)/|a_i - c*b_i| ) = -SUM( b_i * sign(a_i - c*b_i) )
Setting to 0 and solving for c is hard. I don't think there's an analytic way of doing that. You may want to try https://math.stackexchange.com/ to see if they have any bright ideas.
However if you work with quadratic weights, it becomes significantly simpler:
d/dc SUM( (a_i - c*b_i)^2 )
= SUM( 2*(a_i - c*b_i) * -b_i )
= -2 * SUM( b_i*(a_i - c*b_i) ) = 0
=> SUM(a_i*b_i) - c*SUM(b_i^2) = 0
=> c = SUM(a_i*b_i) / SUM(b_i^2)
I strongly suggest the latter approach if you can.
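For completeness, the quadratic-loss factor is a one-liner; a hedged R sketch in the question's terms (table1 is scaled to match table2, with made-up example data):

table1 <- c(1, 2, 3, 4)
table2 <- c(2.1, 3.9, 6.2, 7.8)
factor <- sum(table1 * table2) / sum(table1^2)   # c = SUM(a_i*b_i) / SUM(b_i^2)
factor                                           # close to 2 for this example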
I would suggest trying some sort of variant on Newton-Raphson.
Construct a function Diff(k) that looks at the difference in area between your two graphs between fixed markers A and B.
Mathematically, I guess it would be integral(x = A to B){ f(x) - k * g(x) } dx.
Realistically you could just subtract the values: if you range from x = -10 to 10 and have a data point for f(i) and g(i) at each integer i in [-10, 10] (i.e. 21 data points), then you just compute sum(i = -10 to 10){ f(i) - k * g(i) }.
Basically you would expect this function to look like a parabola: there will be an optimum k, and deviating slightly from it in either direction will increase the overall area difference, and the bigger the deviation, the bigger the gap you would expect. So this should be a pretty smooth function (if you have a lot of data points).
You want to minimise Diff(k), so you want to find where its derivative d/dk Diff(k) = 0. Just do Newton-Raphson on this new function D'(k): kick it off at k = 1 and it should home in on a solution pretty fast. That's probably going to give you close to optimal computation time.
If you want something simpler, just start with some k1 and k2 whose Diff values lie on either side of 0. Say Diff(1.5) = -3 and Diff(2.9) = 7; then you would pick a k, say 3/10 of the way (10 = 7 - (-3)) between 1.5 and 2.9, and depending on whether that yields a positive or negative value, use it as the new k1 or k2; rinse and repeat.
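If you would rather not hand-roll Newton-Raphson at all, a simpler stand-in (sketched in R, not the method described above) is to minimise the L1 objective directly with a one-dimensional optimiser; this works because sum(abs(target - k*scaled)) is convex in k:

obj <- function(k, target, scaled) sum(abs(target - k * scaled))
scaled <- c(1, 2, 3, 4)                  # table1 from the question (made-up data)
target <- c(2.1, 3.9, 6.2, 7.8)          # table2
optimize(obj, interval = c(0, 10), target = target, scaled = scaled)$minimum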
In case anyone stumbles upon this in the future, here is some code (C++).
The trick is to first sort the samples by the scaling factor that would be the best fit for each individual pair of values. Then start at both ends and iterate towards the factor that results in the minimum absolute deviation (L1 norm).
Everything except the sort has linear run time, so the overall runtime is O(n log n).
#include <algorithm>  // std::sort
#include <cmath>      // std::isfinite, std::abs
#include <memory>     // std::unique_ptr

/*
 * Find x so that the sum over std::abs(pA[i]-pB[i]*x) from i=0 to (n-1) is minimal
 * Then return x
 */
float linearFit(const float* pA, const float* pB, int n)
{
    /*
     * An algebraic solution is not possible for the general case
     * => iterative algorithm
     */
    if (n < 0)
        throw "linearFit has invalid argument: expected n >= 0";
    if (n == 0)
        return 0; //If there is nothing to fit, any factor is a perfect fit (sum is always 0)
    if (n == 1)
        return pA[0] / pB[0]; //return x so that pA[0] = pB[0]*x

    //If you don't like this, use a std::vector :P
    std::unique_ptr<float[]> targetValues_(new float[n]);
    std::unique_ptr<int[]> indices_(new int[n]);

    //Get proper pointers:
    float* targetValues = targetValues_.get(); //The value for x that would cause pA[i] = pB[i]*x
    int* indices = indices_.get();             //Indices of useful (not nan and not infinity) target values
    //The code above guarantees n > 1, so it is safe to get these pointers:

    int m = 0; //Number of useful target values
    for (int i = 0; i < n; i++)
    {
        float a = pA[i];
        float b = pB[i];
        float targetValue = a / b;
        targetValues[i] = targetValue;
        if (std::isfinite(targetValue))
        {
            indices[m++] = i;
        }
    }

    if (m <= 0)
        return 0;
    if (m == 1)
        return targetValues[indices[0]]; //If there is only one target value, then it has to be the best one.

    //sort the indices by target value
    std::sort(indices, indices + m, [&](int ia, int ib){
        return targetValues[ia] < targetValues[ib];
    });

    //Start from the extremes and meet at the optimal solution somewhere in the middle:
    int l = 0;
    int r = m - 1;
    // m >= 2 is guaranteed => l < r
    float penaltyFactorL = std::abs(pB[indices[l]]);
    float penaltyFactorR = std::abs(pB[indices[r]]);
    while (l < r)
    {
        if (l == r - 1 && penaltyFactorL == penaltyFactorR)
        {
            break;
        }
        if (penaltyFactorL < penaltyFactorR)
        {
            l++;
            if (l < r)
            {
                penaltyFactorL += std::abs(pB[indices[l]]);
            }
        }
        else
        {
            r--;
            if (l < r)
            {
                penaltyFactorR += std::abs(pB[indices[r]]);
            }
        }
    }

    //return the best target value
    if (l == r)
        return targetValues[indices[l]];
    else
        return (targetValues[indices[l]] + targetValues[indices[r]]) * 0.5;
}