Creating Mills Ratio in R for large values - r

I'm using R to create a function, that amongst others uses Mills Ratio (See here). This is not a complicated formula, and at first I just programmed it like this:
mill <- function(x) {
return((1 - pnorm(x)) / dnorm(x))
}
I soon found out, however, that for large values of x (x >= 9) this function returns zero. Even more dramatically, from around x >= 37 it starts returning NaN, which really messes up my results.
Following the article, for now I've changed the function into this:
mill <- function(x) {
if (x >= 9) {
return(1 / x)
} else {
return((1 - pnorm(x)) / dnorm(x))
}
}
This seems to work. However, I use this function to calculate a vector, and when I use simulation to find the same vector I get more or less the same answer, only it's always a bit off.
I think this has to do with my implementation of Mills Ratio, since the rest of the function is just exponentials, which R should have no trouble with.
I want to ask you guys if there is any way to solve this problem: to either implement this function better, or give me another way to find the Mills Ratio (perhaps through integration of some sorts, but wouldn't I run into the same issues there?). Thank you kindly for any help you can provide!

I would make two changes to your original mill function.
Change 1-pnorm(x) to pnorm(x, lower.tail=FALSE)
Use logs and take exponentials if needed.
So this gives
new_mill = function(x)
pnorm(x, lower.tail=FALSE, log.p=TRUE) - dnorm(x, log=TRUE)
So
R> exp(new_mill(10))
[1] 0.09903
R> exp(new_mill(40))
[1] 0.02498
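For a side-by-side check, the sketch below compares the original formula, the log-scale version and the 1/x approximation at a few points. The names mill_naive and mill_log and the chosen test points are illustrative only and not part of the answer.

```r
# Original formula: loses all precision once pnorm(x) rounds to 1
mill_naive <- function(x) (1 - pnorm(x)) / dnorm(x)

# Log-scale version: upper tail and density are both computed on the log
# scale, subtracted, and only then exponentiated
mill_log <- function(x) {
  exp(pnorm(x, lower.tail = FALSE, log.p = TRUE) - dnorm(x, log = TRUE))
}

x <- c(1, 5, 9, 20, 40)
comparison <- data.frame(
  x      = x,
  naive  = mill_naive(x),  # 0 at x = 9 and NaN at x = 40, as in the question
  stable = mill_log(x),    # stays finite and positive
  approx = 1 / x           # large-x approximation used in the question
)
print(comparison)
```

The stable column should approach 1/x from below as x grows, which may explain why the hard switch to 1/x at x = 9 gives results that are always a little off.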
Using a plot as a sanity check
x = seq(0, 10, 0.001)
plot(x, exp(new_mill(x)), type="l")
lines(x, mill(x), col=2)
gives

Related

How to solve an equation y=ax^2+bx+c when x is unknown and y known

I have this equation:
y = -0.00248793*x^2+20.77173764*x-371.01805798
And I would like to obtain the result of the equation when I give "y" numbers,
edited explanation 2/06/20:
I want to add a vector as my "y", and receive an output of one vector also.
This problem is a biological one, in which I performed a cytokine bead array (CBA) and I established a reference curve which is sinusoidal.
After establishing the degree of the equation by doing the following:
fitil6_1=lm(Standards$`IL6 median`~poly(concentration,1,raw=TRUE))
fitil6_2=lm(Standards$`IL6 median`~poly(concentration,2,raw=TRUE))
fitil6_3=lm(Standards$`IL6 median`~poly(concentration,3,raw=TRUE))
fitil6_4=lm(Standards$`IL6 median`~poly(concentration,4,raw=TRUE))
lines(concentration,predict(fitil6_1,data.frame(x=concentration)),col="red")
lines(concentration,predict(fitil6_2,data.frame(x=concentration)),col="green")
lines(concentration,predict(fitil6_3,data.frame(x=concentration)),col="blue")
lines(concentration,predict(fitil6_4,data.frame(x=concentration)),col="purple")
legend(20,40000,legend=c("de grau 1","de grau 2","de grau 3","de grau 4"),lty=1,col=c("red","green","blue","purple"))
I have chosen the degree 2 formula as it fits better to my dots for this cytokine (and most cytokines in this study)
So when I make
coef(fitil6_2)
(Intercept) poly(concentration, 2, raw = TRUE)1 poly(concentration, 2, raw = TRUE)2
-8.262381e+02 2.371377e+01 -2.847135e-03
I receive that output and then I am able to build the formula (in this case):
y=-2.847135e-03 *x^2+2.371377e+01*x-8.262381e+02
but since y is the value I know, it is pretty difficult to isolate x!
(end of the editing)
I have tried many things like making function(x,y), but when you specify this you need to give a number for y, so really I am a little bit lost!
Thank you
As @Dave2e said, you can solve this particular example by algebra. But you might need a programmatic solution, or you might be using the quadratic as an easy example. In which case...
Rewrite your problem as "what value of x satisfies -0.00248793*x^2+20.77173764*x-371.01805798 - y = 0?".
There are plenty of ways to find the zeroes of a function. That's what you've turned your problem into. Suppose your "known value of y" is 10...
f <- function(x, y) {
-0.00248793*x^2+20.77173764*x-371.01805798 - y
}
answer <- stats::uniroot(f, interval=c(0, 50), y=10)
# Check we've got the right answer
f(answer$root, 10)
Giving
[1] -1.186322e-10
Using this method, you do need to find/guess a range within which the answer might lie. That's the purpose of the interval=c(0, 50) part of the call to uniroot. You can read the online help for more information about the value returned by uniroot and things you might want to look out for.
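Since the question asks for a vector of y values in and a vector of x values out, one way to extend the uniroot approach is to loop over y with sapply. This is only a sketch: solve_x is a hypothetical helper name, the y values are made up, and the interval c(0, 50) is assumed to bracket a root for every y supplied.

```r
# Same function as above: zero when the quadratic equals y
f <- function(x, y) {
  -0.00248793 * x^2 + 20.77173764 * x - 371.01805798 - y
}

# Hypothetical helper: one uniroot() call per element of y
solve_x <- function(y, interval = c(0, 50)) {
  sapply(y, function(yi) {
    uniroot(f, interval = interval, y = yi, tol = 1e-9)$root
  })
}

y_known <- c(0, 10, 100)   # made-up values for illustration
x_found <- solve_x(y_known)

# Residuals should all be very close to zero
f(x_found, y_known)
```

uniroot stops with an error if f has the same sign at both ends of the interval, so the interval has to be widened for y values outside the range covered here.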
Thank you to all who answered. I have just started on this site, and this worked for me:
isolating "y" and then writing a function that applies the quadratic formula to get x:
delta<-function(y,a,b,c)
{k=(-b+sqrt(b^2-4*a*(c-y)))/(2*a)
print(k)
}
delta(citoquines_valero$`IFNg median`,-1.957128e-03,1.665741e+01,-7.522327e+02)
#I will use that one as a provisional solution.
#I have also been told to use this, but is not working properly:
result <- function(y,a,b,c){
# Constructing delta
delta<-function(y,a,b,c){
b^2-4*a*(c-y)
}
if(delta(a,b,d) > 0){ # first case D>0
x_1 = (-b+sqrt(delta(y,a,b,c)))/(2*a)
x_2 = (-b-sqrt(delta(y,a,b,c)))/(2*a)
if (x_1 >= 0) {
print(x_1)
else if (x_2 >= 0){
print(x_2)
}
}
print(result)
else if(delta(a,b,d) == 0){ # second case D=0
x = -b/(2*a); return(x)
}
else {"There are no real roots."}; # third case D<0
return("There are no real roots.")
}
}
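The second function does not run as posted: the braces are unbalanced, delta is called with three arguments including an undefined d, and if() cannot branch on a whole vector. A vectorised sketch of what it appears to be aiming for is given below. The name invert_quadratic is hypothetical, and the rule of preferring the first non-negative root is an assumption read off the broken code.

```r
# Solve a*x^2 + b*x + c = y for x, element-wise over a vector y.
# Returns NA where there is no real root or no non-negative root.
invert_quadratic <- function(y, a, b, c) {
  disc <- b^2 - 4 * a * (c - y)        # discriminant, one per element of y
  root <- rep(NA_real_, length(y))
  ok <- disc >= 0                      # real roots exist only here
  s <- sqrt(disc[ok])
  x1 <- (-b + s) / (2 * a)
  x2 <- (-b - s) / (2 * a)
  # Assumption: take x1 if it is non-negative, otherwise x2 if it is
  root[ok] <- ifelse(x1 >= 0, x1, ifelse(x2 >= 0, x2, NA_real_))
  root
}

# Coefficients from the question's first equation; y values are made up
a <- -0.00248793; b <- 20.77173764; c0 <- -371.01805798
y <- c(10, 500, 1e6)                   # 1e6 lies above the parabola's maximum
x <- invert_quadratic(y, a, b, c0)
x
```

Which of the two roots is the meaningful one depends on which branch of the standard curve the measurements fall on, so the selection rule should be checked against the data.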

Function to create a vectorized piecewise function in R

I'm pretty new to R so apologies in advance if this question is poorly constructed. Basically I have a piece-wise function that I need to calculate the value for a large number of rows. My current function looks something like this:
f <- function(x){
(x <= 1000) * x^2 +
(x > 1000 & x <= 2000) * x^3 +
(x > 2000 & x <= 3000) * x^4 +
(x > 4000) * x^5
}
However I need to be able to create or generalize this function for a variety of different sets of breakpoints (ie maybe 1500,2500,3500, etc) and for different numbers of breakpoints. Also given the large number of rows that will need to be calculated on, the function has to be vectorized. Any advice?
Edit:
To clarify, I made the function above from some table of breakpoints (1000,2000,3000,4000) and associated powers to raise x to (2,3,4,5). However I need to be able to take multiple of such tables, each with varying breakpoints and number of breakpoints (with potentially 100 or so breakpoints) and be able to apply the resulting piecewise function to a large number of rows.
A vectorised version of your function with additional breaks and power arguments can be written this way:
f <- function(x, breaks, power){
x^power[as.numeric(cut(x, breaks))]
}
as.numeric(cut(...)) gets the position of all x values in the breaks, then the square bracket looks up the power in the power vector and raises the corresponding x to the correct power. Tests:
Some breaks points and powers:
> bp <- c(10,20,30,40)
> po = c(2,3,4)
Note the breakpoints are left-excluded:
> f(9,bp,po)
[1] NA
> f(10,bp,po)
[1] NA
So the first valid x has to be above 10:
> f(11,bp,po)
[1] 121
And gets us 11^2 as expected. So 20 gets squared and 21 gets cubed:
> f(20,bp,po)
[1] 400
> f(21,bp,po)
[1] 9261
Good so far. Vectorised?
> f(19:22, bp, po)
[1] 361 400 9261 10648
Yes - the change from square to cube happens between 20 and 21.
See the help for the right argument of the cut function if you want the intervals to be closed on the left instead of on the right.
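As a small illustration of that argument, the sketch below repeats the same lookup with right = FALSE, so that each interval is closed on the left and open on the right. The name f_left is made up for this example.

```r
# Same idea as f above, but with intervals [10,20), [20,30), [30,40)
f_left <- function(x, breaks, power) {
  x^power[as.numeric(cut(x, breaks, right = FALSE))]
}

bp <- c(10, 20, 30, 40)
po <- c(2, 3, 4)

# 10 is now inside the first interval, 20 moves to the second,
# and 40 falls outside the last (half-open) interval
out <- f_left(c(10, 19, 20, 40), bp, po)
out
```

Adding include.lowest = TRUE to the cut call would additionally close the outermost interval at its open end.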
From what I understand from your example code, you basically want to minimize the coding, and also want the code to be dynamic, so that you can dynamically vary the breaks and power.
Below is the sample code, which tries to do the same.
f <- function(x, breakPoints, powerX) {
cutX <- cut(x, breaks=breakPoints)
cutX1 <- factor(cutX, labels=powerX)
retX <- x ^ as.numeric(as.character(cutX1))
retX
}
x1 <- sample(1:10000, 1000)
x1 <- x1[order(x1)]
breakPoints1 <- c(min(x1)-1, 1000, 2000, 3000, max(x1))
powerX1 <- c(2, 3, 4, 5)
newX1 <- f(x1, breakPoints1, powerX1)
head(newX1) # manual check whether the values make sense
head(x1)
This code will do that.
But my suggestion will be to test this code, as much as possible, so that you can use it reliably. Hope this code is useful to you.
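If the first and last pieces should be unbounded, as in the question's original function, findInterval is an alternative to cut. This is a swapped-in technique rather than either answer's method, and piecewise_pow is a hypothetical name. It assumes there is one more power than there are breakpoints.

```r
# x <= breaks[1] uses powers[1]; breaks[i] < x <= breaks[i+1] uses
# powers[i+1]; x > last break uses the last power.
piecewise_pow <- function(x, breaks, powers) {
  stopifnot(length(powers) == length(breaks) + 1)
  # left.open = TRUE makes the intervals (a, b], matching the "<=" tests
  idx <- findInterval(x, breaks, left.open = TRUE) + 1
  x^powers[idx]
}

breaks <- c(1000, 2000, 3000)
powers <- c(2, 3, 4, 5)
x <- c(1000, 1001, 2000, 3001)
res <- piecewise_pow(x, breaks, powers)
res
```

Because findInterval uses a binary search over sorted breakpoints, this should remain fast with a hundred or so breakpoints and a large number of rows.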

Harmonic series sum function in R

I am trying to write a function which takes a positive real number and keeps adding terms of the harmonic series until the total sum exceeds the initial argument.
I need my function to display the total number of terms of the series that were added.
Here's my code so far:
harmonic<-function(n){
x<-c(0,1)
while (length(x) < n) {
position <- length(x)
new <- 1/(x[position] + x[position-1])
x <- c(x,new)
}
return(x)
}
I apologise for the errors in my code, unfortunately I have been working with R only for a month and this is the first time that I am using the while loop and I couldn't find any useful information around.
Thank you, I'd really appreciate your help.
Here's an attempt based on some info from this post at maths.stackexchange: https://math.stackexchange.com/q/496116
I can't speak as to whether it is highly accurate in all circumstances or even the best or an appropriate way to go about this. Caveat emptor.
harmsum.cnt <- function(x,tol=1e-09) {
em.cons <- 0.577215664901533
difffun <- function(x,n) x - (log(n) + em.cons + 1/(2*n) - 1/(12*n^2))
ceiling(uniroot(difffun, c(1, 1e10), tol = tol, x = x)$root)
}
Seems to work alright though:
harmsum.cnt(7)
#[1] 616
harmsum.cnt(15)
#[1] 1835421
Compare:
tail(cumsum(1/1:616),1); tail(cumsum(1/1:615),1)
#7.001274
#6.999651
dput(tail(cumsum(1/1:1835421),1)); dput(tail(cumsum(1/1:1835420),1))
#15.0000003782678
#14.9999998334336
This is a partial answer, which I'll try to fill in later. On the assumption that you want an exact answer, rather than the excellent approximation formula thelatemail found, there are a few tools to consider.
First, use of a hash-table or memoise methods will allow you to save previous calculations, thus saving a lot of time.
Second, since the sum of a (finite) sequence is independent of the grouping, you can calculate, e.g. the first N terms and the second (N+1):2N terms independently. Use parallel package to divide and conquer.
Third, before you get too deep into the morass, check the limits of floating-point accuracy via a call to .Machine$double.eps. Once your 1/n term comes close to that, you'll need to switch over to gmp and Rmpfr to get full accuracy in your calculations.
Now, just to clarify what you "should" be doing, a correct loop is
mylimit <- 7   # pick a value
N <- 1e7       # upper bound on the number of terms to try
harmsum <- 0
for(k in 1:N){
  harmsum <- harmsum + 1/k
  if (harmsum > mylimit) break
}
k              # number of terms that were added
(or similar setup using while)
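A while version of the same loop could look like the sketch below. The function name harmonic_count is made up, and plain double-precision arithmetic is assumed to be accurate enough for the limits of interest.

```r
# Count how many terms 1/1 + 1/2 + ... are needed before the running
# total exceeds `limit`
harmonic_count <- function(limit) {
  total <- 0
  k <- 0
  while (total <= limit) {
    k <- k + 1
    total <- total + 1 / k
  }
  k
}

harmonic_count(7)   # 616, in agreement with the cumsum check in the other answer
```

The number of terms grows roughly like exp(limit), so this loop is only practical for modest limits; beyond that the approximation-based answer is the more realistic option.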

Double integration over a triangular area in R

I am very new to programming and have been essentially learning by trial and error, but have reached a problem I do not know how to approach. I need to do a double integration over a triangular area in R. As the usual integrate function doesn't seem able to handle this, I tried using cubature package (*edited - see below for the full code).
Update/Edit:
I've been working on this more and am still coming up against the same issue. I understand that I have to ensure that values are within the appropriate bounds with respect to the asin calculation. However, this still isn't getting around the fundamental problem of the triangular area. Perhaps it will be clearer if I post my full code below:
L <- 25
n <- -4
area <- 30
distances <- L*seq(0.005, 100, 0.05)
cond <- area*pi
d <- 5
fun <- function(x=1,r=0)
{
if (x<cond) {
return(0)
} else {
return((-1)*((n+2)/(2*pi*(L^2)))*(1+((x/L)^2))^(n/2)*(1/pi)*(1/pi)*acos(d/x))*asin(sqrt((pi*area)/d+r))
}
}
fun(5)
fun(300)
library(cubature)
integrationone <- function()
{
integrand <- adaptIntegrate(fun, lowerLimit=c(d,0), upperLimit=c(80,80))
return(integrand$integral)
}
integrationone()
warnings()
From looking at the warning messages, R seems unable to carry out the evaluation of the conditional argument while integrating over x, so I still can't get values for only the exact area I want to integrate over. Does anyone have any ideas or advice?
I don't think the code behind adaptIntegrate will help you understand what is happening. You can type adaptIntegrate in the console to see its source; it is essentially a call to a C algorithm.
To understand what is happening, I think you first need to understand what you are integrating. Try simplifying your function to see its domain of definition.
INV_PI <- 1/pi
fun <- function(X){
scale <- -1*((n+2)/(2*pi*(L^2)))*INV_PI^2 *acos(d/(d+r))
res <- scale*asin(sqrt((pi*area)/X))* (1+((X/L)^2))^(n/2)
sqrt(prod(res))
}
There are two terms in X here, but only one of them can cause a problem:
asin(sqrt((pi*area)/X))
asin is defined only on [-1, 1], and sqrt only for non-negative numbers.
So fun is defined only for X in [pi*area, Inf), and you have to integrate within this domain.
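The domain restriction can be checked numerically with a quick sketch. The helper g and the three test points are chosen purely for illustration; area is the value from the question.

```r
area <- 30   # value taken from the question

# The problematic term on its own
g <- function(X) asin(sqrt((pi * area) / X))

# One point below pi*area, the boundary itself, and one point above it
X <- c(50, pi * area, 200)
vals <- suppressWarnings(g(X))   # asin() warns when it produces NaN
data.frame(X = X, value = vals, is_nan = is.nan(vals))
```

Only the point below pi*area yields NaN, which is where the warnings raised during integration would come from.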
For example:
low.Lim <- pi*area
doubleintegration <- function()
{
integrand <- adaptIntegrate(fun, lowerLimit=c(low.Lim,low.Lim),
upperLimit=c(200*low.Lim,200*low.Lim))
return(integrand$integral)
}
doubleintegration()
[1] 0.1331089
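Note that the limits used above still describe a rectangle. One generic way to integrate over a triangle in base R is to nest two calls to integrate and let the inner limit depend on the outer variable. The sketch below uses a toy integrand x*y over the triangle 0 <= y <= x <= 1, both of which are assumptions for illustration and not the question's actual function or region.

```r
# Inner integral over y from 0 to x, for each value of x.
# integrate() passes a vector of x values, hence the sapply().
inner <- function(x) {
  sapply(x, function(xi) {
    integrate(function(y) xi * y, lower = 0, upper = xi)$value
  })
}

# Outer integral over x from 0 to 1
res <- integrate(inner, lower = 0, upper = 1)
res$value   # analytic value for this toy integrand is 1/8
```

The same pattern applies to other triangles as long as one variable's limits can be written as a function of the other.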

Why is nlogn so hard to invert?

Let's say I have a function that is nlogn in space requirements, I want to work out the maximum size of input for that function for a given available space. i.e. I want to find n where nlogn=c.
I followed an approach to calculate n, that looks like this in R:
step = function(R, z) { log(log(R)-z)}
guess = function(R) log(log(R))
inverse_nlogn = function(R, accuracy=1e-10) {
zi_1 = 0
z = guess(R)
while(abs(z - zi_1)>accuracy) {
zi_1 = z
z = step(R, z)
}
exp(exp(z))
}
But I can't understand why it must be solved iteratively. For the range we are interested in (n > 1), the function is non-singular.
There's nothing special about n log n — nearly all elementary functions fail to have elementary inverses, and so have to be solved by some other means: bisection, Newton's method, Lagrange inversion theorem, series reversion, Lambert W function...
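As an example of one of those other means, here is a minimal Newton's method sketch for n*log(n) = c in base R. The function name, starting guess and tolerance are all assumptions made for illustration, and it is only meant for c > 0.

```r
# Solve n * log(n) = c_val for n by Newton's method.
# g(n) = n*log(n) - c_val has derivative g'(n) = log(n) + 1.
inverse_nlogn_newton <- function(c_val, tol = 1e-12, max_iter = 100) {
  n <- max(c_val, 2)                     # crude starting guess (assumption)
  for (i in seq_len(max_iter)) {
    step <- (n * log(n) - c_val) / (log(n) + 1)
    n <- n - step
    if (abs(step) < tol * n) break       # relative convergence test
  }
  n
}

n_hat <- inverse_nlogn_newton(2.5 * log(2.5))
n_hat                                    # should be close to 2.5
```

Newton's method is still iterative; it just tends to converge in fewer steps than the fixed-point scheme in the question.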
As Gareth hinted, the Lambert W function (e.g. here) gets you almost there; indeed n = c/W(c).
A wee google found this, which might be helpful.
Following up (being completely explicit):
library(emdbook)
n <- 2.5
c <- 2.5*log(2.5)
exp(lambertW(c)) ## 2.5
library(gsl)
exp(lambert_W0(c)) ## 2.5
There are probably minor differences in speed, accuracy, etc. of the two implementations. I haven't tested/benchmarked them extensively. (Now that I tried
library(sos)
findFn("lambert W")
I discover that it's implemented all over the place: the games package, and a whole package that's called LambertW ...)

Resources