R inverse CDF for a vector

I have a vector which is a cumulative distribution function, let's say in the form
y=seq(0, 1.0, 0.05)
Now I need a function that gives the inverse CDF of y, so that
f(0.1)=3
and
f(0.9)=19
Any ideas how to do this?

The which function does this.
> y <- seq(0, 1.0, 0.05)
> which(y == .1)
[1] 3
> which(y == .9)
[1] 19
However, if you are trying to find the inverse CDF, you might have to deal with two vectors x and y, since x does not necessarily take only the indices as values. So you might want to do this:
> y <- seq(0, 1.0, 0.05)
> x <- 1:21
> x[which(y == .1)]
[1] 3
> x[which(y == .9)]
[1] 19
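Exact comparison with == only matches when the probability equals a stored value bit for bit, and it returns nothing for probabilities that fall between grid points. A minimal sketch of a more forgiving lookup, assuming the inverse CDF should return the first x whose CDF value reaches p (inv_cdf and tol are names made up for this illustration, not part of the original answer):

```r
y <- seq(0, 1.0, 0.05)   # CDF values from the question
x <- seq_along(y)        # support; here simply the indices 1..21

# Hypothetical helper: first x whose CDF value is at least p.
# tol absorbs floating-point noise in the comparison.
inv_cdf <- function(p, tol = 1e-8) {
  idx <- vapply(p, function(pp) which(y >= pp - tol)[1], integer(1))
  x[idx]
}

inv_cdf(c(0.1, 0.9))
```

With the vectors above this returns 3 and 19 as in the question, and inv_cdf(0.12) falls through to the next grid point, index 4.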

Related

Find maximum value for x for a polynomial function

I am using a simple polynomial to fit a curve.
poly <- function(a, b, c, x) a * x^2 + b * x + c
I'd like to find the value of x that results in the maximum value of the curve. Currently I create a grid with a range of x from 20000 to 50000, run the function for each row, then use max() on the result. It works, but I have a lot of groups and it creates a big dataframe every time I do it. It is very clunky and I feel like there must be a better way.
Some typical coefficients are:
a <- -0.000000179
b <- 0.011153167
c <- 9.896420781

If you rearrange your function so the variable you want to maximize is first and you set the default values like so:
poly <- function(x, a, b, c) a * x^2 + b * x + c
formals(poly)$a <- -0.000000179
formals(poly)$b <- 0.011153167
formals(poly)$c <- 9.896420781
Then you can use the optimize function to maximize over your interval:
optimize(poly, c(20000, 50000), maximum = TRUE)
$`maximum`
[1] 31154.1
$objective
[1] 183.6298
Where $maximum is the x value at which the maximum occurs and $objective is the height.

If a is negative, the maximum of the parabola a * x^2 + b * x + c is reached at -b/(2*a):
a<0
#[1] TRUE
-b/(2*a)
#[1] 31154.1
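Since the question mentions many groups, the closed form can be applied to whole columns of coefficients at once. A small sketch, assuming one row of coefficients per group (vertex_max, coefs and the second row of numbers are invented for illustration):

```r
# Closed-form vertex of a * x^2 + b * x + c, vectorised over groups.
vertex_max <- function(a, b, c) {
  stopifnot(all(a < 0))  # a maximum only exists if the parabola opens downward
  x_max <- -b / (2 * a)
  data.frame(x = x_max, value = a * x_max^2 + b * x_max + c)
}

coefs <- data.frame(
  group = c("g1", "g2"),
  a = c(-0.000000179, -0.000000200),  # first row from the question, second made up
  b = c(0.011153167, 0.012000000),
  c = c(9.896420781, 10.000000000)
)

res <- cbind(coefs["group"], vertex_max(coefs$a, coefs$b, coefs$c))
res
```

For the coefficients from the question this reproduces x of about 31154.1 and a height of about 183.63. If a vertex falls outside the range of interest, the maximum over that range is at the nearer endpoint instead.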

You could use optim. I think the other solutions answered in this thread are more appealing, but I'll write this up for completeness:
a <- -0.000000179
b <- 0.011153167
c <- 9.896420781
o <- optim(
par=list(x=0),
fn=function(x){ -poly(a,b,c,x=x) },
method="Brent",
lower=-50e3, upper=50e3
)
Output:
> o
$par
[1] 31154.1
$value
[1] -183.6298
$counts
function gradient
NA NA
$convergence
[1] 0
$message
NULL

R Optimization over Dataframe

I have the following code where I want to find the best values for x, y and z.
df <- data.frame(replicate(3,sample(0:100,100,rep=TRUE)))
find_best <- function(xyz) {
x <- xyz[1]
y <- xyz[2]
z <- xyz[3]
nr <- count(df)
val <- count(df[df[, "X1"] < x & df[, "X2"] < y & df[, "X3"] < z, ] )
return(val$n/nr$n)
}
optim(par = c(30,15,15), fn = find_best, lower=c(0,0,0), upper=c(100,100,100), method="L-BFGS-B")
The function does not achieve much at the moment, but I will add constraints later. However, if I run this, I only get the initial values back.
$par
[1] 30 15 15
So the question is: how can I get the best values for x, y and z, either with optim or with anything else?

Here is an example of how you can use optim for your purpose
set.seed(1)
df <- data.frame(replicate(3,sample(0:100,1e5,rep=TRUE)))
find_best <- function(xyz) {
x <- xyz[1]
y <- xyz[2]
z <- xyz[3]
r <- nrow(subset(df,X1 < x & X2 < y & X3 < z))/nrow(df)
}
res <- optim(par = c(35,15,15), fn = find_best, lower=c(0,0,0), upper=c(100,100,100), control = list(fnscale = -1))
which gives
> res
$par
[1] 35.085 15.205 15.225
$value
[1] 0.00881
$counts
function gradient
2 2
$convergence
[1] 0
$message
[1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
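One thing worth keeping in mind with this objective: because the columns hold integers, the share of rows below a threshold only changes when the threshold crosses an integer, so the surface is a staircase and a gradient-based method such as L-BFGS-B sees zero slope almost everywhere. A rough sketch of a derivative-free alternative, assuming a coarse grid search is acceptable (the grid spacing of 25 and the smaller data size are arbitrary choices for illustration):

```r
set.seed(1)
df <- data.frame(replicate(3, sample(0:100, 1e4, replace = TRUE)))

# Share of rows with all three columns below the thresholds.
find_best <- function(xyz) {
  mean(df$X1 < xyz[1] & df$X2 < xyz[2] & df$X3 < xyz[3])
}

# The objective is flat between integers: these two points give the same value.
flat <- find_best(c(35.2, 15.2, 15.2)) == find_best(c(35.7, 15.7, 15.7))

# Evaluate the objective on a coarse grid and keep the best point.
grid <- expand.grid(x = seq(0, 100, 25), y = seq(0, 100, 25), z = seq(0, 100, 25))
grid$value <- apply(grid[, c("x", "y", "z")], 1, find_best)
best <- grid[which.max(grid$value), ]
best
```

Without further constraints the best point is simply the upper corner of the box, which is why the constraints mentioned in the question matter for making the search meaningful.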

Generate samples from data following normal distribution but with new mean

I have a vector of numbers that is
set.seed(1)
x <- rnorm(8334, 1.456977, 0.3552899)
mean(x)
[1] 1.454307
Essentially, I want to randomly sample 2000 numbers from x such that mean of this sample is lower.
The key is I don't want to generate new random numbers but only sample from x, without replacement, such that I get a subset with a different mean.
Can anyone help me?
Thanks!

This method is not truly "random" as it only picks from values that are smaller than mean(x). Let me know if this is good enough for you -
set.seed(1)
x <- rnorm(8334, 1.456977, 0.3552899)
mean(x)
[1] 1.454307
y <- sample(x, 2000, prob = x <= mean(x)) # x > mean(x) has 0 chance of getting sampled
all(y %in% x)
[1] TRUE
mean(y)
[1] 1.170856
This is effectively the same as -
z <- sample(x[x <= mean(x)], 2000)
all(z %in% x)
[1] TRUE
mean(z)
[1] 1.172033
Also, for 2000 values, the lowest possible mean is this -
mean(sort(x)[1:2000])
[1] 0.9847526
UPDATE -
Here's one way to get random sample from both sides of mean(x) although it is arbitrary and I don't know if this would guarantee sample mean less than mean(x). -
z <- sample(x, 2000, prob = (x <= mean(x)) + 0.1)
mean(z)
[1] 1.225991
table(z <= mean(x))
FALSE TRUE
202 1798

How about doing rejection sampling, i.e. sampling 2000 numbers from your vector until you hit one sample that fulfills the desired properties?
set.seed(1)
x <- rnorm(8334, 1.456977, 0.3552899)
m_x <-mean(x)
y <- sample(x, 2000)
while(mean(y) >= m_x)
y <- sample(x, 2000)
mean(y)
#> [1] 1.4477
Created on 2019-06-18 by the reprex package (v0.3.0)
This should be quite fast since there is a (roughly) even chance for the new mean to be greater or smaller than the old one.
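The same idea can be wrapped in a function with a cap on the number of attempts, which matters if the target is set well below mean(x): the mean of 2000 draws varies very little, so such a target may practically never be hit. A sketch under those assumptions (sample_below, target and max_tries are hypothetical names):

```r
# Rejection sampling: redraw until the sample mean is below `target`,
# giving up after `max_tries` attempts instead of looping forever.
sample_below <- function(x, n, target, max_tries = 1000) {
  for (i in seq_len(max_tries)) {
    y <- sample(x, n)          # sampling without replacement from x
    if (mean(y) < target) return(y)
  }
  stop("no sample with mean below target found within max_tries")
}

set.seed(1)
x <- rnorm(8334, 1.456977, 0.3552899)
y <- sample_below(x, 2000, target = mean(x))
mean(y) < mean(x)
```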

# simulate a normal distribution for the example
x <- rnorm(8334, 1.45, 0.355)
# pick a sample of 2000 numbers
y <- sample(x, 2000)
# lower y's mean by 0.5
y <- y - 0.5
# multiply y by 1.5, which scales its sd by 1.5
y <- y * 1.5
# now the mean and the sd of y will be about
mean(y) # roughly (1.45 - 0.5) * 1.5 = 1.425
sd(y) # roughly 0.355 * 1.5 = 0.53
Hope this is the answer you are looking for.

Optimize function in r with the boundary value

I have the following function and want to find $x$ satisfying this requirement.
$$\frac{X^{2}(1.5)^{2}\exp{1.5X^{2}}}{2} < 1$$
I wrote the following r function.
f <- function(X) 0.5*X^2 * 1.5^2 * exp(X*1.5) < 1
optimize(f, c(0, 1))
But it is giving me an error. I want to find the value of X satisfying the requirement. Thank you for the help.

If we define ff as
ff <- function(X) 0.5*X^2 * 1.5^2 * exp(X*1.5)
then graphing it
curve(ff)
we see that ff(0) = 0 and ff(x) is monotonically increasing in x. The largest value of x for which ff(x) <= 1 can be calculated as the solution to ff(x) = 1 which occurs at the minimum of g:
g <- function(x) (ff(x) - 1)^2
optimize(g, c(0, 1))
giving:
$minimum
[1] 0.6008074
$objective
[1] 1.058761e-09
Thus any value of x between 0 and 0.6008074 gives a value of ff in the closed interval [0, 1].
# create graph
curve(ff)
opt <- optimize(g, c(0, 1))
abline(h = 0:1)
abline(v = c(0, opt$minimum))
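Because ff is increasing on the interval, the boundary can also be found directly as a root of ff(x) - 1 with base R's uniroot, which avoids squaring the residual. A short sketch using the same ff as above (the tolerance is an arbitrary choice):

```r
ff <- function(X) 0.5 * X^2 * 1.5^2 * exp(X * 1.5)

# ff(0) - 1 < 0 and ff(1) - 1 > 0, so the root is bracketed by [0, 1].
root <- uniroot(function(x) ff(x) - 1, interval = c(0, 1), tol = 1e-9)$root
root
```

This should agree with the optimize result of about 0.6008 to several decimal places.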

Triple integral in R (how to specify the domain)

I would like to compute the triple integral of a function of three variables f(x,y,z) in R. I'm using the package cubature and the function adaptIntegrate(). The integrand is equal to 1 only in a certain domain (x<y<z, 0 otherwise) which I don't know how to specify. I'm trying 2 different implementations of the function, but neither of them works:
#First implementation
fxyz <- function(w) {
x <- w[1]
y <- w[2]
z <- w[3]
x*y*z*(x < y)&(y < z)
}
#Second implementation
fxyz <- function(w) {
x <- w[1]
y <- w[2]
z <- w[3]
if(x<y&y<z)
out<-1
else
out<-0
out
}
#Computation of integral
library(cubature)
lower <- rep(0,3)
upper <- rep(1, 3)
adaptIntegrate(f=fxyz, lowerLimit=lower, upperLimit=upper, fDim = 3)
Any idea on how to specify the domain correctly?

I don't know about the cubature package, but you can do this by repeated application of base R's integrate function for one-dimensional integration.
f.xyz <- function(x, y, z) ifelse(x < y & y < z, 1, 0)
f.yz <- Vectorize(function(y, z) integrate(f.xyz, 0, 1, y=y, z=z)$value,
vectorize.args="y")
f.z <- Vectorize(function(z) integrate(f.yz, 0, 1, z=z)$value,
vectorize.args="z")
integrate(f.z, 0, 1)
# 0.1666632 with absolute error < 9.7e-05
You'll probably want to play with the control arguments to set the numeric tolerances; small errors in the inner integration can turn into big ones on the outside.

In your first function the return value is wrong. It should be as.numeric(x<=y)*as.numeric(y<=z). In your second function you should also use <= instead of <, otherwise adaptIntegrate won't work correctly. You also need to specify a maximum number of evaluations. Try this:
library(cubature)
lower <- rep(0,3)
upper <- rep(1,3)
# First implementation (modified)
fxyz <- function(w) {
x <- w[1]
y <- w[2]
z <- w[3]
as.numeric(x <= y)*as.numeric(y <= z)
}
adaptIntegrate(f=fxyz,lowerLimit=lower,upperLimit=upper,doChecking=TRUE,
maxEval=2000000,absError=10e-5,tol=1e-5)
#$integral
#[1] 0.1664146
#$error
#[1] 0.0001851699
#$functionEvaluations
#[1] 2000031
#$returnCode
#[1] 0

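The domain 0 <= x <= y <= z <= 1 is a simplex, and it has the same volume as the "canonical" simplex (x, y, z >= 0 with x + y + z <= 1), so for the constant integrand either one gives the same result. To integrate over a simplex, use the SimplicialCubature package.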
library(SimplicialCubature)
f <- function(x) 1
S <- CanonicalSimplex(3)
> adaptIntegrateSimplex(function(x) 1, S)
$integral
[1] 0.1666667
$estAbsError
[1] 1.666667e-13
$functionEvaluations
[1] 55
$returnCode
[1] 0
$message
[1] "OK"
Note that integrating the constant function f(x)=1 over the simplex simply gives the volume of the simplex, which is 1/6. The integration is useless for this example.
> SimplexVolume(S)
[1] 0.1666667
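The 1/6 can also be checked in base R without any integration package. The sketch below assumes the ordered domain 0 <= x <= y <= z <= 1 is described by its four vertices, computes the volume from the determinant of the edge vectors, and cross-checks it with a simple Monte Carlo estimate (V, edges, vol and mc are illustrative names):

```r
# Vertices of the ordered domain 0 <= x <= y <= z <= 1, one per column.
V <- cbind(c(0, 0, 0), c(0, 0, 1), c(0, 1, 1), c(1, 1, 1))

# Volume of a 3-simplex: |det(edge vectors from the first vertex)| / 3!
edges <- V[, -1] - V[, 1]
vol <- abs(det(edges)) / factorial(3)

# Monte Carlo check: share of uniform points in the unit cube with x < y < z.
set.seed(1)
n <- 1e5
u <- matrix(runif(3 * n), ncol = 3)
mc <- mean(u[, 1] < u[, 2] & u[, 2] < u[, 3])

c(exact = vol, monte_carlo = mc)
```

The determinant route gives exactly 1/6, and the Monte Carlo estimate should land close to it, which is consistent with the results reported by the numerical integrators above.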
