Julia's equivalent of R's qnorm()?

I am trying to "translate" these lines from R to Julia:
n <- 100
mean <- 0
sd <- 1
x <- qnorm(seq(1 / n, 1 - 1 / n, length.out = n), mean, sd)
However, I have trouble with the qnorm function. I searched for "quantile function" and found quantile(), but R's version returns a vector of length 100 while the Julia version returns a vector of length 5.
Here's my attempt:
import Distributions
n = 100
x = Distributions.quantile(collect(range(1/n, stop=1-1/n, length=n)))

Under Julia 1.1 you should broadcast the call to quantile like this:
using Distributions
quantile.(Normal(0, 1), range(1/n, 1-1/n, length = n))

Try
using Distributions
n = 100
qs = range(1/n, stop=1-1/n, length=n) # no need to collect it
d = Normal() # default is mean = 0, std = 1
result = [quantile(d, q) for q in qs]
Julia uses multiple dispatch to select the appropriate quantile method for a given distribution, in contrast to R, where each distribution gets its own prefixed functions (qnorm, dnorm, and so on). According to the documentation, the first argument should be the distribution and the second the point at which you want to evaluate the inverse CDF.
Strangely, I got an error when I tried quantile.(d, qs) (broadcasting the quantile call). UPDATE: see Bogumil's answer for that case. In my benchmarks, both approaches have the same speed.

Related

Is it possible to flip a formula in R?

I was working on a project and used the VaR() function from the PerformanceAnalytics package to calculate value-at-risk. I wanted to find the probability of a stock making a loss of 1% or more. I solved it by plugging numbers into the probability argument and checking whether the result approached -1%. However, I was curious whether it is possible to flip the formula, so that I can plug in the output and the function will produce what would have been the input.
Produced the loss at 97.5% probability:
VaR(DNOlog, p = 0.975)
Produced a loss of -1% by changing the probability until it fit:
VaR(DNOlog, p = 0.6512184)
Let's get a reproducible example to demonstrate how you would go about this:
library(PerformanceAnalytics)
set.seed(2)
returns <- rnorm(1000, sd = 0.01)
This gives us a sensible result from VaR
VaR(returns, p = 0.975)
#> [,1]
#> VaR -0.01893631
To reverse this, we can use uniroot. This is a function which uses an iterative approach to finding the input value that makes a function return 0:
inverse_VaR <- function(x, target) {
  f <- function(p) VaR(x, p)[1, 1] - target
  uniroot(f, c(0.6, 0.99999), tol = .Machine$double.eps)$root
}
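As a quick illustration of what uniroot does on its own (a toy example, unrelated to VaR): it searches the given interval for a sign change of the supplied function and returns the location of the zero.
# Toy example: solve x^2 - 2 = 0 on [0, 2], i.e. recover sqrt(2)
g <- function(x) x^2 - 2
uniroot(g, c(0, 2), tol = .Machine$double.eps)$root
#> [1] 1.414214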
In our example, if we want to find the p value that makes VaR give an output of -0.01 with our vector returns, we can do:
inverse_VaR(returns, -0.01)
#> [1] 0.848303
And to show this works, we can do:
VaR(returns, 0.848303)
#> [,1]
#> VaR -0.009999999
Created on 2022-04-16 by the reprex package (v2.0.1)
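As an extra sanity check (assuming the returns vector and inverse_VaR defined above), the round trip p -> VaR -> inverse_VaR should recover the original probability:
target <- VaR(returns, p = 0.975)[1, 1]  # the -0.01893631 computed above
inverse_VaR(returns, target)             # should return roughly 0.975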
What you want is the inverse function. If it is not too expensive to compute your function at many points, you can get a good approximation of the inverse by computing many x-y pairs and then interpolating x as a function of y. Since you don't really say what your function is, I will use the simple function y = x + sin(x) as an example.
x = seq(0,6, 0.01)
y = x + sin(x)
InverseFunction = approxfun(y,x)
## Test with an example
InverseFunction(4) ## gives 4.967601
x1 = 4.967601
x1 + sin(x1) ## 3.999991
If you want more accuracy, use a smaller spacing between the x's.
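To illustrate that, here is a rough sketch reusing the same toy function with a ten times finer grid; the interpolation error shrinks accordingly:
x_fine = seq(0, 6, 0.001)          # spacing 0.001 instead of 0.01
y_fine = x_fine + sin(x_fine)
InverseFine = approxfun(y_fine, x_fine)
x1 = InverseFine(4)
x1 + sin(x1)                       # much closer to 4 than with the coarser grid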

Nested integration for incomplete convolution of gauss densities

Let g(x) = 1/sqrt(2*pi) * exp(-x^2 / 2) be the density of the normal distribution with mean 0 and standard deviation 1. In some calculations on paper, integrals of the form
f_n(x) = integral from -c to c of g(x - z) * f_(n-1)(z) dz, with f_1 = g,
appeared, where c > 0 is a positive number.
Since I could not evaluate this by hand, I had the idea to approximate and plot it. I tried this in R, because R provides the dnorm function and a function to do integrals.
You can see that I need to integrate numerically n times, where n is to be chosen by the call to a plot function. My code uses a for-loop to create those "incomplete" convolutions iteratively.
For example, even with n=3 and c=1 this gives me an error; n=2 (i.e. a single integration) works.
N = 3
ngauss <- function(x) dnorm(x, mean = 0, sd = 1)
convoluts <- list()
convoluts[[1]] <- ngauss
for (i in 2:N) {
  h <- function(y) {
    g <- function(z) {ngauss(y - z) * convoluts[[i - 1]](z)}
    return(integrate(g, lower = -1, upper = 1)$value)
  }
  h <- Vectorize(h)
  convoluts[[i]] <- h
}
convoluts[[3]](0)
What I get is:
Error: evaluation nested too deeply: infinite recursion /
options(expressions=)?
I understand that this is a hard computation, but for "small" n something similar should be possible.
Maybe someone can help me to fix my code or provide a recommendation how I can implement this in a better way. Another language that is more appropriate for this would be also okay.
The issue is that the functions stored in convoluts all capture the loop variable i lazily from the same environment: once the loop has finished, i holds its final value, so convoluts[[i - 1]] inside each stored function ends up referring to the function itself and the recursion never terminates. Instead, using
h <- evalq(function(y) {
  g <- function(z) {ngauss(y - z) * convoluts[[i - 1]](z)}
  integrate(g, lower = -1, upper = 1)$value
}, list(i = i))
does the job and, say, setting N <- 6 quickly gives
convoluts[[N]](0)
# [1] 0.03423872
As your integration is simply the pdf of a sum of N independent standard normals (which then follows N(0, N)), we may also verify this approach by setting lower = -Inf and upper = Inf. Then with N <- 4 we have
dnorm(0, sd = sqrt(N))
# [1] 0.1994711
convoluts[[N]](0)
# [1] 0.1994711
So, for practical purposes, when c = Inf, you are way better off using dnorm rather than manual computations.
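For completeness, a rough sketch of the plotting the question was after (assuming convoluts has been built with the evalq fix, N >= 4 and the limits -1 and 1):
xs <- seq(-4, 4, length.out = 201)                     # evaluation grid
plot(xs, convoluts[[1]](xs), type = "l",
     xlab = "x", ylab = "density")                     # n = 1 is just the normal pdf
for (n in 2:4) lines(xs, convoluts[[n]](xs), lty = n)  # add the incomplete convolutions
legend("topright", legend = paste("n =", 1:4), lty = 1:4)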

Z-transform of a function in R language

I have a function f(x) that gives me results in the time domain. I want to get the z-transform of that function so that I can compare the two. I know this would be easy to calculate in MATLAB, but I'm wondering whether there is a way to do it in R, either with a package or with code written from scratch. The reason for using R is that I have done most of the required work and other calculations in R (plus, R is free).
I searched and found some suggestions to use scale. However, I think that has to do with data, not a function. I also found the GeneNet package, which has a z-transform function, but it returns a vector of numbers; I want the z-transform as a function of z.
By definition, the z-transform of a sequence x[n] is calculated as X(z) = SUM from n = 0 to infinity of x[n] * z^(-n).
Update for simplicity:
If we have f(x) = x, where x = 0, 1, 2, 3, 4, ..., 100, I want to get the z-transform of the given function f(x).
Based on the above definition of the z-transform and by substitution:
X(z) = SUM from n = 0 to n = 100 of x[n] * z^(-n)
for n = 0: X(z) = (0) * (z^0)
for n = 1: X(z) = 0 + (1) * (z^-1)
for n = 2: X(z) = 0 + (1) * (z^-1) + (2) * (z^-2)
...
Any suggestions?
It seems you have two problems: computing the values of your function f(x), and then computing the z-transform of the result.
Here's an (updated) z-transform function that works on a given x and, optionally, an arbitrary n vector (by default, n starts at 0 and increases by one for each value of x). It now returns a function that can be evaluated at various z values:
ztransform = function(x, n = seq_along(x) - 1) {
  function(z) sum(x * z ^ -n)
}
my_z_trans = ztransform(x = 0:100, n = 0:100)
my_z_trans(z = 1)
# [1] 5050
my_z_trans(z = 2)
# [1] 2
my_z_trans(z = 3)
# [1] 0.75
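As a quick check of ztransform (a constant sequence is a convenient test case), the result for x[n] = 1 should match the finite geometric series:
geom = ztransform(x = rep(1, 101))   # x[n] = 1 for n = 0, ..., 100
geom(z = 2)                          # sum of 2^(-n) for n = 0..100, about 2
(1 - 2^-101) / (1 - 2^-1)            # closed form of the same finite sum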

Compute multiple Integral and plot them (with R)

I'm having trouble computing and then plotting multiple integrals. It would be great if you could help me.
So I have this function
> f = function(x, mu = 30, s = 12){dnorm(x, mu, s)}
which I want to integrate multiple times, from each z in 1:100 up to +Inf, so I can plot it with x = z and y = auc:
> auc = integrate(f, z, Inf)
R returns:
Warning message:
In if (is.finite(lower)) { :
the condition has length > 1 and only the first element will be used
I tried a loop:
while (z < 100) {
  z = 1
  auc = integrate(f, z, Inf)
  z = z + 1
}
That doesn't work either... I don't know what to do.
(I'm new to R, so sorry in advance if this is really easy.)
Thanks for your help :) !
There is no need to do the integration by hand: pnorm gives the integral of the normal density from negative infinity up to its input, and you can get the upper tail instead by setting the lower.tail parameter:
z <- 1:100
y <- pnorm(z, mean = 30, sd = 12, lower.tail = FALSE)
plot(z, y)
If you're looking to integrate more complex functions then using integrate will be necessary - but if you're just looking to find probabilities for distributions then there will most likely be a function built in that does the integration for you directly.
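To see that pnorm really does compute this integral, here is a small check against a direct integrate call at a single point (re-defining f from the question so the snippet is self-contained):
f <- function(x, mu = 30, s = 12) dnorm(x, mu, s)
integrate(f, 40, Inf)$value                        # numerical upper-tail area
pnorm(40, mean = 30, sd = 12, lower.tail = FALSE)  # same value, about 0.202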
Your problem is actually somewhat subtle, and in a certain sense gets to the core of how R works, so here is a slightly longer explanation.
R is a "vectorized" language, which means that just about everything works on vectors. If I have 2 vectors A and B, then A+B is the element-by-element sum of A and B. Nearly all R functions work this way also. If X is a vector, then Y <- exp(X) is also a vector, where each element of Y is the exponential of the corresponding element of X.
The function integrate(...) is one of the few functions in R that is not vectorized. So when you write:
f <- function(x, mu = 30, s = 12){dnorm(x, mu, s)}
auc <- integrate(f, z, Inf)
the integrate(...) function does not know what to do with z when it is a vector. So it takes the first element and complains. Hence the warning message.
There is a special function in R, Vectorize(...) that turns scalar functions into vectorized functions. You would use it this way:
f <- function(x, mu = 30, s = 12){dnorm(x, mu, s)}
auc <- Vectorize(function(z) integrate(f,z,Inf)$value)
z <- 1:100
plot(z,auc(z), type="l") # plot lines
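As a follow-up check (assuming the auc and z defined just above), the vectorized integrate approach agrees with the pnorm shortcut from the other answer:
# largest absolute difference between numerical integration and pnorm's tail probability
max(abs(auc(z) - pnorm(z, mean = 30, sd = 12, lower.tail = FALSE)))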

In R, how do I find the optimal variable to minimise the correlation between two datasets [duplicate]

Possible Duplicate:
In R, how do I find the optimal variable to maximize or minimize correlation between several datasets
This can be done in Excel, but my dataset has become too large; in Excel, I would use Solver.
I have 5 variables and I want to create a weighted average of these 5 variables so that it has the lowest correlation to a 6th variable.
Column A,B,C,D,E = random numbers
Column F = random number (which I want to minimise the correlation to)
Column G = A*wi1 + B*wi2 + C*wi3 + D*wi4 + E*wi5
where wi1 to wi5 are coefficients produced by Solver. In a separate cell, I would have correl(F, G).
This is all achieved with the following constraints in mind:
1. wi1 to wi5 have to be between 0 and 1
2. wi1 + wi2 + wi3 + wi4 + wi5 = 1
I'd like to print the results of this so that I can have an efficient frontier type chart.
How can I do this in R? Thanks for the help.
I looked at the other thread mentioned by Vincent and I think I have a better solution. I hope it is correct. As Vincent points out, your biggest problem is that the optimization tools for such non-linear problems do not offer a lot of flexibility for dealing with your constraints. Here, you have two types of constraints: 1) all your weights must be >= 0, and 2) they must sum to 1.
The optim function has a lower option that can take care of your first constraint. For the second constraint, you have to be a bit creative: you can force your weights to sum to one by scaling them inside the function to be minimized, i.e. rewrite your correlation function as function(w) cor(X %*% w / sum(w), Y).
# create random data
n.obs <- 100
n.var <- 6
X <- matrix(runif(n.obs * n.var), nrow = n.obs, ncol = n.var)
Y <- matrix(runif(n.obs), nrow = n.obs, ncol = 1)
# function to minimize
correl <- function(w)cor(X %*% w / sum(w), Y)
# initial guess
w0 <- rep(1 / n.var, n.var)
# optimize
opt <- optim(par = w0, fn = correl, method = "L-BFGS-B", lower = 0)
optim.w <- opt$par / sum(opt$par)
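To inspect the result (a small follow-up sketch, assuming the objects created above), check that the rescaled weights satisfy both constraints and look at the correlation achieved at the optimum:
sum(optim.w)            # 1 by construction, after the rescaling
all(optim.w >= 0)       # TRUE, enforced by lower = 0 in optim
cor(X %*% optim.w, Y)   # the (signed) correlation the optimizer found
Note that optim minimizes the signed correlation here, so the optimum may be strongly negative; if you want the correlation closest to zero, minimize abs(correl(w)) instead.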
