i want to generate a second vector in r which is correlated to my first vector.
the first vector is simply created as following:
x <- rbinom(n=10,1,p=0.8)
x
[1] 0 0 1 0 1 1 1 1 0 0
my second vector should be generated with a defined correlation e.g. 0.8.
i know that you can use mvrnorm() for the normal distribution, but i dont know how to do it for the binomial distribution. i tried to find some solution but the suggestions were a bit too complicated for me or i could not apply to my code.
I second the recommendation that you visit Cross Validated. It is not clear how you are planning to use the correlated binomial distribution. Given your stipulation that you are starting with one vector and you want to create a second based on the first, all you need to do is adjust the probabilities of the second vector:
set.seed(42)
x <- rbinom(n=1000, size=1, p=0.8) # Your first vector
y <- rbinom(n=1000, size=1, p=ifelse(x==1, .95, .05))
cor(x, y)
# [1] 0.8505885
y <- rbinom(n=1000, size=1, p=ifelse(x==1, .94, .06))
cor(x, y)
# [1] 0.821918
y <- rbinom(n=1000, size=1, p=ifelse(x==1, .93, .07))
cor(x, y)
# [1] 0.7679597
In generating the second vector, the probability for the second vector is greater than .8 (the probability of a 1) if the value is 1 in the first vector and less than .2 (the probability of a 0) if the value is 0 in the first vector.
Related
For example I have a vector about possibility is
myprob <- (0.58, 0.51, 0.48, 0.46, 0.62)
And I want to sampling a series of number between 1 and 0 each time by the probability of c(1-myprob, myprob),
which means in the first number in the series, the function sample 1 and 0 by (0.42, 0.58), the second by (0.49, 0.50) and so on,
how can I generate the 5 numbers by sample?
The syntax of
Y <- sample(c(1,0), 1, replace=F, prob=c(1-myprob, prob))
would have incorrect number of probabilities and only 1 number output if I specify the prob;
while the syntax of
Y <- sample(c(1,0), 5, replace=F, prob=c(1-myprob, prob))
would have the probabilities focus on only 0.62(or not I am not sure, but the results seems not correct at all)
Thanks for any reply in advance!
If myprob is the probability of drawing 1 for each iteration, then you can use rbinom, with n = 5 and size = 1 (5 iterations of a 1-0 draw).
set.seed(2)
rbinom(n = 5, size = 1, prob = myprob)
[1] 1 0 1 0 0
Maël already proposed a great solution sampling from a binomial distribution. There are probably many more alternatives and I just wanted to suggest two of them:
runif()
as.integer(runif(5) > myprob)
This will first generate a series of 5 uniformly distributed random numbers between 0 and 1, then compare that vector against myprob and convert the logical values TRUE/FALSE to 1/0.
vapply(sample())
vapply(myprob, function(p) sample(1:0, 1, prob = c(1-p, p)), integer(1))
This is what you may have been looking for in the first place. This executes the sample() command by iterating over the values of myprob as p and returns the 5 draws as a vector.
I am trying to run some code on R based on this paper here through example 5.1. I want to simulate the following:
My background on R isn't great so I have the following code below, how can I generate a histogram and samples from this?
xseq<-seq(0, 100, 1)
n<-100
Z<- pnorm(xseq,0,1)
U<- pbern(xseq, 0.4, lower.tail = TRUE, log.p = FALSE)
Beta <- (-1)^U*(4*log(n)/(sqrt(n)) + abs(Z))
Some demonstrations of tools that will be of use:
rnorm(1) # generates one standard normal variable
rnorm(10) # generates 10 standard normal variables
rnorm(1, 5, 6) # generates 1 normal variable with mu = 5, sigma = 6
# not needed for this problem, but perhaps worth saying anyway
rbinom(5, 1, 0.4) # generates 5 Bernoulli variables that are 1 w/ prob. 0.4
So, to generate one instance of a beta:
n <- 100 # using the value you gave; I have no idea what n means here
u <- rbinom(1, 1, 0.4) # make one Bernoulli variable
z <- rnorm(1) # make one standard normal variable
beta <- (-1)^u * (4 * log(n) / sqrt(n) + abs(z))
But now, you'd like to do this many times for a Monte Carlo simulation. One way you might do this is by building a function, having beta be its output, and using the replicate() function, like this:
n <- 100 # putting this here because I assume it doesn't change
genbeta <- function(){ # output of this function will be one copy of beta
u <- rbinom(1, 1, 0.4)
z <- rnorm(1)
return((-1)^u * (4 * log(n) / sqrt(n) + abs(z)))
}
# note that we don't need to store beta anywhere directly;
# rather, it is just the return()ed value of the function we defined
betadraws <- replicate(5000, genbeta())
hist(betadraws)
This will have the effect of making 5000 copies of your beta variable and putting them in a histogram.
There are other ways to do this -- for instance, one might just make a big matrix of the random variables and work directly with it -- but I thought this would be the clearest approach for starting out.
EDIT: I realized that I ignored the second equation entirely, which you probably didn't want.
We've now made a vector of beta values, and you can control the length of the vector in the first parameter of the replicate() function above. I'll leave it as 5000 in my continued example below.
To get random samples of the Y vector, you could use something like:
x <- replicate(5000, rnorm(17))
# makes a 17 x 5000 matrix of independent standard normal variables
epsilon <- rnorm(17)
# vector of 17 standard normals
y <- x %*% betadraws + epsilon
# y is now a 17 x 1 matrix (morally equivalent to a vector of length 17)
and if you wanted to get many of these, you could wrap that inside another function and replicate() it.
Alternatively, if you didn't want the Y vector, but just a single Y_i component:
x <- rnorm(5000)
# x is a vector of 5000 iid standard normal variables
epsilon <- rnorm(1)
# epsilon_i is a single standard normal variable
y <- t(x) %*% betadraws + epsilon
# t() is the transpose function; y is now a 1 x 1 matrix
Beginner R user here. I am using the cor function to get the Kendal's tau-b rank correlation coefficient between 2 columns of a dataframe. Examples of such columns are as folows:
A B
1 1
1 2
1 3
when I use cor(d,method="kendall")
The result is NA for the correlation between A and B. Shouldnt it be 0? And if not is there a way that I can replace this NA result with 0 using a parameter in the cor function?
Consider what would happen if we slightly perturb the constant column. We get vastly different solutions depending on the particular perturbation used. In fact we can get any correlation we like with different perturbations. As a result it really makes no sense to use any particular value for the correlation and it would be best left as NA.
x <- c(1, 1, 1)
y <- 1:3
cor(x + (1:3) * 1e-10, y, method = "spearman")
## [1] 1
cor(x - (1:3) * 1e-10, y, method = "spearman")
## [1] -1
I want to compute the coefficients of a polynomial based on a vector containing the roots. I first defined a vector of coefficients:
pol <- c(0,1,2,3,4)
and computed the roots
roots <- polyroot(pol)
to have a test result.
Then i tried the following:
result <- 1
for (n in 1:(length(roots))){
result <- c(result, 0) + c(0,-roots[n]*result)
}
But the my result is the following:
result
[1] 1.00+0i 0.75+0i 0.50+0i 0.25+0i 0.00+0i
What am I missing here?
Notice that
identical(polyroot(pol), polyroot(pol / 4))
# [1] TRUE
That is, by going from a polynomial to its roots you lose information about the coefficient of the highest degree term (in this case, 4). For instance, 2x^2-x=2x(x-1/2), but also x^2-x/2=x(x-1/2), so that the roots are the same and we only normalized the first polynomial with respect to the quadratic term. So,
Re(result) * 4
# [1] 4 3 2 1 0
gives the result but also requires the knowledge of tail(pol, 1).
I am a new R user and I am trying to produce vectors with numbers randomly generated based on a specific distribution (with the rnorm command for example) with the vectors having a pre-defined sum of probability densities or sum of cumulative distributions.
For example, when generating vectors x1, x2 … xn I want them to obey either
sum(pnorm(x1)) = sum(pnorm(x2)) = … sum(pnorm(xn))
or
sum(pnorm(xi)) = ”fixed value”
or do the same but with dnorm. In other words, is there a possibility to set such parameters when using rnorm or any other RNG in R?
Tips and suggestions for strategies instead of complete solutions would also be greatly appreciated.
Many thanks in advance for your time.
1.
In the case of a Gaussian distribution,
sampling from (X1,...,Xn) under the condition that X1+...+Xn=s
is just sampling from a
conditional Gaussian distribution.
The vector (X1,X2,...,Xn,X1+...+Xn) has a Gaussian distribution, with zero mean,
and variance matrix
1 0 0 ... 0 1
0 1 0 ... 0 1
0 0 1 ... 0 1
...
0 0 0 ... 1 1
1 1 1 ... 1 n.
We can therefore sample from it as follows.
s <- 1 # Desired sum
n <- 10
mu1 <- rep(0,n)
mu2 <- 0
V11 <- diag(n)
V12 <- as.matrix(rep(1,n))
V21 <- t(V12)
V22 <- as.matrix(n)
mu <- mu1 + V12 %*% solve(V22, s - mu2)
V <- V11 - V12 %*% solve(V22,V21)
library(mvtnorm)
# Random vectors (in each row)
x <- rmvnorm( 100, mu, V )
# Check the sum and the distribution
apply(x, 1, sum)
hist(x[,1])
qqnorm(x[,1])
For an arbitrary distribution, this approach would require you to compute the conditional distribution, which may not be easy.
2.
There is another easy special case: a uniform distribution.
To uniformly sample n (positive) numbers that sum up to 1,
you can take n-1 numbers,
uniformly in [0,1],
and sort them: they define n intervals,
whose lengths turn sum up to 1, and happen to be uniformly distributed.
Since those points form a Poisson process,
you can also generate them with an exponential distribution.
x <- rexp(n)
x <- x / sum(x) # Sums to 1, and each coordinate is uniform in [0,1]
This idea is explained (with a lot of pictures) in the following article:
Portfolio Optimization for VaR, CVaR, Omega and Utility with General Return Distributions,
(W.T. Shaw, 2011), pages 6 to 8.
3.
(EDIT) I had initially misread the question, which was asking about sum(pnorm(x)), not sum(x). This turns out to be easier.
If X has a Gaussian distribution, then pnorm(X) has a uniform distribution:
the problem is then to sample from a uniform distribution, with a prescribed sum.
n <- 10
s <- 1 # Desired sum
p <- rexp(n)
p <- p / sum(p) * s # Uniform, sums to s
x <- qnorm(p) # Gaussian, the p-values sum to s