How do I sample from a piecewise-defined distribution in R? For example, a random variable X that is N(0,1) if p=1 and N(0,2) if p=0. I tried the following code:
if(p==1)(X=rnorm(1,0,2))?
You can use ifelse:
X <- function(size){
ifelse(sample(0:1,size,replace = TRUE),rnorm(size,0,1),rnorm(size,0,2))
}
50% of the time (on average), X will sample from a N(0,1) variable and the other 50% of the time it will sample from N(0,2).
How it works can be seen more clearly if you change the definition of X so that the means of the two variables sampled from are different:
X <- function(size){
ifelse(sample(0:1,size,replace = TRUE),rnorm(size,0,1),rnorm(size,4,1))
}
Then hist(X(10000)) yields a bimodal histogram with peaks near 0 and 4 (plot not shown).
library(tidyverse)
# define the function pieces; each one draws n values
g = function(n) rnorm(n, 0, 1) # used where p == 1
h = function(n) rnorm(n, 0, 2) # used where p == 0
# define the input
p = c(1,0,1,1,0)
# longer input
# p = sample(c(0,1), 2000, replace = TRUE)
piecewise_function = function(p) {
n = length(p) # draw one candidate per element so values are not recycled
case_when( p==1 ~ g(n) , # a condition, a tilde and a vector of values
p==0 ~ h(n) ,
TRUE ~ NA_real_) # what to do if neither condition is met
}
piecewise_function(p)
Try any of these, where n is the sample size and p is a 0/1 vector of length n:
rnorm(n, 0, 1 * (p == 1) + 2 * (p == 0))
rnorm(n, 0, ifelse(p == 1, 1, 2))
rnorm(n, 0, 1 + !p)
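As a quick sanity check, the three expressions for the standard deviation can be compared directly. This is only a sketch: it assumes p is a 0/1 vector of length n, and the names sd1, sd2, sd3 and the 0.5 mixing probability are illustrative, not taken from the answers above.

```r
set.seed(42)
n <- 10
p <- rbinom(n, 1, 0.5)            # assumed: p is a 0/1 indicator vector

sd1 <- 1 * (p == 1) + 2 * (p == 0)
sd2 <- ifelse(p == 1, 1, 2)
sd3 <- 1 + !p                     # !p is TRUE (i.e. 1) exactly where p == 0

# All three give sd 1 where p == 1 and sd 2 where p == 0
stopifnot(all(sd1 == sd2), all(sd2 == sd3))

# rnorm() is vectorised over sd, so each draw uses its own sd
x <- rnorm(n, mean = 0, sd = sd1)
```

Note that the third argument of rnorm() is the standard deviation, so N(0,2) is read here as sd = 2; use sqrt(2) instead if 2 is meant to be the variance.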
Related
So I have an assignment, where I have to show the convergence of regression coefficients to a certain value if the observed variable has a measurement error. The idea is to show the convergence depending on the number of observations as well as on the standard deviations of the variables.
I built the following function that should create a matrix with the regression coefficients depending on the number of observations. In a later step I would want to show this in a plot and then in a shiny webapp.
The function is:
Deviation <- function(N, sd_v = 1, sd_u = 1, sd_w = 1){
b_1 <- 1
b_2 <- 2
for ( j in length(1:N)){
v <- rnorm(j, mean = 0, sd_v)
u <- rnorm(j, mean = 0, sd_u)
w <- rnorm(j, mean = 0, sd_w)
X <- u + w
Y <- b_1 + b_2 * X + v
Reg <- lm(Y~X)
if (j==1) {
Coeffs <- matrix(Reg$coefficients)
} else {
Coeffs <- rbind(Coeffs, Reg$coefficients)
}
}
Coeffs <- as.data.frame(Coeffs)
return(Coeffs)
}
Deviation(100)
I always get the error that the variable Coeffs is not defined...
Thanks in advance!
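As pointed out in the discussion, one possible solution is to change length(1:N) to simply 1:N, as written below. length(1:N) is the single number N, so the original loop runs only once with j = N; the j == 1 branch is never reached and Coeffs is never created before rbind uses it. This works for me.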
Deviation <- function(N, sd_v = 1, sd_u = 1, sd_w = 1){
b_1 <- 1
b_2 <- 2
for ( j in 1:N){
v <- rnorm(j, mean = 0, sd_v)
u <- rnorm(j, mean = 0, sd_u)
w <- rnorm(j, mean = 0, sd_w)
X <- u + w
Y <- b_1 + b_2 * X + v
Reg <- lm(Y~X)
if (j==1) {
Coeffs <- matrix(Reg$coefficients, nrow = 1) # one row (intercept, slope) so rbind can stack further rows
} else {
Coeffs <- rbind(Coeffs, Reg$coefficients)
}
}
Coeffs <- as.data.frame(Coeffs)
return(Coeffs)
}
followed by...
Deviation(100)
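The special case for the first iteration can also be avoided by preallocating the result. The following is a minimal sketch under that assumption; the name Deviation2 and the column names are hypothetical and not part of the original post.

```r
# Hypothetical variant of Deviation(): preallocate the result matrix so that
# no if/else is needed for the first iteration.
Deviation2 <- function(N, sd_v = 1, sd_u = 1, sd_w = 1) {
  b_1 <- 1
  b_2 <- 2
  Coeffs <- matrix(NA_real_, nrow = N, ncol = 2,
                   dimnames = list(NULL, c("Intercept", "Slope")))
  for (j in seq_len(N)) {
    v <- rnorm(j, mean = 0, sd = sd_v)
    u <- rnorm(j, mean = 0, sd = sd_u)
    w <- rnorm(j, mean = 0, sd = sd_w)
    X <- u + w
    Y <- b_1 + b_2 * X + v
    Coeffs[j, ] <- coef(lm(Y ~ X))   # row j: fit on j observations
  }
  as.data.frame(Coeffs)
}

set.seed(1)
res <- Deviation2(100)
head(res)
```

With a single observation (j = 1) the slope cannot be estimated, so lm() reports NA for it in the first row.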
For a simulation study, I want to generate a set of random variables (both continuous and binary) that have predefined associations to an already existing binary variable, denoted here as x.
For this post, assume that x is generated following the code below. But remember: in real life, x is an already existing variable.
set.seed(1245)
x <- rbinom(1000, 1, 0.6)
I want to generate both a binary variable and a continuous variable. I have figured out how to generate a continuous variable (see code below)
set.seed(1245)
cor <- 0.8 #Correlation
y <- rnorm(1000, cor*x, sqrt(1-cor^2))
But I can't find a way to generate a binary variable that is correlated to the already existing variable x. I found several R packages, such as copula which can generate random variables with a given dependency structure. However, they do not provide a possibility to generate variables with a set dependency on an already existing variable.
Does anyone know how to do this in an efficient way?
Thanks!
If we look at the formula for correlation:
cor(X, Y) = (E(XY) - E(X)E(Y)) / (sqrt(E(X^2) - E(X)^2) * sqrt(E(Y^2) - E(Y)^2))
For the new vector y, if we preserve the mean, the problem is easier to solve. That means we copy the vector x and flip an equal number of 1s and 0s to achieve the intended correlation value.
If we let E(X) = E(Y) = x_bar and E(XY) = xy_bar, and use the fact that X^2 = X for a 0/1 variable (so E(X^2) = x_bar), then for a given rho the above simplifies to:
(xy_bar - x_bar^2) / (x_bar - x_bar^2) = rho
Solve and we get:
xy_bar = rho * x_bar + (1-rho)*x_bar^2
And we can derive a function to flip a number of 1s and 0s to get the result:
create_vector = function(x,rho){
n = length(x)
x_bar = mean(x)
xy_bar = rho * x_bar + (1-rho)*x_bar^2
toflip = sum(x == 1) - round(n * xy_bar)
y = x
y[sample(which(x==0),toflip)] = 1
y[sample(which(x==1),toflip)] = 0
return(y)
}
For your example it works:
set.seed(1245)
x <- rbinom(1000, 1, 0.6)
cor(x,create_vector(x,0.8))
[1] 0.7986037
There are some extreme combinations of intended rho and p where you might run into problems, for example:
set.seed(111)
res = lapply(1:1000,function(i){
this_rho = runif(1)
this_p = runif(1)
x = rbinom(1000,1,this_p)
data.frame(
intended_rho = this_rho,
p = this_p,
resulting_cor = cor(x,create_vector(x,this_rho))
)
})
res = do.call(rbind,res)
library(ggplot2) # needed for the plot
ggplot(res,aes(x=intended_rho,y=resulting_cor,col=p)) + geom_point()
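Two things can plausibly go wrong at extreme settings: sample(v, k) treats a length-one v as 1:v, which matters when x contains a single 0 or a single 1, and the requested number of flips can exceed the available 0s or 1s (for example with a negative rho). The sketch below adds guards for both. The name create_vector_checked and the error message are hypothetical. When x has very few 0s or 1s, only a coarse set of correlations is attainable in any case.

```r
# Hypothetical guarded variant of create_vector()
create_vector_checked <- function(x, rho) {
  n <- length(x)
  x_bar <- mean(x)
  xy_bar <- rho * x_bar + (1 - rho) * x_bar^2
  toflip <- sum(x == 1) - round(n * xy_bar)
  ones <- which(x == 1)
  zeros <- which(x == 0)
  if (toflip > min(length(ones), length(zeros))) {
    stop("rho = ", rho, " is not reachable for this x: ", toflip,
         " flips needed in each direction")
  }
  y <- x
  # Sample positions within the index vectors instead of calling sample()
  # on them, because sample(v, k) draws from 1:v when v has length one.
  y[zeros[sample.int(length(zeros), toflip)]] <- 1
  y[ones[sample.int(length(ones), toflip)]] <- 0
  y
}

set.seed(1245)
x <- rbinom(1000, 1, 0.6)
y <- create_vector_checked(x, 0.8)
cor(x, y)
```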
Here's a binomial one - the formula for q only depends on the mean of x and the correlation you desire.
set.seed(1245)
cor <- 0.8
x <- rbinom(100000, 1, 0.6)
p <- mean(x)
q <- 1/((1-p)/cor^2+p)
y <- rbinom(100000, 1, q)
z <- x*y
cor(x,z)
#> [1] 0.7984781
This is not the only way to do this - note that mean(z) is always less than mean(x) in this construction.
The continuous variable is even less well defined - do you really not care about its mean/variance, or anything else about its distribution?
Here's another simple version where it flips the variable both ways:
set.seed(1245)
cor <- 0.8
x <- rbinom(100000, 1, 0.6)
p <- mean(x)
q <- (1+cor/sqrt(1-(2*p-1)^2*(1-cor^2)))/2
y <- rbinom(100000, 1, q)
z <- x*y+(1-x)*(1-y)
cor(x,z)
#> [1] 0.8001219
mean(z)
#> [1] 0.57908
Find the MLE of theta in the non-linear model (in R, using a Gauss-Newton method):
y = sin(x*theta) + epsilon
where epsilon ~ N(0 , 0.01^2)
To do this, I've been asked to generate some data that is uniformly (and randomly) distributed from 0 <= x <= 10 , with n = 200 and theta = 2 (just for generation).
For instance, values that are close to the maximum of the sin function (1, 4 etc.) will converge but others won't.
EDITED
I now understand what theta.iter means, but I cannot work out why the iteration converges only sometimes, nor which starting values to pass in to get a useful result. Can someone explain?
theta <- 2
x <- runif(200, 0, 10)
x <- sort(x) #this is just to sort the generated data so that plotting it
#actually looks like a sine function
y <- sin(x*theta) + rnorm(200, mean = 0, sd = 0.1^2)
GN_sin <- function(theta.iter, x , y, epsilon){
index <- TRUE
while (index){
y.iter <- matrix(y - sin(x*theta.iter), 200, 1)
x.iter <- matrix(theta.iter*cos(x*theta.iter), 200, 1)
theta.new <- theta.iter +
solve(t(x.iter)%*%x.iter)%*%t(x.iter)%*%y.iter
if (abs(theta.new-theta.iter) < epsilon) {index <- FALSE}
theta.iter <- as.vector(theta.new)
cat(theta.iter, '\n')
}
}
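One thing worth checking in the code above: the derivative of sin(x*theta) with respect to theta is x*cos(x*theta), whereas x.iter is built from theta.iter*cos(x*theta.iter). The following is a minimal sketch of a Gauss-Newton iteration using the first form. The names gn_sin, tol and max_iter, and the starting value 1.95, are assumptions for illustration. Gauss-Newton is a local method, and since the sum of squares oscillates in theta, a start far from the true value may end at a different local minimum or fail to converge.

```r
set.seed(1)
theta_true <- 2
x <- runif(200, 0, 10)
y <- sin(x * theta_true) + rnorm(200, mean = 0, sd = 0.01)

# Hypothetical Gauss-Newton routine for the one-parameter model y = sin(x * theta)
gn_sin <- function(theta, x, y, tol = 1e-8, max_iter = 100) {
  for (k in seq_len(max_iter)) {
    r <- y - sin(x * theta)         # residuals at the current theta
    J <- x * cos(x * theta)         # derivative of sin(x * theta) w.r.t. theta
    step <- sum(J * r) / sum(J^2)   # scalar form of solve(t(J) %*% J) %*% t(J) %*% r
    theta <- theta + step
    if (abs(step) < tol) {
      return(list(theta = theta, iter = k, converged = TRUE))
    }
  }
  list(theta = theta, iter = max_iter, converged = FALSE)
}

fit <- gn_sin(1.95, x, y)   # start close to the value used to generate the data
fit$theta
```

The max_iter cap means the function always terminates and reports whether the tolerance was reached, instead of looping indefinitely from a poor starting value.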
I am trying to set up a function in R that computes a polynomial
P(x) = c1 + c2*x + c3*x^2 + ... + c(n-1)*x^(n-2) + cn*x^(n-1)
for various values of x and set coefficients c.
Horner's method is to
Set bn = cn
For i = n-1, n-2, ..., 2, 1, set bi = b(i+1)*x + ci
Return b1 as the output
What I have so far:
hornerpoly1 <- function(x, coef, output = tail(coef,n=1), exp = seq_along(coef)-1) {
for(i in 1:tail(exp,n=1)) {
(output*x)+head(tail(coef,n=i),n=1)
}
}
hornerpoly <- function(x, coef) {
exp<-seq_along(coef)-1
output<-tail(coef,n=1)
if(length(coef)<2) {
stop("Must be more than one coefficient")
}
sapply(x, hornerpoly1, coef, output,exp)
}
I also need to error check on the length of coef, that's what the if statement is for but I am not struggling with that part. When I try to compute this function for x = 1:3 and coef = c(4,16,-1), I get three NULL statements, and I can't figure out why. Any help on how to better construct this function or remedy the null output is appreciated. Let me know if I can make anything more clear.
How about the following:
Define a function that takes x as the argument at which to evaluate the polynomial, and coef as the vector of coefficients in increasing order of degree (as in the question's c1 + c2*x + ...). So the vector coef = c(-1, 16, 4) corresponds to P(x) = -1 + 16 * x + 4 * x^2.
The Horner algorithm is implemented in the following function:
f.horner <- function(x, coef) {
n <- length(coef);
b <- rep(0, n);
b[n] <- coef[n];
while (n > 0) {
n <- n - 1;
b[n] <- coef[n] + b[n + 1] * x;
}
return(b[1]);
}
We evaluate the polynomial at x = 1:3 for coef = c(-1, 16, 4):
sapply(1:3, f.horner, c(-1, 16, 4))
#[1] 19 47 83
Some final comments:
Note that the check on the length of coef is realised in the statement while (n > 0) {...}, i.e. we go through the coefficients starting from the last and stop when we reach the first coefficient.
You don't need to save the intermediate b values as a vector in the function. This is purely for (my) educational/trouble-shooting purposes. It's easy to rewrite the code to store only b's last value, and then update b every iteration. You could then also vectorise f.horner to take a vector of x values instead of only a scalar.
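The rewrite suggested in the last comment could look roughly like this. It is a sketch only: horner_vec is a hypothetical name, and coef[i] is taken to multiply x^(i-1), as in the question's P(x).

```r
# Hypothetical vectorised variant: keeps only the running value b,
# which is a vector with one entry per element of x.
horner_vec <- function(x, coef) {
  stopifnot(length(coef) >= 1)
  n <- length(coef)
  b <- rep(coef[n], length(x))        # start from the highest-degree coefficient
  for (i in rev(seq_len(n - 1))) {    # i = n-1, ..., 1 (empty when n == 1)
    b <- b * x + coef[i]
  }
  b
}

horner_vec(1:3, c(-1, 16, 4))
#[1] 19 47 83
```

Because b is updated with vectorised arithmetic, no sapply() call is needed to evaluate the polynomial at several x values.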
I'm simulating another dataset here, and am stuck again!
Here's what I want to do:
200 observations, with 90 independent variables (mean 0, sd 1)
the equation to create y is: y = 2x_1 + ... + 2x_30 - x_31 - ... - x_60 + 0*x_61 + ... + 0*x_90 + mu
(In other words, the first 30 x values will have a coefficient of 2, the next 30 values have a coefficient of -1 and the last 30 values have a coefficient of 0). mu is also a randomly generated normal variable with mean 0, sd 10.
Here's what I have so far:
set.seed(11)
n <- 200
mu <- rnorm(200,0,10)
p1 <- for(i in 1:200){
rnorm(200,0,1)
}
p2 <- cbind(p1)
p3 <- for(i in 1:90){
if i<=30, y=2x
if i>30 & i<=60, y=-x
if i>60 & i<=90, y=0x
}
I'm still learning many aspects of R, so I'm pretty sure the code has much wrong with it, even in terms of syntax. Your help would really be appreciated!
Thanks!
Try
library(mvtnorm)
coefs <- rep(c(2, -1, 0), each=30)
mu <- rnorm(200, 0, 10)
m <- rep(0, 90) # mean of independent variables
sig <- diag(90) # cov of indep variables
x <- rmvnorm(200, mean=m, sigma=sig) # generates 200 observations from multivariate normal
y <- x%*%coefs + mu
In case you are not comfortable with linear algebra:
n <- 200
coefs <- rep(c(2, -1, 0), each=30)
mu <- rnorm(n, 0, 10)
x <- matrix(nrow=n, ncol=90) # initializes the indep.vars
for(i in 1:90){
x[, i] <- rnorm(200, 0, 1)
}
y <- rep(NA, n) # initializes the dependent vars
for(i in 1:n){
y[i] = sum(x[i,]*coefs) + mu[i] # add the noise term mu, not the mean vector m
}
x[i,]*coefs gives exactly (2*x_1,..., 2*x_30, -x_31,...,- x_60,0*x_61,...,0*x_90) because * is element-wise operation.
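Because the predictors are independent standard normals, both loops can also be replaced with base R alone. The sketch below assumes the same n, coefs and mu as above; y_loop is an illustrative name used only to compare against the row-by-row computation.

```r
set.seed(11)
n <- 200
coefs <- rep(c(2, -1, 0), each = 30)
mu <- rnorm(n, 0, 10)

# 200 x 90 matrix of independent N(0, 1) draws in a single call
x <- matrix(rnorm(n * 90), nrow = n, ncol = 90)

# matrix product gives sum_j x[i, j] * coefs[j] for every row i at once
y <- drop(x %*% coefs) + mu

# same result as the row-by-row loop
y_loop <- vapply(seq_len(n), function(i) sum(x[i, ] * coefs) + mu[i], numeric(1))
stopifnot(isTRUE(all.equal(y, y_loop)))
```

drop() turns the n x 1 matrix returned by %*% into a plain vector, so y has the same shape as mu.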
You'd better learn the rudiments of R before actually doing something with it.