How can I sample from a piecewise-defined distribution in R? For example, a random variable X with X ~ N(0,1) if p = 1 and X ~ N(0,2) if p = 0. I tried the following code:
You can use ifelse:
X <- function(size){
  # sample() draws 0/1 with equal probability; 1 picks N(0,1), 0 picks N(0,2)
  ifelse(sample(0:1, size, replace = TRUE), rnorm(size, 0, 1), rnorm(size, 0, 2))
}
50% of the time (on average), X will sample from a N(0,1) variable and the other 50% of the time it will sample from N(0,2).
How it works can be seen more clearly if you change the definition of X so that the means of the two variables sampled from are different:
X <- function(size){
  # same construction, but the second component now has mean 4 and sd 1
  ifelse(sample(0:1, size, replace = TRUE), rnorm(size, 0, 1), rnorm(size, 4, 1))
}
Then hist(X(10000)) yields a bimodal histogram, with one peak near 0 and another near 4.
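The same idea extends to unequal mixing weights. Here is a minimal sketch, assuming a hypothetical helper rmix with an argument prob (neither appears in the answer above) that gives the probability of drawing from the first component:

```r
# Sketch: two-component normal mixture with mixing weight `prob`.
# `rmix` and `prob` are illustrative names, not part of the original answer.
rmix <- function(size, prob = 0.5) {
  # TRUE with probability `prob`: take the N(0, 1) draw, otherwise the N(4, 1) draw
  pick <- runif(size) < prob
  ifelse(pick, rnorm(size, 0, 1), rnorm(size, 4, 1))
}

set.seed(1)
draws <- rmix(10000, prob = 0.3)
mean(draws)  # should be near 0.3 * 0 + 0.7 * 4 = 2.8
```

Note that ifelse evaluates both rnorm calls in full and then picks element by element, which is wasteful but harmless for sizes like these.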
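library(dplyr)
# define the function pieces; each draws n values so that every element
# of p gets its own random number
g <- function(n) rnorm(n, 0, 1)  # used when p == 1
h <- function(n) rnorm(n, 0, 2)  # used when p == 0
# define the input
p <- c(1, 0, 1, 1, 0)
# longer input
# p <- sample(c(0, 1), 2000, replace = TRUE)
piecewise_function <- function(p) {
  n <- length(p)
  case_when(p == 1 ~ g(n),     # a condition, a tilde, and a value
            p == 0 ~ h(n),
            TRUE ~ NA_real_)   # what to do if neither condition is met
}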
Try any of these, where n is the sample size and p is a vector of 0s and 1s of length n:
rnorm(n, 0, 1 * (p == 1) + 2 * (p == 0))
rnorm(n, 0, ifelse(p == 1, 1, 2))
rnorm(n, 0, 1 + !p)
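A quick sketch to check that the three expressions give the same vector of standard deviations. The example vector p is an assumption for illustration. Also keep in mind that the third argument of rnorm is the standard deviation, so if N(0,2) is meant to have variance 2, sqrt(2) would be needed instead of 2.

```r
# Example input (assumed for illustration): one 0/1 indicator per draw
p <- c(1, 0, 1, 1, 0)

sd1 <- 1 * (p == 1) + 2 * (p == 0)  # arithmetic on logical masks
sd2 <- ifelse(p == 1, 1, 2)         # explicit two-way choice
sd3 <- 1 + !p                       # !p is TRUE (counted as 1) exactly where p == 0

all(sd1 == sd2, sd2 == sd3)         # the three vectors agree element by element

# rnorm is vectorised over sd, so each draw uses its own standard deviation
set.seed(42)
x <- rnorm(length(p), 0, sd3)
```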
So I have an assignment, where I have to show the convergence of regression coefficients to a certain value if the observed variable has a measurement error. The idea is to show the convergence depending on the number of observations as well as on the standard deviations of the variables.
I built the following function that should create a matrix with the regression coefficients depending on the number of observations. In a later step I would want to show this in a plot and then in a shiny webapp.
The function is:
Deviation <- function(N, sd_v = 1, sd_u = 1, sd_w = 1){
  b_1 <- 1
  b_2 <- 2
  for ( j in length(1:N)){
    v <- rnorm(j, mean = 0, sd_v)
    u <- rnorm(j, mean = 0, sd_u)
    w <- rnorm(j, mean = 0, sd_w)
    X <- u + w
    Y <- b_1 + b_2 * X + v
    Reg <- lm(Y~X)
    if (j==1) {
      Coeffs <- matrix(Reg$coefficients)
    } else {
      Coeffs <- rbind(Coeffs, Reg$coefficients)
    }
  }
  Coeffs
}
I always get the error that the variable Coeffs is not defined...
Thanks in advance!
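As pointed out in the discussion, one possible solution is to change length(1:N) to simply 1:N, as written below. length(1:N) is just the single number N, so the original loop runs once with j = N; the branch j == 1 is never taken, and Coeffs does not exist yet when rbind is called. This works for me.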
Deviation <- function(N, sd_v = 1, sd_u = 1, sd_w = 1){
  b_1 <- 1
  b_2 <- 2
  for ( j in 1:N){
    v <- rnorm(j, mean = 0, sd_v)
    u <- rnorm(j, mean = 0, sd_u)
    w <- rnorm(j, mean = 0, sd_w)
    X <- u + w
    Y <- b_1 + b_2 * X + v
    Reg <- lm(Y~X)
    if (j==1) {
      # one row per sample size: (intercept, slope)
      Coeffs <- matrix(Reg$coefficients, nrow = 1)
    } else {
      Coeffs <- rbind(Coeffs, Reg$coefficients)
    }
  }
  Coeffs
}
followed by...
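As a side note, growing a matrix with rbind inside a loop can be avoided by preallocating it. The following is only a sketch under my own assumptions: the name deviation_sketch is hypothetical, N is assumed to be at least 2, and the loop starts at j = 2 because lm cannot estimate a slope from a single observation.

```r
# Sketch: same simulation, but with a preallocated coefficient matrix.
# `deviation_sketch` is an illustrative name; N >= 2 is assumed.
deviation_sketch <- function(N, sd_v = 1, sd_u = 1, sd_w = 1) {
  b_1 <- 1
  b_2 <- 2
  coeffs <- matrix(NA_real_, nrow = N, ncol = 2,
                   dimnames = list(NULL, c("intercept", "slope")))
  for (j in 2:N) {                 # row 1 stays NA: one point cannot fix a slope
    v <- rnorm(j, mean = 0, sd = sd_v)
    u <- rnorm(j, mean = 0, sd = sd_u)
    w <- rnorm(j, mean = 0, sd = sd_w)
    X <- u + w
    Y <- b_1 + b_2 * X + v
    coeffs[j, ] <- coef(lm(Y ~ X)) # row j holds the fit on j observations
  }
  coeffs
}

set.seed(1)
res <- deviation_sketch(50)
tail(res, 3)
```

Row j of the result can then be plotted against j to look at the convergence.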
For a simulation study, I want to generate a set of random variables (both continuous and binary) that have predefined associations to an already existing binary variable, denoted here as x.
For this post, assume that x is generated following the code below. But remember: in real life, x is an already existing variable.
x <- rbinom(1000, 1, 0.6)
I want to generate both a binary variable and a continuous variable. I have figured out how to generate a continuous variable (see code below)
cor <- 0.8 #Correlation
y <- rnorm(1000, cor*x, sqrt(1-cor^2))
But I can't find a way to generate a binary variable that is correlated to the already existing variable x. I found several R packages, such as copula which can generate random variables with a given dependency structure. However, they do not provide a possibility to generate variables with a set dependency on an already existing variable.
Does anyone know how to do this in an efficient way?
If we look at the formula for correlation:
rho = (E(XY) - E(X)*E(Y)) / sqrt(Var(X)*Var(Y))
For the new vector y, if we preserve the mean, the problem is easier to solve. That means we copy the vector x and try to flip an equal number of 1s and 0s to achieve the intended correlation value. For a 0/1 variable with mean x_bar, the variance is x_bar - x_bar^2.
If we let E(X) = E(Y) = x_bar , and E(XY) = xy_bar, then for a given rho, we simplify the above to:
(xy_bar - x_bar^2) / (x_bar - x_bar^2) = rho
Solve and we get:
xy_bar = rho * x_bar + (1-rho)*x_bar^2
And we can derive a function to flip a number of 1s and 0s to get the result:
create_vector = function(x,rho){
  n = length(x)
  x_bar = mean(x)
  xy_bar = rho * x_bar + (1-rho)*x_bar^2
  # number of 1s (and of 0s) to flip so that mean(x*y) hits xy_bar
  toflip = sum(x == 1) - round(n * xy_bar)
  y = x
  y[sample(which(x==0),toflip)] = 1
  y[sample(which(x==1),toflip)] = 0
  y
}
For your example it works:
x <- rbinom(1000, 1, 0.6)
cor(x, create_vector(x, 0.8))
[1] 0.7986037
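To see the arithmetic without sampling noise in x, here is a small self-contained sketch with a fixed x of 600 ones and 400 zeros (my own choice for illustration). In that case xy_bar = 0.8 * 0.6 + 0.2 * 0.36 = 0.552, so 48 ones and 48 zeros get flipped, and the mean is preserved.

```r
# Copy of the flipping function, so the sketch runs on its own
create_vector <- function(x, rho) {
  n <- length(x)
  x_bar <- mean(x)
  xy_bar <- rho * x_bar + (1 - rho) * x_bar^2
  toflip <- sum(x == 1) - round(n * xy_bar)
  y <- x
  y[sample(which(x == 0), toflip)] <- 1
  y[sample(which(x == 1), toflip)] <- 0
  y
}

# Fixed input (assumed for illustration): exactly 60% ones
x <- rep(c(1, 0), times = c(600, 400))

set.seed(1)
y <- create_vector(x, 0.8)

# Which positions are flipped is random, but the counts are not,
# so the mean of y equals the mean of x and the correlation is 0.8
# up to floating-point error.
c(mean_x = mean(x), mean_y = mean(y), cor = cor(x, y))
```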
There are some extreme combinations of intended rho and p where you might run into problems, for example:
library(ggplot2)
res = lapply(1:1000,function(i){
  this_rho = runif(1)
  this_p = runif(1)
  x = rbinom(1000,1,this_p)
  data.frame(
    intended_rho = this_rho,
    p = this_p,
    resulting_cor = cor(x,create_vector(x,this_rho))
  )
})
# stack the 1000 one-row data frames into one
res = do.call(rbind,res)
ggplot(res,aes(x=intended_rho,y=resulting_cor,col=p)) + geom_point()
Here's a binomial one - the formula for q only depends on the mean of x and the correlation you desire.
cor <- 0.8
x <- rbinom(100000, 1, 0.6)
p <- mean(x)
q <- 1/((1-p)/cor^2+p)
y <- rbinom(100000, 1, q)
z <- x*y
cor(x, z)
#> [1] 0.7984781
This is not the only way to do this - note that mean(z) is always less than mean(x) in this construction.
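For reference, here is a sketch of where that q comes from, assuming x is treated as Bernoulli(p) with p = mean(x), y is Bernoulli(q) drawn independently of x, z = x*y, and rho stands for the target correlation (called cor in the code):

```latex
\begin{aligned}
\mathrm{E}[Z] &= pq, \qquad
\mathrm{Cov}(X,Z) = \mathrm{E}[XZ] - \mathrm{E}[X]\,\mathrm{E}[Z] = pq - p \cdot pq = pq(1-p), \\
\mathrm{Var}(X) &= p(1-p), \qquad \mathrm{Var}(Z) = pq(1-pq), \\
\rho &= \frac{pq(1-p)}{\sqrt{p(1-p)\,pq(1-pq)}} = \sqrt{\frac{q(1-p)}{1-pq}}
\;\Longrightarrow\;
q = \frac{\rho^{2}}{1-p+\rho^{2}p} = \frac{1}{(1-p)/\rho^{2} + p}.
\end{aligned}
```

Since E[Z] = pq with q at most 1, this also shows why mean(z) cannot exceed mean(x) here.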
The continuous variable is even less well defined - do you really not care about its mean/variance, or anything else about its distribution?
Here's another simple version where it flips the variable both ways:
cor <- 0.8
x <- rbinom(100000, 1, 0.6)
p <- mean(x)
q <- (1+cor/sqrt(1-(2*p-1)^2*(1-cor^2)))/2
y <- rbinom(100000, 1, q)
z <- x*y+(1-x)*(1-y)
cor(x, z)
#> [1] 0.8001219
mean(z)  # presumably the call that produced the value below
#> [1] 0.57908
Find the MLE of the non-linear distribution (in R, using a Gauss-Newton method):
y = sin(x*theta) + epsilon
where epsilon ~ N(0 , 0.01^2)
To do this, I've been asked to generate some data that is uniformly (and randomly) distributed from 0 <= x <= 10 , with n = 200 and theta = 2 (just for generation).
For instance, values that are close to the maximum of the sin function (1, 4 etc.) will converge but others won't.
I now understand what theta.iter means, but I cannot understand why it converges only sometimes, or which starting values to input to get useful output. Can someone explain?
theta <- 2
x <- runif(200, 0, 10)
x <- sort(x) # sort the generated data so that plotting it
# actually looks like a sine function
y <- sin(x*theta) + rnorm(200, mean = 0, sd = 0.1^2) # sd = 0.1^2 = 0.01, i.e. epsilon ~ N(0, 0.01^2)
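GN_sin <- function(theta.iter, x , y, epsilon){
  index <- TRUE
  while (index){
    y.iter <- matrix(y - sin(x*theta.iter), 200, 1)
    x.iter <- matrix(theta.iter*cos(x*theta.iter), 200, 1)
    # Gauss-Newton update: theta + (X'X)^(-1) X'y
    theta.new <- theta.iter + solve(t(x.iter) %*% x.iter) %*% t(x.iter) %*% y.iter
    if (abs(theta.new - theta.iter) < epsilon) {index <- FALSE}
    theta.iter <- as.vector(theta.new)
    cat(theta.iter, '\n')
  }
}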
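One thing worth checking: the derivative of sin(x*theta) with respect to theta is x*cos(x*theta), whereas the code in the question builds its design column from theta.iter*cos(x*theta.iter). Below is a hedged sketch of a scalar Gauss-Newton iteration with that derivative. The names gn_sin, tol and max_iter, the seed, and the starting value 2.02 are my own assumptions. Even with the right derivative, the sum of squares has many local minima in theta because x runs up to 10, so a start far from the true value can still settle elsewhere or wander.

```r
# Simulated data as described in the question (seed chosen arbitrarily)
set.seed(123)
theta_true <- 2
x <- runif(200, 0, 10)
y <- sin(x * theta_true) + rnorm(200, mean = 0, sd = 0.01)

# Sketch of Gauss-Newton for a single parameter; names are illustrative
gn_sin <- function(theta, x, y, tol = 1e-8, max_iter = 50) {
  for (k in seq_len(max_iter)) {
    r <- y - sin(x * theta)        # residuals at the current theta
    J <- x * cos(x * theta)        # derivative of sin(x * theta) w.r.t. theta
    step <- sum(J * r) / sum(J^2)  # scalar form of (J'J)^(-1) J'r
    theta <- theta + step
    if (abs(step) < tol) break     # stop once the update is negligible
  }
  theta
}

est <- gn_sin(2.02, x, y)  # start close to the true value
est
```

The max_iter cap is there so that a bad starting value ends the loop instead of running forever.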
I am trying to set up a function in R that computes a polynomial
P(x) = c_1 + c_2*x + c_3*x^2 + ... + c_(n-1)*x^(n-2) + c_n*x^(n-1)
for various values of x and set coefficients c.
Horner's method is to
Set b_n = c_n
For i = n-1, n-2, ..., 2, 1, set b_i = b_(i+1)*x + c_i
Return b_1
What I have so far:
hornerpoly1 <- function(x, coef, output = tail(coef,n=1), exp = seq_along(coef)-1) {
for(i in 1:tail(exp,n=1)) {
hornerpoly <- function(x, coef) {
if(length(coef)<2) {
stop("Must be more than one coefficient")
sapply(x, hornerpoly1, coef, output,exp)
I also need to error check on the length of coef, that's what the if statement is for but I am not struggling with that part. When I try to compute this function for x = 1:3 and coef = c(4,16,-1), I get three NULL statements, and I can't figure out why. Any help on how to better construct this function or remedy the null output is appreciated. Let me know if I can make anything more clear.
How about the following:
Define a function that takes x as the argument at which to evaluate the polynomial, and coef as the vector of coefficients in increasing order of degree. So the vector coef = c(-1, 16, 4) corresponds to P(x) = -1 + 16 * x + 4 * x^2.
The Horner algorithm is implemented in the following function:
f.horner <- function(x, coef) {
  n <- length(coef);
  b <- rep(0, n);
  b[n] <- coef[n];
  while (n > 0) {
    n <- n - 1;
    # when n reaches 0 this assignment has no effect, and the loop ends
    b[n] <- coef[n] + b[n + 1] * x;
  }
  return(b[1]);
}
We evaluate the polynomial at x = 1:3 for coef = c(-1, 16, 4):
sapply(1:3, f.horner, c(-1, 16, 4))
#[1] 19 47 83
Some final comments:
Note that the check on the length of coef is realised in the statement while (n > 0) {...}, i.e. we go through the coefficients starting from the last and stop when we reach the first coefficient.
You don't need to save the intermediate b values as a vector in the function. This is purely for (my) educational/trouble-shooting purposes. It's easy to rewrite the code to store only b's last value, and then update b every iteration. You could then also vectorise f.horner to take a vector of x values instead of only a scalar.
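A minimal sketch of that rewrite, keeping only the latest b and working on a whole vector of x values at once. The name horner_vec is mine, and coef is again assumed to be in increasing order of degree:

```r
# Sketch: Horner's method that keeps a single running value per x.
# `horner_vec` is an illustrative name; coef[1] is the constant term.
horner_vec <- function(x, coef) {
  stopifnot(length(coef) >= 1)
  n <- length(coef)
  b <- rep(coef[n], length(x))      # b_n = c_n, one copy per x value
  for (i in rev(seq_len(n - 1))) {  # i = n-1, n-2, ..., 1
    b <- coef[i] + b * x            # b_i = c_i + b_(i+1) * x
  }
  b
}

horner_vec(1:3, c(-1, 16, 4))
# 19 47 83
```

No sapply is needed here because the arithmetic inside the loop is already element-wise in x.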
I'm simulating another dataset here, and am stuck again!
Here's what I want to do:
200 observations, with 90 independent variables (mean 0, sd 1)
the equation to create y is: y = 2x_1 + ... + 2x_30 - x_31 - ... - x_60 + 0*x_61 + ... + 0*x_90 + mu
(In other words, the first 30 x values will have a coefficient of 2, next 30 values have coefficient of -1 and last 30 values have coefficient of 0). mu is also a random generated normal variable with mean 0, sd 10.
Here's what I have so far:
n <- 200
mu <- rnorm(200,0,10)
p1 <- for(i in 1:200){
p2 <- cbind(p1)
p3 <- for(i in 1:90){
if i<=30, y=2x
if i>30 & i<=60, y=-x
if i>60 & i<=90, y=0x
I'm still learning many aspects of R, so I'm pretty sure the code has much wrong with it, even in terms of syntax. Your help would really be appreciated!
library(mvtnorm) # provides rmvnorm
coefs <- rep(c(2, -1, 0), each=30)
mu <- rnorm(200, 0, 10)
m <- rep(0, 90) # mean of independent variables
sig <- diag(90) # cov of indep variables
x <- rmvnorm(200, mean=m, sigma=sig) # generates 200 observations from multivariate normal
y <- x%*%coefs + mu
In case you are not comfortable with linear algebra, here is a version with explicit loops:
n <- 200
coefs <- rep(c(2, -1, 0), each=30)
mu <- rnorm(n, 0, 10)
x <- matrix(nrow=n, ncol=90) # initializes the indep.vars
for(i in 1:90){
  x[, i] <- rnorm(n, 0, 1)
}
y <- rep(NA, n) # initializes the dependent vars
for(i in 1:n){
  # add the i-th error term mu[i] to the weighted sum of row i
  y[i] = sum(x[i,]*coefs) + mu[i]
}
x[i,]*coefs gives exactly (2*x_1,..., 2*x_30, -x_31,...,- x_60,0*x_61,...,0*x_90) because * is element-wise operation.
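A small sketch to confirm that the matrix product and the row-by-row sum give the same y. To stay in base R it fills x with independent rnorm draws, which matches a multivariate normal with zero mean and identity covariance. The seed and the names y_mat and y_loop are my own choices.

```r
set.seed(2024)
n <- 200
coefs <- rep(c(2, -1, 0), each = 30)             # 30 twos, 30 minus-ones, 30 zeros
mu <- rnorm(n, 0, 10)                            # error term
x <- matrix(rnorm(n * 90), nrow = n, ncol = 90)  # independent N(0, 1) columns

# Matrix form: one product for all rows
y_mat <- as.vector(x %*% coefs + mu)

# Row-by-row form: element-wise product, then sum
y_loop <- vapply(seq_len(n),
                 function(i) sum(x[i, ] * coefs) + mu[i],
                 numeric(1))

all.equal(y_mat, y_loop)
```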
You'd better learn the rudiments of R before actually doing something with it.