How to draw the normal distribution for Y|x - r

Suppose random variables X and Y have the following properties: E(X)=10, Var(X)=4, E(Y|x)=30-x/2 and Var(Y|x)=x.
The task is: simulate 10000 realizations (x,y) from this model, assuming a normal distribution for X and for Y|x, and plot x against y.
I only know how to use the rnorm and dnorm functions, like this:
x<-rnorm(10,mean=10,sd=2)
curve(dnorm(x),xlim=c(-5,5),ylim=c(0,0.5),col="red")
But how do I handle dnorm for Y|x?
I am not sure whether this is right:
y<-rnorm(10,mean=(30-0.5*x),sd=sqrt(x))
because I get an error when I try to run
curve(dnorm(y),xlim=c(-5,5),ylim=c(0,0.5))

You've already simulated the x's and y's correctly; the issue lies in the call to curve. curve expects a function or an expression in x (for example a density with its parameters), not realisations drawn from a distribution.
In your case it is easier to look at the distributions using histograms:
x<-rnorm(1e4,mean=10,sd=2)
y<-rnorm(1e4,mean=(30-0.5*x),sd=sqrt(x))
hist(x)
hist(y)
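Since the task also asks for a plot of the simulated pairs, the following is a minimal sketch of one way to do that, not necessarily what the exercise intends. The seed, the `pmax` guard against negative x (needed only because `sqrt(x)` is undefined there, which is extremely rare with mean 10 and sd 2), and the reference value `x0` are assumptions added for illustration.

```r
set.seed(1)                       # assumed seed, only for reproducibility
n <- 1e4
x <- rnorm(n, mean = 10, sd = 2)  # X ~ N(10, 4)

# Var(Y|x) = x, so sd = sqrt(x); pmax() is an assumed guard for x < 0
y <- rnorm(n, mean = 30 - 0.5 * x, sd = sqrt(pmax(x, 0)))

# Scatter plot of the realizations with the conditional mean E(Y|x)
plot(x, y, pch = ".", xlab = "x", ylab = "y")
abline(a = 30, b = -0.5, col = "red", lwd = 2)

# Conditional density of Y given one fixed value x0 (hypothetical choice).
# Inside curve(), `x` is the plotting variable, not the sample above.
x0 <- 10
curve(dnorm(x, mean = 30 - x0 / 2, sd = sqrt(x0)),
      from = 10, to = 40, col = "red", xlab = "y", ylab = "density")
```

The second plot shows why `curve(dnorm(y))` failed: curve needs an expression in its own variable plus fixed parameters, whereas `y` is a vector of draws.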

Related

Empirical CDF vs Theoretical CDF in R

I want to check the "probability integral transform" theorem using R.
Let's suppose X is an exponential random variable with lambda = 5.
I want to check that the random variable U = F_X(X) = 1 - exp(-5*X) has a uniform (0,1) distribution.
How would you do it?
I would start in this way:
nsample <- 1000
lambda <- 5
x <- rexp(nsample, lambda) # 1000 exponential observations
u <- 1 - exp(-lambda*x) # CDF of X evaluated at each observation
Then I need to find the CDF of u and compare it with the CDF of a Uniform (0,1).
For the empirical CDF of u I could use the ecdf function:
ECDF_u <- ecdf(u) #empirical CDF of U
Now I should create the theoretical CDF of Uniform (0,1) and plot it on the same graph of the ECDF in order to compare the two graphs.
Can you help with the code?
You are almost there. You don't need to compute the ECDF yourself – qqplot will take care of this. All you need is your sample (u) and data from the distribution you want to check against. The lazy (and not quite correct) approach would be to check against a random sample drawn from a uniform distribution:
qqplot(runif(nsample), u)
But of course, it is better to plot against the theoretical quantiles:
# the actual plot
qqplot( qunif(ppoints(length(u))), u )
# add a line
qqline(u, distribution=qunif, col='red', lwd=2)
Looks pretty good to me.
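If you do want the ECDF and the theoretical CDF on the same graph, as the question asks, a rough sketch could look like this. The seed and the evaluation grid are assumptions added here; `ecdf`, `punif`, `pexp` and `ks.test` are base R.

```r
set.seed(42)                        # assumed seed, only for reproducibility
nsample <- 1000
lambda <- 5
x <- rexp(nsample, rate = lambda)   # exponential sample
u <- 1 - exp(-lambda * x)           # same as pexp(x, rate = lambda)

ecdf_u <- ecdf(u)                   # empirical CDF of U as a step function

# Empirical CDF with the Uniform(0,1) CDF overlaid in red
plot(ecdf_u, main = "ECDF of u vs Uniform(0,1) CDF", xlab = "u", ylab = "F(u)")
curve(punif, from = 0, to = 1, add = TRUE, col = "red", lwd = 2)

# Largest vertical gap between the two curves on an assumed grid
grid <- seq(0, 1, by = 0.001)
max_gap <- max(abs(ecdf_u(grid) - punif(grid)))

# Kolmogorov-Smirnov test as a numerical complement to the plot
ks <- ks.test(u, "punif")
```

A small `max_gap` and a large KS p-value are both consistent with U being uniform, although neither proves it.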

Octave distribution plots not working

I am trying to plot the cdf of a uniform distribution in octave but I am not getting the cdf. I am simply getting the original distribution. Also the original distribution, which is meant to be a uniform distribution, is not a uniform distribution at all!
Here is my octave code:
x = unifrnd(0,1,100,1);
hist(x)
cdfPlot = unifcdf(x)
hist(cdfPlot)
The histogram for the 1st one (hist(x)):
and the second one (hist(cdfPlot)) :
I also tried to use cdfplot(x) in octave but it said :
warning: the 'cdfplot' function belongs to the statistics package from
Octave Forge but has not yet been implemented.
Please read http://www.octave.org/missing.html to learn how you can
contribute missing functionality.
please help!
Judging by the submitted code, you are trying to draw a sample from a uniform distribution, show a (mostly) flat histogram of it, and then plot a line corresponding to its cumulative distribution.
For the first part:
Of course, with 100 samples (and no averaging), you are not going to observe a flat distribution, but if you try:
x=unifrnd(0,1,100000,1);
hist(x);
Then you are more likely to get a flat-looking histogram.
For the second part:
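unifcdf(x,A,B) returns the value of the CDF of a uniform distribution on the interval [A,B], evaluated at x. That is, it is the value of the CDF model itself, NOT the cumulative sum of the sample's histogram. Because x is itself a sample from the uniform distribution on (0,1), unifcdf(x) simply returns x again, which is why the second histogram looks like the first. To obtain the cumulative sum, you need to: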
x=unifrnd(0,1,100000,1);
[counts, intervals] = hist(x);
xCDF = cumsum(counts);
bar(xCDF);
Finally, if you are looking for the model values, that is, the values given by the formula describing the distribution: for a uniform distribution on [A,B] (in this case [0,1]) each of the nBins bins has probability 1/nBins, so the expected count per bin is (1/nBins)*NSamples. The CDF is then a straight line: after binNum bins it has accumulated a probability of binNum/nBins, or an expected cumulative count of binNum*((1/nBins)*NSamples). In the example above, with the default nBins of 10 for hist, x is split into 10 intervals each holding approximately 10000 items of x, and the last value of the cumulative sum is 100000, which is of course the total number of samples in x.
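To make the comparison between the cumulative histogram and the model CDF concrete, here is a rough R equivalent of the Octave steps above (R rather than Octave is a choice made here for consistency with the other threads). The seed and the explicit 10-bin breaks are assumptions.

```r
set.seed(7)                              # assumed seed
x <- runif(1e5)                          # counterpart of unifrnd(0,1,100000,1)

breaks <- seq(0, 1, by = 0.1)            # 10 equal-width bins, set explicitly
h <- hist(x, breaks = breaks, plot = FALSE)

emp_cdf   <- cumsum(h$counts) / length(x)  # cumulative sum, scaled to [0,1]
model_cdf <- punif(breaks[-1])             # model CDF at each bin's right edge

# Bars show the empirical cumulative proportions, red points the model values
bp <- barplot(emp_cdf, names.arg = breaks[-1],
              xlab = "right bin edge", ylab = "cumulative proportion")
points(bp, model_cdf, col = "red", pch = 19)
```

Dividing by the sample size puts the cumulative counts on the same 0 to 1 scale as the model CDF, so the two can be compared directly.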
For more information please see this link.
Hope this helps.

Fitting Model Parameters To Histogram Data in R

So I've got a data set that I want to parameterise, but it is not a Gaussian distribution, so I can't parameterise it in terms of its mean and standard deviation. I want to fit a distribution function with a set of parameters and extract the values of the parameters (e.g. a and b) that give the best fit. I want to do this in the same way as
lm(y~f(x;a,b))
except that I don't have a y; I have a distribution of different x values.
Here's an example. Suppose the data follow a Gumbel (double exponential) distribution
f(x; u, b) = (1/b) * exp(-(z + exp(-z))), where z = (x - u)/b:
library(QRM)     # provides rGumbel
library(ggplot2) # provides qplot
rg <- rGumbel(1000) # default parameters are 0 and 1 for u and b
# then plot its distribution
qplot(rg)
# should give a nice skewed distribution
If I assume that I don't know the distribution parameters and I want to perform a best fit of the probability density function to the observed frequency data, how do I go about showing that the best fit is (in this test case), u = 0 and b = 1?
I don't want code that simply maps the function onto the plot graphically, although that would be a nice aside. I want a method that I can repeatedly use to extract variables from the function to compare to others. GGPlot / qplot was used as it quickly shows the distribution for anyone wanting to test the code. I prefer to use it but I can use other packages if they are easier.
Note: This seems to me like a really obvious thing to have been asked before but I can't find one that relates to histogram data (which again seems strange) so if there's another tutorial I'd really like to see it.
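One common way to approach this is maximum likelihood on the raw observations rather than on the binned histogram. The sketch below uses only base R; `rgumbel_sim` and `gumbel_nll` are hypothetical helper names introduced here (they replace `rGumbel` from QRM so the example is self-contained), and the seed and starting values are assumptions.

```r
set.seed(123)                                # assumed seed

# Hypothetical helper: Gumbel draws by inverting F(x) = exp(-exp(-z))
rgumbel_sim <- function(n, u = 0, b = 1) u - b * log(-log(runif(n)))

# Hypothetical helper: negative log-likelihood of the Gumbel density
# f(x; u, b) = (1/b) * exp(-(z + exp(-z))), with z = (x - u)/b.
# b is optimised on the log scale so that it stays positive.
gumbel_nll <- function(par, x) {
  u <- par[1]
  b <- exp(par[2])
  z <- (x - u) / b
  sum(log(b) + z + exp(-z))
}

rg <- rgumbel_sim(1000)                      # test data with u = 0, b = 1

start <- c(mean(rg), log(sd(rg)))            # rough starting values
fit <- optim(start, gumbel_nll, x = rg)      # Nelder-Mead by default

est <- c(u = fit$par[1], b = exp(fit$par[2]))
print(est)
```

The returned `est` vector can be stored and compared across data sets, which is the reusable part the question asks for. With packages installed, `MASS::fitdistr` supplied with a custom density is another route to the same estimates.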

Probability transformation using R

I want to turn a continuous random variable X with cdf F(x) into a continuous random variable Y with cdf F(y) and am wondering how to implement it in R.
For example, perform a probability transformation on data following normal distribution (X) to make it conform to a desirable Weibull distribution (Y).
(x=0 has CDF F(x=0)=0.5, CDF F(y)=0.5 corresponds to y=5, then x=0 corresponds to y=5 etc.)
R has many built-in distribution functions: those starting with 'p' transform to a uniform, and those starting with 'q' transform from a uniform. So the transform in your example can be done by:
y <- qweibull( pnorm( x ), 2, 6.0056 )
Then just change the functions and/or parameters for other cases.
The distr package may also be of interest for additional capabilities.
In general, you can transform an observation x on X to an observation y on Y by
1. getting the probability that X ≤ x, i.e. F_X(x),
2. then determining which observation y has the same probability,
i.e. you want the probability that Y ≤ y, F_Y(y), to be the same as F_X(x).
This gives F_Y(y) = F_X(x).
Therefore y = F_Y^{-1}(F_X(x)),
where F_Y^{-1} is better known as the quantile function, Q_Y. The overall transformation from X to Y is summarized as: Y = Q_Y(F_X(X)).
In your particular example, from the R help, the distribution function for the normal distribution is pnorm and the quantile function for the Weibull distribution is qweibull, so you first call pnorm, then qweibull on the result.
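As a quick check of this recipe, the sketch below applies it to a simulated standard normal sample and confirms that x = 0 maps to roughly y = 5, as in the question. The seed, the sample size and the choice of a standard normal for X are assumptions; the Weibull parameters are the ones used above.

```r
set.seed(1)                              # assumed seed
x <- rnorm(5000)                         # assumed: X is standard normal

# Y = Q_Y(F_X(X)): normal CDF first, then the Weibull quantile function
y <- qweibull(pnorm(x), shape = 2, scale = 6.0056)

# The median of X (0) should map to the median of Y (about 5)
y0 <- qweibull(pnorm(0), shape = 2, scale = 6.0056)

# Compare the transformed sample with the target Weibull distribution
ks <- ks.test(y, "pweibull", shape = 2, scale = 6.0056)

hist(y, breaks = 40, freq = FALSE, main = "Transformed sample", xlab = "y")
curve(dweibull(x, shape = 2, scale = 6.0056), add = TRUE, col = "red", lwd = 2)
```

Because the map is monotone, ranks are preserved: the smallest x becomes the smallest y, and so on.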

Plotting fitted values vs observed ones in R or winbugs

I want to plot the fitted values versus the observed ones and add a straight line showing the goodness of fit. However, I do not want to use abline() because I did not calculate the fitted values using the lm command; I used a model that R does not cover. I calculated the coefficients myself and used them to compute the fitted values. So, how can I obtain such a plot in R or in WinBUGS?
Here is what I want
Still no data provided, but maybe this simple example using the curve function will inform the process:
x <- 1:10
y <- 2+ 3*(1:10) + rnorm(10)
plot(1:10, y)
curve( 2+3*x, 0, 10, add=TRUE)
Note to new R users: the expression y_i = 1 - xbeta + delta_i + e_i would fail in R, in part because the x and beta are not separated by an operator. But if you do understand R's matrix syntax, it might be a very compact expression even if "X" were multidimensional. All of this depends on the specifics, which we are so far lacking.
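For the fitted-versus-observed plot itself, note that abline() does not require an lm object: given an intercept and a slope it draws that line directly, so `abline(a = 0, b = 1)` gives the usual 45-degree reference. A minimal sketch follows, with made-up data and coefficients standing in for the ones computed outside R; `beta`, `observed` and `fitted_vals` are hypothetical names.

```r
set.seed(2)                                # assumed seed
x <- 1:10
observed <- 2 + 3 * x + rnorm(10)          # stand-in for the real observations

beta <- c(2, 3)                            # assumed: coefficients from elsewhere
fitted_vals <- beta[1] + beta[2] * x       # fitted values computed by hand

plot(fitted_vals, observed, xlab = "fitted", ylab = "observed")
abline(a = 0, b = 1, col = "red")          # perfect-fit line: observed = fitted

# Squared correlation as one simple summary of agreement
r2 <- cor(fitted_vals, observed)^2
```

Points close to the red line indicate a good fit; systematic departures from it suggest the model is missing structure.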
