Suppose that I have a Poisson distribution with a mean of 6. I would like to plot its probability mass function with an overlay of the approximating normal density.
This is what I have tried:
plot( dpois( x=0:10, lambda=6 ))
This produces a plot, but it is wrong since it doesn't contain an overlay of the approximating normal density.
How do I go about this?
Something like what you seem to be asking for (I'm outlining the commands and the basic ideas, but checking the help on the functions and trying should fill in the remaining details):
Take a wider range of x-values (out to at least 13 or so) and use xlim to extend the plot slightly into the negatives (maybe to -1.5).
Plot the pmf of the Poisson with solid dots (similar to your command, but with pch=16 as an argument to plot) in a suitable colour.
Call points with the same x and y arguments as above, with type="h" and lty=3, to get vertical dotted lines (to give a clear impression of the relative heights, somewhat akin to the appearance of a Cleveland dot-chart); I'd use the same colour as the dots or a slightly lighter/greyer version of the dot-colour.
Use curve to draw the normal curve with the same mean and standard deviation as the Poisson with mean 6 (the Wikipedia page for the Poisson gives the mean and variance), across the wider range we plotted; I'd use a slightly contrasting colour for that.
Draw in a light x-axis (e.g. using abline with the h argument).
Putting all those suggestions together:
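A minimal sketch of those steps in base R. The colours, the upper limit of 14 and the names lambda, k and pmf are my own choices, not anything fixed by the question:

```r
lambda <- 6
k   <- 0:14                    # wider range than 0:10
pmf <- dpois(k, lambda)

# Poisson pmf as solid dots, with room to the left of zero
plot(k, pmf, pch = 16, col = "steelblue",
     xlim = c(-1.5, 14), ylim = c(0, 0.18),
     xlab = "x", ylab = "probability / density")

# Dotted vertical lines from the axis up to each dot
points(k, pmf, type = "h", lty = 3, col = "steelblue")

# Normal density with the same mean and sd as the Poisson (mean = variance = lambda)
curve(dnorm(x, mean = lambda, sd = sqrt(lambda)),
      from = -1.5, to = 14, add = TRUE, col = "darkorange")

# Light x-axis
abline(h = 0, col = "grey80")
```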
(However, while it's what you're asking for, it's not strictly a suitable way to compare discrete and continuous variables: a density is not a probability, so the density and the pmf are not on the same scale. The "right" comparison between a Poisson and an approximating normal would be on the scale of the cdfs, so that you compare like with like; both are then on the scale of probabilities.)
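A rough sketch of that cdf-scale comparison, again in base R. The continuity correction (evaluating the normal cdf at k + 0.5) is my own addition for the numerical comparison, not something stated above:

```r
lambda <- 6
k <- 0:14

cdf_pois <- ppois(k, lambda)

# Poisson cdf as a step function, normal cdf as a smooth curve
plot(stepfun(k, c(0, cdf_pois)), verticals = FALSE, pch = 16,
     xlim = c(-1.5, 14), main = "", xlab = "x", ylab = "cumulative probability")
curve(pnorm(x, mean = lambda, sd = sqrt(lambda)),
      from = -1.5, to = 14, add = TRUE, col = "darkorange")

# Largest disagreement at the integers, using a continuity correction (assumption)
cdf_norm <- pnorm(k + 0.5, mean = lambda, sd = sqrt(lambda))
max_gap  <- max(abs(cdf_pois - cdf_norm))
max_gap
```

Both curves now live on the 0 to 1 probability scale, so vertical distances between them are directly interpretable.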
Related
I have an algorithm that uses an x,y plot of sorted y data to produce an ogive.
I then compute the area under the curve to derive percentages.
I'd like to do something similar using kernel density estimation. I like how the upper/lower bounds are smoothed out using kernel densities (i.e. the min and max will extend slightly beyond my hard coded input).
Either way... I was wondering if there is a way to treat an ogive as a type of cumulative distribution function and/or use kernel density estimation to derive a cumulative distribution function given y data?
I apologize if this is a confusing question. I know there is a way to derive a cumulative frequency graph (i.e. ogive). However, I can't determine how to derive a % given this cumulative frequency graph.
What I don't want is an ECDF. I know how to do that, and it is not quite what I am trying to capture; rather, I want the integration of an ogive given two intervals.
I'm not exactly sure what you have in mind, but here's a way to calculate the area under the curve for a kernel density estimate (or more generally for any case where you have the y values at equally spaced x-values (though you can, of course, generalize to variable x intervals as well)):
library(zoo)
# Kernel density estimate
# Set n to higher value to get a finer grid
set.seed(67839)
dens = density(c(rnorm(500,5,2),rnorm(200,20,3)), n=2^5)
# How to extract the x and y values of the density estimate
#dens$y
#dens$x
# x interval
dx = median(diff(dens$x))
# mean height for each pair of y values
h = rollmean(dens$y, 2)
# Area under curve
sum(h*dx) # 1.000943
# Cumulative area
# cumsum(h*dx)
# Plot density, showing points at which density is calculated
plot(dens)
abline(v=dens$x, col="#FF000060", lty="11")
# Plot cumulative area under curve, showing mid-point of each x-interval
plot(dens$x[-length(dens$x)] + 0.5*dx, cumsum(h*dx), type="l")
abline(v=dens$x[-length(dens$x)] + 0.5*dx, col="#FF000060", lty="11")
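If the goal is the percentage between two bounds, the same trapezoid idea can be wrapped into an interpolated cumulative function. This is only a sketch: kde_cdf and area_between are hypothetical helper names, and I use the default n = 512 grid rather than the coarse one above:

```r
set.seed(67839)
y <- c(rnorm(500, 5, 2), rnorm(200, 20, 3))
dens <- density(y, n = 512)

# Trapezoid areas between neighbouring grid points (works for unequal spacing too)
dx    <- diff(dens$x)
mid_h <- (head(dens$y, -1) + tail(dens$y, -1)) / 2
cum_area <- c(0, cumsum(mid_h * dx))

# Linear interpolation of the cumulative area gives a smooth cdf-like function
kde_cdf <- approxfun(dens$x, cum_area, yleft = 0, yright = max(cum_area))

# Estimated proportion of the distribution between a and b
area_between <- function(a, b) kde_cdf(b) - kde_cdf(a)

area_between(0, 10)
```

Multiplying the result by 100 gives the percentage for that interval.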
UPDATE to include ecdf function
To address your comments, look at the two plots below. The first is the empirical cumulative distribution function (ECDF) of the mixture of normal distributions that I used above. Note that the plot of this data looks the same below as it does above. The second is a plot of the ECDF of a plain vanilla normal distribution, mean=0, sd=1.
set.seed(67839)
x = c(rnorm(500,5,2),rnorm(200,20,3))
plot(ecdf(x), do.points=FALSE)
plot(ecdf(rnorm(1000)))
I have some data which I have fit a quadratic curve to using
model<-lm(Frequency ~ poly(Distance, 2, raw=TRUE))
I want to then draw this curve on the scatterplot of my data. I've tried using
lines(predict(model))
based on some information I found online, but it doesn't work quite right, as the resulting curve is squished into the left side of the plot.
Ignore the regression line.
I believe the problem is that my variable Distance is a set of values each 5 greater than the previous, and that when I plot the curve it ignores this and plots using increments of 1. What I'm not sure of is how to fix it. Any help would be appreciated.
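One way this could be handled is to give lines the x-values explicitly, predicting on a grid of Distance values. The data below is made up purely so the sketch runs; only the names Distance, Frequency and model come from the question, and grid_df and fitted_curve are my own:

```r
# Made-up stand-in data: Distance in steps of 5, as described in the question
set.seed(1)
Distance  <- seq(5, 100, by = 5)
Frequency <- 50 - 0.8 * Distance + 0.006 * Distance^2 +
  rnorm(length(Distance), sd = 2)

model <- lm(Frequency ~ poly(Distance, 2, raw = TRUE))

plot(Distance, Frequency)

# Predict on a fine grid of Distance values and pass x and y to lines()
grid_df      <- data.frame(Distance = seq(min(Distance), max(Distance), length.out = 200))
fitted_curve <- predict(model, newdata = grid_df)
lines(grid_df$Distance, fitted_curve, col = "red")
```

Calling lines(predict(model)) with a single vector plots the fitted values against their index 1, 2, 3, ..., which would explain the squashed curve.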
I would like to create a Student's t distribution density plot with a mean of 0.02 instead of 0.
The distribution should have 2 degrees of freedom.
Is this possible to do?
I tried the following:
X<-rnorm(100000,mean=0.02, sd=(1/sqrt(878)))
pop.mean<-mean(X)
t<-sapply(1:10000, function(x) (mean(sample(X,100))-pop.mean)/(1/sqrt(878)))
plot(density(t))
Is this approach correct?
If it is correct, how can I get the real densities, not just the approximation?
Your statement and example contradict each other somewhat.
Do you want a non-central t distribution which is based on a normal with mean 0.02? This is what your example suggests, but note that the non-central t is not just a shifted t, it is now skewed.
If you want the non-central t then you can plot it with a command like:
curve(dt(x,2,0.02), from=-5, to=6)
Or, do you want a shifted t distribution? A distribution that is symmetric around 0.02 with the shape of a t distribution?
You can plot the curve shifted by using a command like:
curve(dt(x-0.02,2), from=-5, to=6 )
The curve function has an add argument that you could use to plot both on the same plot if you want to compare them (not much difference in this case); changing the color of one of them is recommended.
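For instance, a sketch of the overlay (the colour, line type and legend labels are arbitrary choices of mine):

```r
# Non-central t with 2 df and non-centrality parameter 0.02
curve(dt(x, df = 2, ncp = 0.02), from = -5, to = 6, ylab = "density")

# Shifted central t with 2 df, centred at 0.02, added to the same plot
curve(dt(x - 0.02, df = 2), add = TRUE, col = "red", lty = 2)

legend("topright", legend = c("non-central t", "shifted t"),
       col = c("black", "red"), lty = c(1, 2))

# Largest vertical gap between the two curves on a grid
xs <- seq(-5, 6, length.out = 201)
max_gap <- max(abs(dt(xs, df = 2, ncp = 0.02) - dt(xs - 0.02, df = 2)))
max_gap
```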
I have two data sets that I am comparing using a kde2d contour plot on a log10 scale.
Here I will use the following example data sets:
library(MASS)  # provides kde2d
b<-log10(rgamma(1000,6,3))
a<-log10((rweibull(1000,8,2)))
density<-kde2d(a,b,n=100)
filled.contour(density,color.palette=colorRampPalette(c('white','blue','yellow','red','darkred')))
This produces the following plot.
Now my question is: what do the z values on the legend actually mean? I know they represent where most of the data lies, but 0-15 confuses me. I thought it could be a percentage, but without the log10 scale I have values ranging from 0-1. I have also produced plots with scales 1-1.2 and 1-2 using my real data.
The colors represent the values of the estimated density function, apparently ranging from 0 to 15. Just like with your other question about the odd looking linear regression, I can relate to your confusion.
You just have to understand that a density's integral over the full domain has to be 1, so you can use it to calculate the probability of an observation falling into a specific region. The z values are density per unit area, not probabilities or percentages, so when the data occupy a small region (as they do on the log10 scale) the values can be well above 1.
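A quick numerical check of that statement, as a sketch using the example data from the question (the seed and the names dens2d, cell_area and total_prob are mine). Summing the density over the grid and multiplying by the area of one grid cell should come out close to 1, even though individual z values are far larger:

```r
library(MASS)  # provides kde2d

set.seed(42)
a <- log10(rweibull(1000, 8, 2))
b <- log10(rgamma(1000, 6, 3))
dens2d <- kde2d(a, b, n = 100)

# Area of one grid cell (the grid is equally spaced in both directions)
cell_area <- diff(dens2d$x[1:2]) * diff(dens2d$y[1:2])

# Riemann-sum approximation of the integral of the density over the plotted region
total_prob <- sum(dens2d$z) * cell_area
total_prob

# The peak density itself is not bounded by 1
max(dens2d$z)
```

The sum is only approximately 1 because kde2d evaluates the density over the range of the data, so a little mass in the tails is cut off.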
I know that you can adjust the scale of the x and y axes to change the geometric angle of a regression line. For example, if you plotted a regression line with slope of b=0.3, perhaps the default settings of axes length etc. would create a regression angle of 35 degrees.
If you adjust the axes, you will change the angle the regression line makes with the x-axis so that it is greater or less than 35 degrees, WITHOUT changing the mathematical value of the slope; it will still stay as b=0.3.
What systematic equation/set of equations is there that allows me to know how the geometric angle of the regression line will be changed as I change the axes of the graph itself?
I have spent a lot of time on the internet looking for the answer to this and have not yet succeeded. For some reason statistics and geometry do not overlap much.
Refer to this web page: http://www.mathworks.in/help/matlab/ref/axis.html
Based on the data you have, set the same ranges for all the axes in your plot. Then the regression line would have the same angle for both the datasets.
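As for an explicit relationship, here is a sketch under the assumption that both axes are linear: the on-screen angle is the arctangent of the slope multiplied by the ratio of the y-axis scale to the x-axis scale, where "scale" means physical length per data unit. The function display_angle and its arguments are hypothetical names, and it is written in R rather than MATLAB:

```r
# Geometric angle (in degrees) that a line of given slope makes with the x-axis
# slope   : regression slope b, in y-units per x-unit
# x_range : length of the x-axis in data units (xmax - xmin)
# y_range : length of the y-axis in data units (ymax - ymin)
# width   : physical width of the plotting region (e.g. inches)
# height  : physical height of the plotting region (same unit as width)
display_angle <- function(slope, x_range, y_range, width, height) {
  x_scale <- width / x_range    # physical length per x data unit
  y_scale <- height / y_range   # physical length per y data unit
  atan(slope * y_scale / x_scale) * 180 / pi
}

# Square plot with equal axis ranges: the angle is simply atan(b)
angle_equal <- display_angle(0.3, x_range = 10, y_range = 10, width = 5, height = 5)

# Halving the y-axis range stretches the line vertically and steepens the angle
angle_stretched <- display_angle(0.3, x_range = 10, y_range = 5, width = 5, height = 5)

c(angle_equal, angle_stretched)
```

With equal ranges and a square plotting region the scales cancel, which is why setting the same ranges gives the same angle for both datasets.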
Hope this helps!