So I have a bar plot in which the y-axis is log(frequencies). From eyeballing it, the bars appear to decrease exponentially, but I would like to know this for sure. What I want to do is also plot an exponential on the same graph. Then, if my bars fall below the exponential, I would know that my bars decrease either exponentially or faster than exponentially, and if the bars lie above the exponential, I would know that they don't decrease exponentially. How do I plot an exponential on a bar graph?
If you're trying to fit the density of an exponential distribution, you should plot a density histogram (not a frequency histogram). See this question on how to plot distributions in R.
This is how I would do it.
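set.seed(1)  # make the simulated data reproducible
x.gen <- rexp(1000, rate = 3)  # simulated data with true rate 3
hist(x.gen, freq = FALSE)  # density histogram: total bar area is 1
library(MASS)  # provides fitdistr()
x.est <- fitdistr(x.gen, "exponential")$estimate  # maximum-likelihood rate
curve(dexp(x, rate = x.est), add = TRUE, col = "red", lwd = 2)  # overlay fitted density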
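For the exponential distribution specifically, the maximum-likelihood rate that fitdistr() returns is just 1/mean(x), so the same overlay can be sketched without MASS. This assumes the same simulated data as above:

```r
set.seed(1)
x.gen <- rexp(1000, rate = 3)  # simulated data, as in the answer above

# Closed-form maximum-likelihood estimate of the exponential rate
rate.hat <- 1 / mean(x.gen)

hist(x.gen, freq = FALSE)
curve(dexp(x, rate = rate.hat), add = TRUE, col = "red", lwd = 2)
```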
One way of visually inspecting whether two distributions are the same is with a quantile-quantile plot, or Q-Q plot for short. It is most commonly used to check whether data follow a normal distribution.
The basic idea is to plot your data against some theoretical quantiles; if the data match that distribution, you will see a straight line. For example:
x <- qnorm(seq(0, 1, length.out = 1002))  # Theoretical normal quantiles
x <- x[-c(1, length(x))]  # Drop the ends because they are -Inf and Inf
y <- rnorm(1000)  # Actual data: 1000 points drawn from a normal distribution
l.1 <- lm(sort(y) ~ sort(x))  # Straight line through the Q-Q points
qqplot(x, y, xlab = "Theoretical Quantiles", ylab = "Actual Quantiles")
abline(l.1)
Under ideal conditions you should see a straight line when plotting the theoretical quantiles against your data. You can do the same for your problem by plotting your data against the quantiles of the exponential distribution you think it follows (from qexp()).
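A minimal sketch of the exponential version, assuming simulated stand-in data and a rate estimated by maximum likelihood (1/mean). ppoints() supplies the probability points, which for n > 10 are (i - 0.5)/n:

```r
set.seed(1)
y <- rexp(1000, rate = 3)   # stand-in data; replace with your own values
rate.hat <- 1 / mean(y)     # maximum-likelihood estimate of the rate

# Theoretical exponential quantiles at the probability points
theo <- qexp(ppoints(length(y)), rate = rate.hat)

qqplot(theo, y, xlab = "Theoretical exponential quantiles", ylab = "Sample quantiles")
abline(0, 1, col = "red")  # points close to this line suggest an exponential fit
```

Points that bend away from the red line in the right tail indicate a tail heavier or lighter than exponential.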
Why did I get lines instead of the standard bubbles in my Q-Q plot?
My code:
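data <- read.csv("C:\\Users\\anton\\SanFrancisco.csv", fileEncoding = "UTF-8-BOM")  # drop the byte-order mark that adds an "ï.." prefix to the first column name
x <- data$San.Francisco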
head(x)
library("fitdistrplus")
fitnor <- fitdist(x, "norm")
fitlogis <- fitdist(x, "logis")
qqcomp(list(fitnor, fitlogis), legendtext=c("Normal", "Logistic"))
From the documentation for qqcomp (open it with ?qqcomp):
qqcomp provides a plot of the quantiles of each theoretical distribution (x-axis) against the empirical quantiles of the data (y-axis), by default defining probability points as (1:n - 0.5)/n for theoretical quantile calculation (data are assumed continuous). For large dataset (n > 1e4), lines are drawn instead of points and customized with the fitpch parameter.
This is a design feature. Your data presumably have more than 10,000 values (n > 1e4). With that many points, the individual bubbles on the Q-Q plot would be difficult to distinguish, and they are large enough that the bubbles for one model would cover those for the other.
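If you still want individual points for a large sample, one option is to build the Q-Q plot by hand from the probability points quoted above. This is a base-R sketch for the normal fit only; it assumes simulated data in place of the San Francisco column and computes the maximum-likelihood parameters directly instead of calling fitdist():

```r
set.seed(42)
x <- rnorm(20000, mean = 10, sd = 2)  # hypothetical large sample (n > 1e4)
n <- length(x)

# Probability points as described in ?qqcomp
p <- (1:n - 0.5) / n

# Maximum-likelihood normal fit: sample mean and sd with denominator n
mu.hat <- mean(x)
sd.hat <- sqrt(mean((x - mu.hat)^2))
theo <- qnorm(p, mean = mu.hat, sd = sd.hat)

# pch = "." keeps 20000 points from merging into solid blobs
plot(theo, sort(x), pch = ".", xlab = "Theoretical quantiles", ylab = "Empirical quantiles")
abline(0, 1, col = "red")
```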
I made a histogram in R and need to fit an exponential curve to it, but the curve doesn't appear on the histogram.
This is the code:
hist(Adat$price, main="histogram",xlab="data")
curve(dexp(x, rate=1,log=FALSE), add = TRUE)
Could someone help me please?
You need to set the argument freq=FALSE if you want the histogram to be normalized:
set.seed(32418)
sim <- rexp(100) + rnorm(100,0,.01)
hist(sim, freq=FALSE)
curve(dexp(x, rate=1, log=FALSE), add = TRUE)
Otherwise the bar heights are raw counts, which grow with the number of samples, while the density curve integrates to 1. In fact, the curve technically did appear on your graph; it's just so small relative to the counts that you can't distinguish it from a flat line at y = 0.
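Alternatively, you can keep the count histogram and scale the density up to counts: the expected count in a bin is roughly n * binwidth * density. A sketch, assuming equal-width bins (the hist() default) and using the fitted rate 1/mean(x) rather than rate = 1:

```r
set.seed(32418)
sim <- rexp(100) + rnorm(100, 0, .01)  # same simulated data as above

h <- hist(sim)                  # default: bar heights are counts
binwidth <- diff(h$breaks)[1]   # default breaks are equally spaced
n.obs <- length(sim)
rate.hat <- 1 / mean(sim)       # fitted rate; use rate = 1 to match the question

# Expected count per bin is about n * binwidth * density
curve(n.obs * binwidth * dexp(x, rate = rate.hat), add = TRUE, col = "red")
```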
Possible Duplicate:
ggplot2: Overlay histogram with density curve
Sorry for what is probably a simple question, but I have a bit of a problem.
I have created a histogram that is based on a binomial distribution with mean=0.65 and sd=0.015 with 10000 samples. The histogram itself looks fine. However, I need to overlay a normal distribution on top of this (with the same mean and standard deviation). Currently, I have the following:
qplot(x, data=prob, geom="histogram", binwidth=.05) + stat_function(geom="line", fun=dnorm, args=list(mean=0.65, sd=0.015))
A distribution shows up, but it is TINY. This is likely because the count at the mean goes up to almost 2,000, while the normal density is much smaller. Simply put, the curve is not scaled to the data the way I expected R to do automatically. Is there a way to scale the normal curve to fit the histogram, or some way to rescale the histogram to fit the normal distribution?
Thanks in advance.
"The distribution is tiny" because you are plotting a density function over counts. You should use the same metric in both plots, e.g.:
First, I generate some data for your example:
x <- rbinom(10000, 10, 0.15)
prob <- data.frame(x=x/(mean(x)/0.65))
And plot both as density functions:
library(ggplot2)
ggplot(prob, aes(x = x)) + geom_histogram(aes(y = after_stat(density)), binwidth = .05) + stat_function(geom = "line", fun = dnorm, args = list(mean = 0.65, sd = 0.015))
@daroczig's answer is correct about needing to be consistent in plotting densities rather than counts, but I'm having trouble seeing how you managed to get a binomial sample with those properties. In particular, the mean of the binomial is n*p, the variance is n*p*(1-p), and the standard deviation is sqrt(n*p*(1-p)), so:
b.m <- 0.65
b.sd <- 0.015
Calculate variance:
b.v <- b.sd^2 ## n*p*(1-p)
Calculate p:
## (1-p) = b.v/(n*p) = b.v/b.m
## p = 1-b.v/b.m
b.p <- 1-b.v/b.m
Calculate n:
## n = n*p/p = b.m/b.p
b.n <- b.m/b.p
This gives n=0.6502251, p=0.9996538 -- so I don't see how you can get this binomial distribution without n<1, unless I messed up the algebra ...
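Plugging the solution back into the binomial formulas is a quick way to check the algebra: it reproduces the stated mean and sd while requiring n < 1. A sketch using the same variable names:

```r
b.m <- 0.65
b.sd <- 0.015
b.v <- b.sd^2          # n*p*(1-p)
b.p <- 1 - b.v / b.m   # from (1-p) = b.v/b.m
b.n <- b.m / b.p       # from n = b.m/p

# Substitute the solution into the binomial mean and sd formulas
check <- c(mean = b.n * b.p, sd = sqrt(b.n * b.p * (1 - b.p)))
check
```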
I have a probability density function in a plot called ph, which I derived from two samples of data with the help of a Stack Overflow user, like this:
few <-read.table('outcome.dat',head=TRUE)
many<-read.table('alldata.dat',head=TRUE)
mh <- hist(many$G,breaks=seq(0,1.,by=0.03), plot=FALSE)
fh <- hist(few$G, breaks=mh$breaks, plot=FALSE)
ph <- fh
ph$density <- fh$counts/(mh$counts+0.001)
plot(ph,freq=FALSE,col="blue")
I would like to fit the best curve to the plot of ph, but I can't find a method that works.
How can I do this? Do I have to extract the values from ph and then work on them, or is there some function that works on
plot(ph,freq=FALSE,col="blue")
directly?
Assuming you want to perform a curve fit to the data in ph, something along the lines of
nls(y ~ FUN(x, a, b), data = data.frame(x = ph$mids, y = ph$density), start = list(a = ..., b = ...)) may work. You need to know what sort of function FUN you think the histogram data should follow, e.g. a normal distribution. Read the help file for nls() to learn how to set starting "guess" values for the coefficients in FUN.
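A minimal sketch of such an nls() fit. It assumes simulated data in place of outcome.dat/alldata.dat and a Gaussian-shaped curve as the model; both are assumptions, and since ph in the question is a ratio of counts, the right functional form depends on your data:

```r
set.seed(1)
# Hypothetical stand-in for the G values, kept inside the 0-1 break range
g <- rnorm(5000, mean = 0.5, sd = 0.1)
g <- g[g > 0 & g < 1]
h <- hist(g, breaks = seq(0, 1, length.out = 41), plot = FALSE)

# Fit to the bar heights that plot(h, freq = FALSE) draws
dat <- data.frame(x = h$mids, y = h$density)

# Assumed Gaussian-shaped curve; starting guesses taken from the data
fit <- nls(y ~ a * exp(-(x - mu)^2 / (2 * s^2)), data = dat,
           start = list(a = max(dat$y), mu = mean(g), s = sd(g)))
coef(fit)

plot(h, freq = FALSE, col = "blue")
curve(predict(fit, newdata = data.frame(x = x)), add = TRUE, col = "red", lwd = 2)
```

Good starting values matter: nls() can fail to converge if the guesses are far from the data.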
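If you simply want to overlay a curve onto the histogram, then
smoo <- spline(ph$mids, ph$density)
lines(smoo$x, smoo$y)
will come close to doing that. Because the histogram is drawn with freq=FALSE, use ph$density (the plotted bar heights) rather than ph$counts; otherwise you may have to adjust the y scaling.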
Do you want a density function?
x = rnorm(1000)
hist(x, breaks = 30, freq = FALSE)
lines(density(x), col = "red")
I have two density curves plotted using this:
Network <- Mydf$Networks
quartiles <- quantile(Mydf$Avg.Position, probs=c(25,50,75)/100)
density <- ggplot(Mydf, aes(x = Avg.Position, fill = Network))
d <- density + geom_density(alpha = 0.2) + xlim(1, 11) + ggtitle("September 2010") + geom_vline(xintercept = quartiles, colour = "red")
print(d)
I'd like to compute the area under each curve for a given Avg.Position range. Sort of like pnorm for the normal curve. Any ideas?
Calculate the density separately and plot it first. Then you can use basic arithmetic to get the estimate. An integral can be approximated by summing the areas of a set of thin strips. I use the trapezoidal (mean) method for that: the width of each strip is the difference between two consecutive x-values, and its height is the mean of the y-values at the start and end of the interval. I use the rollmean function from the zoo package, but this can be done with base R too.
require(zoo)
X <- rnorm(100)
# calculate the density and check the plot
Y <- density(X) # see ?density for parameters
plot(Y$x,Y$y, type="l") #can use ggplot for this too
# set an Avg.position value
Avg.pos <- 1
# construct lengths and heights
xt <- diff(Y$x[Y$x<Avg.pos])
yt <- rollmean(Y$y[Y$x<Avg.pos],2)
# This gives you the area
sum(xt*yt)
This gives you a good approximation, accurate to about three decimal places. If you know the density function itself, take a look at ?integrate.
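The same trapezoidal sum can be written in base R without zoo. A sketch using the example's own variable names and simulated data, followed by integrate() on a known density (dnorm here):

```r
set.seed(1)
X <- rnorm(100)
Y <- density(X)
Avg.pos <- 1

keep <- Y$x < Avg.pos
xs <- Y$x[keep]
ys <- Y$y[keep]

# Trapezoidal rule: strip width times the mean of its two end heights
area <- sum(diff(xs) * (head(ys, -1) + tail(ys, -1)) / 2)
area

# With a known density, integrate() does the work directly
integrate(dnorm, lower = -Inf, upper = Avg.pos)$value
```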
Three possibilities:
The logspline package provides a different method of estimating density curves, but it does include pnorm-style functions for the result (plogspline, for example).
You could also approximate the area by passing the x and y components returned by density() to approxfun() and using the result with integrate(). Unless you need precise estimates of small tail areas (or very small intervals), this will probably give a reasonable approximation.
Density estimates are just averages of kernels centered at the data points, and the default kernel in density() is the normal distribution. You could average the areas from pnorm (or the corresponding function for another kernel), with the sd set to the bandwidth and centered at each data point.
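The second and third possibilities can be sketched in a few lines of base R and should give nearly the same area. This assumes the Gaussian kernel that density() uses by default (its bw is the kernel's standard deviation), simulated data, and a hypothetical interval from -0.5 to 1:

```r
set.seed(1)
X <- rnorm(100)
d <- density(X)      # default Gaussian kernel
a <- -0.5            # hypothetical lower end of the Avg.Position range
b <- 1               # hypothetical upper end

# Second possibility: interpolate the density curve and integrate numerically
f <- approxfun(d$x, d$y, yleft = 0, yright = 0)
area.approx <- integrate(f, a, b, subdivisions = 500L)$value

# Third possibility: average the normal-kernel areas; d$bw is the kernel sd
area.kernel <- mean(pnorm(b, mean = X, sd = d$bw) - pnorm(a, mean = X, sd = d$bw))

c(approxfun = area.approx, kernel = area.kernel)
```

The kernel average is exact for the estimate itself, while the approxfun route inherits the small interpolation error of density()'s grid.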