I have a probability density function in a plot called ph that I derived from two samples of data, with the help of a Stack Overflow user, in this way:
few <-read.table('outcome.dat',head=TRUE)
many<-read.table('alldata.dat',head=TRUE)
mh <- hist(many$G,breaks=seq(0,1.,by=0.03), plot=FALSE)
fh <- hist(few$G, breaks=mh$breaks, plot=FALSE)
ph <- fh
ph$density <- fh$counts/(mh$counts+0.001)
plot(ph,freq=FALSE,col="blue")
I would like to fit the best curve to the plot of ph, but I can't find a working method.
How can I do this? Do I have to extract the values from ph and then work on them, or is there some function that works on
plot(ph,freq=FALSE,col="blue")
directly?
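Assuming you mean that you want to perform a curve fit to the data in ph, then something along the lines of
nls(y ~ FUN(x, a, b), data = data.frame(x = ph$mids, y = ph$counts), start = list(...)) may work (use ph$density instead of ph$counts if you want to fit the ratio computed in the question). You need to know what sort of function 'FUN' you think the histogram data should fit, e.g. a normal distribution; a and b stand for its coefficients. Read the help file on nls() to learn how to set up starting "guess" values for the coefficients in FUN via the start argument.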
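As a rough sketch of the nls() approach, assuming a Gaussian-shaped curve is a sensible model: the simulated data, the model formula, and the starting values below are illustrative assumptions, not taken from the question.

```r
# Simulated stand-in for the real data (hypothetical values)
set.seed(1)
x <- rnorm(2000, mean = 0.5, sd = 0.1)

# Bin the data without plotting and collect bar midpoints and heights
h <- hist(x, breaks = 30, plot = FALSE)
d <- data.frame(mid = h$mids, dens = h$density)

# Fit a Gaussian-shaped curve to the bar heights;
# 'start' holds the initial guesses for the coefficients
fit <- nls(dens ~ a * exp(-(mid - mu)^2 / (2 * s^2)),
           data  = d,
           start = list(a  = max(d$dens),
                        mu = d$mid[which.max(d$dens)],
                        s  = sd(x)))
cf <- coef(fit)

# Overlay the fitted curve on the density-scaled histogram
plot(h, freq = FALSE, col = "blue")
curve(cf[["a"]] * exp(-(x - cf[["mu"]])^2 / (2 * cf[["s"]]^2)),
      add = TRUE, col = "red", lwd = 2)
```

If nls() fails to converge on real data, the starting values are usually the first thing to revisit.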
If you simply want to overlay a curve onto the histogram, then
smoo <- spline(ph$mids, ph$counts)
lines(smoo$x, smoo$y)
will come close to doing that. You may have to adjust the x and/or y scaling.
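One way to avoid the rescaling step is to interpolate the bar heights that are actually drawn. The sketch below assumes the histogram is plotted with freq=FALSE, so the plotted heights are ph$density; the two simulated samples and the 0.05 bin width are hypothetical stand-ins for the files in the question.

```r
set.seed(42)
many_G <- runif(5000)        # hypothetical stand-in for many$G
few_G  <- rbeta(500, 2, 5)   # hypothetical stand-in for few$G

# Same construction as in the question, on the simulated samples
mh <- hist(many_G, breaks = seq(0, 1, by = 0.05), plot = FALSE)
fh <- hist(few_G,  breaks = mh$breaks, plot = FALSE)
ph <- fh
ph$density <- fh$counts / (mh$counts + 0.001)

# Interpolate the plotted bar heights (ph$density), not the raw counts
smoo <- spline(ph$mids, ph$density, n = 200)

plot(ph, freq = FALSE, col = "blue")
lines(smoo$x, smoo$y, col = "red", lwd = 2)
```

Note that spline() interpolates through every bar midpoint, so a noisy histogram gives a wiggly curve.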
Do you want a density function?
x = rnorm(1000)
hist(x, breaks = 30, freq = FALSE)
lines(density(x), col = "red")
I made a histogram in R and I have to fit an exponential curve on it.
But the curve doesn't appear on the histogram.
This is the code:
hist(Adat$price, main="histogram",xlab="data")
curve(dexp(x, rate=1,log=FALSE), add = TRUE)
Could someone help me please?
You need to set the argument freq=FALSE if you want the histogram to be normalized:
set.seed(32418)
sim <- rexp(100) + rnorm(100,0,.01)
hist(sim, freq=FALSE)
curve(dexp(x, rate=1, log=FALSE), add = TRUE)
Otherwise, the height of the bins will be a function of the number of samples. In fact, the curve technically did appear on your graph; it's just so small that you can't distinguish it from a flat line at y = 0.
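If you would rather keep counts on the y-axis, the curve can be scaled up instead. This is a small sketch that assumes equal-width bins, in which case the expected count per bin is roughly (number of samples) x (bin width) x (density); the variable name bin_width is introduced here for illustration.

```r
set.seed(32418)
sim <- rexp(100)

# Frequency histogram: bar heights are counts
h <- hist(sim, main = "histogram", xlab = "data")

# Assumes all bins have the same width
bin_width <- diff(h$breaks)[1]

# Scale the density so it is comparable with the counts
curve(length(sim) * bin_width * dexp(x, rate = 1),
      add = TRUE, col = "red", lwd = 2)
```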
So, I am challenged and request a little guidance.
I have used the rriskDistributions package to evaluate some CDFs for some industrial sector injury data with the get.lnorm.par() function. It fits the data well. Unfortunately, the axes require swapping, because my response variable is currently on the x-axis and needs to be on the y-axis. The get.lnorm.par() function also requires that the probabilities be on the y-axis only, and I cannot figure out how to create the same curve with swapped axes.
I want to get it to look something like this:
An example of the code that I have worked through in ggplot follows:
library(ggplot2)
x <- c(0.0416988,0.0656371,0.1015444,0.1270270,0.1536680,0.1694981,0.2509653)
y <- c(3170221,6810103,14999840,26623982,48903587,74177290,266181110)
prob <- c(x) ## There are 389 different x values, but keeping it simple!
quant <- c(y) ## Same as x.
df1 <- data.frame(prob,quant)
plot2 <- ggplot(df1, aes(x=prob, y=quant)) + geom_point() +
geom_smooth(method="lm", formula= log(y)~x, se=FALSE) +
labs(y="quantiles", x="probabilities", title="Probs vs Quants")
plot2
I have created lines that fit this data, but everything ends at the last data point.
When I used get.lnorm.par(), the fit was great but, as stated previously, the axes require flipping. When I tried this, I kept getting errors about infinite output and could not define the bounds of the function to be plotted.
So, here is the code using the rriskDistributions package:
library(rriskDistributions)
pct <- c(0.0416988,0.0656371,0.1015444,0.1270270,0.1536680,0.1694981,0.2509653)
my.lnorm<-get.lnorm.par(p=pct, q=c(3170221,6810103,14999840,26623982,48903587,74177290,266181110),
tol = 0.001, scaleX = c(0,0.0809))
Essentially, I am trying to create a fitted curve for the data (either exponential or power) that extends, or predicts, beyond the final data point. I cannot figure this out for the life of me, and changing any of the parameters in the rriskDistributions functions is quite challenging.
Any thoughts?
Thanks.
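One possible way to get a curve with probabilities on the x-axis that continues past the last data point is sketched below, under the assumption that the quantiles really do follow a lognormal distribution. For a lognormal, log(q) = meanlog + sdlog * qnorm(p), so an ordinary linear regression of log(quant) on qnorm(prob) estimates both parameters, and qlnorm() can then draw the curve over any probability range. This is a different fitting method from get.lnorm.par(), and the plotting range of 0.01 to 0.5 is an arbitrary choice.

```r
prob  <- c(0.0416988, 0.0656371, 0.1015444, 0.1270270,
           0.1536680, 0.1694981, 0.2509653)
quant <- c(3170221, 6810103, 14999840, 26623982,
           48903587, 74177290, 266181110)

# Lognormal quantile function: log(q) = meanlog + sdlog * qnorm(p)
fit     <- lm(log(quant) ~ qnorm(prob))
meanlog <- coef(fit)[[1]]   # intercept
sdlog   <- coef(fit)[[2]]   # slope

# Evaluate the fitted curve on a range wider than the observed probabilities
p_ext <- seq(0.01, 0.5, length.out = 200)
q_ext <- qlnorm(p_ext, meanlog = meanlog, sdlog = sdlog)

plot(p_ext, q_ext, type = "l", log = "y",
     xlab = "probabilities", ylab = "quantiles", main = "Probs vs Quants")
points(prob, quant)
```

Anything drawn beyond the largest observed probability is extrapolation and only as good as the lognormal assumption.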
I have a barplot in which the y-axis is log(frequency). From just eyeing it, the bars appear to decrease exponentially, but I would like to know this for sure. What I want to do is also plot an exponential on this same graph. Then, if my bars fall below the exponential, I would know that they do decrease either exponentially or faster than exponentially, and if the bars lie on top of the exponential, I would know that they don't decrease exponentially. How do I plot an exponential on a bar graph?
Here is my graph if that helps:
If you're trying to fit the density of an exponential distribution, you should probably plot a density histogram (not a frequency histogram). See this question on how to plot distributions in R.
This is how I would do it.
x.gen <- rexp(1000, rate = 3)
hist(x.gen, prob = TRUE)
library(MASS)
x.est <- fitdistr(x.gen, "exponential")$estimate
curve(dexp(x, rate = x.est), add = TRUE, col = "red", lwd = 2)
One way of visually inspecting whether two distributions are the same is with a Quantile-Quantile plot, or Q-Q plot for short. Typically this is done when checking whether a distribution follows a standard normal.
The basic idea is to plot your data against some theoretical quantiles; if the data match that distribution, you will see a straight line. For example:
x <- qnorm(seq(0, 1, length.out = 1002)) # Theoretical normal quantiles
x <- x[-c(1, length(x))] # Drop ends because they are -Inf and Inf
y <- rnorm(1000) # Actual data. 1000 points drawn from a normal distribution
l.1 <- lm(sort(y)~sort(x))
qqplot(x, y, xlab="Theoretical Quantiles", ylab="Actual Quantiles")
abline(coef(l.1)[1], coef(l.1)[2])
Under perfect conditions you should see a straight line when plotting the theoretical quantiles against your data. You can do the same by plotting your data against the quantiles of the exponential distribution you think it follows.
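A minimal sketch of that exponential version, assuming the rate is estimated as 1 / mean(y) (the maximum-likelihood estimate for an exponential); the simulated sample is only a placeholder for real data.

```r
set.seed(1)
y <- rexp(1000, rate = 3)   # placeholder data

# Estimate the rate, then compute theoretical exponential quantiles
rate_hat <- 1 / mean(y)
theo <- qexp(ppoints(length(y)), rate = rate_hat)

# Points close to the line y = x suggest an exponential distribution
plot(theo, sort(y),
     xlab = "Theoretical Quantiles", ylab = "Actual Quantiles")
abline(0, 1, col = "red")
```

Systematic curvature away from the line, especially in the upper tail, is a sign that the decay is not exponential.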
I have a simple data set with two columns of data, K and SwStr.
K = c(.259, .215, .224, .223, .262, .233)
SwStr = c(.130, .117, .117, .114, .113, .111)
I plotted the data using:
plot(res$K, res$SwStr)
I want to plot the result of a linear model, using SwStr to predict K. I tried to do that using:
graphic<-lm(K~SwStr-1, data=res)
P=predict(graphic)
plot(res$K, res$SwStr)
lines(P, lty="dashed", col="green", lwd=3)
But when I do this, I don't get any line plotted. What am I doing wrong?
(1) You are inverting the axes of the original plot. If you want SwStr on the x axis and K on the y axis you need
plot(res$SwStr, res$K)
or
with(res,plot(K~SwStr))
If you check the actual values of the plotted points on the graph, this might be obvious (especially if K and SwStr have different magnitudes) ...
For lm fits you can also use abline(graphic,...)
edit: (2) You also have to realize that predict gives just the predicted y values, not the x values. So you want something like this:
K=c(.259, .215, .224, .223, .262, .233)
SwStr=c(.130, .117, .117, .114, .113, .111)
g <- lm(K~SwStr-1)
par(las=1,bty="l") ## my favourites
plot(K~SwStr)
P <- predict(g)
lines(SwStr,P)
Depending on the situation, you may also want to use the newdata argument to predict to specify a set of evenly spaced x values ...
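A short sketch of the newdata approach, using the data from the question; the grid of 50 evenly spaced points and the name new_x are arbitrary choices made for illustration.

```r
K     <- c(.259, .215, .224, .223, .262, .233)
SwStr <- c(.130, .117, .117, .114, .113, .111)

# Regression through the origin, as in the question
g <- lm(K ~ SwStr - 1)

# Evenly spaced x values; the column name must match the predictor
new_x <- data.frame(SwStr = seq(min(SwStr), max(SwStr), length.out = 50))
P <- predict(g, newdata = new_x)

plot(K ~ SwStr)
lines(new_x$SwStr, P, lty = "dashed", col = "green", lwd = 3)
```

For a straight-line fit the grid makes little visual difference, but it matters as soon as the model is curved or the original x values are unsorted.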
I have two density curves plotted using this:
Network <- Mydf$Networks
quartiles <- quantile(Mydf$Avg.Position, probs=c(25,50,75)/100)
density <- ggplot(Mydf, aes(x = Avg.Position, fill = Network))
d <- density + geom_density(alpha = 0.2) + xlim(1,11) + ggtitle("September 2010") + geom_vline(xintercept = quartiles, colour = "red")
print(d)
I'd like to compute the area under each curve for a given Avg.Position range. Sort of like pnorm for the normal curve. Any ideas?
Calculate the density separately and plot that to start with. Then you can use basic arithmetic to get the estimate. An integral is approximated by adding together the areas of a set of narrow strips. I use the mean method for that: the width of each strip is the difference between two consecutive x-values, and the height is the mean of the y-values at the beginning and at the end of the interval (in other words, the trapezoidal rule). I use the rollmean function in the zoo package, but this can be done using the base package too.
require(zoo)
X <- rnorm(100)
# calculate the density and check the plot
Y <- density(X) # see ?density for parameters
plot(Y$x,Y$y, type="l") #can use ggplot for this too
# set an Avg.position value
Avg.pos <- 1
# construct lengths and heights
xt <- diff(Y$x[Y$x<Avg.pos])
yt <- rollmean(Y$y[Y$x<Avg.pos],2)
# This gives you the area
sum(xt*yt)
This gives you a good approximation, up to about 3 decimal places. If you know the density function, take a look at ?integrate.
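For reference, the same calculation can be sketched in base R without zoo, averaging neighbouring y-values with head() and tail(); the seed and the variable names keep, x, y and area are introduced here for illustration.

```r
set.seed(1)
X <- rnorm(100)
Y <- density(X)

Avg.pos <- 1
keep <- Y$x < Avg.pos
x <- Y$x[keep]
y <- Y$y[keep]

# Width of each strip times the mean of its two end heights
area <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
area
```

The result should be close to the proportion of the sample that lies below Avg.pos.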
Three possibilities:
The logspline package provides a different method of estimating density curves, but it does include pnorm style functions for the result.
You could also approximate the area by feeding the x and y variables returned by the density function to the approxfun function and using the result with the integrate function. Unless you are interested in precise estimates of small tail areas (or very small intervals) then this will probably give a reasonable approximation.
Density estimates are just sums of kernels centered at the data points; one such kernel is the normal distribution. You could average the areas from pnorm (or the equivalent for other kernels), with the sd defined by the bandwidth and the mean set to each data point.
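The second and third possibilities can be sketched as follows, assuming the default Gaussian kernel of density(), for which the bandwidth bw is the standard deviation of the kernel. The simulated sample, the cut-off of 1, and the names f, area_interp and area_kernel are illustrative.

```r
set.seed(1)
X <- rnorm(100)
d <- density(X)   # Gaussian kernel by default

# Possibility 2: interpolate the estimated density and integrate it
f <- approxfun(d$x, d$y, yleft = 0, yright = 0)
area_interp <- integrate(f, lower = min(d$x), upper = 1,
                         subdivisions = 1000L)$value

# Possibility 3: average the normal CDFs centered at each data point
area_kernel <- mean(pnorm(1, mean = X, sd = d$bw))

c(interpolated = area_interp, kernel_average = area_kernel)
```

The two numbers should agree closely; small differences come from the grid that density() evaluates on.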