How to calculate the area under a frequency polygon in R?

I am struggling to find a way to calculate the area under a frequency polygon like this:
x1 <- 1:5
y1 <- c(0.2, 0.14, 0.7, 0.11, 0.1)
plot(x1, y1, type = "l", lwd = 3)
polygon(c(1, x1, 5), c(0, y1, 0), col = "red")
points(x1, y1,cex = 2,pch = 15)
segments(x1, 0, x1, y1)
So basically I need the area of the red zone from 1 to 5.
Any suggestions would be much appreciated!
Many thanks

If the bottom side of your polygon is always at 0 and the left and right sides are vertical, then the area is just the sum of the areas of a set of trapezoids:
h = x1[2:5] - x1[1:4] # heights (spacing between consecutive x values)
b1 = y1[1:4] # first bases
b2 = y1[2:5] # second bases
ta = h * (b1 + b2) / 2 # areas of the individual trapezoids
pa = sum(ta) # total area
Of course, you can generalize the indices to make it work with arrays of different lengths.
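One way to generalize the indices is sketched below. The function name `trapezoid_area` is made up for illustration; the sketch assumes `x` is sorted in increasing order and that the baseline of the polygon is 0.

```r
# Hypothetical helper (not from the original answer): area under a polyline
# with baseline y = 0, for x and y vectors of any matching length >= 2.
trapezoid_area <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  h  <- diff(x)        # trapezoid heights (spacing between consecutive x values)
  b1 <- head(y, -1)    # left bases
  b2 <- tail(y, -1)    # right bases
  sum(h * (b1 + b2) / 2)
}

x1 <- 1:5
y1 <- c(0.2, 0.14, 0.7, 0.11, 0.1)
area <- trapezoid_area(x1, y1)
print(area)
```

For the data in the question this gives an area of 1.1.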

Related

How to plot orthogonal vectors in basic R plot()?

I am trying to plot the vectors [0.7, 0.7] and [0.7, -0.7] in a way that makes it visually obvious that they are orthogonal.
Since R plots points (not vectors), the origin of the vectors will be cut off unless I adjust the x-axis to include the origin:
dat <- cbind(c(.7,.7),c(.7,-.7))
plot(dat, main = "data", xlim=c(0,.8), xlab=NA, ylab=NA, type ="n")
arrows(x0 = 0, y0 = 0, x1 = dat[1,1], y1 = dat[2,1], lwd = 5, col="purple")
arrows(x0 = 0, y0 = 0, x1 = dat[1,2], y1 = dat[2,2], lwd = 5, col="orange")
On top of that, the spacing between ticks differs between the x and y axes, which distorts the geometry of the vectors.
To show some attempt at solving this issue: I tried, unsuccessfully, plotting the axes after the plot:
plot(dat, axes = FALSE)
axis(side = 1, at = seq(0,0.8, 0.01))
axis(side = 2, at = seq(-.8,.8,0.05))
arrows(x0 = 0, y0 = 0, x1 = dat[1,1], y1 = dat[2,1], lwd = 5, col="purple")
arrows(x0 = 0, y0 = 0, x1 = dat[1,2], y1 = dat[2,2], lwd = 5, col="orange")
... not a pretty picture.
Specify the asp argument, which determines y/x aspect ratio.
dat <- cbind(c(.7,.7),c(.7,-.7))
plot(dat, main = "data", xlim=c(0,.8), xlab=NA, ylab=NA, type ="n", asp=1)
arrows(x0 = 0, y0 = 0, x1 = dat[1,1], y1 = dat[2,1], lwd = 5, col="purple")
arrows(x0 = 0, y0 = 0, x1 = dat[1,2], y1 = dat[2,2], lwd = 5, col="orange")
You can find details on this argument from:
?plot.window
asp: If asp is a finite positive value then the window is set up so
that one data unit in the x direction is equal in length to asp * one
data unit in the y direction.
Note that in this case, par("usr") is no longer determined by, e.g.,
par("xaxs"), but rather by asp and the device's aspect ratio. (See
what happens if you interactively resize the plot device after running
the example below!)
The special case asp == 1 produces plots where distances between
points are represented accurately on screen. Values with asp > 1 can
be used to produce more accurate maps when using latitude and
longitude.
Try setting limits:
xlim = c(-.1, 1)
ylim = c(-.8, .8)
This will draw the full extent of the space your vectors span. If your goal is to constrain the proportions as well, you can widen the limits so that the vectors do not fill the whole space but both axes cover the same range:
xlim = c(-1, 1)
ylim = c(-1, 1)
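As a small sketch of how such limits could be derived from the data instead of typed by hand (the 10% margin and the variable names `dot`, `r` and `lims` are illustrative choices, not part of the answers above), the snippet below also checks numerically that the two vectors are orthogonal:

```r
dat <- cbind(c(.7, .7), c(.7, -.7))  # one vector per column, as in the question

# Dot product of the two columns; zero means the vectors are orthogonal.
dot <- sum(dat[, 1] * dat[, 2])

# Symmetric limits (an illustrative choice) that contain the origin and both
# arrow tips, with a 10% margin.
r <- max(abs(dat))
lims <- c(-r, r) * 1.1

print(dot)
print(lims)
```

Passing `xlim = lims, ylim = lims, asp = 1` to `plot()` should then give both axes the same range and the same scale.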

How to fill area between curves with Plots.jl?

Suppose I have a curve y, and two other curves u and l in the form of vectors. How to plot:
plot(y, lab="estimate")
plot!(y-l, lab="lower bound")
plot!(y+u, lab="upper bound")
That is, an asymmetric confidence interval? I know how to plot the symmetric case with the option ribbon as explained here.
The current answers are NOT correct. Here are two ways that are correct (as of v1.10.1 of Plots.jl):
Method 1: Using fillrange
plot(x, l, fillrange = u, fillalpha = 0.35, c = 1, label = "Confidence band")
Method 2: Using ribbon
plot(x, (l .+ u) ./ 2, ribbon = (u .- l) ./ 2, fillalpha = 0.35, c = 1, label = "Confidence band")
(Here, l and u denote the "lower" and "upper" y values, respectively, and x denotes their common x values.) The key difference between these two methods is that fillrange shades the region between l and u, while the ribbon argument is a radius, i.e. half the width of the ribbon (or in other words, the vertical deviation from the midpoints).
Example using fillrange:
x = collect(range(0, 2, length= 100))
y1 = exp.(x)
y2 = exp.(1.3 .* x)
plot(x, y1, fillrange = y2, fillalpha = 0.35, c = 1, label = "Confidence band", legend = :topleft)
Let's scatter y1 and y2 on top of the plot, just to make sure we're filling in the right region.
plot!(x,y1, line = :scatter, msw = 0, ms = 2.5, label = "Lower bound")
plot!(x,y2, line = :scatter, msw = 0, ms = 2.5, label = "Upper bound")
Result:
Example using ribbon:
mid = (y1 .+ y2) ./ 2 #the midpoints (usually representing mean values)
w = (y2 .- y1) ./ 2 #the vertical deviation around the means
plot(x, mid, ribbon = w , fillalpha = 0.35, c = 1, lw = 2, legend = :topleft, label = "Mean")
plot!(x,y1, line = :scatter, msw = 0, ms = 2.5, label = "Lower bound")
plot!(x,y2, line = :scatter, msw = 0, ms = 2.5, label = "Upper bound")
(Here, x, y1, and y2 are the same as before.)
Result:
Notice that the labels for ribbon and fillrange are different in the legends: the former labels the midpoints/means, while the latter labels the shaded region itself.
Some additional comments:
The OP's answer of plot(y, ribbon=(l,u), lab="estimate") is not correct (at least for Plots v1.10.1). I realize this thread is over 3 years old, so perhaps it worked in the earlier version of Plots.jl that the OP was using at the time.
Similar to one of the answers given,
plot(x, [mid mid], fillrange=[mid .- w, mid .+ w], fillalpha=0.35, c = [1 4], label = ["Band 1" "Band 2"], legend = :topleft, dpi = 80)
will work, but this actually creates TWO ribbons (and hence, two icons in the legend) which may or may not be what the OP was looking for. To illustrate the point:
It turns out that the option ribbon accepts both lower and upper bounds:
plot(y, ribbon=(l,u), lab="estimate")
Notice that by passing l and u in the ribbon option, the filled area will correspond to the region between y-l and y+u. In other words, l and u should be the "deviations" from the mean curve y.
Something like this? (seen here).
plot([y y], fillrange=[y.-l y.+u], fillalpha=0.3, c=:orange)
plot!(y)
The fillrange solution in @leonidas's answer might add an extra boundary line (at least in Plots v1.35). To remove this line, a workaround is to specify linealpha = 0, that is,
plot(x, l, fillrange = u, fillalpha = 0.35, c = 1, label = "Confidence band", linealpha = 0)

ggplot - altering the height of each overlapping variable on a density plot

I'm quite new to R and ggplot2, so apologies if this is an obvious question, but I've searched around and can't find anything about this exact issue.
I have a ggplot density plot for 6 variables on the same plot, overlapping. What I am trying to do is to change the maximum height of each variable to be a certain value without changing the distribution. e.g. :
variable_1 - 1, //on Y axis
variable_2 - 0.5 etc.
This way I can get an idea of the distribution (across the x axis) whilst also showing a second independent parameter through the y axis
Is this possible at all?
Yes, this is possible, although I wouldn't recommend it. What you can do is divide the distribution by its maximum and then multiply by the target height.
# some example data:
x = seq(-5, 5, .1)
y1 = dnorm(x)
y2 = dnorm(x, .5, .2)
Y = cbind(y1, y2)
matplot(x, Y, type = 'l', bty = 'n', lty = 1, las = 1)
# now I want the red line to be max 1
# and the black line to be max .5
y1 = .5*y1 / max(y1)
y2 = 1*y2 / max(y2)
Y = cbind(y1, y2)
matplot(x, Y, type = 'l', bty = 'n', lty = 1, las = 1)
The important part here is that I used two different transformations for y1 and y2. The consequence is that in the second figure the distributions can no longer be compared with each other. You can avoid this by applying the same transformation to all distributions.
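The divide-by-maximum step can be wrapped in a small helper. This is only a sketch: the name `rescale_to_height` is made up here, and it assumes the curve has a positive maximum.

```r
# Hypothetical helper: rescale a curve so that its peak equals `height`.
rescale_to_height <- function(y, height) height * y / max(y)

x  <- seq(-5, 5, .1)
y1 <- rescale_to_height(dnorm(x), 0.5)         # peak at 0.5
y2 <- rescale_to_height(dnorm(x, .5, .2), 1)   # peak at 1

print(c(max(y1), max(y2)))
```

Because each curve is multiplied by a constant, its shape along the x axis is unchanged; only the relative heights of the curves are altered.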

How to draw the curves in an energy diagram in R?

I wrote following R script:
#energy diagram
x <- c(0.1, 0.3, 0.5, 0.7, 0.9 ) #chosen randomly, reaction axis
y <- c(-5.057920, -5.057859, -5.057887,-5.057674, -5.057919 ) #energy of the educt, intermediate, transition states and product
plot(x,y, type="p",
xlim=c(0,1),
ylim=c(-5.058,-5.0575),
xlab="reaction axis",
ylab=expression(paste(E[el] ," / ",10^6," ",kJ/mol)),
xaxt="n" #hide x-axis
)
#h- and v-lines, so i can draw curves by hand
abline(v=seq(0,1,0.1),h=seq(-5.0600,-5.0500,0.00005),col="black",lty=1,lwd=1)
abline(h=c(-5.057920, -5.057859, -5.057887,-5.057674), col="blue", lty=1,lwd=0.7)
Is it possible to draw a curve through the points that would look like an energy diagram? An example of an energy diagram is here:
A lot could be done to streamline / vectorize this code, but for a smallish diagram this works pretty well:
# get that data
x <- c(0.1, 0.3, 0.5, 0.7, 0.9 ) # reaction axis
y <- c(-5.057920, -5.057859, -5.057887,-5.057674, -5.057919 ) # energies
I'm going to make a little Bezier curve to connect each point to the next. This way we can make sure the smooth line passes through the data, not just close to it. I'll give each point a single 'control point' to define the slope. By using the same y-value for a point and its control point, the slope at the point will be 0. I'll call the offset between the point and the control point delta. We'll start with one point-pair:
library(Hmisc)
delta = 0.15
bezx = c(0.1, 0.1 + delta, 0.3 - delta, 0.3)
bezy = rep(y[1:2], each = 2)
plot(bezx, bezy, type = 'b', col = "gray80")
lines(bezier(bezx, bezy), lwd = 2, col = "firebrick4")
Here I plotted the points and control points in gray, and the smooth line in red so we can see what's going on.
It looks promising, let's turn it into a function that we can apply to each pair of points:
bezf = function(x1, x2, y1, y2, delta = 0.15) {
bezier(x = c(x1, x1 + delta, x2 - delta, x2), y = c(y1, y1, y2, y2))
}
You can play with the delta parameter; I think 0.1 looks pretty good.
plot(x, y, xlab = "Reaction coordinate", ylab = "E", axes = F)
box(bty = "L")
axis(side = 2)
for(i in 1:(length(x) - 1)) {
lines(bezf(x1 = x[i], x2 = x[i + 1], y1 = y[i], y2 = y[i + 1], delta = 0.1))
}
You can of course tweak the plot, add labels, and ablines as in your original. (Use my for loop with the lines command to draw only the smoothed lines.) I left the points on to show that we are passing through them, not just getting close.
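If Hmisc is not available, the cubic Bezier segments used above can also be evaluated directly in base R from the Bernstein form. The function `cubic_bezier` below is a hypothetical stand-in written for this illustration, not part of Hmisc, and it handles only the four-control-point case used here.

```r
# Hypothetical base-R stand-in for Hmisc::bezier(), cubic case only.
# px, py: the four control points; n: number of points to evaluate.
cubic_bezier <- function(px, py, n = 100) {
  stopifnot(length(px) == 4, length(py) == 4)
  t <- seq(0, 1, length.out = n)
  # Bernstein basis polynomials of degree 3, one column per control point
  B <- cbind((1 - t)^3, 3 * (1 - t)^2 * t, 3 * (1 - t) * t^2, t^3)
  list(x = as.vector(B %*% px), y = as.vector(B %*% py))
}

# First segment of the diagram, with the same control-point layout as bezf()
delta <- 0.1
seg <- cubic_bezier(px = c(0.1, 0.1 + delta, 0.3 - delta, 0.3),
                    py = c(-5.057920, -5.057920, -5.057859, -5.057859),
                    n = 101)
```

The result is a list with `x` and `y` components, so it can be passed to `lines()` in the same way as the output of `bezier()`.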
I prefer plotting in ggplot2, if you do too you'll need to extract the data into a data.frame:
bezlist = list()
for (i in 1:(length(x) - 1)) {
bezlist[[i]] = bezf(x1 = x[i], x2 = x[i + 1], y1 = y[i], y2 = y[i + 1], delta = 0.1)
}
xx = unlist(lapply(bezlist, FUN = '[', 'x'))
yy = unlist(lapply(bezlist, FUN = '[', 'y'))
bezdat = data.frame(react = xx, E = yy)
library(ggplot2)
ggplot(bezdat, aes(x = react, y = E)) +
geom_line() +
labs(x = "Reaction coordinate")
You could use a spline fit. Define some points along the energy diagram, and then fit to them using a spline function. The more points you provide, the better your fit will be. You can check out the smooth.spline function in the stats package for one implementation of a spline fit.
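As a rough sketch of the spline idea, the snippet below uses `splinefun()` from the stats package rather than `smooth.spline()`, because `splinefun()` interpolates, so the curve passes exactly through the five points. The choice of `method = "natural"` and of 200 evaluation points is arbitrary. Unlike the Bezier approach, the slope at the data points is generally not zero, and the curve may overshoot between them.

```r
x <- c(0.1, 0.3, 0.5, 0.7, 0.9)
y <- c(-5.057920, -5.057859, -5.057887, -5.057674, -5.057919)

# Interpolating cubic spline: passes exactly through every (x, y) pair.
f <- splinefun(x, y, method = "natural")

xs <- seq(min(x), max(x), length.out = 200)
ys <- f(xs)  # smooth curve, e.g. to draw with lines(xs, ys)
```

Adding more intermediate points along the intended path gives more control over the shape of the resulting curve.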

R - Plotting two bivariate normals in 3d and their contours respectively

I have been playing around with the MASS package and can plot the two bivariate normals simply using image and par(new=TRUE), for example:
# let's first simulate a bivariate normal sample
library(MASS)
bivn <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1, .5, .5, 1), 2))
bivn2 <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1.5, 1.5, 1.5, 1.5), 2))
# now we do a kernel density estimate
bivn.kde <- kde2d(bivn[,1], bivn[,2], n = 50)
bivn.kde2 <- kde2d(bivn2[,1], bivn2[,2], n = 50)
# fancy perspective
persp(bivn.kde, phi = 45, theta = 30, shade = .1, border = NA)
par(new=TRUE)
persp(bivn.kde2, phi = 45, theta = 30, shade = .1, border = NA)
Which doesn't look very good, I guess I have to just play around with the axis and stuff.
But if I try a similar approach with the contour the plots do not overlap. They are simply replaced:
# fancy contour with image
image(bivn.kde); contour(bivn.kde, add = T)
par(new=TRUE)
image(bivn.kde2); contour(bivn.kde2, add = T)
Is this the best approach to what I want or am I just making it hard on myself? Any suggestions are welcome. Thank you!
Perhaps you can use the rgl library. It allows you to create interactive 3d plots.
require(rgl)
col1 <- rainbow(length(bivn.kde$z))[rank(bivn.kde$z)]
col2 <- heat.colors(length(bivn.kde2$z))[rank(bivn.kde2$z)]
persp3d(x=bivn.kde, col = col1)
with(bivn.kde2, surface3d(x,y,z, color = col2))
If you want to plot the difference between the two surfaces, then you can do something like the code below.
res <- list(x = bivn.kde$x, y = bivn.kde$y, z = bivn.kde$z - bivn.kde2$z)
col3 <- heat.colors(length(res$z))[rank(res$z)]
persp3d(res, col = col3)
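Subtracting the two z matrices cell by cell is only meaningful when both density estimates were evaluated on the same grid. A minimal sketch of one way to arrange that, using the `lims` argument of `kde2d()`, is shown below. The seed and the names `lims`, `kde_a` and `kde_b` are choices made for this illustration.

```r
library(MASS)
set.seed(1)  # arbitrary seed so the sketch is reproducible

bivn  <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1, .5, .5, 1), 2))
bivn2 <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1.5, 1.5, 1.5, 1.5), 2))

# Shared x/y limits so both estimates are evaluated on the same grid
lims <- c(range(bivn[, 1], bivn2[, 1]), range(bivn[, 2], bivn2[, 2]))

kde_a <- kde2d(bivn[, 1],  bivn[, 2],  n = 50, lims = lims)
kde_b <- kde2d(bivn2[, 1], bivn2[, 2], n = 50, lims = lims)

# Now the z matrices line up cell by cell
res <- list(x = kde_a$x, y = kde_a$y, z = kde_a$z - kde_b$z)
```

The resulting `res` list has the same `x`, `y`, `z` layout as a `kde2d()` result, so it can be passed to `persp3d()`, `persp()` or `image()`.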
