This question already has answers here:
Shading a kernel density plot between two points.
(5 answers)
Closed 6 years ago.
I'm trying to shade an area under a curve in R. I can't quite get it right and I'm not sure why. The curve is defined by
# Define the Mean and Stdev
mean=1152
sd=84
# Create x and y to be plotted
# x is a sequence of numbers shifted to the mean with the width of sd.
# The sequence x includes enough values to show +/-3.5 standard deviations in the data set.
# y is a normal distribution for x
x <- seq(-3.5,3.5,length=100)*sd + mean
y <- dnorm(x,mean,sd)
The plot is
# Plot x vs. y as a line graph
plot(x, y, type="l")
The code I'm using to try to color under the curve where x >= 1250 is
polygon(c( x[x>=1250], max(x) ), c(y[x==max(x)], y[x>=1250] ), col="red")
but here's the result I'm getting
How can I correctly color the portion under the curve where x >= 1250
You need to follow the x,y points of the curve with the polygon, then return along the x-axis (from the maximum x value to the point at x=1250, y=0) to complete the shape. The final vertical edge is drawn automatically, because polygon closes the shape by returning to its start point.
polygon(c(x[x>=1250], max(x), 1250), c(y[x>=1250], 0, 0), col="red")
If, rather than dropping the shading all the way down to the x-axis, you prefer to have it at the level of the curve, then you can use the following instead. Although, in the example given, the curve drops almost to the x-axis, so its hard to see the difference visually.
polygon(c(x[x>=1250], 1250), c(y[x>=1250], y[x==max(x)]), col="red")
Related
I'm trying to draw curves and I need to have it crossing the x and y axis with where it crossing shown clearly on the graph.
Im using the code
curve(x^2-8*x+16, -2, 10)
this is making the curve I want but not showing the curve clearly crossing the axis
I have used Desmos to show what im aiming to create.
I am new to R, hopefully someone can help me. I am trying to figure out how to plot a very simple graph (with plot()):
y-axis should be ay <- c(-1,1) and x-axis should be ax <- c(0:6). I have a vector v1<-c(0,-0.1,-0.3,-0.6,-0.2,-0.4, 0.2), that gives the slopes for each segment on the x-axis (i.e. between 0 and 1, the slope is -0.1, moving from 1 to 2, the slope is -0.3, and so on.).
I simply need to draw a line that goes from 0 to 6 on the x-axis with slopes between the segments given by v2.
Additionally, there should be a separate straight line with slope -0.5 starting from 0, i.e. abline(0,-0.5) in the same figure.
This should be something extremely simple, but I just can't get it right. Thanks in advance!
How about this one?
x <- 0:6
v1 <- c(0,-0.1,-0.3,-0.6,-0.2,-0.4, 0.2)
Note that the slopes are given by the difference in y-values over the difference in x-values. Since the difference in x-values are always 1 for each successive entry in v1, it essentially contains the difference in y-values. Taking the first entry in v1 to be y(x=0) (see note below), cumsum(v1) gives you the y-values you need to give to plot.
y <- cumsum(v1)
plot(x, y, type="l")
Note that there are seven slope-values in v1 but only six differences from 0 to 6. If you count, that is 0-1, 1-2, 2-3, 3-4, 4-5, and 5-6, which gives six differences. If the first entry is supposed to be a slope, then the range of x needs to be reconsidered.
I have not fully understood your problem. In my opinion the range on the x axis should be (0, 6).
Anyway, see the code below. Hope it can help you.
v1 <- c(-0.1,-0.3,-0.6,-0.2,-0.4, 0.2)
plot(0:6, 1+c(0,cumsum(v1)), type="o", ylim=c(-1,1))
abline(v=0:6, h=seq(-1,1,by=0.1), col="gray", lty=3)
I was wondering how I can efficiently (using short R code) fill a curve with points that can fill up the area under my curve?
I have tried something without success, here is my R code:
data = rnorm(1000) ## random data points to fill the curve
curve(dnorm(x), -4, 4) ## curve to be filled by "data" above
points(data) ## plotting the points to fill the curve
Here's a method that uses interpolation to ensure that the plotted points won't exceed the height of the curve (although, if you want the actual point markers to not stick out above the curve, you'll need to set the threshold slightly below the height of the curve):
# Curve to be filled
c.pts = as.data.frame(curve(dnorm(x), -4, 4))
# Generate 1000 random points in the same x-interval and with y value between
# zero and the maximum y-value of the curve
set.seed(2)
pts = data.frame(x=runif(1000,-4,4), y=runif(1000,0,max(c.pts$y)))
# Using interpolation, keep only those points whose y-value is less than y(x)
pts = pts[pts$y < approx(c.pts$x,c.pts$y,xout=pts$x)$y, ]
# Plot the points
points(pts, pch=16, col="red", cex=0.7)
A method for plotting exactly a desired number of points under a curve
Responding to #d.b's comment, here's a way to get exactly a desired number of points plotted under a curve:
First, let's figure out how many random points we need to generate over the entire plot region in order to get (roughly) a target number of points under the curve. We do this as follows:
Calculate the area under the curve as a fraction of the area of the rectangle bounded by zero and the maximum height of the curve on the vertical axis, and by the width of the curve on the horizontal axis.
The number of random points we need to generate is the target number of points, divided by the area ratio calculated above.
# Area ratio
aa = sum(c.pts$y*median(diff(c.pts$x)))/(diff(c(-4,4))*max(c.pts$y))
# Target number of points under curve
n.target = 1000
# Number of random points to generate
n = ceiling(n.target/aa)
But we need more points than this to ensure we get at least n.target, because random variation will result in fewer than n.target points about half the time, once we limit the plotted points to those below the curve. So we'll add an excess.factor in order to generate more points under the curve than we need, then we'll just randomly select n.target of those points to plot. Here's a function that takes care of the entire process for a general curve.
# Plot a specified number of points under a curve
pts.under.curve = function(data, n.target=1000, excess.factor=1.5) {
# Area under curve as fraction of area of plot region
aa = sum(data$y*median(diff(data$x)))/(diff(range(data$x))*max(data$y))
# Number of random points to generate
n = excess.factor*ceiling(n.target/aa)
# Generate n random points in x-range of the data and with y value between
# zero and the maximum y-value of the curve
pts = data.frame(x=runif(n,min(data$x),max(data$x)), y=runif(n,0,max(data$y)))
# Using interpolation, keep only those points whose y-value is less than y(x)
pts = pts[pts$y < approx(data$x,data$y,xout=pts$x)$y, ]
# Randomly select only n.target points
pts = pts[sample(1:nrow(pts), n.target), ]
# Plot the points
points(pts, pch=16, col="red", cex=0.7)
}
Let's run the function for the original curve:
c.pts = as.data.frame(curve(dnorm(x), -4, 4))
pts.under.curve(c.pts)
Now let's test it with a different distribution:
# Curve to be filled
c.pts = as.data.frame(curve(df(x, df1=100, df2=20),0,5,n=1001))
pts.under.curve(c.pts, n.target=200)
n_points = 10000 #A large number
#Store curve in a variable and plot
cc = curve(dnorm(x), -4, 4, n = n_points)
#Generate 1000 random points
p = data.frame(x = seq(-4,4,length.out = n_points), y = rnorm(n = n_points))
#OR p = data.frame(x = runif(n_points,-4,4), y = rnorm(n = n_points))
#Find out the index of values in cc$x closest to p$x
p$ind = findInterval(p$x, cc$x)
#Only retain those points within the curve whose p$y are smaller than cc$y
p2 = p[p$y >= 0 & p$y < cc$y[p$ind],] #may need p[p$y < 0.90 * cc$y[p$ind],] or something
#Plot points
points(p2$x, p2$y)
I have an algorithm that uses an x,y plot of sorted y data to produce an ogive.
I then derive the area under the curve to derive %'s.
I'd like to do something similar using kernel density estimation. I like how the upper/lower bounds are smoothed out using kernel densities (i.e. the min and max will extend slightly beyond my hard coded input).
Either way... I was wondering if there is a way to treat an ogive as a type of cumulative distribution function and/or use kernel density estimation to derive a cumulative distribution function given y data?
I apologize if this is a confusing question. I know there is a way to derive a cumulative frequency graph (i.e. ogive). However, I can't determine how to derive a % given this cumulative frequency graph.
What I don't want is an ecdf. I know how to do that, and I am not quite trying to capture an ecdf. But, rather integration of an ogive given two intervals.
I'm not exactly sure what you have in mind, but here's a way to calculate the area under the curve for a kernel density estimate (or more generally for any case where you have the y values at equally spaced x-values (though you can, of course, generalize to variable x intervals as well)):
library(zoo)
# Kernel density estimate
# Set n to higher value to get a finer grid
set.seed(67839)
dens = density(c(rnorm(500,5,2),rnorm(200,20,3)), n=2^5)
# How to extract the x and y values of the density estimate
#dens$y
#dens$x
# x interval
dx = median(diff(dens$x))
# mean height for each pair of y values
h = rollmean(dens$y, 2)
# Area under curve
sum(h*dx) # 1.000943
# Cumulative area
# cumsum(h*dx)
# Plot density, showing points at which density is calculated
plot(dens)
abline(v=dens$x, col="#FF000060", lty="11")
# Plot cumulative area under curve, showing mid-point of each x-interval
plot(dens$x[-length(dens$x)] + 0.5*dx, cumsum(h*dx), type="l")
abline(v=dens$x[-length(dens$x)] + 0.5*dx, col="#FF000060", lty="11")
UPDATE to include ecdf function
To address your comments, look at the two plots below. The first is the empirical cumulative distribution function (ECDF) of the mixture of normal distributions that I used above. Note that the plot of this data looks the same below as it does above. The second is a plot of the ECDF of a plain vanilla normal distribution, mean=0, sd=1.
set.seed(67839)
x = c(rnorm(500,5,2),rnorm(200,20,3))
plot(ecdf(x), do.points=FALSE)
plot(ecdf(rnorm(1000)))
I have the following code, in R.
x = c(rep(2,10),rep(4,10))
y1 = c(5.1,3,4.2,4.1,4.8,4.0,5,4.15,3,4.5)
y2 = c(9.1,8,9.2,8.2,7,9.5,8.8,9.3,10,10.4)
y = c(y1,y2)
plot(x,y,pch=16,cex=0.9,xlim=c(0,6),ylim=c(0,13))
This code produces a plot with two bands of dots. I've overlayed normal curves sideways on those bands using powerpoint. How can I do this in R (drawing the sideways normal curves), using the actual means and sd values? NOTE: I repeat, the normal curves are not part of the plot. The code above just produces the raw plot.
First, calculate mean and standard deviation for y1 and y2.
m1<-mean(y1)
s1<-sd(y1)
m2<-mean(y2)
s2<-sd(y2)
Then made two data frame (for convenience) that contains y values as sequence of numbers (wider than actual y1 and y2 values). Then calculated density values for x using dnorm() and calculated mean and standard deviation values. Then added 2 or 4 to shift values to desired position.
df1<-data.frame(yval=seq(1,7,0.1),xval=(dnorm(seq(1,7,0.1),m1,s1)+2))
df2<-data.frame(yval=seq(6,12,0.1),xval=(dnorm(seq(6,12,0.1),m2,s2)+4))
Added density lines with function lines().
plot(x,y,pch=16,cex=0.9,xlim=c(0,6),ylim=c(0,13))
with(df1,lines(xval,yval))
with(df2,lines(xval,yval))