Statistical distribution plot with shaded rejection areas - r

I found this R code online:
stdDev <- 0.75;
x <- seq(-5,5,by=0.01)
y <- dnorm(x,sd=stdDev)
right <- qnorm(0.95,sd=stdDev)
plot(x,y,type="l",xaxt="n",ylab="p",
xlab=expression(paste('Assumed Distribution of ',bar(x))),
axes=FALSE,ylim=c(0,max(y)*1.05),xlim=c(min(x),max(x)),
frame.plot=FALSE)
axis(1,at=c(-5,right,0,5),
pos = c(0,0),
labels=c(expression(' '),expression(bar(x)[cr]),expression(mu[0]),expression('')))
axis(2)
xReject <- seq(right,5,by=0.01)
yReject <- dnorm(xReject,sd=stdDev)
polygon(c(xReject,xReject[length(xReject)],xReject[1]),
c(yReject,0, 0), col='red')
It is doing what I need, which is plotting the normal distribution, and shading a right rejection area according to some number (0.95). What I want to ask is:
How can I change this code to shade a two sided rejection area?
How do I change it for a left side one sided area?
And assume that I want a chi square or F distribution instead, is it enough to just change the dnorm & qnorm commands accordingly?
Another question: In this plot, the plot itself is higher than the y-axis. How do I fix it that the axis matches the height of the plot?
Thank you!

You can start with a polygon covering the whole area under the curve and removing the part that is not rejected:
## Calculate the 5th percentile
left <- qnorm(0.05,sd=stdDev)
## x and y for the whole area
xReject <- c(seq(-5,5,by=0.01))
yReject <- dnorm(xReject,sd=stdDev)
## set y = 0 for the area that is not rejected
yReject[xReject > left & xReject < right] <- 0
## Plot the red areas
polygon(c(xReject,xReject[length(xReject)],xReject[1]),
c(yReject,0, 0), col='red')
For a left-sided rejection area, proceed as before, but set the density to zero everywhere to the right of the cutoff:
yReject[xReject > left] <- 0
Almost. For the chi-squared distribution, for example, you have to supply df (the degrees of freedom) instead of sd. The xlim also has to change, because the distribution is defined only for non-negative values. Apart from that, the code stays the same.
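As a rough sketch of that substitution, the snippet below shades a right-sided 5% rejection area under a chi-squared density. The 5 degrees of freedom, the x-range of 0 to 20, and the variable name dof are illustrative assumptions, not values from the question.

```r
# Sketch: right-tail rejection area for a chi-squared distribution.
# dof = 5 and the x-range 0..20 are assumptions chosen for illustration.
dof <- 5
x <- seq(0, 20, by = 0.01)
y <- dchisq(x, df = dof)            # density takes df instead of sd
right <- qchisq(0.95, df = dof)     # critical value for a 5% right tail

plot(x, y, type = "l", ylab = "p", xlab = "x",
     xlim = c(0, 20), ylim = c(0, max(y) * 1.05))

# Shade everything to the right of the critical value
xReject <- seq(right, 20, by = 0.01)
yReject <- dchisq(xReject, df = dof)
polygon(c(xReject, xReject[length(xReject)], xReject[1]),
        c(yReject, 0, 0), col = "red")
```

For an F distribution the same pattern should apply with df() and qf(), which take df1 and df2.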
The line axis(2) draws the y-axis. You can give some extra arguments to have it the way you want. You can try for example:
s <- seq(0,0.55,0.05)
axis(2, at = s, labels = s)
Hope it helps,
alex

Take the polygon calls which shade the right-side rejection area and repeat those lines, substituting the coordinates of the left-side area.
I think this will do it:
left <- qnorm(0.05,sd=stdDev)
xLeject <- seq(left,-5,by=-0.01)
yLeject <- dnorm(xLeject,sd=stdDev)
polygon(c(xLeject,xLeject[length(xLeject)],xLeject[1]),
c(yLeject,0, 0), col='red')
As for the extent of the graph, see plot(..., ylim=c(lower,upper)).

Related

In ggplot, how to draw a circle/disk with a line that divides its area according to a given ratio and colored points inside?

I want to visualize proportions using points inside a circle. For example, let's say that I have 100 points that I wish to scatter (somewhat randomly jittered) in a circle.
Next, I want to use this diagram to represent the proportions of people who voted Biden/Harris in 2020 US presidential elections, in each state.
Example #1 -- Michigan
Biden got 50.62% of Michigan's votes. I'm going to draw a horizontal diameter that splits the circle to two halves, and then color the points under the diameter in blue (Democrats' color).
Example #2 -- Wyoming
Unlike Michigan, in Wyoming Biden got only 26.55% of the votes, which is approximately a quarter of the vote. In this case I'd draw a horizontal chord that divides the circle such that the disk's area under the chord is 25% of the entire disk area. Then I'll color the respective points in that area in blue. Since I have 100 points in total, 25 points represent the 25% who voted Biden in Wyoming.
My question: How can I do this with ggplot? I researched this issue, and there's a lot of geometry going on here. First, the kind of area I'm talking about is called a "circular segment". Second, there are many formulas to calculate its area, if we know some other parameters about the shape (such as the radius length, etc.). See this nice demo.
However, my goal isn't to solve geometry problems, but just to represent proportions in a very specific way:
draw a circle
sprinkle X number of points inside
draw a (real or invisible) horizontal line that divides the circle/disk area according to a given proportion
ensure that the points are arranged respective to the split. That is, if we want to represent a 30%-70% split, then have 30% of the points under the line that divides the disk.
color the points under the line.
I understand that this is somewhat an exotic visualization, but I'll be thankful for any help with this.
EDIT
I've found a reference to a JavaScript package that does something very similar to what I'm asking.
I took a crack at this for fun. There's a lot more that could be done. I agree that this is not a great way to visualize proportions, but if it's engaging your audience ...
Formulas for determining appropriate heights are taken from Wikipedia. In particular we need the formulas
a/A = (theta - sin(theta))/(2*pi)
h = 1-cos(theta/2)
where a is the area of the segment; A is the whole area of the circle; theta is the angle described by the arc that defines the segment (see Wikipedia for pictures); and h is the height of the segment.
Machinery for finding heights.
afun <- function(x) (x-sin(x))/(2*pi)
## curve(afun, from=0, to = 2*pi)
find_a <- function(a) {
uniroot(
function(x) afun(x) -a,
interval=c(0, 2*pi))$root
}
find_h <- function(a) {
1- cos(find_a(a)/2)
}
vfind_h <- Vectorize(find_h)
## find_a(0.5)
## find_h(0.5)
## curve(vfind_h(x), from = 0, to= 1)
Set up a circle.
dd <- data.frame(x=0,y=0,r=1)
library(ggforce)
library(ggplot2); theme_set(theme_void())
gg0 <- ggplot(dd) + geom_circle(aes(x0=x,y0=y,r=r)) + coord_fixed()
Finish: scatter the points and colour them by segment.
props <- c(0.2,0.5,0.3) ## proportions
n <- 100 ## number of points to scatter
cprop <- cumsum(props)[-length(props)]
h <- vfind_h(cprop)
set.seed(101)
r <- runif(n)
th <- runif(n, 0, 2 * pi)
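## sqrt(r) makes the points uniform over the disk's area;
## th is already an angle in [0, 2*pi), so it is used as-is
dd2 <- data.frame(x = sqrt(r) * cos(th),
y = sqrt(r) * sin(th))
## cut() sorts its breaks, so the chord heights 1-h split y into bands
dd2$g <- cut(dd2$y, c(1, 1-h, -1))
gg0 + geom_point(data=dd2, aes(x, y, colour = g), size=3)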
There are a bunch of tweaks that would make this better: meaningful names for the categories; reversing the axis order to match the plot; maybe adding segments delimiting the sections, or (more work) polygons so you can shade the sections.
You should definitely check this for mistakes — e.g. there are places where I may have used a set of values where I should have used their first differences, or vice versa (values vs cumulative sum). But this should get you started.
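One way to do that check is a quick simulation: scatter many uniform points in the unit disk and compare the share that lands above the chord with the target proportion. This is only a sketch; seg_frac and seg_height are hypothetical names that restate the two formulas above, and the 25% target and the sample size are arbitrary choices.

```r
# Area fraction of a circular segment as a function of the arc angle theta
seg_frac <- function(theta) (theta - sin(theta)) / (2 * pi)

# Height of the segment (unit circle) whose area fraction is a, 0 < a < 1
seg_height <- function(a) {
  theta <- uniroot(function(t) seg_frac(t) - a,
                   interval = c(0, 2 * pi), tol = 1e-10)$root
  1 - cos(theta / 2)
}

set.seed(1)
n <- 200000                      # large sample, assumption for a stable estimate
r <- sqrt(runif(n))              # sqrt gives a uniform density over the disk
th <- runif(n, 0, 2 * pi)
y <- r * sin(th)

target <- 0.25                   # e.g. roughly Wyoming's share
h <- seg_height(target)
observed <- mean(y > 1 - h)      # share of points in the top segment
print(c(target = target, observed = observed))
```

If the formulas are applied correctly, observed should be close to target, with the gap shrinking as n grows.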

Selecting overlapping points on a plot

I have two matrices which are built as follows
X1=cbind(V1,V2,ID)
X2=cbind(V1,V2,ID)
X3=rbind(X1,X2)
ID takes only the values "red" and "blue"
when I plot X1 and X2 I have the following plot
I want to select the data points that are within 1 unit (Euclidean distance) of a point of the other colour, that is, keep only the red points that overlap or nearly overlap a blue point, or vice versa.
Red overlapping red and blue overlapping blue is not interesting for me.
Thanks a lot for your assistance.
You definitely need to provide a reproducible example to get the best answer for this one; however, I think the script below will serve the purpose:
library(spatstat)
# setting seeds
set.seed(222)
# two different point patterns
X <- runifpoint(15)
Y <- runifpoint(20)
plot(X, pch=19, main="")
plot(Y, col="red", pch=19, add=TRUE)
# nncross can return the nearest-neighbour index ("which") and distance ("dist")
#N.which <- nncross(X,Y, what="which")
#N.dist <- nncross(X,Y, what="dist")
# keep the points of X whose nearest neighbour in Y is closer than 0.1 (you may change 0.1)
out <- X[nncross(X,Y, what="dist") < 0.1]
plot(out, col="blue", pch=19, add=TRUE)
In the above plot, black points represent X and red points represent Y. Blue points are those points of X that lie within 0.1 units of a point of Y. This distance can be changed as needed. For more details, see the spatstat documentation for nncross, which computes distances between two different point patterns.
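If spatstat is not available, the same idea can be sketched in base R with a cross-distance matrix. The data below are simulated stand-ins, since the question gives no reproducible example; the names red, blue, and threshold are assumptions.

```r
# Simulated stand-ins for the two groups (assumption: two numeric columns each)
set.seed(42)
red  <- cbind(V1 = runif(30, 0, 10), V2 = runif(30, 0, 10))
blue <- cbind(V1 = runif(30, 0, 10), V2 = runif(30, 0, 10))
threshold <- 1   # Euclidean distance cutoff from the question

# d[i, j] is the distance from red point i to blue point j
d <- sqrt(outer(red[, 1], blue[, 1], "-")^2 +
          outer(red[, 2], blue[, 2], "-")^2)

near_red  <- rowSums(d < threshold) > 0   # red points close to any blue point
near_blue <- colSums(d < threshold) > 0   # blue points close to any red point

# Grey for all points, colour only for the cross-colour near neighbours
plot(rbind(red, blue), col = "grey", pch = 19)
points(red[near_red, , drop = FALSE], col = "red", pch = 19)
points(blue[near_blue, , drop = FALSE], col = "blue", pch = 19)
```

Because only red-to-blue distances are computed, red-red and blue-blue proximity is ignored by construction. The full matrix needs memory proportional to the product of the two group sizes, so this suits small to moderate data sets.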

R: simple plot with varying slope by value of x axis

I am new to R, hopefully someone can help me. I am trying to figure out how to plot a very simple graph (with plot()):
y-axis should be ay <- c(-1,1) and x-axis should be ax <- c(0:6). I have a vector v1<-c(0,-0.1,-0.3,-0.6,-0.2,-0.4, 0.2), that gives the slopes for each segment on the x-axis (i.e. between 0 and 1, the slope is -0.1, moving from 1 to 2, the slope is -0.3, and so on.).
I simply need to draw a line that goes from 0 to 6 on the x-axis, with the slope of each segment given by v1.
Additionally, there should be a separate straight line with slope -0.5 starting from 0, i.e. abline(0,-0.5) in the same figure.
This should be something extremely simple, but I just can't get it right. Thanks in advance!
How about this one?
x <- 0:6
v1 <- c(0,-0.1,-0.3,-0.6,-0.2,-0.4, 0.2)
Note that the slopes are given by the difference in y-values over the difference in x-values. Since the difference in x-values is always 1 for each successive entry in v1, v1 essentially contains the differences in y-values. Taking the first entry in v1 to be y(x=0) (see note below), cumsum(v1) gives you the y-values you need to give to plot.
y <- cumsum(v1)
plot(x, y, type="l")
Note that there are seven slope-values in v1 but only six differences from 0 to 6. If you count, that is 0-1, 1-2, 2-3, 3-4, 4-5, and 5-6, which gives six differences. If the first entry is supposed to be a slope, then the range of x needs to be reconsidered.
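Under one reading of that note, the six non-zero values are the slopes of the six unit-width segments and the line starts at y = 0. The sketch below makes those assumptions explicit and adds the reference line with slope -0.5 that the question asks for. The y-limits are widened beyond the requested (-1, 1) because both lines leave that range; the name slopes is hypothetical.

```r
# Assumption: one slope per unit-width segment, line starting at (0, 0)
slopes <- c(-0.1, -0.3, -0.6, -0.2, -0.4, 0.2)
x <- 0:6
y <- c(0, cumsum(slopes))          # accumulate the rises segment by segment

plot(x, y, type = "l", ylim = c(-3, 0.5))
abline(0, -0.5, lty = 2)           # straight line through the origin, slope -0.5

# Sanity check: the segment slopes of the plotted line match the input
print(diff(y) / diff(x))
```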
I have not fully understood your problem. In my opinion the range on the x axis should be (0, 6).
Anyway, see the code below. Hope it can help you.
v1 <- c(-0.1,-0.3,-0.6,-0.2,-0.4, 0.2)
plot(0:6, 1+c(0,cumsum(v1)), type="o", ylim=c(-1,1))
abline(v=0:6, h=seq(-1,1,by=0.1), col="gray", lty=3)

Filling a curve with points that fit under the curve in R plot

I was wondering how I can efficiently (using short R code) fill a curve with points that can fill up the area under my curve?
I have tried something without success, here is my R code:
data = rnorm(1000) ## random data points to fill the curve
curve(dnorm(x), -4, 4) ## curve to be filled by "data" above
points(data) ## plotting the points to fill the curve
Here's a method that uses interpolation to ensure that the plotted points won't exceed the height of the curve (although, if you want the actual point markers to not stick out above the curve, you'll need to set the threshold slightly below the height of the curve):
# Curve to be filled
c.pts = as.data.frame(curve(dnorm(x), -4, 4))
# Generate 1000 random points in the same x-interval and with y value between
# zero and the maximum y-value of the curve
set.seed(2)
pts = data.frame(x=runif(1000,-4,4), y=runif(1000,0,max(c.pts$y)))
# Using interpolation, keep only those points whose y-value is less than y(x)
pts = pts[pts$y < approx(c.pts$x,c.pts$y,xout=pts$x)$y, ]
# Plot the points
points(pts, pch=16, col="red", cex=0.7)
A method for plotting exactly a desired number of points under a curve
Responding to @d.b's comment, here's a way to get exactly a desired number of points plotted under a curve:
First, let's figure out how many random points we need to generate over the entire plot region in order to get (roughly) a target number of points under the curve. We do this as follows:
Calculate the area under the curve as a fraction of the area of the rectangle bounded by zero and the maximum height of the curve on the vertical axis, and by the width of the curve on the horizontal axis.
The number of random points we need to generate is the target number of points, divided by the area ratio calculated above.
# Area ratio
aa = sum(c.pts$y*median(diff(c.pts$x)))/(diff(c(-4,4))*max(c.pts$y))
# Target number of points under curve
n.target = 1000
# Number of random points to generate
n = ceiling(n.target/aa)
But we need more points than this to ensure we get at least n.target, because random variation will result in fewer than n.target points about half the time, once we limit the plotted points to those below the curve. So we'll add an excess.factor in order to generate more points under the curve than we need, then we'll just randomly select n.target of those points to plot. Here's a function that takes care of the entire process for a general curve.
# Plot a specified number of points under a curve
pts.under.curve = function(data, n.target=1000, excess.factor=1.5) {
# Area under curve as fraction of area of plot region
aa = sum(data$y*median(diff(data$x)))/(diff(range(data$x))*max(data$y))
# Number of random points to generate
n = excess.factor*ceiling(n.target/aa)
# Generate n random points in x-range of the data and with y value between
# zero and the maximum y-value of the curve
pts = data.frame(x=runif(n,min(data$x),max(data$x)), y=runif(n,0,max(data$y)))
# Using interpolation, keep only those points whose y-value is less than y(x)
pts = pts[pts$y < approx(data$x,data$y,xout=pts$x)$y, ]
# Randomly select only n.target points
pts = pts[sample(1:nrow(pts), n.target), ]
# Plot the points
points(pts, pch=16, col="red", cex=0.7)
}
Let's run the function for the original curve:
c.pts = as.data.frame(curve(dnorm(x), -4, 4))
pts.under.curve(c.pts)
Now let's test it with a different distribution:
# Curve to be filled
c.pts = as.data.frame(curve(df(x, df1=100, df2=20),0,5,n=1001))
pts.under.curve(c.pts, n.target=200)
n_points = 10000 #A large number
#Store curve in a variable and plot
cc = curve(dnorm(x), -4, 4, n = n_points)
#Generate n_points random points
p = data.frame(x = seq(-4,4,length.out = n_points), y = rnorm(n = n_points))
#OR p = data.frame(x = runif(n_points,-4,4), y = rnorm(n = n_points))
#Find out the index of values in cc$x closest to p$x
p$ind = findInterval(p$x, cc$x)
#Only retain those points within the curve whose p$y are smaller than cc$y
p2 = p[p$y >= 0 & p$y < cc$y[p$ind],] #may need p[p$y < 0.90 * cc$y[p$ind],] or something
#Plot points
points(p2$x, p2$y)
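A different technique from the rejection-style approaches above, offered only as a sketch: when the curve is a density with a known quantile function, you can draw the x-values from the distribution itself and then draw each y uniformly between zero and the curve height at that x. Every point then lies under the curve, so exactly n points are plotted with nothing discarded. The names px and py and the restriction to [-4, 4] are assumptions made for this example.

```r
set.seed(3)
n <- 1000

# Inverse-CDF draw from the normal distribution, restricted to [-4, 4]
px <- qnorm(runif(n, pnorm(-4), pnorm(4)))
# For each x, a uniform height between 0 and the density at that x
py <- runif(n, 0, dnorm(px))

curve(dnorm(x), -4, 4)
points(px, py, pch = 16, col = "red", cex = 0.7)
```

This relies on qnorm and pnorm, so it does not carry over to an arbitrary curve the way the interpolation method does.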

Create bubble chart with biggest bubble at the center

I'm trying to create a bubble chart using a set of data as follows:
X --> 10
Y --> 20
Z --> 5
Q --> 10
I simply need to have the biggest bubble (based on its number) to be at the centre (give or take) and the rest of the bubbles be around it without overlapping.
All of the R examples I have seen require a two dimensional dataset, and since the data I have are only one dimensional, I like to know if it's at all possible to create such graphs in R.
It would be great if someone could suggest me some useful hints or so. By the way for this task I need to use a SA tools so something like d3js is out of options. However, I am open to using a tool other than R.
I wasn't quite sure if this question should be asked on Stack Overflow or Cross Validated, so if moderators believe it doesn't belong here, I'll remove it.
This should do, the main idea being that you sort by the value of the radius, so the first is the biggest, then shift the values around it (odd on one side, even on the other) so that the values are decreasing both ways.
Further explanations in the code.
library(plotrix)
library(RColorBrewer)
# Set the random seed, to get reproducible results
set.seed(54321)
# Generate some random values for the radius
num.circles <- 11
rd <- runif(num.circles, 1, 20)
df <- data.frame(labels=paste("Lbl", 1:num.circles), radius=rd)
# Sort by descending radius. The biggest circle is always row 1
df <- df[rev(order(df$radius)),]
# Now we want to put the biggest circle in the middle and the others on either side
# To do so we reorder the data frame taking the even values first reversed, then the odd values.
# This ensures the biggest circle is in the middle
df <- df[c(rev(seq(2, num.circles, 2)), seq(1, num.circles, 2)),]
# Space between the circles. 0.2 * average radius seems OK
space.between <- 0.2 * mean(df$radius)
# Create an empty plot
plot(0, 0, "n", axes=FALSE, bty="n", xlab="", ylab="",
xlim=c(0, sum(df$radius)*2+space.between*num.circles),
ylim=c(0, 2.5 * max(df$radius)))
# Draw the circle at half the height of the biggest circle (plus some padding)
xx <- 0
mid.y <- max(df$radius) * 1.25
# Some nice degrading tones of blue
colors <- colorRampPalette(brewer.pal(8,"Blues"))(num.circles/2)
for (i in 1:nrow(df))
{
row <- df[i,]
x <- xx + row$radius + i*space.between
y <- mid.y
# Draw the circle
draw.circle(x, y, row$radius,
col=colors[abs(num.circles/2-i)])
# Add the label
text(x, y, row$labels, cex=0.6)
# Update current x position
xx <- xx + row$radius * 2
}
The result:
Live version on RFiddle.
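The reordering step can be checked in isolation on the four values from the question. This is a minimal sketch; vals, sorted, and centred are hypothetical names, and it uses the same even-then-odd index trick as the answer's code.

```r
# The one-dimensional data from the question
vals <- c(X = 10, Y = 20, Z = 5, Q = 10)

# Sort descending so the biggest value is first
sorted <- sort(vals, decreasing = TRUE)
n <- length(sorted)

# Even positions reversed, then odd positions: values rise to a peak, then fall
centred <- sorted[c(rev(seq(2, n, 2)), seq(1, n, 2))]
print(centred)
```

With four values the largest one lands in position 3 of 4, so it is near the centre rather than exactly in it; with an odd number of values it sits exactly in the middle.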
