I have made a contour plot in R with the following code:
library(mvtnorm)
# Define the parameters for the multivariate normal distribution
mu = c(0,0)
sigma = matrix(c(1,0.2,0.2,3),nrow = 2)
# Make a grid in the x-y plane centered in mu, +/- 3 standard deviations
xygrid = expand.grid(x = seq(from = mu[1]-3*sigma[1,1], to = mu[1]+3*sigma[1,1], length.out = 100),
y = seq(from = mu[2]-3*sigma[2,2], to = mu[2]+3*sigma[2,2], length.out = 100))
# Use the mvtnorm library to calculate the multivariate normal density for each point in the grid
distribution = as.matrix(dmvnorm(x = xygrid, mean = mu, sigma = sigma))
# Plot contours
df = as.data.frame(cbind(xygrid, distribution))
myPlot = ggplot() + geom_contour(data = df,geom="polygon",aes( x = x, y = y, z = distribution))
myPlot
I want to illustrate cumulative probability by shading/colouring certain parts of the plot, for instance everything in the region {x<0, y<0} (or any other self defined region).
Is there any way of achieving this in R with ggplot?
So you are able to get the coordinates used to draw the circles in the plot using ggplot_build. Subsequently you could try to use these coordinates in combination with geom_polygon to shade a particular region. My best try:
library(dplyr)
data <- ggplot_build(myPlot)$data[[1]]
xCoor <- 0
yCoor <- 0
df <- data %>% filter(group == '-1-001', x <= xCoor, y <= yCoor) %>% select(x,y)
# Insert the [0,0] coordinate in the right place
index <- which.max(abs(diff(rank(df$y))))
df <- rbind( df[1:index,], data.frame(x=xCoor, y=yCoor), df[(index+1):nrow(df),] )
myPlot + geom_polygon(data = df, aes(x=x, y=y), fill = 'red', alpha = 0.5)
As you can see it's not perfect because the [x,0] and [0,y] coordinates are not included in the data, but it's a start.
Related
My goal is to plot two different densities in the same plot of the same variable. I want to do this as it is common to show robustness of the forcing variable (here z) in a Regression Discontinuity Design. In the code below, I got it working however I do not want the density to be plotted before the cutoff (here 0) if it the key is "above" and vice-versa. Also, the graph should not just be hidden because of the smoothing. It should start computing the density just until (or start) the cutoff.
library(ggplot2)
x <- rnorm(1000, mean = 0)
y <- rnorm(500, mean = 2)
z <- append(x,y)
d <- tibble(value = z, key = ifelse(z <= 0, "below", "above"))
ggplot(d) +
geom_density(aes(z, group = key)) +
geom_vline(aes(xintercept = 0))
Does anybody know how to implement this? For linear regressions I got it working, but with geom_density() it plots the other side of the cutoff as well and smoothes it.
Thanks in advance for your help.
You can use trim = TRUE in geom_density to only calculate density over the range of values in the data:
library(ggplot2)
library(dplyr)
x <- rnorm(1000, mean = 0)
y <- rnorm(500, mean = 2)
z <- append(x,y)
d <- tibble(value = z, key = ifelse(z <= 0, "below", "above"))
ggplot(d) +
# added fill for easier discrimination
geom_density(aes(value, group = key, fill = key),
alpha = 0.5, trim = TRUE) +
geom_vline(aes(xintercept = 0), lty = 2, colour = 'red')
I have a data frame with 20 columns, and I want to plot one specific column (called BB) against each single column in the data frame. The plots I need are probability density plots, and I’m using the following code to generate one plot (plotting columns BB vs. AA as an example):
mydata = as.data.frame(fread("filename.txt")) #read my data as data frame
#function to calculate density
get_density <- function(x, y, n = 100) {
dens <- MASS::kde2d(x = x, y = y, n = n)
ix <- findInterval(x, dens$x)
iy <- findInterval(y, dens$y)
ii <- cbind(ix, iy)
return(dens$z[ii])
}
set.seed(1)
#define the x and y of the plot; x = column called AA; y = column called BB
xy1 <- data.frame(
x = mydata$AA,
y = mydata$BB
)
#call function get_density to calculate density for the defined x an y
xy1$density <- get_density(xy1$x, xy1$y)
#Plot
ggplot(xy1) + geom_point(aes(x, y, color = density), size = 3, pch = 20) + scale_color_viridis() +
labs(title = "BB vs. AA") +
scale_x_continuous(name="AA") +
scale_y_continuous(name="BB")
Would appreciate it if someone can suggest a method to produce multiple plot of BB against every other column, using the above density function and ggplot command. I tried adding a loop, but found it too complicated especially when defining the x and y to be plotted or calling the density function.
Since you don't provide sample data, I'll demo on mtcars. We convert the data to long format, calculate the densities, and make a faceted plot. We plot the mpg column against all others.
library(dplyr)
library(tidyr)
mtlong = gather(mtcars, key = "var", value = "value", -mpg) %>%
group_by(var) %>%
mutate(density = get_density(value, mpg))
ggplot(mtlong, aes(x = value, y = mpg, color = density)) +
geom_point(pch = 20, size = 3) +
labs(x = "") +
facet_wrap(~ var, scales = "free")
I have the following example:
require(mvtnorm)
require(ggplot2)
set.seed(1234)
xx <- data.frame(rmvt(100, df = c(13, 13)))
ggplot(data = xx, aes(x = X1, y= X2)) + geom_point() + geom_density2d()
Here is what I get:
However, I would like to get the density contour from the mutlivariate t density given by the dmvt function. How do I tweak geom_density2d to do that?
This is not an easy question to answer: because the contours need to be calculated and the ellipse drawn using the ellipse package.
Done with elliptical t-densities to illustrate the plotting better.
nu <- 5 ## this is the degrees of freedom of the multivariate t.
library(mvtnorm)
library(ggplot2)
sig <- matrix(c(1, 0.5, 0.5, 1), ncol = 2) ## this is the sigma parameter for the multivariate t
xx <- data.frame( rmvt(n = 100, df = c(nu, nu), sigma = sig)) ## generating the original sample
rtsq <- rowSums(x = matrix(rt(n = 2e6, df = nu)^2, ncol = 2)) ## generating the sample for the ellipse-quantiles. Note that this is a cumbersome calculation because it is the sum of two independent t-squared random variables with the same degrees of freedom so I am using simulation to get the quantiles. This is the sample from which I will create the quantiles.
g <- ggplot( data = xx
, aes( x = X1
, y = X2
)
) + geom_point(colour = "red", size = 2) ## initial setup
library(ellipse)
for (i in seq(from = 0.01, to = 0.99, length.out = 20)) {
el.df <- data.frame(ellipse(x = sig, t = sqrt(quantile(rtsq, probs = i)))) ## create the data for the given quantile of the ellipse.
names(el.df) <- c("x", "y")
g <- g + geom_polygon(data=el.df, aes(x=x, y=y), fill = NA, linetype=1, colour = "blue") ## plot the ellipse
}
g + theme_bw()
This yields:
I still have a question: how does one reduce the size of the plotting ellispe lines?
For reasons I won't go into I need to plot a vertical normal curve on a blank ggplot2 graph. The following code gets it done as a series of points with x,y coordinates
dfBlank <- data.frame()
g <- ggplot(dfBlank) + xlim(0.58,1) + ylim(-0.2,113.2)
hdiLo <- 31.88
hdiHi <- 73.43
yComb <- seq(hdiLo, hdiHi, length = 75)
xVals <- 0.79 - (0.06*dnorm(yComb, 52.65, 10.67))/0.05
dfVertCurve <- data.frame(x = xVals, y = yComb)
g + geom_point(data = dfVertCurve, aes(x = x, y = y), size = 0.01)
The curve is clearly discernible but is a series of points. The lines() function in basic plot would turn these points into a smooth line.
Is there a ggplot2 equivalent?
I see two different ways to do it.
geom_segment
The first uses geom_segment to 'link' each point with its next one.
hdiLo <- 31.88
hdiHi <- 73.43
yComb <- seq(hdiLo, hdiHi, length = 75)
xVals <- 0.79 - (0.06*dnorm(yComb, 52.65, 10.67))/0.05
dfVertCurve <- data.frame(x = xVals, y = yComb)
library(ggplot2)
ggplot() +
xlim(0.58, 1) +
ylim(-0.2, 113.2) +
geom_segment(data = dfVertCurve, aes(x = x, xend = dplyr::lead(x), y = y, yend = dplyr::lead(y)), size = 0.01)
#> Warning: Removed 1 rows containing missing values (geom_segment).
As you can see it just link the points you created. The last point does not have a next one, so the last segment is removed (See the warning)
stat_function
The second one, which I think is better and more ggplotish, utilize stat_function().
library(ggplot2)
f = function(x) .79 - (.06 * dnorm(x, 52.65, 10.67)) / .05
hdiLo <- 31.88
hdiHi <- 73.43
yComb <- seq(hdiLo, hdiHi, length = 75)
ggplot() +
xlim(-0.2, 113.2) +
ylim(0.58, 1) +
stat_function(data = data.frame(yComb), fun = f) +
coord_flip()
This build a proper function (y = f(x)), plot it. Note that it is build on the X axis and then flipped. Because of this the xlim and ylim are inverted.
This question is a follow-up of "How can a data ellipse be superimposed on a ggplot2 scatterplot?".
I want to create a 2D scatterplot using ggplot2 with filled superimposed confidence ellipses. Using the solution of Etienne Low-Décarie from the above mentioned post, I do get superimposed ellipses to work. The solution is based on stat_ellipse available from https://github.com/JoFrhwld/FAAV/blob/master/r/stat-ellipse.R
Q: How can I fill the inner area of the ellipse(s) with a certain color (more specifically I want to use the color of the ellipse border with some alpha)?
Here is the minimal working example modified from the above mentioned post:
# create data
set.seed(20130226)
n <- 200
x1 <- rnorm(n, mean = 2)
y1 <- 1.5 + 0.4 * x1 + rnorm(n)
x2 <- rnorm(n, mean = -1)
y2 <- 3.5 - 1.2 * x2 + rnorm(n)
class <- rep(c("A", "B"), each = n)
df <- data.frame(x = c(x1, x2), y = c(y1, y2), colour = class)
# get code for "stat_ellipse"
library(devtools)
library(ggplot2)
source_url("https://raw.github.com/JoFrhwld/FAAV/master/r/stat-ellipse.R")
# scatterplot with confidence ellipses (but inner ellipse areas are not filled)
qplot(data = df, x = x, y = y, colour = class) + stat_ellipse()
Output of working example:
As mentioned in the comments, polygon is needed here:
qplot(data = df, x = x, y = y, colour = class) +
stat_ellipse(geom = "polygon", alpha = 1/2, aes(fill = class))